Calling Functions

Section 2.8 Calling Functions

As soon as programs get larger there is a danger of producing spaghetti code, i.e. writing very large chunks of code that does not have a clear focus but does multiple things all at once. Such code is hard to understand and almost never correct. Therefore, we want to break up programs into smaller modules with clear responsibilities. Each module consists of a set of functions and is grouped in one sometimes a few translation units. The benefit of writing a subroutine for each single task is that we can develop, debug, test, and reuse such subroutines individually. In this section, we look at the mechanics of defining subroutines in machine code and making them interact using function calls.

Consider the following piece of code that calls a function bar:

    .text
foo:
    add  t0, a0, a1
    call bar
    add  a0, a0, t0
    ret

First of all, note that call bar is a pseudo-instruction for jal x1, bar. jal x1, bar jumps to the function bar and bar, being a function, will ultimately return to its caller, which is our function foo by executing the instruction ret which is a pseudo-instruction for jalr x0 x1 0. (jalr x0, x1, 0 jumps to the address in register x1 and sets the program counter to this address.) Remember that jal puts the address of the instruction behind jal into register x1 (also known as ra) before jumping. This allows bar to return to from where it was called.

Let us assume that bar is defined in a different translation unit that we do not make any assumption about. So, how can we know that the value we stored into t0 with the first add is still intact? Now you might say that we have to look inside the code of bar to be sure that it doesn't overwrite t0. This is a bad idea for several reasons: First, if the project is bigger this may mean that you have to look at way more code than your own. Above a certain code size, this is no longer feasible. Second, the implementation of bar may change in the future and so might its usage of t0. So it is a good idea to make as few assumptions as possible about the inner workings of bar. Pessimistically, we have to assume that bar uses t0 for its own purposes.

One way to “transport” a value over a function call might be to store the value at some memory address specifically chosen for this purpose. This however does not work for recursive functions. A function \(f\) is called recursively if, during execution there is a call to \(f\) before all previous calls have returned. In this case, we would need multiple such memory cells. The following execution trace illustrates the problem:

f           g           f
...
sw   t0 p
call g
            ...
            call f
                        ...
                        sw   t0, p
                        call g
                        lw   t0, p
                        ...
                        ret
            ...
            ret
lw  t0 p
...
ret

f is called a second time before the first call returned. Hence, the second store sw t0 p would overwrite the saved value of t0 at p.

So, we need to allocate space in memory for every call to save the registers of this particular instance. Because each function returns before its caller, we can organize this memory using a stack. A stack is a data structure (a way to organize data) with the policy that data can only be removed from the data structure in the reverse order of their insertion which means that the last in is first out (LIFO). To this end, each function that wants to save register contents allocates a stack frame (certain amount of bytes) on the stack where the register contents can be saved. Allocation is simple because it is a stack: One register that holds the current height of the stack is sufficient. In practice this register, called the stack pointer, points to the last byte that is currently on the stack. Since the stack grows downwards decreasing the stack pointer by \(n\) allocates \(n\) bytes on the stack. On RISC-V systems register 2 is assigned the role of the stack pointer and therefore bears the short name sp. The following listing extends the example program above with the allocation and de-allocation of a stack frame and loads and stores to save and restore the values of t0 and ra.

    .text
foo:
    addi  sp, sp, -8  # decrease sp, allocate frame
    add   t0, a0, a1
    sw    t0, 0(sp)   # save t0
    sw    ra, 4(sp)   # save ra
    call  bar
    lw    ra, 4(sp)   # restore ra
    lw    t0, 0(sp)   # restore t0
    add   a0, a0, t0
    addi  sp, sp, 8   # deallocate frame
    ret

Typically, the size of the stack frame is constant (i.e. the same for all program executions). In some situations, one wants to allocate more memory on the stack, maybe also dependent on some input. In such a case, it may not be statically known how deep the stack pointer will move throughout the execution of a procedure's code. This makes using constant offsets in the form of immediates like in the example above impossible. In such a situation, one uses an additional register that is called the frame pointer. In RISC-V, frame pointers reside in register 8 and are called fp. Simultanously, register 8 is also s0. It is common to set the frame pointer right at the entrance of the procedure by saving the old frame pointer and setting the frame pointer to the current stack pointer value. However, one seldomly uses the frame pointer in RISC-V.

The stack frame (see Figure 2.8.1) typically contains the following elements:

Arguments, i.e. values that are passed to the procedure. (The convention in RISC-V is that the first eight arguments are passed in registers a0 to a7, hence their name. Remaining arguments are passed on the stack.)
Space to store the contents of registers that …
- the procedure wants to overwrite but the caller expects to be unchanged. These are so-called callee-save registers because the callee has to make sure the values of these registers are preserved from entering the procedure to its return.
- might be overwritten by procedures that are called by the procedure. These are so-called caller-save registers because the caller needs to preserve their value across a procedure call.
space for further data local to the procedure that don't fit into a register because there is either no free register available or the data is too big to fit into a 32-bit register.

Figure 2.8.1. A stack frame set up when calling a function.

The stack frame is constructed by the caller and the callee together. The caller puts additional arguments on the stack while the callee allocates space to save registers as discussed above. The code of the procedure that builds and tears down the stack frame is called the prologue and the epilogue.

Subsection 2.8.1 Recursion

Let us reconsider our initial example, the computation of the factorial function. The standard way of defining the factorial function mathematically is by recursion:

\begin{equation*} f_r(n)\defeq\begin{cases} n\cdot f_r(n-1) \amp n \ge 1 \\ 1 \amp n \lt 1 \end{cases} \end{equation*}

The implementation in Listing 2.3.5 is not recursive: It does not contain any procedure calls especially not to itself. This implementation is based on a different but equivalent definition of the factorial function which is tail recursive

\begin{equation*} f_e(n, r)\defeq\begin{cases} f_e(n-1, n\cdot r) \amp n \ge 1 \\ r \amp n \lt 1 \end{cases} \end{equation*}

Tail recursive means that the result of the recursive function application is the result of the function and does not appear in a larger expression.

Let us look at the implementations of both variants in RISC-V assembly shown in Figure 2.8.2

                  .text
    .globl fact
fact:
    li     t0, 1
    blez   a0, end
loop:



    mul    t0, t0, a0
    addi   a0, a0, -1
    bgtz   a0, loop




end:
    mv     a0, t0
    ret

                  .text
    .globl fact_rec
fact_rec:
    li     t0, 1
    blez   a0, end

    addi   sp, sp, -8
    sw     ra, 0(sp)
    sw     a0, 4(sp)

    addi   a0, a0, -1
    call   fact_rec
    lw     a0, 4(sp)
    mul    t0, t0, a0
    lw     ra, 0(sp)
    addi   sp, sp, 8
end:
    mv     a0, t0
    ret

Figure 2.8.2. Tail-recursive and recursive implementation of the factorial procedure. The recursive implementation has to save the argument and the return address across the recursive call causing a stack frame to be created. The tail-recursive version does neither need a stack frame nor loads and stores to save the registers.

The recursive implementation incurs a significant amount of overhead to create and deallocate the stack frame and to save and restore the arguments that need to be saved across the recursive procedure call. Observing the evolution of the stack during an execution of the recursive implementation, one can see that the first \(n\) calls only allocate frames and store the respective argument on the stack. implicitly, it builds a list of all numbers from \(1\) to \(n\) and ever the same return address. Only when returning from the recursive calls the computation is performed.

Subsection 2.8.2 The Calling Convention

Calling conventions regulate who has to save register contents when. Registers that the callee may change at its own discretion are called caller-save because the caller is responsible to save their contents before calling a function. Registers that the caller can assume to be unchanged by a call are called callee-save because the callee has to guarantee that when the function returns, these registers have the same contents as when the function was entered.

For efficiency reasons, callers pass the arguments to called functions in registers. In RISC-V registers a0 to a7 are used for this purpose. The result of a function call (i.e. a value that the called function wants to return to the caller) is passed in a0. The following table shows all RISC-V registers and their role in the calling convention.

Table 2.8.3. RISC-V registers with their short name, role in the calling convention, and purpose.

Number	Name	save	Comment
0	`x0`	--	Always reads zero
1	`ra`	caller	Return address
2	`sp`	callee	Stack pointer
3	`gp`	--	Global data pointer. Used to address elements in the global data segment.
4	`tp`	--	Thread pointer. Used to address thread-local memory in multi-threaded programs.
5 –7	`t0`–`t2`	caller	Temporaries
8	`s0/fp`	callee	Temporaries/Frame pointer
9	`s1`	callee	Temporaries
10–17	`a0`–`a7`	caller	Arguments / return values
18–27	`s2`–`s11`	callee	Temporaries
27–31	`t3`–`t6`	caller	Temporaries

An Introduction to Imperative Programming

Search Results:

Section 2.8 Calling Functions

Subsection 2.8.1 Recursion

Subsection 2.8.2 The Calling Convention