Section 2.8 Calling Functions
As soon as programs get larger there is a danger of producing spaghetti code, i.e. writing very large chunks of code that does not have a clear focus but does multiple things all at once. Such code is hard to understand and almost never correct. Therefore, we want to break up programs into smaller modules with clear responsibilities. Each module consists of a set of functions and is grouped in one sometimes a few translation units. The benefit of writing a subroutine for each single task is that we can develop, debug, test, and reuse such subroutines individually. In this section, we look at the mechanics of defining subroutines in machine code and making them interact using function calls.
Consider the following piece of code that calls a function
bar
: .text
foo:
add t0, a0, a1
call bar
add a0, a0, t0
ret
First of all, note that
call bar
is a pseudo-instruction for jal x1, bar
. jal x1, bar
jumps to the function bar
and bar
, being a function, will ultimately return to its caller, which is our function foo
by executing the instruction ret
which is a pseudo-instruction for jalr x0 x1 0
. (jalr x0, x1, 0
jumps to the address in register x1
and sets the program counter to this address.) Remember that jal
puts the address of the instruction behind jal
into register x0
(also known as ra
) before jumping. This allows bar
to return to from where it was called.Let us assume that
bar
is defined in a different translation unit that we do not make any assumption about. So, how can we know that the value we stored into t0
with the first add
is still intact? Now you might say that we have to look inside the code of bar
to be sure that it doesn't overwrite t0
. This is a bad idea for several reasons: First, if the project is bigger this may mean that you have to look at way more code than your own. Above a certain code size, this is no longer feasible. Second, the implementation of bar
may change in the future and so might its usage of t0
. So it is a good idea to make as few assumptions as possible about the inner workings of bar
. Pessimistically, we have to assume that bar
uses t0
for its own purposes.One way to “transport” a value over a function call might be to store the value at some memory address specifically chosen for this purpose. This however does not work for recursive functions. A function \(f\) is called recursively if, during execution there is a call to \(f\) before all previous calls have returned. In this case, we would need multiple such memory cells. The following execution trace illustrates the problem:
f g f ... sw t0 p call g ... call f ... sw t0, p call g lw t0, p ... ret ... ret lw t0 p ... ret
f
is called a second time before the first call returned. Hence, the second store sw t0 p
would overwrite the saved value of t0
at p
.So, we need to allocate space in memory for every call to save the registers of this particular instance. Because each function returns before its caller, we can organize this memory using a stack. A stack is a data structure (a way to organize data) with the policy that data can only be removed from the data structure in the reverse order of their insertion which means that the last in is first out (LIFO). To this end, each function that wants to save register contents allocates a stack frame (certain amount of bytes) on the stack where the register contents can be saved. Allocation is simple because it is a stack: One register that holds the current height of the stack is sufficient. In practice this register, called the stack pointer, points to the last byte that is currently on the stack. Since the stack grows downwards decreasing the stack pointer by \(n\) allocates \(n\) bytes on the stack. On RISC-V systems register 2 is assigned the role of the stack pointer and therefore bears the short name
sp
. The following listing extends the example program above with the allocation and de-allocation of a stack frame and loads and stores to save and restore the values of t0
and ra
. .text
foo:
addi sp, sp, -8 # decrease sp, allocate frame
add t0, a0, a1
sw t0, 0(sp) # save t0
sw ra, 4(sp) # save ra
call bar
lw ra, 4(sp) # restore ra
lw t0, 0(sp) # restore t0
add a0, a0, t0
addi sp, sp, 8 # deallocate frame
ret
Typically, the size of the stack frame is constant (i.e. the same for all program executions). In some situations, one wants to allocate more memory on the stack, maybe also dependent on some input. In such a case, it may not be statically known how deep the stack pointer will move throughout the execution of a procedure's code. This makes using constant offsets in the form of immediates like in the example above impossible. In such a situation, one uses an additional register that is called the frame pointer. In RISC-V, frame pointers reside in register 8 and are called
fp
. It is common to set the frame pointer right at the entrance of the procedure by saving the old frame pointer and setting the frame pointer to the current stack pointer value.The stack frame (see Figure 2.8.1) typically contains the following elements:
- Arguments, i.e. values that are passed to the procedure. (The convention in RISC-V is that the first eight arguments are passed in registers
a0
toa7
, hence their name. Remaining arguments are passed on the stack.) - Space to store the contents of registers that …
- the procedure wants to overwrite but the caller expects to be unchanged. These are so-called callee-save registers because the callee has to make sure the values of these registers are preserved from entering the procedure to its return.
- might be overwritten by procedures that are called by the procedure. These are so-called caller-save registers because the caller needs to preserve their value across a procedure call.
- space for further data local to the procedure that don't fit into a register because there is either no free register available or the data is too big to fit into a 32-bit register.
The stack frame is constructed by the caller and the callee together. The caller puts additional arguments on the stack while the callee allocates space to save registers as discussed above. The code of the procedure that builds and tears down the stack frame is called the prologue and the epilogue.
Subsection 2.8.1 Recursion
Let us reconsider our initial example, the computation of the factorial function. The standard way of defining the factorial function mathematically is by recursion:
\begin{equation*}
f_r(n)\defeq\begin{cases}
n\cdot f_r(n-1) \amp n \ge 1 \\
1 \amp n \lt 1
\end{cases}
\end{equation*}
The implementation in Listing 2.3.5 is not recursive: It does not contain any procedure calls especially not to itself. This implementation is based on a different but equivalent definition of the factorial function which is tail recursive
\begin{equation*}
f_e(n, r)\defeq\begin{cases}
f_e(n-1, n\cdot r) \amp n \ge 1 \\
r \amp n \lt 1
\end{cases}
\end{equation*}
Tail recursive means that the result of the recursive function application is the result of the function and does not appear in a larger expression.
Let us look at the implementations of both variants in RISC-V assembly shown in Figure 2.8.2
.text
.globl fact
fact:
li t0, 1
blez a0, end
loop:
mul t0, t0, a0
addi a0, a0, -1
bgtz a0, loop
end:
mv a0, t0
ret
.text
.globl fact_rec
fact_rec:
li t0, 1
blez a0, end
addi sp, sp, -8
sw ra, 0(sp)
sw a0, 4(sp)
addi a0, a0, -1
jal fact_rec
lw a0, 4(sp)
mul t0, t0, a0
lw ra, 0(sp)
addi sp, sp, 8
end:
mv a0, t0
ret
The recursive implementation incurs a significant amount of overhead to create and deallocate the stack frame and to save and restore the arguments that need to be saved across the recursive procedure call. Observing the evolution of the stack during an execution of the recursive implementation, one can see that the first \(n\) calls only allocate frames and store the respective argument on the stack. implicitly, it builds a list of all numbers from \(1\) to \(n\) and ever the same return address. Only when returning from the recursive calls the computation is performed.
Subsection 2.8.2 The Calling Convention
Calling conventions regulate who has to save register contents when. Registers that the callee may change at its own discretion are called caller-save because the caller is responsible to save their contents before calling a function. Registers that the caller can assume to be unchanged by a call are called callee-save because the callee has to guarantee that when the function returns, these registers have the same contents as when the function was entered.
For efficiency reasons, callers pass the arguments to called functions in registers. In RISC-V registers
a0
to a7
are used for this purpose. The result of a function call (i.e. a value that the called function wants to return to the caller) is passed in a0
. The following table shows all RISC-V registers and their role in the calling convention.Number | Name | save | Comment |
0 | x0 |
-- | Always reads zero |
1 | ra |
caller | Return address |
2 | sp |
callee | Stack pointer |
3 | gp |
-- | Global data pointer. Used to address elements in the global data segment. |
4 | tp |
-- | Thread pointer. Used to address thread-local memory in multi-threaded programs. |
5 –7 |
t0 –t2
|
caller | Temporaries |
8 | fp |
callee | Frame pointer |
9 | s0 |
callee | Temporaries |
10–17 |
a0 –a7
|
caller | Arguments / return values |
18–27 |
s2 –s11
|
callee | Temporaries |
27–31 |
t3 –t6
|
caller | Temporaries |