As soon as programs get larger there is a danger of producing ‘spaghetti code’, i.e. writing very large chunks of code that does not have a clear focus but does ‘multiple things’ all at once. Such code is hard to understand and almost never correct. Therefore, we want to break up programs into smaller modules with clear responsibilities. Each module consists of a set of functions and is grouped in one sometimes a few translation units. The benefit of writing a subroutine for each single task is that we can develop, debug, test, and reuse such subroutines individually. In this section, we look at the mechanics of defining subroutines in machine code and making them interact using function calls.
Consider the following piece of code that calls a function bar:
First of all, note that call bar is a pseudo-instruction for jal x1, bar. jal x1, bar jumps to the function bar and bar, being a function, will ultimately return to its caller, which is our function foo by executing the instruction ret which is a pseudo-instruction for jalr x0 x1 0. (jalr x0, x1, 0 jumps to the address in register x1 and sets the program counter to this address.) Remember that jal puts the address of the instruction behind jal into register x1 (also known as ra) before jumping. This allows bar to return to from where it was called.
Let us assume that bar is defined in a different translation unit that we do not make any assumption about. So, how can we know that the value we stored into t0 with the first add is still intact? Now you might say that we have to look inside the code of bar to be sure that it doesn’t overwrite t0. This is a bad idea for several reasons: First, if the project is bigger this may mean that you have to look at way more code than your own. Above a certain code size, this is no longer feasible. Second, the implementation of bar may change in the future and so might its usage of t0. So it is a good idea to make as few assumptions as possible about the inner workings of bar. Pessimistically, we have to assume that bar uses t0 for its own purposes.
One way to “transport” a value over a function call might be to store the value at some memory address specifically chosen for this purpose. This however does not work for recursive functions. A function is called recursively if, during its execution, there is another call to before all previous calls have returned. In this case, we would need multiple such memory cells. The following execution trace illustrates the problem:
f is called a second time before the first call returned. Hence, the second store sw t0 p would overwrite the saved value of t0 at p.
So, we need to allocate space in memory for every call to save the registers of this particular instance. Because each function returns before its caller, we can organize this memory using a stack. A stack is a data structure (a way to organize data) with the policy that data can only be removed from the data structure in the reverse order of their insertion which means that the ‘last in’ is ‘first out’ (LIFO). To this end, each function that wants to save register contents allocates a stack frame (certain amount of bytes) on the stack where the register contents can be saved. Allocation is simple because it is a stack: One register that holds the current height of the stack is sufficient. In practice this register, called the stack pointer, points to the last byte that is currently on the stack. Since the stack grows downwards decreasing the stack pointer by allocates bytes on the stack. On RISC-V systems register 2 is assigned the role of the stack pointer and therefore bears the short name sp. The following listing extends the example program above with the allocation and de-allocation of a stack frame and loads and stores to save and restore the values of t0 and ra.
Typically, the size of the stack frame is constant (i.e. the same for all program executions). In some situations, one wants to allocate more memory on the stack, maybe also dependent on some input. In such a case, it may not be statically known how deep the stack pointer will move throughout the execution of a procedure’s code. This makes using constant offsets in the form of immediates like in the example above impossible. In such a situation, one uses an additional register that is called the frame pointer. In RISC-V, frame pointers reside in register 8 and are called fp. Simultanously, register 8 is also s0. It is common to set the frame pointer right at the entrance of the procedure by saving the old frame pointer and setting the frame pointer to the current stack pointer value. However, one seldomly uses the frame pointer in RISC-V.
The stack frame (see Figure 2.8.1) typically contains the following elements:
a0 to a7, hence their name. Remaining arguments are passed on the stack.)The stack frame is constructed by the caller and the callee together. The caller puts additional arguments on the stack while the callee allocates space to save registers as discussed above. The code of the procedure that builds and tears down the stack frame is called the prologue and the epilogue.
Let us reconsider our initial example, the computation of the factorial function. The standard way of defining the factorial function mathematically is by recursion:
The implementation in Listing 2.3.5 is not recursive: It does not contain any procedure calls especially not to itself. This implementation is based on a different but equivalent definition of the factorial function, which is tail recursive:
Tail recursive means that the result of the recursive function application is the result of the function and does not appear in a larger expression.
Let us look at the implementations of both variants in RISC-V assembly shown in Figure 2.8.2
Calling conventions regulate who has to save register contents when. Registers that the callee may change at its own discretion are called caller-save because the caller is responsible to save their contents before calling a function. Registers that the caller can assume to be unchanged by a call are called callee-save because the callee has to guarantee that when the function returns, these registers have the same contents as when the function was entered.
For efficiency reasons, callers pass the arguments to called functions in registers. In RISC-V registers a0 to a7 are used for this purpose. The result of a function call (i.e. a value that the called function wants to return to the caller) is passed in a0. The following table shows all RISC-V registers and their role in the calling convention.
| Number | Name | save | Comment |
| 0 | x0 |
— | Always reads zero |
| 1 | ra |
caller | Return address |
| 2 | sp |
callee | Stack pointer |
| 3 | gp |
— | Global data pointer. Used to address elements in the global data segment. |
| 4 | tp |
— | Thread pointer. Used to address thread-local memory in multi-threaded programs. |
| 5 – 7 | t0 – t2 |
caller | Temporaries |
| 8 | s0 / fp |
callee | Temporaries / Frame pointer |
| 9 | s1 |
callee | Temporaries |
| 10 – 17 | a0 – a7 |
caller | Arguments / return values |
| 18 – 27 | s2 – s11 |
callee | Temporaries |
| 27 – 31 | t3 – t6 |
caller | Temporaries |