Section 4.4 The C Memory Model

We have already seen that containers have addresses. Many programmers believe that C's notion of an address is identical to the processor's notion of an address that we saw in Chapter 2. This is not true, however. In particular, C does not require the address of a container to be an address in the main memory of the computer. Moreover, while C allows some address arithmetic (computing new addresses by adding offsets to, or subtracting offsets from, the base address of a container), this arithmetic is restricted to the container the base address belongs to. Creating an out-of-bounds address by address arithmetic is in fact undefined behavior. (We will discuss undefined behavior in Section 4.15; for now, just think of it as something as bad as dividing by 0.) These restrictions may seem counterintuitive and limiting at first, since C is intended to enable programming “close to the hardware”. However, they give compilers (the tools that translate C programs to machine code) more freedom in deciding where containers are actually located, and they make the meaning of programs independent of where containers actually reside. Let us consider the following code snippet to better understand why C imposes these restrictions.

#include <stdio.h>

int foo() {
  int a = 1;
  int b = 2;
  int* p = &b;
  p = p + 4;   /* undefined behavior: address outside b's container */
  *p = 42;
  printf("%d\n", a);
  return 0;
}

The function foo defines two local variables a and b and initializes them to 1 and 2. The third local variable p is a pointer, i.e. a variable that can hold an address. It is initialized to the address of variable b. In the next statement, the programmer adds 4 to the address of b. Maybe the programmer believed that all local variables reside in a stack frame (see Section 2.8) in the order in which they were declared in the program. And maybe they thought that an int is four bytes long, so that p + 4 would essentially yield the address of a. But this is far from true. (In C, adding 4 to an int pointer in fact advances it by four ints rather than four bytes, but the result is out of bounds either way.) The program has undefined behavior because the address p + 4 is invalid: it is out of the bounds of b's container.

Let us briefly ponder what would happen if the program above actually had the semantics (meaning) our inexperienced programmer had in mind. Then the program's behavior would be defined, and it would print 42. However, this would also entail that if we swapped the order in which a and b are declared, the program would do something else! We certainly do not want our program's behavior to depend on the order in which variables are declared. Furthermore, giving our example program a meaning would also mean that the compiler is much more limited in where it allocates the containers' memory. If address arithmetic on a base address could yield a valid address of another container, it would be much harder for a compiler to allocate a container to a processor register (which is certainly preferable, because we have seen that registers are much faster to operate on than memory). The more abstract memory model that C defines permits the compiler to hold the value of a local variable in a register as long as the address of that variable has not been taken.
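If the intent really is to reach one int from the address of another, both values have to live in the same container. The following is a minimal sketch of that idea, not part of the original example: the function name write_neighbor and the array v are hypothetical, with v[0] standing in for a and v[1] for b.

```c
#include <assert.h>

/* Hypothetical helper: both values live in ONE container (the array v),
   so stepping from one element to its neighbor is well defined. */
int write_neighbor(void) {
    int v[2] = {1, 2};   /* v[0] plays the role of a, v[1] the role of b */
    int *p = &v[1];      /* address of "b" */
    p = p - 1;           /* still inside v's container: now refers to v[0] */
    assert(p == &v[0]);
    *p = 42;             /* defined: writes to "a" */
    return v[0];
}
```

Here the result does not depend on where the compiler places v, because the arithmetic never leaves v's container.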

As mentioned before, C does, however, allow containers for compound data that have space for more than one value. The separate components of such a container can be addressed individually. We can obtain the address of a byte within a container by adding an offset to the container's address. There can therefore be multiple different addresses for the same container that refer to different positions in the container. Formally, an address is a pair of the container address and an offset, which needs to be a number between 0 and the container's size. Addresses that refer to the same container are totally ordered. This enables pointer arithmetic, which we discuss in Section 4.10.
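The pair view of addresses can be made concrete with a small sketch. The function names count_addresses and ordered, and the array buf, are hypothetical and only serve as illustration. The sketch assumes the usual C rule that the address at offset "size" (one past the end) may be formed and compared, but not dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the byte addresses that can be formed
   for one container, i.e. the offsets 0 .. sizeof buf (inclusive). */
size_t count_addresses(void) {
    int buf[2];
    unsigned char *base = (unsigned char *)buf;   /* offset 0 */
    size_t n = 0;
    for (size_t off = 0; off <= sizeof buf; off++) {
        unsigned char *p = base + off;            /* (container buf, offset off) */
        assert(p >= base);                        /* same container: comparable */
        n++;
    }
    return n;
}

/* Hypothetical helper: addresses into the same container are ordered. */
int ordered(void) {
    int buf[2];
    int *first = &buf[0];
    int *second = &buf[1];
    return first < second && (second - first) == 1;
}
```

The loop builds each address from the base and an offset instead of incrementing a pointer past the end, so no out-of-bounds address is ever created.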

Example 4.4.1.

Consider the following function and an execution trace for it:

void foo() {
  int a[2];
  int *p = a;
  int *q = a + 1;
  *p = 1;
  *q = 2;
}

Line 2 allocates a container for two ints as a local variable a. Lines 3 and 4 allocate more containers, bound to p and q, both of which can carry addresses. In line 3, the address of a's first int is stored in p's container. Similarly, in line 4, the address of a's second int is stored in q's container. The last two statements store the values 1 and 2 in the first and second int component of a. In execution traces, we denote addresses as arrows to the referenced container.
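The effect described above can be checked with a small variant of foo. This is only a sketch: the name foo_checked and the encoding of the result as a single number are hypothetical and not part of the example.

```c
#include <assert.h>

/* Hypothetical variant of foo that checks where p and q point
   and reports the contents of a after the two stores. */
int foo_checked(void) {
    int a[2];
    int *p = a;        /* address of a's first int  */
    int *q = a + 1;    /* address of a's second int */
    *p = 1;
    *q = 2;
    assert(p == &a[0]);
    assert(q == &a[1]);
    return a[0] * 10 + a[1];   /* encodes both components in one value */
}
```

Calling foo_checked() yields 12, i.e. the first component holds 1 and the second holds 2.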