Pointers

Section 4.10 Pointers

A pointer is a variable with a type T* derived from a referenced type T. The value of a pointer is an address. Therefore, the content of a pointer's container is the address of another container. It is important to not confuse a pointer's value (which is an address) with the address of its container.

Table 4.10.1. Overview of C's pointer operators

Operator	Name	Evaluation Rule	L-evaluable	Operand has Type	Expression has Type
`*`	indirection	\(\text{L-eval}(*e) \defeq \text{R-eval}(e)\)	yes	\(T*\)	\(T\)
`&`	address of	\(\text{R-eval}(\&e) \defeq \text{L-eval}(e)\)	no	\(T\)	\(T*\)

There are two operators that are particularly relevant for dealing with pointers: The address-of operator & and the indirection operator *. In a nutshell, R-evaluating &e gives the address of the L-evaluable expression e. *e on the other hand makes an L-evaluable expression out of e, if e R-evaluates to an address. Thus, *e denotes the content of the container referred to by the address e. Table 4.10.1 summarizes the properties of both operators.

The program in Figure 4.10.2 shows how the operators * and & are used. Let us look at the effects of the individual instructions in detail:

The value \(1\) is written into the container of x.
The container of y can carry addresses of int containers. The expression x is L-evaluable: The result is the address of the container of x. R-evaluating the address-of operator therefore also results in this value. This address is then written to the container of y. Without the address-of operator, the compiler would find a type error since expressions on the right side of an assignment are R-evaluated, and the R-evaluation of x reads the value from the container of x and does not provide its address.
R-evaluating the expression 2 gives the value \(2\text{.}\) The left side of the assignment is more exciting: The expression *y is, by definition, L-evaluable; otherwise it could not occur on the left side of an assignment. Its value is the result of the R-evaluation of its operand. Here, this is the content of the container of y: the address of x. Therefore, the value \(2\) is written to the container bound to x.
L-evaluating z results in the address of the container of z, whereas R-evaluating y gives the content of the container of y, which is the address of x. Consequently, the address of x is written to the container of z.
As z contains the address of x, the content of x's container is now changed to \(3\text{.}\)
The interesting part is the occurrence of the dereference operator on the right side: R-evaluating y yields the content of y's container, i.e. the address of x. We have seen that the expression *y is L-evaluable. The R-evaluation of an L-evaluable expression just reads the contents of the container whose address results from the L-evaluation. In this example, that is the content of the container of x, \(3\text{.}\) In the end, w therefore carries the value \(3\text{.}\)

Subsection 4.10.1 Arrays

So far, our examples mostly contained only scalar types. A container for a variable with scalar type can carry exactly one value of the type at a time. If we want to process more data, for example a list of \(n\) numbers, we need aggregate data types. A value of an aggregated type consists of multiple (sub-)values of other (possibly also aggregated) types. We consider two kinds of aggregate types in C: arrays and structs (sometimes also referred to as record). We will discuss structs in Section 4.13.

An array with size \(n\) and type \(T\) can carry \(n\) values of the type \(T\text{.}\) The following code declares an array of \(100000\) elements of type int and allocates one container that can carry \(100000\) int values:

int a[100000];
/* ... */
int m = max(a);

A peculiarity of C-arrays is that R-evaluating an array a of type T[n] does not provide a value of this type, but the array's base address. This is a pointer to the first element of a, which has the type T*.

Remark 4.10.3.

A construct that seems like an exception to this rule is the occurrence of a variable with an array type in sizeof. Assuming that sizeof(int) = 4, sizeof(a) will be the value \(400000\) instead of sizeof(int*). The sizeof operator does not (R-)evaluate its operand and is therefore not affected by this implicit conversion.

For this reason, in a function that expects an array of elements of type \(T\text{,}\) the corresponding parameter needs to have the type pointer to \(T\text{.}\) Since an array only contains its content and not its length, the length needs to be provided separately:

int max(int* numbers, int len)
{
    int m = 0;
    for (int i = 0; i < len; i++)
        m = numbers[i] > m ? numbers[i] : m;
    return m;
}

We can access the individual elements of an array via pointer arithmetic relative to the base address. As illustrated in the above example, we use the subscripting operator e[i] for this purpose. The expression e[i] is however only “syntactic sugar” for the expression *(e + i). If e has type T*, we can add an expression i of type int to e and obtain another pointer value of type T*, offset from e. With this construct, we can create different pointers into the same container. Unfortunately, it is easy to construct invalid pointers with pointer arithmetic: For example, with the above declaration of the array numbers, the addresses numbers + 100042 and numbers - 1 are invalid. The corresponding offsets (\(100042\) and \(-1\)) are outside the limits \([0;100000[\) implied by the size (\(100000\)) of the container. Constructing (not only accessing!) such “out-of-bounds” pointers causes undefined behavior (Section 4.15).

Remark 4.10.4.

It is a common beginner's mistake to use sizeof(numbers) to obtain the array's length when numbers is only a pointer to the first element. As the value of sizeof is determined during the compilation based on the type of the given expression, it will provide the size of a pointer.

Remark 4.10.5.

An exception is the pointer value p+S where S is the size of the container with base address p. This address can be computed without undefined behavior, but accessing a value at this address is undefined. The motivation for this exception is to allow iterating over arrays by incrementing a pointer into the array until this address is reached.

Subsection 4.10.2 const

C provides means to decorate types with so-called qualifiers. A qualifier that is commonly used with pointers is const. The container of a variable with a const-qualified type cannot be overwritten (which is enforced by the compiler). For example:

int const x = 2;
x = 4; /* Forbidden! */

For pointers, there are several variants of using const, controlling what exactly cannot be overwritten:

int w = 0;
int x = 1;
int const* p = &x;       /* pointer to a const int */
w  = *p;                 /* allowed */
*p = w;                  /* forbidden */
p  = &w;                 /* allowed, since the pointer is not const */
int * const q = &x;      /* const pointer to an int */
*q = 1;                  /* allowed, as the pointed-to value is not const */
q  = &w;                 /* forbidden, since the pointer is const */
int const *const r = &x; /* const pointer to const int */
*r = 1;                  /* forbidden */
r  = &w;                 /* forbidden */

Listing 4.10.6. Different positions of const in pointer types

Big data structures are usually passed to functions as a pointer. It is common for functions to only read data from such a data structure. Such behavior can be documented by marking the corresponding parameter as a pointer to a const container in the function's prototype. The compiler enforces this behavior by not allowing writes to the corresponding container.

For example, we can now implement a max function that finds the maximal int in an array:

int max(int const* table, unsigned len) {
  assert(len > 0 && "cannot take the maximum of an empty array");
  int max = table[0];
  for (unsigned i = 1; i < len; i++) {
      int v = table[i];
      max = v > max ? v : max;
  }
  return max;
}

With the assertion in the first line, we validate that the function is not called with inputs that the remaining code does not expect. The string connected to the condition on len with the logical “and” operation is meaningless for the assertion, but it is displayed in the output of the program if it terminates because the assertion is violated in an execution.

Exercises 4.10.3 Exercises

1. C Types and Strings.

You can solve this exercise online at https://cdltools.cs.uni-saarland.de/prog2interpreter/chapter/advanced_c?num=2.

Write a C function that computes and prints the Caesar encryption of a string with an offset of \(n\text{.}\) The Caesar encryption replaces each letter of the message with the one \(n\) places later in the alphabet. For letters at the end of the alphabet, the offset wraps around to the beginning of the alphabet.

As arguments, the function obtains \(n\) and the character string to be encrypted. You may assume that the string only contains the ASCII symbols a-z.

What is different from and what is similar to an implementation in assembly?

202507111614

Solution.

#include <stdio.h>

int caesar(char* text, int n) {
  if (n > 25 || n < 0) {
    printf("The value of n needs to be between 0 and 25 (was %i)", n);
    return 1;
  }

  while (*text != '\0') {

    // shift
    *text += n;

    // fix shifting boundary violations
    if (*text > 'z') {
      *text = *text - 'z' + 'a' - 1;
    }

    text++;
  }

  return 0;
}

int main() {
  int shift = 5;
  char text[] = "abcdz";

  printf("Plain text:\n");
  printf("%s\n", text);
  printf("Shift by %i\n", shift);

  int ret = caesar(text, shift);

  if (ret == 0) {
    printf("Cipher:\n");
    printf("%s\n", text);
  }

  return ret;
}

2. Pointers.

What is the output of the following program? Track the pointers and values manually before you run the program.

#include <stdio.h>

int main(int argc, char* argv[])
{
  int *dr, *ds;
  int **drr, **dss;
  int i;
  int aa[5];

  for (i = 0; i < 5; i++) *(aa + i) = -123;

  dr = &aa[3];
  ds = dr - 3;
  drr = &dr;
  dss = &ds;
  *dr = 23;
  *ds = 11;
  *(dr + 1) = 5;
  ds += 1;
  *ds = 66;
  aa[2] = 42;
  printf("%d %d %d %d %d\n", aa[0], aa[1], aa[2], aa[3], aa[4]);

  aa[0] = *(aa+2) - *(aa + 1);
  *(*drr - 1) = 7;
  *(*dss + 3) = 4;
  **dss = **drr + *(*drr - 2);
  printf("%d %d %d %d %d\n", aa[0], aa[1], aa[2], aa[3], aa[4]);

  return 0;
}