Skip to main content

Section 6.3 The Semantics of C0

Subsection 6.3.1 The State

We have already spoken about the structure of a C program in Section 4.3. Here, we are going to define the notion of a state formally. We assume that there is given a finite set of Variables \(\Var\) and a finite set of addresses \(\Addr\text{.}\) We do not assume any further structure on these sets. The set of values is then defined to be \(\Val\defeq\Addr\cup\CInt\text{.}\)

In Section 4.3 we said that a variable is an identifier that is bound to the address of a container. So, one part of the state is the partial map from variables to addresses:

\begin{equation*} \Var\pto\Addr \end{equation*}

We call this map variable assignment. Note that it is a partial map because not at every time in the program's execution there must be an address bound to a variable: For example, a local variable is only bound to the address of the corresponding container during the execution of its scope.

To keep it simple, we only consider integer variables for now. This means that, in contrast to real C, we do not allow address arithmetic. So in every container there is only a single value. Therefore, we can model containers as a partial map from addresses to values:

\begin{equation*} \Addr\pto\Val\cup\{\indetval\} \end{equation*}

We call that map memory assignment. The value \(\indetval\) is the undefined value which is the content of a container before the container is first assigned a value. Note that the memory assignment is also a a partial map. If some address is not in the domain of a memory assignment, this means that there (currently) does not exist a container with that address.

In summary, the state \(\Sigma\) of a C0 program consists of a pair of partial maps:

\begin{equation*} \Sigma\defeq (\Var\pto\Addr)\times(\Addr\pto\Val\cup\{\indetval\}) \end{equation*}

In our formal discussion we typically use the letter \(\rho\) for the variable assignment and the letter \(\mu\) for the memory assignment. To make the writing a bit more compact we will write the pair \((\rho, \mu)\) as \(\rho;\mu\text{.}\) If we do not care about the individual components of a state, we just write it as \(\sigma\text{.}\)

Definition 6.3.1. Abbreviated Notation.

In many examples that we will discuss in the following, the actual address of a container does not really matter. We will therefore use the short-hand notation \(\sigma\,x\defeq (\mu\circ\rho)\,x\) or use the following to specify a state explicitly

\begin{equation*} \{x_1\mapsto v_1,\dots,x_n\mapsto v_n\} \end{equation*}

instead of spelling out the variable and memory assignment explicitly

\begin{equation*} \{x_1\mapsto a_1,\dots,x_n\mapsto a_n\}; \{a_1\mapsto v_1,\dots,a_n\mapsto v_n\} \end{equation*}

where the \(a_i\) are pairwise different addresses.

Subsection 6.3.2 Expressions

Intuitively, it is clear how expressions are evaluated. For example, when looking at the expression \(\CVar x+\CConst 1\text{,}\) we know from our practical experience that in order to evaluate this expression, first the container that \(\CVar x\) is bound to is read and then 1 is being added to that value. So in order to evaluate an expression, we need the abstract syntax of the expression itself and a state to lookup the values of the variables, or more precisely get the contents of the containers.

According to our discussion in Subsection 4.8.2, there are two ways to evaluate an expression. The L-evaluation \(\denotL\cdot\) that evaluates some expressions to addresses of containers and the R-evaluation \(\denotR\cdot\) that yields actual values. In the basic version of C0 that we have discussed so far, the only L-evaluable expression is the identifier (that denotes a variable) because C0 does not have pointers (yet).

Definition 6.3.2. Expression Evaluation in C0.

R-evaluation and L-evaluation are defined by the following two functions

\begin{align*} \denotR\cdot\cdot\amp:\Expr\to\Sigma\pto\Val\\ \denotL\cdot\cdot\amp:\LExpr\to\Sigma\pto\Addr \end{align*}
\begin{equation*} \begin{array}[t]{lcl} \denotL{\CVar x} \rho;\mu \amp = \amp \rho\,\CVar x \\ \denotR c \sigma \amp = \amp c \\ \denotR{e_1\mathbin o e_2} \sigma \amp = \amp \denotR{e_1} \sigma\mathbin o\denotR{e_2} \sigma \\ \denotR{l\,}\rho;\mu \amp = \amp \mu\left(\denotL{l\,}\rho;\mu\right) \\ \end{array} \\ \end{equation*}

Maybe the last row is the most interesting one. R-evaluating an L-evaluable expression is defined to be the L-evaluation which gives an address. This address is then used to get the container value with the memory assignment \(\mu\text{.}\)

Note that both functions, the one for the L-evaluation and the one for the R-evaluation are partial, which means that they are not defined for some combinations of expressions and states. For example, \(\denotR{e/0}\sigma\) is not defined for any expression \(e\) and any state \(\sigma\text{.}\) Similarly, \(\denotL{\CVar x}\) is not defined on any state in which \(\CVar x\) does not have an image.

Subsection 6.3.3 Statements

We define the execution of a C0 statement as a sequence of configurations. Each configuration either marks the end of the execution or can step to another configuration.

Definition 6.3.3. Configuration.

A configuration \(k\) is an element of the set \(K=\{\Lightning\}\cup\Sigma\cup(\Stmt\times\Sigma)\)

So a configuration can be one of three things:

  1. \(\Lightning\) to indicate that the program has aborted.

  2. A state \(\sigma\in\Sigma\) to indicate that the program execution ended in a state \(\sigma\text{.}\)

  3. And a pair \(\config{s}{\sigma}\) that consists of a program \(s\) that remains to be executed one some state \(\sigma\text{.}\)

The semantics of C0 now defines from which configuration we can step into which. This way, the semantics describes which statements still have to be executed and how the state of the program evolves. Formally, we define the semantics by a binary relation \(\steprel\) of configurations. In the literature, this approach is known as small-step operational semantics. \(\steprel\) is defined inductively over the abstract of C0 by a set of inference rules.

Definition 6.3.4. Operational Semantics of C0.

The first rule defines the semantics of the assignment statement. An assignment terminates in a state, if the left-hand side and the right-hand side expressions are defined on the state. The right-hand side expression is then R-evaluated to a value and the left-hand side expression is L-evaluated giving an address. In the resulting state, the contents of the container bound to the address is updated to the value.

\begin{equation*} \begin{prooftree} \AxiomC{$\denotR{e}\sigma=v\in\Val$} \AxiomC{$\denotL{l}\sigma=a\in\Addr$} \LeftLabel{[Assign]} \BinaryInfC{$\config{\CAssign le}{(\rho,\mu)}\steprel (\rho,\mu[a\mapsto v])$} \end{prooftree} \end{equation*}

The next rule steps to the then-case statement in an if if the condition evaluates to true (unequals to 0). Note that it does not step if the condition expression evaluation is not defined on \(\sigma\)

\begin{equation*} \begin{prooftree} \AxiomC{$\denotR{e}\sigma\ne 0$} \LeftLabel{[IfTrue]} \UnaryInfC{$\config{\CIf e{s_1}{s_2}}{\sigma}\steprel\config{s_1}{\sigma}$} \end{prooftree} \end{equation*}

Similar to the rule above but for the false case.

\begin{equation*} \begin{prooftree} \AxiomC{$\denotR{e}\sigma=0$} \LeftLabel{[IfFalse]} \UnaryInfC{$\config{\CIf e{s_1}{s_2}}{\sigma}\steprel\config{s_2}{\sigma}$} \end{prooftree} \end{equation*}

The while rule “rewrites” the program such that the execution of the first iteration corresponds to an if in which the while loop appears recursively.

\begin{equation*} \begin{prooftree} \AxiomC{} \LeftLabel{[While]} \UnaryInfC{$\config{\CWhile es}{\sigma}\steprel\config{\CIf{e}{\CBlock{s\ \CWhile es}}{\CBlock{}}}{\sigma}$} \end{prooftree} \end{equation*}

The empty block terminates in the same state. Essentially, this statement “does nothing”.

\begin{equation*} \begin{prooftree} \AxiomC{} \LeftLabel{[Empty]} \UnaryInfC{$\config{\CBlock{}}{\sigma}\steprel\sigma$} \end{prooftree} \end{equation*}

The [Exec] rule allows us to “strip off” the first statement of a block if that statement terminates. The remaining configuration consists of the “rest program” under the state in which the first statement terminated.

\begin{equation*} \begin{prooftree} \AxiomC{$\config{s_1}{\sigma}\steprel\sigma'$} \LeftLabel{[Exec]} \UnaryInfC{$\config{\CBlock{s_1,s_2,\dots,s_n}}{\sigma}\steprel\config{\CBlock{s_2,\dots,s_n}}{\sigma'}$} \end{prooftree} \end{equation*}

If the first statement in a block cannot terminate in one step, we have to allow it to take intermediate steps and “rewrite” the program accordingly.

\begin{equation*} \begin{prooftree} \AxiomC{$\config{s_1}{\sigma}\steprel\config{s_1'}{\sigma'}$} \LeftLabel{[Subst]} \UnaryInfC{$\config{\CBlock{s_1,\dots,s_n}}{\sigma}\steprel\config{\CBlock{s_1',\dots,s_n}}{\sigma'}$} \end{prooftree} \end{equation*}

Later in this chapter we will discuss program abortion which denotes a kind of improper program termination.

\begin{equation*} \begin{prooftree} \AxiomC{} \LeftLabel{[Abort]} \UnaryInfC{$\config{\CAbort}{\sigma}\steprel\Lightning$} \end{prooftree} \end{equation*}

If a statement aborts, an entire block “crashes”.

\begin{equation*} \begin{prooftree} \AxiomC{$\config{s_1}{\sigma}\steprel\Lightning$} \LeftLabel{[Crash]} \UnaryInfC{$\config{\CBlock{s_1,\dots,s_n}}{\sigma}\steprel\Lightning$} \end{prooftree} \end{equation*}

Based on this semantics we can now formally define some of the terms we have already used colloquially. To this end, let \(s\) to be a C0 program and \(\sigma\) a state.

Definition 6.3.5. Execution Trace.

An execution trace (or trace for short) of \(s\) under \(\sigma\) is an element of the transitive closure of \(\steprel\text{.}\) This means it is a finite sequence

\begin{equation*} \config s{\sigma}=k_1,\dots,k_n \end{equation*}

of configurations such that \(k_i\steprel k_{i+1}\) for all \(1\le i\lt n\)

Definition 6.3.6. Termination.

\(s\) terminates under \(\sigma\) if there is a trace \(c_1,\dots,c_n\) such that there is no configuration \(c'\) with \(c_n\steprel c'\text{.}\) We then write \(\config s\sigma\terminates c_n\text{.}\)

Definition 6.3.7. Divergence.

If there is no configuration under which \(s\) terminates, then we say that \(s\) diverges.

Subsection 6.3.4 Digression: Getting Stuck

The reason why a program may get stuck is because \(\denotL\cdot\) or \(\denotR\cdot\) may not be total and the rules [Assign,IfTrue,IfFalse] only allow progress if the evaluation of their constituent expressions is defined. To have a fourth outcome of program execution next to termination, divergence, abortion is surprising and unpleasant. It is possible to define the semantics of a language in such a way that programs never can get stuck. The simplest way to achieve that is to define some kind of “default” behavior to the cases where our semantics gets stuck. For example, in Java, dividing by 0 causes a specific exception to be thrown.

Another way of preventing getting stuck is to determine getting stuck behavior statically by some form of static code analysis, e.g. a type system 6.6. For example, Java mandates that local variables are never read before they are assigned and each Java implementation has to ensure this by some sort of code analysis.

Programming languages in which well-typed programs (so programs that adhere to all the static rules) cannot get stuck are called type-safe programming languages. C and C0 are not type-safe.

Let us briefly reflect on how we as programmers experience that a program gets stuck and how we can distinguish it from termination, divergence, and abortion. If we execute programs on paper using the rules above 6.3.4, we can easily detect if a program got stuck. In practice however, we typically would use compiler that translates some C/C0 program \(s\) to a machine code program \(\hat s\text{.}\) The C standard mandates that \(\hat s\) shows one of the defined behaviors of \(s\) if it terminates, aborts, or diverges. However, since C is all about performance, it defines situations where \(s\) gets stuck as undefined behavior. This means that there are no constraints on \(\hat s\) as soon as \(s\) gets stuck: The machine program may behave in any possible way. Notoriously this means that we cannot tell if \(s\) got stuck by observing \(\hat s\text{.}\)

Example 6.3.9.

consider the following C0 program and a piece of MIPS code a hypothetical C0 compiler has generated for our program.

if (y == 0)
    x = 42;
while (x != 0)
    if (x == 42)
        abort();
      li   $t0 42
      bnez $a1 L2
      move $a0 $t0
      b    L2
L1:   bne  $a0 $t0 L2
      jal  abort
L2:   bnez $a0 L1

Now let us consider some of the executions of that program on different states. The variable x is only written if y is 0. This means that the C0 program gets stuck when trying to evaluate the loop condition on all starting states in which y is unequal to 0. However, depending on the contents of the registers $a0 (parameter x) and $a1 (parameter y), the machine code program may terminate, abort or diverge.

Subsection 6.3.5 Some Examples

Let us now consider some example programs that we then execute with the C0 semantics that we have just defined.

Example 6.3.10. Swapping.

The following program swaps the contents of two variables.

{ t = x; x = y; y = t; }

We will now derive the execution trace of this program started on the initial state

\begin{align*} \rho \amp \defeq \{\CVar x\mapsto\adra, \CVar y\mapsto\adrb, \CVar t\mapsto\adrc\} \\ \mu \amp \defeq \{\adra\mapsto 1, \adrb\mapsto 2,\adrc\mapsto ?\} \end{align*}

The symbols \(\adra\text{,}\) \(\adrb\text{,}\) \(\adrc\) denote concrete addresses. We use geometric symbols here to emphasize that the address of a container in C/C0 is something different than the address of a memory cell in a computer. Here we, spelled out \(\rho\) and \(\mu\) separately. In the following we will use the short-hand notation from Definition 6.3.1.

We derive the first step of the program using the rules [Exec] and [Assign].

\begin{equation*} \begin{prooftree} \AXC{$\denotL{\CVar t}\rho;\mu=\adrc$} \AXC{$\denotR{\CVar x}\rho;\mu=1$} \LL{[Assign]} \BIC{$\config{\CAssign{\CVar t}{\CVar x}}{\sigma}\steprel\rho;\{\adra\mapsto 1,\adrb\mapsto 2,\adrc\mapsto 1\}\eqdef\sigma'$} \LL{[Exec]} \UIC{$\config{\CBlock{\CAssign{\CVar t}{\CVar x} \CAssign{\CVar x}{\CVar y} \CAssign{\CVar y}{\CVar t}}}{\sigma}\steprel\config{\CBlock{\CAssign{\CVar x}{\CVar y} \CAssign{\CVar y}{\CVar t}}}{\sigma'}$} \end{prooftree} \end{equation*}

The next steps are similar. From now on, we will not derive the individual steps explicitly using the inference rules but just give the configurations in the terminating trace like so:

\begin{align*} \amp\config{\CBlock{\CAssign{\CVar x}{\CVar y}\CAssign{\CVar y}{\CVar t}}}{\rho, \{\adra\mapsto 1, \adrb\mapsto 2, \adrc\mapsto 1\}} \amp \text{[Assign], [Exec]}\\ \steprel\amp\config{\CBlock{\CAssign{\CVar y}{\CVar t}}}{(\rho, \{\adra\mapsto 1, \adrb\mapsto 2, \adrc\mapsto 1\})} \amp \text{[Assign], [Exec]}\\ \steprel\amp\config{\CBlock{}}{(\rho, \{\adra\mapsto 2, \adrb\mapsto 1, \adrc\mapsto 1\})} \amp \text{[Assign], [Exec]} \\ \steprel\amp(\rho, \{\adra\mapsto 2, \adrb\mapsto 1, \adrc\mapsto 1\}) \amp \text{[Empty]} \end{align*}

Example 6.3.11. Minimum.

The following program computes the minimum of \(\CVar x\) and \(\CVar y\) and puts it into variable \(\CVar r\text{:}\)

if (x < y) r = x; else r = y;

Let us run the program on the state \(\{\CVar x\mapsto 1,\CVar y\mapsto 2,\CVar r\mapsto\indetval\}\text{.}\)

\begin{align*} \amp\config{\CIf{\CBinary{\lt}{\CVar x}{\CVar y}}{\CAssign{\CVar r}{\CVar x}}{\CAssign{\CVar r}{\CVar y}}}{\{\CVar x\mapsto 1,\CVar y\mapsto 2,\CVar r\mapsto\indetval\}}\\ \steprel\amp\config{\CAssign{\CVar r}{\CVar x}}{\{\CVar x\mapsto 1,\CVar y\mapsto 2,\CVar r\mapsto\indetval\}}\amp\text{[IfTrue]}\\ \steprel\amp\{\CVar x\mapsto 1,\CVar y\mapsto 2,\CVar r\mapsto 1\}\amp\text{[Assign]} \end{align*}

Example 6.3.12. Division.

The following program computes the quotient \(\CVar q\) and the remainder \(\CVar r\) of the division of a non-negative number \(\CVar x\) by a positive number \(\CVar y\text{.}\)

q = 0;
r = x;
while (y <= r) {
    r = r - y;
    q = q + 1;
}

Let us run the program on the state \(\{\CVar x\mapsto 5,\CVar y\mapsto 2,\CVar q\mapsto\indetval,\CVar r\mapsto\indetval\}\text{.}\) To improve the readability, we abbreviate the loop body with \(S\text{.}\)

\begin{equation*} \small \begin{array}{llll} \amp \langle\CBlock{\CAssign{\CVar q}{\CConst 0} \CAssign{\CVar r}{\CVar x} \CWhile{\CBinary{\le}{\CVar y}{\CVar r}}S} \amp| {\{\CVar q\mapsto ?, \CVar r\mapsto ?, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \\ \steprel \amp \langle\CBlock{ \CAssign{\CVar r}{\CVar x} \CWhile{\CBinary{\le}{\CVar y}{\CVar r}}S} \amp| {\{\CVar q\mapsto 0, \CVar r\mapsto ?, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[Assign], [Exec]} \\ \steprel \amp \langle\CBlock{ \CWhile{\CBinary{\le}{\CVar y}{\CVar r}}S} \amp| {\{\CVar q\mapsto 0, \CVar r\mapsto 5, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[Assign], [Exec]} \\ \steprel \amp \langle\CBlock{\CIf{\CBinary{\le}{\CVar y}{\CVar r}}{\CBlock{S; \CWhile{\CBinary{\le}{\CVar y}{\CVar r}}S}}{\CBlock{}}} \amp| {\{\CVar q\mapsto 0, \CVar r\mapsto 5, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[While], [Subst]}\\ \steprel \amp \langle\CBlock{\CBlock{\CBlock{\CAssign{\CVar r}{\CBinary{-}{\CVar r}{\CVar y}} \CAssign{\CVar q}{\CBinary{+}{\CVar q}{\CConst 1}}} \CWhile{\CBinary{\le}{\CVar y}{\CVar r}}{S}}} \amp| {\{\CVar q\mapsto 0, \CVar r\mapsto 5, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[IfTrue], [Subst]}\\ \steprel \amp \langle\CBlock{\CBlock{\CBlock{ \CAssign{\CVar q}{\CBinary{+}{\CVar q}{\CConst 1}}}\CWhile{\CBinary{\le}{\CVar y}{\CVar r}}{S}}} \amp| {\{\CVar q\mapsto 0, \CVar r\mapsto 3, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[Assign], [Exec], 2x[Subst]}\\ \steprel \amp \langle\CBlock{\CBlock{\CBlock{} \CWhile{\CBinary{\le}{\CVar y}{\CVar r}}{S}}} \amp| {\{\CVar q\mapsto 1, \CVar r\mapsto 3, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[Assign], [Exec], 2x[Subst]}\\ \steprel \amp \langle\CBlock{\CBlock{\CIf{\CBinary{\le}{\CVar y}{\CVar r}}{\CBlock{S; \CWhile{\CBinary{\le}{\CVar y}{\CVar r}}S}}{\CBlock{}}}} \amp| {\{\CVar q\mapsto 1, \CVar r\mapsto 3, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[While], 2x[Subst], [Empty], [Exec], [Subst]}\\ \steprel \amp \langle\CBlock{\CBlock{\CBlock{\CBlock{\CAssign{\CVar r}{\CBinary{-}{\CVar r}{\CVar y}} \CAssign{\CVar q}{\CBinary{+}{\CVar q}{\CConst 1}}} \CWhile{\CBinary{\le}{\CVar y}{\CVar r}}{S}}}} \amp| {\{\CVar q\mapsto 1, \CVar r\mapsto 3, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[IfTrue], 2x[Subst]}\\ \steprel \amp \langle\CBlock{\CBlock{\CBlock{\CBlock{ \CAssign{\CVar q}{\CBinary{+}{\CVar q}{\CConst 1}}} \CWhile{\CBinary{\le}{\CVar y}{\CVar r}}{S}}}} \amp| {\{\CVar q\mapsto 1, \CVar r\mapsto 1, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[Assign], [Exec], 3x[Subst]}\\ \steprel \amp \langle\CBlock{\CBlock{\CBlock{\CBlock{} \CWhile{\CBinary{\le}{\CVar y}{\CVar r}}{S}}}} \amp| {\{\CVar q\mapsto 2, \CVar r\mapsto 1, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[Assign], [Exec], 3x[Subst]}\\ \steprel \amp \langle\CBlock{\CBlock{\CBlock{\CIf{\CBinary{\le}{\CVar y}{\CVar r}}{\CBlock{S; \CWhile{\CBinary{\le}{\CVar y}{\CVar r}}S}}{\CBlock{}}}}} \amp| {\{\CVar q\mapsto 2, \CVar r\mapsto 1, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[While], 3x[Subst], [Empty], [Exec], [Subst]}\\ \steprel \amp \langle\CBlock{\CBlock{\CBlock{\CBlock{}}}} \amp| {\{\CVar q\mapsto 2, \CVar r\mapsto 1, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[IfFalse], 3x[Subst]}\\ \steprel \amp \langle\CBlock{\CBlock{\CBlock{}}} \amp| {\{\CVar q\mapsto 2, \CVar r\mapsto 1, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[Empty], [Exec], 2x[Subst]}\\ \steprel \amp \langle\CBlock{\CBlock{}} \amp| {\{\CVar q\mapsto 2, \CVar r\mapsto 1, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[Empty], [Exec], [Subst]}\\ \steprel \amp \langle\CBlock{} \amp| {\{\CVar q\mapsto 2, \CVar r\mapsto 1, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[Empty], [Exec]}\\ \steprel \amp {\{\CVar q\mapsto 2, \CVar r\mapsto 1, \CVar x\mapsto 5, \CVar y\mapsto 2\}} \amp \amp \ \text{[Empty]}\\ \end{array} \end{equation*}