The Semantics of C0

Section 6.3 The Semantics of C0

Subsection 6.3.1 The State

We have already spoken about the structure of a C program in Section 4.3. Here, we are going to define the notion of a state formally. We assume that there is given a finite set of Variables $\Var$ and a finite set of addresses $\Addr\text{.}$ We do not assume any further structure on these sets. The set of values is then defined to be $\Val\defeq\Addr\cup\CInt\text{.}$

In Section 4.3, we said that a variable is an identifier that is bound to the address of a container. So, one part of the state is the partial map from variables to addresses:

\begin{equation*} \Var\pto\Addr \end{equation*}

We call this map variable assignment. Note that it is a partial map because not at every time in the program's execution there must be an address bound to a variable: For example, a local variable is only bound to the address of the corresponding container during the execution of its scope.

To keep it simple, we only consider integer variables for now. This means that, in contrast to real C, we do not allow address arithmetic. So in every container there is only a single value. Therefore, we can model containers as a partial map from addresses to values:

\begin{equation*} \Addr\pto\Val\cup\{\indetval\} \end{equation*}

We call that map memory assignment. The value $\indetval$ is the undefined value which is the content of a container before the container is first assigned a value. Note that the memory assignment is also a a partial map. If some address is not in the domain of a memory assignment, this means that there (currently) does not exist a container with that address.

In summary, the state $\Sigma$ of a C0 program consists of a pair of partial maps:

\begin{equation*} \Sigma\defeq (\Var\pto\Addr)\times(\Addr\pto\Val\cup\{\indetval\}) \end{equation*}

In our formal discussion we typically use the letter $\rho$ for the variable assignment and the letter $\mu$ for the memory assignment. To make the writing a bit more compact we will write the pair $(\rho, \mu)$ as $\rho;\mu\text{.}$ If we do not care about the individual components of a state, we just write it as $\sigma\text{.}$

Definition 6.3.1. Abbreviated Notation.

In many examples that we will discuss in the following, the actual address of a container does not really matter. We will therefore use the short-hand notation $\sigma\,x\defeq (\mu\circ\rho)\,x$ or use the following to specify a state explicitly

\begin{equation*} \{x_1\mapsto v_1,\dots,x_n\mapsto v_n\} \end{equation*}

instead of spelling out the variable and memory assignment explicitly

\begin{equation*} \{x_1\mapsto a_1,\dots,x_n\mapsto a_n\}; \{a_1\mapsto v_1,\dots,a_n\mapsto v_n\} \end{equation*}

where the $a_i$ are pairwise different addresses.

Subsection 6.3.2 Expressions

Intuitively, it is clear how expressions are evaluated. For example, when looking at the expression $\CVar x+\CConst 1\text{,}$ we know from our practical experience that in order to evaluate this expression, first the container that $\CVar x$ is bound to is read and then 1 is being added to that value. So in order to evaluate an expression, we need the abstract syntax of the expression itself and a state to lookup the values of the variables, or more precisely get the contents of the containers.

According to our discussion in Subsection 4.8.2, there are two ways to evaluate an expression. The L-evaluation $\denotL\cdot$ that evaluates some expressions to addresses of containers and the R-evaluation $\denotR\cdot$ that yields actual values. In the basic version of C0 that we have discussed so far, the only L-evaluable expression is the identifier (that denotes a variable) because C0 does not have pointers (yet).

Definition 6.3.2. Expression Evaluation in C0.

R-evaluation and L-evaluation are defined by the following two functions

\begin{align*} \denotR\cdot\cdot\amp:\Expr\to\Sigma\pto\Val\\ \denotL\cdot\cdot\amp:\LExpr\to\Sigma\pto\Addr \end{align*}

\begin{equation*} \begin{array}[t]{lcl} \denotL{\CVar x} \rho;\mu \amp = \amp \rho\,\CVar x \\ \denotR c \sigma \amp = \amp c \\ \denotR{e_1\mathbin o e_2} \sigma \amp = \amp \denotR{e_1} \sigma\mathbin o\denotR{e_2} \sigma \\ \denotR{l\,}\rho;\mu \amp = \amp \mu\left(\denotL{l\,}\rho;\mu\right) \\ \end{array} \\ \end{equation*}

Maybe the last row is the most interesting one. R-evaluating an L-evaluable expression is defined via the L-evaluation, which gives an address. This address is then used to get the container value with the memory assignment $\mu\text{.}$

Note that both functions, the one for the L-evaluation and the one for the R-evaluation are partial, which means that they are not defined for some combinations of expressions and states. For example, $\denotR{e/0}\sigma$ is not defined for any expression $e$ and any state $\sigma\text{.}$ Similarly, $\denotL{\CVar x}$ is not defined on any state in which $\CVar x$ does not have an image.

Subsection 6.3.3 Statements

We define the execution of a C0 statement as a sequence of configurations. Configurations capture the current state of the execution that consists of a remainder program still to be executed and the current state.

Definition 6.3.3. Configuration.

A configuration $c\in\Conf\defeq\Prg\times\Sigma$ is a pair of a program and a state.

The semantics of C0 now determines from which configuration we can step into which. This way, the semantics describes the flow of the program and how the state of the program evolves. Formally, we define the semantics by a binary relation $\steprel$ among configurations. In the literature, this approach is known as small-step operational semantics.

There are also big-step semantics which relate inputs to outputs and abstract from the program steps that lead from an input to an output. Therefore, big-step semantics cannot model programs that diverge, i.e. do not terminate because they inherently do not result in a final state. Small-step semantics however capture the detailed execution of each statement in “small” steps. In fact, Definition 6.3.5 can be seen as a big-step semantics of C0 defined on top of a small-step semantics.

Definition 6.3.4. Operational Semantics of C0.

We define the small-step semantics $\mathrel{\steprel}\mathrel{\subseteq}\Conf\times\Conf$ of C0 by the following rules:

\begin{align*} \config{\CSeq{\CAssign le}p}{(\rho,\mu)} \amp\steprel\config p{(\rho,\mu[a\mapsto v])} \amp\amp\text{if }\denotR{e}\sigma=v\in\Val \amp\text{[Assign]}\\ \amp \amp\amp\text{and }\denotL{l}\sigma=a\in\Addr\\ \config{\CSeq{\CIf e{s_1}{s_2}}p}\sigma \amp\steprel\config{\CSeq{s_1}{p}}{\sigma} \amp\amp\text{if } \denotR{e}\sigma\ne 0 \amp\text{[IfTrue]}\\ \config{\CSeq{\CIf e{s_1}{s_2}}p}\sigma \amp\steprel\config{\CSeq{s_2}{p}}{\sigma} \amp\amp\text{if } \denotR{e}\sigma=0 \amp\text{[IfFalse]}\\ \config{\CSeq{\CWhile es}p}\sigma \amp\steprel\config{\CSeq{s}{\CSeq{\CWhile es}{p}}}{\sigma} \amp\amp\text{if } \denotR{e}\sigma\ne 0 \amp\text{[WhileTrue]}\\ \config{\CSeq{\CWhile es}p}\sigma \amp\steprel\config{p}{\sigma} \amp\amp\text{if } \denotR{e}\sigma=0 \amp\text{[WhileFalse]}\\ \config{\CSeq{\CBlock{p_1}}p_2}\sigma \amp\steprel\config{\CConcat{p_1}{p_2}}{\sigma}\amp\amp \amp\text{[Block]}\\ \config{\CSeq{\CAbort}p}\sigma \amp\steprel\config{\ACrash}{\sigma} \amp\amp \amp\text{[Crash]} \end{align*}

The rule [Assign] defines the semantics of the assignment statement. The right-hand side expression is R-evaluated to a value and the left-hand side expression is L-evaluated giving an address. In the resulting state, the content of the container bound to the address is updated to the value. Note that [Assign] only relates two configurations if the R-evaluation or the L-evaluation are actually defined, which may not necessarily be the case.

The next two rules [IfTrue] and [IfFalse] handle the if-then-else statement which either steps to $s_1$ or $s_2$ depending on how the conditional expression evaluates. Note again that if the R-evaluation of $e$ is not defined, both rules do not relate any two configurations.

The two while rules [WhileTrue] and [WhileFalse] are similar to the if rules. [WhileTrue] however appends the while loop to the body to “implement” the iterative behavior of while.

The block rule concatenates programs $p_1$ and $p_2$ which is denoted by the $\CConcat{}{}$ operator. $\CConcat{}{}$ is defined recursively in the following way:

\begin{equation*} \CConcat\CTerm p\defeq p\qquad \CConcat{(\CSeq sp)}{p'}\defeq \CSeq{s}{(\CConcat p{p'})} \end{equation*}

Based on this semantics we can now formally define some of the terms we have already used colloquially. To this end, let $p$ to be a C0 program and $\sigma$ a state.

Definition 6.3.5. Execution Trace.

An execution trace (or trace for short) of the program $p$ under $\sigma$ is a finite sequence of configurations

\begin{equation*} \config p{\sigma}=c_1,\quad\dots,\quad c_n \end{equation*}

such that $c_i\steprel c_{i+1}$ for all $1\le i\lt n$

Definition 6.3.6. Termination and Divergence.

$p$ terminates under $\sigma$ if there is a trace $c_1,\dots,c_n$ such that there is no configuration $c'$ with $c_n\steprel c'\text{.}$ We then write $\config p\sigma\terminates c_n\text{.}$ If $\config p\sigma$ does not terminate, we say it diverges.

Remark 6.3.7. Kinds of Termination.

Based on the definition of program syntax 6.2.1, there are three kinds of how a program $p$ can terminate:

Proper termination: $\config p\sigma\terminates\configt{\sigma'}$

The program finished execution properly and leaves behind a final state. There is no remainder program left to be executed.
Abort: $\config p\sigma\terminates\config{\ACrash}{\sigma'}$

The program ended in an abort statement which terminates the execution immediately with the “crash” program.
Getting stuck: $\config p\sigma\terminates\config{\CSeq{s}{p'}}{\sigma'}$

In this case the program cannot make another step although there still is a rest of the program to be executed. this may for instance occur if the assignment statement cannot step because the evaluation of one of its constituent expressions is not defined on the particular state. This corresponds to situations where C exhibits undefined behavior.

Subsection 6.3.4 Digression: Getting Stuck

The reason why a program may get stuck is because $\denotL\cdot$ or $\denotR\cdot$ may not be total and the rules [Assign,IfTrue,IfFalse,WhileTrue,WhileFalse] only allow progress if the evaluation of their constituent expressions is defined. To have a fourth outcome of program execution next to termination, divergence, abort is surprising and unpleasant. It is possible to define the semantics of a language in such a way that programs can never get stuck. The simplest way to achieve that is to define some kind of “default” behavior to the cases where our semantics gets stuck. For example, in Java, dividing by 0 causes a specific exception to be thrown.

Another way of preventing getting stuck is to determine getting-stuck behavior statically by some form of static code analysis, e.g. a type system 6.6. For example, Java mandates that local variables are never read before they are assigned and each Java implementation has to ensure this by some sort of code analysis.

Programming languages in which well-typed programs (so programs that adhere to all the static rules) cannot get stuck are called type-safe programming languages. C and C0 are not type-safe.

Let us briefly reflect on how we as programmers experience that a program gets stuck and how we can distinguish it from termination, divergence, and abort. If we execute programs on paper using the rules above 6.3.4, we can easily detect if a program got stuck. In practice however, we typically would use a compiler that translates some C/C0 program $s$ to a machine code program $\hat s\text{.}$ The C standard mandates that $\hat p$ shows one of the defined behaviors of $p$ if it terminates, aborts, or diverges. However, since C is all about performance, it defines situations where $p$ gets stuck as undefined behavior. This means that there are no constraints on $\hat p$ as soon as $p$ gets stuck: The machine program may behave in any possible way. Notoriously this means that we cannot tell if $p$ got stuck by observing $\hat p\text{.}$

Example 6.3.8.

consider the following C0 program and a piece of RISC-V code a hypothetical C0 compiler has generated for our program.

if (y == 0)
    x = 42;
while (x != 0)
    if (x == 42)
        abort();

      li   t0, 42
      bnez a1, L2
      mv   a0, t0
      b    L2
L1:   bne  a0, t0, L2
      call abort
L2:   bnez a0 L1

Now let us consider some of the executions of that program on different states. The variable x is only written if y is 0. This means that the C0 program gets stuck when trying to evaluate the loop condition on all starting states in which y is unequal to 0. However, depending on the contents of the registers $a0 (parameter x) and $a1 (parameter y), the machine code program may terminate, abort or diverge.

Subsection 6.3.5 Some Examples

Let us now consider some example programs that we then execute with the C0 semantics that we have just defined.

Example 6.3.9. Swapping.

The following program swaps the contents of two variables.

t = x; x = y; y = t;

We will now derive the execution trace of this program started on the initial state

\begin{align*} \rho \amp \defeq \{\CVar x\mapsto\adra, \CVar y\mapsto\adrb, \CVar t\mapsto\adrc\} \\ \mu \amp \defeq \{\adra\mapsto 1, \adrb\mapsto 2,\adrc\mapsto ?\} \end{align*}

The symbols $\adra\text{,}$ $\adrb\text{,}$ $\adrc$ denote concrete addresses. We use geometric symbols here to emphasize that the address of a container in C/C0 is something different than the address of a memory cell in a computer.

\begin{align*} \amp\config{\CSeqThree{\CAssign{\CVar t}{\CVar x}}{\CAssign{\CVar x}{\CVar y}}{\CAssign{\CVar y}{\CVar t}}} {(\rho, \{\adra\mapsto 1, \adrb\mapsto 2, \adrc\mapsto \indetval\})} \amp \text{}\\ \steprel\amp\config{\CSeq {\CAssign{\CVar x}{\CVar y}}{\CAssign{\CVar y}{\CVar t}}} {(\rho, \{\adra\mapsto 1, \adrb\mapsto 2, \adrc\mapsto 1 \})} \amp \text{[Assign]}\\ \steprel\amp\config{ {\CAssign{\CVar y}{\CVar t}}} {(\rho, \{\adra\mapsto 2, \adrb\mapsto 2, \adrc\mapsto 1 \})} \amp \text{[Assign]}\\ \steprel\amp\config{\CTerm}{(\rho, \{\adra\mapsto 2, \adrb\mapsto 1, \adrc\mapsto 1\})} \amp \text{[Assign]} \end{align*}

In the last example, we spelled out $\rho$ and $\mu$ individually. In the following we will use the short-hand notation from Definition 6.3.1.

Example 6.3.10. Minimum.

The following program computes the minimum of $\CVar x$ and $\CVar y$ and puts it into variable $\CVar r\text{:}$

if (x < y) r = x; else r = y;

Let us run the program on the state $\{\CVar x\mapsto 1,\CVar y\mapsto 2,\CVar r\mapsto\indetval\}\text{.}$

\begin{align*} \amp\config{\CIf{\CBinary{\lt}{\CVar x}{\CVar y}}{\CAssign{\CVar r}{\CVar x}}{\CAssign{\CVar r}{\CVar y}}}{\{\CVar x\mapsto 1,\CVar y\mapsto 2,\CVar r\mapsto\indetval\}}\\ \steprel\amp\config{\CAssign{\CVar r}{\CVar x}}{\{\CVar x\mapsto 1,\CVar y\mapsto 2,\CVar r\mapsto\indetval\}}\amp\text{[IfTrue]}\\ \steprel\amp\config{\CTerm}{\{\CVar x\mapsto 1,\CVar y\mapsto 2,\CVar r\mapsto 1\}}\amp\text{[Assign]} \end{align*}

Example 6.3.11. Division.

The following program computes the quotient $\CVar q$ and the remainder $\CVar r$ of the division of a non-negative number $\CVar x$ by a positive number $\CVar y\text{.}$

q = 0;
r = x;
while (y <= r) {
    r = r - y;
    q = q + 1;
}

Let us run the program on the state $\{\CVar x\mapsto 5,\CVar y\mapsto 2,\CVar q\mapsto\indetval,\CVar r\mapsto\indetval\}\text{.}$ To improve the readability, we abbreviate the loop body with $S\text{.}$

\begin{equation*} \begin{array}{llll} \amp \langle \CSeqThree{\CAssign{\CVar q}{\CConst 0}} {\CAssign{\CVar r}{\CVar x}} {\CWhile{\CBinary{\le}{\CVar y}{\CVar r}}S} \amp| {\{\CVar q\mapsto ?, \CVar r\mapsto ?, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \\ \steprel \amp \langle \CSeq {\CAssign{\CVar r}{\CVar x}} {\CWhile{\CBinary{\le}{\CVar y}{\CVar r}}S} \amp| {\{\CVar q\mapsto 0, \CVar r\mapsto ?, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[Assign]} \\ \steprel \amp \langle {\CWhile{\CBinary{\le}{\CVar y}{\CVar r}}S} \amp| {\{\CVar q\mapsto 0, \CVar r\mapsto 5, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[Assign]} \\ \steprel \amp \langle\CSeq{\CBlock{\CSeq{\CAssign{\CVar r}{\CBinary{-}{\CVar r}{\CVar y}}}{\CAssign{\CVar q}{\CBinary{+}{\CVar q}{\CConst 1}}}}} {\CWhile{\CBinary{\le}{\CVar y}{\CVar r}}{S}} \amp| {\{\CVar q\mapsto 0, \CVar r\mapsto 5, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[WhileTrue]}\\ \steprel \amp \langle \CSeqThree{\CAssign{\CVar r}{\CBinary{-}{\CVar r}{\CVar y}}} {\CAssign{\CVar q}{\CBinary{+}{\CVar q}{\CConst 1}}} {\CWhile{\CBinary{\le}{\CVar y}{\CVar r}}{S}} \amp| {\{\CVar q\mapsto 0, \CVar r\mapsto 5, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[Block]}\\ \steprel \amp \langle \CSeq {\CAssign{\CVar q}{\CBinary{+}{\CVar q}{\CConst 1}}} {\CWhile{\CBinary{\le}{\CVar y}{\CVar r}}{S}} \amp| {\{\CVar q\mapsto 0, \CVar r\mapsto 3, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[Assign]}\\ \steprel \amp \langle {\CWhile{\CBinary{\le}{\CVar y}{\CVar r}}{S}} \amp| {\{\CVar q\mapsto 1, \CVar r\mapsto 3, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[Assign]}\\ \steprel \amp \langle\CSeq{\CBlock{\CSeq{\CAssign{\CVar r}{\CBinary{-}{\CVar r}{\CVar y}}}{\CAssign{\CVar q}{\CBinary{+}{\CVar q}{\CConst 1}}}}} {\CWhile{\CBinary{\le}{\CVar y}{\CVar r}}{S}} \amp| {\{\CVar q\mapsto 1, \CVar r\mapsto 3, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[WhileTrue]}\\ \steprel \amp \langle \CSeqThree{\CAssign{\CVar r}{\CBinary{-}{\CVar r}{\CVar y}}} {\CAssign{\CVar q}{\CBinary{+}{\CVar q}{\CConst 1}}} {\CWhile{\CBinary{\le}{\CVar y}{\CVar r}}{S}} \amp| {\{\CVar q\mapsto 1, \CVar r\mapsto 3, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[Block]}\\ \steprel \amp \langle \CSeq {\CAssign{\CVar q}{\CBinary{+}{\CVar q}{\CConst 1}}} {\CWhile{\CBinary{\le}{\CVar y}{\CVar r}}{S}} \amp| {\{\CVar q\mapsto 1, \CVar r\mapsto 1, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[Assign]}\\ \steprel \amp \langle {\CWhile{\CBinary{\le}{\CVar y}{\CVar r}}{S}} \amp| {\{\CVar q\mapsto 2, \CVar r\mapsto 1, \CVar x\mapsto 5, \CVar y\mapsto 2\}}\rangle \amp \ \text{[Assign]}\\ \steprel \amp \config{\CTerm}{\{\CVar q\mapsto 2, \CVar r\mapsto 1, \CVar x\mapsto 5, \CVar y\mapsto 2\}} \amp \amp \ \text{[WhileFalse]}\\ \end{array} \end{equation*}

An Introduction to Imperative Programming

Search Results: