Skip to main content

Section 6.6 A Simple Type System (C0t)

The set of C0 programs that are defined by the abstract syntax in Section 6.2 contains programs that get stuck for some input. For example, the following program is a syntactically-correct C0 program. (Which means that the sequence of characters in the program text comply with the concrete syntax of C0 and therefore we can attribute a unique abstract syntax tree with this text.) However, the execution of the program, as defined by the C0 semantics directly leads to a configuration in which the program gets stuck.

  int x;
  x = 666;
  *x = 42;

The reason is that the contents of x's container is not an address but an integer. For some programs, such as the program above, we can detect and prevent their getting stuck statically. We do this by means of a static semantics. This semantics gives “meaning” to the abstract syntax of a C0 program, hence the name semantics. This meaning is however not dependent on some input but independent of any input, hence it is static. Types of variables and expressions are one example for such a static property.

From the information that variable \(\CVar x\) has type \(\CInt\text{,}\) we want to deduce that for every program execution \(\CVar x\) contains an int and never, for example, an address. Hence, \(\CIndir{\CVar x}\) will always get stuck if \(\CVar x\) has type \(\CInt\text{.}\) In the following, we will formulate type rules that characterize programs for which such errors do not occur. We will call these programs then well-typed.

Since C0 is not a statically type-safe language, there will be programs that are well-typed but nevertheless get stuck, for example:

int x;
x = x + 1
Listing 6.6.1. \(\CVar x\) is read from but not written to before.
int x;
int *p;
{ int y; p = &y; }
x = *p;
Listing 6.6.2. After the inner block has been left, the pointer \(\CVar p\) is dangling (i.e. contains an invalid address) because the container of \(\CVar y\) has been freed.
int x = 666;
x = x / 0;
Listing 6.6.3. Division by zero also is undefined behavior in C.

This means that in C0 and C there are getting-stuck situations that are not ruled out by the static semantics. Languages where well-typed programs never get stuck are called statically type-safe 13 

The static semantics of C0 is defined by relations that relate the abstract syntax of C0 with a type environment. The type environment maps, based on variable declarations, identifiers to types. These relations are defined inductively for each language element of C0 which means that we can elaborate the typing relation for an expression based on the typing relations of its sub-expressions.

Before defining the typing relations for statements and expressions, we need to add types to the abstract syntax of C0.

Definition 6.6.4.

\begin{equation*} \begin{array}{r@{\,}c@{\,}lcll} \mathit{ITy} \amp \ni \amp i \amp \syndef \amp \CChar\mid \CInt \amp \text{integer type} \\ \mathit{PTy} \amp \ni \amp p \amp \syndef \amp \CIndir t \amp \text{pointer type} \\ \mathit{STy} \amp \ni \amp k \amp \syndef \amp p\mid i \amp \text{scalar type} \\ \mathit{ Ty} \amp \ni \amp t \amp \syndef \amp k\mid \CVoid \amp \text{type} \end{array} \end{equation*}

C supports implicit type conversion which means that at some places values of some type can be automatically (without further annotations by the programmer) converted to a value of a different type. For example, a \(\CIndir\CVoid\) can be implicitly converted into a \(\CIndir\CInt\) We model this here by the relation \(\castrel\text{.}\)

Definition 6.6.5. Implicit Type Conversion.

\begin{equation*} \begin{prooftree} \AxiomC{} \UnaryInfC{$i_1\castrel i_2$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \UnaryInfC{$\CPtr t\castrel\CPtr t$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \UnaryInfC{$\CPtr t\castrel\CPtr\CVoid$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \UnaryInfC{$\CPtr\CVoid\castrel\CPtr t$} \end{prooftree} \end{equation*}

Subsection 6.6.1 Expressions

The static semantics of the C0 expression language is defined by a relation

\begin{equation*} \mathit{ExprS}\subseteq(\Var\pto\Ty)\times\Expr\times\Ty \end{equation*}

which we will inductively define over the syntax of C0. It is standard to use the following notation to indicate that a triple of type environment, expression, and type is in \(\ExprS\text{:}\)

\begin{equation*} \typeGamma et \quad:\Longleftrightarrow\quad(\Gamma,e,t)\in\mathit{ExprS} \end{equation*}

So, \(\typeGamma et\) says that expression \(e\) has type \(t\) under type environment \(\Gamma\text{.}\)

Definition 6.6.6. Static Semantics of the C0 Expression Language.

\begin{equation*} \begin{prooftree} \AxiomC{$\Gamma\,x=k$} \LeftLabel{[Var]} \UnaryInfC{$\typeGamma xk$} \end{prooftree} \quad \begin{prooftree} \AxiomC{$-N\le c\lt N$} \AxiomC{$N=2^{31}$} \LeftLabel{[Const]} \BinaryInfC{$\typeGamma c{\CInt}$} \end{prooftree} \end{equation*}
\begin{equation*} \begin{prooftree} \AxiomC{$\typeGamma{e_1}{i_1}$} \AxiomC{$\typeGamma{e_2}{i_2}$} \LeftLabel{[Arith]} \BinaryInfC{$\typeGamma {e_1\mathrel{r}e_2}{\CInt}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{$\typeGamma{e_1}{k_1}$} \AxiomC{$\typeGamma{e_2}{k_2}$} \AxiomC{$k_1\castrel k_2$} \LeftLabel{[Cmp]} \TrinaryInfC{$\typeGamma {e_1\mathrel{m}e_2}{\CInt}$} \end{prooftree} \end{equation*}
\begin{equation*} \begin{prooftree} \AxiomC{$\typeGamma{e_1}{\CPtr k}$} \AxiomC{$\typeGamma{e_2}{i}$} \LeftLabel{[PtrArith]} \BinaryInfC{$\typeGamma {e_1+e_2}{\CPtr k}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{$\typeGamma{e_1}{\CPtr k}$} \AxiomC{$\typeGamma{e_2}{\CPtr k}$} \AxiomC{$k_1\castrel k_2$} \LeftLabel{[PtrDiff]} \TrinaryInfC{$\typeGamma {e_1-e_2}{\CInt}$} \end{prooftree} \end{equation*}
\begin{equation*} \begin{prooftree} \AxiomC{$\typeGamma{e}{\CPtr t}$} \LeftLabel{[PtrCmp]} \UnaryInfC{$\typeGamma {e\mathrel{\mathtt{==}}0}{\CInt}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{$\typeGamma{e}{\CPtr t}$} \LeftLabel{[PtrCmpN]} \UnaryInfC{$\typeGamma {e\mathrel{\mathtt{!=}}0}{\CInt}$} \end{prooftree} \end{equation*}
\begin{equation*} \begin{prooftree} \AxiomC{$\typeGamma{e}{\CPtr k}$} \LeftLabel{[Indir]} \UnaryInfC{$\typeGamma{\CIndir e}{k}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{$\typeGamma{l}{t}$} \LeftLabel{[Addr]} \UnaryInfC{$\typeGamma {\CAddrOf l}{\CPtr t}$} \end{prooftree} \end{equation*}

The rule [Var] determines the type of the expression ‘occurrence of a variable’ by looking up the type of the variable in the type environment. Constants are always int. Binary arithmetic expressions also always have the type int irrespective of the operand types. Arithmetic is only allowed for integer types. Pointer arithmetic is handled by two specific rules: [PtrArith] to add an offset to a pointer and [PtrDiff] to subtract two pointers. Comparisons also are int (executing them yields either 0 or 1). Pointers can also be compared. However, their operands have to be implicitly convertible to each other: We can compare a \(\CPtr\CVoid\) with a \(\CPtr\CInt\) but not a \(\CPtr\CChar\) with a \(\CPtr\CInt\text{.}\) As a special case, pointers can be compared against 0 ([PtrCmp] and [PtrCmpN]). Additionally, [PtrArith], [PtrDiff], [PtrCmp], [Cmp], and [PtrCmpN] also need variants to handle commutativity which we omit here for the sake of brevity.

Note however that comparing two pointers or subtracting two pointers that don't point to the same object is undefined behavior and will cause the program to get stuck. These cases cannot be caught by our type system.

Subsection 6.6.2 Statements

Similar to expressions we also use a relation to indicate if a statement is well-typed. Now, statements do not have a type themselves that we can associate with them. The well-typedness relation for statements therefore merely captures if the expressions that are contained in the statements are well-typed. Therefore, our relation is only binary and not ternary as the one for expressions:

\begin{equation*} \mathit{StmtS}\subseteq(\Var\pto\Ty)\times\Stmt \end{equation*}

and again for better readability and to comply to the standard convention, we use the shorthand notation:

\begin{equation*} \stmtGamma s \quad:\Longleftrightarrow\quad(\Gamma,s)\in\mathit{StmtS} \end{equation*}

Definition 6.6.7. Static Semantics of the C0 Statement Language.

\begin{equation*} \begin{prooftree} \AxiomC{$\typeGamma e{k_1}$} \AxiomC{$\typeGamma l{k_2}$} \AxiomC{$k_1\castrel k_2$} \LeftLabel{[Assign]} \TrinaryInfC{$\stmtGamma{\CAssign le}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \LeftLabel{[Abort]} \UnaryInfC{$\stmtGamma{\CAbort}$} \end{prooftree} \end{equation*}
\begin{equation*} \begin{prooftree} \AxiomC{$\typeGamma ek$} \AxiomC{$\stmtGamma s$} \LeftLabel{[While]} \BinaryInfC{$\stmtGamma{\CWhile es}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{$\typeGamma ek$} \AxiomC{$\stmtGamma{s_1}$} \AxiomC{$\stmtGamma{s_2}$} \LeftLabel{[If]} \TrinaryInfC{$\stmtGamma{\CIf e{s_1}{s_2}}$} \end{prooftree} \end{equation*}
\begin{equation*} \begin{prooftree} \AxiomC{$\Gamma'=\Gamma[x_1\mapsto k_1,\dots,x_m\mapsto k_m]$} \noLine \UnaryInfC{$\stmtjudge{\Gamma'}{s_1}\quad\cdots\quad\stmtjudge{\Gamma'}{s_n}$} \LeftLabel{[Block]} \UnaryInfC{$\stmtGamma{\CBlock{k_1\,x_1;\dots k_m\,x_m; s}}$} \end{prooftree} \end{equation*}

The rules [While] and [If] make sure that the condition expression they contain have a scalar type (i.e. are not \(\CVoid\)). The rule [Assign] makes sure that the right-hand side type can be converted to the left-hand side type. This rules out a program like the following:

{ int *p; p = 5; }

Most notably however is the rule [Block] which administers the type environment: In order for a block to be well-typed, its constituent statements most be well-typed under the type environment that additionally contains the local variables declared in the block. Note that this also models variable hiding: Variables declared in inner blocks hide the declarations of outer blocks.

Subsection 6.6.3 Examples

Let us consider a couple of examples that put our small type system to work and compute the types of several expressions or check the well-typedness of statements.

Example 6.6.8.

Let's consider the derivation of the type of an expression with the type environment \(\Gamma\defeq\{\CVar x\mapsto\CPtr\CChar,\CVar y\mapsto\CInt\}\)

\begin{equation*} \begin{prooftree} \AxiomC{$\Gamma\ \CVar x=\CPtr\CChar$} \LeftLabel{[Var]} \UnaryInfC{$\typeGamma{\CVar x}{\CPtr\CChar}$} \LeftLabel{[Indir]} \UnaryInfC{$\typeGamma{\CIndir\CVar x}{\CChar}$} \AxiomC{$\Gamma\CVar y=\CInt$} \LeftLabel{[Var]} \UnaryInfC{$\typeGamma{\CVar y}{\CInt}$} \BinaryInfC{$\typeGamma{\CBinary{+}{\CIndir{\CVar x}}{\CVar y}}{\CInt}$} \AxiomC{} \LeftLabel{[Const]} \UnaryInfC{$\typeGamma {\CConst 1}{\CInt}$} \LeftLabel{[Arith]} \BinaryInfC{$\typeGamma{\CBinary{-}{\CBinary{+}{\CIndir{\CVar x}}{\CVar y}}{\CConst 1}}{\CInt}$} \end{prooftree} \end{equation*}

Example 6.6.9.

In this example, we consider the same expression but with a different type environment:

\begin{equation*} \Gamma\defeq\{\CVar x\mapsto\CChar,\CVar y\mapsto\CInt\} \end{equation*}

Here, we can see nicely that we are not able to prove the premise \(\typeGamma{\CVar x}{\CPtr\CChar}\) with [Var] because the the type environment does not provide the type \(\CPtr\CChar\) for \(\CVar x\text{.}\)

\begin{equation*} \begin{prooftree} \AxiomC{Error} \LeftLabel{[Var]} \UnaryInfC{$\typeGamma{\CVar x}{\CPtr\CChar}$} \LeftLabel{[Indir]} \UnaryInfC{$\typeGamma{\CIndir\CVar x}{\CChar}$} \AxiomC{$\Gamma\ \CVar y=\CInt$} \LeftLabel{[Var]} \UnaryInfC{$\typeGamma{\CVar y}{\CInt}$} \BinaryInfC{$\typeGamma{\CBinary{+}{\CIndir{\CVar x}}{\CVar y}}{\CInt}$} \AxiomC{} \LeftLabel{[Const]} \UnaryInfC{$\typeGamma {\CConst 1}{\CInt}$} \LeftLabel{[Arith]} \BinaryInfC{$\typeGamma{\CBinary{-}{\CBinary{+}{\CIndir{\CVar x}}{\CVar y}}{\CConst 1}}{\CInt}$} \end{prooftree} \end{equation*}

Example 6.6.10.

Finally, let's consider the derivation of the static semantics for a more complex example in which we declare local variables in blocks.

\begin{equation*} \begin{prooftree} \AxiomC{$\vdots$} \LeftLabel{[Assign]} \UnaryInfC{$\stmtGamma{\CAssign{\CVar x}{\CConst 3}}$} \AxiomC{$\vdots$} \LeftLabel{[Assign]} \UnaryInfC{$\stmtjudge{\Gamma[\CVar x\mapsto\CPtr\CInt]}{\CAssign{\CVar x}{\CAddrOf{\CVar y}}}$} \LeftLabel{[Block]} \UnaryInfC{$\stmtGamma{\CBlock{\CPtr\CInt\,\CVar x;\ \CAssign{\CVar x}{\CAddrOf{\CVar y}}}}$} \AxiomC{$\vdots$} \LeftLabel{[Assign]} \UnaryInfC{$\stmtjudge{\Gamma\defeq\{\CVar x\mapsto\CInt,\CVar y\mapsto\CInt\}}{\CAssign{\CVar y}{\CVar x}}$} \TrinaryInfC{$\stmtjudge{\emptyset}{\CBlock{ \CInt\,\CVar x; \ \CInt\,\CVar y; \ \CAssign{\CVar x}{\CConst 3} \CBlock{ \CPtr\CInt\,\CVar x; \ \CAssign{\CVar x}{\CAddrOf{\CVar y}} } \ \CAssign{\CVar y}{\CVar x}}}$} \end{prooftree} \end{equation*}
In statically type-safe languages (like C\#, Java, OCaml, etc.), the semantics ensures that in exceptional situations that cannot be covered statically because they may depend on the input of a program, a certain well-defined behavior is triggered, such as throwing an exception.