Section 6.6 A Simple Type System (C0t)

The set of C0 programs defined by the abstract syntax in Section 6.2 contains programs that get stuck for some inputs. For example, the following program is a syntactically correct C0 program. (This means that the sequence of characters in the program text complies with the concrete syntax of C0, so we can associate a unique abstract syntax tree with this text.) However, the execution of the program, as defined by the C0 semantics, leads directly to a configuration in which the program gets stuck.

{
  int x;
  x = 666;
  *x = 42;
}

The reason is that the content of x's container is not an address but an integer. For some programs, such as the one above, we can statically detect that they get stuck and reject them. We do this by means of a static semantics. This semantics gives “meaning” to the abstract syntax of a C0 program, hence the name semantics. This meaning, however, does not depend on any input, hence it is static. Types of variables and expressions are one example of such a static property.

From the information that variable $\CVar x$ has type $\CInt\text{,}$ we want to deduce that for every program execution $\CVar x$ contains an int and never, for example, an address. Hence, $\CIndir{\CVar x}$ will always get stuck if $\CVar x$ has type $\CInt\text{.}$ In the following, we formulate type rules that characterize programs in which such errors do not occur. We call these programs well-typed.

Since C0 is not a statically type-safe language, there will be programs that are well-typed but nevertheless get stuck, for example:

int x;
x = x + 1;

Listing 6.6.1. $\CVar x$ is read before it has been written.

int x;
int *p;
{ int y; p = &y; }
x = *p;

Listing 6.6.2. After the inner block has been left, the pointer $\CVar p$ is dangling (i.e. contains an invalid address) because the container of $\CVar y$ has been freed.

int x = 666;
x = x / 0;

Listing 6.6.3. Division by zero is also undefined behavior in C.

This means that in C0 and C there are situations in which a program gets stuck that are not ruled out by the static semantics. Languages in which well-typed programs never get stuck are called statically type-safe.¹²

The static semantics of C0 is defined by relations that relate the abstract syntax of C0 to a type environment. The type environment maps, based on variable declarations, identifiers to types. These relations are defined inductively for each language element of C0, which means that we can derive the typing relation for an expression from the typing relations of its sub-expressions.

Before defining the typing relations for statements and expressions, we need to add types to C0. For the sake of simplicity, we only define the concrete syntax here, since we mostly use concrete syntax for better readability in our formal development.

Definition 6.6.4.

\begin{equation*} \begin{array}{r@{\,}c@{\,}lcll} \text{Category} \amp \amp \amp \amp \text{Concrete Syntax} \amp \text{Description} \\ \hline \mathit{ITy} \amp \ni \amp i \amp \syndef \amp \CChar\mid \CInt \amp \text{integer type} \\ \mathit{PTy} \amp \ni \amp p \amp \syndef \amp \CPtr t \amp \text{pointer type} \\ \mathit{STy} \amp \ni \amp k \amp \syndef \amp p\mid i \amp \text{scalar type} \\ \mathit{ Ty} \amp \ni \amp t \amp \syndef \amp k\mid \CVoid \amp \text{type} \end{array} \end{equation*}

C supports implicit type conversion, which means that in some places values of one type can be converted automatically (without further annotations by the programmer) to values of a different type. For example, a $\CPtr\CVoid$ can be implicitly converted into a $\CPtr\CInt\text{.}$ We model this here by the relation $\castrel\text{.}$

Definition 6.6.5. Implicit Type Conversion.

\begin{equation*} \begin{prooftree} \AxiomC{} \UnaryInfC{$i_1\castrel i_2$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \UnaryInfC{$\CPtr t\castrel\CPtr t$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \UnaryInfC{$\CPtr t\castrel\CPtr\CVoid$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \UnaryInfC{$\CPtr\CVoid\castrel\CPtr t$} \end{prooftree} \end{equation*}
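The four axioms of Definition 6.6.5 can be read as a decision procedure. The following C sketch illustrates this under our own assumptions: the representation of types (`TyKind`, `Ty`) and the function names `ty_is_integer`, `ty_equal`, and `convertible` are hypothetical and not part of C0.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical representation of the types of Definition 6.6.4. */
typedef enum { TY_CHAR, TY_INT, TY_VOID, TY_PTR } TyKind;

typedef struct Ty {
    TyKind kind;
    const struct Ty *target; /* pointee type; only meaningful if kind == TY_PTR */
} Ty;

/* True for the integer types char and int. */
bool ty_is_integer(const Ty *t) {
    return t->kind == TY_CHAR || t->kind == TY_INT;
}

/* Structural equality of two types. */
bool ty_equal(const Ty *a, const Ty *b) {
    if (a->kind != b->kind) return false;
    if (a->kind != TY_PTR) return true;
    return ty_equal(a->target, b->target);
}

/* Decides whether the pair (from, to) is in the conversion relation
   by trying the four axioms of Definition 6.6.5 in turn. */
bool convertible(const Ty *from, const Ty *to) {
    if (ty_is_integer(from) && ty_is_integer(to)) return true;  /* i1 and i2 */
    if (from->kind == TY_PTR && to->kind == TY_PTR) {
        if (ty_equal(from->target, to->target)) return true;    /* t* and t* */
        if (to->target->kind == TY_VOID) return true;           /* t* and void* */
        if (from->target->kind == TY_VOID) return true;         /* void* and t* */
    }
    return false; /* no axiom applies */
}
```

In this sketch, `convertible` accepts the pairs (char, int) and (void*, int*) but rejects (char*, int*) and (int, int*), because none of the four axioms applies to them.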

Subsection 6.6.1 Expressions

The static semantics of the C0 expression language is defined by a relation

\begin{equation*} \mathit{ExprS}\subseteq(\Var\pto\Ty)\times\Expr\times\Ty \end{equation*}

which we will inductively define over the syntax of C0. It is standard to use the following notation to indicate that a triple of type environment, expression, and type is in $\mathit{ExprS}\text{:}$

\begin{equation*} \typeGamma et \quad:\Longleftrightarrow\quad(\Gamma,e,t)\in\mathit{ExprS} \end{equation*}

So, $\typeGamma et$ says that expression $e$ has type $t$ under type environment $\Gamma\text{.}$

Definition 6.6.6. Static Semantics of the C0 Expression Language.

\begin{equation*} \begin{prooftree} \AxiomC{$\Gamma\,x=k$} \LeftLabel{[TVar]} \UnaryInfC{$\typeGamma xk$} \end{prooftree} \quad \begin{prooftree} \AxiomC{$-N\le c\lt N$} \AxiomC{$N=2^{31}$} \LeftLabel{[TConst]} \BinaryInfC{$\typeGamma c{\CInt}$} \end{prooftree} \end{equation*}

\begin{equation*} \begin{prooftree} \AxiomC{$\typeGamma{e_1}{i_1}$} \AxiomC{$\typeGamma{e_2}{i_2}$} \LeftLabel{[TArith]} \BinaryInfC{$\typeGamma {e_1\mathrel{r}e_2}{\CInt}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{$\typeGamma{e_1}{k_1}$} \AxiomC{$\typeGamma{e_2}{k_2}$} \AxiomC{$k_1\castrel k_2$} \LeftLabel{[TCmp]} \TrinaryInfC{$\typeGamma {e_1\mathrel{m}e_2}{\CInt}$} \end{prooftree} \end{equation*}

\begin{equation*} \begin{prooftree} \AxiomC{$\typeGamma{e_1}{\CPtr k}$} \AxiomC{$\typeGamma{e_2}{i}$} \LeftLabel{[TPtrArith]} \BinaryInfC{$\typeGamma {e_1+e_2}{\CPtr k}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{$\typeGamma{e_1}{\CPtr k}$} \AxiomC{$\typeGamma{e_2}{\CPtr k}$} \LeftLabel{[TPtrDiff]} \BinaryInfC{$\typeGamma {e_1-e_2}{\CInt}$} \end{prooftree} \end{equation*}

\begin{equation*} \begin{prooftree} \AxiomC{$\typeGamma{e}{\CPtr t}$} \LeftLabel{[TPtrCmp]} \UnaryInfC{$\typeGamma {e\mathrel{\mathtt{==}}0}{\CInt}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{$\typeGamma{e}{\CPtr t}$} \LeftLabel{[TPtrCmpN]} \UnaryInfC{$\typeGamma {e\mathrel{\mathtt{!=}}0}{\CInt}$} \end{prooftree} \end{equation*}

\begin{equation*} \begin{prooftree} \AxiomC{$\typeGamma{e}{\CPtr k}$} \LeftLabel{[TIndir]} \UnaryInfC{$\typeGamma{\CIndir e}{k}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{$\typeGamma{l}{t}$} \LeftLabel{[TAddr]} \UnaryInfC{$\typeGamma {\CAddrOf l}{\CPtr t}$} \end{prooftree} \end{equation*}

The rule [TVar] determines the type of a variable occurrence by looking up the variable in the type environment. Constants always have type int. Binary arithmetic expressions also always have type int, irrespective of the operand types. Arithmetic is only allowed on integer types. Pointer arithmetic is handled by two specific rules: [TPtrArith] adds an offset to a pointer and [TPtrDiff] subtracts two pointers of the same type. The results of comparisons are of integer type (executing them yields either 0 or 1). Pointers can also be compared. However, the operands of a comparison have to be implicitly convertible to each other: we can compare a $\CPtr\CVoid$ with a $\CPtr\CInt$ and a $\CChar$ with a $\CInt$ but not a $\CPtr\CChar$ with a $\CPtr\CInt\text{.}$ As a special case, pointers can be compared against 0 ([TPtrCmp] and [TPtrCmpN]). Additionally, [TPtrArith], [TPtrCmp], [TCmp], and [TPtrCmpN] need variants with swapped operands to handle commutativity, which we omit here for the sake of brevity.

Note however that comparing two pointers or subtracting two pointers that do not point to the same object is undefined behavior and will cause the program to get stuck. These cases are not caught by our type system.
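The rules of Definition 6.6.6 can be read as a recursive algorithm over the expression syntax. The following C sketch illustrates this for [TVar], [TConst], [TArith], and [TIndir] only; the remaining rules are omitted. All names (`Ty`, `Expr`, `Env`, `env_lookup`, `type_of`) and the data layout are our own assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of types, expressions and type environments. */
typedef enum { TY_CHAR, TY_INT, TY_VOID, TY_PTR } TyKind;
typedef struct Ty { TyKind kind; const struct Ty *target; } Ty;

typedef enum { EX_VAR, EX_CONST, EX_ARITH, EX_INDIR } ExKind;
typedef struct Expr {
    ExKind kind;
    const char *name;              /* EX_VAR: variable name */
    long long value;               /* EX_CONST: constant value */
    const struct Expr *lhs, *rhs;  /* operands; rhs is only used by EX_ARITH */
} Expr;

typedef struct { const char *name; const Ty *type; } Binding;
typedef struct { const Binding *items; size_t count; } Env;

static const Ty INT_TYPE = { TY_INT, NULL };

bool is_integer(const Ty *t) { return t->kind == TY_CHAR || t->kind == TY_INT; }
bool is_scalar(const Ty *t)  { return t->kind != TY_VOID; }

/* Looks up a variable in the environment; NULL if it is not bound.
   Searching from the end lets later bindings hide earlier ones. */
const Ty *env_lookup(const Env *env, const char *name) {
    for (size_t i = env->count; i > 0; i--)
        if (strcmp(env->items[i - 1].name, name) == 0)
            return env->items[i - 1].type;
    return NULL;
}

/* Returns the type of e under env, or NULL if no rule applies. */
const Ty *type_of(const Env *env, const Expr *e) {
    switch (e->kind) {
    case EX_VAR: {   /* [TVar]: the environment must provide a scalar type */
        const Ty *t = env_lookup(env, e->name);
        return (t && is_scalar(t)) ? t : NULL;
    }
    case EX_CONST:   /* [TConst]: -2^31 <= c < 2^31 */
        return (e->value >= -2147483648LL && e->value < 2147483648LL)
                   ? &INT_TYPE : NULL;
    case EX_ARITH: { /* [TArith]: both operands have integer types */
        const Ty *a = type_of(env, e->lhs);
        const Ty *b = type_of(env, e->rhs);
        return (a && b && is_integer(a) && is_integer(b)) ? &INT_TYPE : NULL;
    }
    case EX_INDIR: { /* [TIndir]: the operand is a pointer to a scalar type */
        const Ty *p = type_of(env, e->lhs);
        return (p && p->kind == TY_PTR && is_scalar(p->target)) ? p->target : NULL;
    }
    }
    return NULL;
}
```

Returning `NULL` corresponds to a derivation that cannot be completed: if `x` is bound to char instead of char*, the [TIndir] case fails, and the failure propagates to every enclosing expression.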

Subsection 6.6.2 Statements and Programs

As for expressions, we use relations to indicate whether a statement or a program is well-typed. Statements and programs, however, do not have a type of their own. The well-typedness relation for statements and programs therefore merely captures whether the expressions contained in the statements are well-typed. Consequently, the well-typedness relation is binary and not ternary like the one for expressions:

\begin{equation*} \mathit{StmtS}\subseteq(\Var\pto\Ty)\times\Stmt \quad \mathit{PrgS}\subseteq(\Var\pto\Ty)\times\Prg \end{equation*}

and again, for better readability and to comply with the standard convention, we use the shorthand notation (similarly for programs):

\begin{equation*} \stmtGamma s \quad:\Longleftrightarrow\quad(\Gamma,s)\in\mathit{StmtS} \end{equation*}

Definition 6.6.7. Static Semantics of C0 Statements and Programs.

\begin{equation*} \begin{prooftree} \AxiomC{$\stmtGamma s$} \AxiomC{$\stmtGamma p$} \LeftLabel{[TSeq]} \BinaryInfC{$\stmtGamma{\CSeq sp}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \LeftLabel{[TTerm]} \UnaryInfC{$\stmtGamma{\CTerm}$} \end{prooftree} \end{equation*}

\begin{equation*} \begin{prooftree} \AxiomC{$\typeGamma e{k_1}$} \AxiomC{$\typeGamma l{k_2}$} \AxiomC{$k_1\castrel k_2$} \LeftLabel{[TAssign]} \TrinaryInfC{$\stmtGamma{\CAssign le}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \LeftLabel{[TAbort]} \UnaryInfC{$\stmtGamma{\CAbort}$} \end{prooftree} \end{equation*}

\begin{equation*} \begin{prooftree} \AxiomC{$\typeGamma ek$} \AxiomC{$\stmtGamma s$} \LeftLabel{[TWhile]} \BinaryInfC{$\stmtGamma{\CWhile es}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{$\typeGamma ek$} \AxiomC{$\stmtGamma{s_1}$} \AxiomC{$\stmtGamma{s_2}$} \LeftLabel{[TIf]} \TrinaryInfC{$\stmtGamma{\CIf e{s_1}{s_2}}$} \end{prooftree} \end{equation*}

\begin{equation*} \begin{prooftree} \AxiomC{$\Gamma'=\Gamma[x_1\mapsto k_1,\dots,x_m\mapsto k_m]$} \AxiomC{$\stmtjudge{\Gamma'}{p}$} \LeftLabel{[TBlock]} \BinaryInfC{$\stmtGamma{\CBlock{k_1\,x_1;\dots k_m\,x_m; p}}$} \end{prooftree} \end{equation*}

The following rule [TSeqS] summarises two applications of [TSeq] and one application of [TTerm] and makes type derivations on programs more compact.

\begin{equation*} \begin{prooftree} \AxiomC{$\stmtGamma{s_1}$} \AxiomC{$\stmtGamma{s_2}$} \LeftLabel{[TSeqS]} \BinaryInfC{$\stmtGamma{\CSeq{s_1}{\CSeq{s_2}\CTerm}}$} \end{prooftree} \end{equation*}
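
Spelled out, the derivation that [TSeqS] abbreviates could look as follows. This is a sketch that uses only [TSeq] and [TTerm] from Definition 6.6.7:

\begin{equation*} \begin{prooftree} \AxiomC{$\stmtGamma{s_1}$} \AxiomC{$\stmtGamma{s_2}$} \AxiomC{} \LeftLabel{[TTerm]} \UnaryInfC{$\stmtGamma{\CTerm}$} \LeftLabel{[TSeq]} \BinaryInfC{$\stmtGamma{\CSeq{s_2}\CTerm}$} \LeftLabel{[TSeq]} \BinaryInfC{$\stmtGamma{\CSeq{s_1}{\CSeq{s_2}\CTerm}}$} \end{prooftree} \end{equation*}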

The rules [TWhile] and [TIf] make sure that the condition expression they contain has a scalar type (i.e. is not $\CVoid$). The rule [TAssign] makes sure that the type of the right-hand side can be implicitly converted to the type of the left-hand side. This rules out a program like the following:

{ int *p; p = 5; }
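
To see why this program is rejected, consider the assignment in the body of the block under $\Gamma\defeq\{\CVar p\mapsto\CPtr\CInt\}\text{,}$ the environment that [TBlock] produces when we start from the empty environment. In the following sketch of an attempted derivation, the first two premises of [TAssign] can be proven, but the third cannot, because none of the four axioms of Definition 6.6.5 relates $\CInt$ to $\CPtr\CInt\text{:}$

\begin{equation*} \begin{prooftree} \AxiomC{$-2^{31}\le 5 \lt 2^{31}$} \LeftLabel{[TConst]} \UnaryInfC{$\typeGamma{\CConst 5}{\CInt}$} \AxiomC{$\Gamma\ \CVar p=\CPtr\CInt$} \LeftLabel{[TVar]} \UnaryInfC{$\typeGamma{\CVar p}{\CPtr\CInt}$} \AxiomC{Error} \UnaryInfC{$\CInt\castrel\CPtr\CInt$} \LeftLabel{[TAssign]} \TrinaryInfC{$\stmtGamma{\CAssign{\CVar p}{\CConst 5}}$} \end{prooftree} \end{equation*}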

Most notable, however, is the rule [TBlock], which manages the type environment: for a block to be well-typed, its constituent statements must be well-typed under the type environment extended by the local variables declared in the block. Note that this also models variable hiding: variables declared in an inner block hide declarations of the same name in outer blocks.
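
One possible way to implement the environment update of [TBlock] is a stack of bindings that is searched from the newest entry to the oldest. The following C sketch illustrates this idea. The names `Env`, `env_bind`, `env_lookup`, `env_enter_block`, and `env_leave_block`, as well as the fixed capacity `ENV_MAX`, are our own assumptions. Note that the rule itself describes a functional update, whereas the sketch mutates the environment and undoes the update when the block is left.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of the types of Definition 6.6.4. */
typedef enum { TY_CHAR, TY_INT, TY_VOID, TY_PTR } TyKind;
typedef struct Ty { TyKind kind; const struct Ty *target; } Ty;

#define ENV_MAX 64 /* arbitrary capacity chosen for this sketch */

/* The type environment as a stack of (name, type) bindings. */
typedef struct {
    const char *names[ENV_MAX];
    const Ty *types[ENV_MAX];
    size_t count;
} Env;

/* Adds the binding name -> type on top of the stack.
   Returns 0 on success and -1 if the environment is full. */
int env_bind(Env *env, const char *name, const Ty *type) {
    if (env->count == ENV_MAX) return -1;
    env->names[env->count] = name;
    env->types[env->count] = type;
    env->count++;
    return 0;
}

/* Looks up a name; the newest binding wins, which models variable hiding.
   Returns NULL if the name is not bound. */
const Ty *env_lookup(const Env *env, const char *name) {
    for (size_t i = env->count; i > 0; i--)
        if (strcmp(env->names[i - 1], name) == 0)
            return env->types[i - 1];
    return NULL;
}

/* Remembers the current stack height when a block is entered ... */
size_t env_enter_block(const Env *env) { return env->count; }

/* ... and drops all bindings made since then when the block is left. */
void env_leave_block(Env *env, size_t mark) { env->count = mark; }
```

With an outer declaration of `x` as int and an inner declaration of `x` as int*, a lookup of `x` inside the inner block yields int*, and after `env_leave_block` it yields int again.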

Subsection 6.6.3 Examples

Let us consider a couple of examples that put our small type system to work and compute the types of several expressions or check the well-typedness of statements.

Example 6.6.8.

Let's consider the derivation of the type of an expression with the type environment $\Gamma\defeq\{\CVar x\mapsto\CPtr\CChar,\CVar y\mapsto\CInt\}\text{:}$

\begin{equation*} \begin{prooftree} \AxiomC{$\Gamma\ \CVar x=\CPtr\CChar$} \LeftLabel{[TVar]} \UnaryInfC{$\typeGamma{\CVar x}{\CPtr\CChar}$} \LeftLabel{[TIndir]} \UnaryInfC{$\typeGamma{\CIndir{\CVar x}}{\CChar}$} \AxiomC{$\Gamma\ \CVar y=\CInt$} \LeftLabel{[TVar]} \UnaryInfC{$\typeGamma{\CVar y}{\CInt}$} \LeftLabel{[TArith]} \BinaryInfC{$\qquad\typeGamma{\CBinary{+}{\CIndir{\CVar x}}{\CVar y}}{\CInt}$} \AxiomC{$-2^{31}\le 1 \lt 2^{31}$} \LeftLabel{[TConst]} \UnaryInfC{$\typeGamma {\CConst 1}{\CInt}$} \LeftLabel{[TArith]} \BinaryInfC{$\typeGamma{\CBinary{-}{\CBinary{+}{\CIndir{\CVar x}}{\CVar y}}{\CConst 1}}{\CInt}$} \end{prooftree} \end{equation*}

Example 6.6.9.

In this example, we consider the same expression but with a different type environment:

\begin{equation*} \Gamma\defeq\{\CVar x\mapsto\CChar,\CVar y\mapsto\CInt\} \end{equation*}

Here, we can see nicely that we are not able to prove the premise $\typeGamma{\CVar x}{\CPtr\CChar}$ with [TVar] because the type environment does not provide the type $\CPtr\CChar$ for $\CVar x\text{.}$

\begin{equation*} \begin{prooftree} \AxiomC{Error} \LeftLabel{[TVar]} \UnaryInfC{$\typeGamma{\CVar x}{\CPtr\CChar}$} \LeftLabel{[TIndir]} \UnaryInfC{$\typeGamma{\CIndir{\CVar x}}{\CChar}$} \AxiomC{$\Gamma\ \CVar y=\CInt$} \LeftLabel{[TVar]} \UnaryInfC{$\typeGamma{\CVar y}{\CInt}$} \LeftLabel{[TArith]} \BinaryInfC{$\typeGamma{\CBinary{+}{\CIndir{\CVar x}}{\CVar y}}{\CInt}$} \AxiomC{$-2^{31}\le 1 \lt 2^{31}$} \LeftLabel{[TConst]} \UnaryInfC{$\typeGamma {\CConst 1}{\CInt}$} \LeftLabel{[TArith]} \BinaryInfC{$\typeGamma{\CBinary{-}{\CBinary{+}{\CIndir{\CVar x}}{\CVar y}}{\CConst 1}}{\CInt}$} \end{prooftree} \end{equation*}

Example 6.6.10.

Finally, let's consider the derivation of the static semantics for a more complex example in which we declare local variables in blocks.

\begin{equation*} \begin{prooftree} \AxiomC{$\vdots$} \LeftLabel{[TAssign]} \UnaryInfC{$\stmtGamma{\CAssign{\CVar x}{\CConst 3}}$} \AxiomC{$\vdots$} \LeftLabel{[TSeq]} \UnaryInfC{$\stmtjudge{\Gamma[\CVar x\mapsto\CPtr\CInt]}{\CAssign{\CVar x}{\CAddrOf{\CVar y}}}$} \LeftLabel{[TBlock]} \UnaryInfC{$\stmtGamma{\CBlock{\CPtr\CInt\,\CVar x;\ \CAssign{\CVar x}{\CAddrOf{\CVar y}}};}$} \AxiomC{$\vdots$} \LeftLabel{[TAssign]} \UnaryInfC{$\stmtGamma{\CAssign{\CVar y}{\CVar x}}$} \AxiomC{} \LeftLabel{[TTerm]} \UnaryInfC{$\stmtGamma\CTerm$} \LeftLabel{[TSeq]} \BinaryInfC{$\stmtGamma{\CAssign{\CVar y}{\CVar x}}$} \LeftLabel{[TSeq]} \BinaryInfC{$\stmtGamma{\CBlock{\CPtr\CInt\,\CVar x;\ \CAssign{\CVar x}{\CAddrOf{\CVar y}}}; \CAssign{\CVar y}{\CVar x}}$} \LeftLabel{[TSeq]} \BinaryInfC{$\stmtjudge{\{\CVar x\mapsto\CInt,\CVar y\mapsto\CInt\}\eqdef\Gamma}{ \ \CAssign{\CVar x}{\CConst 3} \CBlock{ \CPtr\CInt\,\CVar x; \ \CAssign{\CVar x}{\CAddrOf{\CVar y}} } \ \CAssign{\CVar y}{\CVar x}}$} \LeftLabel{[TBlock]} \UnaryInfC{$\stmtjudge{\emptyset}{\CBlock{ \CInt\,\CVar x; \ \CInt\,\CVar y; \ \CAssign{\CVar x}{\CConst 3} \CBlock{ \CPtr\CInt\,\CVar x; \ \CAssign{\CVar x}{\CAddrOf{\CVar y}} } \ \CAssign{\CVar y}{\CVar x}}}$} \end{prooftree} \end{equation*}

¹² In statically type-safe languages (like C#, Java, OCaml, etc.), the semantics ensures that exceptional situations that cannot be ruled out statically, because they may depend on the input of a program, trigger a well-defined behavior, such as throwing an exception.

An Introduction to Imperative Programming
