Section 6.1 The Syntax of Programming Languages
First of all, we need to define what a C0 program actually is. When formalizing the syntax of a programming language, one distinguishes between two different kinds of syntaxes: The abstract syntax defines the structure of the language's programs. By structure we mean the way how one syntactical construct can be composed from other components (i.e. a while loop consists of a body and an expression). In the abstract syntax, each program corresponds to a tree, the so-called abstract syntax tree (AST). The nodes of the tree correspond to syntactical constructs and the edges denote their composition.
The concrete syntax defines how a program in a language can be encoded as a sequence of symbols (characters, digits, etc.). The concrete syntax is relevant for the programmers because they have to type in their programs somehow. When reasoning about programs formally, we focus on the abstract syntax because it captures the structure of the program and is not “contaminated” with syntactical artifacts like parentheses and other separators which are only meaningful in a linear (textual) representation of the program.
Checkpoint 6.1.3.
The abstract syntax of a language is given by a syntax definition using a table like the following:
\begin{equation*}
\begin{array}{rclll}
\text{Category} \amp \amp \text{Abstract Syntax} \amp \text{Concrete Syntax} \amp \text{Description} \\
\hline
\mathit{Expr}\ni e \amp \syndef \amp \mathtt{Add}[e_1,e_2] \amp e_1+e_2 \amp \text{Addition} \\
\amp | \amp \mathtt{Mul}[e_1,e_2] \amp e_1*e_2 \amp \text{Multiplication} \\
\amp | \amp \mathtt{Const}_c \amp c \amp \text{Constant} \\
\amp | \amp \mathtt{Var}_x \amp x \amp \text{Variable} \\
\end{array}
\end{equation*}
The column “Category” defines a specific syntactic category (expression, statement, and so on). For each category there may be multiple rows. Each row defines one syntactic construct of the respective category. The column “Abstract Syntax” defines the structure for that specific construct: A piece of abstract syntax consists of
a label (such as \(\mathtt{Add}\)) that denotes the construct. It is also used as the label of a node in an abstract syntax tree that represents the instance of such a construct in a concrete program.
a list of children given in brackets. These are given by variable names of specific categories, for example \(\mathtt{Add}[e_1, e_2]\) means that an \(\mathtt{Add}\) construct consists of two sub-expressions.
possibly some attributes given as subscripts to the label. These attributes constitute additional information for a piece of abstract syntax such as the concrete name of an identifier or the value of a constant. These are not important for the structure of the program but may be important later on to analyze, execute, or transform the program.
The column “Concrete Syntax” shows how the construct looks like in written program text. In both columns one can use variables which stand for other syntactic constructs with the same name (e.g. \(e\) for expressions). Each variable must be defined in the category column to indicate from which category a construct comes. The example above for instance defines a construct \(\mathtt{Add}\) which belongs to the category \(\Expr\) and stands for an addition expression and consists of two other constructs that are from the category \(\Expr\) as well.