Skip to main content

Chapter 9 A Simple Compiler

In this chapter, we will develop a very simple compiler from C0 to MIPS. The goal is to give some introduction into how compilers work on a very high-level and improve our understanding of how programming languages are implemented on machine code. As a side effect, we'll also be able to put some of the object-oriented modelling skills we have discussed in Section 8.10 to good use.

A compiler translates a program, given as a stream of characters, into a machine code program that is semantically equivalent to the source program. We have gained some impression on how to formally define the semantics of a programming language in Chapter 6. Here, we will not discuss the preservation of semantics formally because it is out of the scope of this text. It would require defining a formal semantics of machine code as well and then dwell into formal notions of program equivalence which is the subject of more advanced texts. Figure 9.0.1 shows the architecture of a simple compiler.

Figure 9.0.1. Schematic architecture of a very simple compiler. The ellipses indicate data. The boxes are passes of the compiler. Modern compilers typically use several other program representations in addition to the AST. They also “optimize” the program (= rewrite it in a semantics-preserving way) with the goal of improving its run time.

Lexing and parsing (which we discuss in the next section) are part of syntactic analysis, the so-called front-end of the compiler. Its goal is to check if the input character stream adheres to the concrete syntax of the programming language and, if that is the case, construct the abstract syntax tree. Type checking is a static program analysis that determines if the program is well-typed and elaborates the types for the program. Well-typedness is an important part of the static semantics of the programming language. 25  Code generation is part of the compiler's back-end and generates code for the program in some machine language. The back-ends of modern compilers can be very complicated because they perform sophisticated program analyses and transformations to optimize the performance (or size) of the machine code program. Here, we concentrate on a very simple code generation technique, called syntax-directed code generation that produces less efficient code but is very easy to understand and implement.

It may also be worthwhile mentioning that a substantial part of modern compilers is the optimizer in the “middle-end” which performs various program analyses and transformations in order to make the program more efficient. These optimizations are mostly done independent of the source and target language such that they can be reused when compiling from and to different languages.

The static semantics can comprise additional things, for example, making sure that every local variable is assigned before read.