Translation Units

Section 4.5 Translation Units

A C program usually consists of multiple translation units. A translation unit is a file with the extension .c. The C compiler translates the functions of a translation unit into object code, which it stores in a binary file with the extension .o. On a Unix system, the command

$ cc -o x.o -c x.c

translates the translation unit x.c into the object file x.o. The C compiler can then link multiple such object files into an executable file:

$ cc -o prg x.o y.o z.o

Here, prg is the name of the executable program to be produced and can be chosen freely. Afterwards, the current directory contains a file prg that can be started with

$ ./prg

Remark 4.5.1.

Separating the translation into these steps is helpful for larger projects for which the translation can take a long time: If only one translation unit is changed after a previous build, the compiler only needs to re-compile the changed translation unit and perform the final linking step. Additionally, separate translation units can be translated to object code independently, which makes this step easy to parallelize. This can speed up large builds considerably on modern systems.

The C compiler can find errors in the program code and issue warnings in both stages of the translation process. While the warnings do not abort the translation process, they should be considered carefully as they can hint towards subtle programming errors. A good C program should be translated by the compiler without warnings.

Subsection 4.5.1 main

To successfully build an executable, exactly one contributing translation unit needs to contain a function with the name main. Program execution starts with this function. Unix programs (and, consequently, C programs) can be started with arguments. These are passed to the main function in the form of two parameters, argc and argv. argc contains the number of program arguments, including the program name, which is passed as the first argument. The arguments themselves are available as character strings in the argv array. A character string, usually just called a string in C, is a null-terminated sequence of characters, i.e., its last character is followed by the character '\0'. Strings are referred to by the address of their first character. argv is therefore an array of pointers to the first character of each argument (see Figure 4.5.2).

Figure 4.5.2. Depiction of the variable `argv` when starting the program as `./factorial 5`.
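To make the null-termination concrete, here is a small sketch with three hypothetical helper functions (string_length, count_args and print_args are not part of the original program). string_length walks a string up to its terminating '\0' character; count_args walks an argv-style array up to its terminating entry, relying on the guarantee of the C standard that argv[argc] is a null pointer.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: counts the characters of a string by walking it
   until the terminating null character '\0' (which is not counted). */
size_t string_length(const char *s) {
    size_t len = 0;
    while (s[len] != '\0') {
        len = len + 1;
    }
    return len;
}

/* Hypothetical helper: counts the entries of an argv-style array by walking
   it until the terminating null pointer. For the argv passed to main, the
   C standard guarantees argv[argc] == NULL, so the result equals argc. */
int count_args(char *argv[]) {
    int count = 0;
    while (argv[count] != NULL) {
        count = count + 1;
    }
    return count;
}

/* Hypothetical helper: prints every argument together with its index. */
void print_args(int argc, char *argv[]) {
    for (int i = 0; i < argc; i = i + 1) {
        printf("argv[%d] = \"%s\"\n", i, argv[i]);
    }
}
```

Called from main as print_args(argc, argv), starting the program as ./factorial 5 would print the two lines argv[0] = "./factorial" and argv[1] = "5".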

For an example, consider a main function that calls the factorial function from the previous section with an argument obtained from the command line.

#include <stdio.h>
#include <stdlib.h>

/* definition of the factorial function */
int factorial(int n) {
    int res = 1;
    while (n > 1) {
        res = res * n;
        n   = n - 1;
    }
    return res;
}

int main(int argc, char *argv[]) {
    if (argc <= 1) {
        fprintf(stderr, "syntax: %s <value>\n",
                argv[0]);
        return 1;
    }

    int n = atoi(argv[1]);
    int r = factorial(n);
    printf("%d\n", r);   /* r is an int, so %d (not %u) */
    return 0;
}

Listing 4.5.3. A main function for the factorial function.

We build this program with the following commands:

$ cc -o factorial.o -c factorial.c
$ cc -o factorial factorial.o

Then, the following program execution will fail:

$ ./factorial
syntax: ./factorial <value>

The message tells us that we have to supply a number whose factorial the program should compute. The following invocation produces the desired result:

$ ./factorial 5
120

The main function first checks whether an argument was provided. This is the case if the value of argc is greater than 1, since the program name is always the first argument. If no argument was given, the program prints an explanatory message to the standard error stream stderr and terminates with the value 1. Otherwise, atoi converts the first argument (a string of characters) to an integer, and the program computes its factorial. The program displays the result via printf and then terminates successfully with the value 0.
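Note that atoi does not report errors: for inputs such as "5x" or "abc" it silently returns 5 or 0. A stricter conversion can be sketched with the standard function strtol, which reports where parsing stopped and sets errno to ERANGE if the value does not fit into a long. The helper parse_int below is hypothetical and not part of the original program; main could call it instead of atoi and print the syntax message when it returns 0.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts the string s to an int and stores it in
   *out. Returns 1 on success and 0 if s contains no digits, has trailing
   characters (as in "5x"), or denotes a value outside the range of int.
   Like strtol itself, it accepts leading white space and a sign. */
int parse_int(const char *s, int *out) {
    char *end;
    errno = 0;
    long value = strtol(s, &end, 10);
    if (end == s || *end != '\0') {
        return 0;   /* no digits at all, or garbage after the number */
    }
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
        return 0;   /* too large or too small for an int */
    }
    *out = (int) value;
    return 1;
}
```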

Remark 4.5.4.

In Unix, every program can provide an “exit code” upon termination. In a C program, this is the return value of the main function. By convention, an exit code of 0 signifies a successful execution, whereas other numbers can encode different errors.
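The header stdlib.h provides the macros EXIT_SUCCESS and EXIT_FAILURE as portable names for these two cases. The following sketch moves the argument check of Listing 4.5.3 into a hypothetical helper check_args that returns such a code; main could begin with int status = check_args(argc, argv); and return status if it differs from EXIT_SUCCESS. In a Unix shell, the exit code of the most recently finished program can be displayed with echo $?.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: checks that at least one program argument was given.
   Returns EXIT_SUCCESS if so; otherwise prints the usage message to stderr
   and returns EXIT_FAILURE, which main can pass on as its own return value. */
int check_args(int argc, char *argv[]) {
    if (argc <= 1) {
        fprintf(stderr, "syntax: %s <value>\n", argv[0]);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```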

Subsection 4.5.2 Calling Functions from Other Translation Units

Let us assume that we want to separate the main function and the factorial function into different translation units. It is often good practice to bundle functions that are thematically connected, for example because they operate on similar data, into their own translation unit. We usually separate the main function from other functions since it mostly handles the program arguments and passes the relevant values on to the other functions. The factorial function could be reused in a different project where factorials need to be computed; our main function less so. Therefore, it is reasonable to separate the functions into two translation units, which are compiled separately.

#include "factorial.h"

int factorial(int n) {
    int res = 1;
    while (n > 1) {
        res = res * n;
        n   = n - 1;
    }
    return res;
}

(a) factorial.c

#ifndef FACTORIAL_H
#define FACTORIAL_H
int factorial(int n);
#endif /* FACTORIAL_H */

(b) factorial.h

#include <stdio.h>
#include <stdlib.h>

#include "factorial.h"

int main(int argc, char *argv[]) {
    if (argc <= 1) {
        fprintf(stderr,
            "syntax: %s <value>\n",
            argv[0]);
        return 1;
    }

    int n = atoi(argv[1]);
    int r = factorial(n);
    printf("%d\n", r);   /* r is an int, so %d (not %u) */
    return 0;
}

(c) main.c

Figure 4.5.5. Factorial function and main function in two separate translation units. The header file factorial.h contains the prototype of the function factorial. The preprocessing directives (#ifndef, #define, #endif) ensure that the file content is included at most once per translation unit. While not necessary here, this convention (called an include guard) becomes important when header files include other header files: it prevents duplicate definitions when the same header is reached along several include paths, and it breaks otherwise infinite recursive include sequences.
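To see why the guard matters, we can imitate what the preprocessor produces when a header ends up being included twice into the same translation unit, for example once directly and once via another header: the header's text simply appears twice. The sketch below assumes a hypothetical header point.h that defines a structure type; the names point.h, POINT_H, struct point and manhattan are illustrative only. Without the guard, the second definition of struct point would be a redefinition error in standard C up to C17; with the guard, the preprocessor drops the second copy because POINT_H is already defined.

```c
/* ---- first copy of the hypothetical header point.h ---- */
#ifndef POINT_H
#define POINT_H
struct point {
    int x;
    int y;
};
int manhattan(struct point p);
#endif /* POINT_H */

/* ---- second copy, e.g. pulled in via another header ---- */
#ifndef POINT_H        /* POINT_H is already defined, so the preprocessor */
#define POINT_H        /* skips everything up to the matching #endif,     */
struct point {         /* including this duplicate definition             */
    int x;
    int y;
};
int manhattan(struct point p);
#endif /* POINT_H */

/* Hypothetical function using the type: the distance from the origin when
   moving only horizontally and vertically. */
int manhattan(struct point p) {
    int dx = p.x < 0 ? -p.x : p.x;
    int dy = p.y < 0 ? -p.y : p.y;
    return dx + dy;
}
```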

When we delete the factorial function from the translation unit in Listing 4.5.3, the compiler rejects the translation unit since it does not know factorial's type. For a successful translation, the compiler needs to know the type of every called function. ⁷ The type of factorial can be established in the main.c translation unit by providing the prototype of factorial. The prototype of a function consists of its name, its return type, and the types of its parameters:

        int factorial(int);

This is commonly called a function declaration, in contrast to a function definition, which additionally provides the function's code in a body. In practice, we do not manually duplicate the prototype of every function into every translation unit in which it is used. First, this would require writing a lot of redundant code. Second, it would be prone to errors, since all translation units would need to be adjusted whenever we change the function, e.g., by adding or changing a parameter. For this reason, we create header files that are included by the C preprocessor.
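Within a single file, the same rule applies: a function may be called before its definition appears, as long as a prototype precedes the call. In the following sketch, the hypothetical function square_of_factorial calls factorial, whose definition only follows further below.

```c
/* Declaration (prototype): name, return type and parameter types. */
int factorial(int);

/* Hypothetical caller: it uses factorial before its definition appears;
   this compiles because the prototype above announces factorial's type. */
int square_of_factorial(int n) {
    int f = factorial(n);
    return f * f;
}

/* Definition: the same signature, now with a body. */
int factorial(int n) {
    int res = 1;
    while (n > 1) {
        res = res * n;
        n   = n - 1;
    }
    return res;
}
```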

The C preprocessor is a separate program that is invoked by the C compiler before it performs the actual translation. It transforms a text into a new text by expanding preprocessing directives. All preprocessing directives start with a hash sign (#). The directive #include "x.h", for example, interrupts the preprocessing of the current file, preprocesses the file x.h, and then resumes preprocessing the original file. As a result, in our example in Figure 4.5.5, the preprocessor inserts the content of factorial.h into both translation units, factorial.c and main.c.

Subsection 4.5.3 Makefiles

In practice, projects can easily consist of hundreds to thousands of translation units. To avoid building them all by hand, there is the Unix tool make. It operates based on a file with the name Makefile, in which we specify how to build the project. This description contains the dependencies between the involved files and a description of how to produce files from their prerequisites.

factorial: factorial.o main.o
	cc -o $@ $^

main.o: main.c factorial.h
factorial.o: factorial.c factorial.h

%.o: %.c
	cc -o $@ -c $<

clean:
	rm -f factorial *.o

Listing 4.5.6. A simple Makefile for the factorial program.

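The first two lines of the Makefile specify that we need the files factorial.o and main.o to build the file factorial, and that the latter file is built from the former two with the command cc -o factorial factorial.o main.o. The two lines after that state that main.o and factorial.o additionally depend on factorial.h, so that make rebuilds both object files whenever the header changes. The rule starting with %.o: %.c determines that any file ending in .o is built from the file with the same name ending in .c; for a file x.o, the command cc -o x.o -c x.c performs this translation. Finally, the rule clean removes all generated files when we run make clean. make only executes a rule if its target does not exist yet or if one of its prerequisites has been modified more recently than the target, which yields exactly the incremental rebuilds described in Remark 4.5.1.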

The placeholders in the build rules have the following meaning:

$@: the “target” of the rule, i.e., the text on the left of the colon
$<: the first “prerequisite” of the target, i.e., the first word on the right of the colon
$^: all prerequisites of the target, i.e., all words on the right of the colon

The use of make is not restricted to C. It is a general tool for describing build processes and dependencies, but it is particularly common in C projects.

⁷ The parameter types contained in the function's type tell the compiler in which registers the arguments are to be passed (Subsection 2.8.2). All translation units that declare or call a function need to agree on the same prototype to ensure that the code generated for the call sites interacts correctly with the function.

An Introduction to Imperative Programming
