Section 8.10 Using Object Orientation Properly
Object-oriented programming languages help to solve an important problem for developing large software systems: easy extensibility. “Easy” here means that extending the system only requires changes at a small number of confined program locations. This is important since modifications that address many program locations are very prone to mistakes: A developer that performs such modifications needs to understand the entire code affected by the changes to ensure that the modification is correct and complete. Object-oriented languages facilitate this by providing means that, if used properly, confine the required changes for an extension of the program. The following properties of object-oriented languages are central for this goal:
- The separation of interface and implementation is helpful for implementing abstract data types. This supports extensibility since code can refer to the interface rather than a concrete implementation. Subtyping ensures that all extensions provide the functionality required by the interface.
- The concentration of data and corresponding code in a class prevents implementation details from being scattered among large parts of the code and from being used by code parts that actually do not need them.
We want to discuss these aspects now by means of a simple example that we implement in functional (OCaml), imperative (C), and an object-oriented (Java) language. The goal of looking at three different languages from three different “paradigms” is to contrast the support each language offers for implementing our small example and to compare and relate object-oriented programming to other programming paradigms.
Our program shall represent the abstract syntax trees of a simple arithmetic expression language. This language consists only of variables and the addition of expressions:
\begin{equation}
\begin{array}{rclll}
\text{Category} \amp \amp \text{Abstract Syntax} \amp \text{Concrete Syntax} \amp \text{Description} \\
\hline
\mathit{Exp}\ni e \amp \syndef \amp \mathtt{Add}[e_1,e_2] \amp e_1+e_2 \amp \text{Addition} \\
\amp | \amp \mathtt{Var}_x \amp x \amp \text{Variable} \\
\end{array}\tag{8.1}
\end{equation}
This example is representative for many situations in programming: There is a category of “things” and different variants of them. Here the category is the syntactical category of expressions and the variants are addition and occurrence of a variable. But you can also think of the category “element of a graphical user interface” and the variants would be “button”, “checkbox field”, “text entry field” and so on. Or, just to give another example, the category could be “input/output stream” and the variants could be “file”, “inter-process pipe”, “network socket”, and so on.
Coming back to our example, we want to provide two operations for these expressions: Creating a textual representation of an expression as a string and evaluating an expression. Listing 8.10.1, Listing 8.10.2, and Listing 8.10.3 show the implementation of this data structure and the operations in OCaml, C, and Java.
OCaml.
Let us first consider the implementation in OCaml. Functional languages traditionally have good support for implementing data types like the one we need for the AST of our expression language. They provide Algebraic Data Types (not to be confused with abstract data types) which allow the programmer to specify the variants of which the data type is composed. Here
Add
and Var
. The operations that operate on values of that data type typically use match
constructs to discriminate the individual cases.C.
Let us now consider the implementation in C (Listing 8.10.2). The C implementation exposes the low-level details of algebraic data types and shows how they might be implemented in a functional language “under the hood”. The struct
exp_t
contains a union containing the data for the two variants of our ADT. The member field op
determines whether the expression is an addition or the occurrence of a variable. Which variant is represented is set by the corresponding constructor (exp_init_var
and exp_init_add
).The implementations for expression evaluation (in
exp_eval
) and printing (in exp_print
) use this field to execute the corresponding variant of the code. These two functions thus contain the corresponding code for all expression variants. A variant's implementation is therefore scattered over many places in the C code: The enum that lists all variants, a member field in the union for the data and a branch in every function implementing an operation of the data type.Since C does not have inheritance, the
exp_t
struct collects fields for all subclasses. The lack of classes requires branches in case distinctions for the method implementations (see exp_eval
and exp_print
). These functions can access all fields, meaning that there is no encapsulation between the Add
and Const
classes.Java.
The Java implementation uses the object-oriented way of implementing algebraic data types. An interface declares all operations that can be applied to expressions. Every class implementing this interface corresponds to a variant. Every aspect of this variant (implementation of operations, initialization, data) are gathered in the class. Subtyping ensures that such a class can be used whenever an object of type
Exp
is expected. The value of the field op
in the C implementation corresponds to the concrete type of the object in the Java implementation. The resolution of the overridden methods replaces the case distinctions in the C functions exp_eval
and exp_print
. The code that handles this internally is, without further ado by the programmer, inserted automatically by the compiler.