Skip to main content

Section 8.11 The Expression Problem and the Visitor Pattern

In this section, we will investigate what is known as the “expression problem” [14]. The expression problem refers to a fundamental and old problem in software engineering that is concerned with extending a data type (like the one we used in the previous section 8.10) in two dimensions: (1) adding new variants to a data type and (2) adding new functionality to all variants. It is interesting to study the expression problem because it exposes specific differences in functional and object-oriented programming languages which both tend to support one of the two dimensions of extensions better than the other.

Let us stick to the example of modeling the AST of a small expression language (8.1) and let us reconsider the OCaml (Listing 8.10.1) and Java (Listing 8.10.3) implementations.

Subsection 8.11.1 Extensions

Now, let us discuss two extensions to this data type. The first extension is to add a new variant, let's say for representing constants in the expression. The second extension will be to add a new operation to all variants, let's say formatting the AST as an XML document. Both languages handle these extensions differently.

Adding a new Variant.

Let us add a constant to our simple language. In the Java implementation this corresponds to a new class Const that implements the Exp interface. In this class, all relevant operations are implemented as methods. The existing code does not need to be modified, hence this extension is local.

class public Const implements Exp {
    private int value;

    public Const(int value) {
        this.value = value;
    }

    public int eval(Env e) {
        return this.value;
    }

    public String toString() {
        return "" + this.value;
    }
}

In the OCaml implementation however, the extension is not local because it involves changing the existing code at various places. First, a new variant has to be added to the exp type. Then, the match branches of each existing function that operates on exp values have to be extended:

type exp =
  | Var of string
  | Add of exp * exp
  | Const of int

let rec to_string e = 
  match e with
  | Var v -> v
  | Add (l, r) -> (to_string l) ^ " + " ^ (to_string r)
  | Const i -> string_of_int i

let rec eval e st =
  match e with
  | Var v -> st v 
  | Add (l, r) -> (eval l st) + (eval r st)
  | Const i -> i

Adding a new Operation.

When adding a new operation, the locality of change is just the other way around. Let us explore this by extending our implementation by a feature that allows for formatting an AST as an XML document. In the OCaml implementation, it is sufficient to add a new function to_xml that contains the implementation of this feature for all existing variants. This is a local change because none of the existing code needs to be modified.

 | Const i -> i

let rec to_xml e =
  match e with
  | Var v -> "<var>" ^ v ^ "</var>"
  | Add (l, r) -> "<add>" ^ (to_xml l) ^ (to_xml r) ^ "</add>"
  | Const i -> "<const>" ^ (string_of_int i) ^ "</const>"

In the Java implementation however, this extension is non-local. It requires to add a new method toXML to the interface and each class that implements it.

public interface Exp {
    public int eval(Env e);
    public String toString();
    public String toXML();
}

public class Add implements Exp {
    private Exp left, right;
    // ...
    public String toXML() {
      return "<add>" + left.toXML() + right.toXML() + "</add>";
    }
}

public class Var implements Exp {
    private String name;
    // ...
    public String toXML() {
      return "<var>" + name + "</var>";
    }
}

public class Const implements Exp {
    private int value;
    // ...
    public String toXML() {
      return "<const>" + value + "</const>";
    }
}

The expression problem now is a challenge in programming language design to support both kinds of extensions in a local way.

Subsection 8.11.2 The Visitor Pattern

One way of making the second extension (adding new operations) in an object-oriented language more local and thus more bearable is the visitor pattern 24 . The visitor pattern factors out the extension of a class hierarchy to a single extension point, the so-called visitor.

Let us make a first attempt to extend the Exp data type with a new operation in a local way by reusing the example above: adding an XML printer.

public class XMLFormatter {
    public String format(Exp e) {
        if (e instanceof Const) {
            Const c = (Const) e;
            return "<const>" + c.getValue() + "</const>";
        }
        else if (e instanceof Var) {
            Var v = (Var) e;
            return "<var>" + v.getName() + "</var>");
        }
        else if // etc.
    }
}

This code isolates the extension in one class XMLFormatter. However, the if-then-else case distinction using instanceof seems very awkward because it involves dynamic casts which should only be a last resort. This code is also bad for several other reasons. For one and maybe most problematic: it is easy to forget cases without causing any warning by the compiler. Furthermore, the code is inefficient, since potentially many instanceof conditions need to be checked before the matching case is found. Lastly, the order of the ifs affects the run time in ways that can lead to surprising performance behaviors.

The better solution is to cleanly separate the code for each subclass into its own method:

public class XMLFormatter {
    public String format(Const c) {
        write("<const>" + c.getValue() + "</const>");
    }
    public String format(Var v) {
        write("<var>" + v.getName() + "</var>");
    }
    // the remaining cases...
}

We now however need to ensure that these methods are also called. This is problematic since we might only know the static type Exp and not the concrete type at the desired call site:

public class Main {
    public static void main(String[] args) {
        Exp e = constructExpressionFromInput();
        System.out.println(new XMLFormatter().format(e));
    }
}

This code will not work since no format(Exp) method is available in XMLFormatter. We would therefore again need to introduce a case distinction to obtain the concrete type of the objects, insert an explicit type cast, and then call the corresponding method which would bring us back to square one.

We can however eliminate the case distinction with an indirection. The object e from the previous code section does know its concrete type. We therefore extend the interface Exp with a single additional method accept, which only calls another method visit of an interface ExpVisitor with this as an argument.

public interface ExpVisitor<T> {
    T visit(Const c);
    T visit(Var c);
    T visit(Add c);
}

public interface Exp {
    <T> T accept(ExprVisitor<T> v);
}

public class Const implements Exp {
    // ...
    <T> public T accept(ExpVisitor<T> v) {
        return v.visit(this);
    }
}

public class Var implements Exp {
    // ...
    <T> public T accept(ExpVisitor<T> v) {
        return v.visit(this);
    }
}

public class Add implements Exp {
    // ...
    <T> public T accept(ExpVisitor<T> v) {
        return v.visit(this);
    }
}

XMLFormatter now implements the ExpVisitor interface and provides methods for every case:

public class XMLFormatter implements ExpVisitor<String> {
    // ...
    public String visit(Const c) {
        return "<const>" + c.getValue() + "</const>";
    }

    public String visit(Var v) {
        return "<var>" + v.getName() + "</var>";
    }

    public String visit(Add b) {
        return "<add>" 
             + b.getLeft().accept(this)
             + b.getRight().accept(this)
             + "</add>";
    }
}

The main method can then pass an XMLFormatter object as an argument to the accept method:

public class Main {
    public static void main(String[] args) {
        Exp e = constructExpressionFromInput();
        e.accept(new XMLFormatter());
    }
}

The whole point of the visitor is to use dynamic type information to select the appropriate accept method in the visitor object. At first sight, it seems superfluous to implement the accept method in every subclass of Exp. However, this is where the selection is made: The static type of this in each method of a class \(T\) is \(T\text{.}\) In the accept methods, the static type of this is used to select one of the overloaded visit methods. Hence, if the dynamic type of an object \(o\) is \(T\) in some execution, o.accept(visitor) will call the method visit(T obj) of the visitor class. This technique avoids an instanceof if-then-else cascade by using the accept/visit method call sequence to promote dynamic type information into static one.

To summarize, object-oriented languages have an inherent weak spot when extending a class hierarchy with new operations. They force the programmer to perform non-local changes to the code because the new operation has to be added to each subclass. The visitor pattern is suitable to solve this problem because it allows to factor out the new functionality into a separate visitor class. It uses the accept/visit call sequence to dispatch the desired visit method in the visitor object based on the dynamic type of the “visited” object. Thereby it nicely implements a match-like operation over the dynamic type of an object without resorting to if-then-else cascades.

The visitor pattern is called a pattern because it appears in a larger library of so-called design patterns that are best practices in the design of (object-oriented) software to solve specific, recurring problems.