The Java Collections Framework

Section 8.9 The Java Collections Framework

One very helpful component of the Java standard library is the extensive Java Collections Framework ¹⁷ that provides implementations of many commonly used data structures. Some of them we have already discussed in Chapter 5. In this section, we only give a very high-level overview and leave out many fine-grained details that the avid reader can obtain from the official documentation.

The collections framework is organized as a class hierarchy that provides interfaces for many common abstract data types and several implementations of them. The top interface is called Collection<T>¹⁸ which essentially stands for a multi-set, i.e. something similar to a set except for the fact that elements can appear multiple times in it. It is different from a list in that elements are not ordered. A collection provides, among other things, the following operations: add, remove, (check if it) contains (an element), size, iterate (over all elements):

interface Collection<E> {
  boolean add(E elem);     // Ensure that the collection
                           // contains element after call
  boolean remove(E elem);  // Remove a single instance of 
                           // element if coll contains it
  void clear();            // Remove all elements
  int size();              // Get size
  Iterator<E> iterator(); // Get an iterator
  // more methods ...
}

From the base interface collection, more detailed interfaces and classes are derived: lists, sets, queues, deques (double-ended queues). Queues are data structures where elements can be added to and removed from. Depending on the kind of queue this happens in a first-in last-out style similar to a list or according to a completely different policy as in priority queues. We will not go into further detail about queues here. In this text, we focus on lists and sets because we have already discussed some of them. We have discussed array lists in Subsection 5.1.1 and doubly-linked lists in Subsection 5.1.3. The Java implementation follows our discussion quite closely. The tree sets are self-balancing binary search trees and hash sets are based on the techniques discussed in Section 5.3.

Figure 8.9.1. Coarse overview over some of the most important classes in the Java Collections Framework. We left out some intermediary classes and other data structures we have not discussed here.

Lists ¹⁹ extend collections by imposing an order on the inserted elements. Adding will append to the list, removing will remove the first instance of the list. Additionally, one can also obtain the i-th element of a list.

interface List<E> implements Collection<E> {
  E get(int index);
  // more methods ...
}

Sets ²⁰ don't add any new relevant methods but enforce “set semantics” most importantly that adding an element that is already contained a second time does not change the set. Furthermore, there is also no get(int index); method because sets are not ordered.²¹

Subsection 8.9.1 Maps

A map assigns elements from one set, so-called keys, an element from another set, called the value set. Similar to mathematical maps, each key can be assigned to at most one value. Sets can be seen as a special case of maps where the value set is a singleton set (containing only one element).

In the Java collection framework, maps are no collections but form a class hierarchy of their own with the interface Map<K, V> at its top.

interface Map<K, V> {
  V put(K key, V value);  // Associate key with val
                          // Return old association or null
  boolean remove(K key);  // Remove association from key
  int size();             // Number of keys in map
  V get(K key);           // Get value for key or null
  boolean containsKey(K key); // check if there's an assoc for key
}

Subsection 8.9.2 Hash Sets

We discuss data structures and algorithms for hashing in Section 5.3. Here, we discuss specific details that are relevant to use the hash sets and maps in the Java collection framework.

When adding an object o to a hash table (either a set or a map), Java invokes o.hashCode() (which is defined in Object) to compute the hash code for the object. Since Java collections also use equals when searching for an entry in a collection, equals and hashCode have to be consistent in the following way:

Definition 8.9.2. hashCode Consistency.

Assume that q and p are both objects of a class \(T\text{.}\) Then, whenever p.equals(q) == true it must hold that p.hashCode() == q.hashCode().

Note that the opposite is in general not always true since it would imply that \(T\) is isomorphic to int.

As a rule of thumb, you should, whenever you want to override equals or hashCode also override the other and check that both are consistent with respect to Definition 8.9.2.

To understand why it is important to maintain hashCode consistency, consider this class which violates hashCode consistency:

public class Fraction {
  private int numerator, denominator;

  public boolean equals(Object o) {
    if (! (o instanceof Fraction))
      return false;
    Fraction f = (Fraction) o;
    return this.numerator * f.denominator
        == f.numerator * this.denominator;
  }

  public int hashCode() { return this.numerator; }
}

The two fractions \(1/2\) and \(2/4\) are clearly equal according to the implementation of equal in Fraction. However, in each hash table longer than 2, they end up in different buckets which has the effect that after inserting \(1/2\text{,}\) contains(new Fraction(2, 4)) will return false.

Subsection 8.9.3 Iteration

One aspect that is particularly noteworthy about collections is iteration. Each collection can create iterators which are objects that capture the state of an iteration over all elements in a collection. Based on the particular collection (e.g. list, set, etc.) the order of iteration is defined or not. An iterator ²² provides three main methods:

interface Iterator<E> {
  boolean hasNext(); // check, if the iterator is valid
  E next();          // if valid, return current element 
                     // and proceed to next
  void remove();     // if valid, remove current element
                     // and proceed to next
}

For example, the iterator of an array list could look like this:

public class ArrayList<E> implements List<E> {
  private E[] elements;
  private int size;

  // ...

  public Iterator<E> iterator() {
    return new Iterator() {
      private int i = 0;
      public E next() {
        return elements[i++];
      }

      boolean hasNext() {
        return i < size;
      }

      // ...
    };
  }
}

Since iteration is very frequent, Java offers a special for-loop to iterate over collections (and also arrays):

Set<Vec2> s = ...;

for (Vec2 v : s) {
  System.out.println(v.length());
}

This is (more or less) equivalent to this more verbose piece of code:

Set<Vec2> s = ...;

for (Iterator<Vec2> it = s.iterator(); it.hasNext(); ) {
  Vec2 v = it.next();
  System.out.println(v.length());
}

Subsection 8.9.4 Genericity

The collections framework extensively uses genericity in the form of type variables. These provide an upper bound on the type of the objects that can be added to the collection. For example:

Set<Vec2> s = new HashSet<Vec2>();

The type Set<Vec2> makes sure that only objects that are a Vec2 can be added to s. Accordingly, when e.g. iterating over the set, the type variable ensures that the type of each object in the set is at least Vec2.

Because type variables have only been added to Java later, they have not made it into Java byte code. The Java compiler replaces type variables by Object when compiling the Java program into Java byte code in a process called type erasure. This is the cause of some inconsistencies that occasionally cause trouble. Most prominently has Java been designed with covariant arrays, i.e. you can write Vec2[] o = new PolarVec2[10];. This requires the language to perform dynamic type checks when you assign to an array. For example, when writing o = new CartesianVec2(...); an ArrayStoreException has to be thrown. This in turn requires that the array knows its element type at run time to be able to perform this check. However, since type variables are erased to Object, you cannot create arrays with type variables, i.e. the following is not possible: T[] a = new T[10];

docs.oracle.com

There is however an interface OrderedSet.