One very helpful component of the Java standard library is the extensive Java Collections Framework 18 that provides implementations of many commonly used data structures. Some of them we have already discussed in Chapter 5. In this section, we only give a very high-level overview and leave out many fine-grained details that the avid reader can obtain from the official documentation.
The collections framework is organized as a class hierarchy that provides interfaces for many common abstract data types and several implementations of them. The top interface is called Collection<T> 19 which essentially stands for a multi-set, i.e. something similar to a set except for the fact that elements can appear multiple times in it. It is different from a list in that elements are not ordered. A collection provides, among other things, the following operations: add, remove, (check if it) contains (an element), size, iterate (over all elements):
interface Collection<E> {
boolean add(E elem); // Ensure that the collection
// contains element after call
boolean remove(E elem); // Remove a single instance of
// element if coll contains it
void clear(); // Remove all elements
int size(); // Get size
Iterator<E> iterator(); // Get an iterator
// more methods ...
}
From the base interface collection, more detailed interfaces and classes are derived: lists, sets, queues, deques (double-ended queues). Queues are data structures where elements can be added to and removed from. Depending on the kind of queue this happens in a first-in last-out style similar to a list or according to a completely different policy as in priority queues. We will not go into further detail about queues here. In this text, we focus on lists and sets because we have already discussed some of them. We have discussed array lists in Subsection 5.1.1 and doubly-linked lists in Subsection 5.1.3. The Java implementation follows our discussion quite closely. The tree sets are self-balancing binary search trees and hash sets are based on the techniques discussed in Section 5.3.
Lists 20 extend collections by imposing an order on the inserted elements. Adding will append to the list, removing will remove the first instance of the list. Additionally, one can also obtain the i-th element of a list.
interface List<E> implements Collection<E> {
E get(int index);
// more methods ...
}
Sets 21 don't add any new relevant methods but enforce “set semantics” most importantly that adding an element that is already contained a second time does not change the set. Furthermore, there is also no get(int index); method because sets are not ordered. 22
Subsection8.9.1Maps
A map assigns elements from one set, so-called keys, an element from another set, called the value set. Similar to mathematical maps, each key can be assigned to at most one value. Sets can be seen as a special case of maps where the value set is a singleton set (containing only one element).
In the Java collection framework, maps are no collections but form a class hierarchy of their own with the interface Map<K, V> at its top.
interface Map<K, V> {
V put(K key, V value); // Associate key with val
// Return old association or null
boolean remove(K key); // Remove association from key
int size(); // Number of keys in map
V get(K key); // Get value for key or null
boolean containsKey(K key); // check if there's an assoc for key
}
Subsection8.9.2Hash Sets
We discuss data structures and algorithms for hashing in Section 5.3. Here, we discuss specific details that are relevant to use the hash sets and maps in the Java collection framework.
When adding an object o to a hash table (either a set or a map), Java invokes o.hashCode() (which is defined in Object) to compute the hash code for the object. Since Java collections also use equals when searching for an entry in a collection, equals and hashCode have to be consistent in the following way:
Definition8.9.2.hashCode Consistency.
Assume that q and p are both objects of a class \(T\text{.}\) Then, whenever p.equals(q) == true it must hold that p.hashCode() == q.hashCode().
Note that the opposite is in general not always true since it would imply that \(T\) is isomorphic to int.
As a rule of thumb, you should, whenever you want to override equals or hashCode also override the other and check that both are consistent with respect to Definition 8.9.2.
To understand why it is important to maintain hashCode consistency, consider this class which violates hashCode consistency:
public class Fraction {
private int numerator, denominator;
public boolean equals(Object o) {
if (! (o instanceof Fraction))
return false;
Fraction f = (Fraction) o;
return this.numerator * f.denominator
== f.numerator * this.denominator;
}
public int hashCode() { return this.numerator; }
}
The two fractions \(1/2\) and \(2/4\) are clearly equal according to the implementation of equal in Fraction. However, in each hash table longer than 2, they end up in different buckets which has the effect that after inserting \(1/2\text{,}\)contains(new Fraction(2, 4)) will return false.
Subsection8.9.3Iteration
One aspect that is particularly noteworthy about collections is iteration. Each collection can create iterators which are objects that capture the state of an iteration over all elements in a collection. Based on the particular collection (e.g. list, set, etc.) the order of iteration is defined or not. An iterator 23 provides three main methods:
interface Iterator<E> {
boolean hasNext(); // check, if the iterator is valid
E next(); // if valid, return current element
// and proceed to next
void remove(); // if valid, remove current element
// and proceed to next
}
For example, the iterator of an array list could look like this:
public class ArrayList<E> implements List<E> {
private E[] elements;
private int size;
// ...
public Iterator<E> iterator() {
return new Iterator() {
private int i = 0;
public E next() {
return elements[i++];
}
boolean hasNext() {
return i < size;
}
// ...
};
}
}
Since iteration is very frequent, Java offers a special for-loop to iterate over collections (and also arrays):
Set<Vec2> s = ...;
for (Vec2 v : s) {
System.out.println(v.length());
}
This is (more or less) equivalent to this more verbose piece of code:
Set<Vec2> s = ...;
for (Iterator<Vec2> it = s.iterator(); it.hasNext(); ) {
Vec2 v = it.next();
System.out.println(v.length());
}
Subsection8.9.4Genericity
The collections framework extensively uses genericity in the form of type variables. These provide an upper bound on the type of the objects that can be added to the collection. For example:
Set<Vec2> s = new HashSet<Vec2>();
The type Set<Vec2> makes sure that only objects that are a Vec2 can be added to s. Accordingly, when e.g. iterating over the set, the type variable ensures that the type of each object in the set is at least Vec2.
Because type variables have only been added to Java later, they have not made it into Java byte code. The Java compiler replaces type variables by Object when compiling the Java program into Java byte code in a process called type erasure. This is the cause of some inconsistencies that occasionally cause trouble. Most prominently has Java been designed with covariant arrays, i.e. you can write Vec2[] o = new PolarVec2[10];. This requires the language to perform dynamic type checks when you assign to an array. For example, when writing o = new CartesianVec2(...); an ArrayStoreException has to be thrown. This in turn requires that the array knows its element type at run time to be able to perform this check. However, since type variables are erased to Object, you cannot create arrays with type variables, i.e. the following is not possible: T[] a = new T[10];