Section C.20 A3: Java Collections Framework and Hashing
Short Overview over the Java Collection Framework
The Java Collections Framework is a library of of data structures. Some we have discussed in Chapter 5
Discuss the basic interfaces
Java has special syntax for Iterators, briefly discuss the Iterator interface and explain for loop.
Discussed binary search trees as an implementation for maps/sets.
Recap that implementing maps subsumes sets
New idea that is popular in practice: Use function to map keys to array indexes.
Works well if function is injective: Hardly the case in practice because key set is too large and would be dependent on array size.
Non-injective function means that we have collisions.
First idea: Use collision chains/lists to list all collisions per bucket.
Discuss simple code to search/add in a hash table with separate chaining.
Discuss the hashCode/equals problem. Refer to Java Collections Framework.
Introduce load factor. If load factor less than one and hash function scatters elements evenly, time for search constant.
Needs however resizing of hash table upon insertion based on a threshold.
Resizing of hash table requires re-hashing: keys can end up in different bins.
Mention problem of mutability: Mutating objects such that their hash value changes after they have been inserted may make them “vanish”.
Introduce probing. Chaining conceptionally nice but linked lists require more memory and provoke not so local accesses; bad for the cache.
New hash function that takes hash value and index into collision chain: determines array index.
Linear probing: simple, contiguous accesses but easy cluster formation.
Quadratic probing: more complex, not so contiguous but maybe less cluster formation. Interesting question if the it fully utilizes table, i.e. if probing hash function is surjective on array. Yes, if \(c=d=1/2\) and table size is power of two.
Final example with all three presented techniques.