Trees

Section 5.2 Trees

Trees are an important data structure to represent data hierarchically. Trees generalize lists in that one node can have multiple (possibly no) successors, called children. However, each node has exactly one parent node, except for the root of the tree which has no parent. Trees are used in a large variety of contexts. Most notably perhaps are they used to implement sets and maps.

Subsection 5.2.1 General Notions and Algorithms

For each tree, there is a directed graph in which the nodes of the graph are the tree nodes and there is an edge in the graph from node \(a\) to a node \(b\) if \(b\) is a child of \(a\text{.}\) A node in a tree that has no children is called a leaf. A node \(a\) is a descendant of another node \(b\) if there is a path of edges from \(a\) to \(b\) in the tree's graph. A tree is a binary tree if each node has at most two children. A tree is labelled if each node carries additional information, such as a number or a string for example. That label is often also referred to as a key. It is very common to demand that the children of a node are ordered so that it actually matters if a child is left or a right child in a binary tree, for example. So in fact, when talking about trees, we most often talk about ordered labelled trees.

Figure 5.2.1. A binary tree with three leaf nodes.

Implementation.

The most straightforward implementation of a tree is to have a struct containing fields for the label(s) and pointers to the children of a node. It is common that the number of possible children is fixed, e.g. in a binary tree there are at most two children. In that case, the child pointers can be either individual fields in that struct (like a left and right field in the case of a binary tree) or an array of constant size. A child pointer is set to NULL if the node does not have any children. If each child pointer equals NULL then the node is a leaf.

 struct binary_tree_t {
  int key;
  struct binary_tree_t *left;
  struct binary_tree_t *right;
};

struct tree_t {
  int key;
  struct tree_t *children[10];
};

Remark 5.2.2. Sets as Maps.

A set can be seen as a map where

the domain of the map is the set
the codomain of the map is a singleton set (a set with one member)
the map maps each set element to the singleton

So, often one implements the more general map case where each tree node has a key and a value. The key acts as the label of the node and the value is the value the key is mapped to. When one is just interested in a set, the value is just set to a default value (like 0 or NULL for example).

Height, Depth, Size.

The height of a node is the length of the longest path from that node to a leaf. The height of the tree is the height of root. Computing the height can, like many tree algorithms, be done nicely in a recursive fashion.

int binary_tree_height(binary_tree_t* n) {
  if (n)
    return 1 + max(binary_tree_height(n->left), 
                   binary_tree_height(n->right));
  else
    return 0;
}

The depth of a node is its distance (in edges) from the root.

The size of the tree is the number of its nodes.

Subsection 5.2.2 Binary Search Trees

A binary search tree (BST) is an ordered labelled binary tree that requires the existence of a total order on the nodes' keys. For example, the labels could be integers and the total order the standard less-than comparison.

Definition 5.2.3. Binary Search Tree Property.

A tree is a binary search tree, if for each node \(n\) with label \(\ell\) it holds that

the key of each node in the left subtree of \(n\) is less than \(\ell\)
\(\ell\) is less than the key of each node in the right subtree of \(n\)

By this definition there exist no two nodes in a binary search tree that have the same key.

Figure 5.2.4. A binary search tree. Every node left of the root is smaller than seven. Every node on the right is larger.

Searching.

Searching for a key in a binary search tree can be nicely done using a tail-recursive algorithm. First, we compare the key \(k\) that is searched with the key of the root. If they are not equal, the search tree property tells us if we must look for the key in the left or the right subtree. If \(k\) is smaller than the root's key, all nodes in right subtree have greater keys than \(k\text{,}\) so if \(k\) is in the key, it will be in the left subtree (and vice versa).

This procedure will take at most as many steps as the tree is high. If the binary search tree is balanced, the height of the tree is bounded by the binary logarithm of its size. A node is balanced, if the height of its left subtree differs by at most one from the height of its right subtree. A binary tree is balances if all its nodes are balanced.

Checkpoint 5.2.5. Balanced Binary Trees.

Prove:

For a balanced tree of height \(h\) and size \(n\text{,}\) holds:

\begin{equation*} h \leq 2\log_2 n\text{.} \end{equation*}

202507111614

Solution.

We first prove the dual formulation of the above statement:

\begin{equation*} n \geq 2^{h/2}\text{.} \end{equation*}

To do this, we introduce the minimal size of a balanced tree of height \(h\) as \(s_{min}(h)\text{.}\)

\(h = 0:\)

The tree is a single node.

\begin{equation*} s_{min}(h) := 1\text{.} \end{equation*}

\(h = 1:\)

The tree is either a root with a single child or a root node with two children. The first case has less nodes, so we define:

\begin{equation*} s_{min}(h) := 2\text{.} \end{equation*}

\(h-1,h\rightarrow h + 1:\)

By the definition of height, one subtree has height \(h-1\text{.}\) Using proof by contradiction, we can show that the other subtree has height \(h-2\) as the size \(s_{min}\) would not be minimal otherwise. By recursion, we obtain the minimal sized trees for \(h-1\) and \(h-2\text{.}\)

\begin{equation*} s_{min}(h+1) := s_{min}(h) + s_{min}(h-1) + 1\text{.} \end{equation*}

By definition of \(s_{min}\text{,}\) we have for every tree of size \(n\) and height \(h\text{:}\)

\begin{equation*} n \geq s_{min}(h)\text{.} \end{equation*}

We show \(s_{min} \geq 2^{h/2}\) to obtain \(n\geq 2^{h/2}\) by transitivity.

By complete induction on \(h\text{:}\)

\(h = 0:\)

\begin{equation*} s_{min}(h) = s_{min}(0) = 1 \geq 1 = 2^{0} = 2^{h/2}\text{.} \end{equation*}

\(h = 1:\)

\begin{equation*} s_{min}(h) = s_{min}(1) = 2 \geq \sqrt{2} = 2^{0.5} = 2^{h/2}\text{.} \end{equation*}

\(h = 2:\)

\begin{align*} s_{min}(h) \amp= s_{min}(h-1) + s_{min}(h-2) + 1\\ \amp = s_{min}(1) + s_{min}(0) + 1\\ \amp = 2 + 1 = 3 \geq 2 = 2^{1}\\ \amp = 2^{h/2} \end{align*}

\(h-1,h\rightarrow h + 1:\)

Induction hypothesis: \(s_{min}(h-1)\geq 2^{(h-1)/2} = 2^{h/2-1}\)

\begin{align*} s_{min}(h+1) \amp= s_{min}(h) + s_{min}(h-1) + 1\\ \amp = (s_{min}(h-1) + s_{min}(h-2) + 1) + s_{min}(h-1) + 1\\ \amp = 2s_{min}(h-1) + s_{min}(h-2) + 2\\ \amp \geq 2s_{min}(h-1) \\ \amp \overset{IH}{\geq} 2\cdot 2^{h/2-1}\\ \amp = 2^{h/2} \end{align*}

Lastly, we obtain from \(n\geq 2^{h/2}\) the inequality:

\begin{equation*} h \leq 2\log_2(n)\text{.} \end{equation*}

binary_tree_t* binary_tree_search(binary_tree_t* root, int k) {
  if (!root || root->key == k)
    return root;
  else if (k < root->key)
    return binary_tree_search(root->left, k);
  else
    return binary_tree_search(root->right, k);
}

Insertion.

One simple way to insert a key into binary search tree while maintaining the search tree property is to insert new nodes as leaves. to this end, we need to find the suitable current leaf under which the new node has to be inserted. This is very similar to searching for the node but instead of returning NULL when the key was not found, we return the leaf at which the search ended. Under this leaf the new node will be added to the tree. Either as its left child if the key to be inserted is smaller than the key of the leaf or as its right child otherwise.

binary_tree_t* binary_tree_insert(binary_tree_t** root, int k) {
  binary_tree_t* y = NULL;
  binary_tree_t* x = *root;

  while (x) {
    // y now points to the parent of x
    y = x;
    if (k == x->key)
      return x;
    else if (k < x->key)
      x = x->left;
    else
      x = x->right;
  }

  // The key is not in the tree, so allocate a new tree node
  binary_tree_t* new_node = malloc(sizeof(*new_node));
  new_node->left = NULL;
  new_node->right = NULL;
  new_node->key = k;

  if (y) {
    // y points to a leaf
    if (k < y->key)
      y->left = new_node;
    else 
      y->right = new_node;
  }
  else {
    // if y is NULL, then the tree is empty
    *root = new_node;
  }

  return new_node;
}

Note that insertion into a balanced binary search tree can destroy the balanced-ness property. There exist various extensions to binary search trees (such as AVL trees ⁹ or red-black trees ¹⁰) that preserve balanced-ness upon insertion and deletion by modifying the tree accordingly. These are out of scope of this book and we refer to a standard algorithms and data structures text (e.g. [4]) for more detailed information.

Subsection 5.2.3 Tries

A trie (sometimes also called prefix tree), is a search tree specifically designed to implement maps (and therefore sets, see Remark 5.2.2) where the keys are words (sequences) of ‘items’ that are drawn from a given set \(\Sigma\text{.}\) Here, a word could for example be a string (a sequence of characters) or also a number (as a sequence of digits, see Chapter 1).

One possibility to implement such a map would be to use the words directly as keys in a binary search tree. However, comparing two sequences boils down to comparing individual elements in practice. For example, consider comparing "Helmet" against "Hello": We need to compare three characters (H, e, l) before we can discriminate both words because both words share the common prefix "Hel". So comparing two sequences takes in the worst case as long as the shorter of both words. Since we might need to visit several nodes of the search tree until we either find the sequence in question or verified that it is not contained in the map, the overall worst-case runtime is the maximum depth of the BST times the length of the sequence we are looking for.

Figure 5.2.6. The search of "Hello" in a binary search tree. For every comparison, the deciding letter is underlined.

Tries reduce this worst-case runtime to the length of the sequence that is searched for by using a different kind of tree structure. The idea is that for each word in the map (set), there is exactly one path from the trie's root to a leaf and vice versa. So each node in a trie has at most \(|\Sigma|\) children and each the edge stands for one occurrence of an ‘item’ in a word. More precisely, consider a path \(\pi\) from the trie's root to a leaf and let \(n_i\) be the \(i\)-th node on this path. If \(n_i\) is not a leaf and \(n_{i+1}\) is the \(j\)-th child of \(n_i\text{,}\) then the \(i\)-th item in the word that the path corresponds to, is \(j\text{.}\)

Figure 5.2.7. The tree from above but as trie. The path \(\pi\) is marked.

To illustrate inserting and searching in a trie, we will use a trie that implements a map from ints to some other data type T. So, a word now is the sequence of bits in an int. Therefore, each node in the trie will have at most two children (each corresponding to the digits 0 and 1). An int that is in the map the trie implements is therefore represented by a path that is as long as the number of bits of an int (sizeof(int) * 8).

Searching.

When searching for a key in the map, we iteratively visit nodes in a path that starts at the root. In each step, we use the next bit in the key to identify the next child node to visit: If the next bit is 0, we pursue the child 0, if it is 1, we pursue the child 1. If the identified child node exists in the trie, we visit it, if not (indicated by a NULL pointer), the key is not in the trie and the search returns false. If we succeed to visit as many nodes as the key has bits, the key is in the map.

int_trie_t* int_trie_search(int_trie_t* root, int key) {
  int n = sizeof(key) * 8;
  while (n > 1) {
    _Assert (root != NULL, "A correctly initilized tree can not be NULL.");
    int_trie_t* next = root->next[key & 1];
    if (! next)
        return NULL;
    root = next;
    key = key >> 1;
    n--;
  }

  _Assert(n == 1, "We should be at the leaves in the end.");
  return root;
}

Insertion.

Insertion is very similar to searching. The only difference is that if we want to visit a child that is not in the trie, we do not abort and return false but create a new node and insert it.

int_trie_t* int_trie_insert(int_trie_t* root, int key, T value) {
  int n = sizeof(key) * 8; // number of bits in key
  while (n > 0) {
    _Assert(root != NULL, "A correctly initilized tree can not be NULL.");
    int_trie_t* next = root->next[key & 1];
    if (! next) {
        next = int_trie_create();
        root->next[key & 1] = next;
    }
    root = next;
    key = key >> 1;
    n--;
  }
  root->value = value;
  return root;
}

wikipedia.org

An Introduction to Imperative Programming

Search Results:

Section 5.2 Trees

Subsection 5.2.1 General Notions and Algorithms

Implementation.

Remark 5.2.2. Sets as Maps.

Height, Depth, Size.

Subsection 5.2.2 Binary Search Trees

Definition 5.2.3. Binary Search Tree Property.

Searching.

Checkpoint 5.2.5. Balanced Binary Trees.

Insertion.

Subsection 5.2.3 Tries

Searching.

Insertion.