Skip to main content

Section 5.4 Dynamic Programming

A very common technique in algorithm design is divide and conquer. Hereby, the problem to solve is split into several sub-problems that are solved individually. After they have been solved, their solutions are combined to a solution into the original problem. A prominent example for a divide-and-conquer algorithm is merge sort. Merge sort splits the array in half, applies itself recursively to each of the halves and recombines the two sorted halves into a fully sorted array using a linear-time algorithm.

In some problems we want to solve, there is however the situation that some of the sub-problems appear several times, i.e. we have overlapping sub-problems. A straightforward recursive implementation would recompute their solution over and over again. To avoid unnecessary recomputation, we can use memory to memoize the result of the first solution of such a subproblem and reuse the memoized value in any further situation that solution of the sub-problem is required.

Dynamic programming is a programming technique to solve certain mathematical optimization problems using memoization. An optimization problem is a problem where there are multiple valid solutions of which each has a certain cost. One is then interested in finding the best solution with respect to that cost. For example, in route planning there may be several routes that get you from \(s\) to \(t\text{.}\) However each route has a certain length and so there will be shorter and longer routes. Hence, routing is an optimization problem because one is interested in finding the shortest route.

Dynamic programming is applicable if the optimization problem has so-called optimal substructure. This means that problem can be, in a way similar to divide-and-conquer, partitioned into sub-problems and the optimal solution of the original problem can be efficiently obtained from the optimal solutions of the sub-problems. Finding the shortest route/path from a node \(s\) in a directed graph to another node \(t\) is an optimization problem that possesses optimal substructure: Having found the shortest path from \(s\) to each predecessor of \(t\) allows us to quickly identify the shortest path to \(t\) itself by locally considering all incoming edges of \(t\) (see Figure 5.4.1).

Figure 5.4.1. We can compute the length of the shortest path from \(s\) to \(t\) from the lengths of the shortest paths to \(t\)'s predecessors.

We will first discuss two examples that have overlapping sub-problems to investigate how we can replace straightforward recursion using memoization. Then, we will consider some optimization problems that have optimal substructure and devise dynamic programming algorithms to solve them.

Subsection 5.4.1 Computing Fibonacci Numbers

We start with a simple example that has overlapping sub-problems: Computing Fibonacci numbers. These are recursively defined as follows:

\begin{equation*} F_0\defeq 0\qquad F_1\defeq 1\qquad F_n\defeq F_{n-1}+F_{n-2}\text{ for }n>1 \end{equation*}

A first straightforward recursive implementation looks as follows:

 unsigned fib(unsigned n) {
  if (n <= 1)
    return n;
    return fib(n - 1) + fib(n - 2);

Let's consider the call fib(6). Figure 5.4.2 shows the call tree of fib(6).

Figure 5.4.2. The call tree of fib(6). Each node corresponds to an argument to a call to fib. An edge indicates that the call with the upper argument calls fib with the lower argument.

The number of calls fib(n) creates is \(F_n\text{.}\) For example, \(F_{45}=1,836,311,903\text{.}\) So, assuming that one call takes 10 nanoseconds, fib(45) will take 45 seconds to compute. This can be drastically accelerated because the simple recursive scheme computes the same calls multiple times. For example, fib(4) is computed twice. By memorizing the result of the first call, all subsequent calls with the same argument can be avoided.

unsigned fib_memoize_c(unsigned n, int* table) {
  if (n <= 1)
    return n;
  else if (table[n] == 0)
    table[n] = fib_memoize_c(n - 1, table)
             + fib_memoize_c(n - 2, table);
  return table[n];

unsigned fib_memoize(unsigned n) {
  unsigned table[n+1];
  memset(table, 0, sizeof(table));
  return fib_memoize_c(n, table);

This version uses a table to memoize all results of recursive calls and checks if the result of a call has already been computed in the past. If so, that value is read from the table and not recomputed. However, when looking at the computation of the Fibonacci numbers more closely, one observes that we only need the last two values, so memoizing everything is not necessary. We can achieve the same with an iterative program that uses only two local variables:

unsigned fib_iterative(unsigned n) {
  unsigned f1 = 1;
  unsigned f2 = 0;
  unsigned f  = n;
  while (n-- > 1) {
    f = f1 + f2;
    f2 = f1;
    f1 = f;
  return f;

Subsection 5.4.2 Computing Binomial Coefficients

The binomial coefficient of two natural numbers \(k\le n\) is defined as:

\begin{equation} \binom nk \defeq \frac{n!}{k!(n-k)!}\qquad\text{with } n! \defeq n\cdot (n-1)\cdots 2\text{ and } 0! \defeq 1\tag{5.1} \end{equation}

To compute the binomial coefficient by the formula above, we will need

\begin{equation*} (n - 2) + (k - 2) + (n- k -2) + 2 = 2n -4 \end{equation*}

multiplications. This can be optimized a little by the following observation: The terms \(k!\) and \((n-k)!\) are intermediate results when computing \(n!\text{.}\) Hence, we can memorize their value and re-use it in the computation of \(n!\) like so:

\begin{equation*} \binom nk = \frac{n(n-1)\cdots (n-m+1)}{m!}\qquad\text{with } m \defeq \min(k, n-k)\le n/2 \end{equation*}

Then, we only need \((n-(n-m+1))+(m-2) + 1=2m-2\) multiplications. This means \(n-2\) multiplications in the worst case because \(m\lt n/2\text{.}\)

However, the nominator and denominator can get quite large when computing the binomial coefficient this way. If we assume that we compute with unsigned ints of 32-bit length, the largest \(n\) with \(n!\lt 2^{32}\) is \(n=12\text{.}\) We can empirically determine that the largest binomial coefficient that we can compute with the above equation without overflow is \(\binom{17}{8}=24310\text{.}\) However, the largest binomial coefficient we can represent with 32-bit is way larger.

One way of making the computation of larger binomial coefficients feasible and efficient is to use the following recurrence equation

\begin{equation} \binom nk = \binom{n-1}{k-1}+\binom{n-1}{k}\qquad\qquad \binom n0=\binom nn=1\tag{5.2} \end{equation}

that directly leads to what is known as Pascal's triangle:

\begin{equation*} \begin{array}{r|rrrrrrrrr} \amp k=0 \amp 1 \amp 2 \amp 3 \amp 4 \amp 5 \amp 6 \amp 7 \amp \cdots \\ \hline n = 0 \amp 1 \amp \amp \amp \amp \amp \amp \amp \amp \\ 1 \amp 1 \amp 1 \amp \amp \amp \amp \amp \amp \amp \\ 2 \amp 1 \amp 2 \amp 1 \amp \amp \amp \amp \amp \amp \\ 3 \amp 1 \amp 3 \amp 3 \amp 1 \amp \amp \amp \amp \amp \\ 4 \amp 1 \amp 4 \amp 6 \amp 4 \amp 1 \amp \amp \amp \amp \\ 5 \amp 1 \amp 5 \amp 10\amp 10\amp 5 \amp 1 \amp \amp \amp \\ 6 \amp 1 \amp 6 \amp 15\amp 20\amp 15\amp 6 \amp 1 \amp \amp \\ 7 \amp 1 \amp 7 \amp 21\amp 35\amp 35\amp 21\amp 7 \amp 1 \amp \\ \vdots \amp \vdots \amp \vdots \amp \vdots \amp \vdots \amp \vdots \amp \vdots \amp \vdots \amp \vdots \\ \end{array} \end{equation*}

These recursive equations directly yield a recursive program:

unsigned int binomial(unsigned int n, unsigned int k) {
    if (k > n)
        return 0;
    if (k == 0 || k == n)
        return 1;
    return binomial(n - 1, k - 1) + binomial(n - 1, k);

However, as in the previous section, some of the partial results are re-computed multiple times, because

\begin{equation*} \binom n{k+1}=\binom{n-1}k+\binom{n-1}{k+1} \end{equation*}

and the first summand is also used to to compute \(\binom nk\text{.}\) Without memoization, this program performs exponentially many calls in \(n\text{.}\) Memoizing the partial results then yields a more efficient algorithm that we develop in the following.

From (5.1) follows that

\begin{equation*} \binom nk=\binom n{n-k} \end{equation*}

which is equivalent to saying that Pascal's triangle is symmetrical. This means that we never need to memoize a binomial coefficient larger than \(\lfloor n/2\rfloor\text{.}\) From (5.2) follows that we don't need to compute binomial coefficients larger than \(k\) in order to compute \(\binom nk\text{.}\) So, to compute \(\binom nk\text{,}\) we need a two-dimensional table with \(k\) columns and \(n\) rows. For example, this is the table for computing \(\binom 73\text{:}\)

\begin{equation*} \begin{array}{r|rrrrrrrrr} \amp k=0 \amp 1 \amp 2 \amp 3 \amp 4 \amp 5 \amp 6 \amp 7 \amp \cdots \\ \hline n = 0 \amp 1 \amp {\color{gray} 0} \amp {\color{gray}0} \amp {\color{gray}0} \amp \amp \amp \amp \amp \\ 1 \amp 1 \amp 1 \amp {\color{gray}0} \amp {\color{gray}0} \amp \amp \amp \amp \amp \\ 2 \amp 1 \amp 2 \amp 1 \amp {\color{gray}0} \amp \amp \amp \amp \amp \\ 3 \amp 1 \amp 3 \amp 3 \amp 1 \amp \amp \amp \amp \amp \\ 4 \amp 1 \amp 4 \amp 6 \amp 4 \amp . \amp \amp \amp \amp \\ 5 \amp 1 \amp 5 \amp 10\amp 10\amp . \amp . \amp \amp \amp \\ 6 \amp 1 \amp 6 \amp 15\amp 20\amp . \amp . \amp . \amp \amp \\ 7 \amp 1 \amp 7 \amp 21\amp 35\amp . \amp . \amp . \amp . \amp \\ \end{array} \end{equation*}

The following code implements the memoization technique to compute binomial coefficients:

unsigned int binomial_table(unsigned int N, unsigned int K) {
    if (K > N)
        return 0;
    if (K > N - K)
        K = N - K;
    if (K == 0)
        return 1;

    unsigned table[N + 1][K + 1];
    for (unsigned n = 0; n <= N; n++) {
        table[n][0] = 1;
        unsigned const end = n < K ? n : K;
        for (unsigned k = 1; k <= end; k++)
            table[n][k] = table[n - 1][k] + table[n - 1][k - 1];
        for (unsigned i = end + 1; i <= K; i++)
            table[n][i] = 0;
    return table[N][K];
Note that we could even further optimize this code because in order to compute the k-th row of the table, we only need the previous row. So we actually don't need to memoize all rows but just two and “double-buffer” them.

Subsection 5.4.3 Computing the Minimum Edit Distance

Let us turn to our first real dynamic programming algorithm. Here, we are not only considering overlapping sub-problems but also optimal substructure.

Given two words \(a\) and \(b\text{,}\) the minimum edit distance (sometimes also called the Levenshtein distance) between \(a\) and \(b\) is the minimum number of edit operations (delete/replace/add a character) to transform \(a\) to \(b\text{.}\) Solving edit distance problems is relevant in fuzzy string search settings, such as spell checking. Also, problems like computing genome sequence alignments are related very closely to edit distance problems.

Let's consider an example:

\begin{equation*} a=\mathtt{tore}\qquad b=\mathtt{tag} \end{equation*}

Trivially, we can transform tore into tag by four delete and three add operations: First delete all letters and then add the three letters t, a, g. This results in seven edit operations. So each sequence of edit operations that takes us from tore to tag has a certain cost, i.e. the number of additions, replacements, and deletions.

However, we are interested in the shortest/minimum sequence of edits which in this example has three edit operations: Keep t (no edit operations), replace o by a, replace r by g and delete e.

Let us discuss the edit distance problem more formally.

Definition 5.4.3. Minimum Edit Distance.

The minimum edit distance of two words \(a\) and \(b\) up to positions \(i\) and \(j\) is defined as follows:

\begin{equation*} \begin{array}{lcl} \edist_{a,b}\amp:\amp\{0,\dots,|a|\}\times\{0,\dots,|b|\}\to\Nat \\ \edist_{a,b}(0, j) \amp \mapsto \amp j \\ \edist_{a,b}(i, 0) \amp \mapsto \amp i \\ \edist_{a,b}(i, j) \amp \mapsto \amp \min \begin{cases} % TODO@hack: Hier müsste meiner Meinung nach vom Sinn her die Wörter anfügen und löschen getauscht werden \edist_{a,b}(i-1,j) + 1 \amp \text{delete} \\ \edist_{a,b}(i,j-1) + 1 \amp \text{add} \\ \edist_{a,b}(i-1,j-1) \amp \text{if } a_i = b_j\qquad \text{ keep} \\ \edist_{a,b}(i-1,j-1)+1 \amp \text{if } a_i \ne b_j\qquad \text{ replace} \end{cases} \end{array} \end{equation*}

From the recursive definition of \(\edist\) we can see that some partial results are used multiple times. Hence, we will also use to memoization to compute the edit distance of two words efficiently. Let us apply Definition 5.4.3 to the example above.

Let us first consider the top row and left column. The entries of the first row correspond to the costs to transform the empty word to the words \(\varepsilon\text{,}\) t, ta, tag by adding the respective characters.

\begin{equation*} \begin{array}{r|llll} \amp \varepsilon \amp \mathtt{t} \amp \mathtt{a} \amp \mathtt{g} \\ \hline \varepsilon \amp 0 \amp 1 \amp 2 \amp 3 \\ \mathtt{t} \amp \amp \amp \amp \\ \mathtt{o} \amp \amp \amp \amp \\ \mathtt{r} \amp \amp \amp \amp \\ \mathtt{e} \amp \amp \amp \amp \\ \end{array} \end{equation*}

Accordingly, the entries in the first column corresponds to the costs to get to the empty word from \(\varepsilon\text{,}\) t, to, tor, tore by deleting the respective letters.

\begin{equation*} \begin{array}{r|llll} \amp \varepsilon \amp \mathtt{t} \amp \mathtt{a} \amp \mathtt{g} \\ \hline \varepsilon \amp 0 \amp 1 \amp 2 \amp 3 \\ \mathtt{t} \amp 1 \amp \amp \amp \\ \mathtt{o} \amp 2 \amp \amp \amp \\ \mathtt{r} \amp 3 \amp \amp \amp \\ \mathtt{e} \amp 4 \amp \amp \amp \\ \end{array} \end{equation*}

The remaining table entries are filled with the minimum expression of Definition 5.4.3. For example, for the entry at (1, 1), the third case of the case distinction contributes the minimum value:

\begin{equation*} \begin{array}{r|llll} \amp \varepsilon \amp \mathtt{t} \amp \mathtt{a} \amp \mathtt{g} \\ \hline \varepsilon \amp 0 \amp \color{blue}{1} \amp 2 \amp 3 \\ \mathtt{t} \amp 1 \amp \color{blue}{2} \amp \amp \\ \mathtt{o} \amp 2 \amp \amp \amp \\ \mathtt{r} \amp 3 \amp \amp \amp \\ \mathtt{e} \amp 4 \amp \amp \amp \\ \end{array} \end{equation*}
\begin{equation*} \begin{array}{r|llll} \amp \varepsilon \amp \mathtt{t} \amp \mathtt{a} \amp \mathtt{g} \\ \hline \varepsilon \amp 0 \amp 1 \amp 2 \amp 3 \\ \mathtt{t} \amp \color{blue}{1} \amp \color{blue}{2} \amp \amp \\ \mathtt{o} \amp 2 \amp \amp \amp \\ \mathtt{r} \amp 3 \amp \amp \amp \\ \mathtt{e} \amp 4 \amp \amp \amp \\ \end{array} \end{equation*}
\begin{equation*} \begin{array}{r|llll} \amp \varepsilon \amp \mathtt{t} \amp \mathtt{a} \amp \mathtt{g} \\ \hline \varepsilon \amp \color{blue}{0} \amp 1 \amp 2 \amp 3 \\ \mathtt{t} \amp 1 \amp \color{blue}{0} \amp \amp \\ \mathtt{o} \amp 2 \amp \amp \amp \\ \mathtt{r} \amp 3 \amp \amp \amp \\ \mathtt{e} \amp 4 \amp \amp \amp \\ \end{array} \end{equation*}

The rest of the matrix is filled accordingly:

\begin{equation*} \begin{array}{r|llll} \amp \varepsilon \amp \mathtt{t} \amp \mathtt{a} \amp \mathtt{g} \\ \hline \varepsilon \amp \amp 1 \amp 2 \amp 3 \\ \mathtt{t} \amp 1 \amp 0 \amp 1 \amp 2 \\ \mathtt{o} \amp 2 \amp 1 \amp 1 \amp 2 \\ \mathtt{r} \amp 3 \amp 2 \amp 2 \amp 2 \\ \mathtt{e} \amp 4 \amp 3 \amp 3 \amp \color{red}{3} \\ \end{array} \end{equation*}

According to the definition in Definition 5.4.3, the entry in row \(i\) and column \(j\) equals to the minimum edit distance of the words \(a[1:i]\) and \(b[1:j]\text{.}\) The elements of the first row and column are always the same for each pair of words. All other elements can be computed from the elements above, left, and above left.

Computing the minimum edit distance is an optimization problem because we are looking not only for the cost of some sequence of edits but for the minimum cost. The property of this problem that makes it amenable to be solved with dynamic programming is that it possesses optimal substructure: The optimal solution \(\edist(i,j)\) can be computed from the optimal solutions \(\edist(i,j-1)\text{,}\) \(\edist(i-1,j)\text{,}\) \(\edist(i-1,j-1)\) of the respective sub-problems.