Recursion

Recursion is the analog of induction in computer science. A recursive function is a function that calls itself.

One thing to be wary of is that a function that calls itself can potentially go into an infinite loop. For example the simplest possible recursive function is a total disaster because it does not terminate:

In [1]:
def dont_call_me():
    dont_call_me()

To solve this problem, your function needs to make sure that every case eventually reaches a base case in which the function does not call itself. Here is a simple recursive function that does terminate (as long as \(n\) is a small integer.)

In [2]:
def call_me(n):
    if n<=0:
        print("Reached base case of n=",n)
    else:
        print("Function called with n=", n)
        call_me(n-1)
    print("Exiting call of function with n=", n)

Below, I call this function. Can you figure out what is happening?

In [3]:
call_me(5)
Function called with n= 5
Function called with n= 4
Function called with n= 3
Function called with n= 2
Function called with n= 1
Reached base case of n= 0
Exiting call of function with n= 0
Exiting call of function with n= 1
Exiting call of function with n= 2
Exiting call of function with n= 3
Exiting call of function with n= 4
Exiting call of function with n= 5

Example: Factorial

The factorial operation is usually defined inductively by \(0!=1\) and \(n!=n * (n-1)!\) for integers \(n \geq 1\). This translates into the following function definition:

In [4]:
def factorial(n):
    if n==0:
        return 1
    return n*factorial(n-1)

We test this by printing the first few values:

In [5]:
for n in range(7):
    print(n, "! = ", factorial(n), ".", sep="")
0! = 1.
1! = 1.
2! = 2.
3! = 6.
4! = 24.
5! = 120.
6! = 720.

Example: Pascal's triangle

Consider a polynomial \(p(t)=a_k t^k+ a_{k-1} t^{k-1}+\ldots+a_1 t + a_0.\) Observe that \[(t+1) p(t)=a_k t^{k+1} + (a_k+a_{k-1}) t^k + (a_{k-1}+a_{k-2}) t^{k-1}+ \ldots+(a_0+a_1) t + a_0.\]

We can define a sequence of polynomials by \(q_0=1\) and inductively defining \(q_{n+1}(t)=(t+1) q_n(t)\) for integers \(n \geq 0\). Then each \(q_n(t)\) is a polynomial of degree \(n\). We define \(a_{n,k}\) to be the coefficient of the \(t^k\) term of the polynomial \(q_n(t)\). When arranged into a rectangular array, these numbers are known as Pascal's triangle.

Problem: Write a recursive function pascals_triangle(n,k) which returns \(a_{n,k}\) when provided an integer \(n \geq 0\) and an integer \(k\) satisfying \(0 \leq k \leq n\).

To do this, first observe that the constant term of \(q_n(t)\) is always one, as is the coefficient of \(t^n\) in \(q_n(t)\). That is, \(q_{n,0}=1\) and \(q_{n,n}=1\) for all integers \(n \geq 0\).

Second, using the formula relating the coefficients for \((t+1) p(t)\) to the coefficients for \(p(t)\) above, we see that \(a_{n,k}=a_{n-1,k-1}+a_{n-1,k}\) when \(0<k<n\).

The following function satisfies the requirements above.

In [6]:
def pascals_triangle(n,k):
    if k==0 or k==n:
        return 1
    return pascals_triangle(n-1,k-1)+pascals_triangle(n-1,k)

The following prints out the coefficients of \((t+1)^n\) in successive lists:

In [7]:
for n in range(10):
    l=[]
    for k in range(n+1):
        l.append(pascals_triangle(n,k))
    print(l)
[1]
[1, 1]
[1, 2, 1]
[1, 3, 3, 1]
[1, 4, 6, 4, 1]
[1, 5, 10, 10, 5, 1]
[1, 6, 15, 20, 15, 6, 1]
[1, 7, 21, 35, 35, 21, 7, 1]
[1, 8, 28, 56, 70, 56, 28, 8, 1]
[1, 9, 36, 84, 126, 126, 84, 36, 9, 1]

The above is Pascal's triangle. You can read more about this in Wikipedia's article on Pascal's triangle. Pascal's triangle is pretty neat. For instance, the locations of the odd integers in the triangle are related to the Sierpinski gasket (a fractal).

Remark: The definition of Pascal's triangle is pretty slow. You can see this because if you think about the algorithm, to compute the 126 in the bottom row, we are adding 126 ones! There is a much faster version: \(a_{n,k}=\frac{n!}{k! (n-k)!}.\)

Example: Expansion of a number in base b

Let \(b \geq 2\) be an integer. Then any natural number \(n\) has an expansion in base \(b\). This is the unique finite sequence of numbers \([a_0, a_1, \ldots, a_k]\) so that
  • each \(a_i \in {\mathbb Z}\) with \(0 \leq a_i < b\),
  • \(a_k \neq 0\), and
  • \(\displaystyle n=\sum_{i} a_i b^i.\)

You are very familiar with base \(10\). The expansion of the number \(123\) in base \(10\) is \(a_0=3\), \(a_1=2\) and \(a_3=1\) because \[123=3 \cdot 10^0+ 2 \cdot 10^1 + 1 \cdot 10^2.\]

We will explain how to solve the following problem: Give a function expansion(n,b) which returns the list \([a_0, a_1, \ldots, a_k]\) representing the expansion of an integer \(n \geq 1\) in base \(b\), where \(b \geq 2\) is an integer.

There are multiple ways to do this. One way is recursive. Suppose the expansion of \(n\) is \([a_0, \ldots, a_k]\) and \(k>0\). Then \[n=\sum_{i=0}^k a_i \cdot b^i=a_0+b\left(\sum_{i=0}^{k-1} a_i \cdot b^i\right).\] This shows two important things. First, \(a_0\) is the remainder obtained when dividing \(n\) by \(b\), i.e., \(a_0= n%b\). Second, the sum in parenthesis above represents \((n-a_0)/b\). The sum in these parentheses gives the expansion of this number. We need a base case for the recursion. We can use the fact that if \(n < b\) then the expansion is just \([n]\).

In [8]:
def expansion(n,b):
    if n<b:
        return [n]
    a0 = n%b
    lst=expansion((n-a0)//b, b)
    lst.insert(0,a0)
    return lst
In [9]:
expansion(123,10)
Out[9]:
[3, 2, 1]
In [10]:
expansion(5,2)
Out[10]:
[1, 0, 1]

Example: Towers of Hanoi

Definition of the problem:

The Towers of Hanoi is famous problem in computer science which involves \(n \geq 1\) disks of decreasing sizes attached to three poles. Initially, all \(n\) disks are attached to the first pole in order, with the largest disk on the bottom.
You are allowed to move the disks following certain rules:
  1. Only one disk can be moved at a time from one pole to another. The disk moved must be at the top of a tower.
  2. A larger disk can never be placed on a smaller disk.
So, for instance, in the first move, we could move the smallest disk to the second pole. In the second move, we could move the second smallest pole onto the third pole. These moves result in the following configuration:

We are interested in the following question. Is it possible to move all the disks onto a different pole while following these rules? If so, we should be able to express this as an algorithm.

Setup:

We can think of \(n\) disks as represented by numbers in \(\{1, 2, \ldots, n\}\) with the larger numbers representing the bigger disks. Then the state of each pole can be represented by a list of the numbers representing each disk. The disks will be listed from bottom to top. There are three poles. So the whole configuration can be represented as a tuple of three lists. The following is a function which provides the initial configuration when working with \(n\) disks:

In [11]:
def setup_towers(n):
    lst=list(range(1,n+1))
    lst.reverse()
    towers=(lst,[],[])
    return towers

For example, the initial configuration with \(n=7\) is represented by the following:

In [12]:
towers = setup_towers(7)
print(towers)
([7, 6, 5, 4, 3, 2, 1], [], [])

We will always use the variable towers to store the current state of the system of towers and disks.

Now we consider how we move one of the disks. To move the disks we will define a function move_top(towers,i,j). The variable towers stores the current state of the system. The function will peform the act of moving the top disk of pole \(i \in \{0,1,2\}\) to the top of the tower on pole \(j \in \{0,1,2\}\).

We want to make sure that we do not do any illegal moves. To ensure this we will check in our function if the move is legal. In Python 3, the expression assert statement, message produces an error if statement is false with the error message message included, and does nothing if statement is true. The error will cause Python to exit the function where assert is called.

We will use it to test to make sure a move is allowed. For a move from pole \(i\) to pole \(j\) to be allowed, we need the following to be true:
  • The two poles \(i\) and \(j\) should be different.
  • Pole \(i\) must have at least one disk.
  • Pole \(j\) must either have no disks or the smallest disk on pole \(j\) must be larger than the disk at the top of pole \(i\).

The following is our move_top function:

In [13]:
def move_top(towers, i, j):
    assert i!=j, "Illegal move: The poles must be different."
    assert len(towers[i])>0, "Illegal move: Attempting to move disk from an empty pole."
    assert len(towers[j])==0 or towers[i][len(towers[i])-1]<towers[j][len(towers[j])-1], \
        "Illegal move: Attempting to move a larger disk onto a smaller disk."
    towers[j].append(towers[i].pop())

Remarks: The top four lines in the function make the tests mentioned above and print an informative error. The third and forth lines above are really just one line. The forward slash at the end of the line tells Python that the line continues into the next line, i.e., lines \(3\) and \(4\) should be considered to be the same line.

Now we test it. First we setup the towers again.

In [14]:
towers = setup_towers(7)
print(towers)
([7, 6, 5, 4, 3, 2, 1], [], [])

Now we move the top disk from the first pole to the second:

In [15]:
move_top(towers, 0, 1)
print(towers)
([7, 6, 5, 4, 3, 2], [1], [])

Now we move from the first pole to the third:

In [16]:
move_top(towers, 0, 2)
print(towers)
([7, 6, 5, 4, 3], [1], [2])

Now we move from the second to the third.

In [17]:
move_top(towers, 1, 2)
print(towers)
([7, 6, 5, 4, 3], [], [2, 1])

Now we move from the first to the second again.

In [18]:
move_top(towers, 0, 1)
print(towers)
([7, 6, 5, 4], [3], [2, 1])

Now we will try to move from the first pole to the second again. This should result in an error.

In [19]:
move_top(towers, 0, 1)
print(towers)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-19-170d9bb6dfda> in <module>()
----> 1 move_top(towers, 0, 1)
      2 print(towers)

<ipython-input-13-8adb084ccb65> in move_top(towers, i, j)
      2     assert i!=j, "Illegal move: The poles must be different."
      3     assert len(towers[i])>0, "Illegal move: Attempting to move disk from an empty pole."
----> 4     assert len(towers[j])==0 or towers[i][len(towers[i])-1]<towers[j][len(towers[j])-1],         "Illegal move: Attempting to move a larger disk onto a smaller disk."
      5     towers[j].append(towers[i].pop())

AssertionError: Illegal move: Attempting to move a larger disk onto a smaller disk.

If we only move the disks using the method move_top, we will only be allowed to do legal moves!

The recursive solution

In order to solve the problem, the first idea is to think about a related problem.

We will write a function move_group(towers, m, i, j). The function will take as input the configuration given by towers, an integer \(m \geq 1\), and \(i,j \in \{0,1,2\}\). We assume that the \(m\) smallest disks sit on the top of pole \(i\). Our function will move these disks to pole \(j\).

Observe that when \(m=1\), the problem is easy. Namely, we just move the top disk from pole \(i\) to pole \(j\). For \(m > 1\), we will give a recursive solution. Namely, we choose \(k \in \{0,1,2\}\) so that \(i \neq k\) and \(j \neq k\). We move the \(m-1\) smallest disks from pole \(i\) to pole \(k\), then move the \(m\)-th smallest disk to pole \(j\), and then move the \(m-1\) smallest disks back from pole \(k\) to pole \(j\).

We now define this function:

In [20]:
def move_group(towers, m, i, j):
    if m==1:
        move_top(towers, i, j)
    else:
        # k will be the index of the poll which is neither i or j. Can you figure out why it works?
        k=(3-(j-i)%3+i)%3
        move_group(towers, m-1, i, k)
        move_top(towers, i, j)
        move_group(towers, m-1, k, j)

Here is proof that the function works for \(n=7\) disks.

In [21]:
towers=setup_towers(7)
print("Initially, towers is",towers)
move_group(towers,7,0,2)
print("Now, towers is",towers)
Initially, towers is ([7, 6, 5, 4, 3, 2, 1], [], [])
Now, towers is ([], [], [7, 6, 5, 4, 3, 2, 1])

To see that the individual steps, we can add a print statements to the function whenever we call move_top.

In [22]:
def move_group(towers, m, i, j):
    if m==1:
        move_top(towers, i, j)
        print("Moving from pole", i, "to pole", j, "leaves the towers in state", towers)
    else:
        # k will be the index of the poll which is neither i or j. Can you figure out why it works?
        k=(3-(j-i)%3+i)%3
        move_group(towers, m-1, i, k)
        move_top(towers, i, j)
        print("Moving from pole", i, "to pole", j, "leaves the towers in state", towers)
        move_group(towers, m-1, k, j)

Here are the steps used for the solution to the problem \(n=4\) disks.

In [23]:
towers = setup_towers(4)
print("Initially the state is", towers)
move_group(towers, 4, 0, 2)
Initially the state is ([4, 3, 2, 1], [], [])
Moving from pole 0 to pole 1 leaves the towers in state ([4, 3, 2], [1], [])
Moving from pole 0 to pole 2 leaves the towers in state ([4, 3], [1], [2])
Moving from pole 1 to pole 2 leaves the towers in state ([4, 3], [], [2, 1])
Moving from pole 0 to pole 1 leaves the towers in state ([4], [3], [2, 1])
Moving from pole 2 to pole 0 leaves the towers in state ([4, 1], [3], [2])
Moving from pole 2 to pole 1 leaves the towers in state ([4, 1], [3, 2], [])
Moving from pole 0 to pole 1 leaves the towers in state ([4], [3, 2, 1], [])
Moving from pole 0 to pole 2 leaves the towers in state ([], [3, 2, 1], [4])
Moving from pole 1 to pole 2 leaves the towers in state ([], [3, 2], [4, 1])
Moving from pole 1 to pole 0 leaves the towers in state ([2], [3], [4, 1])
Moving from pole 2 to pole 0 leaves the towers in state ([2, 1], [3], [4])
Moving from pole 1 to pole 2 leaves the towers in state ([2, 1], [], [4, 3])
Moving from pole 0 to pole 1 leaves the towers in state ([2], [1], [4, 3])
Moving from pole 0 to pole 2 leaves the towers in state ([], [1], [4, 3, 2])
Moving from pole 1 to pole 2 leaves the towers in state ([], [], [4, 3, 2, 1])

The following illustrates the solution to the problem with \(n=4\).

This is Tower of Hanoi 4.gif by André Karwath aka Aka and is licensed under the Creative Commons Attribution-Share Alike 2.5 Generic license.

Question: Can you figure out how many moves the algorithm takes for any number \(n\) of disks?

Exercises:

  1. (Catalan Numbers) The Catalan numbers a sequence of integers defined inductively via \(C_0=1\) and \(C_{n+1}=\frac{2(2n+1)}{n+2} C_n\) for integers \(n \geq 0\). Define a recursive function catalan(n) which takes as input an integer \(i \geq 0\) and returns the Catalan number \(C_i\). (The Catalan numbers first appeared as an exercise in the page on Lists.)
  2. (Multiplication using addition) Write a recursive function multiply(m,n) which takes as input two integers \(m\) and \(n\), and returns their product \(m*n\) only using addition and subtraction. Hints: \[m*0=0, \quad m*n=m*(n-1)+m \quad \text{and} \quad m*n=m*(n+1)-m.\]
  3. (Greatest common divisor) Write a recursive function gcd(m,n) which returns the greatest common divisor of integers \(m \geq 0\) and \(n \geq 0\) (not both zero). Use the following observations:
    • \(gcd(m,n)=gcd(n,m)\) for all \(m\) and \(n\).
    • \(gcd(0,n)=n\) when \(n>0\).
    • \(gcd(m,n)=gcd(n \% m, m)\) whenever \(0 < m \leq n\).
  4. (Iteration) Suppose \(f: {\mathbb R} \to {\mathbb R}\). Then, we define \(f^{\circ k}\) to be \(f\) applied \(k \in {\mathbb N}\) times: \[f^{\circ k}(x)=\rlap{\overbrace{\phantom{f \circ f \circ \ldots \circ f}}^k}f \circ f \circ \ldots \circ f(x)\] for \(x \in {\mathbb R}\). Write a recursive function iterate(f,k,x) which takes as input a function \(f: {\mathbb R} \to {\mathbb R}\), a natural number \(k\), and a floating point number \(x\) and returns the value of \(f^{\circ k}(x)\).
  5. (Cantor set) The middle third Cantor set is defined by a limiting process. We define \(C_0=[0,1]\) and will define \(C_i\) inductively for integers \(i \geq 0\). Each \(C_i\) is a finite union of closed intervals. For each integer \(i \geq 0\), we define \(C_{i+1} \subset C_i\) by removing the middle third of the intervals in \(C_i\). So for example \[C_1=\left[0,\frac{1}{3}\right] \cup \left[\frac{2}{3},1\right] \quad \text{and} \quad C_2=\left[0,\frac{1}{9}\right]\cup\left[\frac{2}{9}, \frac{1}{3}\right] \cup \left[\frac{2}{3}, \frac{7}{9}\right] \cup \left[\frac{8}{9}, 1\right].\] The middle third Cantor set is defined by \(C= \bigcap_{i=0}^\infty C_i\).

    Write a function cantors_set_contains(n,x) which takes as input an integer \(n \geq 0\) and returns the truth value of the statement \(x \in C_n\).

    You do not have to use recursion. But, if you want to use recursion, it might be helpful to use the function \(f:C_1 \to C_0\) by \[f(x)=\begin{cases} 3x & \text{if $x \in \left[0, \frac{1}{3}\right]$}\\ 3x-2 & \text{if $x \in \left[\frac{2}{3},1\right]$.}\end{cases}\] Then you can define \(C_i\) by iteration: \[C_i=\Big\{x \in [0,1]: \{x, f(x), f^{\circ 2}(x), \ldots, f^{\circ (i-1)}(x)\} \subset C_1\Big\}.\] An equivalent way to define \(C_i\) is \[C_i=\{x: \text{$f^{\circ i}(x)$ is well defined}\}.\] (Can you see why these versions of \(C_i\) are correct?)

    Hint: For testing, it may be useful to note that \(\frac{1}{4} \in C\) while \(\frac{1}{2 \cdot 3^k} \not \in C\) for any integer \(k \geq 0\).

References:

  • The book How to Think Like a Computer Scientist: Learning with Python 3 has a chapter on recursion including examples of drawing fractals.