At its most basic level, Gaussian elimination is simply an algorithm to solve the equation $$A x = b,$$ where $A$ is an $m \times n$ matrix and $b$ is a given vector in ${\mathbb R}^m$ (or ${\mathbb C}^m$). You want to find a solution $x \in {\mathbb R}^n$ (or ${\mathbb C}^n$).

We will give a treatment of this process roughly following TAK § 4.2. The algorithms and examples given there are very enlightening.

We will concentrate on the case when $A$ is a square invertible matrix. This simplifies things quite a bit, though there are still issues of round-off error.

In [1]:

```
# Standard imports
import numpy as np
import matplotlib.pyplot as plt
import math as m
from mpmath import mp, iv
```

We will first carry out the process by "hand" on a $4 \times 4$ matrix of integers $A$. We generated this matrix with the command:

```
A = np.floor(-10 + 19*np.random.random((4,4)))
```

In [2]:

```
A = np.array([[ 7., -1., 0., -9.],
[ 5., 2., 3., 5.],
[ 5., 5., 1., -6.],
[-7., -3., 1., -8.]])
print(A)
```

To build a random vector $b$ as a column vector, we used:

```
b = np.floor(-10 + 19*np.random.random((4,1)))
```

In [3]:

```
b = np.array([[ 3.],
[ 6.],
[-4.],
[-9.]])
print(b)
```

Since $A$ and $b$ have the same number of rows, they can be joined together with `np.concatenate`, as demonstrated below. Here `axis=1` indicates that we are joining horizontally; using `axis=0` would indicate joining vertically.

In [4]:

```
C = np.concatenate((A,b), axis=1)
print(C)
```
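
As a quick aside, here is what joining vertically looks like. Since `b` is a $4 \times 1$ column, its transpose `b.T` is a $1 \times 4$ row, which can be stacked beneath `A` using `axis=0` (this example is just for illustration):

```
# Stacking the row vector b.T beneath A; the result is a 5x4 array.
print(np.concatenate((A, b.T), axis=0))
```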

We declare entry $(0,0)$ to be a pivot and clear out the rows below by adding multiples of the first row. The following adjusts row $1$.

In [5]:

```
C[1] -= C[1,0]/C[0,0] * C[0]
print(C)
```

Now we do the same to rows $2$ and $3$.

In [6]:

```
C[2] -= C[2,0]/C[0,0] * C[0]
C[3] -= C[3,0]/C[0,0] * C[0]
print(C)
```

Then our next pivot is in position $(1,1)$. We clear out the entries below.

In [7]:

```
for row in range(2,4):
    C[row] -= C[row,1]/C[1,1] * C[1]
print(C)
```

Our next pivot is in position $(2,2)$. So, we clear out the entry below.

In [8]:

```
C[3] -= C[3,2]/C[2,2] * C[2]
print(C)
```

Note that due to round-off error, the entry in position $(3,2)$ is not exactly zero. This is what caused the switch to exponential notation in the printout above.

In [9]:

```
C[3,2]
```

Out[9]:

We set things up so that this position should be exactly zero. So, we can fix this just by explicitly setting that value.

In [10]:

```
C[3,2] = 0
print(C)
```

The above matrix is in *echelon form*. This means we can easily solve the corresponding equation.

The equation corresponding to the last row is approximately $$-24.26\,x_3 = -14.86.$$ So we can solve for $x_3$. First we build an $x$ vector to hold the entries. Then we store the value.

In [11]:

```
x = np.zeros((4,1))
x[3] = C[3,4]/C[3,3]
print(x)
```

Then the third equation has the form $$-5.42\,x_2 - 23.63\,x_3 = -14.26.$$ We can solve for $x_2$.

In [12]:

```
x[2] = (C[2,4]-C[2,3]*x[3])/C[2,2]
print(x)
```

Now we want to solve for $x_1$ using equation one: $$2.71\,x_1 + 3\,x_2 + 11.43\,x_3 = 3.86.$$ It is useful to think of the terms involving $x_2$ and $x_3$ as a dot product, i.e.: $$2.71\,x_1 + \left[\begin{array}{r} 3 \\ 11.43 \end{array}\right] \cdot \left[\begin{array}{r} x_2 \\ x_3 \end{array}\right]=3.86.$$ We can access these two vectors using slices.

In [13]:

```
print(C[1,2:4])
print(x[2:])
```

Therefore, we can write our value of $x_1$ as

In [14]:

```
x[1] = (C[1,4] - C[1,2:4].dot(x[2:]))/C[1,1]
print(x)
```

We can use a similar approach to work out $x_0$.

In [15]:

```
x[0] = (C[0,4] - C[0,1:4].dot(x[1:]))/C[0,0]
print(x)
```

Now we can check our solution. We compute $Ax - b$, which should be close to zero.

In [16]:

```
A @ x - b
```

Out[16]:

The non-exactness of our solution is due to rounding errors. An error of order $10^{-15}$ is about the best we could hope for, because Python floats carry roughly $15$ to $16$ significant decimal digits of precision.
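
As a quick check on that claim, NumPy can report the machine epsilon for double-precision floats, i.e., the gap between $1$ and the next larger representable number:

```
# Machine epsilon for 64-bit floats is about 2.2e-16,
# corresponding to roughly 15-16 significant decimal digits.
print(np.finfo(float).eps)
```

We now package the procedure above, forward elimination followed by back substitution, into a single function.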

In [17]:

```
def gauss_elim_1(A, b):
    # Ensure that A is square.
    num_rows, num_cols = A.shape
    assert num_rows == num_cols, "This only works if A has the same number of rows and columns"
    # Join the matrix and the vector.
    C = np.concatenate((A, b), axis=1)
    # Now we do forward elimination.
    # i will iterate through the pivots (i,i).
    for i in range(num_rows):
        # Iterate through the rows below row i.
        for j in range(i+1, num_rows):
            # Add a multiple of row i to row j to make entry (j,i) zero.
            C[j] -= C[j,i]/C[i,i] * C[i]
            # Ensure C[j,i] is zero (and not round-off error).
            C[j,i] = 0
    # Print statement for debugging.
    print("After forward elimination, C is")
    print(C, end="\n\n")
    # Now we carry out back substitution.
    # Build a vector for our solution.
    x = np.zeros((num_rows,1))
    # We do the last entry by hand.
    x[num_rows-1] = C[num_rows-1,num_rows] / C[num_rows-1,num_rows-1]
    # Now we work through the remaining rows backwards.
    for i in range(num_rows-2, -1, -1):
        # We use the dot product formula we worked out.
        x[i] = (C[i,num_rows] - C[i,i+1:num_rows].dot(x[i+1:]))/C[i,i]
    print("After back substitution, we see that x is")
    print(x)
    return x
```

To check that our code works, we used

In [18]:

```
x = gauss_elim_1(A,b)
```

This is the same solution we found before, which is what we expect.
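
As an additional sanity check (this comparison is our own aside), we can compare against NumPy's built-in solver, which should agree with our answer up to rounding:

```
# np.linalg.solve solves the system via LU factorization with pivoting (LAPACK).
print(np.linalg.solve(A, b))
```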

The algorithm `gauss_elim_1` will fail if a zero appears in a pivot entry. Observe below.

In [19]:

```
A = np.array([[ 0., -2., 3., 6.],
[-7., 0., -1., -9.],
[-9., 6., 7., 8.],
[-7., 8., -2., -1.]])
print("A = \n{}.".format(A))
b = np.array([[ 6.],
[-2.],
[ 6.],
[ 0.]])
print("b = \n{}.".format(b))
x = gauss_elim_1(A,b)
```

What is going on? Well, our algorithm assumes that pivot entries are non-zero. The first step in our algorithm takes $C$ and uses multiples of the first row to zero out the entries below. Here we just attempt to zero out the second row:

In [20]:

```
C = np.concatenate((A, b), axis=1)
print("Before the operation C =\n{}".format(C))
C[1] -= C[1,0] / C[0,0] * C[0]
print("After the operation C =\n{}".format(C))
```

Since `C[0,0]` is (positive) zero, we are dividing by zero. NumPy handles this in arrays by returning `inf` if the numerator is positive, `-inf` if the numerator is negative, and `nan` if the numerator is also zero. NumPy also produces warning messages.

(In other places, Python handles dividing by zero differently. Dividing by zero using a `float` or `int` raises a `ZeroDivisionError`, which is an exception. Typically, such an error will stop computation, but these errors can be handled using `try` and `except`. You can also tell NumPy to change how errors like this are treated; see the NumPy documentation on handling numerical exceptions.)
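
The array behavior is easy to demonstrate (a small aside; the `np.errstate` context manager is used here just to silence the warnings):

```
# Dividing a NumPy array by zero yields inf, -inf, or nan entrywise.
with np.errstate(divide='ignore', invalid='ignore'):
    print(np.array([1., -1., 0.]) / np.array([0., 0., 0.]))

# By contrast, dividing plain Python floats raises an exception.
try:
    1.0 / 0.0
except ZeroDivisionError as err:
    print("ZeroDivisionError:", err)
```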

A related issue occurs if a pivot entry is very small. For example, consider a matrix whose first pivot entry is $10^{-6}$:

In [21]:

```
A = np.array([[ 1e-6, -9., -9., -7.],
[ 6., -7., 4., -8.],
[ -2., -5., 8., -2.],
[ -7., 5., -8., -10.]])
C = np.concatenate((A,b),axis=1)
print("Before the operation row 1 of C is {}".format(C[1]))
C[1] -= C[1,0] / C[0,0] * C[0]
print("After the operation row 1 of C is \n{}".format(C[1]))
```

Observe that the $-7, 4, -8, -2$ originally in row $1$ have ended up in the 8th most significant digit of the new matrix. The same will happen in rows $2$ and $3$. Further row reduction will then subtract multiples of these rows from each other, resulting in a loss of precision because the rows will agree in their $7$ most significant digits.
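
The loss of precision here is a form of *catastrophic cancellation*, which we can illustrate with a toy example of our own (the specific numbers below are made up for illustration):

```
# x stores only about 16 significant digits, so its digits beyond the 16th
# are already rounded away; subtracting a nearby number exposes this.
x = 54000007.123456789
y = 54000000.0
print(x - y)   # close to 7.123456789, but the trailing digits are noise
```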

In [22]:

```
x = gauss_elim_1(A,b)
```

In [23]:

```
A @ x - b
```

Out[23]:

We can see from the above that our error is larger than in the first example we looked at (where we had an error of order $10^{-15}$). Having a pivot entry of order $10^{-6}$ resulted in a loss of roughly $6$ digits of precision. One can imagine a matrix with several small pivot entries, resulting in repeated losses of precision.

Switching rows (*partial pivoting*) largely solves the issue of zeros appearing in pivot positions and avoids some of the precision loss due to small pivot entries. Consider the following example, which, like our failed example above, has a zero in the first pivot position:

In [24]:

```
A = np.array([[ 0., -9., -9., -7.],
[ 6., -7., 4., -8.],
[ -2., -5., 8., -2.],
[ -7., 5., -8., -10.]])
A[0,0] = 0
b = np.array([[ 3.],
[ 8.],
[ 8.],
[-7.]])
C = np.concatenate((A,b), axis=1)
print("C = \n{}.".format(C))
```

The issue is that there is a zero in position $(0,0)$, so we can't clear out the rows below. Thinking of the rows as equations, it is natural to reorder them so that a non-zero entry appears in position $(0,0)$. In the spirit of minimizing numerical error, the best row to choose is the one whose first entry has the largest absolute value.

NumPy has a function `argmax` which returns the position of the largest element of a list or array. Here is an example:

In [25]:

```
np.argmax([1, 12, -13])
```

Out[25]:

The number `1` was returned because $12$ is the largest element of the list `[1, 12, -13]`, and the `12` appears in position `1`. If we want the position of the largest absolute value, we can use:

In [26]:

```
np.argmax(np.abs([1, 12, -13]))
```

Out[26]:

We can get the position of the largest absolute value in column $0$ of $C$ by:

In [27]:

```
row = np.argmax(np.abs(C[:,0]))
row
```

Out[27]:

So our first step in row reduction would be to switch row $0$ with row $3$. This can be done with:

In [28]:

```
C[[0,3],:] = C[[3,0],:]
print(C)
```
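
One caution about swapping rows (an aside on NumPy semantics): the tuple-swap idiom that works for Python lists fails here, because the right-hand side holds *views* into the array rather than copies. The fancy-indexing swap above works because fancy indexing produces copies of the selected rows.

```
# The list-style swap aliases the data, so both rows end up identical:
D = C.copy()
D[0], D[3] = D[3], D[0]
print(D[0], D[3])     # both now hold the same values

# The fancy-indexing swap copies the selected rows first:
E = C.copy()
E[[0,3],:] = E[[3,0],:]
print(E[0], E[3])     # rows genuinely exchanged
```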

Let's continue for one more step. We clear out the entries below $(0,0)$.

In [29]:

```
C[1] -= C[1,0]/C[0,0] * C[0]
C[2] -= C[2,0]/C[0,0] * C[0]
print(C)
```

Now we want the entry of largest absolute value at or below the $-2.71$ in entry $(1,1)$. To get the vector of entries we are considering, we can do:

In [30]:

```
C[1::,1]
```

Out[30]:

Note that the indices are shifted by one: while `-2.71` is in row $1$, it appears in position $0$ in the list above. So to get the row containing the largest absolute value, we can do:

In [31]:

```
1 + np.argmax(np.abs(C[1::,1]))
```

Out[31]:

Thus, we'd want to switch rows $1$ and $3$:

In [32]:

```
C[[1,3],:] = C[[3,1],:]
print(C)
```

Note also that sometimes we won't need to switch rows. We are now ready to update our implementation.

In [33]:

```
def gauss_elim_2(A, b):
    # Ensure that A is square.
    num_rows, num_cols = A.shape
    assert num_rows == num_cols, "This only works if A has the same number of rows and columns"
    # Join the matrix and the vector.
    C = np.concatenate((A, b), axis=1)
    # Now we do forward elimination.
    # i will iterate through the pivots (i,i).
    for i in range(num_rows):
        # Find the row containing the entry with
        # largest absolute value at or below (i,i):
        k = i + np.argmax(np.abs(C[i::,i]))
        if k != i:
            # Swap rows i and k:
            C[[i,k],:] = C[[k,i],:]
        # Iterate through the rows below row i.
        for j in range(i+1, num_rows):
            # Add a multiple of row i to row j to make entry (j,i) zero.
            C[j] -= C[j,i]/C[i,i] * C[i]
            # Ensure C[j,i] is zero (and not round-off error).
            C[j,i] = 0
    # Print statement for debugging.
    print("After forward elimination, C is")
    print(C, end="\n\n")
    # Now we carry out back substitution.
    # Build a vector for our solution.
    x = np.zeros((num_rows,1))
    # We do the last entry by hand.
    x[num_rows-1] = C[num_rows-1,num_rows] / C[num_rows-1,num_rows-1]
    # Now we work through the remaining rows backwards.
    for i in range(num_rows-2, -1, -1):
        # We use the dot product formula we worked out.
        x[i] = (C[i,num_rows] - C[i,i+1:num_rows].dot(x[i+1:]))/C[i,i]
    print("After back substitution, we see that x is")
    print(x)
    return x
```

We can still have numerical errors appear in this new algorithm. The book TAK brings up the case of the Hilbert matrix, and we will go through this example. Here is the $7 \times 7$ Hilbert matrix, whose entry in row $i$ and column $j$ (indexing from zero) is $1/(i+j+1)$:

In [34]:

```
n = 7
H = np.array([[1/(col+row+1) for col in range(n)] for row in range(n)])
print(H)
```
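
As an aside, if SciPy is available, it provides a built-in constructor for the same matrix:

```
# scipy.linalg.hilbert(n) builds the n x n Hilbert matrix.
from scipy.linalg import hilbert
print(hilbert(7))
```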

We will test our algorithm by setting $b$ to be $H$ times the vector of all ones. Then we will see if the algorithm gives back the vector of all ones.

In [35]:

```
sol = np.array(n * [1.]).reshape((n,1))
print(sol)
```

In [36]:

```
b = H @ sol
```

In [37]:

```
b
```

Out[37]:

This should give us back the vector of all ones.

In [38]:

```
x = gauss_elim_2(H, b)
```

Note that we don't get all ones, due to numerical errors. Interestingly, however, $H x - b$ is as close to zero as we could expect:

In [39]:

```
H @ x - b
```

Out[39]:

We have found two near solutions to the equation $H x = b$: `x` and `sol`. Then `H @ (x - sol)` should be nearly zero. Here we rescale `x - sol` so that its entry of largest absolute value is $1$, and store the rescaled version as `v`.

In [40]:

```
v0 = x-sol
i = np.argmax(np.abs(v0))
v = v0 / v0[i]
v
```

Out[40]:

Now consider $H v$. We see that $Hv$ is closer to zero than $v$ by a factor of roughly $10^{-8}$.

In [41]:

```
H @ v
```

Out[41]:

Because $Hv$ is very close to zero even though $v$ is not small, numerical solutions won't be very accurate: if $x$ is a solution to the equation $Hx = b$, then $H(x+v)$ is also very nearly equal to $b$, so $x + v$ is nearly indistinguishable from a solution. The matrix $H$ is *ill-conditioned*.
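
A standard way to quantify this (going slightly beyond the discussion here) is the *condition number*, which NumPy can compute directly:

```
# The condition number of the 7x7 Hilbert matrix is on the order of 1e8,
# so we expect to lose roughly 8 of our ~16 digits of precision.
print(np.linalg.cond(H))
```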