We briefly discuss some algorithms for finding eigenvalues and eigenvectors of square matrices.
To read more about this topic, see § 4.6 of TAK, which we follow somewhat.
# Standard imports
import numpy as np
import matplotlib.pyplot as plt
import math as m
from mpmath import mp, iv
from scipy import linalg
We review the basic definitions related to eigenvalues and eigenvectors. Any textbook in linear algebra will give a more expansive explanation of these concepts.
Let $A$ be an $n \times n$ matrix. A non-zero vector $v \in {\mathbb R}^n$ is an eigenvector if there is a real number $\lambda$ such that $A v = \lambda v$. This number $\lambda$ is called an eigenvalue of $A$ and is the eigenvalue associated to $v$.
An eigenvector determines an eigenvalue. In the other direction, if you have an eigenvalue $\lambda$, you can determine all the associated eigenvectors by solving the equation $A v = \lambda v$. This can be written as the matrix equation $$(A - \lambda I) v = 0.$$ The set of solutions to this equation forms a subspace of ${\mathbb R}^n$ called the eigenspace of $\lambda$. This eigenspace consists of the zero vector and all eigenvectors whose eigenvalue is $\lambda$.
From the fact that we are only considering non-zero vectors $v$, it can be seen that the eigenvalues are the solutions to the characteristic equation: $$\det(A-\lambda I)=0.$$ It follows that an $n \times n$ matrix can have at most $n$ eigenvalues. As a function of $\lambda$, $\det(A-\lambda I)$ is a polynomial, and its roots are the eigenvalues. The algebraic multiplicity of an eigenvalue $r$ is the number of times the factor $\lambda-r$ can be factored out of this polynomial. The geometric multiplicity of an eigenvalue is the dimension of the associated eigenspace. The geometric multiplicity of an eigenvalue is always less than or equal to its algebraic multiplicity.
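To make these definitions concrete, here is a small check (our own example, not from TAK) on the symmetric matrix $\left(\begin{matrix} 2 & 1 \\ 1 & 2\end{matrix}\right)$, whose characteristic polynomial $(2-\lambda)^2-1$ has roots $1$ and $3$:
# A small 2x2 example: the characteristic equation det(B - lambda I) = (2-lambda)^2 - 1 = 0
# has solutions lambda = 1 and lambda = 3, with eigenvectors (1,-1) and (1,1) respectively.
B = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v1 = np.array([1.0, -1.0])   # an eigenvector for lambda = 1
v3 = np.array([1.0, 1.0])    # an eigenvector for lambda = 3
print(B @ v1, 1*v1)          # the two printed vectors agree
print(B @ v3, 3*v3)          # the two printed vectors agree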
Remark. Since a non-zero scalar multiple of an eigenvector is an eigenvector, it is common to look for eigenvectors with length one.
An eigenvalue of a matrix $A$ is called dominant if its absolute value is larger than the absolute values of the other eigenvalues of $A$. We will first consider trying to find a dominant eigenvalue.
Diagonal matrices are the simplest to understand. Let $D$ be the matrix $$D=\left(\begin{matrix} d_0 & 0 & \cdots & \cdots & 0 \\ 0 & d_1 & \ddots & & \vdots \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ \vdots & & \ddots & d_{n-2} & 0 \\ 0 & \cdots & \cdots & 0 & d_{n-1} \end{matrix}\right).$$
Let $e^k$ denote the standard basis vector, which has all zero entries except for a one in position $k$. Then we have $$D e^k = d_k e^k,$$ so $e^k$ is an eigenvector with eigenvalue $d_k$.
It follows that the set of eigenvalues is the set of diagonal entries. The eigenspace associated to an eigenvalue $\lambda$ is $$\textrm{span}~\{e^k:~d_k=\lambda\}.$$ (So typically, the eigenspace of $d_j$ is the set of all scalar multiples of $e^j$. But, for example, if $i \neq j$ and $d_i=d_j$, then the eigenspace will include vectors of the form $a e^i+b e^j$.)
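As a quick illustration of the repeated-entry case (our own example), any combination $a e^i + b e^j$ of standard basis vectors corresponding to equal diagonal entries is again an eigenvector:
# A diagonal matrix with the repeated entry 5 in positions 0 and 2.
D_example = np.diag([5.0, 2.0, 5.0])
v = 3*np.array([1.0, 0.0, 0.0]) - 7*np.array([0.0, 0.0, 1.0])   # a*e^0 + b*e^2 with a=3, b=-7
print(D_example @ v)   # equals 5*v, so v is an eigenvector with eigenvalue 5
print(5*v)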
Iteration means repeatedly applying an operation. We will be repeatedly applying matrix multiplication.
Let $D$ be the diagonal matrix above, and let $x \in {\mathbb R}^n$ be a column vector. Iteratively multiplying by $D$ produces the sequence of vectors defined by $x^0=x$ and $$x^{k+1} = D x^{k} \quad \text{for integers $k \geq 0$}.$$ Since $D$ is a diagonal matrix, we can see that $$x^k=\left(\begin{matrix} d_0^k x_0 \\ d_1^k x_1 \\ \vdots \\ d_{n-1}^k x_{n-1}\end{matrix}\right).$$ Observe that the $j$-th entry of $x^k$ grows (or decays) exponentially in $k$ with ratio $d_j$.
Assuming all entries of our initial vector $x$ are non-zero, observe that for $k$ very large, the entry of $x^k$ with largest absolute value will be the entry $j$ where $|d_j|$ is largest. (This $d_j$ is the dominant eigenvalue.) To see this clearly, we'll do an experiment. To keep entries from blowing up, we'll rescale our vectors to make them unit length.
# A diagonal matrix.
# Note entry with largest absolute value is in the second position.
D = np.diag([-1, -4, 0.25, 3])
print(D)
# Build a vector with random entries in (-1, 1)
x = 2*np.random.random(4)-1
x
We'll arrange the $x^k$ vectors we compute under iteration in a list x_list. Here x_list[k] will store $x^k$, except that we will rescale all our vectors to have unit length.
The norm of a vector $x$ can be computed with np.linalg.norm(x). Dividing by this norm gives a unit vector. (We are doing this rescaling to keep entries from blowing up.)
x_list = []
x_list.append(x/np.linalg.norm(x))
x_list
temp = x_list[0]
temp
np.linalg.norm(temp)
m.sqrt(temp[0]**2 + temp[1]**2 + temp[2]**2 + temp[3]**2)
We will compute up to $x^{20}$:
for k in range(20):
    x_k = x_list[k]
    x_next = D @ x_k
    x_list.append(x_next / np.linalg.norm(x_next))
x_list
The last entry printed above is $x^{20}$. Observe that the entry with the largest absolute value is the second entry, and the other entries are very close to zero. Thus, we have (approximately) found an eigenvector of the form $(0,\pm 1,0,0)$.
Since $x^{20}$ is a unit vector close to an eigenvector, the associated eigenvalue can be (nearly) recovered by computing $$x^{20} \cdot (D x^{20}).$$ This works because $D x^{20} \approx \lambda x^{20}$, so $$x^{20} \cdot (D x^{20}) \approx x^{20} \cdot (\lambda x^{20}) = \lambda (x^{20} \cdot x^{20})= \lambda.$$
x20 = x_list[20]
x20
eigvalue = x20 @ (D @ x20)
eigvalue
Thus, we have an approximate eigenvalue–eigenvector pair (eigvalue, x20).
Of course, it is easy to find the eigenvectors of a diagonal matrix. We looked at this case because it makes it easy to see how the iteration behaves.
A slightly more general setting is the case of diagonalizable matrices. An $n \times n$ matrix $A$ is diagonalizable if it has the form $$A = P D P^{-1},$$ where $P$ is an invertible $n \times n$ matrix and $D$ is a diagonal matrix as above. In this case, the diagonal entries $d_j$ of $D$ are eigenvalues associated to eigenvectors of the form $v^j = P e^j$. This is because $$A v^j = P D P^{-1} P e^j = P D e^j = P d_j e^j = d_j P e^j = d_j v^j.$$
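Here is a small numerical check of this computation, using the diagonal matrix D from above and an invertible matrix of our own choosing (the names P_example and A_example are just for this illustration):
# Build a diagonalizable matrix A_example = P_example D P_example^{-1}.
P_example = np.array([[1.0, 2.0, 0.0, 0.0],
                      [0.0, 1.0, 3.0, 0.0],
                      [0.0, 0.0, 1.0, 4.0],
                      [0.0, 0.0, 0.0, 1.0]])   # upper triangular with unit diagonal, so invertible
A_example = P_example @ D @ np.linalg.inv(P_example)
j = 1
v_j = P_example[:, j]          # v^j = P e^j is the j-th column of P_example
print(A_example @ v_j)         # approximately d_j * v_j
print(D[j, j] * v_j)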
For a diagonalizable matrix, we can apply the same algorithm to find the dominant eigenvalue (the eigenvalue with larger absolute value than any other).
It is a well-known and very useful fact that symmetric matrices are diagonalizable. A symmetric matrix is a matrix $A$ which equals its transpose.
We will use this to give an example of an $n \times n$ matrix with $n=4$.
n = 4
We computed a random $4 \times 4$ matrix $A$ with entries in the interval $(-1,1)$ using the command
A = (2*np.random.random(n**2)-1).reshape(4,4)
Below we record the matrix we ended up with:
A = np.array([[-0.25171678, 0.58924155, -0.32522934, 0.92260052],
[ 0.58924155, 0.99124496, -0.85124213, -0.93359258],
[-0.32522934, -0.85124213, 0.74766801, -0.79890941],
[ 0.92260052, -0.93359258, -0.79890941, 0.66430218]])
A
To make the matrix symmetric, we fix it by setting $a_{j,i}=a_{i,j}$ for $j>i$.
for i in range(n):
    for j in range(i+1, n):
        A[j,i] = A[i,j]
A
Now the matrix is symmetric. We also choose a random initial vector using
x = 2*np.random.random(n)-1
Again, we just write what we computed below.
x = np.array([-0.79842236, -0.50141247, 0.21444716, 0.64910438])
x
Now we will iterate it for a while, continually rescaling so that the vector has unit length. We define $x_0 = x / \|x\|$ and then define $$x_{k+1} = \frac{1}{\|A x_k\|} A x_k \quad \text{for $k \geq 0$}.$$ We'll repeat this operation $100$ times, discarding intermediate values.
k = 0
x_k = x / np.linalg.norm(x)
for i in range(100):
    temp = A @ x_k
    k += 1
    x_k = temp / np.linalg.norm(temp)
print("x_{} = {}".format(k, x_k))
Note that because we are continually dividing by the norm, $x_{100}$ is a unit vector:
np.linalg.norm(x_k)
Since $\|x_k\|=1$, assuming $x_k$ is close to an eigenvector, the associated eigenvalue can be computed with:
eigvalue = x_k @ A @ x_k
eigvalue
To verify that $x_{100}$ is nearly an eigenvector with eigenvalue $\lambda$, we could compute $$A x_{100} - \lambda x_{100}$$ and check that all entries are small. We do that below:
A @ x_k - eigvalue * x_k
The small values above suggest that (eigvalue, x_k) is approximately an eigenvalue–eigenvector pair.
We can improve it further by iterating more.
for i in range(100):
    temp = A @ x_k
    k += 1
    x_k = temp / np.linalg.norm(temp)
print("x_{} = {}".format(k, x_k))
eigvalue = x_k @ A @ x_k
eigvalue
A @ x_k - eigvalue * x_k
The above difference is even smaller. Continuing further:
for i in range(200):
    temp = A @ x_k
    k += 1
    x_k = temp / np.linalg.norm(temp)
print("x_{} = {}".format(k, x_k))
eigvalue = x_k @ A @ x_k
eigvalue
A @ x_k - eigvalue * x_k
The above difference is on the order of round off error, so we can't expect to do much better. We store off the eigenvector:
eigenvector = x_k
eigenvector
Let's look at the error in the computation of eigenvalues and eigenvectors. Here we make a list of approximate eigenvalues, computed as $$x_k \cdot (A x_k),$$ along with the corresponding approximate eigenvectors $x_k$.
k_list = []
approx_eigenvalues = []
approx_eigenvectors = []
k = 0
x_k = x / np.linalg.norm(x)
k_list.append(k)
approx_eigenvalues.append(x_k @ A @ x_k)
approx_eigenvectors.append(x_k)
for i in range(100):
    temp = A @ x_k
    k += 1
    x_k = temp / np.linalg.norm(temp)
    k_list.append(k)
    approx_eigenvalues.append(x_k @ A @ x_k)
    approx_eigenvectors.append(x_k)
Below we compute the error in our eigenvalue approximations as a function of $k$.
# Convert k_list and approx_eigenvalues to numpy arrays
k_list = np.array(k_list)
approx_eigenvalues = np.array(approx_eigenvalues)
plt.plot(k_list, eigvalue - approx_eigenvalues, "ob", label="eigenvalue error")
plt.xlabel("k")
plt.ylabel("eigenvalue error")
plt.title("Plot of eigenvalue errors")
plt.show()
To better see what is going on, we can plot the above graph on a log scale.
plt.plot(k_list, eigvalue - approx_eigenvalues, "ob", label="eigenvalue error")
plt.xlabel("k")
plt.ylabel("eigenvalue error")
plt.title("Plot of eigenvalue errors (log scale)")
plt.yscale("log")
plt.show()
The fact that the graph is asymptotic to a downward-sloping line means that our eigenvalue approximations approach the true eigenvalue exponentially quickly (with exponent related to the slope of the line).
Now we turn our attention to our approximate eigenvectors $x_k$. Since these vectors are approaching eigenvector (the variable we stored above), we can look at the norm of the difference between eigenvector and $x_k$. This is our error in eigenvector approximation.
eigenvector_error = []
for x_k in approx_eigenvectors:
    eigenvector_error.append(np.linalg.norm(eigenvector - x_k))
# Convert to a numpy array:
eigenvector_error = np.array(eigenvector_error)
We plot these errors:
plt.plot(k_list, eigenvector_error, "ob", label="eigenvector error")
plt.xlabel("k")
plt.ylabel("eigenvalue approximate")
plt.title("Plot of eigenvector errors")
plt.show()
We switch to a log scale:
plt.plot(k_list, eigenvector_error, "ob", label="eigenvector error")
plt.xlabel("k")
plt.ylabel("eigenvalue approximate")
plt.title("Plot of eigenvector errors (log scale)")
plt.yscale("log")
plt.show()
Again, we can see that the error is decaying exponentially.
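As a rough check of the rate (a sketch of our own; it assumes the stored errors are positive so that their logarithms are defined), we can fit a line to the logarithm of the eigenvector errors. The slope of the fitted line estimates the exponential decay rate:
# Fit a line to log(error) as a function of k; the slope estimates the decay rate.
# We skip entries where the error has already reached round-off level.
mask = eigenvector_error > 1e-14
slope, intercept = np.polyfit(k_list[mask], np.log(eigenvector_error[mask]), 1)
print("estimated decay factor per iteration:", np.exp(slope))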
Here we rigorously spell out the algorithm we have been describing above.
It is useful to know that a matrix $A$ and its transpose $A^T$ are conjugate. (That is, there is an invertible matrix $P$ such that $A^T=P A P^{-1}$.) Conjugate matrices have the same eigenvalues, with the same multiplicities.
Remark: The condition that $w \cdot x \neq 0$ means you can't use just any initial vector $x$. Also, of course, we don't want to assume we know $w$. But it is very unlikely, if you choose $x$ at random, that $w \cdot x=0$. This is why we have been using random vectors for $x$.
The algorithm in the Theorem above is essentially the same as Algorithm 8 of TAK § 4.6.
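To make this concrete, here is a sketch of the power method packaged as a function (the function name dominant_eigenvalue and the stopping rule are our own choices, modeled on the iteration above rather than copied verbatim from Algorithm 8):
def dominant_eigenvalue(A, epsilon):
    ''' A sketch of the power method: returns a pair consisting of an approximate
    eigenvalue and an approximate eigenvector, where the eigenvalue is the dominant
    one (assuming it exists). We iterate until two consecutive eigenvalue estimates
    are within epsilon of each other.
    '''
    n = A.shape[0]
    assert A.shape == (n, n), "A should be a square matrix."
    x_k = 2*np.random.random(n) - 1            # random initial vector
    x_k = x_k / np.linalg.norm(x_k)
    eigenvalue_old_est = float('inf')
    eigenvalue_est = x_k @ A @ x_k
    while abs(eigenvalue_est - eigenvalue_old_est) >= epsilon:
        temp = A @ x_k                         # multiply by A
        x_k = temp / np.linalg.norm(temp)      # rescale to unit length
        eigenvalue_old_est = eigenvalue_est
        eigenvalue_est = x_k @ A @ x_k
    return eigenvalue_est, x_k
For the symmetric matrix $A$ above, dominant_eigenvalue(A, 10**-8) should return a pair close to (eigvalue, eigenvector) found earlier (possibly with the sign of the eigenvector flipped).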
Suppose $A$ is invertible. Observe that if $A v = \lambda v$ then $\lambda \neq 0$ and $$A^{-1} v = \frac{1}{\lambda} v.$$ This means that we can apply the power method with $A^{-1}$ to find the eigenvalue of $A$ with smallest absolute value.
However, instead of using the inverse, we use $LU$ factorization. The algorithm is spelled out in Algorithm 9 of TAK § 4.6.
Let's find the smallest eigenvalue of the matrix $A$ as above.
A
First we compute the $LU$ factorization. We'll use the linalg module from scipy, and use a slightly different function so that we can make use of the package's ability to solve equations.
LU_data = linalg.lu_factor(A)
LU_data
Our variable LU_data stores the information of the $PLU$-factorization. The pair consists of a matrix storing $L+U-I$ and a vector of pivot indices representing the permutation matrix $P$. You can find out more by typing the following command into Jupyter:
linalg.lu_factor?
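If you prefer to see the factors explicitly, scipy's linalg module also provides linalg.lu, which returns the permutation matrix and the two triangular factors separately. Here is a quick check of our own:
# linalg.lu returns P, L, U with A = P @ L @ U.
P_lu, L_lu, U_lu = linalg.lu(A)
print(np.round(L_lu, 3))                       # lower triangular with unit diagonal
print(np.round(U_lu, 3))                       # upper triangular
print(np.abs(A - P_lu @ L_lu @ U_lu).max())    # should be near round-off error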
Recall that we use the $LU$-factorization to efficiently solve the equation $Ay=b$ for $y$. The linalg module in scipy has a function that does this for us. For example:
b = np.array([1, 2, 3, 4])
y = linalg.lu_solve(LU_data, b)
y
To check that it worked observe:
A @ y
Here we apply the power method to find the eigenvalue of smallest absolute value and an associated eigenvector. We'll start with the same random vector $x$ as before.
x
We take $x^0$ to be $x$ rescaled to have unit length:
k = 0
x_k = x / np.linalg.norm(x)
Then we iterate backward $100$ times, at each step solving a system with the $LU$-factorization (which amounts to multiplying by $A^{-1}$).
for i in range(100):
    temp = linalg.lu_solve(LU_data, x_k)
    k += 1
    x_k = temp / np.linalg.norm(temp)
print("x_{} = {}".format(k, x_k))
Again, because x_100 is a unit vector, we can find the associated eigenvalue by computing the following:
eigvalue = x_k @ A @ x_k
eigvalue
To see that $x_{100}$ is an approximate eigenvector, observe:
A @ x_k - eigvalue * x_k
The entries above are so small that they look like round-off error.
A
So that we can use it later, we organize the above algorithm into a function. This essentially implements Algorithm 9 of TAK.
def smallest_magnitude_eigenvalue(A, epsilon):
    ''' This function returns a pair consisting of an eigenvalue and an
    eigenvector. The eigenvalue returned is the eigenvalue of smallest
    absolute value. We use the power method and LU-factorization, iterating
    until two consecutive eigenvalue estimates are within epsilon of each other.
    '''
    n = A.shape[0]
    assert A.shape == (n, n), "A should be a square matrix."
    x = 2*np.random.random(n) - 1
    LU_data = linalg.lu_factor(A)
    k = 0
    x_k = x / np.linalg.norm(x)
    eigenvalue_old_est = float('inf')
    eigenvalue_est = x_k @ A @ x_k
    while abs(eigenvalue_est - eigenvalue_old_est) >= epsilon:
        temp = linalg.lu_solve(LU_data, x_k)
        k += 1
        x_k = temp / np.linalg.norm(temp)
        eigenvalue_old_est = eigenvalue_est
        eigenvalue_est = x_k @ A @ x_k
    return eigenvalue_est, x_k
We test it using the same matrix as before.
eigenvalue, eigenvector = smallest_magnitude_eigenvalue(A, 10**-8)
eigenvalue
A @ eigenvector - eigenvalue * eigenvector
It seemed to work.
To find other eigenvalues, we can use the following trick. If $A$ is a matrix and $A v = \lambda v$, then we also have $$(A - cI) v = (\lambda-c)v.$$ That is, $A$ and $A-cI$ have the same eigenvectors, and the eigenvalues of $A-cI$ are those of $A$ shifted down by $c$.
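Here is a quick sanity check of the shift (our own, using numpy's built-in symmetric eigenvalue routine purely for comparison):
# The eigenvalues of A - c*I should be the eigenvalues of A, each shifted down by c.
c_test = 0.5
print(np.linalg.eigvalsh(A - c_test*np.identity(4)))   # eigenvalues of the shifted matrix
print(np.linalg.eigvalsh(A) - c_test)                  # eigenvalues of A, shifted by c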
Let's use this to find the other eigenvalues of $A$. We first found a largest magnitude eigenvalue–eigenvector pair:
eigvalue_0 = 1.9499535745247238
vect_0 = np.array([-0.37108228, -0.55544271, 0.68572761, -0.28906573])
We found the smallest magnitude pair above. We repeat the calculation here:
eigvalue_1, vect_1 = smallest_magnitude_eigenvalue(A, 10**-12)
eigvalue_1
Since real symmetric matrices have all real eigenvalues, the other eigenvalues must lie in the union of intervals $[-1.949\ldots, -0.1291\ldots] \cup [0.1291\ldots, 1.949\ldots]$. We can look for eigenvalues near the centers of these intervals. First we set c to be the center of the interval $[-1.949\ldots, -0.1291\ldots]$:
c = (-eigvalue_0 + eigvalue_1)/2
c
To find the eigenvalue–eigenvector pair whose eigenvalue is nearest to $c$, we can find the smallest magnitude eigenvalue of $A-cI$.
temp_eigvalue, vect = smallest_magnitude_eigenvalue(A - c*np.identity(4), 10**-12)
Because of the shift, the actual eigenvalue should be obtained by adding $c$.
eigvalue = temp_eigvalue + c
eigvalue
Let's check if it worked.
A@vect - eigvalue*vect
These numbers are small, so it seemed to work. Let's store off our eigenvalue–eigenvector pair.
eigvalue_2 = eigvalue
vect_2 = vect
Let's check the other interval $[0.1291\ldots, 1.949\ldots]$ now. We set c to be its midpoint:
c = (eigvalue_0 - eigvalue_1)/2
Now we look for the eigenvalue closest to $c$.
temp_eigvalue, vect = smallest_magnitude_eigenvalue(A - c*np.identity(4), 10**-12)
eigvalue = temp_eigvalue + c
eigvalue
A@vect - eigvalue*vect
Again it seemed to work. We have found a final eigenvalue–eigenvector pair:
eigvalue_3 = eigvalue
vect_3 = vect
Since $A$ is $4 \times 4$, we found all the eigenvalue–eigenvector pairs. To check our work, we verify that we have diagonalized $A$. If we create a diagonal matrix $D$ with entries given by the eigenvalues and a matrix $P$ with columns given by the eigenvectors (in the same order), we should have the equation $$A = P D P^{-1}.$$
D = np.diag([eigvalue_0, eigvalue_1, eigvalue_2, eigvalue_3])
P = np.array([vect_0, vect_1, vect_2, vect_3]).transpose()
To check that the equation is satisfied, we can check that $$AP = PD.$$ (Though, this doesn't check that $P$ is invertible.) To check this equation, we observe that $AP - PD$ has all small entries.
A @ P - P @ D
It seems to work! The easiest way to check that $P$ is invertible is to see that the columns of $P$ are orthonormal. By construction the columns are unit vectors. To see they are pairwise orthogonal we compute their dot products.
for i in range(4):
    for j in range(i+1, 4):
        print(P[:,i] @ P[:,j])
All dot products of columns are nearly zero, so the columns form an (approximate) orthonormal basis for ${\mathbb R}^4$. We conclude that $P$ is invertible. (Any collection of non-zero pairwise orthogonal vectors is linearly independent, since the projection of any vector to the span of the others is the zero vector.)
Remark: It is a theorem that symmetric matrices are orthogonally diagonalizable. If all eigenvalues of a symmetric matrix have algebraic multiplicity one, then whenever $A=PDP^{-1}$ with $D$ diagonal, the matrix $P$ will have pairwise orthogonal columns.
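Since the columns of our $P$ are also (approximately) unit vectors, $P$ is approximately orthogonal, so $P^{-1} \approx P^T$. Here is a quick check of our own:
# For an orthogonal P we have P^T P = I, so P^{-1} = P^T and A = P D P^T.
print(np.abs(P.transpose() @ P - np.identity(4)).max())   # near zero
print(np.abs(A - P @ D @ P.transpose()).max())            # near zero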