Talk on Chapter 6 Eigenvalues and Eigenvectors
(Linear Algebra Fall 1999)
Setup (Needed for using Maple, not for viewing)
Raw start (if without wqstools)
Consider a linear transformation
from a vector space
to itself and fix a basis
for
. We have seen that there is an associated n by n matrix
for
which gives us a formula to compute the coordinate vector
as
for any given vector
. The transformation itself can be then conveniently written as
.
We have also seen that if we change the basis to
for some n by n invertible matrix
, then the matrix for
changes to the matrix
which is said to be
similar
to
and we would like to investigate when this new matrix can be diagonal.
The columns of the matrix
turned out to be eigenvectors of
belonging to the corresponding eigenvalue on the resulting diagonal matrix. We study the details of these topics next.
We defined a vector
to be
an eigenvector for the matrix
if
is
nonzero
and
for some scalar
. The corresponding scalar
is said to be the eigenvalue that
belongs to.
How do we find eigenvectors and eigenvalues? First note that if we consider the matrix
, then
must be a nonzero vector in its null space. So, the matrix
must be singular and we know a very simple test for this, namely
. If we expand this determinant and simplify the expression, then it is easy to that we get a polynomial in
of degree
, where
has size n by n.
Indeed, the resulting polynomial is known as the characteristic polynomial
charpoly(A,lambda)
and is a builtin function. Actually, it is more customary to use the
where the two coefficients
and
have nice interpretations.
The coefficient
can be readily computed as the sum of all the main diagonal entries of
. You have already seen the formulas for the determinant.
Thus, all the eigenvalues are given by roots of the equation
and we know that this equation has n roots, at least if we go to complex numbers as needed and account for multiplicities.
We see some quick examples.
>
A:=matrix(2,2,[1,4,2,3]);charpoly:=factor(det(lambda*diag(1$2)-A));
eigenvals(A);
So, the eigenvalues are
.
>
A:=matrix(3,3,[.2,.4,0,.4,.8,0,0,0,1]);
charpoly:=factor(det(lambda*diag(1$3)-A));
eigenvals(A);
Because of the decimal points, maple went into numerical mode. However, the actual eigenvalues are simply 0,1. Note that 1 is a double eigenvalue! Maple can also find the eigenvectors, or you can use your skill with null spaces!
> eigenvects(A);
> nullspace(A-diag(1$3));
>
Thus, the calculation of the eigenvalues and eigenvectors is at least easy to describe, though it can be difficult in practice, since it involves solving a complicated determinant, then solving a polynomial (nonlinear) equation and then several null space calculations. We can also run into complex roots and numerical difficulties. It is because of these complications that this is considered a difficult task and there are alternate numerical procedures described in the book. We outline several tricks below.
For a given scalar
, we define
the
eigenspace
to be the set of all vectors
such that
or
.
Thus
is simply the nullspace of
. All nonzero vectors of
are eigenvectors of
belonging to
. Of course, unless
is an eigenvalue, the space
shall be the zero vector space and won't produce any eigenvector!
Consider two distinct scalars
and take bases for their eigenspaces to be
and
, respectively. We can show that the vectors together are still independent and give a basis for the sum of the two eigenspaces. To see this, we write a linear combination of all these vectors as
equated to 0, where
is the sum of the terms coming from {
} and
is the sum from the terms {
}.
We now get two equations:
and by applying the transformation
. It is easy to deduce that
and hence by hypothesis, all the coefficients are zero!
This easily generalizes to several eigenvectors to give us the result that if we take all the eigenvalues of a matrix
and put the bases of their eigenspaces together, then we get an independent set of vectors.
Diagonalization: when can we achieve it?
In particular, in the lucky situation when
has
distinct eigenvalues, then taking one eigenvector each, we get
independent vectors
, belonging to eigenvalues
respectively. Then setting
to be the matrix with columns
, we clearly see that
where
. Thus, the matrix
is diagonalizable!
We had already seen an example of a nondiagonalizable matrix where
. Here
is the only eigenvalue with multiplicity
, but the eigenspace
has dimension 1, with basis
.
There is also the case of complex roots, for example the matrix
, whose eigenvalues come out to be
the two square roots of
. Actually, this matrix is diagonalizable with
. You can check it. However, it cannot be diagonalized with any real matrix. Indeed, in real vectors, it has no eigenvector, since geometrically, it is a clockwise rotation about the origin by
radians and so no nonzero vector can be mapped to a multiple of itself!
Thus, we know two ways in which we can get stuck: if there are nonreal eigenvalues or if there are repeated eigenvalues.
Symmetric matrices are good (diagonalizable)!
If the matrix
is a symmetric matrix, then we can show that all eigenvalues are real and the matrix indeed is diagonalizable. We take this up next. What we present is one complete proof which is only briefly indicated in the book.
First, suppose that
is a symmetric matrix and has one real eigenvalue
, with a real eigenvector
, say. Let
be an orthonormal complement of the space spanned by
. Suppose that we choose an orthonormal basis
for
so that together with
, we get an orthonormal basis for all of
. We claim that with respect to the basis
, the matrix changes to
where
is a symmetric matrix operating on
.
Indeed, write the basis
, where
is the usual standard basis in
. Then from the orthonormality of the basis
, we see that the matrix
is itself orthogonal, i.e.
. Thus the matrix
transforms to
and is clearly a symmetric matrix. From the fact that
is nothing but
, we see that the first column of the transformed matrix is
. By symmetry of the transformed matrix, we easily see that it must have the announced shape!
Now, it is not hard to see that we can continue working on the new symmetric matrix
and deduce the final result that the matrix
can be diagonalized.
Thus it only needs to be proved that any eigenvalue of a (real) symmetric matrix is real. Suppose that
has a complex eigenvalue
and a corresponding complex eigenvector
. Then it also must have a conjugate eigenvalue
and a corresponding eigenvector
obtained by conjugating all the entries of
.
We clearly have
and
. This last equation gives, when transposed,
.
The quantity
can be computed as
times
to get the value
, but by computing it as
times
, we see it to be
. Since
is clearly nonzero, being the sum of squares of the absolute values of components of
, we see that
. Thus
is real! Now, we can always pick a real eigenvector for a real
, since it is simply a part of the basis for the null space of
.
More properties of symmetric matrices.
Symmetric matrices have many nice properties which we now collect together for reference.
We have already seen that all the eigenvalues of a (real) symmetric matrix are real. We can also show that eigenvectors belonging to different eigenvalues are automatically perpendicular to each other. For this, suppose that
is symmetric and has eigenvectors
belonging to different eigenvalues
respectively. Compute
in two different ways as above to get an equation
. Since
are distinct, we must have
or that
are perpendicular.
Thus, in order to get an orthonormal basis of eigenvectors of
, all we do is first pick orthonormal bases for the different eigenspaces and then just put them together in one combined basis. Getting an orthonormal basis for the eigenspace of a single eigenvalue
, all we do is pick some basis first and then make it orthonormal by the old Gram-Schmidt process, which we already know.
In particular, the diagonalizing matrix
can be chosen to be orthogonal, i.e.
, where
.
As a consequence, note that if the symmetric matrix has only one eigenvalue
, then it must be simply the scalar matrix
. This is because, by theory, A=
, but the last expression is simply
.
Symmetric matrices, inner products and related topics
Symmetric matrices can be used to define a kind of dot product or inner product on
. To understand this, let us review what properties a dot product should have.
Clearly a dot product should have
Symmetry:
Linearity: (
).
=
.
+
.
,
(
)=(
).
Properness:
and
iff
.
If we have the first two properties, then the usual algebra of dot products works and with the last property, we can define length and angles properly.
If we take any symmetric matrix
and define
, then the first two properties are easy to check. The last property of properness needs much more checking and this is what we are investigating.
The last property translates to two subproperties:
Nonnegativity:
and
Definiteness:
iff
.
We say that the matrix
is
positive definite, if it has both these subproperties.
It is
positive semidefinite, if only the nonnegativity holds.
Finally, we say that
is indefinite, if there are negative as well as positive self dot products.
We can change
to
and talk about negative definite and negative semidefinite as well!
First, we note that we can replace the matrix
by a similar matrix
where
without affecting the properties. But then
acquires a diagonal appearance and then we have very simple tests for either of these definiteness. Namely, if the diagonal entries are assumed to be
, then we have:
is positive definite iff all
are positive
is positive semidefinite iff all
are nonnegative
is said to be indefinite if we have mixed signs among the entries.
Similar tests are easily made for negative versions.
But the above test is expensive, for we need to build an orthonormal basis, which always involves lot of work. Calculating eigenvalues is no easy task either. We do know a simpler method already, which we now invoke.
We know that we can take our matrix
and write it as
where
is a diagonal matrix and
is an invertible lower trianglular matrix. We learnt this process while we did the alternate form of the Gram-Schmidt process in a special lecture. At that time, we built a symmetric matrix by taking dot products of a sequence of vectors, now we are starting with a symmetric matrix already!
If we look at the columns
of
, then we see that in our current dot product, they are a set of mutually perpendicular (meaning zero dot product) vectors. Thus, for the matrix to be positive definite, all self dot products must be positive, i.e., the diagonal entries must be all positive. Moreover it is not hard to see that for any combination
, the self dot product comes out to be
, where
denotes our current dot product. This is easily seen to be positve for nonzero vectors
.
Actually, there is an even simpler way of expressing this result. We can convince ourselves that the product of the first
diagonal entries
for the resulting matrix
is simply the first
by
determinant in
. (This presumes that we don't swap any rows along the way, and we can always arrange this to be the case). So, if this sequence of determinants can be calculated alternately, then we can state the test as checking that all these determinants are positive. These are sometimes called principle minors.
For example:
> A:=matrix(3,3,[2,-1,0,-1,2,-1,0,-1,2]);
>
[det(submatrix(A,1..1,1..1)),det(submatrix(A,1..2,1..2)),
det(submatrix(A,1..3,1..3))];
So,
is positive definite!
> A:=matrix(3,3,[2,-1,b,-1,2,-1,b,-1,2]);
>
[det(submatrix(A,1..1,1..1)),det(submatrix(A,1..2,1..2)),
det(submatrix(A,1..3,1..3))];
> factor(%);
>
So, we can say what values of
will keep this matrix positive definite or positive semidefinite or indefinite.
For
or for
, we will get indefinite,
for
or
, we will get positive semidefinite and positive definite otherwise!
Application to the critical points of multivariate functions
One of the main uses of this analysis is calculation of the behavior of a multivariate function near a critical point. To illustrate the method, consider a function
which has a zero at the origin and its first two partials
and
are also zero. Then a Taylor expansion gives a second order approximation
for small
. This is a quadratic form
and can be represented as a dot product
which uses the matrix
. Thus the test for positive definiteness is
and
. If you note that
,
and
, you will recognize the test for a local minimum.
Positive semidefiniteness corresponds to a minimum with some critical directions, while indefiniteness corresponds to a saddle point.
The concepts of negative definiteness etc. correspond to local maximum.
Similar tests are easily made in higher dimensions, since we already have the concept worked out in linear algebra.
The expression in
above is a quadratic form in two variables. In general, a quadratic form in several variables, gives a symmetric matrix where the square terms go along the main diagonal and product terms of two varibles go in the appropriate corners after division by
. We can perform the analysis of positive and negative values of the quadratic form in a similar fashion.
As already explained, we have no hope of finding a diagonal form for the matrix of a transformation from
to
, in case the two dimensions are different. Also it does not make sense to try and use the same basis for the domain as well as the target, since these don't even have the same dimension. So, we allow a choice of bases
for
and
for
and try to make the matrix as simple as possible. We have also indicated roughly how this is achieved. Now we describe it in detail.
First we show an example.
Consider a matrix
. This gives a transormation from
to
, given by
in the standard bases for respective spaces.
We compute the matrix
and find a basis of
, consisting of eigenvectors for the symmetric matrix
, which we know to exist now. We will see that the part of the basis corresponding to the eigenspace of
shall be a basis for the null space of
and the rest will map to a set of orthogonal vectors in the target space. Here are the calculations.
> A:=matrix(2,3,[1,1,0,0,1,1]);
> M:=evalm(transpose(A)&*A);
> lis:=eigenvects(M);
> S:=transpose(matrix(3,3,[lis[3][3][1],lis[2][3][1],lis[1][3][1]]));
> AA:=evalm(A&*S);
> evalm(transpose(AA)&*AA);
> evalm(transpose(S)&*S);
>
This shows that if we choose the vectors
given by the three columns of
are mutually perpendicular. So,
will give an orthonormal basis for
, such that
are mutually perpendicular and give a basis of
. We can set
as a basis of the target
and the resulting matrix has a nice diagonal form. Explicitly, we set the basis of
by the columns of
below and the the basis of
by the columns of
below. The matrix of the transformation is also shown.
> G:=evalm(S&*diag(1/sqrt(2),1/sqrt(6),1/sqrt(3)));
> H:=augment(col(AA,1)/norm(col(AA,1),2),col(AA,2)/norm(col(AA,2),2));
> evalm(A&*G);
> ans:=matrix(2,3,[1,0,0,0,sqrt(3),0]);
> evalm(H&*ans);
Let us record the whole procedure for future use.
Assume that we have a map from
to
. Assume that in the standard bases, it is given by a matrix
.
The matrix
is a symmetric n by n matrix. It can be diagonalized. Find an orthonormal basis
for
consisting of eigenvectors of
. Arrange the vectors so that the last
vectors are in the eigenspace of
, i.t. in the null space of
, where
is the rank of
. Call the resulting basis
.
Calculate
. Make an orthonormal basis for
consisting of
for
. Fill it up, if needed, by
vectors to complete an orthonormal basis for the
. Call the resulting basis
.
Then the matrix of the transformation in the bases
is given by
. Here
is a diagonal entries which are simply square roots of the eigenvalues of
, matching the corresponding vectors.
Thus, for our example, the final matrix is
.
The so-called pseudoinverse shall be
.
Let us do one more example.
> A:=matrix(3,3,[1,1,0,0,1,1,1,0,-1]);
> M:=evalm(transpose(A)&*A);
> lis:=eigenvects(M);
> S:=transpose(matrix(3,3,[lis[2][3][1],lis[2][3][2],lis[1][3][1]]));
The first two columns of
are from the same eigenvalue 3 and so need not be orthogonal. Indeed they are not.
We check their mutual dot products and take the one evident step of the Gram-Schmidt algorithm.
> evalm(transpose(S)&*S);
>
S1:=addcol(S,1,2,1/2);
S2:=evalm(transpose(S1)&*S1);
So, now we can build our
.
For
we take the three columns of
, which we normalize by dividing by the norm.
For
we take the images of the first two columns of
, normalized and we need to fill in one extra vector to make an orthonormal basis.
> AA:=evalm(A&*S1);
> evalm(transpose(AA)&*AA);
We picked the third column for
by inspection.
>
H:=augment(col(AA,1)/norm(col(AA,1),2),col(AA,2)/norm(col(AA,2),2),
matrix(3,1,[1/sqrt(3),-1/sqrt(3),-1/sqrt(3)]));
> evalm(transpose(H)&*H);
>
>
> G:=evalm(S1&*diag(1/sqrt(S2[1,1]),1/sqrt(S2[2,2]),1/sqrt(S2[3,3])));
> evalm(A&*G);
> evalm(transpose(%)&*%);
> ans:=matrix(3,3,[sqrt(3),0,0,0,sqrt(3),0,0,0,0]);
>
> evalm(H&*ans);
Thus the final matrix in the new bases
is
. This is verified by the identity
.
Finally, we explain the
pseudoinverse
of the matrix which is simply
. You shuld work out the spaces
as explained in the lecture on Chapter 7.
As explained above, we have a transformation
from
to
and we have found bases
respectively as follows.
First, let
be the matrix of
in some standard bases.
The basis
of
is
which is known to be orthonormal. Further, we have arranged it so that
form a set of orthonormal eigenvectors for
and
for
.
The orthonormal basis for
is
and is chosen so that
and is orthonormal. Then the matrix form
in the bases
is given by
where
is a diagonal matrix with diagonal entries
.
Then the pseudoinverse is given by first taking the transpose of
and then replacing the entries
by their inverses
. Denote this by
.
Then the transformation given by the matrix
gives the projection of
into the space
in the basis
. The reverse product
gives the projection of
into
in the basis
.