SVD Project


Goal

The goal of this project is to learn to use Matlab or Octave to work with the SVD of a matrix. We will cover low-rank approximations, a sort of “super” least squares idea. While I recommend learning to use GNU Octave on your own computer, you may find this online web-form copy to be handy for this project.

\( \let\oldvec=\vec \newcommand{\vec}[1]{\oldvec{\mathbf{#1}}} \newcommand{\bmat}[1]{\left[\begin{array}{rrrrrrr}#1\end{array}\right]} \)

The players

Let \(A = \bmat{ 0.84 & 1.12 & 0.20 \\ 1.08 & 1.44 & -2.60 \\ 1.92 & 2.56 & -2.40 \\ 2.04 & 2.72 & -3.80 \\ 3.72 & 4.96 & -3.40 \\ }\) and let \(\vec{b} = \bmat{ 2.2 \\ 6.0 \\ 8.1 \\ 9.6 \\ 14.2 }\).

We type these into matlab as:

format short g;
A = [ 0.84, 1.12, 0.20; 1.08, 1.44, -2.60; 1.92, 2.56, -2.40; 2.04, 2.72, -3.80; 3.72, 4.96, -3.40 ],
b = [ 2.2; 6.0; 8.1; 9.6; 14.2 ]

A =
   0.84   1.12   0.2
   1.08   1.44  -2.6
   1.92   2.56  -2.4
   2.04   2.72  -3.8
   3.72   4.96  -3.4

b =
    2.2
    6
    8.1
    9.6
   14.2

The rank one approximation

A rank one matrix is basically a vector. It may have many rows and columns, but every row is a multiple of one “true” row, and every column is a multiple of one “true” column. Every rank one matrix can be written in the form \(\vec{u} \vec{v}^T\) where \(\vec{u}\) is the one true column and \(\vec{v}\) describes the multiples of that column that are used in the other columns (and weirdly, \(\vec{v}^T\) is the one true row, and \(\vec{u}\) describes the multiples of that row that are used in the other rows).
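For example (a quick sanity check, not part of the project, with made-up vectors u and v), you can build a rank one matrix in Octave as an outer product and ask for its rank:

u = [1; 2; 3];   % the one "true" column (made-up example)
v = [4; 5];      % the multiples used to build each column
M = u*v'         % every column of M is a multiple of u, every row a multiple of v'
rank(M)          % prints 1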

The singular value decomposition (SVD) of \(A\) writes \(A\) as a sum of rank one matrices where all the \(\vec{u}_i\) are orthogonal and all the \(\vec{v}_i\) are orthogonal. In class, I allowed \(\vec{u}_i\) and \(\vec{v}_i\) to have arbitrary length, but Matlab will always scale them to length 1, and use an extra scalar \(\sigma_i\) to describe their combined length: \[ A = \sum \vec{u}_i \sigma_i \vec{v}_i^T \]

We can use the SVD of \(A\) to solve \(A\vec{x}=\vec{b}\).

Type this into octave:

[u,s,v] = svd(A,0)

u =
  -0.1          -0.5          -0.56565
  -0.3           0.5           0.41278
  -0.4          -6.1637e-17   -0.075052
  -0.5           0.5          -0.6018
  -0.7          -0.5           0.37664

s =
Diagonal Matrix
   10   0   0
    0   2   0
    0   0   3.5605e-16

v =
  -0.48  -0.36   0.8
  -0.64  -0.48  -0.6
   0.6   -0.8   -0
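As a quick check that the sum formula above really does rebuild \(A\), you can add up the rank one pieces yourself (a sketch, not required for the project):

Asum = zeros(size(A));
for i = 1:3
  Asum = Asum + u(:,i)*s(i,i)*v(:,i)';   % add the i-th rank one piece
end
norm(A - Asum)                            % essentially zero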

We ignore the vectors with \( \sigma_3 = 0 \), so we define the following (Octave chose the opposite signs for some of its columns; flipping the signs of both \( \vec{u}_i \) and \( \vec{v}_i \) leaves \( \vec{u}_i \sigma_i \vec{v}_i^T \) unchanged, so either choice is fine):

\( \vec{u}_1 = \bmat{ 0.1 \\ 0.3 \\ 0.4 \\ 0.5 \\ 0.7 }\); \( \vec{u}_2 = \bmat{ 0.5 \\ -0.5 \\ 0 \\ -0.5 \\ 0.5 } \); \( \left[ \begin{array}{rcl} \sigma_1 = 10 \\ \sigma_2 = \phantom{1}2 \end{array}\right] \); \( \vec{v}_1 = \bmat{ 0.48 \\ 0.64 \\ -0.60 } \); \( \vec{v}_2 = \bmat{ 0.36 \\ 0.48 \\ 0.80 } \).

The \( \vec{u}_i \) are an orthonormal basis of \( \operatorname{Col}(A) \), and the \( \vec{v}_i \) are an orthonormal basis of \( \operatorname{Row}(A) \):

\( \|\vec{u}_1\|=1 \); \( \|\vec{u}_2\|=1 \); \( \vec{u}_1 \cdot \vec{u}_2=0 \); \( \|\vec{v}_1\|=1 \); \( \|\vec{v}_2\|=1 \); \( \vec{v}_1 \cdot \vec{v}_2=0 \).
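You can check these claims in Octave directly (a quick sketch using the u and v returned by svd above; the signs there may be flipped, which changes neither lengths nor dot products):

norm(u(:,1)), norm(u(:,2)), u(:,1)'*u(:,2)   % 1, 1, and (essentially) 0
norm(v(:,1)), norm(v(:,2)), v(:,1)'*v(:,2)   % 1, 1, and (essentially) 0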

How close is the rank one approximation to A?

Compute the rank-one approximation \( A_1 = \vec{u}_1 \sigma_1 \vec{v}_1^T \).

A1 = u(:,1)*s(1,1)*v(:,1)'

A1 =
   0.48   0.64  -0.6
   1.44   1.92  -1.8
   1.92   2.56  -2.4
   2.4    3.2   -3
   3.36   4.48  -4.2

Mathematically, \( A_1 = \vec{u}_1 \sigma_1 \vec{v}_1^T = \bmat{ 0.48 & 0.64 & -0.60 \\ 1.44 & 1.92 & -1.80 \\ 1.92 & 2.56 & -2.40 \\ 2.40 & 3.20 & -3.00 \\ 3.36 & 4.48 & -4.20 \\ } \).

To measure how close, we compute \( A - A_1 =\bmat{ 0.36 & 0.48 & 0.80 \\ -0.36 & -0.48 & -0.80 \\ 0.00 & 0.00 & -0.00 \\ -0.36 & -0.48 & -0.80 \\ 0.36 & 0.48 & 0.80 \\ } \). In two important senses (the operator norm and the Frobenius norm), this matrix has length 2.

norm(A-A1), norm(A-A1,'fro')

ans = 2
ans = 2

Compute x1, A1x1, and Ax1

Compute \( \displaystyle \vec{x}_1 = \frac{\vec{b} \cdot \vec{u}_1}{\sigma_1} \vec{v}_1 \).

x1 = (b'*u(:,1))/s(1,1)*v(:,1)

x1 =
   0.96
   1.28
  -1.2

Mathematically, \( \displaystyle \vec{x}_1 = \frac{\vec{b} \cdot \vec{u}_1}{\sigma_1} \vec{v}_1 = \frac{20}{10} \vec{v}_1 = \bmat{ 0.96 \\ 1.28 \\ -1.20 } \)

What is \( A_1 \vec{x}_1 \)? What is \( A\vec{x}_1 \)?

A1*x1, A*x1

ans =
    2
    6
    8
   10
   14

ans =
    2
    6
    8
   10
   14

In other words, \( \displaystyle A_1 \vec{x}_1 = \left( \vec{u}_1 \sigma_1 \vec{v}_1^T \right) \left( \frac{\vec{b} \cdot \vec{u}_1}{\sigma_1} \vec{v}_1 \right) = \frac{ (\vec{b} \cdot \vec{u}_1) \sigma_1 }{\sigma_1} \vec{u}_1 (\vec{v}_1^T \vec{v}_1) = (\vec{b} \cdot \vec{u}_1) \vec{u}_1 = 20\vec{u}_1 = \bmat{ 2\\ 6\\ 8\\ 10\\ 14} \)

How close is Ax1 to b?

How close is \( A_1 \vec{x}_1 \) to \( \vec{b} \)?

A1*x1-b, norm(A1*x1-b)

ans =
  -0.2
   4.4409e-15
  -0.1
   0.4
  -0.2

ans = 0.5

\( \vec{b}-A_1 \vec{x}_1 = \bmat{ 0.2\\0\\0.1\\-0.4\\0.2 } \) has length \( \|A_1 \vec{x}_1 - \vec{b}\|= 0.5 \)

Rank two approximations

The rank one approximation basically replaced the matrix \(A\) by either \(\vec{u}_1\) or \(\vec{v}_1\), depending on how you viewed it. Solving \(A\vec{x} = \vec{b}\) was as simple as \(\vec{x} = \frac{\vec{b} \cdot \vec{u}_1}{\sigma_1} \vec{v}_1\), and multiplying \( A\vec{y} \) simply became \( \vec{u}_1 ( \sigma_1 \vec{v}_1^T \cdot \vec{y} ) \). These are incredibly simple operations, but they are only somewhat accurate. If we need more accuracy, we need a higher dimensional, higher rank, approximation to \(A\).
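To see how cheap the rank one version of a matrix-vector product is, here is a sketch with a made-up vector y, using the u, s, v from svd above (the two answers only agree approximately, which is the whole point of an approximation):

y = [1; 1; 1];                   % any vector with 3 entries (made-up example)
A*y                              % the full matrix-vector product
u(:,1)*(s(1,1)*(v(:,1)'*y))      % rank one version: one dot product, then one scaled copy of u1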

The rank two approximation of \(A\) is simply the sum of two rank one approximations: \( \vec{u}_1 \sigma_1 \vec{v}_1^T + \vec{u}_2 \sigma_2 \vec{v}_2^T \) where \( \sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r \) are the singular values in decreasing order.

Compute the rank 2 approximation:

A2 = A1 + u(:,2)*s(2,2)*v(:,2)'

A2 =
   0.84   1.12   0.2
   1.08   1.44  -2.6
   1.92   2.56  -2.4
   2.04   2.72  -3.8
   3.72   4.96  -3.4

Alternatively, you can use the diagonal matrix of singular values to write the result as a matrix multiplication (this is called “factoring” the matrix). \[A_2 = \bmat{ \uparrow & \uparrow \\ \vec{u}_1 & \vec{u}_2 \\ \downarrow & \downarrow \\ } \cdot \bmat{ \sigma_1 & 0 \\ 0 & \sigma_2 } \cdot \bmat{ \leftarrow & \vec{v}_1 & \rightarrow \\ \leftarrow & \vec{v}_2 & \rightarrow }\]

u(:,1:2)*s(1:2,1:2)*v(:,1:2)'

ans =
   0.84   1.12   0.2
   1.08   1.44  -2.6
   1.92   2.56  -2.4
   2.04   2.72  -3.8
   3.72   4.96  -3.4

Note that this matrix is even closer to \(A\):

A-A2, norm(A-A2)

ans =
   1.1102e-16   2.2204e-16  -1.0547e-15
  -2.2204e-16  -8.8818e-16   1.3323e-15
  -2.2204e-16  -4.4409e-16   0
  -4.4409e-16  -1.3323e-15   0
  -4.4409e-16  -8.8818e-16   1.3323e-15

ans = 2.6782e-15

The distance between \(A\) and \(A_2\) is \(2.678 \times 10^{-15}\), which is close enough to zero for homework.

Finding x2 and Ax2

Now that we have written \(A\) as basically two vectors, \( \vec{u}_1, \vec{u}_2 \) or \( \vec{v}_1, \vec{v}_2 \) depending on your mood, we can efficiently solve \( A\vec{x} = \vec{b} \) by writing \( \vec{b} \) in terms of \( \vec{u}_1, \vec{u}_2 \) and then \( \vec{x} \) in terms of \( \vec{v}_1, \vec{v}_2 \).

b1 = (b'*u(:,1)), b2 = (b'*u(:,2)), x2 = x1 + b2/s(2,2)*v(:,2)

b1 = -20
b2 = -0.4

x2 =
   1.032
   1.376
  -1.04

That is, \[ \vec{b} \approx (\vec{b}^T \cdot \vec{u}_1) \vec{u}_1 + (\vec{b}^T \cdot \vec{u}_2) \vec{u}_2 \] and \[ \vec{x} = \frac{(\vec{b}^T \cdot \vec{u}_1)}{\sigma_1} \vec{v}_1 + \frac{(\vec{b}^T \cdot \vec{u}_2)}{\sigma_2} \vec{v}_2 \] so that \( A\vec{x} = \ldots \)

A*x2, b1*u(:,1) + b2*u(:,2)

ans =
    2.2
    5.8
    8
    9.8
   14.2

ans =
    2.2
    5.8
    8
    9.8
   14.2

That is, \[ \begin{array}{rcl} A\vec{x} &=& \left( \vec{u}_1 \sigma_1 \vec{v}_1^T + \vec{u}_2 \sigma_2 \vec{v}_2^T \right) \cdot \left( \frac{(\vec{b}^T \cdot \vec{u}_1)}{\sigma_1} \vec{v}_1 + \frac{(\vec{b}^T \cdot \vec{u}_2)}{\sigma_2} \vec{v}_2 \right) \\ &=& \vec{u}_1 \frac{ \sigma_1 \left( \vec{v}_1^T \cdot \vec{v}_1\right) \left(\vec{b}^T \cdot \vec{u}_1\right)}{\sigma_1} \\ &&{}+ \vec{u}_1 \frac{ \sigma_1 \left( \vec{v}_1^T \cdot \vec{v}_2\right) \left(\vec{b}^T \cdot \vec{u}_2\right)}{\sigma_2} \\ &&{}+ \vec{u}_2 \frac{ \sigma_2 \left( \vec{v}_2^T \cdot \vec{v}_1\right) \left(\vec{b}^T \cdot \vec{u}_1\right)}{\sigma_1} \\ &&{}+ \vec{u}_2 \frac{ \sigma_2 \left( \vec{v}_2^T \cdot \vec{v}_2\right) \left(\vec{b}^T \cdot \vec{u}_2\right)}{\sigma_2} \\ &=& ( \vec{b}^T \cdot \vec{u}_1) \vec{u}_1 + 0 + 0 + (\vec{b}^T \cdot \vec{u}_2) \vec{u}_2 \end{array} \]

How close is b to Ax2?

We know that \( A\vec{x}_2 = ( \vec{b}^T \cdot \vec{u}_1) \vec{u}_1 + (\vec{b}^T \cdot \vec{u}_2) \vec{u}_2 \) is the projection of \( \vec{b} \) onto the span of \( \{ \vec{u}_1, \vec{u}_2 \} \), so it should be close to \( \vec{b} \). How close is \(\vec{b}\) to this rank-two approximation?

res = A*x2 - b, norm(res)

res =
   4.4409e-16
  -0.2
  -0.1
   0.2
   3.5527e-15

ans = 0.3

So we have not yet gotten \( \vec{b} \) exactly. We are still \( 0.3 \) away; the leftover vector \( A\vec{x}_2 - \vec{b} \) is called the “residual,” and its length measures how far off we are. Is there any way to get closer? Not with \( A \)! Let's look at the columns of \( A \) versus the residual.

res'*A

ans =
   2.5207e-14   3.3653e-14  -3.2019e-14

Notice that \( \vec{res}^T \cdot \vec{a}_i = 0 \) (up to rounding error) for each column \( \vec{a}_i \) of \(A\). The residual is orthogonal to the column space of \(A\). There is nothing we can do to get closer to \( \vec{b} \).

Other x

Could we improve \( \vec{x} \) in some other way? Well, what happens if we try \( \vec{x}_3 = \vec{x}_2 + \bmat{ 0.8 \\ -0.6 \\ 0 } \)?

x3 = x2 + [0.8;-0.6;0], A*x3-b

x3 =
   1.832
   0.776
  -1.04

ans =
   4.4409e-16
  -0.2
  -0.1
   0.2
   5.3291e-15

Well, we do get a different \(\vec{x}\), but we don't really get a different residual: \( A\vec{x}_2 = A\vec{x}_3 \). How do we decide which \( \vec{x}\) is better? For some applications, both \( \vec{x} \) are fine. Sometimes, however, a bigger \( \vec{x} \) is more expensive, and if we aren't even going to get \( \vec{b} \) exactly, we certainly don't want to overpay for the privilege. Which \( \vec{x} \) is smaller?

norm(x2), norm(x3)

ans = 2.01
ans = 2.245

Looks like \( \vec{x}_2\) is smaller and gets the job done just as well (which is \(0.3\) away from actually getting the job done, no offense, \( \vec{x}_2 \)). Luckily, this smallest possible \( \vec{x} \) is always the one the SVD will give us.
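In fact, this minimum-length least-squares answer is exactly what the pinv command (the “pseudoinverse,” which Octave builds from the SVD) returns; a quick check, not required for the project:

pinv(A)*b      % matches x2 (up to rounding)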

Conclusion

The SVD replaces a matrix with a set of orthonormal vectors, either \( \vec{u}_i \) or \( \vec{v}_i \) depending on your perspective. To solve \( A\vec{x} = \vec{b} \) we write \( \vec{b} \) in terms of the \( \vec{u}_i \) and \( \vec{x} \) in terms of the \( \vec{v}_i \). In fact the coordinates for \( \vec{x} \) and \( \vec{b} \) are almost the same: you just divide the coordinates of \( \vec{b} \) by the numbers \( \sigma_i \).
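In Octave, that whole recipe fits in two lines. Here is a sketch using the first two singular triples from the svd output above (we skip the third because \( \sigma_3 = 0 \)):

c = u(:,1:2)'*b;                      % coordinates of b in terms of u1 and u2
x = v(:,1:2)*(c./diag(s(1:2,1:2)))    % divide by the sigmas and rebuild x from v1 and v2; matches x2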

If you only use one pair of vectors, \( \vec{u}_1 \sigma_1 \vec{v}_1^T \), then you should probably use the one with the largest \( \sigma_1 \). In this case you get the rank-one approximation to \( A \). If you use the largest \(k\) such \(\sigma_i\), then you get the rank \(k\) approximation. Solving now requires \( k \) divisions and \( 2k \) dot products instead of just 1 and 2, but the answers will be closer.

The SVD gives you the ability to trade accuracy for simplicity. A higher dimensional approximation will be more accurate, but takes longer and may be harder to understand. Facial recognition software tends to take 100000-dimensional vectors and reduce them to 50-dimensional vectors. My wife and I are currently working on a \( 3000000 \times 1000 \) matrix in order to improve workflows in our university library; we are using the SVD to identify the most important aspects of the work.

The approximation described here using the SVD is the unique rank \(k\) approximation to \(A\) of minimum error (any other choice of \(k\) rank-one pieces lands farther from \(A\), whether you measure the distance with the operator norm or the Frobenius norm). In other words, this is the best trade-off you can hope for.
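For our matrix you can see this in the numbers we already computed: measured with the operator norm, the distance from \(A\) to its rank \(k\) approximation is exactly the first singular value that was thrown away, \[ \|A - A_1\| = \sigma_2 = 2, \qquad \|A - A_2\| = \sigma_3 \approx 0, \] matching the norms found above.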

If you want to work with larger matrices, you may not want the “svd” command. Instead, the “svds” command can be told exactly how many singular values \(k\) you want.

[u,s,v] = svds(A,2)

u =
  -0.1  -0.5
  -0.3   0.5
  -0.4   0
  -0.5   0.5
  -0.7  -0.5

s =
Diagonal Matrix
   10   0
    0   2

v =
  -0.48  -0.36
  -0.64  -0.48
   0.6   -0.8

Notice that \(\sigma_3 = 0\) and its accompanying vectors \( \vec{u}_3\) and \( \vec{v}_3 \) have been left out.
