Rudimentary Linear Algebra and Matrix Decompositions


Draft in preparation!

In this article I am going to discuss linear algebra from a computational point of view, dealing with matrix computations, in contrast to abstract linear algebra, which deals with the geometric or vector-space point of view. So let's jump right in!

Let's suppose you visit a cafe on two days: the first day you order a coffee and a couple of donuts and pay 14 bucks; the second day you go with friends, order 3 coffees and 8 donuts, and pay 50 bucks. The question is: how much does a coffee cost at that cafe? The branch of mathematics that answers such questions by using symbols to represent the prices of the coffee and the donut is known as algebra.

Let $x$ be the price of a coffee and $y$ be the price of a donut; then the first and second visits to the cafe can be written mathematically as $$x+2y=14,$$ $$3x+8y=50,$$ respectively. By now you have probably figured out that a coffee costs \$6 and a donut \$4. You can simply multiply the first equation by 4 to get $4x+8y=56$, then subtract the second equation from it to get $x+0y=56-50$, and finally you have $x=6$. This method is known as elimination of variables, since we eliminated $y$ to get the value of $x$.
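To make the steps concrete, here is a minimal Python sketch of the same elimination; storing each equation as a list of coefficients is purely for illustration.

```python
# Each equation is stored as [coffee coefficient, donut coefficient, total paid].
eq1 = [1, 2, 14]   # x + 2y = 14
eq2 = [3, 8, 50]   # 3x + 8y = 50

# Scale the first equation by 4 so both equations have the same donut term (8y).
scaled = [4 * c for c in eq1]                  # 4x + 8y = 56
# Subtracting the second equation eliminates y.
diff = [a - b for a, b in zip(scaled, eq2)]    # 1x + 0y = 6
x = diff[2] / diff[0]                          # price of a coffee
y = (eq1[2] - eq1[0] * x) / eq1[1]             # back-substitute for the donut price
print(x, y)                                    # 6.0 4.0
```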

This problem may seem trivial; however, many problems in science and engineering can be formulated in this manner. We have some unknown quantities $\mathbf{x}=[x_1,x_2,...,x_m]$; we perform some experiments and observe some other quantities $\mathbf{b}=[b_1,b_2,...,b_n]$. We assume a linear relationship between $\mathbf{x}$ and $\mathbf{b}$ and solve the $n$ equations to find the $m$ unknowns. In the cafe problem we had two unknowns, i.e., $m=2$: $x$ and $y$, the prices of a coffee and a donut; and two observations, i.e., $n=2$: the amount paid on each visit.

The elimination of variables method discussed above works well for small $m$ and $n$, but in real-world problems their values can range from hundreds to many thousands. For such large problems we organize our data in a format that follows some simple and common rules. One such format is called a matrix. A matrix is a collection of numerical data organized into rows and columns.
The equations $$x_1+2x_2=14,$$ $$3x_1+8x_2=50$$ can be represented in matrix form as \[ \begin{pmatrix} 1 & 2 \\ 3 & 8 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 14 \\ 50 \end{pmatrix}. \] In matrix notation we write $$\mathbf{A}\mathbf{x} = \mathbf{b},$$ where $\mathbf{A} =\begin{pmatrix} 1 & 2 \\ 3 & 8 \end{pmatrix} $, $\mathbf{x}= \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ and $\mathbf{b}= \begin{pmatrix} 14 \\ 50 \end{pmatrix}$. To find $\mathbf{x}$, we need to compute $$\mathbf{x} = \mathbf{A}^{-1}\mathbf{b},$$ where $\mathbf{A}^{-1}$ is the inverse of the matrix $\mathbf{A}$. In general, finding the inverse of a matrix is a computationally expensive task. In the remainder of this article we will discuss how to solve this matrix inverse problem efficiently; but before that, let's look at some common types of matrices that arise frequently in practice.
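As a quick sanity check, here is a short NumPy sketch that solves this system both by explicitly forming $\mathbf{A}^{-1}$ and by calling a solver directly; in practice the solver route is preferred, since it factorizes $\mathbf{A}$ instead of inverting it.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 8.0]])
b = np.array([14.0, 50.0])

# Explicitly forming the inverse works, but is wasteful and less accurate.
x = np.linalg.inv(A) @ b
print(x)                       # [6. 4.]

# Preferred: solve A x = b directly (NumPy factorizes A under the hood).
print(np.linalg.solve(A, b))   # [6. 4.]
```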

Types of Matrices

Diagonal Matrix: All the off-diagonal elements in a diagonal matrix are zero, i.e., $$A_{i,j} = 0\; \forall\; i\neq j\,.$$ $$ A = \begin{pmatrix} a_{1,1} & 0 & 0 \\ 0 & a_{2,2} & 0 \\ 0 & 0 & a_{3,3} \end{pmatrix} $$ The inverse of a diagonal matrix (assuming all diagonal entries are nonzero) is obtained by simply taking the reciprocal of the diagonal elements, i.e., $A^{-1}_{i,i}\;=\;1/A_{i,i}$. $$ A^{-1} = \begin{pmatrix} \frac{1}{a_{1,1}} & 0 & 0 \\ 0 & \frac{1}{a_{2,2}} & 0 \\ 0 & 0 & \frac{1}{a_{3,3}} \end{pmatrix} $$
Identity Matrix: An identity matrix is a diagonal matrix with the diagonal values equal to 1, i.e., $$I_{i,j} = 1\; \forall\; i = j \,.$$ The identity matrix is the inverse of itself. $$I = I^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
Symmetric Matrix: A symmetric matrix remains unchanged if we exchange its rows with its columns, an operation known as the transpose, i.e., \begin{align*} A_{i,j}\; &=\; A_{j,i}, \\ A^T\; &=\; A \end{align*} $$ A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 2 & 0 \\ 3 & 0 & 3 \end{pmatrix} $$
Triangular Matrix: A triangular matrix has all elements below (or above) the diagonal equal to zero, i.e., \begin{align*} U_{i,j}\; &=\; 0\; \forall\; i\; \gt\; j &&\; \text{Upper Triangular} \\ L_{i,j}\; &=\; 0\; \forall\; i\; \lt\; j &&\; \text{Lower Triangular} \end{align*} $$ U = \begin{pmatrix} u_{1,1} & u_{1,2} & u_{1,3} \\ 0 & u_{2,2} & u_{2,3} \\ 0 & 0 & u_{3,3} \end{pmatrix} \quad L = \begin{pmatrix} l_{1,1} & 0 & 0 \\ l_{2,1} & l_{2,2} & 0 \\ l_{3,1} & l_{3,2} & l_{3,3} \end{pmatrix} $$ It is straightforward to invert a triangular matrix, or to solve a triangular system, using the forward or backward substitution method (see the sketch after this list of matrix types). The substitution method is very similar to the elimination of variables method.
Orthogonal Matrix: A matrix $Q$ is orthogonal if its transpose is equal to its inverse, i.e., $$Q^{-1}=Q^{T},$$ so $Q^TQ = QQ^T = I$ and inverting $Q$ costs no more than transposing it.
Positive Definite Matrix: For the sake of completeness, let's define a symmetric matrix $M$ to be positive definite if $$z^{\mathrm{T}}Mz\; \gt\; 0$$ for every non-zero vector $z$.
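To tie these types together, here is a small NumPy sketch illustrating their properties; the `forward_substitution` helper is just an illustrative name, not a library routine.

```python
import numpy as np

# Diagonal matrix: the inverse is the reciprocal of the diagonal
# (assuming every diagonal entry is nonzero).
D = np.diag([2.0, 4.0, 5.0])
D_inv = np.diag(1.0 / np.diag(D))
print(np.allclose(D @ D_inv, np.eye(3)))       # True

# Forward substitution: solve L x = b for a lower-triangular L,
# one unknown per row, from the top down.
def forward_substitution(L, b):
    x = np.zeros_like(b, dtype=float)
    for i in range(len(b)):
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

L = np.array([[1.0, 0.0],
              [2.0, 2.0]])
print(forward_substitution(L, np.array([1.0, 6.0])))   # [1. 2.]

# Orthogonal matrix: a rotation matrix satisfies Q^T Q = I, so Q^{-1} = Q^T.
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(Q.T @ Q, np.eye(2)))         # True

# Positive definite: a symmetric matrix is positive definite exactly when
# all of its eigenvalues are positive.
M = np.array([[1.0, 2.0],
              [2.0, 8.0]])
print(np.all(np.linalg.eigvalsh(M) > 0))       # True
```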

Eigenvalues and Eigenvectors

Matrix Decompositions

We have seen that diagonal, triangular and orthogonal matrices can be inverted efficiently; hence, it would be useful to transform any matrix into a product of diagonal, triangular and/or orthogonal matrices, invert these canonical matrices, and multiply the inverses back in reverse order (since $(BC)^{-1} = C^{-1}B^{-1}$) to get the inverse of the original matrix, as shown in the figure below.

Inverse using matrix decomposition
This process of transforming a matrix into a product of simpler matrices in order to gain some computational efficiency is known as matrix decomposition. Besides computational efficiency, matrix decomposition provides analytic simplicity: given the simple form of the component matrices, we can readily see the inherent characteristics of the data.

There are many decomposition algorithms available, and each one is suited to a particular category of matrices. Here we will discuss the most common decompositions: LU, Cholesky, QR, eigen, and singular value decomposition (SVD).

LU Decomposition

LU decomposition factorizes a matrix into a lower triangular matrix $L$ and an upper triangular matrix $U$. To see how this works, let's compute the LU decomposition of the symmetric matrix $\begin{bmatrix} 1 & 2 \\ 2 & 8 \end{bmatrix}$ (a close cousin of our coefficient matrix, which we will reuse for the Cholesky decomposition below): \[ \begin{bmatrix} 1 & 2 \\ 2 & 8 \end{bmatrix} = \begin{bmatrix} l_{11} & 0 \\ l_{21} & l_{22} \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} \\ 0 & u_{22} \end{bmatrix}. \] Multiplying the matrices on the right and equating terms we have the following equations \begin{align*} 1 &= l_{11}u_{11}\\ 2 &= l_{21}u_{11}\\ 2 &= l_{11}u_{12}\\ 8 &= l_{21}u_{12} + l_{22}u_{22} \end{align*} Since there are $4$ equations and $6$ unknowns, the system of equations is under-determined and we are free to set $u_{11}=u_{22}=1$. Then we have \begin{align*} l_{11} &= 1\\ l_{21} &= 2\\ l_{11}u_{12} &= 2 \implies u_{12} = 2\\ l_{21}u_{12} + l_{22} &= 8 \implies l_{22} = 4 \end{align*} Thus we have \[ \begin{bmatrix} 1 & 2 \\ 2 & 8 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}. \] LU decomposition requires roughly $\frac{2n^3}{3}$ floating-point operations for an $n \times n$ matrix, i.e., its complexity is $\mathcal{O}(n^3)$.
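To check the hand computation, here is a minimal Crout-style LU sketch without pivoting; the `crout_lu` helper is purely illustrative (library routines such as `scipy.linalg.lu` also permute rows for numerical stability, so their factors may look different).

```python
import numpy as np

def crout_lu(A):
    """Crout-style LU without pivoting: A = L U with a unit diagonal on U."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    U = np.eye(n)
    for j in range(n):
        # Fill column j of L using the already-computed part of U.
        for i in range(j, n):
            L[i, j] = A[i, j] - L[i, :j] @ U[:j, j]
        # Fill row j of U to the right of the diagonal.
        for k in range(j + 1, n):
            U[j, k] = (A[j, k] - L[j, :j] @ U[:j, k]) / L[j, j]
    return L, U

A = np.array([[1.0, 2.0],
              [2.0, 8.0]])
L, U = crout_lu(A)
print(L)                       # [[1. 0.]
                               #  [2. 4.]]
print(U)                       # [[1. 2.]
                               #  [0. 1.]]
print(np.allclose(L @ U, A))   # True
```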

Cholesky Decomposition

In mathematics, two of the defining properties of a distance function are that it is always non-negative and symmetric, i.e., $D(a,b)\ge0$ and $D(a,b)= D(b,a)$. Problems involving distance functions therefore tend to produce matrices that are symmetric and positive definite, and since many real-world problems involve distance functions, there is an abundance of symmetric positive-definite matrices.

Can we do better than LU decomposition on symmetric positive-definite matrices? Can we exploit the symmetry? It turns out that we can decompose a symmetric positive-definite matrix into triangular matrices roughly twice as fast as LU decomposition. The resulting decomposition is known as the Cholesky decomposition.

Cholesky decomposition exploits and preserves symmetry such that $$ A = LL^T.$$ Fortunately, our example matrix $A$ is symmetric and positive definite, so we can write it as \[ \begin{bmatrix} 1 & 2 \\ 2 & 8 \end{bmatrix} = \begin{bmatrix} l_{11} & 0 \\ l_{21} & l_{22} \end{bmatrix} \begin{bmatrix} l_{11} & l_{21} \\ 0 & l_{22} \end{bmatrix}. \] Multiplying the matrices on the right and equating terms we have the following equations \begin{align*} 1 &= l_{11}l_{11}\\ 2 &= l_{11}l_{21}\\ 8 &= l_{21}l_{21} + l_{22}l_{22} \end{align*} Then we have \begin{align*} l_{11} &= 1\\ l_{21} &= 2\\ l_{22} &= 2\\ \end{align*} Plugging the values into the triangular matrices, we have \[ \begin{bmatrix} 1 & 2 \\ 2 & 8 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & 2 \end{bmatrix}. \]
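NumPy exposes this directly: `np.linalg.cholesky` returns the lower-triangular factor, and a quick sketch confirms the hand computation.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 8.0]])

# np.linalg.cholesky returns the lower-triangular L with A = L L^T.
L = np.linalg.cholesky(A)
print(L)                          # [[1. 0.]
                                  #  [2. 2.]]
print(np.allclose(L @ L.T, A))    # True
```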

Eigen Decomposition

QR Decomposition

The QR decomposition factorizes a matrix $A$ into an orthogonal matrix $Q$ and an upper triangular matrix $R$.
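Here is a minimal NumPy sketch of the QR factorization of our original coefficient matrix; note that the signs of the factors may differ between implementations, since the decomposition is only unique up to sign choices.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 8.0]])

Q, R = np.linalg.qr(A)
print(np.allclose(Q.T @ Q, np.eye(2)))   # True: Q is orthogonal
print(np.allclose(np.triu(R), R))        # True: R is upper triangular
print(np.allclose(Q @ R, A))             # True: the factors reproduce A
```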

Singular Value Decomposition