Compressed Sensing


If a signal is changing 10 times every second, the Nyquist-Shannon sampling theorem states that in order to recover that signal we need at least 20 measurements per second. More precisely, the sampling rate should be at least twice the highest frequency contained in the measured signal. The Nyquist-Shannon sampling theorem (1949) has dictated signal processing for more than 50 years.
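
To make this concrete, here is a small illustrative sketch (not part of the original argument): a 10 Hz sinusoid sampled above the Nyquist rate shows its spectral peak at 10 Hz, while sampling it below that rate makes it masquerade as a lower-frequency signal (aliasing).

import numpy as np

f_sig = 10.0                        # a signal oscillating 10 times per second

def dominant_freq(fs, duration=2.0):
    # sample the sinusoid at rate fs and return the strongest frequency in its spectrum
    t = np.arange(0, duration, 1.0 / fs)
    x = np.sin(2 * np.pi * f_sig * t)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]

print(dominant_freq(fs=50.0))   # sampled well above the Nyquist rate: peak at 10 Hz
print(dominant_freq(fs=12.0))   # sampled below the Nyquist rate: aliased peak at 2 Hz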

For applications such as magnetic resonance imaging, electroencephalography and other biomedical sensing modalities, measurements are time-consuming and sometimes costly. A natural question, then, is whether such signals can be acquired more efficiently. If the signal being acquired has some structure, for example if it is sparse, then the answer is yes. The area of signal processing that allows us to acquire high-dimensional sparse signals from very few measurements is known as compressed sensing. In this blog I will provide a basic introduction to the mathematical basis of compressed sensing and point to some of its applications.

Let's suppose our signal of interest is an $n$-dimensional vector $\mathbf{x}\in \mathbb{C}^{n}$. We take $m$ measurements $\mathbf{y}\in \mathbb{C}^{m}$. The original signal $\mathbf{x}$ and its measurements $\mathbf{y}$ are related by a sensing matrix $\mathbf{\Phi}\in \mathbb{C}^{m\times n}$ $$ \begin{equation} \mathbf{y} = \mathbf{\Phi} \mathbf{x} \end{equation}$$ The sensing matrix depends on the underlying mechanism of the measuring modality, e.g., for magnetic resonance imaging the sensing matrix is (a subset of rows of) the Fourier transform matrix. If the number of measurements is smaller than the dimension of the signal, $m \lt n$, we have fewer equations than unknowns; the system of equations is under-determined and has infinitely many solutions. That is, there are many different values of the signal $\mathbf{x}$ that will result in the same measurements $\mathbf{y}$.

To give a concrete example, let's assume that we want to measure the $4$-dimensional state of your heart, whose true value $\mathbf{x}_{true} = [-2,0,1,0]^T$ is unknown to us. Our measuring modality can output $3$-dimensional encoded measurements and has a sensing matrix $\mathbf{\Phi}$ given by $$\begin{pmatrix} 1 & 3 & 3 & 2 \\ 2 & 6 & 9 & 7 \\ -1 & -3 & 3 & 4 \end{pmatrix}.$$ We get the encoded measurement $\mathbf{y}=[1, 5, 5]^T$ as follows: $$\begin{pmatrix} 1 \\ 5 \\ 5 \end{pmatrix} = \begin{pmatrix} 1 & 3 & 3 & 2 \\ 2 & 6 & 9 & 7 \\ -1 & -3 & 3 & 4 \end{pmatrix} \begin{pmatrix} -2 \\ 0 \\ 1 \\ 0 \end{pmatrix}.$$ Solving $\mathbf{y} = \mathbf{\Phi} \mathbf{x}$ for $\mathbf{x}$, we get: $$ \begin{align} x_1 &= -2 - 3x_2 + x_4, \\ x_2 &= \text{free variable}, \\ x_3 &= 1 - x_4, \\ x_4 &= \text{free variable}. \end{align} $$ As is apparent from the solution, there are two free variables: one because we have one measurement fewer than unknowns, and another because the sensing matrix has rank 2, i.e., it is rank deficient by one. We can assign any values to $x_2$ and $x_4$; no matter what values we choose, multiplying the resulting $\mathbf{x}$ with $\mathbf{\Phi}$ yields the same measurements $\mathbf{y}=[1, 5, 5]^T$. Thus we see that there are infinitely many solutions (e.g. $[-1, 0, 0, 1]$, $[-5, 1, 1, 0]$, etc.) that result in our measurement being $[1,5,5]^T$. The puzzle is: which solution represents the true state of your heart? Can we recover $\mathbf{x}_{true}$ from $\mathbf{y}$?
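
As a quick numerical check (a small sketch that is not part of the original derivation), we can multiply a few of these candidate solutions by $\mathbf{\Phi}$ and confirm that they all produce the same measurements:

import numpy as np

# the sensing matrix from the example above
phi = np.array([[ 1,  3, 3, 2],
                [ 2,  6, 9, 7],
                [-1, -3, 3, 4]])

# a few candidate solutions: the true state and two others from the parametrization
candidates = [np.array([-2, 0, 1, 0]),
              np.array([-1, 0, 0, 1]),    # x2 = 0, x4 = 1
              np.array([-5, 1, 1, 0])]    # x2 = 1, x4 = 0

for x in candidates:
    print(x, '->', phi.dot(x))            # each candidate maps to [1 5 5]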

In general, the answer is no! However, if we assume some structure about the nature of $\mathbf{x}_{true}$, for example if we know that $\mathbf{x}_{true}$ is sparse, i.e., it contains many zeros, then we can efficiently recover $\mathbf{x}_{true}$. If we also know the locations of the zeros, we can delete the corresponding columns from the sensing matrix and we have $$\begin{pmatrix} 1 \\ 5 \\ 5 \end{pmatrix} = \begin{pmatrix} 1 & 3 \\ 2 & 9 \\ -1 & 3 \end{pmatrix} \begin{pmatrix} -2 \\ 1 \\ \end{pmatrix}.$$ Using a least-squares approach on this reduced system we get $[-2, 1]$ for the non-zero entries, and putting the zeros back at their known locations recovers $[-2, 0, 1, 0]$.
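
Here is a minimal sketch of this step, assuming the zero locations (entries 2 and 4) are known in advance: solve the reduced system by least squares and then put the zeros back.

import numpy as np

# sensing matrix with the columns at the known zero locations removed
phi_reduced = np.array([[ 1, 3],
                        [ 2, 9],
                        [-1, 3]])
y = np.array([1, 5, 5])

# least-squares solution of the reduced system
x_nonzero, *_ = np.linalg.lstsq(phi_reduced, y, rcond=None)
print(x_nonzero)                 # approximately [-2.  1.]

# re-insert the zeros at their known locations
x_full = np.zeros(4)
x_full[[0, 2]] = x_nonzero
print(x_full)                    # approximately [-2.  0.  1.  0.]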

Let's step back a little and analyze why there are infinitely many solutions that satisfy the system of equations. Recall that the null space of a matrix is given by $$ Null\{\mathbf{\Phi}\} =\{\mathbf{x}_N \,|\, \mathbf{\Phi}\mathbf{x}_N=\mathbf{0}\}. $$ If the matrix $\mathbf{\Phi}$ has fewer rows than columns (or, more generally, rank less than $n$), as in our example above, then the null space is non-trivial, $Null\{\mathbf{\Phi}\} \neq \{\mathbf{0}\}$. Thus if we move our true or particular solution $\mathbf{x}_T$ along the null space, our measurements $\mathbf{y}$ remain unchanged. Mathematically: \begin{align} \mathbf{y} &= \mathbf{\Phi} \mathbf{x}_T + \mathbf{0} \\ \mathbf{y} &= \mathbf{\Phi} \mathbf{x}_T + \mathbf{\Phi} \mathbf{x}_N \\ \mathbf{y} &= \mathbf{\Phi} [\mathbf{x}_T + \mathbf{x}_N] \\ \mathbf{y} &= \mathbf{\Phi} \mathbf{x} \end{align} Thus any solution $\mathbf{x}$ is the true solution $\mathbf{x}_T$ shifted by an arbitrary vector $\mathbf{x}_N$ in the null space of $\mathbf{\Phi}$. If we don't know anything about $\mathbf{x}_T$ a priori, we have no reason to choose one of these infinitely many solutions over another.
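
To see this numerically, here is a small sketch using scipy.linalg.null_space (a helper not used elsewhere in this post): adding any vector from the null space of $\mathbf{\Phi}$ to the true solution leaves the measurements unchanged.

import numpy as np
from scipy.linalg import null_space

phi = np.array([[ 1,  3, 3, 2],
                [ 2,  6, 9, 7],
                [-1, -3, 3, 4]])
x_true = np.array([-2., 0., 1., 0.])
y = phi.dot(x_true)

# an orthonormal basis for the null space of phi (two vectors, since phi has rank 2)
N = null_space(phi)
print(N.shape)                                  # (4, 2)

# shift the true solution along an arbitrary null-space direction
x_shifted = x_true + N.dot(np.array([3.0, -1.5]))
print(np.allclose(phi.dot(x_shifted), y))       # True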

However, if we know something about $\mathbf{x}_T$, say that it represents the energy expenditure of your heart, a natural choice would be the solution $\mathbf{x}$ with the minimum energy. The notion of energy is usually represented by the $L_2$ norm, hence our problem of solving the system of equations becomes \begin{align} \min_{\mathbf{x}} ||\mathbf{x}||_2^2 \quad \text{such that} \quad \mathbf{y} = \mathbf{\Phi} \mathbf{x} \end{align} This is a standard quadratic programming problem with linear constraints for which we can derive a closed-form solution. Consider the Lagrangian function \begin{align} \mathcal{L}(\mathbf{x}, \mathbf{\lambda}) = \mathbf{x}^T\mathbf{x} + \mathbf{\lambda}^T(\mathbf{\Phi} \mathbf{x} - \mathbf{y}). \end{align} Setting the derivatives of the Lagrangian with respect to $\mathbf{x}$ and $\mathbf{\lambda}$ to zero we have \begin{align} \frac{\partial \mathcal{L}(\mathbf{x}, \mathbf{\lambda})}{\partial \mathbf{x}} &= 2\mathbf{x} + \mathbf{\Phi}^T\mathbf{\lambda} = \mathbf{0} \implies \mathbf{x} = -\frac{1}{2}\mathbf{\Phi}^T\mathbf{\lambda}\\ % \frac{\partial \mathcal{L}(\mathbf{x}, \mathbf{\lambda})}{\partial \mathbf{\lambda}} &= \mathbf{\Phi}\mathbf{x} - \mathbf{y} =\mathbf{0} \implies \mathbf{\Phi}\mathbf{x} = \mathbf{y} \end{align} To know more about the derivatives of matrix-vector products please see Matrix Calculus. Substituting $\mathbf{x} = -\frac{1}{2}\mathbf{\Phi}^T\mathbf{\lambda}$ into the constraint gives $\mathbf{\lambda} = -2\big(\mathbf{\Phi}\mathbf{\Phi}^T\big)^{-1}\mathbf{y}$, and eliminating $\mathbf{\lambda}$ we have the minimum $L_2$-norm (least-norm) solution: $$ \mathbf{x} =\mathbf{\Phi}^T \big( \mathbf{\Phi}\mathbf{\Phi}^T\big)^{-1}\mathbf{y}. $$ Note that this closed form requires $\mathbf{\Phi}\mathbf{\Phi}^T$ to be invertible, i.e., $\mathbf{\Phi}$ must have full row rank. Our example matrix only has rank 2, so the code below uses the Moore-Penrose pseudoinverse in place of the ordinary inverse; this yields the same minimum-norm solution and reduces to the formula above whenever $\mathbf{\Phi}$ has full row rank.

import numpy as np
import numpy.linalg as la

# Sensing matrix
phi = np.array([[ 1,  3, 3, 2],
                [ 2,  6, 9, 7],
                [-1, -3, 3, 4]])

# True state
x_t = np.array([-2,0,1,0])

# Encoded state measured
y = phi.dot(x_t)

# Computing the minimum-norm solution
# phi has rank 2, so phi*phi^T is singular and cannot be inverted;
# the pseudoinverse is used instead (it equals the ordinary inverse
# whenever phi has full row rank)
tmp1 = la.pinv( phi.dot(phi.transpose()) )
# phi^T * (phi * phi^T)^+
phiInv = phi.transpose().dot( tmp1 )

# Minimum L2-norm (least-norm) solution
x = phiInv.dot(y)

print('Recovered state x:',x)
Output:
Recovered state x: [-0.14285714 -0.42857143  0.42857143  0.57142857]

Minimizing the $L_2$ norm, however, is not very useful in the case of sensing and acquisition applications. In these cases we know that the underlying signal is sparse, so we would prefer solutions that are sparse. Sparsity is really measured by the number of non-zero entries (the $L_0$ pseudo-norm), but minimizing that directly is combinatorially hard; the $L_1$ norm is its closest convex surrogate, and, loosely speaking, the smaller the $L_1$ norm of a vector, the sparser the vector tends to be. Thus our problem becomes \begin{align} \min_{\mathbf{x}} ||\mathbf{x}||_1 \quad \text{such that} \quad \mathbf{y} = \mathbf{\Phi} \mathbf{x} \end{align} Unfortunately, we don't have a closed-form solution for this minimization; however, the $L_1$ norm is convex, so there are no spurious local minima, and numerical optimization techniques can be used to find the global minimum.

import numpy as np
import numpy.linalg as la
import scipy.optimize as opt

# Sensing matrix
phi = np.array([[ 1,  3, 3, 2],
                [ 2,  6, 9, 7],
                [-1, -3, 3, 4]])

# True state
x_t = np.array([-2,0,1,0])

# Encoded state measured
y = phi.dot(x_t)

# setting up the optimization problem
xinit = [-1.9,0.1,0.9,0.1]      # initial solution
l1 = lambda x: la.norm(x,1)     # function to minimize
cons = {'type': 'eq', 'fun': lambda x:  phi.dot(x) - y} # equality constraint
# run the optimizer
res = opt.minimize(l1, xinit, method='SLSQP', constraints=cons, tol=1e-20, options={'maxiter':5000})

print('Recovered state x:', res.x)
print('L1 norm of the recovered state x:', l1(res.x))
print('L1 norm of the true state x_t:', l1(x_t))
print('Equality constraint satisfied?:', phi.dot(res.x) == y )
Output:
Recovered state x: [ -1.19477312e+00  -9.62374574e-09   1.94773146e-01   8.05226854e-01]
L1 norm of the recovered state x: 2.19477312698
L1 norm of the true state x_t: 3.0
Equality constraint satisfied?: [ True  True  True]

Although we got a solution with the minimum $L_1$ norm, we were unable to recover $\mathbf{x_T}$ with just the minimum $L_1$ norm solution. Here we come to the central question of compressed sensing. Given that our signal of interest $\mathbf{x_T} \in \mathbb{C}^{n}$ is $s$-sparse, i.e., there are only $s$ non-zero elements in it, what sensing mechanism $\mathbf{\Phi} \in \mathbb{C}^{m \times n}$ and how many measurements $m$ do we need to exactly recover $\mathbf{x_T}$? Compressed sensing theory says that we can design sensing matrices $\mathbf{\Phi}$ such that the $L_1$ minimization described above will exactly recover $\mathbf{x_T}$ if the number of measurements $$ m \geq c\, s\log\big(\frac{n}{s}\big), $$ where $c$ is a positive constant. It has been shown that a matrix $\mathbf{\Phi}$ whose entries are drawn independently from a Gaussian distribution with zero mean and variance $1/m$ satisfies the required conditions with high probability. Thus to recover a 10000-dimensional signal that we know to be 100-sparse, we need only on the order of $100\log(100)$ measurements (times the small positive constant $c$), i.e., a few hundred measurements rather than 10000.
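
As a back-of-the-envelope check of that claim (assuming the natural logarithm; a different base only changes the constant $c$), the bound for this example evaluates to:

import numpy as np

n, s = 10000, 100
# number of measurements suggested by the bound, ignoring the constant c
print(s * np.log(n / s))         # roughly 460.5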

Using a random sensing matrix, i.e., sampling measurements randomly, and minimizing the $L_1$ norm indeed results in the exact recovery of $\mathbf{x_T}$, with high probability, as shown in the snippet below:

import numpy as np
import numpy.linalg as la
import scipy.optimize as opt
np.random.seed(42)      # random seed for reproducibility

# Random sensing matrix
phi = np.random.randn(3,4)

# True state
x_t = np.array([-2.,0.,1.,0.])

# Encoded state measured
y = phi.dot(x_t)

# setting up the optimization problem
xinit = np.array([0,0.,0.,0.])      # initial solution
l1 = lambda x: la.norm(x,1)     # function to minimize
cons = {'type': 'eq', 'fun': lambda x:  phi.dot(x) - y} # equality constraint
# run the optimizer
res = opt.minimize(l1, xinit, method='SLSQP', constraints=cons, tol=1e-20, options={'maxiter':5000})
print('Recovered state x:', res.x)
print('L1 norm of the recovered state x:', l1(res.x))
print('L1 norm of the true state x_t:', l1(x_t))
print('Equality constraint satisfied?:', phi.dot(res.x) == y )
Output:
Recovered state x: [ -2.00000000e+00   6.26397568e-10   1.00000000e+00  -3.18492422e-10]
L1 norm of the recovered state x: 3.0000000006
L1 norm of the true state x_t: 3.0
Equality constraint satisfied?: [ True  True  True]

To learn more about compressed sensing please refer to the excellent review paper An Introduction to Compressive Sampling by Candès and Wakin (IEEE Signal Processing Magazine, 2008).

Acknowledgement: This article is the culmination of rigorous discussions with Dr. Jawad Shah. The thumbnail comic image for this blog is taken from here. Please direct your feedback to najeeb dot khan at usask dot ca.