Any matrix \( A \in \mathbb{R}^{m \times n} \) (i.e., any linear transformation) restricts to an invertible map between the range of \( A^T \) and the range of \( A \). The pseudo-inverse of a matrix performs this inverse.
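As a quick numerical illustration of this claim (a sketch in numpy; the rank-2 test matrix is an arbitrary choice, and the SVD is used here only as a convenient way to get orthonormal bases for the two ranges):

```python
import numpy as np

rng = np.random.default_rng(0)
# A deliberately rank-deficient A in R^{4x5} (rank 2).
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))

# Orthonormal bases: rows of Vt span range(A^T), columns of U span range(A).
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))

# Restricted to these two subspaces, A acts as an r x r matrix of full rank,
# i.e. an invertible map.
A_restricted = U[:, :r].T @ A @ Vt[:r].T
print(np.linalg.matrix_rank(A_restricted))  # prints r
```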
Gaussian Elimination Perspective

As shown in the section on Gaussian elimination, for a general matrix \( A \in \mathbb{R}^{m \times n} \) of rank \(k\) (possibly after permuting the columns of \(A\) so that the pivot columns come first) we can find an invertible \(E \in \mathbb{R}^{m \times m} \) that is a composition of elementary matrices \( E = E_\ell \cdots E_1 \) such that $$ EA = \begin{bmatrix} I & B \\ 0 & 0 \end{bmatrix} $$ with \( I \in \mathbb{R}^{k \times k} \) and \( B \in \mathbb{R}^{k \times (n-k)} \). It will be helpful to decompose \( E \) and also \( E^{-1} \) as $$ E = \begin{bmatrix} - & E' & - \\ - & E'' & - \end{bmatrix}, \qquad E^{-1} = \begin{bmatrix} | & | \\ F' & F'' \\ | & | \end{bmatrix} $$ where \( E' \in \mathbb{R}^{k \times m} \), \(E'' \in \mathbb{R}^{(m-k) \times m} \), \( F' \in \mathbb{R}^{m \times k} \), and \(F'' \in \mathbb{R}^{m \times (m-k)} \). Note briefly that we have $$ I = EE^{-1} = \begin{bmatrix} - & E' & - \\ - & E'' & - \end{bmatrix} \begin{bmatrix} | & | \\ F' & F'' \\ | & | \end{bmatrix} = \begin{bmatrix} E'F' & E'F'' \\ E''F' & E'' F'' \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix} $$ and also that \( I = E^{-1}E = F'E'+F''E'' \). (Note in particular that \(E'F'' = 0\) and \(E''F' = 0\).) Returning to our discussion of Gaussian elimination, it is useful to write the above equation as a decomposition of \(A\): $$ A = E^{-1} \begin{bmatrix} I & B \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} | & | \\ F' & F'' \\ | & | \end{bmatrix} \begin{bmatrix} I & B \\ 0 & 0 \end{bmatrix} = F' \begin{bmatrix} I & B \end{bmatrix} $$ It is clear from this that the columns of \( F' \) form a basis for the range of \(A\). (The above construction shows that \(F'\) must span the range of \(A\), and it must also have linearly independent columns since it is a column subset of \(E^{-1}\), which is invertible.) We also have that the rows of \(E''\) form a basis for the nullspace of \(A^T\): the fact that \(E''F' = 0 \) gives \( E''A = E''F'\begin{bmatrix} I & B \end{bmatrix} = 0 \), a similar linear-independence argument applies, and rank-nullity (applied to the co-domain) shows the dimensions match.
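Here is a small sympy sketch of this construction (the rank-2 matrix below is a made-up example whose pivot columns happen to come first, so no column permutation is needed):

```python
import sympy as sp

# Rank-2 example whose first two columns are the pivot columns.
A = sp.Matrix([[1, 2, 3],
               [2, 5, 7],
               [3, 7, 10]])
m, n = A.shape
k = A.rank()

# Row-reducing the augmented matrix [A | I] yields [EA | E].
R, _ = A.row_join(sp.eye(m)).rref()
EA, E = R[:, :n], R[:, n:]

Ep, Epp = E[:k, :], E[k:, :]          # E' and E''
F = E.inv()
Fp, Fpp = F[:, :k], F[:, k:]          # F' and F''

print(EA)                        # the block form [[I, B], [0, 0]]
print(A - Fp * EA[:k, :])        # zero matrix: A = F' [I  B]
print(Epp * Fp, Epp * A)         # both zero: rows of E'' lie in null(A^T)
```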
For an equation of the form \(y = Ax \), with a general \(A\) that may not be full rank, there may be no solution, since \(y\) may not be in the range of \(A\); and however small we can make the difference \(y-Ax\), there may be a whole subspace of \(x\)'s that bring \(Ax\) equally close to \(y\), since \(A\) may have a nontrivial nullspace. The above decomposition can help us find \(x\) in a sensible way. Multiplying \(y=Ax\) on the left by \(E\) gives $$ Ey = EAx $$ $$ \begin{bmatrix} E'y \\ E''y \end{bmatrix} = \begin{bmatrix} I & B \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x' \\ x'' \end{bmatrix} $$ where \(x'\) holds the first \(k\) entries of \(x\) and \(x''\) the remaining \(n-k\). Note that since the bottom (block) row on the right is all zeros, the bottom equation can only be satisfied if \(E'' y = 0 \), or in other words if \(y\) is in the range of \(A\) (i.e., orthogonal to the nullspace of \(A^T\)). Also, since every \(x\) gets multiplied by zeros, every \(x\) is equally good/bad as far as the bottom equation is concerned. The top equation then becomes $$ E'y = x' + Bx'' $$ In this form we can simply choose \(x' = E'y\) (and \(x''=0\)) to satisfy the top equation. In some sense, then, \(E'\) is a pseudo-inverse that finds a value for \(x\) that gets \(Ax\) as close to \(y\) as possible. Adding in a component in the nullspace, we have that $$ x = \begin{bmatrix} E'y \\ 0 \end{bmatrix} + \begin{bmatrix} B \\ -I \end{bmatrix} z $$ is a general solution to the equation (or at least as close as we can get). The component \(E''y\) gives the coordinates of the residual: using \(I = F'E' + F''E''\), the difference is \(y - Ax = (I - F'E')y = F''E''y\).
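Continuing the sympy sketch above, we can check these claims for an arbitrary \(y\) chosen to lie outside the range of \(A\):

```python
# An arbitrary y not in the range of A.
y = sp.Matrix([1, 2, 4])
print(Epp * y)                   # nonzero, so y = Ax has no exact solution

B = EA[:k, k:]
x0 = (Ep * y).col_join(sp.zeros(n - k, 1))     # x' = E'y, x'' = 0
z = sp.Matrix([5])                             # any z in R^{n-k}
x = x0 + B.col_join(-sp.eye(n - k)) * z        # the general solution

print(A * x - A * x0)                # zero: the nullspace component is invisible
print(y - A * x0 - Fpp * Epp * y)    # zero: the residual is F'' E'' y
```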
Moore-Penrose Pseudo-Inverse

Perhaps the most often used pseudo-inverse is the Moore-Penrose pseudo-inverse \( A^\dagger \), based on the singular value decomposition. Let the SVD of \(A\) be $$ A = U \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix}V^T $$ where \( \Sigma \) is the diagonal matrix of nonzero singular values. The Moore-Penrose pseudo-inverse is given by $$ A^\dagger = V \begin{bmatrix} \Sigma^{-1} & 0 \\ 0 & 0 \end{bmatrix}U^T $$ Note that the order of the orthogonal transformations has changed and we've inverted the singular values. Decomposing further we get $$ A^\dagger = \begin{bmatrix} | & | \\ V' & V'' \\ | & | \\ \end{bmatrix} \begin{bmatrix} \Sigma^{-1} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} - & {U'}^T & - \\ - & {U''}^T & - \end{bmatrix} = V'\Sigma^{-1}{U'}^T $$ We can note several things about the approximate solution to \(y=Ax\) $$ x = A^\dagger y = V'\Sigma^{-1}{U'}^Ty $$ First, \(x\) will be in the span of \(V'\), i.e. the range of \(A^T\), and thus will not have a component in the nullspace of \(A\) (i.e. it will be a minimum-norm solution). Second, since the first step of the inverse is to project \(y\) onto the range of \(A\) (onto the span of \(U'\)), the pseudo-inverse minimizes the difference between \(Ax\) and \(y\). Explicitly, $$ \min_x \ \Vert y-Ax\Vert_2^2 = \min_x \ (y-Ax)^T(y-Ax) $$ $$ = \min_x \ y^Ty - 2y^TU'\Sigma {V'}^Tx + x^TV'\Sigma^2{V'}^Tx$$ Applying the coordinate transform \(x = Vz = V'z' + V''z'' \) gives \({V'}^Tx = z'\), and thus the optimization problem above becomes $$ \min_{z'} \ y^Ty - 2y^TU'\Sigma z' + {z'}^T\Sigma^2 z'$$ (the free variable \(z''\) does not appear, so setting \(z''=0\) gives the minimum-norm choice). Differentiating and solving for \(z'\) gives $$ z' = \Sigma^{-2} \Sigma {U'}^Ty = \Sigma^{-1}{U'}^Ty \qquad \Rightarrow \qquad x = V'z' = V'\Sigma^{-1}{U'}^Ty $$ which is precisely the pseudo-inverse.
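A minimal numpy sketch of this construction (the rank-2 test matrix is arbitrary, and `np.linalg.pinv` is used only as a cross-check):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # rank 2

# Thin SVD pieces: U' is m x r, Sigma is r x r, V' is n x r.
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))

# A^dagger = V' Sigma^{-1} U'^T: swap the orthogonal factors, invert Sigma.
A_dag = Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T
print(np.allclose(A_dag, np.linalg.pinv(A)))    # True
```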
Thus we have that the Moore-Penrose pseudo-inverse gives the minimum-norm least-squares solution.
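Continuing the numpy sketch above, both properties are easy to verify numerically (the right-hand side \(y\) is arbitrary):

```python
# A^dagger y is a least-squares solution...
y = rng.standard_normal(5)
x = A_dag @ y
x_ls = np.linalg.lstsq(A, y, rcond=None)[0]     # reference least-squares fit
print(np.allclose(A @ x, A @ x_ls))             # True: same residual

# ...and adding any nullspace component (the span of V'') only grows the norm.
x_other = x + Vt[r:].T @ rng.standard_normal(len(s) - r)
print(np.allclose(A @ x_other, A @ x))              # True: still least squares
print(np.linalg.norm(x) < np.linalg.norm(x_other))  # True: x has minimum norm
```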