Any matrix \( A \in \mathbb{R}^{m \times n} \) (i.e., any linear transformation) restricts to an invertible map between the range of \( A^T \) and the range of \( A \). The pseudo-inverse of a matrix performs this inverse.
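As a quick numerical illustration of this claim (a sketch in numpy; the rank-2 test matrix is an arbitrary choice, and the SVD is used here only as a convenient way to get orthonormal bases for the two ranges):

```python
import numpy as np

rng = np.random.default_rng(0)
# A deliberately rank-deficient A in R^{4x5} (rank 2).
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))

# Orthonormal bases: rows of Vt span range(A^T), columns of U span range(A).
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))

# Restricted to these two subspaces, A acts as an r x r matrix of full rank,
# i.e. an invertible map.
A_restricted = U[:, :r].T @ A @ Vt[:r].T
print(np.linalg.matrix_rank(A_restricted))  # prints r
```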
Gaussian Elimination Perspective

As shown in the section on Gaussian elimination, for a general matrix \( A \in \mathbb{R}^{m \times n} \) of rank \(k\) (possibly after permuting the columns of \(A\) so that the pivot columns come first) we can find an invertible \(E \in \mathbb{R}^{m \times m} \) that is a composition of elementary matrices \( E = E_\ell \cdots E_1 \) such that $$ EA = \begin{bmatrix} I & B \\ 0 & 0 \end{bmatrix} $$ with \( I \in \mathbb{R}^{k \times k} \) and \( B \in \mathbb{R}^{k \times (n-k)} \). It will be helpful to decompose \( E \) and also \( E^{-1} \) as $$ E = \begin{bmatrix} - & E' & - \\ - & E'' & - \end{bmatrix}, \qquad E^{-1} = \begin{bmatrix} | & | \\ F' & F'' \\ | & | \end{bmatrix} $$ where \( E' \in \mathbb{R}^{k \times m} \), \(E'' \in \mathbb{R}^{(m-k) \times m} \), \( F' \in \mathbb{R}^{m \times k} \), and \(F'' \in \mathbb{R}^{m \times (m-k)} \). Note briefly that we have $$ I = EE^{-1} = \begin{bmatrix} - & E' & - \\ - & E'' & - \end{bmatrix} \begin{bmatrix} | & | \\ F' & F'' \\ | & | \end{bmatrix} = \begin{bmatrix} E'F' & E'F'' \\ E''F' & E'' F'' \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix} $$ and also that \( I = E^{-1}E = F'E'+F''E'' \). (Note in particular that \(E'F'' = 0\) and \(E''F' = 0\).) Returning to our discussion of Gaussian elimination, it is useful to write the above equation as a decomposition of \(A\): $$ A = E^{-1} \begin{bmatrix} I & B \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} | & | \\ F' & F'' \\ | & | \end{bmatrix} \begin{bmatrix} I & B \\ 0 & 0 \end{bmatrix} = F' \begin{bmatrix} I & B \end{bmatrix} $$ It is clear from this that the columns of \( F' \) form a basis for the range of \(A\). (The above construction shows that \(F'\) must span the range of \(A\), and it must also have linearly independent columns since it is a column subset of \(E^{-1}\), which is invertible.) We also have that the rows of \(E''\) form a basis for the nullspace of \(A^T\): the fact that \(E''F' = 0 \) gives \( E''A = E''F'\begin{bmatrix} I & B \end{bmatrix} = 0 \), a similar linear-independence argument applies, and rank-nullity (applied to the co-domain) shows the dimensions match.
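Here is a small sympy sketch of this construction (the rank-2 matrix below is a made-up example whose pivot columns happen to come first, so no column permutation is needed):

```python
import sympy as sp

# Rank-2 example whose first two columns are the pivot columns.
A = sp.Matrix([[1, 2, 3],
               [2, 5, 7],
               [3, 7, 10]])
m, n = A.shape
k = A.rank()

# Row-reducing the augmented matrix [A | I] yields [EA | E].
R, _ = A.row_join(sp.eye(m)).rref()
EA, E = R[:, :n], R[:, n:]

Ep, Epp = E[:k, :], E[k:, :]          # E' and E''
F = E.inv()
Fp, Fpp = F[:, :k], F[:, k:]          # F' and F''

print(EA)                        # the block form [[I, B], [0, 0]]
print(A - Fp * EA[:k, :])        # zero matrix: A = F' [I  B]
print(Epp * Fp, Epp * A)         # both zero: rows of E'' lie in null(A^T)
```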
For an equation of the form \(y = Ax \), with a general \(A\) that may not be full rank, there may be no solution, since \(y\) may not be in the range of \(A\); and however small we can make the difference \(y-Ax\), there may be a whole subspace of \(x\)'s that bring \(Ax\) equally close to \(y\), since \(A\) may have a nontrivial nullspace. The above decomposition can help us find \(x\) in a sensible way. Multiplying \(y=Ax\) on the left by \(E\) gives $$ Ey = EAx $$ $$ \begin{bmatrix} E'y \\ E''y \end{bmatrix} = \begin{bmatrix} I & B \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x' \\ x'' \end{bmatrix} $$ where \(x'\) holds the first \(k\) entries of \(x\) and \(x''\) the remaining \(n-k\). Note that since the bottom (block) row on the right is all zeros, the bottom equation can only be satisfied if \(E'' y = 0 \), or in other words if \(y\) is in the range of \(A\) (i.e., orthogonal to the nullspace of \(A^T\)). Also, since every \(x\) gets multiplied by zeros, every \(x\) is equally good/bad as far as the bottom equation is concerned. The top equation then becomes $$ E'y = x' + Bx'' $$ In this form we can simply choose \(x' = E'y\) (and \(x''=0\)) to satisfy the top equation. In some sense, then, \(E'\) is a pseudo-inverse that finds a value for \(x\) that gets \(Ax\) as close to \(y\) as possible. Adding in a component in the nullspace, we have that $$ x = \begin{bmatrix} E'y \\ 0 \end{bmatrix} + \begin{bmatrix} B \\ -I \end{bmatrix} z $$ is a general solution to the equation (or at least as close as we can get). The component \(E''y\) gives the coordinates of the residual: using \(I = F'E' + F''E''\), the difference is \(y - Ax = (I - F'E')y = F''E''y\).
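Continuing the sympy sketch above, we can check these claims for an arbitrary \(y\) chosen to lie outside the range of \(A\):

```python
# An arbitrary y not in the range of A.
y = sp.Matrix([1, 2, 4])
print(Epp * y)                   # nonzero, so y = Ax has no exact solution

B = EA[:k, k:]
x0 = (Ep * y).col_join(sp.zeros(n - k, 1))     # x' = E'y, x'' = 0
z = sp.Matrix([5])                             # any z in R^{n-k}
x = x0 + B.col_join(-sp.eye(n - k)) * z        # the general solution

print(A * x - A * x0)                # zero: the nullspace component is invisible
print(y - A * x0 - Fpp * Epp * y)    # zero: the residual is F'' E'' y
```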
Moore-Penrose Pseudo-Inverse

Perhaps the most often used pseudo-inverse is the Moore-Penrose pseudo-inverse \( A^\dagger \), based on the singular value decomposition. Let the SVD of \(A\) be $$ A = U \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix}V^T $$ where \( \Sigma \) is the diagonal matrix of nonzero singular values. The Moore-Penrose pseudo-inverse is given by $$ A^\dagger = V \begin{bmatrix} \Sigma^{-1} & 0 \\ 0 & 0 \end{bmatrix}U^T $$ Note that the order of the orthogonal transformations has changed and we've inverted the singular values. Decomposing further we get $$ A^\dagger = \begin{bmatrix} | & | \\ V' & V'' \\ | & | \\ \end{bmatrix} \begin{bmatrix} \Sigma^{-1} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} - & {U'}^T & - \\ - & {U''}^T & - \end{bmatrix} = V'\Sigma^{-1}{U'}^T $$ We can note several things about the approximate solution to \(y=Ax\) $$ x = A^\dagger y = V'\Sigma^{-1}{U'}^Ty $$ First, \(x\) will be in the span of \(V'\), i.e. the range of \(A^T\), and thus will not have a component in the nullspace of \(A\) (i.e. it will be a minimum-norm solution). Second, since the first step of the inverse is to project \(y\) onto the range of \(A\) (onto the span of \(U'\)), the pseudo-inverse minimizes the difference between \(Ax\) and \(y\). Explicitly, $$ \min_x \ \Vert y-Ax\Vert_2^2 = \min_x \ (y-Ax)^T(y-Ax) $$ $$ = \min_x \ y^Ty - 2y^TU'\Sigma {V'}^Tx + x^TV'\Sigma^2{V'}^Tx$$ Applying the coordinate transform \(x = Vz = V'z' + V''z'' \) gives \({V'}^Tx = z'\), and thus the optimization problem above becomes $$ \min_{z'} \ y^Ty - 2y^TU'\Sigma z' + {z'}^T\Sigma^2 z'$$ (the free variable \(z''\) does not appear, so setting \(z''=0\) gives the minimum-norm choice). Differentiating and solving for \(z'\) gives $$ z' = \Sigma^{-2} \Sigma {U'}^Ty = \Sigma^{-1}{U'}^Ty \qquad \Rightarrow \qquad x = V'z' = V'\Sigma^{-1}{U'}^Ty $$ which is precisely the pseudo-inverse.
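A minimal numpy sketch of this construction (the rank-2 test matrix is arbitrary, and `np.linalg.pinv` is used only as a cross-check):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # rank 2

# Thin SVD pieces: U' is m x r, Sigma is r x r, V' is n x r.
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))

# A^dagger = V' Sigma^{-1} U'^T: swap the orthogonal factors, invert Sigma.
A_dag = Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T
print(np.allclose(A_dag, np.linalg.pinv(A)))    # True
```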
Thus we have that the Moore-Penrose pseudo-inverse gives the minimum-norm least-squares solution.
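Continuing the numpy sketch above, both properties are easy to verify numerically (the right-hand side \(y\) is arbitrary):

```python
# A^dagger y is a least-squares solution...
y = rng.standard_normal(5)
x = A_dag @ y
x_ls = np.linalg.lstsq(A, y, rcond=None)[0]     # reference least-squares fit
print(np.allclose(A @ x, A @ x_ls))             # True: same residual

# ...and adding any nullspace component (the span of V'') only grows the norm.
x_other = x + Vt[r:].T @ rng.standard_normal(len(s) - r)
print(np.allclose(A @ x_other, A @ x))              # True: still least squares
print(np.linalg.norm(x) < np.linalg.norm(x_other))  # True: x has minimum norm
```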