.dcmath - UNDER CONSTRUCTION

Inverse References
Orthogonality Principle

$$ A = \begin{bmatrix} | & & | \\ A_1 & \cdots & A_n \\ | & & | \end{bmatrix},\qquad \qquad A^{-1} = \begin{bmatrix} - & b_1^T & - \\ & \vdots & \\ - & b_n^T & - \end{bmatrix} $$ $$ A^{-1} A = \begin{bmatrix} - & b_1^T & - \\ & \vdots & \\ - & b_n^T & - \end{bmatrix} \begin{bmatrix} | & & | \\ A_1 & \cdots & A_n \\ | & & | \end{bmatrix} = \begin{bmatrix} b_1^TA_1 & \cdots & b_1^TA_n \\ \vdots & \ddots & \vdots \\ b_n^TA_1 & \cdots & b_n^TA_n \end{bmatrix} = \begin{bmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{bmatrix} $$ $$ \Rightarrow \qquad \qquad b_i^TA_i = 1, \qquad b_i^TA_j =0, \quad i\neq j $$
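As a quick numerical sanity check, here is a minimal sketch (assuming numpy; the random \(4\times 4\) test matrix is an arbitrary choice, not from the text) that the rows \(b_i^T\) of \(A^{-1}\) satisfy exactly these inner product relations with the columns of \(A\).

```python
# Minimal check: the i-th row of A^{-1} has unit inner product with the i-th
# column of A and is orthogonal to every other column.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))      # generic matrix, invertible with probability 1
B = np.linalg.inv(A)                 # rows of B are the b_i^T above

G = B @ A                            # entries are b_i^T A_j
assert np.allclose(G, np.eye(4))     # b_i^T A_i = 1, b_i^T A_j = 0 for i != j
```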

Block Matrix Inverse (2\(\times\)2)

If \(A\) and \(D-CA^{-1}B\) are invertible: $$ \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} I & 0 \\ CA^{-1} & I \end{bmatrix} \begin{bmatrix} A & 0 \\ 0 & D-CA^{-1}B \end{bmatrix} \begin{bmatrix} I & A^{-1}B \\ 0 & I \end{bmatrix} $$ $$ \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} + A^{-1}B(D-CA^{-1}B)^{-1}CA^{-1} & -A^{-1}B(D-CA^{-1}B)^{-1} \\ -(D-CA^{-1}B)^{-1}CA^{-1} & (D-CA^{-1}B)^{-1} \end{bmatrix} $$ If \(D\) and \(A-BD^{-1}C\) are invertible: $$ \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} I & BD^{-1} \\ 0 & I \end{bmatrix} \begin{bmatrix} A-BD^{-1}C & 0 \\ 0 & D \end{bmatrix} \begin{bmatrix} I & 0 \\ D^{-1}C & I \end{bmatrix} $$ $$ \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} (A-BD^{-1}C)^{-1} & -(A-BD^{-1}C)^{-1}BD^{-1} \\ -D^{-1}C(A-BD^{-1}C)^{-1} & D^{-1} + D^{-1}C(A-BD^{-1}C)^{-1}BD^{-1} \end{bmatrix} $$ If both \(A\) and \(D\) are invertible: $$ \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} (A-BD^{-1}C)^{-1} & 0 \\ 0 & (D-CA^{-1}B)^{-1} \end{bmatrix} \begin{bmatrix} I & -BD^{-1} \\ -CA^{-1} & I \end{bmatrix} $$

These formulas can be obtained by doing block row and column reduction on the \(2\times 2\) block matrix.
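The following is a small numpy sketch (the block sizes and random blocks are arbitrary test choices, not from the text) checking the first explicit \(2\times 2\) block inverse formula against a direct inverse.

```python
# Verify the Schur-complement formula for the 2x2 block inverse numerically.
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 3, 2
A = rng.standard_normal((n1, n1))
B = rng.standard_normal((n1, n2))
C = rng.standard_normal((n2, n1))
D = rng.standard_normal((n2, n2))

M = np.block([[A, B], [C, D]])
Ai = np.linalg.inv(A)
S = D - C @ Ai @ B                   # Schur complement of A
Si = np.linalg.inv(S)

M_inv = np.block([
    [Ai + Ai @ B @ Si @ C @ Ai, -Ai @ B @ Si],
    [-Si @ C @ Ai,               Si],
])
assert np.allclose(M_inv, np.linalg.inv(M))
```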

Block Matrix Inverse (General)

This block row-reduction can be applied to larger block matrices as well to develop formulas for inverses of matrices with more blocks: \(3 \times 3\), \(4 \times 4\), etc. We illustrate the procedure for the block \(3 \times 3\) case below, but note that the same logic works on larger block matrices (assuming invertibility of the necessary sub-blocks). $$ \bigg[ \ \ X \ \ \bigg]^{-1} = \begin{bmatrix} A & B & C\\ D & E & F\\ G & H & J\\ \end{bmatrix}^{-1} $$ Deriving the full formulas would be messy but straightforward.

$$ Row \ Red. \ 1: \qquad \underbrace{ \begin{bmatrix} I & 0 & 0 \\ -DA^{-1} & I & 0 \\ -GA^{-1} & 0 & I \end{bmatrix} }_{P} \begin{bmatrix} A & B & C\\ D & E & F\\ G & H & J\\ \end{bmatrix} = \begin{bmatrix} A & B & C \\ 0 & E-DA^{-1}B & F-DA^{-1}C \\ 0 & H-GA^{-1}B & J-GA^{-1}C \end{bmatrix} $$ $$ Row \ Red. \ 2: \qquad \underbrace{ \begin{bmatrix} I & 0 & 0 \\ 0 & I & 0 \\ 0 & -(H-GA^{-1}B)(E-DA^{-1}B)^{-1} & I \end{bmatrix} }_{Q} \underbrace{ \begin{bmatrix} I & 0 & 0 \\ -DA^{-1} & I & 0 \\ -GA^{-1} & 0 & I \end{bmatrix} }_{P} \begin{bmatrix} A & B & C\\ D & E & F\\ G & H & J\\ \end{bmatrix} = \begin{bmatrix} A & B & C \\ 0 & E-DA^{-1}B & F-DA^{-1}C \\ 0 & 0 & K \end{bmatrix} $$ where $$ K = J-GA^{-1}C - (H-GA^{-1}B)(E-DA^{-1}B)^{-1}(F-DA^{-1}C) $$ Note here we've done block row-reductions to make the matrix upper triangular.

We can do similar block column-reduction operations to make the matrix block diagonal $$ Col \ Red. \ 1 \ \& \ 2: \qquad \bigg[ \ \ Q \ \ \bigg] \bigg[ \ \ P \ \ \bigg] \begin{bmatrix} A & B & C\\ D & E & F\\ G & H & J\\ \end{bmatrix} \underbrace{ \begin{bmatrix} I & -A^{-1}B & -A^{-1}C \\ 0 & I & 0 \\ 0 & 0 & I \end{bmatrix} }_{R} \underbrace{ \begin{bmatrix} I & 0 & 0 \\ 0 & I & -(E-DA^{-1}B)^{-1}(F-DA^{-1}C) \\ 0 & 0 & I \end{bmatrix} }_{S} = \begin{bmatrix} A & 0 & 0 \\ 0 & E-DA^{-1}B & 0 \\ 0 & 0 & K \end{bmatrix} $$ The full inverse is then given by $$ \begin{bmatrix} A & B & C\\ D & E & F\\ G & H & J\\ \end{bmatrix}^{-1} = \bigg[ \ \ R \ \ \bigg] \bigg[ \ \ S \ \ \bigg] \begin{bmatrix} A & 0 & 0 \\ 0 & E-DA^{-1}B & 0 \\ 0 & 0 & K \end{bmatrix}^{-1} \bigg[ \ \ Q \ \ \bigg] \bigg[ \ \ P \ \ \bigg] $$ where, again $$ K = J-GA^{-1}C - (H-GA^{-1}B)(E-DA^{-1}B)^{-1}(F-DA^{-1}C) $$
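As a numerical sketch of the whole procedure (assuming numpy; the block size and random entries are arbitrary), we can form \(P\), \(Q\), \(R\), \(S\) as above and confirm that \(R\,S\,\mathrm{diag}(A,\,E-DA^{-1}B,\,K)^{-1}\,Q\,P\) reproduces the inverse.

```python
# Numerical check of the 3x3 block reduction formula for the inverse.
import numpy as np

rng = np.random.default_rng(2)
n = 2                                         # each block is n x n for simplicity
A, B, C, D, E, F, G, H, J = (rng.standard_normal((n, n)) for _ in range(9))
I, Z = np.eye(n), np.zeros((n, n))

M = np.block([[A, B, C], [D, E, F], [G, H, J]])
Ai = np.linalg.inv(A)
E1, F1 = E - D @ Ai @ B, F - D @ Ai @ C       # second block row after Row Red. 1
H1, J1 = H - G @ Ai @ B, J - G @ Ai @ C       # third block row after Row Red. 1
E1i = np.linalg.inv(E1)
K = J1 - H1 @ E1i @ F1

P = np.block([[I, Z, Z], [-D @ Ai, I, Z], [-G @ Ai, Z, I]])
Q = np.block([[I, Z, Z], [Z, I, Z], [Z, -H1 @ E1i, I]])
R = np.block([[I, -Ai @ B, -Ai @ C], [Z, I, Z], [Z, Z, I]])
S = np.block([[I, Z, Z], [Z, I, -E1i @ F1], [Z, Z, I]])

Dinv = np.block([[Ai, Z, Z], [Z, E1i, Z], [Z, Z, np.linalg.inv(K)]])
assert np.allclose(R @ S @ Dinv @ Q @ P, np.linalg.inv(M))
```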

Woodbury Matrix Identity

This identity is particularly useful for low-rank inverse updates. $$ (A+UCV)^{-1} = A^{-1} - A^{-1}U(C^{-1}+VA^{-1}U)^{-1}VA^{-1} $$
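A quick numerical check of the identity (numpy assumed; the rank-2 update of a random well-conditioned matrix is an arbitrary test case):

```python
# Verify the Woodbury identity on a small random example.
import numpy as np

rng = np.random.default_rng(3)
n, k = 5, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned A
U = rng.standard_normal((n, k))
C = rng.standard_normal((k, k))
V = rng.standard_normal((k, n))

Ai, Ci = np.linalg.inv(A), np.linalg.inv(C)
lhs = np.linalg.inv(A + U @ C @ V)
rhs = Ai - Ai @ U @ np.linalg.inv(Ci + V @ Ai @ U) @ V @ Ai
assert np.allclose(lhs, rhs)
```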

Block Example 1: Block Triangular

$$ \begin{bmatrix} I & B \\ 0 & I \end{bmatrix}^{-1} = \begin{bmatrix} I & -B \\ 0 & I \end{bmatrix},\qquad \begin{bmatrix} I & 0 \\ C & I \end{bmatrix}^{-1} = \begin{bmatrix} I & 0 \\ -C & I \end{bmatrix} $$ $$ \begin{bmatrix} A & B \\ 0 & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} & -A^{-1}BD^{-1} \\ 0 & D^{-1} \end{bmatrix},\qquad \begin{bmatrix} A & 0 \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} & 0 \\ -D^{-1}CA^{-1} & D^{-1} \end{bmatrix} $$

Block Example 2: KKT Conditions for Quadratic Programs

The following system of equations frequently shows up in constrained optimization. $$ \begin{bmatrix} Q & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} x \\ v \end{bmatrix} = \begin{bmatrix} c \\ b \end{bmatrix} $$ In particular, these are the KKT conditions for a quadratic optimization problem with linear equality constraints $$ \min \quad \tfrac{1}{2}x^TQx - c^Tx, \qquad s.t. \quad Ax = b $$ The solution to this system of equations can be given explicitly by inverting the matrix $$ \begin{bmatrix} Q & A^T \\ A & 0 \end{bmatrix}^{-1} = \begin{bmatrix} Q^{-1}-Q^{-1}A^T(AQ^{-1}A^T)^{-1}AQ^{-1} & Q^{-1}A^T(AQ^{-1}A^T)^{-1} \\ (AQ^{-1}A^T)^{-1}AQ^{-1} & -(AQ^{-1}A^T)^{-1} \end{bmatrix} $$
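The following sketch (assuming numpy; the problem data is random and purely illustrative) solves a small equality constrained QP using the explicit block inverse and compares it against a direct solve of the KKT system.

```python
# Solve a small equality-constrained QP via the explicit block inverse formula
# and compare with a direct solve of the KKT system.
import numpy as np

rng = np.random.default_rng(4)
n, m = 4, 2
L = rng.standard_normal((n, n))
Q = L @ L.T + np.eye(n)                  # symmetric positive definite
A = rng.standard_normal((m, n))          # full row rank with probability 1
c = rng.standard_normal(n)
b = rng.standard_normal(m)

Qi = np.linalg.inv(Q)
Si = np.linalg.inv(A @ Qi @ A.T)         # inverse of A Q^{-1} A^T

x = (Qi - Qi @ A.T @ Si @ A @ Qi) @ c + Qi @ A.T @ Si @ b
v = Si @ A @ Qi @ c - Si @ b

KKT = np.block([[Q, A.T], [A, np.zeros((m, m))]])
xv = np.linalg.solve(KKT, np.concatenate([c, b]))
assert np.allclose(np.concatenate([x, v]), xv)
```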

Newton's Method for Equality Constrained Optimization

Similar systems of equations arise when applying Newton's method to a general nonlinear equality constrained optimization problem, $$ \min_x \quad f(x), \qquad s.t. \quad g(x)=0 $$ The Newton step direction is given by $$ \begin{bmatrix} \Delta x \\ \\ \Delta v \end{bmatrix} = -\begin{bmatrix} Q & A^T \\ & \\ A & 0 \end{bmatrix}^{-1} \begin{bmatrix} \tfrac{\partial f}{\partial x}+\tfrac{\partial g}{\partial x}^Tv \\ \\ g(x) \end{bmatrix}, \qquad Q = \tfrac{\partial^2 f}{\partial x^2}, \quad A = \tfrac{\partial g}{\partial x} $$

Interior Point/Barrier Methods

Interior point methods reframe the equality/inequality constrained optimization problem $$ \min_{x} \quad f(x) \qquad s.t. \quad g(x)=0, \ \ h(x) \geq 0 $$ with barrier functions as $$ \min_{x,s} \quad tf(x)-\sum_i\ln(s_i) \qquad s.t. \quad g(x)=0, \ \ h(x)-s = 0 $$ The Newton update for this problem is then given by $$ \begin{bmatrix} \Delta x \\ \Delta s \\ \Delta v \\ \Delta w \end{bmatrix} = \begin{bmatrix} P & 0 & B^T & C^T \\ 0 & dg(s)^{-2} & 0 & -I \\ B & 0 & 0 & 0 \\ C & -I & 0 & 0 \end{bmatrix}^{-1} \begin{bmatrix} \tfrac{\partial f}{\partial x} + B^Tv + C^Tw\\ -s^{-1}-w \\ g(x) \\ h(x)-s \end{bmatrix} $$ where $$ P = \tfrac{\partial^2 f}{\partial x^2}, \qquad B = \tfrac{\partial g}{\partial x}, \qquad C = \tfrac{\partial h}{\partial x} $$ Viewing this as a KKT system in the primal variables \((x,s)\) with \(Q = \begin{bmatrix} P & 0 \\ 0 & dg(s)^{-2} \end{bmatrix}\) and \(A = \begin{bmatrix} B & 0 \\ C & -I \end{bmatrix}\), where \(dg(s)\) denotes the diagonal matrix with the entries of \(s\) on its diagonal, the relevant Schur complement is $$ AQ^{-1}A^T = \begin{bmatrix} B & 0 \\ C & -I \end{bmatrix} \begin{bmatrix} P^{-1} & 0 \\ 0 & dg(s)^{2} \end{bmatrix} \begin{bmatrix} B^T & C^T \\ 0 & -I \end{bmatrix} = \begin{bmatrix} BP^{-1}B^T & BP^{-1}C^T \\ CP^{-1}B^T & CP^{-1}C^T + dg(s)^2 \end{bmatrix} $$ Block matrix formulas can then be applied to invert the matrix above. The most efficient computation will depend on the problem parameters, i.e., which matrices are easiest to invert.

Fundamental Theorem Basis

Given a full row rank matrix \(A\) and a basis for its nullspace given by the columns of \(N\), we might want to choose \(\big[A^T \ N\big]\) as a basis for the domain. As a coordinate transformation, we will often want to invert this. $$ \bigg[ \ A^T \ \ N \ \bigg]^{-1} = \begin{bmatrix} (AA^T)^{-1}A \\ (N^TN)^{-1}N^T \end{bmatrix} $$

One can check $$ \begin{bmatrix} (AA^T)^{-1}A \\ (N^TN)^{-1}N^T \end{bmatrix} \bigg[ \ A^T \ \ N \ \bigg] = \begin{bmatrix} (AA^T)^{-1}AA^T & (AA^T)^{-1}AN \\ (N^TN)^{-1}N^TA^T & (N^TN)^{-1}N^TN \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix} $$

Note also that this leads to the fact that the sum of the two projection matrices onto orthogonal subspaces (whose direct sum is the whole space) is the identity. (This is, perhaps, not immediately obvious algebraically.) $$ \bigg[ \ A^T \ \ N \ \bigg] \begin{bmatrix} (AA^T)^{-1}A \\ (N^TN)^{-1}N^T \end{bmatrix} = A^T(AA^T)^{-1}A + N(N^TN)^{-1}N^T = I $$
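A numerical sketch of both facts (assuming numpy, and scipy's `null_space` to build \(N\); the test matrix is arbitrary):

```python
# Check the inverse of [A^T  N] and the resulting projection identity.
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(5)
m, n = 2, 5
A = rng.standard_normal((m, n))          # full row rank with probability 1
N = null_space(A)                        # columns form a basis for null(A)

T = np.hstack([A.T, N])                  # basis [A^T  N] for R^n
T_inv = np.vstack([
    np.linalg.inv(A @ A.T) @ A,
    np.linalg.inv(N.T @ N) @ N.T,
])
assert np.allclose(T_inv @ T, np.eye(n))

# Sum of the two orthogonal projectors is the identity.
P_row  = A.T @ np.linalg.inv(A @ A.T) @ A      # projector onto row space of A
P_null = N @ np.linalg.inv(N.T @ N) @ N.T      # projector onto null space of A
assert np.allclose(P_row + P_null, np.eye(n))
```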

Note the close relationship between this inverse and the transpose of the matrix since the two sets of columns are orthogonal. $$ \begin{bmatrix} A \\ N^T \end{bmatrix} \bigg[ \ A^T \ \ N \ \bigg] = \begin{bmatrix} AA^T & 0 \\ 0 & N^TN \end{bmatrix} $$

Average and Differences

Given any set of basis vectors \(A \in \mathbb{R}^{n \times n}\), a new basis can be obtained by right multiplying by an invertible transformation \(W \in \mathbb{R}^{n \times n}\). This creates a new basis where each new vector is a linear combination of the previous vectors $$ W = \begin{bmatrix} | & & | \\ W_1 & \cdots & W_n \\ | & & | \end{bmatrix}, \qquad AW = \bigg[ \ AW_1 \ \cdots \ AW_n \ \bigg] $$ We now give several choices for \(W\), the resulting \(W^{-1}\), and the physical meaning of each. $$ W = \begin{bmatrix} 1 & -\mathbf{1}^T \\ 0 & I \end{bmatrix}, \qquad W^{-1} = \begin{bmatrix} 1 & \mathbf{1}^T \\ 0 & I \end{bmatrix} $$ \(W\) chooses one vector (the first one) and measures all other vectors from that vector. \(W^{-1}\) says every other vector is the main vector plus a difference. $$ W = \begin{bmatrix} \tfrac{1}{n} & -\tfrac{1}{n} \mathbf{1}^T \\ & \\ \tfrac{1}{n} \mathbf{1} & I- \tfrac{1}{n}\mathbf{1}\mathbf{1}^T \end{bmatrix}, \qquad W^{-1} = \begin{bmatrix} 1 & \mathbf{1}^T \\ & \\ -\mathbf{1} & I \end{bmatrix} $$ \(W\) computes the arithmetic mean of the vectors and measures each point (except the first) from that mean. \(W^{-1}\) selects the mean and adds the differences to recover the points; the one exception is the first point, which must be recovered from the mean and the differences with all the other points.
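A small numpy check of the second pair (the choice \(n=5\) is arbitrary) that \(W\) and \(W^{-1}\) as given are indeed inverses:

```python
# Build the mean-and-differences basis change W blockwise and verify W^{-1}.
import numpy as np

n = 5
ones = np.ones((n - 1, 1))
W = np.block([
    [np.array([[1.0 / n]]),  -ones.T / n],
    [ones / n,               np.eye(n - 1) - ones @ ones.T / n],
])
W_inv = np.block([
    [np.array([[1.0]]),  ones.T],
    [-ones,              np.eye(n - 1)],
])
assert np.allclose(W @ W_inv, np.eye(n))
assert np.allclose(W_inv @ W, np.eye(n))
```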