.dcmath - UNDER CONSTRUCTION

HYPERSHAPES

We're used to drawing 2D projections of 3D shapes. This process can be extended to n-dimensions by defining a 2D direction for each of the n coordinate axes. Modifying these 2D directions gives a different perspective on the nD shape. This example shows the result for various shapes including the unit simplex, unit cube, and unit sphere.
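
As a rough sketch of the idea (not the interactive example itself), the projection can be written as a single matrix product: stack the chosen 2D directions as the columns of a \(2 \times n\) matrix and multiply. The numpy code below, with arbitrarily chosen directions, projects the corners of the unit n-cube into the drawing plane.

```python
import numpy as np
from itertools import product

# Assign each of the n coordinate axes a 2D direction, stacked as the
# columns of a 2 x n matrix P. Projecting an nD point x to the drawing
# plane is then just the product P @ x.
n = 4
rng = np.random.default_rng(0)
P = rng.normal(size=(2, n))        # one 2D direction per axis (arbitrary here)

# Corners of the unit n-cube: all 0/1 combinations of the n coordinates.
corners = np.array(list(product([0, 1], repeat=n)))   # shape (2**n, n)

projected = corners @ P.T          # shape (2**n, 2): 2D points ready to plot
print(projected[:4])
```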

"Depth" is the dimension(s) that are lost in the 2D projection. For an n-dimensional shape, depth is n-2 dimensional. When drawing a 3D shape, depth is 1-dimensional, ie. there is a line of points (coming towards the camera) that are indistinguishable in the picture. When drawing a 4D shape, depth is 2-dimensional, ie. there is a plane of points (in the 4D space) that are indistinguishable. When drawing a 5D shape, depth is 3-dimensional...

We can't fully "see" n dimensions, but we can see 2D or 3D subspaces of the nD space. The question then becomes whether we can trick our brains into stitching those 2D/3D subspaces back together into some "nD" image. The answer remains unclear.

VECTOR & LINEAR COMBINATIONS

The Cartesian representation of an mD vector is a list of \(m\) numbers, each representing a distance in a different direction/dimension.

Addition between two vectors \(x\) and \(z\) is done element-wise (assuming they have the same length). $$ x + z = \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix} + \begin{bmatrix} z_1 \\ \vdots \\ z_m \end{bmatrix} = \begin{bmatrix} x_1+z_1 \\ \vdots \\ x_m+z_m \end{bmatrix} $$ Visually, this sum is a new vector given by placing the two vectors "tip-to-tail."

"Scaling" \(x\) by a number \(\lambda\) involves multiplying each element in \(x\) by \(\lambda\). Visually this expands or shrinks (or flips) \(x\) without changing the direction. Scaling multiple vectors \(x^1,\dots,x^n\) and then adding them together is referred to as taking a "linear combination" of the vectors. All the possible linear combinations of a set of vectors is called the "span" of those vectors. If a set of vectors span a whole space that means any vector in that space can be written as a linear combination of those vectors.

Subtraction between two vectors is also done element-wise. It can also be thought of as taking a linear combination of the two vectors with weights \(1\) and \(-1\). If we think of the difference vector as floating rather than attached to the origin, \(x - z\) points from the tip of \(z\) to the tip of \(x\).

CONVEX COMBINATIONS

A linear combination of vectors where the weights are all nonnegative and sum to 1 is called a "convex combination". This linear combination will lie "in between" the tips of the vectors. For example, a convex combination of two vectors, each with weight 0.5, will lie exactly halfway between the tips of the two vectors. If one of the weights is 1 (and the other 0), it will just be that vector.
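
A quick numpy illustration: sweeping a weight \(t\) from 0 to 1 (with the other weight \(1-t\)) traces out the segment between the tips of the two vectors.

```python
import numpy as np

x = np.array([2.0, 0.0])
z = np.array([0.0, 4.0])

# Convex combinations of two vectors: weights t and 1-t are nonnegative and sum to 1.
for t in [0.0, 0.25, 0.5, 1.0]:
    point = t * x + (1 - t) * z   # lies on the segment between the tips of x and z
    print(t, point)               # t = 0.5 gives the midpoint, t = 1 gives x itself
```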

LINEAR TRANSFORMATIONS

$$ y = Ax$$ A matrix \( A \) can be used to represent a linear transformation from one vector space to another. Any point \(x\) in the domain maps to \(y=Ax\) in the co-domain. Each standard basis vector gets mapped to the corresponding column of \(A\). For example, $$ \begin{bmatrix} | & | & & | \\ A_1 & A_2 & \cdots & A_n \\ | & | & & | \\ \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} | \\ A_2 \\ | \end{bmatrix} $$ A general vector \(x\) with nonzero components in multiple coordinates gets mapped to a weighted sum of the columns of \(A\) $$ y = \begin{bmatrix} | & & | \\ A_1 & \cdots & A_n \\ | & & | \\ \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} $$ $$ y = \begin{bmatrix} | \\ A_1 \\ | \end{bmatrix} x_1 + \cdots + \begin{bmatrix} | \\ A_n \\ | \end{bmatrix} x_n $$ Another way of saying this is that \(Ax\) is shorthand for a linear combination of the columns of \(A\) with coefficients given by the elements of \(x\). Note that we can map sets of \(x\)'s (such as the simplex, unit sphere, etc.) through \(A\) as well as individual points. The span of the columns of \(A\) is called the "column space" or the "range" of \(A\) (more below).
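
A short numpy check of both facts, using an arbitrary \(3 \times 3\) matrix: a standard basis vector picks out a column of \(A\), and a general \(x\) gives a weighted sum of the columns.

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [4.0, 0.0, 1.0]])
x = np.array([2.0, -1.0, 0.5])

# A standard basis vector picks out the corresponding column of A.
e2 = np.array([0.0, 1.0, 0.0])
assert np.allclose(A @ e2, A[:, 1])

# A general x maps to a linear combination of the columns, weighted by x's entries.
column_combo = sum(x[j] * A[:, j] for j in range(A.shape[1]))
assert np.allclose(A @ x, column_combo)
```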

COORDINATES & BASES

A vector \(x\) can be written as a linear combination of the "standard basis" vectors with the coefficients being the elements of the vector $$ x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} x_1 + \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} x_2 + \cdots + \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} x_n $$ A "basis" for a space is a set of vectors that span the whole space without any redundant vectors (more on this later). Clearly any vector can be written as a linear combination of the standard basis vectors.

A different set of vectors \(P_1,\dots,P_n\) could also be used as a basis for \(\mathbb{R}^n\), meaning any vector \(x\) in the space can be written as a linear combination of \(P_1,\dots,P_n\). Writing these vectors as the columns of a matrix \(P\), we can write this as $$ x = Px' $$ $$ \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} | & & | \\ P_1 & \cdots & P_n \\ | & & | \end{bmatrix} \begin{bmatrix} x'_1 \\ \vdots \\ x'_n \end{bmatrix} $$ The matrix \(P\) is called a "coordinate transformation" on the space \(\mathbb{R}^n \), \(x\) is a "set of coordinates" with respect to the standard basis, and \(x'\) is a set of coordinates with respect to the basis \(P\). Any vector \(x\) can be written as a linear combination of the columns of \(P\) assuming the columns of \(P\) form a basis. These conditions are equivalent to saying that \(P\) is "invertible".

INVERSES

We say \(P\) is invertible if there is a unique \(x'\) that solves \(x=Px'\) for any \(x \in \mathbb{R}^n \). The phrase "for any \(x\)" requires that the columns of \(P\) span \(\mathbb{R}^n \). For \(x'\) to be unique, we need each column of \(P\) to point in a different direction in \(\mathbb{R}^n\) (again, more later). The inverse of \(P\) is the matrix \(P^{-1}\) such that \(PP^{-1} = I \). Expanding this equation we have $$ I = PP^{-1} = \Bigg[ \ \ P \ \ \Bigg] \begin{bmatrix} | & & | \\ Q_1 & \cdots & Q_n \\ | & & | \end{bmatrix} = \begin{bmatrix} | & & | \\ PQ_1 & \cdots & PQ_n \\ | & & | \end{bmatrix} $$ Thus \(Q_i\), the \(i\)th column of \(P^{-1}\), gives the coordinates of the \(i\)th standard basis vector with respect to the columns of \(P\). Setting \(x' = P^{-1}x\) then guarantees that \(x=Px'\) is satisfied. Note that for square, invertible \(P\), \(P^{-1}\) is unique and \(P^{-1}P=I\). While maybe not surprising, proving these facts is slightly more involved.
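
A small numpy sketch with an arbitrary invertible \(2 \times 2\) matrix: the columns of \(P^{-1}\) are coordinates of the standard basis vectors with respect to \(P\), and \(x' = P^{-1}x\) solves \(x = Px'\).

```python
import numpy as np

P = np.array([[2.0, 1.0],
              [1.0, 3.0]])          # columns form a basis for R^2, so P is invertible
P_inv = np.linalg.inv(P)

# Column i of P^{-1} gives the coordinates of the i-th standard basis vector
# with respect to the columns of P: P @ Q_i = e_i.
assert np.allclose(P @ P_inv, np.eye(2))

# Coordinates of an arbitrary x with respect to the basis P.
x = np.array([4.0, -1.0])
x_prime = P_inv @ x                 # np.linalg.solve(P, x) is preferred numerically
assert np.allclose(P @ x_prime, x)
```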

MATRIX ADDITION

Matrix addition happens column-wise (or row-wise). Placing the corresponding columns (or rows) tip-to-tail produces the final result. In order for addition to be possible the matrices must have the same dimensions (both rows and columns). Since addition is commutative, we can reorder the matrices without changing the result $$ A + B + C = \begin{bmatrix} | & & | \\ A_1 + B_1 + C_1 & \cdots & A_n + B_n + C_n \\ | & & | \\ \end{bmatrix} $$ Matrix subtraction is just matrix addition where the second matrix is negated. If we allow the columns (or rows) to "float" as opposed to being attached to the origin, the difference between two matrices can be thought of as the vectors going from the columns of the second matrix to the columns of the first matrix.

MATRIX MULTIPLICATION

When we multiply two matrices \(A\) and \(B\) together as \(AB\), we can think of \(A\) as acting on each of the columns of \(B\) separately $$ AB = \Bigg[\ \ A \ \ \Bigg] \begin{bmatrix} | & & | \\ B_1 & \cdots & B_n \\ | & & | \\ \end{bmatrix} $$ $$ AB = \begin{bmatrix} | & & | \\ AB_1 & \cdots & AB_n \\ | & & | \\ \end{bmatrix} $$ Visually, the columns of the matrix \(A\) give the vectors that the standard basis vectors map to under the transformation \(A\). If we can picture the columns of \(B\) by themselves, the columns of \(AB\) are just where the columns of \(B\) would end up relative to the basis given by the columns of \(A\). Squinting our eyes in this way, we can see each of the successive transformations represented by a multiplication of the form $$ M = ABCD\cdots $$
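
In numpy this column-at-a-time view is easy to check (the matrices here are arbitrary, with \(A\) a 90-degree rotation):

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])         # rotate the plane by 90 degrees
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])

# A acts on each column of B separately: column j of AB is A @ B_j.
AB = A @ B
for j in range(B.shape[1]):
    assert np.allclose(AB[:, j], A @ B[:, j])
```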

INNER PRODUCTS & ORTHOGONALITY

Vectors contain both the notion of length and direction. While magnitude is a property that vectors share with regular numbers, direction is a uniquely vector property. As such, we can talk about the relative magnitude of two vectors, and we can also talk about whether they point in similar directions, opposite directions, or whether they are "perpendicular" or "orthogonal" to each other. The "inner product" or "dot product" of two vectors is at the heart of this direction comparison. Using matrix multiplication notation, if we think of one vector as a row and one as a column, the inner product is given by $$ y^Tx = \sum_i y_ix_i = |y||x|\cos(\theta) $$ where \(\theta\) is the angle between the two vectors. In particular, two vectors are orthogonal exactly when their inner product is zero.
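
A short numpy sketch of the dot product, the angle formula, and the zero-inner-product test for orthogonality (the vectors are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([2.0, 1.0, -1.0])

dot = y @ x                          # same as sum(y_i * x_i)
cos_theta = dot / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))   # angle between the vectors, in radians

# Orthogonal vectors have zero inner product.
assert np.isclose(np.array([1.0, 0.0]) @ np.array([0.0, 5.0]), 0.0)
```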

ROWS

Multiplying a matrix by a vector can also be thought of in terms of the rows of the matrix as opposed to the columns. Given the equation \(y=Ax\), \(y_i\) is determined by the inner product between \(x\) and the \(i\)th row of \(A\) $$ y = Ax $$ $$ \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix} = \begin{bmatrix} -- & a_1^T & -- \\ & \vdots & \\ -- & a_m^T & -- \end{bmatrix} \begin{bmatrix} | \\ x \\ | \end{bmatrix} = \begin{bmatrix} a_1^Tx \\ \vdots \\ a_m^Tx \end{bmatrix} $$ As such, the direction of the \(i\)th row of \(A\) can be thought of as the direction in the domain that \(x\) should point in to have the greatest impact on \(y_i\), ie. the rows of \(A\) can be thought of as input directions of \(A\). A slightly more insightful way to define input directions of a matrix is not to think of \(a_i^T\) itself as an input direction but rather to think of the direction(s) orthogonal to the other \(m-1\) rows as defining an input direction for \(y_i\). For example, suppose \(x\) is chosen to be orthogonal to all but the second row \(a_2^T\). \(Ax\) is then given by $$ Ax = \begin{bmatrix} a_1^Tx \\ a_2^Tx \\ \vdots \\ a_m^Tx \end{bmatrix} = \begin{bmatrix} 0 \\ a_2^Tx \\ \vdots \\ 0 \end{bmatrix} $$ \(a_i^T\) (in this case \(a_2^T\)) then determines how much \(y_i\) changes for a given magnitude of \(x\) (based on both magnitude and relative orientation) and also helps define an orthogonal input direction for all the other indices of \(y\).
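
A small numpy illustration of the row view: each \(y_i\) is \(a_i^Tx\), and in \(\mathbb{R}^3\) a cross product gives a vector orthogonal to two chosen rows, isolating the remaining output:

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])
x = np.array([1.0, -2.0, 0.5])

# Each entry of y = Ax is the inner product of x with the corresponding row of A.
y = A @ x
for i in range(A.shape[0]):
    assert np.isclose(y[i], A[i, :] @ x)

# Choosing x orthogonal to every row except the second isolates y_2:
# only the second entry of Ax is nonzero.
x_iso = np.cross(A[0, :], A[2, :])   # orthogonal to rows 1 and 3 (works in R^3)
print(A @ x_iso)                      # approximately [0, something, 0]
```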

RANGE

The "range" of a matrix \(A\) is the span of the columns, ie. the set of vectors \(y\) in the co-domain for which you could find an \(x\) such that \(y=Ax\). If the range of \(A\) is all of the co-domain, we say that the matrix \(A\) is onto. As a rule of thumb, fat matrices are onto (assuming there are enough linearly independent columns). If a matrix is tall (or there are not enough linearly independent columns), then there is a subspace of the co-domain that is not reachable through \(A\). An equation of the form \(y=Ax\) for a tall \(A\) will usually not have a solution because it is not guaranteed that every \(y\) is in the range of \(A\) (in other words, \(y=Ax\) is actually false for all \(x\)'s.) At best, we can solve the equation \(\text{proj}_Ay = Ax\) where \(\text{proj}_Ay\) is the closest vector to \(y\) within the range of \(A\). This is the well-studied "least-squares solution" in which we choose \(x\) to minimize the norm-squared of \(y-Ax\) $$ \min_x \quad \big|\big|y-Ax \big|\big|^2_2 = (y-Ax)^T(y-Ax) $$ with the the optimal \(x\) given by $$ x = (A^TA)^{-1}A^Ty $$ \(Ax\) is then given by $$ Ax = A(A^TA)^{-1}A^Ty $$ (which is the projection of \(y\) onto the range of A).

NULLSPACE

If some of the columns of \(A\) are linear combinations of other columns, then different sets of coordinates \(x\) can (and will) produce the same resulting \(Ax\). For example, suppose for a \(3\times3\) matrix \(A\) the third column is a linear combination of the first two columns, ie. $$ \begin{bmatrix} | \\ A_3 \\ |\end{bmatrix} = \begin{bmatrix} | \\ A_1 \\ |\end{bmatrix} w_1 + \begin{bmatrix} | \\ A_2 \\ |\end{bmatrix} w_2 $$ Note then that we get \(Ax=\mathbf{0}\) if we choose \(x\) in the following way $$ Ax = \begin{bmatrix} | & | & | \\ A_1 & A_2 & A_3 \\ | & | & | \\ \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ -1 \end{bmatrix} = \begin{bmatrix} | \\ \mathbf{0} \\ | \end{bmatrix} $$ The vector \(x\) is said to be in the "nullspace" of \(A\) since \(Ax=\mathbf{0}\). In other words, elements in the nullspace are coordinates of 0.

In general for an \(m \times n\) matrix \(A\), suppose that all the columns of \(A\) lie in the span of the first \(k\) columns. We can extend the above technique to construct elements in the nullspace of \(A\) as follows. Let \(B_j \in \mathbb{R}^k\) be the coordinates of column \(A_j\) with respect to the first \(k\) columns of \(A\), ie. $$ \begin{bmatrix} | \\ A_j \\ |\end{bmatrix} = \begin{bmatrix} | & & | \\ A_1 & \cdots & A_k \\ | & & | \\ \end{bmatrix} \begin{bmatrix} | \\ B_j \\ |\end{bmatrix} $$ It follows that all the columns of the matrix \(N \in \mathbb{R}^{n \times (n-k)}\) given below are in the nullspace of \(A\) $$ AN = \Bigg[ \ \ A' \quad A'' \ \ \Bigg] \begin{bmatrix} B \\ -I_{n-k} \end{bmatrix} = \Bigg[ \ \ \mathbf{0} \ \ \Bigg] $$ where $$ B = \small{ \begin{bmatrix} | & & | \\ B_{k+1} & \cdots & B_n \\ | & & | \end{bmatrix} } $$ and $$ \tiny{ A' = \begin{bmatrix} | & & | \\ A_{1} & \cdots & A_k \\ | & & | \end{bmatrix} \ \ A'' = \begin{bmatrix} | & & | \\ A_{k+1} & \cdots & A_n \\ | & & | \end{bmatrix} } $$ We can in fact show that the columns of \(N\) constructed this way form a basis for the nullspace of \(A\). If we chose other columns (rather than the first \(k\)) to span the column space of \(A\), we could do the same construction but we would have to reorder the rows of \(N\).
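
A numpy check of the construction on a \(3 \times 3\) example where the third column is a combination of the first two (so \(k=2\), \(n=3\), and \(N\) is a single column):

```python
import numpy as np

# 3x3 example: the third column is A_3 = 2*A_1 + (-1)*A_2, so w = (2, -1).
A1 = np.array([1.0, 0.0, 1.0])
A2 = np.array([0.0, 1.0, 1.0])
A3 = 2.0 * A1 - 1.0 * A2
A = np.column_stack([A1, A2, A3])

# The construction above: stack the coordinates B of the dependent column
# over -1 to get a nullspace vector (here k = 2, n = 3, so N is 3 x 1).
N = np.array([[ 2.0],
              [-1.0],
              [-1.0]])
assert np.allclose(A @ N, 0.0)
```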

We say the nullspace of a matrix is "non-trivial" if it contains more than the zero-vector. (The zero-vector is obviously always in the nullspace.)

If two sets of coordinates \(x\) and \(x'\) differ by an element of the nullspace, ie. \(x-x'\) is in the nullspace, then they will both map through \(A\) to the same point $$ 0 = A(x-x') \qquad \Rightarrow \qquad Ax = Ax' $$ This means equations of the form \(y=Ax\) do not have unique solutions when \(A\) has a non-trivial nullspace.

FUNDAMENTAL THEOREM OF LINEAR ALGEBRA

As discussed above, any matrix \(A \in \mathbb{R}^{m \times n}\) splits the domain and the co-domain each into a pair of orthogonal subspaces. \(\mathcal{R}(A^T)\) and \(\mathcal{N}(A)\) are orthogonal and together span the domain. \(\mathcal{R}(A)\) and \(\mathcal{N}(A^T)\) are orthogonal and together span the co-domain. Each of these four subspaces can be visualized both in the co-domain, in terms of the columns of \(A\), and in the domain, in terms of the rows of \(A\).
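
A numpy sketch of these facts using the SVD, which is one standard way to get orthonormal bases for all four subspaces (the text above doesn't rely on it), applied to an arbitrary \(2 \times 3\) matrix:

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])    # 2 x 3: domain R^3, co-domain R^2

# The SVD gives orthonormal bases for all four fundamental subspaces.
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))           # rank of A

row_space  = Vt[:r, :].T             # basis for R(A^T), lives in the domain
null_space = Vt[r:, :].T             # basis for N(A),  lives in the domain
col_space  = U[:, :r]                # basis for R(A),  lives in the co-domain
left_null  = U[:, r:]                # basis for N(A^T), lives in the co-domain

# The pairs are orthogonal, and their dimensions add up to n and m respectively.
assert np.allclose(row_space.T @ null_space, 0.0)
assert np.allclose(col_space.T @ left_null, 0.0)
assert row_space.shape[1] + null_space.shape[1] == A.shape[1]
assert col_space.shape[1] + left_null.shape[1] == A.shape[0]
```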