.dcmath - UNDER CONSTRUCTION

HIGH DIMENSIONAL VECTOR DIRECTIONS (Difficulty: 5/10)

In higher (10+) dimensions, ordinary spatial intuition for vectors breaks down and we are effectively limited to parallel axis representations. As discussed previously, unit vectors in high dimensions have somewhat counter-intuitive forms. We show a parallel axis representation for unit vectors in eight and sixteen dimensions below. Note that because the squared entries of a unit vector must sum to 1, only a few elements of each vector can be substantially larger than 0 at any time.
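As a rough numerical check (a sketch, not part of the original figure), the following Python snippet draws random unit vectors in eight and sixteen dimensions and counts how many entries are large; normalization forces a typical entry to be on the order of \(1/\sqrt{n}\).

    import numpy as np

    rng = np.random.default_rng(0)

    def random_unit_vector(n):
        # Normalizing a standard normal draw gives a uniformly random
        # direction on the unit sphere in R^n.
        v = rng.standard_normal(n)
        return v / np.linalg.norm(v)

    for n in (8, 16):
        u = random_unit_vector(n)
        # The squared entries sum to 1, so a typical entry has magnitude
        # about 1/sqrt(n) and only a few entries can be large.
        print(n, np.sum(np.abs(u) > 0.5), "entries above 0.5")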

In low dimensions, with a spatial representation of vectors, we can easily see when two vectors line up, point in opposite directions, or are perpendicular. For high dimensional vectors, we are forced to use a parallel axis representation. In order to "see" orthogonality and correlation between vectors in a parallel axis representation (in low or high dimensions), we must develop a different set of intuitions.

Center of Mass Visualization

One potential method is the following. Take parallel axis representations of two vectors \(x\) and \(y\). Rotate one by 90 degrees and then line up the axes as pictured below. Draw rectangles with sides defined by \(x_i\) and \(y_i\) and signed area given by \(x_iy_i\). View this construction end on (again as shown below): the center of mass of the rectangles gives an indication of what the inner product is. If the center of mass lies on a line at \(45^\circ\), the vectors are highly correlated; if the center of mass lies on a line at \(-45^\circ\), the vectors are negatively correlated; and if the center of mass lines up with the horizontal or vertical axis, then the vectors are orthogonal. To make precise statements like \(y^Tx = 1\) or \(y^Tx = -1\), etc., we need the extra assumption that \(x\) and \(y\) are unit vectors, but the correlation intuition holds even if this is not the case.
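One way to make this picture numeric is the following hypothetical helper (a sketch of our own, not taken from the construction above): treat the positive rectangle areas as mass along the \(+45^\circ\) direction and the negative areas as mass along the \(-45^\circ\) direction, and report the balance between the two, which plays the role of the pictured center of mass.

    import numpy as np

    def rectangle_balance(x, y):
        # Signed rectangle areas a_i = x_i * y_i from the construction.
        a = x * y
        pos = a[a > 0].sum()    # mass along the +45 degree line
        neg = -a[a < 0].sum()   # mass along the -45 degree line
        if pos + neg == 0:
            return 0.0
        # +1: all mass at +45 degrees (correlated); -1: all mass at
        # -45 degrees (anti-correlated); near 0: balanced masses,
        # suggesting the vectors are close to orthogonal.
        return (pos - neg) / (pos + neg)

    rng = np.random.default_rng(1)
    x = rng.standard_normal(16)
    x /= np.linalg.norm(x)
    print(rectangle_balance(x, x), rectangle_balance(x, -x))  # ~ +1, -1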

Data Fitting Visualization

For very high dimensional vectors, we get a similar intuitive picture if we borrow some well known ideas from statistics and data fitting. First represent two high dimensional vectors \(x\) and \(y\), $$ \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}, \qquad \begin{bmatrix} y_1 & \cdots & y_n \end{bmatrix}, $$ as points along two separate axes. This is a parallel axis representation for each vector, with the directions of variation drawn perpendicular to each other. We can then visualize the values in the \(i\)th dimension as a point \((x_i, y_i)\) in a 2D plane. Note that the dimension of the vectors is the number of points, while the 2D plane reflects the fact that we have two vectors. To isolate the directionality of the vectors, assume both \(x\) and \(y\) are unit vectors. (If the vectors are not unit vectors originally, we can simply think of scaling the axes so that the vectors lie inside the unit ball.)
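The construction is easy to reproduce; the sketch below (assuming numpy and matplotlib, with made-up data) builds two correlated unit vectors and plots each dimension \(i\) as the point \((x_i, y_i)\).

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(2)
    n = 100

    # Two correlated vectors, normalized so both are unit vectors.
    x = rng.standard_normal(n)
    y = x + 0.3 * rng.standard_normal(n)
    x /= np.linalg.norm(x)
    y /= np.linalg.norm(y)

    # Each dimension i becomes one point (x_i, y_i) in the 2D plane.
    plt.scatter(x, y, s=10)
    plt.axhline(0, color="gray")
    plt.axvline(0, color="gray")
    plt.xlabel("$x_i$")
    plt.ylabel("$y_i$")
    plt.gca().set_aspect("equal")
    plt.show()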

Intuition from Least Squares

We first consider the simple linear model $$ y_i = m x_i, \qquad i=1,\dots,n $$ with slope \(m\) and zero vertical intercept. From the data cloud of points \((x_i,y_i)\), we can intuitively see what the best fit slope \(m\) should be. The formula for the optimal slope \(m\) is closely related to the inner product, and is in fact exactly the inner product if \(x\) is a unit vector: $$ m = (x^Tx)^{-1}x^Ty = x^Ty. $$ In statistics the inner product \(x^Ty\) is often called the correlation (for zero-mean unit vectors it is exactly the correlation coefficient). For two arbitrary unit vectors, the correlation varies between 1 and -1. A best fit slope of 1 means that \(x^Ty \approx 1\): the points \((x_i,y_i)\) lie along the line \(y=x\) and \(x\) and \(y\) point in about the same direction. A best fit slope of -1 means \(x^Ty \approx -1\): the points lie along the line \(y=-x\) and the vectors point in opposite directions. A best fit slope of 0 means \(x^Ty \approx 0\): the points lie along the line \(y=0\) and the vectors \(x\) and \(y\) are orthogonal to each other. These relationships are illustrated in the figure below.
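A minimal numerical check of the slope formula (a sketch with synthetic data): for a unit vector \(x\), the general least squares coefficient \((x^Tx)^{-1}x^Ty\) reduces to the inner product \(x^Ty\).

    import numpy as np

    rng = np.random.default_rng(3)
    n = 200

    x = rng.standard_normal(n)
    y = 0.8 * x + 0.6 * rng.standard_normal(n)
    x /= np.linalg.norm(x)   # make x a unit vector
    y /= np.linalg.norm(y)   # make y a unit vector

    # Least squares slope through the origin: m = (x^T x)^{-1} x^T y.
    m_general = (x @ y) / (x @ x)
    # Since x is a unit vector, x^T x = 1 and the slope is just x^T y.
    m_unit = x @ y
    print(m_general, m_unit)  # the two values agree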

Note: the slope here is chosen to minimize the sum of squares error $$ \sum_i |y_i - mx_i|^2, $$ which measures the vertical distance between the best fit line and the points. Because of this, for two unit vectors \(x\) and \(y\), the best fit slope will always be between 1 and -1. We could just as easily switch the roles of the horizontal and vertical axes and fit a curve of the form $$ x_i = m'y_i, \qquad i =1,\dots,n, $$ in which case we would minimize the horizontal distance between the best fit line and the points. Viewed in the same plane, this second best fit line has slope \(1/m'\), which ranges from 1 to \(\infty\) and from \(-\infty\) to -1 (for unit vectors \(m' = y^Tx = m\)). There is no difference between these two fits for the intuition we are seeking. Moreover, when we mentally look for a best fit line, we may naturally do one or the other depending on the distribution of the data, so we should consider both cases.
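The symmetry between the two fits is easy to verify numerically. In the sketch below (same assumptions as before), both regression coefficients equal the inner product for unit vectors, while the second line drawn in the \(y\)-versus-\(x\) plane has slope \(1/m'\) with magnitude at least 1.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 50

    x = rng.standard_normal(n)
    y = 0.9 * x + 0.4 * rng.standard_normal(n)
    x /= np.linalg.norm(x)
    y /= np.linalg.norm(y)

    m = x @ y        # slope of y = m x   (minimizes vertical errors)
    m_prime = y @ x  # coefficient of x = m' y (minimizes horizontal errors)
    # For unit vectors both coefficients equal the inner product, so
    # |m| <= 1, while the second line has slope 1/m' with |1/m'| >= 1.
    print(m, m_prime, 1 / m_prime)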
We can summarize our practical intuition in the following way. For unit vectors \(x\) and \(y\): if the point cloud lines up with the line \(y=x\), then \(y^Tx \approx 1\); if the point cloud lines up with the line \(y=-x\), then \(y^Tx \approx -1\); and if the point cloud lines up with either the vertical or horizontal axis (or is symmetric about it), then \(y^Tx \approx 0\) and \(y\) and \(x\) are close to orthogonal. This intuition is illustrated below.
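The three cases can be checked in a few lines (a sketch with synthetic data): an aligned, an opposed, and an independent unit vector give inner products near 1, -1, and 0 respectively.

    import numpy as np

    rng = np.random.default_rng(5)
    n = 500

    def unit(v):
        return v / np.linalg.norm(v)

    x = unit(rng.standard_normal(n))
    cases = {
        "aligned    (y ~  x)": unit(x + 0.1 * rng.standard_normal(n)),
        "opposed    (y ~ -x)": unit(-x + 0.1 * rng.standard_normal(n)),
        "orthogonal (y indep)": unit(rng.standard_normal(n)),
    }
    for name, y in cases.items():
        print(f"{name}: y^T x = {y @ x:+.2f}")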

Note that in the context of data fitting, statistics, and probability, this perspective is quite common. For example, it is well known that a multivariate normal distribution for 2D vectors of the form \((x_i,y_i)\) is symmetric across the axes if the \(x_i\) and \(y_i\) elements are independently distributed or uncorrelated, i.e. (assuming zero means) \(\sum_i x_iy_i = x^Ty \approx 0\) for data samples \(i=1,\dots,n\).
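As a quick illustration of this last point (a sketch, assuming zero-mean samples): independent normal draws give an empirical correlation \(\frac{1}{n}\sum_i x_iy_i\) that concentrates near 0 as \(n\) grows.

    import numpy as np

    rng = np.random.default_rng(6)
    n = 10_000

    # Independent (hence uncorrelated) zero-mean normal samples.
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)

    # The empirical correlation (1/n) x^T y concentrates near 0
    # at a rate of about 1/sqrt(n).
    print((x @ y) / n)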