Matrices can be thought of as vector-style objects (elements of a vector space) and also as linear operators between vector spaces. There are many norm ("size") concepts for matrices in both of these settings. When we treat matrices simply as vectors, we can apply the standard vector norm concepts. These norms are useful in some settings but do not capture the structure of the linear operator the matrix represents. Norms that treat matrices as linear operators are called induced norms and capture more of the operator structure of the matrix. We will discuss both norm types below, with more of a focus on operator/induced norms.
VECTOR-STYLE NORMS
When we think of a matrix as a vector, the standard \(p\)-norms (1-norm, 2-norm, \(\infty\)-norm, etc.) all still apply with their usual definitions. Some of them have special names. Specifically, the Frobenius norm refers to the standard vector 2-norm applied to a matrix.
Frobenius Norm
The Frobenius norm of a matrix \( A \in \mathbb{R}^{m \times n} \) is denoted \( \Vert A \Vert_F\) and given by $$ \Vert A \Vert_F = \left(\sum_{ij} (A_{ij})^2\right)^{\tfrac{1}{2}} = \left(Tr\big(A^TA\big)\right)^{\tfrac{1}{2}} = \left(Tr\big(AA^T\big)\right)^{\tfrac{1}{2}} $$ Note that this is the vector 2-norm applied to \(A\) and that the trace formulas on the right are computable for any size \(A\) and give the desired sum (though forming the full matrices \(A^TA\) or \(AA^T\) performs many extra, unneeded computations).
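As a quick numerical sanity check, here is a minimal sketch (assuming NumPy; the example matrix is arbitrary and not from the text) that evaluates the Frobenius norm directly from the entries and via both trace formulas:

```python
import numpy as np

# Arbitrary example matrix (chosen for illustration only).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

fro_entries   = np.sqrt(np.sum(A**2))        # (sum_ij A_ij^2)^(1/2)
fro_trace_AtA = np.sqrt(np.trace(A.T @ A))   # (Tr(A^T A))^(1/2)
fro_trace_AAt = np.sqrt(np.trace(A @ A.T))   # (Tr(A A^T))^(1/2)

print(fro_entries, fro_trace_AtA, fro_trace_AAt)  # all identical
print(np.linalg.norm(A, 'fro'))                   # NumPy's built-in Frobenius norm
```

In practice one would use the entrywise sum (or the built-in routine) rather than forming \(A^TA\), for the reason noted above.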
From properties of traces, we can quickly derive the relationship between the Frobenius norm and the singular value decomposition. Let the SVD of \(A\) be given by $$ A = U \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix}V^T $$ We can then quickly compute that $$ Tr \big(A^TA\big) = Tr\left(V \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix}U^T U\begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix}V^T\right) $$ $$ = Tr \left( V\begin{bmatrix} \Sigma^2 & 0 \\ 0 & 0 \end{bmatrix}V^T \right) = Tr \left( \begin{bmatrix} \Sigma^2 & 0 \\ 0 & 0 \end{bmatrix}V^TV \right) = Tr \big(\Sigma^2 \big) $$ Thus the Frobenius norm is also the 2-norm applied to the vector of singular values of the matrix: $$ \Vert A \Vert_F = \left(\sum_{i}(\sigma_i)^2\right)^{\tfrac{1}{2}} $$ The trace formulas above often make the Frobenius norm easy to work with algebraically (since the trace is invariant under cyclic permutations of a matrix product). In optimization, when we want to minimize the size of a matrix, we often use the squared Frobenius norm as the objective function.
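Continuing the sketch above (again assuming NumPy and an arbitrary example matrix), we can check that the Frobenius norm matches the 2-norm of the vector of singular values:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

sigma = np.linalg.svd(A, compute_uv=False)  # singular values of A
fro_from_svd = np.sqrt(np.sum(sigma**2))    # (sum_i sigma_i^2)^(1/2)

print(fro_from_svd, np.linalg.norm(A, 'fro'))  # agree up to floating-point error
```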
INDUCED NORMS
Induced norms, or operator norms, measure how much a matrix changes the size of the vectors it operates on. To quantify this exactly, we ask how large the output can be when the matrix is applied to a unit vector in the domain. Note that this requires a notion of size, i.e. a vector norm, both to define a unit vector in the domain and to measure the size of the output vector in the codomain. The vector norms we pick for the domain and codomain can be the same or different. We often denote an induced norm as \(\Vert A\Vert_{q,p}\) where \(p\) refers to the norm type in the domain and \(q\) refers to the norm type in the codomain. Explicitly, we can use the following definitions for an induced norm $$ \Vert A\Vert_{q,p} = \max_{\Vert x \Vert_p=1 } \ \Vert Ax \Vert_q = \max_{\Vert x \Vert_p \neq 0 } \ \frac{\Vert Ax \Vert_q}{\Vert x \Vert_p} $$ In the first definition, we find the \(x\) on the (\(p\)-norm) unit ball that maximizes the size (with respect to the \(q\)-norm) of its image under \(A\). In the second definition, we no longer explicitly require that \(x\) lie on the unit ball (only that \(x \neq 0\)) since we normalize by the length of \(x\) inside the maximization. The two definitions are equivalent and either can be used. If \(q\) and \(p\) are the same, we sometimes write \(\Vert \cdot \Vert_p\) for \(\Vert \cdot \Vert_{p,p}\) when it is clear from context that we mean an induced norm.
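To make the definition concrete, here is a rough sketch (assuming NumPy; the helper `induced_norm_estimate` and the example matrix are made up for illustration) that estimates \(\Vert A\Vert_{q,p}\) by maximizing the ratio \(\Vert Ax\Vert_q / \Vert x\Vert_p\) over random vectors. Random sampling only gives a lower bound on the true maximum, but for \(p = q \in \{1, 2, \infty\}\) it can be compared against NumPy's exact induced matrix norms:

```python
import numpy as np

def induced_norm_estimate(A, q, p, trials=20_000, seed=0):
    """Estimate ||A||_{q,p} by sampling random nonzero vectors x and
    taking the largest ratio ||Ax||_q / ||x||_p (a lower bound only)."""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(trials):
        x = rng.standard_normal(A.shape[1])
        best = max(best, np.linalg.norm(A @ x, q) / np.linalg.norm(x, p))
    return best

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

# For p = q in {1, 2, inf}, np.linalg.norm(A, p) is exactly the induced p,p-norm.
for p in (1, 2, np.inf):
    print(p, induced_norm_estimate(A, p, p), np.linalg.norm(A, p))
```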
For several specific induced norms, we can derive further relationships to the structure of the matrix. The \(2,2\)-norm, or induced 2-norm, is given by the maximum singular value of the matrix (derived below). The \(1,1\)-norm and the \(\infty,\infty\)-norm have simple characterizations in terms of the columns and rows of the matrix: the \(1,1\)-norm is the maximum absolute column sum and the \(\infty,\infty\)-norm is the maximum absolute row sum. (The \(\infty,1\)-norm is simply the largest absolute entry of the matrix.)
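A small sketch of the column-sum and row-sum characterizations (assuming NumPy; the matrix is an arbitrary example), compared against NumPy's built-in induced 1- and \(\infty\)-norms:

```python
import numpy as np

A = np.array([[ 1.0, -2.0,  3.0],
              [-4.0,  5.0, -6.0]])

max_abs_col_sum = np.max(np.sum(np.abs(A), axis=0))  # ||A||_{1,1}
max_abs_row_sum = np.max(np.sum(np.abs(A), axis=1))  # ||A||_{inf,inf}

print(max_abs_col_sum, np.linalg.norm(A, 1))       # both 9.0
print(max_abs_row_sum, np.linalg.norm(A, np.inf))  # both 15.0
```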
\( (2,2) \)-norm
The induced 2-norm can be shown to be the maximum singular value of the matrix. Assuming the singular value decomposition of \( A \) shown above (and reusing some of the steps from the expressions for \( A^TA \) ), we have $$ \Vert A \Vert_{2,2} = \max_{ \Vert x \Vert_2 = 1} \ \Vert Ax \Vert_2 = \max_{ \Vert x \Vert_2 = 1} \ \big( x^TA^TAx \big)^{\tfrac{1}{2}} $$ $$ = \max_{ \Vert x \Vert_2 = 1} \ \left( x^TV \begin{bmatrix} \Sigma^2 & 0 \\ 0 & 0 \end{bmatrix} V^T x \right)^{ \tfrac{1}{2} } = \max_{ \Vert x' \Vert_2 = 1} \ \left( (x')^T\begin{bmatrix} \Sigma^2 & 0 \\ 0 & 0 \end{bmatrix}x'\right)^{\tfrac{1}{2}} $$ with \( x = Vx' \), where we have used the fact that since \( V \) is orthogonal, the unit ball constraint on \(x\) is equivalent to a unit ball constraint on \( x' \): $$ 1 = \Vert x \Vert_2 = \Vert Vx' \Vert_2 = \Vert x' \Vert_2 $$ The maximum is achieved by putting all the weight of \( x'\) on the largest singular value. If we order the singular values from largest to smallest (as is standard), this means taking \( x' = I_1\), the first standard basis vector. We then get $$ \Vert A \Vert_{2,2} = \max_{\Vert x' \Vert_2 = 1} \ \left( (x')^T\begin{bmatrix} \Sigma^2 & 0 \\ 0 & 0 \end{bmatrix}x' \right)^{\tfrac{1}{2}} = \left(I_1^T\begin{bmatrix} \Sigma^2 & 0 \\ 0 & 0 \end{bmatrix}I_1\right)^{\tfrac{1}{2}} = \sigma_{\max}(A) $$ Note the above implies that the induced 2-norm is never larger than the Frobenius norm: \( \Vert A \Vert_F \geq \Vert A \Vert_{2,2}\).
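Finally, a brief sketch (assuming NumPy and an arbitrary example matrix) verifying that the induced 2-norm equals the largest singular value and is bounded above by the Frobenius norm:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

sigma_max = np.linalg.svd(A, compute_uv=False)[0]  # singular values come sorted largest first

print(sigma_max, np.linalg.norm(A, 2))                   # induced 2-norm equals sigma_max
print(np.linalg.norm(A, 'fro') >= np.linalg.norm(A, 2))  # True: ||A||_F >= ||A||_{2,2}
```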