diff --git a/VisualizingAMatrix.pdf b/VisualizingAMatrix.pdf index 4487007..89c2681 100644 Binary files a/VisualizingAMatrix.pdf and b/VisualizingAMatrix.pdf differ diff --git a/VisualizingAMatrix.tex b/VisualizingAMatrix.tex index 715b9d5..7199414 100644 --- a/VisualizingAMatrix.tex +++ b/VisualizingAMatrix.tex @@ -11,7 +11,6 @@ \graphicspath{ {./images/} } \usepackage{amsfonts} -\usepackage{graphicx} \usepackage{epstopdf} \usepackage{algorithmic} \usepackage{float} @@ -124,8 +123,8 @@ \section{Introduction} u_2 \\ u_3 \\ \vdots - \label{eq:vectorelements} \end{pmatrix} +\label{eq:vectorelements} \end{equation} Now in order to write a numeric representation for a vector like \eqref{eq:vectorelements} we need to @@ -134,7 +133,7 @@ \section{Introduction} visualized as an arrow in space. The arrow has a length and points in a specific direction. The arrow has an independent existence outside of any particular coordinate system imposed on the space. Therefore, the -actual numbers in \cref{eq:vectorelements} may change depending upon +actual numbers in \eqref{eq:vectorelements} may change depending upon our choice of coordinate system, but the direction and length of the arrow remain the same. @@ -159,11 +158,11 @@ \section{Introduction} \left\lVert{u}\right\rVert_2 = \sqrt{u_1^2 + u_2^2 + u_3^2} = \sqrt{\sum_{i=1}^3 u_i^2} \label{eq:normdef} \end{equation} -More exactly, this is the 2-norm or Euclidian norm of a 3-dimensional vector. +More exactly, this is the 2-norm or Euclidean norm of a 3-dimensional vector. The 2-norm is the most frequently used vector norm in engineering and physics applications. Generalizing the definition of norm to vectors of -higher dimension is an obvious extension of \cref{eq:normdef}. +higher dimension is an obvious extension of \eqref{eq:normdef}. Visualizing a vector as an arrow in space serves well in many disciplines, especially @@ -183,7 +182,7 @@ \section{The matrix as linear transform} and intuitive. But how to visualize a matrix? 
Is there some geometric picture which we can carry in our heads which captures important properties of a matrix? The answer is "yes" -- that's the subject of this article. -The starting point to visualizing a matrix is +The starting point for visualizing a matrix is to consider the action of matrix-vector multiplication. For most of this article we will work in 2 dimensions (2x2 matrix), and consider only square matrices. @@ -276,7 +275,7 @@ \subsection{Transforming a unit square} form a parallelogram (as opposed to forming some other shape). \FloatBarrier -Matlab code to perform this operation is presented in \cref{mat:ActionUnitSquare}. If +Matlab code to perform this operation is presented in \cref{mat:ActionUnitSquare}. If you run this code repeatedly it will generate a new random matrix with each run, and you will get a different output parallelogram each time. This is shown in \cref{fig:4Parallelograms}. @@ -287,7 +286,7 @@ \subsection{Transforming a unit square} \label{fig:4Parallelograms} \end{figure} Examining the results produced by several different matrices shows that -the resulting square is both stretched and rotated by the action of $A$. This is +the resulting parallelogram is both stretched and rotated by the action of $A$. This is an important observation which we will return to in subsequent sections. @@ -306,7 +305,7 @@ \subsection{Transforming a unit square} D = \det(A) = \alpha \gamma - \beta \delta \label{eq:detareaformula} \end{equation} -A "proof without words" of \cref{eq:detareaformula} is shown in \cref{fig:ParallelogramAreaDetFormula}. +A "proof without words" of \eqref{eq:detareaformula} is shown in \cref{fig:ParallelogramAreaDetFormula}.
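The determinant-as-area identity in the hunk above is easy to sanity-check numerically. A minimal sketch in plain Python (Python standing in for the article's Matlab listings; the matrix values below are arbitrary illustrations, not taken from the article):

```python
import math

# Example 2x2 matrix (arbitrary values, for illustration only).
# Its columns are the images of the unit square's edge vectors under A.
A = [[3.0, 1.0],
     [1.0, 2.0]]

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # ad - bc

def apply(M, v):
    # Matrix-vector product M v for a 2x2 matrix.
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

# Image of the unit square's corners under A, in counterclockwise order.
verts = [apply(A, v) for v in [(0, 0), (1, 0), (1, 1), (0, 1)]]

def shoelace(pts):
    # Shoelace formula: signed area of a polygon.
    n = len(pts)
    return 0.5 * sum(pts[i][0] * pts[(i + 1) % n][1]
                     - pts[(i + 1) % n][0] * pts[i][1] for i in range(n))

area = shoelace(verts)
print(det, area)  # both are 5.0: the signed area equals det(A)
```

Swapping the two columns of $A$ flips the sign of both numbers, which matches the "signed area" observation made about the determinant formula.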
\begin{figure}[thb] \centering \includegraphics[width=0.7\columnwidth]{ParallelogramAreaDetFormula.png} @@ -316,7 +315,7 @@ \subsection{Transforming a unit square} is rearranged into a form where Area = $\alpha \gamma - \beta \delta$ may be deduced by inspection.} \label{fig:ParallelogramAreaDetFormula} \end{figure} -There are a couple of interesting observations to make about \cref{eq:detareaformula}: +There are a couple of interesting observations to make about \eqref{eq:detareaformula}: \begin{itemize} \item If $A$ is singular, then $\det(A) = 0$. In this case, the parallelogram reduces to a line segment with zero area (or possibly even just a point at the origin @@ -330,7 +329,7 @@ \subsection{Transforming a unit square} orders: first right then left, or first left then right. This is similar to computing the "right hand rule" associated with the vector cross product. That is, the area reported by - \cref{eq:detareaformula} is a "signed area" which can take on + \eqref{eq:detareaformula} is a "signed area" which can take on either sign depending upon the details of how matrix $A$ is constructed. \end{itemize} @@ -434,7 +433,7 @@ \subsection{Application: Rotation and stretching matrices} \end{equation} where $I$ is the identity matrix. The proof is based on looking at the dot products of the constituent column vectors -- try doing it yourself! - Another way of expressing \cref{eq:orthogonalmatrix} is to recognize that the + Another way of expressing \eqref{eq:orthogonalmatrix} is to recognize that the transpose of an orthogonal matrix is its inverse, $$ Q^T = Q^{-1} $$ @@ -461,7 +460,7 @@ \subsection{Application: Rotation and stretching matrices} is a rectangle. We will examine the case of stretching in arbitrary directions in -section \cref{rotstretchrot} below, which describes the singular value decomposition. +\cref{rotstretchrot} below, which describes the singular value decomposition.
\section{Transforming a unit circle} \label{sec:unit_ball} @@ -514,7 +513,7 @@ \section{Transforming a unit circle} Each point $u = (x(\theta), y(\theta))^T$ -satisfies \cref{eq:unit_ball}, so each of these points lies on the +satisfies \eqref{eq:unit_ball}, so each of these points lies on the unit circle. Therefore, this representation provides a simple algorithm to create a unit circle in the plane: just sweep $\theta$ from $0$ to $2\pi$ and plot the points $(x(\theta), y(\theta))^T$. @@ -525,8 +524,8 @@ \section{Transforming a unit circle} transformed by matrix-vector multiplication. The idea is to take every unit vector $u$ and multiply by the matrix $A$, giving a new point $v$. Then plot the set of all points $v$ generated by $v = A u$. -A Matlab program which implements this idea is given in \cref{Mat:ActionUnitBall}. The circle and its -transformed image are shown in \cref{fig:CircleTransformedToEllipse}. +A Matlab program which implements this idea is given in \cref{Mat:ActionUnitBall}. The circle and its +transformed image are shown in \cref{fig:CircleTransformedToEllipse}. \begin{figure}[thb] \centering \includegraphics[width=0.7\columnwidth]{CircleTransformedToEllipse.png} @@ -537,14 +536,14 @@ \section{Transforming a unit circle} transformed into an ellipse. The shape, size, and rotation of the ellipse are given by the properties of the matrix $A$. Different matrices will create different ellipses having different shapes, sizes, and rotations. This is illustrated -by running the program in \cref{Mat:ActionUnitBall} several times for different, randomly generated +by running the program in \cref{Mat:ActionUnitBall} several times for different, randomly generated $A$ matrices.
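The claim that the image of the unit circle is an ellipse can also be verified numerically: every image point $v = A u$ satisfies $v^T (A A^T)^{-1} v = 1$, which is the equation of an ellipse. A plain-Python sketch (Python standing in for the article's Matlab; the matrix is an arbitrary example rather than the random one in the listing):

```python
import math

# Arbitrary invertible example matrix (the article's listing uses a random A).
A = [[2.0, 1.0],
     [0.0, 1.0]]

def matvec(M, v):
    return (M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1])

# Sweep theta around the unit circle and map each point through A.
thetas = [2 * math.pi * k / 360 for k in range(360)]
ellipse = [matvec(A, (math.cos(t), math.sin(t))) for t in thetas]

# A A^T and its inverse; every image point v should satisfy
# v^T (A A^T)^{-1} v = 1, since A^T (A A^T)^{-1} A = I for invertible A.
AAT = [[A[0][0]**2 + A[0][1]**2, A[0][0]*A[1][0] + A[0][1]*A[1][1]],
       [A[0][0]*A[1][0] + A[0][1]*A[1][1], A[1][0]**2 + A[1][1]**2]]
d = AAT[0][0]*AAT[1][1] - AAT[0][1]*AAT[1][0]
inv = [[AAT[1][1]/d, -AAT[0][1]/d], [-AAT[1][0]/d, AAT[0][0]/d]]

residual = max(abs(v[0]*(inv[0][0]*v[0] + inv[0][1]*v[1])
                 + v[1]*(inv[1][0]*v[0] + inv[1][1]*v[1]) - 1.0)
               for v in ellipse)
print(residual)  # near zero: all image points lie on a single ellipse
```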
The results obtained by operating on a circle using four different random matrices are shown in \cref{fig:4Ellipses} \begin{figure}[thb] \centering \includegraphics[width=1.0\columnwidth]{4Ellipses.png} \caption{Repeated runs of the unit circle program produce different random matrices $A$ which - product different output ellipses. The original unit circle is red, the output ellipses are blue.} + produce different output ellipses. The original unit circle is red, the output ellipses are blue.} \label{fig:4Ellipses} \end{figure} @@ -587,7 +586,7 @@ \subsection{Matrix norm} \item $\left\lVert{M+N}\right\rVert \leq \left\lVert{M}\right\rVert + \left\lVert{N}\right\rVert$ - \item if $M$ is a zero object, then $\left\lVert{M}\right\rVert = 0$. + \item $\left\lVert{M}\right\rVert = 0$ if and only if $M$ is a zero object. \end{itemize} We note that these rules leave a lot of flexibility for defining a norm. This flexibility manifests itself when considering computing @@ -596,7 +595,7 @@ \subsection{Matrix norm} \begin{itemize} \item $L_1$ norm: $\left\lVert{x}\right\rVert = \sum_i \lvert x_i\rvert$ - \item $L_2$ (Euclidian) norm: $\left\lVert{x}\right\rVert = \sqrt{\sum_i x_i^2}$ + \item $L_2$ (Euclidean) norm: $\left\lVert{x}\right\rVert = \sqrt{\sum_i x_i^2}$ \item $L_\infty$ norm: $\left\lVert{x}\right\rVert = \max\limits_{i} \lvert x_i \rvert$ \end{itemize} @@ -616,7 +615,7 @@ \subsection{Matrix norm} The induced matrix norm is found as follows. Consider an input vector -$u$ with norm $\left\lVert {u}\right\rVert$. We use the Euclidian ($L_2$) norm for the vector +$u$ with norm $\left\lVert {u}\right\rVert$. We use the Euclidean ($L_2$) norm for the vector norm. Now consider operating on $u$ with $A$ via matrix-vector multiplication, $v = A u$. The output vector $v$ will also have a norm, $\left\lVert {v}\right\rVert$. 
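Before the maximization is made formal, the ratio $\left\lVert A u \right\rVert / \left\lVert u \right\rVert$ can be explored numerically. A plain-Python sketch (arbitrary example matrix; Python stands in for the article's Matlab) that samples the ratio over unit vectors and compares its maximum with the closed-form largest singular value of a 2x2 matrix:

```python
import math

# Arbitrary 2x2 example matrix.
A = [[2.0, 1.0],
     [0.0, 1.0]]

def norm2(v):
    return math.hypot(v[0], v[1])

def matvec(M, v):
    return (M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1])

# Sample the ratio ||A u|| / ||u|| over unit vectors u = (cos t, sin t),
# for which ||u|| = 1, so the ratio is just ||A u||.
sampled = max(norm2(matvec(A, (math.cos(t), math.sin(t))))
              for t in (2*math.pi*k/100000 for k in range(100000)))

# Exact maximum: sigma_1 = sqrt(largest eigenvalue of A^T A),
# using the closed form for eigenvalues of a symmetric 2x2 matrix.
p = A[0][0]**2 + A[1][0]**2             # (A^T A)[0][0]
q = A[0][0]*A[0][1] + A[1][0]*A[1][1]   # (A^T A)[0][1]
r = A[0][1]**2 + A[1][1]**2             # (A^T A)[1][1]
sigma1 = math.sqrt(((p + r) + math.sqrt((p - r)**2 + 4*q*q)) / 2)

print(sampled, sigma1)  # the sampled maximum converges to sigma_1
```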
Now ask, what is the @@ -642,8 +641,8 @@ \subsection{Matrix norm} \label{eq:inducednorm} \begin{aligned} \left\lVert {A}\right\rVert -& = \argmax_u \frac{\left\lVert{v}\right\rVert}{\left\lVert{u}\right\rVert} \\ -& = \argmax_u \frac{\left\lVert{A u}\right\rVert}{\left\lVert{u}\right\rVert} +& = \max_u \frac{\left\lVert{v}\right\rVert}{\left\lVert{u}\right\rVert} \\ +& = \max_u \frac{\left\lVert{A u}\right\rVert}{\left\lVert{u}\right\rVert} \end{aligned} \end{equation*} This is the definition of induced matrix norm ordinarily found in @@ -792,10 +791,10 @@ \subsection{Condition number} Ax = b \end{equation} is to small perturbations of $A$ and $b$. In general, the value -of $x$ obtained by solving \cref{eq:linearsystem} will be +of $x$ obtained by solving \eqref{eq:linearsystem} will be inexact due to rounding error and other numerical non-idealities. You might think that if $A$ is close to singular, the -error incurred in solving \cref{eq:linearsystem} will be larger than +error incurred in solving \eqref{eq:linearsystem} will be larger than if $A$ is far from singular. This is indeed the case in practice, and condition number $\kappa$ quantifies this effect. @@ -809,7 +808,7 @@ \subsection{Condition number} discouraged. (The reason is that computing the matrix inverse $A^{-1}$ is an $O(n^3)$ operation, while the condition number is more easily computed using the SVD, which is typically calculated -iteratively.) Nonetheless, you can see from \cref{eq:condno} +iteratively.) Nonetheless, you can see from \eqref{eq:condno} the general behavior of $\kappa$. On one hand, the most stable matrix is the identity matrix, $$ @@ -835,7 +834,7 @@ \subsection{Condition number} is the matrix to singular. In this case, solving the system $A x = b$ is very sensitive to small changes in $A$ and $b$ (i.e. from round-off) so the result $x$ may have large errors. In this case one says the matrix is - "badly conditioned". + "ill-conditioned". 
\end{itemize} This language extends to numeric problems themselves -- one can speak of well or @@ -860,15 +859,15 @@ \subsection{Condition number} $$ \kappa = \sigma_1 / \sigma_N $$ -This can be seen by considering \cref{eq:condno}. The norm of $A$ is +This can be seen by considering \eqref{eq:condno}. The norm of $A$ is the largest singular value $\sigma_1$. Meanwhile, taking the -SVD of $A^{-1}$ yeilds +SVD of $A^{-1}$ yields \begin{equation*} \begin{aligned} -A^{-1} & = (U S V^T)^{-1} -& = (V^T)^{-1} S^{-1} U^{-1} -& = V S^{-1} U^T -\end{aligned} +A^{-1} & = (U S V^T)^{-1} \\ +& = (V^T)^{-1} S^{-1} U^{-1} \\ +& = V S^{-1} U^T +\end{aligned} \end{equation*} and since $S$ is diagonal we have \begin{equation} @@ -881,7 +880,7 @@ \subsection{Condition number} \end{pmatrix} \end{equation} Accordingly, the norm of $A^{-1}$ is its largest singular value, -which is $1/\sigma_N$. Therefore, from the definition \cref{eq:condno} we have +which is $1/\sigma_N$. Therefore, from the definition \eqref{eq:condno} we have $\kappa = \sigma_1 / \sigma_N$. What does this mean for our ellipse visualization? It means that the condition @@ -974,7 +973,7 @@ \subsection{Application: Sensitivity of linear solve} \item Compute the backward error $e_{bkwd} = {\left\lVert b - \tilde{b} \right\rVert}/{\left\lVert b \right\rVert}$. \item Compute the ratio $\kappa_{comp} = e_{fwd}/e_{bkwd}$. \end{enumerate} -A Matlab program which implements this algorithm is presented in \cref{Mat:CondThm}. Running the program 10 times using +A Matlab program which implements this algorithm is presented in \cref{Mat:CondThm}. Running the program 10 times using a condition number $\kappa = 100$ gives the following output: \\ \texttt{ \indent >> test\_cond\_thm(100) \\ @@ -991,7 +990,7 @@ \subsection{Application: Sensitivity of linear solve} } As can be seen, the inferred condition number is bounded by $\kappa = 100$ above.
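The ten-trial experiment can be reproduced outside of Matlab as well. The sketch below is an assumed plain-Python reimplementation (the internals of the article's test_cond_thm are not shown in this patch, so the construction of $A$ with prescribed condition number is my own choice here: a rotation, a diagonal stretch with singular values $\kappa$ and $1$, and another rotation):

```python
import math, random

def rot(t):
    # 2x2 rotation matrix through angle t.
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

def matmul(M, N):
    return [[sum(M[i][k]*N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(M, v):
    return [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]

def solve2(M, b):
    # Cramer's rule for a 2x2 linear system M x = b.
    d = M[0][0]*M[1][1] - M[0][1]*M[1][0]
    return [(b[0]*M[1][1] - M[0][1]*b[1]) / d,
            (M[0][0]*b[1] - b[0]*M[1][0]) / d]

def norm2(v):
    return math.hypot(v[0], v[1])

kappa = 100.0
worst = 0.0
for trial in range(10):
    # A = (rotation)(stretch)(rotation): singular values kappa and 1,
    # so cond(A) = kappa by construction.
    S = [[kappa, 0.0], [0.0, 1.0]]
    A = matmul(rot(random.uniform(0, 2*math.pi)),
               matmul(S, rot(random.uniform(0, 2*math.pi))))
    x = [random.gauss(0, 1), random.gauss(0, 1)]
    b = matvec(A, x)
    # Perturb the right-hand side (the backward error) and re-solve.
    db = [1e-8 * random.gauss(0, 1), 1e-8 * random.gauss(0, 1)]
    xt = solve2(A, [b[0] + db[0], b[1] + db[1]])
    e_fwd = norm2([xt[0] - x[0], xt[1] - x[1]]) / norm2(x)
    e_bkwd = norm2(db) / norm2(b)
    worst = max(worst, e_fwd / e_bkwd)

print(worst)  # stays bounded above by kappa = 100
```

As in the Matlab runs, individual trials vary with the random data, but the amplification ratio never exceeds $\kappa$.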
That is, although at least one trial got above 60, the ratio clearly never exceeded 100. This demonstrates that the inequality -\cref{condnothm} is obeyed, and shows +\eqref{condnothm} is obeyed, and shows how the condition number characterizes the sensitivity of the linear solve operation to the effects of input error or perturbations. @@ -1017,7 +1016,7 @@ \section{The quadratic form visualization} 2D plane, then the resulting $f(u)$ is a surface floating above the plane. -Although one my create a quadratic form using any arbitrary matrix, for +Although one may create a quadratic form using any arbitrary matrix, for what follows we will restrict ourselves to symmetric matrices $A$ with all real elements. The reasons for this are: \begin{itemize} @@ -1073,8 +1072,8 @@ \subsection{Eigenvalues and eigenvectors} \includegraphics[width=1.0\columnwidth]{QuadraticFormFixedSurfaces.png} \caption{From left to right: 1. Upward opening paraboloid: the matrix is positive definite, both eigenvalues are positive. 2. Downward opening paraboloid: the matrix is - negative definite, both eigenvalues are negative. 3. Saddle: The matrix is indeterminate, - one eigenvalue is positive, one is negative. 4. Cylidrical paraboloid: The matrix is + negative definite, both eigenvalues are negative. 3. Saddle: The matrix is indefinite, + one eigenvalue is positive, one is negative. 4. Cylindrical paraboloid: The matrix is singular, one eigenvalue is positive, the other is zero.} \label{fig:QuadraticFormFixedSurfaces} \end{figure} @@ -1085,7 +1084,7 @@ \subsection{Eigenvalues and eigenvectors} \label{eq:evd} A = Q \Lambda Q^T \end{equation} -where $Q$ is an orthonormal matrix, and $\Lambda$ is a diagonal matrix whose diagonal elements +where $Q$ is an orthogonal matrix, and $\Lambda$ is a diagonal matrix whose diagonal elements are the eigenvalues of matrix $A$. 
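The decomposition $A = Q \Lambda Q^T$ can be made concrete for a real symmetric 2x2 matrix, where the eigenvector rotation angle has a closed form. A plain-Python sketch (example entries arbitrary; Python stands in for the article's Matlab):

```python
import math

# Symmetric example matrix A = [[a, b], [b, c]] (arbitrary values).
a, b, c = 3.0, 1.0, 2.0
A = [[a, b], [b, c]]

# Closed-form diagonalizing rotation: tan(2 theta) = 2b / (a - c).
theta = 0.5 * math.atan2(2*b, a - c)
ct, st = math.cos(theta), math.sin(theta)
Q = [[ct, -st], [st, ct]]  # orthogonal; columns are the eigenvectors

# Lambda = Q^T A Q should come out diagonal, with the eigenvalues
# of A on its diagonal.
QtA = [[Q[0][0]*A[0][0] + Q[1][0]*A[1][0], Q[0][0]*A[0][1] + Q[1][0]*A[1][1]],
       [Q[0][1]*A[0][0] + Q[1][1]*A[1][0], Q[0][1]*A[0][1] + Q[1][1]*A[1][1]]]
Lam = [[QtA[0][0]*Q[0][0] + QtA[0][1]*Q[1][0], QtA[0][0]*Q[0][1] + QtA[0][1]*Q[1][1]],
       [QtA[1][0]*Q[0][0] + QtA[1][1]*Q[1][0], QtA[1][0]*Q[0][1] + QtA[1][1]*Q[1][1]]]

print(Lam[0][0], Lam[1][1], Lam[0][1])  # eigenvalues on the diagonal, ~0 off it
```

Both eigenvalues come out positive here, so this particular $A$ is positive definite and its quadratic-form surface is an upward-opening paraboloid.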
I will assume you have some basic familiarity with eigenvalues and eigenvectors; the subject is well treated in the standard linear algebra textbooks. One important thing to note is that by restricting ourselves to real, symmetric @@ -1104,8 +1103,8 @@ \subsection{Eigenvalues and eigenvectors} eigenvalue corresponds to a downward-facing parabola. If an eigenvalue is zero, it corresponds to one direction of the surface which faces neither up nor down -- this is the degenerate case shown in \cref{fig:QuadraticFormFixedSurfaces} - \item The two directions of principle curvature are given by the eigenvectors of $A$. The - term "principle curvature" means the minimum and maximum values of curvature along all + \item The two directions of principal curvature are given by the eigenvectors of $A$. The + term "principal curvature" means the minimum and maximum values of curvature along all lines passing through the origin. \item A positive definite matrix is one whose eigenvalues are all positive. Such matrices show up frequently in real-world problems. For a 2x2 matrix the corresponding surface is a paraboloid opening @@ -1123,7 +1122,7 @@ \subsection{Eigenvalues and eigenvectors} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %XXXX Need drawing showing vertical plates cutting paraboloid giving parabolas -%XXXX Need drawing showing how eigenvectors align along directions of principle curvature. +%XXXX Need drawing showing how eigenvectors align along directions of principal curvature. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -1136,17 +1135,17 @@ \subsection{Relation between ellipse and quadratic form visualizations} lines of intersection between the paraboloid and the slicing planes are called "levelsets" in optimization speak, or the "isolines" in computer graphics speak. 
Importantly for us, the lines are manifestly ellipses -- a fact that should -immediately remind us of the ellipse visualization presented in \cref{sec:unit_ball}. +immediately remind us of the ellipse visualization presented in \cref{sec:unit_ball}. \begin{figure}[thb] \centering \includegraphics[width=0.7\columnwidth]{LevelsetsOfParabola.png} - \caption{The levelsets (isolines) of the sliced paraboliod are + \caption{The levelsets (isolines) of the sliced paraboloid are ellipses, shown here as solid green and blue lines corresponding to different slicing planes.} \label{fig:LevelsetsOfParabola} \end{figure} The natural question is, is there a relationship between these ellipses and -the ellipses generated by matrix-vector multiplication studied in \cref{sec:unit_ball}? +the ellipses generated by matrix-vector multiplication studied in \cref{sec:unit_ball}? The answer is yes, the isoline ellipses depicted in \cref{fig:LevelsetsOfParabola} and the ellipses depicted in \cref{fig:CircleTransformedToEllipse} are indeed related. @@ -1163,16 +1162,18 @@ \subsection{Relation between ellipse and quadratic form visualizations} Next imagine that $A$ may be decomposed into a product $A = B^T B$. Since $A$ is positive definite we know this is the case -- the decomposition is essentially the Cholesky decomposition \cite{Higham2020_Chol}. -Using this, \cref{eq:quad_form} becomes +Using this, \eqref{eq:quad_form} becomes \begin{equation} -1 = u^T B^T B u \\ -= (u^T B^T) (B u) \\ -= e^T e +\begin{aligned} +1 & = u^T B^T B u \\ +& = (u^T B^T) (B u) \\ +& = e^T e +\end{aligned} \end{equation} where $e = B u$ is a vector. This equation says that the 2-norm of $e$ is one. Therefore, the set of all vectors $e$ are unit vectors, and their tips lie on the the unit circle. -This was the starting point of the ellipse visualization back in section \cref{sec:unit_ball}!
That -means the ellipses defined by \cref{eq:quad_form} are created by the matrix-vector +This was the starting point of the ellipse visualization back in \cref{sec:unit_ball}! That +means the ellipses defined by \eqref{eq:quad_form} are created by the matrix-vector multiplication $$ u = B^{-1} e $$ @@ -1181,15 +1182,17 @@ \subsection{Relation between ellipse and quadratic form visualizations} So the the isoline ellipses and the ellipses generated from the unit circle under matrix-vector multiplication are indeed related. But can we find a straightforward -forumula connecting them? Consider the image of the unit circle under $A$, +formula connecting them? Consider the image of the unit circle under $A$, $$ w = A e $$ where $e$ is the unit circle and $w$ is its image as an ellipse. Knowing that, and using the definition of $B$ we have $$ -w = B^T B e \\ +\begin{aligned} +w = B^T B e \\ = B^T B B u +\end{aligned} $$ This says that $w$ is the image of $u$ under matrix-vector multiplication. In particular, the first product $B u$ returns a unit circle, and then $B^T B$ transforms it into @@ -1213,7 +1216,7 @@ \subsection{Relation between ellipse and quadratic form visualizations} \end{figure} \subsection{Condition number and the quadratic form visualization} -Recall using the ellipse visualization in \cref{conditionnumber} as +Recall using the ellipse visualization in \cref{conditionnumber} as a proxy of the condition number. Can we also use the quadratic form visualization as a way to understand the condition number of $A$? As usual, when I ask a rhetorical question in this article the answer is "yes". @@ -1400,9 +1403,7 @@ \subsection{Condition number theorem} \bibitem{Higham2020_BkwdErr} Nick Higham, "What Is Backward Error?", \url{https://nhigham.com/2020/03/25/what-is-backward-error/} -\bibitem{Trefethen97} For a full discussion, see L. Trefethen, and D. Bau, -"Numerical Linear Algebra", -SIAM, Philadelphia, (1997) +\bibitem{Trefethen97} L. Trefethen and D.
Bau, "Numerical Linear Algebra", SIAM, Philadelphia, (1997) \bibitem{Higham2020_Chol} Nick Higham, "What Is a Cholesky Factorization?", \url{https://nhigham.com/2020/08/11/what-is-a-cholesky-factorization/}
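Finally, the identity $\kappa = \sigma_1 / \sigma_N$ derived in the condition-number subsection can be checked numerically for the 2x2 case. A plain-Python sketch (arbitrary example matrix; in a Matlab or NumPy setting, cond(A) or numpy.linalg.cond(A) computes the same quantity in one call):

```python
import math

# Arbitrary symmetric positive definite example matrix.
A = [[3.0, 1.0],
     [1.0, 2.0]]

def sing_vals(M):
    # Singular values = square roots of the eigenvalues of M^T M,
    # using the closed form for a symmetric 2x2 matrix.
    p = M[0][0]**2 + M[1][0]**2
    q = M[0][0]*M[0][1] + M[1][0]*M[1][1]
    r = M[0][1]**2 + M[1][1]**2
    h = math.sqrt((p - r)**2 + 4*q*q)
    return math.sqrt((p + r + h) / 2), math.sqrt((p + r - h) / 2)

s1, s2 = sing_vals(A)

# Inverse of a 2x2 matrix.
d = A[0][0]*A[1][1] - A[0][1]*A[1][0]
Ainv = [[A[1][1]/d, -A[0][1]/d], [-A[1][0]/d, A[0][0]/d]]
t1, _ = sing_vals(Ainv)   # ||A^{-1}|| is the largest singular value of A^{-1}

kappa_svd = s1 / s2       # sigma_1 / sigma_N
kappa_norms = s1 * t1     # ||A|| * ||A^{-1}||
print(kappa_svd, kappa_norms)  # the two expressions agree
```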