Diagonalizing matrices¶
As was mentioned above, diagonal matrices are particularly easy to compute with. This raises the question if (and how) it is possible to “bring” a given matrix \(A\) into such an easy form.
Definition 6.14
A square matrix \(A\) is called diagonalizable if there is an invertible matrix \(P \in {\mathrm {Mat}}_{n \times n}\) such that
where \(D=\left ( \begin{array}{ccc} d_{11} & {} & 0 \\ {} & \ddots & {} \\ 0 & {} & d_{nn} \end{array} \right )\) is a diagonal matrix.
An example showing the relevance of this notion is this: in the context of differential equations, one needs to compute the exponential of a square matrix \(A\), which is defined as
Here \(A^3 = A \cdot A \cdot A\) etc. Instead of computing all these powers of \(A\) one after another, one can use the above definition: if \(A\) is diagonalizable, i.e., \(PAP^{-1} = D\), then \(A = (P^{-1}P)A(P^{-1}P) = P^{-1}(PAP^{-1})P = P^{-1}DP\). Then,
etc. Computing the powers of \(D\), as opposed to those of \(A\) is easy: one just needs to raise the diagonal entries to the corresponding power.
Method 6.15 (Related exercises: Exercise 6.16, Exercise 6.7, Exercise 6.1, Exercise 6.3, Exercise 6.4, Exercise 6.6, Exercise 6.8)
In order to diagonalize a square matrix \(A \in {\mathrm {Mat}}_{n \times n}\), i.e., to determine whether \(P\) above exists, and to compute \(D\), one proceeds as follows:
-
Compute the eigenvalues of \(A\), for example by finding the zeros of the characteristic polynomial. Denote them by \(\lambda_1, \dots, \lambda_k\). Denote the eigenspaces by \(E_{\lambda_k}\).
-
The matrix \(A\) is diagonalizable precisely if
i.e., if the dimensions of the eigenspaces sum up to the size of the matrix \(A\).
- In this event, one may choose \(P\) to be the \(n \times n\)-matrix whose columns are the basis vectors of all the eigenspaces (for the various eigenvalues \(\lambda_1, \dots, \lambda_k\)). The matrix \(D\) is the diagonal matrix whose diagonal entries are
One can show that above, one always has \(k \le n\). One does this by proving that the sum of the subspaces \(E_{\lambda_1}, \dots, E_{\lambda_k}\) is a direct sum, so that
This implies the following:
Corollary 6.16 (Related exercises: Exercise 6.3, Exercise 6.6)
If an \(n \times n\)-matrix has \(n\) distinct eigenvalues, then it is diagonalizable.
Definition 6.17 (Related exercises: Exercise 6.3)
Let \(A \in {\mathrm {Mat}}_{n \times n}\) be given. A basis \(v_1, \dots, v_n\) of \({\bf R}^n\) is called an eigenbasis for \(A\) if each \(v_i\) is an eigenvector (for a certain eigenvalue) of \(A\).
Lemma 6.18 (Related exercises: Exercise 6.1)
For \(A \in {\mathrm {Mat}}_{n \times n}\) the following two statements are equivalent:
-
\(A\) is diagonalizable.
-
\(A\) admits an eigenbasis, i.e., there is an eigenbasis (of \({\bf R}^n\)) for \(A\).
One proves this by observing that if \(PAP^{-1}\) is a diagonal matrix, then \(P\) is a base-change matrix between the standard basis and an eigenbasis.
Example 6.19
The matrix \(A = \left ( \begin{array}{cc} 0 & -1 \\ -2 & 0 \end{array} \right )\) in Example 6.13 has two distinct eigenvalues, and is therefore diagonalizable (Corollary 6.16). An eigenbasis for \(A\) is
Example 6.20
We consider the shearing matrix
Its characteristic polynomial is \(\chi_A(t) = (1-t)^2\), whose only zero is \(t = 1\). Thus, \(A\) has this eigenvalue only: \(\lambda = 1\). We compute the eigenspace: consider the matrix \(B := A - \lambda {\mathrm {id}} = \left ( \begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right )\). Writing, as usual, \(v = \left ( \begin{array}{c} x \\ y \end{array} \right ) \in {\bf R}^2\), the space of solutions of the homogeneous system
is our eigenspace, namely
This space is 1-dimensional, and has a basis consisting of the (single) vector \((1,0)\). Thus, \(A\) is not diagonalizable.
Example 6.21
We continue the discussion of the rotation matrix \(A = \left ( \begin{array}{cc} 0 & -1 \\ 1 & 0 \end{array} \right )\). Its (complex) eigenvalues are \(\lambda_1 = i\), \(\lambda_2 = -i\). According to Corollary 6.16, \(A\) is diagonalizable. We compute the eigenspaces, where we regard \(A\) as a complex matrix:
If \(v = \left ( \begin{array}{c} z_1 \\ z_2 \end{array} \right )\), then
This means \(z_1 = i z_2\) from the second equation; the first is then also satisfied since \(-iz_1 - z_2 = -i(iz_2) - z_2=z_2-z_2 = 0\). Thus
i.e., as a complex vector space, \(E_i\) is 1-dimensional and a basis of it is the vector \((i,1)\).
Similarly,
Computing this leads to the linear system
This gives \(z_1 = -iz_2\), so that \(E_{-i} = \{(-iz, z) \ | \ z \in {{\bf C}}\}\), and a basis of it is the (single) vector \((-i, 1)\). Thus, the matrix \(P\) above is
Example 6.22
For \(A = \left ( \begin{array}{cc} 1 & 0 \\ 0 & 0 \end{array} \right )\), \(\lambda = 0\) is an eigenvalue (and \(\left ( \begin{array}{c} 0 \\ 1 \end{array} \right )\) an 0-eigenvector) and \(\lambda = 1\) is another eigenvalue (and \(\left ( \begin{array}{c} 1 \\ 0 \end{array} \right )\) an 1-eigenvector).
Definition 6.23
We say that two square matrices \(A, B \in {\mathrm {Mat}}_{n \times n}\) are similar if there is an invertible matrix \(P\) such that \(PAP^{-1} = B\).
The question whether a given matrix \(A\) is diagonalizable is a special case of the following more general question: given two square matrices \(A, B\), are they similar in the sense of the above definition?
Proposition 6.24 (Related exercises: Exercise 6.12, Exercise 6.8)
If \(A\) and \(B\) are similar, then the following holds:
-
\(\det A = \det B\),
-
\(\mathrm{tr} A = \mathrm{tr} B\) (the traces, see Exercise 4.24),
-
\(\chi_A(t) = \chi_B(t)\),
-
\(A\) and \(B\) have the same eigenvalues. The eigenspaces are related like so: if \(B=PAP^{-1}\), then
Proof. Let \(B=PAP^{-1}\) with \(P\) invertible. For the determinant, using multiplicativity of the determinant (Proposition 5.18):
For the trace, we use the statement proved in Exercise 4.24 (marked * below):
For the characteristic polynomial the argument is similar as for the determinant:
Hence \(A\) and \(B\) have the same characteristic polynomial, and therefore the same eigenvalues.
Finally, let \(\lambda\in{\bf R}\). If \(v\in E_\lambda(A)\), then \(Av=\lambda v\), so
hence \(Pv\in E_\lambda(B)\). Thus \(P(E_\lambda(A))\subseteq E_\lambda(B)\). Conversely, if \(w\in E_\lambda(B)\), then \(Bw=\lambda w\), and multiplying by \(P^{-1}\) gives
so \(P^{-1}w\in E_\lambda(A)\), i.e. \(w\in P(E_\lambda(A))\). Therefore \(E_\lambda(B)=P(E_\lambda(A))\). ◻
Conversely, it is not true that if the above conditions hold, then \(A\) and \(B\) are similar: The matrices
both have 0 as their only eigenvalue. The eigenspaces \(E_0(A)\) and \(E_0(B)\) are both 2-dimensional, but \(A\) and \(B\) are not similar, as one can see by observing that \(A^2 \ne 0\) while \(B^2 = 0\). And if \(B = PAP^{-1}\), then \(B^2 = PAP^{-1}PAP^{-1} = PA^2P^{-1}\) and likewise \(B^3 = PA^3P^{-1}\), giving a contradiction. A sufficient condition for similarity can be expressed in terms of the so-called Jordan normal form, which we will not discuss in this course.