Skip to content

Outlook: current research

Since matrix multiplication is such a key asset, it is of great interest to perform this process as efficiently as possible. Given two \(2 \times 2\)-matrices \(A\) and \(B\), the computation of \(AB\) by just following the definition takes 8 multiplications, namely

\[ a_{ie} b_{ej} \]

for each of the indices \(i, j, e\) being either 1 or 2. In the 1960’s an algorithm (https://en.wikipedia.org/wiki/Strassen_algorithm) was found that only requires 7 multiplications. By applying that algorithm iteratively for larger matrices, this gives a decidedly better algorithm. Current research is using methods of artificial intelligence to try and come up with similar methods for \(3 \times 3\)- and other matrices. Check out this interesting lay-accessible article on recent trends: https://www.quantamagazine.org/ai-reveals-new-possibilities-in-matrix-multiplication-20221123/!

Kernel and image of a linear map

The kernel and the image of a linear map are an important measure how, roughly speaking, interesting this map is. E.g., the zero map \({\bf R}^2 \to {\bf R}^2\), \(\left ( \begin{array}{c} x \\ y \end{array} \right ) \mapsto \left ( \begin{array}{c} 0 \\ 0 \end{array} \right )\) is certainly very boring in the sense that it only produces the zero vector in \({\bf R}^2\). By contrast, say, a rotation (by a fixed angle \(r\)) in \({\bf R}^2\) is more interesting, since any point in \({\bf R}^2\) can be obtained from another point by rotating by that angle \(r\).

In order to introduce kernel and image, we need the following general notions related to maps between sets.

Definition 4.20

Let \(f : X \to Y\) be a function between two sets.

  • The preimage of some element \(y \in Y\) is
\[ f^{-1}(y) := \{ x \in X \ | \ f(x) = y\} \ (\subset X). \]
  • The image of \(f\) is defined as
\[ {\operatorname{im}\ } (f) := f(X) := \{ f(x) \ | \ x \in X \} \ (\subset Y). \]
  • \(f\) is called injective (or one-to-one) if for each \(y\), the preimage \(f^{-1}(y)\) contains at most one element.

  • \(f\) is called surjective (or onto) if for each \(y\), \(f^{-1}(y)\) contains at least one element. Equivalently, \(f\) is surjective if \({\operatorname{im}\ } (f) = Y\).

  • \(f\) is called bijective if it is both injective and surjective. In other words, if for each \(y \in Y\), \(f^{-1}(y)\) contains exactly one element.

Example 4.21

While in the applications below, we will often consider \(X\) and \(Y\) to be vector spaces, Definition 4.20 applies to maps between arbitrary sets. For example, consider a group of \(n\) people \(\{P_1, \dots, P_n \}\). Consider the function

\[ m : \{P_1, \dots, P_n \} \to \{1, 2, \dots, 12\} \]

that assigns to each person their month of birth. This function is surjective if for each month, one of the persons is born in that month. It is injective, if in each month only one birthday party is happening. It is bijective if both conditions are true, i.e., in every month there is exactly one birthday party (for one of the persons).

In the example above, the map \(m\) can only be bijective if \(n = 12\), i.e., if the size of the two sets is the same. For linear maps (between vector spaces) we want to articulate a similar idea, but simply saying that the size of the vector spaces are the same is insufficient, since \({\bf R}\), \({\bf R}^2\), \({\bf R}^3\) etc. all have infinitely many elements. Rather, we will see in Corollary 4.28 that the dimension of a vector space is the correct notion of size.

Definition 4.22 (Related exercises: Exercise 4.8, Exercise 4.9, Exercise 4.3, Exercise 4.9, Exercise 4.22)

Let \(f : V \to W\) be a linear map. The kernel of \(f\) is defined as

\[ \begin{align*} \ker (f) & := f^{-1}(0_W) \\ & = \{ v\in V \ | \ f(v) = 0_W \}. \end{align*} \]

Note that \(\ker (f) \subset V\) and \({\operatorname{im}\ } (f) \subset W\). In fact, these are not just arbitrary subsets:

Proposition 4.23 (Related exercises: Exercise 4.11, Exercise 4.12)

For a linear map \(f: V \to W\), \(\ker f\) is a subspace of \(V\), while \({\operatorname{im}\ } f\) is a subspace of \(W\).

Proof. We only check the conditions in Definition 3.17 for the kernel. (The case of the image is similar.)

  • \(0_V \in \ker f\): this means that \(f(0_V) = 0_W\), which holds by Remark 4.4.

  • For \(v, v' \in \ker f\) we check \(v+v' \in \ker f\): this means \(f(v+v') = 0_W\). Indeed, using that \(f\) is linear we have

\[ f(v+v') = f(v) + f(v') = 0_W + 0_W = 0_W. \]
  • For \(v \in \ker f\) and \(a \in {\bf R}\), we check \(av \in \ker f\): as before, using the linearity of \(f\), we have
\[ f(av) = af(v) = a\cdot 0_W = 0_W. \]

Example 4.24

We consider the matrix

\[ A = \left ( \begin{array}{ccc} 1 & 2 & 0 \\ 2 & 4 & 0 \end{array} \right ) \]

and the associated linear map

\[ f : {\bf R}^3 \to {\bf R}^2, v = \left ( \begin{array}{c} x \\ y \\ z \end{array} \right ) \mapsto Av = \left ( \begin{array}{c} x+2y \\ 2x+4y \end{array} \right ). \]

The kernel of \(f\) consists of vectors \(v\) such that

\[ \begin{align*} x+2y & = 0 \\ 2x+4y & =0. \end{align*} \]

This tells us that the kernel of \(f\), or equivalently the solutions of this system (in the unknowns \(x, y\) and \(z\)!), is

\[ \ker f = \left \{\left ( \begin{array}{c} -2y \\ y \\ z \end{array} \right ) \in {\bf R}^3 \ | \ y, z \in {\bf R} \right \}. \]

A basis of \(\ker f\) is given by the two vectors \(\left ( \begin{array}{c} -2 \\ 1 \\ 0 \end{array} \right )\) and \(\left ( \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right )\).

The image of \(f\) consists of all vectors of the form

\[ v = \left ( \begin{array}{c} v_1 \\ v_2 \end{array} \right ) = \left ( \begin{array}{c} x+2y \\ 2x+4y \end{array} \right ), \]

with arbitrary \(x, y \in {\bf R}\). (Also, the \(z\) is arbitrary, but it does not show up in \(f\).) This means that \(v_2 = 2 v_1\), and \(v_1\) is an arbitrary real number. Thus

\[ {\operatorname{im}\ } f = \left \{ \left ( \begin{array}{c} v_1 \\ 2v_1 \end{array} \right ) \ | \ v_1 \in {\bf R} \right \} \subset {\bf R}^2. \]

A basis of \({\operatorname{im}\ } f\) is thus given by the vector \(\left ( \begin{array}{c} 1 \\ 2 \end{array} \right )\).

Our goal below is to develop an algorithmic method that determine bases of \(\ker f\), \({\operatorname{im}\ } f\). For now, just observe that in the example above

\[ \dim (\ker f) + \dim ({\operatorname{im}\ } f) = 2 + 1 = 3 = \dim {\bf R}^3. \]

This is an example of the rank-nullity theorem (Theorem 4.26) below.

Injectivity of linear maps can be measured in terms of the kernel:

Lemma 4.25 (Related exercises: Exercise 4.17)

Let \(f : V \to W\) be a linear map. Then the following are equivalent (i.e., one condition holds if and only if the other holds):

  1. \(f\) is injective,

  2. \(\ker f = \{0_V\}\).

Proof. Suppose \(f\) is injective, we prove \(\ker f = \{0\}\). Since \(f(0) = 0\) by linearity (Remark 4.4), we have \(0 \in \ker f\). If \(v \in \ker f\), then \(f(v) = 0_W\), so both \(v\) and \(0_V\) are in the preimage of \(0_W\). By the injectivity of \(f\), this forces \(v = 0\).

Conversely, suppose \(\ker f = 0\). Suppose two vectors \(v, v' \in V\) are in the preimage of some \(w \in W\), i.e., \(f(v) = f(v') = w\). Then, by linearity of \(f\)

\[ f(v-v') = f(v+(-1)v') = f(v) + (-1) f(v') = f(v) - f(v') = 0. \]

Thus, \(v-v' \in \ker f\), which means by assumption that \(v-v'=0\). That is: \(v = v'\). Therefore \(f\) is injective. ◻

Theorem 4.26 (Related exercises: Exercise 4.44, Exercise 4.10, Exercise 4.13, Exercise 4.16, Exercise 4.17, Exercise 4.3)

(Rank-nullity theorem) Let \(f: V \to W\) be a map between (finite-dimensional) vector spaces. Then

\[ \dim (\ker f) + \dim ({\operatorname{im}\ } f) = \dim V. \]

The rank of \(f\) is defined to be

\[ {\operatorname {rk}} f := \dim ({\operatorname{im}\ } f), \]

while the nullity of \(f\) is defined to be \(\dim (\ker f)\).

A proof of this theorem appears in any linear algebra textbook, e.g. (Nicholson 1995, Theorem 7.2.4). As a remark on the proof, we note that one can prove the following fact, which is very useful in its own right.

Theorem 4.27

Let \(f: V \to W\) be a linear map. Let

\[ v_1, \dots, v_r, v_{r+1}, \dots v_n \]

be a basis of \(V\) such that

\[ v_1, \dots, v_r \]

is a basis of \(\ker f\). Then \(f(v_{r+1}), \dots, f(v_n)\) is a basis of \({\operatorname{im}\ } f\).

Nicholson, W. K. 1995. *Linear Algebra with Applications*. Mathematics Series. PWS Publishing Company. .