The mathematics of vectors, matrices, and linear transformations. Linear algebra is the backbone of modern computation, powering everything from computer graphics and machine learning to quantum mechanics and economic modeling.
Linear algebra is one of the most widely applicable branches of mathematics. While algebra deals with single unknown quantities, linear algebra deals with collections of unknowns organized into vectors and matrices, and it studies the linear relationships between them.
The word "linear" refers to straight lines and flat planes — the simplest geometric objects. A linear equation in variables x₁, x₂, …, xₙ has the form:
a₁x₁ + a₂x₂ + … + aₙxₙ = b
where a₁, a₂, …, aₙ and b are constants. There are no squares, cubes, products of variables, or other nonlinear terms — just variables multiplied by constants and added together.
Linear algebra asks — and answers — fundamental questions such as: When does a system of linear equations have a solution, and when is that solution unique? How do linear transformations act on space? Which directions does a transformation simply stretch or shrink?
The subject was developed over centuries, with key contributions from mathematicians including Carl Friedrich Gauss (elimination methods), Arthur Cayley (matrix theory), Hermann Grassmann (vector spaces), and David Hilbert (abstract spaces). Today, linear algebra is typically the first mathematics course that moves from calculation to abstraction and proof.
A vector is a mathematical object that has both magnitude (length) and direction. Geometrically, a vector is an arrow pointing from one location to another. Algebraically, a vector is an ordered list of numbers called components.
A vector in ℝⁿ (n-dimensional real space) is written as:
v = (v₁, v₂, …, vₙ)
For example, the vector v = (3, 4) in ℝ² represents a point or an arrow from the origin to the point (3, 4) in the plane.
Common notations for vectors include boldface (v), an arrow above the letter (v⃗), or an underline. We'll use boldface throughout this lesson.
Vectors of the same dimension are added component by component:
u + v = (u₁ + v₁, u₂ + v₂, …, uₙ + vₙ)
(2, 5, -1) + (3, -2, 4) = (2+3, 5+(-2), -1+4) = (5, 3, 3)
Geometrically, adding two vectors corresponds to placing them tip-to-tail and drawing the arrow from the start of the first to the end of the second. Equivalently, place both tails at the same point and take the diagonal of the parallelogram they form (the parallelogram rule).
Multiplying a vector by a scalar (a real number) scales every component:
c · v = (cv₁, cv₂, …, cvₙ)
3 · (2, -1, 4) = (3·2, 3·(-1), 3·4) = (6, -3, 12)
If c > 1, the vector stretches. If 0 < c < 1, it shrinks. If c < 0, it reverses direction and scales by |c|.
The magnitude (or norm) of a vector v = (v₁, v₂, …, vₙ) is:
‖v‖ = √(v₁² + v₂² + … + vₙ²)
‖(3, 4)‖ = √(3² + 4²) = √(9 + 16) = √25 = 5
A unit vector has magnitude 1. To find the unit vector in the direction of v, divide by its magnitude:
û = (3, 4) / 5 = (3/5, 4/5) = (0.6, 0.8)
Check: ‖(0.6, 0.8)‖ = √(0.36 + 0.64) = √1 = 1 ✓
The standard unit vectors in ℝ³ are i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1).
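These basic vector operations can be checked numerically. Here is a minimal sketch using NumPy (assuming NumPy is available; it is not part of the lesson itself):

```python
import numpy as np

# component-wise addition: (2, 5, -1) + (3, -2, 4) = (5, 3, 3)
w = np.array([2.0, 5.0, -1.0]) + np.array([3.0, -2.0, 4.0])

# scalar multiplication: 3 * (2, -1, 4) = (6, -3, 12)
s = 3 * np.array([2.0, -1.0, 4.0])

# magnitude of (3, 4) is 5, and dividing by it gives the unit vector (0.6, 0.8)
v = np.array([3.0, 4.0])
norm_v = np.linalg.norm(v)
u_hat = v / norm_v
```

Running this reproduces each worked example above, and `np.linalg.norm(u_hat)` confirms the unit vector has length 1.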
The dot product of two vectors is a scalar quantity defined as:
u · v = u₁v₁ + u₂v₂ + … + uₙvₙ
The dot product can also be expressed geometrically:
u · v = ‖u‖ ‖v‖ cos θ
where θ is the angle between the two vectors.
For example, with u = (1, 2, 3) and v = (4, -5, 6):
u · v = (1)(4) + (2)(-5) + (3)(6) = 4 - 10 + 18 = 12
Rearranging the geometric dot product formula gives:
cos θ = (u · v) / (‖u‖ ‖v‖)
For example, find the angle between u = (1, 0) and v = (1, 1):
u · v = (1)(1) + (0)(1) = 1
‖u‖ = 1, ‖v‖ = √2
cos θ = 1 / (1 · √2) = 1/√2
θ = arccos(1/√2) = 45° (π/4 radians)
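The same angle computation can be sketched in NumPy:

```python
import numpy as np

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])

dot = np.dot(u, v)                                        # u · v = 1
cos_theta = dot / (np.linalg.norm(u) * np.linalg.norm(v)) # 1 / sqrt(2)
theta_deg = np.degrees(np.arccos(cos_theta))              # angle in degrees
```

`theta_deg` comes out to 45, matching the hand computation.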
The cross product is defined only in ℝ³ and produces a new vector that is perpendicular to both input vectors:
u × v = (u₂v₃ − u₃v₂, u₃v₁ − u₁v₃, u₁v₂ − u₂v₁)
For example, let u = (1, 2, 3) and v = (4, 5, 6).
First component: (2)(6) - (3)(5) = 12 - 15 = -3
Second component: (3)(4) - (1)(6) = 12 - 6 = 6
Third component: (1)(5) - (2)(4) = 5 - 8 = -3
u × v = (-3, 6, -3)
Verify orthogonality:
(-3, 6, -3) · (1, 2, 3) = -3 + 12 - 9 = 0 ✓
(-3, 6, -3) · (4, 5, 6) = -12 + 30 - 18 = 0 ✓
The magnitude of the cross product equals the area of the parallelogram formed by the two vectors:
‖u × v‖ = ‖u‖ ‖v‖ sin θ
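NumPy's built-in cross product reproduces the worked example, including the orthogonality check:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

w = np.cross(u, v)          # (-3, 6, -3)
area = np.linalg.norm(w)    # area of the parallelogram spanned by u and v
```

Both `np.dot(w, u)` and `np.dot(w, v)` are zero, confirming w is perpendicular to both inputs.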
A matrix is a rectangular array of numbers arranged in rows and columns. An m × n matrix has m rows and n columns. Matrices are the central data structure of linear algebra and provide a compact way to represent systems of equations, transformations, and data.
A general m × n matrix A is written as A = [aᵢⱼ], where aᵢⱼ denotes the entry in row i and column j (1 ≤ i ≤ m, 1 ≤ j ≤ n).
Matrices of the same dimensions are added entry by entry, and scalar multiplication scales every entry:
(A + B)ᵢⱼ = aᵢⱼ + bᵢⱼ and (cA)ᵢⱼ = c aᵢⱼ
The product AB is defined when the number of columns in A equals the number of rows in B. If A is m × n and B is n × p, then AB is m × p:
(AB)ᵢⱼ = aᵢ₁b₁ⱼ + aᵢ₂b₂ⱼ + … + aᵢₙbₙⱼ
Each entry of AB is the dot product of the corresponding row of A with the corresponding column of B.
Let A = [1 2; 3 4] and B = [5 6; 7 8]. Compute AB:
(AB)₁₁ = (1)(5) + (2)(7) = 5 + 14 = 19
(AB)₁₂ = (1)(6) + (2)(8) = 6 + 16 = 22
(AB)₂₁ = (3)(5) + (4)(7) = 15 + 28 = 43
(AB)₂₂ = (3)(6) + (4)(8) = 18 + 32 = 50
AB = [19 22; 43 50]
The transpose of an m × n matrix A, written Aᵀ, is the n × m matrix obtained by swapping rows and columns:
A = [1 2 3]
[4 5 6]
Aᵀ = [1 4]
[2 5]
[3 6]
Key transpose properties:
(Aᵀ)ᵀ = A
(A + B)ᵀ = Aᵀ + Bᵀ
(cA)ᵀ = c Aᵀ
(AB)ᵀ = Bᵀ Aᵀ (note the reversed order)
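The multiplication example and the transpose properties can both be verified numerically. A minimal NumPy sketch:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

AB = A @ B          # matrix product: [[19, 22], [43, 50]]

C = np.array([[1, 2, 3], [4, 5, 6]])
Ct = C.T            # transpose of a 2x3 matrix is 3x2

# transpose of a product reverses the order: (AB)^T = B^T A^T
product_rule_holds = np.array_equal((A @ B).T, B.T @ A.T)
```

`product_rule_holds` is `True`, illustrating why the order reverses: row i of AB comes from row i of A, which becomes column i after transposing.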
A system of linear equations is a collection of one or more linear equations involving the same set of variables. Linear algebra provides systematic methods for solving such systems efficiently, even when they involve thousands of variables.
Any system of linear equations can be written in matrix form as:
Ax = b
where A is the coefficient matrix, x is the vector of unknowns, and b is the vector of constants.
The system:
2x + 3y = 7
x − y = 1
In matrix form:
[2 3; 1 -1] [x; y] = [7; 1]
To solve a system, we form the augmented matrix [A | b] by appending the constants column to the coefficient matrix:
Gaussian elimination is the fundamental algorithm for solving systems of linear equations. It uses three elementary row operations to systematically reduce the augmented matrix:
1. Swap two rows.
2. Multiply a row by a nonzero constant.
3. Add a multiple of one row to another row.
System:
x + y + z = 6
2x + 3y + z = 14
x + 2y + 3z = 16
Step 1: Form augmented matrix:
[1 1 1 | 6]
[2 3 1 | 14]
[1 2 3 | 16]
Step 2: R₂ → R₂ - 2R₁:
[1 1 1 | 6]
[0 1 -1 | 2]
[1 2 3 | 16]
Step 3: R₃ → R₃ - R₁:
[1 1 1 | 6]
[0 1 -1 | 2]
[0 1 2 | 10]
Step 4: R₃ → R₃ - R₂:
[1 1 1 | 6]
[0 1 -1 | 2]
[0 0 3 | 8]
Step 5: Back-substitute:
From R₃: 3z = 8 → z = 8/3
From R₂: y - z = 2 → y = 2 + 8/3 = 14/3
From R₁: x + y + z = 6 → x = 6 - 14/3 - 8/3 = 6 - 22/3 = -4/3
Solution: x = -4/3, y = 14/3, z = 8/3
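NumPy's solver, which performs a form of Gaussian elimination (LU decomposition) internally, confirms the hand computation:

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [2.0, 3.0, 1.0],
              [1.0, 2.0, 3.0]])
b = np.array([6.0, 14.0, 16.0])

x = np.linalg.solve(A, b)   # should match (-4/3, 14/3, 8/3)
```

As a sanity check, `A @ x` reproduces b.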
A matrix is in row echelon form if:
1. All rows consisting entirely of zeros are at the bottom.
2. The leading (first nonzero) entry of each row is to the right of the leading entry of the row above it.
3. All entries below a leading entry are zero.
A matrix is in reduced row echelon form if it satisfies all REF conditions plus:
1. Every leading entry is 1.
2. Each leading 1 is the only nonzero entry in its column.
The matrix:
[1 0 0 | 2]
[0 1 0 | 3]
[0 0 1 | 5]
is in RREF. The solution is immediately readable: x = 2, y = 3, z = 5.
A system of linear equations has exactly one of three possibilities: a unique solution, no solution (the system is inconsistent), or infinitely many solutions (at least one free variable).
The determinant is a scalar value associated with every square matrix. It encodes important information about the matrix: whether it's invertible, how it scales area/volume, and more. The determinant of matrix A is written det(A) or |A|.
For a 2 × 2 matrix:
det [a b; c d] = ad - bc
For example:
det [3 7; 1 2] = (3)(2) - (7)(1) = 6 - 7 = -1
For a 3 × 3 matrix, we expand along the first row using cofactor expansion:
det(A) = a₁₁(a₂₂a₃₃ − a₂₃a₃₂) − a₁₂(a₂₁a₃₃ − a₂₃a₃₁) + a₁₃(a₂₁a₃₂ − a₂₂a₃₁)
Find det(A) where:
A = [2 1 -1]
[3 0 2]
[1 4 -3]
Expanding along the first row:
= 2[(0)(-3) - (2)(4)] - 1[(3)(-3) - (2)(1)] + (-1)[(3)(4) - (0)(1)]
= 2[0 - 8] - 1[-9 - 2] + (-1)[12 - 0]
= 2(-8) - 1(-11) + (-1)(12)
= -16 + 11 - 12
= -17
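The cofactor expansion above can be cross-checked with NumPy's determinant routine:

```python
import numpy as np

A = np.array([[2.0, 1.0, -1.0],
              [3.0, 0.0,  2.0],
              [1.0, 4.0, -3.0]])

d = np.linalg.det(A)   # should be -17 up to floating-point error
```

`np.linalg.det` uses LU factorization rather than cofactor expansion, which is far cheaper for large matrices (O(n³) versus O(n!)).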
A square matrix A is invertible (nonsingular) if and only if det(A) ≠ 0. If det(A) = 0, the matrix is singular — it squashes space into a lower dimension, losing information irreversibly.
An n × n matrix A is invertible if there exists a matrix A⁻¹ such that:
AA⁻¹ = A⁻¹A = I
where I is the n × n identity matrix. The inverse "undoes" the action of A.
If A = [a b; c d] and det(A) = ad - bc ≠ 0, then:
A⁻¹ = (1/(ad − bc)) [d -b; -c a]
A = [4 7]
[2 6]
det(A) = (4)(6) - (7)(2) = 24 - 14 = 10
A⁻¹ = (1/10) [6 -7; -2 4] = [0.6 -0.7; -0.2 0.4]
Verify:
AA⁻¹ = [4 7; 2 6] [0.6 -0.7; -0.2 0.4]
= [4(0.6)+7(-0.2) 4(-0.7)+7(0.4); 2(0.6)+6(-0.2) 2(-0.7)+6(0.4)]
= [2.4-1.4 -2.8+2.8; 1.2-1.2 -1.4+2.4]
= [1 0; 0 1] ✓
For larger matrices, augment A with the identity matrix and row reduce to RREF:
If the left side reduces to I, the right side is A⁻¹. If the left side cannot be reduced to I (you get a row of zeros), then A is not invertible.
Find the inverse of:
A = [1 0 1]
[0 1 1]
[1 1 0]
Form [A | I]:
[1 0 1 | 1 0 0]
[0 1 1 | 0 1 0]
[1 1 0 | 0 0 1]
R₃ → R₃ - R₁:
[1 0 1 | 1 0 0]
[0 1 1 | 0 1 0]
[0 1 -1 | -1 0 1]
R₃ → R₃ - R₂:
[1 0 1 | 1 0 0]
[0 1 1 | 0 1 0]
[0 0 -2 | -1 -1 1]
R₃ → R₃ / (-2):
[1 0 1 | 1 0 0]
[0 1 1 | 0 1 0]
[0 0 1 | 1/2 1/2 -1/2]
R₁ → R₁ - R₃ and R₂ → R₂ - R₃:
[1 0 0 | 1/2 -1/2 1/2]
[0 1 0 | -1/2 1/2 1/2]
[0 0 1 | 1/2 1/2 -1/2]
A⁻¹ = (1/2) [ 1 -1 1]
[-1 1 1]
[ 1 1 -1]
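The row-reduction result can be verified against NumPy's inverse routine:

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])

A_inv = np.linalg.inv(A)

# the inverse found by hand via [A | I] row reduction
expected = 0.5 * np.array([[ 1.0, -1.0,  1.0],
                           [-1.0,  1.0,  1.0],
                           [ 1.0,  1.0, -1.0]])
```

`A @ A_inv` gives the 3 × 3 identity, and `A_inv` matches the hand-computed matrix.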
Cramer's Rule provides an explicit formula for the solution of a system Ax = b when A is invertible. For each variable xᵢ:
xᵢ = det(Aᵢ) / det(A)
where Aᵢ is the matrix formed by replacing the i-th column of A with b.
System:
3x + 2y = 16
x + 5y = 18
A = [3 2; 1 5], b = [16; 18]
det(A) = (3)(5) - (2)(1) = 15 - 2 = 13
A₁ (replace column 1 with b):
det(A₁) = det [16 2; 18 5] = (16)(5) - (2)(18) = 80 - 36 = 44
A₂ (replace column 2 with b):
det(A₂) = det [3 16; 1 18] = (3)(18) - (16)(1) = 54 - 16 = 38
x = 44/13, y = 38/13
Solution: x = 44/13 ≈ 3.385, y = 38/13 ≈ 2.923
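Cramer's Rule translates directly into a short function. This is a sketch for small systems; for anything large, elimination-based solvers are far more efficient and numerically stable:

```python
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b by Cramer's Rule (requires det(A) != 0)."""
    d = np.linalg.det(A)
    n = len(b)
    x = np.empty(n)
    for i in range(n):
        Ai = A.copy()
        Ai[:, i] = b                      # replace column i with b
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[3.0, 2.0], [1.0, 5.0]])
b = np.array([16.0, 18.0])
x = cramer_solve(A, b)                    # should be (44/13, 38/13)
```

Each call to `np.linalg.det` costs O(n³), so Cramer's Rule is O(n⁴) overall, compared with O(n³) for Gaussian elimination.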
A vector space (over ℝ) is a set V together with two operations — addition and scalar multiplication — that satisfy ten axioms. Vector spaces abstract the familiar properties of arrows in the plane to much more general settings.
A set V with operations + (addition) and · (scalar multiplication) is a vector space if for all u, v, w ∈ V and scalars c, d ∈ ℝ:
1. u + v ∈ V (closure under addition)
2. u + v = v + u (commutativity of addition)
3. (u + v) + w = u + (v + w) (associativity of addition)
4. There is a zero vector 0 with v + 0 = v
5. Every v has an additive inverse −v with v + (−v) = 0
6. c · v ∈ V (closure under scalar multiplication)
7. c · (u + v) = c · u + c · v
8. (c + d) · v = c · v + d · v
9. (cd) · v = c · (d · v)
10. 1 · v = v
Common examples of vector spaces include ℝⁿ, the set of all m × n matrices, the set of polynomials of degree at most n, and the set of continuous functions on an interval.
A subspace of a vector space V is a nonempty subset W ⊆ V that is itself a vector space under the same operations. To verify W is a subspace, check three conditions:
1. The zero vector is in W.
2. W is closed under addition.
3. W is closed under scalar multiplication.
Is W = {(x, y, z) ∈ ℝ³ : x + y + z = 0} a subspace of ℝ³?
Zero vector: (0, 0, 0) → 0 + 0 + 0 = 0 ✓
Addition: If (x₁, y₁, z₁) and (x₂, y₂, z₂) are in W, then:
(x₁+x₂) + (y₁+y₂) + (z₁+z₂) = (x₁+y₁+z₁) + (x₂+y₂+z₂) = 0 + 0 = 0 ✓
Scalar multiplication: If (x, y, z) ∈ W and c ∈ ℝ, then:
cx + cy + cz = c(x + y + z) = c(0) = 0 ✓
W is a subspace of ℝ³.
A linear combination of vectors v₁, v₂, …, vₖ is any sum of the form:
c₁v₁ + c₂v₂ + … + cₖvₖ
where c₁, c₂, …, cₖ are scalars.
The span of a set of vectors is the set of all possible linear combinations of those vectors. It is the "smallest" subspace containing all the given vectors:
span{v₁, …, vₖ} = {c₁v₁ + … + cₖvₖ : c₁, …, cₖ ∈ ℝ}
Let v₁ = (1, 0) and v₂ = (0, 1).
span{v₁, v₂} = {c₁(1,0) + c₂(0,1)} = {(c₁, c₂) : c₁, c₂ ∈ ℝ} = ℝ²
These two vectors span all of ℝ² because any point (a, b) can be written as a(1,0) + b(0,1).
A set of vectors {v₁, v₂, …, vₖ} is linearly independent if the only solution to:
c₁v₁ + c₂v₂ + … + cₖvₖ = 0
is c₁ = c₂ = … = cₖ = 0. Otherwise, the set is linearly dependent, meaning at least one vector can be expressed as a linear combination of the others.
Are v₁ = (1, 2, 0), v₂ = (0, 1, 1), v₃ = (1, 0, -2) linearly independent?
We need to determine if c₁(1,2,0) + c₂(0,1,1) + c₃(1,0,-2) = (0,0,0) has only the trivial solution.
This gives the system:
c₁ + c₃ = 0
2c₁ + c₂ = 0
c₂ - 2c₃ = 0
Form the augmented matrix and row reduce:
[1 0 1 | 0]
[2 1 0 | 0]
[0 1 -2 | 0]
R₂ → R₂ - 2R₁:
[1 0 1 | 0]
[0 1 -2 | 0]
[0 1 -2 | 0]
R₃ → R₃ - R₂:
[1 0 1 | 0]
[0 1 -2 | 0]
[0 0 0 | 0]
There's a free variable (c₃), so the system has nontrivial solutions. The vectors are linearly dependent.
Setting c₃ = 1: c₁ = -1, c₂ = 2, so v₃ = v₁ - 2v₂.
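The same conclusion can be reached numerically via the rank of the matrix whose rows are the vectors; a sketch in NumPy:

```python
import numpy as np

# rows are v1, v2, v3
V = np.array([[1.0, 2.0,  0.0],
              [0.0, 1.0,  1.0],
              [1.0, 0.0, -2.0]])

rank = np.linalg.matrix_rank(V)   # 2 < 3, so the vectors are dependent

# the dependence found by hand: v3 = v1 - 2*v2
dependence_holds = np.allclose(V[2], V[0] - 2 * V[1])
```

Three vectors in ℝ³ are independent exactly when this rank equals 3.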
A basis for a vector space V is a set of vectors that is:
1. Linearly independent, and
2. Spans V.
A basis is a minimal spanning set — every vector in V can be written uniquely as a linear combination of the basis vectors.
The dimension of a vector space is the number of vectors in any basis (all bases have the same size).
The standard basis for ℝ³ is:
e₁ = (1, 0, 0), e₂ = (0, 1, 0), e₃ = (0, 0, 1)
These are linearly independent and span all of ℝ³, so dim(ℝ³) = 3.
However, there are infinitely many other valid bases, such as:
{(1, 1, 0), (1, 0, 1), (0, 1, 1)}
Any set of three linearly independent vectors in ℝ³ forms a basis.
A linear transformation is a function T: V → W between two vector spaces that preserves the operations of addition and scalar multiplication:
T(u + v) = T(u) + T(v) and T(cv) = c T(v)
Equivalently, a function is linear if and only if T(c₁v₁ + c₂v₂) = c₁T(v₁) + c₂T(v₂) for all vectors and scalars.
Linear transformations include familiar operations like rotations, reflections, scalings, shears, and projections.
Is T(x, y) = (2x + y, x - 3y) a linear transformation?
Let u = (x₁, y₁) and v = (x₂, y₂):
T(u + v) = T(x₁+x₂, y₁+y₂) = (2(x₁+x₂) + (y₁+y₂), (x₁+x₂) - 3(y₁+y₂))
= (2x₁+y₁ + 2x₂+y₂, x₁-3y₁ + x₂-3y₂)
= (2x₁+y₁, x₁-3y₁) + (2x₂+y₂, x₂-3y₂)
= T(u) + T(v) ✓
T(cu) = T(cx₁, cy₁) = (2cx₁+cy₁, cx₁-3cy₁) = c(2x₁+y₁, x₁-3y₁) = cT(u) ✓
Yes, T is linear.
Every linear transformation T: ℝⁿ → ℝᵐ can be represented as multiplication by an m × n matrix A:
T(x) = Ax
The matrix A is found by applying T to each standard basis vector and using the results as columns:
A = [T(e₁) T(e₂) … T(eₙ)]
Find the matrix for T(x, y) = (2x + y, x - 3y).
T(e₁) = T(1, 0) = (2(1)+0, 1-3(0)) = (2, 1)
T(e₂) = T(0, 1) = (2(0)+1, 0-3(1)) = (1, -3)
A = [2 1]
[1 -3]
Verify: A(3, 2)ᵀ = [2 1; 1 -3] [3; 2] = [2(3)+1(2); 1(3)-3(2)] = [8; -3]
T(3, 2) = (2(3)+2, 3-3(2)) = (8, -3) ✓
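The recipe "columns of A are the images of the basis vectors" is easy to mechanize; a sketch in NumPy:

```python
import numpy as np

def T(p):
    """The example transformation T(x, y) = (2x + y, x - 3y)."""
    x, y = p
    return np.array([2 * x + y, x - 3 * y])

# columns of A are T(e1) and T(e2)
A = np.column_stack([T(np.array([1.0, 0.0])),
                     T(np.array([0.0, 1.0]))])

out = A @ np.array([3.0, 2.0])   # matrix product agrees with T(3, 2)
```

This works for any linear T: ℝⁿ → ℝᵐ; only the basis vectors need to be pushed through the function.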
Here are standard transformation matrices for common geometric operations in ℝ²:
Rotation by angle θ: [cos θ -sin θ; sin θ cos θ]
Scaling by factors sx, sy: [sx 0; 0 sy]
Reflection across the x-axis: [1 0; 0 -1]
Projection onto the x-axis: [1 0; 0 0]
Two fundamental subspaces are associated with every linear transformation T: V → W:
The kernel (or null space) of T is the set of all vectors that T maps to zero:
ker(T) = {v ∈ V : T(v) = 0}
The image (or range) of T is the set of all possible output vectors:
im(T) = {T(v) : v ∈ V}
Find the kernel of T(x, y, z) = (x + y, y + z).
The matrix is A = [1 1 0]
[0 1 1]
Solve Ax = 0:
[1 1 0 | 0]
[0 1 1 | 0]
R₁ → R₁ - R₂:
[1 0 -1 | 0]
[0 1 1 | 0]
From RREF: x = z, y = -z. Setting z = t:
ker(T) = {t(1, -1, 1) : t ∈ ℝ}
The kernel is a line in ℝ³ through the origin. The nullity is 1.
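A quick numerical check that the kernel vector found above really maps to zero:

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])

k = np.array([1.0, -1.0, 1.0])   # basis vector for ker(T) found above

result = A @ k                   # should be the zero vector in R^2
```

Any scalar multiple of `k` also maps to zero, since A(tk) = t(Ak) = 0.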
Eigenvalues and eigenvectors are among the most important concepts in linear algebra. They reveal the "natural" behavior of a linear transformation — the directions that are merely scaled, not rotated.
Let A be an n × n matrix. A nonzero vector v is an eigenvector of A if:
Av = λv
for some scalar λ. The scalar λ is called the corresponding eigenvalue. In other words, multiplying v by A simply scales v by the factor λ — the direction of v is preserved (or reversed if λ < 0).
To find eigenvalues, we rearrange Av = λv:
(A − λI)v = 0
For a nonzero solution v to exist, the matrix (A - λI) must be singular, which means:
det(A − λI) = 0
This is the characteristic equation. The left side, when expanded, is a polynomial of degree n in λ called the characteristic polynomial.
A = [4 1]
[2 3]
Step 1: Find eigenvalues.
det(A - λI) = det [4-λ 1; 2 3-λ] = (4-λ)(3-λ) - (1)(2)
= λ² - 7λ + 12 - 2
= λ² - 7λ + 10
= (λ - 5)(λ - 2) = 0
Eigenvalues: λ₁ = 5 and λ₂ = 2
Step 2: Find eigenvectors for λ₁ = 5.
(A - 5I)v = 0:
[-1 1; 2 -2] [x; y] = [0; 0]
From R₁: -x + y = 0, so y = x.
Eigenvector: v₁ = t(1, 1) for any t ≠ 0. We often choose v₁ = (1, 1).
Step 3: Find eigenvectors for λ₂ = 2.
(A - 2I)v = 0:
[2 1; 2 1] [x; y] = [0; 0]
From R₁: 2x + y = 0, so y = -2x.
Eigenvector: v₂ = t(1, -2) for any t ≠ 0. We often choose v₂ = (1, -2).
Verify: A(1,1)ᵀ = [4+1, 2+3]ᵀ = [5, 5]ᵀ = 5(1,1)ᵀ ✓
Verify: A(1,-2)ᵀ = [4-2, 2-6]ᵀ = [2, -4]ᵀ = 2(1,-2)ᵀ ✓
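NumPy computes eigenvalues and eigenvectors in one call; a sketch checking the example above:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# columns of eigvecs are unit-length eigenvectors
eigvals, eigvecs = np.linalg.eig(A)
```

The returned eigenvectors are normalized to length 1, so they are scalar multiples of the hand-picked (1, 1) and (1, -2), and each column satisfies Av = λv.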
A = [2 0 0]
[0 3 1]
[0 1 3]
det(A - λI) = det [2-λ 0 0 ]
[ 0 3-λ 1 ]
[ 0 1 3-λ]
Expanding along the first column (since it has two zeros):
= (2-λ)[(3-λ)(3-λ) - (1)(1)]
= (2-λ)(9 - 6λ + λ² - 1)
= (2-λ)(λ² - 6λ + 8)
= (2-λ)(λ-2)(λ-4)
= -(λ-2)²(λ-4)
Eigenvalues: λ₁ = 2 (multiplicity 2) and λ₂ = 4 (multiplicity 1)
An n × n matrix A is diagonalizable if it can be written as:
A = PDP⁻¹
where D is a diagonal matrix of eigenvalues and P is the matrix whose columns are the corresponding eigenvectors. A matrix is diagonalizable if and only if it has n linearly independent eigenvectors.
From our earlier example, A = [4 1; 2 3] with eigenvalues λ₁ = 5, λ₂ = 2 and eigenvectors v₁ = (1, 1), v₂ = (1, -2).
P = [1 1; 1 -2], D = [5 0; 0 2]
Then A = PDP⁻¹.
This is incredibly useful! For example, computing Aⁿ becomes easy:
Aⁿ = PDⁿP⁻¹ = P [5ⁿ 0; 0 2ⁿ] P⁻¹
This converts an expensive matrix power into simply raising scalars to powers!
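The diagonalization trick can be checked numerically against repeated multiplication; a NumPy sketch:

```python
import numpy as np

P = np.array([[1.0, 1.0],
              [1.0, -2.0]])
D = np.diag([5.0, 2.0])
P_inv = np.linalg.inv(P)

A = P @ D @ P_inv                                 # reconstructs [[4, 1], [2, 3]]

# A^10 via diagonalization: only the scalars are raised to a power
A10 = P @ np.diag([5.0**10, 2.0**10]) @ P_inv
```

`A10` matches `np.linalg.matrix_power(A, 10)`, but needed only two matrix multiplications and two scalar powers.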
Linear algebra is not just an abstract mathematical subject — it is the computational engine behind many of the technologies and scientific methods that shape the modern world. Here are some of its most impactful applications.
Every image, animation, and 3D model you see on a screen is rendered using linear algebra. Transformations of objects — rotation, scaling, translation, and perspective projection — are all represented as matrix operations.
In 3D graphics, homogeneous coordinates extend ℝ³ to ℝ⁴ so that translations (which are not linear) can also be represented as matrix multiplication. A point (x, y, z) becomes (x, y, z, 1), and all transformations become 4 × 4 matrices. For example, translation by (tx, ty, tz) is the matrix [1 0 0 tx; 0 1 0 ty; 0 0 1 tz; 0 0 0 1].
Composing multiple transformations is simply multiplying matrices together. The GPU in your computer is essentially a massively parallel linear algebra engine.
Rotate the point (3, 1) by 90° counterclockwise.
Rotation matrix: R = [cos 90° -sin 90°; sin 90° cos 90°] = [0 -1; 1 0]
R [3; 1] = [(0)(3) + (-1)(1); (1)(3) + (0)(1)] = [-1; 3]
The rotated point is (-1, 3) ✓
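The same rotation in NumPy, building R from the angle rather than hard-coding it:

```python
import numpy as np

theta = np.radians(90)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

p = R @ np.array([3.0, 1.0])   # rotate (3, 1) by 90 degrees counterclockwise
```

`p` is (-1, 3) up to floating-point rounding (cos 90° is a tiny nonzero number, about 6e-17, rather than exactly 0).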
Data science uses linear algebra at every level. Datasets are naturally represented as matrices (rows = samples, columns = features), and most machine learning algorithms are built on linear algebraic operations: linear regression is a least squares problem, principal component analysis diagonalizes a covariance matrix, and neural networks are chains of matrix multiplications interleaved with simple nonlinearities.
One of the most famous applications of linear algebra is Google's original PageRank algorithm, which ranked web pages by importance. The key insight is modeling the web as a Markov chain: each page distributes its rank equally among the pages it links to, and the PageRank vector r is the steady state r = Mr, an eigenvector of the link matrix M with eigenvalue 1.
In practice, a damping factor d ≈ 0.85 is used:
r = d Mr + ((1 − d)/N) 1
where N is the number of pages and 1 is the vector of all ones.
This is solved iteratively (power iteration), and pages with higher PageRank values appear higher in search results.
Consider 3 pages: A links to B and C, B links to C, C links to A.
Transition matrix (each column sums to 1):
M = [ 0 0 1 ] (A receives from C)
[1/2 0 0 ] (B receives from A)
[1/2 1 0 ] (C receives from A and B)
Starting with r₀ = (1/3, 1/3, 1/3) and iterating rₖ₊₁ = Mrₖ converges to the steady-state PageRank vector.
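The iteration for this three-page web is a few lines of NumPy (undamped here, for simplicity; real PageRank adds the damping term):

```python
import numpy as np

# columns: where each page's rank flows (column sums are 1)
M = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])

r = np.full(3, 1 / 3)       # start with equal rank
for _ in range(100):
    r = M @ r               # power iteration
```

The iteration converges to the steady state (0.4, 0.2, 0.4): page C, which receives links from both A and B, ties with A (the only page C links to) for the highest rank, and `M @ r` equals `r` at convergence.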
When a system Ax = b has no exact solution (more equations than unknowns, i.e., the system is overdetermined), we seek the least squares solution — the vector x̂ that minimizes the total squared error ‖Ax - b‖².
The solution satisfies the normal equations AᵀA x̂ = Aᵀb. If AᵀA is invertible, the solution is x̂ = (AᵀA)⁻¹Aᵀb.
Find the best-fit line y = c₀ + c₁x for the data points (1, 2), (2, 3), (3, 6), (4, 8).
Set up Ac ≈ b:
A = [1 1; 1 2; 1 3; 1 4], b = [2; 3; 6; 8]
AᵀA = [4 10; 10 30]
Aᵀb = [19; 58]
Solving the normal equations [4 10 | 19; 10 30 | 58] gives c₀ = -0.5, c₁ = 2.1
Best-fit line: y = -0.5 + 2.1x
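NumPy solves least squares problems directly (via a numerically stable SVD-based routine rather than the normal equations); a sketch for the line fit above:

```python
import numpy as np

x_data = np.array([1.0, 2.0, 3.0, 4.0])
y_data = np.array([2.0, 3.0, 6.0, 8.0])

# design matrix: a column of ones (intercept) and a column of x values
A = np.column_stack([np.ones_like(x_data), x_data])

coef, *_ = np.linalg.lstsq(A, y_data, rcond=None)   # (c0, c1)
```

`coef` comes out to (-0.5, 2.1), matching the normal-equations solution. Forming AᵀA explicitly squares the condition number, which is why library solvers avoid it.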
Quantum mechanics is fundamentally built on linear algebra. The state of a quantum system is represented as a vector in a complex vector space (Hilbert space), and physical observables (like energy, momentum, position) are represented as Hermitian matrices. Measurements correspond to eigenvalues, and the system's state collapses to the corresponding eigenvector after measurement.
Linear algebra appears in economics through input-output models (Leontief model), where the relationships between industries are captured in a matrix. In network analysis, matrices encode connections between nodes, and their eigenvalues reveal properties like connectivity, clustering, and influence.