Chapter 8 — Norms & Distances
Norms and distance metrics are essential in AI & ML for measuring how similar or different data points are. They underpin clustering, nearest-neighbor methods, and many NLP applications.
8.1 What is a Norm?
A norm is a function that assigns a non-negative length, or size, to a vector: it is zero only for the zero vector, it scales proportionally when the vector is scaled, and it satisfies the triangle inequality.
ML Context: The norm of the difference between two vectors quantifies the distance between points, which drives clustering, classification, and similarity measurements.
8.2 L1 Norm (Manhattan Distance)
The L1 norm is the sum of absolute values of vector components:
||v||₁ = |v1| + |v2| + ... + |vn|
Example: For v = [2, -3, 1], ||v||₁ = |2| + |-3| + |1| = 2 + 3 + 1 = 6.
Use in ML: The L1 norm gives the Manhattan distance used in KNN, serves as the penalty in Lasso regression, and fits settings where axis-aligned movements matter.
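A quick check in NumPy (an illustrative sketch; the points p and q are made up for the distance example):
import numpy as np
v = np.array([2, -3, 1])
print(np.sum(np.abs(v)))      # 6, the L1 norm of v
# Manhattan distance between two points = L1 norm of their difference
p = np.array([1, 2])
q = np.array([4, 0])
print(np.sum(np.abs(p - q)))  # |1-4| + |2-0| = 5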
8.3 L2 Norm (Euclidean Distance)
The L2 norm is the usual Euclidean length of a vector:
||v||₂ = sqrt(v1² + v2² + ... + vn²)
Example: For v = [3, 4], ||v||₂ = sqrt(3² + 4²) = sqrt(25) = 5.
Use in ML: Euclidean distance is widely used in K-Means clustering, nearest neighbors, and optimization problems.
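The same check in NumPy (illustrative):
import numpy as np
print(np.linalg.norm(np.array([3, 4])))  # 5.0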
8.4 Cosine Similarity
Cosine similarity measures how aligned two vectors are, ignoring magnitude:
cos(θ) = (a · b) / (||a|| * ||b||)
Example: For a = [1, 0, 1] and b = [0, 1, 1]:
cos(θ) = (1*0 + 0*1 + 1*1) / (sqrt(1+0+1) * sqrt(0+1+1)) = 1 / (sqrt(2)*sqrt(2)) = 0.5
ML Context: Cosine similarity is crucial in NLP for comparing word embeddings, document similarity, and semantic search.
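Section 8.5 verifies this value in code. Note that some libraries report cosine distance, defined as 1 - cos(θ), rather than cosine similarity; a quick illustrative check (assuming SciPy is installed):
import numpy as np
from scipy.spatial.distance import cosine  # returns the cosine *distance*
a = np.array([1, 0, 1])
b = np.array([0, 1, 1])
print(1 - cosine(a, b))  # 0.5, the cosine similarity computed above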
8.5 Quick NumPy Examples (Practical)
import numpy as np
# define vectors
a = np.array([1, 0, 1])
b = np.array([0, 1, 1])
# L1 norm
l1_a = np.sum(np.abs(a))
print("L1 norm of a:", l1_a)
# L2 norm
l2_a = np.linalg.norm(a)
print("L2 norm of a:", l2_a)
# Euclidean distance between a and b
euclidean_dist = np.linalg.norm(a - b)
print("Euclidean distance:", euclidean_dist)
# Cosine similarity
cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print("Cosine similarity:", cos_sim)
8.6 Geometric Intuition
- L1 distance measures movement along axes (like city blocks).
- L2 distance measures straight-line distance.
- Cosine similarity measures angle between vectors — 1 means identical direction, 0 orthogonal, -1 opposite.
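For example, traveling from the origin to the point (3, 4) costs 3 + 4 = 7 under the L1 metric (blocks walked along the grid) but only sqrt(3² + 4²) = 5 under the L2 metric (the straight-line shortcut).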
8.7 AI/ML Use Cases (Why Norms & Distances Matter)
- KNN (K-Nearest Neighbors): Distances from the query point to the training points determine its nearest neighbors (see the sketch after this list).
- K-Means Clustering: L2 distances are used to assign points to cluster centroids.
- NLP & Word Embeddings: Cosine similarity compares semantic similarity between words or documents.
- Regularization: L1/L2 norms are used as penalties in Lasso/Ridge regression to prevent overfitting.
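To see how the choice of metric changes which neighbor wins, here is a minimal 1-nearest-neighbor sketch; the helper nearest and the toy points are made up for illustration:
import numpy as np
def nearest(query, points, metric="l2"):
    # Index of the point closest to `query` under the chosen metric.
    if metric == "l1":
        dists = np.sum(np.abs(points - query), axis=1)
    elif metric == "l2":
        dists = np.linalg.norm(points - query, axis=1)
    else:  # cosine distance = 1 - cosine similarity
        sims = points @ query / (np.linalg.norm(points, axis=1) * np.linalg.norm(query))
        dists = 1 - sims
    return int(np.argmin(dists))
points = np.array([[1.0, 1.75], [1.6, 0.8], [4.0, 4.0]])
query = np.array([1.0, 1.0])
for m in ("l1", "l2", "cosine"):
    print(m, "->", nearest(query, points, m))
# Prints l1 -> 0, l2 -> 1, cosine -> 2: each metric picks a different neighbor.
Here [4.0, 4.0] wins under cosine despite being the farthest point, because it points in exactly the same direction as the query.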
8.8 Exercises
- Compute the L1 and L2 norms of v = [-2, 4, 1] by hand and verify in Python.
- Compute the Euclidean distance between a = [1, 2, 3] and b = [4, 0, 3].
- Compute the cosine similarity between two sample word embeddings: a = [0.5, 0.2, 0.3] and b = [0.4, 0.4, 0.2].
- Explain why cosine similarity might be preferred over Euclidean distance for text similarity.
Answers / Hints
- L1 norm = |-2| + |4| + |1| = 7; L2 norm = sqrt(4 + 16 + 1) = sqrt(21) ≈ 4.58
- Euclidean distance = sqrt((1-4)² + (2-0)² + (3-3)²) = sqrt(9+4+0) = sqrt(13)
- Cosine similarity = (0.5*0.4 + 0.2*0.4 + 0.3*0.2) / (sqrt(0.5²+0.2²+0.3²) * sqrt(0.4²+0.4²+0.2²)) = 0.34 / (sqrt(0.38) * 0.6) ≈ 0.919
- Cosine ignores magnitude differences; useful when text vectors vary in length but orientation matters.
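A short NumPy check of the hints above (the vectors are exactly those from the exercises):
import numpy as np
v = np.array([-2, 4, 1])
print(np.sum(np.abs(v)), np.linalg.norm(v))  # 7, 4.5826 (= sqrt(21))
a = np.array([1, 2, 3])
b = np.array([4, 0, 3])
print(np.linalg.norm(a - b))                 # 3.6056 (= sqrt(13))
a = np.array([0.5, 0.2, 0.3])
b = np.array([0.4, 0.4, 0.2])
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))  # ≈ 0.919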
8.9 Practice Projects / Mini Tasks
- Implement a small KNN classifier using L1, L2, and cosine distances to compare performance.
- Compute cosine similarity between sentences using pre-trained embeddings (Word2Vec/GloVe).
- Visualize 2D points and their L1 vs L2 distances to centroids.
8.10 Further Reading & Videos
- 3Blue1Brown — Essence of Linear Algebra (vector norms & distances).
- NumPy documentation — np.linalg.norm and np.dot.
- Machine Learning texts covering KNN, clustering, and embedding similarity.
Next chapter: Linear Independence & Rank — understanding linear dependence in vectors and matrices, rank computation, and its importance in feature selection and dimensionality reduction.