Chapter 4 — Expectation, Variance, and Moments
Expectation and moments quantify the center, spread, and shape of probability distributions, properties that underpin feature scaling, uncertainty estimation, and risk analysis in AI & ML.
4.1 Expected Value (Mean)
The expected value of a random variable X is its probability-weighted average: the value its sample mean approaches over many independent trials.
- Discrete: E[X] = Σ x * P(X=x)  (sum over all possible values x)
- Continuous: E[X] = ∫ x * f(x) dx  (integral over the support of X, with density f)
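For the continuous case, the integral can be checked numerically. A minimal sketch, assuming SciPy is available and using the uniform density on [0, 1] purely as an illustration (its mean is 0.5):
from scipy.integrate import quad

f = lambda x: 1.0                          # uniform density on [0, 1]
E_X, _ = quad(lambda x: x * f(x), 0, 1)    # E[X] = ∫ x f(x) dx over the support
print(E_X)                                 # 0.5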
Example: Flipping 3 fair coins, X = # of heads:
P(X=0)=1/8, P(X=1)=3/8, P(X=2)=3/8, P(X=3)=1/8
E[X] = 0*(1/8) + 1*(3/8) + 2*(3/8) + 3*(1/8) = 1.5
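The same value can be confirmed by simulation. A minimal sketch, assuming NumPy; the names rng, n_trials, and heads are illustrative, and the estimate fluctuates around the exact answer 1.5:
import numpy as np

rng = np.random.default_rng(0)                    # fixed seed for reproducibility
n_trials = 100_000
flips = rng.integers(0, 2, size=(n_trials, 3))    # 0 = tails, 1 = heads
heads = flips.sum(axis=1)                         # X for each trial
print("Simulated E[X]:", heads.mean())            # close to the exact 1.5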
AI/ML context: The mean is used for feature normalization and for centering data before training; many weight-initialization schemes also draw from zero-mean distributions.
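For example, centering a feature matrix just subtracts the per-column mean. A minimal sketch; the features array is made up for illustration:
import numpy as np

features = np.array([[1.0, 200.0],
                     [2.0, 220.0],
                     [3.0, 240.0]])
centered = features - features.mean(axis=0)   # each column now has mean 0
print(centered)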
4.2 Variance & Standard Deviation
Variance is the expected squared deviation of a random variable from its mean, so it quantifies spread. Standard deviation is the square root of the variance and has the same units as X.
- Discrete: Var(X) = Σ (x - E[X])² * P(X=x)
- Continuous: Var(X) = ∫ (x - E[X])² * f(x) dx
Example: For the same 3-coin flip example:
Var(X) = (0-1.5)²*(1/8) + (1-1.5)²*(3/8) + (2-1.5)²*(3/8) + (3-1.5)²*(1/8) = 0.75
Std Dev = √0.75 ≈ 0.866
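An equivalent and often more convenient identity is Var(X) = E[X²] - (E[X])². A minimal sketch verifying it on the coin-flip PMF:
import numpy as np

x = np.array([0, 1, 2, 3])
pmf = np.array([1/8, 3/8, 3/8, 1/8])
E_X = np.sum(x * pmf)            # 1.5
E_X2 = np.sum(x**2 * pmf)        # E[X²] = 3.0
print(E_X2 - E_X**2)             # 0.75, matching the direct calculation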
AI/ML context: Variance and standard deviation are used in feature scaling, detecting outliers, and estimating uncertainty in predictions.
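To illustrate, a minimal z-score sketch for flagging outliers; the values and the 1.5 threshold are arbitrary choices for this tiny example, not fixed rules:
import numpy as np

values = np.array([10.0, 12.0, 11.0, 10.5, 35.0])   # one suspiciously large value
z = (values - values.mean()) / values.std()         # standardize to mean 0, std 1
print(z)
print(values[np.abs(z) > 1.5])                      # flags 35.0 as an outlier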
4.3 Higher-Order Moments
Moments describe the shape of a distribution; in practice the 2nd and higher moments are taken about the mean (central moments) and often scaled by the standard deviation (standardized moments):
- 1st moment: Mean, E[X]
- 2nd central moment: Variance, Var(X) = E[(X - E[X])²]
- 3rd standardized moment: Skewness – asymmetry of the distribution
- 4th standardized moment: Kurtosis – heaviness of the tails (often loosely called "peakedness")
AI/ML context: Higher-order moments are used in feature engineering, anomaly detection, and risk modeling.
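Sample skewness and kurtosis are easy to inspect directly. A minimal sketch, assuming SciPy; note that scipy.stats.kurtosis returns excess kurtosis by default, so a normal distribution gives roughly 0:
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
symmetric = rng.normal(size=10_000)       # skewness ≈ 0, excess kurtosis ≈ 0
skewed = rng.exponential(size=10_000)     # right-skewed, heavy right tail
print(skew(symmetric), kurtosis(symmetric))
print(skew(skewed), kurtosis(skewed))     # theoretical values: skewness 2, excess kurtosis 6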
4.4 Practical Examples in Python
import numpy as np
# Example: discrete variable X = number of heads in 3 fair coin flips
X = np.array([0, 1, 2, 3])
pmf = np.array([1/8, 3/8, 3/8, 1/8])
# Expected value
E_X = np.sum(X * pmf)
# Variance
Var_X = np.sum(((X - E_X)**2) * pmf)
# Standard deviation
Std_X = np.sqrt(Var_X)
print("Expected value:", E_X)
print("Variance:", Var_X)
print("Standard deviation:", Std_X)
# Skewness: standardized 3rd central moment (0 here, since the PMF is symmetric)
Skew_X = np.sum(((X - E_X)**3) * pmf) / Std_X**3
# Kurtosis: standardized 4th central moment (raw kurtosis; a normal distribution has raw kurtosis 3)
Kurt_X = np.sum(((X - E_X)**4) * pmf) / Std_X**4
print("Skewness:", Skew_X)
print("Kurtosis:", Kurt_X)
4.5 Key Takeaways
- Expected value measures the central tendency of a random variable.
- Variance and standard deviation quantify the spread or uncertainty.
- Higher-order moments (skewness, kurtosis) describe the shape of distributions.
- All these metrics are crucial in ML for feature scaling, normalization, risk estimation, and probabilistic modeling.
Next chapter: Common Probability Distributions — modeling real-world data in AI & ML applications.