Chapter 8 — Entropy, Information, and KL Divergence

Measuring uncertainty and information is fundamental in AI/ML for decision making, loss computation, and probabilistic modeling.

8.1 Entropy

Entropy measures the uncertainty or unpredictability in a probability distribution. High entropy means more uncertainty; low entropy means more predictability.

Mathematical Definition: For a discrete random variable X with probability mass function P(X): H(X) = - Σ P(x) log₂ P(x)

Example: A fair coin flip has entropy H(X) = 1 bit. A biased coin (90% heads, 10% tails) has lower entropy, ~0.47 bits. AI/ML context: Entropy helps in constructing decision trees by measuring the impurity of a split (ID3, C4.5 algorithms).
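
As a quick check of the coin numbers above, the entropy of these two distributions can be computed directly with scipy.stats.entropy (a minimal sketch; the probability values are just the ones from the example):

import numpy as np
from scipy.stats import entropy

# Fair coin: -(0.5*log2(0.5) + 0.5*log2(0.5)) = 1 bit
fair = np.array([0.5, 0.5])
print("Fair coin entropy:", entropy(fair, base=2))      # 1.0

# Biased coin (90% heads, 10% tails): ~0.47 bits
biased = np.array([0.9, 0.1])
print("Biased coin entropy:", entropy(biased, base=2))  # ~0.469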

8.2 Cross-Entropy

Cross-entropy measures how well a model distribution Q describes samples drawn from a true distribution P; in classification tasks it compares the true labels with the predicted class probabilities.

Mathematical Definition: H(P,Q) = - Σ P(x) log Q(x)

Example: In a 3-class classification problem, if the true labels are P = [1, 0, 0] and the predictions are Q = [0.8, 0.1, 0.1], cross-entropy = -(1 * log 0.8 + 0 + 0) ≈ 0.223 nats (using the natural logarithm). AI/ML context: Cross-entropy loss is widely used in training neural networks for classification tasks.
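
The same calculation can be written as a small NumPy function; the clipping step is an assumption added here to avoid log(0) when a predicted probability is exactly zero (a minimal sketch, not a library implementation):

import numpy as np

def cross_entropy(p_true, q_pred, eps=1e-12):
    # Clip predictions so log(0) never occurs, then apply H(P, Q) = -sum P(x) log Q(x)
    q_pred = np.clip(q_pred, eps, 1.0)
    return -np.sum(p_true * np.log(q_pred))

p = np.array([1, 0, 0])        # one-hot true label
q = np.array([0.8, 0.1, 0.1])  # predicted probabilities
print(cross_entropy(p, q))     # ~0.223 nats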

8.3 Kullback-Leibler (KL) Divergence

KL Divergence measures how one probability distribution diverges from a reference distribution. It is asymmetric and non-negative.

Mathematical Definition: KL(P || Q) = Σ P(x) log (P(x) / Q(x))

Example: Comparing a true distribution P = [0.5, 0.5] with an estimated Q = [0.8, 0.2] gives KL(P||Q) ≈ 0.32 bits with base-2 logarithms (≈ 0.22 nats with the natural logarithm). Note that KL(Q||P) takes a different value because the measure is asymmetric, as the sketch below shows. AI/ML context: KL divergence is used in variational autoencoders (VAEs) to regularize learned latent distributions and in reinforcement learning for policy updates.
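
The short sketch below computes both directions for the example distributions with scipy.stats.entropy, which returns the KL divergence when a second distribution is passed as qk (values are approximate):

import numpy as np
from scipy.stats import entropy

P = np.array([0.5, 0.5])
Q = np.array([0.8, 0.2])

# KL is asymmetric: swapping the arguments changes the result
print("KL(P||Q):", entropy(P, qk=Q, base=2))  # ~0.32 bits
print("KL(Q||P):", entropy(Q, qk=P, base=2))  # ~0.28 bits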

8.4 Practical Examples in Python

import numpy as np
from scipy.stats import entropy

# Entropy of a distribution (base 2, so the result is in bits)
p = np.array([0.5, 0.5])
H = entropy(p, base=2)
print("Entropy H(X):", H)  # 1.0 bit for a fair coin

# Cross-entropy loss (natural log, so the result is in nats)
true = np.array([1, 0, 0])
pred = np.array([0.8, 0.1, 0.1])
cross_entropy = -np.sum(true * np.log(pred))
print("Cross-Entropy:", cross_entropy)  # ~0.223

# KL divergence: entropy() computes KL(P||Q) when qk is given
P = np.array([0.5, 0.5])
Q = np.array([0.8, 0.2])
kl_div = entropy(P, qk=Q, base=2)
print("KL Divergence KL(P||Q):", kl_div)  # ~0.32 bits

8.5 Key Takeaways

  • Entropy quantifies uncertainty in data or predictions.
  • Cross-entropy measures the difference between predicted and true distributions — key for classification losses.
  • KL Divergence measures how one distribution diverges from another, essential for probabilistic modeling, VAEs, and RL.

Next chapter: Markov Chains & Stochastic Processes — modeling sequences and transitions in AI/ML.
