Chapter 8 — Entropy, Information, and KL Divergence

Measuring uncertainty and information is fundamental in AI/ML for decision making, loss computation, and probabilistic modeling.

8.1 Entropy

Entropy measures the uncertainty or unpredictability in a probability distribution. High entropy means more uncertainty; low entropy means more predictability.

Mathematical Definition: For a discrete random variable X with probability mass function P(X): H(X) = - Σ P(x) log₂ P(x)

Example: A fair coin flip has entropy H(X) = 1 bit. A biased coin (90% heads, 10% tails) has lower entropy, ~0.47 bits. AI/ML context: Entropy helps in constructing decision trees by measuring the impurity of a split (ID3, C4.5 algorithms).
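
As a quick check of the coin numbers above, the entropy of these two distributions can be computed directly with scipy.stats.entropy (a minimal sketch; the probability values are just the ones from the example):

import numpy as np
from scipy.stats import entropy

# Fair coin: -(0.5*log2(0.5) + 0.5*log2(0.5)) = 1 bit
fair = np.array([0.5, 0.5])
print("Fair coin entropy:", entropy(fair, base=2))      # 1.0

# Biased coin (90% heads, 10% tails): ~0.47 bits
biased = np.array([0.9, 0.1])
print("Biased coin entropy:", entropy(biased, base=2))  # ~0.469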

8.2 Cross-Entropy

Cross-entropy measures how well a model distribution Q describes samples drawn from a true distribution P; in classification tasks it compares the true labels with the predicted class probabilities.

Mathematical Definition: H(P,Q) = - Σ P(x) log Q(x)

Example: In a 3-class classification problem, if the true labels are P = [1, 0, 0] and the predictions are Q = [0.8, 0.1, 0.1], cross-entropy = -(1 * log 0.8 + 0 + 0) ≈ 0.223 nats (using the natural logarithm). AI/ML context: Cross-entropy loss is widely used in training neural networks for classification tasks.
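
The same calculation can be written as a small NumPy function; the clipping step is an assumption added here to avoid log(0) when a predicted probability is exactly zero (a minimal sketch, not a library implementation):

import numpy as np

def cross_entropy(p_true, q_pred, eps=1e-12):
    # Clip predictions so log(0) never occurs, then apply H(P, Q) = -sum P(x) log Q(x)
    q_pred = np.clip(q_pred, eps, 1.0)
    return -np.sum(p_true * np.log(q_pred))

p = np.array([1, 0, 0])        # one-hot true label
q = np.array([0.8, 0.1, 0.1])  # predicted probabilities
print(cross_entropy(p, q))     # ~0.223 nats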

8.3 Kullback-Leibler (KL) Divergence

KL Divergence measures how one probability distribution diverges from a reference distribution. It is asymmetric and non-negative.

Mathematical Definition: KL(P || Q) = Σ P(x) log (P(x) / Q(x))

Example: Comparing a true distribution P = [0.5, 0.5] with an estimated Q = [0.8, 0.2] gives KL(P||Q) ≈ 0.32 bits with base-2 logarithms (≈ 0.22 nats with the natural logarithm). Note that KL(Q||P) takes a different value because the measure is asymmetric, as the sketch below shows. AI/ML context: KL divergence is used in variational autoencoders (VAEs) to regularize learned latent distributions and in reinforcement learning for policy updates.
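
The short sketch below computes both directions for the example distributions with scipy.stats.entropy, which returns the KL divergence when a second distribution is passed as qk (values are approximate):

import numpy as np
from scipy.stats import entropy

P = np.array([0.5, 0.5])
Q = np.array([0.8, 0.2])

# KL is asymmetric: swapping the arguments changes the result
print("KL(P||Q):", entropy(P, qk=Q, base=2))  # ~0.32 bits
print("KL(Q||P):", entropy(Q, qk=P, base=2))  # ~0.28 bits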

8.4 Practical Examples in Python

import numpy as np
from scipy.stats import entropy

# Entropy of a distribution (base 2, so the result is in bits)
p = np.array([0.5, 0.5])
H = entropy(p, base=2)
print("Entropy H(X):", H)  # 1.0 bit for a fair coin

# Cross-entropy loss (natural log, so the result is in nats)
true = np.array([1, 0, 0])
pred = np.array([0.8, 0.1, 0.1])
cross_entropy = -np.sum(true * np.log(pred))
print("Cross-Entropy:", cross_entropy)  # ~0.223

# KL divergence: entropy() computes KL(P||Q) when qk is given
P = np.array([0.5, 0.5])
Q = np.array([0.8, 0.2])
kl_div = entropy(P, qk=Q, base=2)
print("KL Divergence KL(P||Q):", kl_div)  # ~0.32 bits

8.5 Key Takeaways

  • Entropy quantifies uncertainty in data or predictions.
  • Cross-entropy measures the difference between predicted and true distributions — key for classification losses.
  • KL Divergence measures how one distribution diverges from another, essential for probabilistic modeling, VAEs, and RL.

Next chapter: Markov Chains & Stochastic Processes — modeling sequences and transitions in AI/ML.
