Chapter 9 — Gradient Descent & Variants

Gradient descent is the primary optimization algorithm used in machine learning, particularly in training deep learning models. This chapter explains gradient descent, its variants, and practical applications with examples.

9.1 What is Gradient Descent?

Gradient descent is an iterative optimization algorithm used to minimize a function (often a loss function in ML) by moving in the direction of the negative gradient.

Importance in ML: Neural networks, linear regression, logistic regression, and many other ML models rely on gradient descent (or one of its variants) to adjust their parameters (weights) so as to reduce error.

9.2 Basic Gradient Descent Algorithm

Update rule for a parameter vector θ:

θ_new = θ_old - η * ∇L(θ_old)

where η is the learning rate and ∇L(θ) is the gradient of the loss function L with respect to θ.
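
In code, this update is a single vectorized line. A minimal sketch (the helper name and its arguments are illustrative, not from the text):

import numpy as np

def gradient_descent_step(theta, grad_fn, eta):
    # One iteration of the update rule above: move against the gradient,
    # scaled by the learning rate eta.
    return theta - eta * grad_fn(theta)

# e.g. theta = gradient_descent_step(theta, grad_f, eta=0.1)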

9.3 Learning Rate

  • Too small: convergence is slow.
  • Too large: updates may overshoot the minimum or even diverge.

Adaptive methods (see 9.5) adjust the learning rate dynamically.
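
A quick sketch of this trade-off on f(x) = x², whose gradient is 2x (the three η values below are illustrative):

def run(eta, steps=10, x=1.0):
    # Repeatedly apply x <- x - eta * f'(x) with f'(x) = 2x.
    for _ in range(steps):
        x = x - eta * 2 * x
    return x

for eta in (0.01, 0.1, 1.1):
    print(f"eta = {eta}: x after 10 steps = {run(eta):.4f}")

# eta = 0.01 creeps toward 0, eta = 0.1 converges quickly,
# and eta = 1.1 diverges because each step multiplies x by 1 - 2*eta = -1.2.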

9.4 Momentum

Momentum carries a fraction of the previous velocity into the current step, which accelerates convergence:

v = β * v_old + (1 - β) * ∇L(θ)
θ_new = θ_old - η * v

This helps the optimizer roll through shallow local minima and damps oscillations along steep directions.
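
A NumPy sketch of this update, following the two formulas above (grad_f, the starting point, and the defaults β = 0.9, η = 0.1 are illustrative choices):

import numpy as np

def momentum_descent(grad_f, theta, eta=0.1, beta=0.9, steps=100):
    v = np.zeros_like(theta)
    for _ in range(steps):
        # v is an exponentially weighted average of past gradients.
        v = beta * v + (1 - beta) * grad_f(theta)
        theta = theta - eta * v
    return theta

# e.g. momentum_descent(lambda t: 2 * t, np.array([3.0, 4.0]))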

9.5 Adaptive Methods

  • RMSProp: Adjusts learning rate based on moving average of squared gradients.
  • Adam: Combines momentum with an adaptive learning rate; the most widely used optimizer in deep learning (a minimal sketch follows this list).
  • Adagrad: Adapts learning rate per parameter based on historical gradients.
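
A bare-bones NumPy sketch of the Adam update, with the commonly used defaults η = 0.001, β₁ = 0.9, β₂ = 0.999 (a production implementation such as torch.optim.Adam handles much more):

import numpy as np

def adam(grad_f, theta, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    m = np.zeros_like(theta)          # momentum term: moving average of gradients
    v = np.zeros_like(theta)          # RMSProp-style average of squared gradients
    for t in range(1, steps + 1):
        g = grad_f(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)    # bias correction for the early steps
        v_hat = v / (1 - beta2**t)
        theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta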

9.6 Convergence Criteria

Stop gradient descent when any of the following holds (the sketch below combines all three):

  • The gradient magnitude is very small (close to zero).
  • The change in the loss between successive steps falls below a threshold.
  • The maximum number of iterations is reached.
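
A minimal loop implementing these stopping checks (the tolerance and iteration cap are illustrative):

import numpy as np

def gradient_descent(f, grad_f, theta, eta=0.1, tol=1e-6, max_iters=10_000):
    # Runs at most max_iters iterations (criterion 3).
    prev_loss = f(theta)
    for _ in range(max_iters):
        g = grad_f(theta)
        if np.linalg.norm(g) < tol:        # criterion 1: gradient near zero
            break
        theta = theta - eta * g
        loss = f(theta)
        if abs(prev_loss - loss) < tol:    # criterion 2: loss change below threshold
            break
        prev_loss = loss
    return theta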

9.7 Visualization

For a simple function of two variables f(x, y), gradient descent moves across the surface, following the steepest-descent direction at each step until it reaches the minimum.
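
A sketch of such a plot for f(x, y) = x² + y², assuming Matplotlib is available (the start point, learning rate, and step count mirror the example in 9.8):

import numpy as np
import matplotlib.pyplot as plt

grad_f = lambda theta: 2 * theta           # gradient of f(x, y) = x^2 + y^2

theta = np.array([3.0, 4.0])
path = [theta]
for _ in range(20):
    theta = theta - 0.1 * grad_f(theta)
    path.append(theta)
path = np.array(path)

xs = np.linspace(-5, 5, 200)
X, Y = np.meshgrid(xs, xs)
plt.contour(X, Y, X**2 + Y**2, levels=20)            # level curves of the surface
plt.plot(path[:, 0], path[:, 1], "o-", color="red")  # descent path toward (0, 0)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Gradient descent on f(x, y) = x^2 + y^2")
plt.show()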

9.8 Quick Python Example

import numpy as np

# Example: f(x, y) = x^2 + y^2
def f(theta):
    x, y = theta
    return x**2 + y**2

def grad_f(theta):
    x, y = theta
    return np.array([2*x, 2*y])

theta = np.array([3.0, 4.0])
eta = 0.1

for i in range(20):
    theta = theta - eta * grad_f(theta)
    print(f"Step {i+1}, theta = {theta}, f(theta) = {f(theta)}")

9.9 ML Applications

  • Training Neural Networks: Backpropagation computes the gradients layer by layer; gradient descent uses them to update the weights.
  • Regression: Minimize mean squared error with gradient descent (see the sketch after this list).
  • Logistic Regression: Minimize the cross-entropy loss for classification tasks.
  • Deep Learning: The Adam optimizer is commonly used for fast, stable convergence.
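
As a toy sketch of the regression bullet, fitting y ≈ w·x + b by descending the mean-squared-error gradient on synthetic data (the data, learning rate, and iteration count are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 0.5 + 0.1 * rng.normal(size=100)   # true w = 3.0, b = 0.5, plus noise

w, b, eta = 0.0, 0.0, 0.1
for _ in range(500):
    error = (w * x + b) - y
    grad_w = 2 * np.mean(error * x)   # d(MSE)/dw
    grad_b = 2 * np.mean(error)       # d(MSE)/db
    w -= eta * grad_w
    b -= eta * grad_b

print(f"learned w = {w:.3f}, b = {b:.3f}")   # close to 3.0 and 0.5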

9.10 Exercises

  1. Implement vanilla gradient descent for f(x, y) = x² + y² and plot convergence.
  2. Experiment with different learning rates and observe convergence speed.
  3. Implement momentum-based gradient descent and compare with vanilla gradient descent.
  4. Try Adam optimizer on a small regression dataset and compare loss reduction.

Hints / Answers

  1–2. Convergence depends heavily on the choice of learning rate; plot the iterates to see overshooting or slow convergence.
  3. Momentum accelerates convergence along shallow valleys.
  4. Adam often converges faster and is less sensitive to hyperparameters.

9.11 Further Reading & Videos

  • Deep Learning Book (Goodfellow et al.) — Chapters on optimization algorithms.
  • 3Blue1Brown — Gradient descent visualization (YouTube).
  • Hands-on Python tutorials: implement various optimizers using NumPy and PyTorch.

Next chapter: Jacobian & Hessian Matrices — understanding derivatives in multivariate settings and their applications in backpropagation for neural networks.
