
Calculus For AI/ML - Chapter 8 Optimization Techniques

Chapter 8 — Optimization Techniques

Optimization techniques are at the heart of machine learning. Training models involves finding the best parameters (weights) that minimize a loss function. This chapter explains critical points, extrema, and Lagrange multipliers with practical examples and ML applications.

8.1 What is Optimization?

Optimization is the process of finding the maximum or minimum of a function. In machine learning, we often minimize a loss function to improve model performance.

Importance in ML: Gradient-based optimization algorithms like gradient descent rely on these concepts to efficiently update weights during training. Understanding critical points also explains why training can stall at poor local minima or saddle points.

8.2 Critical Points & Extrema

A critical point occurs where the gradient of a function is zero (∇f = 0). These points can be:

  • Local minimum: function value is lower than surrounding points.
  • Local maximum: function value is higher than surrounding points.
  • Saddle point: the function curves up in some directions and down in others, so the point is neither a maximum nor a minimum (common in high-dimensional ML loss landscapes).

Example: f(x) = x² → derivative f'(x) = 2x → critical point at x = 0 (global minimum).
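To make this concrete, here is a small sketch of my own (using SymPy, which is not otherwise part of this chapter) that finds the critical point of f(x) = x² and classifies it with the second-derivative test:

import sympy as sp

x = sp.symbols('x', real=True)
f = x**2

# Critical points: solve f'(x) = 0
critical_points = sp.solve(sp.diff(f, x), x)      # [0]
for p in critical_points:
    second = sp.diff(f, x, 2).subs(x, p)          # f''(0) = 2
    if second > 0:
        kind = "local minimum"
    elif second < 0:
        kind = "local maximum"
    else:
        kind = "test inconclusive"
    print(f"x = {p}: f'' = {second} -> {kind}")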

8.3 Lagrange Multipliers

Used for constrained optimization: maximize or minimize a function f(x, y, …) subject to a constraint g(x, y, …) = 0.

Method: Introduce a multiplier λ and solve:

∇f(x, y) = λ ∇g(x, y)
g(x, y) = 0

Example: Maximize f(x, y) = xy subject to x² + y² = 1.


∇f = (y, x)
∇g = (2x, 2y)
Set ∇f = λ ∇g → y = 2λx, x = 2λy
Constraint: x² + y² = 1
Solution: x = y = ±1/√2 gives the maximum f = 1/2 (while x = -y = ±1/√2 gives the minimum f = -1/2).
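The same system can be checked symbolically. The snippet below is a minimal sketch using SymPy (a library not used elsewhere in this chapter); it solves the stationarity conditions ∇f = λ∇g together with the constraint:

import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x * y                      # objective
g = x**2 + y**2 - 1            # constraint written as g(x, y) = 0

# Stationarity (grad f = lambda * grad g) plus the constraint g = 0
eqs = [sp.Eq(sp.diff(f, x), lam * sp.diff(g, x)),
       sp.Eq(sp.diff(f, y), lam * sp.diff(g, y)),
       sp.Eq(g, 0)]

for sol in sp.solve(eqs, [x, y, lam], dict=True):
    print(sol, "f =", f.subs(sol))
# Maximum f = 1/2 at x = y = +-1/sqrt(2); minimum f = -1/2 at x = -y = +-1/sqrt(2).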

8.4 Gradient Descent

Gradient descent is the backbone of ML optimization:

x_new = x_old - η * ∇f(x_old)

where η is the learning rate. Each step moves the parameters in the direction of steepest descent, iteratively approaching a minimum of the loss function.

8.5 Hessian Matrix

The Hessian contains second-order partial derivatives and helps determine the nature of critical points:

  • If the Hessian is positive definite (all eigenvalues positive) → local minimum
  • If the Hessian is negative definite (all eigenvalues negative) → local maximum
  • If the Hessian has eigenvalues of both signs → saddle point

ML context: Second-order optimization methods like Newton's method use the Hessian to converge faster than first-order methods.
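As a rough illustration (my own sketch, not from the chapter), the eigenvalue test above can be carried out with SymPy, here for f(x, y) = x² − y², whose critical point at the origin is a saddle:

import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**2 - y**2                            # saddle at the origin

H = sp.hessian(f, [x, y])                  # Matrix([[2, 0], [0, -2]])
eigenvalues = list(H.eigenvals())          # [2, -2]
print("Hessian:", H.tolist())
print("Eigenvalues:", eigenvalues)

if all(ev > 0 for ev in eigenvalues):
    print("positive definite -> local minimum")
elif all(ev < 0 for ev in eigenvalues):
    print("negative definite -> local maximum")
else:
    print("eigenvalues of both signs -> saddle point")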

8.6 Applications in Machine Learning

  • Training Neural Networks: Gradient descent & variants minimize loss functions.
  • Constrained Optimization: Lagrange multipliers are used in SVMs (maximize margin with constraints).
  • Regularization: Adding constraints or penalties (like L1/L2) is a form of constrained optimization.
  • Newton & Quasi-Newton methods: use (approximate) second-order derivative information for faster convergence.

8.7 Quick Python Example (Gradient Descent)

import numpy as np

# f(x) = x^2, derivative f'(x) = 2x
def f(x): return x**2
def grad_f(x): return 2*x

x = 5  # initial guess
eta = 0.1  # learning rate

for i in range(10):
    x = x - eta * grad_f(x)
    print(f"Step {i+1}, x = {x}, f(x) = {f(x)}")

8.8 Exercises

  1. Find the critical points of f(x, y) = x² + xy + y² and determine their nature using the Hessian.
  2. Use Lagrange multipliers to maximize f(x, y) = x + y subject to x² + y² = 1.
  3. Implement gradient descent for f(x) = x³ - 3x² + 2 and visualize convergence.

Hints / Answers

  1. Solve ∇f = (2x + y, x + 2y) = 0 → the only critical point is (0, 0). The Hessian [[2, 1], [1, 2]] is positive definite, so (0, 0) is a global minimum.
  2. Lagrange: (1, 1) = λ(2x, 2y) → x = y = 1/(2λ); with x² + y² = 1 this gives x = y = ±1/√2, and the maximum value √2 occurs at x = y = 1/√2.
  3. Gradient descent: iterate x_new = x_old - η * f'(x_old) with f'(x) = 3x² - 6x; it converges to the local minimum at x = 2 (a starter sketch follows below).
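For exercise 3, a minimal starting sketch (with an assumed starting point and learning rate, not a prescribed solution):

# f(x) = x^3 - 3x^2 + 2  ->  f'(x) = 3x^2 - 6x, local minimum at x = 2
def grad(x):
    return 3*x**2 - 6*x

x, eta = 3.0, 0.05            # assumed initial guess and learning rate
history = [x]
for _ in range(50):
    x -= eta * grad(x)
    history.append(x)
print(f"x after 50 steps: {x:.4f}")   # close to 2
# Plot `history` (e.g. with matplotlib) to visualize the convergence.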

8.9 Practice Projects / Mini Tasks

  • Implement gradient descent with momentum and compare with vanilla gradient descent.
  • Train a small neural network on a dataset and visualize loss convergence.
  • Use Lagrange multipliers to solve a toy SVM problem with 2D points.

8.10 Further Reading & Videos

  • Deep Learning Book (Goodfellow et al.) — Chapters on Optimization
  • Khan Academy — Multivariable Calculus: Critical Points & Lagrange Multipliers
  • MIT OCW — Optimization in Machine Learning

Next chapter: Gradient, Jacobian & Hessian in Neural Networks — practical applications in backpropagation and advanced ML optimization.
