Interactive Gradient Descent Visualization: How Derivatives Drive Weight Updates
Gradient descent is the backbone of modern machine learning. But understanding why it works — how a simple derivative tells the model which direction to adjust — can be tricky without seeing it in action. This interactive visualization lets you step through the process one iteration at a time.
What You'll See
The visualization below presents three carefully chosen scenarios that reveal the core mechanics of gradient descent:
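- Scenario 1 (Underprediction): When the prediction ŷ is below the target y=1, the derivative dL/dw is negative, provided ŷ increases with w. The update rule w ← w − α·dL/dw subtracts that negative value, so w increases and the prediction is pushed upward.
- Scenario 2 (Overprediction): When ŷ overshoots the target y=0, the derivative is positive. The same update rule now subtracts a positive value, so w decreases and the prediction is pulled downward.
- Scenario 3 (Oscillation): With an excessively large learning rate (α=4.0), each step carries the weight past the optimum, so it bounces from one side to the other before converging.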
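To make the sign logic of the first two scenarios concrete, here is a minimal Python sketch. It assumes a one-weight logistic model ŷ = sigmoid(w·x) trained with binary cross-entropy, for which dL/dw = (ŷ − y)·x. The model, the input x = 1, the starting weights, and the learning rate α = 0.5 are illustrative assumptions and not necessarily what the demo uses.

```python
import math


def sigmoid(z):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def grad(w, x, y):
    """dL/dw for binary cross-entropy with y_hat = sigmoid(w * x) (assumed model)."""
    return (sigmoid(w * x) - y) * x


def step(w, x, y, alpha):
    """One gradient descent update: w <- w - alpha * dL/dw."""
    return w - alpha * grad(w, x, y)


# Scenario 1 (underprediction): target y = 1, starting weight chosen so y_hat < 0.5.
g_under = grad(w=-1.0, x=1.0, y=1.0)
w_under = step(w=-1.0, x=1.0, y=1.0, alpha=0.5)
print("underprediction: gradient", g_under, "new w", w_under)  # gradient < 0, w goes up

# Scenario 2 (overprediction): target y = 0, starting weight chosen so y_hat > 0.5.
g_over = grad(w=1.0, x=1.0, y=0.0)
w_over = step(w=1.0, x=1.0, y=0.0, alpha=0.5)
print("overprediction: gradient", g_over, "new w", w_over)  # gradient > 0, w goes down
```

In both cases the update moves w against the sign of the gradient, which is the behavior the scenarios describe.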
Interactive Demo
Click through the iterations or hit Auto Play to watch the optimization unfold.
Key Takeaways
- When the prediction is too low, moving w in the positive direction reduces the loss.
- When the prediction is too high, moving w in the negative direction reduces the loss.
- Too large a learning rate makes the weight oscillate around the minimum (and, if large enough, diverge). Too small a learning rate makes convergence painfully slow.
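The learning-rate trade-off can be sketched in a few lines of Python. This example swaps in a toy quadratic loss L(w) = 0.5·c·(w − target)², which is an assumption made for simplicity and not the demo's model. The curvature c = 0.4, the target of 1.0, and the small rate α = 0.05 are hypothetical values, chosen so that α = 4.0 oscillates yet still converges.

```python
def descend(alpha, w0=0.0, steps=20, target=1.0, curvature=0.4):
    """Gradient descent on the toy loss L(w) = 0.5 * curvature * (w - target)**2."""
    w = w0
    path = [w]
    for _ in range(steps):
        grad = curvature * (w - target)  # dL/dw for the toy quadratic loss
        w = w - alpha * grad             # update rule: w <- w - alpha * dL/dw
        path.append(w)
    return path


def sign_flips(path, target=1.0):
    """Count how often the weight jumps from one side of the target to the other."""
    errors = [w - target for w in path]
    return sum(1 for a, b in zip(errors, errors[1:]) if a * b < 0)


large = descend(alpha=4.0)   # each step overshoots the minimum
small = descend(alpha=0.05)  # each step barely moves

print("alpha=4.0  flips:", sign_flips(large), "final error:", abs(large[-1] - 1.0))
print("alpha=0.05 flips:", sign_flips(small), "final error:", abs(small[-1] - 1.0))
```

On this toy loss each update multiplies the error by (1 − α·c). With α = 4.0 that factor is −0.6, so the error flips sign on every step while shrinking. With α = 0.05 the factor is 0.98, so the weight creeps toward the target without ever crossing it.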