How Optimizers Work
From vanilla gradient descent to Adam — the intuition behind SGD, Momentum, RMSProp, and AdamW, with interactive demos at every step.
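The four update rules the article contrasts can be sketched in a few lines each. This is a hypothetical minimal version, not the article's code: each optimizer is run on the toy objective f(w) = 0.5·w², whose gradient is simply w, and the hyperparameters are illustrative defaults.

```python
import math

# Minimal sketches of the update rules, applied to f(w) = 0.5 * w**2,
# so grad f(w) = w. Hyperparameters are illustrative, not prescriptions.

def sgd(w, g, lr=0.1):
    return w - lr * g

def momentum_step(w, v, g, lr=0.1, beta=0.9):
    v = beta * v + g                  # accumulate a running direction
    return w - lr * v, v

def rmsprop_step(w, s, g, lr=0.01, beta=0.99, eps=1e-8):
    s = beta * s + (1 - beta) * g**2  # per-parameter gradient scale
    return w - lr * g / (math.sqrt(s) + eps), s

def adamw_step(w, m, v, g, t, lr=0.01, b1=0.9, b2=0.999,
               eps=1e-8, wd=0.01):
    m = b1 * m + (1 - b1) * g         # first moment (Momentum idea)
    v = b2 * v + (1 - b2) * g**2      # second moment (RMSProp idea)
    m_hat = m / (1 - b1**t)           # bias correction for zero init
    v_hat = v / (1 - b2**t)
    # decoupled weight decay: applied to w directly, not via the gradient
    return w - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * w), m, v

w = 5.0
for _ in range(100):
    w = sgd(w, w)
print("SGD:", round(w, 4))

w, m, v = 5.0, 0.0, 0.0
for t in range(1, 101):
    w, m, v = adamw_step(w, m, v, w, t)
print("AdamW:", round(w, 4))
```

Reading them side by side makes the lineage visible: Momentum adds a velocity to SGD, RMSProp adds a per-parameter scale, and AdamW combines both and decouples weight decay from the gradient.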
A neural network warps the space your data lives in until the answer becomes obvious. The geometric intuition, with interactive demos.
Without nonlinearity, depth is useless. From sigmoid to GELU, how each activation shapes learning and why the field keeps inventing new ones.
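The activations that teaser traces can be written out directly. A small sketch, assuming the common tanh approximation for GELU (exact forms vary by library):

```python
import math

def sigmoid(x):
    # squashes to (0, 1); saturates for large |x|
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # zero for negatives, identity for positives
    return max(0.0, x)

def gelu(x):
    # tanh approximation of GELU: a smooth, non-monotonic ReLU relative
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x**3)))

for x in (-2.0, 0.0, 2.0):
    print(x, round(sigmoid(x), 3), relu(x), round(gelu(x), 3))
```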
Walk through the chain rule step by step, from a simple computation graph to a full neural network pass.
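That kind of step-by-step walk can be sketched on a tiny computation graph. A hand-rolled example (not the article's own graph) for y = (a·b + c)², with one local derivative per node:

```python
# forward pass through the graph y = (a*b + c)**2
a, b, c = 2.0, 3.0, 1.0
u = a * b           # 6.0
v = u + c           # 7.0
y = v ** 2          # 49.0

# backward pass: chain rule, one local derivative per node
dy_dv = 2 * v       # d(v^2)/dv = 2v
dy_du = dy_dv * 1.0 # d(u + c)/du = 1
dy_da = dy_du * b   # d(a*b)/da = b
dy_db = dy_du * a   # d(a*b)/db = a
dy_dc = dy_dv * 1.0 # d(u + c)/dc = 1
print(dy_da, dy_db, dy_dc)  # 42.0 28.0 14.0
```

Each line multiplies the gradient flowing in from above by one local derivative, which is all backpropagation does at scale.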
How queries, keys, and values interact, and why scaling by √d matters more than it sounds.
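The √d point has a quick numerical illustration. A sketch, assuming unit-variance random queries and keys: the dot product q·k has variance d, so its typical magnitude grows like √d and would push softmax into saturation; dividing by √d keeps it O(1).

```python
import math
import random

random.seed(0)

def avg_abs_dot(d, scaled, trials=500):
    """Average |q.k| for random unit-variance q, k in dimension d."""
    total = 0.0
    for _ in range(trials):
        q = [random.gauss(0, 1) for _ in range(d)]
        k = [random.gauss(0, 1) for _ in range(d)]
        dot = sum(qi * ki for qi, ki in zip(q, k))
        total += abs(dot / math.sqrt(d)) if scaled else abs(dot)
    return total / trials

# unscaled magnitude grows like sqrt(d); scaled stays roughly constant
for d in (4, 64, 1024):
    print(d, round(avg_abs_dot(d, False), 2), round(avg_abs_dot(d, True), 2))
```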