How Optimizers Work
From vanilla gradient descent to Adam — the intuition behind SGD, Momentum, RMSProp, and AdamW, with interactive demos at every step.
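The four update rules the article contrasts can be sketched in a few lines each. This is a hypothetical minimal version, not the article's code: each optimizer is run on the toy objective f(w) = 0.5·w², whose gradient is simply w, and the hyperparameters are illustrative defaults.

```python
import math

# Minimal sketches of the update rules, applied to f(w) = 0.5 * w**2,
# so grad f(w) = w. Hyperparameters are illustrative, not prescriptions.

def sgd(w, g, lr=0.1):
    return w - lr * g

def momentum_step(w, v, g, lr=0.1, beta=0.9):
    v = beta * v + g                  # accumulate a running direction
    return w - lr * v, v

def rmsprop_step(w, s, g, lr=0.01, beta=0.99, eps=1e-8):
    s = beta * s + (1 - beta) * g**2  # per-parameter gradient scale
    return w - lr * g / (math.sqrt(s) + eps), s

def adamw_step(w, m, v, g, t, lr=0.01, b1=0.9, b2=0.999,
               eps=1e-8, wd=0.01):
    m = b1 * m + (1 - b1) * g         # first moment (Momentum idea)
    v = b2 * v + (1 - b2) * g**2      # second moment (RMSProp idea)
    m_hat = m / (1 - b1**t)           # bias correction for zero init
    v_hat = v / (1 - b2**t)
    # decoupled weight decay: applied to w directly, not via the gradient
    return w - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * w), m, v

w = 5.0
for _ in range(100):
    w = sgd(w, w)
print("SGD:", round(w, 4))

w, m, v = 5.0, 0.0, 0.0
for t in range(1, 101):
    w, m, v = adamw_step(w, m, v, w, t)
print("AdamW:", round(w, 4))
```

Reading them side by side makes the lineage visible: Momentum adds a velocity to SGD, RMSProp adds a per-parameter scale, and AdamW combines both and decouples weight decay from the gradient.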
A neural network warps the space your data lives in until the answer becomes obvious. The geometric intuition, with interactive demos.
Without nonlinearity, depth is useless. From sigmoid to GELU, how each activation shapes learning and why the field keeps inventing new ones.
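The activations that teaser traces can be written out directly. A small sketch, assuming the common tanh approximation for GELU (exact forms vary by library):

```python
import math

def sigmoid(x):
    # squashes to (0, 1); saturates for large |x|
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # zero for negatives, identity for positives
    return max(0.0, x)

def gelu(x):
    # tanh approximation of GELU: a smooth, non-monotonic ReLU relative
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x**3)))

for x in (-2.0, 0.0, 2.0):
    print(x, round(sigmoid(x), 3), relu(x), round(gelu(x), 3))
```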
Walk through the chain rule step by step, from a simple computation graph to a full neural network pass.
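That kind of step-by-step walk can be sketched on a tiny computation graph. A hand-rolled example (not the article's own graph) for y = (a·b + c)², with one local derivative per node:

```python
# forward pass through the graph y = (a*b + c)**2
a, b, c = 2.0, 3.0, 1.0
u = a * b           # 6.0
v = u + c           # 7.0
y = v ** 2          # 49.0

# backward pass: chain rule, one local derivative per node
dy_dv = 2 * v       # d(v^2)/dv = 2v
dy_du = dy_dv * 1.0 # d(u + c)/du = 1
dy_da = dy_du * b   # d(a*b)/da = b
dy_db = dy_du * a   # d(a*b)/db = a
dy_dc = dy_dv * 1.0 # d(u + c)/dc = 1
print(dy_da, dy_db, dy_dc)  # 42.0 28.0 14.0
```

Each line multiplies the gradient flowing in from above by one local derivative, which is all backpropagation does at scale.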
How queries, keys, and values interact, and why scaling by √d matters more than it sounds.
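The √d point has a quick numerical illustration. A sketch, assuming unit-variance random queries and keys: the dot product q·k has variance d, so its typical magnitude grows like √d and would push softmax into saturation; dividing by √d keeps it O(1).

```python
import math
import random

random.seed(0)

def avg_abs_dot(d, scaled, trials=500):
    """Average |q.k| for random unit-variance q, k in dimension d."""
    total = 0.0
    for _ in range(trials):
        q = [random.gauss(0, 1) for _ in range(d)]
        k = [random.gauss(0, 1) for _ in range(d)]
        dot = sum(qi * ki for qi, ki in zip(q, k))
        total += abs(dot / math.sqrt(d)) if scaled else abs(dot)
    return total / trials

# unscaled magnitude grows like sqrt(d); scaled stays roughly constant
for d in (4, 64, 1024):
    print(d, round(avg_abs_dot(d, False), 2), round(avg_abs_dot(d, True), 2))
```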