A field guide to modern AI

Intelligence, reduced to a single question: what comes next?

Strip away the marketing and every large language model does one humble thing, over and over, one token at a time. It reads what came before and guesses the next fragment of text. Watch one do it.

The mechanism

A model doesn't know things. It weighs them.

There is no fact lookup and no little person inside. There is a pipeline: four moves, repeated until the reply is complete.

tokenize
Text becomes numbers

Your sentence is sliced into tokens — roughly word-pieces — and each becomes an ID. A model never sees letters, only positions in a vocabulary.
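
To make the slicing concrete, here is a minimal sketch of a greedy longest-match tokenizer. The eight-entry `VOCAB` and the `tokenize` helper are invented for illustration; real tokenizers learn vocabularies of tens of thousands of pieces and use more careful algorithms.

```python
# Hypothetical toy vocabulary, written by hand for this example.
VOCAB = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5, "s": 6, " ": 7}


def tokenize(text: str) -> list[int]:
    """Greedy longest-match: at each position, take the longest piece found in VOCAB."""
    text = text.lower()
    ids = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i = end
                break
        else:
            # No piece matched: emit the unknown ID and skip one character.
            ids.append(VOCAB["<unk>"])
            i += 1
    return ids


print(tokenize("The cats sat"))
```

Note that "cats" is not in the toy vocabulary, so it comes out as two pieces, "cat" and "s". That is the word-piece idea in miniature.
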
embed
Numbers become meaning

Each token is mapped to a long vector of numbers. Words used in similar ways land near each other in this space, so geometry starts to stand in for meaning.
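
A rough sketch of "near each other", assuming three hand-picked vectors in place of learned ones. The numbers in `EMBEDDINGS` are made up; real embeddings are learned during training and run to hundreds or thousands of dimensions. Cosine similarity is one common way to measure closeness.

```python
import math

# Hand-picked 3-dimensional vectors, purely illustrative (not learned).
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


cat_dog = cosine(EMBEDDINGS["cat"], EMBEDDINGS["dog"])
cat_car = cosine(EMBEDDINGS["cat"], EMBEDDINGS["car"])
print(f"cat~dog: {cat_dog:.2f}, cat~car: {cat_car:.2f}")
```

With these invented numbers, "cat" sits much closer to "dog" than to "car", which is the sense in which geometry stands in for meaning.
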
attend
Context gets mixed

Attention lets each token look back at the tokens before it and decide which ones matter. "It" learns which noun it points back to; a verb finds its subject.
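
The mixing can be sketched as scaled dot-product attention in plain Python. This is a simplification under stated assumptions: one head, no learned projections (each vector serves as its own query, key, and value), and a causal mask of the kind next-token models use, so each position sees only itself and earlier positions. The function names are illustrative.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(vectors: list[list[float]]) -> list[list[float]]:
    """For each position, return a weighted mix of the vectors up to and including it."""
    dim = len(vectors[0])
    mixed = []
    for i, query in enumerate(vectors):
        visible = vectors[: i + 1]  # causal mask: no peeking at later positions
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visible
        ]
        weights = softmax(scores)
        mixed.append(
            [sum(w * v[d] for w, v in zip(weights, visible)) for d in range(dim)]
        )
    return mixed


out = causal_attention([[1.0, 0.0], [0.0, 1.0]])
print(out)
```

The first position has nothing earlier to look at, so it comes back unchanged. The second becomes a blend of both vectors, weighted toward the one it matches best.
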
predict
Meaning becomes a guess

The mixed signal is projected onto the whole vocabulary, yielding a probability for every possible next token. One is sampled. Then the loop runs again.
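
The last move can be sketched in a few lines: raw scores become probabilities through a softmax, and one token is drawn at random in proportion to them. The candidate tokens and their scores below are invented for illustration; a real model scores its entire vocabulary.

```python
import math
import random


def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
candidates = ["mat", "sofa", "moon"]
logits = [3.0, 2.0, 0.1]
probs = softmax(logits)

# Sample one token; likelier tokens win more often, but not always.
rng = random.Random(0)  # fixed seed so repeated runs give the same draw
next_token = rng.choices(candidates, weights=probs, k=1)[0]

print({tok: round(p, 3) for tok, p in zip(candidates, probs)})
print("sampled:", next_token)
```

Sampling, rather than always taking the top choice, is why the same prompt can produce different continuations from one run to the next.
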

A large model is a compression of much of what people have written, shaped until the most likely continuation is usually the useful one.

The vocabulary

Eight words that unlock the rest

The field hides plain ideas behind borrowed jargon. Here they are in plain terms.

How we got here

Seventy years, four breakthroughs

The dates are real. Each entry marks the moment a stuck idea came unstuck.

Your turn

Feel the probabilities shift

Pick a sentence opening. This isn't a real model — it's a hand-built toy with scripted odds — but the behaviour is honest: change the context, change the distribution, and the most likely next token changes with it.

The cat sat on the ▍
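
A toy like the one described above might look roughly like this. The table is a guess at the idea, not the page's actual script: both the second sentence opening and all of the odds are invented for illustration.

```python
# Scripted odds, written by hand; nothing here is learned from data.
SCRIPTED_ODDS = {
    "The cat sat on the": {"mat": 0.6, "sofa": 0.3, "moon": 0.1},
    "The rocket flew to the": {"moon": 0.7, "station": 0.2, "mat": 0.1},
}


def next_token_odds(context: str) -> dict[str, float]:
    """Look up the scripted distribution for a known sentence opening."""
    return SCRIPTED_ODDS[context]


def most_likely(context: str) -> str:
    """Return the token with the highest scripted probability."""
    odds = next_token_odds(context)
    return max(odds, key=odds.get)


for context in SCRIPTED_ODDS:
    print(f"{context} -> {most_likely(context)}")
```

Swap the opening and the same candidate word, "mat" or "moon", moves from favourite to long shot. That shift is the whole behaviour in miniature.
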

In fairness

What prediction can't promise

It can be confidently wrong

A high-probability next token is the most plausible one, not the most true one. When the training data is thin, fluent fiction fills the gap.

It has no clock and no memory

Each request starts cold. Whatever it seems to "remember" was either packed into the prompt or baked in during training, frozen at a cutoff date.
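
One way to picture "packed into the prompt": a chat application can resend the whole conversation with every request. The sketch below assumes a simple `speaker: text` layout; `build_prompt` and that layout are hypothetical, and real systems use their own message formats.

```python
def build_prompt(history: list[tuple[str, str]], new_message: str) -> str:
    """Flatten earlier turns plus the new message into one block of text."""
    lines = [f"{speaker}: {text}" for speaker, text in history]
    lines.append(f"user: {new_message}")
    lines.append("assistant:")  # the model continues from here
    return "\n".join(lines)


history = [
    ("user", "My name is Ada."),
    ("assistant", "Nice to meet you, Ada."),
]
prompt = build_prompt(history, "What is my name?")
print(prompt)
```

The model can answer only because the earlier turn travels inside the prompt. Drop `history` and the name is gone.
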

It reflects its data

Patterns include the biases of the people who wrote the text. Alignment work pushes back, but the substrate is us — flaws included.
