A field guide to modern AI
Intelligence,
reduced to a single question:
what comes next?
Strip away the marketing and every large language model does one humble thing, over and over, one token at a time. It reads what came before and guesses the next fragment of text. Watch one do it.
The mechanism
A model doesn't know things. It weighs them.
There is no fact lookup and no little person inside. There is a pipeline, four moves repeated until the model emits a stop token or reaches its length limit.
-
tokenize
Text becomes numbers
Your sentence is sliced into tokens — roughly word-pieces — and each becomes an ID. A model never sees letters, only positions in a vocabulary.
-
embed
Numbers become meaning
Each token is mapped to a long vector of numbers. Words used in similar ways land near each other in this space, so geometry starts to stand in for meaning.
-
attend
Context gets mixed
Attention lets each token look at the others in its context (in most language models, only the ones that came before it) and decide what matters. "It" learns which noun it points back to; a verb finds its subject.
-
predict
Meaning becomes a guess
The mixed signal is projected onto the whole vocabulary, yielding a probability for every possible next token. One is sampled. Then the loop runs again.
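The four moves above can be sketched in a few dozen lines of plain Python. This is a toy under loud assumptions, not how any production model is built: the six-word vocabulary, the width `DIM`, and the random "weights" are all invented for illustration, the tokenizer just splits on whitespace, attention is a single head with no learned projections, and the output layer reuses the embedding table. Nothing here is trained, so the text it produces is noise. The point is only the shape of the loop.

```python
import math
import random

# Hypothetical toy vocabulary; real models use tens of thousands of word-pieces.
VOCAB = ["the", "cat", "sat", "on", "mat", "<unk>"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}
DIM = 4  # embedding width, chosen arbitrarily for this sketch

_init = random.Random(0)
# Random stand-ins for learned weights: one vector per vocabulary entry.
EMBEDDINGS = [[_init.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]


def tokenize(text):
    """Text becomes numbers: map each whitespace-separated word to an ID."""
    unk = TOKEN_TO_ID["<unk>"]
    return [TOKEN_TO_ID.get(word, unk) for word in text.lower().split()]


def embed(token_ids):
    """Numbers become vectors: look up one row per token ID."""
    return [EMBEDDINGS[i] for i in token_ids]


def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(vectors):
    """Context gets mixed: the last position takes a weighted average of
    all positions, weighted by scaled dot-product similarity."""
    query = vectors[-1]
    weights = softmax([dot(query, v) / math.sqrt(DIM) for v in vectors])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(DIM)]


def predict(mixed):
    """Vectors become a guess: score every vocabulary entry, then normalize."""
    return softmax([dot(mixed, e) for e in EMBEDDINGS])


def next_token(text, sampler):
    """One pass through the pipeline: returns the sampled token and the odds."""
    probs = predict(attend(embed(tokenize(text))))
    choice = sampler.choices(range(len(VOCAB)), weights=probs, k=1)[0]
    return VOCAB[choice], probs


def generate(text, steps, seed=0):
    """Run the loop: sample a token, append it, feed the longer text back in."""
    sampler = random.Random(seed)
    for _ in range(steps):
        token, _ = next_token(text, sampler)
        text = text + " " + token
    return text
```

Calling `generate("the cat sat on the", 3)` appends three sampled tokens to the opening. Each pass re-reads the whole text so far, which is why the newest token can change what comes after it.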
A large model is a compression of much of what people have written, shaped until the most likely continuation is usually the useful one.
The vocabulary
Eight words that unlock the rest
The field hides plain ideas behind borrowed jargon. Here they are in plain terms.
How we got here
Seventy years, four breakthroughs
The dates are real chronology — each entry is the moment a stuck idea came unstuck.
Your turn
Feel the probabilities shift
Pick a sentence opening. This isn't a real model — it's a hand-built toy with scripted odds — but the behaviour is honest: change the context, change the distribution, and the most likely next token changes with it.
The cat sat on the ...
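A toy like this can be as small as a lookup table. The sketch below is one hypothetical way to build it, not the code behind this page: the odds are made up by hand, and the second opening ("The astronaut sat on the") is an invented example added to show the contrast.

```python
import random

# Hand-scripted odds, invented for illustration; no model produced these numbers.
SCRIPTED_ODDS = {
    "The cat sat on the": {
        "mat": 0.55, "sofa": 0.25, "windowsill": 0.15, "moon": 0.05,
    },
    "The astronaut sat on the": {
        "moon": 0.50, "launchpad": 0.30, "sofa": 0.15, "mat": 0.05,
    },
}


def most_likely(context):
    """Return the single highest-probability next token for this opening."""
    odds = SCRIPTED_ODDS[context]
    return max(odds, key=odds.get)


def sample(context, rng):
    """Draw one next token at random, in proportion to the scripted odds."""
    odds = SCRIPTED_ODDS[context]
    tokens = list(odds)
    return rng.choices(tokens, weights=[odds[t] for t in tokens], k=1)[0]
```

With these scripted numbers, `most_likely` picks "mat" for the cat and "moon" for the astronaut. `sample` will sometimes return a less likely token, which is why the same prompt can yield different continuations.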
In fairness
What prediction can't promise
It can be confidently wrong
A high-probability next token is the most plausible one, not the most true one. When the training data is thin, fluent fiction fills the gap.
It has no clock and no memory
Each request starts cold. Whatever it seems to "remember" was either packed into the prompt or baked in during training, frozen at a cutoff date.
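Here is a minimal sketch of what "packed into the prompt" means in practice. `toy_model` is a hypothetical stand-in, not a real API: it can only answer from the text it is handed on that one call. The apparent memory lives entirely in the caller, which resends earlier turns every time.

```python
def toy_model(prompt):
    """Hypothetical stand-in for a model call: it sees only `prompt`.
    It can state a name only if the name appears in the text it was handed."""
    marker = "my name is "
    lowered = prompt.lower()
    if marker in lowered:
        start = lowered.index(marker) + len(marker)
        name = prompt[start:].split()[0].strip(".,!?")
        return f"Your name is {name}."
    return "I don't know your name."


def ask_cold(question):
    """Each request starts cold: only the new question is sent."""
    return toy_model(question)


def ask_with_history(history, question):
    """The caller packs earlier turns into the prompt on every request."""
    prompt = "\n".join(history + [question])
    return toy_model(prompt)
```

Asked "What is my name?" on its own, the toy has nothing to go on. Asked the same question with `["My name is Ada."]` as history, it answers correctly, because the earlier turn is now part of the prompt.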
It reflects its data
The patterns it learns include the biases of the people who wrote the text. Alignment work pushes back, but the substrate is us, flaws included.