
NolanGPT

Turns out "Attention is ALL you need"
A transformer trained on Christopher Nolan screenplays.
Built from scratch by Gaurav Jain using PyTorch. No APIs. Inspired by Karpathy's Build GPT2.
Tech spec
What you're interacting with
Architecture: GPT. Decoder-only transformer, the same architecture family as the models behind ChatGPT.
Training data: 1.4M characters across 8 Nolan screenplays.
Training steps: 70K (~57 minutes on a Mac Mini M4).
Final loss: 1.27, down from 4.71 at random initialisation.
Parameters: ~3.7M. GPT-3 has 175 billion. Same idea, roughly 47,000x smaller.
Attention heads: 8. Multi-head self-attention in every block.
Transformer blocks: 4. Each block = attention + feedforward + layer norm.
Embedding dim: 128. Each token is represented as a 128-dimensional vector.
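To put the two loss figures in context, here is a small sketch. It assumes the reported losses are mean per-character cross-entropy in nats (the default for PyTorch's cross-entropy loss), which the page does not state explicitly. It compares them with the loss of a model that guesses uniformly over the 97-character vocabulary, and converts each loss to a perplexity.

```python
import math

VOCAB_SIZE = 97      # unique characters, from the "Trained on" section
INITIAL_LOSS = 4.71  # reported loss at random initialisation
FINAL_LOSS = 1.27    # reported loss after training

# Cross-entropy of a uniform guess over the vocabulary, in nats.
# Assumption: the reported losses use the same unit.
uniform_loss = math.log(VOCAB_SIZE)

# Perplexity = exp(loss): the effective number of equally likely
# next characters the model is choosing between.
initial_perplexity = math.exp(INITIAL_LOSS)
final_perplexity = math.exp(FINAL_LOSS)

print(f"uniform-guess loss: {uniform_loss:.2f}")
print(f"initial perplexity: {initial_perplexity:.1f}")
print(f"final perplexity:   {final_perplexity:.2f}")
```

Under that assumption, the uniform baseline is about 4.57, so a starting loss of 4.71 is what an untrained model would plausibly produce. A final loss of 1.27 means the trained model is, on average, about as uncertain as a choice among 3 to 4 characters.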
01
Token + Position Embeddings
Every character is mapped to a 128-dim vector. Position embeddings are added so the model knows where each character appears in the sequence.
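A minimal PyTorch sketch of this step, not the project's actual code. The vocabulary size and embedding dim come from this page; the context length (`BLOCK_SIZE`) and the choice of learned position embeddings are assumptions made for illustration.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 97   # from the page
EMBED_DIM = 128   # from the page
BLOCK_SIZE = 256  # assumption: the context length is not stated on the page


class Embeddings(nn.Module):
    def __init__(self, vocab_size=VOCAB_SIZE, embed_dim=EMBED_DIM, block_size=BLOCK_SIZE):
        super().__init__()
        self.token = nn.Embedding(vocab_size, embed_dim)
        # Learned position table (assumption); one vector per position.
        self.position = nn.Embedding(block_size, embed_dim)

    def forward(self, idx):
        # idx: (batch, time) tensor of integer character ids
        _, t = idx.shape
        positions = torch.arange(t, device=idx.device)
        # (batch, time, dim) + (time, dim): positions broadcast over the batch
        return self.token(idx) + self.position(positions)


emb = Embeddings()
ids = torch.randint(0, VOCAB_SIZE, (2, 10))  # 2 sequences of 10 characters
out = emb(ids)
print(out.shape)  # torch.Size([2, 10, 128])
```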
02
Multi-Head Self-Attention
8 attention heads run in parallel. Each head can learn different relationships, such as character names, dialogue patterns, or scene structure. Based on "Attention Is All You Need" (Vaswani et al., 2017).
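Roughly what that looks like in PyTorch, as a sketch under assumptions rather than the project's real module. The 8 heads and 128 dims are from this page, which gives each head 16 dimensions. The class name, the fused `qkv` projection, and the absence of dropout are illustrative choices. The causal mask is what makes it decoder-only: each character may only look at earlier characters.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttention(nn.Module):  # hypothetical name
    def __init__(self, embed_dim=128, num_heads=8):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads  # 128 / 8 = 16
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)  # queries, keys, values in one matmul
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):
        b, t, c = x.shape
        q, k, v = self.qkv(x).split(c, dim=2)
        # Reshape each to (batch, heads, time, head_dim) so heads run in parallel.
        q = q.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)

        # Scaled dot-product scores: (batch, heads, time, time)
        scores = (q @ k.transpose(-2, -1)) / math.sqrt(self.head_dim)
        # Causal mask: position i may attend only to positions <= i.
        mask = torch.tril(torch.ones(t, t, dtype=torch.bool, device=x.device))
        scores = scores.masked_fill(~mask, float("-inf"))
        weights = F.softmax(scores, dim=-1)

        out = weights @ v  # (batch, heads, time, head_dim)
        out = out.transpose(1, 2).contiguous().view(b, t, c)  # merge heads
        return self.proj(out)


torch.manual_seed(0)
attn = CausalSelfAttention()
x = torch.randn(1, 6, 128)
y = attn(x)

# Changing the last character must not change outputs at earlier positions.
x_changed = x.clone()
x_changed[:, -1] = 0.0
y_changed = attn(x_changed)
print(torch.allclose(y[:, :-1], y_changed[:, :-1], atol=1e-6))
```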
03
Feedforward + Residual
After attention, a feedforward layer processes each token independently. Residual connections let gradients flow cleanly through the 4 stacked blocks during training.
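One way a block like this can be put together, sketched with PyTorch's built-in `nn.MultiheadAttention` to keep it short. The pre-norm layout (layer norm before each sub-layer), the 4x feedforward expansion, and the GELU activation are assumptions borrowed from common GPT-style implementations; the page only says each block combines attention, feedforward, and layer norm.

```python
import torch
import torch.nn as nn


class Block(nn.Module):  # hypothetical name
    def __init__(self, embed_dim=128, num_heads=8):
        super().__init__()
        self.ln1 = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(embed_dim)
        self.ff = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim),  # 4x expansion is an assumption
            nn.GELU(),
            nn.Linear(4 * embed_dim, embed_dim),
        )

    def forward(self, x):
        t = x.size(1)
        # Boolean mask: True marks future positions that may NOT be attended to.
        causal = torch.triu(
            torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + attn_out              # residual connection around attention
        x = x + self.ff(self.ln2(x))  # residual connection around feedforward
        return x


# 4 stacked blocks, as in the spec.
blocks = nn.Sequential(*[Block() for _ in range(4)])
x = torch.randn(2, 10, 128, requires_grad=True)
y = blocks(x)
y.sum().backward()  # gradients reach the input through the residual paths
print(y.shape, x.grad is not None)
```

Because each sub-layer adds its output to its input instead of replacing it, the gradient always has a direct path back through the `x + ...` additions, however many blocks are stacked.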
04
Trained on
Batman Begins · The Dark Knight · Inception · Interstellar · Dunkirk · Tenet · The Prestige · Oppenheimer. Tokenisation is character-level, giving 97 unique tokens.
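Character-level tokenisation can be sketched in a few lines of plain Python. The `corpus` string below is a made-up stand-in, not the real dataset, and the names `stoi`, `itos`, `encode`, and `decode` are illustrative. On the full screenplays the same approach would yield the 97-entry vocabulary.

```python
# Stand-in text for illustration only; the real corpus is the 8 screenplays.
corpus = "INT. WAREHOUSE - DAY\nThe top keeps spinning."

# The vocabulary is simply every distinct character, in sorted order.
chars = sorted(set(corpus))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> character


def encode(text):
    """Turn a string into a list of integer token ids."""
    return [stoi[ch] for ch in text]


def decode(ids):
    """Turn a list of integer token ids back into a string."""
    return "".join(itos[i] for i in ids)


ids = encode("DAY")
print(len(chars), ids, decode(ids))
```

The trade-off is a tiny vocabulary and no out-of-vocabulary words, at the cost of longer sequences, since every character is its own token.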
Live demo
Play with a (rudimentary) Tesseract