microGPT in your browser

A 4,192-parameter transformer compiled from C to WebAssembly, running the same trained weights as the TALOS-V2 FPGA. It generates plausible names character by character via multinomial sampling. · overview · full report · source

How this compares

Same model, different substrates. The "your browser" row updates after you run the benchmark.

Implementation | Source | tok/sec | vs FPGA
MLX GPU | measured locally, M4 Pro (24 GB) | 1,865 | 0.04×
MLX CPU | measured locally, M4 Pro (24 GB) | 3,873 | 0.07×
pure Python | measured locally, M4 Pro (24 GB) | 4,332 | 0.08×
pure Python (M5 Pro) | talos-vs-macbook README | 8,491 | 0.16×
NumPy fp32 | measured locally, M4 Pro (24 GB) | 24,223 | 0.46×
NumPy fp32 (M5 Pro) | talos-vs-macbook README | 47,589 | 0.90×
TALOS-V2 (FPGA, 56 MHz) | v2.talos.wtf write-up · repo | 53,000 | 1.00×
WASM browser (M4 Pro 24 GB, regular Chrome) | bench_runs.txt · raw runs | 1,341,206 | 25.30×
WASM browser (M4 Pro 24 GB, Electron preview) | bench_runs.txt · raw runs | 2,038,131 | 38.46×
your browser (WASM, live) | live, this page | — | —
C fp32+AVX2 (Intel) | microgpt-c README | 2,631,689 | 49.65×
C+NEON Q4.12 (M4 Pro) | measured locally, M4 Pro (24 GB) | 2,191,219 | 41.34×
C+NEON Q4.12 (M5 Pro) | talos-vs-macbook README | 3,373,950 | 63.66×
C+NEON fp32 (M4 Pro) | measured locally, M4 Pro (24 GB) | 3,820,760 | 72.09×
C+NEON fp32 (M5 Pro) | talos-vs-macbook README | 6,713,978 | 126.68×
C+NEON ×14 streams (M4 Pro) | measured locally, M4 Pro (24 GB) | 32,894,149 | 620.6×
C+NEON ×18 streams (M5 Pro) | talos-vs-macbook README | 85,967,850 | 1,621.7×
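For reference, the "vs FPGA" column is simply each row's tok/sec divided by the 53,000 tok/sec TALOS-V2 baseline; the function name below is illustrative, not from the repo:

```c
/* "vs FPGA" ratio: a row's tok/sec over the 53,000 tok/sec TALOS-V2
 * baseline, e.g. vs_fpga(2038131.0) for the Electron row (values in
 * the table are rounded to two decimals). */
double vs_fpga(double tok_per_sec) {
    return tok_per_sec / 53000.0;
}
```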

"Source" indicates whether the number was measured locally on the M4 Pro (24 GB, 10P+4E) or quoted from the upstream repos as published. Important caveats: (1) cross-machine numbers (M5 Pro vs M4 Pro) are not apples-to-apples: the M5 Pro's single-stream throughput is roughly 1.5–1.8× the M4 Pro's due to clock and IPC changes; (2) the C+NEON benchmark precomputes its (token, pos) embedding + RMSNorm + Q/K/V lookup tables outside the timed loop, while the WASM forward pass does that work every step. WASM vs C+NEON is therefore "browser WASM vs LUT-optimized native," not a strict same-workload comparison.

What's running. The C source for the entire forward pass (~150 lines) is in microgpt_inf.c, compiled with emcc -O3 -msimd128 -ffast-math. The weights are 16,768 bytes of fp32 (4,192 parameters × 4 bytes) loaded into the WASM module at startup. The architecture matches Karpathy's microGPT exactly: n_embd=16, n_head=4, block_size=16, vocab=27, one transformer block. Sampling is multinomial at temperature 0.5.
Heads-up: your number depends on your browser's V8 build. The same .wasm binary on the same M4 Pro hardware measures ~1.34M tok/sec in regular Chrome 145 and ~2.04M tok/sec in Electron 41's bundled Chromium 146 — about a 50% delta from the runtime alone. Click Benchmark in your own browser to see what your combination of hardware + V8 produces. Both numbers are recorded in bench_runs.txt.