The current runtime executes an ordinary decoder-only transformer one token at a time through handwritten scalar Rust loops compiled to WebAssembly.
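Generating one token at a time means every weight matrix is applied as a matrix-vector product. The sketch below shows what a handwritten scalar loop of that kind might look like. The helper name `matvec` and the row-major `rows x cols` layout are assumptions for illustration, not the runtime's actual code. No explicit SIMD is used, although the compiler may still choose to vectorize it.

```rust
/// Hypothetical helper: y = W x, with W stored row-major as `rows x cols` f32.
/// A plain scalar loop with no explicit SIMD intrinsics.
fn matvec(w: &[f32], x: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(w.len(), rows * cols, "weight buffer does not match shape");
    assert_eq!(x.len(), cols, "input length does not match column count");
    (0..rows)
        .map(|r| {
            // Dot product of one weight row with the input vector.
            let row = &w[r * cols..(r + 1) * cols];
            row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>()
        })
        .collect()
}

fn main() {
    // 2x3 matrix times a length-3 vector.
    let w = [1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0];
    let x = [1.0f32, 0.0, -1.0];
    let y = matvec(&w, &x, 2, 3);
    println!("{:?}", y); // [-2.0, -2.0]
}
```

Every projection in the layer (Q/K/V, WO, W1/W3, W2, and the output projection) reduces to a call like this, so its cost is dominated by streaming the full weight matrix once per token.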
Layer order
Embedding → attention RMSNorm → Q/K/V → RoPE → KV store → causal attention → WO → residual → FFN RMSNorm → W1/W3 → SwiGLU → W2 → residual. Final RMSNorm and output projection produce logits.
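As an illustration of the FFN half of that order (FFN RMSNorm → W1/W3 → SwiGLU → W2 → residual), here is a minimal sketch for a single token vector. It assumes row-major f32 weights, an RMSNorm epsilon of `1e-5`, and the common convention that W1 is the gate projection and W3 the up projection; all function names are hypothetical and the runtime's real parameters may differ.

```rust
// Hypothetical sketch: FFN RMSNorm -> W1/W3 -> SwiGLU -> W2 -> residual.
// Assumptions: row-major f32 weights, eps = 1e-5, W1 = gate, W3 = up.

fn matvec(w: &[f32], x: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    (0..rows)
        .map(|r| w[r * cols..(r + 1) * cols].iter().zip(x).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

/// RMSNorm: x_i * g_i / sqrt(mean(x^2) + eps).
fn rms_norm(x: &[f32], gain: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(gain).map(|(v, g)| v * inv * g).collect()
}

/// SiLU (swish) activation used inside SwiGLU.
fn silu(v: f32) -> f32 {
    v / (1.0 + (-v).exp())
}

/// Applies the FFN sub-block and adds the result back into `x` (residual).
fn ffn_block(x: &mut [f32], gain: &[f32], w1: &[f32], w3: &[f32], w2: &[f32], hidden: usize) {
    let dim = x.len();
    let h = rms_norm(x, gain, 1e-5);
    let gate = matvec(w1, &h, hidden, dim); // W1: dim -> hidden
    let up = matvec(w3, &h, hidden, dim); // W3: dim -> hidden
    // SwiGLU: silu(gate) * up, elementwise.
    let act: Vec<f32> = gate.iter().zip(&up).map(|(g, u)| silu(*g) * u).collect();
    let out = matvec(w2, &act, dim, hidden); // W2: hidden -> dim
    for (xi, oi) in x.iter_mut().zip(out) {
        *xi += oi;
    }
}

fn main() {
    let mut x = vec![1.0f32, 1.0];
    let gain = vec![1.0f32; 2];
    let w1 = vec![0.0f32; 4]; // zero gate, so the SwiGLU output is zero
    let w3 = vec![1.0f32; 4];
    let w2 = vec![1.0f32; 4];
    ffn_block(&mut x, &gain, &w1, &w3, &w2, 2);
    println!("{:?}", x); // [1.0, 1.0]: only the residual survives
}
```

The attention half follows the same pattern (norm, projections, residual add), with RoPE, the KV store, and causal attention between the Q/K/V and WO projections.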
Current bottleneck
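The runtime has no batched prefill, no SIMD-specific kernels, no WebGPU dispatch, and no worker isolation, so prompt tokens go through the same one-token-at-a-time scalar path as generated tokens. Quantization mainly reduces the number of weight bytes read per token; it does not change the scalar execution topology.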
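To make the bandwidth-versus-topology point concrete, the sketch below uses a hypothetical symmetric int8 format with one f32 scale per output row; the source does not specify the runtime's actual quantization scheme, and `QMatrix` and its methods are illustrative names only. The matrix-vector loop keeps exactly the same row-by-row scalar shape as an f32 version; only the element type being loaded changes.

```rust
// Hypothetical symmetric int8 format: one f32 scale per output row.
struct QMatrix {
    rows: usize,
    cols: usize,
    q: Vec<i8>,       // row-major quantized weights
    scales: Vec<f32>, // one dequantization scale per row
}

impl QMatrix {
    /// Quantizes a row-major f32 matrix so that each row's max |w| maps to 127.
    fn quantize(w: &[f32], rows: usize, cols: usize) -> Self {
        let mut q = Vec::with_capacity(rows * cols);
        let mut scales = Vec::with_capacity(rows);
        for r in 0..rows {
            let row = &w[r * cols..(r + 1) * cols];
            let max_abs = row.iter().fold(0.0f32, |m, v| m.max(v.abs()));
            let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
            scales.push(scale);
            q.extend(row.iter().map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8));
        }
        QMatrix { rows, cols, q, scales }
    }

    /// Same scalar row-by-row loop as the f32 path; the weights are
    /// dequantized on the fly by applying the row scale to the accumulator.
    fn matvec(&self, x: &[f32]) -> Vec<f32> {
        (0..self.rows)
            .map(|r| {
                let row = &self.q[r * self.cols..(r + 1) * self.cols];
                let acc: f32 = row.iter().zip(x).map(|(&qv, xv)| qv as f32 * xv).sum();
                acc * self.scales[r]
            })
            .collect()
    }

    /// Bytes of weight data read per full matvec (int8 values plus scales).
    fn weight_bytes(&self) -> usize {
        self.q.len() + self.scales.len() * std::mem::size_of::<f32>()
    }
}

fn main() {
    let w = [0.5f32, -1.0, 0.25, 1.0];
    let qm = QMatrix::quantize(&w, 2, 2);
    println!("{:?}", qm.matvec(&[1.0, 1.0])); // approximately [-0.5, 1.25]
    println!("f32 bytes: {}, int8 bytes: {}", w.len() * 4, qm.weight_bytes());
}
```

For realistic row lengths the per-row scale is negligible, so weight traffic per token drops to roughly a quarter of the f32 case, while the number of scalar multiply-adds and the one-row-at-a-time loop structure stay the same.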