A 16-million-parameter reference workload is small enough to make tensor layout, quantization, tokenizer design, KV-cache growth, and browser loading understandable. It is still not a universal model specification.
What to record
- Canonical model source and revision.
- Exact layer, hidden, head, vocabulary, and context dimensions.
- Tokenizer files and special-token IDs.
- Tensor data types, quantization variant, block or group size, and mixed-precision exceptions.
- Artifact size, peak host memory, peak device memory, and context-dependent state.
Do not publish estimates as measurements
Parameter-derived file-size estimates and projected tokens per second are planning inputs. Published support requires measured data on named hardware and a reproducible implementation revision.