TinyLM-16M as a reference workload, not a support claim

A compact transformer profile makes runtime design concrete, but every dimension, size, quality figure, and performance number remains subject to verification against the actual artifact.

Admin-72ADh

2026-06-25

Research Note

A 16-million-parameter reference workload is small enough to make tensor layout, quantization, tokenizer design, KV-cache growth, and browser loading understandable. It is still not a universal model specification.
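As an illustration of why KV-cache growth is easy to reason about at this scale, here is a minimal sketch. It assumes a standard decoder that caches one key tensor and one value tensor per layer. The function name kv_cache_bytes and every dimension in the example call are placeholders chosen for illustration, not verified TinyLM-16M values.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch=1):
    """Planning estimate of KV-cache size in bytes.

    Assumes one K and one V tensor per layer, each shaped
    (batch, n_kv_heads, seq_len, head_dim). Real runtimes may add
    padding, paging, or quantized cache formats, so treat the result
    as an estimate until it is measured.
    """
    if min(n_layers, n_kv_heads, head_dim, bytes_per_elem, batch) <= 0:
        raise ValueError("dimensions must be positive")
    if seq_len < 0:
        raise ValueError("seq_len must be non-negative")
    # Factor of 2 covers the key tensor plus the value tensor.
    return 2 * batch * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


if __name__ == "__main__":
    # Placeholder dimensions (assumption): 4 layers, 4 KV heads, head_dim 32.
    for seq_len in (128, 512, 2048):
        print(seq_len, kv_cache_bytes(4, 4, 32, seq_len))
```

Under these assumptions the cache grows linearly with sequence length, so the per-token cost is the number worth recording alongside the context length.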

What to record

Canonical model source and revision.
Exact layer, hidden, head, vocabulary, and context dimensions.
Tokenizer files and special-token IDs.
Tensor data types, quantization variant, block or group size, and mixed-precision exceptions.
Artifact size, peak host memory, peak device memory, and context-dependent state such as KV-cache size at a stated context length.
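The items above could be captured in a single record. The sketch below is one possible shape, not an established schema: the class name WorkloadRecord and all field names are hypothetical, and the measured fields default to None so that an unmeasured value is never mistaken for a recorded one.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class WorkloadRecord:
    # Canonical model source and revision.
    source: str
    revision: str
    # Exact dimensions, read from the artifact rather than assumed.
    n_layers: int
    hidden_size: int
    n_heads: int
    vocab_size: int
    context_length: int
    # Tokenizer files and special-token IDs.
    tokenizer_files: List[str]
    special_token_ids: Dict[str, int]
    # Tensor data types and quantization details.
    weight_dtype: str
    quant_variant: Optional[str] = None
    quant_group_size: Optional[int] = None
    mixed_precision_exceptions: List[str] = field(default_factory=list)
    # Measured values; None means "not measured yet".
    artifact_bytes: Optional[int] = None
    peak_host_bytes: Optional[int] = None
    peak_device_bytes: Optional[int] = None
    kv_bytes_per_token: Optional[int] = None

    def missing_measurements(self) -> List[str]:
        """Return the names of measured fields that are still empty."""
        names = ["artifact_bytes", "peak_host_bytes",
                 "peak_device_bytes", "kv_bytes_per_token"]
        return [n for n in names if getattr(self, n) is None]

    def to_json(self) -> str:
        """Serialize the record so it can be stored next to the artifact."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

A record whose missing_measurements() list is non-empty is still a planning document, which keeps the gap between recorded and assumed values visible.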

Do not publish estimates as measurements

Parameter-derived file-size estimates and projected tokens per second are planning inputs, not results. A published support claim requires measurements taken on named hardware against a reproducible implementation revision.
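To show what a parameter-derived estimate looks like, and why it should stay labeled as one, here is a minimal sketch. The helper names estimate_weight_bytes and label_estimate are hypothetical, and the group-quantization overhead model (one scale value per group, 16 bits by default) is a simplifying assumption. It ignores file headers, metadata, tokenizer files, and any tensors kept at higher precision.

```python
import math


def estimate_weight_bytes(n_params, bits_per_weight,
                          group_size=None, scale_bits=16):
    """Planning estimate of weight storage in bytes.

    Counts only the weights plus, optionally, one scale per quantization
    group. This is not a measurement of any real artifact.
    """
    if n_params <= 0 or bits_per_weight <= 0:
        raise ValueError("n_params and bits_per_weight must be positive")
    total_bits = n_params * bits_per_weight
    if group_size is not None:
        if group_size <= 0:
            raise ValueError("group_size must be positive")
        # Assumed overhead: one scale of `scale_bits` bits per group.
        total_bits += math.ceil(n_params / group_size) * scale_bits
    return math.ceil(total_bits / 8)


def label_estimate(value, basis):
    """Wrap a number so it cannot be mistaken for a measurement."""
    return {"kind": "estimate", "value": value, "basis": basis}


if __name__ == "__main__":
    # 16 million is the nominal parameter count from the workload name.
    n_params = 16_000_000
    print(label_estimate(estimate_weight_bytes(n_params, 16),
                         "16-bit weights, no overhead"))
    print(label_estimate(estimate_weight_bytes(n_params, 4, group_size=32),
                         "4-bit weights, assumed 16-bit scale per 32 weights"))
```

For a nominal 16,000,000 parameters at 16 bits per weight this gives 32,000,000 bytes. The measured artifact size can differ, which is why the result carries an explicit "estimate" label until a measurement replaces it.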