TinyLM-16M as a reference workload, not a support claim

A compact transformer profile makes runtime design concrete. Its dimensions, sizes, quality, and performance still have to be verified against the actual artifact.

A 16-million-parameter reference workload is small enough to make tensor layout, quantization, tokenizer design, KV-cache growth, and browser loading understandable. It is still not a universal model specification.
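
As an illustration of why KV-cache growth is easy to reason about at this scale, the following is a minimal sketch of the usual linear-in-tokens estimate. Every dimension in it (6 layers, 6 key/value heads, head size 64, fp16 cache) is a hypothetical placeholder, not a property of any published TinyLM-16M artifact, and `KVCacheShape` and `kv_cache_bytes` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KVCacheShape:
    """Hypothetical dimensions; replace with values read from the real artifact."""
    n_layers: int
    n_kv_heads: int
    head_dim: int
    bytes_per_element: int  # e.g. 2 for fp16, 4 for fp32


def kv_cache_bytes(shape: KVCacheShape, n_tokens: int) -> int:
    """Estimate KV-cache size for one sequence of n_tokens tokens."""
    # Factor of 2: each layer stores one key tensor and one value tensor.
    per_token = (2 * shape.n_layers * shape.n_kv_heads
                 * shape.head_dim * shape.bytes_per_element)
    return per_token * n_tokens


# Assumed values for illustration only.
example = KVCacheShape(n_layers=6, n_kv_heads=6, head_dim=64, bytes_per_element=2)

for n_tokens in (128, 512, 2048):
    print(n_tokens, kv_cache_bytes(example, n_tokens))
```

The result is a planning estimate. A real runtime may pad, page, or quantize the cache, so the recorded figure should come from measurement.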

What to record

  • Canonical model source and revision.
  • Exact layer, hidden, head, vocabulary, and context dimensions.
  • Tokenizer files and special-token IDs.
  • Tensor data types, quantization variant, block or group size, and mixed-precision exceptions.
  • Artifact size, peak host memory, peak device memory, and context-dependent state such as KV-cache size at a stated context length.
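
The checklist above could be captured as a structured record so that unfilled items are visible before anything is published. This is a sketch under assumptions: the class name `WorkloadRecord`, its field names, and the helper `missing_fields` are all hypothetical and do not correspond to an existing schema. `None` means "not yet recorded"; an empty list or dict means "recorded, and there are none".

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional


@dataclass
class WorkloadRecord:
    """Hypothetical record mirroring the checklist; None means 'not recorded'."""
    model_source: Optional[str] = None
    model_revision: Optional[str] = None
    n_layers: Optional[int] = None
    hidden_size: Optional[int] = None
    n_heads: Optional[int] = None
    vocab_size: Optional[int] = None
    context_length: Optional[int] = None
    tokenizer_files: Optional[List[str]] = None
    special_token_ids: Optional[Dict[str, int]] = None
    tensor_dtypes: Optional[List[str]] = None
    quantization_variant: Optional[str] = None
    quantization_group_size: Optional[int] = None
    mixed_precision_exceptions: Optional[List[str]] = None
    artifact_bytes: Optional[int] = None
    peak_host_bytes: Optional[int] = None
    peak_device_bytes: Optional[int] = None
    context_state_bytes: Optional[int] = None


def missing_fields(record: WorkloadRecord) -> List[str]:
    """Return the names of checklist items that have not been recorded."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Placeholder values for illustration only.
draft = WorkloadRecord(model_source="hypothetical/source", n_layers=6,
                       mixed_precision_exceptions=[])
print(missing_fields(draft))
```

A record with a non-empty `missing_fields` result is a draft, not a basis for a support claim.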

Do not publish estimates as measurements

Parameter-derived file-size estimates and projected tokens per second are planning inputs, not results. A published support claim requires measurements taken on named hardware against a reproducible implementation revision.
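
One way to keep the two kinds of number apart is to tag every figure with its origin. The sketch below derives a weight-size estimate from a parameter count and refuses to treat it as publishable. The quantization layout (one 16-bit scale per group of 32 weights), the `Figure` type, and the `is_publishable` rule are assumptions made for illustration; the estimate also ignores file headers, tokenizer data, and any tensors kept at higher precision.

```python
from dataclasses import dataclass
from typing import Optional


def estimate_weight_bytes(n_params: int, bits_per_weight: int,
                          group_size: Optional[int] = None,
                          scale_bits: int = 16) -> int:
    """Parameter-derived size estimate; assumes one scale value per group."""
    weight_bits = n_params * bits_per_weight
    # Ceiling division: a partially filled last group still needs a scale.
    n_groups = -(-n_params // group_size) if group_size else 0
    total_bits = weight_bits + n_groups * scale_bits
    return -(-total_bits // 8)  # round up to whole bytes


@dataclass(frozen=True)
class Figure:
    name: str
    value: float
    unit: str
    kind: str                       # "estimate" or "measurement"
    hardware: Optional[str] = None  # named device the measurement ran on
    revision: Optional[str] = None  # implementation revision that produced it


def is_publishable(fig: Figure) -> bool:
    """Only measurements with named hardware and a revision qualify."""
    return fig.kind == "measurement" and bool(fig.hardware) and bool(fig.revision)


planned = Figure(
    name="artifact_size",
    value=estimate_weight_bytes(16_000_000, 4, group_size=32),
    unit="bytes",
    kind="estimate",
)
print(planned.value, is_publishable(planned))
```

Under these assumptions the estimate is 9,000,000 bytes for 4-bit weights, and `is_publishable` returns `False` for it because it is not a measurement.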