TinyLM-16M reference workload

This page uses a proposed compact transformer profile to explain model metadata, runtime operators, quantization, memory estimates, and test design. It does not assert any verified throughput.

Research
Last verified: Not verified

Reference profile

Research reports propose a compact decoder-only transformer with roughly four layers and a hidden size of 512. Treat every dimension, the tokenizer, the parameter total, and the artifact size as a candidate profile until an exact model source is verified.
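
The arithmetic behind a parameter total can be sketched as follows. This is a rough estimate, not a description of any verified model: only the four layers and the 512-wide hidden state come from the candidate profile. The vocabulary size, the feed-forward multiplier, tied embeddings, bias-free projections, scale-only norms, and a position encoding with no learned parameters are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateProfile:
    # Values taken from the candidate profile in the text.
    n_layers: int = 4
    d_model: int = 512
    # Hypothetical placeholders: no verified source states these.
    vocab_size: int = 8192
    ffn_mult: int = 4
    tied_embeddings: bool = True


def estimate_parameters(p: CandidateProfile) -> int:
    """Estimate the parameter count, assuming no biases and scale-only norms."""
    attention = 4 * p.d_model * p.d_model                     # Q, K, V, output projections
    feed_forward = 2 * p.d_model * (p.ffn_mult * p.d_model)   # up and down projections
    norms = 2 * p.d_model                                     # two norm scales per layer
    per_layer = attention + feed_forward + norms
    embedding = p.vocab_size * p.d_model
    # An untied output projection adds a second vocab-by-width matrix.
    output = 0 if p.tied_embeddings else p.vocab_size * p.d_model
    final_norm = p.d_model
    return p.n_layers * per_layer + embedding + output + final_norm


if __name__ == "__main__":
    total = estimate_parameters(CandidateProfile())
    print(f"estimated parameters: {total:,}")  # 16,781,824 under these assumptions
```

Changing any assumed field shifts the total, which is why the estimate should be replaced by a count read from the actual artifact once a source is verified.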

Required runtime evidence

  • Tensor names and shapes for the embedding, attention, feed-forward, normalization, position encoding, output, and tokenizer components.
  • Context and KV-cache layout.
  • Precision or quantization by tensor.
  • Peak memory under a named prompt and generation length.
  • Reference logits or token outputs for regression testing.
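
Two of the items above lend themselves to a small sketch: a KV-cache size estimate for a named sequence length, and a tolerance check against reference logits. The code assumes a dense cache that stores keys and values for every layer and position. The head count, head dimension, element width, sequence length, and tolerance are hypothetical, and the function names are illustrative rather than part of any runtime.

```python
import math


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_element: int, batch: int = 1) -> int:
    """Bytes for keys plus values, assuming a dense per-layer cache."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_element * batch


def logits_match(reference: list[float], observed: list[float],
                 abs_tol: float = 1e-3) -> bool:
    """True when both vectors have equal length and agree element-wise."""
    if len(reference) != len(observed):
        return False
    return all(math.isclose(r, o, rel_tol=0.0, abs_tol=abs_tol)
               for r, o in zip(reference, observed))


if __name__ == "__main__":
    # Four layers come from the candidate profile. Eight heads of width 64,
    # a 16-bit cache, and a 512-token prompt-plus-generation length are
    # hypothetical values for illustration.
    size = kv_cache_bytes(n_layers=4, n_kv_heads=8, head_dim=64,
                          seq_len=512, bytes_per_element=2)
    print(f"KV cache estimate: {size / 2**20:.1f} MiB")
    # Placeholder vectors, not outputs of any real model.
    print(logits_match([0.10, -1.25], [0.1004, -1.2497]))
```

The estimate covers the cache only. Peak memory also includes weights, activations, and runtime overhead, so it still has to be measured under the named prompt and generation length.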

Scope

This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.

Engineering considerations

  • Identify the source, version, target environment, and owner.
  • Separate observed values from estimates and externally reported values.
  • Record trade-offs, unsupported cases, and fallback behavior.
  • Link performance statements to a compatible benchmark methodology.
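
One way to keep observed values apart from estimates and externally reported values is to attach a provenance tag to every recorded number. The following is a minimal sketch: the `Provenance` and `Measurement` types, the field names, and the example values are all hypothetical placeholders, not measurements of any model.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    OBSERVED = "observed"    # measured directly in the target environment
    ESTIMATED = "estimated"  # derived from a formula or model
    REPORTED = "reported"    # taken from an external source


@dataclass(frozen=True)
class Measurement:
    name: str
    value: float
    unit: str
    provenance: Provenance
    source: str  # script, formula, or citation the value came from


def observed_only(measurements: list[Measurement]) -> list[Measurement]:
    """Keep only values that were measured directly."""
    return [m for m in measurements if m.provenance is Provenance.OBSERVED]


if __name__ == "__main__":
    # Placeholder records for illustration only.
    records = [
        Measurement("kv_cache", 4.0, "MiB", Provenance.ESTIMATED, "dense-cache formula"),
        Measurement("peak_memory", 0.0, "MiB", Provenance.OBSERVED, "placeholder run"),
    ]
    for m in observed_only(records):
        print(m.name, m.provenance.value, m.source)
```

Filtering on the tag makes it harder for an estimate to slip into a statement that is presented as a measurement.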

Verification questions

  • What exact artifact, revision, backend, and environment were reviewed?
  • Which assumptions could change the result?
  • Which data should be retained so another engineer can reproduce the conclusion?
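
The answers to these questions can be retained as a small manifest stored next to the results. The sketch below uses only the Python standard library. The manifest fields, the `build_manifest` helper, and the example revision and backend strings are hypothetical, and the demo hashes a temporary stand-in file rather than a real model artifact.

```python
import hashlib
import json
import platform
import sys
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the artifact in chunks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(artifact: Path, revision: str, backend: str,
                   assumptions: list[str]) -> dict:
    """Collect what another engineer needs to repeat the review."""
    return {
        "artifact": artifact.name,
        "artifact_sha256": sha256_of(artifact),
        "revision": revision,
        "backend": backend,
        "environment": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
            "machine": platform.machine(),
        },
        "assumptions": assumptions,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in file; a real review would point at the model artifact.
        stand_in = Path(tmp) / "model.bin"
        stand_in.write_bytes(b"placeholder weights")
        manifest = build_manifest(
            stand_in,
            revision="unknown",
            backend="example-backend",
            assumptions=["vocabulary size not verified"],
        )
        print(json.dumps(manifest, indent=2))
```

Recording the hash alongside the revision ties a conclusion to one exact file, so a later change to the artifact is detectable rather than silent.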