Diagnostics schema and measurement ownership

Separates runtime-generated counters from browser wall-clock measurements and defines which fields can support a benchmark record.

Experimental
Last verified: 2026-06-25 00:00 UTC

Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.

Runtime fields

The no-serde JSON emitted by the runtime contains a model-load time placeholder, prompt and generated token counts, a runtime tokens-per-second placeholder, peak scratch bytes, KV length, the quantization label, the last error, the model-loaded state, tokenizer IDs, a logits summary, the selected token, the five top candidates, and the active sampling values.
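
As a reading aid, the field list above could be mirrored on the host side by a typed shape such as the one below. This is a minimal sketch, not the runtime's actual schema: apart from tokens_per_second, which the source names, every key name and nested shape here is a hypothetical placeholder, and the validator checks only a few scalar fields.

```typescript
// Hypothetical host-side view of the runtime diagnostics JSON.
// Only `tokens_per_second` is named in the source; all other keys are assumed.
interface RuntimeDiagnostics {
  model_load_ms: number;       // placeholder value in the runtime
  prompt_tokens: number;
  generated_tokens: number;
  tokens_per_second: number;   // placeholder value in the runtime
  peak_scratch_bytes: number;
  kv_len: number;
  quant_label: string;
  last_error: string | null;
  model_loaded: boolean;
  token_ids: number[];
  logits_summary: Record<string, number>; // exact shape unknown, assumed numeric
  selected_token: number;
  top_candidates: unknown[];              // expected to hold five entries
  sampling: Record<string, number>;       // active sampling values
}

const NUMERIC_KEYS = [
  "model_load_ms",
  "prompt_tokens",
  "generated_tokens",
  "tokens_per_second",
  "peak_scratch_bytes",
  "kv_len",
  "selected_token",
] as const;

// Parses the JSON text and checks the scalar fields before trusting the shape.
function parseRuntimeDiagnostics(json: string): RuntimeDiagnostics {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("diagnostics must be a JSON object");
  }
  const obj = raw as Record<string, unknown>;
  for (const key of NUMERIC_KEYS) {
    if (typeof obj[key] !== "number") {
      throw new Error(`missing or non-numeric field: ${key}`);
    }
  }
  if (typeof obj["model_loaded"] !== "boolean") {
    throw new Error("missing or non-boolean field: model_loaded");
  }
  return obj as unknown as RuntimeDiagnostics;
}

// True when the runtime throughput field still holds its placeholder value.
function isThroughputPlaceholder(d: RuntimeDiagnostics): boolean {
  return d.tokens_per_second === 0;
}
```

Keeping placeholder fields in the type, rather than dropping them, makes it explicit which values exist in the payload but cannot yet be cited as measurements.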

Host fields

The current browser host measures two intervals with performance.now(): model fetch-plus-load and synchronous generation. It calculates the displayed tokens per second as the generated token count divided by the browser elapsed time.
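
The host-side calculation described above can be sketched as follows. This is an illustration under assumptions, not the reviewed code: the function names, the injectable clock, and the zero-interval guard are hypothetical, and the only facts taken from the text are the use of performance.now() and the division of generated tokens by elapsed time.

```typescript
// A clock that returns milliseconds, matching performance.now().
type Clock = () => number;

// Generated tokens divided by elapsed time, converted from ms to seconds.
function tokensPerSecond(generatedTokens: number, elapsedMs: number): number {
  // Assumed guard: report 0 rather than Infinity for an empty interval.
  if (elapsedMs <= 0) return 0;
  return generatedTokens / (elapsedMs / 1000);
}

// Times one synchronous generation call. `generate` is a hypothetical callback
// that runs generation and returns the number of tokens it produced.
function timeGeneration(
  generate: () => number,
  now: Clock = () => performance.now(),
): { generatedTokens: number; elapsedMs: number; displayedTps: number } {
  const start = now();
  const generatedTokens = generate();
  const elapsedMs = now() - start;
  return {
    generatedTokens,
    elapsedMs,
    displayedTps: tokensPerSecond(generatedTokens, elapsedMs),
  };
}

// Times fetch plus load as one interval. `fetchAndLoad` is a hypothetical
// callback that resolves once the model is ready.
async function timeFetchAndLoad(
  fetchAndLoad: () => Promise<void>,
  now: Clock = () => performance.now(),
): Promise<number> {
  const start = now();
  await fetchAndLoad();
  return now() - start;
}
```

Passing the clock as a parameter keeps the arithmetic testable without a browser. The measured interval still covers everything that happens inside the callback, so it remains a wall-clock figure rather than a runtime counter.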

Important limit

Runtime tokens_per_second remains zero in the inspected source, so it cannot support a benchmark record. Browser timing includes JavaScript boundary and decode work, and it is not a controlled benchmark. A publishable record still needs warm-up runs, repetitions, median and P95 statistics, the hardware, browser, power mode, and prompt used, raw samples, and source hashes.
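
To make the list of requirements concrete, the sketch below summarizes repeated timing samples after discarding warm-up runs. It is one possible approach, not a prescribed methodology: the record field names are hypothetical, and P95 is computed with the nearest-rank method, which is only one of several common percentile definitions.

```typescript
// Hypothetical shape for a publishable record; field names are assumptions.
interface BenchmarkRecord {
  hardware: string;
  browser: string;
  powerMode: string;
  prompt: string;
  sourceHashes: string[];
  warmupRuns: number;
  rawSamplesMs: number[]; // every sample, including warm-up runs
  medianMs: number;
  p95Ms: number;
}

// Median of the samples; averages the two middle values for even counts.
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Nearest-rank percentile: the value at rank ceil(p/100 * n) in sorted order.
function percentileNearestRank(samples: number[], percent: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (percent <= 0 || percent > 100) throw new Error("percent must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((percent * sorted.length) / 100);
  return sorted[rank - 1];
}

// Drops the first `warmupRuns` samples, then summarizes the remainder.
function summarize(
  rawSamplesMs: number[],
  warmupRuns: number,
): { medianMs: number; p95Ms: number } {
  const measured = rawSamplesMs.slice(warmupRuns);
  return {
    medianMs: median(measured),
    p95Ms: percentileNearestRank(measured, 95),
  };
}
```

Storing the raw samples alongside the summary lets another engineer recompute the statistics with a different percentile definition or warm-up count.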

Scope

This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.

Engineering considerations

  • Identify the source, version, target environment, and owner.
  • Separate observed values from estimates and externally reported values.
  • Record trade-offs, unsupported cases, and fallback behavior.
  • Link performance statements to a compatible benchmark methodology.

Verification questions

  • What exact artifact, revision, backend, and environment were reviewed?
  • Which assumptions could change the result?
  • Which data should be retained so another engineer can reproduce the conclusion?