Implementation benchmark record schema

Specifies the raw fields needed to turn the browser’s current load and generation display into reproducible implementation evidence.

Experimental
Last verified: 2026-06-25 00:00 UTC
Reading time: 2 minutes


Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.

Identity

Record the runtime commit and WASM SHA-256, the artifact SHA-256 and SLM checksum, the manifest checksum, model dimensions, tokenizer, quantization, browser build, OS, CPU/GPU, memory, power mode, and site build.
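As a minimal sketch of how the identity fields could be captured, the block below uses a Python dataclass. The field names, the `Identity` type, and the `missing_identity_fields` helper are all hypothetical; the page lists which facts to record, not a wire format. Treating the GPU as optional is also an assumption.

```python
import hashlib
from dataclasses import asdict, dataclass
from typing import List, Optional


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Identity:
    # Hypothetical field names, one per item in the Identity list above.
    runtime_commit: str
    wasm_sha256: str
    artifact_sha256: str
    slm_checksum: str
    manifest_checksum: str
    model_dimensions: dict
    tokenizer: str
    quantization: str
    browser_build: str
    os: str
    cpu: str
    gpu: Optional[str]  # assumption: None when the run has no GPU backend
    memory_bytes: int
    power_mode: str
    site_build: str


def missing_identity_fields(identity: Identity) -> List[str]:
    """Name every required field left empty so the record can be rejected early."""
    return [
        name
        for name, value in asdict(identity).items()
        if name != "gpu" and value in ("", None, {})
    ]
```

A harness could call `sha256_hex` on the fetched WASM and artifact bytes, then refuse to store any record for which `missing_identity_fields` returns a non-empty list.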

Workload

Record the exact prompt bytes or an approved corpus hash, the prompt-token count, max new tokens, generated-token IDs, sampling configuration, context state, warm-up policy, and cache state. State whether the load time is cold, disk-cached, or memory-resident.
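The workload fields can be sketched the same way. The example below hashes the prompt as raw bytes rather than as a decoded string, since two visually identical strings can differ in their byte encoding. The `Workload` type, the `LoadState` enum, the `make_workload` helper, and the default values are hypothetical names chosen for illustration.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Sequence, Tuple


class LoadState(Enum):
    # The three load conditions named in the Workload section.
    COLD = "cold"
    DISK_CACHED = "disk-cached"
    MEMORY_RESIDENT = "memory-resident"


@dataclass(frozen=True)
class Workload:
    prompt_sha256: str
    prompt_token_count: int
    max_new_tokens: int
    generated_token_ids: Tuple[int, ...]
    sampling: dict
    context_state: str
    warmup_policy: str
    cache_state: str
    load_state: LoadState


def make_workload(
    prompt: bytes,
    prompt_token_count: int,
    max_new_tokens: int,
    generated_token_ids: Sequence[int],
    sampling: dict,
    load_state: LoadState,
    context_state: str = "empty",   # placeholder default, not from the source
    warmup_policy: str = "none",    # placeholder default, not from the source
    cache_state: str = "empty",     # placeholder default, not from the source
) -> Workload:
    """Build a workload record, hashing the exact prompt bytes."""
    if len(generated_token_ids) > max_new_tokens:
        raise ValueError("more generated tokens than max_new_tokens allows")
    return Workload(
        prompt_sha256=hashlib.sha256(prompt).hexdigest(),
        prompt_token_count=prompt_token_count,
        max_new_tokens=max_new_tokens,
        generated_token_ids=tuple(generated_token_ids),
        sampling=dict(sampling),
        context_state=context_state,
        warmup_policy=warmup_policy,
        cache_state=cache_state,
        load_state=load_state,
    )
```

Storing the generated-token IDs alongside the sampling configuration lets a later run confirm that it reproduced the same output, not only similar timings.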

Measurements

  • Timings: fetch, verification, copy, parse/load, prefill, per-token decode, and total generation.
  • Summary statistics: median, P95, and tokens per second.
  • Memory: peak JS heap, WASM memory, model storage, scratch, KV, and GPU memory when applicable.
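The summary statistics can be derived from the raw per-token decode samples rather than recorded by hand. The sketch below assumes timings in milliseconds and uses the nearest-rank definition of P95; the page does not say which percentile definition applies, so that choice is an assumption that a record should state explicitly. The function names are hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence


def percentile_nearest_rank(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarise_decode(per_token_ms: Sequence[float]) -> Dict[str, float]:
    """Summarise raw per-token decode timings given in milliseconds."""
    total_ms = sum(per_token_ms)
    if total_ms <= 0:
        raise ValueError("total decode time must be positive")
    return {
        "tokens": len(per_token_ms),
        "total_decode_ms": total_ms,
        "median_ms": statistics.median(per_token_ms),
        "p95_ms": percentile_nearest_rank(per_token_ms, 95),
        "tokens_per_second": len(per_token_ms) / (total_ms / 1000.0),
    }
```

With the made-up samples `[10, 12, 11, 13, 50]`, the median is 12 ms and the nearest-rank P95 is 50 ms. A single slow token dominates the tail, which is why the raw samples are worth keeping next to the summary.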

Retention

Store machine-readable raw samples, harness output, environment capture, failures, and reproduction commands. The accessible website table should be generated from the same record.
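One way to keep the published table tied to the retained record is to render each row directly from the stored JSON. This is a sketch only: the record layout, the key names, and the column selection are hypothetical, and any values passed in would come from a real harness run.

```python
import json
from typing import List


def serialise_record(record: dict) -> str:
    """Serialise a record deterministically so identical records give identical text."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


def table_row(record_json: str) -> List[str]:
    """Render one website table row from the stored record, never from separate input."""
    record = json.loads(record_json)
    measurements = record["measurements"]
    return [
        record["identity"]["runtime_commit"],
        record["workload"]["load_state"],
        f'{measurements["median_ms"]:.1f}',
        f'{measurements["p95_ms"]:.1f}',
        f'{measurements["tokens_per_second"]:.1f}',
    ]
```

Because `table_row` accepts only the serialised record, a figure cannot appear on the website without a corresponding stored sample set behind it.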

Scope

This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.

Engineering considerations

  • Identify the source, version, target environment, and owner.
  • Separate observed values from estimates and externally reported values.
  • Record trade-offs, unsupported cases, and fallback behavior.
  • Link performance statements to a compatible benchmark methodology.
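The second consideration, keeping observed values apart from estimates and externally reported values, can be enforced in the data rather than by convention. The sketch below tags every value with its provenance; the `Provenance` enum, the `Value` type, and the `split_by_provenance` helper are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class Provenance(Enum):
    OBSERVED = "observed"
    ESTIMATED = "estimated"
    EXTERNAL = "externally-reported"


@dataclass(frozen=True)
class Value:
    name: str
    value: float
    unit: str
    provenance: Provenance
    source: str  # e.g. a harness run ID, a formula, or a citation


def split_by_provenance(values: Iterable[Value]) -> Dict[Provenance, List[Value]]:
    """Group values by provenance so the three kinds are never mixed in one table."""
    groups: Dict[Provenance, List[Value]] = {p: [] for p in Provenance}
    for v in values:
        groups[v.provenance].append(v)
    return groups
```

Requiring a `source` on every value also answers the first consideration, because each number carries a pointer to where it came from.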

Verification questions

  • What exact artifact, revision, backend, and environment were reviewed?
  • Which assumptions could change the result?
  • Which data should be retained so another engineer can reproduce the conclusion?