Model-load and generation timing

Separates browser wall-clock measurements from runtime diagnostics, identifies what is and is not measured, and specifies a reproducible benchmark record.

Experimental
Last verified: 2026-06-25 00:00 UTC

Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.

Current host measurements

The host JavaScript measures model fetch/load duration and generation wall time. The displayed token rate is the generated token count divided by the elapsed generation time.
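
The rate calculation described above can be sketched as follows. This is a minimal illustration, not the reviewed implementation: `generate`, `timeGeneration`, and `tokensPerSecond` are hypothetical names, and the clock is passed in so the arithmetic can be checked without a browser.

```javascript
// Sketch only: `generate` stands in for whatever synchronous call produces tokens.
// None of these names come from the reviewed source.

// Token rate = generated tokens / elapsed seconds.
function tokensPerSecond(tokenCount, elapsedMs) {
  // A zero or invalid interval would display Infinity or NaN; report 0 instead.
  if (!Number.isFinite(elapsedMs) || elapsedMs <= 0) return 0;
  return tokenCount / (elapsedMs / 1000);
}

// Wraps a synchronous generation call with wall-clock timestamps.
// `now` defaults to the monotonic browser clock and can be replaced in tests.
function timeGeneration(generate, now = () => performance.now()) {
  const start = now();
  const tokens = generate();
  const elapsedMs = now() - start;
  return {
    tokenCount: tokens.length,
    elapsedMs,
    tokensPerSecond: tokensPerSecond(tokens.length, elapsedMs),
  };
}
```

Because the interval wraps the whole call, the resulting rate is a host-side wall-clock figure and includes any overhead between the two timestamps.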

Runtime fields

The inspected Rust path initializes model-load time and token speed fields but does not populate them with an internal clock. The browser presentation is therefore the active timing source.
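
One way a UI could make the timing source explicit is sketched below. It assumes, without confirmation from the source, that an unpopulated runtime field reads as 0, NaN, or undefined; `resolveTiming` and the source labels are hypothetical.

```javascript
// Sketch only: labels each displayed value with where it came from.
// Assumption: an unpopulated runtime field arrives as 0, NaN, or undefined.
function resolveTiming(runtimeMs, hostMs) {
  const runtimePopulated = Number.isFinite(runtimeMs) && runtimeMs > 0;
  return runtimePopulated
    ? { ms: runtimeMs, source: "runtime" }
    : { ms: hostMs, source: "host-wall-clock" };
}
```

Keeping the label next to the number prevents a browser wall-clock value from being read as an internal runtime measurement.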

Included work

Generation wall time covers the synchronous ABI call and the Rust computation behind it. Model-load timing may cover fetch, copy, parser validation, tensor materialization, cache/scratch allocation, and provenance timing, depending on where the UI starts and stops its timer.
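
Recording each load phase separately removes the ambiguity about timer boundaries. The sketch below is an assumption about how that could look, not a description of the reviewed UI: `createPhaseTimer` and the phase names passed to it are hypothetical.

```javascript
// Sketch only: records one duration per named phase so the load total
// can be broken down instead of reported as a single number.
function createPhaseTimer(now = () => performance.now()) {
  const phases = [];
  return {
    // Times a synchronous step such as parsing or tensor materialization.
    measure(name, work) {
      const start = now();
      const result = work();
      phases.push({ name, ms: now() - start });
      return result;
    },
    // Times an asynchronous step such as a network fetch.
    async measureAsync(name, work) {
      const start = now();
      const result = await work();
      phases.push({ name, ms: now() - start });
      return result;
    },
    // Returns a copy of the recorded phases and their sum.
    report() {
      const totalMs = phases.reduce((sum, p) => sum + p.ms, 0);
      return { phases: phases.slice(), totalMs };
    },
  };
}
```

The reported total is the sum of the measured phases only, so any work done between phases is excluded and should be accounted for separately.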

Required benchmark

Record the following:

  • Environment: browser and version, OS, CPU, memory, power mode.
  • Artifact: artifact hash, quantization.
  • Workload: prompt bytes and tokens, generated tokens.
  • Procedure: warm-up, repetitions.
  • Results: median, P95, time-to-first-token, decode rate, load phases, peak memory, raw samples.

Do not compare f32, q8, and q4 results unless all other variables match.
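
The median and P95 named in the record, and the rule about comparing quantizations, can be sketched as follows. The function names and record keys are hypothetical, and the percentile uses the nearest-rank definition, which is one of several common choices.

```javascript
// Sketch only: summary statistics over raw timing samples in milliseconds.

// Nearest-rank percentile over an ascending-sorted array, for p in (0, 100].
function percentile(sorted, p) {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Drops the first `warmUp` samples, then reports median and P95.
// The raw samples are returned too, so the record stays reproducible.
function summarize(samplesMs, warmUp = 0) {
  const measured = samplesMs.slice(warmUp);
  if (measured.length === 0) throw new Error("no samples left after warm-up");
  const sorted = [...measured].sort((a, b) => a - b); // numeric sort
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return {
    repetitions: sorted.length,
    median,
    p95: percentile(sorted, 95),
    rawSamples: measured,
  };
}

// Lists the keys whose values differ between two records, ignoring the
// variable under test. Values are compared with ===, so keep them primitive.
function differingKeys(a, b, ignore = ["quantization"]) {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  return [...keys].filter((k) => !ignore.includes(k) && a[k] !== b[k]);
}
```

Two records are comparable across quantizations only when `differingKeys` returns an empty list; any key it does return names a variable that was not held constant.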

Scope

This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.

Engineering considerations

  • Identify the source, version, target environment, and owner.
  • Separate observed values from estimates and externally reported values.
  • Record trade-offs, unsupported cases, and fallback behavior.
  • Link performance statements to a compatible benchmark methodology.

Verification questions

  • What exact artifact, revision, backend, and environment were reviewed?
  • Which assumptions could change the result?
  • Which data should be retained so another engineer can reproduce the conclusion?