Model loader and tensor storage

Explains the complete model-admission sequence, runtime-owned storage variants, pre-resolved tensor indices, tokenizer selection, and load-time memory implications.

Experimental. Last verified: 2026-06-25 00:00 UTC

Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.

Admission sequence

  1. Parse fixed header and checksum.
  2. Parse tensor directory and ranges.
  3. Parse BTOK or BPE1 section.
  4. Verify required tensor hashes.
  5. Verify exact global and per-layer shapes.
  6. Decode or copy each tensor into a typed storage variant.
  7. Resolve global and per-layer indices.
  8. Allocate KV cache, forward scratch, and logits.
  9. Install the model into Runtime.
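
The ordering above can be sketched as a fixed stage list that is walked front to back. This is a minimal illustration, not the reviewed loader: the names `Stage`, `ORDER`, and `admit` are hypothetical, and stopping at the first failing stage (so that nothing is installed after an error) is an assumption rather than documented behavior.

```rust
// Hypothetical sketch: `Stage`, `ORDER`, and `admit` are illustrative names.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Stage {
    Header,    // 1. fixed header and checksum
    Directory, // 2. tensor directory and ranges
    Tokenizer, // 3. BTOK or BPE1 section
    Hashes,    // 4. required tensor hashes
    Shapes,    // 5. global and per-layer shapes
    Decode,    // 6. decode or copy into typed storage
    Indices,   // 7. resolve global and per-layer indices
    Allocate,  // 8. KV cache, forward scratch, logits
    Install,   // 9. install into the runtime
}

/// The nine stages in the order listed in the admission sequence.
pub const ORDER: [Stage; 9] = [
    Stage::Header,
    Stage::Directory,
    Stage::Tokenizer,
    Stage::Hashes,
    Stage::Shapes,
    Stage::Decode,
    Stage::Indices,
    Stage::Allocate,
    Stage::Install,
];

/// Runs `run` once per stage, in order. Assumption: the first error aborts
/// admission, and the failing stage is returned with its message.
pub fn admit<F>(mut run: F) -> Result<(), (Stage, String)>
where
    F: FnMut(Stage) -> Result<(), String>,
{
    for stage in ORDER {
        run(stage).map_err(|e| (stage, e))?;
    }
    Ok(())
}
```

With this shape, a hash mismatch in stage 4 means the decode, allocation, and install stages are never reached.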

Storage enum

  • F32: Vec<f32>.
  • Q8: Vec<i8> plus row scales.
  • Q4: packed Vec<u8>, block scales, and block size.
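
As a rough illustration, the three variants could be modeled as a Rust enum like the one below. The type, field, and method names are hypothetical. The Q4 nibble layout (two weights per byte, low nibble first, values stored with an offset of 8) is an assumption made only so the example can decode something. It is not taken from the reviewed source.

```rust
// Hypothetical sketch: type, field, and method names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum TensorStorage {
    /// Unquantized 32-bit floats.
    F32(Vec<f32>),
    /// One i8 per weight and one f32 scale per row.
    Q8 { data: Vec<i8>, row_scales: Vec<f32> },
    /// Packed 4-bit weights, one f32 scale per block of `block_size` weights.
    Q4 { packed: Vec<u8>, block_scales: Vec<f32>, block_size: usize },
}

impl TensorStorage {
    /// Number of logical weights stored in the variant.
    pub fn weight_count(&self) -> usize {
        match self {
            TensorStorage::F32(v) => v.len(),
            TensorStorage::Q8 { data, .. } => data.len(),
            // Assumption: two 4-bit weights per byte.
            TensorStorage::Q4 { packed, .. } => packed.len() * 2,
        }
    }

    /// Decodes weight `i` of a Q4 tensor. Returns None for other variants,
    /// for an out-of-range index, or for a zero block size.
    /// Assumption: low nibble first, values stored with an offset of 8.
    pub fn q4_weight(&self, i: usize) -> Option<f32> {
        if let TensorStorage::Q4 { packed, block_scales, block_size } = self {
            let byte = *packed.get(i / 2)?;
            let nibble = if i % 2 == 0 { byte & 0x0F } else { byte >> 4 };
            let scale = *block_scales.get(i.checked_div(*block_size)?)?;
            Some((nibble as i32 - 8) as f32 * scale)
        } else {
            None
        }
    }
}
```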

Selective dequantization

Matrix operations dispatch directly on the storage type. Small vectors and individual embedding rows can be copied into reusable f32 scratch. An attempt to borrow a quantized tensor as an f32 slice is rejected, which prevents an accidental, hidden expansion of the whole model to f32.
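
A minimal sketch of these three behaviors follows, restricted to the F32 and Q8 variants for brevity. All names (`Storage`, `StorageError`, `as_f32`, `matvec`, `copy_row`) are hypothetical, and the row-major layout with one scale per row is an assumption for illustration.

```rust
// Hypothetical sketch: names and the row-major layout are assumptions.
#[derive(Debug, PartialEq)]
pub enum StorageError {
    /// A quantized tensor cannot be borrowed as f32.
    QuantizedBorrow,
}

pub enum Storage {
    F32(Vec<f32>),
    Q8 { data: Vec<i8>, row_scales: Vec<f32> },
}

impl Storage {
    /// Borrows the data as f32 only when it is already stored as f32.
    pub fn as_f32(&self) -> Result<&[f32], StorageError> {
        match self {
            Storage::F32(v) => Ok(v.as_slice()),
            Storage::Q8 { .. } => Err(StorageError::QuantizedBorrow),
        }
    }

    /// Computes y = W x for a row-major W with `cols` columns,
    /// choosing the kernel by storage variant.
    pub fn matvec(&self, cols: usize, x: &[f32], y: &mut [f32]) {
        match self {
            Storage::F32(w) => {
                for (r, out) in y.iter_mut().enumerate() {
                    let row = &w[r * cols..(r + 1) * cols];
                    *out = row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>();
                }
            }
            Storage::Q8 { data, row_scales } => {
                for (r, out) in y.iter_mut().enumerate() {
                    let row = &data[r * cols..(r + 1) * cols];
                    // Accumulate on the raw i8 values, then apply the row scale once.
                    let acc = row.iter().zip(x).map(|(&q, &b)| q as f32 * b).sum::<f32>();
                    *out = acc * row_scales[r];
                }
            }
        }
    }

    /// Copies (and, for Q8, dequantizes) one row into caller-owned scratch.
    pub fn copy_row(&self, cols: usize, r: usize, scratch: &mut [f32]) {
        match self {
            Storage::F32(w) => scratch[..cols].copy_from_slice(&w[r * cols..(r + 1) * cols]),
            Storage::Q8 { data, row_scales } => {
                let row = &data[r * cols..(r + 1) * cols];
                for (dst, &q) in scratch[..cols].iter_mut().zip(row) {
                    *dst = q as f32 * row_scales[r];
                }
            }
        }
    }
}
```

Only `copy_row` ever produces f32 values from quantized data, and it does so one row at a time into a buffer the caller reuses, so no code path materializes a full f32 copy of a quantized tensor.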

Memory accounting gap

The runtime exposes scratch bytes and KV length but does not yet report total model heap, transfer peak, allocator overhead, browser ArrayBuffer duplication, or process/GPU memory.
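
One of the missing figures, the tensor payload held on the heap, could in principle be derived from the storage variants themselves. The sketch below assumes a hypothetical `TensorStorage` enum and hypothetical function names. It counts only element bytes, so it does not cover allocator overhead, transfer peak, browser ArrayBuffer duplication, or process/GPU memory.

```rust
use std::mem::size_of;

// Hypothetical sketch: the enum and function names are assumptions.
pub enum TensorStorage {
    F32(Vec<f32>),
    Q8 { data: Vec<i8>, row_scales: Vec<f32> },
    Q4 { packed: Vec<u8>, block_scales: Vec<f32>, block_size: usize },
}

/// Element bytes held by one tensor. Uses `len`, not `capacity`, so any
/// over-allocation by the allocator is deliberately not counted.
pub fn payload_bytes(t: &TensorStorage) -> usize {
    match t {
        TensorStorage::F32(v) => v.len() * size_of::<f32>(),
        TensorStorage::Q8 { data, row_scales } => {
            data.len() * size_of::<i8>() + row_scales.len() * size_of::<f32>()
        }
        // `block_size` lives inline in the enum, not on the heap.
        TensorStorage::Q4 { packed, block_scales, .. } => {
            packed.len() * size_of::<u8>() + block_scales.len() * size_of::<f32>()
        }
    }
}

/// Sum of payload bytes over all tensors: a lower bound on model heap use.
pub fn model_payload_bytes(tensors: &[TensorStorage]) -> usize {
    tensors.iter().map(payload_bytes).sum()
}
```

A number produced this way is a lower bound. It would be recorded as an estimate, separate from any observed process-level measurement.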

Scope

This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.

Engineering considerations

  • Identify the source, version, target environment, and owner.
  • Separate observed values from estimates and externally reported values.
  • Record trade-offs, unsupported cases, and fallback behavior.
  • Link performance statements to a compatible benchmark methodology.

Verification questions

  • What exact artifact, revision, backend, and environment were reviewed?
  • Which assumptions could change the result?
  • Which data should be retained so another engineer can reproduce the conclusion?