Model loader and tensor storage

Explains the complete model-admission sequence, runtime-owned storage variants, pre-resolved tensor indices, tokenizer selection, and load-time memory implications.

Experimental

Last verified: 2026-06-25 00:00 UTC
Updated: 2026-06-25
Reading time: 2 minutes

Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.

Admission sequence

1. Parse fixed header and checksum.
2. Parse tensor directory and ranges.
3. Parse BTOK or BPE1 section.
4. Verify required tensor hashes.
5. Verify exact global and per-layer shapes.
6. Decode or copy each tensor into a typed storage variant.
7. Resolve global and per-layer indices.
8. Allocate KV cache, forward scratch, and logits.
9. Install the model into Runtime.
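
The ordering above can be sketched as a fail-fast pipeline: each stage must succeed before the next one runs, so installation into the runtime is only reached after every check has passed. This is a minimal illustration, not the reviewed loader. The names `Stage`, `ORDER`, and `admit` are hypothetical, and the real per-stage work is replaced by a caller-supplied closure.

```rust
/// The admission stages, in the order listed on this page.
/// (Hypothetical names; the reviewed source may name them differently.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    HeaderAndChecksum,
    TensorDirectory,
    TokenizerSection,
    TensorHashes,
    Shapes,
    DecodeTensors,
    ResolveIndices,
    AllocateBuffers,
    Install,
}

pub const ORDER: [Stage; 9] = [
    Stage::HeaderAndChecksum,
    Stage::TensorDirectory,
    Stage::TokenizerSection,
    Stage::TensorHashes,
    Stage::Shapes,
    Stage::DecodeTensors,
    Stage::ResolveIndices,
    Stage::AllocateBuffers,
    Stage::Install,
];

/// Runs `check` for each stage in order and stops at the first failure.
/// Returns the completed stages, or the failing stage with its message.
pub fn admit<F>(mut check: F) -> Result<Vec<Stage>, (Stage, String)>
where
    F: FnMut(Stage) -> Result<(), String>,
{
    let mut done = Vec::with_capacity(ORDER.len());
    for stage in ORDER {
        check(stage).map_err(|msg| (stage, msg))?;
        done.push(stage);
    }
    Ok(done)
}
```

In this sketch a hash mismatch at `Stage::TensorHashes` means the decode, allocation, and install stages are never invoked, which is the property the sequence is meant to guarantee.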

Storage enum

F32: Vec<f32>.
Q8: Vec<i8> plus row scales.
Q4: packed Vec<u8>, block scales, and block size.
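
The three variants above might be modeled as a single Rust enum along the following lines. This is a sketch under stated assumptions: the field names, the `f32` scale type, and the convention of two 4-bit values per packed byte are not confirmed by this page, and the helper methods are hypothetical.

```rust
use std::mem::size_of;

/// Runtime-owned tensor storage, one variant per encoding named on this page.
/// Field names and the f32 scale type are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Storage {
    F32(Vec<f32>),
    Q8 { values: Vec<i8>, row_scales: Vec<f32> },
    Q4 { packed: Vec<u8>, block_scales: Vec<f32>, block_size: usize },
}

impl Storage {
    /// Logical element count (assumes two 4-bit values per packed byte for Q4).
    pub fn element_count(&self) -> usize {
        match self {
            Storage::F32(v) => v.len(),
            Storage::Q8 { values, .. } => values.len(),
            Storage::Q4 { packed, .. } => packed.len() * 2,
        }
    }

    /// Bytes occupied by the stored elements and scales.
    /// Excludes spare Vec capacity and allocator overhead.
    pub fn payload_bytes(&self) -> usize {
        match self {
            Storage::F32(v) => v.len() * size_of::<f32>(),
            Storage::Q8 { values, row_scales } => {
                values.len() + row_scales.len() * size_of::<f32>()
            }
            Storage::Q4 { packed, block_scales, .. } => {
                packed.len() + block_scales.len() * size_of::<f32>()
            }
        }
    }
}
```

Under these assumptions, 32 elements cost 128 payload bytes as `F32` and 20 bytes as `Q4` with one scale per 32-element block.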

Selective dequantization

Matrix operations dispatch directly by storage type. Small vectors and embedding rows can be copied into reusable f32 scratch. Borrowing a quantized tensor as an f32 slice is rejected, preventing accidental hidden full-model expansion.
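
The three behaviors described above (dispatch by storage type, copying a single row into scratch, and rejecting an f32 borrow of quantized data) can be illustrated with a reduced two-variant stand-in. Everything here is hypothetical: `Weights`, `QuantizedBorrow`, the method names, and the row-major layout with one scale per row are assumptions for the sketch, and Q4 is omitted for brevity.

```rust
/// Reduced stand-in for the storage enum: only F32 and Q8 are modeled.
pub enum Weights {
    F32 { data: Vec<f32>, cols: usize },
    Q8 { data: Vec<i8>, row_scales: Vec<f32>, cols: usize },
}

/// Error returned when quantized storage is requested as an f32 slice.
#[derive(Debug, PartialEq)]
pub struct QuantizedBorrow;

impl Weights {
    /// Succeeds only for F32 storage; quantized storage is rejected
    /// instead of being silently expanded into a full f32 copy.
    pub fn as_f32_slice(&self) -> Result<&[f32], QuantizedBorrow> {
        match self {
            Weights::F32 { data, .. } => Ok(data.as_slice()),
            Weights::Q8 { .. } => Err(QuantizedBorrow),
        }
    }

    /// Copies (and, for Q8, dequantizes) one row into reusable scratch.
    pub fn copy_row_into(&self, row: usize, scratch: &mut [f32]) {
        match self {
            Weights::F32 { data, cols } => {
                let cols = *cols;
                scratch[..cols].copy_from_slice(&data[row * cols..(row + 1) * cols]);
            }
            Weights::Q8 { data, row_scales, cols } => {
                let cols = *cols;
                let scale = row_scales[row];
                let src = &data[row * cols..(row + 1) * cols];
                for (dst, &q) in scratch[..cols].iter_mut().zip(src) {
                    *dst = q as f32 * scale;
                }
            }
        }
    }

    /// y = W x, dispatched on the variant. The Q8 arm accumulates
    /// directly from i8 values and applies the row scale once per row.
    pub fn matvec(&self, x: &[f32], y: &mut [f32]) {
        match self {
            Weights::F32 { data, cols } => {
                let cols = *cols;
                for (r, out) in y.iter_mut().enumerate() {
                    let row = &data[r * cols..(r + 1) * cols];
                    *out = row.iter().zip(x).map(|(w, v)| w * v).sum::<f32>();
                }
            }
            Weights::Q8 { data, row_scales, cols } => {
                let cols = *cols;
                for (r, out) in y.iter_mut().enumerate() {
                    let row = &data[r * cols..(r + 1) * cols];
                    let acc = row.iter().zip(x).map(|(&q, &v)| q as f32 * v).sum::<f32>();
                    *out = acc * row_scales[r];
                }
            }
        }
    }
}
```

The design point is that the only path from quantized storage to f32 values is row-sized and goes through caller-owned scratch, so a whole-tensor expansion cannot happen by accident.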

Memory accounting gap

The runtime exposes scratch bytes and KV length but does not yet report total model heap, transfer peak, allocator overhead, browser ArrayBuffer duplication, or process/GPU memory.
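
One way to keep this gap visible to callers is a report type in which the two exposed values are plain fields and every unreported quantity is an `Option` that stays `None` until it is actually measured. The struct below is a hypothetical sketch, not an API of the reviewed runtime. All field and method names are assumptions.

```rust
/// Hypothetical memory report: observed values are plain fields,
/// quantities the runtime does not yet report are `None`.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct MemoryReport {
    pub scratch_bytes: usize,
    pub kv_len: usize,
    pub model_heap_bytes: Option<usize>,
    pub transfer_peak_bytes: Option<usize>,
    pub allocator_overhead_bytes: Option<usize>,
    pub array_buffer_duplicate_bytes: Option<usize>,
    pub process_or_gpu_bytes: Option<usize>,
}

impl MemoryReport {
    /// Builds a report from the two values the runtime exposes today.
    pub fn observed(scratch_bytes: usize, kv_len: usize) -> Self {
        Self { scratch_bytes, kv_len, ..Self::default() }
    }

    /// Names of the quantities that are still unmeasured.
    pub fn unreported(&self) -> Vec<&'static str> {
        let fields = [
            ("model_heap_bytes", self.model_heap_bytes),
            ("transfer_peak_bytes", self.transfer_peak_bytes),
            ("allocator_overhead_bytes", self.allocator_overhead_bytes),
            ("array_buffer_duplicate_bytes", self.array_buffer_duplicate_bytes),
            ("process_or_gpu_bytes", self.process_or_gpu_bytes),
        ];
        fields
            .iter()
            .filter(|(_, value)| value.is_none())
            .map(|(name, _)| *name)
            .collect()
    }
}
```

Using `Option` rather than zero avoids presenting an unmeasured quantity as a measured value of 0 bytes.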

Scope

This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.

Engineering considerations

Identify the source, version, target environment, and owner.
Separate observed values from estimates and externally reported values.
Record trade-offs, unsupported cases, and fallback behavior.
Link performance statements to a compatible benchmark methodology.

Verification questions

What exact artifact, revision, backend, and environment were reviewed?
Which assumptions could change the result?
Which data should be retained so another engineer can reproduce the conclusion?
