Storage-dispatched tensor operations

Explains how one model API selects f32, q8_0, or q4_0 behavior without decoding every quantized tensor to a permanent float copy.

Experimental

Last verified: 2026-06-25 00:00 UTC
Updated: 2026-06-25

Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.

Storage enum

Each loaded tensor is stored as one of three runtime-owned variants: f32 values; q8 signed bytes plus per-row scales; or q4 packed nibbles plus per-block scales and a block size.
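The three variants can be pictured as a single Rust enum. The following is a minimal sketch and not the reviewed source: the type name `TensorStorage`, the field names, and the helpers `element_count` and `payload_bytes` are hypothetical, and the assumption that each q4 byte holds exactly two weights is illustrative.

```rust
/// Hypothetical storage enum; names are illustrative, not taken from the reviewed source.
#[derive(Debug, Clone, PartialEq)]
pub enum TensorStorage {
    /// Plain 32-bit floats.
    F32(Vec<f32>),
    /// One signed byte per weight plus one scale per row.
    Q8 { data: Vec<i8>, row_scales: Vec<f32> },
    /// Two 4-bit weights per byte (assumed) plus one scale per block of `block_size` weights.
    Q4 { packed: Vec<u8>, block_scales: Vec<f32>, block_size: usize },
}

impl TensorStorage {
    /// Number of logical weights held by this variant.
    pub fn element_count(&self) -> usize {
        match self {
            TensorStorage::F32(values) => values.len(),
            TensorStorage::Q8 { data, .. } => data.len(),
            TensorStorage::Q4 { packed, .. } => packed.len() * 2,
        }
    }

    /// Bytes used by weights and scales, ignoring Vec headers and the block size field.
    pub fn payload_bytes(&self) -> usize {
        let f32_size = std::mem::size_of::<f32>();
        match self {
            TensorStorage::F32(values) => values.len() * f32_size,
            TensorStorage::Q8 { data, row_scales } => data.len() + row_scales.len() * f32_size,
            TensorStorage::Q4 { packed, block_scales, .. } => {
                packed.len() + block_scales.len() * f32_size
            }
        }
    }
}
```

Under these assumptions, 32 weights occupy 128 payload bytes as f32, 36 bytes as q8 with a single row scale, and 20 bytes as q4 with a single block scale, which is the saving that keeping quantized tensors in their compact form preserves.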

Operation dispatch

Embedding and norm reads call copy helpers. Matrix operations call a storage-aware matvec method. Quantized matrix rows are consumed directly by q8 or q4 kernels; only requested rows or vectors are expanded into caller-provided scratch when required.
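A rough sketch of this dispatch is shown here, under several assumptions that are not confirmed by the source: the names `Storage`, `matvec`, `copy_row`, and `q4_at` are hypothetical; matrices are row-major; q4 nibbles are stored low nibble first with an offset of 8; and blocks are indexed over the flattened tensor. The real kernels may use a different layout and will almost certainly be more optimized.

```rust
/// Hypothetical storage variants for a row-major weight matrix.
pub enum Storage {
    F32(Vec<f32>),
    Q8 { data: Vec<i8>, row_scales: Vec<f32> },
    Q4 { packed: Vec<u8>, block_scales: Vec<f32>, block_size: usize },
}

/// Reads the signed 4-bit weight at flat index `idx`.
/// Assumed layout: low nibble first, stored with an offset of 8.
fn q4_at(packed: &[u8], idx: usize) -> i32 {
    let byte = packed[idx / 2];
    let nibble = if idx % 2 == 0 { byte & 0x0F } else { byte >> 4 };
    nibble as i32 - 8
}

impl Storage {
    /// Storage-aware matvec: out[r] = dot(row r, x). Quantized rows are read in place.
    pub fn matvec(&self, cols: usize, x: &[f32], out: &mut [f32]) {
        assert_eq!(x.len(), cols);
        match self {
            Storage::F32(w) => {
                for (r, o) in out.iter_mut().enumerate() {
                    let row = &w[r * cols..(r + 1) * cols];
                    *o = row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>();
                }
            }
            Storage::Q8 { data, row_scales } => {
                for (r, o) in out.iter_mut().enumerate() {
                    let row = &data[r * cols..(r + 1) * cols];
                    let acc = row.iter().zip(x).map(|(&q, &b)| q as f32 * b).sum::<f32>();
                    // The row scale is applied once per row, after accumulation.
                    *o = acc * row_scales[r];
                }
            }
            Storage::Q4 { packed, block_scales, block_size } => {
                for (r, o) in out.iter_mut().enumerate() {
                    let mut acc = 0.0f32;
                    for (c, &xc) in x.iter().enumerate() {
                        let idx = r * cols + c;
                        acc += q4_at(packed, idx) as f32 * block_scales[idx / *block_size] * xc;
                    }
                    *o = acc;
                }
            }
        }
    }

    /// Copy helper: expands one requested row into caller-provided scratch.
    pub fn copy_row(&self, row: usize, cols: usize, scratch: &mut [f32]) {
        assert_eq!(scratch.len(), cols);
        let start = row * cols;
        match self {
            Storage::F32(w) => scratch.copy_from_slice(&w[start..start + cols]),
            Storage::Q8 { data, row_scales } => {
                for (s, &q) in scratch.iter_mut().zip(&data[start..start + cols]) {
                    *s = q as f32 * row_scales[row];
                }
            }
            Storage::Q4 { packed, block_scales, block_size } => {
                for (c, s) in scratch.iter_mut().enumerate() {
                    let idx = start + c;
                    *s = q4_at(packed, idx) as f32 * block_scales[idx / *block_size];
                }
            }
        }
    }
}
```

In this sketch the caller never learns which variant it holds. An embedding lookup would call `copy_row` with a reusable scratch slice, while a projection would call `matvec`, so no full-tensor float copy is ever allocated.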

Evidence boundary

This is compact runtime storage, not zero-copy file mapping. The parser copies accepted bytes into Rust-owned vectors. The browser transfer buffer is released only after loading returns.
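The difference between copying and mapping can be illustrated with a small sketch. The function name `copy_q8_payload` and its signature are hypothetical, and the real parser's validation is not shown in the source. The only point here is that the result owns its bytes, so it stays valid after the transfer buffer is dropped.

```rust
/// Hypothetical helper: copies a bounds-checked byte range out of the transfer
/// buffer into a Rust-owned vector of signed bytes. Returns None if the range
/// overflows or falls outside the buffer.
pub fn copy_q8_payload(transfer: &[u8], offset: usize, len: usize) -> Option<Vec<i8>> {
    let end = offset.checked_add(len)?;
    let bytes = transfer.get(offset..end)?;
    // Each byte is reinterpreted as a signed value and copied; nothing borrows
    // from `transfer` once this function returns.
    Some(bytes.iter().map(|&b| b as i8).collect())
}
```

Because the copy exists alongside the transfer buffer until loading returns, peak memory during loading in a design like this is higher than the steady-state footprint of the compact storage.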

Scope

This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.

Engineering considerations

Identify the source, version, target environment, and owner.
Separate observed values from estimates and externally reported values.
Record trade-offs, unsupported cases, and fallback behavior.
Link performance statements to a compatible benchmark methodology.

Verification questions

What exact artifact, revision, backend, and environment were reviewed?
Which assumptions could change the result?
Which data should be retained so another engineer can reproduce the conclusion?
