F32 tensor path

Describes little-endian f32 decoding, finite-value expectations, matrix-vector dispatch, memory size, and the F32 path's role as a reference implementation.

Experimental
Last verified
2026-06-25 00:00 UTC

Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.

Storage

Each four-byte little-endian word is decoded into an f32 and stored in a contiguous Rust vector. The encoded byte length must equal the element count × 4 exactly.
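The decoding rule and the exact-length check can be sketched in Rust as follows. This is a minimal illustration, not the runtime's code. The names `decode_f32_le` and `LengthMismatch` are hypothetical, and the example assumes the caller already knows the element count from the tensor's shape.

```rust
/// Hypothetical error type: the byte length does not match element_count * 4.
#[derive(Debug, PartialEq)]
pub struct LengthMismatch {
    pub expected: usize,
    pub actual: usize,
}

/// Sketch: decode little-endian f32 tensor bytes into a contiguous Vec<f32>.
/// `element_count` is assumed to come from the tensor's shape metadata.
pub fn decode_f32_le(bytes: &[u8], element_count: usize) -> Result<Vec<f32>, LengthMismatch> {
    // Slices never exceed isize::MAX bytes, so a saturated product can never
    // match a real buffer length and still produces a mismatch error.
    let expected = element_count.saturating_mul(4);
    if bytes.len() != expected {
        return Err(LengthMismatch { expected, actual: bytes.len() });
    }
    // Each 4-byte word becomes one f32, read as little-endian regardless of host order.
    Ok(bytes
        .chunks_exact(4)
        .map(|w| f32::from_le_bytes([w[0], w[1], w[2], w[3]]))
        .collect())
}
```

With this approach, a truncated or padded buffer fails before any value is read. The header-declared shape and the payload therefore cannot silently disagree.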

Execution

F32 matrices use a row-major, scalar matrix-vector loop. Because the values are already stored as f32, embedding rows and norm vectors can be borrowed or copied directly, with no dequantization step.
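A row-major scalar matvec and a borrowed embedding row might look like the sketch below. The function names `matvec_f32` and `embedding_row` are hypothetical, and the layout assumption is that row `r` occupies `w[r * cols..(r + 1) * cols]`.

```rust
/// Sketch of a row-major scalar matrix-vector product:
/// out[r] = sum over c of w[r * cols + c] * x[c].
pub fn matvec_f32(w: &[f32], rows: usize, cols: usize, x: &[f32], out: &mut [f32]) {
    assert_eq!(w.len(), rows * cols, "weight length must equal rows * cols");
    assert_eq!(x.len(), cols);
    assert_eq!(out.len(), rows);
    for (r, o) in out.iter_mut().enumerate() {
        // Each output element reads one contiguous row of the weight matrix.
        let row = &w[r * cols..(r + 1) * cols];
        let mut acc = 0.0f32;
        for (wv, xv) in row.iter().zip(x) {
            acc += wv * xv;
        }
        *o = acc;
    }
}

/// Borrows one embedding row as a slice; f32 storage needs no dequantization or copy.
pub fn embedding_row(table: &[f32], dim: usize, token: usize) -> &[f32] {
    &table[token * dim..(token + 1) * dim]
}
```

The plain loop keeps summation order fixed and easy to reason about, which suits a reference path better than a vectorized kernel would.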

Reference role

F32 provides the simplest numerical reference for checking q8_0 and q4_0 output. The packer and runtime tests compare direct quantized matvec results with results computed from values reconstructed into f32.
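The comparison can be sketched for a q8_0-style block. This sketch simplifies the format: GGUF's q8_0 stores 32 signed 8-bit values with an f16 scale, but here the scale is held as f32 to stay self-contained. The names `BlockQ8`, `dequantize`, `dot_q8_direct`, `dot_f32`, and `within_tolerance` are hypothetical, and so are the tolerance parameters. The source does not state which tolerances its tests use.

```rust
/// Values per block, matching q8_0's block size of 32.
pub const QK8_0: usize = 32;

/// Simplified q8_0-style block. Assumption: the scale is f32 here (GGUF stores f16).
pub struct BlockQ8 {
    pub scale: f32,
    pub qs: [i8; QK8_0],
}

/// Reconstructs (dequantizes) blocks into plain f32 values: v = scale * q.
pub fn dequantize(blocks: &[BlockQ8]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|b| b.qs.iter().map(move |&q| b.scale * q as f32))
        .collect()
}

/// Direct quantized dot product: accumulate q * x within a block, then apply the scale once.
pub fn dot_q8_direct(blocks: &[BlockQ8], x: &[f32]) -> f32 {
    assert_eq!(x.len(), blocks.len() * QK8_0);
    blocks
        .iter()
        .zip(x.chunks_exact(QK8_0))
        .map(|(b, xs)| {
            let partial: f32 = b.qs.iter().zip(xs).map(|(&q, &xv)| q as f32 * xv).sum();
            b.scale * partial
        })
        .sum()
}

/// Reference dot product over the f32 reconstruction.
pub fn dot_f32(w: &[f32], x: &[f32]) -> f32 {
    w.iter().zip(x).map(|(a, b)| a * b).sum()
}

/// Accepts a result when |direct - reference| <= abs_tol + rel_tol * |reference|.
pub fn within_tolerance(direct: f32, reference: f32, abs_tol: f32, rel_tol: f32) -> bool {
    (direct - reference).abs() <= abs_tol + rel_tol * reference.abs()
}
```

The two paths differ only in where the scale is applied and in summation order, so any disagreement beyond a small tolerance points to a packing or kernel bug rather than to quantization error itself.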

Cost

The TinyLM-16M f32 artifact is 68,194,944 bytes (about 65 MiB). Loading it also requires memory for the source transfer, the decoded Rust tensor vectors, the KV cache, scratch buffers, logits, and browser state. F32 should therefore not be treated as the default low-memory browser target.
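A rough peak-memory estimate can be sketched as follows. It assumes the fetched artifact bytes are still alive while the decoded tensor vectors are being built, so tensor data is held twice at peak. It also assumes f32 keys and values in the KV cache. The functions `kv_cache_bytes` and `peak_load_bytes` and all model parameters passed to them are hypothetical. They are not TinyLM-16M's actual configuration, and browser state is not modeled.

```rust
/// Bytes per f32 element.
const F32_BYTES: u64 = 4;

/// KV cache size, assuming f32 keys and values:
/// 2 (K and V) * layers * context length * kv_dim * 4 bytes.
pub fn kv_cache_bytes(n_layers: u64, n_ctx: u64, kv_dim: u64) -> u64 {
    2 * n_layers * n_ctx * kv_dim * F32_BYTES
}

/// Rough peak estimate, assuming the transferred source bytes and the decoded
/// tensor vectors coexist during loading. Browser state is not included.
pub fn peak_load_bytes(artifact_bytes: u64, kv_bytes: u64, scratch_bytes: u64, vocab: u64) -> u64 {
    let source_transfer = artifact_bytes;
    // Upper bound: header and metadata bytes in the artifact are not tensor data.
    let tensor_vectors = artifact_bytes;
    // One f32 logit per vocabulary entry.
    let logits = vocab * F32_BYTES;
    source_transfer + tensor_vectors + kv_bytes + scratch_bytes + logits
}
```

If both copies are live at once, the 68,194,944-byte artifact alone accounts for up to 136,389,888 bytes (about 130 MiB) before the KV cache, scratch, logits, and browser state are added.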

Validation requirement

Runtime parsing validates the encoded length. Trained-source validation additionally requires the data to be finite, bounded, and nonzero. Before admission as a product path, numerical tolerances and reference logits should be published.
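The trained-source checks could be expressed roughly as below. The names `validate_trained_f32` and `ValueIssue` are hypothetical. The `bound` parameter is also an assumption, because the source does not state the limit it uses. "Nonzero" is interpreted here as "not entirely zero".

```rust
/// Hypothetical description of the first problem found in a tensor.
#[derive(Debug, PartialEq)]
pub enum ValueIssue {
    NonFinite { index: usize },
    OutOfBounds { index: usize, value: f32 },
    AllZero,
}

/// Sketch: every value must be finite and satisfy |v| <= bound, and at least
/// one value must be nonzero. An empty slice is reported as AllZero.
pub fn validate_trained_f32(values: &[f32], bound: f32) -> Result<(), ValueIssue> {
    let mut any_nonzero = false;
    for (index, &v) in values.iter().enumerate() {
        // is_finite rejects NaN as well as positive and negative infinity.
        if !v.is_finite() {
            return Err(ValueIssue::NonFinite { index });
        }
        if v.abs() > bound {
            return Err(ValueIssue::OutOfBounds { index, value: v });
        }
        // -0.0 == 0.0 in IEEE 754, so negative zero also counts as zero.
        if v != 0.0 {
            any_nonzero = true;
        }
    }
    if any_nonzero { Ok(()) } else { Err(ValueIssue::AllZero) }
}
```

These checks catch corrupt or untrained tensors that would still pass the length-only check at runtime.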

Scope

This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.

Engineering considerations

  • Identify the source, version, target environment, and owner.
  • Separate observed values from estimates and externally reported values.
  • Record trade-offs, unsupported cases, and fallback behavior.
  • Link performance statements to a compatible benchmark methodology.

Verification questions

  • What exact artifact, revision, backend, and environment were reviewed?
  • Which assumptions could change the result?
  • Which data should be retained so another engineer can reproduce the conclusion?