Q8_0 storage and matvec

Documents per-row signed-byte quantization, scale layout, direct matrix-vector execution, memory behavior, and numerical validation requirements.

Experimental

Last verified: 2026-06-25 00:00 UTC
Updated: 2026-06-25

Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.

Encoding

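Each matrix element is stored as a signed 8-bit integer (i8), and one f32 scale is stored per row. The approximate weight is reconstructed as q × row_scale, where q is the stored byte and row_scale is the scale of the row that contains it.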
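The encoding can be illustrated with a minimal Rust sketch. The page does not say how the row scale is chosen, so the symmetric max-abs rule (scale = max|w| / 127) used here is an assumption, and the names Q8Row, quantize_row, and dequantize_row are hypothetical rather than taken from the reviewed source.

```rust
/// One quantized row: signed bytes plus a single f32 scale.
/// Hypothetical type for illustration; not taken from the reviewed source.
#[derive(Debug, Clone, PartialEq)]
pub struct Q8Row {
    pub q: Vec<i8>,
    pub scale: f32,
}

/// Quantize one row. ASSUMPTION: symmetric max-abs scaling, so the largest
/// magnitude in the row maps to +/-127. The source does not specify this rule.
pub fn quantize_row(row: &[f32]) -> Q8Row {
    let max_abs = row.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    // An all-zero (or empty) row gets scale 0 and all-zero bytes.
    if max_abs == 0.0 {
        return Q8Row { q: vec![0; row.len()], scale: 0.0 };
    }
    let scale = max_abs / 127.0;
    let q = row
        .iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    Q8Row { q, scale }
}

/// Reconstruct approximate weights as q * row_scale, as described in the text.
pub fn dequantize_row(row: &Q8Row) -> Vec<f32> {
    row.q.iter().map(|&q| q as f32 * row.scale).collect()
}
```

Under this assumed rule, the round-trip error of each element is at most about half of the row scale, which is why a row with one large outlier loses precision on its small entries.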

Direct dispatch

The kernel iterates over the rows. For each row it multiplies every signed byte by the matching input component and the row scale, sums the products, and writes one f32 output element. It never materializes a fully decoded matrix.
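A minimal sketch of such a direct matrix-vector product follows. The Q8Matrix layout (row-major bytes with a separate scale vector) and the function name matvec_q8 are assumptions made for illustration. The sketch also applies the row scale once after the inner sum rather than per product, which is algebraically equivalent but may round slightly differently from the reviewed kernel.

```rust
/// Row-major Q8 matrix: `rows * cols` signed bytes and one f32 scale per row.
/// Hypothetical layout for illustration; not taken from the reviewed source.
pub struct Q8Matrix {
    pub rows: usize,
    pub cols: usize,
    pub q: Vec<i8>,
    pub scales: Vec<f32>,
}

/// Compute y = W x directly from the quantized bytes.
/// No decoded f32 copy of the matrix is ever allocated.
pub fn matvec_q8(m: &Q8Matrix, x: &[f32], y: &mut [f32]) {
    assert_eq!(m.q.len(), m.rows * m.cols);
    assert_eq!(m.scales.len(), m.rows);
    assert_eq!(x.len(), m.cols);
    assert_eq!(y.len(), m.rows);

    for (r, out) in y.iter_mut().enumerate() {
        let row = &m.q[r * m.cols..(r + 1) * m.cols];
        let mut acc = 0.0f32;
        for (&q, &xi) in row.iter().zip(x) {
            acc += q as f32 * xi;
        }
        // The scale is constant across the row, so it is applied once here.
        *out = acc * m.scales[r];
    }
}
```

The only working memory beyond the inputs is one accumulator per row, so peak memory stays close to the size of the quantized artifact plus the input and output vectors.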

Artifact size

The supplied 17,048,064-parameter q8 artifact is 17,160,000 bytes, approximately 25.2% of the corresponding f32 artifact.
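The ratio can be checked with a few lines of arithmetic. The sketch below assumes the f32 artifact is exactly 4 bytes per parameter with no header, which the page does not state; the function names are hypothetical.

```rust
/// Figures quoted in the text above.
pub const PARAMS: u64 = 17_048_064;
pub const Q8_ARTIFACT_BYTES: u64 = 17_160_000;

/// ASSUMPTION: the f32 artifact is 4 bytes per parameter with no header.
pub fn f32_payload_bytes(params: u64) -> u64 {
    params * 4
}

/// Bytes in the q8 artifact beyond one signed byte per parameter.
pub fn q8_overhead_bytes(params: u64, artifact_bytes: u64) -> u64 {
    artifact_bytes - params
}

/// Size of the q8 artifact as a percentage of the f32 size.
pub fn size_ratio_percent(q8_bytes: u64, f32_bytes: u64) -> f64 {
    100.0 * q8_bytes as f64 / f32_bytes as f64
}
```

With these inputs the assumed f32 size is 68,192,256 bytes and the ratio is roughly 25.16%, consistent with the quoted 25.2%. The 111,936 bytes beyond one byte per parameter cover everything else in the file, such as row scales; the page does not break that remainder down further.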

Quality boundary

The artifact is deterministic smoke data, not a quantized trained assistant. Numerical validation proves kernel agreement with the project’s own dequantization path; it does not establish language-model quality retention.
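The kind of agreement check described here might look like the following sketch, which compares a direct path against a dequantize-then-multiply reference. All function names and the tolerance value are hypothetical; the project's actual validation harness and thresholds are not shown on this page.

```rust
/// Reference path: decode each weight to f32 first, then multiply.
pub fn matvec_via_dequant(q: &[i8], scales: &[f32], cols: usize, x: &[f32]) -> Vec<f32> {
    scales
        .iter()
        .enumerate()
        .map(|(r, &s)| {
            let row = &q[r * cols..(r + 1) * cols];
            row.iter()
                .zip(x)
                .map(|(&qi, &xi)| (qi as f32 * s) * xi)
                .sum::<f32>()
        })
        .collect()
}

/// Direct path: accumulate on the raw bytes, apply the row scale once.
pub fn matvec_direct(q: &[i8], scales: &[f32], cols: usize, x: &[f32]) -> Vec<f32> {
    scales
        .iter()
        .enumerate()
        .map(|(r, &s)| {
            let row = &q[r * cols..(r + 1) * cols];
            let acc: f32 = row.iter().zip(x).map(|(&qi, &xi)| qi as f32 * xi).sum();
            acc * s
        })
        .collect()
}

/// Largest element-wise absolute difference between two outputs.
pub fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).fold(0.0f32, |m, (&p, &q)| m.max((p - q).abs()))
}

/// True when the two paths agree within `tol` (a caller-chosen tolerance).
pub fn agrees(a: &[f32], b: &[f32], tol: f32) -> bool {
    max_abs_diff(a, b) <= tol
}
```

A passing check of this kind only shows that two code paths compute the same quantized product. It says nothing about how close the quantized weights are to a trained f32 model, which is the boundary this section draws.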

Future optimization

Future work should add blocked or vectorized dot products, an explicit accumulator policy, calibration metadata, per-tensor quantization descriptors, and backend-specific reference tolerances.
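As one illustration of the first two items, the sketch below splits a row dot product into fixed-size blocks with independent accumulators and makes the accumulator policy explicit by widening to f64 for the final reduction. The block width of 8, the f64 policy, and the name dot_q8_blocked are all assumptions chosen for the example, not decisions recorded in the reviewed source.

```rust
/// Number of independent accumulators. ASSUMPTION: 8 is an arbitrary
/// illustrative width, not a value taken from the reviewed source.
const LANES: usize = 8;

/// Blocked dot product of one quantized row with an f32 input vector.
/// The row scale is left to the caller so it can be applied once per row.
pub fn dot_q8_blocked(q: &[i8], x: &[f32]) -> f32 {
    assert_eq!(q.len(), x.len());

    let mut lanes = [0.0f32; LANES];
    let mut q_blocks = q.chunks_exact(LANES);
    let mut x_blocks = x.chunks_exact(LANES);

    // Full blocks: each lane accumulates independently, which gives the
    // compiler a straightforward pattern to vectorize.
    for (qb, xb) in (&mut q_blocks).zip(&mut x_blocks) {
        for (lane, (&qi, &xi)) in lanes.iter_mut().zip(qb.iter().zip(xb)) {
            *lane += qi as f32 * xi;
        }
    }

    // Tail: the leftover elements that do not fill a whole block.
    let mut tail = 0.0f64;
    for (&qi, &xi) in q_blocks.remainder().iter().zip(x_blocks.remainder()) {
        tail += qi as f64 * xi as f64;
    }

    // Accumulator policy (assumed): widen to f64 for the final reduction.
    let total: f64 = lanes.iter().map(|&v| v as f64).sum::<f64>() + tail;
    total as f32
}
```

Because blocking changes the order of floating-point additions, its results can differ from a sequential loop in the last bits, which is one reason the list also asks for backend-specific reference tolerances.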

Scope

This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.

Engineering considerations

Identify the source, version, target environment, and owner.
Separate observed values from estimates and externally reported values.
Record trade-offs, unsupported cases, and fallback behavior.
Link performance statements to a compatible benchmark methodology.

Verification questions

What exact artifact, revision, backend, and environment were reviewed?
Which assumptions could change the result?
Which data should be retained so another engineer can reproduce the conclusion?
