Q4_0 storage and matvec

Documents packed signed-nibble decoding, block scales, block-size constraints, direct execution, artifact size, and optimization risks.

Experimental

Last verified: 2026-06-25 00:00 UTC
Updated: 2026-06-25

Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.

Encoding

Each byte packs two signed four-bit values, one in the low nibble and one in the high nibble. Decoding maps each nibble to a signed integer in the range defined by the project's q4 convention. Each fixed-size block of values stores one f32 scale.
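The decoding step can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the nibbles are two's-complement (range -8 to 7) and that the low nibble holds the first of the two values. The page confirms neither detail, and the function names are hypothetical.

```rust
/// Sign-extends a 4-bit value to i8.
/// ASSUMPTION: nibbles are two's-complement, so 0x0..=0x7 map to 0..=7
/// and 0x8..=0xF map to -8..=-1. The project's convention may differ.
fn sign_extend_nibble(n: u8) -> i8 {
    // Move the nibble into the top four bits, reinterpret as i8, then
    // arithmetic-shift back so the sign bit is replicated.
    (((n & 0x0F) << 4) as i8) >> 4
}

/// Splits one packed byte into (low-nibble value, high-nibble value).
/// ASSUMPTION: the low nibble is the first value of the pair.
fn unpack_byte(b: u8) -> (i8, i8) {
    (sign_extend_nibble(b), sign_extend_nibble(b >> 4))
}

/// Dequantizes both values of one packed byte using its block's f32 scale.
fn dequantize_byte(b: u8, scale: f32) -> (f32, f32) {
    let (lo, hi) = unpack_byte(b);
    (lo as f32 * scale, hi as f32 * scale)
}
```

Under these assumptions, `unpack_byte(0xF7)` yields `(7, -1)`: the low nibble `0x7` is 7 and the high nibble `0xF` sign-extends to -1.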

Shape constraints

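Block size must be even, so that every block occupies a whole number of bytes, and the number of matrix columns must be divisible by it. The encoded byte count and the scale count must match rows, columns, and block size exactly: with two values per byte and one scale per block, that is rows × columns / 2 bytes and rows × columns / block size scales.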
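A validation routine along these lines could enforce the constraints before any decoding happens. This is a sketch under the assumption that the payload carries no padding or alignment bytes; the function name, parameter names, and error messages are hypothetical.

```rust
/// Checks the shape rules for a Q4 matrix with one f32 scale per block.
/// ASSUMPTION: no padding, so the expected counts follow directly from
/// rows, cols, and block_size.
fn validate_q4_shape(
    rows: usize,
    cols: usize,
    block_size: usize,
    packed_len: usize,
    scales_len: usize,
) -> Result<(), String> {
    if block_size == 0 || block_size % 2 != 0 {
        return Err(format!("block size {} must be a positive even number", block_size));
    }
    if cols % block_size != 0 {
        return Err(format!("columns {} not divisible by block size {}", cols, block_size));
    }
    let values = rows
        .checked_mul(cols)
        .ok_or_else(|| "rows * cols overflows usize".to_string())?;
    // Two values per byte; cols is even because block_size is even and divides it.
    let expected_bytes = values / 2;
    // One scale per block.
    let expected_scales = values / block_size;
    if packed_len != expected_bytes {
        return Err(format!("expected {} packed bytes, got {}", expected_bytes, packed_len));
    }
    if scales_len != expected_scales {
        return Err(format!("expected {} scales, got {}", expected_scales, scales_len));
    }
    Ok(())
}
```

For example, a 2 × 64 matrix with a block size of 32 would need exactly 64 packed bytes and 4 scales to pass.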

Direct dispatch

The kernel walks blocks, decodes nibbles on demand, multiplies by the block scale and input vector, and accumulates f32 outputs. Full decoded matrices are not retained.
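The loop structure described above might look like the sketch below. It is not the reviewed kernel: it assumes row-major block order, low-nibble-first packing, two's-complement nibbles, and scales indexed by block number, all of which are labeled guesses. It also applies the scale once per block rather than per element, which is mathematically equivalent up to floating-point rounding.

```rust
/// Sign-extends the low four bits of `n` (ASSUMED two's-complement).
fn sign_extend(n: u8) -> i8 {
    // Shifting left discards the high nibble; the arithmetic shift
    // right then replicates the sign bit.
    ((n << 4) as i8) >> 4
}

/// Computes y = W * x directly from packed Q4 data, without building
/// the decoded matrix. Layout assumptions are noted in the lead-in.
fn q4_matvec(
    packed: &[u8],
    scales: &[f32],
    rows: usize,
    cols: usize,
    block_size: usize,
    x: &[f32],
) -> Vec<f32> {
    assert!(block_size > 0 && block_size % 2 == 0 && cols % block_size == 0);
    assert_eq!(packed.len(), rows * cols / 2);
    assert_eq!(scales.len(), rows * cols / block_size);
    assert_eq!(x.len(), cols);

    let blocks_per_row = cols / block_size;
    let bytes_per_block = block_size / 2;
    let mut y = vec![0.0f32; rows];

    for r in 0..rows {
        let mut acc = 0.0f32;
        for b in 0..blocks_per_row {
            let block = r * blocks_per_row + b;
            let bytes = &packed[block * bytes_per_block..(block + 1) * bytes_per_block];
            let xs = &x[b * block_size..(b + 1) * block_size];
            // Decode each nibble only when it is needed.
            let mut block_sum = 0.0f32;
            for (i, &byte) in bytes.iter().enumerate() {
                block_sum += sign_extend(byte) as f32 * xs[2 * i];
                block_sum += sign_extend(byte >> 4) as f32 * xs[2 * i + 1];
            }
            acc += scales[block] * block_sum;
        }
        y[r] = acc;
    }
    y
}
```

Only the current byte is decoded at any time, so working memory beyond the packed inputs is limited to the output vector and a few scalar accumulators.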

Artifact size

The supplied q4 TinyLM-16M artifact is 10,657,728 bytes, approximately 15.6% of the f32 artifact size and 62.1% of the q8 artifact size.
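As a rough cross-check, the payload cost per weight follows from the encoding alone: half a byte for the nibble plus a share of the block's four-byte f32 scale. The sketch below computes that figure. It ignores headers, metadata, and any tensors stored in other formats, and the block size of 32 used in the note afterwards is a hypothetical value, since this page does not state the block size.

```rust
/// Payload bytes per weight: 4 bits for the value plus one f32 scale
/// (4 bytes) shared by `block_size` weights.
fn q4_bytes_per_weight(block_size: usize) -> f64 {
    0.5 + 4.0 / block_size as f64
}

/// The same cost as a fraction of an f32 weight (4 bytes).
fn q4_fraction_of_f32(block_size: usize) -> f64 {
    q4_bytes_per_weight(block_size) / 4.0
}
```

With a hypothetical block size of 32, `q4_fraction_of_f32(32)` is 0.15625, or 15.625%. That is in line with the reported figure, but it does not by itself establish the artifact's block size or layout.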

Risk

Nibble order, signed conversion, block ordering, and scale indexing are format-critical: a mistake in any of them decodes to different weights without raising an error. Cross-runtime fixtures, fuzzing, known tensor vectors, and quality evaluation are required before interchange claims.
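A known-vector check is one way to pin down these conventions. The sketch below uses a hand-made two-byte fixture, not data from the project, and assumes two's-complement nibbles. It shows how the same bytes decode differently under the two possible nibble orders, which is the kind of divergence a fixture is meant to catch.

```rust
/// Sign-extends the low four bits of `n` (ASSUMED two's-complement).
fn sign_extend(n: u8) -> i8 {
    ((n << 4) as i8) >> 4
}

/// Decodes packed bytes with an explicit nibble order so both
/// conventions can be compared against a fixture.
fn decode_nibbles(packed: &[u8], low_first: bool) -> Vec<i8> {
    let mut out = Vec::with_capacity(packed.len() * 2);
    for &b in packed {
        let (lo, hi) = (sign_extend(b), sign_extend(b >> 4));
        if low_first {
            out.push(lo);
            out.push(hi);
        } else {
            out.push(hi);
            out.push(lo);
        }
    }
    out
}

// Hypothetical fixture: bytes and the values expected under a
// low-nibble-first convention.
const FIXTURE_BYTES: [u8; 2] = [0xF7, 0x08];
const FIXTURE_LOW_FIRST: [i8; 4] = [7, -1, -8, 0];
```

Decoding `FIXTURE_BYTES` with `low_first = true` reproduces `FIXTURE_LOW_FIRST`, while `low_first = false` gives `[-1, 7, 0, -8]`. A fixture exported by another runtime would settle which order applies.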

Scope

This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.

Engineering considerations

Identify the source, version, target environment, and owner.
Separate observed values from estimates and externally reported values.
Record trade-offs, unsupported cases, and fallback behavior.
Link performance statements to a compatible benchmark methodology.

Verification questions

What exact artifact, revision, backend, and environment were reviewed?
Which assumptions could change the result?
Which data should be retained so another engineer can reproduce the conclusion?
