Tensor naming and shape contracts

Lists required global and per-layer tensor names, exact shape relationships, tied-output behavior, and load-time index resolution.

Experimental

Last verified: 2026-06-25 00:00 UTC
Updated: 2026-06-25

Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.

Global tensors

tok_embeddings.weight: vocabulary × hidden.
norm.weight: hidden.
output.weight: vocabulary × hidden; required unless the tied-output flag is set.
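
The list above can be sketched as a small helper. This is a minimal illustration, not the reviewed implementation: the function name `required_global_tensors` and the `tied_output` parameter are hypothetical, and the sketch assumes the tied-output flag only makes output.weight optional.

```rust
/// Returns the global tensor names a loader would require.
/// Assumption (not confirmed by the source): when `tied_output` is true,
/// `output.weight` may be absent from the file.
pub fn required_global_tensors(tied_output: bool) -> Vec<&'static str> {
    let mut names = vec!["tok_embeddings.weight", "norm.weight"];
    if !tied_output {
        names.push("output.weight");
    }
    names
}
```

A loader built along these lines would fail early with the name of the missing tensor instead of failing later during the first forward pass.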

Per-layer tensors

For each layers.N prefix: attn_norm.weight, ffn_norm.weight, wq.weight, wk.weight, wv.weight, wo.weight, w1.weight, w2.weight, and w3.weight.
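
As a rough illustration, the full per-layer names can be generated from the nine suffixes. The sketch below assumes the prefix and suffix are joined with a dot (for example layers.0.wq.weight), which the text implies but does not state; `LAYER_SUFFIXES` and `layer_tensor_names` are hypothetical names.

```rust
/// Per-layer tensor suffixes listed in the contract.
pub const LAYER_SUFFIXES: [&str; 9] = [
    "attn_norm.weight",
    "ffn_norm.weight",
    "wq.weight",
    "wk.weight",
    "wv.weight",
    "wo.weight",
    "w1.weight",
    "w2.weight",
    "w3.weight",
];

/// Builds the full tensor names for one layer.
/// Assumption: names have the form "layers.N.<suffix>".
pub fn layer_tensor_names(layer: usize) -> Vec<String> {
    LAYER_SUFFIXES
        .iter()
        .map(|suffix| format!("layers.{}.{}", layer, suffix))
        .collect()
}
```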

Shape relationships

The wq, wk, wv, and wo projections are hidden × hidden matrices in the current equal-head implementation, where key and value heads match the query heads in count.
W1 and W3 project hidden to FFN; W2 projects FFN back to hidden.
Norm vectors match hidden size.
Embedding and output rows match vocabulary.
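
These relationships could be checked at load time with a lookup like the one below. It is a sketch under stated assumptions: the `Dims` struct and both function names are hypothetical, and matrices are assumed to be stored as output rows × input columns. The text above only fixes which dimensions each tensor connects, plus the vocabulary × hidden shape of the embedding and output tensors.

```rust
/// Model dimensions the shape contract depends on (illustrative names).
#[derive(Debug, Clone, Copy)]
pub struct Dims {
    pub vocab: usize,
    pub hidden: usize,
    pub ffn: usize,
}

/// Expected shape for a tensor suffix as (rows, cols); vectors use cols = 1.
/// Assumption: projection matrices are stored as output rows x input columns.
pub fn expected_shape(suffix: &str, d: Dims) -> Option<(usize, usize)> {
    match suffix {
        "tok_embeddings.weight" | "output.weight" => Some((d.vocab, d.hidden)),
        "norm.weight" | "attn_norm.weight" | "ffn_norm.weight" => Some((d.hidden, 1)),
        "wq.weight" | "wk.weight" | "wv.weight" | "wo.weight" => Some((d.hidden, d.hidden)),
        "w1.weight" | "w3.weight" => Some((d.ffn, d.hidden)),
        "w2.weight" => Some((d.hidden, d.ffn)),
        // Anything else (for example a bias tensor) is outside the contract.
        _ => None,
    }
}

/// Compares an observed shape with the contract and reports any mismatch.
pub fn check_shape(suffix: &str, got: (usize, usize), d: Dims) -> Result<(), String> {
    match expected_shape(suffix, d) {
        Some(want) if want == got => Ok(()),
        Some(want) => Err(format!("{}: expected {:?}, got {:?}", suffix, want, got)),
        None => Err(format!("{}: not part of the contract", suffix)),
    }
}
```

With a check of this kind, a grouped-query model whose wk.weight has fewer rows than hidden is rejected at load time, not misread at inference time.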

Resolved indices

During load, the model stores top-level and per-layer directory indices. Token execution therefore avoids formatting layer names and linearly scanning the directory on every forward pass.
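
The idea can be sketched as follows. Everything here is illustrative, not taken from the reviewed source: the directory is modeled as a slice of names in file order, `LayerIndices` and `resolve_layers` are hypothetical, and names are assumed to follow the layers.N.<suffix> pattern. Top-level names would be resolved the same way.

```rust
use std::collections::HashMap;

/// Per-layer suffixes, in the order used for the index slots below.
const SUFFIXES: [&str; 9] = [
    "attn_norm.weight", "ffn_norm.weight",
    "wq.weight", "wk.weight", "wv.weight", "wo.weight",
    "w1.weight", "w2.weight", "w3.weight",
];

/// Directory positions for one layer's tensors, one slot per suffix.
#[derive(Debug, Clone, PartialEq)]
pub struct LayerIndices(pub [usize; 9]);

/// Resolves every per-layer name to its directory position once, at load time.
/// `directory` is a stand-in for the file's tensor directory.
pub fn resolve_layers(
    directory: &[&str],
    n_layers: usize,
) -> Result<Vec<LayerIndices>, String> {
    // Build the name -> position map a single time.
    let by_name: HashMap<&str, usize> = directory
        .iter()
        .enumerate()
        .map(|(i, name)| (*name, i))
        .collect();

    let mut layers = Vec::with_capacity(n_layers);
    for layer in 0..n_layers {
        let mut idx = [0usize; 9];
        for (slot, suffix) in SUFFIXES.iter().enumerate() {
            // Name formatting happens here only, never in the forward pass.
            let name = format!("layers.{}.{}", layer, suffix);
            idx[slot] = *by_name
                .get(name.as_str())
                .ok_or_else(|| format!("missing tensor: {}", name))?;
        }
        layers.push(LayerIndices(idx));
    }
    Ok(layers)
}
```

After resolution, the forward pass reads a tensor through a plain array index such as layers[n].0[2] for wq, with no string formatting or directory scan.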

Versioning risk

Tensor names and shapes act as an ABI. Renaming a tensor, transposing it, introducing biases, adopting GQA shapes, or using an alternative FFN layout requires a new model-type contract or format version.

Scope

This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.

Engineering considerations

Identify the source, version, target environment, and owner.
Separate observed values from estimates and externally reported values.
Record trade-offs, unsupported cases, and fallback behavior.
Link performance statements to a compatible benchmark methodology.

Verification questions

What exact artifact, revision, backend, and environment were reviewed?
Which assumptions could change the result?
Which data should be retained so another engineer can reproduce the conclusion?
