Lists required global and per-layer tensor names, exact shape relationships, tied-output behavior, and load-time index resolution.
Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.
Global tensors
tok_embeddings.weight: vocabulary × hidden.norm.weight: hidden.output.weight: vocabulary × hidden unless tied-output flag is set.
Per-layer tensors
For each layers.N prefix: attn_norm.weight, ffn_norm.weight, wq.weight, wk.weight, wv.weight, wo.weight, w1.weight, w2.weight, and w3.weight.
Shape relationships
- Q/K/V/O projections use hidden-width matrices in the current equal-head implementation.
- W1 and W3 project hidden to FFN; W2 projects FFN back to hidden.
- Norm vectors match hidden size.
- Embedding and output rows match vocabulary.
Resolved indices
During load, the model stores top-level and per-layer directory indices. Token execution therefore avoids formatting layer names and linearly scanning the directory on every forward pass.
Versioning risk
Names are an ABI. Renaming, transposition, bias introduction, GQA shapes, or alternative FFN layout requires a new model-type contract or format version.
Scope
This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.
Engineering considerations
- Identify the source, version, target environment, and owner.
- Separate observed values from estimates and externally reported values.
- Record trade-offs, unsupported cases, and fallback behavior.
- Link performance statements to a compatible benchmark methodology.
Verification questions
- What exact artifact, revision, backend, and environment were reviewed?
- Which assumptions could change the result?
- Which data should be retained so another engineer can reproduce the conclusion?