Specifies every v1 fixed-header field, offset, type, validation rule, and compatibility implication.
Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.
| Offset | Bytes | Field | Validation |
|---|---|---|---|
| 0 | 4 | Magic | Exactly SLM1. |
| 4 | 4 | version | Exactly 1. |
| 8 | 4 | header_length | At least 108 and not beyond file. |
| 12 | 4 | model_type | Stored; current loader supports the implemented transformer contract. |
| 16 | 4 | flags | Bit 0 is tied output. |
| 20 | 4 | vocab_size | Must agree with tokenizer. |
| 24 | 4 | special_token_count | Tokenizer metadata. |
| 28 | 4 | hidden_size | Nonzero. |
| 32 | 4 | layer_count | Nonzero. |
| 36 | 4 | head_count | Nonzero. |
| 40 | 4 | kv_head_count | Nonzero; current forward path also requires equality with head_count. |
| 44 | 4 | head_dim | Nonzero and head_count × head_dim = hidden_size. |
| 48 | 4 | ffn_size | Nonzero. |
| 52 | 4 | max_context | Nonzero. |
| 56 | 4 | rope_theta | f32 bits. |
| 60 | 4 | rms_norm_epsilon | f32 bits. |
| 64 | 8 | tokenizer_offset | Range must fit file. |
| 72 | 8 | tokenizer_length | Range must fit file. |
| 80 | 8 | tensor_directory_offset | Directory range must fit file. |
| 88 | 4 | tensor_count | Multiplied by 64 with checked range. |
| 92 | 8 | tensor_data_offset | Must be divisible by 64. |
| 100 | 8 | checksum | Nonzero and equal custom checksum. |
Forward compatibility
The parser allows a header length greater than 108 but does not parse extension fields. A future version should define extension ownership, unknown-flag behavior, minimum reader version, and canonical serialization.
Integer safety
Offset plus length uses checked arithmetic. u64 offsets are converted to host usize with explicit failure. Header dimensions are not trusted to size allocations until later shape checks.
Scope
This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.
Engineering considerations
- Identify the source, version, target environment, and owner.
- Separate observed values from estimates and externally reported values.
- Record trade-offs, unsupported cases, and fallback behavior.
- Link performance statements to a compatible benchmark methodology.
Verification questions
- What exact artifact, revision, backend, and environment were reviewed?
- Which assumptions could change the result?
- Which data should be retained so another engineer can reproduce the conclusion?