Defines how BOS/EOS are produced, why prefill removes the final EOS, when generation stops, and how the last prompt slot is treated.
Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.
Tokenizer output
Both tokenizer paths follow the same explicit sequence contract. The byte tokenizer reserves token IDs 256–259 for BOS, EOS, PAD, and UNK, in that order, so BOS is 256 and EOS is 257. BPE1 keeps the same reserved base and adds the entries declared in its token table.
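Below is a minimal Rust sketch of this reserved-ID layout. It assumes that raw bytes occupy IDs 0–255 and that the byte tokenizer frames every sequence as BOS, bytes, EOS. The constant names and the `encode_bytes` and `is_special` helpers are hypothetical and are not taken from the reviewed code.

```rust
// Hypothetical constants mirroring the reserved IDs described above.
// Assumption: byte values occupy 0..=255, so the specials start at 256.
pub const BOS: u32 = 256;
pub const EOS: u32 = 257;
pub const PAD: u32 = 258;
pub const UNK: u32 = 259;

/// Hypothetical byte-level encoder: each UTF-8 byte becomes its own ID,
/// and the sequence is framed as BOS, bytes..., EOS (assumed framing).
pub fn encode_bytes(text: &str) -> Vec<u32> {
    let mut ids = Vec::with_capacity(text.len() + 2);
    ids.push(BOS);
    ids.extend(text.bytes().map(u32::from));
    ids.push(EOS);
    ids
}

/// True for the four reserved structural IDs (256..=259).
pub fn is_special(id: u32) -> bool {
    (BOS..=UNK).contains(&id)
}
```

A BPE vocabulary built on this base would place its declared table entries after 259. That way the four structural IDs keep the same meaning across both tokenizer paths.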
Prefill slice
prompt_tokens_without_eos removes one terminal EOS before model prefill. BOS remains part of the model context.
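The prefill slice can be sketched as follows. The signature shown for `prompt_tokens_without_eos` is an assumption. This version returns the input unchanged when there is no trailing EOS, and the reviewed code may handle that case differently.

```rust
/// Assumed EOS ID, taken from the reserved-ID layout (256 = BOS, 257 = EOS).
pub const EOS: u32 = 257;

/// Sketch only: drop exactly one trailing EOS, if present, so that the model
/// is prefilled with BOS plus the prompt body. BOS is never touched.
pub fn prompt_tokens_without_eos(tokens: &[u32]) -> &[u32] {
    match tokens.split_last() {
        // Remove a single terminal EOS; earlier EOS tokens are left in place.
        Some((&last, rest)) if last == EOS => rest,
        // No terminal EOS: pass the sequence through unchanged.
        _ => tokens,
    }
}
```

Returning a borrowed sub-slice means the prefill path does not copy the prompt.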
Capacity rule
A prompt whose length is greater than or equal to max_context is rejected, because generation needs at least one free position beyond the prefill boundary. During decode, reaching capacity ends the loop normally rather than raising an additional error.
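The sketch below covers both halves of the capacity rule. `PrefillError`, `check_prompt_fits`, and `generate` are hypothetical names. The length check applies to whatever slice is passed in, which is assumed to be the prefill slice. Stop-token handling is deliberately left out so that only the capacity bound is visible.

```rust
/// Hypothetical error returned when a prompt leaves no room to generate.
#[derive(Debug, PartialEq)]
pub enum PrefillError {
    PromptTooLong { len: usize, max_context: usize },
}

/// Admission check: len >= max_context is rejected, because at least one
/// position must remain after the prefill boundary.
pub fn check_prompt_fits(len: usize, max_context: usize) -> Result<(), PrefillError> {
    if len >= max_context {
        return Err(PrefillError::PromptTooLong { len, max_context });
    }
    Ok(())
}

/// Decode loop bounded only by capacity. When the context is full, the loop
/// ends and the tokens produced so far are returned without an error.
pub fn generate<F>(prompt: &[u32], max_context: usize, mut next_token: F) -> Result<Vec<u32>, PrefillError>
where
    F: FnMut(&[u32]) -> u32,
{
    check_prompt_fits(prompt.len(), max_context)?;
    let mut context = prompt.to_vec();
    let mut generated = Vec::new();
    while context.len() < max_context {
        let token = next_token(context.as_slice());
        context.push(token);
        generated.push(token);
    }
    Ok(generated)
}
```

Take a 3-token prompt with `max_context = 5`. It passes admission and yields exactly two generated tokens before the loop stops. With `max_context = 3`, the same prompt is rejected up front.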
Stop rule
Only token ID 257 (EOS) is a structural stop token. When it is produced, it is appended neither to the generated output nor to the continuation token vector.
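The stop rule can be shown with a decode loop that consumes IDs that have already been sampled. `decode_until_eos` and its `max_steps` bound are hypothetical. PAD (258) and UNK (259) are treated as ordinary output here, because only 257 is named as a stop token.

```rust
/// Structural stop token (EOS) from the reserved-ID layout.
pub const EOS: u32 = 257;

/// Sketch: consume sampled IDs until EOS appears or `max_steps` is reached.
/// EOS ends the loop and is never pushed onto the continuation vector.
pub fn decode_until_eos<I>(sampled: I, max_steps: usize) -> Vec<u32>
where
    I: IntoIterator<Item = u32>,
{
    let mut continuation = Vec::new();
    for token in sampled.into_iter().take(max_steps) {
        if token == EOS {
            break; // structural stop: not recorded
        }
        continuation.push(token);
    }
    continuation
}
```

Any tokens sampled after EOS are never consumed.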
Scope
This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.
Engineering considerations
- Identify the source, version, target environment, and owner.
- Separate observed values from estimates and externally reported values.
- Record trade-offs, unsupported cases, and fallback behavior.
- Link performance statements to a compatible benchmark methodology.
Verification questions
- What exact artifact, revision, backend, and environment were reviewed?
- Which assumptions could change the result?
- Which data should be retained so another engineer can reproduce the conclusion?