Prompt token boundary and EOS handling

Defines how BOS/EOS are produced, why prefill removes the final EOS, when generation stops, and how the last prompt slot is treated.

Experimental

Last verified: 2026-06-25 00:00 UTC
Updated: 2026-06-25

Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.

Tokenizer output

Both tokenizer paths (byte and BPE1) produce token sequences with an explicit BOS/EOS contract. The byte tokenizer reserves token IDs 256–259 for BOS, EOS, PAD, and UNK, respectively, so EOS is token ID 257. BPE1 retains the same reserved base and adds the entries declared in its token table.
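The reserved ID layout can be illustrated with a minimal sketch. The constant names and the encode_bytes function are hypothetical and are not taken from the reviewed source. The sketch also assumes that each UTF-8 byte maps to the token ID equal to its value (0–255) and that the byte path wraps the sequence in BOS and EOS.

```rust
/// Reserved IDs as listed above. The constant names are illustrative.
pub const BOS: u32 = 256;
pub const EOS: u32 = 257;
pub const PAD: u32 = 258;
pub const UNK: u32 = 259;

/// Hypothetical byte-level encoder: BOS, one token per UTF-8 byte, then EOS.
/// Assumes byte value == token ID for IDs 0..=255.
pub fn encode_bytes(text: &str) -> Vec<u32> {
    let mut tokens = Vec::with_capacity(text.len() + 2);
    tokens.push(BOS);
    tokens.extend(text.bytes().map(u32::from));
    tokens.push(EOS);
    tokens
}
```

Under these assumptions, encoding "hi" yields four tokens: BOS, the two byte values, and EOS.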

Prefill slice

prompt_tokens_without_eos removes one terminal EOS before model prefill. BOS remains part of the model context.
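A minimal sketch of such a slice follows. Only the function name comes from the text above; the signature, the use of u32 token IDs, and the behavior when no trailing EOS is present (returning the input unchanged) are assumptions.

```rust
/// EOS token ID, as stated in the stop rule of this page.
const EOS: u32 = 257;

/// Sketch of the prefill slice: drop exactly one trailing EOS if present.
/// BOS and every other token are left untouched.
pub fn prompt_tokens_without_eos(tokens: &[u32]) -> &[u32] {
    match tokens.split_last() {
        // The last token is EOS: return everything before it.
        Some((&last, rest)) if last == EOS => rest,
        // Empty input, or no trailing EOS: return the input as-is.
        _ => tokens,
    }
}
```

Returning a borrowed slice avoids copying the prompt; a version that allocates a new vector would satisfy the same description.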

Capacity rule

A prompt whose length is greater than or equal to max_context is rejected, because generation requires at least one free position beyond the prefill boundary. During decode, reaching capacity ends the loop normally; it is not reported as an additional error.
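The rejection check can be sketched as below. The PromptError type and the check_prompt_capacity function are hypothetical names. Whether the reviewed code measures the prompt before or after the EOS removal is not stated here, so the sketch simply takes the length as a parameter.

```rust
/// Hypothetical error type for a prompt that leaves no room to generate.
#[derive(Debug, PartialEq, Eq)]
pub enum PromptError {
    TooLong { len: usize, max_context: usize },
}

/// Rejects prompts with `prompt_len >= max_context`.
/// On success, returns how many positions remain for decoding.
pub fn check_prompt_capacity(
    prompt_len: usize,
    max_context: usize,
) -> Result<usize, PromptError> {
    if prompt_len >= max_context {
        return Err(PromptError::TooLong { len: prompt_len, max_context });
    }
    // At least one position is free here, so the subtraction cannot underflow.
    Ok(max_context - prompt_len)
}
```

With max_context set to 8, a prompt of length 7 is accepted with one decode position left, while a prompt of length 8 is rejected.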

Stop rule

Only token ID 257 (EOS) is a structural stop token. It ends generation and is not appended to the generated output or to the continuation token vector.
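A decode loop that follows the stop rule might look like the sketch below. The generate function and its next_token callback are hypothetical stand-ins for a model step, and the exact position accounting at the capacity boundary is an assumption rather than the reviewed implementation.

```rust
/// EOS token ID: the only structural stop token.
const EOS: u32 = 257;

/// Hypothetical decode loop. `next_token` stands in for one model step and
/// receives the continuation produced so far.
pub fn generate<F>(prefill_len: usize, max_context: usize, mut next_token: F) -> Vec<u32>
where
    F: FnMut(&[u32]) -> u32,
{
    let mut continuation = Vec::new();
    // Reaching capacity ends the loop without raising an error.
    while prefill_len + continuation.len() < max_context {
        let token = next_token(&continuation);
        if token == EOS {
            // Structural stop: EOS ends generation and is never stored.
            break;
        }
        continuation.push(token);
    }
    continuation
}
```

In this sketch, a model step that returns 104, 105, and then 257 produces the continuation [104, 105], and a step that never returns EOS stops once prefill_len plus the continuation length reaches max_context.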

Scope

This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.

Engineering considerations

Identify the source, version, target environment, and owner.
Separate observed values from estimates and externally reported values.
Record trade-offs, unsupported cases, and fallback behavior.
Link performance statements to a compatible benchmark methodology.

Verification questions

What exact artifact, revision, backend, and environment were reviewed?
Which assumptions could change the result?
Which data should be retained so another engineer can reproduce the conclusion?
