Documents greedy mode, stochastic constraints, top-k retention, top-p truncation, fallback behavior, and the fixed 1,024-candidate memory bound.
Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.
Defaults
The defaults are temperature 0, top-k 1, top-p 1, and seed 1. Together they select deterministic argmax (greedy) decoding.
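For illustration, greedy selection can be sketched as a single pass over the logits. This is a minimal sketch, not the reviewed implementation: the function name `argmax`, the lowest-index tie-break, and the choice to skip NaN values are all assumptions.

```rust
/// Hypothetical helper: index of the largest logit, or `None` if the slice
/// is empty or contains only NaN values.
/// Assumptions of this sketch: ties resolve to the lowest index and NaN
/// entries are skipped.
fn argmax(logits: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &x) in logits.iter().enumerate() {
        if x.is_nan() {
            continue;
        }
        // Replace the current best only on a strictly larger value.
        let better = match best {
            Some((_, b)) => x > b,
            None => true,
        };
        if better {
            best = Some((i, x));
        }
    }
    best.map(|(i, _)| i)
}
```

Because no random draw is involved, the seed has no effect on the result in this sketch.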
Validation
Temperature must be finite and nonnegative. Top-p must be finite and in (0, 1]. Top-k must be between 1 and 1,024.
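The rules above could be checked along the following lines. The struct, field names, error enum, and `validate` function are hypothetical and chosen only for this sketch. It also assumes that the top-k range is inclusive at both ends, which the text does not state explicitly.

```rust
/// Hypothetical parameter bundle; names are illustrative, not taken from the source.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SamplerParams {
    pub temperature: f32,
    pub top_k: usize,
    pub top_p: f32,
    pub seed: u64,
}

/// Upper bound on top-k, matching the documented 1,024-candidate cap.
pub const MAX_TOP_K: usize = 1024;

#[derive(Debug, PartialEq, Eq)]
pub enum ParamError {
    Temperature,
    TopP,
    TopK,
}

impl Default for SamplerParams {
    /// Documented defaults: temperature 0, top-k 1, top-p 1, seed 1.
    fn default() -> Self {
        SamplerParams { temperature: 0.0, top_k: 1, top_p: 1.0, seed: 1 }
    }
}

/// Applies the three documented checks in order.
pub fn validate(p: &SamplerParams) -> Result<(), ParamError> {
    // Temperature: finite and nonnegative.
    if !p.temperature.is_finite() || p.temperature < 0.0 {
        return Err(ParamError::Temperature);
    }
    // Top-p: finite and in (0, 1].
    if !p.top_p.is_finite() || p.top_p <= 0.0 || p.top_p > 1.0 {
        return Err(ParamError::TopP);
    }
    // Top-k: 1 through 1,024 (inclusive bounds are an assumption).
    if p.top_k < 1 || p.top_k > MAX_TOP_K {
        return Err(ParamError::TopK);
    }
    Ok(())
}
```

In this sketch the default parameters pass validation, while a NaN temperature, a top-p of 0, or a top-k of 1,025 is rejected.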
Candidate algorithm
The sampler retains the top-k highest logits in two fixed-size stack arrays, exponentiates them with max-shifted temperature scaling, truncates the list at the cumulative top-p threshold, and draws a token in proportion to the retained weight mass.
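The steps can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the reviewed code. The function name `sample` is hypothetical. The two arrays are assumed to hold token ids (`u32`) and logits (`f32`). Top-k retention is assumed to use a sorted insertion. The random draw is passed in as `u`, a uniform value in [0, 1), instead of being produced by a seeded generator.

```rust
/// Fixed candidate cap from the text.
const CAP: usize = 1024;

/// Hypothetical sampler sketch. `u` is a caller-supplied uniform draw in [0, 1).
/// Returns `None` only when there is no usable logit.
fn sample(logits: &[f32], temperature: f32, top_k: usize, top_p: f32, u: f32) -> Option<u32> {
    let k = top_k.min(CAP).min(logits.len());
    if k == 0 {
        return None;
    }

    // Two fixed-size stack arrays, kept sorted by descending logit.
    let mut ids = [0u32; CAP];
    let mut vals = [f32::NEG_INFINITY; CAP];
    let mut n = 0usize;
    for (i, &x) in logits.iter().enumerate() {
        // Skip NaN (assumption) and anything not better than the current k-th entry.
        if x.is_nan() || (n == k && x <= vals[n - 1]) {
            continue;
        }
        // Insertion position: a new slot while filling, else the last (smallest) slot.
        let mut pos = if n < k { n } else { n - 1 };
        while pos > 0 && vals[pos - 1] < x {
            vals[pos] = vals[pos - 1];
            ids[pos] = ids[pos - 1];
            pos -= 1;
        }
        vals[pos] = x;
        ids[pos] = i as u32;
        if n < k {
            n += 1;
        }
    }
    if n == 0 {
        return None;
    }
    let greedy = ids[0];
    if temperature <= 0.0 {
        return Some(greedy); // temperature 0 selects argmax
    }

    // Max-shifted temperature exponentiation, done in place.
    let max = vals[0];
    let mut total = 0.0f32;
    for v in vals[..n].iter_mut() {
        *v = ((*v - max) / temperature).exp();
        total += *v;
    }
    // Non-finite or zero mass falls back to greedy selection.
    if !total.is_finite() || total <= 0.0 {
        return Some(greedy);
    }

    // Keep the shortest prefix whose cumulative weight reaches top_p * total.
    let threshold = top_p * total;
    let (mut kept, mut mass) = (0usize, 0.0f32);
    while kept < n {
        mass += vals[kept];
        kept += 1;
        if mass >= threshold {
            break;
        }
    }

    // Draw from the retained weight mass.
    let target = u * mass;
    let mut acc = 0.0f32;
    for j in 0..kept {
        acc += vals[j];
        if target < acc {
            return Some(ids[j]);
        }
    }
    Some(ids[kept - 1])
}
```

Subtracting the maximum before exponentiation keeps the largest weight at exactly 1, so ordinary finite logits cannot overflow. Passing `u` explicitly makes the sketch deterministic and easy to test; a real sampler would derive it from the seeded generator.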
Recovery
If the probability mass is non-finite or zero, the sampler falls back to greedy selection. The fixed 1,024-candidate cap bounds transient sampling memory independently of vocabulary size.
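As a rough illustration of the memory bound, the scratch space for two capped arrays can be computed at compile time. The element types here (`f32` weights and `u32` token ids) are assumptions of this sketch, and the function name is hypothetical; the source states only that there are two fixed-size stack arrays.

```rust
use std::mem::size_of;

/// Fixed candidate cap from the text.
const CANDIDATE_CAP: usize = 1024;

/// Bytes used by two candidate arrays, assuming f32 weights and u32 token ids.
/// The result does not depend on vocabulary size.
const fn candidate_scratch_bytes() -> usize {
    size_of::<[f32; CANDIDATE_CAP]>() + size_of::<[u32; CANDIDATE_CAP]>()
}
```

Under these assumptions the scratch space is 8,192 bytes, whether the vocabulary has 32,000 entries or several hundred thousand.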
Scope
This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.
Engineering considerations
- Identify the source, version, target environment, and owner.
- Separate observed values from estimates and externally reported values.
- Record trade-offs, unsupported cases, and fallback behavior.
- Link performance statements to a compatible benchmark methodology.
Verification questions
- What exact artifact, revision, backend, and environment were reviewed?
- Which assumptions could change the result?
- Which data should be retained so another engineer can reproduce the conclusion?