Model-load transaction and rollback semantics

Traces model admission step by step, from raw bytes through parsing, tensor materialization, and scratch and KV allocation, and lists the state cleared when any stage fails.

Experimental
Last verified: 2026-06-25 00:00 UTC

Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.

Transaction stages

  1. Model::load parses SLM1 metadata and validates required tensors and shapes.
  2. The loader selects BTOK or BPE1 and materializes f32, q8_0, or q4_0 storage.
  3. ForwardScratch::new allocates reusable compute buffers and rejects any model whose attention head count differs from its KV head count.
  4. KvCache::new allocates key and value stores for the complete declared context.
  5. The runtime allocates one reusable logits vector with vocab_size elements.
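
The allocation stages (3 to 5) can be sketched as follows. This is a minimal illustration, not the reviewed source: the Dims and RuntimeBuffers types, their field names, and the per-layer KV layout (layers × context × KV heads × head dimension) are assumptions made for the example. Only the head-count rejection, the full-context KV allocation, and the vocab_size-length logits vector come from the stages above.

```rust
/// Hypothetical subset of the model header; field names are assumptions.
#[derive(Debug, Clone, Copy)]
pub struct Dims {
    pub n_layers: usize,
    pub n_heads: usize,
    pub n_kv_heads: usize,
    pub head_dim: usize,
    pub context_len: usize,
    pub vocab_size: usize,
}

/// Hypothetical container for the buffers allocated after the model.
#[derive(Debug)]
pub struct RuntimeBuffers {
    pub keys: Vec<f32>,
    pub values: Vec<f32>,
    pub logits: Vec<f32>,
}

/// Stage 3 check: unequal attention and KV head counts are rejected.
pub fn check_heads(d: &Dims) -> Result<(), String> {
    if d.n_heads != d.n_kv_heads {
        return Err(format!(
            "unsupported head layout: {} attention heads, {} KV heads",
            d.n_heads, d.n_kv_heads
        ));
    }
    Ok(())
}

/// Element count of one KV store for the complete declared context.
/// Checked multiplication turns an oversized header into an error
/// instead of a wrapped (too small) allocation.
pub fn kv_elements(d: &Dims) -> Result<usize, String> {
    d.n_layers
        .checked_mul(d.context_len)
        .and_then(|n| n.checked_mul(d.n_kv_heads))
        .and_then(|n| n.checked_mul(d.head_dim))
        .ok_or_else(|| "KV cache size overflows usize".to_string())
}

/// Stages 3 to 5 in order; the first failure returns before any later
/// buffer is allocated.
pub fn allocate_buffers(d: &Dims) -> Result<RuntimeBuffers, String> {
    check_heads(d)?;
    let n = kv_elements(d)?;
    Ok(RuntimeBuffers {
        keys: vec![0.0; n],
        values: vec![0.0; n],
        logits: vec![0.0; d.vocab_size],
    })
}
```

With 2 layers, a context of 16, 4 KV heads, and a head dimension of 8, each store in this sketch holds 2 × 16 × 4 × 8 = 1024 elements. Because the stores cover the whole declared context up front, the allocation cost is paid at load time and not during generation.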

Commit point

The model becomes visible only after the model, scratch, and cache stages have all succeeded. At that point token and generation state are cleared, the tokenizer is cloned from the model, diagnostics report the quantization mode, and the load result is reported as model loaded.
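
One way to express such a commit point is to build every part in local variables and publish them in a single infallible step. The sketch below assumes hypothetical Runtime, Model, Scratch, Cache, and Tokenizer types and a diagnostics string format. Setting model_loaded to true on commit is also an assumption, inferred from the flag that rollback resets.

```rust
/// Hypothetical tokenizer handle; `kind` would hold BTOK or BPE1.
#[derive(Debug, Clone, PartialEq)]
pub struct Tokenizer {
    pub kind: String,
}

/// Hypothetical stand-ins for the three parts built before commit.
pub struct Model {
    pub tokenizer: Tokenizer,
    pub quant: String, // e.g. "f32", "q8_0", "q4_0"
}
pub struct Scratch;
pub struct Cache;

/// Hypothetical runtime state; every field name is an assumption.
#[derive(Default)]
pub struct Runtime {
    pub model: Option<Model>,
    pub scratch: Option<Scratch>,
    pub cache: Option<Cache>,
    pub tokenizer: Option<Tokenizer>,
    pub tokens: Vec<u32>,
    pub diagnostics: String,
    pub model_loaded: bool,
}

impl Runtime {
    /// Publishes fully built parts. Nothing in this function can fail,
    /// so callers never observe a half-installed model.
    pub fn commit(&mut self, model: Model, scratch: Scratch, cache: Cache) {
        // Token and generation state from any earlier session is dropped.
        self.tokens.clear();
        // The runtime keeps its own tokenizer copy, cloned from the model.
        self.tokenizer = Some(model.tokenizer.clone());
        self.diagnostics = format!("quantization: {}", model.quant);
        self.model = Some(model);
        self.scratch = Some(scratch);
        self.cache = Some(cache);
        self.model_loaded = true;
    }
}
```

Taking the three parts by value means the caller must already hold a successful result for each stage before commit can be called at all, which lets the type system enforce the ordering.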

Rollback

Any parser, storage, scratch, or cache failure invokes clear_model_state, marks model_loaded=false, records a stable error message, and leaves no partially accepted model.
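
The failure path can be sketched in the same spirit. Apart from clear_model_state and model_loaded, every name here is hypothetical: RuntimeState, LoadError, load_with, last_error, and the message strings are illustrative, and a byte vector stands in for the real model. The sketch also assumes that a failed load discards any previously loaded model, which the description above implies but does not state outright.

```rust
/// Hypothetical failure classes, one per stage that can fail.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LoadError {
    Parse,
    Storage,
    Scratch,
    Cache,
}

impl LoadError {
    /// Fixed text per failure class, so the recorded message stays the
    /// same across runs and callers can compare against it.
    pub fn message(self) -> &'static str {
        match self {
            LoadError::Parse => "model parse failed",
            LoadError::Storage => "tensor storage failed",
            LoadError::Scratch => "scratch allocation failed",
            LoadError::Cache => "cache allocation failed",
        }
    }
}

/// Hypothetical runtime state; a byte vector stands in for the model.
#[derive(Default)]
pub struct RuntimeState {
    pub model: Option<Vec<u8>>,
    pub tokens: Vec<u32>,
    pub model_loaded: bool,
    pub last_error: Option<String>,
}

impl RuntimeState {
    /// Drops everything tied to a model, loaded or partially built.
    pub fn clear_model_state(&mut self) {
        self.model = None;
        self.tokens.clear();
    }

    /// Runs `build` (all fallible stages) and either commits the result
    /// or rolls back. Returns whether a model is loaded afterwards.
    pub fn load_with<F>(&mut self, build: F) -> bool
    where
        F: FnOnce() -> Result<Vec<u8>, LoadError>,
    {
        match build() {
            Ok(model) => {
                self.tokens.clear();
                self.model = Some(model);
                self.last_error = None;
                self.model_loaded = true;
            }
            Err(e) => {
                self.clear_model_state();
                self.model_loaded = false;
                self.last_error = Some(e.message().to_string());
            }
        }
        self.model_loaded
    }
}
```

Routing every stage failure through one Err arm keeps the cleanup in a single place, so a new stage cannot fail without also triggering the rollback.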

Scope

This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.

Engineering considerations

  • Identify the source, version, target environment, and owner.
  • Separate observed values from estimates and externally reported values.
  • Record trade-offs, unsupported cases, and fallback behavior.
  • Link performance statements to a compatible benchmark methodology.

Verification questions

  • What exact artifact, revision, backend, and environment were reviewed?
  • Which assumptions could change the result?
  • Which data should be retained so another engineer can reproduce the conclusion?