Container checksum and integrity limits

Explains the custom checksum algorithm, covered bytes, accidental-corruption value, cryptographic limitations, and required signed-manifest upgrade.

Status: Experimental
Last verified: 2026-06-25 00:00 UTC

Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.

Current checksum

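The header stores a 64-bit, project-specific rolling checksum. The calculation starts from a fixed constant and walks the covered bytes, treating the checksum field itself (bytes 100–107) as zero. For each byte it mixes the byte value and its index into the running state, rotates the state, and multiplies by the FNV prime.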
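The shape of such a checksum can be sketched in a few lines of Python. This is only an illustration of the steps described above, not the project's code: the start constant, the rotation amount, and the exact way byte and index are combined are assumptions, so the values it produces will not match real container files. The 64-bit FNV prime is the standard one.

```python
MASK64 = (1 << 64) - 1
FNV_PRIME_64 = 0x100000001B3       # standard 64-bit FNV prime
SEED = 0xCBF29CE484222325          # ASSUMPTION: placeholder start constant
ROTATE_BITS = 5                    # ASSUMPTION: rotation amount is not stated
CHECKSUM_FIELD = range(100, 108)   # bytes 100-107 hold the stored checksum


def rotl64(value: int, bits: int) -> int:
    """Rotate a 64-bit value left by `bits`."""
    return ((value << bits) | (value >> (64 - bits))) & MASK64


def sketch_checksum(data: bytes) -> int:
    """Illustrative rolling checksum; NOT the project's exact mixing."""
    state = SEED
    for index, byte in enumerate(data):
        if index in CHECKSUM_FIELD:
            byte = 0  # the checksum field is treated as zero
        state ^= (byte + index) & MASK64         # ASSUMPTION: mix byte and index
        state = rotl64(state, ROTATE_BITS)       # rotate the running state
        state = (state * FNV_PRIME_64) & MASK64  # multiply by the FNV prime
    return state
```

Because bytes 100–107 are zeroed during the walk, the checksum can be computed first and written into the field afterwards without changing its own value.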

What it catches

It detects ordinary truncation and many accidental byte changes before tensor decoding begins. The parser rejects a stored checksum that is zero or that does not match the recomputed value, returning InvalidHeader.
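A minimal Python sketch of that rejection rule follows. The exception class is a stand-in for the parser's InvalidHeader error, the checksum function is passed in by the caller, and both the little-endian field encoding and the explicit length check are assumptions that the text above does not confirm.

```python
import struct
from typing import Callable

CHECKSUM_OFFSET = 100  # the checksum occupies bytes 100-107


class InvalidHeader(Exception):
    """Stand-in for the parser's InvalidHeader error."""


def check_header(data: bytes, compute: Callable[[bytes], int]) -> None:
    """Raise InvalidHeader for a zero or mismatched stored checksum.

    `compute` is a caller-supplied function that recomputes the checksum
    while treating bytes 100-107 as zero.
    """
    # ASSUMPTION: a file too short to hold the field is also rejected here.
    if len(data) < CHECKSUM_OFFSET + 8:
        raise InvalidHeader("file too short to contain a checksum field")
    # ASSUMPTION: the field is stored as a little-endian unsigned 64-bit value.
    (stored,) = struct.unpack_from("<Q", data, CHECKSUM_OFFSET)
    if stored == 0:
        raise InvalidHeader("checksum field is zero")
    if stored != compute(data):
        raise InvalidHeader("checksum mismatch")
```

Rejecting zero explicitly means a file whose checksum field was never filled in fails the same way as a corrupted one.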

What it does not prove

  • Publisher identity or authenticity.
  • Collision resistance against a motivated attacker.
  • License, model lineage, or quality.
  • Agreement with a sidecar manifest.

Required trust upgrade

Use a SHA-256 digest of the final artifact together with a signed canonical manifest that binds the path, size, SLM version, model dimensions, tokenizer hash, source lineage, quality scope, and runtime compatibility. Verify both before calling load_model, then retain the internal checksum as a fast format-consistency check.
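The digest-and-size part of that check could look like the Python sketch below. It assumes the manifest signature has already been verified by the caller, and the keys "size" and "sha256" are hypothetical names chosen for illustration, not a documented manifest schema.

```python
import hashlib
import hmac
from pathlib import Path


def artifact_matches_manifest(path: Path, entry: dict) -> bool:
    """Compare on-disk size and SHA-256 with an already-verified manifest entry.

    ASSUMPTION: `entry` uses the hypothetical keys "size" (bytes) and
    "sha256" (hex digest), and its signature was checked beforehand.
    """
    if path.stat().st_size != entry["size"]:
        return False
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Hash in 1 MiB chunks so large model files are not read at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return hmac.compare_digest(digest.hexdigest(), entry["sha256"].lower())
```

Only when this returns True would the caller go on to load_model. The remaining manifest fields (SLM version, model dimensions, tokenizer hash, and so on) need their own comparisons against the parsed header and are not covered by this sketch.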

Scope

This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.

Engineering considerations

  • Identify the source, version, target environment, and owner.
  • Separate observed values from estimates and externally reported values.
  • Record trade-offs, unsupported cases, and fallback behavior.
  • Link performance statements to a compatible benchmark methodology.

Verification questions

  • What exact artifact, revision, backend, and environment were reviewed?
  • Which assumptions could change the result?
  • Which data should be retained so another engineer can reproduce the conclusion?