Custom model container as a proposal

This page explains how a self-describing model container could record a magic value, format version, hyperparameters, tokenizer data, tensor metadata, offsets, data types, and integrity information. The format described here remains a proposal.

Research
Last verified: Not verified

Proposed fields

A compact container may include a magic value, format version, endianness, alignment, model family, dimensions, tokenizer section, tensor directory, offsets, lengths, data types, quantization descriptors, integrity hashes, and optional provenance.
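As a minimal sketch of how a few of these fields might be laid out, the following Python fragment packs and parses a fixed-size header. Everything specific here is an assumption made for illustration and is not defined by the proposal: the magic bytes `SLMC`, the field order and widths, the little-endian encoding, and the names `Header`, `pack_header`, and `parse_header`.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout: none of these values are fixed by the proposal.
MAGIC = b"SLMC"             # assumed 4-byte magic value
HEADER_FMT = "<4sHBBIQQ"    # "<" = little-endian, no implicit padding
HEADER_SIZE = struct.calcsize(HEADER_FMT)


@dataclass(frozen=True)
class Header:
    version: int            # format version (u16)
    endianness: int         # assumed encoding: 0 = little-endian payload (u8)
    alignment_log2: int     # tensor data aligned to 2**alignment_log2 bytes (u8)
    tensor_count: int       # number of tensor directory entries (u32)
    directory_offset: int   # byte offset of the tensor directory (u64)
    directory_length: int   # byte length of the tensor directory (u64)


def pack_header(h: Header) -> bytes:
    """Serialize the header, magic first."""
    return struct.pack(
        HEADER_FMT, MAGIC, h.version, h.endianness, h.alignment_log2,
        h.tensor_count, h.directory_offset, h.directory_length,
    )


def parse_header(buf: bytes) -> Header:
    """Parse a header, rejecting truncated input and a wrong magic value."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("truncated header")
    magic, *fields = struct.unpack_from(HEADER_FMT, buf, 0)
    if magic != MAGIC:
        raise ValueError("bad magic value")
    return Header(*fields)
```

Using an explicit byte-order prefix in the format string keeps the layout independent of the host platform, which is one way to make the container self-describing rather than dependent on the machine that wrote it.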

Compatibility rule

A file extension does not establish interoperability. Publish a versioned schema, malformed-file tests, overflow checks, alignment rules, unknown-field behavior, and conversion provenance before presenting the format as supported.
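The kind of checks a malformed-file test suite would exercise can be sketched as a validator for a single tensor directory entry. This is an illustration under stated assumptions, not part of the proposal: the dtype codes, the `TensorEntry` fields, the choice to reject unknown dtypes outright, and the function name `validate_entry` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical dtype codes mapped to bytes per element (f32, f16, i8).
KNOWN_DTYPES = {0: 4, 1: 2, 2: 1}
U64_MAX = 2**64 - 1


@dataclass(frozen=True)
class TensorEntry:
    name: str
    dtype: int
    shape: tuple
    offset: int   # byte offset of the tensor data in the file
    length: int   # byte length of the tensor data


def validate_entry(e: TensorEntry, file_size: int, alignment: int) -> list:
    """Return a list of problems found; an empty list means the entry passed."""
    if e.dtype not in KNOWN_DTYPES:
        # Assumed policy: unknown dtypes are rejected, not skipped.
        return ["unknown dtype"]
    if not (0 <= e.offset <= U64_MAX and 0 <= e.length <= U64_MAX):
        return ["offset or length outside the u64 range"]
    errors = []
    if e.offset % alignment != 0:
        errors.append("misaligned offset")
    # Python integers do not wrap, so this sum is exact; a fixed-width
    # implementation would need a checked addition here.
    if e.offset + e.length > file_size:
        errors.append("data extends past end of file")
    if any(dim < 0 for dim in e.shape):
        errors.append("negative dimension")
        return errors
    count = 1
    for dim in e.shape:
        count *= dim
    if count * KNOWN_DTYPES[e.dtype] != e.length:
        errors.append("length does not match shape and dtype")
    return errors
```

Returning a list of problems instead of raising on the first one makes it easier to write one test file per failure mode and to assert on the exact diagnosis.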

Naming caution

Choose an extension only after checking for collisions with existing formats, ecosystem conventions, and trademark concerns. The research label “.slm” is a proposal, not a MiRust standard.

Scope

This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.

Engineering considerations

  • Identify the source, version, target environment, and owner.
  • Separate observed values from estimates and externally reported values.
  • Record trade-offs, unsupported cases, and fallback behavior.
  • Link performance statements to a compatible benchmark methodology.
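One possible way to keep these distinctions machine-checkable is to record each statement with its origin and methodology. The sketch below is illustrative only: the `Origin` categories mirror the list above, while the `Claim` fields and the `publishable` rule (observed values with a linked methodology) are assumptions, not a defined policy.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    OBSERVED = "observed"     # measured directly on the reviewed artifact
    ESTIMATED = "estimated"   # derived or extrapolated
    REPORTED = "reported"     # taken from an external source


@dataclass(frozen=True)
class Claim:
    metric: str
    value: float
    unit: str
    origin: Origin
    source: str               # artifact, revision, or external reference
    owner: str                # who is responsible for the statement
    methodology: str = ""     # benchmark methodology identifier, if any


def publishable(claim: Claim) -> bool:
    """Assumed rule: only observed values tied to a methodology qualify."""
    return claim.origin is Origin.OBSERVED and bool(claim.methodology)
```

Keeping the origin as an explicit field means an estimate cannot be silently promoted to a measured result when a statement is copied between documents.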

Verification questions

  • What exact artifact, revision, backend, and environment were reviewed?
  • Which assumptions could change the result?
  • Which data should be retained so another engineer can reproduce the conclusion?