Implementation evidence / Source snapshot 2026-06-25
The system, as implemented.
This section documents the inspected GGUF.MiRust.com source package at the code, binary-format, runtime, browser, test, and artifact levels. It replaces architecture-by-aspiration with an explicit inventory of what exists, what was independently rechecked, and what remains absent.
Evidence boundary
What this release can now say
Statements below are grounded in the inspected source archive. “Implemented” means source and artifacts are present. “Rechecked” means an additional local check was run against the supplied bytes. Neither status means production deployment or assistant quality.
Browser-local Rust/WASM runtime
A 114 KiB compiled WebAssembly module exposes a raw C-style ABI for allocation, model loading, generation, diagnostics, reset, stepping, and model release.
Custom .slm v1 model container
The parser validates a 108-byte header, tokenizer section, 64-byte tensor entries, aligned tensor payloads, dtype contracts, shapes, and a custom non-cryptographic checksum.
Scalar transformer execution
The runtime executes embedding lookup, RMSNorm, Q/K/V projection, RoPE, causal attention, KV caching, output projection, residual flow, SwiGLU FFN, final normalization, logits, and sampling.
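To make the normalization step concrete, a scalar RMSNorm might look like the sketch below. The function name, signature, and slice-based layout are illustrative assumptions and are not taken from the inspected source; only the operation itself (RMSNorm with an epsilon) is named in the text.

```rust
/// Scalar RMSNorm: out[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i].
/// Hypothetical helper; the name and signature are assumptions for illustration.
pub fn rms_norm(x: &[f32], weight: &[f32], eps: f32, out: &mut [f32]) {
    assert_eq!(x.len(), weight.len());
    assert_eq!(x.len(), out.len());
    if x.is_empty() {
        return;
    }
    // Accumulate the sum of squares with a plain scalar loop.
    let mut sum_sq = 0.0f32;
    for &v in x {
        sum_sq += v * v;
    }
    // Reciprocal of the root mean square, stabilized by eps.
    let inv_rms = 1.0 / (sum_sq / x.len() as f32 + eps).sqrt();
    for i in 0..x.len() {
        out[i] = x[i] * inv_rms * weight[i];
    }
}
```

The epsilon here would correspond to the RMS epsilon field carried in the model header.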
F32, Q8_0, and Q4_0 storage
Quantized matrices retain packed values and scales, dispatch directly to quantized matrix-vector kernels, and avoid full decoded f32 shadow copies.
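A direct quantized matrix-vector product, with no decoded f32 copy of the matrix, could be sketched as follows. This assumes a row-major Q8_0-style layout with 32 int8 values per block and one f32 scale per block stored in a separate array; the actual block size is read from the tensor entry, and the scale encoding is not described in this document, so both are assumptions.

```rust
/// Assumed block length; the real value is stored in each tensor entry.
pub const BLOCK: usize = 32;

/// y = W * x for a row-major matrix stored as int8 values plus one scale per block.
/// Hypothetical kernel: the names and the f32 scale type are illustrative assumptions.
pub fn matvec_q8(
    rows: usize,
    cols: usize,
    qvals: &[i8],
    scales: &[f32],
    x: &[f32],
    y: &mut [f32],
) {
    assert_eq!(cols % BLOCK, 0);
    let blocks_per_row = cols / BLOCK;
    assert_eq!(qvals.len(), rows * cols);
    assert_eq!(scales.len(), rows * blocks_per_row);
    assert_eq!(x.len(), cols);
    assert_eq!(y.len(), rows);

    for r in 0..rows {
        let mut acc = 0.0f32;
        for b in 0..blocks_per_row {
            let base = r * cols + b * BLOCK;
            // Dot product over the packed values of one block.
            let mut block_sum = 0.0f32;
            for i in 0..BLOCK {
                block_sum += qvals[base + i] as f32 * x[b * BLOCK + i];
            }
            // Apply the block scale once per block rather than per element.
            acc += block_sum * scales[r * blocks_per_row + b];
        }
        y[r] = acc;
    }
}
```

Scaling once per block is what lets the kernel work on the packed values directly instead of materializing a dequantized row.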
WASM ABI recovery path
The supplied q8 artifact passed a local Node WebAssembly smoke test covering invalid UTF-8, null pointers, invalid lengths, sampling rejection, recovery generation, step-token generation, model free, and post-free rejection.
Modular Teleodynamic runtime
The source does not yet implement model registries, multi-model routing, adapters, structural operators, endogenous resource transitions, or a Teleodynamic slow loop. Those remain next-system work.
Actual workspace
Four zero-dependency Rust crates
tinyrustlm-runtime
WebAssembly and native library containing the model parser, tokenizer, tensors, quantized kernels, transformer execution, sampling, diagnostics, and eval runner.
tinyrustlm-slm-pack
Writes fixtures, validates .slm admission, validates provenance manifests, converts raw f32 trained sources, emits quantized variants, and enforces quality gates.
tinyrustlm-local-server
Loopback-only static server supporting GET and HEAD, percent decoding, traversal rejection, content types, and the app/WASM/model route surface.
tinyrustlm-browser-harness
Static contract crawler and optional loopback probe checking required DOM identifiers, local-only policy, WASM call markers, model routes, manifests, and content types.
Dependency finding: The workspace lockfile contains only these four local packages. No third-party Rust crates are declared. This narrows supply-chain exposure but shifts all parser, math, server, and test-harness correctness into project-owned code.
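As an example of what project-owned correctness means for the local server, percent decoding and traversal rejection can be written with the standard library alone. The sketch below is not the server's code: `percent_decode` and `is_safe_path` are hypothetical names, and the exact rejection rules (backslashes, NUL bytes, `..` segments) are assumptions.

```rust
/// Decode %XX escapes. Returns None for malformed escapes or non-UTF-8 output.
pub fn percent_decode(input: &str) -> Option<String> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // Both hex digits must be present and valid.
            let hi = hex_val(*bytes.get(i + 1)?)?;
            let lo = hex_val(*bytes.get(i + 2)?)?;
            out.push(hi * 16 + lo);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}

fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Reject decoded paths that could escape the static root (assumed policy).
pub fn is_safe_path(decoded: &str) -> bool {
    decoded.starts_with('/')
        && !decoded.contains('\\')
        && !decoded.contains('\0')
        && decoded.split('/').all(|segment| segment != "..")
}
```

Checking the path only after decoding matters: an encoded `%2e%2e` segment would otherwise slip past a check applied to the raw request path.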
Execution path
Prompt to token, without a framework
- 01
Boot static assets
Handwritten JavaScript fetches the local WASM module and instantiates it with no imports.
- 02
Transfer the selected artifact
The browser fetches the entire local .slm file, allocates equal-size WASM memory, copies all bytes, and calls load_model.
- 03
Validate and materialize
Rust checks magic, version, header, checksum, tokenizer, tensor directory, ranges, dtypes, required names, and exact shapes; then creates runtime-owned tensor storage.
- 04
Encode and prefill
The selected BTOK or BPE1 tokenizer adds BOS/EOS. Generation removes the terminal EOS and forwards each prompt token to build the KV cache.
- 05
Decode autoregressively
Each new token runs the full four-layer scalar transformer, samples from logits, appends to runtime state, and stops at EOS, context capacity, or the requested token count.
- 06
Return bounded state
The browser reads result bytes and diagnostics JSON from WASM memory, renders the response, benchmark panel, provenance sidecar, and local transcript.
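The prefill and decode steps above can be sketched as a single loop. This is a simplified illustration under stated assumptions: `forward` is a hypothetical closure standing in for one transformer step that returns logits, greedy argmax stands in for the runtime's sampler, and the EOS token is assumed not to be appended to the output.

```rust
/// Reason the loop ended.
#[derive(Debug, PartialEq)]
pub enum Stop {
    Eos,
    ContextFull,
    MaxTokens,
    EmptyPrompt,
}

/// Greedy argmax over logits (a stand-in; the runtime's sampler may differ).
pub fn argmax(logits: &[f32]) -> u32 {
    let mut best = 0usize;
    for i in 1..logits.len() {
        if logits[i] > logits[best] {
            best = i;
        }
    }
    best as u32
}

/// `prompt` is assumed to be already encoded as BOS .. EOS.
/// `forward(token, position)` is a hypothetical step that updates the KV cache
/// and returns the logits for the next position.
pub fn generate<F>(
    mut forward: F,
    prompt: &[u32],
    eos: u32,
    context: usize,
    max_new: usize,
) -> (Vec<u32>, Stop)
where
    F: FnMut(u32, usize) -> Vec<f32>,
{
    // Remove the terminal EOS so the model continues the prompt.
    let prompt = match prompt.split_last() {
        Some((&last, rest)) if last == eos => rest,
        _ => prompt,
    };
    let mut pos = 0usize;
    let mut logits: Vec<f32> = Vec::new();
    // Prefill: forward every prompt token to build the KV cache.
    for &tok in prompt {
        if pos >= context {
            return (Vec::new(), Stop::ContextFull);
        }
        logits = forward(tok, pos);
        pos += 1;
    }
    if logits.is_empty() {
        return (Vec::new(), Stop::EmptyPrompt);
    }
    // Decode: one full forward pass per generated token.
    let mut out = Vec::new();
    loop {
        if out.len() >= max_new {
            return (out, Stop::MaxTokens);
        }
        let next = argmax(&logits);
        if next == eos {
            return (out, Stop::Eos);
        }
        if pos >= context {
            return (out, Stop::ContextFull);
        }
        out.push(next);
        logits = forward(next, pos);
        pos += 1;
    }
}
```

The three stop conditions mirror the ones listed in the decode step: EOS, context capacity, and the requested token count.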
Binary contract
.slm v1 at byte level
| Offset | Field | Type | Meaning |
|---|---|---|---|
| 0 | Magic | 4 bytes | ASCII SLM1 |
| 4 | Version | u32 LE | Must equal 1 |
| 8 | Header length | u32 LE | At least 108 bytes |
| 16 | Flags | u32 LE | Bit 0 permits tied output projection |
| 20–52 | Model dimensions | u32 LE | Vocabulary, layers, heads, KV heads, head dimension, FFN, context |
| 56 | RoPE theta | f32 LE | Rotary frequency base |
| 60 | RMS epsilon | f32 LE | Normalization epsilon |
| 64–80 | Tokenizer and directory | u64 LE | Offsets and tokenizer length |
| 88 | Tensor count | u32 LE | Number of 64-byte directory entries |
| 92 | Tensor-data offset | u64 LE | Must be 64-byte aligned |
| 100 | Checksum | u64 LE | Custom checksum with bytes 100–107 treated as zero |
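
The fixed-offset fields in the table can be read with little-endian helpers. The sketch below covers only the fields whose offsets are stated explicitly (magic, version, header length, flags, tensor count, tensor-data offset, checksum); the type and function names are hypothetical, and the real parser performs further checks that are not reproduced here.

```rust
/// Minimum header size stated in the layout table.
pub const MIN_HEADER_LEN: usize = 108;

#[derive(Debug, PartialEq)]
pub enum HeaderError {
    TooShort,
    BadMagic,
    BadVersion(u32),
    BadHeaderLen(u32),
    MisalignedTensorData(u64),
}

/// Subset of header fields whose offsets appear in the table (illustrative type).
#[derive(Debug, PartialEq)]
pub struct HeaderSummary {
    pub header_len: u32,
    pub flags: u32,
    pub tensor_count: u32,
    pub tensor_data_offset: u64,
    pub checksum: u64,
}

fn u32_at(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([b[off], b[off + 1], b[off + 2], b[off + 3]])
}

fn u64_at(b: &[u8], off: usize) -> u64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&b[off..off + 8]);
    u64::from_le_bytes(raw)
}

pub fn parse_header(bytes: &[u8]) -> Result<HeaderSummary, HeaderError> {
    // The length check guards every fixed-offset read below.
    if bytes.len() < MIN_HEADER_LEN {
        return Err(HeaderError::TooShort);
    }
    if !bytes.starts_with(b"SLM1") {
        return Err(HeaderError::BadMagic);
    }
    let version = u32_at(bytes, 4);
    if version != 1 {
        return Err(HeaderError::BadVersion(version));
    }
    let header_len = u32_at(bytes, 8);
    if (header_len as usize) < MIN_HEADER_LEN {
        return Err(HeaderError::BadHeaderLen(header_len));
    }
    let tensor_data_offset = u64_at(bytes, 92);
    if tensor_data_offset % 64 != 0 {
        return Err(HeaderError::MisalignedTensorData(tensor_data_offset));
    }
    Ok(HeaderSummary {
        header_len,
        flags: u32_at(bytes, 16),
        tensor_count: u32_at(bytes, 88),
        tensor_data_offset,
        checksum: u64_at(bytes, 100),
    })
}
```

Bit 0 of `flags` would then indicate whether a tied output projection is permitted, as the table describes.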
Each 64-byte tensor entry stores an FNV-1a name hash, dtype, rank, four dimensions, data offset and length, quantization-scale offset, and block size. Supported dtypes are f32 = 1, q8_0 = 2, and q4_0 = 3.
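For illustration, the dtype codes and a name hash could be expressed as below. The dtype values come from the sentence above. The hash width is not stated in this document, so the 64-bit FNV-1a variant shown here is an assumption, as are the type and function names and the example tensor names in the usage note.

```rust
/// Storage dtypes and their on-disk codes, as listed in the text.
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Dtype {
    F32 = 1,
    Q8_0 = 2,
    Q4_0 = 3,
}

impl Dtype {
    /// Map a raw code to a dtype; unknown codes are rejected.
    pub fn from_code(code: u32) -> Option<Dtype> {
        match code {
            1 => Some(Dtype::F32),
            2 => Some(Dtype::Q8_0),
            3 => Some(Dtype::Q4_0),
            _ => None,
        }
    }
}

/// Standard 64-bit FNV-1a over the bytes of a tensor name.
/// Assumption: the container may use a different width.
pub fn fnv1a64(name: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &byte in name {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}
```

Storing a hash instead of a string keeps each entry fixed-size. A loader can then look up required tensors by hashing the expected names, for example `fnv1a64(b"tok_embed")` (a made-up name), and comparing against the directory.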
Artifact inventory
Eight local model files, zero trained-quality claims
| Artifact | Parameters | Bytes | Precision | Admission |
|---|---|---|---|---|
| TinyLM-16M f32 | 17,048,064 | 68,194,944 | f32 | Runtime smoke only |
| TinyLM-16M q8 | 17,048,064 | 17,160,000 | q8_0 | Runtime smoke only |
| TinyLM-16M q4 | 17,048,064 | 10,657,728 | q4_0 | Runtime smoke only |
| Tiny fixture f32 | 4,824 | 20,352 | f32 | Runtime smoke only |
| Tiny fixture q8 | 4,824 | 8,832 | q8_0 | Runtime smoke only |
| Tiny fixture q4 | 4,824 | 6,592 | q4_0 | Runtime smoke only |
| Tiny BPE fixture | 4,856 | 20,544 | f32 | Runtime smoke only |
| Tiny tied-output fixture | 2,744 | 11,968 | f32 | Runtime smoke only |
Quality boundary: Every supplied manifest identifies its source kind as deterministic-smoke, declares no trained-quality claim, and requires replacement by a trained or evaluated model before a product-quality claim.
Critical implementation audit
Current gaps are now first-class documentation
No GGUF loader
Despite the project/domain name, the active runtime parses the custom .slm format, not GGUF. GGUF compatibility is not present in the inspected source.
Main-thread inference
The browser JavaScript calls blocking WASM exports directly. No Web Worker, WASM thread pool, SharedArrayBuffer path, or streaming token event loop is currently wired into the app.
Scalar CPU only
The executed path uses handwritten scalar Rust loops. No WebGPU, WebNN, SIMD-specific kernel, BLAS backend, or device dispatch abstraction is implemented.
Full-file transfer
The app fetches and copies the complete artifact into WASM memory. The transfer allocator rejects requests larger than 128 MiB. There is no range loading, model sharding, Cache API, IndexedDB, or OPFS model store.
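The size ceiling amounts to a single comparison before allocation. A minimal sketch, assuming a hypothetical `alloc_transfer` helper that returns an owned buffer; the real export works with raw pointers, and treating a zero-length request as invalid is an assumption:

```rust
/// Single-transfer ceiling described in the text.
pub const MAX_TRANSFER: usize = 128 * 1024 * 1024;

#[derive(Debug, PartialEq)]
pub enum TransferError {
    Empty,
    TooLarge(usize),
}

/// Allocate a zeroed transfer buffer, rejecting out-of-policy sizes first.
/// Hypothetical helper, not the runtime's actual export.
pub fn alloc_transfer(len: usize) -> Result<Vec<u8>, TransferError> {
    if len == 0 {
        // Assumption: zero-length transfers are treated as invalid.
        return Err(TransferError::Empty);
    }
    if len > MAX_TRANSFER {
        return Err(TransferError::TooLarge(len));
    }
    Ok(vec![0u8; len])
}
```

The largest listed artifact, the 68,194,944-byte f32 model, fits under this ceiling in one transfer.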
No GQA/MQA execution
The file header carries separate attention and KV-head counts, but forward scratch currently requires them to be equal. Grouped-query and multi-query attention are rejected.
Non-cryptographic container checksum
The internal checksum detects accidental byte changes but is not an authenticity mechanism. Artifact SHA-256 exists in external records; the browser does not verify it before admission.
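The "checksum field treated as zero" convention can be illustrated independently of the hash itself. The custom algorithm is not described in this document, so the sketch below substitutes 64-bit FNV-1a purely as a stand-in; only the offset (bytes 100 to 107) and the zeroing rule come from the header layout, and the function names are hypothetical.

```rust
pub const CHECKSUM_OFFSET: usize = 100;
pub const CHECKSUM_LEN: usize = 8;

/// Hash the whole file while reading the checksum field as zeros.
/// FNV-1a is a stand-in here, NOT the container's actual algorithm.
pub fn checksum_with_field_zeroed(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for (i, &byte) in bytes.iter().enumerate() {
        let in_field = i >= CHECKSUM_OFFSET && i < CHECKSUM_OFFSET + CHECKSUM_LEN;
        let value = if in_field { 0 } else { byte };
        hash ^= value as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Write the checksum into its header field (little-endian).
pub fn stamp(bytes: &mut [u8]) {
    let sum = checksum_with_field_zeroed(bytes);
    bytes[CHECKSUM_OFFSET..CHECKSUM_OFFSET + CHECKSUM_LEN].copy_from_slice(&sum.to_le_bytes());
}

/// Recompute and compare against the stored field.
pub fn verify(bytes: &[u8]) -> bool {
    if bytes.len() < CHECKSUM_OFFSET + CHECKSUM_LEN {
        return false;
    }
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&bytes[CHECKSUM_OFFSET..CHECKSUM_OFFSET + CHECKSUM_LEN]);
    u64::from_le_bytes(raw) == checksum_with_field_zeroed(bytes)
}
```

Because anyone who edits the file can also recompute and restamp a checksum like this, it only guards against accidental corruption, which is why the text distinguishes it from SHA-256-based authenticity.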
No modular model system
One model is loaded at a time. There is no skill manifest ABI, module registry, router, cascade, committee, adapter hot-swap, shared base, or dependency resolver.
No Teleodynamic control loop
There is no endogenous resource variable, candidate structural action evaluator, activation/retirement operator, no-op decision trace, phase detector, or co-evolving structural state.
Memory package drift
The implementation README and AGENTS file route agents through a .uai/ directory, but the supplied source archive contains workspace.uai and no .uai/ directory. This is recorded as a source-package handoff defect.
Operational contract
Source files are now connected to runbooks
MiRust 1.5.0 adds a second implementation documentation root that follows the actual runtime through ownership, memory transfer, model admission, generation transactions, failure recovery, packer gates, browser controls, test contracts, release operations, and the concrete extension points for GGUF, modular skills, and a Teleodynamic controller.
Runtime ownership
One process-global Mutex<Option<Runtime>> owns the active model, KV cache, tokenizer, scratch buffers, logits, sampling state, result bytes, and diagnostics.
Raw buffer contract
Host code allocates, copies, calls, reads, and deallocates under a 128 MiB single-transfer ceiling. Pointer and capacity identity are part of the ABI.
Error-preserving state
Boundary failures write a stable message and diagnostics without automatically discarding an accepted model; model rejection and explicit release clear model-owned state.
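The ownership and error-preserving rules above can be sketched with a process-global `Mutex<Option<Runtime>>`. Everything inside `Runtime` here is a placeholder, and the function names and error strings are assumptions; the point is only that a failed call records an error without dropping the loaded model, while rejection and explicit release clear the slot.

```rust
use std::sync::Mutex;

/// Placeholder for the real runtime state (model, KV cache, scratch, and so on).
pub struct Runtime {
    pub model_bytes: usize,
    pub last_error: Option<String>,
}

/// One process-global slot owns the active runtime.
static RUNTIME: Mutex<Option<Runtime>> = Mutex::new(None);

pub fn load_model(bytes: &[u8]) -> Result<(), String> {
    let mut slot = RUNTIME.lock().map_err(|_| "runtime lock poisoned".to_string())?;
    if !bytes.starts_with(b"SLM1") {
        // Model rejection clears model-owned state.
        *slot = None;
        return Err("model rejected: bad magic".to_string());
    }
    *slot = Some(Runtime { model_bytes: bytes.len(), last_error: None });
    Ok(())
}

/// Hypothetical generation entry point; returns the accepted token budget.
pub fn generate(max_tokens: usize) -> Result<usize, String> {
    let mut slot = RUNTIME.lock().map_err(|_| "runtime lock poisoned".to_string())?;
    let runtime = slot.as_mut().ok_or_else(|| "no model loaded".to_string())?;
    if max_tokens == 0 {
        // Boundary failure: record the message, keep the accepted model.
        let message = "invalid token count".to_string();
        runtime.last_error = Some(message.clone());
        return Err(message);
    }
    runtime.last_error = None;
    Ok(max_tokens)
}

pub fn free_model() {
    if let Ok(mut slot) = RUNTIME.lock() {
        *slot = None;
    }
}

pub fn is_loaded() -> bool {
    RUNTIME.lock().map(|slot| slot.is_some()).unwrap_or(false)
}
```

With this shape, a host can retry after a bad call without reloading the artifact, which matches the recovery-generation behavior exercised by the smoke test.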
Worker and verified store
The next operational milestone moves blocking generation off the UI thread and admits cached artifacts only after cryptographic identity and manifest policy checks.
Next implementation program
From single deterministic runtime to modular Teleodynamic system
- Gate A
Trained model and reproducible quality evidence
Import an exact trained source, retain source checksums and tokenizer identity, convert all precision modes, execute task and safety evaluations, and publish scoped results.
- Gate B
Worker-owned incremental runtime
Move WASM execution off the UI thread, expose cancellation and token streaming, define transferable-buffer ownership, and preserve deterministic recovery semantics.
- Gate C
Persistent artifact store and cryptographic admission
Add chunked downloads, resumability, Cache API or OPFS persistence, SHA-256 verification, signed manifests, quotas, eviction, and transactional activation.
- Gate D
Skill package and module registry
Define exact base/tokenizer compatibility, artifact identity, input/output contracts, dependencies, authority, quality evidence, and per-module resource estimates.
- Gate E
Modular orchestration
Implement sparse routing first; then bounded cascades, pipelines, committees, adapters, and explicit uncertainty/fallback contracts.
- Gate F
Teleodynamic slow loop
Add a measured resource state, local candidate-action score, activate/deactivate/replace/retire/reserve/no-op operators, decision traces, phase diagnostics, ablations, and falsification tests.
Site/runtime boundary: MiRust.com publishes this implementation map and evidence model. It does not embed the supplied Rust source, WebAssembly binary, or model artifacts in the WordPress theme or plugin. The separate implementation project remains the executable authority.