This page defines the normalization math, epsilon handling, learned scales, residual ordering, scratch reuse, and numerical test expectations.
Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.
mean_square = sum(x[i] * x[i]) / N
inv_rms = 1 / sqrt(mean_square + epsilon)
y[i] = x[i] * inv_rms * scale[i]
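The three formulas above can be sketched in Rust as follows. This is a minimal illustration, not the reviewed implementation: the function name `rms_norm`, its signature, the caller-provided output slice, and the choice to accumulate in `f32` are all assumptions. `N` is taken to be the vector length, and `scale` the learned per-element weights.

```rust
/// Hypothetical helper (name and signature are assumptions, not taken from
/// the reviewed source). Writes the RMS-normalized, scaled input into `out`.
pub fn rms_norm(x: &[f32], scale: &[f32], epsilon: f32, out: &mut [f32]) {
    assert!(!x.is_empty(), "input must not be empty");
    assert_eq!(x.len(), scale.len(), "input and scale lengths must match");
    assert_eq!(x.len(), out.len(), "output length must match input length");

    // mean_square = sum(x[i] * x[i]) / N
    let n = x.len() as f32;
    let mean_square = x.iter().map(|v| v * v).sum::<f32>() / n;

    // inv_rms = 1 / sqrt(mean_square + epsilon)
    // Epsilon keeps the divisor non-zero when the input is all zeros.
    let inv_rms = 1.0 / (mean_square + epsilon).sqrt();

    // y[i] = x[i] * inv_rms * scale[i]
    for ((y, &xi), &si) in out.iter_mut().zip(x).zip(scale) {
        *y = xi * inv_rms * si;
    }
}
```

With a unit scale and a zero epsilon, the output has a mean square of 1, which gives a simple sanity check for any implementation of the formulas.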
Placement
The model uses pre-normalization: attention and FFN each consume a normalized copy of the residual stream, while the unnormalized residual is updated by adding the block's projected output.
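The ordering described above can be sketched generically. The function name `pre_norm_step`, the closure parameters, and the `block_out` buffer are hypothetical and stand in for whatever the reviewed code uses. The sketch only shows the order of operations: normalize into scratch, run the block on the normalized copy, then add the block output to the untouched residual.

```rust
/// Hypothetical pre-normalization step for one sub-block (attention or FFN).
/// `norm` and `block` are stand-ins supplied by the caller; both write into
/// caller-owned buffers so the step itself allocates nothing.
pub fn pre_norm_step<N, B>(
    residual: &mut [f32],
    normed: &mut [f32],
    block_out: &mut [f32],
    norm: N,
    block: B,
) where
    N: Fn(&[f32], &mut [f32]),
    B: Fn(&[f32], &mut [f32]),
{
    // 1. Normalize a copy; the residual itself is left unmodified here.
    norm(&*residual, &mut *normed);

    // 2. The block sees only the normalized copy.
    block(&*normed, &mut *block_out);

    // 3. Residual update: add the projected block output.
    for (r, &o) in residual.iter_mut().zip(block_out.iter()) {
        *r += o;
    }
}
```

In a layer, this step would run once with the attention block and once with the FFN block, each reading the residual as updated by the previous step.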
Scratch ownership
A reusable normed buffer serves attention normalization and a separate FFN-normalized buffer serves the feed-forward block. No per-layer normalization vector is allocated during steady-state generation.
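One way to express this ownership is a scratch struct allocated once and borrowed by every layer. The type name `NormScratch`, its field names, and the `split` method are illustrative assumptions, not identifiers from the reviewed source.

```rust
/// Hypothetical scratch set, sized once for the hidden dimension.
pub struct NormScratch {
    /// Holds the normalized input for the attention block.
    pub attn_normed: Vec<f32>,
    /// Holds the normalized input for the feed-forward block.
    pub ffn_normed: Vec<f32>,
}

impl NormScratch {
    /// Allocates both buffers up front; this is the only allocation.
    pub fn new(hidden_dim: usize) -> Self {
        Self {
            attn_normed: vec![0.0; hidden_dim],
            ffn_normed: vec![0.0; hidden_dim],
        }
    }

    /// Borrows both buffers as slices. Every layer overwrites the same
    /// memory, so steady-state generation performs no further allocation.
    pub fn split(&mut self) -> (&mut [f32], &mut [f32]) {
        (
            self.attn_normed.as_mut_slice(),
            self.ffn_normed.as_mut_slice(),
        )
    }
}
```

Handing out slices rather than the `Vec`s themselves prevents a layer from resizing a buffer by accident, which would reintroduce allocation on the hot path.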
Failure checks
The input and scale vectors must have equal length and must not be empty. Product validation should also cover NaN/Inf behavior, epsilon extremes, denormal handling, and comparisons against a reference implementation across quantized modes.
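The checks above could be backed by small helpers such as the following. All names here (`NormInputError`, `check_norm_inputs`, `all_finite`, `max_abs_diff`) are hypothetical, and the sketch does not pick tolerances, since acceptable error depends on the quantized mode under test.

```rust
/// Hypothetical error type for the length checks.
#[derive(Debug, PartialEq)]
pub enum NormInputError {
    Empty,
    LengthMismatch { input: usize, scale: usize },
}

/// Rejects empty or mismatched input and scale vectors.
pub fn check_norm_inputs(x: &[f32], scale: &[f32]) -> Result<(), NormInputError> {
    if x.is_empty() || scale.is_empty() {
        return Err(NormInputError::Empty);
    }
    if x.len() != scale.len() {
        return Err(NormInputError::LengthMismatch {
            input: x.len(),
            scale: scale.len(),
        });
    }
    Ok(())
}

/// True when no element is NaN or infinite.
pub fn all_finite(values: &[f32]) -> bool {
    values.iter().all(|v| v.is_finite())
}

/// Largest element-wise absolute difference against a reference output.
/// `f32::max` ignores NaN operands, so pair this with `all_finite`.
pub fn max_abs_diff(actual: &[f32], reference: &[f32]) -> f32 {
    assert_eq!(actual.len(), reference.len());
    actual
        .iter()
        .zip(reference)
        .map(|(a, r)| (a - r).abs())
        .fold(0.0f32, f32::max)
}
```

A test would typically call `check_norm_inputs` before normalizing, assert `all_finite` on the output, and then compare `max_abs_diff` against a tolerance chosen per quantized mode.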
Scope
This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.
Engineering considerations
- Identify the source, version, target environment, and owner.
- Separate observed values from estimates and externally reported values.
- Record trade-offs, unsupported cases, and fallback behavior.
- Link performance statements to a compatible benchmark methodology.
Verification questions
- What exact artifact, revision, backend, and environment were reviewed?
- Which assumptions could change the result?
- Which data should be retained so another engineer can reproduce the conclusion?