Rotary positional embedding

Explains the in-place Q/K rotation, head dimensions, position input, theta contract, pairwise shape requirements, and context implications.

Experimental
Last verified: 2026-06-25 00:00 UTC

Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.

Operation

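Each Q and K head vector is processed as adjacent component pairs: elements 0 and 1, then 2 and 3, and so on. For each pair, a frequency derived from the pair index, the head width, and rope_theta is multiplied by the current token position to give an angle. The sine and cosine of that angle rotate the pair in place, so no separate output buffer is needed.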
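The rotation described above can be sketched in Python. This is an illustration, not the reviewed implementation: the function name, the default rope_theta of 10000.0, and the exact frequency formula rope_theta ** (-2 * pair / head_dim) are assumptions taken from the conventional RoPE formulation, since the source only states which quantities the frequency depends on.

```python
import math


def rope_rotate_in_place(head, position, rope_theta=10000.0):
    """Rotate one head vector in place, one adjacent pair at a time.

    Hypothetical helper for illustration. `head` is a mutable list of floats
    with an even length; `position` is the absolute token position.
    """
    head_dim = len(head)
    for pair in range(head_dim // 2):
        # Assumed frequency formula (conventional RoPE); the reviewed code
        # may compute this differently.
        freq = rope_theta ** (-2.0 * pair / head_dim)
        angle = position * freq
        cos_a, sin_a = math.cos(angle), math.sin(angle)

        # Read both components before writing, because the update is in place.
        x0, x1 = head[2 * pair], head[2 * pair + 1]
        head[2 * pair] = x0 * cos_a - x1 * sin_a
        head[2 * pair + 1] = x0 * sin_a + x1 * cos_a
    return head


q = [1.0, 0.0, 1.0, 0.0]
rope_rotate_in_place(q, position=1)
```

Under these assumptions, position 0 leaves the vector unchanged, and each pair keeps its length because a rotation does not scale it.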

Inputs

The operation requires an even head vector length, so that every component has a partner in its pair, and a positive rope_theta. The model header supplies rope_theta, while generation supplies the absolute cache position of the token being processed.
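A minimal sketch of how these preconditions might be checked before rotating. The function name and error messages are hypothetical. The rejection of non-finite theta and of negative positions is an added assumption: the source only requires an even length and a positive theta.

```python
import math


def check_rope_inputs(head_dim, rope_theta, position):
    """Raise ValueError if the inputs cannot be rotated pairwise.

    Hypothetical validation helper; not taken from the reviewed source.
    """
    if head_dim <= 0 or head_dim % 2 != 0:
        raise ValueError(f"head_dim must be a positive even number, got {head_dim}")
    # Assumption: NaN and infinity are rejected along with non-positive values.
    if not (math.isfinite(rope_theta) and rope_theta > 0.0):
        raise ValueError(f"rope_theta must be positive and finite, got {rope_theta}")
    # Assumption: an absolute cache position is a non-negative index.
    if position < 0:
        raise ValueError(f"position must be non-negative, got {position}")


# rope_theta would come from the model header, position from generation.
check_rope_inputs(head_dim=64, rope_theta=10000.0, position=0)
```

Checking once per model load (for head_dim and rope_theta) and once per step (for position) keeps the rotation loop itself free of branches.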

Current scope

There is no RoPE scaling, NTK extension, sliding-window remapping, alternate rotary layout, or partial rotary dimension. No claim is made about behavior at positions beyond the trained or configured context range.

Verification

Reference vectors should cover position zero, later positions, and multiple head widths, and should state the numerical tolerance used for comparison. They should also confirm that the model converter and the runtime follow the same rotary convention.
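One way to structure such a check is sketched below. It compares an in-place pairwise rotation against an independent formulation that treats each pair as a complex number. Everything here is an assumption made for illustration: the function names, the frequency formula, the default rope_theta, the chosen head widths and positions, and the tolerance of 1e-9. A real check would also load reference vectors produced on the converter side, which this sketch does not do.

```python
import cmath
import math


def rope_in_place(head, position, rope_theta):
    """In-place adjacent-pair rotation (assumed conventional frequency formula)."""
    head_dim = len(head)
    for pair in range(head_dim // 2):
        angle = position * rope_theta ** (-2.0 * pair / head_dim)
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x0, x1 = head[2 * pair], head[2 * pair + 1]
        head[2 * pair] = x0 * cos_a - x1 * sin_a
        head[2 * pair + 1] = x0 * sin_a + x1 * cos_a


def rope_reference(head, position, rope_theta):
    """Independent reference: multiply each pair, as a complex number, by e^(i*angle)."""
    head_dim = len(head)
    out = []
    for pair in range(head_dim // 2):
        angle = position * rope_theta ** (-2.0 * pair / head_dim)
        z = complex(head[2 * pair], head[2 * pair + 1]) * cmath.exp(1j * angle)
        out.extend([z.real, z.imag])
    return out


def max_abs_error(head_dim, position, rope_theta=10000.0):
    """Largest component-wise difference between the two formulations."""
    head = [0.1 * (i + 1) for i in range(head_dim)]
    expected = rope_reference(head, position, rope_theta)
    actual = list(head)
    rope_in_place(actual, position, rope_theta)
    return max(abs(a - e) for a, e in zip(actual, expected))


TOLERANCE = 1e-9  # assumed tolerance for double precision

for head_dim in (2, 4, 64):
    for position in (0, 1, 2047):
        assert max_abs_error(head_dim, position) <= TOLERANCE
```

A runtime that computes in single or half precision would need a looser tolerance than the one assumed here.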

Scope

This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.

Engineering considerations

  • Identify the source, version, target environment, and owner.
  • Separate observed values from estimates and externally reported values.
  • Record trade-offs, unsupported cases, and fallback behavior.
  • Link performance statements to a compatible benchmark methodology.

Verification questions

  • What exact artifact, revision, backend, and environment were reviewed?
  • Which assumptions could change the result?
  • Which data should be retained so another engineer can reproduce the conclusion?