This page defines the gated feed-forward (FFN) block: the W1 and W3 projections, the SiLU gate, the elementwise product, the W2 projection, the dimensional contracts, and the memory and performance characteristics.
Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.
gate = W1 × x
up = W3 × x
hidden[i] = silu(gate[i]) × up[i]
out = W2 × hidden
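The four steps above can be sketched as a single-token forward pass. This is a minimal illustration, not the reviewed implementation: it assumes f32 weights stored row-major, and the names `matvec` and `ffn_forward` are hypothetical.

```rust
/// Row-major matrix-vector product: out[r] = sum over c of w[r * cols + c] * x[c].
/// Assumption: weights are plain f32 in row-major order.
fn matvec(w: &[f32], rows: usize, cols: usize, x: &[f32], out: &mut [f32]) {
    assert_eq!(w.len(), rows * cols);
    assert_eq!(x.len(), cols);
    assert_eq!(out.len(), rows);
    for (r, o) in out.iter_mut().enumerate() {
        let row = &w[r * cols..(r + 1) * cols];
        *o = row.iter().zip(x).map(|(a, b)| a * b).sum();
    }
}

/// SiLU activation, x / (1 + exp(-x)).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Hypothetical helper: gated FFN forward pass for one hidden-state vector.
/// W1 and W3 are ffn_size x hidden_size; W2 is hidden_size x ffn_size.
fn ffn_forward(
    w1: &[f32],
    w3: &[f32],
    w2: &[f32],
    hidden_size: usize,
    ffn_size: usize,
    x: &[f32],
) -> Vec<f32> {
    let mut gate = vec![0.0f32; ffn_size];
    let mut up = vec![0.0f32; ffn_size];
    matvec(w1, ffn_size, hidden_size, x, &mut gate); // gate = W1 x
    matvec(w3, ffn_size, hidden_size, x, &mut up); // up = W3 x
    // hidden[i] = silu(gate[i]) * up[i], written over the gate buffer.
    for (g, u) in gate.iter_mut().zip(&up) {
        *g = silu(*g) * *u;
    }
    let mut out = vec![0.0f32; hidden_size];
    matvec(w2, hidden_size, ffn_size, &gate, &mut out); // out = W2 hidden
    out
}
```

Calling `ffn_forward(&w1, &w3, &w2, 512, 2048, &x)` with correctly sized slices would return a 512-element vector; the asserts in `matvec` panic on any length mismatch.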
Shapes
W1 and W3 each map the hidden_size-element hidden state to ffn_size values. W2 maps the ffn_size-element vector back to hidden_size. The inspected TinyLM-16M shape uses hidden_size 512 and ffn_size 2048.
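The dimensional contract can be written down as an explicit check. The sketch below is illustrative only: `FfnShape`, `ShapeError`, and `check` are hypothetical names, and it assumes each weight matrix is a dense buffer of hidden_size × ffn_size elements.

```rust
/// Hypothetical shape description for one FFN layer.
#[derive(Debug, Clone, Copy, PartialEq)]
struct FfnShape {
    hidden_size: usize,
    ffn_size: usize,
}

/// Hypothetical error type for a violated dimensional contract.
#[derive(Debug, PartialEq)]
enum ShapeError {
    Weight { name: &'static str, expected: usize, actual: usize },
    Input { expected: usize, actual: usize },
}

impl FfnShape {
    /// Element count of each of W1, W3, and W2.
    fn weights_per_matrix(&self) -> usize {
        self.hidden_size * self.ffn_size
    }

    /// Element count of all three matrices in one layer.
    fn weights_per_layer(&self) -> usize {
        3 * self.weights_per_matrix()
    }

    /// Rejects buffers whose lengths do not match the contract.
    fn check(
        &self,
        w1_len: usize,
        w3_len: usize,
        w2_len: usize,
        x_len: usize,
    ) -> Result<(), ShapeError> {
        let expected = self.weights_per_matrix();
        for (name, actual) in [("W1", w1_len), ("W3", w3_len), ("W2", w2_len)] {
            if actual != expected {
                return Err(ShapeError::Weight { name, expected, actual });
            }
        }
        if x_len != self.hidden_size {
            return Err(ShapeError::Input { expected: self.hidden_size, actual: x_len });
        }
        Ok(())
    }
}
```

For hidden_size 512 and ffn_size 2048, each matrix holds 512 × 2048 = 1,048,576 elements, so one layer holds 3,145,728 FFN weights.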
Activation
SiLU is implemented directly as x / (1 + exp(-x)). The elementwise product of the activated gate and the up projection is written into a reusable FFN scratch buffer.
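One way to picture the reusable scratch is a pair of buffers allocated once and overwritten for every token. The sketch below assumes that arrangement; `FfnScratch` and `combine` are hypothetical names, and the reviewed code may organize its buffers differently.

```rust
/// SiLU in its direct form. For finite f32 inputs this does not produce NaN:
/// a large negative x makes exp(-x) overflow to +inf, and x / inf is -0.0.
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Hypothetical scratch space, allocated once and reused across tokens and layers.
struct FfnScratch {
    gate: Vec<f32>,
    up: Vec<f32>,
}

impl FfnScratch {
    fn new(ffn_size: usize) -> Self {
        FfnScratch {
            gate: vec![0.0; ffn_size],
            up: vec![0.0; ffn_size],
        }
    }

    /// Overwrites `gate` with silu(gate[i]) * up[i] and returns it as the
    /// hidden vector, so no third ffn_size buffer is needed.
    fn combine(&mut self) -> &[f32] {
        for (g, u) in self.gate.iter_mut().zip(&self.up) {
            *g = silu(*g) * *u;
        }
        &self.gate
    }
}
```

Writing the product over the gate buffer keeps the per-token working set at two ffn_size vectors; the cost is that the raw gate values are no longer available after `combine` returns.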
Cost concentration
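Each layer runs three large matrix-vector products (W1, W3, and W2), so reading FFN weights from memory is a dominant cost during decode. The q8/q4 direct matvec path reduces the bytes read per weight, but its inner loop is still scalar rather than SIMD-vectorized.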
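A rough estimate shows why the weight format matters for bandwidth. The sketch below makes two labeled assumptions: the byte count ignores any per-block scale or metadata overhead, and the q8 layout (i8 weights with one f32 scale per row) is hypothetical, since the source does not describe the actual block format.

```rust
/// Approximate bytes of FFN weights read per layer for one decoded token.
/// Assumption: `bits_per_weight` excludes scales and other block metadata.
fn ffn_weight_bytes(hidden_size: usize, ffn_size: usize, bits_per_weight: usize) -> usize {
    3 * hidden_size * ffn_size * bits_per_weight / 8
}

/// Scalar matvec that reads i8 weights directly and applies the scale once per row.
/// Hypothetical layout: row-major i8 weights, one f32 scale per output row.
fn matvec_q8_scalar(
    q: &[i8],
    scales: &[f32],
    rows: usize,
    cols: usize,
    x: &[f32],
    out: &mut [f32],
) {
    assert_eq!(q.len(), rows * cols);
    assert_eq!(scales.len(), rows);
    assert_eq!(x.len(), cols);
    assert_eq!(out.len(), rows);
    for r in 0..rows {
        let row = &q[r * cols..(r + 1) * cols];
        let mut acc = 0.0f32;
        // One multiply-add per weight, with no explicit SIMD.
        for (w, xi) in row.iter().zip(x) {
            acc += f32::from(*w) * xi;
        }
        out[r] = acc * scales[r];
    }
}
```

Under these assumptions the 512 × 2048 shape reads about 12,582,912 bytes per layer at 32 bits per weight, 3,145,728 bytes at 8 bits, and 1,572,864 bytes at 4 bits. These are arithmetic estimates, not measurements.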
Tests
Core operations are covered by known-vector tests and shape-failure tests. Validation against a trained model still requires reference per-layer outputs and agreed tolerance budgets.
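A tolerance budget can be expressed as a combined absolute and relative bound per element. The helper below is a sketch under that assumption; `within_budget` is a hypothetical name, and suitable tolerance values would have to come from the reference outputs, which this page does not supply.

```rust
/// Returns true when every element of `actual` lies within
/// abs_tol + rel_tol * |reference| of the matching reference element.
/// A length mismatch or a NaN in either vector fails the check.
fn within_budget(actual: &[f32], reference: &[f32], abs_tol: f32, rel_tol: f32) -> bool {
    actual.len() == reference.len()
        && actual
            .iter()
            .zip(reference)
            .all(|(a, r)| (a - r).abs() <= abs_tol + rel_tol * r.abs())
}
```

A known-vector test would compare a layer output against a stored reference with this helper, while a shape-failure test would pass deliberately mismatched lengths and expect the check to fail.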
Scope
This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.
Engineering considerations
- Identify the source, version, target environment, and owner.
- Separate observed values from estimates and externally reported values.
- Record trade-offs, unsupported cases, and fallback behavior.
- Link performance statements to a compatible benchmark methodology.
Verification questions
- What exact artifact, revision, backend, and environment were reviewed?
- Which assumptions could change the result?
- Which data should be retained so another engineer can reproduce the conclusion?