SwiGLU feed-forward network

Defines W1/W3 projections, SiLU gate, elementwise product, W2 projection, dimensional contracts, and memory/performance characteristics.

Experimental
Last verified: 2026-06-25 00:00 UTC

Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.

Feed-forward path
gate = W1 × x
up   = W3 × x
hidden[i] = silu(gate[i]) × up[i]
out = W2 × hidden
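
The four steps above can be sketched in Rust as follows. This is a minimal illustration, not the reviewed implementation: it assumes dense f32 weights stored row-major, and the names matvec and swiglu_ffn are hypothetical.

```rust
/// Dense row-major matrix-vector product: out[r] = sum over c of w[r * cols + c] * x[c].
/// Assumption: weights are plain f32, not the quantized formats discussed later.
fn matvec(w: &[f32], rows: usize, cols: usize, x: &[f32]) -> Vec<f32> {
    assert!(cols > 0, "cols must be non-zero");
    assert_eq!(w.len(), rows * cols, "weight length must equal rows * cols");
    assert_eq!(x.len(), cols, "input length must equal cols");
    w.chunks_exact(cols)
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

/// SiLU written directly as x / (1 + exp(-x)).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// SwiGLU feed-forward: out = W2 * (silu(W1 * x) * (W3 * x)), where the inner
/// product between the activated gate and the up projection is elementwise.
fn swiglu_ffn(
    w1: &[f32],
    w3: &[f32],
    w2: &[f32],
    hidden_size: usize,
    ffn_size: usize,
    x: &[f32],
) -> Vec<f32> {
    let gate = matvec(w1, ffn_size, hidden_size, x); // gate = W1 * x
    let up = matvec(w3, ffn_size, hidden_size, x); // up = W3 * x
    let hidden: Vec<f32> = gate.iter().zip(&up).map(|(g, u)| silu(*g) * u).collect();
    matvec(w2, hidden_size, ffn_size, &hidden) // out = W2 * hidden
}
```

This version allocates three temporary vectors per call to keep the data flow readable; a decode loop would normally reuse buffers instead.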

Shapes

W1 and W3 each map the hidden_size-element hidden state to ffn_size values. W2 maps the ffn_size-element gated vector back to hidden_size. The inspected TinyLM-16M shape uses hidden_size 512 and ffn_size 2048.
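
The dimensional contract can be made explicit with a small check that returns an error instead of computing on mismatched buffers. The sketch below is an assumption about how such a check might look; FfnShape, ShapeError, and check are hypothetical names, and weights are counted as flat element totals.

```rust
/// Hypothetical error type describing which buffer violated the shape contract.
#[derive(Debug, PartialEq)]
enum ShapeError {
    Weight { name: &'static str, expected: usize, got: usize },
    Input { expected: usize, got: usize },
}

/// Hypothetical shape descriptor for one FFN layer.
struct FfnShape {
    hidden_size: usize,
    ffn_size: usize,
}

impl FfnShape {
    /// W1, W3, and W2 each hold hidden_size * ffn_size weights.
    fn weights_per_matrix(&self) -> usize {
        self.hidden_size * self.ffn_size
    }

    /// Three projection matrices per layer.
    fn weights_per_layer(&self) -> usize {
        3 * self.weights_per_matrix()
    }

    /// Validates flat element counts for the three matrices and the input vector.
    fn check(
        &self,
        w1_len: usize,
        w3_len: usize,
        w2_len: usize,
        x_len: usize,
    ) -> Result<(), ShapeError> {
        let expected = self.weights_per_matrix();
        for (name, got) in [("W1", w1_len), ("W3", w3_len), ("W2", w2_len)] {
            if got != expected {
                return Err(ShapeError::Weight { name, expected, got });
            }
        }
        if x_len != self.hidden_size {
            return Err(ShapeError::Input { expected: self.hidden_size, got: x_len });
        }
        Ok(())
    }
}
```

For hidden_size 512 and ffn_size 2048, each matrix holds 512 × 2048 = 1,048,576 weights, or 3,145,728 weights per layer across W1, W3, and W2.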

Activation

SiLU is implemented directly as x / (1 + exp(-x)). The elementwise product of the activated gate and the up projection is written into a reusable FFN scratch buffer.
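
One way to reuse scratch is to let the gate buffer double as the output of the elementwise step. The sketch below assumes that arrangement; FfnScratch and fuse_gate are hypothetical names, and the actual buffer layout in the reviewed source may differ.

```rust
/// SiLU written directly as x / (1 + exp(-x)).
/// For large negative x, exp(-x) overflows to infinity and the result is -0.0.
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Hypothetical scratch buffers, allocated once and reused for every token and layer.
struct FfnScratch {
    gate: Vec<f32>,
    up: Vec<f32>,
}

impl FfnScratch {
    fn new(ffn_size: usize) -> Self {
        Self {
            gate: vec![0.0; ffn_size],
            up: vec![0.0; ffn_size],
        }
    }

    /// Assumes `gate` already holds W1 * x and `up` holds W3 * x.
    /// Overwrites gate[i] with silu(gate[i]) * up[i] and returns it as the
    /// hidden vector, so no third buffer is needed.
    fn fuse_gate(&mut self) -> &[f32] {
        for (g, u) in self.gate.iter_mut().zip(&self.up) {
            *g = silu(*g) * *u;
        }
        &self.gate
    }
}
```

Writing in place is safe here because each output element depends only on the gate and up values at the same index.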

Cost concentration

Each layer runs three large matrix-vector products (W1, W3, and W2), so reading FFN weights from memory is a dominant decode cost. The q8/q4 direct matvec path reduces the bytes read per weight, but its inner loop remains scalar rather than vectorized.
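
To put rough numbers on the bytes read per token, the sketch below estimates FFN weight traffic for one layer. The block layouts are assumptions, not taken from the reviewed source: they follow the common 32-weight block convention with a 2-byte scale per block (34 bytes for 8-bit, 18 bytes for 4-bit). WeightFormat and ffn_weight_bytes are hypothetical names, and activation reads are not counted.

```rust
/// Hypothetical weight storage formats. The quantized layouts are assumed:
/// 32 weights per block plus a 2-byte scale.
#[derive(Clone, Copy)]
enum WeightFormat {
    F32,
    Q8Block32,
    Q4Block32,
}

impl WeightFormat {
    /// Returns (weights per block, bytes per block).
    fn block(self) -> (usize, usize) {
        match self {
            Self::F32 => (1, 4),
            Self::Q8Block32 => (32, 34), // 32 one-byte values + 2-byte scale
            Self::Q4Block32 => (32, 18), // 16 bytes of packed nibbles + 2-byte scale
        }
    }
}

/// Bytes of weight data read by the three FFN matvecs of one layer for one token.
fn ffn_weight_bytes(hidden_size: usize, ffn_size: usize, fmt: WeightFormat) -> usize {
    let (per_block, bytes_per_block) = fmt.block();
    let weights = 3 * hidden_size * ffn_size;
    assert_eq!(weights % per_block, 0, "weight count must fill whole blocks");
    weights / per_block * bytes_per_block
}
```

Under these assumptions, the 512 × 2048 shape reads 12,582,912 bytes per layer in f32, 3,342,336 bytes with the 8-bit layout, and 1,769,472 bytes with the 4-bit layout.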

Tests

Core operations include known-vector tests and shape-failure tests. Trained-model validation still requires reference layer outputs and tolerance budgets.
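
A tolerance budget can be expressed as a combined absolute and relative bound per element. The helper below is a hypothetical sketch of such a comparison against reference layer outputs; the name first_mismatch and the specific budget formula are assumptions, not part of the reviewed tests.

```rust
/// Compares an FFN output against a reference vector.
/// Returns the index and absolute error of the first element whose error
/// exceeds abs_tol + rel_tol * |reference|, or None if all elements pass.
fn first_mismatch(
    actual: &[f32],
    reference: &[f32],
    abs_tol: f32,
    rel_tol: f32,
) -> Option<(usize, f32)> {
    assert_eq!(actual.len(), reference.len(), "length mismatch is a shape failure");
    actual
        .iter()
        .zip(reference)
        .enumerate()
        .find_map(|(i, (a, r))| {
            let err = (a - r).abs();
            let budget = abs_tol + rel_tol * r.abs();
            // A NaN error fails the `<=` comparison, so NaN outputs are reported.
            if err <= budget {
                None
            } else {
                Some((i, err))
            }
        })
}
```

Reporting the first failing index rather than a plain pass/fail makes it easier to tell a systematic error, such as a swapped W1/W3, from isolated rounding drift.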

Scope

This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.

Engineering considerations

  • Identify the source, version, target environment, and owner.
  • Separate observed values from estimates and externally reported values.
  • Record trade-offs, unsupported cases, and fallback behavior.
  • Link performance statements to a compatible benchmark methodology.

Verification questions

  • What exact artifact, revision, backend, and environment were reviewed?
  • Which assumptions could change the result?
  • Which data should be retained so another engineer can reproduce the conclusion?