SwiGLU feed-forward network

Defines W1/W3 projections, SiLU gate, elementwise product, W2 projection, dimensional contracts, and memory/performance characteristics.

Experimental

Last verified: 2026-06-25 00:00 UTC
Updated: 2026-06-25

Implementation evidence: this topic is grounded in the reviewed GGUF.MiRust.com source snapshot. It documents observed code and artifacts without claiming broad deployment, model quality, or production readiness.

Feed-forward path

gate = W1 × x
up   = W3 × x
hidden[i] = silu(gate[i]) × up[i]
out = W2 × hidden
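The four steps above can be sketched in Rust as follows. This is a minimal illustration, not the reviewed implementation: the function names, the row-major `[rows × cols]` weight layout, and the use of `Vec<f32>` are all assumptions made for the example.

```rust
/// Row-major matrix-vector product: `w` holds `rows` rows of `x.len()` values.
/// (Layout is an assumption for this sketch.)
fn matvec(w: &[f32], x: &[f32], rows: usize) -> Vec<f32> {
    let cols = x.len();
    assert_eq!(w.len(), rows * cols, "weight length must equal rows * cols");
    (0..rows)
        .map(|r| {
            w[r * cols..(r + 1) * cols]
                .iter()
                .zip(x)
                .map(|(a, b)| a * b)
                .sum::<f32>()
        })
        .collect()
}

/// SiLU written directly as x / (1 + exp(-x)).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// gate = W1 x, up = W3 x, hidden = silu(gate) * up, out = W2 hidden.
fn swiglu_ffn(w1: &[f32], w3: &[f32], w2: &[f32], x: &[f32], ffn_size: usize) -> Vec<f32> {
    let hidden_size = x.len();
    let gate = matvec(w1, x, ffn_size);
    let up = matvec(w3, x, ffn_size);
    // Elementwise gating: one multiply per FFN element.
    let hidden: Vec<f32> = gate.iter().zip(&up).map(|(g, u)| silu(*g) * u).collect();
    matvec(w2, &hidden, hidden_size)
}
```

With `hidden_size = 1` and `ffn_size = 1`, calling `swiglu_ffn(&[1.0], &[2.0], &[3.0], &[1.0], 1)` evaluates 3 × silu(1) × 2, which is a convenient hand-checkable case.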

Shapes

W1 and W3 each map the hidden state (hidden_size values) to ffn_size values. W2 maps the ffn_size values back to hidden_size. The inspected TinyLM-16M shape uses hidden_size 512 and ffn_size 2048.
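The dimensional contract can be written down as a small check. The sketch below is hypothetical: `FfnShape` and its methods are names invented for this example, and it assumes each matrix is stored as one flat slice. For the 512/2048 shape, each of the three matrices holds 512 × 2048 = 1,048,576 weights, or 3,145,728 per layer.

```rust
/// Hypothetical helper describing one FFN layer's dimensions.
struct FfnShape {
    hidden_size: usize,
    ffn_size: usize,
}

impl FfnShape {
    /// W1, W3, and W2 all hold hidden_size * ffn_size weights
    /// (W1/W3: ffn_size rows, W2: hidden_size rows).
    fn matrix_len(&self) -> usize {
        self.hidden_size * self.ffn_size
    }

    /// Total FFN weights in one layer across the three projections.
    fn weights_per_layer(&self) -> usize {
        3 * self.matrix_len()
    }

    /// Rejects weight slices whose lengths do not match the contract.
    fn check(&self, w1_len: usize, w3_len: usize, w2_len: usize) -> Result<(), String> {
        let want = self.matrix_len();
        for (name, got) in [("W1", w1_len), ("W3", w3_len), ("W2", w2_len)] {
            if got != want {
                return Err(format!("{name}: expected {want} weights, got {got}"));
            }
        }
        Ok(())
    }
}
```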

Activation

SiLU is implemented directly as x / (1 + exp(-x)). The gated value silu(gate[i]) and the up projection up[i] are multiplied elementwise, and the result is held in a reusable FFN scratch buffer.
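A minimal sketch of the activation and the elementwise combine follows. Overwriting the gate buffer in place is one possible way to reuse scratch memory and is an assumption here, as are the function names; the reviewed code may organize its scratch differently.

```rust
/// SiLU written directly as x / (1 + exp(-x)).
/// For very negative x, exp(-x) overflows to +inf in f32 and the
/// quotient becomes -0.0, which matches the limit of the function.
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Overwrites `gate` with silu(gate[i]) * up[i], so the caller can reuse
/// the same scratch buffer on every token without allocating.
fn gate_in_place(gate: &mut [f32], up: &[f32]) {
    assert_eq!(gate.len(), up.len(), "gate and up must have ffn_size elements");
    for (g, u) in gate.iter_mut().zip(up) {
        *g = silu(*g) * u;
    }
}
```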

Cost concentration

Each layer runs three large matrix-vector products (W1, W3, and W2), so reading the FFN weights makes memory bandwidth a dominant decode cost. The q8/q4 direct matvec reduces the bytes read, but it remains a scalar implementation.
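To make the bandwidth argument concrete, the sketch below estimates the FFN weight bytes read per layer for one decoded token and shows what a scalar matvec over 8-bit weights might look like. Both functions are illustrative assumptions: the estimate ignores scale and block metadata, and the per-row `f32` scale is a hypothetical format, not the quantization layout used by the reviewed source.

```rust
/// Weight bytes read per layer per token for the three FFN matrices,
/// ignoring any scale or block metadata (an assumption of this estimate).
fn ffn_weight_bytes(hidden_size: usize, ffn_size: usize, bits_per_weight: usize) -> usize {
    3 * hidden_size * ffn_size * bits_per_weight / 8
}

/// Scalar matvec that reads i8 weights directly, without first expanding
/// the matrix to f32. One f32 scale per output row is a hypothetical format.
fn matvec_q8(w: &[i8], scales: &[f32], x: &[f32], out: &mut [f32]) {
    let cols = x.len();
    assert_eq!(scales.len(), out.len());
    assert_eq!(w.len(), out.len() * cols);
    for (r, o) in out.iter_mut().enumerate() {
        let row = &w[r * cols..(r + 1) * cols];
        let mut acc = 0.0f32;
        // One weight at a time: no SIMD, no blocking.
        for (q, xv) in row.iter().zip(x) {
            acc += f32::from(*q) * xv;
        }
        *o = acc * scales[r];
    }
}
```

For hidden_size 512 and ffn_size 2048, the estimate gives 12,582,912 bytes at 32 bits per weight, 3,145,728 bytes at 8 bits, and 1,572,864 bytes at 4 bits.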

Tests

Core operations include known-vector tests and shape-failure tests. Trained-model validation still requires reference layer outputs and tolerance budgets.
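The two kinds of tests named above, plus a tolerance comparison of the sort trained-model validation would need, might look like the sketch below. `matvec_checked` and `max_abs_diff` are hypothetical names for this example; the actual test helpers and tolerance budgets are not specified in the reviewed material.

```rust
/// Matvec that reports a shape mismatch as an error instead of panicking,
/// so a shape-failure test can assert on the result.
fn matvec_checked(w: &[f32], x: &[f32], out: &mut [f32]) -> Result<(), String> {
    let cols = x.len();
    if w.len() != out.len() * cols {
        return Err(format!(
            "shape mismatch: {} weights for {} rows x {} cols",
            w.len(),
            out.len(),
            cols
        ));
    }
    for (r, o) in out.iter_mut().enumerate() {
        *o = w[r * cols..(r + 1) * cols]
            .iter()
            .zip(x)
            .map(|(a, b)| a * b)
            .sum::<f32>();
    }
    Ok(())
}

/// Largest elementwise absolute difference between an output and a
/// reference, to be compared against a chosen tolerance budget.
fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(p, q)| (p - q).abs()).fold(0.0, f32::max)
}
```

A known-vector test multiplies a small hand-computed matrix, such as the 2 × 2 matrix [1 2; 3 4] by [1, 1], and checks for [3, 7]; a shape-failure test passes a weight slice of the wrong length and expects an `Err`.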

Scope

This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.

Engineering considerations

Identify the source, version, target environment, and owner.
Separate observed values from estimates and externally reported values.
Record trade-offs, unsupported cases, and fallback behavior.
Link performance statements to a compatible benchmark methodology.

Verification questions

What exact artifact, revision, backend, and environment were reviewed?
Which assumptions could change the result?
Which data should be retained so another engineer can reproduce the conclusion?
