Why twelve installed models should not mean twelve active models

Separate installed capacity, active residency, and concurrent execution before estimating a modular tiny-model system.

A model catalog can contain many narrow specialists without executing all of them. Treating “downloaded,” “resident,” “active,” and “executing” as synonyms produces misleading resource claims.

Use three counts

N_total records installed artifacts, K_active bounds the current active set, and K_parallel bounds simultaneous execution. Storage follows the first; peak residency follows the second; compute contention follows the third.

Design consequence

A sparse router with twelve cached specialists and one active model can be lighter at inference than a three-model committee. Model count alone is not an architecture metric.

Evidence

Trace state transitions explicitly and measure cold load, warm activation, peak memory, latency, and quality for the exact route.