A model catalog can contain many narrow specialists without executing all of them. Treating “downloaded,” “resident,” “active,” and “executing” as synonyms produces misleading resource claims.
Use three counts
N_total records installed artifacts, K_active bounds the current active set, and K_parallel bounds simultaneous execution. Storage follows the first; peak residency follows the second; compute contention follows the third.
Design consequence
A sparse router with twelve cached specialists and one active model can be lighter at inference than a three-model committee. Model count alone is not an architecture metric.
Evidence
Trace state transitions explicitly and measure cold load, warm activation, peak memory, latency, and quality for the exact route.