Separates cached capacity, resident capacity, and concurrent execution into three explicit quantities: N-total, K-active, and K-parallel.
Architecture guide: this topic defines a modular tiny-model planning contract. It does not claim that model artifacts exist, are compatible, or execute on this WordPress site.
Three independent quantities
- N-total: all installed or locally cached specialist artifacts and adapters.
- K-active: the maximum number of roles resident or logically enabled for the current request.
- K-parallel: the maximum number of roles actually executing concurrently.
Why the distinction matters
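Storage cost follows N-total, peak weight residency tends to follow K-active, and instantaneous compute contention follows K-parallel. A system with twelve cached specialists and one active route therefore behaves differently from a twelve-model ensemble: both have N-total = 12, but the routed system has K-active = 1 (so K-parallel is at most 1), whereas an ensemble that runs every member on each request has K-active = 12 and, if the members run simultaneously, K-parallel = 12.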
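The comparison above can be made concrete with a small planning calculation. The sketch below is illustrative only: `CapacityPlan`, its methods, and the 100 MB artifact size are hypothetical, and it assumes the invariant K-parallel ≤ K-active ≤ N-total (a role must be cached to be active and active to execute). The residency figure is a rough upper bound that assumes only the K-active roles are resident at once and that artifacts share no weights; adapters sharing a base model would make it an overestimate.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityPlan:
    """Hypothetical planning record for N-total, K-active, and K-parallel."""

    artifact_sizes_mb: tuple[float, ...]  # one entry per cached artifact; its length is N-total
    k_active: int    # roles resident or logically enabled per request
    k_parallel: int  # roles executing concurrently

    def __post_init__(self) -> None:
        # Assumed invariant: a role must be cached to be active and active to execute.
        if not 0 <= self.k_parallel <= self.k_active <= self.n_total:
            raise ValueError("expected 0 <= K-parallel <= K-active <= N-total")

    @property
    def n_total(self) -> int:
        return len(self.artifact_sizes_mb)

    def storage_mb(self) -> float:
        # Storage follows N-total: every cached artifact occupies disk.
        return sum(self.artifact_sizes_mb)

    def peak_resident_mb_bound(self) -> float:
        # Rough upper bound: the K-active largest artifacts resident at once,
        # assuming no weights are shared between artifacts.
        return sum(sorted(self.artifact_sizes_mb, reverse=True)[: self.k_active])


sizes = (100.0,) * 12  # twelve hypothetical 100 MB specialists
routed = CapacityPlan(sizes, k_active=1, k_parallel=1)
# Assumes the ensemble runs all twelve members simultaneously.
ensemble = CapacityPlan(sizes, k_active=12, k_parallel=12)

print(routed.storage_mb(), ensemble.storage_mb())                          # 1200.0 1200.0
print(routed.peak_resident_mb_bound(), ensemble.peak_resident_mb_bound())  # 100.0 1200.0
```

Both configurations need the same storage, but the bound on resident weights differs twelvefold; K-parallel would be used separately to size concurrent compute.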
Required traces
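Record the installed, loaded, active, executing, cooling-down, retired, and failed states separately. Do not infer activity from cache presence: an installed artifact counts toward N-total, but says nothing about whether it is resident, enabled, or executing.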
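One way to honor this rule is to derive concurrency from explicit state transitions rather than from what is on disk. The sketch below is a minimal illustration under stated assumptions: `RoleState`, `TraceEvent`, `peak_concurrent`, and the role names are hypothetical, and it treats the listed states as a single per-role lifecycle, whereas a real system might track installation and residency as separate flags.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class RoleState(Enum):
    INSTALLED = "installed"
    LOADED = "loaded"
    ACTIVE = "active"
    EXECUTING = "executing"
    COOLING_DOWN = "cooling-down"
    RETIRED = "retired"
    FAILED = "failed"


@dataclass(frozen=True)
class TraceEvent:
    t: float          # seconds since an arbitrary trace start (assumed unit)
    role: str         # hypothetical role identifier
    state: RoleState  # state the role entered at time t


def peak_concurrent(events: list[TraceEvent], state: RoleState) -> int:
    """Peak number of roles simultaneously in `state`, replaying events in time order.

    Events with equal timestamps are replayed in input order (sorted() is stable).
    """
    current: dict[str, RoleState] = {}
    peak = 0
    for ev in sorted(events, key=lambda e: e.t):
        current[ev.role] = ev.state
        peak = max(peak, sum(1 for s in current.values() if s is state))
    return peak


events = [
    TraceEvent(0.0, "summarizer", RoleState.INSTALLED),
    TraceEvent(0.0, "classifier", RoleState.INSTALLED),
    TraceEvent(0.0, "translator", RoleState.INSTALLED),
    TraceEvent(1.0, "classifier", RoleState.LOADED),
    TraceEvent(1.5, "classifier", RoleState.EXECUTING),
    TraceEvent(2.0, "summarizer", RoleState.LOADED),
    TraceEvent(2.5, "summarizer", RoleState.EXECUTING),
    TraceEvent(3.0, "classifier", RoleState.COOLING_DOWN),
    TraceEvent(4.0, "summarizer", RoleState.FAILED),
]

print(len({e.role for e in events}))                  # 3 cached roles
print(peak_concurrent(events, RoleState.EXECUTING))  # 2 observed executing at once
```

The "translator" role is cached but never leaves the installed state, so it contributes to the cached count and never to the observed executing peak.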
Scope
This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.
Engineering considerations
- Identify the source, version, target environment, and owner.
- Separate observed values from estimates and externally reported values.
- Record trade-offs, unsupported cases, and fallback behavior.
- Link performance statements to a compatible benchmark methodology.
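Separating observed values from estimates, and tying performance statements to a methodology, can be made mechanical by tagging each number with its provenance. The sketch below is a minimal illustration with hypothetical names (`Provenance`, `PerformanceValue`, `can_present_as_supported`); the rule that only observed values with a linked methodology may back a support claim is an assumed policy for illustration, not one this guide prescribes.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Provenance(Enum):
    OBSERVED = "observed"
    ESTIMATED = "estimated"
    EXTERNALLY_REPORTED = "externally-reported"


@dataclass(frozen=True)
class PerformanceValue:
    metric: str               # hypothetical metric label
    value: float
    provenance: Provenance    # how the number was obtained
    source: str               # who or what produced the value
    environment: str          # target environment the value applies to
    methodology_ref: Optional[str] = None  # identifier of the benchmark methodology, if any


def can_present_as_supported(v: PerformanceValue) -> bool:
    # Assumed policy: only observed values with a linked methodology qualify.
    return v.provenance is Provenance.OBSERVED and bool(v.methodology_ref)


measured = PerformanceValue("p50 latency (ms)", 42.0, Provenance.OBSERVED,
                            "internal run", "example-host", "example-method-v1")
reported = PerformanceValue("p50 latency (ms)", 30.0, Provenance.EXTERNALLY_REPORTED,
                            "third-party report", "unspecified")

print(can_present_as_supported(measured), can_present_as_supported(reported))  # True False
```

All values above are placeholders; the point is that an externally reported or estimated figure stays visibly distinct from a measurement.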
Verification questions
- What exact artifact, revision, backend, and environment were reviewed?
- Which assumptions could change the result?
- Which data should be retained so another engineer can reproduce the conclusion?
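These questions can be kept answerable by storing the answers in a structured record and flagging any left blank. The `ReviewRecord` type and its field names are hypothetical, and the values in the usage lines are placeholders rather than real artifacts.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields


@dataclass
class ReviewRecord:
    artifact: str = ""     # exact artifact reviewed
    revision: str = ""     # revision or content hash of that artifact
    backend: str = ""      # runtime backend used for the review
    environment: str = ""  # hardware, OS, and driver details
    assumptions: list[str] = field(default_factory=list)    # conditions that could change the result
    retained_data: list[str] = field(default_factory=list)  # logs, traces, configs kept for reproduction

    def missing(self) -> list[str]:
        """Names of fields still empty; an empty list means every question has an answer."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


record = ReviewRecord(artifact="example-specialist", revision="0000000", backend="example-backend")
print(record.missing())  # ['environment', 'assumptions', 'retained_data']
```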