Installed, active, and parallel model counts

Separates cached capacity, resident capacity, and concurrent execution through explicit N-total, K-active, and K-parallel quantities.

Last verified: Not verified

Architecture guide: this topic defines a modular tiny-model planning contract. It does not claim that model artifacts exist, are compatible, or execute on this WordPress site.

Three independent quantities

N-total
The count of all installed or locally cached specialist artifacts and adapters.
K-active
The maximum number of roles resident or logically enabled for the current request.
K-parallel
The maximum number of roles actually executing concurrently.
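Keeping the three counts as separate fields, rather than as a single "model count", makes the distinction explicit. The following is a minimal Python sketch. The class name `CapacityPlan` is hypothetical. The ordering check `k_parallel <= k_active <= n_total` is also an assumption: it holds only if a role must be active before it can execute and each active role maps to a distinct cached artifact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityPlan:
    """Hypothetical container for the three independent model counts."""

    n_total: int     # installed or locally cached specialist artifacts and adapters
    k_active: int    # roles resident or logically enabled for the current request
    k_parallel: int  # roles actually executing concurrently

    def __post_init__(self) -> None:
        # Assumed ordering: a role cannot execute unless it is active, and
        # (in this sketch) each active role maps to one distinct cached artifact.
        if not (0 <= self.k_parallel <= self.k_active <= self.n_total):
            raise ValueError(
                f"expected 0 <= k_parallel ({self.k_parallel}) <= "
                f"k_active ({self.k_active}) <= n_total ({self.n_total})"
            )


# Twelve cached specialists behind a single active route...
routed = CapacityPlan(n_total=12, k_active=1, k_parallel=1)
# ...versus a twelve-model ensemble that runs every member at once.
ensemble = CapacityPlan(n_total=12, k_active=12, k_parallel=12)
```

If one artifact can back several roles in a given system, drop the `k_active <= n_total` half of the check.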

Why the distinction matters

Storage cost scales with N-total, peak weight residency tends to scale with K-active, and instantaneous compute contention scales with K-parallel. A system with twelve cached specialists and one active route (N-total = 12, K-active = 1) therefore behaves differently from a twelve-model ensemble (N-total = K-active = 12), even though both cache twelve models.
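The scaling described above can be shown with a rough first-order calculation. This sketch rests on two illustrative assumptions, not measurements: every artifact has the same hypothetical size of 0.5 GB, and each executing role counts as one unit of compute contention. The function `estimate` and the class `Footprint` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    storage_gb: float        # scales with N-total
    peak_resident_gb: float  # tends to scale with K-active
    concurrent_jobs: int     # scales with K-parallel


def estimate(n_total: int, k_active: int, k_parallel: int, artifact_gb: float) -> Footprint:
    """Hypothetical first-order estimate assuming every artifact has the same size."""
    return Footprint(
        storage_gb=n_total * artifact_gb,
        peak_resident_gb=k_active * artifact_gb,
        concurrent_jobs=k_parallel,
    )


ARTIFACT_GB = 0.5  # assumed size of one tiny specialist, for illustration only

routed = estimate(n_total=12, k_active=1, k_parallel=1, artifact_gb=ARTIFACT_GB)
ensemble = estimate(n_total=12, k_active=12, k_parallel=12, artifact_gb=ARTIFACT_GB)

print(routed)    # Footprint(storage_gb=6.0, peak_resident_gb=0.5, concurrent_jobs=1)
print(ensemble)  # Footprint(storage_gb=6.0, peak_resident_gb=6.0, concurrent_jobs=12)
```

Under these assumptions, both configurations pay for the same storage. The ensemble, however, needs twelve times the peak residency and twelve concurrent executions. Real residency also depends on shared base weights, adapters, and runtime buffers, which this sketch ignores.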

Required traces

Record the installed, loaded, active, executing, cooling-down, retired, and failed states separately. Do not infer activity from cache presence: a cached artifact is not necessarily loaded, enabled, or executing.
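One way to keep these states separate is to use an explicit enum plus a list of timestamped transition events, and then derive the counts by replaying the trace rather than by inspecting the cache. The following is a sketch under stated assumptions. The names `RoleState`, `TraceEvent`, and `derive_counts` are hypothetical. The grouping of loaded, active, executing, and cooling-down as "resident or enabled" is a modelling choice, as is counting every non-retired artifact toward N-total; a real system would need to define both.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class RoleState(Enum):
    INSTALLED = "installed"
    LOADED = "loaded"
    ACTIVE = "active"
    EXECUTING = "executing"
    COOLING_DOWN = "cooling-down"
    RETIRED = "retired"
    FAILED = "failed"


# Assumed groupings for this sketch; a real system must define its own.
RESIDENT_OR_ENABLED = {RoleState.LOADED, RoleState.ACTIVE, RoleState.EXECUTING, RoleState.COOLING_DOWN}
PRESENT = set(RoleState) - {RoleState.RETIRED}


@dataclass(frozen=True)
class TraceEvent:
    t: float          # event timestamp in seconds
    artifact: str     # hypothetical artifact identifier
    state: RoleState  # state the artifact enters at time t


def derive_counts(events: list[TraceEvent]) -> tuple[int, int, int]:
    """Replay transitions in time order; return (N-total at end, peak K-active, peak K-parallel)."""
    current: dict[str, RoleState] = {}
    peak_active = peak_parallel = 0
    for ev in sorted(events, key=lambda e: e.t):
        current[ev.artifact] = ev.state
        states = Counter(current.values())
        peak_active = max(peak_active, sum(states[s] for s in RESIDENT_OR_ENABLED))
        peak_parallel = max(peak_parallel, states[RoleState.EXECUTING])
    n_total = sum(1 for s in current.values() if s in PRESENT)
    return n_total, peak_active, peak_parallel


trace = [
    TraceEvent(0.0, "summarizer", RoleState.INSTALLED),
    TraceEvent(0.0, "classifier", RoleState.INSTALLED),
    TraceEvent(0.0, "translator", RoleState.INSTALLED),  # cached, never activated
    TraceEvent(1.0, "classifier", RoleState.LOADED),
    TraceEvent(1.5, "classifier", RoleState.EXECUTING),
    TraceEvent(2.0, "summarizer", RoleState.LOADED),
    TraceEvent(2.5, "classifier", RoleState.COOLING_DOWN),
    TraceEvent(3.0, "summarizer", RoleState.EXECUTING),
    TraceEvent(4.0, "summarizer", RoleState.FAILED),
]
print(derive_counts(trace))  # (3, 2, 1)
```

In this hypothetical trace, the translator counts toward N-total but never toward either K value. This is the distinction that cache presence alone would hide.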

Scope

This starter page defines the questions, boundaries, evidence, and failure modes that should be recorded before a capability is presented as supported.

Engineering considerations

  • Identify the source, version, target environment, and owner.
  • Separate observed values from estimates and externally reported values.
  • Record trade-offs, unsupported cases, and fallback behavior.
  • Link performance statements to a compatible benchmark methodology.
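As a sketch of the considerations above, each recorded figure can carry its provenance, its source, version, environment, and owner, and, for performance figures, a benchmark methodology reference. The field names, the `Provenance` categories, and the rule in `backs_performance_claim` are assumptions for illustration. All values below are placeholders, not measurements.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Provenance(Enum):
    OBSERVED = "observed"             # measured directly in the recorded environment
    ESTIMATED = "estimated"           # projected or derived, not measured
    EXTERNALLY_REPORTED = "external"  # quoted from a third-party source


@dataclass(frozen=True)
class RecordedValue:
    """Hypothetical record for one reported figure and its provenance."""

    name: str
    value: float
    unit: str
    provenance: Provenance
    source: str        # artifact or document the figure describes
    version: str       # artifact revision
    environment: str   # target hardware/runtime the figure applies to
    owner: str         # who is accountable for the record
    methodology: Optional[str] = None  # benchmark methodology reference, if any

    def backs_performance_claim(self) -> bool:
        # Assumed policy: only observed values linked to a methodology may
        # back a performance statement; estimates and external figures may not.
        return self.provenance is Provenance.OBSERVED and bool(self.methodology)


# Placeholder values for illustration only; not measurements.
estimate = RecordedValue(
    name="p50_latency", value=40.0, unit="ms",
    provenance=Provenance.ESTIMATED,
    source="example-specialist", version="rev-a",
    environment="example-cpu", owner="example-team",
)
measured = RecordedValue(
    name="p50_latency", value=45.0, unit="ms",
    provenance=Provenance.OBSERVED,
    source="example-specialist", version="rev-a",
    environment="example-cpu", owner="example-team",
    methodology="example-benchmark-v1",
)
print(estimate.backs_performance_claim(), measured.backs_performance_claim())  # False True
```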

Verification questions

  • What exact artifact, revision, backend, and environment were reviewed?
  • Which assumptions could change the result?
  • Which data should be retained so another engineer can reproduce the conclusion?