A small-model system is defined by its total resource behavior, not parameter count alone. Resource budgets belong in the architecture and the evidence model.
Budget dimensions
- Artifact download and persistent storage.
- Host and device model residency.
- Context, KV cache, and intermediate buffers.
- Initialization and steady-state latency.
- CPU, GPU, power, and thermal limits.
- Confidence and safe fallback thresholds.
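One way to make these dimensions concrete is to record them as a single budget object and check measured values against it. The sketch below is illustrative only. `ResourceBudget`, `kv_cache_mb`, `violations`, every field name, and every number are hypothetical. The KV cache estimate assumes standard uncompressed transformer attention, where keys and values are each stored per layer, per KV head, per position.

```python
from dataclasses import dataclass

# Hypothetical budget record: field names, units, and limits are illustrative assumptions.
# Sizes are in MiB (2**20 bytes).
@dataclass(frozen=True)
class ResourceBudget:
    download_mb: float         # artifact download size
    storage_mb: float          # persistent on-disk footprint
    host_resident_mb: float    # host (CPU) memory residency
    device_resident_mb: float  # device (GPU/NPU) memory residency
    kv_cache_mb: float         # KV cache allowance at the configured context limit
    init_latency_ms: float     # initialization (cold start)
    steady_latency_ms: float   # steady-state per-request latency, e.g. p95
    power_w: float             # sustained power ceiling
    min_confidence: float      # floor: below this, take the safe fallback


def kv_cache_mb(layers, kv_heads, head_dim, context_len, batch=1, bytes_per_elem=2):
    """Estimate KV cache size in MiB for standard (uncompressed) transformer attention.

    The factor of 2 accounts for storing both keys and values.
    """
    total_bytes = 2 * layers * kv_heads * head_dim * context_len * batch * bytes_per_elem
    return total_bytes / (1024 * 1024)


def violations(budget, measured):
    """Return the names of ceilings that measured values exceed (missing values are skipped)."""
    exceeded = []
    for name, limit in vars(budget).items():
        if name == "min_confidence":
            continue  # a floor checked per request, not a resource ceiling
        value = measured.get(name)
        if value is not None and value > limit:
            exceeded.append(name)
    return exceeded


# Example with made-up numbers.
budget = ResourceBudget(
    download_mb=500, storage_mb=600, host_resident_mb=1024,
    device_resident_mb=2048, kv_cache_mb=256, init_latency_ms=3000,
    steady_latency_ms=150, power_w=15, min_confidence=0.6,
)
measured = {
    "kv_cache_mb": kv_cache_mb(layers=24, kv_heads=8, head_dim=64, context_len=4096),
    "device_resident_mb": 1900,
    "steady_latency_ms": 180,
}
print(violations(budget, measured))  # ['steady_latency_ms']
```

Here the estimated KV cache (192 MiB at a 4096-token context in 16-bit precision) and device residency fit, but measured latency exceeds its ceiling. Keeping all dimensions in one record makes it harder to satisfy one budget while silently breaking another.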
Decision outcomes
Backend selection, precision, model variant, context limit, worker count, and no-op behavior should each be traceable to the budget that constrains them and to the environment in which they were measured.
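A minimal sketch of traceable selection follows, assuming candidates are listed in preference order and each carries costs measured in the target environment. `Candidate`, `Decision`, `select`, the candidate names, and all numbers are hypothetical. Each rejection and the final choice are logged together with the budget that drove them. When no candidate fits, the result is an explicit no-op rather than an over-budget fallback.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical candidate configuration; cost fields hold values measured in the target environment.
@dataclass(frozen=True)
class Candidate:
    name: str
    backend: str
    precision: str
    context_limit: int
    workers: int
    device_mb: float  # measured device residency
    p95_ms: float     # measured steady-state p95 latency


@dataclass
class Decision:
    chosen: Optional[Candidate]
    trace: List[str] = field(default_factory=list)


def select(candidates, device_budget_mb, latency_budget_ms, environment):
    """Pick the first candidate (in preference order) that fits both budgets.

    Every rejection and the final outcome are recorded with the budget and
    environment that produced them. If nothing fits, return a no-op decision.
    """
    decision = Decision(chosen=None)
    for c in candidates:
        if c.device_mb > device_budget_mb:
            decision.trace.append(
                f"{c.name}: rejected, device {c.device_mb} MB > budget {device_budget_mb} MB on {environment}")
            continue
        if c.p95_ms > latency_budget_ms:
            decision.trace.append(
                f"{c.name}: rejected, p95 {c.p95_ms} ms > budget {latency_budget_ms} ms on {environment}")
            continue
        decision.chosen = c
        decision.trace.append(
            f"{c.name}: selected ({c.backend}, {c.precision}, ctx {c.context_limit}, "
            f"{c.workers} worker(s)) within budgets on {environment}")
        return decision
    decision.trace.append(f"no candidate fits on {environment}; no-op, model path disabled")
    return decision


# Example with made-up measurements.
candidates = [
    Candidate("base-fp16-ctx4k", "gpu", "fp16", 4096, 2, device_mb=3100, p95_ms=90),
    Candidate("base-int8-ctx4k", "gpu", "int8", 4096, 2, device_mb=1700, p95_ms=160),
    Candidate("small-int8-ctx2k", "gpu", "int8", 2048, 1, device_mb=900, p95_ms=110),
]
d = select(candidates, device_budget_mb=2048, latency_budget_ms=150, environment="laptop-igpu")
for line in d.trace:
    print(line)
```

In this example the fp16 variant fails the device budget, the int8 4k-context variant fails the latency budget, and the smaller int8 variant is selected. The trace records why each alternative was rejected, so a later reviewer can tie the shipped configuration back to a specific budget and measurement.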