Resource budgets as first-class architecture

Memory, time, power, storage, and confidence budgets should drive runtime decisions from the start of design, not surface only after implementation.

A small-model system is defined by its total resource behavior, not by parameter count alone. Resource budgets therefore belong in both the architecture and the evidence model.

Budget dimensions

  • Artifact download and persistent storage.
  • Host and device model residency.
  • Context, KV cache, and intermediate buffers.
  • Initialization and steady-state latency.
  • CPU, GPU, power, and thermal limits.
  • Confidence and safe fallback thresholds.
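
The dimensions above can be written down as one explicit record, so that budgets exist as data the runtime can check, not as implicit expectations. The following is a minimal sketch, not a prescribed schema: the class name ResourceBudget, the field names, the units, and the violations helper are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ResourceBudget:
    """One limit per budget dimension. Names and units are hypothetical."""
    download_mb: float        # artifact download size
    storage_mb: float         # persistent storage
    host_memory_mb: float     # host model residency
    device_memory_mb: float   # device model residency
    working_memory_mb: float  # context, KV cache, intermediate buffers
    init_latency_ms: float    # initialization latency
    step_latency_ms: float    # steady-state latency per step
    power_mw: float           # sustained power draw (proxy for thermal limits)
    min_confidence: float     # floor; below it the safe fallback applies


def violations(budget, measured):
    """Return the names of the dimensions whose measured value breaks the budget.

    `measured` maps field names to observed values; dimensions that were
    not measured are skipped. Every field is an upper limit except
    `min_confidence`, which is a lower limit.
    """
    broken = []
    for field in fields(budget):
        if field.name not in measured:
            continue
        limit = getattr(budget, field.name)
        value = measured[field.name]
        if field.name == "min_confidence":
            if value < limit:
                broken.append(field.name)
        elif value > limit:
            broken.append(field.name)
    return broken
```

Keeping the confidence floor in the same record as the memory and latency ceilings means a single check can report every broken dimension at once, which is one way to make fallback behavior depend on the same evidence as the other runtime choices.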

Decision outcomes

Every runtime choice (backend, precision, model variant, context limit, worker count, and no-op behavior) should be traceable to the budget that applies to it and to the environment as measured.
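
One way to make a decision traceable is to return it together with the budget dimension and the measurement that produced it. The sketch below does this for model variant selection only. The Decision record, the choose_variant function, the variant names, and the sizes in the usage lines are hypothetical; a real system would record the same kind of trace for each of the other outcomes.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Decision:
    """A runtime choice plus the evidence behind it (hypothetical record)."""
    setting: str           # which outcome was decided, e.g. "model_variant"
    value: Optional[str]   # chosen value; None means no-op (nothing is loaded)
    budget: str            # budget dimension that constrained the choice
    limit_mb: float        # configured budget limit
    measured_mb: float     # free memory observed in the environment


def choose_variant(
    variants: Sequence[Tuple[str, float]],
    device_memory_budget_mb: float,
    measured_free_mb: float,
) -> Decision:
    """Pick the first variant that fits, in the caller's preference order.

    `variants` holds (name, required_mb) pairs, most preferred first.
    Usable memory is the smaller of the budget and the measured free
    memory, so neither the plan nor the device alone decides.
    """
    usable_mb = min(device_memory_budget_mb, measured_free_mb)
    chosen = None
    for name, required_mb in variants:
        if required_mb <= usable_mb:
            chosen = name
            break
    return Decision(
        setting="model_variant",
        value=chosen,
        budget="device_memory_mb",
        limit_mb=device_memory_budget_mb,
        measured_mb=measured_free_mb,
    )


# Illustrative sizes only; they are not measurements of any real model.
candidates = [("fp16", 1400.0), ("int8", 700.0), ("int4", 400.0)]
decision = choose_variant(candidates, device_memory_budget_mb=1000.0,
                          measured_free_mb=800.0)
```

Because the returned record carries both the limit and the measurement, a log of Decision values can serve as evidence for why a given variant, or the no-op path, was taken on a particular device.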