API Load Prevention Rollout (CivicOS OpenClaw)
API Load Prevention Rollout (CivicOS OpenClaw)
Objective
Reduce external API token usage and spend while preserving response quality for operations work.
Targets (first 14 days)
- API token reduction: >= 60%
- API spend reduction: >= 50%
- Quality parity on sampled outputs: >= 95%
- Prompt-cache hit rate (eligible calls): >= 90%
- P95 latency (mixed workload): <= baseline + 10%
Architecture Controls
- Tier 0 Local Route
- Complexity score < 0.35 => local model only
- Tier 1 Semantic Cache
- Similarity >= 0.88 and within TTL => return cached response
- Tier 2 API Route with Compression
- Compress context before API dispatch
- Budget Governor
- Hard stops and soft alerts at router level
Hard Guardrails (No quality disruption)
- Never cache/compress high-stakes tasks (security, policy, legal-sensitive, irreversible actions)
- On low confidence from classifier/compressor: auto-escalate to full-context API path
- Cache invalidation on source changes (policy file/version hash change)
- Human review required for low-confidence high-impact outputs
Rollout Plan
Phase 0 — Instrumentation (Day 0-1)
Owner: Ops (Burt)
- Add per-call telemetry fields:
- task_id, route_tier, classifier_score, cache_hit, compression_ratio, input_tokens, output_tokens, cost_usd, latency_ms, quality_flag
- Capture baseline for 24h with no routing changes
Exit criteria:
- Baseline report generated
- Telemetry completeness > 98%
Phase 1 — Shadow Mode (Day 2-4)
Owner: Ops + QA
- Run router decisions in parallel but do not enforce
- Compare recommended route vs actual route
- Build confusion matrix for classifier
Exit criteria:
- False-low-complexity rate < 5%
- Proposed thresholds calibrated
Phase 2 — Controlled Enforcement (Day 5-9)
Owner: Ops
- Enforce Tier 0/Tier 1 for low-risk task classes only
- Keep Tier 2 fallback always available
- Enable budget soft alerts at 80%
Exit criteria:
- Quality parity >= 95% on random sample
- API token reduction >= 40% early signal
Phase 3 — Full Enforcement (Day 10-14)
Owner: Ops
- Extend enforcement to broader task classes
- Enable hard budget caps and queueing
- Weekly threshold tuning
Exit criteria:
- Meets target metrics in “Targets”
Fallback Matrix
- Local model timeout/error => Tier 2 API with compressed context
- Cache retrieval error => bypass cache and continue route
- Compression failure => send minimal safe raw context bundle
- Budget hard cap reached => queue non-critical tasks; escalate critical tasks with approval
Weekly Governance
- Review top 20 escalations and misses
- Tune classifier thresholds by +/−0.05 max per week
- Update cache TTL by task family
- Publish one-page efficiency report