Model Selection Matrix (Local-First Practical Policy)
Status: Active · Owner: Burt Prime · Updated: 2026-02-26
Policy Goal
Use local models as much as practical while preserving execution quality and speed for high-impact tasks.
API Gate (Active)
Budget-conservation mode is active.
- API usage requires an explicit approval token: `APPROVE API [task-id]`
- Non-critical tasks are local-only.
- Emergency reserve policy: 8% emergency + 8% planned critical ops.
- See: `generated/API_GATE_POLICY_ACTIVE.md`
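The gate above can be sketched as a simple token check. This is a hypothetical helper (the function name `api_allowed` and its output strings are illustrative, not part of the policy tooling); it only shows the rule that an API call needs an `APPROVE API [task-id]` token:

```shell
# Hypothetical gate check (sketch, not real tooling): an API-model call
# is allowed only when the caller supplies the explicit approval token.
api_allowed() {
  case "$1" in
    "APPROVE API "?*) echo "allowed" ;;   # e.g. "APPROVE API task-042"
    *)                echo "blocked" ;;   # no/empty token -> stay local
  esac
}

api_allowed "APPROVE API task-042"   # allowed
api_allowed ""                       # blocked
```

Note the pattern requires a non-empty task id after the token prefix; a bare `APPROVE API ` is still blocked.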
Current Constraints
- Kimi tokens are constrained this week → do not use by default.
- Use API models only when local models are likely to fail task quality/speed requirements.
Routing Rules
Tier 1 — Local First (default)
Use local by default for:
- Drafting, summarization, brainstorming
- Routine analysis
- Low-risk formatting/transforms
- Background tasks
Preferred:
- Qwen3-14B (`local/qwen-14b` alias → `qwen3:14b`) — default local generalist
- Mistral (`local/mistral-small`) — local fallback / second-opinion style
- Qwen-Coder (`local/qwen-coder-32b`) — coding-heavy local, off-peak preferred
- Qwen2.5-14B (`local/qwen2.5-14b`) — legacy fallback only
Tier 2 — API Escalation (when needed)
Escalate from local when the task is high-stakes or local quality is insufficient:
- Critical code changes, debugging blockers
- External-facing final copy with high consequences
- Complex reasoning requiring stronger model consistency
Preferred:
- Codex (`Codex`) — technical execution + debugging
- GPT-4o (`GPT-4o`) — polished writing/comms
- Gemini (`Gemini`) — broad research/synthesis
Tier 3 — Constrained/Explicit-Only
- Kimi (`Kimi`) — use only if explicitly requested, until recharge
Safety Overrides
Always require explicit human confirmation for:
- Financial commitments
- Irreversible external operations
- Public statements representing organization policy
Sub-Agent Model Defaults
- `architect`, `coding`, `automation`, `debug`, `infra` → Codex
- `research`, `scan`, `summarize` → Qwen3-14B first, escalate to Gemini if weak
- `email`, `inbox`, simple tasks → Qwen3-14B (local) by default
- `draft`, `announcement` → Qwen3-14B first, escalate to GPT-4o for final polish
- bulk/background → Qwen3-14B / Mistral
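The defaults above can be expressed as a small routing function. This is a minimal sketch (the helper name `route_model` and the plain-string outputs are assumptions, not existing tooling); it just encodes the role → model mapping in this section:

```shell
# Hypothetical routing sketch: map a sub-agent role to its default model
# per this matrix. Escalation targets are noted in comments only.
route_model() {
  case "$1" in
    architect|coding|automation|debug|infra) echo "Codex" ;;
    research|scan|summarize)  echo "Qwen3-14B" ;;  # escalate to Gemini if weak
    email|inbox)              echo "Qwen3-14B" ;;  # local by default
    draft|announcement)       echo "Qwen3-14B" ;;  # escalate to GPT-4o for final polish
    bulk|background)          echo "Qwen3-14B" ;;  # or Mistral
    *)                        echo "Qwen3-14B" ;;  # local-first fallback
  esac
}

route_model coding      # Codex
route_model summarize   # Qwen3-14B
```

An unknown role falls through to the local generalist, matching the local-first policy goal.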
Escalation Trigger Checklist
Escalate from local if any true:
- Local output still fails quality after one revision prompt
- Task requires deep repo-wide bug root cause under time pressure
- User requests final production-grade polish immediately
- Context pressure exceeds practical local performance
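The checklist reduces to an any-trigger rule. A minimal sketch, assuming each item is scored as a 0/1 flag in list order (the helper name `should_escalate` is hypothetical):

```shell
# Hypothetical sketch: escalate from local if ANY checklist trigger is true.
# Args: one 0/1 flag per checklist item, in the order listed above.
should_escalate() {
  for flag in "$@"; do
    if [ "$flag" = "1" ]; then
      echo "escalate"
      return 0
    fi
  done
  echo "stay-local"
}

should_escalate 0 0 0 0   # stay-local
should_escalate 0 1 0 0   # escalate
```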
Operator Note
When spawning sub-agents, pass model explicitly to enforce this matrix.
Empirical Local Profiles (2026-02-26)
Benchmarked local Ollama models for speed + quality taste checks:
- `qwen3:14b` (via `local/qwen-14b`): new default; ~2x faster on average vs qwen2.5 in the current 3-case benchmark set
- `qwen2.5:14b` (via `local/qwen2.5-14b`): retained as legacy fallback route
- `mistral-small3.2:24b-instruct-2506-q4_K_M`: style-diverse fallback / second-opinion lane
- `qwen2.5-coder:32b-instruct-q3_K_L`: coding specialist, slower but useful for hard code tasks
Artifacts:
- `/tmp/qwen_benchmark_results_2026-02-26.json`
- `generated/subagent_assignment_matrix_2026-02-23.md` (superseded guidance in progress)
Smart spawn helper
Use:
~/.openclaw/workspace/scripts/spawn_smart.sh "<task description>" [low|normal|high] [run|session]
Examples:
~/.openclaw/workspace/scripts/spawn_smart.sh "debug stripe api sync" high run
~/.openclaw/workspace/scripts/spawn_smart.sh "summarize inbox for today" normal session