MODEL_ROUTING_MATRIX.md

CivicOS Institute — AI Infrastructure Policy
Version: 1.2
Last updated: 2026-02-27

Tier Definitions

Tier	Label	Condition
T0	Local only	complexity < 0.35
T1	Cache return	similarity ≥ 0.88 and TTL valid
T2	API path	complexity > 0.65 OR high-stakes class
T2 Hard Override	Forced API	security, legal-sensitive, irreversible actions, governance-critical outputs

Note: The complexity thresholds (0.35 / 0.65) depend on a complexity classifier.
Status: [TBD: complexity classifier]. Treat the thresholds as heuristics until it is implemented. Scores from 0.35 through 0.65 are not assigned a tier by the conditions above.

Router Rules (applied first, before the routing matrix below)

T0: complexity < 0.35 → local only
T1: similarity ≥ 0.88 and TTL valid → cache return
T2: complexity > 0.65 or high-stakes class → API path
Hard override to T2: security, legal-sensitive, irreversible actions, governance-critical outputs
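
The router rules above can be sketched as a single decision function. This is a minimal illustration, not the production router. The `Request` fields, the class-name strings, and the evaluation order (hard override and high-stakes first, then cache, then local) are assumptions. The rules do not say which tier handles a score from 0.35 through 0.65 without a cache hit, so the sketch returns `"UNSPECIFIED"` for that case rather than guessing.

```python
from dataclasses import dataclass
from typing import Optional

# Class names are hypothetical labels for the hard-override categories.
HARD_OVERRIDE_CLASSES = frozenset(
    {"security", "legal-sensitive", "irreversible-action", "governance-critical"}
)


@dataclass
class Request:
    complexity: float                         # classifier score, assumed in [0, 1]
    task_class: str = "general"
    high_stakes: bool = False
    cache_similarity: Optional[float] = None  # None means no cache candidate
    cache_ttl_valid: bool = False


def route(req: Request) -> str:
    """Return "T0", "T1", "T2", or "UNSPECIFIED" for the given request."""
    # Hard-override and high-stakes checks run first (assumed order), so a
    # cache hit can never satisfy a high-stakes request on its own.
    if req.task_class in HARD_OVERRIDE_CLASSES or req.high_stakes:
        return "T2"
    if req.complexity > 0.65:
        return "T2"
    if (req.cache_similarity is not None
            and req.cache_similarity >= 0.88
            and req.cache_ttl_valid):
        return "T1"
    if req.complexity < 0.35:
        return "T0"
    # Scores from 0.35 through 0.65 without a cache hit are not covered.
    return "UNSPECIFIED"
```

For example, `route(Request(complexity=0.1, task_class="security"))` yields `"T2"` even though the complexity score alone would select the local tier.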

Experimental Local Default (Decision 2026-02-27)

qwen3.5-local (llama.cpp) is the default local inference model in the experimental lane only.
Endpoint: http://localhost:18080/v1
Fallback when unavailable: qwen3:14b via Ollama (http://127.0.0.1:11434/v1)
Priority task classes in the experimental lane:
- signal summarization
- CRM enrichment notes
- board-ready analysis
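
One way to implement the primary/fallback endpoint choice above is a small probe loop. This is a sketch under assumptions: the function names are hypothetical, and the default probe assumes both servers answer `GET <base>/models` in the OpenAI-compatible style. The probe is injectable so a different health check can be swapped in.

```python
import urllib.error
import urllib.request

# (model name, base URL) pairs taken from the policy text above.
PRIMARY = ("qwen3.5-local", "http://localhost:18080/v1")
FALLBACK = ("qwen3:14b", "http://127.0.0.1:11434/v1")


def is_reachable(base_url, timeout=2.0):
    """Assumed health check: a 200 response from <base_url>/models."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def pick_local_endpoint(probe=is_reachable):
    """Return the first reachable (model, base_url) pair, or None."""
    for model, base_url in (PRIMARY, FALLBACK):
        if probe(base_url):
            return model, base_url
    return None
```

If `pick_local_endpoint()` returns `None`, neither local server responded; what happens next (for instance, escalating to the API path) is left to the caller.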

Model Routing Matrix

Task Type	Default Route	Primary Model	Fallback 1	Fallback 2	Quality Gate
Formatting, templating, extraction	Tier 0 (local)	qwen3:14b	qwen2.5:14b	Codex	schema/format validation
Code scaffolding, scripts, refactors (routine)	Tier 0 (local)	qwen2.5-coder:32b-instruct-q3_K_L	qwen3:14b	Codex	lint/tests pass
Policy lookup / repeated Q&A	Tier 1 (cache)	semantic cache hit	qwen3:14b regenerate	Codex	freshness + similarity threshold
Multi-file architecture decisions	Tier 2 (API + compression)	Codex (gpt-5.3-codex)	Kimi	Gemini	decision completeness checklist
Security/compliance review	Tier 2 (API, minimal compression)	Codex	GPT-4o	Gemini	high-stakes review required
Governance drafting (board/policy language)	Tier 2 (API + compression)	Codex	GPT-4o	Kimi	legal/policy rubric score
Fast summarization (internal)	Tier 0 (local)	mistral-small3.2:24b	qwen3:14b	Codex	factual spot-check
Cross-agent synthesis	Tier 2 (API + compression)	Codex	Kimi	GPT-4o	synthesis rubric + contradiction scan
Council seats (Magnus/Vera/etc.)	Local queue (serialized)	see Council Seat Model Map below	seat fallback per skill	Codex (seat-specific)	disagreement + risk surfaced
Time-sensitive low-risk ops reply	Tier 0 (local)	qwen3:14b	mistral-small3.2:24b	Codex	latency SLA + no critical risk
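
Each matrix row can be read as an ordered chain: primary model, then Fallback 1, then Fallback 2, with the quality gate deciding whether an output is accepted. The sketch below illustrates that reading. The matrix does not state when a fallback is triggered, so moving on after a call error or a failed gate is an assumption, and `call_model` and `quality_gate` are hypothetical callables supplied by the caller.

```python
def run_with_fallbacks(models, prompt, call_model, quality_gate):
    """Try each model in order; return (model, output) for the first output
    that passes the quality gate."""
    for model in models:
        try:
            output = call_model(model, prompt)
        except Exception:
            # Assumed behavior: an unavailable model is skipped.
            continue
        if quality_gate(output):
            return model, output
    raise RuntimeError("no model in the chain passed the quality gate")


# Chain for the "Formatting, templating, extraction" row.
FORMATTING_CHAIN = ["qwen3:14b", "qwen2.5:14b", "Codex"]
```

A caller would pass a gate that matches the row, for example a schema validator for formatting tasks or a lint-and-test runner for code scaffolding.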

Budget / Quality Controls

Auto-escalate to higher tier on:
- low confidence
- failed validation
- stale cache
- high-impact task class
Never allow a cache-only answer for high-stakes classes.
Required session log fields: record the tier and model actually used for each task, to support empirical threshold tuning.
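
The escalation triggers and the required log fields can be sketched together. This is illustrative only: the `TaskOutcome` fields, the `min_confidence` default of 0.7, and the JSON layout are assumptions, since the policy names the triggers and the two required fields (tier and model) but not their representation.

```python
import json
from dataclasses import dataclass


@dataclass
class TaskOutcome:
    tier: str                  # tier actually used, e.g. "T0"
    model: str                 # model actually used
    confidence: float          # hypothetical confidence score in [0, 1]
    validation_passed: bool
    cache_stale: bool = False
    high_impact: bool = False


def escalation_reasons(outcome, min_confidence=0.7):
    """Return the list of triggers that apply; an empty list means no escalation."""
    reasons = []
    if outcome.confidence < min_confidence:  # threshold value is an assumption
        reasons.append("low confidence")
    if not outcome.validation_passed:
        reasons.append("failed validation")
    if outcome.cache_stale:
        reasons.append("stale cache")
    if outcome.high_impact:
        reasons.append("high-impact task class")
    return reasons


def session_log_entry(task_id, outcome):
    """Serialize the required fields (tier, model) plus any escalation reasons."""
    return json.dumps({
        "task_id": task_id,
        "tier": outcome.tier,
        "model": outcome.model,
        "escalation_reasons": escalation_reasons(outcome),
    }, sort_keys=True)
```

Logging the reasons next to the tier and model makes it possible to see later which trigger fires most often, which is the kind of evidence threshold tuning needs.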

Council Seat Model Map

Companion to the routing matrix row for Council sessions.
Seat 7 (Burt) always receives all six prior outputs in full, with no compression.

Seat	Persona	Primary Model	Fallback
1	Magnus	qwen2.5-coder:32b-instruct-q3_K_L	Codex
2	Vera	qwen3:14b (via local/qwen-14b route)	Codex
3	Dante	mistral-small3.2:24b-instruct-2506-q4_K_M	qwen3:14b
4	Eleanor	qwen3:14b (via local/qwen-14b route)	Codex
5	Ray	mistral-small3.2:24b-instruct-2506-q4_K_M	qwen3:14b
6	Mira	mistral-small3.2:24b-instruct-2506-q4_K_M	qwen3:14b
7	Burt (synthesis authority)	Orchestrator seat (no fixed model mandate)	N/A
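
A serialized council run over the seat map above might look like the following sketch. It assumes seats 1 through 6 run strictly one at a time, that a seat switches to its fallback when the primary call raises, and that Seat 7 receives the six outputs concatenated verbatim. `call_seat` is a hypothetical callable, and the local/qwen-14b route is represented here only by the model name.

```python
# (seat, persona, primary model, fallback) for seats 1-6, from the table above.
SEATS = [
    (1, "Magnus", "qwen2.5-coder:32b-instruct-q3_K_L", "Codex"),
    (2, "Vera", "qwen3:14b", "Codex"),
    (3, "Dante", "mistral-small3.2:24b-instruct-2506-q4_K_M", "qwen3:14b"),
    (4, "Eleanor", "qwen3:14b", "Codex"),
    (5, "Ray", "mistral-small3.2:24b-instruct-2506-q4_K_M", "qwen3:14b"),
    (6, "Mira", "mistral-small3.2:24b-instruct-2506-q4_K_M", "qwen3:14b"),
]


def run_council(prompt, call_seat):
    """Run seats 1-6 in order and build the uncompressed input for Seat 7."""
    outputs = []
    for seat, persona, primary, fallback in SEATS:
        try:
            text = call_seat(primary, persona, prompt)
        except Exception:
            # Assumed trigger: any failure of the primary model.
            text = call_seat(fallback, persona, prompt)
        outputs.append((seat, persona, text))
    # Seat 7 (Burt) gets every prior output in full, with no summarization.
    burt_input = "\n\n".join(f"Seat {s} ({p}):\n{t}" for s, p, t in outputs)
    return outputs, burt_input
```

Seat 7 is left out of `SEATS` because it has no fixed model mandate; the orchestrator would pass `burt_input` to whichever model it selects.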