WORKFLOW_V2_SPEC
Status: Active (Draft v1)
Owner: Burt Prime
Date: 2026-02-26
Purpose
Operational standard for CivicOS workflow architecture: reliability-first, queue-governed, and promotion-gated.
Scope
Applies to all automated workflows in this workspace, including cron jobs, queue tasks, model routing, and fallback behavior.
1) Lane Architecture (Mandatory)
Lane A: prod-critical
Use for donor, grants, comms, monitoring, and decision support that affects daily operations.
- Priority: high or urgent
- Must have fallback behavior
- Must emit structured logs
- Must have rollback path
Lane B: experimental
Use for pilots (e.g., llama.cpp/HF Qwen3.5), tests, and architecture experiments.
- Priority: normal unless explicitly promoted
- Must never block Lane A
- Must include experiment tag + success criteria
Rule: If congestion occurs, experimental tasks are deferred first.
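The deferral rule can be illustrated with a minimal sketch. It assumes a hypothetical in-memory task list and a fixed number of free slots; the rank tables and the schedule function are illustrative names, not part of the actual queue manager.

```python
from typing import NamedTuple

# Hypothetical rank tables; lower values are scheduled first.
LANE_RANK = {"prod-critical": 0, "experimental": 1}
PRIORITY_RANK = {"urgent": 0, "high": 1, "normal": 2}


class Task(NamedTuple):
    name: str
    lane: str
    priority: str


def schedule(tasks, capacity):
    """Split tasks into (run_now, deferred) given `capacity` free slots.

    Sorting by lane first means experimental tasks are always the first
    to be deferred under congestion. sorted() is stable, so tasks with
    equal rank keep their submission order.
    """
    ordered = sorted(
        tasks, key=lambda t: (LANE_RANK[t.lane], PRIORITY_RANK[t.priority])
    )
    return ordered[:capacity], ordered[capacity:]


tasks = [
    Task("qwen-pilot", "experimental", "normal"),
    Task("donor-report", "prod-critical", "high"),
    Task("monitor-alert", "prod-critical", "urgent"),
]
run_now, deferred = schedule(tasks, capacity=2)
```

With two free slots, both prod-critical tasks run and the experimental pilot is deferred.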
2) Queue Contract (Mandatory)
All local model work must route through:
- skills/ollama-agent-queue/scripts/integration_helper.py, or
- skills/ollama-agent-queue/scripts/queue_manager.py
Required queue payload fields:
- agent_id
- calling_skill
- model
- priority
- system_prompt
- user_prompt
- max_tokens
- callback
Prohibited:
- Direct ollama run from non-queue scripts
- Direct http://localhost:11434/api/generate outside queue manager
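A caller could check the required payload fields before submitting work. The sketch below only checks that each key is present; this spec does not define value types, so every value in the example payload (including the callback path) is a hypothetical placeholder, and missing_fields is an illustrative helper rather than a function of the queue scripts.

```python
# Field names are taken from the required-fields list in this section.
REQUIRED_FIELDS = (
    "agent_id",
    "calling_skill",
    "model",
    "priority",
    "system_prompt",
    "user_prompt",
    "max_tokens",
    "callback",
)


def missing_fields(payload: dict) -> list:
    """Return the required field names absent from `payload`, in spec order."""
    return [name for name in REQUIRED_FIELDS if name not in payload]


# Hypothetical example values, for illustration only.
example_payload = {
    "agent_id": "grants-agent",
    "calling_skill": "grant-summary",
    "model": "local/qwen-14b",
    "priority": "high",
    "system_prompt": "You summarize grant opportunities.",
    "user_prompt": "Summarize the attached notice in five bullets.",
    "max_tokens": 512,
    "callback": "outputs/grant-summary/latest.json",
}

problems = missing_fields(example_payload)
```

Rejecting incomplete payloads at the call site keeps malformed tasks out of the queue.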
3) Routing Standard
Default routing
- Local default generalist: local/qwen-14b (currently mapped to qwen3:14b)
- Local fallback: local/mistral-small
- Coding specialist: local/qwen-coder-32b
- Legacy route: local/qwen2.5-14b
Cloud usage
Cloud model use is allowed only when:
- Task class requires premium lane quality/speed, or
- Local lane fails quality/reliability gate, or
- Director explicitly requests.
All cloud usage should be logged with task class and reason.
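One way to apply the cloud-usage conditions is a small decision function that returns the reason for going to cloud, or nothing if the task stays local. This is a sketch under assumptions: the three boolean flags, the reason strings, and the log record shape are all hypothetical, not an existing routing API.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("routing")


def cloud_reason(requires_premium: bool, local_gate_failed: bool,
                 director_requested: bool) -> Optional[str]:
    """Return the first matching reason for cloud use, or None to stay local."""
    if requires_premium:
        return "task_class_requires_premium"
    if local_gate_failed:
        return "local_gate_failed"
    if director_requested:
        return "director_request"
    return None


def choose_route(task_class: str, requires_premium: bool = False,
                 local_gate_failed: bool = False,
                 director_requested: bool = False) -> str:
    """Pick 'local' or 'cloud'; every cloud decision is logged with its reason."""
    reason = cloud_reason(requires_premium, local_gate_failed, director_requested)
    if reason is None:
        return "local"
    # Structured (JSON) record carrying the task class and the reason.
    logger.info(json.dumps(
        {"event": "cloud_usage", "task_class": task_class, "reason": reason}
    ))
    return "cloud"
```

Logging inside the decision function means a cloud call cannot be routed without its reason being recorded.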
4) Reliability Controls
Each workflow must define:
- Idempotency strategy (how duplicate runs are handled)
- Retry policy (count + backoff)
- Timeout policy
- Fallback path
- Output location and latest pointer
- Knowledge write artifact using templates/WORKFLOW_KB_WRITE_TEMPLATE.md for production-impacting changes
Minimum expected behavior:
- No silent failure
- No infinite retries
- No unbounded queue growth
- KB discipline checks pass (scripts/workflow_kb_enforcer.py) for active prod workflows
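The retry and fallback requirements can be sketched as a bounded retry loop with exponential backoff. The function name, the default attempt count, and the base delay are assumptions for illustration; timeouts are left to the callables themselves and are not handled here.

```python
import logging
import time

logger = logging.getLogger("workflow")


def run_with_retry(primary, fallback, max_attempts=3, base_delay=1.0,
                   sleep=time.sleep):
    """Call `primary` up to `max_attempts` times, then call `fallback` once.

    Every failure is logged (no silent failure) and the attempt count is
    bounded (no infinite retries). The delay doubles after each failed
    attempt: base_delay, 2 * base_delay, 4 * base_delay, ...
    If `fallback` also raises, that exception propagates to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return primary()
        except Exception as exc:
            logger.warning("attempt %d/%d failed: %r", attempt, max_attempts, exc)
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    logger.error("primary failed %d times; using fallback", max_attempts)
    return fallback()
```

The sleep function is passed as a parameter so the backoff schedule can be exercised in tests without real waiting.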
5) Promotion Gate (Model/Workflow Changes)
Before promoting a new model/server to default:
Benchmark battery (required)
- Minimum of 10 tasks, spanning:
  - short response
  - medium synthesis
  - long planning
  - workflow-specific outputs (grants/comms/ops)
Pass criteria
- Reliability: no regression in completion rate
- Latency: >=20% improvement OR justified parity with quality gain
- Quality: >=15% judged improvement on approved rubric
- Cost/quota: within approved envelope
- Rollback: tested and documented
If any criterion fails -> remain experimental.
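The gate can be expressed as a checklist function that names every failed criterion. This is a minimal sketch: the BenchmarkResult fields are hypothetical, improvements are expressed as fractions (0.20 means 20%), and "justified parity" is modeled as a reviewer-set flag because this spec does not define how the justification is recorded.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    task_count: int
    baseline_completion_rate: float
    candidate_completion_rate: float
    latency_improvement: float       # fraction; 0.20 means 20% faster
    latency_parity_justified: bool   # assumption: set by a human reviewer
    quality_improvement: float       # fraction, judged on the approved rubric
    within_cost_envelope: bool
    rollback_tested: bool


def failed_gates(r: BenchmarkResult) -> list:
    """Return the names of all criteria the candidate fails."""
    failures = []
    if r.task_count < 10:
        failures.append("benchmark_battery")
    if r.candidate_completion_rate < r.baseline_completion_rate:
        failures.append("reliability")
    if not (r.latency_improvement >= 0.20 or r.latency_parity_justified):
        failures.append("latency")
    if r.quality_improvement < 0.15:
        failures.append("quality")
    if not r.within_cost_envelope:
        failures.append("cost_quota")
    if not r.rollback_tested:
        failures.append("rollback")
    return failures


def decision(r: BenchmarkResult) -> str:
    """Any failed criterion keeps the candidate in the experimental lane."""
    return "promote" if not failed_gates(r) else "remain experimental"
```

Returning the full list of failures, rather than stopping at the first one, gives the promotion record a complete picture of what must change before a retry.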
6) Observability & SLOs
Track daily:
- Queue depth and wait time (p50/p95)
- Workflow success/failure by job type
- Fallback rate (local->API/cloud)
- Model latency and tokens/sec
- Cron health and missed executions
Target SLOs:
- Prod workflow success >= 98%
- Prod queue p95 wait < 30s
- Urgent task start time < 10s when queue healthy
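A daily check against these targets might look like the sketch below. It uses a nearest-rank percentile and assumes the inputs are plain lists of seconds collected while the queue was healthy; the function names and report keys are hypothetical, and treating the urgent target as a bound on every urgent task (rather than on a percentile) is an interpretation, not something this spec states.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def slo_report(successes, total, prod_wait_seconds, urgent_start_seconds):
    """Return a pass/fail flag for each target SLO."""
    return {
        "prod_success": successes / total >= 0.98,
        "prod_queue_p95_wait": percentile(prod_wait_seconds, 95) < 30,
        # Assumption: the 10s target applies to the slowest urgent start.
        "urgent_start": max(urgent_start_seconds) < 10,
    }


report = slo_report(
    successes=99,
    total=100,
    prod_wait_seconds=list(range(1, 21)),  # 1s..20s, illustrative data
    urgent_start_seconds=[2.0, 4.5],
)
```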
7) Security & Governance
- Treat auth boundary findings as hardening tasks, not loopholes.
- No secrets in prompts/logs.
- External messaging must not disclose exploit paths.
- Irreversible external actions still require explicit human approval.
8) Runbook: Pause / Resume / Rollback
Pause queue
python3 skills/ollama-agent-queue/scripts/queue_manager.py pause
Resume queue
python3 skills/ollama-agent-queue/scripts/queue_manager.py resume
Status
python3 skills/ollama-agent-queue/scripts/queue_manager.py status-block
Rollback routing
- Revert queue manager + model matrix commit
- Validate with smoke test (local/qwen-14b route)
9) Immediate Implementation Plan
Phase 1 (48h)
- Enforce lane tagging in scripts
- Keep prod priorities high/urgent
- Verify queue pause/resume and alert flow
Phase 2 (7d)
- Add unified daily SLO report
- Add explicit fallback reason logging
- Harden top 5 workflows for idempotency/retry
Phase 3 (14–21d)
- Pilot non-Ollama local server in experimental lane only
- Run promotion benchmark battery
- Promote only if gates pass