ollama-agent-queue — Eval Rubric

Score each 0-2 (max 10):

Serialization correctness
- 0: parallel execution observed
- 1: mostly serialized with race edge cases
- 2: strictly one-at-a-time
Queue integrity
- 0: state corruption occurs
- 1: occasional malformed/partial state
- 2: consistent valid queue.json updates
Failure isolation
- 0: one failure blocks queue
- 1: queue sometimes halts after failure
- 2: failure logged + queue continues
Integration clarity
- 0: other skills unclear how to call
- 1: partial contract docs
- 2: contract + commands + callback schema clear
Ops diagnostics
- 0: no usable status/control
- 1: partial controls
- 2: status/pause/resume/clear all function

Passing threshold: 8/10.