Overnight Local Model Benchmark Report
Generated: 2026-02-28T08:31:15
Summary by Model
local/qwen-14b
- Toolcall exec pass rate: 1.0
- Toolcall avg seconds: 3.3
- Skill routing accuracy: 1.0
- Skill routing avg seconds: 3.29
local/qwen2.5-14b
- Toolcall exec pass rate: 1.0
- Toolcall avg seconds: 7.67
- Skill routing accuracy: 1.0
- Skill routing avg seconds: 3.27
local/mistral-small
- Toolcall exec pass rate: 0.0
- Toolcall avg seconds: 3.27
- Skill routing accuracy: 0.0
- Skill routing avg seconds: 3.29
local/qwen-coder-32b
- Toolcall exec pass rate: 0.0
- Toolcall avg seconds: 3.25
- Skill routing accuracy: 0.0
- Skill routing avg seconds: 3.3
qwen3.5:cloud
- Toolcall exec pass rate: 1.0
- Toolcall avg seconds: 3.9
- Skill routing accuracy: 1.0
- Skill routing avg seconds: 3.3
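The summary block above is plain text: each model name sits on its own line and is followed by `- label: value` metric lines. The sketch below shows one way to read that block back into per-model records. It then lists the models with a perfect tool-call pass rate, ordered by tool-call average seconds. It is a minimal sketch under stated assumptions. `parse_summary`, `passing_by_latency`, and the snake_case field keys are hypothetical names and are not part of the benchmark tooling. The metric values are copied from the summary above.

```python
import re
from typing import Dict, List

# Field labels exactly as printed in the "Summary by Model" block,
# mapped to hypothetical snake_case keys.
FIELDS = {
    "Toolcall exec pass rate": "toolcall_pass_rate",
    "Toolcall avg seconds": "toolcall_avg_sec",
    "Skill routing accuracy": "routing_accuracy",
    "Skill routing avg seconds": "routing_avg_sec",
}

METRIC_LINE = re.compile(r"^- (?P<label>[^:]+): (?P<value>[0-9.]+)$")

# Values copied verbatim from the summary above.
SUMMARY = """\
local/qwen-14b
- Toolcall exec pass rate: 1.0
- Toolcall avg seconds: 3.3
- Skill routing accuracy: 1.0
- Skill routing avg seconds: 3.29
local/qwen2.5-14b
- Toolcall exec pass rate: 1.0
- Toolcall avg seconds: 7.67
- Skill routing accuracy: 1.0
- Skill routing avg seconds: 3.27
local/mistral-small
- Toolcall exec pass rate: 0.0
- Toolcall avg seconds: 3.27
- Skill routing accuracy: 0.0
- Skill routing avg seconds: 3.29
local/qwen-coder-32b
- Toolcall exec pass rate: 0.0
- Toolcall avg seconds: 3.25
- Skill routing accuracy: 0.0
- Skill routing avg seconds: 3.3
qwen3.5:cloud
- Toolcall exec pass rate: 1.0
- Toolcall avg seconds: 3.9
- Skill routing accuracy: 1.0
- Skill routing avg seconds: 3.3
"""


def parse_summary(text: str) -> Dict[str, Dict[str, float]]:
    """Parse model-name lines, each followed by '- label: value' metric lines."""
    models: Dict[str, Dict[str, float]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = METRIC_LINE.match(line)
        if match and current is not None:
            key = FIELDS.get(match["label"])
            if key:
                models[current][key] = float(match["value"])
        elif not line.startswith("- "):
            # Any non-metric line opens a new model section.
            current = line
            models[current] = {}
    # Drop sections that collected no metrics (e.g. a stray header line).
    return {name: m for name, m in models.items() if m}


def passing_by_latency(models: Dict[str, Dict[str, float]]) -> List[str]:
    """Models with a tool-call pass rate of 1.0, fastest tool-call average first."""
    passing = [n for n, m in models.items() if m.get("toolcall_pass_rate") == 1.0]
    return sorted(passing, key=lambda n: models[n]["toolcall_avg_sec"])


if __name__ == "__main__":
    print(passing_by_latency(parse_summary(SUMMARY)))
```

Applied to the figures above, this orders the three passing models as local/qwen-14b (3.3 s), qwen3.5:cloud (3.9 s), and local/qwen2.5-14b (7.67 s).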
Best model by skill (empirical)
- apple-notes: qwen3.5:cloud (acc=1.0, avg_sec=3.3)
- apple-reminders: local/qwen2.5-14b (acc=1.0, avg_sec=3.23)
- browser-automation: local/qwen2.5-14b (acc=1.0, avg_sec=3.28)
- github: local/qwen-14b (acc=1.0, avg_sec=3.31)
- openai-whisper: local/qwen-14b (acc=1.0, avg_sec=3.3)
- video-frames: local/qwen2.5-14b (acc=1.0, avg_sec=3.3)
- weather: local/qwen-14b (acc=1.0, avg_sec=3.23)
- youtube-summarizer: local/qwen-14b (acc=1.0, avg_sec=3.3)
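The report does not say how the per-skill winner is chosen. The sketch below assumes one possible approach. It starts from raw per-trial routing records, computes accuracy and average seconds for each (skill, model) pair, and picks the highest accuracy, using the lower average latency to break ties. That selection rule is an assumption. The `Trial` record, `best_model_by_skill`, and the model names `model-a` and `model-b` are hypothetical, and the trial data is illustrative only. It is not taken from this benchmark run.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass
class Trial:
    """One routing attempt (hypothetical record shape)."""
    skill: str
    model: str
    correct: bool
    seconds: float


def best_model_by_skill(trials: List[Trial]) -> Dict[str, Tuple[str, float, float]]:
    """Return {skill: (model, acc, avg_sec)}.

    Assumed rule: highest accuracy wins; ties go to the lower average latency.
    """
    grouped: Dict[Tuple[str, str], List[Trial]] = defaultdict(list)
    for t in trials:
        grouped[(t.skill, t.model)].append(t)

    best: Dict[str, Tuple[str, float, float]] = {}
    for (skill, model), runs in grouped.items():
        acc = mean(1.0 if r.correct else 0.0 for r in runs)
        avg_sec = round(mean(r.seconds for r in runs), 2)
        current = best.get(skill)
        # Compare on (-accuracy, latency): higher accuracy first, then faster.
        if current is None or (-acc, avg_sec) < (-current[1], current[2]):
            best[skill] = (model, acc, avg_sec)
    return best


# Illustrative trials only; not results from this report.
TRIALS = [
    Trial("weather", "model-a", True, 3.2),
    Trial("weather", "model-a", True, 3.4),
    Trial("weather", "model-b", True, 3.1),
    Trial("weather", "model-b", False, 3.0),
    Trial("github", "model-a", True, 3.5),
    Trial("github", "model-b", True, 3.3),
]

if __name__ == "__main__":
    for skill, (model, acc, avg_sec) in sorted(best_model_by_skill(TRIALS).items()):
        print(f"- {skill}: {model} (acc={acc}, avg_sec={avg_sec})")
```

With the illustrative trials, `weather` goes to model-a, because model-b's accuracy is only 0.5 despite its lower latency. `github` goes to model-b, because both models are at 1.0 accuracy and model-b is faster. The printed lines follow the same `skill: model (acc=…, avg_sec=…)` layout as the list above.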