Weekly Workflow SLO Digest — ending 2026-02-26

Weekly Summary

  • Days with jobs: 1
  • Jobs total: 24
  • Avg success rate (over days with jobs): 95.83%
  • Avg p95 wait (over days with jobs): 232.44s

Recurring Failure Sources

  • error: 1

Top Model Pressure (by day presence)

  • local/qwen-14b: top-loaded model on 1 day

Recommendations

  • Raise queue reliability: inspect jobs that ended in a non-complete status and tighten retry/timeout settings for failing workflows.
  • Reduce queue congestion: reserve high/urgent priority for production-critical jobs and defer experimental workloads.
  • Add targeted runbooks for timeout/error-heavy workflows and verify fallback reason codes.

Daily Breakdown

  • 2026-02-20: jobs=0, success=n/a, p95=n/a
  • 2026-02-21: jobs=0, success=n/a, p95=n/a
  • 2026-02-22: jobs=0, success=n/a, p95=n/a
  • 2026-02-23: jobs=0, success=n/a, p95=n/a
  • 2026-02-24: jobs=0, success=n/a, p95=n/a
  • 2026-02-25: jobs=0, success=n/a, p95=n/a
  • 2026-02-26: jobs=24, success=95.83%, p95=232.44s
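
The weekly figures in the summary are consistent with averaging only over days that ran jobs: if all seven days were averaged, the success rate would come out near 13.7%, not 95.83%. The sketch below reconstructs that aggregation from the daily rows. It is an assumption about how the digest is computed, not the generator's actual code; the names DayStats and weekly_summary are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DayStats:
    # Hypothetical record mirroring one "Daily Breakdown" row.
    date: str
    jobs: int
    success_pct: float  # success rate for the day, in percent
    p95_wait_s: float   # 95th-percentile queue wait for the day, in seconds


def weekly_summary(days):
    # Only days that actually ran jobs contribute to the averages;
    # a day with zero jobs has no defined success rate or p95.
    active = [d for d in days if d.jobs > 0]
    n = len(active)
    return {
        "days_with_jobs": n,
        "jobs_total": sum(d.jobs for d in days),
        # Unweighted mean of the daily values, rounded to 2 decimals;
        # None when no day in the window ran any jobs.
        "avg_success_pct": round(sum(d.success_pct for d in active) / n, 2) if n else None,
        "avg_p95_wait_s": round(sum(d.p95_wait_s for d in active) / n, 2) if n else None,
    }


# The seven days from the breakdown: six empty days, then 2026-02-26.
week = [DayStats(f"2026-02-{day}", 0, 0.0, 0.0) for day in range(20, 26)]
week.append(DayStats("2026-02-26", 24, 95.83, 232.44))
print(weekly_summary(week))
```

With only one active day, each average equals that day's value, which matches the summary. Note that an unweighted mean of daily p95 values is a convenience figure rather than a true weekly p95; once several days have jobs, computing the percentile over all job wait times in the window would be more accurate.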