# Token Saver — K2.5 Primary Mode
**Primary Model:** Kimi K2.5 (`moonshot/kimi-k2.5`) — used for ALL tasks by default.

**Fallback Strategy:**
- Kimi K2.5 — Always try first (256K context, strong at everything)
- Other APIs — Only if K2.5 can’t handle (context >256K, quota exhausted)
- Local models — Last resort when all APIs unavailable
## Philosophy
- **K2.5 First**: This is the designated primary model. Use it unless there's a specific reason not to.
- **Minimal Fallback**: Only switch models when K2.5 genuinely can't handle the task.
- **Local Last**: Free local models are a contingency only, not the primary strategy.
## Quick Start
```bash
# Select model (will return K2.5 unless context/quotas exceeded)
token-saver select "Summarize this email"
token-saver select "Debug this code" --tokens 5000

# Long-context task (may trigger fallback)
token-saver select "Analyze this 500-page document" --tokens 300000
```
## How It Works
### Selection Priority
```
SCORING:
- kimi-k2.5:    +5000  (dominant priority)
- other APIs:   +100   (only if K2.5 fails)
- local models: +50    (last resort)
```
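The additive scoring can be sketched as a small shell function. This is an illustrative sketch only; `score_for` and the hard-coded model list are assumptions, not token-saver's actual internals:

```shell
#!/bin/sh
# Hypothetical sketch of the scoring tiers above; not token-saver's real code.
score_for() {
  case "$1" in
    kimi-k2.5)                             echo 5000 ;;  # dominant priority
    claude-3-sonnet|gpt-4o|gemini-1.5-pro) echo 100  ;;  # API fallbacks
    *)                                     echo 50   ;;  # local last resort
  esac
}

best=""; best_score=-1
for model in kimi-k2.5 gpt-4o qwen3:14b; do
  s=$(score_for "$model")
  [ "$s" -gt "$best_score" ] && { best=$model; best_score=$s; }
done
echo "$best"   # highest score wins
```

Because K2.5's score dwarfs the others, it wins whenever it is in the candidate set — which is exactly the "K2.5 First" behavior described above.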
### When Fallback Occurs
K2.5 is bypassed only when:
- Context exceeded: Task needs >256K tokens
- Quota exhausted: Daily/monthly token limit hit
- Provider down: Moonshot API unavailable
- Specific gap: Task requires capability K2.5 lacks (rare)
### Fallback Order
If K2.5 can’t handle:
1. Claude 3.5 Sonnet — Best all-rounder, 200K context
2. GPT-4o — Fast, good at code/vision
3. Gemini 1.5 Pro — 2M context for ultra-long docs
4. Local models — Only if all APIs fail
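Assuming the context limits from the registry tables below, the context-driven part of this walk can be sketched as picking the first model whose window fits the task (`pick_model` is illustrative; real selection also checks quotas and provider availability):

```shell
#!/bin/sh
# Illustrative sketch: pick the first model whose context window fits.
pick_model() {
  tokens=$1
  for entry in kimi-k2.5:256000 claude-3-sonnet:200000 gpt-4o:128000 gemini-1.5-pro:2000000; do
    model=${entry%%:*}   # name before the colon
    limit=${entry##*:}   # context limit after the colon
    if [ "$tokens" -le "$limit" ]; then
      echo "$model"
      return
    fi
  done
  echo "local"   # no API window fits; fall back to local models
}

pick_model 5000     # fits K2.5's 256K window
pick_model 300000   # exceeds 256K, 200K, and 128K; fits Gemini's 2M
```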
## Commands
### Select Model
```bash
token-saver select "<task>" [--tokens <n>] [--background]
```
Flags:

- `--tokens <n>`: Estimated token count for context-window checking
- `--background`: Allow slow models (Qwen 2.5 Coder 32B) for background tasks
K2.5 selected (normal case):

```
🎯 Selected: kimi-k2.5
Provider: moonshot
Why: Primary model (K2.5); Strong at code; Can handle MODERATE complexity
```

Fallback triggered (rare):

```
🎯 Selected: gemini-1.5-pro
Provider: google
Why: Fallback - Gemini 1.5 Pro; K2.5 context limit exceeded (256,000 tokens)
```
### Check Available Models

```bash
token-saver models
```

Shows which models are available and why K2.5 would or wouldn't be selected.
## Model Registry
### Primary

| Model | Context | Strengths | Fallback Trigger |
|-------|---------|-----------|------------------|
| kimi-k2.5 | 256K | Everything | Context >256K, quota exhausted |
### Fallback APIs (use only if K2.5 fails)

| Model | Context | Best For |
|-------|---------|----------|
| claude-3-sonnet | 200K | Complex analysis, code |
| gpt-4o | 128K | Speed, vision tasks |
| gemini-1.5-pro | 2M | Ultra-long documents |
| claude-3-opus | 200K | Hardest reasoning tasks |
### Emergency Fallback (local)

| Model | Use When | Speed |
|-------|----------|-------|
| qwen3:14b | All APIs down | fast |
| mistral-small3.2:24b | All APIs down | fast |
| qwen2.5-coder:32b | Off-peak hours only | very slow |
⚠️ **Qwen 2.5 Coder 32B**: This 32B-parameter model runs very slowly on a MacBook. It is only available during off-peak hours (10 PM - 8 AM), when more system resources are free. Use the `--background` flag to allow selection during these hours.
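The off-peak gate can be sketched as a small hour check. The helper name is illustrative; the boundaries (22 and 8) mirror the 10 PM - 8 AM window above, and local time is assumed:

```shell
#!/bin/sh
# Illustrative check for the 10 PM - 8 AM off-peak window (hour in 0-23).
is_off_peak() {
  [ "$1" -ge 22 ] || [ "$1" -lt 8 ]
}

hour=$(date +%H)
hour=${hour#0}   # strip a leading zero so 08/09 aren't misparsed
if is_off_peak "$hour"; then
  echo "off-peak: qwen2.5-coder:32b eligible with --background"
else
  echo "peak hours: qwen2.5-coder:32b unavailable"
fi
```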
## Configuration
Stored in `~/.openclaw/token-saver/config.json`:
```json
{
  "primary_model": "kimi-k2.5",
  "fallback_order": ["claude-3-sonnet", "gpt-4o", "gemini-1.5-pro"],
  "local_fallback": true
}
```
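Individual fields can be read with `jq`. This sketch writes a temp copy of the config so it runs anywhere; in practice, point at `~/.openclaw/token-saver/config.json`:

```shell
#!/bin/sh
# Sketch: read config fields with jq (against a temp copy for portability).
CONFIG=$(mktemp)
cat > "$CONFIG" <<'EOF'
{
  "primary_model": "kimi-k2.5",
  "fallback_order": ["claude-3-sonnet", "gpt-4o", "gemini-1.5-pro"],
  "local_fallback": true
}
EOF

PRIMARY=$(jq -r '.primary_model' "$CONFIG")
FIRST_FALLBACK=$(jq -r '.fallback_order[0]' "$CONFIG")
echo "$PRIMARY"
echo "$FIRST_FALLBACK"
rm -f "$CONFIG"
```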
## Integration
When using with `token-tracker`:
```bash
# 1. Select (returns K2.5 99% of the time)
MODEL=$(token-saver select "$TASK" --json | jq -r .selected_model)
# → "kimi-k2.5"

# 2. Execute the task with $MODEL...

# 3. Log usage (provider, model, input tokens, output tokens)
token-tracker log moonshot kimi-k2.5 2000 800
```
## Why K2.5 as Primary?
- 256K context: Handles most documents
- Fast: Good throughput
- Capable: Strong at code, analysis, chat, research
- Cost-effective: $0.002/1K input, $0.008/1K output
- Reliable: 91% reliability score
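At the listed rates, a call's cost is easy to estimate. The worked example below uses the 2,000-input/800-output figures from the `token-tracker log` line in the Integration section:

```shell
#!/bin/sh
# Worked example at the listed rates: $0.002/1K input, $0.008/1K output.
awk 'BEGIN {
  in_tok  = 2000
  out_tok = 800
  cost = in_tok / 1000 * 0.002 + out_tok / 1000 * 0.008
  printf "$%.4f\n", cost
}'
# → $0.0104
```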
## When NOT to Use This Mode
Use the original local-first token-saver if:
- Budget is extremely constrained (local = $0)
- Running on battery/low power (local = no network)
- Privacy-critical (data can't leave the machine)
## Comparison: K2.5 Primary vs Local First
| Scenario | K2.5 Primary | Local First |
|---|---|---|
| Simple Q&A | K2.5 | Qwen (local, free) |
| Code review | K2.5 | Qwen/Mistral (local, free) |
| 200K doc analysis | K2.5 | GPT-4o (API, needed for context) |
| Cost/month | ~$10-30 | ~$0-5 |
| Consistency | High (always K2.5) | Variable (depends on task) |