# OpenClaw Model Policy — David / Kompis
Created: 2026-05-01
Owner: David Westman
Purpose: keep default work on the new Codex account while keeping high reliability for sensitive, complex, or high-impact work.
---
## 1. Policy summary
Use the cheapest Codex-account model that is safe for the task. Default to Codex 5.3 and escalate to Codex 5.5 when the task genuinely needs it.
- **Default everyday work:** `GPT Codex 5.3` / `openai-codex/gpt-5.3-codex`
- **Escalation / highest confidence:** `GPT Codex` / `openai-codex/gpt-5.5`
- **Legacy OpenAI account models:** fallback only, not default
- **Cost-sensitive background work:** lightweight OpenRouter/mini/flash-class model where available, but keep defaults on Codex
- **Escalation rule:** if the Codex model is uncertain, blocked, or about to make a high-impact change, escalate before acting.
ChatGPT Plus/Pro-style subscriptions should not be assumed to reduce OpenClaw API cost. OpenClaw/API usage should be treated as separate pay-as-you-go spend.
---
## 2. Model tiers
### Tier A — Premium / highest confidence / escalation
**Preferred alias:** `GPT Codex`
**Current mapped model:** `openai-codex/gpt-5.5`
Use for:
- coding tasks that change production or deployment behaviour
- security, authentication, secrets, permissions, or infrastructure decisions
- financial/accounting/tax reasoning
- high-risk Outlook/email actions where misclassification would matter
- writing durable rules for MEMORY.md, AGENTS.md, SOUL.md, USER.md, or policy files
- debugging failures after cheaper models have failed
- user-facing final review of important work
Avoid for:
- routine summaries
- simple classification
- recurring cron reports
- large exploratory scans where output can be compressed first
---
### Tier B — Standard / everyday assistant / default
**Preferred alias:** `GPT Codex 5.3`
**Current mapped model:** `openai-codex/gpt-5.3-codex`
Use for:
- normal chat and everyday questions
- planning
- summaries
- document drafting
- first-pass research
- workspace updates
- non-sensitive file inspection
- explaining results to David
- routine project bookkeeping
Escalate to Tier A when:
- the answer affects money, security, legal/tax handling, credentials, deployment, or external communication
- the model expresses uncertainty about a decision that would cause real-world action
- the task requires careful multi-step code changes
---
### Tier C — Background / low-cost automation
**Model:** the cheapest reliable mini/flash/haiku-class model available through OpenRouter or the configured provider. Verify the exact model ID before applying it, because availability changes.
Use for:
- cron heartbeat-style summaries
- daily morning summary drafts
- scanning report folders
- detecting whether anything changed
- extracting structured fields from emails/reports
- initial Outlook classification suggestions
- creating short status digests
Rules:
- Use `lightContext: true` where possible.
- Keep prompts short and task-specific.
- Prefer JSON/structured output.
- Do not let Tier C perform irreversible or ambiguous actions by itself.
- If it finds uncertainty, it should report candidates/questions rather than decide.
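
The Tier C rules above (structured output, no irreversible or ambiguous actions, report instead of decide) could be enforced by a small gate in front of whatever consumes the model's reply. This is a minimal sketch, not OpenClaw behaviour: the function name and the JSON fields `uncertain`, `irreversible`, and `candidates` are hypothetical and would need to match the actual prompt contract.

```python
import json


def handle_tier_c_output(raw: str) -> dict:
    """Decide whether a Tier C reply may be used as-is or must be reported.

    Assumes (hypothetically) that the prompt asks the model for a JSON object
    that may carry the flags `uncertain` and `irreversible` plus a list of
    `candidates`.
    """
    try:
        result = json.loads(raw)
    except json.JSONDecodeError:
        # Unparseable output is treated as uncertainty, never as a decision.
        return {"status": "report", "reason": "unparseable output", "candidates": []}

    if not isinstance(result, dict):
        return {"status": "report", "reason": "unexpected shape", "candidates": []}

    if result.get("uncertain") or result.get("irreversible"):
        # Tier C reports candidates/questions instead of acting.
        return {
            "status": "report",
            "reason": "uncertain or irreversible",
            "candidates": result.get("candidates", []),
        }

    return {"status": "ok", "result": result}
```

Anything returned with `status == "report"` would go to David or to a higher tier rather than being acted on.
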
---
## 3. Task routing matrix
| Task type | Default tier | Escalate when |
|---|---:|---|
| Normal Telegram conversation | B | user asks for implementation, legal/tax/security/finance, or complex coding |
| Simple reminders | C | unusual context or important family/business consequence |
| Daily morning summary | C | it includes decisions, ambiguous email interpretation, or external action |
| Weekly WORKSPACE report | B or C | policy/memory changes are proposed |
| Outlook steady-state maintenance | B | uncertain thread matching, new sender category, money/legal/family-sensitive content |
| Outlook final archive moves | B | category/archive destination unclear |
| Outlook deletion/retention/culling | A | always for moves toward Deleted Items or deletion-adjacent work |
| Coding: read-only diagnosis | B | production/secrets/deploy/security involved |
| Coding: edit/test local app | A for non-trivial; B for tiny safe edits | failing tests, auth, payments, deploy, database/data migration |
| Cloudflare/GitHub/deployment | A | always if changing external state |
| Memory maintenance | B | modifying long-term rules or sensitive personal/business memory |
| Finance/accounting/tax/BAS/FAR | A | generally always A for final judgement |
| Web research | B | final recommendation affects money/legal/security |
| Image/music/video generation | lowest suitable model/tool | private/sensitive content or paid large generation |
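
The matrix above can be read as a lookup table plus one escalation switch. The sketch below is illustrative only: the task keys, `DEFAULT_TIER`, and `route()` are hypothetical names, not OpenClaw configuration. Only the tier letters and the two Codex model IDs come from this policy. Two assumptions are baked in: an escalated task always goes to Tier A, and an unknown task falls back to Tier B.

```python
from typing import Optional

# Tier -> model ID, from sections 1 and 2. Tier C is deliberately unset:
# the policy requires the exact model to be verified before use.
TIER_MODELS: dict[str, Optional[str]] = {
    "A": "openai-codex/gpt-5.5",
    "B": "openai-codex/gpt-5.3-codex",
    "C": None,
}

# Hypothetical task keys mapped to the default tier from the matrix.
DEFAULT_TIER: dict[str, str] = {
    "telegram_chat": "B",
    "simple_reminder": "C",
    "daily_morning_summary": "C",
    "outlook_maintenance": "B",
    "outlook_archive_move": "B",
    "outlook_deletion": "A",
    "coding_read_only": "B",
    "deployment": "A",
    "memory_maintenance": "B",
    "finance_tax": "A",
    "web_research": "B",
}


def route(task: str, escalate: bool = False) -> str:
    """Return the tier letter for a task.

    `escalate` is True when one of the "Escalate when" conditions applies.
    """
    if escalate:
        return "A"  # assumption: escalation always targets Tier A
    return DEFAULT_TIER.get(task, "B")  # assumption: unknown tasks use Tier B
```

For example, `TIER_MODELS[route("web_research", escalate=True)]` resolves to `openai-codex/gpt-5.5`.
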
---
## 4. Cron policy
### Current cron jobs to adjust
1. **Daglig morgonsammanfattning – privat/familj** (daily morning summary, private/family)
   - Recommended tier: C
   - Timeout: keep 120s. The last run timed out, so apply the changes below to shrink the job.
   - Changes recommended:
     - shorter prompt
     - `lightContext: true`
     - cheap model override when available
     - skip deep mailbox/calendar scans unless explicitly needed
2. **Outlook inkorg/arkiv — regelstyrt underhåll** (Outlook inbox/archive, rule-driven maintenance)
   - Recommended tier: B for normal deterministic rules
   - Escalate or report instead of acting on uncertain cases.
   - Use Tier A only for rule changes, deletion-adjacent retention, or complex ambiguity.
   - Keep the timeout high enough for Graph operations, but avoid feeding huge email bodies into the model.
3. **Veckorapport — WORKSPACE veckoversion** (weekly report, WORKSPACE weekly version)
   - Recommended tier: B or C
   - Use structured local file reads and short summary generation.
   - Escalate to Tier A only if it proposes policy/memory changes.
4. **Namnsdagspåminnelser** (name-day reminders)
   - Recommended tier: C, or no model at all if a systemEvent-style reminder is enough.
   - These should be extremely cheap.
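
The recommendations above can be captured as data and checked mechanically before cron changes are applied. This is a sketch under stated assumptions: the record layout, the job IDs, and `lint_job()` are hypothetical, and only the `lightContext` name and the tier letters come from this policy. The weekly report is shown on Tier C, although the policy allows B or C.

```python
# Hypothetical job records; the real OpenClaw cron schema may differ.
CRON_JOBS = [
    {"id": "daily_morning_summary", "tier": "C", "timeout_s": 120, "lightContext": True},
    {"id": "outlook_maintenance", "tier": "B", "lightContext": False},
    {"id": "weekly_workspace_report", "tier": "C", "lightContext": True},
    {"id": "name_day_reminders", "tier": "C", "lightContext": True},
]


def lint_job(job: dict) -> list[str]:
    """Return policy warnings for a single cron job record."""
    warnings = []
    if job["tier"] == "C" and not job.get("lightContext", False):
        # Section 2: Tier C should use lightContext where possible.
        warnings.append(f"{job['id']}: Tier C job without lightContext")
    if job["tier"] == "A":
        # Section 2: Tier A should be avoided for recurring cron reports.
        warnings.append(f"{job['id']}: recurring job on Tier A")
    return warnings
```
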
---
## 5. Prompt/context cost controls
Default rules:
- Do not include full transcripts or large files unless needed.
- Prefer reading exact files/lines over broad context dumps.
- For recurring jobs, pass only the job-specific prompt and needed paths.
- Use summaries and reports instead of raw email bodies where possible.
- For Outlook jobs, classify from headers/snippets first; fetch full body only when needed.
- Batch similar work into one model call rather than many tiny calls.
- Use deterministic scripts for mechanical work; use models only for judgment.
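
The "headers/snippets first, full body only when needed" rule is a two-stage lookup. In the sketch below, `classify` and `fetch_body` are hypothetical callables supplied by the caller (for example, a model call and a Graph request); nothing here reflects an actual OpenClaw or Graph API. The point is only that the expensive fetch happens lazily.

```python
from typing import Callable, Optional


def classify_message(
    headers: dict,
    snippet: str,
    classify: Callable[[str], Optional[str]],
    fetch_body: Callable[[], str],
) -> Optional[str]:
    """Classify from headers and snippet; fetch the body only if inconclusive.

    `classify` returns a label, or None when it cannot decide.
    """
    cheap_input = (
        f"From: {headers.get('from', '')}\n"
        f"Subject: {headers.get('subject', '')}\n"
        f"{snippet}"
    )
    label = classify(cheap_input)
    if label is not None:
        return label  # decided cheaply; the full body is never fetched

    # Second stage: pay for the full body only now.
    return classify(cheap_input + "\n" + fetch_body())
```

If the second stage also returns `None`, the message is a candidate for reporting or escalation rather than automatic handling.
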
---
## 6. Escalation protocol
A cheaper model may produce a recommendation, but Tier A (`GPT Codex`) should review it before any of the following:
- sending external messages
- deleting or moving toward deletion
- changing credentials/secrets/auth
- changing Cloudflare/GitHub/deployment config
- modifying long-term policy/memory files
- making legal/tax/accounting conclusions
- applying broad Outlook rules to many messages
If escalation is not available or would be too expensive, ask David instead.
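
The protocol above reduces to a three-way decision. A minimal sketch, assuming actions are tagged with one of the hypothetical category strings below (the strings and `next_step()` are illustrative, not existing OpenClaw identifiers):

```python
# Hypothetical action categories mirroring the review list in this section.
REVIEW_REQUIRED = {
    "send_external_message",
    "delete_or_move_toward_deletion",
    "change_credentials",
    "change_deployment_config",
    "modify_policy_or_memory",
    "legal_tax_accounting_conclusion",
    "broad_outlook_rule",
}


def next_step(action: str, tier_a_available: bool) -> str:
    """Return 'proceed', 'tier_a_review', or 'ask_david' for a proposed action.

    `tier_a_available` should be False when Tier A is unavailable
    or judged too expensive for this case.
    """
    if action not in REVIEW_REQUIRED:
        return "proceed"
    return "tier_a_review" if tier_a_available else "ask_david"
```
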
---
## 7. Monthly spend guardrails
Suggested starting budget:
- OpenRouter/general automation: **USD 25–50/month**
- OpenAI GPT premium use: reserve for important work; review if spend exceeds **USD 25/month**
- If total model spend exceeds **USD 75/month**, audit cron prompts and model routing before increasing limits.
Preferred monthly review:
- top expensive jobs
- failed or timed-out jobs
- number of Tier A (`GPT Codex`) escalations
- whether any cron job can be converted to deterministic script + short summary
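
The two review thresholds above (USD 25 for premium use, USD 75 in total) are simple comparisons that a monthly report could run. A minimal sketch, assuming spend figures are already available as plain numbers; `spend_actions()` and its return strings are hypothetical:

```python
PREMIUM_REVIEW_USD = 25.0  # review threshold for premium use
TOTAL_AUDIT_USD = 75.0     # audit threshold for total model spend


def spend_actions(premium_usd: float, total_usd: float) -> list[str]:
    """Return the follow-up actions triggered by this month's spend."""
    actions = []
    if premium_usd > PREMIUM_REVIEW_USD:
        actions.append("review premium usage")
    if total_usd > TOTAL_AUDIT_USD:
        actions.append("audit cron prompts and model routing before raising limits")
    return actions
```

An empty list means no guardrail was hit that month.
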
---
## 8. Implementation checklist
Before applying this policy operationally:
1. Verify currently allowed model IDs/aliases in OpenClaw.
2. Keep the default model on the Codex account standard model (`openai-codex/gpt-5.3-codex`) unless David asks otherwise.
3. Add explicit cheaper model override to low-risk cron jobs.
4. Keep legacy OpenAI-account models as manual fallback only.
5. Update cron prompts to use shorter context and `lightContext: true` where safe.
6. Create a monthly cost-review reminder/report if David wants it.
---
## 9. Recommended immediate changes
1. Change daily morning summary cron to cheap/light model and shorter prompt.
2. Change weekly WORKSPACE cron to cheap or standard model with `lightContext: true`.
3. Keep Outlook maintenance on the standard tier (B), not Tier A, with strict escalation on uncertainty.
4. Leave direct chat on the Codex account by default using `openai-codex/gpt-5.3-codex`; escalate to `openai-codex/gpt-5.5` when task risk/complexity warrants it; use legacy OpenAI-account models only on explicit request or fallback.