# OpenClaw Model Policy — David / Kompis

Created: 2026-05-01
Owner: David Westman
Purpose: run default work on the new Codex account while preserving high reliability for sensitive, complex, or high-impact work.

---

## 1. Policy summary

Use the cheapest Codex-account model that is safe for the task. Default to Codex 5.3 and escalate to Codex 5.5 when the task meets the escalation criteria in sections 2 and 6.

- **Default everyday work:** `GPT Codex 5.3` / `openai-codex/gpt-5.3-codex`
- **Escalation / highest confidence:** `GPT Codex` / `openai-codex/gpt-5.5`
- **Legacy OpenAI account models:** fallback only, not default
- **Cost-sensitive background work:** lightweight OpenRouter/mini/flash-class model where available, but keep defaults on Codex
- **Escalation rule:** if the default model is uncertain, blocked, or about to make a high-impact change, escalate before acting.

Do not assume that ChatGPT Plus/Pro-style subscriptions reduce OpenClaw API cost. Treat OpenClaw/API usage as separate pay-as-you-go spend.

---

## 2. Model tiers

### Tier A — Premium / highest confidence / escalation

**Preferred alias:** `GPT Codex`  
**Current mapped model:** `openai-codex/gpt-5.5`

Use for:

- coding tasks that change production or deployment behaviour
- security, authentication, secrets, permissions, or infrastructure decisions
- financial/accounting/tax reasoning
- high-risk Outlook/email actions where misclassification would matter
- writing durable rules for MEMORY.md, AGENTS.md, SOUL.md, USER.md, or policy files
- debugging failures after cheaper models have failed
- user-facing final review of important work

Avoid for:

- routine summaries
- simple classification
- recurring cron reports
- large exploratory scans where output can be compressed first

---

### Tier B — Standard / everyday assistant / default

**Preferred alias:** `GPT Codex 5.3`  
**Current mapped model:** `openai-codex/gpt-5.3-codex`

Use for:

- normal chat / everyday questions
- planning
- summaries
- document drafting
- first-pass research
- workspace updates
- non-sensitive file inspection
- explaining results to David
- routine project bookkeeping

Escalate to Tier A when:

- the answer affects money, security, legal/tax handling, credentials, deployment, or external communication
- the model expresses uncertainty about a decision that would cause real-world action
- the task requires careful multi-step code changes

---

### Tier C — Background / low-cost automation

**Model:** cheapest reliable mini/flash/haiku-class model available through OpenRouter or the configured provider. Verify the exact model ID before applying, because availability changes.

Use for:

- cron heartbeat-style summaries
- daily morning summary drafts
- scanning report folders
- detecting whether anything changed
- extracting structured fields from emails/reports
- initial Outlook classification suggestions
- creating short status digests

Rules:

- Use `lightContext: true` where possible.
- Keep prompts short and task-specific.
- Prefer JSON/structured output.
- Do not let Tier C perform irreversible or ambiguous actions by itself.
- If it finds uncertainty, it should report candidates/questions rather than decide.
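
The last two rules can be sketched as a small gate on the Tier C output. This is a minimal illustration, not an OpenClaw API: the JSON field names (`summary`, `confident`, `candidates`, `questions`) and both function names are assumptions made up for this example.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TierCResult:
    """Hypothetical structured result returned by a Tier C job."""
    summary: str
    confident: bool
    candidates: list[str] = field(default_factory=list)
    questions: list[str] = field(default_factory=list)


def parse_tier_c_output(raw: str) -> TierCResult:
    """Parse the model's JSON output; a missing 'confident' counts as uncertain."""
    data = json.loads(raw)
    return TierCResult(
        summary=data["summary"],
        confident=bool(data.get("confident", False)),
        candidates=list(data.get("candidates", [])),
        questions=list(data.get("questions", [])),
    )


def next_step(result: TierCResult) -> str:
    """Return 'report' when the job is uncertain or has open questions, else 'continue'."""
    if not result.confident or result.questions:
        return "report"
    return "continue"
```

Treating a missing `confident` field as uncertain keeps the failure mode on the safe side: a malformed answer leads to a report rather than an action.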

---

## 3. Task routing matrix

| Task type | Default tier | Escalate when |
|---|---:|---|
| Normal Telegram conversation | B | user asks for implementation, legal/tax/security/finance, or complex coding |
| Simple reminders | C | unusual context or important family/business consequence |
| Daily morning summary | C | it includes decisions, ambiguous email interpretation, or external action |
| Weekly WORKSPACE report | B or C | policy/memory changes are proposed |
| Outlook steady-state maintenance | B | uncertain thread matching, new sender category, money/legal/family-sensitive content |
| Outlook final archive moves | B | category/archive destination unclear |
| Outlook deletion/retention/culling | A | always for moves toward Deleted Items or deletion-adjacent work |
| Coding: read-only diagnosis | B | production/secrets/deploy/security involved |
| Coding: edit/test local app | A for non-trivial; B for tiny safe edits | failing tests, auth, payments, deploy, database/data migration |
| Cloudflare/GitHub/deployment | A | always if changing external state |
| Memory maintenance | B | modifying long-term rules or sensitive personal/business memory |
| Finance/accounting/tax/BAS/FAR | A | generally always A for final judgement |
| Web research | B | final recommendation affects money/legal/security |
| Image/music/video generation | lowest suitable model/tool | private/sensitive content or paid large generation |
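
A subset of the matrix could be encoded as a lookup table, sketched below. The task keys and the `route` function are hypothetical names for illustration. Two simplifications are assumptions rather than policy: any triggered "Escalate when" condition routes to Tier A, and unknown task types fall back to the default Tier B.

```python
# Tier-to-model mapping taken from section 2. Tier C has no fixed model ID
# in this policy, so it stays None and must be verified before use.
TIER_MODELS = {
    "A": "openai-codex/gpt-5.5",
    "B": "openai-codex/gpt-5.3-codex",
    "C": None,
}

# Hypothetical task keys covering a subset of the matrix rows.
DEFAULT_TIER = {
    "telegram_chat": "B",
    "simple_reminder": "C",
    "daily_morning_summary": "C",
    "outlook_maintenance": "B",
    "outlook_deletion": "A",
    "deployment": "A",
    "finance_tax": "A",
    "web_research": "B",
}


def route(task_type: str, escalate: bool = False) -> str:
    """Return the tier for a task.

    escalate=True means one of the row's "Escalate when" conditions applies.
    """
    if escalate:
        return "A"  # assumption: every escalation goes straight to Tier A
    return DEFAULT_TIER.get(task_type, "B")  # assumption: unknown tasks use Tier B
```

For example, `route("daily_morning_summary")` yields `"C"`, while the same task with `escalate=True` yields `"A"`.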

---

## 4. Cron policy

### Current cron jobs to adjust

1. **Daglig morgonsammanfattning – privat/familj**
   - Recommended tier: C
   - Timeout: 120 s should be enough, but the last run timed out, so reduce the workload before raising the limit.
   - Changes recommended:
     - shorter prompt
     - `lightContext: true`
     - cheap model override when available
     - skip deep mailbox/calendar scans unless explicitly needed

2. **Outlook inkorg/arkiv — regelstyrt underhåll**
   - Recommended tier: B for normal deterministic rules
   - Escalate/report instead of acting on uncertain cases
   - Use Tier A only for rule changes, deletion-adjacent retention, or complex ambiguity.
   - Keep timeout high enough for Graph operations, but avoid feeding huge email bodies into the model.

3. **Veckorapport — WORKSPACE veckoversion**
   - Recommended tier: B or C
   - Use structured local file reads and short summary generation.
   - Escalate to Tier A only if it proposes policy/memory changes.

4. **Namnsdagspåminnelser**
   - Recommended tier: C, or no model at all if a systemEvent-style reminder is enough.
   - These should be extremely cheap.

---

## 5. Prompt/context cost controls

Default rules:

- Do not include full transcripts or large files unless needed.
- Prefer reading exact files/lines over broad context dumps.
- For recurring jobs, pass only the job-specific prompt and needed paths.
- Use summaries and reports instead of raw email bodies where possible.
- For Outlook jobs, classify from headers/snippets first; fetch full body only when needed.
- Batch similar work into one model call rather than many tiny calls.
- Use deterministic scripts for mechanical work; use models only for judgment.
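
As one illustration of the last rule, "did anything change in this report folder?" can be answered without a model call. The sketch below hashes the folder contents and compares the digest with the previous run; the function names and the state-file approach are assumptions for this example, not part of OpenClaw.

```python
import hashlib
from pathlib import Path


def folder_digest(folder: Path) -> str:
    """Return a SHA-256 digest over the relative paths and contents of all files."""
    h = hashlib.sha256()
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        h.update(path.relative_to(folder).as_posix().encode("utf-8"))
        h.update(b"\0")
        h.update(path.read_bytes())
        h.update(b"\0")
    return h.hexdigest()


def has_changed(folder: Path, state_file: Path) -> bool:
    """Compare the current digest with the stored one, then store the new one.

    state_file must live outside folder, otherwise writing it would itself
    change the digest. The first run always reports a change.
    """
    current = folder_digest(folder)
    previous = state_file.read_text().strip() if state_file.exists() else None
    state_file.write_text(current)
    return current != previous
```

A cron job could call `has_changed` first and skip the model call entirely when it returns `False`.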

---

## 6. Escalation protocol

A cheap model may produce a recommendation, but Tier A (`GPT Codex`) should review before:

- sending external messages
- deleting or moving toward deletion
- changing credentials/secrets/auth
- changing Cloudflare/GitHub/deployment config
- modifying long-term policy/memory files
- making legal/tax/accounting conclusions
- applying broad Outlook rules to many messages

If escalation is not available or would be too expensive, ask David instead.
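
The protocol above can be sketched as a simple gate. The action keys and the `reviewer_for` function are hypothetical names chosen for this example. The `tier_a_available` flag is assumed to cover both cases named in the policy: escalation unavailable or too expensive.

```python
# Hypothetical action keys mirroring the review-required list in this section.
REVIEW_REQUIRED = frozenset({
    "send_external_message",
    "delete_or_move_toward_deletion",
    "change_credentials",
    "change_deployment_config",
    "modify_policy_or_memory",
    "legal_tax_accounting_conclusion",
    "apply_broad_outlook_rule",
})


def reviewer_for(action: str, tier_a_available: bool = True) -> str:
    """Return who must review an action: 'none', 'tier_a', or 'david'."""
    if action not in REVIEW_REQUIRED:
        return "none"
    # When Tier A cannot be used, the policy says to ask David instead.
    return "tier_a" if tier_a_available else "david"
```

For instance, `reviewer_for("send_external_message", tier_a_available=False)` returns `"david"`.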

---

## 7. Monthly spend guardrails

Suggested starting budget:

- OpenRouter/general automation: **USD 25–50/month**
- OpenAI GPT premium use: reserve for important work; review if spend exceeds **USD 25/month**
- If total model spend exceeds **USD 75/month**, audit cron prompts and model routing before increasing limits.
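
The thresholds above could be checked mechanically, as in the sketch below. The function name and flag strings are hypothetical. The sketch assumes that total spend is simply the sum of the two figures passed in and that "exceeds" means strictly greater than.

```python
# Thresholds from the suggested starting budget above (USD per month).
OPENROUTER_BUDGET_MAX_USD = 50.0
PREMIUM_REVIEW_USD = 25.0
TOTAL_AUDIT_USD = 75.0


def spend_flags(openrouter_usd: float, premium_usd: float) -> list[str]:
    """Return the guardrails that the given monthly spend has tripped."""
    flags = []
    if openrouter_usd > OPENROUTER_BUDGET_MAX_USD:
        flags.append("openrouter_over_budget")
    if premium_usd > PREMIUM_REVIEW_USD:
        flags.append("review_premium_spend")
    # Assumption: total spend is the sum of the two categories.
    if openrouter_usd + premium_usd > TOTAL_AUDIT_USD:
        flags.append("audit_cron_and_routing")
    return flags
```

An empty list means no guardrail was tripped for that month.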

Preferred monthly review:

- top expensive jobs
- failed/timed-out jobs
- number of Tier A escalations
- whether any cron job can be converted to deterministic script + short summary

---

## 8. Implementation checklist

Before applying this policy operationally:

1. Verify currently allowed model IDs/aliases in OpenClaw.
2. Keep the default model on the Codex account standard model (`openai-codex/gpt-5.3-codex`) unless David asks otherwise.
3. Add an explicit cheaper-model override to low-risk cron jobs.
4. Keep legacy OpenAI-account models as manual fallback only.
5. Update cron prompts to use shorter context and `lightContext: true` where safe.
6. Create a monthly cost-review reminder/report if David wants it.

---

## 9. Recommended immediate changes

1. Change the daily morning summary cron to a cheap/light model and a shorter prompt.
2. Change the weekly WORKSPACE cron to a cheap or standard model with `lightContext: true`.
3. Keep Outlook maintenance on the standard tier (B), not Tier A, with strict escalation-on-uncertainty.
4. Leave direct chat on the Codex account by default using `openai-codex/gpt-5.3-codex`. Escalate to `openai-codex/gpt-5.5` when task risk or complexity warrants it, and use legacy OpenAI-account models only on explicit request or as a fallback.