An operating framework for AI coding agents, refined over ~12 months of private use in an enterprise-sales workflow and ~6 weeks of public iteration.
Copy AGENT_FRAMEWORK.md to your project root as CLAUDE.md. Fill in Section 0 with your project specifics. Start a session.
`cp AGENT_FRAMEWORK.md /path/to/your/project/CLAUDE.md`

See guides/getting-started.md for the full adoption path.
You've set up CLAUDE.md. You've built a few skills. You're using Projects Memory. But outputs are still inconsistent, the agent ignores rules under pressure, and you're manually reviewing everything.
This framework is the next step. It adds rules with documented enforcement contracts (some advisory by design), circuit breakers (stop after 3 failures), and an escalation model (advice → law → barriers) that makes your CLAUDE.md actually stick. See the rule-to-hook coverage matrix for what is system-enforced versus advisory in v1.5 — five of six rules ship with hooks; one (no-local-infrastructure) is a decision framework that is advisory by design.
If you're just getting started with Claude Code, read the beginner guides first. If you've hit the wall where your CLAUDE.md "stops working," start here.
| I am… | Start here |
|---|---|
| New to Claude Code | guides/getting-started.md |
| CLAUDE.md "stopped working" | guides/from-beginner-to-framework.md |
| Want copy-paste rules | examples/claude-code-rules/ |
| Want hooks | examples/hooks/ |
| Want the full framework file | AGENT_FRAMEWORK.md |
| Curious what failures produced this | INCIDENTS.md |
A behavioral operating system for Claude Code that combines:
- Project identity — role, output contracts, quality criteria, session lifecycle
- Evidence-first culture — four gates: read before touching, first-time check, evidence card, no guessing
- Circuit breakers — three-failure stop, scope discipline, delivery protocol
- Quality gates — verification before done, post-delivery checklist, HTML token hygiene
- Enforcement architecture — memory (advice) → rules (law) → hooks (barriers)
- Self-improvement — capture lessons, escalate failures, consolidate when rules accumulate
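To make the three-failure stop from the list above concrete, here is a minimal sketch of a per-task failure counter. Only the limit of 3 comes from this README; the class name, the exception, and the reset-on-success behavior are assumptions for illustration, not the framework's actual implementation.

```python
from dataclasses import dataclass, field


class CircuitOpen(Exception):
    """Raised when a task reaches the failure limit and work must stop."""


@dataclass
class ThreeFailureBreaker:
    # The limit of 3 is from the README; the rest of this class is hypothetical.
    limit: int = 3
    failures: dict = field(default_factory=dict)

    def record_failure(self, task: str) -> int:
        """Count one failed attempt; raise once the task hits the limit."""
        count = self.failures.get(task, 0) + 1
        self.failures[task] = count
        if count >= self.limit:
            raise CircuitOpen(
                f"{task}: {count} failed attempts; stop and research before retrying"
            )
        return count

    def record_success(self, task: str) -> None:
        """A success clears the counter for that task (an assumed policy)."""
        self.failures.pop(task, None)
```

The point of raising an exception rather than returning a flag is that the caller cannot quietly try a fourth variation: the stop has to be handled explicitly.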
Claude Code note: The hook implementations in examples/hooks/ use Claude Code's PreToolUse/PostToolUse lifecycle. Rule prose is portable to any agent platform.
Every rule exists because its absence caused a specific, documented failure. See INCIDENTS.md for the log.
- AGENT_FRAMEWORK.md — The complete framework (v1.5). Use as your project's CLAUDE.md.
- From Beginner to Framework — You've built CLAUDE.md and skills but outputs are inconsistent. Here's why and what to do next.
- Getting Started — How to adopt the framework, where files go, recommended path
- Enforcement Architecture — Memory → Rules → Hooks escalation model with examples
- Rule Consolidation — When rules accumulate past ~20, how to cluster by root cause and compress without losing lessons
- Why Post-Failure Frameworks Win — Why rules born from incidents beat rules born from best-practice lists
- Auto-Optimizing Skills — Eval-driven skill improvement: triage, iteration loop, 7 guardrails, data model
Individual rule files for ~/.claude/rules/ or .claude/rules/. Each absorbs multiple earlier rules into a single file with sub-gates:
| Rule | What It Prevents |
|---|---|
| read-before-acting.md | Guessing instead of reading — 5 gates + three-failure stop |
| scope-discipline.md | Over-engineering, unapproved dependencies, building what already exists, remediating dormant code |
| session-lifecycle.md | Cold starts, plan-mode violations, sessions that end without auditing delivery |
| delivery-protocol.md | Scattered deliverables, skipped checklists, token-wasteful HTML iterations |
| no-local-infrastructure.md | Persistent agents on the user's laptop instead of cloud-hosted solutions |
| secure-configuration.md | Config file overwrites, secrets in chat, wrong credentials on wrong system |
Shell scripts that enforce rules at the tool-call level — the third tier of the enforcement ladder. Copy to your hooks directory and configure in settings.json:
| Hook | Type | What It Enforces |
|---|---|---|
| read-gate.sh | PreToolUse hard block | Blocks writes unless the target resource was read first |
| search-gate.sh | PreToolUse hard block | Blocks code creation unless a search was done first |
| secure-config-gate.sh | PreToolUse hard block | Blocks secret patterns in any tool call + Write to protected config paths |
| dormant-code-gate.sh | CI lint hard block | Rejects PRs that modify files whose every extracted symbol has zero callers elsewhere (scope-discipline Gate 5) |
| delivery-gate.sh | PreToolUse advisory | Reminds agent to log deliverables (fail-open) |
| focus-breadcrumb.sh | UserPromptSubmit | Writes a session breadcrumb when an explicit task is detected (companion to focus-confirmation-gate) |
| focus-confirmation-gate.sh | PreToolUse advisory | Warns when first Edit/Write/Bash fires with no focus breadcrumb (session-lifecycle Phase 1) |
| deprecated-field-gate.sh | PreToolUse hard block | Template for blocking writes that reference deprecated DB columns or API fields |
| empty-rule-body-gate.sh | CI meta-hook hard block | Pre-merge gate rejecting rule files < 200 bytes or missing ## Why (closes the empty-stub loophole) |
See §5.3 Rule-to-Hook Coverage for which rule each hook backs and the honest enforced-vs-advisory accounting.
See examples/hooks/README.md for setup instructions and the breadcrumb pattern.
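The shipped hooks are shell scripts, but the decision a gate like read-gate.sh makes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the repository's code: it assumes the hook receives the tool call as a JSON object with `tool_name` and `tool_input.file_path` fields, that exit code 2 signals a block, and that the set of already-read paths comes from a breadcrumb store (here a plain set passed in by the caller). The tool names in `WRITE_TOOLS` are likewise assumed.

```python
# Assumed names of tools that modify files.
WRITE_TOOLS = {"Write", "Edit"}

ALLOW = 0  # let the tool call proceed
BLOCK = 2  # assumed exit code for a hard block


def decide(event: dict, read_paths: set) -> tuple:
    """Return (exit_code, message) for one tool-call event.

    event      -- parsed hook input (shape assumed, see lead-in)
    read_paths -- paths the agent has already read this session
    """
    tool = event.get("tool_name", "")
    path = event.get("tool_input", {}).get("file_path", "")
    if tool not in WRITE_TOOLS or not path:
        return ALLOW, ""  # not a file write; nothing to gate
    if path in read_paths:
        return ALLOW, ""  # target was read first
    return BLOCK, f"read-gate: read {path} before writing to it"
```

A wrapper script would parse the event from stdin, load the breadcrumbs, print the message to stderr, and exit with the returned code. This sketch treats every unread path as blocked, so a real gate would also need a policy for files that do not exist yet.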
- INCIDENTS.md — 33 sanitized incidents linking real failures to the rules they produced. Month-precision dates.
Most agent failures come from the same root cause: acting without reading. The agent guesses a column name instead of checking the schema. It deploys with assumed config instead of reading the setup guide. It tries a fourth variation of a broken approach instead of stopping to research.
Prompt instructions are not enforcement. They are guidance. The only durable approach is an escalation ladder:
Memory (advice) → Rules (law) → Hooks (barriers)
Prose tells the model what it should do. Gates determine what it is allowed to do.
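The ladder can be modeled as an ordered enum, which makes the distinction between guidance and enforcement explicit. This is a sketch only: the tier names mirror the ladder above, while `escalate` and `can_block` are hypothetical helpers, and the trigger for escalation (for example, a lesson being violated again at its current tier) is an assumption left to the caller.

```python
from enum import IntEnum


class Tier(IntEnum):
    MEMORY = 1  # advice
    RULE = 2    # law
    HOOK = 3    # barrier


def escalate(current: Tier) -> Tier:
    """Move a lesson one rung up the ladder, stopping at HOOK."""
    return Tier(min(current + 1, Tier.HOOK))


def can_block(tier: Tier) -> bool:
    """Only hooks can stop a tool call; memory and rules are prose."""
    return tier is Tier.HOOK
```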
This framework's core principle: one read is worth ten guesses.
This framework grades itself. An eval harness in examples/evals/ walks session handoffs from the author's own workflow and scores each one on four deterministic metrics — rule adherence, plan-delivery gap, cost, and dispatch quality — then publishes the trend to a live dashboard:
Dashboard: aof-eval.vercel.app
The harness is the framework's own credibility test: if the rules work, the scores hold. If a regression slips into v1.6, the trend line moves before anyone writes a postmortem. Numbers come from 47 real sessions backfilled when v1.5 shipped (mean composite 9.20/10). Re-run manually with `python -m examples.evals.run_harness` whenever a new batch of sessions lands.
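For readers who want a feel for how four per-session metrics could roll up into one composite, here is a hypothetical sketch. The README does not say how the harness scales or weights its metrics, so the 0 to 10 scale, the unweighted mean, and every name below are assumptions rather than a description of examples/evals/.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SessionScore:
    # Each field is assumed to be pre-normalized to 0-10, higher is better.
    rule_adherence: float
    plan_delivery_gap: float
    cost: float
    dispatch_quality: float

    def composite(self) -> float:
        """Unweighted mean of the four metrics (an assumed formula)."""
        return mean(
            [
                self.rule_adherence,
                self.plan_delivery_gap,
                self.cost,
                self.dispatch_quality,
            ]
        )


def mean_composite(sessions: list) -> float:
    """Average the per-session composites, rounded to two decimals."""
    return round(mean(s.composite() for s in sessions), 2)
```

Because every step is plain arithmetic on recorded values, re-scoring the same sessions yields the same numbers, which is the property "deterministic" implies.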
Built and maintained by Michael Busacca — 13+ years enterprise SaaS, running AI-assisted workflows in a high-volume sales context.
See CHANGELOG.md for version history.