Agent Loop
Autonomous AI agent harness that combines a generator-evaluator architecture with iterative context-reset patterns for long-running coding tasks.
Inspired by Geoffrey Huntley's Ralph pattern and Anthropic's harness design research.
A generator-evaluator loop runs fresh Claude Code sessions per iteration. Each iteration: a Generator does the work, then an Evaluator verifies it. Human judgment stays in the planning phase; execution is autonomous with full visibility.
Install
/plugin install agent-loop@agent-loop
Then in any project:
/agent-loop:run
That's it. The single command handles setup, planning, and execution.
Prerequisites
- Claude Code CLI installed
- tmux available (used to run the loop in a detachable session)
- jq or python3 (for JSON state management)
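The prerequisites above can be checked before a first run. The following is a minimal sketch, not part of the plugin: the function name `missing_prerequisites` is hypothetical, and it assumes the Claude Code CLI is on the PATH as `claude` (the binary name used later in this README).

```python
import shutil


def missing_prerequisites(which=shutil.which):
    """Return the prerequisites that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    touching the real environment.
    """
    missing = []
    if not which("claude"):  # assumption: the Claude Code CLI binary is named "claude"
        missing.append("claude")
    if not which("tmux"):
        missing.append("tmux")
    # Either jq or python3 is enough for JSON state management.
    if not (which("jq") or which("python3")):
        missing.append("jq or python3")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("All prerequisites found" if not problems else f"Missing: {', '.join(problems)}")
```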
How It Works
- Write a spec describing what you want to build (SPEC.md, docs/specs/*.md, or similar). You can write it yourself, ask Claude to draft one, or use planning tools like /plan.
- Run /agent-loop:run. It scaffolds .loop/, generates stories from your spec, and presents them for review.
- Say "go". The loop launches in tmux and runs autonomously.
/agent-loop:run
  ├─ Phase 1: Scaffold .loop/ (if needed)
  ├─ Phase 2: Generate stories from spec (if needed)
  │    └─ Presents stories for human review
  │    └─ STOPS: user reviews and says "go"
  └─ Phase 3: Launch loop in tmux
       ├─→ Generator → picks story → implements → commits
       ├─→ Evaluator → verifies → PASS or REJECT
       ├─→ next iteration (fresh CC session each time)
       └─→ all stories pass → done
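The control flow of Phase 3 can be sketched in a few lines. This is an illustration under stated assumptions, not the plugin's actual implementation: `Story`, `Verdict`, `run_loop`, and the `max_iterations` budget are hypothetical names, and the `generate` and `evaluate` callables stand in for the fresh Claude Code sessions the harness launches.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Story:
    id: str
    passes: bool = False
    feedback: Optional[str] = None  # evaluator feedback from the last rejection


@dataclass
class Verdict:
    passed: bool
    feedback: str = ""


def run_loop(
    stories: List[Story],
    generate: Callable[[Story], None],
    evaluate: Callable[[Story], Verdict],
    max_iterations: int = 20,  # assumption: some iteration budget exists
) -> int:
    """Run generator/evaluator passes until all stories pass or the budget is spent."""
    iterations = 0
    while iterations < max_iterations:
        pending = next((s for s in stories if not s.passes), None)
        if pending is None:
            break  # all stories pass, so the loop is done
        iterations += 1
        generate(pending)            # stands in for a fresh generator session
        verdict = evaluate(pending)  # stands in for a separate evaluator session
        if verdict.passed:
            pending.passes = True
            pending.feedback = None
        else:
            # No revert: partial work stays, and the feedback guides the retry.
            pending.feedback = verdict.feedback
    return iterations
```

In this sketch a rejected story is simply picked again on the next iteration, now carrying the evaluator's feedback.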
Modes
| Mode | What it does | Git writes? |
|---|---|---|
| implement | Build features from a spec | Yes |
| explore | Read-only codebase analysis | No |
| fix | Targeted bug fixes / tech debt | Yes |
Monitoring
After the loop launches in tmux:
# Watch live (from Claude Code)
! tmux attach -t agent-loop
# Detach back to Claude Code
Ctrl+B then D
# Stop the loop
Ctrl+C in the tmux session
Or ask Claude Code "status" — it reads .loop/prd.json and .loop/progress.md.
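For a quick status check outside Claude Code, the story state can be summarized directly from the JSON file. The sketch below assumes a schema this README does not specify: a top-level `stories` array whose items carry an `id` and a boolean `passes` field. Adjust the field names to match the real .loop/prd.json.

```python
import json
from pathlib import Path


def summarize(prd_path):
    """Summarize story status from a prd.json-style file.

    Assumed (hypothetical) schema:
    {"stories": [{"id": "S1", "passes": true}, ...]}
    """
    data = json.loads(Path(prd_path).read_text(encoding="utf-8"))
    stories = data.get("stories", [])
    passed = [s["id"] for s in stories if s.get("passes")]
    # A story without a "passes" field is treated as still pending.
    pending = [s["id"] for s in stories if not s.get("passes")]
    return {"total": len(stories), "passed": passed, "pending": pending}
```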
Each generator and evaluator run is a full Claude Code session saved to history. Use claude -r to resume any session and inspect what happened, debug a rejection, or continue from where it left off.
Architecture
Generator
Fresh Claude Code session each iteration. Follows a strict startup sequence: reads progress.md, finds the next story from prd.json, reads the sprint contract, checks for evaluator feedback, reviews git history, and runs a smoke test if available — all before writing any code. Then implements the story, runs quality gates, commits, and marks it done.
Evaluator
Separate fresh session after each generator pass. Skeptically verifies the work: checks each acceptance criterion against actual code with file paths and line numbers, runs tests, and issues a PASS or REJECT verdict. Rejection sends the story back with specific feedback.
Evaluator skepticism is deliberately tuned — Claude's default tendency is to rationalize away issues. The evaluator prompt includes explicit bias correction and few-shot calibration examples.
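One way to picture the evaluator's skeptical default is as a verdict rule that rejects unless every criterion is both met and backed by cited evidence. The following is a hedged sketch of that idea only. The real evaluator is a prompted Claude Code session, and `CriterionCheck` and `verdict` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CriterionCheck:
    criterion: str
    met: bool
    evidence: Optional[str] = None  # e.g. "src/auth.py:42" (file path and line number)


def verdict(checks: List[CriterionCheck], tests_passed: bool) -> Tuple[str, List[str]]:
    """Return ("PASS", []) or ("REJECT", [specific feedback, ...])."""
    problems = []
    if not checks:
        problems.append("No acceptance criteria were checked")
    for check in checks:
        if not check.met:
            problems.append(f"Not met: {check.criterion}")
        elif not check.evidence:
            # A claim without a file path and line number does not count.
            problems.append(f"No evidence cited: {check.criterion}")
    if not tests_passed:
        problems.append("Test run failed")
    return ("REJECT", problems) if problems else ("PASS", [])
```

The list of problems plays the role of the specific feedback that accompanies a rejected story.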
Sprint Contracts
Before the loop starts, the planner generates contracts for each story. These define "done" conditions that both generator and evaluator reference, eliminating ambiguity about whether work is complete.
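To illustrate why a shared contract removes ambiguity, the sketch below derives both roles' instructions from the same list of done conditions. The contract format is not specified in this README, so `SprintContract`, `done_when`, and the brief wording are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SprintContract:
    story_id: str
    done_when: List[str]  # hypothetical field: the agreed "done" conditions


def checklist(contract: SprintContract) -> str:
    """Render the done conditions as a markdown checklist."""
    return "\n".join(f"- [ ] {condition}" for condition in contract.done_when)


def generator_brief(contract: SprintContract) -> str:
    return f"Implement {contract.story_id}. Done when:\n{checklist(contract)}"


def evaluator_brief(contract: SprintContract) -> str:
    return f"Verify {contract.story_id}. Reject unless all hold:\n{checklist(contract)}"
```

Because both briefs embed the identical checklist, the generator and evaluator cannot disagree about what "done" means for a story.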
State Persistence
| Artifact | Purpose |
|---|---|
| prd.json | Story status (pass/fail), acceptance criteria |
| progress.md | Append-only session log + codebase patterns |
| contracts/ | Sprint contracts per story |
| config.json | Harness configuration |
| Git commits | Code changes with story-tagged messages |
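The append-only property of the session log is what lets each fresh session trust what earlier sessions wrote. A minimal sketch of such a log writer follows. The entry layout (a timestamped heading followed by a note) and the name `append_progress` are assumptions, not the plugin's actual format.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_progress(path, story_id, role, note):
    """Append one session entry to the log; earlier entries are never rewritten."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    entry = f"\n## {stamp} {role} {story_id}\n{note}\n"
    # Mode "a" creates the file if needed and only ever adds to the end.
    with Path(path).open("a", encoding="utf-8") as log:
        log.write(entry)
```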
Runtime Verification
The evaluator doesn't just read diffs — it runs tests, builds the project, and checks for runtime errors using whatever tools the project already has (test runners, linters, build commands).
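The tool-agnostic approach described above amounts to probing for what is installed and running only that. Here is a rough sketch under that assumption. The function names are hypothetical, and the candidate commands in the usage note are examples rather than a list the plugin is known to use.

```python
import shutil
import subprocess


def available(commands):
    """Keep only commands whose executable can be found."""
    return [cmd for cmd in commands if cmd and shutil.which(cmd[0])]


def run_checks(commands, cwd="."):
    """Run each available command; map its command line to True on exit code 0."""
    results = {}
    for cmd in available(commands):
        proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
        results[" ".join(cmd)] = proc.returncode == 0
    return results
```

A caller might pass candidates such as `[["pytest", "-q"], ["npm", "test"]]`. Tools that are not installed are skipped rather than treated as failures.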
Design Principles
- Fresh context per iteration — no accumulated hallucination drift
- Separate generation from evaluation — external skepticism is easier to tune than self-criticism
- Human judgment for planning, AI for execution — human reviews stories, loop executes autonomously
- Structured handoffs via artifacts — not conversation memory
- No git revert on rejection — next generator sees partial work + feedback (more signal)
- Tool-agnostic — evaluator uses whatever tools are available, no hardcoded dependencies
Credits
- Geoffrey Huntley — original Ralph pattern
- Anthropic Engineering — generator-evaluator harness design