Generator-evaluator architecture with iterative context-reset for long-running coding tasks. Ships as a Claude Code plugin — install with /plugin and use /agent-loop:init, /agent-loop:plan, /agent-loop:run.
# Agent Loop

Autonomous AI agent harness that combines a generator-evaluator architecture with iterative context-reset patterns for long-running coding tasks.

Inspired by [Geoffrey Huntley's Ralph pattern](https://ghuntley.com/ralph/) and [Anthropic's harness design research](https://www.anthropic.com/engineering/harness-design-long-running-apps).

The loop starts a fresh agent instance for every iteration. In each iteration a **Generator** does the work, then an **Evaluator** verifies it. Human judgment stays in the planning phase; execution is autonomous.

There are two execution modes: **headless** via `.loop/loop.sh` (a fully autonomous bash process) or **interactive** via `/agent-loop:run` (`/loop-run` with a manual install), which runs natively in Claude Code with full visibility and the ability to intervene.

## Install

### As a Claude Code Plugin (Recommended)

```
/plugin marketplace add https://git.jagfly.com/sheldon/loop-loop.git
/plugin install agent-loop@agent-loop
```

Then in any project:

```
/agent-loop:init   # Set up the loop for your project
/agent-loop:plan   # Generate PRD and sprint contracts
/agent-loop:run    # Run the loop interactively
```

### Manual Install

```bash
# Copy a local checkout of this repo into your project as .loop/
cp -r /path/to/loop-loop .loop

# Install skills as Claude Code commands
mkdir -p .claude/commands
for skill in loop-init loop-plan loop-run loop-triage; do
  ln -sf "../../.loop/skills/$skill/SKILL.md" ".claude/commands/$skill.md"
done

# Then, inside Claude Code (not the shell), run in order:
#   /loop-init, /loop-plan, /loop-run
```

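To confirm the symlinks resolve after a manual install, a small check along these lines can help. This is a hypothetical helper, not part of the harness: `check_skill_links` is an illustrative name, and it only verifies that each `.claude/commands/<skill>.md` link points at an existing file.

```bash
# Hypothetical helper: report any skill command link that is missing or dangling.
# `[[ -e ]]` follows symlinks, so a link whose target is gone counts as missing.
check_skill_links() {
  local dir=${1:-.claude/commands} skill missing=0
  for skill in loop-init loop-plan loop-run loop-triage; do
    if [[ ! -e $dir/$skill.md ]]; then
      echo "missing or dangling: $dir/$skill.md" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run it from the project root (`check_skill_links`) or pass another commands directory as the first argument.
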
## How It Works

```
[You + Claude Code]                  [Loop Execution]

/agent-loop:init                     Interactive (/agent-loop:run)
  → scaffolds .loop/                   └─ dispatches Agent subagents
  → detects project                    └─ visible tool calls, can intervene
  → picks mode                         └─ chat mid-loop to adjust course
  → creates config.json
                                     Headless (.loop/loop.sh)
/agent-loop:plan                       └─ spawns claude --print per iteration
  → asks clarifying questions          └─ fully autonomous, no UI
  → generates prd.json
  → generates sprint contracts       Both paths:
  → populates progress.md              ├─→ Generator → picks story → implements → commits
                                       ├─→ Evaluator → verifies → PASS or REJECT
                                       ├─→ next iteration...
                                       └─→ all stories pass → done
```

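The control flow on the right of the diagram can be sketched in bash as below. This is a minimal illustration, not the actual `loop.sh`: `run_loop` and the `demo_*` stubs are hypothetical names, and each stub call stands in for a fresh agent process (for example one `claude --print` invocation per step).

```bash
# Sketch only: a generator-evaluator loop with injectable agent steps.
# run_loop MAX GENERATE EVALUATE ALL_DONE
#   GENERATE  called with the previous rejection feedback ("" if none)
#   EVALUATE  exits 0 for PASS, non-zero for REJECT; prints feedback on stdout
#   ALL_DONE  exits 0 once every story passes
run_loop() {
  local max=$1 generate=$2 evaluate=$3 all_done=$4
  local i feedback="" fb_file
  fb_file=$(mktemp)
  for ((i = 1; i <= max; i++)); do
    if "$all_done"; then break; fi
    # Only the feedback string is carried between iterations, never a conversation.
    "$generate" "$feedback"
    # A function called with a redirection still runs in the current shell,
    # so state it updates is visible to later iterations.
    if "$evaluate" >"$fb_file"; then
      feedback=""
    else
      # Nothing is reverted on rejection; the next generator call gets this feedback.
      feedback=$(<"$fb_file")
    fi
  done
  rm -f "$fb_file"
  if "$all_done"; then
    echo "done: all stories pass"
    return 0
  fi
  echo "stopped: reached max iterations ($max)" >&2
  return 1
}

# Demo stubs: two stories; the evaluator rejects the very first attempt.
remaining=2 attempts=0 got_feedback=""
demo_generate() {
  attempts=$((attempts + 1))
  if [[ -n $1 ]]; then got_feedback=$1; fi
}
demo_evaluate() {
  if ((attempts == 1)); then
    echo "REJECT: acceptance criterion 2 has no test"
    return 1
  fi
  remaining=$((remaining - 1))
  echo "PASS"
}
demo_done() { ((remaining == 0)); }

run_loop 10 demo_generate demo_evaluate demo_done   # prints: done: all stories pass
```

In the demo the first story takes two generator passes: the rejected attempt feeds its feedback into the second, and the third pass completes the remaining story.
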
## Modes

| Mode | What it does | Git writes? |
|------|-------------|-------------|
| **implement** | Build features from a PRD | Yes |
| **explore** | Read-only codebase analysis | No |
| **fix** | Targeted bug fixes / tech debt | Yes |

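One way a harness could honor the "Git writes?" column is a guard checked before any commit. The sketch below is an assumption for illustration (`mode_allows_git_writes` is a hypothetical name); the README does not say whether the real harness enforces this in code or only through the mode prompts.

```bash
# Hypothetical guard mirroring the "Git writes?" column above.
# Exit status: 0 = may write to git, 1 = read-only, 2 = unknown mode.
mode_allows_git_writes() {
  case "$1" in
    implement | fix) return 0 ;;
    explore) return 1 ;;
    *)
      echo "unknown mode: $1" >&2
      return 2
      ;;
  esac
}
```

Because both "read-only" and "unknown" are non-zero, a caller written as `if mode_allows_git_writes "$MODE"; then ...; fi` skips the commit for a misconfigured mode instead of writing history.
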
## Running the Loop

### Option A: Interactive (`/loop-run`), recommended

Run inside Claude Code. You see every tool call, file edit, and test run, and you can intervene at any point: deny a tool call, chat to adjust course, or stop the loop. With the plugin install, use `/agent-loop:run` instead of `/loop-run`.

```
/loop-run                   # Run until done or max iterations
/loop-run 3                 # Run at most 3 iterations
/loop-run --skip-eval       # Skip the evaluator pass
/loop-run --story US-003    # Run only a specific story
```

### Option B: Headless (`loop.sh`)

Run as a standalone bash process. It is fully autonomous, with no UI and no intervention, which makes it useful for background execution or CI.

```
.loop/loop.sh [options]

  --mode <implement|explore|fix>   Operating mode
  --max <N>                        Maximum iterations (default: 20)
  --skip-eval                      Skip the evaluator pass
  --tool <claude|amp>              AI CLI used to run the agents
  --no-hooks                       Don't install stop hooks
  --dry-run                        Print assembled prompts without running agents
  --resume                         Skip already-passed stories (exits explicitly when none remain)
```

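Options like these are typically parsed with a `while`/`case` loop. The following is a hypothetical sketch, not the parser in `loop.sh`: the variable names are illustrative, and every default except `--max 20` (for example `claude` as the default tool, or an empty `MODE` meaning "use the project config") is an assumption.

```bash
# Hypothetical parser for the loop.sh options listed above.
# Sets global variables; returns 1 on an unknown or malformed option.
parse_loop_args() {
  MODE="" MAX=20 SKIP_EVAL=0 TOOL="claude" NO_HOOKS=0 DRY_RUN=0 RESUME=0
  while (($#)); do
    case "$1" in
      --mode)
        case "${2-}" in
          implement | explore | fix) MODE=$2 ;;
          *) echo "--mode must be implement, explore, or fix" >&2; return 1 ;;
        esac
        shift 2 ;;
      --max)
        [[ ${2-} =~ ^[0-9]+$ ]] || { echo "--max needs a number" >&2; return 1; }
        MAX=$2
        shift 2 ;;
      --tool)
        case "${2-}" in
          claude | amp) TOOL=$2 ;;
          *) echo "--tool must be claude or amp" >&2; return 1 ;;
        esac
        shift 2 ;;
      --skip-eval) SKIP_EVAL=1; shift ;;
      --no-hooks) NO_HOOKS=1; shift ;;
      --dry-run) DRY_RUN=1; shift ;;
      --resume) RESUME=1; shift ;;
      *) echo "unknown option: $1" >&2; return 1 ;;
    esac
  done
}
```

For example, `parse_loop_args --mode fix --max 5 --resume` leaves `MODE=fix`, `MAX=5`, and `RESUME=1`, with the other flags at their defaults.
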
## Architecture

### Generator

A fresh Claude Code instance runs each iteration. It reads `prd.json` to find the highest-priority incomplete story, reads that story's sprint contract, implements the story, runs the quality gates, commits, and marks the story done.

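The "runs quality gates, then commits" step can be pictured as a fail-fast runner. This is a hypothetical sketch: `run_quality_gates` is an illustrative name, and the gate commands themselves (lint, tests, type checks) are project-specific and not defined by this README.

```bash
# Hypothetical sketch: run each quality-gate command in order and stop at the
# first failure, so a story is only committed and marked done when all pass.
run_quality_gates() {
  local gate
  for gate in "$@"; do
    if ! bash -c "$gate"; then
      echo "quality gate failed: $gate" >&2
      return 1
    fi
  done
  echo "all $# quality gates passed"
}
```

A call such as `run_quality_gates "npm run lint" "npm test"` (example commands only) returns non-zero at the first failing gate, leaving the story incomplete for the evaluator and the next iteration.
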
### Evaluator

A separate fresh instance runs after each generator pass. It verifies the work skeptically: it checks the acceptance criteria against the actual code, runs the tests independently, and issues a `PASS` or `REJECT` verdict. A rejection sends the story back to the generator with specific feedback.

Evaluator skepticism is deliberately tuned. Left to its defaults, Claude tends to rationalize away the issues it finds, so the evaluator prompt includes explicit bias correction.

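A harness has to turn the evaluator's free-form output into a verdict. The sketch below assumes the evaluator is prompted to finish with a line like `VERDICT: PASS` or `VERDICT: REJECT`; that format and the `parse_verdict` name are hypothetical, and the real evaluator prompt may use something else. It fails closed, in the same skeptical spirit as the evaluator itself.

```bash
# Hypothetical sketch: read evaluator output on stdin and print PASS or REJECT.
# The last "VERDICT: ..." line wins; missing or malformed verdicts count as REJECT.
parse_verdict() {
  local line
  line=$(awk '/^VERDICT: (PASS|REJECT) *$/ { last = $0 } END { print last }')
  case "$line" in
    "VERDICT: PASS"*) echo PASS ;;
    *) echo REJECT ;;
  esac
}
```

Treating anything unrecognized as `REJECT` means a truncated or off-format evaluator response can never mark a story as passed.
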
### Sprint Contracts

Before the loop starts, `/loop-plan` generates a contract for each story. Each contract defines the "done" conditions that both the generator and the evaluator reference, which removes ambiguity about whether the work is complete.

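Since both agents reference the same contract, a harness could extract its "done" conditions once and inject them into both prompts. The sketch assumes a hypothetical contract layout, one file per story (such as `contracts/US-003.md`) with checklist items written as `- [ ] condition`; the real contract format is not specified here.

```bash
# Hypothetical sketch: print the checklist items ("done" conditions) from a
# sprint contract written as Markdown task-list lines: "- [ ] ..." or "- [x] ...".
contract_conditions() {
  local contract=$1
  [[ -f $contract ]] || { echo "no contract: $contract" >&2; return 1; }
  sed -n -E 's/^[[:space:]]*- \[[ xX]] (.+)$/\1/p' "$contract"
}
```

Failing when the contract file is missing keeps a story from being worked on, or judged, without an agreed definition of done.
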
### State Persistence

| Artifact | Purpose |
|----------|---------|
| `prd.json` | Story status (pass/fail), acceptance criteria |
| `progress.md` | Append-only session log + codebase patterns |
| `contracts/` | Sprint contracts per story |
| `config.json` | Harness configuration |
| Git commits | Code changes with story-tagged messages |

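The append-only property of `progress.md` is easy to keep if every write goes through `>>`. A minimal sketch, assuming a hypothetical entry layout (timestamp, story ID, verdict, note) and an illustrative `log_progress` name:

```bash
# Hypothetical sketch: append one session entry to progress.md.
# Only ">>" is used, so earlier entries are never rewritten.
log_progress() {
  local file=$1 story=$2 verdict=$3 note=$4
  {
    printf '\n## %s %s: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$story" "$verdict"
    printf '%s\n' "$note"
  } >>"$file"
}
```

For example, `log_progress .loop/progress.md US-002 REJECT "missing validation test"` adds a new heading and note at the end of the file, which the next fresh instance can read.
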
## File Structure

```
.loop/
  loop.sh                     # Main loop orchestrator
  config.json                 # Project config (generated by /loop-init)
  init.sh                     # Project setup script (generated by /loop-init)
  prd.json                    # Active PRD (generated by /loop-plan)
  progress.md                 # Cross-session memory (append-only)

  prompts/
    generator/_base.md        # Shared generator instructions
    generator/implement.md    # Implement mode overlay
    generator/explore.md      # Explore mode overlay
    generator/fix.md          # Fix mode overlay
    evaluator/_base.md        # Skeptical evaluator base
    evaluator/implement.md    # Implement verification
    evaluator/explore.md      # Analysis verification
    evaluator/fix.md          # Fix verification
    planner/plan.md           # Planning context

  templates/                  # Reference templates
  lib/                        # Shell library functions
  skills/                     # Claude Code skills (/loop-init, /loop-plan, /loop-run, /loop-triage)
  contracts/                  # Sprint contracts (generated by /loop-plan)
  triage/                     # Analysis output (explore mode)
  archive/                    # Completed feature archives
```

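The `prompts/` layout pairs a shared `_base.md` with a per-mode overlay for each role. One plausible way to combine them is plain concatenation, sketched below; `assemble_prompt` is a hypothetical name, and how `loop.sh` actually merges the files (which `--dry-run` would show) may differ.

```bash
# Hypothetical sketch: build a role prompt as base instructions + mode overlay.
# Usage: assemble_prompt LOOP_DIR ROLE MODE, e.g. assemble_prompt .loop generator implement
assemble_prompt() {
  local loop_dir=$1 role=$2 mode=$3
  local base="$loop_dir/prompts/$role/_base.md"
  local overlay="$loop_dir/prompts/$role/$mode.md"
  local f
  for f in "$base" "$overlay"; do
    [[ -f $f ]] || { echo "missing prompt file: $f" >&2; return 1; }
  done
  cat "$base"
  printf '\n'
  cat "$overlay"
}
```

Keeping shared rules in `_base.md` means a change to, say, the evaluator's bias correction applies to all three modes at once.
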
## Design Principles

- **Fresh context per iteration**: no accumulated hallucination drift
- **Separate generation from evaluation**: external skepticism is easier to tune than self-criticism
- **Human judgment for planning, AI for execution**: interactive `/loop-plan`, autonomous loop
- **Structured handoffs via artifacts**: state passes through files, not conversation memory
- **No git revert on rejection**: the next generator sees the partial work plus feedback (more signal)
- **Advisory scope budgets**: per-iteration limits on files read/written, stated in the prompt rather than enforced by the harness

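Because the scope budgets are advisory, any automated check should warn rather than block. The sketch below is hypothetical (`check_scope_budget` and the budget value are illustrative) and can only see files written, for example via `git diff --name-only HEAD~1`; files read are not visible this way.

```bash
# Hypothetical advisory check: count changed paths on stdin (one per line)
# and warn on stderr when the count exceeds the budget. Never fails.
check_scope_budget() {
  local budget=$1 count
  count=$(awk 'NF { n++ } END { print n + 0 }')
  if ((count > budget)); then
    echo "warning: $count files changed, budget is $budget" >&2
  fi
  return 0 # advisory: never blocks the iteration
}
```

Usage: `git diff --name-only HEAD~1 | check_scope_budget 10`.
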
## Credits

- [Geoffrey Huntley](https://ghuntley.com/ralph/): original Ralph pattern
- [Anthropic Engineering](https://www.anthropic.com/engineering/harness-design-long-running-apps): generator-evaluator harness design