# Agent Loop
Autonomous AI agent harness that combines a generator-evaluator architecture with iterative context-reset patterns for long-running coding tasks.
Inspired by [Geoffrey Huntley's Ralph pattern](https://ghuntley.com/ralph/) and [Anthropic's harness design research](https://www.anthropic.com/engineering/harness-design-long-running-apps).
A generator-evaluator loop runs fresh agent instances per iteration. Each iteration: a **Generator** does the work, then an **Evaluator** verifies it. Human judgment stays in the planning phase; execution is autonomous.
Two execution modes: **headless** via `loop.sh` (fully autonomous bash process) or **interactive** via `/loop-run` (Claude Code-native with full visibility and intervention).
## Install
### As a Claude Code Plugin (Recommended)
```
/plugin marketplace add https://git.jagfly.com/sheldon/loop-loop.git
/plugin install agent-loop@agent-loop
```
Then in any project:
```
/agent-loop:init # Set up the loop for your project
/agent-loop:plan # Generate PRD and sprint contracts
/agent-loop:run # Run the loop interactively
```
### Manual Install
```bash
# Copy a local checkout of loop-loop into your project as .loop
cp -r /path/to/loop-loop .loop
# Install skills as Claude Code commands
mkdir -p .claude/commands
for skill in loop-init loop-plan loop-run loop-triage; do
  ln -sf "../../.loop/skills/$skill/SKILL.md" ".claude/commands/$skill.md"
done
# Then, inside Claude Code (these are slash commands, not shell commands):
#   /loop-init
#   /loop-plan
#   /loop-run
```
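After a manual install it can help to confirm that every command link resolves. The following is a minimal sketch, not part of the project: `check_skill_links` is a hypothetical helper, and it assumes the four skill names and the `.claude/commands` layout used in the install steps above.

```bash
# Hypothetical post-install check (not shipped with loop-loop):
# every command link should resolve to an existing SKILL.md.
check_skill_links() {
  local dir="${1:-.claude/commands}" skill missing=0
  for skill in loop-init loop-plan loop-run loop-triage; do
    # -e follows symlinks, so a dangling link is reported as missing.
    if [ ! -e "$dir/$skill.md" ]; then
      echo "missing or broken: $dir/$skill.md" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_skill_links` from the project root returns 0 when all four links resolve and 1 otherwise, listing each broken link on stderr.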
## How It Works
```
[You + Claude Code]              [Loop Execution]
/agent-loop:init                 Interactive (/agent-loop:run)
  → scaffolds .loop/               └─ dispatches Agent subagents
  → detects project                └─ visible tool calls, can intervene
  → picks mode                     └─ chat mid-loop to adjust course
  → creates config.json
                                 Headless (.loop/loop.sh)
/agent-loop:plan                   └─ spawns claude --print per iteration
  → asks clarifying questions      └─ fully autonomous, no UI
  → generates prd.json
  → generates sprint contracts   Both paths:
  → populates progress.md          ├─→ Generator → picks story → implements → commits
                                   ├─→ Evaluator → verifies → PASS or REJECT
                                   ├─→ next iteration...
                                   └─→ all stories pass → done
```
## Modes
| Mode | What it does | Git writes? |
|------|-------------|-------------|
| **implement** | Build features from a PRD | Yes |
| **explore** | Read-only codebase analysis | No |
| **fix** | Targeted bug fixes / tech debt | Yes |
## Running the Loop
### Option A: Interactive (`/loop-run`) — Recommended
Run inside Claude Code. You see every tool call, file edit, and test run. You can intervene at any point — deny a tool call, chat to adjust course, or stop the loop.
```
/loop-run # Run until done or max iterations
/loop-run 3 # Run at most 3 iterations
/loop-run --skip-eval # Skip evaluator pass
/loop-run --story US-003 # Run only a specific story
```
### Option B: Headless (`loop.sh`)
Run as a standalone bash process. Fully autonomous — no UI, no intervention. Useful for background execution or CI.
```bash
.loop/loop.sh [options]
  --mode <implement|explore|fix>   Operating mode
  --max <N>                        Maximum iterations (default: 20)
  --skip-eval                      Skip evaluator pass
  --tool <claude|amp>              AI tool to use
  --no-hooks                       Don't install stop hooks
  --dry-run                        Print assembled prompts without running agents
  --resume                         Skip already-passed stories (explicit exit when none remain)
```
## Architecture
### Generator
Fresh Claude Code instance each iteration. Reads `prd.json` to find the highest-priority incomplete story, reads the sprint contract, implements the story, runs quality gates, commits, and marks it done.
### Evaluator
Separate fresh instance after each generator pass. Skeptically verifies the work: checks acceptance criteria against actual code, runs tests independently, and issues a `PASS` or `REJECT` verdict. Rejection sends the story back to the generator with specific feedback.
Evaluator skepticism is deliberately tuned: left to its defaults, Claude tends to rationalize away issues, so the evaluator prompt includes explicit bias correction.
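The control flow described in the Generator and Evaluator sections can be sketched in a few lines of shell. This is an illustration under stated assumptions, not the real `loop.sh`: `run_generator` and `run_evaluator` are stubs standing in for fresh agent instances, the story IDs are made up, and the per-story state files are a hypothetical stand-in for `prd.json`.

```bash
# Hypothetical sketch of the generator-evaluator loop; not the real loop.sh.
MAX_ITERATIONS=20          # mirrors the documented --max default
STORIES="US-001 US-002"    # made-up stand-in for the stories in prd.json

# Stub generator: a real one would spawn a fresh agent; this only counts attempts.
run_generator() {
  local n
  n=$(cat "$STATE_DIR/$1.attempts" 2>/dev/null || echo 0)
  echo $((n + 1)) > "$STATE_DIR/$1.attempts"
}

# Stub evaluator: rejects the first attempt at a story, passes the second.
run_evaluator() {
  if [ "$(cat "$STATE_DIR/$1.attempts")" -ge 2 ]; then echo PASS; else echo REJECT; fi
}

# Print the first story without a pass marker; fail when none remain.
next_story() {
  local s
  for s in $STORIES; do
    [ -e "$STATE_DIR/$s.pass" ] || { echo "$s"; return 0; }
  done
  return 1
}

run_loop() {
  local i=1 story verdict
  STATE_DIR=$(mktemp -d)   # state lives in files, not in the agents' memory
  while [ "$i" -le "$MAX_ITERATIONS" ]; do
    story=$(next_story) || { echo "all stories pass after $((i - 1)) iterations"; return 0; }
    run_generator "$story"
    verdict=$(run_evaluator "$story")
    echo "iteration $i: $story -> $verdict"
    # On REJECT nothing is reverted; the same story is simply picked again.
    if [ "$verdict" = PASS ]; then : > "$STATE_DIR/$story.pass"; fi
    i=$((i + 1))
  done
  echo "max iterations reached"
  return 1
}
```

With these stubs, calling `run_loop` takes four iterations: each story is rejected once and passes on the retry, after which the loop reports that all stories pass.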
### Sprint Contracts
Before the loop starts, `/loop-plan` generates contracts for each story. These define "done" conditions that both generator and evaluator reference, eliminating ambiguity about whether work is complete.
### State Persistence
| Artifact | Purpose |
|----------|---------|
| `prd.json` | Story status (pass/fail), acceptance criteria |
| `progress.md` | Append-only session log + codebase patterns |
| `contracts/` | Sprint contracts per story |
| `config.json` | Harness configuration |
| Git commits | Code changes with story-tagged messages |
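To illustrate what "append-only" means for `progress.md`, here is a minimal sketch of a logging helper. `log_progress` and its entry layout (timestamp, story ID, verdict, free-text note) are assumptions made for this example; the actual format written by the harness may differ.

```bash
# Hypothetical helper: append one entry to an append-only progress log.
# The entry layout (heading with timestamp, story, verdict; then a note) is assumed.
log_progress() {
  local file="$1" story="$2" verdict="$3" note="$4"
  # '>>' only adds to the end of the file; earlier entries are never rewritten.
  printf '## %s %s %s\n%s\n\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$story" "$verdict" "$note" >> "$file"
}
```

For example, `log_progress .loop/progress.md US-003 REJECT "Login test still fails on empty password"` would add one entry at the end of the log, leaving everything above it untouched for the next fresh instance to read.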
## File Structure
```
.loop/
  loop.sh                    # Main loop orchestrator
  config.json                # Project config (generated by /loop-init)
  init.sh                    # Project setup script (generated by /loop-init)
  prd.json                   # Active PRD (generated by /loop-plan)
  progress.md                # Cross-session memory (append-only)
  prompts/
    generator/_base.md       # Shared generator instructions
    generator/implement.md   # Implement mode overlay
    generator/explore.md     # Explore mode overlay
    generator/fix.md         # Fix mode overlay
    evaluator/_base.md       # Skeptical evaluator base
    evaluator/implement.md   # Implement verification
    evaluator/explore.md     # Analysis verification
    evaluator/fix.md         # Fix verification
    planner/plan.md          # Planning context
  templates/                 # Reference templates
  lib/                       # Shell library functions
  skills/                    # Claude Code skills (/loop-init, /loop-plan, /loop-run, /loop-triage)
  contracts/                 # Sprint contracts (generated by /loop-plan)
  triage/                    # Analysis output (explore mode)
  archive/                   # Completed feature archives
```
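The `prompts/` layout pairs a shared `_base.md` with one overlay per mode. One plausible way to combine them is plain concatenation, sketched below. `assemble_prompt` is a hypothetical function, and concatenation itself is an assumption; the real `loop.sh` may assemble prompts differently (its `--dry-run` flag prints the actual result).

```bash
# Hypothetical sketch: build a prompt from the shared base plus one mode overlay.
# Plain concatenation is an assumption, not the documented behavior of loop.sh.
assemble_prompt() {
  local role="$1" mode="$2" root="${3:-.loop/prompts}"
  local base="$root/$role/_base.md" overlay="$root/$role/$mode.md"
  if [ ! -f "$base" ] || [ ! -f "$overlay" ]; then
    echo "missing prompt file for $role/$mode" >&2
    return 1
  fi
  cat "$base"      # shared instructions first
  printf '\n'      # blank line between the two parts
  cat "$overlay"   # mode-specific instructions last
}
```

Under this sketch, `assemble_prompt evaluator fix` would print `evaluator/_base.md` followed by `evaluator/fix.md`, and would fail with a message on stderr if either file is absent.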
## Browser Testing (Optional)
The evaluator includes basic runtime verification for web projects (starts a local server, checks HTTP response). For full browser testing with console error detection and screenshots, install the Playwright MCP server:
```bash
claude mcp add playwright npx @playwright/mcp@latest --headless --browser=chromium
```
When Playwright is available, the evaluator will use it to:
- Navigate to the running application
- Check for JavaScript console errors
- Take screenshots for visual verification
- Reject stories with runtime errors
This is optional — the evaluator works without it, but may miss runtime issues that only surface in a browser.
## Design Principles
- **Fresh context per iteration** — no accumulated hallucination drift
- **Separate generation from evaluation** — external skepticism is easier to tune than self-criticism
- **Human judgment for planning, AI for execution** — interactive `/loop-plan`, autonomous loop
- **Structured handoffs via artifacts** — not conversation memory
- **No git revert on rejection** — next generator sees partial work + feedback (more signal)
- **Advisory scope budgets** — prompt-enforced limits on files read/written per iteration
## Credits
- [Geoffrey Huntley](https://ghuntley.com/ralph/) — original Ralph pattern
- [Anthropic Engineering](https://www.anthropic.com/engineering/harness-design-long-running-apps) — generator-evaluator harness design