# Agent Loop

Autonomous AI agent harness that combines a generator-evaluator architecture with iterative context-reset patterns for long-running coding tasks. Inspired by [Geoffrey Huntley's Ralph pattern](https://ghuntley.com/ralph/) and [Anthropic's harness design research](https://www.anthropic.com/engineering/harness-design-long-running-apps).

A generator-evaluator loop runs fresh agent instances per iteration. Each iteration: a **Generator** does the work, then an **Evaluator** verifies it. Human judgment stays in the planning phase; execution is autonomous.

Two execution modes: **headless** via `loop.sh` (fully autonomous bash process) or **interactive** via `/loop-run` (Claude Code-native with full visibility and intervention).

## Install

### As a Claude Code Plugin (Recommended)

```
/plugin marketplace add https://git.jagfly.com/sheldon/loop-loop.git
/plugin install agent-loop@agent-loop
```

Then in any project:

```
/agent-loop:init    # Set up the loop for your project
/agent-loop:plan    # Generate PRD and sprint contracts
/agent-loop:run     # Run the loop interactively
```

### Manual Install

```bash
# Copy a local checkout of the repo into your project
cp -r /path/to/loop-loop .loop

# Install skills as Claude Code commands
mkdir -p .claude/commands
for skill in loop-init loop-plan loop-run loop-triage; do
  ln -sf "../../.loop/skills/$skill/SKILL.md" ".claude/commands/$skill.md"
done

# Then in Claude Code: /loop-init && /loop-plan && /loop-run
```

## How It Works

```
[You + Claude Code]               [Loop Execution]

/agent-loop:init                  Interactive (/agent-loop:run)
  → scaffolds .loop/                └─ dispatches Agent subagents
  → detects project                 └─ visible tool calls, can intervene
  → picks mode                      └─ chat mid-loop to adjust course
  → creates config.json           Headless (.loop/loop.sh)
/agent-loop:plan                    └─ spawns claude --print per iteration
  → asks clarifying questions       └─ fully autonomous, no UI
  → generates prd.json
  → generates sprint contracts    Both paths:
  → populates progress.md           ├─→ Generator → picks story → implements → commits
                                    ├─→ Evaluator → verifies → PASS or REJECT
                                    ├─→ next iteration...
                                    └─→ all stories pass → done
```

## Modes

| Mode | What it does | Git writes? |
|------|-------------|-------------|
| **implement** | Build features from a PRD | Yes |
| **explore** | Read-only codebase analysis | No |
| **fix** | Targeted bug fixes / tech debt | Yes |

## Running the Loop

### Option A: Interactive (`/loop-run`) - Recommended

Run inside Claude Code. You see every tool call, file edit, and test run. You can intervene at any point: deny a tool call, chat to adjust course, or stop the loop.

```
/loop-run                  # Run until done or max iterations
/loop-run 3                # Run at most 3 iterations
/loop-run --skip-eval      # Skip evaluator pass
/loop-run --story US-003   # Run only a specific story
```

### Option B: Headless (`loop.sh`)

Run as a standalone bash process. Fully autonomous: no UI, no intervention. Useful for background execution or CI.

```bash
.loop/loop.sh [options]

  --mode        Operating mode
  --max         Maximum iterations (default: 20)
  --skip-eval   Skip evaluator pass
  --tool        AI tool to use
  --no-hooks    Don't install stop hooks
  --dry-run     Print assembled prompts without running agents
  --resume      Skip already-passed stories (explicit exit when none remain)
```

## Architecture

### Generator

Fresh Claude Code instance each iteration. Reads `prd.json` to find the highest-priority incomplete story, reads the sprint contract, implements the story, runs quality gates, commits, and marks it done.

### Evaluator

Separate fresh instance after each generator pass. Skeptically verifies the work: checks acceptance criteria against actual code, runs tests independently, and issues a `PASS` or `REJECT` verdict. Rejection sends the story back to the generator with specific feedback.

Evaluator skepticism is deliberately tuned: Claude's default tendency is to rationalize away issues, so the evaluator prompt includes explicit bias correction.

### Sprint Contracts

Before the loop starts, `/loop-plan` generates contracts for each story. These define "done" conditions that both generator and evaluator reference, eliminating ambiguity about whether work is complete.

### State Persistence

| Artifact | Purpose |
|----------|---------|
| `prd.json` | Story status (pass/fail), acceptance criteria |
| `progress.md` | Append-only session log + codebase patterns |
| `contracts/` | Sprint contracts per story |
| `config.json` | Harness configuration |
| Git commits | Code changes with story-tagged messages |

## File Structure

```
.loop/
  loop.sh                    # Main loop orchestrator
  config.json                # Project config (generated by /loop-init)
  init.sh                    # Project setup script (generated by /loop-init)
  prd.json                   # Active PRD (generated by /loop-plan)
  progress.md                # Cross-session memory (append-only)
  prompts/
    generator/_base.md       # Shared generator instructions
    generator/implement.md   # Implement mode overlay
    generator/explore.md     # Explore mode overlay
    generator/fix.md         # Fix mode overlay
    evaluator/_base.md       # Skeptical evaluator base
    evaluator/implement.md   # Implement verification
    evaluator/explore.md     # Analysis verification
    evaluator/fix.md         # Fix verification
    planner/plan.md          # Planning context
  templates/                 # Reference templates
  lib/                       # Shell library functions
  skills/                    # Claude Code skills (/loop-init, /loop-plan, /loop-run, /loop-triage)
  contracts/                 # Sprint contracts (generated by /loop-plan)
  triage/                    # Analysis output (explore mode)
  archive/                   # Completed feature archives
```

## Browser Testing (Optional)

The evaluator includes basic runtime verification for web projects (starts a local server, checks HTTP response). For full browser testing with console error detection and screenshots, install the Playwright MCP server:

```bash
claude mcp add playwright npx @playwright/mcp@latest --headless --browser=chromium
```

When Playwright is available, the evaluator will use it to:

- Navigate to the running application
- Check for JavaScript console errors
- Take screenshots for visual verification
- Reject stories with runtime errors

This is optional: the evaluator works without it, but may miss runtime issues that only surface in a browser.

## Design Principles

- **Fresh context per iteration**: no accumulated hallucination drift
- **Separate generation from evaluation**: external skepticism is easier to tune than self-criticism
- **Human judgment for planning, AI for execution**: interactive `/loop-plan`, autonomous loop
- **Structured handoffs via artifacts**: not conversation memory
- **No git revert on rejection**: next generator sees partial work + feedback (more signal)
- **Advisory scope budgets**: prompt-enforced limits on files read/written per iteration

## Credits

- [Geoffrey Huntley](https://ghuntley.com/ralph/): original Ralph pattern
- [Anthropic Engineering](https://www.anthropic.com/engineering/harness-design-long-running-apps): generator-evaluator harness design
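
## Appendix: Control-Flow Sketch

The iteration flow described under How It Works (pick a story, generate, evaluate, retry on `REJECT`, stop when every story passes or the iteration cap is hit) can be sketched in bash roughly as follows. This is an illustrative sketch only, not the actual `loop.sh`: the function names, the `PASSED` and `ORDER` variables, and the stubbed generator and evaluator are all hypothetical stand-ins for fresh agent instances and for the story status kept in `prd.json`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the generator-evaluator control flow; NOT the real loop.sh.
set -u

MAX_ITERATIONS=20        # mirrors the documented --max default
ORDER="US-001 US-002"    # stand-in for story priority order in prd.json
PASSED=" "               # space-delimited list of stories marked as passing
REJECTED_ONCE=0          # lets the stub evaluator reject US-002 exactly once
VERDICT=""               # set by run_evaluator

# Stub: the real generator would be a fresh agent that implements and commits.
run_generator() { echo "generator: implementing $1"; }

# Stub: the real evaluator would be a separate fresh agent issuing a verdict.
run_evaluator() {
  if [[ "$1" == "US-002" && "$REJECTED_ONCE" -eq 0 ]]; then
    REJECTED_ONCE=1
    VERDICT="REJECT"
  else
    VERDICT="PASS"
  fi
}

# Print the highest-priority story that has not passed; fail if none remain.
next_story() {
  local s
  for s in $ORDER; do
    if [[ "$PASSED" != *" $s "* ]]; then
      echo "$s"
      return 0
    fi
  done
  return 1
}

run_loop() {
  local i story
  for (( i = 1; i <= MAX_ITERATIONS; i++ )); do
    if ! story=$(next_story); then
      echo "all stories pass after $(( i - 1 )) iterations"
      return 0
    fi
    run_generator "$story"
    run_evaluator "$story"
    if [[ "$VERDICT" == "PASS" ]]; then
      PASSED+="$story "
    else
      # No revert: the story stays incomplete and is retried next iteration.
      echo "evaluator rejected $story; retrying next iteration"
    fi
  done
  echo "max iterations reached"
  return 1
}

run_loop
```

In this sketch a rejected story is simply left unmarked, so the next iteration picks it up again. That mirrors the "no git revert on rejection" principle, although how the real harness passes evaluator feedback to the next generator is not shown here.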