# Agent Loop

Autonomous AI agent harness that combines a generator-evaluator architecture with iterative context-reset patterns for long-running coding tasks. Inspired by [Geoffrey Huntley's Ralph pattern](https://ghuntley.com/ralph/) and [Anthropic's harness design research](https://www.anthropic.com/engineering/harness-design-long-running-apps).

A generator-evaluator loop runs fresh Claude Code sessions per iteration. Each iteration: a **Generator** does the work, then an **Evaluator** verifies it. Human judgment stays in the planning phase; execution is autonomous with full visibility.

## Install

### As a Claude Code Plugin (Recommended)

```
/plugin marketplace add https://git.jagfly.com/sheldon/loop-loop.git
/plugin install agent-loop@agent-loop
```

Then in any project:

```
/agent-loop:run
```

That's it. The single command handles setup, planning, and execution.

### Manual Install

```bash
cp -r /path/to/loop-loop .loop
```

Then run `.loop/loop.sh` directly.

## How It Works

```
/agent-loop:run
  ├─ Phase 1: Scaffold .loop/ (if needed)
  ├─ Phase 2: Generate stories from spec (if needed)
  │    ├─ Presents stories for human review
  │    └─ STOPS: user reviews and says "go"
  └─ Phase 3: Launch loop in tmux
       ├─→ Generator → picks story → implements → commits
       ├─→ Evaluator → verifies → PASS or REJECT
       ├─→ next iteration (fresh CC session each time)
       └─→ all stories pass → done
```

## Modes

| Mode | What it does | Git writes? |
|------|-------------|-------------|
| **implement** | Build features from a PRD | Yes |
| **explore** | Read-only codebase analysis | No |
| **fix** | Targeted bug fixes / tech debt | Yes |

## Monitoring

After the loop launches in tmux:

```bash
# Watch live (from Claude Code)
! tmux attach -t agent-loop

# Detach back to Claude Code
Ctrl+B then D

# Stop the loop
Ctrl+C in the tmux session
```

Or ask Claude Code "status"; it reads `.loop/prd.json` and `.loop/progress.md`.

Each generator and evaluator run is a full Claude Code session saved to history.
Use `claude -r` to resume any session and inspect what happened, debug a rejection, or continue from where it left off.

## Headless Mode

For CI or background execution without the interactive UI:

```bash
.loop/loop.sh --headless [options]

  --mode        Operating mode
  --max         Maximum iterations (default: 20)
  --skip-eval   Skip evaluator pass
  --dry-run     Print assembled prompts without running
```

## Architecture

### Generator

Fresh Claude Code session each iteration. Reads `prd.json` to find the highest-priority incomplete story, reads the sprint contract, implements the story, runs quality gates, commits, and marks it done.

### Evaluator

Separate fresh session after each generator pass. Skeptically verifies the work: checks acceptance criteria against actual code, runs tests and the application, and issues a `PASS` or `REJECT` verdict. Rejection sends the story back with specific feedback.

Evaluator skepticism is deliberately tuned: Claude's default tendency is to rationalize away issues. The evaluator prompt includes explicit bias correction.

### Sprint Contracts

Before the loop starts, the planner generates contracts for each story. These define "done" conditions that both generator and evaluator reference, eliminating ambiguity about whether work is complete.

### State Persistence

| Artifact | Purpose |
|----------|---------|
| `prd.json` | Story status (pass/fail), acceptance criteria |
| `progress.md` | Append-only session log + codebase patterns |
| `contracts/` | Sprint contracts per story |
| `config.json` | Harness configuration |
| Git commits | Code changes with story-tagged messages |

## Optional: Runtime Testing Tools

The evaluator verifies code actually runs, not just that it looks correct. It uses whatever tools are available.
For richer verification, install these optional MCP servers:

**Web projects (Playwright):**

```bash
claude mcp add playwright npx @playwright/mcp@latest --headless --browser=chromium
```

**iOS/Xcode projects (XcodeBuildMCP):**

```bash
brew tap getsentry/xcodebuildmcp && brew install xcodebuildmcp
claude mcp add xcodebuild -- xcodebuildmcp
```

**iOS Simulator interaction:**

```bash
claude mcp add ios-simulator -- npx -y ios-simulator-mcp
```

These are optional: the evaluator works without them but may miss runtime-only issues.

## Design Principles

- **Fresh context per iteration**: no accumulated hallucination drift
- **Separate generation from evaluation**: external skepticism is easier to tune than self-criticism
- **Human judgment for planning, AI for execution**: human reviews stories, loop executes autonomously
- **Structured handoffs via artifacts**: not conversation memory
- **No git revert on rejection**: next generator sees partial work + feedback (more signal)
- **Tool-agnostic**: evaluator uses whatever tools are available, no hardcoded dependencies

## Credits

- [Geoffrey Huntley](https://ghuntley.com/ralph/): original Ralph pattern
- [Anthropic Engineering](https://www.anthropic.com/engineering/harness-design-long-running-apps): generator-evaluator harness design
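
## Sketch: Loop Control Flow

The control flow described under How It Works and Architecture can be sketched roughly as below. This is an illustration only, not the contents of `loop.sh`: `run_generator`, `run_evaluator`, `remaining_stories`, and `run_loop` are hypothetical names, and the stubs stand in for fresh Claude Code sessions and `prd.json` lookups. Only the default cap of 20 iterations and the `PASS`/`REJECT` verdicts come from this README.

```bash
# Hypothetical sketch of the generator-evaluator cycle; all names are assumptions.

MAX_ITERATIONS=20   # mirrors the documented --max default
remaining=3         # pretend three stories are still incomplete

# Stand-in for counting incomplete stories in prd.json.
remaining_stories() { echo "$remaining"; }

# Stand-in for a fresh generator session (pick story, implement, commit).
run_generator() { :; }

# Stand-in for a fresh evaluator session.
# Rejects every third iteration so the retry path is exercised.
run_evaluator() {
  if [ $(( $1 % 3 )) -eq 0 ]; then echo "REJECT"; else echo "PASS"; fi
}

run_loop() {
  i=0
  while [ "$(remaining_stories)" -gt 0 ]; do
    if [ "$i" -ge "$MAX_ITERATIONS" ]; then
      echo "stopped at iteration cap"
      return 1
    fi
    i=$(( i + 1 ))
    run_generator "$i"
    verdict=$(run_evaluator "$i")
    if [ "$verdict" = "PASS" ]; then
      remaining=$(( remaining - 1 ))
    fi
    # On REJECT nothing is reverted; the next iteration simply tries again.
  done
  echo "done after $i iterations"
}
```

With these stubs, calling `run_loop` prints `done after 4 iterations`: iterations 1, 2, and 4 pass, and iteration 3 is rejected and retried. Lowering `MAX_ITERATIONS` to 2 makes it stop at the cap instead.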