# Agent Loop

Autonomous AI agent harness that combines a generator-evaluator architecture with iterative context-reset patterns for long-running coding tasks.

Inspired by [Geoffrey Huntley's Ralph pattern](https://ghuntley.com/ralph/) and [Anthropic's harness design research](https://www.anthropic.com/engineering/harness-design-long-running-apps).

Each iteration starts fresh agent instances: a **Generator** does the work, then an **Evaluator** verifies it. Human judgment stays in the planning phase; execution is autonomous.

There are two ways to run the loop: **interactive** via `/loop-run` (Claude Code-native, with full visibility and the ability to intervene) or **headless** via `loop.sh` (a fully autonomous bash process).

## Install

### As a Claude Code Plugin (Recommended)

```
/plugin marketplace add https://git.jagfly.com/sheldon/loop-loop.git
/plugin install agent-loop@agent-loop
```

Then in any project:

```
/agent-loop:init          # Set up the loop for your project
/agent-loop:plan          # Generate PRD and sprint contracts
/agent-loop:run           # Run the loop interactively
```

### Manual Install

```bash
# Copy a local checkout of loop-loop into your project as .loop/
cp -r /path/to/loop-loop .loop

# Install skills as Claude Code commands
mkdir -p .claude/commands
for skill in loop-init loop-plan loop-run loop-triage; do
    ln -sf "../../.loop/skills/$skill/SKILL.md" ".claude/commands/$skill.md"
done

# Then, inside Claude Code, run the skills in order:
#   /loop-init
#   /loop-plan
#   /loop-run
```
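If `.loop/` does not sit at the project root, the relative symlinks created above will dangle. The helper below is a hypothetical convenience (`check_skill_links` is not shipped with the harness) that confirms each skill link resolves to an existing `SKILL.md`:

```bash
# Hypothetical helper (not part of the harness): verify that each skill
# symlink in .claude/commands resolves to an existing SKILL.md.
check_skill_links() {
  local dir=${1:-.claude/commands}
  local skill status=0
  for skill in loop-init loop-plan loop-run loop-triage; do
    # [ -e ] follows symlinks, so a dangling link counts as missing.
    if [ -e "$dir/$skill.md" ]; then
      echo "ok       $skill"
    else
      echo "missing  $skill ($dir/$skill.md)" >&2
      status=1
    fi
  done
  return "$status"
}
```

Run `check_skill_links` from the project root; it returns non-zero if any of the four commands would fail to load.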

## How It Works

```
[You + Claude Code]                    [Loop Execution]

/agent-loop:init                       Interactive (/agent-loop:run)
  → scaffolds .loop/                     ├─ dispatches Agent subagents
  → detects project                      ├─ visible tool calls, can intervene
  → picks mode                           └─ chat mid-loop to adjust course
  → creates config.json
                                       Headless (.loop/loop.sh)
/agent-loop:plan                         ├─ spawns claude --print per iteration
  → asks clarifying questions            └─ fully autonomous, no UI
  → generates prd.json
  → generates sprint contracts         Both paths:
  → populates progress.md                ├─→ Generator → picks story → implements → commits
                                         ├─→ Evaluator → verifies → PASS or REJECT
                                         ├─→ next iteration...
                                         └─→ all stories pass → done
```
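Stripped of agent details, the shared flow is an ordinary bounded loop. The sketch below is hypothetical and is not `loop.sh`'s code: `run_generator`, `run_evaluator`, and `all_stories_pass` are placeholder hooks, and the convention that the evaluator exits 0 for `PASS` is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the generator-evaluator loop (not loop.sh's code).
# Each hook stands in for a fresh agent instance, e.g. one `claude --print`
# call per role in headless mode.
run_generator()    { echo "generator: iteration $1"; }
run_evaluator()    { return 0; }  # assumed convention: exit 0 = PASS, non-zero = REJECT
all_stories_pass() { return 1; }  # placeholder: would check story status in prd.json

agent_loop() {
  local max=${1:-20} skip_eval=${2:-0} i
  for ((i = 1; i <= max; i++)); do
    if all_stories_pass; then
      echo "all stories pass after $((i - 1)) iteration(s)"
      return 0
    fi
    run_generator "$i"
    if [ "$skip_eval" -eq 1 ]; then
      continue
    fi
    if run_evaluator "$i"; then
      echo "iteration $i: PASS"
    else
      # No git revert: the next generator sees the partial work plus feedback.
      echo "iteration $i: REJECT"
    fi
  done
  # Re-check so work finished on the final iteration still counts.
  if all_stories_pass; then
    echo "all stories pass after $max iteration(s)"
    return 0
  fi
  echo "stopped: reached max iterations ($max)" >&2
  return 1
}
```

Because every hook call would start a fresh process, nothing carries over between iterations except what is on disk (`prd.json`, `progress.md`, git), which is the point of the design.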

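## Modes

The mode is chosen during `/loop-init`, or passed with `--mode` when running headless.

| Mode | What it does | Commits to git? |
|------|-------------|-----------------|
| **implement** | Build features from a PRD | Yes |
| **explore** | Read-only codebase analysis; findings go to `.loop/triage/` | No |
| **fix** | Targeted bug fixes / tech debt | Yes |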
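The table reduces to one rule: only `implement` and `fix` commit. A wrapper script could encode that rule with a guard like the following (`mode_commits` is a hypothetical helper, not part of `loop.sh`):

```bash
# Hypothetical helper mirroring the Modes table (not part of loop.sh).
# Prints "yes" if the mode commits to git, "no" if it is read-only.
mode_commits() {
  case "$1" in
    implement|fix) echo yes ;;
    explore)       echo no ;;
    *)
      echo "unknown mode: '$1' (expected implement, explore, or fix)" >&2
      return 2
      ;;
  esac
}
```

For example, `[ "$(mode_commits explore)" = no ]` succeeds, so a wrapper can skip `git commit` entirely in explore mode.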

## Running the Loop

### Option A: Interactive (`/loop-run`) — Recommended

Run inside Claude Code. You see every tool call, file edit, and test run. You can intervene at any point — deny a tool call, chat to adjust course, or stop the loop.

```
/loop-run                    # Run until done or max iterations
/loop-run 3                  # Run at most 3 iterations
/loop-run --skip-eval        # Skip evaluator pass
/loop-run --story US-003     # Run only a specific story
```

### Option B: Headless (`loop.sh`)

Run as a standalone bash process. Fully autonomous — no UI, no intervention. Useful for background execution or CI.

```bash
.loop/loop.sh [options]

--mode <implement|explore|fix>   Operating mode
--max <N>                        Maximum iterations (default: 20)
--skip-eval                      Skip evaluator pass
--tool <claude|amp>              AI tool to use
--no-hooks                       Don't install stop hooks
--dry-run                        Print assembled prompts without running agents
--resume                         Skip stories that already pass; exit explicitly if none remain
```

## Architecture

### Generator
Fresh Claude Code instance each iteration. Reads `prd.json` to find the highest-priority incomplete story, reads that story's sprint contract, implements it, runs the quality gates, commits, and marks the story done in `prd.json`.
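The quality gates are project-specific. As a sketch, assuming the gates are a list of shell commands (the example commands below are hypothetical), running them in order and stopping at the first failure looks like this:

```bash
# Hypothetical sketch: run each gate command in order, stop at the first failure.
# Gate commands are illustrative; a real project supplies its own.
run_quality_gates() {
  local gate
  for gate in "$@"; do
    echo "gate: $gate"
    if ! bash -c "$gate"; then
      echo "gate failed: $gate" >&2
      return 1
    fi
  done
  echo "all gates passed"
}
```

Usage: `run_quality_gates "npm test" "npm run lint"`. Failing fast keeps the output short and points the generator at one problem at a time; a generator following this pattern would commit only when the function returns 0.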

### Evaluator
Separate fresh instance after each generator pass. Skeptically verifies the work: checks acceptance criteria against actual code, runs tests independently, and issues a `PASS` or `REJECT` verdict. Rejection sends the story back to the generator with specific feedback.

Evaluator skepticism is deliberately tuned: by default, Claude tends to notice problems and then talk itself into accepting the work anyway. The evaluator prompt includes explicit bias correction.
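How the verdict is read back is not documented here. Assuming the evaluator is asked to end its output with a line such as `VERDICT: PASS` or `VERDICT: REJECT` (a hypothetical format), a parser that treats anything else as a rejection, in keeping with the skeptical default, could be:

```bash
# Hypothetical parser: reads evaluator output on stdin and prints PASS or REJECT.
# Only exact "VERDICT: PASS" / "VERDICT: REJECT" lines count; the last one wins.
# A missing or malformed verdict is treated as REJECT.
parse_verdict() {
  local verdict
  # grep exits 1 when nothing matches; that case falls through to REJECT.
  verdict=$(grep -Ex 'VERDICT: (PASS|REJECT)' | tail -n 1 | cut -d' ' -f2) || true
  echo "${verdict:-REJECT}"
}
```

Defaulting to `REJECT` means a truncated or rambling evaluator response can never accidentally pass a story.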

### Sprint Contracts
Before the loop starts, `/loop-plan` generates a contract for each story. Each contract defines the "done" conditions that both the generator and the evaluator reference, so both judge completion against the same written criteria.

### State Persistence

| Artifact | Purpose |
|----------|---------|
| `prd.json` | Story status (pass/fail), acceptance criteria |
| `progress.md` | Append-only session log + codebase patterns |
| `contracts/` | Sprint contracts per story |
| `config.json` | Harness configuration |
| Git commits | Code changes with story-tagged messages |
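Because `progress.md` is append-only, writers should only ever use `>>` and never rewrite earlier entries. A hypothetical logging helper follows; the entry format is an assumption, not the harness's own:

```bash
# Hypothetical helper: append one session entry to progress.md.
# Existing content is never rewritten; the entry format is illustrative only.
log_progress() {
  local file=$1 story=$2 verdict=$3 note=$4
  {
    printf '\n## %s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$story" "$verdict"
    printf '%s\n' "$note"
  } >> "$file"
}
```

Usage: `log_progress .loop/progress.md US-003 REJECT "empty-state test missing"`.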

## File Structure

```
.loop/
  loop.sh                        # Main loop orchestrator
  config.json                    # Project config (generated by /loop-init)
  init.sh                        # Project setup script (generated by /loop-init)
  prd.json                       # Active PRD (generated by /loop-plan)
  progress.md                    # Cross-session memory (append-only)

  prompts/
    generator/_base.md           # Shared generator instructions
    generator/implement.md       # Implement mode overlay
    generator/explore.md         # Explore mode overlay
    generator/fix.md             # Fix mode overlay
    evaluator/_base.md           # Skeptical evaluator base
    evaluator/implement.md       # Implement verification
    evaluator/explore.md         # Analysis verification
    evaluator/fix.md             # Fix verification
    planner/plan.md              # Planning context

  templates/                     # Reference templates
  lib/                           # Shell library functions
  skills/                        # Claude Code skills (/loop-init, /loop-plan, /loop-run, /loop-triage)
  contracts/                     # Sprint contracts (generated by /loop-plan)
  triage/                        # Analysis output (explore mode)
  archive/                       # Completed feature archives
```
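Each generator and evaluator prompt is a shared `_base.md` plus a per-mode overlay. Assuming the two are simply concatenated (the real assembly in `loop.sh` may add more context), a minimal sketch follows; it does not cover `planner/plan.md`, which has no base/overlay pair.

```bash
# Hypothetical sketch: build a role's prompt from _base.md plus the mode overlay.
# Usage: assemble_prompt <loop-dir> <generator|evaluator> <implement|explore|fix>
assemble_prompt() {
  local root=$1 role=$2 mode=$3
  local base="$root/prompts/$role/_base.md"
  local overlay="$root/prompts/$role/$mode.md"
  local f
  for f in "$base" "$overlay"; do
    if [ ! -f "$f" ]; then
      echo "missing prompt file: $f" >&2
      return 1
    fi
  done
  cat "$base"
  echo    # blank line between base and overlay
  cat "$overlay"
}
```

For example, `assemble_prompt .loop evaluator fix | less` shows what a fix-mode evaluator would be told under this assumption.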

## Browser Testing (Optional)

For web projects, the evaluator performs basic runtime verification: it starts a local server and checks the HTTP response. For full browser testing with console-error detection and screenshots, install the Playwright MCP server:

```bash
claude mcp add playwright -- npx @playwright/mcp@latest --headless --browser=chromium
```

When Playwright is available, the evaluator will use it to:
- Navigate to the running application
- Check for JavaScript console errors
- Take screenshots for visual verification
- Reject stories with runtime errors

This is optional — the evaluator works without it, but may miss runtime issues that only surface in a browser.
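Without Playwright, the basic check amounts to polling the local server until it answers. The sketch below uses `curl`; the function name, the attempt count, and the rule that any 2xx/3xx status counts as healthy are assumptions:

```bash
# Hypothetical sketch: poll a URL until it returns a 2xx/3xx status.
# Usage: wait_for_http <url> [attempts]   (one attempt per second, default 30)
wait_for_http() {
  local url=$1 attempts=${2:-30} code=000 i
  for ((i = 1; i <= attempts; i++)); do
    # curl reports 000 as the status when it cannot connect at all.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$url") || true
    if [ "$code" -ge 200 ] && [ "$code" -lt 400 ]; then
      echo "healthy: $url ($code)"
      return 0
    fi
    sleep 1
  done
  echo "no healthy response from $url (last status: $code)" >&2
  return 1
}
```

Start the dev server in the background first, then call `wait_for_http` with its URL (the port is whatever your project uses). A 200 only proves the server answers; JavaScript console errors still need the browser-based check.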

## Design Principles

- **Fresh context per iteration** — no accumulated hallucination drift
- **Separate generation from evaluation** — external skepticism is easier to tune than self-criticism
- **Human judgment for planning, AI for execution** — interactive `/loop-plan`, autonomous loop
- **Structured handoffs via artifacts** — not conversation memory
- **No git revert on rejection** — next generator sees partial work + feedback (more signal)
- **Advisory scope budgets** — prompt-enforced limits on files read/written per iteration

## Credits

- [Geoffrey Huntley](https://ghuntley.com/ralph/) — original Ralph pattern
- [Anthropic Engineering](https://www.anthropic.com/engineering/harness-design-long-running-apps) — generator-evaluator harness design