Agent Loop

Autonomous AI agent harness that combines a generator-evaluator architecture with iterative context-reset patterns for long-running coding tasks.

Inspired by Geoffrey Huntley's Ralph pattern and Anthropic's harness design research.

A generator-evaluator loop runs fresh agent instances per iteration. Each iteration: a Generator does the work, then an Evaluator verifies it. Human judgment stays in the planning phase; execution is autonomous.

Two execution modes: headless via loop.sh (fully autonomous bash process) or interactive via /loop-run (Claude Code-native with full visibility and intervention).

Install

/plugin marketplace add https://git.jagfly.com/sheldon/loop-loop.git
/plugin install agent-loop@agent-loop

Then in any project:

/agent-loop:init          # Set up the loop for your project
/agent-loop:plan          # Generate PRD and sprint contracts
/agent-loop:run           # Run the loop interactively

Manual Install

# Copy the harness into your project as .loop
cp -r /path/to/loop-loop .loop

# Install skills as Claude Code commands
mkdir -p .claude/commands
for skill in loop-init loop-plan loop-run loop-triage; do
    ln -sf "../../.loop/skills/$skill/SKILL.md" ".claude/commands/$skill.md"
done

# Then in Claude Code, run each command in turn:
/loop-init
/loop-plan
/loop-run

How It Works

[You + Claude Code]                    [Loop Execution]

/agent-loop:init                       Interactive (/agent-loop:run)
  → scaffolds .loop/                     └─ dispatches Agent subagents
  → detects project                      └─ visible tool calls, can intervene
  → picks mode                           └─ chat mid-loop to adjust course
  → creates config.json
                                        Headless (.loop/loop.sh)
/agent-loop:plan                         └─ spawns claude --print per iteration
  → asks clarifying questions            └─ fully autonomous, no UI
  → generates prd.json
  → generates sprint contracts          Both paths:
  → populates progress.md                ├─→ Generator → picks story → implements → commits
                                          ├─→ Evaluator → verifies → PASS or REJECT
                                          ├─→ next iteration...
                                          └─→ all stories pass → done
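
The control flow in the diagram can be sketched in bash as follows. This is a minimal illustration, not the contents of loop.sh: `run_generator`, `run_evaluator`, and `stories_remaining` are hypothetical stand-ins for the real agent invocations and prd.json checks, stubbed here so the sketch runs on its own.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the generator-evaluator loop; not the real loop.sh.

# Stub state: pretend the PRD has two incomplete stories.
REMAINING=2

# Assumption: the real check would read story status from prd.json.
stories_remaining() { [ "$REMAINING" -gt 0 ]; }
# Stand-in for spawning a fresh generator agent.
run_generator() { echo "generator: implementing story"; }
# Stand-in for a fresh evaluator agent; a real one may also print REJECT.
run_evaluator() { echo "PASS"; }

run_loop() {
    local max_iterations="$1" i verdict
    for (( i = 1; i <= max_iterations; i++ )); do
        if ! stories_remaining; then
            echo "all stories pass after $(( i - 1 )) iteration(s)"
            return 0
        fi
        run_generator
        verdict="$(run_evaluator)"
        if [ "$verdict" = "PASS" ]; then
            REMAINING=$(( REMAINING - 1 ))   # story marked done
        fi
        # On REJECT the story stays open and the next iteration retries it.
    done
    if stories_remaining; then
        echo "max iterations reached"
        return 1
    fi
    echo "all stories pass after $max_iterations iteration(s)"
}
```

Calling `run_loop 5` with the stubs above runs two generator/evaluator passes and then reports that all stories pass; `run_loop 1` stops at the iteration cap with a non-zero status.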

Modes

| Mode      | What it does                   | Git writes? |
|-----------|--------------------------------|-------------|
| implement | Build features from a PRD      | Yes         |
| explore   | Read-only codebase analysis    | No          |
| fix       | Targeted bug fixes / tech debt | Yes         |

Running the Loop

Option A: Interactive (/loop-run)

Run inside Claude Code. You see every tool call, file edit, and test run. You can intervene at any point: deny a tool call, chat to adjust course, or stop the loop.

/loop-run                    # Run until done or max iterations
/loop-run 3                  # Run at most 3 iterations
/loop-run --skip-eval        # Skip evaluator pass
/loop-run --story US-003     # Run only a specific story

Option B: Headless (loop.sh)

Run as a standalone bash process. Fully autonomous — no UI, no intervention. Useful for background execution or CI.

.loop/loop.sh [options]

--mode <implement|explore|fix>   Operating mode
--max <N>                        Maximum iterations (default: 20)
--skip-eval                      Skip evaluator pass
--tool <claude|amp>              AI tool to use
--no-hooks                       Don't install stop hooks
--dry-run                        Print assembled prompts without running agents
--resume                         Skip already-passed stories (explicit exit when none remain)
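
For illustration, the flags above could be parsed along these lines. This is a sketch of one plausible approach, not the actual loop.sh source: the function name, the variable names, and the `implement` and `claude` defaults are assumptions. Only the `--max` default of 20 is documented above.

```shell
#!/usr/bin/env bash
# Hypothetical option parser for the documented flags; names are illustrative.

parse_loop_args() {
    MODE="implement"    # assumed default; the README does not state one
    MAX_ITERATIONS=20   # documented default
    SKIP_EVAL=0
    TOOL="claude"       # assumed default
    INSTALL_HOOKS=1
    DRY_RUN=0
    RESUME=0

    while [ "$#" -gt 0 ]; do
        case "$1" in
            --mode)
                case "${2:-}" in
                    implement|explore|fix) MODE="$2" ;;
                    *) echo "invalid --mode: ${2:-<missing>}" >&2; return 2 ;;
                esac
                shift 2 ;;
            --max)
                case "${2:-}" in
                    ''|*[!0-9]*) echo "--max expects a number" >&2; return 2 ;;
                esac
                MAX_ITERATIONS="$2"; shift 2 ;;
            --tool)
                case "${2:-}" in
                    claude|amp) TOOL="$2" ;;
                    *) echo "invalid --tool: ${2:-<missing>}" >&2; return 2 ;;
                esac
                shift 2 ;;
            --skip-eval) SKIP_EVAL=1; shift ;;
            --no-hooks)  INSTALL_HOOKS=0; shift ;;
            --dry-run)   DRY_RUN=1; shift ;;
            --resume)    RESUME=1; shift ;;
            *) echo "unknown option: $1" >&2; return 2 ;;
        esac
    done
}
```

For example, `parse_loop_args --mode fix --max 5 --resume` would leave `MODE=fix`, `MAX_ITERATIONS=5`, and `RESUME=1`, with the other settings at their defaults.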

Architecture

Generator

Fresh Claude Code instance each iteration. Reads prd.json to find the highest-priority incomplete story, reads the sprint contract, implements the story, runs quality gates, commits, and marks it done.

Evaluator

Separate fresh instance after each generator pass. Skeptically verifies the work: checks acceptance criteria against actual code, runs tests independently, and issues a PASS or REJECT verdict. Rejection sends the story back to the generator with specific feedback.

Evaluator skepticism is deliberately tuned: left to its defaults, Claude tends to rationalize away issues, so the evaluator prompt includes explicit bias correction.
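
One way a harness might turn evaluator output into a decision is sketched below. The `VERDICT: PASS` / `VERDICT: REJECT` line format and the `parse_verdict` helper are assumptions made for illustration; this README does not specify how the verdict is encoded. Treating anything other than an explicit final PASS as a rejection keeps the default skeptical.

```shell
#!/usr/bin/env bash
# Hypothetical verdict parser; the "VERDICT: ..." line format is an assumption.

# Reads evaluator output on stdin and prints PASS or REJECT.
# Only an explicit final "VERDICT: PASS" line counts as a pass.
parse_verdict() {
    local last
    # "|| true" keeps a no-match grep from aborting under set -e / pipefail.
    last="$(grep -E '^VERDICT: (PASS|REJECT)$' | tail -n 1 || true)"
    if [ "$last" = "VERDICT: PASS" ]; then
        echo "PASS"
    else
        echo "REJECT"
    fi
}
```

Usage: `some_evaluator_command | parse_verdict`, where `some_evaluator_command` stands for whatever produces the evaluator's text output.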

Sprint Contracts

Before the loop starts, /loop-plan generates contracts for each story. These define "done" conditions that both generator and evaluator reference, eliminating ambiguity about whether work is complete.

State Persistence

| Artifact    | Purpose                                      |
|-------------|----------------------------------------------|
| prd.json    | Story status (pass/fail), acceptance criteria |
| progress.md | Append-only session log + codebase patterns  |
| contracts/  | Sprint contracts per story                   |
| config.json | Harness configuration                        |
| Git commits | Code changes with story-tagged messages      |
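
The append-only convention for progress.md can be illustrated with a small helper. The `log_progress` function and the entry layout (timestamp, story ID, note) are hypothetical; the actual log format is not documented here.

```shell
#!/usr/bin/env bash
# Hypothetical helper showing append-only writes; the entry layout is an assumption.

# Usage: log_progress <file> <story-id> <note>
log_progress() {
    local file="$1" story="$2" note="$3"
    # ">>" appends, so earlier entries are never rewritten or truncated.
    printf '## %s [%s]\n%s\n\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$story" "$note" >> "$file"
}
```

Because each fresh agent instance starts without conversation memory, a log that only grows gives the next iteration a stable record to read from.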

File Structure

.loop/
  loop.sh                        # Main loop orchestrator
  config.json                    # Project config (generated by /loop-init)
  init.sh                        # Project setup script (generated by /loop-init)
  prd.json                       # Active PRD (generated by /loop-plan)
  progress.md                    # Cross-session memory (append-only)

  prompts/
    generator/_base.md           # Shared generator instructions
    generator/implement.md       # Implement mode overlay
    generator/explore.md         # Explore mode overlay
    generator/fix.md             # Fix mode overlay
    evaluator/_base.md           # Skeptical evaluator base
    evaluator/implement.md       # Implement verification
    evaluator/explore.md         # Analysis verification
    evaluator/fix.md             # Fix verification
    planner/plan.md              # Planning context

  templates/                     # Reference templates
  lib/                           # Shell library functions
  skills/                        # Claude Code skills (/loop-init, /loop-plan, /loop-run, /loop-triage)
  contracts/                     # Sprint contracts (generated by /loop-plan)
  triage/                        # Analysis output (explore mode)
  archive/                       # Completed feature archives
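
Given the `_base.md` plus per-mode overlay layout above, assembling a role's prompt plausibly amounts to concatenating the two files. The sketch below assumes exactly that; `assemble_prompt` is a hypothetical name, and the real harness may add further context such as the sprint contract or progress.md.

```shell
#!/usr/bin/env bash
# Hypothetical prompt assembly: base instructions followed by the mode overlay.

# Usage: assemble_prompt <prompts-dir> <generator|evaluator> <implement|explore|fix>
assemble_prompt() {
    local dir="$1" role="$2" mode="$3"
    local base="$dir/$role/_base.md" overlay="$dir/$role/$mode.md"
    if [ ! -f "$base" ] || [ ! -f "$overlay" ]; then
        echo "missing prompt file for $role/$mode" >&2
        return 1
    fi
    cat "$base"
    printf '\n'        # blank line between base and overlay
    cat "$overlay"
}
```

For example, `assemble_prompt .loop/prompts evaluator fix` would print the skeptical evaluator base followed by the fix-mode verification overlay.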

Browser Testing (Optional)

The evaluator includes basic runtime verification for web projects (starts a local server, checks HTTP response). For full browser testing with console error detection and screenshots, install the Playwright MCP server:

claude mcp add playwright -- npx @playwright/mcp@latest --headless --browser=chromium

When Playwright is available, the evaluator will use it to:

  • Navigate to the running application
  • Check for JavaScript console errors
  • Take screenshots for visual verification
  • Reject stories with runtime errors

This is optional — the evaluator works without it, but may miss runtime issues that only surface in a browser.

Design Principles

  • Fresh context per iteration — no accumulated hallucination drift
  • Separate generation from evaluation — external skepticism is easier to tune than self-criticism
  • Human judgment for planning, AI for execution — interactive /loop-plan, autonomous loop
  • Structured handoffs via artifacts — not conversation memory
  • No git revert on rejection — next generator sees partial work + feedback (more signal)
  • Advisory scope budgets — prompt-enforced limits on files read/written per iteration

Credits