Sheldon Finlay 9a7fa3a1bd fix: enforce strict orientation sequence in generator prompt

Add git log step and explicit gate requiring all startup steps
complete before implementation begins. Based on Anthropic's
prompting guide recommendation for prescriptive session orientation.

2026-03-27 21:07:48 -04:00

Name	Last commit	Date
.claude	fix: correct URLs, author name, and clean up stale hook	2026-03-27 19:00:26 -04:00
.claude-plugin	fix: correct URLs, author name, and clean up stale hook	2026-03-27 19:00:26 -04:00
agents	feat: bash setup script, planner agent with disallowedTools, simplified skills	2026-03-27 09:23:42 -04:00
lib	feat: US-002 - Guard against data loss in archive.sh	2026-03-27 18:40:31 -04:00
prompts	fix: enforce strict orientation sequence in generator prompt	2026-03-27 21:07:48 -04:00
skills	feat: background watcher notifies CC session when loop completes	2026-03-27 15:22:43 -04:00
templates	feat: agent loop harness with Claude Code plugin support	2026-03-27 08:03:18 -04:00
.gitignore	feat: agent loop harness with Claude Code plugin support	2026-03-27 08:03:18 -04:00
config.json.example	feat: US-007 - Increase evalRetries default from 2 to 3	2026-03-27 18:49:40 -04:00
CONTRIBUTING.md	fix: correct URLs, author name, and clean up stale hook	2026-03-27 19:00:26 -04:00
init.sh.example	feat: US-006 - Improve init.sh.example with project-type guidance	2026-03-27 18:47:44 -04:00
install.sh	fix: critical bugs, stale refs, README rewrite, security fixes	2026-03-27 14:58:01 -04:00
LICENSE	fix: correct URLs, author name, and clean up stale hook	2026-03-27 19:00:26 -04:00
loop.sh	feat: US-007 - Increase evalRetries default from 2 to 3	2026-03-27 18:49:40 -04:00
README.md	fix: correct URLs, author name, and clean up stale hook	2026-03-27 19:00:26 -04:00
setup.sh	feat: US-007 - Increase evalRetries default from 2 to 3	2026-03-27 18:49:40 -04:00

README.md

Agent Loop

Autonomous AI agent harness that combines a generator-evaluator architecture with iterative context-reset patterns for long-running coding tasks.

Inspired by Geoffrey Huntley's Ralph pattern and Anthropic's harness design research.

A generator-evaluator loop runs fresh Claude Code sessions per iteration. Each iteration: a Generator does the work, then an Evaluator verifies it. Human judgment stays in the planning phase; execution is autonomous with full visibility.

Install

As a Claude Code Plugin (Recommended)

/plugin marketplace add https://git.jagfly.com/sheldon/loop-loop.git
/plugin install agent-loop@agent-loop

Then in any project:

/agent-loop:run

That's it. The single command handles setup, planning, and execution.

Manual Install

cp -r /path/to/loop-loop .loop

Then run .loop/loop.sh directly.

How It Works

/agent-loop:run
  ├─ Phase 1: Scaffold .loop/ (if needed)
  ├─ Phase 2: Generate stories from spec (if needed)
  │    └─ Presents stories for human review
  │    └─ STOPS — user reviews and says "go"
  └─ Phase 3: Launch loop in tmux
       ├─→ Generator → picks story → implements → commits
       ├─→ Evaluator → verifies → PASS or REJECT
       ├─→ next iteration (fresh CC session each time)
       └─→ all stories pass → done
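
The control flow of Phase 3 can be sketched in a few lines of bash. This is a minimal illustration, not the actual loop.sh: `run_generator`, `run_evaluator`, `next_story`, `run_loop`, and the story IDs are all hypothetical stand-ins, and the real harness launches a fresh Claude Code session where the stubs below just print a line.

```bash
#!/usr/bin/env bash
# Sketch only: stub functions stand in for real Claude Code sessions.

MAX_ITERATIONS=${MAX_ITERATIONS:-20}  # same default as --max
STORIES="US-001 US-002"               # hypothetical story IDs
PASSED=" "                            # space-delimited list of accepted stories

# Hypothetical stand-ins; the real harness starts a fresh session for each.
run_generator() { echo "generator: implement $1"; }
run_evaluator() { echo "evaluator: verify $1"; return 0; }  # 0 = PASS, non-zero = REJECT

# Print the first story that has not passed yet; fail when none remain.
next_story() {
  local s
  for s in $STORIES; do
    case "$PASSED" in
      *" $s "*) ;;                 # already accepted, skip
      *) echo "$s"; return 0 ;;
    esac
  done
  return 1
}

run_loop() {
  local i=0 story
  while story=$(next_story); do
    if [ "$i" -ge "$MAX_ITERATIONS" ]; then
      echo "iteration limit reached"
      return 1
    fi
    i=$((i + 1))
    run_generator "$story"
    if run_evaluator "$story"; then
      PASSED="$PASSED$story "      # PASS: mark the story done
    else
      echo "REJECT: $story stays open for the next iteration"
    fi
  done
  echo "all stories pass after $i iterations"
}
```

Calling `run_loop` walks the stories in order. A rejected story is simply picked up again on the next iteration, which mirrors the "no git revert on rejection" principle described later.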

Modes

Mode	What it does	Git writes?
implement	Build features from a PRD	Yes
explore	Read-only codebase analysis	No
fix	Targeted bug fixes / tech debt	Yes
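
As a rough illustration of the table above, a script could gate commits on the mode with a small helper. The function name `mode_allows_git_writes` is hypothetical and is not taken from loop.sh.

```bash
# Hypothetical helper: succeeds only for modes that write to git.
mode_allows_git_writes() {
  case "$1" in
    implement|fix) return 0 ;;   # these modes commit changes
    explore)       return 1 ;;   # read-only analysis
    *) echo "unknown mode: $1" >&2; return 2 ;;
  esac
}
```

A caller might use it as `mode_allows_git_writes "$MODE" && git commit ...`, so that explore runs never reach the commit step.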

Monitoring

After the loop launches in tmux:

# Watch live (from Claude Code)
! tmux attach -t agent-loop

# Detach back to Claude Code
Ctrl+B then D

# Stop the loop
Ctrl+C in the tmux session

Or ask Claude Code "status" — it reads .loop/prd.json and .loop/progress.md.

Each generator and evaluator run is a full Claude Code session saved to history. Use claude -r to resume any session and inspect what happened, debug a rejection, or continue from where it left off.

Headless Mode

For CI or background execution without the interactive UI:

.loop/loop.sh --headless [options]

--mode <implement|explore|fix>   Operating mode
--max <N>                        Maximum iterations (default: 20)
--skip-eval                      Skip evaluator pass
--dry-run                        Print assembled prompts without running
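
The flags above could be parsed along these lines. This is a sketch under assumptions, not the parser in loop.sh: the function name `parse_args`, the variable names, and the `implement` default for `--mode` are all hypothetical (the README states a default only for `--max`).

```bash
# Hypothetical parser for the documented flags; the real loop.sh may differ.
parse_args() {
  HEADLESS=0
  MODE="implement"        # assumed default; the README does not state one
  MAX_ITERATIONS=20       # documented default for --max
  SKIP_EVAL=0
  DRY_RUN=0

  while [ "$#" -gt 0 ]; do
    case "$1" in
      --headless)  HEADLESS=1 ;;
      --skip-eval) SKIP_EVAL=1 ;;
      --dry-run)   DRY_RUN=1 ;;
      --mode|--max)
        if [ "$#" -lt 2 ]; then
          echo "missing value for $1" >&2
          return 2
        fi
        if [ "$1" = "--mode" ]; then MODE="$2"; else MAX_ITERATIONS="$2"; fi
        shift                    # consume the option's value
        ;;
      *)
        echo "unknown option: $1" >&2
        return 2
        ;;
    esac
    shift
  done

  # Validate values after all flags are read.
  case "$MODE" in
    implement|explore|fix) ;;
    *) echo "invalid mode: $MODE" >&2; return 2 ;;
  esac
  case "$MAX_ITERATIONS" in
    ''|*[!0-9]*) echo "--max expects a non-negative integer" >&2; return 2 ;;
  esac
}
```

For example, `parse_args --headless --mode explore --max 5 --dry-run` leaves `MODE=explore`, `MAX_ITERATIONS=5`, and `DRY_RUN=1`, while an unrecognized mode returns a non-zero status before any session is started.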

Architecture

Generator

Fresh Claude Code session each iteration. Reads prd.json to find the highest-priority incomplete story, reads the sprint contract, implements the story, runs quality gates, commits, and marks it done.

Evaluator

Separate fresh session after each generator pass. Skeptically verifies the work: checks acceptance criteria against actual code, runs tests and the application, and issues a PASS or REJECT verdict. Rejection sends the story back with specific feedback.

The evaluator's skepticism is deliberately tuned: Claude's default tendency is to rationalize away issues, so the evaluator prompt includes explicit bias correction.
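
One way a wrapper script could turn an evaluator transcript into a PASS or REJECT decision is sketched below. The `VERDICT:` line format and the `parse_verdict` function are assumptions made for illustration; the README does not specify how the verdict is actually reported. Treating a missing verdict as a failure is also a choice of this sketch, in keeping with the skeptical stance described above.

```bash
# Hypothetical: reads evaluator output on stdin and expects a line such as
#   VERDICT: PASS
#   VERDICT: REJECT <feedback>
# Returns 0 for PASS, 1 for REJECT (printing the feedback), 2 if no verdict.
parse_verdict() {
  local line feedback
  line=$(grep -E '^VERDICT: (PASS|REJECT)' | tail -n 1)  # last verdict wins
  case "$line" in
    "VERDICT: PASS"*)
      return 0
      ;;
    "VERDICT: REJECT"*)
      feedback="${line#VERDICT: REJECT}"
      echo "${feedback# }"       # feedback for the next generator pass
      return 1
      ;;
    *)
      echo "no verdict found; treating as failure" >&2
      return 2
      ;;
  esac
}
```

The printed feedback is what would be handed to the next generator session when a story is sent back.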

Sprint Contracts

Before the loop starts, the planner generates contracts for each story. These define "done" conditions that both generator and evaluator reference, eliminating ambiguity about whether work is complete.

State Persistence

Artifact	Purpose
`prd.json`	Story status (pass/fail), acceptance criteria
`progress.md`	Append-only session log + codebase patterns
`contracts/`	Sprint contracts per story
`config.json`	Harness configuration
Git commits	Code changes with story-tagged messages
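
Because progress.md is append-only, each session only ever adds lines to it. A minimal sketch of such a write follows; the `log_progress` helper and the entry layout are hypothetical, since the README does not document the file's internal format.

```bash
# Hypothetical helper: append one timestamped entry, never rewrite the file.
log_progress() {
  local file="$1" story="$2" verdict="$3" note="$4"
  printf -- '- %s %s %s: %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$story" "$verdict" "$note" >> "$file"
}
```

A call such as `log_progress .loop/progress.md US-002 REJECT "archive guard lacks a test"` adds one line and leaves earlier entries untouched, so a fresh session can read the full history without relying on conversation memory.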

Optional: Runtime Testing Tools

The evaluator verifies code actually runs, not just that it looks correct. It uses whatever tools are available. For richer verification, install these optional MCP servers:

Web projects (Playwright):

claude mcp add playwright -- npx @playwright/mcp@latest --headless --browser=chromium

iOS/Xcode projects (XcodeBuildMCP):

brew tap getsentry/xcodebuildmcp && brew install xcodebuildmcp
claude mcp add xcodebuild -- xcodebuildmcp

iOS Simulator interaction:

claude mcp add ios-simulator -- npx -y ios-simulator-mcp

These are optional — the evaluator works without them but may miss runtime-only issues.

Design Principles

Fresh context per iteration — no accumulated hallucination drift
Separate generation from evaluation — external skepticism is easier to tune than self-criticism
Human judgment for planning, AI for execution — human reviews stories, loop executes autonomously
Structured handoffs via artifacts — not conversation memory
No git revert on rejection — next generator sees partial work + feedback (more signal)
Tool-agnostic — evaluator uses whatever tools are available, no hardcoded dependencies

Credits

Geoffrey Huntley — original Ralph pattern
Anthropic Engineering — generator-evaluator harness design