diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json
index b205b33..3bf91e8 100644
--- a/.claude-plugin/plugin.json
+++ b/.claude-plugin/plugin.json
@@ -1,7 +1,7 @@
 {
   "name": "agent-loop",
   "version": "0.8.0",
-  "description": "Autonomous generator-evaluator agent loop for long-running coding tasks. Plan with /agent-loop:init, then execute with /agent-loop:run.",
+  "description": "Autonomous generator-evaluator agent loop for long-running coding tasks. Run /agent-loop:run to start.",
   "author": {
     "name": "Sheldon"
   },
diff --git a/README.md b/README.md
index 2507ed2..054cbc2 100644
--- a/README.md
+++ b/README.md
@@ -4,9 +4,7 @@ Autonomous AI agent harness that combines a generator-evaluator architecture wit
 
 Inspired by [Geoffrey Huntley's Ralph pattern](https://ghuntley.com/ralph/) and [Anthropic's harness design research](https://www.anthropic.com/engineering/harness-design-long-running-apps).
 
-A generator-evaluator loop runs fresh agent instances per iteration. Each iteration: a **Generator** does the work, then an **Evaluator** verifies it. Human judgment stays in the planning phase; execution is autonomous.
-
-Two execution modes: **headless** via `loop.sh` (fully autonomous bash process) or **interactive** via `/loop-run` (Claude Code-native with full visibility and intervention).
+A generator-evaluator loop runs fresh Claude Code sessions per iteration. Each iteration: a **Generator** does the work, then an **Evaluator** verifies it. Human judgment stays in the planning phase; execution is autonomous with full visibility.
 
 ## Install
 
@@ -20,46 +18,32 @@ Two execution modes: **headless** via `loop.sh` (fully autonomous bash process)
 Then in any project:
 
 ```
-/agent-loop:init    # Set up the loop for your project
-/agent-loop:plan    # Generate PRD and sprint contracts
-/agent-loop:run     # Run the loop interactively
+/agent-loop:run
 ```
 
+That's it. The single command handles setup, planning, and execution.
+
 ### Manual Install
 
 ```bash
-# Clone into your project
 cp -r /path/to/loop-loop .loop
-
-# Install skills as Claude Code commands
-mkdir -p .claude/commands
-for skill in loop-init loop-plan loop-run loop-triage; do
-  ln -sf "../../.loop/skills/$skill/SKILL.md" ".claude/commands/$skill.md"
-done
-
-# Then in Claude Code:
-/loop-init && /loop-plan && /loop-run
 ```
 
+Then run `.loop/loop.sh` directly.
+
 ## How It Works
 
 ```
-[You + Claude Code]                     [Loop Execution]
-
-/agent-loop:init                        Interactive (/agent-loop:run)
-  → scaffolds .loop/                      └─ dispatches Agent subagents
-  → detects project                       └─ visible tool calls, can intervene
-  → picks mode                            └─ chat mid-loop to adjust course
-  → creates config.json
-                                        Headless (.loop/loop.sh)
-/agent-loop:plan                          └─ spawns claude --print per iteration
-  → asks clarifying questions             └─ fully autonomous, no UI
-  → generates prd.json
-  → generates sprint contracts          Both paths:
-  → populates progress.md                 ├─→ Generator → picks story → implements → commits
-                                          ├─→ Evaluator → verifies → PASS or REJECT
-                                          ├─→ next iteration...
-                                          └─→ all stories pass → done
+/agent-loop:run
+  ├─ Phase 1: Scaffold .loop/ (if needed)
+  ├─ Phase 2: Generate stories from spec (if needed)
+  │    └─ Presents stories for human review
+  │    └─ STOPS — user reviews and says "go"
+  └─ Phase 3: Launch loop in tmux
+       ├─→ Generator → picks story → implements → commits
+       ├─→ Evaluator → verifies → PASS or REJECT
+       ├─→ next iteration (fresh CC session each time)
+       └─→ all stories pass → done
 ```
 
 ## Modes
@@ -70,47 +54,48 @@ done
 | **explore** | Read-only codebase analysis | No |
 | **fix** | Targeted bug fixes / tech debt | Yes |
 
-## Running the Loop
+## Monitoring
 
-### Option A: Interactive (`/loop-run`) — Recommended
-
-Run inside Claude Code. You see every tool call, file edit, and test run. You can intervene at any point — deny a tool call, chat to adjust course, or stop the loop.
-
-```
-/loop-run                  # Run until done or max iterations
-/loop-run 3                # Run at most 3 iterations
-/loop-run --skip-eval      # Skip evaluator pass
-/loop-run --story US-003   # Run only a specific story
-```
-
-### Option B: Headless (`loop.sh`)
-
-Run as a standalone bash process. Fully autonomous — no UI, no intervention. Useful for background execution or CI.
+After the loop launches in tmux:
 
 ```bash
-.loop/loop.sh [options]
+# Watch live (from Claude Code)
+! tmux attach -t agent-loop
+
+# Detach back to Claude Code
+Ctrl+B then D
+
+# Stop the loop
+Ctrl+C in the tmux session
+```
+
+Or ask Claude Code "status" — it reads `.loop/prd.json` and `.loop/progress.md`.
+
+## Headless Mode
+
+For CI or background execution without the interactive UI:
+
+```bash
+.loop/loop.sh --headless [options]
 --mode           Operating mode
 --max            Maximum iterations (default: 20)
 --skip-eval      Skip evaluator pass
---tool           AI tool to use
---no-hooks       Don't install stop hooks
---dry-run        Print assembled prompts without running agents
---resume         Skip already-passed stories (explicit exit when none remain)
+--dry-run        Print assembled prompts without running
 ```
 
 ## Architecture
 
 ### Generator
 
-Fresh Claude Code instance each iteration. Reads `prd.json` to find the highest-priority incomplete story, reads the sprint contract, implements the story, runs quality gates, commits, and marks it done.
+Fresh Claude Code session each iteration. Reads `prd.json` to find the highest-priority incomplete story, reads the sprint contract, implements the story, runs quality gates, commits, and marks it done.
 
 ### Evaluator
 
-Separate fresh instance after each generator pass. Skeptically verifies the work: checks acceptance criteria against actual code, runs tests independently, and issues a `PASS` or `REJECT` verdict. Rejection sends the story back to the generator with specific feedback.
+Separate fresh session after each generator pass. Skeptically verifies the work: checks acceptance criteria against actual code, runs tests and the application, and issues a `PASS` or `REJECT` verdict. Rejection sends the story back with specific feedback.
 
 Evaluator skepticism is deliberately tuned — Claude's default tendency is to rationalize away issues. The evaluator prompt includes explicit bias correction.
 
 ### Sprint Contracts
 
-Before the loop starts, `/loop-plan` generates contracts for each story. These define "done" conditions that both generator and evaluator reference, eliminating ambiguity about whether work is complete.
+Before the loop starts, the planner generates contracts for each story. These define "done" conditions that both generator and evaluator reference, eliminating ambiguity about whether work is complete.
 
 ### State Persistence
@@ -122,59 +107,36 @@ Before the loop starts, `/loop-plan` generates contracts for each story. These d
 | `config.json` | Harness configuration |
 | Git commits | Code changes with story-tagged messages |
 
-## File Structure
+## Optional: Runtime Testing Tools
 
-```
-.loop/
-  loop.sh                   # Main loop orchestrator
-  config.json               # Project config (generated by /loop-init)
-  init.sh                   # Project setup script (generated by /loop-init)
-  prd.json                  # Active PRD (generated by /loop-plan)
-  progress.md               # Cross-session memory (append-only)
-
-  prompts/
-    generator/_base.md      # Shared generator instructions
-    generator/implement.md  # Implement mode overlay
-    generator/explore.md    # Explore mode overlay
-    generator/fix.md        # Fix mode overlay
-    evaluator/_base.md      # Skeptical evaluator base
-    evaluator/implement.md  # Implement verification
-    evaluator/explore.md    # Analysis verification
-    evaluator/fix.md        # Fix verification
-    planner/plan.md         # Planning context
-
-  templates/                # Reference templates
-  lib/                      # Shell library functions
-  skills/                   # Claude Code skills (/loop-init, /loop-plan, /loop-run, /loop-triage)
-  contracts/                # Sprint contracts (generated by /loop-plan)
-  triage/                   # Analysis output (explore mode)
-  archive/                  # Completed feature archives
-```
-
-## Browser Testing (Optional)
-
-The evaluator includes basic runtime verification for web projects (starts a local server, checks HTTP response). For full browser testing with console error detection and screenshots, install the Playwright MCP server:
+The evaluator verifies code actually runs, not just that it looks correct. It uses whatever tools are available. For richer verification, install these optional MCP servers:
 
+**Web projects (Playwright):**
 ```bash
 claude mcp add playwright npx @playwright/mcp@latest --headless --browser=chromium
 ```
 
-When Playwright is available, the evaluator will use it to:
-- Navigate to the running application
-- Check for JavaScript console errors
-- Take screenshots for visual verification
-- Reject stories with runtime errors
+**iOS/Xcode projects (XcodeBuildMCP):**
+```bash
+brew tap getsentry/xcodebuildmcp && brew install xcodebuildmcp
+claude mcp add xcodebuild -- xcodebuildmcp
+```
 
-This is optional — the evaluator works without it, but may miss runtime issues that only surface in a browser.
+**iOS Simulator interaction:**
+```bash
+claude mcp add ios-simulator -- npx -y ios-simulator-mcp
+```
+
+These are optional — the evaluator works without them but may miss runtime-only issues.
 
 ## Design Principles
 
 - **Fresh context per iteration** — no accumulated hallucination drift
 - **Separate generation from evaluation** — external skepticism is easier to tune than self-criticism
-- **Human judgment for planning, AI for execution** — interactive `/loop-plan`, autonomous loop
+- **Human judgment for planning, AI for execution** — human reviews stories, loop executes autonomously
 - **Structured handoffs via artifacts** — not conversation memory
 - **No git revert on rejection** — next generator sees partial work + feedback (more signal)
-- **Advisory scope budgets** — prompt-enforced limits on files read/written per iteration
+- **Tool-agnostic** — evaluator uses whatever tools are available, no hardcoded dependencies
 
 ## Credits
diff --git a/install.sh b/install.sh
index e189de4..4d211be 100755
--- a/install.sh
+++ b/install.sh
@@ -5,7 +5,7 @@
 # 1. Copies the harness to ~/.claude/loop/ (prompts, templates, lib, loop.sh)
 # 2. Installs skills as Claude Code commands at ~/.claude/commands/
 #
-# After install, use /loop-init in any project to get started.
+# After install, use /agent-loop:run in any project to get started.
 #
 # Usage:
 # ./install.sh # Install
@@ -18,7 +18,7 @@ HARNESS_DIR="$CLAUDE_DIR/loop"
 COMMANDS_DIR="$CLAUDE_DIR/commands"
 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
 
-SKILLS=(loop-init loop-plan loop-run loop-triage)
+SKILLS=(setup stories run triage)
 
 # --- Colors (if terminal supports them) ---
 if [ -t 1 ]; then
@@ -100,9 +100,7 @@ info "${BOLD}Installation complete.${RESET}"
 echo ""
 echo " Next steps (inside Claude Code, in any project):"
 echo ""
-echo " /loop-init # Set up the loop for your project"
-echo " /loop-plan # Generate PRD and sprint contracts"
-echo " /loop-run # Run the loop interactively"
+echo " /agent-loop:run # Single command — setup, plan, and run"
 echo ""
 echo " Or run headless: .loop/loop.sh"
 echo ""
diff --git a/lib/hooks.sh b/lib/hooks.sh
index 0ecf4c0..95400c3 100644
--- a/lib/hooks.sh
+++ b/lib/hooks.sh
@@ -20,9 +20,9 @@ install_hooks() {
         jq '.hooks.Stop = [{"matcher": "", "hooks": [{"type": "command", "command": "kill -INT $PPID || true"}]}]' \
             "$SETTINGS_FILE" > "${SETTINGS_FILE}.tmp" && mv "${SETTINGS_FILE}.tmp" "$SETTINGS_FILE"
     else
-        python3 -c "
+        LOOP_SETTINGS="$SETTINGS_FILE" python3 -c "
 import json, os
-p = '$SETTINGS_FILE'
+p = os.environ['LOOP_SETTINGS']
 s = json.load(open(p)) if os.path.exists(p) else {}
 s.setdefault('hooks', {})['Stop'] = [{'matcher': '', 'hooks': [{'type': 'command', 'command': 'kill -INT \$PPID || true'}]}]
 json.dump(s, open(p, 'w'), indent=2)
@@ -37,12 +37,13 @@ remove_hooks() {
         jq 'del(.hooks.Stop)' "$SETTINGS_FILE" > "${SETTINGS_FILE}.tmp" && mv "${SETTINGS_FILE}.tmp" "$SETTINGS_FILE"
         jq 'if .hooks == {} then del(.hooks) else . end' "$SETTINGS_FILE" > "${SETTINGS_FILE}.tmp" && mv "${SETTINGS_FILE}.tmp" "$SETTINGS_FILE"
     else
-        python3 -c "
-import json
-s = json.load(open('$SETTINGS_FILE'))
+        LOOP_SETTINGS="$SETTINGS_FILE" python3 -c "
+import json, os
+p = os.environ['LOOP_SETTINGS']
+s = json.load(open(p))
 s.get('hooks', {}).pop('Stop', None)
 if not s.get('hooks'): s.pop('hooks', None)
-json.dump(s, open('$SETTINGS_FILE', 'w'), indent=2)
+json.dump(s, open(p, 'w'), indent=2)
 "
     fi
     log "Stop hook removed"
diff --git a/loop.sh b/loop.sh
index 9b6a396..09bb913 100755
--- a/loop.sh
+++ b/loop.sh
@@ -124,7 +124,7 @@ while [[ $# -gt 0 ]]; do
         --dry-run) DRY_RUN=true; shift ;;
         --headless) export LOOP_HEADLESS=true; shift ;;
         --resume) RESUME=true; shift ;;
-        --replan) log "ERROR: --replan is not yet implemented. Use /loop-plan interactively."; exit 1 ;;
+        --replan) log "ERROR: --replan is not yet implemented. Use /agent-loop:stories interactively."; exit 1 ;;
         [0-9]*) MAX_ITERATIONS="$1"; shift ;;
         *) log "Unknown option: $1"; exit 1 ;;
     esac
@@ -162,7 +162,7 @@ check_archive
 
 # Validate prd.json exists (AFTER archive check, which may delete it on branch change)
 if [ ! -f "$LOOP_DIR/prd.json" ]; then
-    log "ERROR: No prd.json found. Run /loop-plan first to create one."
+    log "ERROR: No prd.json found. Run /agent-loop:stories first to create one."
     exit 1
 fi
 
@@ -240,11 +240,11 @@ run_agent() {
         claude)
             printf '%s\n' "$prompt" | timeout "${LOOP_AGENT_TIMEOUT:-600}" \
                 claude --dangerously-skip-permissions --output-format text \
-                --print 2>&1 > "$output_file"
+                --print > "$output_file" 2>&1
             ;;
         amp)
             printf '%s\n' "$prompt" | timeout "${LOOP_AGENT_TIMEOUT:-600}" \
-                amp --dangerously-allow-all 2>&1 > "$output_file"
+                amp --dangerously-allow-all > "$output_file" 2>&1
             ;;
         *)
             log "ERROR: Unknown tool '$TOOL'"
@@ -319,7 +319,7 @@ while [ "$ITERATION" -lt "$MAX_ITERATIONS" ]; do
         fi
         snapshot_for_archive
         if any_stories_blocked 2>/dev/null; then
-            log "Some stories are blocked and need human review. Run /loop-triage for details."
+            log "Some stories are blocked and need human review. Run /agent-loop:triage for details."
             exit $EXIT_ALL_BLOCKED
         fi
         exit $EXIT_OK
@@ -364,7 +364,7 @@ while [ "$ITERATION" -lt "$MAX_ITERATIONS" ]; do
     # --- Scope budget check ---
     # Verify the generator stayed within configured limits (files modified, lines written).
     # Advisory in implement/fix modes (log warning), but enforced as rejection reason for evaluator.
-    if [ -n "$PRE_GENERATOR_SHA" ] && [ "$PRE_GENERATOR_SHA" != "" ]; then
+    if [ -n "$PRE_GENERATOR_SHA" ]; then
         SCOPE_FILES_MODIFIED=$(git diff --name-only "$PRE_GENERATOR_SHA" HEAD 2>/dev/null | wc -l | tr -d ' ')
         SCOPE_LINES_WRITTEN=$(git diff --stat "$PRE_GENERATOR_SHA" HEAD 2>/dev/null | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
 
@@ -381,18 +381,9 @@ while [ "$ITERATION" -lt "$MAX_ITERATIONS" ]; do
         export SCOPE_FILES_MODIFIED SCOPE_LINES_WRITTEN
     fi
 
-    # Check for completion — in interactive mode, check prd.json directly
-    if all_stories_pass 2>/dev/null; then
-        log_header "All Stories Complete! ($(story_counts))"
-        snapshot_for_archive
-        exit 0
-    fi
-    # Headless mode: also check output sentinel
-    if [ -n "$GENERATOR_OUTPUT" ] && echo "$GENERATOR_OUTPUT" | grep -q "COMPLETE"; then
-        log_header "Generator signaled COMPLETE ($(story_counts))"
-        snapshot_for_archive
-        exit 0
-    fi
+    # NOTE: Do NOT check all_stories_pass here. The generator marks its own story
+    # as passed, but the evaluator hasn't verified yet. Checking here would skip
+    # evaluation on the last story. The completion check is at the top of the loop.
 
     # --- Evaluator pass ---
     if [ "$SKIP_EVAL" != true ]; then
@@ -460,6 +451,6 @@ done
 # --- Max iterations reached ---
 log_header "Max Iterations Reached ($MAX_ITERATIONS)"
 log "Stories completed: $(story_counts)"
-log "Run /loop-triage to generate a handoff brief."
+log "Run /agent-loop:triage to generate a handoff brief."
 snapshot_for_archive
 exit $EXIT_MAX_ITERATIONS
diff --git a/prompts/evaluator/explore.md b/prompts/evaluator/explore.md
index ef6ec4a..c595454 100644
--- a/prompts/evaluator/explore.md
+++ b/prompts/evaluator/explore.md
@@ -6,7 +6,7 @@ You are evaluating an analysis/exploration task. The generator claims to have an
 Before any other checks, verify explore mode's read-only constraint:
 
 1. Run `git diff {{PRE_GENERATOR_SHA}}..HEAD --name-only`
-2. If ANY file outside `.loop/triage/` was modified or committed, **REJECT immediately** — explore mode is read-only. The generator must not modify host project files.
+2. If ANY file outside `.loop/` was modified or committed, **REJECT immediately** — explore mode is read-only. The generator must not modify host project files. (Files inside `.loop/` like `prd.json` and `progress.md` are expected.)
 
 ## Exploration-Specific Checks
 
diff --git a/prompts/planner/plan.md b/prompts/planner/plan.md
index e2fc3ed..df5fb8c 100644
--- a/prompts/planner/plan.md
+++ b/prompts/planner/plan.md
@@ -1,6 +1,6 @@
 # Planner Context
 
-This file is loaded by the `/loop-plan` skill to provide additional context for PRD generation.
+This file provides additional context for PRD generation.
 
 ## Story Decomposition Guidelines
 
diff --git a/setup.sh b/setup.sh
index db490ae..d010b44 100755
--- a/setup.sh
+++ b/setup.sh
@@ -1,7 +1,7 @@
 #!/bin/bash
 # Agent Loop — project setup script
 # Scaffolds .loop/ directory and generates config.json.
-# Called by /agent-loop:init or run directly.
+# Called by /agent-loop:setup or /agent-loop:run, or run directly.
 #
 # Usage:
 # setup.sh # mode: implement, explore, or fix
@@ -120,5 +120,5 @@ echo "[setup] Mode: $MODE"
 echo "[setup] Config: .loop/config.json"
 echo ""
 echo "Next steps (in Claude Code):"
-echo " /agent-loop:plan # Generate stories from your spec or description"
+echo " /agent-loop:stories # Generate stories from your spec or description"
 echo ""
diff --git a/skills/setup/SKILL.md b/skills/setup/SKILL.md
index 3a89452..ddcfc1a 100644
--- a/skills/setup/SKILL.md
+++ b/skills/setup/SKILL.md
@@ -3,7 +3,7 @@ name: setup
 description: "Run the setup script to scaffold .loop/ directory. Does not plan features or write code."
 ---
 
-# /init — Scaffold the Agent Loop
+# /setup — Scaffold the Agent Loop
 
 Run the setup script to create `.loop/` with harness files and config.
 This skill does ONE thing: run a bash command.
diff --git a/skills/stories/SKILL.md b/skills/stories/SKILL.md
index 748c712..0bb23da 100644
--- a/skills/stories/SKILL.md
+++ b/skills/stories/SKILL.md
@@ -3,7 +3,7 @@ name: stories
 description: "Generate prd.json and sprint contracts by dispatching the planner agent. Does not write source code."
 ---
 
-# /plan — Generate PRD and Sprint Contracts
+# /stories — Generate PRD and Sprint Contracts
 
 Dispatch the planner agent to decompose a spec into stories. The planner agent cannot write source code or run bash commands — it can only write to `.loop/`.
 
@@ -11,7 +11,7 @@ Dispatch the planner agent to decompose a spec into stories. The planner agent c
 
 ### 1. Check prerequisites
 
-Verify `.loop/config.json` exists. If not, tell the user to run `/agent-loop:init` first and stop.
+Verify `.loop/config.json` exists. If not, tell the user to run `/agent-loop:setup` first and stop.
 
 ### 2. Find the spec
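
In the `run_agent` hunk of `loop.sh`, `2>&1 > "$output_file"` becomes `> "$output_file" 2>&1`. Bash applies redirections left to right. With the old order, stderr was duplicated onto the caller's stdout *before* stdout was pointed at the file, so anything the agent CLI (or `timeout`) wrote to stderr never reached `$output_file`. The minimal sketch below reproduces the difference. `emit_both` is a hypothetical stand-in for the `claude`/`amp` commands, and the temporary paths are illustrative only.

```bash
#!/usr/bin/env bash
# Sketch of the redirect-order fix in run_agent (loop.sh).
# emit_both is a hypothetical stand-in for the claude/amp CLIs:
# it writes one line to stdout and one line to stderr.
emit_both() {
    echo "to-stdout"
    echo "to-stderr" >&2
}

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT   # clean up the scratch files when the script exits

# Old order: `2>&1` is applied first, so stderr points at whatever stdout was
# at that moment (the script's own stdout). Only afterwards is stdout moved to
# the file, so the stderr line never reaches old.txt.
emit_both 2>&1 > "$tmpdir/old.txt"

# New order: stdout is moved to the file first, then stderr is duplicated
# onto it, so both streams are captured in new.txt.
emit_both > "$tmpdir/new.txt" 2>&1

echo "old.txt: $(tr '\n' ' ' < "$tmpdir/old.txt")"
echo "new.txt: $(tr '\n' ' ' < "$tmpdir/new.txt")"
```

Run as a script, the old form prints `to-stderr` on the terminal, `old.txt` ends up holding only `to-stdout`, and `new.txt` holds both lines in the order they were written.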