commit 17e5eb707ffd1ed76070fe15f578367492dde29d
Author: Sheldon Finlay
Date:   Fri Mar 27 08:03:18 2026 -0400

    feat: agent loop harness with Claude Code plugin support

    Generator-evaluator architecture with iterative context-reset for long-running coding tasks.
    Ships as a Claude Code plugin — install with /plugin and use /agent-loop:init, /agent-loop:plan, /agent-loop:run.

diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
new file mode 100644
index 0000000..8b72a2d
--- /dev/null
+++ b/.claude-plugin/marketplace.json
@@ -0,0 +1,14 @@
+{
+  "name": "agent-loop",
+  "plugins": [
+    {
+      "name": "agent-loop",
+      "description": "Autonomous generator-evaluator agent loop for long-running coding tasks. Plan interactively, then execute with full visibility.",
+      "version": "0.1.0",
+      "source": {
+        "source": "github",
+        "repo": "https://git.jagfly.com/sheldon/loop-loop.git"
+      }
+    }
+  ]
+}
diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json
new file mode 100644
index 0000000..dd716ab
--- /dev/null
+++ b/.claude-plugin/plugin.json
@@ -0,0 +1,11 @@
+{
+  "name": "agent-loop",
+  "version": "0.1.0",
+  "description": "Autonomous generator-evaluator agent loop for long-running coding tasks. 
Plan with /agent-loop:init, then execute with /agent-loop:run.", + "author": { + "name": "Sheldon" + }, + "repository": "https://git.jagfly.com/sheldon/loop-loop.git", + "license": "MIT", + "keywords": ["agent", "loop", "autonomous", "generator", "evaluator", "harness"] +} diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..8f224e1 --- /dev/null +++ b/.gitignore @@ -0,0 +1,19 @@ +# Runtime artifacts (generated per-project, not part of the harness) +prd.json +progress.md +progress-archive.md +config.json +init.sh +contracts/ +triage/ +archive/ +.archive-staging/ +.last-branch +.loop.lock + +# OS +.DS_Store +Thumbs.db + +# Claude Code +.claude/ diff --git a/README.md b/README.md new file mode 100644 index 0000000..7b82fdc --- /dev/null +++ b/README.md @@ -0,0 +1,166 @@ +# Agent Loop + +Autonomous AI agent harness that combines a generator-evaluator architecture with iterative context-reset patterns for long-running coding tasks. + +Inspired by [Geoffrey Huntley's Ralph pattern](https://ghuntley.com/ralph/) and [Anthropic's harness design research](https://www.anthropic.com/engineering/harness-design-long-running-apps). + +A generator-evaluator loop runs fresh agent instances per iteration. Each iteration: a **Generator** does the work, then an **Evaluator** verifies it. Human judgment stays in the planning phase; execution is autonomous. + +Two execution modes: **headless** via `loop.sh` (fully autonomous bash process) or **interactive** via `/loop-run` (Claude Code-native with full visibility and intervention). 
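The Generator/Evaluator cycle described above can be sketched in a few lines of POSIX shell. This is a minimal illustration, not the contents of `loop.sh`: `run_generator`, `run_evaluator`, `run_loop`, and the `pending` queue are hypothetical names, and the stub evaluator simply rejects the first attempt at `US-002` to show that a rejected story stays queued for the next iteration.

```shell
# Sketch only: these names are illustrative and do not come from loop.sh.

pending="US-001 US-002"   # story IDs, highest priority first
max_iterations=5
verdicts=""
rejected_once=no

# Stand-in for launching a fresh generator agent on one story.
run_generator() { :; }

# Stand-in for a fresh evaluator agent: returns 0 for PASS, 1 for REJECT.
# It rejects the first attempt at US-002 and passes everything else.
run_evaluator() {
  if [ "$1" = "US-002" ] && [ "$rejected_once" = no ]; then
    rejected_once=yes
    return 1
  fi
  return 0
}

run_loop() {
  iteration=0
  while [ -n "$pending" ] && [ "$iteration" -lt "$max_iterations" ]; do
    iteration=$((iteration + 1))
    story=${pending%% *}                  # first ID in the queue
    run_generator "$story"
    if run_evaluator "$story"; then
      verdicts="$verdicts PASS:$story"
      case "$pending" in
        *" "*) pending=${pending#* } ;;   # drop the passed story
        *)     pending="" ;;              # that was the last one
      esac
    else
      verdicts="$verdicts REJECT:$story"  # story stays queued for a retry
    fi
  done
  [ -z "$pending" ]                       # succeed only if nothing is left
}

if run_loop; then
  echo "done after $iteration iterations:$verdicts"
else
  echo "stopped after $iteration iterations with work remaining:$verdicts"
fi
```

In the harness itself each stub would correspond to a fresh agent process and story state would live in `prd.json`; here both are replaced with shell variables so the control flow can be run on its own.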
+ +## Install + +### As a Claude Code Plugin (Recommended) + +``` +/plugin marketplace add https://git.jagfly.com/sheldon/loop-loop.git +/plugin install agent-loop@agent-loop +``` + +Then in any project: + +``` +/agent-loop:init # Set up the loop for your project +/agent-loop:plan # Generate PRD and sprint contracts +/agent-loop:run # Run the loop interactively +``` + +### Manual Install + +```bash +# Clone into your project +cp -r /path/to/loop-loop .loop + +# Install skills as Claude Code commands +mkdir -p .claude/commands +for skill in loop-init loop-plan loop-run loop-triage; do + ln -sf "../../.loop/skills/$skill/SKILL.md" ".claude/commands/$skill.md" +done + +# Then in Claude Code: +/loop-init && /loop-plan && /loop-run +``` + +## How It Works + +``` +[You + Claude Code] [Loop Execution] + +/agent-loop:init Interactive (/agent-loop:run) + → scaffolds .loop/ └─ dispatches Agent subagents + → detects project └─ visible tool calls, can intervene + → picks mode └─ chat mid-loop to adjust course + → creates config.json + Headless (.loop/loop.sh) +/agent-loop:plan └─ spawns claude --print per iteration + → asks clarifying questions └─ fully autonomous, no UI + → generates prd.json + → generates sprint contracts Both paths: + → populates progress.md ├─→ Generator → picks story → implements → commits + ├─→ Evaluator → verifies → PASS or REJECT + ├─→ next iteration... + └─→ all stories pass → done +``` + +## Modes + +| Mode | What it does | Git writes? | +|------|-------------|-------------| +| **implement** | Build features from a PRD | Yes | +| **explore** | Read-only codebase analysis | No | +| **fix** | Targeted bug fixes / tech debt | Yes | + +## Running the Loop + +### Option A: Interactive (`/loop-run`) — Recommended + +Run inside Claude Code. You see every tool call, file edit, and test run. You can intervene at any point — deny a tool call, chat to adjust course, or stop the loop. 
+ +``` +/loop-run # Run until done or max iterations +/loop-run 3 # Run at most 3 iterations +/loop-run --skip-eval # Skip evaluator pass +/loop-run --story US-003 # Run only a specific story +``` + +### Option B: Headless (`loop.sh`) + +Run as a standalone bash process. Fully autonomous — no UI, no intervention. Useful for background execution or CI. + +```bash +.loop/loop.sh [options] + +--mode Operating mode +--max Maximum iterations (default: 20) +--skip-eval Skip evaluator pass +--tool AI tool to use +--no-hooks Don't install stop hooks +--dry-run Print assembled prompts without running agents +--resume Skip already-passed stories (explicit exit when none remain) +``` + +## Architecture + +### Generator +Fresh Claude Code instance each iteration. Reads `prd.json` to find the highest-priority incomplete story, reads the sprint contract, implements the story, runs quality gates, commits, and marks it done. + +### Evaluator +Separate fresh instance after each generator pass. Skeptically verifies the work: checks acceptance criteria against actual code, runs tests independently, and issues a `PASS` or `REJECT` verdict. Rejection sends the story back to the generator with specific feedback. + +Evaluator skepticism is deliberately tuned — Claude's default tendency is to rationalize away issues. The evaluator prompt includes explicit bias correction. + +### Sprint Contracts +Before the loop starts, `/loop-plan` generates contracts for each story. These define "done" conditions that both generator and evaluator reference, eliminating ambiguity about whether work is complete. 
+ +### State Persistence + +| Artifact | Purpose | +|----------|---------| +| `prd.json` | Story status (pass/fail), acceptance criteria | +| `progress.md` | Append-only session log + codebase patterns | +| `contracts/` | Sprint contracts per story | +| `config.json` | Harness configuration | +| Git commits | Code changes with story-tagged messages | + +## File Structure + +``` +.loop/ + loop.sh # Main loop orchestrator + config.json # Project config (generated by /loop-init) + init.sh # Project setup script (generated by /loop-init) + prd.json # Active PRD (generated by /loop-plan) + progress.md # Cross-session memory (append-only) + + prompts/ + generator/_base.md # Shared generator instructions + generator/implement.md # Implement mode overlay + generator/explore.md # Explore mode overlay + generator/fix.md # Fix mode overlay + evaluator/_base.md # Skeptical evaluator base + evaluator/implement.md # Implement verification + evaluator/explore.md # Analysis verification + evaluator/fix.md # Fix verification + planner/plan.md # Planning context + + templates/ # Reference templates + lib/ # Shell library functions + skills/ # Claude Code skills (/loop-init, /loop-plan, /loop-run, /loop-triage) + contracts/ # Sprint contracts (generated by /loop-plan) + triage/ # Analysis output (explore mode) + archive/ # Completed feature archives +``` + +## Design Principles + +- **Fresh context per iteration** — no accumulated hallucination drift +- **Separate generation from evaluation** — external skepticism is easier to tune than self-criticism +- **Human judgment for planning, AI for execution** — interactive `/loop-plan`, autonomous loop +- **Structured handoffs via artifacts** — not conversation memory +- **No git revert on rejection** — next generator sees partial work + feedback (more signal) +- **Advisory scope budgets** — prompt-enforced limits on files read/written per iteration + +## Credits + +- [Geoffrey Huntley](https://ghuntley.com/ralph/) — original Ralph pattern 
+- [Anthropic Engineering](https://www.anthropic.com/engineering/harness-design-long-running-apps) — generator-evaluator harness design diff --git a/config.json.example b/config.json.example new file mode 100644 index 0000000..d80a37c --- /dev/null +++ b/config.json.example @@ -0,0 +1,26 @@ +{ + "tool": "claude", + "mode": "implement", + "maxIterations": 20, + "skipEval": false, + "evalRetries": 2, + "autoHooks": true, + "branchPrefix": "loop/", + "scopeBudgets": { + "explore": { + "maxFilesToRead": 15, + "maxLinesToWrite": 0, + "maxFilesToModify": 0 + }, + "implement": { + "maxFilesToRead": 50, + "maxLinesToWrite": 500, + "maxFilesToModify": 10 + }, + "fix": { + "maxFilesToRead": 30, + "maxLinesToWrite": 200, + "maxFilesToModify": 5 + } + } +} diff --git a/init.sh.example b/init.sh.example new file mode 100644 index 0000000..12aed55 --- /dev/null +++ b/init.sh.example @@ -0,0 +1,56 @@ +#!/bin/bash +# Project-specific initialization for the agent loop. +# Copy this to .loop/init.sh and customize for your project. +# +# This script runs at the start of each loop.sh invocation to ensure +# the development environment is ready. Keep it idempotent (safe to run multiple times). + +set -euo pipefail + +echo "[init] Setting up development environment..." + +# --- Dependencies --- +# Uncomment and adapt for your project: + +# Node.js +# if [ -f package.json ]; then +# npm install --silent +# fi + +# Python +# if [ -f requirements.txt ]; then +# pip install -q -r requirements.txt +# fi + +# Go +# if [ -f go.mod ]; then +# go mod download +# fi + +# Rust +# if [ -f Cargo.toml ]; then +# cargo build --quiet +# fi + +# --- Dev Server --- +# Start if not already running: + +# if ! 
lsof -i :3000 &>/dev/null; then +# npm run dev & +# sleep 3 +# fi + +# --- Database --- +# Run migrations if needed: + +# npm run migrate +# python manage.py migrate +# alembic upgrade head + +# --- Verify --- +# Quick smoke test: + +# npm run typecheck +# npm run test -- --run --silent + +echo "[init] Environment ready." diff --git a/install.sh b/install.sh new file mode 100755 index 0000000..e189de4 --- /dev/null +++ b/install.sh @@ -0,0 +1,108 @@ +#!/bin/bash +# Install Agent Loop globally for Claude Code. +# +# What this does: +# 1. Copies the harness to ~/.claude/loop/ (prompts, templates, lib, loop.sh) +# 2. Installs skills as Claude Code commands at ~/.claude/commands/ +# +# After install, use /loop-init in any project to get started. +# +# Usage: +# ./install.sh # Install +# ./install.sh --uninstall # Remove + +set -euo pipefail + +CLAUDE_DIR="$HOME/.claude" +HARNESS_DIR="$CLAUDE_DIR/loop" +COMMANDS_DIR="$CLAUDE_DIR/commands" +SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" + +SKILLS=(loop-init loop-plan loop-run loop-triage) + +# --- Colors (if terminal supports them) --- +if [ -t 1 ]; then + GREEN='\033[0;32m' + YELLOW='\033[0;33m' + RED='\033[0;31m' + BOLD='\033[1m' + RESET='\033[0m' +else + GREEN='' YELLOW='' RED='' BOLD='' RESET='' +fi + +info() { echo -e "${GREEN}[loop]${RESET} $*"; } +warn() { echo -e "${YELLOW}[loop]${RESET} $*"; } +error() { echo -e "${RED}[loop]${RESET} $*"; } + +# --- Uninstall --- +if [[ "${1:-}" == "--uninstall" ]]; then + info "Uninstalling Agent Loop..." + + if [ -d "$HARNESS_DIR" ]; then + rm -rf "$HARNESS_DIR" + info "Removed $HARNESS_DIR" + fi + + for skill in "${SKILLS[@]}"; do + cmd="$COMMANDS_DIR/$skill.md" + if [ -f "$cmd" ]; then + rm -f "$cmd" + info "Removed $cmd" + fi + done + + info "Done. Per-project .loop/ directories are untouched." + exit 0 +fi + +# --- Install --- +info "Installing Agent Loop..." 
+ +# Ensure ~/.claude/ exists +mkdir -p "$CLAUDE_DIR" + +# Copy harness (prompts, templates, lib, loop.sh, config example) +if [ -d "$HARNESS_DIR" ]; then + warn "Updating existing install at $HARNESS_DIR" + rm -rf "$HARNESS_DIR" +fi + +mkdir -p "$HARNESS_DIR" +cp -r "$SCRIPT_DIR/prompts" "$HARNESS_DIR/" +cp -r "$SCRIPT_DIR/templates" "$HARNESS_DIR/" +cp -r "$SCRIPT_DIR/lib" "$HARNESS_DIR/" +cp -r "$SCRIPT_DIR/skills" "$HARNESS_DIR/" +cp "$SCRIPT_DIR/loop.sh" "$HARNESS_DIR/" +cp "$SCRIPT_DIR/config.json.example" "$HARNESS_DIR/" +cp "$SCRIPT_DIR/init.sh.example" "$HARNESS_DIR/" +chmod +x "$HARNESS_DIR/loop.sh" + +info "Harness installed to $HARNESS_DIR" + +# Install Claude Code commands +mkdir -p "$COMMANDS_DIR" + +for skill in "${SKILLS[@]}"; do + src="$HARNESS_DIR/skills/$skill/SKILL.md" + dest="$COMMANDS_DIR/$skill.md" + + if [ -f "$src" ]; then + cp "$src" "$dest" + info "Installed /$skill command" + else + warn "Skill not found: $src (skipping)" + fi +done + +echo "" +info "${BOLD}Installation complete.${RESET}" +echo "" +echo " Next steps (inside Claude Code, in any project):" +echo "" +echo " /loop-init # Set up the loop for your project" +echo " /loop-plan # Generate PRD and sprint contracts" +echo " /loop-run # Run the loop interactively" +echo "" +echo " Or run headless: .loop/loop.sh" +echo "" diff --git a/lib/archive.sh b/lib/archive.sh new file mode 100644 index 0000000..02e522e --- /dev/null +++ b/lib/archive.sh @@ -0,0 +1,83 @@ +#!/bin/bash +# Branch archiving — archives previous run artifacts when the branch changes. +# Preserves prd.json, progress.md, and contracts from the previous feature. +# +# Design: At the end of each run, snapshot_for_archive saves current artifacts +# to .archive-staging/. On the next run, if the branch changed, check_archive +# moves the snapshot to archive/ and cleans up. This avoids archiving the +# WRONG artifacts (the new feature's) when prd.json has already been overwritten. 
+ +LAST_BRANCH_FILE="$LOOP_DIR/.last-branch" +STAGING_DIR="$LOOP_DIR/.archive-staging" + +# Snapshot current artifacts so they can be archived later if the branch changes. +# Call this at the END of a successful run or before exit. +snapshot_for_archive() { + rm -rf "$STAGING_DIR" + mkdir -p "$STAGING_DIR" + + [ -f "$LOOP_DIR/prd.json" ] && cp "$LOOP_DIR/prd.json" "$STAGING_DIR/" + [ -f "$LOOP_DIR/progress.md" ] && cp "$LOOP_DIR/progress.md" "$STAGING_DIR/" + [ -d "$LOOP_DIR/contracts" ] && cp -r "$LOOP_DIR/contracts" "$STAGING_DIR/" +} + +# Check if we need to archive and do so if branch changed. +# Reads the NEW branch from live prd.json and the OLD branch from the staging +# snapshot (which was saved at the end of the previous run). This avoids the +# bug where both branches read from the same (already-overwritten) prd.json. +check_archive() { + local current_branch + current_branch=$(prd_branch_name 2>/dev/null) + [ -z "$current_branch" ] && return + + # Determine the previous branch from the staging snapshot (most reliable) + # or fall back to .last-branch file + local last_branch="" + if [ -f "$STAGING_DIR/prd.json" ]; then + if command -v jq &>/dev/null; then + last_branch=$(jq -r '.branchName // empty' "$STAGING_DIR/prd.json" 2>/dev/null) + else + last_branch=$(LOOP_PRD="$STAGING_DIR/prd.json" python3 -c " +import json, os +print(json.load(open(os.environ['LOOP_PRD'])).get('branchName', ''), end='') +" 2>/dev/null) + fi + fi + [ -z "$last_branch" ] && [ -f "$LAST_BRANCH_FILE" ] && last_branch=$(cat "$LAST_BRANCH_FILE") + + if [ -n "$last_branch" ] && [ "$last_branch" != "$current_branch" ]; then + archive_run "$last_branch" + fi + + echo "$current_branch" > "$LAST_BRANCH_FILE" +} + +# Archive the previous run's staged artifacts (NOT current prd.json) +archive_run() { + local branch_name="$1" + local feature_name + feature_name=$(echo "$branch_name" | sed 's|.*/||') + + local archive_dir="$LOOP_DIR/archive/$(date +%Y-%m-%d)-${feature_name}" + mkdir -p 
"$archive_dir" + + if [ -d "$STAGING_DIR" ]; then + # Use the staged snapshot (correct artifacts from the previous run) + cp -r "$STAGING_DIR"/* "$archive_dir/" 2>/dev/null || true + rm -rf "$STAGING_DIR" + else + # Fallback: no snapshot exists (first run or upgrade from old version). + # Current artifacts may belong to the new feature — archive what we have + # but warn the user. + log "WARNING: No archive snapshot found. Archiving current artifacts (may be from new feature)." + [ -f "$LOOP_DIR/prd.json" ] && cp "$LOOP_DIR/prd.json" "$archive_dir/" + [ -f "$LOOP_DIR/progress.md" ] && cp "$LOOP_DIR/progress.md" "$archive_dir/" + [ -d "$LOOP_DIR/contracts" ] && cp -r "$LOOP_DIR/contracts" "$archive_dir/" + fi + + # Clean up old run's artifacts (progress.md, contracts — NOT prd.json which belongs to new feature) + rm -f "$LOOP_DIR/progress.md" + rm -rf "$LOOP_DIR/contracts" + + log "Archived previous run to $archive_dir" +} diff --git a/lib/hooks.sh b/lib/hooks.sh new file mode 100644 index 0000000..0a1d875 --- /dev/null +++ b/lib/hooks.sh @@ -0,0 +1,19 @@ +#!/bin/bash +# Stop hook management for Claude Code loop continuation. +# +# NOTE: Hooks are currently no-ops. The loop uses `claude --print` (non-interactive), +# which runs to completion and exits naturally — no Stop hook is needed to signal +# iteration boundaries. The install/remove interface is preserved so that a future +# interactive mode can be added without changing loop.sh's call sites. +# +# If interactive mode is added, the hook mechanism will need redesign: `kill -INT $PPID` +# targets the hook runner's parent (Claude Code), not loop.sh. A sentinel-file or +# named-pipe approach would be more reliable. 
+ +install_hooks() { + : # no-op — see note above +} + +remove_hooks() { + : # no-op — see note above +} diff --git a/lib/prompt.sh b/lib/prompt.sh new file mode 100644 index 0000000..e9feb78 --- /dev/null +++ b/lib/prompt.sh @@ -0,0 +1,95 @@ +#!/bin/bash +# Prompt assembly — composes the final prompt from base + mode overlay. +# Injects runtime variables (scope budgets, current story, iteration count). + +# Build the complete prompt for a given agent role and mode. +# Usage: build_prompt "generator" "implement" +# build_prompt "evaluator" "implement" +build_prompt() { + local role="$1" # generator | evaluator + local mode="$2" # implement | explore | fix + + local base_file="$LOOP_DIR/prompts/${role}/_base.md" + local mode_file="$LOOP_DIR/prompts/${role}/${mode}.md" + + local prompt="" + + # Start with base prompt + if [ -f "$base_file" ]; then + prompt=$(cat "$base_file") + else + log "WARNING: Missing base prompt: $base_file" + return 1 + fi + + # Append mode-specific overlay + if [ -f "$mode_file" ]; then + prompt="${prompt} + +--- + +$(cat "$mode_file")" + else + log "WARNING: Missing mode prompt: $mode_file" + fi + + # Inject runtime variables + prompt=$(inject_variables "$prompt" "$mode") + + printf '%s\n' "$prompt" +} + +# Replace template variables in prompt text +inject_variables() { + local text="$1" + local mode="$2" + + # Scope budgets from config + local max_read max_write max_modify + max_read=$(get_config_value ".scopeBudgets.${mode}.maxFilesToRead" "50") + max_write=$(get_config_value ".scopeBudgets.${mode}.maxLinesToWrite" "500") + max_modify=$(get_config_value ".scopeBudgets.${mode}.maxFilesToModify" "10") + + text="${text//\{\{MAX_FILES_TO_READ\}\}/$max_read}" + text="${text//\{\{MAX_LINES_TO_WRITE\}\}/$max_write}" + text="${text//\{\{MAX_FILES_TO_MODIFY\}\}/$max_modify}" + text="${text//\{\{MODE\}\}/$mode}" + text="${text//\{\{ITERATION\}\}/$ITERATION}" + text="${text//\{\{MAX_ITERATIONS\}\}/$MAX_ITERATIONS}" + 
text="${text//\{\{LOOP_DIR\}\}/$LOOP_DIR}" + text="${text//\{\{PROJECT_ROOT\}\}/$PROJECT_ROOT}" + text="${text//\{\{CURRENT_STORY_ID\}\}/${CURRENT_STORY_ID:-unknown}}" + text="${text//\{\{PRE_GENERATOR_SHA\}\}/${PRE_GENERATOR_SHA:-HEAD~1}}" + + printf '%s\n' "$text" +} + +# Read a value from config.json with a default fallback +get_config_value() { + local path="$1" + local default="$2" + local config="$LOOP_DIR/config.json" + + [ -f "$config" ] || { echo "$default"; return; } + + if command -v jq &>/dev/null; then + local val + val=$(jq -r "$path // empty" "$config" 2>/dev/null) + echo "${val:-$default}" + else + LOOP_CONFIG="$config" LOOP_PATH="$path" LOOP_DEFAULT="$default" python3 -c " +import json, os +d = json.load(open(os.environ['LOOP_CONFIG'])) +keys = os.environ['LOOP_PATH'].lstrip('.').split('.') +for k in keys: + d = d.get(k) if isinstance(d, dict) else None + if d is None: + break +val = d if d is not None and d != {} else os.environ['LOOP_DEFAULT'] +# Normalize Python booleans to lowercase for shell compatibility +if isinstance(val, bool): + val = str(val).lower() +print(val, end='') +" + fi +} diff --git a/lib/state.sh b/lib/state.sh new file mode 100644 index 0000000..713fca8 --- /dev/null +++ b/lib/state.sh @@ -0,0 +1,359 @@ +#!/bin/bash +# State management for prd.json and progress.md. +# Provides functions to query story status, update pass/fail, and append progress. + +# Requires: jq (preferred) or python3 (fallback) + +# --- PRD Validation --- + +validate_prd() { + local prd="$LOOP_DIR/prd.json" + [ -f "$prd" ] || return 0 # no prd.json is handled elsewhere + + if command -v jq &>/dev/null; then + if ! 
jq -e '.userStories | type == "array" and length > 0' "$prd" >/dev/null 2>&1; then + log "ERROR: prd.json is missing or has no userStories array" + exit 1 + fi + else + LOOP_PRD="$prd" python3 -c " +import json, sys, os +d = json.load(open(os.environ['LOOP_PRD'])) +stories = d.get('userStories', []) +if not isinstance(stories, list) or len(stories) == 0: + print('[loop] ERROR: prd.json is missing or has no userStories array', file=sys.stderr) + sys.exit(1) +" + fi +} + +# --- PRD Queries --- + +# Get the ID of the highest-priority incomplete story (skips blocked stories) +next_story_id() { + local prd="$LOOP_DIR/prd.json" + [ -f "$prd" ] || return 1 + + if command -v jq &>/dev/null; then + jq -r '[.userStories[] | select(.passes == false and .blocked != true)] | sort_by(.priority // 999) | .[0].id // empty' "$prd" + else + LOOP_PRD="$prd" python3 -c " +import json, os +stories = json.load(open(os.environ['LOOP_PRD']))['userStories'] +pending = sorted([s for s in stories if not s['passes'] and not s.get('blocked')], key=lambda s: s.get('priority', 999)) +print(pending[0]['id'] if pending else '', end='') +" + fi +} + +# Check if all actionable stories are done (passed or blocked) +all_stories_pass() { + local prd="$LOOP_DIR/prd.json" + [ -f "$prd" ] || return 1 + + if command -v jq &>/dev/null; then + local actionable + actionable=$(jq '[.userStories[] | select(.passes == false and .blocked != true)] | length' "$prd") + [ "$actionable" -eq 0 ] + else + LOOP_PRD="$prd" python3 -c " +import json, sys, os +stories = json.load(open(os.environ['LOOP_PRD']))['userStories'] +actionable = [s for s in stories if not s['passes'] and not s.get('blocked')] +sys.exit(0 if len(actionable) == 0 else 1) +" + fi +} + +# Check if any stories are blocked +any_stories_blocked() { + local prd="$LOOP_DIR/prd.json" + [ -f "$prd" ] || return 1 + + if command -v jq &>/dev/null; then + local blocked + blocked=$(jq '[.userStories[] | select(.blocked == true)] | length' "$prd") + [ "$blocked" 
-gt 0 ] + else + LOOP_PRD="$prd" python3 -c " +import json, sys, os +stories = json.load(open(os.environ['LOOP_PRD']))['userStories'] +blocked = [s for s in stories if s.get('blocked')] +sys.exit(0 if len(blocked) > 0 else 1) +" + fi +} + +# Get total and completed story counts +story_counts() { + local prd="$LOOP_DIR/prd.json" + [ -f "$prd" ] || { echo "0/0"; return; } + + if command -v jq &>/dev/null; then + local total passed + total=$(jq '.userStories | length' "$prd") + passed=$(jq '[.userStories[] | select(.passes == true)] | length' "$prd") + echo "${passed}/${total}" + else + LOOP_PRD="$prd" python3 -c " +import json, os +stories = json.load(open(os.environ['LOOP_PRD']))['userStories'] +passed = sum(1 for s in stories if s['passes']) +print(f'{passed}/{len(stories)}', end='') +" + fi +} + +# --- PRD Mutations --- + +# Mark a story as passed +mark_story_pass() { + local story_id="$1" + local prd="$LOOP_DIR/prd.json" + + if command -v jq &>/dev/null; then + local updated + updated=$(jq --arg id "$story_id" \ + 'if any(.userStories[]; .id == $id) then (.userStories[] | select(.id == $id)).passes = true else error("Story not found: \($id)") end' \ + "$prd" 2>&1) || { log "WARNING: mark_story_pass failed for '$story_id'"; return 1; } + printf '%s\n' "$updated" > "${prd}.tmp" && mv "${prd}.tmp" "$prd" + else + LOOP_STORY_ID="$story_id" LOOP_PRD="$prd" python3 -c " +import json, pathlib, os, sys +p = pathlib.Path(os.environ['LOOP_PRD']) +d = json.loads(p.read_text()) +story_id = os.environ['LOOP_STORY_ID'] +found = False +for s in d['userStories']: + if s['id'] == story_id: + s['passes'] = True + found = True + break +if not found: + print(f'[loop] WARNING: mark_story_pass failed for {story_id!r}', file=sys.stderr) + sys.exit(1) +p.write_text(json.dumps(d, indent=2)) +" + fi +} + +# Mark a story as failed with rejection reason +mark_story_reject() { + local story_id="$1" + local reason="$2" + local prd="$LOOP_DIR/prd.json" + + if command -v jq &>/dev/null; then + 
local updated + updated=$(jq --arg id "$story_id" --arg reason "$reason" \ + 'if any(.userStories[]; .id == $id) then (.userStories[] | select(.id == $id)) |= (.passes = false | .rejections = ((.rejections // 0) + 1) | .notes = ((.notes // "") + "\n[REJECTED] " + $reason)) else error("Story not found: \($id)") end' \ + "$prd" 2>&1) || { log "WARNING: mark_story_reject failed for '$story_id'"; return 1; } + printf '%s\n' "$updated" > "${prd}.tmp" && mv "${prd}.tmp" "$prd" + else + # Pass reason via env var to avoid shell injection from evaluator output + LOOP_STORY_ID="$story_id" LOOP_REASON="$reason" LOOP_PRD="$prd" python3 -c " +import json, pathlib, os +p = pathlib.Path(os.environ['LOOP_PRD']) +d = json.loads(p.read_text()) +story_id = os.environ['LOOP_STORY_ID'] +reason = os.environ['LOOP_REASON'] +for s in d['userStories']: + if s['id'] == story_id: + s['passes'] = False + s['rejections'] = s.get('rejections', 0) + 1 + s['notes'] = s.get('notes', '') + '\n[REJECTED] ' + reason + break +p.write_text(json.dumps(d, indent=2)) +" + fi +} + +# Get rejection count for a story +story_rejections() { + local story_id="$1" + local prd="$LOOP_DIR/prd.json" + + if command -v jq &>/dev/null; then + jq -r --arg id "$story_id" \ + '.userStories[] | select(.id == $id) | .rejections // 0' "$prd" + else + LOOP_PRD="$prd" LOOP_STORY_ID="$story_id" python3 -c " +import json, os +stories = json.load(open(os.environ['LOOP_PRD']))['userStories'] +story_id = os.environ['LOOP_STORY_ID'] +for s in stories: + if s['id'] == story_id: + print(s.get('rejections', 0), end='') + break +" + fi +} + +# Mark a story as blocked (needs human review, skip in future iterations) +mark_story_blocked() { + local story_id="$1" + local reason="$2" + local prd="$LOOP_DIR/prd.json" + + if command -v jq &>/dev/null; then + local updated + updated=$(jq --arg id "$story_id" --arg reason "$reason" \ + 'if any(.userStories[]; .id == $id) then (.userStories[] | select(.id == $id)) |= (.blocked = true | .notes = 
((.notes // "") + "\n[BLOCKED] " + $reason)) else error("Story not found: \($id)") end' \ + "$prd" 2>&1) || { log "WARNING: mark_story_blocked failed for '$story_id'"; return 1; } + printf '%s\n' "$updated" > "${prd}.tmp" && mv "${prd}.tmp" "$prd" + else + LOOP_STORY_ID="$story_id" LOOP_REASON="$reason" LOOP_PRD="$prd" python3 -c " +import json, pathlib, os +p = pathlib.Path(os.environ['LOOP_PRD']) +d = json.loads(p.read_text()) +story_id = os.environ['LOOP_STORY_ID'] +reason = os.environ['LOOP_REASON'] +for s in d['userStories']: + if s['id'] == story_id: + s['blocked'] = True + s['notes'] = s.get('notes', '') + '\n[BLOCKED] ' + reason + break +p.write_text(json.dumps(d, indent=2)) +" + fi +} + +# --- Progress --- + +MAX_PROGRESS_ENTRIES=15 + +# Append a progress entry, rotating old entries to archive when limit is reached +append_progress() { + local entry="$1" + local progress="$LOOP_DIR/progress.md" + + if [ ! -f "$progress" ]; then + cp "$LOOP_DIR/templates/progress.md.template" "$progress" 2>/dev/null || \ + printf "# Progress\n\n## Codebase Patterns\n\n---\n\n## Session Log\n" > "$progress" + fi + + printf "\n%s\n" "$entry" >> "$progress" + + rotate_progress +} + +# Archive old session log entries to keep progress.md from growing unbounded. +# Preserves the Codebase Patterns section and keeps only the last N entries. +rotate_progress() { + local progress="$LOOP_DIR/progress.md" + [ -f "$progress" ] || return + + # Count session entries by counting "### " headers after the Session Log marker. + # Using headers instead of "---" separators avoids false positives from markdown + # code blocks or horizontal rules inside entries. 
+
+  local entry_count
+  local session_start
+  session_start=$(grep -n '## Session Log' "$progress" | head -1 | cut -d: -f1)
+  if [ -z "$session_start" ]; then
+    return
+  fi
+  entry_count=$(tail -n +"$session_start" "$progress" | grep -c '^### ' 2>/dev/null || true)  # grep -c already prints 0 when nothing matches
+
+  if [ "$entry_count" -le "$MAX_PROGRESS_ENTRIES" ]; then
+    return
+  fi
+
+  local archive="$LOOP_DIR/progress-archive.md"
+
+  if command -v python3 &>/dev/null; then
+    LOOP_PROGRESS="$progress" LOOP_ARCHIVE="$archive" \
+      LOOP_MAX_ENTRIES="$MAX_PROGRESS_ENTRIES" python3 -c "
+import pathlib, os
+
+progress = pathlib.Path(os.environ['LOOP_PROGRESS'])
+archive = pathlib.Path(os.environ['LOOP_ARCHIVE'])
+max_entries = int(os.environ['LOOP_MAX_ENTRIES'])
+
+text = progress.read_text()
+
+# Split at 'Session Log' header
+if '## Session Log' not in text:
+    exit(0)
+
+header, session_log = text.split('## Session Log', 1)
+
+# Split entries by '---' separator
+# parts[0] is the preamble between '## Session Log' and the first '---'
+parts = session_log.split('\n---\n')
+preamble = parts[0]
+entries = parts[1:]
+
+if len(entries) <= max_entries:
+    exit(0)
+
+# Keep last max_entries, archive the rest
+to_archive = entries[:-max_entries]
+to_keep = entries[-max_entries:]
+
+# Append archived entries
+existing_archive = archive.read_text() if archive.exists() else '# Progress Archive\n'
+existing_archive += '\n---\n'.join(to_archive)
+archive.write_text(existing_archive)
+
+# Rewrite progress with header + preamble + kept entries
+progress.write_text(header + '## Session Log' + preamble + '\n---\n' + '\n---\n'.join(to_keep))
+"
+  else
+    # Bash fallback: rotate session log entries with archiving.
+    # Uses grep to locate "### " entry headers for accurate counting
+    # (avoids false positives from "---" separators inside entries).
+
+    local session_start
+    session_start=$(grep -n '## Session Log' "$progress" | head -1 | cut -d: -f1)
+    [ -z "$session_start" ] && return
+
+    # Extract header (everything up to and including "## Session Log" line)
+    local header_content
+    header_content=$(head -n "$session_start" "$progress")
+
+    # Extract session content and split into entries by "### " headers
+    local session_content
+    session_content=$(tail -n +"$((session_start + 1))" "$progress")
+
+    # Count entries by "### " headers
+    local entry_count
+    entry_count=$(echo "$session_content" | grep -c '^### ' 2>/dev/null || true)  # grep -c already prints 0 when nothing matches
+    [ "$entry_count" -le "$MAX_PROGRESS_ENTRIES" ] && return
+
+    # Find the line number (within session_content) of the Nth-from-last "### " header
+    local keep_from
+    keep_from=$(echo "$session_content" | grep -n '^### ' | tail -n "$MAX_PROGRESS_ENTRIES" | head -1 | cut -d: -f1)
+    [ -z "$keep_from" ] && return
+
+    # Archive older entries
+    local to_archive
+    to_archive=$(echo "$session_content" | head -n "$((keep_from - 1))")
+    if [ -n "$to_archive" ]; then
+      if [ -f "$archive" ]; then
+        printf '\n%s' "$to_archive" >> "$archive"
+      else
+        printf '# Progress Archive\n\n%s\n' "$to_archive" > "$archive"
+      fi
+    fi
+
+    # Keep recent entries
+    local kept_content
+    kept_content=$(echo "$session_content" | tail -n +"$keep_from")
+    printf '%s\n\n%s\n' "$header_content" "$kept_content" > "${progress}.tmp" \
+      && mv "${progress}.tmp" "$progress"
+  fi
+}
+
+# Get the branch name from prd.json
+prd_branch_name() {
+  local prd="$LOOP_DIR/prd.json"
+  [ -f "$prd" ] || return 1
+
+  if command -v jq &>/dev/null; then
+    jq -r '.branchName // empty' "$prd"
+  else
+    LOOP_PRD="$prd" python3 -c "
+import json, os
+print(json.load(open(os.environ['LOOP_PRD'])).get('branchName', ''), end='')
+"
+  fi
+}
diff --git a/loop.sh b/loop.sh
new file mode 100755
index 0000000..744e11e
--- /dev/null
+++ b/loop.sh
@@ -0,0 +1,403 @@
+#!/bin/bash
+# Autonomous AI agent loop orchestrator
+# Combines generator-evaluator 
architecture with iterative context-reset pattern. +# +# Usage: +# ./loop.sh [options] +# +# Options: +# --mode Operating mode (default: from config.json) +# --max Maximum iterations (default: from config.json) +# --skip-eval Skip evaluator pass +# --tool AI tool to use (default: from config.json) +# --no-hooks Don't install stop hooks +# --dry-run Print assembled prompts without running agents +# --resume Skip already-passed stories (explicit mode) +# --replan (reserved — not yet implemented) +# +# Each iteration: +# 1. Generator: picks highest-priority incomplete story, does the work +# 2. Evaluator: verifies the work, can PASS or REJECT +# Both get fresh context windows. Loop continues until all stories pass or max iterations. + +set -euo pipefail + +# --- Exit codes --- +EXIT_OK=0 # All stories complete +EXIT_ERROR=1 # Configuration or runtime error +EXIT_MAX_ITERATIONS=2 # Max iterations reached, work remains +EXIT_ALL_BLOCKED=3 # All remaining stories blocked for human review + +# --- Resolve paths --- +LOOP_DIR="$(cd "$(dirname "$0")" && pwd)" +PROJECT_ROOT="$(cd "$LOOP_DIR/.." && pwd)" +export LOOP_DIR PROJECT_ROOT + +# --- Lockfile (prevent concurrent runs) --- +LOCKFILE="$LOOP_DIR/.loop.lock" + +acquire_lock() { + # mkdir is atomic on POSIX — prevents race between check and create + if ! mkdir "$LOCKFILE" 2>/dev/null; then + local old_pid + # A missing pid file must not trip set -e; an empty PID is treated as a stale lock + old_pid=$(cat "$LOCKFILE/pid" 2>/dev/null) || true + if [ -n "$old_pid" ] && kill -0 "$old_pid" 2>/dev/null; then + echo "[loop] ERROR: Another loop instance is running (PID $old_pid)." + echo "[loop] If this is stale, remove $LOCKFILE and retry."
+ exit 1 + fi + # Stale lockfile — previous run crashed without cleanup + rm -rf "$LOCKFILE" + mkdir "$LOCKFILE" + fi + echo $$ > "$LOCKFILE/pid" +} + +release_lock() { + rm -rf "$LOCKFILE" +} + +acquire_lock + +# --- Source libraries --- +source "$LOOP_DIR/lib/hooks.sh" +source "$LOOP_DIR/lib/state.sh" +source "$LOOP_DIR/lib/archive.sh" +source "$LOOP_DIR/lib/prompt.sh" + +# --- Logging --- +log() { echo "[loop] $*"; } +log_header() { + echo "" + echo "═══════════════════════════════════════════════════════" + echo " $*" + echo "═══════════════════════════════════════════════════════" + echo "" +} + +# --- Preflight checks --- +if ! command -v jq &>/dev/null && ! command -v python3 &>/dev/null; then + log "ERROR: Either jq or python3 is required. Install one and retry." + exit 1 +fi + +# --- Load config defaults --- +CONFIG_FILE="$LOOP_DIR/config.json" +config_default() { get_config_value "$1" "$2"; } + +TOOL=$(config_default ".tool" "claude") +MODE=$(config_default ".mode" "implement") +MAX_ITERATIONS=$(config_default ".maxIterations" "20") +SKIP_EVAL=$(config_default ".skipEval" "false") +EVAL_RETRIES=$(config_default ".evalRetries" "2") +AUTO_HOOKS=$(config_default ".autoHooks" "true") +DRY_RUN=false +RESUME=false +# --- Parse CLI args (override config) --- +while [[ $# -gt 0 ]]; do + case $1 in + --mode) MODE="$2"; shift 2 ;; + --mode=*) MODE="${1#*=}"; shift ;; + --max) MAX_ITERATIONS="$2"; shift 2 ;; + --max=*) MAX_ITERATIONS="${1#*=}"; shift ;; + --skip-eval) SKIP_EVAL=true; shift ;; + --tool) TOOL="$2"; shift 2 ;; + --tool=*) TOOL="${1#*=}"; shift ;; + --no-hooks) AUTO_HOOKS=false; shift ;; + --dry-run) DRY_RUN=true; shift ;; + --resume) RESUME=true; shift ;; + --replan) log "ERROR: --replan is not yet implemented. Use /loop-plan interactively."; exit 1 ;; + [0-9]*) MAX_ITERATIONS="$1"; shift ;; + *) log "Unknown option: $1"; exit 1 ;; + esac +done + +export ITERATION=0 MAX_ITERATIONS MODE + +# --- Validate --- +if [[ ! 
"$MODE" =~ ^(implement|explore|fix)$ ]]; then + log "ERROR: Invalid mode '$MODE'. Must be: implement, explore, fix" + exit 1 +fi + +if [[ ! "$TOOL" =~ ^(claude|amp)$ ]]; then + log "ERROR: Invalid tool '$TOOL'. Must be: claude, amp" + exit 1 +fi + +# --- Setup --- +cd "$PROJECT_ROOT" + +cleanup() { + [ -n "${LOOP_AGENT_TMPFILE:-}" ] && rm -f "$LOOP_AGENT_TMPFILE" + [ "$AUTO_HOOKS" = true ] && remove_hooks + release_lock +} +LOOP_AGENT_TMPFILE="" + +if [ "$AUTO_HOOKS" = true ]; then + install_hooks +fi +trap cleanup EXIT INT TERM + +check_archive + +# Validate prd.json exists (AFTER archive check, which may delete it on branch change) +if [ ! -f "$LOOP_DIR/prd.json" ]; then + log "ERROR: No prd.json found. Run /loop-plan first to create one." + exit 1 +fi + +validate_prd + +# Run project init script if it exists +if [ -f "$LOOP_DIR/init.sh" ]; then + log "Running init.sh..." + bash "$LOOP_DIR/init.sh" +fi + +# Ensure correct git branch +BRANCH=$(prd_branch_name 2>/dev/null || echo "") +if [ -n "$BRANCH" ]; then + CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "") + if [ "$CURRENT_BRANCH" != "$BRANCH" ]; then + log "Switching to branch: $BRANCH" + git checkout "$BRANCH" 2>/dev/null || \ + git checkout -b "$BRANCH" "origin/$BRANCH" 2>/dev/null || \ + git checkout -b "$BRANCH" + fi +fi + +# --- Agent runner --- +# Runs a prompt through the selected AI tool and captures output. +# Output is displayed live via tee to /dev/tty (if available) and captured to a temp file. +# The function prints the captured output to stdout for the caller to capture. +run_agent() { + local prompt="$1" + local output_file + output_file=$(mktemp) + LOOP_AGENT_TMPFILE="$output_file" # exposed for trap cleanup + + # Determine whether we can display live output + local has_tty=false + if { true > /dev/tty; } 2>/dev/null; then + has_tty=true + fi + + # Run in subshell so a non-zero exit from the AI tool doesn't kill the loop. 
+ # The subshell inherits set -e but its exit status is captured, not propagated. + local agent_exit=0 + ( + case "$TOOL" in + claude) + if [ "$has_tty" = true ]; then + printf '%s\n' "$prompt" | timeout "${LOOP_AGENT_TIMEOUT:-600}" \ + claude --dangerously-skip-permissions --output-format text \ + --print 2>&1 | tee /dev/tty > "$output_file" + else + # Redirect stdout to the file first, then stderr, so both land in the file + printf '%s\n' "$prompt" | timeout "${LOOP_AGENT_TIMEOUT:-600}" \ + claude --dangerously-skip-permissions --output-format text \ + --print > "$output_file" 2>&1 + fi + ;; + amp) + if [ "$has_tty" = true ]; then + printf '%s\n' "$prompt" | timeout "${LOOP_AGENT_TIMEOUT:-600}" \ + amp --dangerously-allow-all 2>&1 | tee /dev/tty > "$output_file" + else + printf '%s\n' "$prompt" | timeout "${LOOP_AGENT_TIMEOUT:-600}" \ + amp --dangerously-allow-all > "$output_file" 2>&1 + fi + ;; + *) + log "ERROR: Unknown tool '$TOOL'" >&2 + exit 1 + ;; + esac + ) || agent_exit=$? + + if [ "$agent_exit" -ne 0 ] && [ ! -s "$output_file" ]; then + # Log to stderr: stdout is captured by the caller as the agent's output + log "WARNING: Agent exited with code $agent_exit and produced no output." >&2 + fi + + cat "$output_file" + rm -f "$output_file" + LOOP_AGENT_TMPFILE="" +} + +# --- Parse evaluator verdict --- +parse_verdict() { + local output="$1" + + if echo "$output" | grep -q "REJECT"; then + # Extract rejection reason (supports multiline) + local reason + reason=$(echo "$output" | sed -n '/<rejection_reason>/,/<\/rejection_reason>/p' \ + | sed '1s/.*<rejection_reason>//' | sed '$s/<\/rejection_reason>.*//' \ + | tr '\n' ' ' | sed 's/[ ][ ]*/ /g' | sed 's/^ //;s/ $//') + [ -z "$reason" ] && reason="Rejected without specific reason" + echo "REJECT:${reason}" + elif echo "$output" | grep -q "PASS"; then + echo "PASS" + else + # No explicit verdict — fail-safe: treat as reject so broken evaluators don't silently approve + # Log to stderr so the warning is not captured as part of the verdict string + log "WARNING: No verdict tag found in evaluator output. Treating as REJECT (fail-safe)." >&2
+ echo "REJECT:Evaluator produced no verdict tag — output may be malformed" + fi +} + +# --- Main loop --- +log_header "Loop Starting" +log "Mode: $MODE" +log "Tool: $TOOL" +log "Max iter: $MAX_ITERATIONS" +log "Eval: $([[ $SKIP_EVAL == true ]] && echo 'off' || echo 'on')" +log "Dry run: $([[ $DRY_RUN == true ]] && echo 'yes' || echo 'no')" +log "Project: $PROJECT_ROOT" +log "Stories: $(story_counts 2>/dev/null || echo 'N/A')" +echo "" + +while [ "$ITERATION" -lt "$MAX_ITERATIONS" ]; do + ITERATION=$((ITERATION + 1)) + export ITERATION + + # Check if all stories already pass + if all_stories_pass 2>/dev/null; then + log_header "All Stories Complete! ($(story_counts))" + snapshot_for_archive + exit 0 + fi + + # Capture which story the generator will work on (highest-priority incomplete) + CURRENT_STORY_ID=$(next_story_id 2>/dev/null || echo "") + export CURRENT_STORY_ID + + # No actionable story — all remaining are passed or blocked + if [ -z "$CURRENT_STORY_ID" ]; then + if [ "$RESUME" = true ]; then + log "Resume mode: no actionable stories remaining." + else + log "No actionable stories remaining (all passed or blocked)." + fi + snapshot_for_archive + if any_stories_blocked 2>/dev/null; then + log "Some stories are blocked and need human review. Run /loop-triage for details." 
+ exit $EXIT_ALL_BLOCKED + fi + exit $EXIT_OK + fi + + # Capture git state before generator runs (for evaluator diff) + PRE_GENERATOR_SHA=$(git rev-parse HEAD 2>/dev/null || echo "") + export PRE_GENERATOR_SHA + + # --- Generator pass --- + log_header "Iteration $ITERATION / $MAX_ITERATIONS — GENERATOR${CURRENT_STORY_ID:+ ($CURRENT_STORY_ID)}" + + GENERATOR_PROMPT=$(build_prompt "generator" "$MODE") + + # --dry-run: print prompts and exit without running agents + if [ "$DRY_RUN" = true ]; then + log "=== GENERATOR PROMPT ===" + printf '%s\n' "$GENERATOR_PROMPT" + echo "" + if [ "$SKIP_EVAL" != true ] && [ -n "$CURRENT_STORY_ID" ]; then + EVAL_PROMPT=$(build_prompt "evaluator" "$MODE") + log "=== EVALUATOR PROMPT ===" + printf '%s\n' "$EVAL_PROMPT" + fi + log "Dry run complete. Showing prompts for story: ${CURRENT_STORY_ID:-unknown}" + exit 0 + fi + + GENERATOR_OUTPUT=$(run_agent "$GENERATOR_PROMPT") + + if [ -z "$GENERATOR_OUTPUT" ]; then + log "WARNING: Generator produced empty output (timeout or crash). Skipping to next iteration." + continue + fi + + # --- Scope budget check --- + # Verify the generator stayed within configured limits (files modified, lines written). + # Advisory in implement/fix modes (log warning), but enforced as rejection reason for evaluator. 
+ if [ -n "$PRE_GENERATOR_SHA" ] && [ "$PRE_GENERATOR_SHA" != "" ]; then + SCOPE_FILES_MODIFIED=$(git diff --name-only "$PRE_GENERATOR_SHA" HEAD 2>/dev/null | wc -l | tr -d ' ') + SCOPE_LINES_WRITTEN=$(git diff --stat "$PRE_GENERATOR_SHA" HEAD 2>/dev/null | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0") + + MAX_MODIFY=$(config_default ".scopeBudgets.${MODE}.maxFilesToModify" "10") + MAX_WRITE=$(config_default ".scopeBudgets.${MODE}.maxLinesToWrite" "500") + + if [ "${SCOPE_FILES_MODIFIED:-0}" -gt "$MAX_MODIFY" ]; then + log "WARNING: Scope budget exceeded — modified $SCOPE_FILES_MODIFIED files (limit: $MAX_MODIFY)" + fi + if [ "${SCOPE_LINES_WRITTEN:-0}" -gt "$MAX_WRITE" ]; then + log "WARNING: Scope budget exceeded — wrote $SCOPE_LINES_WRITTEN lines (limit: $MAX_WRITE)" + fi + + export SCOPE_FILES_MODIFIED SCOPE_LINES_WRITTEN + fi + + # Check for completion sentinel + if echo "$GENERATOR_OUTPUT" | grep -q "COMPLETE"; then + log_header "Generator signaled COMPLETE ($(story_counts))" + snapshot_for_archive + exit 0 + fi + + # --- Evaluator pass --- + if [ "$SKIP_EVAL" != true ]; then + log_header "Iteration $ITERATION / $MAX_ITERATIONS — EVALUATOR${CURRENT_STORY_ID:+ ($CURRENT_STORY_ID)}" + + if [ -z "$CURRENT_STORY_ID" ]; then + log "WARNING: No actionable story ID found. Skipping evaluator." + continue + fi + + EVAL_PROMPT=$(build_prompt "evaluator" "$MODE") + EVAL_OUTPUT=$(run_agent "$EVAL_PROMPT") + + if [ -z "$EVAL_OUTPUT" ]; then + log "WARNING: Evaluator produced empty output (timeout or crash). Treating as REJECT." 
+ EVAL_OUTPUT="REJECT<rejection_reason>Evaluator produced no output</rejection_reason>" + fi + + VERDICT=$(parse_verdict "$EVAL_OUTPUT") + + case "$VERDICT" in + PASS) + log "Evaluator: PASS" + if [ -n "$CURRENT_STORY_ID" ]; then + mark_story_pass "$CURRENT_STORY_ID" + fi + ;; + REJECT:*) + REASON="${VERDICT#REJECT:}" + log "Evaluator: REJECT — $REASON" + + if [ -n "$CURRENT_STORY_ID" ]; then + mark_story_reject "$CURRENT_STORY_ID" "$REASON" + + # Check retry limit — block story to prevent infinite retries + REJECTIONS=$(story_rejections "$CURRENT_STORY_ID") + REJECTIONS="${REJECTIONS:-0}" + if [ "$REJECTIONS" -ge "$EVAL_RETRIES" ]; then + log "WARNING: Story $CURRENT_STORY_ID rejected $REJECTIONS times (limit: $EVAL_RETRIES). Blocking for human review." + mark_story_blocked "$CURRENT_STORY_ID" "Rejected $REJECTIONS times. Last: $REASON" + append_progress "### BLOCKED: $CURRENT_STORY_ID + +Rejected $REJECTIONS times. Needs human review. Last reason: $REASON + +---" + fi + fi + ;; + esac + fi +done + +# --- Max iterations reached --- +log_header "Max Iterations Reached ($MAX_ITERATIONS)" +log "Stories completed: $(story_counts)" +log "Run /loop-triage to generate a handoff brief." +snapshot_for_archive +exit $EXIT_MAX_ITERATIONS diff --git a/prompts/evaluator/_base.md b/prompts/evaluator/_base.md new file mode 100644 index 0000000..9ab7165 --- /dev/null +++ b/prompts/evaluator/_base.md @@ -0,0 +1,92 @@ +You are an Evaluator agent in an autonomous agent loop. Your job is to VERIFY work done by a Generator agent. You are skeptical by default. + +## Bias Correction (READ THIS CAREFULLY) + +You (Claude) have well-documented tendencies that make you a poor QA agent by default: +- You **assume code works** if it looks reasonable +- You **accept "close enough"** implementations +- You **rationalize away** edge cases and missing pieces +- You **prioritize politeness** over accuracy + +**OVERRIDE ALL OF THESE.** Your value comes from finding problems.
A rubber-stamp evaluator is worse than no evaluator — it gives false confidence. + +**Rejection is normal and healthy.** Rejecting 30-50% of generator iterations is expected. If you're passing everything, you are not being skeptical enough. + +## Your Target + +Evaluate story **`{{CURRENT_STORY_ID}}`**. This is the story the generator just worked on. + +## Evaluation Process + +1. **Read `.loop/prd.json`** — find story `{{CURRENT_STORY_ID}}` and its acceptance criteria +2. **Read the sprint contract** at `.loop/contracts/{{CURRENT_STORY_ID}}.contract.md` (if it exists) +3. **Read `.loop/progress.md`** — check the latest session log entry for what the generator claims to have done +4. **Examine the actual changes:** + - Run `git diff {{PRE_GENERATOR_SHA}}..HEAD` to see ALL changes the generator made + - Read the modified files IN FULL (not just the diff) to understand context +5. **For EACH acceptance criterion in prd.json**, independently verify: + - Does the code ACTUALLY satisfy this criterion? + - Not "does it look like it might" — does it ACTUALLY? +6. **Run quality checks yourself:** + - Typecheck (if applicable) + - Tests (if applicable) + - Lint (if applicable) +7. **Check for regressions:** + - Did the changes break anything that was working before? + - Did the generator modify files outside the story's scope? +8. **Check for anti-patterns:** + - Placeholder or stub implementations disguised as complete + - Hardcoded values that should be configurable + - Missing error handling at system boundaries + - Security issues (hardcoded secrets, unsanitized input, SQL injection) + +## Verdict Format + +You MUST end your response with EXACTLY ONE of these verdict blocks: + +### If the story genuinely passes all criteria: + +``` +PASS +``` + +### If any criterion is not met or issues are found: + +``` +REJECT +<rejection_reason> +[Specific, actionable description of what failed and why. +Include file paths and line numbers.
+Be concrete — "the function doesn't handle null input" not "there might be edge cases".] +</rejection_reason> +``` + +## What Warrants Rejection + +- ANY acceptance criterion not actually met (not "mostly met" — MET) +- Tests fail +- Typecheck fails +- Placeholder/stub code left in place +- Security vulnerability introduced +- Regression in existing functionality +- Contract's Done Conditions not satisfied (if contract exists) + +## What Does NOT Warrant Rejection + +- Code style preferences (as long as it matches project conventions) +- Minor naming choices +- Missing optimization that wasn't in the criteria +- Absence of features not in the story scope + +## Scope Budget + +- Maximum files to read: {{MAX_FILES_TO_READ}} +- Focus your verification on the files the generator changed +- You do NOT need to read the entire codebase + +## Current State + +- Iteration: {{ITERATION}} of {{MAX_ITERATIONS}} +- Mode: {{MODE}} +- Project root: {{PROJECT_ROOT}} +- Loop directory: {{LOOP_DIR}} diff --git a/prompts/evaluator/explore.md b/prompts/evaluator/explore.md new file mode 100644 index 0000000..ef6ec4a --- /dev/null +++ b/prompts/evaluator/explore.md @@ -0,0 +1,49 @@ +# Mode: Explore — Evaluator + +You are evaluating an analysis/exploration task. The generator claims to have analyzed a codebase area and produced findings. + +## Read-Only Enforcement (CHECK FIRST) + +Before any other checks, verify explore mode's read-only constraint: +1. Run `git diff {{PRE_GENERATOR_SHA}}..HEAD --name-only` +2. If ANY file outside `.loop/triage/` was modified or committed, **REJECT immediately** — explore mode is read-only. The generator must not modify host project files. + +## Exploration-Specific Checks + +1. **Read the analysis output** at `.loop/triage/{story-id}-analysis.md` +2. **Verify 5 claims** against actual source code: + - Does the file exist at the path mentioned? + - Does the code behave as described? + - Are the line counts roughly accurate?
+ - Are the "Issues Found" real issues or false alarms? + - Are the recommendations actionable? +3. **Check for omissions:** + - Did the generator miss obvious files in the area? + - Are there important code paths not covered? + - Are there recent git commits that change the analysis? + +## Claim Verification Format + +Before giving your verdict, document what you checked: + +``` +Claims Verified: +- [CONFIRMED] [claim] — verified in [file:line] +- [INCORRECT] [claim] — actual behavior is [what you found] +- [UNVERIFIABLE] [claim] — could not confirm (file missing, ambiguous) +``` + +## Grading Criteria + +- **Accuracy**: How many claims are correct? (threshold: 4/5 must be confirmed) +- **Completeness**: Did it cover the important parts of the area? +- **Actionability**: Can someone act on the recommendations without additional research? + +## Rejection Criteria + +Reject if: +- Fewer than 4 of 5 verified claims are accurate +- The analysis references files that don't exist +- Key files in the area were completely missed +- Recommendations are vague ("improve error handling") rather than specific ("add null check in auth.ts:42") +- The analysis appears to be based on assumptions rather than code reading diff --git a/prompts/evaluator/fix.md b/prompts/evaluator/fix.md new file mode 100644 index 0000000..565f81b --- /dev/null +++ b/prompts/evaluator/fix.md @@ -0,0 +1,34 @@ +# Mode: Fix — Evaluator + +You are evaluating a bug fix or tech debt reduction. The generator claims to have fixed an issue. + +## Fix-Specific Checks + +1. **Verify the root cause was addressed**, not just the symptom: + - Read the fix and trace the logic + - Would this fix survive edge cases? + - Did the generator patch around the bug or fix the actual cause? + +2. **Verify a regression test exists:** + - Is there a new or updated test? + - Does the test actually reproduce the original bug scenario? + - Would the test fail if the fix were reverted? + +3. 
**Check for regressions (CRITICAL for fix mode):** + - Run the full test suite, not just the new test + - Check that the fix doesn't change behavior for non-bug cases + - Look for side effects in shared code paths + +4. **Verify minimal diff:** + - Did the generator change only what was necessary? + - Are there unrelated changes mixed in? + - Is the refactor scope proportional to the debt item? + +## Rejection Criteria (Fix-Specific) + +- Fix addresses symptom but not root cause +- No regression test added +- Existing tests fail after the fix +- Unrelated changes included in the commit +- Fix introduces a new bug or security issue +- For refactors: external behavior changed (API contract, return values, side effects) diff --git a/prompts/evaluator/implement.md b/prompts/evaluator/implement.md new file mode 100644 index 0000000..c911f87 --- /dev/null +++ b/prompts/evaluator/implement.md @@ -0,0 +1,31 @@ +# Mode: Implement — Evaluator + +You are evaluating an implementation story. The generator claims to have built a feature. + +## Implementation-Specific Checks + +In addition to the base evaluation process: + +1. **Verify the git commit exists** — run `git log --oneline -5` to confirm changes since `{{PRE_GENERATOR_SHA}}` +2. **Check commit scope** — does `git diff {{PRE_GENERATOR_SHA}}..HEAD --name-only` only contain files relevant to this story? +3. **Read the actual test output** — if the generator claims tests pass, verify by running them yourself +4. **For UI stories:** + - Check that the component actually renders (not just that it exists) + - Verify event handlers are wired up (not just defined) + - Check accessibility basics (labels, semantic elements) +5. **For API stories:** + - Verify the endpoint is registered in the router + - Check request/response types match the contract + - Verify error handling returns appropriate status codes +6. 
**For database stories:** + - Verify migration runs cleanly + - Check indexes are created for query patterns + - Verify foreign key constraints + +## Common Generator Failures to Watch For + +- Created the file but didn't wire it into the application (route not registered, component not imported) +- Tests exist but don't actually assert meaningful behavior +- "Passes typecheck" but only because types are `any` or too loose +- UI component renders but doesn't respond to interaction +- API endpoint exists but returns hardcoded/mock data diff --git a/prompts/generator/_base.md b/prompts/generator/_base.md new file mode 100644 index 0000000..6181efe --- /dev/null +++ b/prompts/generator/_base.md @@ -0,0 +1,68 @@ +You are a Generator agent in an autonomous agent loop. Each iteration you complete ONE task, then stop. A fresh instance of you runs each iteration — you have no memory of previous iterations except what's written in artifacts. + +## Startup Sequence + +1. **Read `.loop/progress.md`** — check the **Codebase Patterns** section first (top of file), then skim recent session log entries for context +2. **Read `.loop/prd.json`** — find the highest-priority story where `passes: false` +3. **Read the sprint contract** for that story at `.loop/contracts/{story-id}.contract.md` (if it exists) +4. **Check the story's `notes` field** — if it contains `[REJECTED]` entries, those are feedback from a previous evaluator. Address the specific issues raised. +5. **Confirm the git branch** — the loop has already checked you out on the correct branch per `prd.json.branchName`. Run `git branch --show-current` to verify if needed. + +## Work Rules + +- **ONE story per iteration.** Do not attempt multiple stories. +- **Read before writing.** Understand existing code before modifying it. Search for existing implementations before creating new ones. +- **Follow existing patterns.** Check Codebase Patterns in progress.md. Match the project's style, naming, and structure. 
+- **No placeholders.** Every implementation must be complete and functional. If a story is too large, stop and note what remains — do NOT leave stub/placeholder code. +- **Commit after completing the story.** Message format: `feat: [Story ID] - [Story Title]` + +## Quality Gates + +Before marking a story as complete: +- Run the project's type checker (if applicable) +- Run the project's test suite (if applicable) +- Run the project's linter (if applicable) +- All must pass. If they fail, fix the issues before committing. + +## After Completing the Story + +1. **Update `.loop/prd.json`** — set `passes: true` for the completed story (the harness also sets this on evaluator PASS as a safety net, but you should still do it) +2. **Append to `.loop/progress.md`** with this format: + +``` +### [Story ID] — [Story Title] +Date: YYYY-MM-DD HH:MM + +**What was done:** +- Bullet points of changes made + +**Files changed:** +- path/to/file.ext — brief description + +**Learnings for future iterations:** +- Patterns discovered, gotchas encountered, useful context + +--- +``` + +3. **Update Codebase Patterns** (top of progress.md) if you discovered a reusable pattern +4. **Update AGENTS.md/CLAUDE.md** in modified directories if you discovered genuinely reusable knowledge (API conventions, non-obvious requirements, testing approaches) + +## Completion Signal + +- If ALL stories in prd.json have `passes: true`, respond with: `COMPLETE` +- Otherwise, end your response normally. The next iteration will pick up the next story. + +## Scope Budget + +- Maximum files to read: {{MAX_FILES_TO_READ}} +- Maximum lines to write: {{MAX_LINES_TO_WRITE}} +- Maximum files to modify: {{MAX_FILES_TO_MODIFY}} +- If you approach a limit, stop and note what remains in progress.md. 
+ +## Current State + +- Iteration: {{ITERATION}} of {{MAX_ITERATIONS}} +- Mode: {{MODE}} +- Project root: {{PROJECT_ROOT}} +- Loop directory: {{LOOP_DIR}} diff --git a/prompts/generator/explore.md b/prompts/generator/explore.md new file mode 100644 index 0000000..34ab239 --- /dev/null +++ b/prompts/generator/explore.md @@ -0,0 +1,62 @@ +# Mode: Explore (Read-Only) + +You are analyzing an existing codebase to build understanding. You are NOT writing code. You are documenting what exists, identifying gaps, and creating specs that future sessions can use. + +## Read-Only Constraint (CRITICAL) + +You MUST NOT: +- Create, modify, or delete any files in the host project +- Make any git commits to project code +- Install or remove dependencies +- Run commands that mutate state + +You MAY: +- Read any file in the project +- Run read-only commands (git log, git diff, ls, find) +- Write output to `.loop/triage/` directory only + +## Exploration Workflow + +1. Read the story from prd.json — it describes what area to analyze +2. Read the relevant source code (not existing docs — verify against code) +3. Write your findings to `.loop/triage/{story-id}-analysis.md` +4. Mark the story as `passes: true` in prd.json +5. 
Append to progress.md + +## Analysis Output Format + +Write to `.loop/triage/{story-id}-analysis.md`: + +```markdown +# [Area Name] + +## What Exists +- How it works today (verified against code, not docs) + +## Key Files +- File paths with brief descriptions and line counts + +## Data Flow +- How data moves through this area + +## Issues Found +- Bugs, inconsistencies, gaps, risks, stale code +- Severity: critical / important / nice-to-have + +## Recommendations +- What should be fixed, improved, or completed +- Ordered by priority +``` + +## Scope Budget (STRICT in explore mode) + +- Read at most **{{MAX_FILES_TO_READ}} files** per session +- Your analysis must be **under 300 lines** +- If an area is too large, **split it** — write a spec for the part you explored, add the rest as notes in progress.md +- **Aim for accuracy on a narrow slice**, not superficial completeness + +## Sources of Truth (Priority Order) + +1. **The code itself** — always verify against source +2. **Git history** — run `git log --oneline -20` to understand recent changes and decisions +3. **Existing docs** — treat as potentially stale hints. Note contradictions in your analysis. diff --git a/prompts/generator/fix.md b/prompts/generator/fix.md new file mode 100644 index 0000000..e89d7c1 --- /dev/null +++ b/prompts/generator/fix.md @@ -0,0 +1,26 @@ +# Mode: Fix + +You are fixing bugs or reducing tech debt from a prioritized list. Each story is a targeted fix. + +## Fix Workflow + +1. Read the story — it describes the specific bug or debt item +2. Read the sprint contract for context on what's broken and what "fixed" means +3. **Understand the root cause before changing anything.** Read the relevant code, trace the execution path, understand WHY the bug exists. +4. Make the minimal change to fix the issue +5. Write or update a test that would have caught this bug +6. Run quality gates +7. 
Commit + +## Constraints + +- **Fix only what the story describes.** Do not fix adjacent issues, even if you notice them. Note them in progress.md for future iterations. +- **Minimal diff.** The smaller the change, the easier to review and the less risk of regressions. +- **Add a regression test.** Every bug fix should include a test that reproduces the bug and verifies the fix. If no test framework exists, note this in progress.md. +- **Preserve behavior.** For tech debt refactors, the external behavior must not change. Only internal structure should improve. + +## Git Workflow + +- Commit message format: `fix: [Story ID] - [Story Title]` +- For tech debt: `refactor: [Story ID] - [Story Title]` +- Stage only the files you changed diff --git a/prompts/generator/implement.md b/prompts/generator/implement.md new file mode 100644 index 0000000..8ce0ef5 --- /dev/null +++ b/prompts/generator/implement.md @@ -0,0 +1,37 @@ +# Mode: Implement + +You are building features from a PRD. Each story is a small, self-contained unit of work. + +## Implementation Workflow + +1. Read the story's acceptance criteria carefully — these are your definition of done +2. If a sprint contract exists, follow its **Done Conditions** exactly +3. Plan your approach before writing code: + - What files need to change? + - What existing code can you reuse? + - What's the minimal change to satisfy the criteria? +4. Implement the story +5. Run quality gates (typecheck, lint, test) +6. Commit with a descriptive message +7. Mark the story as passed + +## Constraints + +- **Minimal changes only.** Do not refactor surrounding code. Do not add features beyond the story scope. +- **Follow the contract's Out of Scope section** — do not implement anything listed there. +- **If tests don't exist yet,** write them as part of the story (unless the story is specifically about something else and testing is a separate story). 
+- **If you need a dependency,** install it and note it in progress.md so future iterations know. + +## Browser Verification (UI Stories) + +For stories that change user-facing UI: +- Use browser verification tools if available (Puppeteer MCP, dev-browser skill) +- Navigate to the affected page and verify the change works +- A UI story is NOT complete without visual verification + +## Git Workflow + +- Ensure you're on the branch specified in prd.json +- Stage only the files you changed (not `git add .`) +- Commit message: `feat: [Story ID] - [Story Title]` +- Do NOT push — the loop handles that diff --git a/prompts/planner/plan.md b/prompts/planner/plan.md new file mode 100644 index 0000000..e2fc3ed --- /dev/null +++ b/prompts/planner/plan.md @@ -0,0 +1,42 @@ +# Planner Context + +This file is loaded by the `/loop-plan` skill to provide additional context for PRD generation. + +## Story Decomposition Guidelines + +When breaking a feature into stories, think about: + +### Independence +Each story should be independently deployable. After completing story N, the codebase should be in a valid, working state — even if the feature isn't fully built yet. + +### Context Window Fit +A story must fit in a single AI context window (~100K tokens). This means: +- Reading relevant existing code +- Understanding the task +- Implementing the change +- Writing tests +- Running quality checks +- Committing + +Budget roughly: +- 30% of context for reading/understanding +- 40% for implementation +- 20% for testing and quality +- 10% for bookkeeping (prd.json, progress.md) + +### Failure Isolation +If a story fails (evaluator rejects it), the next iteration should be able to retry it cleanly. Stories with too many moving parts are hard to retry because partial state is messy. + +### Evaluability +Every story must have criteria the evaluator can independently verify. "The code is clean" is not evaluable. "The function returns 404 when the user doesn't exist" is evaluable. 
+ +## PRD Anti-Patterns + +Avoid these common mistakes: + +- **Stories too large:** "Build the API" — split into individual endpoints +- **Stories too small:** "Create the file" — combine with meaningful work in that file +- **Vague criteria:** "Works correctly" — what does correctly mean? Be specific. +- **Missing dependencies:** Story 5 needs Story 3's database table but doesn't say so +- **Testing as afterthought:** Tests should be part of each story, not a separate "add tests" story at the end +- **UI without backend:** A UI story that calls an API that doesn't exist yet diff --git a/skills/loop-init/SKILL.md b/skills/loop-init/SKILL.md new file mode 100644 index 0000000..12f1e1d --- /dev/null +++ b/skills/loop-init/SKILL.md @@ -0,0 +1,141 @@ +--- +name: init +description: Initialize the agent loop harness in the current project. Scaffolds .loop/ directory, detects tech stack, picks mode, generates config, and flows into planning. +--- + +# /init — Initialize Agent Loop for a Project + +Set up the agent loop harness in the current project. This is the entry point for first-time use. + +## What This Skill Does + +1. Scaffolds the `.loop/` directory with prompts, templates, and lib scripts from the plugin +2. Analyzes the project to understand its tech stack, structure, and conventions +3. Asks the user what they want to accomplish (explore, implement, or fix) +4. Creates project-specific configuration (`config.json`, `init.sh`) +5. Flows into planning to generate the PRD and sprint contracts + +## Instructions + +When the user invokes this skill, follow this sequence: + +### Step 0: Scaffold .loop/ Directory + +Check if `.loop/` already exists in the project root. + +**If it does NOT exist**, create it by copying from the plugin: + +1. The plugin's root directory is available at `${CLAUDE_PLUGIN_ROOT}`. 
Copy the harness files: + +```bash +mkdir -p .loop +cp -r "${CLAUDE_PLUGIN_ROOT}/prompts" .loop/ +cp -r "${CLAUDE_PLUGIN_ROOT}/templates" .loop/ +cp -r "${CLAUDE_PLUGIN_ROOT}/lib" .loop/ +cp "${CLAUDE_PLUGIN_ROOT}/loop.sh" .loop/ +chmod +x .loop/loop.sh +``` + +**IMPORTANT:** If `${CLAUDE_PLUGIN_ROOT}` is not set or the path doesn't exist, look for the files in the plugin's own directory structure. The prompts, templates, and lib directories are bundled with this plugin. + +2. Create `.loop/.gitignore` with runtime artifacts: + +``` +prd.json +progress.md +progress-archive.md +config.json +init.sh +contracts/ +triage/ +archive/ +.archive-staging/ +.last-branch +.loop.lock +``` + +**If `.loop/` already exists**, ask the user if they want to re-initialize (which resets config but preserves prd.json/progress.md if they exist). + +### Step 1: Project Discovery + +Read the project to understand what we're working with: +- Check for `CLAUDE.md`, `AGENTS.md`, `README.md` at the project root +- Check for `package.json`, `Cargo.toml`, `pyproject.toml`, `go.mod`, `Package.swift`, `composer.json` to identify the tech stack +- Run `ls` on the project root to see the top-level structure + +Present a brief summary: +> "I see this is a [language/framework] project with [key characteristics]. The main source is in [dir/]." + +### Step 2: Mode Selection + +Ask the user: + +> **What would you like to do?** +> +> a) **Explore** — Analyze the codebase to understand what exists, find issues, and document the system. No code changes. +> b) **Implement** — Build a new feature from a PRD. Code changes, commits, and tests. +> c) **Fix** — Work through a list of bugs or tech debt items. Targeted code changes. + +### Step 3: Clarifying Questions + +Based on the mode, ask 3-5 questions: + +**For Explore:** +- "What areas are you most interested in? (e.g., auth, database, API, frontend, everything)" +- "Are there known problem areas you want me to focus on?" 
+- "How many exploration sessions should I budget? (default: 20)" + +**For Implement:** +- "Describe the feature you want to build (1-3 sentences is fine)" +- "Are there any architectural constraints I should know about?" +- "Should I follow any specific patterns from the existing codebase?" + +**For Fix:** +- "Do you have a list of issues, or should I find them?" +- "Any areas that are off-limits for changes?" +- "What's the priority: security, stability, or code quality?" + +### Step 4: Generate Configuration + +Create `.loop/config.json` based on the project and user's answers: + +```json +{ + "tool": "claude", + "mode": "", + "maxIterations": , + "skipEval": false, + "evalRetries": 2, + "autoHooks": true, + "branchPrefix": "loop/", + "scopeBudgets": { + // Set based on project size and mode + } +} +``` + +Create `.loop/init.sh` with project-specific setup commands: +- Dev server startup (if applicable) +- Test runner command +- Type checker command +- Linter command +- Any environment setup needed + +Make `init.sh` executable. + +### Step 5: Flow into Planning + +Tell the user: +> "Project configured. Now let's plan the work." + +Then invoke the `/agent-loop:plan` skill to generate the PRD and sprint contracts. + +### Step 6: Ready to Run + +Once planning is complete, tell the user: +> "Everything is set up. To start the loop:" +> ``` +> /agent-loop:run # Interactive (recommended) — visible, can intervene +> .loop/loop.sh # Headless — fully autonomous +> ``` +> You can monitor progress in `.loop/progress.md` and check story status in `.loop/prd.json`. diff --git a/skills/loop-plan/SKILL.md b/skills/loop-plan/SKILL.md new file mode 100644 index 0000000..7fccb0f --- /dev/null +++ b/skills/loop-plan/SKILL.md @@ -0,0 +1,188 @@ +--- +name: plan +description: Interactive planning session that generates PRD (prd.json) and sprint contracts for the agent loop. Run /agent-loop:init first. 
+--- + +# /plan — Generate PRD and Sprint Contracts + +Interactive planning session that produces all artifacts needed for the autonomous agent loop. + +## Prerequisites + +- `.loop/` directory must exist with `config.json` (run `/agent-loop:init` first if not) +- User should have a clear idea of what they want to build/explore/fix + +## Usage + +``` +/agent-loop:plan +``` + +Examples: +- `/agent-loop:plan Add OAuth authentication with Google and GitHub` +- `/agent-loop:plan Explore the payment processing system` +- `/agent-loop:plan Fix all critical security issues from the audit` + +## Instructions + +### Step 1: Understand the Request + +If the user provided a feature description, use it. Otherwise ask: +> "What would you like to work on? Describe it in 1-3 sentences." + +### Step 2: Codebase Analysis + +Read key project files to understand existing patterns: +- Relevant source directories for the feature +- Existing tests to understand testing patterns +- Configuration files for conventions +- Recent git history (`git log --oneline -20`) for active work + +### Step 3: Clarifying Questions + +Ask 3-5 targeted questions based on what you found in the code. These should be questions where the answer isn't obvious from the codebase. Examples: + +- "I see you have both REST endpoints and GraphQL. Should this feature use REST or GraphQL?" +- "The existing auth uses JWT. Should I add OAuth alongside it or replace it?" +- "I found two competing patterns for data validation. Which should I follow?" + +**Do NOT ask questions you can answer from the code.** Only ask when human judgment is needed. + +### Step 4: Generate PRD (`prd.json`) + +Create `.loop/prd.json` with properly-sized, dependency-ordered stories.
+ +**Story Sizing Rules (CRITICAL):** +- Each story must be completable in ONE context window (~100K tokens of work) +- Target: 1-3 files changed per story +- Too big: "Build the authentication system" → split into migration, endpoint, middleware, UI, tests +- Too small: "Add import statement" → combine with the story that needs it + +**Dependency Ordering:** +1. Schema/database changes first (they block everything) +2. Backend logic (depends on schema) +3. Frontend components (depend on backend) +4. Integration/wiring (depends on components) +5. Polish/edge cases (depends on core being done) + +**Required Fields Per Story:** +```json +{ + "id": "US-001", + "title": "Short descriptive title", + "description": "As a [role], I want [feature] so that [benefit].", + "acceptanceCriteria": [ + "Specific, verifiable criterion", + "Another criterion", + "Typecheck passes" + ], + "priority": 1, + "passes": false, + "notes": "", + "rejections": 0 +} +``` + +**Acceptance Criteria Rules:** +- Every criterion must be independently verifiable (not "works well" — "returns 200 with valid token") +- Always include "Typecheck passes" (or equivalent for the language) +- UI stories must include "Verify UI renders and responds to interaction" +- API stories must include status code expectations +- Database stories must include migration success check + +### Step 5: Generate Sprint Contracts + +For each story, create `.loop/contracts/{story-id}.contract.md`: + +```markdown +# Sprint Contract: {Story ID} — {Story Title} + +## What Will Be Built +Concrete description of the deliverable. Not the user story — the actual thing being built. 
+ +## Done Conditions +- [ ] Condition 1 (specific, testable) +- [ ] Condition 2 +- [ ] All acceptance criteria from prd.json met + +## Evaluation Criteria +What the evaluator will specifically check: +- [ ] Check 1 +- [ ] Check 2 +- [ ] No regressions in [specific area] + +## Out of Scope +Things explicitly NOT part of this story: +- Thing 1 +- Thing 2 + +## Key Files +Files likely to be created or modified: +- path/to/file.ext — what changes +- path/to/other.ext — what changes + +## Dependencies +- Depends on: [story IDs that must be done first, or "none"] +- Blocks: [story IDs that depend on this one, or "none"] +``` + +### Step 6: Initialize Progress File + +Create `.loop/progress.md` from the template with an initial Codebase Patterns section populated from what you learned during analysis: + +```markdown +# Progress + +## Codebase Patterns + +- [Pattern you discovered during analysis] +- [Convention you noticed] +- [Testing approach used in the project] + +--- + +## Session Log + +### Planning Session +Date: YYYY-MM-DD HH:MM + +**PRD created:** {N} stories for "{feature description}" +**Estimated iterations:** {N stories + ~30% for evaluator rejections} +**Key decisions:** +- [Decision 1 and why] +- [Decision 2 and why] + +--- +``` + +### Step 7: Present Summary + +Show the user a summary: + +> **Plan Ready** +> +> | Stories | Est. Iterations | Mode | Branch | +> |---------|----------------|------|--------| +> | {N} | {N+30%} | {mode} | {branchName} | +> +> **Story Overview:** +> 1. US-001: {title} (priority 1) +> 2. US-002: {title} (priority 2) +> ... +> +> Review the stories in `.loop/prd.json` and contracts in `.loop/contracts/`. +> Adjust anything you'd like, then run: +> ``` +> /agent-loop:run # Interactive (recommended) +> .loop/loop.sh # Headless +> ``` + +### Step 8: Wait for Feedback + +Let the user review and adjust. 
They might: +- Ask to split a story further +- Ask to reorder priorities +- Ask to add/remove stories +- Ask to change acceptance criteria + +Make the requested changes, then re-present the summary. diff --git a/skills/loop-run/SKILL.md b/skills/loop-run/SKILL.md new file mode 100644 index 0000000..275c81f --- /dev/null +++ b/skills/loop-run/SKILL.md @@ -0,0 +1,203 @@ +--- +name: run +description: Execute the generator-evaluator loop interactively inside Claude Code. Dispatches subagents with full visibility and intervention capability. Run /agent-loop:init and /agent-loop:plan first. +--- + +# /run — Execute Agent Loop Inside Claude Code + +Run the generator-evaluator loop natively in Claude Code using subagents. Unlike `loop.sh` (headless), this gives you full visibility into each agent's work and the ability to intervene at any point. + +## Usage + +``` +/agent-loop:run # Run until all stories pass or max iterations +/agent-loop:run 3 # Run at most 3 iterations +/agent-loop:run --skip-eval # Skip evaluator (generator marks stories done) +/agent-loop:run --story US-003 # Run only a specific story +``` + +## Prerequisites + +- `.loop/config.json` exists (run `/agent-loop:init` first) +- `.loop/prd.json` exists with stories (run `/agent-loop:plan` first) + +## Instructions + +When the user invokes `/agent-loop:run`, follow this orchestration sequence exactly. + +### Step 0: Parse Arguments + +- If a number is provided, use it as max iterations. Otherwise read `maxIterations` from `.loop/config.json`. +- If `--skip-eval` is provided, skip the evaluator pass. +- If `--story ` is provided, only work on that specific story. + +### Step 1: Load State + +1. Read `.loop/config.json` — get `mode`, `maxIterations`, `evalRetries`, `scopeBudgets` +2. Read `.loop/prd.json` — get the story list and their statuses +3.
Check `.loop/progress.md` exists; if not, create it from `.loop/templates/progress.md.template` + +Report to the user: + +> **Loop Ready** +> - Mode: {mode} +> - Stories: {passed}/{total} complete +> - Max iterations: {N} +> - Eval: {on/off} +> +> Starting loop. You can interrupt me at any time to adjust course. + +### Step 2: Iteration Loop + +For each iteration (1 to max iterations): + +#### 2a. Find Next Story + +Find the highest-priority story in `prd.json` where `passes` is `false` and `blocked` is not `true`. If `--story` was specified, use that story instead. + +**If no actionable story remains:** +- If all stories have `passes: true` → report success and stop +- If some stories are `blocked: true` → report which are blocked and suggest `/agent-loop:triage` +- Stop the loop + +#### 2b. Report Iteration Start + +Tell the user: +> **Iteration {N}/{max} — {story.id}: {story.title}** + +If the story has `[REJECTED]` entries in its `notes` field, summarize the previous feedback so the user has context. + +#### 2c. Assemble Generator Prompt + +Read these files and concatenate them with `---` separators: +1. `.loop/prompts/generator/_base.md` +2. `.loop/prompts/generator/{mode}.md` + +Then substitute these template variables in the assembled text: +- `{{MAX_FILES_TO_READ}}` → from `config.scopeBudgets.{mode}.maxFilesToRead` +- `{{MAX_LINES_TO_WRITE}}` → from `config.scopeBudgets.{mode}.maxLinesToWrite` +- `{{MAX_FILES_TO_MODIFY}}` → from `config.scopeBudgets.{mode}.maxFilesToModify` +- `{{MODE}}` → the mode +- `{{ITERATION}}` → current iteration number +- `{{MAX_ITERATIONS}}` → max iterations +- `{{LOOP_DIR}}` → path to `.loop/` directory +- `{{PROJECT_ROOT}}` → project root path +- `{{CURRENT_STORY_ID}}` → the story ID being worked on + +#### 2d. Capture Pre-Generator Git State + +Run `git rev-parse HEAD` and save it. This is needed for the evaluator's diff. + +#### 2e. 
Dispatch Generator Agent + +Use the **Agent tool** to launch the generator: + +``` +Agent( + prompt: , + description: "Generator: {story.id}", + subagent_type: "general-purpose", + mode: "auto" +) +``` + +**IMPORTANT:** Use `mode: "auto"` so the user can see tool calls but isn't prompted for every action. If the user has expressed a preference for more control, use `mode: "default"` instead. + +Wait for the agent to complete. The Agent tool returns the generator's final output. + +#### 2f. Check for Completion Signal + +If the generator output contains `COMPLETE`, report all stories complete and stop. + +#### 2g. Skip Evaluator (if configured) + +If `--skip-eval` was specified or `config.skipEval` is true, skip to step 2j. + +#### 2h. Assemble Evaluator Prompt + +Read these files and concatenate them: +1. `.loop/prompts/evaluator/_base.md` +2. `.loop/prompts/evaluator/{mode}.md` + +Substitute the same template variables as the generator, plus: +- `{{PRE_GENERATOR_SHA}}` → the git SHA captured in step 2d +- `{{CURRENT_STORY_ID}}` → the story ID + +#### 2i. Dispatch Evaluator Agent + +Use the **Agent tool** to launch the evaluator: + +``` +Agent( + prompt: , + description: "Evaluator: {story.id}", + subagent_type: "general-purpose", + mode: "auto" +) +``` + +Wait for completion. Parse the verdict from the output: + +- Look for `PASS` → story passes +- Look for `REJECT` → story rejected; extract reason from `...` +- No verdict tag found → treat as REJECT (fail-safe) + +#### 2j. Update State Based on Verdict + +**On PASS (or skip-eval):** +1. Update `.loop/prd.json` — set `passes: true` for the story +2. Report to user: ✓ **{story.id} PASSED** + +**On REJECT:** +1. Update `.loop/prd.json`: + - Keep `passes: false` + - Increment `rejections` count + - Append `[REJECTED] {reason}` to `notes` +2. Report to user: ✗ **{story.id} REJECTED** — {reason} +3. 
Check if `rejections` >= `evalRetries` from config: + - If yes: set `blocked: true` in prd.json, append `[BLOCKED]` to notes + - Report: ⚠ **{story.id} BLOCKED** — rejected {N} times, needs human review + +#### 2k. Append Progress Entry + +Append to `.loop/progress.md`: + +```markdown +### {story.id} — {story.title} +Date: {current date and time} +Iteration: {N} +Verdict: {PASS/REJECT/SKIP-EVAL} + +--- +``` + +#### 2l. Report Iteration Summary + +Show current story counts: `{passed}/{total} stories complete` + +If there are more iterations and more stories, continue to the next iteration. + +### Step 3: Loop Exit + +When the loop ends (all stories done, max iterations, or all remaining blocked), report: + +> **Loop Complete** +> - Iterations used: {N} +> - Stories: {passed}/{total} complete, {blocked} blocked +> - {Suggest `/agent-loop:triage` if anything is blocked or incomplete} + +### Error Handling + +- If an Agent subagent fails or returns empty output, log a warning and continue to the next iteration. Do NOT stop the loop for a single agent failure. +- If `prd.json` cannot be parsed, stop immediately and report the error. +- If the user interrupts (denies a tool call, says "stop", etc.), gracefully end the loop and report current status. 
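The story selection in step 2a and the state update in step 2j can be sketched in a few lines of Python. This is a hedged illustration of the rules described above, not the harness's own code: the function names `find_next_story` and `apply_verdict` are hypothetical, and it assumes a lower `priority` number means higher priority, as the example PRD suggests.

```python
import copy
from typing import Optional


def find_next_story(prd: dict) -> Optional[dict]:
    """Step 2a: pick the highest-priority story that has not passed and is not blocked."""
    candidates = [
        s for s in prd["userStories"]
        if not s.get("passes") and not s.get("blocked")
    ]
    # Assumption: priority 1 runs before priority 2, and so on.
    return min(candidates, key=lambda s: s["priority"], default=None)


def apply_verdict(story: dict, verdict: str, reason: str, eval_retries: int) -> dict:
    """Step 2j: return an updated copy of the story for a PASS or REJECT verdict."""
    updated = copy.deepcopy(story)
    if verdict == "PASS":
        updated["passes"] = True
        return updated
    # Anything other than PASS is treated as a rejection (the fail-safe in step 2i).
    updated["passes"] = False
    updated["rejections"] = updated.get("rejections", 0) + 1
    updated["notes"] = (updated.get("notes", "") + " [REJECTED] " + reason).strip()
    if updated["rejections"] >= eval_retries:
        updated["blocked"] = True
        updated["notes"] += " [BLOCKED]"
    return updated


# Illustrative data only; field names follow the story schema in prd.json.
prd = {"userStories": [
    {"id": "US-001", "priority": 1, "passes": True, "notes": "", "rejections": 0},
    {"id": "US-002", "priority": 2, "passes": False, "notes": "", "rejections": 1},
    {"id": "US-003", "priority": 3, "passes": False, "notes": "", "rejections": 0},
]}
next_story = find_next_story(prd)
rejected = apply_verdict(next_story, "REJECT", "missing test", eval_retries=2)
```

Returning a copy rather than mutating the story in place is a design choice of this sketch; it keeps the previous state available if writing `prd.json` fails partway through.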
+ +### Key Differences from loop.sh + +| Feature | loop.sh | /agent-loop:run | +|---------|---------|-----------| +| Execution | Headless (`claude --print`) | Visible in Claude Code | +| Intervention | Kill the process | Deny tool calls, chat mid-loop | +| Permissions | `--dangerously-skip-permissions` | User-controlled | +| Context | Fresh process per agent | Fresh Agent subagent per agent | +| State updates | Shell functions | Claude Code reads/writes files directly | diff --git a/skills/loop-triage/SKILL.md b/skills/loop-triage/SKILL.md new file mode 100644 index 0000000..3657a28 --- /dev/null +++ b/skills/loop-triage/SKILL.md @@ -0,0 +1,83 @@ +--- +name: triage +description: Generate a human handoff brief summarizing loop status — completed, blocked, and remaining stories with recommended next steps. +--- + +# /triage — Generate Human Handoff Brief + +Generate a triage brief summarizing the current state of a loop run. Use this when: +- The loop hit max iterations without completing +- You want a status check mid-run +- You're handing off to another developer + +## Instructions + +When the user invokes `/agent-loop:triage`: + +### Step 1: Read Current State + +1. Read `.loop/prd.json` — get story statuses +2. Read `.loop/progress.md` — get session log and patterns +3. Read `.loop/config.json` — get mode and iteration settings +4.
Check git log for recent commits on the loop branch + +### Step 2: Analyze + +For each story, determine: +- **Complete**: `passes: true`, verified by evaluator +- **In Progress**: `passes: false`, has been attempted (check progress.md for entries) +- **Blocked**: `passes: false`, rejected multiple times (check `rejections` count and `notes`) +- **Not Started**: `passes: false`, no progress.md entries, no rejections + +### Step 3: Generate Brief + +Write to `.loop/triage/TRIAGE_BRIEF.md`: + +```markdown +# Triage Brief + +Generated: {current date and time} +Mode: {mode from config.json} +Branch: {branchName from prd.json} + +## Status Summary + +- **Complete:** {N} stories +- **In Progress:** {N} stories +- **Blocked:** {N} stories (hit retry limit) +- **Not Started:** {N} stories + +## Story Details + +| ID | Title | Status | Rejections | Notes | +|----|-------|--------|------------|-------| +| US-001 | ... | Complete | 0 | | +| US-002 | ... | Blocked | 3 | Evaluator rejected: ... | +| US-003 | ... | Not Started | 0 | | + +## Key Patterns Discovered + +{Copy the Codebase Patterns section from progress.md} + +## Blocked Stories — Analysis + +For each blocked story, summarize: +- What was attempted +- Why it was rejected (from notes field) +- Suggested approach for a human to unblock it + +## Recommended Next Steps + +Based on the current state: +1. {Most important next action} +2. {Second priority} +3. {Third priority} + +## Files Modified + +{List all files changed across all commits on the loop branch, with brief descriptions} +``` + +### Step 4: Present to User + +Show the summary inline and tell the user where the full brief is saved. 
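The four statuses from Step 2 of the triage skill can be expressed as a small classifier. This is a sketch under stated assumptions: `classify_story` is a hypothetical name, `attempted_ids` stands for the set of story IDs that have entries in `progress.md` (how that set is extracted is left open), and the `retry_limit` default of 2 mirrors the `evalRetries` value shown in the config example.

```python
def classify_story(story: dict, attempted_ids: set, retry_limit: int = 2) -> str:
    """Map a prd.json story to Complete, Blocked, In Progress, or Not Started."""
    if story.get("passes"):
        return "Complete"
    rejections = story.get("rejections", 0)
    # Assumption: an explicit blocked flag or hitting the retry limit both count as Blocked.
    if story.get("blocked") or rejections >= retry_limit:
        return "Blocked"
    if rejections > 0 or story["id"] in attempted_ids:
        return "In Progress"
    return "Not Started"


# Illustrative stories only.
stories = [
    {"id": "US-001", "passes": True, "rejections": 0},
    {"id": "US-002", "passes": False, "rejections": 3},
    {"id": "US-003", "passes": False, "rejections": 0},
    {"id": "US-004", "passes": False, "rejections": 1},
]
statuses = {s["id"]: classify_story(s, attempted_ids={"US-004"}) for s in stories}
```

Counting the values in `statuses` gives the numbers for the Status Summary section of the brief.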
diff --git a/templates/contract.md.template b/templates/contract.md.template new file mode 100644 index 0000000..a14a72d --- /dev/null +++ b/templates/contract.md.template @@ -0,0 +1,35 @@ +# Sprint Contract: {{STORY_ID}} — {{STORY_TITLE}} + +## What Will Be Built + + + +## Done Conditions + +- [ ] +- [ ] +- [ ] All acceptance criteria from prd.json met +- [ ] Quality gates pass (typecheck, lint, test) + +## Evaluation Criteria + +What the evaluator will specifically check: +- [ ] +- [ ] +- [ ] No regressions in existing functionality + +## Out of Scope + +Things explicitly NOT part of this story: +- +- + +## Key Files + +Files likely to be created or modified: +- + +## Dependencies + +- Depends on: +- Blocks: diff --git a/templates/prd.json.example b/templates/prd.json.example new file mode 100644 index 0000000..1c04b3c --- /dev/null +++ b/templates/prd.json.example @@ -0,0 +1,54 @@ +{ + "project": "MyApp", + "branchName": "loop/add-user-auth", + "description": "Add user authentication with OAuth providers", + "userStories": [ + { + "id": "US-001", + "title": "Add users table with OAuth fields", + "description": "As a developer, I need a users table that stores OAuth provider info so we can persist authenticated users.", + "acceptanceCriteria": [ + "Create users table with id, email, name, oauth_provider, oauth_id, created_at columns", + "Generate and run migration successfully", + "Typecheck passes", + "Unit test for model creation passes" + ], + "priority": 1, + "passes": false, + "notes": "", + "rejections": 0 + }, + { + "id": "US-002", + "title": "Implement OAuth callback endpoint", + "description": "As a user, I want to sign in with Google so I can access my account without creating a password.", + "acceptanceCriteria": [ + "GET /auth/callback accepts OAuth authorization code", + "Exchanges code for access token with provider", + "Creates or updates user record", + "Returns JWT session token", + "Typecheck passes", + "Integration test for OAuth flow passes" + 
], + "priority": 2, + "passes": false, + "notes": "", + "rejections": 0 + }, + { + "id": "US-003", + "title": "Add login page with OAuth button", + "description": "As a user, I want a login page with a 'Sign in with Google' button so I can authenticate.", + "acceptanceCriteria": [ + "Login page renders with OAuth button", + "Button redirects to provider authorization URL", + "Typecheck passes", + "Verify UI renders correctly in browser" + ], + "priority": 3, + "passes": false, + "notes": "", + "rejections": 0 + } + ] +} diff --git a/templates/progress.md.template b/templates/progress.md.template new file mode 100644 index 0000000..a200126 --- /dev/null +++ b/templates/progress.md.template @@ -0,0 +1,13 @@ +# Progress + +## Codebase Patterns + + + +--- + +## Session Log + + diff --git a/templates/triage-brief.md.template b/templates/triage-brief.md.template new file mode 100644 index 0000000..ee9508e --- /dev/null +++ b/templates/triage-brief.md.template @@ -0,0 +1,29 @@ +# Triage Brief + +Generated: +Mode: +Iterations completed: of + +## Status + +Stories: of complete + +| ID | Title | Status | Rejections | +|----|-------|--------|------------| + + +## Key Findings + + + +## Blockers + + + +## Recommended Next Steps + + + +## Files of Interest + +