feat: agent loop harness with Claude Code plugin support

Generator-evaluator architecture with iterative context-reset for
long-running coding tasks. Ships as a Claude Code plugin — install
with /plugin and use /agent-loop:init, /agent-loop:plan, /agent-loop:run.
This commit is contained in:
2026-03-27 08:03:18 -04:00
commit 17e5eb707f
29 changed files with 2546 additions and 0 deletions

View File

@@ -0,0 +1,14 @@
{
"name": "agent-loop",
"plugins": [
{
"name": "agent-loop",
"description": "Autonomous generator-evaluator agent loop for long-running coding tasks. Plan interactively, then execute with full visibility.",
"version": "0.1.0",
"source": {
"source": "github",
"repo": "https://git.jagfly.com/sheldon/loop-loop.git"
}
}
]
}

View File

@@ -0,0 +1,11 @@
{
"name": "agent-loop",
"version": "0.1.0",
"description": "Autonomous generator-evaluator agent loop for long-running coding tasks. Plan with /agent-loop:init, then execute with /agent-loop:run.",
"author": {
"name": "Sheldon"
},
"repository": "https://git.jagfly.com/sheldon/loop-loop.git",
"license": "MIT",
"keywords": ["agent", "loop", "autonomous", "generator", "evaluator", "harness"]
}

19
.gitignore vendored Normal file
View File

@@ -0,0 +1,19 @@
# Runtime artifacts (generated per-project, not part of the harness)
prd.json
progress.md
progress-archive.md
config.json
init.sh
contracts/
triage/
archive/
.archive-staging/
.last-branch
.loop.lock
# OS
.DS_Store
Thumbs.db
# Claude Code
.claude/

166
README.md Normal file
View File

@@ -0,0 +1,166 @@
# Agent Loop
Autonomous AI agent harness that combines a generator-evaluator architecture with iterative context-reset patterns for long-running coding tasks.
Inspired by [Geoffrey Huntley's Ralph pattern](https://ghuntley.com/ralph/) and [Anthropic's harness design research](https://www.anthropic.com/engineering/harness-design-long-running-apps).
A generator-evaluator loop runs fresh agent instances per iteration. Each iteration: a **Generator** does the work, then an **Evaluator** verifies it. Human judgment stays in the planning phase; execution is autonomous.
Two execution modes: **headless** via `loop.sh` (fully autonomous bash process) or **interactive** via `/loop-run` (Claude Code-native with full visibility and intervention).
## Install
### As a Claude Code Plugin (Recommended)
```
/plugin marketplace add https://git.jagfly.com/sheldon/loop-loop.git
/plugin install agent-loop@agent-loop
```
Then in any project:
```
/agent-loop:init # Set up the loop for your project
/agent-loop:plan # Generate PRD and sprint contracts
/agent-loop:run # Run the loop interactively
```
### Manual Install
```bash
# Clone into your project
cp -r /path/to/loop-loop .loop
# Install skills as Claude Code commands
mkdir -p .claude/commands
for skill in loop-init loop-plan loop-run loop-triage; do
ln -sf "../../.loop/skills/$skill/SKILL.md" ".claude/commands/$skill.md"
done
# Then in Claude Code:
/loop-init && /loop-plan && /loop-run
```
## How It Works
```
[You + Claude Code] [Loop Execution]
/agent-loop:init Interactive (/agent-loop:run)
→ scaffolds .loop/ └─ dispatches Agent subagents
→ detects project └─ visible tool calls, can intervene
→ picks mode └─ chat mid-loop to adjust course
→ creates config.json
Headless (.loop/loop.sh)
/agent-loop:plan └─ spawns claude --print per iteration
→ asks clarifying questions └─ fully autonomous, no UI
→ generates prd.json
→ generates sprint contracts Both paths:
→ populates progress.md ├─→ Generator → picks story → implements → commits
├─→ Evaluator → verifies → PASS or REJECT
├─→ next iteration...
└─→ all stories pass → done
```
## Modes
| Mode | What it does | Git writes? |
|------|-------------|-------------|
| **implement** | Build features from a PRD | Yes |
| **explore** | Read-only codebase analysis | No |
| **fix** | Targeted bug fixes / tech debt | Yes |
## Running the Loop
### Option A: Interactive (`/loop-run`) — Recommended
Run inside Claude Code. You see every tool call, file edit, and test run. You can intervene at any point — deny a tool call, chat to adjust course, or stop the loop.
```
/loop-run # Run until done or max iterations
/loop-run 3 # Run at most 3 iterations
/loop-run --skip-eval # Skip evaluator pass
/loop-run --story US-003 # Run only a specific story
```
### Option B: Headless (`loop.sh`)
Run as a standalone bash process. Fully autonomous — no UI, no intervention. Useful for background execution or CI.
```bash
.loop/loop.sh [options]
--mode <implement|explore|fix> Operating mode
--max <N> Maximum iterations (default: 20)
--skip-eval Skip evaluator pass
--tool <claude|amp> AI tool to use
--no-hooks Don't install stop hooks
--dry-run Print assembled prompts without running agents
--resume Skip already-passed stories (explicit exit when none remain)
```
## Architecture
### Generator
Fresh Claude Code instance each iteration. Reads `prd.json` to find the highest-priority incomplete story, reads the sprint contract, implements the story, runs quality gates, commits, and marks it done.
### Evaluator
Separate fresh instance after each generator pass. Skeptically verifies the work: checks acceptance criteria against actual code, runs tests independently, and issues a `PASS` or `REJECT` verdict. Rejection sends the story back to the generator with specific feedback.
Evaluator skepticism is deliberately tuned — Claude's default tendency is to rationalize away issues. The evaluator prompt includes explicit bias correction.
### Sprint Contracts
Before the loop starts, `/loop-plan` generates contracts for each story. These define "done" conditions that both generator and evaluator reference, eliminating ambiguity about whether work is complete.
### State Persistence
| Artifact | Purpose |
|----------|---------|
| `prd.json` | Story status (pass/fail), acceptance criteria |
| `progress.md` | Append-only session log + codebase patterns |
| `contracts/` | Sprint contracts per story |
| `config.json` | Harness configuration |
| Git commits | Code changes with story-tagged messages |
## File Structure
```
.loop/
loop.sh # Main loop orchestrator
config.json # Project config (generated by /loop-init)
init.sh # Project setup script (generated by /loop-init)
prd.json # Active PRD (generated by /loop-plan)
progress.md # Cross-session memory (append-only)
prompts/
generator/_base.md # Shared generator instructions
generator/implement.md # Implement mode overlay
generator/explore.md # Explore mode overlay
generator/fix.md # Fix mode overlay
evaluator/_base.md # Skeptical evaluator base
evaluator/implement.md # Implement verification
evaluator/explore.md # Analysis verification
evaluator/fix.md # Fix verification
planner/plan.md # Planning context
templates/ # Reference templates
lib/ # Shell library functions
skills/ # Claude Code skills (/loop-init, /loop-plan, /loop-run, /loop-triage)
contracts/ # Sprint contracts (generated by /loop-plan)
triage/ # Analysis output (explore mode)
archive/ # Completed feature archives
```
## Design Principles
- **Fresh context per iteration** — no accumulated hallucination drift
- **Separate generation from evaluation** — external skepticism is easier to tune than self-criticism
- **Human judgment for planning, AI for execution** — interactive `/loop-plan`, autonomous loop
- **Structured handoffs via artifacts** — not conversation memory
- **No git revert on rejection** — next generator sees partial work + feedback (more signal)
- **Advisory scope budgets** — prompt-enforced limits on files read/written per iteration
## Credits
- [Geoffrey Huntley](https://ghuntley.com/ralph/) — original Ralph pattern
- [Anthropic Engineering](https://www.anthropic.com/engineering/harness-design-long-running-apps) — generator-evaluator harness design

26
config.json.example Normal file
View File

@@ -0,0 +1,26 @@
{
"tool": "claude",
"mode": "implement",
"maxIterations": 20,
"skipEval": false,
"evalRetries": 2,
"autoHooks": true,
"branchPrefix": "loop/",
"scopeBudgets": {
"explore": {
"maxFilesToRead": 15,
"maxLinesToWrite": 0,
"maxFilesToModify": 0
},
"implement": {
"maxFilesToRead": 50,
"maxLinesToWrite": 500,
"maxFilesToModify": 10
},
"fix": {
"maxFilesToRead": 30,
"maxLinesToWrite": 200,
"maxFilesToModify": 5
}
}
}

56
init.sh.example Normal file
View File

@@ -0,0 +1,56 @@
#!/bin/bash
# Project-specific initialization for the agent loop.
# Copy this to .loop/init.sh and customize for your project.
#
# This script runs at the start of each loop.sh invocation to ensure
# the development environment is ready. Keep it idempotent (safe to run multiple times).
set -euo pipefail
echo "[init] Setting up development environment..."
# --- Dependencies ---
# Uncomment and adapt for your project:
# Node.js
# if [ -f package.json ]; then
# npm install --silent
# fi
# Python
# if [ -f requirements.txt ]; then
# pip install -q -r requirements.txt
# fi
# Go
# if [ -f go.mod ]; then
# go mod download
# fi
# Rust
# if [ -f Cargo.toml ]; then
# cargo build --quiet
# fi
# --- Dev Server ---
# Start if not already running:
# if ! lsof -i :3000 &>/dev/null; then
# npm run dev &
# sleep 3
# fi
# --- Database ---
# Run migrations if needed:
# npm run migrate
# python manage.py migrate
# alembic upgrade head
# --- Verify ---
# Quick smoke test:
# npm run typecheck
# npm run test -- --run --silent
echo "[init] Environment ready."

108
install.sh Executable file
View File

@@ -0,0 +1,108 @@
#!/bin/bash
# Install Agent Loop globally for Claude Code.
#
# What this does:
# 1. Copies the harness to ~/.claude/loop/ (prompts, templates, lib, loop.sh)
# 2. Installs skills as Claude Code commands at ~/.claude/commands/
#
# After install, use /loop-init in any project to get started.
#
# Usage:
# ./install.sh # Install
# ./install.sh --uninstall # Remove
set -euo pipefail
CLAUDE_DIR="$HOME/.claude"
HARNESS_DIR="$CLAUDE_DIR/loop"
COMMANDS_DIR="$CLAUDE_DIR/commands"
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
SKILLS=(loop-init loop-plan loop-run loop-triage)
# --- Colors (if terminal supports them) ---
if [ -t 1 ]; then
GREEN='\033[0;32m'
YELLOW='\033[0;33m'
RED='\033[0;31m'
BOLD='\033[1m'
RESET='\033[0m'
else
GREEN='' YELLOW='' RED='' BOLD='' RESET=''
fi
info() { echo -e "${GREEN}[loop]${RESET} $*"; }
warn() { echo -e "${YELLOW}[loop]${RESET} $*"; }
error() { echo -e "${RED}[loop]${RESET} $*"; }
# --- Uninstall ---
if [[ "${1:-}" == "--uninstall" ]]; then
info "Uninstalling Agent Loop..."
if [ -d "$HARNESS_DIR" ]; then
rm -rf "$HARNESS_DIR"
info "Removed $HARNESS_DIR"
fi
for skill in "${SKILLS[@]}"; do
cmd="$COMMANDS_DIR/$skill.md"
if [ -f "$cmd" ]; then
rm -f "$cmd"
info "Removed $cmd"
fi
done
info "Done. Per-project .loop/ directories are untouched."
exit 0
fi
# --- Install ---
info "Installing Agent Loop..."
# Ensure ~/.claude/ exists
mkdir -p "$CLAUDE_DIR"
# Copy harness (prompts, templates, lib, loop.sh, config example)
if [ -d "$HARNESS_DIR" ]; then
warn "Updating existing install at $HARNESS_DIR"
rm -rf "$HARNESS_DIR"
fi
mkdir -p "$HARNESS_DIR"
cp -r "$SCRIPT_DIR/prompts" "$HARNESS_DIR/"
cp -r "$SCRIPT_DIR/templates" "$HARNESS_DIR/"
cp -r "$SCRIPT_DIR/lib" "$HARNESS_DIR/"
cp -r "$SCRIPT_DIR/skills" "$HARNESS_DIR/"
cp "$SCRIPT_DIR/loop.sh" "$HARNESS_DIR/"
cp "$SCRIPT_DIR/config.json.example" "$HARNESS_DIR/"
cp "$SCRIPT_DIR/init.sh.example" "$HARNESS_DIR/"
chmod +x "$HARNESS_DIR/loop.sh"
info "Harness installed to $HARNESS_DIR"
# Install Claude Code commands
mkdir -p "$COMMANDS_DIR"
for skill in "${SKILLS[@]}"; do
src="$HARNESS_DIR/skills/$skill/SKILL.md"
dest="$COMMANDS_DIR/$skill.md"
if [ -f "$src" ]; then
cp "$src" "$dest"
info "Installed /$skill command"
else
warn "Skill not found: $src (skipping)"
fi
done
echo ""
info "${BOLD}Installation complete.${RESET}"
echo ""
echo " Next steps (inside Claude Code, in any project):"
echo ""
echo " /loop-init # Set up the loop for your project"
echo " /loop-plan # Generate PRD and sprint contracts"
echo " /loop-run # Run the loop interactively"
echo ""
echo " Or run headless: .loop/loop.sh"
echo ""

83
lib/archive.sh Normal file
View File

@@ -0,0 +1,83 @@
#!/bin/bash
# Branch archiving — archives previous run artifacts when the branch changes.
# Preserves prd.json, progress.md, and contracts from the previous feature.
#
# Design: At the end of each run, snapshot_for_archive saves current artifacts
# to .archive-staging/. On the next run, if the branch changed, check_archive
# moves the snapshot to archive/ and cleans up. This avoids archiving the
# WRONG artifacts (the new feature's) when prd.json has already been overwritten.
LAST_BRANCH_FILE="$LOOP_DIR/.last-branch"
STAGING_DIR="$LOOP_DIR/.archive-staging"
# Snapshot current artifacts so they can be archived later if the branch changes.
# Call this at the END of a successful run or before exit.
snapshot_for_archive() {
rm -rf "$STAGING_DIR"
mkdir -p "$STAGING_DIR"
[ -f "$LOOP_DIR/prd.json" ] && cp "$LOOP_DIR/prd.json" "$STAGING_DIR/"
[ -f "$LOOP_DIR/progress.md" ] && cp "$LOOP_DIR/progress.md" "$STAGING_DIR/"
[ -d "$LOOP_DIR/contracts" ] && cp -r "$LOOP_DIR/contracts" "$STAGING_DIR/"
}
# Check if we need to archive and do so if branch changed.
# Reads the NEW branch from live prd.json and the OLD branch from the staging
# snapshot (which was saved at the end of the previous run). This avoids the
# bug where both branches read from the same (already-overwritten) prd.json.
check_archive() {
local current_branch
current_branch=$(prd_branch_name 2>/dev/null)
[ -z "$current_branch" ] && return
# Determine the previous branch from the staging snapshot (most reliable)
# or fall back to .last-branch file
local last_branch=""
if [ -f "$STAGING_DIR/prd.json" ]; then
if command -v jq &>/dev/null; then
last_branch=$(jq -r '.branchName // empty' "$STAGING_DIR/prd.json" 2>/dev/null)
else
last_branch=$(LOOP_PRD="$STAGING_DIR/prd.json" python3 -c "
import json, os
print(json.load(open(os.environ['LOOP_PRD'])).get('branchName', ''), end='')
" 2>/dev/null)
fi
fi
[ -z "$last_branch" ] && [ -f "$LAST_BRANCH_FILE" ] && last_branch=$(cat "$LAST_BRANCH_FILE")
if [ -n "$last_branch" ] && [ "$last_branch" != "$current_branch" ]; then
archive_run "$last_branch"
fi
echo "$current_branch" > "$LAST_BRANCH_FILE"
}
# Archive the previous run's staged artifacts (NOT current prd.json)
archive_run() {
local branch_name="$1"
local feature_name
feature_name=$(echo "$branch_name" | sed 's|.*/||')
local archive_dir="$LOOP_DIR/archive/$(date +%Y-%m-%d)-${feature_name}"
mkdir -p "$archive_dir"
if [ -d "$STAGING_DIR" ]; then
# Use the staged snapshot (correct artifacts from the previous run)
cp -r "$STAGING_DIR"/* "$archive_dir/" 2>/dev/null || true
rm -rf "$STAGING_DIR"
else
# Fallback: no snapshot exists (first run or upgrade from old version).
# Current artifacts may belong to the new feature — archive what we have
# but warn the user.
log "WARNING: No archive snapshot found. Archiving current artifacts (may be from new feature)."
[ -f "$LOOP_DIR/prd.json" ] && cp "$LOOP_DIR/prd.json" "$archive_dir/"
[ -f "$LOOP_DIR/progress.md" ] && cp "$LOOP_DIR/progress.md" "$archive_dir/"
[ -d "$LOOP_DIR/contracts" ] && cp -r "$LOOP_DIR/contracts" "$archive_dir/"
fi
# Clean up old run's artifacts (progress.md, contracts — NOT prd.json which belongs to new feature)
rm -f "$LOOP_DIR/progress.md"
rm -rf "$LOOP_DIR/contracts"
log "Archived previous run to $archive_dir"
}

19
lib/hooks.sh Normal file
View File

@@ -0,0 +1,19 @@
#!/bin/bash
# Stop hook management for Claude Code loop continuation.
#
# NOTE: Hooks are currently no-ops. The loop uses `claude --print` (non-interactive),
# which runs to completion and exits naturally — no Stop hook is needed to signal
# iteration boundaries. The install/remove interface is preserved so that a future
# interactive mode can be added without changing loop.sh's call sites.
#
# If interactive mode is added, the hook mechanism will need redesign: `kill -INT $PPID`
# targets the hook runner's parent (Claude Code), not loop.sh. A sentinel-file or
# named-pipe approach would be more reliable.
install_hooks() {
: # no-op — see note above
}
remove_hooks() {
: # no-op — see note above
}

95
lib/prompt.sh Normal file
View File

@@ -0,0 +1,95 @@
#!/bin/bash
# Prompt assembly — composes the final prompt from base + mode overlay.
# Injects runtime variables (scope budgets, current story, iteration count).
# Build the complete prompt for a given agent role and mode.
# Usage: build_prompt "generator" "implement"
# build_prompt "evaluator" "implement"
build_prompt() {
local role="$1" # generator | evaluator
local mode="$2" # implement | explore | fix
local base_file="$LOOP_DIR/prompts/${role}/_base.md"
local mode_file="$LOOP_DIR/prompts/${role}/${mode}.md"
local prompt=""
# Start with base prompt
if [ -f "$base_file" ]; then
prompt=$(cat "$base_file")
else
log "WARNING: Missing base prompt: $base_file"
return 1
fi
# Append mode-specific overlay
if [ -f "$mode_file" ]; then
prompt="${prompt}
---
$(cat "$mode_file")"
else
log "WARNING: Missing mode prompt: $mode_file"
fi
# Inject runtime variables
prompt=$(inject_variables "$prompt" "$mode")
printf '%s\n' "$prompt"
}
# Replace template variables in prompt text
inject_variables() {
local text="$1"
local mode="$2"
# Scope budgets from config
local max_read max_write max_modify
max_read=$(get_config_value ".scopeBudgets.${mode}.maxFilesToRead" "50")
max_write=$(get_config_value ".scopeBudgets.${mode}.maxLinesToWrite" "500")
max_modify=$(get_config_value ".scopeBudgets.${mode}.maxFilesToModify" "10")
text="${text//\{\{MAX_FILES_TO_READ\}\}/$max_read}"
text="${text//\{\{MAX_LINES_TO_WRITE\}\}/$max_write}"
text="${text//\{\{MAX_FILES_TO_MODIFY\}\}/$max_modify}"
text="${text//\{\{MODE\}\}/$mode}"
text="${text//\{\{ITERATION\}\}/$ITERATION}"
text="${text//\{\{MAX_ITERATIONS\}\}/$MAX_ITERATIONS}"
text="${text//\{\{LOOP_DIR\}\}/$LOOP_DIR}"
text="${text//\{\{PROJECT_ROOT\}\}/$PROJECT_ROOT}"
text="${text//\{\{CURRENT_STORY_ID\}\}/${CURRENT_STORY_ID:-unknown}}"
text="${text//\{\{PRE_GENERATOR_SHA\}\}/${PRE_GENERATOR_SHA:-HEAD~1}}"
printf '%s\n' "$text"
}
# Read a value from config.json with a default fallback
get_config_value() {
local path="$1"
local default="$2"
local config="$LOOP_DIR/config.json"
[ -f "$config" ] || { echo "$default"; return; }
if command -v jq &>/dev/null; then
local val
val=$(jq -r "$path // empty" "$config" 2>/dev/null)
echo "${val:-$default}"
else
LOOP_CONFIG="$config" LOOP_PATH="$path" LOOP_DEFAULT="$default" python3 -c "
import json, os
d = json.load(open(os.environ['LOOP_CONFIG']))
keys = os.environ['LOOP_PATH'].lstrip('.').split('.')
for k in keys:
d = d.get(k) if isinstance(d, dict) else None
if d is None:
break
val = d if d is not None and d != {} else os.environ['LOOP_DEFAULT']
# Normalize Python booleans to lowercase for shell compatibility
if isinstance(val, bool):
val = str(val).lower()
print(val, end='')
"
fi
}

359
lib/state.sh Normal file
View File

@@ -0,0 +1,359 @@
#!/bin/bash
# State management for prd.json and progress.md.
# Provides functions to query story status, update pass/fail, and append progress.
# Requires: jq (preferred) or python3 (fallback)
# --- PRD Validation ---
validate_prd() {
local prd="$LOOP_DIR/prd.json"
[ -f "$prd" ] || return 0 # no prd.json is handled elsewhere
if command -v jq &>/dev/null; then
if ! jq -e '.userStories | type == "array" and length > 0' "$prd" >/dev/null 2>&1; then
log "ERROR: prd.json is missing or has no userStories array"
exit 1
fi
else
LOOP_PRD="$prd" python3 -c "
import json, sys, os
d = json.load(open(os.environ['LOOP_PRD']))
stories = d.get('userStories', [])
if not isinstance(stories, list) or len(stories) == 0:
print('[loop] ERROR: prd.json is missing or has no userStories array', file=sys.stderr)
sys.exit(1)
"
fi
}
# --- PRD Queries ---
# Get the ID of the highest-priority incomplete story (skips blocked stories)
next_story_id() {
local prd="$LOOP_DIR/prd.json"
[ -f "$prd" ] || return 1
if command -v jq &>/dev/null; then
jq -r '[.userStories[] | select(.passes == false and .blocked != true)] | sort_by(.priority // 999) | .[0].id // empty' "$prd"
else
LOOP_PRD="$prd" python3 -c "
import json, os
stories = json.load(open(os.environ['LOOP_PRD']))['userStories']
pending = sorted([s for s in stories if not s['passes'] and not s.get('blocked')], key=lambda s: s.get('priority', 999))
print(pending[0]['id'] if pending else '', end='')
"
fi
}
# Check if all actionable stories are done (passed or blocked)
all_stories_pass() {
local prd="$LOOP_DIR/prd.json"
[ -f "$prd" ] || return 1
if command -v jq &>/dev/null; then
local actionable
actionable=$(jq '[.userStories[] | select(.passes == false and .blocked != true)] | length' "$prd")
[ "$actionable" -eq 0 ]
else
LOOP_PRD="$prd" python3 -c "
import json, sys, os
stories = json.load(open(os.environ['LOOP_PRD']))['userStories']
actionable = [s for s in stories if not s['passes'] and not s.get('blocked')]
sys.exit(0 if len(actionable) == 0 else 1)
"
fi
}
# Check if any stories are blocked
any_stories_blocked() {
local prd="$LOOP_DIR/prd.json"
[ -f "$prd" ] || return 1
if command -v jq &>/dev/null; then
local blocked
blocked=$(jq '[.userStories[] | select(.blocked == true)] | length' "$prd")
[ "$blocked" -gt 0 ]
else
LOOP_PRD="$prd" python3 -c "
import json, sys, os
stories = json.load(open(os.environ['LOOP_PRD']))['userStories']
blocked = [s for s in stories if s.get('blocked')]
sys.exit(0 if len(blocked) > 0 else 1)
"
fi
}
# Get total and completed story counts
story_counts() {
local prd="$LOOP_DIR/prd.json"
[ -f "$prd" ] || { echo "0/0"; return; }
if command -v jq &>/dev/null; then
local total passed
total=$(jq '.userStories | length' "$prd")
passed=$(jq '[.userStories[] | select(.passes == true)] | length' "$prd")
echo "${passed}/${total}"
else
LOOP_PRD="$prd" python3 -c "
import json, os
stories = json.load(open(os.environ['LOOP_PRD']))['userStories']
passed = sum(1 for s in stories if s['passes'])
print(f'{passed}/{len(stories)}', end='')
"
fi
}
# --- PRD Mutations ---
# Mark a story as passed
mark_story_pass() {
local story_id="$1"
local prd="$LOOP_DIR/prd.json"
if command -v jq &>/dev/null; then
local updated
updated=$(jq --arg id "$story_id" \
'if any(.userStories[]; .id == $id) then (.userStories[] | select(.id == $id)).passes = true else error("Story not found: \($id)") end' \
"$prd" 2>&1) || { log "WARNING: mark_story_pass failed for '$story_id'"; return 1; }
printf '%s\n' "$updated" > "${prd}.tmp" && mv "${prd}.tmp" "$prd"
else
LOOP_STORY_ID="$story_id" LOOP_PRD="$prd" python3 -c "
import json, pathlib, os, sys
p = pathlib.Path(os.environ['LOOP_PRD'])
d = json.loads(p.read_text())
story_id = os.environ['LOOP_STORY_ID']
found = False
for s in d['userStories']:
if s['id'] == story_id:
s['passes'] = True
found = True
break
if not found:
print(f'[loop] WARNING: mark_story_pass failed for {story_id!r}', file=sys.stderr)
sys.exit(1)
p.write_text(json.dumps(d, indent=2))
"
fi
}
# Mark a story as failed with rejection reason
mark_story_reject() {
local story_id="$1"
local reason="$2"
local prd="$LOOP_DIR/prd.json"
if command -v jq &>/dev/null; then
local updated
updated=$(jq --arg id "$story_id" --arg reason "$reason" \
'if any(.userStories[]; .id == $id) then (.userStories[] | select(.id == $id)) |= (.passes = false | .rejections = ((.rejections // 0) + 1) | .notes = ((.notes // "") + "\n[REJECTED] " + $reason)) else error("Story not found: \($id)") end' \
"$prd" 2>&1) || { log "WARNING: mark_story_reject failed for '$story_id'"; return 1; }
printf '%s\n' "$updated" > "${prd}.tmp" && mv "${prd}.tmp" "$prd"
else
# Pass reason via env var to avoid shell injection from evaluator output
LOOP_STORY_ID="$story_id" LOOP_REASON="$reason" LOOP_PRD="$prd" python3 -c "
import json, pathlib, os
p = pathlib.Path(os.environ['LOOP_PRD'])
d = json.loads(p.read_text())
story_id = os.environ['LOOP_STORY_ID']
reason = os.environ['LOOP_REASON']
for s in d['userStories']:
if s['id'] == story_id:
s['passes'] = False
s['rejections'] = s.get('rejections', 0) + 1
s['notes'] = s.get('notes', '') + '\n[REJECTED] ' + reason
break
p.write_text(json.dumps(d, indent=2))
"
fi
}
# Get rejection count for a story
story_rejections() {
local story_id="$1"
local prd="$LOOP_DIR/prd.json"
if command -v jq &>/dev/null; then
jq -r --arg id "$story_id" \
'.userStories[] | select(.id == $id) | .rejections // 0' "$prd"
else
LOOP_PRD="$prd" LOOP_STORY_ID="$story_id" python3 -c "
import json, os
stories = json.load(open(os.environ['LOOP_PRD']))['userStories']
story_id = os.environ['LOOP_STORY_ID']
for s in stories:
if s['id'] == story_id:
print(s.get('rejections', 0), end='')
break
"
fi
}
# Mark a story as blocked (needs human review, skip in future iterations)
mark_story_blocked() {
local story_id="$1"
local reason="$2"
local prd="$LOOP_DIR/prd.json"
if command -v jq &>/dev/null; then
local updated
updated=$(jq --arg id "$story_id" --arg reason "$reason" \
'if any(.userStories[]; .id == $id) then (.userStories[] | select(.id == $id)) |= (.blocked = true | .notes = ((.notes // "") + "\n[BLOCKED] " + $reason)) else error("Story not found: \($id)") end' \
"$prd" 2>&1) || { log "WARNING: mark_story_blocked failed for '$story_id'"; return 1; }
printf '%s\n' "$updated" > "${prd}.tmp" && mv "${prd}.tmp" "$prd"
else
LOOP_STORY_ID="$story_id" LOOP_REASON="$reason" LOOP_PRD="$prd" python3 -c "
import json, pathlib, os
p = pathlib.Path(os.environ['LOOP_PRD'])
d = json.loads(p.read_text())
story_id = os.environ['LOOP_STORY_ID']
reason = os.environ['LOOP_REASON']
for s in d['userStories']:
if s['id'] == story_id:
s['blocked'] = True
s['notes'] = s.get('notes', '') + '\n[BLOCKED] ' + reason
break
p.write_text(json.dumps(d, indent=2))
"
fi
}
# --- Progress ---
MAX_PROGRESS_ENTRIES=15
# Append a progress entry, rotating old entries to archive when limit is reached
append_progress() {
local entry="$1"
local progress="$LOOP_DIR/progress.md"
if [ ! -f "$progress" ]; then
cp "$LOOP_DIR/templates/progress.md.template" "$progress" 2>/dev/null || \
printf "# Progress\n\n## Codebase Patterns\n\n---\n\n## Session Log\n" > "$progress"
fi
printf "\n%s\n" "$entry" >> "$progress"
rotate_progress
}
# Archive old session log entries to keep progress.md from growing unbounded.
# Preserves the Codebase Patterns section and keeps only the last N entries.
rotate_progress() {
local progress="$LOOP_DIR/progress.md"
[ -f "$progress" ] || return
# Count session entries by counting "### " headers after the Session Log marker.
# Using headers instead of "---" separators avoids false positives from markdown
# code blocks or horizontal rules inside entries.
local entry_count
local session_start
session_start=$(grep -n '## Session Log' "$progress" | head -1 | cut -d: -f1)
if [ -z "$session_start" ]; then
return
fi
entry_count=$(tail -n +"$session_start" "$progress" | grep -c '^### ' 2>/dev/null || echo "0")
if [ "$entry_count" -le "$MAX_PROGRESS_ENTRIES" ]; then
return
fi
local archive="$LOOP_DIR/progress-archive.md"
if command -v python3 &>/dev/null; then
LOOP_PROGRESS="$progress" LOOP_ARCHIVE="$archive" \
LOOP_MAX_ENTRIES="$MAX_PROGRESS_ENTRIES" python3 -c "
import pathlib, os
progress = pathlib.Path(os.environ['LOOP_PROGRESS'])
archive = pathlib.Path(os.environ['LOOP_ARCHIVE'])
max_entries = int(os.environ['LOOP_MAX_ENTRIES'])
text = progress.read_text()
# Split at 'Session Log' header
if '## Session Log' not in text:
exit(0)
header, session_log = text.split('## Session Log', 1)
# Split entries by '---' separator
# parts[0] is the preamble between '## Session Log' and the first '---'
parts = session_log.split('\n---\n')
preamble = parts[0]
entries = parts[1:]
if len(entries) <= max_entries:
exit(0)
# Keep last max_entries, archive the rest
to_archive = entries[:-max_entries]
to_keep = entries[-max_entries:]
# Append archived entries
existing_archive = archive.read_text() if archive.exists() else '# Progress Archive\n'
existing_archive += '\n---\n'.join(to_archive)
archive.write_text(existing_archive)
# Rewrite progress with header + preamble + kept entries
progress.write_text(header + '## Session Log' + preamble + '\n---\n' + '\n---\n'.join(to_keep))
"
else
# Bash fallback: rotate session log entries with archiving.
# Uses awk to split on "### " entry headers for accurate counting
# (avoids false positives from "---" separators inside entries).
local session_start
session_start=$(grep -n '## Session Log' "$progress" | head -1 | cut -d: -f1)
[ -z "$session_start" ] && return
# Extract header (everything up to and including "## Session Log" line)
local header_content
header_content=$(head -n "$session_start" "$progress")
# Extract session content and split into entries by "### " headers
local session_content
session_content=$(tail -n +"$((session_start + 1))" "$progress")
# Count entries by "### " headers
local entry_count
entry_count=$(echo "$session_content" | grep -c '^### ' 2>/dev/null || echo "0")
[ "$entry_count" -le "$MAX_PROGRESS_ENTRIES" ] && return
# Find the line number (within session_content) of the Nth-from-last "### " header
local keep_from
keep_from=$(echo "$session_content" | grep -n '^### ' | tail -n "$MAX_PROGRESS_ENTRIES" | head -1 | cut -d: -f1)
[ -z "$keep_from" ] && return
# Archive older entries
local to_archive
to_archive=$(echo "$session_content" | head -n "$((keep_from - 1))")
if [ -n "$to_archive" ]; then
if [ -f "$archive" ]; then
printf '\n%s' "$to_archive" >> "$archive"
else
printf '# Progress Archive\n\n%s\n' "$to_archive" > "$archive"
fi
fi
# Keep recent entries
local kept_content
kept_content=$(echo "$session_content" | tail -n +"$keep_from")
printf '%s\n\n%s\n' "$header_content" "$kept_content" > "${progress}.tmp" \
&& mv "${progress}.tmp" "$progress"
fi
}
# Get the branch name from prd.json
prd_branch_name() {
local prd="$LOOP_DIR/prd.json"
[ -f "$prd" ] || return 1
if command -v jq &>/dev/null; then
jq -r '.branchName // empty' "$prd"
else
LOOP_PRD="$prd" python3 -c "
import json, os
print(json.load(open(os.environ['LOOP_PRD'])).get('branchName', ''), end='')
"
fi
}

403
loop.sh Executable file
View File

@@ -0,0 +1,403 @@
#!/bin/bash
# Autonomous AI agent loop orchestrator
# Combines generator-evaluator architecture with iterative context-reset pattern.
#
# Usage:
# ./loop.sh [options]
#
# Options:
# --mode <implement|explore|fix> Operating mode (default: from config.json)
# --max <N> Maximum iterations (default: from config.json)
# --skip-eval Skip evaluator pass
# --tool <claude|amp> AI tool to use (default: from config.json)
# --no-hooks Don't install stop hooks
# --dry-run Print assembled prompts without running agents
# --resume Skip already-passed stories (explicit mode)
# --replan (reserved — not yet implemented)
#
# Each iteration:
# 1. Generator: picks highest-priority incomplete story, does the work
# 2. Evaluator: verifies the work, can PASS or REJECT
# Both get fresh context windows. Loop continues until all stories pass or max iterations.
set -euo pipefail
# --- Exit codes ---
EXIT_OK=0 # All stories complete
EXIT_ERROR=1 # Configuration or runtime error
EXIT_MAX_ITERATIONS=2 # Max iterations reached, work remains
EXIT_ALL_BLOCKED=3 # All remaining stories blocked for human review
# --- Resolve paths ---
LOOP_DIR="$(cd "$(dirname "$0")" && pwd)"
PROJECT_ROOT="$(cd "$LOOP_DIR/.." && pwd)"
export LOOP_DIR PROJECT_ROOT
# --- Lockfile (prevent concurrent runs) ---
LOCKFILE="$LOOP_DIR/.loop.lock"
acquire_lock() {
# mkdir is atomic on POSIX — prevents race between check and create
if ! mkdir "$LOCKFILE" 2>/dev/null; then
local old_pid
old_pid=$(cat "$LOCKFILE/pid" 2>/dev/null)
if [ -n "$old_pid" ] && kill -0 "$old_pid" 2>/dev/null; then
echo "[loop] ERROR: Another loop instance is running (PID $old_pid)."
echo "[loop] If this is stale, remove $LOCKFILE and retry."
exit 1
fi
# Stale lockfile — previous run crashed without cleanup
rm -rf "$LOCKFILE"
mkdir "$LOCKFILE"
fi
echo $$ > "$LOCKFILE/pid"
}
release_lock() {
rm -rf "$LOCKFILE"
}
acquire_lock
# --- Source libraries ---
source "$LOOP_DIR/lib/hooks.sh"
source "$LOOP_DIR/lib/state.sh"
source "$LOOP_DIR/lib/archive.sh"
source "$LOOP_DIR/lib/prompt.sh"
# --- Logging ---
log() { echo "[loop] $*"; }
log_header() {
echo ""
echo "═══════════════════════════════════════════════════════"
echo " $*"
echo "═══════════════════════════════════════════════════════"
echo ""
}
# --- Preflight checks ---
if ! command -v jq &>/dev/null && ! command -v python3 &>/dev/null; then
log "ERROR: Either jq or python3 is required. Install one and retry."
exit 1
fi
# --- Load config defaults ---
CONFIG_FILE="$LOOP_DIR/config.json"
config_default() { get_config_value "$1" "$2"; }
TOOL=$(config_default ".tool" "claude")
MODE=$(config_default ".mode" "implement")
MAX_ITERATIONS=$(config_default ".maxIterations" "20")
SKIP_EVAL=$(config_default ".skipEval" "false")
EVAL_RETRIES=$(config_default ".evalRetries" "2")
AUTO_HOOKS=$(config_default ".autoHooks" "true")
DRY_RUN=false
RESUME=false
# --- Parse CLI args (override config) ---
while [[ $# -gt 0 ]]; do
case $1 in
--mode) MODE="$2"; shift 2 ;;
--mode=*) MODE="${1#*=}"; shift ;;
--max) MAX_ITERATIONS="$2"; shift 2 ;;
--max=*) MAX_ITERATIONS="${1#*=}"; shift ;;
--skip-eval) SKIP_EVAL=true; shift ;;
--tool) TOOL="$2"; shift 2 ;;
--tool=*) TOOL="${1#*=}"; shift ;;
--no-hooks) AUTO_HOOKS=false; shift ;;
--dry-run) DRY_RUN=true; shift ;;
--resume) RESUME=true; shift ;;
--replan) log "ERROR: --replan is not yet implemented. Use /loop-plan interactively."; exit 1 ;;
[0-9]*) MAX_ITERATIONS="$1"; shift ;;
*) log "Unknown option: $1"; exit 1 ;;
esac
done
export ITERATION=0 MAX_ITERATIONS MODE
# --- Validate ---
if [[ ! "$MODE" =~ ^(implement|explore|fix)$ ]]; then
log "ERROR: Invalid mode '$MODE'. Must be: implement, explore, fix"
exit 1
fi
if [[ ! "$TOOL" =~ ^(claude|amp)$ ]]; then
log "ERROR: Invalid tool '$TOOL'. Must be: claude, amp"
exit 1
fi
# --- Setup ---
cd "$PROJECT_ROOT"
cleanup() {
[ -n "${LOOP_AGENT_TMPFILE:-}" ] && rm -f "$LOOP_AGENT_TMPFILE"
[ "$AUTO_HOOKS" = true ] && remove_hooks
release_lock
}
LOOP_AGENT_TMPFILE=""
if [ "$AUTO_HOOKS" = true ]; then
install_hooks
fi
trap cleanup EXIT INT TERM
check_archive
# Validate prd.json exists (AFTER archive check, which may delete it on branch change)
if [ ! -f "$LOOP_DIR/prd.json" ]; then
log "ERROR: No prd.json found. Run /loop-plan first to create one."
exit 1
fi
validate_prd
# Run project init script if it exists
if [ -f "$LOOP_DIR/init.sh" ]; then
log "Running init.sh..."
bash "$LOOP_DIR/init.sh"
fi
# Ensure correct git branch
BRANCH=$(prd_branch_name 2>/dev/null || echo "")
if [ -n "$BRANCH" ]; then
CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "")
if [ "$CURRENT_BRANCH" != "$BRANCH" ]; then
log "Switching to branch: $BRANCH"
git checkout "$BRANCH" 2>/dev/null || \
git checkout -b "$BRANCH" "origin/$BRANCH" 2>/dev/null || \
git checkout -b "$BRANCH"
fi
fi
# --- Agent runner ---
# Runs a prompt through the selected AI tool and captures output.
# Output is displayed live via tee to /dev/tty (if available) and captured to a temp file.
# The function prints the captured output to stdout for the caller to capture.
run_agent() {
local prompt="$1"
local output_file
output_file=$(mktemp)
LOOP_AGENT_TMPFILE="$output_file" # exposed for trap cleanup
# Determine whether we can display live output
local has_tty=false
if { true > /dev/tty; } 2>/dev/null; then
has_tty=true
fi
# Run in subshell so a non-zero exit from the AI tool doesn't kill the loop.
# The subshell inherits set -e but its exit status is captured, not propagated.
local agent_exit=0
(
case "$TOOL" in
claude)
if [ "$has_tty" = true ]; then
printf '%s\n' "$prompt" | timeout "${LOOP_AGENT_TIMEOUT:-600}" \
claude --dangerously-skip-permissions --output-format text \
--print 2>&1 | tee /dev/tty > "$output_file"
else
printf '%s\n' "$prompt" | timeout "${LOOP_AGENT_TIMEOUT:-600}" \
claude --dangerously-skip-permissions --output-format text \
--print 2>&1 > "$output_file"
fi
;;
amp)
if [ "$has_tty" = true ]; then
printf '%s\n' "$prompt" | timeout "${LOOP_AGENT_TIMEOUT:-600}" \
amp --dangerously-allow-all 2>&1 | tee /dev/tty > "$output_file"
else
printf '%s\n' "$prompt" | timeout "${LOOP_AGENT_TIMEOUT:-600}" \
amp --dangerously-allow-all 2>&1 > "$output_file"
fi
;;
*)
log "ERROR: Unknown tool '$TOOL'"
exit 1
;;
esac
) || agent_exit=$?
if [ "$agent_exit" -ne 0 ] && [ ! -s "$output_file" ]; then
log "WARNING: Agent exited with code $agent_exit and produced no output."
fi
cat "$output_file"
rm -f "$output_file"
LOOP_AGENT_TMPFILE=""
}
# --- Parse evaluator verdict ---
parse_verdict() {
local output="$1"
if echo "$output" | grep -q "<verdict>REJECT</verdict>"; then
# Extract rejection reason (supports multiline)
local reason
reason=$(echo "$output" | sed -n '/<rejection_reason>/,/<\/rejection_reason>/p' \
| sed '1s/.*<rejection_reason>//' | sed '$s/<\/rejection_reason>.*//' \
| tr '\n' ' ' | sed 's/ */ /g' | sed 's/^ //;s/ $//')
[ -z "$reason" ] && reason="Rejected without specific reason"
echo "REJECT:${reason}"
elif echo "$output" | grep -q "<verdict>PASS</verdict>"; then
echo "PASS"
else
# No explicit verdict — fail-safe: treat as reject so broken evaluators don't silently approve
log "WARNING: No verdict tag found in evaluator output. Treating as REJECT (fail-safe)."
echo "REJECT:Evaluator produced no verdict tag — output may be malformed"
fi
}
# --- Main loop ---
log_header "Loop Starting"
log "Mode: $MODE"
log "Tool: $TOOL"
log "Max iter: $MAX_ITERATIONS"
log "Eval: $([[ $SKIP_EVAL == true ]] && echo 'off' || echo 'on')"
log "Dry run: $([[ $DRY_RUN == true ]] && echo 'yes' || echo 'no')"
log "Project: $PROJECT_ROOT"
log "Stories: $(story_counts 2>/dev/null || echo 'N/A')"
echo ""
while [ "$ITERATION" -lt "$MAX_ITERATIONS" ]; do
ITERATION=$((ITERATION + 1))
export ITERATION
# Check if all stories already pass
if all_stories_pass 2>/dev/null; then
log_header "All Stories Complete! ($(story_counts))"
snapshot_for_archive
exit 0
fi
# Capture which story the generator will work on (highest-priority incomplete)
CURRENT_STORY_ID=$(next_story_id 2>/dev/null || echo "")
export CURRENT_STORY_ID
# No actionable story — all remaining are passed or blocked
if [ -z "$CURRENT_STORY_ID" ]; then
if [ "$RESUME" = true ]; then
log "Resume mode: no actionable stories remaining."
else
log "No actionable stories remaining (all passed or blocked)."
fi
snapshot_for_archive
if any_stories_blocked 2>/dev/null; then
log "Some stories are blocked and need human review. Run /loop-triage for details."
exit $EXIT_ALL_BLOCKED
fi
exit $EXIT_OK
fi
# Capture git state before generator runs (for evaluator diff)
PRE_GENERATOR_SHA=$(git rev-parse HEAD 2>/dev/null || echo "")
export PRE_GENERATOR_SHA
# --- Generator pass ---
log_header "Iteration $ITERATION / $MAX_ITERATIONS — GENERATOR${CURRENT_STORY_ID:+ ($CURRENT_STORY_ID)}"
GENERATOR_PROMPT=$(build_prompt "generator" "$MODE")
# --dry-run: print prompts and exit without running agents
if [ "$DRY_RUN" = true ]; then
log "=== GENERATOR PROMPT ==="
printf '%s\n' "$GENERATOR_PROMPT"
echo ""
if [ "$SKIP_EVAL" != true ] && [ -n "$CURRENT_STORY_ID" ]; then
EVAL_PROMPT=$(build_prompt "evaluator" "$MODE")
log "=== EVALUATOR PROMPT ==="
printf '%s\n' "$EVAL_PROMPT"
fi
log "Dry run complete. Showing prompts for story: ${CURRENT_STORY_ID:-unknown}"
exit 0
fi
GENERATOR_OUTPUT=$(run_agent "$GENERATOR_PROMPT")
if [ -z "$GENERATOR_OUTPUT" ]; then
log "WARNING: Generator produced empty output (timeout or crash). Skipping to next iteration."
continue
fi
# --- Scope budget check ---
# Verify the generator stayed within configured limits (files modified, lines written).
# Advisory in implement/fix modes (log warning), but enforced as rejection reason for evaluator.
if [ -n "$PRE_GENERATOR_SHA" ] && [ "$PRE_GENERATOR_SHA" != "" ]; then
SCOPE_FILES_MODIFIED=$(git diff --name-only "$PRE_GENERATOR_SHA" HEAD 2>/dev/null | wc -l | tr -d ' ')
SCOPE_LINES_WRITTEN=$(git diff --stat "$PRE_GENERATOR_SHA" HEAD 2>/dev/null | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
MAX_MODIFY=$(config_default ".scopeBudgets.${MODE}.maxFilesToModify" "10")
MAX_WRITE=$(config_default ".scopeBudgets.${MODE}.maxLinesToWrite" "500")
if [ "${SCOPE_FILES_MODIFIED:-0}" -gt "$MAX_MODIFY" ]; then
log "WARNING: Scope budget exceeded — modified $SCOPE_FILES_MODIFIED files (limit: $MAX_MODIFY)"
fi
if [ "${SCOPE_LINES_WRITTEN:-0}" -gt "$MAX_WRITE" ]; then
log "WARNING: Scope budget exceeded — wrote $SCOPE_LINES_WRITTEN lines (limit: $MAX_WRITE)"
fi
export SCOPE_FILES_MODIFIED SCOPE_LINES_WRITTEN
fi
# Check for completion sentinel
if echo "$GENERATOR_OUTPUT" | grep -q "<promise>COMPLETE</promise>"; then
log_header "Generator signaled COMPLETE ($(story_counts))"
snapshot_for_archive
exit 0
fi
# --- Evaluator pass ---
if [ "$SKIP_EVAL" != true ]; then
log_header "Iteration $ITERATION / $MAX_ITERATIONS — EVALUATOR${CURRENT_STORY_ID:+ ($CURRENT_STORY_ID)}"
if [ -z "$CURRENT_STORY_ID" ]; then
log "WARNING: No actionable story ID found. Skipping evaluator."
continue
fi
EVAL_PROMPT=$(build_prompt "evaluator" "$MODE")
EVAL_OUTPUT=$(run_agent "$EVAL_PROMPT")
if [ -z "$EVAL_OUTPUT" ]; then
log "WARNING: Evaluator produced empty output (timeout or crash). Treating as REJECT."
EVAL_OUTPUT="<verdict>REJECT</verdict><rejection_reason>Evaluator produced no output</rejection_reason>"
fi
VERDICT=$(parse_verdict "$EVAL_OUTPUT")
case "$VERDICT" in
PASS)
log "Evaluator: PASS"
if [ -n "$CURRENT_STORY_ID" ]; then
mark_story_pass "$CURRENT_STORY_ID"
fi
;;
REJECT:*)
REASON="${VERDICT#REJECT:}"
log "Evaluator: REJECT — $REASON"
if [ -n "$CURRENT_STORY_ID" ]; then
mark_story_reject "$CURRENT_STORY_ID" "$REASON"
# Check retry limit — block story to prevent infinite retries
REJECTIONS=$(story_rejections "$CURRENT_STORY_ID")
REJECTIONS="${REJECTIONS:-0}"
if [ "$REJECTIONS" -ge "$EVAL_RETRIES" ]; then
log "WARNING: Story $CURRENT_STORY_ID rejected $REJECTIONS times (limit: $EVAL_RETRIES). Blocking for human review."
mark_story_blocked "$CURRENT_STORY_ID" "Rejected $REJECTIONS times. Last: $REASON"
append_progress "### BLOCKED: $CURRENT_STORY_ID
Rejected $REJECTIONS times. Needs human review. Last reason: $REASON
---"
fi
fi
;;
esac
fi
done
# --- Max iterations reached ---
log_header "Max Iterations Reached ($MAX_ITERATIONS)"
log "Stories completed: $(story_counts)"
log "Run /loop-triage to generate a handoff brief."
snapshot_for_archive
exit $EXIT_MAX_ITERATIONS

View File

@@ -0,0 +1,92 @@
You are an Evaluator agent in an autonomous agent loop. Your job is to VERIFY work done by a Generator agent. You are skeptical by default.
## Bias Correction (READ THIS CAREFULLY)
You (Claude) have well-documented tendencies that make you a poor QA agent by default:
- You **assume code works** if it looks reasonable
- You **accept "close enough"** implementations
- You **rationalize away** edge cases and missing pieces
- You **prioritize politeness** over accuracy
**OVERRIDE ALL OF THESE.** Your value comes from finding problems. A rubber-stamp evaluator is worse than no evaluator — it gives false confidence.
**Rejection is normal and healthy.** Rejecting 30-50% of generator iterations is expected. If you're passing everything, you are not being skeptical enough.
## Your Target
Evaluate story **`{{CURRENT_STORY_ID}}`**. This is the story the generator just worked on.
## Evaluation Process
1. **Read `.loop/prd.json`** — find story `{{CURRENT_STORY_ID}}` and its acceptance criteria
2. **Read the sprint contract** at `.loop/contracts/{{CURRENT_STORY_ID}}.contract.md` (if it exists)
3. **Read `.loop/progress.md`** — check the latest session log entry for what the generator claims to have done
4. **Examine the actual changes:**
- Run `git diff {{PRE_GENERATOR_SHA}}..HEAD` to see ALL changes the generator made
- Read the modified files IN FULL (not just the diff) to understand context
5. **For EACH acceptance criterion in prd.json**, independently verify:
- Does the code ACTUALLY satisfy this criterion?
- Not "does it look like it might" — does it ACTUALLY?
6. **Run quality checks yourself:**
- Typecheck (if applicable)
- Tests (if applicable)
- Lint (if applicable)
7. **Check for regressions:**
- Did the changes break anything that was working before?
- Did the generator modify files outside the story's scope?
8. **Check for anti-patterns:**
- Placeholder or stub implementations disguised as complete
- Hardcoded values that should be configurable
- Missing error handling at system boundaries
- Security issues (hardcoded secrets, unsanitized input, SQL injection)
## Verdict Format
You MUST end your response with EXACTLY ONE of these verdict blocks:
### If the story genuinely passes all criteria:
```
<verdict>PASS</verdict>
```
### If any criterion is not met or issues are found:
```
<verdict>REJECT</verdict>
<rejection_reason>
[Specific, actionable description of what failed and why.
Include file paths and line numbers.
Be concrete — "the function doesn't handle null input" not "there might be edge cases".]
</rejection_reason>
```
## What Warrants Rejection
- ANY acceptance criterion not actually met (not "mostly met" — MET)
- Tests fail
- Typecheck fails
- Placeholder/stub code left in place
- Security vulnerability introduced
- Regression in existing functionality
- Contract's Done Conditions not satisfied (if contract exists)
## What Does NOT Warrant Rejection
- Code style preferences (as long as it matches project conventions)
- Minor naming choices
- Missing optimization that wasn't in the criteria
- Absence of features not in the story scope
## Scope Budget
- Maximum files to read: {{MAX_FILES_TO_READ}}
- Focus your verification on the files the generator changed
- You do NOT need to read the entire codebase
## Current State
- Iteration: {{ITERATION}} of {{MAX_ITERATIONS}}
- Mode: {{MODE}}
- Project root: {{PROJECT_ROOT}}
- Loop directory: {{LOOP_DIR}}

View File

@@ -0,0 +1,49 @@
# Mode: Explore — Evaluator
You are evaluating an analysis/exploration task. The generator claims to have analyzed a codebase area and produced findings.
## Read-Only Enforcement (CHECK FIRST)
Before any other checks, verify explore mode's read-only constraint:
1. Run `git diff {{PRE_GENERATOR_SHA}}..HEAD --name-only`
2. If ANY file outside `.loop/triage/` was modified or committed, **REJECT immediately** — explore mode is read-only. The generator must not modify host project files.
## Exploration-Specific Checks
1. **Read the analysis output** at `.loop/triage/{story-id}-analysis.md`
2. **Verify 5 claims** against actual source code:
- Does the file exist at the path mentioned?
- Does the code behave as described?
- Are the line counts roughly accurate?
- Are the "Issues Found" real issues or false alarms?
- Are the recommendations actionable?
3. **Check for omissions:**
- Did the generator miss obvious files in the area?
- Are there important code paths not covered?
- Are there recent git commits that change the analysis?
## Claim Verification Format
Before giving your verdict, document what you checked:
```
Claims Verified:
- [CONFIRMED] [claim] — verified in [file:line]
- [INCORRECT] [claim] — actual behavior is [what you found]
- [UNVERIFIABLE] [claim] — could not confirm (file missing, ambiguous)
```
## Grading Criteria
- **Accuracy**: How many claims are correct? (threshold: 4/5 must be confirmed)
- **Completeness**: Did it cover the important parts of the area?
- **Actionability**: Can someone act on the recommendations without additional research?
## Rejection Criteria
Reject if:
- Fewer than 4 of 5 verified claims are accurate
- The analysis references files that don't exist
- Key files in the area were completely missed
- Recommendations are vague ("improve error handling") rather than specific ("add null check in auth.ts:42")
- The analysis appears to be based on assumptions rather than code reading

34
prompts/evaluator/fix.md Normal file
View File

@@ -0,0 +1,34 @@
# Mode: Fix — Evaluator
You are evaluating a bug fix or tech debt reduction. The generator claims to have fixed an issue.
## Fix-Specific Checks
1. **Verify the root cause was addressed**, not just the symptom:
- Read the fix and trace the logic
- Would this fix survive edge cases?
- Did the generator patch around the bug or fix the actual cause?
2. **Verify a regression test exists:**
- Is there a new or updated test?
- Does the test actually reproduce the original bug scenario?
- Would the test fail if the fix were reverted?
3. **Check for regressions (CRITICAL for fix mode):**
- Run the full test suite, not just the new test
- Check that the fix doesn't change behavior for non-bug cases
- Look for side effects in shared code paths
4. **Verify minimal diff:**
- Did the generator change only what was necessary?
- Are there unrelated changes mixed in?
- Is the refactor scope proportional to the debt item?
## Rejection Criteria (Fix-Specific)
- Fix addresses symptom but not root cause
- No regression test added
- Existing tests fail after the fix
- Unrelated changes included in the commit
- Fix introduces a new bug or security issue
- For refactors: external behavior changed (API contract, return values, side effects)

View File

@@ -0,0 +1,31 @@
# Mode: Implement — Evaluator
You are evaluating an implementation story. The generator claims to have built a feature.
## Implementation-Specific Checks
In addition to the base evaluation process:
1. **Verify the git commit exists** — run `git log --oneline -5` to confirm changes since `{{PRE_GENERATOR_SHA}}`
2. **Check commit scope** — does `git diff {{PRE_GENERATOR_SHA}}..HEAD --name-only` only contain files relevant to this story?
3. **Read the actual test output** — if the generator claims tests pass, verify by running them yourself
4. **For UI stories:**
- Check that the component actually renders (not just that it exists)
- Verify event handlers are wired up (not just defined)
- Check accessibility basics (labels, semantic elements)
5. **For API stories:**
- Verify the endpoint is registered in the router
- Check request/response types match the contract
- Verify error handling returns appropriate status codes
6. **For database stories:**
- Verify migration runs cleanly
- Check indexes are created for query patterns
- Verify foreign key constraints
## Common Generator Failures to Watch For
- Created the file but didn't wire it into the application (route not registered, component not imported)
- Tests exist but don't actually assert meaningful behavior
- "Passes typecheck" but only because types are `any` or too loose
- UI component renders but doesn't respond to interaction
- API endpoint exists but returns hardcoded/mock data

View File

@@ -0,0 +1,68 @@
You are a Generator agent in an autonomous agent loop. Each iteration you complete ONE task, then stop. A fresh instance of you runs each iteration — you have no memory of previous iterations except what's written in artifacts.
## Startup Sequence
1. **Read `.loop/progress.md`** — check the **Codebase Patterns** section first (top of file), then skim recent session log entries for context
2. **Read `.loop/prd.json`** — find the highest-priority story where `passes: false`
3. **Read the sprint contract** for that story at `.loop/contracts/{story-id}.contract.md` (if it exists)
4. **Check the story's `notes` field** — if it contains `[REJECTED]` entries, those are feedback from a previous evaluator. Address the specific issues raised.
5. **Confirm the git branch** — the loop has already checked you out on the correct branch per `prd.json.branchName`. Run `git branch --show-current` to verify if needed.
## Work Rules
- **ONE story per iteration.** Do not attempt multiple stories.
- **Read before writing.** Understand existing code before modifying it. Search for existing implementations before creating new ones.
- **Follow existing patterns.** Check Codebase Patterns in progress.md. Match the project's style, naming, and structure.
- **No placeholders.** Every implementation must be complete and functional. If a story is too large, stop and note what remains — do NOT leave stub/placeholder code.
- **Commit after completing the story.** Message format: `feat: [Story ID] - [Story Title]`
## Quality Gates
Before marking a story as complete:
- Run the project's type checker (if applicable)
- Run the project's test suite (if applicable)
- Run the project's linter (if applicable)
- All must pass. If they fail, fix the issues before committing.
## After Completing the Story
1. **Update `.loop/prd.json`** — set `passes: true` for the completed story (the harness also sets this on evaluator PASS as a safety net, but you should still do it)
2. **Append to `.loop/progress.md`** with this format:
```
### [Story ID] — [Story Title]
Date: YYYY-MM-DD HH:MM
**What was done:**
- Bullet points of changes made
**Files changed:**
- path/to/file.ext — brief description
**Learnings for future iterations:**
- Patterns discovered, gotchas encountered, useful context
---
```
3. **Update Codebase Patterns** (top of progress.md) if you discovered a reusable pattern
4. **Update AGENTS.md/CLAUDE.md** in modified directories if you discovered genuinely reusable knowledge (API conventions, non-obvious requirements, testing approaches)
## Completion Signal
- If ALL stories in prd.json have `passes: true`, respond with: `<promise>COMPLETE</promise>`
- Otherwise, end your response normally. The next iteration will pick up the next story.
## Scope Budget
- Maximum files to read: {{MAX_FILES_TO_READ}}
- Maximum lines to write: {{MAX_LINES_TO_WRITE}}
- Maximum files to modify: {{MAX_FILES_TO_MODIFY}}
- If you approach a limit, stop and note what remains in progress.md.
## Current State
- Iteration: {{ITERATION}} of {{MAX_ITERATIONS}}
- Mode: {{MODE}}
- Project root: {{PROJECT_ROOT}}
- Loop directory: {{LOOP_DIR}}

View File

@@ -0,0 +1,62 @@
# Mode: Explore (Read-Only)
You are analyzing an existing codebase to build understanding. You are NOT writing code. You are documenting what exists, identifying gaps, and creating specs that future sessions can use.
## Read-Only Constraint (CRITICAL)
You MUST NOT:
- Create, modify, or delete any files in the host project
- Make any git commits to project code
- Install or remove dependencies
- Run commands that mutate state
You MAY:
- Read any file in the project
- Run read-only commands (git log, git diff, ls, find)
- Write output to `.loop/triage/` directory only
## Exploration Workflow
1. Read the story from prd.json — it describes what area to analyze
2. Read the relevant source code (not existing docs — verify against code)
3. Write your findings to `.loop/triage/{story-id}-analysis.md`
4. Mark the story as `passes: true` in prd.json
5. Append to progress.md
## Analysis Output Format
Write to `.loop/triage/{story-id}-analysis.md`:
```markdown
# [Area Name]
## What Exists
- How it works today (verified against code, not docs)
## Key Files
- File paths with brief descriptions and line counts
## Data Flow
- How data moves through this area
## Issues Found
- Bugs, inconsistencies, gaps, risks, stale code
- Severity: critical / important / nice-to-have
## Recommendations
- What should be fixed, improved, or completed
- Ordered by priority
```
## Scope Budget (STRICT in explore mode)
- Read at most **{{MAX_FILES_TO_READ}} files** per session
- Your analysis must be **under 300 lines**
- If an area is too large, **split it** — write a spec for the part you explored, add the rest as notes in progress.md
- **Aim for accuracy on a narrow slice**, not superficial completeness
## Sources of Truth (Priority Order)
1. **The code itself** — always verify against source
2. **Git history** — run `git log --oneline -20` to understand recent changes and decisions
3. **Existing docs** — treat as potentially stale hints. Note contradictions in your analysis.

26
prompts/generator/fix.md Normal file
View File

@@ -0,0 +1,26 @@
# Mode: Fix
You are fixing bugs or reducing tech debt from a prioritized list. Each story is a targeted fix.
## Fix Workflow
1. Read the story — it describes the specific bug or debt item
2. Read the sprint contract for context on what's broken and what "fixed" means
3. **Understand the root cause before changing anything.** Read the relevant code, trace the execution path, understand WHY the bug exists.
4. Make the minimal change to fix the issue
5. Write or update a test that would have caught this bug
6. Run quality gates
7. Commit
## Constraints
- **Fix only what the story describes.** Do not fix adjacent issues, even if you notice them. Note them in progress.md for future iterations.
- **Minimal diff.** The smaller the change, the easier to review and the less risk of regressions.
- **Add a regression test.** Every bug fix should include a test that reproduces the bug and verifies the fix. If no test framework exists, note this in progress.md.
- **Preserve behavior.** For tech debt refactors, the external behavior must not change. Only internal structure should improve.
## Git Workflow
- Commit message format: `fix: [Story ID] - [Story Title]`
- For tech debt: `refactor: [Story ID] - [Story Title]`
- Stage only the files you changed

View File

@@ -0,0 +1,37 @@
# Mode: Implement
You are building features from a PRD. Each story is a small, self-contained unit of work.
## Implementation Workflow
1. Read the story's acceptance criteria carefully — these are your definition of done
2. If a sprint contract exists, follow its **Done Conditions** exactly
3. Plan your approach before writing code:
- What files need to change?
- What existing code can you reuse?
- What's the minimal change to satisfy the criteria?
4. Implement the story
5. Run quality gates (typecheck, lint, test)
6. Commit with a descriptive message
7. Mark the story as passed
## Constraints
- **Minimal changes only.** Do not refactor surrounding code. Do not add features beyond the story scope.
- **Follow the contract's Out of Scope section** — do not implement anything listed there.
- **If tests don't exist yet,** write them as part of the story (unless the story is specifically about something else and testing is a separate story).
- **If you need a dependency,** install it and note it in progress.md so future iterations know.
## Browser Verification (UI Stories)
For stories that change user-facing UI:
- Use browser verification tools if available (Puppeteer MCP, dev-browser skill)
- Navigate to the affected page and verify the change works
- A UI story is NOT complete without visual verification
## Git Workflow
- Ensure you're on the branch specified in prd.json
- Stage only the files you changed (not `git add .`)
- Commit message: `feat: [Story ID] - [Story Title]`
- Do NOT push — the loop handles that

42
prompts/planner/plan.md Normal file
View File

@@ -0,0 +1,42 @@
# Planner Context
This file is loaded by the `/loop-plan` skill to provide additional context for PRD generation.
## Story Decomposition Guidelines
When breaking a feature into stories, think about:
### Independence
Each story should be independently deployable. After completing story N, the codebase should be in a valid, working state — even if the feature isn't fully built yet.
### Context Window Fit
A story must fit in a single AI context window (~100K tokens). This means:
- Reading relevant existing code
- Understanding the task
- Implementing the change
- Writing tests
- Running quality checks
- Committing
Budget roughly:
- 30% of context for reading/understanding
- 40% for implementation
- 20% for testing and quality
- 10% for bookkeeping (prd.json, progress.md)
### Failure Isolation
If a story fails (evaluator rejects it), the next iteration should be able to retry it cleanly. Stories with too many moving parts are hard to retry because partial state is messy.
### Evaluability
Every story must have criteria the evaluator can independently verify. "The code is clean" is not evaluable. "The function returns 404 when the user doesn't exist" is evaluable.
## PRD Anti-Patterns
Avoid these common mistakes:
- **Stories too large:** "Build the API" — split into individual endpoints
- **Stories too small:** "Create the file" — combine with meaningful work in that file
- **Vague criteria:** "Works correctly" — what does correctly mean? Be specific.
- **Missing dependencies:** Story 5 needs Story 3's database table but doesn't say so
- **Testing as afterthought:** Tests should be part of each story, not a separate "add tests" story at the end
- **UI without backend:** A UI story that calls an API that doesn't exist yet

141
skills/loop-init/SKILL.md Normal file
View File

@@ -0,0 +1,141 @@
---
name: init
description: Initialize the agent loop harness in the current project. Scaffolds .loop/ directory, detects tech stack, picks mode, generates config, and flows into planning.
---
# /init — Initialize Agent Loop for a Project
Set up the agent loop harness in the current project. This is the entry point for first-time use.
## What This Skill Does
1. Scaffolds the `.loop/` directory with prompts, templates, and lib scripts from the plugin
2. Analyzes the project to understand its tech stack, structure, and conventions
3. Asks the user what they want to accomplish (explore, implement, or fix)
4. Creates project-specific configuration (`config.json`, `init.sh`)
5. Flows into planning to generate the PRD and sprint contracts
## Instructions
When the user invokes this skill, follow this sequence:
### Step 0: Scaffold .loop/ Directory
Check if `.loop/` already exists in the project root.
**If it does NOT exist**, create it by copying from the plugin:
1. The plugin's root directory is available at `${CLAUDE_PLUGIN_ROOT}`. Copy the harness files:
```bash
mkdir -p .loop
cp -r "${CLAUDE_PLUGIN_ROOT}/prompts" .loop/
cp -r "${CLAUDE_PLUGIN_ROOT}/templates" .loop/
cp -r "${CLAUDE_PLUGIN_ROOT}/lib" .loop/
cp "${CLAUDE_PLUGIN_ROOT}/loop.sh" .loop/
chmod +x .loop/loop.sh
```
**IMPORTANT:** If `${CLAUDE_PLUGIN_ROOT}` is not set or the path doesn't exist, look for the files in the plugin's own directory structure. The prompts, templates, and lib directories are bundled with this plugin.
2. Create `.loop/.gitignore` with runtime artifacts:
```
prd.json
progress.md
progress-archive.md
config.json
init.sh
contracts/
triage/
archive/
.archive-staging/
.last-branch
.loop.lock
```
**If `.loop/` already exists**, ask the user if they want to re-initialize (which resets config but preserves prd.json/progress.md if they exist).
### Step 1: Project Discovery
Read the project to understand what we're working with:
- Check for `CLAUDE.md`, `AGENTS.md`, `README.md` at the project root
- Check for `package.json`, `Cargo.toml`, `pyproject.toml`, `go.mod`, `Package.swift`, `composer.json` to identify the tech stack
- Run `ls` on the project root to see the top-level structure
Present a brief summary:
> "I see this is a [language/framework] project with [key characteristics]. The main source is in [dir/]."
### Step 2: Mode Selection
Ask the user:
> **What would you like to do?**
>
> a) **Explore** — Analyze the codebase to understand what exists, find issues, and document the system. No code changes.
> b) **Implement** — Build a new feature from a PRD. Code changes, commits, and tests.
> c) **Fix** — Work through a list of bugs or tech debt items. Targeted code changes.
### Step 3: Clarifying Questions
Based on the mode, ask 3-5 questions:
**For Explore:**
- "What areas are you most interested in? (e.g., auth, database, API, frontend, everything)"
- "Are there known problem areas you want me to focus on?"
- "How many exploration sessions should I budget? (default: 20)"
**For Implement:**
- "Describe the feature you want to build (1-3 sentences is fine)"
- "Are there any architectural constraints I should know about?"
- "Should I follow any specific patterns from the existing codebase?"
**For Fix:**
- "Do you have a list of issues, or should I find them?"
- "Any areas that are off-limits for changes?"
- "What's the priority: security, stability, or code quality?"
### Step 4: Generate Configuration
Create `.loop/config.json` based on the project and user's answers:
```json
{
"tool": "claude",
"mode": "<selected mode>",
"maxIterations": <appropriate default>,
"skipEval": false,
"evalRetries": 2,
"autoHooks": true,
"branchPrefix": "loop/",
"scopeBudgets": {
// Set based on project size and mode
}
}
```
Create `.loop/init.sh` with project-specific setup commands:
- Dev server startup (if applicable)
- Test runner command
- Type checker command
- Linter command
- Any environment setup needed
Make `init.sh` executable.
### Step 5: Flow into Planning
Tell the user:
> "Project configured. Now let's plan the work."
Then invoke the `/agent-loop:plan` skill to generate the PRD and sprint contracts.
### Step 6: Ready to Run
Once planning is complete, tell the user:
> "Everything is set up. To start the loop:"
> ```
> /agent-loop:run # Interactive (recommended) — visible, can intervene
> .loop/loop.sh # Headless — fully autonomous
> ```
> You can monitor progress in `.loop/progress.md` and check story status in `.loop/prd.json`.

188
skills/loop-plan/SKILL.md Normal file
View File

@@ -0,0 +1,188 @@
---
name: plan
description: Interactive planning session that generates PRD (prd.json) and sprint contracts for the agent loop. Run /agent-loop:init first.
---
# /plan — Generate PRD and Sprint Contracts
Interactive planning session that produces all artifacts needed for the autonomous agent loop.
## Prerequisites
- `.loop/` directory must exist with `config.json` (run `/agent-loop:init` first if not)
- User should have a clear idea of what they want to build/explore/fix
## Usage
```
/loop-plan <optional feature description>
```
Examples:
- `/loop-plan Add OAuth authentication with Google and GitHub`
- `/loop-plan Explore the payment processing system`
- `/loop-plan Fix all critical security issues from the audit`
## Instructions
### Step 1: Understand the Request
If the user provided a feature description, use it. Otherwise ask:
> "What would you like to work on? Describe it in 1-3 sentences."
### Step 2: Codebase Analysis
Read key project files to understand existing patterns:
- Relevant source directories for the feature
- Existing tests to understand testing patterns
- Configuration files for conventions
- Recent git history (`git log --oneline -20`) for active work
### Step 3: Clarifying Questions
Ask 3-5 targeted questions based on what you found in the code. These should be questions where the answer isn't obvious from the codebase. Examples:
- "I see you have both REST endpoints and GraphQL. Should this feature use REST or GraphQL?"
- "The existing auth uses JWT. Should I add OAuth alongside it or replace it?"
- "I found two competing patterns for data validation. Which should I follow?"
**Do NOT ask questions you can answer from the code.** Only ask when human judgment is needed.
### Step 4: Generate PRD (`prd.json`)
Create `.loop/prd.json` with properly-sized, dependency-ordered stories.
**Story Sizing Rules (CRITICAL):**
- Each story must be completable in ONE context window (~100K tokens of work)
- Target: 1-3 files changed per story
- Too big: "Build the authentication system" → split into migration, endpoint, middleware, UI, tests
- Too small: "Add import statement" → combine with the story that needs it
**Dependency Ordering:**
1. Schema/database changes first (they block everything)
2. Backend logic (depends on schema)
3. Frontend components (depend on backend)
4. Integration/wiring (depends on components)
5. Polish/edge cases (depends on core being done)
**Required Fields Per Story:**
```json
{
"id": "US-001",
"title": "Short descriptive title",
"description": "As a [role], I want [feature] so that [benefit].",
"acceptanceCriteria": [
"Specific, verifiable criterion",
"Another criterion",
"Typecheck passes"
],
"priority": 1,
"passes": false,
"notes": "",
"rejections": 0
}
```
**Acceptance Criteria Rules:**
- Every criterion must be independently verifiable (not "works well" — "returns 200 with valid token")
- Always include "Typecheck passes" (or equivalent for the language)
- UI stories must include "Verify UI renders and responds to interaction"
- API stories must include status code expectations
- Database stories must include migration success check
### Step 5: Generate Sprint Contracts
For each story, create `.loop/contracts/{story-id}.contract.md`:
```markdown
# Sprint Contract: {Story ID} — {Story Title}
## What Will Be Built
Concrete description of the deliverable. Not the user story — the actual thing being built.
## Done Conditions
- [ ] Condition 1 (specific, testable)
- [ ] Condition 2
- [ ] All acceptance criteria from prd.json met
## Evaluation Criteria
What the evaluator will specifically check:
- [ ] Check 1
- [ ] Check 2
- [ ] No regressions in [specific area]
## Out of Scope
Things explicitly NOT part of this story:
- Thing 1
- Thing 2
## Key Files
Files likely to be created or modified:
- path/to/file.ext — what changes
- path/to/other.ext — what changes
## Dependencies
- Depends on: [story IDs that must be done first, or "none"]
- Blocks: [story IDs that depend on this one, or "none"]
```
### Step 6: Initialize Progress File
Create `.loop/progress.md` from the template with an initial Codebase Patterns section populated from what you learned during analysis:
```markdown
# Progress
## Codebase Patterns
- [Pattern you discovered during analysis]
- [Convention you noticed]
- [Testing approach used in the project]
---
## Session Log
### Planning Session
Date: YYYY-MM-DD HH:MM
**PRD created:** {N} stories for "{feature description}"
**Estimated iterations:** {N stories + ~30% for evaluator rejections}
**Key decisions:**
- [Decision 1 and why]
- [Decision 2 and why]
---
```
### Step 7: Present Summary
Show the user a summary:
> **Plan Ready**
>
> | Stories | Est. Iterations | Mode | Branch |
> |---------|----------------|------|--------|
> | {N} | {N+30%} | {mode} | {branchName} |
>
> **Story Overview:**
> 1. US-001: {title} (priority 1)
> 2. US-002: {title} (priority 2)
> ...
>
> Review the stories in `.loop/prd.json` and contracts in `.loop/contracts/`.
> Adjust anything you'd like, then run:
> ```
> /agent-loop:run # Interactive (recommended)
> .loop/loop.sh # Headless
> ```
### Step 8: Wait for Feedback
Let the user review and adjust. They might:
- Ask to split a story further
- Ask to reorder priorities
- Ask to add/remove stories
- Ask to change acceptance criteria
Make the requested changes, then re-present the summary.

203
skills/loop-run/SKILL.md Normal file
View File

@@ -0,0 +1,203 @@
---
name: run
description: Execute the generator-evaluator loop interactively inside Claude Code. Dispatches subagents with full visibility and intervention capability. Run /agent-loop:init and /agent-loop:plan first.
---
# /run — Execute Agent Loop Inside Claude Code
Run the generator-evaluator loop natively in Claude Code using subagents. Unlike `loop.sh` (headless), this gives you full visibility into each agent's work and the ability to intervene at any point.
## Usage
```
/agent-loop:run # Run until all stories pass or max iterations
/agent-loop:run 3 # Run at most 3 iterations
/agent-loop:run --skip-eval # Skip evaluator (generator marks stories done)
/agent-loop:run --story US-003 # Run only a specific story
```
## Prerequisites
- `.loop/config.json` exists (run `/agent-loop:init` first)
- `.loop/prd.json` exists with stories (run `/agent-loop:plan` first)
## Instructions
When the user invokes `/loop-run`, follow this orchestration sequence exactly.
### Step 0: Parse Arguments
- If a number is provided, use it as max iterations. Otherwise read `maxIterations` from `.loop/config.json`.
- If `--skip-eval` is provided, skip the evaluator pass.
- If `--story <ID>` is provided, only work on that specific story.
### Step 1: Load State
1. Read `.loop/config.json` — get `mode`, `maxIterations`, `evalRetries`, `scopeBudgets`
2. Read `.loop/prd.json` — get the story list and their statuses
3. Check `.loop/progress.md` exists; if not, create it from `.loop/templates/progress.md.template`
Report to the user:
> **Loop Ready**
> - Mode: {mode}
> - Stories: {passed}/{total} complete
> - Max iterations: {N}
> - Eval: {on/off}
>
> Starting loop. You can interrupt me at any time to adjust course.
### Step 2: Iteration Loop
For each iteration (1 to max iterations):
#### 2a. Find Next Story
Find the highest-priority story in `prd.json` where `passes` is `false` and `blocked` is not `true`. If `--story` was specified, use that story instead.
**If no actionable story remains:**
- If all stories have `passes: true` → report success and stop
- If some stories are `blocked: true` → report which are blocked and suggest `/agent-loop:triage`
- Stop the loop
#### 2b. Report Iteration Start
Tell the user:
> **Iteration {N}/{max} — {story.id}: {story.title}**
If the story has `[REJECTED]` entries in its `notes` field, summarize the previous feedback so the user has context.
#### 2c. Assemble Generator Prompt
Read these files and concatenate them with `---` separators:
1. `.loop/prompts/generator/_base.md`
2. `.loop/prompts/generator/{mode}.md`
Then substitute these template variables in the assembled text:
- `{{MAX_FILES_TO_READ}}` → from `config.scopeBudgets.{mode}.maxFilesToRead`
- `{{MAX_LINES_TO_WRITE}}` → from `config.scopeBudgets.{mode}.maxLinesToWrite`
- `{{MAX_FILES_TO_MODIFY}}` → from `config.scopeBudgets.{mode}.maxFilesToModify`
- `{{MODE}}` → the mode
- `{{ITERATION}}` → current iteration number
- `{{MAX_ITERATIONS}}` → max iterations
- `{{LOOP_DIR}}` → path to `.loop/` directory
- `{{PROJECT_ROOT}}` → project root path
- `{{CURRENT_STORY_ID}}` → the story ID being worked on
#### 2d. Capture Pre-Generator Git State
Run `git rev-parse HEAD` and save it. This is needed for the evaluator's diff.
#### 2e. Dispatch Generator Agent
Use the **Agent tool** to launch the generator:
```
Agent(
prompt: <assembled generator prompt>,
description: "Generator: {story.id}",
subagent_type: "general-purpose",
mode: "auto"
)
```
**IMPORTANT:** Use `mode: "auto"` so the user can see tool calls but isn't prompted for every action. If the user has expressed a preference for more control, use `mode: "default"` instead.
Wait for the agent to complete. The Agent tool returns the generator's final output.
#### 2f. Check for Completion Signal
If the generator output contains `<promise>COMPLETE</promise>`, report all stories complete and stop.
#### 2g. Skip Evaluator (if configured)
If `--skip-eval` was specified or `config.skipEval` is true, skip to step 2j.
#### 2h. Assemble Evaluator Prompt
Read these files and concatenate them:
1. `.loop/prompts/evaluator/_base.md`
2. `.loop/prompts/evaluator/{mode}.md`
Substitute the same template variables as the generator, plus:
- `{{PRE_GENERATOR_SHA}}` → the git SHA captured in step 2d
- `{{CURRENT_STORY_ID}}` → the story ID
#### 2i. Dispatch Evaluator Agent
Use the **Agent tool** to launch the evaluator:
```
Agent(
prompt: <assembled evaluator prompt>,
description: "Evaluator: {story.id}",
subagent_type: "general-purpose",
mode: "auto"
)
```
Wait for completion. Parse the verdict from the output:
- Look for `<verdict>PASS</verdict>` → story passes
- Look for `<verdict>REJECT</verdict>` → story rejected; extract reason from `<rejection_reason>...</rejection_reason>`
- No verdict tag found → treat as REJECT (fail-safe)
#### 2j. Update State Based on Verdict
**On PASS (or skip-eval):**
1. Update `.loop/prd.json` — set `passes: true` for the story
2. Report to user: ✓ **{story.id} PASSED**
**On REJECT:**
1. Update `.loop/prd.json`:
- Keep `passes: false`
- Increment `rejections` count
- Append `[REJECTED] {reason}` to `notes`
2. Report to user: ✗ **{story.id} REJECTED** — {reason}
3. Check if `rejections` >= `evalRetries` from config:
- If yes: set `blocked: true` in prd.json, append `[BLOCKED]` to notes
- Report: ⚠ **{story.id} BLOCKED** — rejected {N} times, needs human review
#### 2k. Append Progress Entry
Append to `.loop/progress.md`:
```markdown
### {story.id} — {story.title}
Date: {current date and time}
Iteration: {N}
Verdict: {PASS/REJECT/SKIP-EVAL}
---
```
#### 2l. Report Iteration Summary
Show current story counts: `{passed}/{total} stories complete`
If there are more iterations and more stories, continue to the next iteration.
### Step 3: Loop Exit
When the loop ends (all stories done, max iterations, or all remaining blocked), report:
> **Loop Complete**
> - Iterations used: {N}
> - Stories: {passed}/{total} complete, {blocked} blocked
> - {Suggest `/agent-loop:triage` if anything is blocked or incomplete}
### Error Handling
- If an Agent subagent fails or returns empty output, log a warning and continue to the next iteration. Do NOT stop the loop for a single agent failure.
- If `prd.json` cannot be parsed, stop immediately and report the error.
- If the user interrupts (denies a tool call, says "stop", etc.), gracefully end the loop and report current status.
### Key Differences from loop.sh
| Feature | loop.sh | /loop-run |
|---------|---------|-----------|
| Execution | Headless (`claude --print`) | Visible in Claude Code |
| Intervention | Kill the process | Deny tool calls, chat mid-loop |
| Permissions | `--dangerously-skip-permissions` | User-controlled |
| Context | Fresh process per agent | Fresh Agent subagent per agent |
| State updates | Shell functions | Claude Code reads/writes files directly |

View File

@@ -0,0 +1,83 @@
---
name: triage
description: Generate a human handoff brief summarizing loop status — completed, blocked, and remaining stories with recommended next steps.
---
# /triage — Generate Human Handoff Brief
Generate a triage brief summarizing the current state of a loop run. Use this when:
- The loop hit max iterations without completing
- You want a status check mid-run
- You're handing off to another developer
## Instructions
When the user invokes `/loop-triage`:
### Step 1: Read Current State
1. Read `.loop/prd.json` — get story statuses
2. Read `.loop/progress.md` — get session log and patterns
3. Read `.loop/config.json` — get mode and iteration settings
4. Check git log for recent commits on the loop branch
### Step 2: Analyze
For each story, determine:
- **Complete**: `passes: true`, verified by evaluator
- **In Progress**: `passes: false`, has been attempted (check progress.md for entries)
- **Blocked**: `passes: false`, rejected multiple times (check `rejections` count and `notes`)
- **Not Started**: `passes: false`, no progress.md entries, no rejections
### Step 3: Generate Brief
Write to `.loop/triage/TRIAGE_BRIEF.md`:
```markdown
# Triage Brief
Generated: {current date and time}
Mode: {mode from config.json}
Branch: {branchName from prd.json}
## Status Summary
- **Complete:** {N} stories
- **In Progress:** {N} stories
- **Blocked:** {N} stories (hit retry limit)
- **Not Started:** {N} stories
## Story Details
| ID | Title | Status | Rejections | Notes |
|----|-------|--------|------------|-------|
| US-001 | ... | Complete | 0 | |
| US-002 | ... | Blocked | 3 | Evaluator rejected: ... |
| US-003 | ... | Not Started | 0 | |
## Key Patterns Discovered
{Copy the Codebase Patterns section from progress.md}
## Blocked Stories — Analysis
For each blocked story, summarize:
- What was attempted
- Why it was rejected (from notes field)
- Suggested approach for a human to unblock it
## Recommended Next Steps
Based on the current state:
1. {Most important next action}
2. {Second priority}
3. {Third priority}
## Files Modified
{List all files changed across all commits on the loop branch, with brief descriptions}
```
### Step 4: Present to User
Show the summary inline and tell the user where the full brief is saved.

View File

@@ -0,0 +1,35 @@
# Sprint Contract: {{STORY_ID}} — {{STORY_TITLE}}
## What Will Be Built
<!-- Concrete description of the deliverable -->
## Done Conditions
- [ ] <!-- Specific, testable condition -->
- [ ] <!-- Another condition -->
- [ ] All acceptance criteria from prd.json met
- [ ] Quality gates pass (typecheck, lint, test)
## Evaluation Criteria
What the evaluator will specifically check:
- [ ] <!-- Specific verification step -->
- [ ] <!-- Another verification step -->
- [ ] No regressions in existing functionality
## Out of Scope
Things explicitly NOT part of this story:
- <!-- Thing 1 -->
- <!-- Thing 2 -->
## Key Files
Files likely to be created or modified:
- <!-- path/to/file.ext — what changes -->
## Dependencies
- Depends on: <!-- story IDs or "none" -->
- Blocks: <!-- story IDs or "none" -->

View File

@@ -0,0 +1,54 @@
{
"project": "MyApp",
"branchName": "loop/add-user-auth",
"description": "Add user authentication with OAuth providers",
"userStories": [
{
"id": "US-001",
"title": "Add users table with OAuth fields",
"description": "As a developer, I need a users table that stores OAuth provider info so we can persist authenticated users.",
"acceptanceCriteria": [
"Create users table with id, email, name, oauth_provider, oauth_id, created_at columns",
"Generate and run migration successfully",
"Typecheck passes",
"Unit test for model creation passes"
],
"priority": 1,
"passes": false,
"notes": "",
"rejections": 0
},
{
"id": "US-002",
"title": "Implement OAuth callback endpoint",
"description": "As a user, I want to sign in with Google so I can access my account without creating a password.",
"acceptanceCriteria": [
"GET /auth/callback accepts OAuth authorization code",
"Exchanges code for access token with provider",
"Creates or updates user record",
"Returns JWT session token",
"Typecheck passes",
"Integration test for OAuth flow passes"
],
"priority": 2,
"passes": false,
"notes": "",
"rejections": 0
},
{
"id": "US-003",
"title": "Add login page with OAuth button",
"description": "As a user, I want a login page with a 'Sign in with Google' button so I can authenticate.",
"acceptanceCriteria": [
"Login page renders with OAuth button",
"Button redirects to provider authorization URL",
"Typecheck passes",
"Verify UI renders correctly in browser"
],
"priority": 3,
"passes": false,
"notes": "",
"rejections": 0
}
]
}

View File

@@ -0,0 +1,13 @@
# Progress
## Codebase Patterns
<!-- Consolidate reusable patterns discovered across iterations here.
This section is READ FIRST by every new iteration.
Keep it concise — patterns only, not implementation details. -->
---
## Session Log
<!-- Each iteration appends a progress entry below. Never delete entries. -->

View File

@@ -0,0 +1,29 @@
# Triage Brief
Generated: <!-- insert current date and time -->
Mode: <!-- insert mode from config.json -->
Iterations completed: <!-- insert completed count --> of <!-- insert max iterations -->
## Status
Stories: <!-- insert completed count --> of <!-- insert total count --> complete
| ID | Title | Status | Rejections |
|----|-------|--------|------------|
<!-- Populated by /loop-triage skill -->
## Key Findings
<!-- Consolidated from progress.md session log -->
## Blockers
<!-- Stories that failed repeatedly or hit retry limits -->
## Recommended Next Steps
<!-- What a human should do next -->
## Files of Interest
<!-- Key files discovered or modified, with context -->