feat: agent loop harness with Claude Code plugin support
Generator-evaluator architecture with iterative context-reset for long-running coding tasks. Ships as a Claude Code plugin — install with /plugin and use /agent-loop:init, /agent-loop:plan, /agent-loop:run.
14
.claude-plugin/marketplace.json
Normal file
@@ -0,0 +1,14 @@
{
  "name": "agent-loop",
  "plugins": [
    {
      "name": "agent-loop",
      "description": "Autonomous generator-evaluator agent loop for long-running coding tasks. Plan interactively, then execute with full visibility.",
      "version": "0.1.0",
      "source": {
        "source": "github",
        "repo": "https://git.jagfly.com/sheldon/loop-loop.git"
      }
    }
  ]
}
11
.claude-plugin/plugin.json
Normal file
@@ -0,0 +1,11 @@
{
  "name": "agent-loop",
  "version": "0.1.0",
  "description": "Autonomous generator-evaluator agent loop for long-running coding tasks. Plan with /agent-loop:init, then execute with /agent-loop:run.",
  "author": {
    "name": "Sheldon"
  },
  "repository": "https://git.jagfly.com/sheldon/loop-loop.git",
  "license": "MIT",
  "keywords": ["agent", "loop", "autonomous", "generator", "evaluator", "harness"]
}
19
.gitignore
vendored
Normal file
@@ -0,0 +1,19 @@
# Runtime artifacts (generated per-project, not part of the harness)
prd.json
progress.md
progress-archive.md
config.json
init.sh
contracts/
triage/
archive/
.archive-staging/
.last-branch
.loop.lock

# OS
.DS_Store
Thumbs.db

# Claude Code
.claude/
166
README.md
Normal file
@@ -0,0 +1,166 @@
# Agent Loop

Autonomous AI agent harness that combines a generator-evaluator architecture with iterative context-reset patterns for long-running coding tasks.

Inspired by [Geoffrey Huntley's Ralph pattern](https://ghuntley.com/ralph/) and [Anthropic's harness design research](https://www.anthropic.com/engineering/harness-design-long-running-apps).

The loop runs fresh agent instances on every iteration: a **Generator** does the work, then an **Evaluator** verifies it. Human judgment stays in the planning phase; execution is autonomous.

There are two ways to run the loop: **headless** via `loop.sh` (a fully autonomous bash process) or **interactive** via `/loop-run` (`/agent-loop:run` when installed as a plugin), which runs natively in Claude Code with full visibility and the option to intervene.

## Install

### As a Claude Code Plugin (Recommended)

```
/plugin marketplace add https://git.jagfly.com/sheldon/loop-loop.git
/plugin install agent-loop@agent-loop
```

Then in any project:

```
/agent-loop:init    # Set up the loop for your project
/agent-loop:plan    # Generate PRD and sprint contracts
/agent-loop:run     # Run the loop interactively
```

### Manual Install

```bash
# Copy the harness into your project
cp -r /path/to/loop-loop .loop

# Install skills as Claude Code commands
mkdir -p .claude/commands
for skill in loop-init loop-plan loop-run loop-triage; do
  ln -sf "../../.loop/skills/$skill/SKILL.md" ".claude/commands/$skill.md"
done

# Then, inside Claude Code, run these slash commands in order:
#   /loop-init
#   /loop-plan
#   /loop-run
```

## How It Works

```
[You + Claude Code]                 [Loop Execution]

/agent-loop:init                    Interactive (/agent-loop:run)
  → scaffolds .loop/                  └─ dispatches Agent subagents
  → detects project                   └─ visible tool calls, can intervene
  → picks mode                        └─ chat mid-loop to adjust course
  → creates config.json
                                    Headless (.loop/loop.sh)
/agent-loop:plan                      └─ spawns claude --print per iteration
  → asks clarifying questions         └─ fully autonomous, no UI
  → generates prd.json
  → generates sprint contracts      Both paths:
  → populates progress.md             ├─→ Generator → picks story → implements → commits
                                      ├─→ Evaluator → verifies → PASS or REJECT
                                      ├─→ next iteration...
                                      └─→ all stories pass → done
```

## Modes

| Mode | What it does | Git writes? |
|------|--------------|-------------|
| **implement** | Build features from a PRD | Yes |
| **explore** | Read-only codebase analysis | No |
| **fix** | Targeted bug fixes / tech debt | Yes |

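The modes also differ in their advisory scope budgets. The sketch below, written in Python to mirror the dotted-path lookup that `lib/prompt.sh` falls back to when `jq` is unavailable, reads the per-mode budgets from a trimmed copy of `config.json.example`. The `EXAMPLE_CONFIG` literal and this Python `get_config_value` are illustrative stand-ins, not harness code.

```python
import json

# Trimmed copy of the scopeBudgets section from config.json.example.
EXAMPLE_CONFIG = json.loads("""
{
  "mode": "implement",
  "scopeBudgets": {
    "explore":   {"maxFilesToRead": 15, "maxLinesToWrite": 0,   "maxFilesToModify": 0},
    "implement": {"maxFilesToRead": 50, "maxLinesToWrite": 500, "maxFilesToModify": 10},
    "fix":       {"maxFilesToRead": 30, "maxLinesToWrite": 200, "maxFilesToModify": 5}
  }
}
""")


def get_config_value(config, path, default):
    """Walk a dotted path such as '.scopeBudgets.fix.maxFilesToRead'.

    Returns `default` when any key along the path is missing.
    Hypothetical helper, for illustration only.
    """
    node = config
    for key in path.lstrip(".").split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


if __name__ == "__main__":
    for mode in ("implement", "explore", "fix"):
        limit = get_config_value(EXAMPLE_CONFIG, f".scopeBudgets.{mode}.maxLinesToWrite", 500)
        print(f"{mode}: up to {limit} lines written per iteration")
```

In the example config, explore mode's write budgets are zero, which matches its read-only role in the table above.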
## Running the Loop

### Option A: Interactive via `/loop-run` (Recommended)

Run inside Claude Code (the command is `/agent-loop:run` when installed as a plugin). You see every tool call, file edit, and test run, and you can intervene at any point: deny a tool call, chat to adjust course, or stop the loop.

```
/loop-run                   # Run until done or max iterations
/loop-run 3                 # Run at most 3 iterations
/loop-run --skip-eval       # Skip evaluator pass
/loop-run --story US-003    # Run only a specific story
```

### Option B: Headless (`loop.sh`)

Run as a standalone bash process. This path is fully autonomous, with no UI and no intervention, which makes it useful for background execution or CI.

```bash
.loop/loop.sh [options]

  --mode <implement|explore|fix>   Operating mode
  --max <N>                        Maximum iterations (default: 20)
  --skip-eval                      Skip evaluator pass
  --tool <claude|amp>              AI tool to use
  --no-hooks                       Don't install stop hooks
  --dry-run                        Print assembled prompts without running agents
  --resume                         Skip already-passed stories (explicit exit when none remain)
```

## Architecture

### Generator
A fresh Claude Code instance runs each iteration. It reads `prd.json` to find the highest-priority incomplete story, reads that story's sprint contract, implements the story, runs quality gates, commits, and marks the story done.

### Evaluator
A separate fresh instance runs after each generator pass. It skeptically verifies the work: it checks acceptance criteria against the actual code, runs tests independently, and issues a `PASS` or `REJECT` verdict. A rejection sends the story back to the generator with specific feedback.

Evaluator skepticism is deliberately tuned, because Claude's default tendency is to rationalize away issues. The evaluator prompt includes explicit bias correction.

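The control flow described above can be sketched as follows. This is a simplified Python model under stated assumptions, not the logic of `loop.sh`: `generate` and `evaluate` are hypothetical callables standing in for the fresh agent instances, and blocked stories and evaluator retries are left out.

```python
def run_loop(stories, generate, evaluate, max_iterations=20):
    """Run generator/evaluator passes until every story passes or the budget runs out."""
    log = []
    for iteration in range(1, max_iterations + 1):
        pending = [s for s in stories if not s["passes"]]
        if not pending:
            break  # all stories pass: done
        # Lowest priority number first; stories without a priority sort last.
        story = min(pending, key=lambda s: s.get("priority", 999))
        generate(story)                      # stand-in for a fresh generator instance
        verdict, feedback = evaluate(story)  # stand-in for a separate evaluator instance
        if verdict == "PASS":
            story["passes"] = True
        else:
            # A rejection leaves the story pending and records the feedback.
            story["rejections"] = story.get("rejections", 0) + 1
            story["notes"] = story.get("notes", "") + "\n[REJECTED] " + feedback
        log.append(f"iteration {iteration}: {story['id']} {verdict}")
    return log


def demo():
    """Hypothetical run: the evaluator rejects the first attempt at US-001."""
    attempts = {}

    def generate(story):
        attempts[story["id"]] = attempts.get(story["id"], 0) + 1

    def evaluate(story):
        if story["id"] == "US-001" and attempts["US-001"] < 2:
            return "REJECT", "tests were not run"
        return "PASS", ""

    stories = [
        {"id": "US-001", "priority": 1, "passes": False},
        {"id": "US-002", "priority": 2, "passes": False},
    ]
    return run_loop(stories, generate, evaluate), stories


if __name__ == "__main__":
    for line in demo()[0]:
        print(line)
```

Because a rejected story stays pending with its feedback attached, the next iteration picks it up again before moving on to lower-priority work.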
### Sprint Contracts
Before the loop starts, `/loop-plan` generates a contract for each story. Each contract defines the "done" conditions that both generator and evaluator reference, which removes ambiguity about whether the work is complete.

### State Persistence

| Artifact | Purpose |
|----------|---------|
| `prd.json` | Story status (pass/fail), acceptance criteria |
| `progress.md` | Append-only session log + codebase patterns |
| `contracts/` | Sprint contracts per story |
| `config.json` | Harness configuration |
| Git commits | Code changes with story-tagged messages |

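As a rough illustration of how `prd.json` drives the loop, the sketch below mirrors the selection and counting logic in `lib/state.sh` using only the fields that file reads (`branchName`, `userStories`, `id`, `priority`, `passes`, `blocked`). The `EXAMPLE_PRD` data is made up, and a real `prd.json` carries additional fields, such as acceptance criteria, that are not modeled here.

```python
import json

# Hypothetical PRD containing only the fields lib/state.sh queries.
EXAMPLE_PRD = json.loads("""
{
  "branchName": "loop/example-feature",
  "userStories": [
    {"id": "US-001", "priority": 1, "passes": true},
    {"id": "US-002", "priority": 2, "passes": false, "blocked": true},
    {"id": "US-003", "priority": 3, "passes": false},
    {"id": "US-004", "passes": false}
  ]
}
""")


def next_story_id(prd):
    """Return the highest-priority story that is neither passed nor blocked."""
    pending = [s for s in prd["userStories"] if not s["passes"] and not s.get("blocked")]
    pending.sort(key=lambda s: s.get("priority", 999))  # missing priority sorts last
    return pending[0]["id"] if pending else ""


def story_counts(prd):
    """Return 'passed/total' in the same format the shell helper prints."""
    stories = prd["userStories"]
    passed = sum(1 for s in stories if s["passes"])
    return f"{passed}/{len(stories)}"


if __name__ == "__main__":
    print(next_story_id(EXAMPLE_PRD))  # US-003: US-002 is blocked, US-004 has no priority
    print(story_counts(EXAMPLE_PRD))   # 1/4
```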
## File Structure

```
.loop/
  loop.sh                    # Main loop orchestrator
  config.json                # Project config (generated by /loop-init)
  init.sh                    # Project setup script (generated by /loop-init)
  prd.json                   # Active PRD (generated by /loop-plan)
  progress.md                # Cross-session memory (append-only)

  prompts/
    generator/_base.md       # Shared generator instructions
    generator/implement.md   # Implement mode overlay
    generator/explore.md     # Explore mode overlay
    generator/fix.md         # Fix mode overlay
    evaluator/_base.md       # Skeptical evaluator base
    evaluator/implement.md   # Implement verification
    evaluator/explore.md     # Analysis verification
    evaluator/fix.md         # Fix verification
    planner/plan.md          # Planning context

  templates/                 # Reference templates
  lib/                       # Shell library functions
  skills/                    # Claude Code skills (/loop-init, /loop-plan, /loop-run, /loop-triage)
  contracts/                 # Sprint contracts (generated by /loop-plan)
  triage/                    # Analysis output (explore mode)
  archive/                   # Completed feature archives
```

## Design Principles

- **Fresh context per iteration**: no accumulated hallucination drift
- **Separate generation from evaluation**: external skepticism is easier to tune than self-criticism
- **Human judgment for planning, AI for execution**: interactive `/loop-plan`, autonomous loop
- **Structured handoffs via artifacts**: not conversation memory
- **No git revert on rejection**: next generator sees partial work + feedback (more signal)
- **Advisory scope budgets**: prompt-enforced limits on files read/written per iteration

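Scope budgets are advisory because they reach the agent only as text substituted into the prompt. A minimal Python sketch of that substitution, assuming the `{{NAME}}` placeholder style used in `lib/prompt.sh`, might look like this. The `TEMPLATE` sentence is invented for illustration and is not taken from the shipped prompts.

```python
def inject_variables(text, values):
    """Replace each {{NAME}} placeholder with its value; unknown placeholders are left as-is."""
    for name, value in values.items():
        text = text.replace("{{" + name + "}}", str(value))
    return text


# Hypothetical prompt fragment using placeholder names that lib/prompt.sh substitutes.
TEMPLATE = (
    "Mode: {{MODE}}. Read at most {{MAX_FILES_TO_READ}} files "
    "and modify at most {{MAX_FILES_TO_MODIFY}}."
)

if __name__ == "__main__":
    # Values taken from the "fix" budgets in config.json.example.
    print(inject_variables(TEMPLATE, {
        "MODE": "fix",
        "MAX_FILES_TO_READ": 30,
        "MAX_FILES_TO_MODIFY": 5,
    }))
```

Nothing in this sketch stops an agent from exceeding a limit; the numbers only shape the instructions it receives.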
## Credits

- [Geoffrey Huntley](https://ghuntley.com/ralph/): original Ralph pattern
- [Anthropic Engineering](https://www.anthropic.com/engineering/harness-design-long-running-apps): generator-evaluator harness design
26
config.json.example
Normal file
@@ -0,0 +1,26 @@
{
  "tool": "claude",
  "mode": "implement",
  "maxIterations": 20,
  "skipEval": false,
  "evalRetries": 2,
  "autoHooks": true,
  "branchPrefix": "loop/",
  "scopeBudgets": {
    "explore": {
      "maxFilesToRead": 15,
      "maxLinesToWrite": 0,
      "maxFilesToModify": 0
    },
    "implement": {
      "maxFilesToRead": 50,
      "maxLinesToWrite": 500,
      "maxFilesToModify": 10
    },
    "fix": {
      "maxFilesToRead": 30,
      "maxLinesToWrite": 200,
      "maxFilesToModify": 5
    }
  }
}
56
init.sh.example
Normal file
@@ -0,0 +1,56 @@
#!/bin/bash
# Project-specific initialization for the agent loop.
# Copy this to .loop/init.sh and customize for your project.
#
# This script runs at the start of each loop.sh invocation to ensure
# the development environment is ready. Keep it idempotent (safe to run multiple times).

set -euo pipefail

echo "[init] Setting up development environment..."

# --- Dependencies ---
# Uncomment and adapt for your project:

# Node.js
# if [ -f package.json ]; then
#   npm install --silent
# fi

# Python
# if [ -f requirements.txt ]; then
#   pip install -q -r requirements.txt
# fi

# Go
# if [ -f go.mod ]; then
#   go mod download
# fi

# Rust
# if [ -f Cargo.toml ]; then
#   cargo build --quiet
# fi

# --- Dev Server ---
# Start if not already running:

# if ! lsof -i :3000 &>/dev/null; then
#   npm run dev &
#   sleep 3
# fi

# --- Database ---
# Run migrations if needed:

# npm run migrate
# python manage.py migrate
# alembic upgrade head

# --- Verify ---
# Quick smoke test:

# npm run typecheck
# npm run test -- --run --silent

echo "[init] Environment ready."
108
install.sh
Executable file
@@ -0,0 +1,108 @@
#!/bin/bash
# Install Agent Loop globally for Claude Code.
#
# What this does:
#   1. Copies the harness to ~/.claude/loop/ (prompts, templates, lib, skills, loop.sh, examples)
#   2. Installs skills as Claude Code commands at ~/.claude/commands/
#
# After install, use /loop-init in any project to get started.
#
# Usage:
#   ./install.sh              # Install
#   ./install.sh --uninstall  # Remove

set -euo pipefail

CLAUDE_DIR="$HOME/.claude"
HARNESS_DIR="$CLAUDE_DIR/loop"
COMMANDS_DIR="$CLAUDE_DIR/commands"
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

SKILLS=(loop-init loop-plan loop-run loop-triage)

# --- Colors (if terminal supports them) ---
if [ -t 1 ]; then
  GREEN='\033[0;32m'
  YELLOW='\033[0;33m'
  RED='\033[0;31m'
  BOLD='\033[1m'
  RESET='\033[0m'
else
  GREEN='' YELLOW='' RED='' BOLD='' RESET=''
fi

info()  { echo -e "${GREEN}[loop]${RESET} $*"; }
warn()  { echo -e "${YELLOW}[loop]${RESET} $*"; }
# Errors go to stderr so they are not mixed into captured stdout.
error() { echo -e "${RED}[loop]${RESET} $*" >&2; }

# --- Uninstall ---
if [[ "${1:-}" == "--uninstall" ]]; then
  info "Uninstalling Agent Loop..."

  if [ -d "$HARNESS_DIR" ]; then
    rm -rf "$HARNESS_DIR"
    info "Removed $HARNESS_DIR"
  fi

  for skill in "${SKILLS[@]}"; do
    cmd="$COMMANDS_DIR/$skill.md"
    if [ -f "$cmd" ]; then
      rm -f "$cmd"
      info "Removed $cmd"
    fi
  done

  info "Done. Per-project .loop/ directories are untouched."
  exit 0
fi

# --- Install ---
info "Installing Agent Loop..."

# Ensure ~/.claude/ exists
mkdir -p "$CLAUDE_DIR"

# Copy harness (prompts, templates, lib, skills, loop.sh, config and init examples)
if [ -d "$HARNESS_DIR" ]; then
  warn "Updating existing install at $HARNESS_DIR"
  rm -rf "$HARNESS_DIR"
fi

mkdir -p "$HARNESS_DIR"
cp -r "$SCRIPT_DIR/prompts" "$HARNESS_DIR/"
cp -r "$SCRIPT_DIR/templates" "$HARNESS_DIR/"
cp -r "$SCRIPT_DIR/lib" "$HARNESS_DIR/"
cp -r "$SCRIPT_DIR/skills" "$HARNESS_DIR/"
cp "$SCRIPT_DIR/loop.sh" "$HARNESS_DIR/"
cp "$SCRIPT_DIR/config.json.example" "$HARNESS_DIR/"
cp "$SCRIPT_DIR/init.sh.example" "$HARNESS_DIR/"
chmod +x "$HARNESS_DIR/loop.sh"

info "Harness installed to $HARNESS_DIR"

# Install Claude Code commands
mkdir -p "$COMMANDS_DIR"

for skill in "${SKILLS[@]}"; do
  src="$HARNESS_DIR/skills/$skill/SKILL.md"
  dest="$COMMANDS_DIR/$skill.md"

  if [ -f "$src" ]; then
    cp "$src" "$dest"
    info "Installed /$skill command"
  else
    warn "Skill not found: $src (skipping)"
  fi
done

echo ""
info "${BOLD}Installation complete.${RESET}"
echo ""
echo "  Next steps (inside Claude Code, in any project):"
echo ""
echo "    /loop-init    # Set up the loop for your project"
echo "    /loop-plan    # Generate PRD and sprint contracts"
echo "    /loop-run     # Run the loop interactively"
echo ""
echo "  Or run headless: .loop/loop.sh"
echo ""
83
lib/archive.sh
Normal file
@@ -0,0 +1,83 @@
#!/bin/bash
# Branch archiving: archives previous run artifacts when the branch changes.
# Preserves prd.json, progress.md, and contracts from the previous feature.
#
# Design: At the end of each run, snapshot_for_archive saves current artifacts
# to .archive-staging/. On the next run, if the branch changed, check_archive
# moves the snapshot to archive/ and cleans up. This avoids archiving the
# WRONG artifacts (the new feature's) when prd.json has already been overwritten.

LAST_BRANCH_FILE="$LOOP_DIR/.last-branch"
STAGING_DIR="$LOOP_DIR/.archive-staging"

# Snapshot current artifacts so they can be archived later if the branch changes.
# Call this at the END of a successful run or before exit.
snapshot_for_archive() {
  rm -rf "$STAGING_DIR"
  mkdir -p "$STAGING_DIR"

  [ -f "$LOOP_DIR/prd.json" ] && cp "$LOOP_DIR/prd.json" "$STAGING_DIR/"
  [ -f "$LOOP_DIR/progress.md" ] && cp "$LOOP_DIR/progress.md" "$STAGING_DIR/"
  [ -d "$LOOP_DIR/contracts" ] && cp -r "$LOOP_DIR/contracts" "$STAGING_DIR/"

  # A missing artifact is not an error; without this, a failed test above would
  # become the function's exit status and could abort a caller running under `set -e`.
  return 0
}

# Check if we need to archive and do so if branch changed.
# Reads the NEW branch from live prd.json and the OLD branch from the staging
# snapshot (which was saved at the end of the previous run). This avoids the
# bug where both branches read from the same (already-overwritten) prd.json.
check_archive() {
  local current_branch
  current_branch=$(prd_branch_name 2>/dev/null)
  [ -z "$current_branch" ] && return

  # Determine the previous branch from the staging snapshot (most reliable)
  # or fall back to .last-branch file
  local last_branch=""
  if [ -f "$STAGING_DIR/prd.json" ]; then
    if command -v jq &>/dev/null; then
      last_branch=$(jq -r '.branchName // empty' "$STAGING_DIR/prd.json" 2>/dev/null)
    else
      last_branch=$(LOOP_PRD="$STAGING_DIR/prd.json" python3 -c "
import json, os
print(json.load(open(os.environ['LOOP_PRD'])).get('branchName', ''), end='')
" 2>/dev/null)
    fi
  fi
  [ -z "$last_branch" ] && [ -f "$LAST_BRANCH_FILE" ] && last_branch=$(cat "$LAST_BRANCH_FILE")

  if [ -n "$last_branch" ] && [ "$last_branch" != "$current_branch" ]; then
    archive_run "$last_branch"
  fi

  echo "$current_branch" > "$LAST_BRANCH_FILE"
}

# Archive the previous run's staged artifacts (NOT current prd.json)
archive_run() {
  local branch_name="$1"
  local feature_name
  feature_name=$(echo "$branch_name" | sed 's|.*/||')

  local archive_dir="$LOOP_DIR/archive/$(date +%Y-%m-%d)-${feature_name}"
  mkdir -p "$archive_dir"

  if [ -d "$STAGING_DIR" ]; then
    # Use the staged snapshot (correct artifacts from the previous run)
    cp -r "$STAGING_DIR"/* "$archive_dir/" 2>/dev/null || true
    rm -rf "$STAGING_DIR"
  else
    # Fallback: no snapshot exists (first run or upgrade from old version).
    # Current artifacts may belong to the new feature, so archive what we have
    # but warn the user.
    log "WARNING: No archive snapshot found. Archiving current artifacts (may be from new feature)."
    [ -f "$LOOP_DIR/prd.json" ] && cp "$LOOP_DIR/prd.json" "$archive_dir/"
    [ -f "$LOOP_DIR/progress.md" ] && cp "$LOOP_DIR/progress.md" "$archive_dir/"
    [ -d "$LOOP_DIR/contracts" ] && cp -r "$LOOP_DIR/contracts" "$archive_dir/"
  fi

  # Clean up old run's artifacts (progress.md and contracts, NOT prd.json, which belongs to the new feature)
  rm -f "$LOOP_DIR/progress.md"
  rm -rf "$LOOP_DIR/contracts"

  log "Archived previous run to $archive_dir"
}
19
lib/hooks.sh
Normal file
@@ -0,0 +1,19 @@
#!/bin/bash
# Stop hook management for Claude Code loop continuation.
#
# NOTE: Hooks are currently no-ops. The loop uses `claude --print` (non-interactive),
# which runs to completion and exits naturally, so no Stop hook is needed to signal
# iteration boundaries. The install/remove interface is preserved so that a future
# interactive mode can be added without changing loop.sh's call sites.
#
# If interactive mode is added, the hook mechanism will need redesign: `kill -INT $PPID`
# targets the hook runner's parent (Claude Code), not loop.sh. A sentinel-file or
# named-pipe approach would be more reliable.

install_hooks() {
  : # no-op (see note above)
}

remove_hooks() {
  : # no-op (see note above)
}
95
lib/prompt.sh
Normal file
@@ -0,0 +1,95 @@
#!/bin/bash
# Prompt assembly: composes the final prompt from base + mode overlay.
# Injects runtime variables (scope budgets, current story, iteration count).

# Build the complete prompt for a given agent role and mode.
# Usage: build_prompt "generator" "implement"
#        build_prompt "evaluator" "implement"
build_prompt() {
  local role="$1"   # generator | evaluator
  local mode="$2"   # implement | explore | fix

  local base_file="$LOOP_DIR/prompts/${role}/_base.md"
  local mode_file="$LOOP_DIR/prompts/${role}/${mode}.md"

  local prompt=""

  # Start with base prompt
  if [ -f "$base_file" ]; then
    prompt=$(cat "$base_file")
  else
    log "WARNING: Missing base prompt: $base_file"
    return 1
  fi

  # Append mode-specific overlay
  if [ -f "$mode_file" ]; then
    prompt="${prompt}

---

$(cat "$mode_file")"
  else
    log "WARNING: Missing mode prompt: $mode_file"
  fi

  # Inject runtime variables
  prompt=$(inject_variables "$prompt" "$mode")

  printf '%s\n' "$prompt"
}

# Replace template variables in prompt text
inject_variables() {
  local text="$1"
  local mode="$2"

  # Scope budgets from config
  local max_read max_write max_modify
  max_read=$(get_config_value ".scopeBudgets.${mode}.maxFilesToRead" "50")
  max_write=$(get_config_value ".scopeBudgets.${mode}.maxLinesToWrite" "500")
  max_modify=$(get_config_value ".scopeBudgets.${mode}.maxFilesToModify" "10")

  text="${text//\{\{MAX_FILES_TO_READ\}\}/$max_read}"
  text="${text//\{\{MAX_LINES_TO_WRITE\}\}/$max_write}"
  text="${text//\{\{MAX_FILES_TO_MODIFY\}\}/$max_modify}"
  text="${text//\{\{MODE\}\}/$mode}"
  text="${text//\{\{ITERATION\}\}/$ITERATION}"
  text="${text//\{\{MAX_ITERATIONS\}\}/$MAX_ITERATIONS}"
  text="${text//\{\{LOOP_DIR\}\}/$LOOP_DIR}"
  text="${text//\{\{PROJECT_ROOT\}\}/$PROJECT_ROOT}"
  text="${text//\{\{CURRENT_STORY_ID\}\}/${CURRENT_STORY_ID:-unknown}}"
  text="${text//\{\{PRE_GENERATOR_SHA\}\}/${PRE_GENERATOR_SHA:-HEAD~1}}"

  printf '%s\n' "$text"
}

# Read a value from config.json with a default fallback
get_config_value() {
  local path="$1"
  local default="$2"
  local config="$LOOP_DIR/config.json"

  [ -f "$config" ] || { echo "$default"; return; }

  if command -v jq &>/dev/null; then
    local val
    # Test for null explicitly: `$path // empty` would also discard a literal
    # `false`, so a boolean set to false would silently fall back to the default.
    val=$(jq -r "$path | if . == null then empty else . end" "$config" 2>/dev/null)
    echo "${val:-$default}"
  else
    LOOP_CONFIG="$config" LOOP_PATH="$path" LOOP_DEFAULT="$default" python3 -c "
import json, os
d = json.load(open(os.environ['LOOP_CONFIG']))
keys = os.environ['LOOP_PATH'].lstrip('.').split('.')
for k in keys:
    d = d.get(k) if isinstance(d, dict) else None
    if d is None:
        break
val = d if d is not None and d != {} else os.environ['LOOP_DEFAULT']
# Normalize Python booleans to lowercase for shell compatibility
if isinstance(val, bool):
    val = str(val).lower()
print(val, end='')
"
  fi
}
359
lib/state.sh
Normal file
@@ -0,0 +1,359 @@
#!/bin/bash
# State management for prd.json and progress.md.
# Provides functions to query story status, update pass/fail, and append progress.

# Requires: jq (preferred) or python3 (fallback)

# --- PRD Validation ---

validate_prd() {
  local prd="$LOOP_DIR/prd.json"
  [ -f "$prd" ] || return 0  # no prd.json is handled elsewhere

  if command -v jq &>/dev/null; then
    if ! jq -e '.userStories | type == "array" and length > 0' "$prd" >/dev/null 2>&1; then
      log "ERROR: prd.json is missing or has no userStories array"
      exit 1
    fi
  else
    # sys.exit(1) only ends the Python process, so exit the shell explicitly
    # to match the jq branch above.
    LOOP_PRD="$prd" python3 -c "
import json, sys, os
d = json.load(open(os.environ['LOOP_PRD']))
stories = d.get('userStories', [])
if not isinstance(stories, list) or len(stories) == 0:
    print('[loop] ERROR: prd.json is missing or has no userStories array', file=sys.stderr)
    sys.exit(1)
" || exit 1
  fi
}

# --- PRD Queries ---

# Get the ID of the highest-priority incomplete story (skips blocked stories)
next_story_id() {
  local prd="$LOOP_DIR/prd.json"
  [ -f "$prd" ] || return 1

  if command -v jq &>/dev/null; then
    jq -r '[.userStories[] | select(.passes == false and .blocked != true)] | sort_by(.priority // 999) | .[0].id // empty' "$prd"
  else
    LOOP_PRD="$prd" python3 -c "
import json, os
stories = json.load(open(os.environ['LOOP_PRD']))['userStories']
pending = sorted([s for s in stories if not s['passes'] and not s.get('blocked')], key=lambda s: s.get('priority', 999))
print(pending[0]['id'] if pending else '', end='')
"
  fi
}

# Check if all actionable stories are done (passed or blocked)
all_stories_pass() {
  local prd="$LOOP_DIR/prd.json"
  [ -f "$prd" ] || return 1

  if command -v jq &>/dev/null; then
    local actionable
    actionable=$(jq '[.userStories[] | select(.passes == false and .blocked != true)] | length' "$prd")
    [ "$actionable" -eq 0 ]
  else
    LOOP_PRD="$prd" python3 -c "
import json, sys, os
stories = json.load(open(os.environ['LOOP_PRD']))['userStories']
actionable = [s for s in stories if not s['passes'] and not s.get('blocked')]
sys.exit(0 if len(actionable) == 0 else 1)
"
  fi
}

# Check if any stories are blocked
any_stories_blocked() {
  local prd="$LOOP_DIR/prd.json"
  [ -f "$prd" ] || return 1

  if command -v jq &>/dev/null; then
    local blocked
    blocked=$(jq '[.userStories[] | select(.blocked == true)] | length' "$prd")
    [ "$blocked" -gt 0 ]
  else
    LOOP_PRD="$prd" python3 -c "
import json, sys, os
stories = json.load(open(os.environ['LOOP_PRD']))['userStories']
blocked = [s for s in stories if s.get('blocked')]
sys.exit(0 if len(blocked) > 0 else 1)
"
  fi
}

# Get total and completed story counts
story_counts() {
  local prd="$LOOP_DIR/prd.json"
  [ -f "$prd" ] || { echo "0/0"; return; }

  if command -v jq &>/dev/null; then
    local total passed
    total=$(jq '.userStories | length' "$prd")
    passed=$(jq '[.userStories[] | select(.passes == true)] | length' "$prd")
    echo "${passed}/${total}"
  else
    LOOP_PRD="$prd" python3 -c "
import json, os
stories = json.load(open(os.environ['LOOP_PRD']))['userStories']
passed = sum(1 for s in stories if s['passes'])
print(f'{passed}/{len(stories)}', end='')
"
  fi
}

# --- PRD Mutations ---

# Mark a story as passed
mark_story_pass() {
  local story_id="$1"
  local prd="$LOOP_DIR/prd.json"

  if command -v jq &>/dev/null; then
    local updated
    updated=$(jq --arg id "$story_id" \
      'if any(.userStories[]; .id == $id) then (.userStories[] | select(.id == $id)).passes = true else error("Story not found: \($id)") end' \
      "$prd" 2>&1) || { log "WARNING: mark_story_pass failed for '$story_id'"; return 1; }
    printf '%s\n' "$updated" > "${prd}.tmp" && mv "${prd}.tmp" "$prd"
  else
    LOOP_STORY_ID="$story_id" LOOP_PRD="$prd" python3 -c "
import json, pathlib, os, sys
p = pathlib.Path(os.environ['LOOP_PRD'])
d = json.loads(p.read_text())
story_id = os.environ['LOOP_STORY_ID']
found = False
for s in d['userStories']:
    if s['id'] == story_id:
        s['passes'] = True
        found = True
        break
if not found:
    print(f'[loop] WARNING: mark_story_pass failed for {story_id!r}', file=sys.stderr)
    sys.exit(1)
p.write_text(json.dumps(d, indent=2))
"
  fi
}

# Mark a story as failed with rejection reason
mark_story_reject() {
  local story_id="$1"
  local reason="$2"
  local prd="$LOOP_DIR/prd.json"

  if command -v jq &>/dev/null; then
    local updated
    updated=$(jq --arg id "$story_id" --arg reason "$reason" \
      'if any(.userStories[]; .id == $id) then (.userStories[] | select(.id == $id)) |= (.passes = false | .rejections = ((.rejections // 0) + 1) | .notes = ((.notes // "") + "\n[REJECTED] " + $reason)) else error("Story not found: \($id)") end' \
      "$prd" 2>&1) || { log "WARNING: mark_story_reject failed for '$story_id'"; return 1; }
    printf '%s\n' "$updated" > "${prd}.tmp" && mv "${prd}.tmp" "$prd"
  else
    # Pass reason via env var to avoid shell injection from evaluator output
    LOOP_STORY_ID="$story_id" LOOP_REASON="$reason" LOOP_PRD="$prd" python3 -c "
import json, pathlib, os
p = pathlib.Path(os.environ['LOOP_PRD'])
d = json.loads(p.read_text())
story_id = os.environ['LOOP_STORY_ID']
reason = os.environ['LOOP_REASON']
for s in d['userStories']:
    if s['id'] == story_id:
        s['passes'] = False
        s['rejections'] = s.get('rejections', 0) + 1
        s['notes'] = s.get('notes', '') + '\n[REJECTED] ' + reason
        break
p.write_text(json.dumps(d, indent=2))
"
  fi
}

# Get rejection count for a story
story_rejections() {
  local story_id="$1"
  local prd="$LOOP_DIR/prd.json"

  if command -v jq &>/dev/null; then
    jq -r --arg id "$story_id" \
      '.userStories[] | select(.id == $id) | .rejections // 0' "$prd"
  else
    LOOP_PRD="$prd" LOOP_STORY_ID="$story_id" python3 -c "
import json, os
stories = json.load(open(os.environ['LOOP_PRD']))['userStories']
story_id = os.environ['LOOP_STORY_ID']
for s in stories:
    if s['id'] == story_id:
        print(s.get('rejections', 0), end='')
        break
"
  fi
}

# Mark a story as blocked (needs human review, skip in future iterations)
mark_story_blocked() {
  local story_id="$1"
  local reason="$2"
  local prd="$LOOP_DIR/prd.json"

  if command -v jq &>/dev/null; then
    local updated
    updated=$(jq --arg id "$story_id" --arg reason "$reason" \
      'if any(.userStories[]; .id == $id) then (.userStories[] | select(.id == $id)) |= (.blocked = true | .notes = ((.notes // "") + "\n[BLOCKED] " + $reason)) else error("Story not found: \($id)") end' \
      "$prd" 2>&1) || { log "WARNING: mark_story_blocked failed for '$story_id'"; return 1; }
    printf '%s\n' "$updated" > "${prd}.tmp" && mv "${prd}.tmp" "$prd"
  else
    LOOP_STORY_ID="$story_id" LOOP_REASON="$reason" LOOP_PRD="$prd" python3 -c "
import json, pathlib, os
p = pathlib.Path(os.environ['LOOP_PRD'])
d = json.loads(p.read_text())
story_id = os.environ['LOOP_STORY_ID']
reason = os.environ['LOOP_REASON']
for s in d['userStories']:
    if s['id'] == story_id:
        s['blocked'] = True
        s['notes'] = s.get('notes', '') + '\n[BLOCKED] ' + reason
        break
p.write_text(json.dumps(d, indent=2))
"
  fi
}

# --- Progress ---
|
||||
|
||||
MAX_PROGRESS_ENTRIES=15
|
||||
|
||||
# Append a progress entry, rotating old entries to archive when limit is reached
|
||||
append_progress() {
|
||||
local entry="$1"
|
||||
local progress="$LOOP_DIR/progress.md"
|
||||
|
||||
if [ ! -f "$progress" ]; then
|
||||
cp "$LOOP_DIR/templates/progress.md.template" "$progress" 2>/dev/null || \
|
||||
printf "# Progress\n\n## Codebase Patterns\n\n---\n\n## Session Log\n" > "$progress"
|
||||
fi
|
||||
|
||||
printf "\n%s\n" "$entry" >> "$progress"
|
||||
|
||||
rotate_progress
|
||||
}
|
||||
|
||||
# Archive old session log entries to keep progress.md from growing unbounded.
|
||||
# Preserves the Codebase Patterns section and keeps only the last N entries.
|
||||
rotate_progress() {
|
||||
local progress="$LOOP_DIR/progress.md"
|
||||
[ -f "$progress" ] || return
|
||||
|
||||
# Count session entries by counting "### " headers after the Session Log marker.
|
||||
# Using headers instead of "---" separators avoids false positives from markdown
|
||||
# code blocks or horizontal rules inside entries.
|
||||
local entry_count
|
||||
local session_start
|
||||
session_start=$(grep -n '## Session Log' "$progress" | head -1 | cut -d: -f1)
|
||||
if [ -z "$session_start" ]; then
|
||||
return
|
||||
fi
|
||||
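# grep -c already prints 0 when nothing matches (and exits 1), so the fallback must not echo a second 0.
entry_count=$(tail -n +"$session_start" "$progress" | grep -c '^### ' 2>/dev/null || true)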
|
||||
|
||||
if [ "$entry_count" -le "$MAX_PROGRESS_ENTRIES" ]; then
|
||||
return
|
||||
fi
|
||||
|
||||
local archive="$LOOP_DIR/progress-archive.md"
|
||||
|
||||
if command -v python3 &>/dev/null; then
|
||||
LOOP_PROGRESS="$progress" LOOP_ARCHIVE="$archive" \
|
||||
LOOP_MAX_ENTRIES="$MAX_PROGRESS_ENTRIES" python3 -c "
|
||||
import pathlib, os
|
||||
|
||||
progress = pathlib.Path(os.environ['LOOP_PROGRESS'])
|
||||
archive = pathlib.Path(os.environ['LOOP_ARCHIVE'])
|
||||
max_entries = int(os.environ['LOOP_MAX_ENTRIES'])
|
||||
|
||||
text = progress.read_text()
|
||||
|
||||
# Split at 'Session Log' header
if '## Session Log' not in text:
    exit(0)

header, session_log = text.split('## Session Log', 1)

# Split entries on '### ' headers, matching the header count done in the shell.
# A '---' inside an entry (code block, horizontal rule) therefore cannot split it.
# parts[0] is the preamble between '## Session Log' and the first entry header.
parts = ['']
for line in session_log.splitlines(keepends=True):
    if line.startswith('### '):
        parts.append('')
    parts[-1] += line
preamble = parts[0]
entries = parts[1:]

if len(entries) <= max_entries:
    exit(0)

# Keep last max_entries, archive the rest
to_archive = entries[:-max_entries]
to_keep = entries[-max_entries:]

# Append archived entries (entries normally end with their own '---' separator)
existing_archive = archive.read_text() if archive.exists() else '# Progress Archive\n'
archive.write_text(existing_archive.rstrip('\n') + '\n\n' + ''.join(to_archive))

# Rewrite progress with header + preamble + kept entries
progress.write_text(header + '## Session Log' + preamble + ''.join(to_keep))
|
||||
"
|
||||
else
|
||||
# Bash fallback: rotate session log entries with archiving.
|
||||
# Uses grep to locate "### " entry headers for accurate counting
|
||||
# (avoids false positives from "---" separators inside entries).
|
||||
local session_start
|
||||
session_start=$(grep -n '## Session Log' "$progress" | head -1 | cut -d: -f1)
|
||||
[ -z "$session_start" ] && return
|
||||
|
||||
# Extract header (everything up to and including "## Session Log" line)
|
||||
local header_content
|
||||
header_content=$(head -n "$session_start" "$progress")
|
||||
|
||||
# Extract session content and split into entries by "### " headers
|
||||
local session_content
|
||||
session_content=$(tail -n +"$((session_start + 1))" "$progress")
|
||||
|
||||
# Count entries by "### " headers
|
||||
local entry_count
|
||||
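# grep -c already prints 0 when nothing matches (and exits 1), so the fallback must not echo a second 0.
entry_count=$(echo "$session_content" | grep -c '^### ' 2>/dev/null || true)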
|
||||
[ "$entry_count" -le "$MAX_PROGRESS_ENTRIES" ] && return
|
||||
|
||||
# Find the line number (within session_content) of the Nth-from-last "### " header
|
||||
local keep_from
|
||||
keep_from=$(echo "$session_content" | grep -n '^### ' | tail -n "$MAX_PROGRESS_ENTRIES" | head -1 | cut -d: -f1)
|
||||
[ -z "$keep_from" ] && return
|
||||
|
||||
# Archive older entries
|
||||
local to_archive
|
||||
to_archive=$(echo "$session_content" | head -n "$((keep_from - 1))")
|
||||
if [ -n "$to_archive" ]; then
|
||||
if [ -f "$archive" ]; then
|
||||
printf '\n%s' "$to_archive" >> "$archive"
|
||||
else
|
||||
printf '# Progress Archive\n\n%s\n' "$to_archive" > "$archive"
|
||||
fi
|
||||
fi
|
||||
|
||||
# Keep recent entries
|
||||
local kept_content
|
||||
kept_content=$(echo "$session_content" | tail -n +"$keep_from")
|
||||
printf '%s\n\n%s\n' "$header_content" "$kept_content" > "${progress}.tmp" \
|
||||
&& mv "${progress}.tmp" "$progress"
|
||||
fi
|
||||
}
|
||||
|
||||
# Get the branch name from prd.json
|
||||
prd_branch_name() {
|
||||
local prd="$LOOP_DIR/prd.json"
|
||||
[ -f "$prd" ] || return 1
|
||||
|
||||
if command -v jq &>/dev/null; then
|
||||
jq -r '.branchName // empty' "$prd"
|
||||
else
|
||||
LOOP_PRD="$prd" python3 -c "
|
||||
import json, os
|
||||
print(json.load(open(os.environ['LOOP_PRD'])).get('branchName', ''), end='')
|
||||
"
|
||||
fi
|
||||
}
|
||||
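The `rotate_progress` function above decides whether to rotate by counting `### ` headers after the `## Session Log` marker. The following is a minimal sketch of that counting step in isolation, assuming the `progress.md` layout used above. `count_session_entries` is a hypothetical helper name and is not part of the harness.

```shell
#!/bin/bash
# Hypothetical helper (not part of the harness): count "### " entry headers
# that appear after the "## Session Log" marker in a progress file.
count_session_entries() {
  local file="$1"
  # awk always prints exactly one integer, even when nothing matches,
  # so the caller can compare the result numerically without a fallback.
  awk '
    /^## Session Log/ { in_log = 1; next }
    in_log && /^### / { n++ }
    END { print n + 0 }
  ' "$file"
}

# Usage: build a small sample file and count its entries.
sample=$(mktemp)
printf '# Progress\n\n## Codebase Patterns\n\n---\n\n## Session Log\n\n### US-001\n\n---\n\n### US-002\n\n---\n' > "$sample"
count_session_entries "$sample"
rm -f "$sample"
```

Counting in a single `awk` pass avoids combining `grep -c` with an `echo` fallback, which can yield two lines of output when there are no matches.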
403
loop.sh
Executable file
@@ -0,0 +1,403 @@
|
||||
#!/bin/bash
|
||||
# Autonomous AI agent loop orchestrator
|
||||
# Combines generator-evaluator architecture with iterative context-reset pattern.
|
||||
#
|
||||
# Usage:
|
||||
# ./loop.sh [options]
|
||||
#
|
||||
# Options:
|
||||
# --mode <implement|explore|fix> Operating mode (default: from config.json)
|
||||
# --max <N> Maximum iterations (default: from config.json)
|
||||
# --skip-eval Skip evaluator pass
|
||||
# --tool <claude|amp> AI tool to use (default: from config.json)
|
||||
# --no-hooks Don't install stop hooks
|
||||
# --dry-run Print assembled prompts without running agents
|
||||
# --resume Skip already-passed stories (explicit mode)
|
||||
# --replan (reserved — not yet implemented)
|
||||
#
|
||||
# Each iteration:
|
||||
# 1. Generator: picks highest-priority incomplete story, does the work
|
||||
# 2. Evaluator: verifies the work, can PASS or REJECT
|
||||
# Both get fresh context windows. Loop continues until all stories pass or max iterations.
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
# --- Exit codes ---
|
||||
EXIT_OK=0 # All stories complete
|
||||
EXIT_ERROR=1 # Configuration or runtime error
|
||||
EXIT_MAX_ITERATIONS=2 # Max iterations reached, work remains
|
||||
EXIT_ALL_BLOCKED=3 # All remaining stories blocked for human review
|
||||
|
||||
# --- Resolve paths ---
|
||||
LOOP_DIR="$(cd "$(dirname "$0")" && pwd)"
|
||||
PROJECT_ROOT="$(cd "$LOOP_DIR/.." && pwd)"
|
||||
export LOOP_DIR PROJECT_ROOT
|
||||
|
||||
# --- Lockfile (prevent concurrent runs) ---
|
||||
LOCKFILE="$LOOP_DIR/.loop.lock"
|
||||
|
||||
acquire_lock() {
|
||||
# mkdir is atomic on POSIX — prevents race between check and create
|
||||
if ! mkdir "$LOCKFILE" 2>/dev/null; then
|
||||
local old_pid
|
||||
old_pid=$(cat "$LOCKFILE/pid" 2>/dev/null || true)  # a lock dir without a pid file must not abort under set -e
|
||||
if [ -n "$old_pid" ] && kill -0 "$old_pid" 2>/dev/null; then
|
||||
echo "[loop] ERROR: Another loop instance is running (PID $old_pid)."
|
||||
echo "[loop] If this is stale, remove $LOCKFILE and retry."
|
||||
exit 1
|
||||
fi
|
||||
# Stale lockfile — previous run crashed without cleanup
|
||||
rm -rf "$LOCKFILE"
|
||||
mkdir "$LOCKFILE"
|
||||
fi
|
||||
echo $$ > "$LOCKFILE/pid"
|
||||
}
|
||||
|
||||
release_lock() {
|
||||
rm -rf "$LOCKFILE"
|
||||
}
|
||||
|
||||
acquire_lock
|
||||
|
||||
# --- Source libraries ---
|
||||
source "$LOOP_DIR/lib/hooks.sh"
|
||||
source "$LOOP_DIR/lib/state.sh"
|
||||
source "$LOOP_DIR/lib/archive.sh"
|
||||
source "$LOOP_DIR/lib/prompt.sh"
|
||||
|
||||
# --- Logging ---
|
||||
log() { echo "[loop] $*"; }
|
||||
log_header() {
|
||||
echo ""
|
||||
echo "═══════════════════════════════════════════════════════"
|
||||
echo " $*"
|
||||
echo "═══════════════════════════════════════════════════════"
|
||||
echo ""
|
||||
}
|
||||
|
||||
# --- Preflight checks ---
|
||||
if ! command -v jq &>/dev/null && ! command -v python3 &>/dev/null; then
|
||||
log "ERROR: Either jq or python3 is required. Install one and retry."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# --- Load config defaults ---
|
||||
CONFIG_FILE="$LOOP_DIR/config.json"
|
||||
config_default() { get_config_value "$1" "$2"; }
|
||||
|
||||
TOOL=$(config_default ".tool" "claude")
|
||||
MODE=$(config_default ".mode" "implement")
|
||||
MAX_ITERATIONS=$(config_default ".maxIterations" "20")
|
||||
SKIP_EVAL=$(config_default ".skipEval" "false")
|
||||
EVAL_RETRIES=$(config_default ".evalRetries" "2")
|
||||
AUTO_HOOKS=$(config_default ".autoHooks" "true")
|
||||
DRY_RUN=false
|
||||
RESUME=false
|
||||
# --- Parse CLI args (override config) ---
|
||||
while [[ $# -gt 0 ]]; do
|
||||
case $1 in
|
||||
--mode) MODE="$2"; shift 2 ;;
|
||||
--mode=*) MODE="${1#*=}"; shift ;;
|
||||
--max) MAX_ITERATIONS="$2"; shift 2 ;;
|
||||
--max=*) MAX_ITERATIONS="${1#*=}"; shift ;;
|
||||
--skip-eval) SKIP_EVAL=true; shift ;;
|
||||
--tool) TOOL="$2"; shift 2 ;;
|
||||
--tool=*) TOOL="${1#*=}"; shift ;;
|
||||
--no-hooks) AUTO_HOOKS=false; shift ;;
|
||||
--dry-run) DRY_RUN=true; shift ;;
|
||||
--resume) RESUME=true; shift ;;
|
||||
--replan) log "ERROR: --replan is not yet implemented. Use /loop-plan interactively."; exit 1 ;;
|
||||
[0-9]*) MAX_ITERATIONS="$1"; shift ;;
|
||||
*) log "Unknown option: $1"; exit 1 ;;
|
||||
esac
|
||||
done
|
||||
|
||||
export ITERATION=0 MAX_ITERATIONS MODE
|
||||
|
||||
# --- Validate ---
|
||||
if [[ ! "$MODE" =~ ^(implement|explore|fix)$ ]]; then
|
||||
log "ERROR: Invalid mode '$MODE'. Must be: implement, explore, fix"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [[ ! "$TOOL" =~ ^(claude|amp)$ ]]; then
|
||||
log "ERROR: Invalid tool '$TOOL'. Must be: claude, amp"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# --- Setup ---
|
||||
cd "$PROJECT_ROOT"
|
||||
|
||||
cleanup() {
|
||||
[ -n "${LOOP_AGENT_TMPFILE:-}" ] && rm -f "$LOOP_AGENT_TMPFILE"
|
||||
[ "$AUTO_HOOKS" = true ] && remove_hooks
|
||||
release_lock
|
||||
}
|
||||
LOOP_AGENT_TMPFILE=""
|
||||
|
||||
if [ "$AUTO_HOOKS" = true ]; then
|
||||
install_hooks
|
||||
fi
|
||||
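trap cleanup EXIT
# Exit on signals so the EXIT trap cleans up once and the loop does not resume afterwards.
trap 'exit 130' INT
trap 'exit 143' TERM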
|
||||
|
||||
check_archive
|
||||
|
||||
# Validate prd.json exists (AFTER archive check, which may delete it on branch change)
|
||||
if [ ! -f "$LOOP_DIR/prd.json" ]; then
|
||||
log "ERROR: No prd.json found. Run /loop-plan first to create one."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
validate_prd
|
||||
|
||||
# Run project init script if it exists
|
||||
if [ -f "$LOOP_DIR/init.sh" ]; then
|
||||
log "Running init.sh..."
|
||||
bash "$LOOP_DIR/init.sh"
|
||||
fi
|
||||
|
||||
# Ensure correct git branch
|
||||
BRANCH=$(prd_branch_name 2>/dev/null || echo "")
|
||||
if [ -n "$BRANCH" ]; then
|
||||
CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "")
|
||||
if [ "$CURRENT_BRANCH" != "$BRANCH" ]; then
|
||||
log "Switching to branch: $BRANCH"
|
||||
git checkout "$BRANCH" 2>/dev/null || \
|
||||
git checkout -b "$BRANCH" "origin/$BRANCH" 2>/dev/null || \
|
||||
git checkout -b "$BRANCH"
|
||||
fi
|
||||
fi
|
||||
|
||||
# --- Agent runner ---
|
||||
# Runs a prompt through the selected AI tool and captures output.
|
||||
# Output is displayed live via tee to /dev/tty (if available) and captured to a temp file.
|
||||
# The function prints the captured output to stdout for the caller to capture.
|
||||
run_agent() {
|
||||
local prompt="$1"
|
||||
local output_file
|
||||
output_file=$(mktemp)
|
||||
LOOP_AGENT_TMPFILE="$output_file" # exposed for trap cleanup
|
||||
|
||||
# Determine whether we can display live output
|
||||
local has_tty=false
|
||||
if { true > /dev/tty; } 2>/dev/null; then
|
||||
has_tty=true
|
||||
fi
|
||||
|
||||
# Run in subshell so a non-zero exit from the AI tool doesn't kill the loop.
|
||||
# The subshell inherits set -e but its exit status is captured, not propagated.
|
||||
local agent_exit=0
|
||||
(
|
||||
case "$TOOL" in
|
||||
claude)
|
||||
if [ "$has_tty" = true ]; then
|
||||
printf '%s\n' "$prompt" | timeout "${LOOP_AGENT_TIMEOUT:-600}" \
|
||||
claude --dangerously-skip-permissions --output-format text \
|
||||
--print 2>&1 | tee /dev/tty > "$output_file"
|
||||
else
|
||||
printf '%s\n' "$prompt" | timeout "${LOOP_AGENT_TIMEOUT:-600}" \
|
||||
claude --dangerously-skip-permissions --output-format text \
|
||||
--print > "$output_file" 2>&1
|
||||
fi
|
||||
;;
|
||||
amp)
|
||||
if [ "$has_tty" = true ]; then
|
||||
printf '%s\n' "$prompt" | timeout "${LOOP_AGENT_TIMEOUT:-600}" \
|
||||
amp --dangerously-allow-all 2>&1 | tee /dev/tty > "$output_file"
|
||||
else
|
||||
printf '%s\n' "$prompt" | timeout "${LOOP_AGENT_TIMEOUT:-600}" \
|
||||
amp --dangerously-allow-all > "$output_file" 2>&1
|
||||
fi
|
||||
;;
|
||||
*)
|
||||
log "ERROR: Unknown tool '$TOOL'"
|
||||
exit 1
|
||||
;;
|
||||
esac
|
||||
) || agent_exit=$?
|
||||
|
||||
if [ "$agent_exit" -ne 0 ] && [ ! -s "$output_file" ]; then
|
||||
log "WARNING: Agent exited with code $agent_exit and produced no output."
|
||||
fi
|
||||
|
||||
cat "$output_file"
|
||||
rm -f "$output_file"
|
||||
LOOP_AGENT_TMPFILE=""
|
||||
}
|
||||
|
||||
# --- Parse evaluator verdict ---
|
||||
parse_verdict() {
|
||||
local output="$1"
|
||||
|
||||
if echo "$output" | grep -q "<verdict>REJECT</verdict>"; then
|
||||
# Extract rejection reason (supports multiline)
|
||||
local reason
|
||||
reason=$(echo "$output" | sed -n '/<rejection_reason>/,/<\/rejection_reason>/p' \
|
||||
| sed '1s/.*<rejection_reason>//' | sed '$s/<\/rejection_reason>.*//' \
|
||||
| tr '\n' ' ' | sed 's/ */ /g' | sed 's/^ //;s/ $//')
|
||||
[ -z "$reason" ] && reason="Rejected without specific reason"
|
||||
echo "REJECT:${reason}"
|
||||
elif echo "$output" | grep -q "<verdict>PASS</verdict>"; then
|
||||
echo "PASS"
|
||||
else
|
||||
# No explicit verdict — fail-safe: treat as reject so broken evaluators don't silently approve
|
||||
log "WARNING: No verdict tag found in evaluator output. Treating as REJECT (fail-safe)."
|
||||
echo "REJECT:Evaluator produced no verdict tag — output may be malformed"
|
||||
fi
|
||||
}
|
||||
|
||||
# --- Main loop ---
|
||||
log_header "Loop Starting"
|
||||
log "Mode: $MODE"
|
||||
log "Tool: $TOOL"
|
||||
log "Max iter: $MAX_ITERATIONS"
|
||||
log "Eval: $([[ $SKIP_EVAL == true ]] && echo 'off' || echo 'on')"
|
||||
log "Dry run: $([[ $DRY_RUN == true ]] && echo 'yes' || echo 'no')"
|
||||
log "Project: $PROJECT_ROOT"
|
||||
log "Stories: $(story_counts 2>/dev/null || echo 'N/A')"
|
||||
echo ""
|
||||
|
||||
while [ "$ITERATION" -lt "$MAX_ITERATIONS" ]; do
|
||||
ITERATION=$((ITERATION + 1))
|
||||
export ITERATION
|
||||
|
||||
# Check if all stories already pass
|
||||
if all_stories_pass 2>/dev/null; then
|
||||
log_header "All Stories Complete! ($(story_counts))"
|
||||
snapshot_for_archive
|
||||
exit 0
|
||||
fi
|
||||
|
||||
# Capture which story the generator will work on (highest-priority incomplete)
|
||||
CURRENT_STORY_ID=$(next_story_id 2>/dev/null || echo "")
|
||||
export CURRENT_STORY_ID
|
||||
|
||||
# No actionable story — all remaining are passed or blocked
|
||||
if [ -z "$CURRENT_STORY_ID" ]; then
|
||||
if [ "$RESUME" = true ]; then
|
||||
log "Resume mode: no actionable stories remaining."
|
||||
else
|
||||
log "No actionable stories remaining (all passed or blocked)."
|
||||
fi
|
||||
snapshot_for_archive
|
||||
if any_stories_blocked 2>/dev/null; then
|
||||
log "Some stories are blocked and need human review. Run /loop-triage for details."
|
||||
exit $EXIT_ALL_BLOCKED
|
||||
fi
|
||||
exit $EXIT_OK
|
||||
fi
|
||||
|
||||
# Capture git state before generator runs (for evaluator diff)
|
||||
PRE_GENERATOR_SHA=$(git rev-parse HEAD 2>/dev/null || echo "")
|
||||
export PRE_GENERATOR_SHA
|
||||
|
||||
# --- Generator pass ---
|
||||
log_header "Iteration $ITERATION / $MAX_ITERATIONS — GENERATOR${CURRENT_STORY_ID:+ ($CURRENT_STORY_ID)}"
|
||||
|
||||
GENERATOR_PROMPT=$(build_prompt "generator" "$MODE")
|
||||
|
||||
# --dry-run: print prompts and exit without running agents
|
||||
if [ "$DRY_RUN" = true ]; then
|
||||
log "=== GENERATOR PROMPT ==="
|
||||
printf '%s\n' "$GENERATOR_PROMPT"
|
||||
echo ""
|
||||
if [ "$SKIP_EVAL" != true ] && [ -n "$CURRENT_STORY_ID" ]; then
|
||||
EVAL_PROMPT=$(build_prompt "evaluator" "$MODE")
|
||||
log "=== EVALUATOR PROMPT ==="
|
||||
printf '%s\n' "$EVAL_PROMPT"
|
||||
fi
|
||||
log "Dry run complete. Showing prompts for story: ${CURRENT_STORY_ID:-unknown}"
|
||||
exit 0
|
||||
fi
|
||||
|
||||
GENERATOR_OUTPUT=$(run_agent "$GENERATOR_PROMPT")
|
||||
|
||||
if [ -z "$GENERATOR_OUTPUT" ]; then
|
||||
log "WARNING: Generator produced empty output (timeout or crash). Skipping to next iteration."
|
||||
continue
|
||||
fi
|
||||
|
||||
# --- Scope budget check ---
|
||||
# Verify the generator stayed within configured limits (files modified, lines written).
|
||||
# Advisory at this point: exceeding a budget only logs a warning. The counts are exported
# so the evaluator can treat an overrun as a rejection reason.
|
||||
if [ -n "$PRE_GENERATOR_SHA" ]; then
|
||||
SCOPE_FILES_MODIFIED=$(git diff --name-only "$PRE_GENERATOR_SHA" HEAD 2>/dev/null | wc -l | tr -d ' ')
|
||||
SCOPE_LINES_WRITTEN=$(git diff --stat "$PRE_GENERATOR_SHA" HEAD 2>/dev/null | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
|
||||
|
||||
MAX_MODIFY=$(config_default ".scopeBudgets.${MODE}.maxFilesToModify" "10")
|
||||
MAX_WRITE=$(config_default ".scopeBudgets.${MODE}.maxLinesToWrite" "500")
|
||||
|
||||
if [ "${SCOPE_FILES_MODIFIED:-0}" -gt "$MAX_MODIFY" ]; then
|
||||
log "WARNING: Scope budget exceeded — modified $SCOPE_FILES_MODIFIED files (limit: $MAX_MODIFY)"
|
||||
fi
|
||||
if [ "${SCOPE_LINES_WRITTEN:-0}" -gt "$MAX_WRITE" ]; then
|
||||
log "WARNING: Scope budget exceeded — wrote $SCOPE_LINES_WRITTEN lines (limit: $MAX_WRITE)"
|
||||
fi
|
||||
|
||||
export SCOPE_FILES_MODIFIED SCOPE_LINES_WRITTEN
|
||||
fi
|
||||
|
||||
# Check for completion sentinel
|
||||
if echo "$GENERATOR_OUTPUT" | grep -q "<promise>COMPLETE</promise>"; then
|
||||
log_header "Generator signaled COMPLETE ($(story_counts))"
|
||||
snapshot_for_archive
|
||||
exit 0
|
||||
fi
|
||||
|
||||
# --- Evaluator pass ---
|
||||
if [ "$SKIP_EVAL" != true ]; then
|
||||
log_header "Iteration $ITERATION / $MAX_ITERATIONS — EVALUATOR${CURRENT_STORY_ID:+ ($CURRENT_STORY_ID)}"
|
||||
|
||||
if [ -z "$CURRENT_STORY_ID" ]; then
|
||||
log "WARNING: No actionable story ID found. Skipping evaluator."
|
||||
continue
|
||||
fi
|
||||
|
||||
EVAL_PROMPT=$(build_prompt "evaluator" "$MODE")
|
||||
EVAL_OUTPUT=$(run_agent "$EVAL_PROMPT")
|
||||
|
||||
if [ -z "$EVAL_OUTPUT" ]; then
|
||||
log "WARNING: Evaluator produced empty output (timeout or crash). Treating as REJECT."
|
||||
EVAL_OUTPUT="<verdict>REJECT</verdict><rejection_reason>Evaluator produced no output</rejection_reason>"
|
||||
fi
|
||||
|
||||
VERDICT=$(parse_verdict "$EVAL_OUTPUT")
|
||||
|
||||
case "$VERDICT" in
|
||||
PASS)
|
||||
log "Evaluator: PASS"
|
||||
if [ -n "$CURRENT_STORY_ID" ]; then
|
||||
mark_story_pass "$CURRENT_STORY_ID"
|
||||
fi
|
||||
;;
|
||||
REJECT:*)
|
||||
REASON="${VERDICT#REJECT:}"
|
||||
log "Evaluator: REJECT — $REASON"
|
||||
|
||||
if [ -n "$CURRENT_STORY_ID" ]; then
|
||||
mark_story_reject "$CURRENT_STORY_ID" "$REASON"
|
||||
|
||||
# Check retry limit — block story to prevent infinite retries
|
||||
REJECTIONS=$(story_rejections "$CURRENT_STORY_ID")
|
||||
REJECTIONS="${REJECTIONS:-0}"
|
||||
if [ "$REJECTIONS" -ge "$EVAL_RETRIES" ]; then
|
||||
log "WARNING: Story $CURRENT_STORY_ID rejected $REJECTIONS times (limit: $EVAL_RETRIES). Blocking for human review."
|
||||
mark_story_blocked "$CURRENT_STORY_ID" "Rejected $REJECTIONS times. Last: $REASON"
|
||||
append_progress "### BLOCKED: $CURRENT_STORY_ID
|
||||
|
||||
Rejected $REJECTIONS times. Needs human review. Last reason: $REASON
|
||||
|
||||
---"
|
||||
fi
|
||||
fi
|
||||
;;
|
||||
esac
|
||||
fi
|
||||
done
|
||||
|
||||
# --- Max iterations reached ---
|
||||
log_header "Max Iterations Reached ($MAX_ITERATIONS)"
|
||||
log "Stories completed: $(story_counts)"
|
||||
log "Run /loop-triage to generate a handoff brief."
|
||||
snapshot_for_archive
|
||||
exit $EXIT_MAX_ITERATIONS
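The exit codes defined near the top of `loop.sh` let a calling script tell the possible outcomes apart. The following is a hedged sketch of how a wrapper might interpret them. `describe_loop_exit` is a hypothetical name, the messages are illustrative, and the `.loop/loop.sh` path in the usage comment is an assumption based on the paths used in the prompts.

```shell
#!/bin/bash
# Hypothetical wrapper helper: translate loop.sh exit codes into messages.
# The numeric values mirror EXIT_OK, EXIT_ERROR, EXIT_MAX_ITERATIONS and
# EXIT_ALL_BLOCKED as defined in loop.sh.
describe_loop_exit() {
  case "$1" in
    0) echo "all stories complete" ;;
    1) echo "configuration or runtime error" ;;
    2) echo "max iterations reached, work remains" ;;
    3) echo "remaining stories blocked for human review" ;;
    *) echo "unexpected exit code $1" ;;
  esac
}

# Usage (path is an assumption):
#   .loop/loop.sh --max 10; describe_loop_exit "$?"
describe_loop_exit 2
```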
|
||||
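`parse_verdict` in `loop.sh` checks for a REJECT tag before a PASS tag and treats a missing tag as a rejection. The same decision order can be sketched as a small standalone function. `classify_verdict` is a hypothetical name, and this sketch deliberately omits the rejection-reason extraction.

```shell
#!/bin/bash
# Hypothetical, simplified variant of parse_verdict: verdict only, no reason.
classify_verdict() {
  local output="$1"
  case "$output" in
    *"<verdict>REJECT</verdict>"*) echo "REJECT" ;;  # REJECT wins if both tags appear
    *"<verdict>PASS</verdict>"*)   echo "PASS" ;;
    *)                             echo "REJECT" ;;  # fail-safe: no tag means reject
  esac
}

# Usage with sample evaluator outputs.
classify_verdict "Looks fine. <verdict>PASS</verdict>"
classify_verdict "No tag at all"
```

Checking REJECT first means an evaluator response that quotes both tags is never counted as a pass.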
92
prompts/evaluator/_base.md
Normal file
@@ -0,0 +1,92 @@
|
||||
You are an Evaluator agent in an autonomous agent loop. Your job is to VERIFY work done by a Generator agent. You are skeptical by default.
|
||||
|
||||
## Bias Correction (READ THIS CAREFULLY)
|
||||
|
||||
You (Claude) have well-documented tendencies that make you a poor QA agent by default:
|
||||
- You **assume code works** if it looks reasonable
|
||||
- You **accept "close enough"** implementations
|
||||
- You **rationalize away** edge cases and missing pieces
|
||||
- You **prioritize politeness** over accuracy
|
||||
|
||||
**OVERRIDE ALL OF THESE.** Your value comes from finding problems. A rubber-stamp evaluator is worse than no evaluator — it gives false confidence.
|
||||
|
||||
**Rejection is normal and healthy.** Rejecting 30-50% of generator iterations is expected. If you're passing everything, you are not being skeptical enough.
|
||||
|
||||
## Your Target
|
||||
|
||||
Evaluate story **`{{CURRENT_STORY_ID}}`**. This is the story the generator just worked on.
|
||||
|
||||
## Evaluation Process
|
||||
|
||||
1. **Read `.loop/prd.json`** — find story `{{CURRENT_STORY_ID}}` and its acceptance criteria
|
||||
2. **Read the sprint contract** at `.loop/contracts/{{CURRENT_STORY_ID}}.contract.md` (if it exists)
|
||||
3. **Read `.loop/progress.md`** — check the latest session log entry for what the generator claims to have done
|
||||
4. **Examine the actual changes:**
|
||||
- Run `git diff {{PRE_GENERATOR_SHA}}..HEAD` to see ALL changes the generator made
|
||||
- Read the modified files IN FULL (not just the diff) to understand context
|
||||
5. **For EACH acceptance criterion in prd.json**, independently verify:
|
||||
- Does the code ACTUALLY satisfy this criterion?
|
||||
- Not "does it look like it might" — does it ACTUALLY?
|
||||
6. **Run quality checks yourself:**
|
||||
- Typecheck (if applicable)
|
||||
- Tests (if applicable)
|
||||
- Lint (if applicable)
|
||||
7. **Check for regressions:**
|
||||
- Did the changes break anything that was working before?
|
||||
- Did the generator modify files outside the story's scope?
|
||||
8. **Check for anti-patterns:**
|
||||
- Placeholder or stub implementations disguised as complete
|
||||
- Hardcoded values that should be configurable
|
||||
- Missing error handling at system boundaries
|
||||
- Security issues (hardcoded secrets, unsanitized input, SQL injection)
|
||||
|
||||
## Verdict Format
|
||||
|
||||
You MUST end your response with EXACTLY ONE of these verdict blocks:
|
||||
|
||||
### If the story genuinely passes all criteria:
|
||||
|
||||
```
|
||||
<verdict>PASS</verdict>
|
||||
```
|
||||
|
||||
### If any criterion is not met or issues are found:
|
||||
|
||||
```
|
||||
<verdict>REJECT</verdict>
|
||||
<rejection_reason>
|
||||
[Specific, actionable description of what failed and why.
|
||||
Include file paths and line numbers.
|
||||
Be concrete — "the function doesn't handle null input" not "there might be edge cases".]
|
||||
</rejection_reason>
|
||||
```
|
||||
|
||||
## What Warrants Rejection
|
||||
|
||||
- ANY acceptance criterion not actually met (not "mostly met" — MET)
|
||||
- Tests fail
|
||||
- Typecheck fails
|
||||
- Placeholder/stub code left in place
|
||||
- Security vulnerability introduced
|
||||
- Regression in existing functionality
|
||||
- Contract's Done Conditions not satisfied (if contract exists)
|
||||
|
||||
## What Does NOT Warrant Rejection
|
||||
|
||||
- Code style preferences (as long as it matches project conventions)
|
||||
- Minor naming choices
|
||||
- Missing optimization that wasn't in the criteria
|
||||
- Absence of features not in the story scope
|
||||
|
||||
## Scope Budget
|
||||
|
||||
- Maximum files to read: {{MAX_FILES_TO_READ}}
|
||||
- Focus your verification on the files the generator changed
|
||||
- You do NOT need to read the entire codebase
|
||||
|
||||
## Current State
|
||||
|
||||
- Iteration: {{ITERATION}} of {{MAX_ITERATIONS}}
|
||||
- Mode: {{MODE}}
|
||||
- Project root: {{PROJECT_ROOT}}
|
||||
- Loop directory: {{LOOP_DIR}}
|
||||
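The `{{NAME}}` placeholders in this prompt are filled in by the harness through `build_prompt`, whose source is not shown in this section. The following sketch shows one way such substitution could be done in bash. It is an assumption about the approach, not the harness's actual code, and `render_template` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical sketch: replace {{NAME}} placeholders with the values of the
# shell variables whose names are passed as arguments.
render_template() {
  local text="$1" name pattern value
  shift
  for name in "$@"; do
    pattern="{{${name}}}"
    value="${!name:-}"   # indirect expansion: value of the variable named by $name
    # Quoting the pattern and the replacement keeps both literal.
    text=${text//"$pattern"/"$value"}
  done
  printf '%s\n' "$text"
}

# Usage with two of the placeholders from the prompt above.
ITERATION=3
MAX_ITERATIONS=20
render_template 'Iteration: {{ITERATION}} of {{MAX_ITERATIONS}}' ITERATION MAX_ITERATIONS
```

Placeholders whose names are not passed in are left untouched, which makes a missing substitution easy to spot in a `--dry-run` prompt.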
49
prompts/evaluator/explore.md
Normal file
@@ -0,0 +1,49 @@
|
||||
# Mode: Explore — Evaluator
|
||||
|
||||
You are evaluating an analysis/exploration task. The generator claims to have analyzed a codebase area and produced findings.
|
||||
|
||||
## Read-Only Enforcement (CHECK FIRST)
|
||||
|
||||
Before any other checks, verify explore mode's read-only constraint:
|
||||
1. Run `git diff {{PRE_GENERATOR_SHA}}..HEAD --name-only`
|
||||
2. If ANY file outside `.loop/triage/` was modified or committed, **REJECT immediately** — explore mode is read-only. The generator must not modify host project files.
|
||||
|
||||
## Exploration-Specific Checks
|
||||
|
||||
1. **Read the analysis output** at `.loop/triage/{story-id}-analysis.md`
|
||||
2. **Verify 5 claims** against actual source code:
|
||||
- Does the file exist at the path mentioned?
|
||||
- Does the code behave as described?
|
||||
- Are the line counts roughly accurate?
|
||||
- Are the "Issues Found" real issues or false alarms?
|
||||
- Are the recommendations actionable?
|
||||
3. **Check for omissions:**
|
||||
- Did the generator miss obvious files in the area?
|
||||
- Are there important code paths not covered?
|
||||
- Are there recent git commits that change the analysis?
|
||||
|
||||
## Claim Verification Format
|
||||
|
||||
Before giving your verdict, document what you checked:
|
||||
|
||||
```
|
||||
Claims Verified:
|
||||
- [CONFIRMED] [claim] — verified in [file:line]
|
||||
- [INCORRECT] [claim] — actual behavior is [what you found]
|
||||
- [UNVERIFIABLE] [claim] — could not confirm (file missing, ambiguous)
|
||||
```
|
||||
|
||||
## Grading Criteria
|
||||
|
||||
- **Accuracy**: How many claims are correct? (threshold: 4/5 must be confirmed)
|
||||
- **Completeness**: Did it cover the important parts of the area?
|
||||
- **Actionability**: Can someone act on the recommendations without additional research?
|
||||
|
||||
## Rejection Criteria
|
||||
|
||||
Reject if:
|
||||
- Fewer than 4 of 5 verified claims are accurate
|
||||
- The analysis references files that don't exist
|
||||
- Key files in the area were completely missed
|
||||
- Recommendations are vague ("improve error handling") rather than specific ("add null check in auth.ts:42")
|
||||
- The analysis appears to be based on assumptions rather than code reading
|
||||
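The read-only check described in this prompt reduces to filtering the list of changed paths for anything outside `.loop/triage/`. The following is a minimal sketch of that filter. `list_readonly_violations` is a hypothetical name, and the sample paths are invented for illustration.

```shell
#!/bin/bash
# Hypothetical filter: read changed paths on stdin and print those that fall
# outside .loop/triage/. Empty output means the read-only constraint held.
list_readonly_violations() {
  local path
  while IFS= read -r path; do
    case "$path" in
      "")             ;;                        # skip blank lines
      .loop/triage/*) ;;                        # allowed output location
      *)              printf '%s\n' "$path" ;;  # anything else is a violation
    esac
  done
}

# Usage with sample paths; in practice the input would come from
#   git diff "$PRE_GENERATOR_SHA"..HEAD --name-only
printf '%s\n' '.loop/triage/US-003-analysis.md' 'src/auth.ts' | list_readonly_violations
```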
34
prompts/evaluator/fix.md
Normal file
@@ -0,0 +1,34 @@
|
||||
# Mode: Fix — Evaluator
|
||||
|
||||
You are evaluating a bug fix or tech debt reduction. The generator claims to have fixed an issue.
|
||||
|
||||
## Fix-Specific Checks
|
||||
|
||||
1. **Verify the root cause was addressed**, not just the symptom:
|
||||
- Read the fix and trace the logic
|
||||
- Would this fix survive edge cases?
|
||||
- Did the generator patch around the bug or fix the actual cause?
|
||||
|
||||
2. **Verify a regression test exists:**
|
||||
- Is there a new or updated test?
|
||||
- Does the test actually reproduce the original bug scenario?
|
||||
- Would the test fail if the fix were reverted?
|
||||
|
||||
3. **Check for regressions (CRITICAL for fix mode):**
|
||||
- Run the full test suite, not just the new test
|
||||
- Check that the fix doesn't change behavior for non-bug cases
|
||||
- Look for side effects in shared code paths
|
||||
|
||||
4. **Verify minimal diff:**
|
||||
- Did the generator change only what was necessary?
|
||||
- Are there unrelated changes mixed in?
|
||||
- Is the refactor scope proportional to the debt item?
|
||||
|
||||
## Rejection Criteria (Fix-Specific)
|
||||
|
||||
- Fix addresses symptom but not root cause
|
||||
- No regression test added
|
||||
- Existing tests fail after the fix
|
||||
- Unrelated changes included in the commit
|
||||
- Fix introduces a new bug or security issue
|
||||
- For refactors: external behavior changed (API contract, return values, side effects)
|
||||
31
prompts/evaluator/implement.md
Normal file
@@ -0,0 +1,31 @@
|
||||
# Mode: Implement — Evaluator
|
||||
|
||||
You are evaluating an implementation story. The generator claims to have built a feature.
|
||||
|
||||
## Implementation-Specific Checks
|
||||
|
||||
In addition to the base evaluation process:
|
||||
|
||||
1. **Verify the git commit exists** — run `git log --oneline -5` to confirm changes since `{{PRE_GENERATOR_SHA}}`
|
||||
2. **Check commit scope** — does `git diff {{PRE_GENERATOR_SHA}}..HEAD --name-only` only contain files relevant to this story?
|
||||
3. **Read the actual test output** — if the generator claims tests pass, verify by running them yourself
|
||||
4. **For UI stories:**
|
||||
- Check that the component actually renders (not just that it exists)
|
||||
- Verify event handlers are wired up (not just defined)
|
||||
- Check accessibility basics (labels, semantic elements)
|
||||
5. **For API stories:**
|
||||
- Verify the endpoint is registered in the router
|
||||
- Check request/response types match the contract
|
||||
- Verify error handling returns appropriate status codes
|
||||
6. **For database stories:**
|
||||
- Verify migration runs cleanly
|
||||
- Check indexes are created for query patterns
|
||||
- Verify foreign key constraints
|
||||
|
||||
## Common Generator Failures to Watch For
|
||||
|
||||
- Created the file but didn't wire it into the application (route not registered, component not imported)
|
||||
- Tests exist but don't actually assert meaningful behavior
|
||||
- "Passes typecheck" but only because types are `any` or too loose
|
||||
- UI component renders but doesn't respond to interaction
|
||||
- API endpoint exists but returns hardcoded/mock data
|
||||
68
prompts/generator/_base.md
Normal file
@@ -0,0 +1,68 @@
|
||||
You are a Generator agent in an autonomous agent loop. Each iteration you complete ONE task, then stop. A fresh instance of you runs each iteration — you have no memory of previous iterations except what's written in artifacts.
|
||||
|
||||
## Startup Sequence
|
||||
|
||||
1. **Read `.loop/progress.md`** — check the **Codebase Patterns** section first (top of file), then skim recent session log entries for context
|
||||
2. **Read `.loop/prd.json`** — find the highest-priority story where `passes: false`
|
||||
3. **Read the sprint contract** for that story at `.loop/contracts/{story-id}.contract.md` (if it exists)
|
||||
4. **Check the story's `notes` field** — if it contains `[REJECTED]` entries, those are feedback from a previous evaluator. Address the specific issues raised.
|
||||
5. **Confirm the git branch** — the loop has already checked you out on the correct branch per `prd.json.branchName`. Run `git branch --show-current` to verify if needed.
|
||||
|
||||
## Work Rules
|
||||
|
||||
- **ONE story per iteration.** Do not attempt multiple stories.
|
||||
- **Read before writing.** Understand existing code before modifying it. Search for existing implementations before creating new ones.
|
||||
- **Follow existing patterns.** Check Codebase Patterns in progress.md. Match the project's style, naming, and structure.
|
||||
- **No placeholders.** Every implementation must be complete and functional. If a story is too large, stop and note what remains — do NOT leave stub/placeholder code.
|
||||
- **Commit after completing the story.** Message format: `feat: [Story ID] - [Story Title]`
|
||||
|
||||
## Quality Gates
|
||||
|
||||
Before marking a story as complete:
|
||||
- Run the project's type checker (if applicable)
|
||||
- Run the project's test suite (if applicable)
|
||||
- Run the project's linter (if applicable)
|
||||
- All must pass. If they fail, fix the issues before committing.
|
||||
|
||||
## After Completing the Story
|
||||
|
||||
1. **Update `.loop/prd.json`** — set `passes: true` for the completed story (the harness also sets this on evaluator PASS as a safety net, but you should still do it)
|
||||
2. **Append to `.loop/progress.md`** with this format:
|
||||
|
||||
```
|
||||
### [Story ID] — [Story Title]
|
||||
Date: YYYY-MM-DD HH:MM
|
||||
|
||||
**What was done:**
|
||||
- Bullet points of changes made
|
||||
|
||||
**Files changed:**
|
||||
- path/to/file.ext — brief description
|
||||
|
||||
**Learnings for future iterations:**
|
||||
- Patterns discovered, gotchas encountered, useful context
|
||||
|
||||
---
|
||||
```
|
||||
|
||||
3. **Update Codebase Patterns** (top of progress.md) if you discovered a reusable pattern
|
||||
4. **Update AGENTS.md/CLAUDE.md** in modified directories if you discovered genuinely reusable knowledge (API conventions, non-obvious requirements, testing approaches)
|
||||
|
||||
## Completion Signal
|
||||
|
||||
- If ALL stories in prd.json have `passes: true`, respond with: `<promise>COMPLETE</promise>`
|
||||
- Otherwise, end your response normally. The next iteration will pick up the next story.
|
||||
|
||||
## Scope Budget
|
||||
|
||||
- Maximum files to read: {{MAX_FILES_TO_READ}}
|
||||
- Maximum lines to write: {{MAX_LINES_TO_WRITE}}
|
||||
- Maximum files to modify: {{MAX_FILES_TO_MODIFY}}
|
||||
- If you approach a limit, stop and note what remains in progress.md.
|
||||
|
||||
## Current State
|
||||
|
||||
- Iteration: {{ITERATION}} of {{MAX_ITERATIONS}}
|
||||
- Mode: {{MODE}}
|
||||
- Project root: {{PROJECT_ROOT}}
|
||||
- Loop directory: {{LOOP_DIR}}
|
||||
62
prompts/generator/explore.md
Normal file
@@ -0,0 +1,62 @@
|
||||
# Mode: Explore (Read-Only)
|
||||
|
||||
You are analyzing an existing codebase to build understanding. You are NOT writing code. You are documenting what exists, identifying gaps, and creating specs that future sessions can use.
|
||||
|
||||
## Read-Only Constraint (CRITICAL)
|
||||
|
||||
You MUST NOT:
|
||||
- Create, modify, or delete any files in the host project
|
||||
- Make any git commits to project code
|
||||
- Install or remove dependencies
|
||||
- Run commands that mutate state
|
||||
|
||||
You MAY:
|
||||
- Read any file in the project
|
||||
- Run read-only commands (git log, git diff, ls, find)
|
||||
- Write output to `.loop/triage/` directory only
|
||||
|
||||
## Exploration Workflow
|
||||
|
||||
1. Read the story from prd.json — it describes what area to analyze
|
||||
2. Read the relevant source code (not existing docs — verify against code)
|
||||
3. Write your findings to `.loop/triage/{story-id}-analysis.md`
|
||||
4. Mark the story as `passes: true` in prd.json
|
||||
5. Append to progress.md
|
||||
|
||||
## Analysis Output Format
|
||||
|
||||
Write to `.loop/triage/{story-id}-analysis.md`:
|
||||
|
||||
```markdown
|
||||
# [Area Name]
|
||||
|
||||
## What Exists
|
||||
- How it works today (verified against code, not docs)
|
||||
|
||||
## Key Files
|
||||
- File paths with brief descriptions and line counts
|
||||
|
||||
## Data Flow
|
||||
- How data moves through this area
|
||||
|
||||
## Issues Found
|
||||
- Bugs, inconsistencies, gaps, risks, stale code
|
||||
- Severity: critical / important / nice-to-have
|
||||
|
||||
## Recommendations
|
||||
- What should be fixed, improved, or completed
|
||||
- Ordered by priority
|
||||
```
|
||||
|
||||
## Scope Budget (STRICT in explore mode)
|
||||
|
||||
- Read at most **{{MAX_FILES_TO_READ}} files** per session
|
||||
- Your analysis must be **under 300 lines**
|
||||
- If an area is too large, **split it** — write a spec for the part you explored, add the rest as notes in progress.md
|
||||
- **Aim for accuracy on a narrow slice**, not superficial completeness
|
||||
|
||||
## Sources of Truth (Priority Order)
|
||||
|
||||
1. **The code itself** — always verify against source
|
||||
2. **Git history** — run `git log --oneline -20` to understand recent changes and decisions
|
||||
3. **Existing docs** — treat as potentially stale hints. Note contradictions in your analysis.
|
||||
26
prompts/generator/fix.md
Normal file
@@ -0,0 +1,26 @@
|
||||
# Mode: Fix
|
||||
|
||||
You are fixing bugs or reducing tech debt from a prioritized list. Each story is a targeted fix.
|
||||
|
||||
## Fix Workflow
|
||||
|
||||
1. Read the story — it describes the specific bug or debt item
|
||||
2. Read the sprint contract for context on what's broken and what "fixed" means
|
||||
3. **Understand the root cause before changing anything.** Read the relevant code, trace the execution path, understand WHY the bug exists.
|
||||
4. Make the minimal change to fix the issue
|
||||
5. Write or update a test that would have caught this bug
|
||||
6. Run quality gates
|
||||
7. Commit
|
||||
|
||||
## Constraints
|
||||
|
||||
- **Fix only what the story describes.** Do not fix adjacent issues, even if you notice them. Note them in progress.md for future iterations.
|
||||
- **Minimal diff.** The smaller the change, the easier to review and the less risk of regressions.
|
||||
- **Add a regression test.** Every bug fix should include a test that reproduces the bug and verifies the fix. If no test framework exists, note this in progress.md.
|
||||
- **Preserve behavior.** For tech debt refactors, the external behavior must not change. Only internal structure should improve.
|
||||
|
||||
## Git Workflow
|
||||
|
||||
- Commit message format: `fix: [Story ID] - [Story Title]`
|
||||
- For tech debt: `refactor: [Story ID] - [Story Title]`
|
||||
- Stage only the files you changed
|
||||
37
prompts/generator/implement.md
Normal file
@@ -0,0 +1,37 @@
|
||||
# Mode: Implement
|
||||
|
||||
You are building features from a PRD. Each story is a small, self-contained unit of work.
|
||||
|
||||
## Implementation Workflow
|
||||
|
||||
1. Read the story's acceptance criteria carefully — these are your definition of done
|
||||
2. If a sprint contract exists, follow its **Done Conditions** exactly
|
||||
3. Plan your approach before writing code:
   - What files need to change?
   - What existing code can you reuse?
   - What's the minimal change to satisfy the criteria?
|
||||
4. Implement the story
|
||||
5. Run quality gates (typecheck, lint, test)
|
||||
6. Commit with a descriptive message
|
||||
7. Mark the story as passed
|
||||
|
||||
## Constraints
|
||||
|
||||
- **Minimal changes only.** Do not refactor surrounding code. Do not add features beyond the story scope.
|
||||
- **Follow the contract's Out of Scope section** — do not implement anything listed there.
|
||||
- **If tests don't exist yet,** write them as part of the story, unless the PRD explicitly assigns that testing to a separate story.
|
||||
- **If you need a dependency,** install it and note it in progress.md so future iterations know.
|
||||
|
||||
## Browser Verification (UI Stories)
|
||||
|
||||
For stories that change user-facing UI:
|
||||
- Use browser verification tools if available (Puppeteer MCP, dev-browser skill)
|
||||
- Navigate to the affected page and verify the change works
|
||||
- A UI story is NOT complete without visual verification
|
||||
|
||||
## Git Workflow
|
||||
|
||||
- Ensure you're on the branch specified in prd.json
|
||||
- Stage only the files you changed (not `git add .`)
|
||||
- Commit message: `feat: [Story ID] - [Story Title]`
|
||||
- Do NOT push — the loop handles that
|
||||
42
prompts/planner/plan.md
Normal file
@@ -0,0 +1,42 @@
|
||||
# Planner Context
|
||||
|
||||
This file is loaded by the `/agent-loop:plan` skill to provide additional context for PRD generation.
|
||||
|
||||
## Story Decomposition Guidelines
|
||||
|
||||
When breaking a feature into stories, think about:
|
||||
|
||||
### Independence
|
||||
Each story should be independently deployable. After completing story N, the codebase should be in a valid, working state — even if the feature isn't fully built yet.
|
||||
|
||||
### Context Window Fit
|
||||
A story must fit in a single AI context window (~100K tokens). That window has to cover all of the following:
|
||||
- Reading relevant existing code
|
||||
- Understanding the task
|
||||
- Implementing the change
|
||||
- Writing tests
|
||||
- Running quality checks
|
||||
- Committing
|
||||
|
||||
Budget roughly:
|
||||
- 30% of context for reading/understanding
|
||||
- 40% for implementation
|
||||
- 20% for testing and quality
|
||||
- 10% for bookkeeping (prd.json, progress.md)
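To make the split concrete, here is a small sketch that turns the percentages above into token counts. The 100,000-token window is the guideline's rough figure rather than a measured limit, and `token_budget` is a hypothetical helper.

```python
# Percentages come from the guideline above; they are rough targets.
BUDGET_SHARES = {
    "reading": 30,
    "implementation": 40,
    "testing": 20,
    "bookkeeping": 10,
}


def token_budget(window_tokens: int = 100_000) -> dict[str, int]:
    """Split a context window into per-activity token budgets."""
    return {name: window_tokens * pct // 100 for name, pct in BUDGET_SHARES.items()}


print(token_budget())
# {'reading': 30000, 'implementation': 40000, 'testing': 20000, 'bookkeeping': 10000}
```

A story whose reading phase alone is likely to exceed the first figure is a candidate for splitting.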
|
||||
|
||||
### Failure Isolation
|
||||
If a story fails (evaluator rejects it), the next iteration should be able to retry it cleanly. Stories with too many moving parts are hard to retry because partial state is messy.
|
||||
|
||||
### Evaluability
|
||||
Every story must have criteria the evaluator can independently verify. "The code is clean" is not evaluable. "The function returns 404 when the user doesn't exist" is evaluable.
|
||||
|
||||
## PRD Anti-Patterns
|
||||
|
||||
Avoid these common mistakes:
|
||||
|
||||
- **Stories too large:** "Build the API" — split into individual endpoints
|
||||
- **Stories too small:** "Create the file" — combine with meaningful work in that file
|
||||
- **Vague criteria:** "Works correctly" — what does correctly mean? Be specific.
|
||||
- **Missing dependencies:** Story 5 needs Story 3's database table but doesn't say so
|
||||
- **Testing as afterthought:** Tests should be part of each story, not a separate "add tests" story at the end
|
||||
- **UI without backend:** A UI story that calls an API that doesn't exist yet
|
||||
141
skills/loop-init/SKILL.md
Normal file
@@ -0,0 +1,141 @@
|
||||
---
|
||||
name: init
|
||||
description: Initialize the agent loop harness in the current project. Scaffolds .loop/ directory, detects tech stack, picks mode, generates config, and flows into planning.
|
||||
---
|
||||
|
||||
# /init — Initialize Agent Loop for a Project
|
||||
|
||||
Set up the agent loop harness in the current project. This is the entry point for first-time use.
|
||||
|
||||
## What This Skill Does
|
||||
|
||||
1. Scaffolds the `.loop/` directory with prompts, templates, and lib scripts from the plugin
|
||||
2. Analyzes the project to understand its tech stack, structure, and conventions
|
||||
3. Asks the user what they want to accomplish (explore, implement, or fix)
|
||||
4. Creates project-specific configuration (`config.json`, `init.sh`)
|
||||
5. Flows into planning to generate the PRD and sprint contracts
|
||||
|
||||
## Instructions
|
||||
|
||||
When the user invokes this skill, follow this sequence:
|
||||
|
||||
### Step 0: Scaffold .loop/ Directory
|
||||
|
||||
Check if `.loop/` already exists in the project root.
|
||||
|
||||
**If it does NOT exist**, create it by copying from the plugin:
|
||||
|
||||
1. The plugin's root directory is available at `${CLAUDE_PLUGIN_ROOT}`. Copy the harness files:
|
||||
|
||||
```bash
# Copy the bundled harness files from the plugin into the project.
mkdir -p .loop
cp -r "${CLAUDE_PLUGIN_ROOT}/prompts" .loop/
cp -r "${CLAUDE_PLUGIN_ROOT}/templates" .loop/
cp -r "${CLAUDE_PLUGIN_ROOT}/lib" .loop/
cp "${CLAUDE_PLUGIN_ROOT}/loop.sh" .loop/
# The headless runner must be executable.
chmod +x .loop/loop.sh
```
|
||||
|
||||
**IMPORTANT:** If `${CLAUDE_PLUGIN_ROOT}` is not set or the path doesn't exist, look for the files in the plugin's own directory structure. The prompts, templates, and lib directories are bundled with this plugin.
|
||||
|
||||
2. Create `.loop/.gitignore` with runtime artifacts:
|
||||
|
||||
```
prd.json
progress.md
progress-archive.md
config.json
init.sh
contracts/
triage/
archive/
.archive-staging/
.last-branch
.loop.lock
```
|
||||
|
||||
**If `.loop/` already exists**, ask the user if they want to re-initialize (which resets config but preserves prd.json/progress.md if they exist).
|
||||
|
||||
### Step 1: Project Discovery
|
||||
|
||||
Read the project to understand what we're working with:
|
||||
- Check for `CLAUDE.md`, `AGENTS.md`, `README.md` at the project root
|
||||
- Check for `package.json`, `Cargo.toml`, `pyproject.toml`, `go.mod`, `Package.swift`, `composer.json` to identify the tech stack
|
||||
- Run `ls` on the project root to see the top-level structure
|
||||
|
||||
Present a brief summary:
|
||||
> "I see this is a [language/framework] project with [key characteristics]. The main source is in [dir/]."
|
||||
|
||||
### Step 2: Mode Selection
|
||||
|
||||
Ask the user:
|
||||
|
||||
> **What would you like to do?**
|
||||
>
|
||||
> a) **Explore** — Analyze the codebase to understand what exists, find issues, and document the system. No code changes.
|
||||
> b) **Implement** — Build a new feature from a PRD. Code changes, commits, and tests.
|
||||
> c) **Fix** — Work through a list of bugs or tech debt items. Targeted code changes.
|
||||
|
||||
### Step 3: Clarifying Questions
|
||||
|
||||
Based on the mode, ask 3-5 questions:
|
||||
|
||||
**For Explore:**
|
||||
- "What areas are you most interested in? (e.g., auth, database, API, frontend, everything)"
|
||||
- "Are there known problem areas you want me to focus on?"
|
||||
- "How many exploration sessions should I budget? (default: 20)"
|
||||
|
||||
**For Implement:**
|
||||
- "Describe the feature you want to build (1-3 sentences is fine)"
|
||||
- "Are there any architectural constraints I should know about?"
|
||||
- "Should I follow any specific patterns from the existing codebase?"
|
||||
|
||||
**For Fix:**
|
||||
- "Do you have a list of issues, or should I find them?"
|
||||
- "Any areas that are off-limits for changes?"
|
||||
- "What's the priority: security, stability, or code quality?"
|
||||
|
||||
### Step 4: Generate Configuration
|
||||
|
||||
Create `.loop/config.json` based on the project and user's answers:
|
||||
|
||||
```json
{
  "tool": "claude",
  "mode": "<selected mode>",
  "maxIterations": <appropriate default>,
  "skipEval": false,
  "evalRetries": 2,
  "autoHooks": true,
  "branchPrefix": "loop/",
  "scopeBudgets": {
    // Set based on project size and mode
  }
}
```
|
||||
|
||||
Create `.loop/init.sh` with project-specific setup commands:
|
||||
- Dev server startup (if applicable)
|
||||
- Test runner command
|
||||
- Type checker command
|
||||
- Linter command
|
||||
- Any environment setup needed
|
||||
|
||||
Make `init.sh` executable.
|
||||
|
||||
### Step 5: Flow into Planning
|
||||
|
||||
Tell the user:
|
||||
> "Project configured. Now let's plan the work."
|
||||
|
||||
Then invoke the `/agent-loop:plan` skill to generate the PRD and sprint contracts.
|
||||
|
||||
### Step 6: Ready to Run
|
||||
|
||||
Once planning is complete, tell the user:
|
||||
> "Everything is set up. To start the loop:"
|
||||
> ```
|
||||
> /agent-loop:run # Interactive (recommended) — visible, can intervene
|
||||
> .loop/loop.sh # Headless — fully autonomous
|
||||
> ```
|
||||
> You can monitor progress in `.loop/progress.md` and check story status in `.loop/prd.json`.
|
||||
188
skills/loop-plan/SKILL.md
Normal file
@@ -0,0 +1,188 @@
|
||||
---
|
||||
name: plan
|
||||
description: Interactive planning session that generates PRD (prd.json) and sprint contracts for the agent loop. Run /agent-loop:init first.
|
||||
---
|
||||
|
||||
# /plan — Generate PRD and Sprint Contracts
|
||||
|
||||
Interactive planning session that produces all artifacts needed for the autonomous agent loop.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- `.loop/` directory must exist with `config.json` (run `/agent-loop:init` first if not)
|
||||
- User should have a clear idea of what they want to build/explore/fix
|
||||
|
||||
## Usage
|
||||
|
||||
```
|
||||
/agent-loop:plan <optional feature description>
|
||||
```
|
||||
|
||||
Examples:
|
||||
- `/agent-loop:plan Add OAuth authentication with Google and GitHub`
- `/agent-loop:plan Explore the payment processing system`
- `/agent-loop:plan Fix all critical security issues from the audit`
|
||||
|
||||
## Instructions
|
||||
|
||||
### Step 1: Understand the Request
|
||||
|
||||
If the user provided a feature description, use it. Otherwise ask:
|
||||
> "What would you like to work on? Describe it in 1-3 sentences."
|
||||
|
||||
### Step 2: Codebase Analysis
|
||||
|
||||
Read key project files to understand existing patterns:
|
||||
- Relevant source directories for the feature
|
||||
- Existing tests to understand testing patterns
|
||||
- Configuration files for conventions
|
||||
- Recent git history (`git log --oneline -20`) for active work
|
||||
|
||||
### Step 3: Clarifying Questions
|
||||
|
||||
Ask 3-5 targeted questions based on what you found in the code. These should be questions where the answer isn't obvious from the codebase. Examples:
|
||||
|
||||
- "I see you have both REST endpoints and GraphQL. Should this feature use REST or GraphQL?"
|
||||
- "The existing auth uses JWT. Should I add OAuth alongside it or replace it?"
|
||||
- "I found two competing patterns for data validation. Which should I follow?"
|
||||
|
||||
**Do NOT ask questions you can answer from the code.** Only ask when human judgment is needed.
|
||||
|
||||
### Step 4: Generate PRD (`prd.json`)
|
||||
|
||||
Create `.loop/prd.json` with properly-sized, dependency-ordered stories.
|
||||
|
||||
**Story Sizing Rules (CRITICAL):**
|
||||
- Each story must be completable in ONE context window (~100K tokens of work)
|
||||
- Target: 1-3 files changed per story
|
||||
- Too big: "Build the authentication system" → split into migration, endpoint, middleware, UI, tests
|
||||
- Too small: "Add import statement" → combine with the story that needs it
|
||||
|
||||
**Dependency Ordering:**
|
||||
1. Schema/database changes first (they block everything)
|
||||
2. Backend logic (depends on schema)
|
||||
3. Frontend components (depend on backend)
|
||||
4. Integration/wiring (depends on components)
|
||||
5. Polish/edge cases (depends on core being done)
|
||||
|
||||
**Required Fields Per Story:**
|
||||
```json
{
  "id": "US-001",
  "title": "Short descriptive title",
  "description": "As a [role], I want [feature] so that [benefit].",
  "acceptanceCriteria": [
    "Specific, verifiable criterion",
    "Another criterion",
    "Typecheck passes"
  ],
  "priority": 1,
  "passes": false,
  "notes": "",
  "rejections": 0
}
```
|
||||
|
||||
**Acceptance Criteria Rules:**
|
||||
- Every criterion must be independently verifiable (not "works well" — "returns 200 with valid token")
|
||||
- Always include "Typecheck passes" (or equivalent for the language)
|
||||
- UI stories must include "Verify UI renders and responds to interaction"
|
||||
- API stories must include status code expectations
|
||||
- Database stories must include migration success check
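The field list and criteria rules above lend themselves to a mechanical check before the loop starts. The following is a minimal sketch, assuming each story is a plain dict shaped like the JSON example; `story_problems` is a hypothetical helper, and matching on the word "typecheck" is a simplification of the "or equivalent for the language" rule.

```python
REQUIRED_FIELDS = (
    "id", "title", "description", "acceptanceCriteria",
    "priority", "passes", "notes", "rejections",
)


def story_problems(story: dict) -> list[str]:
    """Return human-readable problems; an empty list means the story looks well formed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in story]
    criteria = story.get("acceptanceCriteria") or []
    if not criteria:
        problems.append("no acceptance criteria")
    # Simplification: only the literal word "typecheck" is recognized here.
    elif not any("typecheck" in criterion.lower() for criterion in criteria):
        problems.append("no typecheck criterion")
    return problems
```

A check like this cannot judge whether a criterion is truly verifiable; it only catches structural omissions.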
|
||||
|
||||
### Step 5: Generate Sprint Contracts
|
||||
|
||||
For each story, create `.loop/contracts/{story-id}.contract.md`:
|
||||
|
||||
```markdown
|
||||
# Sprint Contract: {Story ID} — {Story Title}
|
||||
|
||||
## What Will Be Built
|
||||
Concrete description of the deliverable. Not the user story — the actual thing being built.
|
||||
|
||||
## Done Conditions
|
||||
- [ ] Condition 1 (specific, testable)
|
||||
- [ ] Condition 2
|
||||
- [ ] All acceptance criteria from prd.json met
|
||||
|
||||
## Evaluation Criteria
|
||||
What the evaluator will specifically check:
|
||||
- [ ] Check 1
|
||||
- [ ] Check 2
|
||||
- [ ] No regressions in [specific area]
|
||||
|
||||
## Out of Scope
|
||||
Things explicitly NOT part of this story:
|
||||
- Thing 1
|
||||
- Thing 2
|
||||
|
||||
## Key Files
|
||||
Files likely to be created or modified:
|
||||
- path/to/file.ext — what changes
|
||||
- path/to/other.ext — what changes
|
||||
|
||||
## Dependencies
|
||||
- Depends on: [story IDs that must be done first, or "none"]
|
||||
- Blocks: [story IDs that depend on this one, or "none"]
|
||||
```
|
||||
|
||||
### Step 6: Initialize Progress File
|
||||
|
||||
Create `.loop/progress.md` from the template with an initial Codebase Patterns section populated from what you learned during analysis:
|
||||
|
||||
```markdown
|
||||
# Progress
|
||||
|
||||
## Codebase Patterns
|
||||
|
||||
- [Pattern you discovered during analysis]
|
||||
- [Convention you noticed]
|
||||
- [Testing approach used in the project]
|
||||
|
||||
---
|
||||
|
||||
## Session Log
|
||||
|
||||
### Planning Session
|
||||
Date: YYYY-MM-DD HH:MM
|
||||
|
||||
**PRD created:** {N} stories for "{feature description}"
|
||||
**Estimated iterations:** {N stories + ~30% for evaluator rejections}
|
||||
**Key decisions:**
|
||||
- [Decision 1 and why]
|
||||
- [Decision 2 and why]
|
||||
|
||||
---
|
||||
```
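The "Estimated iterations" figure in the template is simple arithmetic. One way to compute it is sketched below; rounding the 30% headroom up is an assumption, since the template only says "~30%", and `estimated_iterations` is a hypothetical name.

```python
def estimated_iterations(story_count: int) -> int:
    """Stories plus roughly 30% headroom for evaluator rejections, rounded up."""
    # Integer arithmetic keeps the ceiling exact without floating-point rounding.
    headroom = (3 * story_count + 9) // 10  # ceil(0.3 * story_count)
    return story_count + headroom
```

For example, a 10-story PRD would be budgeted at 13 iterations under this rounding.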
|
||||
|
||||
### Step 7: Present Summary
|
||||
|
||||
Show the user a summary:
|
||||
|
||||
> **Plan Ready**
|
||||
>
|
||||
> | Stories | Est. Iterations | Mode | Branch |
> |---------|-----------------|------|--------|
> | {N} | {N+30%} | {mode} | {branchName} |
|
||||
>
|
||||
> **Story Overview:**
|
||||
> 1. US-001: {title} (priority 1)
|
||||
> 2. US-002: {title} (priority 2)
|
||||
> ...
|
||||
>
|
||||
> Review the stories in `.loop/prd.json` and contracts in `.loop/contracts/`.
|
||||
> Adjust anything you'd like, then run:
|
||||
> ```
|
||||
> /agent-loop:run # Interactive (recommended)
|
||||
> .loop/loop.sh # Headless
|
||||
> ```
|
||||
|
||||
### Step 8: Wait for Feedback
|
||||
|
||||
Let the user review and adjust. They might:
|
||||
- Ask to split a story further
|
||||
- Ask to reorder priorities
|
||||
- Ask to add/remove stories
|
||||
- Ask to change acceptance criteria
|
||||
|
||||
Make the requested changes, then re-present the summary.
|
||||
203
skills/loop-run/SKILL.md
Normal file
@@ -0,0 +1,203 @@
|
||||
---
|
||||
name: run
|
||||
description: Execute the generator-evaluator loop interactively inside Claude Code. Dispatches subagents with full visibility and intervention capability. Run /agent-loop:init and /agent-loop:plan first.
|
||||
---
|
||||
|
||||
# /run — Execute Agent Loop Inside Claude Code
|
||||
|
||||
Run the generator-evaluator loop natively in Claude Code using subagents. Unlike `loop.sh` (headless), this gives you full visibility into each agent's work and the ability to intervene at any point.
|
||||
|
||||
## Usage
|
||||
|
||||
```
|
||||
/agent-loop:run # Run until all stories pass or max iterations
|
||||
/agent-loop:run 3 # Run at most 3 iterations
|
||||
/agent-loop:run --skip-eval # Skip evaluator (generator marks stories done)
|
||||
/agent-loop:run --story US-003 # Run only a specific story
|
||||
```
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- `.loop/config.json` exists (run `/agent-loop:init` first)
|
||||
- `.loop/prd.json` exists with stories (run `/agent-loop:plan` first)
|
||||
|
||||
## Instructions
|
||||
|
||||
When the user invokes `/agent-loop:run`, follow this orchestration sequence exactly.
|
||||
|
||||
### Step 0: Parse Arguments
|
||||
|
||||
- If a number is provided, use it as max iterations. Otherwise read `maxIterations` from `.loop/config.json`.
|
||||
- If `--skip-eval` is provided, skip the evaluator pass.
|
||||
- If `--story <ID>` is provided, only work on that specific story.
|
||||
|
||||
### Step 1: Load State
|
||||
|
||||
1. Read `.loop/config.json` — get `mode`, `maxIterations`, `evalRetries`, `scopeBudgets`
|
||||
2. Read `.loop/prd.json` — get the story list and their statuses
|
||||
3. Check `.loop/progress.md` exists; if not, create it from `.loop/templates/progress.md.template`
|
||||
|
||||
Report to the user:
|
||||
|
||||
> **Loop Ready**
|
||||
> - Mode: {mode}
|
||||
> - Stories: {passed}/{total} complete
|
||||
> - Max iterations: {N}
|
||||
> - Eval: {on/off}
|
||||
>
|
||||
> Starting loop. You can interrupt me at any time to adjust course.
|
||||
|
||||
### Step 2: Iteration Loop
|
||||
|
||||
For each iteration (1 to max iterations):
|
||||
|
||||
#### 2a. Find Next Story
|
||||
|
||||
Find the highest-priority story in `prd.json` where `passes` is `false` and `blocked` is not `true`. If `--story` was specified, use that story instead.
|
||||
|
||||
**If no actionable story remains:**
|
||||
- If all stories have `passes: true` → report success and stop
|
||||
- If some stories are `blocked: true` → report which are blocked and suggest `/agent-loop:triage`
|
||||
- Stop the loop
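The selection rule in step 2a can be sketched as follows. The functions take the list of story dicts directly, because the name of the key that holds them in `prd.json` is not shown here. Treating the smallest `priority` number as the most urgent is an assumption, and both function names are hypothetical.

```python
from typing import Optional


def next_story(stories: list[dict], only_id: Optional[str] = None) -> Optional[dict]:
    """Pick the story to work on, or None when nothing is actionable."""
    if only_id is not None:
        # --story <ID>: use that story regardless of priority.
        return next((s for s in stories if s["id"] == only_id), None)
    actionable = [s for s in stories if not s.get("passes") and not s.get("blocked")]
    # Assumption: priority 1 is the most urgent, so the smallest number wins.
    return min(actionable, key=lambda s: s["priority"], default=None)


def exit_reason(stories: list[dict]) -> str:
    """Explain why no actionable story remains."""
    if all(s.get("passes") for s in stories):
        return "all stories pass"
    blocked = [s["id"] for s in stories if s.get("blocked")]
    return "blocked: " + ", ".join(blocked) if blocked else "nothing actionable"
```

When `next_story` returns `None`, `exit_reason` gives the text to report before stopping the loop.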
|
||||
|
||||
#### 2b. Report Iteration Start
|
||||
|
||||
Tell the user:
|
||||
> **Iteration {N}/{max} — {story.id}: {story.title}**
|
||||
|
||||
If the story has `[REJECTED]` entries in its `notes` field, summarize the previous feedback so the user has context.
|
||||
|
||||
#### 2c. Assemble Generator Prompt
|
||||
|
||||
Read these files and concatenate them with `---` separators:
|
||||
1. `.loop/prompts/generator/_base.md`
|
||||
2. `.loop/prompts/generator/{mode}.md`
|
||||
|
||||
Then substitute these template variables in the assembled text:
|
||||
- `{{MAX_FILES_TO_READ}}` → from `config.scopeBudgets.{mode}.maxFilesToRead`
|
||||
- `{{MAX_LINES_TO_WRITE}}` → from `config.scopeBudgets.{mode}.maxLinesToWrite`
|
||||
- `{{MAX_FILES_TO_MODIFY}}` → from `config.scopeBudgets.{mode}.maxFilesToModify`
|
||||
- `{{MODE}}` → the mode
|
||||
- `{{ITERATION}}` → current iteration number
|
||||
- `{{MAX_ITERATIONS}}` → max iterations
|
||||
- `{{LOOP_DIR}}` → path to `.loop/` directory
|
||||
- `{{PROJECT_ROOT}}` → project root path
|
||||
- `{{CURRENT_STORY_ID}}` → the story ID being worked on
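The assembly and substitution in step 2c amount to a join followed by placeholder replacement. A minimal sketch is shown below, assuming the prompt files have already been read into strings; `render_prompt` is a hypothetical helper, and leaving unknown placeholders untouched is one possible policy rather than documented harness behavior.

```python
import re

# Matches placeholders such as {{MODE}} or {{MAX_FILES_TO_READ}}.
_PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def render_prompt(parts: list[str], variables: dict[str, object]) -> str:
    """Join prompt files with '---' separators and fill {{NAME}} placeholders."""
    text = "\n\n---\n\n".join(parts)

    def fill(match):
        name = match.group(1)
        # Unknown names are left as-is so a missing value stays visible.
        return str(variables[name]) if name in variables else match.group(0)

    return _PLACEHOLDER.sub(fill, text)
```

Doing the replacement in a single regex pass avoids accidentally re-substituting text that a variable's value happens to contain.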
|
||||
|
||||
#### 2d. Capture Pre-Generator Git State
|
||||
|
||||
Run `git rev-parse HEAD` and save it. This is needed for the evaluator's diff.
|
||||
|
||||
#### 2e. Dispatch Generator Agent
|
||||
|
||||
Use the **Agent tool** to launch the generator:
|
||||
|
||||
```
Agent(
  prompt: <assembled generator prompt>,
  description: "Generator: {story.id}",
  subagent_type: "general-purpose",
  mode: "auto"
)
```
|
||||
|
||||
**IMPORTANT:** Use `mode: "auto"` so the user can see tool calls but isn't prompted for every action. If the user has expressed a preference for more control, use `mode: "default"` instead.
|
||||
|
||||
Wait for the agent to complete. The Agent tool returns the generator's final output.
|
||||
|
||||
#### 2f. Check for Completion Signal
|
||||
|
||||
If the generator output contains `<promise>COMPLETE</promise>`, report all stories complete and stop.
|
||||
|
||||
#### 2g. Skip Evaluator (if configured)
|
||||
|
||||
If `--skip-eval` was specified or `config.skipEval` is true, skip to step 2j.
|
||||
|
||||
#### 2h. Assemble Evaluator Prompt
|
||||
|
||||
Read these files and concatenate them:
|
||||
1. `.loop/prompts/evaluator/_base.md`
|
||||
2. `.loop/prompts/evaluator/{mode}.md`
|
||||
|
||||
Substitute the same template variables as the generator, plus:
|
||||
- `{{PRE_GENERATOR_SHA}}` → the git SHA captured in step 2d
|
||||
- `{{CURRENT_STORY_ID}}` → the story ID
|
||||
|
||||
#### 2i. Dispatch Evaluator Agent
|
||||
|
||||
Use the **Agent tool** to launch the evaluator:
|
||||
|
||||
```
Agent(
  prompt: <assembled evaluator prompt>,
  description: "Evaluator: {story.id}",
  subagent_type: "general-purpose",
  mode: "auto"
)
```
|
||||
|
||||
Wait for completion. Parse the verdict from the output:
|
||||
|
||||
- Look for `<verdict>PASS</verdict>` → story passes
|
||||
- Look for `<verdict>REJECT</verdict>` → story rejected; extract reason from `<rejection_reason>...</rejection_reason>`
|
||||
- No verdict tag found → treat as REJECT (fail-safe)
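The three parsing rules above can be sketched with two regular expressions. This is an illustration only: tolerating whitespace inside the tags and letting the first verdict tag win are assumptions, and `parse_verdict` is a hypothetical name.

```python
import re
from typing import Optional

_VERDICT = re.compile(r"<verdict>\s*(PASS|REJECT)\s*</verdict>")
_REASON = re.compile(r"<rejection_reason>(.*?)</rejection_reason>", re.DOTALL)


def parse_verdict(output: str) -> tuple[str, Optional[str]]:
    """Return ("PASS", None) or ("REJECT", reason); a missing tag counts as REJECT."""
    verdict = _VERDICT.search(output)
    if verdict is None:
        # Fail-safe: no verdict tag means the story is not accepted.
        return "REJECT", "no verdict tag found"
    if verdict.group(1) == "PASS":
        return "PASS", None
    reason = _REASON.search(output)
    return "REJECT", reason.group(1).strip() if reason else None
```

Defaulting to REJECT keeps a truncated or malformed evaluator response from silently marking a story as done.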
|
||||
|
||||
#### 2j. Update State Based on Verdict
|
||||
|
||||
**On PASS (or skip-eval):**
|
||||
1. Update `.loop/prd.json` — set `passes: true` for the story
|
||||
2. Report to user: ✓ **{story.id} PASSED**
|
||||
|
||||
**On REJECT:**
|
||||
1. Update `.loop/prd.json`:
   - Keep `passes: false`
   - Increment `rejections` count
   - Append `[REJECTED] {reason}` to `notes`
2. Report to user: ✗ **{story.id} REJECTED** — {reason}
3. Check if `rejections` >= `evalRetries` from config:
   - If yes: set `blocked: true` in prd.json, append `[BLOCKED]` to notes
   - Report: ⚠ **{story.id} BLOCKED** — rejected {N} times, needs human review
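The state transitions in step 2j can be expressed as one function over a story dict. The sketch below is a hypothetical helper, not the harness's code; joining note entries with a newline is an assumption, since the step only says "append".

```python
def apply_verdict(story: dict, verdict: str, reason: str, eval_retries: int) -> dict:
    """Update one story in place after an evaluator run and return it."""
    if verdict == "PASS":
        story["passes"] = True
        return story
    story["passes"] = False
    story["rejections"] = story.get("rejections", 0) + 1
    # Assumption: entries are separated by newlines inside `notes`.
    entry = f"[REJECTED] {reason}"
    story["notes"] = f"{story['notes']}\n{entry}" if story.get("notes") else entry
    if story["rejections"] >= eval_retries:
        # Retry limit reached: hand the story to a human.
        story["blocked"] = True
        story["notes"] += "\n[BLOCKED]"
    return story
```

With `evalRetries` set to 2, as in the example config, a story becomes blocked on its second rejection.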
|
||||
|
||||
#### 2k. Append Progress Entry
|
||||
|
||||
Append to `.loop/progress.md`:
|
||||
|
||||
```markdown
|
||||
### {story.id} — {story.title}
|
||||
Date: {current date and time}
|
||||
Iteration: {N}
|
||||
Verdict: {PASS/REJECT/SKIP-EVAL}
|
||||
|
||||
---
|
||||
```
|
||||
|
||||
#### 2l. Report Iteration Summary
|
||||
|
||||
Show current story counts: `{passed}/{total} stories complete`
|
||||
|
||||
If there are more iterations and more stories, continue to the next iteration.
|
||||
|
||||
### Step 3: Loop Exit
|
||||
|
||||
When the loop ends (all stories done, max iterations, or all remaining blocked), report:
|
||||
|
||||
> **Loop Complete**
|
||||
> - Iterations used: {N}
|
||||
> - Stories: {passed}/{total} complete, {blocked} blocked
|
||||
> - {Suggest `/agent-loop:triage` if anything is blocked or incomplete}
|
||||
|
||||
### Error Handling
|
||||
|
||||
- If an Agent subagent fails or returns empty output, log a warning and continue to the next iteration. Do NOT stop the loop for a single agent failure.
|
||||
- If `prd.json` cannot be parsed, stop immediately and report the error.
|
||||
- If the user interrupts (denies a tool call, says "stop", etc.), gracefully end the loop and report current status.
|
||||
|
||||
### Key Differences from loop.sh
|
||||
|
||||
| Feature | loop.sh | /agent-loop:run |
|---------|---------|-----------------|
| Execution | Headless (`claude --print`) | Visible in Claude Code |
| Intervention | Kill the process | Deny tool calls, chat mid-loop |
| Permissions | `--dangerously-skip-permissions` | User-controlled |
| Context | Fresh process per agent | Fresh Agent subagent per agent |
| State updates | Shell functions | Claude Code reads/writes files directly |
|
||||
83
skills/loop-triage/SKILL.md
Normal file
@@ -0,0 +1,83 @@
|
||||
---
|
||||
name: triage
|
||||
description: Generate a human handoff brief summarizing loop status — completed, blocked, and remaining stories with recommended next steps.
|
||||
---
|
||||
|
||||
# /triage — Generate Human Handoff Brief
|
||||
|
||||
Generate a triage brief summarizing the current state of a loop run. Use this when:
|
||||
- The loop hit max iterations without completing
|
||||
- You want a status check mid-run
|
||||
- You're handing off to another developer
|
||||
|
||||
## Instructions
|
||||
|
||||
When the user invokes `/agent-loop:triage`:
|
||||
|
||||
### Step 1: Read Current State
|
||||
|
||||
1. Read `.loop/prd.json` — get story statuses
|
||||
2. Read `.loop/progress.md` — get session log and patterns
|
||||
3. Read `.loop/config.json` — get mode and iteration settings
|
||||
4. Check git log for recent commits on the loop branch
|
||||
|
||||
### Step 2: Analyze
|
||||
|
||||
For each story, determine:
|
||||
- **Complete**: `passes: true`, verified by evaluator
|
||||
- **In Progress**: `passes: false`, has been attempted (check progress.md for entries)
|
||||
- **Blocked**: `passes: false`, rejected multiple times (check `rejections` count and `notes`)
|
||||
- **Not Started**: `passes: false`, no progress.md entries, no rejections
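The four categories above can be approximated in code. This sketch reads "rejected multiple times" as either the `blocked` flag or two or more rejections; the threshold of 2 is an assumption that mirrors `evalRetries: 2` in the example config. Both function names are hypothetical, and the heading pattern assumes progress entries start with `### {story.id}` followed by a space.

```python
import re


def ids_in_progress_log(progress_md: str) -> set[str]:
    """Collect the first token of every '### ...' heading in progress.md."""
    # Non-story headings such as "### Planning Session" also match, which is
    # harmless because their first word never equals a story ID.
    return set(re.findall(r"^### (\S+) ", progress_md, flags=re.MULTILINE))


def classify(story: dict, attempted: set[str]) -> str:
    """Map one prd.json story to a triage status."""
    if story.get("passes"):
        return "Complete"
    rejections = story.get("rejections", 0)
    if story.get("blocked") or rejections >= 2:
        return "Blocked"
    if story["id"] in attempted or rejections > 0:
        return "In Progress"
    return "Not Started"
```

The status strings match the ones used in the brief's Story Details table, so the result can be written into it directly.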
|
||||
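
The status rules above can be sketched in a few lines of Python. This is only an illustration, not part of the skill: the function names are hypothetical, the retry limit of 3 is an assumed placeholder (the real limit presumably comes from `.loop/config.json`), and the check that progress.md mentions the story ID verbatim is likewise an assumption.

```python
from collections import Counter

# Assumption: a story counts as "Blocked" once its rejection count reaches
# this limit. The value 3 is a placeholder, not taken from the harness.
RETRY_LIMIT = 3


def classify_story(story, progress_text, retry_limit=RETRY_LIMIT):
    """Map one prd.json story onto the four triage statuses."""
    if story.get("passes"):
        return "Complete"
    rejections = story.get("rejections", 0)
    if rejections >= retry_limit:
        return "Blocked"
    # Assumption: progress.md entries mention the story ID verbatim.
    attempted = story["id"] in progress_text or rejections > 0
    return "In Progress" if attempted else "Not Started"


def summarize(prd, progress_text):
    """Count stories per status for the Status Summary section."""
    return Counter(classify_story(s, progress_text) for s in prd["userStories"])


# Illustrative data only; a real run would load .loop/prd.json and .loop/progress.md.
prd = {
    "userStories": [
        {"id": "US-001", "passes": True, "rejections": 0},
        {"id": "US-002", "passes": False, "rejections": 3},
        {"id": "US-003", "passes": False, "rejections": 0},
        {"id": "US-004", "passes": False, "rejections": 0},
    ]
}
progress_text = "## Session Log\n- US-001 done\n- US-004: started schema work\n"

counts = summarize(prd, progress_text)
for status in ("Complete", "In Progress", "Blocked", "Not Started"):
    print(f"{status}: {counts[status]}")
```

Checking `passes` first keeps a story that was rejected several times and then accepted in the Complete bucket.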

### Step 3: Generate Brief

Write to `.loop/triage/TRIAGE_BRIEF.md`:

```markdown
# Triage Brief

Generated: {current date and time}
Mode: {mode from config.json}
Branch: {branchName from prd.json}

## Status Summary

- **Complete:** {N} stories
- **In Progress:** {N} stories
- **Blocked:** {N} stories (hit retry limit)
- **Not Started:** {N} stories

## Story Details

| ID | Title | Status | Rejections | Notes |
|----|-------|--------|------------|-------|
| US-001 | ... | Complete | 0 | |
| US-002 | ... | Blocked | 3 | Evaluator rejected: ... |
| US-003 | ... | Not Started | 0 | |

## Key Patterns Discovered

{Copy the Codebase Patterns section from progress.md}

## Blocked Stories — Analysis

For each blocked story, summarize:
- What was attempted
- Why it was rejected (from notes field)
- Suggested approach for a human to unblock it

## Recommended Next Steps

Based on the current state:
1. {Most important next action}
2. {Second priority}
3. {Third priority}

## Files Modified

{List all files changed across all commits on the loop branch, with brief descriptions}
```

### Step 4: Present to User

Show the summary inline and tell the user where the full brief is saved.
35
templates/contract.md.template
Normal file
@@ -0,0 +1,35 @@
# Sprint Contract: {{STORY_ID}} — {{STORY_TITLE}}

## What Will Be Built

<!-- Concrete description of the deliverable -->

## Done Conditions

- [ ] <!-- Specific, testable condition -->
- [ ] <!-- Another condition -->
- [ ] All acceptance criteria from prd.json met
- [ ] Quality gates pass (typecheck, lint, test)

## Evaluation Criteria

What the evaluator will specifically check:
- [ ] <!-- Specific verification step -->
- [ ] <!-- Another verification step -->
- [ ] No regressions in existing functionality

## Out of Scope

Things explicitly NOT part of this story:
- <!-- Thing 1 -->
- <!-- Thing 2 -->

## Key Files

Files likely to be created or modified:
- <!-- path/to/file.ext — what changes -->

## Dependencies

- Depends on: <!-- story IDs or "none" -->
- Blocks: <!-- story IDs or "none" -->
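
For illustration, the `{{STORY_ID}}` and `{{STORY_TITLE}}` placeholders in the contract template could be filled from a prd.json story roughly as follows. This is a hedged sketch: `render_contract` is a hypothetical helper, and nothing in this commit view confirms how the harness itself performs the substitution. The sample template string is shortened for the example.

```python
import re

# Matches placeholders of the form {{NAME}} as used in contract.md.template.
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def render_contract(template, story):
    """Replace {{STORY_ID}} and {{STORY_TITLE}} with values from one story."""
    values = {"STORY_ID": story["id"], "STORY_TITLE": story["title"]}

    def substitute(match):
        key = match.group(1)
        if key not in values:
            # Fail loudly rather than leave an unknown placeholder in the contract.
            raise KeyError(f"unknown placeholder: {key}")
        return values[key]

    return PLACEHOLDER.sub(substitute, template)


# Illustrative input; a real run would read templates/contract.md.template.
story = {"id": "US-001", "title": "Add users table with OAuth fields"}
header = render_contract("Contract {{STORY_ID}}: {{STORY_TITLE}}", story)
print(header)
```

Raising on an unknown placeholder is a design choice of this sketch; it makes a typo in the template visible before the contract reaches the evaluator.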
54
templates/prd.json.example
Normal file
@@ -0,0 +1,54 @@
{
  "project": "MyApp",
  "branchName": "loop/add-user-auth",
  "description": "Add user authentication with OAuth providers",
  "userStories": [
    {
      "id": "US-001",
      "title": "Add users table with OAuth fields",
      "description": "As a developer, I need a users table that stores OAuth provider info so we can persist authenticated users.",
      "acceptanceCriteria": [
        "Create users table with id, email, name, oauth_provider, oauth_id, created_at columns",
        "Generate and run migration successfully",
        "Typecheck passes",
        "Unit test for model creation passes"
      ],
      "priority": 1,
      "passes": false,
      "notes": "",
      "rejections": 0
    },
    {
      "id": "US-002",
      "title": "Implement OAuth callback endpoint",
      "description": "As a user, I want to sign in with Google so I can access my account without creating a password.",
      "acceptanceCriteria": [
        "GET /auth/callback accepts OAuth authorization code",
        "Exchanges code for access token with provider",
        "Creates or updates user record",
        "Returns JWT session token",
        "Typecheck passes",
        "Integration test for OAuth flow passes"
      ],
      "priority": 2,
      "passes": false,
      "notes": "",
      "rejections": 0
    },
    {
      "id": "US-003",
      "title": "Add login page with OAuth button",
      "description": "As a user, I want a login page with a 'Sign in with Google' button so I can authenticate.",
      "acceptanceCriteria": [
        "Login page renders with OAuth button",
        "Button redirects to provider authorization URL",
        "Typecheck passes",
        "Verify UI renders correctly in browser"
      ],
      "priority": 3,
      "passes": false,
      "notes": "",
      "rejections": 0
    }
  ]
}
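
A quick structural check of a `prd.json` file against the shape of the example above might look like the sketch below. The field list and types are inferred from this example only, and `validate_prd` is a hypothetical helper rather than something the harness is known to ship.

```python
# Field names and types as they appear in the example; inferred, not a formal schema.
REQUIRED_STORY_FIELDS = {
    "id": str,
    "title": str,
    "description": str,
    "acceptanceCriteria": list,
    "priority": int,
    "passes": bool,
    "notes": str,
    "rejections": int,
}


def validate_prd(prd):
    """Return a list of human-readable problems; an empty list means the shape matches."""
    problems = []
    for key in ("project", "branchName", "description"):
        if not isinstance(prd.get(key), str):
            problems.append(f"top-level '{key}' should be a string")
    stories = prd.get("userStories")
    if not isinstance(stories, list):
        return problems + ["'userStories' should be a list"]
    seen_ids = set()
    for index, story in enumerate(stories):
        label = story.get("id", f"story #{index}")
        for field, expected in REQUIRED_STORY_FIELDS.items():
            value = story.get(field)
            # bool is a subclass of int in Python, so reject True/False for int fields.
            wrong_bool = expected is int and isinstance(value, bool)
            if not isinstance(value, expected) or wrong_bool:
                problems.append(f"{label}: '{field}' should be {expected.__name__}")
        if label in seen_ids:
            problems.append(f"{label}: duplicate story id")
        seen_ids.add(label)
    return problems


# Illustrative data mirroring the first story of the example.
sample_prd = {
    "project": "MyApp",
    "branchName": "loop/add-user-auth",
    "description": "Add user authentication with OAuth providers",
    "userStories": [
        {
            "id": "US-001",
            "title": "Add users table with OAuth fields",
            "description": "Users table that stores OAuth provider info.",
            "acceptanceCriteria": ["Typecheck passes"],
            "priority": 1,
            "passes": False,
            "notes": "",
            "rejections": 0,
        }
    ],
}
print(validate_prd(sample_prd))
```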
13
templates/progress.md.template
Normal file
@@ -0,0 +1,13 @@
# Progress

## Codebase Patterns

<!-- Consolidate reusable patterns discovered across iterations here.
This section is READ FIRST by every new iteration.
Keep it concise — patterns only, not implementation details. -->

---

## Session Log

<!-- Each iteration appends a progress entry below. Never delete entries. -->
29
templates/triage-brief.md.template
Normal file
@@ -0,0 +1,29 @@
# Triage Brief

Generated: <!-- insert current date and time -->
Mode: <!-- insert mode from config.json -->
Iterations completed: <!-- insert completed count --> of <!-- insert max iterations -->

## Status

Stories: <!-- insert completed count --> of <!-- insert total count --> complete

| ID | Title | Status | Rejections |
|----|-------|--------|------------|
<!-- Populated by /loop-triage skill -->

## Key Findings

<!-- Consolidated from progress.md session log -->

## Blockers

<!-- Stories that failed repeatedly or hit retry limits -->

## Recommended Next Steps

<!-- What a human should do next -->

## Files of Interest

<!-- Key files discovered or modified, with context -->