Compare commits
24 Commits
loop/githu
...
main
| Author | SHA1 | Date |
|---|---|---|
| | ecfbd0bb37 | |
| | 344b179b4d | |
| | b516492a91 | |
| | a1a3dfbd63 | |
| | bab002b927 | |
| | 71b00cf11f | |
| | 1bd8004854 | |
| | ad58a49182 | |
| | ce111b4cbe | |
| | 77fd9e0cd6 | |
| | 1efca3c185 | |
| | e4df81fdac | |
| | 6833d94cf4 | |
| | c293f53d90 | |
| | 9fd428ac51 | |
| | c46de6815c | |
| | b4d4e1952a | |
| | 60ce0fef54 | |
| | f26bdce534 | |
| | 2dc291aac4 | |
| | 1d059e218b | |
| | 80b0f0f4c1 | |
| | 5e4ad3b12e | |
| | 9a7fa3a1bd | |
@@ -10,7 +10,7 @@
 "name": "agent-loop",
 "source": "./",
 "description": "Autonomous generator-evaluator agent loop for long-running coding tasks. Plan interactively, then execute with full visibility.",
-"version": "0.8.0",
+"version": "0.12.0",
 "author": {
 "name": "Sheldon"
 },
@@ -1,6 +1,6 @@
 {
 "name": "agent-loop",
-"version": "0.8.0",
+"version": "0.12.0",
 "description": "Autonomous generator-evaluator agent loop for long-running coding tasks. Run /agent-loop:run to start.",
 "author": {
 "name": "Sheldon"
60
README.md
@@ -8,10 +8,7 @@ A generator-evaluator loop runs fresh Claude Code sessions per iteration. Each i
|
||||
|
||||
## Install
|
||||
|
||||
### As a Claude Code Plugin (Recommended)
|
||||
|
||||
```
|
||||
/plugin marketplace add https://git.jagfly.com/sheldon/loop-loop.git
|
||||
/plugin install agent-loop@agent-loop
|
||||
```
|
||||
|
||||
@@ -23,16 +20,18 @@ Then in any project:
|
||||
|
||||
That's it. The single command handles setup, planning, and execution.
|
||||
|
||||
### Manual Install
|
||||
## Prerequisites
|
||||
|
||||
```bash
|
||||
cp -r /path/to/loop-loop .loop
|
||||
```
|
||||
|
||||
Then run `.loop/loop.sh` directly.
|
||||
- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) CLI installed
|
||||
- `tmux` available (used to run the loop in a detachable session)
|
||||
- `jq` or `python3` (for JSON state management)
|
||||
|
||||
## How It Works
|
||||
|
||||
1. Write a spec describing what you want to build (`SPEC.md`, `docs/specs/*.md`, or similar). You can write it yourself, ask Claude to draft one, or use planning tools like `/plan`.
|
||||
2. Run `/agent-loop:run` — it scaffolds `.loop/`, generates stories from your spec, and presents them for review
|
||||
3. Say "go" — the loop launches in tmux and runs autonomously
|
||||
|
||||
```
|
||||
/agent-loop:run
|
||||
├─ Phase 1: Scaffold .loop/ (if needed)
|
||||
@@ -50,7 +49,7 @@ Then run `.loop/loop.sh` directly.

 | Mode | What it does | Git writes? |
 |------|-------------|-------------|
-| **implement** | Build features from a PRD | Yes |
+| **implement** | Build features from a spec | Yes |
 | **explore** | Read-only codebase analysis | No |
 | **fix** | Targeted bug fixes / tech debt | Yes |
@@ -73,28 +72,15 @@ Or ask Claude Code "status" — it reads `.loop/prd.json` and `.loop/progress.md
|
||||
|
||||
Each generator and evaluator run is a full Claude Code session saved to history. Use `claude -r` to resume any session and inspect what happened, debug a rejection, or continue from where it left off.
|
||||
|
||||
## Headless Mode
|
||||
|
||||
For CI or background execution without the interactive UI:
|
||||
|
||||
```bash
|
||||
.loop/loop.sh --headless [options]
|
||||
|
||||
--mode <implement|explore|fix> Operating mode
|
||||
--max <N> Maximum iterations (default: 20)
|
||||
--skip-eval Skip evaluator pass
|
||||
--dry-run Print assembled prompts without running
|
||||
```
|
||||
|
||||
## Architecture
|
||||
|
||||
### Generator
|
||||
Fresh Claude Code session each iteration. Reads `prd.json` to find the highest-priority incomplete story, reads the sprint contract, implements the story, runs quality gates, commits, and marks it done.
|
||||
Fresh Claude Code session each iteration. Follows a strict startup sequence: reads progress.md, finds the next story from prd.json, reads the sprint contract, checks for evaluator feedback, reviews git history, and runs a smoke test if available — all before writing any code. Then implements the story, runs quality gates, commits, and marks it done.
|
||||
|
||||
### Evaluator
|
||||
Separate fresh session after each generator pass. Skeptically verifies the work: checks acceptance criteria against actual code, runs tests and the application, and issues a `PASS` or `REJECT` verdict. Rejection sends the story back with specific feedback.
|
||||
Separate fresh session after each generator pass. Skeptically verifies the work: checks each acceptance criterion against actual code with file paths and line numbers, runs tests, and issues a `PASS` or `REJECT` verdict. Rejection sends the story back with specific feedback.
|
||||
|
||||
Evaluator skepticism is deliberately tuned — Claude's default tendency is to rationalize away issues. The evaluator prompt includes explicit bias correction.
|
||||
Evaluator skepticism is deliberately tuned — Claude's default tendency is to rationalize away issues. The evaluator prompt includes explicit bias correction and few-shot calibration examples.
|
||||
|
||||
### Sprint Contracts
|
||||
Before the loop starts, the planner generates contracts for each story. These define "done" conditions that both generator and evaluator reference, eliminating ambiguity about whether work is complete.
|
||||
@@ -109,27 +95,9 @@ Before the loop starts, the planner generates contracts for each story. These de
|
||||
| `config.json` | Harness configuration |
|
||||
| Git commits | Code changes with story-tagged messages |
|
||||
|
||||
## Optional: Runtime Testing Tools
|
||||
## Runtime Verification
|
||||
|
||||
The evaluator verifies code actually runs, not just that it looks correct. It uses whatever tools are available. For richer verification, install these optional MCP servers:
|
||||
|
||||
**Web projects (Playwright):**
|
||||
```bash
|
||||
claude mcp add playwright npx @playwright/mcp@latest --headless --browser=chromium
|
||||
```
|
||||
|
||||
**iOS/Xcode projects (XcodeBuildMCP):**
|
||||
```bash
|
||||
brew tap getsentry/xcodebuildmcp && brew install xcodebuildmcp
|
||||
claude mcp add xcodebuild -- xcodebuildmcp
|
||||
```
|
||||
|
||||
**iOS Simulator interaction:**
|
||||
```bash
|
||||
claude mcp add ios-simulator -- npx -y ios-simulator-mcp
|
||||
```
|
||||
|
||||
These are optional — the evaluator works without them but may miss runtime-only issues.
|
||||
The evaluator doesn't just read diffs — it runs tests, builds the project, and checks for runtime errors using whatever tools the project already has (test runners, linters, build commands).
|
||||
|
||||
## Design Principles
|
||||
|
||||
|
||||
@@ -11,12 +11,16 @@ You are a planner agent for the agent loop harness. Your job is to decompose a f
|
||||
|
||||
## CONSTRAINTS
|
||||
|
||||
- You may ONLY write files inside the `.loop/` directory
|
||||
- You may ONLY write files inside the `.loop/` directory (or the absolute loop directory path if one is provided)
|
||||
- You may NOT write any project source code (.js, .ts, .py, .go, .rs, .html, .css, etc.)
|
||||
- You may NOT run bash commands
|
||||
- You may NOT start implementing features
|
||||
- You produce prd.json and contracts, then STOP
|
||||
|
||||
## OUTPUT DIRECTORY
|
||||
|
||||
If the prompt specifies an absolute path for the loop directory (e.g., "Write all files to /path/to/worktree/.loop/"), use that absolute path for ALL file writes. Otherwise, use the relative `.loop/` path.
|
||||
|
||||
## YOUR TASK
|
||||
|
||||
You will be given a feature spec or description. Decompose it into stories.
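The OUTPUT DIRECTORY rule in this prompt (an absolute loop path wins, otherwise the relative `.loop/` is used) could be resolved on the caller's side roughly as follows. This is a minimal sketch under that reading of the rule; `resolve_loop_dir` is a hypothetical helper name and does not appear in the harness.

```bash
#!/bin/bash
# Hypothetical helper, not part of the harness: choose where planner output goes.
# An absolute path is used as given; an empty or relative value falls back to ".loop".
resolve_loop_dir() {
  local requested="${1:-}"
  case "$requested" in
    /*) printf '%s\n' "$requested" ;;  # absolute path supplied by the prompt
    *)  printf '%s\n' ".loop" ;;       # default relative directory
  esac
}

resolve_loop_dir "/path/to/worktree/.loop"
resolve_loop_dir ""
```

Treating any non-absolute value as "not provided" keeps the fallback simple; a stricter variant could reject relative paths instead of silently ignoring them.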
|
||||
|
||||
@@ -102,5 +102,5 @@ echo " Next steps (inside Claude Code, in any project):"
|
||||
echo ""
|
||||
echo " /agent-loop:run # Single command — setup, plan, and run"
|
||||
echo ""
|
||||
echo " Or run headless: .loop/loop.sh"
|
||||
echo " Or run directly: .loop/loop.sh"
|
||||
echo ""
|
||||
|
||||
152
lib/archive.sh
@@ -1,11 +1,17 @@
|
||||
#!/bin/bash
|
||||
# Branch archiving — archives previous run artifacts when the branch changes.
|
||||
# Preserves prd.json, progress.md, and contracts from the previous feature.
|
||||
# Run archiving — preserves prd.json, progress.md, and contracts from completed runs.
|
||||
#
|
||||
# Design: At the end of each run, snapshot_for_archive saves current artifacts
|
||||
# to .archive-staging/. On the next run, if the branch changed, check_archive
|
||||
# moves the snapshot to archive/ and cleans up. This avoids archiving the
|
||||
# WRONG artifacts (the new feature's) when prd.json has already been overwritten.
|
||||
# Two archive triggers:
|
||||
# 1. Branch change: check_archive detects a new branch and archives the staged snapshot.
|
||||
# 2. Completed run: archive_and_reset is called by the /run skill when prd.json shows
|
||||
# all stories passed (or the branch was deleted). This handles the common workflow
|
||||
# of merging a feature branch back to main and starting a new feature.
|
||||
#
|
||||
# Archive layout:
|
||||
# .loop/archive/
|
||||
# runs.log — one-line-per-run index for quick lookup
|
||||
# 2026-03-15-auth-system/ — full artifacts from that run
|
||||
# prd.json, progress.md, contracts/
|
||||
|
||||
LAST_BRANCH_FILE="$LOOP_DIR/.last-branch"
|
||||
STAGING_DIR="$LOOP_DIR/.archive-staging"
|
||||
@@ -85,5 +91,139 @@ archive_run() {
|
||||
rm -f "$LOOP_DIR/progress.md"
|
||||
rm -rf "$LOOP_DIR/contracts"
|
||||
|
||||
append_runs_log "$branch_name" "$archive_dir"
|
||||
log "Archived previous run to $archive_dir"
|
||||
}
|
||||
|
||||
# Archive current run artifacts and reset for a new run.
|
||||
# Called by the /run skill when a completed run is detected (all stories passed
|
||||
# or the feature branch no longer exists). Unlike check_archive (which reads from
|
||||
# staging), this archives the LIVE artifacts directly since we know they belong
|
||||
# to the completed run.
|
||||
archive_and_reset() {
|
||||
local loop_dir="${1:-.loop}"
|
||||
local prd="$loop_dir/prd.json"
|
||||
|
||||
[ -f "$prd" ] || return 0
|
||||
|
||||
# Read branch name from current prd.json
|
||||
local branch_name=""
|
||||
if command -v jq &>/dev/null; then
|
||||
branch_name=$(jq -r '.branchName // empty' "$prd" 2>/dev/null)
|
||||
elif command -v python3 &>/dev/null; then
|
||||
branch_name=$(LOOP_PRD="$prd" python3 -c "
|
||||
import json, os
|
||||
print(json.load(open(os.environ['LOOP_PRD'])).get('branchName', ''), end='')
|
||||
" 2>/dev/null)
|
||||
fi
|
||||
|
||||
local feature_name
|
||||
feature_name=$(echo "${branch_name:-unknown}" | sed 's|.*/||')
|
||||
|
||||
local archive_dir="$loop_dir/archive/$(date +%Y-%m-%d)-${feature_name}"
|
||||
mkdir -p "$archive_dir"
|
||||
|
||||
# Archive live artifacts
|
||||
[ -f "$prd" ] && cp "$prd" "$archive_dir/"
|
||||
[ -f "$loop_dir/progress.md" ] && cp "$loop_dir/progress.md" "$archive_dir/"
|
||||
[ -f "$loop_dir/progress-archive.md" ] && cp "$loop_dir/progress-archive.md" "$archive_dir/"
|
||||
[ -d "$loop_dir/contracts" ] && cp -r "$loop_dir/contracts" "$archive_dir/"
|
||||
|
||||
# Verify archive has content before deleting originals
|
||||
if ! find "$archive_dir" -maxdepth 1 -type f | read -r; then
|
||||
echo "[archive] WARNING: Archive directory is empty — skipping reset to prevent data loss"
|
||||
return 1
|
||||
fi
|
||||
|
||||
append_runs_log "$branch_name" "$archive_dir"
|
||||
|
||||
# Reset run-specific files (keep config.json, init.sh, harness files)
|
||||
rm -f "$loop_dir/prd.json"
|
||||
rm -f "$loop_dir/progress.md"
|
||||
rm -f "$loop_dir/progress-archive.md"
|
||||
rm -rf "$loop_dir/contracts"
|
||||
rm -rf "$loop_dir/.archive-staging"
|
||||
rm -f "$loop_dir/.last-branch"
|
||||
rm -f "$loop_dir/.verdict"
|
||||
|
||||
echo "[archive] Archived completed run to $archive_dir"
|
||||
echo "[archive] .loop/ reset — ready for new stories"
|
||||
}
|
||||
|
||||
# Archive a completed run from a worktree back to the main project's .loop/archive/.
|
||||
# Called by the /run skill's completion handler after the loop finishes in a worktree.
|
||||
#
|
||||
# Usage: archive_from_worktree <worktree_loop_dir> <main_loop_dir>
|
||||
# worktree_loop_dir: absolute path to the worktree's .loop/ (source)
|
||||
# main_loop_dir: absolute path to the main project's .loop/ (destination)
|
||||
archive_from_worktree() {
|
||||
local wt_loop_dir="$1"
|
||||
local main_loop_dir="$2"
|
||||
local wt_prd="$wt_loop_dir/prd.json"
|
||||
|
||||
[ -f "$wt_prd" ] || { echo "[archive] WARNING: No prd.json in worktree — nothing to archive"; return 1; }
|
||||
|
||||
# Read branch name from worktree's prd.json
|
||||
local branch_name=""
|
||||
if command -v jq &>/dev/null; then
|
||||
branch_name=$(jq -r '.branchName // empty' "$wt_prd" 2>/dev/null)
|
||||
elif command -v python3 &>/dev/null; then
|
||||
branch_name=$(LOOP_PRD="$wt_prd" python3 -c "
|
||||
import json, os
|
||||
print(json.load(open(os.environ['LOOP_PRD'])).get('branchName', ''), end='')
|
||||
" 2>/dev/null)
|
||||
fi
|
||||
|
||||
local feature_name
|
||||
feature_name=$(echo "${branch_name:-unknown}" | sed 's|.*/||')
|
||||
|
||||
local archive_dir="$main_loop_dir/archive/$(date +%Y-%m-%d)-${feature_name}"
|
||||
mkdir -p "$archive_dir"
|
||||
|
||||
# Copy artifacts from worktree
|
||||
[ -f "$wt_prd" ] && cp "$wt_prd" "$archive_dir/"
|
||||
[ -f "$wt_loop_dir/progress.md" ] && cp "$wt_loop_dir/progress.md" "$archive_dir/"
|
||||
[ -f "$wt_loop_dir/progress-archive.md" ] && cp "$wt_loop_dir/progress-archive.md" "$archive_dir/"
|
||||
[ -d "$wt_loop_dir/contracts" ] && cp -r "$wt_loop_dir/contracts" "$archive_dir/"
|
||||
[ -d "$wt_loop_dir/triage" ] && cp -r "$wt_loop_dir/triage" "$archive_dir/"
|
||||
|
||||
# Verify archive has content
|
||||
if ! find "$archive_dir" -maxdepth 1 -type f | read -r; then
|
||||
echo "[archive] WARNING: Archive directory is empty — copy may have failed"
|
||||
return 1
|
||||
fi
|
||||
|
||||
append_runs_log "$branch_name" "$archive_dir"
|
||||
|
||||
echo "[archive] Archived worktree run to $archive_dir"
|
||||
}
|
||||
|
||||
# Append a one-line summary to the runs log.
|
||||
append_runs_log() {
|
||||
local branch_name="$1"
|
||||
local archive_dir="$2"
|
||||
local runs_log
|
||||
runs_log="$(dirname "$archive_dir")/runs.log"
|
||||
|
||||
# Read story counts from the archived prd.json
|
||||
local total=0 passed=0 blocked=0
|
||||
local archived_prd="$archive_dir/prd.json"
|
||||
if [ -f "$archived_prd" ]; then
|
||||
if command -v jq &>/dev/null; then
|
||||
total=$(jq '.userStories | length' "$archived_prd" 2>/dev/null || echo 0)
|
||||
passed=$(jq '[.userStories[] | select(.passes == true)] | length' "$archived_prd" 2>/dev/null || echo 0)
|
||||
blocked=$(jq '[.userStories[] | select(.blocked == true)] | length' "$archived_prd" 2>/dev/null || echo 0)
|
||||
elif command -v python3 &>/dev/null; then
|
||||
eval "$(LOOP_PRD="$archived_prd" python3 -c "
|
||||
import json, os
|
||||
d = json.load(open(os.environ['LOOP_PRD']))
|
||||
s = d.get('userStories', [])
|
||||
print(f'total={len(s)} passed={sum(1 for x in s if x.get(\"passes\"))} blocked={sum(1 for x in s if x.get(\"blocked\"))}')
|
||||
" 2>/dev/null)" || true
|
||||
fi
|
||||
fi
|
||||
|
||||
printf '%s %-30s %s/%s passed %s blocked\n' \
|
||||
"$(date +%Y-%m-%d)" "${branch_name:-unknown}" "$passed" "$total" "$blocked" \
|
||||
>> "$runs_log"
|
||||
}
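To make the archive layout concrete, here is a small sketch of how a branch name turns into an archive directory name and a `runs.log` line. It reuses the `sed` and `printf` expressions shown in this diff; the function names and the explicit date argument are assumptions introduced so the example is deterministic.

```bash
#!/bin/bash
# Sketch only: function names are hypothetical. The sed and printf
# expressions mirror the ones in lib/archive.sh.

# Strip everything up to the last "/" so "loop/auth-system" becomes "auth-system".
feature_name_from_branch() {
  echo "${1:-unknown}" | sed 's|.*/||'
}

# Build "<date>-<feature>", matching the archive layout comment.
archive_dir_name() {
  local date_str="$1" branch="$2"
  printf '%s-%s\n' "$date_str" "$(feature_name_from_branch "$branch")"
}

# Format one runs.log line: date, branch padded to 30 columns, then the counts.
format_runs_log_line() {
  local date_str="$1" branch="$2" passed="$3" total="$4" blocked="$5"
  printf '%s %-30s %s/%s passed %s blocked\n' \
    "$date_str" "${branch:-unknown}" "$passed" "$total" "$blocked"
}

archive_dir_name "2026-03-15" "loop/auth-system"
format_runs_log_line "2026-03-15" "loop/auth-system" 5 6 1
```

Because the directory name is built only from the date and the feature name, two runs of the same feature on the same day would share a directory; the diff does not show whether the harness guards against that.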
|
||||
|
||||
17
lib/hooks.sh
@@ -7,9 +7,18 @@
|
||||
#
|
||||
# Without this hook, claude would exit to an interactive prompt instead of
|
||||
# returning control to the loop script.
|
||||
#
|
||||
# IMPORTANT: The hook is scoped to only fire inside the agent-loop tmux session.
|
||||
# Without this guard, ANY Claude Code session opened in the same project directory
|
||||
# would pick up the hook and kill its own parent shell on exit.
|
||||
|
||||
SETTINGS_FILE="${PROJECT_ROOT}/.claude/settings.local.json"
|
||||
|
||||
# The hook checks AGENT_LOOP_ACTIVE before killing. This env var is exported by
|
||||
# loop.sh and inherited by CC sessions it spawns. Interactive CC sessions in the
|
||||
# same project won't have it set, so the hook is a no-op for them.
|
||||
HOOK_COMMAND='[ "${AGENT_LOOP_ACTIVE:-}" = "1" ] && kill -INT $PPID || true'
|
||||
|
||||
install_hooks() {
|
||||
if [ ! -f "$SETTINGS_FILE" ]; then
|
||||
mkdir -p "$(dirname "$SETTINGS_FILE")"
|
||||
@@ -17,14 +26,16 @@ install_hooks() {
|
||||
fi
|
||||
|
||||
if command -v jq &>/dev/null; then
|
||||
jq '.hooks.Stop = [{"matcher": "", "hooks": [{"type": "command", "command": "kill -INT $PPID || true"}]}]' \
|
||||
jq --arg cmd "$HOOK_COMMAND" \
|
||||
'.hooks.Stop = [{"matcher": "", "hooks": [{"type": "command", "command": $cmd}]}]' \
|
||||
"$SETTINGS_FILE" > "${SETTINGS_FILE}.tmp" && mv "${SETTINGS_FILE}.tmp" "$SETTINGS_FILE"
|
||||
else
|
||||
LOOP_SETTINGS="$SETTINGS_FILE" python3 -c "
|
||||
LOOP_HOOK_CMD="$HOOK_COMMAND" LOOP_SETTINGS="$SETTINGS_FILE" python3 -c "
|
||||
import json, os
|
||||
p = os.environ['LOOP_SETTINGS']
|
||||
cmd = os.environ['LOOP_HOOK_CMD']
|
||||
s = json.load(open(p)) if os.path.exists(p) else {}
|
||||
s.setdefault('hooks', {})['Stop'] = [{'matcher': '', 'hooks': [{'type': 'command', 'command': 'kill -INT \$PPID || true'}]}]
|
||||
s.setdefault('hooks', {})['Stop'] = [{'matcher': '', 'hooks': [{'type': 'command', 'command': cmd}]}]
|
||||
json.dump(s, open(p, 'w'), indent=2)
|
||||
"
|
||||
fi
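The guard in `HOOK_COMMAND` can be exercised without killing anything. The sketch below keeps the same `[ ... ] && ... || true` shape but swaps `kill -INT $PPID` for an `echo`, so it only reports whether the hook would fire. `hook_would_fire` is a hypothetical name used for illustration.

```bash
#!/bin/bash
# Same shape as HOOK_COMMAND, with kill replaced by echo for safe experimentation.
hook_would_fire() {
  [ "${AGENT_LOOP_ACTIVE:-}" = "1" ] && echo "fire" || true
}

# Interactive session: the variable is unset, so the hook prints nothing.
( unset AGENT_LOOP_ACTIVE; hook_would_fire )

# Session spawned by the loop: the variable is exported as 1, so the hook fires.
( export AGENT_LOOP_ACTIVE=1; hook_would_fire )
```

The trailing `|| true` keeps the exit status at zero when the guard does not match, so a skipped hook is not reported as a failed command.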
|
||||
|
||||
157
loop.sh
@@ -13,7 +13,6 @@
|
||||
# --no-hooks Don't install stop hooks
|
||||
# --dry-run Print assembled prompts without running agents
|
||||
# --resume Skip already-passed stories (explicit mode)
|
||||
# --replan (reserved — not yet implemented)
|
||||
#
|
||||
# Each iteration:
|
||||
# 1. Generator: picks highest-priority incomplete story, does the work
|
||||
@@ -81,23 +80,6 @@ if ! command -v jq &>/dev/null && ! command -v python3 &>/dev/null; then
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# --- macOS timeout compatibility ---
|
||||
# macOS doesn't have GNU timeout. Use gtimeout (from coreutils) or a perl fallback.
|
||||
if ! command -v timeout &>/dev/null; then
|
||||
if command -v gtimeout &>/dev/null; then
|
||||
timeout() { gtimeout "$@"; }
|
||||
else
|
||||
# Perl-based fallback: runs command with alarm signal
|
||||
timeout() {
|
||||
local duration="$1"; shift
|
||||
perl -e '
|
||||
alarm shift @ARGV;
|
||||
exec @ARGV;
|
||||
' "$duration" "$@"
|
||||
}
|
||||
fi
|
||||
fi
|
||||
|
||||
# --- Load config defaults ---
|
||||
CONFIG_FILE="$LOOP_DIR/config.json"
|
||||
config_default() { get_config_value "$1" "$2"; }
|
||||
@@ -122,15 +104,14 @@ while [[ $# -gt 0 ]]; do
|
||||
--tool=*) TOOL="${1#*=}"; shift ;;
|
||||
--no-hooks) AUTO_HOOKS=false; shift ;;
|
||||
--dry-run) DRY_RUN=true; shift ;;
|
||||
--headless) export LOOP_HEADLESS=true; shift ;;
|
||||
--resume) RESUME=true; shift ;;
|
||||
--replan) log "ERROR: --replan is not yet implemented. Use /agent-loop:stories interactively."; exit 1 ;;
|
||||
[0-9]*) MAX_ITERATIONS="$1"; shift ;;
|
||||
*) log "Unknown option: $1"; exit 1 ;;
|
||||
esac
|
||||
done
|
||||
|
||||
export ITERATION=0 MAX_ITERATIONS MODE
|
||||
export AGENT_LOOP_ACTIVE=1
|
||||
|
||||
# --- Validate ---
|
||||
if [[ ! "$MODE" =~ ^(implement|explore|fix)$ ]]; then
|
||||
@@ -147,7 +128,6 @@ fi
|
||||
cd "$PROJECT_ROOT"
|
||||
|
||||
cleanup() {
|
||||
[ -n "${LOOP_AGENT_TMPFILE:-}" ] && rm -f "$LOOP_AGENT_TMPFILE"
|
||||
# Remove hooks in case we exit mid-agent (Ctrl+C during a claude session)
|
||||
[ "$AUTO_HOOKS" = true ] && remove_hooks 2>/dev/null
|
||||
release_lock
|
||||
@@ -178,10 +158,11 @@ finish() {
|
||||
read -r -t 30 2>/dev/null || true
|
||||
exit "$exit_code"
|
||||
}
|
||||
LOOP_AGENT_TMPFILE=""
|
||||
|
||||
# NOTE: Stop hook is installed/removed per-agent in run_agent(), not globally.
|
||||
# This prevents the hook from killing the orchestrating CC session.
|
||||
# Install Stop hook once at startup. The AGENT_LOOP_ACTIVE env var guard ensures
|
||||
# it only fires for CC sessions spawned by this loop (not the user's other sessions).
|
||||
# Installing once avoids a race condition where per-iteration install_hooks writes
|
||||
# settings.local.json just before CC starts, and CC reads the old file.
|
||||
[ "$AUTO_HOOKS" = true ] && install_hooks
|
||||
trap cleanup EXIT INT TERM
|
||||
|
||||
check_archive
|
||||
@@ -200,12 +181,14 @@ if [ -f "$LOOP_DIR/init.sh" ]; then
|
||||
bash "$LOOP_DIR/init.sh"
|
||||
fi
|
||||
|
||||
# Ensure correct git branch
|
||||
# Verify we're on the expected branch (worktree should already be on it)
|
||||
BRANCH=$(prd_branch_name 2>/dev/null || echo "")
|
||||
if [ -n "$BRANCH" ]; then
|
||||
CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "")
|
||||
if [ "$CURRENT_BRANCH" != "$BRANCH" ]; then
|
||||
log "Switching to branch: $BRANCH"
|
||||
log "WARNING: Expected branch '$BRANCH' but on '$CURRENT_BRANCH'"
|
||||
log "If running in a worktree, the branch should already be checked out."
|
||||
log "Attempting to switch..."
|
||||
git checkout "$BRANCH" 2>/dev/null || \
|
||||
git checkout -b "$BRANCH" "origin/$BRANCH" 2>/dev/null || \
|
||||
git checkout -b "$BRANCH"
|
||||
@@ -215,14 +198,10 @@ fi
|
||||
# --- Agent runner ---
|
||||
# Runs a prompt through the selected AI tool.
|
||||
#
|
||||
# Interactive (default): Pipes prompt to claude WITHOUT --print.
|
||||
# This gives the full interactive CC UI — tool calls, file edits, etc.
|
||||
# A Stop hook (installed at startup) sends SIGINT to the loop when claude
|
||||
# finishes, which returns control to the while loop for the next iteration.
|
||||
# State is tracked via files (prd.json, .verdict), not stdout.
|
||||
#
|
||||
# Headless (LOOP_HEADLESS=true): Uses claude --print for CI/background.
|
||||
# Output captured to file for verdict parsing.
|
||||
# Pipes prompt to claude WITHOUT --print. This gives the full interactive
|
||||
# CC UI — tool calls, file edits, etc. A Stop hook sends SIGINT to the loop
|
||||
# when claude finishes, returning control to the while loop for the next
|
||||
# iteration. State is tracked via files (prd.json, .verdict), not stdout.
|
||||
run_agent() {
|
||||
local prompt="$1"
|
||||
local role="${2:-}"
|
||||
@@ -230,65 +209,26 @@ run_agent() {
|
||||
rm -f "$LOOP_DIR/.verdict"
|
||||
|
||||
local agent_exit=0
|
||||
if [ "${LOOP_HEADLESS:-false}" != "true" ]; then
|
||||
# --- Interactive mode (Ralph pattern) ---
|
||||
# Install Stop hook just before claude starts, remove after it exits.
|
||||
# This scopes the hook to only affect the loop's claude sessions.
|
||||
[ "$AUTO_HOOKS" = true ] && install_hooks
|
||||
|
||||
(
|
||||
case "$TOOL" in
|
||||
claude)
|
||||
printf '%s\n' "$prompt" | claude --dangerously-skip-permissions
|
||||
;;
|
||||
amp)
|
||||
printf '%s\n' "$prompt" | amp --dangerously-allow-all
|
||||
;;
|
||||
*)
|
||||
log "ERROR: Unknown tool '$TOOL'"
|
||||
exit 1
|
||||
;;
|
||||
esac
|
||||
) || agent_exit=$?
|
||||
(
|
||||
case "$TOOL" in
|
||||
claude)
|
||||
printf '%s\n' "$prompt" | claude --dangerously-skip-permissions
|
||||
;;
|
||||
amp)
|
||||
printf '%s\n' "$prompt" | amp --dangerously-allow-all
|
||||
;;
|
||||
*)
|
||||
log "ERROR: Unknown tool '$TOOL'"
|
||||
exit 1
|
||||
;;
|
||||
esac
|
||||
) || agent_exit=$?
|
||||
sleep 2 # Brief pause between sessions
|
||||
|
||||
[ "$AUTO_HOOKS" = true ] && remove_hooks
|
||||
sleep 2 # Brief pause between sessions
|
||||
|
||||
# Read verdict from file if evaluator wrote one
|
||||
if [ "$role" = "evaluator" ] && [ -f "$LOOP_DIR/.verdict" ]; then
|
||||
cat "$LOOP_DIR/.verdict"
|
||||
fi
|
||||
else
|
||||
# --- Headless mode ---
|
||||
local output_file
|
||||
output_file=$(mktemp)
|
||||
LOOP_AGENT_TMPFILE="$output_file"
|
||||
|
||||
(
|
||||
case "$TOOL" in
|
||||
claude)
|
||||
printf '%s\n' "$prompt" | timeout "${LOOP_AGENT_TIMEOUT:-600}" \
|
||||
claude --dangerously-skip-permissions --output-format text \
|
||||
--print > "$output_file" 2>&1
|
||||
;;
|
||||
amp)
|
||||
printf '%s\n' "$prompt" | timeout "${LOOP_AGENT_TIMEOUT:-600}" \
|
||||
amp --dangerously-allow-all > "$output_file" 2>&1
|
||||
;;
|
||||
*)
|
||||
log "ERROR: Unknown tool '$TOOL'"
|
||||
exit 1
|
||||
;;
|
||||
esac
|
||||
) || agent_exit=$?
|
||||
|
||||
if [ "$agent_exit" -ne 0 ] && [ ! -s "$output_file" ]; then
|
||||
log "WARNING: Agent exited with code $agent_exit and produced no output."
|
||||
fi
|
||||
|
||||
cat "$output_file"
|
||||
rm -f "$output_file"
|
||||
LOOP_AGENT_TMPFILE=""
|
||||
# Read verdict from file if evaluator wrote one
|
||||
if [ "$role" = "evaluator" ] && [ -f "$LOOP_DIR/.verdict" ]; then
|
||||
cat "$LOOP_DIR/.verdict"
|
||||
fi
|
||||
}
|
||||
|
||||
@@ -371,18 +311,7 @@ while [ "$ITERATION" -lt "$MAX_ITERATIONS" ]; do
|
||||
exit 0
|
||||
fi
|
||||
|
||||
if [ "${LOOP_HEADLESS:-false}" != "true" ]; then
|
||||
# Interactive: run directly, no capture. User sees full CC UI.
|
||||
run_agent "$GENERATOR_PROMPT" "generator"
|
||||
GENERATOR_OUTPUT=""
|
||||
else
|
||||
# Headless: capture output for parsing.
|
||||
GENERATOR_OUTPUT=$(run_agent "$GENERATOR_PROMPT" "generator")
|
||||
if [ -z "$GENERATOR_OUTPUT" ]; then
|
||||
log "WARNING: Generator produced empty output (timeout or crash). Skipping to next iteration."
|
||||
continue
|
||||
fi
|
||||
fi
|
||||
run_agent "$GENERATOR_PROMPT" "generator"
|
||||
|
||||
# --- Scope budget check ---
|
||||
# Verify the generator stayed within configured limits (files modified, lines written).
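The body of the scope budget check is not visible in this diff. As a rough illustration of what such a check might look like, the sketch below totals files and added lines from `git diff --numstat` style input and compares them with two limits. The function names and the limit values are assumptions, not the harness's actual implementation.

```bash
#!/bin/bash
# Hypothetical sketch: the real check is not shown in this diff.
# Reads numstat lines (added<TAB>deleted<TAB>path) on stdin and prints
# "<files> <added_lines>". Binary files report "-" and count as 0 lines.
summarize_numstat() {
  awk -F'\t' '
    NF >= 3 { files++; if ($1 ~ /^[0-9]+$/) added += $1 }
    END { printf "%d %d\n", files, added }
  '
}

# Succeeds when both counts read from stdin are within the given limits.
within_budget() {
  local max_files="$1" max_lines="$2" files added
  read -r files added
  [ "$files" -le "$max_files" ] && [ "$added" -le "$max_lines" ]
}

# Example with canned numstat output instead of a live repository.
printf '10\t2\tsrc/a.sh\n-\t-\tlogo.png\n5\t0\tREADME.md\n' \
  | summarize_numstat | within_budget 5 100 && echo "within budget"
```

Against a real repository the input would come from something like `git diff --numstat HEAD~1`, with the limits read from `config.json`; both of those choices are assumptions here.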
|
||||
@@ -419,22 +348,12 @@ while [ "$ITERATION" -lt "$MAX_ITERATIONS" ]; do
|
||||
|
||||
EVAL_PROMPT=$(build_prompt "evaluator" "$MODE")
|
||||
|
||||
if [ "${LOOP_HEADLESS:-false}" != "true" ]; then
|
||||
# Interactive: run directly, read verdict from file.
|
||||
run_agent "$EVAL_PROMPT" "evaluator"
|
||||
if [ -f "$LOOP_DIR/.verdict" ]; then
|
||||
EVAL_OUTPUT=$(cat "$LOOP_DIR/.verdict")
|
||||
else
|
||||
log "WARNING: No verdict file found. Treating as REJECT."
|
||||
EVAL_OUTPUT="<verdict>REJECT</verdict><rejection_reason>Evaluator produced no verdict file</rejection_reason>"
|
||||
fi
|
||||
run_agent "$EVAL_PROMPT" "evaluator"
|
||||
if [ -f "$LOOP_DIR/.verdict" ]; then
|
||||
EVAL_OUTPUT=$(cat "$LOOP_DIR/.verdict")
|
||||
else
|
||||
# Headless: capture output for parsing.
|
||||
EVAL_OUTPUT=$(run_agent "$EVAL_PROMPT" "evaluator")
|
||||
if [ -z "$EVAL_OUTPUT" ]; then
|
||||
log "WARNING: Evaluator produced empty output. Treating as REJECT."
|
||||
EVAL_OUTPUT="<verdict>REJECT</verdict><rejection_reason>Evaluator produced no output</rejection_reason>"
|
||||
fi
|
||||
log "WARNING: No verdict file found. Treating as REJECT."
|
||||
EVAL_OUTPUT="<verdict>REJECT</verdict><rejection_reason>Evaluator produced no verdict file</rejection_reason>"
|
||||
fi
|
||||
|
||||
VERDICT=$(parse_verdict "$EVAL_OUTPUT")
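`parse_verdict` is called here, but its body is not part of this diff. The sketch below is a hypothetical stand-in that assumes the verdict text carries a `<verdict>PASS</verdict>` or `<verdict>REJECT</verdict>` tag, as the fallback string in the diff suggests, and treats anything else as a rejection.

```bash
#!/bin/bash
# Hypothetical stand-in for parse_verdict; the real function is not shown.
# Prints PASS or REJECT. Anything unparseable is treated as REJECT.
extract_verdict() {
  local found
  found=$(printf '%s\n' "$1" \
    | sed -n 's|.*<verdict>\([A-Z]*\)</verdict>.*|\1|p' \
    | head -n 1)
  case "$found" in
    PASS|REJECT) printf '%s\n' "$found" ;;
    *)           printf 'REJECT\n' ;;
  esac
}

# Read the verdict file if present; a missing file counts as REJECT, as in loop.sh.
verdict_from_file() {
  local file="$1"
  if [ -f "$file" ]; then
    extract_verdict "$(cat "$file")"
  else
    printf 'REJECT\n'
  fi
}

extract_verdict '<verdict>REJECT</verdict><rejection_reason>Evaluator produced no verdict file</rejection_reason>'
```

Defaulting to REJECT on missing or malformed input follows the same fail-closed choice the loop makes when no `.verdict` file exists.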
|
||||
|
||||
@@ -10,7 +10,7 @@ You (Claude) have well-documented tendencies that make you a poor QA agent by de
|
||||
|
||||
**OVERRIDE ALL OF THESE.** Your value comes from finding problems. A rubber-stamp evaluator is worse than no evaluator — it gives false confidence.
|
||||
|
||||
**Rejection is normal and healthy.** Rejecting 30-50% of iterations is expected.
|
||||
**Rejection is normal and healthy.** Do not hesitate to reject when criteria aren't met.
|
||||
|
||||
## Your Target
|
||||
|
||||
@@ -27,6 +27,36 @@ Evaluate story **`{{CURRENT_STORY_ID}}`**.
|
||||
7. Run quality checks yourself (typecheck, tests, lint)
|
||||
8. **Actually run the code.** Use whatever tools are available. Code that looks correct but doesn't run is not complete.
|
||||
|
||||
## Calibration Examples
|
||||
|
||||
<example type="bad-evaluation">
|
||||
"The generator created the new module and updated the config. The code looks clean and follows the existing pattern. Tests were not run but the implementation appears correct. PASS."
|
||||
|
||||
Why this is wrong: "appears correct" is not verification. The evaluator didn't run tests, didn't check that the new module is actually imported and used, and didn't read the modified files in full. This is a rubber stamp.
|
||||
</example>
|
||||
|
||||
<example type="good-rejection">
|
||||
"Checked acceptance criteria. Criterion 3 says 'both files import the shared utility instead of defining their own'. Verified file A — correct. Checked file B — still defines a local copy at line 36 and does not import the shared one. Also: file B line 96 calls a function from a module whose import was removed during the refactoring — this will crash at runtime.
|
||||
|
||||
REJECT: File B still has local duplicate (criterion 3 not met) and missing import will cause runtime error."
|
||||
|
||||
Why this is good: Verified each criterion against actual code with file paths and line numbers. Caught a regression the generator introduced. Specific and actionable.
|
||||
</example>
|
||||
|
||||
<example type="good-pass">
|
||||
"Checked all 4 acceptance criteria:
|
||||
1. New validation logic is active — verified at config.py:23-28. ✓
|
||||
2. Invalid input returns the expected error — verified at config.py:26. ✓
|
||||
3. Old workaround removed — grep returns zero matches. ✓
|
||||
4. Existing behavior unchanged — logic only triggers on the new condition. ✓
|
||||
|
||||
Ran git diff: only 2 files modified, changes scoped to this story. No imports removed, no regressions in surrounding code.
|
||||
|
||||
PASS."
|
||||
|
||||
Why this is good: Each criterion checked against specific lines. Verified no collateral damage. Concise but thorough.
|
||||
</example>
|
||||
|
||||
## Verdict
|
||||
|
||||
Write your verdict to `{{LOOP_DIR}}/.verdict` AND include it in your response.
|
||||
|
||||
@@ -37,7 +37,7 @@ Claims Verified:

 ## Grading Criteria

-- **Accuracy**: How many claims are correct? (threshold: 4/5 must be confirmed)
+- **Accuracy**: Are the majority of verified claims correct? If more than one claim is incorrect, reject.
 - **Completeness**: Did it cover the important parts of the area?
 - **Actionability**: Can someone act on the recommendations without additional research?
@@ -9,8 +9,7 @@ You are evaluating a bug fix or tech debt reduction. The generator claims to hav
 - Would this fix survive edge cases?
 - Did the generator patch around the bug or fix the actual cause?

-2. **Verify a regression test exists:**
-- Is there a new or updated test?
+2. **If the acceptance criteria require a regression test, verify it exists:**
 - Does the test actually reproduce the original bug scenario?
 - Would the test fail if the fix were reverted?
@@ -27,7 +26,7 @@ You are evaluating a bug fix or tech debt reduction. The generator claims to hav
 ## Rejection Criteria (Fix-Specific)

 - Fix addresses symptom but not root cause
-- No regression test added
+- Acceptance criteria require a regression test but none was added
 - Existing tests fail after the fix
 - Unrelated changes included in the commit
 - Fix introduces a new bug or security issue
@@ -15,3 +15,6 @@ You are evaluating an implementation story. The generator claims to have built a
 - Tests exist but don't assert meaningful behavior
 - Passes typecheck only because types are overly loose
 - Code exists but doesn't actually run
+- Removed an import or variable during refactoring but it's still used elsewhere in the file
+- New instance of a shared resource (e.g., DB connection, rate limiter) instead of using the existing one
+- Internal error details (stack traces, exception messages) exposed in user-facing output instead of being logged server-side
@@ -1,24 +1,46 @@
|
||||
You are a Generator agent in an autonomous agent loop. Each iteration you complete ONE task, then stop. A fresh instance runs each iteration — you have no memory except what's in artifacts.
|
||||
|
||||
## Startup
|
||||
## Startup (follow this exact sequence before writing any code)
|
||||
|
||||
1. Read `.loop/progress.md` — check Codebase Patterns first, then recent log entries
|
||||
2. Read `.loop/prd.json` — find the highest-priority story where `passes: false`
|
||||
3. Read the sprint contract at `.loop/contracts/{story-id}.contract.md` (if it exists)
|
||||
4. Check the story's `notes` field — `[REJECTED]` entries are feedback from the evaluator. Address them.
|
||||
5. Run `git log --oneline -10` — understand what previous iterations changed
|
||||
6. If the project has tests or a dev server, run a quick smoke test to verify the codebase is healthy. If a previous iteration broke something, fix it before moving on.
|
||||
|
||||
Do NOT start implementation until steps 1-6 are complete.
|
||||
|
||||
## Rules
|
||||
|
||||
- **ONE story per iteration.** Do not attempt multiple stories.
|
||||
- **Read before writing.** Understand existing code before modifying.
|
||||
- **No placeholders.** Every implementation must be complete and functional.
|
||||
- **Run quality gates** before committing (typecheck, tests, lint — whatever the project uses).
|
||||
- **Run quality gates** before committing. Check for common tools (`npm test`, `pytest`, `cargo test`, `make test`, `go test ./...`) and run what's available. If no test tooling exists, verify manually.
|
||||
- **Commit** with message: `feat: [Story ID] - [Story Title]`
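The quality-gate rule above leaves tool discovery to the agent. A minimal sketch of one way to do it, assuming the test runner can be guessed from marker files in the project root (the helper name `detect_test_command` and the marker-file mapping are illustrative assumptions, not part of the harness):

```shell
# Hypothetical helper: guess a test command from marker files in a project directory.
# The mapping below is an assumption for illustration; adjust it per project.
detect_test_command() {
  dir="${1:-.}"
  if [ -f "$dir/package.json" ]; then
    echo "npm test"
  elif [ -f "$dir/Cargo.toml" ]; then
    echo "cargo test"
  elif [ -f "$dir/go.mod" ]; then
    echo "go test ./..."
  elif [ -f "$dir/pytest.ini" ] || [ -f "$dir/pyproject.toml" ]; then
    echo "pytest"
  elif [ -f "$dir/Makefile" ] && grep -q '^test:' "$dir/Makefile"; then
    echo "make test"
  else
    # No known tooling found: the caller falls back to manual verification.
    return 1
  fi
}
```

A caller might run `cmd=$(detect_test_command .) && sh -c "$cmd"` and treat a non-zero return as the "verify manually" case.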
|
||||
|
||||
## After Completing
|
||||
## If You Are Blocked
|
||||
|
||||
If you cannot complete the story (missing dependency, impossible as written, requires access you don't have), do NOT attempt a partial or broken implementation. Instead:
|
||||
1. Write a clear description of the blocker in the story's `notes` field in prd.json
|
||||
2. Leave `passes` as `false`
|
||||
3. Append the blocker to progress.md
|
||||
4. Stop — the loop will move on or escalate to a human
|
||||
|
||||
## Do Not Modify
|
||||
|
||||
- Other stories' `passes`, `notes`, or `acceptanceCriteria` fields — only modify the story you are working on
|
||||
- Sprint contracts in `.loop/contracts/`
|
||||
- `.loop/config.json`
|
||||
|
||||
## Before Marking Done
|
||||
|
||||
Go through each acceptance criterion in the story and verify your work satisfies it. Check the actual code, not your memory of what you wrote. If any criterion is not met, fix it before continuing. Do NOT set `passes: true` until every criterion is verified.
|
||||
|
||||
## After Verified
|
||||
|
||||
1. Update `.loop/prd.json` — set `passes: true` for the story
|
||||
2. Append a summary to `.loop/progress.md` — what was done, files changed, learnings
|
||||
2. Append a summary to `.loop/progress.md` — what was done and which files were changed
|
||||
3. Update Codebase Patterns in progress.md if you discovered a reusable pattern
|
||||
|
||||
## Completion Signal
|
||||
|
||||
@@ -8,7 +8,7 @@ You are fixing bugs or reducing tech debt from a prioritized list. Each story is
|
||||
2. Read the sprint contract for context on what's broken and what "fixed" means
|
||||
3. **Understand the root cause before changing anything.** Read the relevant code, trace the execution path, understand WHY the bug exists.
|
||||
4. Make the minimal change to fix the issue
|
||||
5. Write or update a test that would have caught this bug
|
||||
5. If the story's acceptance criteria require a regression test, write one
|
||||
6. Run quality gates
|
||||
7. Commit
|
||||
|
||||
@@ -16,7 +16,7 @@ You are fixing bugs or reducing tech debt from a prioritized list. Each story is
|
||||
|
||||
- **Fix only what the story describes.** Do not fix adjacent issues, even if you notice them. Note them in progress.md for future iterations.
|
||||
- **Minimal diff.** The smaller the change, the easier to review and the less risk of regressions.
|
||||
- **Add a regression test.** Every bug fix should include a test that reproduces the bug and verifies the fix. If no test framework exists, note this in progress.md.
|
||||
- **Add a regression test only if the acceptance criteria require it.** Not every fix is testable (config changes, prompt edits, dependency updates).
|
||||
- **Preserve behavior.** For tech debt refactors, the external behavior must not change. Only internal structure should improve.
|
||||
|
||||
## Git Workflow
|
||||
|
||||
@@ -16,7 +16,7 @@ You are building features from a PRD. Each story is a small, self-contained unit
|
||||
|
||||
- **Minimal changes only.** Do not refactor surrounding code or add features beyond scope.
|
||||
- **Follow the contract's Out of Scope section.**
|
||||
- **If tests don't exist yet,** write them as part of the story.
|
||||
- **Write tests only if the story's acceptance criteria require them.**
|
||||
- **If you need a dependency,** install it and note it in progress.md.
|
||||
|
||||
## Git
|
||||
|
||||
@@ -9,26 +9,30 @@ When breaking a feature into stories, think about:
|
||||
### Independence
|
||||
Each story should be independently deployable. After completing story N, the codebase should be in a valid, working state — even if the feature isn't fully built yet.
|
||||
|
||||
### Context Window Fit
|
||||
A story must fit in a single AI context window (~100K tokens). This means:
|
||||
- Reading relevant existing code
|
||||
- Understanding the task
|
||||
- Implementing the change
|
||||
- Writing tests
|
||||
- Running quality checks
|
||||
- Committing
|
||||
|
||||
Budget roughly:
|
||||
- 30% of context for reading/understanding
|
||||
- 40% for implementation
|
||||
- 20% for testing and quality
|
||||
- 10% for bookkeeping (prd.json, progress.md)
|
||||
### Scope
|
||||
A story must be completable in a single iteration. Keep each story focused — a handful of files modified, not a sweeping change across the whole codebase. If a story requires reading and modifying more than ~10 files, it's too big — split it.
|
||||
|
||||
### Failure Isolation
|
||||
If a story fails (evaluator rejects it), the next iteration should be able to retry it cleanly. Stories with too many moving parts are hard to retry because partial state is messy.
|
||||
|
||||
### Evaluability
|
||||
Every story must have criteria the evaluator can independently verify. "The code is clean" is not evaluable. "The function returns 404 when the user doesn't exist" is evaluable.
|
||||
Every story must have criteria the evaluator can independently verify by reading code, running commands, or testing behavior.
|
||||
|
||||
Good criteria are specific and checkable:
|
||||
- "Grep for 'HARDCODED_KEY' returns zero matches"
|
||||
- "The function returns 404 when the user doesn't exist"
|
||||
- "Running `npm test` passes with no failures"
|
||||
- "The config file contains entries for all three required env vars"
|
||||
|
||||
Bad criteria are vague with no way to check:
|
||||
- "The code is clean"
|
||||
- "Works correctly"
|
||||
- "Performance is improved"
|
||||
- "Error handling is robust"
|
||||
|
||||
For subjective work (design, UX, documentation), criteria should define what to evaluate and how to judge it — not just say "looks good":
|
||||
- "Design uses a consistent color palette and typography — no default library styles"
|
||||
- "A user can complete the primary action without guessing what to click"
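To illustrate what "independently verifiable" can look like in practice, here is a rough sketch that turns the grep-style criterion into a pass/fail check. The function name `criterion_no_matches` is hypothetical and only covers that one kind of criterion:

```shell
# Hypothetical check for a criterion like "Grep for 'HARDCODED_KEY' returns zero matches".
# Returns 0 only when the fixed string appears nowhere under the directory.
# Note: in this sketch a missing directory also counts as "zero matches".
criterion_no_matches() {
  pattern="$1"
  dir="$2"
  if grep -rqF -- "$pattern" "$dir"; then
    echo "FAIL: '$pattern' still present under $dir" >&2
    return 1
  fi
  echo "PASS: no matches for '$pattern'"
}
```

An evaluator could call `criterion_no_matches HARDCODED_KEY src` and reject the story on a non-zero exit status.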
|
||||
|
||||
## PRD Anti-Patterns
|
||||
|
||||
|
||||
101
setup.sh
@@ -10,10 +10,31 @@
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
# --- Parse arguments ---
|
||||
ACTION="scaffold"
|
||||
MODE="${1:-implement}"
|
||||
WORKTREE_PATH=""
|
||||
MAIN_LOOP_DIR=""
|
||||
|
||||
case "$MODE" in
|
||||
--update)
|
||||
ACTION="update"
|
||||
MODE="${2:-implement}"
|
||||
;;
|
||||
--init-worktree)
|
||||
ACTION="init-worktree"
|
||||
WORKTREE_PATH="${2:-}"
|
||||
MAIN_LOOP_DIR="${3:-}"
|
||||
if [ -z "$WORKTREE_PATH" ] || [ -z "$MAIN_LOOP_DIR" ]; then
|
||||
echo "[setup] ERROR: --init-worktree requires <worktree_path> <main_loop_dir>"
|
||||
echo "[setup] Usage: setup.sh --init-worktree /path/to/worktree /path/to/main/.loop"
|
||||
exit 1
|
||||
fi
|
||||
;;
|
||||
esac
|
||||
|
||||
# --- Validate mode ---
|
||||
if [[ ! "$MODE" =~ ^(implement|explore|fix)$ ]]; then
|
||||
if [ "$ACTION" = "scaffold" ] && [[ ! "$MODE" =~ ^(implement|explore|fix)$ ]]; then
|
||||
echo "[setup] ERROR: Invalid mode '$MODE'. Must be: implement, explore, fix"
|
||||
exit 1
|
||||
fi
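Because the script runs under `set -euo pipefail`, positional parameters that may be absent need a default expansion for the usage check to be reachable; otherwise the shell aborts with an "unbound variable" error first. A minimal sketch of the argument parsing as a function, assuming the same variable names as the hunk above (the wrapper `parse_args` itself is hypothetical):

```shell
set -u

# Hypothetical wrapper around the case statement from setup.sh.
# ${N:-} expands to an empty string when the argument is missing,
# so the -z checks below can run even under `set -u`.
parse_args() {
  ACTION="scaffold"
  MODE="${1:-implement}"
  WORKTREE_PATH=""
  MAIN_LOOP_DIR=""

  case "$MODE" in
    --update)
      ACTION="update"
      MODE="${2:-implement}"
      ;;
    --init-worktree)
      ACTION="init-worktree"
      WORKTREE_PATH="${2:-}"
      MAIN_LOOP_DIR="${3:-}"
      if [ -z "$WORKTREE_PATH" ] || [ -z "$MAIN_LOOP_DIR" ]; then
        echo "[setup] ERROR: --init-worktree requires <worktree_path> <main_loop_dir>" >&2
        return 1
      fi
      ;;
  esac
}
```

Calling `parse_args --init-worktree` with no further arguments then prints the usage error and returns 1 instead of aborting the shell.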
|
||||
@@ -45,6 +66,65 @@ fi
|
||||
|
||||
echo "[setup] Harness source: $HARNESS_SRC"
|
||||
|
||||
# Read plugin version from source
|
||||
PLUGIN_VERSION=""
|
||||
if [ -f "$HARNESS_SRC/.claude-plugin/plugin.json" ]; then
|
||||
if command -v jq &>/dev/null; then
|
||||
PLUGIN_VERSION=$(jq -r '.version // empty' "$HARNESS_SRC/.claude-plugin/plugin.json" 2>/dev/null)
|
||||
elif command -v python3 &>/dev/null; then
|
||||
PLUGIN_VERSION=$(python3 -c "import json; print(json.load(open('$HARNESS_SRC/.claude-plugin/plugin.json')).get('version',''),end='')" 2>/dev/null)
|
||||
fi
|
||||
fi
|
||||
|
||||
# --- Update-only mode: refresh harness files without touching run state ---
|
||||
if [ "$ACTION" = "update" ]; then
|
||||
LOOP_DIR="$(pwd)/.loop"
|
||||
if [ ! -d "$LOOP_DIR" ]; then
|
||||
echo "[setup] ERROR: No .loop/ directory found. Run setup first."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "[setup] Updating harness files..."
|
||||
cp -r "$HARNESS_SRC/prompts" "$LOOP_DIR/"
|
||||
cp -r "$HARNESS_SRC/templates" "$LOOP_DIR/"
|
||||
cp -r "$HARNESS_SRC/lib" "$LOOP_DIR/"
|
||||
cp "$HARNESS_SRC/loop.sh" "$LOOP_DIR/"
|
||||
chmod +x "$LOOP_DIR/loop.sh"
|
||||
|
||||
[ -n "$PLUGIN_VERSION" ] && echo "$PLUGIN_VERSION" > "$LOOP_DIR/.harness-version"
|
||||
|
||||
echo "[setup] Harness updated to ${PLUGIN_VERSION:-unknown}. Run state (prd.json, contracts, config.json) unchanged."
|
||||
exit 0
|
||||
fi
|
||||
|
||||
# --- Init-worktree mode: initialize .loop/ in a worktree from main's config ---
|
||||
if [ "$ACTION" = "init-worktree" ]; then
|
||||
LOOP_DIR="$WORKTREE_PATH/.loop"
|
||||
mkdir -p "$LOOP_DIR"
|
||||
|
||||
# Copy harness files from plugin source
|
||||
cp -r "$HARNESS_SRC/prompts" "$LOOP_DIR/"
|
||||
cp -r "$HARNESS_SRC/templates" "$LOOP_DIR/"
|
||||
cp -r "$HARNESS_SRC/lib" "$LOOP_DIR/"
|
||||
cp "$HARNESS_SRC/loop.sh" "$LOOP_DIR/"
|
||||
chmod +x "$LOOP_DIR/loop.sh"
|
||||
|
||||
# Copy project config and init from main's .loop/
|
||||
[ -f "$MAIN_LOOP_DIR/config.json" ] && cp "$MAIN_LOOP_DIR/config.json" "$LOOP_DIR/"
|
||||
[ -f "$MAIN_LOOP_DIR/init.sh" ] && cp "$MAIN_LOOP_DIR/init.sh" "$LOOP_DIR/"
|
||||
|
||||
# Stamp harness version
|
||||
[ -n "$PLUGIN_VERSION" ] && echo "$PLUGIN_VERSION" > "$LOOP_DIR/.harness-version"
|
||||
|
||||
# Create .gitignore for worktree's .loop/
|
||||
cat > "$LOOP_DIR/.gitignore" << 'GITIGNORE'
|
||||
*
|
||||
GITIGNORE
|
||||
|
||||
echo "[setup] Worktree .loop/ initialized at $LOOP_DIR"
|
||||
exit 0
|
||||
fi
|
||||
|
||||
# --- Ensure git repo exists ---
|
||||
if ! git rev-parse --git-dir &>/dev/null; then
|
||||
echo "[setup] No git repo found. Initializing..."
|
||||
@@ -57,9 +137,17 @@ PROJECT_ROOT="$(pwd)"
|
||||
LOOP_DIR="$PROJECT_ROOT/.loop"
|
||||
|
||||
if [ -d "$LOOP_DIR" ] && [ -f "$LOOP_DIR/prd.json" ]; then
|
||||
echo "[setup] .loop/ already exists with prd.json."
|
||||
echo "[setup] To re-initialize, delete .loop/ first: rm -rf .loop"
|
||||
exit 1
|
||||
echo "[setup] .loop/ already exists with prd.json — archiving previous run..."
|
||||
# Source state.sh (needed by archive.sh for story queries) and archive.sh
|
||||
LOOP_DIR="$LOOP_DIR" source "$LOOP_DIR/lib/state.sh" 2>/dev/null || true
|
||||
LOOP_DIR="$LOOP_DIR" source "$LOOP_DIR/lib/archive.sh" 2>/dev/null || true
|
||||
if type archive_and_reset &>/dev/null; then
|
||||
archive_and_reset "$LOOP_DIR"
|
||||
else
|
||||
# Fallback for old harness versions without archive_and_reset
|
||||
echo "[setup] WARNING: Could not archive (old harness version). To re-initialize, delete .loop/ first: rm -rf .loop"
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
mkdir -p "$LOOP_DIR"
|
||||
@@ -71,6 +159,9 @@ cp -r "$HARNESS_SRC/lib" "$LOOP_DIR/"
|
||||
cp "$HARNESS_SRC/loop.sh" "$LOOP_DIR/"
|
||||
chmod +x "$LOOP_DIR/loop.sh"
|
||||
|
||||
# Stamp harness version
|
||||
[ -n "$PLUGIN_VERSION" ] && echo "$PLUGIN_VERSION" > "$LOOP_DIR/.harness-version"
|
||||
|
||||
# Verify critical files
|
||||
for f in prompts/generator/_base.md prompts/evaluator/_base.md templates/progress.md.template lib/state.sh loop.sh; do
|
||||
if [ ! -f "$LOOP_DIR/$f" ]; then
|
||||
@@ -91,6 +182,8 @@ triage/
|
||||
archive/
|
||||
.archive-staging/
|
||||
.last-branch
|
||||
.harness-version
|
||||
.active-worktree
|
||||
.loop.lock
|
||||
GITIGNORE
|
||||
|
||||
|
||||
@@ -1,16 +1,18 @@
|
||||
---
|
||||
name: run
|
||||
description: "Agent Loop — single entry point. Scaffolds .loop/ if missing, generates stories if no prd.json, then launches autonomous execution in tmux."
|
||||
description: "Agent Loop — single entry point. Scaffolds .loop/ if missing, creates a worktree, generates stories, then launches autonomous execution in tmux."
|
||||
---
|
||||
|
||||
# /run — Agent Loop
|
||||
|
||||
Single entry point for the agent loop. Handles setup and planning interactively, then launches autonomous execution in a tmux session.
|
||||
Single entry point for the agent loop. Handles setup and planning interactively, then launches autonomous execution in a git worktree via tmux.
|
||||
|
||||
Each run gets its own worktree (isolated working directory on a feature branch). Multiple loops can run in parallel on different specs. Completed runs are archived to the main project's `.loop/archive/`.
|
||||
|
||||
## Usage
|
||||
|
||||
```
|
||||
/agent-loop:run # Full flow: setup → stories → launch
|
||||
/agent-loop:run # Full flow: setup → worktree → stories → launch
|
||||
/agent-loop:run --skip-eval # Skip evaluator pass
|
||||
```
|
||||
|
||||
@@ -20,9 +22,9 @@ Follow this sequence. Each phase checks what exists and skips if already done.
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Scaffold (if needed)
|
||||
## Phase 1: Scaffold Main .loop/ (if needed)
|
||||
|
||||
Check if `.loop/config.json` exists.
|
||||
Check if `.loop/config.json` exists in the current project root.
|
||||
|
||||
**If it does NOT exist**, run the setup script:
|
||||
|
||||
@@ -31,116 +33,202 @@ Ask the user: **Mode?** (a) Implement (b) Explore (c) Fix — default is Impleme
|
||||
Then run:
|
||||
|
||||
```bash
|
||||
bash "$(ls -d ~/.claude/plugins/cache/agent-loop/agent-loop/*/setup.sh 2>/dev/null | head -1)" <mode>
|
||||
bash "$(ls -d ~/.claude/plugins/cache/agent-loop/agent-loop/*/setup.sh 2>/dev/null | tail -1)" <mode>
|
||||
```
|
||||
|
||||
Show the output. If setup fails, stop.
|
||||
|
||||
**If it already exists**, skip to Phase 2.
|
||||
**If it already exists**, check if the harness files need updating. Compare the installed harness version against the plugin version:
|
||||
|
||||
```bash
|
||||
INSTALLED=$(cat .loop/.harness-version 2>/dev/null || echo "unknown")
|
||||
PLUGIN=$(jq -r '.version // empty' "$(ls -d ~/.claude/plugins/cache/agent-loop/agent-loop/*/.claude-plugin/plugin.json 2>/dev/null | tail -1)" 2>/dev/null || echo "unknown")
|
||||
echo "installed=$INSTALLED plugin=$PLUGIN"
|
||||
```
|
||||
|
||||
If the versions differ (or installed is "unknown"), update the harness files:
|
||||
|
||||
```bash
|
||||
bash "$(ls -d ~/.claude/plugins/cache/agent-loop/agent-loop/*/setup.sh 2>/dev/null | tail -1)" --update
|
||||
```
|
||||
|
||||
Tell the user: *"Updated harness files to v{version}."*
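One caveat when several plugin versions sit in the cache: `ls` typically sorts names lexically, so `0.8.0` sorts after `0.12.0` and `tail -1` may pick the older directory. A minimal sketch of a version-aware pick, assuming a `sort` that supports `-V` (GNU coreutils and recent BSD/macOS do); the helper name `latest_version` is hypothetical:

```shell
# Hypothetical helper: print the highest version from a newline-separated list on stdin.
# Relies on `sort -V` (version sort), which is an assumption about the local sort(1).
latest_version() {
  sort -V | tail -n 1
}
```

For example, `printf '0.8.0\n0.12.0\n' | latest_version` selects `0.12.0`, whereas a plain lexical sort would put `0.8.0` last.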
|
||||
|
||||
---
|
||||
|
||||
## Phase 2: Generate Stories (if needed)
|
||||
## Phase 2: Create Worktree and Generate Stories
|
||||
|
||||
Check if `.loop/prd.json` exists.
|
||||
### 2a. Find the spec
|
||||
|
||||
**If it does NOT exist**, generate it:
|
||||
Search for existing specs or plans:
|
||||
- `docs/superpowers/specs/*.md`
|
||||
- `docs/superpowers/plans/*.md`
|
||||
- `docs/specs/*.md`
|
||||
- `docs/plans/*.md`
|
||||
- `SPEC.md`, `PRD.md`, `DESIGN.md`, `PLAN.md` at project root
|
||||
- Any markdown file that looks like a feature spec or implementation plan
|
||||
|
||||
1. Search for existing specs or plans:
|
||||
- `docs/superpowers/specs/*.md`
|
||||
- `docs/superpowers/plans/*.md`
|
||||
- `docs/specs/*.md`
|
||||
- `docs/plans/*.md`
|
||||
- `SPEC.md`, `PRD.md`, `DESIGN.md`, `PLAN.md` at project root
|
||||
- Any markdown file that looks like a feature spec or implementation plan
|
||||
If found: "I found a spec at `{path}`. Using it to generate stories."
|
||||
|
||||
If found: "I found a spec at `{path}`. Using it to generate stories."
|
||||
If NOT found, stop and tell the user:
|
||||
|
||||
If NOT found, stop and tell the user:
|
||||
> **No spec or plan found.** Agent Loop decomposes existing plans into stories — it doesn't create plans from scratch.
|
||||
>
|
||||
> Create a plan first, then re-run `/agent-loop:run`:
|
||||
> - Describe your idea to Claude and ask it to write a spec
|
||||
> - Use `/plan` if available
|
||||
> - Or create a markdown file at `docs/specs/` or `SPEC.md`
|
||||
>
|
||||
> The plan should describe what to build, the tech stack, and key requirements.
|
||||
|
||||
> **No spec or plan found.** Agent Loop decomposes existing plans into stories — it doesn't create plans from scratch.
|
||||
>
|
||||
> Create a plan first, then re-run `/agent-loop:run`:
|
||||
> - Describe your idea to Claude and ask it to write a spec
|
||||
> - Use `/plan` if available
|
||||
> - Or create a markdown file at `docs/specs/` or `SPEC.md`
|
||||
>
|
||||
> The plan should describe what to build, the tech stack, and key requirements.
|
||||
**STOP here. Do NOT ask the user to describe the project in a few sentences. Do NOT proceed without a spec file.**
|
||||
|
||||
**STOP here. Do NOT ask the user to describe the project in a few sentences. Do NOT proceed without a spec file.**
|
||||
### 2b. Derive names and create worktree
|
||||
|
||||
2. Read the project root and tech stack info.
|
||||
Read the spec title or filename to derive a feature slug. Examples:
|
||||
- `SPEC.md` with title "# Enhanced Spikes Editor" → slug: `enhanced-spikes-editor`
|
||||
- `docs/specs/auth-system.md` → slug: `auth-system`
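The slug derivation in the examples above could be sketched as follows. The function name `slugify` and the exact normalization rules (lowercase, runs of non-alphanumerics collapsed to one hyphen) are assumptions; the source only gives the two examples:

```shell
# Hypothetical helper: turn a spec title or filename stem into a branch-safe slug.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}
```

For a filename-based spec, the stem can be passed in directly, for example `slugify "$(basename docs/specs/auth-system.md .md)"`.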
|
||||
|
||||
3. Dispatch the **agent-loop:planner** agent:
|
||||
Derive paths:
|
||||
|
||||
```bash
|
||||
PROJECT_DIR=$(basename "$(pwd)")
|
||||
FEATURE_SLUG="<derived-slug>"
|
||||
BRANCH_NAME="loop/${FEATURE_SLUG}"
|
||||
WORKTREE_PATH="../${PROJECT_DIR}--loop-${FEATURE_SLUG}"
|
||||
MAIN_LOOP_DIR="$(pwd)/.loop"
|
||||
```
|
||||
|
||||
Check if the worktree already exists:
|
||||
|
||||
```bash
|
||||
if [ -d "$WORKTREE_PATH" ]; then
|
||||
echo "WORKTREE_EXISTS"
|
||||
else
|
||||
echo "WORKTREE_NEW"
|
||||
fi
|
||||
```
|
||||
|
||||
**If worktree exists**, check its state:
|
||||
- Read `{WORKTREE_PATH}/.loop/prd.json` — are all stories passed?
|
||||
- If all passed: ask user — "Previous run on `{BRANCH_NAME}` is complete. Archive and start fresh, or resume?"
|
||||
- If in progress: ask user — "Run in progress on `{BRANCH_NAME}` ({passed}/{total}). Resume, or archive and start fresh?"
|
||||
- If user says resume: skip to Phase 3 (launch in existing worktree)
|
||||
- If user says archive/fresh: archive from worktree to main, remove worktree, then continue below
|
||||
|
||||
**If worktree is new**, create it:
|
||||
|
||||
```bash
|
||||
git worktree add "$WORKTREE_PATH" -b "$BRANCH_NAME"
|
||||
```
|
||||
|
||||
If the branch already exists (e.g., from a previous run):
|
||||
|
||||
```bash
|
||||
git worktree add "$WORKTREE_PATH" "$BRANCH_NAME"
|
||||
```
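The two `git worktree add` variants above can be folded into one step by checking for the branch first. A rough sketch, assuming it runs from inside the main repository; the wrapper name `add_loop_worktree` is hypothetical:

```shell
# Hypothetical wrapper: add a worktree, creating the branch only when it does not exist yet.
add_loop_worktree() {
  wt_path="$1"
  branch="$2"
  if git show-ref --verify --quiet "refs/heads/$branch"; then
    # Branch left over from a previous run: check it out into the new worktree.
    git worktree add "$wt_path" "$branch"
  else
    # Fresh run: create the branch from the current HEAD.
    git worktree add "$wt_path" -b "$branch"
  fi
}
```

Usage would be `add_loop_worktree "$WORKTREE_PATH" "$BRANCH_NAME"`.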
|
||||
|
||||
Initialize the worktree's `.loop/`:
|
||||
|
||||
```bash
|
||||
bash "$(ls -d ~/.claude/plugins/cache/agent-loop/agent-loop/*/setup.sh 2>/dev/null | tail -1)" --init-worktree "$WORKTREE_PATH" "$MAIN_LOOP_DIR"
|
||||
```
|
||||
|
||||
Initialize submodules if the project uses them:
|
||||
|
||||
```bash
|
||||
git -C "$WORKTREE_PATH" submodule update --init --recursive 2>/dev/null || true
|
||||
```
|
||||
|
||||
### 2c. Generate stories
|
||||
|
||||
Read the project root listing and tech stack info.
|
||||
|
||||
Dispatch the **agent-loop:planner** agent. Pass the **absolute worktree path** so the planner writes to the worktree's `.loop/`:
|
||||
|
||||
```
|
||||
Agent(
|
||||
subagent_type: "agent-loop:planner",
|
||||
prompt: "Generate prd.json and sprint contracts.\n\nMode: {mode}\nProject root: {path}\n\nSpec:\n{spec content}\n\nTech stack: {detected stack}",
|
||||
prompt: "Generate prd.json and sprint contracts.\n\nIMPORTANT: Write ALL files using absolute paths under: {WORKTREE_PATH}/.loop/\n- PRD: {WORKTREE_PATH}/.loop/prd.json\n- Contracts: {WORKTREE_PATH}/.loop/contracts/\n- Progress: {WORKTREE_PATH}/.loop/progress.md\n\nBranch name to use in prd.json: {BRANCH_NAME}\n\nMode: {mode}\nProject root: {WORKTREE_PATH}\n\nSpec:\n{spec content}\n\nTech stack: {detected stack}",
|
||||
description: "Planning: generate stories"
|
||||
)
|
||||
```
|
||||
|
||||
4. After the planner finishes, read `.loop/prd.json` and present:
|
||||
### 2d. Present stories
|
||||
|
||||
After the planner finishes, read `{WORKTREE_PATH}/.loop/prd.json` and present:
|
||||
|
||||
> **Stories generated — Review before running**
|
||||
>
|
||||
> Worktree: `{WORKTREE_PATH}` (branch: `{BRANCH_NAME}`)
|
||||
>
|
||||
> 1. US-001: {title}
|
||||
> 2. US-002: {title}
|
||||
> ...
|
||||
>
|
||||
> **Review:**
|
||||
> - `.loop/prd.json` — stories and acceptance criteria
|
||||
> - `.loop/contracts/` — done conditions per story
|
||||
> - `{WORKTREE_PATH}/.loop/prd.json` — stories and acceptance criteria
|
||||
> - `{WORKTREE_PATH}/.loop/contracts/` — done conditions per story
|
||||
>
|
||||
> Let me know if you want changes, or say **go** to start the loop.
|
||||
|
||||
5. **STOP and wait for the user.** Do NOT start the loop automatically. The user must say "go", "start", "run", "looks good", or similar before proceeding to Phase 3.
|
||||
|
||||
**If `prd.json` already exists**, skip to Phase 3.
|
||||
**STOP and wait for the user.** Do NOT start the loop automatically. The user must say "go", "start", "run", "looks good", or similar before proceeding to Phase 3.
|
||||
|
||||
---
|
||||
|
||||
## Phase 3: Validate and Launch
|
||||
|
||||
1. Read `.loop/prd.json` and verify:
|
||||
1. Read `{WORKTREE_PATH}/.loop/prd.json` and verify:
|
||||
- Has a `userStories` array (NOT `sprints`, `stories`, or `tasks`)
|
||||
- Each story has: `id`, `title`, `passes`, `priority`
|
||||
- If invalid, show the error and stop.
|
||||
|
||||
2. Read `.loop/config.json` for `mode`, `maxIterations`.
|
||||
2. Read `{WORKTREE_PATH}/.loop/config.json` for `mode`, `maxIterations`.
|
||||
|
||||
3. Verify `.loop/loop.sh` exists and is executable.
|
||||
3. Verify `{WORKTREE_PATH}/.loop/loop.sh` exists and is executable.
|
||||
|
||||
4. Parse arguments for any flags to pass through (e.g., `--skip-eval`).
|
||||
|
||||
5. Build the loop.sh command with any flags:
|
||||
5. Build the loop.sh command and derive a unique tmux session name:
|
||||
|
||||
```bash
|
||||
LOOP_CMD=".loop/loop.sh"
|
||||
LOOP_CMD="{WORKTREE_PATH}/.loop/loop.sh"
|
||||
# Add --skip-eval if requested
|
||||
# Add --max N if specified
|
||||
|
||||
# Derive tmux session name from worktree directory name
|
||||
WORKTREE_DIR=$(basename "$WORKTREE_PATH")
|
||||
SESSION_NAME="agent-loop-${WORKTREE_DIR}"
|
||||
```
|
||||
|
||||
6. Kill any existing agent-loop tmux session, then launch detached:
|
||||
6. Kill any existing tmux session with this name, then launch detached in the worktree:
|
||||
|
||||
```bash
|
||||
tmux kill-session -t agent-loop 2>/dev/null; tmux new-session -d -s agent-loop -c <project_root> "$LOOP_CMD"
|
||||
tmux kill-session -t "$SESSION_NAME" 2>/dev/null; tmux new-session -d -s "$SESSION_NAME" -c "$WORKTREE_PATH" "$LOOP_CMD"
|
||||
```
|
||||
|
||||
7. Start a **background watcher** that waits for the loop to finish. Use the Bash tool with `run_in_background: true`:
|
||||
7. Save the worktree path and session name for the completion handler. Write a tracking file in main's .loop/:
|
||||
|
||||
```bash
|
||||
while tmux has-session -t agent-loop 2>/dev/null; do sleep 10; done; echo "LOOP_COMPLETE"
|
||||
cat > .loop/.active-worktree << EOF
|
||||
WORKTREE_PATH={WORKTREE_PATH}
|
||||
SESSION_NAME={SESSION_NAME}
|
||||
BRANCH_NAME={BRANCH_NAME}
|
||||
MAIN_LOOP_DIR={MAIN_LOOP_DIR}
|
||||
EOF
|
||||
```
|
||||
|
||||
This runs silently. When the tmux session exits, Claude Code gets notified automatically.
|
||||
8. Start a **background watcher** that waits for the loop to finish. Use the Bash tool with `run_in_background: true`:
|
||||
|
||||
8. Tell the user:
|
||||
```bash
|
||||
while tmux has-session -t "$SESSION_NAME" 2>/dev/null; do sleep 10; done; echo "LOOP_COMPLETE"
|
||||
```
|
||||
|
||||
> **Loop launched.** Watch it live:
|
||||
9. Tell the user:
|
||||
|
||||
> **Loop launched** as tmux session `{SESSION_NAME}`. Watch it live:
|
||||
> ```
|
||||
> ! tmux attach -t agent-loop
|
||||
> ! tmux attach -t {SESSION_NAME}
|
||||
> ```
|
||||
> (Type the above — it opens the session right here in your terminal.)
|
||||
>
|
||||
@@ -149,6 +237,13 @@ This runs silently. When the tmux session exits, Claude Code gets notified autom
|
||||
> - Ask me "status" anytime and I'll check progress.
|
||||
>
|
||||
> I'll notify you when the loop finishes.
|
||||
>
|
||||
> When complete, merge with:
|
||||
> ```
|
||||
> git merge {BRANCH_NAME}
|
||||
> git worktree remove {WORKTREE_PATH}
|
||||
> git branch -d {BRANCH_NAME}
|
||||
> ```
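The `prd.json` checks from step 1 of Phase 3 could be scripted roughly like this. This is a sketch that assumes `jq` is installed; the function names `validate_prd` and `count_stories` are hypothetical and not part of the harness:

```shell
# Hypothetical validator: succeeds only if userStories is an array and every
# story is an object with the id, title, passes, and priority fields.
validate_prd() {
  jq -e '
    (.userStories | type == "array") and
    (.userStories | all(.[]?; type == "object"
        and has("id") and has("title") and has("passes") and has("priority")))
  ' "$1" >/dev/null
}

# Hypothetical counter: print "<passed>/<total>" for a valid prd.json.
count_stories() {
  jq -r '"\([.userStories[] | select(.passes == true)] | length)/\(.userStories | length)"' "$1"
}
```

A launcher might call `validate_prd "$WORKTREE_PATH/.loop/prd.json" || exit 1` before starting tmux, and reuse `count_stories` for status reports.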
|
||||
|
||||
---
|
||||
|
||||
@@ -156,18 +251,45 @@ This runs silently. When the tmux session exits, Claude Code gets notified autom
|
||||
|
||||
When you receive the background task notification (the watcher prints "LOOP_COMPLETE"), the loop has finished. Automatically:
|
||||
|
||||
1. Read `.loop/prd.json` — count passed/failed/blocked stories
|
||||
2. Read `.loop/progress.md` — show the latest session log entries
|
||||
3. Check `git log --oneline` for commits made during the run
|
||||
4. Present a summary:
|
||||
1. Read the tracking file to get paths:
|
||||
|
||||
```bash
|
||||
cat .loop/.active-worktree
|
||||
```
|
||||
|
||||
2. Read `{WORKTREE_PATH}/.loop/prd.json` — count passed/failed/blocked stories
|
||||
3. Read `{WORKTREE_PATH}/.loop/progress.md` — show the latest session log entries
|
||||
4. Check `git log --oneline` on the feature branch for commits made during the run
|
||||
|
||||
5. Archive the run to main's `.loop/archive/`:
|
||||
|
||||
```bash
|
||||
source .loop/lib/state.sh && source .loop/lib/archive.sh && archive_from_worktree "{WORKTREE_PATH}/.loop" "$(pwd)/.loop"
|
||||
```
|
||||
|
||||
6. Clean up the tracking file:
|
||||
|
||||
```bash
|
||||
rm -f .loop/.active-worktree
|
||||
```
|
||||
|
||||
7. Present a summary:
|
||||
|
||||
> **Loop Complete**
|
||||
> - Stories: {passed}/{total} complete, {blocked} blocked
|
||||
> - Iterations: {from progress.md}
|
||||
> - Commits: {list from git log}
|
||||
> - Archived to: `.loop/archive/{date}-{feature}/`
|
||||
>
|
||||
> {If any stories blocked: "Some stories need human review. Run /agent-loop:triage for details."}
|
||||
> {If all passed: "All stories complete. Review the code and test it."}
|
||||
>
|
||||
> **When ready to merge:**
|
||||
> ```
|
||||
> git merge {BRANCH_NAME}
|
||||
> git worktree remove {WORKTREE_PATH}
|
||||
> git branch -d {BRANCH_NAME}
|
||||
> ```
|
||||
|
||||
---
|
||||
|
||||
@@ -175,11 +297,23 @@ When you receive the background task notification (the watcher prints "LOOP_COMP
|
||||
|
||||
If the user asks about progress (e.g., "status", "how's it going"):
|
||||
|
||||
1. Read `.loop/prd.json` — count passed/failed/blocked
|
||||
2. Capture recent tmux output:
|
||||
1. Check for active worktree tracking:
|
||||
|
||||
```bash
|
||||
tmux capture-pane -t agent-loop -p | tail -20
|
||||
cat .loop/.active-worktree 2>/dev/null
|
||||
```
|
||||
|
||||
3. Report current status.
|
||||
If no tracking file, check for tmux sessions matching the pattern:
|
||||
|
||||
```bash
|
||||
tmux list-sessions 2>/dev/null | grep "^agent-loop-"
|
||||
```
|
||||
|
||||
2. Read `{WORKTREE_PATH}/.loop/prd.json` — count passed/failed/blocked
|
||||
3. Capture recent tmux output:
|
||||
|
||||
```bash
|
||||
tmux capture-pane -t "$SESSION_NAME" -p | tail -20
|
||||
```
|
||||
|
||||
4. Report current status.
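The status check above depends on values stored in `.loop/.active-worktree`. A minimal sketch of reading one value from that `KEY=VALUE` file without sourcing it as shell code; the helper name `tracking_value` is hypothetical:

```shell
# Hypothetical reader for the KEY=VALUE tracking file.
# Prints the value for the requested key, or returns 1 if the key is absent.
tracking_value() {
  file="$1"
  key="$2"
  while IFS='=' read -r k v; do
    if [ "$k" = "$key" ]; then
      printf '%s\n' "$v"
      return 0
    fi
  done < "$file"
  return 1
}
```

For example, `SESSION_NAME=$(tracking_value .loop/.active-worktree SESSION_NAME)` recovers the session name for the `tmux capture-pane` call.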
|
||||
|
||||
@@ -7,6 +7,8 @@ description: "Generate prd.json and sprint contracts by dispatching the planner
|
||||
|
||||
Dispatch the planner agent to decompose a spec into stories. The planner agent cannot write source code or run bash commands — it can only write to `.loop/`.
|
||||
|
||||
**Note:** In most cases, use `/agent-loop:run` instead — it handles worktree creation, story generation, and launching the loop in one flow. Use `/agent-loop:stories` only if you want to generate stories without launching the loop.
|
||||
|
||||
## Instructions
|
||||
|
||||
### 1. Check prerequisites
|
||||
@@ -40,9 +42,15 @@ Agent(
|
||||
)
|
||||
```
|
||||
|
||||
If a worktree path is known (e.g., passed as context), include it in the prompt:
|
||||
|
||||
```
|
||||
IMPORTANT: Write ALL files using absolute paths under: {WORKTREE_PATH}/.loop/
|
||||
```
|
||||
|
||||
### 5. Present results
|
||||
|
||||
After the planner finishes, read `.loop/prd.json` and show the user:
|
||||
After the planner finishes, read `.loop/prd.json` (or `{WORKTREE_PATH}/.loop/prd.json`) and show the user:
|
||||
|
||||
> **Plan Ready — Review Before Running**
|
||||
>
|
||||
|
||||