24 Commits

Author SHA1 Message Date
ecfbd0bb37 feat: worktree-based run isolation for parallel loops
Each /agent-loop:run now creates a git worktree for the feature branch
before generating stories. This provides full isolation:

- Multiple loops can run in parallel on different specs in the same project
- Main working directory stays on main, always available
- Each worktree has its own .loop/ state, tmux session, and branch
- Completed runs are archived to main's .loop/archive/ with runs.log

Changes:
- setup.sh: add --init-worktree mode for initializing worktree .loop/
- archive.sh: add archive_from_worktree() for cross-directory archiving
- loop.sh: replace branch checkout with validation (worktree is pre-checked-out)
- agents/planner.md: accept absolute path prefix for worktree .loop/ writes
- skills/run/SKILL.md: full rewrite — worktree creation in Phase 2, launch in
  Phase 3, archive on completion, .active-worktree tracking file
- skills/stories/SKILL.md: worktree-aware, defer to /run for full flow

Bump to 0.12.0.
2026-04-02 11:21:17 -04:00
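The isolation described in this commit can be pictured with a short sketch. It assumes one worktree per feature branch in a sibling directory. The `create_run_worktree` helper, the `-worktrees` directory layout, and the bare `mkdir` standing in for `setup.sh --init-worktree` are illustrative assumptions, not the plugin's actual code.

```bash
#!/bin/bash
# Hypothetical helper: create an isolated worktree for a feature branch.
# The "<project>-worktrees/<feature>" layout is an assumption for illustration.
create_run_worktree() {
  local project_root="$1" branch="$2"
  local wt_dir="${project_root}-worktrees/${branch##*/}"

  mkdir -p "$(dirname "$wt_dir")"
  # New branch from the current HEAD, checked out in its own directory;
  # the main working directory stays on whatever branch it already has.
  git -C "$project_root" worktree add -b "$branch" "$wt_dir" >/dev/null 2>&1 || return 1

  # Stand-in for `setup.sh --init-worktree`: each worktree gets its own .loop/.
  mkdir -p "$wt_dir/.loop"
  printf '%s\n' "$wt_dir"
}
```

Called as `create_run_worktree /path/to/project feature/auth`, this would print the new worktree path, so a caller could launch the loop from that directory while the main checkout stays untouched.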
344b179b4d feat: support parallel loops with per-project tmux session names
The tmux session name is now derived from the project directory name
(e.g., agent-loop-server, agent-loop-webapp). This allows running
multiple loops in parallel on different projects without collisions.

Previously hardcoded to "agent-loop", which meant launching a second
loop would kill the first project's tmux session.
2026-04-02 10:54:22 -04:00
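A minimal sketch of how a per-project session name might be derived. The `session_name_for` helper and the replacement of `.` and `:` are assumptions made for illustration; the commit only states that the name comes from the project directory name.

```bash
#!/bin/bash
# Hypothetical helper: derive a tmux session name from a project directory.
session_name_for() {
  local project_dir="$1"
  local base
  base=$(basename "$project_dir")
  # '.' and ':' are separators in tmux target syntax (session:window.pane),
  # so this sketch replaces them with underscores (an assumption).
  base=$(printf '%s' "$base" | tr '.:' '__')
  printf 'agent-loop-%s\n' "$base"
}

session_name_for "/home/me/code/server"
session_name_for "/home/me/code/webapp"
```

Two projects yield two distinct names, so starting a loop in one no longer replaces the session of the other.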
b516492a91 fix: install Stop hook once at loop startup, not per-iteration
Per-iteration install/remove had a race condition: settings.local.json
was written immediately before CC started, and CC could read the old
file (without the hook) on the first iteration.

Now the hook is installed once when loop.sh starts and removed on exit.
The AGENT_LOOP_ACTIVE env var guard ensures it only fires for CC sessions
spawned by the loop, so keeping it installed the whole time is safe.
2026-04-02 10:51:48 -04:00
a1a3dfbd63 fix: use env var instead of tmux check for Stop hook scoping
The tmux display-message approach had edge cases: it could succeed outside
tmux, fail on first iteration, or behave differently depending on tmux
socket state.

Replace with AGENT_LOOP_ACTIVE env var exported by loop.sh. CC sessions
spawned by the loop inherit it; interactive CC sessions don't. Simple,
no external dependencies, no race conditions.
2026-04-02 10:42:46 -04:00
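The guard can be exercised safely by swapping the `kill` for an `echo`. The `guard` variable and the echo stand-in below are for illustration only; the conditional has the same shape as the `HOOK_COMMAND` shown in the hooks diff.

```bash
#!/bin/bash
# Same guard shape as the Stop hook, with echo standing in for `kill -INT $PPID`.
guard='[ "${AGENT_LOOP_ACTIVE:-}" = "1" ] && echo "would signal parent" || true'

# Session spawned by the loop: the variable is inherited, so the hook fires.
loop_result=$(AGENT_LOOP_ACTIVE=1 sh -c "$guard")

# Interactive session: the variable is absent, so the hook is a no-op.
interactive_result=$(unset AGENT_LOOP_ACTIVE; sh -c "$guard")

printf 'loop: %s\ninteractive: %s\n' "$loop_result" "${interactive_result:-<nothing>}"
```

The trailing `|| true` keeps the hook's exit status at zero in the no-op case, so an interactive session never sees a failed hook.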
bab002b927 fix: prevent Stop hook from killing sessions outside tmux
tmux display-message succeeds even outside tmux by falling back to the
most recently created session (agent-loop). This caused the hook to
match and kill interactive CC sessions.

Fix: check $TMUX env var first — only set when actually inside tmux.
2026-04-02 09:14:43 -04:00
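A hedged reconstruction of the check this commit describes (the approach was replaced by the `AGENT_LOOP_ACTIVE` guard in a1a3dfbd63). The function name is hypothetical; `tmux display-message -p '#S'` is the standard way to print the current session name.

```bash
#!/bin/bash
# Hypothetical reconstruction of the guard; the function name is illustrative.
in_loop_tmux_session() {
  # $TMUX is only set for processes running inside a tmux client.
  [ -n "${TMUX:-}" ] || return 1
  # Only now is it meaningful to ask tmux which session we are in.
  [ "$(tmux display-message -p '#S' 2>/dev/null)" = "agent-loop" ]
}
```

Testing `$TMUX` first means `tmux display-message` is never consulted from a plain terminal, which is where it fell back to the most recently created session.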
71b00cf11f feat: auto-update harness files when plugin version changes
setup.sh now stamps .harness-version in .loop/ at scaffold time. On each
/agent-loop:run, Phase 1 compares the installed harness version against
the plugin version and auto-updates lib/, prompts/, and loop.sh if stale.
Run state (prd.json, contracts, config.json) is preserved.

Also adds setup.sh --update mode for refreshing harness files without
re-scaffolding. Bump to 0.10.0.
2026-04-02 09:02:41 -04:00
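The staleness check could look roughly like this. It is a sketch under assumptions: `harness_is_stale` is a made-up name, and `.harness-version` is assumed to hold a bare version string such as `0.9.0`.

```bash
#!/bin/bash
# Hypothetical check: is the scaffolded harness older than the plugin?
# Assumes .harness-version holds a bare version string such as "0.9.0".
harness_is_stale() {
  local loop_dir="$1" plugin_version="$2"
  local stamp="$loop_dir/.harness-version"

  # No stamp means the harness predates version stamping: treat it as stale.
  [ -f "$stamp" ] || return 0
  [ "$(cat "$stamp")" != "$plugin_version" ]
}
```

A caller would run something like `harness_is_stale .loop "0.10.0" && setup.sh --update`, leaving run state such as `prd.json` in place.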
1bd8004854 fix: scope Stop hook to agent-loop tmux session only
The Stop hook (kill -INT $PPID) was written to the project's
settings.local.json, causing ANY Claude Code session in the same project
to kill its parent shell on exit — not just the loop's sessions.

Now the hook checks tmux session name before firing: only CC sessions
inside the "agent-loop" tmux session trigger the kill. Other CC sessions
in the same project are unaffected.
2026-04-02 08:17:15 -04:00
ad58a49182 feat: auto-archive completed runs before starting new features
When /agent-loop:run detects a previous run with all stories passed (or the
feature branch deleted after merge), it archives the old artifacts and resets
.loop/ automatically — no more manual rm -rf .loop.

- Add archive_and_reset() for on-demand archiving from skills
- Add runs.log index tracking all archived runs
- Update /run and /stories skills to detect completed runs
- setup.sh archives instead of hard-failing when prd.json exists
- Bump version to 0.9.0
2026-04-02 07:40:07 -04:00
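Each `runs.log` entry is a single fixed-width line. The sketch below reuses the `printf` format from `append_runs_log` in the diff; the `format_run_line` wrapper and the sample values are made up for illustration.

```bash
#!/bin/bash
# Illustrative wrapper around the runs.log line format.
format_run_line() {
  local run_date="$1" branch="$2" passed="$3" total="$4" blocked="$5"
  # %-30s left-aligns the branch name in a 30-character column.
  printf '%s %-30s %s/%s passed %s blocked\n' \
    "$run_date" "${branch:-unknown}" "$passed" "$total" "$blocked"
}

format_run_line "2026-04-02" "feature/auth-system" 5 5 0
```

The fixed-width branch column keeps the pass counts aligned across entries, which makes the log easy to scan or to filter with `grep`.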
ce111b4cbe feat: add guidance for subjective acceptance criteria
Planner now has examples for design/UX criteria that are evaluable
without being purely binary. Prevents the planner from avoiding
qualitative criteria just because they aren't grep-checkable.
2026-03-28 12:59:42 -04:00
77fd9e0cd6 feat: add concrete examples of good vs bad acceptance criteria
Planner now sees specific examples of verifiable criteria (grep,
test commands, file checks) alongside vague anti-patterns. Drives
higher story quality which directly improves evaluator accuracy.
2026-03-28 12:56:53 -04:00
1efca3c185 feat: add blocker handling and artifact protection to generator
Generator now has explicit instructions for when it's stuck: write
the blocker to notes, leave passes as false, and stop. Also adds
a "Do Not Modify" section preventing changes to other stories,
contracts, or config.
2026-03-28 12:40:05 -04:00
e4df81fdac feat: add self-verification gate before generator marks story done
Generator must now verify each acceptance criterion against actual
code before setting passes: true. Acts as a first filter before
the evaluator runs, reducing false completions.
2026-03-28 12:36:24 -04:00
6833d94cf4 docs: mention using Claude or /plan to generate specs
2026-03-28 12:26:40 -04:00
c293f53d90 docs: make runtime verification claim accurate
Only claim what the evaluator actually does: runs tests, builds,
and checks for errors. Don't overstate MCP server discovery.
2026-03-28 12:20:31 -04:00
9fd428ac51 docs: replace specific MCP recommendations with general guidance
Avoid maintaining specific install commands that will go stale.
The evaluator uses whatever tools are available — let users
configure their own testing environment.
2026-03-28 12:19:50 -04:00
c46de6815c refactor: remove headless mode
Headless mode was half-built and untested. Agent-loop is a plugin
that runs interactively via tmux — there's no CI use case yet.
Removes --headless flag, timeout compatibility shim, output capture
logic, and LOOP_AGENT_TMPFILE handling. Cuts 82 lines from loop.sh.
2026-03-28 12:17:30 -04:00
b4d4e1952a docs: rewrite README for plugin-first install
- Remove manual install and install.sh references
- Add prerequisites section (tmux, jq/python3)
- Add step to write a spec before running
- Fix "PRD" → "spec" in modes table
- Add --headless to options list
- Update generator description with startup sequence
- Note evaluator calibration examples
2026-03-28 12:01:05 -04:00
60ce0fef54 fix: tighten vague language across all prompt files
- Remove blanket "write tests" instructions; tests only when
  acceptance criteria require them
- Replace arbitrary "30-50% rejection rate" with clear directive
- Replace "4/5 threshold" with "majority of claims" rule
- List concrete quality gate commands instead of "whatever project uses"
- Remove "learnings" from progress summary (too vague)
- Make error-leak pattern generic (not HTTP-specific)
- Align fix evaluator with updated test expectations
2026-03-28 11:58:13 -04:00
f26bdce534 fix: replace misleading context budget percentages with scope guidance
The planner prompt had vague context window budget percentages that
don't reflect how agents actually work. Replaced with concrete
scope guidance (keep stories to ~10 files) which aligns with the
existing scope budgets in config.json.
2026-03-28 11:49:04 -04:00
2dc291aac4 fix: make evaluator calibration examples project-agnostic
Replace ChaosRush-specific references with generic examples
that apply to any codebase.
2026-03-28 11:21:11 -04:00
1d059e218b feat: add few-shot calibration examples to evaluator prompt
Three examples showing bad rubber-stamp, good rejection, and good
pass patterns. Based on Anthropic's harness design recommendation
to calibrate evaluators with few-shot score breakdowns, and
informed by real failures observed in a production loop run.
2026-03-28 11:15:52 -04:00
80b0f0f4c1 feat: add regression patterns to evaluator implement prompt
Three new failure patterns: missing imports after refactoring,
orphaned resource instances, and error detail leakage. These were
observed in a real loop run where the evaluator missed them.
2026-03-28 10:57:44 -04:00
5e4ad3b12e feat: add smoke test step to generator startup sequence
Generator now runs a quick health check before implementing if the
project has tests or a dev server. Catches regressions from previous
iterations early instead of building on a broken foundation.
2026-03-27 21:09:36 -04:00
9a7fa3a1bd fix: enforce strict orientation sequence in generator prompt
Add git log step and explicit gate requiring all startup steps
complete before implementation begins. Based on Anthropic's
prompting guide recommendation for prescriptive session orientation.
2026-03-27 21:07:48 -04:00
19 changed files with 603 additions and 268 deletions


@@ -10,7 +10,7 @@
 "name": "agent-loop",
 "source": "./",
 "description": "Autonomous generator-evaluator agent loop for long-running coding tasks. Plan interactively, then execute with full visibility.",
-"version": "0.8.0",
+"version": "0.12.0",
 "author": {
 "name": "Sheldon"
 },


@@ -1,6 +1,6 @@
 {
 "name": "agent-loop",
-"version": "0.8.0",
+"version": "0.12.0",
 "description": "Autonomous generator-evaluator agent loop for long-running coding tasks. Run /agent-loop:run to start.",
 "author": {
 "name": "Sheldon"


@@ -8,10 +8,7 @@ A generator-evaluator loop runs fresh Claude Code sessions per iteration. Each i
## Install ## Install
### As a Claude Code Plugin (Recommended)
``` ```
/plugin marketplace add https://git.jagfly.com/sheldon/loop-loop.git
/plugin install agent-loop@agent-loop /plugin install agent-loop@agent-loop
``` ```
@@ -23,16 +20,18 @@ Then in any project:
That's it. The single command handles setup, planning, and execution. That's it. The single command handles setup, planning, and execution.
### Manual Install ## Prerequisites
```bash - [Claude Code](https://docs.anthropic.com/en/docs/claude-code) CLI installed
cp -r /path/to/loop-loop .loop - `tmux` available (used to run the loop in a detachable session)
``` - `jq` or `python3` (for JSON state management)
Then run `.loop/loop.sh` directly.
## How It Works ## How It Works
1. Write a spec describing what you want to build (`SPEC.md`, `docs/specs/*.md`, or similar). You can write it yourself, ask Claude to draft one, or use planning tools like `/plan`.
2. Run `/agent-loop:run` — it scaffolds `.loop/`, generates stories from your spec, and presents them for review
3. Say "go" — the loop launches in tmux and runs autonomously
``` ```
/agent-loop:run /agent-loop:run
├─ Phase 1: Scaffold .loop/ (if needed) ├─ Phase 1: Scaffold .loop/ (if needed)
@@ -50,7 +49,7 @@ Then run `.loop/loop.sh` directly.
| Mode | What it does | Git writes? | | Mode | What it does | Git writes? |
|------|-------------|-------------| |------|-------------|-------------|
| **implement** | Build features from a PRD | Yes | | **implement** | Build features from a spec | Yes |
| **explore** | Read-only codebase analysis | No | | **explore** | Read-only codebase analysis | No |
| **fix** | Targeted bug fixes / tech debt | Yes | | **fix** | Targeted bug fixes / tech debt | Yes |
@@ -73,28 +72,15 @@ Or ask Claude Code "status" — it reads `.loop/prd.json` and `.loop/progress.md
Each generator and evaluator run is a full Claude Code session saved to history. Use `claude -r` to resume any session and inspect what happened, debug a rejection, or continue from where it left off. Each generator and evaluator run is a full Claude Code session saved to history. Use `claude -r` to resume any session and inspect what happened, debug a rejection, or continue from where it left off.
## Headless Mode
For CI or background execution without the interactive UI:
```bash
.loop/loop.sh --headless [options]
--mode <implement|explore|fix> Operating mode
--max <N> Maximum iterations (default: 20)
--skip-eval Skip evaluator pass
--dry-run Print assembled prompts without running
```
## Architecture ## Architecture
### Generator ### Generator
Fresh Claude Code session each iteration. Reads `prd.json` to find the highest-priority incomplete story, reads the sprint contract, implements the story, runs quality gates, commits, and marks it done. Fresh Claude Code session each iteration. Follows a strict startup sequence: reads progress.md, finds the next story from prd.json, reads the sprint contract, checks for evaluator feedback, reviews git history, and runs a smoke test if available — all before writing any code. Then implements the story, runs quality gates, commits, and marks it done.
### Evaluator ### Evaluator
Separate fresh session after each generator pass. Skeptically verifies the work: checks acceptance criteria against actual code, runs tests and the application, and issues a `PASS` or `REJECT` verdict. Rejection sends the story back with specific feedback. Separate fresh session after each generator pass. Skeptically verifies the work: checks each acceptance criterion against actual code with file paths and line numbers, runs tests, and issues a `PASS` or `REJECT` verdict. Rejection sends the story back with specific feedback.
Evaluator skepticism is deliberately tuned — Claude's default tendency is to rationalize away issues. The evaluator prompt includes explicit bias correction. Evaluator skepticism is deliberately tuned — Claude's default tendency is to rationalize away issues. The evaluator prompt includes explicit bias correction and few-shot calibration examples.
### Sprint Contracts ### Sprint Contracts
Before the loop starts, the planner generates contracts for each story. These define "done" conditions that both generator and evaluator reference, eliminating ambiguity about whether work is complete. Before the loop starts, the planner generates contracts for each story. These define "done" conditions that both generator and evaluator reference, eliminating ambiguity about whether work is complete.
@@ -109,27 +95,9 @@ Before the loop starts, the planner generates contracts for each story. These de
| `config.json` | Harness configuration | | `config.json` | Harness configuration |
| Git commits | Code changes with story-tagged messages | | Git commits | Code changes with story-tagged messages |
## Optional: Runtime Testing Tools ## Runtime Verification
The evaluator verifies code actually runs, not just that it looks correct. It uses whatever tools are available. For richer verification, install these optional MCP servers: The evaluator doesn't just read diffs — it runs tests, builds the project, and checks for runtime errors using whatever tools the project already has (test runners, linters, build commands).
**Web projects (Playwright):**
```bash
claude mcp add playwright npx @playwright/mcp@latest --headless --browser=chromium
```
**iOS/Xcode projects (XcodeBuildMCP):**
```bash
brew tap getsentry/xcodebuildmcp && brew install xcodebuildmcp
claude mcp add xcodebuild -- xcodebuildmcp
```
**iOS Simulator interaction:**
```bash
claude mcp add ios-simulator -- npx -y ios-simulator-mcp
```
These are optional — the evaluator works without them but may miss runtime-only issues.
## Design Principles ## Design Principles


@@ -11,12 +11,16 @@ You are a planner agent for the agent loop harness. Your job is to decompose a f
 ## CONSTRAINTS
-- You may ONLY write files inside the `.loop/` directory
+- You may ONLY write files inside the `.loop/` directory (or the absolute loop directory path if one is provided)
 - You may NOT write any project source code (.js, .ts, .py, .go, .rs, .html, .css, etc.)
 - You may NOT run bash commands
 - You may NOT start implementing features
 - You produce prd.json and contracts, then STOP
+## OUTPUT DIRECTORY
+If the prompt specifies an absolute path for the loop directory (e.g., "Write all files to /path/to/worktree/.loop/"), use that absolute path for ALL file writes. Otherwise, use the relative `.loop/` path.
 ## YOUR TASK
 You will be given a feature spec or description. Decompose it into stories.


@@ -102,5 +102,5 @@ echo " Next steps (inside Claude Code, in any project):"
 echo ""
 echo " /agent-loop:run # Single command — setup, plan, and run"
 echo ""
-echo " Or run headless: .loop/loop.sh"
+echo " Or run directly: .loop/loop.sh"
 echo ""


@@ -1,11 +1,17 @@
#!/bin/bash #!/bin/bash
# Branch archiving — archives previous run artifacts when the branch changes. # Run archiving — preserves prd.json, progress.md, and contracts from completed runs.
# Preserves prd.json, progress.md, and contracts from the previous feature.
# #
# Design: At the end of each run, snapshot_for_archive saves current artifacts # Two archive triggers:
# to .archive-staging/. On the next run, if the branch changed, check_archive # 1. Branch change: check_archive detects a new branch and archives the staged snapshot.
# moves the snapshot to archive/ and cleans up. This avoids archiving the # 2. Completed run: archive_and_reset is called by the /run skill when prd.json shows
# WRONG artifacts (the new feature's) when prd.json has already been overwritten. # all stories passed (or the branch was deleted). This handles the common workflow
# of merging a feature branch back to main and starting a new feature.
#
# Archive layout:
# .loop/archive/
# runs.log — one-line-per-run index for quick lookup
# 2026-03-15-auth-system/ — full artifacts from that run
# prd.json, progress.md, contracts/
LAST_BRANCH_FILE="$LOOP_DIR/.last-branch" LAST_BRANCH_FILE="$LOOP_DIR/.last-branch"
STAGING_DIR="$LOOP_DIR/.archive-staging" STAGING_DIR="$LOOP_DIR/.archive-staging"
@@ -85,5 +91,139 @@ archive_run() {
rm -f "$LOOP_DIR/progress.md" rm -f "$LOOP_DIR/progress.md"
rm -rf "$LOOP_DIR/contracts" rm -rf "$LOOP_DIR/contracts"
append_runs_log "$branch_name" "$archive_dir"
log "Archived previous run to $archive_dir" log "Archived previous run to $archive_dir"
} }
# Archive current run artifacts and reset for a new run.
# Called by the /run skill when a completed run is detected (all stories passed
# or the feature branch no longer exists). Unlike check_archive (which reads from
# staging), this archives the LIVE artifacts directly since we know they belong
# to the completed run.
archive_and_reset() {
local loop_dir="${1:-.loop}"
local prd="$loop_dir/prd.json"
[ -f "$prd" ] || return 0
# Read branch name from current prd.json
local branch_name=""
if command -v jq &>/dev/null; then
branch_name=$(jq -r '.branchName // empty' "$prd" 2>/dev/null)
elif command -v python3 &>/dev/null; then
branch_name=$(LOOP_PRD="$prd" python3 -c "
import json, os
print(json.load(open(os.environ['LOOP_PRD'])).get('branchName', ''), end='')
" 2>/dev/null)
fi
local feature_name
feature_name=$(echo "${branch_name:-unknown}" | sed 's|.*/||')
local archive_dir="$loop_dir/archive/$(date +%Y-%m-%d)-${feature_name}"
mkdir -p "$archive_dir"
# Archive live artifacts
[ -f "$prd" ] && cp "$prd" "$archive_dir/"
[ -f "$loop_dir/progress.md" ] && cp "$loop_dir/progress.md" "$archive_dir/"
[ -f "$loop_dir/progress-archive.md" ] && cp "$loop_dir/progress-archive.md" "$archive_dir/"
[ -d "$loop_dir/contracts" ] && cp -r "$loop_dir/contracts" "$archive_dir/"
# Verify archive has content before deleting originals
if ! find "$archive_dir" -maxdepth 1 -type f | read -r; then
echo "[archive] WARNING: Archive directory is empty — skipping reset to prevent data loss"
return 1
fi
append_runs_log "$branch_name" "$archive_dir"
# Reset run-specific files (keep config.json, init.sh, harness files)
rm -f "$loop_dir/prd.json"
rm -f "$loop_dir/progress.md"
rm -f "$loop_dir/progress-archive.md"
rm -rf "$loop_dir/contracts"
rm -rf "$loop_dir/.archive-staging"
rm -f "$loop_dir/.last-branch"
rm -f "$loop_dir/.verdict"
echo "[archive] Archived completed run to $archive_dir"
echo "[archive] .loop/ reset — ready for new stories"
}
# Archive a completed run from a worktree back to the main project's .loop/archive/.
# Called by the /run skill's completion handler after the loop finishes in a worktree.
#
# Usage: archive_from_worktree <worktree_loop_dir> <main_loop_dir>
# worktree_loop_dir: absolute path to the worktree's .loop/ (source)
# main_loop_dir: absolute path to the main project's .loop/ (destination)
archive_from_worktree() {
local wt_loop_dir="$1"
local main_loop_dir="$2"
local wt_prd="$wt_loop_dir/prd.json"
[ -f "$wt_prd" ] || { echo "[archive] WARNING: No prd.json in worktree — nothing to archive"; return 1; }
# Read branch name from worktree's prd.json
local branch_name=""
if command -v jq &>/dev/null; then
branch_name=$(jq -r '.branchName // empty' "$wt_prd" 2>/dev/null)
elif command -v python3 &>/dev/null; then
branch_name=$(LOOP_PRD="$wt_prd" python3 -c "
import json, os
print(json.load(open(os.environ['LOOP_PRD'])).get('branchName', ''), end='')
" 2>/dev/null)
fi
local feature_name
feature_name=$(echo "${branch_name:-unknown}" | sed 's|.*/||')
local archive_dir="$main_loop_dir/archive/$(date +%Y-%m-%d)-${feature_name}"
mkdir -p "$archive_dir"
# Copy artifacts from worktree
[ -f "$wt_prd" ] && cp "$wt_prd" "$archive_dir/"
[ -f "$wt_loop_dir/progress.md" ] && cp "$wt_loop_dir/progress.md" "$archive_dir/"
[ -f "$wt_loop_dir/progress-archive.md" ] && cp "$wt_loop_dir/progress-archive.md" "$archive_dir/"
[ -d "$wt_loop_dir/contracts" ] && cp -r "$wt_loop_dir/contracts" "$archive_dir/"
[ -d "$wt_loop_dir/triage" ] && cp -r "$wt_loop_dir/triage" "$archive_dir/"
# Verify archive has content
if ! find "$archive_dir" -maxdepth 1 -type f | read -r; then
echo "[archive] WARNING: Archive directory is empty — copy may have failed"
return 1
fi
append_runs_log "$branch_name" "$archive_dir"
echo "[archive] Archived worktree run to $archive_dir"
}
# Append a one-line summary to the runs log.
append_runs_log() {
local branch_name="$1"
local archive_dir="$2"
local runs_log
runs_log="$(dirname "$archive_dir")/runs.log"
# Read story counts from the archived prd.json
local total=0 passed=0 blocked=0
local archived_prd="$archive_dir/prd.json"
if [ -f "$archived_prd" ]; then
if command -v jq &>/dev/null; then
total=$(jq '.userStories | length' "$archived_prd" 2>/dev/null || echo 0)
passed=$(jq '[.userStories[] | select(.passes == true)] | length' "$archived_prd" 2>/dev/null || echo 0)
blocked=$(jq '[.userStories[] | select(.blocked == true)] | length' "$archived_prd" 2>/dev/null || echo 0)
elif command -v python3 &>/dev/null; then
eval "$(LOOP_PRD="$archived_prd" python3 -c "
import json, os
d = json.load(open(os.environ['LOOP_PRD']))
s = d.get('userStories', [])
print(f'total={len(s)} passed={sum(1 for x in s if x.get(\"passes\"))} blocked={sum(1 for x in s if x.get(\"blocked\"))}')
" 2>/dev/null)" || true
fi
fi
printf '%s %-30s %s/%s passed %s blocked\n' \
"$(date +%Y-%m-%d)" "${branch_name:-unknown}" "$passed" "$total" "$blocked" \
>> "$runs_log"
}


@@ -7,9 +7,18 @@
# #
# Without this hook, claude would exit to an interactive prompt instead of # Without this hook, claude would exit to an interactive prompt instead of
# returning control to the loop script. # returning control to the loop script.
#
# IMPORTANT: The hook is scoped to only fire inside the agent-loop tmux session.
# Without this guard, ANY Claude Code session opened in the same project directory
# would pick up the hook and kill its own parent shell on exit.
SETTINGS_FILE="${PROJECT_ROOT}/.claude/settings.local.json" SETTINGS_FILE="${PROJECT_ROOT}/.claude/settings.local.json"
# The hook checks AGENT_LOOP_ACTIVE before killing. This env var is exported by
# loop.sh and inherited by CC sessions it spawns. Interactive CC sessions in the
# same project won't have it set, so the hook is a no-op for them.
HOOK_COMMAND='[ "${AGENT_LOOP_ACTIVE:-}" = "1" ] && kill -INT $PPID || true'
install_hooks() { install_hooks() {
if [ ! -f "$SETTINGS_FILE" ]; then if [ ! -f "$SETTINGS_FILE" ]; then
mkdir -p "$(dirname "$SETTINGS_FILE")" mkdir -p "$(dirname "$SETTINGS_FILE")"
@@ -17,14 +26,16 @@ install_hooks() {
 fi
 if command -v jq &>/dev/null; then
-jq '.hooks.Stop = [{"matcher": "", "hooks": [{"type": "command", "command": "kill -INT $PPID || true"}]}]' \
+jq --arg cmd "$HOOK_COMMAND" \
+'.hooks.Stop = [{"matcher": "", "hooks": [{"type": "command", "command": $cmd}]}]' \
 "$SETTINGS_FILE" > "${SETTINGS_FILE}.tmp" && mv "${SETTINGS_FILE}.tmp" "$SETTINGS_FILE"
 else
-LOOP_SETTINGS="$SETTINGS_FILE" python3 -c "
+LOOP_HOOK_CMD="$HOOK_COMMAND" LOOP_SETTINGS="$SETTINGS_FILE" python3 -c "
 import json, os
 p = os.environ['LOOP_SETTINGS']
+cmd = os.environ['LOOP_HOOK_CMD']
 s = json.load(open(p)) if os.path.exists(p) else {}
-s.setdefault('hooks', {})['Stop'] = [{'matcher': '', 'hooks': [{'type': 'command', 'command': 'kill -INT \$PPID || true'}]}]
+s.setdefault('hooks', {})['Stop'] = [{'matcher': '', 'hooks': [{'type': 'command', 'command': cmd}]}]
 json.dump(s, open(p, 'w'), indent=2)
 "
 fi

157
loop.sh

@@ -13,7 +13,6 @@
 # --no-hooks Don't install stop hooks
 # --dry-run Print assembled prompts without running agents
 # --resume Skip already-passed stories (explicit mode)
-# --replan (reserved — not yet implemented)
 #
 # Each iteration:
 # 1. Generator: picks highest-priority incomplete story, does the work
@@ -81,23 +80,6 @@ if ! command -v jq &>/dev/null && ! command -v python3 &>/dev/null; then
exit 1 exit 1
fi fi
# --- macOS timeout compatibility ---
# macOS doesn't have GNU timeout. Use gtimeout (from coreutils) or a perl fallback.
if ! command -v timeout &>/dev/null; then
if command -v gtimeout &>/dev/null; then
timeout() { gtimeout "$@"; }
else
# Perl-based fallback: runs command with alarm signal
timeout() {
local duration="$1"; shift
perl -e '
alarm shift @ARGV;
exec @ARGV;
' "$duration" "$@"
}
fi
fi
# --- Load config defaults --- # --- Load config defaults ---
CONFIG_FILE="$LOOP_DIR/config.json" CONFIG_FILE="$LOOP_DIR/config.json"
config_default() { get_config_value "$1" "$2"; } config_default() { get_config_value "$1" "$2"; }
@@ -122,15 +104,14 @@ while [[ $# -gt 0 ]]; do
--tool=*) TOOL="${1#*=}"; shift ;; --tool=*) TOOL="${1#*=}"; shift ;;
--no-hooks) AUTO_HOOKS=false; shift ;; --no-hooks) AUTO_HOOKS=false; shift ;;
--dry-run) DRY_RUN=true; shift ;; --dry-run) DRY_RUN=true; shift ;;
--headless) export LOOP_HEADLESS=true; shift ;;
--resume) RESUME=true; shift ;; --resume) RESUME=true; shift ;;
--replan) log "ERROR: --replan is not yet implemented. Use /agent-loop:stories interactively."; exit 1 ;;
[0-9]*) MAX_ITERATIONS="$1"; shift ;; [0-9]*) MAX_ITERATIONS="$1"; shift ;;
*) log "Unknown option: $1"; exit 1 ;; *) log "Unknown option: $1"; exit 1 ;;
esac esac
done done
export ITERATION=0 MAX_ITERATIONS MODE export ITERATION=0 MAX_ITERATIONS MODE
export AGENT_LOOP_ACTIVE=1
# --- Validate --- # --- Validate ---
if [[ ! "$MODE" =~ ^(implement|explore|fix)$ ]]; then if [[ ! "$MODE" =~ ^(implement|explore|fix)$ ]]; then
@@ -147,7 +128,6 @@ fi
cd "$PROJECT_ROOT" cd "$PROJECT_ROOT"
cleanup() { cleanup() {
[ -n "${LOOP_AGENT_TMPFILE:-}" ] && rm -f "$LOOP_AGENT_TMPFILE"
# Remove hooks in case we exit mid-agent (Ctrl+C during a claude session) # Remove hooks in case we exit mid-agent (Ctrl+C during a claude session)
[ "$AUTO_HOOKS" = true ] && remove_hooks 2>/dev/null [ "$AUTO_HOOKS" = true ] && remove_hooks 2>/dev/null
release_lock release_lock
@@ -178,10 +158,11 @@ finish() {
read -r -t 30 2>/dev/null || true read -r -t 30 2>/dev/null || true
exit "$exit_code" exit "$exit_code"
} }
LOOP_AGENT_TMPFILE="" # Install Stop hook once at startup. The AGENT_LOOP_ACTIVE env var guard ensures
# it only fires for CC sessions spawned by this loop (not the user's other sessions).
# NOTE: Stop hook is installed/removed per-agent in run_agent(), not globally. # Installing once avoids a race condition where per-iteration install_hooks writes
# This prevents the hook from killing the orchestrating CC session. # settings.local.json just before CC starts, and CC reads the old file.
[ "$AUTO_HOOKS" = true ] && install_hooks
trap cleanup EXIT INT TERM trap cleanup EXIT INT TERM
check_archive check_archive
@@ -200,12 +181,14 @@ if [ -f "$LOOP_DIR/init.sh" ]; then
bash "$LOOP_DIR/init.sh" bash "$LOOP_DIR/init.sh"
fi fi
# Ensure correct git branch # Verify we're on the expected branch (worktree should already be on it)
BRANCH=$(prd_branch_name 2>/dev/null || echo "") BRANCH=$(prd_branch_name 2>/dev/null || echo "")
if [ -n "$BRANCH" ]; then if [ -n "$BRANCH" ]; then
CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "") CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "")
if [ "$CURRENT_BRANCH" != "$BRANCH" ]; then if [ "$CURRENT_BRANCH" != "$BRANCH" ]; then
log "Switching to branch: $BRANCH" log "WARNING: Expected branch '$BRANCH' but on '$CURRENT_BRANCH'"
log "If running in a worktree, the branch should already be checked out."
log "Attempting to switch..."
git checkout "$BRANCH" 2>/dev/null || \ git checkout "$BRANCH" 2>/dev/null || \
git checkout -b "$BRANCH" "origin/$BRANCH" 2>/dev/null || \ git checkout -b "$BRANCH" "origin/$BRANCH" 2>/dev/null || \
git checkout -b "$BRANCH" git checkout -b "$BRANCH"
@@ -215,14 +198,10 @@ fi
# --- Agent runner --- # --- Agent runner ---
# Runs a prompt through the selected AI tool. # Runs a prompt through the selected AI tool.
# #
# Interactive (default): Pipes prompt to claude WITHOUT --print. # Pipes prompt to claude WITHOUT --print. This gives the full interactive
# This gives the full interactive CC UI — tool calls, file edits, etc. # CC UI — tool calls, file edits, etc. A Stop hook sends SIGINT to the loop
# A Stop hook (installed at startup) sends SIGINT to the loop when claude # when claude finishes, returning control to the while loop for the next
# finishes, which returns control to the while loop for the next iteration. # iteration. State is tracked via files (prd.json, .verdict), not stdout.
# State is tracked via files (prd.json, .verdict), not stdout.
#
# Headless (LOOP_HEADLESS=true): Uses claude --print for CI/background.
# Output captured to file for verdict parsing.
run_agent() { run_agent() {
local prompt="$1" local prompt="$1"
local role="${2:-}" local role="${2:-}"
@@ -230,65 +209,26 @@ run_agent() {
rm -f "$LOOP_DIR/.verdict" rm -f "$LOOP_DIR/.verdict"
local agent_exit=0 local agent_exit=0
if [ "${LOOP_HEADLESS:-false}" != "true" ]; then
# --- Interactive mode (Ralph pattern) ---
# Install Stop hook just before claude starts, remove after it exits.
# This scopes the hook to only affect the loop's claude sessions.
[ "$AUTO_HOOKS" = true ] && install_hooks
( (
case "$TOOL" in case "$TOOL" in
claude) claude)
printf '%s\n' "$prompt" | claude --dangerously-skip-permissions printf '%s\n' "$prompt" | claude --dangerously-skip-permissions
;; ;;
amp) amp)
printf '%s\n' "$prompt" | amp --dangerously-allow-all printf '%s\n' "$prompt" | amp --dangerously-allow-all
;; ;;
*) *)
log "ERROR: Unknown tool '$TOOL'" log "ERROR: Unknown tool '$TOOL'"
exit 1 exit 1
;; ;;
esac esac
) || agent_exit=$? ) || agent_exit=$?
sleep 2 # Brief pause between sessions
[ "$AUTO_HOOKS" = true ] && remove_hooks # Read verdict from file if evaluator wrote one
sleep 2 # Brief pause between sessions if [ "$role" = "evaluator" ] && [ -f "$LOOP_DIR/.verdict" ]; then
cat "$LOOP_DIR/.verdict"
# Read verdict from file if evaluator wrote one
if [ "$role" = "evaluator" ] && [ -f "$LOOP_DIR/.verdict" ]; then
cat "$LOOP_DIR/.verdict"
fi
else
# --- Headless mode ---
local output_file
output_file=$(mktemp)
LOOP_AGENT_TMPFILE="$output_file"
(
case "$TOOL" in
claude)
printf '%s\n' "$prompt" | timeout "${LOOP_AGENT_TIMEOUT:-600}" \
claude --dangerously-skip-permissions --output-format text \
--print > "$output_file" 2>&1
;;
amp)
printf '%s\n' "$prompt" | timeout "${LOOP_AGENT_TIMEOUT:-600}" \
amp --dangerously-allow-all > "$output_file" 2>&1
;;
*)
log "ERROR: Unknown tool '$TOOL'"
exit 1
;;
esac
) || agent_exit=$?
if [ "$agent_exit" -ne 0 ] && [ ! -s "$output_file" ]; then
log "WARNING: Agent exited with code $agent_exit and produced no output."
fi
cat "$output_file"
rm -f "$output_file"
LOOP_AGENT_TMPFILE=""
fi fi
} }
@@ -371,18 +311,7 @@ while [ "$ITERATION" -lt "$MAX_ITERATIONS" ]; do
exit 0 exit 0
fi fi
if [ "${LOOP_HEADLESS:-false}" != "true" ]; then run_agent "$GENERATOR_PROMPT" "generator"
# Interactive: run directly, no capture. User sees full CC UI.
run_agent "$GENERATOR_PROMPT" "generator"
GENERATOR_OUTPUT=""
else
# Headless: capture output for parsing.
GENERATOR_OUTPUT=$(run_agent "$GENERATOR_PROMPT" "generator")
if [ -z "$GENERATOR_OUTPUT" ]; then
log "WARNING: Generator produced empty output (timeout or crash). Skipping to next iteration."
continue
fi
fi
# --- Scope budget check --- # --- Scope budget check ---
# Verify the generator stayed within configured limits (files modified, lines written). # Verify the generator stayed within configured limits (files modified, lines written).
@@ -419,22 +348,12 @@ while [ "$ITERATION" -lt "$MAX_ITERATIONS" ]; do
EVAL_PROMPT=$(build_prompt "evaluator" "$MODE") EVAL_PROMPT=$(build_prompt "evaluator" "$MODE")
if [ "${LOOP_HEADLESS:-false}" != "true" ]; then run_agent "$EVAL_PROMPT" "evaluator"
# Interactive: run directly, read verdict from file. if [ -f "$LOOP_DIR/.verdict" ]; then
run_agent "$EVAL_PROMPT" "evaluator" EVAL_OUTPUT=$(cat "$LOOP_DIR/.verdict")
if [ -f "$LOOP_DIR/.verdict" ]; then
EVAL_OUTPUT=$(cat "$LOOP_DIR/.verdict")
else
log "WARNING: No verdict file found. Treating as REJECT."
EVAL_OUTPUT="<verdict>REJECT</verdict><rejection_reason>Evaluator produced no verdict file</rejection_reason>"
fi
else else
# Headless: capture output for parsing. log "WARNING: No verdict file found. Treating as REJECT."
EVAL_OUTPUT=$(run_agent "$EVAL_PROMPT" "evaluator") EVAL_OUTPUT="<verdict>REJECT</verdict><rejection_reason>Evaluator produced no verdict file</rejection_reason>"
if [ -z "$EVAL_OUTPUT" ]; then
log "WARNING: Evaluator produced empty output. Treating as REJECT."
EVAL_OUTPUT="<verdict>REJECT</verdict><rejection_reason>Evaluator produced no output</rejection_reason>"
fi
fi fi
VERDICT=$(parse_verdict "$EVAL_OUTPUT") VERDICT=$(parse_verdict "$EVAL_OUTPUT")
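The file-based verdict handoff above can be sketched in isolation. This is a minimal illustration, not the harness code: `read_verdict_or_reject` and `parse_verdict_sketch` are hypothetical names, the real `parse_verdict` is not shown in this diff, and the `PASS` value is assumed from the evaluator examples.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers mirroring the verdict-file fallback in the loop.

# Print the verdict file if it exists and is non-empty, otherwise a synthetic REJECT.
read_verdict_or_reject() {
  local verdict_file="$1"
  if [ -s "$verdict_file" ]; then
    cat "$verdict_file"
  else
    printf '%s\n' '<verdict>REJECT</verdict><rejection_reason>Evaluator produced no verdict file</rejection_reason>'
  fi
}

# Extract the upper-case token between <verdict> and </verdict> (first match only).
# Prints nothing when no verdict tag is present.
parse_verdict_sketch() {
  printf '%s\n' "$1" | sed -n 's/.*<verdict>\([A-Z]*\)<\/verdict>.*/\1/p' | head -n 1
}
```

Treating a missing file as a rejection keeps the loop fail-closed: an evaluator that crashes or forgets to write `.verdict` cannot accidentally approve a story.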


@@ -10,7 +10,7 @@ You (Claude) have well-documented tendencies that make you a poor QA agent by de
**OVERRIDE ALL OF THESE.** Your value comes from finding problems. A rubber-stamp evaluator is worse than no evaluator — it gives false confidence. **OVERRIDE ALL OF THESE.** Your value comes from finding problems. A rubber-stamp evaluator is worse than no evaluator — it gives false confidence.
**Rejection is normal and healthy.** Rejecting 30-50% of iterations is expected. **Rejection is normal and healthy.** Do not hesitate to reject when criteria aren't met.
## Your Target ## Your Target
@@ -27,6 +27,36 @@ Evaluate story **`{{CURRENT_STORY_ID}}`**.
7. Run quality checks yourself (typecheck, tests, lint) 7. Run quality checks yourself (typecheck, tests, lint)
8. **Actually run the code.** Use whatever tools are available. Code that looks correct but doesn't run is not complete. 8. **Actually run the code.** Use whatever tools are available. Code that looks correct but doesn't run is not complete.
## Calibration Examples
<example type="bad-evaluation">
"The generator created the new module and updated the config. The code looks clean and follows the existing pattern. Tests were not run but the implementation appears correct. PASS."
Why this is wrong: "appears correct" is not verification. The evaluator didn't run tests, didn't check that the new module is actually imported and used, and didn't read the modified files in full. This is a rubber stamp.
</example>
<example type="good-rejection">
"Checked acceptance criteria. Criterion 3 says 'both files import the shared utility instead of defining their own'. Verified file A — correct. Checked file B — still defines a local copy at line 36 and does not import the shared one. Also: file B line 96 calls a function from a module whose import was removed during the refactoring — this will crash at runtime.
REJECT: File B still has local duplicate (criterion 3 not met) and missing import will cause runtime error."
Why this is good: Verified each criterion against actual code with file paths and line numbers. Caught a regression the generator introduced. Specific and actionable.
</example>
<example type="good-pass">
"Checked all 4 acceptance criteria:
1. New validation logic is active — verified at config.py:23-28. ✓
2. Invalid input returns the expected error — verified at config.py:26. ✓
3. Old workaround removed — grep returns zero matches. ✓
4. Existing behavior unchanged — logic only triggers on the new condition. ✓
Ran git diff: only 2 files modified, changes scoped to this story. No imports removed, no regressions in surrounding code.
PASS."
Why this is good: Each criterion checked against specific lines. Verified no collateral damage. Concise but thorough.
</example>
## Verdict ## Verdict
Write your verdict to `{{LOOP_DIR}}/.verdict` AND include it in your response. Write your verdict to `{{LOOP_DIR}}/.verdict` AND include it in your response.


@@ -37,7 +37,7 @@ Claims Verified:
## Grading Criteria ## Grading Criteria
- **Accuracy**: How many claims are correct? (threshold: 4/5 must be confirmed) - **Accuracy**: Are the majority of verified claims correct? If more than one claim is incorrect, reject.
- **Completeness**: Did it cover the important parts of the area? - **Completeness**: Did it cover the important parts of the area?
- **Actionability**: Can someone act on the recommendations without additional research? - **Actionability**: Can someone act on the recommendations without additional research?


@@ -9,8 +9,7 @@ You are evaluating a bug fix or tech debt reduction. The generator claims to hav
- Would this fix survive edge cases? - Would this fix survive edge cases?
- Did the generator patch around the bug or fix the actual cause? - Did the generator patch around the bug or fix the actual cause?
2. **Verify a regression test exists:** 2. **If the acceptance criteria require a regression test, verify it exists:**
- Is there a new or updated test?
- Does the test actually reproduce the original bug scenario? - Does the test actually reproduce the original bug scenario?
- Would the test fail if the fix were reverted? - Would the test fail if the fix were reverted?
@@ -27,7 +26,7 @@ You are evaluating a bug fix or tech debt reduction. The generator claims to hav
## Rejection Criteria (Fix-Specific) ## Rejection Criteria (Fix-Specific)
- Fix addresses symptom but not root cause - Fix addresses symptom but not root cause
- No regression test added - Acceptance criteria require a regression test but none was added
- Existing tests fail after the fix - Existing tests fail after the fix
- Unrelated changes included in the commit - Unrelated changes included in the commit
- Fix introduces a new bug or security issue - Fix introduces a new bug or security issue


@@ -15,3 +15,6 @@ You are evaluating an implementation story. The generator claims to have built a
- Tests exist but don't assert meaningful behavior - Tests exist but don't assert meaningful behavior
- Passes typecheck only because types are overly loose - Passes typecheck only because types are overly loose
- Code exists but doesn't actually run - Code exists but doesn't actually run
- Removed an import or variable during refactoring but it's still used elsewhere in the file
- New instance of a shared resource (e.g., DB connection, rate limiter) instead of using the existing one
- Internal error details (stack traces, exception messages) exposed in user-facing output instead of being logged server-side


@@ -1,24 +1,46 @@
You are a Generator agent in an autonomous agent loop. Each iteration you complete ONE task, then stop. A fresh instance runs each iteration — you have no memory except what's in artifacts. You are a Generator agent in an autonomous agent loop. Each iteration you complete ONE task, then stop. A fresh instance runs each iteration — you have no memory except what's in artifacts.
## Startup ## Startup (follow this exact sequence before writing any code)
1. Read `.loop/progress.md` — check Codebase Patterns first, then recent log entries 1. Read `.loop/progress.md` — check Codebase Patterns first, then recent log entries
2. Read `.loop/prd.json` — find the highest-priority story where `passes: false` 2. Read `.loop/prd.json` — find the highest-priority story where `passes: false`
3. Read the sprint contract at `.loop/contracts/{story-id}.contract.md` (if it exists) 3. Read the sprint contract at `.loop/contracts/{story-id}.contract.md` (if it exists)
4. Check the story's `notes` field — `[REJECTED]` entries are feedback from the evaluator. Address them. 4. Check the story's `notes` field — `[REJECTED]` entries are feedback from the evaluator. Address them.
5. Run `git log --oneline -10` — understand what previous iterations changed
6. If the project has tests or a dev server, run a quick smoke test to verify the codebase is healthy. If a previous iteration broke something, fix it before moving on.
Do NOT start implementation until steps 1-6 are complete.
## Rules ## Rules
- **ONE story per iteration.** Do not attempt multiple stories. - **ONE story per iteration.** Do not attempt multiple stories.
- **Read before writing.** Understand existing code before modifying. - **Read before writing.** Understand existing code before modifying.
- **No placeholders.** Every implementation must be complete and functional. - **No placeholders.** Every implementation must be complete and functional.
- **Run quality gates** before committing (typecheck, tests, lint — whatever the project uses). - **Run quality gates** before committing. Check for common tools (`npm test`, `pytest`, `cargo test`, `make test`, `go test ./...`) and run what's available. If no test tooling exists, verify manually.
- **Commit** with message: `feat: [Story ID] - [Story Title]` - **Commit** with message: `feat: [Story ID] - [Story Title]`
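The "check for common tools" rule could be approximated by probing for marker files. The sketch below is an assumption about how a generator might do this, not part of the prompt: `detect_test_cmd` is a hypothetical helper, and the marker-file-to-command mapping is illustrative.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helper that prints the first quality-gate command
# whose marker file exists in the given directory, or nothing if none matches.
detect_test_cmd() {
  local dir="${1:-.}"
  if [ -f "$dir/package.json" ]; then
    echo "npm test"
  elif [ -f "$dir/Cargo.toml" ]; then
    echo "cargo test"
  elif [ -f "$dir/go.mod" ]; then
    echo "go test ./..."
  elif [ -f "$dir/pyproject.toml" ] || [ -f "$dir/pytest.ini" ]; then
    echo "pytest"
  elif [ -f "$dir/Makefile" ] && grep -q '^test:' "$dir/Makefile"; then
    echo "make test"
  fi
}
```

An empty result corresponds to the "verify manually" branch of the rule.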
## After Completing ## If You Are Blocked
If you cannot complete the story (missing dependency, impossible as written, requires access you don't have), do NOT attempt a partial or broken implementation. Instead:
1. Write a clear description of the blocker in the story's `notes` field in prd.json
2. Leave `passes` as `false`
3. Append the blocker to progress.md
4. Stop — the loop will move on or escalate to a human
## Do Not Modify
- Other stories' `passes`, `notes`, or `acceptanceCriteria` fields — only modify the story you are working on
- Sprint contracts in `.loop/contracts/`
- `.loop/config.json`
## Before Marking Done
Go through each acceptance criterion in the story and verify your work satisfies it. Check the actual code, not your memory of what you wrote. If any criterion is not met, fix it before continuing. Do NOT set `passes: true` until every criterion is verified.
## After Verified
1. Update `.loop/prd.json` — set `passes: true` for the story 1. Update `.loop/prd.json` — set `passes: true` for the story
2. Append a summary to `.loop/progress.md` — what was done, files changed, learnings 2. Append a summary to `.loop/progress.md` — what was done and which files were changed
3. Update Codebase Patterns in progress.md if you discovered a reusable pattern 3. Update Codebase Patterns in progress.md if you discovered a reusable pattern
## Completion Signal ## Completion Signal


@@ -8,7 +8,7 @@ You are fixing bugs or reducing tech debt from a prioritized list. Each story is
2. Read the sprint contract for context on what's broken and what "fixed" means 2. Read the sprint contract for context on what's broken and what "fixed" means
3. **Understand the root cause before changing anything.** Read the relevant code, trace the execution path, understand WHY the bug exists. 3. **Understand the root cause before changing anything.** Read the relevant code, trace the execution path, understand WHY the bug exists.
4. Make the minimal change to fix the issue 4. Make the minimal change to fix the issue
5. Write or update a test that would have caught this bug 5. If the story's acceptance criteria require a regression test, write one
6. Run quality gates 6. Run quality gates
7. Commit 7. Commit
@@ -16,7 +16,7 @@ You are fixing bugs or reducing tech debt from a prioritized list. Each story is
- **Fix only what the story describes.** Do not fix adjacent issues, even if you notice them. Note them in progress.md for future iterations. - **Fix only what the story describes.** Do not fix adjacent issues, even if you notice them. Note them in progress.md for future iterations.
- **Minimal diff.** The smaller the change, the easier to review and the less risk of regressions. - **Minimal diff.** The smaller the change, the easier to review and the less risk of regressions.
- **Add a regression test.** Every bug fix should include a test that reproduces the bug and verifies the fix. If no test framework exists, note this in progress.md. - **Add a regression test only if the acceptance criteria require it.** Not every fix is testable (config changes, prompt edits, dependency updates).
- **Preserve behavior.** For tech debt refactors, the external behavior must not change. Only internal structure should improve. - **Preserve behavior.** For tech debt refactors, the external behavior must not change. Only internal structure should improve.
## Git Workflow ## Git Workflow


@@ -16,7 +16,7 @@ You are building features from a PRD. Each story is a small, self-contained unit
- **Minimal changes only.** Do not refactor surrounding code or add features beyond scope. - **Minimal changes only.** Do not refactor surrounding code or add features beyond scope.
- **Follow the contract's Out of Scope section.** - **Follow the contract's Out of Scope section.**
- **If tests don't exist yet,** write them as part of the story. - **Write tests only if the story's acceptance criteria require them.**
- **If you need a dependency,** install it and note it in progress.md. - **If you need a dependency,** install it and note it in progress.md.
## Git ## Git


@@ -9,26 +9,30 @@ When breaking a feature into stories, think about:
### Independence ### Independence
Each story should be independently deployable. After completing story N, the codebase should be in a valid, working state — even if the feature isn't fully built yet. Each story should be independently deployable. After completing story N, the codebase should be in a valid, working state — even if the feature isn't fully built yet.
### Context Window Fit ### Scope
A story must fit in a single AI context window (~100K tokens). This means: A story must be completable in a single iteration. Keep each story focused — a handful of files modified, not a sweeping change across the whole codebase. If a story requires reading and modifying more than ~10 files, it's too big — split it.
- Reading relevant existing code
- Understanding the task
- Implementing the change
- Writing tests
- Running quality checks
- Committing
Budget roughly:
- 30% of context for reading/understanding
- 40% for implementation
- 20% for testing and quality
- 10% for bookkeeping (prd.json, progress.md)
### Failure Isolation ### Failure Isolation
If a story fails (evaluator rejects it), the next iteration should be able to retry it cleanly. Stories with too many moving parts are hard to retry because partial state is messy. If a story fails (evaluator rejects it), the next iteration should be able to retry it cleanly. Stories with too many moving parts are hard to retry because partial state is messy.
### Evaluability ### Evaluability
Every story must have criteria the evaluator can independently verify. "The code is clean" is not evaluable. "The function returns 404 when the user doesn't exist" is evaluable. Every story must have criteria the evaluator can independently verify by reading code, running commands, or testing behavior.
Good criteria are specific and checkable:
- "Grep for 'HARDCODED_KEY' returns zero matches"
- "The function returns 404 when the user doesn't exist"
- "Running `npm test` passes with no failures"
- "The config file contains entries for all three required env vars"
Bad criteria are vague with no way to check:
- "The code is clean"
- "Works correctly"
- "Performance is improved"
- "Error handling is robust"
For subjective work (design, UX, documentation), criteria should define what to evaluate and how to judge it — not just say "looks good":
- "Design uses a consistent color palette and typography — no default library styles"
- "A user can complete the primary action without guessing what to click"
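A checkable criterion is one that maps onto a command with a clear exit status. As a minimal sketch, the "Grep for 'HARDCODED_KEY' returns zero matches" example could be verified like this; `criterion_no_matches` is a hypothetical name, not something the harness provides.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical check that a pattern does not occur anywhere under a directory.
criterion_no_matches() {
  local pattern="$1" dir="$2" rc=0
  # grep exits 0 on a match, 1 on no match, and 2 or higher on errors.
  grep -rq -- "$pattern" "$dir" || rc=$?
  # Only "no match" (exit status 1) satisfies the criterion; errors count as failure.
  [ "$rc" -eq 1 ]
}
```

Counting grep errors as failure matters here: an unreadable or missing directory should not be reported as "zero matches".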
## PRD Anti-Patterns ## PRD Anti-Patterns

setup.sh

@@ -10,10 +10,31 @@
set -euo pipefail set -euo pipefail
# --- Parse arguments ---
ACTION="scaffold"
MODE="${1:-implement}" MODE="${1:-implement}"
WORKTREE_PATH=""
MAIN_LOOP_DIR=""
case "$MODE" in
--update)
ACTION="update"
MODE="${2:-implement}"
;;
--init-worktree)
ACTION="init-worktree"
WORKTREE_PATH="${2:-}"
MAIN_LOOP_DIR="${3:-}"
if [ -z "$WORKTREE_PATH" ] || [ -z "$MAIN_LOOP_DIR" ]; then
echo "[setup] ERROR: --init-worktree requires <worktree_path> <main_loop_dir>"
echo "[setup] Usage: setup.sh --init-worktree /path/to/worktree /path/to/main/.loop"
exit 1
fi
;;
esac
# --- Validate mode --- # --- Validate mode ---
if [[ ! "$MODE" =~ ^(implement|explore|fix)$ ]]; then if [ "$ACTION" = "scaffold" ] && [[ ! "$MODE" =~ ^(implement|explore|fix)$ ]]; then
echo "[setup] ERROR: Invalid mode '$MODE'. Must be: implement, explore, fix" echo "[setup] ERROR: Invalid mode '$MODE'. Must be: implement, explore, fix"
exit 1 exit 1
fi fi
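Because the script runs under `set -euo pipefail`, expanding an unset positional parameter aborts with an "unbound variable" error before any usage message can be printed. The sketch below shows the `${N:-}` guard pattern in isolation; `parse_init_worktree` is a hypothetical stand-in for the `--init-worktree` branch, not the script's own function.

```bash
#!/usr/bin/env bash
set -u

# Sketch only: hypothetical stand-in for the --init-worktree argument check.
parse_init_worktree() {
  local worktree_path="${1:-}"   # default to empty so set -u does not abort
  local main_loop_dir="${2:-}"
  if [ -z "$worktree_path" ] || [ -z "$main_loop_dir" ]; then
    echo "ERROR: --init-worktree requires <worktree_path> <main_loop_dir>" >&2
    return 1
  fi
  echo "worktree=$worktree_path main=$main_loop_dir"
}
```

With the defaults in place, a missing argument reaches the explicit check and produces the intended usage error instead of a shell-level failure.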
@@ -45,6 +66,65 @@ fi
echo "[setup] Harness source: $HARNESS_SRC" echo "[setup] Harness source: $HARNESS_SRC"
# Read plugin version from source
PLUGIN_VERSION=""
if [ -f "$HARNESS_SRC/.claude-plugin/plugin.json" ]; then
if command -v jq &>/dev/null; then
PLUGIN_VERSION=$(jq -r '.version // empty' "$HARNESS_SRC/.claude-plugin/plugin.json" 2>/dev/null)
elif command -v python3 &>/dev/null; then
PLUGIN_VERSION=$(python3 -c "import json; print(json.load(open('$HARNESS_SRC/.claude-plugin/plugin.json')).get('version',''),end='')" 2>/dev/null)
fi
fi
# --- Update-only mode: refresh harness files without touching run state ---
if [ "$ACTION" = "update" ]; then
LOOP_DIR="$(pwd)/.loop"
if [ ! -d "$LOOP_DIR" ]; then
echo "[setup] ERROR: No .loop/ directory found. Run setup first."
exit 1
fi
echo "[setup] Updating harness files..."
cp -r "$HARNESS_SRC/prompts" "$LOOP_DIR/"
cp -r "$HARNESS_SRC/templates" "$LOOP_DIR/"
cp -r "$HARNESS_SRC/lib" "$LOOP_DIR/"
cp "$HARNESS_SRC/loop.sh" "$LOOP_DIR/"
chmod +x "$LOOP_DIR/loop.sh"
[ -n "$PLUGIN_VERSION" ] && echo "$PLUGIN_VERSION" > "$LOOP_DIR/.harness-version"
echo "[setup] Harness updated to ${PLUGIN_VERSION:-unknown}. Run state (prd.json, contracts, config.json) unchanged."
exit 0
fi
# --- Init-worktree mode: initialize .loop/ in a worktree from main's config ---
if [ "$ACTION" = "init-worktree" ]; then
LOOP_DIR="$WORKTREE_PATH/.loop"
mkdir -p "$LOOP_DIR"
# Copy harness files from plugin source
cp -r "$HARNESS_SRC/prompts" "$LOOP_DIR/"
cp -r "$HARNESS_SRC/templates" "$LOOP_DIR/"
cp -r "$HARNESS_SRC/lib" "$LOOP_DIR/"
cp "$HARNESS_SRC/loop.sh" "$LOOP_DIR/"
chmod +x "$LOOP_DIR/loop.sh"
# Copy project config and init from main's .loop/
[ -f "$MAIN_LOOP_DIR/config.json" ] && cp "$MAIN_LOOP_DIR/config.json" "$LOOP_DIR/"
[ -f "$MAIN_LOOP_DIR/init.sh" ] && cp "$MAIN_LOOP_DIR/init.sh" "$LOOP_DIR/"
# Stamp harness version
[ -n "$PLUGIN_VERSION" ] && echo "$PLUGIN_VERSION" > "$LOOP_DIR/.harness-version"
# Create .gitignore for worktree's .loop/
cat > "$LOOP_DIR/.gitignore" << 'GITIGNORE'
*
GITIGNORE
echo "[setup] Worktree .loop/ initialized at $LOOP_DIR"
exit 0
fi
# --- Ensure git repo exists --- # --- Ensure git repo exists ---
if ! git rev-parse --git-dir &>/dev/null; then if ! git rev-parse --git-dir &>/dev/null; then
echo "[setup] No git repo found. Initializing..." echo "[setup] No git repo found. Initializing..."
@@ -57,9 +137,17 @@ PROJECT_ROOT="$(pwd)"
LOOP_DIR="$PROJECT_ROOT/.loop" LOOP_DIR="$PROJECT_ROOT/.loop"
if [ -d "$LOOP_DIR" ] && [ -f "$LOOP_DIR/prd.json" ]; then if [ -d "$LOOP_DIR" ] && [ -f "$LOOP_DIR/prd.json" ]; then
echo "[setup] .loop/ already exists with prd.json." echo "[setup] .loop/ already exists with prd.json — archiving previous run..."
echo "[setup] To re-initialize, delete .loop/ first: rm -rf .loop" # Source state.sh (needed by archive.sh for story queries) and archive.sh
exit 1 LOOP_DIR="$LOOP_DIR" source "$LOOP_DIR/lib/state.sh" 2>/dev/null || true
LOOP_DIR="$LOOP_DIR" source "$LOOP_DIR/lib/archive.sh" 2>/dev/null || true
if type archive_and_reset &>/dev/null; then
archive_and_reset "$LOOP_DIR"
else
# Fallback for old harness versions without archive_and_reset
echo "[setup] WARNING: Could not archive (old harness version). To re-initialize, delete .loop/ first: rm -rf .loop"
exit 1
fi
fi fi
mkdir -p "$LOOP_DIR" mkdir -p "$LOOP_DIR"
@@ -71,6 +159,9 @@ cp -r "$HARNESS_SRC/lib" "$LOOP_DIR/"
cp "$HARNESS_SRC/loop.sh" "$LOOP_DIR/" cp "$HARNESS_SRC/loop.sh" "$LOOP_DIR/"
chmod +x "$LOOP_DIR/loop.sh" chmod +x "$LOOP_DIR/loop.sh"
# Stamp harness version
[ -n "$PLUGIN_VERSION" ] && echo "$PLUGIN_VERSION" > "$LOOP_DIR/.harness-version"
# Verify critical files # Verify critical files
for f in prompts/generator/_base.md prompts/evaluator/_base.md templates/progress.md.template lib/state.sh loop.sh; do for f in prompts/generator/_base.md prompts/evaluator/_base.md templates/progress.md.template lib/state.sh loop.sh; do
if [ ! -f "$LOOP_DIR/$f" ]; then if [ ! -f "$LOOP_DIR/$f" ]; then
@@ -91,6 +182,8 @@ triage/
archive/ archive/
.archive-staging/ .archive-staging/
.last-branch .last-branch
.harness-version
.active-worktree
.loop.lock .loop.lock
GITIGNORE GITIGNORE


@@ -1,16 +1,18 @@
--- ---
name: run name: run
description: "Agent Loop — single entry point. Scaffolds .loop/ if missing, generates stories if no prd.json, then launches autonomous execution in tmux." description: "Agent Loop — single entry point. Scaffolds .loop/ if missing, creates a worktree, generates stories, then launches autonomous execution in tmux."
--- ---
# /run — Agent Loop # /run — Agent Loop
Single entry point for the agent loop. Handles setup and planning interactively, then launches autonomous execution in a tmux session. Single entry point for the agent loop. Handles setup and planning interactively, then launches autonomous execution in a git worktree via tmux.
Each run gets its own worktree (isolated working directory on a feature branch). Multiple loops can run in parallel on different specs. Completed runs are archived to the main project's `.loop/archive/`.
## Usage ## Usage
``` ```
/agent-loop:run # Full flow: setup → stories → launch /agent-loop:run # Full flow: setup → worktree → stories → launch
/agent-loop:run --skip-eval # Skip evaluator pass /agent-loop:run --skip-eval # Skip evaluator pass
``` ```
@@ -20,9 +22,9 @@ Follow this sequence. Each phase checks what exists and skips if already done.
--- ---
## Phase 1: Scaffold (if needed) ## Phase 1: Scaffold Main .loop/ (if needed)
Check if `.loop/config.json` exists. Check if `.loop/config.json` exists in the current project root.
**If it does NOT exist**, run the setup script: **If it does NOT exist**, run the setup script:
@@ -31,116 +33,202 @@ Ask the user: **Mode?** (a) Implement (b) Explore (c) Fix — default is Impleme
Then run: Then run:
```bash ```bash
bash "$(ls -d ~/.claude/plugins/cache/agent-loop/agent-loop/*/setup.sh 2>/dev/null | head -1)" <mode> bash "$(ls -d ~/.claude/plugins/cache/agent-loop/agent-loop/*/setup.sh 2>/dev/null | tail -1)" <mode>
``` ```
Show the output. If setup fails, stop. Show the output. If setup fails, stop.
**If it already exists**, skip to Phase 2. **If it already exists**, check if the harness files need updating. Compare the installed harness version against the plugin version:
```bash
INSTALLED=$(cat .loop/.harness-version 2>/dev/null || echo "unknown")
PLUGIN=$(jq -r '.version // empty' "$(ls -d ~/.claude/plugins/cache/agent-loop/agent-loop/*/.claude-plugin/plugin.json 2>/dev/null | tail -1)" 2>/dev/null || echo "unknown")
echo "installed=$INSTALLED plugin=$PLUGIN"
```
If the versions differ (or installed is "unknown"), update the harness files:
```bash
bash "$(ls -d ~/.claude/plugins/cache/agent-loop/agent-loop/*/setup.sh 2>/dev/null | tail -1)" --update
```
Tell the user: *"Updated harness files to v{version}."*
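One caveat with picking the last `ls` entry: `ls` sorts names lexically, so a directory named `0.9.0` sorts after `0.12.0`. Assuming the cache holds one directory per version (inferred from the glob, not confirmed here), a version-aware pick could look like the sketch below. `latest_version_path` is a hypothetical helper, and `sort -V` is assumed to be available (it is in GNU coreutils).

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helper that prints the argument with the highest
# version-like ordering, using version sort instead of plain lexical order.
latest_version_path() {
  printf '%s\n' "$@" | sort -V | tail -n 1
}

# Example with made-up cache paths:
#   latest_version_path /cache/0.9.0/setup.sh /cache/0.12.0/setup.sh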
--- ---
## Phase 2: Generate Stories (if needed) ## Phase 2: Create Worktree and Generate Stories
Check if `.loop/prd.json` exists. ### 2a. Find the spec
**If it does NOT exist**, generate it: Search for existing specs or plans:
- `docs/superpowers/specs/*.md`
- `docs/superpowers/plans/*.md`
- `docs/specs/*.md`
- `docs/plans/*.md`
- `SPEC.md`, `PRD.md`, `DESIGN.md`, `PLAN.md` at project root
- Any markdown file that looks like a feature spec or implementation plan
1. Search for existing specs or plans: If found: "I found a spec at `{path}`. Using it to generate stories."
- `docs/superpowers/specs/*.md`
- `docs/superpowers/plans/*.md`
- `docs/specs/*.md`
- `docs/plans/*.md`
- `SPEC.md`, `PRD.md`, `DESIGN.md`, `PLAN.md` at project root
- Any markdown file that looks like a feature spec or implementation plan
If found: "I found a spec at `{path}`. Using it to generate stories." If NOT found, stop and tell the user:
If NOT found, stop and tell the user: > **No spec or plan found.** Agent Loop decomposes existing plans into stories — it doesn't create plans from scratch.
>
> Create a plan first, then re-run `/agent-loop:run`:
> - Describe your idea to Claude and ask it to write a spec
> - Use `/plan` if available
> - Or create a markdown file at `docs/specs/` or `SPEC.md`
>
> The plan should describe what to build, the tech stack, and key requirements.
> **No spec or plan found.** Agent Loop decomposes existing plans into stories — it doesn't create plans from scratch. **STOP here. Do NOT ask the user to describe the project in a few sentences. Do NOT proceed without a spec file.**
>
> Create a plan first, then re-run `/agent-loop:run`:
> - Describe your idea to Claude and ask it to write a spec
> - Use `/plan` if available
> - Or create a markdown file at `docs/specs/` or `SPEC.md`
>
> The plan should describe what to build, the tech stack, and key requirements.
**STOP here. Do NOT ask the user to describe the project in a few sentences. Do NOT proceed without a spec file.** ### 2b. Derive names and create worktree
2. Read the project root and tech stack info. Read the spec title or filename to derive a feature slug. Examples:
- `SPEC.md` with title "# Enhanced Spikes Editor" → slug: `enhanced-spikes-editor`
- `docs/specs/auth-system.md` → slug: `auth-system`
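The slug derivation in these examples can be sketched as a small text transform. `slugify` is a hypothetical helper; the exact rules the skill applies are not specified, so this assumes "lowercase, with runs of non-alphanumerics collapsed to single hyphens".

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical slug helper.
# Lowercases the input, replaces each run of non-alphanumeric characters with
# a single hyphen, and trims a leading or trailing hyphen.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# Usage with a heading or a filename:
#   slugify "# Enhanced Spikes Editor"
#   slugify "$(basename docs/specs/auth-system.md .md)"
```

Keeping the slug to `[a-z0-9-]` means it is safe to reuse in the branch name, the worktree directory name, and the tmux session name.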
3. Dispatch the **agent-loop:planner** agent: Derive paths:
```bash
PROJECT_DIR=$(basename "$(pwd)")
FEATURE_SLUG="<derived-slug>"
BRANCH_NAME="loop/${FEATURE_SLUG}"
WORKTREE_PATH="../${PROJECT_DIR}--loop-${FEATURE_SLUG}"
MAIN_LOOP_DIR="$(pwd)/.loop"
```
Check if the worktree already exists:
```bash
if [ -d "$WORKTREE_PATH" ]; then
echo "WORKTREE_EXISTS"
else
echo "WORKTREE_NEW"
fi
```
**If worktree exists**, check its state:
- Read `{WORKTREE_PATH}/.loop/prd.json` — are all stories passed?
- If all passed: ask user — "Previous run on `{BRANCH_NAME}` is complete. Archive and start fresh, or resume?"
- If in progress: ask user — "Run in progress on `{BRANCH_NAME}` ({passed}/{total}). Resume, or archive and start fresh?"
- If user says resume: skip to Phase 3 (launch in existing worktree)
- If user says archive/fresh: archive from worktree to main, remove worktree, then continue below
**If worktree is new**, create it:
```bash
git worktree add "$WORKTREE_PATH" -b "$BRANCH_NAME"
```
If the branch already exists (e.g., from a previous run):
```bash
git worktree add "$WORKTREE_PATH" "$BRANCH_NAME"
```
Initialize the worktree's `.loop/`:
```bash
bash "$(ls -d ~/.claude/plugins/cache/agent-loop/agent-loop/*/setup.sh 2>/dev/null | tail -1)" --init-worktree "$WORKTREE_PATH" "$MAIN_LOOP_DIR"
```
Initialize submodules if the project uses them:
```bash
git -C "$WORKTREE_PATH" submodule update --init --recursive 2>/dev/null || true
```
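The two `git worktree add` forms above can be folded into one step by checking whether the branch already exists. This is a sketch under the assumption that the branch is local; `add_loop_worktree` is a hypothetical helper, not part of the skill.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helper that creates a worktree, reusing the branch
# if it already exists locally and creating it otherwise.
# Run from inside the main repository.
add_loop_worktree() {
  local worktree_path="$1" branch_name="$2"
  if git show-ref --verify --quiet "refs/heads/$branch_name"; then
    # Branch exists (e.g. from a previous run): check it out in the new worktree.
    git worktree add "$worktree_path" "$branch_name"
  else
    # Branch does not exist yet: create it from the current HEAD.
    git worktree add "$worktree_path" -b "$branch_name"
  fi
}
```

Checking the ref explicitly avoids relying on the first command failing before falling back to the second.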
### 2c. Generate stories
Read the project root listing and tech stack info.
Dispatch the **agent-loop:planner** agent. Pass the **absolute worktree path** so the planner writes to the worktree's `.loop/`:
``` ```
Agent( Agent(
subagent_type: "agent-loop:planner", subagent_type: "agent-loop:planner",
prompt: "Generate prd.json and sprint contracts.\n\nMode: {mode}\nProject root: {path}\n\nSpec:\n{spec content}\n\nTech stack: {detected stack}", prompt: "Generate prd.json and sprint contracts.\n\nIMPORTANT: Write ALL files using absolute paths under: {WORKTREE_PATH}/.loop/\n- PRD: {WORKTREE_PATH}/.loop/prd.json\n- Contracts: {WORKTREE_PATH}/.loop/contracts/\n- Progress: {WORKTREE_PATH}/.loop/progress.md\n\nBranch name to use in prd.json: {BRANCH_NAME}\n\nMode: {mode}\nProject root: {WORKTREE_PATH}\n\nSpec:\n{spec content}\n\nTech stack: {detected stack}",
description: "Planning: generate stories" description: "Planning: generate stories"
) )
``` ```
4. After the planner finishes, read `.loop/prd.json` and present: ### 2d. Present stories
After the planner finishes, read `{WORKTREE_PATH}/.loop/prd.json` and present:
> **Stories generated — Review before running** > **Stories generated — Review before running**
> >
> Worktree: `{WORKTREE_PATH}` (branch: `{BRANCH_NAME}`)
>
> 1. US-001: {title} > 1. US-001: {title}
> 2. US-002: {title} > 2. US-002: {title}
> ... > ...
> >
> **Review:** > **Review:**
> - `.loop/prd.json` — stories and acceptance criteria > - `{WORKTREE_PATH}/.loop/prd.json` — stories and acceptance criteria
> - `.loop/contracts/` — done conditions per story > - `{WORKTREE_PATH}/.loop/contracts/` — done conditions per story
> >
> Let me know if you want changes, or say **go** to start the loop. > Let me know if you want changes, or say **go** to start the loop.
5. **STOP and wait for the user.** Do NOT start the loop automatically. The user must say "go", "start", "run", "looks good", or similar before proceeding to Phase 3. **STOP and wait for the user.** Do NOT start the loop automatically. The user must say "go", "start", "run", "looks good", or similar before proceeding to Phase 3.
**If `prd.json` already exists**, skip to Phase 3.
--- ---
## Phase 3: Validate and Launch ## Phase 3: Validate and Launch
1. Read `.loop/prd.json` and verify: 1. Read `{WORKTREE_PATH}/.loop/prd.json` and verify:
- Has a `userStories` array (NOT `sprints`, `stories`, or `tasks`) - Has a `userStories` array (NOT `sprints`, `stories`, or `tasks`)
- Each story has: `id`, `title`, `passes`, `priority` - Each story has: `id`, `title`, `passes`, `priority`
- If invalid, show the error and stop. - If invalid, show the error and stop.
2. Read `.loop/config.json` for `mode`, `maxIterations`. 2. Read `{WORKTREE_PATH}/.loop/config.json` for `mode`, `maxIterations`.
3. Verify `.loop/loop.sh` exists and is executable. 3. Verify `{WORKTREE_PATH}/.loop/loop.sh` exists and is executable.
4. Parse arguments for any flags to pass through (e.g., `--skip-eval`). 4. Parse arguments for any flags to pass through (e.g., `--skip-eval`).
5. Build the loop.sh command with any flags: 5. Build the loop.sh command and derive a unique tmux session name:
```bash ```bash
LOOP_CMD=".loop/loop.sh" LOOP_CMD="{WORKTREE_PATH}/.loop/loop.sh"
# Add --skip-eval if requested # Add --skip-eval if requested
# Add --max N if specified # Add --max N if specified
# Derive tmux session name from worktree directory name
WORKTREE_DIR=$(basename "$WORKTREE_PATH")
SESSION_NAME="agent-loop-${WORKTREE_DIR}"
``` ```
6. Kill any existing agent-loop tmux session, then launch detached: 6. Kill any existing tmux session with this name, then launch detached in the worktree:
```bash ```bash
tmux kill-session -t agent-loop 2>/dev/null; tmux new-session -d -s agent-loop -c <project_root> "$LOOP_CMD" tmux kill-session -t "$SESSION_NAME" 2>/dev/null; tmux new-session -d -s "$SESSION_NAME" -c "$WORKTREE_PATH" "$LOOP_CMD"
``` ```
7. Start a **background watcher** that waits for the loop to finish. Use the Bash tool with `run_in_background: true`: 7. Save the worktree path and session name for the completion handler. Write a tracking file in main's .loop/:
```bash ```bash
while tmux has-session -t agent-loop 2>/dev/null; do sleep 10; done; echo "LOOP_COMPLETE" cat > .loop/.active-worktree << EOF
WORKTREE_PATH={WORKTREE_PATH}
SESSION_NAME={SESSION_NAME}
BRANCH_NAME={BRANCH_NAME}
MAIN_LOOP_DIR={MAIN_LOOP_DIR}
EOF
``` ```
This runs silently. When the tmux session exits, Claude Code gets notified automatically. 8. Start a **background watcher** that waits for the loop to finish. Use the Bash tool with `run_in_background: true`:
8. Tell the user: ```bash
while tmux has-session -t "$SESSION_NAME" 2>/dev/null; do sleep 10; done; echo "LOOP_COMPLETE"
```
> **Loop launched.** Watch it live: 9. Tell the user:
> **Loop launched** as tmux session `{SESSION_NAME}`. Watch it live:
> ``` > ```
> ! tmux attach -t agent-loop > ! tmux attach -t {SESSION_NAME}
> ``` > ```
> (Type the above — it opens the session right here in your terminal.) > (Type the above — it opens the session right here in your terminal.)
> >
@@ -149,6 +237,13 @@ This runs silently. When the tmux session exits, Claude Code gets notified autom
> - Ask me "status" anytime and I'll check progress. > - Ask me "status" anytime and I'll check progress.
> >
> I'll notify you when the loop finishes. > I'll notify you when the loop finishes.
>
> When complete, merge with:
> ```
> git merge {BRANCH_NAME}
> git worktree remove {WORKTREE_PATH}
> git branch -d {BRANCH_NAME}
> ```
--- ---
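As a rough illustration of steps 5 and 7 in the new Phase 3, the session-name derivation and the tracking-file write could be wrapped in two small helpers. This is a minimal sketch, not code from the commit: the function names `derive_session_name` and `write_tracking_file` are hypothetical, and the key names simply mirror the heredoc shown in the diff.

```bash
#!/usr/bin/env bash
# Sketch only: both helper names are hypothetical, not part of agent-loop.

# Derive the tmux session name from the worktree directory name,
# following the "agent-loop-${WORKTREE_DIR}" convention in the diff.
derive_session_name() {
  printf 'agent-loop-%s\n' "$(basename -- "$1")"
}

# Write the KEY=VALUE tracking file that the completion handler reads later.
# Arguments: main .loop directory, worktree path, branch name.
write_tracking_file() {
  local main_loop_dir="$1" worktree_path="$2" branch_name="$3"
  local session_name
  session_name="$(derive_session_name "$worktree_path")"
  cat > "$main_loop_dir/.active-worktree" <<EOF
WORKTREE_PATH=$worktree_path
SESSION_NAME=$session_name
BRANCH_NAME=$branch_name
MAIN_LOOP_DIR=$main_loop_dir
EOF
}
```

Computing the session name in one place means the launch command, the watcher, and the tracking file all agree on it, which matters because each Bash tool call may start with a fresh environment.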
@@ -156,18 +251,45 @@ This runs silently. When the tmux session exits, Claude Code gets notified autom
When you receive the background task notification (the watcher prints "LOOP_COMPLETE"), the loop has finished. Automatically: When you receive the background task notification (the watcher prints "LOOP_COMPLETE"), the loop has finished. Automatically:
1. Read `.loop/prd.json` — count passed/failed/blocked stories 1. Read the tracking file to get paths:
2. Read `.loop/progress.md` — show the latest session log entries
3. Check `git log --oneline` for commits made during the run ```bash
4. Present a summary: cat .loop/.active-worktree
```
2. Read `{WORKTREE_PATH}/.loop/prd.json` — count passed/failed/blocked stories
3. Read `{WORKTREE_PATH}/.loop/progress.md` — show the latest session log entries
4. Check `git log --oneline` on the feature branch for commits made during the run
5. Archive the run to main's `.loop/archive/`:
```bash
source .loop/lib/state.sh && source .loop/lib/archive.sh && archive_from_worktree "{WORKTREE_PATH}/.loop" "$(pwd)/.loop"
```
6. Clean up the tracking file:
```bash
rm -f .loop/.active-worktree
```
7. Present a summary:
> **Loop Complete** > **Loop Complete**
> - Stories: {passed}/{total} complete, {blocked} blocked > - Stories: {passed}/{total} complete, {blocked} blocked
> - Iterations: {from progress.md} > - Iterations: {from progress.md}
> - Commits: {list from git log} > - Commits: {list from git log}
> - Archived to: `.loop/archive/{date}-{feature}/`
> >
> {If any stories blocked: "Some stories need human review. Run /agent-loop:triage for details."} > {If any stories blocked: "Some stories need human review. Run /agent-loop:triage for details."}
> {If all passed: "All stories complete. Review the code and test it."} > {If all passed: "All stories complete. Review the code and test it."}
>
> **When ready to merge:**
> ```
> git merge {BRANCH_NAME}
> git worktree remove {WORKTREE_PATH}
> git branch -d {BRANCH_NAME}
> ```
--- ---
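The completion handler above reads `.loop/.active-worktree` with `cat` and leaves the parsing implicit. One possible way to pull a single value out of that file without sourcing it is sketched below. The helper name `read_tracking_value` is hypothetical, and the sketch assumes the file keeps the one `KEY=VALUE` pair per line layout shown in Phase 3.

```bash
#!/usr/bin/env bash
# Sketch only: read_tracking_value is a hypothetical helper, not part of agent-loop.

# Print the value stored for one key in a KEY=VALUE file.
# Returns non-zero if the file or the key is missing.
read_tracking_value() {
  local file="$1" key="$2" line
  [ -f "$file" ] || return 1
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      "$key="*)
        # Strip only the leading "KEY=", so values containing "=" survive.
        printf '%s\n' "${line#"$key"=}"
        return 0
        ;;
    esac
  done < "$file"
  return 1
}
```

For example, `WORKTREE_PATH="$(read_tracking_value .loop/.active-worktree WORKTREE_PATH)"` would recover the path before reading `prd.json`. Parsing the file rather than sourcing it avoids executing whatever the file happens to contain.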
@@ -175,11 +297,23 @@ When you receive the background task notification (the watcher prints "LOOP_COMP
If the user asks about progress (e.g., "status", "how's it going"): If the user asks about progress (e.g., "status", "how's it going"):
1. Read `.loop/prd.json` — count passed/failed/blocked 1. Check for active worktree tracking:
2. Capture recent tmux output:
```bash ```bash
tmux capture-pane -t agent-loop -p | tail -20 cat .loop/.active-worktree 2>/dev/null
``` ```
3. Report current status. If no tracking file, check for tmux sessions matching the pattern:
```bash
tmux list-sessions 2>/dev/null | grep "^agent-loop-"
```
2. Read `{WORKTREE_PATH}/.loop/prd.json` — count passed/failed/blocked
3. Capture recent tmux output:
```bash
tmux capture-pane -t "$SESSION_NAME" -p | tail -20
```
4. Report current status.
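The fallback in step 1 greps the default `tmux list-sessions` output, where each line starts with the session name followed by a colon. A minimal sketch of the same filter that returns bare session names, ready to pass to `tmux capture-pane -t`, might look like this. The function name `filter_loop_sessions` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: filter_loop_sessions is a hypothetical helper, not part of agent-loop.

# Read `tmux list-sessions`-style lines on stdin and print only the names
# of sessions that start with "agent-loop-".
filter_loop_sessions() {
  local line name
  while IFS= read -r line; do
    name="${line%%:*}"   # the session name is everything before the first colon
    case "$name" in
      agent-loop-*) printf '%s\n' "$name" ;;
    esac
  done
}

# Usage (requires a running tmux server):
#   tmux list-sessions 2>/dev/null | filter_loop_sessions
```

If more than one name comes back, several loops are running in parallel, and the status report would need to ask the user which worktree they mean or summarize each one.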


@@ -7,6 +7,8 @@ description: "Generate prd.json and sprint contracts by dispatching the planner
Dispatch the planner agent to decompose a spec into stories. The planner agent cannot write source code or run bash commands — it can only write to `.loop/`. Dispatch the planner agent to decompose a spec into stories. The planner agent cannot write source code or run bash commands — it can only write to `.loop/`.
**Note:** In most cases, use `/agent-loop:run` instead — it handles worktree creation, story generation, and launching the loop in one flow. Use `/agent-loop:stories` only if you want to generate stories without launching the loop.
## Instructions ## Instructions
### 1. Check prerequisites ### 1. Check prerequisites
@@ -40,9 +42,15 @@ Agent(
) )
``` ```
If a worktree path is known (e.g., passed as context), include it in the prompt:
```
IMPORTANT: Write ALL files using absolute paths under: {WORKTREE_PATH}/.loop/
```
### 5. Present results ### 5. Present results
After the planner finishes, read `.loop/prd.json` and show the user: After the planner finishes, read `.loop/prd.json` (or `{WORKTREE_PATH}/.loop/prd.json`) and show the user:
> **Plan Ready — Review Before Running** > **Plan Ready — Review Before Running**
> >