feat: worktree-based run isolation for parallel loops

Each /agent-loop:run now creates a git worktree for the feature branch before generating stories. This provides full isolation:

- Multiple loops can run in parallel on different specs in the same project
- Main working directory stays on main, always available
- Each worktree has its own .loop/ state, tmux session, and branch
- Completed runs are archived to main's .loop/archive/ with runs.log

Changes:

- setup.sh: add --init-worktree mode for initializing worktree .loop/
- archive.sh: add archive_from_worktree() for cross-directory archiving
- loop.sh: replace branch checkout with validation (worktree is pre-checked-out)
- agents/planner.md: accept absolute path prefix for worktree .loop/ writes
- skills/run/SKILL.md: full rewrite (worktree creation in Phase 2, launch in Phase 3, archive on completion, .active-worktree tracking file)
- skills/stories/SKILL.md: worktree-aware, defer to /run for full flow

Bump to 0.12.0.
feat: support parallel loops with per-project tmux session names
2026-04-02 11:21:17 -04:00 · 2026-04-02 10:54:22 -04:00 · 2026-04-02 10:51:48 -04:00 · 2026-04-02 10:42:46 -04:00 · 2026-04-02 09:14:43 -04:00 · 2026-04-02 09:02:41 -04:00
19 changed files with 603 additions and 268 deletions
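For orientation, the worktree isolation described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in skills/run/SKILL.md: the branch name `loop/demo-feature`, the sibling `-worktrees` directory, and the throwaway repository are all assumptions made for the example.

```shell
#!/bin/bash
set -eu

# Throwaway repository standing in for the main project (assumption for the demo).
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" symbolic-ref HEAD refs/heads/main
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    -c commit.gpgsign=false commit -q --allow-empty -m "initial commit"

# Hypothetical naming scheme: one worktree directory per feature branch.
branch="loop/demo-feature"
worktree="${repo}-worktrees/demo-feature"
mkdir -p "${repo}-worktrees"

# Create the branch and check it out in its own directory in one step.
git -C "$repo" worktree add -b "$branch" "$worktree" >/dev/null 2>&1

# Each worktree gets its own .loop/ state directory.
mkdir -p "$worktree/.loop"

# The main checkout is untouched; only the worktree is on the feature branch.
main_branch=$(git -C "$repo" rev-parse --abbrev-ref HEAD)
wt_branch=$(git -C "$worktree" rev-parse --abbrev-ref HEAD)
echo "main checkout: $main_branch"
echo "worktree:      $wt_branch"
```

Because the feature branch is checked out only in the worktree, a second run on a different spec could add another worktree from the same repository without disturbing either checkout.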
--- a/.claude-plugin/marketplace.json
+++ b/.claude-plugin/marketplace.json
@@ -10,7 +10,7 @@
      "name": "agent-loop",
      "source": "./",
      "description": "Autonomous generator-evaluator agent loop for long-running coding tasks. Plan interactively, then execute with full visibility.",
-      "version": "0.8.0",
+      "version": "0.12.0",
      "author": {
        "name": "Sheldon"
      },
--- a/.claude-plugin/plugin.json
+++ b/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
  "name": "agent-loop",
-  "version": "0.8.0",
+  "version": "0.12.0",
  "description": "Autonomous generator-evaluator agent loop for long-running coding tasks. Run /agent-loop:run to start.",
  "author": {
    "name": "Sheldon"
--- a/README.md
+++ b/README.md
@@ -8,10 +8,7 @@ A generator-evaluator loop runs fresh Claude Code sessions per iteration. Each i
 ## Install
 ### As a Claude Code Plugin (Recommended)
 ```
 /plugin marketplace add https://git.jagfly.com/sheldon/loop-loop.git
 /plugin install agent-loop@agent-loop
 ```
@@ -23,16 +20,18 @@ Then in any project:
 That's it. The single command handles setup, planning, and execution.
-### Manual Install
-```bash
-cp -r /path/to/loop-loop .loop
-```
+## Prerequisites
+- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) CLI installed
+- `tmux` available (used to run the loop in a detachable session)
+- `jq` or `python3` (for JSON state management)
 Then run `.loop/loop.sh` directly.
 ## How It Works
 1. Write a spec describing what you want to build (`SPEC.md`, `docs/specs/*.md`, or similar). You can write it yourself, ask Claude to draft one, or use planning tools like `/plan`.
 2. Run `/agent-loop:run` — it scaffolds `.loop/`, generates stories from your spec, and presents them for review
 3. Say "go" — the loop launches in tmux and runs autonomously
 ```
 /agent-loop:run
  ├─ Phase 1: Scaffold .loop/ (if needed)
@@ -50,7 +49,7 @@ Then run `.loop/loop.sh` directly.
 | Mode | What it does | Git writes? |
 |------|-------------|-------------|
-| **implement** | Build features from a PRD | Yes |
+| **implement** | Build features from a spec | Yes |
 | **explore** | Read-only codebase analysis | No |
 | **fix** | Targeted bug fixes / tech debt | Yes |
@@ -73,28 +72,15 @@ Or ask Claude Code "status" — it reads `.loop/prd.json` and `.loop/progress.md
 Each generator and evaluator run is a full Claude Code session saved to history. Use `claude -r` to resume any session and inspect what happened, debug a rejection, or continue from where it left off.
 ## Headless Mode
 For CI or background execution without the interactive UI:
 ```bash
 .loop/loop.sh --headless [options]
 --mode <implement|explore|fix>   Operating mode
 --max <N>                        Maximum iterations (default: 20)
 --skip-eval                      Skip evaluator pass
 --dry-run                        Print assembled prompts without running
 ```
 ## Architecture
 ### Generator
-Fresh Claude Code session each iteration. Reads `prd.json` to find the highest-priority incomplete story, reads the sprint contract, implements the story, runs quality gates, commits, and marks it done.
+Fresh Claude Code session each iteration. Follows a strict startup sequence: reads progress.md, finds the next story from prd.json, reads the sprint contract, checks for evaluator feedback, reviews git history, and runs a smoke test if available — all before writing any code. Then implements the story, runs quality gates, commits, and marks it done.
 ### Evaluator
-Separate fresh session after each generator pass. Skeptically verifies the work: checks acceptance criteria against actual code, runs tests and the application, and issues a `PASS` or `REJECT` verdict. Rejection sends the story back with specific feedback.
+Separate fresh session after each generator pass. Skeptically verifies the work: checks each acceptance criterion against actual code with file paths and line numbers, runs tests, and issues a `PASS` or `REJECT` verdict. Rejection sends the story back with specific feedback.
-Evaluator skepticism is deliberately tuned — Claude's default tendency is to rationalize away issues. The evaluator prompt includes explicit bias correction.
+Evaluator skepticism is deliberately tuned — Claude's default tendency is to rationalize away issues. The evaluator prompt includes explicit bias correction and few-shot calibration examples.
 ### Sprint Contracts
 Before the loop starts, the planner generates contracts for each story. These define "done" conditions that both generator and evaluator reference, eliminating ambiguity about whether work is complete.
@@ -109,27 +95,9 @@ Before the loop starts, the planner generates contracts for each story. These de
 | `config.json` | Harness configuration |
 | Git commits | Code changes with story-tagged messages |
-## Optional: Runtime Testing Tools
+## Runtime Verification
-The evaluator verifies code actually runs, not just that it looks correct. It uses whatever tools are available. For richer verification, install these optional MCP servers:
+The evaluator doesn't just read diffs — it runs tests, builds the project, and checks for runtime errors using whatever tools the project already has (test runners, linters, build commands).
 **Web projects (Playwright):**
 ```bash
 claude mcp add playwright npx @playwright/mcp@latest --headless --browser=chromium
 ```
 **iOS/Xcode projects (XcodeBuildMCP):**
 ```bash
 brew tap getsentry/xcodebuildmcp && brew install xcodebuildmcp
 claude mcp add xcodebuild -- xcodebuildmcp
 ```
 **iOS Simulator interaction:**
 ```bash
 claude mcp add ios-simulator -- npx -y ios-simulator-mcp
 ```
 These are optional — the evaluator works without them but may miss runtime-only issues.
 ## Design Principles
--- a/agents/planner.md
+++ b/agents/planner.md
@@ -11,12 +11,16 @@ You are a planner agent for the agent loop harness. Your job is to decompose a f
 ## CONSTRAINTS
-- You may ONLY write files inside the `.loop/` directory
+- You may ONLY write files inside the `.loop/` directory (or the absolute loop directory path if one is provided)
 - You may NOT write any project source code (.js, .ts, .py, .go, .rs, .html, .css, etc.)
 - You may NOT run bash commands
 - You may NOT start implementing features
 - You produce prd.json and contracts, then STOP
 ## OUTPUT DIRECTORY
 If the prompt specifies an absolute path for the loop directory (e.g., "Write all files to /path/to/worktree/.loop/"), use that absolute path for ALL file writes. Otherwise, use the relative `.loop/` path.
 ## YOUR TASK
 You will be given a feature spec or description. Decompose it into stories.
--- a/install.sh
+++ b/install.sh
@@ -102,5 +102,5 @@ echo "  Next steps (inside Claude Code, in any project):"
 echo ""
 echo "    /agent-loop:run      # Single command — setup, plan, and run"
 echo ""
-echo "  Or run headless:    .loop/loop.sh"
+echo "  Or run directly:    .loop/loop.sh"
 echo ""
--- a/lib/archive.sh
+++ b/lib/archive.sh
@@ -1,11 +1,17 @@
 #!/bin/bash
-# Branch archiving — archives previous run artifacts when the branch changes.
+# Run archiving — preserves prd.json, progress.md, and contracts from completed runs.
 # Preserves prd.json, progress.md, and contracts from the previous feature.
 #
-# Design: At the end of each run, snapshot_for_archive saves current artifacts
-# to .archive-staging/. On the next run, if the branch changed, check_archive
-# moves the snapshot to archive/ and cleans up. This avoids archiving the
-# WRONG artifacts (the new feature's) when prd.json has already been overwritten.
+# Two archive triggers:
+#   1. Branch change: check_archive detects a new branch and archives the staged snapshot.
+#   2. Completed run: archive_and_reset is called by the /run skill when prd.json shows
+#      all stories passed (or the branch was deleted). This handles the common workflow
 #      of merging a feature branch back to main and starting a new feature.
 #
 # Archive layout:
 #   .loop/archive/
 #     runs.log                          — one-line-per-run index for quick lookup
 #     2026-03-15-auth-system/           — full artifacts from that run
 #       prd.json, progress.md, contracts/
 LAST_BRANCH_FILE="$LOOP_DIR/.last-branch"
 STAGING_DIR="$LOOP_DIR/.archive-staging"
@@ -85,5 +91,139 @@ archive_run() {
    rm -f "$LOOP_DIR/progress.md"
    rm -rf "$LOOP_DIR/contracts"
    append_runs_log "$branch_name" "$archive_dir"
    log "Archived previous run to $archive_dir"
 }
 # Archive current run artifacts and reset for a new run.
 # Called by the /run skill when a completed run is detected (all stories passed
 # or the feature branch no longer exists). Unlike check_archive (which reads from
 # staging), this archives the LIVE artifacts directly since we know they belong
 # to the completed run.
 archive_and_reset() {
    local loop_dir="${1:-.loop}"
    local prd="$loop_dir/prd.json"
    [ -f "$prd" ] || return 0
    # Read branch name from current prd.json
    local branch_name=""
    if command -v jq &>/dev/null; then
        branch_name=$(jq -r '.branchName // empty' "$prd" 2>/dev/null)
    elif command -v python3 &>/dev/null; then
        branch_name=$(LOOP_PRD="$prd" python3 -c "
 import json, os
 print(json.load(open(os.environ['LOOP_PRD'])).get('branchName', ''), end='')
 " 2>/dev/null)
    fi
    local feature_name
    feature_name=$(echo "${branch_name:-unknown}" | sed 's|.*/||')
    local archive_dir="$loop_dir/archive/$(date +%Y-%m-%d)-${feature_name}"
    mkdir -p "$archive_dir"
    # Archive live artifacts
    [ -f "$prd" ] && cp "$prd" "$archive_dir/"
    [ -f "$loop_dir/progress.md" ] && cp "$loop_dir/progress.md" "$archive_dir/"
    [ -f "$loop_dir/progress-archive.md" ] && cp "$loop_dir/progress-archive.md" "$archive_dir/"
    [ -d "$loop_dir/contracts" ] && cp -r "$loop_dir/contracts" "$archive_dir/"
    # Verify archive has content before deleting originals
    if ! find "$archive_dir" -maxdepth 1 -type f | read -r; then
        echo "[archive] WARNING: Archive directory is empty — skipping reset to prevent data loss"
        return 1
    fi
    append_runs_log "$branch_name" "$archive_dir"
    # Reset run-specific files (keep config.json, init.sh, harness files)
    rm -f "$loop_dir/prd.json"
    rm -f "$loop_dir/progress.md"
    rm -f "$loop_dir/progress-archive.md"
    rm -rf "$loop_dir/contracts"
    rm -rf "$loop_dir/.archive-staging"
    rm -f "$loop_dir/.last-branch"
    rm -f "$loop_dir/.verdict"
    echo "[archive] Archived completed run to $archive_dir"
    echo "[archive] .loop/ reset — ready for new stories"
 }
 # Archive a completed run from a worktree back to the main project's .loop/archive/.
 # Called by the /run skill's completion handler after the loop finishes in a worktree.
 #
 # Usage: archive_from_worktree <worktree_loop_dir> <main_loop_dir>
 #   worktree_loop_dir: absolute path to the worktree's .loop/ (source)
 #   main_loop_dir:     absolute path to the main project's .loop/ (destination)
 archive_from_worktree() {
    local wt_loop_dir="$1"
    local main_loop_dir="$2"
    local wt_prd="$wt_loop_dir/prd.json"
    [ -f "$wt_prd" ] || { echo "[archive] WARNING: No prd.json in worktree — nothing to archive"; return 1; }
    # Read branch name from worktree's prd.json
    local branch_name=""
    if command -v jq &>/dev/null; then
        branch_name=$(jq -r '.branchName // empty' "$wt_prd" 2>/dev/null)
    elif command -v python3 &>/dev/null; then
        branch_name=$(LOOP_PRD="$wt_prd" python3 -c "
 import json, os
 print(json.load(open(os.environ['LOOP_PRD'])).get('branchName', ''), end='')
 " 2>/dev/null)
    fi
    local feature_name
    feature_name=$(echo "${branch_name:-unknown}" | sed 's|.*/||')
    local archive_dir="$main_loop_dir/archive/$(date +%Y-%m-%d)-${feature_name}"
    mkdir -p "$archive_dir"
    # Copy artifacts from worktree
    [ -f "$wt_prd" ] && cp "$wt_prd" "$archive_dir/"
    [ -f "$wt_loop_dir/progress.md" ] && cp "$wt_loop_dir/progress.md" "$archive_dir/"
    [ -f "$wt_loop_dir/progress-archive.md" ] && cp "$wt_loop_dir/progress-archive.md" "$archive_dir/"
    [ -d "$wt_loop_dir/contracts" ] && cp -r "$wt_loop_dir/contracts" "$archive_dir/"
    [ -d "$wt_loop_dir/triage" ] && cp -r "$wt_loop_dir/triage" "$archive_dir/"
    # Verify archive has content
    if ! find "$archive_dir" -maxdepth 1 -type f | read -r; then
        echo "[archive] WARNING: Archive directory is empty — copy may have failed"
        return 1
    fi
    append_runs_log "$branch_name" "$archive_dir"
    echo "[archive] Archived worktree run to $archive_dir"
 }
 # Append a one-line summary to the runs log.
 append_runs_log() {
    local branch_name="$1"
    local archive_dir="$2"
    local runs_log
    runs_log="$(dirname "$archive_dir")/runs.log"
    # Read story counts from the archived prd.json
    local total=0 passed=0 blocked=0
    local archived_prd="$archive_dir/prd.json"
    if [ -f "$archived_prd" ]; then
        if command -v jq &>/dev/null; then
            total=$(jq '.userStories | length' "$archived_prd" 2>/dev/null || echo 0)
            passed=$(jq '[.userStories[] | select(.passes == true)] | length' "$archived_prd" 2>/dev/null || echo 0)
            blocked=$(jq '[.userStories[] | select(.blocked == true)] | length' "$archived_prd" 2>/dev/null || echo 0)
        elif command -v python3 &>/dev/null; then
            eval "$(LOOP_PRD="$archived_prd" python3 -c "
 import json, os
 d = json.load(open(os.environ['LOOP_PRD']))
 s = d.get('userStories', [])
 print(f'total={len(s)} passed={sum(1 for x in s if x.get(\"passes\"))} blocked={sum(1 for x in s if x.get(\"blocked\"))}')
 " 2>/dev/null)" || true
        fi
    fi
    printf '%s  %-30s  %s/%s passed  %s blocked\n' \
        "$(date +%Y-%m-%d)" "${branch_name:-unknown}" "$passed" "$total" "$blocked" \
        >> "$runs_log"
 }
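As a rough illustration of the naming used by the archive functions above, the sketch below derives the archive directory and the runs.log line from a branch name. The `sed` expression and `printf` format are copied from the diff; the branch name, the fixed date, and the story counts are made-up sample values.

```shell
#!/bin/bash

# Sample inputs (assumptions for the demo, not taken from a real run).
branch_name="feature/auth-system"
run_date="2026-03-15"

# Strip everything up to the last slash, as archive_from_worktree does.
feature_name=$(echo "${branch_name:-unknown}" | sed 's|.*/||')
archive_dir=".loop/archive/${run_date}-${feature_name}"

# Same format string as append_runs_log: date, padded branch, counts.
line=$(printf '%s  %-30s  %s/%s passed  %s blocked' \
    "$run_date" "$branch_name" 5 6 1)

echo "$archive_dir"
echo "$line"
```

Since the directory name combines only the date and the last path segment of the branch, two branches such as `feature/auth` and `fix/auth` archived on the same day would appear to map to the same directory.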
--- a/lib/hooks.sh
+++ b/lib/hooks.sh
@@ -7,9 +7,18 @@
 #
 # Without this hook, claude would exit to an interactive prompt instead of
 # returning control to the loop script.
 #
 # IMPORTANT: The hook is scoped to only fire inside the agent-loop tmux session.
 # Without this guard, ANY Claude Code session opened in the same project directory
 # would pick up the hook and kill its own parent shell on exit.
 SETTINGS_FILE="${PROJECT_ROOT}/.claude/settings.local.json"
 # The hook checks AGENT_LOOP_ACTIVE before killing. This env var is exported by
 # loop.sh and inherited by CC sessions it spawns. Interactive CC sessions in the
 # same project won't have it set, so the hook is a no-op for them.
 HOOK_COMMAND='[ "${AGENT_LOOP_ACTIVE:-}" = "1" ] && kill -INT $PPID || true'
 install_hooks() {
    if [ ! -f "$SETTINGS_FILE" ]; then
        mkdir -p "$(dirname "$SETTINGS_FILE")"
@@ -17,14 +26,16 @@ install_hooks() {
    fi
    if command -v jq &>/dev/null; then
-        jq '.hooks.Stop = [{"matcher": "", "hooks": [{"type": "command", "command": "kill -INT $PPID || true"}]}]' \
+        jq --arg cmd "$HOOK_COMMAND" \
            '.hooks.Stop = [{"matcher": "", "hooks": [{"type": "command", "command": $cmd}]}]' \
            "$SETTINGS_FILE" > "${SETTINGS_FILE}.tmp" && mv "${SETTINGS_FILE}.tmp" "$SETTINGS_FILE"
    else
-        LOOP_SETTINGS="$SETTINGS_FILE" python3 -c "
+        LOOP_HOOK_CMD="$HOOK_COMMAND" LOOP_SETTINGS="$SETTINGS_FILE" python3 -c "
 import json, os
 p = os.environ['LOOP_SETTINGS']
 cmd = os.environ['LOOP_HOOK_CMD']
 s = json.load(open(p)) if os.path.exists(p) else {}
-s.setdefault('hooks', {})['Stop'] = [{'matcher': '', 'hooks': [{'type': 'command', 'command': 'kill -INT \$PPID || true'}]}]
+s.setdefault('hooks', {})['Stop'] = [{'matcher': '', 'hooks': [{'type': 'command', 'command': cmd}]}]
 json.dump(s, open(p, 'w'), indent=2)
 "
    fi
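The effect of the AGENT_LOOP_ACTIVE guard can be checked in isolation. The sketch below uses the same guard shape as HOOK_COMMAND but swaps the `kill -INT $PPID` for an `echo`, so it is safe to run anywhere; the variable names `guard`, `inside`, and `outside` are illustrative only.

```shell
#!/bin/bash

# Same shape as HOOK_COMMAND, with the kill replaced by an echo for safety.
guard='[ "${AGENT_LOOP_ACTIVE:-}" = "1" ] && echo "would signal parent" || true'

# Session spawned by the loop: the variable is exported, so the hook acts.
inside=$(AGENT_LOOP_ACTIVE=1 sh -c "$guard")

# Ordinary interactive session: the variable is absent, so the hook is a no-op.
outside=$(env -u AGENT_LOOP_ACTIVE sh -c "$guard")

echo "inside loop:  '${inside}'"
echo "outside loop: '${outside}'"
```

The trailing `|| true` keeps the hook's exit status at zero in both cases, so a session outside the loop sees neither a signal nor a failing hook.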
--- a/loop.sh
+++ b/loop.sh
@@ -13,7 +13,6 @@
 #   --no-hooks                       Don't install stop hooks
 #   --dry-run                        Print assembled prompts without running agents
 #   --resume                         Skip already-passed stories (explicit mode)
 #   --replan                         (reserved — not yet implemented)
 #
 # Each iteration:
 #   1. Generator: picks highest-priority incomplete story, does the work
@@ -81,23 +80,6 @@ if ! command -v jq &>/dev/null && ! command -v python3 &>/dev/null; then
    exit 1
 fi
 # --- macOS timeout compatibility ---
 # macOS doesn't have GNU timeout. Use gtimeout (from coreutils) or a perl fallback.
 if ! command -v timeout &>/dev/null; then
    if command -v gtimeout &>/dev/null; then
        timeout() { gtimeout "$@"; }
    else
        # Perl-based fallback: runs command with alarm signal
        timeout() {
            local duration="$1"; shift
            perl -e '
                alarm shift @ARGV;
                exec @ARGV;
            ' "$duration" "$@"
        }
    fi
 fi
 # --- Load config defaults ---
 CONFIG_FILE="$LOOP_DIR/config.json"
 config_default() { get_config_value "$1" "$2"; }
@@ -122,15 +104,14 @@ while [[ $# -gt 0 ]]; do
        --tool=*) TOOL="${1#*=}"; shift ;;
        --no-hooks) AUTO_HOOKS=false; shift ;;
        --dry-run) DRY_RUN=true; shift ;;
        --headless) export LOOP_HEADLESS=true; shift ;;
        --resume) RESUME=true; shift ;;
        --replan) log "ERROR: --replan is not yet implemented. Use /agent-loop:stories interactively."; exit 1 ;;
        [0-9]*) MAX_ITERATIONS="$1"; shift ;;
        *) log "Unknown option: $1"; exit 1 ;;
    esac
 done
 export ITERATION=0 MAX_ITERATIONS MODE
 export AGENT_LOOP_ACTIVE=1
 # --- Validate ---
 if [[ ! "$MODE" =~ ^(implement|explore|fix)$ ]]; then
@@ -147,7 +128,6 @@ fi
 cd "$PROJECT_ROOT"
 cleanup() {
    [ -n "${LOOP_AGENT_TMPFILE:-}" ] && rm -f "$LOOP_AGENT_TMPFILE"
    # Remove hooks in case we exit mid-agent (Ctrl+C during a claude session)
    [ "$AUTO_HOOKS" = true ] && remove_hooks 2>/dev/null
    release_lock
@@ -178,10 +158,11 @@ finish() {
    read -r -t 30 2>/dev/null || true
    exit "$exit_code"
 }
-LOOP_AGENT_TMPFILE=""
-
-# NOTE: Stop hook is installed/removed per-agent in run_agent(), not globally.
-# This prevents the hook from killing the orchestrating CC session.
+# Install Stop hook once at startup. The AGENT_LOOP_ACTIVE env var guard ensures
+# it only fires for CC sessions spawned by this loop (not the user's other sessions).
+# Installing once avoids a race condition where per-iteration install_hooks writes
+# settings.local.json just before CC starts, and CC reads the old file.
 [ "$AUTO_HOOKS" = true ] && install_hooks
 trap cleanup EXIT INT TERM
 check_archive
@@ -200,12 +181,14 @@ if [ -f "$LOOP_DIR/init.sh" ]; then
    bash "$LOOP_DIR/init.sh"
 fi
-# Ensure correct git branch
+# Verify we're on the expected branch (worktree should already be on it)
 BRANCH=$(prd_branch_name 2>/dev/null || echo "")
 if [ -n "$BRANCH" ]; then
    CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "")
    if [ "$CURRENT_BRANCH" != "$BRANCH" ]; then
-        log "Switching to branch: $BRANCH"
+        log "WARNING: Expected branch '$BRANCH' but on '$CURRENT_BRANCH'"
        log "If running in a worktree, the branch should already be checked out."
        log "Attempting to switch..."
        git checkout "$BRANCH" 2>/dev/null || \
            git checkout -b "$BRANCH" "origin/$BRANCH" 2>/dev/null || \
            git checkout -b "$BRANCH"
@@ -215,14 +198,10 @@ fi
 # --- Agent runner ---
 # Runs a prompt through the selected AI tool.
 #
-# Interactive (default): Pipes prompt to claude WITHOUT --print.
-#   This gives the full interactive CC UI — tool calls, file edits, etc.
-#   A Stop hook (installed at startup) sends SIGINT to the loop when claude
-#   finishes, which returns control to the while loop for the next iteration.
+# Pipes prompt to claude WITHOUT --print. This gives the full interactive
+# CC UI — tool calls, file edits, etc. A Stop hook sends SIGINT to the loop
+# when claude finishes, returning control to the while loop for the next
+# iteration. State is tracked via files (prd.json, .verdict), not stdout.
 #   State is tracked via files (prd.json, .verdict), not stdout.
 #
 # Headless (LOOP_HEADLESS=true): Uses claude --print for CI/background.
 #   Output captured to file for verdict parsing.
 run_agent() {
    local prompt="$1"
    local role="${2:-}"
@@ -230,65 +209,26 @@ run_agent() {
    rm -f "$LOOP_DIR/.verdict"
    local agent_exit=0
    if [ "${LOOP_HEADLESS:-false}" != "true" ]; then
        # --- Interactive mode (Ralph pattern) ---
        # Install Stop hook just before claude starts, remove after it exits.
        # This scopes the hook to only affect the loop's claude sessions.
        [ "$AUTO_HOOKS" = true ] && install_hooks
-        (
-            case "$TOOL" in
-                claude)
-                    printf '%s\n' "$prompt" | claude --dangerously-skip-permissions
-                    ;;
-                amp)
-                    printf '%s\n' "$prompt" | amp --dangerously-allow-all
-                    ;;
-                *)
-                    log "ERROR: Unknown tool '$TOOL'"
-                    exit 1
-                    ;;
-            esac
-        ) || agent_exit=$?
+    (
+        case "$TOOL" in
+            claude)
+                printf '%s\n' "$prompt" | claude --dangerously-skip-permissions
+                ;;
+            amp)
+                printf '%s\n' "$prompt" | amp --dangerously-allow-all
+                ;;
+            *)
+                log "ERROR: Unknown tool '$TOOL'"
+                exit 1
+                ;;
+        esac
+    ) || agent_exit=$?
    sleep 2  # Brief pause between sessions
-        [ "$AUTO_HOOKS" = true ] && remove_hooks
+    # Read verdict from file if evaluator wrote one
-        sleep 2  # Brief pause between sessions
+    if [ "$role" = "evaluator" ] && [ -f "$LOOP_DIR/.verdict" ]; then
-
+        cat "$LOOP_DIR/.verdict"
        # Read verdict from file if evaluator wrote one
        if [ "$role" = "evaluator" ] && [ -f "$LOOP_DIR/.verdict" ]; then
            cat "$LOOP_DIR/.verdict"
        fi
    else
        # --- Headless mode ---
        local output_file
        output_file=$(mktemp)
        LOOP_AGENT_TMPFILE="$output_file"
        (
            case "$TOOL" in
                claude)
                    printf '%s\n' "$prompt" | timeout "${LOOP_AGENT_TIMEOUT:-600}" \
                        claude --dangerously-skip-permissions --output-format text \
                        --print > "$output_file" 2>&1
                    ;;
                amp)
                    printf '%s\n' "$prompt" | timeout "${LOOP_AGENT_TIMEOUT:-600}" \
                        amp --dangerously-allow-all > "$output_file" 2>&1
                    ;;
                *)
                    log "ERROR: Unknown tool '$TOOL'"
                    exit 1
                    ;;
            esac
        ) || agent_exit=$?
        if [ "$agent_exit" -ne 0 ] && [ ! -s "$output_file" ]; then
            log "WARNING: Agent exited with code $agent_exit and produced no output."
        fi
        cat "$output_file"
        rm -f "$output_file"
        LOOP_AGENT_TMPFILE=""
    fi
 }
@@ -371,18 +311,7 @@ while [ "$ITERATION" -lt "$MAX_ITERATIONS" ]; do
        exit 0
    fi
-    if [ "${LOOP_HEADLESS:-false}" != "true" ]; then
+    run_agent "$GENERATOR_PROMPT" "generator"
        # Interactive: run directly, no capture. User sees full CC UI.
        run_agent "$GENERATOR_PROMPT" "generator"
        GENERATOR_OUTPUT=""
    else
        # Headless: capture output for parsing.
        GENERATOR_OUTPUT=$(run_agent "$GENERATOR_PROMPT" "generator")
        if [ -z "$GENERATOR_OUTPUT" ]; then
            log "WARNING: Generator produced empty output (timeout or crash). Skipping to next iteration."
            continue
        fi
    fi
    # --- Scope budget check ---
    # Verify the generator stayed within configured limits (files modified, lines written).
@@ -419,22 +348,12 @@ while [ "$ITERATION" -lt "$MAX_ITERATIONS" ]; do
        EVAL_PROMPT=$(build_prompt "evaluator" "$MODE")
-        if [ "${LOOP_HEADLESS:-false}" != "true" ]; then
+        run_agent "$EVAL_PROMPT" "evaluator"
-            # Interactive: run directly, read verdict from file.
+        if [ -f "$LOOP_DIR/.verdict" ]; then
-            run_agent "$EVAL_PROMPT" "evaluator"
+            EVAL_OUTPUT=$(cat "$LOOP_DIR/.verdict")
            if [ -f "$LOOP_DIR/.verdict" ]; then
                EVAL_OUTPUT=$(cat "$LOOP_DIR/.verdict")
            else
                log "WARNING: No verdict file found. Treating as REJECT."
                EVAL_OUTPUT="<verdict>REJECT</verdict><rejection_reason>Evaluator produced no verdict file</rejection_reason>"
            fi
        else
-            # Headless: capture output for parsing.
+            log "WARNING: No verdict file found. Treating as REJECT."
-            EVAL_OUTPUT=$(run_agent "$EVAL_PROMPT" "evaluator")
+            EVAL_OUTPUT="<verdict>REJECT</verdict><rejection_reason>Evaluator produced no verdict file</rejection_reason>"
            if [ -z "$EVAL_OUTPUT" ]; then
                log "WARNING: Evaluator produced empty output. Treating as REJECT."
                EVAL_OUTPUT="<verdict>REJECT</verdict><rejection_reason>Evaluator produced no output</rejection_reason>"
            fi
        fi
        VERDICT=$(parse_verdict "$EVAL_OUTPUT")
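The file-based verdict handling in the loop can be sketched on its own. The fallback string below is the one from the diff. `extract_verdict` is a hypothetical stand-in for parse_verdict, whose real implementation is not shown in this change, and `loop_dir` is a temporary directory created just for the example.

```shell
#!/bin/bash

# Temporary directory standing in for $LOOP_DIR (assumption for the demo).
loop_dir=$(mktemp -d)

# Return the verdict file's contents, or a synthetic REJECT if it is missing.
read_verdict() {
    if [ -f "$loop_dir/.verdict" ]; then
        cat "$loop_dir/.verdict"
    else
        echo "<verdict>REJECT</verdict><rejection_reason>Evaluator produced no verdict file</rejection_reason>"
    fi
}

# Hypothetical stand-in for parse_verdict: pull the word between the tags.
extract_verdict() {
    printf '%s\n' "$1" | sed -n 's|.*<verdict>\([A-Z]*\)</verdict>.*|\1|p' | head -n 1
}

# No file yet: the loop treats the iteration as rejected.
missing=$(extract_verdict "$(read_verdict)")

# Evaluator wrote a verdict: the file's content wins.
echo "<verdict>PASS</verdict>" > "$loop_dir/.verdict"
present=$(extract_verdict "$(read_verdict)")

echo "without file: $missing"
echo "with file:    $present"
```

Defaulting to REJECT when the file is missing means a crashed or interrupted evaluator session cannot be mistaken for an approval.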
--- a/prompts/evaluator/_base.md
+++ b/prompts/evaluator/_base.md
@@ -10,7 +10,7 @@ You (Claude) have well-documented tendencies that make you a poor QA agent by de
 **OVERRIDE ALL OF THESE.** Your value comes from finding problems. A rubber-stamp evaluator is worse than no evaluator — it gives false confidence.
-**Rejection is normal and healthy.** Rejecting 30-50% of iterations is expected.
+**Rejection is normal and healthy.** Do not hesitate to reject when criteria aren't met.
 ## Your Target
@@ -27,6 +27,36 @@ Evaluate story **`{{CURRENT_STORY_ID}}`**.
 7. Run quality checks yourself (typecheck, tests, lint)
 8. **Actually run the code.** Use whatever tools are available. Code that looks correct but doesn't run is not complete.
 ## Calibration Examples
 <example type="bad-evaluation">
 "The generator created the new module and updated the config. The code looks clean and follows the existing pattern. Tests were not run but the implementation appears correct. PASS."
 Why this is wrong: "appears correct" is not verification. The evaluator didn't run tests, didn't check that the new module is actually imported and used, and didn't read the modified files in full. This is a rubber stamp.
 </example>
 <example type="good-rejection">
 "Checked acceptance criteria. Criterion 3 says 'both files import the shared utility instead of defining their own'. Verified file A — correct. Checked file B — still defines a local copy at line 36 and does not import the shared one. Also: file B line 96 calls a function from a module whose import was removed during the refactoring — this will crash at runtime.
 REJECT: File B still has local duplicate (criterion 3 not met) and missing import will cause runtime error."
 Why this is good: Verified each criterion against actual code with file paths and line numbers. Caught a regression the generator introduced. Specific and actionable.
 </example>
 <example type="good-pass">
 "Checked all 4 acceptance criteria:
 1. New validation logic is active — verified at config.py:23-28. ✓
 2. Invalid input returns the expected error — verified at config.py:26. ✓
 3. Old workaround removed — grep returns zero matches. ✓
 4. Existing behavior unchanged — logic only triggers on the new condition. ✓
 Ran git diff: only 2 files modified, changes scoped to this story. No imports removed, no regressions in surrounding code.
 PASS."
 Why this is good: Each criterion checked against specific lines. Verified no collateral damage. Concise but thorough.
 </example>
 ## Verdict
 Write your verdict to `{{LOOP_DIR}}/.verdict` AND include it in your response.
--- a/prompts/evaluator/explore.md
+++ b/prompts/evaluator/explore.md
@@ -37,7 +37,7 @@ Claims Verified:
 ## Grading Criteria
-- **Accuracy**: How many claims are correct? (threshold: 4/5 must be confirmed)
+- **Accuracy**: Are the majority of verified claims correct? If more than one claim is incorrect, reject.
 - **Completeness**: Did it cover the important parts of the area?
 - **Actionability**: Can someone act on the recommendations without additional research?
--- a/prompts/evaluator/fix.md
+++ b/prompts/evaluator/fix.md
@@ -9,8 +9,7 @@ You are evaluating a bug fix or tech debt reduction. The generator claims to hav
   - Would this fix survive edge cases?
   - Did the generator patch around the bug or fix the actual cause?
-2. **Verify a regression test exists:**
+2. **If the acceptance criteria require a regression test, verify it exists:**
   - Is there a new or updated test?
   - Does the test actually reproduce the original bug scenario?
   - Would the test fail if the fix were reverted?
@@ -27,7 +26,7 @@ You are evaluating a bug fix or tech debt reduction. The generator claims to hav
 ## Rejection Criteria (Fix-Specific)
 - Fix addresses symptom but not root cause
-- No regression test added
+- Acceptance criteria require a regression test but none was added
 - Existing tests fail after the fix
 - Unrelated changes included in the commit
 - Fix introduces a new bug or security issue
--- a/prompts/evaluator/implement.md
+++ b/prompts/evaluator/implement.md
@@ -15,3 +15,6 @@ You are evaluating an implementation story. The generator claims to have built a
 - Tests exist but don't assert meaningful behavior
 - Passes typecheck only because types are overly loose
 - Code exists but doesn't actually run
 - Removed an import or variable during refactoring but it's still used elsewhere in the file
 - New instance of a shared resource (e.g., DB connection, rate limiter) instead of using the existing one
 - Internal error details (stack traces, exception messages) exposed in user-facing output instead of being logged server-side
--- a/prompts/generator/_base.md
+++ b/prompts/generator/_base.md
@@ -1,24 +1,46 @@
 You are a Generator agent in an autonomous agent loop. Each iteration you complete ONE task, then stop. A fresh instance runs each iteration — you have no memory except what's in artifacts.
-## Startup
+## Startup (follow this exact sequence before writing any code)
 1. Read `.loop/progress.md` — check Codebase Patterns first, then recent log entries
 2. Read `.loop/prd.json` — find the highest-priority story where `passes: false`
 3. Read the sprint contract at `.loop/contracts/{story-id}.contract.md` (if it exists)
 4. Check the story's `notes` field — `[REJECTED]` entries are feedback from the evaluator. Address them.
 5. Run `git log --oneline -10` — understand what previous iterations changed
 6. If the project has tests or a dev server, run a quick smoke test to verify the codebase is healthy. If a previous iteration broke something, fix it before moving on.
 Do NOT start implementation until steps 1-5 are complete.
 ## Rules
 - **ONE story per iteration.** Do not attempt multiple stories.
 - **Read before writing.** Understand existing code before modifying.
 - **No placeholders.** Every implementation must be complete and functional.
- **Run quality gates** before committing (typecheck, tests, lint — whatever the project uses).
+- **Run quality gates** before committing. Check for common tools (`npm test`, `pytest`, `cargo test`, `make test`, `go test ./...`) and run what's available. If no test tooling exists, verify manually.
 - **Commit** with message: `feat: [Story ID] - [Story Title]`
-## After Completing
+## If You Are Blocked
 If you cannot complete the story (missing dependency, impossible as written, requires access you don't have), do NOT attempt a partial or broken implementation. Instead:
 1. Write a clear description of the blocker in the story's `notes` field in prd.json
 2. Leave `passes` as `false`
 3. Append the blocker to progress.md
 4. Stop — the loop will move on or escalate to a human
 ## Do Not Modify
 - Other stories' `passes`, `notes`, or `acceptanceCriteria` fields — only modify the story you are working on
 - Sprint contracts in `.loop/contracts/`
 - `.loop/config.json`
 ## Before Marking Done
 Go through each acceptance criterion in the story and verify your work satisfies it. Check the actual code, not your memory of what you wrote. If any criterion is not met, fix it before continuing. Do NOT set `passes: true` until every criterion is verified.
 ## After Verified
 1. Update `.loop/prd.json` — set `passes: true` for the story
-2. Append a summary to `.loop/progress.md` — what was done, files changed, learnings
+2. Append a summary to `.loop/progress.md` — what was done and which files were changed
 3. Update Codebase Patterns in progress.md if you discovered a reusable pattern
 ## Completion Signal
--- a/prompts/generator/fix.md
+++ b/prompts/generator/fix.md
@@ -8,7 +8,7 @@ You are fixing bugs or reducing tech debt from a prioritized list. Each story is
 2. Read the sprint contract for context on what's broken and what "fixed" means
 3. **Understand the root cause before changing anything.** Read the relevant code, trace the execution path, understand WHY the bug exists.
 4. Make the minimal change to fix the issue
-5. Write or update a test that would have caught this bug
+5. If the story's acceptance criteria require a regression test, write one
 6. Run quality gates
 7. Commit
@@ -16,7 +16,7 @@ You are fixing bugs or reducing tech debt from a prioritized list. Each story is
 - **Fix only what the story describes.** Do not fix adjacent issues, even if you notice them. Note them in progress.md for future iterations.
 - **Minimal diff.** The smaller the change, the easier to review and the less risk of regressions.
-- **Add a regression test.** Every bug fix should include a test that reproduces the bug and verifies the fix. If no test framework exists, note this in progress.md.
+- **Add a regression test only if the acceptance criteria require it.** Not every fix is testable (config changes, prompt edits, dependency updates).
 - **Preserve behavior.** For tech debt refactors, the external behavior must not change. Only internal structure should improve.
 ## Git Workflow
--- a/prompts/generator/implement.md
+++ b/prompts/generator/implement.md
@@ -16,7 +16,7 @@ You are building features from a PRD. Each story is a small, self-contained unit
 - **Minimal changes only.** Do not refactor surrounding code or add features beyond scope.
 - **Follow the contract's Out of Scope section.**
-- **If tests don't exist yet,** write them as part of the story.
+- **Write tests only if the story's acceptance criteria require them.**
 - **If you need a dependency,** install it and note it in progress.md.
 ## Git
--- a/prompts/planner/plan.md
+++ b/prompts/planner/plan.md
@@ -9,26 +9,30 @@ When breaking a feature into stories, think about:
 ### Independence
 Each story should be independently deployable. After completing story N, the codebase should be in a valid, working state — even if the feature isn't fully built yet.
-### Context Window Fit
+### Scope
-A story must fit in a single AI context window (~100K tokens). This means:
+A story must be completable in a single iteration. Keep each story focused — a handful of files modified, not a sweeping change across the whole codebase. If a story requires reading and modifying more than ~10 files, it's too big — split it.
-- Reading relevant existing code
-- Understanding the task
-- Implementing the change
-- Writing tests
-- Running quality checks
-- Committing
-Budget roughly:
-- 30% of context for reading/understanding
-- 40% for implementation
-- 20% for testing and quality
-- 10% for bookkeeping (prd.json, progress.md)
 ### Failure Isolation
 If a story fails (evaluator rejects it), the next iteration should be able to retry it cleanly. Stories with too many moving parts are hard to retry because partial state is messy.
 ### Evaluability
-Every story must have criteria the evaluator can independently verify. "The code is clean" is not evaluable. "The function returns 404 when the user doesn't exist" is evaluable.
+Every story must have criteria the evaluator can independently verify by reading code, running commands, or testing behavior.
 Good criteria are specific and checkable:
 - "Grep for 'HARDCODED_KEY' returns zero matches"
 - "The function returns 404 when the user doesn't exist"
 - "Running `npm test` passes with no failures"
 - "The config file contains entries for all three required env vars"
 Bad criteria are vague with no way to check:
 - "The code is clean"
 - "Works correctly"
 - "Performance is improved"
 - "Error handling is robust"
 For subjective work (design, UX, documentation), criteria should define what to evaluate and how to judge it — not just say "looks good":
 - "Design uses a consistent color palette and typography — no default library styles"
 - "A user can complete the primary action without guessing what to click"
 ## PRD Anti-Patterns
--- a/setup.sh
+++ b/setup.sh
@@ -10,10 +10,31 @@
 set -euo pipefail
 # --- Parse arguments ---
 ACTION="scaffold"
 MODE="${1:-implement}"
 WORKTREE_PATH=""
 MAIN_LOOP_DIR=""
 case "$MODE" in
    --update)
        ACTION="update"
        MODE="${2:-implement}"
        ;;
    --init-worktree)
        ACTION="init-worktree"
        WORKTREE_PATH="${2:-}"
        MAIN_LOOP_DIR="${3:-}"
        if [ -z "$WORKTREE_PATH" ] || [ -z "$MAIN_LOOP_DIR" ]; then
            echo "[setup] ERROR: --init-worktree requires <worktree_path> <main_loop_dir>"
            echo "[setup] Usage: setup.sh --init-worktree /path/to/worktree /path/to/main/.loop"
            exit 1
        fi
        ;;
 esac
 # --- Validate mode ---
-if [[ ! "$MODE" =~ ^(implement|explore|fix)$ ]]; then
+if [ "$ACTION" = "scaffold" ] && [[ ! "$MODE" =~ ^(implement|explore|fix)$ ]]; then
    echo "[setup] ERROR: Invalid mode '$MODE'. Must be: implement, explore, fix"
    exit 1
 fi
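With `set -u` active, expanding an unset positional parameter such as `$2` aborts the script with an "unbound variable" error before any custom usage message can run. A minimal sketch of guarding optional positionals with a default expansion follows; the function name `parse_init_worktree` is hypothetical and not part of setup.sh.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in setup.sh): validate the two arguments that
# --init-worktree expects, without tripping `set -u` when they are missing.
parse_init_worktree() {
    # ${1:-} expands to an empty string instead of aborting when unset
    local worktree_path="${1:-}"
    local main_loop_dir="${2:-}"
    if [ -z "$worktree_path" ] || [ -z "$main_loop_dir" ]; then
        echo "ERROR: --init-worktree requires <worktree_path> <main_loop_dir>" >&2
        return 1
    fi
    printf '%s\n%s\n' "$worktree_path" "$main_loop_dir"
}

# Usage: both arguments present, prints one per line
parse_init_worktree /tmp/wt /tmp/main/.loop
```

Calling the helper with fewer than two arguments returns a non-zero status and prints the usage error to stderr, rather than dying inside the shell's own unbound-variable check.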
@@ -45,6 +66,65 @@ fi
 echo "[setup] Harness source: $HARNESS_SRC"
 # Read plugin version from source
 PLUGIN_VERSION=""
 if [ -f "$HARNESS_SRC/.claude-plugin/plugin.json" ]; then
    if command -v jq &>/dev/null; then
        PLUGIN_VERSION=$(jq -r '.version // empty' "$HARNESS_SRC/.claude-plugin/plugin.json" 2>/dev/null)
    elif command -v python3 &>/dev/null; then
        PLUGIN_VERSION=$(python3 -c "import json; print(json.load(open('$HARNESS_SRC/.claude-plugin/plugin.json')).get('version',''),end='')" 2>/dev/null)
    fi
 fi
 # --- Update-only mode: refresh harness files without touching run state ---
 if [ "$ACTION" = "update" ]; then
    LOOP_DIR="$(pwd)/.loop"
    if [ ! -d "$LOOP_DIR" ]; then
        echo "[setup] ERROR: No .loop/ directory found. Run setup first."
        exit 1
    fi
    echo "[setup] Updating harness files..."
    cp -r "$HARNESS_SRC/prompts"    "$LOOP_DIR/"
    cp -r "$HARNESS_SRC/templates"  "$LOOP_DIR/"
    cp -r "$HARNESS_SRC/lib"        "$LOOP_DIR/"
    cp    "$HARNESS_SRC/loop.sh"    "$LOOP_DIR/"
    chmod +x "$LOOP_DIR/loop.sh"
    [ -n "$PLUGIN_VERSION" ] && echo "$PLUGIN_VERSION" > "$LOOP_DIR/.harness-version"
    echo "[setup] Harness updated to ${PLUGIN_VERSION:-unknown}. Run state (prd.json, contracts, config.json) unchanged."
    exit 0
 fi
 # --- Init-worktree mode: initialize .loop/ in a worktree from main's config ---
 if [ "$ACTION" = "init-worktree" ]; then
    LOOP_DIR="$WORKTREE_PATH/.loop"
    mkdir -p "$LOOP_DIR"
    # Copy harness files from plugin source
    cp -r "$HARNESS_SRC/prompts"    "$LOOP_DIR/"
    cp -r "$HARNESS_SRC/templates"  "$LOOP_DIR/"
    cp -r "$HARNESS_SRC/lib"        "$LOOP_DIR/"
    cp    "$HARNESS_SRC/loop.sh"    "$LOOP_DIR/"
    chmod +x "$LOOP_DIR/loop.sh"
    # Copy project config and init from main's .loop/
    [ -f "$MAIN_LOOP_DIR/config.json" ] && cp "$MAIN_LOOP_DIR/config.json" "$LOOP_DIR/"
    [ -f "$MAIN_LOOP_DIR/init.sh" ] && cp "$MAIN_LOOP_DIR/init.sh" "$LOOP_DIR/"
    # Stamp harness version
    [ -n "$PLUGIN_VERSION" ] && echo "$PLUGIN_VERSION" > "$LOOP_DIR/.harness-version"
    # Create .gitignore for worktree's .loop/
    cat > "$LOOP_DIR/.gitignore" << 'GITIGNORE'
 *
 GITIGNORE
    echo "[setup] Worktree .loop/ initialized at $LOOP_DIR"
    exit 0
 fi
 # --- Ensure git repo exists ---
 if ! git rev-parse --git-dir &>/dev/null; then
    echo "[setup] No git repo found. Initializing..."
@@ -57,9 +137,17 @@ PROJECT_ROOT="$(pwd)"
 LOOP_DIR="$PROJECT_ROOT/.loop"
 if [ -d "$LOOP_DIR" ] && [ -f "$LOOP_DIR/prd.json" ]; then
-    echo "[setup] .loop/ already exists with prd.json."
+    echo "[setup] .loop/ already exists with prd.json — archiving previous run..."
-    echo "[setup] To re-initialize, delete .loop/ first: rm -rf .loop"
+    # Source state.sh (needed by archive.sh for story queries) and archive.sh
-    exit 1
+    LOOP_DIR="$LOOP_DIR" source "$LOOP_DIR/lib/state.sh" 2>/dev/null || true
    LOOP_DIR="$LOOP_DIR" source "$LOOP_DIR/lib/archive.sh" 2>/dev/null || true
    if type archive_and_reset &>/dev/null; then
        archive_and_reset "$LOOP_DIR"
    else
        # Fallback for old harness versions without archive_and_reset
        echo "[setup] WARNING: Could not archive (old harness version). To re-initialize, delete .loop/ first: rm -rf .loop"
        exit 1
    fi
 fi
 mkdir -p "$LOOP_DIR"
@@ -71,6 +159,9 @@ cp -r "$HARNESS_SRC/lib"        "$LOOP_DIR/"
 cp    "$HARNESS_SRC/loop.sh"    "$LOOP_DIR/"
 chmod +x "$LOOP_DIR/loop.sh"
 # Stamp harness version
 [ -n "$PLUGIN_VERSION" ] && echo "$PLUGIN_VERSION" > "$LOOP_DIR/.harness-version"
 # Verify critical files
 for f in prompts/generator/_base.md prompts/evaluator/_base.md templates/progress.md.template lib/state.sh loop.sh; do
    if [ ! -f "$LOOP_DIR/$f" ]; then
@@ -91,6 +182,8 @@ triage/
 archive/
 .archive-staging/
 .last-branch
 .harness-version
 .active-worktree
 .loop.lock
 GITIGNORE
--- a/skills/run/SKILL.md
+++ b/skills/run/SKILL.md
@@ -1,16 +1,18 @@
 ---
 name: run
-description: "Agent Loop — single entry point. Scaffolds .loop/ if missing, generates stories if no prd.json, then launches autonomous execution in tmux."
+description: "Agent Loop — single entry point. Scaffolds .loop/ if missing, creates a worktree, generates stories, then launches autonomous execution in tmux."
 ---
 # /run — Agent Loop
-Single entry point for the agent loop. Handles setup and planning interactively, then launches autonomous execution in a tmux session.
+Single entry point for the agent loop. Handles setup and planning interactively, then launches autonomous execution in a git worktree via tmux.
 Each run gets its own worktree (isolated working directory on a feature branch). Multiple loops can run in parallel on different specs. Completed runs are archived to the main project's `.loop/archive/`.
 ## Usage
 ```
-/agent-loop:run                    # Full flow: setup → stories → launch
+/agent-loop:run                    # Full flow: setup → worktree → stories → launch
 /agent-loop:run --skip-eval        # Skip evaluator pass
 ```
@@ -20,9 +22,9 @@ Follow this sequence. Each phase checks what exists and skips if already done.
 ---
-## Phase 1: Scaffold (if needed)
+## Phase 1: Scaffold Main .loop/ (if needed)
-Check if `.loop/config.json` exists.
+Check if `.loop/config.json` exists in the current project root.
 **If it does NOT exist**, run the setup script:
@@ -31,116 +33,202 @@ Ask the user: **Mode?** (a) Implement (b) Explore (c) Fix — default is Impleme
 Then run:
 ```bash
-bash "$(ls -d ~/.claude/plugins/cache/agent-loop/agent-loop/*/setup.sh 2>/dev/null | head -1)" <mode>
+bash "$(ls -d ~/.claude/plugins/cache/agent-loop/agent-loop/*/setup.sh 2>/dev/null | tail -1)" <mode>
 ```
 Show the output. If setup fails, stop.
-**If it already exists**, skip to Phase 2.
+**If it already exists**, check if the harness files need updating. Compare the installed harness version against the plugin version:
 ```bash
 INSTALLED=$(cat .loop/.harness-version 2>/dev/null || echo "unknown")
 PLUGIN=$(jq -r '.version // empty' "$(ls -d ~/.claude/plugins/cache/agent-loop/agent-loop/*/.claude-plugin/plugin.json 2>/dev/null | tail -1)" 2>/dev/null || echo "unknown")
 echo "installed=$INSTALLED plugin=$PLUGIN"
 ```
 If the versions differ (or installed is "unknown"), update the harness files:
 ```bash
 bash "$(ls -d ~/.claude/plugins/cache/agent-loop/agent-loop/*/setup.sh 2>/dev/null | tail -1)" --update
 ```
 Tell the user: *"Updated harness files to v{version}."*
 ---
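The `ls -d ... | tail -1` lookups in this phase rely on lexicographic ordering, under which a directory named `0.9.0` sorts after `0.12.0`. If several plugin versions can sit in the cache at once, a version-aware sort may be safer. The sketch below assumes the wildcard directory is named after the plugin version and that the local `sort` supports `-V`; `latest_version_dir` is a hypothetical helper, not part of the plugin.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the highest-versioned subdirectory of a cache root.
# Assumes each subdirectory is named after a version such as 0.9.0 or 0.12.0.
latest_version_dir() {
    local root="$1"
    local d
    for d in "$root"/*/; do
        [ -d "$d" ] || continue
        basename "$d"
    done | sort -V | tail -n 1
}

# Demo with a throwaway directory standing in for the plugin cache
demo_root=$(mktemp -d)
mkdir -p "$demo_root/0.8.0" "$demo_root/0.9.0" "$demo_root/0.12.0"
latest_version_dir "$demo_root"
```

In the demo the helper selects `0.12.0`, whereas plain lexicographic order would put `0.9.0` last.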
-## Phase 2: Generate Stories (if needed)
+## Phase 2: Create Worktree and Generate Stories
-Check if `.loop/prd.json` exists.
+### 2a. Find the spec
-**If it does NOT exist**, generate it:
+Search for existing specs or plans:
 - `docs/superpowers/specs/*.md`
 - `docs/superpowers/plans/*.md`
 - `docs/specs/*.md`
 - `docs/plans/*.md`
 - `SPEC.md`, `PRD.md`, `DESIGN.md`, `PLAN.md` at project root
 - Any markdown file that looks like a feature spec or implementation plan
-1. Search for existing specs or plans:
+If found: "I found a spec at `{path}`. Using it to generate stories."
-   - `docs/superpowers/specs/*.md`
-   - `docs/superpowers/plans/*.md`
-   - `docs/specs/*.md`
-   - `docs/plans/*.md`
-   - `SPEC.md`, `PRD.md`, `DESIGN.md`, `PLAN.md` at project root
-   - Any markdown file that looks like a feature spec or implementation plan
-   If found: "I found a spec at `{path}`. Using it to generate stories."
+If NOT found, stop and tell the user:
-   If NOT found, stop and tell the user:
+> **No spec or plan found.** Agent Loop decomposes existing plans into stories — it doesn't create plans from scratch.
 >
 > Create a plan first, then re-run `/agent-loop:run`:
 > - Describe your idea to Claude and ask it to write a spec
 > - Use `/plan` if available
 > - Or create a markdown file at `docs/specs/` or `SPEC.md`
 >
 > The plan should describe what to build, the tech stack, and key requirements.
-   > **No spec or plan found.** Agent Loop decomposes existing plans into stories — it doesn't create plans from scratch.
+**STOP here. Do NOT ask the user to describe the project in a few sentences. Do NOT proceed without a spec file.**
-   >
-   > Create a plan first, then re-run `/agent-loop:run`:
-   > - Describe your idea to Claude and ask it to write a spec
-   > - Use `/plan` if available
-   > - Or create a markdown file at `docs/specs/` or `SPEC.md`
-   >
-   > The plan should describe what to build, the tech stack, and key requirements.
-   **STOP here. Do NOT ask the user to describe the project in a few sentences. Do NOT proceed without a spec file.**
+### 2b. Derive names and create worktree
-2. Read the project root and tech stack info.
+Read the spec title or filename to derive a feature slug. Examples:
 - `SPEC.md` with title "# Enhanced Spikes Editor" → slug: `enhanced-spikes-editor`
 - `docs/specs/auth-system.md` → slug: `auth-system`
-3. Dispatch the **agent-loop:planner** agent:
+Derive paths:
 ```bash
 PROJECT_DIR=$(basename "$(pwd)")
 FEATURE_SLUG="<derived-slug>"
 BRANCH_NAME="loop/${FEATURE_SLUG}"
 WORKTREE_PATH="../${PROJECT_DIR}--loop-${FEATURE_SLUG}"
 MAIN_LOOP_DIR="$(pwd)/.loop"
 ```
 Check if the worktree already exists:
 ```bash
 if [ -d "$WORKTREE_PATH" ]; then
    echo "WORKTREE_EXISTS"
 else
    echo "WORKTREE_NEW"
 fi
 ```
 **If worktree exists**, check its state:
 - Read `{WORKTREE_PATH}/.loop/prd.json` — are all stories passed?
 - If all passed: ask user — "Previous run on `{BRANCH_NAME}` is complete. Archive and start fresh, or resume?"
 - If in progress: ask user — "Run in progress on `{BRANCH_NAME}` ({passed}/{total}). Resume, or archive and start fresh?"
 - If user says resume: skip to Phase 3 (launch in existing worktree)
 - If user says archive/fresh: archive from worktree to main, remove worktree, then continue below
 **If worktree is new**, create it:
 ```bash
 git worktree add "$WORKTREE_PATH" -b "$BRANCH_NAME"
 ```
 If the branch already exists (e.g., from a previous run):
 ```bash
 git worktree add "$WORKTREE_PATH" "$BRANCH_NAME"
 ```
 Initialize the worktree's `.loop/`:
 ```bash
 bash "$(ls -d ~/.claude/plugins/cache/agent-loop/agent-loop/*/setup.sh 2>/dev/null | tail -1)" --init-worktree "$WORKTREE_PATH" "$MAIN_LOOP_DIR"
 ```
 Initialize submodules if the project uses them:
 ```bash
 git -C "$WORKTREE_PATH" submodule update --init --recursive 2>/dev/null || true
 ```
 ### 2c. Generate stories
 Read the project root listing and tech stack info.
 Dispatch the **agent-loop:planner** agent. Pass the **absolute worktree path** so the planner writes to the worktree's `.loop/`:
 ```
 Agent(
  subagent_type: "agent-loop:planner",
-  prompt: "Generate prd.json and sprint contracts.\n\nMode: {mode}\nProject root: {path}\n\nSpec:\n{spec content}\n\nTech stack: {detected stack}",
+  prompt: "Generate prd.json and sprint contracts.\n\nIMPORTANT: Write ALL files using absolute paths under: {WORKTREE_PATH}/.loop/\n- PRD: {WORKTREE_PATH}/.loop/prd.json\n- Contracts: {WORKTREE_PATH}/.loop/contracts/\n- Progress: {WORKTREE_PATH}/.loop/progress.md\n\nBranch name to use in prd.json: {BRANCH_NAME}\n\nMode: {mode}\nProject root: {WORKTREE_PATH}\n\nSpec:\n{spec content}\n\nTech stack: {detected stack}",
  description: "Planning: generate stories"
 )
 ```
-4. After the planner finishes, read `.loop/prd.json` and present:
+### 2d. Present stories
 After the planner finishes, read `{WORKTREE_PATH}/.loop/prd.json` and present:
 > **Stories generated — Review before running**
 >
 > Worktree: `{WORKTREE_PATH}` (branch: `{BRANCH_NAME}`)
 >
 > 1. US-001: {title}
 > 2. US-002: {title}
 > ...
 >
 > **Review:**
-> - `.loop/prd.json` — stories and acceptance criteria
+> - `{WORKTREE_PATH}/.loop/prd.json` — stories and acceptance criteria
-> - `.loop/contracts/` — done conditions per story
+> - `{WORKTREE_PATH}/.loop/contracts/` — done conditions per story
 >
 > Let me know if you want changes, or say **go** to start the loop.
-5. **STOP and wait for the user.** Do NOT start the loop automatically. The user must say "go", "start", "run", "looks good", or similar before proceeding to Phase 3.
+**STOP and wait for the user.** Do NOT start the loop automatically. The user must say "go", "start", "run", "looks good", or similar before proceeding to Phase 3.
 **If `prd.json` already exists**, skip to Phase 3.
 ---
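Step 2b describes slug derivation only by example. One way to express it, as a sketch: lowercase the title, collapse runs of non-alphanumeric characters into a single hyphen, and trim hyphens from both ends. `slugify` is a hypothetical helper, and the exact normalization rules are an assumption rather than something the skill specifies.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: turn a spec title or file stem into a branch-safe slug.
slugify() {
    printf '%s' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# Derive the branch name the same way step 2b does, from a Markdown title
FEATURE_SLUG=$(slugify "# Enhanced Spikes Editor")
BRANCH_NAME="loop/${FEATURE_SLUG}"
echo "$BRANCH_NAME"

# For a spec identified by filename, strip the extension first
slugify "$(basename docs/specs/auth-system.md .md)"
```

Both examples from step 2b come out as listed there: `enhanced-spikes-editor` and `auth-system`.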
 ## Phase 3: Validate and Launch
-1. Read `.loop/prd.json` and verify:
+1. Read `{WORKTREE_PATH}/.loop/prd.json` and verify:
   - Has a `userStories` array (NOT `sprints`, `stories`, or `tasks`)
   - Each story has: `id`, `title`, `passes`, `priority`
   - If invalid, show the error and stop.
-2. Read `.loop/config.json` for `mode`, `maxIterations`.
+2. Read `{WORKTREE_PATH}/.loop/config.json` for `mode`, `maxIterations`.
-3. Verify `.loop/loop.sh` exists and is executable.
+3. Verify `{WORKTREE_PATH}/.loop/loop.sh` exists and is executable.
 4. Parse arguments for any flags to pass through (e.g., `--skip-eval`).
-5. Build the loop.sh command with any flags:
+5. Build the loop.sh command and derive a unique tmux session name:
 ```bash
-LOOP_CMD=".loop/loop.sh"
+LOOP_CMD="{WORKTREE_PATH}/.loop/loop.sh"
 # Add --skip-eval if requested
 # Add --max N if specified
 # Derive tmux session name from worktree directory name
 WORKTREE_DIR=$(basename "$WORKTREE_PATH")
 SESSION_NAME="agent-loop-${WORKTREE_DIR}"
 ```
-6. Kill any existing agent-loop tmux session, then launch detached:
+6. Kill any existing tmux session with this name, then launch detached in the worktree:
 ```bash
-tmux kill-session -t agent-loop 2>/dev/null; tmux new-session -d -s agent-loop -c <project_root> "$LOOP_CMD"
+tmux kill-session -t "$SESSION_NAME" 2>/dev/null; tmux new-session -d -s "$SESSION_NAME" -c "$WORKTREE_PATH" "$LOOP_CMD"
 ```
-7. Start a **background watcher** that waits for the loop to finish. Use the Bash tool with `run_in_background: true`:
+7. Save the worktree path and session name for the completion handler. Write a tracking file in main's .loop/:
 ```bash
-while tmux has-session -t agent-loop 2>/dev/null; do sleep 10; done; echo "LOOP_COMPLETE"
+cat > .loop/.active-worktree << EOF
 WORKTREE_PATH={WORKTREE_PATH}
 SESSION_NAME={SESSION_NAME}
 BRANCH_NAME={BRANCH_NAME}
 MAIN_LOOP_DIR={MAIN_LOOP_DIR}
 EOF
 ```
-This runs silently. When the tmux session exits, Claude Code gets notified automatically.
+8. Start a **background watcher** that waits for the loop to finish. Use the Bash tool with `run_in_background: true`:
-8. Tell the user:
+```bash
 while tmux has-session -t "$SESSION_NAME" 2>/dev/null; do sleep 10; done; echo "LOOP_COMPLETE"
 ```
-> **Loop launched.** Watch it live:
+9. Tell the user:
 > **Loop launched** as tmux session `{SESSION_NAME}`. Watch it live:
 > ```
-> ! tmux attach -t agent-loop
+> ! tmux attach -t {SESSION_NAME}
 > ```
 > (Type the above — it opens the session right here in your terminal.)
 >
@@ -149,6 +237,13 @@ This runs silently. When the tmux session exits, Claude Code gets notified autom
 > - Ask me "status" anytime and I'll check progress.
 >
 > I'll notify you when the loop finishes.
 >
 > When complete, merge with:
 > ```
 > git merge {BRANCH_NAME}
 > git worktree remove {WORKTREE_PATH}
 > git branch -d {BRANCH_NAME}
 > ```
 ---
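The completion handler reads `.loop/.active-worktree` back with `cat`. Because the file is plain `KEY=VALUE` lines, individual values can also be extracted without `source`, which avoids executing anything that ends up in the file. A sketch follows; `read_tracking_value` is a hypothetical helper and the paths are made-up demo values.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the value stored for KEY in a KEY=VALUE file.
# Returns non-zero when the key is absent.
read_tracking_value() {
    local file="$1" key="$2" k v
    # Split on the first '='; the remainder of the line lands in v
    while IFS='=' read -r k v; do
        if [ "$k" = "$key" ]; then
            printf '%s\n' "$v"
            return 0
        fi
    done < "$file"
    return 1
}

# Demo tracking file with illustrative values
tracking_file=$(mktemp)
cat > "$tracking_file" << EOF
WORKTREE_PATH=../myproject--loop-auth-system
SESSION_NAME=agent-loop-myproject--loop-auth-system
BRANCH_NAME=loop/auth-system
EOF

read_tracking_value "$tracking_file" SESSION_NAME
```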
@@ -156,18 +251,45 @@ This runs silently. When the tmux session exits, Claude Code gets notified autom
 When you receive the background task notification (the watcher prints "LOOP_COMPLETE"), the loop has finished. Automatically:
-1. Read `.loop/prd.json` — count passed/failed/blocked stories
+1. Read the tracking file to get paths:
-2. Read `.loop/progress.md` — show the latest session log entries
+
-3. Check `git log --oneline` for commits made during the run
+```bash
-4. Present a summary:
+cat .loop/.active-worktree
 ```
 2. Read `{WORKTREE_PATH}/.loop/prd.json` — count passed/failed/blocked stories
 3. Read `{WORKTREE_PATH}/.loop/progress.md` — show the latest session log entries
 4. Check `git log --oneline` on the feature branch for commits made during the run
 5. Archive the run to main's `.loop/archive/`:
 ```bash
 source .loop/lib/state.sh && source .loop/lib/archive.sh && archive_from_worktree "{WORKTREE_PATH}/.loop" "$(pwd)/.loop"
 ```
 6. Clean up the tracking file:
 ```bash
 rm -f .loop/.active-worktree
 ```
 7. Present a summary:
 > **Loop Complete**
 > - Stories: {passed}/{total} complete, {blocked} blocked
 > - Iterations: {from progress.md}
 > - Commits: {list from git log}
 > - Archived to: `.loop/archive/{date}-{feature}/`
 >
 > {If any stories blocked: "Some stories need human review. Run /agent-loop:triage for details."}
 > {If all passed: "All stories complete. Review the code and test it."}
 >
 > **When ready to merge:**
 > ```
 > git merge {BRANCH_NAME}
 > git worktree remove {WORKTREE_PATH}
 > git branch -d {BRANCH_NAME}
 > ```
 ---
@@ -175,11 +297,23 @@ When you receive the background task notification (the watcher prints "LOOP_COMP
 If the user asks about progress (e.g., "status", "how's it going"):
-1. Read `.loop/prd.json` — count passed/failed/blocked
+1. Check for active worktree tracking:
-2. Capture recent tmux output:
 ```bash
-tmux capture-pane -t agent-loop -p | tail -20
+cat .loop/.active-worktree 2>/dev/null
 ```
-3. Report current status.
+If no tracking file, check for tmux sessions matching the pattern:
 ```bash
 tmux list-sessions 2>/dev/null | grep "^agent-loop-"
 ```
 2. Read `{WORKTREE_PATH}/.loop/prd.json` — count passed/failed/blocked
 3. Capture recent tmux output:
 ```bash
 tmux capture-pane -t "$SESSION_NAME" -p | tail -20
 ```
 4. Report current status.
--- a/skills/stories/SKILL.md
+++ b/skills/stories/SKILL.md
@@ -7,6 +7,8 @@ description: "Generate prd.json and sprint contracts by dispatching the planner
 Dispatch the planner agent to decompose a spec into stories. The planner agent cannot write source code or run bash commands — it can only write to `.loop/`.
 **Note:** In most cases, use `/agent-loop:run` instead — it handles worktree creation, story generation, and launching the loop in one flow. Use `/agent-loop:stories` only if you want to generate stories without launching the loop.
 ## Instructions
 ### 1. Check prerequisites
@@ -40,9 +42,15 @@ Agent(
 )
 ```
 If a worktree path is known (e.g., passed as context), include it in the prompt:
 ```
 IMPORTANT: Write ALL files using absolute paths under: {WORKTREE_PATH}/.loop/
 ```
 ### 5. Present results
-After the planner finishes, read `.loop/prd.json` and show the user:
+After the planner finishes, read `.loop/prd.json` (or `{WORKTREE_PATH}/.loop/prd.json`) and show the user:
 > **Plan Ready — Review Before Running**
 >
Author	SHA1	Message	Date
Sheldon Finlay	ecfbd0bb37	feat: worktree-based run isolation for parallel loops Each /agent-loop:run now creates a git worktree for the feature branch before generating stories. This provides full isolation: - Multiple loops can run in parallel on different specs in the same project - Main working directory stays on main, always available - Each worktree has its own .loop/ state, tmux session, and branch - Completed runs are archived to main's .loop/archive/ with runs.log Changes: - setup.sh: add --init-worktree mode for initializing worktree .loop/ - archive.sh: add archive_from_worktree() for cross-directory archiving - loop.sh: replace branch checkout with validation (worktree is pre-checked-out) - agents/planner.md: accept absolute path prefix for worktree .loop/ writes - skills/run/SKILL.md: full rewrite — worktree creation in Phase 2, launch in Phase 3, archive on completion, .active-worktree tracking file - skills/stories/SKILL.md: worktree-aware, defer to /run for full flow Bump to 0.12.0.	2026-04-02 11:21:17 -04:00
Sheldon Finlay	344b179b4d	feat: support parallel loops with per-project tmux session names The tmux session name is now derived from the project directory name (e.g., agent-loop-server, agent-loop-webapp). This allows running multiple loops in parallel on different projects without collisions. Previously hardcoded to "agent-loop", which meant launching a second loop would kill the first project's tmux session.	2026-04-02 10:54:22 -04:00
Sheldon Finlay	b516492a91	fix: install Stop hook once at loop startup, not per-iteration Per-iteration install/remove had a race condition: settings.local.json was written immediately before CC started, and CC could read the old file (without the hook) on the first iteration. Now the hook is installed once when loop.sh starts and removed on exit. The AGENT_LOOP_ACTIVE env var guard ensures it only fires for CC sessions spawned by the loop, so keeping it installed the whole time is safe.	2026-04-02 10:51:48 -04:00
Sheldon Finlay	a1a3dfbd63	fix: use env var instead of tmux check for Stop hook scoping The tmux display-message approach had edge cases: it could succeed outside tmux, fail on first iteration, or behave differently depending on tmux socket state. Replace with AGENT_LOOP_ACTIVE env var exported by loop.sh. CC sessions spawned by the loop inherit it; interactive CC sessions don't. Simple, no external dependencies, no race conditions.	2026-04-02 10:42:46 -04:00
Sheldon Finlay	bab002b927	fix: prevent Stop hook from killing sessions outside tmux tmux display-message succeeds even outside tmux by falling back to the most recently created session (agent-loop). This caused the hook to match and kill interactive CC sessions. Fix: check $TMUX env var first — only set when actually inside tmux.	2026-04-02 09:14:43 -04:00
Sheldon Finlay	71b00cf11f	feat: auto-update harness files when plugin version changes setup.sh now stamps .harness-version in .loop/ at scaffold time. On each /agent-loop:run, Phase 1 compares the installed harness version against the plugin version and auto-updates lib/, prompts/, and loop.sh if stale. Run state (prd.json, contracts, config.json) is preserved. Also adds setup.sh --update mode for refreshing harness files without re-scaffolding. Bump to 0.10.0.	2026-04-02 09:02:41 -04:00
Sheldon Finlay	1bd8004854	fix: scope Stop hook to agent-loop tmux session only The Stop hook (kill -INT $PPID) was written to the project's settings.local.json, causing ANY Claude Code session in the same project to kill its parent shell on exit — not just the loop's sessions. Now the hook checks tmux session name before firing: only CC sessions inside the "agent-loop" tmux session trigger the kill. Other CC sessions in the same project are unaffected.	2026-04-02 08:17:15 -04:00
Sheldon Finlay	ad58a49182	feat: auto-archive completed runs before starting new features When /agent-loop:run detects a previous run with all stories passed (or the feature branch deleted after merge), it archives the old artifacts and resets .loop/ automatically — no more manual rm -rf .loop. - Add archive_and_reset() for on-demand archiving from skills - Add runs.log index tracking all archived runs - Update /run and /stories skills to detect completed runs - setup.sh archives instead of hard-failing when prd.json exists - Bump version to 0.9.0	2026-04-02 07:40:07 -04:00
Sheldon Finlay	ce111b4cbe	feat: add guidance for subjective acceptance criteria Planner now has examples for design/UX criteria that are evaluable without being purely binary. Prevents the planner from avoiding qualitative criteria just because they aren't grep-checkable.	2026-03-28 12:59:42 -04:00
Sheldon Finlay	77fd9e0cd6	feat: add concrete examples of good vs bad acceptance criteria Planner now sees specific examples of verifiable criteria (grep, test commands, file checks) alongside vague anti-patterns. Drives higher story quality which directly improves evaluator accuracy.	2026-03-28 12:56:53 -04:00
Sheldon Finlay	1efca3c185	feat: add blocker handling and artifact protection to generator Generator now has explicit instructions for when it's stuck: write the blocker to notes, leave passes as false, and stop. Also adds a "Do Not Modify" section preventing changes to other stories, contracts, or config.	2026-03-28 12:40:05 -04:00
Sheldon Finlay	e4df81fdac	feat: add self-verification gate before generator marks story done Generator must now verify each acceptance criterion against actual code before setting passes: true. Acts as a first filter before the evaluator runs, reducing false completions.	2026-03-28 12:36:24 -04:00
Sheldon Finlay	6833d94cf4	docs: mention using Claude or /plan to generate specs	2026-03-28 12:26:40 -04:00
Sheldon Finlay	c293f53d90	docs: make runtime verification claim accurate Only claim what the evaluator actually does: runs tests, builds, and checks for errors. Don't overstate MCP server discovery.	2026-03-28 12:20:31 -04:00
Sheldon Finlay	9fd428ac51	docs: replace specific MCP recommendations with general guidance Avoid maintaining specific install commands that will go stale. The evaluator uses whatever tools are available — let users configure their own testing environment.	2026-03-28 12:19:50 -04:00
Sheldon Finlay	c46de6815c	refactor: remove headless mode Headless mode was half-built and untested. Agent-loop is a plugin that runs interactively via tmux — there's no CI use case yet. Removes --headless flag, timeout compatibility shim, output capture logic, and LOOP_AGENT_TMPFILE handling. Cuts 82 lines from loop.sh.	2026-03-28 12:17:30 -04:00
Sheldon Finlay	b4d4e1952a	docs: rewrite README for plugin-first install - Remove manual install and install.sh references - Add prerequisites section (tmux, jq/python3) - Add step to write a spec before running - Fix "PRD" → "spec" in modes table - Add --headless to options list - Update generator description with startup sequence - Note evaluator calibration examples	2026-03-28 12:01:05 -04:00
Sheldon Finlay	60ce0fef54	fix: tighten vague language across all prompt files - Remove blanket "write tests" instructions; tests only when acceptance criteria require them - Replace arbitrary "30-50% rejection rate" with clear directive - Replace "4/5 threshold" with "majority of claims" rule - List concrete quality gate commands instead of "whatever project uses" - Remove "learnings" from progress summary (too vague) - Make error-leak pattern generic (not HTTP-specific) - Align fix evaluator with updated test expectations	2026-03-28 11:58:13 -04:00
Sheldon Finlay	f26bdce534	fix: replace misleading context budget percentages with scope guidance The planner prompt had vague context window budget percentages that don't reflect how agents actually work. Replaced with concrete scope guidance (keep stories to ~10 files) which aligns with the existing scope budgets in config.json.	2026-03-28 11:49:04 -04:00
Sheldon Finlay	2dc291aac4	fix: make evaluator calibration examples project-agnostic Replace ChaosRush-specific references with generic examples that apply to any codebase.	2026-03-28 11:21:11 -04:00
Sheldon Finlay	1d059e218b	feat: add few-shot calibration examples to evaluator prompt Three examples showing bad rubber-stamp, good rejection, and good pass patterns. Based on Anthropic's harness design recommendation to calibrate evaluators with few-shot score breakdowns, and informed by real failures observed in a production loop run.	2026-03-28 11:15:52 -04:00
Sheldon Finlay	80b0f0f4c1	feat: add regression patterns to evaluator implement prompt Three new failure patterns: missing imports after refactoring, orphaned resource instances, and error detail leakage. These were observed in a real loop run where the evaluator missed them.	2026-03-28 10:57:44 -04:00
Sheldon Finlay	5e4ad3b12e	feat: add smoke test step to generator startup sequence Generator now runs a quick health check before implementing if the project has tests or a dev server. Catches regressions from previous iterations early instead of building on a broken foundation.	2026-03-27 21:09:36 -04:00
Sheldon Finlay	9a7fa3a1bd	fix: enforce strict orientation sequence in generator prompt Add git log step and explicit gate requiring all startup steps complete before implementation begins. Based on Anthropic's prompting guide recommendation for prescriptive session orientation.	2026-03-27 21:07:48 -04:00