feat: agent loop harness with Claude Code plugin support

Generator-evaluator architecture with iterative context-reset for long-running coding tasks. Ships as a Claude Code plugin — install with /plugin and use /agent-loop:init, /agent-loop:plan, /agent-loop:run.
2026-03-27 08:03:18 -04:00
commit 17e5eb707f
29 changed files with 2546 additions and 0 deletions
--- a/skills/loop-init/SKILL.md
+++ b/skills/loop-init/SKILL.md
@@ -0,0 +1,141 @@
+---
+name: init
+description: Initialize the agent loop harness in the current project. Scaffolds .loop/ directory, detects tech stack, picks mode, generates config, and flows into planning.
+---
+
+# /init — Initialize Agent Loop for a Project
+
+Set up the agent loop harness in the current project. This is the entry point for first-time use.
+
+## What This Skill Does
+
+1. Scaffolds the `.loop/` directory with prompts, templates, and lib scripts from the plugin
+2. Analyzes the project to understand its tech stack, structure, and conventions
+3. Asks the user what they want to accomplish (explore, implement, or fix)
+4. Creates project-specific configuration (`config.json`, `init.sh`)
+5. Flows into planning to generate the PRD and sprint contracts
+
+## Instructions
+
+When the user invokes this skill, follow this sequence:
+
+### Step 0: Scaffold .loop/ Directory
+
+Check if `.loop/` already exists in the project root.
+
+**If it does NOT exist**, create it by copying from the plugin:
+
+1. The plugin's root directory is available at `${CLAUDE_PLUGIN_ROOT}`. Copy the harness files:
+
+```bash
+mkdir -p .loop
+cp -r "${CLAUDE_PLUGIN_ROOT}/prompts" .loop/
+cp -r "${CLAUDE_PLUGIN_ROOT}/templates" .loop/
+cp -r "${CLAUDE_PLUGIN_ROOT}/lib" .loop/
+cp "${CLAUDE_PLUGIN_ROOT}/loop.sh" .loop/
+chmod +x .loop/loop.sh
+```
+
+**IMPORTANT:** If `${CLAUDE_PLUGIN_ROOT}` is not set or the path doesn't exist, look for the files in the plugin's own directory structure. The prompts, templates, and lib directories are bundled with this plugin.
+
+2. Create `.loop/.gitignore` with runtime artifacts:
+
+```
+prd.json
+progress.md
+progress-archive.md
+config.json
+init.sh
+contracts/
+triage/
+archive/
+.archive-staging/
+.last-branch
+.loop.lock
+```
+
+**If `.loop/` already exists**, ask the user whether to re-initialize. Re-initializing resets the config but preserves `prd.json` and `progress.md` if they exist.
+
+### Step 1: Project Discovery
+
+Read the project to understand what we're working with:
+- Check for `CLAUDE.md`, `AGENTS.md`, `README.md` at the project root
+- Check for `package.json`, `Cargo.toml`, `pyproject.toml`, `go.mod`, `Package.swift`, `composer.json` to identify the tech stack
+- Run `ls` on the project root to see the top-level structure
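The manifest check above can be sketched as a simple lookup. This is a minimal illustration, not the plugin's implementation: the function name `detect_stack` and the label strings are assumptions made for the example.

```python
from pathlib import Path

# Manifest file name -> ecosystem it usually indicates.
MANIFESTS = {
    "package.json": "Node.js",
    "Cargo.toml": "Rust",
    "pyproject.toml": "Python",
    "go.mod": "Go",
    "Package.swift": "Swift",
    "composer.json": "PHP",
}


def detect_stack(project_root):
    """Return the ecosystems whose manifest file exists at the project root."""
    root = Path(project_root)
    return [label for name, label in MANIFESTS.items() if (root / name).is_file()]
```

A project can match more than one entry (for example a Go backend with a Node.js frontend), which is why the sketch returns a list rather than a single label.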
+
+Present a brief summary:
+> "I see this is a [language/framework] project with [key characteristics]. The main source is in [dir/]."
+
+### Step 2: Mode Selection
+
+Ask the user:
+
+> **What would you like to do?**
+>
+> a) **Explore** — Analyze the codebase to understand what exists, find issues, and document the system. No code changes.
+> b) **Implement** — Build a new feature from a PRD. Code changes, commits, and tests.
+> c) **Fix** — Work through a list of bugs or tech debt items. Targeted code changes.
+
+### Step 3: Clarifying Questions
+
+Based on the mode, ask 3-5 questions, such as:
+
+**For Explore:**
+- "What areas are you most interested in? (e.g., auth, database, API, frontend, everything)"
+- "Are there known problem areas you want me to focus on?"
+- "How many exploration sessions should I budget? (default: 20)"
+
+**For Implement:**
+- "Describe the feature you want to build (1-3 sentences is fine)"
+- "Are there any architectural constraints I should know about?"
+- "Should I follow any specific patterns from the existing codebase?"
+
+**For Fix:**
+- "Do you have a list of issues, or should I find them?"
+- "Any areas that are off-limits for changes?"
+- "What's the priority: security, stability, or code quality?"
+
+### Step 4: Generate Configuration
+
+Create `.loop/config.json` based on the project and user's answers:
+
+```json
+{
+  "tool": "claude",
+  "mode": "<selected mode>",
+  "maxIterations": <appropriate default>,
+  "skipEval": false,
+  "evalRetries": 2,
+  "autoHooks": true,
+  "branchPrefix": "loop/",
+  "scopeBudgets": {
+    "<selected mode>": { "maxFilesToRead": <n>, "maxLinesToWrite": <n>, "maxFilesToModify": <n> }
+  }
+}
+```
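The template above contains placeholders, so it is not valid JSON until they are filled in. A minimal sketch of producing a concrete file follows. It assumes the mode is stored as a lowercase string (`explore`, `implement`, `fix`) and that `scopeBudgets` is keyed by mode, which matches how the run skill reads `config.scopeBudgets.{mode}.maxFilesToRead`. The helper name `build_config` and the budget numbers are hypothetical.

```python
import json


def build_config(mode, max_iterations, budgets):
    """Serialize a config.json body; scope budgets are nested under the mode."""
    if mode not in ("explore", "implement", "fix"):  # assumed mode spellings
        raise ValueError(f"unknown mode: {mode}")
    config = {
        "tool": "claude",
        "mode": mode,
        "maxIterations": max_iterations,
        "skipEval": False,
        "evalRetries": 2,
        "autoHooks": True,
        "branchPrefix": "loop/",
        "scopeBudgets": {mode: budgets},
    }
    return json.dumps(config, indent=2)


# Hypothetical budget values, chosen only for illustration.
print(build_config("implement", 20, {
    "maxFilesToRead": 30,
    "maxLinesToWrite": 400,
    "maxFilesToModify": 5,
}))
```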
+
+Create `.loop/init.sh` with project-specific setup commands:
+- Dev server startup (if applicable)
+- Test runner command
+- Type checker command
+- Linter command
+- Any environment setup needed
+
+Make `init.sh` executable.
+
+### Step 5: Flow into Planning
+
+Tell the user:
+> "Project configured. Now let's plan the work."
+
+Then invoke the `/agent-loop:plan` skill to generate the PRD and sprint contracts.
+
+### Step 6: Ready to Run
+
+Once planning is complete, tell the user:
+> "Everything is set up. To start the loop:"
+> ```
+> /agent-loop:run            # Interactive (recommended) — visible, can intervene
+> .loop/loop.sh              # Headless — fully autonomous
+> ```
+> You can monitor progress in `.loop/progress.md` and check story status in `.loop/prd.json`.
--- a/skills/loop-plan/SKILL.md
+++ b/skills/loop-plan/SKILL.md
@@ -0,0 +1,188 @@
+---
+name: plan
+description: Interactive planning session that generates PRD (prd.json) and sprint contracts for the agent loop. Run /agent-loop:init first.
+---
+
+# /plan — Generate PRD and Sprint Contracts
+
+Interactive planning session that produces all artifacts needed for the autonomous agent loop.
+
+## Prerequisites
+
+- `.loop/` directory must exist with `config.json` (run `/agent-loop:init` first if not)
+- User should have a clear idea of what they want to build/explore/fix
+
+## Usage
+
+```
+/agent-loop:plan [feature description]
+```
+
+Examples:
+- `/agent-loop:plan Add OAuth authentication with Google and GitHub`
+- `/agent-loop:plan Explore the payment processing system`
+- `/agent-loop:plan Fix all critical security issues from the audit`
+
+## Instructions
+
+### Step 1: Understand the Request
+
+If the user provided a feature description, use it. Otherwise ask:
+> "What would you like to work on? Describe it in 1-3 sentences."
+
+### Step 2: Codebase Analysis
+
+Read key project files to understand existing patterns:
+- Relevant source directories for the feature
+- Existing tests to understand testing patterns
+- Configuration files for conventions
+- Recent git history (`git log --oneline -20`) for active work
+
+### Step 3: Clarifying Questions
+
+Ask 3-5 targeted questions based on what you found in the code. These should be questions where the answer isn't obvious from the codebase. Examples:
+
+- "I see you have both REST endpoints and GraphQL. Should this feature use REST or GraphQL?"
+- "The existing auth uses JWT. Should I add OAuth alongside it or replace it?"
+- "I found two competing patterns for data validation. Which should I follow?"
+
+**Do NOT ask questions you can answer from the code.** Only ask when human judgment is needed.
+
+### Step 4: Generate PRD (`prd.json`)
+
+Create `.loop/prd.json` with properly-sized, dependency-ordered stories.
+
+**Story Sizing Rules (CRITICAL):**
+- Each story must be completable in ONE context window (~100K tokens of work)
+- Target: 1-3 files changed per story
+- Too big: "Build the authentication system" → split into migration, endpoint, middleware, UI, tests
+- Too small: "Add import statement" → combine with the story that needs it
+
+**Dependency Ordering:**
+1. Schema/database changes first (they block everything)
+2. Backend logic (depends on schema)
+3. Frontend components (depend on backend)
+4. Integration/wiring (depends on components)
+5. Polish/edge cases (depends on core being done)
+
+**Required Fields Per Story:**
+```json
+{
+  "id": "US-001",
+  "title": "Short descriptive title",
+  "description": "As a [role], I want [feature] so that [benefit].",
+  "acceptanceCriteria": [
+    "Specific, verifiable criterion",
+    "Another criterion",
+    "Typecheck passes"
+  ],
+  "priority": 1,
+  "passes": false,
+  "notes": "",
+  "rejections": 0
+}
+```
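A quick structural check can catch stories that are missing one of the required fields before the loop starts. The sketch below is one possible check, not part of the plugin: `story_problems` is a hypothetical helper, and the expected types are inferred from the example story above.

```python
# Field name -> expected Python type, inferred from the example story.
REQUIRED_FIELDS = {
    "id": str,
    "title": str,
    "description": str,
    "acceptanceCriteria": list,
    "priority": int,
    "passes": bool,
    "notes": str,
    "rejections": int,
}


def story_problems(story):
    """Return human-readable problems; an empty list means the story looks valid."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in story:
            problems.append(f"missing field: {field}")
            continue
        value = story[field]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            problems.append(f"{field} should be {expected.__name__}")
    if story.get("acceptanceCriteria") == []:
        problems.append("acceptanceCriteria is empty")
    return problems
```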
+
+**Acceptance Criteria Rules:**
+- Every criterion must be independently verifiable (not "works well" — "returns 200 with valid token")
+- Always include "Typecheck passes" (or equivalent for the language)
+- UI stories must include "Verify UI renders and responds to interaction"
+- API stories must include status code expectations
+- Database stories must include migration success check
+
+### Step 5: Generate Sprint Contracts
+
+For each story, create `.loop/contracts/{story-id}.contract.md`:
+
+```markdown
+# Sprint Contract: {Story ID} — {Story Title}
+
+## What Will Be Built
+Concrete description of the deliverable. Not the user story — the actual thing being built.
+
+## Done Conditions
+- [ ] Condition 1 (specific, testable)
+- [ ] Condition 2
+- [ ] All acceptance criteria from prd.json met
+
+## Evaluation Criteria
+What the evaluator will specifically check:
+- [ ] Check 1
+- [ ] Check 2
+- [ ] No regressions in [specific area]
+
+## Out of Scope
+Things explicitly NOT part of this story:
+- Thing 1
+- Thing 2
+
+## Key Files
+Files likely to be created or modified:
+- path/to/file.ext — what changes
+- path/to/other.ext — what changes
+
+## Dependencies
+- Depends on: [story IDs that must be done first, or "none"]
+- Blocks: [story IDs that depend on this one, or "none"]
+```
+
+### Step 6: Initialize Progress File
+
+Create `.loop/progress.md` from the template with an initial Codebase Patterns section populated from what you learned during analysis:
+
+```markdown
+# Progress
+
+## Codebase Patterns
+
+- [Pattern you discovered during analysis]
+- [Convention you noticed]
+- [Testing approach used in the project]
+
+---
+
+## Session Log
+
+### Planning Session
+Date: YYYY-MM-DD HH:MM
+
+**PRD created:** {N} stories for "{feature description}"
+**Estimated iterations:** {N stories + ~30% for evaluator rejections}
+**Key decisions:**
+- [Decision 1 and why]
+- [Decision 2 and why]
+
+---
+```
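The iteration estimate in the template is the story count plus roughly 30% headroom for evaluator rejections. One way to compute it, assuming the result is rounded up to a whole iteration (the rounding rule is an assumption, as is the function name):

```python
def estimate_iterations(story_count):
    """Story count plus about 30% headroom, rounded up, using integer math."""
    # Equivalent to ceil(story_count * 1.3) without floating-point rounding issues.
    return (13 * story_count + 9) // 10
```

For example, a plan with 7 stories would budget 10 iterations under this rule.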
+
+### Step 7: Present Summary
+
+Show the user a summary:
+
+> **Plan Ready**
+>
+> | Stories | Est. Iterations | Mode | Branch |
+> |---------|----------------|------|--------|
+> | {N}     | {N+30%}        | {mode} | {branchName} |
+>
+> **Story Overview:**
+> 1. US-001: {title} (priority 1)
+> 2. US-002: {title} (priority 2)
+> ...
+>
+> Review the stories in `.loop/prd.json` and contracts in `.loop/contracts/`.
+> Adjust anything you'd like, then run:
+> ```
+> /agent-loop:run            # Interactive (recommended)
+> .loop/loop.sh              # Headless
+> ```
+
+### Step 8: Wait for Feedback
+
+Let the user review and adjust. They might:
+- Ask to split a story further
+- Ask to reorder priorities
+- Ask to add/remove stories
+- Ask to change acceptance criteria
+
+Make the requested changes, then re-present the summary.
--- a/skills/loop-run/SKILL.md
+++ b/skills/loop-run/SKILL.md
@@ -0,0 +1,203 @@
+---
+name: run
+description: Execute the generator-evaluator loop interactively inside Claude Code. Dispatches subagents with full visibility and intervention capability. Run /agent-loop:init and /agent-loop:plan first.
+---
+
+# /run — Execute Agent Loop Inside Claude Code
+
+Run the generator-evaluator loop natively in Claude Code using subagents. Unlike `loop.sh` (headless), this gives you full visibility into each agent's work and the ability to intervene at any point.
+
+## Usage
+
+```
+/agent-loop:run                    # Run until all stories pass or max iterations
+/agent-loop:run 3                  # Run at most 3 iterations
+/agent-loop:run --skip-eval        # Skip evaluator (generator marks stories done)
+/agent-loop:run --story US-003     # Run only a specific story
+```
+
+## Prerequisites
+
+- `.loop/config.json` exists (run `/agent-loop:init` first)
+- `.loop/prd.json` exists with stories (run `/agent-loop:plan` first)
+
+## Instructions
+
+When the user invokes `/agent-loop:run`, follow this orchestration sequence exactly.
+
+### Step 0: Parse Arguments
+
+- If a number is provided, use it as max iterations. Otherwise read `maxIterations` from `.loop/config.json`.
+- If `--skip-eval` is provided, skip the evaluator pass.
+- If `--story <ID>` is provided, only work on that specific story.
+
+### Step 1: Load State
+
+1. Read `.loop/config.json` — get `mode`, `maxIterations`, `evalRetries`, `scopeBudgets`
+2. Read `.loop/prd.json` — get the story list and their statuses
+3. Check `.loop/progress.md` exists; if not, create it from `.loop/templates/progress.md.template`
+
+Report to the user:
+
+> **Loop Ready**
+> - Mode: {mode}
+> - Stories: {passed}/{total} complete
+> - Max iterations: {N}
+> - Eval: {on/off}
+>
+> Starting loop. You can interrupt me at any time to adjust course.
+
+### Step 2: Iteration Loop
+
+For each iteration (1 to max iterations):
+
+#### 2a. Find Next Story
+
+Find the highest-priority story in `prd.json` where `passes` is `false` and `blocked` is not `true`. If `--story` was specified, use that story instead.
+
+**If no actionable story remains:**
+- If all stories have `passes: true` → report success and stop
+- If some stories are `blocked: true` → report which are blocked and suggest `/agent-loop:triage`
+- Stop the loop
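The selection rule above can be sketched as follows. This is an illustration under two assumptions: a lower `priority` number means higher priority (as the `US-001`, priority 1 example suggests), and the stories have already been extracted from `prd.json` into a list. The function names are hypothetical.

```python
def next_story(stories, only_id=None):
    """Pick the story to work on, or None when nothing is actionable."""
    if only_id is not None:
        # --story was given: use that story regardless of priority.
        return next((s for s in stories if s["id"] == only_id), None)
    actionable = [
        s for s in stories if not s["passes"] and not s.get("blocked", False)
    ]
    return min(actionable, key=lambda s: s["priority"], default=None)


def exit_message(stories):
    """Explain why the loop stops once next_story() has returned None."""
    blocked = [s["id"] for s in stories if s.get("blocked") and not s["passes"]]
    if blocked:
        return "Blocked: " + ", ".join(blocked) + ". Consider /agent-loop:triage."
    return "All stories pass."
```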
+
+#### 2b. Report Iteration Start
+
+Tell the user:
+> **Iteration {N}/{max} — {story.id}: {story.title}**
+
+If the story has `[REJECTED]` entries in its `notes` field, summarize the previous feedback so the user has context.
+
+#### 2c. Assemble Generator Prompt
+
+Read these files and concatenate them with `---` separators:
+1. `.loop/prompts/generator/_base.md`
+2. `.loop/prompts/generator/{mode}.md`
+
+Then substitute these template variables in the assembled text:
+- `{{MAX_FILES_TO_READ}}` → from `config.scopeBudgets.{mode}.maxFilesToRead`
+- `{{MAX_LINES_TO_WRITE}}` → from `config.scopeBudgets.{mode}.maxLinesToWrite`
+- `{{MAX_FILES_TO_MODIFY}}` → from `config.scopeBudgets.{mode}.maxFilesToModify`
+- `{{MODE}}` → the mode
+- `{{ITERATION}}` → current iteration number
+- `{{MAX_ITERATIONS}}` → max iterations
+- `{{LOOP_DIR}}` → path to `.loop/` directory
+- `{{PROJECT_ROOT}}` → project root path
+- `{{CURRENT_STORY_ID}}` → the story ID being worked on
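Prompt assembly is plain concatenation followed by string substitution. A minimal sketch, assuming the `---` separator sits on its own line between blank lines (the exact whitespace is an assumption) and that file contents have already been read into strings; `assemble_prompt` is a hypothetical name:

```python
def assemble_prompt(parts, variables):
    """Join prompt files with '---' separators, then fill {{NAME}} placeholders."""
    text = "\n\n---\n\n".join(part.strip() for part in parts)
    for name, value in variables.items():
        text = text.replace("{{" + name + "}}", str(value))
    return text
```

Placeholders with no matching entry in `variables` are left untouched, which makes a missing substitution easy to spot in the assembled prompt.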
+
+#### 2d. Capture Pre-Generator Git State
+
+Run `git rev-parse HEAD` and save it. This is needed for the evaluator's diff.
+
+#### 2e. Dispatch Generator Agent
+
+Use the **Agent tool** to launch the generator:
+
+```
+Agent(
+  prompt: <assembled generator prompt>,
+  description: "Generator: {story.id}",
+  subagent_type: "general-purpose",
+  mode: "auto"
+)
+```
+
+**IMPORTANT:** Use `mode: "auto"` so the user can see tool calls but isn't prompted for every action. If the user has expressed a preference for more control, use `mode: "default"` instead.
+
+Wait for the agent to complete. The Agent tool returns the generator's final output.
+
+#### 2f. Check for Completion Signal
+
+If the generator output contains `<promise>COMPLETE</promise>`, report all stories complete and stop.
+
+#### 2g. Skip Evaluator (if configured)
+
+If `--skip-eval` was specified or `config.skipEval` is true, skip to step 2j.
+
+#### 2h. Assemble Evaluator Prompt
+
+Read these files and concatenate them:
+1. `.loop/prompts/evaluator/_base.md`
+2. `.loop/prompts/evaluator/{mode}.md`
+
+Substitute the same template variables as the generator, plus:
+- `{{PRE_GENERATOR_SHA}}` → the git SHA captured in step 2d
+- `{{CURRENT_STORY_ID}}` → the story ID
+
+#### 2i. Dispatch Evaluator Agent
+
+Use the **Agent tool** to launch the evaluator:
+
+```
+Agent(
+  prompt: <assembled evaluator prompt>,
+  description: "Evaluator: {story.id}",
+  subagent_type: "general-purpose",
+  mode: "auto"
+)
+```
+
+Wait for completion. Parse the verdict from the output:
+
+- Look for `<verdict>PASS</verdict>` → story passes
+- Look for `<verdict>REJECT</verdict>` → story rejected; extract reason from `<rejection_reason>...</rejection_reason>`
+- No verdict tag found → treat as REJECT (fail-safe)
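The verdict rules above, including the fail-safe default, can be sketched with two regular expressions. Tolerating whitespace inside the tags and preferring `REJECT` when both tags appear are assumptions of this sketch, as are the function name and the fallback reason strings.

```python
import re


def parse_verdict(output):
    """Return (verdict, reason); anything without a clear PASS is a REJECT."""
    reason = re.search(
        r"<rejection_reason>(.*?)</rejection_reason>", output, re.DOTALL
    )
    if re.search(r"<verdict>\s*REJECT\s*</verdict>", output):
        return "REJECT", reason.group(1).strip() if reason else "no reason given"
    if re.search(r"<verdict>\s*PASS\s*</verdict>", output):
        return "PASS", ""
    # Fail-safe: a missing or malformed verdict never passes a story.
    return "REJECT", "no verdict tag found"
```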
+
+#### 2j. Update State Based on Verdict
+
+**On PASS (or skip-eval):**
+1. Update `.loop/prd.json` — set `passes: true` for the story
+2. Report to user: ✓ **{story.id} PASSED**
+
+**On REJECT:**
+1. Update `.loop/prd.json`:
+   - Keep `passes: false`
+   - Increment `rejections` count
+   - Append `[REJECTED] {reason}` to `notes`
+2. Report to user: ✗ **{story.id} REJECTED** — {reason}
+3. Check if `rejections` >= `evalRetries` from config:
+   - If yes: set `blocked: true` in prd.json, append `[BLOCKED]` to notes
+   - Report: ⚠ **{story.id} BLOCKED** — rejected {N} times, needs human review
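The state update above amounts to a small pure function over one story record. The sketch below returns an updated copy rather than mutating in place. Joining note entries with newlines is an assumption, and `apply_verdict` is a hypothetical name.

```python
def apply_verdict(story, verdict, reason, eval_retries):
    """Return a copy of the story updated for a PASS or REJECT verdict."""
    updated = dict(story)
    if verdict == "PASS":
        updated["passes"] = True
        return updated
    updated["passes"] = False
    updated["rejections"] = story.get("rejections", 0) + 1
    notes = [story.get("notes", ""), f"[REJECTED] {reason}"]
    if updated["rejections"] >= eval_retries:
        # Retry limit reached: flag the story for human review.
        updated["blocked"] = True
        notes.append("[BLOCKED]")
    updated["notes"] = "\n".join(n for n in notes if n)
    return updated
```

With the default `evalRetries` of 2, a story is blocked on its second rejection.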
+
+#### 2k. Append Progress Entry
+
+Append to `.loop/progress.md`:
+
+```markdown
+### {story.id} — {story.title}
+Date: {current date and time}
+Iteration: {N}
+Verdict: {PASS/REJECT/SKIP-EVAL}
+
+---
+```
+
+#### 2l. Report Iteration Summary
+
+Show current story counts: `{passed}/{total} stories complete`
+
+If there are more iterations and more stories, continue to the next iteration.
+
+### Step 3: Loop Exit
+
+When the loop ends (all stories done, max iterations, or all remaining blocked), report:
+
+> **Loop Complete**
+> - Iterations used: {N}
+> - Stories: {passed}/{total} complete, {blocked} blocked
+> - {Suggest `/agent-loop:triage` if anything is blocked or incomplete}
+
+### Error Handling
+
+- If an Agent subagent fails or returns empty output, log a warning and continue to the next iteration. Do NOT stop the loop for a single agent failure.
+- If `prd.json` cannot be parsed, stop immediately and report the error.
+- If the user interrupts (denies a tool call, says "stop", etc.), gracefully end the loop and report current status.
+
+### Key Differences from loop.sh
+
+| Feature | loop.sh | /agent-loop:run |
+|---------|---------|-----------|
+| Execution | Headless (`claude --print`) | Visible in Claude Code |
+| Intervention | Kill the process | Deny tool calls, chat mid-loop |
+| Permissions | `--dangerously-skip-permissions` | User-controlled |
+| Context | Fresh process per agent | Fresh subagent per agent |
+| State updates | Shell functions | Claude Code reads/writes files directly |
--- a/skills/loop-triage/SKILL.md
+++ b/skills/loop-triage/SKILL.md
@@ -0,0 +1,83 @@
+---
+name: triage
+description: Generate a human handoff brief summarizing loop status — completed, blocked, and remaining stories with recommended next steps.
+---
+
+# /triage — Generate Human Handoff Brief
+
+Generate a triage brief summarizing the current state of a loop run. Use this when:
+- The loop hit max iterations without completing
+- You want a status check mid-run
+- You're handing off to another developer
+
+## Instructions
+
+When the user invokes `/agent-loop:triage`:
+
+### Step 1: Read Current State
+
+1. Read `.loop/prd.json` — get story statuses
+2. Read `.loop/progress.md` — get session log and patterns
+3. Read `.loop/config.json` — get mode and iteration settings
+4. Check git log for recent commits on the loop branch
+
+### Step 2: Analyze
+
+For each story, determine:
+- **Complete**: `passes: true`, verified by evaluator
+- **In Progress**: `passes: false`, not blocked, has been attempted (check progress.md for entries)
+- **Blocked**: `passes: false` and `blocked: true`, set by the run skill once `rejections` reaches `evalRetries` (check `rejections` count and `notes`)
+- **Not Started**: `passes: false`, no progress.md entries, no rejections
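The four statuses can be derived from one story record plus the set of story IDs that have entries in `progress.md`. A minimal sketch, assuming a story counts as blocked when its `blocked` flag is set or its `rejections` count has reached the retry limit; `classify` and its parameters are hypothetical names.

```python
def classify(story, attempted_ids, eval_retries):
    """Map a prd.json story to one of the four triage statuses."""
    rejections = story.get("rejections", 0)
    if story["passes"]:
        return "Complete"
    if story.get("blocked") or rejections >= eval_retries:
        return "Blocked"
    if story["id"] in attempted_ids or rejections > 0:
        return "In Progress"
    return "Not Started"
```

Checking `Blocked` before `In Progress` matters: a blocked story has also been attempted, so the order of the tests decides which label it gets.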
+
+### Step 3: Generate Brief
+
+Write to `.loop/triage/TRIAGE_BRIEF.md`:
+
+```markdown
+# Triage Brief
+
+Generated: {current date and time}
+Mode: {mode from config.json}
+Branch: {branchName from prd.json}
+
+## Status Summary
+
+- **Complete:** {N} stories
+- **In Progress:** {N} stories
+- **Blocked:** {N} stories (hit retry limit)
+- **Not Started:** {N} stories
+
+## Story Details
+
+| ID | Title | Status | Rejections | Notes |
+|----|-------|--------|------------|-------|
+| US-001 | ... | Complete | 0 | |
+| US-002 | ... | Blocked | 3 | Evaluator rejected: ... |
+| US-003 | ... | Not Started | 0 | |
+
+## Key Patterns Discovered
+
+{Copy the Codebase Patterns section from progress.md}
+
+## Blocked Stories — Analysis
+
+For each blocked story, summarize:
+- What was attempted
+- Why it was rejected (from notes field)
+- Suggested approach for a human to unblock it
+
+## Recommended Next Steps
+
+Based on the current state:
+1. {Most important next action}
+2. {Second priority}
+3. {Third priority}
+
+## Files Modified
+
+{List all files changed across all commits on the loop branch, with brief descriptions}
+```
+
+### Step 4: Present to User
+
+Show the summary inline and tell the user where the full brief is saved.