feat: agent loop harness with Claude Code plugin support

Generator-evaluator architecture with iterative context-reset for
long-running coding tasks. Ships as a Claude Code plugin — install
with /plugin and use /agent-loop:init, /agent-loop:plan, /agent-loop:run.
2026-03-27 08:03:18 -04:00
commit 17e5eb707f
29 changed files with 2546 additions and 0 deletions

141
skills/loop-init/SKILL.md Normal file

@@ -0,0 +1,141 @@
---
name: init
description: Initialize the agent loop harness in the current project. Scaffolds .loop/ directory, detects tech stack, picks mode, generates config, and flows into planning.
---
# /init — Initialize Agent Loop for a Project
Set up the agent loop harness in the current project. This is the entry point for first-time use.
## What This Skill Does
1. Scaffolds the `.loop/` directory with prompts, templates, and lib scripts from the plugin
2. Analyzes the project to understand its tech stack, structure, and conventions
3. Asks the user what they want to accomplish (explore, implement, or fix)
4. Creates project-specific configuration (`config.json`, `init.sh`)
5. Flows into planning to generate the PRD and sprint contracts
## Instructions
When the user invokes this skill, follow this sequence:
### Step 0: Scaffold .loop/ Directory
Check if `.loop/` already exists in the project root.
**If it does NOT exist**, create it by copying from the plugin:
1. The plugin's root directory is available at `${CLAUDE_PLUGIN_ROOT}`. Copy the harness files:
```bash
mkdir -p .loop
cp -r "${CLAUDE_PLUGIN_ROOT}/prompts" .loop/
cp -r "${CLAUDE_PLUGIN_ROOT}/templates" .loop/
cp -r "${CLAUDE_PLUGIN_ROOT}/lib" .loop/
cp "${CLAUDE_PLUGIN_ROOT}/loop.sh" .loop/
chmod +x .loop/loop.sh
```
**IMPORTANT:** If `${CLAUDE_PLUGIN_ROOT}` is not set or the path doesn't exist, look for the files in the plugin's own directory structure. The prompts, templates, and lib directories are bundled with this plugin.
2. Create `.loop/.gitignore` with runtime artifacts:
```
prd.json
progress.md
progress-archive.md
config.json
init.sh
contracts/
triage/
archive/
.archive-staging/
.last-branch
.loop.lock
```
**If `.loop/` already exists**, ask the user whether they want to re-initialize. Re-initializing resets the configuration but preserves `prd.json` and `progress.md` if they exist.
### Step 1: Project Discovery
Read the project to understand what we're working with:
- Check for `CLAUDE.md`, `AGENTS.md`, `README.md` at the project root
- Check for `package.json`, `Cargo.toml`, `pyproject.toml`, `go.mod`, `Package.swift`, `composer.json` to identify the tech stack
- Run `ls` on the project root to see the top-level structure
Present a brief summary:
> "I see this is a [language/framework] project with [key characteristics]. The main source is in [dir/]."
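The manifest check in this step can be sketched as a simple file lookup. This is only an illustration of the idea: the function name `discover` and the ecosystem labels are assumptions, not part of the plugin, and the skill itself performs these checks through Claude Code rather than a script.

```python
from pathlib import Path

# Manifest files named in Step 1, mapped to the ecosystem each one suggests.
# The labels on the right are illustrative and not defined by the plugin.
MANIFESTS = {
    "package.json": "JavaScript/TypeScript (npm)",
    "Cargo.toml": "Rust (Cargo)",
    "pyproject.toml": "Python",
    "go.mod": "Go",
    "Package.swift": "Swift (SwiftPM)",
    "composer.json": "PHP (Composer)",
}
DOCS = ("CLAUDE.md", "AGENTS.md", "README.md")


def discover(root):
    """Return the docs, detected stacks, and top-level entries found in root."""
    root = Path(root)
    return {
        "docs": [name for name in DOCS if (root / name).is_file()],
        "stacks": [label for name, label in MANIFESTS.items() if (root / name).is_file()],
        "top_level": sorted(p.name for p in root.iterdir()),
    }
```

A project can match more than one manifest (for example a Python backend with a `package.json` frontend), which is why `stacks` is a list rather than a single value.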
### Step 2: Mode Selection
Ask the user:
> **What would you like to do?**
>
> a) **Explore** — Analyze the codebase to understand what exists, find issues, and document the system. No code changes.
> b) **Implement** — Build a new feature from a PRD. Code changes, commits, and tests.
> c) **Fix** — Work through a list of bugs or tech debt items. Targeted code changes.
### Step 3: Clarifying Questions
Based on the mode, ask 3-5 questions:
**For Explore:**
- "What areas are you most interested in? (e.g., auth, database, API, frontend, everything)"
- "Are there known problem areas you want me to focus on?"
- "How many exploration sessions should I budget? (default: 20)"
**For Implement:**
- "Describe the feature you want to build (1-3 sentences is fine)"
- "Are there any architectural constraints I should know about?"
- "Should I follow any specific patterns from the existing codebase?"
**For Fix:**
- "Do you have a list of issues, or should I find them?"
- "Any areas that are off-limits for changes?"
- "What's the priority: security, stability, or code quality?"
### Step 4: Generate Configuration
Create `.loop/config.json` based on the project and user's answers:
```json
{
  "tool": "claude",
  "mode": "<selected mode>",
  "maxIterations": <appropriate default>,
  "skipEval": false,
  "evalRetries": 2,
  "autoHooks": true,
  "branchPrefix": "loop/",
  "scopeBudgets": {
    // Set based on project size and mode
  }
}
```
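As a rough illustration of a filled-in configuration, the sketch below builds the same structure in Python. The helper `build_config`, the lowercase mode strings, and every budget number are assumptions made for the example; only the key names come from this plugin (the per-mode `scopeBudgets` keys are the ones the run skill substitutes into prompts).

```python
import json


def build_config(mode, max_iterations, budgets):
    """Assemble the config.json structure for one mode."""
    # Assumption: modes are stored as lowercase strings.
    if mode not in ("explore", "implement", "fix"):
        raise ValueError(f"unknown mode: {mode}")
    return {
        "tool": "claude",
        "mode": mode,
        "maxIterations": max_iterations,
        "skipEval": False,
        "evalRetries": 2,
        "autoHooks": True,
        "branchPrefix": "loop/",
        # Keyed by mode, matching the scopeBudgets.{mode}.* lookups in the run skill.
        "scopeBudgets": {mode: budgets},
    }


# The numbers below are placeholders, not recommended values.
config = build_config(
    "implement",
    max_iterations=10,
    budgets={"maxFilesToRead": 30, "maxLinesToWrite": 400, "maxFilesToModify": 5},
)
print(json.dumps(config, indent=2))
```

Unlike the template, the serialized result is strict JSON: it contains no comments and no angle-bracket placeholders.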
Create `.loop/init.sh` with project-specific setup commands:
- Dev server startup (if applicable)
- Test runner command
- Type checker command
- Linter command
- Any environment setup needed
Make `init.sh` executable.
### Step 5: Flow into Planning
Tell the user:
> "Project configured. Now let's plan the work."
Then invoke the `/agent-loop:plan` skill to generate the PRD and sprint contracts.
### Step 6: Ready to Run
Once planning is complete, tell the user:
> "Everything is set up. To start the loop:"
> ```
> /agent-loop:run # Interactive (recommended) — visible, can intervene
> .loop/loop.sh # Headless — fully autonomous
> ```
> You can monitor progress in `.loop/progress.md` and check story status in `.loop/prd.json`.

188
skills/loop-plan/SKILL.md Normal file

@@ -0,0 +1,188 @@
---
name: plan
description: Interactive planning session that generates PRD (prd.json) and sprint contracts for the agent loop. Run /agent-loop:init first.
---
# /plan — Generate PRD and Sprint Contracts
Interactive planning session that produces all artifacts needed for the autonomous agent loop.
## Prerequisites
- `.loop/` directory must exist with `config.json` (run `/agent-loop:init` first if not)
- User should have a clear idea of what they want to build/explore/fix
## Usage
```
/agent-loop:plan <optional feature description>
```
Examples:
- `/agent-loop:plan Add OAuth authentication with Google and GitHub`
- `/agent-loop:plan Explore the payment processing system`
- `/agent-loop:plan Fix all critical security issues from the audit`
## Instructions
### Step 1: Understand the Request
If the user provided a feature description, use it. Otherwise ask:
> "What would you like to work on? Describe it in 1-3 sentences."
### Step 2: Codebase Analysis
Read key project files to understand existing patterns:
- Relevant source directories for the feature
- Existing tests to understand testing patterns
- Configuration files for conventions
- Recent git history (`git log --oneline -20`) for active work
### Step 3: Clarifying Questions
Ask 3-5 targeted questions based on what you found in the code. These should be questions where the answer isn't obvious from the codebase. Examples:
- "I see you have both REST endpoints and GraphQL. Should this feature use REST or GraphQL?"
- "The existing auth uses JWT. Should I add OAuth alongside it or replace it?"
- "I found two competing patterns for data validation. Which should I follow?"
**Do NOT ask questions you can answer from the code.** Only ask when human judgment is needed.
### Step 4: Generate PRD (`prd.json`)
Create `.loop/prd.json` with properly-sized, dependency-ordered stories.
**Story Sizing Rules (CRITICAL):**
- Each story must be completable in ONE context window (~100K tokens of work)
- Target: 1-3 files changed per story
- Too big: "Build the authentication system" → split into migration, endpoint, middleware, UI, tests
- Too small: "Add import statement" → combine with the story that needs it
**Dependency Ordering:**
1. Schema/database changes first (they block everything)
2. Backend logic (depends on schema)
3. Frontend components (depend on backend)
4. Integration/wiring (depends on components)
5. Polish/edge cases (depends on core being done)
**Required Fields Per Story:**
```json
{
  "id": "US-001",
  "title": "Short descriptive title",
  "description": "As a [role], I want [feature] so that [benefit].",
  "acceptanceCriteria": [
    "Specific, verifiable criterion",
    "Another criterion",
    "Typecheck passes"
  ],
  "priority": 1,
  "passes": false,
  "notes": "",
  "rejections": 0
}
```
**Acceptance Criteria Rules:**
- Every criterion must be independently verifiable (not "works well" — "returns 200 with valid token")
- Always include "Typecheck passes" (or equivalent for the language)
- UI stories must include "Verify UI renders and responds to interaction"
- API stories must include status code expectations
- Database stories must include migration success check
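The required fields and the "Typecheck passes" rule lend themselves to a mechanical check before the PRD is written. The following is a minimal sketch under stated assumptions: `validate_story` is a hypothetical helper, and the typecheck test is a plain substring heuristic that will not recognize an equivalent criterion phrased differently.

```python
# Field names and types taken from the required-fields example above.
REQUIRED_FIELDS = {
    "id": str,
    "title": str,
    "description": str,
    "acceptanceCriteria": list,
    "priority": int,
    "passes": bool,
    "notes": str,
    "rejections": int,
}


def validate_story(story):
    """Return a list of problems found in one story dict (empty if none)."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in story:
            problems.append(f"missing field: {field}")
        elif not isinstance(story[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    criteria = story.get("acceptanceCriteria", [])
    # Heuristic only: looks for the literal phrase, case-insensitively.
    if isinstance(criteria, list) and not any(
        isinstance(c, str) and "typecheck passes" in c.lower() for c in criteria
    ):
        problems.append('no "Typecheck passes" criterion found')
    return problems
```

Whether each criterion is independently verifiable still needs human or model judgment; a check like this only catches structural omissions.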
### Step 5: Generate Sprint Contracts
For each story, create `.loop/contracts/{story-id}.contract.md`:
```markdown
# Sprint Contract: {Story ID} — {Story Title}
## What Will Be Built
Concrete description of the deliverable. Not the user story — the actual thing being built.
## Done Conditions
- [ ] Condition 1 (specific, testable)
- [ ] Condition 2
- [ ] All acceptance criteria from prd.json met
## Evaluation Criteria
What the evaluator will specifically check:
- [ ] Check 1
- [ ] Check 2
- [ ] No regressions in [specific area]
## Out of Scope
Things explicitly NOT part of this story:
- Thing 1
- Thing 2
## Key Files
Files likely to be created or modified:
- path/to/file.ext — what changes
- path/to/other.ext — what changes
## Dependencies
- Depends on: [story IDs that must be done first, or "none"]
- Blocks: [story IDs that depend on this one, or "none"]
```
### Step 6: Initialize Progress File
Create `.loop/progress.md` from the template with an initial Codebase Patterns section populated from what you learned during analysis:
```markdown
# Progress
## Codebase Patterns
- [Pattern you discovered during analysis]
- [Convention you noticed]
- [Testing approach used in the project]
---
## Session Log
### Planning Session
Date: YYYY-MM-DD HH:MM
**PRD created:** {N} stories for "{feature description}"
**Estimated iterations:** {N stories + ~30% for evaluator rejections}
**Key decisions:**
- [Decision 1 and why]
- [Decision 2 and why]
---
```
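The estimated-iterations figure in the template is the story count plus roughly 30% for evaluator rejections. One way to compute it is sketched below; rounding up is an assumption, since the template only says "~30%", and `estimate_iterations` is a hypothetical name.

```python
def estimate_iterations(story_count, overhead_percent=30):
    """Story count plus a percentage allowance for rejections, rounded up."""
    total = story_count * (100 + overhead_percent)
    # Ceiling division on integers avoids floating-point rounding surprises.
    return -(-total // 100)


print(estimate_iterations(10))  # 10 stories plus 30% -> 13
print(estimate_iterations(7))   # 7 * 1.3 = 9.1, rounded up -> 10
```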
### Step 7: Present Summary
Show the user a summary:
> **Plan Ready**
>
> | Stories | Est. Iterations | Mode | Branch |
> |---------|----------------|------|--------|
> | {N} | {N+30%} | {mode} | {branchName} |
>
> **Story Overview:**
> 1. US-001: {title} (priority 1)
> 2. US-002: {title} (priority 2)
> ...
>
> Review the stories in `.loop/prd.json` and contracts in `.loop/contracts/`.
> Adjust anything you'd like, then run:
> ```
> /agent-loop:run # Interactive (recommended)
> .loop/loop.sh # Headless
> ```
### Step 8: Wait for Feedback
Let the user review and adjust. They might:
- Ask to split a story further
- Ask to reorder priorities
- Ask to add/remove stories
- Ask to change acceptance criteria
Make the requested changes, then re-present the summary.

203
skills/loop-run/SKILL.md Normal file

@@ -0,0 +1,203 @@
---
name: run
description: Execute the generator-evaluator loop interactively inside Claude Code. Dispatches subagents with full visibility and intervention capability. Run /agent-loop:init and /agent-loop:plan first.
---
# /run — Execute Agent Loop Inside Claude Code
Run the generator-evaluator loop natively in Claude Code using subagents. Unlike `loop.sh` (headless), this gives you full visibility into each agent's work and the ability to intervene at any point.
## Usage
```
/agent-loop:run # Run until all stories pass or max iterations
/agent-loop:run 3 # Run at most 3 iterations
/agent-loop:run --skip-eval # Skip evaluator (generator marks stories done)
/agent-loop:run --story US-003 # Run only a specific story
```
## Prerequisites
- `.loop/config.json` exists (run `/agent-loop:init` first)
- `.loop/prd.json` exists with stories (run `/agent-loop:plan` first)
## Instructions
When the user invokes `/agent-loop:run`, follow this orchestration sequence exactly.
### Step 0: Parse Arguments
- If a number is provided, use it as max iterations. Otherwise read `maxIterations` from `.loop/config.json`.
- If `--skip-eval` is provided, skip the evaluator pass.
- If `--story <ID>` is provided, only work on that specific story.
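The argument rules above can be sketched as a small parser. This is illustrative only: `parse_run_args` is a hypothetical helper, and rejecting unrecognized arguments is an assumption, since the skill does not say how to treat them.

```python
def parse_run_args(args, config_max_iterations):
    """Parse /agent-loop:run arguments into a settings dict."""
    settings = {
        "max_iterations": config_max_iterations,  # fallback from config.json
        "skip_eval": False,
        "story": None,
    }
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "--skip-eval":
            settings["skip_eval"] = True
        elif arg == "--story":
            if i + 1 >= len(args):
                raise ValueError("--story requires a story ID")
            i += 1
            settings["story"] = args[i]
        elif arg.isdecimal():
            settings["max_iterations"] = int(arg)
        else:
            # Assumption: anything else is reported rather than ignored.
            raise ValueError(f"unrecognized argument: {arg}")
        i += 1
    return settings
```

For example, `parse_run_args(["3", "--story", "US-003"], 20)` yields a limit of 3 iterations restricted to story `US-003`, with the evaluator still enabled.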
### Step 1: Load State
1. Read `.loop/config.json` — get `mode`, `maxIterations`, `evalRetries`, `scopeBudgets`
2. Read `.loop/prd.json` — get the story list and their statuses
3. Check that `.loop/progress.md` exists; if not, create it from `.loop/templates/progress.md.template`
Report to the user:
> **Loop Ready**
> - Mode: {mode}
> - Stories: {passed}/{total} complete
> - Max iterations: {N}
> - Eval: {on/off}
>
> Starting loop. You can interrupt me at any time to adjust course.
### Step 2: Iteration Loop
For each iteration (1 to max iterations):
#### 2a. Find Next Story
Find the highest-priority story in `prd.json` where `passes` is `false` and `blocked` is not `true`. If `--story` was specified, use that story instead.
**If no actionable story remains:**
- If all stories have `passes: true` → report success and stop
- If some stories are `blocked: true` → report which are blocked and suggest `/agent-loop:triage`
- Stop the loop
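The selection rule in this step can be sketched as follows. Two assumptions are made here: `find_next_story` is a hypothetical helper operating on the list of story dicts from `prd.json`, and "highest priority" is read as the smallest `priority` number (priority 1 first).

```python
def find_next_story(stories, only_id=None):
    """Return the next actionable story, or None if nothing is actionable."""
    if only_id is not None:
        # --story overrides the normal selection.
        return next((s for s in stories if s["id"] == only_id), None)
    candidates = [
        s for s in stories
        if not s["passes"] and not s.get("blocked", False)
    ]
    # Assumption: priority 1 is the most urgent, so the smallest number wins.
    return min(candidates, key=lambda s: s["priority"], default=None)
```

A `None` result corresponds to the "no actionable story remains" branch: the caller then distinguishes between all stories passing and some being blocked.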
#### 2b. Report Iteration Start
Tell the user:
> **Iteration {N}/{max} — {story.id}: {story.title}**
If the story has `[REJECTED]` entries in its `notes` field, summarize the previous feedback so the user has context.
#### 2c. Assemble Generator Prompt
Read these files and concatenate them with `---` separators:
1. `.loop/prompts/generator/_base.md`
2. `.loop/prompts/generator/{mode}.md`
Then substitute these template variables in the assembled text:
- `{{MAX_FILES_TO_READ}}` → from `config.scopeBudgets.{mode}.maxFilesToRead`
- `{{MAX_LINES_TO_WRITE}}` → from `config.scopeBudgets.{mode}.maxLinesToWrite`
- `{{MAX_FILES_TO_MODIFY}}` → from `config.scopeBudgets.{mode}.maxFilesToModify`
- `{{MODE}}` → the mode
- `{{ITERATION}}` → current iteration number
- `{{MAX_ITERATIONS}}` → max iterations
- `{{LOOP_DIR}}` → path to `.loop/` directory
- `{{PROJECT_ROOT}}` → project root path
- `{{CURRENT_STORY_ID}}` → the story ID being worked on
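Prompt assembly amounts to joining the file contents and replacing each `{{NAME}}` placeholder. A minimal sketch, assuming the file contents have already been read into strings and that the separator is `---` on its own line surrounded by blank lines (the exact whitespace is not specified here); `assemble_prompt` is a hypothetical name.

```python
def assemble_prompt(parts, variables):
    """Join prompt texts with '---' separators and fill {{NAME}} placeholders."""
    text = "\n\n---\n\n".join(parts)
    for name, value in variables.items():
        text = text.replace("{{" + name + "}}", str(value))
    return text


prompt = assemble_prompt(
    ["Mode: {{MODE}}", "Story {{CURRENT_STORY_ID}}, iteration {{ITERATION}}/{{MAX_ITERATIONS}}"],
    {"MODE": "fix", "CURRENT_STORY_ID": "US-002", "ITERATION": 1, "MAX_ITERATIONS": 5},
)
print(prompt)
```

Placeholders without a matching entry in `variables` are left untouched, which makes a missed substitution easy to spot in the assembled prompt.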
#### 2d. Capture Pre-Generator Git State
Run `git rev-parse HEAD` and save it. This is needed for the evaluator's diff.
#### 2e. Dispatch Generator Agent
Use the **Agent tool** to launch the generator:
```
Agent(
  prompt: <assembled generator prompt>,
  description: "Generator: {story.id}",
  subagent_type: "general-purpose",
  mode: "auto"
)
```
**IMPORTANT:** Use `mode: "auto"` so the user can see tool calls but isn't prompted for every action. If the user has expressed a preference for more control, use `mode: "default"` instead.
Wait for the agent to complete. The Agent tool returns the generator's final output.
#### 2f. Check for Completion Signal
If the generator output contains `<promise>COMPLETE</promise>`, report all stories complete and stop.
#### 2g. Skip Evaluator (if configured)
If `--skip-eval` was specified or `config.skipEval` is true, skip to step 2j.
#### 2h. Assemble Evaluator Prompt
Read these files and concatenate them:
1. `.loop/prompts/evaluator/_base.md`
2. `.loop/prompts/evaluator/{mode}.md`
Substitute the same template variables as the generator, plus:
- `{{PRE_GENERATOR_SHA}}` → the git SHA captured in step 2d
- `{{CURRENT_STORY_ID}}` → the story ID
#### 2i. Dispatch Evaluator Agent
Use the **Agent tool** to launch the evaluator:
```
Agent(
  prompt: <assembled evaluator prompt>,
  description: "Evaluator: {story.id}",
  subagent_type: "general-purpose",
  mode: "auto"
)
```
Wait for completion. Parse the verdict from the output:
- Look for `<verdict>PASS</verdict>` → story passes
- Look for `<verdict>REJECT</verdict>` → story rejected; extract reason from `<rejection_reason>...</rejection_reason>`
- No verdict tag found → treat as REJECT (fail-safe)
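The verdict rules above, including the fail-safe default, can be sketched with two regular expressions. `parse_verdict` and the fallback reason strings are hypothetical; tolerating whitespace inside the tags is also an assumption.

```python
import re


def parse_verdict(output):
    """Return (verdict, reason); a missing verdict tag counts as REJECT."""
    match = re.search(r"<verdict>\s*(PASS|REJECT)\s*</verdict>", output)
    if match is None:
        # Fail-safe: no recognizable verdict means the story does not pass.
        return "REJECT", "no verdict tag found in evaluator output"
    if match.group(1) == "PASS":
        return "PASS", None
    reason = re.search(
        r"<rejection_reason>(.*?)</rejection_reason>", output, re.DOTALL
    )
    return "REJECT", (reason.group(1).strip() if reason else "no reason given")
```

Defaulting to REJECT means a crashed or truncated evaluator run can never mark a story as passing by accident.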
#### 2j. Update State Based on Verdict
**On PASS (or skip-eval):**
1. Update `.loop/prd.json` — set `passes: true` for the story
2. Report to user: ✓ **{story.id} PASSED**
**On REJECT:**
1. Update `.loop/prd.json`:
   - Keep `passes: false`
   - Increment `rejections` count
   - Append `[REJECTED] {reason}` to `notes`
2. Report to user: ✗ **{story.id} REJECTED** — {reason}
3. Check if `rejections` >= `evalRetries` from config:
- If yes: set `blocked: true` in prd.json, append `[BLOCKED]` to notes
- Report: ⚠ **{story.id} BLOCKED** — rejected {N} times, needs human review
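The state transitions in this step can be sketched as a function over a single story dict. `apply_verdict` and `append_note` are hypothetical helpers, and separating note entries with newlines is an assumption; the skill only says to append.

```python
def append_note(notes, entry):
    """Append an entry to the notes string, one entry per line."""
    return f"{notes}\n{entry}" if notes else entry


def apply_verdict(story, verdict, reason, eval_retries):
    """Mutate a story dict in place according to the PASS/REJECT rules."""
    if verdict == "PASS":
        story["passes"] = True
        return story
    story["passes"] = False
    story["rejections"] = story.get("rejections", 0) + 1
    story["notes"] = append_note(story.get("notes", ""), f"[REJECTED] {reason}")
    # Block the story once it has been rejected evalRetries times.
    if story["rejections"] >= eval_retries:
        story["blocked"] = True
        story["notes"] = append_note(story["notes"], "[BLOCKED]")
    return story
```

With `evalRetries` set to 2, the first rejection leaves the story eligible for another attempt and the second marks it blocked, so it is skipped when the next story is selected.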
#### 2k. Append Progress Entry
Append to `.loop/progress.md`:
```markdown
### {story.id} — {story.title}
Date: {current date and time}
Iteration: {N}
Verdict: {PASS/REJECT/SKIP-EVAL}
---
```
#### 2l. Report Iteration Summary
Show current story counts: `{passed}/{total} stories complete`
If there are more iterations and more stories, continue to the next iteration.
### Step 3: Loop Exit
When the loop ends (all stories done, max iterations, or all remaining blocked), report:
> **Loop Complete**
> - Iterations used: {N}
> - Stories: {passed}/{total} complete, {blocked} blocked
> - {Suggest `/agent-loop:triage` if anything is blocked or incomplete}
### Error Handling
- If an Agent subagent fails or returns empty output, log a warning and continue to the next iteration. Do NOT stop the loop for a single agent failure.
- If `prd.json` cannot be parsed, stop immediately and report the error.
- If the user interrupts (denies a tool call, says "stop", etc.), gracefully end the loop and report current status.
### Key Differences from loop.sh
| Feature | loop.sh | /agent-loop:run |
|---------|---------|-----------|
| Execution | Headless (`claude --print`) | Visible in Claude Code |
| Intervention | Kill the process | Deny tool calls, chat mid-loop |
| Permissions | `--dangerously-skip-permissions` | User-controlled |
| Context | Fresh process per agent | Fresh Agent subagent per agent |
| State updates | Shell functions | Claude Code reads/writes files directly |


@@ -0,0 +1,83 @@
---
name: triage
description: Generate a human handoff brief summarizing loop status — completed, blocked, and remaining stories with recommended next steps.
---
# /triage — Generate Human Handoff Brief
Generate a triage brief summarizing the current state of a loop run. Use this when:
- The loop hit max iterations without completing
- You want a status check mid-run
- You're handing off to another developer
## Instructions
When the user invokes `/agent-loop:triage`:
### Step 1: Read Current State
1. Read `.loop/prd.json` — get story statuses
2. Read `.loop/progress.md` — get session log and patterns
3. Read `.loop/config.json` — get mode and iteration settings
4. Check git log for recent commits on the loop branch
### Step 2: Analyze
For each story, determine:
- **Complete**: `passes: true`, verified by evaluator
- **In Progress**: `passes: false`, has been attempted (check progress.md for entries) but is not blocked
- **Blocked**: `passes: false` with `blocked: true`, or rejected at least `evalRetries` times (check `rejections` count and `notes`)
- **Not Started**: `passes: false`, no progress.md entries, no rejections
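The four statuses can be sketched as a classification function. This is one possible reading: `classify_story` is a hypothetical helper, it treats the `blocked` flag written by the run skill as the marker for Blocked, and it takes "has a progress.md entry" as a precomputed boolean.

```python
def classify_story(story, has_progress_entry):
    """Map one prd.json story to a triage status label."""
    if story["passes"]:
        return "Complete"
    # Assumption: the run skill's blocked flag identifies blocked stories.
    if story.get("blocked", False):
        return "Blocked"
    if has_progress_entry or story.get("rejections", 0) > 0:
        return "In Progress"
    return "Not Started"
```

Checking `passes` first and `blocked` second keeps the categories mutually exclusive, so the counts in the status summary add up to the total number of stories.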
### Step 3: Generate Brief
Write to `.loop/triage/TRIAGE_BRIEF.md`:
```markdown
# Triage Brief
Generated: {current date and time}
Mode: {mode from config.json}
Branch: {branchName from prd.json}
## Status Summary
- **Complete:** {N} stories
- **In Progress:** {N} stories
- **Blocked:** {N} stories (hit retry limit)
- **Not Started:** {N} stories
## Story Details
| ID | Title | Status | Rejections | Notes |
|----|-------|--------|------------|-------|
| US-001 | ... | Complete | 0 | |
| US-002 | ... | Blocked | 3 | Evaluator rejected: ... |
| US-003 | ... | Not Started | 0 | |
## Key Patterns Discovered
{Copy the Codebase Patterns section from progress.md}
## Blocked Stories — Analysis
For each blocked story, summarize:
- What was attempted
- Why it was rejected (from notes field)
- Suggested approach for a human to unblock it
## Recommended Next Steps
Based on the current state:
1. {Most important next action}
2. {Second priority}
3. {Third priority}
## Files Modified
{List all files changed across all commits on the loop branch, with brief descriptions}
```
### Step 4: Present to User
Show the summary inline and tell the user where the full brief is saved.