feat: agent loop harness with Claude Code plugin support

Generator-evaluator architecture with iterative context-reset for long-running coding tasks. Ships as a Claude Code plugin — install with /plugin and use /agent-loop:init, /agent-loop:plan, /agent-loop:run.
2026-03-27 08:03:18 -04:00
commit 17e5eb707f
29 changed files with 2546 additions and 0 deletions
--- a/prompts/evaluator/explore.md
+++ b/prompts/evaluator/explore.md
@@ -0,0 +1,49 @@
+# Mode: Explore — Evaluator
+
+You are evaluating an analysis/exploration task. The generator claims to have analyzed a codebase area and produced findings.
+
+## Read-Only Enforcement (CHECK FIRST)
+
+Before any other checks, verify explore mode's read-only constraint:
+1. Run `git diff {{PRE_GENERATOR_SHA}}..HEAD --name-only` to list committed changes, and `git status --porcelain` to list uncommitted ones
+2. If ANY file outside `.loop/triage/` was modified or committed, **REJECT immediately** — explore mode is read-only. The generator must not modify host project files.
+
+## Exploration-Specific Checks
+
+1. **Read the analysis output** at `.loop/triage/{story-id}-analysis.md`
+2. **Verify 5 claims** against actual source code:
+   - Does the file exist at the path mentioned?
+   - Does the code behave as described?
+   - Are the line counts roughly accurate?
+   - Are the "Issues Found" real issues or false alarms?
+   - Are the recommendations actionable?
+3. **Check for omissions:**
+   - Did the generator miss obvious files in the area?
+   - Are there important code paths not covered?
+   - Are there recent git commits that change the analysis?
+
+## Claim Verification Format
+
+Before giving your verdict, document what you checked:
+
+```
+Claims Verified:
+- [CONFIRMED] [claim] — verified in [file:line]
+- [INCORRECT] [claim] — actual behavior is [what you found]
+- [UNVERIFIABLE] [claim] — could not confirm (file missing, ambiguous)
+```
+
+## Grading Criteria
+
+- **Accuracy**: How many claims are correct? (threshold: 4/5 must be confirmed)
+- **Completeness**: Did it cover the important parts of the area?
+- **Actionability**: Can someone act on the recommendations without additional research?
+
+## Rejection Criteria
+
+Reject if:
+- Fewer than 4 of the 5 checked claims are confirmed
+- The analysis references files that don't exist
+- Key files in the area were completely missed
+- Recommendations are vague ("improve error handling") rather than specific ("add null check in auth.ts:42")
+- The analysis appears to be based on assumptions rather than code reading
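
Note on the two mechanical gates in this prompt (the read-only check and the 4-of-5 accuracy threshold): they could be sketched in code roughly as below. This is a minimal illustration, not part of the commit. The helper names `read_only_violations` and `accuracy_passes` are hypothetical, and treating `[UNVERIFIABLE]` as "not confirmed" is an assumption, since the prompt only says 4/5 must be confirmed.

```python
from pathlib import PurePosixPath

# The only directory explore mode may write to, per the prompt.
ALLOWED_DIR = PurePosixPath(".loop/triage")


def read_only_violations(changed_paths):
    """Return changed paths that fall outside .loop/triage/ (hypothetical helper).

    `changed_paths` is assumed to be the lines printed by
    `git diff <sha>..HEAD --name-only`, which are repo-relative POSIX paths.
    """
    violations = []
    for raw in changed_paths:
        name = raw.strip()
        if not name:
            continue  # skip blank lines in the command output
        path = PurePosixPath(name)
        # A file is allowed only if .loop/triage is one of its ancestors.
        if ALLOWED_DIR not in path.parents:
            violations.append(str(path))
    return violations


def accuracy_passes(statuses, required=4):
    """Return True if at least `required` claims are CONFIRMED.

    Assumption: UNVERIFIABLE and INCORRECT both count as not confirmed.
    """
    confirmed = sum(1 for status in statuses if status == "CONFIRMED")
    return confirmed >= required
```

A non-empty result from `read_only_violations` would correspond to the immediate rejection described under "Read-Only Enforcement"; `accuracy_passes` mirrors the first rejection criterion only and says nothing about completeness or actionability.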