feat: agent loop harness with Claude Code plugin support

Generator-evaluator architecture with iterative context-reset for long-running coding tasks. Ships as a Claude Code plugin — install with /plugin and use /agent-loop:init, /agent-loop:plan, /agent-loop:run.
2026-03-27 08:03:18 -04:00
commit 17e5eb707f
29 changed files with 2546 additions and 0 deletions
--- a/prompts/evaluator/implement.md
+++ b/prompts/evaluator/implement.md
@@ -0,0 +1,31 @@
+# Mode: Implement — Evaluator
+
+You are evaluating an implementation story. The generator claims to have built a feature.
+
+## Implementation-Specific Checks
+
+In addition to the base evaluation process:
+
+1. **Verify the git commit exists** — run `git log --oneline {{PRE_GENERATOR_SHA}}..HEAD` to confirm there is at least one new commit since `{{PRE_GENERATOR_SHA}}`
+2. **Check commit scope** — does `git diff {{PRE_GENERATOR_SHA}}..HEAD --name-only` contain only files relevant to this story?
+3. **Read the actual test output** — if the generator claims tests pass, verify by running them yourself
+4. **For UI stories:**
+   - Check that the component actually renders (not just that it exists)
+   - Verify event handlers are wired up (not just defined)
+   - Check accessibility basics (labels, semantic elements)
+5. **For API stories:**
+   - Verify the endpoint is registered in the router
+   - Check request/response types match the contract
+   - Verify error handling returns appropriate status codes
+6. **For database stories:**
+   - Verify migration runs cleanly
+   - Check that indexes exist for the query patterns the story introduces
+   - Verify foreign key constraints are defined and enforced
+
+## Common Generator Failures to Watch For
+
+- Created the file but didn't wire it into the application (route not registered, component not imported)
+- Tests exist but don't actually assert meaningful behavior
+- "Passes typecheck" but only because types are `any` or too loose
+- UI component renders but doesn't respond to interaction
+- API endpoint exists but returns hardcoded/mock data
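
The commit-scope check in step 2 could be automated along the following lines. This is only a sketch and not part of the commit: the helper names `changed_files` and `out_of_scope`, and the idea of describing a story's scope as a list of glob patterns, are assumptions made for illustration. The only git invocation used is the `git diff <sha>..HEAD --name-only` command quoted in the prompt.

```python
import subprocess
from fnmatch import fnmatch


def changed_files(pre_sha: str, repo_dir: str = ".") -> list[str]:
    """List paths changed between pre_sha and HEAD (hypothetical helper).

    Wraps the `git diff <sha>..HEAD --name-only` command from the prompt.
    """
    result = subprocess.run(
        ["git", "diff", f"{pre_sha}..HEAD", "--name-only"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git exits non-zero (e.g. unknown SHA)
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def out_of_scope(paths: list[str], allowed_patterns: list[str]) -> list[str]:
    """Return the paths that match none of the allowed glob patterns.

    The pattern list is an assumed way of encoding "files relevant to this
    story"; note that fnmatch's `*` also matches path separators.
    """
    return [
        path
        for path in paths
        if not any(fnmatch(path, pattern) for pattern in allowed_patterns)
    ]


# Illustrative data only; these paths are made up for the example.
example = ["src/api/users.ts", "src/api/users.test.ts", "README.md"]
print(out_of_scope(example, ["src/api/*"]))  # ['README.md']
```

In practice an evaluator would pass the output of `changed_files("{{PRE_GENERATOR_SHA}}")` to `out_of_scope` and treat a non-empty result as a prompt for closer review rather than an automatic failure, since some out-of-scope edits (lockfiles, shared types) can be legitimate.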