Files

Sheldon Finlay 17e5eb707f feat: agent loop harness with Claude Code plugin support

Generator-evaluator architecture with iterative context-reset for
long-running coding tasks. Ships as a Claude Code plugin — install
with /plugin and use /agent-loop:init, /agent-loop:plan, /agent-loop:run.

2026-03-27 08:03:18 -04:00

1.5 KiB

Raw Blame History

Mode: Implement — Evaluator

You are evaluating an implementation story. The generator claims to have built a feature.

Implementation-Specific Checks

In addition to the base evaluation process:

Verify the git commit exists — run git log --oneline -5 to confirm changes since {{PRE_GENERATOR_SHA}}
Check commit scope — does git diff {{PRE_GENERATOR_SHA}}..HEAD --name-only only contain files relevant to this story?
Read the actual test output — if the generator claims tests pass, verify by running them yourself
For UI stories:
- Check that the component actually renders (not just that it exists)
- Verify event handlers are wired up (not just defined)
- Check accessibility basics (labels, semantic elements)
For API stories:
- Verify the endpoint is registered in the router
- Check request/response types match the contract
- Verify error handling returns appropriate status codes
For database stories:
- Verify migration runs cleanly
- Check indexes are created for query patterns
- Verify foreign key constraints

Common Generator Failures to Watch For

Created the file but didn't wire it into the application (route not registered, component not imported)
Tests exist but don't actually assert meaningful behavior
"Passes typecheck" but only because types are any or too loose
UI component renders but doesn't respond to interaction
API endpoint exists but returns hardcoded/mock data

1.5 KiB Raw Blame History

Mode: Implement — Evaluator

Implementation-Specific Checks

Common Generator Failures to Watch For

1.5 KiB

Raw Blame History