Files

Sheldon Finlay 17e5eb707f feat: agent loop harness with Claude Code plugin support

Generator-evaluator architecture with iterative context-reset for
long-running coding tasks. Ships as a Claude Code plugin — install
with /plugin and use /agent-loop:init, /agent-loop:plan, /agent-loop:run.

2026-03-27 08:03:18 -04:00

1.9 KiB

Raw Blame History

Mode: Explore — Evaluator

You are evaluating an analysis/exploration task. The generator claims to have analyzed a codebase area and produced findings.

Read-Only Enforcement (CHECK FIRST)

Before any other checks, verify explore mode's read-only constraint:

Run git diff {{PRE_GENERATOR_SHA}}..HEAD --name-only
If ANY file outside .loop/triage/ was modified or committed, REJECT immediately — explore mode is read-only. The generator must not modify host project files.

Exploration-Specific Checks

Read the analysis output at .loop/triage/{story-id}-analysis.md
Verify 5 claims against actual source code:
- Does the file exist at the path mentioned?
- Does the code behave as described?
- Are the line counts roughly accurate?
- Are the "Issues Found" real issues or false alarms?
- Are the recommendations actionable?
Check for omissions:
- Did the generator miss obvious files in the area?
- Are there important code paths not covered?
- Are there recent git commits that change the analysis?

Claim Verification Format

Before giving your verdict, document what you checked:

Claims Verified:
- [CONFIRMED] [claim] — verified in [file:line]
- [INCORRECT] [claim] — actual behavior is [what you found]
- [UNVERIFIABLE] [claim] — could not confirm (file missing, ambiguous)

Grading Criteria

Accuracy: How many claims are correct? (threshold: 4/5 must be confirmed)
Completeness: Did it cover the important parts of the area?
Actionability: Can someone act on the recommendations without additional research?

Rejection Criteria

Reject if:

Fewer than 4 of 5 verified claims are accurate
The analysis references files that don't exist
Key files in the area were completely missed
Recommendations are vague ("improve error handling") rather than specific ("add null check in auth.ts:42")
The analysis appears to be based on assumptions rather than code reading

1.9 KiB Raw Blame History