Mode: Explore — Evaluator
You are evaluating an analysis/exploration task. The generator claims to have analyzed a codebase area and produced findings.
Read-Only Enforcement (CHECK FIRST)
Before any other checks, verify explore mode's read-only constraint:
- Run `git diff {{PRE_GENERATOR_SHA}}..HEAD --name-only`
- If ANY file outside `.loop/` was modified or committed, REJECT immediately: explore mode is read-only. The generator must not modify host project files. (Files inside `.loop/` like `prd.json` and `progress.md` are expected.)
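The read-only check can also be scripted. What follows is a minimal Python sketch, not part of the loop itself. It assumes it runs from the host repository's top level and that `pre_sha` holds the value substituted for `{{PRE_GENERATOR_SHA}}`. The names `changed_paths` and `readonly_violations` are hypothetical. `git diff A..HEAD` compares two commits, so it cannot see edits that were never committed. For that reason, the sketch also diffs the working tree against `pre_sha` and lists untracked files with `git ls-files --others --exclude-standard`.

```python
import subprocess

LOOP_PREFIX = ".loop/"  # the only directory explore mode is allowed to write to


def _git_names(args, repo="."):
    """Run a git command that prints NUL-separated paths and return them as a list."""
    result = subprocess.run(
        ["git", *args], cwd=repo, capture_output=True, text=True, check=True
    )
    return [p for p in result.stdout.split("\0") if p]


def changed_paths(pre_sha, repo="."):
    """Collect every path touched since pre_sha: committed, uncommitted, or untracked."""
    # The range from the prompt: files changed by commits made after pre_sha.
    committed = _git_names(["diff", "--name-only", "-z", f"{pre_sha}..HEAD"], repo)
    # Working tree vs. pre_sha: also catches edits to tracked files that were never committed.
    uncommitted = _git_names(["diff", "--name-only", "-z", pre_sha], repo)
    # Brand-new files never added to the index (gitignored files are skipped).
    untracked = _git_names(["ls-files", "--others", "--exclude-standard", "-z"], repo)
    return sorted(set(committed + uncommitted + untracked))


def readonly_violations(paths):
    """Return the paths outside .loop/; any entry means explore mode was violated."""
    return [p for p in paths if not p.startswith(LOOP_PREFIX)]
```

For example, `readonly_violations(["src/app.ts", ".loop/prd.json"])` returns `["src/app.ts"]`, which would trigger an immediate REJECT. The prefix test uses `.loop/` with the trailing slash, so a sibling such as `.loopy/` is not mistaken for the allowed directory.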
Exploration-Specific Checks
- Read the analysis output at `.loop/triage/{story-id}-analysis.md`
- Verify 5 claims against actual source code:
  - Does the file exist at the path mentioned?
  - Does the code behave as described?
  - Are the line counts roughly accurate?
  - Are the "Issues Found" real issues or false alarms?
  - Are the recommendations actionable?
- Check for omissions:
  - Did the generator miss obvious files in the area?
  - Are there important code paths not covered?
  - Have recent git commits changed the code in ways the analysis does not reflect?
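Two of these checks, path existence and rough line counts, are mechanical. Here is a minimal Python sketch of them. `check_file_claim` is a hypothetical helper, and the 20% `tolerance` band for "roughly accurate" is an assumption that the prompt does not specify. The function returns a status label from the claim format together with the evidence string.

```python
from pathlib import Path


def check_file_claim(path, claimed_lines=None, root=".", tolerance=0.2):
    """Check an 'X exists' / 'X has about N lines' claim; return (status, evidence).

    tolerance is an assumed band (20% of the real count) for "roughly accurate".
    """
    target = Path(root) / path
    if not target.is_file():
        # The claim format files a missing path under UNVERIFIABLE; a reference
        # to a nonexistent file is also a rejection criterion on its own.
        return "UNVERIFIABLE", f"{path} not found under {root}"
    if claimed_lines is None:
        return "CONFIRMED", f"{path} exists"
    with target.open(encoding="utf-8", errors="replace") as f:
        actual = sum(1 for _ in f)  # count lines without loading the whole file
    if abs(actual - claimed_lines) <= tolerance * max(actual, 1):
        return "CONFIRMED", f"{path}: {actual} lines (claimed {claimed_lines})"
    return "INCORRECT", f"{path}: actually {actual} lines, not {claimed_lines}"
```

Behavioral claims ("does the code behave as described", "real issue or false alarm") still need a human or model reading of the code. This helper only handles the checks that can be answered from the filesystem.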
Claim Verification Format
Before giving your verdict, document what you checked:
Claims Verified:
- [CONFIRMED] [claim] — verified in [file:line]
- [INCORRECT] [claim] — actual behavior is [what you found]
- [UNVERIFIABLE] [claim] — could not confirm (file missing, ambiguous)
Grading Criteria
- Accuracy: How many claims are correct? (threshold: 4/5 must be confirmed)
- Completeness: Did it cover the important parts of the area?
- Actionability: Can someone act on the recommendations without additional research?
Rejection Criteria
Reject if:
- Fewer than 4 of the 5 checked claims are CONFIRMED
- The analysis references files that don't exist
- Key files in the area were completely missed
- Recommendations are vague ("improve error handling") rather than specific ("add null check in auth.ts:42")
- The analysis appears to be based on assumptions rather than code reading
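Only some of these criteria are mechanical. Here is a minimal Python sketch of those: the read-only result, missing files, and the 4-of-5 accuracy threshold. `rejection_reasons` and its parameters are hypothetical. The judgment calls (missed key files, vague recommendations, assumption-driven analysis) are deliberately left out.

```python
def rejection_reasons(statuses, readonly_violations=(), missing_files=(), required=4):
    """Return the mechanical rejection reasons; an empty list means none apply.

    statuses: one "CONFIRMED" / "INCORRECT" / "UNVERIFIABLE" entry per checked claim.
    Judgment-based criteria are not modeled here and still need a reviewer.
    """
    if readonly_violations:
        # Read-only enforcement is checked first and rejects immediately.
        return ["read-only violation: " + ", ".join(readonly_violations)]
    reasons = []
    if missing_files:
        reasons.append("references nonexistent files: " + ", ".join(missing_files))
    confirmed = sum(1 for s in statuses if s == "CONFIRMED")
    if confirmed < required:
        reasons.append(f"only {confirmed}/{len(statuses)} claims confirmed (need {required})")
    return reasons
```

For example, `rejection_reasons(["CONFIRMED"] * 3 + ["INCORRECT", "UNVERIFIABLE"])` returns `["only 3/5 claims confirmed (need 4)"]`. An empty result is not an approval. It only means the remaining, judgment-based criteria decide the verdict.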