loop-loop

Author	SHA1	Message	Date
Sheldon Finlay	60ce0fef54	fix: tighten vague language across all prompt files. Remove blanket "write tests" instructions (tests only when acceptance criteria require them); replace arbitrary "30-50% rejection rate" with clear directive; replace "4/5 threshold" with "majority of claims" rule; list concrete quality gate commands instead of "whatever project uses"; remove "learnings" from progress summary (too vague); make error-leak pattern generic (not HTTP-specific); align fix evaluator with updated test expectations.	2026-03-28 11:58:13 -04:00
Sheldon Finlay	2dc291aac4	fix: make evaluator calibration examples project-agnostic. Replace ChaosRush-specific references with generic examples that apply to any codebase.	2026-03-28 11:21:11 -04:00
Sheldon Finlay	1d059e218b	feat: add few-shot calibration examples to evaluator prompt. Three examples showing bad rubber-stamp, good rejection, and good pass patterns. Based on Anthropic's harness design recommendation to calibrate evaluators with few-shot score breakdowns, and informed by real failures observed in a production loop run.	2026-03-28 11:15:52 -04:00
Sheldon Finlay	48bc656cd8	refactor: trim generator and evaluator prompts — cut total in half	2026-03-27 14:48:42 -04:00
Sheldon Finlay	5f8a34cc7b	fix: simplify evaluator runtime verification — let claude figure out the tools	2026-03-27 14:45:55 -04:00
Sheldon Finlay	ee08e3617c	feat: evaluator runtime verification for web projects, optional Playwright docs	2026-03-27 14:30:09 -04:00
Sheldon Finlay	1e7f7ea6ed	feat: true interactive mode — run claude directly, verdict via file, no script/capture	2026-03-27 13:07:25 -04:00
Sheldon Finlay	17e5eb707f	feat: agent loop harness with Claude Code plugin support. Generator-evaluator architecture with iterative context-reset for long-running coding tasks. Ships as a Claude Code plugin: install with /plugin and use /agent-loop:init, /agent-loop:plan, /agent-loop:run.	2026-03-27 08:03:18 -04:00

8 Commits