8 Commits

Author SHA1 Message Date
60ce0fef54 fix: tighten vague language across all prompt files
- Remove blanket "write tests" instructions; tests only when
  acceptance criteria require them
- Replace arbitrary "30-50% rejection rate" with clear directive
- Replace "4/5 threshold" with "majority of claims" rule
- List concrete quality gate commands instead of "whatever project uses"
- Remove "learnings" from progress summary (too vague)
- Make error-leak pattern generic (not HTTP-specific)
- Align fix evaluator with updated test expectations
2026-03-28 11:58:13 -04:00
2dc291aac4 fix: make evaluator calibration examples project-agnostic
Replace ChaosRush-specific references with generic examples
that apply to any codebase.
2026-03-28 11:21:11 -04:00
1d059e218b feat: add few-shot calibration examples to evaluator prompt
Three examples showing bad rubber-stamp, good rejection, and good
pass patterns. Based on Anthropic's harness design recommendation
to calibrate evaluators with few-shot score breakdowns, and
informed by real failures observed in a production loop run.
2026-03-28 11:15:52 -04:00
48bc656cd8 refactor: trim generator and evaluator prompts — cut total in half
2026-03-27 14:48:42 -04:00
5f8a34cc7b fix: simplify evaluator runtime verification — let claude figure out the tools
2026-03-27 14:45:55 -04:00
ee08e3617c feat: evaluator runtime verification for web projects, optional Playwright docs
2026-03-27 14:30:09 -04:00
1e7f7ea6ed feat: true interactive mode — run claude directly, verdict via file, no script/capture
2026-03-27 13:07:25 -04:00
17e5eb707f feat: agent loop harness with Claude Code plugin support
Generator-evaluator architecture with iterative context-reset for
long-running coding tasks. Ships as a Claude Code plugin — install
with /plugin and use /agent-loop:init, /agent-loop:plan, /agent-loop:run.
2026-03-27 08:03:18 -04:00