Generator now has explicit instructions for when it's stuck: write
the blocker to notes, leave passes as false, and stop. Also adds
a "Do Not Modify" section preventing changes to other stories,
contracts, or config.
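A minimal sketch of the stuck path described above, assuming each story lives in its own JSON file with `id` and `passes` fields and that notes are a plain-text file. The file layout and the `record_blocker` name are hypothetical; only the `passes` flag and the write-notes-then-stop behaviour come from this change.

```python
import json
from pathlib import Path


def record_blocker(story_path: Path, notes_path: Path, reason: str) -> dict:
    """Log why the generator is stuck and leave the story unfinished."""
    story = json.loads(story_path.read_text(encoding="utf-8"))
    # Append rather than overwrite so notes from earlier iterations survive.
    with notes_path.open("a", encoding="utf-8") as notes:
        notes.write(f"BLOCKED {story['id']}: {reason}\n")
    # A blocked story must never be marked as passing.
    story["passes"] = False
    story_path.write_text(json.dumps(story, indent=2), encoding="utf-8")
    return story
```

The caller would be expected to exit right after this returns, so that no further edits happen in the same iteration.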
Generator must now verify each acceptance criterion against actual
code before setting passes: true. Acts as a first filter before
the evaluator runs, reducing false completions.
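One way to picture the self-check is as a gate that only opens when every acceptance criterion has concrete evidence attached. The `CriterionCheck` type and `may_set_passes` function below are illustrative assumptions, not part of the plugin.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CriterionCheck:
    criterion: str
    # Where the generator confirmed the behaviour in actual code,
    # e.g. a file path and line. Empty means "not verified".
    evidence: str


def may_set_passes(checks: List[CriterionCheck]) -> bool:
    """Allow passes: true only if every criterion has evidence."""
    # An empty list means nothing was verified, so it must not pass.
    return bool(checks) and all(c.evidence.strip() for c in checks)
```

A story with even one unverified criterion stays at `passes: false` and is left for the next iteration or the evaluator.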
The planner prompt had vague context-window budget percentages that
did not reflect how agents actually work. Replaced them with concrete
scope guidance (keep stories to ~10 files), which aligns with the
existing scope budgets in config.json.
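The scope guidance could be checked mechanically along these lines. This is a sketch only: the `files` key on each story and the `MAX_FILES_PER_STORY` constant are assumptions, and the real budget keys in config.json are not shown here.

```python
from typing import Dict, List

# Assumed limit, taken from the "~10 files" guidance above.
MAX_FILES_PER_STORY = 10


def oversized_stories(stories: List[Dict], max_files: int = MAX_FILES_PER_STORY) -> List[str]:
    """Return the ids of stories that touch more files than the budget."""
    return [s["id"] for s in stories if len(s.get("files", [])) > max_files]
```

Any id returned here marks a story the planner should split before the loop starts.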
Add three few-shot examples for the evaluator: a bad rubber-stamp, a
good rejection, and a good pass. Based on Anthropic's harness design
recommendation to calibrate evaluators with few-shot score breakdowns,
and informed by real failures observed in a production loop run.
Three new failure patterns: missing imports after refactoring,
orphaned resource instances, and error detail leakage. These were
observed in a real loop run where the evaluator missed them.
Generator now runs a quick health check before implementing if the
project has tests or a dev server. Catches regressions from previous
iterations early instead of building on a broken foundation.
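A rough sketch of such a pre-implementation health check, assuming the project exposes a single test command. The `health_check` name, the command, and the timeout are placeholders; how the generator actually detects tests or a dev server is not specified here.

```python
import subprocess
from typing import List, Optional


def health_check(test_cmd: Optional[List[str]], timeout: int = 300) -> bool:
    """Return True if there is nothing to check or the command exits 0."""
    if not test_cmd:
        # No tests configured: nothing to verify, so do not block.
        return True
    try:
        result = subprocess.run(
            test_cmd, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        # A missing binary or a hung suite counts as an unhealthy baseline.
        return False
    return result.returncode == 0
```

If this returns False, the regression from the previous iteration gets fixed (or noted) before any new story work starts.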
Add a git log step and an explicit gate requiring all startup steps
to be complete before implementation begins. Based on Anthropic's
prompting guide recommendation for prescriptive session orientation.
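The gate can be thought of as a checklist that must be fully ticked before any code is written. In this sketch the step names are hypothetical, apart from the git log step; the real startup sequence lives in the generator prompt.

```python
from typing import Iterable, List

# Hypothetical step names; only the git log step is named in this change.
STARTUP_STEPS = ("read_notes", "read_story", "git_log", "health_check")


def missing_startup_steps(completed: Iterable[str]) -> List[str]:
    """List startup steps that have not been completed, in order."""
    done = set(completed)
    return [step for step in STARTUP_STEPS if step not in done]


def can_begin_implementation(completed: Iterable[str]) -> bool:
    """The gate opens only when no startup step is missing."""
    return not missing_startup_steps(completed)
```

Returning the missing steps, rather than a bare boolean, gives the agent something concrete to do next.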
- Fix evaluator bypass on last story (moved completion check)
- Fix all stale command name references across README, loop.sh, skills, plugin.json
- Fix explore evaluator false rejects (.loop/ files are expected)
- Fix stderr capture order in headless mode
- Fix shell injection risk in hooks.sh python fallback
- Remove .DS_Store from tracking
- Rewrite README to match current architecture (single entry point, tmux, optional tools)
- Add XcodeBuildMCP and iOS simulator MCP to optional tools docs
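The shell injection item above concerns untrusted text ending up inside a Python one-liner. The actual hooks.sh fix is not shown here; the sketch below only illustrates the general remedy, which is to pass untrusted values as separate arguments so they are read as data and never parsed as code. The `extract_field` helper is hypothetical.

```python
import subprocess
import sys


def extract_field(payload: str, field: str) -> str:
    """Read one field from a JSON payload in a child interpreter.

    Both values travel as argv entries, so they are never interpolated
    into the Python source and no shell is involved.
    """
    code = "import json, sys; print(json.loads(sys.argv[1]).get(sys.argv[2], ''))"
    result = subprocess.run(
        [sys.executable, "-c", code, payload, field],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

A payload containing quotes, semicolons, or Python syntax comes back unchanged as a string instead of being executed.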
Generator-evaluator architecture with iterative context resets for
long-running coding tasks. Ships as a Claude Code plugin: install
with /plugin and use /agent-loop:init, /agent-loop:plan, /agent-loop:run.