Planner now has examples for design/UX criteria that are evaluable
without being purely binary. Prevents the planner from avoiding
qualitative criteria just because they aren't grep-checkable.
Planner now sees specific examples of verifiable criteria (grep,
test commands, file checks) alongside vague anti-patterns. Drives
higher story quality which directly improves evaluator accuracy.
Generator now has explicit instructions for when it's stuck: write
the blocker to notes, leave passes as false, and stop. Also adds
a "Do Not Modify" section preventing changes to other stories,
contracts, or config.
Generator must now verify each acceptance criterion against actual
code before setting passes: true. Acts as a first filter before
the evaluator runs, reducing false completions.
Avoid maintaining specific install commands that will go stale.
The evaluator uses whatever tools are available — let users
configure their own testing environment.
Headless mode was half-built and untested. Agent-loop is a plugin
that runs interactively via tmux — there's no CI use case yet.
Removes --headless flag, timeout compatibility shim, output capture
logic, and LOOP_AGENT_TMPFILE handling. Cuts 82 lines from loop.sh.
The planner prompt had vague context window budget percentages that
don't reflect how agents actually work. Replaced with concrete
scope guidance (keep stories to ~10 files) which aligns with the
existing scope budgets in config.json.
Three examples showing bad rubber-stamp, good rejection, and good
pass patterns. Based on Anthropic's harness design recommendation
to calibrate evaluators with few-shot score breakdowns, and
informed by real failures observed in a production loop run.
Three new failure patterns: missing imports after refactoring,
orphaned resource instances, and error detail leakage. These were
observed in a real loop run where the evaluator missed them.
Generator now runs a quick health check before implementing if the
project has tests or a dev server. Catches regressions from previous
iterations early instead of building on a broken foundation.
Add git log step and explicit gate requiring all startup steps
complete before implementation begins. Based on Anthropic's
prompting guide recommendation for prescriptive session orientation.
Remove hardcoded development paths (ralph-loop, loop-test2) and
absolute-path permissions from the allow list, keeping only
project-agnostic and relative-path permissions.
- Fix evaluator bypass on last story (moved completion check)
- Fix all stale command name references across README, loop.sh, skills, plugin.json
- Fix explore evaluator false rejects (.loop/ files are expected)
- Fix stderr capture order in headless mode
- Fix shell injection risk in hooks.sh python fallback
- Remove .DS_Store from tracking
- Rewrite README to match current architecture (single entry point, tmux, optional tools)
- Add XcodeBuildMCP and iOS simulator MCP to optional tools docs