The tmux session name is now derived from the project directory name
(e.g., agent-loop-server, agent-loop-webapp). This allows running
multiple loops in parallel on different projects without collisions.
It was previously hardcoded to "agent-loop", so launching a second
loop would kill the first project's tmux session.
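Here is a minimal Python sketch of the naming rule. It assumes the name is `agent-loop-` plus the basename of the project directory, as the examples above suggest, but the real loop.sh may build it differently. `session_name_for` is a hypothetical helper. Replacing `:` and `.` is an added precaution, because tmux uses those characters as separators in target names.

```python
import os
import re


def session_name_for(project_dir: str) -> str:
    """Hypothetical helper: "agent-loop-" + basename of the project directory."""
    base = os.path.basename(os.path.normpath(os.path.abspath(project_dir)))
    # Assumption: swap out ":" and ".", which tmux treats as target separators.
    safe = re.sub(r"[:.]", "_", base)
    return f"agent-loop-{safe}"


print(session_name_for("/home/me/code/server"))   # agent-loop-server
print(session_name_for("/home/me/code/webapp/"))  # agent-loop-webapp
```

Because each project directory yields a distinct name, two loops started from different projects target different tmux sessions.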
Installing and removing the Stop hook on every iteration had a race
condition: settings.local.json was written immediately before CC started,
so CC could read the old file (without the hook) on the first iteration.
Now the hook is installed once when loop.sh starts and removed on exit.
The AGENT_LOOP_ACTIVE env var guard ensures it only fires for CC sessions
spawned by the loop, so keeping it installed the whole time is safe.
The tmux display-message approach had edge cases: it could succeed outside
tmux, fail on the first iteration, or behave differently depending on tmux
socket state.
Replace with AGENT_LOOP_ACTIVE env var exported by loop.sh. CC sessions
spawned by the loop inherit it; interactive CC sessions don't. Simple,
no external dependencies, no race conditions.
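A small Python stand-in can check the inheritance rule this relies on. A variable set in the parent's environment (as `export` does in loop.sh) is copied into every child process. A process started from an ordinary shell never sees it. `child_sees_flag` is a hypothetical helper for the demonstration.

```python
import os
import subprocess
import sys

# Child program that reports whether the guard variable is visible to it.
CHILD = "import os; print('1' if os.environ.get('AGENT_LOOP_ACTIVE') else '0')"


def child_sees_flag(env):
    out = subprocess.run([sys.executable, "-c", CHILD], env=env,
                         capture_output=True, text=True, check=True)
    return out.stdout.strip() == "1"


# Like `export AGENT_LOOP_ACTIVE=1` in loop.sh: children inherit the variable.
loop_env = dict(os.environ, AGENT_LOOP_ACTIVE="1")
# An interactive session started from a normal shell never had it set.
plain_env = {k: v for k, v in os.environ.items() if k != "AGENT_LOOP_ACTIVE"}

print(child_sees_flag(loop_env))   # True
print(child_sees_flag(plain_env))  # False
```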
tmux display-message succeeds even outside tmux by falling back to the
most recently created session (agent-loop). This caused the hook to
match and kill interactive CC sessions.
Fix: check $TMUX env var first — only set when actually inside tmux.
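This sketch shows the corrected check. It assumes the hook compares the session name against `agent-loop`, the hardcoded name at the time. `tmux display-message -p '#S'` prints the current session name. When `$TMUX` is unset, the function returns `None` without calling tmux at all. The helper names are hypothetical. This check was later replaced by the AGENT_LOOP_ACTIVE guard.

```python
import os
import subprocess


def tmux_session_name(env=None):
    """Current tmux session name, or None when this process is not inside tmux."""
    env = os.environ if env is None else env
    # $TMUX is set only for processes running inside a tmux client. Checking it
    # first avoids display-message answering for some other session.
    if not env.get("TMUX"):
        return None
    result = subprocess.run(["tmux", "display-message", "-p", "#S"],
                            env=dict(env), capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else None


def hook_should_fire(env=None, expected="agent-loop"):
    """Fire only for CC sessions inside the loop's own tmux session."""
    return tmux_session_name(env) == expected
```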
setup.sh now stamps .harness-version in .loop/ at scaffold time. On each
/agent-loop:run, Phase 1 compares the installed harness version against
the plugin version and auto-updates lib/, prompts/, and loop.sh if stale.
Run state (prd.json, contracts, config.json) is preserved.
Also adds a setup.sh --update mode for refreshing harness files without
re-scaffolding. Bump to 0.10.0.
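This sketch covers the staleness check and the selective refresh. It assumes the following:

- `.harness-version` holds a plain `MAJOR.MINOR.PATCH` string;
- lib/, prompts/ and loop.sh live directly inside .loop/;
- the plugin ships the same three paths at its root.

`update_harness` and its parameters are hypothetical. Versions are compared as integer tuples because a plain string comparison gets the 0.9.0 to 0.10.0 bump backwards: `"0.9.0" > "0.10.0"` is true for strings.

```python
import os
import shutil

# Assumed layout (hypothetical): these harness paths sit directly in .loop/
# and at the root of the plugin directory.
HARNESS_PATHS = ("lib", "prompts", "loop.sh")
VERSION_FILE = ".harness-version"


def parse_version(text):
    """'0.10.0' -> (0, 10, 0), so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def harness_is_stale(loop_dir, plugin_version):
    stamp = os.path.join(loop_dir, VERSION_FILE)
    if not os.path.exists(stamp):
        return True  # assumption: a missing stamp counts as stale
    with open(stamp) as f:
        return parse_version(f.read()) < parse_version(plugin_version)


def update_harness(loop_dir, plugin_dir, plugin_version):
    """Refresh harness files only; prd.json, contracts, config.json are never touched."""
    if not harness_is_stale(loop_dir, plugin_version):
        return False
    for name in HARNESS_PATHS:
        src = os.path.join(plugin_dir, name)
        dst = os.path.join(loop_dir, name)
        if os.path.isdir(src):
            shutil.rmtree(dst, ignore_errors=True)
            shutil.copytree(src, dst)
        elif os.path.isfile(src):
            shutil.copy2(src, dst)
    with open(os.path.join(loop_dir, VERSION_FILE), "w") as f:
        f.write(plugin_version + "\n")
    return True
```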
The Stop hook (kill -INT $PPID) was written to the project's
settings.local.json, causing ANY Claude Code session in the same project
to kill its parent shell on exit — not just the loop's sessions.
Now the hook checks the tmux session name before firing: only CC sessions
inside the "agent-loop" tmux session trigger the kill. Other CC sessions
in the same project are unaffected.
When /agent-loop:run detects a previous run with all stories passed (or the
feature branch deleted after merge), it archives the old artifacts and resets
.loop/ automatically — no more manual rm -rf .loop.
- Add archive_and_reset() for on-demand archiving from skills
- Add runs.log index tracking all archived runs
- Update /run and /stories skills to detect completed runs
- setup.sh archives instead of hard-failing when prd.json exists
- Bump version to 0.9.0
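This Python sketch illustrates the completion check and the archive step. Everything about the layout is assumed:

- prd.json has a top-level `stories` list whose entries carry a boolean `passes`;
- the run state moved aside is prd.json, contracts/ and config.json;
- archives and the runs.log index live under `.loop/archive/`.

The deleted-feature-branch case is not modeled. `archive_and_reset_sketch` is a hypothetical stand-in, not the plugin's archive_and_reset().

```python
import json
import os
import shutil
import time

# Hypothetical layout: prd.json = {"stories": [{"passes": bool, ...}, ...]},
# run state = the names below, archives and runs.log under .loop/archive/.
RUN_STATE = ("prd.json", "contracts", "config.json")


def run_is_complete(loop_dir):
    """True when prd.json exists and every story has passes == true."""
    prd = os.path.join(loop_dir, "prd.json")
    if not os.path.exists(prd):
        return False
    with open(prd) as f:
        stories = json.load(f).get("stories", [])
    return bool(stories) and all(s.get("passes") is True for s in stories)


def archive_and_reset_sketch(loop_dir):
    """Move run state into a timestamped archive and log it; harness files stay."""
    archive_root = os.path.join(loop_dir, "archive")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = os.path.join(archive_root, stamp)
    os.makedirs(dest, exist_ok=True)
    for name in RUN_STATE:
        src = os.path.join(loop_dir, name)
        if os.path.exists(src):
            shutil.move(src, os.path.join(dest, name))
    with open(os.path.join(archive_root, "runs.log"), "a") as f:
        f.write(f"{stamp}\t{dest}\n")
    return dest
```

Only the run state is moved, so lib/, prompts/ and loop.sh remain in place and a fresh run can start straight away.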
Planner now has examples for design/UX criteria that are evaluable
without being purely binary. Prevents the planner from avoiding
qualitative criteria just because they aren't grep-checkable.
Planner now sees specific examples of verifiable criteria (grep,
test commands, file checks) alongside vague anti-patterns. Drives
higher story quality which directly improves evaluator accuracy.
Generator now has explicit instructions for when it's stuck: write
the blocker to notes, leave passes as false, and stop. Also adds
a "Do Not Modify" section preventing changes to other stories,
contracts, or config.
Generator must now verify each acceptance criterion against actual
code before setting passes: true. Acts as a first filter before
the evaluator runs, reducing false completions.
Avoid maintaining specific install commands that will go stale.
The evaluator uses whatever tools are available — let users
configure their own testing environment.
Headless mode was half-built and untested. Agent-loop is a plugin
that runs interactively via tmux — there's no CI use case yet.
Removes --headless flag, timeout compatibility shim, output capture
logic, and LOOP_AGENT_TMPFILE handling. Cuts 82 lines from loop.sh.
The planner prompt had vague context window budget percentages that
don't reflect how agents actually work. Replaced with concrete
scope guidance (keep stories to ~10 files) which aligns with the
existing scope budgets in config.json.
Three examples showing bad rubber-stamp, good rejection, and good
pass patterns. Based on Anthropic's harness design recommendation
to calibrate evaluators with few-shot score breakdowns, and
informed by real failures observed in a production loop run.
Three new failure patterns: missing imports after refactoring,
orphaned resource instances, and error detail leakage. These were
observed in a real loop run where the evaluator missed them.
Generator now runs a quick health check before implementing if the
project has tests or a dev server. Catches regressions from previous
iterations early instead of building on a broken foundation.
Add git log step and explicit gate requiring all startup steps
complete before implementation begins. Based on Anthropic's
prompting guide recommendation for prescriptive session orientation.
Remove hardcoded development paths (ralph-loop, loop-test2) and
absolute-path permissions from the allow list, keeping only
project-agnostic and relative-path permissions.
- Fix evaluator bypass on last story (moved completion check)
- Fix all stale command name references across README, loop.sh, skills, plugin.json
- Fix explore evaluator false rejects (.loop/ files are expected)
- Fix stderr capture order in headless mode
- Fix shell injection risk in hooks.sh python fallback
- Remove .DS_Store from tracking
- Rewrite README to match current architecture (single entry point, tmux, optional tools)
- Add XcodeBuildMCP and iOS simulator MCP to optional tools docs
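The commit does not show how the hooks.sh injection was fixed. A common remedy is to pass values to the Python fallback as command-line arguments instead of splicing them into the script text. In shell, that means `python3 -c "$SCRIPT" "$path"` rather than interpolating `$path` inside `$SCRIPT`. The Python sketch below demonstrates that technique. `FALLBACK` and `run_fallback` are hypothetical.

```python
import json
import subprocess
import sys

# The fallback script reads its inputs from argv instead of having them pasted
# into its source text, so quotes in a value can never become code.
FALLBACK = """
import json, sys
path, command = sys.argv[1], sys.argv[2]
print(json.dumps({"path": path, "command": command}))
"""


def run_fallback(path, command):
    out = subprocess.run([sys.executable, "-c", FALLBACK, path, command],
                         capture_output=True, text=True, check=True)
    return json.loads(out.stdout)


hostile = "x'); import os; os.system('echo pwned'); ('"
print(run_fallback(hostile, "kill -INT $PPID"))  # the hostile text comes back as plain data
```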