65 Commits

Author SHA1 Message Date
ce111b4cbe feat: add guidance for subjective acceptance criteria
Planner now has examples for design/UX criteria that are evaluable
without being purely binary. Prevents the planner from avoiding
qualitative criteria just because they aren't grep-checkable.
2026-03-28 12:59:42 -04:00
77fd9e0cd6 feat: add concrete examples of good vs bad acceptance criteria
Planner now sees specific examples of verifiable criteria (grep,
test commands, file checks) alongside vague anti-patterns. Drives
higher story quality which directly improves evaluator accuracy.
2026-03-28 12:56:53 -04:00
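The three kinds of verifiable criteria named in this commit (grep, test commands, file checks) could be sketched as small predicates like the ones below. This is an illustration only: the function names and signatures are hypothetical and are not taken from the planner prompt or the repository.

```python
import re
import subprocess
from pathlib import Path


def file_exists(root: Path, rel: str) -> bool:
    """File check: the criterion names a file that must exist."""
    return (root / rel).is_file()


def file_contains(root: Path, rel: str, pattern: str) -> bool:
    """Grep-style check: the file must match a regular expression."""
    path = root / rel
    return path.is_file() and re.search(pattern, path.read_text()) is not None


def command_succeeds(root: Path, argv: list[str]) -> bool:
    """Test-command check: the command must exit with status 0."""
    result = subprocess.run(argv, cwd=root, capture_output=True)
    return result.returncode == 0
```

A criterion such as "the repository has a LICENSE file that mentions MIT" maps onto `file_contains(root, "LICENSE", r"MIT")`, whereas a vague criterion such as "licensing is handled properly" has no equivalent check.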
1efca3c185 feat: add blocker handling and artifact protection to generator
Generator now has explicit instructions for when it's stuck: write
the blocker to notes, leave passes as false, and stop. Also adds
a "Do Not Modify" section preventing changes to other stories,
contracts, or config.
2026-03-28 12:40:05 -04:00
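The blocker rule described in this commit (write the blocker to notes, leave passes as false, stop) might look roughly like this in code. Only `passes` and the idea of notes come from the commit message; the `Story` class, its other fields, and `record_blocker` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    # Hypothetical story record; only `passes` and notes are named in the log.
    id: str
    passes: bool = False
    notes: list[str] = field(default_factory=list)


def record_blocker(story: Story, reason: str) -> Story:
    """Note why work stopped and make sure the story is not marked done."""
    story.notes.append(f"BLOCKED: {reason}")
    story.passes = False
    return story
```

After calling `record_blocker`, the caller is expected to stop work on the story rather than continue with a partial implementation.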
e4df81fdac feat: add self-verification gate before generator marks story done
Generator must now verify each acceptance criterion against actual
code before setting passes: true. Acts as a first filter before
the evaluator runs, reducing false completions.
2026-03-28 12:36:24 -04:00
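A minimal sketch of the self-verification gate, assuming each acceptance criterion can be expressed as a zero-argument check. The `self_verify` name and the dictionary shape are hypothetical, and treating an empty criteria set as a failure is also an assumption, not something the commit states.

```python
from typing import Callable


def self_verify(criteria: dict[str, Callable[[], bool]]) -> tuple[bool, list[str]]:
    """Return (passes, failed_names); passes is True only if every check holds."""
    failed = [name for name, check in criteria.items() if not check()]
    # Assumption: a story with no criteria cannot be verified, so it does not pass.
    passes = bool(criteria) and not failed
    return passes, failed
```

The generator would set `passes: true` only when the first element of the result is `True`, leaving the evaluator as a second, independent filter.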
6833d94cf4 docs: mention using Claude or /plan to generate specs 2026-03-28 12:26:40 -04:00
c293f53d90 docs: make runtime verification claim accurate
Only claim what the evaluator actually does: runs tests, builds,
and checks for errors. Don't overstate MCP server discovery.
2026-03-28 12:20:31 -04:00
9fd428ac51 docs: replace specific MCP recommendations with general guidance
Avoid maintaining specific install commands that will go stale.
The evaluator uses whatever tools are available — let users
configure their own testing environment.
2026-03-28 12:19:50 -04:00
c46de6815c refactor: remove headless mode
Headless mode was half-built and untested. Agent-loop is a plugin
that runs interactively via tmux — there's no CI use case yet.
Removes --headless flag, timeout compatibility shim, output capture
logic, and LOOP_AGENT_TMPFILE handling. Cuts 82 lines from loop.sh.
2026-03-28 12:17:30 -04:00
b4d4e1952a docs: rewrite README for plugin-first install
- Remove manual install and install.sh references
- Add prerequisites section (tmux, jq/python3)
- Add step to write a spec before running
- Fix "PRD" → "spec" in modes table
- Add --headless to options list
- Update generator description with startup sequence
- Note evaluator calibration examples
2026-03-28 12:01:05 -04:00
60ce0fef54 fix: tighten vague language across all prompt files
- Remove blanket "write tests" instructions; tests only when
  acceptance criteria require them
- Replace arbitrary "30-50% rejection rate" with clear directive
- Replace "4/5 threshold" with "majority of claims" rule
- List concrete quality gate commands instead of "whatever project uses"
- Remove "learnings" from progress summary (too vague)
- Make error-leak pattern generic (not HTTP-specific)
- Align fix evaluator with updated test expectations
2026-03-28 11:58:13 -04:00
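The "majority of claims" rule that replaces the 4/5 threshold could be read as follows. The commit does not say whether a tie counts, so this sketch assumes a strict majority; the function name is hypothetical.

```python
def majority_verified(claims: list[bool]) -> bool:
    """True when strictly more than half of the claims were verified."""
    # Assumption: ties and empty claim lists do not count as a majority.
    return sum(claims) * 2 > len(claims)
```

Under this reading, two verified claims out of three pass, while one out of two does not.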
f26bdce534 fix: replace misleading context budget percentages with scope guidance
The planner prompt had vague context window budget percentages that
don't reflect how agents actually work. Replaced with concrete
scope guidance (keep stories to ~10 files) which aligns with the
existing scope budgets in config.json.
2026-03-28 11:49:04 -04:00
2dc291aac4 fix: make evaluator calibration examples project-agnostic
Replace ChaosRush-specific references with generic examples
that apply to any codebase.
2026-03-28 11:21:11 -04:00
1d059e218b feat: add few-shot calibration examples to evaluator prompt
Three examples showing bad rubber-stamp, good rejection, and good
pass patterns. Based on Anthropic's harness design recommendation
to calibrate evaluators with few-shot score breakdowns, and
informed by real failures observed in a production loop run.
2026-03-28 11:15:52 -04:00
80b0f0f4c1 feat: add regression patterns to evaluator implement prompt
Three new failure patterns: missing imports after refactoring,
orphaned resource instances, and error detail leakage. These were
observed in a real loop run where the evaluator missed them.
2026-03-28 10:57:44 -04:00
5e4ad3b12e feat: add smoke test step to generator startup sequence
Generator now runs a quick health check before implementing if the
project has tests or a dev server. Catches regressions from previous
iterations early instead of building on a broken foundation.
2026-03-27 21:09:36 -04:00
9a7fa3a1bd fix: enforce strict orientation sequence in generator prompt
Add git log step and explicit gate requiring all startup steps
complete before implementation begins. Based on Anthropic's
prompting guide recommendation for prescriptive session orientation.
2026-03-27 21:07:48 -04:00
50e62ca979 fix: correct URLs, author name, and clean up stale hook
- Revert plugin/README/CONTRIBUTING URLs to git.jagfly.com (not on GitHub yet)
- Fix LICENSE copyright to Sheldon Finlay
- Remove leftover Stop hook from settings.local.json
2026-03-27 19:00:26 -04:00
d8c95397f2 feat: US-008 - Add CONTRIBUTING.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 18:51:33 -04:00
410c17b3b3 feat: US-007 - Increase evalRetries default from 2 to 3 2026-03-27 18:49:40 -04:00
25d53a6b4f feat: US-006 - Improve init.sh.example with project-type guidance 2026-03-27 18:47:44 -04:00
6b6cf842b9 feat: US-005 - Add MIT LICENSE file
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 18:46:38 -04:00
978783d1be feat: US-004 - Update plugin URLs from jagfly.com to GitHub
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 18:44:59 -04:00
a4e9c4de05 feat: US-003 - Clarify .loop/ changes are expected in explore evaluator 2026-03-27 18:42:46 -04:00
3c518794ee feat: US-002 - Guard against data loss in archive.sh 2026-03-27 18:40:31 -04:00
a935997ac4 feat: US-001 - Clean up settings.local.json
Remove hardcoded development paths (ralph-loop, loop-test2) and
absolute-path permissions from the allow list, keeping only
project-agnostic and relative-path permissions.
2026-03-27 18:38:33 -04:00
e3554010dd fix: auto-close finish screen after 30s so background watcher fires 2026-03-27 18:18:07 -04:00
3d86562205 fix: scope Stop hook to per-agent — prevents killing orchestrating CC session 2026-03-27 16:06:48 -04:00
a9af753a2e fix: setup.sh initializes git repo if none exists 2026-03-27 15:47:43 -04:00
6b13fc3d38 feat: background watcher notifies CC session when loop completes 2026-03-27 15:22:43 -04:00
ddd8790481 docs: note that each loop session is resumable via claude -r 2026-03-27 15:20:05 -04:00
f1fde5cb01 fix: show summary and pause on loop exit — tmux doesn't vanish abruptly 2026-03-27 15:17:13 -04:00
bc7a1e2f04 fix: require spec file before story generation — don't reinvent planning 2026-03-27 15:08:30 -04:00
b3d263258a fix: critical bugs, stale refs, README rewrite, security fixes
- Fix evaluator bypass on last story (moved completion check)
- Fix all stale command name references across README, loop.sh, skills, plugin.json
- Fix explore evaluator false rejects (.loop/ files are expected)
- Fix stderr capture order in headless mode
- Fix shell injection risk in hooks.sh python fallback
- Remove .DS_Store from tracking
- Rewrite README to match current architecture (single entry point, tmux, optional tools)
- Add XcodeBuildMCP and iOS simulator MCP to optional tools docs
2026-03-27 14:58:01 -04:00
f3cbfd258c refactor: remove domain-specific language from prompts — fully universal 2026-03-27 14:50:52 -04:00
48bc656cd8 refactor: trim generator and evaluator prompts — cut total in half 2026-03-27 14:48:42 -04:00
5f8a34cc7b fix: simplify evaluator runtime verification — let claude figure out the tools 2026-03-27 14:45:55 -04:00
ee08e3617c feat: evaluator runtime verification for web projects, optional Playwright docs 2026-03-27 14:30:09 -04:00
18d95fed0d fix: don't capture stdout in interactive mode — run claude directly so UI renders 2026-03-27 13:34:54 -04:00
994908aed2 feat: adopt Ralph pattern — pipe to claude (no --print), working Stop hook 2026-03-27 13:24:13 -04:00
1e7f7ea6ed feat: true interactive mode — run claude directly, verdict via file, no script/capture 2026-03-27 13:07:25 -04:00
5e456cff6d fix: drop osascript, use universal ! tmux attach approach 2026-03-27 12:53:26 -04:00
4a6ddaa193 fix: pass prompt as CLI arg instead of stdin to preserve interactive UI 2026-03-27 12:49:42 -04:00
8129b5736b fix: platform-aware terminal launch — osascript on macOS, fallback on Linux 2026-03-27 12:42:01 -04:00
d457344806 feat: auto-open terminal window attached to tmux session 2026-03-27 12:41:02 -04:00
2a02a54b9d feat: interactive mode — full CC sessions visible in tmux, headless mode via --headless flag 2026-03-27 12:36:56 -04:00
a3cf3e7bae fix: add macOS timeout compatibility (gtimeout or perl fallback) 2026-03-27 12:24:53 -04:00
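The fallback order named in this commit (GNU `timeout`, then `gtimeout`, then perl) can be sketched as a command-prefix builder. This is not the code from loop.sh, which is a shell script and later dropped the shim; `timeout_prefix` and its `which` parameter are hypothetical, and the perl one-liner is the common `alarm`-then-`exec` idiom.

```python
import shutil
from typing import Callable, Optional


def timeout_prefix(
    seconds: int,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Build an argv prefix that runs the following command under a time limit."""
    for name in ("timeout", "gtimeout"):
        if which(name):
            return [name, str(seconds)]
    # Neither binary found: set an alarm in perl, then exec the real command.
    return ["perl", "-e", "alarm shift; exec @ARGV", str(seconds)]
```

The returned list is meant to be prepended to the real command, for example `timeout_prefix(30) + ["claude", "--version"]`.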
0666903b5f fix: launch tmux detached, prompt user to attach with ! prefix 2026-03-27 12:14:55 -04:00
e810d1a1db fix: attach to tmux session instead of detaching 2026-03-27 12:10:12 -04:00
a2b4369035 feat: launch execution in tmux, orchestrator monitors progress 2026-03-27 11:48:15 -04:00
f867630639 fix: use bypassPermissions for generator/evaluator agents (autonomous mode) 2026-03-27 10:14:11 -04:00