Commit Graph

58 Commits

Author SHA1 Message Date
c46de6815c refactor: remove headless mode
Headless mode was half-built and untested. Agent-loop is a plugin
that runs interactively via tmux — there's no CI use case yet.
Removes --headless flag, timeout compatibility shim, output capture
logic, and LOOP_AGENT_TMPFILE handling. Cuts 82 lines from loop.sh.
2026-03-28 12:17:30 -04:00
b4d4e1952a docs: rewrite README for plugin-first install
- Remove manual install and install.sh references
- Add prerequisites section (tmux, jq/python3)
- Add step to write a spec before running
- Fix "PRD" → "spec" in modes table
- Add --headless to options list
- Update generator description with startup sequence
- Note evaluator calibration examples
2026-03-28 12:01:05 -04:00
60ce0fef54 fix: tighten vague language across all prompt files
- Remove blanket "write tests" instructions; tests only when
  acceptance criteria require them
- Replace arbitrary "30-50% rejection rate" with clear directive
- Replace "4/5 threshold" with "majority of claims" rule
- List concrete quality gate commands instead of "whatever project uses"
- Remove "learnings" from progress summary (too vague)
- Make error-leak pattern generic (not HTTP-specific)
- Align fix evaluator with updated test expectations
2026-03-28 11:58:13 -04:00
f26bdce534 fix: replace misleading context budget percentages with scope guidance
The planner prompt had vague context window budget percentages that
don't reflect how agents actually work. Replaced with concrete
scope guidance (keep stories to ~10 files) which aligns with the
existing scope budgets in config.json.
2026-03-28 11:49:04 -04:00
2dc291aac4 fix: make evaluator calibration examples project-agnostic
Replace ChaosRush-specific references with generic examples
that apply to any codebase.
2026-03-28 11:21:11 -04:00
1d059e218b feat: add few-shot calibration examples to evaluator prompt
Three examples showing bad rubber-stamp, good rejection, and good
pass patterns. Based on Anthropic's harness design recommendation
to calibrate evaluators with few-shot score breakdowns, and
informed by real failures observed in a production loop run.
2026-03-28 11:15:52 -04:00
80b0f0f4c1 feat: add regression patterns to evaluator implement prompt
Three new failure patterns: missing imports after refactoring,
orphaned resource instances, and error detail leakage. These were
observed in a real loop run where the evaluator missed them.
2026-03-28 10:57:44 -04:00
5e4ad3b12e feat: add smoke test step to generator startup sequence
Generator now runs a quick health check before implementing if the
project has tests or a dev server. Catches regressions from previous
iterations early instead of building on a broken foundation.
2026-03-27 21:09:36 -04:00
9a7fa3a1bd fix: enforce strict orientation sequence in generator prompt
Add git log step and explicit gate requiring all startup steps
complete before implementation begins. Based on Anthropic's
prompting guide recommendation for prescriptive session orientation.
2026-03-27 21:07:48 -04:00
50e62ca979 fix: correct URLs, author name, and clean up stale hook
- Revert plugin/README/CONTRIBUTING URLs to git.jagfly.com (not on GitHub yet)
- Fix LICENSE copyright to Sheldon Finlay
- Remove leftover Stop hook from settings.local.json
2026-03-27 19:00:26 -04:00
d8c95397f2 feat: US-008 - Add CONTRIBUTING.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 18:51:33 -04:00
410c17b3b3 feat: US-007 - Increase evalRetries default from 2 to 3
2026-03-27 18:49:40 -04:00
25d53a6b4f feat: US-006 - Improve init.sh.example with project-type guidance
2026-03-27 18:47:44 -04:00
6b6cf842b9 feat: US-005 - Add MIT LICENSE file
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 18:46:38 -04:00
978783d1be feat: US-004 - Update plugin URLs from jagfly.com to GitHub
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 18:44:59 -04:00
a4e9c4de05 feat: US-003 - Clarify .loop/ changes are expected in explore evaluator
2026-03-27 18:42:46 -04:00
3c518794ee feat: US-002 - Guard against data loss in archive.sh
2026-03-27 18:40:31 -04:00
a935997ac4 feat: US-001 - Clean up settings.local.json
Remove hardcoded development paths (ralph-loop, loop-test2) and
absolute-path permissions from the allow list, keeping only
project-agnostic and relative-path permissions.
2026-03-27 18:38:33 -04:00
e3554010dd fix: auto-close finish screen after 30s so background watcher fires
2026-03-27 18:18:07 -04:00
3d86562205 fix: scope Stop hook to per-agent — prevents killing orchestrating CC session
2026-03-27 16:06:48 -04:00
a9af753a2e fix: setup.sh initializes git repo if none exists
2026-03-27 15:47:43 -04:00
6b13fc3d38 feat: background watcher notifies CC session when loop completes
2026-03-27 15:22:43 -04:00
ddd8790481 docs: note that each loop session is resumable via claude -r
2026-03-27 15:20:05 -04:00
f1fde5cb01 fix: show summary and pause on loop exit — tmux doesn't vanish abruptly
2026-03-27 15:17:13 -04:00
bc7a1e2f04 fix: require spec file before story generation — don't reinvent planning
2026-03-27 15:08:30 -04:00
b3d263258a fix: critical bugs, stale refs, README rewrite, security fixes
- Fix evaluator bypass on last story (moved completion check)
- Fix all stale command name references across README, loop.sh, skills, plugin.json
- Fix explore evaluator false rejects (.loop/ files are expected)
- Fix stderr capture order in headless mode
- Fix shell injection risk in hooks.sh python fallback
- Remove .DS_Store from tracking
- Rewrite README to match current architecture (single entry point, tmux, optional tools)
- Add XcodeBuildMCP and iOS simulator MCP to optional tools docs
2026-03-27 14:58:01 -04:00
f3cbfd258c refactor: remove domain-specific language from prompts — fully universal
2026-03-27 14:50:52 -04:00
48bc656cd8 refactor: trim generator and evaluator prompts — cut total in half
2026-03-27 14:48:42 -04:00
5f8a34cc7b fix: simplify evaluator runtime verification — let claude figure out the tools
2026-03-27 14:45:55 -04:00
ee08e3617c feat: evaluator runtime verification for web projects, optional Playwright docs
2026-03-27 14:30:09 -04:00
18d95fed0d fix: don't capture stdout in interactive mode — run claude directly so UI renders
2026-03-27 13:34:54 -04:00
994908aed2 feat: adopt Ralph pattern — pipe to claude (no --print), working Stop hook
2026-03-27 13:24:13 -04:00
1e7f7ea6ed feat: true interactive mode — run claude directly, verdict via file, no script/capture
2026-03-27 13:07:25 -04:00
5e456cff6d fix: drop osascript, use universal ! tmux attach approach
2026-03-27 12:53:26 -04:00
4a6ddaa193 fix: pass prompt as CLI arg instead of stdin to preserve interactive UI
2026-03-27 12:49:42 -04:00
8129b5736b fix: platform-aware terminal launch — osascript on macOS, fallback on Linux
2026-03-27 12:42:01 -04:00
d457344806 feat: auto-open terminal window attached to tmux session
2026-03-27 12:41:02 -04:00
2a02a54b9d feat: interactive mode — full CC sessions visible in tmux, headless mode via --headless flag
2026-03-27 12:36:56 -04:00
a3cf3e7bae fix: add macOS timeout compatibility (gtimeout or perl fallback)
2026-03-27 12:24:53 -04:00
0666903b5f fix: launch tmux detached, prompt user to attach with ! prefix
2026-03-27 12:14:55 -04:00
e810d1a1db fix: attach to tmux session instead of detaching
2026-03-27 12:10:12 -04:00
a2b4369035 feat: launch execution in tmux, orchestrator monitors progress
2026-03-27 11:48:15 -04:00
f867630639 fix: use bypassPermissions for generator/evaluator agents (autonomous mode)
2026-03-27 10:14:11 -04:00
9508ad20b6 fix: rename init to setup to avoid built-in /init conflict
2026-03-27 10:01:50 -04:00
2a78915dcf feat: single entry point /agent-loop:run handles setup, planning, and execution
2026-03-27 09:53:52 -04:00
381741509d fix: rename generate to stories to avoid autocomplete issues
2026-03-27 09:49:10 -04:00
8c4e123976 fix: rename plan skill to generate to avoid name collision with built-in /plan
2026-03-27 09:39:13 -04:00
e9d87fa6a1 chore: bump to 0.3.0
2026-03-27 09:28:06 -04:00
86b2b7271b feat: bash setup script, planner agent with disallowedTools, simplified skills
2026-03-27 09:23:42 -04:00
53086c9dbc fix: radically simplify skills — each does exactly one thing, no chaining, explicit boundaries
2026-03-27 09:03:47 -04:00