loop-loop

Author	SHA1	Message	Date
Sheldon Finlay	ce111b4cbe	feat: add guidance for subjective acceptance criteria Planner now has examples for design/UX criteria that are evaluable without being purely binary. Prevents the planner from avoiding qualitative criteria just because they aren't grep-checkable.	2026-03-28 12:59:42 -04:00
Sheldon Finlay	77fd9e0cd6	feat: add concrete examples of good vs bad acceptance criteria Planner now sees specific examples of verifiable criteria (grep, test commands, file checks) alongside vague anti-patterns. Drives higher story quality which directly improves evaluator accuracy.	2026-03-28 12:56:53 -04:00
Sheldon Finlay	1efca3c185	feat: add blocker handling and artifact protection to generator Generator now has explicit instructions for when it's stuck: write the blocker to notes, leave passes as false, and stop. Also adds a "Do Not Modify" section preventing changes to other stories, contracts, or config.	2026-03-28 12:40:05 -04:00
Sheldon Finlay	e4df81fdac	feat: add self-verification gate before generator marks story done Generator must now verify each acceptance criterion against actual code before setting passes: true. Acts as a first filter before the evaluator runs, reducing false completions.	2026-03-28 12:36:24 -04:00
Sheldon Finlay	6833d94cf4	docs: mention using Claude or /plan to generate specs	2026-03-28 12:26:40 -04:00
Sheldon Finlay	c293f53d90	docs: make runtime verification claim accurate Only claim what the evaluator actually does: runs tests, builds, and checks for errors. Don't overstate MCP server discovery.	2026-03-28 12:20:31 -04:00
Sheldon Finlay	9fd428ac51	docs: replace specific MCP recommendations with general guidance Avoid maintaining specific install commands that will go stale. The evaluator uses whatever tools are available — let users configure their own testing environment.	2026-03-28 12:19:50 -04:00
Sheldon Finlay	c46de6815c	refactor: remove headless mode Headless mode was half-built and untested. Agent-loop is a plugin that runs interactively via tmux — there's no CI use case yet. Removes --headless flag, timeout compatibility shim, output capture logic, and LOOP_AGENT_TMPFILE handling. Cuts 82 lines from loop.sh.	2026-03-28 12:17:30 -04:00
Sheldon Finlay	b4d4e1952a	docs: rewrite README for plugin-first install - Remove manual install and install.sh references - Add prerequisites section (tmux, jq/python3) - Add step to write a spec before running - Fix "PRD" → "spec" in modes table - Add --headless to options list - Update generator description with startup sequence - Note evaluator calibration examples	2026-03-28 12:01:05 -04:00
Sheldon Finlay	60ce0fef54	fix: tighten vague language across all prompt files - Remove blanket "write tests" instructions; tests only when acceptance criteria require them - Replace arbitrary "30-50% rejection rate" with clear directive - Replace "4/5 threshold" with "majority of claims" rule - List concrete quality gate commands instead of "whatever project uses" - Remove "learnings" from progress summary (too vague) - Make error-leak pattern generic (not HTTP-specific) - Align fix evaluator with updated test expectations	2026-03-28 11:58:13 -04:00
Sheldon Finlay	f26bdce534	fix: replace misleading context budget percentages with scope guidance The planner prompt had vague context window budget percentages that don't reflect how agents actually work. Replaced with concrete scope guidance (keep stories to ~10 files) which aligns with the existing scope budgets in config.json.	2026-03-28 11:49:04 -04:00
Sheldon Finlay	2dc291aac4	fix: make evaluator calibration examples project-agnostic Replace ChaosRush-specific references with generic examples that apply to any codebase.	2026-03-28 11:21:11 -04:00
Sheldon Finlay	1d059e218b	feat: add few-shot calibration examples to evaluator prompt Three examples showing bad rubber-stamp, good rejection, and good pass patterns. Based on Anthropic's harness design recommendation to calibrate evaluators with few-shot score breakdowns, and informed by real failures observed in a production loop run.	2026-03-28 11:15:52 -04:00
Sheldon Finlay	80b0f0f4c1	feat: add regression patterns to evaluator implement prompt Three new failure patterns: missing imports after refactoring, orphaned resource instances, and error detail leakage. These were observed in a real loop run where the evaluator missed them.	2026-03-28 10:57:44 -04:00
Sheldon Finlay	5e4ad3b12e	feat: add smoke test step to generator startup sequence Generator now runs a quick health check before implementing if the project has tests or a dev server. Catches regressions from previous iterations early instead of building on a broken foundation.	2026-03-27 21:09:36 -04:00
Sheldon Finlay	9a7fa3a1bd	fix: enforce strict orientation sequence in generator prompt Add git log step and explicit gate requiring all startup steps complete before implementation begins. Based on Anthropic's prompting guide recommendation for prescriptive session orientation.	2026-03-27 21:07:48 -04:00
Sheldon Finlay	50e62ca979	fix: correct URLs, author name, and clean up stale hook - Revert plugin/README/CONTRIBUTING URLs to git.jagfly.com (not on GitHub yet) - Fix LICENSE copyright to Sheldon Finlay - Remove leftover Stop hook from settings.local.json	2026-03-27 19:00:26 -04:00
Sheldon Finlay	d8c95397f2	feat: US-008 - Add CONTRIBUTING.md Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 18:51:33 -04:00
Sheldon Finlay	410c17b3b3	feat: US-007 - Increase evalRetries default from 2 to 3	2026-03-27 18:49:40 -04:00
Sheldon Finlay	25d53a6b4f	feat: US-006 - Improve init.sh.example with project-type guidance	2026-03-27 18:47:44 -04:00
Sheldon Finlay	6b6cf842b9	feat: US-005 - Add MIT LICENSE file Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 18:46:38 -04:00
Sheldon Finlay	978783d1be	feat: US-004 - Update plugin URLs from jagfly.com to GitHub Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 18:44:59 -04:00
Sheldon Finlay	a4e9c4de05	feat: US-003 - Clarify .loop/ changes are expected in explore evaluator	2026-03-27 18:42:46 -04:00
Sheldon Finlay	3c518794ee	feat: US-002 - Guard against data loss in archive.sh	2026-03-27 18:40:31 -04:00
Sheldon Finlay	a935997ac4	feat: US-001 - Clean up settings.local.json Remove hardcoded development paths (ralph-loop, loop-test2) and absolute-path permissions from the allow list, keeping only project-agnostic and relative-path permissions.	2026-03-27 18:38:33 -04:00
Sheldon Finlay	e3554010dd	fix: auto-close finish screen after 30s so background watcher fires	2026-03-27 18:18:07 -04:00
Sheldon Finlay	3d86562205	fix: scope Stop hook to per-agent — prevents killing orchestrating CC session	2026-03-27 16:06:48 -04:00
Sheldon Finlay	a9af753a2e	fix: setup.sh initializes git repo if none exists	2026-03-27 15:47:43 -04:00
Sheldon Finlay	6b13fc3d38	feat: background watcher notifies CC session when loop completes	2026-03-27 15:22:43 -04:00
Sheldon Finlay	ddd8790481	docs: note that each loop session is resumable via claude -r	2026-03-27 15:20:05 -04:00
Sheldon Finlay	f1fde5cb01	fix: show summary and pause on loop exit — tmux doesn't vanish abruptly	2026-03-27 15:17:13 -04:00
Sheldon Finlay	bc7a1e2f04	fix: require spec file before story generation — don't reinvent planning	2026-03-27 15:08:30 -04:00
Sheldon Finlay	b3d263258a	fix: critical bugs, stale refs, README rewrite, security fixes - Fix evaluator bypass on last story (moved completion check) - Fix all stale command name references across README, loop.sh, skills, plugin.json - Fix explore evaluator false rejects (.loop/ files are expected) - Fix stderr capture order in headless mode - Fix shell injection risk in hooks.sh python fallback - Remove .DS_Store from tracking - Rewrite README to match current architecture (single entry point, tmux, optional tools) - Add XcodeBuildMCP and iOS simulator MCP to optional tools docs	2026-03-27 14:58:01 -04:00
Sheldon Finlay	f3cbfd258c	refactor: remove domain-specific language from prompts — fully universal	2026-03-27 14:50:52 -04:00
Sheldon Finlay	48bc656cd8	refactor: trim generator and evaluator prompts — cut total in half	2026-03-27 14:48:42 -04:00
Sheldon Finlay	5f8a34cc7b	fix: simplify evaluator runtime verification — let claude figure out the tools	2026-03-27 14:45:55 -04:00
Sheldon Finlay	ee08e3617c	feat: evaluator runtime verification for web projects, optional Playwright docs	2026-03-27 14:30:09 -04:00
Sheldon Finlay	18d95fed0d	fix: don't capture stdout in interactive mode — run claude directly so UI renders	2026-03-27 13:34:54 -04:00
Sheldon Finlay	994908aed2	feat: adopt Ralph pattern — pipe to claude (no --print), working Stop hook	2026-03-27 13:24:13 -04:00
Sheldon Finlay	1e7f7ea6ed	feat: true interactive mode — run claude directly, verdict via file, no script/capture	2026-03-27 13:07:25 -04:00
Sheldon Finlay	5e456cff6d	fix: drop osascript, use universal ! tmux attach approach	2026-03-27 12:53:26 -04:00
Sheldon Finlay	4a6ddaa193	fix: pass prompt as CLI arg instead of stdin to preserve interactive UI	2026-03-27 12:49:42 -04:00
Sheldon Finlay	8129b5736b	fix: platform-aware terminal launch — osascript on macOS, fallback on Linux	2026-03-27 12:42:01 -04:00
Sheldon Finlay	d457344806	feat: auto-open terminal window attached to tmux session	2026-03-27 12:41:02 -04:00
Sheldon Finlay	2a02a54b9d	feat: interactive mode — full CC sessions visible in tmux, headless mode via --headless flag	2026-03-27 12:36:56 -04:00
Sheldon Finlay	a3cf3e7bae	fix: add macOS timeout compatibility (gtimeout or perl fallback)	2026-03-27 12:24:53 -04:00
Sheldon Finlay	0666903b5f	fix: launch tmux detached, prompt user to attach with ! prefix	2026-03-27 12:14:55 -04:00
Sheldon Finlay	e810d1a1db	fix: attach to tmux session instead of detaching	2026-03-27 12:10:12 -04:00
Sheldon Finlay	a2b4369035	feat: launch execution in tmux, orchestrator monitors progress	2026-03-27 11:48:15 -04:00
Sheldon Finlay	f867630639	fix: use bypassPermissions for generator/evaluator agents (autonomous mode)	2026-03-27 10:14:11 -04:00

65 Commits