loop-loop

Author	SHA1	Message	Date
Sheldon Finlay	ce111b4cbe	feat: add guidance for subjective acceptance criteria. Planner now has examples for design/UX criteria that are evaluable without being purely binary. Prevents the planner from avoiding qualitative criteria just because they aren't grep-checkable.	2026-03-28 12:59:42 -04:00
Sheldon Finlay	77fd9e0cd6	feat: add concrete examples of good vs bad acceptance criteria. Planner now sees specific examples of verifiable criteria (grep, test commands, file checks) alongside vague anti-patterns. Drives higher story quality, which directly improves evaluator accuracy.	2026-03-28 12:56:53 -04:00
Sheldon Finlay	1efca3c185	feat: add blocker handling and artifact protection to generator. Generator now has explicit instructions for when it's stuck: write the blocker to notes, leave passes as false, and stop. Also adds a "Do Not Modify" section preventing changes to other stories, contracts, or config.	2026-03-28 12:40:05 -04:00
Sheldon Finlay	e4df81fdac	feat: add self-verification gate before generator marks story done. Generator must now verify each acceptance criterion against actual code before setting passes: true. Acts as a first filter before the evaluator runs, reducing false completions.	2026-03-28 12:36:24 -04:00
Sheldon Finlay	60ce0fef54	fix: tighten vague language across all prompt files. Remove blanket "write tests" instructions; tests only when acceptance criteria require them. Replace arbitrary "30-50% rejection rate" with clear directive. Replace "4/5 threshold" with "majority of claims" rule. List concrete quality gate commands instead of "whatever project uses". Remove "learnings" from progress summary (too vague). Make error-leak pattern generic (not HTTP-specific). Align fix evaluator with updated test expectations.	2026-03-28 11:58:13 -04:00
Sheldon Finlay	f26bdce534	fix: replace misleading context budget percentages with scope guidance. The planner prompt had vague context window budget percentages that don't reflect how agents actually work. Replaced with concrete scope guidance (keep stories to ~10 files), which aligns with the existing scope budgets in config.json.	2026-03-28 11:49:04 -04:00
Sheldon Finlay	2dc291aac4	fix: make evaluator calibration examples project-agnostic. Replace ChaosRush-specific references with generic examples that apply to any codebase.	2026-03-28 11:21:11 -04:00
Sheldon Finlay	1d059e218b	feat: add few-shot calibration examples to evaluator prompt. Three examples showing bad rubber-stamp, good rejection, and good pass patterns. Based on Anthropic's harness design recommendation to calibrate evaluators with few-shot score breakdowns, and informed by real failures observed in a production loop run.	2026-03-28 11:15:52 -04:00
Sheldon Finlay	80b0f0f4c1	feat: add regression patterns to evaluator implement prompt. Three new failure patterns: missing imports after refactoring, orphaned resource instances, and error detail leakage. These were observed in a real loop run where the evaluator missed them.	2026-03-28 10:57:44 -04:00
Sheldon Finlay	5e4ad3b12e	feat: add smoke test step to generator startup sequence. Generator now runs a quick health check before implementing if the project has tests or a dev server. Catches regressions from previous iterations early instead of building on a broken foundation.	2026-03-27 21:09:36 -04:00
Sheldon Finlay	9a7fa3a1bd	fix: enforce strict orientation sequence in generator prompt. Add git log step and explicit gate requiring all startup steps to complete before implementation begins. Based on Anthropic's prompting guide recommendation for prescriptive session orientation.	2026-03-27 21:07:48 -04:00
Sheldon Finlay	a4e9c4de05	feat: US-003 - Clarify .loop/ changes are expected in explore evaluator	2026-03-27 18:42:46 -04:00
Sheldon Finlay	b3d263258a	fix: critical bugs, stale refs, README rewrite, security fixes. Fix evaluator bypass on last story (moved completion check). Fix all stale command name references across README, loop.sh, skills, plugin.json. Fix explore evaluator false rejects (.loop/ files are expected). Fix stderr capture order in headless mode. Fix shell injection risk in hooks.sh python fallback. Remove .DS_Store from tracking. Rewrite README to match current architecture (single entry point, tmux, optional tools). Add XcodeBuildMCP and iOS simulator MCP to optional tools docs.	2026-03-27 14:58:01 -04:00
Sheldon Finlay	f3cbfd258c	refactor: remove domain-specific language from prompts — fully universal	2026-03-27 14:50:52 -04:00
Sheldon Finlay	48bc656cd8	refactor: trim generator and evaluator prompts — cut total in half	2026-03-27 14:48:42 -04:00
Sheldon Finlay	5f8a34cc7b	fix: simplify evaluator runtime verification — let claude figure out the tools	2026-03-27 14:45:55 -04:00
Sheldon Finlay	ee08e3617c	feat: evaluator runtime verification for web projects, optional Playwright docs	2026-03-27 14:30:09 -04:00
Sheldon Finlay	1e7f7ea6ed	feat: true interactive mode — run claude directly, verdict via file, no script/capture	2026-03-27 13:07:25 -04:00
Sheldon Finlay	17e5eb707f	feat: agent loop harness with Claude Code plugin support. Generator-evaluator architecture with iterative context-reset for long-running coding tasks. Ships as a Claude Code plugin: install with /plugin and use /agent-loop:init, /agent-loop:plan, /agent-loop:run.	2026-03-27 08:03:18 -04:00

19 Commits