loop-loop

Author	SHA1	Message	Date
Sheldon Finlay	2dc291aac4	fix: make evaluator calibration examples project-agnostic Replace ChaosRush-specific references with generic examples that apply to any codebase.	2026-03-28 11:21:11 -04:00
Sheldon Finlay	1d059e218b	feat: add few-shot calibration examples to evaluator prompt Three examples showing bad rubber-stamp, good rejection, and good pass patterns. Based on Anthropic's harness design recommendation to calibrate evaluators with few-shot score breakdowns, and informed by real failures observed in a production loop run.	2026-03-28 11:15:52 -04:00
Sheldon Finlay	80b0f0f4c1	feat: add regression patterns to evaluator implement prompt Three new failure patterns: missing imports after refactoring, orphaned resource instances, and error detail leakage. These were observed in a real loop run where the evaluator missed them.	2026-03-28 10:57:44 -04:00
Sheldon Finlay	a4e9c4de05	feat: US-003 - Clarify .loop/ changes are expected in explore evaluator	2026-03-27 18:42:46 -04:00
Sheldon Finlay	b3d263258a	fix: critical bugs, stale refs, README rewrite, security fixes - Fix evaluator bypass on last story (moved completion check) - Fix all stale command name references across README, loop.sh, skills, plugin.json - Fix explore evaluator false rejects (.loop/ files are expected) - Fix stderr capture order in headless mode - Fix shell injection risk in hooks.sh python fallback - Remove .DS_Store from tracking - Rewrite README to match current architecture (single entry point, tmux, optional tools) - Add XcodeBuildMCP and iOS simulator MCP to optional tools docs	2026-03-27 14:58:01 -04:00
Sheldon Finlay	f3cbfd258c	refactor: remove domain-specific language from prompts — fully universal	2026-03-27 14:50:52 -04:00
Sheldon Finlay	48bc656cd8	refactor: trim generator and evaluator prompts — cut total in half	2026-03-27 14:48:42 -04:00
Sheldon Finlay	5f8a34cc7b	fix: simplify evaluator runtime verification — let claude figure out the tools	2026-03-27 14:45:55 -04:00
Sheldon Finlay	ee08e3617c	feat: evaluator runtime verification for web projects, optional Playwright docs	2026-03-27 14:30:09 -04:00
Sheldon Finlay	1e7f7ea6ed	feat: true interactive mode — run claude directly, verdict via file, no script/capture	2026-03-27 13:07:25 -04:00
Sheldon Finlay	17e5eb707f	feat: agent loop harness with Claude Code plugin support Generator-evaluator architecture with iterative context-reset for long-running coding tasks. Ships as a Claude Code plugin — install with /plugin and use /agent-loop:init, /agent-loop:plan, /agent-loop:run.	2026-03-27 08:03:18 -04:00

11 Commits