From 60ce0fef5446f025de5618e687d1e169ef18192d Mon Sep 17 00:00:00 2001
From: Sheldon Finlay
Date: Sat, 28 Mar 2026 11:58:13 -0400
Subject: [PATCH] fix: tighten vague language across all prompt files

- Remove blanket "write tests" instructions; tests only when acceptance criteria require them
- Replace arbitrary "30-50% rejection rate" with clear directive
- Replace "4/5 threshold" with "majority of claims" rule
- List concrete quality gate commands instead of "whatever project uses"
- Remove "learnings" from progress summary (too vague)
- Make error-leak pattern generic (not HTTP-specific)
- Align fix evaluator with updated test expectations
---
 prompts/evaluator/_base.md     | 2 +-
 prompts/evaluator/explore.md   | 2 +-
 prompts/evaluator/fix.md       | 5 ++---
 prompts/evaluator/implement.md | 2 +-
 prompts/generator/_base.md     | 4 ++--
 prompts/generator/fix.md       | 4 ++--
 prompts/generator/implement.md | 2 +-
 7 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/prompts/evaluator/_base.md b/prompts/evaluator/_base.md
index 74a9f5e..282ee6b 100644
--- a/prompts/evaluator/_base.md
+++ b/prompts/evaluator/_base.md
@@ -10,7 +10,7 @@ You (Claude) have well-documented tendencies that make you a poor QA agent by de
 
 **OVERRIDE ALL OF THESE.** Your value comes from finding problems. A rubber-stamp evaluator is worse than no evaluator — it gives false confidence.
 
-**Rejection is normal and healthy.** Rejecting 30-50% of iterations is expected.
+**Rejection is normal and healthy.** Do not hesitate to reject when criteria aren't met.
 
 ## Your Target
 
diff --git a/prompts/evaluator/explore.md b/prompts/evaluator/explore.md
index 5ce7f17..a821183 100644
--- a/prompts/evaluator/explore.md
+++ b/prompts/evaluator/explore.md
@@ -37,7 +37,7 @@ Claims Verified:
 
 ## Grading Criteria
 
-- **Accuracy**: How many claims are correct? (threshold: 4/5 must be confirmed)
+- **Accuracy**: Are the majority of verified claims correct? If more than one claim is incorrect, reject.
 - **Completeness**: Did it cover the important parts of the area?
 - **Actionability**: Can someone act on the recommendations without additional research?
 
diff --git a/prompts/evaluator/fix.md b/prompts/evaluator/fix.md
index 565f81b..506e241 100644
--- a/prompts/evaluator/fix.md
+++ b/prompts/evaluator/fix.md
@@ -9,8 +9,7 @@ You are evaluating a bug fix or tech debt reduction. The generator claims to hav
    - Would this fix survive edge cases?
    - Did the generator patch around the bug or fix the actual cause?
 
-2. **Verify a regression test exists:**
-   - Is there a new or updated test?
+2. **If the acceptance criteria require a regression test, verify it exists:**
    - Does the test actually reproduce the original bug scenario?
    - Would the test fail if the fix were reverted?
 
@@ -27,7 +26,7 @@ You are evaluating a bug fix or tech debt reduction. The generator claims to hav
 ## Rejection Criteria (Fix-Specific)
 
 - Fix addresses symptom but not root cause
-- No regression test added
+- Acceptance criteria require a regression test but none was added
 - Existing tests fail after the fix
 - Unrelated changes included in the commit
 - Fix introduces a new bug or security issue
diff --git a/prompts/evaluator/implement.md b/prompts/evaluator/implement.md
index 57e7bfd..32ecdf3 100644
--- a/prompts/evaluator/implement.md
+++ b/prompts/evaluator/implement.md
@@ -17,4 +17,4 @@ You are evaluating an implementation story. The generator claims to have built a
 - Code exists but doesn't actually run
 - Removed an import or variable during refactoring but it's still used elsewhere in the file
 - New instance of a shared resource (e.g., DB connection, rate limiter) instead of using the existing one
-- Error details leaked to HTTP responses (use logging server-side, return generic message to client)
+- Internal error details (stack traces, exception messages) exposed in user-facing output instead of being logged server-side
diff --git a/prompts/generator/_base.md b/prompts/generator/_base.md
index 7e06a2e..927c405 100644
--- a/prompts/generator/_base.md
+++ b/prompts/generator/_base.md
@@ -16,13 +16,13 @@ Do NOT start implementation until steps 1-5 are complete.
 - **ONE story per iteration.** Do not attempt multiple stories.
 - **Read before writing.** Understand existing code before modifying.
 - **No placeholders.** Every implementation must be complete and functional.
-- **Run quality gates** before committing (typecheck, tests, lint — whatever the project uses).
+- **Run quality gates** before committing. Check for common tools (`npm test`, `pytest`, `cargo test`, `make test`, `go test ./...`) and run what's available. If no test tooling exists, verify manually.
 - **Commit** with message: `feat: [Story ID] - [Story Title]`
 
 ## After Completing
 
 1. Update `.loop/prd.json` — set `passes: true` for the story
-2. Append a summary to `.loop/progress.md` — what was done, files changed, learnings
+2. Append a summary to `.loop/progress.md` — what was done and which files were changed
 3. Update Codebase Patterns in progress.md if you discovered a reusable pattern
 
 ## Completion Signal
diff --git a/prompts/generator/fix.md b/prompts/generator/fix.md
index e89d7c1..4770f60 100644
--- a/prompts/generator/fix.md
+++ b/prompts/generator/fix.md
@@ -8,7 +8,7 @@ You are fixing bugs or reducing tech debt from a prioritized list. Each story is
 2. Read the sprint contract for context on what's broken and what "fixed" means
 3. **Understand the root cause before changing anything.** Read the relevant code, trace the execution path, understand WHY the bug exists.
 4. Make the minimal change to fix the issue
-5. Write or update a test that would have caught this bug
+5. If the story's acceptance criteria require a regression test, write one
 6. Run quality gates
 7. Commit
 
@@ -16,7 +16,7 @@ You are fixing bugs or reducing tech debt from a prioritized list. Each story is
 
 - **Fix only what the story describes.** Do not fix adjacent issues, even if you notice them. Note them in progress.md for future iterations.
 - **Minimal diff.** The smaller the change, the easier to review and the less risk of regressions.
-- **Add a regression test.** Every bug fix should include a test that reproduces the bug and verifies the fix. If no test framework exists, note this in progress.md.
+- **Add a regression test only if the acceptance criteria require it.** Not every fix is testable (config changes, prompt edits, dependency updates).
 - **Preserve behavior.** For tech debt refactors, the external behavior must not change. Only internal structure should improve.
 
 ## Git Workflow
diff --git a/prompts/generator/implement.md b/prompts/generator/implement.md
index bf58f67..d3fe6e6 100644
--- a/prompts/generator/implement.md
+++ b/prompts/generator/implement.md
@@ -16,7 +16,7 @@ You are building features from a PRD. Each story is a small, self-contained unit
 
 - **Minimal changes only.** Do not refactor surrounding code or add features beyond scope.
 - **Follow the contract's Out of Scope section.**
-- **If tests don't exist yet,** write them as part of the story.
+- **Write tests only if the story's acceptance criteria require them.**
 - **If you need a dependency,** install it and note it in progress.md.
 
 ## Git