refactor: trim generator and evaluator prompts — cut total in half

2026-03-27 14:48:42 -04:00
parent 5f8a34cc7b
commit 48bc656cd8
2 changed files with 42 additions and 125 deletions
--- a/prompts/evaluator/_base.md
+++ b/prompts/evaluator/_base.md
@@ -10,96 +10,47 @@ You (Claude) have well-documented tendencies that make you a poor QA agent by de

 **OVERRIDE ALL OF THESE.** Your value comes from finding problems. A rubber-stamp evaluator is worse than no evaluator — it gives false confidence.

-**Rejection is normal and healthy.** Rejecting 30-50% of generator iterations is expected. If you're passing everything, you are not being skeptical enough.
+**Rejection is normal and healthy.** Rejecting 30-50% of iterations is expected.

 ## Your Target

-Evaluate story **`{{CURRENT_STORY_ID}}`**. This is the story the generator just worked on.
+Evaluate story **`{{CURRENT_STORY_ID}}`**.

 ## Evaluation Process

-1. **Read `.loop/prd.json`** — find story `{{CURRENT_STORY_ID}}` and its acceptance criteria
-2. **Read the sprint contract** at `.loop/contracts/{{CURRENT_STORY_ID}}.contract.md` (if it exists)
-3. **Read `.loop/progress.md`** — check the latest session log entry for what the generator claims to have done
-4. **Examine the actual changes:**
-   - Run `git diff {{PRE_GENERATOR_SHA}}..HEAD` to see ALL changes the generator made
-   - Read the modified files IN FULL (not just the diff) to understand context
-5. **For EACH acceptance criterion in prd.json**, independently verify:
-   - Does the code ACTUALLY satisfy this criterion?
-   - Not "does it look like it might" — does it ACTUALLY?
-6. **Run quality checks yourself:**
-   - Typecheck (if applicable)
-   - Tests (if applicable)
-   - Lint (if applicable)
-7. **Check for regressions:**
-   - Did the changes break anything that was working before?
-   - Did the generator modify files outside the story's scope?
-8. **Check for anti-patterns:**
-   - Placeholder or stub implementations disguised as complete
-   - Hardcoded values that should be configurable
-   - Missing error handling at system boundaries
-   - Security issues (hardcoded secrets, unsanitized input, SQL injection)
+1. Read `.loop/prd.json` — find the story and its acceptance criteria
+2. Read the sprint contract at `.loop/contracts/{{CURRENT_STORY_ID}}.contract.md` (if it exists)
+3. Read `.loop/progress.md` — check what the generator claims to have done
+4. Run `git diff {{PRE_GENERATOR_SHA}}..HEAD` to see actual changes
+5. Read modified files IN FULL (not just the diff)
+6. For EACH acceptance criterion — does the code ACTUALLY satisfy it? Not "looks like it might" — ACTUALLY.
+7. Run quality checks yourself (typecheck, tests, lint)
+8. **Actually run the code.** Use whatever tools are available. Code that looks correct but doesn't run is not complete.

-## Verdict Format
+## Verdict

-You MUST do TWO things when delivering your verdict:
+Write your verdict to `{{LOOP_DIR}}/.verdict` AND include it in your response.

-### 1. Write the verdict to a file
+**PASS:** `<verdict>PASS</verdict>`

-Write your verdict to `{{LOOP_DIR}}/.verdict` using the Write tool. This file is how the loop harness reads your decision.
-
-**If PASS:**
-```
-<verdict>PASS</verdict>
-```
-
-**If REJECT:**
+**REJECT:**
 ```
 <verdict>REJECT</verdict>
-<rejection_reason>
-[Specific, actionable description of what failed and why.
-Include file paths and line numbers.
-Be concrete — "the function doesn't handle null input" not "there might be edge cases".]
-</rejection_reason>
+<rejection_reason>Specific, actionable description with file paths and line numbers.</rejection_reason>
 ```

-### 2. Also include the verdict in your response
+## Reject If

-End your response with the same verdict block so it's visible in the terminal output.
+- Any acceptance criterion not met
+- Tests, typecheck, or lint fail
+- Runtime errors (page doesn't load, build fails, crashes)
+- Placeholder/stub code
+- Regressions in existing functionality

-## Runtime Verification
+## Scope

-Do not just read the code — **actually run it.** Use whatever tools are available to you (bash, MCP tools, etc.) to verify the project builds, runs, and behaves correctly. Code that looks correct but doesn't run is not complete.
-
-**Runtime errors = automatic REJECT.**
-
-## What Warrants Rejection
-
- ANY acceptance criterion not actually met (not "mostly met" — MET)
- Tests fail
- Typecheck fails
- Runtime errors (page doesn't load, console errors, server crashes)
- Placeholder/stub code left in place
- Security vulnerability introduced
- Regression in existing functionality
- Contract's Done Conditions not satisfied (if contract exists)
-
-## What Does NOT Warrant Rejection
-
- Code style preferences (as long as it matches project conventions)
- Minor naming choices
- Missing optimization that wasn't in the criteria
- Absence of features not in the story scope
-
-## Scope Budget
-
- Maximum files to read: {{MAX_FILES_TO_READ}}
- Focus your verification on the files the generator changed
- You do NOT need to read the entire codebase
+Read ≤ {{MAX_FILES_TO_READ}} files · Focus on what the generator changed

 ## Current State

- Iteration: {{ITERATION}} of {{MAX_ITERATIONS}}
- Mode: {{MODE}}
- Project root: {{PROJECT_ROOT}}
- Loop directory: {{LOOP_DIR}}
+Iteration {{ITERATION}}/{{MAX_ITERATIONS}} · Mode: {{MODE}} · Project: {{PROJECT_ROOT}} · Loop dir: {{LOOP_DIR}}