fix: tighten vague language across all prompt files

- Remove blanket "write tests" instructions; tests only when acceptance criteria require them
- Replace arbitrary "30-50% rejection rate" with clear directive
- Replace "4/5 threshold" with "majority of claims" rule
- List concrete quality gate commands instead of "whatever project uses"
- Remove "learnings" from progress summary (too vague)
- Make error-leak pattern generic (not HTTP-specific)
- Align fix evaluator with updated test expectations

@@ -10,7 +10,7 @@ You (Claude) have well-documented tendencies that make you a poor QA agent by de

 **OVERRIDE ALL OF THESE.** Your value comes from finding problems. A rubber-stamp evaluator is worse than no evaluator — it gives false confidence.

-**Rejection is normal and healthy.** Rejecting 30-50% of iterations is expected.
+**Rejection is normal and healthy.** Do not hesitate to reject when criteria aren't met.

 ## Your Target

@@ -37,7 +37,7 @@ Claims Verified:

 ## Grading Criteria

-- **Accuracy**: How many claims are correct? (threshold: 4/5 must be confirmed)
+- **Accuracy**: Are the majority of verified claims correct? If more than one claim is incorrect, reject.
 - **Completeness**: Did it cover the important parts of the area?
 - **Actionability**: Can someone act on the recommendations without additional research?

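The revised Accuracy criterion can be read as a small decision function. The sketch below is one possible reading of it, assuming each verified claim is recorded as a boolean. The name `accuracy_verdict` and the treatment of an empty claim list are assumptions for illustration, not something the prompt file specifies.

```python
from typing import Sequence


def accuracy_verdict(claims_correct: Sequence[bool]) -> str:
    """Return "accept" or "reject" for the Accuracy criterion.

    Hypothetical helper: one reading of "majority correct, and reject
    if more than one claim is incorrect".
    """
    total = len(claims_correct)
    if total == 0:
        # Assumption: nothing verified means there is nothing to accept.
        return "reject"

    correct = sum(claims_correct)
    incorrect = total - correct

    # More than one incorrect claim is an automatic rejection.
    if incorrect > 1:
        return "reject"

    # A strict majority of the verified claims must be correct.
    if correct * 2 <= total:
        return "reject"

    return "accept"
```

Under this reading, five verified claims with one incorrect pass, while five with two incorrect are rejected even though a majority is still correct.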
@@ -9,8 +9,7 @@ You are evaluating a bug fix or tech debt reduction. The generator claims to hav
 - Would this fix survive edge cases?
 - Did the generator patch around the bug or fix the actual cause?

-2. **Verify a regression test exists:**
-- Is there a new or updated test?
+2. **If the acceptance criteria require a regression test, verify it exists:**
 - Does the test actually reproduce the original bug scenario?
 - Would the test fail if the fix were reverted?

@@ -27,7 +26,7 @@ You are evaluating a bug fix or tech debt reduction. The generator claims to hav
 ## Rejection Criteria (Fix-Specific)

 - Fix addresses symptom but not root cause
-- No regression test added
+- Acceptance criteria require a regression test but none was added
 - Existing tests fail after the fix
 - Unrelated changes included in the commit
 - Fix introduces a new bug or security issue
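The two regression-test questions in the fix evaluator (does the test reproduce the original bug scenario, and would it fail if the fix were reverted) can be illustrated with a toy case. Everything below is hypothetical: the pagination bug, both `paginate_*` functions, and `regression_test` are invented for illustration and do not come from the repository.

```python
def paginate_buggy(items, page_size):
    """Pre-fix behavior (hypothetical): silently drops the final partial page."""
    full_pages = len(items) // page_size
    return [items[i * page_size:(i + 1) * page_size] for i in range(full_pages)]


def paginate_fixed(items, page_size):
    """Post-fix behavior (hypothetical): keeps the final partial page."""
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


def regression_test(paginate) -> bool:
    """Reproduce the original bug scenario: 5 items with a page size of 2."""
    pages = paginate([1, 2, 3, 4, 5], 2)
    return pages == [[1, 2], [3, 4], [5]]
```

A useful regression test passes against `paginate_fixed` and fails against `paginate_buggy`. A test that passes against both does not exercise the bug and gives no protection if the fix is reverted.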
@@ -17,4 +17,4 @@ You are evaluating an implementation story. The generator claims to have built a
 - Code exists but doesn't actually run
 - Removed an import or variable during refactoring but it's still used elsewhere in the file
 - New instance of a shared resource (e.g., DB connection, rate limiter) instead of using the existing one
-- Error details leaked to HTTP responses (use logging server-side, return generic message to client)
+- Internal error details (stack traces, exception messages) exposed in user-facing output instead of being logged server-side
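The generic error-leak pattern in the last changed line can be sketched as follows. This is a minimal illustration using only the Python standard library. The wrapper name `run_safely`, the logger name `app`, and the short reference id are assumptions, not part of the prompt files.

```python
import logging
import uuid

logger = logging.getLogger("app")


def run_safely(action):
    """Run `action`; on failure, log details and return a generic message."""
    try:
        return {"ok": True, "result": action()}
    except Exception:
        # A short reference lets support match a user report to the log entry.
        error_id = uuid.uuid4().hex[:8]
        # logger.exception records the message and the full traceback server-side.
        logger.exception("Unhandled error (ref %s)", error_id)
        # The user-facing output carries no exception text or stack trace.
        return {"ok": False, "message": f"Something went wrong (ref {error_id})."}
```

The same shape applies to CLI output, UI messages, or API responses: the exception text and traceback go to the log, and the user sees only a generic message.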