fix: make evaluator calibration examples project-agnostic
Replace ChaosRush-specific references with generic examples that apply to any codebase.
@@ -30,27 +30,27 @@ Evaluate story **`{{CURRENT_STORY_ID}}`**.
 ## Calibration Examples
 
 <example type="bad-evaluation">
-"The generator added rate limiting decorators to all four endpoints. The code looks clean and follows the existing pattern. Tests were not run but the implementation appears correct. PASS."
+"The generator created the new module and updated the config. The code looks clean and follows the existing pattern. Tests were not run but the implementation appears correct. PASS."
 
-Why this is wrong: "appears correct" is not verification. The evaluator didn't run tests, didn't check if the limiter instance is actually wired to the app, and didn't read the modified files in full. This is a rubber stamp.
+Why this is wrong: "appears correct" is not verification. The evaluator didn't run tests, didn't check that the new module is actually imported and used, and didn't read the modified files in full. This is a rubber stamp.
 </example>
 
 <example type="good-rejection">
-"Checked acceptance criteria for US-001. Criterion 3 says 'both files import get_s3_client from app.core.cdn'. Verified admin_audio.py:8 — correct. Checked admin_parallax_themes.py — file still defines its own get_s3_client() at line 36 and does not import from cdn. Also: admin_parallax_themes.py:96 calls os.path.splitext() but `import os` was removed during the credential cleanup — this will crash at runtime.
+"Checked acceptance criteria. Criterion 3 says 'both files import the shared utility instead of defining their own'. Verified file A — correct. Checked file B — still defines a local copy at line 36 and does not import the shared one. Also: file B line 96 calls a function from a module whose import was removed during the refactoring — this will crash at runtime.
 
-REJECT: admin_parallax_themes.py still has local get_s3_client (criterion 3 not met) and missing `import os` will cause NameError on sprite upload."
+REJECT: File B still has local duplicate (criterion 3 not met) and missing import will cause runtime error."
 
 Why this is good: Verified each criterion against actual code with file paths and line numbers. Caught a regression the generator introduced. Specific and actionable.
 </example>
 
 <example type="good-pass">
-"Checked all 4 acceptance criteria for US-004:
-1. db.query(DailySpin) block is uncommented — verified at shop.py:323-332. ✓
-2. Returns success=False with 'Already spun today' message — verified at shop.py:330. ✓
-3. TODO comment removed — grep for 'Re-enable for production' returns zero matches. ✓
-4. First spin still works — logic only blocks when existing_spin is found. ✓
+"Checked all 4 acceptance criteria:
+1. New validation logic is active — verified at config.py:23-28. ✓
+2. Invalid input returns the expected error — verified at config.py:26. ✓
+3. Old workaround removed — grep returns zero matches. ✓
+4. Existing behavior unchanged — logic only triggers on the new condition. ✓
 
-Ran git diff: only shop.py modified, changes scoped to the daily spin endpoint. No imports removed, no regressions in surrounding code.
+Ran git diff: only 2 files modified, changes scoped to this story. No imports removed, no regressions in surrounding code.
 
 PASS."
 
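
The calibration examples describe checks such as "grep returns zero matches" and "still defines a local copy and does not import the shared one". As a rough illustration only, those checks could be mechanized along the following lines. This is a minimal sketch, not part of the commit: the helper names (`grep_matches`, `imports_name`, `defines_function`) and the module and function names in the usage note (`shared.utils`, `get_client`) are hypothetical, and it assumes the code under evaluation is Python.

```python
import ast
import re
from pathlib import Path


def grep_matches(root, pattern, glob="*.py"):
    """Return (path, line_number, line) for every line under root matching pattern."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob(glob)):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


def imports_name(source, module, name):
    """True if source contains `from <module> import <name>` (with or without an alias)."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module == module:
            if any(alias.name == name for alias in node.names):
                return True
    return False


def defines_function(source, name):
    """True if source defines a function called `name` at any nesting level."""
    return any(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name
        for node in ast.walk(ast.parse(source))
    )
```

Under these assumptions, a criterion like "both files import the shared utility instead of defining their own" would hold for a file only when `imports_name(source, "shared.utils", "get_client")` is true and `defines_function(source, "get_client")` is false, and "old workaround removed" would hold when `grep_matches` returns an empty list. Reporting the returned paths and line numbers keeps the verdict specific, as the good-rejection example asks. These helpers do not replace running the tests or reading the modified files in full.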