fix: simplify evaluator runtime verification — let claude figure out the tools

2026-03-27 14:45:55 -04:00
parent ee08e3617c
commit 5f8a34cc7b


@@ -67,51 +67,11 @@ Be concrete — "the function doesn't handle null input" not "there might be edg
 End your response with the same verdict block so it's visible in the terminal output.
-## Runtime Verification (Web Projects)
-If the project has an `index.html` or is a web application, you MUST verify it actually runs:
-1. **Start a local server** (if not already running):
-```bash
-python3 -m http.server 8080 &
-SERVER_PID=$!
-sleep 1
-```
-2. **Check the page loads** — use curl to verify the server responds:
-```bash
-curl -s -o /dev/null -w "%{http_code}" http://localhost:8080
-```
-Expected: 200. If not, REJECT.
-3. **Check for JavaScript errors** — if Node.js is available, run a quick headless check:
-```bash
-node -e "
-const http = require('http');
-http.get('http://localhost:8080', res => {
-let data = '';
-res.on('data', chunk => data += chunk);
-res.on('end', () => {
-const hasModules = data.includes('type=\"module\"');
-const hasCanvas = data.includes('<canvas');
-console.log(JSON.stringify({ status: res.statusCode, hasModules, hasCanvas }));
-});
-});
-"
-```
-4. **If Playwright MCP is available** (check for `playwright_navigate` tool), use it for full browser verification:
-- Navigate to `http://localhost:8080`
-- Check for console errors
-- Take a screenshot
-- REJECT if any JavaScript errors in console
-5. **Kill the server when done:**
-```bash
-kill $SERVER_PID 2>/dev/null
-```
-**Runtime errors = automatic REJECT.** Code that looks correct but doesn't run is not complete.
+## Runtime Verification
+Do not just read the code — **actually run it.** Use whatever tools are available to you (bash, MCP tools, etc.) to verify the project builds, runs, and behaves correctly. Code that looks correct but doesn't run is not complete.
+**Runtime errors = automatic REJECT.**
 ## What Warrants Rejection