fix: simplify evaluator runtime verification — let claude figure out the tools
@@ -67,51 +67,11 @@ Be concrete — "the function doesn't handle null input" not "there might be edg
 
 End your response with the same verdict block so it's visible in the terminal output.
 
-## Runtime Verification (Web Projects)
+## Runtime Verification
 
-If the project has an `index.html` or is a web application, you MUST verify it actually runs:
+Do not just read the code — **actually run it.** Use whatever tools are available to you (bash, MCP tools, etc.) to verify the project builds, runs, and behaves correctly. Code that looks correct but doesn't run is not complete.
 
-1. **Start a local server** (if not already running):
-```bash
-python3 -m http.server 8080 &
-SERVER_PID=$!
-sleep 1
-```
-
-2. **Check the page loads** — use curl to verify the server responds:
-```bash
-curl -s -o /dev/null -w "%{http_code}" http://localhost:8080
-```
-Expected: 200. If not, REJECT.
-
-3. **Check for JavaScript errors** — if Node.js is available, run a quick headless check:
-```bash
-node -e "
-const http = require('http');
-http.get('http://localhost:8080', res => {
-let data = '';
-res.on('data', chunk => data += chunk);
-res.on('end', () => {
-const hasModules = data.includes('type=\"module\"');
-const hasCanvas = data.includes('<canvas');
-console.log(JSON.stringify({ status: res.statusCode, hasModules, hasCanvas }));
-});
-});
-"
-```
-
-4. **If Playwright MCP is available** (check for `playwright_navigate` tool), use it for full browser verification:
-- Navigate to `http://localhost:8080`
-- Check for console errors
-- Take a screenshot
-- REJECT if any JavaScript errors in console
-
-5. **Kill the server when done:**
-```bash
-kill $SERVER_PID 2>/dev/null
-```
-
-**Runtime errors = automatic REJECT.** Code that looks correct but doesn't run is not complete.
+**Runtime errors = automatic REJECT.**
 
 ## What Warrants Rejection
 
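For illustration only: the new wording leaves the choice of tools to the evaluator. The sketch below shows one check it might improvise for a static web project, condensed from the start/curl/kill sequence in the old wording. The function names `verdict_for_status` and `verify_static_site`, the `RUNS` label, and the polling loop are assumptions made for this example and are not part of the commit.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and the "RUNS" label are illustrative, not from the commit.

# Map an HTTP status code to a verdict word. Only 200 counts as running.
verdict_for_status() {
  if [ "$1" = "200" ]; then
    echo "RUNS"
  else
    echo "REJECT"
  fi
}

# Serve a directory, probe it with curl, and always stop the server afterwards.
verify_static_site() {
  local dir="$1" port="${2:-8080}" code="000" pid attempt
  python3 -m http.server "$port" --directory "$dir" >/dev/null 2>&1 &
  pid=$!
  # Poll for up to ~5 s instead of a single fixed sleep, so a slow start is not misread.
  for attempt in 1 2 3 4 5 6 7 8 9 10; do
    code=$(curl -s -o /dev/null -w '%{http_code}' "http://localhost:${port}/") && break
    sleep 0.5
  done
  kill "$pid" 2>/dev/null
  wait "$pid" 2>/dev/null
  verdict_for_status "$code"
}
```

Calling `verify_static_site . 8080` from the project root prints one verdict word. This covers only HTTP reachability. Browser console errors, which the old wording delegated to Playwright MCP, would need a separate check.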