feat: evaluator runtime verification for web projects, optional Playwright docs

2026-03-27 14:30:09 -04:00
parent 18d95fed0d
commit ee08e3617c
2 changed files with 63 additions and 0 deletions

@@ -67,11 +67,58 @@ Be concrete — "the function doesn't handle null input" not "there might be edg
End your response with the same verdict block so it's visible in the terminal output.
## Runtime Verification (Web Projects)
If the project has an `index.html` or is a web application, you MUST verify it actually runs:
1. **Start a local server** (if not already running):
```bash
python3 -m http.server 8080 &
SERVER_PID=$!
sleep 1
```
2. **Check the page loads** — use curl to verify the server responds:
```bash
curl -s -o /dev/null -w "%{http_code}" http://localhost:8080
```
Expected: 200. If not, REJECT.
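3. **Inspect the served HTML** (if Node.js is available). The snippet below fetches the page and reports its status code and whether the markup contains module scripts or a `<canvas>`. It does not execute the page's scripts, so it cannot surface JavaScript errors; step 4 covers those:
```bash
node -e "
const http = require('http');
// Fetch the page and report its status plus two markers found in the HTML.
http.get('http://localhost:8080', res => {
  let data = '';
  res.on('data', chunk => data += chunk);
  res.on('end', () => {
    const hasModules = data.includes('type=\"module\"');
    const hasCanvas = data.includes('<canvas');
    console.log(JSON.stringify({ status: res.statusCode, hasModules, hasCanvas }));
  });
}).on('error', err => {
  // Report connection failures (e.g. server not running) and exit non-zero.
  console.error('request failed: ' + err.message);
  process.exit(1);
});
"
```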
4. **If Playwright MCP is available** (check for `playwright_navigate` tool), use it for full browser verification:
- Navigate to `http://localhost:8080`
- Check for console errors
- Take a screenshot
- REJECT if the console shows any JavaScript errors
5. **Kill the server when done** (only if you started it in step 1):
```bash
kill "$SERVER_PID" 2>/dev/null
```
**Runtime errors = automatic REJECT.** Code that looks correct but doesn't run is not complete.
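Steps 1, 2, and 5 can also be wrapped in one script so the server is always stopped, even when the check fails. The following is a minimal sketch, not part of the original instructions: `PORT`, `wait_for_server`, `status_verdict`, and `run_runtime_check` are illustrative names, `PASS` is a placeholder label for a non-rejecting result, and the polling loop is one possible alternative to the fixed `sleep 1`.

```bash
#!/usr/bin/env bash
# Sketch only. PORT, wait_for_server, status_verdict and run_runtime_check
# are illustrative names, and "PASS" is a placeholder verdict label.
PORT="${PORT:-8080}"

# Poll the URL until curl receives any HTTP response, or give up.
wait_for_server() {
  local url="$1" attempts="${2:-20}" i
  for ((i = 0; i < attempts; i++)); do
    if curl -s -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    sleep 0.25
  done
  return 1
}

# Step 2 rule: anything other than 200 is a REJECT.
status_verdict() {
  if [ "$1" = "200" ]; then
    echo "PASS"
  else
    echo "REJECT"
  fi
}

# Start the server, check the status code, and always stop the server.
run_runtime_check() {
  local url="http://localhost:${PORT}" code="000" server_pid
  python3 -m http.server "$PORT" >/dev/null 2>&1 &
  server_pid=$!
  if wait_for_server "$url"; then
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url")"
  fi
  kill "$server_pid" 2>/dev/null
  status_verdict "$code"
}
```

Calling `run_runtime_check` from the project root prints the verdict for the status check. If the server never answers, the code stays at `000` and the result is a REJECT.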
## What Warrants Rejection
- ANY acceptance criterion not actually met (not "mostly met" — MET)
- Tests fail
- Typecheck fails
- Runtime errors (page doesn't load, console errors, server crashes)
- Placeholder/stub code left in place
- Security vulnerability introduced
- Regression in existing functionality
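
The list above amounts to "any single failure rejects". One way to express that for the automatable checks is sketched below. This is a sketch under assumptions: `verdict_from_checks` is a hypothetical helper, `PASS` is a placeholder label, and the actual test and typecheck commands depend on the project. Criteria that need judgment (acceptance criteria, placeholder code, security review) still have to be assessed by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run each check command in order and reject on the
# first one that exits non-zero. "PASS" is a placeholder verdict label.
verdict_from_checks() {
  local check
  for check in "$@"; do
    if ! bash -c "$check" >/dev/null 2>&1; then
      echo "REJECT: $check"
      return 1
    fi
  done
  echo "PASS"
}
```

For example, a Node.js project might call `verdict_from_checks "npm test" "npx tsc --noEmit"`. Those two commands are assumptions about the project's tooling, not something the instructions above prescribe.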