feat: evaluator runtime verification for web projects, optional Playwright docs

2026-03-27 14:30:09 -04:00
parent 18d95fed0d
commit ee08e3617c
2 changed files with 63 additions and 0 deletions

@@ -151,6 +151,22 @@ Before the loop starts, `/loop-plan` generates contracts for each story. These d
archive/ # Completed feature archives
```
## Browser Testing (Optional)
The evaluator includes basic runtime verification for web projects (starts a local server, checks HTTP response). For full browser testing with console error detection and screenshots, install the Playwright MCP server:
```bash
claude mcp add playwright -- npx @playwright/mcp@latest --headless --browser=chromium
```
When Playwright is available, the evaluator will use it to:
- Navigate to the running application
- Check for JavaScript console errors
- Take screenshots for visual verification
- Reject stories with runtime errors
This is optional — the evaluator works without it, but may miss runtime issues that only surface in a browser.
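One way to picture the choice between the two modes described above is a small helper that reads a list of configured MCP servers on stdin and reports which mode applies. This is a sketch under assumptions: `verification_mode` and the labels `browser` and `basic` are hypothetical names, and the exact output format of `claude mcp list` is not relied on (the helper only looks for the server name).

```shell
# Sketch: pick a verification mode from text on stdin.
# Assumption: the input is a list of configured MCP servers, one per line.
# "browser" and "basic" are illustrative labels, not evaluator output.
verification_mode() {
  if grep -qi 'playwright'; then
    echo "browser"   # Playwright appears to be configured
  else
    echo "basic"     # fall back to the server + HTTP check
  fi
}
```

It could be fed with something like `claude mcp list | verification_mode`; the evaluator itself decides by checking which tools it has been given, so this only approximates that decision from the command line.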
## Design Principles
- **Fresh context per iteration** — no accumulated hallucination drift

@@ -67,11 +67,58 @@ Be concrete — "the function doesn't handle null input" not "there might be edg
End your response with the same verdict block so it's visible in the terminal output.
## Runtime Verification (Web Projects)
If the project has an `index.html` or is a web application, you MUST verify it actually runs:
1. **Start a local server** (if not already running):
```bash
python3 -m http.server 8080 &
SERVER_PID=$!
sleep 1
```
2. **Check the page loads** — use curl to verify the server responds:
```bash
curl -s -o /dev/null -w "%{http_code}" http://localhost:8080
```
Expected: 200. If not, REJECT.
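3. **Inspect the served HTML** (if Node.js is available): fetch the page and report the status code and whether the markup uses ES modules or a `<canvas>`. This does not execute the page's JavaScript, so it cannot detect runtime errors by itself: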
```bash
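node -e "
const http = require('http');
// Fetch the page and report the status plus two markup hints.
// Note: this reads the HTML only; no page script is executed.
http.get('http://localhost:8080', res => {
  let data = '';
  res.on('data', chunk => data += chunk);
  res.on('end', () => {
    const hasModules = data.includes('type=\"module\"');
    const hasCanvas = data.includes('<canvas');
    console.log(JSON.stringify({ status: res.statusCode, hasModules, hasCanvas }));
  });
}).on('error', err => {
  // Server unreachable: report it and exit non-zero instead of crashing.
  console.error('request failed: ' + err.message);
  process.exit(1);
});
"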
```
4. **If Playwright MCP is available** (check for a `browser_navigate` tool, the name exposed by `@playwright/mcp`), use it for full browser verification:
- Navigate to `http://localhost:8080`
- Check for console errors
- Take a screenshot
- REJECT if the console shows any JavaScript errors
5. **Kill the server when done:**
```bash
kill $SERVER_PID 2>/dev/null
```
**Runtime errors = automatic REJECT.** Code that looks correct but doesn't run is not complete.
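Steps 1, 2 and 5 above can also be sketched as one script that polls for readiness instead of relying on a fixed `sleep`, and that stops the server on every exit path. This is a minimal sketch, not the evaluator's actual implementation: the function names and the `PORT` variable are hypothetical, and `OK` is a placeholder verdict word (the source only specifies `REJECT`).

```shell
#!/usr/bin/env bash
# Sketch of steps 1, 2 and 5. All names here are illustrative.
PORT="${PORT:-8080}"
SERVER_PID=""

# Poll the URL until the server answers or the attempts run out.
wait_for_server() {
  local url="$1" attempts="${2:-10}" i
  for ((i = 0; i < attempts; i++)); do
    curl -s -o /dev/null "$url" && return 0
    sleep 1
  done
  return 1
}

# Translate an HTTP status code into a verdict word ("OK" is a placeholder).
verdict_for_status() {
  if [ "$1" = "200" ]; then
    echo "OK"
  else
    echo "REJECT"
  fi
}

# Stop the server if this script started one.
stop_server() {
  if [ -n "$SERVER_PID" ]; then
    kill "$SERVER_PID" 2>/dev/null
    SERVER_PID=""
  fi
}

# Serve the current directory, check the status code, then clean up.
run_runtime_check() {
  local url="http://localhost:${PORT}" status
  python3 -m http.server "$PORT" >/dev/null 2>&1 &
  SERVER_PID=$!
  trap stop_server EXIT   # safety net if the shell exits early

  if ! wait_for_server "$url"; then
    echo "REJECT: server did not answer at $url"
    stop_server
    return 1
  fi

  status="$(curl -s -o /dev/null -w '%{http_code}' "$url")"
  stop_server
  echo "$(verdict_for_status "$status"): HTTP $status"
  [ "$status" = "200" ]
}
```

Calling `run_runtime_check` from the project root prints a one-line verdict and returns non-zero on anything other than a 200. Like the curl check in step 2, it only confirms that the page is served; JavaScript errors still need the browser-based check in step 4.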
## What Warrants Rejection
- ANY acceptance criterion not actually met (not "mostly met" — MET)
- Tests fail
- Typecheck fails
- Runtime errors (page doesn't load, console errors, server crashes)
- Placeholder/stub code left in place
- Security vulnerability introduced
- Regression in existing functionality