diff --git a/README.md b/README.md
index 7b82fdc..2507ed2 100644
--- a/README.md
+++ b/README.md
@@ -151,6 +151,22 @@ Before the loop starts, `/loop-plan` generates contracts for each story. These d
 archive/ # Completed feature archives
 ```
 
+## Browser Testing (Optional)
+
+The evaluator includes basic runtime verification for web projects (starts a local server, checks HTTP response). For full browser testing with console error detection and screenshots, install the Playwright MCP server:
+
+```bash
+claude mcp add playwright npx @playwright/mcp@latest --headless --browser=chromium
+```
+
+When Playwright is available, the evaluator will use it to:
+- Navigate to the running application
+- Check for JavaScript console errors
+- Take screenshots for visual verification
+- Reject stories with runtime errors
+
+This is optional — the evaluator works without it, but may miss runtime issues that only surface in a browser.
+
 ## Design Principles
 
 - **Fresh context per iteration** — no accumulated hallucination drift
diff --git a/prompts/evaluator/_base.md b/prompts/evaluator/_base.md
index 465590d..318ff31 100644
--- a/prompts/evaluator/_base.md
+++ b/prompts/evaluator/_base.md
@@ -67,11 +67,58 @@ Be concrete — "the function doesn't handle null input" not "there might be edg
 End your response with the same verdict block so it's visible in the terminal output.
 
+## Runtime Verification (Web Projects)
+
+If the project has an `index.html` or is a web application, you MUST verify it actually runs:
+
+1. **Start a local server** (if not already running):
+   ```bash
+   python3 -m http.server 8080 &
+   SERVER_PID=$!
+   sleep 1
+   ```
+
+2. **Check the page loads** — use curl to verify the server responds:
+   ```bash
+   curl -s -o /dev/null -w "%{http_code}" http://localhost:8080
+   ```
+   Expected: 200. If not, REJECT.
+
+3. **Check for JavaScript errors** — if Node.js is available, run a quick headless check:
+   ```bash
+   node -e "
+   const http = require('http');
+   http.get('http://localhost:8080', res => {
+     let data = '';
+     res.on('data', chunk => data += chunk);
+     res.on('end', () => {
+       const hasModules = data.includes('type=\"module\"');
+       const hasCanvas = data.includes('/dev/null
+   ```
+
+**Runtime errors = automatic REJECT.** Code that looks correct but doesn't run is not complete.
+
 ## What Warrants Rejection
 
 - ANY acceptance criterion not actually met (not "mostly met" — MET)
 - Tests fail
 - Typecheck fails
+- Runtime errors (page doesn't load, console errors, server crashes)
 - Placeholder/stub code left in place
 - Security vulnerability introduced
 - Regression in existing functionality
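
Steps 1 and 2 of the Runtime Verification section in the patch (start a local server, confirm it answers with HTTP 200) can also be sketched in Python. This is only an illustration and is not part of the patch: the helper name `check_page_loads`, the in-process server, and the choice of an OS-assigned port instead of the hard-coded 8080 are all assumptions made for the example.

```python
import functools
import http.server
import threading
import urllib.error
import urllib.request


def check_page_loads(directory: str, path: str = "/") -> int:
    """Serve `directory` on a free local port and return the HTTP status for `path`.

    Hypothetical helper: it mirrors `python3 -m http.server` plus the curl
    status check, but binds to port 0 so the OS picks a free port.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    port = server.server_address[1]

    # Run the server in a background thread so this function can query it.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{port}{path}"
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; report the status code instead.
        return err.code
    finally:
        # Always stop the server, which replaces the manual `kill $SERVER_PID`.
        server.shutdown()
        server.server_close()
```

Under these assumptions, a caller would treat any result other than 200 from `check_page_loads(".")` as grounds for rejection, matching the "Expected: 200. If not, REJECT." rule. Binding to port 0 avoids a false failure when something else already occupies 8080, and the `finally` block ensures no server is left running after the check.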