feat: evaluator runtime verification for web projects, optional Playwright docs

2026-03-27 14:30:09 -04:00
parent 18d95fed0d
commit ee08e3617c
2 changed files with 63 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -151,6 +151,22 @@ Before the loop starts, `/loop-plan` generates contracts for each story. These d
  archive/                       # Completed feature archives
 ```

+## Browser Testing (Optional)
+
+The evaluator includes basic runtime verification for web projects (starts a local server, checks HTTP response). For full browser testing with console error detection and screenshots, install the Playwright MCP server:
+
+```bash
+claude mcp add playwright npx @playwright/mcp@latest --headless --browser=chromium
+```
+
+When Playwright is available, the evaluator will use it to:
+- Navigate to the running application
+- Check for JavaScript console errors
+- Take screenshots for visual verification
+- Reject stories with runtime errors
+
+This is optional — the evaluator works without it, but may miss runtime issues that only surface in a browser.
+
 ## Design Principles

 - **Fresh context per iteration** — no accumulated hallucination drift
--- a/prompts/evaluator/_base.md
+++ b/prompts/evaluator/_base.md
@@ -67,11 +67,58 @@ Be concrete — "the function doesn't handle null input" not "there might be edg

 End your response with the same verdict block so it's visible in the terminal output.

+## Runtime Verification (Web Projects)
+
+If the project has an `index.html` or is a web application, you MUST verify it actually runs:
+
+1. **Start a local server** (if not already running):
+   ```bash
+   python3 -m http.server 8080 &
+   SERVER_PID=$!
+   sleep 1
+   ```
+
+2. **Check the page loads** — use curl to verify the server responds:
+   ```bash
+   curl -s -o /dev/null -w "%{http_code}" http://localhost:8080
+   ```
+   Expected: 200. If not, REJECT.
+
+3. **Check for JavaScript errors** — if Node.js is available, run a quick headless check:
+   ```bash
+   node -e "
+   const http = require('http');
+   http.get('http://localhost:8080', res => {
+     let data = '';
+     res.on('data', chunk => data += chunk);
+     res.on('end', () => {
+       const hasModules = data.includes('type=\"module\"');
+       const hasCanvas = data.includes('<canvas');
+       console.log(JSON.stringify({ status: res.statusCode, hasModules, hasCanvas }));
+     });
+   });
+   "
+   ```
+
+4. **If Playwright MCP is available** (check for `playwright_navigate` tool), use it for full browser verification:
+   - Navigate to `http://localhost:8080`
+   - Check for console errors
+   - Take a screenshot
+   - REJECT if any JavaScript errors in console
+
+5. **Kill the server when done:**
+   ```bash
+   kill $SERVER_PID 2>/dev/null
+   ```
+
+**Runtime errors = automatic REJECT.** Code that looks correct but doesn't run is not complete.
+
 ## What Warrants Rejection

 - ANY acceptance criterion not actually met (not "mostly met" — MET)
 - Tests fail
 - Typecheck fails
+- Runtime errors (page doesn't load, console errors, server crashes)
 - Placeholder/stub code left in place
 - Security vulnerability introduced
 - Regression in existing functionality