/eval-loop
Eval Loop
Trace quality gaps to their root causes, then fix them iteratively
Overview
Takes a specific quality complaint (thin data, robotic tone, broken UX) and traces it from symptom to structural root cause, then iterates with automated backpressure until measurable targets pass. Works across UX, data, code, and content.
What It Does
- Surfaces and groups symptoms into problem classes, then identifies root causes per class
- Defines measurable targets with automated backpressure (unit tests, Playwright, LLM-as-judge)
- Iterates one fix at a time with verification, logging pass/fail to JSONL audit trail
- Applies product-standard checklists across all pages to catch the same class of problem everywhere
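The iterate-and-verify step above can be sketched as a small loop: fix one failing target at a time, re-check it, and append a pass/fail record to a JSONL audit trail. This is a minimal sketch under assumed names (`eval_loop`, the record fields), not the skill's actual implementation.

```python
import json
import time

def eval_loop(targets, apply_fix, run_check, log_path="eval-results.jsonl", max_iters=10):
    """Iterate one fix at a time, verifying each change and appending
    a pass/fail record to a JSONL audit trail (append-only log)."""
    with open(log_path, "a") as log:
        for _ in range(max_iters):
            # Re-run every check; stop when nothing fails.
            failing = [t for t in targets if not run_check(t)]
            if not failing:
                return True
            target = failing[0]          # one fix at a time
            apply_fix(target)
            passed = run_check(target)   # verify before moving on
            log.write(json.dumps({
                "target": target,
                "passed": passed,
                "ts": time.time(),
            }) + "\n")
    return False
```

Keeping the log append-only means a later pass can audit exactly which fix attempts passed or regressed, in order.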
Inputs
- Quality complaints or symptoms
- Current codebase or content
- A definition of "10/10" (explicit, measurable pass criteria)
Outputs
- eval-session.md (living diagnosis)
- eval-results.jsonl (verification log)
- Passing targets
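The verification log is plain JSON Lines: one JSON object per line, so it can be appended to, grepped, or tailed mid-session. The field names below are illustrative assumptions, not a documented schema.

```python
import json

# Hypothetical records for eval-results.jsonl; field names are
# assumptions, not the skill's real schema.
records = [
    {"target": "cards-show-provenance", "check": "playwright", "passed": False, "iteration": 1},
    {"target": "cards-show-provenance", "check": "playwright", "passed": True, "iteration": 2},
]

# JSONL: serialize one object per line.
jsonl = "\n".join(json.dumps(r) for r in records)

# Reading it back is line-by-line parsing, no framing needed.
parsed = [json.loads(line) for line in jsonl.splitlines()]
assert parsed[-1]["passed"] is True
```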
Example
A user says "the signal cards are paper thin." Eval loop traces that to missing provenance URLs in the agent prompt and missing UI components, sets Playwright assertions as targets, and iterates until every card has a clickable source link.
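A Playwright target for that example might look like the sketch below: assert that every signal card contains a clickable source link. The selectors (`.signal-card`, `a.source-link`) and the URL are assumptions about the page under test, and the import is deferred so the sketch loads even without Playwright installed.

```python
def check_cards_have_sources(url="http://localhost:3000/signals"):
    """Fail unless every signal card has a clickable provenance link.
    Selectors and URL are hypothetical; adapt them to the real page."""
    # Deferred import: only needed when the check actually runs.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        cards = page.locator(".signal-card")
        for i in range(cards.count()):
            link = cards.nth(i).locator("a.source-link")
            assert link.count() > 0, f"card {i} has no source link"
            assert link.first.get_attribute("href"), f"card {i} link has no href"
        browser.close()
```

Run as part of the loop's `run_check` step, a check like this is the "automated backpressure": the iteration cannot close until every card passes.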
Related Skills & Workflows
Autoresearch
Self-improving skill optimization through scored experiments
Deep Planning (Skill)
Complex task into step-by-step execution plan
Review Content (Skill)
Draft into 4-agent quality review before publishing
Prompt Engineering Sprint (Workflow)
Design, test, and optimize production prompts with eval loops
Ready to use /eval-loop?
This skill ships with every Knowledge OS installation. Set up your system in 90 minutes.
Built and maintained by Victor Sowers at STEEPWORKS