/eval-loop
Eval Loop
Trace quality gaps to their root causes, then fix them iteratively
Overview
Takes a specific quality complaint (thin data, robotic tone, broken UX) and traces it from symptom to structural root cause, then iterates with automated backpressure until measurable targets pass. Works across UX, data, code, and content.
What It Does
- Surfaces and groups symptoms into problem classes, then identifies root causes per class
- Defines measurable targets with automated backpressure (unit tests, Playwright, LLM-as-judge)
- Iterates one fix at a time with verification, logging pass/fail to JSONL audit trail
- Applies product-standard checklists across all pages to catch the same class of problem everywhere
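The iterate-and-verify step above can be sketched as a small loop: fix one failing target at a time, re-check it, and append a pass/fail record to a JSONL audit trail. This is a minimal sketch under assumed names (`eval_loop`, the record fields), not the skill's actual implementation.

```python
import json
import time

def eval_loop(targets, apply_fix, run_check, log_path="eval-results.jsonl", max_iters=10):
    """Iterate one fix at a time, verifying each change and appending
    a pass/fail record to a JSONL audit trail (append-only log)."""
    with open(log_path, "a") as log:
        for _ in range(max_iters):
            # Re-run every check; stop when nothing fails.
            failing = [t for t in targets if not run_check(t)]
            if not failing:
                return True
            target = failing[0]          # one fix at a time
            apply_fix(target)
            passed = run_check(target)   # verify before moving on
            log.write(json.dumps({
                "target": target,
                "passed": passed,
                "ts": time.time(),
            }) + "\n")
    return False
```

Keeping the log append-only means a later pass can audit exactly which fix attempts passed or regressed, in order.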
Inputs
- Quality complaints or symptoms
- Current codebase or content
- A definition of "10/10" (explicit, measurable pass criteria)
Outputs
- eval-session.md (living diagnosis)
- eval-results.jsonl (verification log)
- Passing targets
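The verification log is plain JSON Lines: one JSON object per line, so it can be appended to, grepped, or tailed mid-session. The field names below are illustrative assumptions, not a documented schema.

```python
import json

# Hypothetical records for eval-results.jsonl; field names are
# assumptions, not the skill's real schema.
records = [
    {"target": "cards-show-provenance", "check": "playwright", "passed": False, "iteration": 1},
    {"target": "cards-show-provenance", "check": "playwright", "passed": True, "iteration": 2},
]

# JSONL: serialize one object per line.
jsonl = "\n".join(json.dumps(r) for r in records)

# Reading it back is line-by-line parsing, no framing needed.
parsed = [json.loads(line) for line in jsonl.splitlines()]
assert parsed[-1]["passed"] is True
```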
Example
A user says "the signal cards are paper thin." Eval loop traces that to missing provenance URLs in the agent prompt and missing UI components, sets Playwright assertions as targets, and iterates until every card has a clickable source link.
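A Playwright target for that example might look like the sketch below: assert that every signal card contains a clickable source link. The selectors (`.signal-card`, `a.source-link`) and the URL are assumptions about the page under test, and the import is deferred so the sketch loads even without Playwright installed.

```python
def check_cards_have_sources(url="http://localhost:3000/signals"):
    """Fail unless every signal card has a clickable provenance link.
    Selectors and URL are hypothetical; adapt them to the real page."""
    # Deferred import: only needed when the check actually runs.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        cards = page.locator(".signal-card")
        for i in range(cards.count()):
            link = cards.nth(i).locator("a.source-link")
            assert link.count() > 0, f"card {i} has no source link"
            assert link.first.get_attribute("href"), f"card {i} link has no href"
        browser.close()
```

Run as part of the loop's `run_check` step, a check like this is the "automated backpressure": the iteration cannot close until every card passes.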
Related Skills & Workflows
Autoresearch
Self-improving skill optimization through scored experiments
Deep Planning (Skill)
Complex task into step-by-step execution plan
Review Content (Skill)
Draft into 4-agent quality review before publishing
Prompt Engineering Sprint (Workflow)
Design, test, and optimize production prompts with eval loops
Ready to use /eval-loop?
This skill ships with every Knowledge OS installation. Set up your system in 90 minutes.
Built and maintained by Victor Sowers at STEEPWORKS