Evaluation Frameworks.
/ Did the agent do the job?
Tooling for grading model output — task-specific, golden-set, LLM-as-judge, regression suites.
/ coming soon
The evaluation frameworks review is in research. We’re collecting structured data on vendors, founders, and market dynamics before publishing. Check back, or send a tip to fast-track a player we should profile.