Evaluation and quality judgment
Runs the ARC Raiders RAG stack against the dataset, logs each variant to Braintrust, and splits retrieval metrics from generation judgment.
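The split between retrieval metrics and generation judgment can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `EvalCase` shape, the choice of recall@k and reciprocal rank as retrieval metrics, and the term-matching `judge_answer` placeholder are hypothetical and are not taken from the project. Logging to Braintrust is deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EvalCase:
    """One dataset row (hypothetical shape, not the project's schema)."""
    question: str
    relevant_ids: set[str]     # gold document ids for the question
    retrieved_ids: list[str]   # ranked ids returned by the retriever
    answer: str                # text produced by the generator


def recall_at_k(case: EvalCase, k: int) -> float:
    """Fraction of gold documents found in the top-k retrieved ids."""
    if not case.relevant_ids:
        return 0.0
    hits = case.relevant_ids & set(case.retrieved_ids[:k])
    return len(hits) / len(case.relevant_ids)


def reciprocal_rank(case: EvalCase) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(case.retrieved_ids, start=1):
        if doc_id in case.relevant_ids:
            return 1.0 / rank
    return 0.0


def judge_answer(case: EvalCase, required_terms: list[str]) -> float:
    """Placeholder judge: share of required terms present in the answer.

    A real setup would more likely use an LLM or human grader here.
    """
    if not required_terms:
        return 0.0
    text = case.answer.lower()
    found = sum(1 for term in required_terms if term.lower() in text)
    return found / len(required_terms)


def evaluate(case: EvalCase, required_terms: list[str], k: int = 3) -> dict:
    """Report retrieval and generation scores under separate keys."""
    return {
        "retrieval": {
            "recall_at_k": recall_at_k(case, k),
            "reciprocal_rank": reciprocal_rank(case),
        },
        "generation": {
            "judge_score": judge_answer(case, required_terms),
        },
    }
```

Keeping the two groups under separate keys makes it easier to tell whether a weak answer comes from missing context or from the generator itself.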
Production build notice
Live AI features are available only in local builds. This avoids billing and the extra authentication or account overhead in production.