Build datasets and gather feedback to improve your application
After you observe your application in production, the next step is to annotate and curate data to build evaluation datasets. This process transforms raw production logs into high-quality test cases that help you systematically improve your application.
Annotation creates the ground truth data needed for evaluation. By collecting feedback, adding labels, and curating examples from production, you build datasets that:
Represent real user interactions and edge cases
Include expected outputs and quality assessments
Enable systematic testing and comparison
Support automated and human evaluation
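As a rough illustration of the curation step described above, the sketch below turns reviewed production logs into dataset records. It is plain Python and does not use the Braintrust SDK; `LogRecord`, `DatasetRecord`, `curate`, and the `min_score` threshold are hypothetical names chosen for this example, and the rule "prefer a reviewer's correction, otherwise keep highly rated outputs" is one possible policy, not a prescribed one.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class LogRecord:
    """Hypothetical shape of a production log entry (not a Braintrust type)."""
    input: Any
    output: Any
    human_score: Optional[float] = None  # reviewer rating in [0, 1], if reviewed
    expected: Optional[Any] = None       # reviewer's correction, if provided
    tags: List[str] = field(default_factory=list)


@dataclass
class DatasetRecord:
    """Hypothetical test case: an input paired with its expected output."""
    input: Any
    expected: Any
    metadata: Dict[str, Any] = field(default_factory=dict)


def curate(logs: List[LogRecord], min_score: float = 0.8) -> List[DatasetRecord]:
    """Keep logs that carry ground truth: a correction or a high human score."""
    records = []
    for log in logs:
        if log.expected is not None:
            # A reviewer corrected the output: the correction is the ground truth.
            meta = {"source": "correction", "tags": log.tags}
            records.append(DatasetRecord(log.input, log.expected, meta))
        elif log.human_score is not None and log.human_score >= min_score:
            # A reviewer rated the output highly: accept it as the expected value.
            meta = {"source": "approved", "tags": log.tags}
            records.append(DatasetRecord(log.input, log.output, meta))
        # Unreviewed logs and low-rated logs without a correction are skipped.
    return records


logs = [
    LogRecord("q1", "a1", human_score=0.9),
    LogRecord("q2", "bad", human_score=0.2, expected="good"),
    LogRecord("q3", "a3"),
    LogRecord("q4", "meh", human_score=0.5),
]
dataset = curate(logs)
```

Under this policy, only the first two logs become test cases: the first because a reviewer approved it, the second because a reviewer supplied a correction.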
Braintrust builds annotation into logs and experiments, so you can capture feedback and build datasets without context switching.
Human review provides qualitative assessments that complement automated scoring. Configure review scores in your project to collect:
Continuous scores: Numeric ratings with slider controls (0-100%)
Categorical scores: Predefined options with assigned values
Expected values: Corrections showing what the output should be
Comments: Free-form feedback and context
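To make the four kinds of review data concrete, here is a minimal sketch of how they might be modeled. It is plain Python, not the Braintrust configuration format: `ContinuousScore`, `CategoricalScore`, and `Review` are hypothetical names, and storing the 0-100% slider value as a number between 0 and 1 is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ContinuousScore:
    """Hypothetical slider score: reviewer picks a percentage from 0 to 100."""
    name: str

    def value(self, percent: float) -> float:
        if not 0 <= percent <= 100:
            raise ValueError(f"{self.name}: percent must be between 0 and 100")
        return percent / 100  # assumption: stored on a 0-1 scale


@dataclass
class CategoricalScore:
    """Hypothetical categorical score: each predefined option maps to a value."""
    name: str
    options: Dict[str, float]

    def value(self, choice: str) -> float:
        if choice not in self.options:
            raise ValueError(f"{self.name}: unknown option {choice!r}")
        return self.options[choice]


@dataclass
class Review:
    """One reviewer's feedback on a single trace."""
    scores: Dict[str, float]
    expected: Optional[str] = None  # correction showing what the output should be
    comment: Optional[str] = None   # free-form feedback and context


accuracy = ContinuousScore("accuracy")
tone = CategoricalScore("tone", {"poor": 0.0, "ok": 0.5, "great": 1.0})

review = Review(
    scores={
        accuracy.name: accuracy.value(75),
        tone.name: tone.value("ok"),
    },
    expected="Refunds are processed within 5 business days.",
    comment="Answer was close but gave the wrong time frame.",
)
```

Keeping numeric scores separate from the expected value and comment mirrors the list above: scores can be aggregated across many traces, while corrections and comments carry per-trace detail.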
Review traces and provide structured scores to begin the annotation process. You can efficiently evaluate large batches with keyboard navigation, or use the kanban layout to visualize review progress across backlog, pending, and complete states.
Custom trace views transform complex traces into interfaces anyone on your team can use. Describe what you want in natural language and Loop generates an interactive React component you can customize or embed anywhere.
Build custom views to:
Create annotation interfaces for large-scale human review tasks
Replace JSON with intuitive UI components for non-technical reviewers
Display data in domain-specific formats (carousels, conversation threads, dashboards)
Aggregate information across multiple spans in a trace
Custom views integrate with human review workflows, enable interactive annotation controls, and can be shared across your team or embedded in your own applications.