Ship LLM products that work.
Braintrust is the end-to-end platform for building world-class AI apps.
Evaluate your prompts and models
Non-deterministic models and unpredictable natural language inputs make building robust LLM applications difficult. Adapt your development lifecycle for the AI era with Braintrust's iterative LLM workflows.
Easily answer questions like “which examples regressed when we changed the prompt?” or “what happens if I try this new model?”
Anatomy of an eval
Braintrust evals are composed of three components—a prompt, scorers, and a dataset of examples.
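To make the anatomy concrete, here is a framework-agnostic sketch of how the three components fit together, written in plain Python rather than the Braintrust SDK (the stubbed `task` function and the tiny dataset are hypothetical, standing in for a real LLM call and real examples):

```python
import difflib

# Dataset: examples pairing an input with an expected value.
dataset = [
    {"input": "capital of France", "expected": "Paris"},
    {"input": "capital of Japan", "expected": "Tokyo"},
]

# Prompt/task: in a real eval this would call an LLM; stubbed here.
def task(input_text: str) -> str:
    canned = {"capital of France": "Paris", "capital of Japan": "Kyoto"}
    return canned.get(input_text, "")

# Scorer: compares the model output against the expected value,
# producing a score between 0 and 1.
def similarity_scorer(input_text: str, output: str, expected: str) -> float:
    return difflib.SequenceMatcher(None, output, expected).ratio()

# The eval loop: run the task on each example and score the result.
scores = [
    similarity_scorer(ex["input"], task(ex["input"]), ex["expected"])
    for ex in dataset
]
average = sum(scores) / len(scores)
```

Because each example is scored individually, comparing two runs of this loop (say, before and after a prompt change) shows exactly which examples regressed rather than just a shift in the aggregate number.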
Prompt
Tweak LLM prompts from any AI provider, run them, and track their performance over time. Seamlessly and securely sync your prompts with your code.
Prompts guide

Scorers
Use industry standard autoevals or write your own using code or natural language. Scorers take an input, the LLM output, and an expected value to generate a score.
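As an illustration of that signature, a minimal code scorer might look like the following plain-Python sketch (`exact_match` is a hypothetical name for this example, not a reference to a specific autoevals scorer):

```python
def exact_match(input: str, output: str, expected: str) -> float:
    """Return 1.0 when the LLM output matches the expected value
    after normalizing whitespace and case, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0
```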
Scorers guide

Dataset
Capture rated examples from staging and production and incorporate them into “golden” datasets. Datasets are integrated, versioned, scalable, and secure.
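A sketch of that capture step, in plain Python with hypothetical field names rather than the Braintrust datasets API: promote only highly rated production interactions, using the approved output as the expected value for future evals.

```python
# Production logs with human ratings (hypothetical shape).
logs = [
    {"input": "capital of France", "output": "Paris", "rating": 5},
    {"input": "capital of Japan", "output": "Osaka", "rating": 1},
    {"input": "largest ocean", "output": "Pacific", "rating": 5},
]

# Keep only well-rated interactions; their outputs become the
# "expected" values in the golden dataset.
golden_dataset = [
    {"input": log["input"], "expected": log["output"]}
    for log in logs
    if log["rating"] >= 4
]
```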
Datasets guide

Features for everyone
Intuitively designed for both technical and non-technical team members, and synced between code and UI.
Traces
Visualize and analyze LLM execution traces in real-time to debug and optimize your AI apps.
Tracing guide

Monitoring
Monitor real-world AI interactions with insights to ensure your models perform optimally in production.
Logging and monitoring

Online evals
Continuously evaluate with automatic, asynchronous server-side scoring as you upload logs.
Online evaluation docs

Functions
Define functions in TypeScript and Python, and use them as custom scorers or callable tools.
Functions reference

Self-hosting
Deploy and run Braintrust on your own infrastructure for full control over your data and compliance requirements.
Self-hosting guide

Join industry leaders