- How your application behaves with real user inputs
- Where failures and edge cases occur
- Performance bottlenecks and token usage
- Data for building evaluation datasets
## Anatomy of a trace
A trace represents a single end-to-end request or operation through your application. Every trace contains one or more spans, each representing a unit of work with a start and end time. Spans nest inside each other to reflect the hierarchy of your application's logic. Braintrust assigns a type to each span:

| Span type | What it represents |
|---|---|
| eval | The root span for an evaluation run, wrapping a task span for your application code. One per test case — contains the input, expected output, and all child spans. |
| task | A unit of application logic — a workflow, pipeline step, or named operation. In logs, the root span is always a task span. Multiple task spans can appear in a single trace. |
| llm | A single call to an LLM. Shows the model, messages, parameters, token usage, and cost. |
| function | A named block of application logic — retrieval, formatting, routing, etc. |
| tool | A tool call made by the model — an external API, code execution, database query, etc. |
| score | The result of a scorer — online (in logs) or offline (in evaluations). Contains the score value, scorer name, and, for LLM-as-a-judge scorers, the judge's reasoning. |

Every span records:
- Input: The data sent to this step
- Output: The result produced
- Metadata: Model parameters, tags, custom data
- Metrics: Latency, token counts, costs
- Scores: Quality metrics (added later)
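The span tree above can be sketched in plain Python. This is an illustrative model only — the class and field names here are ours, not Braintrust's actual data model — but it shows how a root task span nests an LLM call and a tool call for one request:

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Illustrative span: a typed unit of work with input, output, and children."""
    name: str
    span_type: str  # e.g. "task", "llm", "tool", "function", "score"
    input: dict = field(default_factory=dict)
    output: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


# One trace for one request: a root task span containing an llm and a tool span.
root = Span("handle_request", "task", input={"query": "What is tracing?"})
root.children = [
    Span("chat.completions", "llm", input={"model": "gpt-4o"}),
    Span("search_docs", "tool", input={"q": "tracing"}),
]


def span_types(span):
    """Flatten the tree depth-first to show the nesting order."""
    return [span.span_type] + [t for c in span.children for t in span_types(c)]


print(span_types(root))  # → ['task', 'llm', 'tool']
```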
## What gets captured
Every instrumented request automatically captures:

- Request inputs and outputs
- Model parameters (model name, temperature, etc.)
- Timing information (start time, duration)
- Token usage and costs
- Nested function calls and tool invocations
- Errors and exceptions
- Custom metadata you add
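Concretely, a captured LLM span bundles those pieces into one record. The field names below are illustrative, chosen to mirror the list above — they are not Braintrust's export schema:

```python
# Illustrative record for a single captured LLM span (field names are ours,
# not Braintrust's actual schema).
llm_span_record = {
    "input": [{"role": "user", "content": "Summarize this document."}],
    "output": {"role": "assistant", "content": "The document covers..."},
    "metadata": {"model": "gpt-4o", "temperature": 0.2, "tags": ["summarize"]},
    "metrics": {
        "duration_s": 1.4,       # timing
        "prompt_tokens": 412,    # token usage
        "completion_tokens": 96,
        "cost_usd": 0.004,       # estimated cost
    },
    "error": None,               # populated if the call raised an exception
}

print(sorted(llm_span_record))
```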
## How to instrument
Braintrust makes it easy to get started with auto-instrumentation, which traces your LLM calls with no per-call code changes. When you need more control, you can trace your application logic — data retrieval, tool calls, business logic — alongside those calls.

- Trace LLM calls: trace LLM calls from AI providers and frameworks
- Trace application logic: trace non-LLM application logic like data retrieval and tool calls
## Provider and framework support
Braintrust integrates with all major AI providers and frameworks:

- AI Providers: OpenAI, Anthropic, Gemini, AWS Bedrock, Azure, Mistral, Together, Groq, and many more
- Frameworks and Libraries: LangChain, LangGraph, CrewAI, Vercel AI SDK, Pydantic AI, DSPy, and many more
## Next steps
Get started instrumenting your application:

- Trace LLM calls to automatically capture model requests, responses, and usage
- Trace application logic to instrument non-LLM steps like retrieval and tool calls
- Capture user feedback such as thumbs up/down on responses