Tracing
Tracing records what your application does as spans you can inspect in Braintrust. The recommended way to capture AI calls is auto-instrumentation: attach the Braintrust Java Agent at JVM startup and supported clients are traced with no code changes (see Install and instrument). The APIs below configure tracing programmatically and let you trace your own application code. To instrument a specific provider client, see Wrappers.
Braintrust.get
Gets the global Braintrust instance, creating it on the first call and returning that same instance on every later call. This is your entry point to the SDK: use it to set up tracing, run evaluations, load prompts, and fetch datasets.
Call get() with no arguments to configure the SDK from environment variables, which is the most common setup. To configure it in code instead, build a BraintrustConfig and pass it to get(config) (see Configuration).
Returns: Braintrust
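The sketch below illustrates the get-or-create behavior using only the standard library. LazyGlobal is a hypothetical class, not part of the SDK, and this is not how Braintrust.get is implemented. It only shows why repeated calls return one shared instance.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch of get-or-create semantics; not the SDK's implementation.
final class LazyGlobal<T> {
    private final AtomicReference<T> instance = new AtomicReference<>();
    private final Supplier<T> factory;

    LazyGlobal(Supplier<T> factory) {
        this.factory = factory;
    }

    // Creates the instance on first use; later calls return the stored one.
    T get() {
        T existing = instance.get();
        if (existing != null) {
            return existing;
        }
        T created = factory.get();
        // If another thread stored an instance first, keep that one instead.
        return instance.compareAndSet(null, created) ? created : instance.get();
    }
}
```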
braintrust.openTelemetryCreate
Creates an OpenTelemetry instance pointed at Braintrust and registers it as the global instance, so the spans you record and the AI calls you trace are exported to Braintrust. Braintrust tracing is built on OpenTelemetry, and this sets up that pipeline. Call it once at startup.
Pass openTelemetryCreate(false) to create the instance without registering it globally, or use openTelemetryEnable(tracerProviderBuilder, loggerProviderBuilder, meterProviderBuilder) to add Braintrust’s exporters to OpenTelemetry SDK builders you already manage.
Returns: io.opentelemetry.api.OpenTelemetry
Use the returned instance to wrap your own application code in spans. Braintrust traces your AI calls automatically, but not the code around them, so wrapping that work shows its structure in the trace and nests any traced AI calls underneath.
braintrust.projectUri
The Braintrust UI URL for the configured organization and project. Use it to link from your own app or logs straight to the project in Braintrust.
Returns: URI
Evaluations
An evaluation runs your task over a set of cases, scores each output, and reports the results, which is how you measure quality and catch regressions as you change prompts or models. These APIs build and run evaluations from your Java code.
braintrust.evalBuilder
Defines an evaluation in code for input type INPUT and output type OUTPUT. You give it cases (or a dataset), a task function, and one or more scorers (or classifiers), then build it and call run() on the resulting Eval, which returns an EvalResult. Call createReportString() on the EvalResult for a human-readable summary.
Returns: Eval.Builder
Builder methods (each returns the Eval.Builder for chaining, except build()):
- cases(DatasetCase...)→Eval.Builder: inline evaluation cases. Provide this or dataset().
- dataset(Dataset)→Eval.Builder: run against a dataset instead of inline cases.
- taskFunction(Function<INPUT, OUTPUT>)→Eval.Builder (required): the task that produces the output to score.
- scorers(Scorer...)→Eval.Builder: scorers to apply to each case. Required unless classifiers is set.
- classifiers(Classifier...)→Eval.Builder: classifiers to apply to each case. Required unless scorers is set.
- name(String)→Eval.Builder: experiment name (optional).
- build()→Eval: builds the Eval.
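The builder's pieces fit together roughly as follows. This stdlib-only sketch makes no assumptions about the SDK's internals. It runs a task over inline cases and averages each scorer's 0-to-1 results. EvalCase and MiniEval are hypothetical names. The SDK's run() does more than this; for example, it reports results to Braintrust.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical stand-in for a dataset case: an input and its expected output.
record EvalCase<I, O>(I input, O expected) {}

// Hypothetical stand-in for an evaluation run; not the SDK's Eval.
final class MiniEval {
    // Runs the task once per case, applies every scorer to (expected, actual),
    // and returns each scorer's mean score across all cases.
    static <I, O> Map<String, Double> run(
            List<EvalCase<I, O>> cases,
            Function<I, O> task,
            Map<String, BiFunction<O, O, Double>> scorers) {
        Map<String, Double> totals = new LinkedHashMap<>();
        for (String name : scorers.keySet()) {
            totals.put(name, 0.0);
        }
        for (EvalCase<I, O> c : cases) {
            O actual = task.apply(c.input());
            for (Map.Entry<String, BiFunction<O, O, Double>> s : scorers.entrySet()) {
                totals.merge(s.getKey(), s.getValue().apply(c.expected(), actual), Double::sum);
            }
        }
        if (!cases.isEmpty()) {
            totals.replaceAll((name, total) -> total / cases.size());
        }
        return totals;
    }
}
```

Running the task once per case keeps expensive model calls from repeating for every scorer.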
Scorer
A scorer measures how good your task’s output is, producing a score between 0 and 1 for each case in an evaluation. Scorers turn raw outputs into the metrics you compare across runs, whether that’s checking an answer against the expected value or asking an LLM to judge quality.
Create one inline with Scorer.of, implement the Scorer<INPUT, OUTPUT> interface when you need full control, or fetch a scorer you’ve defined in Braintrust with braintrust.fetchScorer(slug). Scorer.of comes in two forms: one scores from (expected, actual), the other from the full TaskResult when you need more than the output.
- Scorer.of(String name, BiFunction<OUTPUT, OUTPUT, Double>)→Scorer: score from (expected, actual).
- Scorer.of(String name, Function<TaskResult<INPUT, OUTPUT>, Double>)→Scorer: score from the full task result.
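Scorer logic is usually a plain function. The sketch below defines two functions with the (expected, actual) -> Double shape that the two-argument Scorer.of form takes, so you could pass one as, for example, Scorer.of("exact_match", ScorerFunctions.EXACT_MATCH). ScorerFunctions and the token-overlap metric are illustrative choices, not SDK APIs.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Objects;
import java.util.Set;
import java.util.function.BiFunction;

// Hypothetical scoring functions in the (expected, actual) -> Double shape.
final class ScorerFunctions {
    // 1.0 when the output matches the expected value exactly, otherwise 0.0.
    static final BiFunction<String, String, Double> EXACT_MATCH =
            (expected, actual) -> Objects.equals(expected, actual) ? 1.0 : 0.0;

    // Jaccard overlap of lowercase word sets, giving partial credit in [0, 1].
    // Assumes non-null strings.
    static final BiFunction<String, String, Double> TOKEN_OVERLAP = (expected, actual) -> {
        Set<String> a = tokens(expected);
        Set<String> b = tokens(actual);
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    };

    // Splits on whitespace and drops empty tokens.
    private static Set<String> tokens(String s) {
        Set<String> out = new HashSet<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }
}
```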
Classifier
When you want to categorize a task’s output instead of scoring it numerically, use a classifier. It returns zero or more Classification results for each case, which is useful for labeling outputs by topic, intent, or failure type rather than rating them between 0 and 1.
Create one with Classifier.of(String name, Function<TaskResult<INPUT, OUTPUT>, List<Classification>> fn). A Classification is a record (name, id, label, metadata) where id is required, so for the common case you can use Classification.of(id).
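A classifier's function typically inspects the output and decides which labels apply. The hypothetical keyword labeler below returns plain ids, so it runs without the SDK. Inside a real Classifier.of function, you would take the output text from the TaskResult and map each id to Classification.of(id). TopicLabeler and its keyword table are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical keyword-based labeling logic; not an SDK class.
final class TopicLabeler {
    // Insertion-ordered so labels come back in a stable order.
    private static final Map<String, List<String>> KEYWORDS = new LinkedHashMap<>();

    static {
        KEYWORDS.put("billing", List.of("invoice", "refund", "charge"));
        KEYWORDS.put("shipping", List.of("delivery", "tracking", "package"));
    }

    // Returns zero or more topic ids for an output.
    static List<String> label(String output) {
        String text = output.toLowerCase(Locale.ROOT);
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : KEYWORDS.entrySet()) {
            for (String keyword : e.getValue()) {
                if (text.contains(keyword)) {
                    ids.add(e.getKey());
                    break; // one keyword hit is enough for this label
                }
            }
        }
        return ids;
    }
}
```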
Datasets
A dataset is the set of cases an evaluation runs against. Define cases inline in code, or load a dataset you manage in Braintrust.
DatasetCase.of
A dataset case is a single example: an input paired with the output you expect. Use of(input, expected) for the common case, or of(input, expected, tags, metadata) to attach case-level tags and metadata.
Returns: DatasetCase
Parameters:
- input(INPUT, required): the value passed to the task.
- expected(OUTPUT, required): the expected output to score against.
- tags(List<String>): case-level tags.
- metadata(Map<String, Object>): case-level metadata.
Dataset.of
Groups several cases into an in-memory dataset you can hand to an evaluation, as an alternative to passing cases inline. Build one with Dataset.of(DatasetCase...).
Returns: Dataset
braintrust.fetchDataset
Loads a dataset you manage in Braintrust by name, so you can evaluate against shared, versioned test data instead of defining cases in code.
Returns: Dataset
Parameters:
- name(String, required): the dataset name.
- version(String): a specific version to pin. Omit to load the latest.
Prompts
Manage prompts in Braintrust and load them at runtime, so you can edit and version them without redeploying your code. These APIs load a prompt and turn it into the messages or request parameters you send to a model.
braintrust.promptLoader
Provides the loader for prompts you manage in Braintrust. Load a prompt by its slug, then render it with variables to produce the messages you send to a model.
Returns: BraintrustPromptLoader
prompt.renderMessages
Fills in a loaded prompt’s template variables to produce the final messages, ready to send to a model. Pass the variable values as renderMessages(Map<String, Object> parameters).
Returns: List<Map<String, Object>>
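Conceptually, rendering substitutes variable values into each message template. The sketch below rests on two assumptions: prompts use Mustache-style {{name}} placeholders, and messages are role/content maps. TemplateSketch is a hypothetical stand-in, not the SDK's renderer. Use renderMessages on a loaded prompt for real templates.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of rendering. Handles only simple {{name}}
// variables: no sections, no nested paths, and no HTML escaping.
final class TemplateSketch {
    private static final Pattern VAR = Pattern.compile("\\{\\{\\s*([A-Za-z0-9_]+)\\s*}}");

    static String render(String template, Map<String, Object> params) {
        Matcher m = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Object value = params.get(m.group(1));
            // Missing variables render as empty strings in this sketch.
            m.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Renders each {role, content} template into a message map.
    static List<Map<String, Object>> renderMessages(
            List<Map<String, String>> templates, Map<String, Object> params) {
        List<Map<String, Object>> messages = new ArrayList<>();
        for (Map<String, String> t : templates) {
            Map<String, Object> msg = new LinkedHashMap<>();
            msg.put("role", t.get("role"));
            msg.put("content", render(t.get("content"), params));
            messages.add(msg);
        }
        return messages;
    }
}
```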
BraintrustOpenAI.buildChatCompletionsPrompt
Turns a loaded prompt and its variables directly into OpenAI chat completion parameters, so you can send a Braintrust-managed prompt to OpenAI without assembling the request yourself.
Returns: ChatCompletionCreateParams
Parameters:
- prompt(BraintrustPrompt, required): a prompt loaded via promptLoader.
- parameters(Map<String, Object>, required): the variable values to render into the prompt.
Attachments
When your traces involve binary content like images or PDFs, log it as an attachment so it appears in Braintrust instead of as an opaque blob. When you trace AI calls, Braintrust automatically converts base64 attachments in provider messages into uploaded attachments, so you rarely need the APIs below for instrumented calls. Reach for them when you’re attaching binary content to a span yourself.
Base64Attachment
Wrap binary content in a Base64Attachment to log it on a span. Create one from a base64 data URI with Base64Attachment.of(String dataUri), or from a file with Base64Attachment.ofFile(ContentType contentType, String path). ContentType provides constants such as IMAGE_PNG and APPLICATION_PDF.
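Base64Attachment.of expects a data URI. If you already have the bytes in memory, you can build one with java.util.Base64. DataUris is a hypothetical helper. For files on disk, Base64Attachment.ofFile is the more direct route.

```java
import java.util.Base64;

// Hypothetical helper: builds a "data:<mime>;base64,<payload>" URI from raw bytes.
final class DataUris {
    static String toDataUri(String mimeType, byte[] bytes) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(bytes);
    }
}
```

You could then pass the result to Base64Attachment.of(String dataUri).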
Wrappers
If you’d rather instrument a specific client in code than use the Java Agent, wrap it. Each wrapper takes the OpenTelemetry instance from openTelemetryCreate() and returns an instrumented client that traces its calls to Braintrust. For setup and the full list of supported integrations, see Java SDK integrations.
- OpenAI: BraintrustOpenAI.wrapOpenAI(openTelemetry, client)→OpenAIClient.
- Anthropic: BraintrustAnthropic.wrap(openTelemetry, client)→AnthropicClient.
- Google GenAI (Gemini): BraintrustGenAI.wrap(openTelemetry, clientBuilder)→Client.
- LangChain4j: BraintrustLangchain.wrap(openTelemetry, aiServices)→ the wrapped AiServices<T>.
- AWS Bedrock: BraintrustAWSBedrock.wrap(openTelemetry, clientBuilder)→BedrockRuntimeClientBuilder.
- Spring AI: BraintrustSpringAI.wrap(openTelemetry, chatModelBuilder)→ the wrapped chat model builder.
Dev server
Serve your evaluators over HTTP so Braintrust can run them as remote evals from the Playground. Define each evaluator as a RemoteEval, register it with a Devserver, and start the server.
RemoteEval
Defines an evaluator the dev server can run on demand. Bundle a task and its scorers under a name, with an optional parameter schema for values supplied from the Playground.
Build one with RemoteEval.<INPUT, OUTPUT>builder(). Builder methods (each returns the RemoteEval.Builder for chaining, except build()):
- name(String)→RemoteEval.Builder (required): evaluator name, used as its identifier.
- taskFunction(Function<INPUT, OUTPUT>)→RemoteEval.Builder (required): the task that produces the output to score.
- scorers(List<Scorer>)→RemoteEval.Builder: scorers to apply to each case. Add one at a time with scorer(...).
- parameters(List<ParameterDef>)→RemoteEval.Builder: runtime parameter schema exposed in the Playground. Add one at a time with parameter(...).
- build()→RemoteEval: builds the evaluator.
Devserver
Runs an HTTP server that exposes your registered evaluators for Braintrust to call.
Build one with Devserver.builder(), then call start(). Builder methods (each returns the Devserver.Builder for chaining, except build()):
- config(BraintrustConfig)→Devserver.Builder (required): SDK configuration, typically braintrust.config().
- registerEval(RemoteEval)→Devserver.Builder (required): registers an evaluator to serve. Call it once per evaluator, and register at least one.
- host(String)→Devserver.Builder: bind address. Defaults to localhost. Set it to 0.0.0.0 to bind all interfaces.
- port(int)→Devserver.Builder: port to listen on. Defaults to 8300.
- build()→Devserver: builds the server.
Devserver methods:
- start(): starts the server and begins serving the registered evaluators. Throws IOException.
- stop(): stops the server.
Configuration
Configure the SDK with environment variables, or programmatically with BraintrustConfig.builder().
Environment variables
- BRAINTRUST_API_KEY (required): Braintrust API key.
- BRAINTRUST_DEFAULT_PROJECT_NAME: project that traced spans route to. Defaults to default-java-project.
- BRAINTRUST_DEFAULT_PROJECT_ID: project UUID. Takes precedence over the project name.
- BRAINTRUST_API_URL: Braintrust API URL. Defaults to https://api.braintrust.dev.
- BRAINTRUST_APP_URL: Braintrust app URL, used for permalinks. Defaults to https://www.braintrust.dev.
- BRAINTRUST_DEBUG: enable debug logging. Defaults to false.
- BRAINTRUST_ENABLE_TRACE_CONSOLE_LOG: print spans to the console. Defaults to false.
- BRAINTRUST_REQUEST_TIMEOUT: request timeout in seconds. Defaults to 30.
- BRAINTRUST_FILTER_AI_SPANS: export only AI-related spans. Defaults to false.
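If you rely on environment variables, a startup check can fail fast when the required key is missing and show which defaults apply. EnvSettings is a hypothetical helper, not part of the SDK. It mirrors a few of the defaults listed above, plus the rule that the project ID takes precedence. Its treatment of blank values as unset is an assumption of this sketch. The SDK performs its own resolution when you call Braintrust.get().

```java
import java.util.Map;

// Hypothetical preflight helper; pass System.getenv() in real code.
final class EnvSettings {
    // Returns the variable's value, or the fallback when it is unset or blank.
    static String get(Map<String, String> env, String key, String fallback) {
        String value = env.get(key);
        return (value == null || value.isBlank()) ? fallback : value;
    }

    static String apiUrl(Map<String, String> env) {
        return get(env, "BRAINTRUST_API_URL", "https://api.braintrust.dev");
    }

    static int requestTimeoutSeconds(Map<String, String> env) {
        return Integer.parseInt(get(env, "BRAINTRUST_REQUEST_TIMEOUT", "30"));
    }

    // The project ID wins over the project name; the "id:"/"name:" prefixes
    // are an illustrative format, not something the SDK uses.
    static String projectSelector(Map<String, String> env) {
        String id = env.get("BRAINTRUST_DEFAULT_PROJECT_ID");
        if (id != null && !id.isBlank()) {
            return "id:" + id;
        }
        return "name:" + get(env, "BRAINTRUST_DEFAULT_PROJECT_NAME", "default-java-project");
    }

    // Fails fast when the required API key is missing.
    static String requireApiKey(Map<String, String> env) {
        String key = env.get("BRAINTRUST_API_KEY");
        if (key == null || key.isBlank()) {
            throw new IllegalStateException("BRAINTRUST_API_KEY is required");
        }
        return key;
    }
}
```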