API reference

This page covers the key APIs in the Braintrust Java SDK. For setup, see the Quickstart. For the complete reference, see javadoc.io.

Tracing

Tracing records what your application does as spans you can inspect in Braintrust. The recommended way to capture AI calls is auto-instrumentation: attach the Braintrust Java Agent at JVM startup and supported clients are traced with no code changes (see Install and instrument). The APIs below configure tracing programmatically and let you trace your own application code. To instrument a specific provider client, see Wrappers.

`Braintrust.get`

Gets the global Braintrust instance, creating it on the first call and handing back that same instance on every call after. This is your entry point to the SDK: use it to set up tracing, run evaluations, load prompts, and fetch datasets. Call get() with no arguments to configure the SDK from environment variables, which is the most common setup. To configure it in code instead, build a BraintrustConfig and pass it to get(config) (see Configuration).


import dev.braintrust.Braintrust;

var braintrust = Braintrust.get();

Returns: Braintrust

`braintrust.openTelemetryCreate`

Creates an OpenTelemetry instance pointed at Braintrust and registers it as the global instance, so the spans you record and the AI calls you trace are exported to Braintrust. Braintrust tracing is built on OpenTelemetry, and this sets up that pipeline. Call it once at startup. Pass openTelemetryCreate(false) to create the instance without registering it globally, or use openTelemetryEnable(tracerProviderBuilder, loggerProviderBuilder, meterProviderBuilder) to add Braintrust’s exporters to OpenTelemetry SDK builders you already manage.


var openTelemetry = braintrust.openTelemetryCreate();

Returns: io.opentelemetry.api.OpenTelemetry

Use the returned instance to wrap your own application code in spans. Braintrust traces your AI calls automatically, but not the code around them, so wrapping that work shows its structure in the trace and nests any traced AI calls underneath.


var tracer = openTelemetry.getTracer("my-app");
var span = tracer.spanBuilder("process-request").startSpan();
try (var scope = span.makeCurrent()) {
    // traced AI calls here nest under "process-request"
} finally {
    span.end();
}

`braintrust.projectUri`

The Braintrust UI URL for the configured organization and project. Use it to link from your own app or logs straight to the project in Braintrust.


var url = braintrust.projectUri();

Returns: URI

Evaluations

An evaluation runs your task over a set of cases, scores each output, and reports the results, which is how you measure quality and catch regressions as you change prompts or models. These APIs build and run evaluations from your Java code.
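To make that loop concrete, here is a minimal conceptual sketch in plain Java. It does not use the SDK and is not how the SDK implements evaluations; `EvalLoopSketch`, `Case`, and `meanScore` are hypothetical names chosen for illustration. It only shows the shape of the work: run the task on each case, score the output against the expected value, and aggregate.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Conceptual stand-in for an evaluation run; not the SDK's implementation.
class EvalLoopSketch {
    // Hypothetical case type: an input paired with the output you expect.
    record Case(String input, String expected) {}

    // Runs the task on every case, scores each output, and returns the mean score.
    static double meanScore(
            List<Case> cases,
            Function<String, String> task,
            BiFunction<String, String, Double> scorer) {
        double total = 0.0;
        for (Case c : cases) {
            String output = task.apply(c.input());
            total += scorer.apply(c.expected(), output);
        }
        return cases.isEmpty() ? 0.0 : total / cases.size();
    }

    public static void main(String[] args) {
        List<Case> cases = List.of(
                new Case("strawberry", "fruit"),
                new Case("asparagus", "vegetable"));
        // Toy task that labels everything "fruit", so one of the two cases matches.
        Function<String, String> task = food -> "fruit";
        BiFunction<String, String, Double> exactMatch =
                (expected, result) -> expected.equals(result) ? 1.0 : 0.0;
        System.out.println(meanScore(cases, task, exactMatch)); // prints 0.5
    }
}
```

A drop in the aggregate between two runs of the same cases is the kind of regression an evaluation is meant to surface.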

`braintrust.evalBuilder`

Defines an evaluation in code for input type INPUT and output type OUTPUT: you give it cases, a task function, and one or more scorers, then call run(), which returns an EvalResult that exposes getExperimentId(), getExperimentName(), and getExperimentUrl(). Call createReportString() on it for a human-readable summary.


import dev.braintrust.eval.DatasetCase;
import dev.braintrust.eval.Scorer;

var eval = braintrust.<String, String>evalBuilder()
    .name("food-classifier")
    .cases(
        DatasetCase.of("strawberry", "fruit"),
        DatasetCase.of("asparagus", "vegetable"))
    .taskFunction(getFoodType) // getFoodType: a Function<String, String> you define elsewhere
    .scorers(Scorer.of("exact_match", (expected, result) -> expected.equals(result) ? 1.0 : 0.0))
    .build();

var result = eval.run();
System.out.println(result.createReportString());

Returns: Eval.Builder

Builder methods (each returns the Eval.Builder for chaining, except build()):

cases(DatasetCase...) → Eval.Builder: inline evaluation cases. Provide this or dataset().
dataset(Dataset) → Eval.Builder: run against a dataset instead of inline cases.
taskFunction(Function<INPUT, OUTPUT>) → Eval.Builder (required): the task that produces the output to score.
scorers(Scorer...) → Eval.Builder: scorers to apply to each case. Required unless classifiers is set.
classifiers(Classifier...) → Eval.Builder: classifiers to apply to each case. Required unless scorers is set.
name(String) → Eval.Builder: experiment name (optional).
ensureNew(boolean) → Eval.Builder: when true, always create a new experiment even if one with the same name exists (the backend deduplicates the name). Defaults to false.
build() → Eval: builds the Eval.

`Scorer`

A scorer measures how good your task’s output is, producing a score between 0 and 1 for each case in an evaluation. Scorers turn raw outputs into the metrics you compare across runs, whether that’s checking an answer against the expected value or asking an LLM to judge quality. Create one inline with Scorer.of, implement the Scorer<INPUT, OUTPUT> interface when you need full control, or fetch a scorer you’ve defined in Braintrust with braintrust.fetchScorer(slug). Scorer.of comes in two forms: one scores from (expected, actual), the other from the full TaskResult when you need more than the output.


// (expected, actual) -> score
Scorer.of("exact_match", (expected, result) -> expected.equals(result) ? 1.0 : 0.0);

Scorer.of(String name, BiFunction<OUTPUT, OUTPUT, Double>) → Scorer: score from (expected, actual).
Scorer.of(String name, Function<TaskResult<INPUT, OUTPUT>, Double>) → Scorer: score from the full task result.
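The scoring logic itself is an ordinary function, so you can write and unit-test it before handing it to Scorer.of. The sketch below uses only the JDK and assumes String outputs; `ScoreFunctionsSketch`, `EXACT_MATCH`, and `TOKEN_RECALL` are hypothetical names, not SDK APIs. Both lambdas have the (expected, actual) shape of the first Scorer.of form and keep their result between 0 and 1. They also tolerate a null expected value, which the inline `expected.equals(result)` example does not.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.function.BiFunction;

class ScoreFunctionsSketch {
    // Null-safe exact match: 1.0 when both values are equal, otherwise 0.0.
    static final BiFunction<String, String, Double> EXACT_MATCH =
            (expected, result) -> Objects.equals(expected, result) ? 1.0 : 0.0;

    // Partial credit: the fraction of expected whitespace-separated tokens
    // that also appear in the result (case-insensitive). Always in [0, 1].
    static final BiFunction<String, String, Double> TOKEN_RECALL = (expected, result) -> {
        if (expected == null || result == null || expected.isBlank()) {
            return 0.0;
        }
        String[] wanted = expected.trim().toLowerCase().split("\\s+");
        Set<String> got = new HashSet<>(Arrays.asList(result.trim().toLowerCase().split("\\s+")));
        long hits = Arrays.stream(wanted).filter(got::contains).count();
        return (double) hits / wanted.length;
    };

    public static void main(String[] args) {
        System.out.println(EXACT_MATCH.apply("fruit", "fruit"));   // prints 1.0
        System.out.println(EXACT_MATCH.apply(null, "fruit"));      // prints 0.0
        System.out.println(TOKEN_RECALL.apply("red fruit", "Fruit")); // prints 0.5
    }
}
```

Assuming the Scorer.of signature listed above, a lambda like these could be passed as its second argument.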

When you fetch a scorer, pass converter functions to control how your typed input and output are serialized before they’re sent to the scorer:

braintrust.fetchScorer(String slug) → Scorer: fetch by slug.
braintrust.fetchScorer(String slug, String version, Function<INPUT, Object> inputConverter, Function<OUTPUT, Object> outputConverter) → Scorer: fetch a pinned version and convert the input and output before sending them.

Scorer arguments serialize with BraintrustJsonMapper by default, which uses snake_case field names and honors @JsonIgnore. Scorer converter functions are never called with null.
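As an illustration of what an input converter might look like, the JDK-only sketch below maps a typed input to a plain map with snake_case keys and leaves out a field the scorer should not see. The `SupportTicket` record, its fields, and `INPUT_CONVERTER` are hypothetical; the only thing taken from the signature above is the Function<INPUT, Object> shape.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class ScorerConverterSketch {
    // Hypothetical typed task input.
    record SupportTicket(String customerId, String bodyText, String internalNote) {}

    // Converts the typed input into a plain map with snake_case keys,
    // omitting internalNote so it is never sent to the scorer.
    static final Function<SupportTicket, Object> INPUT_CONVERTER = ticket -> {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("customer_id", ticket.customerId());
        out.put("body_text", ticket.bodyText());
        return out;
    };

    public static void main(String[] args) {
        Object converted = INPUT_CONVERTER.apply(new SupportTicket("c-1", "hi", "secret"));
        System.out.println(converted); // prints {customer_id=c-1, body_text=hi}
    }
}
```

A converter like this gives you explicit control over the payload when the default mapper's output is not what the scorer expects.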

`Classifier`

When you want to categorize a task’s output instead of scoring it numerically, use a classifier. It returns zero or more Classification results for each case, which is useful for labeling outputs by topic, intent, or failure type rather than rating them between 0 and 1. Create one with Classifier.of(String name, Function<TaskResult<INPUT, OUTPUT>, List<Classification>> fn). A Classification is a record (name, id, label, metadata) where id is required, so for the common case you can use Classification.of(id).


Classifier.of("topic", taskResult -> List.of(Classification.of("billing")));
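The labeling logic behind a classifier can be a small function that returns zero or more ids. The sketch below is plain Java and works on the output text directly, because this page does not show how to read the output from a TaskResult; `TopicLabelsSketch`, `TOPIC_IDS`, and the keyword lists are hypothetical. Inside a real classifier you would presumably wrap each id with Classification.of(id).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

class TopicLabelsSketch {
    // Maps output text to zero or more topic ids based on simple keywords.
    static final Function<String, List<String>> TOPIC_IDS = output -> {
        List<String> ids = new ArrayList<>();
        if (output == null) {
            return ids; // no output, no labels
        }
        String text = output.toLowerCase(Locale.ROOT);
        if (text.contains("invoice") || text.contains("refund")) {
            ids.add("billing");
        }
        if (text.contains("password") || text.contains("login")) {
            ids.add("account");
        }
        return ids;
    };

    public static void main(String[] args) {
        System.out.println(TOPIC_IDS.apply("Your refund is on its way")); // prints [billing]
        System.out.println(TOPIC_IDS.apply("Hello there"));               // prints []
    }
}
```

Returning an empty list for unmatched output is what distinguishes this from a scorer, which always produces a number.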

Datasets

A dataset is the set of cases an evaluation runs against. Define cases inline in code, or load a dataset you manage in Braintrust.

`DatasetCase.of`

A dataset case is a single example: an input paired with the output you expect. Use of(input, expected) for the common case, or of(input, expected, tags, metadata) to attach case-level tags and metadata.

Returns: DatasetCase

Parameters:

input (INPUT, required): the value passed to the task.
expected (OUTPUT, required): the expected output to score against.
tags (List<String>): case-level tags.
metadata (Map<String, Object>): case-level metadata.

`Dataset.of`

Groups several cases into an in-memory dataset you can hand to an evaluation, as an alternative to passing cases inline. Build one with Dataset.of(DatasetCase...).

Returns: Dataset

`braintrust.fetchDataset`

Loads a dataset you manage in Braintrust by name, so you can evaluate against shared, versioned test data instead of defining cases in code. Pass the input and output types to deserialize each case into your own types, as Class objects or as converter functions when a Class isn’t enough:

fetchDataset(String name, Class<INPUT> inputType, Class<OUTPUT> outputType)
fetchDataset(String name, String version, Class<INPUT> inputType, Class<OUTPUT> outputType)
fetchDataset(String name, Function<Object, INPUT> inputConverter, Function<Object, OUTPUT> outputConverter)
fetchDataset(String name, String version, Function<Object, INPUT> inputConverter, Function<Object, OUTPUT> outputConverter)

Converter functions must tolerate null. The SDK passes null for a case’s expected output when the row has none.
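A null-tolerant converter pair might look like the following JDK-only sketch. It assumes the raw input arrives as a map-like object and the raw expected output as something with a sensible toString(); the `Question` record, its `text` and `locale` fields, and the `"en"` fallback are hypothetical.

```java
import java.util.Map;
import java.util.function.Function;

class DatasetConverterSketch {
    // Hypothetical typed input for each case.
    record Question(String text, String locale) {}

    // Returns null for null or non-map raw values instead of throwing.
    static final Function<Object, Question> INPUT_CONVERTER = raw -> {
        if (!(raw instanceof Map<?, ?> map)) {
            return null;
        }
        Object text = map.get("text");
        Object locale = map.get("locale");
        return new Question(
                text == null ? "" : text.toString(),
                locale == null ? "en" : locale.toString());
    };

    // Passes a missing expected output through as null.
    static final Function<Object, String> OUTPUT_CONVERTER =
            raw -> raw == null ? null : raw.toString();

    public static void main(String[] args) {
        System.out.println(INPUT_CONVERTER.apply(Map.of("text", "hi")));
        System.out.println(OUTPUT_CONVERTER.apply(null)); // prints null
    }
}
```

If your scorers compare against the expected value, they need to handle the null that the output converter passes through.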

The untyped fetchDataset(name) and fetchDataset(name, version) overloads are deprecated. They return each case as a raw LinkedHashMap.

Returns: Dataset

Parameters:

name (String, required): the dataset name.
version (String): a specific version to pin. Omit to load the latest.

Prompts

Manage prompts in Braintrust and load them at runtime, so you can edit and version them without redeploying your code. These APIs load a prompt and turn it into the messages or request parameters you send to a model.

`braintrust.promptLoader`

Provides the loader for prompts you manage in Braintrust. Load a prompt by its slug, then render it with variables to produce the messages you send to a model.


var prompt = braintrust.promptLoader().load("my-prompt-slug");

Returns: BraintrustPromptLoader

`prompt.renderMessages`

Fills in a loaded prompt’s template variables to produce the final messages, ready to send to a model. Pass the variable values as renderMessages(Map<String, Object> parameters).

Returns: List<Map<String, Object>>
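Because the result is a plain list of maps, you can inspect it with ordinary collection code, for example to log the rendered prompt before sending it. The sketch below assumes each message map carries `role` and `content` keys, which this page does not confirm; the list is built by hand as a stand-in for a real renderMessages result, and `RenderedMessagesSketch` and `describe` are hypothetical names.

```java
import java.util.List;
import java.util.Map;

class RenderedMessagesSketch {
    // Formats each message as "role: content", using "?" when a key is absent.
    static String describe(List<Map<String, Object>> messages) {
        StringBuilder sb = new StringBuilder();
        for (Map<String, Object> message : messages) {
            sb.append(message.getOrDefault("role", "?"))
              .append(": ")
              .append(message.getOrDefault("content", "?"))
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the list a rendered prompt might return.
        List<Map<String, Object>> messages = List.of(
                Map.of("role", "system", "content", "Be brief."),
                Map.of("role", "user", "content", "Hi Ada"));
        System.out.print(describe(messages));
    }
}
```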

`BraintrustOpenAI.buildChatCompletionsPrompt`

Turns a loaded prompt and its variables directly into OpenAI chat completion parameters, so you can send a Braintrust-managed prompt to OpenAI without assembling the request yourself.

Returns: ChatCompletionCreateParams

Parameters:

prompt (BraintrustPrompt, required): a prompt loaded via promptLoader.
parameters (Map<String, Object>, required): the variable values to render into the prompt.

Attachments

When your traces involve binary content like images or PDFs, log it as an attachment so it appears in Braintrust instead of as an opaque blob. When you trace AI calls, Braintrust automatically converts base64 attachments in provider messages into uploaded attachments, so you rarely need the APIs below for instrumented calls. Reach for them when you’re attaching binary content to a span yourself.

`Base64Attachment`

Wrap binary content in a Base64Attachment to log it on a span. Create one from a base64 data URI with Base64Attachment.of(String dataUri), or from a file with Base64Attachment.ofFile(ContentType contentType, String path). ContentType provides constants such as IMAGE_PNG and APPLICATION_PDF.


var attachment = Base64Attachment.ofFile(Base64Attachment.ContentType.IMAGE_PNG, "chart.png");
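If you already hold the bytes in memory rather than in a file, you can build the data URI for Base64Attachment.of yourself. The sketch below uses only the JDK and assumes the SDK accepts the standard `data:<type>;base64,<payload>` form; `DataUriSketch` and `toDataUri` are hypothetical helper names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class DataUriSketch {
    // Builds "data:<mimeType>;base64,<payload>" from raw bytes.
    static String toDataUri(String mimeType, byte[] bytes) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(toDataUri("text/plain", bytes)); // prints data:text/plain;base64,aGVsbG8=
    }
}
```

The resulting string could then be passed to Base64Attachment.of(String dataUri) as described above.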

Wrappers

If you’d rather instrument a specific client in code than use the Java Agent, wrap it. Each wrapper takes the OpenTelemetry from openTelemetryCreate() and returns an instrumented client that traces its calls to Braintrust. For setup and the full list of supported integrations, see Java SDK integrations.


import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import dev.braintrust.instrumentation.openai.BraintrustOpenAI;

OpenAIClient client = BraintrustOpenAI.wrapOpenAI(openTelemetry, OpenAIOkHttpClient.fromEnv());

OpenAI: BraintrustOpenAI.wrapOpenAI(openTelemetry, client) → OpenAIClient.
Anthropic: BraintrustAnthropic.wrap(openTelemetry, client) → AnthropicClient.
Google GenAI (Gemini): BraintrustGenAI.wrap(openTelemetry, clientBuilder) → Client.
LangChain4j: BraintrustLangchain.wrap(openTelemetry, aiServices) → the wrapped AiServices<T>.
AWS Bedrock: BraintrustAWSBedrock.wrap(openTelemetry, clientBuilder) → BedrockRuntimeClientBuilder.
Spring AI: BraintrustSpringAI.wrap(openTelemetry, chatModelBuilder) → the wrapped chat model builder.

Dev server

Serve your evaluators over HTTP so Braintrust can run them as remote evals from the Playground. Define each evaluator as a RemoteEval, register it with a Devserver, and start the server.


import dev.braintrust.Braintrust;
import dev.braintrust.devserver.Devserver;
import dev.braintrust.devserver.RemoteEval;
import dev.braintrust.eval.Scorer;
import java.util.List;

var braintrust = Braintrust.get();

RemoteEval<String, String> foodTypeEval = RemoteEval.builder(String.class, String.class)
    .name("food-type-classifier")
    .taskFunction(food -> classifyFood(food))
    .scorers(List.of(
        Scorer.of("exact_match", (expected, result) -> expected.equals(result) ? 1.0 : 0.0)))
    .build();

var devserver = Devserver.builder()
    .config(braintrust.config())
    .registerEval(foodTypeEval)
    .port(8301)
    .build();

devserver.start();

`RemoteEval`

Defines an evaluator the dev server can run on demand. Bundle a task and its scorers under a name, with an optional parameter schema for values supplied from the Playground. Build one with RemoteEval.builder(INPUT.class, OUTPUT.class), passing the input and output types so cases deserialize into your own types. Pass converter functions instead of Class objects when a Class isn’t enough. The no-arg RemoteEval.builder() is deprecated.

Builder methods (each returns the RemoteEval.Builder for chaining, except build()):

name(String) → RemoteEval.Builder (required): evaluator name, used as its identifier.
taskFunction(Function<INPUT, OUTPUT>) → RemoteEval.Builder (required): the task that produces the output to score.
scorers(List<Scorer>) → RemoteEval.Builder: scorers to apply to each case. Add one at a time with scorer(...).
parameters(List<ParameterDef>) → RemoteEval.Builder: runtime parameter schema exposed in the Playground. Add one at a time with parameter(...).
build() → RemoteEval: builds the evaluator.

`Devserver`

Runs an HTTP server that exposes your registered evaluators for Braintrust to call. Build one with Devserver.builder(), then call start().

Builder methods (each returns the Devserver.Builder for chaining, except build()):

config(BraintrustConfig) → Devserver.Builder (required): SDK configuration, typically braintrust.config().
registerEval(RemoteEval) → Devserver.Builder (required): registers an evaluator to serve. Call once per evaluator, with at least one registered.
host(String) → Devserver.Builder: bind address. Defaults to localhost. Set to 0.0.0.0 to bind all interfaces.
port(int) → Devserver.Builder: port to listen on. Defaults to 8300.
build() → Devserver: builds the server.

Methods:

start(): starts the server and begins serving the registered evaluators. Throws IOException.
stop(): stops the server.

Configuration

Configure the SDK with environment variables, or programmatically with BraintrustConfig.builder().


import dev.braintrust.config.BraintrustConfig;

var config = BraintrustConfig.builder()
    .apiKey(System.getenv("BRAINTRUST_API_KEY"))
    .defaultProjectName("My project")
    .build();

var braintrust = Braintrust.get(config);

Environment variables

BRAINTRUST_API_KEY (required): Braintrust API key.
BRAINTRUST_DEFAULT_PROJECT_NAME: project that traced spans route to. Defaults to default-java-project.
BRAINTRUST_DEFAULT_PROJECT_ID: project UUID. Takes precedence over the project name.
BRAINTRUST_API_URL: Braintrust API URL. Defaults to https://api.braintrust.dev.
BRAINTRUST_APP_URL: Braintrust app URL, used for permalinks. Defaults to https://www.braintrust.dev.
BRAINTRUST_DEBUG: enable debug logging. Defaults to false.
BRAINTRUST_ENABLE_TRACE_CONSOLE_LOG: print spans to the console. Defaults to false.
BRAINTRUST_REQUEST_TIMEOUT: request timeout in seconds. Defaults to 30.
BRAINTRUST_FILTER_AI_SPANS: export only AI-related spans. Defaults to false.
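To illustrate how the defaults and the ID-over-name precedence listed above fit together, here is a JDK-only sketch that resolves values from an environment map. It is not the SDK's configuration code; `EnvDefaultsSketch` and its methods are hypothetical, and treating a blank value as unset is an assumption.

```java
import java.util.Map;

class EnvDefaultsSketch {
    // Returns the variable's value, or the fallback when it is missing.
    // Treating blank values as missing is an assumption of this sketch.
    static String get(Map<String, String> env, String key, String fallback) {
        String value = env.get(key);
        return (value == null || value.isBlank()) ? fallback : value;
    }

    // Picks the project selector: the ID wins over the name when both are set.
    static String project(Map<String, String> env) {
        String id = get(env, "BRAINTRUST_DEFAULT_PROJECT_ID", null);
        if (id != null) {
            return "id:" + id;
        }
        return "name:" + get(env, "BRAINTRUST_DEFAULT_PROJECT_NAME", "default-java-project");
    }

    // Request timeout in seconds, defaulting to 30 as documented above.
    static int requestTimeoutSeconds(Map<String, String> env) {
        return Integer.parseInt(get(env, "BRAINTRUST_REQUEST_TIMEOUT", "30"));
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        System.out.println(project(env));
        System.out.println(requestTimeoutSeconds(env));
    }
}
```

Passing the environment in as a map, rather than calling System.getenv inside each method, keeps the resolution logic easy to test.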
