This page covers the key APIs in the Braintrust Ruby SDK. For setup, see the Quickstart. For the complete reference, see gemdocs.org.

Tracing

Tracing records what your application does as spans you can inspect in Braintrust. The recommended way to capture AI calls is auto-instrumentation, which traces supported provider gems with no code changes (see Install and instrument). The APIs below initialize the SDK, control instrumentation, and let you flush and link to your traces.

Braintrust.init

Initializes SDK state, configures OpenTelemetry tracing, and optionally auto-instruments supported provider gems. Call it once on startup.
require "braintrust"

Braintrust.init(
  default_project: "Support bot",
  auto_instrument: {only: [:openai]}
)
Arguments:
  • api_key (String): Braintrust API key. Defaults to ENV["BRAINTRUST_API_KEY"].
  • org_name (String): organization name to use during login. Defaults to ENV["BRAINTRUST_ORG_NAME"].
  • default_project (String): project used for traced spans that do not set an explicit parent. Defaults to ENV["BRAINTRUST_DEFAULT_PROJECT"].
  • app_url (String): Braintrust app URL. Defaults to ENV["BRAINTRUST_APP_URL"] or https://www.braintrust.dev.
  • api_url (String): Braintrust API URL. Defaults to ENV["BRAINTRUST_API_URL"] or https://api.braintrust.dev.
  • set_global (Boolean): sets the created state as global SDK state. Defaults to true.
  • blocking_login (Boolean): logs in synchronously instead of using the background login thread. Defaults to false.
  • enable_tracing (Boolean): enables OpenTelemetry tracing setup. Defaults to true.
  • tracer_provider (OpenTelemetry::SDK::Trace::TracerProvider): tracer provider to use instead of creating or reusing the global provider.
  • filter_ai_spans (Boolean): sends only AI-related spans when enabled. Defaults to ENV["BRAINTRUST_OTEL_FILTER_AI_SPANS"] == "true".
  • span_filter_funcs (Array<Proc>): custom span filters.
  • exporter (Object): optional OpenTelemetry exporter override.
  • auto_instrument (Boolean or Hash): controls provider auto-instrumentation. Use false, true, {only: [...]}, or {except: [...]}. Defaults to the setting in ENV["BRAINTRUST_AUTO_INSTRUMENT"], which is enabled unless set to false.

Braintrust.auto_instrument!

Discovers loaded provider gems and instruments the ones Braintrust supports. Braintrust.init runs this for you when auto-instrumentation is enabled, so you rarely call it directly. Reach for it to instrument explicitly, such as after initializing with auto_instrument: false, or to re-run discovery once more provider gems have loaded.
Braintrust.auto_instrument!
Braintrust.auto_instrument!(only: [:openai, :anthropic])
Braintrust.auto_instrument!(except: [:ruby_llm])
Returns: Array<Symbol> (the instrumented integrations), or nil when disabled.
Arguments:
  • config (nil, Boolean, or Hash): nil reads environment settings, false disables, true enables, and hashes accept :only or :except.

Braintrust.instrument!

Instruments a specific integration by name. Use it to wrap a single provider, or to instrument one specific client instance.
Braintrust.instrument!(:openai)

client = OpenAI::Client.new
Braintrust.instrument!(:openai, target: client)
Arguments:
  • name (Symbol, required): integration name: :anthropic, :openai, :ruby_openai, or :ruby_llm.
  • target (Object): provider client instance to instrument.
  • tracer_provider (OpenTelemetry::SDK::Trace::TracerProvider): tracer provider.

Braintrust::Trace.permalink

Builds a Braintrust UI URL for an OpenTelemetry span, so you can link straight to a trace from your own logs or app.
tracer = OpenTelemetry.tracer_provider.tracer("my-app")

tracer.in_span("answer-question") do |span|
  puts Braintrust::Trace.permalink(span)
end
Returns: String (empty when the span lacks the Braintrust attributes needed to build a permalink).

Braintrust::Trace.flush_spans

Forces the global tracer provider to flush buffered spans. Call it before a short-lived process exits if you’ve disabled the automatic flush on exit.
Braintrust::Trace.flush_spans
Returns: Boolean.

Evaluations

An evaluation runs your task over a set of cases, scores each output, and logs the results to an experiment, which is how you measure quality and catch regressions as you change prompts or models. Braintrust::Eval.run is the main entry point. The other APIs here define the tasks, scorers, classifiers, and reusable evaluators it uses.

Braintrust::Eval.run

Runs an evaluation, logs each case to an experiment, and returns a result summary. Give it the cases to run, a task, and at least one scorer or classifier.
require "braintrust"

Braintrust.init

result = Braintrust::Eval.run(
  project: "Support bot",
  experiment: "answers-v1",
  cases: [
    {input: "How do I reset my password?", expected: "Use the account recovery flow."},
    {input: "How do I export my data?", expected: "Open Settings and choose Export."}
  ],
  task: ->(input:) { answer_question(input) },
  scorers: [
    Braintrust::Scorer.new("exact_match") do |expected:, output:|
      output == expected ? 1.0 : 0.0
    end
  ]
)
Returns: Braintrust::Eval::Result.
Arguments:
  • task (#call, required): callable that receives input: and returns the output to score.
  • cases (Array or Enumerable): inline evaluation cases. Mutually exclusive with dataset.
  • dataset (String, Hash, Braintrust::Dataset, or Braintrust::Dataset::ID): dataset to fetch for evaluation cases. Mutually exclusive with cases.
  • scorers (Array<String, Braintrust::Scorer, #call>): scorers to run. At least one scorer or classifier is required.
  • classifiers (Array<Braintrust::Classifier, #call>): classifiers to run. At least one scorer or classifier is required.
  • project (String): project name. Enables full API mode and creates or resolves the project.
  • project_id (String): project UUID. Skips project lookup when provided.
  • experiment (String): experiment name.
  • on_progress (#call): callback after each case. Receives output and scores, or an error payload.
  • parallelism (Integer): number of worker threads. Tasks and scorers must be thread-safe when greater than 1. Defaults to 1.
  • tags (Array<String>): experiment tags.
  • metadata (Hash): experiment metadata.
  • update (Boolean): reuse an existing experiment when possible. Defaults to false.
  • quiet (Boolean): suppress result output. Defaults to false.
  • tracer_provider (OpenTelemetry::SDK::Trace::TracerProvider): tracer provider. Defaults to the global provider.
  • parent (Hash): parent span context for remote evals.
  • parameters (Hash): runtime parameters passed to tasks, scorers, and classifiers that declare parameters:.
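The thread-safety requirement on parallelism covers any state shared between workers, including an on_progress callback that keeps a running count. The following is a minimal sketch in plain Ruby, assuming only that the callback is a callable. ProgressCounter and simulate_workers are hypothetical names, not part of the SDK, and the callback accepts any arguments because the exact payload shape is not assumed here.

```ruby
# Hypothetical helper class for illustration; not part of the Braintrust SDK.
class ProgressCounter
  attr_reader :completed

  def initialize
    @mutex = Mutex.new
    @completed = 0
  end

  # Accepts any arguments, since the callback payload shape is not assumed here.
  def call(*_args, **_kwargs)
    # The mutex makes the read-increment-write sequence atomic across threads.
    @mutex.synchronize { @completed += 1 }
  end
end

# Simulates several workers reporting progress concurrently and returns the total.
def simulate_workers(counter, workers: 4, cases_per_worker: 250)
  Array.new(workers) do
    Thread.new { cases_per_worker.times { counter.call(output: "ok") } }
  end.each(&:join)
  counter.completed
end

puts simulate_workers(ProgressCounter.new) # prints 1000
```

An instance like this could be passed as on_progress: alongside a parallelism: value greater than 1.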

Braintrust::Task.new

A task is the function under evaluation: it takes a case’s input and returns the output to score. Eval.run accepts a plain lambda for task:, but Task.new wraps one in a named, reusable object, which is useful when you want the task name to show up in Braintrust or you plan to reuse it across runs. The block declares the keyword arguments it needs, and extra keywords are filtered out automatically.
task = Braintrust::Task.new("answer_question") do |input:, parameters:|
  answer_question(input, model: parameters["model"])
end
Arguments:
  • name (String): task name. Defaults to "task".
  • block (Proc, required): declare the keyword arguments you need.
You can also define class-based tasks:
class SupportBot
  include Braintrust::Task

  def call(input:)
    answer_question(input)
  end
end
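The keyword filtering described for Task.new can be pictured with plain Ruby reflection. This is only a sketch of one way such filtering could work, not the SDK's implementation; call_with_declared_keywords is a hypothetical helper.

```ruby
# Hypothetical helper: passes a block only the keywords it declares.
def call_with_declared_keywords(block, **available)
  params = block.parameters
  # A block that declares **rest can accept every keyword, so pass them all.
  return block.call(**available) if params.any? { |type, _| type == :keyrest }

  # Keep only the required (:keyreq) and optional (:key) keyword names.
  declared = params.select { |type, _| %i[key keyreq].include?(type) }.map { |_, name| name }
  block.call(**available.slice(*declared))
end

# This block asks only for input:, so expected: and metadata: are dropped.
ECHO_TASK = proc { |input:| "echo: #{input}" }

puts call_with_declared_keywords(ECHO_TASK, input: "hi", expected: "hello", metadata: {})
# prints "echo: hi"
```

Declaring only the keywords you need keeps task blocks short, since unused values such as metadata never have to appear in the signature.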

Braintrust::Scorer.new

A scorer measures how good your task’s output is, producing a score for each case in an evaluation. Scorer.new is the standard way to define a custom scorer inline: give it a name and a block that returns a number (typically 0 to 1), a score hash, or an array of score hashes.
scorer = Braintrust::Scorer.new("exact_match") do |expected:, output:|
  output == expected ? 1.0 : 0.0
end
Scorer blocks can declare input:, expected:, output:, metadata:, trace:, and parameters:. Return an array of score hashes to emit multiple named scores from one scorer:
Braintrust::Scorer.new("summary_quality") do |expected:, output:|
  [
    {name: "coverage", score: coverage(expected, output)},
    {name: "conciseness", score: concise?(output) ? 1.0 : 0.0}
  ]
end
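The multi-score example above relies on coverage and concise? helpers that the page does not define. The sketch below supplies hypothetical versions of both so the scoring logic can be run on its own as a plain lambda; the word-overlap metric and the 20-word limit are illustrative assumptions, not SDK behavior.

```ruby
# Hypothetical metric: fraction of the expected answer's unique words found in the output.
def coverage(expected, output)
  expected_words = expected.downcase.scan(/\w+/).uniq
  return 1.0 if expected_words.empty?

  output_words = output.downcase.scan(/\w+/)
  (expected_words & output_words).size.to_f / expected_words.size
end

# Hypothetical check: an output counts as concise when it stays under a word limit.
def concise?(output, max_words: 20)
  output.split.size <= max_words
end

# Same body as the Scorer.new block above, written as a plain callable.
SUMMARY_QUALITY = lambda do |expected:, output:|
  [
    {name: "coverage", score: coverage(expected, output)},
    {name: "conciseness", score: concise?(output) ? 1.0 : 0.0}
  ]
end

p SUMMARY_QUALITY.call(
  expected: "Open Settings and choose Export.",
  output: "Go to Settings, then choose Export."
)
```

Here three of the five expected words appear in the output, so coverage is 0.6 and conciseness is 1.0. To use the logic in an evaluation, the same block body would go inside Braintrust::Scorer.new("summary_quality").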

Braintrust::Classifier.new

A classifier categorizes your task’s output instead of scoring it numerically, which is useful for labeling outputs by topic, intent, or failure type. Classifier.new defines one inline: give it a name and a block that returns a classification.
classifier = Braintrust::Classifier.new("topic") do |output:|
  {id: "billing", label: "Billing"}
end
Classifier blocks can declare input:, expected:, output:, metadata:, trace:, and parameters:. They return a classification hash, an array of classification hashes, or nil.
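A classifier that actually inspects the output might look like the following sketch. It is written as a plain lambda so it runs without the SDK; the TOPICS table and its keywords are hypothetical, and the return value follows the classification-hash-or-nil contract described above.

```ruby
# Hypothetical topic table: each id maps to a label and trigger keywords.
TOPICS = {
  "billing" => {label: "Billing", keywords: %w[invoice refund charge]},
  "account" => {label: "Account", keywords: %w[password login email]}
}.freeze

# Returns a classification hash for the first matching topic, or nil when none match.
CLASSIFY_TOPIC = lambda do |output:|
  words = output.downcase.scan(/\w+/)
  id, topic = TOPICS.find { |_, candidate| (candidate[:keywords] & words).any? }
  id && {id: id, label: topic[:label]}
end

p CLASSIFY_TOPIC.call(output: "Your refund was issued yesterday.")
p CLASSIFY_TOPIC.call(output: "Thanks for reaching out!") # prints nil
```

The same body could be placed in a Braintrust::Classifier.new("topic") block.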

Braintrust::Eval::Evaluator

Bundles a task, scorers, and a parameter schema into one reusable object. Serve it from the Ruby dev server so Braintrust can run it as a remote eval from the Playground, or call #run to run it directly.
evaluator = Braintrust::Eval::Evaluator.new(
  task: ->(input:) { answer_question(input) },
  scorers: [
    Braintrust::Scorer.new("exact_match") do |expected:, output:|
      output == expected ? 1.0 : 0.0
    end
  ],
  parameters: {
    "model" => {type: "string", default: "gpt-5-mini"}
  }
)
Arguments:
  • task (#call): task callable.
  • scorers (Array): scorers attached to the evaluator. Defaults to [].
  • classifiers (Array): classifiers attached to the evaluator. Defaults to [].
  • parameters (Hash): parameter schema used by remote evals and the Playground UI. Defaults to {}.
Call #run on the evaluator to delegate to Braintrust::Eval.run.
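One way to think about the parameter schema is that each entry's default applies when a run does not supply a value. The sketch below illustrates that idea in plain Ruby; resolve_parameters is a hypothetical helper, the "tone" entry is made up for the example, and how the SDK itself applies defaults is an assumption here.

```ruby
# Hypothetical helper: fills in schema defaults for parameters the caller omits.
def resolve_parameters(schema, overrides = {})
  schema.each_with_object({}) do |(name, spec), resolved|
    resolved[name] = overrides.fetch(name) { spec[:default] }
  end
end

PARAMETER_SCHEMA = {
  "model" => {type: "string", default: "gpt-5-mini"},
  "tone" => {type: "string", default: "concise"} # illustrative entry
}.freeze

p resolve_parameters(PARAMETER_SCHEMA)
p resolve_parameters(PARAMETER_SCHEMA, {"tone" => "formal"})
```

A task that declares parameters: would then read values such as parameters["model"] from the resolved hash.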

Braintrust::Eval::Result

Represents the outcome of an evaluation run, returned by Braintrust::Eval.run.
result = Braintrust::Eval.run(...)
puts result.permalink
puts result.success?
Fields and methods:
  • experiment_id, experiment_name: experiment identifiers.
  • project_id, project_name: project identifiers.
  • permalink: Braintrust UI link for the experiment.
  • errors: errors collected during the run.
  • duration: evaluation duration in seconds.
  • scores: raw score data keyed by scorer name.
  • classifications: classification results keyed by classifier name.
  • success?: returns true when no errors occurred.
  • failed?: returns true when errors occurred.
  • summary: lazily computed experiment summary.
  • scorer_stats: score statistics keyed by scorer name.
  • to_pretty: human-readable CLI summary.

Datasets

A dataset is a versioned collection of cases you manage in Braintrust and reuse across evaluations. Reference one by name or ID, iterate its records, or pass it to Braintrust::Eval.run.

Braintrust::Dataset

References a dataset you manage in Braintrust. It implements Enumerable and fetches records lazily as you iterate, so you can stream a large dataset without loading every record into memory at once. Use fetch_all when you want every record in an array, or pass the dataset straight to Braintrust::Eval.run.
dataset = Braintrust::Dataset.new(
  name: "Golden examples",
  project: "Support bot"
)

dataset.each do |record|
  puts record[:input]
end
Arguments:
  • name (String): dataset name. Required unless id is provided.
  • id (String): dataset UUID. Required unless name is provided.
  • project (String): project name. Required when using name.
  • version (String): dataset version to pin to.
Methods:
  • id → String: resolves and returns the dataset UUID.
  • metadata → Hash: fetches and returns dataset metadata.
  • fetch_all(limit: nil) → Array: fetches records eagerly into an array.
  • each: lazily iterates records and implements Enumerable.
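Lazy iteration with Enumerable can be sketched in plain Ruby as a class that requests one page at a time and stops as soon as the caller has enough records. PagedRecords, its fetch_page block, and the in-memory FAKE_PAGES backend are hypothetical stand-ins for illustration; they are not SDK classes and make no claim about how Braintrust::Dataset is implemented.

```ruby
# Hypothetical class: streams records page by page instead of loading them all.
class PagedRecords
  include Enumerable

  # fetch_page takes cursor: and returns {records: [...], cursor: next_cursor_or_nil}.
  def initialize(&fetch_page)
    @fetch_page = fetch_page
  end

  def each
    return enum_for(:each) unless block_given?

    cursor = nil
    loop do
      page = @fetch_page.call(cursor: cursor)
      page[:records].each { |record| yield record }
      cursor = page[:cursor]
      break if cursor.nil? # no more pages
    end
  end
end

# Fake backend: three pages of two records each, keyed by cursor.
FAKE_PAGES = {
  nil => {records: [{input: "a"}, {input: "b"}], cursor: "p2"},
  "p2" => {records: [{input: "c"}, {input: "d"}], cursor: "p3"},
  "p3" => {records: [{input: "e"}, {input: "f"}], cursor: nil}
}.freeze

FETCH_LOG = [] # records which cursors were actually requested
RECORDS = PagedRecords.new do |cursor:|
  FETCH_LOG << cursor
  FAKE_PAGES.fetch(cursor)
end

p RECORDS.first(3).map { |record| record[:input] } # prints ["a", "b", "c"]
p FETCH_LOG                                        # prints [nil, "p2"]
```

Taking the first three records touches only two of the three pages, which is the benefit lazy iteration gives over fetching everything eagerly.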

Braintrust::Dataset::ID

Wraps a dataset UUID so Braintrust::Eval.run can distinguish dataset-by-ID from dataset-by-name.
Braintrust::Eval.run(
  project: "Support bot",
  dataset: Braintrust::Dataset::ID.new(id: "dataset-uuid"),
  task: task,
  scorers: scorers
)

Prompts and functions

In Braintrust, functions are units of logic you define and version in the UI, then load or invoke from your code. A prompt is a function whose job is to call a model with a templated set of messages. Other functions include scorers, tools, and code you deploy. Load and render a saved prompt with Braintrust::Prompt.load, or turn a deployed function into an eval task or scorer with Braintrust::Functions.

Braintrust::Prompt.load

Loads a saved prompt from Braintrust by project and slug. Use the returned prompt’s #build to render provider parameters with runtime variables.
prompt = Braintrust::Prompt.load(
  project: "Support bot",
  slug: "summarize-ticket",
  defaults: {tone: "concise"}
)

params = prompt.build(ticket: "Customer cannot log in")
my_llm_client.call(**params)
Returns: Braintrust::Prompt.
Arguments:
  • slug (String, required): prompt slug.
  • project (String): project name. Required unless project_id is provided.
  • project_id (String): project UUID. Required unless project is provided.
  • version (String): prompt version. Defaults to latest.
  • defaults (Hash): default template variables for #build. Defaults to {}.
  • api (Braintrust::API): API client override.

Braintrust::Prompt#build

Renders prompt variables and returns provider parameters ready to send to a model.
params = prompt.build({name: "Ada"}, strict: true)
params = prompt.build(name: "Ada")
Returns: Hash containing :model, :messages, optional :tools, and prompt model parameters.
Arguments:
  • variables (Hash): template variables.
  • strict (Boolean): raises when a template variable is missing. Defaults to false.
  • **kwargs (Hash): additional template variables.
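To make the roles of defaults, variables, and strict concrete, here is a small sketch of double-brace template rendering in plain Ruby. render_template and the template text are hypothetical, and rendering a missing variable as an empty string in non-strict mode is an assumption rather than documented SDK behavior.

```ruby
# Hypothetical helper: substitutes {{name}} placeholders from a variables hash.
def render_template(text, variables, strict: false)
  text.gsub(/\{\{\s*(\w+)\s*\}\}/) do
    key = Regexp.last_match(1).to_sym
    if variables.key?(key)
      variables[key].to_s
    elsif strict
      raise KeyError, "missing template variable: #{key}"
    else
      "" # assumption: missing variables render as empty text when not strict
    end
  end
end

PROMPT_DEFAULTS = {tone: "concise"}.freeze
PROMPT_TEMPLATE = "Summarize in a {{tone}} tone: {{ticket}}"

# Call-time variables are merged over the defaults.
puts render_template(PROMPT_TEMPLATE, PROMPT_DEFAULTS.merge(ticket: "Customer cannot log in"))
# prints "Summarize in a concise tone: Customer cannot log in"

begin
  render_template(PROMPT_TEMPLATE, PROMPT_DEFAULTS, strict: true)
rescue KeyError => e
  puts e.message # prints "missing template variable: ticket"
end
```

Strict mode is most useful in tests and CI, where a missing variable should fail loudly instead of silently producing an incomplete prompt.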

Braintrust::Functions.task

Creates a task that invokes a remote Braintrust function, for use as an eval task.
task = Braintrust::Functions.task(
  project: "Support bot",
  slug: "answer-ticket"
)
Returns: Braintrust::Task.
Arguments:
  • project (String, required): project name.
  • slug (String, required): function slug.
  • tracer_provider (OpenTelemetry::SDK::Trace::TracerProvider): tracer provider. Defaults to the global provider.

Braintrust::Functions.scorer

Creates a scorer that invokes a remote Braintrust function, for use as an eval scorer.
scorer = Braintrust::Functions.scorer(
  project: "Support bot",
  slug: "answer-quality"
)

pinned = Braintrust::Functions.scorer(
  id: "function-uuid",
  version: "transaction-id"
)
Returns: Braintrust::Scorer.
Arguments:
  • project (String): project name. Use with slug.
  • slug (String): function slug. Use with project.
  • id (String): function UUID. Alternative to project and slug.
  • version (String): function version when using id.
  • tracer_provider (OpenTelemetry::SDK::Trace::TracerProvider): tracer provider. Defaults to the global provider.

Attachments

When your traces involve binary content like images or PDFs, log it as an attachment so it appears in Braintrust as viewable content rather than as an opaque blob.

Braintrust::Trace::Attachment

Wraps binary data so you can attach it to a traced message. Create one from bytes, a file, or a URL, then add its hash to a message’s content.
require "braintrust/trace/attachment"

attachment = Braintrust::Trace::Attachment.from_file("image/png", "./photo.png")

messages = [
  {
    role: "user",
    content: [
      {type: "text", text: "What is in this image?"},
      attachment.to_h
    ]
  }
]
Constructors:
  • from_bytes(content_type, data) → Attachment: creates an attachment from raw bytes.
  • from_file(content_type, path) → Attachment: reads a file and creates an attachment.
  • from_url(url) → Attachment: fetches a URL and creates an attachment using the response content type.
Methods:
  • to_data_url → String: returns a base64 data URL.
  • to_message → Hash: returns the base64_attachment message hash.
  • to_h: alias for to_message.
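For reference, a base64 data URL has the general form data:<content type>;base64,<encoded bytes>. The sketch below builds one in plain Ruby to show the format; data_url is a hypothetical helper and is not how the SDK necessarily constructs its value.

```ruby
# Hypothetical helper: builds a base64 data URL from a content type and raw bytes.
def data_url(content_type, bytes)
  # pack("m0") produces strict Base64 with no line breaks.
  encoded = [bytes].pack("m0")
  "data:#{content_type};base64,#{encoded}"
end

puts data_url("text/plain", "hi") # prints "data:text/plain;base64,aGk="
```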

API client

For direct access to the Braintrust REST API, use Braintrust::API and its namespaces. Reach for these when you need to manage datasets or functions programmatically, beyond what the higher-level APIs above cover.

Braintrust::API

Creates a REST API client using SDK state.
api = Braintrust::API.new
api.login
Methods:
  • datasets → Braintrust::API::Datasets: the datasets API namespace.
  • functions → Braintrust::API::Functions: the functions API namespace.
  • login → Braintrust::API: logs in through SDK state and returns the API client.
  • object_permalink(object_type:, object_id:) → String: builds a Braintrust UI object permalink.

Braintrust::API::Datasets

Provides dataset management APIs.
api = Braintrust::API.new
dataset = api.datasets.create(
  project_name: "Support bot",
  name: "Golden examples"
)
Methods:
  • list(project_name: nil, dataset_name: nil, project_id: nil, limit: nil): lists datasets with optional filters.
  • get(project_name:, name:): fetches one dataset by project and name.
  • get_by_id(id:): fetches dataset metadata by UUID.
  • create(name:, project_name: nil, project_id: nil, description: nil, metadata: nil): creates or registers a dataset.
  • insert(id:, events:): inserts events into a dataset.
  • delete(id:): deletes a dataset.
  • permalink(id:): builds a Braintrust UI dataset permalink.
  • fetch(id:, limit: 1000, cursor: nil, version: nil): fetches dataset records with pagination.

Braintrust::API::Functions

Provides function and prompt management APIs.
api = Braintrust::API.new
function = api.functions.create_task(
  project_name: "Support bot",
  slug: "answer-ticket",
  prompt_data: {
    prompt: {
      messages: [{role: "user", content: "{{question}}"}]
    },
    options: {
      model: "gpt-5-mini"
    }
  }
)
Methods:
  • list(project_name: nil, project_id: nil, function_name: nil, slug: nil, limit: nil): lists functions with optional filters.
  • create(project_name:, slug:, function_data:, prompt_data: nil, name: nil, description: nil, function_type: nil, function_schema: nil): creates or registers a function.
  • invoke(id:, input:): invokes a function by UUID.
  • get(id:, version: nil): fetches a function by UUID.
  • delete(id:): deletes a function.
  • create_tool(project_name:, slug:, prompt_data:, name: nil, description: nil, function_schema: nil): creates a tool function.
  • create_scorer(project_name:, slug:, prompt_data:, name: nil, description: nil, function_schema: nil): creates a scorer function.
  • create_task(project_name:, slug:, prompt_data:, name: nil, description: nil, function_schema: nil): creates a task function.
  • create_llm(project_name:, slug:, prompt_data:, name: nil, description: nil, function_schema: nil): creates an LLM function.

Dev server

Serve your evaluators over HTTP so Braintrust can run them as remote evals, for example from the Playground. Use the Rack app in any Ruby application, or mount the Rails engine in an existing Rails application.

Braintrust::Server::Rack.app

Creates a Rack app that exposes evaluators for remote evals from Braintrust.
eval_server.ru
require "braintrust/eval"
require "braintrust/server"

Braintrust.init(blocking_login: true)

evaluator = Braintrust::Eval::Evaluator.new(
  task: ->(input:) { answer_question(input) },
  scorers: [
    Braintrust::Scorer.new("exact_match") do |expected:, output:|
      output == expected ? 1.0 : 0.0
    end
  ]
)

run Braintrust::Server::Rack.app(
  evaluators: {
    "support-bot" => evaluator
  }
)
Arguments:
  • evaluators (Hash<String, Braintrust::Eval::Evaluator>): evaluators served by slug. Defaults to {}.
  • auth (:clerk_token, :none, or an object): auth strategy. Use :none to disable incoming request auth for local development. Defaults to :clerk_token.

Braintrust::Contrib::Rails::Server::Engine

A Rails Engine you can mount in a Rails application to serve evaluators alongside your app. Run the generator to create an initializer that wires up your evaluators:
bin/rails generate braintrust:server
Mount the engine in your routes:
config/routes.rb
Rails.application.routes.draw do
  mount Braintrust::Contrib::Rails::Server::Engine, at: "/braintrust"
end
The generator writes config/initializers/braintrust_server.rb, where you can review or customize the evaluator mapping and auth strategy.

Configuration

Configure the SDK with environment variables, or pass the equivalent options to Braintrust.init.

Environment variables

  • BRAINTRUST_API_KEY (required): Braintrust API key.
  • BRAINTRUST_API_URL: Braintrust API URL. Defaults to https://api.braintrust.dev.
  • BRAINTRUST_APP_URL: Braintrust app URL. Defaults to https://www.braintrust.dev.
  • BRAINTRUST_AUTO_INSTRUMENT: set to false to disable auto-instrumentation.
  • BRAINTRUST_DEBUG: set to true to enable debug logging.
  • BRAINTRUST_DEFAULT_PROJECT: default project for traced spans.
  • BRAINTRUST_FLUSH_ON_EXIT: set to false to disable automatic span flushing on process exit.
  • BRAINTRUST_INSTRUMENT_EXCEPT: comma-separated list of integrations to skip.
  • BRAINTRUST_INSTRUMENT_ONLY: comma-separated list of integrations to enable, such as openai,anthropic.
  • BRAINTRUST_ORG_NAME: organization name.
  • BRAINTRUST_OTEL_FILTER_AI_SPANS: set to true to export only AI-related spans.
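The three instrumentation variables map naturally onto the values that the auto_instrument option accepts (false, true, {only: [...]}, or {except: [...]}). The following sketch shows one plausible mapping in plain Ruby. Both helpers are hypothetical, and giving the only list precedence when both lists are set is an assumption, not documented SDK behavior.

```ruby
# Hypothetical helper: turns "openai, anthropic" into [:openai, :anthropic].
def parse_integration_list(value)
  value.to_s.split(",").map(&:strip).reject(&:empty?).map(&:to_sym)
end

# Hypothetical helper: derives an auto_instrument-style value from an env-like hash.
def auto_instrument_config(env)
  return false if env["BRAINTRUST_AUTO_INSTRUMENT"] == "false"

  only = parse_integration_list(env["BRAINTRUST_INSTRUMENT_ONLY"])
  except = parse_integration_list(env["BRAINTRUST_INSTRUMENT_EXCEPT"])

  # Assumption: an explicit "only" list wins if both lists are present.
  return {only: only} unless only.empty?
  return {except: except} unless except.empty?

  true
end

p auto_instrument_config({"BRAINTRUST_INSTRUMENT_ONLY" => "openai, anthropic"})
p auto_instrument_config({"BRAINTRUST_AUTO_INSTRUMENT" => "false"}) # prints false
p auto_instrument_config({})                                         # prints true
```

Passing a plain hash instead of ENV keeps the mapping easy to test without changing the process environment.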