You can customize how you trace to better understand how your application runs and make it easier to find and fix problems. By adjusting how you collect and manage trace data, you can better track complex processes, monitor systems that work across multiple services, and debug issues more effectively.
You can add traces for multiple, specific functions in your code to your logs by annotating them with functional wrappers (TypeScript) or decorators and context managers (Python):
When using wrapOpenAI/wrap_openai, you technically do not need to use traced or start_span. In fact, just
initializing a logger is enough to start logging LLM calls. If you use traced or start_span, you will create more
detailed traces that include the functions surrounding the LLM calls and can group multiple LLM calls together.
wrap_openai/wrapOpenAI will automatically log metrics like prompt_tokens, completion_tokens, and total_tokens for
streaming LLM calls if the LLM API returns them. OpenAI only returns these metrics if you set include_usage to true in
the stream_options parameter.
If you're using your own client, you can wrap it yourself using the same conventions
as the OpenAI wrapper. Feel free to check out the Python
and TypeScript implementations for reference.
To track the span as an LLM, you must:
Specify the type as llm. You can specify any name you'd like. This enables LLM duration metrics.
Add prompt_tokens, completion_tokens, and total_tokens to the metrics field. This enables LLM token usage metrics.
Format the input as a list of messages (using the OpenAI format), and put other parameters (like model) in metadata. This enables the "Try prompt" button in the UI.
In addition to text and structured data, Braintrust also supports uploading file
attachments (blobs). This is especially useful when working with multimodal
models, which can require logging large image, audio, or video files. You can
also use attachments to log other unstructured data related to your LLM usage,
such as a user-provided PDF file that your application later transforms into an
LLM input.
To upload an attachment, create a new Attachment object to represent the file
on disk or binary data in memory to be uploaded. You can place Attachment
objects anywhere in the event to be logged, including in arrays/lists or deeply
nested in objects. See the TypeScript or Python SDK
reference for usage details.
The SDK uploads the attachments separately from other parts of the log, so the
presence of attachments doesn't affect non-attachment logging latency.
Image, audio, video, and PDF attachments can be previewed in Braintrust. All
attachments can be downloaded for viewing locally.
To log an external image, provide an image URL or base64 encoded image as a
string. The tree viewer will automatically render the image.
The tree viewer will look at the URL or string to determine if it is an image. If you want to force the
viewer to treat it as an image, then nest it in an object like
and the viewer will render it as an image. Base64 images must be rendered in URL format, just like the OpenAI API.
For example:
Often, you want to trace functions that are deep in the call stack, without
having to propagate the span object throughout. Braintrust uses async-friendly
context variables to make this workflow easy:
The traced function/decorator will create a span underneath the
currently-active span.
The currentSpan() / current_span() method returns the currently active
span, in case you need to do additional logging.
Sometimes it's useful to be able to start a trace in one process and continue it
in a different one. For this purpose, Braintrust provides an export function
which returns an opaque string identifier. This identifier can be passed to
start_span to resume the trace elsewhere. Consider the following example of
tracing across separate client and server processes.
Similar to distributed tracing, it can be useful to update spans after you initially log them.
For example, if you collect the output of a span asynchronously.
The Experiment and Logger classes each have an updateSpan() method, which you can call with
the span's id to perform an update:
You can also use span.export() to export the span in a fully contained string, which is useful if you
have multiple loggers or perform the update from a different service.
It's important to make sure the update happens after the original span has been logged, otherwise
they can trample on each other.
Distributed tracing is designed specifically to prevent this edge case, and instead works by logging
a new (sub) span.
The Span.permalink method formats a permalink to the Braintrust application
for viewing the span. The link will open the UI to the row represented by the
Span object.
If you do not have access to the original Span object, the slug produced by
Span.export contains enough information to produce the same permalink. The
braintrust.permalink function can be used to construct a deep link to the row
in the UI from a given span slug.
In more complicated environments, it may not always be possible to wrap the
entire duration of a span within a single block of code. In such cases, you can
always pass spans around manually.
Consider this hypothetical server handler, which logs to a span incrementally
over several distinct callbacks:
Although the built-in span viewers cover a variety of different span field display types— YAML, JSON, Markdown, LLM calls, and more—you may
want to further customize the display of your span data. For example, you could include the id of an internal database
and want to fetch and display its contents in the span viewer. Or, you may want to reformat the data in the span in a way
that's more useful for your use case than the built-in options.
To support these use cases, Braintrust supports custom span iframe viewers. To enable a span iframe, visit the Configuration
tab of a project, and create one. You can define the URL, and then customize its behavior:
Provide a title, which is displayed at the top of the section.
Provide, via mustache, template parameters to the URL. These parameters are
in terms of the top-level span fields, e.g. {{input}}, {{output}}, {{expected}}, etc. or their subfields, e.g.
{{input.question}}.
Allow Braintrust to send a message to the iframe with the span data, which is useful when the data may be very large and
not fit in a URL.
In this example, the "Table" section is a custom span iframe.
To help you get started, check out the braintrustdata/braintrust-viewers
repository on Github, which contains example code for rendering a table, X/Tweet, and more.
Spans are processed in Braintrust as a simple format, consisting of input, output, expected, metadata, scores,
and metrics fields (all optional), as well as a few system-defined fields which you usually do not need to mess with, but
are described below for completeness. This simple format makes
it easy to import spans captured in other systems (e.g. languages other than TypeScript/Python), or to export spans from
Braintrust to consume in other systems.
The underlying span format contains a number of fields which are not exposed directly through the SDK, but are useful to
understand when importing/exporting spans.
id is a unique identifier for the span, within the container (e.g. an experiment, or logs for a project). You can technically
set this field yourself (to overwrite a span), but it is recommended to let Braintrust generate it automatically.
input, output, expected, scores, metadata, and metrics are optional fields which describe the span and are exposed in the
Braintrust UI. When you use the TypeScript or Python SDK, these fields are validated for you (e.g. scores must be a mapping from strings
to numbers between 0 and 1).
span_attributes contains attributes about the span. Currently the recognized attributes are name, which is
used to display the span name in the UI, and type, which displays a helpful icon. type should be one of "llm", "score", "function",
"eval", "task", or "tool".
Depending on the container, e.g. an experiment, or project logs, or a dataset, fields like project_id, experiment_id, dataset_id, and
log_id are set automatically, by the SDK, so the span can be later retrieved by the UI and API. You should not set these fields yourself.
span_id, root_span_id, and span_parents are used to construct the span tree and are automatically set by Braintrust. You should not
set these fields yourself, but rather let the SDK create and manage them (even if importing from another system).
When importing spans, the only fields you should need to think about are input, output, expected, scores, metadata, and metrics.
You can use the SDK to populate the remaining fields, which the next section covers with an example.
Here is an example of a span in the underlying format:
The following example walks through how to generate spans in one program and then import them to Braintrust
in a script. You can use this pattern to support tracing or running experiments in environments that use programming
languages other than TypeScript/Python (e.g. Kotlin, Java, Go, Ruby, Rust, C++), or codebases that cannot integrate the
Braintrust SDK directly.
The following example runs a simple LLM app and collects logging information at each stage of the process, without using
the Braintrust SDK. This could be implemented in any programming language, and you certainly do not need to collect or process
information this way. All that matters is that your program generates a useful format that you can later parse and use to import
the spans using the SDK.
The following program uses the Braintrust SDK in Python to import the spans generated by the previous script. Again, you can
modify this program to fit the needs of your environment, e.g. to import spans from a different source or format.
The SDK includes several tuning knobs that may prove useful for debugging.
BRAINTRUST_SYNC_FLUSH: By default, the SDKs will log to the backend API in
the background, asynchronously. Logging is automatically batched and retried
upon encountering network errors. If you wish to have fine-grained control over
when logs are flushed to the backend, you may set BRAINTRUST_SYNC_FLUSH=1.
When true, flushing will only occur when you run Experiment.flush (or any of
the other object flush methods). If the flush fails, the SDK will raise an
exception which you can handle.
BRAINTRUST_MAX_REQUEST_SIZE: The SDK logger batches requests to save on
network roundtrips. The batch size is tuned for the AWS lambda gateway, but you
may adjust this if your backend has a different max payload requirement.
BRAINTRUST_DEFAULT_BATCH_SIZE: The maximum number of individual log messages
that are sent to the network in one payload.
BRAINTRUST_NUM_RETRIES: The number of times the logger will attempt to retry
network requests before failing.
BRAINTRUST_QUEUE_SIZE (Python only): The maximum number of elements in the
logging queue. This value limits the memory usage of the logger. Logging
additional elements beyond this size will block the calling thread. You may set
the queue size to unlimited (and thus non-blocking) by passing
BRAINTRUST_QUEUE_SIZE=0.
BRAINTRUST_QUEUE_DROP_WHEN_FULL (Python only): Useful in conjunction with
BRAINTRUST_QUEUE_SIZE. Change the behavior of the queue from blocking when it
reaches its max size to dropping excess elements. This can be useful for
guaranteeing non-blocking execution, at the cost of possibly dropping data.
BRAINTRUST_QUEUE_DROP_EXCEEDING_MAXSIZE (Javascript only): Essentially a
combination of BRAINTRUST_QUEUE_SIZE and BRAINTRUST_QUEUE_DROP_WHEN_FULL,
which changes the behavior of the queue from storing an unlimited number of
elements to capping out at the specified value. Additional elements are
discarded.
BRAINTRUST_FAILED_PUBLISH_PAYLOADS_DIR: Sometimes errors occur when writing
records to the backend. To aid in debugging errors, you may set this
environment variable to a directory of choice, and Braintrust will save any
payloads it failed to publish to this directory.
BRAINTRUST_ALL_PUBLISH_PAYLOADS_DIR: Analogous to
BRAINTRUST_FAILED_PUBLISH_PAYLOADS_DIR, except that Braintrust will save all
payloads to this directory.
If you are not running an eval or logging, then the tracing code will be a no-op with negligible performance overhead. In other words, if you do not call initLogger/init_logger/init, in your code, then the tracing annotations are a no-op.
A trace is a directed acyclic graph (DAG) of spans. Each span can have multiple parents, but most
executions are a tree of spans. Currently, the UI only supports displaying a single root span, due to
the popularity of this pattern.
If the Braintrust SDK cannot log for some reason (e.g. a network issue), then your application should
not be affected. All logging operations run in a background thread, including api key validation,
project/experiment registration, and flushing logs.
When errors occur, the SDK retries a few times before eventually giving up. You'll see loud warning messages
when this occurs. And you can tune this behavior via the environment variables defined in Tuning parameters.