Summary
Issue: The Name column in experiment results always shows "eval" and cannot be overridden through Eval() or EvalCase parameters.
Cause: The SDK hardcodes name="eval" when creating the root span for each row in Eval().
Resolution: Use tags to differentiate rows, or apply a monkey-patch workaround for full Name column control.
Resolution steps
Option 1: Add tags to each EvalCase
Add a tags field to each EvalCase to identify rows without changing the Name column:
from braintrust import EvalCase

EvalCase(
    input="my input",
    expected="my output",
    tags=["my-custom-label"],
)
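For context, here is a fuller sketch of this option using a toy dataset; the project name "my-project" and the tag values are illustrative, not required names:
def data():
    # Tag values are plain labels chosen for this example.
    return [
        EvalCase(input="What is 2+2?", expected="4", tags=["arithmetic"]),
        EvalCase(input="What color is the sky?", expected="Blue", tags=["trivia"]),
    ]

def task(input, hooks):
    answers = {"What is 2+2?": "4", "What color is the sky?": "Blue"}
    return answers.get(input, "I don't know")

def exact_match(input, output, expected):
    return output.strip().lower() == expected.strip().lower()

Eval(
    "my-project",  # illustrative project name
    data=data,
    task=task,
    scores=[exact_match],
)
Each row still shows the default "eval" name, but its tags are attached to the row and can be used to identify and filter rows.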
Option 2: Monkey-patch start_span (unsupported)
This overrides the hardcoded name="eval" per row. Use with caution — this is not officially supported and may break with SDK updates.
"""
Demonstrates a monkey-patch to set per-row span names in the Braintrust
eval framework, so the "Name" column in the experiment UI shows a
different value for each row instead of the hardcoded "eval".
The key insight: when the framework calls experiment.start_span(),
it passes the row's `input` and `expected` in the kwargs, so we can
derive a name from the data itself.
"""
import braintrust.framework as _fw

_orig_impl = _fw._run_evaluator_internal_impl
_name_fn = None

def set_eval_name_fn(fn):
    """Register a function that receives (input, expected) and returns a span name."""
    global _name_fn
    _name_fn = fn

async def _patched_impl(experiment, evaluator, *args, **kwargs):
    if experiment is not None:
        _orig_start = experiment.start_span

        def _patched_start(*a, **kw):
            if kw.get("name") == "eval" and _name_fn is not None:
                kw["name"] = "Custom name: " + _name_fn(
                    kw.get("input"),
                    kw.get("expected"),
                )
            return _orig_start(*a, **kw)

        experiment.start_span = _patched_start
    return await _orig_impl(experiment, evaluator, *args, **kwargs)

_fw._run_evaluator_internal_impl = _patched_impl
# ── Eval definition ──────────────────────────────────────────────────
from braintrust import Eval

def data():
    return [
        {"input": "What is 2+2?", "expected": "4"},
        {"input": "What is the capital of France?", "expected": "Paris"},
        {"input": "What color is the sky?", "expected": "Blue"},
    ]

def task(input, hooks):
    answers = {
        "What is 2+2?": "4",
        "What is the capital of France?": "Paris",
        "What color is the sky?": "Blue",
    }
    return answers.get(input, "I don't know")

def exact_match(input, output, expected):
    return output.strip().lower() == expected.strip().lower()

set_eval_name_fn(lambda input, expected: input[:40])

Eval(
    "pedro-project1",
    data=data,
    task=task,
    scores=[exact_match],
    experiment_name="per-row-name-demo",
)
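With the patch in place, each row's Name column should read "Custom name: " followed by the first 40 characters of that row's input (for example, "Custom name: What is 2+2?").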
Notes
- Native experiment row name customization is not supported as of this writing.
- The metadata field on EvalCase is another option for per-row identification if tags do not meet your needs, as sketched below.
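A minimal sketch of the metadata approach; the key "row_label" is a hypothetical choice, since metadata accepts an arbitrary dictionary:
from braintrust import EvalCase

EvalCase(
    input="my input",
    expected="my output",
    metadata={"row_label": "my-custom-label"},  # "row_label" is an illustrative key
)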