Evaluate via SDK

This guide walks through the steps to set up and run an experiment using the Braintrust SDK. Wrappers are available for TypeScript, Python, and other languages.

Install Braintrust libraries

Install the Braintrust SDK for your language. For Python, install the braintrust and autoevals packages with pip. For TypeScript, use either npm or yarn:

npm install braintrust autoevals

yarn add braintrust autoevals

Node.js version 18 or later is required.

Create a simple evaluation script

The eval framework allows you to declaratively define evaluations in your code. Inspired by tools like Jest, you can define a set of evaluations in files named *.eval.ts or *.eval.js (Node.js) or eval_*.py (Python).

Create a file named tutorial.eval.ts or eval_tutorial.py with the following code.

import { Eval } from "braintrust";
import { Levenshtein } from "autoevals";
 
Eval(
  "Say Hi Bot", // Replace with your project name
  {
    data: () => {
      return [
        {
          input: "Foo",
          expected: "Hi Foo",
        },
        {
          input: "Bar",
          expected: "Hello Bar",
        },
      ]; // Replace with your eval dataset
    },
    task: async (input) => {
      return "Hi " + input; // Replace with your LLM call
    },
    scores: [Levenshtein],
  },
);

This script sets up the basic scaffolding of an evaluation:

data is an array or iterator of data you'll evaluate
task is a function that takes in an input and returns an output
scores is an array of scoring functions that will be used to score the task's output
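
To make the role of a scoring function more concrete, here is a minimal sketch of a hand-written scorer applied to the same data and task as the tutorial script. It assumes a scorer is a function that receives the input, output, and expected value and returns a named score between 0 and 1. The ScorerArgs and Score types and the exactMatch function are hypothetical names written for this illustration and are not imported from braintrust or autoevals; check the SDK reference for the exact shape Eval() accepts.

```typescript
// Hypothetical shapes for illustration; not imported from the braintrust package.
interface ScorerArgs {
  input: string;
  output: string;
  expected?: string;
}

interface Score {
  name: string;
  score: number; // between 0 and 1
}

// Returns 1 when the output matches the expected value exactly, otherwise 0.
function exactMatch({ output, expected }: ScorerArgs): Score {
  return { name: "ExactMatch", score: output === expected ? 1 : 0 };
}

// Same task as the tutorial script: prefix the input with "Hi ".
const task = (input: string): string => "Hi " + input;

// Same two data points as the tutorial script.
const rows = [
  { input: "Foo", expected: "Hi Foo" },
  { input: "Bar", expected: "Hello Bar" },
];

// Run the task on each row and score its output against the expected value.
const results = rows.map((row) =>
  exactMatch({ input: row.input, output: task(row.input), expected: row.expected }),
);
console.log(results.map((r) => r.score));
```

An exact-match scorer is all-or-nothing, whereas a string-distance scorer such as Levenshtein gives partial credit for outputs that are close to the expected value.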

In addition to adding each data point inline when you call the Eval() function, you can also pass an existing or new dataset directly.

(You can also write your own code. Make sure to follow the naming conventions for your language. TypeScript files should be named *.eval.ts and Python files should be named eval_*.py.)

Create an API key

Next, create an API key to authenticate your evaluation script. You can create an API key in the settings page.

Run this command to add your API key to your environment:

export BRAINTRUST_API_KEY="YOUR_API_KEY"

Run your evaluation script

Run your evaluation script with the following command:

npx braintrust eval tutorial.eval.ts

This will create an experiment in Braintrust. Once the command runs, you'll see a link to your experiment.

Tip: To test your evaluation locally without sending results to Braintrust, add the --no-send-logs flag:

npx braintrust eval --no-send-logs tutorial.eval.ts

View your results

Congrats, you just ran an eval! You should see a dashboard like this when you load your experiment. This view is called the experiment view, and as you use Braintrust, we hope it becomes your trusty companion each time you change your code and want to run an eval.

The experiment view allows you to look at high level metrics for performance, dig into individual examples, and compare your LLM app's performance over time.

[Screenshot: First eval]

Run another experiment

After running your first evaluation, you'll see that the experiment achieved a score of 77.8%. Can you adjust the evaluation to improve this score? Make your changes and re-run the evaluation to track your progress.
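
To see where 77.8% comes from, the sketch below recomputes the score by hand. It assumes the Levenshtein scorer reports 1 minus the edit distance divided by the length of the longer string, which is consistent with the figure above. The editDistance and normalizedScore functions are written for this illustration and are not the autoevals implementation.

```typescript
// Classic dynamic-programming edit distance: insert, delete, and substitute each cost 1.
function editDistance(a: string, b: string): number {
  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumption: similarity = 1 - distance / length of the longer string.
function normalizedScore(output: string, expected: string): number {
  const longest = Math.max(output.length, expected.length);
  return longest === 0 ? 1 : 1 - editDistance(output, expected) / longest;
}

// Outputs of the tutorial task ("Hi " + input) next to the expected values.
const cases = [
  { output: "Hi Foo", expected: "Hi Foo" },
  { output: "Hi Bar", expected: "Hello Bar" },
];

const scores = cases.map((c) => normalizedScore(c.output, c.expected));
const average = scores.reduce((sum, s) => sum + s, 0) / scores.length;
console.log((average * 100).toFixed(1) + "%"); // prints "77.8%"
```

The first case scores 1. The second is 4 edits away from a 9-character expected string, so it scores 1 - 4/9, or about 0.556, and the two average to about 0.778. Under this assumption, the second data point is the one holding the score down.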

[Screenshot: Second eval]

Next steps

Dig into our evals guide to learn more about how to run evals.
Look at our cookbook to learn how to evaluate RAG, summarization, text-to-sql, and other popular use cases.
Learn how to log traces to Braintrust.
Read about Braintrust's platform and architecture.
