POST
/v1/eval

Launch an eval

Launch an evaluation. This is the API equivalent of the Eval function built into the Braintrust SDK. In the Eval API, you provide pointers to a dataset, a task function, and scoring functions. The API then runs the evaluation, creates an experiment, and returns the results along with a link to the experiment. To learn more about evals, see the Evals guide.

Authorization

Authorization
Required
Bearer <token>

Most Braintrust endpoints are authenticated by providing your API key in an Authorization: Bearer [api_key] header on your HTTP request. You can create an API key on the Braintrust organization settings page.

In: header
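As a sketch, the header can be attached in Python with the standard library (the environment variable name `BRAINTRUST_API_KEY` is an assumption; store your key however you prefer):

```python
import os
import urllib.request

# BRAINTRUST_API_KEY is an assumed env-var name; substitute your own key source.
api_key = os.environ.get("BRAINTRUST_API_KEY", "sk-placeholder")

# Build the request with the Bearer token header; nothing is sent yet.
req = urllib.request.Request(
    "https://api.braintrust.dev/v1/eval",
    method="POST",
    headers={"Authorization": f"Bearer {api_key}"},
)
```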


Request Body

Eval launch parameters

project_id
Required
string

Unique identifier for the project to run the eval in

data
Required
object (any of 2 variants)

The dataset to use

task
Required
object (any of 5 variants)

The function to evaluate

scores
Required
array<object> (items: any of 5 variants)

The functions to score the eval on

experiment_name
string

An optional name for the experiment created by this eval. If it conflicts with an existing experiment, it will be suffixed with a unique identifier.

metadata
object

Optional experiment-level metadata to store about the evaluation. You can later use this to slice & dice across experiments.

stream
boolean

Whether to stream the results of the eval. If true, the request will return two events: one to indicate the experiment has started, and another upon completion. If false, the request will return the evaluation's summary upon completion.
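Putting the fields above together, a minimal request body might be built like this in Python (all IDs are placeholders, and serializing with the standard `json` module is just one option; the curl example below sends an equivalent payload inline):

```python
import json

# Placeholder IDs -- substitute real identifiers from your Braintrust project.
payload = {
    "project_id": "p-123",                 # required
    "data": {"dataset_id": "d-456"},       # required: the dataset to use
    "task": {"function_id": "f-789"},      # required: the function to evaluate
    "scores": [{"function_id": "f-abc"}],  # required: scoring functions
    "experiment_name": "nightly-eval",     # optional
    "stream": False,                       # return the summary on completion
}

body = json.dumps(payload)
```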

Status code	Description
200	Eval launch response
curl -X POST "https://api.braintrust.dev/v1/eval" \
  -H "Authorization: Bearer $BRAINTRUST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "project_id": "string",
  "data": {
    "dataset_id": "string"
  },
  "task": {
    "function_id": "string",
    "version": "string"
  },
  "scores": [
    {
      "function_id": "string",
      "version": "string"
    }
  ],
  "experiment_name": "string",
  "metadata": {
    "property1": null,
    "property2": null
  },
  "stream": true
}'

Summary of an experiment

{
  "project_name": "string",
  "experiment_name": "string",
  "project_url": "http://example.com",
  "experiment_url": "http://example.com",
  "comparison_experiment_name": "string",
  "scores": {
    "property1": {
      "name": "string",
      "score": 1,
      "diff": -1,
      "improvements": 0,
      "regressions": 0
    },
    "property2": {
      "name": "string",
      "score": 1,
      "diff": -1,
      "improvements": 0,
      "regressions": 0
    }
  },
  "metrics": {
    "property1": {
      "name": "string",
      "metric": 0,
      "unit": "string",
      "diff": 0,
      "improvements": 0,
      "regressions": 0
    },
    "property2": {
      "name": "string",
      "metric": 0,
      "unit": "string",
      "diff": 0,
      "improvements": 0,
      "regressions": 0
    }
  }
}
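The summary above can be consumed programmatically. A sketch (the field names come from the response schema above; the helper name and example values are illustrative):

```python
def regressed_scores(summary: dict) -> list[str]:
    """Names of scores whose diff vs. the comparison experiment is negative."""
    return [
        s["name"]
        for s in summary.get("scores", {}).values()
        if s.get("diff") is not None and s["diff"] < 0
    ]

# Example shaped like the response above (values are illustrative).
example = {
    "scores": {
        "factuality": {"name": "factuality", "score": 0.9, "diff": -0.05},
        "closeness": {"name": "closeness", "score": 1.0, "diff": 0.1},
    }
}

declined = regressed_scores(example)
```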
