Prompt playground

The prompt playground is a tool for exploring, comparing, and evaluating prompts. The playground is deeply integrated with Braintrust, so you can easily try out prompts with data from your datasets.

The playground supports a wide range of models, including the latest models from OpenAI, Anthropic, Mistral, Google, Meta, and more, deployed on first- and third-party infrastructure. You can also configure it to talk to your own model endpoints and custom models, as long as they speak the OpenAI, Anthropic, or Google protocol.

We're constantly working on improving the playground and adding new features. If you have any feedback or feature requests, please reach out to us.

Creating a playground

The playground organizes your work into sessions. A session is a saved and collaborative workspace that includes one or more prompts and is linked to a dataset.

Sharing playgrounds

Playgrounds are designed for collaboration and automatically synchronize in real-time.

To share a playground, simply copy the URL and send it to your collaborators. Your collaborators must be members of your organization to see the session. You can invite users from the settings page.

Playgrounds can also be shared publicly (read-only).

Writing prompts

Each prompt includes a model (e.g. GPT-4 or Claude-2), a prompt string or messages (depending on the model), and an optional set of parameters (e.g. temperature) that control the model's behavior. When you click "Run" (or press Cmd/Ctrl+Enter), all prompts run in parallel and their results stream into the grid below.

Without a dataset

By default, a playground is not linked to a dataset and is self-contained, similar to other playgrounds such as OpenAI's. This mode is useful for exploring and comparing standalone prompts.

With a dataset

The real power of Braintrust comes from linking a playground to a dataset. You can link to an existing dataset or create a new one from the dataset dropdown.

Once you link a dataset, the grid shows one row for each record in the dataset. You can reference the data from each record in your prompt using the input, expected, and metadata variables. The playground uses Mustache syntax for templating, so {{input}} inserts the record's input into the prompt.

Each value can be arbitrarily complex JSON, and you can reach nested fields with dot notation, e.g. {{input.formula}}.

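To make the variable lookup concrete, here is a minimal Python sketch of how a template could be filled from one dataset record. It covers only a small subset of Mustache: plain {{name}} tags and dotted {{a.b}} tags, with no sections, partials, or HTML escaping. The helper names and the choice to render non-string values as JSON are assumptions for illustration. This is not the playground's actual renderer.

```python
import json
import re

# Plain {{name}} or dotted {{a.b.c}} tags only: no sections, partials, or escaping.
TAG = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def lookup(context: dict, path: str):
    """Resolve a dotted path such as 'input.question'; return None if absent."""
    value = context
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def render(template: str, row: dict) -> str:
    """Fill a prompt template from one dataset record (hypothetical helper)."""
    context = {
        "input": row.get("input"),
        "expected": row.get("expected"),
        "metadata": row.get("metadata"),
    }

    def replace(match: re.Match) -> str:
        value = lookup(context, match.group(1))
        if value is None:
            return ""  # Mustache renders missing names as empty text
        # Assumption: non-string values (objects, arrays, numbers) become JSON.
        return value if isinstance(value, str) else json.dumps(value)

    return TAG.sub(replace, template)


row = {
    "input": {"question": "What is 2 + 2?", "tags": ["arithmetic"]},
    "expected": "4",
    "metadata": {"difficulty": "easy"},
}
print(render("Q: {{input.question}} ({{metadata.difficulty}})", row))
```

Running it prints Q: What is 2 + 2? (easy).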

If you want to keep double curly brackets {{ and }} as plain text in your prompts, you can change the delimiter tags to any custom strings. For example, to use <% and %> instead, insert {{=<% %>=}} into the message. All text after it in the same message block then uses the new delimiters. In the example below, {{ number }} stays literal, while <% input.formula %> is filled in from the dataset:

{{=<% %>=}}
Return the number in the following format: {{ number }}

<% input.formula %>
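
The set-delimiter tag is standard Mustache behavior. The following Python sketch shows the mechanics. The renderer scans for the current opening tag. When it meets a tag such as {{=<% %>=}}, it switches delimiters for the rest of the text. This is a simplified illustration with hypothetical helper names, not the playground's implementation. In particular, it only approximates Mustache's standalone-tag rule by dropping the newline that directly follows a set-delimiter tag.

```python
def lookup(context: dict, path: str) -> str:
    """Resolve a dotted path such as 'input.formula'; missing names render as ''."""
    value = context
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return ""
        value = value[key]
    return str(value)


def render_with_delimiters(template: str, context: dict) -> str:
    """Render variable tags, switching delimiters at tags like {{=<% %>=}}."""
    open_tag, close_tag = "{{", "}}"
    out = []
    pos = 0
    while True:
        start = template.find(open_tag, pos)
        end = template.find(close_tag, start + len(open_tag)) if start != -1 else -1
        if end == -1:
            out.append(template[pos:])  # no more complete tags: rest is literal text
            return "".join(out)
        out.append(template[pos:start])
        body = template[start + len(open_tag):end].strip()
        pos = end + len(close_tag)
        if body.startswith("=") and body.endswith("="):
            # Set-delimiter tag: "=<% %>=" switches to "<%" ... "%>" from here on.
            open_tag, close_tag = body[1:-1].split()
            # Simplification: drop the newline after the tag, approximating
            # Mustache's rule that standalone tags do not leave a blank line.
            if template.startswith("\n", pos):
                pos += 1
        else:
            out.append(lookup(context, body))


template = (
    "{{=<% %>=}}\n"
    "Return the number in the following format: {{ number }}\n"
    "\n"
    "<% input.formula %>"
)
print(render_with_delimiters(template, {"input": {"formula": "2 + 2"}}))
```

The output keeps {{ number }} verbatim and replaces <% input.formula %> with 2 + 2.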

Multimodal prompts

You can also add images to your prompts by selecting the image icon in the input field. An image can be provided as a URL, as a base64-encoded string, or through a template variable that contains an image.
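
Because the playground speaks the OpenAI protocol, one way to picture a multimodal prompt is the OpenAI chat format. In that format, a user message's content is a list of text and image_url parts, and base64 images are passed as data URLs. The sketch below builds such messages in Python. The helper names are hypothetical, and it is an assumption that the playground stores images in exactly this shape.

```python
import base64


def image_part(url: str) -> dict:
    """An OpenAI-style image content part for an http(s) or data: URL."""
    return {"type": "image_url", "image_url": {"url": url}}


def image_part_from_bytes(data: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return image_part(f"data:{mime_type};base64,{encoded}")


def user_message(text: str, *images: dict) -> dict:
    """A user message that mixes one text part with any number of image parts."""
    return {"role": "user", "content": [{"type": "text", "text": text}, *images]}


msg = user_message(
    "Describe this image.",
    image_part("https://example.com/cat.png"),  # hypothetical URL
    image_part_from_bytes(b"\x89PNG placeholder bytes"),  # not a real image
)
print(msg["content"][2]["image_url"]["url"][:22])
```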

Prompt code snippets

The playground makes it easy to copy code snippets that you can run through the AI proxy. Select the code icon next to any chat-based prompt to get code snippets in TypeScript, Python, or cURL.

The generated code includes all the prompt configuration, including the model, messages, and any additional parameters you've set.
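
For orientation, a snippet like this boils down to an OpenAI-style chat completions request sent to the proxy. The sketch below builds that request with only the Python standard library, so it runs without an SDK. The base URL https://api.braintrust.dev/v1/proxy, the BRAINTRUST_API_KEY environment variable, and the gpt-4o model name are assumptions for illustration. Prefer the values in the snippet the playground generates for you. The request is only sent when an API key is set.

```python
import json
import os
import urllib.request

# Assumption: the proxy's OpenAI-compatible base URL; copy the real one from
# the snippet the playground generates.
PROXY_BASE_URL = "https://api.braintrust.dev/v1/proxy"


def build_chat_request(api_key: str, model: str, messages: list, **params) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completions request."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        f"{PROXY_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    key = os.environ.get("BRAINTRUST_API_KEY")
    request = build_chat_request(
        key or "<your-api-key>",
        "gpt-4o",  # hypothetical choice; use any model your proxy is configured for
        [{"role": "user", "content": "Hello!"}],
        temperature=0,
    )
    if key:  # only hit the network when a key is configured
        with urllib.request.urlopen(request) as response:
            reply = json.load(response)
            print(reply["choices"][0]["message"]["content"])
    else:
        print("Set BRAINTRUST_API_KEY to send the request to", request.full_url)
```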

Custom models

To configure custom models, see the Custom models section of the proxy docs. Endpoint configurations, like custom models, are automatically picked up by the playground.

Advanced options

Appended dataset messages

You may sometimes have additional messages in a dataset that you want to append to a prompt. This option lets you specify a path to a messages array in the dataset. For example, suppose input is set as the appended messages path and a dataset row has the following input. Every prompt in the playground then runs with these two messages appended after its own messages:

[
  {
    "role": "assistant",
    "content": "Is there anything else I can help you with?"
  },
  {
    "role": "user",
    "content": "Yes, I have another question."
  }
]
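
The merge described above can be sketched in a few lines of Python. The helper names are hypothetical. Support for dotted paths such as metadata.history is an assumption beyond the single-key input example. The sketch also assumes the row's messages are placed after the prompt's own messages, in their original order.

```python
from typing import Optional


def get_path(record: dict, path: str):
    """Follow a dotted path such as 'input' or 'metadata.history' into a row."""
    value = record
    for key in path.split("."):
        value = value[key]  # raises KeyError if the path does not exist
    return value


def build_messages(prompt_messages: list, row: dict, appended_path: Optional[str]) -> list:
    """Return the prompt's own messages followed by the row's messages at appended_path."""
    if not appended_path:
        return list(prompt_messages)
    extra = get_path(row, appended_path)
    if not isinstance(extra, list):
        raise TypeError(f"{appended_path!r} must point to a list of messages")
    return [*prompt_messages, *extra]


prompt_messages = [{"role": "system", "content": "You are a helpful support agent."}]
row = {
    "input": [
        {"role": "assistant", "content": "Is there anything else I can help you with?"},
        {"role": "user", "content": "Yes, I have another question."},
    ]
}
for message in build_messages(prompt_messages, row, "input"):
    print(message["role"], "->", message["content"])
```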

Max concurrency

This option sets the maximum number of tasks and scorers that run concurrently in the playground. Lowering it helps you avoid rate-limit errors (HTTP 429 Too Many Requests) from AI providers.
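
A common way to enforce a cap like this is a semaphore. The Python sketch below uses asyncio.Semaphore so that at most max_concurrency calls are in flight at any time. fake_provider_call is a hypothetical stand-in for a model or scorer request. This shows the general technique, not how the playground schedules work internally.

```python
import asyncio


async def run_all(rows, worker, max_concurrency: int):
    """Run worker(row) for every row, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(row):
        async with semaphore:  # waits here while the limit is reached
            return await worker(row)

    # gather returns results in input order, regardless of completion order
    return await asyncio.gather(*(bounded(row) for row in rows))


in_flight = 0
peak = 0


async def fake_provider_call(row: int) -> int:
    """Stand-in for a provider request; records how many run at once."""
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)
    in_flight -= 1
    return row * 2


results = asyncio.run(run_all(range(10), fake_provider_call, max_concurrency=3))
print(results, "peak concurrency:", peak)
```

With ten rows and a limit of 3, the observed peak is 3.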

Strict variables

When this option is enabled, evaluations will fail if the dataset row does not include all of the variables referenced in prompts.
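
Conceptually, a strict check collects every variable a prompt references and verifies that the row provides each one. The Python sketch below does this for simple {{name}} and {{a.b}} tags. The helper names and the exact matching rules, such as requiring every dotted segment to exist, are assumptions for illustration.

```python
import re

# Matches {{name}} and {{a.b}} style variable tags (a simplified Mustache subset).
VARIABLE = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def referenced_variables(template: str) -> set:
    """All variable paths referenced by the template."""
    return set(VARIABLE.findall(template))


def has_path(row: dict, path: str) -> bool:
    """True if every segment of the dotted path exists in the row."""
    value = row
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return False
        value = value[key]
    return True


def check_strict(template: str, row: dict) -> None:
    """Raise if the row lacks any variable the template references."""
    missing = sorted(v for v in referenced_variables(template) if not has_path(row, v))
    if missing:
        raise ValueError(f"dataset row is missing variables: {', '.join(missing)}")


template = "Answer {{input.question}} using {{metadata.source}}."
check_strict(template, {"input": {"question": "Q1"}, "metadata": {"source": "docs"}})  # passes
try:
    check_strict(template, {"input": {"question": "Q2"}, "metadata": {}})
except ValueError as err:
    print(err)  # dataset row is missing variables: metadata.source
```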
