Tools

Tool functions in Braintrust let you define general-purpose code that LLMs can invoke to add complex logic or external operations to your workflows. Tools are reusable and composable, making it easy to iterate on assistant-style agents and more advanced applications. You can create tools in TypeScript or Python and call them from prompts across the UI and API.

Creating a tool

Currently, you must define tools in code and push them to Braintrust with braintrust push. To define a tool, call project.tools.create and give it a name and a unique slug:

import * as braintrust from "braintrust";
import { z } from "zod";
 
const project = braintrust.projects.create({ name: "calculator" });
 
project.tools.create({
  handler: ({ op, a, b }) => {
    switch (op) {
      case "add":
        return a + b;
      case "subtract":
        return a - b;
      case "multiply":
        return a * b;
      case "divide":
        return a / b;
    }
  },
  name: "Calculator method",
  slug: "calculator",
  description:
    "A simple calculator that can add, subtract, multiply, and divide.",
  parameters: z.object({
    op: z.enum(["add", "subtract", "multiply", "divide"]),
    a: z.number(),
    b: z.number(),
  }),
  returns: z.number(),
  ifExists: "replace",
});

Pushing to Braintrust

Once you define a tool, you can push it to Braintrust with braintrust push:

npx braintrust push calculator.ts

Dependencies

Braintrust will take care of bundling the dependencies your tool needs.

In TypeScript, we use esbuild to bundle your code and its dependencies together. This works for most dependencies, but it does not support native (compiled) libraries like SQLite.
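For example, a tool can import a third-party package and braintrust push will bundle it for you. Here is a minimal sketch; the date-fns dependency, project name, and slug are purely illustrative, and the package must be installed in your local node_modules:

import * as braintrust from "braintrust";
import { z } from "zod";
// date-fns is pure JavaScript, so esbuild can bundle it at push time
import { formatDistanceToNow } from "date-fns";
 
const project = braintrust.projects.create({ name: "utils" });
 
project.tools.create({
  handler: ({ isoDate }) => formatDistanceToNow(new Date(isoDate)),
  name: "Relative time",
  slug: "relative-time",
  description: "Convert an ISO timestamp into a human-readable relative time.",
  parameters: z.object({ isoDate: z.string() }),
  returns: z.string(),
  ifExists: "replace",
});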

If you have trouble bundling your dependencies, let us know by filing an issue.

Testing it out

If you visit the project in the UI, you'll see the tool listed on the Tools page in the Library:

Tool in UI

Using tools

Once you define a tool in Braintrust, you can access it through the UI and API. However, the real advantage lies in calling a tool from an LLM. Most models support tool calling, which allows them to select a tool from a list of available options. Normally, it's up to you to execute the tool, retrieve its results, and re-run the model with the updated context.

Braintrust simplifies this process dramatically by:

  • Automatically passing the tool's definition to the model
  • Running the tool securely in a sandbox environment when called
  • Re-running the model with the tool's output
  • Streaming the whole output along with intermediate progress to the client

Let's walk through an example.

Defining a GitHub tool

Let's define a tool that looks up information about the most recent commit in a GitHub repository:

import * as braintrust from "braintrust";
import { z } from "zod";
 
const project = braintrust.projects.create({ name: "github" });
 
project.tools.create({
  handler: async ({ org, repo }: { org: string; repo: string }) => {
    const url = `https://api.github.com/repos/${org}/${repo}/commits?per_page=1`;
    const response = await fetch(url);
 
    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }
 
    const data = await response.json();
 
    if (data.length > 0) {
      return data[0];
    } else {
      return null;
    }
  },
  name: "Get latest commit",
  slug: "get-latest-commit",
  description: "Get the latest commit in a repository",
  parameters: z.object({
    org: z.string(),
    repo: z.string(),
  }),
  ifExists: "replace",
});

If you save this file locally as github.ts (or a Python equivalent as github.py), you can run

npx braintrust push github.ts

to push the function to Braintrust. Once the command completes, you should see the function listed in the Library's Tools tab.

Tool code in library
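Because tools are just functions, you can also exercise one directly from your code with the invoke API. A minimal sketch, assuming the project and slug defined above, a configured BRAINTRUST_API_KEY, and that the org and repo values are just example inputs:

import { invoke } from "braintrust";
 
async function main() {
  // Runs the tool's handler in Braintrust's sandbox and returns its result.
  const commit = await invoke({
    projectName: "github",
    slug: "get-latest-commit",
    input: { org: "braintrustdata", repo: "braintrust-sdk" },
  });
  console.log(commit);
}
 
main();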

To use a tool, simply select it in the Tools dropdown in your Prompt window. Braintrust will automatically:

  • Include it in the list of available tools to the model
  • Invoke the tool if the model calls it, and append the result to the message history
  • Call the model again with the tool's result as context
  • Continue this loop for up to 5 iterations (by default) or until the model produces a result that is not a tool call

Invoke github tool

Connecting tools to prompts in code

You can also attach a tool to a prompt defined in code. For example:

import * as braintrust from "braintrust";
import { z } from "zod";
 
const project = braintrust.projects.create({ name: "github" });
 
const latestCommit = project.tools.create({
  handler: async ({ org, repo }: { org: string; repo: string }) => {
    const url = `https://api.github.com/repos/${org}/${repo}/commits?per_page=1`;
    const response = await fetch(url);
 
    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }
 
    const data = await response.json();
 
    if (data.length > 0) {
      return data[0];
    } else {
      return null;
    }
  },
  name: "Get latest commit",
  slug: "get-latest-commit",
  description: "Get the latest commit in a repository",
  parameters: z.object({
    org: z.string(),
    repo: z.string(),
  }),
});
 
project.prompts.create({
  model: "gpt-4o-mini",
  name: "Commit bot",
  slug: "commit-bot",
  messages: [
    {
      role: "system",
      content: "You are a helpful assistant that can help with GitHub.",
    },
    {
      role: "user",
      content: "{{{question}}}",
    },
  ],
  tools: [latestCommit],
});

If you run braintrust push on this file, Braintrust will push both the tool and the prompt.
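After pushing, you can call the prompt from your application with the invoke API; the tool-calling loop described above runs on the Braintrust side. A minimal sketch, assuming the project and slug defined above and a configured BRAINTRUST_API_KEY:

import { invoke } from "braintrust";
 
async function main() {
  // The {{{question}}} variable in the prompt is filled from `input`.
  const answer = await invoke({
    projectName: "github",
    slug: "commit-bot",
    input: { question: "What is the latest commit in braintrustdata/braintrust-sdk?" },
  });
  console.log(answer);
}
 
main();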

Structured outputs

Another use case for tool calling is coercing a model into producing structured outputs that match a given JSON schema. You can do this without creating a tool function by using the Raw tab in the Tools dropdown instead.

Enter an array of tool definitions following the OpenAI tool format:

Raw tools
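For reference, a raw tool definition is a JSON array in the OpenAI function-tool shape. The entry below is purely illustrative; the name and schema are made up:

[
  {
    "type": "function",
    "function": {
      "name": "extract_commit_info",
      "description": "Extract structured information about a commit message",
      "parameters": {
        "type": "object",
        "properties": {
          "author": { "type": "string" },
          "message": { "type": "string" }
        },
        "required": ["author", "message"]
      }
    }
  }
]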

By default, if a tool is called, Braintrust will return the arguments of the first tool call as a JSON object. If you use the invoke API, you'll receive a JSON object as the result.

Invoke raw tool

If you set the mode to parallel, then instead of the first tool call's arguments, you'll receive an array of all tool calls, including both function names and arguments.
