
Copilot autocomplete in the Braintrust UI

Ankur Goyal
5 September 2024


I spend about half my time in an IDE (Cursor, nowadays) and have grown used to having autocomplete constantly fill things in for me: code, comments, tests, and more. However, I've missed this experience while authoring prompts and editing data in Braintrust, and I'm not the only one.

I'm excited to reveal our latest feature: Copilot autocomplete in the Braintrust UI! Now, text editors throughout the Braintrust UI are souped up with an autocomplete that should feel familiar if you're coming from Cursor, GitHub Copilot, or similar tools. Let's dive into how to use it, and a bit about how it's implemented behind the scenes.

Dataset autocomplete

Try it out

Copilot autocomplete is on for everyone who has configured at least one AI provider key, and it uses your API keys (not ours). By default, it uses gpt-4o-2024-08-06 as the model, but you can pick any of the models supported in Braintrust, including Claude 3.5 Sonnet, Llama 3.1, Gemini, Mistral, and many others. To configure it, visit your profile settings, where you can select a different model or disable it.

How to use

The fastest way to try it out is to write a prompt.

Prompt autocomplete

Copilot intelligently stitches together context from other data fields to tailor completions. For example, I like to use it to write few-shot examples for prompts. Notice how it already knows that the examples have language and sql fields.

Autocomplete in the data editor

How it works

Behind the scenes, our UI now traces which rows and prompts you encounter into a CopilotContext. Every text editor is also able to generate a CellContext which includes information about the editor itself (for example, that it's the expected field in a dataset named “Golden data”) and a prefix/suffix surrounding your cursor. Using some heuristics (or the manual option-\ keyboard shortcut), we determine when it's time to generate a completion, and compile the CellContext and CopilotContext into a series of prioritized messages an LLM can use to generate the optimal completion.
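The compilation step can be sketched roughly like this. This is a minimal TypeScript sketch under stated assumptions — the CellContext, CopilotContext, ChatMessage, and buildMessages shapes below are illustrative, not our actual internal types:

```typescript
// Hypothetical shapes -- the real internal types are not public.
interface CellContext {
  fieldName: string;   // e.g. "expected"
  datasetName: string; // e.g. "Golden data"
  prefix: string;      // text before the cursor
  suffix: string;      // text after the cursor
}

interface CopilotContext {
  recentRows: Record<string, unknown>[]; // rows the user has encountered
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Compile both contexts into a prioritized message list: lower-priority
// background (recent rows) first, the text surrounding the cursor last,
// so the most relevant context sits closest to the completion request.
function buildMessages(cell: CellContext, copilot: CopilotContext): ChatMessage[] {
  const messages: ChatMessage[] = [
    {
      role: "system",
      content:
        `You are completing the "${cell.fieldName}" field of the ` +
        `dataset "${cell.datasetName}". Continue the text at <cursor>.`,
    },
  ];
  // Background context: a few recently viewed rows, for field names and style.
  for (const row of copilot.recentRows.slice(0, 3)) {
    messages.push({ role: "user", content: `Example row: ${JSON.stringify(row)}` });
  }
  // Highest priority: the text surrounding the cursor.
  messages.push({ role: "user", content: `${cell.prefix}<cursor>${cell.suffix}` });
  return messages;
}
```

Ordering matters here: putting the cursor context last keeps it adjacent to where the model begins generating, which generally improves completion relevance.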

Evals

It wouldn't be a Braintrust blog post without some evals! We captured completion data from our staging account and ran an initial set of evals (80 records, trial count 3). They showed that despite being half the cost of gpt-4o, gpt-4o-2024-08-06 performs almost as well as gpt-4o and claude-3.5-sonnet, and is marginally faster. That's how we picked it as our default model. With additional data and customer feedback, we'll be able to run larger-scale evals, and even few-shot or fine-tune smaller models like gpt-4o-mini to be competitive.

Eval results
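To make the setup concrete, here is a TypeScript sketch of scoring captured completions. The CompletionRecord shape and prefixMatchScore scorer are hypothetical illustrations, not the scorers we actually used; in practice you would plug a scorer like this into Braintrust's Eval framework rather than averaging by hand:

```typescript
// Hypothetical record shape for captured completion data.
interface CompletionRecord {
  input: string;    // text before the cursor when the completion fired
  expected: string; // the completion the user actually accepted
}

// Illustrative scorer: fraction of the reference completion the model
// reproduces before the first divergence (1.0 = exact match).
function prefixMatchScore(output: string, expected: string): number {
  if (expected.length === 0) return 1;
  const n = Math.min(output.length, expected.length);
  let i = 0;
  while (i < n && output[i] === expected[i]) i++;
  return i / expected.length;
}

// Average the scorer over a dataset, the way an eval aggregates
// per-record scores into a single number per model.
function meanScore(
  records: CompletionRecord[],
  complete: (input: string) => string
): number {
  const total = records.reduce(
    (sum, r) => sum + prefixMatchScore(complete(r.input), r.expected),
    0
  );
  return records.length ? total / records.length : 0;
}
```

Running this per candidate model over the same captured records gives one comparable score per model, which is the basic shape of the comparison described above.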

Next steps

We're excited for you to try our new Copilot and can't wait to hear what you think. If you already have a Braintrust account, you can start playing with it right away! If you don't, sign up for free and take it for a spin.

This is a new feature, so as you play with it, let us know what you like and, more importantly, what we can improve! If you enjoy working on problems like this, we're hiring.