Cookbook
This cookbook, inspired by OpenAI's cookbook, is a collection of recipes for common Braintrust use cases. Each recipe is an open-source, self-contained example hosted on GitHub. We welcome community contributions and hope the cookbook will become a collaborative, living collection of best practices for building high-quality AI products.
python
Evaluating a voice agent
Adrian Barbir
Feb 13, 2025 | agent, evals, voice
typescript
Classifying spam using structured outputs
Ornella Altunyan
Feb 8, 2025 | classifier, structured outputs, playground
python
Evaluating a prompt chaining agent
Adrian Barbir
Jan 30, 2025 | agent, evals, python
python
Evaluating the precision and recall of an emotion classifier
Adrian Barbir
Jan 17, 2025 | recall, precision, evals, classifier, python
typescript
Evaluating audio with the OpenAI Realtime API
Ornella Altunyan
Dec 14, 2024 | evals, tools, audio
python
Evaluating SimpleQA
Ankur Goyal, Ornella Altunyan
Dec 6, 2024 | datasets, evals
typescript
Using Python functions to extract text from images
Ornella Altunyan
Nov 22, 2024 | python, tools, ocr, functions
typescript
Using OpenTelemetry for LLM observability
Ornella Altunyan
Oct 31, 2024 | evals, tools
typescript
Using functions to build a RAG agent
Ornella Altunyan, Ankur Goyal
Oct 8, 2024 | functions, rag, tools
python
Evaluating multimodal receipt extraction
Ankur Goyal
Sep 30, 2024 | evals, multimodal, receipts
typescript
Unreleased AI: A full stack Next.js app for generating changelogs
Ornella Altunyan
Aug 28, 2024 | evals, logging, next.js
python
An agent that runs OpenAPI commands
Ankur Goyal
Aug 12, 2024 | agent, rag, evals
typescript
Benchmarking inference providers
Ankur Goyal
Jul 29, 2024 | evals, llama-3.1, providers
typescript
Tool calls in LLaMa 3.1
Ankur Goyal
Jul 26, 2024 | evals, llama-3.1, tools
typescript
Evaluating a chat assistant
Tara Nagar
Jul 16, 2024 | evals, chat
python
LLM Eval For Text2SQL
Ankur Goyal
May 29, 2024 | evals, datasets, text2sql
python
Optimizing Ragas to evaluate a RAG pipeline
Ankur Goyal, Nelson Auner
May 27, 2024 | evals, rag
typescript
Comparing evals across multiple AI models
John Huang
May 22, 2024 | evals, charts
python
Detecting Prompt Injections
Nelson Auner
May 20, 2024 | evals, classification
python
AI Search Bar
Austin Moehle
Mar 4, 2024 | evals, sql
typescript
How Zapier uses assertions to evaluate tool usage in chatbots
Vítor Balocco
Feb 13, 2024 | evals, assertions, tools
typescript
Generating release notes and hill-climbing to improve them
Ankur Goyal
Feb 2, 2024 | evals, hill-climbing
typescript
Generating beautiful HTML components
Ankur Goyal
Jan 29, 2024 | logging, datasets, evals
python
Coda's Help Desk with and without RAG
Austin Moehle, Kenny Wong
Dec 21, 2023 | evals, rag
typescript
Improving GitHub issue titles using their contents
Ankur Goyal
Oct 29, 2023 | evals, summarization
python
Classifying news articles
David Song
Sep 1, 2023 | evals, classification
python
Text-to-SQL
Ankur Goyal
Aug 12, 2023 | evals, sql