Open sourcing the AI proxy
Last week, we released the Braintrust AI Proxy, a new, free way to access Llama 2, Mistral, OpenAI, Anthropic, and many other models behind the OpenAI protocol, with built-in caching and API key management.
Folks immediately started reaching out about running the proxy in production. We firmly believe that code on the critical path to production should be open source, so we're excited to announce that the proxy's source code is now available on GitHub under the MIT license.
Deployment options
You can continue to access the proxy, for free, via the hosted version at https://braintrustproxy.com. It runs on Cloudflare Workers and end-to-end encrypts cached data with 256-bit AES-GCM encryption. For more details, see the documentation or source code.
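Since the proxy speaks the OpenAI protocol, switching to it is typically a one-line change in your client setup. Here's a minimal sketch using the openai TypeScript SDK (the model name and environment variable are placeholders; see the docs for exactly what the proxy accepts):

```typescript
import OpenAI from "openai";

// Point the stock OpenAI client at the proxy; everything else is unchanged.
const client = new OpenAI({
  baseURL: "https://braintrustproxy.com/v1",
  apiKey: process.env.OPENAI_API_KEY, // your usual provider key
});

const completion = await client.chat.completions.create({
  model: "gpt-3.5-turbo", // illustrative; any model the proxy supports
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(completion.choices[0].message.content);
```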
The repository also contains instructions for deploying the proxy to Vercel Edge Functions, Cloudflare Workers, AWS Lambda, or as a plain old Express server.
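To give a feel for the Express shape, here's a stand-in sketch that simply relays OpenAI-style requests to the hosted endpoint. This is not the repo's actual handler (which runs the proxy logic locally); it only illustrates the plumbing:

```typescript
import express from "express";

const app = express();
app.use(express.raw({ type: "*/*" })); // keep request bodies untouched

// Relay every /v1/* call to the hosted proxy, forwarding auth and body.
app.all("/v1/*", async (req, res) => {
  const upstream = await fetch(`https://braintrustproxy.com${req.originalUrl}`, {
    method: req.method,
    headers: {
      "Content-Type": req.get("content-type") ?? "application/json",
      Authorization: req.get("authorization") ?? "",
    },
    body: ["GET", "HEAD"].includes(req.method) ? undefined : req.body,
  });
  res.status(upstream.status).send(Buffer.from(await upstream.arrayBuffer()));
});

app.listen(3000, () => console.log("Proxy relay listening on :3000"));
```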
Benchmarks
I ran some quick benchmarks from my in-laws' place in California and from an EC2 instance in US East (N. Virginia) to compare performance across the deployment options (code).
The AWS Lambda functions are deployed in us-east-1; `aws-pc` is AWS Lambda with provisioned concurrency.
[Benchmark chart: request latency measured from my in-laws' place in CA]
[Benchmark chart: request latency measured from EC2 in US East (N. Virginia)]
As you can see, Cloudflare and Vercel are consistently very fast, and AWS Lambda in US East suffers (as expected) when measured from CA. I was surprised that AWS Lambda with provisioned concurrency was slower than without. Perhaps I misconfigured something...
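For the curious, the benchmark boils down to timing a small chat completion against each deployment's endpoint. The linked code is the real benchmark; this is just a sketch of the shape, with placeholder URLs and sample counts:

```typescript
// Assumes Node 18+ (global fetch); URLs and sample count are illustrative.
const endpoints: Record<string, string> = {
  cloudflare: "https://braintrustproxy.com/v1/chat/completions",
  // ...one entry per deployment being compared
};

async function timeRequest(url: string): Promise<number> {
  const start = performance.now();
  const res = await fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: "Hi" }],
      max_tokens: 1, // keep the completion tiny so we mostly measure overhead
    }),
  });
  await res.arrayBuffer(); // drain the body to measure the full round trip
  return performance.now() - start;
}

for (const [name, url] of Object.entries(endpoints)) {
  const samples: number[] = [];
  for (let i = 0; i < 10; i++) samples.push(await timeRequest(url));
  samples.sort((a, b) => a - b);
  console.log(`${name}: ~${samples[samples.length >> 1].toFixed(0)}ms median`);
}
```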
Additional features
Along with the open source release, we want to highlight a few of the proxy's useful built-in features.
Caching
The proxy automatically caches responses from the model provider if you set a `seed` value or `temperature=0`.
Seeds are a new feature in the OpenAI API that let you get reproducible results, but most model providers do not yet support them. The proxy automatically handles that for you.
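As a quick illustration, a repeated request with a fixed `seed` should come back from the cache almost instantly on the second call. A sketch with the openai SDK (model name and timing approach are illustrative):

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://braintrustproxy.com/v1",
  apiKey: process.env.OPENAI_API_KEY,
});

// Identical request bodies with a fixed seed are eligible for caching.
async function timedCall(): Promise<number> {
  const start = Date.now();
  await client.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "What is 2+2?" }],
    seed: 42, // fixed seed opts this request into the proxy's cache
  });
  return Date.now() - start;
}

console.log(`first call:  ${await timedCall()}ms`); // hits the provider
console.log(`second call: ${await timedCall()}ms`); // served from the cache
```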
API key management
You can add API keys across providers as secrets in Braintrust and use a single API key to access all of them. This is a great way to manage your API keys in one place and share them with your team.
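In practice, that means your client authenticates to the proxy with just your Braintrust API key, and the proxy looks up the right provider secret for the model you request. A minimal sketch (the environment variable name is illustrative):

```typescript
import OpenAI from "openai";

// One Braintrust API key stands in for all of your provider keys.
const client = new OpenAI({
  baseURL: "https://braintrustproxy.com/v1",
  apiKey: process.env.BRAINTRUST_API_KEY,
});
```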
Load balancing
You can now add multiple keys and organizations as secrets in Braintrust, and the proxy will automatically load balance across them for you. This is a simple way to add resiliency across OpenAI accounts or providers (e.g. OpenAI and Azure).
Azure OpenAI
You can access Azure's OpenAI endpoints through the proxy, with vanilla OpenAI drivers, by configuring Azure endpoints in Braintrust. If you configure both OpenAI and Azure endpoints, the proxy will automatically load balance between them.
What's next
We have an exciting roadmap ahead for the proxy, including more advanced load balancing/resiliency features, support for more models/providers, and deeper integrations into Braintrust.
If you have any feedback or want to collaborate, send us an email or join our Discord.