Open sourcing the AI proxy
Last week, we released the Braintrust AI Proxy, a new, free way to access Llama 2, Mistral, OpenAI, Anthropic, and many other models behind the OpenAI protocol, with built-in caching and API key management.
Folks immediately started reaching out about running the proxy in production. We firmly believe that code on the critical path to production should be open source, so we're excited to announce that the proxy's source code is now available on GitHub under the MIT license.
Deployment options
You can continue to access the proxy, for free, via the hosted version at https://braintrustproxy.com. It runs on Cloudflare Workers and end-to-end encrypts cached data with 256-bit AES-GCM encryption. For more details, see the documentation or source code.
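Since the proxy speaks the OpenAI protocol, switching to it is typically a one-line change in your client setup. Here's a minimal sketch using the openai TypeScript SDK (the model name and environment variable are placeholders; see the docs for exactly what the proxy accepts):

```typescript
import OpenAI from "openai";

// Point the stock OpenAI client at the proxy; everything else is unchanged.
const client = new OpenAI({
  baseURL: "https://braintrustproxy.com/v1",
  apiKey: process.env.OPENAI_API_KEY, // your usual provider key
});

const completion = await client.chat.completions.create({
  model: "gpt-3.5-turbo", // illustrative; any model the proxy supports
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(completion.choices[0].message.content);
```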
The repository also contains instructions for deploying the proxy to Vercel Edge Functions, Cloudflare Workers, AWS Lambda, or as a plain old Express server.
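To give a feel for the Express shape, here's a stand-in sketch that simply relays OpenAI-style requests to the hosted endpoint. This is not the repo's actual handler (which runs the proxy logic locally); it only illustrates the plumbing:

```typescript
import express from "express";

const app = express();
app.use(express.raw({ type: "*/*" })); // keep request bodies untouched

// Relay every /v1/* call to the hosted proxy, forwarding auth and body.
app.all("/v1/*", async (req, res) => {
  const upstream = await fetch(`https://braintrustproxy.com${req.originalUrl}`, {
    method: req.method,
    headers: {
      "Content-Type": req.get("content-type") ?? "application/json",
      Authorization: req.get("authorization") ?? "",
    },
    body: ["GET", "HEAD"].includes(req.method) ? undefined : req.body,
  });
  res.status(upstream.status).send(Buffer.from(await upstream.arrayBuffer()));
});

app.listen(3000, () => console.log("Proxy relay listening on :3000"));
```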
Benchmarks
I ran some quick benchmarks from my in-laws' place in California and from an EC2 instance in US East (N. Virginia) to compare performance across the deployment options (code).
The AWS Lambda functions are deployed in us-east-1; `aws-pc` is AWS Lambda with provisioned concurrency.
[Benchmark chart: request latency measured from my in-laws' place in CA]
[Benchmark chart: request latency measured from EC2 in US East (N. Virginia)]
As you can see, Cloudflare and Vercel are consistently very fast, and AWS Lambda in US East suffers (as expected) when measured from CA. I was surprised that AWS Lambda with provisioned concurrency was slower than without. Perhaps I misconfigured something...
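For the curious, the benchmark boils down to timing a small chat completion against each deployment's endpoint. The linked code is the real benchmark; this is just a sketch of the shape, with placeholder URLs and sample counts:

```typescript
// Assumes Node 18+ (global fetch); URLs and sample count are illustrative.
const endpoints: Record<string, string> = {
  cloudflare: "https://braintrustproxy.com/v1/chat/completions",
  // ...one entry per deployment being compared
};

async function timeRequest(url: string): Promise<number> {
  const start = performance.now();
  const res = await fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: "Hi" }],
      max_tokens: 1, // keep the completion tiny so we mostly measure overhead
    }),
  });
  await res.arrayBuffer(); // drain the body to measure the full round trip
  return performance.now() - start;
}

for (const [name, url] of Object.entries(endpoints)) {
  const samples: number[] = [];
  for (let i = 0; i < 10; i++) samples.push(await timeRequest(url));
  samples.sort((a, b) => a - b);
  console.log(`${name}: ~${samples[samples.length >> 1].toFixed(0)}ms median`);
}
```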
Additional features
Along with the open source release, we want to highlight a few of the proxy's useful built-in features.
Caching
The proxy automatically caches responses from the model provider if you set a `seed` value or `temperature=0`.
Seeds are a new feature in the OpenAI API that let you get reproducible results, but most model providers do not yet support them. The proxy automatically handles that for you.
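As a quick illustration, a repeated request with a fixed `seed` should come back from the cache almost instantly on the second call. A sketch with the openai SDK (model name and timing approach are illustrative):

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://braintrustproxy.com/v1",
  apiKey: process.env.OPENAI_API_KEY,
});

// Identical request bodies with a fixed seed are eligible for caching.
async function timedCall(): Promise<number> {
  const start = Date.now();
  await client.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "What is 2+2?" }],
    seed: 42, // fixed seed opts this request into the proxy's cache
  });
  return Date.now() - start;
}

console.log(`first call:  ${await timedCall()}ms`); // hits the provider
console.log(`second call: ${await timedCall()}ms`); // served from the cache
```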
API key management
You can add API keys across providers as secrets in Braintrust and use a single API key to access all of them. This is a great way to manage your API keys in one place and share them with your team.
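In practice, that means your client authenticates to the proxy with just your Braintrust API key, and the proxy looks up the right provider secret for the model you request. A minimal sketch (the environment variable name is illustrative):

```typescript
import OpenAI from "openai";

// One Braintrust API key stands in for all of your provider keys.
const client = new OpenAI({
  baseURL: "https://braintrustproxy.com/v1",
  apiKey: process.env.BRAINTRUST_API_KEY,
});
```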
Load balancing
You can now add multiple keys and organizations as secrets in Braintrust, and the proxy will automatically load balance across them for you. This is a simple way to add resiliency across OpenAI accounts or providers (e.g. OpenAI and Azure).
Azure OpenAI
You can access Azure's OpenAI endpoints through the proxy, with vanilla OpenAI drivers, by configuring Azure endpoints in Braintrust. If you configure both OpenAI and Azure endpoints, the proxy will automatically load balance between them.
What's next
We have an exciting roadmap ahead for the proxy, including more advanced load balancing/resiliency features, support for more models/providers, and deeper integrations into Braintrust.
If you have any feedback or want to collaborate, send us an email or join our Discord.