Quickstart
You can use the proxy without a Braintrust account by providing your API key from any supported provider. If you have a Braintrust account, you can use a single Braintrust API key to access all AI providers through one interface. The proxy is fully compatible with the OpenAI SDK. Set the API URL to https://api.braintrust.dev/v1/proxy.
Run the following script twice to see caching in action:
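Here is a minimal sketch using the OpenAI TypeScript SDK; the model name is a placeholder for any model available through your providers:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.braintrust.dev/v1/proxy",
  apiKey: process.env.BRAINTRUST_API_KEY, // or any supported provider's API key
});

async function main() {
  const start = performance.now();
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini", // placeholder: any supported model works
    messages: [{ role: "user", content: "What is a proxy?" }],
    seed: 1, // a seed (or temperature=0) makes the request cacheable
  });
  console.log(response.choices[0].message.content);
  console.log(`Took ${(performance.now() - start) / 1000}s`);
}

main();
```

The first run hits the upstream provider; the second should return the cached response almost instantly.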
Configure API keys
Add provider API keys in your organization settings under AI providers, configure them at the project level to override organization defaults, or set them up inline when running playgrounds or prompts. Then use your Braintrust API key to access all providers through the proxy.

Organization-level providers are available across all projects. Project-level providers override organization-level keys for that specific project, allowing you to isolate API usage, manage separate billing, or use different credentials per project. Project-level API keys take precedence over organization-level keys when making proxy requests in a project context.

Without a Braintrust account, you can use the proxy with individual provider API keys to get automatic caching. The proxy response returns the x-bt-used-endpoint header, which specifies which of your configured providers was used to complete the request.
Supported providers
Standard providers include:
- OpenAI (GPT-4o, GPT-4o-mini, o4-mini, etc.)
- Anthropic (Claude 4 Sonnet, Claude 3.5 Sonnet, etc.)
- Google (Gemini 2.5 Flash, Gemini 2.5 Pro, etc.)
- AWS Bedrock (Claude, Llama, Mistral models)
- Azure OpenAI Service
- Third-party providers (Together AI, Fireworks, Groq, Replicate, etc.)
Enable caching
The proxy automatically caches results and reuses them when possible. Because the proxy runs on the edge, cached requests return in under 100ms. This is especially useful when developing and frequently re-running or evaluating the same prompts.

Cache modes
There are three caching modes: auto (the default), always, and never:
- In auto mode, requests are cached if they have temperature=0 or the seed parameter set and they are one of the supported paths.
- In always mode, requests are cached as long as they are one of the supported paths.
- In never mode, the cache is never read or written to.
The supported paths are:
- /auto
- /embeddings
- /chat/completions
- /completions
- /moderations
Set the cache mode for individual requests by passing the x-bt-use-cache header. The response includes the x-bt-cached header with HIT or MISS to indicate cache status.
Cache TTL
By default, cached results expire after 1 week. Set the TTL for individual requests by passing the x-bt-cache-ttl header. The TTL is specified in seconds and must be between 1 and 604800 (7 days).
Cache control
The proxy supports a limited set of Cache-Control directives:
- To bypass the cache, set the Cache-Control header to no-cache, no-store. This is semantically equivalent to setting the x-bt-use-cache header to never.
- To force a fresh request, set the Cache-Control header to no-cache. Without the no-store directive, the response will be cached for subsequent requests.
- To request a cached response with a maximum age, set the Cache-Control header to max-age=<seconds>. If the cached data is older than the specified age, the cache will be bypassed and a new response will be generated. Combine this with no-store to bypass the cache for a request without overwriting the current cached response.
If a request includes both Cache-Control directives and the x-bt-use-cache header, the cache control directives take precedence.
The proxy returns the following response headers:
- x-bt-cached: HIT or MISS, indicating whether the response was served from the cache.
- Age: the age of the cached response.
- Cache-Control with a max-age directive: the TTL of the cached response.
For example, to set the cache mode to always with a TTL of 2 days:
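A sketch using the OpenAI SDK's defaultHeaders option (2 days = 172,800 seconds):

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.braintrust.dev/v1/proxy",
  apiKey: process.env.BRAINTRUST_API_KEY,
  defaultHeaders: {
    "x-bt-use-cache": "always",
    "x-bt-cache-ttl": "172800", // 2 days, in seconds
  },
});
```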
Cache encryption
The proxy uses AES-GCM to encrypt the cache, using a key derived from your API key. Results are cached for 1 week unless otherwise specified in request headers. This design ensures that the cache is only accessible to you. Braintrust cannot see your data and does not store or log API keys.

Because the cache’s encryption key is your API key, cached results are scoped to an individual user. Braintrust customers can opt into sharing cached results across users within their organization.
Enable logging
To log requests that you make through the proxy, specify an x-bt-parent header with the project or experiment you’d like to log to. While tracing, you must use a BRAINTRUST_API_KEY rather than a provider’s key. The proxy will derive your provider’s key and facilitate tracing using the BRAINTRUST_API_KEY.
The x-bt-parent header sets the trace’s parent project or experiment. You can use a prefix like project_id:, project_name:, or experiment_id:, or pass in a span slug (span.export()) to nest the trace under a span within the parent object.
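For example (a sketch; the project name is a placeholder):

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.braintrust.dev/v1/proxy",
  apiKey: process.env.BRAINTRUST_API_KEY, // must be a Braintrust API key when tracing
  defaultHeaders: {
    // project_id:, experiment_id:, or a span slug also work here
    "x-bt-parent": "project_name:My project",
  },
});
```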
Load balance across providers
If you have multiple API keys for a given model type (e.g., OpenAI and Azure for gpt-4o), the proxy automatically load balances across them. This is useful for working around per-account rate limits and providing resiliency if one provider is down.
To set up load balancing:
- Add your primary provider key (e.g., OpenAI) in your organization settings.
- Add Azure OpenAI as a custom provider for the same models.
- The proxy automatically distributes requests across both.
Load balancing provides:
- Resilience if one provider is down
- Higher effective rate limits
- Geographic distribution
Use reasoning models
For hybrid deployments, reasoning support requires v0.0.74 or later.

- Supported providers: OpenAI, Anthropic, and Google
- Unified parameters: Consistent parameters related to reasoning:
  - reasoning_effort: Specify the desired level of reasoning complexity
  - reasoning_enabled: Explicit flag to enable or disable reasoning output (has no effect for OpenAI models)
  - reasoning_budget: Specify a budget for the reasoning process (requires either reasoning_effort or reasoning_enabled)
- Structured reasoning output: Responses include a list of reasoning objects as part of the assistant’s message. Each object contains the content of the reasoning step and a unique id. Include these reasoning objects from previous turns in subsequent requests to maintain context in multi-turn conversations.
- Streaming support: A reasoning_delta is available when streaming, allowing you to process reasoning output as it is generated.
- Type safety: Type augmentations are available for better developer experience. For JavaScript/TypeScript, use the @braintrust/proxy/types module to extend OpenAI’s types. For Python, the braintrust-proxy package provides casting utilities for input parameters and output objects.
Non-streaming request
Here’s a non-streaming chat completion request using a Google model with reasoning enabled:
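A minimal sketch, assuming a Gemini model is configured for your Google provider; the reasoning parameters sit outside OpenAI's types, so the request body is cast here (or use @braintrust/proxy/types):

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.braintrust.dev/v1/proxy",
  apiKey: process.env.BRAINTRUST_API_KEY,
});

const response = await client.chat.completions.create({
  model: "gemini-2.5-flash",
  messages: [{ role: "user", content: "How many rs are in strawberry?" }],
  // Proxy-specific unified reasoning parameters:
  reasoning_enabled: true,
  reasoning_budget: 1024,
} as any);

// The assistant message includes a list of reasoning objects alongside the content.
console.log(JSON.stringify(response.choices[0].message, null, 2));
```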
Streaming request

This example shows how to handle the reasoning_delta when streaming chat completion responses:
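A sketch; the exact shape of the reasoning delta field is an assumption, so consult @braintrust/proxy/types for the precise types:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.braintrust.dev/v1/proxy",
  apiKey: process.env.BRAINTRUST_API_KEY,
});

const stream = (await client.chat.completions.create({
  model: "gemini-2.5-flash",
  messages: [{ role: "user", content: "How many rs are in strawberry?" }],
  stream: true,
  reasoning_enabled: true, // proxy-specific parameter
} as any)) as AsyncIterable<any>;

for await (const chunk of stream) {
  const delta = chunk.choices?.[0]?.delta;
  if (delta?.reasoning_delta) {
    process.stdout.write(String(delta.reasoning_delta)); // assumed field shape
  }
  if (delta?.content) {
    process.stdout.write(delta.content);
  }
}
```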
Use alternative protocols
The proxy translates OpenAI requests into various provider APIs automatically. You can also use native Anthropic and Gemini API schemas.

Anthropic API
The anthropic-version and x-api-key headers are not required.
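A sketch using the Anthropic TypeScript SDK; pointing it at the proxy's base URL is an assumption carried over from the OpenAI setup, and the model ID is a placeholder:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  baseURL: "https://api.braintrust.dev/v1/proxy",
  apiKey: process.env.BRAINTRUST_API_KEY, // the proxy fills in x-api-key upstream
});

const message = await anthropic.messages.create({
  model: "claude-3-5-sonnet-latest", // placeholder Claude model
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(message.content);
```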
Gemini API
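A sketch using raw fetch; the assumption is that the proxy mirrors the Gemini API's generateContent route under its base URL:

```typescript
// Hypothetical path, assuming the proxy mirrors Gemini's REST layout.
const res = await fetch(
  "https://api.braintrust.dev/v1/proxy/v1beta/models/gemini-2.5-flash:generateContent",
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.BRAINTRUST_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      contents: [{ parts: [{ text: "Hello!" }] }],
    }),
  }
);
console.log(await res.json());
```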
Add custom providers
Add custom models or endpoints to use with the proxy. Custom providers support self-hosted models, fine-tuned models, and proprietary AI services. See Custom providers for setup instructions and configuration options.

Use realtime models
The proxy supports the OpenAI Realtime API at the /realtime endpoint using WebSockets. Use the official OpenAI SDK (v6.0+) to connect to the proxy’s realtime endpoint.
Use https://braintrustproxy.com/v1, not https://api.braintrust.dev/v1/proxy, for WebSocket-based proxying.

Node.js with ws library
In Node.js environments, use OpenAIRealtimeWS from the openai/realtime/ws module:
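A sketch; the constructor and event names follow the OpenAI SDK's realtime client, but check your SDK version, since they have shifted between releases:

```typescript
import OpenAI from "openai";
import { OpenAIRealtimeWS } from "openai/realtime/ws";

const client = new OpenAI({
  apiKey: process.env.BRAINTRUST_API_KEY,
  baseURL: "https://braintrustproxy.com/v1", // WebSocket proxying uses this base URL
});

const rt = new OpenAIRealtimeWS(
  { model: "gpt-4o-realtime-preview-2024-10-01" },
  client
);

rt.socket.on("open", () => {
  rt.send({
    type: "conversation.item.create",
    item: {
      type: "message",
      role: "user",
      content: [{ type: "input_text", text: "Say hello!" }],
    },
  });
  rt.send({ type: "response.create" });
});

rt.on("response.output_text.delta", (event: any) => process.stdout.write(event.delta));
rt.on("response.done", () => rt.close());
rt.on("error", (err: any) => console.error(err));
```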
Log realtime sessions
To log realtime sessions to Braintrust, pass the x-bt-parent header when creating the connection:
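For example (a sketch; whether defaultHeaders are applied to the WebSocket handshake is an assumption, and the project name is a placeholder):

```typescript
import OpenAI from "openai";
import { OpenAIRealtimeWS } from "openai/realtime/ws";

const client = new OpenAI({
  apiKey: process.env.BRAINTRUST_API_KEY,
  baseURL: "https://braintrustproxy.com/v1",
  defaultHeaders: { "x-bt-parent": "project_name:My project" },
});

const rt = new OpenAIRealtimeWS(
  { model: "gpt-4o-realtime-preview-2024-10-01" },
  client
);
```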
Logged sessions capture the following events:
- Text output: response.output_text.delta and response.output_text.done
- Audio output: response.output_audio.delta and response.output_audio.done
- Audio transcripts: response.output_audio_transcript.delta and response.output_audio_transcript.done
Compress audio
To reduce storage costs, enable audio compression by setting the x-bt-compress-audio header to true or 1:
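For example:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.BRAINTRUST_API_KEY,
  baseURL: "https://braintrustproxy.com/v1",
  defaultHeaders: { "x-bt-compress-audio": "true" },
});
```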
Browser or Cloudflare Workers
For browser and Cloudflare Workers environments, use OpenAIRealtimeWebSocket:
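A sketch; the module path mirrors the ws import above and may differ by SDK version:

```typescript
import OpenAI from "openai";
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";

const client = new OpenAI({
  apiKey: "<TEMPORARY_CREDENTIAL>", // see the next section
  baseURL: "https://braintrustproxy.com/v1",
  dangerouslyAllowBrowser: true,
});

const rt = new OpenAIRealtimeWebSocket(
  { model: "gpt-4o-realtime-preview-2024-10-01" },
  client
);

rt.on("response.output_text.delta", (event: any) => console.log(event.delta));
```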
Temporary credentials for realtime
For frontend or mobile applications, use temporary credentials to avoid exposing your API key. Pass the temporary credential as the apiKey:
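A sketch; /api/realtime-credential is a hypothetical backend route that wraps the /credentials endpoint described below:

```typescript
import OpenAI from "openai";

// Hypothetical backend route that calls the proxy's /credentials endpoint.
const { key: temporaryCredential } = await (
  await fetch("/api/realtime-credential", { method: "POST" })
).json();

const client = new OpenAI({
  apiKey: temporaryCredential,
  baseURL: "https://braintrustproxy.com/v1",
  dangerouslyAllowBrowser: true, // acceptable because the credential is time-limited
});
```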
Create temporary credentials
A temporary credential converts your Braintrust API key (or model provider API key) to a time-limited credential that can be safely shared with end users.
- Temporary credentials can carry additional information to limit access to a particular model and enable logging to Braintrust.
- They can be used in the Authorization header anywhere you’d use a Braintrust API key or a model provider API key.
Issue temporary credentials
Call the /credentials endpoint from a privileged location, such as your app’s backend, to issue temporary credentials. The temporary credential will be allowed to make requests on behalf of the Braintrust API key (or model provider API key) provided in the Authorization header.
The body should specify the restrictions to be applied to the temporary credentials as a JSON object. If the logging key is present, the proxy will log to Braintrust any requests made with this temporary credential.
The following example grants access to gpt-4o-realtime-preview-2024-10-01 for 10 minutes, logging the requests to the project named “My project”:
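A sketch of the request; the exact body field names are assumptions based on the description above:

```typescript
// Run from your backend, authorized with a privileged Braintrust API key.
const res = await fetch("https://api.braintrust.dev/v1/proxy/credentials", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.BRAINTRUST_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4o-realtime-preview-2024-10-01",
    ttl_seconds: 600, // 10 minutes
    logging: { project_name: "My project" },
  }),
});
const { key: temporaryCredential } = await res.json();
console.log(temporaryCredential);
```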
Inspect temporary credentials
Temporary credentials are formatted as JSON Web Tokens (JWT). Inspect the JWT’s payload using a library such as jsonwebtoken or a web-based tool like JWT.io to determine the expiration time and granted models:
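For example, with jsonwebtoken (the payload fields shown are illustrative):

```typescript
import jwt from "jsonwebtoken";

const temporaryCredential = "<TEMPORARY_CREDENTIAL_JWT>"; // from the /credentials endpoint

// decode() reads the payload without verifying the signature.
const payload = jwt.decode(temporaryCredential, { json: true });
console.log(payload?.exp); // expiration time, in seconds since the epoch
console.log(payload); // inspect the granted models and logging configuration
```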
Do not modify the JWT payload. This will invalidate the signature. Instead, issue a new temporary credential using the /credentials endpoint.

Use PDF input
The proxy extends the OpenAI API to support PDF input. Pass PDF URLs or base64-encoded PDFs with MIME type application/pdf:
To send a base64-encoded PDF, use data:application/pdf;base64,<BASE64_DATA> as the URL.
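A sketch, assuming the PDF rides in the image_url content part that the proxy extends:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.braintrust.dev/v1/proxy",
  apiKey: process.env.BRAINTRUST_API_KEY,
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Summarize this document." },
        // Assumption: the proxy accepts PDFs where the OpenAI API expects an image URL.
        { type: "image_url", image_url: { url: "https://example.com/sample.pdf" } },
      ],
    },
  ],
});
console.log(response.choices[0].message.content);
```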
Specify an organization
If you’re part of multiple organizations, specify which to use with the x-bt-org-name header:
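For example (the organization name is a placeholder):

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.braintrust.dev/v1/proxy",
  apiKey: process.env.BRAINTRUST_API_KEY,
  defaultHeaders: { "x-bt-org-name": "Acme Inc" },
});
```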
Advanced configuration
Configure proxy behavior with these headers:
- x-bt-use-cache: auto | always | never - Control caching behavior
- x-bt-cache-ttl: Seconds (max 604800) - Set cache TTL
- x-bt-use-creds-cache: auto | always | never - Control credentials caching (useful when rapidly updating credentials)
- x-bt-org-name: Organization name - Specify organization for multi-org users
- x-bt-endpoint-name: Endpoint name - Use a specific configured endpoint
- x-bt-parent: Project/experiment/span - Enable logging to Braintrust
- x-bt-compress-audio: true | false - Enable audio compression for realtime sessions
Monitor proxy usage
Track proxy usage across your organization:
- Create a project for proxy logs.
- Enable logging by setting the x-bt-parent header when calling the proxy (see Enable logging).
- View logs in the Logs page.
- Create dashboards to track usage, costs, and errors.
Each proxy response includes the x-bt-used-endpoint header, which specifies which of your configured providers was used to complete the request.
Self-hosting
Self-hosted Braintrust deployments include a built-in proxy that runs in your environment. To configure your proxy URLs, see Configure API URLs in organization settings. For complete deployment instructions, see Self-hosting.

Integration with Braintrust
Several features in Braintrust are powered by the proxy. For example, when you create a playground, the proxy handles running the LLM calls. Similarly, if you create a prompt, when you preview the prompt’s results, the proxy is used to run the LLM. However, the proxy is not required when you:
- Run evaluations in your code.
- Load prompts to run in your code.
- Log traces to Braintrust.
Open source
The AI proxy is open source. View the code on GitHub.

Next steps
- Deploy prompts to call versioned prompts through the proxy
- Evaluate reasoning models with standardized reasoning parameters
- Monitor deployments to track production performance
- Manage environments to separate dev and production
- Manage organizations to configure AI providers
- Manage projects for project-level provider configuration