Applies to:
- Plan:
- Deployment:
Summary
Issue: After upgrading to data plane v1.1.32, GCP self-hosted deployments log checksum mismatch errors in braintrust-api. As a result, traces appear incomplete (spans stay stuck “in progress”), the /logs3 endpoint returns 500 Internal Server Error responses, and tracing becomes effectively unusable.
Cause: Data plane v1.1.32 bundles AWS SDK v3.723+, which changed the default to enable CRC32 response checksum validation. GCP’s S3 compatibility layer does not return the checksum headers the SDK now expects, causing every object storage operation to fail with a checksum mismatch.
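To make the cause concrete, here is a simplified Python model of CRC32 response validation. It is not the AWS SDK's implementation. It computes a CRC32 over the response body, encodes it the way S3's `x-amz-checksum-crc32` header does (base64 of the 4-byte big-endian value), and compares the two. The handling of a missing header (reject when strict, accept when lenient) mirrors the behavior described above, not the SDK's exact internal logic.

```python
import base64
import zlib
from typing import Dict

# Header S3 uses to carry a CRC32 object checksum.
CRC32_HEADER = "x-amz-checksum-crc32"


def crc32_header_value(body: bytes) -> str:
    """Encode a body's CRC32 as S3 checksum headers do: base64 of the 4-byte big-endian value."""
    checksum = zlib.crc32(body) & 0xFFFFFFFF
    return base64.b64encode(checksum.to_bytes(4, "big")).decode("ascii")


def validate_response(body: bytes, headers: Dict[str, str], strict: bool) -> bool:
    """Simplified model of client-side response checksum validation.

    Header present: compare it with the CRC32 computed over the body.
    Header absent: a strict client rejects the response; a lenient one accepts it.
    """
    # HTTP header names are case-insensitive, so normalize before looking up.
    received = {k.lower(): v for k, v in headers.items()}.get(CRC32_HEADER)
    if received is None:
        return not strict
    return received == crc32_header_value(body)
```

With `strict=True`, a response that has no checksum header is rejected even when its body is intact. In this model, that is how a storage layer that omits the header makes every operation fail.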
Resolution: Set two environment variables on braintrust-api to disable strict checksum validation, or upgrade to Helm chart v5.0.1+, which includes this fix automatically.
Symptoms
You may see one or more of the following after upgrading to data plane v1.1.32:
- Checksum mismatch errors in braintrust-api logs
- Traces with child spans stuck “in progress” that never complete
- 500 Internal Server Error responses with "Service":"api" on the /logs3 endpoint
- Spans within traces appearing inconsistently or missing
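A quick way to check for these symptoms is to count matching lines in the API logs. The sketch below is hypothetical. This article does not quote the exact log messages, so the substring and regex matchers are assumptions you should adjust to your own output.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable

# Hypothetical matchers: the exact braintrust-api log text is not quoted in
# this article, so adjust these to what your pods actually print.
SYMPTOMS: Dict[str, Callable[[str], bool]] = {
    "checksum_mismatch": lambda line: "checksum mismatch" in line.lower(),
    "logs3_500": lambda line: "/logs3" in line and re.search(r"\b500\b", line) is not None,
}


def triage(lines: Iterable[str]) -> Counter:
    """Count the log lines that match each symptom."""
    counts: Counter = Counter()
    for line in lines:
        for name, matches in SYMPTOMS.items():
            if matches(line):
                counts[name] += 1
    return counts
```

For example, capture the output of `kubectl logs deployment/braintrust-api` (adding `--namespace` for your install) and pass it in as `triage(output.splitlines())`. A nonzero `checksum_mismatch` count on a v1.1.32 data plane suggests this issue.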
Who is affected
This issue only affects deployments that meet all of these criteria:
- Running on GCP
- Using S3 compatibility mode for object storage
- Not using native GCS auth (ENABLE_GCS_AUTH is not set to true)
- Upgraded to data plane v1.1.32
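The four criteria are a strict conjunction: if any one of them does not hold, the deployment is not affected. Here is a minimal Python sketch. The `Deployment` record and its field names are hypothetical. The check matches exactly v1.1.32 because that is the only data plane version this article names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Deployment:
    """Hypothetical summary of a self-hosted install."""
    cloud: str                 # e.g. "gcp", "aws"
    s3_compat_storage: bool    # object storage accessed through the S3-compatible API
    env: Dict[str, str]        # environment variables set on braintrust-api
    data_plane_version: str    # e.g. "1.1.32" or "v1.1.32"


def is_affected(d: Deployment) -> bool:
    """True only when all four criteria from this article hold."""
    gcs_auth = d.env.get("ENABLE_GCS_AUTH", "").strip().lower() == "true"
    return (
        d.cloud.lower() == "gcp"
        and d.s3_compat_storage
        and not gcs_auth
        and d.data_plane_version.lstrip("v") == "1.1.32"
    )
```

Setting `ENABLE_GCS_AUTH` to `true` alone makes `is_affected` return `False`, matching the third criterion.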
Resolution Steps
Option 1: Upgrade Helm chart to v5.0.1+ (recommended)
Step 1: Update your Helm chart version
Upgrade to Helm chart version 5.0.1 or later, which includes the fix automatically.
Step 2: Apply the upgrade
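This copy of the article does not include the original upgrade command. As a hedged sketch, the helper below builds a standard `helm upgrade` invocation pinned with `--version` and rejects any chart version older than 5.0.1. The release name, chart reference, namespace, and values file are placeholders for your own installation.

```python
from typing import List, Tuple

MIN_FIXED_CHART = (5, 0, 1)  # first Helm chart version with the fix, per this article


def parse_version(v: str) -> Tuple[int, int, int]:
    """Parse a plain 'X.Y.Z' version, optionally prefixed with 'v'."""
    major, minor, patch = (int(part) for part in v.lstrip("v").split(".")[:3])
    return major, minor, patch


def helm_upgrade_cmd(release: str, chart: str, namespace: str,
                     values_file: str, version: str) -> List[str]:
    """Build a `helm upgrade` argv pinned to a chart version that includes the fix."""
    if parse_version(version) < MIN_FIXED_CHART:
        raise ValueError(f"chart {version} predates 5.0.1 and does not include the fix")
    return [
        "helm", "upgrade", release, chart,
        "--namespace", namespace,
        "--version", version,
        "--values", values_file,
    ]
```

For example, you could run `subprocess.run(helm_upgrade_cmd("braintrust", "braintrust/braintrust", "braintrust", "values.yaml", "5.0.1"), check=True)`, where every argument except the version is hypothetical. Passing an argument list avoids shell-quoting problems, and `shlex.join` prints the command for review before you run it.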
Step 3: Verify the fix
Check the braintrust-api logs to confirm that the checksum mismatch errors have stopped. Send a test trace and verify that all spans complete successfully.
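A span stuck “in progress” is one that recorded a start but never an end. If you export the test trace as a list of span records, a check like the one below can flag such spans. The record shape (`span_id`, `metrics.start`, `metrics.end`) is an assumption, not a documented export format, so rename the fields to match your data.

```python
from typing import Any, Dict, List


def incomplete_spans(spans: List[Dict[str, Any]]) -> List[str]:
    """Return the IDs of spans that started but never recorded an end time.

    Assumes each span looks like
    {"span_id": "...", "metrics": {"start": <float>, "end": <float or None>}}.
    """
    stuck = []
    for span in spans:
        metrics = span.get("metrics") or {}
        if metrics.get("start") is not None and metrics.get("end") is None:
            stuck.append(span["span_id"])
    return stuck
```

After the fix, the function should return an empty list for a freshly sent test trace.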
Option 2: Manually set environment variables
If you cannot upgrade the Helm chart immediately, set these two environment variables on the braintrust-api deployment:
Step 1: Add environment variables
Add the following to your braintrust-api configuration (via Helm extraEnvVars, Terraform, or your deployment manifest):
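This copy of the article does not include the variable names. The sketch assumes they are the AWS SDK for JavaScript v3 checksum settings `AWS_REQUEST_CHECKSUM_CALCULATION` and `AWS_RESPONSE_CHECKSUM_VALIDATION`, each set to `WHEN_REQUIRED` so that checksums are only computed and validated when an operation requires them. It renders them as a Kubernetes-style name/value list under an `extraEnvVars` key. The `api.extraEnvVars` path is hypothetical, so check your chart's `values.yaml` for the real location.

```python
from typing import Dict, List

# Assumed: the AWS SDK for JavaScript v3 settings that relax the newer default
# checksum behavior. Confirm against your chart or the original snippet.
CHECKSUM_ENV: Dict[str, str] = {
    "AWS_REQUEST_CHECKSUM_CALCULATION": "WHEN_REQUIRED",
    "AWS_RESPONSE_CHECKSUM_VALIDATION": "WHEN_REQUIRED",
}


def extra_env_vars_yaml(env: Dict[str, str], parent_key: str = "api", indent: int = 2) -> str:
    """Render env vars as a name/value list under <parent_key>.extraEnvVars (hypothetical path)."""
    pad = " " * indent
    lines: List[str] = [f"{parent_key}:", f"{pad}extraEnvVars:"]
    for name, value in env.items():
        lines.append(f"{pad}  - name: {name}")
        lines.append(f'{pad}    value: "{value}"')
    return "\n".join(lines) + "\n"


print(extra_env_vars_yaml(CHECKSUM_ENV))
```

In a raw Deployment manifest, the same two name/value pairs go in the braintrust-api container's `env` list.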
Step 2: Restart the API pods
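The restart command is also missing from this copy. A common way to restart a Kubernetes Deployment's pods is `kubectl rollout restart`, followed by `kubectl rollout status` to wait for the new pods to become ready. The sketch builds both commands. The deployment name follows this article, and the `braintrust` namespace is a placeholder.

```python
from typing import List


def restart_cmds(deployment: str = "braintrust-api", namespace: str = "braintrust") -> List[List[str]]:
    """Return kubectl commands that restart the deployment and wait for the rollout."""
    target = f"deployment/{deployment}"
    return [
        ["kubectl", "rollout", "restart", target, "--namespace", namespace],
        ["kubectl", "rollout", "status", target, "--namespace", namespace, "--timeout", "5m"],
    ]
```

Run the commands in order with `subprocess.run(cmd, check=True)`. A rollout restart replaces pods according to the Deployment's update strategy instead of deleting them all at once.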
Step 3: Verify the fix
Check the braintrust-api logs for checksum mismatch errors; they should no longer appear. Send a test trace and verify that all spans complete.
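Before reading the logs, you can confirm that the running pods picked up the variables, for example by inspecting the output of `kubectl exec deployment/braintrust-api -- printenv`. The parser below assumes that output format (one `KEY=VALUE` per line). It also assumes the AWS SDK variable names `AWS_REQUEST_CHECKSUM_CALCULATION` and `AWS_RESPONSE_CHECKSUM_VALIDATION` with the value `WHEN_REQUIRED`, which this copy of the article does not spell out.

```python
from typing import Dict, List

# Assumed names and values: the AWS SDK for JavaScript v3 checksum settings.
EXPECTED: Dict[str, str] = {
    "AWS_REQUEST_CHECKSUM_CALCULATION": "WHEN_REQUIRED",
    "AWS_RESPONSE_CHECKSUM_VALIDATION": "WHEN_REQUIRED",
}


def parse_printenv(output: str) -> Dict[str, str]:
    """Parse `printenv`-style KEY=VALUE lines; a value may itself contain '='."""
    env: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            env[key] = value
    return env


def missing_settings(output: str, expected: Dict[str, str] = EXPECTED) -> List[str]:
    """List expected variables that are absent or set to a different value."""
    env = parse_printenv(output)
    return [name for name, value in expected.items() if env.get(name) != value]
```

An empty list means both settings are present with the expected values.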