SQL queries in Braintrust provide a precise, standard syntax for querying Braintrust logs, experiments, and datasets. Use SQL to:
- Filter and search for relevant logs and experiments. Use WHERE clauses to filter individual records and HAVING clauses to filter aggregated results after grouping.
- Create consistent, reusable queries for monitoring.
- Build automated reporting and analysis pipelines.
- Write complex queries to analyze model performance.
Self-hosted deployments: SQL syntax support requires data plane version v1.1.29 or later.
Run SQL queries
SQL sandbox
To test SQL with autocomplete, validation, and a table of results, use the SQL sandbox in your project. In the sandbox, you can use Loop to generate and optimize queries from natural language. Example queries:
- “Find the most common errors in logs over the last week”
- “What are the highest scoring rows in my experiment”
- “Show me error distribution over time”
- “List all traces where latency exceeded 60 seconds”
- “Update the query to show error distribution over time”
- “Add a filter to only show errors from specific models”
- “Group by user instead”
The sandbox also validates your queries and helps fix:
- Syntax errors
- Schema validation issues
- Field name corrections
If a project_logs() query is missing a range filter on created, _xact_id, _pagination_key, or a specific root_span_id/id, the sandbox proactively warns you so you don’t have to wait for a timeout to discover the issue.
API access
Access SQL programmatically with the Braintrust API:
- query (required): your SQL query string.
- fmt: response format (json or parquet, defaults to json).
- tz_offset: timezone offset in minutes for time-based operations.
- audit_log: include audit log data.
- version: an _xact_id string to query data as it existed at a specific point in time (useful for recovering deleted rows). Supported for experiment and dataset sources; not supported for project_logs.
For correct day boundaries, set tz_offset to match your timezone. For example, use 480 for US Pacific Standard Time.
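For example, a minimal request sketch, assuming the api.braintrust.dev query endpoint and a BRAINTRUST_API_KEY environment variable:

```bash
curl -X POST https://api.braintrust.dev/btql \
  -H "Authorization: Bearer $BRAINTRUST_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"query\": \"SELECT * FROM project_logs('my-project-id') LIMIT 10\", \"fmt\": \"json\", \"tz_offset\": 480}"
```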
Query structure
SQL queries follow a familiar structure that lets you define what data you want, how you want it returned, and how to analyze it. The example after this list returns logs from the last 7 days from a project where Factuality is greater than 0.8, sorts by created date descending, and limits the results to 100.
- SELECT / select:: Choose which fields to retrieve.
- FROM / from:: Specify the data source — a table function or a subquery.
- WHERE / filter:: Define conditions to filter the data.
- GROUP BY / dimensions:: Group rows for aggregation.
- HAVING / final_filter:: Filter aggregated results.
- sample:: (BTQL-only) Randomly sample a subset of the filtered data (rate or count-based).
- ORDER BY / sort:: Set the order of results (ASC/DESC or asc/desc).
- LIMIT / limit:: Control result size.
- OFFSET '<CURSOR_TOKEN>' / cursor:: Pagination token from the previous query response (SQL OFFSET must be a string literal cursor token; use with cursor-compatible sorts such as _pagination_key or _xact_id).
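A sketch of that query (the interval arithmetic is assumed syntax; verify in the SQL sandbox):

```sql
SELECT *
FROM project_logs('my-project-id')
WHERE created > now() - interval 7 day
  AND scores.Factuality > 0.8
ORDER BY created DESC
LIMIT 100
```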
BTQL syntax
Braintrust also supports BTQL, an alternative pipe-delimited clause syntax. The parser automatically detects whether your query is SQL or BTQL:
- SQL queries start with SELECT, WITH, etc. followed by whitespace
- BTQL queries use clause syntax like select:, filter:, etc.
| SQL Clause | BTQL Clause |
|---|---|
| SELECT ... | select: ... |
| FROM table('id', shape => 'traces') | from: table('id') traces |
| WHERE ... | filter: ... |
| GROUP BY ... | dimensions: ... |
| HAVING ... | final_filter: ... |
| ORDER BY ... | sort: ... |
| LIMIT n | limit: n |
| OFFSET '<CURSOR_TOKEN>' | cursor: '<CURSOR_TOKEN>' |
SQL syntax specifies the shape with a named parameter (e.g., FROM experiment('id', shape => 'traces')), while BTQL uses a trailing token (e.g., from: experiment('id') traces). Table aliases on top-level table functions (e.g., FROM project_logs('id') AS t) are reserved for future use. Aliases on subquery sources (e.g., FROM (...) AS sub) are required.
Full-text search: Use the MATCH infix operator for full-text search:
- WHERE input MATCH 'search term' → filter: input MATCH 'search term'
- Multiple columns require OR: WHERE input MATCH 'x' OR output MATCH 'x' → filter: input MATCH 'x' OR output MATCH 'x'
SELECT
SELECT lets you choose specific fields, compute values, or use * to retrieve every field. Use SELECT alone to retrieve individual records, or combine it with GROUP BY to aggregate results. Both work with all data shapes (spans, traces, and summary).
Implicit aliasing: Multi-part identifiers like metadata.model automatically create implicit aliases using their last component (e.g., model), which you can use in WHERE, ORDER BY, and GROUP BY clauses when unambiguous. See Field access for details.
Use expressions in the SELECT clause. This query returns metadata.model, whether metrics.tokens is greater than 1000, and a quality indicator of either “high” or “low” depending on whether or not the Factuality score is greater than 0.8.
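A sketch of that query (the alias names are placeholders; the ternary operator is described under Conditional expressions):

```sql
SELECT
  metadata.model,
  metrics.tokens > 1000 AS over_1k_tokens,
  (scores.Factuality > 0.8 ? 'high' : 'low') AS quality
FROM project_logs('my-project-id')
```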
Use the SELECT clause to transform values and create meaningful aliases for your results. This query extracts the day the log was created, the hour, and a Factuality score rounded to 2 decimal places.
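A sketch of that query (the function names date_trunc, hour, and round are assumptions from common SQL dialects; check the SQL functions list for the exact names):

```sql
SELECT
  date_trunc('day', created) AS day,
  hour(created) AS hour,
  round(scores.Factuality, 2) AS factuality
FROM project_logs('my-project-id')
```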
GROUP BY for aggregations
Instead of a simple SELECT, you can use SELECT ... GROUP BY to group and aggregate data. This query returns a row for each distinct model with the day it was created, the total number of calls, the average Factuality score, and the latency percentile.
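A sketch of that query (date_trunc is an assumed function name; metrics.duration is a placeholder latency field):

```sql
SELECT
  metadata.model,
  date_trunc('day', created) AS day,
  count(1) AS total_calls,
  avg(scores.Factuality) AS avg_factuality,
  percentile(metrics.duration, 0.95) AS p95_duration
FROM project_logs('my-project-id')
GROUP BY model, day
```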
Supported aggregate functions:
- count(expr): number of rows
- count_distinct(expr): number of distinct values
- sum(expr): sum of numeric values
- avg(expr): mean (average) of numeric values
- min(expr): minimum value
- max(expr): maximum value
- any_value(expr): an arbitrary non-null value from the group for the given expression
- percentile(expr, p): a percentile where p is between 0 and 1
LIMIT works with GROUP BY queries to restrict the number of grouped results returned. When combined with ORDER BY, rows are sorted before limiting. See LIMIT for examples.
HAVING for filtering aggregations
HAVING filters the results after aggregation, letting you narrow down grouped data based on aggregate values. Use HAVING with GROUP BY when you need to filter by aggregated metrics like counts, averages, or sums.
This query returns models with high average scores:
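A sketch (HAVING can reference the aggregate by its alias):

```sql
SELECT metadata.model, avg(scores.Factuality) AS avg_score
FROM project_logs('my-project-id')
GROUP BY model
HAVING avg_score > 0.8
```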
Combine WHERE and HAVING to filter both before and after aggregation. This query filters individual logs before grouping, then filters the aggregated results:
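A sketch (the interval arithmetic is assumed syntax):

```sql
SELECT metadata.model, count(1) AS calls, avg(scores.Factuality) AS avg_score
FROM project_logs('my-project-id')
WHERE created > now() - interval 7 day
GROUP BY model
HAVING calls > 100 AND avg_score > 0.8
```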
HAVING supports the same operators and aggregate functions as other clauses. You can reference aggregated values by their alias or by repeating the aggregate expression.
FROM
The FROM clause identifies where the records are coming from.
Data sources
- experiment('<experiment_id1>', <experiment_id2>): A specific experiment or list of experiments.
- dataset('<dataset_id1>', <dataset_id2>): A specific dataset or list of datasets.
- prompt('<prompt_id1>', <prompt_id2>): A specific prompt or list of prompts.
- function('<function_id1>', <function_id2>): A specific function or list of functions.
- view('<view_id1>', <view_id2>): A specific saved view or list of saved views.
- project_logs('<project_id1>', <project_id2>): All logs for a specific project or list of projects.
- project_prompts('<project_id1>', <project_id2>): All prompts for a specific project or list of projects.
- project_functions('<project_id1>', <project_id2>): All functions for a specific project or list of projects.
- org_prompts('<org_id1>', <org_id2>): All prompts for a specific organization or list of organizations.
- org_functions('<org_id1>', <org_id2>): All functions for a specific organization or list of organizations.
Subqueries
A subquery uses the result of an inner query as the data source for an outer query, as in the sketch below. Span filtering (span_filter, trace_filter, ANY_SPAN(), FILTER_SPANS()) is only supported in the innermost query, where it runs against raw data.
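A sketch of a subquery source (the alias on the subquery is required):

```sql
SELECT model, avg_score
FROM (
  SELECT metadata.model, avg(scores.Factuality) AS avg_score
  FROM project_logs('my-project-id')
  GROUP BY model
) AS model_scores
WHERE avg_score > 0.5
```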
- span_filter and trace_filter are not supported on subquery sources.
- ANY_SPAN() and FILTER_SPANS() are not supported in the outer WHERE clause when the source is a subquery.
- HAVING (final_filter) is supported.
Data shapes
You can add an optional parameter to the FROM clause that defines how the data is returned. The options are spans (default), traces, and summary.
spans
spans returns individual spans that match the filter criteria. This example returns 10 LLM call spans that took more than 0.2 seconds to use the first token.
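A sketch of that query (the span_attributes.type field for identifying LLM spans is an assumption):

```sql
SELECT *
FROM project_logs('my-project-id', shape => 'spans')
WHERE span_attributes.type = 'llm'  -- field name assumed
  AND metrics.time_to_first_token > 0.2
LIMIT 10
```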
traces
traces returns all spans from traces that contain at least one matching span. This is useful when you want to see the full context of a specific event or behavior, for example if you want to see all spans in traces where an error occurred.
This example returns all spans for a specific trace where one span in the trace had an error.
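A sketch (the traces shape expands each matching span to its full trace):

```sql
SELECT *
FROM project_logs('my-project-id', shape => 'traces')
WHERE error IS NOT NULL
```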
summary
summary provides trace-level views of your data by aggregating metrics across all spans in a trace. This shape is useful for analyzing overall performance and comparing results across experiments.
The summary shape can be used in two ways:
- Individual trace summaries (using SELECT): Returns one row per trace with aggregated span metrics. Use this to see trace-level details. Example: “What are the details of traces with errors?”
- Aggregated trace analytics (using GROUP BY): Groups multiple traces and computes statistics. Use this to analyze patterns across many traces. Example: “What’s the average cost per model per day?”
Use SELECT with the summary shape to retrieve individual traces with aggregated metrics. This is useful for inspecting specific trace details, debugging issues, or exporting trace-level data. This example returns 10 summary rows from the project logs for ‘my-project-id’:
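A minimal sketch:

```sql
SELECT *
FROM project_logs('my-project-id', shape => 'summary')
LIMIT 10
```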
Summary rows include some aggregated metrics and some preview fields that show data from the root span of the trace. The following fields are aggregated metrics across all spans in the trace:
- scores: an object with all scores averaged across all spans
- metrics: an object with aggregated metrics across all spans
  - prompt_tokens: total number of prompt tokens used
  - completion_tokens: total number of completion tokens used
  - prompt_cached_tokens: total number of cached prompt tokens used
  - prompt_cache_creation_tokens: total number of tokens used to create cache entries
  - total_tokens: total number of tokens used (prompt + completion)
  - estimated_cost: total estimated cost of the trace in US dollars (prompt + completion costs)
  - llm_calls: total number of LLM calls
  - tool_calls: total number of tool calls
  - errors: total number of errors (LLM + tool errors)
  - llm_errors: total number of LLM errors
  - tool_errors: total number of tool errors
  - start: Unix timestamp of the first span start time
  - end: Unix timestamp of the last span end time
  - duration: wall-clock elapsed time of the trace in seconds, from the earliest span start to the latest span end
  - llm_duration: sum of all durations across LLM spans in seconds
  - time_to_first_token: the average time to first token across LLM spans in seconds
- span_type_info: an object with span type info. Some fields in this object are aggregated across all spans and some reflect attributes from the root span.
  - cached: true only if all LLM spans were cached
  - has_error: true if any span had an error
Preview fields come from the root span of the trace: input, output, expected, error, and metadata.
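For aggregated trace analytics, group summary rows instead. A sketch of average cost per model per day (assumes metadata.model is populated on the root span and date_trunc is available):

```sql
SELECT
  metadata.model,
  date_trunc('day', created) AS day,
  avg(metrics.estimated_cost) AS avg_cost
FROM project_logs('my-project-id', shape => 'summary')
GROUP BY model, day
```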
WHERE
The WHERE clause lets you specify conditions to narrow down results. It supports a wide range of operators and functions, including complex conditions.
This example WHERE clause (sketched after the list) only retrieves data where:
- Factuality score is greater than 0.8
- model is “gpt-4”
- tag list contains “triage” (using MATCH for SQL mode; use BTQL INCLUDES for exact matching)
- input contains the word “question” (case-insensitive)
- created date is later than January 1, 2024
- more than 1000 tokens were used or the trace was produced in production
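A sketch of the full clause (metadata.env is a hypothetical field marking production traffic, and the date literal form is an assumption):

```sql
SELECT *
FROM project_logs('my-project-id')
WHERE scores.Factuality > 0.8
  AND metadata.model = 'gpt-4'
  AND tags MATCH 'triage'
  AND input ILIKE '%question%'
  AND created > '2024-01-01'
  AND (metrics.tokens > 1000 OR metadata.env = 'production')
```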
Single span filters
By default, each returned trace includes at least one span that matches all filter conditions. Use ANY_SPAN() to wrap any filter expression and find traces where at least one span matches the specified condition.
Single span filters work with the traces and summary data shapes.
ANY_SPAN() can be combined with GROUP BY to aggregate traces based on span-level conditions. This is useful for analyzing patterns across traces that contain specific types of spans. See Analyze based on tags and scores, Analyze based on tags, and Analyze traces with span filters for examples.
By default, ANY_SPAN() matches against all spans in a trace. To restrict matching to only root spans, add is_root to the condition: ANY_SPAN(is_root AND error IS NOT NULL).
ANY_SPAN() supports one level of nesting. Nested calls are flattened, which allows query builders and the Braintrust UI to compose filters by wrapping conditions in ANY_SPAN(). For example:
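A sketch (the nested call below is flattened to a single ANY_SPAN):

```sql
SELECT *
FROM project_logs('my-project-id', shape => 'traces')
WHERE ANY_SPAN(ANY_SPAN(error IS NOT NULL))
```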
Deeper nesting (e.g., ANY_SPAN(ANY_SPAN(ANY_SPAN(...)))) is not supported. NOT ANY_SPAN() does not support multiple span filter clauses — for example, NOT ANY_SPAN(ANY_SPAN(a) AND ANY_SPAN(b)) is not supported.
Matching spans filters
While ANY_SPAN() helps you find traces you care about, matching spans filters let you filter spans within the traces you’ve already found. Use FILTER_SPANS() to return only the matching spans from those traces, rather than entire traces. This is analogous to trace_filter in BTQL.
Matching spans filters work with the traces and summary data shapes. On the spans shape, FILTER_SPANS() acts as a no-op wrapper.
Combine FILTER_SPANS() with other filter conditions:
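A sketch (the interval arithmetic is assumed syntax):

```sql
SELECT *
FROM project_logs('my-project-id', shape => 'traces')
WHERE FILTER_SPANS(metrics.tokens > 1000)
  AND created > now() - interval 1 day
```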
Full-text search
Use MATCH to search a specific field for exact word matches, or search() to search across all text fields at once.
search() is equivalent to writing input MATCH query OR output MATCH query OR ... for each text field. When log search optimization is enabled, search() also benefits from bloom filter acceleration to skip irrelevant segments.
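For example:

```sql
-- Search one field for exact word matches:
SELECT * FROM project_logs('my-project-id') WHERE input MATCH 'refund'

-- Search across all text fields at once:
SELECT * FROM project_logs('my-project-id') WHERE search('refund')
```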
Pattern matching
SQL supports the % wildcard for pattern matching with LIKE (case-sensitive) and ILIKE (case-insensitive).
The % wildcard matches any sequence of zero or more characters.
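For example:

```sql
SELECT *
FROM project_logs('my-project-id')
WHERE metadata.model LIKE 'gpt-4%'   -- case-sensitive prefix match
   OR input ILIKE '%refund%'         -- case-insensitive substring match
```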
Time intervals
SQL supports intervals for time-based operations. This query returns all project logs from ‘my-project-id’ that were created in the last day.
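A sketch (the exact interval spelling is an assumption; verify in the SQL sandbox):

```sql
SELECT *
FROM project_logs('my-project-id')
WHERE created > now() - interval 1 day
```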
ORDER BY
The ORDER BY clause determines the order of results. The options are DESC (descending) and ASC (ascending). You can sort by a single field, multiple fields, or computed values.
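For example:

```sql
SELECT *
FROM project_logs('my-project-id')
ORDER BY scores.Factuality DESC, created ASC
LIMIT 50
```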
PIVOT and UNPIVOT
PIVOT and UNPIVOT are advanced operations that transform your results for easier analysis and comparison. Both SQL and BTQL syntax support these operations.
PIVOT
PIVOT transforms rows into columns, which makes comparisons easier by creating a column for each distinct value. This is useful when comparing metrics across different categories, models, or time periods.
Structure:
- The pivot column must be a single identifier (e.g., metadata.model)
- Must include at least one aggregate measure (e.g., SUM(value), AVG(score))
- Only IN (ANY) is supported (explicit value lists, subqueries, ORDER BY, and DEFAULT ON NULL are not supported)
- The SELECT list must include the pivot column, all measures, and all GROUP BY columns (or use SELECT *)
Output column naming: when pivoting metadata.model with a model named “gpt-4” for measure avg_score, the column becomes gpt-4_avg_score. When using aliases, the alias replaces the measure name in the output column.
Single aggregate - pivot one metric across categories:
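A sketch pivoting average Factuality across models (the exact PIVOT clause shape is an assumption; verify in the SQL sandbox):

```sql
-- Produces one output column per distinct model, e.g. gpt-4_avg_score.
SELECT metadata.model, avg(scores.Factuality) AS avg_score
FROM project_logs('my-project-id')
PIVOT (avg(scores.Factuality) AS avg_score FOR metadata.model IN (ANY))
```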
Combine PIVOT with GROUP BY for multi-dimensional analysis:
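A sketch (the interaction of GROUP BY with the PIVOT clause is an assumption; the SELECT list includes the pivot column, the measure, and the GROUP BY column per the rules above):

```sql
-- One row per day, one avg_score column per model.
SELECT date_trunc('day', created) AS day,
       metadata.model,
       avg(scores.Factuality) AS avg_score
FROM project_logs('my-project-id')
GROUP BY day
PIVOT (avg(scores.Factuality) AS avg_score FOR metadata.model IN (ANY))
```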
SELECT * - automatically includes all required columns:
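A sketch (same assumed PIVOT shape as above):

```sql
-- SELECT * automatically includes the pivot column, measures, and GROUP BY columns.
SELECT *
FROM project_logs('my-project-id')
PIVOT (avg(scores.Factuality) AS avg_score FOR metadata.model IN (ANY))
```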
UNPIVOT
UNPIVOT transforms columns into rows, which is useful when you need to analyze arbitrary scores and metrics without specifying each field name in advance. This is helpful when working with dynamic sets of metrics or when you want to normalize data for aggregation.
Key-value unpivot - transforms an object into rows with key-value pairs:
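A sketch (the UNPIVOT clause shape and the name/value column names are assumptions; verify in the SQL sandbox):

```sql
-- Each scores object becomes (name, value) rows, e.g. ('Factuality', 0.82).
SELECT id, name, value
FROM project_logs('my-project-id')
UNPIVOT (value FOR name IN (scores))
```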
When using key-value unpivot, the source column must be an object (e.g., scores). When using array unpivot with _, the source column must be an array (e.g., tags).
Array unpivot - use _ as the name column:
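A sketch (same assumed UNPIVOT shape, with _ in the name position to mark array unpivot):

```sql
-- One row per tags element.
SELECT id, value AS tag
FROM project_logs('my-project-id')
UNPIVOT (value FOR _ IN (tags))
```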
Chain multiple UNPIVOT operations to expand multiple columns:
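A sketch (assumed syntax; each UNPIVOT multiplies the rows of the previous one):

```sql
SELECT id, name, value, tag
FROM project_logs('my-project-id')
UNPIVOT (value FOR name IN (scores))
UNPIVOT (tag FOR _ IN (tags))
```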
Combine UNPIVOT with GROUP BY to aggregate across unpivoted rows:
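A sketch (same assumed UNPIVOT shape):

```sql
-- Average each score across all rows, keyed by score name.
SELECT name, avg(value) AS avg_value
FROM project_logs('my-project-id')
UNPIVOT (value FOR name IN (scores))
GROUP BY name
```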
LIMIT and cursors
LIMIT
The LIMIT clause controls the size of the result in number of records.
When you use LIMIT with GROUP BY, it restricts the number of grouped results returned. This is useful for getting top-N results after aggregation. When combined with ORDER BY, rows are sorted before limiting.
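For example, a top-5 sketch:

```sql
SELECT metadata.model, count(1) AS calls
FROM project_logs('my-project-id')
GROUP BY model
ORDER BY calls DESC
LIMIT 5
```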
Cursors for pagination
Cursors are supported in both SQL and BTQL queries. Cursors are automatically returned in query responses. After an initial query, pass the returned cursor token in the follow-on query. When a cursor has reached the end of the result set, the data array will be empty, and no cursor token will be returned.
In SQL syntax, pass cursor tokens using OFFSET '<CURSOR_TOKEN>'. Numeric offsets are not supported. For cursor pagination, use cursor-compatible sorts such as _pagination_key (recommended) or _xact_id.
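For example, a pagination sketch:

```sql
-- First page: use a cursor-compatible sort.
SELECT * FROM project_logs('my-project-id')
ORDER BY _pagination_key DESC
LIMIT 100

-- Follow-on page: pass the cursor token returned by the previous response.
SELECT * FROM project_logs('my-project-id')
ORDER BY _pagination_key DESC
LIMIT 100
OFFSET '<CURSOR_TOKEN>'
```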
Expressions
SQL operators
You can use the following operators in your SQL queries.
SQL functions
You can use the following functions in SELECT, WHERE, GROUP BY clauses, and aggregate measures.
Field access
SQL provides flexible ways to access nested data in arrays and objects. Array indices are 0-based, and negative indices count from the end (-1 is the last element).
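For example:

```sql
SELECT
  metadata.model,        -- object field access
  tags[0] AS first_tag,  -- 0-based array index
  tags[-1] AS last_tag   -- negative index counts from the end
FROM project_logs('my-project-id')
```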
If a field contains a JSON string, use the json_extract function to access values within it. The path parameter is treated as a literal string key name. See Extract data from JSON strings below for an example.
Implicit aliasing
When you reference multi-part identifiers (e.g., metadata.category), SQL automatically creates an implicit alias using the last component of the path (e.g., category). This allows you to use the short form in your queries when unambiguous.
- Ambiguity prevention: If multiple fields share the same last component (e.g., metadata.name and user.name), the short form name becomes ambiguous and cannot be used. You must use the full path instead.
- Top-level field priority: Top-level fields take precedence over nested fields. If you have both id and metadata.id, the short form id refers to the top-level field.
- Explicit aliases override: When you provide an explicit alias (e.g., metadata.category AS cat), the implicit alias is disabled and you must use either the explicit alias or the full path.
- Duplicate alias detection: SQL will detect and reject queries with duplicate aliases in the SELECT list, whether explicit or implicit. For example, SELECT id, user.number AS id will raise an error.
Conditional expressions
SQL supports conditional logic using the ternary operator (? :):
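For example:

```sql
SELECT (scores.Factuality > 0.8 ? 'high' : 'low') AS quality
FROM project_logs('my-project-id')
```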
Examples
Track token usage
This query helps you monitor token consumption across your application.
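A sketch (the prompt/completion token metric names are assumptions; the interval arithmetic is assumed syntax):

```sql
SELECT metadata.model,
       sum(metrics.prompt_tokens) AS prompt_tokens,
       sum(metrics.completion_tokens) AS completion_tokens,
       sum(metrics.tokens) AS total_tokens
FROM project_logs('my-project-id')
WHERE created > now() - interval 7 day
GROUP BY model
```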
Monitor model quality
Track model performance across different versions and configurations.
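A sketch:

```sql
SELECT metadata.model,
       count(1) AS calls,
       avg(scores.Factuality) AS avg_factuality
FROM project_logs('my-project-id')
GROUP BY model
ORDER BY avg_factuality DESC
```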
Analyze errors
Identify and investigate errors in your application.
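A sketch:

```sql
SELECT metadata.model, count(1) AS error_count
FROM project_logs('my-project-id')
WHERE error IS NOT NULL
GROUP BY model
ORDER BY error_count DESC
```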
Analyze latency
Monitor and optimize response times.
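A sketch (metrics.duration is a placeholder latency field):

```sql
SELECT metadata.model,
       avg(metrics.duration) AS avg_duration,
       percentile(metrics.duration, 0.95) AS p95_duration
FROM project_logs('my-project-id')
GROUP BY model
```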
Analyze prompts
Analyze prompt effectiveness and patterns.
Analyze based on tags
Use tags to track and analyze specific behaviors.
INCLUDES is not supported in SQL mode. The examples below use BTQL syntax, which supports INCLUDES for exact array membership. In SQL mode, MATCH is a fuzzy approximation (tags MATCH 'feedback'), but it performs full-text tokenization and may return false positives.
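A minimal BTQL sketch:

```
select: *
from: project_logs('my-project-id')
filter: tags INCLUDES 'feedback'
```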
Analyze based on tags and scores
A common pattern is filtering traces by both tags and scores when automated scorers apply scores at the span level while tags exist on root spans. Use separate ANY_SPAN() clauses to match traces where any span contains the tag AND any span contains the score.
Each ANY_SPAN() clause evaluates independently across all spans in a trace. The conditions don’t need to match on the same span - the first ANY_SPAN() finds traces where at least one span has the tag, while the second finds traces where at least one span has the score.
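A sketch of this pattern (the tag and score names are placeholders):

```sql
SELECT *
FROM project_logs('my-project-id', shape => 'traces')
WHERE ANY_SPAN(tags MATCH 'triage')
  AND ANY_SPAN(scores.Factuality < 0.5)
```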
Analyze traces with span filters
Use single span filters with aggregations to analyze traces based on span-level conditions. This is useful for understanding patterns across complex, multi-step operations. You can also use FILTER_SPANS() to analyze only the spans that match specific criteria:
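Two sketches (span_attributes.type is an assumed field for identifying LLM spans):

```sql
-- Count traces containing at least one error span, per model.
SELECT metadata.model, count(1) AS error_traces
FROM project_logs('my-project-id', shape => 'summary')
WHERE ANY_SPAN(error IS NOT NULL)
GROUP BY model

-- Aggregate over only the matching LLM spans.
SELECT avg(metrics.tokens) AS avg_tokens
FROM project_logs('my-project-id', shape => 'traces')
WHERE FILTER_SPANS(span_attributes.type = 'llm')
```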
Extract data from JSON strings
Use json_extract to extract values from a JSON string using a key name. This is useful when you have JSON data stored as a string field and need to access specific values within it. The path parameter is treated as a literal key name (not a path expression with traversal).
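A sketch, assuming a hypothetical metadata.raw_response field that stores a JSON string:

```sql
SELECT json_extract(metadata.raw_response, 'status') AS status
FROM project_logs('my-project-id')
WHERE json_extract(metadata.raw_response, 'status') = 'error'
```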
json_extract returns null for invalid JSON or missing keys rather than raising an error, making it safe to use in filters and aggregations. The path parameter is a literal key name, not a path expression - characters like dots, brackets, etc. are treated as part of the key name itself.
Query by classifications
Classifications are categorical labels assigned by topic maps (e.g., classifications.Task[0].label). Each classification includes .label (the category), .metadata (with distance metrics), and .source information. Classifications are generated by topics automations.
Filter and analyze logs by topic classifications to understand patterns in your data.
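For example, a sketch counting logs per Task classification (assumes a topic map named Task):

```sql
SELECT classifications.Task[0].label AS task, count(1) AS logs
FROM project_logs('my-project-id')
GROUP BY task
ORDER BY logs DESC
```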
Analyze facet distributions
Facets are AI-extracted attributes that summarize logs (e.g., facets.task, facets.sentiment). They’re generated by topics automations.
Query logs by facet values to identify patterns and issues.
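For example, a sketch of the sentiment distribution (assumes a facets.sentiment attribute):

```sql
SELECT facets.sentiment, count(1) AS logs
FROM project_logs('my-project-id')
WHERE facets.sentiment IS NOT NULL
GROUP BY sentiment
ORDER BY logs DESC
```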