Skip to content

QueryPlan Reference

A QueryPlan is a JSON/dict document that describes what data to fetch. The LLM produces it; the compiler turns it into SQL. You can also write plans by hand — the compiler doesn't care about the source.

Mental model

Think of a QueryPlan as a structured SELECT intent: which table, which filters, which aggregations, how to sort, and how many rows. The compiler fills in all the SQL syntax — the plan has zero SQL keywords.


Top-Level Structure

{
  "version": "1.0",
  "dataset": "orders",
  "filters":    [...],
  "dimensions": [...],
  "metrics":    [...],
  "order_by":   [...],
  "limit":      100,
  "offset":     0,
  "rollup":     {...}
}
Field Type Required Default Description
version "1.0" no "1.0" Schema version — always "1.0"
dataset string yes Logical table name (must be in schema.yaml)
filters array no [] WHERE conditions
dimensions array no [] GROUP BY columns + SELECT columns
metrics array no [] Aggregate expressions
order_by array no [] ORDER BY clauses
limit integer | null no 100 Row limit (max 1000)
offset integer no 0 Row offset for pagination
rollup object no Two-level aggregation (see Rollup)

Unknown fields are rejected

The QueryPlan model uses extra="forbid". Any field not listed above will cause a QueryPlanError immediately. This prevents prompt injection payloads that try to smuggle extra data through the plan.


Filters

{"field": "ship_country", "op": "=", "value": "Germany"}
Field Type Required Description
field string yes Logical column name (optionally qualified: "orders.ship_country")
op Operator yes Comparison operator — see table below
value any depends on op Filter value — becomes a bind parameter

Operators

Comparison

Operator SQL equivalent value
= col = :p scalar
!= col != :p scalar
> col > :p scalar (numeric / date)
>= col >= :p scalar
< col < :p scalar
<= col <= :p scalar

Membership

Operator SQL equivalent value
in col IN (:p0, :p1, ...) non-empty array
not_in col NOT IN (...) non-empty array

Empty IN lists are rejected

in / not_in values must be non-empty arrays. Empty lists now fail early with QueryPlanError instead of generating invalid SQL behavior.

Text

Operator SQL equivalent value
contains col ILIKE '%value%' string
not_contains col NOT ILIKE '%value%' string
starts_with col ILIKE 'value%' string
ends_with col ILIKE '%value' string

contains uses ILIKE

Text operators are case-insensitive (ILIKE). This is intentional — LLMs should not need to know the case of values in the database.

Null Checks

Operator SQL equivalent value
is_null col IS NULL omit or null
is_not_null col IS NOT NULL omit or null

Range

between is not currently supported by the compiler. Use two comparisons with and:

{
  "and": [
    {"cmp": {"left": {"col": "order_date"}, "op": ">=", "right": "2026-01-01"}},
    {"cmp": {"left": {"col": "order_date"}, "op": "<=", "right": "2026-01-31"}}
  ]
}

Relative Dates

Instead of a literal date string, filters can use a $relative_date sentinel that resolves to a concrete UTC timestamp at query time:

{
  "field": "order_date",
  "op": ">=",
  "value": {"$relative_date": {"op": "now_minus_days", "days": 30}}
}
Sentinel op Resolves to
now_minus_days datetime.utcnow() - timedelta(days=N)
now_minus_hours datetime.utcnow() - timedelta(hours=N)
today date.today().isoformat()

Dimensions

Dimensions map to SELECT + GROUP BY:

{"field": "ship_country", "alias": "country"}
Field Type Required Description
field string yes Logical column name (optionally qualified)
alias string no AS alias in SELECT — defaults to the column name

Metrics

Metrics map to aggregate SELECT expressions:

{"agg": "sum", "field": "freight", "alias": "total_freight"}
Field Type Required Description
agg Aggregation yes Aggregation function — see table below
field string depends Logical column name — required for all aggs except count(*)
alias string yes AS alias — must be unique within the plan

Aggregations

Agg SQL Field required?
count count(*) or count(col) no (use "*" for count(*))
count_distinct count(DISTINCT col) yes
sum sum(col) yes
avg avg(col) yes
min min(col) yes
max max(col) yes

Counting entities

To count distinct customers: {"agg": "count_distinct", "field": "customer_id", "alias": "n_customers"}.
To count rows: {"agg": "count", "field": "*", "alias": "n_rows"}.


Order By

{"by": "total_freight", "dir": "desc"}
Field Type Required Default Description
by string yes Metric alias or dimension alias/field name
dir "asc" | "desc" no "asc" Sort direction

Rollup

Rollup implements two-level aggregation — aggregate an already-aggregated inner query:

{
  "dataset": "order_details",
  "dimensions": [{"field": "order_details.order_id"}],
  "metrics": [{"agg": "sum", "field": "order_details.unit_price", "alias": "revenue"}],
  "rollup": {
    "metrics": [{"agg": "avg", "field": "revenue", "alias": "avg_revenue"}],
    "limit": 1,
    "offset": 0
  }
}

The compiler generates:

WITH _inner AS (
    SELECT order_id, sum(unit_price) AS revenue
    FROM order_details
    GROUP BY order_id
)
SELECT avg(revenue) AS avg_revenue
FROM _inner
LIMIT 1
Field Type Required Default Description
metrics array yes Outer aggregations — field must reference an inner metric alias
limit integer no 1 Outer LIMIT — minimum 1
offset integer no 0 Outer OFFSET

Rollup field references inner aliases, not columns

In the rollup metrics array, "field" must be the alias of a metric in the inner plan (e.g. "revenue"), not a raw column name. The compiler will raise a QueryPlanError otherwise.

When to use Rollup

Use rollup when the question asks for an aggregate of an aggregate:

Question pattern Needs rollup?
"total revenue" no — single-level sum
"revenue per customer" no — sum grouped by customer_id
"average revenue per customer" yesavg(sum(revenue) per customer)
"standard deviation of orders per day" yes
"top customer by number of orders" no — count grouped by customer_id + ORDER BY + LIMIT

Column Qualification

When a plan joins multiple tables, column references must be qualified to avoid ambiguity:

{"field": "orders.freight"}        // qualified
{"field": "freight"}               // unqualified — only valid if unambiguous

Use the logical table name, not the physical db_table name.

If an unqualified column exists in multiple joined tables, the compiler raises AmbiguousColumnError.

Explicit Joins and as

When using advanced plans with explicit joins, you can join the same dataset more than once only if you provide as aliases:

{
  "dataset": "orders",
  "joins": [
    {
      "dataset": "orders",
      "as": "o2",
      "type": "inner",
      "on": {"cmp": {"left": {"col": "orders.order_id"}, "op": "=", "right": {"col": "o2.order_id"}}}
    }
  ],
  "select": [{"expr": {"col": "orders.order_id"}, "alias": "order_id"}]
}

Duplicate explicit joins without a unique as alias are rejected with QueryPlanError.


Examples

Scalar aggregate

{
  "dataset": "orders",
  "metrics": [{"agg": "count", "field": "*", "alias": "total_orders"}],
  "limit": 1
}

Filtered list

{
  "dataset": "products",
  "filters": [
    {"field": "discontinued", "op": "=", "value": false},
    {"field": "unit_price", "op": ">=", "value": 20}
  ],
  "dimensions": [
    {"field": "product_name"},
    {"field": "unit_price"}
  ],
  "order_by": [{"by": "unit_price", "dir": "desc"}],
  "limit": 20
}

Group-by with multiple metrics

{
  "dataset": "orders",
  "dimensions": [{"field": "ship_country", "alias": "country"}],
  "metrics": [
    {"agg": "count", "field": "*", "alias": "n_orders"},
    {"agg": "sum", "field": "freight", "alias": "total_freight"},
    {"agg": "avg", "field": "freight", "alias": "avg_freight"}
  ],
  "order_by": [{"by": "n_orders", "dir": "desc"}],
  "limit": 10
}

Cross-table query (auto join)

{
  "dataset": "orders",
  "filters": [{"field": "customers.country", "op": "=", "value": "UK"}],
  "dimensions": [
    {"field": "customers.company_name", "alias": "customer"},
    {"field": "orders.order_date"}
  ],
  "metrics": [{"agg": "sum", "field": "orders.freight", "alias": "freight"}],
  "order_by": [{"by": "freight", "dir": "desc"}],
  "limit": 25
}

auto_inject_joins will resolve the orderscustomers join automatically.

Relative date filter

{
  "dataset": "orders",
  "filters": [
    {
      "field": "order_date",
      "op": ">=",
      "value": {"$relative_date": {"op": "now_minus_days", "days": 90}}
    }
  ],
  "metrics": [{"agg": "count", "field": "*", "alias": "recent_orders"}],
  "limit": 1
}