Skip to content

Schema Reference

schema.yaml is the allowlist that the IntentQL compiler enforces. Only tables and columns declared here are reachable by the LLM or any QueryPlan. Anything not listed causes the compiler to raise a QueryPlanError — there is no silent fallthrough, no dynamic DB introspection.

Quick start

Copy the Northwind example in Getting Started and adapt it to your database. You only need tables and links to get going.


Top-Level Structure

version: 1          # always 1
dialect: postgres   # always postgres for now

tables:
  - ...

links:
  - ...

Tables

Each entry under tables maps a logical name (what the LLM and QueryPlan use) to a physical name (what Postgres actually has).

tables:
  - name: orders               # (required) logical name — used in QueryPlan "dataset" and field refs
    db_table: orders           # (required) physical table name in Postgres
    description: "Customer orders placed through the storefront"  # (optional) shown to LLM
    primary_id: order_id       # (optional) primary key logical name — used by semantic lint Rule 5

    columns:
      - name: order_id         # (required) logical column name
        db_column: order_id    # (required) physical column name in Postgres
        type: integer          # (optional) type hint — see supported types below
        description: "Unique order identifier"  # (optional) shown to LLM

Supported type Values

Type string Postgres type
integer, int INTEGER
bigint BIGINT
float, double FLOAT
numeric, decimal NUMERIC
varchar, text, string TEXT
boolean, bool BOOLEAN
date DATE
timestamp, datetime TIMESTAMP
uuid UUID
json, jsonb JSONB

Unknown type strings fall back to TEXT.

Quoted Identifiers

If your physical table or column name contains uppercase letters, spaces, or reserved words, IntentQL will automatically double-quote it in the generated SQL. You can also force quoting by wrapping the name in the YAML string:

- name: user_events
  db_table: '"UserEvents"'   # becomes FROM "UserEvents" in SQL
  columns:
    - name: event_type
      db_column: '"EventType"'

primary_id

The primary_id field names the primary key column's logical name. It is used by:

  1. Intent pipeline — the plan builder uses count_distinct(primary_id) instead of count(*) for accurate counting.
  2. Semantic lint Rule 5 — verifies that "how many" questions use count_distinct on the primary key.
  3. Spec builder — auto-generates correct count_distinct(primary_id) examples in the LLM prompt.

It is not required, but recommended for any table you expect users to ask "how many X" questions about.

primary_date

The primary_date field names the date column used for time range filters. When the user asks about "last year" or "last 30 days," the intent pipeline maps the time range to date filters on this column.

- name: work_orders
  db_table: '"finalWorkOrders"'
  primary_id: work_order_id
  primary_date: created_date

Without primary_date, time range mentions in questions are silently ignored.

keyword_search_or (optional)

Optional list of logical column names (at least two). When present, it tells the LLM and semantic lint that keyword-style questions should use OR (ILIKE-style contains on each column), not a single column. Declare only the columns that should participate in that OR; omit this key (or list a single column) if searches are intentionally one-column.

- name: work_orders
  db_table: finalWorkOrder
  keyword_search_or:
    - work_order_description
    - long_desc
    - shop_name
  columns:
    - ...

If keyword_search_or has two or more names and a plan uses legacy contains on any of those columns without an advanced where with or covering all of them, semantic lint flags a retry. A plan that puts two or more legacy contains filters on different keyword columns is especially wrong: legacy filters are ANDed, so each row must satisfy every predicate — narrower than OR and not the usual “keyword in any of these fields” intent.


Links declare join relationships between tables. The compiler uses these when auto_inject_joins is active, or when the QueryPlan explicitly names a link.

links:
  - name: orders_to_customers          # (required) unique link name
    from_table: orders                 # (required) logical table name — must exist in tables
    to_table: customers                # (required) logical table name — must exist in tables
    join_type: left                    # (optional) "left" (default) or "inner"
    "on":                              # (required) must be quoted! see warning below
      - left: orders.customer_id       # logical qualified column from `from_table`
        op: "="                        # currently only "=" is supported
        right: customers.customer_id   # logical qualified column from `to_table`
    optional: true                     # (optional) default true — whether the join is LEFT or INNER

on: must always be quoted

on is a YAML 1.1 reserved word that parses as boolean true. Always write "on": (with double quotes). IntentQL's schema validator will raise a SchemaError with a helpful message if it detects this issue.

# WRONG — parsed as boolean true
links:
  - name: orders_to_customers
    on:
      - left: orders.customer_id

# CORRECT
links:
  - name: orders_to_customers
    "on":
      - left: orders.customer_id

Multi-Column Joins

For composite keys, list multiple conditions under "on"::

links:
  - name: order_details_join
    from_table: order_details
    to_table: orders
    join_type: inner
    "on":
      - left: order_details.order_id
        op: "="
        right: orders.order_id

Bidirectional Traversal

Links are stored bidirectionally — auto_inject_joins can traverse them in either direction to find the shortest path between any two tables.


Full Example

version: 1
dialect: postgres

tables:
  - name: customers
    db_table: customers
    description: "Customer master records"
    primary_id: customer_id
    columns:
      - {name: customer_id, db_column: customer_id, type: varchar}
      - {name: company_name, db_column: company_name, type: varchar}
      - {name: city, db_column: city, type: varchar}
      - {name: country, db_column: country, type: varchar}

  - name: orders
    db_table: orders
    description: "Orders placed by customers"
    primary_id: order_id
    columns:
      - {name: order_id, db_column: order_id, type: integer}
      - {name: customer_id, db_column: customer_id, type: varchar}
      - {name: order_date, db_column: order_date, type: date}
      - {name: freight, db_column: freight, type: numeric, description: "Shipping cost"}
      - {name: ship_country, db_column: ship_country, type: varchar}

  - name: order_details
    db_table: order_details
    description: "Line items within each order"
    columns:
      - {name: order_id, db_column: order_id, type: integer}
      - {name: product_id, db_column: product_id, type: integer}
      - {name: unit_price, db_column: unit_price, type: numeric}
      - {name: quantity, db_column: quantity, type: smallint}
      - {name: discount, db_column: discount, type: numeric}

links:
  - name: orders_to_customers
    from_table: orders
    to_table: customers
    join_type: left
    "on":
      - left: orders.customer_id
        op: "="
        right: customers.customer_id

  - name: order_details_to_orders
    from_table: order_details
    to_table: orders
    join_type: left
    "on":
      - left: order_details.order_id
        op: "="
        right: orders.order_id

Schema Validation

When execute_query_plan or load_and_validate_schema is called, IntentQL validates the schema file and:

  • Raises SchemaError for fatal problems (missing required fields, unknown link tables, invalid join_type, "on" parsed as boolean)
  • Prints warnings for non-fatal issues (e.g. primary_id declared but not in the columns list)

You can run schema validation standalone:

from intentql.api.api import load_and_validate_schema

schema = load_and_validate_schema("config/schema.yaml")
# raises SchemaError on fatal issues
# prints warnings to stdout for non-fatal issues