# DataSet
DataSet manifests describe derived tables built using SQL or PRQL (Pipelined Relational Query Language).
Datasets typically select from DataSource views and may depend on other datasets.
## Spec overview

```yaml
apiVersion: bino.bi/v1alpha1
kind: DataSet
metadata:
  name: sales_summary
spec:
  query: |
    SELECT ...
  dependencies:
    - sales_csv
    - exchange_rates
```

Fields:

- `spec.query` – SQL query, either inline or loaded from an external file. Required if `spec.prql` and `spec.source` are not set.
- `spec.prql` – PRQL query, either inline or loaded from an external file. When set, it takes precedence over `spec.query`. The PRQL is sent directly to the query engine, which compiles it via the prql extension.
- `spec.source` – direct pass-through to a DataSource without transformation. Mutually exclusive with `query` and `prql`.
- `spec.dependencies` – optional list of datasource references. Entries can be string names or inline DataSource definitions. Inline definitions are referenced via the `@inline(N)` syntax in queries.
## External query files

For complex queries, you can store them in separate `.sql` or `.prql` files and reference them using the `$file` syntax. This keeps your YAML manifests clean and enables syntax highlighting in your editor.
### Loading SQL from an external file

```yaml
apiVersion: bino.bi/v1alpha1
kind: DataSet
metadata:
  name: sales_summary
spec:
  query:
    $file: ./queries/sales_summary.sql
  dependencies:
    - sales_csv
```

### Loading PRQL from an external file
```yaml
apiVersion: bino.bi/v1alpha1
kind: DataSet
metadata:
  name: customer_orders
spec:
  prql:
    $file: ./queries/customer_orders.prql
  dependencies:
    - orders
    - customers
```

### File path resolution
- Paths are resolved relative to the manifest file containing the `$file` reference (see the sketch below)
- Both `./relative/path.sql` and `relative/path.sql` formats are supported
- Absolute paths are also supported but not recommended for portability
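For example, with the recommended project layout shown later on this page (manifests in `manifests/`, queries in `queries/`), a manifest references a query file with a path relative to itself. A minimal sketch, assuming that layout:

```yaml
# manifests/datasets.yaml
apiVersion: bino.bi/v1alpha1
kind: DataSet
metadata:
  name: sales_by_region
spec:
  query:
    # Resolved relative to this manifest file (manifests/datasets.yaml),
    # so the query lives at <project root>/queries/sales_by_region.sql.
    $file: ../queries/sales_by_region.sql
  dependencies:
    - sales_csv
```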
### Auto-reload and caching

External query files are fully integrated with bino’s caching and hot-reload system:

- Cache invalidation: When an external SQL or PRQL file changes, the dataset cache is automatically invalidated
- Hot-reload in preview: Changes to external query files trigger an automatic refresh in `bino preview` mode
- Dependency graph: External files appear in `bino graph` output for visibility
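In practice this means you can keep `bino preview` running while you edit query files. A rough sketch of that workflow (only the commands named above; the editor choice is yours):

```sh
# Start preview mode; bino watches manifests and referenced $file queries
bino preview

# In another terminal, edit an external query file.
# Saving it invalidates the dataset's cache entry and refreshes the preview.
$EDITOR queries/sales_summary.sql

# Inspect where the external file sits in the dependency graph
bino graph
```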
### Organizing query files

A recommended project structure for external queries:

```
my-report/
├── bino.toml
├── manifests/
│   ├── datasources.yaml
│   └── datasets.yaml
├── queries/
│   ├── sales_by_region.sql
│   ├── monthly_revenue.sql
│   └── customer_analysis.prql
└── data/
    └── sales.csv
```

## Simple aggregation
```yaml
---
apiVersion: bino.bi/v1alpha1
kind: DataSource
metadata:
  name: sales_csv
spec:
  type: csv
  path: ./data/sales.csv
---
apiVersion: bino.bi/v1alpha1
kind: DataSet
metadata:
  name: sales_by_region
spec:
  query: |
    SELECT
      region,
      SUM(amount) AS total_amount
    FROM sales_csv
    GROUP BY region
  dependencies:
    - sales_csv
```

## Dataset depending on multiple inputs
```yaml
---
apiVersion: bino.bi/v1alpha1
kind: DataSource
metadata:
  name: sales_fact
spec:
  type: parquet
  path: ./warehouse/fact_sales/*.parquet
---
apiVersion: bino.bi/v1alpha1
kind: DataSource
metadata:
  name: currency_rates
spec:
  type: csv
  path: ./data/currency_rates.csv
---
apiVersion: bino.bi/v1alpha1
kind: DataSet
metadata:
  name: revenue_eur
spec:
  query: |
    SELECT
      f.region,
      f.booking_date,
      f.amount * r.rate_to_eur AS revenue_eur
    FROM sales_fact f
    JOIN currency_rates r
      ON r.currency = f.currency
      AND r.valid_on = f.booking_date
  dependencies:
    - sales_fact
    - currency_rates
```

Datasets are referenced from layouts via their `metadata.name` using the `dataset` field of charts, tables, and text components.
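For instance, a chart could point at the `revenue_eur` dataset by name. A minimal sketch, assuming the named reference is given as a plain string on the `dataset` field (the inline object form is covered later on this page):

```yaml
apiVersion: bino.bi/v1alpha1
kind: ChartStructure
metadata:
  name: revenue_chart
spec:
  dataset: revenue_eur   # references the DataSet's metadata.name
  chartTitle: Revenue in EUR
  chartType: bar
  # ... chart configuration
```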
## Using PRQL

PRQL is a modern language for transforming data that compiles to SQL. It offers a more readable, pipeline-based syntax than traditional SQL.

When you specify `spec.prql`, bino automatically loads the PRQL extension and sends your PRQL query directly to the query engine for compilation and execution.
### Simple PRQL example

```yaml
---
apiVersion: bino.bi/v1alpha1
kind: DataSource
metadata:
  name: sales_csv
spec:
  type: csv
  path: ./data/sales.csv
---
apiVersion: bino.bi/v1alpha1
kind: DataSet
metadata:
  name: sales_by_region
spec:
  prql: |
    from sales_csv
    filter amount > 0
    group {region} (
      aggregate {
        total_amount = sum amount,
        order_count = count this
      }
    )
    sort {-total_amount}
  dependencies:
    - sales_csv
```

### PRQL with joins and transformations
```yaml
---
apiVersion: bino.bi/v1alpha1
kind: DataSource
metadata:
  name: orders
spec:
  type: csv
  path: ./data/orders.csv
---
apiVersion: bino.bi/v1alpha1
kind: DataSource
metadata:
  name: customers
spec:
  type: csv
  path: ./data/customers.csv
---
apiVersion: bino.bi/v1alpha1
kind: DataSet
metadata:
  name: customer_orders
spec:
  prql: |
    from orders
    join customers (==customer_id)
    derive full_name = f"{customers.first_name} {customers.last_name}"
    select {
      order_id,
      full_name,
      order_date,
      total
    }
    sort {-order_date}
    take 100
  dependencies:
    - orders
    - customers
```

For more PRQL syntax and examples, see the PRQL documentation.
## Standard Dataset Schema

To work correctly with visualization components like Charts and Tables, your dataset queries should return rows that conform to the standard schema. Not all fields are required for every component, but following this structure ensures proper aggregation, drill-down, and filtering behavior.
### Measure Values (Scenarios)

The schema supports four parallel sets of scenario columns for comparative analysis (e.g., Actual vs Plan).

| Column | Description |
|---|---|
| `ac1` … `ac4` | Actual values (current measurements) |
| `pp1` … `pp4` | Previous Period values (Year-over-Year, Month-over-Month) |
| `fc1` … `fc4` | Forecast values (predictions) |
| `pl1` … `pl4` | Plan/Budget values (targets) |
### Grouping Dimensions

These fields determine how data is hierarchically organized and sorted.

| Dimension | Index Column | Description |
|---|---|---|
| `rowGroup` | `rowGroupIndex` | Top-level row grouping (e.g., “Revenue”, “Costs”). |
| `category` | `categoryIndex` | Primary data dimension (e.g., “Product A”, “Region North”). |
| `subCategory` | `subCategoryIndex` | Detail dimension for drill-down (creates “thereOf” rows). |
| `columnGroup` | `columnGroupIndex` | Column-level grouping for breaking down measures. |

Note: When you provide a grouping dimension string (e.g., `rowGroup`), also provide the corresponding index (e.g., `rowGroupIndex`) to ensure consistent sort ordering.
### Metadata & Control Fields

| Column | Type | Description |
|---|---|---|
| `date` | string | Required for TimeCharts. ISO 8601 date (e.g., `2024-01-15`). Used for time-series axes and normalization. |
| `operation` | string | Aggregation sign: `'+'` (add) or `'-'` (subtract). Defaults to `'+'`. Useful for P&L structures where costs subtract from totals. |
| `setname` | string | Dataset identifier. Used by some charts to distinguish multiple query results. |
### Example Query

Here is an example SQL query producing a compliant dataset:

```sql
SELECT
  'Revenue' AS rowGroup,
  1 AS rowGroupIndex,
  product_name AS category,
  product_rank AS categoryIndex,
  '2024-01-15' AS date,
  SUM(sales_amount) AS ac1,
  SUM(budget_amount) AS pl1
FROM sales_data
GROUP BY product_name, product_rank
```

## Data Validation
Section titled “Data Validation”bino can validate query results against the standard schema at build time. This catches common data issues early:
Validation checks
Section titled “Validation checks”| Check | Description |
|---|---|
| Type validation | String fields (category, rowGroup, etc.) must be strings |
| Type validation | Number fields (ac1, categoryIndex, etc.) must be numbers |
| Enum validation | operation must be "+" or "-" |
| Date format | date must be ISO 8601 format (YYYY-MM-DD) |
| Dependent required | rowGroup requires rowGroupIndex, category requires categoryIndex, etc. |
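For illustration, a query like the following (a deliberately broken sketch, not taken from a real project) would trip several of these checks:

```sql
SELECT
  'Revenue' AS rowGroup,       -- provided without rowGroupIndex: dependent-required check fails
  product_name AS category,
  'first' AS categoryIndex,    -- index columns must be numbers, not strings
  '15/01/2024' AS date,        -- not ISO 8601 (expected YYYY-MM-DD)
  'minus' AS operation,        -- operation must be "+" or "-"
  SUM(sales_amount) AS ac1
FROM sales_data
GROUP BY product_name
```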
### Enabling validation

Data validation is enabled by default in warn mode. Configure it via CLI flags:

```sh
# Log warnings and continue (default)
bino build --data-validation=warn

# Treat validation errors as fatal
bino build --data-validation=fail

# Skip validation entirely
bino build --data-validation=off
```

For lint-only validation (without building), use `--execute-queries`:
```sh
bino lint --execute-queries
```

### Sample size

For efficiency, bino validates only the first N rows of each dataset. Configure the sample size via an environment variable:

```sh
BNR_DATA_VALIDATION_SAMPLE_SIZE=100 bino build
```

The default sample size is 10 rows.
## Conditional inclusion with constraints

DataSet documents support `metadata.constraints` to conditionally include them for specific artefacts, modes, or environments.
### Preview-only datasets

Include debug or diagnostic datasets only during development:

```yaml
apiVersion: bino.bi/v1alpha1
kind: DataSet
metadata:
  name: debug_metrics
  constraints:
    - mode==preview
spec:
  query: |
    SELECT * FROM raw_data LIMIT 100
  dependencies:
    - raw_data
```

### Format-specific datasets
Use different datasets for different output formats:

```yaml
# Detailed dataset for print (A4/Letter)
apiVersion: bino.bi/v1alpha1
kind: DataSet
metadata:
  name: sales_summary
  constraints:
    - spec.format in [a4,letter]
spec:
  query: |
    SELECT region, product, month,
           SUM(revenue) AS revenue, SUM(costs) AS costs, SUM(margin) AS margin
    FROM sales_fact
    GROUP BY region, product, month
---
# Summarized dataset for screen (XGA)
apiVersion: bino.bi/v1alpha1
kind: DataSet
metadata:
  name: sales_summary
  constraints:
    - field: spec.format
      operator: "=="
      value: xga
spec:
  query: |
    SELECT region, SUM(revenue) AS total_revenue
    FROM sales_fact
    GROUP BY region
```

For the full constraint syntax and operators, see Constraints and Scoped Names.
## Inline DataSet definitions

Instead of defining DataSets as separate documents, you can define them inline, directly within components. This is useful for simple, component-specific data transformations.
### Inline DataSet in a Chart

```yaml
apiVersion: bino.bi/v1alpha1
kind: ChartStructure
metadata:
  name: sales_chart
spec:
  dataset:
    query: |
      SELECT region, SUM(amount) AS total
      FROM sales_csv
      GROUP BY region
    dependencies:
      - sales_csv
  chartTitle: Sales by Region
  chartType: bar
  # ... chart configuration
```

### Inline DataSet with inline DataSource
You can nest inline DataSource definitions within an inline DataSet. Use the `@inline(N)` syntax to reference them by index:

```yaml
apiVersion: bino.bi/v1alpha1
kind: ChartStructure
metadata:
  name: quick_chart
spec:
  dataset:
    dependencies:
      - type: csv
        path: ./data/sales.csv
    query: |
      SELECT region, SUM(amount) AS total
      FROM @inline(0)
      GROUP BY region
  chartTitle: Sales Overview
  # ... chart configuration
```

The `@inline(0)` reference points to the first item in the `dependencies` array (0-indexed). When you have multiple inline DataSources:
```yaml
spec:
  dataset:
    dependencies:
      - type: csv
        path: ./data/orders.csv
      - type: csv
        path: ./data/customers.csv
    query: |
      SELECT c.name, COUNT(*) AS order_count
      FROM @inline(0) o
      JOIN @inline(1) c ON o.customer_id = c.id
      GROUP BY c.name
```

### Mixing inline and named references
You can mix inline DataSource definitions with named references:

```yaml
spec:
  dataset:
    dependencies:
      - sales_csv      # Named reference to an existing DataSource
      - type: csv      # Inline definition
        path: ./data/rates.csv
    query: |
      SELECT s.*, r.rate
      FROM sales_csv s
      JOIN @inline(1) r ON s.currency = r.code
```

Note: Named references use their document name directly in SQL, while inline definitions use `@inline(N)`.
## Direct source pass-through

For simple cases where you want to use a DataSource directly without transformation, use the `source` field:

```yaml
apiVersion: bino.bi/v1alpha1
kind: DataSet
metadata:
  name: sales_passthrough
spec:
  source: sales_csv
```

This is equivalent to `SELECT * FROM sales_csv`, but more explicit and efficient. The `source` field is mutually exclusive with `query` and `prql`.
### Inline source pass-through

You can also use inline DataSource definitions with `source`:

```yaml
apiVersion: bino.bi/v1alpha1
kind: Table
metadata:
  name: raw_data_table
spec:
  dataset:
    source:
      type: csv
      path: ./data/report.csv
  tableTitle: Raw Data
  # ... table configuration
```

## How inline definitions work
When bino processes inline definitions:

- Materialization: Inline definitions are converted to synthetic documents during YAML loading
- Hash-based naming: Generated documents get unique names like `_inline_datasource_a1b2c3d4`
- Deduplication: Identical inline definitions share the same generated document
- Labels: Generated documents carry the `bino.bi/generated: "true"` and `bino.bi/inline: "true"` labels

This means inline definitions are fully compatible with caching, the dependency graph, and all other bino features.
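Conceptually, the inline CSV DataSource from the earlier chart example would materialize into something like the following synthetic document. This is an illustrative sketch: the hash is an example, and placing the labels under `metadata.labels` is assumed from the label names listed above:

```yaml
# Generated by bino from an inline definition; not written by hand
apiVersion: bino.bi/v1alpha1
kind: DataSource
metadata:
  name: _inline_datasource_a1b2c3d4   # hash-based, shared by identical inline definitions
  labels:
    bino.bi/generated: "true"
    bino.bi/inline: "true"
spec:
  type: csv
  path: ./data/sales.csv
```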
## Naming restrictions

User-defined document names must not start with `_inline_`, as this prefix is reserved for generated inline definitions. The lint rule `inline-naming-conflict` enforces this.
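For example, a manifest like the following (a hypothetical name chosen for illustration) would be flagged by `inline-naming-conflict`:

```yaml
apiVersion: bino.bi/v1alpha1
kind: DataSet
metadata:
  name: _inline_sales_summary   # rejected: the _inline_ prefix is reserved
spec:
  source: sales_csv
```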