DataSet
DataSet manifests describe derived tables built using SQL or PRQL (Pipelined Relational Query Language).
Datasets typically select from DataSource views and may depend on other datasets.
Spec overview
Section titled “Spec overview”apiVersion: rainbow.bino.bi/v1alpha1
kind: DataSet
metadata:
name: sales_summary
spec:
query: |
SELECT ...
dependencies:
- sales_csv
- exchange_ratesFields:
spec.query– DuckDB SQL query. Required ifspec.prqlis not set.spec.prql– PRQL query. When set, takes precedence overspec.query. The PRQL is sent directly to DuckDB which compiles it via the prql extension.spec.dependencies– optional list of datasource names that must be materialized before the query runs.
Simple aggregation
Section titled “Simple aggregation”---
apiVersion: rainbow.bino.bi/v1alpha1
kind: DataSource
metadata:
name: sales_csv
spec:
type: csv
path: ./data/sales.csv
---
apiVersion: rainbow.bino.bi/v1alpha1
kind: DataSet
metadata:
name: sales_by_region
spec:
query: |
SELECT
region,
SUM(amount) AS total_amount
FROM sales_csv
GROUP BY region
dependencies:
- sales_csvDataset depending on multiple inputs
Section titled “Dataset depending on multiple inputs”---
apiVersion: rainbow.bino.bi/v1alpha1
kind: DataSource
metadata:
name: sales_fact
spec:
type: parquet
path: ./warehouse/fact_sales/*.parquet
---
apiVersion: rainbow.bino.bi/v1alpha1
kind: DataSource
metadata:
name: currency_rates
spec:
type: csv
path: ./data/currency_rates.csv
---
apiVersion: rainbow.bino.bi/v1alpha1
kind: DataSet
metadata:
name: revenue_eur
spec:
query: |
SELECT
f.region,
f.booking_date,
f.amount * r.rate_to_eur AS revenue_eur
FROM sales_fact f
JOIN currency_rates r
ON r.currency = f.currency
AND r.valid_on = f.booking_date;
dependencies:
- sales_fact
- currency_ratesDatasets are referenced from layouts via their metadata.name using the dataset field of charts, tables, and text components.
Using PRQL
Section titled “Using PRQL”PRQL is a modern language for transforming data that compiles to SQL. It offers a more readable, pipeline-based syntax compared to traditional SQL.
When you specify spec.prql, bino automatically loads the DuckDB PRQL extension and sends your PRQL query directly to DuckDB for compilation and execution.
Simple PRQL example
Section titled “Simple PRQL example”---
apiVersion: rainbow.bino.bi/v1alpha1
kind: DataSource
metadata:
name: sales_csv
spec:
type: csv
path: ./data/sales.csv
---
apiVersion: rainbow.bino.bi/v1alpha1
kind: DataSet
metadata:
name: sales_by_region
spec:
prql: |
from sales_csv
filter amount > 0
group {region} (
aggregate {
total_amount = sum amount,
order_count = count this
}
)
sort {-total_amount}
dependencies:
- sales_csvPRQL with joins and transformations
Section titled “PRQL with joins and transformations”---
apiVersion: rainbow.bino.bi/v1alpha1
kind: DataSource
metadata:
name: orders
spec:
type: csv
path: ./data/orders.csv
---
apiVersion: rainbow.bino.bi/v1alpha1
kind: DataSource
metadata:
name: customers
spec:
type: csv
path: ./data/customers.csv
---
apiVersion: rainbow.bino.bi/v1alpha1
kind: DataSet
metadata:
name: customer_orders
spec:
prql: |
from orders
join customers (==customer_id)
derive full_name = f"{customers.first_name} {customers.last_name}"
select {
order_id,
full_name,
order_date,
total
}
sort {-order_date}
take 100
dependencies:
- orders
- customersFor more PRQL syntax and examples, see the PRQL documentation.