DataSource

DataSource manifests describe how bino loads raw data into the query engine. Each DataSource becomes a view named after its metadata.name.

metadata.name for DataSource must match the sqlIdentifier pattern:

  • ^[a-z_][a-z0-9_]*$
  • Lowercase letters, digits, and underscores only
  • Must start with a letter or underscore

Use these names directly in DataSet.spec.query.
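
The naming rule above can be checked before a manifest is committed. The following is a minimal Python sketch, not part of bino: the function name is_valid_datasource_name is hypothetical, and only the regular expression is taken from the rules above.

```python
import re

# Pattern copied from the sqlIdentifier rule in this section.
SQL_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def is_valid_datasource_name(name: str) -> bool:
    """Return True if `name` satisfies the sqlIdentifier pattern.

    fullmatch is used so that a trailing newline is rejected as well.
    """
    return SQL_IDENTIFIER.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["sales_csv", "_tmp1", "Sales", "2024_sales", "sales-daily"]:
        print(candidate, is_valid_datasource_name(candidate))
```

Under this check, sales_csv and _tmp1 are accepted, while Sales (uppercase letter), 2024_sales (leading digit), and sales-daily (hyphen) are rejected.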

apiVersion: bino.bi/v1alpha1
kind: DataSource
metadata:
  name: sales_csv
spec:
  type: csv # inline | excel | csv | parquet | postgres_query | mysql_query
  inline: {} # for type: inline
  content: [] # alternative inline content
  path: ./data/*.csv # for file-based types
  connection: {} # for database queries
  query: "" # SQL for postgres_query / mysql_query
  ephemeral: false # optional caching hint

Type-specific rules (simplified from the schema):

  • type: inline – requires either inline (object with content) or content (array or JSON string).
  • type: excel | csv | parquet – requires path.
  • type: postgres_query | mysql_query – requires connection and query.

See the JSON schema for precise conditions.
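
As an illustration of the simplified rules listed above, the sketch below checks a spec that has already been parsed into a Python dict. It is an assumption-laden approximation, not bino's validator: the function check_spec and its messages are hypothetical, and the JSON schema remains the authoritative source.

```python
from typing import Any, Dict, List

# Type groups taken from the simplified rules in this section.
FILE_TYPES = {"excel", "csv", "parquet"}
QUERY_TYPES = {"postgres_query", "mysql_query"}


def check_spec(spec: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the simplified rules pass."""
    problems: List[str] = []
    kind = spec.get("type")

    if kind == "inline":
        # Either spec.inline.content or spec.content must be present.
        inline = spec.get("inline")
        has_inline = isinstance(inline, dict) and "content" in inline
        if not has_inline and "content" not in spec:
            problems.append("type inline requires 'inline.content' or 'content'")
    elif kind in FILE_TYPES:
        if not spec.get("path"):
            problems.append(f"type {kind} requires 'path'")
    elif kind in QUERY_TYPES:
        for field in ("connection", "query"):
            if not spec.get(field):
                problems.append(f"type {kind} requires '{field}'")
    else:
        problems.append(f"unknown type: {kind!r}")

    return problems


if __name__ == "__main__":
    print(check_spec({"type": "csv", "path": "./data/*.csv"}))
    print(check_spec({"type": "postgres_query", "connection": {"host": "db"}}))
```

An empty string counts as missing in this sketch, so query: "" would be reported for a postgres_query or mysql_query spec.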

---
apiVersion: bino.bi/v1alpha1
kind: DataSource
metadata:
  name: kpi_inline
spec:
  type: inline
  inline:
    content:
      - { label: "Revenue", value: 123.45 }
      - { label: "EBIT", value: 12.34 }
---
apiVersion: bino.bi/v1alpha1
kind: DataSource
metadata:
  name: sales_daily
spec:
  type: csv
  path: ./data/sales_daily/*.csv
---
apiVersion: bino.bi/v1alpha1
kind: DataSource
metadata:
  name: fact_sales_parquet
spec:
  type: parquet
  path: ./warehouse/fact_sales/*.parquet
  ephemeral: false # allow caching between builds
---
apiVersion: bino.bi/v1alpha1
kind: ConnectionSecret
metadata:
  name: postgresCredentials
spec:
  type: postgres
  postgres:
    passwordFromEnv: POSTGRES_PASSWORD
---
apiVersion: bino.bi/v1alpha1
kind: DataSource
metadata:
  name: sales_from_postgres
spec:
  type: postgres_query
  connection:
    host: ${DB_HOST:db.example.com}
    port: 5432
    database: analytics
    schema: public
    user: reporting
    secret: postgresCredentials
  query: |
    SELECT *
    FROM fact_sales
    WHERE booking_date >= DATE '2024-01-01';
---
apiVersion: bino.bi/v1alpha1
kind: ConnectionSecret
metadata:
  name: mysqlCredentials
spec:
  type: mysql
  mysql:
    passwordFromEnv: MYSQL_PASSWORD
---
apiVersion: bino.bi/v1alpha1
kind: DataSource
metadata:
  name: sales_from_mysql
spec:
  type: mysql_query
  connection:
    host: ${DB_HOST:db.example.com}
    port: 3306
    database: analytics
    user: reporting
    secret: mysqlCredentials
  query: |
    SELECT * FROM fact_sales WHERE year = 2024;

For more on secrets and object storage, see ConnectionSecret.

DataSource documents support metadata.constraints to conditionally include them for specific artefacts, modes, or environments.

Use different data sources for development vs production:

# Mock data for development
apiVersion: bino.bi/v1alpha1
kind: DataSource
metadata:
  name: sales
  constraints:
    - labels.env==dev
spec:
  type: inline
  content:
    - { region: "Test", amount: 100 }

---
# Production database
apiVersion: bino.bi/v1alpha1
kind: DataSource
metadata:
  name: sales
  constraints:
    - labels.env==prod
spec:
  type: postgres_query
  connection:
    host: prod-db.example.com
    # ...

To match several environments at once, use the in operator in either the string format or the structured format:

# String format
apiVersion: bino.bi/v1alpha1
kind: DataSource
metadata:
  name: staging_data
  constraints:
    - labels.env in [dev,staging,qa]
spec:
  type: postgres_query
  connection:
    host: staging-db.example.com

---
# Structured format (IDE-friendly)
apiVersion: bino.bi/v1alpha1
kind: DataSource
metadata:
  name: staging_data
  constraints:
    - field: labels.env
      operator: in
      value: [dev, staging, qa]
spec:
  type: postgres_query
  connection:
    host: staging-db.example.com
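
The two formats above carry the same information. As a rough illustration of that equivalence, the Python sketch below converts a string constraint that uses the in operator into the structured form. It is not bino's parser: the function to_structured is hypothetical, it handles only the in operator shown here, and the real grammar may accept more than this regular expression does.

```python
import re
from typing import Any, Dict

# Matches strings such as "labels.env in [dev,staging,qa]".
IN_CONSTRAINT = re.compile(r"^\s*([\w.]+)\s+in\s+\[(.*)\]\s*$")


def to_structured(constraint: str) -> Dict[str, Any]:
    """Convert an `in` constraint string into field/operator/value form."""
    match = IN_CONSTRAINT.match(constraint)
    if match is None:
        raise ValueError(f"not an 'in' constraint: {constraint!r}")
    # Split the bracketed list on commas and drop surrounding whitespace.
    values = [v.strip() for v in match.group(2).split(",") if v.strip()]
    return {"field": match.group(1), "operator": "in", "value": values}


if __name__ == "__main__":
    print(to_structured("labels.env in [dev,staging,qa]"))
```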

For the full constraint syntax and operators, see Constraints and Scoped Names.

A DataSource can also be defined inline in a DataSet's dependencies array, which removes the need for a separate document. This is useful for simple, single-use data sources.

apiVersion: bino.bi/v1alpha1
kind: DataSet
metadata:
  name: sales_summary
spec:
  dependencies:
    - type: csv
      path: ./data/sales.csv
  query: |
    SELECT region, SUM(amount) as total
    FROM @inline(0)
    GROUP BY region

The @inline(0) syntax references the inline DataSource by its position (0-indexed) in the dependencies array.

All DataSource types can be used inline:

# CSV
- type: csv
  path: ./data/sales.csv

# Excel
- type: excel
  path: ./data/report.xlsx

# Parquet
- type: parquet
  path: ./warehouse/*.parquet

# Inline data
- type: inline
  content:
    - { region: "US", amount: 100 }
    - { region: "EU", amount: 200 }

Reference multiple inline DataSources by their index:

spec:
  dependencies:
    - type: csv
      path: ./data/orders.csv
    - type: csv
      path: ./data/customers.csv
  query: |
    SELECT c.name, o.total
    FROM @inline(0) o
    JOIN @inline(1) c ON o.customer_id = c.id

You can combine inline definitions with references to standalone DataSource documents:

spec:
  dependencies:
    - existing_datasource     # Named reference
    - type: csv               # Inline definition
      path: ./data/extra.csv
  query: |
    SELECT * FROM existing_datasource
    UNION ALL
    SELECT * FROM @inline(1)
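
Because @inline(N) counts positions in the whole dependencies array, named references included, an off-by-one index is an easy mistake in mixed lists like the one above. The Python sketch below looks for such mistakes in an already parsed DataSet spec. It is hypothetical tooling, not bino's resolver: the function check_inline_refs is invented for illustration, and treating an @inline(N) that lands on a named reference as a problem is an assumption.

```python
import re
from typing import Any, List

# Finds every @inline(N) reference in a query string.
INLINE_REF = re.compile(r"@inline\((\d+)\)")


def check_inline_refs(dependencies: List[Any], query: str) -> List[str]:
    """Return problems found for @inline(N) references in `query`."""
    problems: List[str] = []
    for raw_index in INLINE_REF.findall(query):
        index = int(raw_index)
        if index >= len(dependencies):
            problems.append(f"@inline({index}) is out of range")
        elif not isinstance(dependencies[index], dict):
            # Assumption: a plain string entry is a named reference,
            # which should be queried by name rather than by index.
            problems.append(f"@inline({index}) points at a named reference")
    return problems


if __name__ == "__main__":
    deps = ["existing_datasource", {"type": "csv", "path": "./data/extra.csv"}]
    print(check_inline_refs(deps, "SELECT * FROM @inline(1)"))
    print(check_inline_refs(deps, "SELECT * FROM @inline(0)"))
```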

For more details on inline definitions and the @inline(N) syntax, see DataSet - Inline DataSet definitions.