delika SQL Reference

Preview: This feature in preview state and might change or have limited support.

delika SQL

The delika SQL is a SQL-like language to process data on delika platform. The main differences with the standard SQL are:

  • only SELECT statement is supporeted
  • table name has special format like [account/dataset/data.extension] and is only available in FROM clauses with alias name
  • duration literal in ISO 8601 format (see below)

Duration Literal

The duration literal can be written as a string in the ISO-8601 format with the d prefix. Due to technical limitations, it is not possible to mix the precision of year and month with that of day, hour, minute and second. Only a precision of seconds is allowed for decimals, and all other precision is for integers only.

Examples: d'P1Y2M' (1 year and 2 months), d'P1DT2H3M4.5S' (1 day, 2 hours, 3 minutes and 4.5 seconds), d'PT10M' (10 minutes)

Examples

SELECT
  t.date + d'P1D',
  t.value
FROM
  [delika-examples/weather/tokyo.csv] t

Data Snapshot

Data snapshot is available for datasets of data file only.

The delika SQL allows users to query against data at specific version or time. If users do not specify the version or time, it indicates users specify the latest version of data.

Snapshot by Version

Users can specify the version of data in the format of [account/dataset/data.extension@version].

Snapshot by Time

Users can specify the time of data in the format of [account/dataset/data.extension@time] where time is <YYYY>-<MM>-<DD>T<hh>:<mm>:<ss>±<hh>:<mm> (offset can be Z for UTC).

Examples

SELECT
  t1.date,
  t1.value AS new_value,
  t2.value AS old_value
FROM
  [delika-examples/weather/tokyo.csv@1] t1
INNER JOIN [delika-examples/weather/tokyo.csv@2] t2 ON
  t1.date = t2.date
  AND t1.value <> t2.value

Dataset Collection

Users cannot refer to individual files in a dataset. Users can refer to one or more files in a partition in a format like [account/dataset/partition.extension]. For example, if a dataset is partitioned by year, queries look like SELECT * FROM [account/dataset/2021.csv].

Users can also query multiple tables using wildcard like [account/dataset/*.csv] or [account/dataset/202101*.csv]. If users would like to specify partitions in detail, they should use partition column in their queries.

Examples

SELECT
  this_year.date AS this_date,
  last_year.date AS last_date,
  this_year.value AS this_value,
  last_year.value AS last_value
FROM
  [account/collection/2021.csv] this_year
FULL OUTER JOIN [account/collection/2020.csv] last_year ON
  last_year.date + d'P1Y' = this_year.date

Comments

Single line comment (--) and multiline comment (/* */) are supported. Multiline comment can be nested.

/*
 * example of multiline comment
 *
 * /* nested comment */
 */
SELECT
  t.*
FROM
  [my/nice/data.csv] t  -- single line comment

Operators

See List of Operators in delika SQL.

Modules and Functions

A module is a collection of functions.

A function takes the following form:

MODULE_NAME.FUNCTION_NAME(ARGUMENT_1, ARGUMENT_2, ..., ARGUMENT_N)

A window funtion takes the following form:

MODULE_NAME.FUNCTION_NAME(ARGUMENT_1, ARGUMENT_2, ..., ARGUMENT_N) OVER (PARTITION BY PARTITION_EXPRESSION ORDER BY ORDER_EXPRESSION)

PARTITION BY and ORDER BY can be omitted if unnecessary.

Some functions can be called without their module name. Such functions are called global functions.

See List of Modules and Functions in delika SQL to look up functions.