Last Updated:

delika SQL

Preview: This feature in preview state and might change or have limited support.

delika SQL

The delika SQL is a SQL-like language to process data on delika platform. The main differences with the standard SQL are:

  • only SELECT statement is supporeted
  • table name has special format like [account/dataset/data.extension] and is only available in FROM clauses with alias name
  • duration literal in ISO 8601 format (see below)
Currently, delika SQL is converted into Trino (formerly Presto SQL) so that users might use some of functions or operators those Presto allows. However, it might change in the future.

Duration Literal

The duration literal can be written as a string in the ISO-8601 format with the d prefix. Due to technical limitations, it is not possible to mix the precision of year and month with that of day, hour, minute and second. Only a precision of seconds is allowed for decimals, and all other precision is for integers only.

Examples: d'P1Y2M' (1 year and 2 months), d'P1DT2H3M4.5S' (1 day, 2 hours, 3 minutes and 4.5 seconds), d'PT10M' (10 minutes)

Examples

SELECT
  t.date + d'P1D',
  t.value
FROM
  [delika-examples/weather/tokyo.csv] t

Data Snapshot

Data snapshot is available for datasets of data file only.

The delika SQL allows users to query against data at specific version or time. If users do not specify the version or time, it indicates users specify the latest version of data.

Snapshot by Version

Users can specify the version of data in the format of [account/dataset/data.extension@version].

Snapshot by Time

Users can specify the time of data in the format of [account/dataset/data.extension@time] where time is <YYYY>-<MM>-<DD>T<hh>:<mm>:<ss>±<hh>:<mm> (offset can be Z for UTC).

Examples

SELECT
  t1.date,
  t1.value AS new_value,
  t2.value AS old_value
FROM
  [delika-examples/weather/tokyo.csv@1] t1
INNER JOIN [delika-examples/weather/tokyo.csv@2] t2 ON
  t1.date = t2.date
  AND t1.value <> t2.value

Dataset Collection

Data snapshot is available for datasets of data collection only.

Users cannot refer to individual files in a dataset. Users can refer to one or more files in a partition in a format like [account/dataset/partition.extension]. For example, if a dataset is partitioned by year, queries look like SELECT * FROM [account/dataset/2021.csv].

Users can also query multiple tables using wildcard like [account/dataset/*.csv] or [account/dataset/202101*.csv]. If users would like to specify partitions in detail, they should use partition column in their queries.

Examples

SELECT
  this_year.date AS this_date,
  last_year.date AS last_date,
  this_year.value AS this_value,
  last_year.value AS last_value
FROM
  [account/collection/2021.csv] this_year
FULL OUTER JOIN [account/collection/2020.csv] last_year ON
  last_year.date + d'P1Y' = this_year.date

Comments

Single line comment (--) and multiline comment (/* */) are supported. Multiline comment can be nested.

/*
 * example of multiline comment
 *
 * /* nested comment */
 */
SELECT
  t.*
FROM
  [my/nice/data.csv] t  -- single line comment