delika SQL Reference
delika SQL
The delika SQL is a SQL-like language to process data on delika platform. The main differences with the standard SQL are:
- only SELECT statement is supporeted
- table name has special format like
[account/dataset/data.extension]
and is only available in FROM clauses with alias name - duration literal in ISO 8601 format (see below)
Duration Literal
The duration literal can be written as a string in the ISO-8601 format with the d prefix. Due to technical limitations, it is not possible to mix the precision of year and month with that of day, hour, minute and second. Only a precision of seconds is allowed for decimals, and all other precision is for integers only.
Examples: d'P1Y2M'
(1 year and 2 months), d'P1DT2H3M4.5S'
(1 day, 2 hours, 3 minutes and 4.5 seconds), d'PT10M'
(10 minutes)
Examples
SELECT
date + d'P1D',
t.value
t.FROM
-examples/weather/tokyo.csv] t [delika
Data Snapshot
The delika SQL allows users to query against data at specific version or time. If users do not specify the version or time, it indicates users specify the latest version of data.
Snapshot by Version
Users can specify the version of data in the format of [account/dataset/data.extension@version]
.
Snapshot by Time
Users can specify the time of data in the format of [account/dataset/data.extension@time]
where time
is <YYYY>-<MM>-<DD>T<hh>:<mm>:<ss>±<hh>:<mm>
(offset can be Z
for UTC).
Examples
SELECT
date,
t1.value AS new_value,
t1.value AS old_value
t2.FROM
-examples/weather/tokyo.csv@1] t1
[delikaINNER JOIN [delika-examples/weather/tokyo.csv@2] t2 ON
date = t2.date
t1.AND t1.value <> t2.value
Dataset Collection
Users cannot refer to individual files in a dataset. Users can refer to one or more files in a partition in a format like [account/dataset/partition.extension]
. For example, if a dataset is partitioned by year, queries look like SELECT * FROM [account/dataset/2021.csv]
.
Users can also query multiple tables using wildcard like [account/dataset/*.csv]
or [account/dataset/202101*.csv]
. If users would like to specify partitions in detail, they should use partition column in their queries.
Examples
SELECT
this_year.date AS this_date,
last_year.date AS last_date,
this_year.value AS this_value,
last_year.value AS last_value
FROM
[account/collection/2021.csv] this_year
FULL OUTER JOIN [account/collection/2020.csv] last_year ON
last_year.date + d'P1Y' = this_year.date
Comments
Single line comment (--
) and multiline comment (/* */
) are supported. Multiline comment can be nested.
/*
* example of multiline comment
*
* /* nested comment */
*/
SELECT
*
t.FROM
/nice/data.csv] t -- single line comment [my
Operators
See List of Operators in delika SQL.
Modules and Functions
A module is a collection of functions.
A function takes the following form:
..., ARGUMENT_N) MODULE_NAME.FUNCTION_NAME(ARGUMENT_1, ARGUMENT_2,
A window funtion takes the following form:
..., ARGUMENT_N) OVER (PARTITION BY PARTITION_EXPRESSION ORDER BY ORDER_EXPRESSION) MODULE_NAME.FUNCTION_NAME(ARGUMENT_1, ARGUMENT_2,
PARTITION BY
and ORDER BY
can be omitted if unnecessary.
Some functions can be called without their module name. Such functions are called global functions.
See List of Modules and Functions in delika SQL to look up functions.