Dataset

A dataset is a collection of data.

Dataset Types

  • "Data File" is a type of dataset which can contain files with different schemas.
  • "Data Collection" is a type of dataset whose files share a single common schema, so it can be handled like a single file.

The syntax of a table name in delika SQL varies according to the dataset type. See delika SQL for details.

Permission Levels

The following table depicts the member permission levels in a team.

Action Owner Editor Reader Logged-in User Anonymous
read public dataset
read private dataset
update dataset description
create data in dataset
update data in dataset
delete data in dataset
delete dataset
set dataset permission to users

License

Data providers may license their data.

License text can contain the following YAML front matter:

---
License: <license_name>
Author: <author_name>
---

If <license_name> is one of the following, it is converted into the appropriate tags:

  • CC BY 4.0
  • CC BY-NC 4.0
  • CC BY-SA 4.0
  • CC BY-NC-SA 4.0
  • CC0 1.0

Example:

---
License: CC BY 4.0
Author: Example Company
---
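
The front matter above can be read with a few lines of Python. This is a minimal sketch, not delika's own parser: it assumes the front matter holds only flat "key: value" pairs, and the names parse_front_matter, is_known_license and KNOWN_LICENSES are hypothetical. How delika actually turns a recognized license into tags is not described here, so the sketch only checks membership in the list.

```python
# License names listed above as being converted into tags.
KNOWN_LICENSES = {
    "CC BY 4.0",
    "CC BY-NC 4.0",
    "CC BY-SA 4.0",
    "CC BY-NC-SA 4.0",
    "CC0 1.0",
}


def parse_front_matter(text):
    """Split license text into (metadata dict, remaining body).

    Assumption: the front matter is a block of flat "key: value" lines
    delimited by '---' lines at the very start of the text. This is not
    a full YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: everything after it is the body.
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    # No closing delimiter: treat the whole text as body.
    return {}, text


def is_known_license(meta):
    """Return True if the License field is one of the listed names."""
    return meta.get("License") in KNOWN_LICENSES
```

For example, parse_front_matter applied to the example above followed by a line of license terms returns the two fields as a dict and the terms as the body.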

Data Collection

Partition

File names in a data collection must be in the following format.

Partition Unit    File Name Format
year              yyyy__<suffix>.extension
month             yyyyMM__<suffix>.extension
day               yyyyMMdd__<suffix>.extension
hour              yyyyMMddHH__<suffix>.extension
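
The naming formats above can be generated and checked locally before uploading. The following is a rough sketch under stated assumptions: the helper names partition_file_name and matches_partition are hypothetical, the date patterns are mapped to Python strftime codes, and the check assumes that both the suffix and the extension must be non-empty, which the format table does not state explicitly.

```python
from datetime import datetime

# strftime equivalents of yyyy, yyyyMM, yyyyMMdd and yyyyMMddHH.
PREFIX_FORMATS = {
    "year": "%Y",
    "month": "%Y%m",
    "day": "%Y%m%d",
    "hour": "%Y%m%d%H",
}

# Expected number of digits in the prefix for each partition unit.
PREFIX_LENGTHS = {"year": 4, "month": 6, "day": 8, "hour": 10}


def partition_file_name(unit, moment, suffix, extension):
    """Build a file name such as 20240305__sales.csv for the given unit."""
    prefix = moment.strftime(PREFIX_FORMATS[unit])
    return f"{prefix}__{suffix}.{extension}"


def matches_partition(unit, file_name):
    """Check whether file_name follows the format for the partition unit."""
    prefix, sep, rest = file_name.partition("__")
    if not sep or len(prefix) != PREFIX_LENGTHS[unit]:
        return False
    if not (prefix.isascii() and prefix.isdigit()):
        return False
    # Assumption: suffix and extension must both be non-empty.
    suffix, dot, extension = rest.rpartition(".")
    if not dot or not suffix or not extension:
        return False
    try:
        # Rejects impossible dates such as month 13.
        datetime.strptime(prefix, PREFIX_FORMATS[unit])
    except ValueError:
        return False
    return True
```

For instance, partition_file_name("day", datetime(2024, 3, 5), "sales", "csv") yields 20240305__sales.csv, which matches_partition accepts for the day unit and rejects for the month unit.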

Restriction

Files in a data collection cannot be overwritten. If you would like to replace a file, you must delete it first and then upload a new file.