
Refcard #223

SQL Syntax for Apache Drill

Using SQL for the SQL-on-Everything Engine

Acquaints you with the agility and flexibility of Apache Drill and enables you to easily query non-relational datastores, including Hadoop. Discover the benefits of data agility and gain faster insights into BI and analytics through advanced SQL queries on massive datasets.

Written By

Rick Van Der Lans
Analyst, R20/Consultancy
Table of Contents
  • Introduction
  • Installing and Starting Up Drill
  • First SQL Queries
  • Querying Nested Data Structures
  • Querying Arrays
  • Querying Maps With Data
  • Combining Multiple Functions in Queries
  • Definitions of the SQL Statements for Querying Data Sources
  • Definitions of Data Definition Statements
  • Definitions of SQL Statements for Working with Schemas and Sessions
  • Data Types
  • SQL Scalar Functions
  • SQL Aggregate and Window Functions
  • SQL Nested Data Functions
Section 1

Introduction

This Refcard describes version 1.2 of Apache Drill, released in October 2015. Apache Drill is an open-source SQL-on-Everything engine. It allows SQL queries to be executed on any kind of data source, ranging from simple CSV files to advanced SQL and NoSQL database servers.

Drill has three distinguishing features:

  • Drill can access flat, relational data structures as well as data sources with non-relational structures, such as arrays, hierarchies, maps, nested tables, and complex data types. Besides being able to access these data sources, Drill can run queries that join data across multiple data sources, including non-relational and relational ones.
  • Drill allows for schema-less access of data. In other words, it doesn’t need access to the schema definition of the data source. It doesn’t need to know the structure of the tables in advance, nor does it need statistical data. It goes straight for the data. The schema of the query result is therefore not known in advance. It’s built up and derived when data comes back from the data source. During the processing of all the data, the schema of the query result is continuously adapted. For example, the schema of a Hadoop file (or any data source) doesn’t have to be documented in Apache Hive to make it accessible for Drill.
  • Unlike most SQL database servers, Drill doesn’t have its own database. It has been designed and optimized to access other data sources. There is rarely ever a need to copy data from a data source to Hadoop to make it accessible by Drill.

Apache Drill's Supported Plugins for Data Access

  • CSV and TSV files
  • Hadoop files with Parquet and AVRO file formats, including Amazon S3
  • NoSQL databases, such as MongoDB and Apache HBase
  • SQL database servers, such as MySQL and Oracle, through an ODBC/JDBC interface
  • Files with JSON or BSON data structures
  • SQL-on-Hadoop engines, such as Apache Hive and Impala

Note: This Refcard uses several example files (CSV and JSON files) to demonstrate several of Drill's capabilities.

Section 2

Installing and Starting Up Drill

The following webpage contains detailed descriptions on how to install Drill on various platforms: https://drill.apache.org/docs/install-drill-introduction/.

After installation of the software, Drill can be started by using the command-line tool SQLLine:

sqlline.bat -u "jdbc:drill:"

Depending on the platform, the result looks something like this:

DRILL_ARGS - " -u jdbc:drill:"
Calculating Drill classpath...
oct 26, 2015 9:33:46 AM org.glassfish.jersey.server.ApplicationHandler initialize
INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26...
apache drill 1.2.0
"json ain't no thang"
0: jdbc:drill:>

Drill is now ready to query data. To check if it really works, use the following query:

VALUES(CURRENT_DATE);

Result:

+---------------+
| CURRENT_DATE  |
+---------------+
| 2015-10-28    |
+---------------+

To stop Drill, enter:

!quit
Section 3

First SQL Queries

Create a simple text file with the following text and call it HelloWorld.csv:

HelloWorld!

Query the file:

SELECT * FROM dfs.`\MyDirectory\HelloWorld.csv`;

Result:

+------------------+
|     columns      |
+------------------+
| ["HelloWorld!"]  |
+------------------+

The unique thing about this query is its FROM clause. Instead of a table name, it contains a reference to the file to be accessed. The term dfs represents one of the supported storage plugins; this particular plugin indicates that a file in the local file system is accessed. The storage plugin is followed by a file specification containing the directory and the file name. Note that the file specification must be enclosed in backticks, not single quotes. The file contains only one line with data, so only one row is returned. Because no column names are specified, the column name columns is used.
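
Because rows from delimited files come back as a single array named columns, individual fields can be addressed by index. A minimal sketch against the same file (the alias greeting is just an illustrative name):

SELECT columns[0] AS greeting FROM dfs.`\MyDirectory\HelloWorld.csv`;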

Create a file with the following content and call it "Employees.json":

{ "number"  : "6",
  "name"    : "Manzarek",
  "initials": "R",
  "street"  : "Haseltine Lane",
  "town"    : "Phoenix" }
{ "number"  : "8",
  "name"    : "Young",
  "initials": "N",
  "street"  : "Brownstreet",
  "mobile"  : "1234567" }
{ "number"  : "15",
  "name"    : "Metheny",
  "initials": "M",
  "province": "South"
}

Query the file:

SELECT * FROM dfs.`\MyDirectory\Employees.json`;

The JSON data structure is turned into a flat table, and missing values in specific columns are represented as null values. Result:

+---------+-----------+-----------+-----------------+----------+----------+-----------+
| number  |   name    | initials  |     street      |   town   |  mobile  | province  |
+---------+-----------+-----------+-----------------+----------+----------+-----------+
| 6       | Manzarek  | R         | Haseltine Lane  | Phoenix  | null     | null      |
| 8       | Young     | N         | Brownstreet     | null     | 1234567  | null      |
| 15      | Metheny   | M         | null            | null     | null     | South     |
+---------+-----------+-----------+-----------------+----------+----------+-----------+
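
Regular SQL clauses work on these derived columns as well. As a sketch, the following query keeps only the employees with a known town (column names as in the file above):

SELECT name, town
FROM   dfs.`\MyDirectory\Employees.json`
WHERE  town IS NOT NULL;
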
Section 4

Querying Nested Data Structures

Many data sources, such as MongoDB, Hadoop with AVRO, and JSON files, contain nested data structures. In relational terminology they would be called columns within columns. To address these nested columns, a specific syntax is introduced. Several examples are used to illustrate this syntax.

Create a file with the following content and call it "EmployeesNested.json":

{ "employee" : {
   "number"  : "6",
   "name"    : { "lastname": "Manzarek",
                 "initials": "R" },
   "address" : { "street"  : "Haseltine Lane",
                 "houseno" : "80",
                 "postcode": "1234KK",
                 "town"    : "Stratford" } }
}
{ "employee" : {
   "number"  : "8",
   "name"    : { "lastname": "Young",
                 "initials": "N" },
   "address" : { "street"  : "Brownstreet",
                 "houseno" : "80",
                 "province": "ZH",
                 "town"    : "Boston" } }
}
{ "employee" : {
   "number"  : "15",
   "name"    : { "lastname": "Metheny",
                 "initials": "M",
                 "code"    : "45" } }
}

Query the file:

SELECT emp.employee.number AS enr, emp.employee.name.lastname AS lastname
FROM   dfs.`\MyDirectory\EmployeesNested.json` AS emp;

Result:

+------+-----------+
| enr  | lastname  |
+------+-----------+
| 6    | Manzarek  |
| 8    | Young     |
| 15   | Metheny   |
+------+-----------+

To refer to the employee numbers, the specification emp.employee.number is used. Emp refers to the table name specified in the FROM clause. Specifying this table name is required when dealing with nested tables; otherwise Drill may confuse columns with tables and won’t run the query. Next, employee refers to the first level in the JSON document and number to the next level. To get the lastname of an employee, employee must be specified, followed by name and lastname. Assigning column names to these two result columns is not required, but it does make the result easier to read.
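
The same dotted notation can be used in other clauses. As an illustrative sketch, the following query restricts the result to the employees living in Boston:

SELECT emp.employee.number AS enr, emp.employee.name.lastname AS lastname
FROM   dfs.`\MyDirectory\EmployeesNested.json` AS emp
WHERE  emp.employee.address.town = 'Boston';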

Section 5

Querying Arrays

Many data sources contain arrays (sometimes called repeating groups). Drill supports the flatten function to transform these arrays into flat, relational data structures. Create a file with the following content (each employee can work for several projects) and call it EmployeesArrays.json:

{ "number"  : "8",
  "projects": [ "ACP3", "FGTR" ]
}
{ "number"  : "15",
  "projects": [ "ACP3", "HHGT", "X456" ]
}

Query the file:

SELECT * FROM dfs.`\MyDirectory\EmployeesArrays.json`;

Result:

+---------+-------------------------+
| number  |        projects         |
+---------+-------------------------+
| 8       | ["ACP3","FGTR"]         |
| 15      | ["ACP3","HHGT","X456"]  |
+---------+-------------------------+

In each row (so for each employee), the projects column contains a set of project values. To see them as separate values, use the flatten function:

SELECT FLATTEN(projects) AS project, number AS enr
FROM   dfs.`\MyDirectory\EmployeesArrays.json`;

In the result each row contains a separate project value and the number of the employee to which the project value belongs:

+----------+------+
| project  | enr  |
+----------+------+
| ACP3     | 8    |
| FGTR     | 8    |
| ACP3     | 15   |
| HHGT     | 15   |
| X456     | 15   |
+----------+------+

Besides flatten, Drill supports two additional functions to work with the contents of arrays: repeated_count (counts the number of values in an array) and repeated_contains (searches for a specific value in an array):

SELECT number AS enr, REPEATED_COUNT(projects) AS NumberOfProjects,
       REPEATED_CONTAINS(projects,'FGTR') AS ContainsFGTR
FROM   dfs.`\MyDirectory\EmployeesArrays.json` AS emp;

Result:

+------+-------------------+---------------+
| enr  | NumberOfProjects  | ContainsFGTR  |
+------+-------------------+---------------+
| 8    | 2                 | true          |
| 15   | 3                 | false         |
+------+-------------------+---------------+

Individual values can be retrieved from an array:

SELECT number, projects[0], projects[1], projects[2]
FROM   dfs.`\MyDirectory\EmployeesArrays.json` AS emp;

Result:

+---------+---------+---------+---------+
| number  | EXPR$1  | EXPR$2  | EXPR$3  |
+---------+---------+---------+---------+
| 8       | ACP3    | FGTR    | null    |
| 15      | ACP3    | HHGT    | X456    |
+---------+---------+---------+---------+
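
The output of flatten behaves like any other column, so it can feed joins and aggregations. A sketch that counts per project how many employees work on it (the alias p is illustrative):

SELECT p.project, COUNT(*) AS NumberOfEmployees
FROM   (SELECT FLATTEN(projects) AS project
        FROM   dfs.`\MyDirectory\EmployeesArrays.json`) AS p
GROUP  BY p.project;
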
Section 6

Querying Maps With Data

The kvgen (key-value generation) function transforms maps that contain keys instead of a fixed schema into arrays. Create a file with the following content and call it "CarOwners.json". This file contains, per car owner, the number of cars owned of each manufacturer. The element cars doesn’t have a schema; instead, it contains a set of keys (car manufacturers), and each key has a value (the number of cars of that particular manufacturer):

{ "owner" : "1",
  "cars"  : { "Ford" : 3,
              "BMW"  : 2,
              "Ferrari" : 1 }
}
{ "owner" : "2",
  "cars"  : { "BMW" : 4,
              "GM"  : 5 }
}

Query the file:

SELECT KVGEN(cars) AS cars FROM dfs.`\MyDirectory\CarOwners.json`;

Result:

+---------------------------------------------------------------------------------+
|                                      cars                                       |
+---------------------------------------------------------------------------------+
| [{"key":"Ford","value":3},{"key":"BMW","value":2},{"key":"Ferrari","value":1}]  |
| [{"key":"BMW","value":4},{"key":"GM","value":5}]                                |
+---------------------------------------------------------------------------------+

The effect of the kvgen function is that the car data inside the cars element is transformed to an array with two elements: key and value. The result of this function can subsequently be processed by the flatten function to turn each key-value pair into a separate row:

SELECT owner, FLATTEN(KVGEN(cars)) AS cars FROM dfs.`\MyDirectory\CarOwners.json`;

Result:

+--------+------------------------------+
| owner  |             cars             |
+--------+------------------------------+
| 1      | {"key":"Ford","value":3}     |
| 1      | {"key":"BMW","value":2}      |
| 1      | {"key":"Ferrari","value":1}  |
| 2      | {"key":"BMW","value":4}      |
| 2      | {"key":"GM","value":5}       |
+--------+------------------------------+
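
The generated key and value elements can be addressed like ordinary nested columns. A sketch that keeps only the BMW entries; the backticks around key and value guard against clashes with reserved words, and the alias t is illustrative:

SELECT t.owner, t.cars.`key` AS make, t.cars.`value` AS amount
FROM   (SELECT owner, FLATTEN(KVGEN(cars)) AS cars
        FROM   dfs.`\MyDirectory\CarOwners.json`) AS t
WHERE  t.cars.`key` = 'BMW';
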
Section 7

Combining Multiple Functions in Queries

Apache Drill supports all the query features to be expected from a SQL product. The next example shows how these special functions can be combined with more traditional joins and window functions. Create a file with the following content and call it "EmployeesProjects.json". Each line in this file indicates how many hours an employee has worked on a project on a specific day.

{"enr": "8", "project":"ACP3", "date":"2015-10-01", "hours":"4"}
{"enr": "8", "project":"ACP3", "date":"2015-10-04", "hours":"5"}
{"enr": "8", "project":"FGTR", "date":"2015-10-02", "hours":"2"}
{"enr":"15", "project":"ACP3", "date":"2015-10-01", "hours":"7"}
{"enr":"15", "project":"ACP3", "date":"2015-10-03", "hours":"5"}
{"enr":"15", "project":"HHGT", "date":"2015-10-01", "hours":"4"}
{"enr":"15", "project":"HHGT", "date":"2015-10-05", "hours":"2"}
{"enr":"15", "project":"HHGT", "date":"2015-10-07", "hours":"8"}
{"enr":"15", "project":"X456", "date":"2015-10-01", "hours":"6"}

Join this file with the EmployeesNested file:

SELECT CAST(proj.`date` AS DATE) AS pdate, emp.employee.number AS enr,
       emp.employee.name.lastname AS ename, proj.project, proj.hours,
       SUM(CAST(proj.hours AS INTEGER))
          OVER(PARTITION BY proj.`date`, emp.employee.number) AS sum_hours
FROM   dfs.`\MyDirectory\EmployeesNested.json` AS emp LEFT OUTER JOIN
       dfs.`\MyDirectory\EmployeesProjects.json` AS proj
       ON CAST(emp.employee.number AS INTEGER) = CAST(proj.enr AS INTEGER)
ORDER  BY 1, 2;

Result:

+-------------+------+-----------+----------+--------+------------+
|    pdate    | enr  |   ename   | project  | hours  | sum_hours  |
+-------------+------+-----------+----------+--------+------------+
| 2015-10-01  | 15   | Metheny   | ACP3     | 7      | 17         |
| 2015-10-01  | 15   | Metheny   | HHGT     | 4      | 17         |
| 2015-10-01  | 15   | Metheny   | X456     | 6      | 17         |
| 2015-10-01  | 8    | Young     | ACP3     | 4      | 4          |
| 2015-10-02  | 8    | Young     | FGTR     | 2      | 2          |
| 2015-10-03  | 15   | Metheny   | ACP3     | 5      | 5          |
| 2015-10-04  | 8    | Young     | ACP3     | 5      | 5          |
| 2015-10-05  | 15   | Metheny   | HHGT     | 2      | 2          |
| 2015-10-07  | 15   | Metheny   | HHGT     | 8      | 8          |
| null        | 6    | Manzarek  | null     | null   | null       |
+-------------+------+-----------+----------+--------+------------+
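
Classic grouped aggregation works on these files as well. As a sketch, the following query sums per employee the total number of hours worked across all projects:

SELECT enr, SUM(CAST(hours AS INTEGER)) AS total_hours
FROM   dfs.`\MyDirectory\EmployeesProjects.json`
GROUP  BY enr;
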
Section 8

Definitions of the SQL Statements for Querying Data Sources

This section contains the definitions of the SQL statements supported by Drill related to querying.

Query Statement

Definition

SELECT

<select statement> ::=
   [ WITH <table name> [ ( <column name> [ , <column name> ]... ) ]
       AS ( <table expression> ) ]
   <table expression>

<table expression> ::=
   SELECT <select clause>
   FROM <from clause>
   [ WHERE <boolean expression> ]
   [ GROUP BY <expression> [ , <expression> ]... ]
   [ HAVING <boolean expression> ]
   [ ORDER BY <order by clause> ]
   [ LIMIT { <count> | ALL } ]
   [ OFFSET <number> { ROW | ROWS } ]

<from clause> ::=
   <table reference> [ , <table reference> ]...

<table reference> ::=
   { <table specification>  |
     <join specification>   |
     ( <table expression> ) |
     VALUES ( <expression list> ) [ , ( <expression list> ) ]... }
   [ [ AS ] <alias name> [ ( <column name> [ , <column name> ]... ) ] ]

<table specification> ::=
   <table name> | <storage plugin> . `<workspace>`

<join specification> ::=
   <table reference> <join type> <table reference> [ <join condition> ]

<join condition> ::= ON <condition>

<join type> ::=
   [INNER] JOIN | LEFT [OUTER] JOIN | RIGHT [OUTER] JOIN | FULL [OUTER] JOIN

<boolean expression> ::=
   <boolean expression> { AND | OR } <boolean expression> |
   NOT <boolean expression> |
   <case expression> |
   <expression> { < | > | <= | >= | = | <> } <expression> |
   <expression> [ NOT ] BETWEEN <expression> AND <expression> |
   <expression> IN ( <expression> [ , <expression> ]... ) |
   <expression> { LIKE | ILIKE | NOT LIKE | SIMILAR TO | NOT SIMILAR TO }
           <expression> |
   <expression> { IS [ NOT ] NULL | IS [ NOT ] FALSE | IS [ NOT ] TRUE } |
   <expression> { IN | ANY | ALL } ( <table expression> ) |
   EXISTS ( <table expression> ) |
   <expression> || <expression>

<case expression> ::=
CASE
    WHEN <expression> [ , <expression> [ ... ] ] THEN <statements>
  [ WHEN <expression> [ , <expression> [ ... ] ] THEN <statements> ]...
  [ ELSE <statements> ]
END

VALUES

<values statement> ::=
   VALUES ( <expression list> ) [ , ( <expression list> ) ]...

<expression list> ::=
   <expression> [ , <expression> ]...

EXPLAIN

EXPLAIN PLAN
   [ INCLUDING ALL ATTRIBUTES ]
   [ WITH IMPLEMENTATION | WITHOUT IMPLEMENTATION ]
   FOR <select statement>

Section 9

Definitions of Data Definition Statements

This section contains the definitions of the data definition statements supported by Drill.

SQL Statement

Definition

CREATE TABLE

CREATE TABLE <table name> [ ( <column list> ) ]
   [ PARTITION BY ( <column name> [ , ... ] ) ]
   AS <select statement>

CREATE VIEW

CREATE [OR REPLACE] VIEW [ <workspace> . ] <view name>
   [ ( <column name> [ , ... ] ) ]
   AS <select statement>

DROP TABLE

DROP TABLE [ <workspace> . ] <table name>

DROP VIEW

DROP VIEW [ <workspace> . ] <view name>

DESCRIBE

DESCRIBE [ <workspace> . ] { <table name> | <view name> }

SHOW DATABASES

SHOW DATABASES

SHOW SCHEMAS

SHOW SCHEMAS

Shows all the schemas that can be used.

SHOW FILES

SHOW FILES
   [ FROM <filesystem> . <directory name> | IN <filesystem> . <directory name> ]

SHOW TABLES

SHOW TABLES
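
As a sketch of how these statements combine, the following creates a view on top of the nested employees file from Section 4. It assumes a writable workspace; dfs.tmp is writable in Drill's default configuration:

USE dfs.tmp;
-- assumes the dfs.tmp workspace is writable (the default)
CREATE VIEW EmployeeNames AS
   SELECT emp.employee.number AS enr, emp.employee.name.lastname AS lastname
   FROM   dfs.`\MyDirectory\EmployeesNested.json` AS emp;
SELECT * FROM EmployeeNames;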

Section 10

Definitions of SQL Statements for Working with Schemas and Sessions

This section contains the definitions of the statements supported by Drill for working with schemas and sessions.

SQL Statement

Definition

USE

USE <schema name>

ALTER SESSION

ALTER SESSION SET `<option name>` = <value>

ALTER SYSTEM

ALTER SYSTEM SET `<option name>` =  <value>
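
For example, the following sketch switches the file format that CREATE TABLE AS writes for the current session to JSON (store.format is one of Drill's standard options):

ALTER SESSION SET `store.format` = 'json';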

Section 11

Data Types

The following data types are supported by Drill and can be used when converting the data types of values.

Data type

Description

Example

BIGINT

8-byte signed integer in the range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

9223372036854775807

BINARY

Variable-length byte string

B@e6d9eb7

BOOLEAN

True or false

true

DATE

Years, months, and days in YYYY-MM-DD format since 4713 BC

2015-12-30

DECIMAL(p,s), DEC(p,s), or NUMERIC(p,s)

38-digit precision number; precision is p and scale is s. DECIMAL(6,2) has 4 digits before and 2 digits after the decimal point

1234.56

FLOAT

4-byte floating point number

0.456

DOUBLE or DOUBLE PRECISION

8-byte floating point number, precision-scalable

0.456

INTEGER or INT

4-byte signed integer in the range -2,147,483,648 to 2,147,483,647

2147483646

INTERVAL

A day-time or year-month interval. Internally, INTERVAL is represented as INTERVALDAY or INTERVALYEAR

'1 10:20:30.123' (day-time) or '1-2' YEAR TO MONTH (year-month)

SMALLINT

2-byte signed integer in the range -32,768 to 32,767. This data type is not supported in version 1.2

32000

TIME

24-hour based time before or after January 1, 2001 in hours, minutes, seconds format: HH:mm:ss

22:55:55.23

TIMESTAMP

JDBC timestamp in year, month, date, hour, minute, second, and optional milliseconds format: yyyy-MM-dd HH:mm:ss.SSS

2015-12-30 22:55:55.23

CHARACTER VARYING, CHARACTER, CHAR, or VARCHAR

UTF8-encoded variable-length string. The default limit is 1 character. The maximum character limit is 2,147,483,647

CHAR(30) casts data to a 30-character string maximum
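
These data types are typically used with the CAST function (see Section 12). A small sketch that can be run directly with the VALUES statement:

VALUES(CAST('2015-12-30' AS DATE), CAST('123' AS INTEGER));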

Section 12

SQL Scalar Functions

This section contains descriptions of all the scalar functions supported by Drill that can be used in any expression.

The processing logic of all the scalar functions can easily be tested by using the VALUES statement. For example, with the following statement the CBRT function can be tested:

VALUES(CBRT(64));

Result:

+---------+
| EXPR$0  |
+---------+
| 4.0     |
+---------+
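
Several functions can be tested at once in the same way; each expression becomes a separate result column (EXPR$0, EXPR$1, and so on):

VALUES(ABS(-17), LOG10(1000), INITCAP('drill'));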

Numeric function

Data type of result

Definition

ABS(x)

Data type of x

Returns the absolute value of the input argument x.

CBRT(x)

FLOAT8

Returns the cubic root of x.

CEIL(x) and CEILING(x)

Data type of x

Returns the smallest integer not less than x.

DEGREES(x)

FLOAT8

Converts x radians to degrees.

E()

FLOAT8

Returns 2.718281828459045.

EXP(x)

FLOAT8

Returns e (Euler's number) to the power of x.

FLOOR(x)

Data type of x

Returns the largest integer not greater than x.

LOG(x)

FLOAT8

Returns the natural log (base e) of x.

LOG(x, y)

FLOAT8

Returns log base x to the y power.

LOG10(x)

FLOAT8

Returns the common log of x.

LSHIFT(x, y)

Data type of x

Shifts the binary x by y times to the left.

MOD(x, y)

FLOAT8

Returns the remainder of x divided by y.

NEGATIVE(x)

Data type of x

Returns x as a negative number.

PI

FLOAT8

Returns pi.

POW(x, y)

FLOAT8

Returns the value of x to the y power.

RADIANS(x)

FLOAT8

Converts x degrees to radians.

RAND

FLOAT8

Returns a random number from 0-1.

ROUND(x)

Data type of x

Rounds to the nearest integer.

ROUND(x, y)

DECIMAL

Rounds x to y decimal places.

RSHIFT(x, y)

Data type of x

Shifts the binary x by y times to the right.

SIGN(x)

INT

Returns the sign of x.

SQRT(x)

Data type of x

Returns the square root of x.

TRUNC(x [ , y ] )

Data type of x

Truncates x to y decimal places. Y is optional. Default is 1.

TRUNC(x, y)

DECIMAL

Truncates x to y decimal places.

Date/time function

Data type of result

Definition

AGE(x [, y ] )

INTERVALDAY or INTERVALYEAR

Returns interval between two timestamps or subtracts a timestamp from midnight of the current date.

CURRENT_DATE

DATE

Returns the current date.

CURRENT_TIME

TIME

Returns the current time.

CURRENT_TIMESTAMP

TIMESTAMP

Returns the current timestamp.

DATE_ADD(x,y)

DATE, TIMESTAMP

Returns the sum of a date/time and a number of days/hours, or of a date/time and a date/time interval. X must be a date, time, or timestamp expression, and y must be an integer or an interval expression. If y is an integer then x must be a date and y represents the number of days to be added. In other situations an interval must be used. Use the CAST function to create an interval.

DATE_PART(x,y)

DOUBLE

Extracts a time unit from the value of a time, date, or timestamp expression (y). Allowed time units for x are second, minute, hour, day, month, and year. A time unit must be specified between single quotes. Y represents a date, time, or timestamp.

DATE_SUB(x,y)

DATE, TIMESTAMP

Subtracts an interval (y) from a date or timestamp expression (x). X must be a date, time, or timestamp expression, and y must be an integer or an interval expression. If y is an integer then x must be a date and y represents the number of days to be subtracted. In other situations an interval must be used. Use the CAST function to create an interval.

EXTRACT(x FROM y)

DOUBLE


Extracts a time unit from a date or timestamp expression (y). X indicates which time unit to extract. This must be one of the following values: SECOND, MINUTE, HOUR, DAY, MONTH, and YEAR. The time unit must not be specified between quotes.

LOCALTIME

TIME

Returns the local current time.

LOCALTIMESTAMP

TIMESTAMP

Returns the local current timestamp.

NOW()

TIMESTAMP

Returns the current timestamp.

TIMEOFDAY()

VARCHAR

Returns the current timestamp for the UTC time zone.

UNIX_TIMESTAMP( [ x] )

BIGINT

If no x is specified, it returns the number of seconds since the UNIX epoch (January 1, 1970 at 00:00:00). If x is specified, it must be a timestamp, and the number of seconds between the UNIX epoch and the timestamp x is returned.
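
A quick sketch that tests a few of these date/time functions with the VALUES statement:

VALUES(EXTRACT(YEAR FROM CURRENT_DATE), DATE_PART('month', CURRENT_DATE), UNIX_TIMESTAMP());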

String function

Data type of result

Definition

BYTE_SUBSTR(x,y [, z ] )

BINARY or VARCHAR

Returns, in binary format, the substring of x that starts at position y; the optional z specifies the length of the substring.

CHAR_LENGTH(x)

INTEGER

Returns the length of the alphanumeric argument x.

CONCAT(x,y)

VARCHAR

Combines the two alphanumeric values x and y. Has the same effect as the || operator.

INITCAP(x)

VARCHAR

Returns x in which the first character is capitalized.

LENGTH(x)

INTEGER

Returns the length in bytes of the alphanumeric value x.

LOWER(x)

VARCHAR

Converts all upper-case letters of x to lower-case letters.

LPAD(x,y [ , z ] )

VARCHAR

The value of x is padded at the front (the left-hand side) with the value of z until the total length equals y. If y is smaller than the length of x, x is shortened on the left side. If no z is specified, blanks are used.

LTRIM(x)

VARCHAR

Removes all blanks that appear at the beginning of x.

POSITION( x IN y)

INTEGER

Returns the start position of the string x in the string y.

REGEXP_REPLACE(x,y,z)

VARCHAR

Substitutes new text for substrings that match Java regular expression patterns: in the string x, each match of the regular expression y is replaced by z.

RPAD(x,y,z)

VARCHAR

The value of x is padded at the end (the right-hand side) with the value of z until the total length equals y. If y is smaller than the length of x, x is shortened on the right side.

RTRIM(x)

VARCHAR

Removes all blanks from the end of the value of x.

STRPOS(x,y)

INTEGER

Returns the start position of the string y in the string x.

SUBSTR(x,y,z)

VARCHAR

Extracts a substring of x, starting at position y; z specifies the length of the substring.

TRIM(x)

VARCHAR

Removes all blanks from the start and from the end of x. Blanks in the middle are not removed.

UPPER(x)

VARCHAR

Converts all lower-case letters of x to upper-case letters.
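
A sketch that tests a few of the string functions:

VALUES(SUBSTR('Apache Drill', 8, 5), POSITION('Drill' IN 'Apache Drill'), CONCAT('SQL-on-', 'Everything'));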

Data type conversion function

Data type of result

Definition

CAST(x AS y)

Data type of y

Converts the data type of x to y. Y must be one of the supported data types; see Section 11.

CONVERT_TO(x,y)

Data type of y

Converts binary data (x) to Drill internal types (y) based on the little or big endian encoding of the data.

CONVERT_FROM(x,y)

Data type of y

Converts binary data (x) from Drill internal types (y) based on the little or big endian encoding of the data.

Null handling function

Data type of result

Definition

COALESCE(x, y [ , y ]... )

Data type of y

Returns the first non-null argument in the list.

NULLIF(x,y)

Data type of x

Returns the value of x if x and y are not equal, and returns a null value if x and y are equal.
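
The null handling functions are convenient for the sparse results that schema-less sources produce. A sketch that replaces the missing towns in the Employees.json file from Section 3 with a default value:

SELECT name, COALESCE(town, 'Unknown') AS town
FROM   dfs.`\MyDirectory\Employees.json`;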

Section 13

SQL Aggregate and Window Functions

This section contains descriptions of all the aggregate and window functions supported by Drill that can be used in any expression.

Aggregate function

Data type of result

Definition

AVG(x)

DECIMAL for any integer-type argument, DOUBLE for a floating-point argument, otherwise the data type of x.

Calculates the average of all non-null values in x.

COUNT(*)

BIGINT

Counts the number of rows.

COUNT([DISTINCT] x)

BIGINT


Counts the number of non-null values in x, and if DISTINCT is specified the number of different non-null values in x.

MAX(x)

Data type of x.

Determines the maximum value of all non-null values in x.

MIN(x)

Data type of x.

Determines the minimum value of all non-null values in x.

SUM(x)

BIGINT for SMALLINT or INTEGER arguments, DECIMAL for BIGINT arguments, DOUBLE for floating-point arguments, otherwise the same as data type of x.

Calculates the sum of all values in x.

The syntax definition for all the window functions in the table below is as follows:

<window_function_name> ( [ ALL ] <expression> )
   OVER ( [ PARTITION BY <expression list> ] [ ORDER BY <order list> ] )
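
As an illustrative sketch, the following query ranks each employee's working days by the number of hours worked, using the EmployeesProjects.json file from Section 7:

SELECT enr, project, `date`,
       RANK() OVER(PARTITION BY enr ORDER BY CAST(hours AS INTEGER) DESC) AS hours_rank
FROM   dfs.`\MyDirectory\EmployeesProjects.json`;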

Window function

Data type of result

Definition

AVG(x) OVER (y)

DECIMAL for any integer-type argument, DOUBLE for a floating-point argument, otherwise the same as the argument data type

Returns the average value for the input expression values. It works with numeric values and ignores null values.

COUNT( { * | x } ) OVER (y)

BIGINT

Counts the number of input rows. COUNT(*) counts all rows in the target table, whether or not they include nulls. COUNT(x) counts the number of rows with non-null values in a specific column or expression.

MAX(x) OVER (y)

Data type of x

Determines the maximum value of all non-null values in x.

MIN(x) OVER (y)

Data type of x

Determines the minimum value of all non-null values in x.

SUM(x) OVER (y)

BIGINT for SMALLINT or INTEGER arguments, DECIMAL for BIGINT arguments, DOUBLE for floating-point arguments, otherwise the data type of x

Calculates the sum of all values in x.

CUME_DIST() OVER (y)

DOUBLE PRECISION

Calculates the relative rank of the current row within a window partition: (number of rows preceding or peer with current row) / (total rows in the window partition).

DENSE_RANK()OVER (y)

BIGINT

Determines the rank of a value in a group of values based on the ORDER BY expression and the OVER clause. Each value is ranked within its partition. Rows with equal values receive the same rank. There are no gaps in the sequence of ranked values if two or more rows have the same rank.

NTILE(x) OVER(y)

INTEGER

Divides the rows for each window partition, as equally as possible, into a specified number of ranked groups. The NTILE window function requires the ORDER BY clause in the OVER clause.

PERCENT_RANK() OVER(y)

DOUBLE PRECISION

Calculates the percent rank of the current row using the following formula: (x - 1) / (number of rows in window partition - 1) where x is the rank of the current row.

RANK() OVER(y)

BIGINT

Determines the rank of a value in a group of values. The ORDER BY expression in the OVER clause determines the value. Each value is ranked within its partition. Rows with equal values for the ranking criteria receive the same rank. Drill adds the number of tied rows to the tied rank to calculate the next rank and thus the ranks might not be consecutive numbers. For example, if two rows are ranked 1, the next rank is 3. The DENSE_RANK window function differs in that no gaps exist if two or more rows tie.

ROW_NUMBER() OVER(y)

BIGINT

Determines the ordinal number of the current row within its partition. The ORDER BY expression in the OVER clause determines the number. Each value is ordered within its partition. Rows with equal values for the ORDER BY expressions receive different row numbers non-deterministically.

LAG(x) OVER (y)

Data type of x

Returns the value for the row before the current row in a partition. If no row exists, null is returned.

LEAD(x) OVER (y)

Data type of x

Returns the value for the row after the current row in a partition. If no row exists, null is returned.

FIRST_VALUE(x) OVER (y)

Data type of x

Returns the value of x with respect to the first row in the window frame.

LAST_VALUE(x) OVER (y)

Data type of x

Returns the value of x with respect to the last row in the window frame.

Section 14

SQL Nested Data Functions

This section contains descriptions of the functions supported by Drill for manipulating arrays and nested data.

Nested data function

Data type of result

Definition

FLATTEN(x)

----

Separates the elements in a repeated field x into individual rows; see Section 5.

KVGEN(x)

VARCHAR

Returns an array of the key/value pairs that exist in x; see Section 6.

REPEATED_COUNT(x)

INTEGER

Counts the number of values in x. X must be an array. See also Section 5.

REPEATED_CONTAINS(x,y)

BOOLEAN

Determines if the value y appears in the array x. Y may contain the following regular expression wildcards: asterisk (*), period (.), question mark (?), square bracketed ranges [a-z], square bracketed characters [ch], and negated square bracketed ranges or characters [!ch]. See also Section 5.
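
A sketch that combines these functions on the EmployeesArrays.json file from Section 5; the wildcard pattern matches project codes starting with AC:

SELECT number AS enr, REPEATED_COUNT(projects) AS NumberOfProjects,
       REPEATED_CONTAINS(projects, 'AC*') AS StartsWithAC
FROM   dfs.`\MyDirectory\EmployeesArrays.json`;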
