How to Query InfluxDB

Learn how to query time series data in your InfluxDB database using IFQL, a new query language that improves on the SQL-like InfluxQL.

By Gianluca Arbezzano · Jan. 23, 18 · Tutorial


InfluxDB is a popular time series database, and I am here to show you how to query it. Its query language, InfluxQL, is SQL-like (or, to be totally honest, "almost-SQL"):

SELECT * FROM "foodships" WHERE time > now() - 1h

This language has worked and been supported in InfluxDB since day one, and it has both pros and cons:

  1. It's easy for new users to interact with a database in a language that they already know. SQL is popular, and a lot of developers already know it. This query syntax helped InfluxDB grow in its early days.
  2. As I said before, InfluxQL is an "almost-SQL" language. Writing a SQL-like query language is not a simple job. One problem is that DBAs have certain expectations of SQL, but our language is not fully compatible with it. This creates a lot of mismatches and doubts.
  3. One of the characteristics of a time series database is that writes cannot block reads and reads cannot block writes. This is a must-have. Your IoT devices or your servers will always write metrics and in the meantime, your dashboard, analytic tools, and alert managers will read from the database. InfluxDB needs something more scalable.
  4. At the moment, the storage engine and query language are in the same project, which blocks fast iteration on the code that we would like to have. The query language evolves much quicker compared to the storage engine. The engine needs stability, but the query language needs more functions and capabilities.
  5. InfluxQL needs rearchitecting because the team wasn't able to implement all the amazing ideas requested by the community.
  6. InfluxQL is very simple to learn, but there is another DSL in our stack: the TICK script, used by Kapacitor to send alerts and downsample your series.
  7. InfluxQL doesn't have mathematical functions because the language doesn't allow us to add them in a simple way. With IFQL, they will be very easy to implement.

During InfluxDays in San Francisco, Paul Dix, CTO and founder of InfluxData, announced on stage a new open-source query language that brings data analysis closer to InfluxDB. It is called IFQL.

Architecture

var cpu = select(db: "telegraf")
  .filter(exp:{"_measurement" == "cpu" and "_field" == "usage_user"})
  .range(start: -30m)
  
select(db: "telegraf")
  .filter(exp:{"_measurement" == "mem" and "_field" == "used_percent"})
  .range(start: -30m)
  .join(on:["host"], eval:{$ + cpu})

IFQL is not just a query language. It has a parser that reads a query like the one above and generates a plan that can be visualized as JSON and run directly via IFQL. This means that the syntax you are seeing is just one of the possible implementations. The real contract is between the plan and the processor.

The architecture and the idea behind the project will allow us to reimplement InfluxQL, PromQL, and TICK Script on top of the IFQL AST. Besides that, if you have your own query builder, or you need a custom implementation to interact with InfluxDB, you can generate the plan as simple JSON and push it to IFQL. Flexible and amazing.

IFQL is designed to be extensible, and to achieve this goal we took an important lesson from Telegraf: you need a clean interface and a single entry point to make developers happy to contribute to your project. That's why IFQL has a directory called /functions. It contains all the functions currently supported, and contributors looking to add functions should work only in that directory.

TICK Script and IFQL

One of the goals for IFQL is to provide all the capabilities currently provided by TICK Script so that we will be able to query, manipulate, and send alerts without learning two languages.

Currently, we have functions replicated across both projects, but we will be able to reuse them with IFQL.

At the moment, IFQL is a standalone project available on GitHub. Besides the parser and planner, it provides a simple HTTP daemon called ifqld that you can use to run some tests.

It is in beta, which means that the API and query syntax will change, but it is very easy to set up. I will use Docker for this example, but if you don't use Docker, you can find IFQL binaries, deb, and rpm packages in the project's GitHub releases.

Let's download the docker-compose file and an InfluxDB configuration file from the IFQL repository:

wget https://raw.githubusercontent.com/influxdata/ifql/master/docker-compose.yml
mkdir examples
wget https://raw.githubusercontent.com/influxdata/ifql/master/examples/influxdb.conf -O examples/influxdb.conf

The main difference from a stock configuration is a section that enables IFQL:

[ifql]
  enabled = true
  log-enabled = true
  bind-address = ":8082"

InfluxDB uses port 8082 for gRPC communication with IFQL.

docker-compose up

As you can see, we created a network and deployed two containers. One is InfluxDB with our modified configuration and ports 8082 and 8086 exposed. The second one is IFQL v0.0.2, which exposes port 8093. We can verify everything is working as expected with a simple curl:

15:21 $ curl -I http://localhost:8093/
HTTP/1.1 404 Not Found
Content-Type: text/plain; charset=utf-8
X-Content-Type-Options: nosniff
Date: Tue, 28 Nov 2017 14:21:40 GMT
Content-Length: 19

Now let's populate our database with some sample data:

curl https://s3.amazonaws.com/noaa.water-database/NOAA_data.txt -o NOAA_data.txt
influx -import -path=NOAA_data.txt -precision=s -database=NOAA_water_database
influx
> use NOAA_water_database
> SELECT COUNT(*) FROM h2o_feet
name: h2o_feet
time count_level description count_water_level
---- ----------------------- -----------------
0    15258                   15258

As you can see, we have some points and measurements. Now we will interact with ifqld using curl to run some basic examples:

20:11 $ curl -XPOST --data-urlencode 'q=from(db:"NOAA_water_database").filter(exp:{"_measurement"=="h2o_temperature"}).range(start: -500000h).limit(n: 10)' http://localhost:8093/query       
h2o_temperature,location=coyote_creek degrees=60 1439856000000000000                           
h2o_temperature,location=coyote_creek degrees=65 1439856360000000000                           
h2o_temperature,location=coyote_creek degrees=68 1439856720000000000                           
h2o_temperature,location=coyote_creek degrees=62 1439857080000000000                           
h2o_temperature,location=coyote_creek degrees=62 1439857440000000000                           
h2o_temperature,location=coyote_creek degrees=69 1439857800000000000                           
h2o_temperature,location=coyote_creek degrees=67 1439858160000000000                           
h2o_temperature,location=coyote_creek degrees=67 1439858520000000000                           
h2o_temperature,location=coyote_creek degrees=70 1439858880000000000                           
h2o_temperature,location=coyote_creek degrees=65 1439859240000000000                           
h2o_temperature,location=santa_monica degrees=70 1439856000000000000                           
h2o_temperature,location=santa_monica degrees=60 1439856360000000000                           
h2o_temperature,location=santa_monica degrees=62 1439856720000000000                           
h2o_temperature,location=santa_monica degrees=62 1439857080000000000                           
h2o_temperature,location=santa_monica degrees=60 1439857440000000000                           
h2o_temperature,location=santa_monica degrees=63 1439857800000000000                           
h2o_temperature,location=santa_monica degrees=64 1439858160000000000                           
h2o_temperature,location=santa_monica degrees=63 1439858520000000000                           
h2o_temperature,location=santa_monica degrees=63 1439858880000000000                           
h2o_temperature,location=santa_monica degrees=61 1439859240000000000

Let's compare the query we ran, from(db:"NOAA_water_database").filter(exp:{"_measurement"=="h2o_temperature"}).range(start: -500000h).limit(n: 10), with InfluxQL:

  • from works a bit like SELECT in SQL: it selects the database to query.
  • Inside filter, there are some specific keywords, like _measurement, used by the database itself. In this case, it filters by the measurement h2o_temperature.
  • limit(n: 10) returns 20 lines in this case. What is going on? This is one of the first differences in how we built the query language: there is an implicit group by * that groups by series. In this case, there are two series, h2o_temperature,location=santa_monica and h2o_temperature,location=coyote_creek. If you need a specific series, you should add .filter(exp:{"_measurement"=="h2o_temperature" and "location"=="coyote_creek"}).

Some important decisions behind the query language:

  • The order of a function's arguments isn't important: every argument is passed by name, and arguments can be omitted.
  • Use " everywhere to avoid mismatches between single and double quotes.
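As an illustration of the first point, these two calls should be equivalent. This is an untested sketch: the stop argument and its now default come from the plan output shown later in the article, and whether now can be written literally like this in v0.0.2 is an assumption.

```
from(db:"NOAA_water_database").range(start: -200h, stop: now).count()
from(db:"NOAA_water_database").range(stop: now, start: -200h).count()
```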

IFQL simplifies every query from its specific syntax down to a JSON AST. You can get the AST for your query by adding the query param analyze=true.

from(db:"NOAA_water_database").filter(exp:{"_measurement"=="h2o_temperature"}).range(start: -500000h).limit(n: 10)

The README.md inside the IFQL project documents all the other available functions.

Scalability

ifqld is a stateless HTTP daemon, so it can scale up and down more easily than InfluxDB (or other databases) because there is no state or data involved. Some functions, like filters, are pushed down to the engine (though not in all cases) because InfluxDB is able to retrieve the time series and all of the required points itself.

Other functions, like aggregations, are not pushed down to InfluxDB but are run by the ifqld process as soon as it gets the data back from the database. This allows us to scale reading and aggregation in a more flexible way. If we need to serve more queries or more computation, we can spin up more ifqld processes without caring that much about the underlying storage.

Another important aspect of scalability is that ifqld can query more than one InfluxDB instance.

19:56 $ curl -s -XPOST --data-urlencode 'q=from(db:"NOAA_water_database").range(start: 200h).count()' http://localhost:8093/query?analyze=true | jq
{
  "operations": [
    {
      "id": "from0",
      "kind": "from",
      "spec": {
        "database": "NOAA_water_database",
        "hosts": null
      }
    },
    {
      "id": "range1",
      "kind": "range",
      "spec": {
        "start": "200h0m0s",
        "stop": "now"
      }
    },
    {
      "id": "count2",
      "kind": "count",
      "spec": {}
    }
  ],
  "edges": [
    {
      "parent": "from0",
      "child": "range1"
    },
    {
      "parent": "range1",
      "child": "count2"
    }
  ]
}
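Because the real contract is between the plan and the processor, a client could build this plan JSON programmatically instead of going through the query syntax. Here is a minimal Python sketch that reconstructs the exact plan shown above; how a raw plan is submitted to ifqld isn't covered here, so the sketch stops at building the JSON:

```python
import json

def op(op_id, kind, spec):
    # One operation node in an IFQL plan.
    return {"id": op_id, "kind": kind, "spec": spec}

def chain(operations):
    # Linear parent -> child edges between consecutive operations.
    return [{"parent": a["id"], "child": b["id"]}
            for a, b in zip(operations, operations[1:])]

operations = [
    op("from0", "from", {"database": "NOAA_water_database", "hosts": None}),
    op("range1", "range", {"start": "200h0m0s", "stop": "now"}),
    op("count2", "count", {}),
]
plan = {"operations": operations, "edges": chain(operations)}
print(json.dumps(plan, indent=2))
```

The chain helper mirrors what the parser does for a simple pipeline: each stage feeds the next.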

Using the CLI argument --host= or the environment variable $HOSTS, you can specify more than one backend. This means that if you are running the open-source version and you are able to manage the data sharding across several databases yourself, you can horizontally scale, after a fashion.
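As a hypothetical compose fragment (the service names and the comma separator for $HOSTS are assumptions, not documented here; check the project README):

```
services:
  ifqld:
    environment:
      # Assumption: multiple backends are comma-separated.
      HOSTS: "influxdb-a:8082,influxdb-b:8082"
```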

Roadmap

The project is unstable and under fast development. The API and the output can change based on your experience and ours with the query language. The idea is to lock the API in early 2018. If you are an active InfluxDB user, try it out and feel free to open PRs or issues with feedback about your use case to help us improve the syntax.


Published at DZone with permission of Gianluca Arbezzano, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.


