DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Reading Table Metadata With Flight SQL
  • DuckDB for Python Developers
  • Data Privacy and Security: A Developer's Guide to Handling Sensitive Data With DuckDB
  • CockroachDB TIL: Volume 11

Trending

  • The Invisible OOMKill: Why Your Java Pod Keeps Restarting in Kubernetes
  • Build Self-Managing Data Pipelines With an LLM Agent
  • 5 Common Security Pitfalls in Serverless Architectures
  • Scaling Cloud Data Automation: A Practical Guide to Open Table Formats
  1. DZone
  2. Data Engineering
  3. Databases
  4. Query a Database With Arrow Flight

Query a Database With Arrow Flight

This post walks you through querying a database with Arrow Flight and Arrow Flight SQL, the differences between the two libraries, and the advantages of each.

By 
Anais Dotis-Georgiou user avatar
Anais Dotis-Georgiou
·
Dec. 20, 23 · Tutorial
Likes (1)
Comment
Save
Tweet
Share
6.5K Views

Join the DZone community and get the full member experience.

Join For Free

Arrow Flight is a “new general-purpose client-server framework to simplify high-performance transport of large datasets over network interfaces.” Flight uses gRPC and IPC protocols as its foundation. The structure of Flight primarily focuses on the transmission of Arrow record batches. The protocols for transmitting methods and data use Protobuf outlines, providing compatibility with clients equipped to handle gRPC and Arrow individually, even if they do not support Flight directly. Moreover, additional enhancements fine-tune Flight to reduce performance penalties typically associated with Protobuf use, particularly those related to unnecessary memory duplication. 

You can also use Apache Arrow Flight SQL to query a database. Apache Arrow Flight SQL is “a new client-server protocol developed by the Apache Arrow community for interacting with SQL databases that makes use of the Arrow in-memory columnar format and the Flight RPC framework.” Essentially, the Flight SQL clients wrap the underlying Flight client. However, it also provides methods for defining streams of Arrow record batches and how to read them. FlightSQL is useful for writing database-agnostic code for different databases that support it. This more general approach is what this article covers. For example, if you are creating a UI or other user experience that might query different kinds of databases. 

However, if you intend to use InfluxDB, we suggest using the upstream Flight libraries directly, because they have simpler dependencies and support both InfluxQL and SQL. In this post, we’ll learn how to use Apache Flight to query a database. Specifically, we’ll be querying InfluxDB 3.0, however, you can use this approach to query a variety of other data tools. We’ll also learn how to use FlightSQL for the sake of comparison. 

Requirements

To follow this tutorial, please ensure you meet the following requirements. 

Python Environment: Ensure Python is installed on your machine.

Dependencies: Install necessary Python packages like Yarrow, which includes support for Arrow Flight SQL.

Shell
 
pip install pyarrow


Specific Example 

Querying a database (like InfluxDB v3) with Arrow Flight involves the following steps: 

  1. Create a FlightClient by passing in the URL.
  2. Write a JSON script that has the necessary information for running the query on the specific platform. For example, you include the query and query language that you want to use to query your database. 
  3. Create a ticket from that JSON script.
  4. Return a reader that you can use to do what you want. 
Python
 
from pyarrow.flight import FlightClient, Ticket

import json

host = "us-east-1-2.aws.cloud2.influxdata.com"

database = 'my_db'

sql = "SELECT * from my_table"

client = FlightClient(f"grpc+tls://{host}:443")

ticket_data = {

    "namespace_name": "my_db",

    "sql_query": sql,

    "query_type": "sql",

}


ticket_bytes = json.dumps(ticket_data)

ticket = Ticket(ticket_bytes)

flight_reader = client.do_get(ticket)

arrow_table = flight_reader.read_all()

data_frame = arrow_table.to_pandas()

print(data_frame.to_markdown())


Note how you specify the sql_query. If you are querying InfluxDB exclusively, you could easily switch between SQL and InfluxQL, making ArrowFlight a more convenient tool for querying InfluxDB specifically. 

Comparison: Leveraging Apache Arrow Flight SQL Python Client

Sometimes the best way to learn is by contrast or comparison. Here’s an example of how to query InfluxDB with the Apache Arrow Flight SQL Python Client. First, you need to install the Arrow Flight SQL Python Client with:

Shell
 
pip install flightsql-dbapi


As you can see, using the Arrow Flight SQL Python Client is quite similar to using the Arrow Flight Python Client. However, the FlightSQL Client contains Readers that return Arrow Record batches in streams and methods to read the data from those streams. It also doesn’t provide a query language option, as SQL is used by default. 

Python
 
from flightsql import FlightSQLClient

# Instantiate a FlightSQLClient configured for a database
client = FlightSQLClient(host='cluster-id.influxdb.io',
    token='DATABASE_TOKEN',
    metadata={'database': 'DATABASE_NAME'},
    features={'metadata-reflection': 'true'})

# Execute the query to retrieve FlightInfo
info = client.execute("SELECT * FROM home")

# Extract the token for retrieving data

ticket = info.endpoints[0].ticket

# Use the ticket to request the Arrow data stream.

# Return a FlightStreamReader for streaming the results.

reader = client.do_get(ticket)

# Read all data to a pyarrow.Table

table = reader.read_all()

print(table)


It’s also important to note that the way the FlightSQL clients return streams of data differs quite a lot across different languages. Refer to this repo to see more examples of how to query InfluxDB with C++, Go, and Java the Arrow Flight SQLClients. While those examples are specific to InfluxDB, 

Final thoughts

Apache Arrow Flight is a powerful tool for transporting large datasets over a network interface.
If you need any help with Arrow Flight SQL or InfluxDB, please reach out using our community site. I’d love to hear about what you’re trying to achieve and what features you’d like InfluxDB to have. I hope this tutorial helped familiarize you with how to query a database with Arrow Flight.

If you’re interested in learning more about how to use Apache Flight SQL instead, I encourage you to take a look at the following resources: 

  • A repository with examples on how to query InfluxDB with the Java, Go, and C++ Flight SQL clients
  • InfluxData Documentation on using Apache Arrow Flight Clients
  • InfluxData Documentation on using Python Flight SQL DBAPI Client

If you’re using InfluxDB specifically, I recommend that you use one of our client libraries because they further wrap those Flight libraries to make writing and reading data to InfluxDB with Arrow simpler. Here are some resources for the InfluxDB v3 client libraries: 

  • All of the InfluxDB v3 client libraries are maintained here. 
  • The InfluxDB v3 client library documentation.
Apache Arrow Database InfluxDB Python (language) sql

Opinions expressed by DZone contributors are their own.

Related

  • Reading Table Metadata With Flight SQL
  • DuckDB for Python Developers
  • Data Privacy and Security: A Developer's Guide to Handling Sensitive Data With DuckDB
  • CockroachDB TIL: Volume 11

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook