DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Last call! Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • Reading Table Metadata With Flight SQL
  • Data Privacy and Security: A Developer's Guide to Handling Sensitive Data With DuckDB
  • CockroachDB TIL: Volume 11
  • Python Bags the TIOBE Language of the Year 2021 in a Row

Trending

  • Automatic Code Transformation With OpenRewrite
  • Automating Data Pipelines: Generating PySpark and SQL Jobs With LLMs in Cloudera
  • The Human Side of Logs: What Unstructured Data Is Trying to Tell You
  • 5 Subtle Indicators Your Development Environment Is Under Siege
  1. DZone
  2. Data Engineering
  3. Databases
  4. Query a Database With Arrow Flight

Query a Database With Arrow Flight

This post walks you through querying a database with Arrow Flight and Arrow Flight SQL, the differences between the two libraries, and the advantages of each.

By 
Anais Dotis-Georgiou user avatar
Anais Dotis-Georgiou
·
Dec. 20, 23 · Tutorial
Likes (1)
Comment
Save
Tweet
Share
5.1K Views

Join the DZone community and get the full member experience.

Join For Free

Arrow Flight is a “new general-purpose client-server framework to simplify high-performance transport of large datasets over network interfaces.” Flight uses gRPC and IPC protocols as its foundation. The structure of Flight primarily focuses on the transmission of Arrow record batches. The protocols for transmitting methods and data use Protobuf outlines, providing compatibility with clients equipped to handle gRPC and Arrow individually, even if they do not support Flight directly. Moreover, additional enhancements fine-tune Flight to reduce performance penalties typically associated with Protobuf use, particularly those related to unnecessary memory duplication. 

You can also use Apache Arrow Flight SQL to query a database. Apache Arrow Flight SQL is “a new client-server protocol developed by the Apache Arrow community for interacting with SQL databases that makes use of the Arrow in-memory columnar format and the Flight RPC framework.” Essentially, the Flight SQL clients wrap the underlying Flight client. However, it also provides methods for defining streams of Arrow record batches and how to read them. FlightSQL is useful for writing database-agnostic code for different databases that support it. This more general approach is what this article covers. For example, if you are creating a UI or other user experience that might query different kinds of databases. 

However, if you intend to use InfluxDB, we suggest using the upstream Flight libraries directly, because they have simpler dependencies and support both InfluxQL and SQL. In this post, we’ll learn how to use Apache Flight to query a database. Specifically, we’ll be querying InfluxDB 3.0, however, you can use this approach to query a variety of other data tools. We’ll also learn how to use FlightSQL for the sake of comparison. 

Requirements

To follow this tutorial, please ensure you meet the following requirements. 

Python Environment: Ensure Python is installed on your machine.

Dependencies: Install necessary Python packages like Yarrow, which includes support for Arrow Flight SQL.

Shell
 
pip install pyarrow


Specific Example 

Querying a database (like InfluxDB v3) with Arrow Flight involves the following steps: 

  1. Create a FlightClient by passing in the URL.
  2. Write a JSON script that has the necessary information for running the query on the specific platform. For example, you include the query and query language that you want to use to query your database. 
  3. Create a ticket from that JSON script.
  4. Return a reader that you can use to do what you want. 
Python
 
from pyarrow.flight import FlightClient, Ticket

import json

host = "us-east-1-2.aws.cloud2.influxdata.com"

database = 'my_db'

sql = "SELECT * from my_table"

client = FlightClient(f"grpc+tls://{host}:443")

ticket_data = {

    "namespace_name": "my_db",

    "sql_query": sql,

    "query_type": "sql",

}


ticket_bytes = json.dumps(ticket_data)

ticket = Ticket(ticket_bytes)

flight_reader = client.do_get(ticket)

arrow_table = flight_reader.read_all()

data_frame = arrow_table.to_pandas()

print(data_frame.to_markdown())


Note how you specify the sql_query. If you are querying InfluxDB exclusively, you could easily switch between SQL and InfluxQL, making ArrowFlight a more convenient tool for querying InfluxDB specifically. 

Comparison: Leveraging Apache Arrow Flight SQL Python Client

Sometimes the best way to learn is by contrast or comparison. Here’s an example of how to query InfluxDB with the Apache Arrow Flight SQL Python Client. First, you need to install the Arrow Flight SQL Python Client with:

Shell
 
pip install flightsql-dbapi


As you can see, using the Arrow Flight SQL Python Client is quite similar to using the Arrow Flight Python Client. However, the FlightSQL Client contains Readers that return Arrow Record batches in streams and methods to read the data from those streams. It also doesn’t provide a query language option, as SQL is used by default. 

Python
 
from flightsql import FlightSQLClient

# Instantiate a FlightSQLClient configured for a database
client = FlightSQLClient(host='cluster-id.influxdb.io',
    token='DATABASE_TOKEN',
    metadata={'database': 'DATABASE_NAME'},
    features={'metadata-reflection': 'true'})

# Execute the query to retrieve FlightInfo
info = client.execute("SELECT * FROM home")

# Extract the token for retrieving data

ticket = info.endpoints[0].ticket

# Use the ticket to request the Arrow data stream.

# Return a FlightStreamReader for streaming the results.

reader = client.do_get(ticket)

# Read all data to a pyarrow.Table

table = reader.read_all()

print(table)


It’s also important to note that the way the FlightSQL clients return streams of data differs quite a lot across different languages. Refer to this repo to see more examples of how to query InfluxDB with C++, Go, and Java the Arrow Flight SQLClients. While those examples are specific to InfluxDB, 

Final thoughts

Apache Arrow Flight is a powerful tool for transporting large datasets over a network interface.
If you need any help with Arrow Flight SQL or InfluxDB, please reach out using our community site. I’d love to hear about what you’re trying to achieve and what features you’d like InfluxDB to have. I hope this tutorial helped familiarize you with how to query a database with Arrow Flight.

If you’re interested in learning more about how to use Apache Flight SQL instead, I encourage you to take a look at the following resources: 

  • A repository with examples on how to query InfluxDB with the Java, Go, and C++ Flight SQL clients
  • InfluxData Documentation on using Apache Arrow Flight Clients
  • InfluxData Documentation on using Python Flight SQL DBAPI Client

If you’re using InfluxDB specifically, I recommend that you use one of our client libraries because they further wrap those Flight libraries to make writing and reading data to InfluxDB with Arrow simpler. Here are some resources for the InfluxDB v3 client libraries: 

  • All of the InfluxDB v3 client libraries are maintained here. 
  • The InfluxDB v3 client library documentation.
Apache Arrow Database InfluxDB Python (language) sql

Opinions expressed by DZone contributors are their own.

Related

  • Reading Table Metadata With Flight SQL
  • Data Privacy and Security: A Developer's Guide to Handling Sensitive Data With DuckDB
  • CockroachDB TIL: Volume 11
  • Python Bags the TIOBE Language of the Year 2021 in a Row

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!