CData Elasticsearch Driver Features and Differentiators

In this article, we explore how CData drivers grant access to all of Elasticsearch, enable full SQL querying of Elasticsearch, and more.

Jerod Johnson

Sep. 16, 20 · Opinion

Likes (2)

Comment

Save

4.1K Views

Elasticsearch is a distributed, RESTful, full-text search engine designed to store, index, retrieve, and manage document-oriented or semi-structured data. Common uses for Elasticsearch range from building a simple search engine for a web site or document collection, to supporting auto-completion, analytics, AI, and cognitive computing workloads.

Because of the schemaless / NoSQL nature of Elasticsearch, data management is often a challenge. Through REST APIs users can access the search capabilities and features of Elasticsearch, but that requires custom development to connect it with other applications and services. The other option of using the SQL API and vendor-supplied libraries offers limited read-only SQL query capability and still only offers a small subset of tool integrations.

The CData Elasticsearch drivers offer the best of both worlds. The CData Drivers expand on the capabilities of the REST and SQL APIs, providing standards-compliant SQL-92 interfaces for data management, and the support for standards like ODBC / JDBC / ADO.NET provides near-universal seamless tool integration. The effectiveness of this methodology is best represented by our knowledge base of Elasticsearch integrations, which is only a small sample of what is possible.

In this article, we explore how CData drivers:

Grant access to all of Elasticsearch
Enable full SQL querying of Elasticsearch
Support JSON structures in Elasticsearch
Provide connectivity across major tools, platforms, and applications
Outperform the native drivers when querying Elasticsearch

Access to All of Elasticsearch

CData Drivers for Elasticsearch provide unmatched access to Elasticsearch with many features beyond those available from the native drivers. The distinctive features for enabling access to Elasticsearch are outlined below:

Connects to the SQL endpoint and REST endpoint (as opposed to only the SQL endpoint)
Enables SQL access to Elasticsearch v2.2 and above (compared to v6.3 and above)
Compatible with Open Source Elasticsearch subscriptions in addition to other commercially available subscriptions
Returns the relevance score as a _score column for queries without explicitly requiring the SCORE() function
Allows querying multiple indices (details below)

Querying Multiple Indices

Elasticsearch SQL supports querying multiple indices (or tables) through pattern matching. The CData Drivers support querying multiple indices, but for both the REST and SQL endpoints (as opposed to only the SQL endpoint). Multiple indices can be queried using any of the following formats:

Result Source	SQL Statement	REST API Endpoint
All available indices	`SELECT * FROM [_all]`	/_all/_search
A list of specific indices	`SELECT * FROM [index1, index2, index3]`	/index1%2Cindex2%2Cindex3/_search
Indices matching a wildcard pattern	`SELECT * FROM [index*]`	/index*/_search
Indices matching a pattern (with exclusions)	`SELECT * FROM [index*, -index3]`	/index*%2C-index/_search

SQL Support

The CData Drivers for Elasticsearch fully support standard SQL queries, whether crafted manually for custom applications and visualizations or generated automatically by a BI, reporting, or ETL tool.

Full CRUD Support

Since the CData Drivers include support for both the SQL and REST API endpoints, users can submit SELECT, INSERT, UPDATE, and DELETE statements to easily read from and write to Elasticsearch with CData Drivers.

Operation	SQL Statement	Endpoint(s) Used
Query all documents in an index	`SELECT * FROM [myIndex];`	REST, SQL
Create a new document in an index	`INSERT INTO [myIndex] (myField1, myField2, myField3, ...) VALUES ('myValue1', 'myValue2', 'myValue3', ...);`	REST
Update an existing document in an index	`UPDATE [myIndex] SET myField1 = 'myNewValue1' WHERE myId = '1';`	REST
DELETE an existing document from an index	`DELETE FROM [myIndex] WHERE myId = '1';`	REST

Filtering

CData Drivers allow for filtering using LIKE and = operators, as opposed to only supporting LIKE operators, offering more precise querying. This functionality is useful, but can still lead to imprecise results. For example, Elasticsearch may return records that contain the same words but in a different order: a search performed using the value "blue sky" will return a record with "sky blue." The Query Type logic built-in to the CData drivers identifies columns as Analyzed or Non-Analyzed and the drivers issue the best possible Elasticsearch query based on the specific operator and the search value.

WHERE Clause	Column Type	Elasticsearch Query Type
column = 'value'	Analyzed	Query String Query
column = 'value with spaces'	Analyzed	Match Phrase Query
column LIKE 'v_lu%'	Analyzed	Query String with wildcards
column = 'value'	Non-Analyzed	Query String Query*
column = 'value with spaces'	Non-Analyzed	Wildcard Query
column LIKE 'v_lu%'	Non-Analyzed	Wildcard Query with wildcards

* Non-analyzed columns can be matched case-insensitive, so four popular cases are checked: 'myValue' OR 'MYVALUE' OR 'myvalue' OR 'Myvalue'

JOIN Support

In Elasticsearch, the join datatype creates a parent/child relation within documents of the same index. CData Drivers can leverage the join datatypes to split related tables and enable SQL JOIN queries across those parent/child relationships. If you need to JOIN across indexes, the built-in SQL engine can perform a client-side, in-memory JOIN to eliminate restrictions on how you work with your data. Using the Elasticsearch example, we can build related tables using the join datatype.

For example, if we have the following documents in the sample index, we get different results from a query, depending on whether we use the native drivers or the CData drivers.

Example Index: sample

    Java
   
 

     
    

          x
         

            
          
{
  "_index": "sample",
  "_type": "doc",
  "_id": "2",
  "_score": 1,
  "_source": {
    "text": "What is your favorite motorcycle?",
    "question_id": {
      "name": "question"
    }
  }
},
{
  "_index": "sample",
  "_type": "doc",
  "_id": "1",
  "_score": 1,
  "_source": {
    "text": "What is your favorite company?",
    "question_id": {
      "name": "question"
    }
  }
},
{
  "_index": "sample",
  "_type": "doc",
  "_id": "3",
  "_score": 1,
  "_routing": "1",
  "_source": {
    "text": "CData Software",
    "question_id": {
      "name": "answer",
      "parent": "1"
    }
  }
},
{
  "_index": "sample",
  "_type": "doc",
  "_id": "4",
  "_score": 1,
  "_routing": "1",
  "_source": {
    "text": "Elastic",
    "question_id": {
      "name": "answer",
      "parent": "1"
    }
  }
},
{
  "_index": "sample",
  "_type": "doc",
  "_id": "5",
  "_score": 1,
  "_routing": "1",
  "_source": {
    "text": "Suzuki V-Strom 650",
    "question_id": {
      "name": "answer",
      "parent": "2"
    }
  }
},
{
  "_index": "sample",
  "_type": "doc",
  "_id": "6",
  "_score": 1,
  "_routing": "1",
  "_source": {
    "text": "Yamaha FZ6",
    "question_id": {
      "name": "answer",
      "parent": "2"
    }
  }
}

The native drivers expose a single table, sample, with a single column, text, but with no preservation of the relationship between the documents. The CData Drivers expose two tables, sample_question and sample_answer based on the relationship between the documents in the index.

Native Drivers

When querying the data, the native Elasticsearch driver returns a single table with one column, 'text':

Query Result (sample)

text
What is your favorite company?
What is your favorite motorcycle?
CData Software
Elastic
Suzuki V-Strom 650
Yamaha FZ6

CData Drivers

In contrast, CData Drivers expose two separate related tables, with several columns, including foreign keys to define the relationships:

Parent (sample_question)

_id	_score	text
1	0.0	What is your favorite company?
2	0.0	What is your favorite motorcycle?

Child (sample_answer)

_id	sample_question_id	text
3	1	CData Software
4	1	Elastic
5	2	Suzuki V-Strom 650
6	2	Yamaha FZ6

Support for JSON Structures

CData Drivers provide support for querying JSON structures, like arrays and nested JSON objects, which can often be found in Elasticsearch records. With CData, users will be able to get exactly the data they want from Elasticsearch, thanks to built-in schema discovery and JSON structure flattening.

Array Support

Native Elasticsearch drivers do not (at the time of writing) support querying data found in arrays. CData Drivers not only support accessing the arrays (as raw JSON data), but also support flattening the arrays to drill down into the relevant data. Consider the following JSON array:

    Java
   
          x
         
{
  ...
  "coords" : [35.91,-79.06],
  ...
}

CData Elasticsearch Drivers expose the columns coords coords.0 and coords.1 (sample results below).

coords	coords.0	coords.1
[35.92,-79.06]	35.91	-79.06

Nested Type Support

Data stored for use with Elasticsearch often contains nested JSON objects. Without support for these objects, native drivers will miss a lot of available data in their queries. CData Drivers support searching nested objects and include functionality to flatten the nested objects to atomize the data. Consider the following JSON object:

    JSON
   
xxxxxxxxxx

{
  "manager": {
    "name": "Ms. Manager",
    "age": 35,
    "location": "Flagship Store",
    ...
  }
}

CData Elasticsearch Drivers expose manager as a column, along with a separate column for each nested element: manager.name, manager.age, manager.location (sample results below):

manager	manager.name	manager.age	manager.location
{ "name": "Ms. Manager", "age": 35, "location": "Flagship Store", ... }	Ms. Manager	35	Flagship Store

Platform Support

Outside of the version and subscription restrictions, one of the largest limitations of the native Elasticsearch drivers is that they only support the JDBC and ODBC standards. In comparison, CData offers connectivity across a variety of data-centric standards like ODBC (Windows, Linux, Mac), JDBC, and ADO.NET and directly within tools like Excel, SSIS, BizTalk, and PowerShell. Whether you are using tools and applications for BI, analytics, reporting, and ETL or building custom applications, CData drivers provide live access to Elasticsearch data wherever you need it.

Performance Comparison

The CData JDBC Driver is able to read data more than twice as fast as the native driver. The CData engineers built a driver that makes better use of client-side resources to work with large datasets quickly.

Rows Queried	CData Software	Native
JDBC Query Times by Company (in milliseconds)
25,000	1,016.9 (+35%)	1,375.4
~10,000,000	199,577 (+155%)	508,338

For more information, refer to our performance comparison article.

Other Features

Beyond these major features and key differentiators, CData Drivers for Elasticsearch come with additional features, that are included across all of the full range of data sources that CData supports.

Collaborative query processing - a powerful, built-in SQL engine pushes supported complex queries (filtering, aggregations, JOINS, etc.) down to Elasticsearch for server-side processing and features client-side processing for unsupported queries
Active development & maintenance — changes to the Elasticsearch API are implemented and supported as soon as possible, keeping up with the latest versions and updates
World-class support — our Support Team is available to answer your questions via email, live chat, and phone.

Free Trials and More Information

You will find the latest version of our Elasticsearch drivers here and all of our drivers include a free, 30-day trial. If you want to learn more about using our drivers in your existing tools or applications, you can read more in our Knowledge Base.

Database Driver (software) Elasticsearch sql

Published at DZone with permission of Jerod Johnson, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

Trending