Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Standards-Based Access to NoSQL Data Sources for Business Intelligence

DZone's Guide to

Standards-Based Access to NoSQL Data Sources for Business Intelligence

Learn how to use the ODBC Driver for MongoDB to create reports in a popular business intelligence tool, Crystal Reports.

· Big Data Zone
Free Resource

Access NoSQL and Big Data through SQL using standard drivers (ODBC, JDBC, ADO.NET). Free Download 

NoSQL data sources like MongoDB provide a solution for high-volume, scalable, and agile data management. There is a high demand to integrate these data sources with existing infrastructure and familiar tools. Standards-based data access facilitates your expansion into MongoDB by enabling you to leverage existing skills using standards like ODBC, JDBC, and ADO.NET. Standards-based drivers also provide maximum interoperability with BI (business intelligence), analytics, and ETL.

The ability to connect to MongoDB and other NoSQL data sources as a relational database is a critical component of integration with BI, analytics, and ETL. Major BI tools such Crystal Reports, Tableau, QlikView, and TIBCO Spotfire depend on the SQL interface provided by standards-based drivers.

CData drivers enable bidirectional access using standard SQL to MongoDB data. CData drivers model MongoDB objects as normalized relational tables. This article shows how to access objects in NoSQL data sources from any CData driver technology. In this article, you will use the ODBC Driver for MongoDB to create reports in a popular BI tool, Crystal Reports.

Connect to MongoDB From BI, Analytics, and ETL

CData drivers enable out-of-the-box connectivity to NoSQL data sources from your application. The drivers automatically generate schemas, or table definitions, that enable you to start querying MongoDB objects with SQL as soon as you connect. Simply set the DetectDataTypes property to the number of rows the driver will scan to infer column data types.

The driver models all top-level native properties as columns of an appropriate type. The inferred data types of the following JSON object are shown below:

{
  id: 12,
  name: "Lohia Manufacturers Inc.",
  address: {street: "Main Street", city: "Chapel Hill", state: "NC"},
  offices: ["Chapel Hill", "London", "New York"]
  annual_revenue: 35,600,000
}
Column Name Data Type Example Value
id
Integer 12
name
String Lohia Manufacturers Inc.
address
String {street: "Main Street", city: "Chapel Hill", state: "NC"}
 offices
String ["Chapel Hill", "London", "New York"]
annual_revenue
Double 35,600,000

Access Nested Data From SQL

CData drivers provide easy access to nested data through the XPath standard. You can call the JSON_EXTRACT SQL function to drill down to nested properties from SQL. Use the XPath as the second argument to flatten the JSON hierarchy to the level you want. Use an SQL alias to define the dataset's column names.

The following examples use the raw JSON below of a nested array of objects:

offices: ["Headquarters", {street: "600 Main Street", city: "Chapel Hill", state: "NC"}]

To flatten the fields of the offices object, use the following SQL:

SELECT JSON_EXTRACT(offices,'[1]') AS Headquarters,
JSON_EXTRACT(offices,'[2].street') AS Street,
JSON_EXTRACT(offices,'[3].city') AS City,
JSON_EXTRACT(offices,'[4].state') AS State
FROM Customers

The dataset returned from the preceding query contains the following columns:

Column Name Data Type Example Value
Headquarters
String Headquarters
Street
String 600 Main Street
City
String Chapel Hill
State
String NC

Access Nested Data From Custom Tables

CData drivers provide two methods to outline the exact fields you want to pull from MongoDB, which you are then able to query as tables:

  • Specify the XPath in your query with the JSON_EXTRACT formula.
  • Create static schemas to map columns to objects and arrays nested in the JSON response.

You can define schemas, or table definitions, in simple, XML-based files. Schema files can be used with any CData driver technology.

The schema in this example will map columns to the fields of the following array of objects:

offices: ["Headquarters", {street: "600 Main Street", city: "Chapel Hill", state: "NC"}]

The following definitions map columns to the fields in the example:

<attr name="Headquarters" xs:type="string" other:bsonpath="offices.0" />
<attr name="OfficeStreet" xs:type="string" other:bsonpath="offices.1.street" />
<attr name="OfficeCity" xs:type="string" other:bsonpath="offices.1.city" />
<attr name="OfficeState" xs:type="string" other:bsonpath="offices.1.state" />

Each attr defined in the rsb:info section defines a column from the database. Define a column by providing the following information:

  • Name: Set name to the column name.
  • Data type: Set xs:type to define the data type. The supported data types for the xs:type attribute are integer, double, datetime, string, long, and boolean.
  • XPath: Set other:bsonpath to the XPath to the JSON element you want to retrieve. Elements of arrays are indexed starting from 0. Each . in the other:bsonpath attribute signifies a child of the current element.

Schemas are defined in RSD files with the table name specified by the filename. Schemas consist of column definitions and calls to the CRUD (create, read, update, and delete) operations shipped with the driver. Below is a complete schema, Office.rsd. The only part of the schema you will need to modify is the rsb:info block; copy the rest of the script as is.

<rsb:script xmlns:rsb="https://www.cdata.com/ns/rsbscript/2">
  <rsb:info title="Office" description="An example schema.">  
    <!-- Column definitions -->
    < attr name="OfficeName" xs:type="string" other:bsonpath="offices.0" />
    < attr name="OfficeStreet" xs:type="string" other:bsonpath="offices.1.street" />
    < attr name="OfficeCity" xs:type="string" other:bsonpath="offices.1.city" />
    < attr name="OfficeState" xs:type="string" other:bsonpath="offices.1.state" />
  </rsb:info>
  <rsb:script method="GET">
    <rsb:call op="mongodbadoExecuteSelect">
      <rsb:push />
    </rsb:call>
  </rsb:script>

  <rsb:script method="POST">
    <rsb:call op="mongodbadoExecutePost">
      <rsb:push />
    </rsb:call>
  </rsb:script>

  <rsb:script method="MERGE">
    <rsb:call op="mongodbadoExecuteMerge">
      <rsb:push />
    </rsb:call>
  </rsb:script>

  <rsb:script method="DELETE">
    <rsb:call op="mongodbadoExecuteDelete">
      <rsb:push />
    </rsb:call>
  </rsb:script>
</rsb:script>

You can use CData drivers to obtain the raw JSON for any MongoDB object:

  1. Set the Logfile connection property to a location where the driver has write permissions (for example, C:\MongoDBLog.txt).
  2. Set the Verbosity connection property to 3. The driver will log the body of HTTP requests, containing the JSON.
  3. Connect to the MongoDB database. The driver will detect the objects in your database.
  4. Execute a SELECT * query to the MongoDB object you want.
  5. The log file should now contain the raw JSON response from your server.

To use a custom schema, set the Location connection property to the schema file and reconnect.

Connect to MongoDB From Crystal Reports

The ODBC driver has almost ubiquitous support across the major BI tools. It is managed using Windows tools. Follow the steps below to use the ODBC Data Source Administrator to connect to MongoDB as an ODBC data source in Crystal Reports:

  1. Specify connection properties in a DSN (data source name). You can use the Microsoft ODBC Data Source Administrator to create and configure ODBC DSNs. Typical connection properties are the following:

    • Server: Enter the name or address of the MongoDB server you want to connect to.

    • Database: Enter the name of the database you want to connect to.

    • User: Enter the username of an authorized user.

    • Password: Enter the password of an authorized user.

    • AuthDatabase: If you are connecting to a different database than the authentication database, enter the name of the authentication database.

    Image title

  2. In Crystal Reports, open a new report and click Create New Connection > ODBC.

  3. In the resulting wizard, click Select Data Source and select the DSN in the Data Source Name menu.

Design a Report

After adding an ODBC connection to MongoDB, you can then use the Report Wizard to add MongoDB objects to your report.

In the Report Wizard, click Create New Connection > ODBC and select the MongoDB DSN. Configure the data source by selecting the tables and fields needed in the report. We use a dataset from the Bureau of Transportation Statistics' domestic flights for 2015. You can then create charts and other report objects. Your report objects can summarize and sort report fields.

Tables in your MongoDB database.

Leverage Server-Side Processing

With a fast database like MongoDB, you can see marked performance gains by formulating large result sets on the server. CData drivers enable you to take advantage of MongoDB's aggregation framework to create responsive reports that reflect changes to the data. As you build charts and other report objects in Crystal Reports, Crystal Reports builds a query to the driver. The driver makes the corresponding request to MongoDB APIs.

The driver exposes the capabilities of the MongoDB API as standard SQL. Below are some report objects and the corresponding SQL queries. The dataset used is from the Bureau of Transportation Statistics' domestic flights dataset for 2015.

The following query finds the best average time to catch a flight to avoid delays. The query retrieves the average arrival delay for each departure time. The arrival time is the difference in minutes between the scheduled and actual arrival time. The number of minutes a plane arrived early are represented as negative numbers:

SELECT AVG(ARR_DELAY) AS AvgARR_DELAY, DEP_TIME
FROM Airlines
GROUP BY DEP_TIME

Average flight delays in New York airports for every hour from January to June 2015.

To create this chart, click Insert > Chart in a report and enter the following in the Chart Wizard:

  • In the On Change Of menu, select DEP_TIME.
  • In the Show Values box, select ARR_DELAY and set the summary option to average.

Chart Wizard configuration to find the best average time to leave New York to avoid delays.

You can work with MongoDB just like any other ODBC data source. See the Online Knowledge Base for more guides to integrating your business intelligence and ETL solutions with MongoDB.

The fastest databases need the fastest drivers - learn how you can leverage CData Drivers for high performance NoSQL & Big Data Access.

Topics:
big data ,nosql ,business intelligence ,data analytics ,etl ,crystal reports ,cdata ,mongodb

Published at DZone with permission of Jerod Johnson, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}