A Gentle Introduction to OmniSci

See how OmniSciDB fits into an analytics ecosystem.

· Big Data Zone ·

Introduction

I recently joined OmniSci as a Community Developer Advocate. My role is to help build the global OmniSci community and raise awareness through presentations and technical writing. My background is in database technology, but I have no previous experience with OmniSci or Graphics Processing Unit (GPU) technology. In this series of articles, I will share what I have learned about OmniSci as a beginner, in the hope that it will also be useful to other beginners. I will focus mainly on Open Source Software (OSS) and industry standards.

What Is OmniSci?

At its heart, OmniSci is an analytics platform that can process very large quantities of data. As shown in Figure 1, OmniSci consists of a number of components in three major groupings: Data Integration, the OmniSci Platform, and Develop and Accelerate.

Figure 1. OmniSci Architecture (Source: OmniSci).

Data Integration

A company may be working with Data-at-Rest or Streaming Data, and frequently both. The OmniSci Platform can ingest data in a variety of ways, such as through JDBC or directly from Apache Kafka. Streaming sources include data generated by IoT devices and sensors. In either case, the data may run into many billions of rows, and the OmniSci Platform is designed to handle data volumes of that size.

Develop and Accelerate

Develop and Accelerate also uses industry standards: for example, ODBC and JDBC can be used to export data to various external tools. We may also wish to undertake further data processing using Machine Learning. OmniSci supports Python, which is one of the most popular programming languages used by Data Scientists today. Integration with third-party tools, such as TensorFlow, is also possible.

In this article, we'll now focus on the OmniSci Platform.

What Is the OmniSci Platform?

From Figure 1, we can see that the OmniSci Platform consists of three main components:

  1. OmniSciDB
  2. OmniSci Render
  3. OmniSci Immerse

Let's now look at these three components in more detail.

Getting Started With OmniSciDB

OmniSciDB is an open source SQL database engine designed to run on both GPUs and CPUs. It can therefore exploit the parallelism that GPUs provide, processing many data elements at the same time to boost performance. OmniSciDB also uses multi-tiered memory caching, a Just-In-Time (JIT) query compilation framework and in-situ graphics rendering.

Using GPUs provides both scale and performance. For example, SQL queries running on GPUs can often be orders of magnitude faster, even without the use of other accelerators, such as indexes.

Today, SQL is still extremely popular and widely used. Data scientists and developers can apply their existing SQL knowledge and skills to query database systems. OmniSci provides several ways for users to run SQL queries:

  • OmniSciDB ships with a command line tool.
  • OmniSci Immerse provides a browser-based SQL editor.

SQL query results can be rendered in OmniSci Immerse or output to popular BI tools.
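To make the SQL workflow concrete, here is a minimal sketch of the kind of aggregate query an analyst might run. It uses Python's built-in sqlite3 module as a stand-in engine so that it runs anywhere; it does not connect to OmniSciDB, and the flights table, its columns and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (carrier, origin airport, arrival delay in minutes).
SAMPLE_FLIGHTS = [
    ("AA", "JFK", 12),
    ("AA", "LAX", 30),
    ("DL", "JFK", 5),
    ("DL", "ATL", -3),
    ("UA", "SFO", 45),
]


def average_delay_by_carrier(rows):
    """Return (carrier, average arrival delay) pairs, largest delay first."""
    # An in-memory SQLite database stands in for the real engine here.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE flights (carrier TEXT, origin TEXT, arrdelay INTEGER)"
        )
        conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT carrier, AVG(arrdelay) AS avg_delay "
            "FROM flights GROUP BY carrier ORDER BY avg_delay DESC"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for carrier, avg_delay in average_delay_by_carrier(SAMPLE_FLIGHTS):
        print(carrier, avg_delay)
```

The GROUP BY and AVG shown here are standard SQL, which is the point: existing SQL skills carry over regardless of the engine that executes the query.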

Many applications today use Geospatial Data to provide a range of services, such as calculating the distance between two points or finding the intersection of two objects. These types of applications require support for richer data types, such as:

  • POINT
  • LINESTRING
  • POLYGON
  • MULTIPOLYGON

OmniSci enables the storage and querying of Geospatial data by providing support for Open Geospatial Consortium (OGC) data types. A number of Geospatial file formats are also supported, such as:

  • GeoJSON
  • ESRI Shapefile
  • KML
  • CSV/TSV with WKT

Additionally, Geospatial functions are supported, such as:

  • Geometry Constructors
  • Geometry Editors
  • Geometry Accessors
  • Spatial Relationships and Measurements
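To give a feel for what a measurement function computes, here is a minimal sketch of the distance between two longitude/latitude points using the haversine formula. This is a generic, well-known approximation that treats the Earth as a sphere; it is not OmniSci's implementation, and the function name and the radius constant are choices made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Two points on the equator, a quarter of the way around the globe.
    print(haversine_km(0.0, 0.0, 90.0, 0.0))
```

Running a calculation like this over millions of point pairs is exactly the kind of independent, per-row arithmetic that parallel hardware handles well.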

GPU technology can again provide considerable benefits when working with Geospatial data, as we'll see in examples in future articles in this series.

As mentioned earlier, OmniSciDB is open source and supports a range of programming interfaces that will appeal to many different users, such as Data Scientists and Developers. In future articles, we'll look at examples of how to use some of these interfaces, such as Python and JavaScript. In the meantime, you can find further details, download links and build instructions for OmniSciDB on GitHub.

Next, let's discuss OmniSci Render.

Getting Started With Analytics in OmniSci Render Using the Vega Rendering Engine

For analytics, we need the ability to visualize data. OmniSci Render performs this task by using the Vega Rendering Engine. It works server-side and generates a range of different visualizations, such as:

  • Pointmaps
  • Heatmaps
  • Choropleths
  • Scatterplots

Rendering on the server provides a number of benefits. For example, using server-side GPU processing power allows visualizations to be generated very quickly. Furthermore, once visualizations have been created, only small graphics files need to be sent to OmniSci Immerse. Sending small files also reduces network traffic, which can be an issue for large, complex charts.

Previously, we discussed OmniSci's Geospatial data support. When performing Geospatial queries, we may be working with many millions of Geospatial data points and shapes. The ability to render them quickly and to change resolution when zooming in interactively provides real business benefits. This is again made possible by OmniSci's use of server-side processing and intelligent network data transfer.

Out of the box, OmniSci supports an implementation of the Vega Rendering Engine, which generates visualizations that OmniSci Immerse can display. However, the API can also be used to build custom applications, which gain the same benefits of server-side processing while keeping the browser-based frontend lightweight.
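Vega describes a visualization declaratively as a JSON document. The sketch below builds such a document in Python for a simple pointmap, following the general Vega layout of data, scales and marks. Treat it as an assumption-laden illustration: the function name, table and column names are hypothetical, and the exact properties that OmniSci Render accepts should be taken from the OmniSci documentation rather than from this example.

```python
import json


def build_pointmap_spec(table, x_col, y_col, width=800, height=600):
    """Build a Vega-style pointmap specification as a plain dictionary."""
    return {
        "width": width,
        "height": height,
        # The data source is expressed as a SQL query (assumed layout).
        "data": [
            {
                "name": "points",
                "sql": f"SELECT {x_col} AS x, {y_col} AS y FROM {table}",
            }
        ],
        # Scales map data values (degrees here) onto pixel ranges.
        "scales": [
            {"name": "x", "type": "linear", "domain": [-180, 180], "range": "width"},
            {"name": "y", "type": "linear", "domain": [-90, 90], "range": "height"},
        ],
        # Marks say how each row of the data source is drawn.
        "marks": [
            {
                "type": "points",
                "from": {"data": "points"},
                "properties": {
                    "x": {"scale": "x", "field": "x"},
                    "y": {"scale": "y", "field": "y"},
                    "size": 2,
                    "fillColor": "blue",
                },
            }
        ],
    }


if __name__ == "__main__":
    spec = build_pointmap_spec("flights", "origin_lon", "origin_lat")
    print(json.dumps(spec, indent=2))
```

Because the specification is just JSON, a custom client only needs to send a small text document to the server and receive the rendered image in return.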

Finally, let's discuss OmniSci Immerse.

Taking OmniSci Immerse for a Test Drive

We can use OmniSci Immerse to view and interact with the visualizations generated by OmniSci Render. We can create multiple dashboards and many different chart types. Let's look at some examples.

In Figure 2, we are using OmniSci Immerse with the OmniSci Cloud environment and we have a number of saved dashboards, as shown.

Figure 2. Saved Dashboards.

Let's explore the Flights Demo dashboard. In Figure 3, we can see the wide range of chart types that have been used to construct this particular dashboard.

Figure 3. Flights Demo.

If we select the option to Add Chart, shown on the top right-hand side, we can see many of the built-in chart types that we can use, as shown in Figure 4.

Figure 4. Different Chart Types.

These built-in interactive charts can also be customized. In addition, we can filter interactively, and every chart affected by the filter is updated. For example, if we filter by Month and Day-of-Week, as shown in Figure 5, we can see that the other charts have been updated to match.

Figure 5. Filter by Month and Day-of-Week.

Let's now look at another dashboard, the NYC Tree Census 2015, as shown in Figure 6.

Figure 6. NYC Tree Census 2015.

We can see that there is a lot of information here. Let's filter by 6th Avenue and Poor tree health, as shown in Figure 7.

Figure 7. 6th Avenue and Poor Tree Health.

We can see that the records by tree species and tree diameter have been updated.

When working with streamed data, charts can auto-refresh, providing quick insights into the data and helping to identify trends as they emerge.

We have barely scratched the surface with these examples and we'll cover more examples in greater detail in future articles.

Next Steps

There are several ways to get started with OmniSci. The online documentation and tutorials are great places to start your journey, and community support is also available online.

In the next article, we'll look at how to install an open source version of OmniSciDB.

Summary

In many industries today, there are considerable competitive pressures, opportunities and threats. We may also be working with very large quantities of data, either at rest or streamed. Therefore, organizations need the ability to process data at scale and with high performance to deliver timely business insights.

OmniSci is a standards-based solution that provides users, such as Business Decision Makers, Data Scientists and Developers, with the tools to perform analytics and create visualizations to meet today's challenges.

In this article, we have taken a quick look at the main components of the OmniSci Platform. OmniSci can also work with a wide range of open source and commercial technologies for data ingestion and for further processing and analysis, such as Machine Learning frameworks. In future articles in this beginner's series, we'll explore some of these capabilities and open source third-party integrations in greater detail. Until next time!

Acknowledgements

My sincere thanks to my colleagues at OmniSci for reviewing earlier drafts of this article and for helping me understand the OmniSci Platform.

Topics:
omniscidb, analytics, big data, bi, tutorial

Opinions expressed by DZone contributors are their own.
