Analytics System Provides Fast Access to Data Science
The new IBM Integrated Analytics System features IBM Data Science Experience and embedded Apache Spark to give users high-performance data science across the cloud.
Thanks to Rob Thomas, General Manager, IBM Analytics, for sharing IBM's strategy for helping clients get to the public cloud with three different platforms:
Hybrid data management
Data science platform
The Integrated Analytics System is a new unified data system designed to give users fast, easy access to advanced data science capabilities and the ability to work with their data across private, public, or hybrid cloud environments.
The system, which comes with a variety of data science tools and built-in encryption, allows data scientists to get up and running quickly and to develop and deploy advanced analytics models in place, directly where the data resides, for greater performance. Because it is based on the IBM common SQL engine, clients can use the system to move workloads to the public cloud and begin automating their businesses with machine learning. The same database engine is used across both hosted and cloud-based databases, so users can move and query data across multiple data stores, such as Db2 Warehouse on Cloud or Hortonworks Data Platform.
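The portability argument can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for two separate data stores; it is not the IBM common SQL engine, and the table, column, and store names are hypothetical. The only point is that when two stores accept the same SQL, one query text can run unchanged against either of them.

```python
import sqlite3

# Hypothetical query; with a shared SQL dialect it needs no per-store rewrite.
REVENUE_BY_REGION = """
    SELECT region, SUM(amount) AS revenue
    FROM sales
    GROUP BY region
    ORDER BY region
"""


def make_store(rows):
    """Create an in-memory database standing in for one data store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def revenue_by_region(conn):
    """Run the shared query against whichever store is passed in."""
    return conn.execute(REVENUE_BY_REGION).fetchall()


# Two stand-in stores: one playing the on-premises system, one the cloud warehouse.
on_prem = make_store([("east", 100.0), ("west", 50.0), ("east", 25.0)])
cloud = make_store([("east", 10.0), ("north", 5.0)])

on_prem_result = revenue_by_region(on_prem)
cloud_result = revenue_by_region(cloud)
```

In this sketch, moving a workload means changing only the connection that is passed in, while the query itself stays the same.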
At the heart of the Integrated Analytics System are the IBM Data Science Experience, Apache Spark, and the Db2 Warehouse — all of which have been optimized to work together with straightforward management. The Data Science Experience provides a set of critical data science tools and a collaborative workspace through which data scientists can create new analytic models that developers can use to build intelligent applications quickly and easily. The inclusion of Apache Spark, the popular open-source framework, enables in-memory data processing, which speeds analytic applications by allowing analytics to be processed directly where the data resides.
New to this class of offerings are the machine learning capabilities that come with both the Data Science Experience and Spark embedded on the system. Because machine learning processing is embedded, data does not need to be moved to the analytics processing, which cuts the number of steps and the time spent waiting for analytics to run and respond. This significantly simplifies training, evaluating, testing, and deploying predictive models, since all of it is done in place.
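As a rough illustration of training a model where the data lives, the sketch below fits a straight line by ordinary least squares using only SQL aggregates, so the individual rows never leave the database and only five summary numbers are returned. It assumes Python's built-in sqlite3 module as a stand-in database and a hypothetical readings table; it does not use Spark, the Data Science Experience, or any IBM API, and a simple linear fit stands in for a real predictive model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (x REAL, y REAL)")
# Hypothetical sample data that lies exactly on y = 2x + 1.
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
)


def fit_line_in_database(conn):
    """Fit y = slope * x + intercept from aggregates computed inside the database."""
    n, sx, sy, sxy, sxx = conn.execute(
        "SELECT COUNT(*), SUM(x), SUM(y), SUM(x * y), SUM(x * x) FROM readings"
    ).fetchone()
    # Closed-form ordinary least squares for a single feature.
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


slope, intercept = fit_line_in_database(conn)
```

The alternative, fetching every row into the client and fitting there, gives the same answer but moves the whole table first, which is the step that in-place processing is meant to avoid.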
“The combination of high performance and advanced analytics, from the Data Science Experience to the open Spark platform, gives our business analysts the ability to conduct intense data investigations with ease and speed,” said Vitaly Tsivin, Executive Vice President at AMC Networks, who has been testing the system for several months. “The Integrated Analytics System is positioned as an integral component of an enterprise data architecture solution, connecting IBM Netezza Data Warehouse and IBM PureData System for Analytics, cloud-based Db2 Warehouse on Cloud clusters, and other data sources.”
“This is a continuation of our aggressive strategy to make data science and machine learning more accessible than ever before and to help organizations like AMC begin harvesting their massive data volumes across infrastructures for insight and intelligence.”
Seamless Expansion to the Cloud
The integrated architecture of the new system combines software enhancements such as asymmetric massively parallel processing (AMPP) with IBM Power® technology and flash memory storage hardware, and it builds on the IBM PureData System for Analytics and the earlier IBM Netezza data warehouse offerings. It also supports a wide range of data types and data platforms, from IBM Db2 Warehouse on Cloud to Hadoop and IBM Big SQL. Like these solutions, the Integrated Analytics System is built with the IBM common SQL engine, enabling users to seamlessly integrate the unit with cloud-based warehouse solutions.
In addition, industry-standard tools and the common SQL engine provide users with an option to also move these workloads seamlessly to public or private cloud environments with Spark clusters, based on the user’s requirements.
The Integrated Analytics System provides built-in data virtualization and high levels of language compatibility for SQL, stored procedures, and UDX for Netezza®, Oracle®, Db2®, and PostgreSQL. The system also includes the IBM PureData for Analytics SQL Extension Toolkit and compatibility for the most popular IBM PureData for Analytics.
In addition to these capabilities, the new system incorporates hybrid transactional analytical processing (HTAP). In contrast to typical business environments, where transaction processing and analytics run on distinct architectures, HTAP runs predictive analytics over transactional and historical data on the same database without any performance degradation. And because the system supports HTAP with IBM Db2 Analytics Accelerator for z/OS, the new unit seamlessly integrates with IBM z Systems infrastructures.
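The HTAP idea, one database serving both transactional writes and analytical reads, can be sketched in a few lines. This is a toy illustration that assumes Python's built-in sqlite3 module and a hypothetical orders table; SQLite is not an HTAP engine, and the sketch says nothing about performance. It only shows the shape of the workflow: an analytical query sees each committed transaction immediately, with no export or load step into a separate analytics system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)


def place_order(conn, customer, amount):
    """Transactional write: committed on success, rolled back on error."""
    with conn:
        conn.execute(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)",
            (customer, amount),
        )


def spend_per_customer(conn):
    """Analytical read over the same table the transactions write to."""
    return conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()


place_order(conn, "acme", 120.0)
place_order(conn, "globex", 80.0)
before = spend_per_customer(conn)

# A new transaction is visible to the next analytical query right away.
place_order(conn, "acme", 30.0)
after = spend_per_customer(conn)
```

In a split architecture, the second query would not reflect the new order until the next batch load into the analytics store.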
Rob sees the future of big data and data science as greater automation of non-value-added manual tasks, freeing up the data steward to provide more value by learning Python and R and developing higher-value models.