The Direction of Cloud Computing Storage and Analysis

Cloud storage currently lags behind other cloud services, but cloud analysts expect service providers to keep getting better at delivering technologies that reduce latency and overhead, bringing cloud storage into the mainstream. Cloud-based analysis of large data stores has also been tricky to manage, with only a few proven solutions such as Apache Hadoop. Cloud vendors like Appistry are finding new distributed file system approaches that eliminate performance bottlenecks by letting analytics run wherever the data is stored.

Data analysis is important for customer behavior tracking, telecom patterns, and financial management. Business intelligence in particular relies on highly structured data that is usually transactional. Considerable processing overhead is required just to decide what should be analyzed, so this approach isn't helpful for exploring trends or patterns.


To unlock the patterns in large data sets, exploratory analytics must be able to bypass the bottlenecks in accessing and storing that data. Appistry's CloudIQ platform analytics, along with the company's new CloudIQ Storage, will let analytics application workloads run in the location where the data is stored, providing faster and cheaper analysis. This differs from the traditional approach, in which data has to be moved to the application. Appistry calls its type of distributed file system "Computational Storage." The CloudIQ model unifies applications and data by storing data across commodity servers and intelligently locating application processing on the machines containing the relevant data.
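The idea of running work where the data lives can be illustrated with a small scheduling sketch. This is not Appistry's implementation, which the article does not describe in detail; the function name `schedule_near_data`, the block names, and the node names are all hypothetical. It only shows the general principle: each task is placed on a machine that already stores the data it needs, so nothing has to be copied to the application first.

```python
from collections import defaultdict


def schedule_near_data(block_locations, node_load=None):
    """Assign each block's task to the least-loaded node that already stores it.

    block_locations maps a block name to the list of nodes holding a replica.
    This is an illustrative assumption, not a real CloudIQ or Hadoop API.
    """
    load = defaultdict(int, node_load or {})
    assignment = {}
    for block in sorted(block_locations):
        replicas = block_locations[block]
        if not replicas:
            raise ValueError(f"block {block!r} has no replicas")
        # Pick the replica holder with the fewest tasks so far;
        # ties are broken by node name to keep the result deterministic.
        node = min(replicas, key=lambda n: (load[n], n))
        assignment[block] = node
        load[node] += 1
    return assignment


# Hypothetical cluster layout: four blocks, each replicated on two nodes.
blocks = {
    "part-0": ["node-a", "node-b"],
    "part-1": ["node-b", "node-c"],
    "part-2": ["node-a", "node-c"],
    "part-3": ["node-a", "node-b"],
}
plan = schedule_near_data(blocks)
for block, node in plan.items():
    print(f"{block} -> {node}")
```

Every task in the resulting plan runs on a node that holds a local copy of its block, which is the property the "move the application to the data" model relies on.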

Like many cloud vendors, Appistry also recognizes the power of the Hadoop MapReduce framework. Hadoop is used by vendors such as IBM, Teradata, Sybase, and Cloudera, the last of which offers an open source enterprise Hadoop distribution. Appistry's Hadoop Edition of CloudIQ Storage has plug-and-play compatibility with the Hadoop Distributed File System (HDFS) and includes HDFS drivers, so it can be deployed in place of HDFS for applications that are focused on reliability and throughput. The Hadoop Edition of CloudIQ Storage also avoids the single point of failure in standard HDFS because it is not built around a single metadata repository, which HDFS calls the NameNode.

File-based data is the lifeblood of many enterprises today, and the falling costs of distributed data storage solutions could eventually convert many companies to cloud storage and analytics. Some cloud vendors will surely follow in Appistry's footsteps by bringing application workloads to the data on which they work. This approach lets the analytics access the data at SATA bus speeds, and because processing and data access happen across many machines in parallel, each additional machine adds to aggregate storage bandwidth instead of further subdividing it.
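The bandwidth argument can be made concrete with some back-of-the-envelope arithmetic. The figures below are hypothetical round numbers chosen for illustration (100 MB/s per local disk, 1000 MB/s for a shared storage link), not measurements of any product. With local access, total bandwidth grows with the number of machines; with a shared link, the same fixed bandwidth is split among them.

```python
def aggregate_bandwidth_local(machines, per_machine_mb_s):
    """Total read bandwidth when every machine reads from its own local disk."""
    return machines * per_machine_mb_s


def per_machine_share_shared(machines, shared_link_mb_s):
    """Bandwidth each machine gets when all of them pull from one shared link."""
    return shared_link_mb_s / machines


# Hypothetical figures, for illustration only.
LOCAL_DISK_MB_S = 100
SHARED_LINK_MB_S = 1000

for n in (1, 4, 16):
    total_local = aggregate_bandwidth_local(n, LOCAL_DISK_MB_S)
    share = per_machine_share_shared(n, SHARED_LINK_MB_S)
    print(f"{n:>2} machines: local aggregate {total_local} MB/s, "
          f"shared-link share {share:.1f} MB/s per machine")
```

Under these assumptions, going from 4 to 16 machines quadruples the aggregate bandwidth in the local model, while in the shared model it cuts each machine's share to a quarter.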

Opinions expressed by DZone contributors are their own.
