Apache Spark v1.0 Solidifies Its Place Among Big Data Tools

The Apache Software Foundation announced today the release of Apache Spark v1.0, an open source engine for large-scale data processing and advanced analytics. Spark allows developers to write applications in Java, Scala, or Python.

The release of version 1.0 signifies a step forward towards greater stability and community involvement.

According to the press release, Spark's flexibility in large-scale data processing has earned it the nickname "Hadoop Swiss Army Knife." Chief among its advantages is speed: the release says Spark processes data 10 to 100 times faster than Hadoop's MapReduce. Additionally, the release states:

Apache Spark is well-suited for machine learning, interactive queries, and stream processing. It is 100% compatible with Hadoop's Distributed File System (HDFS), HBase, Cassandra, as well as any Hadoop storage system, making existing data immediately usable in Spark. In addition, Spark supports SQL queries, streaming data, and complex analytics such as machine learning and graph algorithms out-of-the-box.

New in v1.0, Apache Spark offers strong API stability guarantees (backward-compatibility throughout the 1.X series), a new Spark SQL component for accessing structured data, as well as richer integration with other Apache projects (Hadoop YARN, Hive, and Mesos).

In a blog post at Cloudera, Sean Owen writes, "Spark has a number of features that make it a compelling crossover platform for investigative as well as operational analytics." It will be interesting to see how data scientists and other users integrate Spark into their workflows.

For more information, check out the Spark website and the Apache Software Foundation's official release announcement.
