The Apache Software Foundation announced today the release of Apache Spark v1.0, an open-source engine for large-scale data processing and advanced analytics. Spark lets developers write applications in Java, Scala, or Python.
The 1.0 release marks a step toward greater stability and broader community involvement.
According to the press release, Spark's flexibility in large-scale data processing has earned it the nickname "the Swiss Army knife of Hadoop." Chief among its strengths is speed: Spark can process data 10 to 100 times faster than Hadoop's MapReduce.
Apache Spark is well-suited for machine learning, interactive queries, and stream processing. It is fully compatible with the Hadoop Distributed File System (HDFS), HBase, Cassandra, and other Hadoop-supported storage systems, so existing data is immediately usable in Spark. In addition, Spark supports SQL queries, streaming data, and complex analytics such as machine learning and graph algorithms out of the box.
New in v1.0, Apache Spark offers strong API stability guarantees (backward-compatibility throughout the 1.X series), a new Spark SQL component for accessing structured data, as well as richer integration with other Apache projects (Hadoop YARN, Hive, and Mesos).
In a blog post at Cloudera, Sean Owen writes, "Spark has a number of features that make it a compelling crossover platform for investigative as well as operational analytics." It will be interesting to see how data scientists and other users integrate Spark into their workflows.