
First Impressions From Spark Summit 2016

A report on the overall trends from Spark Summit 2016 in Silicon Valley that just wrapped up.

There is no doubt that Spark is one of the hottest (pun intended) technologies available to data engineers today. And the buzz on the exhibitor floor at the Hilton San Francisco for this year’s Spark Summit did not disappoint. All the major players who supply software and hardware tools for next-generation app development were present and accounted for, along with some intriguing new entrants to the market. The companies on the badges of attendees floating by were heavily skewed toward the early-adopter market, but some surprisingly traditional names made an appearance as well. This tells me that Spark usage is quite possibly ready to cross Geoffrey Moore’s chasm, if it hasn’t already.

MapReduce and Hadoop are already 10 years old, and in those 10 years the proliferation of mobile devices has become the driving force behind most data scientists’ workloads. There is exponentially more data coming in at ever faster speeds, while consumers are growing accustomed to highly personalized experiences. Spark has become the standard for the faster, more flexible analytics that drive these applications. And with the cost of RAM dropping, technologies like Apache Ignite, with its easy integration with Spark, make for an easy one-two punch that moves analytics from batch to true real-time.

But why? If you think about it, Spark over HDFS creates a bottleneck at inception. Spark, with all its in-memory goodness, is still subject to getting data off impossibly slow disk. And once that memory-resident RDD is gone, you have to go back to turn-of-the-century spinning rust to get your answers. Why not architect your system from the get-go to take advantage of all that RAM has to offer? “But it is expensive and I have too much data!” Go check out some of the advancements from IBM and Intel. Within two years, you will wish you had an infrastructure architected for memory rather than disk.
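To make the bottleneck concrete, here is a minimal toy sketch in plain Python. It is not Spark or Ignite code: `DiskBackedSource` and `Dataset` are hypothetical stand-ins invented for illustration, and a read counter takes the place of real disk latency. It only shows the pattern described above: every answer costs a trip to slow storage unless the data is held in memory, and once the in-memory copy is dropped, the next query goes back to disk.

```python
class DiskBackedSource:
    """Hypothetical stand-in for slow storage; counts scans instead of sleeping."""

    def __init__(self, records):
        self._records = list(records)
        self.reads = 0  # number of full scans served from "disk"

    def scan(self):
        self.reads += 1
        return list(self._records)


class Dataset:
    """Toy lazily evaluated dataset, loosely modeled on the idea of an RDD."""

    def __init__(self, source, transform=lambda x: x):
        self._source = source
        self._transform = transform
        self._keep_in_memory = False
        self._cached = None

    def cache(self):
        # Ask for the computed rows to be kept in memory after the next collect().
        self._keep_in_memory = True
        return self

    def unpersist(self):
        # Drop the in-memory copy; later queries must rescan the source.
        self._keep_in_memory = False
        self._cached = None
        return self

    def collect(self):
        if self._cached is not None:
            return list(self._cached)  # served from memory, no disk scan
        rows = [self._transform(r) for r in self._source.scan()]
        if self._keep_in_memory:
            self._cached = rows
        return list(rows)


# Without caching, every query rescans the slow source.
cold_source = DiskBackedSource(range(5))
cold = Dataset(cold_source, lambda x: x * 2)
cold.collect()
cold.collect()
reads_without_cache = cold_source.reads

# With caching, only the first query touches the slow source.
warm_source = DiskBackedSource(range(5))
warm = Dataset(warm_source, lambda x: x * 2).cache()
warm.collect()
warm.collect()
reads_with_cache = warm_source.reads

# Once the in-memory copy is gone, the next query goes back to disk.
warm.unpersist()
result_after_unpersist = warm.collect()
reads_after_unpersist = warm_source.reads
```

In this toy, the uncached dataset scans the source once per query, while the cached one scans it once in total until its in-memory copy is dropped. An architecture built for memory from the start aims to avoid that fallback to disk altogether.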


Topics: spark, conference, big data

Published at DZone with permission of Rachel Pedreschi, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
