Apache Spark Updates
One of the most powerful new features of the Hortonworks 2.5 Sandbox is the ability to run two versions of Spark side by side in the same environment: a Generally Available (GA) Spark 1.6.2 and a Tech Preview (TP) of Spark 2.0. If you would like to learn how to run different versions of Spark, check out the A Lap Around Apache Spark tutorial.
NOTE: Zeppelin does not yet support Spark 2.0. This functionality will be coming soon.
Also, a new HBase connector has been added that allows you to ingest HBase datasets straight into a Spark DataFrame. To learn more, see the Spark on HBase tutorial.
Apache Zeppelin Updates
With HDP 2.5, Zeppelin gained notebook security and multi-user support. By enabling a Livy REST server and LDAP/AD user authentication, you can now control user access to individual notebooks according to each user’s role and needs. Livy also improves cluster utilization by recycling interpreters that have been inactive for 60 minutes.
Given Zeppelin’s general availability, enterprise readiness, flexibility (30+ interpreters), ease of use, and rich development community, it’s a great time to start exploring how you can leverage Zeppelin notebooks to accelerate data wrangling, analytics, and data science in your business. If you would like to give Zeppelin a try, check out the Learning Spark with Zeppelin tutorial.
A Monte Carlo Simulation with Spark and Zeppelin
What’s New in Spark 2.0
With Spark 2.0 TP now available, there are several updates that you should be aware of.
- DataFrame is now an alias for a Dataset of Row type or Dataset[Row] in Scala.
- SparkSession replaces SparkContext, SQLContext, and HiveContext. In other words, SparkSession is the new entry point to all Spark features.
- You can manipulate stream data via DataFrames and Datasets.
- Real-time incremental processing. Conceptually, it’s useful to think of a stream as an infinite DataFrame that keeps growing as new data arrives.
- Speedup from Tungsten Phase 2 whole-stage code generation.
- ORC and Parquet file format improvements.
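To make the "infinite DataFrame" idea a little more concrete, here is a minimal sketch in plain Python. It does not use the Spark API at all; the row shape, the `batch_count` function, the `IncrementalCount` class, and the sample micro-batches are all hypothetical names made up for illustration. The point is only the mental model: new rows are appended to an ever-growing table, and an incremental query folds in just the new rows while producing the same answer a batch query over the whole table would.

```python
from collections import Counter
from typing import Dict, Iterable, List

Row = Dict[str, str]  # hypothetical row shape: {"word": ...}


def batch_count(table: Iterable[Row]) -> Counter:
    """Batch view: scan the whole table and count rows per word."""
    return Counter(row["word"] for row in table)


class IncrementalCount:
    """Streaming view: keep running state and fold in only the new rows."""

    def __init__(self) -> None:
        self.state: Counter = Counter()

    def update(self, new_rows: Iterable[Row]) -> Counter:
        self.state.update(row["word"] for row in new_rows)
        return self.state


# Hypothetical micro-batches arriving over time.
micro_batches: List[List[Row]] = [
    [{"word": "spark"}, {"word": "hive"}],
    [{"word": "spark"}],
    [{"word": "zeppelin"}, {"word": "spark"}],
]

table_so_far: List[Row] = []  # the "infinite DataFrame", as far as it has grown
incremental = IncrementalCount()

for rows in micro_batches:
    table_so_far.extend(rows)          # new rows are appended to the table
    result = incremental.update(rows)  # only the new rows are processed
    # The incremental result matches a full batch query over the table so far.
    assert result == batch_count(table_so_far)

print(dict(incremental.state))  # {'spark': 3, 'hive': 1, 'zeppelin': 1}
```

In Spark itself the engine manages this running state for you; the sketch only shows why a query written against a table can keep producing correct results as the table grows.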
Stay tuned for more blogs, with more details, on each of these topics.
Get Started in 4 Steps
- Download the HDP Sandbox as a VM image (VMware, VirtualBox, or Docker).
- Set up and start the VM image.
- Try a Sandbox tutorial, check out the list of free tutorials, or jump directly into a Learning Spark with Zeppelin hands-on tutorial.
- Need more help? Visit the Hortonworks Community Connection (HCC) and interact directly with the community and our development team.
If you want a more guided introduction, view the following Hadoop Summit Crash Courses:
Try Hortonworks Cloud
Don’t have the minimum 8 GB of RAM to allocate to the virtual machine? Looking to try the latest in Hive and Spark on AWS? Try the Hortonworks Cloud Technical Preview, which supports ephemeral workloads for Hive and Spark.