
Tools for Troubleshooting, Installation and Setup of Apache Spark Environments


DZone Zone Leader Tim Spann runs through a checklist for setting up Big Data applications with Apache Spark.


Let's run through some tools for installing, setting up, and troubleshooting a Big Data environment built around Apache Spark.

First, validate that you have connectivity and no firewall issues before you start. Conn Check is an awesome tool for that.
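If you want a quick manual check before (or alongside) Conn Check, netcat can probe the ports a standalone Spark cluster uses. A minimal sketch, assuming Spark's default ports and a placeholder hostname:

# Probe the standalone master port (7077) and the master web UI (8080); both are Spark defaults
nc -z -v spark-master.example.com 7077
nc -z -v spark-master.example.com 8080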

If you need to set up a number of servers at once, check out Sup.
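Sup runs commands over SSH against groups of hosts defined in a Supfile. The sketch below is illustrative only: the hosts and package name are placeholders, and the exact schema should be checked against Sup's documentation.

# Supfile (illustrative sketch; hosts are placeholders)
networks:
  cluster:
    hosts:
      - ubuntu@spark-node1.example.com
      - ubuntu@spark-node2.example.com

commands:
  prepare:
    desc: Install the JDK on every node
    run: sudo apt-get update && sudo apt-get install -y openjdk-8-jdk

With that in place, something like sup cluster prepare runs the command across every host in the network.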

Next, get version 1.8 of the JDK. Apache Spark works best with Scala, Java, and Python, so grab the version of Scala you need: Scala 2.10 is the standard version and is used for the precompiled downloads, while Scala 2.11 works but requires you to build the package yourself. You will need Apache Maven if you build Spark yourself. Install Python 2.6 or later for PySpark, and download SBT for building Scala jobs.
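After installing, quickly confirm that the versions on your PATH match what Spark expects:

java -version       # should report 1.8
scala -version      # 2.10.x for the precompiled Spark downloads
python --version    # 2.6 or later for PySpark
mvn -version        # only needed if you build Spark yourself
sbt sbtVersion      # prints the SBT version used by the project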

With everything installed, a very cool tool to work with Apache Spark is the new Apache Zeppelin. It's great for data exploration and data science experiments, so give it a try.
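Zeppelin ships with a daemon script and serves a web UI (port 8080 by default). Once it is running, a notebook paragraph against the built-in Spark interpreter is enough to confirm everything is wired up; the paragraph below is a minimal example:

# Start Zeppelin from its installation directory, then open http://localhost:8080
bin/zeppelin-daemon.sh start

%spark
// sc is provided by Zeppelin's Spark interpreter
println(sc.version)
sc.parallelize(1 to 1000).sum()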

An example SBT build file (build.sbt) for building a Spark job:

name := "Postgresql Project"

version := "1.0"

scalaVersion := "2.10.4"

// Spark core and Spark SQL, built for Scala 2.10 (%% appends the Scala binary version)
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.5.1"

// PostgreSQL JDBC driver
libraryDependencies += "org.postgresql" % "postgresql" % "9.4-1204-jdbc42"

// MongoDB Java driver and Stratio's Spark-MongoDB connector
libraryDependencies += "org.mongodb" % "mongo-java-driver" % "3.1.0"
libraryDependencies += "com.stratio.datasource" % "spark-mongodb_2.10" % "0.10.0"


An example of running a Spark Scala job (note that spark-submit options such as --driver-memory must come before the application JAR; anything after the JAR is passed to the application as its own arguments):

sudo /deploy/spark-1.5.1-bin-hadoop2.6/bin/spark-submit \
  --packages com.stratio:spark-mongodb-core:0.8.7 \
  --master spark://10.13.196.41:7077 \
  --class "PGApp" \
  --driver-class-path /deploy/postgresql-9.4-1204.jdbc42.jar \
  --driver-memory 1G \
  target/scala-2.10/postgresql-project_2.10-1.0.jar
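For context, here is a minimal sketch of what a PGApp main class might look like; it is not from the original article, and the JDBC URL, table name, and credentials are placeholders. It uses Spark 1.5's DataFrame JDBC reader, with the PostgreSQL driver supplied through --driver-class-path above.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object PGApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PGApp")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Load a PostgreSQL table as a DataFrame (URL, table, and credentials are placeholders)
    val df = sqlContext.read.format("jdbc").options(Map(
      "url"     -> "jdbc:postgresql://dbhost:5432/mydb?user=spark&password=secret",
      "dbtable" -> "public.mytable"
    )).load()

    df.printSchema()
    println(s"Loaded ${df.count()} rows from PostgreSQL")

    sc.stop()
  }
}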


