Tools for Troubleshooting, Installation and Setup of Apache Spark Environments
DZone Zone Leader Tim Spann runs through a checklist for setting up Big Data applications with Apache Spark.
Let's run through some tools for installing, setting up, and troubleshooting a Big Data environment in Apache Spark.
Before you start, validate that your servers can reach each other and that no firewall is blocking the ports you need. Conn Check is an awesome tool for that.
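If you just want a quick sanity check before reaching for a dedicated tool, a minimal sketch along these lines tries to open a TCP connection to each host and port you care about. The object name `PortCheck`, the `canConnect` helper, and the two-second default timeout are my own choices for illustration and are not part of Conn Check.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

object PortCheck {
  /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
  def canConnect(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      // Covers refused connections, timeouts, and unknown hosts.
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Each argument is expected to be a host:port pair.
    args.foreach { target =>
      val idx = target.lastIndexOf(':')
      require(idx > 0, s"expected host:port but got '$target'")
      val host = target.substring(0, idx)
      val port = target.substring(idx + 1).toInt
      val status = if (canConnect(host, port)) "reachable" else "NOT reachable"
      println(s"$host:$port is $status")
    }
  }
}
```

You would pass it the addresses your cluster depends on, for example your Spark master's host and port. A failed connection only tells you that something is in the way, so you still need to check whether it is a firewall rule or a service that is not running.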
If you need to set up a number of servers at once, check out Sup.
Next, get version 1.8 of the JDK. Apache Spark works best with Scala, Java, and Python, so get the version of Scala you need. Scala 2.10 is the standard version and is used for the precompiled downloads. You can use Scala 2.11, but you will need to build the package yourself, which requires Apache Maven. Install Python 2.6 or later for PySpark. Also download SBT for building Scala projects.
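To confirm that the JDK and Scala you ended up with are the ones you meant to install, a small sketch like the following can help. The object name `VersionCheck` and the `matches` helper are hypothetical, and the expected values (1.8 and 2.10) are simply the versions mentioned above, so adjust them if you built Spark for Scala 2.11.

```scala
import scala.util.Properties

object VersionCheck {
  /** True when `actual` is exactly `wanted` or a more specific release of it. */
  def matches(actual: String, wanted: String): Boolean =
    actual == wanted || actual.startsWith(wanted + ".")

  def main(args: Array[String]): Unit = {
    // Versions of the JVM and Scala library this program is running on.
    val jdkVersion = Properties.javaVersion
    val scalaVersion = Properties.versionNumberString

    val jdkOk = matches(jdkVersion, "1.8")
    val scalaOk = matches(scalaVersion, "2.10")

    println(s"JDK $jdkVersion: ${if (jdkOk) "ok" else "expected 1.8.x"}")
    println(s"Scala $scalaVersion: ${if (scalaOk) "ok" else "expected 2.10.x"}")
  }
}
```

Run it on each machine with the same Scala installation you plan to use for your jobs, since a mismatch between nodes is an easy thing to miss.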
Once everything is installed, a very cool tool to work with Apache Spark is the new Apache Zeppelin. It is great for data exploration and data science experiments, so give it a try.
An example SBT build file for a Spark job:
name := "Postgresql Project"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.5.1"
libraryDependencies += "org.postgresql" % "postgresql" % "9.4-1204-jdbc42"
libraryDependencies += "org.mongodb" % "mongo-java-driver" % "3.1.0"
libraryDependencies += "com.stratio.datasource" % "spark-mongodb_2.10" % "0.10.0"
An example of running a Spark Scala Job:
sudo /deploy/spark-1.5.1-bin-hadoop2.6/bin/spark-submit --packages com.stratio:spark-mongodb-core:0.8.7 --master spark://10.13.196.41:7077 --class "PGApp" --driver-class-path /deploy/postgresql-9.4-1204.jdbc42.jar --driver-memory 1G target/scala-2.10/postgresql-project_2.10-1.0.jar
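The command above names a main class, PGApp, without showing it. As a rough sketch of what such a class could look like with the Spark 1.5.1 APIs from the build file, here is a job that reads a PostgreSQL table through Spark's built-in JDBC data source. Everything specific in it is an assumption on my part: the connection URL, the credentials, the table name `public.people`, and the `jdbcOptions` helper are placeholders, and the real PGApp may do something quite different.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object PGApp {
  /** Builds the option map for Spark's built-in JDBC data source. */
  def jdbcOptions(url: String, table: String): Map[String, String] =
    Map(
      "url" -> url,
      "dbtable" -> table,
      "driver" -> "org.postgresql.Driver"
    )

  def main(args: Array[String]): Unit = {
    // The master URL is supplied by spark-submit, so it is not set here.
    val conf = new SparkConf().setAppName("PGApp")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Hypothetical connection details: replace with your own database.
    val url = "jdbc:postgresql://localhost:5432/mydb?user=spark&password=secret"

    // Load the table as a DataFrame and take a quick look at it.
    val df = sqlContext.read
      .format("jdbc")
      .options(jdbcOptions(url, "public.people"))
      .load()

    df.printSchema()
    df.show(10)

    sc.stop()
  }
}
```

Keeping the option map in its own small function makes it easy to check the connection settings without starting a Spark context.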
Items to add to your Spark toolbox:
Security
http://mig.mozilla.org/
Machine Learning
http://systemml.apache.org/
Published at DZone with permission of Tim Spann, DZone MVB. See the original article here.