Testing Spark Code
An introduction to testing your Apache Spark code with Scala.
There are many ways to test your Spark code. If it's Java, you can write basic JUnit tests for the non-Spark pieces; for Scala code, you can use ScalaTest. You can also run full integration tests by running Spark locally or on a small test cluster.
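Plain unit tests go furthest when the per-record logic lives in ordinary functions that Spark transformations merely call, so that logic can be tested without a SparkContext. The sketch below assumes a word-count style job; the names `WordCountLogic`, `tokenize`, and `countWords` are hypothetical and not part of any library.

```scala
// Hypothetical example: pure functions that a Spark job could call,
// e.g. lines.flatMap(WordCountLogic.tokenize). Nothing here needs Spark,
// so it can be tested with plain assertions, JUnit, or ScalaTest.
object WordCountLogic {
  // Split a line into lowercase words, dropping empty tokens.
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("\\W+").toSeq.filter(_.nonEmpty)

  // Local equivalent of flatMap(tokenize), then map((_, 1)), then reduceByKey(_ + _).
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(tokenize)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }
}
```

With this split, the Spark-specific wiring stays thin, and it can be covered by a smaller number of local integration tests.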
Another great option is Holden Karau's spark-testing-base.
There are a few presentations and articles about doing so:
To use it with Spark 1.6.0, add the following to your SBT build:
libraryDependencies += "com.holdenkarau" %% "spark-testing-base" % "1.6.0_0.3.1"
parallelExecution in Test := false  // run suites sequentially: only one SparkContext per JVM
Check out the Wiki for usage details.
Use RDDComparisons to check whether your RDD contains the expected results.
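Many RDD operations do not guarantee element order, so comparing results as ordered lists can fail spuriously. To illustrate the idea behind an order-insensitive comparison without a cluster, here is a plain-Scala sketch over local collections: it counts how often each element appears on each side and reports an element whose counts differ. This is not spark-testing-base's implementation or API; `UnorderedCompare` and `findMismatch` are hypothetical names.

```scala
// Hypothetical sketch: an order-insensitive (multiset) comparison of two
// local collections, similar in spirit to comparing RDDs without requiring order.
object UnorderedCompare {
  // Returns None if both sides hold the same elements with the same
  // multiplicities; otherwise Some((element, expectedCount, actualCount))
  // for one element whose counts differ.
  def findMismatch[T](expected: Seq[T], actual: Seq[T]): Option[(T, Int, Int)] = {
    val expectedCounts = expected.groupBy(identity).map { case (k, v) => k -> v.size }
    val actualCounts = actual.groupBy(identity).map { case (k, v) => k -> v.size }
    (expectedCounts.keySet ++ actualCounts.keySet).iterator
      .map(k => (k, expectedCounts.getOrElse(k, 0), actualCounts.getOrElse(k, 0)))
      .find { case (_, e, a) => e != a }
  }
}
```

With spark-testing-base itself you would pass the two RDDs to RDDComparisons instead; check the wiki for the exact method names in your version, since they may differ between releases.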
Some other testing resources for Apache Spark:
There are many options; I suggest trying a few, and at a minimum using spark-testing-base and ScalaTest. Always run locally first and test with a subset of your data before moving to production. Develop test-driven and iteratively, just as you would with any other program.
Opinions expressed by DZone contributors are their own.