
Testing Spark Code


An introduction to testing your Apache Spark code with Scala.


There are many ways to test your Spark code. For Java, you can write basic JUnit tests for the non-Spark pieces; for Scala, ScalaTest covers the same ground. You can also run full integration tests by executing Spark locally or on a small test cluster.
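If you go the local route, a test can stand up its own SparkContext against a local master, run a small job, and assert on the result with ScalaTest. The sketch below assumes the Spark 1.6-era RDD API; the suite name and sample data are just illustrations.

import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FunSuite}

// Minimal local integration test: start a SparkContext inside the test JVM,
// run a tiny RDD job, and check the output with plain ScalaTest assertions.
class WordCountLocalSpec extends FunSuite with BeforeAndAfterAll {

  private var sc: SparkContext = _

  override def beforeAll(): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")          // run Spark locally with two threads
      .setAppName("word-count-test")
    sc = new SparkContext(conf)
  }

  override def afterAll(): Unit = {
    if (sc != null) sc.stop()         // always release the context
  }

  test("word count over a tiny in-memory dataset") {
    val counts = sc.parallelize(Seq("spark test", "spark"))
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collectAsMap()

    assert(counts("spark") === 2)
    assert(counts("test") === 1)
  }
}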

Another excellent choice is Holden Karau's spark-testing-base.

There are a few presentations and articles about using it.

Add it to your SBT build for Spark 1.6.0:

"com.holdenkarau" %% "spark-testing-base" % "1.6.0_0.3.1"

// Spark tests share JVM-wide state (one SparkContext per JVM), so run suites serially.
parallelExecution in Test := false

Check out the Wiki for usage details.
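With the dependency in place, the easiest starting point is the SharedSparkContext trait, which creates one SparkContext for the whole suite and tears it down afterwards. A small sketch, with an illustrative suite name and data:

import com.holdenkarau.spark.testing.SharedSparkContext
import org.scalatest.FunSuite

// SharedSparkContext supplies an `sc` field that is initialized once per suite,
// so individual tests don't pay the cost of starting their own SparkContext.
class TokenizerSpec extends FunSuite with SharedSparkContext {

  test("tokenize splits lines into words") {
    val lines = sc.parallelize(Seq("hi holden", "bye"))
    val words = lines.flatMap(_.split("\\s+")).collect().toSet
    assert(words === Set("hi", "holden", "bye"))
  }
}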

Use RDDComparisons to see if your RDD is as expected.
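For example, a comparison test might look like the sketch below. This assumes a spark-testing-base release where RDDComparisons is a mixin exposing compareRDD, which returns None when both RDDs contain the same elements regardless of order; older releases (including the 0.3.1 line above) expose similar checks with slightly different method names, so consult the wiki for the version you pin.

import com.holdenkarau.spark.testing.{RDDComparisons, SharedSparkContext}
import org.scalatest.FunSuite

// compareRDD returns None when the expected and result RDDs hold the same
// elements (ignoring order), and a description of the difference otherwise.
class RDDComparisonSpec extends FunSuite with SharedSparkContext with RDDComparisons {

  test("result RDD matches the expected RDD") {
    val expected = sc.parallelize(Seq(1, 2, 3))
    val result   = sc.parallelize(Seq(3, 2, 1))
    assert(None === compareRDD(expected, result))
  }
}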

There are also other testing resources for Apache Spark worth exploring.

There are many options; I suggest trying a few, and using spark-testing-base and ScalaTest at a minimum. Always run locally first, and try a subset of your data before moving to production. Develop test-driven and iteratively, just like any other program you write.
