
Scala vs. Java for Big Data Engineering


Which language is better for your big data needs?


Hadoop is mostly written in Java. Spark is mostly written in Scala.

The default language for Apache Spark programming is Scala, and it can't be argued that Scala is the easiest and cleanest language for implementing Spark programs. The Spark shell is a Scala REPL, and it is awesome because of Scala. Take a look at my learning tutorial on Spark actions and transformations.

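// kvRDD is assumed to be a pair RDD with String values; values sharing a key are concatenated
val reducedByRDD = kvRDD.reduceByKey((a, b) => a.concat(b))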

After 15 years of coding in Java, this line of code disgusts me:

PairFunction<Tuple2<Integer, Optional<String>>, Integer, String> KEY_VALUE_PAIRER =
    new PairFunction<Tuple2<Integer, Optional<String>>, Integer, String>() 
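For contrast, here is a small, Spark-free sketch of how a Java 8 lambda trims this kind of declaration. It uses java.util.function.Function, java.util.Optional, and AbstractMap.SimpleEntry as stand-ins for Spark's PairFunction, Optional, and Tuple2, so it shows the syntax only, not the Spark API. The class name PairerDemo and the choice to default an empty value to "" are assumptions made for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class PairerDemo {
    // Stand-ins (assumption): Function replaces Spark's PairFunction,
    // SimpleEntry replaces Tuple2, java.util.Optional replaces Spark's Optional.

    // Pre-Java 8 style: the anonymous inner class repeats every type parameter.
    static final Function<Map.Entry<Integer, Optional<String>>, Map.Entry<Integer, String>> ANON_PAIRER =
        new Function<Map.Entry<Integer, Optional<String>>, Map.Entry<Integer, String>>() {
            @Override
            public Map.Entry<Integer, String> apply(Map.Entry<Integer, Optional<String>> e) {
                // Unwrap the Optional, defaulting to "" when it is empty (assumed policy)
                return new SimpleEntry<>(e.getKey(), e.getValue().orElse(""));
            }
        };

    // Java 8 style: the lambda's parameter type is inferred from the target type.
    static final Function<Map.Entry<Integer, Optional<String>>, Map.Entry<Integer, String>> LAMBDA_PAIRER =
        e -> new SimpleEntry<>(e.getKey(), e.getValue().orElse(""));

    public static void main(String[] args) {
        Map.Entry<Integer, Optional<String>> present = new SimpleEntry<>(1, Optional.of("spark"));
        Map.Entry<Integer, Optional<String>> missing = new SimpleEntry<>(2, Optional.empty());
        System.out.println(ANON_PAIRER.apply(present));   // prints 1=spark
        System.out.println(LAMBDA_PAIRER.apply(missing)); // prints 2=
    }
}
```

The lambda still needs the full generic type on the left-hand side, but the repeated signature and the method-body ceremony of the anonymous class disappear.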

Java 8 and Spring Boot are working on reducing boilerplate and extraneous code, but most of the Java you face is not going to be ultra clean. AOL's cyclops-react is helping, and Javaslang is neat. Still, Java tolerates a lot of code soup.
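Java 8 does narrow the gap. As a rough, Spark-free sketch (plain java.util.stream on a local list, not Spark's Java API), the merge performed by the Scala reduceByKey line can be written with Collectors.toMap and String::concat. The class name ReduceByKeyDemo, the pair helper, and the sample data are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ReduceByKeyDemo {
    // Hypothetical helper: builds a (key, value) pair.
    static Map.Entry<String, String> pair(String key, String value) {
        return new SimpleEntry<>(key, value);
    }

    // Local, single-JVM analogue of kvRDD.reduceByKey((a, b) => a.concat(b)):
    // values that share a key are merged with String::concat.
    static Map<String, String> reduceByKey(List<Map.Entry<String, String>> pairs) {
        return pairs.stream()
            .collect(Collectors.toMap(
                Map.Entry::getKey,    // key of each pair
                Map.Entry::getValue,  // value of each pair
                String::concat,       // merge function applied when a key repeats
                TreeMap::new));       // sorted map so the printed output is stable
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> pairs = Arrays.asList(
            pair("java", "JVM"),
            pair("scala", "Spark"),
            pair("scala", "Akka"));
        System.out.println(reduceByKey(pairs)); // prints {java=JVM, scala=SparkAkka}
    }
}
```

On a sequential stream the values are merged in encounter order. Spark expects the function passed to reduceByKey to be associative and commutative and does not guarantee the order in which it combines values across partitions, so a non-commutative merge such as concatenation can produce a different ordering there.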

Spring XD is written in Java, builds on Spring Boot, and has a decent command-line interface.

On the Pro Java Side...

Apache NiFi, which originated at the NSA, is Java!

Apache Beam (which grew out of Google Cloud Dataflow) is Java-first, although a Scala DSL for Apache Beam also exists.

Google and AOL are using Java 8.

Apache Flink is implemented in Java and offers both Java and Scala APIs.

Most of the Hadoop ecosystem is written in Java.

Apache Crunch is Java.

CDAP is Java.

On the Pro Scala Side...

Apache Spark is written in Scala and the Shell is Scala.

Lightbend is doing fast data and data analytics in Scala. Akka is very powerful and is written in Scala. See this cool Scala/Spark class for more details.

Measured by the number of projects, developers, and companies, Java is way ahead of Scala in the numbers game.

Scala is ahead in clean, concise code. If you look at this presentation, you can see the power of Scala and Spark.

Both languages, and especially the JVM, will be part of big data for years to come. Can you pick the wrong language? No. If you are a Java programmer, make sure you use Java 8 with all the latest tools to improve its abilities. Make sure you use these guides:

