Scala vs. Java for Big Data Engineering
Which language is better for your big data needs?
Hadoop is mostly written in Java. Spark is mostly written in Scala.
Scala is the default language for Apache Spark programming, and it is hard to deny that Scala is the easiest and cleanest language for implementing Spark programs. The Spark Shell is a Scala REPL and is awesome because of Scala. Take a look at my learning tutorial on Spark actions and transformations.
After 15 years of coding in Java, this line of code disgusts me:
PairFunction<Tuple2<Integer, Optional<String>>, Integer, String> KEY_VALUE_PAIRER = new PairFunction<Tuple2<Integer, Optional<String>>, Integer, String>()
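For comparison, here is a rough sketch of the same kind of declaration written once as an anonymous class and once as a Java 8 lambda. To keep it runnable without Spark on the classpath, it makes some assumptions: java.util.function.Function stands in for Spark's PairFunction, Map.Entry stands in for Tuple2, and java.util.Optional stands in for the Optional in the line above. The class name PairerSketch, the row helper, and the choice to map a missing value to an empty string are all hypothetical and only there for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map.Entry;
import java.util.Optional;
import java.util.function.Function;

public class PairerSketch {

    // Old style: anonymous inner class, with the generic types spelled out twice.
    static final Function<Entry<Integer, Optional<String>>, Entry<Integer, String>> VERBOSE_PAIRER =
        new Function<Entry<Integer, Optional<String>>, Entry<Integer, String>>() {
            @Override
            public Entry<Integer, String> apply(Entry<Integer, Optional<String>> in) {
                // Assumption: a missing value becomes an empty string.
                return new SimpleEntry<>(in.getKey(), in.getValue().orElse(""));
            }
        };

    // Java 8 style: the same logic as a lambda; the types appear only once.
    static final Function<Entry<Integer, Optional<String>>, Entry<Integer, String>> LAMBDA_PAIRER =
        in -> new SimpleEntry<>(in.getKey(), in.getValue().orElse(""));

    // Hypothetical helper that builds an input row; a null value means "absent".
    static Entry<Integer, Optional<String>> row(int key, String valueOrNull) {
        return new SimpleEntry<>(key, Optional.ofNullable(valueOrNull));
    }

    public static void main(String[] args) {
        System.out.println(VERBOSE_PAIRER.apply(row(1, "spark")));
        System.out.println(LAMBDA_PAIRER.apply(row(1, "spark")));
        System.out.println(LAMBDA_PAIRER.apply(row(2, null)));
    }
}
```

Both constants behave the same way; the lambda just drops the repeated type list and the method signature. The declared type on the left-hand side is still long, which is where Scala's type inference tends to help more.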
Java 8 and Spring Boot are working on reducing boilerplate and extraneous code, but most of the
Spring XD is Java with Spring Boot and has a decent command line interface.
On the Pro Java Side...
Apache NiFi, originally developed at the NSA, is written in Java.
Google and AOL are using Java 8.
Apache Flink is implemented mostly in Java and offers both Java and Scala APIs.
Most of the Hadoop ecosystem is written in Java.
Apache Crunch is Java.
CDAP is Java.
On the Pro Scala Side...
Apache Spark is written in Scala and the Shell is Scala.
Looking at the number of projects, developers, and companies, Java is way ahead of Scala in the numbers game.
Both languages, and especially the JVM, will be part of Big Data for years to come. Can you pick a wrong language? No. If you are a Java programmer, make sure you use Java 8 with all the latest tools to improve its abilities. Make sure you use these guides: