
Scala vs. Java for Big Data Engineering

Which language is better for your big data needs?

Hadoop is mostly written in Java. Spark is mostly written in Scala.

Scala is the default language for Apache Spark programming, and it is hard to argue that any other language makes Spark programs easier or cleaner to write. The Spark shell is a Scala REPL, and it is awesome because of Scala. Take a look at my learning tutorial on Spark actions and transformations.


// Merge all String values that share a key by concatenating them
val reducedByRDD = kvRDD.reduceByKey((a, b) => a.concat(b))
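
For readers coming from the Java side, here is a minimal plain Java 8 sketch of what that `reduceByKey` line computes: all values that share a key are combined pairwise with the supplied function, here string concatenation. It does not use Spark at all. The class `ReduceByKeySketch`, its `reduceByKey` helper, and the sample data are hypothetical, and the sketch only models the semantics on one machine, not Spark's distributed, per-partition execution.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

public class ReduceByKeySketch {

    // Hypothetical single-machine stand-in for Spark's reduceByKey:
    // values with the same key are merged with `combine`.
    public static <K, V> Map<K, V> reduceByKey(List<Map.Entry<K, V>> pairs, BinaryOperator<V> combine) {
        return pairs.stream()
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        combine,              // called whenever a key is seen again
                        LinkedHashMap::new)); // insertion order, only to make printed output deterministic
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> kv = Arrays.asList(
                new SimpleEntry<>(1, "a"),
                new SimpleEntry<>(2, "x"),
                new SimpleEntry<>(1, "b"),
                new SimpleEntry<>(1, "c"));

        // Same function as the Scala line: (a, b) => a.concat(b)
        Map<Integer, String> reduced = reduceByKey(kv, (a, b) -> a.concat(b));
        System.out.println(reduced); // {1=abc, 2=x}
    }
}
```

Because this sketch runs a sequential stream, the values for key 1 are concatenated in input order. Spark documents that the function passed to `reduceByKey` should be associative and commutative; concatenation is not commutative, so in a real RDD the order of the concatenated pieces is not guaranteed.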


After 15 years of coding in Java, this line of code disgusts me:

PairFunction<Tuple2<Integer, Optional<String>>, Integer, String> KEY_VALUE_PAIRER =
    new PairFunction<Tuple2<Integer, Optional<String>>, Integer, String>() {
        // ... call(Tuple2<Integer, Optional<String>>) body omitted ...
    };


Java 8 and Spring Boot are working on reducing boilerplate and extraneous code, but most of the Java you face is not going to be ultra clean. AOL's cyclops-react is helping and Javaslang is neat, but Java still allows a lot of code soup.
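
To make the boilerplate point concrete without pulling in Spark, here is a minimal sketch that uses `java.util.function.Function`, `java.util.Optional`, and `Map.Entry` as stand-ins for Spark's `PairFunction`, its `Optional` type, and `Tuple2` (an assumption for illustration; the real Spark types differ). The same key-value transformation is written once as a pre-Java 8 anonymous class and once as a Java 8 lambda. The class name `BoilerplateSketch` and the field names are hypothetical, and mapping a missing value to an empty string is a choice made only for this sketch.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class BoilerplateSketch {

    // Pre-Java 8 style: an anonymous inner class, with the generic types spelled out twice.
    public static final Function<Map.Entry<Integer, Optional<String>>, Map.Entry<Integer, String>> ANON_PAIRER =
            new Function<Map.Entry<Integer, Optional<String>>, Map.Entry<Integer, String>>() {
                @Override
                public Map.Entry<Integer, String> apply(Map.Entry<Integer, Optional<String>> e) {
                    return new SimpleEntry<Integer, String>(e.getKey(), e.getValue().orElse(""));
                }
            };

    // Java 8 style: the same function as a lambda; the parameter type is inferred.
    public static final Function<Map.Entry<Integer, Optional<String>>, Map.Entry<Integer, String>> LAMBDA_PAIRER =
            e -> new SimpleEntry<>(e.getKey(), e.getValue().orElse(""));

    public static void main(String[] args) {
        Map.Entry<Integer, Optional<String>> in = new SimpleEntry<>(7, Optional.of("seven"));
        System.out.println(ANON_PAIRER.apply(in));   // 7=seven
        System.out.println(LAMBDA_PAIRER.apply(in)); // 7=seven
    }
}
```

The lambda carries the same logic in one line; most of the remaining verbosity sits in the declared generic type.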

Spring XD is Java built on Spring Boot and has a decent command-line interface.


On the Pro Java Side...

Apache NiFi, originally from the NSA, is Java!

Apache Beam (Google Cloud Dataflow) is Java-first, though a Scala DSL for Apache Beam also exists.

Google and AOL are using Java 8.

Apache Flink is implemented in Java and offers APIs for both Java and Scala.

Most of the Hadoop ecosystem is written in Java.

Apache Crunch is Java.

CDAP is Java.


On the Pro Scala Side...

Apache Spark is written in Scala and the Shell is Scala.

Lightbend is doing fast data and data analytics in Scala. Akka is very powerful and written in Scala. See this cool Scala/Spark class for more details.

Looking at the number of projects, developers, and companies, Java is way ahead of Scala in the numbers game.

Scala is ahead on clean, concise code. If you look at this presentation, you can see the power of Scala and Spark.

Both languages, and the JVM in particular, will be part of Big Data for years to come. Can you pick the wrong language? No. If you are a Java programmer, make sure you use Java 8 with all the latest tools to extend what it can do. Make sure you use these guides:


Opinions expressed by DZone contributors are their own.
