
Scala vs. Java for Big Data Engineering

Which language is better for your big data needs?

by Tim Spann
May. 20, 16 · Big Data Zone · Opinion

Hadoop is mostly written in Java. Spark is mostly written in Scala.

Scala is the default language for Apache Spark programming, and it's hard to argue against Scala being the easiest and cleanest language for implementing Spark programs. The Spark Shell is a Scala REPL, and it is awesome because of Scala. Take a look at my learning tutorial on Spark actions and transformations.


val reducedByRDD = kvRDD.reduceByKey((a, b) => a.concat(b))
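For readers coming from Java, here is a rough sketch of what `reduceByKey` does conceptually, written against plain Java 8 collections with no Spark dependency. The class `ReduceByKeySketch` and its `pair` and `reduceByKey` helpers are hypothetical names made up for illustration; real Spark does this work across partitions on a cluster, which this sketch ignores.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical helper class for illustration; not part of Spark.
class ReduceByKeySketch {

    // Small factory so call sites do not repeat the generic types.
    static <K, V> Map.Entry<K, V> pair(K key, V value) {
        return new SimpleEntry<>(key, value);
    }

    // Merge all values that share a key using the supplied function.
    // This mirrors the idea of reduceByKey on a single machine only.
    static <K, V> Map<K, V> reduceByKey(List<Map.Entry<K, V>> pairs, BinaryOperator<V> merge) {
        Map<K, V> result = new LinkedHashMap<>();
        for (Map.Entry<K, V> p : pairs) {
            // Map.merge stores the value if the key is new, else combines old and new.
            result.merge(p.getKey(), p.getValue(), merge);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> kv =
                Arrays.asList(pair(1, "a"), pair(2, "b"), pair(1, "c"));
        System.out.println(reduceByKey(kv, String::concat)); // prints {1=ac, 2=b}
    }
}
```

Even with Java 8 method references, the Java version needs a helper and explicit generic types where the Scala one-liner needs neither.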


After 15 years of coding in Java, this line of code disgusts me:

PairFunction<Tuple2<Integer, Optional<String>>, Integer, String> KEY_VALUE_PAIRER =
    new PairFunction<Tuple2<Integer, Optional<String>>, Integer, String>() 


Java 8 and Spring Boot are reducing boilerplate and extraneous code, but most of the Java you face is not going to be ultra clean. AOL's cyclops-react is helping and Javaslang is neat. Java still supports a lot of code soup.
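To make the Java 8 point concrete, here is a minimal sketch that contrasts an anonymous class with a lambda. It is a stand-in, not the original Spark code: `java.util.function.Function` replaces Spark's `PairFunction`, `Map.Entry` replaces `Tuple2`, `java.util.Optional` replaces whatever `Optional` the original used, and the function body (unwrap the optional, defaulting to an empty string) is an assumption, since the declaration shown earlier is truncated.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical stand-in for the Spark example; names and body are assumptions.
class LambdaVsAnonymousClass {

    // Pre-Java 8 style: the full generic signature is spelled out twice.
    static final Function<Map.Entry<Integer, Optional<String>>, Map.Entry<Integer, String>> VERBOSE_PAIRER =
            new Function<Map.Entry<Integer, Optional<String>>, Map.Entry<Integer, String>>() {
                @Override
                public Map.Entry<Integer, String> apply(Map.Entry<Integer, Optional<String>> in) {
                    return new SimpleEntry<>(in.getKey(), in.getValue().orElse(""));
                }
            };

    // Java 8 style: same logic, with the types inferred from the declaration.
    static final Function<Map.Entry<Integer, Optional<String>>, Map.Entry<Integer, String>> LAMBDA_PAIRER =
            in -> new SimpleEntry<>(in.getKey(), in.getValue().orElse(""));

    public static void main(String[] args) {
        Map.Entry<Integer, Optional<String>> input = new SimpleEntry<>(7, Optional.of("seven"));
        System.out.println(VERBOSE_PAIRER.apply(input)); // prints 7=seven
        System.out.println(LAMBDA_PAIRER.apply(input));  // prints 7=seven
    }
}
```

The lambda removes the duplicated signature on the right-hand side, but the declared type on the left is still long, which is roughly the "code soup" complaint.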

Spring XD is Java built on Spring Boot and has a decent command-line interface.


On the Pro Java Side...

Apache NiFi, from the NSA, is Java.

Apache Beam (Google Cloud Dataflow) is Java, though there is a Scala DSL for Apache Beam.

Google and AOL are using Java 8.

Apache Flink is implemented mostly in Java and offers both Java and Scala APIs.

Most of the Hadoop ecosystem is written in Java.

Apache Crunch is Java.

CDAP is Java.


On the Pro Scala Side...

Apache Spark is written in Scala and its shell is a Scala REPL.

Lightbend is doing fast data and data analytics in Scala. Akka is very powerful and written in Scala. See this cool Scala/Spark class for more details.

Counting projects, developers, and companies, Java is way ahead of Scala in the numbers game.

Scala is ahead on clean, concise code. If you look at this presentation, you can see the power of Scala and Spark.

Both languages, and especially the JVM, will be part of Big Data for years to come. Can you pick a wrong language? No. If you are a Java programmer, make sure you use Java 8 with all the latest tools to improve its abilities. Make sure you use these guides:

  • Java Optimizations

  • Java in Containers

  • Java for IoT

  • Core Java

  • Java 8 Best Practices

