
Learning Spark With Scala

Often, processing alone is not enough when it comes to big volumes of data. Data must be processed quickly, in real-time, continuously, and concurrently.

by Mahesh Chand · Oct. 01, 17 · Big Data Zone · Tutorial


Demand for stream processing is growing quickly these days, because processing big volumes of data is often not enough on its own.

Data has to be processed fast so that a firm can react to changing business conditions in real time.

Stream processing is the real-time processing of data continuously and concurrently.

For that reason, I have started learning Apache Spark, as it processes data in batch mode as well as in real time.

Apache Spark is an open-source, general-purpose, lightning-fast cluster computing system. It provides high-level APIs in Java, Scala, Python, and R. Spark is often cited as running up to 100 times faster than Hadoop MapReduce when the data is held in memory, and up to ten times faster when it works from disk.

Spark also provides interactive processing, graph processing, in-memory processing, and batch processing of data at high speed, with ease of use and a standard interface.

Spark is not only a big data processing engine. It is a framework that provides a distributed environment for processing data. This means we can use Spark for many kinds of computation, not only typical big data jobs.

To see its performance, let's take the example of a factorial.

Calculating the factorial of a very large number is expensive in any programming language. The CPU takes a long time to complete the calculation.

I have written a factorial function in two ways:

Using tail recursion in Scala:

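import scala.annotation.tailrec

// Computes num! with an accumulator so that the recursive call is in tail position.
def factorial(num: BigInt): BigInt = {
  @tailrec // the compiler verifies that factImp is compiled into a loop
  def factImp(num: BigInt, fact: BigInt): BigInt = {
    if (num == 0) fact
    else factImp(num - 1, num * fact)
  }
  factImp(num, 1)
}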

On my machine (quad-core Intel i5), the code above took about 20s to find the factorial of 200000.
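A timing like this can be taken with a small wrapper around the call. The sketch below is only an illustration: `timed` is a hypothetical helper that is not part of the original code, and it repeats the tail-recursive function so that the block is self-contained. A single run on a cold JVM gives a rough figure, not a proper benchmark.

```scala
import scala.annotation.tailrec

object FactorialTiming {
  // Same tail-recursive factorial as in the article, repeated so this block stands alone.
  def factorial(num: BigInt): BigInt = {
    @tailrec
    def factImp(n: BigInt, fact: BigInt): BigInt =
      if (n == 0) fact else factImp(n - 1, n * fact)
    factImp(num, BigInt(1))
  }

  // Hypothetical helper: evaluates `block` once and returns (result, elapsed milliseconds).
  def timed[A](block: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = block
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }

  def main(args: Array[String]): Unit = {
    // A smaller input than 200000 keeps the demo quick; raise it to repeat the experiment.
    val (value, ms) = timed(factorial(BigInt(20000)))
    println(s"20000! has ${value.toString.length} digits, computed in $ms ms")
  }
}
```

The same `timed` wrapper could be placed around any other implementation, so that the variants are measured in the same way.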

Factorial function using Spark:

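// Assumes `sc` is an already initialized SparkContext (for example, the one spark-shell provides).
def factorialUsingSpark(num: BigInt): BigInt = {
  if (num == 0) BigInt(1)
  else {
    // Build the factors 1..num, distribute them as an RDD, and multiply them with reduce.
    val list = (BigInt(1) to num).toList
    sc.parallelize(list).reduce(_ * _)
  }
}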

Spark took only about 5s to find the factorial of 200000 on the same machine, which is roughly 4x faster than the plain tail-recursive version.
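One way to picture where such a speedup can come from: a `reduce` over a partitioned collection multiplies the numbers inside each partition independently and then combines the partial products. The sketch below imitates that idea with plain Scala `Future`s instead of Spark, so it runs without a Spark installation. The names `rangeProduct`, `chunkedFactorial`, and `chunks` are hypothetical, the input is an `Int` rather than a `BigInt` for simplicity, and this illustrates the principle rather than how Spark is implemented.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ChunkedFactorial {
  // Multiplies all integers in [from, to]; an empty range yields 1.
  def rangeProduct(from: Int, to: Int): BigInt =
    (from to to).foldLeft(BigInt(1))((acc, i) => acc * i)

  // Hypothetical stand-in for a partitioned reduce: split 1..n into about `chunks`
  // contiguous ranges, multiply each range in its own Future, then combine.
  def chunkedFactorial(n: Int, chunks: Int): BigInt = {
    require(n >= 0 && chunks > 0, "n must be non-negative and chunks positive")
    if (n == 0) BigInt(1)
    else {
      val size = math.max(1, math.ceil(n.toDouble / chunks).toInt)
      val partials = (1 to n by size).map { start =>
        Future(rangeProduct(start, math.min(start + size - 1, n)))
      }
      // Wait for every partial product, then multiply them together.
      Await.result(Future.sequence(partials), Duration.Inf).product
    }
  }

  def main(args: Array[String]): Unit = {
    val cores = Runtime.getRuntime.availableProcessors()
    println(chunkedFactorial(20, cores)) // 20! = 2432902008176640000
  }
}
```

With `chunks` set to the number of cores, the ranges can be multiplied on separate threads, which is the kind of work splitting that a quad-core machine can benefit from.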

The timings do depend on the hardware of the system, but they at least give us an idea of how efficiently Spark can process heavy computations.

So, this was my first step toward learning Spark with Scala. I know that it is not much; I still need to explore more of Spark, such as RDDs, DataFrames, and Structured Streaming, which I will be writing about in future posts. So, stay tuned!

The complete code can be downloaded from GitHub. Comments and suggestions are welcome.


Published at DZone with permission of Mahesh Chand, DZone MVB. See the original article here.
