Streaming With Apache Spark Custom Receiver

This article covers writing a custom Apache Spark 2.0 Streaming receiver in Scala, with detailed code for Spark 2.0-style streaming.


Hello, inquisitive reader. In a previous post, we covered Spark's predefined stream receivers. In this blog, we are going to discuss Spark's custom receiver, which lets us source data from just about anything. One thing to know up front: a custom receiver belongs to the DStream API, so the entry point is a StreamingContext rather than a SparkSession.

First, add the following dependency to build.sbt:

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.0.0"

Now create a class CustomReceiver that extends the Receiver class and overrides onStart() and onStop():

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class CustomReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
  override def onStart(): Unit = ???
  override def onStop(): Unit = ???
}

Here, onStart() contains the code that periodically retrieves data from the external source and pushes it into the stream using store():

override def onStart(): Unit = {
  val externalData = retrieveExternalData() // fetch from the external source
  store(externalData)                       // push into the Spark stream
}
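
As written, this would block; Spark expects onStart() to return immediately and the receiving loop to run on its own thread. A minimal non-blocking sketch (retrieveExternalData() remains a stand-in for your actual source):

override def onStart(): Unit = {
  // Run the polling loop on a background thread so onStart() returns immediately.
  new Thread("Custom Receiver") {
    override def run(): Unit = {
      while (!isStopped()) {
        store(retrieveExternalData()) // stand-in fetch helper
        Thread.sleep(1000)            // poll once per second
      }
    }
  }.start()
}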

Now what we need to do is configure the StreamingContext:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("wohooo")
  .setMaster("local[2]")
val streamingContext = new StreamingContext(conf, Seconds(2))

For the final step, we have to tell the StreamingContext about our CustomReceiver:

val lines = streamingContext.receiverStream(new CustomReceiver)

After all of this, we can perform transformations on the streamed data. A transformation allows the data from the input DStream to be modified:

val words: DStream[String] = lines.flatMap(_.split(","))

Now let's look at Spark's transformations in general. The computation may be any of the following (a small worked example follows the list):

  • map(func)
  • flatMap(func)
  • filter(func)
  • repartition(numPartitions)
  • union(otherStream)
  • count()
  • reduce(func)
  • countByValue()
  • reduceByKey(func, [numTasks])
  • join(otherStream, [numTasks])
  • cogroup(otherStream, [numTasks])
  • transform(func)
  • updateStateByKey(func)
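
As a small worked example (a sketch, not from the original post), here is the classic word count expressed with map and reduceByKey on the words DStream from above:

val pairs: DStream[(String, Int)] = words.map(word => (word, 1))
val wordCounts: DStream[(String, Int)] = pairs.reduceByKey(_ + _)
wordCounts.print() // an output operation; this is what triggers work each batch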

I was very curious about how computations would be performed on the stream, and I found that we can perform windowed computations like this:

val windowedWords = words.reduceByWindow((a: String, b: String) => (a + b), Seconds(10), Seconds(4))
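
One constraint to keep in mind: the window duration (10 seconds) and the slide duration (4 seconds) must both be multiples of the batch interval (2 seconds here). The same idea works for keyed streams; a sketch using the pairs DStream from the word-count example above:

// Word counts over the last 10 seconds, recomputed every 4 seconds.
val windowedCounts = pairs.reduceByKeyAndWindow(_ + _, Seconds(10), Seconds(4))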

Now, please fasten your seatbelts. Here we go:

streamingContext.start()
streamingContext.awaitTermination()
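
Putting it all together, a minimal end-to-end version might look like the sketch below (retrieveExternalData() is still a stand-in you would replace with your actual source):

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.receiver.Receiver

class CustomReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
  override def onStart(): Unit = {
    new Thread("Custom Receiver") {
      override def run(): Unit = {
        while (!isStopped()) {
          store(retrieveExternalData()) // stand-in fetch helper
          Thread.sleep(1000)
        }
      }
    }.start()
  }

  override def onStop(): Unit = {
    // Nothing to clean up here: the polling thread checks isStopped() and exits.
  }

  private def retrieveExternalData(): String = ??? // plug in your actual source
}

object CustomReceiverApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("wohooo").setMaster("local[2]")
    val streamingContext = new StreamingContext(conf, Seconds(2))

    val lines = streamingContext.receiverStream(new CustomReceiver)
    val words = lines.flatMap(_.split(","))
    val windowedWords = words.reduceByWindow(_ + _, Seconds(10), Seconds(4))
    windowedWords.print()

    streamingContext.start()
    streamingContext.awaitTermination()
  }
}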

You can find the complete code here.

