Converting Spark RDDs to DataFrames
In this quick post, we look at how to use Scala and Apache Spark to convert data from RDDs to DataFrames. Read on to get started!
Sometimes you need to transform your RDDs into DataFrames, because DataFrames offer many optimization options.
Let's see how this is done.
First, we need to create a SparkSession, the entry point for Spark's DataFrame API:
val session = SparkSession.builder().appName("RddToDataframe").master("local[*]").getOrCreate()
Then create a list of people:
case class Person(name: String, age: Int)
val persons = Seq(Person("Luis", 10), Person("Marta", 20), Person("Enrique", 12))
But this is just a local collection, not distributed data, so let's parallelize it into an RDD:
val rdd = session.sparkContext.parallelize(persons)
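If you want to check that the data really was split up, the RDD can report its partition count. This is just a quick sanity check; the exact number depends on the master setting, and with local[*] it typically equals the number of cores on your machine:
println(rdd.getNumPartitions)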
Finally, we convert the RDD to a DataFrame:
import session.implicits._
rdd.toDF().show()
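Running this prints something like the following; the column names come straight from the Person case class fields:
+-------+---+
|   name|age|
+-------+---+
|   Luis| 10|
|  Marta| 20|
|Enrique| 12|
+-------+---+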
The complete code is below:
package com.example

import org.apache.spark.sql.SparkSession

case class Person(name: String, age: Int)

object app {
  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().appName("RddToDataframe").master("local[*]").getOrCreate()
    val persons = Seq(Person("Luis", 10), Person("Marta", 20), Person("Enrique", 12))
    // Distribute the local collection as an RDD
    val rdd = session.sparkContext.parallelize(persons)
    // Brings the rdd.toDF() conversion into scope
    import session.implicits._
    rdd.toDF().show()
  }
}
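As a side note, toDF() also accepts explicit column names, and once the data is a DataFrame you get the Catalyst optimizations mentioned at the start. Here is a minimal sketch, continuing from the session, rdd, and implicits above; the column names firstName and years are just illustrative:
// Rename the columns while converting
val df = rdd.toDF("firstName", "years")
// explain() prints the physical plan Catalyst settled on for this query
df.filter($"years" > 11).explain()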