
Converting Spark RDDs to DataFrames

In this quick post, we look at how to use Scala and Apache Spark to convert our data from RDDs to DataFrames. Read on to get started!

Sometimes you need to transform your RDDs into DataFrames, because DataFrames open up a lot of optimization options: Spark's Catalyst optimizer can analyze and optimize DataFrame queries, but it can't see inside opaque RDD operations (we'll take a quick look at the optimizer's plan at the end of this post).

Let's see how this is done.

First, we need to create a SparkSession.

val session = SparkSession.builder().appName("RddToDataframe").master("local[*]").getOrCreate()

Then we define a Person case class and create a list of people:

case class Person(name: String, age: Int)
val persons = Seq(Person("Luis", 10), Person("Marta", 20), Person("Enrique", 12))

But this sequence lives only in the driver's memory; it's not distributed. Let's parallelize it into an RDD:

val rdd = session.sparkContext.parallelize(persons)

Finally, we convert the RDD to a DataFrame. The implicits import brings the toDF() method into scope:

import session.implicits._
rdd.toDF().show()
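
With the sample data above, show() should print a small table along these lines (row order can vary with partitioning):

+-------+---+
|   name|age|
+-------+---+
|   Luis| 10|
|  Marta| 20|
|Enrique| 12|
+-------+---+

The column names come from the Person case class fields, and the column types are inferred from the field types.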

The complete code is below:

package com.example

import org.apache.spark.sql.SparkSession

// The case class must be defined outside the method that uses it,
// so Spark can derive its schema through reflection
case class Person(name: String, age: Int)

object App {

  def main(args: Array[String]): Unit = {
    // Create (or reuse) a local SparkSession
    val session = SparkSession.builder().appName("RddToDataframe").master("local[*]").getOrCreate()

    // A plain, driver-local collection of people
    val persons = Seq(Person("Luis", 10), Person("Marta", 20), Person("Enrique", 12))

    // Distribute the collection as an RDD
    val rdd = session.sparkContext.parallelize(persons)

    // The implicits bring toDF() into scope for RDDs of case classes
    import session.implicits._
    rdd.toDF().show()
  }

}
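
As a side note, toDF() isn't the only way to get there: SparkSession also offers a createDataFrame method that infers the same schema from the case class. Here is a minimal, self-contained sketch (the object name CreateDataFrameApp is just for illustration); it also calls explain() to print the physical plan Catalyst produces once the data is a DataFrame:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

case class Person(name: String, age: Int)

object CreateDataFrameApp {

  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().appName("RddToDataframe").master("local[*]").getOrCreate()

    val rdd = session.sparkContext.parallelize(
      Seq(Person("Luis", 10), Person("Marta", 20), Person("Enrique", 12)))

    // createDataFrame infers the schema from the Person case class, just like toDF()
    val df = session.createDataFrame(rdd)

    // Once the data is a DataFrame, Catalyst can optimize the query;
    // explain() prints the physical plan it settles on
    df.filter(col("age") > 15).explain()

    session.stop()
  }

}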