
CSV File Writer Using Scala


Are you looking to generate your own CSV file using Scala? We've got you covered! Learn how to do it, and do it quickly to save you time.


The other day, I needed a CSV file with some records and started asking people for one. Then I wondered whether I could generate my own instead of borrowing one from someone else. That led me to write a piece of Scala code that generates a CSV file in a specified directory. You can generate a file with any number of fields and any number of records, and you can adjust both whenever you need to.

Creating a Scala Class

Today we're going to make an SBT project.

First, add the opencsv dependency to your build.sbt file:  libraryDependencies += "au.com.bytecode" % "opencsv" % "2.4" 
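For context, here is a minimal sketch of a complete build.sbt that contains this dependency. The project name, version, and Scala version are assumed values chosen for illustration, not taken from the original project; a 2.12.x Scala version is assumed because the code relies on scala.collection.JavaConversions, which is not available in Scala 2.13.

```scala
// build.sbt: a minimal sketch. name, version, and scalaVersion are assumed values.
name := "make-csv"
version := "0.1.0"
scalaVersion := "2.12.18"

// The opencsv dependency used in this article
libraryDependencies += "au.com.bytecode" % "opencsv" % "2.4"
```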

Now we will write the code. In my case, it lives in a singleton object, MakeCSV, which extends App. You will need a few imports at the top of the file.

import java.io.{BufferedWriter, FileWriter}

import scala.collection.JavaConversions._
import scala.collection.mutable.ListBuffer
import scala.util.Random

import au.com.bytecode.opencsv.CSVWriter

Walking Through the Code

  1. val outputFile = new BufferedWriter(new FileWriter("PATH_TO_STORE_FILE/output.csv")): This creates the output file, output.csv, in the given directory.

  2. val csvWriter = new CSVWriter(outputFile): This creates a CSVWriter object that wraps outputFile.

  3. val csvFields = Array("id", "name", "age", "city"): This is the header row (the schema) of the CSV file. In my case, there are four fields. Including the header is optional.

  4. val nameList = List("Deepak", "Sangeeta", "Geetika", "Anubhav", "Sahil", "Akshay"): This is the list of values for the name field.

  5. val ageList = (24 to 26).toList: This is the list of values for the age field.

  6. val cityList = List("Delhi", "Kolkata", "Chennai", "Mumbai"): This is the list of values for the city field.

  7. val random = new Random(): This is the Random object used to pick random items from the lists above.

  8. val listOfRecords = new ListBuffer[Array[String]](): This is the list buffer that holds all the records.

  9. listOfRecords += csvFields: This adds the header row as the first record.

  10. for (i <- 1 to numberOfRecords) { listOfRecords += Array(i.toString, nameList(random.nextInt(nameList.length)), ageList(random.nextInt(ageList.length)).toString, cityList(random.nextInt(cityList.length))) }: This loop adds the records to the list buffer, using the random object to pick an item from each list. Here, numberOfRecords is a placeholder for the number of records you want to generate.

  11. csvWriter.writeAll(listOfRecords.toList): This writes all the records to the CSV file. The implicit conversions imported from JavaConversions turn the Scala List into the java.util.List that writeAll expects.

  12. outputFile.close(): Finally, this closes the file after all the records have been written.
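Steps 3 to 10 can also be sketched as a single function that builds the records without touching the file system, which makes the generation logic easy to try on its own. This is only an illustration: the names RecordGenerator, pick, and generateRecords are hypothetical, and passing the Random in as a parameter (so that a fixed seed gives repeatable output) is a design choice of this sketch, not something the original code does.

```scala
import scala.util.Random

object RecordGenerator {
  val csvFields: Array[String] = Array("id", "name", "age", "city")
  val nameList: List[String] = List("Deepak", "Sangeeta", "Geetika", "Anubhav", "Sahil", "Akshay")
  val ageList: List[Int] = (24 to 26).toList
  val cityList: List[String] = List("Delhi", "Kolkata", "Chennai", "Mumbai")

  // Picks one random element from a non-empty list.
  private def pick[A](items: List[A], random: Random): A =
    items(random.nextInt(items.length))

  // Returns the header row followed by `numberOfRecords` random rows.
  // Ids start at 1; that starting value is an assumption of this sketch.
  def generateRecords(numberOfRecords: Int, random: Random): List[Array[String]] = {
    val rows = (1 to numberOfRecords).map { i =>
      Array(
        i.toString,
        pick(nameList, random),
        pick(ageList, random).toString,
        pick(cityList, random)
      )
    }
    csvFields +: rows.toList
  }
}
```

The resulting list has the same shape as listOfRecords.toList in step 11, so it could be passed to csvWriter.writeAll in the same way.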

The Final Code

import java.io.{BufferedWriter, FileWriter}

// Implicit Scala-to-Java collection conversions (deprecated in Scala 2.12, removed in 2.13)
import scala.collection.JavaConversions._
import scala.collection.mutable.ListBuffer
import scala.util.Random

import au.com.bytecode.opencsv.CSVWriter

object MakeCSV extends App {

  // Replace the path and filename with the ones you want
  val outputFile = new BufferedWriter(new FileWriter("/home/deepak/Desktop/deepak19.csv"))
  val csvWriter = new CSVWriter(outputFile)

  val csvFields = Array("id", "name", "age", "city")
  val nameList = List("Deepak", "Sangeeta", "Geetika", "Anubhav", "Sahil", "Akshay")
  val ageList = (24 to 26).toList
  val cityList = List("Delhi", "Kolkata", "Chennai", "Mumbai")
  val random = new Random()

  // Placeholder: set this to the number of records you want (100 is only an illustrative value)
  val numberOfRecords = 100

  val listOfRecords = new ListBuffer[Array[String]]()
  listOfRecords += csvFields // header row
  for (i <- 1 to numberOfRecords) {
    listOfRecords += Array(i.toString, nameList(random.nextInt(nameList.length)),
      ageList(random.nextInt(ageList.length)).toString, cityList(random.nextInt(cityList.length)))
  }
  csvWriter.writeAll(listOfRecords.toList) // implicitly converted to java.util.List
  outputFile.close()
}

I tested the code by generating 9 million records in a CSV file. It took 2 minutes and 22 seconds on my machine, which has an i5 processor and 8 GB of RAM. In a future post, I will write the same code with Spark so that we can compare the performance. I hope the performance will improve with Spark.
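Collecting every record in a ListBuffer before writing means that all the rows sit in memory at once, which matters at 9 million records. Below is a minimal sketch of an alternative that writes each row as soon as it is generated. To keep it self-contained, it uses only java.io rather than opencsv; the names StreamingCsv, toCsvLine, and writeRecords are hypothetical, and the quoting (every field wrapped in double quotes, embedded quotes doubled) is intended to resemble CSVWriter's default output rather than to replace the library. I have not measured whether it is faster.

```scala
import java.io.Writer
import scala.util.Random

object StreamingCsv {
  private val nameList = List("Deepak", "Sangeeta", "Geetika", "Anubhav", "Sahil", "Akshay")
  private val ageList = (24 to 26).toList
  private val cityList = List("Delhi", "Kolkata", "Chennai", "Mumbai")

  // Wraps every field in double quotes and doubles any embedded quote.
  def toCsvLine(fields: Seq[String]): String =
    fields.map(f => "\"" + f.replace("\"", "\"\"") + "\"").mkString(",")

  // Writes the header and then `numberOfRecords` random rows, one line at a time,
  // so that no more than one row is held in memory by this function.
  def writeRecords(out: Writer, numberOfRecords: Int, random: Random): Unit = {
    def pick[A](items: List[A]): A = items(random.nextInt(items.length))

    out.write(toCsvLine(Seq("id", "name", "age", "city")) + "\n")
    for (i <- 1 to numberOfRecords) {
      val row = Seq(i.toString, pick(nameList), pick(ageList).toString, pick(cityList))
      out.write(toCsvLine(row) + "\n")
    }
  }
}
```

To write to a file, you could pass in a BufferedWriter wrapped around a FileWriter, as in the code above, and close it in a finally block so that the file is closed even if writing fails.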

If you have any challenges, please let me know in the comments. If you enjoyed this post, I’d be very grateful if you’d help it spread. Keep smiling, keep coding!



Published at DZone with permission of Deepak Mehra, DZone MVB. See the original article here.
