Transferring Data From Cassandra to Couchbase Using Spark

Started off with Cassandra only to realize that Couchbase suits your needs more? This Spark plugin can help you transfer your data to Couchbase quickly and easily.

By Laura Czajkowski · Jan. 31, 17 · Tutorial

There are many NoSQL databases on the market, such as Cassandra, MongoDB, and Couchbase, and each has its pros and cons.

Types of NoSQL Databases

There are mainly four types of NoSQL databases, namely:

  1. Column-oriented
  2. Key-value store
  3. Document-oriented
  4. Graph

Databases that support more than one of these models are called “multi-model.” Couchbase is one example: it works as both a key-value store and a document-oriented database.

Sometimes we choose the wrong database for our application and realize this harsh truth at a later stage.

Then what? What should we do?

That is what happened to us: we were using Cassandra as our database and later discovered that it did not fulfill all of our needs. We needed to find a new database and found Couchbase to be the right fit.

The main difficulty was figuring out how we should transfer our data from Cassandra to Couchbase, because no such plugin was available.

In this blog post I’ll be describing the code I wrote that transfers data from Cassandra to Couchbase using Spark.

All of the code is available here.

Explanation of the Code

The code reads data from Cassandra and writes it to Couchbase. Simple as it is, it solves our problem.

The steps involved are:

Reading the configuration:

import com.typesafe.config.ConfigFactory

// Loads application.conf from the classpath
val config = ConfigFactory.load()
// Couchbase configuration
val bucketName = config.getString("couchbase.bucketName")
val couchbaseHost = config.getString("couchbase.host")
// Cassandra configuration
val keyspaceName = config.getString("cassandra.keyspaceName")
val tableName = config.getString("cassandra.tableName")
val idFeild = config.getString("cassandra.idFeild")
val cassandraHost = config.getString("cassandra.host")
val cassandraPort = config.getInt("cassandra.port")


Setting up the Spark configuration and the creation of the Spark session:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
    .setAppName("CouchbaseCassandraTransferPlugin")
    // Run Spark locally, using all available cores
    .setMaster("local[*]")
    // Open the target bucket; the value is the bucket password (empty here)
    .set(s"com.couchbase.bucket.$bucketName", "")
    .set("com.couchbase.nodes", couchbaseHost)
    .set("spark.cassandra.connection.host", cassandraHost)
    .set("spark.cassandra.connection.port", cassandraPort.toString)
val spark = SparkSession.builder().config(conf).getOrCreate()
val sc = spark.sparkContext


Reading data from Cassandra:

// Despite its name, cassandraRDD is a DataFrame, not an RDD
val cassandraRDD = spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(Map("table" -> tableName, "keyspace" -> keyspaceName))
    .load()


Checking the id field:

The code checks whether the configured id field exists among the table's columns. If it does, its value is used as the document id in Couchbase as well; otherwise, a random id is generated and assigned to each document.

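import org.apache.spark.sql.functions._

// Wrap the UUID generator in a UDF so that it can be used as a column expression
val uuidUDF = udf(CouchbaseHelper.getUUID _)
val rddToBeWritten = if (cassandraRDD.columns.contains(idFeild)) {
    // The id column exists: copy it into META_ID
    cassandraRDD.withColumn("META_ID", cassandraRDD(idFeild))
} else {
    // No such column: fill META_ID with a random UUID
    cassandraRDD.withColumn("META_ID", uuidUDF())
}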


In a different file:

import java.util.UUID

object CouchbaseHelper {
    def getUUID: String = UUID.randomUUID().toString
}
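To make the id rule easier to follow without a Spark cluster, here is a minimal plain-Scala sketch of the same decision. It is only an illustration, not the plugin's code: the `DocumentIds` object, the `Row` alias, and `withMetaId` are hypothetical names, a row is modelled as a simple `Map`, and the check is made per row rather than against the DataFrame schema.

```scala
import java.util.UUID

// Hypothetical stand-in for the DataFrame logic: a row is column name -> value.
object DocumentIds {
  type Row = Map[String, Any]

  // Adds META_ID: the value of `idField` when that column is present,
  // otherwise a freshly generated random UUID string.
  def withMetaId(row: Row, idField: String): Row =
    row.get(idField) match {
      case Some(id) => row + ("META_ID" -> id)
      case None     => row + ("META_ID" -> UUID.randomUUID().toString)
    }
}
```

For example, `DocumentIds.withMetaId(Map("id" -> "user-1", "name" -> "Ann"), "id")` keeps "user-1" as the document id, while a row without an "id" column gets a random UUID instead.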


Writing to Couchbase:

// Brings the couchbase() writer into scope
import com.couchbase.spark.sql._

rddToBeWritten.write.couchbase()


You can run this code directly to transfer data from Cassandra to Couchbase – all you need to do is some configuration.

Configurations

All configuration is done through environment variables.

Couchbase configuration:

| Configuration Name | Default Value | Description |
|---|---|---|
| COUCHBASE_URL | "localhost" | The hostname for the Couchbase. |
| COUCHBASE_BUCKETNAME | "foobar" | The bucket name to which data needs to be transferred. |


Cassandra configuration:

| Configuration Name | Default Value | Description |
|---|---|---|
| CASSANDRA_URL | "localhost" | The Cassandra hostname. |
| CASSANDRA_PORT | 9042 | The Cassandra port. |
| CASSANDRA_KEYSPACENAME | "foobar" | The Cassandra keyspace name. |
| CASSANDRA_TABLENAME | "testcouchbase" | The name of the table that needs to be transferred. |
| CASSANDRA_ID_FEILD_NAME | "id" | The name of the field to use as the Couchbase document id. If it does not match any column, each document gets a random id. |
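As a rough illustration of how these variables and their defaults fit together, the sketch below resolves them in plain Scala. It is an assumption-laden example rather than the plugin's actual mechanism (the plugin reads its settings through ConfigFactory): the `TransferSettings` case class and `fromEnv` are hypothetical names, and only the variable names and default values come from the tables above.

```scala
// Hypothetical holder for the settings listed in the configuration tables.
final case class TransferSettings(
    couchbaseHost: String,
    bucketName: String,
    cassandraHost: String,
    cassandraPort: Int,
    keyspaceName: String,
    tableName: String,
    idFieldName: String
)

object TransferSettings {
  // `env` defaults to the process environment; pass a Map to try other values.
  // Each lookup falls back to the documented default when the variable is unset.
  def fromEnv(env: Map[String, String] = sys.env): TransferSettings =
    TransferSettings(
      couchbaseHost = env.getOrElse("COUCHBASE_URL", "localhost"),
      bucketName    = env.getOrElse("COUCHBASE_BUCKETNAME", "foobar"),
      cassandraHost = env.getOrElse("CASSANDRA_URL", "localhost"),
      cassandraPort = env.get("CASSANDRA_PORT").map(_.toInt).getOrElse(9042),
      keyspaceName  = env.getOrElse("CASSANDRA_KEYSPACENAME", "foobar"),
      tableName     = env.getOrElse("CASSANDRA_TABLENAME", "testcouchbase"),
      idFieldName   = env.getOrElse("CASSANDRA_ID_FEILD_NAME", "id")
    )
}
```

Calling `TransferSettings.fromEnv(Map("CASSANDRA_PORT" -> "9142"))` overrides only the port and leaves every other setting at its default.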

Code in Action

This is how the data looks on the Cassandra side.

[Image: Cassandra1.png]


As for the Couchbase side, there are two cases.

Case 1: The id field exists, so its value is used as the Couchbase document id.

[Image: Couchbase_with_id.png]

Case 2: The id field does not exist, so a random id is assigned to each document.

[Image: Couchbase_idChanged.png]

How to Run the Transfer Plugin

Steps to run the code:

  1. Download the code from the repository.
  2. Configure the environment variables according to the configuration.
  3. Run the project using sbt run.

Published at DZone with permission of Laura Czajkowski, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
