Import Your MongoDB Collection Data Into Couchbase Server With Golang

There are plenty of ways to move MongoDB collection data into Couchbase, and Go offers a simple set of tools that will do a lot of the legwork for you.

By Nic Raboy · Apr. 10, 17 · Tutorial

If you’ve been keeping up, you’ll remember I wrote a few tutorials around converting your MongoDB powered Node.js applications to Couchbase. These included a MongoDB Query Language to N1QL tutorial as well as a Mongoose to Ottoman tutorial. These were great migration tutorials from an application perspective, but they didn’t really tell you how to get your already existing MongoDB Collection data into Couchbase.

We’re going to explore how to import MongoDB collection data into Couchbase with Golang. The development language doesn’t really matter, but Golang is very fast and very powerful, making it a perfect candidate for the job.

Before we worry about writing a data migration script, let’s figure out a sample dataset to work with. The goal is for our script to be universal, but it does help to have an example.

The MongoDB Collection Model

Let’s assume we have a Collection called courses that holds information about courses offered by a school. The document model for any one of these documents might look something like the following:

{
    "name": "Basket Weaving 101",
    "students": [
        "nraboy",
        "mgroves",
        "hgreeley"
    ]
}


Each document would represent a single course with a list of enrolled students. Each document has an id value and the enrolled students reference documents from another collection with matching id values.
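
For reference, a document in that other Collection might look something like the following. This is purely hypothetical, just to illustrate that each value in the students array matches the id of a document elsewhere:

{
    "_id": "nraboy",
    "name": "Nic Raboy"
}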

With MongoDB installed, you have access to its mongoexport utility. This will allow us to export the documents that exist in any Collection to a JSON file.

For example, we could run the following command against our MongoDB database:

mongoexport --db example --collection courses --out courses.json


The database in question would be example and we’re exporting the courses collection to a file called courses.json. If we try to open this JSON file, we’d see data that looks similar to the following:

{"_id":{"$oid":"course-1"},"name":"Basket Weaving 101","students":[{"$oid":"nraboy"},{"$oid":"mgroves"}]}
{"_id":{"$oid":"course-2"},"name":"TV Watching 101","students":[{"$oid":"jmichaels"},{"$oid":"tgreenstein"}]}


Each document will be a new line in the file; however, it won’t be exactly how our schema was modeled. MongoDB will take all document references and wrap them in an $oid property, which represents an object id.

So where does this leave us?

Planning the Couchbase Bucket Model

As you’re probably already aware, Couchbase does not use Collections, but instead Buckets. However, Buckets do not function the same as Collections. Instead of having one Bucket for every document type, as you would have a Collection per document type in MongoDB, you’ll have one Bucket for every application.

This means we’ll need to make some changes to the MongoDB export so it makes sense inside of Couchbase.

In Couchbase, it is normal to have a property in every document that represents the type of document it is. Lucky for us, we know the name of the former Collection and can work some magic. As an end result, our Couchbase documents should look something like this:

{
    "_id": "course-1",
    "_type": "courses",
    "name": "Basket Weaving 101",
    "students": [
        "nraboy",
        "mgroves",
        "hgreeley"
    ]
}


In the above example we have compressed all the $oid values and added the _id and _type properties.

Developing the Golang Collection Import Script

Now that we know where we’re headed, we can focus on the script that will do the manipulations and loading. However, let’s think about our Golang logic on how to accomplish the job.

We know we’re going to be reading line by line from a JSON file. For every line read we need to manipulate it, then save it. Reading from a file and inserting into Couchbase are both blocking operations. While reading is quite fast, inserting a single document at a time in a blocking fashion for terabytes of data can be quite slow. This means we should start goroutines to do things in parallel.

Create a new project somewhere in your $GOPATH and create a file called main.go with the following code:

package main

import (
    "bufio"
    "encoding/json"
    "flag"
    "fmt"
    "os"
    "sync"

    "github.com/couchbase/gocb"
)

var waitGroup sync.WaitGroup
var data chan string
var bucket *gocb.Bucket

func main() {}

func worker(collection string) {}

func cbimport(document string, collection string) {}

func compressObjectIds(mapDocument map[string]interface{}) string {}

The above code is merely a blueprint for what we’re going to accomplish. The main function will be responsible for starting several goroutines and reading our JSON file. We don’t want the application to end when the main function ends, so we use a WaitGroup. This will prevent the application from ending until all goroutines have finished.

The worker function will be each goroutine, and it will call cbimport, which will call compressObjectIds to swap out any $oid with the compressed equivalent. By compressed, I mean it won’t include a wrapping $oid property.

So let’s look at that main function:

func main() {
    fmt.Println("Starting the import process...")

    flagInputFile := flag.String("input-file", "", "file with path which contains documents")
    flagWorkerCount := flag.Int("workers", 20, "concurrent workers for importing data")
    flagCollectionName := flag.String("collection", "", "mongodb collection name")
    flagCouchbaseHost := flag.String("couchbase-host", "", "couchbase cluster host")
    flagCouchbaseBucket := flag.String("couchbase-bucket", "", "couchbase bucket name")
    flagCouchbaseBucketPassword := flag.String("couchbase-bucket-password", "", "couchbase bucket password")
    flag.Parse()

    cluster, _ := gocb.Connect("couchbase://" + *flagCouchbaseHost)
    bucket, _ = cluster.OpenBucket(*flagCouchbaseBucket, *flagCouchbaseBucketPassword)

    file, _ := os.Open(*flagInputFile)
    defer file.Close()

    data = make(chan string)

    scanner := bufio.NewScanner(file)
    scanner.Split(bufio.ScanLines)

    for i := 0; i < *flagWorkerCount; i++ {
        waitGroup.Add(1)
        go worker(*flagCollectionName)
    }

    for scanner.Scan() {
        data <- scanner.Text()
    }

    close(data)

    waitGroup.Wait()

    fmt.Println("The import has completed!")
}


The above function will take a set of command line flags that will be used in the configuration of the application. The connection to the destination Couchbase Server and Bucket will be established and the input file will be opened.

Because we’re using goroutines, we use a channel to hand out work and avoid locking scenarios. Every line read will be queued up in the channel, and each goroutine will read from it.

After spinning up the goroutines, the file will be read and the channel will be populated. Once the file has been completely read, the channel is closed. This means that when the goroutines have read all the data, they will be able to end. We wait until all the goroutines have ended before ending the application.

Now let’s take a look at the worker function:

func worker(collection string) {
    defer waitGroup.Done()
    for {
        document, ok := <-data
        if !ok {
            break
        }
        cbimport(document, collection)
    }
}


The MongoDB Collection name is passed to each worker, and each worker will keep looping until the channel closes.

For every document read from the channel, the cbimport function will be called:

func cbimport(document string, collection string) {
    var mapDocument map[string]interface{}
    json.Unmarshal([]byte(document), &mapDocument)
    mapDocument["_type"] = collection
    compressObjectIds(mapDocument)
    bucket.Insert(mapDocument["_id"].(string), mapDocument, 0)
}


Each line of the file will be a string that we need to unmarshal into a map of interfaces. We know the Collection name, so we can create a property that will hold that particular name. Then we can pass the entire map into the compressObjectIds function to get rid of any $oid wrappers.

The compressObjectIds function looks like the following:

func compressObjectIds(mapDocument map[string]interface{}) string {
    var objectIdValue string
    for key, value := range mapDocument {
        switch value.(type) {
        case string:
            if key == "$oid" && len(mapDocument) == 1 {
                return value.(string)
            }
        case map[string]interface{}:
            objectIdValue = compressObjectIds(value.(map[string]interface{}))
            if objectIdValue != "" {
                mapDocument[key] = objectIdValue
            }
        case []interface{}:
            for index, element := range value.([]interface{}) {
                // Only recurse into array elements that are objects; plain values are left alone.
                if elementMap, ok := element.(map[string]interface{}); ok {
                    objectIdValue = compressObjectIds(elementMap)
                    if objectIdValue != "" {
                        value.([]interface{})[index] = objectIdValue
                    }
                }
            }
        }
    }
    return ""
}


In the above code block, we are essentially looping through every key in the document. If the value is a nested object or array, we recursively do the same thing until we hit a string with a key of $oid. If this condition is met, we make sure it is the only key in that level of the document. This lets us know that it is an id we can safely compress.

Not so bad right?

Running the MongoDB to Couchbase Importer

Assuming you have the Go programming language installed and configured, we need to build this application.

From the command line, you’ll need to get all the dependencies. With the project as your current working directory, execute the following:

go get -d -v


The above command will get any dependencies found in our Go files.

Now the application can be built and run, or just run. The steps aren’t really any different, but we’re just going to run the code.
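
If you do decide to build first, a minimal sketch would look something like the following, where the output name collectionimport is only an assumption chosen to match the run command below:

go build -o collectionimport

Alternatively, go run main.go followed by the same flags will accomplish the same thing without leaving a binary behind.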

From the command line, execute the following:

./collectionimport \
    --input-file FILE_NAME.json \
    --collection COLLECTION_NAME \
    --couchbase-host localhost \
    --couchbase-bucket default \
    --workers 20


The above command will allow us to pass any flags into the application such as Couchbase Server information, number of worker goroutines, etc.

If successful, the MongoDB export should now be present in your Couchbase NoSQL database.
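
As a quick sanity check, you can read one of the imported documents back with the same Go SDK. The following is only a minimal sketch; it assumes a localhost cluster, a Bucket named default with no password, and that the course-1 document from the sample export made it in:

package main

import (
    "fmt"

    "github.com/couchbase/gocb"
)

func main() {
    // Connect to the same cluster and Bucket that the importer wrote to.
    cluster, _ := gocb.Connect("couchbase://localhost")
    bucket, _ := cluster.OpenBucket("default", "")

    // "course-1" was the _id of the first document in the sample export.
    var course map[string]interface{}
    if _, err := bucket.Get("course-1", &course); err != nil {
        fmt.Println("Could not fetch the document:", err)
        return
    }
    fmt.Println(course)
}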

Conclusion

You just saw one of many possible methods for getting your MongoDB data into Couchbase. Sure, the code we saw can be optimized, but from a simplicity standpoint, I’m sure you can see what we were trying to accomplish.

Want to download this importer project and try it out for yourself? I’ve gone and uploaded it to GitHub, with further instructions for running. Keep in mind that it is unofficial and hasn’t been tested for massive amounts of data. Treat it as an example for learning how to meet your data migration needs.


Published at DZone with permission of Nic Raboy, DZone MVB.
