Loading JSON Data in Couchbase
Editor's Note: This post was originally written by Don Pinto at the Couchbase blog.
If you're writing a web application, you're probably already familiar with JSON documents. Couchbase supports JSON documents, and sooner or later you will need to import some of them into Couchbase Server.
Keep in mind that inserting data into Couchbase doesn't mean it goes directly to disk. Your data is first written to the in-memory managed object cache and is later persisted to disk asynchronously in the background, completely decoupled from your write.
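To make that write path concrete, here is a toy Python model of a write-back cache. It is not Couchbase's implementation; the class WriteBackStore and its methods are hypothetical. A write updates an in-memory dict and returns immediately, while a background thread copies queued keys into a second dict that stands in for disk.

```python
import queue
import threading


class WriteBackStore:
    """Toy model (hypothetical): writes hit memory first and are persisted later in the background."""

    def __init__(self):
        self.cache = {}               # in-memory copy, updated synchronously on every write
        self.disk = {}                # stands in for on-disk storage
        self._queue = queue.Queue()   # keys waiting to be persisted
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._flush_loop, daemon=True)
        self._worker.start()

    def set(self, key, value):
        # The caller returns as soon as the cache is updated; persistence is deferred.
        with self._lock:
            self.cache[key] = value
        self._queue.put(key)

    def get(self, key):
        # Reads are served from memory, even before the value reaches "disk".
        with self._lock:
            return self.cache.get(key)

    def _flush_loop(self):
        # Background thread: copy the latest cached value of each queued key to "disk".
        while True:
            key = self._queue.get()
            with self._lock:
                self.disk[key] = self.cache[key]
            self._queue.task_done()

    def wait_until_persisted(self):
        # Block until every queued write has been copied to "disk".
        self._queue.join()
```

Reads are answered from memory even if the background thread has not caught up yet, which is why a successful insert does not by itself tell you the document is on disk.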
But what tools does a developer have to get a bunch of JSON data into Couchbase? This post describes the cbdocloader tool in more detail. It saved me a ton of time by letting me import an entire Vancouver tree dataset that I was playing with.
The cbdocloader tool takes the following command-line parameters:
/opt/couchbase/bin/tools/cbdocloader -u Administrator -p password -n 10.3.2.54:8091 -b bucket_zip -s 10 output
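-u is the Couchbase administrator username
-p is the administrator password
-n is the node IP address and port (for example, 10.3.2.54:8091)
-b is the bucket name (if the bucket does not exist, an error will be thrown)
-s is the RAM quota in MB. This is an optional parameter (100 MB by default).
The last argument (output in the example above) is the source: a directory or a .zip file of JSON documents.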
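If you script your loads, a small wrapper can assemble the same command line from these parameters. This is a minimal sketch: build_cbdocloader_args and run_cbdocloader are hypothetical helper names, and the wrapper uses only the flags shown in the command above.

```python
import subprocess

# Install path used in this post; adjust for your installation.
CBDOCLOADER = "/opt/couchbase/bin/tools/cbdocloader"


def build_cbdocloader_args(user, password, node, bucket, source, ram_mb=None):
    """Hypothetical helper: assemble the cbdocloader command line as a list of arguments."""
    args = [CBDOCLOADER, "-u", user, "-p", password, "-n", node, "-b", bucket]
    if ram_mb is not None:
        # -s is optional; the tool falls back to 100 MB when it is omitted.
        args += ["-s", str(ram_mb)]
    args.append(source)  # directory or .zip file of JSON documents
    return args


def run_cbdocloader(*args, **kwargs):
    # check=True raises CalledProcessError if the tool exits with a non-zero status.
    return subprocess.run(build_cbdocloader_args(*args, **kwargs), check=True)
```

For example, build_cbdocloader_args("Administrator", "password", "10.3.2.54:8091", "bucket_zip", "output", ram_mb=10) reproduces the example command above. Passing a list rather than a single string avoids shell quoting problems with passwords or paths containing spaces.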
The Vancouver Tree Dataset
The City of Vancouver added a new dataset of street trees to the city’s open data catalog. This dataset includes a full address listing of all boulevard trees on the streets of Vancouver, along with the tree type and other characteristics.
Each JSON file in the dataset contains information for all the trees in a particular area. Using a simple Python script, we split each file into one JSON file per tree, and then loaded the data into Couchbase using the cbdocloader tool.
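The original script isn't included in this post, so the following is only a sketch of that splitting step. It assumes each area file is a JSON array of tree objects and that each tree carries a TREE_ID field; both are assumptions about the dataset's layout, and split_area_file and split_all are hypothetical names.

```python
import json
from pathlib import Path


def split_area_file(area_path, out_dir, id_field="TREE_ID"):
    """Split one area file (assumed: a JSON array of tree objects) into one file per tree.

    Returns the number of trees read from the area file.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(area_path, encoding="utf-8") as f:
        trees = json.load(f)
    stem = Path(area_path).stem
    for i, tree in enumerate(trees):
        # Prefer the (assumed) tree ID for the filename; fall back to area name + position.
        tree_id = tree.get(id_field, f"{stem}_{i}")
        with open(out_dir / f"{tree_id}.json", "w", encoding="utf-8") as out:
            json.dump(tree, out)
    return len(trees)


def split_all(in_dir, out_dir):
    """Split every *.json area file in in_dir; returns the total number of trees read."""
    return sum(split_area_file(p, out_dir) for p in sorted(Path(in_dir).glob("*.json")))
```

Write the output to a directory separate from the input, and compare the returned total with the number of files produced: if two trees share an ID, the second file silently overwrites the first.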
Loading the individual JSON files into Couchbase
The source documents fed into cbdocloader can be either a directory or a .zip archive.
To load the JSON documents in a folder:
/opt/couchbase/bin/tools/cbdocloader -u Administrator -p password -n 10.3.2.54:8091 -b bucket -s 1000 output
To load a zipped folder containing JSON documents:
/opt/couchbase/bin/tools/cbdocloader -u Administrator -p password -n 10.3.2.54:8091 -b bucket_zip -s 1000 output.zip
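Before running either command, it is worth checking that every file actually parses as JSON, which catches malformed documents before cbdocloader sees them, and building the archive for the second form. This sketch uses only Python's standard library; find_invalid_json and zip_folder are hypothetical names, and keeping the folder name as the top-level directory inside the archive is our assumption about a sensible layout, not a documented cbdocloader requirement.

```python
import json
import zipfile
from pathlib import Path


def find_invalid_json(folder):
    """Return paths of *.json files under folder that fail to parse."""
    bad = []
    for path in sorted(Path(folder).rglob("*.json")):
        try:
            with open(path, encoding="utf-8") as f:
                json.load(f)
        except (json.JSONDecodeError, UnicodeDecodeError):
            bad.append(path)
    return bad


def zip_folder(folder, zip_path):
    """Zip every file under folder, keeping the folder name as the top-level directory."""
    folder = Path(folder)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # e.g. output/123.json becomes the entry "output/123.json" in the archive
                arcname = (Path(folder.name) / path.relative_to(folder)).as_posix()
                zf.write(path, arcname=arcname)
    return zip_path
```

Create the archive outside the folder being zipped (for example, output.zip next to output/); otherwise the half-written archive can end up inside itself.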
Interesting Data Facts
So can you guess how many trees are in the Vancouver Tree dataset?
Hint: it is the item count of the bucket once the load finishes.
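You can also predict that number before opening the console. Assuming each file becomes exactly one document and the load reports no errors, the bucket's item count should equal the number of JSON files you loaded. count_json_docs is a hypothetical helper that works on either source form:

```python
import zipfile
from pathlib import Path


def count_json_docs(source):
    """Count JSON files in a folder or .zip archive (one document per file is assumed)."""
    source = Path(source)
    if source.suffix == ".zip":
        with zipfile.ZipFile(source) as zf:
            # Directory entries end with "/", so only real files match ".json".
            return sum(1 for name in zf.namelist() if name.endswith(".json"))
    return sum(1 for p in source.rglob("*.json") if p.is_file())
```

If the two numbers differ, look for files that failed to load or for duplicate keys.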
Do you know which Vancouver neighborhood has the tallest tree in the city?
Now that you have loaded the data into Couchbase, try writing a simple view to find the answer. We will revisit this question in our views blog series, so stay tuned, folks!
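Couchbase views are defined as JavaScript map and reduce functions, so the following is not a view definition. It is a small Python emulation of the map/reduce logic such a view might use, handy for checking your reasoning on a few documents locally. The field names NEIGHBOURHOOD_NAME and HEIGHT_RANGE_ID are assumptions about the tree documents (check yours), and all function names are hypothetical.

```python
from collections import defaultdict

# Assumed field names in each tree document; adjust to match your data.
NEIGHBOURHOOD_FIELD = "NEIGHBOURHOOD_NAME"
HEIGHT_FIELD = "HEIGHT_RANGE_ID"


def map_tree(doc):
    """Map step: emit (neighbourhood, height) for documents that have both fields."""
    if NEIGHBOURHOOD_FIELD in doc and HEIGHT_FIELD in doc:
        yield doc[NEIGHBOURHOOD_FIELD], doc[HEIGHT_FIELD]


def reduce_max(values):
    """Reduce step: keep the largest height emitted for one key."""
    return max(values)


def tallest_by_neighbourhood(docs):
    # Group emitted values by key, as a view index would, then reduce each group.
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_tree(doc):
            groups[key].append(value)
    return {key: reduce_max(values) for key, values in groups.items()}


def tallest_neighbourhood(docs):
    """Return (neighbourhood, height) with the greatest height, or None if nothing was emitted."""
    per_hood = tallest_by_neighbourhood(docs)
    if not per_hood:
        return None
    return max(per_hood.items(), key=lambda kv: kv[1])
```

In a real view, the map function would emit the neighbourhood as the key and the height as the value, and Couchbase's built-in _stats reduce reports a max for each key, which covers the reduce step here. If the height field turns out to be a range code rather than an exact height, expect several neighbourhoods to tie.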
Thanks to Abhinav for putting the screenshots together.