
CLI for Indexing Data From MongoDB to Elasticsearch


This article is part of a series on how to index data from X into Elasticsearch using ABC. This time, we'll see how to sync a Mongo database to an Elasticsearch index.


Elasticsearch is fantastic for indexing and filtering data. But hey, your data lives in a Mongo database in production. How do you copy all of it over to Elasticsearch? Even better, how do you keep the two data stores in sync? Is it even possible?

I am going to answer these questions in this blog post. To start off, yes, it is indeed possible. We have made an awesome CLI tool called ABC that allows you to do this with a single command.

abc import --src_type=mongodb --src_uri=<uri> <elasticsearch_uri>

That's it. Seriously, this is all you need to sync a Mongo database to an Elasticsearch index.

The Steps

The first step is to install ABC if you have not done so already. Go to the GitHub releases page for ABC and download the most recent version. It's a single no-dependency binary, so put it anywhere you like. We recommend putting it inside a PATH directory so that it can be accessed from anywhere in the terminal.
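For example, on Linux or macOS, the setup might look like this (the binary name here is an assumption; it depends on the release you download):

chmod +x abc
sudo mv abc /usr/local/bin/

/usr/local/bin is usually already on PATH, so the binary becomes available in any terminal session.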

Ensure that ABC is working by running the following command.

abc version


Now, let's take a Mongo database and sync it to an Elasticsearch index hosted on appbase.io. We will use Compass, the free MongoDB GUI, in this tutorial. Go ahead and install it.

Once it has been installed, we will start the Mongo daemon, which runs the MongoDB server. Go ahead and run the following command in a terminal.

mongod --smallfiles --oplogSize 50 --replSet test


Then, we will log into the Mongo shell, initialize the replica set (named test, to match the --replSet flag we passed to mongod), and switch to the admin database. Open a new shell and run the following commands.

$ mongo
> cfg = {_id: "test", members: [{_id:0, host: "localhost:27017"}]}
> use admin
> rs.initiate(cfg)
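If you want to confirm that the replica set initialized correctly, run the following in the same shell:

> rs.status()

After a few seconds, the output should show the local member with "stateStr" : "PRIMARY", which means the oplog is ready for tailing.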


We should now have the admin database ready. Now it's time to enter the data. We will connect to the database using Compass. On the Connection page, point Compass at localhost on port 27017 and click the Connect button.

Once connected, we create a new collection called "users" and add some data to it.

The final users collection should now contain a handful of user documents.
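For illustration, suppose the collection holds documents along these lines (hypothetical sample data; your fields and values will differ). You could also insert them from the Mongo shell instead of Compass:

> use admin
> db.users.insertMany([
    {name: "Alice", age: 30, bio: "Software engineer from Berlin"},
    {name: "Bob", age: 25, bio: "Data analyst from Oslo"}
  ])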

The Mongo test source is now complete. Its URL is:

mongodb://localhost:27017/admin

Next, we are going to create the sink Elasticsearch index. We go to appbase.io and create a new app called abcmongotest. The complete URL to this index looks like the following.

https://USER:PASS@scalr.api.appbase.io/abcmongotest
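To check that the index is reachable, you can send it a standard Elasticsearch GET request, substituting your real credentials for USER:PASS:

curl "https://USER:PASS@scalr.api.appbase.io/abcmongotest"

It should return the index settings and mappings as JSON.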

Now we have both the source and the sink ready. It's time for some ABC magic. Using this source and sink, we can build the import command as follows.

abc import --src_type=mongodb --src_uri="mongodb://localhost:27017/admin" "https://USER:PASS@scalr.api.appbase.io/abcmongotest"

Run this command, and it should finish after a short while with no errors. Now, if you visit the appbase.io dashboard, you can see that the data has been transferred to the target Elasticsearch index.
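You can also verify this from the terminal with a standard Elasticsearch search request; every Mongo document should come back as a hit:

curl "https://USER:PASS@scalr.api.appbase.io/abcmongotest/_search?pretty"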

Voila! Everything works. The data has been transferred to Elasticsearch with barely any effort on our part. Next, we will see how to make ABC listen for changes in the Mongo database.

Indexing Real-Time Data Changes From Mongo

If you are using Mongo as your production database system, chances are your data is constantly changing. How do you keep the Elasticsearch index in sync with all of these changes?

ABC has a nifty tail mode that synchronizes the Mongo database to an Elasticsearch index in real time. It uses MongoDB's oplog to do this, which is why we started mongod as a replica set with an oplog earlier.

It can be enabled by passing a --tail switch.

abc import --tail --src_type=mongodb --src_uri="mongodb://localhost:27017/admin" "https://USER:PASS@scalr.api.appbase.io/abcmongotest"

Run the above command and you will see that it keeps running even after the initial indexing finishes. That is because it is listening for more changes. Add a new document from the Compass GUI.


You should now see a new "1 item(s) indexed" message in the import log.

2 item(s) indexed
1 item(s) indexed

This means it worked. You should see a new entry in the app dashboard when you hit reload.

Try making some more changes and all of them will be reflected on the appbase.io-based Elasticsearch cluster.
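A quick way to watch this happen from the terminal is Elasticsearch's standard _count endpoint, whose number should grow as you add documents:

curl "https://USER:PASS@scalr.api.appbase.io/abcmongotest/_count?pretty"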

Transforming Data Before Indexing Into Elasticsearch

There are times when you don't want the data to go as-is from the source to the sink. You might like to change the target type name (e.g., users to accounts), remove certain fields (e.g., age), or create new fields. For all this, we have the transforms feature in ABC. It is enabled via the transform_file parameter.

abc import --transform_file="transform_file.js" --src_type=mongodb --src_uri="mongodb://localhost:27017/admin" "https://USER:PASS@scalr.api.appbase.io/abcmongotest"

The transform_file parameter takes the path to the transform file. That file, in turn, contains the JavaScript transforms that should be applied to the pipeline. Let's say the contents of transform_file.js are as follows.

t.Source("source", source, "/.*/")
 .Transform(omit({"fields":["age"]}))
 .Save("sink", sink, "/.*/")

In the above transform, you can see that we are going to omit the age field from the data transfer. Now, when we run the new import command, the documents arriving at the sink will no longer contain that field.
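Continuing with the hypothetical sample documents from earlier, a document indexed into Elasticsearch would now look something like this, with age gone and everything else intact:

{
  "name": "Alice",
  "bio": "Software engineer from Berlin"
}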

As you can see, the age field was omitted when the data reached the sink. More documentation on the transform file can be found on GitHub. It supports lots of built-in functions like omit and even supports running custom JavaScript code as a transform. It's a good idea to explore its documentation.
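As one more sketch, the transform docs also list a pick function, the inverse of omit. Assuming it follows the same signature style as omit (check the documentation to be sure), a transform that keeps only the name and bio fields would look like this:

t.Source("source", source, "/.*/")
 .Transform(pick({"fields":["name", "bio"]}))
 .Save("sink", sink, "/.*/")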

Further Reading

ABC's README is a good place to start if you want to learn more about the application. You can also have a look at the MongoDB Adaptor Docs. Furthermore, you may star the repo on GitHub and watch it to stay tuned for updates.

