
Mongolastic


Mongolastic is a handy tool written in Java 8 that converts a MongoDB JSON dataset into an Elasticsearch JSON bulk format, ready for the target environment.


Mongolastic enables you to migrate your datasets from a mongod node to an Elasticsearch node and vice versa. Since Mongo and Elastic servers can run with different characteristics, the tool provides several optional and required settings so that the two can be connected properly. Mongolastic works with a YAML or JSON configuration file to begin a migration process: it reads your requirements from the file and starts syncing data in the specified direction.

How it works

First, you can either pull the corresponding image of the app from Docker Hub or download the latest mongolastic.jar file.
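For example, pulling the image could look like the following sketch; the tag placeholder stands for one of the published tags on Docker Hub and is only illustrative:

$ docker pull ozlerhakan/mongolastic:<tag>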

Second, create a YAML or JSON file that contains the following structure:


misc:
    dindex:
        name: <string>      (1)
        as: <string>        (2)
    ctype:
        name: <string>      (3)
        as: <string>        (4)
    direction: (em | me)    (5)
    batch: <number>         (6)
    dropDataset: <bool>     (7)
mongo:
    host: <ip-address>      (8)
    port: <number>          (9)
    query: "mongo-query"    (10)
    project: "projection"   (11)
    auth:                   (12)
        user: <string>
        pwd: "password"
        source: <db-name>
        mechanism: ( plain | scram-sha-1 | x509 | gssapi | cr )
elastic:
    host: <ip-address>     (13)
    port: <number>         (14)
    dateFormat: "<format>" (15)
    longToString: <bool>   (16)
    clusterName: <string>  (17)
    auth:                  (18)
        user: <string>
        pwd: "password"


  1. The database/index name to connect to.
  2. Another database/index name under which the documents will be located in the target service. (Optional)
  3. The collection/type name to export.
  4. Another collection/type name in which the indexed/collected documents will reside in the target service. (Optional)
  5. The direction of the data transfer. The default direction is me (that is, MongoDB to Elasticsearch), so you can skip this option if your data moves from Mongo to ES; a reverse-direction sketch appears after this list.
  6. Overrides the default batch size, which is normally 200. (Optional)
  7. Configures whether the target index/collection should be dropped prior to loading data. The default value is true. (Optional)
  8. The name of the host machine where the mongod instance is running.
  9. The port where the mongod instance is listening.
  10. Data will be transferred based on a JSON MongoDB query. (Optional)
  11. As of v1.4.1, you can manipulate the documents that will be migrated from MongoDB to Elasticsearch with the $project operator. (Optional)
  12. As of v1.3.5, you can access an authenticated MongoDB instance by providing an auth configuration. (Optional)
  13. The name of the host machine where the Elasticsearch node is running.
  14. The transport port where the transport module communicates with the running Elasticsearch node, e.g. 9300 for node-to-node communication.
  15. A custom formatter for Date fields rather than the default DateCodec. (Optional)
  16. Serializes long values as strings for backwards compatibility with other tools. (Optional)
  17. Connects to a specific Elasticsearch cluster. (Optional)
  18. As of v1.3.9, you can access an authenticated Elasticsearch node by providing an auth configuration. (Optional)
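As a minimal sketch of a reverse migration (Elasticsearch to MongoDB), the following configuration sets the direction to em and tweaks the optional batch and dropDataset settings; the index, type, and batch values here are purely illustrative:

misc:
    dindex:
        name: kodcu
    ctype:
        name: posts
    direction: em
    batch: 500
    dropDataset: false
mongo:
    host: localhost
    port: 27017
elastic:
    host: localhost
    port: 9300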

Alternatively, a JSON file that follows the same structure as the YAML layout above can be specified as the Mongolastic configuration file:


{
	"misc": {
		"dindex": {
			"name": "twitter",
			"as": "media"
		},
		"ctype": {
			"name": "tweets",
			"as": "posts"
		},
		"direction": "me",
		"batch": 400,
		"dropDataset": true
	},
	"mongo": {
		"host": "127.0.0.1",
		"port": 27017,
		"query": "{ lang: 'en' }",
		"project": "{ user:1, name:'$user.name', location: { $substr: [ '$user.location', 10, 15 ] }}",
		"auth": {
			"user": "joe",
			"pwd": "1234",
			"source": "twitter",
			"mechanism": "scram-sha-1"
		}
	},
	"elastic": {
		"host": "127.0.0.1",
		"port": 9300,
		"dateFormat": "yyyy-MM-dd",
		"longToString": true,
		"auth": {
			"user": "joe",
			"pwd": "4321"
		}
	}
}


Example #1

The following YAML and JSON files have the same configuration details:

misc:
    dindex:
        name: twitter
        as: kodcu
    ctype:
        name: tweets
        as: posts
mongo:
    host: localhost
    port: 27017
    query: "{ 'user.name' : 'kodcu.com'}"
elastic:
    host: localhost
    port: 9300


{
	"misc": {
		"dindex": {
			"name": "twitter",
			"as": "kodcu"
		},
		"ctype": {
			"name": "tweets",
			"as": "posts"
		}
	},
	"mongo": {
		"host": "localhost",
		"port": 27017,
		"query": "{ 'user.name' : 'kodcu.com'}"
	},
	"elastic": {
		"host": "localhost",
		"port": 9300
	}
}


The config says that the transfer direction is from MongoDB to Elasticsearch. Mongolastic first looks at the tweets collection of the twitter database, where the user name is kodcu.com, on a mongod server running on the default host interface and port number. If it finds the corresponding data, it starts copying those documents into an Elasticsearch environment running on the default host and transport port. Afterwards, you should see a type called "posts" in an index called "kodcu" in the current Elasticsearch node. The index and type names differ from the source because the "dindex.as" and "ctype.as" options were set, which indicate that the transferred data should end up in the posts type of the kodcu index.


After downloading the jar or pulling the image and providing a conf file, you can run the tool either as:

$ java -jar mongolastic.jar -f config.file

or

$ docker run --rm -v $(PWD)/config.file:/config.file --net host ozlerhakan/mongolastic:<tag> config.file


Note: Every run of the tool drops the mentioned db/index in the target environment (see the dropDataset option above).
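To spot-check the outcome of Example #1, you could query the new index over Elasticsearch's HTTP API. Note that the HTTP port is an assumption here (9200 by default); the configuration above only sets the 9300 transport port:

$ curl 'http://localhost:9200/kodcu/posts/_search?pretty'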


Example #2

Using the project field, you can manipulate documents while migrating them from MongoDB to Elasticsearch. For more examples of the $project operator of the aggregation pipeline, take a look at its documentation.

misc:
    dindex:
        name: twitter
    ctype:
        name: tweets
mongo:
    host: 192.168.10.151
    port: 27017
    project: "{ user: 1, name: '$user.name', location: { $substr: [ '$user.location', 10, 15 ] }}" (1)
elastic:
    host: 192.168.10.152
    port: 9300
  1. The migrated documents will include the user field and contain the new fields name and location; a before/after sketch follows.
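To illustrate the projection, assume a hypothetical source document in the tweets collection such as:

{
	"_id": 1,
	"text": "hello from kodcu",
	"user": { "name": "kodcu.com", "location": "Besiktas, Istanbul, Turkey" }
}

After the $project stage, the migrated document would roughly keep the whole user sub-document, copy user.name into a top-level name field, and hold a 15-character substring of user.location starting at offset 10:

{
	"_id": 1,
	"user": { "name": "kodcu.com", "location": "Besiktas, Istanbul, Turkey" },
	"name": "kodcu.com",
	"location": "Istanbul, Turke"
}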

License


Mongolastic is released under MIT.


Topics:
nosql ,tool ,json ,mongodb ,dataset ,elasticsearch ,bulk ,database

Published at DZone with permission of Hakan Özler. See the original article here.

