In the last week I was working on a key project to migrate a BI platform from MySQL to MongoDB. The product that its development is headed by Yuval Leshem is gaining a major adaption and the company was facing a scale challenge. We chose MongoDB as the platform data infrastructure to support high data insert rate and scale data analysis.
Unlike many projects of this type, we accomplished the migration from plan to production in a week, mostly due to smart and simple plan.
I wanted to share with you some of lessons we learnt during the process:
Data Migration: Mongify
This tool provides a two steps solution to migrate your RDBMS (MySQL) to NoSQL (Mongo):
- Mapping database structure
- Export the data and import it according to the defined structure
Since it's an open source you can easily dive into the code and adjust it to your own business case. Moreover, the code is maintained by Andrew Kalek that is very cooperative.
Filter by Date Part (Day, Month, Year..)
If you are regular to using the DB date parts functions such as YEAR() and MONTH(), there are options to do it in MongoDB (see $where and aggregation framework). However, both require intensive IO. The best solution in this case is saving 3 (or more) more fields for each original field. These fields will include the relevant date part, and can be indexed for effective query:
- [original field]
- [original field]_[year part]
- [original field]_[month part]
- [original field]_[day part]
- [original field]_[hour part]
MongoDB has no defined schema, so there are no default values as well. Therefore it's up to your data layer (or ORM) to take care of it
This is relevant to to default timestamp as well
Same case as with default values. You app should take care of it.
Please notice that there is a defined mapping from values and types that you can find at Mongify code.
Auto Numbers (1..N)
Same case here, but you may have to choose one of following ways:
- Shift your way of thinking of auto increment ids and start using MongoDB auto "_id"s a solution.
- You can generate the auto increment ids using a counters database and findAndModify (in this case I will recommend you having a special purpose database and 1:1 collection mapping to gain future releases granular locking). For details see the link on top.
Mongoose as an ORM
If you use node.js consider using Mongoose as your ORM, This one will solve many of your issues by adding structure to your schema. However, please notice that you may loose some flexibility.
MongoDB is not SQL compliant, and you will have hard time with your data analysts. However, you can ease the change by using the following two methods:
- Introduce them to Query Mongo.
- Make sure your documents have no sub documents, if you don't have to. Elsewhere, transforming the data to tabulator view will require a major effort from them.
Avoid Normalizing Your Data
If you designed your data infrastructure as a non normalized structure, it will be much easier to move data to NoSQL. If your data is normalized, it is better to the app to take care of the data reconstruction.
Queries Results Limitation
MongoDB results are limited to a document size. If you need to query 200K+ records, you may need to page the data using skip and limit (or better, adding a filter based on the last limited row key value).
Migration from MySQL to MongoDB requires some effort and a shift in your state of mind, but it can be done relatively fast using careful planning according to the steps defined before.