
go-mysql-mongodb: Replicate Data From MySQL To MongoDB

This lightweight tool needs only a single deployed service to synchronize a MySQL instance to MongoDB.

By Xiang Wang · Feb. 10, 21 · Tutorial

MySQL to MongoDB

This year's Spring Festival holiday was relatively quiet, so I spent it tidying up my open source project, go-mysql-mongodb.

This tool synchronizes MySQL data to MongoDB. I developed it a long time ago, but it had gone unmaintained since. A few days ago, I unexpectedly received an email from a user asking about problems they had run into, and I realized that some people still use it. So I took advantage of the holiday to maintain it, and I hope it can help more people in the future.

Origins

This project goes back to 2017. At that time, my job was mainly to investigate various big data platforms, and I needed to synchronize MySQL data to databases such as Elasticsearch and MongoDB.

tungsten-replicator

I searched Google for solutions. At first, I used tungsten-replicator to synchronize MySQL data to MongoDB. This powerful tool handles data synchronization among many heterogeneous databases. For example, the following topology diagram shows tungsten-replicator synchronizing MySQL/Oracle data to heterogeneous databases:


tungsten-replicator synchronizing MySQL/Oracle data to heterogeneous databases


Workflow

  1. Deploy a Replicator service on both the Master and the Slave servers.
  2. The Master Replicator pulls binlog/CDC data from the Master DB and converts it into the common THL (Transaction History Log) format.
  3. The Master Replicator transfers the THL data to the Slave Replicator.
  4. The Slave Replicator converts the THL data into SQL according to the type of the Slave DB and applies it to the Slave DB.

This tool is quite mature: its user documentation runs to more than 300 pages and covers a wide range of scenarios. In practice, however, I ran into these main problems:

  • The architecture is too heavy. As the example above shows, you need to deploy a Replicator on both the Master and the Slave servers, and the THL data has to be stored, which takes up a lot of disk space.
  • Deployment and configuration are complicated. The tool supports a very rich feature set, but those features require complex configuration. In practice, we only needed a small subset of them.
  • Development language (not a flaw of the tool itself). The tool is written in Java, while my team at the time mainly developed in Python and Golang. Whenever new requirements came up or bugs needed fixing, maintenance was harder for us.

go-mysql-elasticsearch

Later, when I needed to synchronize MySQL data to Elasticsearch, I found another tool, go-mysql-elasticsearch. After trying it for a while, I found it relatively lightweight and simple to configure and deploy. Its workflow is as follows:

  1. Use mysqldump to export a full dump of the MySQL data.
  2. Import the full data set into Elasticsearch.
  3. Pull MySQL binlog events starting from the binlog position recorded when the full dump was taken.
  4. Convert the binlog events into Elasticsearch documents and synchronize them to Elasticsearch through its RESTful API.
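Step 3 depends on knowing exactly where the dump ends in the binlog. When mysqldump runs with `--master-data`, it writes a `CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=154;` style line into the dump (commented out with `--master-data=2`), and a sync tool can read the coordinates from there. The Go sketch below illustrates only that parsing step, assuming the classic `CHANGE MASTER TO` form; `parseBinlogPos` and `Position` are hypothetical names, not go-mysql-elasticsearch's actual code.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Position is a binlog coordinate: file name plus byte offset.
type Position struct {
	File string
	Pos  uint32
}

// changeMasterRe matches the line mysqldump emits with --master-data.
// It is unanchored, so a leading "-- " (from --master-data=2) is tolerated.
var changeMasterRe = regexp.MustCompile(`CHANGE MASTER TO MASTER_LOG_FILE='([^']+)', MASTER_LOG_POS=(\d+);`)

// parseBinlogPos scans a dump and returns the first binlog position it finds.
func parseBinlogPos(dump string) (Position, error) {
	sc := bufio.NewScanner(strings.NewReader(dump))
	// Dump lines (e.g. extended INSERTs) can exceed the 64 KiB default, so raise the limit.
	sc.Buffer(make([]byte, 0, 64*1024), 64*1024*1024)
	for sc.Scan() {
		m := changeMasterRe.FindStringSubmatch(sc.Text())
		if m == nil {
			continue
		}
		pos, err := strconv.ParseUint(m[2], 10, 32)
		if err != nil {
			return Position{}, err
		}
		return Position{File: m[1], Pos: uint32(pos)}, nil
	}
	if err := sc.Err(); err != nil {
		return Position{}, err
	}
	return Position{}, fmt.Errorf("no CHANGE MASTER TO line found")
}

func main() {
	dump := "-- MySQL dump\n" +
		"-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=154;\n" +
		"INSERT INTO t1 VALUES (1);\n"
	p, err := parseBinlogPos(dump)
	if err != nil {
		panic(err)
	}
	fmt.Printf("resume binlog sync from %s:%d\n", p.File, p.Pos)
}
```

Reading the position from the dump itself, rather than querying the server afterwards, ties the binlog starting point to the exact snapshot that was imported, so no change is lost or applied twice between steps 2 and 3.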

This tool is lightweight: synchronizing one MySQL instance requires deploying only a single service. In addition, it is written in Golang, which I am familiar with, so I can implement new requirements myself when they come up.

go-mysql-mongodb

After successfully using go-mysql-elasticsearch in a quasi-production environment, I got the idea of replacing tungsten-replicator.

MongoDB and Elasticsearch are similar: both are NoSQL stores, and both store data as documents. So I reused most of the logic in go-mysql-elasticsearch, replaced the Elasticsearch client code with a MongoDB client, and it worked. That is how the go-mysql-mongodb project came about.
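Because both targets store documents, the part that has to change is small: the code that turns a binlog row into a document can stay, and only the client that writes the document is swapped. A minimal Go sketch of that seam, assuming a hypothetical `DocumentSink` interface; `memorySink` is an in-memory test double standing in for a real Elasticsearch or MongoDB client, and none of these names come from either project.

```go
package main

import "fmt"

// Document is a schemaless record, the common shape for Elasticsearch and MongoDB.
type Document map[string]interface{}

// DocumentSink is a hypothetical seam: swap the implementation to change the target.
type DocumentSink interface {
	Upsert(collection, id string, doc Document) error
	Delete(collection, id string) error
}

// rowToDocument pairs column names with row values, as a binlog row event provides them.
func rowToDocument(columns []string, row []interface{}) (Document, error) {
	if len(columns) != len(row) {
		return nil, fmt.Errorf("got %d values for %d columns", len(row), len(columns))
	}
	doc := make(Document, len(columns))
	for i, c := range columns {
		doc[c] = row[i]
	}
	return doc, nil
}

// memorySink stands in for a real client (Elasticsearch, MongoDB, ...).
type memorySink struct {
	data map[string]map[string]Document // collection -> id -> document
}

func newMemorySink() *memorySink {
	return &memorySink{data: map[string]map[string]Document{}}
}

func (m *memorySink) Upsert(collection, id string, doc Document) error {
	if m.data[collection] == nil {
		m.data[collection] = map[string]Document{}
	}
	m.data[collection][id] = doc
	return nil
}

func (m *memorySink) Delete(collection, id string) error {
	delete(m.data[collection], id) // deleting from a nil map is a no-op
	return nil
}

// syncRow is target-agnostic: it only talks to the DocumentSink interface.
// Assumption for this sketch: the primary key column is named "id".
func syncRow(sink DocumentSink, collection string, columns []string, row []interface{}) error {
	doc, err := rowToDocument(columns, row)
	if err != nil {
		return err
	}
	return sink.Upsert(collection, fmt.Sprint(doc["id"]), doc)
}

func main() {
	sink := newMemorySink()
	if err := syncRow(sink, "t.t", []string{"id", "title"}, []interface{}{1, "hello"}); err != nil {
		panic(err)
	}
	fmt.Println(sink.data["t.t"]["1"]["title"]) // hello
}
```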

go-mysql-mongodb Features

Because go-mysql-mongodb is largely based on go-mysql-elasticsearch, its features are basically the same.

Configuring the Data Source

You must specify which MySQL tables to synchronize to MongoDB. Example configuration:

```toml
# Sync tables t1 and t2 from the MySQL schema "test"
[[source]]
schema = "test"
tables = ["t1", "t2"]

# Sync tables t3 and t4 from the MySQL schema "test_1"
[[source]]
schema = "test_1"
tables = ["t3", "t4"]
```
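Conceptually, the `[[source]]` entries above act as an allow-list of tables. A minimal Go sketch of that check, assuming exact table names; the `Source` struct mirrors the TOML keys shown, but it and `isSynced` are hypothetical, not the project's own types, and actually decoding the file would need a TOML library, which is left out to keep the example self-contained.

```go
package main

import "fmt"

// Source mirrors one [[source]] entry. The toml tags show how a TOML decoder
// would map the keys, although no decoder is used in this sketch.
type Source struct {
	Schema string   `toml:"schema"`
	Tables []string `toml:"tables"`
}

// isSynced reports whether schema.table is listed in any [[source]] entry,
// using exact-name matching only.
func isSynced(sources []Source, schema, table string) bool {
	for _, s := range sources {
		if s.Schema != schema {
			continue
		}
		for _, t := range s.Tables {
			if t == table {
				return true
			}
		}
	}
	return false
}

func main() {
	sources := []Source{
		{Schema: "test", Tables: []string{"t1", "t2"}},
		{Schema: "test_1", Tables: []string{"t3", "t4"}},
	}
	fmt.Println(isSynced(sources, "test", "t2"))   // true
	fmt.Println(isSynced(sources, "test", "t3"))   // false: t3 is listed under test_1
	fmt.Println(isSynced(sources, "test_1", "t3")) // true
}
```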



Table names can also be given as simple regular expressions, for example:

```toml
[[source]]
schema = "test"
# A regular expression: "test_river_" followed by four digits
tables = ["test_river_[0-9]{4}"]
```



This selects tables such as test_river_0001 and test_river_0002 in the test database.
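A short Go sketch of how such a pattern can select tables. It assumes the pattern must match the whole table name, so it is wrapped in `^(?:` and `)$` here; whether go-mysql-mongodb anchors patterns the same way is an assumption, and `matchTables` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchTables returns the tables whose full name matches pattern.
// Anchoring keeps "test_river_[0-9]{4}" from also matching "test_river_00012".
func matchTables(pattern string, tables []string) ([]string, error) {
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return nil, err
	}
	var matched []string
	for _, t := range tables {
		if re.MatchString(t) {
			matched = append(matched, t)
		}
	}
	return matched, nil
}

func main() {
	tables := []string{"test_river_0001", "test_river_0002", "test_river_00012", "test_river", "t1"}
	got, err := matchTables("test_river_[0-9]{4}", tables)
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // [test_river_0001 test_river_0002]
}
```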

Conversion Rules

You can synchronize a MySQL table to a specified MongoDB collection and rename fields along the way, for example:

```toml
[[rule]]
schema = "test"     # MySQL database
table = "t1"        # MySQL table
database = "t"      # target MongoDB database
collection = "t"    # target MongoDB collection

    [rule.field]
    mysql = "title"
    mongodb = "my_title"
```



This configuration synchronizes the MySQL table test.t1 to the MongoDB collection t.t and renames the title field to my_title.
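The rename step amounts to rewriting document keys before the write. A minimal Go sketch, assuming the mapping is held as MySQL column name to MongoDB field name; `renameFields` is a hypothetical helper, not the project's code, and columns without a mapping keep their names.

```go
package main

import "fmt"

// renameFields returns a copy of doc in which every key found in mapping
// (MySQL column -> MongoDB field) is renamed; other keys are kept as-is.
// If a new name collides with an existing column, the winner depends on map
// iteration order, so a real implementation should reject such mappings.
func renameFields(doc map[string]interface{}, mapping map[string]string) map[string]interface{} {
	out := make(map[string]interface{}, len(doc))
	for k, v := range doc {
		if newName, ok := mapping[k]; ok {
			out[newName] = v
		} else {
			out[k] = v
		}
	}
	return out
}

func main() {
	row := map[string]interface{}{"id": 1, "title": "hello"}
	doc := renameFields(row, map[string]string{"title": "my_title"})
	fmt.Println(doc["my_title"], doc["id"]) // hello 1
	_, stillThere := doc["title"]
	fmt.Println(stillThere) // false
}
```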

Filter Fields

You can also synchronize only specified columns of a table, for example:

```toml
[[rule]]
schema = "test"
table = "tfilter"
database = "test"
collection = "tfilter"

# Only sync the following columns
filter = ["id", "name"]
```



This configuration synchronizes only the id and name columns of the test.tfilter table.
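Filtering is the complement of renaming: keep only the listed columns and drop the rest before writing. A minimal Go sketch; `filterColumns` is a hypothetical helper, and treating an empty filter list as "sync every column" is an assumption of this sketch rather than documented project behavior.

```go
package main

import "fmt"

// filterColumns keeps only the columns named in filter. An empty filter is
// treated here as "keep everything"; that default is an assumption.
func filterColumns(doc map[string]interface{}, filter []string) map[string]interface{} {
	if len(filter) == 0 {
		return doc
	}
	out := make(map[string]interface{}, len(filter))
	for _, col := range filter {
		if v, ok := doc[col]; ok {
			out[col] = v
		}
	}
	return out
}

func main() {
	row := map[string]interface{}{"id": 1, "name": "a", "secret": "x"}
	fmt.Println(filterColumns(row, []string{"id", "name"})) // map[id:1 name:a]
}
```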

For more features, please refer to the project's README.

Current Status of go-mysql-mongodb

The project is still under development. The basic functions should work, but more testing is needed to confirm that. I also need to keep track of changes in go-mysql-elasticsearch and port the relevant fixes to go-mysql-mongodb promptly.

I hope this tool helps you. If you run into problems, you can open an issue or email me directly at wx347249478@gmail.com.
