DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
  1. DZone
  2. Data Engineering
  3. Data
  4. How to Migrate JSON Data From MongoDB to MaxCompute

How to Migrate JSON Data From MongoDB to MaxCompute

In this article, we will learn how to use the Data Integration function on Alibaba Cloud DataWorks console to extract JSON fields from MongoDB to MaxCompute.

Leona Zhang user avatar by
Leona Zhang
·
Jan. 22, 19 · Tutorial
Like (2)
Save
Tweet
Share
5.07K Views

Join the DZone community and get the full member experience.

Join For Free

In this article, we will learn how to use the Data Integration function on Alibaba Cloud DataWorks console to extract JSON fields from MongoDB to MaxCompute.

Prepare Data and Account

First, upload the data to your MongoDB database. This example uses Alibaba Cloud's ApsaraDB for MongoDB. The network type is VPC because a public IP address is required for MongoDB to communicate with the default resource group of DataWorks. The test data is as follows:

{
    "store": {
        "book": [
             {
                "category": "reference",
                "author": "Nigel Rees",
                "title": "Sayings of the Century",
                "price": 8.95
             },
             {
                "category": "fiction",
                "author": "Evelyn Waugh",
                "title": "Sword of Honour",
                "price": 12.99
             },
             {
                 "category": "fiction",
                 "author": "J. R. R. Tolkien",
                 "title": "The Lord of the Rings",
                 "isbn": "0-395-19395-8",
                 "price": 22.99
             }
          ],
          "bicycle": {
              "color": "red",
              "price": 19.95
          }
    },
    "expensive": 10
}

Log on to the DMS for MongoDB console. In this example, the database is admin and the collection is userlog. You can run the db.userlog.find().limit(10) command in the Query Window to view the uploaded data, as shown in the following figure.

Image title

Create a user in the database in advance to add data sources in DataWorks. In this example, run the db.createUser({user:"bookuser",pwd:"123456",roles:["root"]})command to create a user named bookuser. The password of the user is 123456, and the permission is root.

Use DataWorks to Extract Data to MaxCompute

Add a MongoDB data source In the DataWorks console, go to the Data Integration page and add a MongoDB data source.

Image title

For specific parameters, see the following figure. Click Finish after the data source connectivity test is successful. In this example, the MongoDB network type is VPC. Therefore, set the Data Source Type to Has Public IP Address.

Image title

To retrieve the endpoint and the port number, log on to the ApsaraDB for MongoDB console and click an instance, as shown in the following figure.

Image title

Create a data synchronization task in the DataWorks console, and create a data synchronization node.

Image title

 Meanwhile, create a table named mqdata in DataWorks to store JSON data.

Image title

You can set the table parameters on the graphic interface. In this example, the mqdata table has only one column named MQ data, whose data type is string.

Image title

After creating the table, set the data synchronization task parameters on the graphic interface, as shown in the following figure. Set the target data source to odps_first and the target table to mqdata. Set the original data source to MongoDB and select mongodb_userlog. After completing the preceding configuration, click Switch to Script Mode.

Image title

The following shows the example code in script mode:

{
    "type": "job",
    "steps": [
        {
            "stepType": "mongodb",
            "parameter": {
                "datasource": "mongodb_userlog",
 //Indicates the data source name.
                "column": [
                    {
                        "name": "store.bicycle.color", //Indicates the JSON field path. In this example, the value of color is extracted.
                        "type": "document.document.string" //Indicates the number of fields in this line must be consistent with that in the preceding line (the name line). If the JSON field is a level 1 field, such as the "expensive" field in this example, enter "string" for this field.
                    }
                ],
                "collectionName // Collection name": "userlog"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "odps",
            "parameter": {
                "partition": "",
                "isCompress": false,
                "truncate": true,
                "datasource": "odps_first",
                "column": [
     //Indicates the table column name in MaxCompute, namely "mqdata".
                ],
                "emptyAsNull": false,
                "table": "mqdata"
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "version": "2.0",
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {
            "record": ""
        },
        "speed": {
            "concurrent": 2,
            "throttle": false,
            "dmu": 1
        }
    }
}

After completing the preceding configuration, click Run. If the operation is successful, the following log is displayed.

Image title

Verify Results

Create an ODPS SQL node in your Business Flow.

Image title

Enter the SELECT*from mqdata; statement to view the data in the mqdata table. Alternatively, you can run this command on the MaxCompute client for the same purpose.

Image title

Data integration Test data JSON MongoDB

Published at DZone with permission of Leona Zhang. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Top 5 Java REST API Frameworks
  • Stream Processing vs. Batch Processing: What to Know
  • Quick Pattern-Matching Queries in PostgreSQL and YugabyteDB
  • What Is Policy-as-Code? An Introduction to Open Policy Agent

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: