Over a million developers have joined DZone.

Moving Your Data From MongoDB to AWS Redshift for Analytical Processing

DZone's Guide to

Moving Your Data From MongoDB to AWS Redshift for Analytical Processing

Generating complex analytics directly from MongoDB can be difficult. When considering moving the data to a data warehouse in AWS, a good solution is Amazon Redshift.

· Database Zone ·
Free Resource

MariaDB TX, proven in production and driven by the community, is a complete database solution for any and every enterprise — a modern database for modern applications.

If you are using MongoDB as your database, have you ever considered how you are going to do analytics on top of the NoSQL database? This is one of the questions I have heard often as a limitation of MongoDB and generally for NoSQL. The common complaint is that it is difficult to derive relationships between collections compared to relational databases where tables are already related to generating analytics.

However, moving further, it is important to understand the difference between Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP).

Online Transaction Processing (OLTP)

OLTP is generally considered as short database Create, Read, Update, Delete (CRUD) operations. These are common CRUD operations that often run in an application and ensure fast query processing and data integrity, and is measured by the number of transactions per second.

Online Analytical Processing (OLAP)

OLAP, in contrast, does not consider the volume of transactions as a critical measurement. The queries in OLAP are often very complex and involve aggregate operations. However, the response time can be considered an important measurement for OLAP after initiating a query.

Where Does MongoDB Fit?

MongoDB is mainly used for OLTP operations for applications to store data in the form of document collections. It also provides support for MapReduce operations, which is used for big data processing. However, it can be challenging for complex analysis of data stored in document form compared to data that is transformed for analysis purposes. Therefore, it is important to consider using a data warehousing solution like Amazon Redshift for OLAP operations.

MongoDB to Redshift ETL

Moving data from MongoDB requires Extract, Transform, and Load operations. Extracting and transforming data is needed so that the data loaded to Amazon Redshift can be optimized for querying. However, writing your custom code from scratch can be challenging since you need to consider the following to implement a proper ETL process:

  • Querying the database without overloading it.
  • Scheduling ETL jobs.
  • Handling reliability and integrity of data moved.
  • Transforming data.
  • Managing infrastructure.
  • Monitoring the operation.

It can be challenging to support these capabilities unless you're using third-party ETL tools to handle MongoDB to Redshift ETL jobs.

Generally, it's a best practice to use multiple VPCs for an application (including the MongoDB Cluster) and data warehouse (Amazon Redshift) if you have both environments in AWS. This will require having a VPC Peering connection between the VPCs so that the ETL process can execute.

When you are using an ETL tool, make sure that it's possible to connect with your MongoDB cluster located in AWS or in other locations in a secure manner. It is important to evaluate whether the tool transports the data with encryption at transit using SSL and if it stores data temporarily whether they are encrypted at rest.

Why Amazon Redshift?

First of all, generating complex analytics directly from MongoDB is difficult due to the nature of the database as well as due to performance implications. When we consider moving the data to a data warehouse, in AWS, the primary solution is Amazon Redshift. You should be able to find similar solutions from other cloud vendors, as well. Amazon Redshift is fully managed and can scale for petabytes with an on-demand pricing model.

Amazon Redshift integrates with AWS security policies using IAM, as well as connects with data visualization and reporting tools like Amazon QuickSight seamlessly. This is important in providing end-to-end support for a reporting solution.

MariaDB AX is an open source database for modern analytics: distributed, columnar and easy to use.

mongodb ,amazon redshift ,data warehousing ,etl ,aws ,database ,data analytics ,crud ,tutorial

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}