DZone

Backup and Anonymize Your Cosmos Collections With the Cosmic Clone Tool

Learn about the Cosmic Clone tool and see how you can back up and anonymize your Cosmos DB collections.

by Kranthi Medam · Mar. 20, 19 · Tutorial

Introduction

As part of an application lifecycle, we periodically need to refresh our non-production (dev/test) environments with production data. This lets us test applications with realistic data and helps ensure that obvious defects do not slip through. It also lets us test the application's performance, since the test environment holds the same volume of data as production. Further, testing on real data is bound to inspire confidence in an application release.

But copying live data increases the risk of exposing confidential information. A non-production database is likely to be accessed by developers and business analysts who may not have the same access to the live environment. They might only be interested in testing a feature and should not be exposed to the confidential information in the live system. To reduce such risks, the data needs to be anonymized; that is, personally identifiable or confidential information is removed or replaced with dummy values.

Thus, restoring data from production to a test environment is a two-part exercise. The first part involves copying the database with all its data, code (procedures/views), and settings (indexes and provisioned throughput in request units, or RUs) to the test environment. The second part involves anonymizing confidential or sensitive information in the copied content.
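
The two parts can be illustrated with a few lines of Python. This is a minimal sketch, not Cosmic Clone's implementation: it assumes the documents have already been read into memory as dictionaries, it covers only documents (not settings or stored procedures), and the `refresh_copy` and `mask_email` names are hypothetical. It does reflect one real detail of Cosmos DB: every stored document carries server-generated properties such as `_rid`, `_self`, `_etag`, `_attachments`, and `_ts`, which are usually stripped before documents are written to another collection.

```python
import copy
from typing import Any, Callable, Dict, Iterable, List

# Server-generated metadata that Cosmos DB adds to every stored document.
SYSTEM_PROPERTIES = {"_rid", "_self", "_etag", "_attachments", "_ts"}

Document = Dict[str, Any]


def refresh_copy(source_docs: Iterable[Document],
                 anonymize: Callable[[Document], Document]) -> List[Document]:
    """Part 1: deep-copy each document without system metadata.
    Part 2: pass the copy through an anonymization step."""
    result: List[Document] = []
    for doc in source_docs:
        clean = {k: copy.deepcopy(v) for k, v in doc.items()
                 if k not in SYSTEM_PROPERTIES}
        result.append(anonymize(clean))
    return result


def mask_email(doc: Document) -> Document:
    """Hypothetical anonymizer: replace any email with a dummy value."""
    if "email" in doc:
        doc["email"] = "user@example.com"
    return doc


prod = [{"id": "1", "email": "alice@contoso.com", "_etag": "abc", "_ts": 1}]
test_copy = refresh_copy(prod, mask_email)
print(test_copy)  # [{'id': '1', 'email': 'user@example.com'}]
```

Because anonymization runs on the copy rather than on the source, the production documents themselves are never modified.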

For a relational database such as SQL Server, this can be achieved with out-of-the-box tools such as backup/restore and static data masking utilities. For Azure Cosmos DB (Microsoft's multi-model NoSQL document database), however, there are no out-of-the-box tools for these tasks. Cosmos DB does provide a Data Migration Tool that copies documents, but it offers no option to create a matching collection (with the same partition key or indexes) or to copy the associated code (stored procedures, UDFs, and triggers). Nor does it offer any way to anonymize the data in a collection. Every team therefore has to learn either the .NET SDK for Cosmos DB or the JavaScript syntax for stored procedures, and then write its own scripts to update documents and anonymize the data. This takes manual effort and comes with a learning curve.

Cosmic Clone is a utility developed to ease this process by helping to copy and anonymize a Cosmos DB collection. It creates a backup copy of a Cosmos DB collection in a few clicks and provides options to anonymize attributes that may contain personally identifiable or sensitive information.

Cosmic Clone can create a new collection as an exact replica of the source collection, with all its settings, code, and documents intact. Any of these settings can also be opted out of.
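
The opt-out behavior can be pictured as a copy plan derived from the source collection. The sketch below is hypothetical: the `CopyOptions` and `plan_copy` names and the input layout are assumptions, not Cosmic Clone's code, and it only builds the plan; actually creating the container, scripts, and documents would be done through a Cosmos DB SDK. The `partitionKey` and `indexingPolicy` keys follow the property names Cosmos DB uses in a container definition.

```python
import copy
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class CopyOptions:
    """Hypothetical opt-out switches; everything is copied by default."""
    indexing_policy: bool = True
    stored_procedures: bool = True
    udfs: bool = True
    triggers: bool = True
    documents: bool = True


def plan_copy(source: Dict[str, Any], target_id: str,
              opts: CopyOptions) -> Dict[str, Any]:
    """Describe what should be created in the target collection.

    `source["properties"]` holds the container definition; the other keys
    hold lists of scripts and documents read from the source collection.
    """
    props = source["properties"]
    # The partition key is always kept so the replica distributes data the same way.
    container: Dict[str, Any] = {
        "id": target_id,
        "partitionKey": copy.deepcopy(props["partitionKey"]),
    }
    if opts.indexing_policy and "indexingPolicy" in props:
        container["indexingPolicy"] = copy.deepcopy(props["indexingPolicy"])

    plan: Dict[str, Any] = {"container": container}
    for kind in ("stored_procedures", "udfs", "triggers", "documents"):
        wanted = getattr(opts, kind)
        plan[kind] = copy.deepcopy(source.get(kind, [])) if wanted else []
    return plan


source = {
    "properties": {
        "id": "employees",
        "partitionKey": {"paths": ["/department"], "kind": "Hash"},
        "indexingPolicy": {"indexingMode": "consistent"},
    },
    "stored_procedures": [{"id": "bulkDelete", "body": "function () {}"}],
    "documents": [{"id": "1", "department": "HR"}],
}
plan = plan_copy(source, "employees-test", CopyOptions(stored_procedures=False))
```

Keeping every switch on by default mirrors the "exact replica" behavior described above, while each flag corresponds to one setting the user may skip.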

To anonymize attributes, the tool lets us define rules that specify which attributes to anonymize and the values to replace them with. There are also options to perform a random shuffle of the data.
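
As a rough illustration of how such rules could work, the sketch below applies "replace" and "shuffle" rules to in-memory documents. The `Rule` shape, the dotted-path syntax, and the action names are assumptions made for this example, not Cosmic Clone's actual rule schema.

```python
import random
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Rule:
    """Hypothetical rule shape, not Cosmic Clone's actual schema."""
    attribute: str      # dotted path, e.g. "contact.mobile"
    action: str         # "replace" or "shuffle"
    value: Any = None   # replacement value used by "replace"


def _locate(doc: Dict[str, Any], path: str) -> Optional[Tuple[Dict[str, Any], str]]:
    """Return (parent object, leaf key) for a dotted path, or None if absent."""
    *parents, leaf = path.split(".")
    node: Any = doc
    for part in parents:
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    if isinstance(node, dict) and leaf in node:
        return node, leaf
    return None


def apply_rules(docs: List[Dict[str, Any]], rules: List[Rule],
                seed: Optional[int] = None) -> None:
    """Anonymize `docs` in place, one rule at a time."""
    rng = random.Random(seed)
    for rule in rules:
        located = (_locate(d, rule.attribute) for d in docs)
        hits = [loc for loc in located if loc is not None]
        if rule.action == "replace":
            for parent, key in hits:
                parent[key] = rule.value
        elif rule.action == "shuffle":
            # Redistribute the existing values across documents: the set of
            # values survives, but their link to each original document does not.
            values = [parent[key] for parent, key in hits]
            rng.shuffle(values)
            for (parent, key), value in zip(hits, values):
                parent[key] = value
        else:
            raise ValueError(f"unknown action: {rule.action!r}")


docs = [
    {"id": "1", "name": "Ann", "contact": {"mobile": "555-0101"}},
    {"id": "2", "name": "Bob", "contact": {"mobile": "555-0102"}},
    {"id": "3", "name": "Cy"},
]
apply_rules(docs, [Rule("contact.mobile", "replace", "000-0000"),
                   Rule("name", "shuffle")], seed=42)
```

Shuffling keeps realistic values and distributions in the test data, which a single dummy value cannot. Note that a random shuffle can leave some values in their original documents, so it weakens the association between values and documents rather than guaranteeing that every value moves.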

With a few clicks, the tool begins copying the collection. It also lets us save the anonymization rules used for a copy so that they can be reused in a later run of the tool.
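
Saving rules for reuse can be sketched as writing them to a JSON file and validating them when they are loaded again. The JSON layout, the `save_rules` and `load_rules` names, and the set of known actions are hypothetical; Cosmic Clone's own saved-rules format may differ.

```python
import json
import os
import tempfile
from typing import Any, Dict, List

# Hypothetical rule format: a JSON list of objects with these keys.
REQUIRED_KEYS = {"attribute", "action"}
KNOWN_ACTIONS = {"replace", "shuffle"}


def save_rules(rules: List[Dict[str, Any]], path: str) -> None:
    """Write rules to a JSON file so a later run can reuse them."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rules, f, indent=2)


def load_rules(path: str) -> List[Dict[str, Any]]:
    """Read rules back and reject entries a later run could not apply."""
    with open(path, encoding="utf-8") as f:
        rules = json.load(f)
    for i, rule in enumerate(rules):
        missing = REQUIRED_KEYS - rule.keys()
        if missing:
            raise ValueError(f"rule {i} is missing {sorted(missing)}")
        if rule["action"] not in KNOWN_ACTIONS:
            raise ValueError(f"rule {i} has unknown action {rule['action']!r}")
    return rules


rules = [
    {"attribute": "contact.mobile", "action": "replace", "value": "000-0000"},
    {"attribute": "name", "action": "shuffle"},
]
path = os.path.join(tempfile.gettempdir(), "anonymization_rules.json")
save_rules(rules, path)
reloaded = load_rules(path)
```

Validating on load means a stale or hand-edited rules file fails early, before any documents are copied.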

The same backup-and-anonymize approach applies to several other scenarios, such as:

Reporting and Analytics

Consider, as an example, that you need to generate analytics on the number of people in each department of your company. Your Cosmos DB collection, however, also holds employees' mobile numbers and other contact details, which are irrelevant to this analysis. It is in your best interest to anonymize such fields in your copy of the data: define a simple rule that targets those attributes and run the tool.

In most cases, it is wise to anonymize any data that is irrelevant to the analysis at hand.
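
The point can be checked with a small example: masking the contact fields does not change the result of a department headcount. The attribute names (`mobile`, `email`, `homeAddress`, `department`) are hypothetical, and this sketch masks only top-level fields.

```python
from collections import Counter
from typing import Any, Dict, List

# Hypothetical attribute names for contact details the analysis does not need.
CONTACT_FIELDS = ("mobile", "email", "homeAddress")


def mask_contact_details(employees: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the documents with contact fields replaced by a dummy value."""
    masked = []
    for emp in employees:
        doc = dict(emp)  # shallow copy: only top-level fields are masked
        for field in CONTACT_FIELDS:
            if field in doc:
                doc[field] = "REDACTED"
        masked.append(doc)
    return masked


def headcount_by_department(employees: List[Dict[str, Any]]) -> Counter:
    """The analysis itself only needs the department attribute."""
    return Counter(emp["department"] for emp in employees)


employees = [
    {"id": "1", "department": "Sales", "mobile": "555-0101"},
    {"id": "2", "department": "Sales", "email": "bob@contoso.com"},
    {"id": "3", "department": "HR", "mobile": "555-0103"},
]
masked = mask_contact_details(employees)
print(headcount_by_department(masked))  # Counter({'Sales': 2, 'HR': 1})
```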

Data Validation Post-Release

Sometimes you need a copy of the data to compare its state before and after a change. For example, suppose you have rolled out a few major changes to your collection structure: a new partition key, changes to the indexes on a few attributes, and a few new object types stored in the same collection. A backup copy taken beforehand lets you validate the new data against the old.
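
One way to use such a backup is to diff it against the current data. The sketch below is hypothetical and assumes `id` is unique across the whole collection; in Cosmos DB, `id` is only guaranteed unique within a logical partition, so a real comparison may need a different key, especially after the partition key has changed. System properties are ignored because they always differ between copies.

```python
from typing import Any, Dict, Iterable, List

# Server-generated properties differ between copies, so they are ignored.
SYSTEM_PROPERTIES = frozenset({"_rid", "_self", "_etag", "_attachments", "_ts"})


def _content(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the user-defined properties of a document."""
    return {k: v for k, v in doc.items() if k not in SYSTEM_PROPERTIES}


def diff_snapshots(before: Iterable[Dict[str, Any]],
                   after: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Compare a backup with the current data, keyed by document id."""
    old = {d["id"]: _content(d) for d in before}
    new = {d["id"]: _content(d) for d in after}
    return {
        "removed": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


backup = [{"id": "1", "type": "employee", "_etag": "a"},
          {"id": "2", "type": "employee"}]
current = [{"id": "1", "type": "employee", "_etag": "b"},
           {"id": "3", "type": "project"}]
print(diff_snapshots(backup, current))
# {'removed': ['2'], 'added': ['3'], 'changed': []}
```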

Debugging Issues

A production issue that cannot be reproduced in a non-production environment is often caused by an edge-case data scenario that testing did not account for. Restoring a copy of the collection lets you debug against that data rather than risk modifying live data with test values.

GDPR Compliance

Data protection regulations such as GDPR require organizations to protect personal data, including in non-production environments, and anonymization is a common way to meet that obligation. Microsoft's core services engineering teams have a mandatory task, recurring every 90 days, to anonymize their lower environments. In such cases, Cosmic Clone can save developers manual effort, as they no longer need to write, test, update, or maintain their own anonymization scripts.

Conclusion

As the use of Azure Cosmos DB continues to grow, self-serve capabilities such as backup, restore, and anonymization of a collection become increasingly essential. Cosmic Clone is a useful utility for this. Its out-of-the-box anonymization options are a major advantage, making it one of the first tools to offer data masking for a Cosmos DB database. It should save time on routine backup and restore tasks, time that can be spent on more productive work. Cosmic Clone has the potential to become a handy tool in every Cosmos DB developer's or DBA's arsenal.

For a complete walkthrough of the tool, visit the GitHub page, as the tool is now publicly available.

Disclaimer: Please note this is not an official tool from the Azure Cosmos DB team, but a utility developed by an independent developer within Microsoft IT.

Opinions expressed by DZone contributors are their own.
