
The Perfect Storm: GitLab Data Loss


Humans aren’t perfect; we make mistakes even with solid daily IT security practices in place. That is why organizations should take point-in-time backups of mission-critical databases.


An ineffective backup strategy was the main culprit behind the recent data loss at source-code hub GitLab.com. GitLab is hugely popular among developers, who appreciate that it’s an all-in-one solution providing everything a developer needs over the course of a project. At its core is a Git-based version control system, paired with helpful extras. As a result, a lot of companies depend on it, from smaller startups and individual developers to larger enterprises like Intel and Red Hat.


Last Tuesday evening, GitLab.com reported that a fatigued system administrator, working late at night in the Netherlands, had accidentally deleted a directory on the wrong server during a database replication process. He deleted a folder containing 300GB of live production data that was due to be replicated; by the time he canceled the action, only 4.5GB remained, and the last potentially viable backup had been taken six hours beforehand.

From the detailed information available on GitLab.com, it is clear that they knew about possible data protection techniques, ranging from volume snapshots to replication to backup and recovery. However, it is unclear whether they had the right expertise or data protection and recovery products in house to use these techniques correctly. We have seen this before: enterprises with critical applications either fail to leverage the right backup and recovery tools, or they deploy legacy solutions for distributed cloud applications. Worse yet, they simply rely on replication as a backup strategy. From what we know, GitLab may have experienced data loss for any one of the following reasons:

  • Neglecting backup and recovery specific tools in favor of home-grown scripts.
  • Delaying the deployment of backup and recovery tools until a data loss has occurred.
  • Not performing a thorough analysis of business requirements before choosing a solution that can meet those application uptime requirements.
  • Never testing recovery operations, instead blindly believing that they will work.

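The first pitfall above is especially common because home-grown backup scripts tend to fail silently: the dump command errors out or writes nothing, and nobody notices until a restore is needed. As a minimal sketch (the `backup_db` wrapper and its arguments are hypothetical, not anything GitLab ran), a script should at least assert success rather than assume it:

```shell
#!/bin/sh
# Hypothetical sketch: wrap a dump command so silent failures are caught.
# Usage: backup_db <output-file> <dump-command> [args...]
backup_db() {
  out="$1"; shift
  # Run the dump command; abort on a non-zero exit code.
  if ! "$@" > "$out" 2>/dev/null; then
    echo "backup FAILED: dump command returned an error" >&2
    return 1
  fi
  # A zero-byte dump almost always means a silent failure upstream.
  if [ ! -s "$out" ]; then
    echo "backup FAILED: $out is empty" >&2
    return 1
  fi
  echo "backup OK: $out ($(wc -c < "$out") bytes)"
}
```

In practice this would wrap something like `backup_db /backups/db.sql pg_dump mydb`, with the non-zero return code wired into alerting. The point is not the specific checks, but that the script treats "backup succeeded" as a claim to verify, not a default assumption.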
Other reasons exist, but what matters is how they fix the issue. But first, it's important to know who is ultimately responsible when errors like this happen. Is it the operator who made the mistake, the database administrator who is responsible for the database, the architect who designed the end-to-end application stack, or the application owner who is impacted by business loss?

We all know humans aren’t perfect; mistakes happen, even with solid daily IT security practices in place. What organizations can do to protect themselves from incidents like this is take point-in-time backups of mission-critical databases. In its simplest form, this could be a snapshot of all the nodes in a cluster, transferred to backend storage. However, given the distributed nature of scale-out databases and their frequent hardware failures, patchwork solutions such as node-by-node snapshots become operational nightmares to manage. In the best scenario, it takes several days to recover data, resulting in significant application and business downtime. In the worst scenario, the data may never be recoverable!

That is why a more robust solution is needed to reduce data loss risk for next-generation application environments. Listed below are some steps organizations can take to develop a reliable data protection and availability strategy:

  • List all possible failure scenarios that may occur in a given environment. Don’t forget the human errors!
  • Understand the failure resiliency of the data protection product — no one wants their data protection product to fail when it’s needed most.
  • Know your recovery point objective (RPO) and recovery time objective (RTO) so you can choose the right data protection product for your specific requirements.
  • Understand the different data protection technologies available, such as replication, backup and recovery, and snapshots, along with what each offers and its limitations.
  • Create a recovery plan and test that plan regularly (every quarter) to make sure people and products work as expected during emergency situations.
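The RPO step above can be made concrete with monitoring: if your RPO is, say, one hour, then a backup older than an hour is already a policy violation, regardless of whether anything has failed yet. A hypothetical sketch (the `check_rpo` function and directory layout are illustrative assumptions):

```shell
#!/bin/sh
# Hypothetical sketch: alert when the newest backup is older than the RPO.
# Usage: check_rpo <backup-dir> <rpo-seconds>
check_rpo() {
  dir="$1"; rpo="$2"
  newest="$(ls -t "$dir" 2>/dev/null | head -n 1)"
  if [ -z "$newest" ]; then
    echo "RPO VIOLATION: no backups found in $dir" >&2
    return 1
  fi
  # Age of the newest backup in seconds (GNU stat, with a BSD fallback).
  now="$(date +%s)"
  mtime="$(stat -c %Y "$dir/$newest" 2>/dev/null || stat -f %m "$dir/$newest")"
  age=$(( now - mtime ))
  if [ "$age" -gt "$rpo" ]; then
    echo "RPO VIOLATION: newest backup is ${age}s old (RPO ${rpo}s)" >&2
    return 1
  fi
  echo "RPO OK: newest backup is ${age}s old (RPO ${rpo}s)"
}
```

Run from cron or a monitoring agent, a check like this turns "our last viable backup was six hours old" from a post-incident discovery into an alert fired while there is still time to act.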

Whatever the cause of failure, the best way to keep it from harming your organization is to verify your backups with regular test restores. Although testing your backups regularly won’t prevent failures, it will surface problems early enough for you to fix them before a real recovery is needed.
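A full test restore needs a scratch database, but a cheap first line of defense is recording a checksum when each backup is written and re-verifying it on a schedule, so corrupted or truncated backup files are caught long before a recovery. A minimal sketch, assuming GNU coreutils `sha256sum` (function names are hypothetical):

```shell
#!/bin/sh
# Hypothetical sketch: record and verify backup checksums so corrupted
# or truncated backup files are caught before a real recovery is needed.

record_checksum() {
  # Store the SHA-256 of a backup file alongside it as a manifest.
  sha256sum "$1" > "$1.sha256"
}

verify_checksum() {
  # Recompute and compare; any bit rot or truncation fails the check.
  if sha256sum -c "$1.sha256" >/dev/null 2>&1; then
    echo "verify OK: $1"
  else
    echo "verify FAILED: $1 does not match its recorded checksum" >&2
    return 1
  fi
}
```

A checksum only proves the file is intact, not that it restores cleanly, so this complements rather than replaces the quarterly restore drills recommended above.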

It’s important to highlight that throughout this incident, GitLab.com showed a dedication to transparency, even on its worst days.



Opinions expressed by DZone contributors are their own.
