
Microsoft AI Involuntarily Exposed a Secret Giving Access to 38TB of Confidential Data for 3 Years

The story of how an overprovisioned SAS token exposed a massive 38TB trove of private data on GitHub for nearly three years.

By Thomas Segura · Oct. 14, 23 · Analysis

The Wiz Research team recently discovered that an overprovisioned SAS token had been lying exposed on GitHub for nearly three years. This token granted access to a massive 38-terabyte trove of private data. The Azure storage contained additional secrets, such as private SSH keys, hidden within the disk backups of two Microsoft employees. This revelation underscores the importance of robust data security measures.


What Happened?

WIZ Research recently disclosed a data exposure incident found on Microsoft’s AI GitHub repository on June 23, 2023.

The researchers managing the repository had used an Azure Storage sharing feature, a SAS token, to give access to a bucket of open-source AI training data.

This token was misconfigured, giving access to the account's entire cloud storage rather than the intended bucket.

This storage comprised 38TB of data, including a disk backup of two employees’ workstations with secrets, private keys, passwords, and more than 30,000 internal Microsoft Teams messages.

Shared Access Signatures (SAS) are signed URLs for sharing Azure Storage resources. They are configured with fine-grained controls over how a client can access the data: which resources are exposed (full account, container, or selection of files), with what permissions, and for how long. See the Azure Storage documentation.
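As a quick illustration, the scope of a SAS URL can be read off its query parameters. The parameter names below follow Azure's documented conventions (`sp` for permissions, `se` for expiry, `ss`/`srt` marking an account-level SAS); the URL itself is made up:

```python
from urllib.parse import urlparse, parse_qs

def inspect_sas_url(url: str) -> dict:
    """Summarize the scope of a SAS URL (illustrative sketch)."""
    params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
    # An account SAS carries ss (signed services) / srt (signed resource
    # types) and can expose entire storage services, not just one container.
    is_account_sas = "ss" in params or "srt" in params
    return {
        "account_sas": is_account_sas,
        "permissions": params.get("sp", ""),  # "r" is read-only; "rwl" is broad
        "expiry": params.get("se", ""),       # a far-off expiry widens the window
    }

# The red flag seen in this incident: an account SAS, write permissions,
# and an expiry decades away.
url = ("https://example.blob.core.windows.net/models?sv=2020-08-04"
       "&ss=b&srt=sco&sp=rwl&se=2051-10-05T00:00:00Z&sig=REDACTED")
print(inspect_sas_url(url))
```

A sketch like this can serve as a pre-commit sanity check before a sharing link ever leaves the team.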

After Wiz disclosed the incident to Microsoft, the SAS token was invalidated. From its first commit to GitHub (July 20, 2020) to its revocation, nearly three years elapsed. See the timeline presented by the Wiz Research team:

[Figure: incident timeline, from the Wiz Research report]

Yet, as the Wiz Research team emphasized, the root cause was a misconfigured Shared Access Signature (SAS).

Data Exposure

The token allowed anyone to access an additional 38TB of data, including sensitive material such as secret keys, personal passwords, and over 30,000 internal Microsoft Teams messages from hundreds of Microsoft employees.

Here is an excerpt from some of the most sensitive data recovered by the Wiz team:

[Image: excerpt of the sensitive files recovered by the Wiz team]

As the researchers highlighted, an attacker could have injected malicious code into the storage blob, and that code would then have executed automatically with every download by a user (presumably an AI researcher) trusting Microsoft's reputation, potentially leading to a supply chain attack.

Security Risks

According to the researchers, Account SAS tokens such as the one presented in their research pose a high security risk: they are highly permissive, long-lived tokens that escape the monitoring perimeter of administrators.

When a user generates a new token, it is signed client-side in the browser and doesn't trigger any Azure event. To revoke a token, an administrator must rotate the account key that signed it, thereby revoking all the other tokens signed with that key at once.
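This also explains why key rotation is the only revocation lever: a SAS signature is an HMAC over the token's fields using the account key, so every token signed with the old key fails verification once the key changes. A minimal sketch (the real Azure string-to-sign has a specific multi-field layout; this only illustrates the key dependency):

```python
import base64
import hashlib
import hmac

def sign(string_to_sign: str, account_key_b64: str) -> str:
    """Simplified SAS-style signing: HMAC-SHA256 with the account key.
    This is NOT the exact Azure string-to-sign format."""
    key = base64.b64decode(account_key_b64)
    mac = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256)
    return base64.b64encode(mac.digest()).decode()

old_key = base64.b64encode(b"old-account-key").decode()
new_key = base64.b64encode(b"new-account-key").decode()
token_fields = "sp=rwl\nse=2051-10-05"

sig = sign(token_fields, old_key)
# After rotation, any verifier recomputing the HMAC with the new key
# rejects the old token -- along with every other token the old key signed.
assert sign(token_fields, new_key) != sig
```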

Ironically, the security risk of a Microsoft product feature (Azure SAS tokens) caused an incident for a Microsoft research team, a risk recently referenced by the second version of the Microsoft threat matrix for storage services:

[Figure: persistence via SAS tokens, from the Microsoft threat matrix for storage services]

Secrets Sprawl

This example perfectly underscores the pervasive issue of secrets sprawl within organizations, even those with advanced security measures. It highlights how an AI research team, or any data team, can independently create tokens that jeopardize the organization, sidestepping the security safeguards designed to shield the environment.

Mitigation Strategies

For Azure Storage Users:

1 - Avoid Account SAS Tokens

The lack of monitoring makes this feature a security hole in your perimeter. A better way to share data externally is a Service SAS with a Stored Access Policy: the SAS token is bound to a server-side policy, so token policies can be managed centrally.

Better yet, if you don't need this Azure Storage sharing feature at all, simply disable SAS access for each account you own.
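The central-management benefit of a Stored Access Policy can be sketched with a toy model (illustrative only, not the Azure SDK): tokens reference a server-side policy by id, so deleting or tightening that one policy invalidates every token bound to it at once.

```python
# Toy model of a Stored Access Policy (not the Azure SDK).
# A Service SAS that references a policy inherits its permissions and
# expiry from the server side instead of baking them into the token.
policies = {"read-only-2023": {"permissions": "r", "expires": "2023-12-31"}}

def token_is_valid(token: dict, today: str) -> bool:
    """A token is only valid while its referenced policy exists and is current."""
    policy = policies.get(token.get("policy_id"))
    return policy is not None and today <= policy["expires"]

token = {"policy_id": "read-only-2023", "sig": "REDACTED"}
assert token_is_valid(token, "2023-10-14")

del policies["read-only-2023"]  # central revocation: one delete, all bound tokens dead
assert not token_is_valid(token, "2023-10-14")
```

Contrast this with an Account SAS, where the constraints live only inside the signed token itself and nothing short of key rotation can take them back.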

2 - Enable Azure Storage Analytics

Active SAS token usage can be monitored through the Storage Analytics logs of each of your storage accounts. Azure Metrics lets you monitor SAS-authenticated requests and identify storage accounts that have been accessed through SAS tokens, with retention of up to 93 days.

For All:

1 - Audit Your GitHub Perimeter for Sensitive Credentials

With around 90 million developer accounts, 300 million hosted repositories, and 4 million active organizations, including 90% of Fortune 100 companies, GitHub holds a much larger attack surface than meets the eye.

Last year, GitGuardian uncovered 10 million leaked secrets on public repositories, up 67% from the previous year.

GitHub must be actively monitored as part of any organization's security perimeter. Incidents involving leaked credentials on the platform continue to cause massive breaches for large companies, and this hole in Microsoft's defenses is reminiscent of the Toyota data breach disclosed a year earlier.

On October 7, 2022, Toyota, the Japanese automotive manufacturer, revealed it had accidentally exposed a credential allowing access to customer data in a public GitHub repository for nearly five years; the code was public from December 2017 through September 2022.

If your company has development teams, some of your company's secrets (API keys, tokens, passwords) have likely ended up on public GitHub. It is therefore highly recommended to audit your GitHub attack surface as part of your attack surface management program.
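A minimal illustration of such an audit is a regex pass over repository content for well-known credential shapes. The patterns below are illustrative only; production scanners such as GitGuardian combine hundreds of specific detectors with entropy checks and validity probing:

```python
import re

# Illustrative detectors for a few common credential shapes.
PATTERNS = {
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "azure_sas_signature": re.compile(r"[?&]sig=[A-Za-z0-9%+/=]{20,}"),
}

def scan(text: str) -> list:
    """Return the names of every detector that matches the text."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]

# A SAS URL pasted into source code is exactly the kind of leak at issue here.
sample = "conn = 'https://acct.blob.core.windows.net/c?sp=rwl&sig=abc123XYZabc123XYZabc123'"
print(scan(sample))
```

Running even a crude pass like this over your organization's public repositories (and commit history) is a useful first step before adopting a dedicated scanner.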

Final Words

Every organization, regardless of size, needs to be prepared to tackle a wide range of emerging risks. These risks often stem from insufficient monitoring of extensive software operations within today's modern enterprises. In this case, an AI research team inadvertently created and exposed a misconfigured cloud storage sharing link, bypassing security guardrails. But how many other departments - support, sales, operations, or marketing - could find themselves in a similar situation? The increasing dependence on software, data, and digital services amplifies cyber risks on a global scale.

Combating the spread of confidential information and its associated risks requires reevaluating security teams' oversight and governance capabilities.


Published at DZone with permission of Thomas Segura. See the original article here.

Opinions expressed by DZone contributors are their own.
