What Happens When an AI Company Falls Victim to a Software Supply Chain Vulnerability

A look at OpenAI's breach, extrapolated to a possible software supply chain (SSC) hack of an AI company and its potential effects. What can you do to defend yourself?

By Barak Brudo · Jul. 27, 2023 · Analysis

On March 20, 2023, OpenAI took the popular generative AI tool ChatGPT offline for a few hours. It later admitted that the outage was caused by a software supply chain vulnerability originating in redis-py, the open-source Python client library for the in-memory data store Redis.

As a result of this vulnerability, there was a window (between 1 a.m. and 10 a.m. Pacific time on March 20) during which some users could see the titles of other active users' chat histories. Payment-related information could also have been exposed, including names, email addresses, payment addresses, credit card type, and the last four digits of the payment card number.

This was a relatively minor bug that was caught and patched quickly. But considering the rising popularity of ChatGPT and other generative LLMs, what could be the fallout from a more targeted software supply chain attack?

In this article, we'll look at what exactly took place on March 20 and how user information was exposed. We'll also take a short imaginary trip into a more severe potential attack, looking at what information could be exposed and what can be done to help prevent such cases. We'll finish with a few general software supply chain security suggestions that are relevant no matter what software your company works on.

Here's What Happened

Like almost every other software company, OpenAI relies in no small part on open-source libraries and code. In this case, the bug was discovered in redis-py, the open-source Redis client library. Here's the bug description as it appears in the company's own account:

  • OpenAI uses Redis to cache user information in their server so as not to require checking their database for every request. 
  • Redis Clusters are used to distribute this load over multiple Redis instances. 
  • The redis-py library is used to interface with Redis from the company's Python server, which runs with Asyncio. 
  • The library maintains a shared pool of connections between the server and the cluster and recycles a connection to be used for another request once done.
  • When using Asyncio, requests and responses with redis-py behave as two queues: the caller pushes a request onto the incoming queue, pops a response from the outgoing queue, and then returns the connection to the pool.
  • Suppose a request is canceled after it's been pushed onto the incoming queue but before the response pops from the outgoing queue. That is when the bug strikes: the connection becomes corrupted, and the next, unrelated request that uses the connection can receive data left behind by the canceled one.
  • In most cases, this results in an unrecoverable server error, and the user will have to try their request again. 
  • But in some cases, the corrupted data happens to match the data type the requester was expecting, and so what gets returned from the cache appears valid, even if it belongs to another user.
  • At 1 a.m. Pacific time on Monday, March 20, OpenAI inadvertently introduced a change to their server that caused a spike in Redis request cancellations. This created a higher-than-usual probability for each connection to return bad data.
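The failure mode described in the list above can be reproduced in miniature. The sketch below does not use redis-py at all: `FakeConnection`, `cached_get`, and the cache keys are hypothetical stand-ins, assuming only that replies on a shared connection come back in the order the requests were sent. A request is canceled between sending and reading, the connection is reused, and the next caller receives the orphaned reply.

```python
import asyncio


class FakeConnection:
    """Hypothetical stand-in for one pooled connection. Replies come back in
    the order requests were sent (FIFO), as they would on a real socket."""

    def __init__(self, store):
        self.store = store
        self.responses = asyncio.Queue()  # replies buffered on the connection

    async def send(self, key):
        # The "server" answers at once; the reply waits in the buffer until read.
        await self.responses.put(self.store[key])

    async def read(self):
        await asyncio.sleep(0.01)  # simulated network latency
        return await self.responses.get()


async def cached_get(conn, key):
    await conn.send(key)      # 1. push the request onto the connection
    return await conn.read()  # 2. pop the reply (a cancellation can land here)


async def demo():
    conn = FakeConnection({
        "user:alice:title": "Alice's private chat",
        "user:bob:title": "Bob's chat",
    })

    # Alice's request is canceled after it was sent but before its reply was read.
    task = asyncio.create_task(cached_get(conn, "user:alice:title"))
    await asyncio.sleep(0)  # let the task send its request
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass

    # The same connection is reused for Bob, who pops Alice's stale reply.
    return await cached_get(conn, "user:bob:title")


if __name__ == "__main__":
    print(asyncio.run(demo()))  # prints "Alice's private chat"
```

Note that Bob's request "succeeds" with a well-formed string, which mirrors the case where the corrupted data happens to match the type the requester was expecting.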

This specific bug only appeared in the Asyncio redis-py client for Redis Cluster and has since been fixed through joint work by OpenAI engineers and the redis-py maintainers.
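A common defensive pattern for this class of bug is to treat any connection interrupted between sending a request and reading its reply as tainted, and to discard it instead of returning it to the pool. The sketch below applies that idea to the same kind of hypothetical FIFO connection; `SafePool` and its methods are illustrative names, not the redis-py API and not the actual patch.

```python
import asyncio


class FakeConnection:
    """Hypothetical connection whose replies come back in FIFO order."""

    def __init__(self, store):
        self.store = store
        self.responses = asyncio.Queue()
        self.closed = False

    async def send(self, key):
        await self.responses.put(self.store[key])

    async def read(self):
        await asyncio.sleep(0.01)  # simulated network latency
        return await self.responses.get()

    def close(self):
        # Dropping the connection also drops any unread, stale replies.
        self.closed = True


class SafePool:
    """Hypothetical pool that never recycles a connection interrupted mid-request."""

    def __init__(self, store):
        self.store = store
        self.idle = []

    def _acquire(self):
        return self.idle.pop() if self.idle else FakeConnection(self.store)

    async def get(self, key):
        conn = self._acquire()
        try:
            await conn.send(key)
            value = await conn.read()
        except BaseException:
            # Canceled (or failed) between send and read: the connection may
            # still hold a reply meant for this request, so throw it away.
            conn.close()
            raise
        self.idle.append(conn)  # only clean connections go back to the pool
        return value


async def demo():
    pool = SafePool({
        "user:alice:title": "Alice's private chat",
        "user:bob:title": "Bob's chat",
    })
    task = asyncio.create_task(pool.get("user:alice:title"))
    await asyncio.sleep(0)  # let Alice's request go out
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return await pool.get("user:bob:title")


if __name__ == "__main__":
    print(asyncio.run(demo()))  # prints "Bob's chat"
```

Catching `BaseException` matters here because `asyncio.CancelledError` does not inherit from `Exception` in current Python versions; the exception is re-raised so cancellation still propagates to the caller.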

As a reminder, this bug could inadvertently expose another active user's chat titles and part of that user's payment information. Because some users now give ChatGPT full or partial control over their personal finances, exposure of this kind of information could have catastrophic results.

Here's What Could Happen

In this case, the software supply chain bug that OpenAI inherited from redis-py, the open-source Redis client library, was relatively simple and easily patched. I would like to ask your indulgence in imagining a more severe scenario: a targeted software supply chain attack, similar to the one carried out against SolarWinds, that goes undiscovered for a significant period of time, let's say months.

As users now pay OpenAI for more direct access to its LLM, such an attack could potentially reveal customer information, including payment data. But that is not really the information our hypothetical hacker group is interested in. ChatGPT reportedly has 1.16 billion users, having crossed 1 billion in March 2023, an increase of almost 55% over February 2023. With so many people now using generative AI for everything from art to history homework to finances, unrestricted access to OpenAI's database could reveal potential blackmail material on countless users. The Black Mirror episode 'Shut Up and Dance' (Season 3, Episode 3, 2016) offers a convincing picture of what can happen when such explicit information finds its way into the hands of unscrupulous people. For a real-world parallel, the 2015 Ashley Madison data breach had severe consequences, some of which are still felt years later.

Let's take our imaginary hack a step further and say that this unnamed hacker group can not only access the OpenAI database but also influence the results of requests. Imagine millions of people receiving targeted financial advice tailor-made by a hacker group, or false security scan and code testing results, courtesy, again, of our mysterious hacker group. The fact that ChatGPT can now access the internet makes it all the easier to disguise data moving in or out of OpenAI's servers as regular, innocuous traffic.

I'll stop here, but I think you can see the enormous potential damage a software supply chain attack against a successful LLM can cause.

How To Protect Yourself and Your Software Supply Chain

One of the first things you can do to protect yourself is to sharpen your sense of suspicion. Don't implicitly trust any tool, no matter how benign it seems, unless you can guarantee full control over what it does, what it can potentially do, and which resources it can access. Running an open-source, ChatGPT-style model locally can give you more control over both its training data and the level of access it has.
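One concrete way to avoid implicitly trusting a dependency is to pin the exact artifact you reviewed by its cryptographic hash and refuse anything that doesn't match; pip supports this natively through `pip install --require-hashes -r requirements.txt`. The sketch below shows the underlying check. `sha256_of`, `verify_artifact`, and the `tool.whl` file are hypothetical names chosen for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> None:
    """Raise if the artifact differs from the hash pinned when it was reviewed."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise RuntimeError(f"{path}: expected sha256 {expected_sha256}, got {actual}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        artifact = Path(tmp) / "tool.whl"  # hypothetical downloaded artifact
        artifact.write_bytes(b"reviewed contents")
        pinned = sha256_of(artifact)  # recorded at review time

        verify_artifact(artifact, pinned)  # passes: bytes unchanged
        artifact.write_bytes(b"tampered contents")
        try:
            verify_artifact(artifact, pinned)
        except RuntimeError as err:
            print("blocked:", err)
```

A hash pin only proves you are running the bytes you reviewed; it says nothing about whether those bytes were trustworthy in the first place, so it complements review rather than replacing it.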

Gaining more transparency into your software supply chain and build pipeline is also a good idea. You can start by generating a software bill of materials (SBOM) for each of your builds, but that is only one step; there are many other ways to increase visibility into exactly what is happening on your servers, in your cloud, or across your network.
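To make the per-build SBOM idea concrete, here is a minimal sketch that inventories the Python packages in an environment and diffs two builds to surface added, removed, or changed components. The field names are loosely modeled on CycloneDX and the identifiers follow the Package URL (purl) convention for PyPI, but the output is a simplified illustration, not a spec-complete SBOM; for production, use a dedicated SBOM generator. The package names in the usage example are made up.

```python
import json
from importlib.metadata import distributions


def build_inventory(packages):
    """Build a simplified component list from (name, version) pairs."""
    components = [
        {
            "type": "library",
            "name": name,
            "version": version,
            # purl for PyPI: name lowercased, underscores replaced by dashes.
            "purl": f"pkg:pypi/{name.lower().replace('_', '-')}@{version}",
        }
        for name, version in sorted(set(packages))
    ]
    return {"components": components}


def current_environment():
    """Inventory the packages installed in the running Python environment."""
    return build_inventory(
        (d.metadata["Name"], d.version) for d in distributions() if d.metadata["Name"]
    )


def diff_inventories(old, new):
    """Report components added, removed, or changed between two builds."""
    before = {c["name"]: c["version"] for c in old["components"]}
    after = {c["name"]: c["version"] for c in new["components"]}
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "changed": sorted(n for n in before.keys() & after.keys() if before[n] != after[n]),
    }


if __name__ == "__main__":
    old = build_inventory([("alpha-lib", "1.0.0"), ("beta_lib", "2.1.0")])
    new = build_inventory([("alpha-lib", "1.1.0"), ("gamma-lib", "0.3.0")])
    print(json.dumps(diff_inventories(old, new)))
    # {"added": ["gamma-lib"], "removed": ["beta_lib"], "changed": ["alpha-lib"]}
    print(len(current_environment()["components"]), "packages in this environment")
```

Storing an inventory per build and reviewing the diff turns "what changed in our dependencies?" from guesswork into a routine check.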

The Future of AI 

AI is here to stay no matter what we do. How deeply it will be involved in our everyday lives is a matter of speculation, but based on the past six months alone, it seems certain that we're looking at a potential watershed moment for LLM technology and its uses. As AI turns the creation of code and whole-cloth apps into a matter of finding the right prompts in 'natural language,' we may face an unprecedented deluge of applications that have been neither properly tested nor equipped with the security safeguards needed to protect their users and the people or companies that created them.

Until the day a true intelligence is listening to us from behind our screens, we're left to find other ways to handle our own security. I believe that promoting visibility as a precursor to trust is an excellent place to start.

Published at DZone with permission of Barak Brudo. See the original article here.

Opinions expressed by DZone contributors are their own.
