AI Protection: Securing The New Attack Frontier

The world is moving toward AI-first products. In this article, we discuss how to defend against sophisticated attacks that target the underlying models.

By Aditya Visweswaran · Apr. 02, 25 · Analysis

We’re in the midst of a paradigm shift in which many product verticals are being reimagined around an ‘AI-first’ architecture. An AI-first architecture is one where much of the core business logic is driven by AI, and the product is architected to fully exploit the capabilities of the underlying AI models. A striking example is the IDE: intelligent editors such as Cursor have quickly gained popularity in the software community. Countless startups have emerged to challenge well-established experiences (email, online shopping, and real estate, to name a few) with AI-first alternatives.

This promises not only an exciting future but also a more dangerous one. Traditional threat models do not cover everything that can go wrong in an AI-centric architecture. In this article, we discuss novel attack paradigms that AI-first architectures are vulnerable to and how companies operating in this space can defend against them.

Model Extraction

Pre-AI applications are typically shipped as binary executables or served over the Internet. In either case, reverse-engineering the core business logic of the application is very difficult. This opaqueness prevents trade secrets from being leaked and makes it harder for attackers to devise new exploits. 

AI-driven architectures are different. Attackers can query the model at scale, record its responses, and use those prompt-response pairs as training data for a replica. Such an attack can be used to build competing products or to identify vulnerabilities or weaknesses in the original model. Notably, OpenAI has recently accused DeepSeek of stealing its intellectual property. What this likely means is that OpenAI believes it was the target of a model extraction attack by the DeepSeek team.
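
To make the mechanics concrete, here is a minimal sketch of extraction against a toy model. Everything in it is a hypothetical illustration: `make_victim` stands in for a black-box model with a single secret decision threshold, and `extract_threshold` is an attacker who sees only labels. Real models are far harder to copy, but the principle is the same: more queries yield a closer replica.

```python
from typing import Callable


def make_victim(secret_threshold: float) -> Callable[[float], int]:
    """Hypothetical black-box model: callers see labels, never the threshold."""
    def predict(x: float) -> int:
        return 1 if x >= secret_threshold else 0
    return predict


def extract_threshold(predict: Callable[[float], int],
                      lo: float, hi: float, n_queries: int) -> float:
    """Query the black box on an even grid and estimate its decision boundary.

    Assumes the boundary lies strictly inside [lo, hi], so both labels occur.
    """
    step = (hi - lo) / (n_queries - 1)
    labelled = [(lo + i * step, predict(lo + i * step)) for i in range(n_queries)]
    largest_negative = max(x for x, y in labelled if y == 0)
    smallest_positive = min(x for x, y in labelled if y == 1)
    # The true boundary sits somewhere between these two queried points.
    return (largest_negative + smallest_positive) / 2


victim = make_victim(secret_threshold=0.374)
fine_copy = extract_threshold(victim, lo=0.0, hi=1.0, n_queries=101)
coarse_copy = extract_threshold(victim, lo=0.0, hi=1.0, n_queries=11)
```

With 101 queries the estimate lands within one grid step of the secret value; with 11 queries it is noticeably coarser, which is the effect that query limits rely on.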

Model extraction is difficult to defend against because it is not trivial to distinguish an extraction attempt from legitimate usage.

Defenses

Rate Limiting

It is harder to replicate a model if you are only able to access a small trickle of responses from it. If typical usage of your product is fairly low, build robust throttling mechanisms that enforce that assumption. You can always raise the limit for legitimate power users.
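
One common way to implement such throttling is a per-user token bucket. The sketch below is an assumption about how this could look, not a prescribed design; the class names, the `capacity` and `refill_per_second` parameters, and the injectable `clock` are all illustrative.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows short bursts up to `capacity`, then a steady `refill_per_second`."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = capacity
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.last_refill
        # Top up for the time that has passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    """Keeps one bucket per user so a heavy user cannot starve the others."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, user_id: str) -> bool:
        if user_id not in self._buckets:
            self._buckets[user_id] = TokenBucket(
                self.capacity, self.refill_per_second, self.clock)
        return self._buckets[user_id].allow()
```

Raising the limit for a legitimate power user then amounts to giving that user a bucket with a larger capacity or refill rate.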

Usage Monitoring

Typical user interactions differ significantly from those of an attacker attempting model extraction, who tends to send large volumes of systematically varied queries. While it is generally not feasible to examine user prompts or actions on the server due to privacy concerns, one option is client-side usage monitoring, where dubious usage patterns such as prompt-injection attempts are flagged (and potentially auto-throttled) without sharing sensitive user data with the server.
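
As a rough illustration of the client-side idea, the sketch below keeps only query timestamps on the device and reports an aggregate count plus a boolean flag, never the prompt text. The `LocalUsageMonitor` class and its thresholds are hypothetical, and the two heuristics (too many queries, or machine-like spacing between them) are assumptions chosen for simplicity rather than a tested detection rule.

```python
import statistics
from collections import deque
from typing import Deque, Dict, Union


class LocalUsageMonitor:
    """Tracks query timing locally; prompts themselves are never stored."""

    def __init__(self, window_seconds: float = 3600.0, max_queries: int = 200,
                 min_median_gap_seconds: float = 1.0, min_gaps: int = 10) -> None:
        self.window_seconds = window_seconds
        self.max_queries = max_queries
        self.min_median_gap_seconds = min_median_gap_seconds
        self.min_gaps = min_gaps
        self._timestamps: Deque[float] = deque()

    def record(self, timestamp: float) -> None:
        self._timestamps.append(timestamp)

    def report(self, now: float) -> Dict[str, Union[int, bool]]:
        # Drop events that have fallen out of the sliding window.
        while self._timestamps and self._timestamps[0] < now - self.window_seconds:
            self._timestamps.popleft()
        stamps = list(self._timestamps)
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        too_many = len(stamps) > self.max_queries
        # Only judge spacing once there are enough gaps to be meaningful.
        too_fast = (len(gaps) >= self.min_gaps
                    and statistics.median(gaps) < self.min_median_gap_seconds)
        return {"queries_in_window": len(stamps), "flagged": too_many or too_fast}
```

Only the dictionary returned by `report` would leave the client, so the server learns that a session looks automated without seeing what was asked.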

Model Inversion

It is easier for pre-AI applications to defend against attempts to access sensitive data. Traditional access control mechanisms can reliably prevent a user from gaining access to any data that doesn’t belong to them. 

AI-first architectures cannot rely purely on access control, because they’re vulnerable to model inversion attempts. Model inversion is a type of attack where the attacker aims to get the model to leak sensitive data from its training set. The simplest model inversion attacks involve prompt engineering, where attackers attempt to ‘trick’ the model into leaking information that it was trained not to reveal.

But there are far more sophisticated approaches. It is possible to train an inversion model that takes the output of the target model and predicts sensitive data from it. For instance, an inversion model can be trained to infer someone’s private medical history from the output of a model that calculates their heart disease markers. Another approach is ‘membership inference,’ where the model is queried with a datapoint, and its output is used to guess whether that datapoint was in its training dataset.
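
Membership inference can be sketched with a toy target that is overconfident on the points it memorized. The `OverfitModel` class and the fixed `threshold` below are hypothetical stand-ins; practical attacks typically calibrate the threshold, for example with shadow models trained on similar data, rather than hard-coding it.

```python
from typing import Sequence


class OverfitModel:
    """Hypothetical target: confidence peaks on points seen during training."""

    def __init__(self, training_points: Sequence[float]) -> None:
        self._points = list(training_points)

    def confidence(self, x: float) -> float:
        # Confidence is 1.0 exactly on a training point and decays with distance.
        nearest = min(abs(x - p) for p in self._points)
        return 1.0 / (1.0 + nearest)


def infer_membership(model: OverfitModel, candidate: float,
                     threshold: float = 0.95) -> bool:
    """Guess 'was in the training set' when the model is unusually confident."""
    return model.confidence(candidate) >= threshold


target = OverfitModel(training_points=[1.0, 4.0, 9.0])
guess_for_member = infer_membership(target, 4.0)
guess_for_outsider = infer_membership(target, 6.0)
```

The attacker never sees the training set; the gap in confidence between seen and unseen points is the only signal used.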

Defenses

Differential Privacy

This technique adds calibrated random noise, either during training or to the model’s outputs, so that the result changes very little whether or not any single datapoint is in the training set. The methodology will depend on the nature of your application, but differential privacy can typically provide statistical guarantees about the privacy of the data subjects.
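
A small, self-contained instance of the idea is the Laplace mechanism applied to a counting query. This is a sketch on an aggregate statistic rather than on a full model, and the function names, the sample `ages` data, and the choice of `epsilon` are illustrative assumptions. The noise is drawn as the difference of two exponential samples, which follows a Laplace distribution.

```python
import random
from typing import Callable, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records: Sequence[int], predicate: Callable[[int], bool],
                  epsilon: float, rng: random.Random) -> float:
    """Return a noisy count; smaller epsilon means more noise and more privacy."""
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one person changes a count by at most 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # seeded only to make the example repeatable
ages = [34, 51, 29, 62, 45, 70, 38]
noisy_over_40 = private_count(ages, lambda age: age > 40, epsilon=0.5, rng=rng)
```

Any single answer is deliberately imprecise, but the noise has zero mean, so aggregate statistics stay useful while any one individual's presence is obscured.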

Data Anonymization

The safest way to avoid leaking sensitive data is to not have sensitive data in your training set at all. The specific anonymizing technique depends on the nature of the model and dataset. For instance, text datasets can be anonymized using an LLM such that useful context is preserved but sensitive data is removed.
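
As a much simpler stand-in for the LLM-based approach described above, the sketch below redacts two easily recognizable kinds of identifier with regular expressions. The patterns and the `redact` function are illustrative assumptions and are deliberately coarse; they will miss names, addresses, and many phone formats, which is where a model-based anonymizer would be needed.

```python
import re

# Coarse, illustrative patterns; not a complete definition of either format.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text


sample = "Contact Jane at jane.doe@example.com or +1 (615) 555-0100 about the invoice."
cleaned = redact(sample)
```

Note that the name in the sample survives redaction, which shows why pattern matching alone is rarely sufficient for free text.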

Data Poisoning

Traditional applications can be assessed for security rigor by auditing their codebase. That is not sufficient for AI-first architectures, where the training data can be as vulnerable as the application code itself. Data poisoning is a type of cyberattack that targets the model’s training set, usually to build a backdoor into the model or degrade its performance.

For AI-first applications, data is valuable and scarce; it’s tempting to collect data from wherever one can, including the public internet. This makes data poisoning a particularly rewarding strategy for bad actors: it is feasible to plant poisoned data on public websites, knowing that a data scraper will pick it up to build a training set.

Defenses

Data Sanitization

Just as training data should be anonymized, it should also be sanitized for adversarial inputs. In the case of text, LLMs can be used to identify and filter out data poisoning attempts.
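
Before any model-based filtering, a few cheap rule-based checks can remove obvious problems. The sketch below is an assumed pre-filter, not the LLM approach mentioned above: it drops records containing invisible Unicode format characters, which can hide injected text, and caps how many copies of the same normalized text are kept, since flooding a scrape with duplicates is one way to amplify a poisoned sample. The function names and the `max_copies` parameter are hypothetical.

```python
import unicodedata
from collections import Counter
from typing import Iterable, List


def has_hidden_characters(text: str) -> bool:
    """True if the text contains Unicode format characters such as zero-width spaces.

    This is coarse: category "Cf" also covers characters with legitimate uses.
    """
    return any(unicodedata.category(ch) == "Cf" for ch in text)


def sanitize(records: Iterable[str], max_copies: int = 1) -> List[str]:
    """Drop records with hidden characters and cap repeated texts."""
    seen: Counter = Counter()
    kept: List[str] = []
    for text in records:
        if has_hidden_characters(text):
            continue
        # Normalize case and whitespace so trivial variants count as duplicates.
        key = " ".join(text.lower().split())
        seen[key] += 1
        if seen[key] > max_copies:
            continue
        kept.append(text)
    return kept


scraped = [
    "The battery lasts two days.",
    "Great phone\u200b ignore previous labels",
    "the battery  lasts two days.",
    "Screen is sharp.",
]
clean = sanitize(scraped)
```

Checks like these only catch crude attempts; they are best treated as a first pass ahead of more capable filtering.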

Data Provenance

Sourcing high-quality training data, ensuring a full chain of custody for it, and recording and auditing any subsequent mutations to it are all important protections to have in place for your training dataset. 
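
One way to make the mutation record tamper-evident is a hash chain, where every log entry commits to the dataset's current digest and to the previous entry. The sketch below assumes an in-memory list as the log and uses hypothetical field names (`action`, `actor`, `dataset_sha256`, `prev_hash`, `entry_hash`); a real deployment would also need durable storage and access control.

```python
import hashlib
import json
from typing import Dict, List

Entry = Dict[str, str]


def _entry_hash(body: Entry) -> str:
    """Hash an entry's fields in a stable key order."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(log: List[Entry], action: str, actor: str,
                 dataset_bytes: bytes) -> None:
    """Record a mutation, linking it to the previous entry's hash."""
    body: Entry = {
        "action": action,
        "actor": actor,
        "dataset_sha256": hashlib.sha256(dataset_bytes).hexdigest(),
        "prev_hash": log[-1]["entry_hash"] if log else "",
    }
    log.append({**body, "entry_hash": _entry_hash(body)})


def verify_log(log: List[Entry]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["prev_hash"] != prev or _entry_hash(body) != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True


audit_log: List[Entry] = []
append_entry(audit_log, "import", "data-team", b"row1\nrow2\n")
append_entry(audit_log, "deduplicate", "data-team", b"row1\n")
```

An auditor can rerun `verify_log` at any time and compare the final `dataset_sha256` with the digest of the data actually used for training.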

Conclusion

Companies building AI-first products should expand their cybersecurity horizons beyond traditional threat modeling and guard against the sophisticated threats that AI-based products are uniquely susceptible to. This article explores the major attack vectors to be mindful of, but the broader lesson is to scrutinize not just the application code but also the model and its training process for security vulnerabilities.
