AI Protection: Securing The New Attack Frontier
The world is moving toward AI-first products. In this article, we discuss how to defend against sophisticated attacks that target the underlying models.
We’re amidst a paradigm shift in society where many product verticals are being reimagined through an ‘AI-first’ architecture. An AI-first architecture is one where much of the core business logic is driven by AI, and the product is architected to fully exploit the capabilities of the underlying AI models. A striking example is IDEs; intelligent editors such as Cursor have quickly gained popularity in the software community. Countless startups have emerged to challenge well-established experiences (email, online shopping, real estate, to name a few) with AI-first alternatives.
This promises not only an exciting future but also a more dangerous one. Traditional threat models are outmoded under the new regime of AI-centric architectures. In this article, we discuss the novel attack paradigms that AI-first architectures are vulnerable to, and how companies operating in this space can defend against them.
Model Extraction
Pre-AI applications are typically shipped as binary executables or served over the Internet. In either case, reverse-engineering the core business logic of the application is very difficult. This opaqueness prevents trade secrets from being leaked and makes it harder for attackers to devise new exploits.
AI-driven architectures are different. Attackers can query the AI to generate training data, which is then used to replicate the model. Such an attack can be used to build competing products or to identify vulnerabilities or weaknesses in the original model. Notably, OpenAI recently accused DeepSeek of stealing its intellectual property; this likely means OpenAI believes it was the target of a model extraction attack by the DeepSeek team.
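To make the threat concrete, here is a deliberately simplified sketch of what the attacker's side of an extraction attempt can look like. The query_target callable is a hypothetical wrapper around the victim product's public API, and the file name is an illustrative assumption.

```python
# Sketch of an extraction loop: harvest (prompt, response) pairs from the
# target model and save them as a distillation dataset for a copycat model.
import json

def harvest(prompts, query_target, out_path: str = "distillation.jsonl") -> None:
    """query_target is a caller-supplied function that hits the victim's API."""
    with open(out_path, "w") as f:
        for prompt in prompts:
            pair = {"prompt": prompt, "response": query_target(prompt)}
            f.write(json.dumps(pair) + "\n")
    # The attacker then fine-tunes their own model on distillation.jsonl.
```

An attack like this needs a large volume of queries, which is precisely what the defenses described below aim to detect and throttle.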
Model extraction is difficult to defend against because it’s not trivial to distinguish an extraction attempt from legitimate usage.
Defenses
Rate Limiting
It is harder to replicate a model if an attacker can only access a small trickle of responses from it. If typical usage of your product is fairly low, build robust throttling mechanisms that enforce that assumption, along the lines of the sketch below. You can always raise the limit for legitimate power users.
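A minimal sketch of per-user throttling is shown here, assuming requests are keyed by a stable user ID; the 24-hour sliding window and the daily quota of 200 requests are illustrative placeholders, not recommendations.

```python
# Per-user sliding-window rate limiter.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 24 * 60 * 60   # 24-hour window
DAILY_QUOTA = 200               # illustrative quota per user per window

_requests: dict[str, deque] = defaultdict(deque)

def allow_request(user_id: str) -> bool:
    now = time.time()
    history = _requests[user_id]
    # Drop timestamps that have fallen out of the window.
    while history and now - history[0] > WINDOW_SECONDS:
        history.popleft()
    if len(history) >= DAILY_QUOTA:
        return False  # Throttle; legitimate power users can request a higher limit.
    history.append(now)
    return True
```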
Usage Monitoring
Typical user interactions differ significantly from those of an attacker attempting model extraction. While it is generally not feasible to examine user prompts or actions server-side due to privacy concerns, a possible option is client-side usage monitoring, where suspicious usage patterns such as prompt-injection attempts are flagged (and potentially auto-throttled) without sharing sensitive user data with the server.
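Below is a rough sketch of what such a client-side monitor might look like. The regex heuristics, the flag threshold, and the report() stub are all illustrative assumptions rather than a vetted detection scheme; the key property is that only coarse counts ever leave the device, never raw prompts.

```python
# Client-side usage monitor: flag suspicious prompts locally, report only counts.
import re

SUSPICIOUS_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"reveal your (system )?prompt",
    r"repeat your training data",
]

def looks_suspicious(prompt: str) -> bool:
    return any(re.search(p, prompt, re.IGNORECASE) for p in SUSPICIOUS_PATTERNS)

class UsageMonitor:
    def __init__(self, flag_threshold: int = 5):
        self.flag_count = 0
        self.flag_threshold = flag_threshold

    def observe(self, prompt: str) -> None:
        if looks_suspicious(prompt):
            self.flag_count += 1
        if self.flag_count >= self.flag_threshold:
            self.report()        # Sends only the count, never the prompts.
            self.flag_count = 0

    def report(self) -> None:
        # Placeholder: notify the server (e.g. {"flags": n}) and/or auto-throttle.
        pass
```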
Model Inversion
It is easier for pre-AI applications to defend against attempts to access sensitive data. Traditional access control mechanisms can reliably prevent a user from gaining access to any data that doesn’t belong to them.
AI-first architectures cannot rely purely on access control, because they’re vulnerable to model inversion attempts. Model inversion is a type of attack where the attacker aims to get the model to leak sensitive data from its training set. The simplest model inversion attacks involve prompt engineering, where attackers attempt to ‘trick’ the model into leaking information that it is trained not to reveal.
But there are far more sophisticated approaches. It is possible to train an inversion model that takes the output of the target model and predicts sensitive data from it. For instance, an inversion model can be trained to infer someone’s private medical history from the output of a model that calculates their heart disease markers. Another approach is ‘membership inference,’ where the model is queried with a datapoint and its output is used to guess whether that datapoint was in the training set.
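To make the idea concrete, here is a toy confidence-based membership-inference sketch. It assumes an sklearn-style classifier exposing predict_proba, and the confidence threshold is an illustrative assumption; the intuition is that models tend to be more confident on examples they were trained on.

```python
# Toy membership inference: threshold the model's top-1 confidence.
import numpy as np

def membership_guess(target_model, x, threshold: float = 0.95) -> bool:
    """Return True if x is likely to have been in the training set."""
    probs = target_model.predict_proba(np.asarray(x).reshape(1, -1))[0]
    return bool(probs.max() >= threshold)
```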
Defenses
Differential Privacy
Differential privacy adds calibrated noise, either during training or to the model’s outputs, so that no output can be traced back to any single datapoint in the training set. The exact methodology depends on the nature of your application, but differential privacy can typically provide statistical guarantees about the privacy of the data subjects.
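As a concrete illustration, here is a sketch of the Laplace mechanism applied to a single numeric output (for example, an aggregate count the product exposes). It is not a full differentially private training pipeline such as DP-SGD, and the epsilon and sensitivity values are illustrative.

```python
# Laplace mechanism: add noise scaled to sensitivity / epsilon.
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    # Noise scale b = sensitivity / epsilon gives epsilon-differential privacy
    # for a query whose output changes by at most `sensitivity` when a single
    # record is added to or removed from the dataset.
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_value + noise

# Example: release a count of at-risk patients with epsilon = 1.0.
noisy_count = laplace_mechanism(true_value=42, sensitivity=1.0, epsilon=1.0)
```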
Data Anonymization
The safest way to avoid leaking sensitive data is to not have sensitive data in your training set at all. The specific anonymization technique depends on the nature of the model and dataset. For instance, text datasets can be anonymized using an LLM such that useful context is preserved but sensitive data is removed.
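A rough sketch of a rule-based anonymization pass is shown below. The regex patterns only catch a few obvious formats and are illustrative; a real pipeline would typically add an NER model or an LLM pass for context-dependent identifiers such as names and addresses.

```python
# Rule-based PII scrubbing for text records before they enter a training set.
import re

PII_PATTERNS = {
    "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.-]+",
    "PHONE": r"\+?\d[\d\s().-]{7,}\d",
    "SSN": r"\b\d{3}-\d{2}-\d{4}\b",
}

def anonymize(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = re.sub(pattern, f"[{label}]", text)
    return text

# Names like "Jane" are not caught here; that is what an NER/LLM pass is for.
print(anonymize("Contact Jane at jane.doe@example.com or +1 (555) 010-2030."))
```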
Data Poisoning
Traditional applications can be assessed for security rigor by auditing their codebase. This is not true of AI-first architectures, where the training data can be as vulnerable as the application code itself. Data poisoning is a type of cyberattack that targets the model’s training set, usually to build a backdoor into the model or degrade its performance.
For AI-first applications, data is valuable and scarce; it’s tempting to collect data from wherever one can, including the public internet. This makes data poisoning a particularly rewarding strategy for bad actors: it is feasible to plant poisoned data on public websites, knowing that a data scraper will pick it up when building a training set.
Defenses
Data Sanitization
Just as training data should be anonymized, it should also be sanitized for adversarial inputs. In the case of text, LLMs can be used to identify and filter out data poisoning attempts.
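One possible shape for such a sanitization gate is sketched below. The heuristics and the classify_with_llm stub are assumptions standing in for whatever moderation model or LLM prompt you actually use.

```python
# Sanitization gate in front of the training pipeline.

def classify_with_llm(record: str) -> bool:
    """Stub: plug in a moderation endpoint or fine-tuned classifier here."""
    return False

def is_poisoned(record: str) -> bool:
    heuristics = [
        len(record) > 50_000,                              # absurdly long documents
        "ignore previous instructions" in record.lower(),  # embedded injections
    ]
    return any(heuristics) or classify_with_llm(record)

def sanitize(records):
    """Drop records that look like poisoning attempts."""
    return [r for r in records if not is_poisoned(r)]
```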
Data Provenance
Sourcing high-quality training data, ensuring a full chain of custody for it, and recording and auditing any subsequent mutations are all important protections to have in place for your training dataset.
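A minimal sketch of what recording provenance could look like is below; the manifest format and field names are illustrative assumptions. Fingerprinting each dataset version makes unaudited mutations detectable.

```python
# Provenance tracking: fingerprint each dataset version and log its lineage.
import hashlib
import json
import time

def fingerprint(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def record_provenance(dataset_path: str, source: str, transform: str,
                      manifest: str = "provenance.jsonl") -> None:
    entry = {
        "timestamp": time.time(),
        "dataset": dataset_path,
        "sha256": fingerprint(dataset_path),
        "source": source,        # e.g. vendor name or upstream URL
        "transform": transform,  # e.g. "deduplicated", "anonymized v2"
    }
    with open(manifest, "a") as f:
        f.write(json.dumps(entry) + "\n")
```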
Conclusion
Companies building AI-first products should expand their cybersecurity horizons beyond traditional threat modeling and guard against the sophisticated threats that AI-based products are uniquely susceptible to. This article explores the major attack vectors to be mindful of, but the broader lesson is to think outside the box and scrutinize not just the application code, but also the model and its training process, for security vulnerabilities.