OWASP Top 10 for LLMs: The New Attack Surface Nobody Trained For

The senior application security engineer at the financial services firm is staring at the architecture diagram for the company's new AI customer assistant. He has fifteen years of experience securing web applications. He has internalized the traditional OWASP Top 10 — injection attacks, broken authentication, security misconfigurations. The new AI assistant should be straightforward: it's just another web service that talks to a third-party API.

Two hours into the threat model, he realizes his usual playbook doesn't help. SQL injection? The system doesn't construct SQL from user input — but it constructs LLM prompts from user input, RAG retrievals, system instructions, and external data, all blended together with no clear boundary between what's "data" and what's "instruction." Cross-site scripting? The LLM output is being rendered to users, but the LLM could be induced to produce content that traditional input validation never anticipated.

His honest note to the security team that evening: "We're competent at securing web applications. We're starting from near-zero at securing LLM applications. The categories we know don't transfer, and we haven't built the operational instincts the new categories need."

Abstract visualization of AI neural networks and security threat vectors in a digital landscape

1. Why Traditional Web App Security Doesn't Transfer

The OWASP Top 10 for LLMs is not "the OWASP Top 10 for web apps with a few additions." It is a different list, addressing different threats, requiring different defenses, against a different attack surface.

Structural difference 1: The boundary between data and instruction has collapsed.

In traditional web applications, you can clearly distinguish between data (user input, database records) and instructions (code, queries, commands). Security is built around keeping these separate — parameterized queries, input validation, output encoding.

In LLM applications, the same model that processes user input also follows instructions from system prompts, retrieved documents, conversation history, external content, and tool outputs. There is no clean boundary. The boundary between "data" and "instruction" has been permanently erased.

This single difference invalidates much of the traditional defensive thinking. You can't sanitize input the way you would for SQL injection because the "input" might be untrusted content the LLM is supposed to read and act on.

Structural difference 2: Non-determinism is the default.

Traditional applications produce deterministic outputs given the same inputs. LLMs produce probabilistic outputs. The same prompt run twice can produce meaningfully different responses. Traditional approaches to verification (write test cases, confirm they pass) don't work the same way.

Structural difference 3: The attack surface includes the model's behavior itself.

Traditional web app security defends the code, the infrastructure, the data. LLM application security must also defend against the model behaving in unexpected ways — refusing legitimate requests, complying with malicious ones, hallucinating, leaking training data, or being manipulated into bypassing safety controls.

Structural difference 4: New infrastructure components add new attack surface.

Traditional applications have web servers, databases, application code. LLM applications add vector databases, embedding models, retrieval pipelines, prompt management systems, fine-tuned weights, and increasingly, tool/agent infrastructure. Each is a new attack surface category.

Structural difference 5: Supply chain depth has increased dramatically.

LLM applications depend on foundation models (which can have backdoors), training data (which can be poisoned), embedding models, tool integrations, and the entire ecosystem of MCP servers and agentic frameworks. The supply chain is deeper, more opaque, and harder to audit.

The honest framing: 87% of cybersecurity leaders report increased vulnerabilities due to generative AI (World Economic Forum's 2026 Global Cybersecurity Outlook). The increase isn't because security professionals are worse — it's because the attack surface has expanded into categories existing training and tooling didn't cover.

Developer reviewing code on multiple screens, representing the complex security challenges in LLM application development

2. The 2025 List at a Glance

The OWASP Top 10 for LLM Applications 2025 contains ten categories:

LLM01:2025 Prompt Injection (unchanged at #1) — User inputs or external content manipulate the LLM's behavior or output in unintended ways.

LLM02:2025 Sensitive Information Disclosure (up from #6) — The LLM reveals confidential data — PII, credentials, training data, system prompt contents.

LLM03:2025 Supply Chain (broadened, up from #5) — Vulnerabilities from compromised foundation models, training datasets, embedding models, fine-tuned weights, or tool integrations.

LLM04:2025 Data and Model Poisoning (renamed from Training Data Poisoning) — Malicious manipulation of pre-training, fine-tuning, or embedding data.

LLM05:2025 Improper Output Handling (renamed from Insecure Output Handling) — LLM outputs aren't properly validated before being passed to downstream systems.

LLM06:2025 Excessive Agency (up from #8) — LLMs or agentic systems are granted too much autonomy, too many permissions, or too many tools.

LLM07:2025 System Prompt Leakage (NEW) — System prompts contain sensitive information that can be extracted through various means.

LLM08:2025 Vector and Embedding Weaknesses (NEW) — The vector storage, embedding pipelines, and retrieval systems used in RAG applications have their own vulnerabilities.

LLM09:2025 Misinformation (replaced Overreliance) — LLMs produce information that's incorrect but presented confidently.

LLM10:2025 Unbounded Consumption (replaced Model Theft + Model DoS) — Uncontrolled resource consumption producing operational disruption, cost attacks, or availability issues.

The pattern in changes: the list has matured from describing nascent threats to describing operationally important threat categories that defenders encounter routinely.

Security framework documentation on a screen with annotation and analysis tools visible

3. The Boundary Collapse: LLM01, LLM04, LLM07

Three categories share a common root: the collapse of the boundary between data and instruction.

Prompt Injection (LLM01)

Prompt injection has evolved well beyond early "jailbreaking via chat interface" patterns. The current threat landscape includes:

Direct injection — the original pattern. User types adversarial content into a chat interface; the LLM responds in unintended ways. Still common, increasingly handled by guardrails, but never completely solved.

Indirect injection — the more dangerous pattern. The LLM consumes external content (web pages, PDFs, emails, retrieved documents) that contains hidden instructions. The canonical scenario: an attacker uploads a resume with split malicious prompts. When an LLM is used to evaluate the candidate, the combined prompts manipulate the model's response, resulting in a positive recommendation despite the actual resume contents.

Cross-modal injection — the newest pattern. Hidden instructions in images, audio, or video that a multimodal LLM processes. An attacker embeds a malicious prompt within an image that accompanies benign text. When a multimodal AI processes the image and text concurrently, the hidden prompt alters the model's behavior.

The trajectory is consistent: the input surface for prompt injection keeps expanding. Anywhere your LLM accepts content — text, files, images, audio, retrieved documents, tool outputs — is a potential injection vector.

Why RAG Doesn't Help

A specific misconception: many practitioners believe that retrieval-augmented generation (RAG) and fine-tuning solve prompt injection by grounding the model in trusted data. They don't.

RAG retrieves documents that the LLM then incorporates into its context. If those documents contain prompt injection payloads, the injection happens through the retrieved content. RAG has just provided a new injection vector while doing nothing to prevent it.

The System Prompt Reality Check

A specific operational principle: the system prompt should NOT be considered a secret, nor should it be used as a security control (OWASP 2025).

System prompts can be extracted through prompt injection, inferred through repeated probing, revealed through error messages, or reconstructed from model behavior patterns. Security must be enforced independently of what the LLM does or doesn't reveal. The rate limit lives in your backend API, not in your prompt. The authorization check happens in code, not in instructions.

Mitigation Reality

The honest framing on prompt injection mitigation: there are no fool-proof solutions. The mitigations that help (partially):

Defense in depth across multiple layers
Application-level guardrails that operate independently of the LLM
Output validation that catches injection-induced outputs
Authorization at the API/service layer, not the LLM layer
Minimizing what the LLM has access to (least-privilege agent design)
Adversarial testing as a continuous practice

The realistic security posture: assume prompt injection will eventually succeed against your application, and design so that successful injection produces bounded harm.

Abstract representation of data flow in a complex AI system showing where injection vulnerabilities can occur

4. The New Infrastructure: LLM03, LLM08

Two categories address infrastructure that didn't exist in traditional applications.

Supply Chain in 2026

LLM application supply chain concerns are deeper than traditional software:

Foundation models: Are they safe? Have they been backdoored? Most are opaque.
Training and fine-tuning datasets: Were they curated for security? Could they have been poisoned?
Embedding models: Often less audited than generation models. Have their own vulnerabilities.
Model registries and hubs: Hugging Face and similar are the npm of AI. The same supply chain risks apply.
Tool integrations: MCP servers, function calling APIs, agent toolkits. Each is a trust extension.

Traditional dependencies can be audited by reading the code. Foundation models are billions of parameters — uninspectable by humans. Model vulnerabilities have less mature reporting infrastructure. The opacity of large models means the supply chain for LLM applications is harder to verify than for traditional software.

The Vector and Embedding Surface (LLM08)

A new category in 2025 recognizes that RAG infrastructure has its own attack surface:

Embedding inversion: Embeddings of sensitive data can sometimes be inverted to reveal the underlying content. Embeddings aren't a privacy primitive. Treating them as anonymized data is incorrect.

Embedded prompt injection: When documents are embedded for RAG, prompt injection payloads in those documents persist into retrieval. Retrieval brings the injection back to inference, even though the document was embedded weeks ago.

Cross-tenant contamination: In multi-tenant RAG systems, can one tenant's content be retrieved by another? Are vector spaces appropriately partitioned? Are access controls enforced at retrieval time?

Vector database access: Is the vector database authenticated and authorized? Can users access other users' embeddings? Are embeddings stored with appropriate access controls?

The supply chain audit of a typical 2026 AI assistant found: 1 foundation model, in-house fine-tuning, an open-source embedding model, a managed vector database, 12 MCP servers from various sources, and mixed prompt templates. Verifiable components: partial. Unverifiable: substantial. The response was a defensive perimeter around the LLM, assuming components could be compromised, with continuous behavioral monitoring.

Cloud infrastructure and data pipeline visualization representing the complex supply chain of modern AI systems

5. The Agency Dimension: LLM06

Excessive Agency moved up significantly in 2025 (from #8 to #6) as agentic architectures have proliferated.

Traditional applications have well-defined permissions. LLM-based agents introduce a complication: the agent doesn't decide what to do in advance. It decides at runtime, based on the input, the model's interpretation, and the tools available. The same agent might appropriately access customer records for one request and inappropriately attempt to access financial records for another.

The canonical scenario: an indirect prompt injection through a malicious email convinces the LLM to forward the user's entire inbox to an external address. If the plugin had only read access, the attack would have been impossible.

Excessive agency failure modes:

Over-permissioned service accounts: The agent uses a service account with broad permissions because "it needs to access many things." Successful injection produces broad consequences.
Too many tools: The agent has access to 50 tools because "we might want to use any of them." The unused 48 are attack surface.
Insufficient action confirmation: The agent takes irreversible actions (sending email, making payments) without human confirmation.
No audit trail: No record of why the agent decided to take an action, limiting forensic capability.
Cross-user data access: The agent operates with privileges that span users.

The defensive pattern — least privilege for agents:

Minimal tools: Give the agent only the tools the specific use case requires
Minimal permissions per tool: Each tool should have the narrowest scope possible
Per-user scoping: Agent permissions should match the user's permissions, not exceed them
Action confirmation for irreversible operations: Human-in-the-loop for high-stakes actions
Comprehensive audit: Every agent decision logged with context

An audit of a company's AI assistant found a single agent with 23 tools, broad service account permissions, no per-user scoping, and no action confirmation. After remediation — function-specific agents with minimal toolsets, per-user scoping, confirmation for irreversible operations — the same user-facing functionality was delivered with substantially reduced attack surface.

Security operations dashboard with AI agent monitoring and threat detection visualizations

6. The Output Side: LLM02, LLM05, LLM09

Three categories address what the LLM produces and how it gets handled downstream.

Sensitive Information Disclosure (LLM02)

The threat: LLM outputs reveal information they shouldn't. Categories of sensitive information that can leak:

PII leakage: Personal information from training data or conversation history
Credential disclosure: API keys, tokens, or passwords that appear in training data or system prompts
Prompt leakage: The system prompt's contents being revealed
Intellectual property exposure: Proprietary algorithms or trade secrets

Sensitive Information Disclosure jumped from #6 to #2 in 2025, reflecting increased operational experience: practitioners have seen many ways LLM outputs reveal information that shouldn't be revealed, often through entirely innocent-seeming user interactions.

Improper Output Handling (LLM05)

The traditional injection vectors return through LLM outputs:

XSS: LLM produces HTML/JavaScript that gets rendered without sanitization
SQL injection: LLM produces SQL that gets executed without parameterization
Command injection: LLM produces shell commands that get executed
SSRF: LLM produces URLs that get fetched

Security teams applied input validation discipline in their traditional applications but have not extended that discipline to LLM outputs. The LLM produces a payload (often inadvertently), and downstream systems trust the LLM output enough to execute it.

Misinformation (LLM09)

A category that replaced "Overreliance" in 2025. The threat: LLMs produce confident, plausible-sounding information that is factually wrong.

The danger amplifier: misinformation in LLM outputs is particularly dangerous when downstream systems automatically act on the output without human review. An LLM that fabricates a legal citation and the citation goes unchecked into a legal brief. An LLM that misidentifies a drug interaction and the output goes unchecked into clinical guidance.

The operational countermeasure: treat LLM outputs with the same skepticism you'd apply to any external, unverified source. Implement human review for high-stakes outputs. Build validation layers that check claims against authoritative sources where possible.

7. The Training Gap: What Security Professionals Need to Build

For practitioners who have strong web app security backgrounds, the specific capabilities to develop:

Capability 1: Threat modeling for stochastic systems. The same prompt run twice can produce different results. Threat modeling must account for probabilistic behavior, not just deterministic code paths.

Capability 2: Prompt injection assessment. Assess direct, indirect, and cross-modal injection vectors in your application. Test with adversarial inputs across all content types the LLM processes.

Capability 3: Supply chain assessment for AI components. Build an SBOM-equivalent for AI components. Assess foundation models, embedding models, fine-tuning data, tool integrations.

Capability 4: Least-privilege agent design. Scope agent permissions to the minimum required. Design human-in-the-loop confirmation for irreversible operations.

Capability 5: Output validation extension. Extend the output validation discipline you apply to user input to also cover LLM outputs before they reach downstream systems.

The transition requires rebuilding rather than extending existing security capability. Many of the underlying disciplines (threat modeling, security architecture, vulnerability management) transfer. The specific threat categories, defensive techniques, and verification approaches are different.

Security teams that approach LLM application security as "web app security with AI" will miss the specific threat categories the OWASP list captures. Security teams that approach it as a new domain requiring new operational instincts will be better positioned to defend these systems.