Hackers Manipulate AI via Indirect Prompt Injection

Search engines, automated content processing, and even ad reviews have all been transformed by the emergence of large language models (LLMs) and AI agents. But this integration also creates new security risks, with indirect prompt injection (IDPI) emerging as a major threat. This attack entails inserting altered instructions into seemingly innocuous content, like user-generated text or HTML pages.

Unauthorized actions result from LLMs ingesting these instructions during their regular processing tasks. Although a large portion of earlier IDPI research concentrated on theoretical risks and proof-of-concept (PoC) examples, actual incidents have now surfaced. Comprehending Web-Based IDPI Web-based IDPI happens when hackers insert manipulated or hidden prompts into web content. When performing tasks like content analysis or summarization, LLMs process these prompts.

Crucially, direct interaction with the model is not necessary for this type of attack. Rather, adversaries take advantage of benign web features, like metadata or content review systems, which AI agents subsequently examine. Unaware that the content it is processing contains dangerous instructions, the AI carries them out, which could result in serious harm.

Web-based IDPI threat model illustration (Source: Paloaltonetworks) The Change in the Attacker's Intent A real-world attack using IDPI to get around an AI-based product ad review system was discovered in December 2025. This scam site (hosted at “reviewerpress.com”) showcased a deceptive advertisement for “military glasses” with fake discounts. The attacker used a hidden prompt embedded in the page’s HTML code, instructing the AI to approve the scam advertisement, which would otherwise have been flagged as fraudulent.

This is the first instance of a targeted IDPI attack against an AI-based ad review system that we have seen. It highlights a larger trend in which attackers go beyond low-risk manipulations, such as promoting websites or inserting "hire me" prompts into resumes. IDPI-containing webpage with a military eyewear advertisement, a phony special discount, and phony comments (Source: Paloaltonetworks) These attacks use a variety of payload delivery techniques, including: Obfuscation: Including prompts in HTML sections or attributes that appear harmless in order to evade detection.

CSS rendering suppression: Preventing users and manual reviewers from seeing visual cues. Using JavaScript to inject malicious prompts that only run after the page loads is known as dynamic execution.

Attacks of varying degrees of severity have been identified, some of which are intended to manipulate reviews or carry out unauthorized transactions. Financial losses or data security breaches are examples of high-severity cases. Lower-severity attacks, on the other hand, might interfere with system operation or produce unrelated results.

Web-based IDPI attack taxonomy (Source: Paloaltonetworks) Given the rise of web-based IDPI, defenders must enhance their detection and prevention capabilities. Traditional security measures, Paloalto Networks need to adopt advanced solutions capable of distinguishing between benign and malicious prompts. Solutions such as Advanced DNS Security, Prisma Browser, and Prisma AIRS can help identify and block these attacks before they reach LLMs, ensuring AI systems remain protected.

Proactive defense mechanisms that include prompt visibility, intent analysis, and behavioral correlation will be crucial to safeguarding against this new threat landscape as AI is further integrated into web applications.