Topic

Prompt Injection

Prompt injection attacks, mitigations, detection, and design patterns for safer AI applications.

prompt injectionindirect prompt injectionjailbreakagent hijackprompt abuse
Evergreen Overview

Prompt injection is the core attack pattern in modern AI applications. It happens when a model treats malicious or conflicting instructions from users, retrieved content, documents, tools, or pages as trusted guidance and changes its behavior in response.

What this page helps explain
  • Direct, indirect, and cross-context prompt injection
  • How documents, web content, and tool output become attack carriers
  • Why prompt injection is a workflow problem as much as a model problem
What secure teams focus on
  • Trust boundaries between instructions, content, tools, and actions
  • Approvals, isolation, and scoped permissions for agent behavior
  • Detection and monitoring patterns when prompt controls fail
Who this page is for
  • Agent builders and platform engineers
  • Readers studying retrieval or tool-enabled products
  • Leaders who need practical language for why this risk matters
References

Current notes, events, and source material

These items are included because they add useful evidence, framing, implementation detail, or upcoming context for teams working in this area.

The Hacker News AI Security June 11, 2026 news

New Attacks Trick OpenClaw AI Agent Into Running Code and Leaking Secrets

Two security teams have shown, in separate research published this week, that OpenClaw, the popular self-hosted AI agent, can be driven to run attacker-controlled code or hand over sensitive data through ordinary-looking inputs. Imperva buried instructions inside shared contacts, vCards, and location pins that the agen

The Hacker News AI Security June 10, 2026 news

Anthropic Releases Claude Fable 5, Its Most Powerful AI Yet, With Cyber Safeguards

On June 9, Anthropic released Claude Fable 5, the most capable model it has ever made, generally available. It also did something unusual: it shipped one model as two products, split not by capability but by a layer of safety classifiers. Fable 5 goes to the public. Its twin, Claude Mythos 5, the same underlying model

Microsoft Security Blog June 5, 2026 news

Securing CI/CD in an agentic world: Claude Code Github action case

Microsoft Threat Intelligence identified a prompt injection pathway in Claude Code GitHub Action that allowed access to workflow secrets under specific conditions. This research examines the attack chain, responsible disclosure process, Anthropic's mitigation, and guidance for securing AI-powered CI/CD workflows. The p

Microsoft Security Blog June 4, 2026 news

Updating the taxonomy of failure modes in agentic AI systems: What a year of red teaming taught us

A surge in real-world attacks against agentic AI systems is reshaping how we think about risk. Based on 12 months of red teaming, this update introduces seven new failure modes, from supply chain compromise to goal hijacking, and the practical mitigations teams need now. The post Updating the taxonomy of failure modes

Microsoft Security Blog June 3, 2026 news

Preinstall to persistence: Inside the Red Hat npm Miasma credential-stealing campaign

A large-scale npm supply chain attack compromised over 90 versions of @redhat-cloud-services packages, silently infecting CI/CD environments and developer systems. The malicious code steals credentials from GitHub, cloud platforms, and local machines, then spreads like a worm by republishing trusted packages. Discover