Prompt Injection

Prompt injection attacks, mitigations, detection, and design patterns for safer AI applications.

prompt injection, indirect prompt injection, jailbreak, agent hijack, prompt abuse

Evergreen Overview

Prompt injection is the core attack pattern in modern AI applications. It happens when a model treats malicious or conflicting instructions from users, retrieved content, documents, tools, or pages as trusted guidance and changes its behavior in response.

What this page helps explain

Direct, indirect, and cross-context prompt injection
How documents, web content, and tool output become attack carriers
Why prompt injection is a workflow problem as much as a model problem
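
As a rough illustration of how retrieved content becomes an attack carrier, the Python sketch below contrasts a prompt builder that concatenates everything with one that labels each segment by origin. All names here (Segment, INSTRUCTION_SOURCES, the UNTRUSTED markers) are hypothetical and do not come from any library. Labeling alone does not stop a model from following injected text; it only makes the trust boundary explicit so that later workflow controls have something to act on.

```python
from dataclasses import dataclass

# Assumption for this sketch: only these origins may carry instructions.
INSTRUCTION_SOURCES = {"system", "user"}
# Hypothetical delimiters that wrap anything from an untrusted origin.
OPEN, CLOSE = "<<<UNTRUSTED", "UNTRUSTED>>>"


@dataclass(frozen=True)
class Segment:
    source: str  # e.g. "system", "user", "web", "document", "tool"
    text: str


def naive_prompt(segments):
    # Everything is joined into one string, so an instruction hidden in a
    # web page is indistinguishable from the developer's own instructions.
    return "\n".join(seg.text for seg in segments)


def strip_markers(text):
    # Repeat until stable so that removing one marker cannot splice
    # the surrounding characters into a new marker.
    while OPEN in text or CLOSE in text:
        text = text.replace(OPEN, "").replace(CLOSE, "")
    return text


def labeled_prompt(segments):
    parts = []
    for seg in segments:
        if seg.source in INSTRUCTION_SOURCES:
            parts.append(f"[{seg.source}] {seg.text}")
        else:
            # Untrusted content is wrapped and marked as data only, and it
            # cannot close the wrapper early because markers are stripped.
            body = strip_markers(seg.text)
            parts.append(f"[{seg.source}: data only]\n{OPEN}\n{body}\n{CLOSE}")
    return "\n".join(parts)
```

The design choice worth noting is that provenance is recorded when content enters the workflow, not inferred later from the text itself.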

What secure teams focus on

Trust boundaries between instructions, content, tools, and actions
Approvals, isolation, and scoped permissions for agent behavior
Detection and monitoring patterns when prompt controls fail
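
The three focus areas above can be sketched as a small gate that sits between the model and its tools. This is a minimal illustration under stated assumptions, not a reference design: ToolPolicy, Gate, the tool names, and the "tainted" flag (meaning the current context includes untrusted content) are all hypothetical. The gate enforces a scoped allow-list, asks for human approval before a side-effecting tool runs in a tainted context, and records every decision so there is something to monitor when prompt-level controls fail.

```python
from dataclasses import dataclass, field


@dataclass
class ToolPolicy:
    allowed: set[str]       # tools this agent is scoped to use at all
    side_effects: set[str]  # tools that change something outside the agent


@dataclass
class Gate:
    policy: ToolPolicy
    audit_log: list[dict] = field(default_factory=list)

    def authorize(self, tool: str, tainted: bool, approved: bool = False) -> str:
        if tool not in self.policy.allowed:
            # Out of scope: refused regardless of what the model asked for.
            decision = "deny"
        elif tool in self.policy.side_effects and tainted and not approved:
            # Untrusted content is in context, so a human must confirm.
            decision = "needs_approval"
        else:
            decision = "allow"
        # Every decision is logged, including allowed ones, for later review.
        self.audit_log.append(
            {"tool": tool, "tainted": tainted, "approved": approved, "decision": decision}
        )
        return decision
```

For example, a gate built with allowed={"search", "send_email"} and side_effects={"send_email"} would allow a search in a tainted context but hold send_email until a person approves it.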

Who this page is for

Agent builders and platform engineers
Readers studying retrieval or tool-enabled products
Leaders who need practical language for why this risk matters

References

Current notes, events, and source material

These items are included because they add useful evidence, framing, implementation detail, or upcoming context for teams working in this area.

DEF CON August 6, 2026 - August 9, 2026 event upcoming

DEF CON 34 / AI Village 2026

DEF CON 34 takes place in Las Vegas and is expected to include AI security activity through villages, workshops, contests, and community-led research tracks as schedules firm up.

Google Cloud Security Blog June 12, 2026 news

Powering the next era of Confidential AI

We are thrilled to collaborate with Apple on its expanded Private Cloud Compute (PCC) systems announced this week at WWDC 2026.

Agent Security Prompt Injection Model Evaluation

The Hacker News AI Security June 11, 2026 news

New Attacks Trick OpenClaw AI Agent Into Running Code and Leaking Secrets

Two security teams have shown, in separate research published this week, that OpenClaw, the popular self-hosted AI agent, can be driven to run attacker-controlled code or hand over sensitive data through ordinary-looking inputs. Imperva buried instructions inside shared contacts, vCards, and location pins that the agen

Prompt Injection Agent Security AI Red Teaming

Microsoft Security Blog June 10, 2026 news

Turn specs into evals for any agent with ASSERT

Adaptive Spec-driven Scoring for Evaluation and Regression Testing (ASSERT) is an open-source framework for converting natural language behavior requirements into executable evaluations of AI models and agents.

AI Red Teaming Prompt Injection Agent Security

The Hacker News AI Security June 10, 2026 news

Anthropic Releases Claude Fable 5, Its Most Powerful AI Yet, With Cyber Safeguards

On June 9, Anthropic released Claude Fable 5, the most capable model it has ever made, generally available. It also did something unusual: it shipped one model as two products, split not by capability but by a layer of safety classifiers. Fable 5 goes to the public. Its twin, Claude Mythos 5, the same underlying model

Prompt Injection Agent Security AI Red Teaming

Google Cloud Security Blog June 9, 2026 news

Detecting and containing AI-powered threats with Google Security Operations agents

Learn how Google Security Operations works in concert with AI Threat Defense to monitor, detect, and respond to threats, particularly from code you do not own or cannot patch.

Agent Security Prompt Injection Model Evaluation

Microsoft Security Blog June 9, 2026 news

Reconstructing AI activity in investigations

Learn how to investigate AI activity in Microsoft 365 Copilot and Azure AI services using a structured, telemetry-driven approach. This playbook helps security teams reconstruct events, assess data exposure, and detect potential threats faster.

AI Red Teaming Prompt Injection Agent Security

Microsoft Security Blog June 8, 2026 news

AI brands as bait: How threat actors are using the AI hype in social engineering

As threat actors operationalize AI to accelerate attacks, they are also leveraging the wider global interest around AI itself as a social engineering lure.

AI Red Teaming Prompt Injection Agent Security

Microsoft Security Blog June 5, 2026 news

Securing CI/CD in an agentic world: Claude Code Github action case

Microsoft Threat Intelligence identified a prompt injection pathway in Claude Code GitHub Action that allowed access to workflow secrets under specific conditions. This research examines the attack chain, responsible disclosure process, Anthropic's mitigation, and guidance for securing AI-powered CI/CD workflows.

AI Red Teaming Prompt Injection Agent Security

Microsoft Security Blog June 4, 2026 news

Updating the taxonomy of failure modes in agentic AI systems: What a year of red teaming taught us

A surge in real-world attacks against agentic AI systems is reshaping how we think about risk. Based on 12 months of red teaming, this update introduces seven new failure modes, from supply chain compromise to goal hijacking, and the practical mitigations teams need now.

AI Red Teaming Prompt Injection Agent Security

Microsoft Security Blog June 3, 2026 news

Preinstall to persistence: Inside the Red Hat npm Miasma credential-stealing campaign

A large-scale npm supply chain attack compromised over 90 versions of @redhat-cloud-services packages, silently infecting CI/CD environments and developer systems. The malicious code steals credentials from GitHub, cloud platforms, and local machines, then spreads like a worm by republishing trusted packages. Discover

AI Red Teaming Prompt Injection Agent Security

Microsoft Security Blog June 2, 2026 news

Microsoft Build 2026: Securing code, agents, and models across the development lifecycle

Discover how Microsoft enables fast, secure AI development with MDASH and new security capabilities.

AI Red Teaming Prompt Injection Agent Security

Google Cloud Security Blog May 29, 2026 news

Cloud CISO Perspectives: How to build an AI-ready security program for the public sector

From industrial control systems to decades-old municipal databases, here’s our CISO guidance to prep AI-ready security programs for the public sector.

Agent Security Prompt Injection Model Evaluation

YouTube April 16, 2026 video

Attacking AI - Jason Haddix - NDC Security 2026

Attacking AI is a one-of-a-kind session releasing case studies, tactics, and methodology from Arcanum’s AI assessments in 2024 and 2025. While most AI assessment material focuses on academic AI red team content, “Attacking AI” is focused on the task of assessing AI-enabled systems.

Prompt Injection

Microsoft Security Blog March 12, 2026 guide

Detecting and analyzing prompt abuse in AI tools

Microsoft Incident Response explains how to detect prompt abuse using logging, telemetry, and incident response workflows.

Prompt Injection Agent Security

OpenAI March 11, 2026 analysis

Designing AI agents to resist prompt injection

OpenAI frames prompt injection as an agent-security problem that increasingly resembles social engineering rather than simple string matching.

Prompt Injection Agent Security

OpenAI March 9, 2026 news

OpenAI to acquire Promptfoo

OpenAI announced plans to acquire Promptfoo, highlighting automated AI security testing, red teaming, and evaluation as core enterprise requirements.

AI Red Teaming Prompt Engineering Prompt Injection

OpenAI December 22, 2025 analysis

Continuously hardening ChatGPT Atlas against prompt injection attacks

OpenAI describes using automated red teaming and reinforcement learning to discover agent prompt injection attacks before they appear in the wild.

Prompt Injection Agent Security AI Red Teaming

Google Cloud Blog December 4, 2025 guide

Building a Production-Ready AI Security Foundation

Google Cloud outlines a defense-in-depth view of AI security spanning application controls, data protections, and infrastructure isolation.

Agent Security Prompt Injection Adversarial ML

AI Explained November 14, 2025 video

Is GPT-5.1 Really an Upgrade? But Models Can Auto-Hack Govts, so … there’s that

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security Prompt Engineering Prompt Injection AI Red Teaming Adversarial ML

OpenAI November 7, 2025 guide

Understanding prompt injections: a frontier security challenge

An accessible explanation of prompt injection risk in real AI products, including how third-party content can redirect or manipulate agent behavior.

Prompt Injection Prompt Engineering

Google Cloud Blog March 5, 2025 news

Announcing AI Protection: Security for the AI era

Google introduced AI Protection and Model Armor to address prompt injection, jailbreaks, data loss, and multicloud AI workload security.

Prompt Injection Agent Security

OpenAI February 25, 2025 framework

Deep research System Card

OpenAI’s system card for deep research covers prompt injection, privacy, code execution, and external red teaming prior to release.

Model Evaluation Prompt Injection AI Compliance

OpenAI January 23, 2025 framework

Operator System Card

The Operator system card documents red teaming and mitigation choices for a computer-using agent, with prompt injections listed as a central risk area.

Agent Security Model Evaluation Prompt Injection AI Compliance

Microsoft Cloud Blog January 14, 2025 analysis

Enhancing AI safety: Insights and lessons from red teaming

Microsoft summarizes lessons from red teaming more than one hundred generative AI products, emphasizing system-level testing, human expertise, and automation.

AI Red Teaming Prompt Injection Agent Security

Microsoft Security Blog January 13, 2025 guide

3 takeaways from red teaming 100 generative AI products

Microsoft Security distills lessons from red teaming more than 100 generative AI products, including multimodal prompt injection and core cyber hygiene.

AI Red Teaming Prompt Injection

OWASP January 1, 2025 framework

OWASP Top 10 for Large Language Model Applications

OWASP’s GenAI security project remains a practical baseline for teams building or assessing LLM applications and agentic systems.

Prompt Injection Agent Security Adversarial ML

AI Explained October 10, 2024 video

AI - 2024AD: 212-page Report (from this morning) Fully Read w/ Highlights

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Prompt Engineering Prompt Injection

AI Explained February 8, 2024 video

Gemini Ultra - Full Review

This AI Explained video reviews a major AI development through the lens of scaling and compute economics. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering Prompt Injection

AI Explained December 3, 2023 video

OpenAI Insights and Training Data Shenanigans - 7 'Complicated' Developments + Guest Star

This AI Explained video reviews a major AI development through the lens of model capability and AI systems in practice. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering Prompt Injection

AI Explained July 30, 2023 video

11 Major AI Developments: RT-2 to '100X GPT-4'

This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering Prompt Injection AI Red Teaming

AI Explained June 25, 2023 video

ChatGPT's Achilles' Heel

This AI Explained video reviews a major AI development through the lens of scaling and compute economics. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering Prompt Injection
