Prompt Injection
Prompt injection attacks, mitigations, detection, and design patterns for safer AI applications.
Prompt injection is the core attack pattern in modern AI applications. It happens when a model treats malicious or conflicting instructions from users, retrieved content, documents, tools, or pages as trusted guidance and changes its behavior in response.
What this page covers
- Direct, indirect, and cross-context prompt injection
- How documents, web content, and tool output become attack carriers
- Why prompt injection is a workflow problem as much as a model problem
- Trust boundaries between instructions, content, tools, and actions
- Approvals, isolation, and scoped permissions for agent behavior
- Detection and monitoring patterns when prompt controls fail
Who it is for
- Agent builders and platform engineers
- Readers studying retrieval or tool-enabled products
- Leaders who need practical language for why this risk matters
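The ideas of trust boundaries, approvals, and scoped permissions listed above can be made concrete with a small sketch. This is a minimal illustration under stated assumptions, not a reference implementation: the tool names, the two policy sets, and the `decide` function are all hypothetical, and the sketch assumes the application can tell whether untrusted content (documents, web pages, tool output) has entered the model's context.

```python
from dataclasses import dataclass

# Hypothetical tool names and policy sets, for illustration only.
READ_ONLY_TOOLS = {"search_docs", "read_file"}
SIDE_EFFECT_TOOLS = {"send_email", "create_ticket"}


@dataclass(frozen=True)
class ToolCall:
    name: str
    # True when the model's context includes text the application does not
    # control, such as retrieved documents, web pages, or tool output.
    context_has_untrusted_content: bool


def decide(call: ToolCall) -> str:
    """Return "deny", "ask_human", or "allow" for a proposed tool call."""
    if call.name in READ_ONLY_TOOLS:
        return "allow"
    if call.name in SIDE_EFFECT_TOOLS:
        # Untrusted content may carry injected instructions, so a human
        # confirms any side-effecting action before it runs.
        return "ask_human" if call.context_has_untrusted_content else "allow"
    # Anything outside the scoped permission set is refused outright.
    return "deny"
```

For example, `decide(ToolCall("send_email", context_has_untrusted_content=True))` returns `"ask_human"`, while a call to a tool in neither set is denied. The gate sits in the workflow rather than in the prompt, so it still holds if the model itself has been persuaded by injected text.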
Current notes, events, and source material
These items are included because they add useful evidence, framing, implementation detail, or upcoming context for teams working in this area.
DEF CON 34 / AI Village 2026
DEF CON 34 takes place in Las Vegas and is expected to include AI security activity through villages, workshops, contests, and community-led research tracks as schedules firm up.
Powering the next era of Confidential AI
We are thrilled to collaborate with Apple on its expanded Private Cloud Compute (PCC) systems announced this week at WWDC 2026.
New Attacks Trick OpenClaw AI Agent Into Running Code and Leaking Secrets
Two security teams have shown, in separate research published this week, that OpenClaw, the popular self-hosted AI agent, can be driven to run attacker-controlled code or hand over sensitive data through ordinary-looking inputs. Imperva buried instructions inside shared contacts, vCards, and location pins that the agen…
Turn specs into evals for any agent with ASSERT
Adaptive Spec-driven Scoring for Evaluation and Regression Testing (ASSERT) is an open-source framework for converting natural language behavior requirements into executable evaluations of AI models and agents.
Anthropic Releases Claude Fable 5, Its Most Powerful AI Yet, With Cyber Safeguards
On June 9, Anthropic released Claude Fable 5, the most capable model it has ever made, generally available. It also did something unusual: it shipped one model as two products, split not by capability but by a layer of safety classifiers. Fable 5 goes to the public. Its twin, Claude Mythos 5, the same underlying model…
Detecting and containing AI-powered threats with Google Security Operations agents
Learn how Google Security Operations works in concert with AI Threat Defense to monitor, detect, and respond to threats, particularly from code you do not own or cannot patch.
Reconstructing AI activity in investigations
Learn how to investigate AI activity in Microsoft 365 Copilot and Azure AI services using a structured, telemetry-driven approach. This playbook helps security teams reconstruct events, assess data exposure, and detect potential threats faster.
AI brands as bait: How threat actors are using the AI hype in social engineering
As threat actors operationalize AI to accelerate attacks, they are also leveraging the wider global interest around AI itself as a social engineering lure.
Securing CI/CD in an agentic world: Claude Code Github action case
Microsoft Threat Intelligence identified a prompt injection pathway in Claude Code GitHub Action that allowed access to workflow secrets under specific conditions. This research examines the attack chain, responsible disclosure process, Anthropic's mitigation, and guidance for securing AI-powered CI/CD workflows.
Updating the taxonomy of failure modes in agentic AI systems: What a year of red teaming taught us
A surge in real-world attacks against agentic AI systems is reshaping how we think about risk. Based on 12 months of red teaming, this update introduces seven new failure modes, from supply chain compromise to goal hijacking, and the practical mitigations teams need now.
Preinstall to persistence: Inside the Red Hat npm Miasma credential-stealing campaign
A large-scale npm supply chain attack compromised over 90 versions of @redhat-cloud-services packages, silently infecting CI/CD environments and developer systems. The malicious code steals credentials from GitHub, cloud platforms, and local machines, then spreads like a worm by republishing trusted packages.
Microsoft Build 2026: Securing code, agents, and models across the development lifecycle
Discover how Microsoft enables fast, secure AI development with MDASH and new security capabilities.
Cloud CISO Perspectives: How to build an AI-ready security program for the public sector
From industrial control systems to decades-old municipal databases, here’s our CISO guidance to prep AI-ready security programs for the public sector.
Attacking AI - Jason Haddix - NDC Security 2026
Attacking AI is a one-of-a-kind session releasing case studies, tactics, and methodology from Arcanum’s AI assessments in 2024 and 2025. While most AI assessment material focuses on academic AI red team content, “Attacking AI” is focused on the task of assessing AI-enabled systems.
Detecting and analyzing prompt abuse in AI tools
Microsoft Incident Response explains how to detect prompt abuse using logging, telemetry, and incident response workflows.
Designing AI agents to resist prompt injection
OpenAI frames prompt injection as an agent-security problem that increasingly resembles social engineering rather than simple string matching.
OpenAI to acquire Promptfoo
OpenAI announced plans to acquire Promptfoo, highlighting automated AI security testing, red teaming, and evaluation as core enterprise requirements.
Continuously hardening ChatGPT Atlas against prompt injection attacks
OpenAI describes using automated red teaming and reinforcement learning to discover agent prompt injection attacks before they appear in the wild.
Building a Production-Ready AI Security Foundation
Google Cloud outlines a defense-in-depth view of AI security spanning application controls, data protections, and infrastructure isolation.
Is GPT-5.1 Really an Upgrade? But Models Can Auto-Hack Govts, so … there’s that
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
Understanding prompt injections: a frontier security challenge
An accessible explanation of prompt injection risk in real AI products, including how third-party content can redirect or manipulate agent behavior.
Announcing AI Protection: Security for the AI era
Google introduced AI Protection and Model Armor to address prompt injection, jailbreaks, data loss, and multicloud AI workload security.
Deep research System Card
OpenAI’s system card for deep research covers prompt injection, privacy, code execution, and external red teaming prior to release.
Operator System Card
The Operator system card documents red teaming and mitigation choices for a computer-using agent, with prompt injections listed as a central risk area.
Enhancing AI safety: Insights and lessons from red teaming
Microsoft summarizes lessons from red teaming more than one hundred generative AI products, emphasizing system-level testing, human expertise, and automation.
3 takeaways from red teaming 100 generative AI products
Microsoft Security distills lessons from red teaming more than 100 generative AI products, including multimodal prompt injection and core cyber hygiene.
OWASP Top 10 for Large Language Model Applications
OWASP’s GenAI security project remains a practical baseline for teams building or assessing LLM applications and agentic systems.
AI - 2024AD: 212-page Report (from this morning) Fully Read w/ Highlights
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Gemini Ultra - Full Review
This AI Explained video reviews a major AI development through the lens of scaling and compute economics. It is useful context for AI engineering, evaluation, governance, and operational risk.
OpenAI Insights and Training Data Shenanigans - 7 'Complicated' Developments + Guest Star
This AI Explained video reviews a major AI development through the lens of model capability and AI systems in practice. It is useful context for AI engineering, evaluation, governance, and operational risk.
11 Major AI Developments: RT-2 to '100X GPT-4'
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
ChatGPT's Achilles' Heel
This AI Explained video reviews a major AI development through the lens of scaling and compute economics. It is useful context for AI engineering, evaluation, governance, and operational risk.