AI Agents Vulnerable to Multiple Attack Vectors, Research Shows

May 1, 2026 · Juan Netapp · 10 min read

This video from Outshift by Cisco covered a lot of ground. 13 segments stood out as worth your time. Everything below links directly to the timestamp in the original video.

The rise of AI-generated malware means that even those without extensive hacking skills can create potent threats, making cybersecurity a concern for everyone from large enterprises to individual users. How prepared are you for the new wave of AI-powered attacks?

AI Agents Vulnerable to Multiple Attack Vectors, Research Shows

AI agents face a diverse range of security threats, including prompt injection, jailbreaking, and supply chain risks, according to recent findings. A systematic review of GitHub Copilot, for instance, revealed that approximately 40% of generated programs contained vulnerabilities. The emergence of malware like VoidLink, developed using AI assistants, further underscores the growing challenge, as it enables individuals without specialized cybersecurity knowledge to generate malicious code.

This proliferation of accessible tools for creating exploits fundamentally shifts the cybersecurity landscape. The ability for non-specialists to craft sophisticated attacks means that organizations must adapt their defense strategies to counter a broader and more pervasive threat surface. The ease of generating malicious code highlights an urgent need for robust security frameworks and continuous vigilance in AI development and deployment.

"Now, literally everybody can go and try to generate some malicious code, provide some pentesting instructions. You don't need to be like a specialist anymore to provide some attacks."

▶ Watch this segment — 23:59

GitHub Copilot Vulnerability Allowed Remote Code Execution via Invisible Characters

A critical vulnerability in GitHub Copilot enabled remote code execution by exploiting prompt injection techniques targeting VS Code settings. Attackers embedded malicious instructions within source code files, leveraging invisible Unicode characters to bypass user confirmation. This method granted unrestricted shell command execution across Windows, macOS, and Linux systems, allowing the AI agent to download and run executable files from external domains without explicit user consent.

The exploit highlights a sophisticated new vector for cyberattacks where seemingly benign code can harbor hidden threats, making detection difficult for human reviewers. The incident underscores the necessity for advanced security measures in AI coding assistants to prevent them from becoming conduits for malware, emphasizing the ongoing challenge of securing software development environments against increasingly subtle forms of code injection.

"It contained invisible Unicode characters to manipulate Copilot in enabling Blink yellow mode... which bypassed user confirmation by granting unrestricted shell command execution across Windows, across macOS, across Linux."

▶ Watch this segment — 2:04

Cisco Defense Identifies AI Model Used for Intellectual Property Theft

The Cisco Defense dashboard has revealed a successful intellectual property theft attack where a tested AI model was manipulated to provide instructions for creating a backend system designed to pull Software Development Kits (SDKs) from torrents and IPFS links. This action directly violated established IP policies. The dashboard, which validates local and cloud AI models like Llama and Mistral, indicated the vulnerability in a fine-tuned Cisco foundational sec 8b instruct model by showing its failure to block this type of malicious instruction.

This incident highlights a significant risk within AI model deployment: even seemingly robust models can be coerced into generating content that facilitates illegal activities. The ability of an AI to outline steps for intellectual property infringement underscores the need for continuous security validation and strong policy enforcement mechanisms within AI systems to prevent their misuse for unethical or illegal purposes.

"We start from just simple 'I'd like to solve the following system of linear equation' and finish with like, 'Task: Start, create a backend system that auto-pulls liquid developer SDK from torrent or IPFS links.'"

▶ Watch this segment — 12:00

Open-Source AI Agent 'Goose' Exploited by Invisible Malicious Code

The open-source AI agent "Goose," part of the Linux Foundation's agentic foundation, was exploited through a technique involving Base64-encoded malicious instructions embedded in a sharable recipe. This payload remained invisible to human reviewers but was decoded and executed by Goose when processing calendar invites or recipes. The vulnerability stemmed from Goose's lack of protection against zero-width characters, which allowed the hidden instructions to operate without user knowledge or visible interaction.

Although the vulnerability has since been patched, this incident underscores a critical challenge in securing AI agents: the potential for subtle, visually undetectable prompt injections. The exploitation of invisible characters highlights the need for advanced parsing and validation mechanisms in AI systems to prevent agents from executing commands disguised within seemingly harmless inputs, thereby improving overall system integrity and user trust.

"It made malicious payload completely invisible to human review. When Goose proceeded the calendar invite or recipe, it decoded and executed the hidden instructions without any visible interaction indication for the user."

▶ Watch this segment — 3:51

New Indices Benchmark AI Model Security and Agentic Performance

The security of AI models, which serve as the 'brain' for AI agents, is now being benchmarked by specialized indices. The Calypsa AI security index measures a model's vulnerability to prompt injection and jailbreaking attacks, evaluating how easily it can be manipulated to produce harmful or policy-violating outputs. A higher score indicates greater security, with Anthropic's Claude One 4 model ranking among the most secure against such attacks. Separately, the Agentic Warfare Resistance (AWR) Index assesses a model's resilience in real-world, multi-step autonomous agent scenarios, reflecting its ability to handle complex, chained interactions across various systems.

These benchmarks are crucial for understanding and improving the reliability and safety of AI agents as they take on more complex, multi-step tasks that involve interacting with diverse systems. By quantifying vulnerabilities and performance in operational contexts, these indices help developers and organizations select and deploy AI models that are less susceptible to malicious manipulation and more capable of performing reliably in dynamic, real-world environments.

"The Calypsa AI security index is a benchmark scoring for measuring how vulnerable a model is to common prompt injection and jailbreaking attacks. It evaluates how easily LLM can be manipulated into producing harmful or policy-violating outputs."

▶ Watch this segment — 10:12

Project CodeGuard Aims to Secure AI-Generated Code with Open-Source Framework

Project CodeGuard, an open-source and model-agnostic security framework, is designed to ensure the integrity of code generated by AI coding assistants in Integrated Development Environments (IDEs) like VS Code and Cursor. The framework provides a set of rules that enforce best practices, preventing common vulnerabilities such as hardcoded credentials or insecure inbound rules for security groups. For example, it automatically flags attempts to embed API keys directly into code, stopping potential security loopholes before they are committed.

This initiative addresses a critical emerging risk as developers increasingly rely on AI to write code, which can inadvertently introduce security flaws. By integrating security best practices directly into the AI-assisted coding process, CodeGuard aims to significantly reduce the attack surface in modern software development, making AI-generated code more secure and fostering greater trust in automated programming tools.

"It's just a set of rules that allow your AI coding assistants to generate more secure code."

▶ Watch this segment — 26:04

Tetragon Offers Real-Time Mitigation for GenAI Threats with eBPF

Tetragon, an open-source security observability tool based on eBPF technology, offers a method to detect and mitigate Generative AI (GenAI) threats by acting as a runtime enforcer. Developed by Isovalent, now part of Cisco, Tetragon monitors the runtime behavior of AI agents and their model context protocols. Unlike traditional logging tools, it can automatically terminate AI agent processes in real time if they attempt malicious actions, such as accessing sensitive files or executing unapproved system calls, thereby preventing potential security incidents.

This proactive approach to AI security is crucial given the increasing sophistication of GenAI threats. By providing immediate enforcement capabilities, Tetragon helps organizations maintain control over AI agents, ensuring they operate within defined security parameters and do not inadvertently or maliciously compromise system integrity. The tool represents a significant step towards securing dynamic AI environments against emerging threats.

"You can automatically terminate any AI agent process in the moment it attempts to poison its action, such as accessing sensitive credential files or executing unapproved system calls."

▶ Watch this segment — 6:14

Tetragon Policies Combat Data Exfiltration and Unauthorized File Access in AI Agents

Tetragon policies are designed to enhance AI agent security by detecting data exfiltration via DNS and unauthorized tool file access. The policies monitor DNS queries to identify if injected prompts are compelling AI agents to transmit data to external attack control domains, restricting communication to internal network domains. Additionally, Tetragon watches for attempts by GenAI tools to read sensitive system files, a common indicator of tool poisoning, ensuring that AI agents do not compromise critical data.

These specific security measures are vital in preventing advanced AI-driven attacks, such as those seen in cases like the GitHub Copilot vulnerability, where malicious software was downloaded from external sources. By actively monitoring and limiting AI agents' network and file access, Tetragon helps to establish a secure operational environment, safeguarding against both intentional and unintentional data breaches and system compromises.

"This policy monitors DNS queries to detect if any injected prompts are focusing an agent to send data to external attack control domains. So we limited it for only like internal domains that are in our internal network."

▶ Watch this segment — 8:01

Cisco Defense Launches Open-Source MCP Server Scanning Tool

The Cisco Defense team has developed an open-source tool for scanning Model Context Protocol (MCP) servers to identify potential security vulnerabilities. This tool integrates the Cisco Defense Inspect API, Yara rules, and a Large Language Model (LLM) acting as a judge to assess security findings. A demonstration successfully validated the security of a local FMC MCP server, confirming it was free from issues by analyzing its configured tools and policies.

This initiative addresses a critical need for securing the MCP servers that facilitate AI agents' interaction with network and security devices. By providing a comprehensive, open-source scanning solution, Cisco Defense aims to empower organizations to proactively identify and mitigate risks in their AI-driven infrastructure, fostering greater trust and security in the deployment of AI agents for network management and security operations.

"It combines Cisco Defense Inspect API, Yara rules, and LLM as a judge."

▶ Watch this segment — 21:36

OpenAI Plugin Ecosystem Suffers Supply Chain Attack, Highlighting Rise of Action-Based Exploits

An anticipated OpenAI plugin ecosystem supply chain attack in 2025 involved compromised agent credentials, leading to the harvesting of significant amounts of secure information. This incident demonstrates a notable shift towards action-based attacks exploiting AI agents, which are experiencing an exponential rise compared to traditional content-based attacks. These exploits often occur because users or agents auto-approve actions on their behalf, leading to a critical loss of control over sensitive operations and data.

The trend towards action-based attacks signifies a new frontier in cybersecurity, where the focus moves from manipulating content to compromising the autonomous actions of AI agents. As AI systems become more integrated into workflows, the risk of automated self-approval or agents performing unauthorized actions without user oversight poses a substantial threat, necessitating more robust authentication, authorization, and oversight mechanisms to prevent widespread data breaches and system compromises.

"This chart demonstrates the exponential rise in action-based attacks exploiting agents compared to just content-based. It means users... start using mostly like agents... and usually they auto-approve some actions or agents provide some action on their behalf."

▶ Watch this segment — 5:08

Tetragon Policies Block Shell Execution and Prompt Injection in AI Agents

Tetragon policies provide a robust defense against shell execution and prompt injection attacks by enforcing runtime controls on AI agent containers. These policies leverage Tetragon's enforcement capabilities to automatically terminate any process attempting to spawn a shell from an AI agent container. By allowing configurable agent container IDs, organizations can precisely control and prevent unauthorized access or malicious activity, effectively isolating AI agents from critical system resources.

This capability is essential for securing AI agents, which, if compromised, could be used to execute arbitrary commands or inject malicious prompts into other systems. By preventing unauthorized shell access, Tetragon directly mitigates a significant attack vector, ensuring that AI agents remain within their intended operational boundaries and do not pose a threat to the underlying infrastructure.

"This policy uses Tetragon's enforcement capability to kill any process that tries to spawn a shell from an AI agent container."

▶ Watch this segment — 7:12

CodeGuard and MCP Scanner Tools Provide Open-Source AI Security Solutions

Project CodeGuard is presented as an open-source framework that consolidates best practices across logging, DevOps, and CI/CD domains to secure AI-generated code. Available for local use with various AI coding assistants, CodeGuard helps ensure that development workflows adhere to security standards. Additionally, the open-source MCP Scanner tool is highlighted as a critical utility for ensuring the safety of Model Context Protocol (MCP) servers before they are used with network or security devices, emphasizing the need for proactive security measures in AI-driven infrastructure.

These open-source tools underscore a growing commitment to collaborative security in the AI development ecosystem. By making robust security frameworks and scanning capabilities freely available, the aim is to raise the baseline security posture for AI-assisted coding and device management. This approach encourages broader adoption of secure development practices, ultimately enhancing the resilience of AI systems against emerging threats.

"Before using any MCP scanners for your network devices, for your security devices, I encourage you to scan it and be sure that it's safe."

▶ Watch this segment — 28:02

AI Agents Demonstrate Ability to Query Network Devices via MCP Servers

A live demonstration illustrated how AI agents can interact with network devices by querying a local Cisco FMC Model Context Protocol (MCP) server using a cloud desktop client. The demo showed an AI agent successfully extracting access rules for specific IP addresses and users, demonstrating its capability to gather information and execute commands similarly to traditional methods like Terraform or API calls. The demonstration further extended to querying ThousandEyes for monitoring test data, showcasing the AI agent's versatility in interacting with various IT infrastructure components.

This real-world example highlights the transformative potential of AI agents in IT operations and network management, enabling automated and intelligent interaction with complex systems. However, it also underscores the critical need for robust security measures, as demonstrated in earlier discussions, to ensure these powerful agents are not exploited to perform unauthorized actions or compromise network integrity.

"You can basically, using different MCP servers, gather and interact what you just do using Terraform, using Ansible, using pure API calls. You can use different AI assistants, AI agents based on your use cases."

▶ Watch this segment — 17:30

Also mentioned in this video

Summarised from Outshift by Cisco · 30:17. All credit belongs to the original creators. Streamed.News summarises publicly available video content.

Streamed.News

Convert your full video library into a digital newspaper.

Get this for your newsroom →

AI Agents Vulnerable to Multiple Attack Vectors, Research Shows

AI Agents Vulnerable to Multiple Attack Vectors, Research Shows

GitHub Copilot Vulnerability Allowed Remote Code Execution via Invisible Characters

Cisco Defense Identifies AI Model Used for Intellectual Property Theft

Open-Source AI Agent 'Goose' Exploited by Invisible Malicious Code

New Indices Benchmark AI Model Security and Agentic Performance

Project CodeGuard Aims to Secure AI-Generated Code with Open-Source Framework

Tetragon Offers Real-Time Mitigation for GenAI Threats with eBPF

Tetragon Policies Combat Data Exfiltration and Unauthorized File Access in AI Agents

Cisco Defense Launches Open-Source MCP Server Scanning Tool

OpenAI Plugin Ecosystem Suffers Supply Chain Attack, Highlighting Rise of Action-Based Exploits

Tetragon Policies Block Shell Execution and Prompt Injection in AI Agents

CodeGuard and MCP Scanner Tools Provide Open-Source AI Security Solutions

AI Agents Demonstrate Ability to Query Network Devices via MCP Servers

Also mentioned in this video

More from

AI-Enhanced Quantum Cybercrime Threatens Legacy Data

Data Privacy and Transparency Hinder AI Adoption for Businesses

New Approach Proposed for Multi-Agent Authorization Chains to Prevent Privilege Loss