#security-research#behavioral-security#ai-agents#mcp#honeypot

The attack surface is widening. The attackers are narrowing.

Abdel Fane
6 min read

ARIA, our autonomous research system, finished its second month of behavioral instrumentation and we published Issue 2 today.

State of AI Agent Security: A Surface in Migration. Behavioral Threat Report Issue 2

The numbers anchor on a 30-day window, May 16 to June 15, across four data streams. ARIAscout ran a fresh Shodan sweep on June 14 and counted 320,506 exposed AI services, up from 297,723 a month earlier. Inside the honeypot ecosystem, TrapMyAgent captured 106,943 honey-agent events in the window. AgentPwn logged 3,163 payload callbacks across 244,145 honeypot-page interactions. HoneyFinder sampled 446 surfaces of injection bait planted on the open web by third parties.

The surface is getting bigger. Attacker attention is getting more selective. Both are true at the same time, and the gap between them is the story.

This is what we found.

97.9%
Events hit MCP
320,506
Exposed AI services
3.2x
Ollama exposure growth
41%
Attackers returned

The headline finding is a divergence

A month ago, three of every four attacker events we observed targeted the Model Context Protocol. This window it was 104,712 of 106,943, or 97.9 percent. The attackers did not spread out as the surface grew. They concentrated.

At the same time the exposed surface widened. Total exposed AI services rose by more than 22,000 in a month. The growth was not evenly spread either: it piled up on model-serving infrastructure that mostly ships without authentication.

More doors are opening, and the attackers are walking through fewer of them, harder.

The attackers are narrowing onto MCP

MCP went from 75.4 percent of observed events a month ago to 97.9 percent this window. Agent-to-Agent handshakes moved the other way, from 15.2 percent to 1.2 percent. The MITRE T1550 count, Use Alternate Authentication Material, fell with them to 99, because in our classifier that signal is tied to A2A trust escalation.

A receding observation is not a closed surface. ARIAscout still counts exposed A2A endpoints, so the right read is that attacker attention rotated, not that A2A got safe. If you ship an MCP server, this is the surface to harden this month: require authentication, rate limit tool discovery, and reject tool definitions that carry unicode tag-block sequences. That last point is not hypothetical. Of the 446 wild bait surfaces HoneyFinder sampled, 204, or 46 percent, carried a single unicode tag-block injection signature, the most common one we saw.

The surface is widening underneath them

The fastest-growing exposure is model-serving infrastructure. ARIAscout counted 83,465 exposed Ollama instances on June 14, up from 25,705 a month earlier, and 31,773 exposed MLflow tracking servers, up from 11,620. Both surfaces commonly ship with no authentication at all.

Those two categories alone added more than 77,000 exposed endpoints, but the net total rose by only about 22,000, even as OpenClaw gateway exposure fell. So the headline number understates what happened. This was not uniform growth. It was a relocation: exposure drained from gateways and piled onto raw inference and experiment-tracking servers.

That is the migration in one sentence. Exposure is moving toward raw inference while attacker events concentrate on the protocol that brokers agent tool use. The exposed inference servers are not yet where the observed attacker volume is. They are where it goes next. Put unauthenticated inference and experiment-tracking endpoints behind authentication and network policy before they are enumerated.

We almost misread our own data

When the June numbers first came in, MCP looked like it had collapsed from 75 percent to 35 percent. It had not. The drop was an artifact of how we were counting.

Our backend returns an all-time estimate for total events but computes every distribution over the trailing 30 days. In May the fleet was about 30 days old, so the all-time total and the 30-day window were effectively the same number, and the percentages lined up. By June the fleet was 60 days old. The all-time estimate, more than 300,000 events, was nearly three times the 30-day window of 106,943, because May's campaign had aged out of the window. Dividing 30-day distributions by an all-time total is what manufactured the fake collapse.

On the window basis, the one that matches the distributions and matches May's effective basis, MCP is 97.9 percent. We report on that basis and carry the all-time figure separately as a floor. We are telling you this because a behavioral report that cannot explain its own denominator is not worth reading, and the easiest number to get wrong is the one that confirms the headline you wanted.

Plan for persistence

Of 6,232 distinct fingerprints in the window, 2,575 came back, which is 41 percent. The top fingerprint returned across 793 sessions between May 21 and June 14, a 24-day span.

If your threat model assumes attacker activity is one-shot reconnaissance, it is wrong. The recurring visitors are not just observing. The behavior is consistent with sampling response variability and building per-property knowledge to act on later. It also compounds the concentration: attackers that come back come back to the surface they have already mapped, and right now that surface is MCP. Fingerprint-stable telemetry that survives short-lived IP rotation is the minimum condition to measure any of this, and most defenders do not have it yet.

What this data is not

99.5 percent of sessions classified as automated scanners. This is reconnaissance traffic at scale, not a wave of hands-on-keyboard intrusions, and we are not going to dress it up as one. The value is in the shape of the automation, where it concentrates and what it returns to, not in an APT narrative the data does not support.

We are also not naming a vendor or a model from user-agent strings. They are attacker-controllable and they are not an identity claim. Where we report user-agent data, it is aggregated by category, never attributed to a product.

Where the surface goes next

The migration is the thing to watch. This month the exposure relocated toward raw inference and the attackers concentrated onto MCP, and those two movements have not met yet. The inference servers are filling up while the observed volume stays on the protocol. Next month tells us whether attention follows the surface to the inference endpoints, or whether MCP stays the whole game. Either way, the doors that opened this month are the ones to close first.

Read the full report

The web version is at research.opena2a.org/reports/state-of-ai-agent-security-2026-06. It includes the country and cloud-provider breakdowns, the MITRE ATT&CK overlay, the HoneyFinder signature table, and a six-item defender checklist with Agent Threat Matrix and OASB control mappings.

Methodology is at research.opena2a.org/methodology. Every number traces to a query against live instrumentation over the stated window. Nothing is estimated, modeled, or projected.

If you want to dispute a finding, email info@opena2a.org with the specific number and the methodology you would prefer. Substantive challenges that hold up get published as methodology updates with attribution. The next issue lands July 15.

ARIA is OpenA2A's autonomous research system. Editorial review by Abdel Fane. Disclosure of authorship is a credibility move, not a limitation.

License: Apache 2.0. Cite using the BibTeX block in the report.