#security-research#behavioral-security#ai-agents#mcp#honeypot

The attack surface is widening. The attackers are narrowing.

Abdel Fane

June 15, 2026

6 min read

ARIA, our autonomous research system, finished its second month of behavioral instrumentation and we published Issue 2 today.

State of AI Agent Security: A Surface in Migration. Behavioral Threat Report Issue 2

The numbers anchor on a 30-day window, May 16 to June 15, across four data streams. ARIAscout ran a fresh Shodan sweep on June 14 and counted 320,506 exposed AI services, up from 297,723 a month earlier. Inside the honeypot ecosystem, TrapMyAgent captured 106,943 honey-agent events in the window. AgentPwn logged 3,163 payload callbacks across 244,145 honeypot-page interactions. HoneyMap sampled 446 surfaces of injection bait planted on the open web by third parties.

The surface is getting bigger. Attacker attention is getting more selective. Both are true at the same time, and the gap between them is the story.

This is what we found.

97.9%

Events hit MCP

320,506

Exposed AI services

3.2x

Ollama exposure growth

41%

Attackers returned

The headline finding is a divergence

A month ago, three of every four attacker events we observed targeted the Model Context Protocol. This window it was 104,712 of 106,943, or 97.9 percent. The attackers didn't spread out as the surface grew. They concentrated.

At the same time the exposed surface widened. Total exposed AI services rose by more than 22,000 in a month. The growth wasn't evenly spread either: it piled up on model-serving infrastructure that mostly ships without authentication.

More doors are opening, and the attackers are walking through fewer of them, harder.

Some of that contraction is mechanical, and we checked before we leaned on it. May's window carried a single large recurring campaign, and a campaign that doesn't come back takes its volume with it. We still read the narrowing as real, because two independent signals moved the same way over the same window. AgentPwn page interactions rose to 244,145, and the MCP share sharpened to 97.9 percent. A campaign just leaving wouldn't pull both of those with it.

The attackers are narrowing onto MCP

MCP went from 75.4 percent of observed events a month ago to 97.9 percent this window. Agent-to-Agent handshakes moved the other way, from 15.2 percent to 1.2 percent. The MITRE T1550 count, Use Alternate Authentication Material, fell with them to 99, because in our classifier that signal is tied to A2A trust escalation.

A receding observation isn't a closed surface. ARIAscout still counts exposed A2A endpoints, so the right read is that attacker attention rotated, not that A2A got safe. If you ship an MCP server, this is the surface to harden this month: require authentication, rate limit tool discovery, and reject tool definitions that carry unicode tag-block sequences. That last point isn't hypothetical. Of the 446 wild bait surfaces HoneyMap sampled, 204, or 46 percent, carried a single unicode tag-block injection signature, the most common one we saw. The bait arrives months before the campaign that uses it. The recon runs ahead, so the time to filter for that signature is now, not when something weaponizes it.

The surface is widening underneath them

The fastest-growing exposure is model-serving infrastructure. ARIAscout counted 83,465 exposed Ollama instances on June 14, up from 25,705 a month earlier, and 31,773 exposed MLflow tracking servers, up from 11,620. Both surfaces commonly ship with no authentication at all.

Those two categories alone added more than 77,000 exposed endpoints, but the net total rose by only about 22,000, even as OpenClaw gateway exposure fell. So the headline number understates what happened. This wasn't uniform growth. It was a relocation: exposure drained from gateways and piled onto raw inference and experiment-tracking servers.

That's the migration in one sentence. Exposure is moving toward raw inference while attacker events concentrate on the protocol that brokers agent tool use. The exposed inference servers aren't yet where the observed attacker volume is. They're where it goes next. Put unauthenticated inference and experiment-tracking endpoints behind authentication and network policy before they're enumerated.

We almost misread our own data

When the June numbers first came in, MCP looked like it had collapsed from 75 percent to 35 percent. It hadn't. The drop was an artifact of how we were counting.

Our backend returns an all-time estimate for total events but computes every distribution over the trailing 30 days. In May the fleet was about 30 days old, so the all-time total and the 30-day window were effectively the same number, and the percentages lined up. By June the fleet was 60 days old. The all-time estimate, more than 300,000 events, was nearly three times the 30-day window of 106,943, because May's campaign had aged out of the window. Dividing 30-day distributions by an all-time total is what manufactured the fake collapse.

On the window basis, the one that matches the distributions and matches May's effective basis, MCP is 97.9 percent. We report on that basis and carry the all-time figure separately as a floor. We're telling you this because a behavioral report that can't explain its own denominator isn't worth reading, and the easiest number to get wrong is the one that confirms the headline you wanted.

Plan for persistence

Of 6,232 distinct fingerprints in the window, 2,575 came back, which is 41 percent. The top fingerprint returned across 793 sessions between May 21 and June 14, a 24-day span.

If your threat model assumes attacker activity is one-shot reconnaissance, it's wrong. The recurring visitors aren't just observing. The behavior is consistent with sampling response variability and building per-property knowledge to act on later. It also compounds the concentration: attackers that come back come back to the surface they have already mapped, and right now that surface is MCP. Fingerprint-stable telemetry that survives short-lived IP rotation is the minimum condition to measure any of this, and most defenders don't have it yet.

What this data isn't

99.5 percent of sessions classified as automated scanners. This is reconnaissance traffic at scale, not a wave of hands-on-keyboard intrusions, and we're not going to dress it up as one. The value is in the shape of the automation, where it concentrates and what it returns to, not in an APT narrative the data doesn't support.

We're also not naming a vendor or a model from user-agent strings. They're attacker-controllable and they're not an identity claim. Where we report user-agent data, it's aggregated by category, never attributed to a product.

Where the surface goes next

The migration is the thing to watch. This month the exposure relocated toward raw inference and the attackers concentrated onto MCP, and those two movements haven't met yet. The inference servers are filling up while the observed volume stays on the protocol. Next month tells us whether attention follows the surface to the inference endpoints, or whether MCP stays the whole game. Either way, the doors that opened this month are the ones to close first. And the wild bait is that same pattern one step earlier. The recon shows up before the campaign does, so the signatures in this window are worth filtering for now, not after something weaponizes them.

Read the full report

The web version is at research.opena2a.org/reports/state-of-ai-agent-security-2026-06. It includes the country and cloud-provider breakdowns, the MITRE ATT&CK overlay, the HoneyMap signature table, and a six-item defender checklist with Agent Threat Matrix and OASB control mappings.

Methodology is at research.opena2a.org/methodology. Every number traces to a query against live instrumentation over the stated window. Nothing is estimated, modeled, or projected.

If you want to dispute a finding, email info@opena2a.org with the specific number and the methodology you'd prefer. Substantive challenges that hold up get published as methodology updates with attribution. The next issue lands July 15.

ARIA is OpenA2A's autonomous research system. Editorial review by Abdel Fane. Disclosure of authorship is a credibility move, not a limitation.

License: Apache 2.0. Cite using the BibTeX block in the report.