Observability

Make Every Agent Authorization a Span

OpenA2A Team

June 11, 2026

#observability #opentelemetry #agent-authorization #aim

An AI agent charges a card it should not have, or it gets blocked at the worst possible moment, and the on-call engineer asks the one question that matters: why was it allowed, or why was it denied? For most agent systems the honest answer is that the decision isn't recorded anywhere. Authorization was a branch in the code that returned true or false and moved on.

You already trace the HTTP request that triggered the agent. The authorization decision the agent made next deserves the same trace. So we made it one.

The decision that matters most is the one nobody records

Web infrastructure solved this for requests a long time ago. A request enters, and you can follow it as a trace across every service it touches, with the latency and the outcome of each hop attached. When something breaks, you open the trace and read what happened.

Agent authorization is the decision with the highest blast radius and the least instrumentation. Whether an agent may charge a card, read a table, or call a tool is exactly the thing you'll want to reconstruct after an incident, and it's exactly the thing that usually leaves no structured record. The check ran in process, returned a boolean, and the reasoning evaporated.

“Why was the agent allowed?” should be a trace lookup, not an investigation.

Every check leaves a span, so the trace names the one that decided

In AIM, an action runs through five authorization checks before it executes: capability, attribute, context, chain, and intent. The backend emits that decision as one OpenTelemetry span, with a child span for each check. The trace is the decision, and reading it tells you which check allowed or denied the action and how long each one took.

fga.authorize            agent.id, agent.capability, fga.outcome, fga.denied_by
├── fga.capability_check  is the action inside the agent's grant?
├── fga.attribute_check   do the request attributes match policy?
├── fga.context_check     is the call consistent with the agent's context?
├── fga.chain_check       is the delegation chain intact?
└── fga.intent_check      on high-risk actions, the local classifier checks intent

The parent span carries the signals you'd actually filter on: which agent, which capability, the outcome, and, when the action was denied, the exact check that denied it. The trust score, drift signal, and scan verdict ride along too, so the agent's security posture at the moment of the decision is part of the same record, not a separate lookup.
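To make the shape concrete, here is a minimal stdlib-only Python sketch of one decision as a parent record with a child per check. It models the span tree and is neither AIM's implementation nor the OpenTelemetry SDK. The `Span` class, the `authorize` function, the `passed` key on child spans, the `"DENY"` outcome value, the use of the child span name as the `fga.denied_by` value, and the stop-at-first-denial behavior are all assumptions made for illustration. Only the span names and the parent attribute keys come from the tree above.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Check order as listed in the span tree above.
CHECKS = ["capability", "attribute", "context", "chain", "intent"]


@dataclass
class Span:
    """Hypothetical stand-in for a tracing span: name, attributes, children."""
    name: str
    attributes: Dict[str, object] = field(default_factory=dict)
    children: List["Span"] = field(default_factory=list)
    duration_ms: float = 0.0


def authorize(agent_id: str, capability: str,
              checks: Dict[str, Callable[[], bool]]) -> Span:
    """Run each check in order and record it as a child of fga.authorize."""
    parent = Span("fga.authorize", {
        "agent.id": agent_id,
        "agent.capability": capability,
    })
    started = time.perf_counter()
    denied_by: Optional[str] = None
    for check in CHECKS:
        child = Span(f"fga.{check}_check")
        t0 = time.perf_counter()
        passed = checks[check]()
        child.duration_ms = (time.perf_counter() - t0) * 1000
        child.attributes["passed"] = passed  # hypothetical key, not from the proposal
        parent.children.append(child)
        if not passed:
            # Assumption: evaluation stops at the first denying check.
            denied_by = child.name
            break
    parent.duration_ms = (time.perf_counter() - started) * 1000
    parent.attributes["fga.outcome"] = "DENY" if denied_by else "ALLOW"
    if denied_by:
        parent.attributes["fga.denied_by"] = denied_by
    return parent


# Example: every check passes except the delegation-chain check.
decision = authorize(
    "agent-123", "payments.charge",
    {
        "capability": lambda: True,
        "attribute": lambda: True,
        "context": lambda: True,
        "chain": lambda: False,
        "intent": lambda: True,
    },
)
print(decision.attributes["fga.outcome"], decision.attributes.get("fga.denied_by"))
```

In this sketch the parent alone answers "allowed or denied, and by what", while the children keep the per-check timing, which mirrors how you would read the real trace.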

Standard names mean it lands in the stack you already run

Emitting a span is only useful if your tools understand it. The attribute names AIM uses aren't invented for AIM. They come from a semantic-conventions proposal we authored for agent identity and authorization, so the same trace is queryable in any OpenTelemetry-compatible backend without custom parsing. Denied authorizations for one agent, in TraceQL:

{name="fga.authorize" && .agent.id="<uuid>" && .fga.outcome != "ALLOW"}
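If you want to run that filter from a script rather than the Grafana UI, the query string can be built and sent to Tempo's TraceQL search endpoint (`GET /api/search?q=...`). The sketch below only constructs the query and the URL. The helper names are hypothetical, and the base URL `http://localhost:3200` is an assumption about where Tempo listens in your setup, so check your own deployment before relying on it.

```python
import uuid
from urllib.parse import urlencode


def denied_query(agent_id: str) -> str:
    """Build the TraceQL filter shown above for one agent."""
    # Parsing as a UUID rejects anything that could break out of the string literal.
    safe_id = str(uuid.UUID(agent_id))
    return (
        '{name="fga.authorize" && .agent.id="%s" && .fga.outcome != "ALLOW"}'
        % safe_id
    )


def search_url(base_url: str, traceql: str, limit: int = 20) -> str:
    """URL for a TraceQL search against Tempo's HTTP API."""
    params = urlencode({"q": traceql, "limit": limit})
    return f"{base_url.rstrip('/')}/api/search?{params}"


# Placeholder agent id and an assumed local Tempo address.
query = denied_query("3f2b9c1e-0000-4000-8000-000000000000")
url = search_url("http://localhost:3200", query)
print(url)
```

Fetching `url` with any HTTP client should return matching traces as JSON, provided Tempo is reachable at that address.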

The same decisions are emitted as metrics and logs that share those attribute keys, so a Prometheus counter, a Tempo trace, and a Loki log line for one decision line up on agent.id and fga.outcome. Nothing here is a dashboard you have to adopt. It's your dashboard, with agent decisions in it.
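The idea of one attribute set feeding several signals can be sketched in a few lines. This is an illustration only: a plain `Counter` stands in for a Prometheus counter, the standard `logging` module stands in for the log pipeline, and `record_decision` and the `event` field are hypothetical names, not AIM's API.

```python
import json
import logging
from collections import Counter

# Assumption: a plain Counter stands in for a Prometheus counter here.
decisions_total: Counter = Counter()
log = logging.getLogger("fga")


def record_decision(agent_id: str, outcome: str) -> str:
    """Emit one metric increment and one log line that share attribute keys."""
    attrs = {"agent.id": agent_id, "fga.outcome": outcome}
    # Metric: the label set is built from the same keys (sorted for a stable key).
    decisions_total[tuple(sorted(attrs.items()))] += 1
    # Log: a structured JSON line carrying the same keys.
    line = json.dumps({"event": "fga.authorize", **attrs}, sort_keys=True)
    log.info(line)
    return line


line = record_decision("agent-123", "DENY")
print(line)
```

Because both signals are derived from the same `attrs` dictionary, a filter on `agent.id` and `fga.outcome` selects the same decision in either one.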

What's solid and what's still moving

The tracing is real and runs today. The naming isn't a standard yet, and we're not going to call it one. The attribute names follow our proposal, which is a candidate for OpenTelemetry semantic conventions, not an adopted convention. If the working group changes them, we change with it.

One attribute is ahead of its wiring. agent.scan_verdict is emitted, but it reflects whatever a producer writes; the integration that fills it from a real security scan is on the roadmap, so treat it as reserved until then. The tracing also covers only the authorization decisions AIM makes, not actions an agent takes outside that boundary. Observability shows you the decisions you put through it.

Run it before you believe it

A hermetic demo stack ships with the backend: an OpenTelemetry Collector, Tempo, Prometheus, Loki, and Grafana. Bring it up and the smoke test confirms a trace, a metric, and a log all land end to end, so you can see a real authorization decision in Tempo rather than take our word for it.

cd apps/backend/deployments/otel-demo
docker compose up -d
./smoke-backend.sh

Agent authorization doesn't have to be the one decision in your system with no record. It's a decision with steps, inputs, an outcome, and a latency, which is to say it's a span. Make it one, and the question that used to start an investigation becomes a filter in the trace view you already have open.

Why was the agent allowed? Read the trace.
