Make Every Agent Authorization a Span
An AI agent charges a card it should not have, or it gets blocked at the worst possible moment, and the on-call engineer asks the one question that matters: why was it allowed, or why was it denied? For most agent systems the honest answer is that the decision is not recorded anywhere. Authorization was a branch in the code that returned true or false and moved on.
You already trace the HTTP request that triggered the agent. The authorization decision the agent made next deserves the same trace. So we made it one.
The decision that matters most is the one nobody records
Web infrastructure solved this for requests a long time ago. A request enters, and you can follow it as a trace across every service it touches, with the latency and the outcome of each hop attached. When something breaks, you open the trace and read what happened.
Agent authorization is the decision with the highest blast radius and the least instrumentation. Whether an agent may charge a card, read a table, or call a tool is exactly the thing you will want to reconstruct after an incident, and it is exactly the thing that usually leaves no structured record. The check ran in process, returned a boolean, and the reasoning evaporated.
“Why was the agent allowed?” should be a trace lookup, not an investigation.
Every check leaves a span, so the trace names the one that decided
In AIM, an action runs through up to five authorization checks before it executes: capability, attribute, context, chain, and intent (the intent check applies only to high-risk actions). The backend emits the whole decision as one OpenTelemetry span, fga.authorize, with a child span for each check. The trace is the decision: reading it tells you which check allowed or denied the action and how long each one took.
fga.authorize              agent.id, agent.capability, fga.outcome, fga.denied_by
├── fga.capability_check   is the action inside the agent's grant?
├── fga.attribute_check    do the request attributes match policy?
├── fga.context_check      is the call consistent with the agent's context?
├── fga.chain_check        is the delegation chain intact?
└── fga.intent_check       on high-risk actions, the local classifier checks intent
The parent span carries the signals you would actually filter on: which agent, which capability, the outcome, and, when the action was denied, the exact check that denied it. The trust score, drift signal, and scan verdict ride along too, so the agent's security posture at the moment of the decision is part of the same record, not a separate lookup.
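The shape of that record can be sketched in Python, stdlib only, with plain records standing in for the OpenTelemetry SDK. This is not AIM's implementation. The names SpanRecord and authorize, the stop-at-first-denial behavior, the high_risk flag that gates the intent check, and the "DENY" outcome value are all assumptions. Only the span names and the agent.id, agent.capability, fga.outcome, and fga.denied_by keys come from the tree above.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for an OpenTelemetry span; a real backend would
# create spans with the OTel SDK rather than these plain records.
@dataclass
class SpanRecord:
    name: str
    attributes: dict = field(default_factory=dict)
    duration_ms: float = 0.0
    children: list = field(default_factory=list)

# A check returns True when the request passes it.
Check = Callable[[dict], bool]

def authorize(request: dict, checks: list[tuple[str, Check]]) -> SpanRecord:
    """Run the checks in order, recording one child span per check that runs.

    Assumptions: evaluation stops at the first failing check, and the
    intent check runs only when request["high_risk"] is true.
    """
    parent = SpanRecord("fga.authorize", {
        "agent.id": request["agent_id"],
        "agent.capability": request["capability"],
    })
    start = time.perf_counter()
    outcome = "ALLOW"
    for name, check in checks:
        if name == "intent" and not request.get("high_risk", False):
            continue  # intent is only classified for high-risk actions
        t0 = time.perf_counter()
        passed = check(request)
        child = SpanRecord(f"fga.{name}_check",
                           duration_ms=(time.perf_counter() - t0) * 1000)
        parent.children.append(child)
        if not passed:
            outcome = "DENY"  # assumed value; the text only shows "ALLOW"
            parent.attributes["fga.denied_by"] = child.name
            break
    parent.attributes["fga.outcome"] = outcome
    parent.duration_ms = (time.perf_counter() - start) * 1000
    return parent

# Usage: a payment charge whose delegation chain is broken.
checks = [
    ("capability", lambda r: r["capability"] in r["grant"]),
    ("attribute", lambda r: True),  # placeholder policies for the sketch
    ("context", lambda r: True),
    ("chain", lambda r: r.get("chain_ok", True)),
    ("intent", lambda r: True),
]
span = authorize({"agent_id": "agent-1", "capability": "payments.charge",
                  "grant": {"payments.charge"}, "chain_ok": False}, checks)
print(span.attributes["fga.outcome"], span.attributes["fga.denied_by"])
# prints: DENY fga.chain_check
```

Copying the denying check's name onto the parent as fga.denied_by means one attribute filter answers "why was it denied?" without walking the children, while the child durations still show where the latency went.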
Standard names mean it lands in the stack you already run
Emitting a span is only useful if your tools understand it. The attribute names AIM uses are not private to AIM. They come from a semantic-conventions proposal we authored for agent identity and authorization, so the same trace is queryable in any OpenTelemetry-compatible backend without custom parsing. Denied authorizations for one agent, in TraceQL:
{name="fga.authorize" && .agent.id="<uuid>" && .fga.outcome != "ALLOW"}
The same decisions are emitted as metrics and logs that share those attribute keys, so a Prometheus counter, a Tempo trace, and a Loki log line for one decision line up on agent.id and fga.outcome. Nothing here is a dashboard you have to adopt. It is your dashboard, with agent decisions in it.
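To run the same lookup from a script, here is a minimal Python sketch, stdlib only. It assumes Tempo's HTTP API is reachable on its default port 3200 (the demo stack may publish it elsewhere) and uses Tempo's /api/search endpoint, which accepts TraceQL in the q parameter. The Prometheus half assumes the exporter rewrites attribute keys into classic label names, where dots become underscores; setups with UTF-8 label names may keep the dots. The helper names are hypothetical, the counter's metric name is left out because it is not given here, and agent IDs are assumed to be UUIDs, so no quote escaping is done.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: Tempo's HTTP API on its default port; the demo stack may
# publish it on a different host or port.
TEMPO_URL = "http://localhost:3200"

def denied_traceql(agent_id: str) -> str:
    # The TraceQL filter from the text: denied fga.authorize spans for one agent.
    # agent_id is assumed to be a UUID, so it is inserted without escaping.
    return ('{name="fga.authorize" && .agent.id="%s" && .fga.outcome != "ALLOW"}'
            % agent_id)

def tempo_search_url(agent_id: str, limit: int = 20) -> str:
    # Tempo's /api/search endpoint takes TraceQL in the "q" parameter.
    params = urlencode({"q": denied_traceql(agent_id), "limit": limit})
    return f"{TEMPO_URL}/api/search?{params}"

def prometheus_label_name(key: str) -> str:
    # Classic Prometheus label names allow only letters, digits, and "_",
    # and cannot start with a digit, so "agent.id" becomes "agent_id".
    name = re.sub(r"[^a-zA-Z0-9_]", "_", key)
    return "_" + name if name[:1].isdigit() else name

def denied_promql_selector(agent_id: str) -> str:
    # Label matchers only; prepend the counter's name once you know it.
    return '{%s="%s", %s!="ALLOW"}' % (
        prometheus_label_name("agent.id"), agent_id,
        prometheus_label_name("fga.outcome"))

# Usage: the TraceQL text survives URL encoding unchanged.
url = tempo_search_url("<uuid>")
print(parse_qs(urlsplit(url).query)["q"][0])
print(denied_promql_selector("<uuid>"))
```

The value of shared keys is that the agent ID you copy out of a denied trace is the same value you filter on in Prometheus and Loki; only the spelling of the key may change between stores.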
What is solid and what is still moving
The tracing is real and runs today. The naming is not a standard yet, and we are not going to call it one. The attribute names are locked to our proposal; they are a candidate for OpenTelemetry semantic conventions, not an adopted convention. If the working group changes them, we change with it.
One attribute is ahead of its wiring: agent.scan_verdict is emitted, but its value is whatever a producer writes. The integration that fills it from a real security scan is on the roadmap, so treat it as reserved until then. And this traces the authorization decisions AIM makes, not every action an agent takes outside that boundary. Observability shows you the decisions you put through it.
Run it before you believe it
A hermetic demo stack ships with the backend: an OpenTelemetry Collector, Tempo, Prometheus, Loki, and Grafana. Bring it up and run the smoke test, which confirms that a trace, a metric, and a log all land end to end, so you can see a real authorization decision in Tempo rather than take our word for it.
cd apps/backend/deployments/otel-demo
docker compose up -d
./smoke-backend.sh
Agent authorization does not have to be the one decision in your system with no record. It is a decision with steps, inputs, an outcome, and a latency, which is to say it is a span. Make it one, and the question that used to start an investigation becomes a filter in the trace view you already have open.
Why was the agent allowed? Read the trace.