The benchmark for AI agent security.
222 standardized attack scenarios that evaluate whether a runtime security tool can detect and respond to threats against AI agents. Three maturity levels. Mapped to MITRE ATLAS and OWASP Agentic Top 10. Open specification, open test corpus.
npx hackmyagent secure -b oasb-1What is OASB
OASB evaluates security products, not agents. It answers a specific question. Can your runtime security tool detect and block attacks against AI agents. This is the same concept as MITRE ATT&CK Evaluations, which test endpoint security products against known adversary techniques. Applied to AI.
| OASB | HackMyAgent | |
|---|---|---|
| Purpose | Evaluate security products | Pentest agents |
| Analogy | MITRE ATT&CK Evaluations | OWASP ZAP |
| Target | Runtime security tools | AI agents themselves |
| Output | Detection rate scorecard | Vulnerability report |
Three maturity levels
Pick the level your project needs to clear. L1 is the floor for any project shipping to production. L2 is the standard for customer facing AI agents. L3 is for regulated, high stakes, or autonomous fleet deployments.
Essential
Baseline checks every AI agent project should pass before going to production. Credential hygiene, governance presence, basic identity.
- Credential and secret detection
- SOUL.md governance file present
- Cryptographic agent identity
- Lock file and dependency hygiene
Standard
What a typical production AI agent deployment is expected to clear. Adds runtime monitoring, MCP validation, and trust scoring.
- All L1 checks plus 18 additional controls
- MCP server identity and tool call validation
- Runtime behavior monitoring and anomaly detection
- 8 factor trust scoring with audit trail
Hardened
Full coverage for regulated workloads, autonomous fleets, and high stakes deployments. Adds capability enforcement and post quantum readiness.
- All L2 checks plus 2 advanced controls
- Capability enforcement with default deny policies
- Post quantum signing readiness (ML-DSA)
- Full A2A trust boundary validation
Numbers above mirror the HackMyAgent OASB profile counts: 26 essential, 44 standard, 46 hardened.
Ten assessment categories
OASB groups every test into one of ten categories that span the full attack surface of an AI agent. Identity, capability, input, output, credentials, supply chain, A2A, memory, operations, and monitoring.
Identity and Provenance
Ed25519 and ML-DSA post quantum keypairs, ownership verification, agent bill of materials.
Capability and Authorization
Capability based access control, just in time access grants, runtime enforcement.
Input Security
164 attack payloads across 16 categories, runtime prompt interception, jailbreak detection.
Output Security
Output validation, exfiltration detection, runtime output scanning for sensitive data.
Credential Protection
49 credential patterns, MCP vault protection, context window isolation, scope drift analysis.
Supply Chain Integrity
Skill hash pinning, configuration signing, trust verification across npm, PyPI, and GitHub sources.
Agent to Agent Security
Mutual authentication, 10 A2A attack payloads, trust boundaries, federated identity.
Memory and Context
Context manipulation testing, runtime memory isolation, conversation history hygiene.
Operational Security
209 static plus 29 semantic configuration checks, process, network, and filesystem monitoring.
Monitoring and Response
8 factor trust scoring, behavioral anomaly detection, kill switch, audit trail with append only logs.
Test structure
Four kinds of tests cover discrete detection, multi step chains, false positive validation, and real OS level execution.
Atomic Tests
Discrete detection tests covering OS-level system calls and AI-layer attacks. Each test isolates a single technique for precise evaluation.
Integration Tests
Multi-step attack chains that combine techniques into realistic scenarios. Tests whether security tools detect coordinated threats.
Baseline Tests
False positive validation using benign operations. Ensures security products do not block legitimate agent behavior.
E2E Tests
Real OS level detection tests that execute actual system operations. Validates runtime interception capabilities.
MITRE ATLAS coverage
Every test scenario maps to a MITRE ATLAS technique. OASB covers 10 techniques across the adversarial ML threat landscape.
| Technique ID | Technique Name |
|---|---|
AML.T0046 | Unsafe ML Inference |
AML.T0057 | Data Leakage |
AML.T0024 | Exfiltration |
AML.T0018 | Persistence |
AML.T0029 | Denial of Service |
AML.T0015 | Evasion |
AML.T0054 | Jailbreak |
AML.T0056 | MCP Compromise |
AML.T0051 | Prompt Injection |
AML.TA0006 | Defense Response |
AI layer tests
40 tests target the AI specific attack surface. Prompt input and output scanning, MCP tool call validation, and inter agent message inspection.
Prompt Input Scanning
14 testsTests whether the tool detects malicious instructions embedded in user prompts, system prompts, and injected context.
Prompt Output Scanning
12 testsTests whether the tool detects sensitive data, credential leakage, and unsafe content in model outputs.
MCP Tool Call Validation
8 testsTests whether the scanner validates tool calls for parameter injection, unauthorized access, and privilege escalation.
A2A Message Scanning
6 testsTests whether the tool inspects inter agent messages for instruction injection, data exfiltration, and trust boundary violations.
Run OASB through HackMyAgent
The reference implementation lives in HackMyAgent. One command runs the full OASB benchmark plus 209 static, 29 semantic, and 164 adversarial security checks against your project.
Run the spec directly
Prefer to run OASB without HackMyAgent. Clone the repository, install dependencies, and execute the benchmark against your own security tool.
Benchmark your security tool
One command. Three maturity levels. Open specification, open test corpus, open results.
npx hackmyagent secure -b oasb-1