Part of the security infrastructure for AI agents

The benchmark for AI agent security.

222 standardized attack scenarios that evaluate whether a runtime security tool can detect and respond to threats against AI agents. Three maturity levels. Mapped to MITRE ATLAS and OWASP Agentic Top 10. Open specification, open test corpus.

$npx hackmyagent secure -b oasb-1

View on GitHub Read the specification

222

Attack scenarios

MITRE techniques

Test files

Maturity levels

What is OASB

OASB evaluates security products, not agents. It answers a specific question. Can your runtime security tool detect and block attacks against AI agents. This is the same concept as MITRE ATT&CK Evaluations, which test endpoint security products against known adversary techniques. Applied to AI.

	OASB	HackMyAgent
Purpose	Evaluate security products	Pentest agents
Analogy	MITRE ATT&CK Evaluations	OWASP ZAP
Target	Runtime security tools	AI agents themselves
Output	Detection rate scorecard	Vulnerability report

Three maturity levels

Pick the level your project needs to clear. L1 is the floor for any project shipping to production. L2 is the standard for customer facing AI agents. L3 is for regulated, high stakes, or autonomous fleet deployments.

Essential

26 checks

Solo developers and prototypes

Baseline checks every AI agent project should pass before going to production. Credential hygiene, governance presence, basic identity.

Credential and secret detection
SOUL.md governance file present
Cryptographic agent identity
Lock file and dependency hygiene

Standard

44 checks

Production teams and SaaS

What a typical production AI agent deployment is expected to clear. Adds runtime monitoring, MCP validation, and trust scoring.

All L1 checks plus 18 additional controls
MCP server identity and tool call validation
Runtime behavior monitoring and anomaly detection
9 factor trust scoring with audit trail

Hardened

46 checks

Regulated, high stakes, or autonomous fleets

Full coverage for regulated workloads, autonomous fleets, and high stakes deployments. Adds capability enforcement and post quantum readiness.

All L2 checks plus 2 advanced controls
Capability enforcement with default deny policies
Post quantum signing readiness (ML-DSA)
Full A2A trust boundary validation

Numbers above mirror the HackMyAgent OASB profile counts: 26 essential, 44 standard, 46 hardened.

Ten assessment categories

OASB groups every test into one of ten categories that span the full attack surface of an AI agent. Identity, capability, input, output, credentials, supply chain, A2A, memory, operations, and monitoring.

Identity and Provenance

Ed25519 and ML-DSA post quantum keypairs, ownership verification, agent bill of materials.

Capability and Authorization

Capability based access control, just in time access grants, runtime enforcement.

Input Security

164 attack payloads across 16 categories, runtime prompt interception, jailbreak detection.

Output Security

Output validation, exfiltration detection, runtime output scanning for sensitive data.

Credential Protection

49 credential patterns, MCP vault protection, context window isolation, scope drift analysis.

Supply Chain Integrity

Skill hash pinning, configuration signing, trust verification across npm, PyPI, and GitHub sources.

Agent to Agent Security

Mutual authentication, 10 A2A attack payloads, trust boundaries, federated identity.

Memory and Context

Context manipulation testing, runtime memory isolation, conversation history hygiene.

Operational Security

209 static plus 29 semantic configuration checks, process, network, and filesystem monitoring.

Monitoring and Response

9 factor trust scoring, behavioral anomaly detection, kill switch, audit trail with append only logs.

Test structure

Four kinds of tests cover discrete detection, multi step chains, false positive validation, and real OS level execution.

Atomic Tests

65 tests

Discrete detection tests covering OS-level system calls and AI-layer attacks. Each test isolates a single technique for precise evaluation.

Integration Tests

8 tests

Multi-step attack chains that combine techniques into realistic scenarios. Tests whether security tools detect coordinated threats.

Baseline Tests

3 tests

False positive validation using benign operations. Ensures security products do not block legitimate agent behavior.

E2E Tests

6 tests

Real OS level detection tests that execute actual system operations. Validates runtime interception capabilities.

MITRE ATLAS coverage

Every test scenario maps to a MITRE ATLAS technique. OASB covers 10 techniques across the adversarial ML threat landscape.

Technique ID	Technique Name
`AML.T0046`	Unsafe ML Inference
`AML.T0057`	Data Leakage
`AML.T0024`	Exfiltration
`AML.T0018`	Persistence
`AML.T0029`	Denial of Service
`AML.T0015`	Evasion
`AML.T0054`	Jailbreak
`AML.T0056`	MCP Compromise
`AML.T0051`	Prompt Injection
`AML.TA0006`	Defense Response

AI layer tests

40 tests target the AI specific attack surface. Prompt input and output scanning, MCP tool call validation, and inter agent message inspection.

Prompt Input Scanning

14 tests

Tests whether the tool detects malicious instructions embedded in user prompts, system prompts, and injected context.

Prompt Output Scanning

12 tests

Tests whether the tool detects sensitive data, credential leakage, and unsafe content in model outputs.

MCP Tool Call Validation

8 tests

Tests whether the scanner validates tool calls for parameter injection, unauthorized access, and privilege escalation.

A2A Message Scanning

6 tests

Tests whether the tool inspects inter agent messages for instruction injection, data exfiltration, and trust boundary violations.

Run OASB through HackMyAgent

The reference implementation lives in HackMyAgent. One command runs the full OASB benchmark plus 209 static, 29 semantic, and 164 adversarial security checks against your project.

Terminal

# Run the OASB Essential profile

$ npx hackmyagent secure -b oasb-1

# Standard profile, JSON output for CI

$ npx hackmyagent secure -b oasb-2 --ci --json

# Hardened profile against the OASB test corpus

$ npx hackmyagent secure -b oasb-3

# From the OpenA2A CLI wrapper

$ npx opena2a-cli benchmark

HackMyAgent 0.23.0

The reference scanner. Runs all three OASB profiles plus 209 static, 29 semantic, and 164 adversarial checks.

OpenA2A CLI 0.10.7

Wraps HackMyAgent and exposes the OASB benchmark behind a single command for integrated workflows.

Run the spec directly

Prefer to run OASB without HackMyAgent. Clone the repository, install dependencies, and execute the benchmark against your own security tool.

Terminal

# Clone the repository

$ git clone https://github.com/opena2a-org/oasb.git

$ cd oasb

# Install dependencies

$ npm install

# Run all tests

$ opena2a benchmark run

# Run a specific category

$ opena2a benchmark run --category atomic

# Run a specific MITRE technique

$ opena2a benchmark run --technique AML.T0051

Benchmark your security tool

One command. Three maturity levels. Open specification, open test corpus, open results.

$npx hackmyagent secure -b oasb-1

View on GitHub Use HackMyAgent