Build your own OASB adapter: benchmark any security product
OASB (Open Agent Security Benchmark) evaluates AI agent security products with 222 standardized attack scenarios. It is not tied to any specific product. Any vendor can implement the adapter interface, run the same test suite, and get a detection coverage scorecard. This walkthrough covers building an adapter for a security product, from capability declaration to a published scorecard.
By the end, the benchmark produces a scorecard like this:
Test Files 47 passed (47)
Tests 194 passed | 28 n/a (222)
Product: my-product v1.0.0
Declared: prompt-input, pattern scanning
N/A: 28 scenarios on surfaces not declared
Prerequisites
- Node.js 18 or later
- The security product accessible as a Node.js module (npm package, local path, or API client)
- 30 minutes
Step 1: Clone and install OASB
git clone https://github.com/opena2a-org/oasb.git
cd oasb
npm install
Step 2: Understand the adapter interface
Open src/harness/adapter.ts. The key interface is SecurityProductAdapter:
interface SecurityProductAdapter {
  // Lifecycle
  getCapabilities(): CapabilityMatrix;
  start(): Promise<void>;
  stop(): Promise<void>;
  // Event injection (for tests that simulate attacks)
  injectEvent(event): Promise<SecurityEvent>;
  waitForEvent(predicate, timeout): Promise<SecurityEvent>;
  // Event collection (for assertions)
  getEvents(): SecurityEvent[];
  getEventsByCategory(category): SecurityEvent[];
  getEnforcements(): EnforcementResult[];
  getEnforcementsByAction(action): EnforcementResult[];
  resetCollector(): void;
  // Sub-component access
  getEventEngine(): EventEngine;
  getEnforcementEngine(): EnforcementEngine;
  // Factory methods (for component-level tests)
  createPromptScanner(): PromptScanner;
  createMCPScanner(allowedTools?): MCPScanner;
  createA2AScanner(trustedAgents?): A2AScanner;
  createPatternScanner(): PatternScanner;
  createBudgetManager(dataDir, config?): BudgetManager;
  createAnomalyScorer(): AnomalyScorer;
}
Not every method needs an implementation. A product that only does prompt scanning can return no-op implementations for MCP, A2A, and infrastructure methods. The getCapabilities() method tells OASB which tests are applicable versus N/A.
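As a minimal sketch of what a no-op implementation might look like, the stub below satisfies a scanner-shaped interface without ever detecting anything. The StubScanResult and StubScanner types are simplified stand-ins declared here so the example is self-contained; the real types live in src/harness/adapter.ts and may differ, including the actual scan method names for MCP and A2A.

```typescript
// Simplified stand-ins for the OASB types (assumptions, not the real definitions).
interface StubScanResult {
  detected: boolean;
  matches: unknown[];
}

interface StubScanner {
  start(): Promise<void>;
  stop(): Promise<void>;
  scan(text: string): StubScanResult;
}

// A scanner that never reports a detection. Returning a fresh object on each
// call avoids tests accidentally sharing (and mutating) one result.
function createNoopScanner(): StubScanner {
  return {
    start: async () => {},
    stop: async () => {},
    scan: (_text: string): StubScanResult => ({ detected: false, matches: [] }),
  };
}

// A prompt-only adapter could hand out the same kind of stub for every
// surface it does not cover.
const mcpScannerStub = createNoopScanner();
const a2aScannerStub = createNoopScanner();
```

A stub like this only keeps the adapter type-complete. Whether a test on that surface counts as N/A is still governed by what getCapabilities() declares.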
Step 3: Declare your capabilities
// src/harness/my-adapter.ts (same directory as adapter.ts)
import type { SecurityProductAdapter, CapabilityMatrix } from './adapter';
export class MyProductAdapter implements SecurityProductAdapter {
  getCapabilities(): CapabilityMatrix {
    return {
      product: 'my-product',
      version: '1.0.0',
      capabilities: new Set([
        'prompt-input-scanning',
        'prompt-output-scanning',
        // Only list what the product actually does
      ]),
    };
  }
  // ...
}
Available capabilities: process-monitoring, network-monitoring, filesystem-monitoring, prompt-input-scanning, prompt-output-scanning, mcp-scanning, a2a-scanning, anomaly-detection, budget-management, enforcement-*, pattern-scanning, event-correlation.
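To illustrate how a declared capability set could translate into applicable versus N/A tests, here is a sketch of the gating logic. The applicability function, and the idea that each test lists the capabilities it requires, are assumptions made for illustration; the actual OASB harness may decide this differently.

```typescript
type Applicability = 'run' | 'n/a';

// Hypothetical helper: a test runs only if every capability it requires
// was declared by the adapter; otherwise it is reported as N/A.
function applicability(
  required: readonly string[],
  declared: ReadonlySet<string>,
): Applicability {
  return required.every((cap) => declared.has(cap)) ? 'run' : 'n/a';
}

// Capabilities as declared by a prompt-only adapter.
const declaredCaps = new Set(['prompt-input-scanning', 'prompt-output-scanning']);

// A prompt test is applicable; an MCP test is N/A for this adapter.
const promptTest = applicability(['prompt-input-scanning'], declaredCaps);
const mcpTest = applicability(['mcp-scanning'], declaredCaps);
```

Under this model, declaring a capability the product does not really have turns N/A results into failures, which is why the set should list only what the product actually does.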
Step 4: Implement the scanners
The most important factory methods are the scanners. Each returns an object with start(), stop(), and a scan method that returns ScanResult:
interface ScanResult {
  detected: boolean;
  matches: Array<{
    pattern: {
      id: string; // e.g. "PI-001"
      category: string; // e.g. "prompt-injection"
      description: string;
      severity: 'medium' | 'high' | 'critical';
    };
    matchedText: string;
  }>;
}
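A product's own severity labels will not always match the three levels ScanResult accepts. As a sketch, a small normalizer can map them before the matches array is built. The input labels here ('low', 'warning', 'error', and so on) are hypothetical, so substitute whatever the wrapped product emits. Falling back to 'medium' for unknown labels is likewise an assumption.

```typescript
type OasbSeverity = 'medium' | 'high' | 'critical';

// Hypothetical product labels mapped onto the ScanResult severity levels.
// A Map is used so that labels such as "constructor" cannot hit
// inherited object properties.
const SEVERITY_MAP = new Map<string, OasbSeverity>([
  ['low', 'medium'], // ScanResult has no 'low', so round up
  ['info', 'medium'],
  ['warning', 'medium'],
  ['medium', 'medium'],
  ['high', 'high'],
  ['error', 'high'],
  ['critical', 'critical'],
  ['fatal', 'critical'],
]);

// Normalize case and whitespace, then fall back to 'medium' for unknown labels.
function toOasbSeverity(label: string): OasbSeverity {
  return SEVERITY_MAP.get(label.trim().toLowerCase()) ?? 'medium';
}
```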
// Example: wrap the product's scan function
createPromptScanner(): PromptScanner {
  // Map the product's findings onto the OASB ScanResult shape
  const scan = (text: string): ScanResult => {
    const threats = myProduct.analyze(text);
    return {
      detected: threats.length > 0,
      matches: threats.map(t => ({
        pattern: {
          id: t.ruleId,
          category: t.type,
          description: t.message,
          severity: t.severity,
        },
        matchedText: t.match,
      })),
    };
  };
  return {
    start: async () => {},
    stop: async () => {},
    scanInput: scan,
    // Same pattern for output scanning; swap in an output-specific
    // call here if the product has one
    scanOutput: scan,
  };
}
Step 5: Register your adapter
Add the adapter to src/harness/create-adapter.ts:
import { MyProductAdapter } from './my-adapter';
switch (adapterName) {
  case 'arp':
    AdapterClass = ArpWrapper;
    break;
  case 'my-product':
    AdapterClass = MyProductAdapter;
    break;
  // ...
}
Step 6: Run the benchmark
OASB_ADAPTER=my-product npm test
The result is a scorecard showing pass, fail, and N/A for all 222 tests, broken down by category: process, network, filesystem, AI-layer, intelligence, enforcement, integration, baseline, and E2E.
Reading the scorecard
The raw score (for example, 194/222) can overstate capability. Infrastructure tests (process, network, filesystem monitoring) pass for products that lack those capabilities because the adapter handles event injection via stubs.
The honest comparison is the AI-layer score: how many of the 40 AI-layer tests pass. This measures actual detection capability across prompt injection, jailbreak, data exfiltration, MCP exploitation, and A2A attacks.
| Product | Declared surfaces | Capabilities | AI-layer conformance |
|---|---|---|---|
| arp-guard | all surfaces | 15/16 | 40/40 |
| llm-guard | prompt-only | 2/16 | 13/40 |
| rebuff | prompt-only | 2/16 | 13/40 |
Scenario pass-counts are not comparable across products with different declared capability sets. A prompt-only scanner reports N/A (not a failure) on filesystem, process, MCP, and A2A surfaces it never claimed. The capability column is the honest cross-product signal. For neutral detection quality, use the verdict-based corpus benchmark.
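One way to read results per category rather than as a single raw total is to tally outcomes by category and keep N/A visible as its own count. The TestOutcome record shape and the tally function below are assumptions for illustration and are not the format OASB itself emits.

```typescript
type Outcome = 'pass' | 'fail' | 'n/a';

// Hypothetical per-test record.
interface TestOutcome {
  category: string; // e.g. 'ai-layer', 'process'
  outcome: Outcome;
}

interface CategoryTally {
  pass: number;
  fail: number;
  na: number;
}

// Count pass, fail, and N/A outcomes for each category.
function tally(results: readonly TestOutcome[]): Map<string, CategoryTally> {
  const byCategory = new Map<string, CategoryTally>();
  for (const r of results) {
    const t = byCategory.get(r.category) ?? { pass: 0, fail: 0, na: 0 };
    if (r.outcome === 'pass') t.pass += 1;
    else if (r.outcome === 'fail') t.fail += 1;
    else t.na += 1;
    byCategory.set(r.category, t);
  }
  return byCategory;
}

// Made-up sample data, not real benchmark output.
const sampleResults: TestOutcome[] = [
  { category: 'ai-layer', outcome: 'pass' },
  { category: 'ai-layer', outcome: 'fail' },
  { category: 'ai-layer', outcome: 'n/a' },
  { category: 'process', outcome: 'n/a' },
];

const sampleTally = tally(sampleResults);
```

Keeping N/A separate from both pass and fail makes it harder to mistake an undeclared surface for either a detection or a miss.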
Reference adapters
OASB ships with three built-in adapters to study:
1. arp-wrapper.ts. Full-stack adapter (ARP/HackMyAgent). Uses lazy require('arp-guard') for zero-cost import when not selected.
2. llm-guard-wrapper.ts. Prompt scanner only. Shows how to map a simple regex library to the OASB interface.
3. rebuff-wrapper.ts. Heuristic scanner. Shows how to wrap a similarity-based detector.
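The lazy-loading idea mentioned for arp-wrapper.ts can be sketched generically: defer the expensive load until first use, then cache the result. The lazy helper and loadHeavyProduct below are hypothetical. In a real adapter the factory would wrap a require() or dynamic import of the product, and the shipped wrapper may be implemented differently.

```typescript
// Wrap a factory so it runs at most once, on first access.
function lazy<T>(factory: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) {
      cached = { value: factory() };
    }
    return cached.value;
  };
}

// Stand-in for an expensive module load (hypothetical, not a real product API).
let loadCount = 0;
function loadHeavyProduct(): { analyze(text: string): string[] } {
  loadCount += 1;
  return {
    analyze: (text) => (text.includes('ignore previous') ? ['PI-001'] : []),
  };
}

// Nothing is loaded until getProduct() is first called.
const getProduct = lazy(loadHeavyProduct);
```

With this shape, an adapter that is registered but never selected does not pay the cost of loading its product.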
Submit your results
Once the adapter works, submit a pull request to the OASB repository. Approved products are added to the public scorecard at oasb.ai/eval. Same tests, same scorecard, transparent comparison.
About OpenA2A: OpenA2A builds open-source security infrastructure for AI agents. Projects include HackMyAgent (security scanner), OASB (security benchmark), and AIM (agent identity management).