
Build your own OASB adapter: benchmark any security product

OpenA2A Team
Estimated time: 30 minutes

OASB (Open Agent Security Benchmark) evaluates AI agent security products with 222 standardized attack scenarios. It is not tied to any specific product. Any vendor can implement the adapter interface, run the same test suite, and get a detection coverage scorecard. This walkthrough covers building an adapter for a security product, from capability declaration to a published scorecard.

By the end, the benchmark produces a scorecard like this:

Test Files  47 passed (47)
     Tests  194 passed | 28 n/a (222)

Product:     my-product v1.0.0
Declared:    prompt-input, pattern scanning
N/A:         28 scenarios on surfaces not declared

Prerequisites

  • Node.js 18 or later
  • The security product accessible as a Node.js module (npm package, local path, or API client)
  • 30 minutes

Step 1: Clone and install OASB

git clone https://github.com/opena2a-org/oasb.git
cd oasb
npm install

Step 2: Understand the adapter interface

Open src/harness/adapter.ts. The key interface is SecurityProductAdapter:

interface SecurityProductAdapter {
  // Lifecycle
  getCapabilities(): CapabilityMatrix;
  start(): Promise<void>;
  stop(): Promise<void>;

  // Event injection (for tests that simulate attacks)
  injectEvent(event): Promise<SecurityEvent>;
  waitForEvent(predicate, timeout): Promise<SecurityEvent>;

  // Event collection (for assertions)
  getEvents(): SecurityEvent[];
  getEventsByCategory(category): SecurityEvent[];
  getEnforcements(): EnforcementResult[];
  getEnforcementsByAction(action): EnforcementResult[];
  resetCollector(): void;

  // Sub-component access
  getEventEngine(): EventEngine;
  getEnforcementEngine(): EnforcementEngine;

  // Factory methods (for component-level tests)
  createPromptScanner(): PromptScanner;
  createMCPScanner(allowedTools?): MCPScanner;
  createA2AScanner(trustedAgents?): A2AScanner;
  createPatternScanner(): PatternScanner;
  createBudgetManager(dataDir, config?): BudgetManager;
  createAnomalyScorer(): AnomalyScorer;
}

Not every method needs an implementation. A product that only does prompt scanning can return no-op implementations for MCP, A2A, and infrastructure methods. The getCapabilities() method tells OASB which tests are applicable versus N/A.
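As a rough illustration of the no-op idea, the sketch below builds a scanner that never reports a detection. The types are local stand-ins that mirror the ScanResult shape shown in Step 4. The helper name createNoopScanner and the method name scan are assumptions made for this example; check src/harness/adapter.ts for the method names each scanner interface actually requires.

```typescript
// Local stand-in for the OASB ScanResult type (the real one lives in src/harness/adapter.ts).
interface ScanResult {
  detected: boolean;
  matches: Array<{
    pattern: {
      id: string;
      category: string;
      description: string;
      severity: 'medium' | 'high' | 'critical';
    };
    matchedText: string;
  }>;
}

// Hypothetical scanner shape: lifecycle hooks plus a single scan method.
interface NoopScanner {
  start(): Promise<void>;
  stop(): Promise<void>;
  scan(input: string): ScanResult;
}

// Hypothetical helper: a scanner for a surface the product does not cover.
// It starts and stops instantly and always reports "nothing detected".
function createNoopScanner(): NoopScanner {
  return {
    start: async () => {},
    stop: async () => {},
    scan: (_input: string): ScanResult => ({ detected: false, matches: [] }),
  };
}
```

An adapter for a prompt-only product could return an object like this from the factory methods for surfaces it does not cover, while leaving those surfaces out of getCapabilities().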

Step 3: Declare your capabilities

// my-adapter.ts
import type { SecurityProductAdapter, CapabilityMatrix } from './adapter';

export class MyProductAdapter implements SecurityProductAdapter {
  getCapabilities(): CapabilityMatrix {
    return {
      product: 'my-product',
      version: '1.0.0',
      capabilities: new Set([
        'prompt-input-scanning',
        'prompt-output-scanning',
        // Only list what the product actually does
      ]),
    };
  }
  // ...
}

Available capabilities: process-monitoring, network-monitoring, filesystem-monitoring, prompt-input-scanning, prompt-output-scanning, mcp-scanning, a2a-scanning, anomaly-detection, budget-management, enforcement-*, pattern-scanning, event-correlation.
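To make the link between declared capabilities and N/A results concrete, here is a minimal sketch of how a harness could classify scenarios. It is not OASB's actual logic: the Scenario type, its requires field, and the classify function are all hypothetical names introduced for illustration.

```typescript
type Capability = string;

// Hypothetical: each scenario lists the capabilities it exercises.
interface Scenario {
  id: string;
  requires: Capability[];
}

type Status = 'run' | 'n/a';

// A scenario runs only if every capability it needs was declared;
// otherwise it is reported as N/A rather than as a failure.
function classify(scenario: Scenario, declared: Set<Capability>): Status {
  return scenario.requires.every((c) => declared.has(c)) ? 'run' : 'n/a';
}

const declared = new Set<Capability>([
  'prompt-input-scanning',
  'prompt-output-scanning',
]);

// Made-up scenario ids, for illustration only.
const scenarios: Scenario[] = [
  { id: 'example-prompt', requires: ['prompt-input-scanning'] },
  { id: 'example-mcp', requires: ['mcp-scanning'] },
];

const statuses: Status[] = scenarios.map((s) => classify(s, declared));
```

Under these assumptions the prompt scenario runs and the MCP scenario is marked N/A, which is why declaring only what the product actually does keeps the scorecard honest.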

Step 4: Implement the scanners

The most important factory methods are the scanners. Each returns an object with start(), stop(), and one or more scan methods (scanInput and scanOutput for the prompt scanner) that return a ScanResult:

interface ScanResult {
  detected: boolean;
  matches: Array<{
    pattern: {
      id: string;        // e.g. "PI-001"
      category: string;  // e.g. "prompt-injection"
      description: string;
      severity: 'medium' | 'high' | 'critical';
    };
    matchedText: string;
  }>;
}

// Example: wrap the product's scan function
createPromptScanner(): PromptScanner {
  // Shared mapping from the product's findings to an OASB ScanResult
  const scan = (text: string): ScanResult => {
    const threats = myProduct.analyze(text);
    return {
      detected: threats.length > 0,
      matches: threats.map(t => ({
        pattern: {
          id: t.ruleId,
          category: t.type,
          description: t.message,
          severity: t.severity,
        },
        matchedText: t.match,
      })),
    };
  };
  return {
    start: async () => {},
    stop: async () => {},
    scanInput: scan,
    // Same pattern for output scanning. This assumes the product analyzes
    // model output with the same call; swap in a dedicated check if it has one.
    scanOutput: scan,
  };
}
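The example above passes t.severity straight through, which only type-checks if the product already uses the three levels that ScanResult allows. When it does not, a small normalizing function can sit in between. The sketch below is one possible mapping; the input formats (a 0 to 10 numeric score, or labels such as "error") and the thresholds are assumptions chosen for illustration, not part of OASB.

```typescript
type OasbSeverity = 'medium' | 'high' | 'critical';

// Hypothetical mapping from a product's own severity scale to the three
// levels accepted by ScanResult. Anything below "high" is floored to
// 'medium' because ScanResult has no lower level.
function toOasbSeverity(raw: string | number): OasbSeverity {
  if (typeof raw === 'number') {
    // Assumed numeric scale: 0 (lowest) to 10 (highest).
    if (raw >= 9) return 'critical';
    if (raw >= 7) return 'high';
    return 'medium';
  }
  switch (raw.toLowerCase()) {
    case 'critical':
      return 'critical';
    case 'high':
    case 'error':
      return 'high';
    default:
      return 'medium';
  }
}
```

With this in place, the mapping line in the scanner would become severity: toOasbSeverity(t.severity).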

Step 5: Register your adapter

Add the adapter to src/harness/create-adapter.ts:

import { MyProductAdapter } from './my-adapter';

switch (adapterName) {
  case 'arp':
    AdapterClass = ArpWrapper;
    break;
  case 'my-product':
    AdapterClass = MyProductAdapter;
    break;
  // ...
}
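For readers who want to see the selection step in isolation, the following self-contained sketch does the same job as the switch statement using a lookup table. It is not the contents of create-adapter.ts: MinimalAdapter, registry, and createAdapter are stand-in names for this example, and the real factory may differ in signature and error handling.

```typescript
// Minimal stand-in for SecurityProductAdapter, reduced to one method.
interface MinimalAdapter {
  getCapabilities(): {
    product: string;
    version: string;
    capabilities: Set<string>;
  };
}

class MyProductAdapter implements MinimalAdapter {
  getCapabilities() {
    return {
      product: 'my-product',
      version: '1.0.0',
      capabilities: new Set<string>(['prompt-input-scanning']),
    };
  }
}

// Hypothetical registry: adapter name -> adapter class.
const registry: Record<string, new () => MinimalAdapter> = {
  'my-product': MyProductAdapter,
};

// Look up the class by name and fail loudly on a typo, so a misspelled
// adapter name cannot silently fall back to another product.
function createAdapter(name: string): MinimalAdapter {
  const AdapterClass = registry[name];
  if (!AdapterClass) {
    const known = Object.keys(registry).join(', ');
    throw new Error(`Unknown adapter "${name}". Known adapters: ${known}`);
  }
  return new AdapterClass();
}
```

Whichever form the factory takes, the string used in the case label is the value to pass when selecting the adapter at run time.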

Step 6: Run the benchmark

OASB_ADAPTER=my-product npm test

The result is a scorecard showing pass, fail, and N/A for all 222 tests, broken down by category: process, network, filesystem, AI-layer, intelligence, enforcement, integration, baseline, and E2E.

Reading the scorecard

The raw score (for example, 194/222) can overstate capability. Infrastructure tests (process, network, filesystem monitoring) pass for products that lack those capabilities because the adapter handles event injection via stubs.

The honest comparison is the AI-layer score: how many of the 40 AI-layer tests pass. This measures actual detection capability across prompt injection, jailbreak, data exfiltration, MCP exploitation, and A2A attacks.
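The per-category arithmetic is simple enough to sketch. The snippet below counts passes within one category and keeps N/A results in the denominator, which matches the "out of 40" framing above. The TestResult shape and the category key 'ai-layer' are assumptions for this example; OASB's real result format may look different.

```typescript
type Outcome = 'pass' | 'fail' | 'n/a';

// Hypothetical per-test record.
interface TestResult {
  category: string;
  outcome: Outcome;
}

// Count passes within one category. N/A and failed tests both stay in
// the total, so the score reads as "passed out of all tests in the category".
function categoryScore(
  results: TestResult[],
  category: string,
): { passed: number; total: number } {
  const inCategory = results.filter((r) => r.category === category);
  const passed = inCategory.filter((r) => r.outcome === 'pass').length;
  return { passed, total: inCategory.length };
}

// Synthetic data, for illustration only.
const sampleResults: TestResult[] = [
  { category: 'ai-layer', outcome: 'pass' },
  { category: 'ai-layer', outcome: 'fail' },
  { category: 'ai-layer', outcome: 'n/a' },
  { category: 'process', outcome: 'pass' },
];

const aiLayer = categoryScore(sampleResults, 'ai-layer');
```

On the synthetic data, the process-category pass does not affect the AI-layer figure, which is the point of reading the scorecard by category rather than by raw total.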

Product     Declared surfaces   Capabilities   AI-layer conformance
arp-guard   all surfaces        15/16          40/40
llm-guard   prompt-only         2/16           13/40
rebuff      prompt-only         2/16           13/40

Scenario pass-counts are not comparable across products with different declared capability sets. A prompt-only scanner reports N/A (not a failure) on filesystem, process, MCP, and A2A surfaces it never claimed. The capability column is the honest cross-product signal. For neutral detection quality, use the verdict-based corpus benchmark.

Reference adapters

OASB ships with three built-in adapters to study:

  1. arp-wrapper.ts. Full-stack adapter (ARP/HackMyAgent). Uses lazy require('arp-guard') for zero-cost import when not selected.
  2. llm-guard-wrapper.ts. Prompt scanner only. Shows how to map a simple regex library to the OASB interface.
  3. rebuff-wrapper.ts. Heuristic scanner. Shows how to wrap a similarity-based detector.
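The lazy-loading idea mentioned for arp-wrapper.ts can be applied to any adapter. The sketch below is a generic version, not the code from that file: the lazy helper, loadCount, and the fake product object are all made up for this example. In a real adapter the loader would be something like () => require('my-product').

```typescript
// Generic lazy initializer: runs `load` on first use and caches the result.
function lazy<T>(load: () => T): () => T {
  let cached: T | undefined;
  let loaded = false;
  return () => {
    if (!loaded) {
      cached = load();
      loaded = true;
    }
    return cached as T;
  };
}

// Counts how many times the (fake) product module is loaded.
let loadCount = 0;

// Stand-in for an expensive module import. The analyze function is a toy
// check used only to give the object something to do.
const getProduct = lazy(() => {
  loadCount += 1;
  return {
    analyze: (text: string): string[] =>
      text.includes('ignore previous instructions') ? ['toy-rule'] : [],
  };
});
```

Nothing is loaded until getProduct() is first called, so selecting a different adapter never pays the import cost for this one.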

Submit your results

Once the adapter works, submit a pull request to the OASB repository. Approved products are added to the public scorecard at oasb.ai/eval. Same tests, same scorecard, transparent comparison.

About OpenA2A: OpenA2A builds open-source security infrastructure for AI agents. Projects include HackMyAgent (security scanner), OASB (security benchmark), and AIM (agent identity management).