NanoMind

On-device semantic security classifier for AI agent artifacts. NanoMind reads skills, MCP configs, SOUL.md governance files, and system prompts and assigns each one to an attack class. The production model is version 0.5.0, a 2.1M-parameter Mamba classifier published on HuggingFace as opena2a/nanomind-security-classifier. It ships inside HackMyAgent and runs locally with no cloud calls.

| Metric | Value | Source |
| --- | --- | --- |
| Held-out eval accuracy | 98.45% | v0.5.0 model card |
| Macro F1 (10-class) | 0.978 | accuracy benchmark |
| Median inference (p50) | 0.76 ms | latency benchmark |
| On-disk model size | 8.3 MB | v0.5.0 model card |

How it ships

NanoMind is not a standalone install. It runs automatically inside HackMyAgent on every secure scan. On first use, HackMyAgent downloads the 8.3 MB ONNX model from HuggingFace and caches it locally. After that there are no external calls. The model produces a class label and a confidence score for each artifact; the static analyzers and the AST layer consume that signal.

# NanoMind runs as part of a standard scan
npx hackmyagent secure

The model (v0.5.0)

A Mamba state-space classifier in the NLM tier (1 to 50M parameters). The architecture and on-disk footprint come from the published v0.5.0 model card.

| Property | Value |
| --- | --- |
| Architecture | Mamba TME, 8 blocks, d_model 128, d_state 64 |
| Parameters | 2,089,482 (NLM tier, 1 to 50M) |
| Vocabulary | 6,000 tokens |
| On-disk size | 8.3 MB total (140 KB ONNX graph, 8.0 MB weights, 165 KB tokenizer) |
| Runtime | ONNX, CPU only, no GPU required |
| License | Apache 2.0 |
| Published | HuggingFace opena2a/nanomind-security-classifier, v0.5.0, 2026-04-09 |
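
As a rough sanity check, the parameter count and the weights size listed above line up if each parameter is stored as a 32-bit float. The storage format is an assumption here; the model card excerpt does not state the dtype.

```python
# Rough size check.
# ASSUMPTION: weights are stored as float32 (4 bytes per parameter).
PARAMS = 2_089_482
BYTES_PER_PARAM = 4  # float32, assumed

weight_bytes = PARAMS * BYTES_PER_PARAM
weight_mib = weight_bytes / (1024 ** 2)

print(f"{weight_bytes:,} bytes = {weight_mib:.2f} MiB")
# 8,357,928 bytes = 7.97 MiB
```

Under that assumption the weights come to roughly 8.0 MB, which matches the weights entry in the on-disk breakdown.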

Ten attack classes

The model classifies each artifact into one of ten classes. These are the model training and inference labels, not the consumer decision contract described further below.

exfiltration, injection, privilege_escalation, persistence, credential_abuse, lateral_movement, social_engineering, policy_violation, steganography, benign

Training data

v0.5.0 was fine-tuned from v0.4.0 weights on the sft-v10 corpus. The split sizes below are post-sanitizer counts from the corpus manifest.

| Split | Samples |
| --- | --- |
| Training | 3,168 |
| Eval (held-out) | 194 |
| Holdout | 204 |

Evaluation results

On the 194-sample held-out eval set, v0.5.0 reaches 98.45 percent accuracy and a macro F1 of 0.978. Per-class precision, recall, and F1 with support counts are below. The numbers come from the accuracy benchmark report bundled with the model.

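| Class | Precision | Recall | F1 | Support |
| --- | --- | --- | --- | --- |
| exfiltration | 1.000 | 1.000 | 1.000 | 11 |
| injection | 1.000 | 0.938 | 0.968 | 16 |
| privilege_escalation | 0.938 | 1.000 | 0.968 | 15 |
| persistence | 1.000 | 0.857 | 0.923 | 14 |
| credential_abuse | 1.000 | 1.000 | 1.000 | 20 |
| lateral_movement | 0.929 | 1.000 | 0.963 | 13 |
| social_engineering | 1.000 | 1.000 | 1.000 | 15 |
| policy_violation | 0.917 | 1.000 | 0.957 | 11 |
| steganography | 1.000 | 1.000 | 1.000 | 39 |
| benign | 1.000 | 1.000 | 1.000 | 40 |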
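
The headline figures can be recomputed from the per-class rows. The sketch below is not the benchmark's own script; it copies the rounded precision, recall, and support values from the table and assumes that per-class correct counts can be recovered as recall times support.

```python
# (precision, recall, support) per class, copied from the table above.
ROWS = {
    "exfiltration": (1.000, 1.000, 11),
    "injection": (1.000, 0.938, 16),
    "privilege_escalation": (0.938, 1.000, 15),
    "persistence": (1.000, 0.857, 14),
    "credential_abuse": (1.000, 1.000, 20),
    "lateral_movement": (0.929, 1.000, 13),
    "social_engineering": (1.000, 1.000, 15),
    "policy_violation": (0.917, 1.000, 11),
    "steganography": (1.000, 1.000, 39),
    "benign": (1.000, 1.000, 40),
}

def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def macro_f1(rows):
    """Unweighted mean of per-class F1, so small classes count as much as large ones."""
    return sum(f1(p, r) for p, r, _ in rows.values()) / len(rows)

def total_support(rows):
    """Number of eval samples across all classes."""
    return sum(n for _, _, n in rows.values())

def accuracy(rows):
    """Correct predictions per class are recall * support, rounded to a whole sample."""
    correct = sum(round(r * n) for _, r, n in rows.values())
    return correct / total_support(rows)

print(total_support(ROWS))         # 194
print(f"{macro_f1(ROWS):.3f}")     # 0.978
print(f"{accuracy(ROWS):.2%}")     # 98.45%
```

The three misclassified samples (one injection, two persistence) account for the gap between 191 correct and 194 total.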

Inference performance

Latency was measured over 1,000 inferences against a 50 ms budget. All percentiles sit well under the budget, so the classifier adds no perceptible cost to a scan.

| Metric | Value |
| --- | --- |
| p50 latency | 0.76 ms |
| p95 latency | 0.84 ms |
| p99 latency | 0.93 ms |
| Throughput | 1,296 inferences per second |
| Budget | 50 ms (all percentiles pass) |
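
A percentile-based latency check of this kind can be sketched as follows. This is an illustration, not the benchmark harness that produced the numbers above: `fake_infer` is a hypothetical stand-in for the real ONNX call, and throughput is assumed to be sequential, single-threaded calls per second.

```python
import statistics
import time

def measure_latency_ms(infer, sample, runs=1000):
    """Time `runs` calls to `infer(sample)`; return per-call latencies in milliseconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(sample)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

def summarize(latencies_ms, budget_ms=50.0):
    """Return p50/p95/p99, sequential throughput, and a pass/fail flag against the budget."""
    # quantiles(n=100) yields 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100)
    summary = {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
    # ASSUMPTION: throughput means back-to-back calls on one thread.
    summary["throughput_per_s"] = 1000.0 / statistics.fmean(latencies_ms)
    summary["within_budget"] = all(summary[p] < budget_ms for p in ("p50", "p95", "p99"))
    return summary

def fake_infer(text):
    """Hypothetical placeholder for the real classifier call."""
    return ("benign", 0.99)

print(summarize(measure_latency_ms(fake_infer, "example artifact")))
```

Reporting p95 and p99 alongside the median matters because a scan touches many artifacts, so tail latency, not the typical case, decides whether the budget holds.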

Consumer decision contract

Inference and the decision contract are two different axes. The model emits one of the ten raw labels above. The NanoMind daemon maps those ten labels onto a five-value canonical attackClass enum for the fine-grained authorization decision, and preserves the raw label in the evidence field so audit and telemetry keep full granularity.

| attackClass value | Meaning |
| --- | --- |
| "" | No attack class. Always-emitted default; benign or below threshold. |
| exfiltration_pattern | Data forwarding to an external endpoint. |
| prompt_injection | Instruction override or jailbreak attempt. |
| tool_misuse | Privilege escalation or unauthorized tool use. |
| data_extraction | Credential harvesting or sensitive-data extraction. |

A consumer treats a response as actionable when attackClass is non-empty and confidence clears the threshold:

attackClass != "" && confidence > 0.8
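
On the consumer side, the check above might look like the sketch below. The `Verdict` class and its field names are hypothetical; only the three concepts (canonical attackClass, confidence, and the raw label kept as evidence) come from the contract described here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Verdict:
    """Hypothetical response shape; the real wire format is not shown in this document."""
    attack_class: str  # canonical attackClass value, "" when there is no attack class
    confidence: float  # model confidence in [0, 1]
    evidence: str      # raw ten-class label, preserved for audit and telemetry

def is_actionable(verdict, threshold=0.8):
    """Mirror of `attackClass != "" && confidence > 0.8` (strictly greater than)."""
    return verdict.attack_class != "" and verdict.confidence > threshold

print(is_actionable(Verdict("prompt_injection", 0.91, "injection")))  # True
print(is_actionable(Verdict("prompt_injection", 0.80, "injection")))  # False: 0.80 is not > 0.8
print(is_actionable(Verdict("", 0.99, "benign")))                     # False: empty attackClass
```

Note that the comparison is strict, so a confidence of exactly 0.8 is not actionable.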

Known limitations

The 98.45 percent figure is in-distribution accuracy on the held-out eval set. On a frozen adversarial oracle of 50 fixtures (39 malicious, 11 benign hard-negatives), v0.5.0 holds recall at 1.0 but precision drops to 0.796: 10 of the 11 benign hard-negatives are flagged, a 90.9 percent false-positive rate. Inputs outside the training distribution can be misclassified.

Because of this, a consumer that blocks on a NanoMind verdict should raise the threshold to confidence > 0.95 and corroborate with a non-classifier signal before acting. A retrain to v0.6.0 with broader corpus coverage is tracked separately; the five-value canonical attackClass enum will not change across that retrain.
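
A blocking consumer following that guidance could gate on both conditions, as in this minimal sketch. The function name and the `corroborated` flag are hypothetical; the flag stands for whatever non-classifier signal the consumer has available, for example a static analyzer finding on the same artifact.

```python
def should_block(attack_class, confidence, corroborated, threshold=0.95):
    """Block only when the verdict is non-empty, clears the stricter threshold,
    and an independent, non-classifier signal agrees."""
    return attack_class != "" and confidence > threshold and corroborated

# High confidence alone is not enough to block.
print(should_block("prompt_injection", 0.99, corroborated=False))  # False
# High confidence plus a corroborating signal blocks.
print(should_block("prompt_injection", 0.99, corroborated=True))   # True
# Actionable at 0.8 but below the stricter blocking threshold.
print(should_block("prompt_injection", 0.90, corroborated=True))   # False
```

Requiring corroboration keeps a classifier false positive on a benign hard-negative from turning into a blocked artifact by itself.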