NanoMind

On-device semantic security classifier for AI agent artifacts. NanoMind compiles skills, MCP configs, SOUL.md governance, and system prompts and sorts each one into an attack class. The production model is version 0.5.0, a 2.1M-parameter Mamba classifier published on HuggingFace as opena2a/nanomind-security-classifier. It ships inside HackMyAgent and runs locally with no cloud calls.

Metric	Value	Source
Held-out eval accuracy	98.45%	v0.5.0 model card
Macro F1 (10-class)	0.978	accuracy benchmark
Median inference (p50)	0.76 ms	latency benchmark
On-disk model size	8.3 MB	v0.5.0 model card

How it ships

NanoMind is not a standalone install. It runs automatically inside HackMyAgent on every secure scan. On first use, HackMyAgent downloads the 8.3 MB ONNX model from HuggingFace and caches it locally. After that there are no external calls. The model produces a class label and a confidence score for each artifact; the static analyzers and the AST layer consume that signal.

# NanoMind runs as part of a standard scan

npx hackmyagent secure

The model (v0.5.0)

A Mamba state-space classifier in the NLM tier (1 to 50M parameters). The architecture and on-disk footprint come from the published v0.5.0 model card.

Property	Value
Architecture	Mamba TME, 8 blocks, d_model 128, d_state 64
Parameters	2,089,482 (NLM tier, 1 to 50M)
Vocabulary	6,000 tokens
On-disk size	8.3 MB total (140 KB ONNX graph, 8.0 MB weights, 165 KB tokenizer)
Runtime	ONNX, CPU only, no GPU required
License	Apache 2.0
Published	HuggingFace opena2a/nanomind-security-classifier, v0.5.0, 2026-04-09

Ten attack classes

The model classifies each artifact into one of ten classes. These are the model training and inference labels, not the consumer decision contract described further below.

exfiltration, injection, privilege_escalation, persistence, credential_abuse, lateral_movement, social_engineering, policy_violation, steganography, benign

Training data

v0.5.0 was fine-tuned from v0.4.0 weights on the sft-v10 corpus. The split sizes below are post-sanitizer counts from the corpus manifest.

Split	Samples
Training	3,168
Eval (held-out)	194
Holdout	204

Evaluation results

On the 194-sample held-out eval set, v0.5.0 reaches 98.45 percent accuracy and a macro F1 of 0.978. Per-class precision, recall, and F1 with support counts are below. The numbers come from the accuracy benchmark report bundled with the model.

Class	Precision	Recall	F1	Support
`exfiltration`	1.000	1.000	1.000	11
`injection`	1.000	0.938	0.968	16
`privilege_escalation`	0.938	1.000	0.968	15
`persistence`	1.000	0.857	0.923	14
`credential_abuse`	1.000	1.000	1.000	20
`lateral_movement`	0.929	1.000	0.963	13
`social_engineering`	1.000	1.000	1.000	15
`policy_violation`	0.917	1.000	0.957	11
`steganography`	1.000	1.000	1.000	39
`benign`	1.000	1.000	1.000	40
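
The headline figures can be re-derived from the per-class rows. The sketch below is illustrative and is not the benchmark's own code. The names `Row`, `macroF1`, and `accuracy` are hypothetical. It assumes macro F1 is the unweighted mean of the per-class F1 scores, and that the number of correct predictions per class can be recovered as recall times support, rounded because the published recall values are rounded to three decimals.

```typescript
// Per-class rows copied from the evaluation table (hypothetical Row type).
type Row = { label: string; precision: number; recall: number; f1: number; support: number };

const rows: Row[] = [
  { label: "exfiltration", precision: 1.0, recall: 1.0, f1: 1.0, support: 11 },
  { label: "injection", precision: 1.0, recall: 0.938, f1: 0.968, support: 16 },
  { label: "privilege_escalation", precision: 0.938, recall: 1.0, f1: 0.968, support: 15 },
  { label: "persistence", precision: 1.0, recall: 0.857, f1: 0.923, support: 14 },
  { label: "credential_abuse", precision: 1.0, recall: 1.0, f1: 1.0, support: 20 },
  { label: "lateral_movement", precision: 0.929, recall: 1.0, f1: 0.963, support: 13 },
  { label: "social_engineering", precision: 1.0, recall: 1.0, f1: 1.0, support: 15 },
  { label: "policy_violation", precision: 0.917, recall: 1.0, f1: 0.957, support: 11 },
  { label: "steganography", precision: 1.0, recall: 1.0, f1: 1.0, support: 39 },
  { label: "benign", precision: 1.0, recall: 1.0, f1: 1.0, support: 40 },
];

// Macro F1: unweighted mean of the per-class F1 scores.
function macroF1(rs: Row[]): number {
  return rs.reduce((sum, r) => sum + r.f1, 0) / rs.length;
}

// Accuracy: correct predictions (recall * support, rounded) over total samples.
function accuracy(rs: Row[]): number {
  const correct = rs.reduce((sum, r) => sum + Math.round(r.recall * r.support), 0);
  const total = rs.reduce((sum, r) => sum + r.support, 0);
  return correct / total;
}
```

With these rows, `macroF1(rows).toFixed(3)` gives `0.978` and `(accuracy(rows) * 100).toFixed(2)` gives `98.45`: 191 of the 194 samples are classified correctly, with the three misses in `injection` (one) and `persistence` (two).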

Inference performance

Latency was measured over 1,000 inferences against a 50 ms budget. Every reported percentile sits well under the budget (p99 is 0.93 ms), so the classifier adds no perceptible cost to a scan.

Metric	Value
p50 latency	0.76 ms
p95 latency	0.84 ms
p99 latency	0.93 ms
Throughput	1,296 inferences per second
Budget	50 ms (all percentiles pass)
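
As an illustration of how percentile figures like these can be checked against a budget, here is a minimal sketch. The source does not say how the latency benchmark computes its percentiles, so the nearest-rank method is an assumption, and the names `percentile` and `withinBudget` are hypothetical.

```typescript
// Nearest-rank percentile (an assumed method): the smallest sample such that
// at least p percent of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point drift in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// True when p50, p95, and p99 all fit inside the latency budget.
function withinBudget(samplesMs: number[], budgetMs: number): boolean {
  return [50, 95, 99].every((p) => percentile(samplesMs, p) <= budgetMs);
}
```

Given an array of 1,000 measured latencies in milliseconds, `withinBudget(samples, 50)` would express the "all percentiles pass" check. If the benchmark ran inferences one after another, 1,296 inferences per second corresponds to a mean of roughly 0.77 ms per inference, which is consistent with the 0.76 ms median.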

Consumer decision contract

Inference and the decision contract are two different axes. The model emits one of the ten raw labels above. The NanoMind daemon maps those ten labels onto a five-value canonical attackClass enum for the fine-grained authorization decision, and preserves the raw label in the evidence field so audit and telemetry keep full granularity.

attackClass value	Meaning
`""`	No attack class. Always-emitted default; benign or below threshold.
`exfiltration_pattern`	Data forwarding to an external endpoint.
`prompt_injection`	Instruction override or jailbreak attempt.
`tool_misuse`	Privilege escalation or unauthorized tool use.
`data_extraction`	Credential harvesting or sensitive-data extraction.

A consumer treats a response as actionable when attackClass is non-empty and confidence clears the threshold:

attackClass != "" && confidence > 0.8
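
A minimal TypeScript sketch of that check follows. The text names only `attackClass`, `confidence`, and an evidence field. The `NanoMindVerdict` interface, the `rawLabel` property inside `evidence`, and the `isActionable` helper are hypothetical names used for illustration, not the daemon's published API.

```typescript
// The five canonical attackClass values from the table above.
type AttackClass =
  | ""
  | "exfiltration_pattern"
  | "prompt_injection"
  | "tool_misuse"
  | "data_extraction";

// Hypothetical response shape; the rawLabel property name is an assumption.
interface NanoMindVerdict {
  attackClass: AttackClass;
  confidence: number;
  evidence: { rawLabel: string };
}

const ACTIONABLE_THRESHOLD = 0.8;

// Actionable only when a class is set AND confidence is strictly above the threshold.
function isActionable(
  v: NanoMindVerdict,
  threshold: number = ACTIONABLE_THRESHOLD,
): boolean {
  return v.attackClass !== "" && v.confidence > threshold;
}
```

The comparison is strict, so a verdict with confidence of exactly 0.8 is not actionable. Taking the threshold as a parameter lets a stricter consumer pass a higher value without changing the rule.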

Known limitations

The 98.45 percent figure is in-distribution accuracy on the held-out eval set. On a frozen adversarial oracle of 50 fixtures (40 malicious, 10 benign hard-negatives), v0.5.0 holds recall at 1.0 but precision drops to 0.796, with a 9.1 percent false-positive rate on benign hard-negatives. Inputs outside the training distribution can be misclassified.

Because of this, a consumer that blocks on a NanoMind verdict should raise the threshold to confidence > 0.95 and corroborate with a non-classifier signal before acting. A retrain to v0.6.0 with broader corpus coverage is tracked separately; the five-value canonical attackClass enum will not change across that retrain.
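
One way a blocking consumer might encode this guidance is sketched below. The `Verdict` shape, the `shouldBlock` helper, and the `corroborated` flag are hypothetical; how the corroborating non-classifier signal is produced (for example by a static analyzer finding) is left to the consumer and is not specified here.

```typescript
// Hypothetical minimal verdict shape for a blocking consumer.
interface Verdict {
  attackClass: string;
  confidence: number;
}

// Stricter threshold recommended for consumers that block on a verdict.
const BLOCK_THRESHOLD = 0.95;

// Block only when the classifier is highly confident AND an independent,
// non-classifier signal agrees. The corroborated flag is supplied by the caller.
function shouldBlock(v: Verdict, corroborated: boolean): boolean {
  const classifierFlags = v.attackClass !== "" && v.confidence > BLOCK_THRESHOLD;
  return classifierFlags && corroborated;
}
```

Requiring both conditions means a false positive from the classifier alone cannot block an artifact. That matters given the precision drop on adversarial hard-negatives described above.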
