NanoMind
On-device semantic security classifier for AI agent artifacts. NanoMind reads skills, MCP configs, SOUL.md governance files, and system prompts, and sorts each one into an attack class. The production model is version 0.5.0, a 2.1M-parameter Mamba classifier published on HuggingFace as opena2a/nanomind-security-classifier. It ships inside HackMyAgent and runs locally with no cloud calls.
How it ships
NanoMind is not a standalone install. It runs automatically inside HackMyAgent on every secure scan. On first use, HackMyAgent downloads the 8.3 MB ONNX model from HuggingFace and caches it locally. After that there are no external calls. The model produces a class label and a confidence score for each artifact; the static analyzers and the AST layer consume that signal.
npx hackmyagent secure
The model (v0.5.0)
A Mamba state-space classifier in the NLM tier (1 to 50M parameters). The architecture and on-disk footprint come from the published v0.5.0 model card.
| Property | Value |
|---|---|
| Architecture | Mamba TME, 8 blocks, d_model 128, d_state 64 |
| Parameters | 2,089,482 (NLM tier, 1 to 50M) |
| Vocabulary | 6,000 tokens |
| On-disk size | 8.3 MB total (140 KB ONNX graph, 8.0 MB weights, 165 KB tokenizer) |
| Runtime | ONNX, CPU only, no GPU required |
| License | Apache 2.0 |
| Published | HuggingFace opena2a/nanomind-security-classifier, v0.5.0, 2026-04-09 |
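
The size figures in the table can be cross-checked with a little arithmetic. The sketch below assumes the weights are stored as 32-bit floats, which the model card excerpt here does not state, and it assumes the 8.0 MB weights figure is in binary megabytes. The function names are illustrative only.

```python
# Figures copied from the table above.
PARAMETERS = 2_089_482
GRAPH_KB = 140
WEIGHTS_MB = 8.0
TOKENIZER_KB = 165

# Assumption: each parameter is stored as a 32-bit float (4 bytes).
BYTES_PER_PARAM = 4


def weights_mib(parameters: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Estimated weight size in MiB (1 MiB = 1,048,576 bytes)."""
    return parameters * bytes_per_param / 2**20


def total_mb(graph_kb: float, weights_mb: float, tokenizer_kb: float) -> float:
    """Sum the three components, treating 1 MB as 1,000 KB."""
    return weights_mb + (graph_kb + tokenizer_kb) / 1000


print(f"estimated weights: {weights_mib(PARAMETERS):.2f} MiB")
print(f"total on disk: {total_mb(GRAPH_KB, WEIGHTS_MB, TOKENIZER_KB):.1f} MB")
```

Under those assumptions the estimate comes out at roughly 7.97 MiB of weights, in line with the 8.0 MB row, and the three components sum to about 8.3 MB.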
Ten attack classes
The model classifies each artifact into one of ten classes. These are the model's training and inference labels, not the consumer decision contract described further below.
exfiltration, injection, privilege_escalation, persistence, credential_abuse, lateral_movement, social_engineering, policy_violation, steganography, benign
Training data
v0.5.0 was fine-tuned from v0.4.0 weights on the sft-v10 corpus. The split sizes below are post-sanitizer counts from the corpus manifest.
| Split | Samples |
|---|---|
| Training | 3,168 |
| Eval (held-out) | 194 |
| Holdout | 204 |
Evaluation results
On the 194-sample held-out eval set, v0.5.0 reaches 98.45 percent accuracy and a macro F1 of 0.978. Per-class precision, recall, and F1 with support counts are below. The numbers come from the accuracy benchmark report bundled with the model.
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| exfiltration | 1.000 | 1.000 | 1.000 | 11 |
| injection | 1.000 | 0.938 | 0.968 | 16 |
| privilege_escalation | 0.938 | 1.000 | 0.968 | 15 |
| persistence | 1.000 | 0.857 | 0.923 | 14 |
| credential_abuse | 1.000 | 1.000 | 1.000 | 20 |
| lateral_movement | 0.929 | 1.000 | 0.963 | 13 |
| social_engineering | 1.000 | 1.000 | 1.000 | 15 |
| policy_violation | 0.917 | 1.000 | 0.957 | 11 |
| steganography | 1.000 | 1.000 | 1.000 | 39 |
| benign | 1.000 | 1.000 | 1.000 | 40 |
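
The headline accuracy and macro F1 can be recomputed from the per-class rows. The sketch below is a consistency check, not the benchmark's own code: it rebuilds each F1 from the rounded precision and recall, averages them for macro F1, and recovers the number of correct predictions as recall times support. The function names are illustrative.

```python
# (precision, recall, support) per class, copied from the table above.
REPORT = {
    "exfiltration": (1.000, 1.000, 11),
    "injection": (1.000, 0.938, 16),
    "privilege_escalation": (0.938, 1.000, 15),
    "persistence": (1.000, 0.857, 14),
    "credential_abuse": (1.000, 1.000, 20),
    "lateral_movement": (0.929, 1.000, 13),
    "social_engineering": (1.000, 1.000, 15),
    "policy_violation": (0.917, 1.000, 11),
    "steganography": (1.000, 1.000, 39),
    "benign": (1.000, 1.000, 40),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(report: dict) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1(p, r) for p, r, _ in report.values()) / len(report)


def accuracy(report: dict) -> float:
    """Correct predictions over total samples.

    recall * support is the per-class count of correct predictions; it is
    rounded because the table reports recall to three decimals.
    """
    correct = sum(round(r * n) for _, r, n in report.values())
    total = sum(n for _, _, n in report.values())
    return correct / total


print(f"accuracy: {accuracy(REPORT):.2%}")
print(f"macro F1: {macro_f1(REPORT):.3f}")
```

The recomputed values agree with the reported 98.45 percent and 0.978, which corresponds to 191 correct predictions out of 194.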
Inference performance
Latency was measured over 1,000 inferences against a 50 ms budget. Every reported percentile is under 1 ms, well inside the budget, so the classifier adds no perceptible cost to a scan.
| Metric | Value |
|---|---|
| p50 latency | 0.76 ms |
| p95 latency | 0.84 ms |
| p99 latency | 0.93 ms |
| Throughput | 1,296 inferences per second |
| Budget | 50 ms (all percentiles pass) |
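
The budget check and the relationship between throughput and latency can be sketched as follows. This assumes the throughput figure comes from sequential, single-threaded inference, so that mean latency is roughly the inverse of throughput; the benchmark setup is not described here, and the helper names are illustrative.

```python
# Figures copied from the table above.
BUDGET_MS = 50.0
LATENCY_MS = {"p50": 0.76, "p95": 0.84, "p99": 0.93}
THROUGHPUT_PER_S = 1296


def within_budget(latencies: dict, budget_ms: float) -> bool:
    """True when every reported percentile is at or under the budget."""
    return all(value <= budget_ms for value in latencies.values())


def implied_mean_ms(throughput_per_s: float) -> float:
    """Mean latency implied by throughput, assuming sequential inference."""
    return 1000.0 / throughput_per_s


def headroom(latencies: dict, budget_ms: float) -> float:
    """How many times the slowest percentile fits into the budget."""
    return budget_ms / max(latencies.values())


print(within_budget(LATENCY_MS, BUDGET_MS))
print(f"implied mean latency: {implied_mean_ms(THROUGHPUT_PER_S):.2f} ms")
print(f"headroom at p99: {headroom(LATENCY_MS, BUDGET_MS):.1f}x")
```

Under that assumption the implied mean of about 0.77 ms sits next to the reported p50, and the p99 leaves a margin of more than fifty times the budget.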
Consumer decision contract
Inference and the decision contract are two different axes. The model emits one of the ten raw labels above. The NanoMind daemon maps those ten labels onto a five-value canonical attackClass enum for the coarse-grained authorization decision, and preserves the raw label in the evidence field so audit and telemetry keep full granularity.
| attackClass value | Meaning |
|---|---|
| "" | No attack class. Always-emitted default; benign or below threshold. |
| exfiltration_pattern | Data forwarding to an external endpoint. |
| prompt_injection | Instruction override or jailbreak attempt. |
| tool_misuse | Privilege escalation or unauthorized tool use. |
| data_extraction | Credential harvesting or sensitive-data extraction. |
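
The label-to-enum step can be pictured with a small sketch. The daemon's actual mapping table is not given in this document, so the dictionary below is a hypothetical, partial mapping inferred only from the Meaning column; the remaining raw labels are deliberately left unmapped rather than guessed. The Verdict type and to_verdict function are illustrative names, and the real shape of the evidence field may differ.

```python
from dataclasses import dataclass

# Hypothetical partial mapping, inferred from the Meaning column only.
# persistence, lateral_movement, social_engineering, policy_violation and
# steganography are omitted because this document does not say where they map.
RAW_TO_CANONICAL = {
    "benign": "",
    "exfiltration": "exfiltration_pattern",
    "injection": "prompt_injection",
    "privilege_escalation": "tool_misuse",
    "credential_abuse": "data_extraction",
}


@dataclass(frozen=True)
class Verdict:
    attackClass: str   # canonical five-value enum, "" means no attack class
    confidence: float  # model confidence for the raw label
    evidence: str      # raw ten-class label, kept for audit and telemetry


def to_verdict(raw_label: str, confidence: float,
               mapping: dict = RAW_TO_CANONICAL) -> Verdict:
    """Map a raw model label to the canonical enum, preserving the raw label."""
    if raw_label not in mapping:
        raise KeyError(f"no canonical mapping known for raw label {raw_label!r}")
    return Verdict(mapping[raw_label], confidence, raw_label)


print(to_verdict("injection", 0.97))
```

Raising on an unmapped label, instead of silently defaulting to the empty class, keeps this sketch from hiding a detection behind a guess.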
A consumer treats a response as actionable when attackClass is non-empty and confidence clears the threshold:
attackClass != "" && confidence > 0.8
Known limitations
The 98.45 percent figure is in-distribution accuracy on the held-out eval set. On a frozen adversarial oracle of 50 fixtures (40 malicious, 10 benign hard-negatives), v0.5.0 holds recall at 1.0 but precision drops to 0.796, with a 9.1 percent false-positive rate on benign hard-negatives. Inputs outside the training distribution can be misclassified.
Because of this, a consumer that blocks on a NanoMind verdict should raise the threshold to confidence > 0.95 and corroborate with a non-classifier signal before acting. A retrain to v0.6.0 with broader corpus coverage is tracked separately; the five-value canonical attackClass enum will not change across that retrain.
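
The two thresholds can be combined into a small consumer-side check. This is a minimal sketch of the rule as stated in this document, not code from HackMyAgent: the function names are illustrative, and the corroborated flag stands in for whatever non-classifier signal the consumer uses.

```python
DEFAULT_THRESHOLD = 0.8    # actionable threshold from the decision contract
BLOCKING_THRESHOLD = 0.95  # stricter threshold suggested for blocking


def is_actionable(attack_class: str, confidence: float,
                  threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Contract rule: non-empty attackClass and confidence above the threshold."""
    return attack_class != "" and confidence > threshold


def should_block(attack_class: str, confidence: float, corroborated: bool) -> bool:
    """Block only on a high-confidence verdict backed by a non-classifier signal."""
    return corroborated and is_actionable(attack_class, confidence, BLOCKING_THRESHOLD)


# A 0.90 verdict is actionable (for example, worth flagging) but not enough to block.
print(is_actionable("prompt_injection", 0.90))
print(should_block("prompt_injection", 0.90, corroborated=True))
print(should_block("prompt_injection", 0.97, corroborated=True))
```

Both comparisons are strict, matching the contract expression, so a confidence of exactly 0.8 or 0.95 does not clear the respective threshold.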