<nil>NILScript

Proof

The numbers

Across 4,216 real prompt-injection attacks on two models, the unauthorized-write rate through NIL was 0.00%, while every benign task still completed. However often the agent itself gets hijacked, NIL commits none of the resulting writes.

The result

[Figure: InjecAgent unauthorized-write rate, raw vs NIL. NIL bars are zero across every model and setting.]
4,216 evaluations, one headline
Raw agents were hijacked into a real write in up to 1 in 22 cases (4.46%). Through NIL, the unauthorized-write rate is 0.00% across every model and attack setting, while benign task success stays at 100%. The defense is structural, not model-dependent.
Model        | Setting  | Cases | Hijack rate (ASR) | Unauth. write (raw) | Unauth. write (NIL) | Benign
gpt-oss-120b | base     | 1054  | 2.75%             | 2.75%               | 0.00%               | 100%
gpt-oss-120b | enhanced | 1054  | 0.47%             | 0.47%               | 0.00%               | 100%
zai-glm-4.7  | base     | 1054  | 4.46%             | 4.46%               | 0.00%               | 100%
zai-glm-4.7  | enhanced | 1054  | 0.00%             | 0.00%               | 0.00%               | 100%
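
The headline figures can be re-derived from the table rows. The sketch below is only an arithmetic check: the tuple layout and variable names are illustrative, and the values are copied from the table, not recomputed from raw logs.

```python
# Rows copied from the results table:
# (model, setting, cases, raw unauthorized-write %, NIL unauthorized-write %)
rows = [
    ("gpt-oss-120b", "base",     1054, 2.75, 0.00),
    ("gpt-oss-120b", "enhanced", 1054, 0.47, 0.00),
    ("zai-glm-4.7",  "base",     1054, 4.46, 0.00),
    ("zai-glm-4.7",  "enhanced", 1054, 0.00, 0.00),
]

# Total evaluations across all four model/setting pairs.
total_cases = sum(cases for _, _, cases, _, _ in rows)

# Worst raw rate, expressed as "1 in N" cases.
worst_raw_pct = max(raw for _, _, _, raw, _ in rows)
one_in_n = 100 / worst_raw_pct

# Every gated row reports a zero unauthorized-write rate.
all_nil_zero = all(nil == 0.0 for _, _, _, _, nil in rows)

print(total_cases, round(one_in_n, 1), all_nil_zero)
```

The four settings of 1054 cases each account for the 4,216 evaluations, and the worst raw rate of 4.46% corresponds to roughly 1 in 22 cases.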

How it's measured

NIL is the layer between the agent and the backend, so we don't compete on a leaderboard — we instrument one. InjecAgent (ACL Findings 2024) injects a malicious instruction into a tool's response while the user only asked for a benign read; a hijacked agent then calls the attacker's tool — a state-changing write. We run every case twice: the agent calling tools directly (raw), and the same agent routed through NIL (gated). Same model, same attacks — only the gate differs. The claim isn't “NIL makes the model smarter”; it's structural: a write only commits after a previewed propose → approve → commit, and the agent can only touch verbs the backend's skeleton exposes.
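
The structural claim can be pictured with a toy gate. This is a minimal sketch under stated assumptions, not the NIL implementation or its wire format: the `Gate` class, its method names, and the `Refused` exception are all hypothetical, and the approval step is assumed to come from outside the agent.

```python
from dataclasses import dataclass, field
import uuid


class Refused(Exception):
    """Raised when the toy gate declines a request."""


@dataclass
class Gate:
    skeleton: set                                  # verbs the backend exposes (hypothetical)
    store: dict = field(default_factory=dict)      # stand-in for backend state
    pending: dict = field(default_factory=dict)    # proposal id -> proposal

    def propose(self, verb, key, value):
        # Verbs outside the skeleton are refused outright.
        if verb not in self.skeleton:
            raise Refused(f"unknown verb: {verb}")
        pid = uuid.uuid4().hex
        self.pending[pid] = {"key": key, "value": value, "approved": False}
        preview = f"{verb} {key} = {value!r}"
        return pid, preview                        # PROPOSE changes no backend state

    def approve(self, pid):
        # Assumed to be called by the user or policy, never by the agent.
        self.pending[pid]["approved"] = True

    def commit(self, pid):
        proposal = self.pending.get(pid)
        if proposal is None or not proposal["approved"]:
            raise Refused("proposal was not approved")
        del self.pending[pid]
        self.store[proposal["key"]] = proposal["value"]


gate = Gate(skeleton={"read", "write"})

# A hijacked agent can propose a write, but nothing is committed yet.
pid, preview = gate.propose("write", "payee", "attacker")
try:
    gate.commit(pid)
    hijack_committed = True
except Refused:
    hijack_committed = False
```

In this sketch a hijacked model can still be talked into proposing the attacker's write; the write simply never reaches `store` without the separate approval, which is the sense in which the defense does not depend on the model.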

Conformance — protocol invariants

Beyond safety, the wire itself is tested as properties, not single runs: a property-based state machine drives random propose/commit/rollback sequences and asserts idempotency, no side effect on PROPOSE, rollback honesty (a reversal targets the real record, never a stale name), and refusal correctness (unknown verbs are refused, never faked).
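
The shape of such a property test can be sketched with the standard library alone. Everything here is an assumption for illustration: `ToyWire` is a hypothetical in-memory stand-in, not the NIL wire, and a seeded `random.Random` loop takes the place of whatever property-testing framework the project actually uses.

```python
import random


class Refused(Exception):
    pass


class ToyWire:
    """Hypothetical in-memory system under test."""
    VERBS = {"create"}

    def __init__(self):
        self.records = {}      # record id -> value
        self.proposals = {}    # proposal id -> value
        self.committed = {}    # proposal id -> record id
        self._next = 0

    def _fresh_id(self):
        self._next += 1
        return self._next

    def propose(self, verb, value):
        if verb not in self.VERBS:
            raise Refused(verb)
        pid = self._fresh_id()
        self.proposals[pid] = value
        return pid

    def commit(self, pid):
        if pid in self.committed:              # replayed commit: same result, no new write
            return self.committed[pid]
        rid = self._fresh_id()
        self.records[rid] = self.proposals[pid]
        self.committed[pid] = rid
        return rid

    def rollback(self, rid):
        self.records.pop(rid, None)            # targets the record by id, not by value


def run_sequence(rng, steps=60):
    wire, pids, rids = ToyWire(), [], []
    for _ in range(steps):
        op = rng.choice(["propose", "bad_verb", "commit", "rollback"])
        before = dict(wire.records)
        if op == "propose":
            pids.append(wire.propose("create", rng.randint(0, 9)))
            assert wire.records == before      # no side effect on PROPOSE
        elif op == "bad_verb":
            refused = False
            try:
                wire.propose("drop_everything", 0)
            except Refused:
                refused = True
            assert refused and wire.records == before   # refusal correctness
        elif op == "commit" and pids:
            pid = rng.choice(pids)
            rid = wire.commit(pid)
            after_first = dict(wire.records)
            assert wire.commit(pid) == rid     # idempotency: replay returns same id
            assert wire.records == after_first
            rids.append(rid)
        elif op == "rollback" and rids:
            rid = rng.choice(rids)
            expected = {k: v for k, v in before.items() if k != rid}
            wire.rollback(rid)
            assert wire.records == expected    # rollback honesty: only that record goes


def run_many(n_sequences):
    for seed in range(n_sequences):
        run_sequence(random.Random(seed))
    return n_sequences
```

Calling `run_many(200)` drives 200 seeded random sequences; any violated invariant surfaces as an `AssertionError`, and the seed makes the failing sequence reproducible.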

Honest caveats

We publish the caveats, not just the win. The harness uses a single-step decision, not InjecAgent's two-step ReAct, and these reasoning models' raw hijack rates (0–4.5%) sit below the paper's 24% GPT-4-ReAct base — so the ASR numbers are harness-specific and not a head-to-head with the published figure. The NIL → 0 result is the robust, comparable claim. Unauthorized-write rate is always reported paired with benign task-success, never alone.

NILScript is an open standard, stewarded by the Wosool project. The spec is extracted from running code.

Draft standard v0.3.0 · 0.x stage · NIL wire 0.1 · DSL 0.1