Trust · verify, don’t trust

Don’t take our word for it — verify every claim.

Phionyx does not ask you to trust architectural claims. Every load-bearing claim below points to a real artifact and a command you can run, and each is marked as reproducible now, beta, planned, or pending external validation.

Current maturity: cooperative-grade governance with a capability boundary. We do not claim to contain a model, to prevent hallucination, or to make a model correct — Phionyx makes the governance path deterministic and the evidence replayable. The limits are shown, not hidden.

The runtime governs its own development, live and inspectable: the self-governance coverage snapshot shows real gate calls, signed records, and the honest gaps that are still accruing.

The evidence is grouped by component, so you can jump straight to the part you care about.

The per-decision records referenced by the signed-audit rows below use AIREP, a neutral, vendor- and model-independent format for one signed, hash-chained record per AI runtime decision. Phionyx is its reference implementation.

Public (18): Reproducible now
Beta (6): Reproducible from this repo, not independently re-run
Planned (3): Roadmap item, no current claim
Pending (0): Depends on external action

If you have ten minutes, verify these three

These three cover the core of what Phionyx asserts about itself.

  1. Installability
     pip install phionyx-core
  2. Deterministic governance path
     jupyter nbconvert --to notebook --execute examples/notebooks/01_determinism_and_physics.ipynb
  3. Kill-switch behaviour
     pytest tests/core/ -k kill_switch -q
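
To give a sense of what the second check measures, here is a minimal sketch of a determinism test, not the notebook's actual code. The transition function `evolve_state`, the state fields, and the event shape are all hypothetical; the only idea taken from this page is that repeated runs over identical inputs should collapse to a single state hash (the determinism row reports 1000 runs and 1 unique state hash).

```python
import hashlib
import json

def evolve_state(state: dict, event: dict) -> dict:
    """Hypothetical pure transition: no clock, no randomness, no I/O."""
    return {
        "turn": state["turn"] + 1,
        "risk": max(state["risk"], event["risk"]),
        "gates": sorted(set(state["gates"]) | {event["gate"]}),
    }

def state_hash(state: dict) -> str:
    """Hash a key-sorted JSON rendering so dict ordering cannot change the digest."""
    payload = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def run_once(events: list) -> str:
    """Replay the same events from the same initial state and hash the result."""
    state = {"turn": 0, "risk": 0, "gates": []}
    for event in events:
        state = evolve_state(state, event)
    return state_hash(state)

# Illustrative inputs only; these are not real Phionyx events.
EVENTS = [
    {"gate": "input_safety_gate", "risk": 2},
    {"gate": "kill_switch", "risk": 5},
]

unique_hashes = {run_once(EVENTS) for _ in range(1000)}
print(len(unique_hashes))  # 1 when the transition is deterministic
```

The sketch says nothing about model output. It only shows how identical inputs to a pure governance transition can be checked for identical state.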

A signed record, end to end

This is what the Evidence Runtime profile produces: one signed, hash-chained record per decision, emitted in AIREP. The record below is synthetic (no real data). The real 5-record chain is in AIREP's examples/chain.jsonl.

{
  "subject":   { "producer": "phionyx", "event_type": "decision" },
  "claim":     { "decision": "release", "policy_basis": ["input_safety_gate"] },
  "integrity": {
    "previous":  "sha256:0000…0000",
    "current":   "sha256:514a4d3b…916ee7",
    "signature": "ed25519:…"
  }
}
verify chain of 2 records → valid ✓
tamper record #1, re-verify → broken_at: 1 (record hash mismatch)

Tampering at any link is detected at the exact record, deterministically, and the check needs only the public key. The signed-audit and cross-runtime rows below let you run this yourself.
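
The tamper-detection behaviour can be illustrated with a small hash-chain sketch. This is not the AIREP verifier: the Ed25519 signature step is left out, the all-zero genesis value is an assumption, and `canonical` is a simplified stand-in for RFC 8785 (sorted keys, no whitespace) that is adequate only for the plain string fields used here. The helper names `append` and `verify` are hypothetical.

```python
import copy
import hashlib
import json

GENESIS = "sha256:" + "0" * 64  # assumed all-zero "previous" for the first record

def canonical(obj: dict) -> bytes:
    # Simplified stand-in for RFC 8785 (JCS): sorted keys, no whitespace.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

def record_hash(record: dict) -> str:
    # Hash everything except the stored digest itself.
    body = copy.deepcopy(record)
    body["integrity"].pop("current", None)
    return "sha256:" + hashlib.sha256(canonical(body)).hexdigest()

def append(chain: list, subject: dict, claim: dict) -> None:
    # Link the new record to the digest of the record before it.
    previous = chain[-1]["integrity"]["current"] if chain else GENESIS
    record = {"subject": subject, "claim": claim,
              "integrity": {"previous": previous}}
    record["integrity"]["current"] = record_hash(record)
    chain.append(record)

def verify(chain: list) -> dict:
    # Walk the chain and stop at the first record that fails either check.
    previous = GENESIS
    for index, record in enumerate(chain):
        if record["integrity"]["previous"] != previous:
            return {"valid": False, "broken_at": index,
                    "reason": "previous-link mismatch"}
        if record_hash(record) != record["integrity"]["current"]:
            return {"valid": False, "broken_at": index,
                    "reason": "record hash mismatch"}
        previous = record["integrity"]["current"]
    return {"valid": True}

chain = []
subject = {"producer": "phionyx", "event_type": "decision"}
append(chain, subject, {"decision": "release", "policy_basis": ["input_safety_gate"]})
append(chain, subject, {"decision": "release", "policy_basis": ["input_safety_gate"]})
print(verify(chain))  # {'valid': True}

tampered = copy.deepcopy(chain)
tampered[1]["claim"]["decision"] = "block"  # edit record #1 after the fact
result = verify(tampered)
print(result)  # broken_at is 1, reason is "record hash mismatch"
```

In a real chain each record would also carry a signature over its canonical bytes, so that an attacker cannot simply recompute the hashes after editing a record.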

The engine — Phionyx Core

The deterministic runtime: the pipeline, the safety gates, the signed audit chain, and everything you can install and test today.

Area | Claim | Evidence | Reproducibility | Status
Pipeline
46-block canonical pipeline
pytest tests/contract/ -q
Runtime: ~5 s
Expect: 119 passed, 2 skipped, 0 failed
Tested on: Ubuntu 22.04, Python 3.12
Last verified: 2026-06-07
Public
Determinism
Identical inputs produce identical governance-state evolution
This does not imply deterministic LLM output. The model stays probabilistic; the governance path around it is reproducible.
jupyter nbconvert --to notebook --execute examples/notebooks/01_determinism_and_physics.ipynb
Runtime: ~30 s
Expect: 1000 runs, 1 unique state hash, zero variance across runs
Tested on: Ubuntu 22.04, Python 3.12
Last verified: 2026-06-07
Public
Coherence
State drift and incoherence are detected and gated as the conversation evolves
Coherence and safety classification runs inside the engine pipeline; it is a runtime gate over state, distinct from the self-governance claim gate.
pytest tests/core/ -k "state or coherence" -q
Runtime: ~10 s
Expect: state-coherence subset passes, 0 failed
Tested on: Ubuntu 22.04, Python 3.12
Last verified: 2026-06-07
Public
Kill switch
4 fail-closed triggers + NaN guard + tamper-evident event log
pytest tests/core/ -k kill_switch -q
Runtime: ~5 s
Expect: 28 passed (kill_switch subset), 0 failed
Tested on: Ubuntu 22.04, Python 3.12
Last verified: 2026-06-07
Public
Signed audit chain
Every state mutation produces a hash-chained, Ed25519-signed audit record; tampering is detectable on replay
  • examples/adversarial/ (audit replay scenario)
python examples/adversarial/<audit_replay>.py — replays the chain and reports any broken link
Runtime: ~5 s
Expect: Intact chain verifies; a tampered record is flagged on replay
Tested on: Ubuntu 22.04, Python 3.12
Last verified: 2026-06-07
Public
Cross-runtime verification
A signed decision record verifies in two independent implementations (Python + Node), byte-for-byte, via RFC 8785 (JCS) canonicalization
AIREP is the neutral interchange format; its conformance kit includes both verifiers so "it verifies" is never self-referential. Verified by running the kit, not asserted.
git clone https://github.com/halvrenofviryel/ai-runtime-evidence-protocol
cd ai-runtime-evidence-protocol/spec/airep/v0.1
pip install jsonschema cryptography   # Node 20+ also required
python3 conformance/validate.py
python3 conformance/verify.py  examples/chain.jsonl --pubkey examples/test_public_key.txt
node     conformance/verify.mjs examples/chain.jsonl --pubkey examples/test_public_key.txt
Runtime: ~30 s (clone + verify)
Expect: validate.py → all conformance checks PASSED; verify.py + verify.mjs → 5 records PASS sig=ok with identical hashes, all records OK
Tested on: Python 3.12 + Node 20
Last verified: 2026-06-03
Public
Memory
+24% retention vs LRU, +72% vs FIFO under impact-weighted eviction
Controlled benchmark, not a third-party measurement.
pytest tests/benchmarks/ -q (full JSON is included with the reproducibility pack on the releases page)
Beta
CPU overhead
Sub-millisecond per-block overhead under controlled workload, ~31% CPU overhead reduction
Controlled reference benchmark. Hardware-dependent.
pytest tests/benchmarks/ -q
Beta
Tests
1,131 pass / 7 skip / 0 fail on commit `c8fa1f9`, Python 3.12 (reproduced in a clean venv)
git clone … && git checkout c8fa1f9 && python -m venv .venv && source .venv/bin/activate && pip install -e ".[dev]" && pytest -q
Runtime: ~3–5 min (Python 3.12 verified; 3.10–3.13 matrix)
Expect: 1,131 pass / 7 skip / 0 fail
Tested on: Ubuntu 22.04, Python 3.12
Last verified: 2026-05-08
Public
Type safety
Full `phionyx_core/` is mypy strict-clean — 333 source files, zero errors
  • CI mypy step
rm -rf .mypy_cache && mypy phionyx_core
Runtime: ~30 s
Expect: Success: no issues found in 333 source files
Tested on: Ubuntu 22.04, Python 3.12
Last verified: 2026-06-07
Public
Lint
ruff strict-clean across the engine
ruff check phionyx_core
Runtime: <5 s
Expect: All checks passed!
Tested on: Ubuntu 22.04, Python 3.12
Last verified: 2026-06-07
Public
Distribution
phionyx-core published on PyPI via OIDC trusted publisher
pip install phionyx-core
Runtime: ~30 s
Expect: phionyx-core installs cleanly
Tested on: macOS / Linux / Windows (any pip)
Last verified: 2026-06-07
Public
Archival
Each release is archived on Zenodo with a citable DOI
The concept DOI always resolves to the latest archived release; each release also gets its own versioned DOI.
curl -L https://zenodo.org/records/20027534/files-archive -o phionyx-core.zip
Runtime: <10 s
Expect: Concept record 20027534 resolves to the latest archived release
Tested on: any HTTP client
Last verified: 2026-06-07
Public
Patents
4 UKIPO patent families filed, 66 claims total — PATENT PENDING
Filing receipts held by applicant; UKIPO public registry visibility depends on publication stage.
UKIPO registry lookup once filings are published
Expect: SF1 GB2609503.4 · SF2 GB2609500.0 · SF3 GB2609504.2 · SF4 GB2609511.7
Last verified: 2026-06-07
Public
Compliance: OWASP
OWASP Agentic AI — Threats and Mitigations v1.0 (15 threats) evidence mapping
Coverage: 1 Full · 10 Partial · 4 Gap (T11 RCE — out of scope by design; T13/T14 multi-agent — single-instance scope; T6 latent goal hijack — open research). Each Partial row carries an explicit residual-risk line.
Each mapping row points to a file path + a pytest invocation a reviewer can run to verify the named control.
Runtime: Read-only; per-control verification ~5–30 s each
Expect: 15 rows: 1 Full (T8 Audit), 10 Partial, 4 Gap (T11 / T13 / T14 / T6-deep)
Last verified: 2026-06-07
Public
Compliance: NIST
NIST AI RMF 1.0 four-function mapping (Govern / Map / Measure / Manage) — 1 Full / 3 Partial within the runtime perimeter
4 functions: 1 Full (MANAGE) / 3 Partial (GOVERN, MAP, MEASURE) / 0 Gap. Voluntary guidance, not certification.
Per-function rows cite blocks + tests; deployer-responsibility line per function; MANAGE = Full (covered by the engine governance gates)
Public
Compliance: EU AI Act
EU AI Act Articles 9–15 evidence mapping with explicit gap analysis (1 Full / 5 Partial / 1 Gap)
Per-article rows link to canonical blocks + artifacts; deployer-responsibility line per article
Public
Compliance: ISO 42001
ISO/IEC 42001:2023 AI Management System mapping — 15 control-type rows (1 Full / 8 Partial / 6 Gap) — DRAFT (Annex A identifier accuracy requires paid-text verification)
Draft mapping; it is promoted out of draft once the Annex A identifiers are verified against the paid full text. The mapping leans heavily toward Gap by architecture: 6 Gap rows (policy / roles / resources / communication / internal audit / management review) are explicitly out of scope.
Mapping organised by control-type rather than Annex A ID; promotion path: BSI Knowledge access or deployer AIMS audit feedback
Beta
Compliance: schema
Compliance-mapping JSON Schema (Draft 2020-12) — machine-readable row format used by all four mappings
Schema enforces non-empty mechanism + non-empty evidence + required deployer_responsibility on every row (including Full).
pip install jsonschema && python docs/mappings/schema/validate.py
Public
Adversarial
Prompt-injection / unsafe-action / policy-conflict / audit-replay demos — each maps to an OWASP / EU AI Act / NIST row
4 scenarios: input_safety_gate / kill_switch / RBAC-vs-ethics conflict / SHA-256 audit replay tamper-detection.
python examples/adversarial/<scenario>.py — runs in seconds, only depends on phionyx-core
Public
Comparison
Before/after demo: same agent prompt, with vs. without Phionyx — runnable case-study harness
Benign row deliberately included — shows gates do NOT false-positive on legitimate traffic.
Single script, 3 prompts (benign / injection / harmful), prints side-by-side table
Public
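
The kill-switch row above pairs fail-closed triggers with a NaN guard. The sketch below shows why that guard matters and is illustrative only: the metric names, the limits, and the `kill_switch` function are hypothetical, and they are not the four triggers shipped in the engine. The point is that every comparison against NaN is false, so a naive `value > limit` check would let a NaN metric through, while a fail-closed check halts unless each limit is provably satisfied.

```python
import math

def kill_switch(metrics: dict, limits: dict) -> tuple:
    """Return ("halt", reason) unless every limit is provably satisfied."""
    for name, limit in limits.items():
        value = metrics.get(name)
        # A missing or non-numeric metric halts instead of being skipped.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return ("halt", f"{name}: missing or non-numeric")
        # Explicit NaN guard: NaN never compares as "over the limit".
        if math.isnan(value):
            return ("halt", f"{name}: NaN")
        # Written as "not within limit" so any doubtful comparison halts.
        if not value <= limit:
            return ("halt", f"{name}: {value} exceeds {limit}")
    return ("continue", "all limits satisfied")

# Hypothetical limits, for illustration only.
LIMITS = {"risk": 0.8, "drift": 0.5}

print(kill_switch({"risk": 0.2, "drift": 0.1}, LIMITS))
print(kill_switch({"risk": float("nan"), "drift": 0.1}, LIMITS))
print(kill_switch({"risk": 0.2}, LIMITS))

# The naive check fails open: NaN is never greater than the limit.
print(float("nan") > 0.8)  # False
```

Defaulting to "halt" keeps the failure mode conservative: anything the check cannot evaluate stops the pipeline rather than passing silently.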

The gate — the self-governance gate

Holds an AI agent’s own "I read / I fixed / this changed" claims to evidence. It is claim-grounding today; deeper evidence-binding is on the roadmap and opt-in by design.
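
As a rough picture of claim grounding, the sketch below holds any claim whose source was never read in the current turn. It assumes a much simpler model than the gate itself: `Turn`, `Claim`, and `check_claim` are hypothetical names, and the real knowledge-boundary check may weigh more than a set lookup.

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    # Sources the agent actually read during this turn (hypothetical model).
    read_sources: set = field(default_factory=set)

@dataclass(frozen=True)
class Claim:
    kind: str    # e.g. "read", "fixed", "changed"
    source: str  # file or document the claim is about

def check_claim(turn: Turn, claim: Claim) -> str:
    """Return "grounded" if the claimed source was read this turn, else "held"."""
    return "grounded" if claim.source in turn.read_sources else "held"

turn = Turn(read_sources={"src/app.py"})
print(check_claim(turn, Claim("fixed", "src/app.py")))    # grounded
print(check_claim(turn, Claim("read", "docs/design.md"))) # held
```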

Area | Claim | Evidence | Reproducibility | Status
Claim grounding
A claim about an unread source is grounded or blocked before it can stand
This is the gate working today: it holds an agent’s "I read / I fixed / this changed" claims to evidence via a knowledge-boundary check.
Install the gate and run a claim through it; an ungrounded claim is held.
Beta
Evidence-before-claim
A factual claim made without same-turn evidence must bind evidence first
Opt-in and default-off in phionyx-pipeline-mcp 0.3.0 (the require_tool directive): a factual claim with no same-turn evidence binds a tool first. Non-regressive — it changes a passing directive only when enforcement is enabled.
  • phionyx-pipeline-mcp 0.3.0
pip install 'phionyx-pipeline-mcp>=0.3.0'
PHIONYX_GATE_REQUIRE_TOOL_ENFORCE=1 — an unbacked factual claim becomes a require_tool directive
Beta
Continuity binding
Prior context is bound into the plan rather than silently dropped
Opt-in and default-off in phionyx-pipeline-mcp 0.3.0: catches "read-but-not-bound" — a cited source that was not bound this turn.
  • phionyx-pipeline-mcp 0.3.0
pip install 'phionyx-pipeline-mcp>=0.3.0'
PHIONYX_GATE_CONTINUITY_ENFORCE=1 — a claim citing an unbound source is downgraded
Beta
Detector calibration
Detector-calibration ledger; calibration error reported once labelled data exists
Tested, not yet validated. The measurement window opens once enough labelled matched pairs are collected.
  • the self-governance gate
Calibration reproduction published after the measurement window opens
Planned
Benchmark
Agent claim-grounding benchmark — seed cases below the public-reference threshold
Seed benchmark, not yet a public artifact; the public threshold is a larger labelled case count.
  • the self-governance gate
Benchmark runner published once the seed reaches the public threshold
Planned
Independent review
Third-party security / governance review
Planned; target window after public CI hardening.
  • External report — planned
External report attached to release
Planned
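
The opt-in, default-off behaviour described in the evidence-before-claim row can be sketched as follows. Only the environment variable name and the `require_tool` directive come from this page. The `directive_for` function, the `"pass"` directive name, and the rule that the flag is enabled by the exact value `"1"` are assumptions, and phionyx-pipeline-mcp may parse the flag differently.

```python
import os

# Flag name as given on this page; how the package parses it is assumed here.
ENFORCE_FLAG = "PHIONYX_GATE_REQUIRE_TOOL_ENFORCE"

def directive_for(claim_is_factual: bool, has_same_turn_evidence: bool,
                  env=None) -> str:
    """Return "require_tool" only when enforcement is on and evidence is missing."""
    env = os.environ if env is None else env
    enforce = env.get(ENFORCE_FLAG) == "1"
    if enforce and claim_is_factual and not has_same_turn_evidence:
        return "require_tool"
    # With the flag off, the outcome is the same as before the feature existed.
    return "pass"

off = {}
on = {ENFORCE_FLAG: "1"}
print(directive_for(True, False, off))  # pass: default-off leaves behaviour unchanged
print(directive_for(True, False, on))   # require_tool: unbacked factual claim
print(directive_for(True, True, on))    # pass: evidence was bound in the same turn
```

Passing the environment as a parameter keeps the sketch testable without touching the real process environment.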

What this page is not

This is not a marketing page. Rows here are bounded by:

  • Compliance mappings are evidence mappings, not legal certification.
  • Benchmarks are controlled reference benchmarks, not third-party audits.
  • Coherence and entropy metrics are experimental, not externally validated.