Trust · verify, don’t trust
Don’t take our word for it — verify every claim.
Phionyx does not ask you to trust architectural claims. Every load-bearing claim below points to a real artifact and a command you can run, and each is marked as reproducible now, beta, planned, or pending external validation.
Current maturity: cooperative-grade governance with a capability boundary. We do not claim to contain a model, to prevent hallucination, or to make a model correct — Phionyx makes the governance path deterministic and the evidence replayable. The limits are shown, not hidden.
The runtime governs its own development, live and inspectable: the self-governance coverage snapshot shows real gate calls, signed records, and the honest gaps that are still accruing.
The evidence is grouped by component, so you can jump straight to the part you care about.
The per-decision records referenced by the signed-audit rows below use AIREP, a neutral, vendor- and model-independent format for one signed, hash-chained record per AI runtime decision. Phionyx is its reference implementation.
If you have ten minutes, verify these three
These three cover the core of what Phionyx asserts about itself.
1. Installability: `pip install phionyx-core`
2. Deterministic governance path: `jupyter nbconvert --to notebook --execute examples/notebooks/01_determinism_and_physics.ipynb`
3. Kill-switch behaviour: `pytest tests/core/ -k kill_switch -q`
A signed record, end to end
This is what the Evidence Runtime profile produces: one signed, hash-chained record per decision, emitted in AIREP. The record below is synthetic (no real data). The real 5-record chain is in AIREP's examples/chain.jsonl.
    {
      "subject": { "producer": "phionyx", "event_type": "decision" },
      "claim": { "decision": "release", "policy_basis": ["input_safety_gate"] },
      "integrity": {
        "previous": "sha256:0000…0000",
        "current": "sha256:514a4d3b…916ee7",
        "signature": "ed25519:…"
      }
    }

Tampering with any link is detected at the exact record, deterministically, and can be verified with the public key alone. The signed-audit and cross-runtime rows below let you run this yourself.
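To make "detected at the exact record" concrete, here is a minimal sketch of hash-chain verification using only the Python standard library. It is an illustration under stated assumptions, not AIREP's verifier: the hashed payload layout, the 64-zero genesis value, and the helper names (`link_hash`, `append_record`, `first_broken_link`, `build_demo_chain`) are all hypothetical, the canonical form is only an approximation of RFC 8785 (JCS), and Ed25519 signature checking is left out entirely. For real verification, use the conformance kit listed in the cross-runtime row.

```python
import hashlib
import json

# Assumption: the genesis "previous" value is 64 zero hex digits.
GENESIS = "sha256:" + "0" * 64


def canonical_bytes(obj):
    # Approximation only: sorted keys and no insignificant whitespace.
    # This is NOT a complete RFC 8785 (JCS) implementation.
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def link_hash(previous, subject, claim):
    # Hypothetical payload layout: each hash covers the previous link
    # plus the record's subject and claim.
    payload = {"previous": previous, "subject": subject, "claim": claim}
    return "sha256:" + hashlib.sha256(canonical_bytes(payload)).hexdigest()


def append_record(chain, subject, claim):
    previous = chain[-1]["integrity"]["current"] if chain else GENESIS
    record = {
        "subject": subject,
        "claim": claim,
        "integrity": {
            "previous": previous,
            "current": link_hash(previous, subject, claim),
        },
    }
    chain.append(record)
    return record


def first_broken_link(chain):
    """Return the index of the first record that fails verification, or None."""
    expected_previous = GENESIS
    for index, record in enumerate(chain):
        integrity = record["integrity"]
        if integrity["previous"] != expected_previous:
            return index
        recomputed = link_hash(integrity["previous"], record["subject"], record["claim"])
        if integrity["current"] != recomputed:
            return index
        expected_previous = integrity["current"]
    return None


def build_demo_chain():
    # Synthetic records only, mirroring the shape of the sample above.
    chain = []
    for decision in ("release", "hold", "release"):
        append_record(
            chain,
            {"producer": "phionyx", "event_type": "decision"},
            {"decision": decision, "policy_basis": ["input_safety_gate"]},
        )
    return chain
```

Editing the claim of any record after the fact changes its recomputed hash, so `first_broken_link` reports that record's index rather than a generic failure. A signature over each `current` value would additionally bind the chain to a key holder; that step is omitted here.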
The engine — Phionyx Core
The deterministic runtime: the pipeline, the safety gates, the signed audit chain, and everything you can install and test today.
| Area | Claim | Evidence | Reproducibility | Status |
|---|---|---|---|---|
| Pipeline | 46-block canonical pipeline | pytest tests/contract/ -q | Public | |
| Determinism | Identical inputs produce identical governance-state evolution. This does not imply deterministic LLM output. The model stays probabilistic; the governance path around it is reproducible. | jupyter nbconvert --to notebook --execute examples/notebooks/01_determinism_and_physics.ipynb | Public | |
| Coherence | State drift and incoherence are detected and gated as the conversation evolves. Coherence and safety classification runs inside the engine pipeline; it is a runtime gate over state, distinct from the self-governance claim gate. | pytest tests/core/ -k "state or coherence" -q | Public | |
| Kill switch | 4 fail-closed triggers + NaN guard + tamper-evident event log | pytest tests/core/ -k kill_switch -q | Public | |
| Signed audit chain | Every state mutation produces a hash-chained, Ed25519-signed audit record; tampering is detectable on replay | python examples/adversarial/<audit_replay>.py (replays the chain and reports any broken link) | Public | |
| Cross-runtime verification | A signed decision record verifies in two independent implementations (Python + Node), byte-for-byte, via RFC 8785 (JCS) canonicalization. AIREP is the neutral interchange format; its conformance kit includes both verifiers so "it verifies" is never self-referential. Verified by running the kit, not asserted. | git clone https://github.com/halvrenofviryel/ai-runtime-evidence-protocol && cd ai-runtime-evidence-protocol/spec/airep/v0.1 && pip install jsonschema cryptography && python3 conformance/validate.py && python3 conformance/verify.py examples/chain.jsonl --pubkey examples/test_public_key.txt && node conformance/verify.mjs examples/chain.jsonl --pubkey examples/test_public_key.txt (Node 20+ also required) | Public | |
| Memory | +24% retention vs LRU, +72% vs FIFO under impact-weighted eviction. Controlled benchmark, not a third-party measurement. | pytest tests/benchmarks/ -q (full JSON is included with the reproducibility pack on the releases page) | Beta | |
| CPU overhead | Sub-millisecond per-block overhead under controlled workload, ~31% CPU overhead reduction. Controlled reference benchmark. Hardware-dependent. | pytest tests/benchmarks/ -q | Beta | |
| Tests | 1,131 pass / 7 skip / 0 fail on commit `c8fa1f9`, Python 3.12 (reproduced in a clean venv) | git clone … && git checkout c8fa1f9 && python -m venv .venv && source .venv/bin/activate && pip install -e ".[dev]" && pytest -q | Public | |
| Type safety | Full `phionyx_core/` is mypy strict-clean: 333 source files, zero errors | rm -rf .mypy_cache && mypy phionyx_core | Public | |
| Lint | ruff strict-clean across the engine | ruff check phionyx_core | Public | |
| Distribution | phionyx-core published on PyPI via OIDC trusted publisher | pip install phionyx-core | Public | |
| Archival | Each release is archived on Zenodo with a citable DOI. The concept DOI always resolves to the latest archived release; each release also gets its own versioned DOI. | curl -L https://zenodo.org/records/20027534/files-archive -o phionyx-core.zip | Public | |
| Patents | 4 UKIPO patent families filed, 66 claims total (PATENT PENDING). Filing receipts held by applicant; UKIPO public registry visibility depends on publication stage. | UKIPO registry lookup once filings are published | Public | |
| Compliance: OWASP | OWASP Agentic AI: Threats and Mitigations v1.0 (15 threats) evidence mapping. Coverage: 1 Full · 10 Partial · 4 Gap (T11 RCE: out of scope by design; T13/T14 multi-agent: single-instance scope; T6 latent goal hijack: open research). Each Partial row carries an explicit residual-risk line. | Each mapping row points to a file path + a pytest invocation a reviewer can run to verify the named control. | Public | |
| Compliance: NIST | NIST AI RMF 1.0 four-function mapping (Govern / Map / Measure / Manage) within the runtime perimeter. 4 functions: 1 Full (MANAGE) / 3 Partial (GOVERN, MAP, MEASURE) / 0 Gap. Voluntary guidance, not certification. | Per-function rows cite blocks + tests; deployer-responsibility line per function; MANAGE = Full (covered by the engine governance gates) | Public | |
| Compliance: EU AI Act | EU AI Act Article 9–15 evidence mapping with explicit gap analysis (1 Full / 5 Partial / 1 Gap) | Per-article rows link to canonical blocks + artifacts; deployer-responsibility line per article | Public | |
| Compliance: ISO 42001 | ISO/IEC 42001:2023 AI Management System mapping: 15 control-type rows (1 Full / 8 Partial / 6 Gap). DRAFT: Annex A identifier accuracy requires paid-text verification, and the mapping is promoted to non-draft once the paid full text is verified. Leans heavily toward Gap by architecture: 6 Gap rows (policy / roles / resources / communication / internal audit / management review) are explicitly out of scope. | Mapping organised by control-type rather than Annex A ID; promotion path: BSI Knowledge access or deployer AIMS audit feedback | Beta | |
| Compliance: schema | Compliance-mapping JSON Schema (Draft 2020-12): machine-readable row format used by all four mappings. Schema enforces non-empty mechanism + non-empty evidence + required deployer_responsibility on every row (including Full). | pip install jsonschema && python docs/mappings/schema/validate.py | Public | |
| Adversarial | Prompt-injection / unsafe-action / policy-conflict / audit-replay demos; each maps to an OWASP / EU AI Act / NIST row. 4 scenarios: input_safety_gate / kill_switch / RBAC-vs-ethics conflict / SHA-256 audit replay tamper-detection. | python examples/adversarial/<scenario>.py (runs in seconds, only depends on phionyx-core) | Public | |
| Comparison | Before/after demo: same agent prompt, with vs. without Phionyx, as a runnable case-study harness. Benign row deliberately included: it shows the gates do NOT false-positive on legitimate traffic. | Single script, 3 prompts (benign / injection / harmful), prints side-by-side table | Public | |
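The row constraints named in the "Compliance: schema" row above (non-empty mechanism, non-empty evidence, and a required deployer_responsibility on every row, including Full) can be sketched as a plain Python check. This is an illustration only, not the published schema or `validate.py`: the `coverage` field, the exact field types, and the helper names are assumptions, and the real JSON Schema may be stricter or shaped differently.

```python
# Hypothetical coverage vocabulary, taken from the Full / Partial / Gap
# counts quoted in the mapping rows.
COVERAGE_VALUES = {"Full", "Partial", "Gap"}


def is_non_empty(value):
    # Accept either a non-blank string or a non-empty list of entries.
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, list):
        return len(value) > 0
    return False


def row_errors(row):
    """Return a list of human-readable problems for one mapping row."""
    errors = []
    for field_name in ("mechanism", "evidence"):
        if not is_non_empty(row.get(field_name)):
            errors.append(f"{field_name} must be non-empty")
    # Required on every row, including rows marked Full.
    if "deployer_responsibility" not in row:
        errors.append("deployer_responsibility is required")
    if row.get("coverage") not in COVERAGE_VALUES:
        errors.append("coverage must be one of Full, Partial, Gap")
    return errors


# Synthetic example rows (no real mapping data).
good_row = {
    "coverage": "Full",
    "mechanism": "engine governance gates",
    "evidence": ["pytest tests/core/ -k kill_switch -q"],
    "deployer_responsibility": "Operate and monitor the deployment.",
}
bad_row = {"coverage": "Full", "mechanism": "  ", "evidence": []}
```

The point of requiring deployer_responsibility even on Full rows is that a fully covered control inside the runtime still leaves work for whoever deploys it; the check above makes that omission a validation error rather than a silent default.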
The gate — the self-governance gate
Holds an AI agent’s own "I read / I fixed / this changed" claims to evidence. It is claim-grounding today; deeper evidence-binding is on the roadmap and opt-in by design.
| Area | Claim | Evidence | Reproducibility | Status |
|---|---|---|---|---|
| Claim grounding | A claim about an unread source is grounded or blocked before it can stand. This is the gate working today: it holds an agent’s "I read / I fixed / this changed" claims to evidence via a knowledge-boundary check. | Install the gate and run a claim through it; an ungrounded claim is held. | Beta | |
| Evidence-before-claim | A factual claim made without same-turn evidence is required to bind evidence first. Opt-in and default-off in phionyx-pipeline-mcp 0.3.0 (the require_tool directive): a factual claim with no same-turn evidence must bind a tool first. Non-regressive: it changes a passing directive only when enforcement is enabled. | pip install 'phionyx-pipeline-mcp>=0.3.0', with PHIONYX_GATE_REQUIRE_TOOL_ENFORCE=1 set: an unbacked factual claim becomes a require_tool directive | Beta | |
| Continuity binding | Prior context is bound into the plan rather than silently dropped. Opt-in and default-off in phionyx-pipeline-mcp 0.3.0: catches "read-but-not-bound", a cited source that was not bound this turn. | pip install 'phionyx-pipeline-mcp>=0.3.0', with PHIONYX_GATE_CONTINUITY_ENFORCE=1 set: a claim citing an unbound source is downgraded | Beta | |
| Detector calibration | Detector-calibration ledger; calibration error reported once labelled data exists. Tested, not yet validated. The measurement window opens once enough labelled matched pairs are collected. | Calibration reproduction published after the measurement window opens | Planned | |
| Benchmark | Agent claim-grounding benchmark: seed cases below the public-reference threshold. Seed benchmark, not yet a public artifact; the public threshold is a larger labelled case count. | Benchmark runner published once the seed reaches the public threshold | Planned | |
| Independent review | Third-party security / governance review. Planned; target window after public CI hardening. | External report attached to release | Planned | |
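The opt-in, default-off behaviour described in the evidence-before-claim row can be sketched as follows. This is a toy model, not the phionyx-pipeline-mcp API: the `Claim` type, the `"pass"` directive name, and the way the flag is read are hypothetical. Only the `require_tool` directive and the `PHIONYX_GATE_REQUIRE_TOOL_ENFORCE` variable name come from the table above.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Claim:
    # Hypothetical shape of an agent claim, for illustration only.
    text: str
    factual: bool
    # Tool results bound in the same turn; empty means "unbacked".
    evidence: list = field(default_factory=list)


def enforcement_enabled(environ=None):
    # Assumption: enforcement is switched on by setting the variable to "1".
    environ = os.environ if environ is None else environ
    return environ.get("PHIONYX_GATE_REQUIRE_TOOL_ENFORCE") == "1"


def directive_for(claim, environ=None):
    """Return "require_tool" for an unbacked factual claim under enforcement."""
    unbacked = claim.factual and not claim.evidence
    if unbacked and enforcement_enabled(environ):
        return "require_tool"
    # Default-off: without the flag, a passing directive is left unchanged.
    return "pass"
```

With the flag unset, `directive_for(Claim("I read config.yaml", factual=True), {})` returns `"pass"`, which is what "non-regressive" means here: existing behaviour changes only when enforcement is explicitly enabled.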
What this page is not
This is not a marketing page. The rows here are bounded by the following limits:
- Compliance mappings are evidence mappings, not legal certification.
- Benchmarks are controlled reference benchmarks, not third-party audits.
- Coherence and entropy metrics are experimental, not externally validated.