Agentic Development · Governance snapshot

The governance runs, acts, and records — reproducibly.

Name: phionyx-core
Author: Phionyx

This snapshot measures how the Phionyx self-governance runtime governs the AI that writes Phionyx itself. It reports aggregate numbers only and is reviewer-reproducible. It is refreshed when the maintainer re-runs the public audit script (manual, not a continuous feed); the snapshot date is shown below. What is already proven: every governed decision is signed into a tamper-evident, hash-chained record, and the gate refuses, regenerates, or rewrites work that fails its checks, so it acts rather than only logging. What is not yet proven: the full closed loop, meaning a signed decision whose real-world outcome is later confirmed. That evidence is near zero today and accumulates going forward. Coverage is one input, not the goal.

Snapshot generated 2026-06-16 12:51 UTC · rolling 30-day window · regenerated when the maintainer re-runs the public audit script (manual), not on a fixed schedule.

What the runtime proves today

Signed records (live)

476

tamper-evident and hash-chained within the runtime (10 chains); the public feed publishes aggregate counts, not the records themselves.

Gate refusals

100

the gate refused / regenerated / rewrote work that failed its checks — it acts, it does not only log.

Open method

public

the deterministic measurement script is public; these numbers are its output over the private dev telemetry, and the published snapshot is the audit artifact.

What we don’t claim yet: that a signed decision’s real-world outcome has been independently confirmed end to end. Confirming outcomes takes time and real labels (which we never fabricate), so that long-loop evidence is still accruing. We’d rather show that gap than hide it.
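The tamper-evident, hash-chained property described above can be illustrated with a minimal sketch. This is not the Phionyx runtime's record format: the field names, the `append_record` and `verify_chain` helpers, and the use of HMAC-SHA256 as a stand-in for whatever signing scheme the runtime uses are all assumptions made for illustration.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # hypothetical "previous hash" for the first record


def _digest(body: dict) -> str:
    # Canonical JSON so identical content always produces the same hash.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(payload).hexdigest()


def append_record(chain: list, decision: dict, key: bytes) -> dict:
    """Append a decision, linking it to the hash of the previous record."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {"index": len(chain), "prev_hash": prev_hash, "decision": decision}
    record_hash = _digest(body)
    # HMAC is an assumed stand-in for the runtime's real signature scheme.
    signature = hmac.new(key, record_hash.encode(), hashlib.sha256).hexdigest()
    record = {**body, "hash": record_hash, "signature": signature}
    chain.append(record)
    return record


def verify_chain(chain: list, key: bytes) -> bool:
    """Return True only if every link, hash, and signature still matches."""
    prev_hash = GENESIS
    for i, rec in enumerate(chain):
        if rec["index"] != i or rec["prev_hash"] != prev_hash:
            return False
        body = {"index": rec["index"], "prev_hash": rec["prev_hash"],
                "decision": rec["decision"]}
        if _digest(body) != rec["hash"]:
            return False
        expected = hmac.new(key, rec["hash"].encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, rec["signature"]):
            return False
        prev_hash = rec["hash"]
    return True
```

Because each record embeds the hash of its predecessor, editing any earlier decision changes its hash and breaks every later link, which is what makes silent edits detectable.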

Invocation discipline · rolling 30-day window (one input, not the headline)

47.5% coverage · 344 of 429 commits instrumented · 327 gate calls (expected 688)

Our own rule asks the AI to call the gate twice per commit, so “coverage” is simply how often it did. It is a discipline check, not the point — calling the gate never proves the work is correct (that is what the signed record above is for). Reading the per-day Coverage column:

—	no governed activity recorded that day; not measured, and not counted as a failure.
0.0%	the gate was available but the AI did not call it.
100%+	the AI called the gate more often than the twice-per-commit baseline; extra discipline, not an error.
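The reading rules above can be sketched as a small function. This is an illustration, not the audit script's code: the name `coverage_label` is hypothetical, and it assumes that "no governed activity" means zero MCP calls that day, which is consistent with the per-day table but not stated by the source.

```python
NOT_MEASURED = "\u2014"  # the dash shown in the Coverage column


def coverage_label(commits: int, gate_calls: int, all_mcp_calls: int,
                   calls_per_commit: int = 2) -> str:
    """Format one day's coverage the way the per-day table displays it."""
    # Assumption: a day with no MCP calls (or no commits) is "not measured".
    if commits <= 0 or all_mcp_calls == 0:
        return NOT_MEASURED
    ratio = gate_calls / (calls_per_commit * commits)
    # More calls than the twice-per-commit baseline is shown as "100%+".
    if ratio > 1.0:
        return "100%+"
    return f"{ratio * 100:.1f}%"
```

For example, `coverage_label(15, 4, 8)` gives `13.3%`, matching the 2026-06-10 row (4 gate calls against an expected 30).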

Commits

429

in window

Gate calls

327

verify_claim + response_gate only

Expected

688

= 344 instrumented × 2

Sessions

154

MCP sessions in window

Per-day breakdown (30 days)

since 2026-05-17 15:51 UTC

Date	Commits	Gate calls	All MCP	Coverage
2026-06-16	4	12	23	100%+
2026-06-15	6	19	38	100%+
2026-06-14	18	65	123	100%+
2026-06-13	1	22	37	100%+
2026-06-12	4	0	0	—
2026-06-11	2	0	0	—
2026-06-10	15	4	8	13.3%
2026-06-09	11	32	64	100%+
2026-06-08	11	24	41	100%+
2026-06-07	36	14	24	19.4%
2026-06-06	20	16	20	40.0%
2026-06-05	2	0	1	0.0%
2026-06-04	2	0	3	0.0%
2026-06-03	1	0	7	0.0%
2026-06-02	1	0	12	0.0%
2026-06-01	1	1	4	50.0%
2026-05-31	1	0	0	—
2026-05-30	5	0	0	—
2026-05-29	24	10	10	20.8%
2026-05-28	67	51	51	38.1%
2026-05-27	24	36	36	75.0%
2026-05-26	40	15	22	18.8%
2026-05-25	34	2	5	2.9%
2026-05-24	26	4	16	7.7%
2026-05-23	27	0	0	—
2026-05-22	17	0	0	—
2026-05-21	6	0	0	—
2026-05-20	3	0	0	—
2026-05-19	18	0	0	—
2026-05-18	2	0	0	—
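The headline figures can be cross-checked against the per-day table above. The sketch below is not the audit script. It assumes a commit counts as "instrumented" when its day has at least one MCP call, an inference that reproduces the published 344 but is not confirmed by the source. The names `ROWS` and `summarize` are illustrative.

```python
# (date, commits, gate_calls, all_mcp), copied from the per-day table.
ROWS = [
    ("2026-06-16", 4, 12, 23), ("2026-06-15", 6, 19, 38),
    ("2026-06-14", 18, 65, 123), ("2026-06-13", 1, 22, 37),
    ("2026-06-12", 4, 0, 0), ("2026-06-11", 2, 0, 0),
    ("2026-06-10", 15, 4, 8), ("2026-06-09", 11, 32, 64),
    ("2026-06-08", 11, 24, 41), ("2026-06-07", 36, 14, 24),
    ("2026-06-06", 20, 16, 20), ("2026-06-05", 2, 0, 1),
    ("2026-06-04", 2, 0, 3), ("2026-06-03", 1, 0, 7),
    ("2026-06-02", 1, 0, 12), ("2026-06-01", 1, 1, 4),
    ("2026-05-31", 1, 0, 0), ("2026-05-30", 5, 0, 0),
    ("2026-05-29", 24, 10, 10), ("2026-05-28", 67, 51, 51),
    ("2026-05-27", 24, 36, 36), ("2026-05-26", 40, 15, 22),
    ("2026-05-25", 34, 2, 5), ("2026-05-24", 26, 4, 16),
    ("2026-05-23", 27, 0, 0), ("2026-05-22", 17, 0, 0),
    ("2026-05-21", 6, 0, 0), ("2026-05-20", 3, 0, 0),
    ("2026-05-19", 18, 0, 0), ("2026-05-18", 2, 0, 0),
]


def summarize(rows, calls_per_commit=2):
    """Re-derive the window totals from the per-day rows."""
    commits = sum(r[1] for r in rows)
    gate_calls = sum(r[2] for r in rows)
    # Assumed rule: only days with some MCP activity count as instrumented.
    instrumented = sum(r[1] for r in rows if r[3] > 0)
    expected = calls_per_commit * instrumented
    coverage = round(100 * gate_calls / expected, 1) if expected else None
    return {"commits": commits, "instrumented": instrumented,
            "gate_calls": gate_calls, "expected": expected,
            "coverage_pct": coverage}


print(summarize(ROWS))
```

Under that assumption the totals come out to 429 commits, 344 instrumented, 327 gate calls, 688 expected, and 47.5% coverage, in line with the published figures.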

Directive distribution

The aggregate of every directive written to telemetry across MCP tools + observability hooks — broader than the gate's five directive dispositions (pass · damp · rewrite · regenerate · reject) because non-gate tools also write entries.

auto_attest: 2942 · pass: 318 · n/a: 103 · regenerate: 87 · require_tool: 15 · checkpoint: 8 · rewrite: 8 · reject: 5 · hedge: 1 · ok: 1

auto_attest — observability hooks (session start, user prompt log, commit attestation, …). Recorded but excluded from coverage math by design.

checkpoint — emitted by phionyx_checkpoint (lightweight context-note tool, not a gate).

n/a — entries without an explicit directive field (e.g. session start, prompt log, attestation writes).

solid / weak / incomplete — emitted by phionyx_causal_trace (chain-quality verdicts, distinct from the gate's five directive dispositions).

ok — emitted by the trust-boundary MCP server on successful third-party-tool attestation writes.
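The distribution can be tallied with a short sketch. The counts are copied from this snapshot. Grouping regenerate, rewrite, and reject as the "acting" directives is an assumption based on how the gate is described, and it happens to match the Gate refusals figure of 100. The names `DIRECTIVES`, `ACTING`, and `tally` are illustrative.

```python
DIRECTIVES = {
    "auto_attest": 2942, "pass": 318, "n/a": 103, "regenerate": 87,
    "require_tool": 15, "checkpoint": 8, "rewrite": 8, "reject": 5,
    "hedge": 1, "ok": 1,
}

# Assumed grouping: directives where the gate changed or blocked the work.
ACTING = ("regenerate", "rewrite", "reject")


def tally(directives: dict) -> dict:
    """Summarize the directive counts written to telemetry."""
    total = sum(directives.values())
    acted = sum(directives.get(name, 0) for name in ACTING)
    # auto_attest entries come from observability hooks, not from tools.
    non_hook = total - directives.get("auto_attest", 0)
    return {"total": total, "acted": acted, "non_hook": non_hook}


print(tally(DIRECTIVES))
```

With these counts the tally is 3488 entries in total, 546 once the auto_attest hook entries are set aside, and 100 where the gate acted.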

Reproduce

python3 case-studies/agentic-development-2026-05/scripts/runtime_evidence_self_audit.py --days 30

The same audit script lives at case-studies/.../scripts/runtime_evidence_self_audit.py. Generated 2026-06-16 12:51 UTC · schema v0.7.2 · regenerated on maintainer re-run (snapshot date above).