Phionyx for Agentic Development

Signed evidence for AI coding agents.

Name: phionyx-core
Author: Phionyx

AI coding agents can edit files, run tools, write commits, and describe what they did. Phionyx treats those descriptions as claims, not proof. LLM output is a noisy measurement, not a final answer.

This is the self-governance pillar: an AI assistant’s own self-claims (“I fixed it”, “I tested it”), its tool calls, and its trace events are bound into verifiable runtime-evidence chains. The agent can propose. The runtime must preserve what actually happened.

Phionyx does not make agents trustworthy by asking them to behave. It makes their governance path inspectable, signed, scoped, and replayable.

Runnable demo

See the governance run — under attack

An AI coding agent can read any file, write to any path, and execute commands. When something goes wrong, what is your evidence of what the agent was actually allowed to do?

This runnable demo puts the agent-governance control plane through the same adversarial scenario suite under three postures — ungoverned, governed, and governed + sandboxed. You see exactly what the fail-closed gates hold, what the capability sandbox blocks, and where denylist-based control still has a documented boundary. It is cooperative-grade governance with a capability boundary, not containment. The limits are shown, not hidden.

Run the demo ↗

AI output is not authority. The runtime must preserve what actually happened.

The problem

A coding agent says it fixed a bug.

The implementation did not change.

The test was quietly edited or disabled.

Without runtime evidence, the reviewer is left reading the agent’s story. Phionyx changes the review question from “do I trust the agent?” to “can I inspect the path?”

One install

pip install phionyx-pipeline-mcp phionyx-mcp-server phionyx-eval-inspect

Add the two MCP servers to your host’s .mcp.json. Claude Code is the primary tested host. Other MCP-capable clients (Cursor, Zed, VS Code, JetBrains) share the same protocol but are not actively tested by us — community PRs welcome.
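
As a rough sketch of that step, the snippet below merges two server entries into a project-level .mcp.json using the `mcpServers` layout that Claude Code reads. Two things here are assumptions rather than facts from this page: that "the two MCP servers" are phionyx-pipeline-mcp and phionyx-mcp-server, and that each package installs a console script named after itself. The helper `add_servers` is hypothetical. Check each package's README for the real launch command before relying on it.

```python
import json
from pathlib import Path

# ASSUMPTION: each package exposes a console script with the same name as the
# package. The real command and arguments may differ; see the package READMEs.
PHIONYX_SERVERS = {
    "phionyx-pipeline-mcp": {"command": "phionyx-pipeline-mcp", "args": []},
    "phionyx-mcp-server": {"command": "phionyx-mcp-server", "args": []},
}


def add_servers(config_path: Path, servers: dict) -> dict:
    """Merge server entries into .mcp.json, keeping entries already present."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    config.setdefault("mcpServers", {}).update(servers)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `add_servers(Path(".mcp.json"), PHIONYX_SERVERS)` from the project root would write the file in place and leave any other configured servers untouched.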

A claim becomes evidence

Steps 1–6 are the gate pipeline: an agent’s claim is checked against the real diff, turned into a signed verdict, and written to a tamper-evident chain a reviewer can replay. Steps 7–8 are Claude Code harness tooling that closes the loop inside the same turn.
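
The first half of that pipeline, checking a "fixed" claim against what the diff actually touched, can be illustrated with a small deterministic function. This is a minimal sketch and not the phionyx-pipeline-mcp implementation: `is_test_file`, `Verdict`, and `check_fix_claim` are hypothetical names, the test-file heuristic is an assumption, and mapping an unsupported claim to the `reject` directive is an illustrative choice. The input is assumed to be the list of paths printed by `git diff --name-only`.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


def is_test_file(path: str) -> bool:
    """Hypothetical heuristic for what counts as a test file."""
    p = PurePosixPath(path)
    return (
        "tests" in p.parts
        or p.name.startswith("test_")
        or p.name.endswith("_test.py")
    )


@dataclass(frozen=True)
class Verdict:
    directive: str  # uses the directive names listed on this page
    reason: str


def check_fix_claim(changed_files: list[str]) -> Verdict:
    """Compare a 'fixed' claim with the paths a diff actually touched."""
    tests = sorted(f for f in changed_files if is_test_file(f))
    impl = sorted(f for f in changed_files if not is_test_file(f))
    if not changed_files:
        return Verdict("reject", "claim_fixed, but the diff is empty")
    if tests and not impl:
        return Verdict("reject", "only test files changed: " + ", ".join(tests))
    return Verdict("pass", "implementation changed: " + ", ".join(impl))


# The scenario from "The problem": the test was edited, the implementation was not.
print(check_fix_claim(["tests/test_parser.py"]).directive)                   # reject
print(check_fix_claim(["src/parser.py", "tests/test_parser.py"]).directive)  # pass
```

Because the function depends only on its input and sorts paths before building the reason string, the same list of changed files always yields the same verdict.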

01. The agent says "fixed."
A coding agent (Claude Code is the primary tested host; any MCP-capable host can be wired the same way) edits files, runs tests, and declares the bug fixed.

02. phionyx-pipeline-mcp checks the diff.
The MCP tool reads the actual git diff and compares it against the agent's claim. If the test was edited and the implementation file was untouched, the claim is unsupported.

03. A deterministic gate produces a verdict.
phionyx_response_gate(action_type="claim_fixed", ...) returns one of five core directives: pass | damp | rewrite | regenerate | reject. The MCP self-claim layer can surface a sixth, require_tool, when require-tool enforcement is enabled. Same inputs, same verdict, every time.

04. phionyx-mcp-server writes a signed envelope.
The decision, the inputs, and the timing all go into a hash-chained, Ed25519-signed envelope. The record is tamper-evident from this point on.

05. phionyx-eval-inspect exports the chain.
An external reviewer with Inspect AI opens the .eval log; the agent's turn can be reconstructed from the persisted chain alone, without operator-side infrastructure.

06. Tamper a byte → verification fails.
Modify any past envelope and verify_chain_integrity reports the break and the reason. The audit chain is honest about its own integrity.

07. Per-edit language tools feed back into the same turn.
After every Edit/Write, a harness hook runs the cheapest appropriate check (py_compile + ruff for Python, scoped tsc --noEmit for TypeScript, json/yaml schema parse, memory frontmatter validation). Findings go to stderr; the next assistant turn sees them and corrects in place. Bounded: ≤ 8 s per file.

08. Fresh-context subagent reviews the semantic diff.
For commits touching schema, exit codes, version strings, or public copy, the harness invokes a diff-reviewer subagent. It runs in a fresh context window, with no bias from the implementing session's reasoning, using Read-only tools and seven correctness-finding categories. It catches the semantic bugs the syntactic hooks can't.
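
The per-edit check in step 07 can be pictured as a dispatcher keyed on file extension. The sketch below covers only the two checks available in the Python standard library (py_compile and a JSON parse); ruff, tsc, YAML, and frontmatter validation are left out, and the 8-second bound is not enforced. All function names are hypothetical, and this is not the harness hook itself.

```python
import json
import py_compile
import sys
from pathlib import Path


def check_python(path: Path) -> list[str]:
    """Byte-compile the file; a syntax error becomes one finding."""
    try:
        # Note: on success this also writes a .pyc under __pycache__.
        py_compile.compile(str(path), doraise=True)
    except py_compile.PyCompileError as exc:
        return [f"{path}: {exc}"]
    return []


def check_json(path: Path) -> list[str]:
    """Parse the file as JSON; a decode error becomes one finding."""
    try:
        json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path}:{exc.lineno}: {exc.msg}"]
    return []


# Extension -> cheapest check for that file type (illustrative subset).
CHECKS = {".py": check_python, ".json": check_json}


def check_edited_file(path: Path) -> list[str]:
    """Run the check registered for this extension, if any."""
    check = CHECKS.get(path.suffix)
    return check(path) if check else []


def report(path: Path) -> int:
    """Print findings to stderr and return how many there were."""
    findings = check_edited_file(path)
    for line in findings:
        print(line, file=sys.stderr)
    return len(findings)
```

Writing findings to stderr mirrors the description above: the check does not block the edit, it only surfaces problems where the next turn can see them.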

Three repos, one track

These three repos are the current reference path through the broader Phionyx ecosystem. Other adapters (LangChain, OpenAI Agents) sit alongside them; this page focuses on the agent-pair-programming wedge — binding an AI assistant’s own self-claims into verifiable runtime evidence.

phionyx-pipeline-mcp (current reference path)

Self-governance gate: verifies agent self-claims against git-diff truth.

PyPI ↗ | GitHub ↗

phionyx-mcp-server (current reference path)

Trust boundary: signed envelope chain over third-party MCP tool calls.

PyPI ↗ | GitHub ↗

phionyx-eval-inspect (current reference path)

Evidence bridge: Phionyx envelope chain → Inspect AI .eval log format.

PyPI ↗ | GitHub ↗

Observability records traces. Phionyx makes governance evidence verifiable.

Phionyx is not an observability platform. It works above the trace. The differences that matter for a third-party reviewer:

Capability                         | Observability platform | Phionyx
Records the trace                  | ✓ yes                  | no (works above the trace)
Signs each turn                    | no                     | ✓ Ed25519 per envelope
Hash-chains across turns           | no                     | ✓ SHA-256 chain
Verifiable without operator access | no (operator-bound)    | ✓ public key + chain alone
Tamper-evident                     | no (editable storage)  | ✓ chain break = verification fail
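
To make the hash-chain and tamper-evidence rows concrete, here is a minimal sketch of a SHA-256 chain in which every envelope commits to the hash of the one before it. The envelope layout and the names `append_envelope` and `verify_chain` are hypothetical; this is not the Phionyx envelope format and not its verify_chain_integrity function. The Ed25519 signature is left out because the Python standard library has no Ed25519 implementation, so the sketch shows chaining only.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first envelope


def _digest(prev_hash: str, payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the link and the payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_envelope(chain: list[dict], payload: dict) -> None:
    """Append an envelope that commits to the previous envelope's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"prev": prev_hash, "payload": payload, "hash": _digest(prev_hash, payload)}
    )


def verify_chain(chain: list[dict]) -> tuple[bool, str]:
    """Recompute every hash and link; report the first break and why."""
    prev_hash = GENESIS
    for i, env in enumerate(chain):
        if env["prev"] != prev_hash:
            return False, f"envelope {i}: prev-hash link broken"
        if env["hash"] != _digest(env["prev"], env["payload"]):
            return False, f"envelope {i}: content does not match hash"
        prev_hash = env["hash"]
    return True, "ok"


chain: list[dict] = []
append_envelope(chain, {"turn": 1, "directive": "reject"})
append_envelope(chain, {"turn": 2, "directive": "pass"})
print(verify_chain(chain))  # (True, 'ok')

chain[0]["payload"]["directive"] = "pass"  # rewrite history after the fact
print(verify_chain(chain))  # (False, 'envelope 0: content does not match hash')
```

A hash chain alone only shows that the record is internally consistent; anyone who can rewrite the whole chain can recompute every hash. That gap is what a per-envelope signature closes, since a reviewer holding only the public key can then reject a chain that was rebuilt without the signing key.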

What this page is, and is not

Is: a technical research note on the agent-pair-programming strand of the Phionyx programme.
Is not: a product sales page, a certification claim, or a finished case-study publication.
Is: Phionyx’s answer to “how do I review an AI coding agent’s work without trusting its own story?”
Is not: a replacement for legal review, security review, or human oversight.
