Phionyx for Agentic Development

Signed evidence for AI coding agents.

AI coding agents can edit files, run tools, write commits, and describe what they did. Phionyx treats those descriptions as claims, not proof. LLM output is a noisy measurement, not a final answer.

This is the self-governance pillar: an AI assistant’s own self-claims (“I fixed it”, “I tested it”), its tool calls, and its trace events are bound into verifiable runtime-evidence chains. The agent can propose. The runtime must preserve what actually happened.

Phionyx does not make agents trustworthy by asking them to behave. It makes their governance path inspectable, signed, scoped, and replayable.

The problem

A coding agent says it fixed a bug.

The implementation did not change.

The test was quietly edited or disabled.

Without runtime evidence, the reviewer is left reading the agent’s story. Phionyx changes the review question from “do I trust the agent?” to “can I inspect the path?”

One install

pip install phionyx-pipeline-mcp phionyx-mcp-server phionyx-eval-inspect

Add the two MCP servers to your host’s .mcp.json. Claude Code is the primary tested host. Other MCP-capable clients (Cursor, Zed, VS Code, JetBrains) share the same protocol but are not actively tested by us; community PRs are welcome.
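As a rough illustration of the .mcp.json step, the sketch below generates a config with one entry per server. The "mcpServers" layout is the one Claude Code uses for project-scoped servers. The launch commands are assumptions: this page does not state how either server is started, so the sketch simply guesses that each package installs a console script named after itself. Check each package's README for the real command and arguments.

```python
import json

# ASSUMPTION: each package exposes a console script with the same name as the
# package. These commands are hypothetical placeholders, not documented values.
servers = {
    "phionyx-pipeline-mcp": {"command": "phionyx-pipeline-mcp", "args": []},
    "phionyx-mcp-server": {"command": "phionyx-mcp-server", "args": []},
}

# Claude Code reads project-scoped MCP servers from the "mcpServers" key.
config = {"mcpServers": servers}

# Render the JSON text that would go into .mcp.json at the project root.
mcp_json = json.dumps(config, indent=2)
print(mcp_json)
```

Generating the file from a script is optional; the same JSON can be written by hand.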

A claim becomes evidence

Steps 1–6 are the gate pipeline: an agent’s claim is checked against the real diff, turned into a signed verdict, and written to a tamper-evident chain a reviewer can replay. Steps 7–8 are Claude Code harness tooling that closes the loop inside the same turn.

  1. The agent says "fixed."

    A coding agent edits files, runs tests, and declares the bug fixed. Claude Code is the primary tested host; any MCP-capable host can be wired the same way.

  2. phionyx-pipeline-mcp checks the diff.

    The MCP tool reads the actual git diff and compares it against the agent's claim. If the test was edited and the implementation file was untouched, the claim is unsupported.

  3. A deterministic gate produces a verdict.

    phionyx_response_gate(action_type="claim_fixed", ...) returns one of five directives: pass | damp | rewrite | regenerate | reject. Same inputs, same verdict — every time.

  4. phionyx-mcp-server writes a signed envelope.

    The decision, the inputs, the timing — everything goes into a hash-chained, Ed25519-signed envelope. Tamper-evident from this point on.

  5. phionyx-eval-inspect exports the chain.

    An external reviewer with Inspect AI opens the .eval log; the agent's turn can be reconstructed from the persisted chain alone, without operator-side infrastructure.

  6. Tamper with a byte → verification fails.

    Modify any past envelope; verify_chain_integrity reports the break and the reason. The audit chain is honest about its own integrity.

  7. Per-edit language tools feed back into the same turn.

    After every Edit/Write, a harness hook runs the cheapest appropriate check (py_compile + ruff for Python, scoped tsc --noEmit for TypeScript, json/yaml schema parse, memory frontmatter validation). Findings go to stderr; the next assistant turn sees them and corrects in place. Bounded: ≤ 8 s per file.

  8. Fresh-context subagent reviews the semantic diff.

    For commits touching schema, exit codes, version strings, or public copy, the harness invokes a diff-reviewer subagent. It runs in a fresh context window — no bias from the implementing session's reasoning — with Read-only tools and seven correctness-finding categories. Catches the semantic bugs the syntactic hooks can't.
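Steps 2 and 3 can be pictured with a small toy gate. This is a sketch under stated assumptions, not the Phionyx rule set: the function gate_claim_fixed, the Verdict type, and the is_test_file heuristic are hypothetical names invented for illustration. It takes the list of changed paths (what `git diff --name-only` would print) and models only two of the five directives, pass and reject; damp, rewrite, and regenerate are left out because this page does not define when they apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    directive: str  # "pass" or "reject" in this toy; the real gate has five
    reason: str


def is_test_file(path: str) -> bool:
    """Hypothetical heuristic: anything under tests/ or named test_*."""
    parts = path.split("/")
    return "tests" in parts[:-1] or parts[-1].startswith("test_")


def gate_claim_fixed(changed_files: list[str]) -> Verdict:
    """Toy check of a 'fixed' claim against the changed paths in a diff.

    Pure function of its input: no clock, no randomness, no network, so the
    same list of paths always yields the same verdict.
    """
    if not changed_files:
        return Verdict("reject", "claimed a fix but the diff is empty")
    impl = [f for f in changed_files if not is_test_file(f)]
    if not impl:
        return Verdict("reject", "only test files changed; implementation untouched")
    return Verdict("pass", "implementation files changed")


# The scenario from the problem statement: the test was edited, the code was not.
unsupported = gate_claim_fixed(["tests/test_parser.py"])
supported = gate_claim_fixed(["src/parser.py", "tests/test_parser.py"])
print(unsupported.directive, "-", unsupported.reason)
print(supported.directive, "-", supported.reason)
```

Keeping the gate a pure function of the diff is what makes the "same inputs, same verdict" property easy to check: a reviewer can rerun it on the recorded inputs and compare.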

Three repos, one track

These three repos are the current reference path through the broader Phionyx ecosystem. Other adapters (LangChain, OpenAI Agents) sit alongside them; this page focuses on the agent-pair-programming wedge — binding an AI assistant’s own self-claims into verifiable runtime evidence.

phionyx-pipeline-mcp (current reference path)

Self-governance gate — verifies agent self-claims against git-diff truth.

phionyx-mcp-server (current reference path)

Trust boundary — signed envelope chain over third-party MCP tool calls.

phionyx-eval-inspect (current reference path)

Evidence bridge — Phionyx envelope chain → Inspect AI .eval log format.

Observability records traces. Phionyx makes governance evidence verifiable.

Phionyx is not an observability platform. It works above the trace. The differences that matter for a third-party reviewer:

What                                 Observability platform    Phionyx
Records the trace                    ✓ yes                     ✗ (works above the trace)
Signs each turn                      ✗ no                      ✓ Ed25519 per envelope
Hash-chains across turns             ✗ no                      ✓ SHA-256 chain
Verifiable without operator access   ✗ operator-bound          ✓ public key + chain alone
Tamper-evident                       ✗ editable storage        ✓ chain break = verification fail
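The hash-chain and tamper-evidence rows can be illustrated with a minimal SHA-256 chain built from the standard library. This is a sketch, not the Phionyx envelope format: the field names (body, prev, hash), the all-zero genesis value, and the helper names are assumptions, and verify_chain is a stand-in for verify_chain_integrity rather than its real signature. The Ed25519 signature is omitted because it needs a third-party library. Note that a hash chain alone only detects edits that leave the stored hashes in place; the per-envelope signature is what stops someone from rewriting the chain and recomputing every hash.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first envelope


def envelope_hash(body: dict, prev_hash: str) -> str:
    """SHA-256 over a canonical JSON encoding of the body and the previous hash."""
    payload = json.dumps({"body": body, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_envelope(chain: list, body: dict) -> None:
    """Link a new envelope to the hash of the one before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"body": body, "prev": prev, "hash": envelope_hash(body, prev)})


def verify_chain(chain: list) -> tuple[bool, str]:
    """Walk the chain; report the first broken link or mismatched hash."""
    prev = GENESIS
    for i, env in enumerate(chain):
        if env["prev"] != prev:
            return False, f"envelope {i}: prev-hash link broken"
        if env["hash"] != envelope_hash(env["body"], env["prev"]):
            return False, f"envelope {i}: content does not match hash"
        prev = env["hash"]
    return True, "ok"


chain: list = []
append_envelope(chain, {"turn": 1, "directive": "reject"})
append_envelope(chain, {"turn": 2, "directive": "pass"})
before = verify_chain(chain)

# Tamper with a past envelope: flip the recorded verdict of turn 1.
chain[0]["body"]["directive"] = "pass"
after = verify_chain(chain)

print(before)
print(after)
```

Verification here needs only the chain itself, which mirrors the "verifiable without operator access" row; with signatures added, the reviewer would also need the public key.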

What this page is, and is not

  • Is: a technical case landing page for the agent-pair-programming wedge of the Phionyx programme.
  • Is not: a product sales page, a certification claim, or a finished case-study publication.
  • Is: Phionyx’s answer to “how do I review an AI coding agent’s work without trusting its own story?”
  • Is not: a replacement for legal review, security review, or human oversight.

The signed, hash-chained record this gate writes for each decision is Phionyx’s governed-response envelope. It maps onto AIREP, the vendor- and model-independent AI Runtime Evidence Protocol, for which Phionyx is the reference implementation.

All three packages are AGPL-3.0. Numeric claims on this page are tied to pinned commits, the self-audit script, and the canonical evidence table in each package’s source repository.