
Five-layer governance stack

Name: phionyx-core
Author: Phionyx

The architecture behind Phionyx-for-Agentic-Development is a five-layer governance stack. A memory / continuity layer and a project constitution sit at the base, deterministic gate invocations (exposed as MCP tools) and lifecycle hooks sit in the middle, and adversarial subagent review sits on top; every layer is organised around one deterministic gate. The three-layer claim verification (agent self-claim vs repo truth vs deterministic gate) is the decision logic inside the gate layer, not the whole architecture. Phionyx's own development loop runs on this same architecture, and its measurements are published under the Coverage snapshot tab.

Plain-language key

MCP (Model Context Protocol): the open standard that lets an AI coding tool call external tools. Phionyx ships its gate as MCP tools, so any MCP-capable editor can use it.
Gate: a deterministic check the agent must call before it claims, commits, or deploys. Same inputs produce the same verdict, every time.
Block: one ordered step in the pipeline. The canonical pipeline is a fixed sequence of 46 blocks; each does one job.
Signed, hash-chained record: every gate decision is written to a tamper-evident log in which each entry links to the previous one by hash, so altering any entry breaks verification at exactly that link.
Abstention: when the system cannot back an answer with evidence, it refuses or defers instead of guessing.

1. Five-layer governance stack

The stack is listed top to bottom. The top layer (Layer 5, adversarial subagent review) is the last line of defence and is invoked on demand; every layer below it is always on, down to the foundational memory / continuity layer (Layer 1). Each layer narrows what the layers beneath it can wrongly let through.

Layer 5: Subagent review

Adversarial diff review (manual)

A fresh-context, read-only reviewer (the diff-reviewer subagent) re-reads an uncommitted or just-committed diff with no memory of the implementing pass’s reasoning.

Catches semantic drift, contract / interface mismatch, and version-string drift that the implementing pass is biased to miss.

Layer 4: Lifecycle hooks

15 deterministic hooks (5 blocking + 10 observability)

Hooks fire on the host’s lifecycle events — session start, prompt submit, pre/post tool-use, stop, subagent stop, pre-compact. Blocking hooks gate commit / push, external-effect commands, large edits, subagent spawns, and answer finalization.

Observability hooks never block; they make every action visible without inflating the gate-coverage metric.

Layer 3: MCP self-governance

Deterministic gate invocations

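The MCP tools described in §3 (phionyx_verify_claim, phionyx_response_gate, phionyx_causal_trace, phionyx_verify_paths, and phionyx_session_report) are the gate the agent must call before it claims, commits, or deploys; phionyx_checkpoint sits alongside them as an observability-only tool. This layer is where the three-layer verification core lives.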

Extended by binding checks: claim-grounding, continuity-binding, and detector calibration.

Layer 2: Constitution + skills

Advisory rules, on-demand skills

A project constitution plus on-demand skills and layer rules the assistant must follow — import boundaries, governance / safety, testing-evidence, and security.

Advisory, not blocking: it shapes behaviour; the gate and hooks enforce it.

Layer 1: Memory / continuity

Persistent, schema-validated state

A persistent, schema-validated memory layer (semantic notes, prior corrections, verified external state) that carries goals and constraints across session boundaries.

Validated on every session boot against a fixed frontmatter schema.

Layer 4 in detail — the 15 lifecycle hooks

Five hooks block; ten only observe. Blocking hooks fail closed when a recent gate call is missing; observability hooks always pass and are excluded from the gate-coverage metric so visibility never inflates the score.

Blocking (5)

Commit / push gate — A git commit or push requires a recent gate call first.
External-effect gate — gh PR / release / issue mutations, package publishes (npm / pip / twine), destructive git, and deploy commands require a recent gate call.
Large-edit gate — Edits above a line threshold require a recent gate call; small edits are exempt.
Subagent-spawn gate — Spawning a subagent requires a prior gate call — the subagent’s claims propagate upward.
Answer-finalization gate — Blocks finalizing a response that asks about or claims something tied to an artifact not opened this turn.
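Each blocking hook comes down to one rule: allow the action only if a gate call landed recently, and fail closed otherwise. The sketch below assumes a 300-second recency window and a 50-line large-edit threshold. Both numbers are hypothetical, as are the names GateLog, commit_push_hook and large_edit_hook, because the real values and hook interfaces are not published.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass, field

# Hypothetical values; the real window and line threshold are not published.
GATE_WINDOW_SECONDS = 300.0
LARGE_EDIT_LINES = 50


@dataclass
class GateLog:
    """Hypothetical record of when gate tools were called this session."""
    calls: list[float] = field(default_factory=list)

    def record(self, now: float) -> None:
        self.calls.append(now)

    def has_recent_call(self, now: float) -> bool:
        return any(0.0 <= now - t <= GATE_WINDOW_SECONDS for t in self.calls)


def commit_push_hook(command: str, log: GateLog, now: float) -> bool:
    """Return True to allow the command, False to block it (fail closed)."""
    argv = shlex.split(command)
    if argv[:2] not in (["git", "commit"], ["git", "push"]):
        return True  # not a commit or push: this hook does not apply
    return log.has_recent_call(now)


def large_edit_hook(changed_lines: int, log: GateLog, now: float) -> bool:
    """Small edits are exempt; large ones need a recent gate call."""
    if changed_lines <= LARGE_EDIT_LINES:
        return True
    return log.has_recent_call(now)
```

The caller passes the current time in explicitly. That keeps each hook decision deterministic and replayable, in line with the same-inputs, same-verdict rule.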

Observability (10)

Session start — Resets or continues the session trace id.
Memory schema check — Validates every memory file’s frontmatter on boot.
User-prompt log — Records a hash + length of each prompt, not the raw text.
Pre-compact checkpoint — Snapshots gate state before context compression.
Subagent-stop attestation — Records a completed subagent.
Third-party MCP-call log — Records every non-Phionyx MCP tool call.
Commit attestation — Records the SHA on a successful commit.
External-ingress log — Records a web fetch / search URL or query.
Post-edit language check — Runs the right compiler / linter after each edit.
Targeted tests — Runs the affected test directory when a turn ends.
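The user-prompt log shows how an observability hook can record activity without keeping any content. This sketch assumes SHA-256 for the hash, a character count for the length and a plain list as the log sink. The function name user_prompt_hook is hypothetical.

```python
import hashlib
import json


def user_prompt_hook(prompt: str, sink: list) -> bool:
    """Observability hook: log a hash and length of the prompt, never its text."""
    entry = {
        "sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "length": len(prompt),  # character count (assumption; bytes would also work)
    }
    sink.append(json.dumps(entry, sort_keys=True))
    return True  # observability hooks never block
```

Identical prompts produce identical hashes. Repeats therefore stay detectable in the log, and the raw text is never stored.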

Layers 1–2 in detail — constitution, skills, rules & memory

The base of the stack is advisory and always loaded. The constitution sets identity, forbidden actions, escalation triggers, and completion criteria; layer rules and on-demand skills specialise it; the memory layer carries the result across sessions.

Layer rules

Core boundary — The engine stays framework-agnostic — no delivery framework may leak into it.
Governance / safety — Gates are never deleted, only policy-bypassed with an audit trail.
Testing-evidence — A claim of “done” needs test output, not assertion.
Security — No secrets in source; external input is validated.

On-demand skills

Self-governance discipline — When to call which gate tool before claiming, committing, or deploying.
State-grounding — Verify an external-state fact (version, deploy, liveness) before asserting it.
Binding-enforcement — The specification for the hook layer itself.

Memory (Layer 1) is a schema-validated continuity layer — semantic notes, prior corrections, and verified external state — checked against a fixed frontmatter schema on every session boot.
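The fixed frontmatter schema itself is not published. The sketch below makes three assumptions:

- the required fields are name, type and description (hypothetical);
- the type vocabulary mirrors the three kinds of memory named above;
- frontmatter is simple key: value lines between --- delimiters, not full YAML.

```python
from __future__ import annotations

# Hypothetical schema: the real required fields are not published.
REQUIRED_FIELDS = {"name", "type", "description"}
ALLOWED_TYPES = {"semantic_note", "correction", "external_state"}


def parse_frontmatter(text: str) -> dict[str, str] | None:
    """Parse key: value lines between a leading and a closing '---' line."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if not sep:
            return None  # not a key: value line
        fields[key.strip()] = value.strip()
    return None  # closing delimiter never found


def validate_memory_file(text: str) -> list[str]:
    """Return schema errors; an empty list means the file passes."""
    fields = parse_frontmatter(text)
    if fields is None:
        return ["missing or malformed frontmatter"]
    errors = [f"missing field: {name}"
              for name in sorted(REQUIRED_FIELDS - set(fields))]
    if "type" in fields and fields["type"] not in ALLOWED_TYPES:
        errors.append(f"unknown type: {fields['type']}")
    return errors
```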

2. The verification core (three layers)

This section is the decision logic inside Layer 3 (the MCP self-governance gate) of the stack above. Each of its three verification layers has a different source and a different trust level. Only verification layer 3, the deterministic gate decision, produces the verdict; the other two exist to keep the gate honest.

Verification layer 1: LLM Declaration

Source: Claude (stochastic)

The agent says what it changed, what it tested, and what evidence type it used.

Trust: low. The agent can underreport to make a gate pass.

Verification layer 2: Repo Truth

Source: git diff (deterministic)

git diff --name-only + git diff -U0 extract the files and functions that actually changed.

Trust: high. Tool-derived, so the agent cannot game it.
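One way to get from git diff -U0 to changed functions is to read the hunk headers. After the second @@, a hunk header can carry the enclosing-context line that git's funcname heuristic found. That heuristic is configurable per language through diff drivers in .gitattributes, and it only approximates "the changed function". The parser below sketches the idea; it is not Phionyx's extractor. Since git diff --name-only already lists one path per line, the sketch only adds the per-file headings.

```python
from __future__ import annotations

import re
import subprocess

# Hunk header: "@@ -old[,n] +new[,m] @@ <optional section heading>"
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@ ?(.*)$")


def git_diff_u0(repo: str = ".") -> str:
    """Zero-context diff of the working tree against the index."""
    return subprocess.run(
        ["git", "-C", repo, "diff", "-U0"],
        capture_output=True, text=True, check=True,
    ).stdout


def functions_by_file(diff_text: str) -> dict[str, set[str]]:
    """Map each changed file to the hunk section headings git reported."""
    result: dict[str, set[str]] = {}
    current: str | None = None
    old_path = ""
    in_header = False  # "---"/"+++" lines are file headers only here
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            in_header = True
        elif in_header and line.startswith("--- "):
            old_path = line[4:]
        elif in_header and line.startswith("+++ "):
            new_path = line[4:]
            # Deleted files show "+++ /dev/null"; fall back to the old path.
            path = old_path if new_path == "/dev/null" else new_path
            current = path[2:] if path.startswith(("a/", "b/")) else path
            result.setdefault(current, set())
            in_header = False
        elif current is not None and (m := HUNK_RE.match(line)):
            heading = m.group(1).strip()
            if heading:
                result[current].add(heading)
    return result
```

Inside a repository, call functions_by_file(git_diff_u0()). The in_header flag matters because a removed line such as "-- comment" appears in the diff as "--- comment", which must not be mistaken for a file header.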

Verification layer 3: Gate Decision

Source: Pipeline blocks (deterministic)

Physics state + revision thresholds produce a directive. Same inputs → same output, every time.

Trust: deterministic gate logic; verdicts are replayable from the signed record.

3. MCP tool surface (6 tools)

The pipeline MCP server (phionyx-pipeline-mcp) exposes these six tools to any MCP-capable host (Claude Code, Continue, Cursor, Aider). Each tool that runs pipeline logic references a specific subset of the canonical 46-block pipeline; phionyx_verify_paths and phionyx_checkpoint list no pipeline blocks.

phionyx_verify_claim (trigger: before claim_fixed / claim_done)

Nine-block claim verification — must be called before saying 'fixed' or 'done'.

Args: claim, evidence, evidence_type, code_paths_tested, code_paths_affected

Pipeline blocks referenced: #3, #15, #16, #23, #37, #38, #39, #41, #44
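In MCP, a host invokes a tool by sending a JSON-RPC 2.0 tools/call request whose params hold the tool name and an arguments object. The sketch below builds such a request for phionyx_verify_claim. The argument types are assumptions: strings for claim and evidence, lists of path strings for the two code-path arguments. The claim text is a made-up example. In practice the host, not the agent, constructs this message.

```python
import json

# Labels from the evidence taxonomy (§6).
EVIDENCE_TYPES = {
    "browser_test", "manual_repro", "integration_test", "endpoint_test",
    "log_inspection", "unit_test", "code_review", "none",
}


def verify_claim_request(request_id: int, claim: str, evidence: str,
                         evidence_type: str, code_paths_tested: list,
                         code_paths_affected: list) -> dict:
    """Build a JSON-RPC 2.0 tools/call message for phionyx_verify_claim."""
    if evidence_type not in EVIDENCE_TYPES:
        raise ValueError(f"unknown evidence_type: {evidence_type}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "phionyx_verify_claim",
            "arguments": {
                "claim": claim,
                "evidence": evidence,
                "evidence_type": evidence_type,
                "code_paths_tested": code_paths_tested,
                "code_paths_affected": code_paths_affected,
            },
        },
    }


request = verify_claim_request(
    1, "Fixed null check in parser", "pytest tests/parser passed",
    "unit_test", ["src/parser.py"], ["src/parser.py"],
)
print(json.dumps(request, indent=2))
```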

phionyx_verify_paths (trigger: to audit the agent's own declarations)

Path declarations vs git diff — cross-checks the agent's stated affected/tested paths against the repo truth.

Args: claimed_affected, claimed_tested
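The cross-check comes down to set differences between what the agent declared and what git diff reports. The documented tool takes only claimed_affected and claimed_tested. So that the sketch runs without a repository, it passes the diff paths in as a hypothetical third argument. The result keys are also hypothetical, because the tool's real response shape is not published.

```python
from __future__ import annotations

from collections.abc import Iterable


def verify_paths(claimed_affected: Iterable[str], claimed_tested: Iterable[str],
                 diff_paths: Iterable[str]) -> dict[str, list[str]]:
    """Cross-check declared paths against the repo truth from git diff."""
    affected = set(claimed_affected)
    tested = set(claimed_tested)
    actual = set(diff_paths)  # e.g. the lines of git diff --name-only
    return {
        # Changed in the repo but never declared by the agent.
        "undeclared_changes": sorted(actual - affected),
        # Declared as affected but absent from the diff.
        "phantom_claims": sorted(affected - actual),
        # Actually changed but not declared as tested.
        "untested_changes": sorted(actual - tested),
    }
```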

phionyx_causal_trace (trigger: when debugging)

Causal-chain verification: input safety, confidence fusion, and audit.

Args: symptom, causal_chain

Pipeline blocks referenced: #3, #37, #38, #39, #44

phionyx_response_gate (trigger: before committing / deploying / responding / asking)

Action-type-specific revision gate built on seven blocks, plus Block 15 (knowledge_boundary_check), which is added for the ask_question short-circuit that enforces artifact-grounding.

Args: action_type, confidence, evidence_count, evidence_type, affects_user_facing, artifact_references (optional), artifact_paths_read (optional)

Pipeline blocks referenced: #16, #23, #37, #38, #39, #41, #44
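The ask_question short-circuit amounts to checking that every artifact a question refers to was actually opened this turn. The sketch computes only that unread set, and the function name unread_artifacts is hypothetical. How the gate turns a non-empty set into a directive is not published.

```python
from __future__ import annotations

import posixpath


def unread_artifacts(action_type: str,
                     artifact_references: list[str] | None = None,
                     artifact_paths_read: list[str] | None = None) -> list[str]:
    """Referenced artifacts that were not opened this turn; empty means grounded."""
    if action_type != "ask_question" or not artifact_references:
        return []  # this sketch applies the check to ask_question only
    # Normalise so "./src/a.py" and "src/a.py" compare equal.
    read = {posixpath.normpath(p) for p in (artifact_paths_read or [])}
    return [ref for ref in artifact_references
            if posixpath.normpath(ref) not in read]
```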

phionyx_session_report (trigger: on demand / before reporting session complete)

Session summary: w_final, trust, integrity, and the mcp_envelope_chain join field.

Pipeline blocks referenced: #23, #37, #38

phionyx_checkpoint (trigger: after subtask completion / context switch)

Lightweight checkpoint — context note + telemetry write. Observability only, not a gate.

Args: context

What the referenced block numbers mean

The tools above that run pipeline logic each reference a subset of the canonical 46-block pipeline (shown as #N). The table lists every block those tools touch and what it does; the mapping functions themselves are held back (see §7).

#	Block	Role
#3	input_safety_gate	Screens the incoming request before any reasoning runs.
#15	knowledge_boundary_check	Flags claims that reach beyond grounded / opened sources (the artifact-grounding short-circuit).
#16	trust_evaluation	Scores how far the current declaration can be trusted.
#23	behavioral_drift_detection	Tracks how often recent claims were rejected or contradicted.
#37	phi_computation	Computes how internally coherent the current state is (the integration signal, φ).
#38	entropy_computation	Computes the uncertainty / disorder signal of the state.
#39	confidence_fusion	Fuses the signals into a single confidence value.
#41	response_revision_gate	Turns the fused state into a directive (pass … reject) before the response is built.
#44	audit_layer	Writes the hash-chained, signed audit record of the decision.
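The hash-chained record written by block #44 can be sketched in a few lines. The document does not state its signature scheme, so this sketch uses HMAC-SHA256 with a shared key as a stand-in; a deployment might prefer an asymmetric scheme such as Ed25519. The entry layout and function names are hypothetical. first_broken_link returns the index of the first entry that fails verification, which is how altering one entry shows up at exactly that link.

```python
from __future__ import annotations

import hashlib
import hmac
import json

GENESIS = "0" * 64  # "previous hash" for the first entry


def _link_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same decision always hashes the same way.
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def _sign(key: bytes, digest: str) -> str:
    # Stand-in signature: HMAC-SHA256 over the entry hash.
    return hmac.new(key, digest.encode("utf-8"), hashlib.sha256).hexdigest()


def append(chain: list[dict], payload: dict, key: bytes) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    digest = _link_hash(prev, payload)
    chain.append({"prev": prev, "payload": payload,
                  "hash": digest, "sig": _sign(key, digest)})


def first_broken_link(chain: list[dict], key: bytes) -> int | None:
    """Return the index of the first entry that fails verification, else None."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        digest = _link_hash(prev, entry["payload"])
        if (entry["prev"] != prev or entry["hash"] != digest
                or not hmac.compare_digest(entry["sig"], _sign(key, digest))):
            return i
        prev = digest
    return None
```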

4. Directive hierarchy

Gate verdicts, listed in increasing severity. When multiple thresholds trip, the most severe applicable directive wins.

Severity 1: pass

Gate accepts the claim.

Severity 2: damp

Soft signal: confidence is reduced but the action is allowed.

Severity 3: rewrite

Output must be rewritten before submission.

Severity 4: regenerate

Output must be regenerated; current output rejected.

Severity 5: reject

Hard reject: action blocked.

Severity 6: require_tool

A factual / external claim with no same-turn evidence must bind evidence (run a tool) before it may proceed.
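The directives form a strict order, so resolving several tripped thresholds amounts to taking the maximum severity. A minimal sketch follows; Directive and resolve are illustrative names, not a published API.

```python
from enum import IntEnum


class Directive(IntEnum):
    """Gate verdicts in increasing severity, as listed above."""
    PASS = 1
    DAMP = 2
    REWRITE = 3
    REGENERATE = 4
    REJECT = 5
    REQUIRE_TOOL = 6


def resolve(tripped: list) -> Directive:
    """The most severe tripped directive wins; nothing tripped means pass."""
    return max(tripped, default=Directive.PASS)


print(resolve([Directive.DAMP, Directive.REGENERATE, Directive.REWRITE]).name)  # prints REGENERATE
```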

5. Threshold table

Action-type-specific thresholds. Thresholds tighten as the action's stakes rise: claim_fixed is stricter than the default, and deploy is the strictest.

Threshold	Default	claim_fixed	deploy	Directive
entropy_damp	0.70	0.60	0.55	`damp`
entropy_rewrite	0.85	0.75	0.70	`rewrite`
entropy_reject	0.95	0.90	0.85	`reject`
phi_min	0.05	0.08	0.10	`regenerate`
confidence_regenerate	0.35	0.40	0.45	`regenerate`
confidence_rewrite	0.50	0.55	0.60	`rewrite`
drift_rewrite	0.60	0.50	0.45	`rewrite`
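Expressing the table as code makes the tightening visible. The sketch rests on these assumptions:

- entropy and drift trip a directive at or above their threshold;
- phi and confidence trip a directive below theirs;
- an unknown action type falls back to the default column;
- the most severe tripped directive wins, as in §4.

The comparison operators and all names are assumptions. require_tool is left out because no threshold in the table drives it.

```python
from enum import IntEnum


class Directive(IntEnum):
    PASS = 1
    DAMP = 2
    REWRITE = 3
    REGENERATE = 4
    REJECT = 5


# Values copied from the threshold table.
THRESHOLDS = {
    "default": dict(entropy_damp=0.70, entropy_rewrite=0.85, entropy_reject=0.95,
                    phi_min=0.05, confidence_regenerate=0.35,
                    confidence_rewrite=0.50, drift_rewrite=0.60),
    "claim_fixed": dict(entropy_damp=0.60, entropy_rewrite=0.75, entropy_reject=0.90,
                        phi_min=0.08, confidence_regenerate=0.40,
                        confidence_rewrite=0.55, drift_rewrite=0.50),
    "deploy": dict(entropy_damp=0.55, entropy_rewrite=0.70, entropy_reject=0.85,
                   phi_min=0.10, confidence_regenerate=0.45,
                   confidence_rewrite=0.60, drift_rewrite=0.45),
}


def gate(action_type: str, entropy: float, phi: float,
         confidence: float, drift: float) -> Directive:
    t = THRESHOLDS.get(action_type, THRESHOLDS["default"])
    tripped = []
    if entropy >= t["entropy_damp"]:
        tripped.append(Directive.DAMP)
    if entropy >= t["entropy_rewrite"]:
        tripped.append(Directive.REWRITE)
    if entropy >= t["entropy_reject"]:
        tripped.append(Directive.REJECT)
    if phi < t["phi_min"]:
        tripped.append(Directive.REGENERATE)
    if confidence < t["confidence_regenerate"]:
        tripped.append(Directive.REGENERATE)
    if confidence < t["confidence_rewrite"]:
        tripped.append(Directive.REWRITE)
    if drift >= t["drift_rewrite"]:
        tripped.append(Directive.REWRITE)
    return max(tripped, default=Directive.PASS)


# The same state passes by default but is damped before a deploy.
state = dict(entropy=0.65, phi=0.20, confidence=0.80, drift=0.10)
print(gate("default", **state).name, gate("deploy", **state).name)  # prints PASS DAMP
```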

6. Evidence taxonomy

Evidence types and their weights. The agent passes one of these labels to phionyx_verify_claim and phionyx_response_gate — but git diff is what the gate trusts.

Type	Weight	Description
browser_test	0.9	User flow tested in the browser.
manual_repro	0.8	Reproduced manually and verified.
integration_test	0.7	End-to-end test across multiple components.
endpoint_test	0.6	API endpoint tested directly.
log_inspection	0.5	Logs / output inspected for expected behaviour.
unit_test	0.4	Isolated unit test of a single function.
code_review	0.3	Code read; not executed.
none	0.0	No evidence.

7. Claim → state (conceptual)

Four signals from the agent’s declaration map into three internal state variables that drive the gate decision:

Effective coverage (declared paths tested / declared paths affected, cross-checked against git diff) → drives the entropy variable.
Evidence quality (weight from the taxonomy in §6) → modulates the valence variable. No evidence pushes valence negative; high-weight evidence on high-coverage claims pushes it positive.
Session drift (running measure of how often recent claims were rejected or contradicted) → drives the arousal variable.
User-facing flag (whether the action affects an end user) → adds to arousal; higher stakes shrink the directive’s pass region.

The exact mapping functions are part of the underlying specification and are not published here. The mapping is deterministic — identical inputs produce identical state — and the threshold table in §5 is the public observable for how state crosses into the directive.
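The mapping into entropy, valence and arousal is withheld. The first two input signals in the list, however, can be sketched. The weights below are copied from §6, and the function names are hypothetical. Three further choices are assumptions: an unknown label counts as none, undeclared diff paths count as affected, and the result is 0.0 when nothing changed. Nothing here reproduces the unpublished mapping functions.

```python
from __future__ import annotations

from collections.abc import Iterable

# Weights from the evidence taxonomy (§6).
EVIDENCE_WEIGHTS = {
    "browser_test": 0.9, "manual_repro": 0.8, "integration_test": 0.7,
    "endpoint_test": 0.6, "log_inspection": 0.5, "unit_test": 0.4,
    "code_review": 0.3, "none": 0.0,
}


def evidence_weight(label: str) -> float:
    # Assumption: an unrecognised label earns no credit, like "none".
    return EVIDENCE_WEIGHTS.get(label, 0.0)


def effective_coverage(declared_tested: Iterable[str],
                       declared_affected: Iterable[str],
                       diff_paths: Iterable[str]) -> float:
    """Share of affected paths declared as tested, cross-checked against git diff."""
    # Assumption: paths the diff shows but the agent did not declare still count as affected.
    affected = set(declared_affected) | set(diff_paths)
    if not affected:
        return 0.0  # nothing to cover; conservative choice (assumption)
    tested = set(declared_tested) & affected
    return len(tested) / len(affected)
```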