Five-layer governance stack
The architecture behind Phionyx-for-Agentic-Development is a five-layer governance stack: a memory / continuity layer and a project constitution at the base, deterministic MCP gate invocations and lifecycle hooks in the middle, and adversarial subagent review on top — all wrapped around a deterministic gate. The three-layer claim verification (agent self-claim vs repo truth vs deterministic gate) is the decision logic inside the gate layer, not the whole architecture. Everything here is the same architecture used by Phionyx’s own development loop, measured publicly under the Live coverage tab.
1. Five-layer governance stack
The stack runs top to bottom: the top layer (Layer 5, adversarial subagent review) is the last line of defence and is invoked on demand; every layer below it is always on, down to the foundational memory / continuity layer (Layer 1). Each layer narrows what the one above can wrongly let through.
Layer 5: Adversarial diff review (manual)
A fresh-context, read-only reviewer (the diff-reviewer subagent) re-reads an uncommitted or just-committed diff with no memory of the implementing pass’s reasoning.
Catches semantic drift, contract / interface mismatch, and version-string drift that the implementing pass is biased to miss.
Layer 4: 15 deterministic hooks (5 blocking + 10 observability)
Hooks fire on the host’s lifecycle events: session start, prompt submit, pre/post tool-use, stop, subagent stop, pre-compact. Blocking hooks gate commit / push, external-effect commands, large edits, subagent spawns, and answer finalization.
Observability hooks never block; they make every action visible without inflating the gate-coverage metric.
Layer 3: Deterministic gate invocations
The MCP tools below (verify_claim, response_gate, causal_trace, verify_paths, checkpoint, session_report) are the gate the agent must call before it claims, commits, or deploys. This is where the three-layer verification core lives.
Extended by binding checks: claim-grounding, continuity-binding, and detector calibration.
Layer 2: Advisory rules, on-demand skills
A project constitution plus on-demand skills and layer rules the assistant must follow: import boundaries, governance / safety, testing-evidence, and security.
Advisory, not blocking: it shapes behaviour; the gate and hooks enforce it.
Layer 1: Persistent, schema-validated state
A persistent, schema-validated memory layer (semantic notes, prior corrections, verified external state) that carries goals and constraints across session boundaries.
Validated on every session boot against a fixed frontmatter schema.
Layer 4 in detail — the 15 lifecycle hooks
Five hooks block; ten only observe. Blocking hooks fail closed when a recent gate call is missing; observability hooks always pass and are excluded from the gate-coverage metric so visibility never inflates the score.
- Commit / push gate — A git commit or push requires a recent gate call first.
- External-effect gate — gh PR / release / issue mutations, package publishes (npm / pip / twine), destructive git, and deploy commands require a recent gate call.
- Large-edit gate — Edits above a line threshold require a recent gate call; small edits are exempt.
- Subagent-spawn gate — Spawning a subagent requires a prior gate call — the subagent’s claims propagate upward.
- Answer-finalization gate — Blocks finalizing a response that asks about or claims something tied to an artifact not opened this turn.
- Session start — Resets or continues the session trace id.
- Memory schema check — Validates every memory file’s frontmatter on boot.
- User-prompt log — Records a hash + length of each prompt, not the raw text.
- Pre-compact checkpoint — Snapshots gate state before context compression.
- Subagent-stop attestation — Records a completed subagent.
- Third-party MCP-call log — Records every non-Phionyx MCP tool call.
- Commit attestation — Records the SHA on a successful commit.
- External-ingress log — Records a web fetch / search URL or query.
- Post-edit language check — Runs the right compiler / linter after each edit.
- Targeted tests — Runs the affected test directory when a turn ends.
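The split between blocking and observability hooks can be sketched as below. This is a minimal illustration, not the shipped hook code: the 300-second freshness window, the function names, and the choice of SHA-256 for the prompt hash are all assumptions, since the source only says "recent" and "a hash".

```python
import hashlib
import time
from typing import Optional

# Hypothetical freshness window; the source only requires a "recent" gate call.
GATE_MAX_AGE_SECONDS = 300.0


def blocking_hook_allows(last_gate_call: Optional[float],
                         now: Optional[float] = None,
                         max_age: float = GATE_MAX_AGE_SECONDS) -> bool:
    """Fail closed: allow the action only if a gate call is on record and fresh."""
    if last_gate_call is None:
        return False  # no gate call at all, so the action is blocked
    if now is None:
        now = time.time()
    age = now - last_gate_call
    return 0 <= age <= max_age


def prompt_log_entry(prompt: str) -> dict:
    """Observability only: record a hash and a length, never the raw prompt."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()  # hash choice is assumed
    return {"sha256": digest, "length": len(prompt)}
```

A blocking hook would turn a `False` result into a refused commit, push, or spawn; an observability hook such as `prompt_log_entry` only writes its record and always lets the action continue.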
Layers 1–2 in detail — constitution, skills, rules & memory
The base of the stack is advisory and always loaded. The constitution sets identity, forbidden actions, escalation triggers, and completion criteria; layer rules and on-demand skills specialise it; the memory layer carries the result across sessions.
- Core boundary — The engine stays framework-agnostic — no delivery framework may leak into it.
- Governance / safety — Gates are never deleted, only policy-bypassed with an audit trail.
- Testing-evidence — A claim of “done” needs test output, not assertion.
- Security — No secrets in source; external input is validated.
- Self-governance discipline — When to call which gate tool before claiming, committing, or deploying.
- State-grounding — Verify an external-state fact (version, deploy, liveness) before asserting it.
- Binding-enforcement — The specification for the hook layer itself.
Memory (Layer 1) is a schema-validated continuity layer — semantic notes, prior corrections, and verified external state — checked against a fixed frontmatter schema on every session boot.
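A boot-time frontmatter check of the kind described above might look like the following sketch. The field names (`name`, `type`, `updated`) and the three type labels are hypothetical stand-ins for the three memory kinds; the actual fixed schema is not published in this document.

```python
# Hypothetical schema: the real frontmatter fields are not given in the source.
REQUIRED_FIELDS = ("name", "type", "updated")
ALLOWED_TYPES = {"semantic_note", "correction", "external_state"}


def parse_frontmatter(text: str) -> dict:
    """Parse simple `key: value` lines between two `---` delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening frontmatter delimiter")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if not line.strip():
            continue  # tolerate blank lines inside the frontmatter
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed frontmatter line: {line!r}")
        fields[key.strip()] = value.strip()
    raise ValueError("missing closing frontmatter delimiter")


def validate_memory_file(text: str) -> list:
    """Return a list of schema errors; an empty list means the file is valid."""
    try:
        fields = parse_frontmatter(text)
    except ValueError as exc:
        return [str(exc)]
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in fields]
    errors += [f"unknown field: {name}" for name in sorted(set(fields) - set(REQUIRED_FIELDS))]
    if "type" in fields and fields["type"] not in ALLOWED_TYPES:
        errors.append(f"invalid type: {fields['type']}")
    return errors
```

Returning a list of errors rather than raising on the first one lets a session-start hook report every broken memory file in a single pass.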
2. The verification core (three layers)
This is the decision logic inside Layer 3 (the MCP self-governance gate) of the stack above. Each of the three verification layers has a different source and a different trust level. The deterministic gate (the third verification layer, not to be confused with stack Layer 3 as a whole) is what produces the decision; the other two verification layers exist to keep the gate honest.
Verification layer 1: Agent self-claim
Source: Claude (stochastic)
The agent says what it changed, what it tested, and what evidence type it used.
Trust: Low. The agent can underreport to make a gate pass.
Verification layer 2: Repo truth
Source: git diff (deterministic)
git diff --name-only and git diff -U0 extract the files and functions that actually changed.
Trust: High. Tool-derived; cannot be gamed by the agent.
Verification layer 3: Deterministic gate
Source: Pipeline blocks (deterministic)
Physics state + revision thresholds produce a directive. Same inputs → same output, every time.
Trust: Deterministic gate logic; verdicts are replayable from the signed record.
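The self-claim versus repo-truth comparison can be illustrated with a short sketch. It assumes the repo truth is taken from `git diff --name-only` exactly as named above; the function names and the shape of the result are hypothetical and not the Phionyx implementation.

```python
import subprocess


def parse_name_only(diff_output: str) -> set:
    """Turn `git diff --name-only` output into a set of changed paths."""
    return {line.strip() for line in diff_output.splitlines() if line.strip()}


def changed_files(repo_dir: str = ".") -> set:
    """Ask git which files changed in the working tree (repo truth)."""
    result = subprocess.run(
        ["git", "diff", "--name-only"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_name_only(result.stdout)


def compare_claim_to_repo(claimed_affected, repo_changed) -> dict:
    """Cross-check the agent's declared paths against what git reports."""
    claimed = set(claimed_affected)
    repo = set(repo_changed)
    return {
        "undeclared": sorted(repo - claimed),  # changed but not declared (under-reporting)
        "phantom": sorted(claimed - repo),     # declared but not actually changed
        "consistent": claimed == repo,
    }
```

The `undeclared` list is the interesting one: it is what an agent would have to hide to make a weak claim look well covered, and it comes from git rather than from the agent.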
3. MCP tool surface (6 tools)
The pipeline MCP server (phionyx-pipeline-mcp) exposes these six tools to any MCP-capable host (Claude Code, Continue, Cursor, Aider). Each tool references a specific subset of the canonical 46-block pipeline.
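Because the server speaks MCP, a host reaches any of these tools through a standard JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for phionyx_verify_claim using the parameter names given in this section; the argument value types (strings and lists of paths) and the helper name `build_tool_call` are assumptions, not taken from the server's published schema.

```python
import json
from itertools import count

_request_ids = count(1)  # monotonically increasing JSON-RPC request ids


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialise an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical argument values; only the parameter names come from the tool list.
example = build_tool_call(
    "phionyx_verify_claim",
    {
        "claim": "Fixed the off-by-one in the pagination helper",
        "evidence": "unit tests for the helper pass",
        "evidence_type": "unit_test",
        "code_paths_tested": ["src/pagination.py"],
        "code_paths_affected": ["src/pagination.py"],
    },
)
```

In practice an MCP-capable host builds and sends this message itself; the sketch only shows what the agent is committing to when it invokes the tool.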
| Tool | Trigger | What it does | Parameters |
|---|---|---|---|
| phionyx_verify_claim | Before claim_fixed / claim_done | Nine-block claim verification; must be called before saying 'fixed' or 'done'. | claim, evidence, evidence_type, code_paths_tested, code_paths_affected |
| phionyx_verify_paths | To audit own declarations | Path declarations vs git diff: cross-checks the agent's stated affected/tested paths against the repo truth. | claimed_affected, claimed_tested |
| phionyx_causal_trace | When debugging | Causal-chain verification: input safety + confidence fusion + audit. | symptom, causal_chain |
| phionyx_response_gate | Before committing / deploying / responding / asking | Action-type-specific revision gate: seven blocks (Block 15 added for the ask_question short-circuit that enforces artifact-grounding). | action_type, confidence, evidence_count, evidence_type, affects_user_facing, artifact_references?, artifact_paths_read? |
| phionyx_session_report | On demand / before reporting session complete | Session summary: w_final, trust, integrity, and mcp_envelope_chain join field. | none listed |
| phionyx_checkpoint | After subtask completion / context switch | Lightweight checkpoint: context note + telemetry write. Observability only, not a gate. | context |
What the referenced block numbers mean
Each tool above references a subset of the canonical 46-block pipeline (shown as #N). These are the blocks the MCP tools touch and what each one does — the mapping functions themselves are held back (see §7).
| # | Block | Role |
|---|---|---|
| #3 | input_safety_gate | Screens the incoming request before any reasoning runs. |
| #15 | knowledge_boundary_check | Flags claims that reach beyond grounded / opened sources (the artifact-grounding short-circuit). |
| #16 | trust_evaluation | Scores how far the current declaration can be trusted. |
| #23 | behavioral_drift_detection | Tracks how often recent claims were rejected or contradicted. |
| #37 | phi_computation | Computes the integration signal (φ) of the current state. |
| #38 | entropy_computation | Computes the uncertainty / disorder signal of the state. |
| #39 | confidence_fusion | Fuses the signals into a single confidence value. |
| #41 | response_revision_gate | Turns the fused state into a directive (pass … reject) before the response is built. |
| #44 | audit_layer | Writes the hash-chained, signed audit record of the decision. |
4. Directive hierarchy
Gate verdicts. Listed in increasing severity. The most severe applicable directive wins when multiple thresholds trip.
| Directive | Meaning |
|---|---|
| pass | Gate accepts the claim. |
| damp | Soft signal: confidence reduced but action allowed. |
| rewrite | Output must be rewritten before submission. |
| regenerate | Output must be regenerated; current output rejected. |
| reject | Hard reject: action blocked. |
| require_tool | A factual / external claim with no same-turn evidence must bind evidence (run a tool) before it may proceed. |
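The "most severe applicable directive wins" rule amounts to a maximum over a fixed ordering. The sketch below takes the ordering exactly as listed in this section, which places require_tool last; treating an empty set of tripped directives as pass is an assumption, and the function name is hypothetical.

```python
# Severity order as listed in this section, least to most severe.
SEVERITY_ORDER = ["pass", "damp", "rewrite", "regenerate", "reject", "require_tool"]
_RANK = {name: rank for rank, name in enumerate(SEVERITY_ORDER)}


def most_severe(directives) -> str:
    """Pick the winning directive when several thresholds trip at once."""
    tripped = list(directives)
    if not tripped:
        return "pass"  # assumption: nothing tripped means the gate passes
    # Unknown directive names raise KeyError rather than being silently ignored.
    return max(tripped, key=lambda name: _RANK[name])
```

For example, a response that trips both a damp and a rewrite threshold resolves to rewrite, because damp alone would still have allowed the action.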
5. Threshold table
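Action-type-specific thresholds. The higher the stakes of the action (default, then claim_fixed, then deploy), the tighter the threshold: the entropy and drift thresholds drop, while phi_min and the confidence thresholds rise.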
| Threshold | Default | claim_fixed | deploy | Directive |
|---|---|---|---|---|
| entropy_damp | 0.70 | 0.60 | 0.55 | damp |
| entropy_rewrite | 0.85 | 0.75 | 0.70 | rewrite |
| entropy_reject | 0.95 | 0.90 | 0.85 | reject |
| phi_min | 0.05 | 0.08 | 0.10 | regenerate |
| confidence_regenerate | 0.35 | 0.40 | 0.45 | regenerate |
| confidence_rewrite | 0.50 | 0.55 | 0.60 | rewrite |
| drift_rewrite | 0.60 | 0.50 | 0.45 | rewrite |
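The table can be read as a lookup from (threshold, action type) to a limit, with each tripped row contributing its directive. In the sketch below the numbers are copied from the table, but the comparison directions are assumptions: entropy and drift are taken to trip at or above their limit, phi and confidence below it. The fallback to the Default column for other action types and the signal names are also assumed.

```python
# (default, claim_fixed, deploy, directive), copied from the threshold table.
THRESHOLDS = {
    "entropy_damp": (0.70, 0.60, 0.55, "damp"),
    "entropy_rewrite": (0.85, 0.75, 0.70, "rewrite"),
    "entropy_reject": (0.95, 0.90, 0.85, "reject"),
    "phi_min": (0.05, 0.08, 0.10, "regenerate"),
    "confidence_regenerate": (0.35, 0.40, 0.45, "regenerate"),
    "confidence_rewrite": (0.50, 0.55, 0.60, "rewrite"),
    "drift_rewrite": (0.60, 0.50, 0.45, "rewrite"),
}
COLUMN = {"default": 0, "claim_fixed": 1, "deploy": 2}
# Assumed direction: these rows trip when the signal falls BELOW the limit.
TRIPS_WHEN_BELOW = {"phi_min", "confidence_regenerate", "confidence_rewrite"}


def tripped_directives(signals: dict, action_type: str = "default") -> set:
    """Return every directive whose threshold the given signals trip."""
    col = COLUMN.get(action_type, 0)  # assumption: other action types use Default
    tripped = set()
    for name, row in THRESHOLDS.items():
        value = signals[name.split("_")[0]]  # entropy / phi / confidence / drift
        limit = row[col]
        trips = value < limit if name in TRIPS_WHEN_BELOW else value >= limit
        if trips:
            tripped.add(row[3])
    return tripped
```

With an entropy of 0.65 and otherwise healthy signals, this sketch trips nothing under the default column but trips damp for claim_fixed and deploy, which is the tightening the table describes. Choosing among several tripped directives then follows the severity rule in §4.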
6. Evidence taxonomy
Evidence types and their weights. The agent passes one of these labels to phionyx_verify_claim and phionyx_response_gate — but git diff is what the gate trusts.
| Type | Weight | Description |
|---|---|---|
| browser_test | 0.9 | User flow tested in the browser. |
| manual_repro | 0.8 | Reproduced manually and verified. |
| integration_test | 0.7 | End-to-end test across multiple components. |
| endpoint_test | 0.6 | API endpoint tested directly. |
| log_inspection | 0.5 | Logs / output inspected for expected behaviour. |
| unit_test | 0.4 | Isolated unit test of a single function. |
| code_review | 0.3 | Code read; not executed. |
| none | 0.0 | No evidence. |
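As a lookup, the taxonomy is a small mapping from label to weight. The weights below are copied from the table; scoring an unrecognised label as 0.0, the same as none, is an assumption made for this sketch.

```python
# Evidence weights copied from the taxonomy table.
EVIDENCE_WEIGHTS = {
    "browser_test": 0.9,
    "manual_repro": 0.8,
    "integration_test": 0.7,
    "endpoint_test": 0.6,
    "log_inspection": 0.5,
    "unit_test": 0.4,
    "code_review": 0.3,
    "none": 0.0,
}


def evidence_weight(evidence_type: str) -> float:
    """Look up the weight for a declared evidence label."""
    # Assumption: an unknown label earns no credit, the same as "none".
    return EVIDENCE_WEIGHTS.get(evidence_type, 0.0)
```

Scoring unknown labels at zero keeps a misspelled or invented evidence type from earning credit it has not been assigned.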
7. Claim → state (conceptual)
Four signals from the agent’s declaration map into three internal state variables that drive the gate decision:
- Effective coverage (declared paths tested / declared paths affected, cross-checked against git diff) → drives the entropy variable.
- Evidence quality (weight from the taxonomy in §6) → modulates the valence variable. No evidence pushes valence negative; high-weight evidence on high-coverage claims pushes it positive.
- Session drift (running measure of how often recent claims were rejected or contradicted) → drives the arousal variable.
- User-facing flag (whether the action affects an end user) → adds to arousal; higher stakes shrink the directive’s pass region.
The exact mapping functions are part of the underlying specification and are not published here. The mapping is deterministic — identical inputs produce identical state — and the threshold table in §5 is the public observable for how state crosses into the directive.
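Only the first signal, effective coverage, is concrete enough to sketch from what is published here. The version below computes the tested-over-affected ratio after cross-checking against git diff. The cross-check rule (changed files the agent did not declare still count as affected) and the 0.0 result for an empty change set are assumptions; the step from coverage to the entropy variable is deliberately left out because that mapping is not published.

```python
def effective_coverage(declared_tested, declared_affected, repo_changed) -> float:
    """Ratio of tested paths to affected paths, corrected by repo truth."""
    # Assumption: files git reports as changed count as affected even if undeclared.
    affected = set(declared_affected) | set(repo_changed)
    # Only tests on paths that are actually affected count towards coverage.
    tested = set(declared_tested) & affected
    if not affected:
        return 0.0  # assumption: no affected paths means no coverage to credit
    return len(tested) / len(affected)


# An agent that declares one affected file and tests it claims full coverage,
# but git shows two changed files, so effective coverage is one half.
under_reported = effective_coverage(
    declared_tested=["src/a.py"],
    declared_affected=["src/a.py"],
    repo_changed=["src/a.py", "src/b.py"],
)
```

This is why under-reporting affected paths does not help the agent: the denominator comes from the repository, not from the declaration.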