Agentic Development · Architecture

Five-layer governance stack

The architecture behind Phionyx-for-Agentic-Development is a five-layer governance stack: a memory / continuity layer and a project constitution at the base, deterministic MCP gate invocations and lifecycle hooks in the middle, and adversarial subagent review on top — all wrapped around a deterministic gate. The three-layer claim verification (agent self-claim vs repo truth vs deterministic gate) is the decision logic inside the gate layer, not the whole architecture. Everything here is the same architecture used by Phionyx’s own development loop, measured publicly under the Live coverage tab.

1. Five-layer governance stack

The stack runs top to bottom: the top layer (Layer 5, adversarial subagent review) is the last line of defence and is invoked on demand; every layer below it is always-on, down to the foundational memory / continuity layer (Layer 1). Each layer narrows what the one above can wrongly let through.

Layer 5: Subagent review

Adversarial diff review (manual)

A fresh-context, read-only reviewer (the diff-reviewer subagent) re-reads an uncommitted or just-committed diff with no memory of the implementing pass’s reasoning.

Catches semantic drift, contract / interface mismatch, and version-string drift the implementing pass is biased to miss.

Layer 4: Lifecycle hooks

15 deterministic hooks (5 blocking + 10 observability)

Hooks fire on the host’s lifecycle events — session start, prompt submit, pre/post tool-use, stop, subagent stop, pre-compact. Blocking hooks gate commit / push, external-effect commands, large edits, subagent spawns, and answer finalization.

Observability hooks never block; they make every action visible without inflating the gate-coverage metric.

Layer 3: MCP self-governance

Deterministic gate invocations

The MCP tools below (verify_claim, response_gate, causal_trace, verify_paths, checkpoint, session_report) are the gate the agent must call before it claims, commits, or deploys. This is where the three-layer verification core lives.

Extended by binding checks: claim-grounding, continuity-binding, and detector calibration.

Layer 2: Constitution + skills

Advisory rules, on-demand skills

A project constitution plus on-demand skills and layer rules the assistant must follow — import boundaries, governance / safety, testing-evidence, and security.

Advisory, not blocking: it shapes behaviour; the gate and hooks enforce it.

Layer 1: Memory / continuity

Persistent, schema-validated state

A persistent, schema-validated memory layer (semantic notes, prior corrections, verified external state) that carries goals and constraints across session boundaries.

Validated on every session boot against a fixed frontmatter schema.

Layer 4 in detail — the 15 lifecycle hooks

Five hooks block; ten only observe. Blocking hooks fail closed when a recent gate call is missing; observability hooks always pass and are excluded from the gate-coverage metric so visibility never inflates the score.

Blocking (5)
  • Commit / push gate: A git commit or push requires a recent gate call first.
  • External-effect gate: gh PR / release / issue mutations, package publishes (npm / pip / twine), destructive git, and deploy commands require a recent gate call.
  • Large-edit gate: Edits above a line threshold require a recent gate call; small edits are exempt.
  • Subagent-spawn gate: Spawning a subagent requires a prior gate call, because the subagent’s claims propagate upward.
  • Answer-finalization gate: Blocks finalizing a response that asks about or claims something tied to an artifact not opened this turn.
Observability (10)
  • Session start: Resets or continues the session trace id.
  • Memory schema check: Validates every memory file’s frontmatter on boot.
  • User-prompt log: Records a hash + length of each prompt, not the raw text.
  • Pre-compact checkpoint: Snapshots gate state before context compression.
  • Subagent-stop attestation: Records a completed subagent.
  • Third-party MCP-call log: Records every non-Phionyx MCP tool call.
  • Commit attestation: Records the SHA on a successful commit.
  • External-ingress log: Records a web fetch / search URL or query.
  • Post-edit language check: Runs the right compiler / linter after each edit.
  • Targeted tests: Runs the affected test directory when a turn ends.
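The fail-closed behaviour of a blocking hook, and the always-pass behaviour of an observability hook, can be sketched as follows. This is a minimal illustration, not the shipped hook code: the function names, the event shape, and the 300-second freshness window are all assumptions, since this page does not state how "recent" is defined.

```python
import time
from typing import Optional

# Hypothetical freshness window; the real value is not stated in this document.
GATE_FRESHNESS_SECONDS = 300.0


def blocking_hook_allows(last_gate_call_ts: Optional[float],
                         now: Optional[float] = None,
                         window: float = GATE_FRESHNESS_SECONDS) -> bool:
    """Return True only if a gate call happened within the freshness window."""
    if last_gate_call_ts is None:
        return False  # fail closed: no gate call on record
    current = time.time() if now is None else now
    age = current - last_gate_call_ts
    # A timestamp in the future (negative age) is treated as invalid.
    return 0.0 <= age <= window


def observability_hook(event: dict, log: list) -> bool:
    """Record the event and always pass; it never blocks the action."""
    log.append(event)
    return True
```

With these assumptions, a commit attempted with no gate call on record is refused, while the same commit attempted shortly after a gate call goes through.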

Layers 1–2 in detail — constitution, skills, rules & memory

The base of the stack is advisory and always loaded. The constitution sets identity, forbidden actions, escalation triggers, and completion criteria; layer rules and on-demand skills specialise it; the memory layer carries the result across sessions.

Layer rules
  • Core boundary: The engine stays framework-agnostic; no delivery framework may leak into it.
  • Governance / safety: Gates are never deleted, only policy-bypassed with an audit trail.
  • Testing-evidence: A claim of “done” needs test output, not assertion.
  • Security: No secrets in source; external input is validated.
On-demand skills
  • Self-governance discipline: When to call which gate tool before claiming, committing, or deploying.
  • State-grounding: Verify an external-state fact (version, deploy, liveness) before asserting it.
  • Binding-enforcement: The specification for the hook layer itself.

Memory (Layer 1) is a schema-validated continuity layer — semantic notes, prior corrections, and verified external state — checked against a fixed frontmatter schema on every session boot.
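A boot-time frontmatter check of this kind might look like the sketch below. The required keys and allowed type values are hypothetical placeholders, because the actual schema is not published on this page; the parser also handles only flat `key: value` frontmatter between `---` delimiters.

```python
# Hypothetical schema: the real required keys and types are not published here.
REQUIRED_KEYS = {"name", "type", "updated"}
ALLOWED_TYPES = {"semantic_note", "correction", "external_state"}


def parse_frontmatter(text: str) -> dict:
    """Parse flat `key: value` frontmatter enclosed by `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening frontmatter delimiter")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if not line.strip():
            continue  # tolerate blank lines inside the frontmatter
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed frontmatter line: {line!r}")
        fields[key.strip()] = value.strip()
    raise ValueError("missing closing frontmatter delimiter")


def validate_memory_file(text: str) -> list:
    """Return a list of schema errors; an empty list means the file is valid."""
    try:
        fields = parse_frontmatter(text)
    except ValueError as exc:
        return [str(exc)]
    errors = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - fields.keys())]
    if "type" in fields and fields["type"] not in ALLOWED_TYPES:
        errors.append(f"unknown type: {fields['type']}")
    return errors
```

Returning a list of errors rather than raising lets a boot-time check report every invalid memory file in one pass.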

2. The verification core (three layers)

This is the decision logic inside Layer 3 (the MCP self-governance gate) of the stack above. Each verification step has a different source and a different trust level. The deterministic gate (verification layer 3, not to be confused with stack Layer 3) is what produces the decision; the other two verification layers exist to keep the gate honest.

Verification layer 1: LLM Declaration

Source: Claude (stochastic)

The agent says what it changed, what it tested, and what evidence type it used.

Trust: Low. The agent can underreport to make a gate pass.

Verification layer 2: Repo Truth

Source: git diff (deterministic)

git diff --name-only + git diff -U0 extract the files and functions that actually changed.

Trust: High. Tool-derived; cannot be gamed by the agent.

Verification layer 3: Gate Decision

Source: Pipeline blocks (deterministic)

Physics state + revision thresholds produce a directive. Same inputs → same output, every time.

Deterministic gate logic; verdicts are replayable from the signed record.
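The cross-check between the agent's declaration and repo truth can be illustrated with a short sketch. It assumes the comparison works on sets of file paths; the function names and the result keys are hypothetical, and only `git diff --name-only` is taken from the text above.

```python
import subprocess


def changed_files(repo_dir: str = ".") -> set:
    """Files changed in the working tree, according to `git diff --name-only`."""
    out = subprocess.run(
        ["git", "diff", "--name-only"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return {line.strip() for line in out.splitlines() if line.strip()}


def compare_declaration(claimed_affected, actual_changed) -> dict:
    """Compare the agent's declared paths with the paths git reports."""
    claimed = set(claimed_affected)
    actual = set(actual_changed)
    return {
        # Changed in the repo but never declared: the underreporting case.
        "undeclared": sorted(actual - claimed),
        # Declared by the agent but untouched according to git.
        "phantom": sorted(claimed - actual),
        "consistent": claimed == actual,
    }
```

For example, `compare_declaration(["a.py"], changed_files())` would surface any file the agent modified but left out of its self-claim.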

3. MCP tool surface (6 tools)

The pipeline MCP server (phionyx-pipeline-mcp) exposes these six tools to any MCP-capable host (Claude Code, Continue, Cursor, Aider). Each tool references a specific subset of the canonical 46-block pipeline.

phionyx_verify_claim
Trigger: Before claim_fixed / claim_done

Nine-block claim verification. Must be called before saying 'fixed' or 'done'.

Args: claim, evidence, evidence_type, code_paths_tested, code_paths_affected
Pipeline blocks referenced: #3, #15, #16, #23, #37, #38, #39, #41, #44

phionyx_verify_paths
Trigger: To audit own declarations

Path declarations vs git diff. Cross-checks the agent's stated affected/tested paths against the repo truth.

Args: claimed_affected, claimed_tested

phionyx_causal_trace
Trigger: When debugging

Causal-chain verification: input safety + confidence fusion + audit.

Args: symptom, causal_chain
Pipeline blocks referenced: #3, #37, #38, #39, #44

phionyx_response_gate
Trigger: Before committing / deploying / responding / asking

Action-type-specific revision gate: seven blocks (Block 15 added for the ask_question short-circuit that enforces artifact-grounding).

Args: action_type, confidence, evidence_count, evidence_type, affects_user_facing, artifact_references?, artifact_paths_read?
Pipeline blocks referenced: #16, #23, #37, #38, #39, #41, #44

phionyx_session_report
Trigger: On demand / before reporting session complete

Session summary: w_final, trust, integrity, and mcp_envelope_chain join field.

Pipeline blocks referenced: #23, #37, #38

phionyx_checkpoint
Trigger: After subtask completion / context switch

Lightweight checkpoint: context note + telemetry write. Observability only, not a gate.

Args: context
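As a rough illustration of how a host invokes one of these tools, the sketch below builds an MCP `tools/call` request (JSON-RPC 2.0) for phionyx_verify_claim. The argument names come from the list above; the value types (strings and lists of paths) and the example values are assumptions, since the tool's input schema is not reproduced on this page.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Example values are hypothetical; only the argument names come from this page.
request = build_tool_call(1, "phionyx_verify_claim", {
    "claim": "Fixed the login redirect loop",
    "evidence": "Integration test passes after the change",
    "evidence_type": "integration_test",
    "code_paths_tested": ["auth/redirect.py"],
    "code_paths_affected": ["auth/redirect.py"],
})
```

In practice an MCP-capable host builds and sends this message itself; the sketch only shows the shape of the call the agent is expected to make before claiming 'fixed' or 'done'.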

What the referenced block numbers mean

Each tool above references a subset of the canonical 46-block pipeline (shown as #N). These are the blocks the MCP tools touch and what each one does — the mapping functions themselves are held back (see §7).

#     Block                        Role
#3    input_safety_gate            Screens the incoming request before any reasoning runs.
#15   knowledge_boundary_check     Flags claims that reach beyond grounded / opened sources (the artifact-grounding short-circuit).
#16   trust_evaluation             Scores how far the current declaration can be trusted.
#23   behavioral_drift_detection   Tracks how often recent claims were rejected or contradicted.
#37   phi_computation              Computes the integration signal (φ) of the current state.
#38   entropy_computation          Computes the uncertainty / disorder signal of the state.
#39   confidence_fusion            Fuses the signals into a single confidence value.
#41   response_revision_gate       Turns the fused state into a directive (pass … reject) before the response is built.
#44   audit_layer                  Writes the hash-chained, signed audit record of the decision.
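The idea behind a hash-chained, signed audit record (block #44) can be sketched generically. This is not the pipeline's actual record format: the field names are hypothetical, and HMAC-SHA256 with a shared key stands in for whatever signature scheme the audit layer really uses.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the payload together with the previous record's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def _sign(record_hash: str, key: bytes) -> str:
    return hmac.new(key, record_hash.encode("utf-8"), hashlib.sha256).hexdigest()


def append_record(chain: list, payload: dict, key: bytes) -> dict:
    """Append a record whose hash covers the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    record_hash = _digest(prev_hash, payload)
    record = {"prev": prev_hash, "payload": payload,
              "hash": record_hash, "sig": _sign(record_hash, key)}
    chain.append(record)
    return record


def verify_chain(chain: list, key: bytes) -> bool:
    """Recompute every hash and signature; any edit breaks the chain."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["payload"]):
            return False
        if not hmac.compare_digest(_sign(record["hash"], key), record["sig"]):
            return False
        prev_hash = record["hash"]
    return True
```

Because each hash covers its predecessor, altering one stored decision invalidates that record and every record after it.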

4. Directive hierarchy

Gate verdicts, listed in order of increasing severity. When multiple thresholds trip, the most severe applicable directive wins.

  1. pass: Gate accepts the claim.
  2. damp: Soft signal; confidence reduced but action allowed.
  3. rewrite: Output must be rewritten before submission.
  4. regenerate: Output must be regenerated; current output rejected.
  5. reject: Hard reject; action blocked.
  6. require_tool: A factual / external claim with no same-turn evidence must bind evidence (run a tool) before it may proceed.
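The "most severe directive wins" rule reduces to a maximum over a fixed ordering. The sketch below takes the ordering exactly as listed above (with require_tool ranked last, i.e. most severe); the function name and the choice to return pass when nothing trips are assumptions.

```python
# Ordering copied from the list above, least to most severe.
SEVERITY_ORDER = ["pass", "damp", "rewrite", "regenerate", "reject", "require_tool"]
SEVERITY = {name: rank for rank, name in enumerate(SEVERITY_ORDER, start=1)}


def resolve_directive(tripped) -> str:
    """Return the most severe directive among those that tripped.

    An empty input resolves to "pass" (assumed default).
    Unknown directive names raise KeyError.
    """
    candidates = list(tripped) or ["pass"]
    return max(candidates, key=SEVERITY.__getitem__)
```

For instance, if one threshold yields damp and another yields rewrite, the resolved directive is rewrite.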

5. Threshold table

Action-type-specific thresholds. The stricter the action (from the default, through claim_fixed, to deploy), the tighter the threshold.

Threshold               Default   claim_fixed   deploy   Directive
entropy_damp            0.70      0.60          0.55     damp
entropy_rewrite         0.85      0.75          0.70     rewrite
entropy_reject          0.95      0.90          0.85     reject
phi_min                 0.05      0.08          0.10     regenerate
confidence_regenerate   0.35      0.40          0.45     regenerate
confidence_rewrite      0.50      0.55          0.60     rewrite
drift_rewrite           0.60      0.50          0.45     rewrite
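To make the table concrete, the sketch below encodes it as data and reports which thresholds a given state trips. The numeric values are copied from the table; everything else is an assumption, in particular the comparison directions (entropy and drift trip at or above their threshold, φ and confidence trip below theirs), the signal names, and the fallback to the default column for unlisted action types.

```python
# (threshold, signal read, comparison, directive, value per action type)
# Comparison directions and signal names are assumptions, not published facts.
ROWS = [
    ("entropy_damp", "entropy", "above", "damp",
     {"default": 0.70, "claim_fixed": 0.60, "deploy": 0.55}),
    ("entropy_rewrite", "entropy", "above", "rewrite",
     {"default": 0.85, "claim_fixed": 0.75, "deploy": 0.70}),
    ("entropy_reject", "entropy", "above", "reject",
     {"default": 0.95, "claim_fixed": 0.90, "deploy": 0.85}),
    ("phi_min", "phi", "below", "regenerate",
     {"default": 0.05, "claim_fixed": 0.08, "deploy": 0.10}),
    ("confidence_regenerate", "confidence", "below", "regenerate",
     {"default": 0.35, "claim_fixed": 0.40, "deploy": 0.45}),
    ("confidence_rewrite", "confidence", "below", "rewrite",
     {"default": 0.50, "claim_fixed": 0.55, "deploy": 0.60}),
    ("drift_rewrite", "drift", "above", "rewrite",
     {"default": 0.60, "claim_fixed": 0.50, "deploy": 0.45}),
]


def tripped_thresholds(signals: dict, action_type: str = "default") -> list:
    """Return (threshold name, directive) pairs tripped by the given signals."""
    tripped = []
    for name, signal, comparison, directive, values in ROWS:
        limit = values.get(action_type, values["default"])
        value = signals[signal]
        hit = value >= limit if comparison == "above" else value < limit
        if hit:
            tripped.append((name, directive))
    return tripped
```

Under these assumptions, an entropy of 0.65 passes cleanly for a default action but trips entropy_damp for claim_fixed and deploy, which shows how the same state is judged more strictly as the action type gets riskier.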

6. Evidence taxonomy

Evidence types and their weights. The agent passes one of these labels to phionyx_verify_claim and phionyx_response_gate — but git diff is what the gate trusts.

Type               Weight   Description
browser_test       0.9      User flow tested in the browser.
manual_repro       0.8      Reproduced manually and verified.
integration_test   0.7      End-to-end test across multiple components.
endpoint_test      0.6      API endpoint tested directly.
log_inspection     0.5      Logs / output inspected for expected behaviour.
unit_test          0.4      Isolated unit test of a single function.
code_review        0.3      Code read; not executed.
none               0.0      No evidence.
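The taxonomy is effectively a lookup table from label to weight. A minimal sketch follows, with the weights copied from the table; rejecting unknown labels with an error is an assumption, since this page does not say how the gate treats a label outside the taxonomy.

```python
# Weights copied from the evidence taxonomy table.
EVIDENCE_WEIGHTS = {
    "browser_test": 0.9,
    "manual_repro": 0.8,
    "integration_test": 0.7,
    "endpoint_test": 0.6,
    "log_inspection": 0.5,
    "unit_test": 0.4,
    "code_review": 0.3,
    "none": 0.0,
}


def evidence_weight(evidence_type: str) -> float:
    """Look up the weight for an evidence label.

    Raising on unknown labels is an assumed policy, not documented behaviour.
    """
    try:
        return EVIDENCE_WEIGHTS[evidence_type]
    except KeyError:
        raise ValueError(f"unknown evidence type: {evidence_type!r}") from None
```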

7. Claim → state (conceptual)

Four signals from the agent’s declaration map into three internal state variables that drive the gate decision:

  • Effective coverage (declared paths tested / declared paths affected, cross-checked against git diff) → drives the entropy variable.
  • Evidence quality (weight from the taxonomy in §6) → modulates the valence variable. No evidence pushes valence negative; high-weight evidence on high-coverage claims pushes it positive.
  • Session drift (running measure of how often recent claims were rejected or contradicted) → drives the arousal variable.
  • User-facing flag (whether the action affects an end user) → adds to arousal; higher stakes shrink the directive’s pass region.

The exact mapping functions are part of the underlying specification and are not published here. The mapping is deterministic — identical inputs produce identical state — and the threshold table in §5 is the public observable for how state crosses into the directive.
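Since the real mapping functions are withheld, the sketch below covers only the first signal, effective coverage, and only as one plausible reading of "declared paths tested / declared paths affected, cross-checked against git diff". The specific choices here are assumptions: the affected set is widened to include every path git reports as changed, tested paths outside that set are ignored, and an empty affected set yields 0.0.

```python
def effective_coverage(declared_tested, declared_affected, actually_changed) -> float:
    """Share of affected paths that were tested, after cross-checking git.

    Assumed reading: a path counts as affected if the agent declared it
    or git reports it as changed; a tested path only counts if it is
    in that affected set.
    """
    affected = set(declared_affected) | set(actually_changed)
    if not affected:
        return 0.0  # assumed convention for "nothing affected"
    tested = set(declared_tested) & affected
    return len(tested) / len(affected)
```

Under this reading, an agent that declares and tests only a.py while git also shows b.py changed gets a coverage of 0.5 rather than the 1.0 its own declaration would suggest.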

This is the public reference for the architecture used by phionyx-pipeline-mcp + phionyx-mcp-server + phionyx-eval-inspect. The canonical 46-block pipeline lives in halvrenofviryel/phionyx-research — this page describes the subset the MCP tools reference. Live installation status (which version is running, which config file is present) is a founder-side diagnostic and not exposed publicly.
