Quality gates

Quality gates are the non-negotiable checks every change passes before it merges. They exist because parallel agentic work amplifies both speed and risk — and speed is only useful when the output is trustworthy.

Two layers of gates

Synthex enforces quality at two layers, and a change has to clear both:

  1. Acceptance criteria gates — every plan task carries typed criteria that must each be satisfied before the task is marked done.
  2. Reviewer gates — every code change runs through code review and security review (and any project-specific specialists) before merge.

The acceptance criteria are scoped to what the task was supposed to accomplish. The reviewer gates are scoped to what the diff did to the codebase. Both are required.

Typed acceptance criteria

Each plan task has one or more acceptance criteria, each tagged with a type:

| Tag | Meaning | Validation |
| --- | --- | --- |
| [T] | Testable — can be proven by an automated test | The task must include a passing test that proves the criterion |
| [H] | Human-validated — requires explicit user approval | next-priority/synthex:next-priority calls AskUserQuestion before merge |
| [O] | Observational — measurable only after deployment | Recorded as a post-deployment metric; no merge-time validation |

The contract is strict: if a [T] criterion exists, the test that proves it must exist and pass before the task closes. The plan record explicitly links each [T] criterion to the test file and test name that satisfied it. If a [T] criterion has no linked test, the task gets sent back, no exceptions.
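To make the tagging concrete, a plan task's criteria might be recorded along these lines. This is an illustrative sketch: the field names and the task itself are hypothetical, not the exact plan schema.

```yaml
task: add-rate-limiter
acceptance_criteria:
  - type: "[T]"
    text: Requests over the limit receive HTTP 429
    # [T] criteria must link the test that proves them
    test: tests/test_rate_limiter.py::test_over_limit_returns_429
  - type: "[H]"
    text: Error copy matches the product tone guide
    # gates merge until the user approves via AskUserQuestion
  - type: "[O]"
    text: p99 latency stays under 50 ms at production load
    # validated post-deploy; does not block merge
```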

[H] criteria gate the merge until the user clicks Approve in AskUserQuestion. The orchestrator schedules [H] tasks early in parallel batches so the review can overlap with autonomous [T]-only work.

[O] criteria don't block merge. They become part of the milestone or phase observational outcomes, tracked in the plan and reviewed during retrospectives.

Reviewer gates

Two reviewers always run on every diff:

  • Code reviewer — correctness, maintainability, convention adherence, specification compliance, reuse opportunities. Modeled on Google's code review standards: favor approving code that improves the system over chasing perfection.
  • Security reviewer — vulnerability classes, input validation, secret handling, authorization paths, dependency risk.

Both run in parallel; both emit structured findings; both produce an explicit PASS / WARN / FAIL verdict. They are advisory in form — they recommend, they don't merge — but their verdict feeds the gate decision.
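A consolidated reviewer report could look roughly like the following. The field names, file path, and findings here are assumptions for illustration, not Synthex's actual output format:

```yaml
reviewer: security
verdict: FAIL
findings:
  - severity: critical
    file: src/auth/session.ts
    line: 118
    summary: Session token written to debug logs
    recommendation: Redact the token before logging
  - severity: medium
    file: src/auth/session.ts
    line: 134
    summary: Session cookie lacks a SameSite attribute
```

The structured form is what lets the Findings Consolidator deduplicate across reviewers and lets the severity policy below decide mechanically what blocks the merge.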

Optional specialists attach when configured for the project:

  • Performance engineer — flagged for performance-critical surfaces (request hot paths, rendering loops, large-list interactions).
  • Design system reviewer — checks token compliance and component-API drift against docs/specs/design-system.md.
  • Reliability / SRE reviewer — checks SLO impact, error budgets, runbook coverage.
  • Accessibility / domain reviewers — added per project (compliance, regulated industries).

Configure them in .synthex/config.yaml under the relevant reviewers: block. See Configuration for the per-section keys.
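As a rough sketch, enabling specialists could look like this. The keys inside the reviewers: block are illustrative assumptions; the authoritative names live in the Configuration reference:

```yaml
reviewers:
  performance:
    enabled: true          # attach to performance-critical surfaces
  design_system:
    enabled: true
    spec: docs/specs/design-system.md
  reliability:
    enabled: true
```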

Severity policy

Reviewers tag each finding with one of four severity levels:

| Severity | Meaning |
| --- | --- |
| critical | Must fix before merge. Security vulnerability, data loss risk, broken correctness |
| high | Must fix before merge. Significant defect, major maintainability problem |
| medium | Should address. Notable concern, but doesn't block merge by default |
| low | Nice to address. Suggestions, style, minor refactor opportunities |

The default policy is min_severity_to_address: high — meaning the review loop keeps iterating until all critical and high findings are resolved. medium and low findings are surfaced and recorded but do not block the merge. Adjust per command in .synthex/config.yaml:

review_loops:
  max_cycles: 2
  min_severity_to_address: high

Tightening to medium raises the bar; loosening to critical lets high findings through without blocking. The trade-off is throughput versus thoroughness, and the right answer depends on the maturity of the codebase and the team's appetite for technical debt.
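A per-command override might be sketched as follows, assuming a layout parallel to the review_loops block above; the exact key structure for per-command settings is an assumption here and is documented in Configuration:

```yaml
review_loops:
  max_cycles: 2
  min_severity_to_address: high
  write-implementation-plan:
    max_cycles: 3              # plan defects ripple downstream
    min_severity_to_address: medium
```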

The review loop

Reviews aren't one-shot. The Tech Lead runs a bounded loop:

  1. Reviewers fan out in parallel against the diff.
  2. The Findings Consolidator deduplicates and prioritizes the raw findings (it never adds new findings or changes severity — it just consolidates).
  3. The Tech Lead addresses everything at or above the severity threshold.
  4. Reviewers re-run on the updated diff.
  5. Repeat until the threshold is met or max_cycles is exhausted.

The cycle cap prevents runaway loops. If the loop exits with unresolved findings, those findings are recorded in the task's completion summary and the user decides whether to merge anyway, send back for further iteration, or split the remaining work into a follow-up task.

The default is two cycles for most commands; write-implementation-plan/synthex:write-implementation-plan gets three because plan defects ripple downstream.
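The loop above can be sketched in a few lines. This is a minimal illustration of the bounded-loop shape, not Synthex's implementation; the names (Finding, run_reviewers, consolidate, address) are hypothetical stand-ins for the real interfaces.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

@dataclass
class Finding:
    reviewer: str
    severity: str
    message: str

def review_loop(diff, run_reviewers, consolidate, address,
                max_cycles=2, min_severity="high"):
    """Iterate until nothing at or above the threshold remains, or the
    cycle cap is exhausted. Returns any unresolved blocking findings."""
    threshold = SEVERITY_RANK[min_severity]
    blocking = []
    for _ in range(max_cycles):
        raw = run_reviewers(diff)          # reviewers fan out in parallel
        findings = consolidate(raw)        # dedupe only; never reclassify
        blocking = [f for f in findings
                    if SEVERITY_RANK[f.severity] >= threshold]
        if not blocking:
            return []                      # gate cleared
        diff = address(diff, blocking)     # Tech Lead fixes, then re-review
    return blocking                        # recorded in the completion summary
```

Note that medium and low findings never enter blocking under the default threshold, which is exactly why they are surfaced and recorded but don't keep the loop running.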

Escalation

Two escalation paths exist when the routine flow can't resolve an issue:

  • [H] review on a [T]-only task. If reviewers and the Tech Lead disagree about whether a finding is real, the orchestrator can ask the user via AskUserQuestion for adjudication. The task waits in awaiting [H] until the user weighs in.
  • Persistent blocker. When a task can't proceed (failing dependency, environmental issue, truly stuck on the right approach), it's marked blocked in the plan with diagnostic notes and the orchestrator moves on to other actionable work. The user resolves the blocker on their schedule; the next next-priority/synthex:next-priority run picks the task back up.

What gates do not cover

A few classes of correctness sit outside the merge gates:

  • Production monitoring. [O] criteria are validated post-deploy, not at merge. The reliability/SRE review can flag SLO concerns pre-merge but doesn't simulate production load.
  • User research. Whether the feature actually solves the problem stated in the PRD is a Discover-phase question, not a Build-phase gate.
  • Broad architectural drift. Each task review is local to its diff. Cross-cutting architectural concerns are caught by the Architect during plan review, not at code-review time.

This is by design — gates that try to validate everything end up validating nothing.

Pre-commit hooks

Project-level pre-commit hooks run alongside the agent gates. They are not optional. If a hook fails, the Tech Lead investigates and fixes the underlying issue rather than bypassing it. --no-verify, --no-gpg-sign, and similar flags are forbidden unless the user explicitly authorizes them for a specific commit — the project's quality bar lives in the hooks, and silently skipping them defeats the entire model.
