Quality gates
Quality gates are the non-negotiable checks every change passes before it merges. They exist because parallel agentic work amplifies both speed and risk — and speed is only useful when the output is trustworthy.
Two layers of gates
Synthex enforces quality at two layers, and a change has to clear both:
- Acceptance criteria gates — every plan task carries typed criteria that must each be satisfied before the task is marked done.
- Reviewer gates — every code change runs through code review and security review (and any project-specific specialists) before merge.
The acceptance criteria are scoped to what the task was supposed to accomplish. The reviewer gates are scoped to what the diff did to the codebase. Both are required.
Typed acceptance criteria
Each plan task has one or more acceptance criteria, each tagged with a type:
| Tag | Meaning | Validation |
|---|---|---|
| [T] | Testable — can be proven by an automated test | The task must include a passing test that proves the criterion |
| [H] | Human-validated — requires explicit user approval | next-priority/synthex:next-priority calls AskUserQuestion before merge |
| [O] | Observational — measurable only after deployment | Recorded as a post-deployment metric; no merge-time validation |
The contract is strict: if a [T] criterion exists, the test that proves it must exist and pass
before the task closes. The plan record explicitly links each [T] criterion to the test file
and test name that satisfied it. If a [T] criterion has no linked test, the task gets sent
back, no exceptions.
[H] criteria gate the merge until the user clicks Approve in AskUserQuestion. The
orchestrator schedules [H] tasks early in parallel batches so the review can overlap with
autonomous [T]-only work.
[O] criteria don't block merge. They become part of the milestone or phase observational
outcomes, tracked in the plan and reviewed during retrospectives.
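The merge-time behavior of the three criterion types can be sketched as a small gate check. This is a minimal illustration, not Synthex's actual implementation: the [T]/[H]/[O] tags and their semantics come from the text above, but the Criterion class, its field names, and the blocks_merge function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Criterion:
    tag: str                           # "T", "H", or "O"
    text: str
    linked_test: Optional[str] = None  # test file::name proving a [T] criterion
    test_passed: bool = False
    user_approved: bool = False        # AskUserQuestion result for an [H] criterion

def blocks_merge(c: Criterion) -> bool:
    """Return True if this criterion is unsatisfied at merge time."""
    if c.tag == "T":
        # [T]: must have a linked test, and that test must pass
        return c.linked_test is None or not c.test_passed
    if c.tag == "H":
        # [H]: gated on explicit user approval
        return not c.user_approved
    # [O]: validated post-deployment; never blocks merge
    return False

criteria = [
    Criterion("T", "API returns 404 for unknown id",
              linked_test="tests/test_api.py::test_unknown_id", test_passed=True),
    Criterion("H", "Empty-state copy approved", user_approved=False),
    Criterion("O", "p95 latency under 200 ms"),
]
blocked = [c.text for c in criteria if blocks_merge(c)]
```

Under this sketch, the satisfied [T] criterion and the [O] criterion clear the gate, while the unapproved [H] criterion holds the merge.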
Reviewer gates
Two reviewers always run on every diff:
- Code reviewer — correctness, maintainability, convention adherence, specification compliance, reuse opportunities. Modeled on Google's code review standards: favor approving code that improves the system over chasing perfection.
- Security reviewer — vulnerability classes, input validation, secret handling, authorization paths, dependency risk.
Both run in parallel; both emit structured findings; both produce an explicit PASS / WARN / FAIL verdict. They are advisory in form — they recommend, they don't merge — but their verdict feeds the gate decision.
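One way the two parallel verdicts might fold into a single gate decision is to take the worst of them. The PASS / WARN / FAIL verdicts are from the text; the reviewer functions and the worst-verdict aggregation rule below are illustrative assumptions, with trivial string checks standing in for real review logic.

```python
from concurrent.futures import ThreadPoolExecutor

# Verdicts ordered so the gate can take the worst one.
ORDER = {"PASS": 0, "WARN": 1, "FAIL": 2}

def code_review(diff: str) -> str:
    # Hypothetical stand-in: flag an obvious debugging leftover.
    return "FAIL" if "TODO" in diff else "PASS"

def security_review(diff: str) -> str:
    # Hypothetical stand-in: flag a hard-coded credential.
    return "FAIL" if "password=" in diff else "PASS"

def gate_verdict(diff: str) -> str:
    """Run both reviewers in parallel and return the worst verdict."""
    with ThreadPoolExecutor() as pool:
        verdicts = list(pool.map(lambda review: review(diff),
                                 [code_review, security_review]))
    return max(verdicts, key=lambda v: ORDER[v])
```

Either reviewer alone can drag the combined verdict down to FAIL, which matches the requirement that a change clear both gates.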
Optional specialists attach when configured for the project:
- Performance engineer — flagged for performance-critical surfaces (request hot paths, rendering loops, large-list interactions).
- Design system reviewer — checks token compliance and component-API drift against docs/specs/design-system.md.
- Reliability / SRE reviewer — checks SLO impact, error budgets, runbook coverage.
- Accessibility / domain reviewers — added per project (compliance, regulated industries).
Configure them in .synthex/config.yaml under the relevant reviewers: block. See
Configuration for the per-section keys.
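A reviewers: block might look something like the following. The exact keys and structure here are illustrative assumptions, not the real schema — see the Configuration reference for the authoritative per-section keys.

```yaml
reviewers:
  always:
    - code
    - security
  optional:
    - performance        # attach for request hot paths, rendering loops
    - design-system      # checks against docs/specs/design-system.md
    - reliability        # SLO impact, error budgets, runbooks
```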
Severity policy
Reviewers tag each finding with one of four severity levels:
| Severity | Meaning |
|---|---|
| critical | Must fix before merge. Security vulnerability, data loss risk, broken correctness |
| high | Must fix before merge. Significant defect, major maintainability problem |
| medium | Should address. Notable concern but doesn't block merge by default |
| low | Nice to address. Suggestions, style, minor refactor opportunities |
The default policy is min_severity_to_address: high — meaning the review loop keeps iterating
until all critical and high findings are resolved. medium and low findings are surfaced
and recorded but do not block the merge. Adjust per command in .synthex/config.yaml:
```yaml
review_loops:
  max_cycles: 2
  min_severity_to_address: high
```

Tightening to medium raises the bar; loosening to critical allows more high findings through. The trade-off is throughput versus thoroughness, and the right answer depends on the maturity of the codebase and the team's appetite for technical debt.
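The blocking set under this policy is simply the findings at or above the threshold. A minimal sketch, assuming the four severity names from the table above; the function and finding shape are illustrative:

```python
RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def blocking(findings, min_severity_to_address="high"):
    """Findings that must be resolved before the review loop can exit."""
    bar = RANK[min_severity_to_address]
    return [f for f in findings if RANK[f["severity"]] >= bar]

findings = [
    {"id": 1, "severity": "critical"},
    {"id": 2, "severity": "medium"},
    {"id": 3, "severity": "high"},
]
```

With the default threshold, findings 1 and 3 block; dropping the threshold to critical leaves only finding 1 blocking.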
The review loop
Reviews aren't one-shot. The Tech Lead runs a bounded loop:
- Reviewers fan out in parallel against the diff.
- The Findings Consolidator deduplicates and prioritizes the raw findings (it never adds new findings or changes severity — it just consolidates).
- The Tech Lead addresses everything at or above the severity threshold.
- Reviewers re-run on the updated diff.
- Repeat until the threshold is met or max_cycles is exhausted.
The cycle cap prevents runaway loops. If the loop exits with unresolved findings, those findings are recorded in the task's completion summary and the user decides whether to merge anyway, send back for further iteration, or split the remaining work into a follow-up task.
The default is two cycles for most commands; write-implementation-plan/synthex:write-implementation-plan gets three because plan defects ripple downstream.
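The bounded loop can be sketched in a few lines. Here run_reviewers, consolidate, and address are hypothetical stand-ins for the agents described above, injected as plain functions; only the loop shape and the cycle cap reflect the text.

```python
RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def review_loop(diff, run_reviewers, consolidate, address,
                max_cycles=2, min_severity="high"):
    """Iterate review -> fix until no blocking findings remain or the cap is hit.

    Returns (diff, unresolved). Unresolved findings are recorded and handed
    to the user, who decides whether to merge, iterate again, or split.
    """
    def blocking(findings):
        return [f for f in findings if RANK[f["severity"]] >= RANK[min_severity]]

    for _ in range(max_cycles):
        # Reviewers fan out; the consolidator dedupes without adding findings.
        found = blocking(consolidate(run_reviewers(diff)))
        if not found:
            return diff, []
        diff = address(diff, found)  # Tech Lead fixes, then reviewers re-run
    # Cycle cap exhausted: report whatever is still outstanding.
    return diff, blocking(consolidate(run_reviewers(diff)))
```

With max_cycles=2, a diff whose issues can all be fixed in two passes exits clean; a third outstanding issue survives the loop and is surfaced for the user's decision.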
Escalation
Two escalation paths exist when the routine flow can't resolve an issue:
- [H] review on a [T]-only task. If reviewers and the Tech Lead disagree about whether a finding is real, the orchestrator can ask the user via AskUserQuestion for adjudication. The task waits in awaiting [H] until the user weighs in.
- Persistent blocker. When a task can't proceed (failing dependency, environmental issue, truly stuck on the right approach), it's marked blocked in the plan with diagnostic notes and the orchestrator moves on to other actionable work. The user resolves the blocker on their schedule; the next next-priority/synthex:next-priority run picks the task back up.
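The scheduling effect of these states can be sketched as a simple filter: the orchestrator only picks up tasks that are actionable now. The state names awaiting [H] and blocked come from the text; the task shape and the selection function are hypothetical.

```python
tasks = [
    {"id": "t1", "state": "ready"},
    {"id": "t2", "state": "blocked", "notes": "failing dependency"},
    {"id": "t3", "state": "awaiting [H]"},
    {"id": "t4", "state": "ready"},
]

def next_actionable(tasks):
    """Skip blocked and [H]-gated tasks; the orchestrator works around them."""
    return [t["id"] for t in tasks if t["state"] == "ready"]
```

Blocked and [H]-gated tasks simply fall out of the batch and are revisited on a later run, once the user has unblocked or adjudicated them.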
What gates do not cover
A few classes of correctness sit outside the merge gates:
- Production monitoring. [O] criteria are validated post-deploy, not at merge. The reliability/SRE review can flag SLO concerns pre-merge but doesn't simulate production load.
- User research. Whether the feature actually solves the problem stated in the PRD is a Discover-phase question, not a Build-phase gate.
- Broad architectural drift. Each task review is local to its diff. Cross-cutting architectural concerns are caught by the Architect during plan review, not at code-review time.
This is by design — gates that try to validate everything end up validating nothing.
Pre-commit hooks
Project-level pre-commit hooks run alongside the agent gates. They are not optional. If a hook
fails, the Tech Lead investigates and fixes the underlying issue rather than bypassing it.
--no-verify, --no-gpg-sign, and similar flags are forbidden unless the user explicitly
authorizes them for a specific commit — the project's quality bar lives in the hooks, and
silently skipping them defeats the entire model.
Next
- The lifecycle — where quality gates fit in Build and Ship
- Reviewer pipeline — how the reviewer chain runs in detail
- Configuration — tune review loops, severity, and the reviewer roster