Continuous Review, Testing & CI-Doctor | GitHub Agentic Workflows: An Interactive Book

Objective

By the end of this chapter you can close the repository's quality loop with four more patterns — Review, Testing, CI-Doctor, and Refactoring — while keeping humans firmly on the merge decision. This is the second half of the Continuous-X library and the close of Part II.

Everything targets gh aw v0.81.6. The Repo Assistant graduates from tending the issue tracker to helping tend the code.

Concept: closing the quality loop

Repository quality runs in a loop: code is proposed (a PR), reviewed, tested, merged, and, when something slips, fixed. Traditional CI automates the deterministic checks in that loop: does it compile, do the tests pass, does the linter approve? The judgement steps still wait on a human: is this a good change, is this test worth adding, why did CI actually break?

Continuous-X patterns fill exactly those judgement gaps. Where Chapter 9 kept the inbox honest, these keep the codebase honest — each one a mini-product owning one link in the quality loop.

The one rule that makes it safe: humans keep the merge

The defining constraint of quality automation is that the agent proposes; a human disposes. A review agent comments, it doesn't approve. A test-improver opens a draft PR, it doesn't push to main. This isn't timidity — it's what lets you run these patterns at all. The agent accelerates the work up to the decision point and stops, leaving the irreversible call to a person. That's the human-in-the-loop principle, and it's why the safe-outputs boundary from Chapter 6 matters most here.

In gh-aw: the Review, Testing, CI-Doctor, and Refactoring recipes

Four patterns, each triggered by a different moment in the quality loop — and each writing through a safe output that stops short of merging.

Pattern	Trigger	Writes via
Review	`pull_request`	`submit-pull-request-review` (COMMENT only)
Testing	`schedule`	`create-pull-request` (draft)
CI-Doctor	`workflow_run` (CI failed)	`add-comment` / `create-issue`
Refactoring	`schedule` or command	`create-pull-request` (draft)

Review: comment, never approve

The Review pattern reads a PR diff and leaves inline feedback. The critical setting is allowed-events: [COMMENT], which “prevents the agent from submitting APPROVE reviews regardless of what the agent attempts to output”. The docs explicitly recommend it as “the default for automated review workflows… without creating a persistent merge-blocking state” (Safe Outputs). The human-keeps-the-merge rule is therefore enforced by infrastructure, not left to the prompt.
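Conceptually, an allowed-events filter is a check that runs outside the model, between the agent's proposed output and the GitHub API. The Python sketch below models that idea; it is not gh-aw's code. The names `ProposedReview`, `enforce_review_event`, and `DisallowedReviewEvent` are hypothetical, and rejecting a disallowed review outright (rather than dropping or rewriting it) is an assumption of the sketch.

```python
from dataclasses import dataclass

# GitHub's review events: approve, request changes, or just comment.
REVIEW_EVENTS = {"APPROVE", "REQUEST_CHANGES", "COMMENT"}


@dataclass(frozen=True)
class ProposedReview:
    """What the agent asks to submit; nothing has been posted to GitHub yet."""
    event: str
    body: str


class DisallowedReviewEvent(Exception):
    """Raised when the agent proposes an event outside allowed-events."""


def enforce_review_event(review: ProposedReview, allowed_events: set[str]) -> ProposedReview:
    """Hypothetical model of an allowed-events check such as [COMMENT].

    It runs after the agent finishes and before anything is posted, so no
    model output can bypass it. Rejecting (not rewriting) is an assumption.
    """
    if review.event not in REVIEW_EVENTS:
        raise ValueError(f"unknown review event: {review.event!r}")
    if review.event not in allowed_events:
        raise DisallowedReviewEvent(
            f"{review.event} is not in allowed-events {sorted(allowed_events)}"
        )
    return review


allowed = {"COMMENT"}
print(enforce_review_event(ProposedReview("COMMENT", "Consider a null check here."), allowed).event)
try:
    enforce_review_event(ProposedReview("APPROVE", "LGTM!"), allowed)
except DisallowedReviewEvent as err:
    print("refused:", err)
```

Because the check sees only the structured output, no wording in the prompt or in the PR itself can widen what gets posted; that is the sense in which the guarantee is infrastructural.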

CI-Doctor: react to the failure

CI-Doctor is an elegant use of the workflow_run trigger from Chapter 4, combined with conclusion filtering: fire only when the named CI workflow finishes with a failure conclusion on main, read the logs, and post a diagnosis. Because workflow_run is hardened against cross-repo abuse, this stays safe even on public repos.

The CI-Doctor trigger — wake only on a real CI failure

on:
  workflow_run:
    workflows: ["CI"]
    types: [completed]
    conclusion: [failure]     # only when CI actually broke
    branches: [main]
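The trigger is declarative, but it encodes a simple decision. As a hedged illustration, this Python sketch applies the same four conditions to a GitHub `workflow_run` webhook payload, using GitHub's `action`, `workflow_run.name`, `workflow_run.conclusion`, and `workflow_run.head_branch` fields. The function `should_wake_ci_doctor` is hypothetical: in practice the filtering happens before the agent starts, so you never write this yourself.

```python
from typing import Any

# Mirrors the frontmatter above: workflows: ["CI"], types: [completed],
# conclusion: [failure], branches: [main]
WATCHED_WORKFLOWS = {"CI"}
WATCHED_TYPES = {"completed"}
WATCHED_CONCLUSIONS = {"failure"}
WATCHED_BRANCHES = {"main"}


def should_wake_ci_doctor(payload: dict[str, Any]) -> bool:
    """Hypothetical restatement of the trigger filter as code."""
    run = payload.get("workflow_run", {})
    return (
        payload.get("action") in WATCHED_TYPES
        and run.get("name") in WATCHED_WORKFLOWS
        and run.get("conclusion") in WATCHED_CONCLUSIONS
        and run.get("head_branch") in WATCHED_BRANCHES
    )


failed_on_main = {"action": "completed",
                  "workflow_run": {"name": "CI", "conclusion": "failure", "head_branch": "main"}}
passed_on_main = {"action": "completed",
                  "workflow_run": {"name": "CI", "conclusion": "success", "head_branch": "main"}}
still_running = {"action": "in_progress",
                 "workflow_run": {"name": "CI", "conclusion": None, "head_branch": "main"}}

for label, event in [("failed", failed_on_main), ("passed", passed_on_main), ("running", still_running)]:
    print(label, should_wake_ci_doctor(event))
```

A `cancelled` or `timed_out` run stays quiet too, since neither conclusion is `failure`.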

Testing & Refactoring: propose a diff

Both typically run on a schedule (Refactoring can also be invoked by command), do one focused piece of work, and open a draft PR through create-pull-request. Testing adds coverage without touching production code; Refactoring makes a small, behavior-preserving cleanup. Draft PRs keep the human on the merge, exactly as in the Docs pattern.
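“Without touching production code” is a property you can check mechanically rather than trust. A minimal sketch, assuming tests live under `test/`, `tests/`, or `__tests__/` directories or follow `*.test.ts`, `*.spec.js`, or `test_*.py` naming; adjust these conventions to your repository. The helpers `is_test_file` and `production_files_touched` are hypothetical, not part of gh-aw.

```python
from pathlib import PurePosixPath

# Assumed conventions for where tests live; adjust to your repository's layout.
TEST_DIRS = {"test", "tests", "__tests__"}
TEST_SUFFIXES = (".test.js", ".test.ts", ".spec.js", ".spec.ts")


def is_test_file(path: str) -> bool:
    """True if the path sits in a test directory or uses a test file name."""
    p = PurePosixPath(path)
    if TEST_DIRS.intersection(p.parts[:-1]):
        return True
    name = p.name
    return name.endswith(TEST_SUFFIXES) or (name.startswith("test_") and name.endswith(".py"))


def production_files_touched(changed_paths: list[str]) -> list[str]:
    """Return every changed path that is NOT a test file (hypothetical guard)."""
    return [path for path in changed_paths if not is_test_file(path)]


diff = ["src/__tests__/parser.test.ts", "tests/test_cli.py", "src/parser.ts"]
print(production_files_touched(diff))  # ['src/parser.ts']
```

Run over the changed-file list of a Testing PR, a non-empty result tells the reviewer the agent strayed outside the test tree before they read a single line of the diff.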

When to automate quality (and where humans stay in the loop)

Quality patterns pay off when they act as a tireless first pass — catching the obvious before a human spends attention, never replacing the human's final say. The line to hold: automate the noticing and the drafting; reserve the deciding.

Agent may…	Human keeps…
comment on a PR, flag risks	approve / request changes / merge
open a draft test or refactor PR	review and merge that PR
diagnose a CI failure, file an issue	decide the fix and ship it

When not to

Don't let a review agent block merges. An automatic REQUEST_CHANGES review hands a fallible model the power to create a persistent merge-blocking state. Keep allowed-events: [COMMENT] unless a human explicitly wants gating.
Don't let the test-improver edit production code. Instruct it to add tests only; a PR that “fixes” code to make a test pass is the opposite of what you want.
Don't auto-merge agent PRs. The draft PR is the human's decision point; automating the merge throws away the one safeguard this whole approach rests on.
Don't run a refactoring agent on a repo without good tests. “Behavior-preserving” is only verifiable if the tests can prove it. Ship Testing before Refactoring.
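The last rule can be expressed as a gate in front of the Refactoring agent. In this sketch, `refactor_gate` and its `run_suite(ref)` callback are hypothetical: `run_suite` stands for whatever checks out a git ref and runs your tests, returning True on a pass. The point is the ordering: without a test suite and a passing baseline, a green run after the refactor proves nothing.

```python
from typing import Callable

# Hypothetical: run_suite(ref) checks out `ref`, runs the test suite,
# and returns True if every test passed.
SuiteRunner = Callable[[str], bool]


def refactor_gate(run_suite: SuiteRunner, base_ref: str, head_ref: str, has_tests: bool) -> str:
    """Decide what to do with a proposed refactor (hypothetical policy).

    Even the best outcome is only a draft PR; a human still merges.
    """
    if not has_tests:
        return "skip: no test suite, so behavior preservation cannot be checked"
    if not run_suite(base_ref):
        return "skip: tests already fail on the base ref, so they cannot vouch for the refactor"
    if not run_suite(head_ref):
        return "reject: the suite passed on the base ref but fails after the refactor"
    return "propose: open a draft PR for human review"


# Fake runner for illustration: the suite passes on main but fails on the refactor branch.
outcomes = {"main": True, "refactor/extract-helper": False}
print(refactor_gate(outcomes.__getitem__, "main", "refactor/extract-helper", has_tests=True))
```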

Worked example: a PR-review plus daily-test-improver pair

Two complementary quality agents: one reacts to every PR, the other proactively strengthens the tests. Both compile cleanly, and both stop short of the merge.

examples/ch10/continuous-review.md — comment-only PR review (compiles: 0/0)

on:
  pull_request: { types: [opened, synchronize] }
permissions: { contents: read, pull-requests: read }
engine: copilot
network: { allowed: [defaults, github] }
tools:
  github: { toolsets: [pull_requests] }
safe-outputs:
  create-pull-request-review-comment: { max: 10 }
  submit-pull-request-review:
    allowed-events: [COMMENT]     # can never approve or block
    max: 1

examples/ch10/daily-test-improver.md — proposes tests as a draft PR (compiles: 0/0)

on:
  schedule: daily
  workflow_dispatch:
permissions: { contents: read }
engine: copilot
network: { allowed: [defaults, github, node] }
tools:
  github: { toolsets: [repos] }
  bash: ["npm ci", "npm test", "npx jest", "npx vitest run"]   # scoped shell
  edit:
safe-outputs:
  create-pull-request: { title-prefix: "[tests] ", labels: [tests, automated], draft: true }

The review agent holds read-only PR access and can only emit a COMMENT review — the allowed-events setting makes “never block a merge” an infrastructural guarantee, not a hope. The test-improver gets a scoped shell to run the suite and edit to write tests, but its sole output is a draft PR a human reviews. Both accelerate the work right up to the human's decision, then hand it over.
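A scoped shell means the agent can run only the commands you listed. The exact matching rules are gh-aw's and are not spelled out here; the sketch below assumes the strictest reading, an exact match after shell-style tokenization, to show why chained or unlisted commands fail. `ALLOWED_BASH` copies the list from daily-test-improver.md; `is_allowed` is hypothetical.

```python
import shlex
from typing import Sequence

# The bash allowlist from daily-test-improver.md.
ALLOWED_BASH = ("npm ci", "npm test", "npx jest", "npx vitest run")


def is_allowed(command: str, allowlist: Sequence[str] = ALLOWED_BASH) -> bool:
    """Exact-match model of a scoped shell (an assumption: gh-aw's real
    matching rules, such as any wildcard support, may differ)."""
    try:
        requested = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar: refuse
        return False
    return any(requested == shlex.split(entry) for entry in allowlist)


for cmd in ["npm test", "npx  vitest   run", "npm test && curl https://example.com", "rm -rf src"]:
    print(f"{cmd!r}: {is_allowed(cmd)}")
```

Under this model even a harmless extra flag such as `npx jest --coverage` is refused, so check gh-aw's tool documentation before assuming your allowlist permits variations.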

Verifying both examples

gh aw compile examples/ch10/continuous-review.md
# ✓ examples\ch10\continuous-review.md (101.8 KB) — 0 error(s), 0 warning(s)
gh aw compile examples/ch10/daily-test-improver.md
# ✓ examples\ch10\daily-test-improver.md (103.8 KB) — 0 error(s), 0 warning(s)
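To keep these examples from silently rotting, you might run the same check in CI. A minimal Python sketch, assuming `gh aw compile` exits with a non-zero status when compilation fails (a common CLI convention, not something the log above shows). It checks exit codes only, so warnings would need a separate look at the output. `compile_all` and `main` are hypothetical helpers, and the runner is injectable so the loop can be tested without `gh` installed.

```python
import subprocess
import sys
from typing import Callable, Sequence

EXAMPLES = [
    "examples/ch10/continuous-review.md",
    "examples/ch10/daily-test-improver.md",
]

# A runner takes a command and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code (127 if the program is missing)."""
    try:
        return subprocess.run(list(cmd), check=False).returncode
    except FileNotFoundError:
        return 127


def compile_all(paths: Sequence[str], runner: Runner = run) -> list[str]:
    """Compile each workflow with `gh aw compile`; return the paths that failed."""
    failed = []
    for path in paths:
        if runner(["gh", "aw", "compile", path]) != 0:
            failed.append(path)
    return failed


def main() -> int:
    failures = compile_all(EXAMPLES)
    for path in failures:
        print(f"compile failed: {path}", file=sys.stderr)
    return 1 if failures else 0
```

Calling `sys.exit(main())` from a CI step turns any failed compile into a failed job.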

Recap & what's next

You've closed the quality loop — and Part II:

Quality automation fills the judgement gaps CI can't: is this change good, is this test worth adding, why did CI break.
Four patterns — Review (PR, comment-only), Testing (scheduled draft PR), CI-Doctor (workflow_run on failure), Refactoring (scheduled draft PR).
The unbreakable rule is that humans keep the merge, enforced by allowed-events: [COMMENT] and draft: true rather than by convention alone.
Automate the noticing and drafting; reserve the deciding. Ship Testing before Refactoring, and never auto-merge an agent's PR.

What's next. You now have a shelf of patterns — and you're about to notice how much they repeat. Part III scales from one repo to an org. Chapter 11: Reuse & Memory factors the shared parts into imported components and gives the Repo Assistant memory that persists across runs.