AI code review patterns

by Ray Svitla


the worst code review is a confident one that’s wrong. a human reviewer who misses a bug is frustrating. an AI reviewer that stamps “LGTM” on a security vulnerability is dangerous — because people trust automated checks differently than they trust Bob from the backend team.

AI code review is useful. genuinely useful. but the patterns matter enormously. a naive “review this PR” prompt produces reviews that are simultaneously thorough-looking and unreliable. here’s what actually works.


multi-pass review

single-pass review is the default: Claude reads the diff, comments on everything it notices. the problem is attention dilution. Claude notices a variable naming issue and a potential null pointer dereference in the same pass, and treats them with similar severity.

multi-pass review separates concerns:

pass 1: security and correctness. only look for things that could break or be exploited. authentication bypasses, SQL injection, unhandled errors that crash the process, race conditions. ignore style entirely.

pass 2: logic and edge cases. does the code handle empty inputs? what happens at integer boundaries? are there off-by-one errors? what if the network call fails? focused attention on “does this do what it claims to do.”

pass 3: design and maintainability. is this the right abstraction? will this be painful to modify in six months? are there simpler approaches? this pass is where opinions live.

pass 4: style and conventions. naming, formatting, import ordering, comment quality. the stuff linters should catch but don’t always.

the order matters. security issues in pass 1 might block the entire PR — no point reviewing style if the code has a vulnerability. each pass has clear criteria, and Claude knows what to ignore during each pass.
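a minimal sketch of what the pass loop can look like. `send_to_model` is a placeholder for whatever client you use (e.g. the Anthropic SDK) — the pass names and criteria come from above, the stop-early behavior is the "security blocks the PR" rule:

```python
# sketch: running the four passes as separate, focused prompts.
# send_to_model is a stand-in for a real API call -- an assumption.

PASSES = [
    ("security & correctness",
     "only flag issues that could break or be exploited: auth bypasses, "
     "SQL injection, unhandled errors, race conditions. ignore style."),
    ("logic & edge cases",
     "check empty inputs, integer boundaries, off-by-one errors, and "
     "failing network calls. does the code do what it claims to do?"),
    ("design & maintainability",
     "is this the right abstraction? will it be painful to modify in "
     "six months? are there simpler approaches?"),
    ("style & conventions",
     "naming, formatting, import ordering, comment quality."),
]

def build_pass_prompt(diff: str, name: str, criteria: str) -> str:
    """one focused prompt per pass, so attention isn't diluted."""
    return (
        f"review pass: {name}\n"
        f"criteria: {criteria}\n"
        f"report ONLY issues matching these criteria.\n\n"
        f"diff:\n{diff}"
    )

def multi_pass_review(diff: str, send_to_model) -> dict:
    """run passes in order; stop early if pass 1 finds blockers."""
    results = {}
    for name, criteria in PASSES:
        issues = send_to_model(build_pass_prompt(diff, name, criteria))
        results[name] = issues
        if name == "security & correctness" and issues:
            break  # a vulnerability blocks the PR -- skip later passes
    return results
```

the early break is the point: later passes never run against code that's already disqualified, which also saves tokens.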


confidence scoring

Claude’s biggest weakness as a reviewer: it presents everything with the same level of confidence. a definite bug and a style preference look identical in a standard review comment.

force explicit confidence:

for each issue found, rate your confidence:
- HIGH: I'm certain this is a bug or security issue
- MEDIUM: this looks wrong but I might be missing context
- LOW: this is a suggestion or preference, not a defect

this simple addition transforms review usefulness. developers scan HIGH items first, investigate MEDIUM items when they have time, and treat LOW items as optional. without confidence scoring, everything feels equally urgent and nothing gets prioritized.

the calibration isn’t perfect. Claude occasionally marks real bugs as LOW and style preferences as HIGH. but the signal-to-noise ratio improves dramatically compared to flat-confidence reviews.
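triage on the consuming side is trivial once the labels exist. a sketch, assuming the model emits one issue per line prefixed with HIGH:, MEDIUM:, or LOW: — the line format is my assumption, not a spec:

```python
# sketch: sorting review comments by their confidence label.
# assumes one "- LABEL: issue" line per finding (an assumed format).

ORDER = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}

def triage(review_text: str) -> list[tuple[str, str]]:
    """parse labelled issues and return them HIGH-first."""
    issues = []
    for line in review_text.splitlines():
        line = line.strip().lstrip("- ")
        label, _, body = line.partition(":")
        if label in ORDER and body.strip():
            issues.append((label, body.strip()))
    return sorted(issues, key=lambda issue: ORDER[issue[0]])

review = """\
- LOW: consider renaming `tmp` to `buffer`
- HIGH: SQL query built with string concatenation
- MEDIUM: this retry loop may never terminate
"""
print(triage(review)[0])  # the HIGH item surfaces first
```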


adversarial review

the most underused pattern. instead of asking Claude to “review this code,” ask it to break the code.

you are a hostile attacker examining this PR. 
your goal: find inputs, sequences, or conditions 
that cause incorrect behavior, data loss, or security breaches. 
assume the author missed something. find it.

the adversarial framing changes Claude’s behavior measurably. standard review Claude is polite and comprehensive — it finds 20 issues of varying importance. adversarial review Claude is focused and paranoid — it finds 5 issues, and they’re usually the ones that matter.

pair adversarial review with standard review for important PRs. standard review catches breadth. adversarial review catches depth.
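one way to wire the pairing, sketched below. the merge order — surfacing issues both framings agree on first — is my heuristic, not something the article prescribes; `send_to_model` is again a placeholder:

```python
# sketch: pairing standard and adversarial review and merging findings.
# issues found by BOTH framings are surfaced first -- an assumed
# heuristic that agreement between framings is a strong signal.

ADVERSARIAL_FRAME = (
    "you are a hostile attacker examining this PR. your goal: find "
    "inputs, sequences, or conditions that cause incorrect behavior, "
    "data loss, or security breaches. assume the author missed "
    "something. find it.\n\n"
)

def paired_review(diff: str, send_to_model) -> list[str]:
    """run both framings; agreement first, breadth last."""
    standard = set(send_to_model(diff))
    adversarial = set(send_to_model(ADVERSARIAL_FRAME + diff))
    both = sorted(standard & adversarial)
    only_adversarial = sorted(adversarial - standard)
    only_standard = sorted(standard - adversarial)
    return both + only_adversarial + only_standard
```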


diff-aware vs codebase-aware

most AI review tools only see the diff. Claude sees what changed. it doesn’t see the function this change calls, the test file that should have been updated, or the config file that needs a new entry.

the fix: give Claude more context than just the diff.

here's the PR diff. also read:
- the files that import any changed function
- the test files for changed modules
- the CHANGELOG if it exists

this turns a surface-level review into a contextual review. “you changed the auth middleware but the test still uses the old mock” — this kind of cross-file issue is invisible in a diff-only review and obvious with context.

the tradeoff is token cost. a diff-only review is cheap. a codebase-aware review that reads 20 related files is expensive. for most PRs, a focused expansion (tests + direct callers) is the right balance.


the automation bias problem

the most dangerous pattern isn’t a review pattern — it’s the human pattern of trusting automated reviews too much.

studies on automation bias in other fields (aviation, medicine) show a consistent result: when humans receive automated analysis, they check less carefully themselves. the automation doesn’t need to be perfect to be harmful — it just needs to create a false sense of thoroughness.

practical mitigation:

→ never make AI review the only required review
→ rotate human reviewers so nobody defaults to “Claude checked it”
→ track AI review accuracy over time — flag PRs where AI missed something a human caught
→ periodically review PRs without AI assistance to keep human review skills sharp
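the "track AI review accuracy over time" point is the easiest to automate. a minimal sketch of a miss-rate tracker — the record fields are hypothetical, in practice you'd log them from your CI review step:

```python
# sketch: a minimal miss-rate tracker for AI review accuracy.
# the ReviewRecord fields are hypothetical -- log whatever your
# CI review step actually captures.

from dataclasses import dataclass

@dataclass
class ReviewRecord:
    pr: int
    ai_issues: int          # issues the AI flagged
    human_only_issues: int  # issues a human caught that the AI missed

def miss_rate(records: list[ReviewRecord]) -> float:
    """fraction of reviewed PRs where the AI missed something."""
    if not records:
        return 0.0
    missed = sum(1 for r in records if r.human_only_issues > 0)
    return missed / len(records)

history = [
    ReviewRecord(101, ai_issues=4, human_only_issues=0),
    ReviewRecord(102, ai_issues=2, human_only_issues=1),
    ReviewRecord(103, ai_issues=0, human_only_issues=0),
    ReviewRecord(104, ai_issues=3, human_only_issues=2),
]
print(miss_rate(history))  # 0.5 -- a signal to re-check AI-only approvals
```

a rising miss rate is exactly the kind of evidence that keeps "Claude checked it" from becoming the default.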

the goal is AI review as a force multiplier for human review, not a replacement. the distinction sounds academic until an AI-approved PR takes down production.


what to review with AI and what to review yourself

good for AI: repetitive checks (all errors handled, all inputs validated, all new functions tested), cross-file consistency, security pattern compliance, dependency analysis.

good for humans: architectural decisions, business logic correctness, whether this feature should exist at all, naming that communicates intent, code that will need to change in ways the AI can’t predict.

good for both: anything important enough to check twice.

when was the last time a code review caught something that would have caused a real production issue?


→ claude code + github actions — automated review setup
→ spec-driven development — reviewing against specs
→ agent guardrails — safety beyond review



Topics: code-review ai patterns quality development