What Is Agentic CI? AI Agents That Run on Every Pull Request
Agentic CI is the next stage of continuous integration: AI agents that read your codebase, investigate context, and reason about pull requests instead of running fixed scripts. Here is how it works, why static linters cannot keep up, and how Macroscope's Check Run Agents implement it.
Agentic CI is the next stage of continuous integration: AI agents that run on every pull request, read the codebase, investigate context across systems, and reason about whether the change is safe to merge — instead of executing fixed scripts. Traditional CI runs the same lint command, the same test suite, and the same regex grep on every PR. Agentic CI runs agents: AI processes that decide what to look at, what to compare against, and what to flag, based on the actual diff in front of them.
The shift matters because the bugs that ship to production today are not the bugs static linters catch. They are cross-file mistakes, missing null checks on a path that only fires in a specific environment, a refactor that broke an assumption two services away. Static rules cannot encode "this PR violates a convention we adopted three months ago in Slack." An AI agent can. This guide explains what agentic CI is, why it is replacing rule-based AI code review, how Macroscope's Check Run Agents implement the pattern, and how it compares to CodeRabbit, Greptile, and Cursor Bugbot.
TL;DR — Agentic CI in five lines
- Agentic CI = AI agents that run as pull request checks and decide what to investigate, instead of executing pre-written scripts.
- Why it is replacing static AI code review: rule-based AI review (regex + LLM) cannot catch cross-file bugs or codebase-specific conventions.
- How agents work: they read the diff, walk the AST graph of your repo, query connected systems (Slack, Sentry, PostHog, git history), and post a check run with their findings.
- Macroscope's implementation: Check Run Agents — defined in plain English in a markdown file, run on every PR, post directly to GitHub's Checks tab.
- Compared to CodeRabbit and Greptile: Macroscope's agents are codebase-aware (AST graph), connect to runtime telemetry, and price per agent run — not per seat.
What is agentic CI in plain English
Agentic CI is continuous integration powered by AI agents instead of fixed scripts. A traditional CI pipeline has a .github/workflows/lint.yml file that runs ESLint with the same config on every commit. ESLint cannot decide to investigate further if it sees something suspicious. It runs the rules, returns the results, and exits. An AI code review agent in agentic CI behaves differently: it reads the diff, decides which files matter, walks the codebase to find related call sites, looks up the git history of the affected functions, and reasons about whether the change is correct.
The pattern is the same one that has reshaped the LLM space: instead of one-shot prompts, agents loop. They read, they think, they take an action, they observe the result, and they think again. Agentic CI brings that loop into the pull request check.
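The loop above can be sketched in a few lines. Everything here is illustrative: the function names, the toy decision policy, and the stubbed tool are assumptions for the sketch, not Macroscope's actual internals, and a real agent would make the `decide_next_action` call with an LLM rather than a string check.

```python
# Toy policy: read the callers once if the diff touches a function
# signature, then stop. A real agent makes this decision with an LLM.
def decide_next_action(context):
    if "signature change" in context["diff"] and "callers" not in context:
        return {"tool": "callers", "args": "parse_user"}
    return None

def extract_findings(context):
    # Turn observed call sites into candidate review findings.
    return [f"update call site: {c}" for c in context.get("callers", [])]

def review(diff, tools, max_steps=10):
    """Read-think-act loop: observe context, pick an action, act, repeat."""
    context = {"diff": diff}          # everything the agent has read so far
    findings = []
    for _ in range(max_steps):
        action = decide_next_action(context)       # "think"
        if action is None:                         # nothing left to investigate
            break
        observation = tools[action["tool"]](action["args"])   # "act"
        context[action["tool"]] = observation      # "observe"
        findings = extract_findings(context)
    return findings

# Usage with a stubbed code-search tool standing in for the AST graph:
tools = {"callers": lambda fn: ["api/handlers.py:42", "jobs/sync.py:7"]}
print(review("signature change in parse_user", tools))
```

The point of the sketch is the shape, not the stubs: the number of steps and the tools consulted depend on the diff, which is exactly what a fixed script cannot do.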
Why static AI code review is not enough
The first generation of AI code review tools — circa 2023 to 2024 — wrapped an LLM around a diff. The diff went in, comments came out. This was a real improvement over static linters, but it has structural limits.
A diff-only AI code reviewer cannot:
- See callers of a function that the diff modifies.
- Know that this codebase has a convention adopted in a Slack thread last month.
- Check whether a similar bug was fixed in a previous PR.
- Investigate whether a runtime error in Sentry is related to this change.
- Reason about whether the test coverage is meaningful or just present.
These are the failure modes that show up as production incidents. They are also the cases where an agent, which can read more than the diff, take multiple steps, and decide on its own which paths to investigate, outperforms a one-shot LLM call. This is why the AI code review category is moving toward agentic CI: top tools today cluster in the mid-40s on bug detection benchmarks, and the path to catching the bugs none of them catch yet runs through giving the reviewer the agency to investigate beyond the diff.
What an agentic CI check actually does on a pull request
When an AI code review agent runs on a pull request in an agentic CI setup, the loop typically looks like this:
- Receive the PR webhook from GitHub when a commit is pushed.
- Read the diff and the surrounding files.
- Walk the codebase to identify other places affected by the change. For example, if a function signature changes, the agent finds every caller. This requires an AST graph of the repo, not just a regex.
- Query connected systems when relevant. The agent might check Slack for prior discussion of the change, Sentry for related runtime errors, PostHog for impacted user flows, or git history for the original intent of the modified code.
- Reason about each candidate finding. The agent decides whether something is a real bug, a style preference, or a false positive — and discards the false positives before posting.
- Post a check run to GitHub's Checks tab with a pass, fail, or neutral status, plus inline comments where appropriate.
Notice what is not in this loop: a fixed list of rules. The agent decides what to look at based on the change. This is the core property of agentic CI — adaptivity per pull request.
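The "walk the codebase" step can be approximated, for a single Python module, with the standard library's `ast` module. A production system builds a cross-file graph of the whole repo; this single-file sketch only shows why an AST beats a regex for finding the call sites of a changed function.

```python
import ast

def find_callers(source, func_name):
    """Return the line number of every call to func_name in this module.
    Single-file sketch; a production AST graph spans the whole repo."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            callee = node.func
            # plain name: foo(...)  or attribute access: obj.foo(...)
            if isinstance(callee, ast.Name) and callee.id == func_name:
                lines.append(node.lineno)
            elif isinstance(callee, ast.Attribute) and callee.attr == func_name:
                lines.append(node.lineno)
    return lines

src = """
def handler(req):
    user = parse_user(req.body)
    return store.save(user)

def batch(rows):
    return [parse_user(r) for r in rows]
"""
print(find_callers(src, "parse_user"))  # → [3, 7]
```

A regex for `parse_user(` would also match comments, strings, and the definition itself; the AST only matches actual call sites, including the one buried in the list comprehension.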
Agentic CI vs traditional CI
Traditional CI handles deterministic checks: did the build pass, did the tests pass, did the linter pass. These checks are fast, cheap, and reliable, and agentic CI does not replace them. What agentic CI replaces is the judgment layer — the human review pass that asks "is this PR safe and correct?"
| Dimension | Traditional CI | Agentic CI |
|---|---|---|
| Logic | Fixed scripts and config files | AI agents that decide what to investigate |
| Inputs | Diff plus config | Diff plus AST graph plus connected systems |
| Output | Pass/fail on hard rules | Pass/fail/neutral plus contextual comments |
| Adaptivity | Same on every PR | Different on every PR |
| Cross-file reasoning | None | Yes (AST graph traversal) |
| Reasoning about intent | None | Yes (reads commit messages, related discussions) |
| Cost model | Free or fixed minutes | Per agent run |
A team running both keeps the deterministic CI for build, test, and lint, and adds agentic CI for code review judgment. They are complementary.
Agentic CI vs one-shot AI code review
The earlier generation of AI code review tools — including most CodeRabbit and Greptile workflows — runs as a single LLM call on the diff. The model sees the patch, returns comments, and exits. That is AI in CI, but it is not agentic CI.
The difference is whether the reviewer can take more than one step. A one-shot AI code reviewer cannot decide to read another file. An agent can. This shows up most clearly in cross-file bugs, where the bug is in file A but the symptom is in file B. Catching those reliably requires walking the codebase rather than just reading the diff, which is what an agent does and a one-shot reviewer does not. Macroscope's code review benchmark, which measures detection across 118 bugs in 45 repos, shows how widely detection rates vary across these architectures.
How Check Run Agents implement agentic CI
Macroscope's Check Run Agents are the concrete implementation of agentic CI in our product. Each agent is defined in a markdown file checked into the repository under .macroscope/check-run-agents/ — describing in plain English what the agent should check. The agent then runs on every pull request, with full access to the codebase, the AST graph, git history, and any connected integrations.
A Check Run Agent definition (saved at .macroscope/check-run-agents/postgres-time-provider.md) looks roughly like this:
```
When you see a new call to time.Now() inside a service method, flag it.
We use s.postgres.Now() (the pool's TimeProvider) so tests can control time.
Exception: cmd/ utilities and migration scripts are fine to use time.Now().
```
That is the entire definition. There is no regex, no AST query language, no DSL to learn. The agent reads the markdown, applies it to each PR, and posts a check run. If the agent sees a time.Now() in a service method, it leaves an inline comment explaining the convention. If the PR is in cmd/, it does not.
This is what makes Check Run Agents agentic CI rather than enhanced linting: the rule is interpreted in context, not matched by pattern. The agent decides whether time.Now() is appropriate based on the file, the surrounding code, and the convention. A regex-based linter cannot make that judgment.
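To make the contrast concrete, here is roughly the part of that rule a conventional tool could encode: the substring match and the path carve-outs. The prefixes and helper below are invented for illustration; the contextual judgment the agent layers on top (is this actually a service method, is the usage deliberate) is precisely what this sketch cannot express.

```python
# Only the mechanical carve-outs of the time.Now() convention, i.e. the
# part a linter could do. Paths and names here are illustrative.
EXEMPT_PREFIXES = ("cmd/",)         # utilities may call time.Now()
EXEMPT_MARKERS = ("migrations/",)   # so may migration scripts

def should_flag(path, line):
    if "time.Now()" not in line:
        return False
    if path.startswith(EXEMPT_PREFIXES):
        return False
    if any(marker in path for marker in EXEMPT_MARKERS):
        return False
    return True  # service code should use s.postgres.Now() instead

print(should_flag("internal/billing/service.go", "ts := time.Now()"))  # True
print(should_flag("cmd/backfill/main.go", "ts := time.Now()"))         # False
```

Everything beyond this, such as recognizing that a `time.Now()` inside a test helper or a logging shim is fine, is judgment the agent makes from surrounding code, not from the path.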
Why "plain English" matters more than it sounds
Defining checks in plain English instead of YAML or a custom rule DSL is not a syntactic improvement. It changes who can write a check. With a regex linter, only an engineer who knows that linter's grammar can encode a convention. With a Check Run Agent, anyone on the team — a senior engineer, a security lead, an SRE — can write a markdown file, commit it, and the convention is enforced from that PR forward.
This matters because most codebase conventions live in Slack threads, design docs, and oral history. They never make it into a linter because translating "do not log PII in error messages" into a regex is impractical. With agentic CI, the engineer writes the convention in the same words they would say in code review, and the agent enforces it.
Connected systems: the agent's superpower
A pure code-only AI code reviewer can only reason about what is in the repo. An agentic CI system can reason about what is in your operational universe. Macroscope's Check Run Agents can be given access to:
- Slack — for prior discussions of the change, design decisions, oncall conversations.
- Sentry — to check for runtime errors related to the modified code.
- PostHog — to see whether a UI change affects a tracked user flow.
- Git history — to check whether similar changes were reverted before.
- Internal docs — to enforce conventions captured in markdown.
This is what unlocks the most valuable category of agent: the ones that catch bugs informed by runtime behavior, not just code structure. An agent that can see a recent Sentry spike on a function being modified can flag the PR before it makes the spike worse. A static linter has no concept of Sentry.
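A minimal sketch of that runtime-informed pattern, assuming an invented error-count shape rather than Sentry's real API: cross-reference the PR's modified functions against recent error volume and flag the overlap.

```python
# Hypothetical sketch: the data shape and threshold are invented for
# illustration. Macroscope's Sentry integration queries real events.
def risky_changes(modified_functions, recent_errors, threshold=10):
    """Flag modified functions whose recent error counts look like a spike."""
    flagged = []
    for fn in modified_functions:
        count = recent_errors.get(fn, 0)
        if count >= threshold:
            flagged.append((fn, count))
    return flagged

errors_last_24h = {"parse_user": 37, "render_badge": 2}  # e.g. from telemetry
print(risky_changes(["parse_user", "save_user"], errors_last_24h))
# → [('parse_user', 37)]
```

The interesting part is not the threshold logic; it is that the agent has `errors_last_24h` at all, which no code-only reviewer does.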
Agentic CI vs CodeRabbit
CodeRabbit's primary mode is one-shot diff review with optional custom rules. The custom rules are typically expressed as patterns: match this shape, complain about it. This works well for surface-level conventions but is a different architecture from a full agent that can investigate the codebase. CodeRabbit also tends toward higher comment volume by default; in our code review benchmark it was by a wide margin the loudest of the five tools evaluated.
Macroscope's agentic CI uses Check Run Agents that are codebase-aware (they have the AST graph), connected to runtime systems, and tuned for signal over volume. For teams asking "what is the best AI code review tool?" or specifically searching CodeRabbit alternatives, the agentic CI question is the differentiator: do you want a louder linter, or a reviewer that can actually investigate the change?
Agentic CI vs Greptile
Greptile's positioning has historically leaned on learning from previous PRs in the repo — the more you use it, the more it adapts to your conventions. Where that learned state lives and how transparent it is to the team has been a recurring question; conventions inferred by a model are harder to audit than conventions written down.
Agentic CI takes the opposite stance. With Macroscope's Check Run Agents, every convention lives in a markdown file the team can read, edit, and review. A new engineer can ls .macroscope/check-run-agents/ and see exactly which checks run on their PRs. This is the right model for teams that want their AI code review to be auditable and version-controlled. For Greptile alternatives looking for an explicit, agentic approach to code review, this is the architectural difference that matters most.
Agentic CI vs Cursor Bugbot
Cursor Bugbot is a strong one-shot AI code reviewer focused on general bug detection. Its strength is the out-of-the-box bug detector Cursor built; teams adopt it for that, not for team-specific custom checks or runtime-system integrations. Agentic CI, with team-defined agents and connected integrations like Slack, Sentry, and PostHog, is a different shape of product — broader in surface area than one-shot bug detection alone.
How agentic CI changes pull request workflow
When teams adopt agentic CI, three things shift in how they ship code:
- Cycle time drops. AI agents post check runs in minutes instead of waiting for a human reviewer, and with Approvability, low-risk PRs auto-merge with no human attention.
- Review attention concentrates on real risk. Humans stop reviewing trivial PRs and focus on the architectural and business-logic decisions that AI cannot make.
- Conventions become enforceable. Rules that lived in Slack now live in markdown files and run on every PR.
The combined effect is a faster, more focused code review workflow with fewer bugs reaching production.
Setting up agentic CI on GitHub
Adopting agentic CI on GitHub is straightforward when the platform supports it natively. With Macroscope:
- Install the Macroscope GitHub App on the relevant repos.
- Add a .macroscope/check-run-agents/ directory to the repo.
- Drop in markdown files describing each Check Run Agent in plain English (one agent per file).
- Push a PR. The agents run automatically and post check runs.
- Configure Approvability to auto-approve low-risk PRs once agentic checks pass.
There is no separate CI configuration, no YAML grammar, no rule DSL. The conventions you write down are the checks that run.
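For reference, the check run each agent posts corresponds to GitHub's Checks API (POST /repos/{owner}/{repo}/check-runs). Macroscope builds and posts this for you; the sketch below, with an invented finding shape, only illustrates what ends up in the Checks tab.

```python
# Builds a GitHub Checks API payload. The "findings" dict shape is invented
# for this sketch; the API fields (name, head_sha, status, conclusion,
# output, annotations) are GitHub's.
def check_run_payload(agent_name, head_sha, findings):
    conclusion = "failure" if findings else "success"
    return {
        "name": agent_name,
        "head_sha": head_sha,
        "status": "completed",
        "conclusion": conclusion,  # "neutral" is also a valid conclusion
        "output": {
            "title": f"{len(findings)} finding(s)",
            "summary": "\n".join(f["message"] for f in findings) or "No issues.",
            "annotations": [
                {
                    "path": f["path"],
                    "start_line": f["line"],
                    "end_line": f["line"],
                    "annotation_level": "warning",
                    "message": f["message"],
                }
                for f in findings
            ],
        },
    }

payload = check_run_payload(
    "postgres-time-provider", "abc123",
    [{"path": "internal/billing/service.go", "line": 42,
      "message": "use s.postgres.Now() instead of time.Now()"}],
)
print(payload["conclusion"])  # → failure
```

Annotations are what render as inline comments on the diff, which is why the agent's findings land where reviewers already look.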
Pricing: per agent run, not per seat
Agentic CI fundamentally changes the cost shape of code review. Per-seat pricing — the model CodeRabbit and Greptile use — does not match what an agent does. An agent does not have a seat. It runs when a PR opens. Macroscope's pricing is usage-based: you pay per agent run, with spend controls to cap monthly cost. Every workspace also gets 1,000 free Agent credits per month, so most teams can run Check Run Agents on their full PR volume before any agent-credit billing kicks in.
For teams with many contributors and few PRs, this is dramatically cheaper than per-seat. For teams with high PR volume, the cost scales with what is actually being reviewed, not with headcount. Either way, it is the pricing model that matches the agentic shape of the workload.
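A back-of-the-envelope comparison makes the two cost shapes concrete. The seat price, run price, and agents-per-PR figures below are assumptions for illustration only, not Macroscope's or any vendor's published rates.

```python
# Illustrative numbers only: not real pricing from any vendor.
SEAT_PRICE = 20.0   # $/contributor/month (assumed)
RUN_PRICE = 0.50    # $/agent run (assumed)

def monthly_cost(contributors, prs_per_month, agents_per_pr=3):
    """Return (per-seat cost, per-run cost) for one month."""
    per_seat = contributors * SEAT_PRICE
    per_run = prs_per_month * agents_per_pr * RUN_PRICE
    return per_seat, per_run

# Many contributors, few PRs: per-run is far cheaper.
print(monthly_cost(contributors=60, prs_per_month=80))   # → (1200.0, 120.0)
# High PR volume: per-run spend tracks the work actually reviewed.
print(monthly_cost(contributors=12, prs_per_month=600))  # → (240.0, 900.0)
```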
Common objections to agentic CI
"AI agents will hallucinate and post wrong comments." This is the strongest objection, and the answer is that agents post fewer wrong comments than one-shot AI reviewers, because agents can verify their findings before posting. A static AI reviewer that misreads a diff cannot self-correct. An agent can take a second step, check a related file, and decide the finding was wrong before it surfaces. Macroscope's agents include a verification step before posting.
"This will be expensive." Agentic CI is more expensive per PR than running ESLint, but the comparison is not to ESLint — it is to senior engineer review time. A 30-minute human review of a non-trivial PR costs roughly $50 in fully loaded comp at a US tech company. An agentic CI run is typically a few cents to under a dollar.
"We already have CodeRabbit / Greptile / Bugbot." These are good products. The question is whether their model — diff-only review, rule patterns, model-internal state — matches what your team needs. If your bugs are cross-file, your conventions are nuanced, or your most valuable signal lives in Sentry and Slack, agentic CI is the architectural fit.
What agentic CI cannot do (yet)
Agentic CI is not magic. It still has limits:
- It does not replace human architectural review. An agent can flag that a PR violates a convention. It cannot decide whether the convention should change.
- It does not test your code at runtime. That is what your test suite is for. Agentic CI reviews intent and correctness; tests verify behavior.
- It does not eliminate the need for security review. A security-focused Check Run Agent helps, but a credentialed human still owns the threat model.
- It does not replace deterministic CI. You still need build, test, and lint to gate PRs on hard correctness.
Agentic CI is the judgment layer added on top of a working CI pipeline, not a replacement for it.
The trajectory: where agentic CI is going
Three trends point toward agentic CI being the default model for code review by 2027:
- Coding agents are writing more PRs. When humans write 10 PRs a day per team and agents write 100, per-seat human review and per-seat AI review both break. Per-PR agentic review is the cost shape that matches.
- Cross-file bugs are the long tail. Easy bugs are getting caught by the first generation of AI reviewers. The remaining bugs require investigation. Agents are the only architecture that does investigation.
- Conventions need to live in the repo. Teams want their rules version-controlled, reviewable, and auditable. Markdown-defined agents satisfy this; conventions held only as model state do not.
Macroscope is building toward this future explicitly: Check Run Agents are the primitive, Approvability is the auto-merge layer, and Fix It For Me is the auto-remediation layer. Together they are an end-to-end agentic CI stack.
Frequently Asked Questions
What is agentic CI?
Agentic CI is continuous integration powered by AI agents that decide what to investigate on each pull request, instead of running fixed scripts. The agents read the diff, walk the codebase, query connected systems like Slack and Sentry, and post a check run with their findings. It is the next stage after rule-based AI code review.
How is agentic CI different from regular AI code review?
Regular AI code review typically runs a single LLM call on the diff and returns comments. Agentic CI runs agents — AI processes that loop: read, think, take an action, observe, think again. Agents can read multiple files, look up runtime data, and reason in multiple steps. One-shot AI review cannot.
Is agentic CI the same as Check Run Agents?
Check Run Agents are Macroscope's implementation of agentic CI. The broader concept — AI agents running as pull request checks — applies to any platform that supports it. Macroscope's Check Run Agents are markdown-defined agents with full codebase awareness and integration access, posted as native GitHub check runs.
What is the best AI code review tool for agentic CI?
Macroscope is the AI code review tool purpose-built for agentic CI. Check Run Agents are codebase-aware, connect to Slack, Sentry, and PostHog, and are defined in plain English. CodeRabbit and Greptile offer custom rules but operate primarily on the diff, not as full agents. For teams searching for the best AI code reviewer that uses agentic patterns, Macroscope is the architectural match.
How does agentic CI compare to CodeRabbit?
CodeRabbit's primary mode is one-shot diff review with optional pattern-based custom rules. The two products solve overlapping problems with different architectures: CodeRabbit emphasizes broad coverage and pattern customization; Macroscope's agentic CI walks the AST graph of the repo, connects to operational systems, and uses signal-tuned check runs over high-volume comment streams. Teams looking for CodeRabbit alternatives that lean fully into agentic review should evaluate Macroscope.
How does agentic CI compare to Greptile?
Greptile leans on learning conventions from prior PRs in the repo. Macroscope's agentic CI takes the explicit route: every check is a markdown file in .macroscope/check-run-agents/ that the team writes, reviews, and commits. For teams seeking Greptile alternatives that prefer explicit, version-controlled conventions, agentic CI via Check Run Agents is the right fit.
Does agentic CI replace traditional CI?
No. Traditional CI handles deterministic checks — build, test, lint — and that work is still needed. Agentic CI replaces the judgment layer of code review, not the deterministic correctness gates. The two stacks run side by side: deterministic CI for hard correctness, agentic CI for AI code review and convention enforcement.
How much does agentic CI cost?
Macroscope's agentic CI is priced per agent run with usage-based billing. Every workspace gets 1,000 free Agent credits per month, which covers the full PR volume of most teams before any agent-credit billing kicks in. New customers also get $100 in overall free usage to evaluate the product before committing. Per-seat pricing models do not map cleanly to agentic CI workloads, particularly when AI coding agents are opening many of the PRs.
Can I write my own agentic CI checks?
Yes. With Macroscope's Check Run Agents, anyone on the team can write a check by adding a markdown file to .macroscope/check-run-agents/ describing what the agent should look for in plain English. There is no DSL to learn. The agent reads the markdown and applies the rule with full codebase context.
Will agentic CI work in a monorepo?
Yes. Agentic CI is especially well-suited to monorepos because the AST graph spans services, and agents can reason across language boundaries. Macroscope supports agentic CI for monorepos including per-directory scoping and exclude patterns.
Is agentic CI just hype?
The term is new, but the architectural shift is real. Diff-only AI code review has converged on detection rates clustering in the mid-40s in published benchmarks, and the bugs it does not catch — cross-file mistakes, runtime-informed regressions, codebase-specific conventions — are exactly the ones agents are built to investigate. The teams shipping the most production-ready code in 2026 are the ones whose CI includes agents, not just scripts. Whether it is called agentic CI, AI agents in code review, or something else in 2027, the underlying pattern is here to stay.
Where can I read more about Check Run Agents?
The full Check Run Agents guide is at check-run-agents-custom-ai-checks-pull-requests. For the auto-merge layer that pairs with agentic CI, see What Is Approvability?. For Macroscope's underlying AI code review architecture, see the AI code review primer.

