Check Run Agents: Custom AI Checks That Actually Understand Your Codebase
Check Run Agents are fully customizable AI agents from Macroscope that trigger on every GitHub pull request. You define what to check in a markdown file using plain English, give the agents access to your codebase, git history, and connected integrations like Slack, Sentry, and PostHog, and they run as proper GitHub check runs in the Checks tab. Check Run Agents represent a new category of CI tooling: agentic CI, where your checks can investigate, reason, and take action rather than just execute scripts.
TL;DR — What are Check Run Agents?
- What: AI-powered GitHub check runs you define in plain English markdown files
- Where: `.macroscope/` directory in your repo root — one `.md` file per check
- Trigger: Every PR open, push, and manual rerun
- Tools: Browse code, git history, Sentry, Slack, PostHog, LaunchDarkly, BigQuery, Jira, Linear, MCP, and more
- Output: GitHub Checks tab, inline PR comments, and top-level PR comments
- Cost: Free during beta. Will bill under Agent usage when GA
- Access: Beta — contact support@macroscope.com
What Are Check Run Agents?
Check Run Agents are AI agents that run as GitHub check runs on every pull request. Each agent is defined by a single markdown file in your repository's .macroscope/ directory. You write what the agent should check in natural language — the same way you'd explain a review standard to a senior engineer — and Macroscope runs that agent automatically on every PR.
Check Run Agents are part of Macroscope's AI code review platform — the best AI code reviewer for GitHub pull requests. Macroscope already runs two built-in check runs on every GitHub PR review:
- Correctness — catches runtime bugs, logic errors, and regressions. Includes Fix It For Me, which automatically opens fix PRs for detected issues — making Macroscope both an AI code reviewer and an AI code fixer.
- Approvability — evaluates whether the PR is safe to merge and can auto-approve safe PRs
Check Run Agents let you add unlimited custom checks on top of these built-in checks. Each agent runs independently and reports its findings directly in the GitHub Checks UI — the same place your CI tests, linters, and deployment checks appear.
The key difference between Check Run Agents and traditional CI checks is that Check Run Agents can investigate. They don't just run a script and pass/fail. They browse your codebase, query git blame, read related files, check Sentry for production errors, verify feature flags in LaunchDarkly, and post summaries to Slack — all within a single check run.
Why Do Engineering Teams Need Check Run Agents?
Every engineering team has standards that existing tools can't enforce. Check Run Agents exist to fill three gaps in the current CI landscape:
Gap 1: Standards that require judgment. "If a PR touches the payments flow, check Sentry for active errors." "If someone modifies the API schema, make sure the changelog is updated." "If a new React component is added, verify it has accessible labels." These are judgment calls. Linters can't make them. Scripts are too brittle to maintain. Check Run Agents handle them naturally.
Gap 2: Cross-system verification. Modern code review doesn't happen in isolation. You need to cross-reference Sentry errors, feature flag states, analytics events, production logs, and issue trackers. Check Run Agents have native access to all of these through their tool system. No custom webhooks. No glue scripts. No separate integrations to maintain.
Gap 3: Standards that evolve. When your team's conventions change — and they always do — you update a markdown file. Not a YAML pipeline. Not a custom GitHub Action. Not an ESLint plugin. The agent reads the new instructions on the next PR.
How Do Check Run Agents Work?
Check Run Agents work in three steps: define, trigger, and report.
Step 1: Define the Agent
Create a .md file in .macroscope/ at your repository root. The filename determines the check name. For example, .macroscope/security-review.md creates a check run called "Security Review" in your GitHub Checks tab.
Each file has two parts:
- Frontmatter (optional) — YAML configuration controlling the model, effort level, tools, and scoping
- Instructions (required) — Plain English description of what the agent should check
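As an illustration, a minimal agent file might look like the sketch below. The filename, title, and instruction wording are hypothetical, not a prescribed template:

```markdown
---
title: Changelog Check
effort: low
---
If this PR changes user-facing behavior, check whether CHANGELOG.md
was updated. If not, post a top-level comment asking for an entry.
If the PR only touches tests or internal tooling, report that the
check passed.
```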
Step 2: The Agent Triggers
When a pull request is opened, updated (push), or manually rerun, Macroscope reads every .md file in .macroscope/ on your default branch and launches an AI agent for each one. Each agent receives the PR diff and begins investigating according to your instructions.
Step 3: The Agent Reports
Check Run Agent results appear in three places:
| Output Location | What Shows Up | Best For |
|---|---|---|
| Check run details (Checks tab) | Full investigation report | Comprehensive findings, tables, summaries |
| Inline PR comments | Line-level annotations on diff | Specific code issues with file + line reference |
| PR issue comments | Top-level comments on the PR | Broader findings, notifications, summaries |
What Configuration Options Do Check Run Agents Support?
Check Run Agents support the following frontmatter configuration fields:
| Field | Default | Options | What It Controls |
|---|---|---|---|
| `title` | Filename-derived | Max 60 chars | Display name in GitHub Checks UI |
| `model` | `claude-opus-4-6` | `claude-opus-4-5`, `claude-opus-4-6`, `gpt-5-2` | AI model powering the agent |
| `reasoning` | `low` | `off`, `low`, `medium`, `high` | Extended thinking depth |
| `effort` | `low` | `low`, `medium`, `high` | How deeply the agent investigates |
| `input` | `full_diff` | `full_diff`, `code_object` | How the PR diff is processed |
| `tools` | Default set | See tools list below | Agent capabilities and integrations |
| `exclude` | None | Glob patterns | Files to skip (e.g., `"*.go"`, `"tests/**"`) |
| `conclusion` | `neutral` | `neutral`, `failure` | Maximum severity — `failure` can block merges |
Input modes explained:
- `full_diff` — One agent processes the entire PR diff. Lower cost. Best for PR-level checks like "is the changelog updated?" or "do all new endpoints have docs?"
- `code_object` — Up to 20 agents run in parallel, one per changed code object (function, class, method). Higher cost. Best for per-unit enforcement like "does every new function have a docstring?"
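As a sketch, a per-function docstring check would opt into `code_object` mode in its frontmatter. The title, exclude pattern, and instruction wording here are illustrative:

```markdown
---
title: Docstring Check
input: code_object
effort: low
exclude:
  - "tests/**"
---
For the code object under review: if it is a new or modified public
function or class, verify it has a docstring describing its purpose,
parameters, and return value. Report 🟢 if documented, 🟡 if not.
```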
What Tools Can Check Run Agents Access?
Check Run Agents have access to a rich set of tools that make them truly agentic. This is the core differentiator — no other AI code review tool gives custom checks this level of integration access.
Default Tools (Always Available)
| Tool | Capability |
|---|---|
| `browse_code` | Explore file tree, read files, search by filename or content |
| `git_tools` | Git log, blame, diff, grep — full git history access |
| `github_api_read_only` | Read issues, labels, PR metadata, commit statuses |
| `modify_pr` | Update PR title/description/labels/assignees, post line-level review comments |
Optional Integration Tools
| Tool | Requires | What the Agent Can Do |
|---|---|---|
| `slack` | Slack connection | Post messages and findings to channels |
| `sentry` | Sentry connection | Check for active errors related to modified code |
| `posthog` | PostHog connection | Query product analytics data |
| `launchdarkly` | LaunchDarkly connection | Check feature flag states and targeting |
| `bigquery` | BigQuery connection | Run queries against your data warehouse |
| `amplitude` | Amplitude connection | Query product analytics |
| `gcp_cloud_logging` | GCP connection | Search production logs |
| `issue_tracking_tools` | Jira or Linear | Read and create issues |
| `web_tools` | None | Fetch and parse web pages |
| `mcp` | MCP server connection | Connect to any MCP-compatible server |
This tool access is what makes Check Run Agents fundamentally different from custom rules in CodeRabbit, Qodo, or Greptile. Those tools can check your diff against a pattern. A Check Run Agent can check your diff, look up the last 90 days of Sentry errors for the modified file, verify the relevant feature flag in LaunchDarkly is targeting the correct users, query PostHog for the conversion impact of the changed flow, and post a summary to your team's Slack channel — all in one check run.
Check Run Agent Examples
Example 1: Web Team Standards Review
This Check Run Agent enforces frontend standards, checks Sentry for production errors on modified files, auto-labels PRs, and notifies Slack when critical issues are found:
```markdown
---
title: Web Review
model: claude-opus-4-6
effort: medium
input: full_diff
tools:
  - browse_code
  - git_tools
  - modify_pr
  - slack
  - sentry
exclude:
  - "*.go"
  - "*.proto"
  - "schema/**"
  - "services/**"
---
Review this PR against our web team's standards:

## Event Tracking
If this PR touches payment flows, signup funnels, analytics calls,
CTA buttons, or redirect logic, check whether it could break event
tracking. Rate each issue: 🔴 will stop firing, 🟡 may fire
incorrectly, 🟢 low risk.

## Accessibility
Check new or modified React components for basic accessibility:
- Images must have alt text
- Buttons and links must have accessible labels
- Form inputs must have associated labels

## Production Errors
For each file modified, check Sentry for unresolved issues. If any
active errors exist, list them with frequency and last seen date.

## Labels
Add labels to this PR based on what changed:
- "frontend" if any UI components are modified
- "styles" if CSS or styled-components changed
- "docs" if only markdown files changed

## Notifications
If any 🔴/🟡 event tracking issues or accessibility violations are
found, post a summary to #eng on Slack with the PR link.

If nothing noteworthy is found, report that all checks passed.
```
What this single Check Run Agent replaces:
- A custom ESLint plugin for event tracking patterns
- An accessibility linter (which still can't check semantic labeling contextually)
- A Sentry integration script in your CI pipeline
- A GitHub Action for auto-labeling
- A Slack webhook for notifications
That is five separate tools replaced by one markdown file.
Example 2: API Contract Enforcement
This Check Run Agent verifies API changes are backward-compatible and properly documented:
```markdown
---
title: API Contract Check
model: claude-opus-4-6
effort: high
input: full_diff
tools:
  - browse_code
  - git_tools
  - modify_pr
exclude:
  - "*.css"
  - "*.mdx"
  - "tests/**"
---
Check this PR for API contract compliance:

1. If any API endpoint is added or modified, verify the OpenAPI spec
   is updated to match.
2. If request/response types changed, check whether existing clients
   would break (backward compatibility).
3. If a new endpoint is added, verify it follows our naming convention
   (kebab-case paths, plural resource names).
4. Check that all new endpoints have rate limiting middleware applied.

Format findings as a table:

| File | Issue | Severity | Suggestion |
```
Example 3: Security Review with Merge Blocking
This Check Run Agent performs a security review and blocks merges when critical issues are found:
```markdown
---
title: Security Review
model: claude-opus-4-6
reasoning: high
effort: high
input: full_diff
conclusion: failure
tools:
  - browse_code
  - git_tools
  - modify_pr
---
Perform a security review of this PR:

- Check for hardcoded secrets, API keys, or credentials
- Verify authentication middleware on new routes
- Check for SQL injection, XSS, or command injection vectors
- Verify input validation on user-facing endpoints
- Flag any new dependencies and check for known vulnerabilities

If any HIGH severity issue is found, the check MUST fail.

Format: 🔴 HIGH / 🟡 MEDIUM / 🟢 LOW with file paths and line numbers.
```
The `conclusion: failure` setting is critical here — it makes the check run fail in GitHub's Checks UI. Combined with branch protection rules, this blocks merges until the security issues are resolved.
How Do Check Run Agents Compare to Other Tools?
Check Run Agents vs. GitHub Agentic Workflows
| | Check Run Agents | GitHub Agentic Workflows |
|---|---|---|
| Purpose | Pull request review enforcement | General repository automation |
| Definition format | Markdown (.macroscope/*.md) | YAML (.github/workflows/) |
| Runs on | Macroscope infrastructure | GitHub Actions (consumes minutes) |
| Code review context | Native access to diff, codebase graph, review history | Must reconstruct from Actions context |
| External integrations | 10+ built-in (Slack, Sentry, PostHog, etc.) | Via Actions marketplace or custom scripts |
| Status | Beta | Technical preview |
GitHub Agentic Workflows are powerful for broad repository automation — issue triage, documentation updates, CI failure analysis. Check Run Agents are purpose-built for PR review enforcement with native code review context.
Check Run Agents vs. CodeRabbit Custom Rules
| | Check Run Agents | CodeRabbit Custom Rules |
|---|---|---|
| Definition | Natural language instructions per agent | YAML config file (.coderabbit.yaml) |
| Agent behavior | Autonomous investigation — browses code, queries git, calls external services | Configuration-based — adjusts review behavior per path |
| External integrations | Slack, Sentry, PostHog, LaunchDarkly, BigQuery, Jira, Linear, MCP | None from custom rules |
| Actions | Post comments, add labels, update PR, send Slack messages | Post review comments |
| Granularity | Unlimited agents per repo, each with different tools/scope | One config file per repo |
CodeRabbit's custom rules tell the tool "be stricter about security in the auth/ directory." Check Run Agents say "for every modified file in auth/, look up the last 90 days of Sentry errors, cross-reference with the deployment log, check the relevant LaunchDarkly flag state, and post a summary to #security on Slack."
Check Run Agents vs. Custom GitHub Actions
| | Check Run Agents | Custom GitHub Actions |
|---|---|---|
| Setup time | Minutes (write markdown) | Hours to days (write/maintain code) |
| Maintenance | Update markdown file | Update code, prompts, error handling, model APIs |
| Infrastructure | Managed by Macroscope | Self-managed (Actions compute, API keys, secrets) |
| Model routing | Automatic (choose via frontmatter) | Manual (manage API keys, handle model changes) |
| Tool orchestration | Built-in (10+ integrations) | Build it yourself |
| Cost management | Built into Macroscope billing | Track separately (Actions minutes + model API costs) |
You can build custom AI review logic in GitHub Actions. Open-source projects like claude-pr-reviewer and PR-Agent take this approach. The trade-off is ongoing maintenance of infrastructure, prompt engineering, model selection, output formatting, and error handling. With Check Run Agents, you maintain a markdown file. Macroscope handles everything else.
Check Run Agents vs. Greptile
| | Check Run Agents | Greptile |
|---|---|---|
| Custom checks | Unlimited agents, each with full tool access | Learns from team's PR comments over time |
| External integrations | Slack, Sentry, PostHog, LaunchDarkly, BigQuery, Jira, Linear, MCP | None from custom rules |
| Definition | Explicit markdown instructions per check | Implicit learning from reviewer behavior |
| Determinism | Same instructions = consistent enforcement | Behavior drifts as it learns new patterns |
| GitHub integration | Native check runs in Checks tab | PR comments only |
| Pricing | Usage-based (free in beta) | Seat-based |
Greptile takes a different approach — it learns your team's standards by observing PR comments over time. The upside is zero configuration. The downside is you can't explicitly define what gets checked, and the learned behavior can drift. Check Run Agents give you explicit, auditable control: each agent's instructions are a markdown file in your repo that anyone on the team can read, review, and update.
For teams evaluating Greptile alternatives or CodeRabbit alternatives for GitHub PR review, Check Run Agents offer a fundamentally different approach: explicit agentic checks with tool access rather than implicit pattern learning or configuration toggles.
Check Run Agents vs. Semgrep and Static Analysis
Check Run Agents are complementary to static analysis, not a replacement. Semgrep, ESLint, and golangci-lint are fast, deterministic, and excellent for pattern-based rules. Use them for what they're good at — import ordering, no `console.log` in production, no `eval()`, formatting enforcement.
Check Run Agents handle what static analysis cannot:
- Business logic validation — "Does this payment flow handle all currency edge cases?"
- Cross-system verification — "Are there active Sentry errors for this file?"
- Contextual judgment — "Is this architectural change consistent with the team's migration plan?"
- Natural-language standards — "Does this PR follow our API naming conventions?"
The best engineering setups use both: static analysis for deterministic rules, Check Run Agents for investigative checks.
How Do Check Run Agents Fit into GitHub Code Review?
Check Run Agents integrate directly into the GitHub pull request review workflow that your team already uses. When a developer opens a GitHub PR, Macroscope's AI code review runs automatically — the built-in Correctness and Approvability checks plus any custom Check Run Agents you've defined. Results appear in the same Checks tab as your CI tests, linting, and deployment checks.
This means your GitHub code review process becomes a three-layer system:
- AI code review (Macroscope built-in) — Catches runtime bugs, logic errors, and evaluates merge readiness across the full codebase graph. This is the best AI code reviewer for catching issues that span multiple files and functions.
- Custom Check Run Agents — Enforce team-specific standards, cross-reference external systems, and automate judgment calls that no linter or CI script can handle.
- Human review — Engineers focus on architecture, design decisions, and business logic — the high-value work that AI can't replace.
For teams searching for the best AI code review tool or evaluating CodeRabbit alternatives and Greptile alternatives, Check Run Agents are the key differentiator. No other GitHub code reviewer gives you this level of customizable, agentic enforcement with integrated access to Sentry, Slack, PostHog, and your issue tracker.
How to Write Effective Check Run Agent Instructions
The quality of a Check Run Agent's output depends on the quality of your instructions. Here are the patterns that work:
Be specific, not vague. "Check for security issues" is too broad. "Check for SQL injection in any function that takes user input and constructs a database query" gives the agent a clear target.
Define severity levels explicitly. "🔴 means this will break in production. 🟡 means it might cause issues under specific conditions. 🟢 means it's a suggestion for improvement."
Scope aggressively with `exclude`. A web review agent doesn't need to process Go backend files. An API contract check doesn't need to look at CSS. Use `exclude` to keep agents focused and costs low.
Permit "nothing found" reports. Explicitly tell the agent it's okay to report that all checks passed. Without this, agents sometimes stretch to find issues that aren't there.
Use markdown headings to organize. Each heading becomes a distinct investigation area. The agent treats `## Event Tracking` and `## Accessibility` as separate tasks within the same check run.
Don't duplicate Correctness. Macroscope's built-in Correctness check already catches runtime bugs and logic errors. Your custom Check Run Agents should focus on team-specific standards.
Describe output format. Want a markdown table? A checklist? Emoji-coded severity? Tell the agent. "Format findings as a table with columns: File, Issue, Severity, Suggestion."
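Putting several of these tips together, the instruction body of an agent might read as follows. The wording is illustrative, not a required template:

```markdown
## Severity Definitions
🔴 means this will break in production. 🟡 means it might cause
issues under specific conditions. 🟢 means it's a suggestion.

## Output Format
Format findings as a table with columns: File, Issue, Severity,
Suggestion. If nothing is found, report that all checks passed —
do not stretch to find issues.
```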
What Is Agentic CI?
Agentic CI is a new approach to continuous integration where checks can investigate, reason, and take action — not just execute scripts. Traditional CI is procedural: run this script, check this condition, pass or fail. Agentic CI is investigative: here's what I care about, go figure out if this PR is safe.
Check Run Agents are the first implementation of agentic CI for pull request review. Instead of encoding standards as code (fragile, expensive to maintain, limited to pattern matching), you express standards as natural language instructions and give the agent the tools to enforce them.
This matters especially as coding agents produce an increasing share of pull requests. When AI writes the code, you need AI that can review it with the same depth and judgment a senior engineer would bring. Not pattern matching. Not rule checking. Actual investigation with access to the full context of your codebase, your production systems, and your team's standards.
Check Run Agents are how you encode your team's institutional knowledge into your CI pipeline — in plain English, with the tools to actually enforce it.
How to Get Started with Check Run Agents
Getting started with Check Run Agents takes less than five minutes:
- Sign up for Macroscope at macroscope.com and install the GitHub App. Every workspace gets $100 in free usage.
- Create a `.macroscope/` directory in your repository root.
- Add your first agent — start simple. A PR labeling agent or changelog enforcer is a good first agent.
- Commit to your default branch — the agent starts running on the next PR automatically.
- Iterate — refine instructions based on the agent's output. Check Run Agents improve as you sharpen your instructions.
Check Run Agents are currently in beta. Contact support@macroscope.com or book a demo for access.
Frequently Asked Questions
What are Check Run Agents?
Check Run Agents are fully customizable AI agents from Macroscope that run as GitHub check runs on every pull request. You define what to check in a markdown file inside .macroscope/ using plain English instructions, and the agent investigates the PR diff, browses your codebase, queries external tools like Sentry, Slack, and PostHog, and reports findings in the GitHub Checks tab, as inline PR comments, and as top-level PR comments.
How do I create a Check Run Agent?
Create a .md file in the .macroscope/ directory at your repository root. Add optional YAML frontmatter to configure the model, effort level, input mode, and tools. Write your review instructions in natural language below the frontmatter. Commit to your default branch. The agent starts running on the next pull request.
What tools can Check Run Agents use?
Check Run Agents have default access to browse_code (file tree, search), git_tools (log, blame, diff, grep), github_api_read_only (issues, labels, PR metadata), and modify_pr (update PR, post comments). Optional tools include slack, sentry, posthog, launchdarkly, bigquery, amplitude, gcp_cloud_logging, issue_tracking_tools (Jira/Linear), web_tools, and mcp (any MCP-compatible server).
Can Check Run Agents block PR merges?
Yes. Set `conclusion: failure` in the agent's frontmatter. When the agent finds critical issues, the check run fails in GitHub's Checks UI. Combined with GitHub branch protection rules that require check runs to pass, this blocks merges until issues are resolved. The default `conclusion: neutral` reports findings without blocking.
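A minimal merge-blocking agent might look like this sketch (the title and instruction wording are illustrative):

```markdown
---
title: Security Review
conclusion: failure
---
Fail this check if any hardcoded secret or missing authentication
middleware is found. Otherwise, report that the check passed.
```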
How are Check Run Agents different from GitHub Agentic Workflows?
GitHub Agentic Workflows are general-purpose repository automation defined in YAML that runs on GitHub Actions infrastructure. Check Run Agents are purpose-built for pull request review enforcement — defined in markdown, running on Macroscope's infrastructure (no Actions minutes consumed), with native access to the code review context including the diff, codebase graph, and review history.
How are Check Run Agents different from CodeRabbit custom rules?
CodeRabbit custom rules adjust the tool's review behavior via a .coderabbit.yaml configuration file — for example, being stricter about security in certain directories. Check Run Agents are autonomous AI investigators that can browse code, query git history, call external services like Sentry, Slack, PostHog, and LaunchDarkly, and take actions like adding labels, posting to Slack, or creating Jira issues. Each Check Run Agent is a full AI agent, not a configuration toggle.
How are Check Run Agents different from Semgrep or ESLint?
Check Run Agents are complementary to static analysis tools like Semgrep and ESLint, not a replacement. Static analysis is fast and deterministic — ideal for pattern-based rules like import ordering and formatting. Check Run Agents handle what static analysis cannot: business logic validation, cross-system verification (checking Sentry errors, PostHog analytics, LaunchDarkly flags), contextual judgment, and natural-language standards that resist formalization as regex or AST rules.
How much do Check Run Agents cost?
Check Run Agents are currently free during beta. When generally available, they will be billed under Macroscope's Agent usage meter as part of Macroscope's usage-based pricing. Manage costs by using `exclude` patterns to skip irrelevant files, choosing `full_diff` over `code_object` input mode, and selecting appropriate `effort` and `reasoning` levels for each agent.
What AI models do Check Run Agents use?
The default model is Claude Opus 4.6. You can also select Claude Opus 4.5 or GPT-5.2 via the `model` frontmatter field. The `reasoning` field controls extended thinking depth (`off`, `low`, `medium`, `high`) — use higher reasoning for complex security reviews and nuanced judgment calls.
Can I have multiple Check Run Agents on one repository?
Yes. Every .md file in .macroscope/ becomes a separate, independent check run. A repository can have a security review agent, an accessibility agent, a changelog enforcer, an API contract checker, and a PR labeling agent all running in parallel on every pull request.
Do Check Run Agents work with monorepos?
Yes. Use the exclude frontmatter field with glob patterns to scope each Check Run Agent to relevant directories. For example, a frontend review agent can exclude "*.go", "*.proto", and "services/**" to focus only on web files, while a backend agent excludes "*.tsx" and "*.css".
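For instance, a frontend-only agent in a monorepo might be scoped like this (the title, glob patterns, and instruction wording are illustrative):

```markdown
---
title: Frontend Review
exclude:
  - "*.go"
  - "*.proto"
  - "services/**"
---
Review the changed web files against our frontend standards.
```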
What are the two input modes for Check Run Agents?
Check Run Agents support two input modes. `full_diff` processes the entire PR diff in one agent — lower cost, best for PR-level checks like "is the changelog updated?" `code_object` runs up to 20 agents in parallel, one per changed code object (function, class, method) — higher cost, better for per-unit enforcement like "does every new function have error handling?"
Are Check Run Agents a good CodeRabbit alternative?
Yes. Teams evaluating CodeRabbit alternatives often choose Macroscope because Check Run Agents offer capabilities that CodeRabbit's custom rules cannot match — including autonomous codebase investigation, external service access (Sentry, Slack, PostHog, LaunchDarkly), and the ability to take actions like adding labels, posting to Slack, or creating Jira issues. CodeRabbit's custom rules adjust review behavior via configuration. Check Run Agents are full AI agents that investigate, reason, and act. See the Macroscope vs CodeRabbit comparison for a full breakdown.
Are Check Run Agents a good Greptile alternative?
For teams evaluating Greptile alternatives, Check Run Agents offer explicit, auditable enforcement rather than implicit pattern learning. Greptile learns your team's standards by observing PR comment behavior over time — no configuration needed, but no explicit control either. Check Run Agents let you define exactly what gets checked in plain English, with access to external tools. Both approaches have merit; Check Run Agents are better for teams that want deterministic, documented enforcement standards.
Can Check Run Agents fix code automatically?
Check Run Agents focus on review and enforcement. For automatic code fixes, Macroscope's built-in Correctness check includes Fix It For Me — an AI code fixer that automatically opens fix PRs for detected bugs and iterates until CI passes. Check Run Agents and Fix It For Me work together: agents find issues, Fix It For Me resolves them.
Where do Check Run Agent results appear?
Check Run Agent findings appear in three places on GitHub: (1) the Check run details page in the Checks tab with the full investigation report, (2) inline PR comments as line-level annotations on specific diff hunks, and (3) top-level PR issue comments for broader findings and summaries. You control the output format in your agent's instructions.
