Check Run Agents — Your rules, enforced on every PR

May 27, 2026

Macroscope

Define custom AI checks in markdown. Macroscope runs them on every PR with access to your codebase, git history, and integrations.

Finding correctness bugs is the highest-stakes job of code review — and it's what Macroscope does on every PR out of the box. But code review was never just about bugs.

Every team has rules beyond correctness: style guides, architecture boundaries, event tracking invariants, ticket-to-code traceability. These rules live in tribal knowledge, enforced inconsistently by whoever reviews the PR — or through a patchwork of non-agentic CI/CD workflows.

Check Run Agents fix this.

What are Check Run Agents?

A Check Run Agent is a markdown file in your repository that defines a custom AI agent. Drop it in .macroscope/check-run-agents/, write your instructions, and merge it into your default branch. Macroscope runs it as a GitHub check run on every PR that matches the agent's file scope.

Each agent automatically gets its own GitHub check run. You scope each agent to the files it should care about, give it the tools it needs, and write its instructions the way you would write a skill.

The instructions you write and the tools you enable define what the agent does. Most commonly, teams configure agents to take action by posting inline review comments on the PR and updating PR metadata like labels. But agents can take any action their tools allow — create GitHub issues, file Linear tickets, post Slack messages, consult your feature flag system, etc. Set the conclusion to failure and the agent blocks merges — the same way a failing test would.

[Illustration: five check run agents defined in .macroscope/check-run-agents running on a PR: Security Review, Web Event Tracking, Ticket Requirements, Accessibility Audit, and Production Errors.]

Anatomy of a check run agent

Every agent is a single .md file with optional YAML frontmatter and a markdown body.

Here's a minimal example — a security review scoped to your API layer:

---
title: Security Review
model: claude-sonnet-4-6
tools:
  - browse_code
  - web_tools
include:
  - "src/api/**"
---

Review this PR for security vulnerabilities. Focus on:
- SQL injection via string concatenation
- Hardcoded secrets or API keys
- Unsafe deserialization
- Missing input validation at API boundaries

Do not flag issues in test files. Do not flag pre-existing
issues that this PR does not introduce or modify.

This file creates a GitHub check run called "Macroscope - Security Review" that runs on every PR that touches files in src/api/, powered by Claude Sonnet 4.6, with access to browse the codebase and search the web for CVE references. PRs that don't touch that path? The agent doesn't run at all.

Here's what each frontmatter field controls (see the full reference for the latest supported fields and tools):

model

Which LLM powers the agent. Choose from Claude (Opus 4.5–4.7, Sonnet 4.5–4.6) or GPT (5.2, 5.4, 5.5). Default is claude-opus-4-6.

reasoning

Extended thinking depth: off, low, medium, or high. Higher reasoning means the agent thinks harder before acting, which matters for nuanced judgments like "is this a deliberate design choice?"

effort

Investigation depth: low, medium, or high. Controls how deeply the agent explores the codebase beyond the diff itself.

tools

What the agent can do. Four tools are available by default (browse_code, git_tools, github_api_read_only, modify_pr), with a dozen more you can opt into. More on this below.

input

How the diff is processed. full_diff (default) gives a single agent the entire PR. code_object dispatches up to 20 parallel agents, each reviewing an individual function or class — higher cost, higher coverage.

include / exclude

Glob patterns that scope the agent to specific files or directories. If a PR doesn't touch any included files, the agent doesn't run at all — no noise, no wasted credits.

conclusion

neutral (default) or failure. Set to failure and the agent becomes a merge gate.

The markdown body after the frontmatter is the agent's instructions. Write them like you'd write a runbook: what to look for, what to ignore, how to format findings, what severity levels mean. The more specific you are, the better the agent performs.
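Putting the fields together, here is a sketch of an agent file that sets each one. The title, paths, and instructions are hypothetical, and the values are illustrative rather than recommended; only the field names and their allowed values come from the list above. The default tools are listed explicitly because it is an assumption here whether an explicit tools list adds to the defaults or replaces them.

```markdown
---
title: API Compatibility
model: claude-opus-4-6        # the default, stated explicitly
reasoning: high               # off | low | medium | high
effort: medium                # low | medium | high
input: code_object            # one agent per changed function or class
conclusion: failure           # neutral | failure; failure makes this a merge gate
tools:                        # defaults listed explicitly (assumption: see the reference)
  - browse_code
  - git_tools
  - github_api_read_only
  - modify_pr
include:
  - "src/api/**"              # hypothetical path
exclude:
  - "src/api/**/*.test.ts"    # hypothetical test-file pattern
---

For each changed function or class, check whether it alters a public
API response: a removed field, a renamed field, or a changed type.
Leave an inline comment on each breaking change and suggest a
backward-compatible alternative, such as adding the new field
alongside the old one. Ignore internal helpers that are not reachable
from a route handler.
```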

Tools and integrations

The default tools let an agent browse code, read git history, query the GitHub API, and post PR comments. That alone covers style guide enforcement and architecture boundary checks.

But the real power is in the extended tools. Each one you add unlocks a different class of review.

issue_tracking_tools

Query Jira and Linear tickets linked to the PR.

slack

Post messages to Slack channels and look up users.

sentry

Query error issues, event history, and stack traces from Sentry.

posthog / amplitude

Query product analytics events and feature flag evaluations.

launchdarkly

Query feature flag configurations and targeting rules.

bigquery

Run read-only SQL queries against your BigQuery datasets.

gcp_cloud_logging

Query GCP Cloud Logging entries.

web_tools

Search the web and fetch content from URLs.

image_gen

Generate and upload images and diagrams.

write_code

Commit code changes directly to the PR branch.

mcp

Connect any external MCP server. PagerDuty, Datadog, your documentation platform — any service that exposes an MCP endpoint. Add the connection in your workspace settings, include mcp in the tools list, and the agent can discover and call that server's tools at runtime.
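As a sketch of extended tools in use, here is an agent that pairs sentry with slack. The services/** path and the #oncall-alerts channel are hypothetical, and how the agent matches Sentry stack traces to changed files is left entirely to its instructions.

```markdown
---
title: Production Errors
model: claude-sonnet-4-6
tools:
  - browse_code
  - modify_pr
  - sentry            # opt-in: Sentry issues, events, stack traces
  - slack             # opt-in: post to Slack channels
include:
  - "services/**"     # hypothetical path
---

For each file this PR modifies, look up recent unresolved Sentry
issues whose stack traces point into that file. If the PR changes
code on one of those stack traces, leave an inline comment that links
the Sentry issue and says whether the change looks likely to fix it.
If the PR touches a file with a high-volume unresolved issue but does
not address it, post a short note to #oncall-alerts.
```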

Use cases

The flexibility of "markdown instructions + configurable tools + file scoping" makes the design space large. Here are the patterns we see teams using:

Style guide enforcement — Paste your coding standards into the instructions, scope to the relevant language or directory. We run six of these internally — one per language — each scoped to its own paths. The web team's agent knows about React patterns, barrel imports, and component structure. The backend agent knows about Go error handling, gRPC conventions, and database access patterns.
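A sketch of one such agent, for a Go backend. The glob patterns and the three style rules are hypothetical stand-ins for your own guide; the frontmatter uses only the fields and tools described earlier.

```markdown
---
title: Go Style Guide
model: claude-sonnet-4-6
effort: low               # style checks rarely need deep exploration
tools:
  - browse_code
  - modify_pr
include:
  - "**/*.go"
exclude:
  - "**/*_test.go"
  - "**/*.pb.go"          # generated protobuf code
---

Enforce our Go style guide on changed lines only:
- Wrap returned errors with context using fmt.Errorf and %w.
- Do not discard errors with _ unless a comment explains why.
- Access the database only through the repository package.

Leave one inline comment per violation, quoting the rule it breaks.
Do not comment on formatting that gofmt would fix.
```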

Architecture boundary enforcement — "The frontend must never query the database directly. All data access goes through the admin service." "Service A must not import from Service B." Rules like these are hard to encode in a linter but trivial to describe in English.
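The two example rules above could be written as a single merge-gating agent along these lines. The web/ and services/ directory names are hypothetical placeholders for your own layout.

```markdown
---
title: Architecture Boundaries
model: claude-opus-4-6
reasoning: medium
conclusion: failure       # violations fail the check run
tools:
  - browse_code
  - git_tools
  - modify_pr
include:
  - "web/**"              # hypothetical frontend path
  - "services/**"         # hypothetical services path
---

Enforce these boundaries on code this PR adds or changes:
1. Code under web/ must never open a database connection or import a
   database client. All data access goes through the admin service.
2. Code under services/a/ must not import anything from services/b/.

For each violation, leave an inline comment naming the rule and
suggesting where the call should go instead.
```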

Ticket requirement verification — Connect your issue tracker, and the agent reads linked tickets to verify the PR implements what was specified. Catches requirement drift before it reaches production — especially valuable when coding agents are writing the implementation.
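A sketch of a ticket-verification agent. It leaves out include and assumes that an agent with no include patterns runs on every PR; confirm that against the full reference. The checklist format in the instructions is a hypothetical choice.

```markdown
---
title: Ticket Requirements
model: claude-opus-4-6
reasoning: high               # judging "implemented" is a nuanced call
tools:
  - browse_code
  - github_api_read_only
  - modify_pr
  - issue_tracking_tools      # opt-in: Jira and Linear
---

Find the Jira or Linear tickets linked to this PR. For each one, list
its acceptance criteria and check whether the diff implements each
criterion. Post a single PR comment with a checklist that marks each
criterion as implemented, partially implemented, or missing, with
file references. Do not flag work outside the ticket's scope unless
it contradicts a stated requirement.
```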

Event tracking protection — Scope to files that contain analytics calls. The agent detects when tracking events are removed, renamed, or have their properties changed, and alerts the data team before the PR merges.
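Here is how that might look with PostHog as the analytics source. The web/src paths and the #data-team channel are hypothetical; a team on Amplitude would swap in the amplitude tool instead.

```markdown
---
title: Web Event Tracking
model: claude-sonnet-4-6
tools:
  - browse_code
  - git_tools
  - modify_pr
  - posthog               # opt-in; use amplitude if that is your analytics tool
  - slack                 # opt-in: alert the data team
include:
  - "web/src/**/*.ts"     # hypothetical path
  - "web/src/**/*.tsx"
---

Find every analytics capture call this PR removes, renames, or whose
properties it changes. For each one, query PostHog for the event's
recent volume and leave an inline comment describing the change. If
an event with recent traffic is affected, post a summary to
#data-team so they can update dashboards before the PR merges.
```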

Content and SEO checks — Scope to your marketing site or docs directory. Verify that blog posts have complete metadata, that image references point to files that actually exist, that SEO titles target real search queries.
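A sketch of a blog-metadata agent. The content/blog path, the required frontmatter keys, and the 60-character title limit are hypothetical house rules, not Macroscope defaults.

```markdown
---
title: Blog Metadata
model: claude-sonnet-4-6
effort: low
tools:
  - browse_code
  - modify_pr
  - web_tools             # opt-in: search the web
include:
  - "content/blog/**"     # hypothetical content path
---

For each blog post this PR adds or edits, check that:
- The frontmatter has a title, description, date, and author.
- Every image path in the post resolves to a file in the repo.
- The SEO title is under 60 characters and targets a query people
  actually search for; use web search to sanity-check the phrasing.

Leave an inline comment for each problem found.
```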

Compliance and security — Verify that sensitive code paths include audit logging. Check that new API endpoints have authentication middleware. Flag hardcoded secrets. Enable web_tools to cross-reference findings against CVE databases.
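The audit-logging and authentication checks could be combined into one merge gate like this. The src/billing path and the definition of "customer data" are hypothetical and would come from your own compliance requirements.

```markdown
---
title: Audit Logging
model: claude-opus-4-6
reasoning: high
conclusion: failure
tools:
  - browse_code
  - git_tools
  - modify_pr
include:
  - "src/api/**"
  - "src/billing/**"      # hypothetical sensitive path
---

Every new or modified API handler must:
- Be registered behind the authentication middleware.
- Emit an audit log entry for any create, update, or delete of
  customer data.

Leave an inline comment on each handler that is missing either one.
Ignore handlers this PR does not add or modify.
```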

The pattern is always the same: if a human reviewer would catch it by reading the diff and checking some context, an agent can do it on every PR, every time.

Pricing

Check Run Agents are billed through agent credits. Every workspace gets 1,000 free credits per month. Beyond that, credits cost $0.01 each. The credit formula is straightforward: the raw LLM cost of running the agent, plus a 5% markup, divided by $0.01.
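Written out, the formula and a few worked numbers look like this. The $0.20 run cost and the 1,500-credit month are made-up illustrations, and since the rounding of fractional credits isn't specified here, the results are shown unrounded.

```latex
\text{credits} = \frac{C_{\text{LLM}} \times 1.05}{\$0.01}

% Example run with C_LLM = $0.20 (illustrative):
\frac{0.20 \times 1.05}{0.01} = \frac{0.21}{0.01} = 21 \text{ credits}

% Raw LLM spend covered by the 1,000 free monthly credits:
\frac{1{,}000 \times \$0.01}{1.05} \approx \$9.52

% A month that uses 1,500 credits is billed only for the overage:
(1{,}500 - 1{,}000) \times \$0.01 = \$5.00
```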

Agent credit usage is visible in your workspace billing settings, with per-agent and per-PR breakdowns available via CSV export.

Get started

Create a .macroscope/check-run-agents/ directory in your repo's default branch. Add a .md file with your first agent's instructions. Macroscope picks it up automatically on the next PR.

Define your rules. We'll enforce them.

Get started with $100 in free credits.