Landed vs. All: Measuring AI-Generated Code That Ships

June 1, 2026

Macroscope

Landed vs. All: The Engineering Metric That Matters in the Age of AI Coding

AI made code generation abundant, so volume stopped being the signal. Landed vs. All measures the gap between everything produced and what actually shipped to the default branch — the efficiency metric for the AI coding era.

Landed vs. All is an engineering metric that compares everything produced during a period — including in-progress feature branches and PRs that closed without merging — against only the work that actually merged to the default branch. In the age of AI coding agents, this gap is the productivity signal that matters, because AI made code generation abundant and cheap, which means raw volume no longer tells you anything about real progress.

For years it was reasonable to treat total output as a proxy for productivity. More code, more commits, more activity all suggested forward motion. That logic breaks the moment coding agents can generate huge amounts of work for pennies. When code is abundant, the scarce resource is no longer typing. It is judgment, prioritization, and shipping. Landed vs. All is the metric that measures the scarce thing. This is a guide to what it is, why it matters now, and how Macroscope surfaces it directly in the product.

TL;DR — Landed vs. All

All: every bit of work committed in the period, including experiments, feature branches, abandoned directions, and PRs that closed without merging

Landed: only the work that merged to the default branch in the period

The insight is the gap. It measures how much effort, agent output, and spend turned into real product change

Why now: AI made code cheap, so volume stopped being a signal. Shipping is the bottleneck

In Macroscope: a one-click toggle between All and Landed on engineering activity, so you can see who shipped what

Why Volume Stopped Being a Signal

In the AI coding era, code generation is abundant, so measuring volume measures the wrong thing. A coding agent can open dozens of PRs, generate thousands of lines, and rack up commits without a human ever deciding whether any of it should exist. The activity is real. The progress may not be.

This is a genuine break from how engineering output was read for the last decade. Commit counts, lines of code, and PR volume were always imperfect — they rewarded fragmentation and verbosity — but they at least correlated loosely with human effort, because a human had to produce each unit. That correlation is gone. When the marginal cost of generating code approaches zero, the volume of generated code tells you about your tooling's throughput, not your team's progress.

The question worth asking changed accordingly. It is no longer "how much did we produce?" Production is cheap now. The question is "how much of what we produced actually shipped?" That is the gap Landed vs. All exists to measure.

What Landed vs. All Actually Measures

Landed vs. All splits engineering activity into two views of the same period, and the difference between them is where the insight lives.

All captures all work committed during the period, including in-progress feature branches and PRs that may have closed without merging. It is the full picture of everything attempted: experiments, exploratory directions, work in progress, and dead ends.
Landed captures only the work that merged to the default branch during the period. It excludes in-progress feature-branch work and PRs that closed without merging. It is the picture of what actually became product.

The delta between those two numbers is a measure of efficiency: how much of the effort, agent output, and token spend turned into real product change. A team can look busy on the All view and quiet on the Landed view, and the gap between them shows how much of that activity converted into product, which neither number reveals on its own.

View	Includes	Answers
All	Everything committed: feature branches, experiments, abandoned work, closed-unmerged PRs	How much did we attempt?
Landed	Only work merged to the default branch	How much actually shipped?
The gap	The difference	How efficiently did effort turn into product?
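To make the two views concrete, here is a minimal Python sketch of the split. It is an illustration under stated assumptions, not Macroscope's implementation: the `WorkItem` record, its field names, and the choice to count items rather than lines are all hypothetical, and the sketch restricts Landed to work that was also committed in the period so that the gap is never negative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class WorkItem:
    """One unit of committed work (hypothetical record shape)."""
    ref: str
    committed_on: date
    landed_on: Optional[date]  # date merged to the default branch; None if never merged


def landed_vs_all(items, start, end):
    """Count the All and Landed views for the inclusive period [start, end]."""
    def in_period(day):
        return day is not None and start <= day <= end

    # All: everything committed in the period, merged or not.
    all_items = [i for i in items if in_period(i.committed_on)]
    # Landed: the subset that also merged to the default branch in the period
    # (assumption: work committed earlier but merged now is not counted here).
    landed = [i for i in all_items if in_period(i.landed_on)]

    total = len(all_items)
    return {
        "all": total,
        "landed": len(landed),
        "gap": total - len(landed),
        "landed_share": len(landed) / total if total else None,
    }
```

Counting items is the simplest choice; the same function could weight each item by lines changed or by agent spend if those fields were available.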

A Big Gap Isn't Automatically Bad

A large gap between All and Landed does not mean someone is doing bad work. This is the most important caveat, and skipping it turns a useful metric into a blunt instrument.

Iteration is necessary. Exploration is healthy. The best engineering often involves trying three approaches and shipping the fourth, or spinning up a spike to learn something and throwing the code away on purpose. A team doing hard, novel work will naturally have a wider gap than a team doing routine maintenance, and that is fine. Discarded work is frequently the cost of good judgment, not the absence of it.

The gap is most useful at the team level, read over time as a trend rather than a verdict. A gap that widens sharply after adopting a coding agent is worth a conversation: is the agent generating work the team then has to discard? Is review or prioritization the bottleneck? Is spend going toward output that never lands? Read as a leading question rather than a score, the delta points you at where to look. Read as an individual performance judgment, it gets gamed like every other metric, so keep it a team-level measure, not an individual one.
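Reading the gap as a trend can be sketched in a few lines of Python. This is a rough illustration, assuming you already have team-level All and Landed counts per period; the function names and the 10-point `min_drop` threshold are hypothetical, chosen only to show the shape of the comparison.

```python
from statistics import mean


def landed_shares(periods):
    """periods: list of (label, all_count, landed_count) tuples for one team.

    Returns the landed share per period, skipping periods with no activity.
    """
    return [landed / total for _, total, landed in periods if total > 0]


def gap_widened(before, after, min_drop=0.10):
    """True if the average landed share fell by at least min_drop.

    before/after: period tuples on either side of a change, such as
    adopting a coding agent. min_drop is an arbitrary illustrative threshold.
    """
    shares_before = landed_shares(before)
    shares_after = landed_shares(after)
    if not shares_before or not shares_after:
        return False  # not enough data to compare
    return mean(shares_before) - mean(shares_after) >= min_drop
```

A `True` result is a prompt to go ask the questions above, not a finding in itself. Comparing averages across several periods, rather than two single weeks, keeps one unusual sprint from triggering the flag.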

The Questions Modern Engineering Metrics Should Ask

In the AI era, coding is no longer the scarce resource. Judgment is. Prioritization is. Shipping is. So the metrics that matter shifted from one question to several.

The old question was:

How much did we produce?

The questions that actually matter now are:

How much of it landed? What share of generated work made it to the default branch.
How long did it take to land? The PR cycle time from work-started to merged — the speed at which good work converts to shipped product.
How much generated work was discarded along the way? The All-minus-Landed gap, read as a trend.
How much spend translated into shipped value? When agents bill per run and per token, the ratio of spend to landed work is a real efficiency number, not a vanity one.
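The four questions map onto four numbers that can be computed from the same set of PR records. The sketch below is a hedged example: the `PullRequest` shape, the `agent_spend_usd` field, and the use of the median for cycle time are assumptions made for illustration, not a description of any particular tool's data model.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical PR record for one reporting period."""
    number: int
    started_at: datetime            # when work on the PR began
    merged_at: Optional[datetime]   # None if still open or closed unmerged
    agent_spend_usd: float          # assumed per-PR agent/token cost


def conversion_report(prs):
    """Answer the four questions for a list of PRs."""
    landed = [p for p in prs if p.merged_at is not None]
    cycle_hours = [
        (p.merged_at - p.started_at).total_seconds() / 3600 for p in landed
    ]
    total_spend = sum(p.agent_spend_usd for p in prs)
    return {
        # How much of it landed?
        "landed_share": len(landed) / len(prs) if prs else None,
        # How long did it take to land?
        "median_cycle_hours": median(cycle_hours) if cycle_hours else None,
        # How much did not land (the All-minus-Landed gap)?
        "not_landed": len(prs) - len(landed),
        # How much spend per unit of landed work? Spend on work that
        # never landed is included in the numerator on purpose.
        "spend_per_landed_usd": total_spend / len(landed) if landed else None,
    }
```

Dividing total spend by landed PRs, instead of only the spend on merged PRs, is a deliberate choice here: it charges the cost of discarded work to what shipped, which is the efficiency the metric is meant to expose.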

These compose with the established frameworks rather than replacing them. DORA metrics still measure delivery performance; deployment frequency and change failure rate still matter. Landed vs. All adds a dimension those frameworks were not designed to capture: when generation is cheap, you need to measure conversion, not just throughput and stability.

How Macroscope Surfaces Landed vs. All

Macroscope makes Landed vs. All a one-click toggle on engineering activity, so you can see who shipped what without building a reporting pipeline. Flip from All to Landed and the view recomputes to count only work merged to the default branch — excluding in-progress feature branches and PRs that closed without merging — using exactly the definitions above.

This is the same product that runs AI code review on every pull request, which is what makes the Landed view trustworthy rather than just a filter. Because Macroscope reads the codebase and the full PR history, "landed" reflects what genuinely merged and survived, and the activity it attributes is grounded in real review signal rather than raw commit noise. The result is a view an engineering leader can act on: in a period where agents opened a lot of PRs, how much of that actually became product, and where did the rest go?

The toggle answers the question this whole metric exists for. It is hard to tell, in the age of AI and agents, who got what done — because the activity stream is louder than ever and less correlated with progress than ever. Landed vs. All cuts through that by separating attempted from shipped.

Code Is Cheap. Shipping Is the Bottleneck.

The strategic point underneath the metric is simple: the winning teams will not be the ones that generate the most code. They will be the ones that turn the highest share of generated work into production outcomes.

When coding was the constraint, output was a fair proxy for progress, and tooling that produced more code looked like a win. Now that coding is abundant, the constraint moved downstream — to deciding what is worth building, reviewing it well, and getting it merged safely. A team that generates ten times more code but lands the same amount has not gotten faster. It has gotten noisier, and possibly more expensive.

That is why Landed vs. All is worth paying attention to. It measures the bottleneck instead of the abundance. It rewards judgment and shipping over volume and activity. And in an era where any team can generate enormous amounts of code on demand, the share of that code which actually lands is one of the few numbers that still separates motion from progress.

For the broader discipline of measuring engineering output without the vanity metrics, see how to measure developer productivity in 2026; for the platform landscape, see the engineering productivity analytics review.

Frequently Asked Questions

What is the Landed vs. All metric?

Landed vs. All is an engineering metric that compares all work committed during a period against only the work that merged to the default branch. "All" includes in-progress feature branches, experiments, and PRs that closed without merging. "Landed" includes only work that actually merged. The gap between the two measures how efficiently effort and AI-generated output converted into real product change. It matters most in the age of AI coding, when code generation is cheap and raw volume no longer signals progress.

Why does volume stop being a good productivity signal with AI coding agents?

Because the marginal cost of generating code approaches zero. When a coding agent can open dozens of PRs and produce thousands of lines for pennies, commit counts and lines of code measure your tooling's throughput, not your team's progress. The correlation that activity metrics used to rely on — that a human had to produce each unit — is gone. The useful question shifts from "how much did we produce?" to "how much of what we produced actually shipped?"

Does a large gap between Landed and All mean a team is doing bad work?

No. Iteration and exploration are healthy, and discarded work is often the cost of good judgment rather than the absence of it. A team doing hard, novel work will naturally have a wider gap than one doing routine maintenance. The gap is useful as a team-level trend, not an individual verdict: a gap that widens sharply after adopting an agent is worth investigating, but the metric should never be used to rank individual engineers, or it gets gamed like any other.

How is Landed vs. All different from DORA metrics?

DORA metrics (deployment frequency, lead time for changes, change failure rate, and mean time to recovery) measure delivery performance and stability. Landed vs. All measures conversion: what share of generated work becomes shipped product. They are complementary. DORA predates cheap AI code generation and assumes the work being measured is work someone chose to do. Landed vs. All adds the dimension that matters when generation is abundant: how much of the attempted work actually lands.

What does "landed" mean exactly?

Landed means the work merged to the default branch during the period. It explicitly excludes in-progress feature-branch work and pull requests that closed without merging. So a feature branch that is still open, or a PR that was closed without being merged, counts toward "All" but not toward "Landed." The Landed view is the picture of what actually became product, as opposed to everything that was attempted.

How does Macroscope measure Landed vs. All?

Macroscope provides a one-click toggle between All and Landed on engineering activity. The All view counts everything committed in the period; the Landed view recomputes to count only work merged to the default branch, excluding in-progress branches and closed-unmerged PRs. Because Macroscope also runs AI code review on every pull request and reads the full codebase and PR history, the Landed view reflects what genuinely merged and survived — making it a view engineering leaders can act on rather than just a raw commit filter.

What questions should engineering leaders ask in the age of AI coding?

Beyond "how much did we produce?", ask: how much of it landed, how long did it take to land, how much generated work was discarded along the way, and how much spend translated into shipped value. These measure the scarce resources in the AI era — judgment, prioritization, and shipping — rather than the abundant one, which is code generation. Read them at the team level and over time, alongside DORA metrics and PR cycle time, not as individual scorecards.

Is "landed" the same as "deployed to production"?

Not exactly. Landed means the work merged to the default branch, which is the point where it becomes part of the shared codebase and is in line to ship. Deployment to production is a separate, downstream step measured by DORA's deployment frequency. Landed vs. All is about how much attempted work converts to merged product change; deployment frequency is about how often that merged work reaches users. Teams with continuous deployment see the two converge, but they answer different questions.