What Are DORA Metrics? The Developer's Guide to Measuring Engineering Performance
DORA metrics are four key measurements that quantify software delivery performance: deployment frequency, lead time for changes, change failure rate, and mean time to restore service (MTTR). Developed by the DevOps Research and Assessment (DORA) team, now part of Google Cloud, these metrics provide an empirical framework for evaluating how effectively engineering teams ship software and recover from failures.
Since their introduction in the Accelerate book by Nicole Forsgren, Jez Humble, and Gene Kim, DORA metrics have become the industry standard for benchmarking engineering performance. Google's annual State of DevOps reports, which draw on data from over 36,000 professionals worldwide, consistently show that teams scoring "elite" on DORA metrics deliver software 973 times more frequently than low performers while maintaining higher stability. These are not vanity metrics. They are predictive indicators of organizational performance, team health, and business outcomes.
What Are the Four DORA Metrics?
Each DORA metric captures a distinct dimension of software delivery. Two measure throughput (speed), and two measure stability (reliability). The power of the framework is that it forces teams to optimize for both simultaneously, rather than sacrificing one for the other.
1. Deployment Frequency
Deployment frequency measures how often your team deploys code to production. It is the most visible indicator of throughput. Teams that deploy frequently tend to ship smaller, lower-risk changes that are easier to review, test, and roll back if something goes wrong.
What counts as a "deployment" depends on your architecture. For monolithic applications, it is a full release. For microservice architectures, it is a deployment of any individual service. The important thing is consistency in how you define and count it.
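Once you have settled on a definition, the computation itself is simple. The sketch below assumes you have already exported production deployment events (here, hypothetical hard-coded dates) from your CI/CD system:

```python
from datetime import date

# Hypothetical deployment events pulled from a CI/CD system's API.
# Each entry is the date a production deployment finished.
deploy_dates = [
    date(2024, 3, 4), date(2024, 3, 4), date(2024, 3, 5),
    date(2024, 3, 7), date(2024, 3, 8),
]

def deployment_frequency(dates, period_days):
    """Average number of production deployments per day over the period."""
    return len(dates) / period_days

# Five deployments over a five-day window -> 1.0 per day.
print(f"{deployment_frequency(deploy_dates, period_days=5):.1f} deployments/day")
```

The hard part is not the arithmetic but keeping the event source consistent: count the same kind of deployment event, from the same system, every period.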
2. Lead Time for Changes
Lead time for changes measures the time from a developer's first commit to that code running in production. It captures everything in between: code review, CI/CD pipeline execution, staging validation, and the actual deployment.
This metric exposes bottlenecks in your delivery pipeline. A team with fast coding but slow review processes will see long lead times. A team with quick reviews but manual deployment gates will see the same. Lead time makes the invisible visible.
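As a minimal sketch (with hypothetical hard-coded timestamps standing in for real commit and deployment data), lead time per change is just the difference between two timestamps, and the median across changes is a robust summary:

```python
from datetime import datetime
from statistics import median

# Hypothetical records: first-commit and production-deploy timestamps per change.
changes = [
    {"first_commit": datetime(2024, 3, 4, 9, 0),  "deployed": datetime(2024, 3, 4, 15, 30)},
    {"first_commit": datetime(2024, 3, 5, 10, 0), "deployed": datetime(2024, 3, 7, 11, 0)},
    {"first_commit": datetime(2024, 3, 6, 14, 0), "deployed": datetime(2024, 3, 6, 16, 0)},
]

def lead_times_hours(records):
    """Lead time for each change: first commit to production, in hours."""
    return [(r["deployed"] - r["first_commit"]).total_seconds() / 3600
            for r in records]

# Median is preferable to the mean here: one stuck PR can skew an average badly.
print(f"median lead time: {median(lead_times_hours(changes)):.1f} h")
```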
3. Change Failure Rate
Change failure rate is the percentage of deployments that cause a failure in production, requiring a hotfix, rollback, or patch. It is the primary stability metric and acts as a counterbalance to deployment frequency. Deploying 50 times a day means nothing if 30% of those deployments break something.
A common mistake is defining "failure" too narrowly. Change failure rate should include any deployment that degrades service, triggers an incident, requires an emergency fix, or forces a rollback. Being honest about failures is the only way this metric provides value.
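With that broad definition of failure in hand, the metric reduces to a ratio. A minimal sketch, assuming each deployment has already been labeled as failed or not:

```python
# Hypothetical deployment log: True means the deployment caused a production
# failure (hotfix, rollback, or incident), under the broad definition above.
deployments = [False, False, True, False, False, False, True, False, False, False]

def change_failure_rate(outcomes):
    """Fraction of deployments that caused a production failure."""
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)

# 2 failures across 10 deployments -> 20%.
print(f"change failure rate: {change_failure_rate(deployments):.0%}")
```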
4. Mean Time to Restore (MTTR)
MTTR measures how long it takes to restore service after a production failure. It starts when the failure is detected and ends when the service is fully restored to normal operation. This metric captures both your monitoring capabilities (how fast you detect problems) and your incident response effectiveness (how fast you fix them).
Low MTTR does not mean you never have incidents. It means your team has the tooling, processes, and confidence to recover quickly. Elite teams treat fast recovery as more important than preventing every possible failure.
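Given incident records with detection and restoration timestamps (hypothetical values below), MTTR is a straightforward mean of durations:

```python
from datetime import datetime

# Hypothetical incidents: (detection time, time service was fully restored).
incidents = [
    (datetime(2024, 3, 4, 10, 0),  datetime(2024, 3, 4, 10, 45)),   # 45 min
    (datetime(2024, 3, 9, 2, 15),  datetime(2024, 3, 9, 4, 15)),    # 120 min
    (datetime(2024, 3, 20, 16, 0), datetime(2024, 3, 20, 16, 30)),  # 30 min
]

def mttr_minutes(records):
    """Mean time from detection to full restoration, in minutes."""
    durations = [(restored - detected).total_seconds() / 60
                 for detected, restored in records]
    return sum(durations) / len(durations)

# (45 + 120 + 30) / 3 = 65 minutes.
print(f"MTTR: {mttr_minutes(incidents):.0f} min")
```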
How Do DORA Metrics Benchmarks Work?
The DORA team classifies teams into four performance tiers based on their metrics. These benchmarks are updated annually in the State of DevOps report, so treat the thresholds below as representative recent figures rather than permanent cutoffs:
| Metric | Elite | High | Medium | Low |
|---|---|---|---|---|
| Deployment Frequency | Multiple times per day | Between once per week and once per month | Between once per month and once every 6 months | Fewer than once every 6 months |
| Lead Time for Changes | Less than 1 hour | Between 1 day and 1 week | Between 1 month and 6 months | More than 6 months |
| Change Failure Rate | 0-5% | 6-10% | 11-15% | 16-30%+ |
| Mean Time to Restore | Less than 1 hour | Less than 1 day | Between 1 day and 1 week | More than 6 months |
Two things stand out in these benchmarks. First, the gap between elite and low performers is enormous. Elite teams deploy orders of magnitude more frequently while maintaining lower failure rates. Second, throughput and stability are not trade-offs. The data consistently shows that elite teams score high on both dimensions.
How Do You Measure DORA Metrics?
Measuring DORA metrics accurately requires pulling data from multiple systems across your software delivery pipeline. Here is where each metric typically comes from.
Deployment frequency comes from your CI/CD system or deployment tooling. If you use GitHub Actions, GitLab CI, or ArgoCD, you can count production deployment events. Some teams use feature flag activation events from tools like LaunchDarkly as a proxy for deployment frequency.
Lead time for changes requires correlating data from your version control system (first commit timestamp) with your deployment system (production deployment timestamp). This is where many teams struggle, because the data lives in different systems that are not natively connected.
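One common way to bridge the two systems is to join on the commit SHA, which both version control and deployment tooling usually record. A sketch under that assumption, with hypothetical SHAs and timestamps:

```python
from datetime import datetime

# Hypothetical exports from two disconnected systems, keyed by commit SHA.
commits = {  # version control: SHA -> first-commit timestamp
    "a1b2c3": datetime(2024, 3, 4, 9, 0),
    "d4e5f6": datetime(2024, 3, 5, 10, 0),
}
deploys = {  # deployment tooling: SHA -> production-deploy timestamp
    "a1b2c3": datetime(2024, 3, 4, 15, 0),
    "d4e5f6": datetime(2024, 3, 7, 10, 0),
}

def correlate_lead_times(commits, deploys):
    """Join the two systems on commit SHA and compute lead time in hours."""
    return {
        sha: (deploys[sha] - committed).total_seconds() / 3600
        for sha, committed in commits.items()
        if sha in deploys  # skip commits not yet deployed (still in flight)
    }

print(correlate_lead_times(commits, deploys))  # {'a1b2c3': 6.0, 'd4e5f6': 48.0}
```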
Change failure rate requires a definition of "failure" and a way to link deployments to incidents. If you use an incident management tool (PagerDuty, OpsGenie, or a custom system), you can match incident timestamps against deployment timestamps. Some teams tag deployments that triggered rollbacks in their deployment tooling.
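One simple heuristic for linking the two data sources is to blame a deployment when an incident starts within a short window after it. The sketch below uses a hypothetical two-hour window and hard-coded timestamps; real attribution is messier, but this illustrates the matching logic:

```python
from datetime import datetime

# Hypothetical timestamps: production deploys and incident start times.
deploy_times = [
    datetime(2024, 3, 4, 15, 0),
    datetime(2024, 3, 6, 11, 0),
    datetime(2024, 3, 8, 9, 0),
]
incident_starts = [datetime(2024, 3, 6, 11, 40)]

def deploys_blamed(deploys, incidents, window_hours=2):
    """Mark a deployment as failed if an incident started within the window after it."""
    failed = set()
    for inc in incidents:
        for dep in deploys:
            if 0 <= (inc - dep).total_seconds() <= window_hours * 3600:
                failed.add(dep)
    return failed

blamed = deploys_blamed(deploy_times, incident_starts)
# One of three deployments is blamed -> 33% change failure rate.
print(f"change failure rate: {len(blamed) / len(deploy_times):.0%}")
```

Timestamp proximity is only a proxy; explicitly tagging rollbacks and hotfixes in your deployment tooling gives a more reliable signal.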
MTTR comes from your incident management and monitoring systems. The start time is when an alert fires or an incident is opened. The end time is when the incident is resolved and service is restored.
Common Measurement Pitfalls
Cherry-picking the definition of "deployment." Some teams only count major releases, which inflates deployment frequency artificially. Count every production deployment consistently.
Ignoring queue time in lead time. Lead time should include the time a PR sits waiting for review, not just the time the CI pipeline runs. If a developer's code waits 3 days for review, that is lead time.
Underreporting change failures. If a deployment causes a minor bug that the team fixes with a quick follow-up commit, that still counts as a change failure. Underreporting leads to false confidence.
Measuring MTTR from acknowledgment, not detection. MTTR should start when the failure occurs or is detected, not when someone acknowledges the alert. If your monitoring has a 30-minute blind spot, that is part of your MTTR.
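The last pitfall is easy to see with numbers. In this hypothetical timeline, measuring from acknowledgment rather than detection halves the reported MTTR and hides the 30 minutes the alert sat unanswered:

```python
from datetime import datetime

# Hypothetical incident timeline illustrating the detection-vs-acknowledgment gap.
failure_detected = datetime(2024, 3, 4, 10, 0)   # alert fires
acknowledged     = datetime(2024, 3, 4, 10, 30)  # engineer picks it up
restored         = datetime(2024, 3, 4, 11, 0)   # service back to normal

from_detection = (restored - failure_detected).total_seconds() / 60
from_ack       = (restored - acknowledged).total_seconds() / 60

print(f"MTTR from detection: {from_detection:.0f} min")  # the honest number
print(f"MTTR from ack:       {from_ack:.0f} min")        # understates by 30 min
```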
Why Do DORA Metrics Matter for Engineering Teams?
DORA metrics matter because they replace subjective opinions about engineering performance with objective measurements. Without them, discussions about team effectiveness devolve into anecdotes: "We feel like we are shipping faster" or "I think our quality has improved." DORA metrics turn these conversations into data-driven assessments.
They also matter because the research behind them is unusually rigorous. The DORA team's methodology uses cluster analysis rather than arbitrary cutoffs, meaning the performance tiers emerge from the data itself. This is not a consulting framework someone invented. It is empirical research validated across thousands of organizations over nearly a decade.
For engineering leaders, DORA metrics provide a common language for communicating with business stakeholders. Deployment frequency and lead time translate directly to "how fast can we ship features." Change failure rate and MTTR translate to "how reliable is our software." Executives understand speed and reliability even if they do not understand sprint velocity or story points.
How Do DORA Metrics Connect to Engineering Intelligence?
Raw DORA metrics tell you what is happening. Engineering intelligence platforms tell you why.
If your deployment frequency dropped 40% this quarter, the DORA metric flags the problem. But understanding the cause requires deeper analysis. Is it a review bottleneck? A flaky CI pipeline? A shift toward larger, riskier PRs? Answering these questions requires connecting deployment data to code review data, CI/CD data, and team activity data.
Platforms like Macroscope approach this by connecting code-level analysis (PR cycle time, review patterns, code complexity) to delivery metrics (deployment frequency, lead time). When a team's lead time increases, you can drill into whether the cause is slower reviews, longer CI runs, or changes sitting in a merge queue. This is the difference between knowing your metrics and acting on them.
DORA metrics are also most valuable when tracked over time rather than as point-in-time snapshots. Trends matter more than absolute numbers. A team moving from "low" to "medium" deployment frequency is making meaningful progress, even if they are not yet "elite." The trajectory tells you whether your process improvements are working.
What Are the Limitations of DORA Metrics?
DORA metrics are powerful, but they are not comprehensive. There are several things they do not capture.
Developer experience. A team can have elite DORA metrics while developers are burned out. High deployment frequency achieved through unsustainable hours is not a sign of health. Pair DORA metrics with developer satisfaction surveys and workload analysis.
Code quality. DORA metrics measure delivery performance, not code quality. A team can ship fast with low failure rates while accumulating significant technical debt. Complement DORA metrics with code health indicators like test coverage, cyclomatic complexity, and dependency freshness.
Business impact. Deploying frequently does not mean you are building the right things. DORA metrics measure how fast you ship, not whether what you ship creates value. Product metrics (adoption, retention, revenue impact) fill this gap.
Individual performance. DORA metrics are team-level measurements. Using them to evaluate individual developers is a misuse of the framework. Individuals contribute to team DORA outcomes, but isolating individual impact is both difficult and counterproductive.
Frequently Asked Questions
How often should we measure DORA metrics?
Measure continuously and review trends monthly or quarterly. Daily measurement provides the raw data, but weekly or monthly aggregation smooths out noise and reveals meaningful trends. Avoid reacting to day-to-day fluctuations, which are often caused by release schedules, holidays, or one-off incidents rather than systemic issues.
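A simple way to get that smoothing is to roll raw deployment events up into per-week counts. A minimal sketch with hypothetical dates:

```python
from datetime import date
from collections import Counter

# Hypothetical daily deployment dates spanning two weeks.
deploy_dates = [
    date(2024, 3, 4), date(2024, 3, 4), date(2024, 3, 6),
    date(2024, 3, 11), date(2024, 3, 13), date(2024, 3, 14), date(2024, 3, 15),
]

def weekly_counts(dates):
    """Aggregate raw deployment events into per-ISO-week counts."""
    return Counter(d.isocalendar()[:2] for d in dates)  # (year, week) -> count

for (year, week), n in sorted(weekly_counts(deploy_dates).items()):
    print(f"{year}-W{week}: {n} deployments")
```

Comparing week-over-week counts like these reveals the trend without the day-to-day noise.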
Can small teams benefit from DORA metrics?
Yes. DORA metrics are valuable for teams of any size. Small teams (under 10 engineers) often see the most immediate benefits because improvements are easier to implement and the feedback loop is shorter. The benchmarks still apply, though small teams should focus on trends rather than comparing themselves to large-organization benchmarks.
What is the relationship between DORA metrics and SPACE?
SPACE (Satisfaction, Performance, Activity, Communication, Efficiency) is a broader framework for developer productivity proposed by researchers at Microsoft and GitHub. DORA metrics map most directly to the Performance and Efficiency dimensions of SPACE. The frameworks are complementary. DORA provides specific, measurable metrics. SPACE provides a broader lens that includes subjective dimensions like developer satisfaction.
Should we set targets for DORA metrics?
Set directional targets (improve lead time by 20%) rather than absolute targets (achieve lead time under 1 hour). Absolute targets encourage gaming. Directional targets encourage genuine improvement. Use the DORA benchmarks as a reference point for what is achievable, not as a mandatory goal.
How do DORA metrics apply to platform and infrastructure teams?
Platform teams can adapt DORA metrics to their context. Deployment frequency becomes "how often do we release platform updates." Lead time becomes "how quickly can a product team go from requesting a platform capability to using it." Change failure rate and MTTR apply directly, since platform failures affect all downstream teams. The principles are the same even if the specifics differ.
