The Hidden AI Coding Metric: Time From Idea to Merged PR
Every pitch deck for an AI coding tool includes a version of the same claim: "Our tool makes developers X% more productive." The number varies — 30%, 50%, sometimes even 100% — but the underlying measurement is almost always the same: lines of code generated, completion acceptance rate, or time to write a function. These metrics are not wrong, exactly. But they measure the wrong thing. They measure typing speed. And typing speed has almost nothing to do with how fast your engineering team ships software.
On the TBPN show, Jordi Hays has been hammering this point for months: "If your AI coding tool metric is 'lines of code per hour,' you are measuring the speedometer instead of checking whether you are driving to the right destination." He is right. The metric that actually matters for startup engineering teams is one that almost nobody tracks explicitly: time from idea to merged PR. This is the end-to-end metric that captures everything — from the moment someone identifies a feature, bug fix, or improvement to the moment the implementing PR is merged (and, for teams practicing continuous deployment, live in production shortly after). And AI coding tools affect this metric in ways that are far more nuanced than "developers type faster."
This post defines the metric, breaks it down into measurable sub-components, explains how AI coding tools affect each stage, establishes benchmarks for what "good" looks like at different team sizes, and gives you a practical guide to setting up a dashboard that tracks it. If you care about engineering velocity — and if you are a startup founder or CTO, you should care about little else — this is the framework you need.
Why "Lines of Code" Is the Wrong Metric
The Problem With Output-Based Metrics
Lines of code as a productivity metric has been debunked so many times that it feels embarrassing to have to explain why it is wrong again. But the AI coding tool industry has resurrected it under new names — "completions per hour," "suggestions accepted," "code generated" — that obscure the same fundamental flaw: more code is not better code. In fact, the best engineering work often involves deleting code, simplifying systems, and finding solutions that require fewer lines rather than more.
AI coding tools exacerbate this problem. A tool that generates 500 lines of boilerplate code in 30 seconds looks incredibly productive by output-based metrics. But if that boilerplate introduces a subtle bug that takes three hours to debug, or if it adds technical debt that slows future development, the net impact on engineering velocity is negative. The tool generated code quickly. The team shipped software slowly. The metrics say everything is great while reality says otherwise.
What Actually Slows Down Software Delivery
When you decompose the time between "we should build this" and "this is in production," the coding phase — the part that AI tools directly accelerate — typically represents only 20-30% of the total elapsed time. The rest is consumed by:
- Specification and design (10-15%): Understanding the requirements, designing the solution, making architectural decisions
- Code review (15-25%): Waiting for reviews, responding to feedback, iterating on changes
- Testing and QA (10-20%): Writing tests, running test suites, fixing test failures, manual QA
- CI/CD and deployment (5-10%): Pipeline execution, staging verification, deployment
- Waiting and context switching (15-25%): Waiting for reviewer availability, blocked by dependencies, context switching between tasks
Even if an AI coding tool makes the coding phase 50% faster (a generous estimate for routine tasks), the impact on total delivery time is only 10-15% — because you only accelerated 20-30% of the overall process. To meaningfully accelerate delivery, you need a metric that captures the entire pipeline and a strategy that addresses bottlenecks wherever they occur, not just in the coding phase.
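The arithmetic above generalizes to any single-phase speedup. A quick back-of-envelope sketch (the 25% and 50% figures are illustrative, matching the ranges in the text):

```python
def total_speedup(phase_fraction: float, phase_speedup: float) -> float:
    """Amdahl's-law-style estimate: overall reduction in delivery time
    when only one phase of the pipeline is accelerated.

    phase_fraction: share of total elapsed time the phase occupies (0-1)
    phase_speedup:  fractional time saved within that phase (0-1)
    """
    return phase_fraction * phase_speedup

# Coding is ~25% of delivery time; an AI tool makes it 50% faster.
saving = total_speedup(phase_fraction=0.25, phase_speedup=0.50)
print(f"End-to-end time saved: {saving:.1%}")  # prints 12.5% -- squarely in the 10-15% range
```

The takeaway is structural, not numerical: no matter how large `phase_speedup` gets, the end-to-end gain is capped by `phase_fraction`.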
Defining the Metric: Time From Idea to Merged PR
What It Measures
Time from idea to merged PR (which we will abbreviate as TIMP) is the elapsed time between two events: (1) a work item is created (a ticket in Linear, Jira, or your project management tool) and (2) the PR that implements that work item is merged into the main branch. This metric captures the full engineering delivery cycle — specification, implementation, review, testing, and merge — in a single number.
TIMP is not a new concept, but it has not been widely adopted as a primary engineering metric because it is harder to measure than simple output metrics. You cannot calculate TIMP by looking at Git history alone — you need to link work items to PRs and measure the elapsed time between them. This linkage requires discipline (engineers need to reference ticket IDs in their PRs) and tooling (your project management tool and Git hosting need to be connected). But the effort is worth it because TIMP captures what actually matters: how quickly your team converts ideas into shipped software.
The TIMP Decomposition
TIMP can be decomposed into four sub-metrics that correspond to the stages of the engineering delivery cycle. Each sub-metric is independently measurable and independently actionable:
- Time to First Commit (TFC): The elapsed time from ticket creation to the first commit that references the ticket. This measures how quickly an engineer begins implementation after a work item is defined. TFC is affected by sprint planning, ticket clarity, engineer availability, and ramp-up time for understanding the problem.
- Time to PR Open (TPO): The elapsed time from first commit to PR creation. This measures the implementation phase — the actual coding work. TPO is the sub-metric most directly affected by AI coding tools, since faster code generation, test writing, and debugging all reduce the time between first commit and a PR that is ready for review.
- Time to First Review (TFR): The elapsed time from PR creation to the first review comment. This measures reviewer responsiveness and is often the longest single sub-metric in the TIMP decomposition. TFR is affected by reviewer availability, PR size (larger PRs wait longer for review), and team review culture.
- Time to Merge (TTM): The elapsed time from first review to merge. This measures the review-iterate-approve cycle — how quickly the team resolves review feedback, makes requested changes, and reaches approval. TTM is affected by code quality (fewer issues = faster approval), review thoroughness, and CI pipeline speed.
By measuring each sub-metric independently, you can identify exactly where your engineering delivery pipeline is bottlenecked and target your improvement efforts accordingly. A team with fast TPO but slow TFR does not need a better coding tool — they need a better review culture. A team with fast TFR but slow TTM does not need faster reviewers — they need higher code quality or faster CI pipelines.
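Given the five timestamps, the decomposition is straightforward to compute. A minimal sketch in Python (the `WorkItem` field names are hypothetical — map them to whatever your ticketing and Git APIs actually return):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class WorkItem:
    """Timestamps for one ticket/PR pair (hypothetical field names)."""
    ticket_created: datetime
    first_commit: datetime
    pr_opened: datetime
    first_review: datetime
    merged: datetime

def timp_breakdown(item: WorkItem) -> dict[str, timedelta]:
    """Decompose end-to-end TIMP into the four sub-metrics."""
    return {
        "TFC": item.first_commit - item.ticket_created,
        "TPO": item.pr_opened - item.first_commit,
        "TFR": item.first_review - item.pr_opened,
        "TTM": item.merged - item.first_review,
        "TIMP": item.merged - item.ticket_created,  # sum of the four stages
    }

item = WorkItem(
    ticket_created=datetime(2024, 6, 3, 9, 0),
    first_commit=datetime(2024, 6, 3, 13, 0),
    pr_opened=datetime(2024, 6, 3, 17, 30),
    first_review=datetime(2024, 6, 4, 9, 30),
    merged=datetime(2024, 6, 4, 14, 0),
)
for name, delta in timp_breakdown(item).items():
    print(f"{name}: {delta}")
```

Note how the overnight wait for review (TFR) dominates this example item — exactly the kind of bottleneck the decomposition is designed to surface.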
How AI Coding Tools Affect Each Stage
Impact on Time to First Commit (TFC)
AI coding tools have a moderate impact on TFC. Tools like Claude Code can help engineers understand unfamiliar codebases more quickly by answering questions about architecture, explaining complex code, and identifying relevant files — reducing the ramp-up time before an engineer feels confident enough to start coding. Cursor's codebase indexing serves a similar function, helping engineers navigate large repositories and understand existing patterns.
The practical impact: AI tools typically reduce TFC by 15-30% for tickets that involve unfamiliar code. For tickets in well-understood areas of the codebase, the impact on TFC is minimal because the ramp-up time is already short.
Impact on Time to PR Open (TPO)
This is where AI coding tools have their most visible impact, and it is substantial. For routine implementation tasks — adding a new API endpoint, building a UI component, writing data processing logic — AI tools can reduce TPO by 40-60%. The acceleration comes from multiple sources:
- Faster code generation: AI-assisted completion and generation means engineers spend less time writing boilerplate and more time on the unique logic of their feature
- Automated test writing: Claude Code can generate comprehensive test suites for new features, reducing the time engineers spend writing tests from hours to minutes
- Faster debugging: AI tools can identify and fix bugs more quickly by analyzing error messages, suggesting fixes, and iterating until tests pass
- Reduced context switching: Inline AI assistance means engineers do not need to leave their editor to search documentation, Stack Overflow, or internal wikis
For complex, novel implementation tasks — designing a new system architecture, solving a problem nobody has solved before — the impact on TPO is smaller (10-20%) because the bottleneck is thinking, not typing.
Impact on Time to First Review (TFR)
AI coding tools have an indirect but important impact on TFR. The primary driver of TFR is reviewer availability — how long a PR sits waiting for someone to look at it. AI tools can reduce TFR in two ways:
- Smaller PRs: AI tools enable engineers to work in smaller increments because the cost of creating a new PR (writing code, writing tests, writing a description) is lower. Smaller PRs get reviewed faster because reviewers can process them in less time
- AI-assisted pre-review: Tools like Claude Code can perform a first-pass review before a human reviewer sees the PR, catching obvious issues and reducing the burden on human reviewers. This makes human reviewers more willing to pick up PRs quickly because they know the easy issues have already been flagged
The practical impact: teams that adopt AI coding tools and simultaneously adopt a culture of smaller, more frequent PRs see TFR reductions of 25-40%. The tool alone does not cause this — it enables a workflow change that causes it.
Impact on Time to Merge (TTM)
AI tools improve TTM by improving code quality at submission time. When AI tools help write better code and more comprehensive tests, PRs receive fewer review comments, require fewer revision cycles, and reach approval faster. Additionally, AI-assisted response to review comments — using Cursor or Claude Code to quickly implement requested changes — reduces the back-and-forth time that often dominates TTM.
The practical impact: 20-35% reduction in TTM, driven primarily by fewer revision cycles and faster implementation of reviewer-requested changes.
Benchmarks: What Good Looks Like
Benchmarks by Team Size
| Metric | 5-Person Startup | 15-Person Startup | 50-Person Team |
|---|---|---|---|
| TIMP (median) | 1-2 days | 2-4 days | 3-7 days |
| TFC | < 4 hours | < 8 hours | < 1 day |
| TPO | 2-6 hours | 4-12 hours | 1-2 days |
| TFR | < 2 hours | < 4 hours | < 8 hours |
| TTM | < 4 hours | < 8 hours | < 1 day |
| Deployment frequency | Multiple per day | Daily | Daily to weekly |
These benchmarks assume a team using AI coding tools effectively and following modern engineering practices (small PRs, continuous deployment, automated testing). Teams not using AI tools should expect metrics 30-50% slower than these benchmarks.
The DORA Metrics Overlap
If you are familiar with the DORA (DevOps Research and Assessment) metrics — deployment frequency, lead time for changes, change failure rate, and time to restore service — you will notice significant overlap with our TIMP framework. This is intentional. TIMP is essentially a more granular decomposition of DORA's "lead time for changes" metric, broken into sub-components that are individually actionable.
The key addition this framework makes to DORA is the bug introduction rate, a companion metric defined in the next section. AI coding tools can increase deployment frequency and reduce lead time while simultaneously introducing more bugs — a trade-off that DORA metrics alone might miss. By tracking bug introduction rate alongside TIMP, you get a complete picture of whether your AI-assisted engineering pipeline is actually improving or just moving faster while breaking more things.
Sub-Metrics That Complete the Picture
Bug Introduction Rate
Bug introduction rate measures how many bugs are introduced per PR merged. This is critical for evaluating AI coding tools because there is a real risk that faster code generation leads to more bugs if engineers do not review AI-generated code carefully. Track this as: (number of bug-fix PRs linked to a feature PR) / (total feature PRs merged). A healthy rate is below 0.1 (fewer than 1 in 10 feature PRs introduces a bug that requires a follow-up fix).
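As a concrete sketch of the formula above (the counts are illustrative):

```python
def bug_introduction_rate(feature_prs: int, linked_bugfix_prs: int) -> float:
    """Bug-fix PRs traced back to a feature PR, per feature PR merged."""
    if feature_prs == 0:
        raise ValueError("no feature PRs merged in the window")
    return linked_bugfix_prs / feature_prs

# Illustrative window: 42 feature PRs merged, 3 of which needed a follow-up fix.
rate = bug_introduction_rate(feature_prs=42, linked_bugfix_prs=3)
print(f"bug introduction rate: {rate:.2f}")    # prints 0.07
print("healthy" if rate < 0.1 else "investigate")  # prints healthy
```

The hard part in practice is the linkage, not the division: you need a convention (e.g. a "fixes" reference in the bug-fix PR description) that ties each fix back to the feature PR that introduced the bug.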
Review Turnaround
Review turnaround measures the average time between each round of review feedback. If your first review happens quickly (low TFR) but the back-and-forth takes three rounds of comments over two days, your TTM is still slow. Track the number of review rounds per PR and the average time between rounds. Well-functioning teams average 1.5-2 review rounds per PR with less than 4 hours between rounds.
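Given the timestamp of the first comment in each review round, both numbers fall out directly. A sketch (how comments are grouped into rounds is assumed to happen upstream, e.g. by reviewer session):

```python
from datetime import datetime, timedelta
from statistics import mean

def review_turnaround(round_starts: list[datetime]) -> tuple[int, timedelta]:
    """Number of review rounds and mean gap between consecutive rounds.

    round_starts: timestamp of the first comment in each round.
    """
    gaps = [b - a for a, b in zip(round_starts, round_starts[1:])]
    avg_gap = timedelta(seconds=mean(g.total_seconds() for g in gaps)) if gaps else timedelta(0)
    return len(round_starts), avg_gap

rounds, gap = review_turnaround([
    datetime(2024, 6, 4, 9, 30),
    datetime(2024, 6, 4, 12, 30),
])
print(rounds, gap)  # 2 rounds, 3:00:00 apart -- inside the 4-hour target
```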
PR Size Distribution
Track the distribution of PR sizes (measured in lines changed). Research consistently shows that smaller PRs are reviewed faster, have fewer bugs, and merge more quickly. AI coding tools should enable smaller PRs by making it cheaper to create a well-structured, well-tested incremental change. If your AI tool adoption is increasing average PR size (because engineers are generating more code per sitting), that is a warning signal — you are optimizing for output volume rather than delivery velocity.
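One lightweight way to watch this distribution is to track the median alongside a high percentile, since a creeping p90 reveals oversized PRs even when the median looks healthy. A sketch using only the standard library (the sample sizes are illustrative):

```python
from statistics import median, quantiles

def pr_size_summary(lines_changed: list[int]) -> dict[str, float]:
    """Median and 90th-percentile PR size; a drifting p90 flags oversized PRs."""
    return {
        "median": float(median(lines_changed)),
        "p90": quantiles(lines_changed, n=10)[-1],  # 90th percentile
    }

# One 900-line PR hides easily in the median but dominates the p90.
sizes = [40, 85, 120, 60, 30, 900, 75, 50, 110, 95]
print(pr_size_summary(sizes))
```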
Tools to Track TIMP
LinearB
LinearB is the most comprehensive engineering metrics platform for tracking TIMP-style metrics. It integrates with GitHub/GitLab, Jira/Linear, and CI/CD pipelines to automatically calculate cycle time, PR pickup time, review time, and deployment frequency. LinearB's "Workflow Dashboard" provides a visual decomposition of the engineering delivery pipeline that maps closely to our TIMP sub-metrics. Pricing starts at approximately $20/developer/month for the Team tier.
Sleuth
Sleuth focuses specifically on DORA metrics and deployment tracking. It is particularly strong at linking code changes to deployments and tracking change failure rates — making it a good complement to TIMP tracking for teams that want to measure not just how fast they ship but how reliably. Sleuth's free tier covers basic deployment tracking, with paid tiers starting at $20/developer/month for full DORA metrics.
GitHub Insights and GitLab Analytics
Both GitHub and GitLab offer built-in analytics that can serve as a starting point for TIMP tracking. GitHub's "Insights" tab provides PR cycle time, review turnaround, and contributor activity metrics. GitLab's "Value Stream Analytics" provides a more complete pipeline view that tracks work items from creation through deployment. These built-in tools are less configurable than dedicated platforms but are free and require no additional integration.
Custom Dashboards
For teams that want full control over their metrics, a custom dashboard built on your existing data is often the best approach. The required data sources are: (1) your project management tool's API (Linear, Jira) for ticket creation timestamps, (2) your Git hosting platform's API (GitHub, GitLab) for commit, PR, review, and merge timestamps, and (3) your CI/CD platform's API for deployment timestamps. A lightweight script that pulls data from these APIs and writes to a visualization tool (Grafana, Metabase, or even a Google Sheet) can be built in a day and provides exactly the metrics you care about with no vendor lock-in.
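A sketch of that lightweight script, split into a fetch step and a pure calculation step. The endpoint path and ISO-8601 timestamp format are real GitHub REST v3 conventions; the owner/repo/token handling, and the choice to measure only the PR-open-to-merge span, are placeholder assumptions:

```python
import json
import urllib.request
from datetime import datetime
from statistics import median

def fetch_merged_prs(owner: str, repo: str, token: str) -> list[dict]:
    """Fetch recently closed PRs (GitHub REST v3: GET /repos/{owner}/{repo}/pulls)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls?state=closed&per_page=100"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return [pr for pr in json.load(resp) if pr.get("merged_at")]

def median_cycle_hours(prs: list[dict]) -> float:
    """Median hours from PR creation to merge (the TFR + TTM portion of TIMP)."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    spans = [
        (datetime.strptime(pr["merged_at"], fmt)
         - datetime.strptime(pr["created_at"], fmt)).total_seconds() / 3600
        for pr in prs
    ]
    return median(spans)

# Demo with inline data (normally the output of fetch_merged_prs):
demo = [
    {"created_at": "2024-06-03T09:00:00Z", "merged_at": "2024-06-03T15:00:00Z"},
    {"created_at": "2024-06-04T09:00:00Z", "merged_at": "2024-06-04T19:00:00Z"},
]
print(f"median open-to-merge: {median_cycle_hours(demo):.1f} hours")  # prints 8.0 hours
```

The same pattern extends to the Linear or Jira API for ticket-creation timestamps, giving you full TIMP rather than just the review portion.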
Setting Up Your TIMP Dashboard: A Practical Guide
Step 1: Establish the Data Pipeline
Connect your project management tool to your Git hosting platform so that ticket IDs are automatically linked to PRs. In Linear, this happens automatically when PR branch names include the ticket ID. In Jira, configure the GitHub or GitLab integration to link PRs based on ticket references in branch names or commit messages. This linkage is the foundation — without it, you cannot calculate TIMP because you cannot connect work items to code changes.
Step 2: Define Your Baseline
Before making any changes to your workflow or tooling, calculate your current TIMP and sub-metrics for the past 30-60 days. This baseline tells you where you are starting and allows you to measure the impact of changes you make. Pull the data from your Git hosting API: for each merged PR, calculate TFC (ticket creation to first commit), TPO (first commit to PR creation), TFR (PR creation to first review), and TTM (first review to merge). Calculate the median for each sub-metric — medians are more useful than averages because they are not skewed by outlier PRs that sit open for weeks.
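The median calculation described above can be as small as this (the sample durations are illustrative):

```python
from datetime import timedelta
from statistics import median

def median_timedelta(deltas: list[timedelta]) -> timedelta:
    """Median, not mean: one PR that sits open for weeks should not
    drag the baseline for the whole team."""
    return timedelta(seconds=median(d.total_seconds() for d in deltas))

# Three TFC samples; the 9-day outlier would wreck a mean.
tfc_samples = [timedelta(hours=2), timedelta(hours=5), timedelta(days=9)]
print(median_timedelta(tfc_samples))  # prints 5:00:00
```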
Step 3: Identify Your Bottleneck
Look at your TIMP decomposition and identify which sub-metric is the largest contributor to total TIMP. For most teams, the bottleneck is either TFR (PRs waiting for review) or TTM (review-iterate cycles taking too long). The bottleneck tells you where to focus: if TFR is your problem, the solution is review culture, not a faster coding tool. If TPO is your problem, AI coding tools will have the most impact. Do not invest in accelerating a phase that is not your bottleneck — it will have minimal impact on end-to-end TIMP.
Step 4: Set Targets and Track Weekly
Set targets for each sub-metric based on the benchmarks earlier in this post, adjusted for your team size and context. Track progress weekly — TIMP metrics are noisy at the daily level but show clear trends over weeks. Celebrate improvements and investigate regressions. Share the dashboard with the entire engineering team so that everyone understands how their work contributes to delivery velocity.
As John Coogan frequently says on the TBPN show — usually while adjusting his TBPN hat — "You cannot improve what you do not measure. And you cannot measure what you do not define. TIMP gives you a definition that actually corresponds to what your customers care about: how fast you ship." Start measuring it this week. You will be surprised what you learn about where your engineering time actually goes.
Frequently Asked Questions
How is TIMP different from DORA's "lead time for changes" metric?
DORA's lead time for changes measures from first commit to production deployment. TIMP starts earlier (at ticket creation, not first commit), ends at merge rather than deployment, and decomposes the pipeline into four sub-metrics (TFC, TPO, TFR, TTM) that are individually actionable. Think of TIMP as a more granular version of DORA's lead time that adds the pre-coding phase. Teams already tracking DORA metrics can add TIMP sub-metrics as a complement rather than a replacement.
What if our team does not use a project management tool with ticket IDs?
If you do not have a formal ticketing system, you can approximate TIMP using Git data alone. Use the first commit on a feature branch as the starting point (replacing ticket creation) and the merge commit as the ending point. This misses the TFC component but still captures TPO, TFR, and TTM. However, for any team beyond 3-4 engineers, investing in a lightweight project management tool (Linear is the popular choice among startups) is strongly recommended — both for TIMP tracking and for general engineering organization.
Should we track TIMP for all PRs or only feature PRs?
Track TIMP for all PRs but report on feature PRs and bug-fix PRs separately. Feature PRs have naturally longer TIMP because they involve more implementation work and more thorough review. Bug-fix PRs should have shorter TIMP because urgency drives faster review and simpler implementations. If your bug-fix TIMP is not significantly shorter than your feature TIMP, that is a signal that your team's review process does not adequately prioritize urgent fixes — a process issue worth addressing.
How do AI coding tools affect bug introduction rate?
The data is mixed. AI coding tools can both reduce and increase bug introduction rates depending on how they are used. Engineers who carefully review AI-generated code and write comprehensive tests (which AI tools help with) see lower bug rates. Engineers who accept AI-generated code uncritically see higher bug rates. The net effect depends on your team's review discipline. Our recommendation: when you adopt AI coding tools, simultaneously adopt a policy of mandatory test coverage for AI-generated code and explicit review attention to AI-assisted PRs until your team develops confidence in the tool's output quality.
