When Sundar Pichai announced that AI coding tools wrote more than 25% of Google's new code, the technology press treated this as a straightforward velocity story. More code, faster. AI is eating software development.
The reality is considerably more complicated, and the gap between the headline and the reality matters for anyone weighing what AI coding tools actually do for software teams, and what they don't.

What "AI-Generated Code" Actually Means
The 25% figure counts code that was accepted from an AI suggestion in an editor. It does not mean:
- That the code was entirely written by AI without developer input
- That the accepted suggestions were correct on the first try
- That the code would have passed review without modification
The more illuminating number is the acceptance funnel. Developer studies at Google's scale suggest a pattern something like this: 31% of inline suggestions are accepted, 18% survive code review, and roughly 11% make it to the final merged commit without significant modification.
The 25% claim is counted at the top of this funnel, at the point of acceptance. The 11% describes what AI actually wrote that reached production essentially unchanged. These are very different claims, and they use different denominators: the 25% is a share of new code, while the funnel percentages are shares of suggestions.
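To make the funnel concrete, here is a minimal sketch of the stage-to-stage arithmetic. It assumes each percentage is a share of all inline suggestions shown, which the figures above do not state explicitly; the names `FUNNEL` and `stage_retention` are illustrative only.

```python
# Funnel stages as fractions of ALL inline suggestions shown.
# Assumption: each figure is a share of all suggestions, not of the prior stage.
FUNNEL = [
    ("accepted in editor", 0.31),
    ("survived code review", 0.18),
    ("merged essentially unchanged", 0.11),
]


def stage_retention(funnel):
    """Return (stage, share_of_all, share_of_previous_stage) tuples."""
    rows = []
    previous = 1.0  # before the first stage, every suggestion is still "alive"
    for name, share in funnel:
        rows.append((name, share, share / previous))
        previous = share
    return rows


for name, share, retained in stage_retention(FUNNEL):
    print(f"{name:30s} {share:5.0%} of suggestions, {retained:5.0%} of previous stage")

# Share of accepted suggestions that reach the merged commit unchanged.
unchanged_given_accepted = 0.11 / 0.31
print(f"unchanged, given accepted: {unchanged_given_accepted:.0%}")
```

Under that assumption, only about 35% of accepted suggestions reach the merged commit without significant modification, which is why a number counted at acceptance overstates what AI contributed to the final code.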
The Velocity Paradox
If 25% of code is AI-generated, intuition suggests developers should be 25% faster, or perhaps writing 25% more features per period. The empirical evidence for this is weak.
Developer productivity measurement is notoriously difficult. Code volume is not the right metric — teams that write more code aren't necessarily more productive. Cycle time (time from idea to deployed feature) is a better metric, and the gains there from AI coding tools are real but modest: studies of non-Google developer populations consistently find productivity gains of roughly 10-15% for well-defined coding tasks.
The gap between "25% of code is AI-generated" and "10-15% faster" is explained by several factors:
- AI tools accelerate boilerplate; they don't help with the hard design decisions
- Review burden for AI-generated code is non-trivial — developers report spending meaningful time verifying suggestions
- The tasks that AI handles well are not proportionally the tasks that consume developer time
The Google-Only Moat
Several factors make Google's results non-generalizable to most engineering teams:
The monorepo advantage. Google's codebase is structured so AI tools can see enormous context — the full API surface, the entire history of a service, the dependencies it uses. Most companies' codebases are fragmented across repos, and AI tools see much less relevant context.
Internal tooling integration. Google deploys internal versions of Gemini integrated directly into code review, build systems, and testing infrastructure. The "AI coding tool" at Google is not the same product as the publicly available GitHub Copilot or Cursor.
Incentive structure alignment. Google's performance review infrastructure rewards code production in ways that may affect what gets accepted from AI suggestions. Acceptance rate is a metric. Metric-driven behavior changes what gets counted.
Code Quality Data
The productivity discussion often glosses over what AI-generated code does to codebase quality over time. The data that does exist is not encouraging:
Duplication rates increase. AI tools generate plausible implementations without checking whether similar code already exists. Teams that heavily adopt AI suggestions without strong review processes accumulate duplicate logic.
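One way a review process might surface this kind of duplication is to fingerprint function bodies and flag collisions. The following is a rough sketch using Python's standard `ast` module, not a description of any tool mentioned here; `body_fingerprint`, `find_duplicate_functions`, and `SAMPLE` are hypothetical names, and the check only catches functions whose parameters and bodies are structurally identical.

```python
import ast
import hashlib
from collections import defaultdict


def body_fingerprint(func):
    """Hash a function's parameters and body, ignoring the function's name."""
    dumped = ast.dump(func.args) + "".join(ast.dump(stmt) for stmt in func.body)
    return hashlib.sha256(dumped.encode()).hexdigest()


def find_duplicate_functions(source):
    """Return groups of function names whose parameters and bodies match exactly."""
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            groups[body_fingerprint(node)].append(node.name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Illustrative input: two helpers with different names but the same logic.
SAMPLE = '''
def clamp(x, lo, hi):
    return max(lo, min(x, hi))

def bound(x, lo, hi):
    return max(lo, min(x, hi))

def double(x):
    return x * 2
'''

print(find_duplicate_functions(SAMPLE))
```

A check this strict misses near-duplicates with renamed variables or reordered statements, so it is best read as a floor on duplication rather than a measurement of it.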
Test depth falls. AI-generated tests tend toward happy-path testing. Edge case and property-based testing — the kind that actually catches bugs — requires deliberate engineering effort that AI tools don't spontaneously provide.
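The difference between happy-path and deeper testing can be illustrated with a small example. The helper `parse_percentage` below is hypothetical, and the randomized round-trip check is a hand-rolled stand-in for property-based testing rather than a use of a dedicated library.

```python
import random


def parse_percentage(text):
    """Parse a string such as '25%' into a fraction in [0, 1]. Hypothetical helper."""
    cleaned = text.strip()
    if not cleaned.endswith("%"):
        raise ValueError(f"missing percent sign: {text!r}")
    value = float(cleaned[:-1])  # raises ValueError on non-numeric input
    if not 0.0 <= value <= 100.0:  # also rejects NaN, since comparisons with NaN are False
        raise ValueError(f"out of range: {text!r}")
    return value / 100.0


def test_happy_path():
    # The single typical-input check that stops at the happy path.
    assert parse_percentage("25%") == 0.25


def test_edge_cases():
    # Boundaries and surrounding whitespace.
    assert parse_percentage("  0% ") == 0.0
    assert parse_percentage("100%") == 1.0
    # Inputs that must be rejected.
    for bad in ["", "25", "abc%", "-1%", "101%", "nan%"]:
        try:
            parse_percentage(bad)
        except ValueError:
            continue
        raise AssertionError(f"accepted invalid input: {bad!r}")


def test_round_trip_property(trials=200, seed=0):
    # Property: every whole percentage in range parses back to n / 100.
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(0, 100)
        assert parse_percentage(f"{n}%") == n / 100


test_happy_path()
test_edge_cases()
test_round_trip_property()
```

The happy-path test would still pass if the range check or the percent-sign check were deleted; only the edge-case test would notice.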
Security review burden shifts. AI tools generate code with common vulnerability patterns (improper input validation, overly broad permissions, missing rate limiting) at rates that concern security teams. The code looks fine syntactically; the security review has to catch what syntax-level checks miss.

The Selection Bias Problem
AI coding tools are not uniformly useful across all types of programming work. They are most useful for:
- Boilerplate and scaffolding
- Common patterns in well-represented languages and frameworks
- Test stubs for simple functions
- Documentation string generation
They are least useful for:
- Novel algorithms
- Complex business logic with many interacting conditions
- Performance-critical code where the obvious implementation isn't the right implementation
- Code that requires deep understanding of the system's existing state
The composition of work that AI accelerates is not the composition of work that determines engineering velocity. If boilerplate takes 20% of your team's time and AI cuts the time spent on boilerplate by 80%, the team saves 16% of its total time (0.20 × 0.80). Even if boilerplate took no time at all, the saving could not exceed 20%. The remaining 80% of the work (design decisions, architecture, debugging, integration) is where the time actually goes, so AI tools have limited impact on the number that matters.
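The boilerplate arithmetic above is an instance of Amdahl's law: the overall gain is capped by the share of work being accelerated. The sketch below assumes the 80% figure is a reduction in time spent on boilerplate, and also shows how the result changes if it instead means working at 1.8x speed. The function names are illustrative.

```python
def time_saved(task_share, time_reduction):
    """Fraction of total time saved when a task that takes `task_share` of the
    work has its duration cut by `time_reduction` (both between 0 and 1)."""
    return task_share * time_reduction


def overall_speedup(task_share, time_reduction):
    """Amdahl-style speedup: old total time divided by new total time."""
    return 1.0 / (1.0 - time_saved(task_share, time_reduction))


# Reading 1: boilerplate takes 80% less time.
saved = time_saved(0.20, 0.80)
speedup = overall_speedup(0.20, 0.80)

# Reading 2: boilerplate is done at 1.8x speed, so its duration
# shrinks by 1 - 1/1.8 (about 44%).
saved_alt = time_saved(0.20, 1 - 1 / 1.8)

# Ceiling: boilerplate takes no time at all.
ceiling = overall_speedup(0.20, 1.0)

print(f"reading 1: {saved:.1%} of time saved, {speedup:.2f}x throughput")
print(f"reading 2: {saved_alt:.1%} of time saved")
print(f"ceiling:   {ceiling:.2f}x throughput")
```

Under the first reading the team saves 16% of its time, a throughput gain of about 19%. Under the second the saving is about 9%. In either case the ceiling is 1.25x, however good the tool becomes at boilerplate.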
The Skill Atrophy Risk
There is a genuine concern in senior engineering circles about skill atrophy. Developers who accept AI suggestions for problems they could have solved themselves practice the problem-solving process less. This may be acceptable for experienced engineers with strong conceptual foundations. It is more concerning for developers in the early stages of building those foundations.
The comparison to calculators is often made here, but the comparison is imperfect. Calculators handle numerical computation; the intellectual work of deciding what calculation to perform remains with the human. AI coding tools handle more of the reasoning process, and the appropriate mental model for how much to trust them has not yet been established.
The Honest Bottom Line
AI coding tools are real productivity tools for specific kinds of programming work. The 25% figure reflects genuine adoption at one of the world's most advanced engineering organizations. It does not straightforwardly translate to 25% velocity gains for other organizations.
What's actually true:
- AI tools help with well-defined, boilerplate-heavy tasks
- The gains are real but modest: roughly 10-15% in the studies cited above, and only for well-defined tasks
- Code review processes need to evolve to catch the specific failure patterns AI tools produce
- Teams without Google's infrastructure, context, and tooling integration will see different results
The story is not "AI is writing a quarter of software" in any meaningful sense. It's "AI is handling a category of low-complexity code generation that represents a quarter of code volume but a smaller fraction of engineering work." That's still a real development worth understanding — just not the one the headline implies.
Meritshot's Data Science and Full Stack programs include hands-on evaluation of AI coding tools — not as magic velocity multipliers, but as specific-purpose tools with specific strengths and failure modes.





