Deep Engineering #30: Kevlin Henney on TDD in the AI Era

Misconceptions, design pressure, legacy code, language cultures, and why testing & review—not tools or AI—should drive TDD.

Dec 11, 2025

AI Agent Frontiers: Innovation and Practical Applications - Dec 13 (Online)

If you are making decisions about AI systems in 2025, you cannot afford a fuzzy view of agents.

Join us on Dec 13, 2025 · 9am–2pm ET / 6am–11am PT for a 5-hour deep dive with the creators of AG2/AutoGen and leading researchers from MIT and Cambridge on real multi-agent architectures, autonomous research agents, and AI-native companies. Deep Engineering readers get 50% off with code FINAL50.

✍️From the editor’s desk,

This is our final Deep Engineering issue of the year as we break for the holidays 🎄. Thank you for reading, sharing, and thinking with us in 2025 — I hope you get time to rest, recharge, and come back to your work with a clearer head and a mysteriously missing backlog in the new year.☃️

In TestGrid’s 2025 benchmark, based on 7.3 million automated test runs across 55,800 organizations, the average pass rate is 75% — better than last year’s 59%, but still roughly one in four runs failing. For teams betting on CI, CD, and AI-assisted development, that gap is where production risk hides.

How should teams rethink test-driven development and developer testing in the context of legacy systems, AI-generated code, and increasingly automated pipelines?

To explore that, I sat down with Kevlin Henney — independent consultant, speaker, writer, and trainer. Henney works with companies on code, design, practices, and people; contributes to the Modern Software Engineering YouTube channel; co-authored A Pattern Language for Distributed Computing and “On Patterns and Pattern Languages” in the Pattern-Oriented Software Architecture series; and edited 97 Things Every Programmer Should Know and co-edited 97 Things Every Java Programmer Should Know. His work sits at the intersection of everyday development practice and long-term design thinking.

In this issue, we focus on:

TDD vs “just testing”: what TDD actually is, and why conflating it with ad hoc developer testing hides deeper gaps.
Design pressure from tests: using tests as executable specifications that expose coupling, cohesion, and legacy-code seams worth reshaping.
AI in the loop: why AI-generated code and tests increase the need for testing and review skills rather than replacing them.

You can watch the full conversation with Kevlin Henney below, read the complete Q&A here, or scroll down for the distilled insights.

Sponsored:

The First AI Employee for Your Mobile App: Fload monitors your app 24/7, catches issues before they cost you money, finds new growth opportunities, and predicts what’s coming next, all automatically. Connect your app in 60 seconds.

Try Fload Free

Rethinking Test-Driven Development for the AI Era with Kevlin Henney

Test-driven development sits in an awkward place in many teams: widely cited, unevenly practiced, and often misunderstood. For some developers, TDD is a niche technique only for greenfield code; for others, it’s reduced to “writing some unit tests” after the fact. Meanwhile, real-world pressures — legacy systems, language ecosystems, CI pipelines, even AI-generated code — make it challenging to apply TDD consistently under tight deadlines. In my conversation with Kevlin Henney, we talked about:

Why TDD adoption stalls even for experienced developers
The misconceptions that blur the line between “developer testing” and true TDD
How tests can shape software design without losing sight of the bigger architectural picture
Best practices for introducing tests into large legacy codebases
How language and ecosystem culture influence testing practices
What distinguishes good, specification-like tests from brittle checklists
Henney’s perspective on AI-assisted development: the risks of offloading testing to code generators, and why, in an era of increasingly automated code, testing and review skills matter more than ever

What follows is a distilled summary of part of our extensive conversation, which I highly recommend reading or listening to in full here.

Why TDD Is Hard to Adopt

For many experienced engineers, TDD requires unlearning comfortable habits. It forces a fundamentally different workflow – writing tests first and working in small increments – which “is always going to be difficult” when you’re used to another rhythm. Many developers also lack a strong testing practice to begin with, so TDD’s strict discipline feels like a shock.

Overcoming the TDD adoption barrier requires a mindset shift

Henney notes that the classic “red, green, refactor” mantra by itself often isn’t convincing; without context, developers ask, “Why write a test for something I haven’t built yet?” The answer is that the failing test defines the next behavior you intend to add. You’re writing a concrete example of a new requirement – essentially a to-do list item for the code. TDD limits you to one small change at a time, curbing the urge to over-build and ensuring you constantly get feedback (tests passing or failing) on your design. But making this mental shift is hard without practice.
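Here is a minimal sketch of that rhythm in Python. It uses pytest-style test functions and a hypothetical `Stack` class, and neither comes from the conversation. In a real cycle, each test is written first and seen to fail. Only then are the few lines of `Stack` it needs added. Refactoring happens while all tests are green.

```python
# Hypothetical example: a tiny stack grown one test at a time.
# Run with pytest, which collects the test_* functions below.

# Red: each test states the next behavior to add. In the real cycle,
# each one is written (and seen to fail) before the code it needs.
def test_new_stack_is_empty():
    assert Stack().is_empty()


def test_push_makes_stack_non_empty():
    stack = Stack()
    stack.push(42)
    assert not stack.is_empty()


def test_pop_returns_most_recently_pushed_item():
    stack = Stack()
    stack.push(1)
    stack.push(2)
    assert stack.pop() == 2


# Green: just enough code to make the tests above pass.
# Refactor: tidy names and structure while every test stays green.
class Stack:
    def __init__(self):
        self._items = []

    def is_empty(self):
        return not self._items

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()
```

Popping an empty stack is deliberately left unspecified. In TDD, that gap becomes the next failing test on the to-do list rather than speculative code written in advance.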

Common Misconceptions

TDD often suffers from misunderstandings. Some of the myths Henney debunks include:

“TDD is only for new code.” In reality, you can introduce TDD practices into legacy systems gradually – it’s not limited to greenfield projects.
“TDD just means writing tests.” No – it refers to a specific test-first development cycle. Writing some tests after coding is simply testing, not TDD.
“TDD isn’t about testing, it’s about design.” Not exactly. TDD is an intensive testing technique and a design aid. The tests are the vehicle, driving better design, but you are absolutely still writing tests (and lots of them) as you go.

Tackling Legacy Code

When dealing with a large, untested codebase, aim for incremental improvements instead of perfect coverage. Henney’s advice:

Start with a strategic slice: Identify a part of the code you need to modify for upcoming work, and focus your efforts there. Break dependencies to isolate that component (using safe refactoring tools and the compiler, if available), then add some tests around it and refactor in small steps. You can’t fix everything at once, but you can gradually create pockets of well-tested code.
Prioritize trouble spots: Use version history and bug reports to find the areas that change often or cause frequent issues. Target those “hot spots” for testing and cleanup first, where better tests will have the most payoff. Stable sections of the system can wait until they actually need attention.
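To illustrate breaking a dependency, here is a minimal Python sketch built around a hypothetical legacy `greeting` function that read the system clock directly. The fix turns the clock into a parameter with a default, a technique often called a seam. Existing callers are untouched, while tests can pin the time. The function and tests are invented for this example.

```python
from datetime import datetime

# Hypothetical legacy code, before: untestable without controlling the clock.
#
#     def greeting():
#         hour = datetime.now().hour
#         ...
#
# After: the clock is passed in, defaulting to the real one, so existing
# callers that use greeting() keep their current behavior.
def greeting(now=None):
    if now is None:
        now = datetime.now()
    if now.hour < 12:
        return "Good morning"
    if now.hour < 18:
        return "Good afternoon"
    return "Good evening"


# Characterization tests: pin down what the code does today before
# refactoring it further in small steps.
def test_before_noon_is_morning():
    assert greeting(datetime(2025, 12, 11, 9, 0)) == "Good morning"


def test_afternoon_starts_at_noon():
    assert greeting(datetime(2025, 12, 11, 12, 0)) == "Good afternoon"


def test_evening_starts_at_six():
    assert greeting(datetime(2025, 12, 11, 18, 0)) == "Good evening"
```

Once these tests pass, the surrounding code can be reshaped in small steps. Any change in behavior then shows up as a failing test rather than as a production surprise.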
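For the hot-spot step, a rough change-frequency count can be pulled from Git history. This minimal sketch makes two assumptions: the repository uses Git, and commits per file over a time window is an acceptable proxy for churn. It ignores renames and bug-report data, which a fuller analysis would fold in. `change_counts` and `hot_spots` are hypothetical helpers.

```python
import subprocess
from collections import Counter


def change_counts(log_output):
    """Count how many commits touched each file.

    Expects the text printed by `git log --format= --name-only`:
    one path per line, with blank lines between commits.
    """
    return Counter(line.strip() for line in log_output.splitlines() if line.strip())


def hot_spots(repo=".", since="12 months ago", top=10):
    """Return the `top` most frequently changed files as (path, commits) pairs."""
    out = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}", "--format=", "--name-only"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return change_counts(out).most_common(top)
```

For example, `hot_spots(".", since="6 months ago")` lists the most-changed paths first. That list can then be cross-checked against the bug tracker before choosing where to add tests.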

What Do Good Tests Look Like?

Quality tests are the backbone of TDD. Henney offers a couple of guidelines for writing tests that stay valuable over time:

Test one concept per test: Each unit test should exercise a single behavior or scenario, not a grab-bag of things. This usually means writing tests at the level of use (what the code should do), not one test per function. For example, instead of separate tests for deposit and withdraw methods, write tests that express the outcomes (e.g. depositing increases the balance, withdrawing decreases it). A good test name reads like a specification, so if it fails, you immediately know what requirement broke.
Use tests to get design feedback: If a piece of code is hard to test, treat that as a sign that the design might need refactoring. TDD shines when you listen to those signals. For instance, if you’re setting up excessive mocks or needing to call private functions, consider splitting responsibilities or adding interfaces to decouple the design. The goal is to make both the code and the tests simpler. As Henney puts it, often “excessive use of mocking indicates a design that could be simplified.”
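The deposit-and-withdraw example might look like this in Python, using a hypothetical `Account` class. The overdraft rule is an assumption added to show a third behavior. Each test name states one requirement, so a failure report reads as the broken rule rather than as a method name.

```python
class InsufficientFunds(Exception):
    """Raised when a withdrawal exceeds the balance (assumed rule)."""


class Account:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        if amount > self.balance:
            raise InsufficientFunds(f"balance {self.balance} < {amount}")
        self.balance -= amount


# One behavior per test; the names read as a specification.
def test_new_account_has_zero_balance():
    assert Account().balance == 0


def test_depositing_increases_the_balance():
    account = Account()
    account.deposit(50)
    assert account.balance == 50


def test_withdrawing_decreases_the_balance():
    account = Account()
    account.deposit(50)
    account.withdraw(20)
    assert account.balance == 30


def test_withdrawing_more_than_the_balance_is_refused():
    account = Account()
    account.deposit(10)
    try:
        account.withdraw(11)
    except InsufficientFunds:
        pass
    else:
        raise AssertionError("expected InsufficientFunds")
    assert account.balance == 10  # the refused withdrawal leaves the balance unchanged
```

Under pytest, the last test would more idiomatically use `pytest.raises`. Plain `try`/`except` keeps the sketch free of dependencies.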
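One common response to mock-heavy tests is to pull the decision logic into a pure function and leave a thin shell that does the I/O. This is sometimes called "functional core, imperative shell." It is a general technique, not one attributed to Henney, and `Invoice`, `overdue_invoices`, and `send_reminders` are hypothetical names. The core is tested with plain data. The shell takes its `send` collaborator as a plain callable, so a list's `append` method can stand in for a mail service.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Invoice:
    number: str
    due: date
    paid: bool


# Pure core: plain data in, plain data out, so tests need no mocks
# for the clock or the mail service.
def overdue_invoices(invoices, today):
    return [inv for inv in invoices if not inv.paid and inv.due < today]


# Thin shell: the only part that touches the outside world.
def send_reminders(invoices, today, send):
    for inv in overdue_invoices(invoices, today):
        send(f"Invoice {inv.number} is overdue")


def test_unpaid_invoice_past_its_due_date_is_overdue():
    inv = Invoice("A1", due=date(2025, 1, 31), paid=False)
    assert overdue_invoices([inv], today=date(2025, 2, 1)) == [inv]


def test_paid_invoice_is_never_overdue():
    inv = Invoice("A2", due=date(2025, 1, 31), paid=True)
    assert overdue_invoices([inv], today=date(2025, 2, 1)) == []


def test_invoice_due_today_is_not_yet_overdue():
    inv = Invoice("A3", due=date(2025, 2, 1), paid=False)
    assert overdue_invoices([inv], today=date(2025, 2, 1)) == []


def test_reminders_are_sent_only_for_overdue_invoices():
    sent = []
    invoices = [
        Invoice("A1", due=date(2025, 1, 31), paid=False),
        Invoice("A4", due=date(2025, 3, 1), paid=False),
    ]
    send_reminders(invoices, today=date(2025, 2, 1), send=sent.append)
    assert sent == ["Invoice A1 is overdue"]
```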

The Role of AI and the Future of TDD

Henney argues AI doesn’t replace the need for good testing – if anything, it heightens it. “If you do not already have an automated testing habit, now is a good time to start,” he says. AI tools can generate a lot of code or tests in seconds, but this often creates an illusion of speed – you get output quickly, then spend days fixing it. Henney advises teams to be skeptical and thorough: use AI as an assistant for ideas, but review everything it produces. Treat AI suggestions as if they came from a junior developer – useful as a starting point, but to be thoroughly reviewed, tested, and refined by experienced developers.

In the coming years, as more coding tasks become automated, the differentiator for great engineering teams will be their testing and review skills. TDD remains as relevant as ever. Its emphasis on clarity, rapid feedback, and thinking through requirements can keep software quality high. The specific tools may evolve, but the core discipline of writing tests to drive design and guard against regressions will continue to pay dividends.

🧠Expert Insight

Interviews

Rethinking Test-Driven Development for the AI Era: A Conversation with Kevlin Henney

Divya Anne Selvaraj and Kevlin Henney

December 11, 2025


Test-driven development sits in an awkward place in many teams: widely cited, unevenly practiced, and often misunderstood. For some developers, TDD is a niche technique that only applies to greenfield code; for others, it is reduced to “writing some unit tests” after the fact. In between those extremes are practical concerns about legacy systems, langua…

Read full story

🛠️Tool of the Week

Hypothesis: Property-based testing for Python

Hypothesis is a mature, open source property-based testing library for Python. Instead of hand-picking a few example inputs, you describe the space of valid inputs and Hypothesis automatically generates many concrete cases—including edge cases you probably would not have thought of.
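Here is a minimal round-trip property, assuming Hypothesis is installed (`pip install hypothesis`). `encode` and `decode` are a hypothetical run-length encoder written for this example. The test states that decoding any encoded string returns the original, and Hypothesis generates the strings.

```python
from hypothesis import given, strategies as st


def encode(text):
    """Run-length encode a string as a list of (character, count) pairs."""
    pairs = []
    for ch in text:
        if pairs and pairs[-1][0] == ch:
            pairs[-1] = (ch, pairs[-1][1] + 1)
        else:
            pairs.append((ch, 1))
    return pairs


def decode(pairs):
    """Invert encode by repeating each character `count` times."""
    return "".join(ch * count for ch, count in pairs)


# Property: for every string Hypothesis generates, decoding the encoding
# gives back the original. On failure, Hypothesis shrinks the input to a
# small counterexample.
@given(st.text())
def test_decode_inverts_encode(text):
    assert decode(encode(text)) == text
```

Run it with pytest, or call `test_decode_inverts_encode()` directly; Hypothesis supplies the `text` argument either way.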

Where it helps teams

Hard-to-enumerate behaviors: Algorithms, parsers, serialization code, numeric logic, scheduling logic, and anything with a large or subtle input space.
Guarding against AI-generated regressions: If AI suggests an implementation, property-based tests help check that it satisfies an invariant across many inputs instead of just the obvious happy paths.
Design feedback: When a function is hard to express as “for all inputs in this domain, the following relationship holds,” that friction often reveals API or modeling issues worth revisiting.
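For AI-generated code, one option is a differential property. The candidate implementation is compared against a slower, obviously correct reference over many generated inputs. This minimal sketch assumes Hypothesis is installed. `merge_sorted` stands in for a hypothetical generated function under review, and `sorted(a + b)` serves as the reference.

```python
from hypothesis import given, strategies as st


def merge_sorted(a, b):
    """Candidate under review: merge two ascending lists in linear time."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    # One input is exhausted; the rest of the other is already in order.
    result.extend(a[i:])
    result.extend(b[j:])
    return result


# Strategy for ascending lists of integers (duplicates and empties included).
sorted_lists = st.lists(st.integers()).map(sorted)


@given(sorted_lists, sorted_lists)
def test_merge_matches_reference(a, b):
    assert merge_sorted(a, b) == sorted(a + b)
```

Suppose a candidate mishandled duplicates or an empty input. Hypothesis would report a small failing pair of lists, which is concrete evidence for the reviewer rather than a hunch.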

Learn more about Hypothesis

📎Tech Briefs

TestGrid | Continuous Testing Benchmark Report 2025: Trends and Findings: TestGrid’s latest benchmark blog post analyzes 7.3 million automated tests across 55,800 organizations and shows how continuous testing tied directly into CI pipelines affects reliability, runtime, coverage, and AI-assisted failure analysis in production environments.
CMocka 2.0 Released: Enhancing Unit Testing in C: Andreas Schneider announced CMocka 2.0, a major upgrade of the open-source C unit testing framework that modernizes it around C99 types, adds richer assertion and mocking APIs, supports TAP 14 output, and improves CI/CD integration while keeping API stability for existing test suites.
VectorCAST + AI: Instantly Generate Unit Tests from Your Requirements (Early Access): Vector introduces an early-access program for an LLM-powered test-case generation engine in VectorCAST that turns detailed natural-language requirements into executable unit tests for C/C++, aiming to accelerate requirements-level coverage while keeping engineers in the review loop.
John Pourdanis | More Scripts, Same Problems: Why Test Automation isn’t Automagic: In this practitioner essay, John Pourdanis argues that simply adding more automated checks does not guarantee quality and emphasizes risk mapping, shared ownership, exploratory work, and feedback loops over raw script volume for effective test automation strategies.
eInfochips | Accelerating Continuous Integration with Buildkite, GitHub, and S3 Bucket: eInfochips has outlined a production-grade CI setup using Buildkite, GitHub, and AWS S3, walking through org setup, agents, queues, pipeline-as-code YAML, and test automation best practices focused on fast feedback and reliable quality gates.

That’s all for today — and for this year. Thank you for reading Deep Engineering, sharing it with colleagues, and pushing us with thoughtful feedback. Do take a moment to fill out this short survey we run monthly—as a thank-you, we’ll add one Packt credit to your account, redeemable for any book of your choice.

We’ll be back in the new year with more expert-led, practice-focused issues.

Merry Christmas🎄and Happy Holidays🦌 — and don’t forget to stay awesome🌟!
Divya Anne Selvaraj
Editor-in-Chief, Deep Engineering

🎶A small musical send-off for the year: Vince Guaraldi Trio – “Christmas Time Is Here”

If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want to advertise with us.


A guest post by

Kevlin Henney

Rethinking Test-Driven Development for the AI Era: A Conversation with Kevlin Henney
