Packt Deep Engineering

Deep Engineering #53: Rick Spencer on Matching the Right AI to the Right Engineering Work

Saqib Jan — Thu, 25 Jun 2026 15:41:01 GMT

A Hands-On Workshop for the Next Generation of Engineers

Includes a free copy of the ebook 30 Agents Every AI Engineer Must Build and a certificate of completion.

A hands-on, build-along workshop where you build and run 10 production-inspired AI agents across finance, healthcare, and education. Walk away with a complete GitHub repo and implementations for OpenAI, Claude, Gemini, and local models.

🗓️ September 12th · 11:00 AM EDT onwards

Use code DEEPENG50 for 50% off the early bird price. First 10 sign-ups only.

✍️ From the editor’s desk,

Welcome to the 53rd issue of Deep Engineering!

At London Tech Week this month, NVIDIA showcased a wave of UK companies building AI for environments where sovereignty is not optional, and the message was clear: building capable AI models is no longer the hardest problem. The real challenge is running them in environments where security, compliance, and cost aren’t negotiable. Organizations want AI they can trust, control, and operate on their own terms.

That’s exactly the challenge Rick Spencer has been solving at SUSE. As General Manager for Technology and Product, he works with the teams behind its enterprise Linux and cloud native platforms for regulated organizations. His perspective is simple: engineers will use AI wherever it helps them move faster. So the real question isn’t whether teams should adopt AI; it is how to match the right model to the right kind of work.

In today’s issue, Spencer shares the framework his teams use to make those decisions. We discuss where agentic AI is delivering real value, why frontier models should be reserved for high-impact problems, and how SUSE keeps AI adoption practical without letting costs spiral.

You can read or watch the full Q&A interview here.

Let’s get started.

Featured Newsletter: AI Agents Simplified

AI Agents Simplified cuts through the noise with clear, actionable breakdowns of agents, automation, and what's actually worth your attention. Trusted by 58,000+ subscribers, with new issues every week and no hype.

→ Subscribe to AI Agents Simplified

Expert Insights

Not whether engineers use AI, but which AI for which work

with Rick Spencer

Most engineering organizations start their AI adoption conversation with limits. Should engineers use it, and how much should they be allowed to? Rick Spencer sees that as the wrong question. As General Manager for Technology and Product at SUSE, he works with the teams building enterprise Linux and cloud native infrastructure for companies operating under strict compliance requirements. His starting point is practical: engineers will use AI wherever it helps them move faster.

“It’s not like, oh, don’t use AI,” he says. “That would just not be workable.”

For Spencer, the real question is not whether engineers should use AI. It is which AI belongs to which kind of work. Getting that decision right is what separates useful AI adoption from adoption that quietly burns money, trust, and control.

Three kinds of AI work, and why the distinction matters

SUSE breaks engineering use of AI into three categories, each calling for different tools, cost profiles, and levels of oversight. Spencer describes the first as daily work: statement completion and debugging that engineers rely on throughout the day. The second is agentics, where agents take care of repetitive work and interruptions that would otherwise consume engineering time. The third is what he calls curve jumping, and it is the most consequential of the three.

“That’s like when you’re just going from zero to infinity,” he explains. “You can do things with AI that you wouldn’t have tried before, like solve really deep problems in one big step.”

The value of this distinction is that it helps teams make more deliberate tooling and cost decisions instead of relying on guesswork. As Spencer puts it, the framework helps engineering managers pattern-match the right AI to the right kind of work. If a task only needs statement completion and debugging, that points to one set of tools. If it involves data sovereignty requirements, that points to another. Frontier models, the most capable and most expensive, are reserved for curve jumping, where their cost is justified by the scale and complexity of the problem.
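
The pattern matching described above can be sketched as a small lookup. This is an illustration only, not SUSE's tooling: the tier names are hypothetical placeholders, and the rule that a sovereignty requirement overrides everything else is an assumption made for the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class WorkKind(Enum):
    DAILY = "daily"            # statement completion, debugging
    AGENTIC = "agentic"        # repetitive toil handled by agents
    CURVE_JUMP = "curve_jump"  # deep problems tackled in one big step


@dataclass(frozen=True)
class Task:
    kind: WorkKind
    needs_sovereignty: bool = False  # e.g. the task touches customer logs


# Hypothetical tier names; substitute whatever your organization provides.
DEFAULT_TIER = {
    WorkKind.DAILY: "low-cost-assistant",
    WorkKind.AGENTIC: "low-cost-agent-runtime",
    WorkKind.CURVE_JUMP: "frontier-model",
}


def pick_tier(task: Task) -> str:
    """Return the tier a task should be routed to."""
    # Assumption for this sketch: sovereignty wins over capability,
    # so sensitive work always stays on self-hosted models.
    if task.needs_sovereignty:
        return "self-hosted-model"
    return DEFAULT_TIER[task.kind]
```

Keeping the mapping in one table makes the cost decision visible and reviewable, instead of leaving it to each engineer's habit.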

“It sounds very organized now, but there was a lot of experimentation, a lot of really rapid innovation from the engineers,” Spencer says of how the framework came together.

The framework emerged from engineers first. Management then built structure around what those early adopters had learned so the approach could be shared across the organization. It’s a practical reminder that successful AI frameworks often evolve from experimentation before they become formal processes.

Frontier models are for curve jumping, not code completion

One of Spencer’s most practical observations is about matching model capability to the task at hand. Routing every request through the most capable model is both expensive and unnecessary, and he is clear about where the line sits.

“You do not need a frontier model to understand your Python module and give you code completion,” he says. “You just don’t need it for that.”

Frontier models earn their cost in the curve-jumping category, where the problems are strategic and complex.

“We have projects where we spend tens of thousands of dollars on frontier models, but they generated, who knows, a million or two million dollars in value,” he notes.

When the value is that high, the investment makes sense. The discipline is in not using frontier models for work that a much cheaper model can handle just as well. Spencer’s teams apply the same thinking in another way. They use frontier models to build agents that later run on much cheaper models.

“We use the frontier model to create an agent that can then be run on a much lower-cost model,” he explains.

The frontier model writes the Python scripts and prepares the context the agent needs. After that, a lower-cost model handles the repeated execution. The expensive model does the design work once, while the cheaper model runs the workflow repeatedly. For teams looking to bring frontier-level capabilities into production without paying frontier-level costs every time, it’s a practical approach that balances capability with efficiency.
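
A minimal sketch of that split, assuming each model can be treated as a plain function from prompt to text. `design_agent`, `run_agent`, and `AgentSpec` are hypothetical names introduced here, not part of any SUSE tool or vendor SDK.

```python
from dataclasses import dataclass
from typing import Callable

# A model is represented as a function from prompt to text.
# Real model clients would be wrapped to fit this shape.
ModelCall = Callable[[str], str]


@dataclass(frozen=True)
class AgentSpec:
    instructions: str  # written once by the expensive model


def design_agent(goal: str, frontier: ModelCall) -> AgentSpec:
    """Spend frontier tokens once to produce reusable instructions."""
    return AgentSpec(instructions=frontier(f"Write agent instructions for: {goal}"))


def run_agent(spec: AgentSpec, item: str, cheap: ModelCall) -> str:
    """Execute the stored instructions on a lower-cost model."""
    return cheap(f"{spec.instructions}\n\nInput: {item}")
```

The stored `AgentSpec` is the reusable artifact. Every later call to `run_agent` pays only the cheaper model's price, so the frontier cost is paid once however many times the workflow runs.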

Agentics is where the toil goes to die

The agentics category is where Spencer’s examples become most concrete. More importantly, they show what relieving engineering toil looks like in a production environment rather than a demo. His favorite example isn’t about writing code at all. After a series of software supply chain attacks, SUSE’s security team built an agent that scans for newly reported compromised packages every hour.

“It finds those, and then it scans all of our open source code to see if we’re using it anywhere,” he says. “If we are, it writes a report and notifies us on Slack.”

The benefit is immediate.

“If you see a report of a tool chain attack, our agent was on it before we even knew about it,” Spencer says.

Work that once required engineers to manually search through repositories now happens automatically, often before anyone has even seen the news.
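
The matching step at the heart of such an agent can be approximated in a few lines. The sketch below is not SUSE's agent: it assumes the list of compromised package names has already been fetched, checks only `package.json` files, and looks only at directly declared dependencies. The function name is hypothetical.

```python
import json
from pathlib import Path


def find_affected_manifests(root: Path, compromised: set[str]) -> dict[str, list[str]]:
    """Map each package.json under root to the compromised packages it declares."""
    hits: dict[str, list[str]] = {}
    for manifest in root.rglob("package.json"):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable manifests are skipped in this sketch
        if not isinstance(data, dict):
            continue
        declared: set[str] = set()
        for section in ("dependencies", "devDependencies"):
            declared.update(data.get(section) or {})
        matched = sorted(declared & compromised)
        if matched:
            hits[str(manifest)] = matched
    return hits
```

A production version would also need to read lockfiles to catch transitive dependencies, cover other ecosystems such as PyPI, and send the resulting report to a chat channel.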

His second example focuses on CVE triage, another task that has become increasingly difficult to manage manually at enterprise scale. CVEs arrive faster than teams can assess them, and many turn out not to be relevant.

“A lot of times the CVE comes in and the package is in the repo, but it is not being exposed in any way that it would matter,” Spencer says.

An agent reviews each CVE for applicability and helps generate the VEX file that documents whether the vulnerability actually affects the product. The result is that engineers spend less time sorting through reports and more time addressing the vulnerabilities that matter.
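
To make the output of that triage concrete, here is a sketch that turns a yes-or-no exposure verdict into a VEX-style statement. The field names and status values are modeled on the OpenVEX format as best recalled and should be checked against the specification before use. The function name and the fixed justification are assumptions for illustration.

```python
def vex_statement(cve_id: str, product_id: str, exposed: bool) -> dict:
    """Build one VEX-style statement from a triage verdict."""
    statement = {
        "vulnerability": {"name": cve_id},
        "products": [{"@id": product_id}],
    }
    if exposed:
        # An affected product needs a note on what to do next.
        statement["status"] = "affected"
        statement["action_statement"] = "Escalate to an engineer for remediation."
    else:
        # The package is present but the vulnerable code is never reached.
        statement["status"] = "not_affected"
        statement["justification"] = "vulnerable_code_not_in_execute_path"
    return statement
```

The hard part, deciding whether the code is actually exposed, is the agent's job and is deliberately left outside this function.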

“We’re focusing our attention not on keeping up with the crush of CVE reports, but on the actual vulnerabilities,” he explains. “Our attention is reserved for actually keeping our customers safe.”

That’s the hallmark of a good agentic use case. It removes repetitive work without taking engineers away from the decisions that require human judgment.

The context that does not survive the session

As engineers move from AI-assisted coding to autonomous agents, Spencer points to a challenge his senior engineers have had to learn to manage. Context that isn’t made explicit often ends up buried in conversation history. When that history disappears, the agent’s behavior changes.

“Those history sessions can embed context which you have not made explicit in your markdown,” he says. “The next time you go and you don’t have all that history, you get different behavior than you were expecting.”

The lesson, Spencer says, is to make recurring context explicit instead of letting it live inside a session that will eventually disappear.
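
One way to apply that lesson, sketched under the assumption that recurring context lives in a checked-in markdown file. The file and both function names are hypothetical.

```python
from pathlib import Path


def record_context(context_file: Path, note: str) -> None:
    """Append a recurring instruction so it survives the end of the session."""
    with context_file.open("a", encoding="utf-8") as fh:
        fh.write(f"- {note}\n")


def build_prompt(task: str, context_file: Path) -> str:
    """Compose a prompt from the explicit context file, not from chat history."""
    context = context_file.read_text(encoding="utf-8").strip()
    return f"{context}\n\n## Task\n{task}"
```

Because every session starts from the same file, an agent run with no history sees the same instructions as one run at the end of a long conversation.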

The challenge becomes even more significant once agents begin operating autonomously. Unlike traditional infrastructure tools, agents are not deterministic.

“This is a big change in infrastructure management,” he notes, recalling the guarantee that deterministic tools offered: “If I give this input, I know exactly what’s going to happen.”

With conventional tools, unexpected inputs usually produce predictable failures. Agents behave differently. When they encounter something unexpected, they try to solve the problem, and that often requires additional context, including organizational policies or guidance on how a situation should be handled.

This is where MCP servers become important. They give agents a way to retrieve the right context when they need it. At that point, context management is no longer just about writing better prompts; it becomes part of the infrastructure itself.

Cost control is a design decision, not an afterthought

Running AI at scale makes cost something teams have to design for, not react to later. Spencer sees this as part of the architecture.

At SUSE, part of the answer is structural. Self-hosted AI gives the organization a clearer cost ceiling. The question becomes how well the team can observe and use that fixed capacity, rather than whether a usage-based bill might suddenly run away.

Spencer connects this directly to sovereignty.

“Sometimes they call it cost sovereignty,” he says, “because no one can come back later and say, oh, by the way, we’re changing our model.”

He has seen suppliers move from seat-based pricing to usage-based pricing, leaving engineering teams with a cost model they did not control. Hosting your own AI infrastructure changes that equation. It gives teams more control over where AI runs, how it is governed, and what it can cost.

The governance side of that argument, including how SUSE keeps agents and their costs inside a boundary it can stand behind, is covered in the companion piece, How SUSE Runs AI Without Losing Control.

The more tactical layer is circuit breakers, which cap runaway agent spend in real time.

“We just noticed our Claude usage in the last minute was way too high,” he says, describing the trigger.

Spencer acknowledges the trade-off. Aggressive rate limiting can frustrate engineers who are trying to get work done, but it is a necessary safeguard when autonomous agents can generate costs without a human in the loop.
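
A minimal sketch of such a breaker, assuming spend is tracked per call in dollars over a sliding one-minute window. The class name, the limit, and the injectable clock are choices made for this illustration, not a description of SUSE's implementation.

```python
import time
from collections import deque


class SpendCircuitBreaker:
    """Refuse calls once spend inside a sliding window would exceed a limit."""

    def __init__(self, limit_usd: float, window_s: float = 60.0, clock=time.monotonic):
        self.limit_usd = limit_usd
        self.window_s = window_s
        self._clock = clock  # injectable so the breaker can be tested
        self._events: deque[tuple[float, float]] = deque()  # (timestamp, cost)
        self._total = 0.0

    def _evict(self, now: float) -> None:
        # Drop spend records that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window_s:
            _, cost = self._events.popleft()
            self._total -= cost

    def allow(self, cost_usd: float) -> bool:
        """Record the call and return True, or return False if it would breach the limit."""
        now = self._clock()
        self._evict(now)
        if self._total + cost_usd > self.limit_usd:
            return False
        self._events.append((now, cost_usd))
        self._total += cost_usd
        return True
```

An agent runtime would call `allow` with the estimated cost before each model request and pause or alert when it returns `False`. That pause is exactly the friction Spencer describes.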

As with the three-category framework, the goal is simple: match the cost of the tool to the value of the work and put clear limits around the situations where agents can spend without delivering value.

Ultimately, Spencer’s goal isn’t to slow engineers down. It’s to give them the confidence to use AI effectively without sacrificing governance or cost control.

“The penny drops for them that they’re in a new paradigm,” he says, describing the moment developers realize they can be ten or even a hundred times more productive.

The framework, the tooling, and the guardrails exist to support that shift, not to get in its way.

“You don’t want to stop them from getting that 100X improvement,” he says. “You need to give them the right tools for the job.”

For Spencer, that’s what successful AI adoption looks like: giving engineers the freedom to move faster while making sure the right tool is used for the right work.

The Packt Summer Sale is live through June 30. Get 8,000+ eBooks and videos across AI, programming, data, DevOps, and cloud for $9.99 each, including Clean Architecture with .NET, Python Illustrated, and the C++ STL Cookbook.

→ Browse the sale

In case you missed it

Sovereign AI and Agentic Infrastructure with Rick Spencer, by Saqib Jan (Interviews): https://deepengineering.substack.com/p/sovereign-ai-agentic-infrastructure-rick-spencer-suse

🛠️ Tool of the Week

ToolHive is an open source platform from Stacklok for running and governing Model Context Protocol (MCP) servers in production.

Highlights

Secure isolation: Runs each MCP server in its own isolated container with minimal permissions.

Enterprise access control: Enforces per-request identity and access policies with OIDC integration.

Self-hosted deployment: Keeps the MCP registry, gateway, and servers on your own infrastructure.

Lower token usage: Uses semantic tool discovery to reduce token usage by up to 85%.

ToolHive

📎 Tech Briefs

Grok Build opens a plugin marketplace - New plugin marketplace featuring tools from MongoDB, Vercel, Sentry, and Cloudflare.
Gartner Forecasts Worldwide AI Spending - Enterprise spending is expected to grow 47% year-over-year to reach $2.59 trillion in 2026.
Anthropic study links AI coding success to domain understanding - Domain expertise proved a stronger predictor of success than coding ability.
Mastra npm supply chain attack disclosed - Over 144 packages were compromised through the easy-day-js typosquat dependency.
OpenAI Codex adds granular internet access controls - Users can now restrict internet access by domain and HTTP method.

That’s all for today. Thank you for reading this issue of Deep Engineering.

We’ll be back next week with more expert-led content.

Keep building,

Saqib Jan

Editor-in-Chief, Deep Engineering

If your company wants to reach senior developers, software engineers, and technical decision-makers, speak to us about partnering with Deep Engineering.

How SUSE Runs AI Without Losing Control

Saqib Jan — Wed, 24 Jun 2026 20:20:29 GMT

Most companies adopting AI are making a quiet trade in exchange for speed. They accept that their data passes through systems they do not control, priced on terms they did not set, governed by rules someone else can change. For a consumer app that trade is often fine. But for an enterprise running infrastructure under compliance regimes, it is not. Rick Spencer, General Manager for Technology and Product at SUSE, has for the last two years been working out how an open source enterprise adopts AI at scale without making that trade, and his approach is a useful template for any organization that takes data sovereignty, auditability, and cost predictability seriously.

SUSE’s position is unusual in a way that sharpens the problem. “All the software that we write is open source,” Spencer explains. “We’re not worried about, oh, we leaked the code, we publish the code.” The concern is the data that belongs to others. When an engineer debugs a customer environment, the logs are not SUSE’s to hand to a third-party model. “They trust us to not do those kinds of things,” he says, and that trust is the thing the entire approach is built to protect.

You can watch our full interview or read the Q&A article here.

Sovereignty means how, not whether

The first instinct when AI collides with compliance is to draw a boundary around where AI is allowed to go. Spencer rejects that framing: “I don’t think it’s can or cannot. It’s more so how.” The distinction matters because a can-or-cannot policy ends with engineers either blocked from useful tools or quietly routing around the rules. A question of how keeps the capability available while controlling the conditions under which it runs.

The clearest illustration is SUSE’s build process. A lot of what the company ships is built in an internal instance of Open Build Service, and the defining property of those builds is that they happen offline. “All the builds are offline. They literally are not connected to the internet,” he points out. “This is super important because you need to be able to prove that nothing happened during the build process.” Proving a negative is far easier when there was no connection through which anything could have happened. It means doing things the hard way, making sure every source is present ahead of time because nothing can be pulled live during the build, but the payoff is provable integrity.
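
The "every source present ahead of time" requirement can be illustrated with a pre-build check. This is a generic sketch, not how Open Build Service works. It assumes a manifest mapping file names to expected SHA-256 digests, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large source archives do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sources(source_dir: Path, expected: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every source is present and unmodified."""
    problems = []
    for name, want in sorted(expected.items()):
        path = source_dir / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(path) != want:
            problems.append(f"checksum mismatch: {name}")
    return problems
```

Failing the build whenever this list is non-empty means nothing needs to be fetched, or trusted, once the network is gone.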

Applying AI inside that environment is where the how becomes concrete. The example Spencer gives is backporting a patch to previous stable releases, the kind of repetitive, knowledge-intensive work AI is well suited to. The question is whether it can be done in a sovereign way, and his answer is yes, on conditions. “As long as we are running AI in a way that it’s able to run disconnected from the internet, and we can have complete visibility into everything it’s doing.” SUSE goes further than running existing models in isolation. “In some cases we even train our own models to accomplish these things,” he shares, “and that way we know the model doesn’t have some naughty time bombs built into it.” Owning the model end to end removes the last category of thing the enterprise would otherwise have to take on trust.

The foundation under all of this is SUSE AI, the company’s own stack for running AI workloads on private infrastructure, which it uses heavily internally and runs Llama on. “It’s all within our private infrastructure, so we make sure there’s no chance that any data can escape,” Spencer says. “We only use models which can be vetted effectively.” The principle is consistent. Keep the data inside the boundary, and only run models you can actually inspect.

MCP as the control layer, not just the connector

Most of the conversation about Model Context Protocol treats it as plumbing that turns a chatbot into an agent by giving it tools to act with. Spencer agrees that is what it does, but his more interesting argument is that MCP is where an enterprise regains the control that autonomous agents would otherwise erode. “MCP servers do another really important thing,” he explains, “which is provide a place where you can, as an enterprise, bring some sanity and control to the usage.”

The mechanism is straightforward once you see the server as infrastructure rather than glue. “If you have MCP servers running, they’re just servers,” Spencer says. “That means you can provide access ACLs to them.” A server can be told that a given user’s agent may use these tools and not those. The usage can be logged. Gateways can sit in front, and the company runs its own alongside a partnership with Stacklok. The architectural rule that holds it together is that the language model never touches tools directly. “You don’t give the LLMs access directly to tools, only the MCP servers,” he reasons, “and then you can have that oversight, meet your compliance needs.”
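
The access-control idea can be shown without any MCP library at all. The sketch below is plain Python, not the MCP SDK or a Stacklok product. `ToolGateway` and its per-user allowlist are hypothetical, and stand in for whatever ACL and logging layer sits in front of real servers.

```python
import logging
from typing import Any, Callable

log = logging.getLogger("tool_gateway")


class ToolGateway:
    """Sits between agents and tools: enforces a per-user allowlist and logs every call."""

    def __init__(self, tools: dict[str, Callable[..., Any]], acl: dict[str, set[str]]):
        self._tools = tools
        self._acl = acl

    def call(self, user: str, tool: str, **kwargs: Any) -> Any:
        allowed = tool in self._acl.get(user, set())
        # Every attempt is logged, including the refused ones.
        log.info("user=%s tool=%s allowed=%s", user, tool, allowed)
        if not allowed:
            raise PermissionError(f"{user} may not use {tool}")
        if tool not in self._tools:
            raise KeyError(f"unknown tool: {tool}")
        return self._tools[tool](**kwargs)
```

Because the model only ever reaches tools through `call`, a tool that is missing from a user's allowlist is simply unavailable to that user's agent, whatever the prompt says.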

He takes the containment idea down to the operating system. “You can put the MCP server, I call it, in jail,” Spencer says, describing a systemd process scoped to present only the compute resources the server actually needs. The reasoning is a security posture rather than a convenience. “For every MCP server you’re running, there’s an LLM out there that’s trying to use it, and who knows what kind of prompt injections people are running.” The same boundary defends against the model’s own failures, not only malicious input. An agent cannot delete a production server with a tool it was never given. “They guard against things like the AI hallucinating something and deleting your production server, because you simply don’t provide that tool to it,” he warns. Control here is not a policy document. It is the set of tools the agent is and is not handed.
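
What such a "jail" might look like can be sketched by rendering a systemd service unit. The directives used are standard systemd sandboxing and resource-control options, but the selection, the limits, and the binary path are assumptions for illustration. The interview does not say which settings SUSE uses.

```python
def render_unit(exec_start: str, memory_max: str = "512M", cpu_quota: str = "50%") -> str:
    """Render a systemd service unit that confines a single MCP server process."""
    lines = [
        "[Unit]",
        "Description=Confined MCP server (illustrative sketch)",
        "",
        "[Service]",
        f"ExecStart={exec_start}",
        "DynamicUser=yes",          # run as a throwaway unprivileged user
        "NoNewPrivileges=yes",      # block privilege escalation via setuid binaries
        "ProtectSystem=strict",     # mount the file system hierarchy read-only
        "ProtectHome=yes",          # hide home directories
        "PrivateTmp=yes",           # give the service its own /tmp
        "PrivateDevices=yes",       # hide physical devices
        f"MemoryMax={memory_max}",  # hard memory ceiling
        f"CPUQuota={cpu_quota}",    # CPU time ceiling
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical binary path, shown only to produce sample output.
    print(render_unit("/usr/local/bin/example-mcp-server"))
```

The limits here are placeholders. The point is that each server gets only the resources and file system view it needs, so a prompt-injected request has little to work with.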

There is a quality dimension to MCP that reinforces the control argument, because a well-built server encodes expert knowledge rather than leaving the model to guess. SUSE ships MCP servers with its products, crafted by the people who know those products best. “It would be like, instead of you sitting down in front of a chatbot saying I need to figure out how to use Rancher, you’re sitting down with the whole Rancher development team telling you how to prompt the chatbot,” Spencer says. An agent working against an expert-built server is not interpreting raw APIs and making guesses a human cannot easily validate. The encoded knowledge makes the agent both more capable and more predictable, which is control of a subtler kind.

Cost sovereignty belongs in the same conversation

The control story is incomplete if it stops at data and governance, because an enterprise that cannot predict its AI spend has lost a different kind of control. Spencer folds cost into the sovereignty argument directly. “Sometimes they call it cost sovereignty,” he says, “because no one can come back later and say, oh, by the way, we’re changing our model.” He has watched a supplier move developers from seat-based to usage-based pricing, a shift his teams did not control and could not prevent. Hosting your own infrastructure changes the nature of the exposure. “There’s a maximum cost there,” he notes of self-hosted AI, and the question shifts from whether you will overrun a variable bill to whether you have the observability to use a fixed capacity fully.

For the cases where variable cost is unavoidable, SUSE uses circuit breakers that cut off runaway spend in real time when usage spikes past a threshold in a given minute. Spencer is honest that this frustrates engineers who get rate limited mid-task, but the alternative is an autonomous agent running up cost with no human in the loop. The same discipline runs through the company’s use of frontier models, reserved for high-value strategic work rather than routine completion, and through the practice of using a frontier model once to build an agent that then runs on a cheaper model or a private one. Each of these is the same instinct expressed at the level of money, keeping the expensive capability available for the work that justifies it and putting hard limits around everything else.

What makes the SUSE approach instructive beyond its own walls is that none of it depends on slowing engineers down. The sovereignty, the MCP governance, and the cost controls exist so that engineers can move fast inside a boundary the enterprise can actually stand behind. “You don’t want to stop them from getting that 100X improvement,” Spencer says. “You need to give them the right tools for the job.” Control, in his telling, is not the opposite of speed. It is the thing that makes speed safe to allow.

Sovereign AI and Agentic Infrastructure with Rick Spencer

Saqib Jan — Wed, 24 Jun 2026 18:06:00 GMT

Most engineering organizations adopting AI do it without compliance regimes scrutinizing every decision. SUSE works under exactly that scrutiny, and the way it solved for AI adoption under strict data sovereignty requirements is instructive for any team that cares about where its data goes and what its AI actually costs.

Rick Spencer is the General Manager for Technology and Product at SUSE, where he leads the engineering teams behind the company’s full product portfolio, from SUSE Linux Enterprise and Multi-Linux Manager to the cloud native stack of Rancher, RKE2, K3s, and SUSE AI.

SUSE has one of the longest and deepest open source infrastructure histories in the industry, and its enterprise customers are operating under strict compliance regimes. Rick joined Deep Engineering Live to talk about how SUSE adopted AI agents without breaking its promises on data sovereignty, the framework his teams use to decide which AI tools fit which work, why he rejects output-based developer metrics, and the role MCP now plays in managing enterprise infrastructure.

Watch the full conversation below or read the full interview.

This session was recorded offline as part of the Deep Engineering Interview Series. The transcript below has been lightly edited for clarity and readability.

Q. Tell us a little about yourself and what you do at SUSE, and the kinds of engineering challenges your teams are working through right now.

Rick Spencer: I’m the General Manager for Technology and Product at SUSE. That means I lead the engineering teams for all the products that we offer to customers. That includes Linux, like SUSE Linux Enterprise, SUSE Linux Enterprise Server for SAP, and Multi-Linux Manager. We also have a suite of cloud native products like RKE2, K3s, and Rancher of course. We have a lot of things built on top of both of those, like the application collection, which are certified Kubernetes applications that you can run. We have other products composed of those building blocks like SUSE AI that you can use to run your own sovereign AI stack. There is SUSE Edge, and SUSE Edge’s cousins like SUSE Telco and SUSE Industrial Solutions.

Our customers tend to be enterprises with pretty serious enterprise requirements. They work under compliance regimes, they typically face a lot of scrutiny, they need important things like L3 support, reliable lifecycle models, a lot of predictability, high quality, and low CVE counts. So we take the open source software in the world and we create packages of it that are usable by enterprise companies.

Q. SUSE has a longer and deeper open source infrastructure history than most companies in the space. When AI agents started becoming the real workflow tool for engineers, how did that land internally, and what did adoption actually look like on the ground?

Rick Spencer: There is a lot to unpack here, so let me try to go at least somewhat systematically. All the software that we write is open source, so we are not worried about leaking the code. We publish the code. That was not the concern. But there are things like, let’s say you are debugging a customer environment. You do not want to let your engineers just take those logs and send them to random AI bots. We promise that we won’t do that. They trust us to not do those kinds of things. So there was a phase that we went through trying to figure out how to use AI in an effective way that maintained our promises to the customers.

Part of that solution was really realizing that engineers are going to use AI to go faster wherever they can. It is not like, oh, don’t use AI. That would just not be workable. So it was really about setting up our engineering management team to coach those engineers effectively. Besides keeping promises of data sovereignty, costs can also really run out of control. We see a lot of people run into that. For us, we never really had the problem of engineers deleting things in production. Our engineers tend to be very cautious. But it is easy to rack up pretty big bills on Anthropic and Copilot and so on.

A big part of the solution was that we have our own sovereign AI. We make this thing called SUSE AI, which is a stack you can use to manage AI workloads on your own infrastructure. We use that pretty heavily internally and we run Llama on it. If you are doing things, and we have a few different places that we do that, it is all within our private infrastructure, so we make sure there is no chance that any data can escape. We only use models which can be vetted effectively.

Then there was more to it on the coaching and oversight side. What we ended up doing is getting pretty precise about how engineers use AI. We broke it down into three categories. The first is using it for your daily work, which is your statement completion and your debugging and that kind of thing. The second is using it for agentics, which is relieving your toil, letting agents take care of some things that used to create a lot of work or interruptions. The last part I call curve jumping. That is when you are going from zero to infinity, doing things with AI that you would not have tried before, like solving really deep problems in one big step.

We created a framework around those three different kinds of uses, and then we help engineering managers help their engineers pattern match. Okay, if I just need statement completion and debugging help, these are the tools I can use for that. If there is a sovereignty aspect to it, these are the tools for that work. These are good tools for this kind of agent. And then these are the frontier models that we provide to you for those curve jumping capabilities. It sounds very organized now, but there was a lot of experimentation and a lot of rapid innovation from the engineers. Some of the early adopter engineers led, and then we went back and tried to create some order out of that so we could spread what we learned around.

Q. Digital sovereignty is central to how SUSE thinks about its stack. How does that principle shape where AI can and cannot go inside your engineering workflows?

Rick Spencer: I don’t think it’s can or cannot. It’s more so how. Let me give you an example. For digital sovereignty, a lot of the things we build, we actually build in something called the internal build service, which is an instance of something called Open Build Service, which is a service we provide to everybody. Kubernetes is built on it. There are thousands and thousands of things built on it. The interesting thing in terms of digital sovereignty is that all the builds are offline. They literally are not connected to the internet. This is super important because you need to be able to prove that nothing happened during the build process. That is a lot easier to do if there was no internet connection.

So we have to go around the hard way. We have to make sure all the sources are there. You cannot pull anything live in the heat of the moment during build time. You cannot run post-install scripts. Now if you want to apply AI in that environment, let’s say you want to backport a patch to previous stable releases, can you do that in a sovereign way? Yes, you can, as long as we are running AI in a way that is able to run disconnected from the internet and we have complete visibility into everything it is doing. These things are not easy, but we have decades of experience with it that we can apply. In some cases we even train our own models to accomplish these things, and that way we know the model does not have some naughty time bombs built into it.

Q. When your engineers started integrating AI agents, where did it deliver productivity gains, and where did it create new problems?

Rick Spencer: Let me give you some examples. My favorite example isn’t really about code. We are an open source company, and we have all of this code within our dominion of control, in GitHub and here and there and everywhere. I don’t know if you remember the Trivy attacks last month, and just a spate of tool chain attacks. We had a response process for that, but it was predicated on those things occurring occasionally, not twice a week. So our security team wrote an agent that scans certain sites every hour. It says, okay, there is a reported compromised package, typically in NPM, sometimes in PyPI, different places. It finds those, and then it scans all of our open source code to see if we are using it anywhere. If we are, it writes a report and notifies us on Slack. So we know right where to pay attention right away.

Fortunately, so far we have not really been impacted because we have really good hygiene around our tool chains. But bad guys are really smart and work really hard, so we still want to stay super vigilant. That has just been such a huge relief, because now if you see a report of a tool chain attack, our agent was on it before we even knew about it. It saves so much toil, because we don’t have to send people out to check if we are using this package or do a search in this area of GitHub.
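To make the matching step of such an agent concrete, here is a minimal editor's sketch, not SUSE's implementation. It assumes the advisory feed has already been reduced to a mapping of package names to compromised versions, and that dependencies are declared in npm-style package.json files. The function name and the input shape are hypothetical.

```python
import json
from pathlib import Path


def find_compromised(repo_root: Path, advisories: dict[str, set[str]]) -> list[tuple[str, str, str]]:
    """Return (manifest path, package, version spec) for each flagged dependency.

    `advisories` maps a package name to the versions reported as compromised.
    This input is hypothetical; a real agent would build it from advisory feeds.
    """
    hits = []
    for manifest in sorted(repo_root.rglob("package.json")):
        data = json.loads(manifest.read_text(encoding="utf-8"))
        for section in ("dependencies", "devDependencies"):
            for name, spec in data.get(section, {}).items():
                # Naive match: strip range operators such as ^ or ~ before comparing.
                if spec.lstrip("^~>=< ") in advisories.get(name, set()):
                    hits.append((str(manifest.relative_to(repo_root)), name, spec))
    return hits
```

The version comparison is deliberately crude. A production scanner would resolve lock files and version ranges properly, and would feed the hits into whatever reporting channel the team uses.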

There are other areas like CVE mitigation. A CVE comes in, and an agent examines it. Is it even applicable? A lot of times the CVE comes in and the package is in the repo, but it is not being exposed in any way that it would matter. There is this thing called VEX, which is basically a file you provide along with the CVE database to explain whether the vulnerability is impacting you or not. That is really hard to do at the scale that CVEs are coming in, but the agents can do that for us pretty easily. That means we are focusing our attention not on keeping up with the crush of CVE reports, but on the actual vulnerabilities. Our attention is reserved to actually keep our customers safe.
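As a rough illustration of what a VEX record carries, the sketch below builds one statement as a plain dictionary. The four status labels are the ones VEX documents commonly use. The flat field layout, the function name, and the placeholder identifiers are simplifications for this example rather than any specific VEX schema.

```python
import json

# Status labels commonly used by VEX documents.
VEX_STATUSES = {"not_affected", "affected", "fixed", "under_investigation"}


def vex_statement(cve_id: str, product: str, status: str, justification: str = "") -> dict:
    """Build one VEX-style statement as a plain dict (simplified, illustrative layout)."""
    if status not in VEX_STATUSES:
        raise ValueError(f"unknown VEX status: {status}")
    if status == "not_affected" and not justification:
        # A bare "not affected" does not help a customer; record the reason.
        raise ValueError("not_affected needs a justification")
    statement = {"vulnerability": cve_id, "product": product, "status": status}
    if justification:
        statement["justification"] = justification
    return statement


# Placeholder identifiers, not a real advisory or package.
example = vex_statement(
    "CVE-0000-00000",
    "pkg:generic/example-package@1.0.0",
    "not_affected",
    "vulnerable_code_not_in_execute_path",
)
print(json.dumps(example, indent=2))
```

The case described above, where the package is in the repo but not exposed in a way that matters, maps to a not_affected statement with a stated reason.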

Q. How do you think about measuring the impact of AI on your engineering teams?

Rick Spencer: We might have a different view on that than some people. We are really tending away from measurements that measure output and utilization, and we are trying to focus on impact. What that means is we don’t have leaderboards that show every developer and how many lines of code their agents submitted. I consider that garbage vanity metrics. Not helpful.

One of the main things we want to do is measure the impact of our use of AI without it being an extra burden on the development team. A lot of the tools out there assume you are a proprietary software company where everyone is working on a single code base, which is just not how an open source enterprise works. We are working on hundreds, if not thousands, of repositories all the time, and the work to maintain them is very different. So those utilization numbers and those developer-to-developer comparisons just don’t have value. I’d rather have engineers working than reporting.

So we are setting up metrics for different things, like how fast CVEs are being addressed, how fast patches are being backported, how fast our L3 responses are getting closed while maintaining the same NPS score. These are things where we have applied AI to different areas, so let’s focus on the business impact, not on the utilization.

That said, we are working on a set of dashboards right now so an engineering manager can look at the cost and utilization on their team to help with coaching. Let’s say an engineering manager has an eight person team. Hey, I noticed we are burning a lot of tokens. What are we actually doing that is burning that many tokens? I’m not sure we are getting value out of that. Or, hey, we have these seats for this LLM or code assistant we bought, but we are not using them. Are there areas where we could be? So we are definitely measuring the value from a business perspective, but we are really trying to decentralize and allow engineering managers to guide their teams on getting the most value out of AI, without it becoming a leaderboard game where developers feel exposed in some game that is not about providing value to customers.

Q. Whenever we talk about AI agents, we cannot avoid MCP. Why does it matter to so many companies today, and what does it unlock for engineering teams in your kind of environment?

Rick Spencer: MCP is critical, actually. If anyone is listening and does not know what I mean, a Model Context Protocol server is a little bit of code that runs and offers to an LLM, hey, here is some context you can use, and here are some tools, some actual things you can do. That turns a chatbot into an agent, because an agent can actually do things.

MCP does a few things that are really important. The first is just ease of use. A good MCP server provides structure to the LLM so that it is way easier to write a prompt to get the results you want. The LLM sees the MCP server is for this purpose, and these are the kinds of information the humans want out of it. You don’t have to include all that in the prompt. And it is actually really easy to write an MCP server with an LLM. If you have a decent model, it does not even need to be top tier. You say, hey, we have this bit of software we want to control with agents, this is our use case, and it will write an MCP server for you pretty easily. Then you can have a human go in and edit it.
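The idea of a server that tells the model what tools exist and what they are for can be shown with a toy stand-in. This is a conceptual sketch using only the standard library. It is not the MCP wire protocol or any MCP SDK, and every name in it is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str  # tells the model what the tool is for
    handler: Callable[..., str]


class ToyToolServer:
    """Conceptual stand-in for an MCP server: it lists tools and runs them."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def list_tools(self) -> list[dict[str, str]]:
        # This listing is what spares the user from explaining the tools in the prompt.
        return [{"name": t.name, "description": t.description} for t in self._tools.values()]

    def call(self, name: str, **arguments: str) -> str:
        if name not in self._tools:
            raise KeyError(f"no such tool: {name}")
        return self._tools[name].handler(**arguments)


def node_status(cluster: str) -> str:
    # Hypothetical read-only tool; a real one would query the cluster.
    return f"cluster {cluster}: status lookup not implemented in this sketch"


server = ToyToolServer()
server.register(Tool("node_status", "Report node health for a named cluster.", node_status))
```

The descriptions are where the expert knowledge goes: the better they explain purpose and expected inputs, the less the model has to guess.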

But there is another part to it. We ship MCP servers with all of our products, and we think this is really important. In our view, the world is moving to a new paradigm. Before, as an administrator, you would think about all the applications you use to monitor and control your servers, your Kubernetes clusters, your workloads. Now we are moving into a mode where you don’t think that way. You think about writing agents, or chatting with the infrastructure to get the information you need, and then it is able to take action on your behalf without you having to worry about the specific syntax.

For that to work, those MCP servers have to have really good human knowledge encoded into them. If you think about our MCP servers for SLES, for Rancher, for Multi-Linux Manager, the key is that the experts in using that tool have crafted those MCP servers. It would be like, instead of you sitting down in front of a chatbot saying I need to figure out how to use Rancher, you are sitting down with the whole Rancher development team telling you how to prompt the chatbot. That encoding of knowledge makes the agentics way more powerful, because it is not guessing. Otherwise it has to look at the raw APIs and make a bunch of guesses, and there is no way as a human you will know if that is the right thing to do.

All that said, there is another really important thing MCP servers do, which is provide a place where you can, as an enterprise, bring some sanity and control to the usage. If you have MCP servers running, they are just servers. That means you can apply ACLs to them. You can say the MCP server for this user is allowed to use these tools and not these tools. You can log the use of the MCP servers. We have our own gateway, but we also partnered with a company called Stacklok that we talked a lot about at the last SUSECON. There are different gateways you can put into place as an enterprise to keep the MCP servers under control. You don’t give the LLMs access directly to tools, only to the MCP servers, and then you can have that oversight and meet your compliance needs.

Even at a low level, you can put the MCP server in what I call jail. You can say, on the server, here is the user for this MCP server, here is a systemd process that only presents the actual compute resources it needs. Because you have to be thinking, for every MCP server you are running, there is an LLM out there trying to use it, and who knows what kind of prompt injections people are running. MCP servers also guard against things like the AI hallucinating something and deleting your production server, because you simply don’t provide that tool to it. This to me is one of the main roles SUSE has to play as part of this disruption, because we are bringing this agentic notion of how to manage all of your infrastructure.
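The access control and logging described in this answer can be sketched as a small authorization check placed in front of tool calls. This is an editor's illustration under stated assumptions, not SUSE's or any vendor's gateway. The user names, tool names, and allowlist are hypothetical.

```python
import logging

logger = logging.getLogger("mcp_gateway_sketch")

# Hypothetical per-user allowlists. No one is granted a delete tool,
# so a hallucinated "delete production" request has nothing to call.
ALLOWED_TOOLS = {
    "alice": {"node_status", "restart_workload"},
    "ci-bot": {"node_status"},
}


class ToolDenied(PermissionError):
    """Raised when a user asks for a tool outside their allowlist."""


def authorize(user: str, tool: str) -> None:
    """Log every attempted call, then raise unless the user may use the tool."""
    allowed = tool in ALLOWED_TOOLS.get(user, set())
    logger.info("user=%s tool=%s allowed=%s", user, tool, allowed)
    if not allowed:
        raise ToolDenied(f"{user} may not call {tool}")
```

Denied attempts are logged as well as allowed ones, which is usually what a compliance review wants to see.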

Q. Cost is a real constraint when you are running AI at any scale. What does a practical cost mitigation approach look like for an engineering organization working the way SUSE does?

Rick Spencer: I can speak from our own experience. The fact of the matter is, if you are using a self-hosted AI, sure, you spend a lot on the big iron, and you are probably paying a company like SUSE for support. But nonetheless, there is a maximum cost there. Then the real question is, do you have the observability in place to make sure it is being utilized fully? That is a very different conversation. Are we getting full utilization out of our fixed costs, where we never have to worry about overrunning?

There is digital sovereignty, and sometimes they call this cost sovereignty, because no one can come back later and say, oh, by the way, we are changing our model. We have some suppliers where a lot of our developers were using seat-based pricing, and then over time they let us know, in plenty of time, that they are moving to usage-based pricing. That is a big change. We did not have sovereignty over the way they price it, whereas if you are hosting your own, you have that sovereignty over the pricing. So it is something to think about.

Another thing we use a lot is circuit breakers. Hey, we just noticed our Claude usage in the last minute was way too high, or Gemini, whatever you are using. That keeps runaway agents in check. It can be very frustrating for developers if they are trying to get work done and every single minute they are getting rate limited, but we are talking about cost controls, so you need to do the thing.
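A usage circuit breaker of this kind can be approximated with a sliding-window token budget. The sketch below is one possible shape, assuming the caller knows roughly how many tokens each request will spend. The class name, budget, and window length are illustrative rather than anything SUSE has described.

```python
import time
from collections import deque


class TokenCircuitBreaker:
    """Refuses requests once token usage inside a sliding window exceeds a budget."""

    def __init__(self, max_tokens: int, window_seconds: float = 60.0, clock=time.monotonic):
        self.max_tokens = max_tokens
        self.window_seconds = window_seconds
        self._clock = clock
        self._events = deque()  # (timestamp, tokens) pairs, oldest first
        self._total = 0

    def _expire(self, now: float) -> None:
        # Drop usage records that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window_seconds:
            _, tokens = self._events.popleft()
            self._total -= tokens

    def allow(self, tokens: int) -> bool:
        """Record the request and return True if it fits the budget, else False."""
        now = self._clock()
        self._expire(now)
        if self._total + tokens > self.max_tokens:
            return False  # breaker open: the caller should pause or reject the call
        self._events.append((now, tokens))
        self._total += tokens
        return True
```

A runaway agent hits the False branch within the window instead of at the end of the billing period. The trade-off is the one named above: legitimate bursts get refused too, so the budget needs tuning per team.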

The other thing to say is that we are big believers in frontier models. We are not saying don’t use frontier models, but it is important to use them for the right things. You do not need a frontier model to understand your Python module and give you code completion. You just don’t need it for that. The frontier models are really for when you are in that curve-jumping, super-strategic mode. We have projects where we spend tens of thousands of dollars on frontier models, but they generated, who knows, a million or two million dollars in value, so the cost-benefit was definitely there.

One thing we do with frontier models is, let’s say we need an agent for something. We use the frontier model to create an agent that can then be run on a much lower cost model. It will say, sure, I’ll write the Python scripts it needs to use, so it doesn’t have to try to do that inference every time. I’ll write the context file that works for that model. So you can start with the frontier model and then tell it to do things with your less expensive models, or even your own models that are in your own infrastructure.
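The split described in this answer, frontier models for strategic work and cheaper models for routine tasks, can be written down as a routing table. This is a minimal sketch. The task categories, tier names, and the rule for escalating are all made up for illustration.

```python
from enum import Enum


class Tier(Enum):
    LOCAL = "self-hosted small model"
    MID = "mid-tier hosted model"
    FRONTIER = "frontier model"


# Hypothetical routing table; the categories and assignments are illustrative only.
ROUTES = {
    "code_completion": Tier.LOCAL,
    "cve_triage": Tier.MID,
    "architecture_redesign": Tier.FRONTIER,
}


def pick_tier(task_kind: str, strategic: bool = False) -> Tier:
    """Default to the cheapest tier; escalate only for work flagged as strategic."""
    if strategic:
        return Tier.FRONTIER
    return ROUTES.get(task_kind, Tier.LOCAL)
```

Defaulting unknown work to the cheapest tier keeps the expensive model an explicit decision rather than an accident.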

When you see that needle move, you see people start adopting it, and you’ll see step functions in utilization of tokens. A certain engineer, the penny drops for them that they are in a new paradigm where, as a developer, they suddenly realize they are empowered to be ten times, a hundred times more effective using these tools. You can see day to day these little jumps. Oh, somebody figured it out. Someone figured it out. So then you need to go back, because you don’t want to stop them from getting that 100X improvement. You need to give them the right tools for the job.

Deep Engineering #52: Sam Keen on the Context Tax You Pay in Every Claude Code Session

Saqib Jan — Thu, 18 Jun 2026 16:38:44 GMT

Claude Code for Software Engineering

Join this interactive workshop to learn how to turn Claude Code from a session-by-session assistant into a repeatable engineering system, using structured context, reusable skills, scoped rules, hooks, and guardrails that work across real codebases and team workflows.

🗓️ Friday, June 20 · 10:30 AM EDT onwards

Use code DEEPENG50 for 50% off.

✍️ From the editor’s desk,

Welcome to the 52nd issue of Deep Engineering!

A recent study pulled apart the architecture of Claude Code and found that only 1.6 percent of the codebase is actual AI decision logic, while the remaining 98.4 percent is the deterministic infrastructure that surrounds the model, including the permission gates, the context management, the tool routing, and the recovery logic that keep the whole thing usable. The agent loop at the center of it turns out to be a simple while loop, which means the genuine engineering effort sits in the systems built around the model rather than in the model itself.

This analysis lines up almost exactly with what most engineers working with AI tools are discovering through daily practice, which is that the thing slowing them down is rarely the intelligence of the model and is far more often the fact that none of the context they carefully supply in one session survives into the next. Supplying that missing context has quietly become the engineer’s job, and it gets paid at the start of every single session without anyone ever counting it.

This week Sam Keen, an agentic engineering researcher, former engineer at AWS and Nike, and author of Clean Architecture with Python, shares a practical way to stop paying that cost for good. His piece walks through how to convert the context you re-explain on repeat into a system that compounds across sessions, using the mechanisms Claude Code already gives you, and how to recognize the moment that system starts quietly working against you rather than for you.

Let’s get started.

Featured Newsletter: Engineering At Scale

A weekly column that makes databases, system design, and architecture easy to follow, with clear explanations, practical insights, and career advice for engineers building at scale.

→ Subscribe to Engineering At Scale

The Hidden Cost of Starting From Scratch

When you open a fresh Claude Code session, the assistant knows nothing about your project. It does not know where your tests live, it does not know the patterns this codebase uses, and it does not remember the conventions you walked it through the day before. So you explain all of it again, and then you do the same thing again tomorrow.

That re-teaching is a tax, and it is the easiest one to overlook because it never shows up on any gauge, which means you pay it at the start of every session without ever once counting what it costs you.

The bottleneck isn’t the model

The real limiting factor in AI-assisted development right now is not how clever the model is, because the models are already more than capable enough for the work most teams are asking of them. The limiting factor is that none of that intelligence carries from one session into the next, and the job of supplying the missing context every single time has quietly been handed to you without anyone naming it as work.

Picture a senior engineer who forgets everything about your codebase overnight, every single night, and shows up the next morning brilliant and fast and genuinely helpful but starting again from absolute zero. You would not describe that person as a force multiplier, you would describe the arrangement as exhausting, and yet that is the default relationship most people have with their coding assistant. They end up blaming the model for the drag when the actual problem is that nothing they explained yesterday is still present today.

Write the context down once

The fix is to stop starting from scratch, which means writing the recurring context down once in a place the harness reads automatically so that it is present in every session without you having to lift a finger. Claude Code gives you several mechanisms for doing exactly this, and three of them are foundational. The most useful way to think about the three is by the specific kind of cost each one removes from your day.

Agent files, written as CLAUDE.md, are your project’s standing memory, the place where the conventions and the layout and the way things get done here all live. You write them once and they load into every session, so you stop re-explaining the project from the beginning each time. The part most people underuse is that these files load hierarchically, which means a personal file can ride along on every session on your machine, a broader file can cover all of your coding work, and a project-specific file can sit on top of both, each one layering onto the last. The thing to remember is to keep them lean, somewhere around a couple hundred lines each rather than letting them sprawl.

Skills capture the procedures you would otherwise walk through by hand every time, the multi-step moves where you first do one thing and then check another before continuing. A procedure you would normally re-explain becomes a procedure you simply invoke, and because a skill is not limited to instructions alone, it can bundle the scripts the agent runs, which means the repeatable move can carry real executable code rather than prose describing what the code should do.

Hooks handle the corrections you would otherwise find yourself repeating, the lint nit and the formatting rule and the check you keep having to ask for. They fire at fixed points in the harness lifecycle, before or after a tool runs for instance, and because they sit outside the model’s control they run every single time regardless of whether the model would have remembered to do them. You give the note once and then you never have to give it again.
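As a sketch of what the script behind such a hook might look like, the example below decides which files need formatting after an edit. It assumes the harness hands the hook one JSON event with tool_name and tool_input.file_path fields. Treat that payload shape as an assumption and check the Claude Code hooks documentation for the exact schema. The function names are hypothetical.

```python
import json


def files_to_format(payload: dict) -> list[str]:
    """Pick out Python files touched by an edit-style tool call.

    The payload shape (tool_name, tool_input.file_path) is an assumption
    about what the harness passes to a hook; verify it against the docs.
    """
    if payload.get("tool_name") not in {"Edit", "Write"}:
        return []
    path = payload.get("tool_input", {}).get("file_path", "")
    return [path] if path.endswith(".py") else []


def handle(stream) -> list[str]:
    """Read one JSON event from `stream` and return the files that need formatting."""
    return files_to_format(json.load(stream))
```

A real hook would call handle with standard input and then run the project's formatter on each returned path. Because the check lives outside the model, it runs whether or not the model remembered the rule.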

The pattern underneath all three mechanisms is the same one, because each of them converts a recurring cost into a single one-time write. That is what compounding actually means in this context, and it is the direct opposite of starting from scratch, since the investment you make is small and the payback lands in every session that follows it.

Your setup can rot, here is how to catch it

There is an honest catch worth naming here, and it is the place where the /context command earns its keep, because a compounding system can quietly rot over time. You install a skill pack and then forget it is even there. A CLAUDE.md that started out tight slowly fills up with things that mattered once and no longer do. The investment turns into freight without you noticing, and freight is really just the from-scratch tax wearing a slightly nicer costume.

The /context command is how you tell the difference, and it does one genuinely useful thing, which is that it makes the invisible visible. One command gives you a colored grid and a per-category breakdown of exactly what you are carrying before the conversation has even started. You do not need to master it, you only need to glance at it often enough to notice when something is wrong.

The last time I ran it, the single biggest chunk of my standing context was not the project memory I had carefully written, it was 14.4k tokens of skill packs I had installed on a whim and then never used even once. I had assumed my context was quietly working in my favor, and a five-second look told me otherwise. Culling them took about a minute, using /skills to deactivate the ones I never reach for and /plugin to drop a whole pack that had arrived bundled with something else.

My /context readout: skills were the largest slice of standing overhead, bigger than the project memory I’d actually written.

That same readout flagged a second problem with a different fix, because the CLAUDE.md in my working directory had grown to 3,500 tokens without my noticing. I sat down with Claude and compacted it, cutting the irrelevant and the quietly duplicated, and it came back at 1,200 tokens, which is the same project memory carried at roughly a third of the weight.

Clarity helps the agent for exactly the same reason it helps a human reader, because a bloated and half-contradictory CLAUDE.md does not only cost you tokens, it actively muddies the very instructions you are leaning on to get good work out of the model. A lean file is easier for the model to follow in the same way that a tight brief is easier for a colleague to follow.
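One cheap way to notice a memory file drifting past the lean range is a line-count check you can run by hand or in CI. The sketch below is an illustration, not a Claude Code feature. The 200-line budget stands in for "a couple hundred lines", and the file paths you pass in are up to you.

```python
from pathlib import Path

LINE_BUDGET = 200  # stands in for "a couple hundred lines"; the exact number is a judgment call


def over_budget(paths: list[Path], budget: int = LINE_BUDGET) -> list[tuple[Path, int]]:
    """Return (path, line count) for each existing memory file longer than the budget."""
    flagged = []
    for path in paths:
        if path.is_file():
            count = len(path.read_text(encoding="utf-8").splitlines())
            if count > budget:
                flagged.append((path, count))
    return flagged
```

Calling over_budget with the project's CLAUDE.md and any personal memory file gives a quick list of candidates for trimming. Line count is only a proxy for token cost, so /context remains the more accurate readout.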

The specific number you land on does not really matter, but the habit does, so skim /context the way you would skim a credit-card statement, not obsessively but often enough to catch the recurring charge you forgot you ever signed up for.

Do the upfront work once

Starting from scratch feels free because the cost is smeared so thin across every session you will ever run, but it is not actually free, and it is in fact one of the largest and quietest line items in the way you work.

So put in the upfront work of writing a lean memory file, building a few skills you genuinely use, and setting a couple of hooks that hold the line for you. Then keep curating it, because models change and your projects change, and a glance at /context now and then is what tells you which parts of your setup are still earning their place. The goal was never a clever prompt. The goal is a setup that already knows your project before you say a single word to it.

Ad: Join Packt’s live workshop Claude Code Beyond Prompts by Sam Keen on June 20 and learn how to turn CLAUDE.md, skills, and hooks into a compounding coding system.

Use code CLAUDE60 for 60% off. Limited to the first 10 sign-ups.

🛠️ Tool of the Week

Repomix — an open source tool that packs an entire repository into a single, AI-friendly file ready to feed to a coding agent, with token counting built in.

Highlights

Packs a whole repository into a single structured file optimized for LLM consumption, removing the manual copy-paste that eats the start of every session.
Counts tokens per file and for the whole pack, so you see what context costs before you spend it rather than after.
Ships an official skill for Claude Code, Cursor, Codex, and Copilot, letting agents run Repomix directly inside the workflow.

Learn more about Repomix

📎 Tech Briefs

Anthropic suspends Fable 5 and Mythos 5 worldwide - A US export control directive citing national security forced Anthropic to disable both frontier models for all customers, days after launch, with all other models unaffected.
OpenAI Codex 0.140.0 ships - Codex adds Claude Code imports, unified mentions, and encrypted Amazon Bedrock API-key authentication.
GitHub Copilot CLI 1.0.63 released - A new deferTools option reduces MCP context bloat when tool search is enabled.
Xiaomi open-sources MiMo Code - Xiaomi’s terminal coding agent targets long tasks and offers free limited-time MiMo Auto access.
ChatGPT adds memory summary controls - Users can delete memories, turn memory off, and directly correct their memory summary.

That’s all for today. Thank you for reading this issue of Deep Engineering.

We’ll be back next week with more expert-led content.

Keep building,

Saqib Jan

Editor-in-Chief, Deep Engineering

If your company wants to reach senior developers, software engineers, and technical decision-makers, speak to us about partnering with Deep Engineering.

Deep Engineering #51: Francesco Ciulla on Rust, Go, and Service-Level Engineering Decisions

Saqib Jan — Thu, 11 Jun 2026 13:30:37 GMT

Build Production-Ready AI Applications with Rust, Claude and Codex

Learn how to use Claude and Codex as development partners for building reliable Rust applications faster. This hands-on workshop with Francesco Ciulla shows how to scaffold, refactor, debug, test, and productionize AI-assisted Rust code with confidence.

Use code DEEPENG50 for 50% off.

✍️ From the editor’s desk,

Welcome to the 51st issue of Deep Engineering!

The Rust team released version 1.96.0 on May 28, shipping new Copy-compatible range types, stabilized assert macros, and Cargo security fixes. It is the kind of release that tells you something important about where the language is. Now on a steady six-week release cadence, Rust’s toolchain keeps getting more integrated, and each release moves the language further from its reputation as a difficult, specialist tool and closer to something engineering teams can simply rely on.

That maturity is part of what is changing the production calculus for engineering teams this year. Francesco Ciulla, author of The Rust Programming Handbook and head of developer relations at Zerops, has used both Rust and Go in production. His view on the Rust versus Go debate is neither dismissive of Go nor evangelistic about Rust.

Ciulla discussed how Rust and Go solve different backend problems, where Go still wins, where Rust’s flat latency and binary size arguments become genuinely decisive, and why his thinking on committing to Rust for a production backend has changed.

Today’s expert insights are based on the broader conversation about Rust adoption we had with Ciulla. You can read our previous issue or watch the full Q&A here.

Let’s get started.

Featured Newsletter: Java Tips and Tricks

Subscribe to Java Tips and Tricks on Substack for practical Java insights, modern Java features, software design discussions, and updates from the Java ecosystem.

→ Subscribe to Java Tips and Tricks

Expert Insights

Rust and Go Are Better Compared Service by Service

by Saqib Jan with Francesco Ciulla

Engineering teams debating Rust versus Go for their next backend service are usually trying to answer the question too early, assuming the decision starts with the language rather than the service.

Francesco Ciulla, author of The Rust Programming Handbook and head of developer relations at Zerops, thinks that framing misses the point. The useful question is not whether Rust is better than Go, or whether Go is more practical than Rust. It is whether the specific service being built has constraints that make one language’s trade-offs more valuable than the other’s.

The distinction Ciulla draws is practical. Go still earns its place in cloud infrastructure, CLI tooling, hiring, and teams that need a service working quickly without introducing a new adoption burden. Rust, in his view, becomes much harder to ignore when a service is constrained by memory use, latency predictability, high-concurrency performance, or the need to remove whole classes of runtime failure from the system. That is why the Rust versus Go debate becomes less useful the longer it stays at the language level. The decision only starts to make sense when it moves down to the service level.

Most teams are asking the wrong question

Most engineers who have followed the Rust versus Go conversation online will have encountered it as a tribal argument, with advocates on both sides treating the choice as a matter of identity rather than engineering judgment. Ciulla rejects that framing, and part of what makes his view useful is that he is not arguing from a Rust-only position. He currently works with Go at Zerops, where the company’s CLI is written in Go, and he has run Rust services in production on his own projects.

“I would never say that Go is a bad programming language,” Ciulla says. “You can feel how powerful Rust is because we are talking about completely different scenarios and still Rust has something to say. But that does not make Go bad.”

The more useful framing, he argues, is to stop treating Rust versus Go as a general-purpose language comparison and start treating it as a service-level engineering decision. The question is not whether an organization should adopt Rust instead of Go. The question is whether a specific service in the system has properties where Rust’s trade-offs are worth paying for, or whether Go’s simplicity, ecosystem, and hiring advantages matter more.

That reframe changes the conversation because it moves the decision away from preference and toward evidence. Once the team is looking at the service rather than the language, the relevant questions become much more concrete. Is memory use a real constraint? Is tail latency a business problem? Does the service sit on a critical path? Does the team have anyone who can review Rust code well enough to put it into production safely? Without those questions, the debate quickly becomes ideology dressed up as architecture.

Go wins on hiring, tooling, and the Docker ecosystem

Ciulla is careful not to turn the comparison into an anti-Go argument. Go is a natural fit for CLI tools, cloud infrastructure tooling, and the Docker and Kubernetes ecosystem. Docker is written in Go and Kubernetes is built on Go. For teams building tools that live inside that ecosystem, Go is the default for reasons that have little to do with language preference and everything to do with integration, community, and the availability of patterns the team can learn from.

The hiring argument also runs in Go’s favor, and Ciulla thinks engineering leaders should be honest about it. “In terms of finding Go engineers, probably at the moment it is easier,” he notes, “because there are probably more of them.” For a company that needs to move quickly and cannot afford a long search for specialized Rust talent, that matters. Staffing is not separate from engineering judgment. It is one of the constraints that determines whether a technical choice can survive contact with production.

Go also has lower adoption friction in general-purpose backend contexts where the performance ceiling is not the binding constraint. If a team is building a service that needs to be working by the end of the week, deployed reliably, and understood by the next engineer who touches it, Go’s simplicity and the breadth of its ecosystem are real advantages. Ciulla makes the same point more generally when talking about technology choices under deadline pressure. “When you need something simple, and you’re familiar already with Java or JavaScript, why don’t you use it?” The same principle applies to Go for teams already operating in that ecosystem.

That matters because Rust adoption is not free. It requires a different mental model, a compiler that forces decisions earlier, and at least one person on the team who knows the language well enough to validate what is being shipped. For a service that does not need Rust’s performance profile, those costs may not be worth paying.

Join Francesco Ciulla to learn how to build production-ready AI applications with Rust, Claude and Codex. Register here

Rust wins on performance and it is not a close contest

Where Ciulla becomes more direct is performance. On raw performance, he does not see much ambiguity. “Check the benchmarks yourself and send me the link where Go beats Rust,” he says. “Sometimes they are at the same level. In terms of pure performance, there is no story.”

That bluntness is useful, but the more important argument in his interview is not about benchmark wins. It is about latency predictability. Languages that rely on garbage collection, including Go, Java, and Node.js, can introduce pauses when the collector runs. An HTTP request that arrives during one of those pauses may experience higher latency than one that does not, even though the service logic is identical.

“By not having a garbage collector on the backend side, you basically have flat latency,” Ciulla explains. “You don’t rely on luck, or on the user not being the unlucky one. It’s a problem that is removed.”

For most web applications running at moderate scale, that distinction may not matter enough to justify a language change. Ciulla’s point is not that every API should be rewritten in Rust. His argument is that services with strict latency requirements, high concurrency, or service-level objectives tied to consistent tail latency should be evaluated differently from ordinary backend services. In those cases, the lack of a garbage collector is not an aesthetic language feature. It changes the runtime behavior the team has to reason about.
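The difference between typical and tail latency is easy to see in a toy simulation. The numbers below are invented for illustration and are not measurements of Go, Rust, or any real collector. The model simply assumes that a fixed fraction of requests collides with a pause.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def simulate(requests: int, base_ms: float, pause_ms: float, pause_every: int) -> list[float]:
    """Every `pause_every`-th request is assumed to collide with a collector pause."""
    return [
        base_ms + (pause_ms if pause_every and i % pause_every == 0 else 0.0)
        for i in range(1, requests + 1)
    ]


# Made-up numbers: 5 ms of work, and a 40 ms pause hitting 1 request in 50.
with_pauses = simulate(requests=1000, base_ms=5.0, pause_ms=40.0, pause_every=50)
no_pauses = simulate(requests=1000, base_ms=5.0, pause_ms=0.0, pause_every=0)

for label, run in (("with pauses", with_pauses), ("no pauses", no_pauses)):
    print(label, "p50:", percentile(run, 50), "p99:", percentile(run, 99))
```

In this toy model the median is 5 ms in both runs, while the 99th percentile is 45 ms with pauses and 5 ms without. That is the pattern a tail-latency objective is sensitive to and an average hides.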

The resource efficiency argument is equally concrete. Ciulla describes running a Rust web server in production and observing roughly four megabytes of RAM in development and five megabytes in production. On a one-gigabyte droplet, that means many small Rust services can sit idle at once without creating the same memory pressure a heavier runtime might introduce. That is not a benchmark designed to win an online argument. It is an operational fact that changes what an infrastructure team has to provision, monitor, and pay for.

That is where Rust’s case becomes strongest. Not when the team wants a language that is fashionable, and not when someone wants to win the Rust versus Go debate, but when a specific service is expensive, latency-sensitive, memory-constrained, or sitting on a path where runtime predictability matters.

Deployment changes the operational argument

One of the more practical points Ciulla makes is that Rust changes the shape of the deployment artifact. When you run cargo build on a Rust project, you get an executable binary specific to the platform you built on. A Windows machine produces a Windows executable. A Mac produces a Mac executable. A Linux machine produces a Linux executable. You can also cross-compile by passing a different target to the build (cargo build --target), which lets a team produce a Linux binary from another environment when the workflow requires it.

Ciulla’s preferred production workflow is to build the Rust binary directly inside the Docker image build process. “My flow is that I prefer to build the Rust binary directly when I build the Docker image,” he explains, “so I have something which is just deployable everywhere, a Linux executable running in a Docker container.”

The appeal is straightforward. The team gets a lightweight binary compiled for the right architecture, packaged inside a portable container. “The dream for operations is having an executable inside a Docker container,” he says. “You get something which is lightweight and can run everywhere Docker is installed.”

That matters for teams already containerizing services, which is most teams building at meaningful production scale. A Rust binary inside a container keeps the deployment artifact small while preserving the operational consistency teams expect from Docker. The container still matters because real systems rarely deploy one service by hand. They orchestrate many services, restart them, replace them, and scale them across environments. As Ciulla puts it, “We are not in the 90s anymore.”

The argument is not that Rust removes the need for containers. It is that Rust and containers fit together cleanly. Rust gives the team a compact executable artifact. Docker gives the team a repeatable deployment boundary. Together, they reduce some of the runtime and packaging overhead that teams accept as normal in other ecosystems.

Rust is ready for more production backends

The most interesting part of Ciulla’s position is that it has changed. Two years ago, he would not have committed to building a paid SaaS product with a Rust backend. A year ago, he was still hedging. In 2026, he removes the qualification entirely. “If I had a paid product, I would use Rust,” he says. “Let’s remove the probably.”

That shift is not based on general enthusiasm for the language. Ciulla describes himself as skeptical by default, and his advice throughout the interview is to try technologies directly rather than rely on what advocates or critics say online. His confidence comes from the maturity he now sees in the Rust backend ecosystem, especially around Axum, which he says he would now use in production if he were building a SaaS or paid product.

The surrounding toolchain also strengthens the case. Cargo handles dependency management, building, testing, and documentation in a single integrated workflow. There is no equivalent of the npm versus yarn versus pnpm decision that JavaScript teams often navigate before they even get to the application itself. Running tests is cargo test. The integration is not a small ergonomic convenience. For teams that have spent years carrying build-system complexity, dependency churn, and fragmented tooling across projects, a coherent toolchain reduces the amount of process overhead attached to the language.
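For readers who have not seen it, a unit test in Rust usually lives in the same file as the code it covers, and `cargo test` discovers and runs it with no extra configuration. The sketch below uses a made-up function and module name purely to show the layout.

```rust
/// Hypothetical helper used only to illustrate the test layout.
pub fn clamp_page_size(requested: u32) -> u32 {
    requested.clamp(1, 100)
}

fn main() {
    println!("page size: {}", clamp_page_size(500));
}

// Compiled only for test builds such as `cargo test`.
#[cfg(test)]
mod page_size_tests {
    use super::*;

    #[test]
    fn caps_large_requests() {
        assert_eq!(clamp_page_size(500), 100);
    }

    #[test]
    fn raises_zero_to_one() {
        assert_eq!(clamp_page_size(0), 1);
    }
}
```

The `#[cfg(test)]` attribute keeps the test module out of release builds, so the tests sit beside the code without adding to the shipped binary.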

Ciulla sees that as part of why Rust’s reputation can lag behind its current reality. The language still has a learning curve, especially around ownership, lifetimes, and the borrow checker, but the ecosystem around the language has become more integrated rather than more fragmented. For experienced developers willing to learn Rust on its own terms, that changes the adoption equation.

Start the decision with the service

Ciulla’s adoption advice was consistent across our interview. Do not rewrite everything in Rust. Do not introduce it because the internet says it is the future. Do not make the language the strategy. Start with one service.

The right starting point, in his view, is a critical service with a real performance, latency, or memory problem. It might be a login service. It might be an API under heavy load. It might be a component that slows the rest of the system down. The point is to choose the service where Rust’s advantages are tied to an actual constraint rather than a general preference.

That is also where organizational readiness becomes unavoidable. If a team has no one who understands Rust well enough to review AI-generated code or validate a production deployment, the language can become a liability. “Who decides if the AI-generated Rust service is okay to put in production?” Ciulla asks. “You need the validation of an expert.”

That expert does not make Rust special. It makes Rust normal. Every new technology needs someone inside the organization who can tell the difference between code that compiles and code that should be shipped. Without that person, the team is not adopting a better tool. It is creating a new category of production risk.

For teams that do have the right service and the right internal expertise, Ciulla thinks the timing has changed. Rust is no longer only a systems language that backend teams admire from a distance. It is becoming a practical choice for the services where its properties matter most. The mistake is treating that as a universal recommendation. The opportunity is knowing exactly where it applies.

The Rust versus Go decision, then, is not really a Rust versus Go decision at all. It is a question about constraints. If the service needs simplicity, staffing depth, cloud tooling familiarity, and quick delivery, Go may be the better engineering choice. If the service needs predictable latency, low memory overhead, strong runtime control, and a deployment artifact that maps cleanly to containers, Rust deserves serious consideration. The senior engineering move is not to pick a side. It is to know which job each language is being asked to do.

Go deeper with Francesco Ciulla’s book

The Rust Programming Handbook: An End-to-End Guide to Mastering Rust Fundamentals is a practical guide to Rust’s ownership model, memory safety guarantees, concurrency patterns, trait system, and real-world use in systems and web programming.

Explore the book here

🛠️ Tool of the Week

cargo-nextest — a next-generation test runner for Rust that replaces cargo test with faster, more reliable test execution for production Rust projects.

Highlights

Runs each test in its own process, surfacing hidden state sharing and enabling better parallelism than cargo test’s single-process model.
Retries flaky tests automatically, reducing noise in CI pipelines without manual intervention.
Outputs machine-readable JUnit XML, compatible with most CI systems out of the box.
Version 0.9.137 added a JSON schema for user configuration, standardizing nextest settings across a workspace.
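
The first highlight is easiest to picture with a small case of hidden shared state. In this sketch, a made-up `next_id` helper reads a process-wide counter. Inside a single process, whoever calls second observes what the first caller left behind. When every test runs in its own process, each one starts from a fresh counter, so a test that silently depended on running second would fail and expose the dependency.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Process-wide state: shared by every caller in the same process.
static NEXT_ID: AtomicU32 = AtomicU32::new(1);

/// Hypothetical ID generator backed by the global counter.
fn next_id() -> u32 {
    NEXT_ID.fetch_add(1, Ordering::SeqCst)
}

fn main() {
    // Within one process, the second call sees the first call's effect.
    let first = next_id();
    let second = next_id();
    println!("first = {}, second = {}", first, second);
}
```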

Learn more about cargo-nextest

📎 Tech Briefs

Docker Desktop 4.77.0 released - Marketplace extensions now install by pinned manifest digest, reducing tag-mutation risk after publication.
Rust 1.96.0 released - New Copy-compatible range types, assert_matches! macro stabilization, and two Cargo security fixes for third-party registry users.
Go 1.26.4 released - Security fixes ship for crypto/x509, mime, and net/textproto alongside compiler and runtime bug fixes.
Docker Desktop security update - Two container-to-host code execution CVEs in the Model Runner inference backend patched, with a Linux kernel backport fixing a container privilege escalation.
rust-analyzer update - New diagnostics, predicate evaluation, and completion fixes improve daily Rust IDE feedback loops.

That’s all for today. Thank you for reading this issue of Deep Engineering.

We’ll be back next week with more expert-led content.

Keep building,

Saqib Jan

Editor-in-Chief, Deep Engineering

If your company wants to reach senior developers, software engineers, and technical decision-makers, speak to us about partnering with Deep Engineering.

Try Rust With Your Own Hands and Eyes with Francesco Ciulla

Saqib Jan — Thu, 11 Jun 2026 10:33:12 GMT

Francesco Ciulla has been building with Rust since 2022, working across web development, developer tooling, and content creation for a large technical audience online. He has been a Docker Captain since 2021, is a former full-stack developer at the European Space Agency on the Copernicus project, and is currently head of developer relations at Zerops.

He is the author of The Rust Programming Handbook, published by Packt in December 2025. Francesco joined Deep Engineering Live to talk about Rust adoption strategy, organizational challenges, concurrency, deployment workflows, and where the language is headed in 2026.

You can read or watch the full conversation here:

This session was recorded live as part of the Deep Engineering Live Interview Series. The transcript below has been lightly edited for clarity and readability. Audience members joined the conversation and asked questions directly during the session.

Q. What does Rust adoption actually look like at the organizational level, and what does success look like for engineering teams introducing it?

Francesco Ciulla: Rust has been growing a lot in the past few years and I am glad I started learning it a bit earlier than most people. I started creating content in 2022 and 2023 and then began working on the Rust Programming Handbook around April 2023, which took about two years to publish.

On adoption at scale, there is the famous meme about rewriting everything in Rust, and like every good meme there is a bit of truth in it. But I think the best approach, from a practical perspective, is not to rewrite everything in Rust at the beginning. The best way to introduce Rust in a big project is to find the hard part that is slowing things down, the bottleneck of your services, and try to write one single service in Rust. That is the best way to approach it. And then you will probably see Rust slowly take over more of your codebase, but I mean that in a good sense.

Q. Amazon and other large organizations have noted the high cost and risk of adopting Rust without internal expertise, and the talent pool is also quite thin. What advice do you give engineering leaders planning to introduce Rust and acquire the right people?

Francesco Ciulla: As with every new technology, the problem is not the technology itself. It is how well the technology is understood by the people in the organization. I remember when I was working at the European Space Agency and Docker adoption was slow, not because of anything wrong with Docker, but because a new technology that is not well known internally creates friction. That is the bottleneck.

The best approach is to have a shepherd, someone who can bring real knowledge into the organization. Basically a senior Rust developer who already knows all the flows and who people can refer to when they get stuck. This is especially true in the AI era where everyone is writing code with AI assistance, but you still need validation. Who decides whether the AI-generated Rust service is safe to put in production? You need the validation of an expert. That said, this is not just a Rust rule. It is the golden rule of adopting any new technology.

Q. The Rust learning curve has a reputation for being steep. The borrow checker in particular causes a period of deep soul searching for newcomers. What is the best way for an experienced developer to learn to think in Rust, especially in an AI-accelerated world?

Francesco Ciulla: I actually gave a talk at Rust Nation UK recently with the deliberately provocative title Rust Is Hard to Learn, so feel free to fight me on this. Because I think the idea that Rust has a steep learning curve is more of a myth than a reality, and I believe it can be addressed quite quickly.

The biggest challenge is not the concepts themselves but the mindset. Rust has a unique way of handling memory, and even if you are a senior developer with 20 years of experience, if you try to learn Rust by comparing it directly to other programming languages, you will struggle. The more experience you have, the more you think you already know how things work. Fighting the borrow checker, understanding lifetimes and ownership, can feel overwhelming if you bring that baggage with you. But if you are open-minded and approach it as something genuinely new rather than as a variation on what you already know, you will get the full power of the language and understand why so many people are enthusiastic about it.

Rust has been voted the most loved programming language year after year, and loved means the people who have used it still want to use it. That is a meaningful metric. I would rather trust the judgment of people who have actually worked with Rust than form an opinion based on what someone said on Twitter. As an engineer, I prefer to try things with my own hands and my own eyes.

Q. When is Rust not the right choice for a team, despite all its advantages?

Francesco Ciulla: I should probably say never, because I am supposed to be biased for Rust. But I prefer to be honest. There are genuinely cases where Rust is not the ideal tool.

First, when you need something simple and it needs to be done immediately. If you are a junior developer who needs to deliver a working API today and you do not know Rust, this is not the moment to start learning it under deadline pressure. You can always refactor something later. When you need something that just works, go with the technology you already know well.

Second, the ecosystem argument is real. Python has better libraries for data science. JavaScript has a larger package ecosystem for certain kinds of web work. Rust integrates well with other languages, but if you need something that is native to another ecosystem, that is a real constraint rather than just a preference. Good engineers use the right tool for the problem, and the case for Rust is strongest when the problem involves performance, memory efficiency, or concurrency at a level where other languages start showing their limits.

Q. Google’s Android security team reported that memory safety vulnerabilities fell below 20 percent of total vulnerabilities after the team prioritized memory-safe languages. Are those kinds of productivity and code quality benefits common in practice when teams use Rust?

Francesco Ciulla: Yes, they are, but I think productivity in Rust is not primarily about typing speed. Rust can feel verbose and you might write more slowly in some ways than in other languages. The biggest benefit is how little deep debugging you end up doing. You spend more time thinking carefully upfront, but you spend almost zero time chasing segfaults or memory leaks in production. And we always underestimate that part.

We talk a lot about how efficiently we can write code, but if you need less time to debug your code you are effectively writing more logic per unit of time. That is the part people consistently underestimate. We only tend to count the time it takes to write the thing, not the time it takes to find and fix what breaks later.

I also think Rust is one of the best programming languages for working with AI-generated code specifically because of what it requires from you as the reviewer. After the AI generates code you still need to touch it, understand it, and validate it. Otherwise there is no difference from copying boilerplate off Stack Overflow. You still have to understand what the code does. If you have no control, either you are useless or you cause a problem. In both cases, that is not a good place to be.

Q. What makes Rust’s approach to concurrency different, and how does it actually help teams building multi-threaded systems?

Francesco Ciulla: The first time I learned concurrency was at university in Java, and it was treated as the final, advanced session of the course, something dangerous that required extra care and specialized knowledge. I think many engineers carry that experience as a kind of trauma around concurrency.

When I went to teach concurrency in Rust for a YouTube video, I expected it to be challenging. I started the example and in two minutes I was done. I had the opposite problem. It was too easy.

This is structural rather than accidental. Rust was created when multi-core processors were already standard. Concurrency was not retrofitted onto a model designed for single-threaded execution. It was built in from the start. And the ownership system that prevents data races at compile time is the same ownership system that governs memory safety everywhere else in the language. There is no separate concurrency model to learn. The properties that make Rust memory safe are the same properties that make concurrent code safe.
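
A small example shows what that looks like in practice. The sketch below shares a counter across threads, and the counter has to be wrapped in `Arc<Mutex<_>>` for the program to compile, because the compiler rejects code that would let several threads mutate it without synchronization. The function name and the thread and iteration counts are arbitrary choices for illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increment a shared counter from `threads` threads, `per_thread` times each.
fn parallel_count(threads: usize, per_thread: usize) -> usize {
    // Arc gives shared ownership across threads; Mutex guards mutation.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            // `move` transfers this clone into the new thread.
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    // Wait for every thread to finish before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("total = {}", parallel_count(4, 1_000));
}
```

Swapping the `Arc<Mutex<usize>>` for a plain `usize` that each closure tries to mutate does not compile, which is the compile-time check on data races described in the answer above.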

That said, I personally prefer to add concurrency once an application is already working and you want to use memory more efficiently. If doing that takes a day of extra effort and it reduces your server costs from a hundred dollars a month to twenty, it was worth it. If it takes three weeks of fighting with the runtime, you would probably rather just spend more on the server. We are talking about production-grade applications here, not weekend side projects.

Q. How does Rust’s concurrency model hold up when you are building low-level networking components like proxies, packet processors, or kernel modules, territory that has traditionally belonged to C?

Francesco Ciulla: There are more repositories in C for that kind of work, and that is just a historical fact. But I am already seeing people build protocols and low-level networking components in Rust, and I think with AI assistance this is becoming more practical and more doable than it was even a year ago.

Just the fact that we are seriously considering writing these things in Rust is a win for the language. Nobody ever suggested doing this kind of work in JavaScript, because at that level you need pure efficiency and developer experience is not on the table. Only languages with the right performance profile can even enter this conversation. And Rust’s readability is a genuine advantage in that domain specifically. Low-level code becomes very complex very fast, and the fact that you can read your own code tomorrow is a significant practical benefit when you are dealing with networking protocols. I am not saying Rust will replace C or C++. Languages rarely disappear. But we now have an option, and having that option is already a meaningful shift.

Q. What are the most common pitfalls you see developers run into with Rust?

Francesco Ciulla: I wrote an eighteen-page chapter on pitfalls in the book, so I can probably remember about half of them now. The biggest one is trying to write Rust as though it were another language. If you approach it with the patterns and assumptions you bring from other languages, you will have problems. The second is not trusting the compiler. Especially at the beginning, the instinct is to try to fix things your own way without reading what the compiler is actually telling you. The compiler is giving you very specific information and the errors are basically tutorials. They are helping you write better code. Learning to read them carefully rather than working around them is probably the single most important habit to develop early.

Q. Rust’s trait system and rich type semantics let developers encode invariants directly in the type system, but this power can also lead to very complex code. How should an experienced architect balance using the advanced type system versus keeping things simple?

Francesco Ciulla: If you want to build rockets, you need to use more complex tools. That is just the nature of it. I think the best approach is to build a solid foundation in the basics first. You should not be using traits without understanding what they are. In terms of code organization, Rust is the best language I have worked in for how it structures modules, files, and folders. At some point you will stop writing everything in a single file, and having a clear module structure helps you manage complexity as it grows.

You can also write straightforward Rust code up to a certain point. If you want to unlock the full potential, including unsafe Rust and raw performance, Rust allows you to remove the seat belts. But of course the code becomes more complex at that level. This is not unique to Rust though. The complexity of any project is always exponential. It starts simple, simple, simple, and then suddenly nobody knows what is going on anymore. Having solid fundamentals gives you the tools to manage that curve.
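
At the simple end of that spectrum, "encoding invariants in the type system" can be as small as a wrapper type with a checked constructor. The `Percentage` type below is a hypothetical illustration, not an example taken from the book.

```rust
/// A value guaranteed to be in 0..=100 once constructed.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Percentage(u8);

impl Percentage {
    /// Intended as the only constructor: the field is private, so code in
    /// other modules must go through `new`, which rejects out-of-range input.
    fn new(value: u8) -> Option<Percentage> {
        if value <= 100 {
            Some(Percentage(value))
        } else {
            None
        }
    }

    fn get(self) -> u8 {
        self.0
    }
}

/// Takes a Percentage rather than a raw u8, so the subtraction cannot underflow.
fn remaining(done: Percentage) -> u8 {
    100 - done.get()
}

fn main() {
    match Percentage::new(30) {
        Some(p) => println!("{}% remaining", remaining(p)),
        None => println!("invalid percentage"),
    }
    println!("{:?}", Percentage::new(130));
}
```

The range check happens once, at construction, and every function that accepts a `Percentage` can rely on it. That is the basic pattern; the more advanced trait-based designs mentioned in the question build on the same idea.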

Q. Rust versus Go for microservices is a question many backend teams face right now. Is Rust ready to challenge Go in the web backend space?

Francesco Ciulla: Let me be clear first that Go is not a bad language. Docker is written in Go and I would never say that makes Go bad. I have a Docker bottle on my desk and I have been a Docker Captain since 2021. I also currently work at Zerops where we have a CLI written in Go and I read a lot of Go code myself. So I am not dismissing it.

On pure performance, the comparison between Rust and Go is not really a contest. Check the benchmarks yourself and send me the link where Go beats Rust. Sometimes they are at the same level. In terms of pure performance there is no story.

Where Go genuinely wins is on CLIs, cloud tooling, and developer ecosystem. Docker is written in Go, Kubernetes is built on Go. If you want to be in the DevOps space and write tools in that ecosystem, Go is the natural choice. It also has more engineers available in the job market right now, which matters if you are hiring.

From a developer perspective though, I would argue the opposite is worth considering. If there are fewer Rust engineers than Go engineers, being an expert in Rust means less competition. I would rather be an expert in a language where there is less competition than be one of millions of Go developers. But I understand that a company hiring today will probably find a good Go developer faster than a good Rust developer. Both arguments are valid depending on which side of that conversation you are sitting on.

Q. How does Rust affect build and deployment workflows in practice?

Francesco Ciulla: This is one of my favorite questions because it combines two of my favorite things. When you build a Rust application, you get an architecture-specific executable binary. If you run cargo build on your machine, you get an .exe on Windows, a macOS executable on a Mac, and a Linux executable on Linux. You can also cross-compile, changing the target architecture through the compiler, to produce a Linux binary on a Windows machine.

My preferred workflow for production is to build the Rust binary directly when building the Docker image. That way I have a Linux executable compiled inside the container, ready to run everywhere Docker is installed. The dream for operations teams is having a single lightweight binary inside a Docker container. You get the portability and scalability of containers with the minimal footprint of a Rust binary.

Q. Are there real operational benefits to shipping Rust services as smaller container images with lower runtime overhead?

Francesco Ciulla: Absolutely. The binary is the dream for operations teams because it is the most lightweight option. I mentioned earlier that my Rust web server uses four megabytes of RAM in development and five in production. On a one-gigabyte droplet you could theoretically run more than 200 such services in idle. That kind of resource profile changes what is economically viable to deploy.

You could in theory run the Rust executable directly without Docker, and for a single service that works fine. But if you need to orchestrate ten services on the same machine, containers are still the right answer for production-grade applications that need to scale. The slight overhead Docker adds is worth it many times over in terms of scalability, replaceability, and operational consistency. We are not in the 90s anymore.

Q. What use cases in web development do you see Rust excelling at?

Francesco Ciulla: When performance is really important, Rust is where it shines. If performance is not your main concern, you can go with more conventional choices. I am still a fan of Node.js for certain kinds of work.

One of the most underappreciated arguments for Rust in web services is flat latency. Languages with garbage collectors, including Go, Java, and Node.js, introduce periodic pauses when the collector runs. Those pauses can last hundreds of milliseconds. An HTTP request that arrives during a GC cycle gets a worse experience than one that does not. By not having a garbage collector on the backend side, you have flat latency. You do not rely on luck or on the user not being the unlucky one. That problem is simply removed.

The other scenario where Rust makes a clear case is when you have one service in your system that is significantly slower than everything else. There is always one in any sufficiently complex system. Sometimes it is not the service itself but the upstream dependencies it is calling. But when the service itself is the bottleneck, spending time to optimize it in Rust makes sense. Writing the whole application in Rust from scratch is only necessary if you are starting a new project or you genuinely want to have fun with the language. For most teams, the bottleneck service is where to start.

Q. What are the main challenges when integrating Rust into CI/CD pipelines and monitoring?

Francesco Ciulla: The honest answer is that since Rust has been in production for less time than other languages, there are fewer examples in the documentation for some specific integrations. This gap is closing quickly and will probably disappear in a couple of years, but if you are using a specific technology and looking for a Rust integration example, you may occasionally find the documentation lacking compared to what exists for JavaScript or Python.

The Rust toolchain itself is actually one of the language’s strongest points once you get used to it. Running tests is cargo test. It is integrated natively into the language and there is no equivalent of the npm versus yarn versus pnpm decision that JavaScript teams have to navigate before writing a single line of code. The ecosystem and toolchain are famous within the Rust community for being one of the things people love most about working in the language.

Q. The Linux kernel maintainers have declared Rust permanent and are planning components that require it. What does that endorsement signal about where Rust is headed?

Francesco Ciulla: It is great news, and very bad news for Rust skeptics. The fact is that Rust is slowly getting adopted at bigger and bigger levels. You can see this on the government side, in military applications, and in security-critical domains where safety requirements are the highest. Just the fact that Rust was considered a viable option for kernel-level work was already a meaningful milestone, even before it succeeded. The language was competing in a domain that had been exclusively C and C++ territory for decades and it earned a permanent place there.

I will be genuinely happy when we stop having the conversation about whether Rust should be used, and we just start using it. Python does not get these conversations. Nobody asks whether you should use Python. I will be happy when Rust reaches that level of acceptance as a normal production choice. We are moving in that direction. Every day I see another positive signal.

Q. Where do you see the next phase of Rust growth happening?

Francesco Ciulla: I think the biggest shift is coming in web development backends, and I know this is an unpopular opinion in the Rust community where the language is traditionally associated with systems programming. But I am seeing companies with hundreds of developers reach out to tell me they are rewriting their backend services in Rust. These are not random side projects. These are companies making deliberate production decisions.

Two years ago I would not have committed to building a paid SaaS product in Rust. In 2024, probably not. In 2025, maybe. In 2026, yes, I would use it. The Axum framework in particular has matured to the point where I am confident recommending it for production. That was not true a year ago.

In the embedded space, Rust is already winning and I have largely stopped advocating for it there because the argument is settled. I am also seeing companies that manufacture embedded devices ship them with Rust as the default, which is a different story from developers experimenting with Rust on embedded hardware. When the producers of these devices choose Rust before selling them, that is a commercial signal.

A member of the audience asked: With the state of the industry right now, is it challenging for a junior developer to start their journey with Rust and find their first job?

Francesco Ciulla: With the state of the industry right now, I think it is challenging for a junior developer to find a job regardless of which language they know. So let us remove Rust from that equation first.

You have two approaches here. One is to go as mainstream as possible, learn the most used framework, and compete for the highest volume of jobs. The problem with that approach is that you are also competing with the largest number of other candidates. I personally prefer the opposite approach. Since there are fewer Rust engineers than engineers in most other languages, being an expert in Rust gives you a real differentiator.

If a job description lists JavaScript, React, SQL, Docker, Kubernetes, and then also mentions Rust, and there are two candidates and one of them knows Rust, that extra knowledge might be the thing that gets you the role. That is my honest view. The era of becoming strong in exactly one technology and finding a job with that alone is probably over. We need to be flexible. But dedicating some time to understanding the basics of Rust might make you shine in an interview in a way that knowing only mainstream technologies will not.

Francesco Ciulla is the author of The Rust Programming Handbook, published by Packt, and head of developer relations at Zerops.

Deep Engineering #50: Brian Allbee on Building Better Python Software

Saqib Jan — Thu, 04 Jun 2026 15:11:15 GMT

Claude Code for Software Engineering

🗓️ Friday, June 20 · 10:30 AM EDT onwards

Use code DEEPENG50 for 50% off.

✍️ From the editor’s desk,

Welcome to the 50th issue of Deep Engineering!

Anthropic expanded Project Glasswing on June 2, extending Claude Mythos Preview to approximately 150 new organizations for codebase vulnerability scanning, after initial partners found more than 10,000 high-severity security flaws in production code.

AI can now find vulnerabilities in production systems that were presumably tested, reviewed, and shipped by engineering teams. The gap between code that compiles and code that holds up under real-world pressure is no longer theoretical.

That gap also has a cause. Brian Allbee, Staff Software Engineer at Cleerly and author of Hands-On Software Engineering with Python (Packt), argues that programming focuses on the correctness of the code itself, while software engineering expands that focus to sustainability as change occurs. Too many developers optimize for code that works today, without enough attention to whether that code can be changed, tested, maintained, and handed off tomorrow.

Allbee joined Deep Engineering Live to discuss what closing that gap looks like in practice. Today’s expert insights are based on that conversation, and you can read or watch the full Q&A here.

Let’s get started.

Featured: All the dev content that matters, in one personalized feed

daily.dev is a professional network for developers, built around a personalized feed of the best content from across the dev ecosystem. Millions of developers use it to stay current with their stack, discover new tools and frameworks, and connect with a global community that shares what they’re learning.

Join for free at daily.dev

Expert Insights

Building Better Python Software Is Not About Writing Better Code

by Saqib Jan with Brian Allbee

Most Python developers measure their work by whether the code runs, assuming the job is done once the function returns the right value, the tests pass, and the build is green. Brian Allbee, Staff Software Engineer at Cleerly and author of Hands-On Software Engineering with Python (Packt), thinks that measure is correct but incomplete, noting that the gap between correct code and sustainable software is where most Python developers stop growing without realizing it.

The distinction Allbee draws is precise. Programming, he explained in our live interview, is focused on “the correctness of the code itself,” whereas software engineering “starts expanding out into more of a focus on sustainability as change occurs.” That shift in focus sounds subtle, but it changes almost every decision an engineer makes, from how they structure a module and handle a growing codebase to how they talk about technical debt with the people who control the roadmap.

The discipline that architecture cannot replace

The instinct when a Python codebase starts to grow is to reach for architecture by breaking things into services, introducing abstractions, and redesigning the data model. Allbee’s experience points in a different direction. “I think most of the paths to success in that context, at least the ones that I can think of that I’ve seen, don’t really start with the architecture, but with discipline behind the process,” he shares.

The discipline he describes is specific and unglamorous: keep things as simple as possible and wrap repeated processes into functions or methods. Teams should agree on documentation and code standards and stick to them until something comes up that the standards do not cover, then revise them as needed. Developers should also write code with testability in mind from the beginning, even when there is no immediate requirement for tests. These practices do not require a new framework or a redesign. They rely on consistency, which is often much harder to achieve than architecture.

The reason discipline comes before architecture is that architecture without discipline produces complexity without clarity. In our interview, Allbee shared a vivid example from his own experience: a system written in Python by an engineer who came from a C# background, in which every function and every class had its own isolated module. The functional layers of the system were seven or eight deep depending on the context, creating a project that was “ridiculously huge,” he recalls, and “way more complicated than it needed to be, and it was hard to manage... hard to maintain.”

The problem was not the language or incompetence, but rather a mental model built for a different environment being applied wholesale to Python. Allbee points to a concept from the book Code That Fits In Your Head to explain why this matters, because humans can only keep “five to seven bits of information in the front of their memory at a given point in time.” A system with seven layers of depth saturates that capacity before a developer has even started reasoning about what any individual layer does.

Allbee argues that developers must “keep it simple” and “collapse things down to the point where you don’t have to have 19 different classes and 15 different instances of, you know, all these other classes to deal with something that really should be capable of being managed as a single function.”
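To make that collapse concrete, here is a minimal sketch with invented names (none of this comes from the system Allbee described). The first half spreads an order total across three classes. The second half does the same work in one function.

```python
# Over-layered version: three classes to multiply, sum, and apply tax.
# All names here are hypothetical and exist only for illustration.
class LineItemPricer:
    def price(self, unit_price: float, quantity: int) -> float:
        return unit_price * quantity


class SubtotalAggregator:
    def __init__(self, pricer: LineItemPricer) -> None:
        self._pricer = pricer

    def aggregate(self, items: list[tuple[float, int]]) -> float:
        return sum(self._pricer.price(price, qty) for price, qty in items)


class TaxedTotalService:
    def __init__(self, aggregator: SubtotalAggregator, tax_rate: float) -> None:
        self._aggregator = aggregator
        self._tax_rate = tax_rate

    def total(self, items: list[tuple[float, int]]) -> float:
        return self._aggregator.aggregate(items) * (1 + self._tax_rate)


# Collapsed version: the same behavior as a single function.
def order_total(items: list[tuple[float, int]], tax_rate: float) -> float:
    """Return the taxed total for (unit_price, quantity) pairs."""
    subtotal = sum(price * qty for price, qty in items)
    return subtotal * (1 + tax_rate)
```

Both versions return the same number for the same inputs. The difference is how many names a reader has to hold in mind before reaching the arithmetic.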

Technical debt is a product decision, not a technical one

One of the more practically useful things Allbee explains about managing an evolving Python system is that technical debt is not primarily a technical problem. “Technical debt is one of those product-level priorities,” he reasons, adding that “whoever’s making the prioritization decisions is going to be in control of when those get tackled, if they get tackled.”

That framing shifts where an engineer should focus their energy when technical debt is accumulating, meaning the work is not just to identify the debt but to communicate its consequences clearly enough that the people controlling the roadmap can make an informed decision. “Making sure that you can communicate effectively, here’s what the impact of this technical debt is to your product-level people or whoever’s making those decisions, is gonna be a key thing,” he adds. That requires being able to sit down and say clearly that if the team does not deal with a bug, it is going to lead to cascading issues, and the longer they put it off, the more likely it is to lead to a really significant problem that will take even longer to get past.

The teams that handle technical debt well, in Allbee’s experience, are the ones that treat it as a first-class concern rather than an emergency. The difference between those two approaches is almost entirely about communication, where debt that gets communicated early and framed in terms of product risk gets prioritized, while debt that surfaces as a crisis gets managed badly.

Testing is a design decision

The most common framing of testing in Python projects is that it is something you add to code that already exists, but Allbee’s position is that testability is a property of the code itself, meaning that designing for it from the beginning changes the shape of the code in ways that make it easier to understand, change, and hand off to other engineers.

His testing approach for his own projects is a method that “exercises valid and invalid inputs for all of the parameters of every callable in the project.” He shares, “You combine that with judicious monitoring of missing lines and a code coverage report, that has served really well for me in making sure that the targets of those tests are being both thoroughly and realistically exercised.” The more important principle underneath the practice is that tests are most valuable when they reflect how the system is actually used, not just how the code is structured.
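One way to read that approach, sketched with the standard library's unittest. The clamp function is a hypothetical target invented for this example, and the specific bad values are assumptions about what counts as invalid.

```python
import unittest


def clamp(value: float, low: float, high: float) -> float:
    """Hypothetical target: restrict value to the range [low, high]."""
    for name, arg in (("value", value), ("low", low), ("high", high)):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(arg, bool) or not isinstance(arg, (int, float)):
            raise TypeError(f"{name} must be a number, got {type(arg).__name__}")
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


class TestClamp(unittest.TestCase):
    def test_valid_inputs(self):
        cases = [((5, 0, 10), 5), ((-1, 0, 10), 0), ((11, 0, 10), 10)]
        for args, expected in cases:
            with self.subTest(args=args):
                self.assertEqual(clamp(*args), expected)

    def test_invalid_types_for_each_parameter(self):
        good = {"value": 5, "low": 0, "high": 10}
        # Swap one bad value at a time into every parameter.
        for param in good:
            for bad in ("5", None, [1], True):
                with self.subTest(param=param, bad=bad):
                    with self.assertRaises(TypeError):
                        clamp(**{**good, param: bad})

    def test_invalid_range(self):
        with self.assertRaises(ValueError):
            clamp(5, 10, 0)
```

Running the suite under coverage.py (coverage run -m unittest, then coverage report -m) lists the line numbers no test reached, which is the "missing lines" signal Allbee mentions.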

In team contexts, Allbee advocates for explicit agreement about how tests are organized and what tools are involved. “I’ve seen what happens when different engineers who aren’t communicating with each other each go their own way,” he points out, noting that “the tests that result, even if they’re rigorous and well thought out, are oftentimes difficult to follow across different test modules.” The investment in agreement upfront produces a test suite that the whole team can confidently read, maintain, and extend.

On AI-generated code in testing contexts, Allbee recommends defining a test suite that only humans are permitted to modify, making it as rigorous and complete as possible, and then allowing AI to generate implementation code that must pass that suite. He explains the boundary by stating that you can tell the AI to “write all the code you want,” but it “must pass this test suite” and it does “not get to modify that test suite.” That boundary, he reasons, provides about as much coverage as can realistically be achieved when AI is involved in production code.
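Allbee does not prescribe a mechanism for enforcing that boundary. One possible sketch, assuming the human-approved tests live in a directory and a team member records their hashes in a manifest file (both the layout and the function names are hypothetical):

```python
import hashlib
import json
from pathlib import Path


def fingerprint(test_dir: Path) -> dict[str, str]:
    """Map each test file's relative path to the SHA-256 of its contents."""
    return {
        path.relative_to(test_dir).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(test_dir.rglob("*.py"))
    }


def write_manifest(test_dir: Path, manifest: Path) -> None:
    """Run by a human after reviewing the suite: record the approved state."""
    manifest.write_text(json.dumps(fingerprint(test_dir), indent=2, sort_keys=True))


def changed_tests(test_dir: Path, manifest: Path) -> list[str]:
    """Return test files that were added, removed, or edited since approval."""
    approved = json.loads(manifest.read_text())
    current = fingerprint(test_dir)
    return sorted(
        name
        for name in approved.keys() | current.keys()
        if approved.get(name) != current.get(name)
    )
```

A pipeline could fail the build whenever changed_tests returns a non-empty list on a branch produced by a code generator. The manifest itself would need the same protection, for example through repository permissions.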


Concurrency is a design problem first

Python’s performance limitations and its Global Interpreter Lock have been a recurring concern for engineers building high-throughput systems, and CPython’s free-threaded build has stirred interest in what Python might make possible beyond the GIL. Allbee is measured about expectations, highlighting that most Python code is IO-bound rather than CPU-bound, and that CPU-bound work is where the GIL has its most significant impact. He is nonetheless hopeful that the free-threaded model will open doors for more CPU-bound work to be written in Python.

The framing that matters most to many developers is not about the runtime at all. “Concurrency is a design problem before it’s a runtime problem,” he underscores, adding that having better concurrency support in the language really does not eliminate the need to understand how your processes are going to contend against each other, how to deal with data ownership at the scope of the code, or how failures can happen. His practical advice on concurrency reflects this directly by recommending that developers add it sparingly and only when there is an actual benefit that outweighs the overhead of handling errors, data contention, and coordination costs. “Optimize your clarity and correctness first,” he recommends, and “really only reach for concurrency when you understand where the time is actually being spent.”
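A small sketch of "measure first, then add concurrency," using only the standard library. The fetch and transform functions are stand-ins invented for this example, and the sleep simulates an IO wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(item: int) -> int:
    """Stand-in for an IO-bound call; sleeps instead of waiting on a network."""
    time.sleep(0.05)
    return item


def transform(item: int) -> int:
    """Stand-in for a cheap in-memory step."""
    return item * item


def measure(func, items):
    """Run func over items sequentially and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [func(item) for item in items]
    return results, time.perf_counter() - start


def fetch_concurrently(items, workers: int = 4):
    """Apply threads only to the step the measurement showed to be slow."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order.
        return list(pool.map(fetch, items))
```

Timing each step sequentially shows that the wait is in fetch, so that is the only step that gets a thread pool. The transform step stays a plain loop, with no coordination or error-handling overhead added to it.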

Cloud readiness is designing for volatility

The question of what makes a Python application cloud-ready is one Allbee addresses in terms of design principles rather than tools or platforms. The containerized application is cloud-ready, he acknowledges, but so are function-as-a-service constructs like AWS Lambda functions, which suggests that the specific mechanism matters less than the underlying design orientation.

“The key concept that ties almost every cloud-resident system together, containerization, stateless design, any of those, is that they are inherently disposable,” he explains. A container can be killed at any time, a Lambda invocation could be terminated before it reaches a successful completion, and Kubernetes pod restarts are probably routine events. Designing for that reality means building processes around the expectation that the hardware can disappear at any point in time.

Statelessness in that context is about making failure cheap. There is no state to manage and no need to write code to reacquire that state, meaning a process simply ends and is restarted, making recovery from a failure as simple as starting a new instance. “Statelessness and containerization matter more because they make failure cheap and recovery routine than for any other purpose or reason,” he says, arguing that this principle should sit near the top of the list of factors shaping design decisions for any system built to run in a cloud environment.
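The idea can be sketched in a few lines. The handler below is hypothetical and its event shape is an assumption. It keeps nothing between calls, so recovering from a killed instance means handing the same event to a fresh one.

```python
import hashlib
import json


class InstanceKilled(Exception):
    """Simulates a container or invocation disappearing mid-request."""


def handle(event: dict) -> dict:
    """Hypothetical stateless handler: the output depends only on the event."""
    total = sum(item["quantity"] * item["unit_cents"] for item in event["items"])
    # The same event always produces the same key, so a retry is easy to spot.
    key = hashlib.sha256(json.dumps(event, sort_keys=True).encode()).hexdigest()
    return {"idempotency_key": key, "total_cents": total}


def run_until_complete(event: dict, instances) -> dict:
    """Try instances in turn; there is no state to restore between attempts."""
    for instance in instances:
        try:
            return instance(event)
        except InstanceKilled:
            continue  # recovery is simply starting the next instance
    raise RuntimeError("no instance completed the work")
```

Because handle reads nothing outside its argument, the retry path needs no code to reacquire state.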

What senior engineers actually do

The question of what separates an engineer ready for senior work from one who is not comes back to the same systems-oriented thinking that distinguishes engineering from programming, where the indicator is not technical mastery but curiosity about the system rather than just the isolated function.

“If they started demonstrating that they’re concerned with more than just is the code doing what it’s supposed to do,” he explains, “if there’s a certain amount of curiosity, why are we doing it this way, do they recognize the trade-offs, those are the things that I think start really indicating somebody is actually ready to go beyond just I’ve written this function, and it’s done, and it’s tested, and it works. Done. I’m finished.”

The senior engineers Allbee has tried to emulate and seen do their best work are not defined by the code they write but by the systems they shape and the teams they are enabling. That involves asking questions that guide less senior engineers to ask those same questions on their own, such as why the team is going down a certain road, what the benefits are, and what trade-offs exist. “There are always trade-offs,” he notes, emphasizing, “Always, always, always trade-offs.”

The advice he offers to Python developers trying to grow in an AI-accelerated world collapses to three core principles, which are to “think in systems,” to “design for change,” and to “optimize for your team.” If you come away thinking differently about why you write the code that you are writing and not just how, then that is the shift that matters. Since the language, tools, and expectations placed on Python engineers will inevitably keep growing, the engineers who hold up under those pressures are the ones who stopped measuring their work by whether the code runs and started asking what it will take for the system to survive.

Go deeper with Brian Allbee’s book

Brian Allbee explores these ideas in more depth in Hands-On Software Engineering with Python, a practical guide to building Python systems that are easier to test, maintain, evolve, and hand off.

Explore the book here

🛠️ Tool of the Week

Ruff - fast Python linting and formatting for teams trying to keep quality gates enforceable without slowing every commit.

Highlights

Consolidates linting, import sorting, upgrade checks, and formatting behind one configuration surface.
Runs fast enough for pre-commit and CI workflows, which makes quality checks more likely to stay enabled.
Supports monorepos and hierarchical configuration, helping larger teams avoid one-off project rules.
Already used across major Python projects, making it a practical default rather than a niche experiment.

Learn more about Ruff

📎 Tech Briefs

Copilot SDK is now generally available - GitHub made Copilot SDK stable across six languages, letting teams embed agent workflows into internal tools.
Anthropic expands Project Glasswing - Project Glasswing now extends to approximately 150 new organizations, shifting AI vulnerability discovery toward coordinated patching capacity.
Python pip 26.1.2 - Pip 26.1.2 shipped with Trusted Publishing attestations, tightening provenance for standard Python installation workflows across teams.
Using uv in GitLab CI/CD - Astral added GitLab CI guidance for uv images and cache pruning, simplifying reproducible Python pipelines outside GitHub.
Pyright 1.1.410 - Pyright 1.1.410 refreshed the Python wrapper package, keeping CLI and editor type checks aligned automatically.

That’s all for today. Thank you for reading this issue of Deep Engineering.

We’ll be back next week with more expert-led content.

Keep building,

Saqib Jan

Editor-in-Chief, Deep Engineering

If your company is interested in reaching an audience of senior developers, software engineers, and technical decision-makers, you may want to advertise with us.

Hands-On Software Engineering with Python with Brian Allbee

Saqib Jan — Wed, 03 Jun 2026 12:30:00 GMT

Brian Allbee has been writing Python almost exclusively since 2012, working across cloud-based application development, machine learning integration at Dice.com, and backend systems in AWS using Step Functions and Python Lambdas.

Allbee, Staff Software Engineer at Cleerly and author of Hands-On Software Engineering with Python, now in its second edition published by Packt, joined Deep Engineering Live to talk about what separates engineering from programming, how to scale and refactor Python systems responsibly, and what it actually takes to grow into senior and staff-level roles.


Q. Tell us about your background and the kinds of systems you have worked on.

Brian Allbee: I have been programming almost exclusively in Python since early 2012. Prior to that I worked in C# .NET, Flex markup language, and PHP for application development. I landed on Python at a job I started early in 2012 at an ad agency where they needed somebody to come in and build an internal application that was more performant than their off-the-shelf solution for asset management. I fell in love with the language a little before that position started, but I was very happy that a language audit I did reinforced that Python was still the way to go because it had everything they needed.

Since then I have done client-facing cloud management application work, a handful of customer-facing applications I cannot get into too much detail on because they are still covered under NDAs, and the last six years I spent doing machine learning implementation and integration for Dice.com on the team that eventually became their applied data science and AI team. Currently I am doing backend system development in an AWS cloud context with Step Functions and Python Lambdas to deal with health insurance processing.

Q. What distinguishes a true software engineering approach from just programming, particularly for Python developers working on real-world systems?

Brian Allbee: I think learning to think in terms of systems, not just implementations, is probably the main thing. I feel that holds true whether the backing language is Python or not, and it does not stop with just the systems that an engineer is writing. On the technical side it extends out to the entire toolchain, anything that shapes the code itself or determines how the code is managed or handled. But it also extends to what I would call nontechnical systems in the sense of a set of principles or procedures that define how something is done.

I basically feel that programming is really focused on making sure that the code is correct, the correctness of the code itself, whereas software engineering starts expanding out into more of a focus on sustainability as change occurs.

Q. For Python developers aiming to move into senior or staff engineering roles, given how much AI is now part of development workflows, what skills or mindset shifts do they need beyond raw coding proficiency?

Brian Allbee: I think that same systems-oriented thinking is still the big dividing line, and I believe that will hold true even if LLM-based code generation turns out to be the next big thing that all of its proponents argue it will be. Even in those scenarios, manual interaction with code at the level of the syntax of the code itself might dwindle over time, but there are still going to be sensitive domains where some of that remains necessary. More importantly, engineers need to understand how that code fits together even if they did not write it, and why it fits together the way it does.

Hand in hand with that is a broader understanding of the problems being solved. Software engineering, like every other engineering discipline I am aware of, is concerned with solving problems usually while operating within some set of constraints. Software engineering focuses on solving those problems by creating systems, and that goes back to the whole systems-oriented thinking. But solving the problem requires understanding that problem first. Even if code generation becomes largely or completely automated, someone still has to own that system and understand its constraints and its potentials for failure and how it is expected to evolve over time.

Q. What are some of the best practices for updating, refactoring, and scaling an existing Python codebase as it evolves?

Brian Allbee: I think most of the paths to success in that context, at least the ones I can think of that I have seen, do not really start with the architecture but with discipline behind the process. The approaches that have worked for me or that I have seen work well for others include keeping things as simple as possible, wrapping processes that get used over and over again into functions or methods or whatever context works best whenever possible, and not being afraid to use structured data.

If you are working in a team, make sure the team has some agreement about how much in-code documentation and by extension comments are expected and what it should provide. The same kind of team-level agreement about what code standards you want to apply. Stick to those until something comes up that is not covered, and revise them as needed. It is a growing process.

Writing code with an eye towards making it testable is key even if there is not an immediate need for testing. Future you, if you are writing good tests, will come back and thank you if they could.

Q. Can you share an example from your experience of tackling technical debt or redesigning a Python system to improve its maintainability and performance?

Brian Allbee: Honestly, I really cannot. I do not have any dramatic war stories here because I have worked with generally exceptionally healthy teams that treated technical debt as a first-class concern, not as an emergency. Technical debt is one of those product-level priorities. Whoever is making the prioritization decisions is going to be in control of when those get tackled if they get tackled.

If there is significant technical debt, making sure that you can communicate effectively, here is what the impact of this technical debt is to your product-level people or whoever is making those decisions, is going to be a key thing. That means being able to sit down and say, I understand that you do not want us to deal with this bug or whatever it happens to be. If we do not deal with this, it is going to lead to this, then that, and the longer we put that off, the more likely it is to lead to a really significant problem and the longer it is going to take for us to get past that.

Q. Modern teams often grapple with how much upfront system design to do versus driving straight into coding. How do you find the right balance between careful architecture and rapid agile execution?

Brian Allbee: I think understanding the full final scope of a project, even if it is just at a very high level, is critical to one side of that balance. The other side is knowing, again even if just at a high level, what constraints and non-project expectations are in play. You mentioned agile. Even if some form of agile is not part of a team’s day-to-day processes, there are some takeaways from agile that I think can still be beneficial. The entire idea of delivering work and software frequently on some sort of cadence is one of those. Iterating against the smallest deliverable units that can be identified would be another major factor.

I do not think the real risk is too much design or too little. It is designing without understanding the constraints that are involved. Iterating on the smallest meaningful units of work is going to be the most practical way to find that right balance between design and execution.

Q. Python has introduced features like data classes, type hints, and static typing tools in recent versions. How have these modern language features shaped your approach to designing Python software, and how would you recommend engineers fully embrace type hints in large projects?

Brian Allbee: Not as much as it might sound like, actually. Before I turned to Python I was working with C# .NET, which is a statically typed compiled language. I came to really like the idea of static typing from a programming perspective even in dynamically typed languages like Python and JavaScript. If you dig back far enough into some of my very old and now unsupported blogs, you will find I even wrote some blog posts about implementing that sort of a structure in Python back as far as version 2.7.

I definitely recommend using the typing system that is in the language right now. At worst, it is additional documentation that modern IDEs can pick up on to help an engineer working with the code. With the inclusion of just one third-party package, I like typeguard myself but there are others, it is possible to achieve runtime type safety that behaves much like static typing in Python code. And pre-deployment tools like mypy are going to pick up on your type hints to give you some extra quality control going into that process. I think that whether you enforce types at runtime or not, the design clarity is worth it.
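As an illustration of runtime enforcement (an editorial sketch, not code from the interview), the decorator below is a hand-rolled stand-in built on the standard library. It is not typeguard's API, and it only handles plain classes such as str and int.

```python
import functools
import inspect
from typing import get_origin, get_type_hints


def check_types(func):
    """Hand-rolled illustration: validate plain-class annotations at call time."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Skip generics such as list[int]; only plain classes are checked.
            if (
                isinstance(expected, type)
                and get_origin(expected) is None
                and not isinstance(value, expected)
            ):
                raise TypeError(
                    f"{name} must be {expected.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@check_types
def repeat(text: str, times: int) -> str:
    """The annotations document intent even if nothing enforces them."""
    return text * times
```

Without the decorator, repeat(3, 2) would quietly return 6. With it, the call raises TypeError at the boundary, and a static checker such as mypy would flag the same call before the code runs.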

Q. In building robust Python applications, how do you approach data modeling and validation? When does Pydantic make sense and when are simpler options sufficient?

Brian Allbee: I think it depends on the scope and intentions of the project. Pydantic is great for projects where there are complex requirements that can be derived from something like a JSON or OAS schema. It is also good for projects that are responsible for generating those JSON or OAS schemas. The downside is it is a larger package, coming in at around two and a half megabytes or so for the module itself and its primary dependency. So if package size is a concern, that might not be the best choice.

There are other options. The fastjsonschema package combined with regular Python dictionaries and lists is a solid alternative and it is much smaller. If there is no need for any kind of schema documentation but there is still a desire for type checking, type-annotated Python classes will probably get you 80 plus percent of the way there in my experience. If you need mutable data structures and that is all you need, data classes are a good option. If you need an immutable data structure and type checking is not a concern at the level of the code structure, I usually start with something like a named tuple.
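The last two options can be compared side by side with the standard library alone. The Claim and ClaimKey names are hypothetical, chosen only for this sketch.

```python
from dataclasses import dataclass
from typing import NamedTuple


@dataclass
class Claim:
    """Mutable record: fields can be updated as processing moves along."""
    claim_id: str
    amount_cents: int
    status: str = "received"


class ClaimKey(NamedTuple):
    """Immutable record: hashable, so it can serve as a dictionary key."""
    member_id: str
    claim_id: str


claim = Claim(claim_id="C-1", amount_cents=12_500)
claim.status = "approved"  # allowed: data classes are mutable by default

key = ClaimKey(member_id="M-9", claim_id="C-1")
claims_by_key = {key: claim}  # allowed: named tuples are hashable
```

Trying to assign key.claim_id = "C-2" raises AttributeError, which is the immutability guarantee. Neither structure validates field types at runtime, so the annotations act as documentation and as input for static checkers.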

I think the right data modeling tool depends less on popularity and more on the system’s constraints and the scale, scope, and longevity of the project.

Q. What are your go-to best practices for testing Python systems at scale? How do you balance unit, integration, and end-to-end tests, and how do you ensure the test suite stays reliable and useful over time?

Brian Allbee: For my own projects, I tend to like a testing approach that exercises valid and invalid inputs for all of the parameters of every callable in the project. You combine that with judicious monitoring of missing lines in a code coverage report, and that has served really well for me in making sure that the targets of those tests are being both thoroughly and realistically exercised.

In a working environment, I like for the team to come to some sort of consensus about how the tests are organized, what tools are involved, and so on. I have seen what happens when different engineers who are not communicating with each other each go their own way. The tests that result, even if they are rigorous and well thought out, are oftentimes difficult to follow across different test modules because different people write tests in different ways.

When integration or end-to-end testing is not feasible, I try to push unit tests closer to behavioral testing even if that increases mocking complexity. In those kinds of scenarios, unit testing can still go a long way. Ultimately, though, tests are most valuable when they reflect how the system is actually used, whether that is managed at a unit testing, integration testing, or system testing level, not just how that testing code or the code itself is structured.

Q. AI is now being used in areas like test generation, debugging, and even test maintenance. How should engineers think about AI in testing without compromising reliability and confidence in their systems?

Brian Allbee: I think the fundamental important thing there is going to be getting some sort of a consensus from anybody who is involved, all of the stakeholders, and anybody who is going to be held accountable for failures in the code, as to what the guardrails need to be. I like AI from the standpoint of code generation for things that are not sensitive. If it affects somebody’s lives or livelihoods, I do not want randomly generated code out there without really good guardrails.

The approach that I have seen and tried myself that has the most promise is to take a test-driven development approach and define a test suite, and only allow humans to modify that test suite. Make sure it is really good, really solid, really rigorous, and covers all of the business needs, everything that you can come up with. And then you can let an AI process go to town on the code as long as there is a clear boundary there. You can tell it, you can write all the code you want, it must pass this test suite, you do not get to modify that test suite. Let it go to town. At that point I think you are probably about as well covered as you can be.

Q. How should Python teams set up CI/CD pipelines to improve code quality and deployment reliability? What best practices help the most and what pitfalls should be avoided?

Brian Allbee: The goals are all the same for CI/CD regardless of the language involved. You have to fetch the code, you have to test it, you have to build it and package it, and you have to deploy it. The basic sequence is common across the board. There may be additional tasks like checking that the deployment process is well-formed, or aspects that are not tied directly to the code itself.

The main value that CI/CD adds is not necessarily the automation. It is the fast, automated, trustworthy feedback that you get from one of those processes. I would say look for places where you can generate that feedback, find the break points that are going to happen, and make sure that anything that fails gets surfaced in a meaningful, timely, and useful fashion.
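As a rough illustration of that feedback loop (an editorial sketch, not a replacement for a real CI system), the runner below executes stages in order, stops at the first failure, and reports which stage broke and why. The function name and stage format are hypothetical.

```python
import subprocess
import sys


def run_stages(stages: list[tuple[str, list[str]]]) -> tuple[bool, str]:
    """Run (name, command) stages in order and stop at the first failure."""
    for name, command in stages:
        completed = subprocess.run(command, capture_output=True, text=True)
        if completed.returncode != 0:
            # Surface the failing stage, its exit code, and its error output.
            detail = completed.stderr.strip()
            return False, f"stage '{name}' failed (exit {completed.returncode}):\n{detail}"
    return True, "all stages passed"


# Example stage list; the commands are placeholders for real test/build steps.
example_stages = [
    ("test", [sys.executable, "-c", "print('tests would run here')"]),
    ("build", [sys.executable, "-c", "print('packaging would run here')"]),
]
```

Stopping early keeps the feedback fast, and naming the stage in the message makes the failure useful to whoever reads it.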

Q. What makes a Python application cloud-ready, and what are the most important design principles to bear in mind?

Brian Allbee: Cloud-ready can mean different things to different organizations, different teams, or even individual people. A containerized application is cloud-ready provided that it can be deployed appropriately, but so too are function-as-a-service constructs like AWS Lambda functions and their equivalents in other provider spaces. It all depends ultimately on the final deployment expectations.

Some key things to bear in mind include leveraging environment variables to help control behavior in different cloud accounts or environments within those accounts. You will find that they can carry over from local development to deployment processes and build pipelines all the way out to your final deployed product. They can always be replicated and manipulated locally, and that makes things easier and faster to change in a deployed application without having to redeploy an entire stack.
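A minimal sketch of environment-driven configuration follows. The variable names (APP_ENV, LOG_LEVEL, REQUEST_TIMEOUT_SECONDS) and their defaults are assumptions made for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    environment: str
    log_level: str
    request_timeout_seconds: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build settings from environment variables, with local-friendly defaults."""
    source = os.environ if env is None else env
    return Settings(
        environment=source.get("APP_ENV", "local"),
        log_level=source.get("LOG_LEVEL", "INFO").upper(),
        request_timeout_seconds=float(source.get("REQUEST_TIMEOUT_SECONDS", "5")),
    )
```

Passing a plain dictionary as env reproduces any deployed configuration on a laptop or in a test, while the deployed code calls load_settings() with no arguments and reads the real environment.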

Be aware of and actively seek out systems that cloud providers offer for things that you need to deal with. Secret storage is a great example. Pull a secret in one time when a container initializes or a Lambda starts up, and then do not touch it after that. Know the best practices and constraints for your final deployed code. A great example is AWS Lambda functions. You cannot run a Lambda function for more than fifteen minutes, so once you have a good idea of how long a process can take, set that timeout accordingly and test against it.

I think cloud-ready is less about where code runs and more about designing for volatility and external constraints.

Q. How do statelessness and containerization fit into building scalable cloud systems, and why do they matter?

Brian Allbee: If you think about it at a basic structural level, the key concept that ties almost every cloud resident system together, containerization, stateless design, any of those, is that they are inherently disposable. A container can be killed at any time. A Lambda invocation could be terminated before it reaches a successful completion. Kubernetes pods restarting are probably routine events. Even in a serverless context, a virtual machine can be stopped and restarted without warning. Recognizing that means designing your processes around the expectation that your hardware can disappear at any point in time.

Statelessness in that context is in a very real way about making failure of your hardware cheap. There is no state to manage, there is no need to write code to reacquire that state. A process ends and is restarted. Planning for failures and designing around the idea that recovery from a failure is just starting a new instance is probably near the top of the list of factors shaping design decisions.

In a container-based context, the container is at that point the smallest unit to replace. The key factor to keep in mind there is making sure that the startup behavior is consistent and predictable. Ensure that the environments are repeatable and allow a failed container to be replaced automatically and seamlessly rather than relying on any kind of manual troubleshooting process.

Statelessness and containerization matter more because they make failure cheap and recovery routine than for any other purpose or reason. That is what it comes right down to.

Q. A member of the audience asked: How critical is containerization to scaling systems?

Brian Allbee: Containerization is one of the more popular mechanisms but it is not the only mechanism out there. Most of my experience with containerization has been in a cloud-oriented context, and the alternatives in an AWS context at least include things like Lambda functions, which technically are their own containers but you do not have to worry about containerization as one of the factors in your code that you are concerned with. You are literally just writing code to fit inside the context of that Lambda container and letting it go. It is a good skill to have, most definitely, something that is going to be of use and interest in a lot of jobs these days, but I do not know that it is a critical skill for all cases.

Q. A member of the audience asked: Does Flask scale well?

Brian Allbee: The scaling question is so context-dependent that it is really hard to say definitively. In a containerized structure where your data store is completely separated from your Flask environment and application, and you can spin up and drop new instances of containers, I think it scales as well as anything else out there.

You will probably find that FastAPI is going to be more performant, but there is also a lot more work that has to happen in a FastAPI context. Flask is probably about in the middle. It is a good balance between a lot of stuff already supported versus speed of operation. And then at the other end you have something like Django where it does everything for everyone but it is not going to be as performant at an instance-by-instance level. After that, it is really going to end up depending on how well you can spread that load out through load balancing across containers running your application, regardless of whether it is Flask or FastAPI or Django or something completely homebrewed. That is probably where you are going to see the most scalability capability out of all of those options.

Q. What is one trade-off you see Python developers consistently get wrong when building systems?

Brian Allbee: The one I would say is most consistently seen in my experience is going back to the idea of overengineering. The thinking is, "I want to write this as an object-oriented system because object-oriented is the way to go." And the same could be said for functional programming. Understand your problem space and design the solution around that problem space, because that is what you are trying to do: provide a solution for that specific problem space.

The best example in my personal experience was a system that was written in Python by somebody who came from a C# background. The project was ridiculously huge. Every function had its own module. Every class had its own module. When you put everything together, the functional layers of the system were seven or eight deep depending on the context.

It was way more complicated than it needed to be, and it was hard to manage and hard to maintain. If I could have gone back and talked to, let us call him Steve, I probably would have said, Steve, there is this really good book out there called Code That Fits in Your Head. Read that. It is all about keeping things at a manageable level because psychologically humans can only keep five to seven bits of information in the front of their memory at a given point in time. You are talking about seven layers worth of depth in a project structure. That is already saturating things. Keep it simple. Collapse things down to the point where you do not have to have nineteen different classes and fifteen different instances of all these other classes to deal with something that really should be capable of being managed as a single function.

Q. How can you tell when an engineer is ready for senior-level work?

Brian Allbee: It goes back to what we started with. If they start demonstrating that they are concerned with more than just whether the code is doing what it is supposed to do, whether this function works, if there is a certain amount of curiosity about why are we doing it this way, what is the advantage of taking a functional programming approach versus a procedural versus an object-oriented one, what are the trade-offs, and do they recognise the trade-offs. Those are the things that start really indicating somebody is actually ready to go beyond just, I have written this function, it is done, it is tested, it works, I am finished.

The senior engineers that I have tried to emulate and that I have seen do their best work really are not defined by the code that they write but by the systems that they shape and the teams that they are enabling. That involves some gatekeeping, asking why we are going down this particular design path, or why are we not using this brand new library that has just shown up in the last three months. There is a lot of broader scope in asking those questions of less senior engineers and guiding them to learn how to ask those same questions on their own. Why do we go down this road? What is the benefit? What are the trade-offs? Because there are always trade-offs. Always.

Q. What motivated the second edition of Hands-On Software Engineering with Python, and what changed?

Brian Allbee: A good part of it was just the time lag. It was seven years between the first edition and the second. But Python itself has changed significantly, not so much in the language core but in the maturity of its tooling and the breadth of problems that it is now used to solve. The ecosystem around testing, packaging, automation, and deployment has grown in ways that significantly change how Python is used in real-world systems. Its adoption has also expanded dramatically, particularly in cloud and large-scale environments and also in AI, where it is very much a go-to language right now.

Today, Python engineers are frequently expected to think about architecture, performance, testing, and operational concerns in ways that just were not as common when the first edition was written. All of those growth areas and the dramatically increased surface area of use of the language kind of begged for further discussion.

The first edition tells the story of a fictitious company called Handmade Stuff that is just starting to develop an application structure to deal with what they are trying to accomplish as a company. The second edition takes that story forward and says, okay, we have this application and it is functional but less than optimal, and there is a significant impetus from the organization to move into the cloud. So what would that look like? A lot of the principles are still absolutely the same. You still need that system-level thinking, you still need to understand the problem space, you still have to work out how you are going to deploy this. What it looks like is going to be extremely different.

Q. What is the one piece of advice you would give to Python developers trying to grow into stronger engineers in an AI-accelerated world?

Brian Allbee: If you come away thinking differently about why you write the code that you are writing, not just how, then you are moving in the right direction. Engineers are on the hook to develop and ship working code. But I will go back to my basic principles more than anything else. Think in systems. Design for change. If you are working in a team, optimise for your team.

Ask how easily the code could fit into a larger system, how it will change over time, and how your design choices will affect the product later for the people who have to work with it after you are done with it. That mindset shift is a key thing in enabling an engineer to grow into more senior roles and to build software that holds up under the real-world pressures that you are going to run into.

Brian Allbee is a Staff Software Engineer at Cleerly and the author of Hands-On Software Engineering with Python, published by Packt.

Deep Engineering Specials: Enterprise AI has an API problem

Saqib Jan — Tue, 02 Jun 2026 16:16:05 GMT

If you have 30,000 APIs, you probably have 300,000 endpoints across your organization. While that sounds like a problem of scale, it is actually one of design.

With agents discovering and calling APIs at runtime rather than developers hardcoding them at build time, that design problem has become one of the most urgent infrastructure questions in enterprise engineering.

This month’s special issue digs into how APIs built for developers need to become discoverable, understandable, governed, and safe runtime capabilities for agents, with commentary from Erik Wilde, Head of Enterprise Strategy at Jentic; Nandita Giri, Senior Software Engineer at Microsoft; Rohan Gupta, Principal Product Manager at Harness; and Mayank Bhola, Co-Founder and Head of Products at TestMu AI.

Let’s get started.

Special issue — June 2026

Your APIs were built for developers, not agents

“You don’t just look for APIs when you’re writing an app. You kind of look for APIs every time you solve a problem.” — Erik Wilde, Head of Enterprise Strategy at Jentic and OpenAPI Ambassador

Most large enterprises have no idea how many APIs they have. Ask them and the honest answer is usually somewhere between a guess and a shrug. What they do know is that the number is large, the number of endpoints is larger still, and the documentation is somewhere between incomplete and missing. For years that was manageable because the people consuming those APIs could compensate. They had context, experience, and enough judgment to work around the gaps in a poorly written spec or an ambiguous parameter name.

For the better part of two decades, engineering teams designed APIs for a specific kind of consumer, a developer sitting at a keyboard, reading documentation, and making deliberate decisions about which endpoints to call and in what order. That consumer had context, experience, and the judgment to fill in the gaps that a poorly written spec inevitably left open. The API did not need to be perfect because the developer compensated for its imperfections at design time, before a single line of integration code was written.

That assumption breaks down when the consumer is an agent, says Erik Wilde, Head of Enterprise Strategy at Jentic and OpenAPI Ambassador. Agents do not read between the lines, compensate for ambiguous parameter names, or infer the intent behind a generic error response. They act on what the contract says, at runtime, every time they need to solve a problem, and the gap between what most enterprise APIs offer and what agents actually need is where many AI projects begin to fail before they deliver measurable value.

Anthropic’s 2026 State of AI Agents Report found that 46% of engineering teams cite integration with existing systems as their primary challenge when deploying agents, placing it above model capability, prompt quality, and every other factor on the list. The bottleneck is infrastructure, and that infrastructure depends on APIs that were never designed for this kind of consumer.

Masterclass: Building AI-Ready APIs with Agent Skills

Join OpenAPI Ambassadors Erik Wilde and Frank Kilcommins for a hands-on masterclass on building AI-ready APIs with agent skills, covering OpenAPI, Overlay, Arazzo, semantic discovery, deterministic workflows, and governance guardrails for agent-driven integrations.

🗓️ July 1, 2026 · 10:30 AM – 1:30 PM ET · Online

Use code DEEPENG50 for 50% off.

Agents discover APIs at runtime, developers do not

In our live interview, Wilde gave one of the clearest framings for how agent consumption differs from developer consumption. Developers search for APIs when building an application, make a decision, and hardcode the integration so it stays consistent for the lifetime of the application. Agents search for APIs at runtime, every time they encounter a problem they need to solve, against a catalog that may contain hundreds of thousands of options across a large enterprise.

That changes the API problem from documentation quality to runtime selection. The consumer is no longer a skilled person who can fill in the gaps of a poorly written spec. It is a machine that acts on exactly what the contract says, nothing more and nothing less, and it does so without the accumulated context that a developer brings to the integration process.

Wilde illustrated the scale of this problem by sharing a recent experience with a car manufacturer operating roughly 50,000 APIs and 500,000 endpoints across the organization. The point is not that this number is exceptional. For a large enterprise with decades of accumulated systems and services, it is closer to the normal condition than most teams would like to admit. What changes with agents is the cost of that normal condition. When the consumer needs to find the right capability at runtime, the selection problem alone can make the API landscape effectively unusable without a serious restructuring of how capabilities are described, organized, and exposed.

Agents cannot compensate for spec drift

“Think of the API as a contract with a very literal, very curious machine.” — Nandita Giri, Senior Software Engineer at Microsoft

Nandita Giri, Senior Software Engineer at Microsoft with prior engineering experience at Meta and Amazon, works across agentic AI and automation, and the pattern she observes across organizations working to become AI-ready is consistent and predictable. Teams invest in producing a good OpenAPI specification at launch, treat it as a first-class deliverable at the time of release, and then watch the specification and the actual API behavior silently diverge over the following months as the code evolves faster than the documentation does.

For developer-facing APIs, this drift is a manageable nuisance because developers notice the discrepancy, ask questions in Teams, Slack or a GitHub issue, and someone eventually updates the documentation before the next consumer runs into the same problem. But for agent-facing APIs, spec drift is not a nuisance. It is a silent failure mode that is exceptionally difficult to trace because agents have no mechanism for noticing the discrepancy between what the spec says and how the API actually behaves. They act on what the spec says, encounter failures they cannot interpret without the surrounding context that a developer would have, and either produce incorrect results or abandon the task entirely without surfacing a meaningful error to the system that called them.

The only way to stop that drift from compounding, Giri argues, is to treat the specification as a first-class part of the release process on every change. That means CI pipelines that validate spec fidelity against actual runtime behavior before deployment proceeds, operating not as a quarterly audit task but as a gate that blocks release when the spec and the actual behavior have diverged.
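A release gate of this kind could be sketched roughly as follows. This is an illustration under stated assumptions, not Giri's implementation: `find_spec_drift` is a hypothetical helper, the documented fields are assumed to have been extracted from the spec's response schema already, and the observed response is assumed to come from a call against a staging deployment.

```python
def find_spec_drift(documented_properties, observed_response):
    """Return human-readable differences between a spec and a live response.

    documented_properties: mapping of field name -> expected Python type,
    assumed to be derived from the spec's response schema.
    observed_response: the decoded JSON body captured from the running API.
    """
    problems = []
    for name, expected_type in documented_properties.items():
        if name not in observed_response:
            problems.append(f"documented field missing from response: {name}")
        elif not isinstance(observed_response[name], expected_type):
            problems.append(f"type mismatch for field: {name}")
    for name in observed_response:
        if name not in documented_properties:
            problems.append(f"undocumented field in response: {name}")
    return problems


# Hypothetical example data: the code now returns a string status and a new field.
documented = {"id": str, "status": int}
observed = {"id": "A-17", "status": "open", "priority": 2}

drift = find_spec_drift(documented, observed)
# A CI job would exit non-zero here, blocking the release until spec and code agree.
release_blocked = len(drift) > 0
```

In practice a dedicated contract-testing tool would usually do this comparison. The sketch only shows the shape of the gate: any divergence produces a non-empty list, and a non-empty list stops the deployment.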

Giri is equally specific about what good specifications actually require for agent consumption, and her examples are concrete enough to apply immediately. A field called status that returns values 1, 2, and 3 is useless to an agent unless the spec also documents that 1 means New, 2 means In Progress, and 3 means Completed, because the agent has no way to infer that mapping from the field name or the values themselves. An endpoint that documents only that it returns a 400 error for bad input, without specifying which input combinations trigger that response, leaves an agent unable to prevalidate its requests or recover gracefully when the error occurs. A rate limit that appears only in external documentation and not in the spec itself is invisible to any agent that has not been specifically trained on that external documentation. These are not edge cases that organizations can deprioritize. They are the normal state of most enterprise API specifications, and they are a primary reason why agents fail in ways that produce poor results without surfacing a clear explanation of what went wrong.
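The status example can be made concrete with two schema fragments, written here as Python dicts that mirror JSON Schema. Pairing `oneOf` with `const` and `title` is one common way to label enumerated values, used here as an assumption; teams may prefer to document the mapping in the field's description instead. `undocumented_values` is a hypothetical helper.

```python
# Two ways to describe the same field. Only the second tells a consumer
# what each value means.
bare_status = {"type": "integer", "enum": [1, 2, 3]}

documented_status = {
    "type": "integer",
    "description": "Lifecycle state of the item.",
    "oneOf": [
        {"const": 1, "title": "New"},
        {"const": 2, "title": "In Progress"},
        {"const": 3, "title": "Completed"},
    ],
}


def undocumented_values(schema):
    """Return the allowed values that carry no human-readable meaning."""
    options = [o for o in schema.get("oneOf", []) if "const" in o]
    labelled = {
        o["const"] for o in options if o.get("title") or o.get("description")
    }
    allowed = set(schema.get("enum", [])) | {o["const"] for o in options}
    return sorted(allowed - labelled)


missing_for_bare = undocumented_values(bare_status)
missing_for_documented = undocumented_values(documented_status)
```

For the bare schema every value is reported as undocumented. For the labelled schema the list is empty, which is the condition an agent-readiness check would want to enforce.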

The same distinction applies on the API producer side. Standard linting tools check structure, including whether a description field exists, whether it meets a minimum length, and whether required parameters are present. That structural check is genuinely useful as a first line of defense, but it cannot evaluate whether a description is written in a way that helps an agent understand what the operation is actually for.

A description that passes every linting rule can still be useless to an agent if it describes what the endpoint does technically without explaining the intent a consumer would bring to it. Descriptions need to represent intent, including what somebody would use the operation for, what constraints apply, and how the agent should reason about the result. The gap between a description that passes a linting check and a description that an agent can act on reliably is the gap that most teams are not yet closing, and closing it requires evaluation mechanisms that go beyond pattern matching on the specification itself.
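The limit of structural checks is easy to see in a small sketch. The `structural_lint` function below is a hypothetical stand-in for the kind of rule a spec linter applies, and both operations are invented for illustration.

```python
def structural_lint(operation, min_length=20):
    """Mimic the structural checks a spec linter applies to one operation."""
    description = operation.get("description", "")
    issues = []
    if not description:
        issues.append("missing description")
    elif len(description) < min_length:
        issues.append("description too short")
    return issues


# Describes the mechanics, not the purpose.
technical_only = {
    "operationId": "postOrderV2",
    "description": "POSTs a JSON payload to the order table handler.",
}

# Describes what the operation is for, when to use it, and what can go wrong.
intent_focused = {
    "operationId": "placeOrder",
    "description": (
        "Place a new customer order. Use after the cart is confirmed. "
        "Fails if any item is out of stock; the call is not safe to retry."
    ),
}

# Both operations pass, even though only the second explains when and why to call it.
results = [structural_lint(technical_only), structural_lint(intent_focused)]
```

Both descriptions come back clean, so telling them apart needs a different kind of evaluation than length and presence checks.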

Cross-service inconsistency breaks agent workflows

“It’s not just about connecting the dots for AI agents. It’s about making sure they understand what those dots mean.” — Rohan Gupta, Principal Product Manager at Harness

Rohan Gupta, Principal Product Manager at Harness, approaches the same problem from the perspective of an organization managing APIs across many teams and many services. His concern extends beyond the quality of any individual specification to the consistency of API design across the entire landscape. When agents operate in enterprise environments, they rarely interact with a single service in isolation. They move through workflows that cross multiple services, passing data and decisions from one system to another, and every inconsistency between how different teams have designed their APIs adds friction at the exact points where agents need to reason about how to connect things together.

Gupta’s view is that API specifications must be well-annotated and thoroughly documented so that agents can understand and execute the tasks they are given with accuracy and clarity, and that the design sloppiness which developers could historically compensate for becomes a structural blocker when the consumer is a machine reading a schema as its only source of truth. Missing descriptions, vague parameter names, inconsistent error handling patterns, and exposed implementation quirks that make no sense outside the context of the original development team all force agents into guesswork, and agents that guess tend to fail in ways that are difficult to reproduce and harder to debug than the original error would have been.

The governance problem becomes harder at the cross-service level. If one service in an agent’s workflow provides ambiguous or outdated information, the agent can be misled into triggering actions on a completely separate system in ways that no individual team would have anticipated or authorized. Lifecycle management for APIs that agents consume cannot focus only on backward compatibility within a single service anymore. It has to account for the cross-platform consistency and auditability of changes across every service in every workflow that agents are permitted to traverse, which is a meaningful expansion of what API governance has historically required.

APIs that work for developers fail agents in production

“If you can’t explain why your agent made a decision, you’re not ready to go live.” — Mayank Bhola, Co-Founder and Head of Products at TestMu AI

Mayank Bhola, Co-Founder and Head of Products at TestMu AI, has a practitioner’s view of where the failure patterns actually surface when organizations move from building agentic systems in development to running them in production. The pattern he observes is consistent across teams and organizations. APIs that worked reliably for developer consumption fail at meaningful rates when agents start calling them, and the root cause is almost always constraints and rules that were documented in external guides or tribal knowledge rather than encoded explicitly in the specification itself, leaving agents with no mechanism for knowing those rules exist until they violate them and encounter a failure they cannot interpret.

The fix Bhola advocates for is not simply better documentation, because better documentation that lives outside the machine-readable contract is still invisible to agents. It requires rethinking how APIs surface information about their own behavior, making all constraints explicit within the spec itself and building API surfaces that are structured to reduce the cognitive overhead agents face when trying to understand what an endpoint does, when to call it, and what the consequences of calling it incorrectly might be. For organizations with established API landscapes, he recommends maintaining two parallel layers, with a legacy developer API preserving backward compatibility for existing integrations and an AI-optimized layer built on top of it that flattens nested data structures, makes all constraints and relationships explicit, and exposes capabilities at a level of abstraction that agents can act on without needing to combine multiple lower-level calls to accomplish a single business task.
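The flattening step of such an AI-optimized layer might look like the sketch below. The `flatten` helper and the sample payload are assumptions made for illustration; the source does not specify how the layer is built.

```python
def flatten(record, parent_key="", separator="_"):
    """Flatten nested dicts into a single level with joined key names."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{separator}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, separator))
        else:
            flat[full_key] = value
    return flat


# Hypothetical response from the legacy developer API, left unchanged.
legacy_response = {
    "order": {
        "id": "A-17",
        "customer": {"id": "C-9", "tier": "gold"},
    },
    "meta": {"currency": "EUR"},
}

# The agent-facing layer exposes the same data without nesting.
agent_view = flatten(legacy_response)
```

The legacy response is untouched, which preserves backward compatibility, while the agent sees one flat record with self-describing key names such as `order_customer_tier`.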

Bhola believes the industry’s biggest blind spot is assuming that successful API consumption automatically leads to reliable agent behavior. In practice, many failures emerge after the API call succeeds. The agent selects the wrong tool, misinterprets context, follows an invalid reasoning path, or takes an action that technically satisfies the request but violates business intent. This is why validation infrastructure must be designed before deployment rather than after incidents occur.

Testing agentic systems requires teams to evaluate decision quality, tool selection accuracy, reasoning traceability, and behavioral consistency under changing conditions. The goal Bhola highlights is not just to verify outputs, but to understand whether the agent arrived at those outputs for the right reasons.
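One of those measures, tool selection accuracy, can be sketched as a simple comparison over recorded runs. The record format, tool names, and tasks below are hypothetical; a real evaluation harness would capture them from agent traces.

```python
def tool_selection_accuracy(cases):
    """Fraction of recorded runs where the agent chose the expected tool."""
    if not cases:
        return 0.0
    correct = sum(1 for c in cases if c["chosen_tool"] == c["expected_tool"])
    return correct / len(cases)


def wrong_selections(cases):
    """Tasks where the chosen tool differs from the expected one, for review."""
    return [c["task"] for c in cases if c["chosen_tool"] != c["expected_tool"]]


# Hypothetical recorded runs.
recorded_runs = [
    {"task": "refund order A-17", "expected_tool": "issue_refund", "chosen_tool": "issue_refund"},
    {"task": "cancel order A-18", "expected_tool": "cancel_order", "chosen_tool": "issue_refund"},
    {"task": "look up order A-19", "expected_tool": "get_order", "chosen_tool": "get_order"},
    {"task": "look up order A-20", "expected_tool": "get_order", "chosen_tool": "get_order"},
]

accuracy = tool_selection_accuracy(recorded_runs)
to_review = wrong_selections(recorded_runs)
```

The second run is the kind of case the text describes: the refund call may well succeed, yet it was the wrong action for a cancellation request, so output checks alone would miss it.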

Too many endpoints, not enough intent

The structural problem underneath all of this is that most enterprise APIs are too fine-grained for agents to use reliably, even when every individual specification is perfectly written and maintained. As Wilde frames it, accomplishing anything meaningful often requires combining many different endpoints in a specific order that encodes implicit business logic which is obvious to a developer who understands the domain but entirely opaque to an agent that has only the API contracts to work from.

When doing something meaningful requires chaining thirty endpoints in the right sequence, agents become confused about how to combine them, become inventive in ways that produce incorrect results, or make errors partway through the sequence that cascade into larger failures that are difficult to unwind. Wilde’s position is that AI readiness requires reducing the number of endpoints agents are exposed to and improving the business alignment and intent-based nature of the APIs that remain, so that a workflow that wants to accomplish a task ideally needs only a single tool call rather than having to orchestrate many lower-level calls in the correct order. The solution he and his colleagues at Jentic are working toward is a workflow layer that sits above the existing fine-grained API landscape, exposing business-level capabilities that are designed for runtime discovery and agent consumption rather than for developer integration at build time.
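The idea of a single business-level call sitting above fine-grained endpoints can be sketched as follows. This is not Jentic's implementation: every function name here is hypothetical, and the lower-level operations are stubbed so the example runs on its own.

```python
# Hypothetical fine-grained operations, stubbed for illustration.
def find_customer(email):
    return {"customer_id": "C-9"}


def list_open_orders(customer_id):
    return [{"order_id": "A-17", "total": 40}]


def create_refund(order_id, amount):
    return {"refund_id": "R-1", "order_id": order_id, "amount": amount}


def refund_latest_order(email):
    """Business-level capability: one call that encodes the required sequence.

    The ordering of the lower-level calls lives here, in code, instead of
    being something an agent has to work out from separate API contracts.
    """
    customer = find_customer(email)
    orders = list_open_orders(customer["customer_id"])
    if not orders:
        return {"status": "no_open_orders"}
    latest = orders[-1]
    refund = create_refund(latest["order_id"], latest["total"])
    return {"status": "refunded", **refund}


result = refund_latest_order("pat@example.com")
```

The underlying operations are unchanged. What the agent sees is one tool whose name and docstring express the intent, which removes the opportunity to chain the three calls in the wrong order.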

This pattern already shows up in enterprise partner integrations. Organizations with complex APIs that they expose to partners face a specific version of the fine-grained problem, where a partner integrating with a large API surface has to understand the full landscape even when they only need a small part of it, and the engineering effort of that integration is significant enough to slow or block adoption entirely.

The solution Wilde describes is building purpose-built workflows for specific partners, so that a partner only needs to understand the workflows that were designed for their particular use cases rather than navigating the full API surface independently. The underlying APIs do not change. What changes is the layer of business-level capabilities that sits above them, designed for a specific consumer’s needs rather than for maximum flexibility across all possible consumers. The benefit for agents is the same as the benefit for partners, with fewer options to navigate, clearer intent at each step, and a much lower chance of combining things incorrectly.

Wilde makes explicit why this approach is worth pursuing beyond its value for agents alone. Any developer who currently has to call fifteen underlying APIs to accomplish a task that should conceptually be a single operation would also benefit from a better-designed capability API on top of those underlying services. The investment in agent-readiness is an investment in the overall quality and usability of the API landscape, and the returns compound across every consumer of those APIs whether that consumer is a human developer or an autonomous agent running at runtime.

The API layer is where the next two years are decided

Wilde’s view of API lifecycle management is the right closing frame for this issue. Agents do not consume APIs the way developers do. They discover capabilities at runtime, decide whether a tool looks useful in the moment, and need machine-readable signals about what the API does, what constraints apply, what side effects it may trigger, and whether it is safe to keep using.

That changes how organizations need to think about versioning, deprecation, and governance. The old model assumes that a developer reads the documentation, notices a migration notice, and updates an integration on a schedule the team can manage. Agent-facing APIs need more of that information to be visible at runtime. If an API is being deprecated, if a capability is nearing sunset, or if a safer replacement exists, the consuming system needs a way to discover that signal before it makes a decision.
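One existing mechanism for making a retirement date visible at runtime is the HTTP `Sunset` response header defined in RFC 8594. The article does not name a specific mechanism, so treat this as one possible approach. The sketch below shows how a consuming system might read that signal before deciding whether to keep using an API; `lifecycle_signal` is a hypothetical helper, and the headers are passed as a plain dict for simplicity, whereas real HTTP headers are case-insensitive.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def lifecycle_signal(headers, now):
    """Classify an API response as 'active', 'sunset_scheduled', or 'retired'.

    Reads the Sunset response header, whose value is an HTTP-date.
    """
    sunset_value = headers.get("Sunset")
    if sunset_value is None:
        return "active"
    sunset_at = parsedate_to_datetime(sunset_value)
    return "retired" if now >= sunset_at else "sunset_scheduled"


# Hypothetical response headers from an API that is scheduled for retirement.
headers = {"Sunset": "Thu, 31 Dec 2026 23:59:59 GMT"}

before = lifecycle_signal(headers, datetime(2026, 7, 1, tzinfo=timezone.utc))
after = lifecycle_signal(headers, datetime(2027, 1, 1, tzinfo=timezone.utc))
no_header = lifecycle_signal({}, datetime(2026, 7, 1, tzinfo=timezone.utc))
```

An agent runtime could use the `sunset_scheduled` result to prefer a replacement capability when one is advertised, rather than discovering the retirement only when calls start to fail.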

This is where API lifecycle management needs to move, and organizations that invest in the governance structures to support it now will be better positioned than those that wait for the pressure to become unavoidable. The agents are already in production, and the limiting factors are no longer model capability alone but integration, security, and operational scalability, which means the API layer is where the most consequential infrastructure work of the next two years will happen for most engineering organizations.

The same design assumption that broke enterprise APIs, that the consumer has context, judgment, and the ability to fill in gaps, is present in every other infrastructure layer that agents call at runtime. Wilde’s framing brings the issue back to a practical rule. Agents should not be used to compensate for infrastructure that fails to express intent, constraints, lifecycle state, or safe operating boundaries. The teams that build on infrastructure designed to make those signals explicit will ship more reliable agentic systems than those still working around infrastructure that was never designed for this kind of consumer.

Thank you for reading this special issue of Deep Engineering on why the API layer has become the most consequential infrastructure problem in enterprise AI.

We’ll be back on Thursday with more expert-led content, and next month, on the first Tuesday of July, with another special issue.

Keep building,
Saqib Jan
Editor-in-Chief, Deep Engineering

Deep Engineering #49: David Knickerbocker on Open Source Intelligence and Real-World AI Systems

Saqib Jan — Thu, 28 May 2026 17:05:37 GMT

All the dev content that matters, in one personalized feed

Whether you're an early-career engineer levelling up or a senior dev tracking what's next, daily.dev makes sure the signal reaches you - without the noise.

Join for free at daily.dev

✍️ From the editor’s desk,

Welcome to the 49th issue of Deep Engineering!

Earlier this month, CISA and its international cybersecurity partners released Careful Adoption of Agentic AI Services, a guide for organisations adopting AI systems that can plan, use tools, access data, and act across digital environments. That capability changes the risk model because AI systems operating inside real workflows inherit risk from the surrounding data, permissions, tools, and context.

That risk is becoming easier to understand in practice. On 23 May 2026, Rohan Pandey of DigitalOcean and Archit Bhujang of Arizona State University published Poisoning the Watchtower, which shows how logs, alerts, URLs, payloads, DNS queries, and usernames can carry attacker-written instructions into LLM-assisted security workflows.

AI systems do not only consume clean prompts from users. They consume context from operational systems, open web sources, documents, logs, tools, and knowledge bases that the model does not control. Once that context includes contradiction, deception, malicious text, or attacker-controlled content, relevance alone becomes an unsafe target for retrieval and summarisation.

David Knickerbocker, founder of Verdant Intelligence and author of Network Science with Python (Packt), builds systems for Open Source Intelligence (OSINT) environments where messy and adversarial data is normal. His perspective matters in this issue because the systems he builds separate observing from judging, treat claims as claims rather than facts, and preserve minority signals that simpler retrieval pipelines often discard.

In issue 43, we looked at Knickerbocker’s work on real-time knowledge graphs and AI systems that treat knowledge as a live stream of claims. Today’s issue continues that conversation with what OSINT teaches engineers about messy, adversarial data. You can also watch our interview or read the full Q&A here.

Let’s get started.

Claude Code for Software Engineering

Learn how to structure Claude Code with context, reusable skills, scoped instructions, and guardrails so it works reliably across real codebases and team workflows.

🗓️ Friday, June 20 · 10:30 AM EDT onwards

Use code DEEPENG50 for 50% off.

Expert Insights

Building AI Systems That Handle Contradiction at Scale

by Saqib Jan with David Knickerbocker

Most engineers building AI systems have never had to question whether their data source is working against them. The data arrives, is processed and retrieved, and the system responds. The assumption underneath all of that is that the source is cooperative, that it was created to convey information accurately, stored in a format designed for retrieval, and that what comes back when you query it is at least an honest attempt at an answer. The problem is that this assumption is so embedded in how most AI systems are designed that it never gets examined.

David Knickerbocker, founder of Verdant Intelligence and author of Network Science with Python (Packt), builds systems for environments where data is not clean, settled, or cooperative. His AI systems ingest from the open web, across sources that contradict each other, where some information may be misleading, incomplete, or adversarial. The engineering challenge is not only making retrieval accurate. It is making the system useful when the real world refuses to behave like a clean dataset.

The assumption that data is helpful

Engineers who have worked primarily with internal databases, structured APIs, or carefully assembled training sets carry a baseline assumption that data is cooperative. It was created to convey information accurately, stored in a format designed for retrieval, and accessed through interfaces that return what was asked for. The job of the retrieval system is to find the right thing efficiently.

Open-source intelligence does not work this way. When ingesting from the open web at scale, some fraction of what arrives is wrong, some is deliberately misleading, and some represents one side of a contested claim. For Knickerbocker, the ingestion layer is not the right place to decide what is true. “You can have two different groups that are in opposition from each other,” he says. “One group will say this is the truth, and another group will say this is the truth, and they will be in direct conflict with each other.” The system’s job, in that moment, is to capture what is being claimed and preserve enough context for judgment to happen later.

“The real world is a messy space. It is not just that websites disagree with each other. Websites also have malware. If you point your servers at websites and you just download everything that is on them, then you need to be prepared for the consequences of downloading malware.”

The practical design response is to treat the system as an observer rather than an adjudicator. Knickerbocker draws that line clearly. “My systems do not care who is right or wrong,” he explains. “They just do not. My systems are observers.” The point is not neutrality as a value statement. It is an architectural boundary. The system captures what is being said, keeps competing claims visible, and avoids collapsing observation into judgment too early.

This distinction matters far beyond open-source intelligence. Any AI system that draws on user-generated content, social media, news, or unstructured enterprise data is working with material that was not created to be machine-readable and was not vetted before ingestion. The assumption that the data is trying to help is not just wrong in those environments. It is a liability.

Bigger clusters are not more important than smaller ones

One of the quieter failures in production NLP systems is the treatment of minority signals as noise. A similarity-based retrieval system returns the most representative results, which in practice means the most common results. A clustering pipeline that surfaces the largest groups first will consistently deprioritize small but significant signals. In a world where the interesting thing is often the outlier, that is a serious problem.

In open-source intelligence specifically, this failure mode has consequences. A small cluster of claims pointing toward something dangerous is not less important because it is small. A single source saying something that contradicts the majority view is not less worth capturing because it is in the minority. “Bigger clusters are not more important than smaller clusters,” Knickerbocker observes. “In open source intelligence, everything matters, top to bottom.”

Drawing from his engineering experience building these systems, Knickerbocker ensures his APIs return full context rather than a ranked shortlist. “If you use a tool to do a search to find out something, you are getting a snapshot of time,” he says. His systems are designed to capture what he calls the heartbeat of the internet. “If I use my API... it is going to come back with 10,000 things. My APIs do not return 10. They return full context.” That creates a harder downstream problem because the question is no longer how to retrieve the best few results. It is how to make a large, shifting body of claims usable without discarding the signals that do not look dominant at first.

The parallel for general AI systems is specific and direct. Any retrieval or summarisation pipeline that privileges majority signal is making a judgment call that the most common view is the most relevant one. That judgment call is often wrong, and it is invisible because the discarded minority signal never surfaces.
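A minimal sketch of that difference, assuming claims have already been given cluster labels by some upstream step. The `Claim` fields, the labels, and the function names are hypothetical and are not Knickerbocker's implementation. It contrasts a view that keeps only the largest clusters with one that returns every cluster, so a single-claim group stays visible.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    source: str   # where the claim was observed (hypothetical field)
    text: str     # what the source said
    cluster: str  # label assigned by an upstream clustering step (assumed)


def group_by_cluster(claims):
    """Group claims by cluster label, keeping every cluster regardless of size."""
    groups = defaultdict(list)
    for claim in claims:
        groups[claim.cluster].append(claim)
    return dict(groups)


def top_k_clusters(groups, k):
    """The majority-biased view: only the k largest clusters survive."""
    ranked = sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)
    return dict(ranked[:k])


claims = [
    Claim("site-a", "Service X is operating normally", "status-normal"),
    Claim("site-b", "No issues reported with service X", "status-normal"),
    Claim("site-c", "Service X is fine", "status-normal"),
    Claim("forum-d", "Service X credentials are being sold", "credential-leak"),
]

all_clusters = group_by_cluster(claims)
majority_only = top_k_clusters(all_clusters, k=1)

print(sorted(all_clusters))   # every cluster, including the single-claim one
print(sorted(majority_only))  # only the largest cluster
```

In this toy data the one claim that matters most lives in the smallest cluster, and the top-k view drops it without leaving any trace that it existed.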

The difference between a claim and a fact

Engineers trained on factual datasets tend to build systems that treat retrieved content as facts to be combined and presented. The underlying assumption is that if the source is credible and the retrieval is accurate, what comes back is true. In a contested information environment that assumption collapses immediately, and the design has to change with it.

Knickerbocker’s approach separates the task of capturing claims from the task of evaluating them. What a source says is observable. Whether what it says is correct requires judgment that depends on context, corroboration, and often human expertise that the system does not have. Turning that claim into an evaluated fact requires a different layer of judgment, and Knickerbocker is careful not to build that decision into the first act of ingestion. “I do not make that decision, and I do not allow my AI to make the decision what is true or what is false either,” he says. “I am more interested in what people are claiming is what is going on in the world.”

This design choice has a significant downstream consequence. It means the system can handle contradiction without breaking. Two sources saying opposite things about the same event are not a problem to resolve at the retrieval layer. They are two data points, both of which belong in the response. Knickerbocker simply logs these varied claims as parallel ribbons of information. The human or the downstream system that receives them can then apply judgment about which to act on, in what context, and with what confidence.
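One way to picture this separation is a record type that stores what was claimed and by whom, with no field for whether it is true. The sketch below is an illustration under assumed names (`ObservedClaim`, `ClaimLedger`, and the sample event are all hypothetical), not a description of Knickerbocker's system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ObservedClaim:
    event_id: str          # hypothetical identifier for the event being discussed
    source: str            # who made the claim
    statement: str         # what was claimed, stored verbatim
    observed_at: datetime  # when the system saw it
    # Deliberately no `is_true` field: evaluation belongs to a later layer.


@dataclass
class ClaimLedger:
    _claims: list = field(default_factory=list)

    def record(self, event_id, source, statement):
        """Store a claim as observed, without judging it."""
        claim = ObservedClaim(event_id, source, statement,
                              datetime.now(timezone.utc))
        self._claims.append(claim)
        return claim

    def claims_for(self, event_id):
        """Return every claim about an event, conflicting or not."""
        return [c for c in self._claims if c.event_id == event_id]


ledger = ClaimLedger()
ledger.record("bridge-closure", "agency-feed", "The bridge is closed")
ledger.record("bridge-closure", "local-blog", "The bridge is open")

for claim in ledger.claims_for("bridge-closure"):
    print(f"{claim.source}: {claim.statement}")
```

Both statements come back side by side. Deciding which one to act on is left to whoever, or whatever, consumes the ledger.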

The verification boundary

One of the hardest design decisions in any AI system that works with real-world data is where to draw the line between surfacing an insight and making an actionable claim. The two feel similar at the output layer but require very different things from the system that produces them.

In our live interview, Knickerbocker was specific about where that line sits. “Everything that I do is intentional,” he shares. His real-time intelligence layer is built for awareness. It captures what is happening and surfaces it without making the final judgment on what should be done next. If a piece of intelligence looks actionable, the system does not automatically act on it. It surfaces the signal so a human or downstream process can decide whether it matters, who should see it, and what level of confidence is appropriate.

“There are still certain parts that I like being a human being. Some things you just need to be aware of. Like, you do not need to respond to everybody. But it is good to know what is on the radar.”

In practice, routing that intelligence to the right person or the right downstream process is a separate engineering and organisational problem, and conflating it with the retrieval problem produces systems that are either too conservative to be useful or too confident to be trusted.

This is a principle with broad application. AI systems that are asked to be both the observer and the actor tend to perform neither role well. Keeping the observation layer and the action layer separate, with a clear boundary between them, is one of the most reliable ways to build something that stays trustworthy as it scales.

Entity extraction gets easier but never clean

Entity extraction from clean text is comparatively well understood. The models are good, the cleanup is manageable, and the output is reliable enough for most downstream uses. Entity extraction from the open web at scale is a different challenge, not because the models are worse but because the data has properties that laboratory text does not.

Knickerbocker began this work in 2018, starting with part-of-speech tagging before NER models were mature, moving to spaCy as those models improved, and more recently using LLMs for extraction. The trajectory is one of improving reliability rather than changing fundamentals. “Entity extraction has improved a lot since 2015,” he notes. “I mostly have to just throw away less. I have less cleaning to do, and it gets things right a lot easier.”

What has not changed is the messiness. Natural language processing at scale on real-world text always produces noise. The question is how much noise is acceptable for the downstream use case and how to handle the cleaning efficiently. At the scale he describes from previous work, including entity extraction across internet-scale datasets, the cleaning cannot be purely manual. It has to be part of the pipeline rather than an editorial step applied after the fact.

He also flags a risk in the current extraction approach that is worth understanding. Older NLP models produced visible noise that engineers learned to catch and correct. LLM-based extraction produces outputs that look clean even when they are wrong, because the model is good at generating confident-looking text regardless of underlying accuracy.

“LLMs are a little bit dangerous because the messiness goes away. People are a little bit more trusting of LLMs than older NLP. When you are using LLMs, everything just looks perfect. And that is kind of a dangerous downside too.”

The implication for engineers is that moving to LLMs for extraction does not reduce the need for validation. It makes validation harder to remember because the outputs no longer look like they need it.
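As one illustration of validation that does not depend on how clean the output looks, the sketch below checks whether each extracted entity actually appears in the source text. The `extracted` list is a stand-in for model output, and the substring test is a deliberately crude assumption: it catches entities that were never in the text, but not wrong entity types or paraphrased mentions.

```python
def split_grounded(source_text, entities):
    """Separate entities found verbatim in the source from those that are not."""
    haystack = source_text.casefold()
    grounded, ungrounded = [], []
    for entity in entities:
        needle = entity.strip().casefold()
        if needle and needle in haystack:
            grounded.append(entity)
        else:
            ungrounded.append(entity)
    return grounded, ungrounded


source_text = "Acme Corp announced a partnership with Globex in Berlin."
# Stand-in for LLM extraction output; "Initech" does not appear in the source.
extracted = ["Acme Corp", "Globex", "Initech"]

grounded, ungrounded = split_grounded(source_text, extracted)
print("grounded:", grounded)
print("needs review:", ungrounded)
```

A check like this can run inside the pipeline on every document, which matches the point above that cleaning at scale cannot be a manual step applied afterwards.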

Building for the world that actually exists

The thread running through Knickerbocker’s work is a commitment to grounding. He builds systems for the world as it is, not a cleaned version of it. That leads to a specific set of design choices: treat data as claims rather than facts, preserve minority signals, separate awareness from judgment, and let the system observe before any person or downstream workflow decides what to do next.

Those principles come from the kinds of environments Knickerbocker has worked in: data operations, cybersecurity, open-source intelligence, and production systems where the cost of getting something wrong is real. “The real world is a messy space,” he says. “Natural language processing is just messy. I have not seen it get really cleaned up yet.”

For engineers who have worked mostly with clean internal systems, that might sound like a warning about a narrow class of hard problems. It is broader than that. Any AI system that deals with content created by people, pulled from the web, generated by users, or routed through operational systems eventually has to confront the same condition. Real-world data is messy by default. The systems that handle it well are intentionally designed for that mess before it becomes a production failure.

🛠️ Tool of the Week

GraphRAG — A graph-based retrieval pipeline for unstructured text

GraphRAG helps teams preserve relationships across messy documents, conflicting claims, and large text collections before asking an LLM to answer.

Highlights:

Builds knowledge graphs from unstructured text instead of relying only on isolated chunks.
Links entities, claims, and topics so retrieval can use structure, not just similarity.
Supports local and global search for both narrow evidence lookup and corpus-level synthesis.
Gives engineers a practical starting point for testing graph-based RAG patterns.

Learn more about GraphRAG

📎 Tech Briefs

Claude Compliance API Integrations - Compliance API integrations help IT and security teams govern Claude across connected enterprise workflows.
MCP Events Working Groups - Gateway, transport, registry, and agents groups advanced protocol work around tool-connected AI systems.
RAGFlow v0.25.6 - Browser agents and RAPTOR AHC mode expand RAGFlow from document retrieval into web-aware ingestion workflows.
Qdrant v1.18.1 - Vector dimension validation before WAL writes reduces ingestion failure risk during async upserts.
Weaviate v1.38.0-rc.0 - Nested object filtering and namespace support improve retrieval precision for structured, multi-tenant corpora.

That’s all for today. Thank you for reading this issue of Deep Engineering.

We’ll be back next week with more expert-led content.

Keep building,

Saqib Jan

Editor-in-Chief, Deep Engineering

If your company is interested in reaching an audience of senior developers, software engineers, and technical decision-makers, you may want to advertise with us.

Compute Obsession Is Slowing Down AI Systems

Saqib Jan — Tue, 26 May 2026 05:30:00 GMT

Engineers building AI systems today tend to focus on compute first. The questions are typically how many GPU cores, how many parameters, how much VRAM, and how to extract more from all of it. The benchmarks are about throughput and inference speed, and the infrastructure conversations are about scaling horizontally across more hardware.

Jim Ledin, a seasoned engineering leader, CEO of Ledin Engineering and author of Modern Computer Architecture and Organization (third edition, Packt), thinks that framing misses the most important constraint in production AI systems. The bottleneck holding back real-world AI performance is not compute but data movement.

“Data movement can often be more expensive than the actual computation steps,” Ledin says. “The latency, especially moving large data structures across different levels of the memory hierarchy, can dominate and leave a lot of your compute bandwidth idle.” This is not a niche embedded systems concern. It is happening in the largest AI deployments in the world, and it is the reason hardware vendors like NVIDIA are designing systems the way they are today.

Continue reading or watch the full conversation with Jim Ledin below.

Memory bandwidth is slowing your AI system more than your GPU is

When a CPU or GPU requests data from memory and that data is not available in cache, the processor waits while the computation units sit idle. In a consumer application, that idle time seems like a minor inconvenience. But in an AI system processing large tensors continuously, it accumulates into a significant fraction of total runtime.

“AI workloads are becoming increasingly memory bandwidth limited,” Ledin shares, pointing to a dynamic that is reshaping how AI hardware gets built. “It is taking more time to bring data into the GPU or TPU memory than it is taking for the computation to take place on the data.” The raw ability to multiply matrices is no longer the binding constraint; getting the data to the multipliers fast enough is.

This is exactly why high bandwidth memory exists. HBM modules are stacks of RAM chips built into a cube, physically close to the processing units, with far higher data transfer rates than conventional DRAM. “On a TPU card, you typically have several of these HBM modules,” Ledin explains, “and they have a far higher data rate for transferring data in and out of the GPU processing components than on a typical consumer grade GPU.” The engineering bet being made with systems like NVIDIA’s Blackwell architecture is that memory bandwidth is worth more than raw core count, because the cores are already faster than the data can reach them.

But there is a side effect that touches anyone buying consumer hardware. “A lot of the production capacity for memory is going into these high bandwidth memory modules, which cost a lot more for the purchaser and make a lot more money for the vendor,” Ledin observes. That is a direct reason DDR5 has been difficult to find and expensive when available. The memory fabs are prioritizing the more profitable HBM production, and consumer DRAM is downstream of that decision.

The hardware cost your cloud bill is hiding

Most software engineers, especially those working in cloud environments, treat the hardware as someone else’s concern. The abstraction is good enough, the managed services handle the infrastructure, and the code runs somewhere. Ledin’s argument is that this hands-off relationship with hardware has a real cost that shows up in performance and in cloud bills.

“If your code is accessing memory in inefficient patterns, if you are not using the cache memory within the processor in an effective manner, and if you are just moving data around more than is necessary, that can all have significant performance impacts,” he warns. The CPU requests data from memory, and if it is not in cache, it waits. “A lot of the time it is unavoidable, but the amount of latency can be minimized by different ways of optimizing algorithms.”

The mechanics are specific. When a modern CPU reads from DRAM, even a single byte triggers a 64-byte cache line transfer. The processor brings in a block of adjacent memory whether it needs all of it or not. If the algorithm then jumps to a different memory location, causes that block to be evicted from cache, and later needs it again, it has to re-read it from DRAM. That is wasted time. “For best efficiency, you would want your code to be working with data from that block before it moves on to something else,” Ledin explains, “rather than bouncing around to other memory locations.”
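The arithmetic behind that advice can be sketched in a few lines. The example below is a model, not a benchmark: it assumes 8-byte values and a buffer that starts on a cache line boundary, and it counts how many distinct 64-byte lines two access patterns touch for the same number of reads.

```python
CACHE_LINE_BYTES = 64  # line size stated in the text above
ELEMENT_BYTES = 8      # assumed: 8-byte values such as float64
N_READS = 1024


def cache_lines_touched(byte_offsets):
    """Count the distinct cache lines covered by a sequence of byte offsets."""
    return len({offset // CACHE_LINE_BYTES for offset in byte_offsets})


# Sequential: read 1024 adjacent elements.
sequential = [i * ELEMENT_BYTES for i in range(N_READS)]

# Strided: read 1024 elements that sit 64 bytes apart from each other.
strided = [i * CACHE_LINE_BYTES for i in range(N_READS)]

print(cache_lines_touched(sequential))  # 1024 * 8 / 64 = 128 lines
print(cache_lines_touched(strided))     # one line per read = 1024 lines
```

Same number of reads, eight times the memory traffic. The strided pattern uses 8 of every 64 bytes it pulls in, which is the "bouncing around" Ledin describes.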

In a cloud environment, this inefficiency does not just slow things down. It costs money, and there is no incentive for cloud providers to surface it clearly. “You are paying for the usage of the system whether the CPU is actually crunching instructions or the CPU is idle waiting for a data item to come in from memory,” he points out. The cloud bill does not distinguish between productive cycles and stall cycles. Engineers who understand cache locality can write code that reduces stalls and therefore reduces cost, not just latency. Optimizing for cost comes down to understanding your memory access patterns and engineering around them, not just choosing the right managed tooling stack.

Drawing from his engineering work across embedded and production systems, Ledin shares a useful example. A Linux web server called Tux, which ran in kernel space to avoid user-to-kernel data transfers, developed a performance problem under high load because its per-request state data grew large enough to exceed the CPU’s level two cache. “Performance dropped off sharply,” he recalls. Engineers analyzed the cache behavior, restructured the data layout to keep per-request state smaller, and did the same for instruction caching by batching related processing together. “Fixes that they implemented increased the application performance by about 40%.” No new hardware, no architectural overhaul. Just understanding where the memory ceiling was and designing around it.

GPUs are the right tool, but not always for the reason you think

The assumption that GPUs are the correct architecture for AI workloads is not wrong, but it is incomplete in a way that matters for engineers making infrastructure decisions. Ledin draws a distinction that is often glossed over in the mainstream conversation about AI hardware.

“GPUs are probably the ideal architecture today for people and small companies that want to run language models locally,” he says, drawing from personal experience. He recently ran the Gemma 4 26-billion-parameter model on an NVIDIA RTX 4090, and for that use case the GPU is the right tool. But for larger-scale deployments running the much larger frontier models, the picture is different. “The trend there is for dedicated TPUs,” he notes.

The distinction matters because GPUs carry silicon dedicated to graphics work that has nothing to do with tensor operations. A consumer GPU has hardware for real-time video rendering, gaming pipelines, and display output. A TPU does not. “TPUs do not use up silicon for that purpose and focus everything on the tensor work,” Ledin explains. When you are running thousands of inference requests at scale, that difference in silicon allocation translates directly into efficiency at the workload that actually matters.

There is also the SIMT (single instruction, multiple threads) execution model to understand. Modern NVIDIA GPUs schedule threads in groups of 32, called warps, that run in lockstep, all executing the same instruction on different data simultaneously. This is efficient for linear, parallel workloads. When those threads hit a branch, a conditional where some threads take the if path and some take the else path, the hardware executes one side while the remaining threads sit idle, then goes back and executes the other. “You basically have effectively a pipeline stall where it has to go back and execute a different thread in that kind of situation,” Ledin highlights. The flexibility is there, but it comes at a cost. “Avoiding branching if possible can have a significant impact on performance.”
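The cost of a divergent branch can be modeled without a GPU. The sketch below is a simplified simulation in plain Python, assuming a single two-way branch and ignoring everything else the hardware does: a group of 32 lockstep threads walks each taken path once, so a branch that splits the group costs two passes instead of one.

```python
WARP_SIZE = 32  # threads that execute in lockstep, per the text above


def passes_for_branch(predicates):
    """Return how many times the group walks the branch: once per path taken."""
    if len(predicates) != WARP_SIZE:
        raise ValueError(f"expected {WARP_SIZE} predicates, got {len(predicates)}")
    return len(set(predicates))


uniform = [True] * WARP_SIZE                        # every thread takes the if path
divergent = [i % 2 == 0 for i in range(WARP_SIZE)]  # threads alternate paths

print(passes_for_branch(uniform))    # 1
print(passes_for_branch(divergent))  # 2
```

A common mitigation is to arrange the data so threads in the same group tend to take the same path. That is a general technique, not something Ledin prescribes in the interview.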

For engineers deciding where to run inference workloads, Ledin offers a practical heuristic. “The GPU only really becomes attractive when you have enough work for it to do that it can be parallelized and enough that it will amortize the costs associated with moving data onto the GPU, launching the kernels, and doing the management work to transfer data to and from the GPU.” If the workload is not large enough to keep the GPU busy, the CPU implementation may be faster because it avoids all that overhead entirely.
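That heuristic can be written down as a simple linear cost model. The sketch below is illustrative only: the per-item costs and the fixed overhead are invented numbers, not measurements of any real hardware, and real transfers and kernels rarely scale this cleanly.

```python
def cpu_time_ns(n_items, cpu_ns_per_item):
    """Time to process the workload on the CPU, with no transfer cost."""
    return n_items * cpu_ns_per_item


def gpu_time_ns(n_items, gpu_ns_per_item, transfer_ns_per_item, fixed_overhead_ns):
    """Fixed overhead (kernel launch, management) plus per-item transfer and compute."""
    return fixed_overhead_ns + n_items * (transfer_ns_per_item + gpu_ns_per_item)


def break_even_items(cpu_ns_per_item, gpu_ns_per_item,
                     transfer_ns_per_item, fixed_overhead_ns):
    """Smallest workload at which the GPU path is no slower, or None if never."""
    saving_per_item = cpu_ns_per_item - (gpu_ns_per_item + transfer_ns_per_item)
    if saving_per_item <= 0:
        return None
    # Integer ceiling division.
    return -(-fixed_overhead_ns // saving_per_item)


# Illustrative numbers only, not measurements of any real hardware.
COSTS = dict(cpu_ns_per_item=100, gpu_ns_per_item=5,
             transfer_ns_per_item=15, fixed_overhead_ns=200_000)

print(break_even_items(**COSTS))
```

With these made-up costs the GPU path only catches up at 2,500 items. Below that, the fixed overhead dominates and the CPU finishes first. If moving an item costs more than the GPU saves on it, the function returns `None` because there is no break-even point at any size.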

Frameworks are hiding costs that engineers need to see

Frameworks and libraries have made it possible to build sophisticated AI systems without ever thinking about what is happening in hardware. That is mostly a good thing. The abstraction accelerates development and reduces mistakes. But there is a point where abstraction stops being a benefit and starts hiding costs that need to be visible.

“Where it becomes dangerous to use too much abstraction is when it obscures what is happening with the data layout in memory and the execution patterns,” Ledin cautions. In performance-critical applications, the framework is making decisions about how data is structured and how the processor interacts with it. If the engineer does not know what those decisions are, they cannot tell when they are working against the hardware.

The practical approach Ledin recommends is a two-layer architecture. “Use the most expressive code at the edges of the system, and in the core, use more performance-aware code.” The boundary between those layers is not always obvious in advance, and finding it usually requires benchmarking rather than reasoning. But the principle is clear: abstractions are appropriate where they preserve meaning across the team, and they become a problem where they hide costs that affect the system’s ability to meet its requirements.

One specific pattern worth knowing is the array of structures versus structure of arrays tradeoff. A common data layout is an array of objects, where each object holds all the fields for one entity. For CPU cache efficiency, it can be significantly better to restructure this as a structure of arrays, where each field is stored as a separate array for all entities. “That might have a big impact on performance,” Ledin notes, because the CPU cache loads contiguous memory, and if the algorithm is operating on one field across many entities, the structure of arrays layout means each cache load is full of useful data rather than fields the algorithm is not touching.
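The effect is easiest to see as arithmetic. The sketch below assumes a hypothetical record of four 32-bit floats and a 64-byte cache line, and it counts how many values of the one field being scanned arrive with each line under the two layouts. It only models the byte layout. Ordinary Python lists and objects are not stored contiguously, so the layouts themselves would be built in a language or array library that controls memory.

```python
import struct

CACHE_LINE_BYTES = 64  # assumed cache line size

# Hypothetical particle record: four float32 fields stored together (AoS).
RECORD_FORMAT = "<4f"  # x, y, z, mass
FIELD_FORMAT = "<f"    # a single float32

record_bytes = struct.calcsize(RECORD_FORMAT)  # bytes between consecutive x values in AoS
field_bytes = struct.calcsize(FIELD_FORMAT)    # bytes between consecutive x values in SoA


def useful_values_per_line(stride_bytes):
    """How many values of the scanned field arrive with each cache line."""
    return CACHE_LINE_BYTES // stride_bytes


aos_x_per_line = useful_values_per_line(record_bytes)
soa_x_per_line = useful_values_per_line(field_bytes)

print(f"AoS: {aos_x_per_line} x values per cache line")
print(f"SoA: {soa_x_per_line} x values per cache line")
```

For a loop that only reads `x`, the array-of-structures layout delivers 4 useful values per line and 12 values it never touches, while the structure-of-arrays layout delivers 16 useful values per line.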

The skills that will matter when the hardware changes again

The specific technologies that matter five years from now are difficult to predict, and Ledin is honest about that. “Four years ago when the previous version of my book came out, it was not at all clear to me, or I think a lot of people, what was going to be happening with AI in the coming years,” he says. Predicting which hardware architectures or AI frameworks will dominate is not the point. Building the mental model to understand them when they appear is.

The foundational skill is the ability to reason across abstraction layers. “The way to really understand the system requires the ability to reason across all of the abstraction layers from the software framework that you are working on at the top level, all the way down to the hardware that runs the code,” Ledin underscores. That does not mean reading assembly code for every application. It means understanding how pipelines and caches work and orienting code to work within those environments rather than against them.

The other shift is heterogeneous computing. Writing code that runs on a CPU is no longer sufficient context for many engineering problems. “It is also becoming more critical to understand heterogeneous computing environments,” Ledin says. “It is not just writing code that runs on a CPU. You might also have code that interacts with the GPU if you are running a parallelized algorithm on that, whether it is a language model or something else.” Domain-specific accelerators, TPUs, RISC-V implementations, and specialized inference chips are all becoming part of the environments that production engineers have to reason about. The engineers who will be most effective in that landscape are the ones who understand why those architectures make the tradeoffs they do, not just how to call their APIs.

This article is based on Deep Engineering #46. The full issue includes additional insights from Jim Ledin on modern computer architecture and AI infrastructure.

Deep Engineering #48: Erik Wilde on Agent-Ready APIs, Widespread MCP Adoption, and the OpenAPI Standards That Matter

Saqib Jan — Thu, 21 May 2026 17:43:11 GMT

Building Reliable AI Agents with Java and LangChain4J

A hands-on workshop covering how to build production-grade AI agents using Java and LangChain4J.

🗓️ Friday, June 13 · 10:00 AM – 1:30 PM ET · Online

2 for 1 deal is live. Use code DEEPENG50 for 50% off.

✍️ From the editor’s desk,

Welcome to the 48th issue of Deep Engineering!

Google announced Managed Agents in the Gemini API two days ago at Google I/O, making it possible to spin up an agent that can reason, use tools, execute code, and browse the web with a single API call. The infrastructure work that previously required teams to build and manage sandboxes, scaffolding, and execution environments is being abstracted away. The capability is in public preview and Google is clear that outputs should be reviewed before use in sensitive workflows, but the direction is quite clear. Deploying agents is getting significantly easier.

What is not getting easier at the same pace is making the APIs those agents will call worth calling. APIs designed for human developers, who can tolerate ambiguous descriptions, infer intent from sparse documentation, and navigate hundreds of operations to find the right one, do not work the same way for agents. Agents are less reliable at resolving ambiguous API semantics, choosing among many overlapping operations, and safely composing actions without machine-readable contracts and guardrails.

Erik Wilde, Head of Enterprise Strategy at Jentic and OpenAPI Ambassador at the OpenAPI Initiative, has spent considerable time working to close that gap. We spoke with Wilde about what agent-ready actually means in practice, why MCP will not fix a poorly designed API foundation, and what platform engineers should start planning for today.

The expert insights in today's issue are based on our recent live interview with Wilde and you can read or watch the full Q&A here.

Let’s get started.

Featured Newsletter: Machine Learning at Scale.

Are you a SWE looking to upskill into ML systems? Get high quality ML system design content delivered to your inbox. Learn how to design and scale Machine Learning Systems.

Subscribe to Machine Learning at Scale

🧠 Expert Insights

Your APIs Are Not Ready for Agents, and MCP Will Not Fix That

The conversation about AI agents in enterprise software increasingly centers on how agents will consume APIs and what it actually takes to make that consumption work reliably. Most organisations have taken the shortcut. They have built an MCP server, pointed it at their existing API landscape, and told themselves the agent problem is solved. Erik Wilde, Head of Enterprise Strategy at Jentic and OpenAPI Ambassador at the OpenAPI Initiative, thinks that is the wrong bet, and his reasoning is specific enough to be useful.

“Whatever you invest in better APIs becomes useful for everybody,” Wilde affirms. “If you invest specifically in MCP, that investment is effectively scoped to LLM consumers.” The point is not that MCP is useless. It is that MCP is a delivery mechanism, and delivery mechanisms change. The API foundation underneath it does not change nearly as quickly, and if that foundation is poorly designed for the agents that will eventually consume it, no amount of tooling stacked on top of it will compensate. The organisations that will be in the strongest position in two years are the ones investing in the foundation now, not the ones chasing the current delivery protocol.

The abstraction level problem

The clearest way to understand what makes an API agent-ready is to look at a concrete example, and Wilde in our interview offered one that makes the problem immediately legible. The GitHub REST API currently has around 1,100 operations. That is not unreasonable for a product as complex as GitHub. A developer can navigate 1,100 operations because they bring context, experience, and the ability to read documentation and infer intent. They know roughly what they are looking for and they can work toward it even when the path is not obvious.

An agent does not work that way. “For an agent to work directly with that GitHub API is pretty complex,” Wilde points out, “because a lot of those operations need to be combined in a certain way to result in the workflows that you really want to accomplish on GitHub.” The agent has to figure out not just what each individual operation does but how they compose, in what order, under what conditions, and with what dependencies. With 1,100 operations, the combinatorial space of possible workflows is enormous, and agents navigating it without guidance will produce unreliable results.

Now look at the GitHub MCP server, which has around 70 tools. Each of those tools represents a higher-level workflow, something a developer might actually want to accomplish on GitHub rather than a low-level operation that contributes to that accomplishment. The reduction from 1,100 to 70 is not a loss of capability. It is a gain in usability for the specific class of consumer that is trying to get things done rather than explore a surface. “What I would say,” Wilde argues on this point, “is that if you had a genuinely agent-friendly GitHub API, it might also just have around 70 operations.” The MCP server is not adding something new. It is providing the abstraction level that the underlying API should have provided in the first place.

This is the abstraction level problem, and it is the most important design question for engineering teams building API infrastructure that agents will consume. The APIs that were designed for developer flexibility, with many fine-grained operations that compose in powerful ways, are exactly the wrong shape for agents that need to accomplish specific goals reliably. The discipline of designing for agents is the discipline of asking what a consumer actually wants to accomplish and surfacing that at the API level, rather than exposing every atomic capability and leaving the composition to the consumer.

What agent-ready actually means

The properties that follow from the abstraction level insight are consistent and actionable. An API designed for agent consumption should not be too fine-grained, and its descriptions should be intent-based and written at a level that is meaningful for a language model rather than just technically accurate for a developer who already knows the domain. It should have examples, ideally multiple examples per operation rather than one, because examples are one of the most reliable ways for a model to understand what an operation actually does in practice. Its error messages should be meaningful enough that an agent encountering a failure has enough information to understand what happened and what it might do next.

“If an AI agent looks at a poorly described API and cannot figure out how it works, it will just move on to the next one,” Wilde notes. “It has less context. It has less experience. It does not really know as well as an actual developer what to do.” This is the practical consequence of the abstraction level problem at the description level. A developer reading a sparse API description can fill in the gaps from domain knowledge and engineering experience. An agent cannot do that reliably, and the result is not a helpful error or a clarifying question. It is a silent failure or a wrong action.

Wilde and his team have built a scoring mechanism for API readiness that makes these dimensions concrete. The scoring uses a combination of standard linting, running tools like Spectral and Redocly to check structural conditions, and LLM-based checks that evaluate whether descriptions are written in a way that is genuinely useful for an agent rather than just present. The distinction matters because a description that exists and passes a structural check may still be useless for an agent if it describes what an operation does technically without explaining what a consumer would use it to accomplish. “These descriptions need to represent intent,” Wilde highlights. “What is the intent of somebody who would use this operation?”

Linting is necessary but not sufficient

Linting has become standard practice in well-run API programs, and Wilde endorses it as a first line of defense. The popular tools are capable and in some cases open source, and the practice of defining shared rule sets that teams can discuss, extend, and maintain in version control is genuinely useful. But in our conversation he was clear that linting alone does not get you to agent-ready, and teams that treat it as the complete solution are leaving the most important problems unaddressed.

The structural checks that linting tools perform are exactly that. They can tell you whether a description field exists and whether it meets a minimum length requirement, but they cannot tell you whether the description is written in a way that helps an agent understand what the operation is for. They can flag a missing example but cannot evaluate whether the examples present give a model enough signal to use the operation correctly in a novel context. The gap between what linting checks and what agent readiness requires is the gap between structure and meaning, and closing it requires evaluation mechanisms that go beyond pattern matching on OpenAPI descriptions.
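The gap between structure and meaning can be illustrated with a minimal Python sketch of the kind of structural rule described above. It is not how Spectral or Redocly implement their rules. The function name, the sample spec, and the 20-character threshold are assumptions chosen for illustration.

```python
MIN_DESCRIPTION_LENGTH = 20  # illustrative threshold, not taken from any real rule set

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def structural_description_check(openapi: dict) -> list[str]:
    """Return one message per operation whose description is missing or too short."""
    problems = []
    for path, path_item in openapi.get("paths", {}).items():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            description = (operation.get("description") or "").strip()
            if len(description) < MIN_DESCRIPTION_LENGTH:
                problems.append(f"{method.upper()} {path}: description missing or too short")
    return problems


# Hypothetical API description used only for this example.
spec = {
    "paths": {
        "/orders": {
            # Long enough to pass, yet it says nothing about why a consumer would call it.
            "post": {"description": "Creates a row in the orders table."},
            "get": {"description": "List"},
        },
        "/orders/{id}/refund": {
            "post": {
                "description": (
                    "Refund a customer for an order that was paid but not yet shipped. "
                    "Use this when the customer cancels before dispatch."
                )
            }
        },
    }
}

print(structural_description_check(spec))
```

Only the one-word description is flagged. The check treats "Creates a row in the orders table." and the intent-based refund description as equally acceptable, which is exactly the difference a structural rule cannot see.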

Wilde also makes an important point about rule set governance that is worth taking seriously. “I am not a big fan of just reusing existing rule sets,” he contends. “I would always say start owning this, build up your own in a collaborative fashion.” The Zalando and Adidas rule sets that circulate in the API community are useful references, but they were built for specific contexts and specific quality standards. Adopting them wholesale means inheriting decisions that were made for a different organisation’s constraints. The value of a rule set comes not just from the rules it contains but from the process by which those rules were agreed upon, which is a process that builds shared understanding of what good API design actually means in a particular context.

MCP is a delivery mechanism, not a foundation

MCP has been growing fast. It is now under the Linux Foundation, major model providers support it, and a growing number of enterprise vendors are shipping MCP servers as a standard part of their product offering. For engineers deciding where to invest, it looks like an obvious answer to the question of how to make APIs accessible to agents.

Wilde’s skepticism is not about MCP’s current momentum. It is about what MCP is and what it is not. “MCP is the current delivery mechanism,” he says. “You need a delivery mechanism, but I would not build too many things that are MCP-specific.” At Jentic, the team supports MCP because it is what the market expects right now, but they have deliberately avoided deep investment in MCP-specific infrastructure. If MCP were replaced by something else, the transition would be straightforward because the underlying work, making APIs well-described, well-structured, and semantically rich, would carry over entirely. That work is not MCP-dependent. It is foundational.

The risk for teams that invert this priority is real. Building an MCP server on top of a poorly designed API landscape means the MCP server inherits all of those same problems. Operations that are too fine-grained stay too fine-grained, descriptions that lack intent stay unreadable to a model, and error messages that tell a human nothing tell an agent even less. The wrapper changes the protocol by which those problems reach the agent, not the problems themselves.

Open standards outlast any delivery protocol

One of the clearest threads in Wilde’s thinking is the value of building on open standards rather than specific tools or protocols. This is not an abstract preference for openness. It is a practical argument about optionality. Teams that build their API practices on OpenAPI, Arazzo, and Overlays are building on specifications that are independent of any vendor, any model provider, and any current delivery protocol including MCP. When the next delivery mechanism arrives, or when the current tooling landscape shifts, the foundation remains.

Arazzo is worth understanding in this context. It is a workflow language published by the OpenAPI Initiative that allows you to describe sequences of API interactions in a standardised format. If accomplishing a particular goal requires calling five endpoints in a specific order with specific dependencies, Arazzo is the language for expressing that. For agents, which struggle with exactly this kind of multi-step composition, a well-constructed Arazzo workflow is one of the most useful things an API producer can provide. “Figuring out multi-step workflows is one of the hardest things for agents to do right now,” Wilde says, “and Arazzo is genuinely good at describing those. We just need to make it discoverable.”
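To make the shape of such a workflow concrete, here is a minimal sketch of an Arazzo-style document written as a Python dictionary, with a helper that lists its steps in order. The field names follow our reading of the Arazzo 1.0 specification. The API, the operation IDs, the URL, and the helper function are hypothetical.

```python
# Hypothetical two-step workflow: look up an order, then refund its payment.
refund_workflow_doc = {
    "arazzo": "1.0.0",
    "info": {"title": "Order refund workflows", "version": "1.0.0"},
    "sourceDescriptions": [
        {"name": "ordersApi", "url": "https://example.com/openapi.yaml", "type": "openapi"}
    ],
    "workflows": [
        {
            "workflowId": "refundOrder",
            "summary": "Refund a paid order that has not shipped yet.",
            "inputs": {"type": "object", "properties": {"orderId": {"type": "string"}}},
            "steps": [
                {
                    "stepId": "fetchOrder",
                    "operationId": "getOrder",
                    "parameters": [{"name": "id", "in": "path", "value": "$inputs.orderId"}],
                    "successCriteria": [{"condition": "$statusCode == 200"}],
                    "outputs": {"paymentId": "$response.body#/paymentId"},
                },
                {
                    "stepId": "refundPayment",
                    "operationId": "createRefund",
                    # The second step depends on an output of the first one.
                    "parameters": [
                        {
                            "name": "paymentId",
                            "in": "query",
                            "value": "$steps.fetchOrder.outputs.paymentId",
                        }
                    ],
                    "successCriteria": [{"condition": "$statusCode == 201"}],
                },
            ],
        }
    ],
}


def describe_workflow(doc: dict, workflow_id: str) -> list[str]:
    """Return a numbered, human-readable line for each step of one workflow."""
    for workflow in doc["workflows"]:
        if workflow["workflowId"] == workflow_id:
            return [
                f"{index}. {step['stepId']} calls {step['operationId']}"
                for index, step in enumerate(workflow["steps"], start=1)
            ]
    raise KeyError(f"no workflow named {workflow_id!r}")


print("\n".join(describe_workflow(refund_workflow_doc, "refundOrder")))
```

The ordering and the data dependency between the two calls are stated explicitly in the document, so an agent does not have to infer them from two separate operation descriptions.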

Overlays, the third specification from the OpenAPI Initiative, provides a way to express changes to an OpenAPI description in a standardised diff format. “We use overlays,” Wilde shares, “to deliver improvement suggestions alongside API scores. When the scoring mechanism identifies that an API is not well-designed for AI consumption, it also produces an Overlay that shows exactly what would need to change to improve it.” That makes the gap between current state and agent-ready state concrete and actionable rather than a list of abstract recommendations.
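As a rough illustration of the idea, the Python sketch below applies an Overlay-style document to an OpenAPI description. It is not Jentic's implementation and not a conforming Overlay processor. Real Overlays select targets with JSONPath, while this sketch understands only plain `.name` and `['name']` segments and ignores `remove` actions. The sample API and description text are hypothetical.

```python
import copy
import re

# Matches either .name or ['name'] at a given position in the target string.
_SEGMENT = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\['([^']+)'\]")


def resolve_simple_target(document: dict, target: str) -> dict:
    """Walk a very small JSONPath subset and return the node it points at."""
    if not target.startswith("$"):
        raise ValueError(f"target must start with '$': {target}")
    node, pos = document, 1
    while pos < len(target):
        match = _SEGMENT.match(target, pos)
        if match is None:
            raise ValueError(f"unsupported target syntax: {target}")
        node = node[match.group(1) or match.group(2)]
        pos = match.end()
    return node


def merge(node: dict, update: dict) -> None:
    """Recursively merge update into node, overwriting scalar values."""
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(node.get(key), dict):
            merge(node[key], value)
        else:
            node[key] = value


def apply_overlay(openapi: dict, overlay: dict) -> dict:
    """Return a copy of the description with every update action applied."""
    result = copy.deepcopy(openapi)
    for action in overlay.get("actions", []):
        merge(resolve_simple_target(result, action["target"]), action.get("update", {}))
    return result


openapi = {"paths": {"/orders/{id}/refund": {"post": {"description": "POST refund"}}}}
overlay = {
    "overlay": "1.0.0",
    "info": {"title": "Intent-based descriptions", "version": "1.0.0"},
    "actions": [
        {
            "target": "$.paths['/orders/{id}/refund'].post",
            "update": {
                "description": "Refund a customer for an order that was paid but not yet shipped."
            },
        }
    ],
}

improved = apply_overlay(openapi, overlay)
print(improved["paths"]["/orders/{id}/refund"]["post"]["description"])
```

Because the suggestion is a separate document, the original description stays untouched until a team decides to accept the change.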

The APIs you design today will still be running in two years

The practical implication of everything Wilde argues is a specific recommendation about timing. API landscapes change slowly. Whatever is designed or changed today will likely remain largely unchanged for one to three years. Agents are arriving in enterprise contexts incrementally but consistently. The customer support and HR agents that are already deployed broadly are the early wave, and the business agents with genuine decision-making authority are behind them.

“API landscapes evolve slowly,” Wilde says. “Whatever you design or change today, you will probably have around for a year or two or three before you touch it again.” The teams that start building API readiness for agents now are the teams whose infrastructure will be in the right shape when agents with more capability and more authority arrive. The teams that wait for agents to become mainstream before improving their APIs will find themselves doing expensive remediation work on a landscape that is already in production and already depended upon.

The recommendation is not to stop shipping features or to redesign everything at once. It is to make agent readiness a standard consideration in the decisions that are already being made. When writing a new operation, write the description for an agent as well as for a developer. When adding examples, add enough that a model can generalise. When defining error responses, add enough context that a consumer without domain knowledge can understand what happened. These are not large investments per decision and they compound over time into an API landscape that agents can actually use.

Wilde closes with a direct appeal to platform teams. “All the platform people out there who are building API platforms or doing platform engineering,” he says, “think about how all of this will change if you have more and more agentic actors and consumers in your organisation, and start planning for that today, even if you can say that right now you do not have it this much and it is going to be another year or two. It is going to arrive.”

In case you missed it

Here’s the full Q&A with the interview video featuring Erik Wilde.

🛠️ Tool of the Week

Spectral — open-source JSON and YAML linter with built-in support for OpenAPI, Arazzo, and AsyncAPI

Validates OpenAPI v3.1, v3.0, v2.0, Arazzo v1.0, and AsyncAPI v2.x out of the box.
Supports fully custom rule sets, letting teams build and own their own governance standards.
Integrates with VS Code, JetBrains, GitHub Actions, and Azure API Center for shift-left linting.

Learn more about Spectral

📎 Tech Briefs

Google Managed Agents now in Gemini API - A single API call now provisions an ephemeral Linux sandbox with code execution, web browsing, and tool use built in.
Kyverno 1.18 released post-CNCF graduation - First post-graduation release patches two SSRF CVEs and adds cleanup policy support to the Kubernetes policy engine.
OpenAI and Dell bring Codex to on-premises enterprise - The partnership makes the Codex coding agent available in hybrid and air-gapped enterprise environments for the first time.
A2A protocol underpins Google’s full agent stack - Agents built at any abstraction level can be called as sub-agents across the entire Google Cloud agent platform.
43% of AI-generated code fails in production - Survey of 200 SRE leaders finds teams average three production redeploy cycles to verify a single AI-suggested fix.

That’s all for today. Thank you so very much for reading this issue of Deep Engineering.

We’ll be back next week with more expert-led content.

Stay awesome,

Saqib Jan

Editor-in-Chief, Deep Engineering


Building Agent-Ready APIs in Production with Erik Wilde

Saqib Jan — Wed, 20 May 2026 20:06:57 GMT

Erik Wilde has spent more than 12 years working on APIs in every form, from communication protocols to enterprise API platforms, governance frameworks, and now the question of what it takes for APIs to actually work for AI agents. He holds degrees in computer science from TU Berlin and a PhD from ETH Zurich, has contributed to multiple open standards, and is an OpenAPI Ambassador at the OpenAPI Initiative. He currently works at Jentic, where he focuses on making API landscapes usable for the next generation of agentic consumers.

Erik joined a Deep Engineering Live interview session to talk about OpenAPI 3.2, what agent-ready APIs actually look like, and why he is more skeptical about MCP than most people expect.

Watch the full conversation below.

A note on format: this session was recorded live as part of the Deep Engineering Live Interview Series. The transcript below has been lightly edited for clarity and readability. Audience members joined the conversation and asked questions directly during the session.

Q. Tell us about your background and how you ended up working on APIs.

I have been working on APIs, in some shape or form, all of my life. I started with communication systems and protocols and then moved into the API space proper about 12 years ago. I have mostly worked for companies that sell enterprise software in that space, so typically API gateways and API platforms, the kinds of things where large companies have a lot of digital capabilities and a lot of those have APIs. More and more, companies have realized that the better you maintain, manage, extend, and govern that real estate, the easier it becomes to develop new applications and to realize potential that is within the company but needs a little bit of digging to get to.

Then about a year ago I met the two founders of Jentic, and they described to me what they were building. Very briefly, what they want to do is build a platform where agents can use APIs, because oftentimes the APIs that exist might not be the ideal ones for agents, and you also might want to control those agents a little more because you might not be confident they always do the right thing. We all know that AI has a tendency to sometimes have surprising ideas. I really liked that idea, so I decided to join. I have been at Jentic now for just over half a year and it has been a great experience. I still talk about APIs because in the end, without APIs, there is simply no AI.

Q. OpenAPI 3.2 shipped last September. What changes have the highest operational impact for engineering teams, and which are mostly nice to have?

3.2 is a maintenance release. It is backwards compatible and does not change things dramatically. What it is not, and I want to start there, is AI focused. That is what we are planning for the next version, 3.3, where we really want to think more aggressively about what it would take to make OpenAPI specifically more AI friendly.

That said, even in 3.2, some of the improvements are more meaningful than they might first appear. The tag system has been extended so that tags, which you use to group and annotate operations, are now a hierarchical space rather than a flat one. You can have tags and subtags and so forth. That is something people always wanted to do. The reason it matters for AI is that anything that makes an API description semantically richer, anything that allows descriptions to carry more meaning, is valuable for agents. So thinking about how you describe your APIs not just as technical endpoints but as semantic services, with rich schemas, descriptions at every level, and well-defined error messages, that is where I think the real operational value lies right now.

At Jentic we have released a scoring mechanism for APIs so you can find out whether your API is AI friendly or not. A lot of what that scoring looks at is the kinds of things that have always been good API design practice: put in more descriptions, include examples, make your error messages clear and actionable. The difference now is that where a human developer might look at a poorly described API and figure it out from experience and context, an agent that cannot figure out how an API works will simply move on to the next one. It has less context and less tolerance for ambiguity. So the APIs you design now will probably be around for a couple of years, and starting to think about this new class of consumers is worth doing today.

Q. Streaming is also now explicitly supported in 3.2. When teams document streaming, what details separate readable from implementable and testable?

Streaming always was something people were doing. I think it has just become so much more visible because that is how all the AI APIs work. When you use a chatbot and you watch the response appear word by word, that is streaming in action. And what 3.2 does is give you a slightly more explicit way to document that in OpenAPI. That is actually a very common pattern with OpenAPI improvements over the years. It is not that something entirely new is added. It is more that people can now formally document something they have been doing all along, but that was not well covered by the specification.

WebHooks are another good example of this. WebHooks have been popular for a long time. I was surprised when somebody gave me a statistic saying that around 60 percent of the 100 most popular APIs use WebHooks. That is a remarkably high number, but it makes sense because WebHooks are a convenient pattern. You do something with an API, and at some point the API can call you back and say this process is finished, go and fetch your results. People had been doing that for a long time, but it was never explicitly supported in OpenAPI. And then at some point the specification simply gets extended to cover what practitioners are already doing. That is what makes it more complete over time.

Q. The 3.2 tag structure now supports nesting. How do you use tags as information architecture for large API catalogs, and how do you govern that taxonomy across teams?

That is a good and very demanding question, because it goes well beyond OpenAPI and into whether you have a data dictionary or some general framework for how things get named in your organization. Organizations always have a hard time doing that because it is hard to agree on terms, and it is hard to make sure that everyone understands which terms exist, what they mean, and when to use them. Tags are no different. They give you a way to assign meaning to things in your OpenAPI description, but what that meaning is remains entirely up to you.

Until now tags were relatively minor things. The typical pattern was to say here are all the operations about customers, here are all the operations about products, and so on, and documentation tools would then group things by tag. With the hierarchical tag structure in 3.2, you could go much further. You could have a hierarchy of unlimited depth if you want, where each thing in your API is linked to some kind of data dictionary or ontology. I have not seen people doing that yet, but I am pretty sure they will start.
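As an editorial illustration (not part of Wilde's answer), the sketch below shows how a nested tag list could be turned into a tree for a catalog view. It assumes the `parent` field on the Tag Object as we read the OpenAPI 3.2 specification. The tag names and helper functions are hypothetical.

```python
from collections import defaultdict


def tag_tree(tags: list[dict]) -> dict:
    """Map each parent tag name (None for top-level tags) to its child tag names."""
    children = defaultdict(list)
    for tag in tags:
        children[tag.get("parent")].append(tag["name"])
    return dict(children)


def render(children: dict, parent=None, depth: int = 0) -> list[str]:
    """Return the hierarchy as indented lines, depth first, in declaration order."""
    lines = []
    for name in children.get(parent, []):
        lines.append("  " * depth + name)
        lines.extend(render(children, name, depth + 1))
    return lines


# Hypothetical tags as they might appear in the top-level "tags" array.
tags = [
    {"name": "customers", "summary": "Customers"},
    {"name": "customer-accounts", "parent": "customers"},
    {"name": "customer-addresses", "parent": "customers"},
    {"name": "products"},
    {"name": "product-pricing", "parent": "products"},
]

print("\n".join(render(tag_tree(tags))))
```

The tree itself is simple. The hard part, as the answer notes, is agreeing on what the names mean across teams.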

That said, my recommendation would be not to go crazy building a complex standalone tag taxonomy inside OpenAPI. If you start introducing complex terminology with different hierarchies and groupings, you probably also need to align that with every other place in your organization where things get tagged, whether that is databases, document stores, or wherever you manage information. So check what your general information architecture looks like. What dictionaries or terminologies are already established? Then think about how you map those into the OpenAPI tag model rather than inventing a whole new taxonomy that lives only in your API descriptions.

Q. On linting as a quality gate: how do you design a rule set taxonomy that maps cleanly to real ownership, the way platform teams and product teams each have different responsibilities?

What linting is being used for right now is governance and a level of automation. The goal is that when people start designing or changing APIs they get quick feedback on whether they are following guidelines or not. A good number of organizations publish their rule sets openly on GitHub. I have a collection of around 30 or 40 publicly accessible ones. The Zalando ones are popular because they have been around for a while. Adidas has some solid ones. There are also some published by government and e-government initiatives. So there are plenty of references.

Linting is useful but it has real limitations. The popular tools, whether that is Spectral, Vacuum, or Redocly, all work in a similar way. You have rules that apply to certain parts of your OpenAPI description and they check for structural conditions. Something like, this operation must have a description and the description must be at least 20 characters. It is really a structural check. And that is useful. I would absolutely recommend doing it.

What I am not a big fan of is just reusing existing rule sets wholesale. I would always say start owning this, build up your own in a collaborative fashion. Have a GitHub repository somewhere where developers can propose and discuss new rules, argue for whether a guideline is worth following, and then get it merged into your shared rule set once there is enough agreement. You might also have different rules for different stages of the API lifecycle. Some rules are so important that every code check-in has to follow them. Others might only apply to APIs you expose to external partners, where you want higher quality standards. So you end up with rule sets that are tuned by the consumer type or the lifecycle stage, or both.

But as I said, linting has limits. At Jentic we use Spectral and Redocly as part of our API scoring checks, but we also have a good number of LLM-based checks, because if you are scoring APIs for AI readiness, what matters is not just whether a description field exists but whether it is written in a way that is actually useful for an agent. Those are the kinds of checks that typical linting tools cannot do because they operate at the structural level. So linting is a solid and by now fairly standard first line of defense, but also look a little beyond it.

Q. How do you set severity levels like error, warning, and informational, and what is an exception policy that avoids lint fatigue without lowering the floor?

Severity levels really should be what you would expect. If something is non-negotiable and needs to be fixed before anything moves forward, that is an error. There is no discussion. Then you have warnings, where the message is that this is not great but it is acceptable, though you should consider fixing it. It gives the developer a signal without blocking them. And then informational messages, which honestly I am not sure are that interesting for developers to act on directly. What I have seen done a couple of times is that informational-level messages are not really meant for developers to read at all. They are intended for downstream tooling. The linter surfaces an observation that is then picked up by some other tool in the pipeline. So the informational channel becomes a way for the linter to communicate with tooling downstream rather than with the developer.
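As an editorial illustration (not part of Wilde's answer), the three channels can be sketched in Python as a small triage step that runs after a linter. The `Finding` type, the severity labels, and the rule names are hypothetical and do not correspond to any specific linter's output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule: str
    severity: str  # "error", "warning", or "info" in this sketch
    message: str


def triage(findings: list[Finding]) -> dict:
    """Split findings by audience: errors block, warnings advise, info feeds tooling."""
    blocking = [f for f in findings if f.severity == "error"]
    advisory = [f for f in findings if f.severity == "warning"]
    for_tooling = [f for f in findings if f.severity == "info"]
    return {
        "passed": not blocking,  # only errors fail the gate
        "blocking": blocking,
        "advisory": advisory,
        "for_tooling": for_tooling,
    }


findings = [
    Finding("operation-description", "error", "POST /orders has no description"),
    Finding("operation-examples", "warning", "GET /orders has only one example"),
    Finding("tag-usage", "info", "GET /orders uses tag 'orders'"),
]

result = triage(findings)
print("gate passed:", result["passed"])
print("shown to the developer:", len(result["blocking"]) + len(result["advisory"]))
print("forwarded to downstream tools:", len(result["for_tooling"]))
```

Keeping informational findings out of the developer-facing output is one way to limit lint fatigue without relaxing what counts as an error.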

Q. On large specs with tens of thousands of lines, linting performance and PR feedback loops become real constraints. What repository or spec structuring patterns reduce friction without fragmenting the contract?

What you probably want is to avoid always linting the whole thing. Large specifications are never in one file. They are assembled from a whole bunch of sources, schemas, references, and components from various places. So it makes much more sense to have your checks in place at those individual source locations rather than only at the assembled specification level. Instead of linting the full spec at the end of every pipeline run, start linting when you make changes to the schemas and the smaller pieces that feed into the overall description.

If you do that with a reasonable level of discipline, you avoid the compounding effect where you finally lint the big spec and get hit with hundreds of errors you have been quietly accumulating. Do not treat linting as the last step. Do it as early as possible, as close to where the change is actually happening as you can. That is the pattern that keeps the feedback loops short and the debt manageable.
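As an editorial illustration (not part of Wilde's answer), a pull request check could select only the spec fragments that changed instead of linting the assembled description. The sketch assumes fragments live under a hypothetical `api/` directory and that the list of changed files comes from the CI system or version control.

```python
from pathlib import PurePosixPath

SPEC_ROOT = PurePosixPath("api")  # hypothetical location of schema and path fragments
SPEC_SUFFIXES = {".yaml", ".yml", ".json"}


def fragments_to_lint(changed_files: list[str]) -> list[str]:
    """Return the changed files that are spec fragments under SPEC_ROOT, sorted."""
    selected = []
    for name in changed_files:
        path = PurePosixPath(name)
        if path.suffix in SPEC_SUFFIXES and SPEC_ROOT in path.parents:
            selected.append(name)
    return sorted(selected)


changed = [
    "api/schemas/order.yaml",
    "src/main.py",
    "docs/api/readme.md",
    "api/openapi.yaml",
]

# Each selected file would then be passed to whichever linter the team already uses.
print(fragments_to_lint(changed))
```

A full lint of the assembled description can still run on a slower schedule, while this narrower check keeps pull request feedback fast.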

Q. There is a proposal for OpenAPI 3.3. What are you personally most interested in seeing there?

For me, because of where I work right now, the big issue is how we could improve OpenAPI specifically with a focus on AI. We have not done that so far in any serious way. There are a whole bunch of discussions within the OpenAPI Initiative around how that could be done.

Some of it is about semantics. Some of it is about making clearer when and how long an API is actually going to be around, which is something agents care about in ways that human developers traditionally have not. Agents always use an API at runtime. They discover it, decide it looks like a good API to use, and then need to figure out what it does, what it does not do, what its side effects and constraints are. All of that could be surfaced in a much more accessible way through the API description itself rather than sitting only in human-facing documentation.

One idea I find genuinely interesting is the relationship between OpenAPI and Arazzo. Arazzo is a workflow language, published by the OpenAPI Initiative, that lets you orchestrate sequences of OpenAPI interactions. You can say: to accomplish this goal, call this endpoint, then that one, then that one. It is a simple orchestration language layered on top of OpenAPI. What would be really cool is if an OpenAPI description could link to an Arazzo workflow and say, if you use this operation, it actually makes the most sense as part of this workflow you can find over there. Figuring out multi-step workflows is one of the hardest things for agents to do right now, and Arazzo is genuinely good at describing those. We just need to make it discoverable. So that is one of the directions I would love to see 3.3 move in.

And as a reminder, the OpenAPI Initiative is open source and open to everyone. You do not need to be a member, you do not need to pay anything. The discussions happen primarily on Slack. If you have ideas or questions, just come and join. It is a very active and welcoming community. Check out openapis.org, and note that the S matters.

Q. With MCP consolidating under the Linux Foundation’s AI foundation, what is the minimum governance surface an enterprise needs before agents can use tools broadly?

I am still a little skeptical about MCP, honestly. I may very well be wrong, but what I would really encourage everyone to do is first think about your API estate and really invest in your APIs, rather than obsessing too much over MCP specifically. Whatever you invest in better APIs becomes useful for everyone. Developers can use it, agents can use it, partners can use it. If you invest specifically in MCP, that investment is effectively scoped to LLM consumers. And that may sometimes make sense, but it is important to keep in mind that the API landscape is the foundational layer you will be working with long term, and MCP may or may not stick around.

At Jentic we do support MCP because at this point you have to, but we are not deeply invested in MCP itself. If MCP went away and something else came along, that would not be a significant problem for us. We think of what we do as delivering capabilities to agents, and MCP is the current delivery mechanism. You need a delivery mechanism, but I would not build too many things that are MCP-specific. That would be my personal view.

Q. From an audience member: what makes an API truly agent-ready in production compared to a standard REST API?

One of the things I like to use as an illustration is the GitHub API. The current GitHub API version three has around 1,100 operations. GitHub is a complex product and there is a lot you can do with it, so 1,100 operations is not unreasonable. But for an agent to work directly with that API is quite complex, because a large number of those operations need to be combined in a certain way to produce the workflows that you actually want to accomplish on GitHub.

Now compare that to the GitHub MCP server, which has around 70 tools. Way fewer, and they are much higher level. They represent entire workflows, entire things you might want to do on GitHub, rather than the more atomic operations you find in the native API. What I would argue is that if you had a genuinely agent-friendly GitHub API, it might also just have around 70 operations. Not 1,100. Right now those 70 are available through MCP because that is what GitHub decided to build, and that is fine, but the point is that if you have an agent that wants to get things done, it will be significantly happier with 70 well-described higher-level operations than with 1,100 lower-level ones.

The properties that make an API agent-ready follow from that. It should not be too fine-grained. The descriptions should be written at a level that is meaningful for an LLM, which means intent-based and human-readable, not just technical. It should have examples, and ideally multiple examples rather than just one. Error messages should be meaningful and actionable, giving the agent enough information to understand what happened and what it might do next. And if you make those improvements, you almost certainly also improve the developer experience as a side effect, so it is not a speculative investment.

Q. On API deprecation and sunsetting: how should agents handle the lifecycle signal that an API they depend on is entering a sunset cycle?

Deprecation and sunsetting are genuinely important to me. I have written some small standards for how an API can actually surface that information at runtime. And I think we will see more and more of these runtime mechanisms being built out, because agents consume APIs at runtime by design. They discover an API, start using it, and then ideally they should also be able to discover that the API is only going to be available for another two weeks. At that point, a well-designed agent might alert someone, or start looking for a replacement, or whatever the right behavior is for that situation. What exactly to do about it is a separate design question. But as a consumer of an API, this is information that is relevant, and if we can surface it at runtime, consumers can react at runtime. That feels like an obviously good thing to pursue.
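As an editorial illustration (not part of Wilde's answer), one published mechanism of this kind is the `Sunset` HTTP response header defined in RFC 8594, which carries an HTTP date. The Python sketch below shows how a consumer might turn that header into a number of remaining days. The function name and the sample response headers are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def days_until_sunset(headers: dict, now=None):
    """Return whole days until the Sunset date, or None if the header is absent."""
    # HTTP header names are case-insensitive, so normalise before looking up.
    normalised = {name.lower(): value for name, value in headers.items()}
    value = normalised.get("sunset")
    if value is None:
        return None
    sunset_at = parsedate_to_datetime(value)  # timezone-aware for "GMT" dates
    now = now or datetime.now(timezone.utc)
    return (sunset_at - now).days


# Hypothetical response headers from an API that has announced its retirement.
response_headers = {"Content-Type": "application/json", "Sunset": "Thu, 31 Dec 2026 23:59:59 GMT"}
checked_at = datetime(2026, 12, 17, 23, 59, 59, tzinfo=timezone.utc)

remaining = days_until_sunset(response_headers, now=checked_at)
if remaining is not None and remaining <= 14:
    print(f"API sunsets in {remaining} days: alert an owner or look for a replacement")
```

What the agent does with that number is the separate design question raised above. The sketch covers only the discovery step.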

Q. On request and response schema design: how do you design schemas so that an LLM can reliably choose the correct operation, handle partial failures, and avoid duplicating side effects?

Schema design becomes part of the general question of how you design OpenAPI for AI consumption. You want descriptions in your schemas, not just in your operations, so that an LLM can understand what individual fields actually mean rather than just their names and types. Names that carry meaning help too. Parameters named X, Y, and Z are much harder for an agent to reason about than parameters with names that reflect their actual intent.

Beyond that, I think we are going to see interesting evolution in how APIs handle the granularity of what they return. Right now the standard REST model is relatively static: here is a request schema, here is a response schema. But if you are working with agents that are trying to minimise token usage and context pollution, there is a real case for APIs that can return only the fields that were actually asked for. GraphQL has a nice built-in capability for this, which is one of the things that makes it interesting for agentic use cases. REST does not have that natively, but you could layer something on top. We will see how that evolves, but it is one of the more interesting design questions in this space right now.
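As an editorial illustration (not part of Wilde's answer), layering field selection on top of a REST response can be as simple as filtering on a comma-separated list of field names. The `fields` parameter and the sample resource are hypothetical, and the sketch handles top-level fields only.

```python
from typing import Optional


def select_fields(resource: dict, fields: Optional[str]) -> dict:
    """Return only the requested top-level fields, or everything if none are named."""
    if not fields:
        return dict(resource)
    wanted = [name.strip() for name in fields.split(",") if name.strip()]
    # Unknown field names are ignored rather than treated as errors in this sketch.
    return {name: resource[name] for name in wanted if name in resource}


# Hypothetical full representation of an order.
order = {
    "id": "ord_1",
    "status": "paid",
    "customer": {"id": "cus_9", "name": "Ada"},
    "line_items": [{"sku": "A-1", "quantity": 2}],
    "audit_log": ["created", "paid"],
}

# An agent that only needs to know whether the order is paid asks for two fields.
print(select_fields(order, "id,status"))
```

The smaller response keeps irrelevant fields such as the audit log out of the agent's context.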

Q. What workflow patterns show up repeatedly when enterprises actually start working with Jentic, and what makes them stable as APIs underneath them change?

One example we were not expecting, which is always a good sign when you start talking to real enterprises, is the partner integration scenario. If you have a relatively complex API that you expose to partners, that is a large engineering effort for each of those partners. They have to understand the whole API even if they only need a small part of it.

What we now actively pursue, because it keeps proving useful, is creating specific workflows for specific partners. You say, this partner only wants to do these particular things, so they get a set of workflows built on top of the API that match their actual use cases. They do not need to understand the full API surface. They just need to understand the workflows that were created specifically for them.

And the stability point is interesting. As long as you develop your APIs in a backwards compatible way, those workflows remain stable even as the underlying APIs change. As a workflow user you do not even need to know that the APIs underneath now do additional things. You just keep invoking the same workflows and they continue to work. The moment you break backwards compatibility in an API is the moment you also break the workflows depending on it. So the discipline of backwards compatibility pays off at every layer.

Q. Looking ahead six months, what should a senior engineer or platform engineer watch closely in standards, tooling, or governance for agent-facing APIs?

What I would recommend, starting from tomorrow morning, is to begin thinking about agents in your planning even if you do not have them yet. And I acknowledge that the term agent has become fairly meaningless at this point. Everything seems to be called an agent now. But what I do see when talking with organisations is that certain types of agents are already getting real use, customer support agents and some HR agents being the most common. These are agents that are useful across industries, and you can mostly buy them, hook them up to your documentation, and they work.

What you see much less of right now, despite all the talk, is what I would call real business agents in production, where a piece of software can sense things, take action, and make decisions. Agents that actually have agency. And I believe we will see more and more of these, not necessarily all at once, but incrementally. You trust them with a little more next year, and a little more the year after.

Because of that, I would highly recommend making the AI readiness of your APIs part of your standard practice now. API landscapes evolve slowly. Whatever you design or change today will probably be around for a year or two or three before you touch it again. So ask yourself whether your linting and your design practices are optimising only for developer experience, or whether they are also starting to account for agent experience. The good news is that optimising for agent experience tends to improve developer experience as a side effect. You are not making a speculative bet. You are making something better for everyone while also preparing for what is coming. If you work on API platforms or in platform engineering, start thinking now about how your API landscape will need to evolve as you have more and more agentic consumers. Because it is going to arrive. That is at least my personal view.

Erik Wilde is Head of Enterprise Strategy at Jentic and an OpenAPI Ambassador at the OpenAPI Initiative. He is the creator of the Getting APIs to Work channel on YouTube. This interview was conducted by Saqib Jan, Editor-in-Chief of Deep Engineering.

Why Senior Engineers Fail System Design Interviews

Saqib Jan — Tue, 19 May 2026 20:19:10 GMT

Most engineers presume that because they know their tech stack well enough, the system design interview will be easy. And why should they think any differently? They have shipped distributed systems at scale, debugged race conditions at 3am, and made the architectural calls that kept production stable under pressure. But then they walk into a system design interview with confidence and walk out having failed, often without understanding exactly why.

Archit Agarwal, Principal Member of Technical Staff at Oracle, where he builds ultra-low-latency authorization services in Go, has interviewed hundreds of engineers. His assessment of why experienced engineers fail is direct: they do not fail because they do not know what Kafka is or how DynamoDB handles consistency. They fail because of how they communicate. That single factor, how clearly and deliberately an engineer narrates their thinking, determines the outcome of most system design interviews more than any technical knowledge does.

They jump to solutions before understanding the problem

Agarwal described a pattern he sees play out repeatedly across interviews at every level of seniority. An interviewer gives a problem and within thirty seconds the candidate is already saying “I’ll use Redis, I’ll use Kafka, let’s go with microservices.” The interviewer has not said anything about scale. The candidate has not asked how many users the system needs to support, whether it is read-heavy or write-heavy, what the latency requirements are, or whether there are compliance constraints based on the geography of operation. They have skipped the part of the conversation that actually determines what should be built.

Those questions are not warm-up questions. They are the questions that drive the architecture. Nonfunctional requirements determine architecture, not the other way around. How many requests per second, what consistency model you need, whether you have a strict latency ceiling: these are the inputs. The architecture is the output. Engineers who skip to the output without gathering the inputs are designing in a vacuum, and the interviewer can see it the moment it happens. Agarwal’s recommendation is to spend the first one to two minutes of any system design interview doing nothing but alignment: gather functional requirements on what is being built and what the user actually needs, then gather nonfunctional requirements on scale, consistency, latency, and compliance. If you ask the right questions in those first two minutes, Agarwal says, you have already impressed the interviewer. They are listening properly now, engaged, and following where you are going rather than waiting for you to stumble.

They design everything at Google scale

Senior engineers have worked on large systems and that experience is genuinely valuable, but it also creates a bias that hurts them in interviews: the instinct to design for the most demanding possible version of any problem, whether the problem actually requires it or not. Agarwal is direct about this. Not every system needs to scale to Google. If you are designing an internal tool that will only ever be used by your company’s engineers, you do not need multi-region deployment, and you do not even need cloud infrastructure. You could run it on a local area network and it would be perfectly adequate for the problem at hand. The engineer who reaches for global infrastructure for a problem that does not need it is demonstrating a failure of judgment, not a depth of knowledge.

Good system design is about matching the architecture to the requirements you gathered in those first two minutes, not about showcasing every pattern you have ever learned across a career. The interviewer is not evaluating whether you know how to design at Google scale. They are evaluating whether you understand when to use which level of complexity and why, and that distinction is entirely invisible if you default to maximum complexity regardless of the constraints in front of you.

They go quiet when they are thinking

Senior engineers are often comfortable sitting with a difficult problem for several minutes before speaking, and in a production context that is a perfectly reasonable way to work through something complex. In a system design interview it reads as disengagement, and the interviewer has no way to tell whether you are making progress or whether you are stuck. Agarwal uses a phrase that reframes what good communication looks like in this context: the interviewer needs to be able to follow your brain’s commit history. Every decision you make, every trade-off you consider and reject, every assumption you surface and then validate or invalidate, should be spoken out loud as you make it, not as a performance or a monologue but as a live narration of your actual reasoning as it happens.

This serves two distinct purposes. It gives the interviewer genuine insight into how you think rather than just what conclusion you eventually reached, which is what they are actually evaluating. And it forces you to be more precise about your own reasoning, because articulating a decision out loud surfaces the assumptions underneath it in a way that thinking silently does not. Agarwal’s observation is that engineers who think out loud often catch their own errors in real time and self-correct naturally, and that self-correction is not a weakness. It is exactly the kind of flexible, honest thinking the interviewer is looking for.

They defend their design when constraints change

Experienced engineers have ownership instincts built over years of shipping and defending decisions in production. When they have built something they defend it, and in most professional contexts that instinct is appropriate. In a system design interview it becomes a liability the moment the interviewer introduces a constraint change mid-session, which Agarwal says he genuinely enjoys doing precisely because it reveals something important about the candidate.

Changing constraints are the normal reality of production engineering. Requirements shift, scale changes, new compliance requirements appear, and the ability to absorb a change, restate it clearly to confirm alignment, identify which parts of the design need updating and which parts remain intact, and then restructure calmly is exactly the capability that distinguishes an engineer who can operate in a real production environment from one who can only design under controlled conditions. The engineers who struggle here are the ones who treat the curveball as an attack on their design and respond by defending the original rather than adapting to the new information. Agarwal’s point is unambiguous: the interviewer is not trying to invalidate your architecture. They are trying to see whether you can hold your design lightly enough to change it when the situation demands it, which is something you will be required to do repeatedly in any engineering role worth having.

They use jargon to sound credible instead of clarity to be understood

Senior engineers have large vocabularies built from years of working across complex systems. Distributed systems, eventual consistency, CQRS, saga pattern, two-phase commit. These are real concepts with real meanings and knowing them is genuinely useful. But using them in rapid succession without grounding them in the specific problem being discussed is a signal that the engineer is performing knowledge rather than applying it, and experienced interviewers recognise the difference immediately.

Agarwal’s standard for communication in a system design interview is demanding but correct: your explanation should be clear enough that even a junior engineer could follow the reasoning without needing to already know the answer. Not dumbed down, and not simplified to the point of inaccuracy, but clear enough that every choice is grounded in the specific requirements of the system being designed rather than in a general desire to demonstrate familiarity with advanced concepts. The engineers who stand out in Agarwal’s interviews are not the ones with the most impressive vocabulary. They are the ones who make him feel like he is sitting with another engineer genuinely working through a problem together, which is exactly what a system design interview is supposed to be.

The full conversation with Archit Agarwal is now live on Deep Engineering.

Rust Is Hard for the Engineers with the Most Experience

Saqib Jan — Mon, 18 May 2026 16:07:41 GMT

Rust, who would have thought, has ranked as the most loved programming language in the Stack Overflow developer survey for nine consecutive years. Honestly, I must admit this is an unusual kind of statistic because it measures not just adoption but retention. The engineers who use Rust want to keep using it, and that pattern has only deepened even as the language moved from systems programming curiosity to production infrastructure at companies including Amazon, Google, Meta, and Microsoft. Interestingly, the Linux kernel now carries Rust code without the experimental label it held for years. Debian’s APT package manager is also introducing hard Rust dependencies this year.

The performance benchmarks and the memory safety arguments have been made, tested in production, and largely validated. But what the benchmarks do not explain is why so many experienced engineers find Rust genuinely difficult to work with, why the teams that adopt it often go through a period where velocity drops before it recovers, and what it actually takes to get good at it rather than just competent. These are the questions that sit behind the adoption numbers and they matter more than the numbers do for any engineer thinking seriously about where Rust fits in their work.

Evan Williams, author of Design Patterns and Best Practices in Rust, has been writing software for more than 40 years and came to Rust while building a hardware system that needed to be rock solid and run in a remote location with no one able to reach it. Francesco Ciulla, author of The Rust Programming Handbook and a Docker Captain who previously worked at the European Space Agency on the Copernicus project, started publishing Rust content in 2022 and 2023, earlier than most, and has since watched the language’s adoption from the inside. Their perspectives on Rust come from different parts of the stack and different kinds of work, but on the questions that actually trip engineers up, they are in close agreement.

We interviewed Evan Williams and Francesco Ciulla separately for Deep Engineering Newsletter issues.

The engineer who struggles most is usually the most experienced one

The reasonable assumption when a team introduces Rust is that the senior engineers will pick it up fastest. They have the most context, the most pattern recognition, and the most experience navigating unfamiliar codebases. In practice, the opposite tends to happen, and both Williams and Ciulla have seen it play out firsthand.

“The more experienced you are, the more years you have doing something in some other language, the more trouble you’re likely to have,” Williams says, “because you have patterns of thought that come from those languages that you don’t even realize are there.” The problem is not that experienced engineers consciously try to apply Java or C++ patterns to Rust. The problem is that those patterns are invisible to them, baked in over years of use until they no longer register as choices at all. The engineer is not making a decision when they reach for inheritance or shared mutable state. They are doing what has always worked, and Rust will not let them.

Ciulla put it more directly. “Even if you are a senior developer, even if you have twenty years of experience, if you want to try to learn Rust comparing it to other programming languages, you will fail, because it’s like learning something which is completely new.” He makes the case that this is not a reason to avoid Rust but a reason to go in with a specific kind of openness, one that experienced engineers often find harder to maintain than junior ones do precisely because they have more to unlearn. A developer learning Rust as their second or third language has far less of a competing mental model to discard. A senior engineer with a decade of experience in Java has to dismantle instincts that have been reliable for years before they can build new ones, and that dismantling is the work that most people underestimate going in.

Trusting the compiler is not a beginner tip

The first thing most engineers do when the borrow checker rejects their code is look for the minimum change that will make it compile. That is the right approach in almost every other language and the wrong one in Rust, and it is where a significant amount of early frustration comes from.

“The golden rule is to trust the compiler, especially at the beginning,” Ciulla says. What he means is not passive acceptance but active reading. The borrow checker is not producing noise. It is producing information about what the program’s structure requires, and engineers who learn to read it that way move through the learning curve faster than engineers who treat every error as an obstacle to clear. The difference is subtle at first and significant over time.

Williams argues that the borrow checker is doing something more useful than preventing bugs. “The borrow checker is your friend because it prevents you from making a messy design. It prevents you from making a broken design. It prevents you from writing whole classes of bugs that you will then spend many hours trying to find,” he explains. “I have found it to be an incredible partner in writing code that allows me to sleep at night.” The reason it works this way is that Rust’s ownership rules enforce a discipline that experienced engineers in other languages apply selectively and inconsistently because those languages do not require it. A value has one owner. References are either shared and immutable or exclusive and mutable, never both. The compiler will not proceed until the code is explicit about who owns what and when.
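The two rules in that paragraph, one owner and either many shared references or one exclusive reference, can be seen in a few lines. The following is a minimal sketch, not code from either interview; the helper names `total`, `record`, and `borrow_demo` are hypothetical and exist only for illustration.

```rust
/// Shared, read-only borrow: any number of these may coexist.
fn total(readings: &[u32]) -> u32 {
    readings.iter().sum()
}

/// Exclusive, mutable borrow: only one may exist at a time.
fn record(readings: &mut Vec<u32>, value: u32) {
    readings.push(value);
}

/// Walks through the rules in order and returns the final total.
fn borrow_demo() -> u32 {
    // `readings` is the single owner of the vector.
    let mut readings = vec![10, 20];

    // Two shared borrows alive at the same time are fine.
    let first = &readings;
    let second = &readings;
    let before = total(first) + total(second);
    assert_eq!(before, 60);

    // `first` and `second` are not used again, so the compiler allows
    // an exclusive borrow from this point on.
    record(&mut readings, 30);

    // Uncommenting the next two lines is rejected by the borrow checker,
    // because `first` would then be a shared borrow that is still live
    // while the vector is mutated.
    // record(&mut readings, 40);
    // println!("{}", first.len());

    total(&readings)
}
```

Calling `borrow_demo()` returns 60. The commented-out lines are the interesting part: the compiler rejects them not for style but because a reader and a writer would overlap.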

“The principles that the borrow checker forces you to adhere to in Rust are the exact principles that you should be using in every programming language,” Williams reasons. “But you don’t have to. So it’s very easy to not think about those things.” That observation reframes what the borrow checker is. It is not an imposed restriction. It is a discipline that good engineers apply in other languages by habit and judgment, made non-negotiable and automatic in Rust.

The discipline extends from individual functions to the shape of the whole system. A program that handles ownership correctly at the function level has to handle it correctly across modules, across threads, and across component boundaries, because the same rules apply everywhere. “You need to think about who controls what, how it is controlled, and you need to start from the very beginning thinking about the boundaries of your program and the system architecture, dividing things up into areas of responsibility,” Williams underscores. “Because unlike Python or Java, you can’t have links going all over the place. The borrow checker is never going to accept that.” The result is that well-written Rust systems tend toward a specific architectural shape: data flows in one direction, ownership chains move forward and do not loop back, and the behavior of the system is legible from its structure in a way that systems with shared mutable state often are not.

The most underutilized expression of what this makes possible is the typestate pattern. It uses the type system to encode the state of a value at compile time in a way that makes invalid state transitions not just errors but programs that cannot be compiled at all. Williams reflects on it with visible enthusiasm. “It’s a way of developing state machines and systems that have state that evolves where invalid state transitions aren’t just errors, they’re impossible to write. The compiler won’t compile them,” he says. “It represents a huge advance in the way that such systems are written because now instead of runtime errors, you have a state machine that is guaranteed to work because every transition either is a valid transition or it won’t even compile. That’s an amazing thing.” The pattern was not invented for Rust, but the language’s ownership system and type handling make it practical in a way that other languages do not, and for systems where invalid state transitions are genuinely dangerous rather than merely inconvenient, it is one of the most concrete expressions of what Rust makes possible.
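A small sketch may make the typestate idea concrete. It assumes a hypothetical `Valve` controller with two states; none of these names come from Williams or his book. Each state is a zero-sized marker type, and each transition consumes the old value and returns a value of the new type, so a call that is invalid for the current state has no method to resolve to.

```rust
use std::marker::PhantomData;

// Zero-sized marker types: each one names a state.
struct Closed;
struct Open;

// Hypothetical valve controller; the type parameter records its state.
struct Valve<State> {
    cycles: u32,
    _state: PhantomData<State>,
}

impl Valve<Closed> {
    fn new() -> Self {
        Valve { cycles: 0, _state: PhantomData }
    }

    // Consumes the closed valve and returns an open one.
    fn open(self) -> Valve<Open> {
        Valve { cycles: self.cycles + 1, _state: PhantomData }
    }
}

impl Valve<Open> {
    // Only an open valve can report a flow rate (fixed value for the sketch).
    fn flow_rate(&self) -> u32 {
        42
    }

    // Consumes the open valve and returns a closed one.
    fn close(self) -> Valve<Closed> {
        Valve { cycles: self.cycles, _state: PhantomData }
    }
}

fn typestate_demo() -> (u32, u32) {
    let valve = Valve::<Closed>::new();
    // valve.flow_rate(); // does not compile: no such method on Valve<Closed>
    let valve = valve.open();
    let rate = valve.flow_rate();
    let valve = valve.close();
    // valve.close();     // does not compile: Valve<Closed> has no `close`
    (rate, valve.cycles)
}
```

Because `open` and `close` take `self` by value, the previous state is moved and cannot be reused, which is where ownership and the type system work together.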

What Rust actually gives you in production

Ciulla’s case for Rust is grounded in things he measured when running a Rust web server on his own machine. It was consuming four megabytes at rest and five in production. “If you have a droplet with one gigabyte of RAM, you can have 200 plus services,” he notes, “of course in idle, but this proves that if you have a service that consumes a lot of RAM, it is worth thinking about.” For teams running infrastructure where memory costs money and density matters, the difference between a Rust service and an equivalent service in a garbage-collected language is not marginal.

The latency story is also specific. “By not having a garbage collector on the back end side, you basically have a flat latency,” Ciulla observes. “If a user makes an HTTP request when the garbage collector starts, it will experience a higher latency. Rust removes that problem entirely.” Go and Node.js both rely on garbage collectors, and even short collection pauses can introduce latency spikes for users unlucky enough to send a request at the wrong moment. Rust’s absence of a garbage collector means the latency profile is predictable rather than probabilistic, which matters significantly for services where consistency is as important as average throughput.

The deployment model is simpler than most engineers expect going in. A Rust project built with cargo produces a standalone binary for the target architecture, which packages cleanly into a container image. “If you build the executable when you build the Docker image, you have something which is just deployable everywhere,” Ciulla says. “A Linux executable running in a Docker container. That’s the dream.” The operational benefit is smaller images, faster startup, and a runtime with almost no overhead beyond the binary itself.

Williams approaches the production question from the correctness angle rather than the performance angle. The systems where Rust earns its place most clearly are the ones where failure has a real cost. “Systems that are mission critical in some way or other are really key Rust use cases,” he says. “All of these features combine into a whole that make Rust a really powerful language for doing things that have to work. Things where failure is monetarily or in human cost even a terrible problem.” The memory safety and the ownership model and the compile-time guarantees are not separate features. They are different expressions of the same underlying commitment: the program either demonstrates its correctness to the compiler or it does not compile.

Williams also reflects on something unexpected he discovered while writing the early chapters of his book, the ones covering what not to do in Rust. He went back and deliberately tried to write bad code, the kind of code that would illustrate the mistakes he was cautioning against, and found it harder than he expected. “When I went back and tried to write bad code in Rust, it was much harder than writing the good code,” he recalls. “That’s an interesting perspective that just didn’t even occur to me.” The language’s constraints push code toward a particular shape so consistently that departing from it requires actively working against the grain of the language rather than simply making a poor choice.

“The biggest benefit in Rust is about the lack of the debugging depth. You spend more time thinking up front, but you spend almost zero time chasing segfaults or memory leaks in production,” Ciulla remarks. “And we always underestimate this part. We always talk about the efficiency of the code, but if you need less time to debug your code, you’re basically writing more logic at the end of the day.” The upfront investment in getting the types and the ownership right is real, but the downstream debugging cost it removes is larger and does not diminish as the team becomes more experienced. It is simply gone.

Where to start and where Rust is the wrong tool

On the practical question of how to bring Rust into an existing codebase, both Williams and Ciulla give advice that converges almost exactly despite coming from different engineering contexts. Neither recommends starting with a rewrite.

“The best way to introduce Rust in a big project is to find that hard part that’s the bottleneck and try to write one single service in Rust,” Ciulla says. “And then you will see, probably slowly, Rust might take over your code base, but I mean this in a good sense.” Williams makes the same point with a specific warning about the temptation to go faster. “What you don’t want to do is jump into saying, we’re just going to rewrite our project in Rust now. Pick a small piece, focus on that, gain confidence and mastery of the language, and then use that to build upon it and start bringing in more things,” he says. Starting with a bounded, non-critical component gives the team room to move through the learning curve without the pressure of a production incident concentrating everyone’s attention on the wrong things.

Ciulla adds something worth noting about the AI-assisted workflow that is becoming standard for many engineers. “In this AI era, everyone is rushing stuff with AI, but you still need the validation,” he says. “Okay, AI wrote this Rust service, but now who decides if this is okay to put in production? Of course, you need the validation of an expert.” The Rust compiler catches a large class of errors automatically, but the errors that survive it, logic errors rather than memory errors, still require someone who understands the language well enough to see what the code is actually doing. Having at least one engineer on the team who knows Rust well enough to review AI-generated code is not optional.

Both are also direct about when Rust is not the right choice. Ciulla points to tight deadlines and fast prototyping as the clearest case against it. “If you need fast prototyping, you are familiar already with Java, JavaScript, why don’t you use it?” he says. “When the deadline is so close, probably it’s not the best way to try something new because something would go wrong, especially if you’re not an expert.”

Williams points to user interfaces as an area where the ecosystem is still catching up and the tooling gaps are large enough to make other languages more practical. “Doing a website in Rust is still kind of a feat,” he notes, “and it’s an awful lot easier to use the tools that everybody else is using to accomplish that goal.” The Python data science ecosystem is another area Ciulla names directly: the libraries are simply better established there, and using Rust for data science work means building against a thinner set of available tools than Python provides.

Where the ecosystem is headed

Ciulla has a clear view of where Rust will grow most in the near term, and his prediction lands in a direction that surprises much of the Rust community. “I think the next big wave might be in web development,” he says, adding that he is aware this is an unpopular position in a community that still thinks of Rust primarily as a systems language. His reasoning is grounded in what he has been seeing directly: companies with hundreds of developers reaching out to tell him they are moving services to Rust for their web backends. “I get this news because I’m well known for talking about Rust and being quite vocal about it,” he observes. “I’m not talking about a person just doing this on a random Saturday night. I’m talking about companies that have hundreds of developers.” He points to Axum as the framework that has matured to the point where he would now use it in a production SaaS product, which he says was not true two years ago. Embedded systems, in his view, have already crossed the threshold where Rust’s place is settled.

Williams takes a longer view on how the ecosystem will evolve. “The ecosystem is going to get richer and people are going to be branching out in the set of use cases, hitting areas that right now Rust has relatively weak support for,” he says. “As larger and larger projects are built, there is going to be more refinement of the language itself, but more importantly, more refinement of the use of the language.” The patterns that make Rust work well at scale are still being discovered and codified. The language is stable, but the understanding of how to use it well is still developing, and that development is happening inside the teams building the largest Rust codebases.

What both conversations point toward is a language whose difficulty and whose value come from the same source. Rust is hard to learn for experienced engineers because it refuses to accommodate the habits that made them experienced. It is valuable in production because that same refusal, enforced by the compiler on every build, produces code whose behavior is predictable, whose data flows are legible, and whose failure modes are constrained to things the language cannot check rather than things the engineer forgot to check. The engineers who get the most out of it are the ones who stop trying to carry their existing instincts across and start letting the compiler teach them what the program actually needs.

In case you missed it

Here’s the full interview video featuring Evan Williams.

Deep Engineering #47: Evan Williams on Why Experienced Developers Have the Hardest Time Learning Rust

Saqib Jan — Thu, 14 May 2026 16:42:52 GMT

Eval Driven Development for Engineers

This hands-on workshop teaches you to build reliable, production-ready AI systems using eval-driven development. Taught by Imran Ahmad, Data Scientist and author of 50 Algorithms Every Programmer Should Know.

🗓️ May 30 · 11:00 AM – 3:30 PM ET

Use code EDD50 for 50% off.

✍️ From the editor’s desk,

Welcome to the 47th issue of Deep Engineering!

Debian’s APT package manager is moving toward the Rust threshold its maintainers set more than six months ago. APT maintainer Julian Andres Klode announced on the Debian developer mailing list that hard Rust dependencies and Rust code would be introduced no earlier than May 2026, citing memory safety and stronger unit testing as reasons to move core parsing and signature-verification paths toward Rust and the Sequoia ecosystem. For a tool that underpins Debian, Ubuntu, and their many derivatives, this is not a marginal adoption story. It is Rust moving into infrastructure that enormous numbers of systems rely on every day.

That kind of decision does not get made because Rust is fashionable. It gets made because the language can shift whole classes of errors from production failures into compile-time constraints. That is precisely the argument Evan Williams, senior software engineer and author of Design Patterns and Best Practices in Rust, makes in this week’s issue. We spoke with Williams about what it actually takes to think in Rust, why the borrow checker is a design tool rather than a compiler obstacle, and why he found it harder to write bad Rust than good Rust when working on the book. You can watch our interview or read the full Q&A here.

Let’s get started.

Featured Newsletter: DevOps Bulletin

If you work across DevOps, Cloud Native, AI and security and want a weekly read that surfaces the most relevant open-source tooling, stories, and insights in the space, DevOps Bulletin is worth adding to your reading list.

Expert Insights

Rust Makes It Harder to Write Bad Code Than Good Code

by Saqib Jan with Evan Williams

Most engineers who struggle with Rust describe the same experience. The compiler rejects code that would compile without complaint in C++ or Java, and the borrow checker surfaces errors that feel arbitrary until, gradually, they start to feel like something else entirely. Evan Williams, author of Design Patterns and Best Practices in Rust (Packt), has a precise name for what they are: design feedback. It is feedback not about syntax or style but about the structure of the program itself, the shape of data flow, the discipline of ownership, and the decisions about who controls what and for how long. “Rust is your partner in doing that,” Williams says. “You can still write code with bugs in it, but Rust makes it harder to do that and easier to write code that’s going to be solid.”

That framing changes how to think about the borrow checker, and it changes how to think about what Rust actually is. Most languages make it easy to write code that works in isolation. Rust makes it hard to write code that fails in combination, and the difference matters more as systems grow.

The borrow checker is enforcing design, not syntax

Most engineers who pick up Rust treat borrow checker errors as obstacles to route around. The instinct is understandable because in Java or Python, the path from an error to working code runs through adjustment: find what the compiler or interpreter dislikes, change it, move on. Rust works differently, and engineers who apply the same strategy find that routing around the borrow checker is possible in the short term and damaging in the long term.

“The borrow checker is your friend because it prevents you from making a messy design. It prevents you from making a broken design. It prevents you from writing whole classes of bugs that you will then spend many hours trying to find,” Williams says. “I have found it to be an incredible partner in writing code that allows me to sleep at night.”

The reason the borrow checker behaves this way is structural. Java and Python allow data to be accessed from many places at once, which gives engineers flexibility but leaves the responsibility of managing that access entirely with the programmer. Rust removes that flexibility. A value has one owner. References are either shared and immutable or exclusive and mutable, never both at the same time. This constraint forces the programmer to be explicit about who owns what and when, because the compiler will not let the program proceed otherwise. The practical consequence is that programs which compile in Rust tend to have a quality that programs in other languages achieve only through discipline: their data flows are explicit. You can read a Rust program and understand, without running it, who controls which piece of state, when that control transfers, and what happens at the boundary.
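The claim that you can read who controls a piece of state from the code alone shows up directly in function signatures. The sketch below uses a hypothetical `Report` type and helper functions that are not from the interview: a parameter of type `&Report` borrows, while a parameter of type `Report` takes ownership, and the caller loses access at exactly that call.

```rust
// Hypothetical type for illustration.
struct Report {
    lines: Vec<String>,
}

/// Borrows the report; the caller keeps ownership.
fn preview(report: &Report) -> Option<&str> {
    report.lines.first().map(|s| s.as_str())
}

/// Takes ownership of the report; the caller can no longer use it.
fn archive(report: Report) -> usize {
    report.lines.len()
} // `report` is dropped here, at the end of its new owner's scope

fn ownership_demo() -> (String, usize) {
    let report = Report {
        lines: vec!["cpu ok".to_string(), "disk ok".to_string()],
    };

    // Shared borrow: `report` is still ours afterwards.
    let first = preview(&report).unwrap_or("").to_string();

    // Ownership transfers into `archive` at this call.
    let archived = archive(report);

    // preview(&report); // does not compile: `report` was moved above

    (first, archived)
}
```

Nothing here needs to run for a reader to know where control of `report` changes hands; the signature of `archive` already says so.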

The object-oriented trap

The single most common mistake engineers make when getting started with Rust is treating it as an object-oriented language. It resembles one superficially, with structs, methods, and something that looks like encapsulation, but it has no inheritance, no abstract base classes, and no shared mutable state by default. An engineer who brings a Java or C++ mental model will find that the things they reach for instinctively are either unavailable or actively counterproductive.

“If you carry with you an object-oriented language mindset, then you’re going to have nothing but trouble,” Williams says. “The more experienced you are, the more years you have doing something in some other language, the more trouble you’re likely to have, because you have patterns of thought that come from those languages that you don’t even realize are there.”

This is a precise observation about how expertise transfers, or fails to. An engineer with ten years in Java has a large inventory of solutions to common problems, and most of those solutions depend on inheritance, shared mutable references, or runtime polymorphism through interfaces. In Rust, none of those approaches works the way it does in Java: inheritance does not exist, freely shared mutable references are rejected by the borrow checker, and polymorphism has to be expressed through traits. The patterns are not wrong in their original context. They are wrong for this one, and the difficulty is that the engineer applying them does not recognize the mismatch until the borrow checker makes it unavoidable.

The design patterns that experienced engineers carry into Rust need to be examined before use, not applied by default. Some evolve into new forms because Rust’s enums and advanced generics make several classical patterns either less necessary or unnecessary entirely. Others require fundamental rethinking. The Singleton pattern, useful enough in Java and Python that engineers reach for it without deliberation, tends to become either redundant or actively problematic in Rust. “In Rust, it tends to be either completely unnecessary because other features of the language make it unneeded, or it tends to encourage designs that are really not necessary and where a much better approach could be used,” Williams says.
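
One reading of why a Singleton is often unneeded is that a value can be created once at the top of the program and lent out through shared references. The sketch below assumes that reading; `Config`, `Client`, and `build_clients` are hypothetical names, and this is not presented as the approach taken in the book.

```rust
// Hypothetical configuration type, created once by whoever starts the program.
struct Config {
    service_name: String,
    max_retries: u32,
}

// A component that needs the configuration borrows it instead of
// reaching for a global instance.
struct Client<'a> {
    config: &'a Config, // shared, read-only view
}

impl<'a> Client<'a> {
    fn new(config: &'a Config) -> Self {
        Client { config }
    }

    fn describe(&self) -> String {
        format!(
            "{} (retries: {})",
            self.config.service_name, self.config.max_retries
        )
    }
}

fn build_clients() -> Vec<String> {
    // One owner, declared in one place.
    let config = Config {
        service_name: "billing".to_string(),
        max_retries: 3,
    };

    // Any number of shared borrows may read it at the same time.
    let a = Client::new(&config);
    let b = Client::new(&config);

    vec![a.describe(), b.describe()]
}
```

The dependency is visible in `Client::new`, so there is no hidden global for a reader to discover later.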

The replacement for inheritance in most cases is traits, which provide polymorphism without the coupling that comes from sharing a class hierarchy. The discipline required to work with traits well is the same discipline the borrow checker enforces on data: think about the boundaries, be explicit about what crosses them, and design the structure before writing the code.
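
As a minimal sketch of traits standing in for a base class, the example below defines a hypothetical `Notifier` trait with two unrelated implementors. The types and function names are illustrative assumptions, not code from the book.

```rust
// The shared behavior lives in a trait, not in a parent class.
trait Notifier {
    fn notify(&self, message: &str) -> String;
}

// Two unrelated types; neither inherits from the other or from a common base.
struct Email {
    address: String,
}

struct Pager {
    number: String,
}

impl Notifier for Email {
    fn notify(&self, message: &str) -> String {
        format!("email to {}: {}", self.address, message)
    }
}

impl Notifier for Pager {
    fn notify(&self, message: &str) -> String {
        format!("page {}: {}", self.number, message)
    }
}

// Static dispatch: the concrete type is resolved at compile time.
fn alert<N: Notifier>(notifier: &N, message: &str) -> String {
    notifier.notify(message)
}

// Dynamic dispatch: a mixed collection behind trait objects.
fn alert_all(notifiers: &[Box<dyn Notifier>], message: &str) -> Vec<String> {
    notifiers.iter().map(|n| n.notify(message)).collect()
}
```

Callers depend only on the `Notifier` boundary, so adding a third implementor does not touch `Email` or `Pager`.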

Ownership as architecture

The ownership model does more than prevent bugs at the function level. It shapes the architecture of the system, because the rules that apply to individual values apply at every level of scale. A program that handles data ownership correctly in a single function has to handle it correctly across modules, across threads, and across the boundaries between components. The borrow checker enforces this at compile time, which means that architectural decisions other languages let teams defer until the system grows large enough to break have to be made from the beginning.

“You need to think about who controls what, how it is controlled, and you need to start from the very beginning thinking about the boundaries of your program and the system architecture, dividing things up into areas of responsibility,” Williams says. “Because unlike Python or Java, you can’t have links going all over the place. The borrow checker is never going to accept that.”

This constraint produces a specific architectural tendency in well-written Rust systems: data flows in one direction. Rather than components that hold references to each other in a web of mutual dependency, Rust systems tend toward chains of ownership that move in one direction and do not loop back. “By saying, I have a chain of ownership that moves down but never moves back up, you are now much more likely to have a system that is going to work,” Williams says. “Data flowing down is something that feels natural and smooth and just works. Data trying to fight the stream back up is going to end up giving you problems because the borrow checker is not going to like you.”

The architectural benefit of this tendency is legibility as much as correctness. A system where data flows in one direction is a system where behavior is predictable from the structure, and debugging does not require reconstructing who might have modified a value and when, because the ownership model makes that history explicit.
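
A chain of ownership that only moves down might look like the following sketch. The three levels (`App`, `Pipeline`, `Stage`) are hypothetical and exist only to show the shape: each level owns the one below it, and nothing holds a reference back up.

```rust
// Lowest level: owns its own counter and nothing else.
struct Stage {
    name: String,
    processed: usize,
}

impl Stage {
    // Takes the data by value and hands a new value onward.
    fn handle(&mut self, input: String) -> String {
        self.processed += 1;
        format!("{}({})", self.name, input)
    }
}

// Middle level: owns its stages.
struct Pipeline {
    stages: Vec<Stage>,
}

impl Pipeline {
    // The input moves through each stage in order; no stage refers to another.
    fn run(&mut self, input: String) -> String {
        self.stages
            .iter_mut()
            .fold(input, |data, stage| stage.handle(data))
    }
}

// Top level: owns the pipeline.
struct App {
    pipeline: Pipeline,
}

impl App {
    fn new() -> Self {
        App {
            pipeline: Pipeline {
                stages: vec![
                    Stage { name: "parse".to_string(), processed: 0 },
                    Stage { name: "validate".to_string(), processed: 0 },
                ],
            },
        }
    }

    fn submit(&mut self, input: String) -> String {
        self.pipeline.run(input)
    }

    // Reading state also goes down the chain, from owner to owned.
    fn total_processed(&self) -> usize {
        self.pipeline.stages.iter().map(|s| s.processed).sum()
    }
}
```

Because a `Stage` cannot reach its `Pipeline` or the `App`, any change to a stage's counter can only have come from a call that passed through the owners above it.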

Williams illustrates this with an example from his book, a miniature publish-and-subscribe system built to resemble Kafka at a much smaller scale. “Because Rust has move semantics, you know that if something leaves here and goes here, it’s now not here anymore. It’s there. There’s no question about things like having references dangling or anything like that. The clarity of things moving through the system, the clarity of being able to have immutable data in a lot of places and knowing who can and can’t modify any piece of data, it just makes the design of the system so clear and it makes it so much harder to make a system that doesn’t work,” he says.
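
The book's publish-and-subscribe project is not reproduced here. As a much smaller stand-in, the sketch below uses the standard library's `std::sync::mpsc` channel with a single subscriber thread to show the move semantics Williams describes; the `Message` type and the function name are assumptions made for illustration.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical message type for illustration.
#[derive(Debug)]
struct Message {
    topic: String,
    payload: String,
}

fn publish_and_collect() -> Vec<String> {
    let (sender, receiver) = mpsc::channel::<Message>();

    // The receiver is moved into the subscriber thread, which now owns it.
    let subscriber = thread::spawn(move || {
        receiver
            .iter()
            .map(|m| format!("{}: {}", m.topic, m.payload))
            .collect::<Vec<String>>()
    });

    for payload in ["created", "shipped"] {
        let message = Message {
            topic: "orders".to_string(),
            payload: payload.to_string(),
        };
        // `send` takes the message by value: it leaves this scope for good.
        sender.send(message).expect("subscriber is still running");
        // println!("{}", message.payload); // would not compile: `message` was moved
    }

    // Dropping the only sender closes the channel so the subscriber can finish.
    drop(sender);

    subscriber.join().expect("subscriber thread panicked")
}
```

Once a message has been sent, the publisher has no way to touch it again, so the subscriber never has to ask whether someone else is still modifying what it received.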

The typestate pattern

The most underutilized expression of this architectural discipline in Rust is one that Williams returns to with visible enthusiasm. The typestate pattern uses the type system to encode the state of a value at compile time in a way that makes invalid state transitions not just errors but programs that will not compile.

“It’s a way of developing state machines and systems that have state that evolves where invalid state transitions aren’t just errors, they’re impossible to write. The compiler won’t compile them,” Williams says. “It represents a huge advance in the way that such systems are written because now instead of runtime errors, you have a state machine that is guaranteed to work because every transition either is a valid transition or it won’t even compile. That’s an amazing thing.”

The typestate pattern was not invented for Rust, but the language’s ownership system and its handling of types make it practical in a way that other languages do not. The result is that a class of bugs that normally surfaces at runtime, invalid transitions through a state machine, surfaces instead at compile time, before the program runs. For systems where correctness is not optional, this is a material improvement. “Not invented for Rust, but it fits Rust so perfectly, it’s hard to believe it,” Williams says.
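
A minimal sketch of the pattern, assuming a hypothetical two-state connection: the state is a type parameter, each transition consumes the old value and returns a new one, and methods exist only on the states where they are valid. The names below are illustrative and are not taken from the book.

```rust
use std::marker::PhantomData;

// Marker types: they carry no data and exist only at compile time.
struct Disconnected;
struct Connected;

struct Connection<State> {
    address: String,
    sent: Vec<String>,
    _state: PhantomData<State>,
}

// Available in every state.
impl<State> Connection<State> {
    fn sent_count(&self) -> usize {
        self.sent.len()
    }
}

impl Connection<Disconnected> {
    fn new(address: &str) -> Self {
        Connection {
            address: address.to_string(),
            sent: Vec::new(),
            _state: PhantomData,
        }
    }

    // Consumes the disconnected value; the old binding cannot be reused.
    fn connect(self) -> Connection<Connected> {
        Connection { address: self.address, sent: self.sent, _state: PhantomData }
    }
}

impl Connection<Connected> {
    // `send` exists only on a connected value.
    fn send(&mut self, data: &str) {
        self.sent.push(format!("{} <- {}", self.address, data));
    }

    fn close(self) -> Connection<Disconnected> {
        Connection { address: self.address, sent: self.sent, _state: PhantomData }
    }
}

fn typestate_demo() -> usize {
    let idle: Connection<Disconnected> = Connection::new("10.0.0.1:9000");
    // idle.send("hello"); // would not compile: no `send` on Connection<Disconnected>

    let mut live = idle.connect();
    live.send("hello");
    live.send("world");

    let closed = live.close();
    // live.send("again"); // would not compile: `live` was moved by `close`

    closed.sent_count()
}
```

Both commented-out lines are invalid transitions, and both fail at compile time rather than at runtime, which is the property Williams is describing.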

What this requires in practice

None of this comes without a cost. The discipline that Rust enforces at compile time is discipline that engineers have to supply at design time, and for teams moving from other languages the transition is genuinely difficult. Williams is specific about where the difficulty concentrates. Velocity drops during the learning period, often enough that teams take it as a signal that the decision was wrong, and it usually is not. “Once the team becomes very well acquainted with Rust, velocity can increase dramatically, but there is a period of time where it seems like things have gotten worse,” he says.

The answer for most teams is to start with a small, non-critical piece of work rather than a rewrite of an existing system, with the goal of building familiarity in a context where the cost of roadblocks is low and then expanding from there. “What you don’t want to do is jump into saying, we’re just going to rewrite our project in Rust now. Pick a small piece, focus on that, gain confidence and mastery of the language, and then use that to build upon it and start bringing in more things,” Williams says.

There are also cases where Rust is the wrong tool. Prototyping benefits from the flexibility that Python provides and that Rust does not. Environments where the tooling is incomplete are not the right place to fight the language, and the Rust ecosystem, while growing rapidly, still has gaps where Java or C libraries are well established. User interfaces are the clearest current example. But in systems where failure is expensive, where correctness cannot be approximated, and where the code has to remain understandable as the team around it changes, Rust’s constraints are not a cost. They are the point.

The harder thing to write

The most revealing observation Williams made came not from a question about patterns or architecture but from the experience of writing the early chapters of his book, the ones about what not to do. He went back and tried to write bad Rust deliberately, the kind of code that would illustrate the mistakes he was cautioning against, and it was harder than he expected.

“When I went back and tried to write bad code in Rust, it was much harder than writing the good code,” Williams says. “That’s an interesting perspective that just didn’t even occur to me.”

That observation captures something important about what the language is doing. Rust is not just a language with a strict compiler. Its constraints push code toward a specific shape, one that is explicit about data flow, deliberate about ownership, and structured around clear boundaries of responsibility. The engineers who find Rust most difficult are often the engineers with the most experience, because they have the most deeply held instincts to unlearn. And the engineers who find it most rewarding tend to be the ones who stop treating the borrow checker as an obstacle and start reading it as design feedback. The language is not rejecting their code. It is asking them to think more clearly about what the code is actually doing.

In case you missed it

Here’s the full Q&A with the interview video featuring Evan Williams.

🛠️ Tool of the Week

rust-analyzer — Rust language server that provides IDE functionality for writing Rust programs.

Highlights:

Surfaces Rust diagnostics, including ownership and borrow-checker errors from compiler checks, inline during editing.
Supports major LSP-compatible editors such as VS Code, Vim, Emacs, and Zed, with regular stable releases.
Widely used across the Rust ecosystem as the standard Rust IDE backend, including in workflows at organizations that build with Rust.

Learn more about rust-analyzer

📎 Tech Briefs

Dirty Frag vulnerabilities disclosed in the Linux kernel - Two CVEs in Linux ESP/IPsec and RxRPC components allow unprivileged local users to gain root on affected systems.
Linux 7.0.5 stable released with partial Dirty Frag fix - Linux 7.0.5 ships a partial XFRM/ESP patch for Dirty Frag, with a second required fix still in development at release time.
GitHub secret scanning via MCP Server now generally available - Credential scanning is now available in MCP-compatible coding agents before commits or pull requests, requiring GitHub Secret Protection to be enabled on the repository.
Linux 7.1-rc2 published - KVM selftest renaming drove the unusual patch volume in rc2, with functional work covering driver and networking fixes throughout the tree.
MySQL 9.7.0 LTS generally available - New MySQL LTS line ships the Hypergraph Optimizer in Community Edition, with Dynamic Data Masking remaining Enterprise-only in this release.

That’s all for today. Thank you so very much for reading this issue of Deep Engineering.

We’ll be back next week with more expert-led content.

Stay awesome,

Saqib Jan

Editor-in-Chief, Deep Engineering

If your company is interested in reaching an audience of senior developers, software engineers, and technical decision-makers, you may want to advertise with us.

Design Patterns, Ownership Models, and Building Resilient Systems in Rust with Evan Williams

Saqib Jan — Wed, 13 May 2026 18:07:20 GMT

Evan Williams has been writing software for more than 40 years, across every layer of the stack and more programming languages than most engineers will encounter in a career. His book, Design Patterns and Best Practices in Rust, published by Packt, is not a pattern catalogue. It is an argument for a different way of thinking about code entirely, aimed squarely at experienced developers who arrive in Rust carrying instincts that the language will refuse to accommodate.

He recently sat down with Deep Engineering to talk about what that shift requires, which traditional patterns break in Rust, the typestate pattern he finds almost impossible to stop talking about, and why he discovered it is harder to write bad Rust than good Rust.

Watch the full conversation below.

A note on format: the transcript below has been lightly edited for clarity and readability.

Q. You have been in software for a long time. Tell us about your background and how your journey started.

Evan Williams: I have been in the software business for horrifyingly more than 40 years. Surprisingly, given that, my journey started when I was 14 years old, in the 1970s. My father was a very skilled electrical engineer, and we built a computer in the basement together, which had a 6502 processor and 1K of RAM. He hoped that I would become an electrical engineer because of my interest in the electronics. I became a programmer because I thought the programming was the most fun part. I have since then grown up with the industry, not from the very beginning of it, but certainly from fairly early on, and in particular grown up with professional software development from the beginning. I’ve touched every part of the stack, many, many programming languages, many systems, and I am intensely interested in this. If they didn’t pay me to do it, I’d do it anyway.

Q. You have worked with languages like C and Python. What initially drew you toward Rust?

Evan Williams: I’m interested in programming languages in general, and I had heard about Rust almost at the very start. I looked at it and I said, this looks kind of mildly interesting, I’ll just remember that. And then I forgot about it. But years later, Rust had just barely reached 1.0, and I was working on a hardware project that had a few interesting characteristics. It needed to be in a location that we couldn’t access easily. It wasn’t going to be on the internet. It needed to be rock solid because people depended on it. And it was remarkably complicated software. The team talked about it and we made the decision to go with Rust, which none of us knew at the time. We all learned it together. And that was where I caught the Rust bug and have not lost it since.

Q. What motivated you to write a book on design patterns in Rust at this stage of your career?

Evan Williams: There are sort of two questions tied up in there. One is, why would I be writing a book at all at this stage of my career? And the answer to that is, I have been remarkably fortunate to have people help me throughout my career and help me grow. And it makes me incredibly happy to be able to pay that forward and help the people who are learning at this point to grow themselves. This book is a vehicle for me to help people. Why Rust design patterns? Because I think the interesting thing about dealing with design patterns in Rust is very often they’re not what you need, and very often the traditional ones are not exactly what you need, and very often they can cause you problems that you didn’t have to have. So I wanted to save people that frustration and help them move along the path in a way that is a lot less painful, by not making the same mistakes that I made.

Q. Rust has been around for some time but is now seeing increased adoption. Why do you think this is the moment?

Evan Williams: For a long time there were a lot of people like me who were excited about Rust, and there were people who were early adopters who were pushing for it. But right now what we’re seeing is a world where the language has matured, the ecosystem has matured, it’s reached a kind of critical mass. Now when someone is thinking about doing a Rust project, they’re not pioneers who are wandering out into lands unknown. There’s a huge community, a huge set of packages and software, great learning resources, and there are people who have had success that you can see who have done amazing things with Rust. People feel more confident and feel safer venturing into using Rust for things, whereas before it was a little bit more of a risk in their mind.

Q. What gaps does Rust fill compared to more established languages like C, Java, or Python?

Evan Williams: I could just go over sort of the normal thing and say high performance and memory safety, and all those things are true. But Rust is also a powerful modern language that has a good feature set. And I feel like there’s a design gap of sorts. One of the things that’s great about Rust, and I’ll probably keep returning to this topic, is if you need something to be correct, if you need to be able to count on the code that you write, Rust helps you. Rust is your partner in doing that. You can still write code with bugs in it, but Rust makes it harder to do that and easier to write code that’s going to be solid. I think that’s a huge gap that it fills.

Q. What kinds of systems or use cases benefit most from adopting Rust today?

Evan Williams: Systems that are mission critical in some way or other are really key Rust use cases. Yes, Rust has high efficiency and the memory safety is very important. But all of these features combine into a whole that makes Rust a really powerful language for doing things that have to work. Things where failure is a terrible problem, whether in monetary or in human cost. That’s one of the powers of the language and I think it’s great for those situations.

Q. One of the recurring themes in your book is that developers need to think differently in Rust. What does that mindset shift actually involve?

Evan Williams: There are a few things associated with that. One is that Rust is not an object-oriented language. It kind of looks like an object-oriented language in some ways, but it’s not. And if you carry with you an object-oriented language mindset, then you’re going to have nothing but trouble. It’s also a language that requires you to think carefully about the design of your code before you start writing it. It’s very easy to get yourself into trouble in Rust if you don’t plan what you’re doing. You have to start thinking about data and how it’s handled in a different way. You have to think ahead of time about where the data is flowing through your program, what it is, where it is going, how long is it going to live, who is responsible for it. These are things that you don’t really have to think about when you’re writing a Python program or a Java program. Those are good design principles in all of those languages, but Rust requires it of you.

Q. Many developers struggle with the borrow checker early on. How should they reframe it as a design tool rather than a limitation?

Evan Williams: The thing about the borrow checker is it’s there to help you, and it is very easy to get into a mode where you’re fighting with it and you feel like it’s your enemy. But in fact, what it is doing is encouraging you to build your code in a solid manner. It’s encouraging you to think about not just what data you have, but how it’s going to be used. In something like Java or Python, with some amount of plumbing you can get anything from anywhere. You don’t really have to design your program in a highly organized way where you’ve thought about the data flows. But you’re much better off if you do. I think the principles that the borrow checker forces you to adhere to in Rust are the exact principles that you should be using in every programming language. But you don’t have to. So it’s very easy to not think about those things. The borrow checker is your friend because it prevents you from making a messy design. It prevents you from making a broken design. It prevents you from writing whole classes of bugs that you will then spend many hours trying to find. I have found it to be an incredible partner in writing code that allows me to sleep at night.

Q. What are the most common mistakes developers make when they try to apply patterns from other languages directly in Rust?

Evan Williams: The principal thing is, number one, trying to use Rust as if it’s an object-oriented language. It’s not going to work. Viewing the compiler errors as things that you need to figure out how to work around is something that virtually everybody who starts with Rust does. And I can’t emphasize this enough: one of the things that’s sort of interesting about Rust is the more experienced you are, the more years you have doing something in some other language, the more trouble you’re likely to have, because you have patterns of thought that come from those languages that you don’t even realize are there. That’s one of the things that can really get you into trouble. Treating the compiler errors and the problems with the borrow checker as things to work around as opposed to signals that your program needs some redesign, and thinking about things from an object-oriented perspective, those are the main traps.

Q. Traditional design patterns were created with object-oriented languages in mind. How do they evolve when applied to Rust?

Evan Williams: In a number of different ways. First, some of them evolve into entirely new forms or almost out of existence, because being a modern language, Rust has things like enums and all sorts of very advanced use of generics. These are things that make a lot of these design patterns either less necessary or unnecessary. Another way that they evolve is that since there is no inheritance in the language, you have to rethink a lot of the design patterns where, say, you would have an abstract base class. You can’t have that because there’s no such thing. So you would lean into traits. And every single design pattern that you use is affected by the borrow checker and by memory discipline in a way that is not normal in something like Java or Python.
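
To illustrate the point about enums, here is a minimal sketch of a design that a class-based language might express as an abstract command class with one subclass per command. The `Command` variants and the `execute` function are hypothetical and are not drawn from the book.

```rust
// Hypothetical command set: one enum replaces a small class hierarchy.
enum Command {
    Publish { topic: String, payload: String },
    Subscribe { topic: String },
    Ping,
}

// The compiler checks that every variant is handled; adding a new variant
// turns each incomplete `match` into a compile error.
fn execute(command: &Command) -> String {
    match command {
        Command::Publish { topic, payload } => {
            format!("publish {} bytes to {}", payload.len(), topic)
        }
        Command::Subscribe { topic } => format!("subscribe to {}", topic),
        Command::Ping => "pong".to_string(),
    }
}
```

Each variant carries exactly the data it needs, and the behavior sits in one function rather than being spread across subclasses.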

Q. Are there any patterns that become unnecessary or even counterproductive in Rust?

Evan Williams: The one that immediately springs to mind is Singleton, which is so useful that I was using it before it had a name. But it’s useful in Python or Java. In Rust, it tends to be either completely unnecessary because other features of the language make it unneeded, or it tends to encourage designs that are really not necessary and where a much better approach could be used. There are a few occasions where the singleton pattern, as it stands, is actually useful, but more often than not, it’s getting you into trouble.

Q. What Rust-specific patterns do you think are the most powerful and still underutilized?

Evan Williams: The one that I get so excited about that I have to limit myself so that I don’t spend the rest of this conversation talking about it is the typestate pattern. This is something that was not invented for Rust, but you would think that it had been because it works for Rust so perfectly. It’s a way of developing state machines and systems that have state that evolves where invalid state transitions aren’t just errors, they’re impossible to write. The compiler won’t compile them. It represents a huge advance in the way that such systems are written because now instead of runtime errors, you have a state machine that is guaranteed to work because every transition either is a valid transition or it won’t even compile. That’s an amazing thing and I love that feature. Not invented for Rust, but it fits Rust so perfectly, it’s hard to believe it.

Q. Your book emphasizes clear data flow and system architecture. Why is unidirectional data flow so important in Rust systems?

Evan Williams: Thinking about the way data flows through your system in Rust is crucial because Rust makes it much more difficult to thread things back. As a general rule, because you can’t have many different clients holding on to different things in different places, especially not being able to write to different things in different places, having a clear direction of data flow makes it much clearer and much easier to create a consistent system that’s going to compile and work. By saying, I have a chain of ownership that moves down but never moves back up, you are now much more likely to have a system that is going to work in an environment where the number of references that can be held is very limited and you have to be very careful about memory safety. Data flowing down is something that feels natural and smooth and just works. Data trying to fight the stream back up is going to end up giving you problems because the borrow checker is not going to like you.

Q. How does Rust’s ownership model influence architectural decisions at the system level?

Evan Williams: I think one of the crucial things about it is that because you have to be so thoughtful about how your data moves and who owns it, you have to also think about what that means for your system. You need to think about who controls what, how it is controlled, and you need to start from the very beginning thinking about the boundaries of your program and the system architecture, dividing things up into areas of responsibility. Because unlike Python or Java, you can’t have links going all over the place. The borrow checker is never going to accept that. Rust gives you lots of great tools for creating boundaries of abstraction that make it a lot simpler to write code that doesn’t have hidden or difficult-to-find connections between things. You know who owns things, you know who is able to write things. In order to do that, you have to be able to think about things ahead of time and think about what parts of your program own what, and create clear boundaries and areas of responsibility for each piece of the system that you build.

Q. Can you share an example of how Rust leads to better system design compared to other languages?

Evan Williams: One of the examples in the book, one of the projects that we build in the book, is a miniature publish-and-subscribe system similar to Kafka, but very, very much smaller. It is amazing how easy it becomes to make something that is solid and clean in that circumstance. Because Rust has move semantics, you know that if something leaves here and goes here, it’s now not here anymore. It’s there. There’s no question about things like having references dangling or anything like that. The clarity of things moving through the system, the clarity of being able to have immutable data in a lot of places and knowing who can and can’t modify any piece of data, it just makes the design of the system so clear and it makes it so much harder to make a system that doesn’t work. By doing things that way, you have something where you can have a potentially very complicated system and yet have complete confidence that every piece of it individually is going to work right and that they’re going to work together as a system.

Q. What are the real-world challenges that engineering teams face when adopting Rust in production?

Evan Williams: One thing that often happens is that when a team adopts Rust, because of the challenges of learning to work in the language, velocity can go down at first. The team can find itself actually moving slower. Once the team becomes very well acquainted with Rust, velocity can increase dramatically, but there is a period of time where it seems like things have gotten worse. Another problem that often happens is that, as I said earlier, more experienced developers often have a harder time adapting to it. And the last thing I’ll mention, although it’s a lot better than it used to be, is Rust is still not 100% in terms of the kind of rich libraries that you’d find in, say, Java or C, which are just older languages. There’s more out there that supports those languages, although Rust is certainly catching up and it’s remarkable how far it’s come.

Q. When would you advise against using Rust for a project?

Evan Williams: There are a few things. If you’re doing some kind of prototyping, Rust is harder to prototype in and you really want something more like Python. If you’re working in certain niche environments where the tooling is not there, that’s a place where you don’t want to try to fight with the tooling to try to get Rust to work. It’s good to just work with the native things that are there. There are places where a dynamic language like Python is just a much easier thing to use and will more perfectly fit what you’re trying to do. And I also think there are a few areas where Rust has the potential to do a lot but is still catching up. User interfaces, for example. There are certainly user interface frameworks and libraries, but doing a website in Rust is still kind of a feat. And it’s an awful lot easier to use the tools that everybody else is using to accomplish that goal.

Q. What is the best way to introduce Rust to a team without overwhelming your developers?

Evan Williams: The thing that you need to do to start is find a piece of work that you can work on that has a limited scope and which is not in the critical path, because there are going to be roadblocks and bumps as people learn. What you don’t want to do is jump into saying, we’re just going to rewrite our project in Rust now. Pick a small piece, focus on that, gain confidence and mastery of the language, and then use that to build upon it and start bringing in more things.

Q. How should developers balance performance, safety, and complexity when designing systems in Rust?

Evan Williams: One of the nice things about it is that all of those things sort of come with the language. The thing that developers need to do is focus on letting the language help you. Rust will give you all of those things if you focus on thinking in the language and using patterns and features that are natural to the language, as opposed to trying to retool something from another language that doesn’t really fit. There are so many things that people try to do to get around problems that they have where if they just use the features of the language as they exist and did things in a way that’s more natural to the language, all of those things just fall out.

Q. How do you see Rust evolving over the next couple of years in terms of adoption and ecosystem?

Evan Williams: There are a couple of different ways you can answer that. The ecosystem is going to get richer and people are going to be branching out in the set of use cases, hitting areas that right now Rust has relatively weak support for, but where I think people are building all the time. As larger and larger projects are built, there is going to be more refinement of the language itself, but more importantly, more refinement of the use of the language. And I think this is important. This book that I wrote is not just about the language itself, but the way to use the language. And I think the Rust community and the people who are building in Rust are going to be defining and creating new ways to use it that are unique and leverage its power to do things that right now we haven’t even thought of.

Q. Did any of your own assumptions about Rust change while you were writing the book?

Evan Williams: It’s interesting because one of the things that changed is I came to recognize, through working on the early chapters about what not to do, that in Rust it’s actually more work to do things wrong. And I think that’s one of the things that surprised me. I knew that you had to think in a new way and do things differently, but when I went back and tried to write bad code in Rust, it was much harder than writing the good code. That’s an interesting perspective that just didn’t even occur to me.

Q. For developers picking up your book, what key takeaway do you hope they walk away with?

Evan Williams: The key takeaway is that the thing you want is not to learn a particular set of patterns. My book is full of what the title says, how to deal with design patterns in Rust. But the much more important thing is changing your mindset. If I can help people to recognize that there is a new mindset that they need, that’s the key thing. And I see so many people who become frustrated with Rust because Rust has such an unusual learning curve. In most languages, it’s sort of a steady progress. Maybe you plateau a little bit, but you’re always going up. With Rust, very often it seems like you’re learning and learning and learning and getting better and better. And when you reach a certain level of complexity in your programs, it feels like things are getting worse. And that’s because of the mindset. Helping people understand that will save them so much pain. That’s what I want people to take away from the book.

Q. Is there something most developers underestimate about Rust?

Evan Williams: I think the thing that people perhaps underestimate about Rust is it’s not just about memory safety and all these other things. It’s a really powerful modern programming language. It brings so much to the table that has nothing to do with memory safety or thread safety or any of those other things or high performance. It’s just a very clean, beautiful language to write in because it brings so many modern innovations that other languages are sort of stuck having to drag along historic pieces of syntax alongside. Rust is a pleasure to write in. It’s just sometimes the borrow checker can be a little annoying.

Evan Williams is the author of Design Patterns and Best Practices in Rust, published by Packt. This interview was conducted by Saqib Jan, Editor-in-Chief of Deep Engineering.

Deep Engineering #46: Jim Ledin on Modern Computer Architecture and the AI Infrastructure Layer

Saqib Jan — Thu, 07 May 2026 15:03:11 GMT

✍️ From the editor’s desk,

Welcome to the 46th issue of Deep Engineering!

This week, InfoQ analyzed what it actually took for Cloudflare to run large language models efficiently on their global network. The team built a custom inference engine called Infire from scratch in Rust, split model processing into two separate hardware stages because a single machine could not handle both efficiently, and compressed model weights by 15 to 22 percent to reduce what GPUs need to load and move during inference. The reason they had to do all of this is the same one that matters to every engineering team building AI systems: the hardware layer is not an abstraction you can ignore. It is the constraint that every other architectural decision is made around.

This pattern, where standard approaches to running AI workloads break down under real production constraints and the fix requires going back to the hardware layer, is one most engineering teams will eventually encounter. The engineers who avoid it are the ones who understood the hardware constraints before they started building, not after they hit them. Cloudflare’s engineering blog post goes into the technical detail for teams who want to dig further.

This week we are featuring Jim Ledin, CEO of Ledin Engineering and author of Modern Computer Architecture and Organization, now in its third edition published by Packt. Ledin has over thirty years of experience working on embedded systems, safety-critical hardware, and cybersecurity. In this issue he breaks down what engineers building AI systems get wrong about the hardware layer and why it costs them.

Let’s get started.

Expert Insights

Hardware Is Not Someone Else’s Problem

For most software engineers working in the cloud, hardware is an abstraction managed by someone else. You provision compute, write code, deploy, and pay the bill. What happens between the instruction and the silicon is not your concern. That assumption has always had a cost. In an AI-accelerated world, that cost is becoming visible in ways that are harder to ignore.

Jim Ledin, CEO of Ledin Engineering, has been working at the boundary where software meets silicon for over thirty years. His entry point into computer architecture was not a formal computer science curriculum. It was a Commodore 64, a joystick, and a drawing program so slow you could watch it move one pixel at a time.

“That episode really cemented for me how important it is to understand what is going on in the hardware of a system, and not just write what you want to do in your favourite language,” Ledin reflects.

He rewrote the inner loops of that drawing program in 6502 assembly, poking opcodes directly into memory, and the line shot across the screen faster than he could see. The lesson stayed with him across thirty years of embedded systems work, electric vehicle software, and cybersecurity testing on safety-critical hardware. Understanding what the hardware is actually doing is not an optimization exercise. It is the difference between software that works and software that works reliably under real constraints.

That distinction matters more now than it ever has, because the hardware layer is where most AI system performance problems actually live, and most of the engineers building those systems have never had to care about it before.

Where the GPU consensus breaks down

The idea that GPUs are the right architecture for AI workloads has become so widely accepted that most teams treat it as settled. Ledin’s view is more specific, and the specificity matters. For local and personal use, such as running models on a consumer GPU like an Nvidia RTX 4090, GPUs are the right choice. For large-scale deployments running the largest models, the picture is different.

The distinction comes down to what GPUs were actually designed to do. The “G” in GPU stands for graphics, and consumer GPUs still carry silicon dedicated to real-time video generation and gaming workloads. TPUs, by contrast, are built entirely around the tensor operations that dominate AI model processing. By Ledin’s estimate, at least 80% of the execution time in a transformer-based model goes to matrix multiplications, and TPUs focus their silicon on exactly that work.

The more pressing constraint, though, is memory bandwidth. “AI workloads are becoming increasingly memory bandwidth limited. That means it is taking more time to bring data into the GPU or TPU memory than it is taking for the computation itself to complete,” Ledin explains.

This is the reason high-end AI systems use high bandwidth memory, or HBM: stacked RAM modules with far higher data rates than a typical consumer GPU offers. It is also, Ledin notes, part of why DDR5 RAM is becoming harder to find. Production capacity for memory is increasingly going into HBM modules for AI infrastructure rather than into consumer components.

For engineering teams choosing hardware for AI deployments, the implication is concrete: the GPU consensus is correct for a specific part of the problem space, and incomplete for the rest of it.

Data movement is the real cost

The performance conversation in AI engineering tends to focus on compute: cores, clock speed, parallelization. Ledin redirects it toward something that gets less attention and causes more problems.

“Data movement can often be more expensive than the actual computation steps. The latency of moving large data structures across different levels of the memory hierarchy can dominate and leave a lot of compute bandwidth idle,” he emphasizes.

This is not a new insight in systems engineering, but it is one that most application developers have never had to internalize because the abstractions they work with hide it. In a modern PC, reading a single byte from DRAM causes 64 bytes to be transferred into the CPU cache. If the code then bounces to other memory locations, causes those to be loaded into cache, and pushes that first block out, the next access to that original data requires fetching it again from DRAM. The latency compounds across every cache miss, and in AI workloads operating on large data structures, those misses accumulate fast.

The practical recommendation follows directly. Iterating across large data structures multiple times in an algorithm should be avoided wherever possible. Working through memory linearly, in a way that keeps recently accessed data in cache rather than evicting it, is one of the most impactful optimizations available to most AI system code. It does not require a new framework or a different hardware platform. It requires understanding what the hardware is doing with the data you give it.
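The effect of access order on cache misses can be sketched with a toy model. The snippet below is a minimal illustration, not a model of any real CPU: it assumes a tiny LRU cache that holds eight 64-byte lines, and the function name `count_misses`, the cache capacity, and the grid size are all invented for the example.

```python
from collections import OrderedDict

LINE_BYTES = 64  # bytes brought into cache per DRAM read, as described above


def count_misses(addresses, cache_lines=8):
    """Count misses for byte addresses in a toy LRU cache (hypothetical model)."""
    cache = OrderedDict()  # cache line number -> None, oldest entry first
    misses = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        if line in cache:
            cache.move_to_end(line)  # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = None
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


# A 64 x 64 grid of 4-byte values stored row by row (16 KiB in total).
ROWS, COLS, ITEM = 64, 64, 4
row_order = [(r * COLS + c) * ITEM for r in range(ROWS) for c in range(COLS)]
col_order = [(r * COLS + c) * ITEM for c in range(COLS) for r in range(ROWS)]

linear_misses = count_misses(row_order)   # walks memory front to back
strided_misses = count_misses(col_order)  # jumps 256 bytes on every access
print(linear_misses, strided_misses)
```

In this toy setup the linear walk misses once per cache line (256 misses), while the strided walk evicts each line before it is reused and misses on all 4096 accesses. Real caches are larger and use hardware prefetching, so the ratio on actual hardware will differ.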

In cloud environments, this understanding has a direct financial translation. “You are paying for the usage of the system whether the CPU is actually crunching instructions or sitting idle waiting for a data item to come in from memory,” Ledin warns. Inefficient memory access patterns do not just slow down a system. They inflate the bill for it.

When abstraction becomes the problem

Abstractions are one of the most effective tools available to software teams. They accelerate development, limit mistakes, and allow large teams to work on complex systems without every engineer needing to understand every layer. Ledin does not dispute any of this. His concern is more specific: abstractions that obscure hardware costs, in performance-critical applications, are not just unhelpful. They actively create risk.

“Where it becomes dangerous is when abstraction obscures what is happening with the data layout in memory and the execution patterns, basically how the processor is interacting with data as the algorithm proceeds,” he cautions.

The failure mode is not that abstractions break. It is that they make costs invisible until those costs produce an incident. An engineer works within an abstraction layer, the code looks correct at that level, and the performance problem lives underneath it in a layer the abstraction was designed to hide. By the time the problem surfaces in production, the context needed to diagnose it is buried.

Ledin’s recommendation is a two-layer design. Use the most expressive code at the edges of the system, where the abstractions are doing the most valuable work. Use performance-aware code in the core, where the hardware interaction is most consequential. The boundary between those layers is not fixed, and finding it requires benchmarking rather than intuition. But knowing the boundary needs to exist is the starting point. Teams that treat the expressive outer layer as the whole system tend to discover, under load, that the core was never designed for the hardware it runs on.
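One way to picture the two layers, and the benchmarking needed to place the boundary, is the sketch below. It is only an illustration: the `Reading` type, the function names, and the data sizes are hypothetical, and the timing helper is a bare-bones stand-in for a proper benchmark in a realistic environment.

```python
import time
from array import array
from dataclasses import dataclass


@dataclass
class Reading:  # expressive type used at the edge of the system
    sensor_id: int
    value: float


def total_expressive(readings):
    """Edge-style code: clear, object-based, no layout assumptions."""
    return sum(r.value for r in readings)


def pack_values(readings):
    """Boundary: copy the hot field into one contiguous array of doubles."""
    return array("d", (r.value for r in readings))


def total_core(values):
    """Core-style code: works only on the packed array."""
    return sum(values)


def best_time(fn, arg, repeats=5):
    """Best wall-clock time of fn(arg) over a few runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best


readings = [Reading(i, float(i)) for i in range(10_000)]
packed = pack_values(readings)
same_result = total_expressive(readings) == total_core(packed)
timings = {
    "expressive": best_time(total_expressive, readings),
    "core": best_time(total_core, packed),
}
print(same_result, timings)
```

The point of the sketch is the workflow rather than the numbers: both layers must produce the same answer, and the measured timings on the target system, not intuition, decide which code belongs on which side of the boundary.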

The CPU versus GPU distinction, for engineers who have never had to care

Most senior software engineers working today have built careers without ever needing to think about the difference between a CPU and a GPU. That is changing, and Ledin’s framing of the distinction is a useful one for engineers coming to it for the first time.

A CPU is optimized for low-latency execution of complex branching code. It is built to handle conditional logic, to predict branches and recover when predictions are wrong, and to minimize the latency cost of that work. A GPU is optimized for high-throughput execution of linear code across massively parallel workloads, and it works best when it is running the same instruction across thousands of data streams simultaneously with as little branching as possible.

The implication for algorithm design is practical. “The GPU only really becomes attractive when you have enough work for it to do that it can be parallelised, and enough that it will amortise the costs associated with moving data onto the GPU, launching the kernels to execute the code, and doing the management work to transfer data to and from the GPU,” he points out.

That last point is the one most teams miss. A GPU is not a general purpose computer. It cannot run a program on its own. It needs to be started and managed from a CPU, and the overhead of moving data onto the GPU, scheduling the kernels, and moving results back is real. If the workload is not large enough and parallel enough to amortize that overhead, the CPU implementation wins, not because GPUs are slow, but because the cost of using them correctly exceeds the benefit for that specific workload.
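The amortization argument can be made concrete with a simple cost model. The sketch below is a back-of-the-envelope illustration only: the linear cost model, the function names, and every number in it are assumptions chosen for the example, not measurements of any real CPU or GPU.

```python
def cpu_time_ns(n, cpu_per_item):
    """Time to process n items on the CPU (hypothetical linear model)."""
    return n * cpu_per_item


def gpu_time_ns(n, gpu_per_item, transfer_per_item, fixed_overhead):
    """GPU time: fixed launch and management cost plus per-item transfer and compute."""
    return fixed_overhead + n * (transfer_per_item + gpu_per_item)


def break_even_items(cpu_per_item, gpu_per_item, transfer_per_item, fixed_overhead):
    """Smallest n at which the GPU path is strictly faster, or None if it never is."""
    gain = cpu_per_item - (gpu_per_item + transfer_per_item)
    if gain <= 0:
        return None  # per-item transfer cost eats the whole speedup
    return fixed_overhead // gain + 1


# Assumed costs in nanoseconds: 100 per item on the CPU, 2 per item on the GPU,
# 8 per item to move data, and 50,000 of fixed launch and management overhead.
threshold = break_even_items(100, 2, 8, 50_000)
print(threshold)
```

With these assumed numbers the GPU only wins from 556 items upward. If the per-item transfer cost exceeds the per-item compute saving, the function returns `None`, which corresponds to the case where the CPU implementation wins at every size.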

Knowing where that line sits, for a specific algorithm running on specific hardware, is the kind of judgment that requires understanding what the hardware is actually doing. It cannot be read off from a published benchmark or inferred from a framework’s documentation. It comes from the same place Ledin’s understanding came from: going one level deeper than the abstraction, and learning what happens when the instruction meets the silicon.

In case you missed

Here’s the full Q&A with the interview video featuring Jim Ledin.

If the hardware layer argument resonates, the article below by Lee Peterson, VP of Secure WAN Product Management at Cisco, covers the same constraint from the networking and distributed compute angle.

🛠️ Tool of the Week

vLLM - high-throughput, memory-efficient inference and serving engine for large language models

Cloudflare referenced it as the baseline they benchmarked their custom Infire engine against when building hardware-optimized inference at scale.

Highlights:

PagedAttention eliminates the memory waste that causes most GPU out-of-memory failures in production inference.
Continuous batching processes requests in a dynamic stream rather than static batches, keeping GPUs saturated under real load.
Disaggregated prefill/decode runs compute-bound and memory-bound stages on separate hardware for better throughput.
Supports tensor parallelism, FP8 and NVFP4 quantization across multi-GPU deployments.

📎 Tech Briefs

Inference gives AI chip startups a second chance - Disaggregated inference, splitting prefill and decode across purpose-built silicon, is making GPU-only inference architectures look like the wrong default for large-scale production deployments.
OpenAI releases MRC for AI training networks - OpenAI’s MRC shows frontier training now depends on failure-tolerant network design, making the interconnect layer a first-class engineering constraint rather than an infrastructure afterthought.
Anthropic opens Claude Security public beta - Claude Security moves vulnerability scanning closer to code review, triage, and patch creation, shifting security work earlier into the engineering workflow rather than treating it as a downstream audit step.
Google opens Workspace MCP server preview - Google is turning enterprise agents into a governed API and access-control problem, with MCP making the boundary between agent capability and enterprise data policy the next infrastructure challenge for platform teams.
vLLM v0.20.1 ships with DeepSeek V4 stabilization and FP4 improvements - The patch release stabilizes DeepSeek V4 serving and improves FP32-to-FP4 conversion speed.

That’s all for today. Thank you for reading this issue of Deep Engineering.

We’ll be back next week with more expert-led content.

Stay awesome,

Saqib Jan

Editor-in-Chief, Deep Engineering

Computer Architecture in an AI-accelerated World with Jim Ledin

Saqib Jan — Wed, 06 May 2026 18:15:00 GMT

Jim Ledin has been thinking about what happens between the instruction and the silicon for over thirty years. He is the CEO of Ledin Engineering, an expert in embedded software and hardware design, and the author of Modern Computer Architecture and Organization, now in its third edition, published by Packt. His career spans embedded systems development, battery management software for electric vehicles, and cybersecurity assessment and penetration testing for safety-critical systems including self-driving vehicles.

The third edition comes out at a moment when the architecture conversation in software engineering has narrowed almost entirely to one question: what hardware should run AI workloads. Ledin’s answer is more nuanced than the GPU consensus suggests, and it is grounded in the kind of bottom-up reasoning that most application developers have never had to apply.

This conversation covers where that consensus is incomplete, what engineers building AI systems are getting wrong about memory and parallelism, why abstraction layers become dangerous when they hide hardware costs, and what the architecture of a self-driving vehicle teaches you that distributed backend experience does not.

You can watch the full conversation below or read on for the complete Q&A.

Q. You have been working with embedded systems and hardware design for over thirty years. What first pulled you toward understanding what was happening at the hardware level rather than just writing code?

Jim Ledin: My first real exposure to computer architecture was in the 1980s when I had a Commodore 64 with its 6502 CPU. I wrote a simple BASIC program to do some screen drawing, basically moving a dot around the screen with the joystick and pushing the button to draw the lines. And it was slow. It was so slow you could watch it moving one pixel at a time. That was painful to try to do anything with.

As time went on I learned a little bit about 6502 assembly language. I found out there were ways you could implement that through the BASIC interpreter. What you had to do was write out your assembly code by hand, convert it to the opcodes and data bytes, and then poke those bytes into memory. POKE is the BASIC command. Then you could transfer control and execute them. After I took the inner loops of the drawing program and implemented them in that way, the speedup was amazing. It shot the line across the screen faster than you could see. That episode really cemented for me how important it is to understand what is going on in the hardware of a system, and not just write what you want to do in your favourite language.
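For readers who have never seen this workflow, the hand-assemble-and-POKE process can be sketched as follows. This is an illustration, not Ledin's drawing code: the three-instruction routine, the load address 49152, and the helper name `basic_loader` are choices made for the example. The script only generates the BASIC listing a Commodore 64 user would have typed in.

```python
# Hand-assembled 6502 routine (illustrative only):
#   LDA #$01    -> A9 01      load the value 1 into the accumulator
#   STA $D020   -> 8D 20 D0   store it to the C64 border colour register
#   RTS         -> 60         return to BASIC
ROUTINE = [0xA9, 0x01, 0x8D, 0x20, 0xD0, 0x60]
LOAD_ADDRESS = 0xC000  # 49152, a commonly used free RAM area on the C64


def basic_loader(code, address, first_line=10, step=10):
    """Generate BASIC lines that POKE each byte, then jump to the routine."""
    lines = []
    number = first_line
    for offset, byte in enumerate(code):
        lines.append(f"{number} POKE {address + offset},{byte}")
        number += step
    lines.append(f"{number} SYS {address}")  # SYS transfers control to machine code
    return lines


program = basic_loader(ROUTINE, LOAD_ADDRESS)
print("\n".join(program))
```

Each opcode and operand byte becomes one POKE statement, and the final SYS line hands control to the bytes that were just written into memory.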

My current work is focused on embedded systems development and testing, as well as implementing cybersecurity for those systems and doing cyber testing on them. I have done quite a bit of work with electric vehicles, battery-powered systems, the battery management software, as well as the powertrain control systems. I also implement cyber testing to evaluate what kinds of vulnerabilities may be present in systems, and try to exploit them to demonstrate whether or not they actually exist. I have been doing that for over thirty years, embedded initially and then later adding in the cybersecurity aspect.

The architecture of computer systems is at the boundary where performance, system security, and behaviour in real-world situations all meet. You really need to make sure, across all those domains, that everything works as expected and intended. You need an understanding top to bottom, not just of what your high-level software does, but also of the hardware level. I am not necessarily saying you need to understand what your compiler is doing, but you should know how the hardware operates, what kinds of things cause you to run into limits, and what you can do differently to improve performance, reliability, and security.

Q. Your book Modern Computer Architecture and Organization is now in its third edition. What had changed enough in the field to make a new edition necessary?

Jim Ledin: The book is intended to start at the beginning. I do not assume that readers have any background or experience with computer architecture, assembly language instructions, memory cache, pipelines, or anything like that. We start with history: where the first computing devices came from and how they were developed. It even starts back in the 1800s when Charles Babbage designed a mechanical computer intended to be a general purpose digital computing system. It never actually got built, but many of the principles he developed, including pipelining and distributed processing, were implemented in that design. I thought it was remarkable that those concepts were being worked out that far back in history.

Then the book goes through the vacuum tube era in the 1930s and 1940s, the Intel 4004, which was really the first microprocessor, and then on to the 8086, 8088, the PC, the 386, which is basically the same base architecture that modern Intel and AMD processors in your PC and server-based systems use today. The code running on these modern systems is highly compatible with those systems from decades ago. It has gone from 16 bits to 32 bits to 64 bits, adding capabilities without removing previous ones.

The book walks through that history and then goes into detail on how processors work, starting with the 6502. That processor is simple enough that you can understand what is going on with its registers. It only has three. Nothing about it is overwhelming. Once you understand it, you can build upon it to get to the modern processors, which are far more complex.

What changed substantially since the last version of the book was the rise of AI workloads, particularly the shift from the fastest CPU available to very highly parallelised systems optimised to perform matrix computations. The new version, which came out in March, has a chapter that goes into detail on how GPUs operate, from the top-level modular structure down to the granular details of processor cores. There is another new chapter on transformer-based models, looking at them not as someone who designs them but more like a mechanic who wants to take them apart. We work through what calculations actually occur in GPT-2, which was one of the earliest large language models to break through as something genuinely new and important. The current frontier models have obviously evolved quite a bit since then, but they share many of the same fundamental characteristics. If you can go through GPT-2 and understand how it works, you are a very long way toward understanding the latest models.

We are also seeing real diversification of architecture. There were many years where computers for most applications were based on the same Intel-type architecture, but now across different application areas you are seeing GPUs, TPUs, domain-specific accelerators for things like Bitcoin mining, local AI in cell phones and cars, and the open source RISC-V processor which is available to everybody. You can design your own chip based on it, implement it in an FPGA, do whatever you want. It is a rapidly growing line of processor development and the book covers all of it.

Q. The argument that GPUs are the right architecture for AI and LLM workloads is often treated as settled. Where is that consensus incomplete?

Jim Ledin: GPUs are probably the ideal architecture for people and small companies that want to run language models locally. I have recently gotten the Gemma 27 billion parameter model running on an Nvidia RTX 4090, which is about the top end of consumer GPUs available today. For local and personal use, GPUs are the way to go.

But for larger scale deployments running much larger models, the trend is toward dedicated TPUs. A tensor is basically a multi-dimensional array. A matrix is a two-dimensional array, and tensors have more dimensions. Tensors are used widely across AI models, and the work going on inside the processing of those models is largely matrix multiplications operating on broken-down portions of higher-dimensional tensors. A TPU is a processor similar in concept to a GPU, but very specifically focused on the work of large language model tensor processing. GPUs, as the first letter implies, also have silicon dedicated to generating real-time video and handling things like gaming and video creation. TPUs do not use silicon for that purpose. They focus everything on the tensor work.
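As an illustration of that description (not taken from the interview), a three-dimensional tensor can be processed as a stack of ordinary matrix multiplications. The sketch below uses plain Python lists so it stays self-contained; the function names and the tiny shapes are invented for the example, and real systems would use an optimised library and hardware matrix units instead.

```python
def matmul(a, b):
    """Multiply two 2-D matrices given as lists of rows."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions must match")
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def batched_matmul(tensor, weights):
    """Apply one 2-D weight matrix to every 2-D slice of a 3-D tensor."""
    return [matmul(matrix, weights) for matrix in tensor]


# A 3-D tensor of shape (2, 2, 3): two 2x3 matrices stacked together.
tensor = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
weights = [[1, 0], [0, 1], [1, 1]]  # shape (3, 2)

out = batched_matmul(tensor, weights)  # shape (2, 2, 2)
print(out)
```

Each slice of the tensor is handled by the same two-dimensional multiply, which is the kind of repeated, regular work that dedicated matrix hardware is built for.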

That is why systems like the Nvidia Blackwell architecture, designed for large-scale data centre applications, are built to have many components interconnected with extremely high-speed data links, working together as a supercomputer. For larger models, consumer GPUs are not really used. It is more the dedicated hardware that focuses on that work.

Another factor is that AI workloads are becoming increasingly memory bandwidth limited. That means it is taking more time to bring data into the GPU or TPU memory than it is taking for the computation itself to complete. These very high-end systems are implemented using what is called high bandwidth memory, or HBM. An HBM module is basically a cube made of a stack of RAM chips, so they hold a lot of memory and have very high bandwidth. On a TPU card you typically have several of these HBM modules, and they have a far higher data rate for transferring data in and out of the processing components than on a typical consumer GPU. This is also part of why it is becoming hard to find DDR5 RAM chips. A lot of production capacity for memory is going into high bandwidth memory modules, which cost more for the purchaser and make more money for the vendor.
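A rough way to see when an operation becomes memory bandwidth limited is to compare the time needed to move its data with the time needed to do its arithmetic. The sketch below is illustrative and not from the interview: the device figures (100 TFLOP/s, 1 TB/s), the 2-byte weights, and the function name are assumptions, and the model simplistically treats data movement and computation as fully overlapped.

```python
def kernel_time_s(flops, bytes_moved, peak_flops_per_s, bandwidth_bytes_per_s):
    """Return (estimated seconds, limiting resource) for one operation."""
    compute_s = flops / peak_flops_per_s
    memory_s = bytes_moved / bandwidth_bytes_per_s
    bound = "memory" if memory_s > compute_s else "compute"
    return max(compute_s, memory_s), bound


PEAK_FLOPS = 100e12  # hypothetical accelerator: 100 TFLOP/s
BANDWIDTH = 1e12     # hypothetical memory system: 1 TB/s
n = 8192

# Matrix-vector product: every 2-byte weight is read once and used once.
matvec = kernel_time_s(2 * n * n, 2 * n * n, PEAK_FLOPS, BANDWIDTH)

# Matrix-matrix product: each value loaded is reused many times.
matmat = kernel_time_s(2 * n**3, 3 * n * n * 2, PEAK_FLOPS, BANDWIDTH)

print(matvec[1], matmat[1])
```

Under these assumptions the matrix-vector case is limited by memory traffic and the matrix-matrix case by arithmetic, which is why raising memory bandwidth helps some AI workloads far more than adding compute does.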

Q. Software engineers working in the cloud often treat hardware as someone else’s problem. What does your book argue they are getting wrong, and what does that cost them?

Jim Ledin: If you write software and just ignore the hardware limits, that can lead to a lot of hidden costs. If your code is accessing memory in inefficient patterns, not using the cache memory within the processor effectively, and moving data around more than necessary, that can have significant performance impacts.

If developers understand how the memory access and caching processes work at the hardware level, they can often tailor code to work more effectively within those constraints and minimise latency. When the CPU requests data from memory and it is not available in its cache, it has to wait. You are giving the processor downtime when you want it to be processing data. A lot of that is unavoidable, but the amount of latency can be minimised by different approaches to optimising algorithms.

As an example, in a modern PC, each time you read something from DRAM, even if it is just a single byte, 64 bytes are transferred into the CPU cache. That is what is available at that point for the processor to work with. For best efficiency, assuming you have options, you would want your code to be working with data from that block before it moves on to something else, rather than bouncing around to other memory locations. If you access several other locations that cause them to be loaded into cache, and then that first block gets evicted, and then you go back and read it again, now you have to reread it. That is inefficient. When possible, you want to work through memory in a linear way.

And if you are working in a cloud environment, this not only has those performance issues but also results in higher costs, because you are paying for the usage of the system whether the CPU is actually crunching instructions or sitting idle waiting for a data item to come in from memory.

Q. If you are building AI systems today, what are the hardware concepts that would most change how you designed them, and what do most engineers not understand well enough?

Jim Ledin: Data movement can often be more expensive than the actual computation steps. The latency of moving large data structures across different levels of the memory hierarchy can dominate and leave a lot of compute bandwidth idle. This is a concern even with the very highest performance AI-focused systems. Getting the memory access right relative to the processing is a genuine challenge. You definitely do not want to be iterating across large data structures multiple times in an algorithm if there is a way to avoid it. Going through data linearly is probably going to give the best performance.

As you increase parallelisation of algorithms across cores and processors and across GPUs and other devices, other constraints appear. Synchronisation, where tasks on different processors need to sync up, is a real constraint. The communication bandwidth between processors, whether they are inside the same device or communicating board to board or rack to rack, all of these affect the efficiency and speed of processing, not just the number of cores you can throw at a parallel algorithm. It is important to understand the cost associated with all of these interactions among parallel activities and optimise around them to get the best overall performance.
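The trade-off between adding workers and paying for coordination can be illustrated with a toy timing model. This sketch is not from the interview: it assumes, purely for illustration, that synchronisation cost grows linearly with the number of workers, and all names and numbers are hypothetical.

```python
def parallel_time(workers, serial_s, parallel_s, sync_s_per_worker):
    """Estimated run time: serial part + divided parallel part + sync overhead."""
    return serial_s + parallel_s / workers + sync_s_per_worker * workers


def best_worker_count(max_workers, serial_s, parallel_s, sync_s_per_worker):
    """Worker count with the lowest estimated run time under this model."""
    return min(
        range(1, max_workers + 1),
        key=lambda w: parallel_time(w, serial_s, parallel_s, sync_s_per_worker),
    )


# Assumed workload: 1 s serial, 100 s parallelisable, 0.01 s of sync per worker.
best = best_worker_count(256, 1.0, 100.0, 0.01)
print(best, parallel_time(best, 1.0, 100.0, 0.01), parallel_time(256, 1.0, 100.0, 0.01))
```

In this model the run time stops improving at 100 workers and gets worse beyond that, even though more cores are available. The exact shape depends entirely on the assumed cost terms, so a real system needs measurement rather than a formula.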

And then optimising compilers do a great job of scheduling instruction execution and keeping pipelines full, but there are things you can do in code that make it harder for them to do that, and things you can do that make it easier. In performance-critical inner loops, minimising branching can help avoid pipeline stalls. Part of what goes on in a modern processor is trying to predict what will happen at a branch in your code, an if-else type block. The processor may guess right, which means it is very efficient, or it may guess wrong and have to back up and start down the other path. If you can minimise or eliminate branching within the most performance-critical loops, that makes it easier for the optimiser and the rest of the system to run as efficiently as possible.
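The idea of removing a data-dependent branch from an inner loop can be shown in miniature. The sketch below is illustrative and not from the interview: the function names are invented, and because the Python interpreter itself branches heavily, it shows only the shape of the transformation, not a speedup. In compiled languages the compiler may already perform this rewrite on its own.

```python
def clamp_negatives_branching(values):
    """Replace negative values with zero using an if-else per element."""
    out = []
    for v in values:
        if v > 0:
            out.append(v)
        else:
            out.append(0)
    return out


def clamp_negatives_branchless(values):
    """Same result, with the condition folded into arithmetic."""
    # (v > 0) evaluates to True or False, which Python treats as 1 or 0,
    # so the comparison scales the value instead of choosing a code path.
    return [v * (v > 0) for v in values]


sample = [-2, 0, 3, -7, 5]
print(clamp_negatives_branching(sample), clamp_negatives_branchless(sample))
```

Both functions return the same values for integer input. The second form gives the hardware a single straight-line computation per element, which is the property that matters in a performance-critical loop.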

Q. What is actually happening under the hood in a GPU that makes it effective for AI workloads, at a level that goes beyond the standard explanation about parallelism?

Jim Ledin: Most of the processing in a transformer-based AI model, at least 80% of the execution time, is these tensor operations, which are implemented in hardware as matrix multiplications. GPUs and TPUs have very specialised multiply-and-accumulate hardware specifically designed to perform these operations.

The current generation of Nvidia GPUs implements what is called single instruction multiple thread, or SIMT, execution. A group of 32 threads runs in lockstep, meaning they are all executing the same instruction but on different data streams. SIMT also supports branching, so you can have if-else logic in the code. But this has a performance cost. If you are executing through a stream of data on SIMT code and you come to a conditional instruction where some threads take the if part and some threads take the else part, the hardware executes one side, the if part, only on the threads where that condition applies, then goes back and executes the else part for the other threads. At the end of the block, they sync up and resume in lockstep. Your code can have conditional logic in these lowest-level operational sequences, but there is the drawback that you effectively have a pipeline stall where it has to go back and execute a different thread. You have the flexibility, but there is a cost.
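The cost of divergence described here can be simulated in a few lines. This is a simplified illustration, not a model of real GPU hardware and not from the interview: it treats a warp as a list of 32 values, and the function name `run_warp` and the pass counter are invented for the example.

```python
WARP_SIZE = 32  # threads that execute in lockstep, as described above


def run_warp(values, predicate, if_branch, else_branch):
    """Simulate one if-else on a warp; return (results, passes executed)."""
    if len(values) != WARP_SIZE:
        raise ValueError("a warp needs exactly 32 values")
    mask = [predicate(v) for v in values]
    results = list(values)
    passes = 0
    if any(mask):  # the "if" side runs only on threads where the condition holds
        passes += 1
        for i, active in enumerate(mask):
            if active:
                results[i] = if_branch(values[i])
    if not all(mask):  # the "else" side then runs on the remaining threads
        passes += 1
        for i, active in enumerate(mask):
            if not active:
                results[i] = else_branch(values[i])
    return results, passes


data = list(range(WARP_SIZE))
_, uniform_passes = run_warp(data, lambda v: v >= 0, lambda v: v * 2, lambda v: -v)
diverged, divergent_passes = run_warp(data, lambda v: v % 2 == 0, lambda v: v * 2, lambda v: -v)
print(uniform_passes, divergent_passes)
```

When every thread agrees on the condition, the warp makes one pass. When the threads split, it makes two, with part of the warp idle during each pass, which is the cost of the flexibility.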

GPU and TPU performance comes as much from high memory bandwidth, getting data in and out as fast as possible, and latency minimisation, as it does from effective thread scheduling across the many thousands of cores within a GPU. All of these things, memory bandwidth, minimising latency, thread scheduling, and using SIMT effectively, affect GPU performance in addition to the raw ability to parallelise across cores. You really need to manage all of these aspects to get the best performance, not just maximise core count.

Q. The memory hierarchy from cache to RAM to storage is often discussed in theory but rarely in practice. Can you give a concrete example of where a misunderstanding of memory hierarchy caused a real performance problem, and what the fix actually looked like?

Jim Ledin: There was a web server in some Linux distributions in the early 2000s called Tux, which ran in kernel space. It avoided a lot of the transfers from user space to kernel space that a web server normally has to perform. It only served static pages. Since it was running in the kernel, they did not give it the capability to generate pages dynamically.

One issue with this server was poor cache locality. The amount of data it kept active on each request seemed to be excessive. Under high load, with lots of users hitting it at once, the state information grew to exceed the size of the level 2 cache in the CPU. Performance dropped off sharply.

Some engineers examined that and determined that by evaluating the cache limitation against how the code was structured, they could reorganise it so the amount of data per request would be smaller and therefore remain within the cache limit up to a much larger level of usage. Similarly, instructions also have a cache in the CPU, and by reorganising the processing and batching some things, they were able to increase the degree to which instructions would remain in the instruction cache during web server processing. The fixes they implemented increased application performance by about 40%.
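The reasoning behind that kind of fix comes down to simple arithmetic on the working set. The numbers below are hypothetical and are not the actual figures from the Tux case: the 256 KiB cache size and the per-request state sizes are assumptions chosen to show the relationship.

```python
def max_requests_in_cache(cache_bytes, state_bytes_per_request):
    """How many concurrent requests fit before their state outgrows the cache."""
    return cache_bytes // state_bytes_per_request


L2_BYTES = 256 * 1024  # hypothetical level 2 cache size

before = max_requests_in_cache(L2_BYTES, 4096)  # assumed 4 KiB of state per request
after = max_requests_in_cache(L2_BYTES, 1024)   # assumed 1 KiB after reorganising
print(before, after)
```

Shrinking the per-request state by a factor of four lets four times as many concurrent requests stay within the cache before performance falls off, which is the effect the reorganisation was aiming for.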

This was basically examining the behaviour of the application in the context of the limitations of the processor hardware and coming up with solutions that respected those limits. For other applications, similar fixes might involve restructuring data. A large array of structures might be more efficiently processed as a structure of arrays in a way that better aligns with cache limitations. But in these cases, while the design approach is to look at the limits of the system and try to work within them, to really understand what is going to have a big impact you need to implement it and benchmark it in a realistic environment.
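The array-of-structures versus structure-of-arrays restructuring Ledin mentions can be sketched in a few lines of Rust. This is a minimal illustration, not code from the interview: the `Particle` and `Particles` types and their fields are hypothetical, and whether the second layout is actually faster for a given workload is something only a benchmark on realistic data can answer.

```rust
/// Array-of-structures layout: each particle's fields sit together in memory.
/// (`Particle` and its fields are hypothetical names used for illustration.)
#[derive(Clone, Copy)]
pub struct Particle {
    pub x: f32,
    pub y: f32,
    pub z: f32,
    pub mass: f32,
}

/// Structure-of-arrays layout: each field lives in its own contiguous vector.
pub struct Particles {
    pub x: Vec<f32>,
    pub y: Vec<f32>,
    pub z: Vec<f32>,
    pub mass: Vec<f32>,
}

impl Particles {
    /// Convert from the AoS layout into the SoA layout.
    pub fn from_aos(items: &[Particle]) -> Self {
        Particles {
            x: items.iter().map(|p| p.x).collect(),
            y: items.iter().map(|p| p.y).collect(),
            z: items.iter().map(|p| p.z).collect(),
            mass: items.iter().map(|p| p.mass).collect(),
        }
    }
}

/// AoS: summing one field walks over every struct, so the unused
/// x, y and z values travel through the cache along with `mass`.
pub fn total_mass_aos(items: &[Particle]) -> f32 {
    items.iter().map(|p| p.mass).sum()
}

/// SoA: the same sum reads only the contiguous `mass` vector.
pub fn total_mass_soa(items: &Particles) -> f32 {
    items.mass.iter().sum()
}
```

Both functions return the same total. The difference is only in how many bytes each one has to pull through the cache to compute it, which is exactly the kind of effect that needs to be measured rather than assumed.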

Q. There is a growing tension between the abstraction layers that frameworks provide and the hardware cost those abstractions hide. At what point does that become a serious engineering problem?

Jim Ledin: Early on in the development cycle, abstractions are great. They can greatly accelerate development and limit mistakes. Where it becomes dangerous is when abstraction obscures what is happening with the data layout in memory and the execution patterns, basically how the processor is interacting with data as the algorithm proceeds. This is especially critical in large-scale real-time systems with demanding performance requirements.

In addition to using abstraction where it makes sense, engineers need to understand what is happening underneath the abstraction in performance-critical applications. I am not suggesting abandoning abstractions. They are entirely appropriate at the level where they preserve meaning and understanding across a team. But they begin to create a problem where they obscure the costs.

The most effective approach is a two-layer design. Use the most expressive code at the edges of the system, and in the core, use more performance-aware code. It is not always obvious where to place the boundary between performance-aware code and more expressive code. It may take some benchmarking, trials, and iterations to identify the best location for that boundary. But knowing you need to draw it is the starting point.

Q. You work on architectures for systems like self-driving vehicles. What makes those architectures fundamentally different from a standard distributed backend system, and what should engineers working in conventional contexts take from that?

Jim Ledin: A self-driving vehicle is both real-time and safety-critical. The software must meet all of its deadlines, its time limits for producing a response. A missed deadline is not just a glitch to blow past; it is a system failure, and that cannot be tolerated. There must be fail-safe responses when unexpected situations occur. Only in the most extreme circumstances, like an unrecoverable hardware failure, would the system be able to stop processing.

A self-driving vehicle tightly couples sensing, computing, and actuating: seeing what is around it, deciding what to do, and steering and controlling vehicle speed. That is pretty different from loosely coupled distributed systems. A distributed system might typically implement retry mechanisms if something fails, and if a system goes down there are online and offline redundant capabilities that can be brought up, basically switching to a backup. Rather than using that approach, safety-critical vehicles provide a level of redundancy where dual processors operate in lockstep. If one experiences a failure, the system continues on the remaining good one until a repair can be made.

This can be extended further. The American space shuttles had three computers operating in parallel. One advantage of three over two is that if you have two computers and they give different answers, you have to decide which one is good and which is bad. If you have three and two give one answer and one gives another, you probably know the odd one out is bad.
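The two-out-of-three reasoning can be written down as a small majority voter. This is a minimal sketch of the general idea, not a description of how any real flight or vehicle computer is implemented; the `Vote` enum and `vote` function are hypothetical names, and real systems also have to handle timing, sensor noise, and near-equal values.

```rust
/// Outcome of comparing the outputs of three redundant channels.
/// (`Vote` and `vote` are hypothetical names for this sketch.)
#[derive(Debug, PartialEq)]
pub enum Vote<T> {
    /// All three channels agree.
    Unanimous(T),
    /// Two channels agree; `faulty` is the index (0, 1 or 2) of the odd one out.
    Majority { value: T, faulty: usize },
    /// All three disagree, so voting alone cannot identify a good channel.
    NoAgreement,
}

/// Compare three channel outputs and report the majority value, if any.
pub fn vote<T: PartialEq + Copy>(outputs: [T; 3]) -> Vote<T> {
    let [a, b, c] = outputs;
    if a == b && b == c {
        Vote::Unanimous(a)
    } else if a == b {
        Vote::Majority { value: a, faulty: 2 }
    } else if a == c {
        Vote::Majority { value: a, faulty: 1 }
    } else if b == c {
        Vote::Majority { value: b, faulty: 0 }
    } else {
        Vote::NoAgreement
    }
}
```

With only two channels the function could report a mismatch but could never fill in `faulty`, which is the limitation Ledin describes.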

The way engineers working with conventional distributed systems can apply these principles is in situations where the design needs to be fault tolerant while minimising or eliminating processing interruptions. Rather than waiting for a failure to occur, you have enough running capability in operation simultaneously that you can detect a failure and keep things going the whole time while bringing up redundant capability. A lot of large systems already operate this way, but systems that do not could potentially deliver a higher availability level using these techniques.

Q. For engineers who want to build real working knowledge of systems and hardware, what is the most direct path in?

Jim Ledin: Start by understanding how processors operate at the simplest level of a single instruction. There are four steps in an instruction: fetch, decode, execute, and write back. Fetch is when the processor retrieves the opcode bytes from memory. Decode is when the processor assigns work to units within it, like an ALU for an addition instruction. Execute is when it actually does the computation. And write back is where it stores the results in registers, in memory, and in status bits within the processor. Essentially all processors operate at that very low level.
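The four steps can be made concrete with a toy interpreter. The sketch below is a hypothetical 8-bit machine with a single accumulator, written for illustration only: the opcode values (0x01 for load, 0x02 for add, anything else for halt) are invented here and are not 6502 opcodes or an example from the book.

```rust
/// State of a hypothetical 8-bit toy machine (not the 6502 instruction set).
#[derive(Default)]
pub struct Cpu {
    pub pc: usize,    // program counter: address of the next instruction
    pub acc: u8,      // single accumulator register
    pub zero: bool,   // status bit: set when the last result was zero
    pub halted: bool, // set once a halt instruction has executed
}

/// Decoded instructions. The opcode values below are invented for this sketch.
enum Op {
    Load(u8), // acc = operand
    Add(u8),  // acc = acc + operand, wrapping at 8 bits
    Halt,
}

impl Cpu {
    /// Run a single instruction through the four steps.
    pub fn step(&mut self, memory: &[u8]) {
        // 1. Fetch: read the opcode byte and the byte after it from memory.
        let opcode = memory[self.pc];
        let operand = memory.get(self.pc + 1).copied().unwrap_or(0);

        // 2. Decode: work out which operation the opcode stands for.
        let op = match opcode {
            0x01 => Op::Load(operand),
            0x02 => Op::Add(operand),
            _ => Op::Halt,
        };

        // 3. Execute: compute the result and the instruction length in bytes.
        let (result, length) = match op {
            Op::Load(value) => (Some(value), 2),
            Op::Add(value) => (Some(self.acc.wrapping_add(value)), 2),
            Op::Halt => (None, 1),
        };

        // 4. Write back: store the result in the register and status bits.
        match result {
            Some(value) => {
                self.acc = value;
                self.zero = value == 0;
            }
            None => self.halted = true,
        }
        self.pc += length;
    }

    /// Step until a halt instruction runs or the program counter leaves memory.
    pub fn run(&mut self, memory: &[u8]) {
        while !self.halted && self.pc < memory.len() {
            self.step(memory);
        }
    }
}
```

Running the five-byte program `[0x01, 40, 0x02, 2, 0x00]` on a `Cpu::default()` loads 40, adds 2, and halts with 42 in the accumulator. A real processor overlaps these steps across many instructions at once, but the per-instruction sequence is the same.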

That mental model then scales upward to more complex processors and their capabilities. That is the reason the book starts with the 6502 processor. It is pretty simple, only three registers, 8-bit, nothing about it is overwhelming. But once you understand it, you can build upon that knowledge to get to the modern processors, which have hundreds of instructions available, if not more, and many divergent capabilities. It all builds upon those very simple foundations.

Q. Looking ahead five years, what skills will matter most for engineers working at the intersection of software, hardware, and AI?

Jim Ledin: The most important thing is to stay up to date and remain aware of changes as technology advances. Four years ago, when the previous version of the book came out, it was not at all clear to me, or I think to a lot of people, what was going to happen with AI in the coming years. Pay attention to what is going on around you. Pay less attention to announcements driven by financial considerations or hype from companies focused on their performance in the stock market. Pay more attention to what is actually having an impact in the real world, and learn more about those things.

The sources matter. There are trustworthy websites with genuinely good information about current ongoing activities in CPU development and other computer-related areas, as well as more in-depth sources like scientific papers if you are willing to dig in at that level. Formal education is another option. That does not necessarily mean going back to college; it could mean taking online courses to develop depth in areas where you might be behind. Certificate programs can be a real path to updating your skills.

Today, the thing is AI. Developers do not just learn programming languages anymore. You need to be learning how to interact with AI and use it effectively to develop better software. Really understanding these systems requires the ability to reason across all of the abstraction layers, from the software framework at the top level all the way down to the hardware that runs the code. You do not need to break out the assembly code generated by your build tools, though that is sometimes valuable and can be very helpful, either for learning purposes or if you are really in a hot inner loop that needs maximum optimisation. More often it is about understanding the constraints, how the processor works best with pipelines and caches, and orienting your code to work within those environments.

It is also becoming increasingly critical to understand heterogeneous computing environments. It is not just writing code that runs on a CPU. You might have code that interacts with a GPU for a parallelised algorithm, whether it is a language model or something else. And there are specialised accelerators that may be implemented within large-scale systems that speed up specific parts of the operation. There is a lot to learn, and it takes curiosity and sustained attention to stay current.

Q. How would you explain the CPU versus GPU distinction to a senior software engineer who has never had to care about it before?

Jim Ledin: A CPU is optimised for low-latency execution of complex branching code. Branches do have an impact on performance, but CPUs are designed to handle that and minimise it. GPUs work best with highly parallelised, high-throughput execution of linear code, operating on massively parallel workloads. GPU cores work best when they are going through parallel streams with minimal branching.

If you are developing an algorithm and you are not sure whether it should run on a CPU or be split between a CPU and a GPU, the GPU only really becomes attractive when you have enough parallelisable work for it to do, enough to amortise the costs of moving data onto the GPU, launching the kernels that execute the code, and managing the transfer of data to and from the GPU.

The GPU is really not a general purpose computer. It is more of a specialised device that needs to be managed by something else. You cannot write a program that just runs on a GPU. It needs to be started and managed from a CPU, and you need to get enough benefit from the work you are doing to make all of that worthwhile. If you cannot keep the GPU busy with this kind of work, the CPU implementation may actually win, because it avoids the data transfer and scheduling overhead entirely.
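The amortisation argument can be turned into a rough back-of-the-envelope check. This is a deliberately simplified sketch: the `OffloadEstimate` struct and its fields are hypothetical, every number has to come from your own measurements, and it ignores effects such as overlapping transfers with compute.

```rust
/// Rough cost model for one batch of work, with all times in microseconds.
/// Every field is an input you would measure yourself; the struct and its
/// field names are hypothetical and do not describe any real device.
pub struct OffloadEstimate {
    pub cpu_compute_us: f64, // time to do the whole batch on the CPU
    pub gpu_compute_us: f64, // time for the GPU kernels alone
    pub transfer_us: f64,    // host-to-device plus device-to-host copies
    pub launch_us: f64,      // kernel launch and scheduling overhead
}

impl OffloadEstimate {
    /// Total wall-clock cost of the GPU path, including its overheads.
    pub fn gpu_total_us(&self) -> f64 {
        self.gpu_compute_us + self.transfer_us + self.launch_us
    }

    /// The GPU path only pays off when its total cost beats the CPU path.
    pub fn gpu_wins(&self) -> bool {
        self.gpu_total_us() < self.cpu_compute_us
    }
}
```

With made-up numbers, a small batch that takes 50 µs on the CPU and 5 µs on the GPU still loses if the copies and launch add 110 µs, while a batch that takes 50,000 µs on the CPU easily absorbs the same overhead.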

Deep Engineering #45: Francesco Ciulla on Building Production Systems in Rust Without the Expensive Rewrite

Saqib Jan — Thu, 30 Apr 2026 16:32:00 GMT

✍️ From the editor’s desk,

Welcome to the 45th issue of Deep Engineering!

The TIOBE Index for April 2026 puts Rust at number 16, up from 18 last year. While the community widely expected it to break into the top 10, that momentum has slowed. TIOBE attributes this to adoption friction, noting that broader mainstream uptake has proven harder to achieve than the language’s early trajectory suggested.

The institutional picture, however, tells a different story. The NSA and CISA issued joint guidance in June 2025 urging organisations to adopt memory-safe languages for national security systems and critical infrastructure. Google’s Android security team reported that memory safety vulnerabilities, which accounted for 76% of Android bugs in 2019, fell below 20% for the first time in 2025 after the team prioritised writing new code in memory-safe languages.

The Linux kernel maintainers have also made Rust a permanent, core part of the kernel. The argument about whether memory-safe systems programming languages belong in production is settled at the institutional level. The harder question, therefore, is how engineering teams adopt Rust without the expensive, disruptive projects that give the language an undeserved reputation for being hard to introduce.

Francesco Ciulla, author of The Rust Programming Handbook and head of developer relations at Zerops, directly challenges that perception. His framework starts not with the language’s strengths but with the one service in your system where Rust would make an undeniable, measurable difference. That is what this issue covers.

Let’s get started.

Expert Insights

Rust Does Not Need to Replace Your Stack to Make It Better

Every engineering team that considers Rust tends to circle the same concerns. The language is difficult to learn, expensive to adopt, and more practically useful to systems programmers than to the teams building and running services at scale.

Francesco Ciulla, author of The Rust Programming Handbook, Docker Captain and head of developer relations at Zerops, says, "I've heard that conversation many times, and my response has always been that the framing is wrong from the start."

Ciulla has been building with Rust since 2022 and has spoken about it at conferences internationally. His perspective on Rust adoption is shaped less by enthusiasm for the language and more by a practitioner’s view of where it actually earns its place in a production system. That starting point matters, he says, because the teams that struggle with Rust adoption tend to start from the wrong question.

A joke that contains a kernel of truth

People in the Rust community have long joked about rewriting everything in Rust, and the memes around it have become something of a cultural shorthand for over-enthusiastic adoption. Like most good jokes, Ciulla acknowledges, it contains a kernel of truth. But the practical lesson is the opposite of what it implies. “The best way to introduce Rust in a big project is to find that hard part that is slowing things down, the bottleneck of all your services, and try to write one single service in Rust.” The rewrite-everything instinct is how adoption projects become expensive and difficult to justify. The bottleneck-first instinct is how teams get a proof of concept that demonstrates real value before committing to anything broader.

The practical implication is that the adoption decision is not a language decision at the organizational level. It is an engineering decision at the service level. The question is not whether the organization should adopt Rust. The question is whether there is one service in the system that is slow, resource-intensive, or difficult to keep stable, where the properties Rust offers would make a measurable difference. If that service exists, it is the right place to start. If it does not, the case for introducing Rust at all is weaker than it might appear.

Ciulla’s production experience makes this concrete. During our conversation, he shared that a Rust web server running on his own machine used four megabytes of RAM in development and five in production. On a one-gigabyte droplet, that leaves room for more than 200 such services sitting idle at the same time. That number is the kind of resource profile that changes what is economically viable to deploy, and it is the kind of argument that lands differently with an ops team than a language comparison ever could.
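The 200-services figure is plain division, and it is worth seeing the assumptions behind it. The helper below is a hypothetical one-liner that treats one gigabyte as 1,024 megabytes and ignores the memory the operating system itself needs, so the real headroom on a droplet would be somewhat lower.

```rust
/// How many idle services of a given memory footprint fit into a budget.
/// Simplifying assumption: the whole budget is available to the services,
/// with nothing reserved for the operating system.
pub fn idle_services_that_fit(total_mib: u64, per_service_mib: u64) -> u64 {
    total_mib / per_service_mib
}
```

Calling `idle_services_that_fit(1024, 5)` gives 204, which is where "more than 200" comes from.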

Flat latency is a real engineering argument

One of the most underappreciated technical arguments for Rust in production systems is not about speed in the raw throughput sense. It is about predictability. Languages and runtimes that rely on garbage collection, including Go, Java, and Node.js, introduce pauses when the collector runs. Depending on the collector, heap size, and tuning, those pauses range from well under a millisecond to, in the worst cases, hundreds of milliseconds. An HTTP request that arrives during a GC cycle experiences higher latency than one that does not. The user on the receiving end did not do anything differently. They were just unlucky.

Ciulla is candid about what this means in practice. “By not having a garbage collector on the back end side, you basically have flat latency. You don’t rely on luck, or on the user not being the unlucky one. It’s a problem that is removed.” For most web applications running at moderate scale, this distinction is invisible. For services with strict latency requirements, high concurrency, or SLAs that depend on consistent tail latency rather than average response time, it is one of the more significant architectural arguments available.
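The gap between average response time and tail latency is easy to see with a small calculation. The sketch below uses the nearest-rank method for percentiles; the function names are hypothetical, and the sample values in the note that follows are invented to illustrate the effect rather than measured from any real service.

```rust
/// Nearest-rank percentile of latency samples in milliseconds.
/// `p` is a percentage between 0.0 and 100.0. Returns None for no samples.
pub fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank: the smallest sample with at least p% of values at or below it.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    let index = rank.clamp(1, sorted.len()) - 1;
    Some(sorted[index])
}

/// Arithmetic mean of the samples. Returns None for no samples.
pub fn mean(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    Some(samples.iter().sum::<f64>() / samples.len() as f64)
}
```

Take 100 hypothetical requests where 98 complete in 10 ms and 2 are delayed to 300 ms by a pause. The mean is 15.8 ms and the median is 10 ms, both of which look healthy, while the 99th percentile is 300 ms. An SLA written against tail latency sees the problem that an average hides.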

This connects to a broader point about where Rust earns its place and where it does not. The resource efficiency and latency predictability are not arguments for using Rust everywhere. They are arguments for using Rust in the specific services where those properties matter. A service that scrapes a website once a month does not need flat latency. A service handling a million concurrent users does. Knowing the difference is what separates a good adoption strategy from an expensive experiment.

Rust, when it is the wrong choice

Ciulla is honest about the cases where Rust is not the right tool, which is part of what makes his advocacy for it credible. If a team needs something simple and the deadline is tomorrow, Rust is probably the wrong choice. If a developer needs a working API by the end of the day and has no Rust experience, this is not the moment to start learning the language under delivery pressure. “When you need something simple, and you’re familiar already with Java or JavaScript, why don’t you use it?” The question is not rhetorical. It is the right question to ask before any technology adoption decision.

The ecosystem argument is also honest. Python has better libraries for data science. JavaScript has a larger package ecosystem for certain kinds of web work. Rust integrates well with other languages, but if what a team needs is native to another ecosystem, that is a real constraint rather than a preference. Good engineers use the right tool for the problem. The case for Rust is strongest when the problem involves performance, memory efficiency, or concurrency at a level where other languages start showing their limits.

The shepherd principle

One of the more practical observations Ciulla makes about organizational adoption is about knowledge rather than tooling. The bottleneck to Rust adoption at scale is rarely the language itself. It is whether the organization has someone who knows it deeply enough to validate the work being done in it. He draws the parallel to Docker adoption at the European Space Agency, where he worked and watched adoption of the tool move slowly, not because of anything wrong with Docker, but because it was not well understood internally. The technology is never the problem, he points out. The knowledge is.

“You need the validation of an expert,” Ciulla says. In the era of AI-accelerated development, this point is sharper than it has ever been. Teams can now generate Rust code with AI assistance far faster than their ability to validate it has grown. That gap between generation speed and validation depth is where production incidents come from. Having at least one engineer on the team who understands the ownership model, the borrow checker, and the concurrency primitives well enough to review what the AI produces is not a nice-to-have. It is the thing that determines whether the Rust service is a genuine improvement or a liability waiting to surface.

Concurrency without the trauma

Most engineers who have worked with concurrency in Java or C++ carry a specific kind of wariness about it. The mental model for concurrency in older languages is that it is an advanced topic requiring extra care, specialized libraries, and a heightened awareness of race conditions and deadlocks. Ciulla describes learning concurrency in Java at university as the final, difficult session of the course, something treated as inherently dangerous and saved for the end of the degree.

His first attempt at a concurrency example in Rust produced the opposite experience. “When I had to teach concurrency in Rust in a YouTube video, I made an example in three minutes. I was done. I say okay, the basic example really lasted like two or three minutes because you just declare a couple of threads and literally done.” That experience reflects something structural about how Rust was designed. The language was created after multi-core processors were already standard. Concurrency was not retrofitted onto a model designed for single-threaded execution. It was built in from the start, and the ownership system that prevents data races at compile time is the same ownership system that governs memory safety everywhere else in the language. There is no separate concurrency model to learn. The properties that make Rust memory safe are the same properties that make concurrent code safe.
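A basic example of the kind Ciulla describes really is only a few lines. The sketch below is not the code from his video; it is a minimal illustration using the standard library's `thread::spawn`, `Arc`, and `Mutex`, with a hypothetical function name.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawn `workers` threads that each increment a shared counter.
/// (`count_in_parallel` is a hypothetical name for this sketch.)
pub fn count_in_parallel(workers: usize, increments_per_worker: usize) -> usize {
    // Arc gives every thread shared ownership; Mutex guards the mutation.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            // `move` transfers this clone of the Arc into the new thread.
            thread::spawn(move || {
                for _ in 0..increments_per_worker {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    // Wait for every worker to finish before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}
```

The interesting part is what does not compile. Sharing a plain `usize` across the threads and mutating it without the `Mutex` is rejected by the compiler, so the unsynchronised version of this code never reaches production.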

For teams building services that need to use available CPU resources efficiently, this is not a minor ergonomic improvement. It means that the gap between writing concurrent code and writing correct concurrent code is substantially smaller in Rust than in the languages most engineers have used before. The cost of concurrency, measured in debugging time and production incidents rather than lines of code, is genuinely lower.

The compiler is the most patient teacher on your team

The reputation Rust has for being difficult to learn is real, and Ciulla does not dismiss it. But his explanation for where the difficulty actually comes from is different from the common framing. The problem is not that the concepts are inherently harder than those in other languages. It is that Rust forces you to unlearn patterns that other languages allowed. Engineers who have spent years in C++ or JavaScript carry assumptions about how memory works, how mutability is managed, and what the runtime will silently fix for them. Rust does not fix those things silently. It surfaces them at compile time and requires you to address them explicitly before the code runs.

That shift in where the pain lands is the key insight. “Rust is not hard to learn,” Ciulla says. “It’s different. And this is how we should advocate for it.” The difficulty is front-loaded by design, because the language makes a deliberate trade of more friction during development in exchange for fewer failures in production. Teams that have spent significant time debugging null pointer exceptions, race conditions, or memory leaks in production understand this trade intuitively. The hours lost to a null pointer exception in production dwarf the hours spent fighting the borrow checker upfront.
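The null pointer case shows how the friction is front-loaded. In the sketch below, the `User` type and `contact_line` function are hypothetical; the point is that a value that may be absent is an `Option`, and the compiler will not let the code use it without saying what happens when it is missing.

```rust
/// A user whose email address may be missing. (Hypothetical type.)
pub struct User {
    pub name: String,
    pub email: Option<String>, // absence is part of the type, not a null
}

/// Build a display line for a user, handling the missing-email case.
pub fn contact_line(user: &User) -> String {
    // The match must cover both variants; leaving out `None` is a compile error.
    match &user.email {
        Some(email) => format!("{} <{}>", user.name, email),
        None => format!("{} (no email on file)", user.name),
    }
}
```

In a language with nullable references, forgetting the second branch is a runtime failure waiting for the first user without an email. Here it is a compile error on the developer's machine.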

The Rust compiler, which is the primary source of that upfront friction, is also the primary teaching tool the language provides. Error messages in Rust are unusually detailed and specific. They do not just tell you that something is wrong. They explain what rule was violated, show the relevant code, and often suggest a fix. Ciulla describes the compiler as a teacher rather than a gatekeeper, one that overcommunicates in the same way a good mentor does. “The errors in Rust are basically tutorials. They are helping you to write better code.” For teams introducing Rust to engineers who have not used it before, this property matters practically. The compiler is doing a significant part of the knowledge transfer work that would otherwise fall on the senior Rust engineer on the team.

Rust in the Linux kernel is paying off

The decision by the Linux kernel maintainers to not only allow Rust in the kernel but to plan for components that require it going forward is the kind of institutional endorsement the language community has been waiting for. Ciulla frames it clearly. Even if the experiment had failed, just the fact that Rust was considered a viable option for kernel-level work would have been a meaningful milestone. The language was competing in a domain that had been exclusively C and C++ territory for decades, and it earned a permanent place there.

For engineering leaders tracking where the industry is moving, this matters beyond the kernel itself. Government systems, military applications, and other security-critical domains are beginning to treat Rust as a default rather than an experiment. “Rust is slowly getting adopted at bigger and bigger levels,” Ciulla says. The adoption curve is not linear, but the direction is consistent. Teams that build internal expertise now are not chasing a trend. They are positioning ahead of a transition that is already underway in the most demanding environments in the industry.

The ecosystem argument is changing

One of the historically valid objections to Rust for web development was tooling maturity. Two years ago, Ciulla would not have committed to shipping a production SaaS product in Rust. Today he would, and the reason is specific rather than general. The Axum framework has matured to the point where it is a production-grade choice for web APIs, and the broader ecosystem around async Rust has improved substantially. “In 2026, I will use it,” he says of building a paid product with Rust as the backend, dropping the qualification he would have applied even a year earlier.

The toolchain story is also one of Rust’s genuine advantages for teams evaluating the full cost of adoption. Cargo handles dependency management, building, testing, and documentation in a single integrated tool. There is no equivalent of the npm versus yarn versus pnpm decision that teams coming to JavaScript have to navigate before writing a single line of code. Running tests is a single command: cargo test. The integration is native to the language, not an ecosystem of competing choices layered on top of it. For teams that have spent time debugging JavaScript build configurations, this is not a small thing.
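To show how little setup that involves, here is a minimal sketch of a function with a unit test beside it in the same file. The `slugify` function is a hypothetical example; the `#[cfg(test)]` module and `#[test]` attribute are the standard mechanism that cargo test picks up, with no extra test runner or configuration file.

```rust
/// Turn a title into a lowercase, hyphen-separated slug.
/// (A hypothetical function, here only to have something to test.)
pub fn slugify(title: &str) -> String {
    title
        .split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|part| !part.is_empty())
        .map(|part| part.to_ascii_lowercase())
        .collect::<Vec<_>>()
        .join("-")
}

// Compiled only when testing; `cargo test` finds and runs every #[test] function.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn slugify_lowercases_and_joins_words() {
        assert_eq!(slugify("Flat latency is real!"), "flat-latency-is-real");
    }
}
```

Placed in a library crate's `src/lib.rs`, this is exercised by running cargo test from the project directory.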

Rust in the AI-accelerated development era

Ciulla makes an argument about Rust and AI that is worth sitting with. The claim is not that Rust is better for writing AI applications, though he has views on that too. The claim is that Rust may be one of the best languages to work in during the current period of AI-assisted development, specifically because of what it requires from the engineer reviewing AI-generated code.

When AI writes Rust code, the engineer validating it still has to understand ownership, borrowing, and the type system well enough to know whether the generated code is correct. The compiler will catch a large category of errors, but the human reviewer still needs to understand why the compiler is happy with a piece of code before shipping it. “If you have no control, either you are useless or you cause a problem. So in both cases, it’s not a good time for you.” That discipline, the requirement to understand what the code actually does rather than just accepting output that compiles, is not a burden unique to Rust. But it is more explicitly enforced by the language than in most alternatives, and that enforcement is valuable at a moment when the volume of AI-generated code is increasing faster than the average team’s ability to review it carefully.

🛠️ Tool of the Week

wrkflw — open-source tool for validating and running GitHub Actions locally

wrkflw lets you validate and execute GitHub Actions workflows on your local machine before pushing, catching configuration errors and pipeline failures before they reach CI. Version 0.8.0 shipped this week. Built in Rust.

Validates GitHub Actions workflow syntax locally before pushing to CI
Runs multi-step jobs and matrix builds without cloud dependency
Fast startup and low resource overhead from Rust’s binary compilation model
Catches pipeline failures early, reducing the feedback loop between code and CI

Learn more about wrkflw

📎 Tech Briefs

Linux 7.0 ships with Rust as an official core kernel language - Rust loses its experimental tag in Linux 7.0, reaching full parity with C for kernel development after the Linux Kernel Maintainers Summit decision in December 2025.
Rust 1.95.0 released - The latest stable release introduces cfg_select!, a compile-time macro for conditional configuration, and removes unstable support for custom target specifications on stable toolchains.
crates.io opens new Svelte-based frontend for public testing - The Rust team invites the community to test the rebuilt crates.io frontend ahead of the planned production migration.
Cargo stabilises build.warnings configuration - The build.warnings field is now stable, giving teams a standardised way to configure compiler warning behaviour across workspace builds.
Zed 1.0 ships - The Rust-built code editor reaches stable release with GPU-accelerated rendering, real-time collaborative editing, Git integration, and native AI assistant support across macOS, Windows, and Linux.

That’s all for today. Thank you for reading this issue of Deep Engineering.

We’ll be back next week with more expert-led content.

Stay awesome,

Saqib Jan

Editor-in-Chief, Deep Engineering

If your company is interested in reaching an audience of senior developers, software engineers, and technical decision-makers, you may want to advertise with us.

Thanks for reading Packt Deep Engineering! Subscribe for free to receive new posts and help grow our work.