Packt Deep Engineering: Engineering Leadership

How SUSE Runs AI Without Losing Control

Saqib Jan — Wed, 24 Jun 2026 20:20:29 GMT

Most companies adopting AI are making a quiet trade in exchange for speed. They accept that their data passes through systems they do not control, priced on terms they did not set, governed by rules someone else can change. For a consumer app that trade is often fine. But for an enterprise running infrastructure under compliance regimes, it is not. Rick Spencer, General Manager for Technology and Product at SUSE, has for the last two years been working out how an open source enterprise adopts AI at scale without making that trade, and his approach is a useful template for any organization that takes data sovereignty, auditability, and cost predictability seriously.

SUSE’s position is unusual in a way that sharpens the problem. “All the software that we write is open source,” Spencer explains. “We’re not worried about, oh, we leaked the code, we publish the code.” The concern is the data that belongs to others. When an engineer debugs a customer environment, the logs are not SUSE’s to hand to a third-party model. “They trust us to not do those kinds of things,” he says, and that trust is the thing the entire approach is built to protect.

Sovereignty means how, not whether

The first instinct when AI collides with compliance is to draw a boundary around where AI is allowed to go. Spencer rejects that framing, saying “I don’t think it’s can or cannot. It’s more so how.” The distinction matters because a can-or-cannot policy ends with engineers either blocked from useful tools or quietly routing around the rules. A question of how keeps the capability available while controlling the conditions under which it runs.

The clearest illustration is SUSE’s build process. A lot of what the company ships is built in an internal instance of Open Build Service, and the defining property of those builds is that they happen offline. “All the builds are offline. They literally are not connected to the internet,” he points out. “This is super important because you need to be able to prove that nothing happened during the build process.” Proving a negative is far easier when there was no connection through which anything could have happened. It means doing things the hard way, making sure every source is present ahead of time because nothing can be pulled live during the build, but the payoff is provable integrity.

Applying AI inside that environment is where the how becomes concrete. The example Spencer gives is backporting a patch to previous stable releases, the kind of repetitive, knowledge-intensive work AI is well suited to. The question is whether it can be done in a sovereign way, and his answer is yes, on conditions. “As long as we are running AI in a way that it’s able to run disconnected from the internet, and we can have complete visibility into everything it’s doing.” SUSE goes further than running existing models in isolation. “In some cases we even train our own models to accomplish these things,” he shares, “and that way we know the model doesn’t have some naughty time bombs built into it.” Owning the model end to end removes the last category of thing the enterprise would otherwise have to take on trust.

The foundation under all of this is SUSE AI, the company’s own stack for running AI workloads on private infrastructure, which it uses heavily internally and on which it runs Llama. “It’s all within our private infrastructure, so we make sure there’s no chance that any data can escape,” Spencer says. “We only use models which can be vetted effectively.” The principle is consistent. Keep the data inside the boundary, and only run models you can actually inspect.

MCP as the control layer, not just the connector

Most of the conversation about Model Context Protocol treats it as plumbing that turns a chatbot into an agent by giving it tools to act with. Spencer agrees that is what it does, but his more interesting argument is that MCP is where an enterprise regains the control that autonomous agents would otherwise erode. “MCP servers do another really important thing,” he explains, “which is provide a place where you can, as an enterprise, bring some sanity and control to the usage.”

The mechanism is straightforward once you see the server as infrastructure rather than glue. “If you have MCP servers running, they’re just servers,” Spencer says. “That means you can provide access ACLs to them.” A server can be told that a given user’s agent may use these tools and not those. The usage can be logged. Gateways can sit in front, and the company runs its own alongside a partnership with Stacklok. The architectural rule that holds it together is that the language model never touches tools directly. “You don’t give the LLMs access directly to tools, only the MCP servers,” he reasons, “and then you can have that oversight, meet your compliance needs.”
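The access-control idea can be sketched in a few lines of Python. This is an illustration only, not the MCP SDK and not SUSE’s implementation: `ToolGateway`, `register`, `allow`, and `call` are hypothetical names, and the tools are plain functions standing in for whatever a real MCP server would expose.

```python
import logging
from typing import Any, Callable, Dict, Set

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tool-gateway")


class ToolGateway:
    """Hypothetical stand-in for an MCP server's access layer (not a real MCP API)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}
        self._acl: Dict[str, Set[str]] = {}  # user -> tools that user's agent may call

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def allow(self, user: str, tool: str) -> None:
        self._acl.setdefault(user, set()).add(tool)

    def call(self, user: str, tool: str, **kwargs: Any) -> Any:
        # Deny by default: an unregistered or ungranted tool cannot be reached.
        if tool not in self._tools or tool not in self._acl.get(user, set()):
            log.warning("DENY user=%s tool=%s", user, tool)
            raise PermissionError(f"{user} may not call {tool}")
        # Every permitted call is logged before it runs.
        log.info("ALLOW user=%s tool=%s args=%s", user, tool, kwargs)
        return self._tools[tool](**kwargs)


gateway = ToolGateway()
gateway.register("list_pods", lambda namespace: [f"{namespace}/web-0"])
gateway.register("delete_server", lambda name: f"deleted {name}")
gateway.allow("alice", "list_pods")  # alice's agent is never granted delete_server
```

With this setup, `gateway.call("alice", "list_pods", namespace="prod")` succeeds and is logged, while `gateway.call("alice", "delete_server", name="prod-1")` raises `PermissionError`. The model only ever sees the gateway, so the allowlist and the log are the whole surface it can act through.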

He takes the containment idea down to the operating system. “You can put the MCP server, I call it, in jail,” Spencer says, describing a systemd process scoped to present only the compute resources the server actually needs. The reasoning is a security posture rather than a convenience. “For every MCP server you’re running, there’s an LLM out there that’s trying to use it, and who knows what kind of prompt injections people are running.” The same boundary defends against the model’s own failures, not only malicious input. An agent cannot delete a production server with a tool it was never given. “They guard against things like the AI hallucinating something and deleting your production server, because you simply don’t provide that tool to it,” he warns. Control here is not a policy document. It is the set of tools the agent is and is not handed.

There is a quality dimension to MCP that reinforces the control argument, because a well-built server encodes expert knowledge rather than leaving the model to guess. SUSE ships MCP servers with its products, crafted by the people who know those products best. “It would be like, instead of you sitting down in front of a chatbot saying I need to figure out how to use Rancher, you’re sitting down with the whole Rancher development team telling you how to prompt the chatbot,” Spencer says. An agent working against an expert-built server is not interpreting raw APIs and making guesses a human cannot easily validate. The encoded knowledge makes the agent both more capable and more predictable, which is control of a subtler kind.

Cost sovereignty belongs in the same conversation

The control story is incomplete if it stops at data and governance, because an enterprise that cannot predict its AI spend has lost a different kind of control. Spencer folds cost into the sovereignty argument directly. “Sometimes they call it cost sovereignty,” he says, “because no one can come back later and say, oh, by the way, we’re changing our model.” He has watched a supplier move developers from seat-based to usage-based pricing, a shift his teams did not control and could not prevent. Hosting your own infrastructure changes the nature of the exposure. “There’s a maximum cost there,” he notes of self-hosted AI, and the question shifts from whether you will overrun a variable bill to whether you have the observability to use a fixed capacity fully.

For the cases where variable cost is unavoidable, SUSE uses circuit breakers that cut off runaway spend in real time when usage spikes past a threshold in a given minute. Spencer is honest that this frustrates engineers who get rate limited mid-task, but the alternative is an autonomous agent running up cost with no human in the loop. The same discipline runs through the company’s use of frontier models, reserved for high-value strategic work rather than routine completion, and through the practice of using a frontier model once to build an agent that then runs on a cheaper model or a private one. Each of these is the same instinct expressed at the level of money, keeping the expensive capability available for the work that justifies it and putting hard limits around everything else.
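A spend circuit breaker of the kind described can be sketched as follows. The article does not say how SUSE implements its breakers, so this is a minimal fixed-window version under stated assumptions: `SpendCircuitBreaker`, `charge`, and the per-minute limit are hypothetical names and parameters chosen for illustration.

```python
import time
from typing import Callable


class SpendCircuitBreaker:
    """Refuses requests once spend in the current one-minute window would exceed a limit."""

    def __init__(self, limit_per_minute: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit_per_minute
        self.clock = clock              # injectable so the behavior can be tested
        self.window_start = clock()
        self.spent = 0.0

    def charge(self, cost: float) -> bool:
        """Record a request's cost; return False if the request must be refused."""
        now = self.clock()
        if now - self.window_start >= 60.0:
            # A new window has started: reset the running total.
            self.window_start = now
            self.spent = 0.0
        if self.spent + cost > self.limit:
            return False  # breaker open: caller is rate limited until the window resets
        self.spent += cost
        return True
```

An agent loop would call `charge()` before each model request and stop or back off when it returns `False`. That refusal is exactly the mid-task rate limiting engineers find frustrating, and it is also what keeps an unattended agent from spending past the threshold.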

What makes the SUSE approach instructive beyond its own walls is that none of it depends on slowing engineers down. The sovereignty, the MCP governance, and the cost controls exist so that engineers can move fast inside a boundary the enterprise can actually stand behind. “You don’t want to stop them from getting that 100X improvement,” Spencer says. “You need to give them the right tools for the job.” Control, in his telling, is not the opposite of speed. It is the thing that makes speed safe to allow.

Compute Obsession Is Slowing Down AI Systems

Saqib Jan — Tue, 26 May 2026 05:30:00 GMT

Engineers building AI systems today tend to focus on compute first. The questions are typically how many GPU cores, how many parameters, how much VRAM, and how to extract more from all of it. The benchmarks are about throughput and inference speed, and the infrastructure conversations are about scaling horizontally across more hardware.

Jim Ledin, a seasoned engineering leader, CEO of Ledin Engineering and author of Modern Computer Architecture and Organization (third edition, Packt), thinks that framing misses the most important constraint in production AI systems. The bottleneck holding back real-world AI performance is not compute but data movement.

“Data movement can often be more expensive than the actual computation steps,” Ledin says. “The latency, especially moving large data structures across different levels of the memory hierarchy, can dominate and leave a lot of your compute bandwidth idle.” This is not a niche embedded systems concern. It is happening in the largest AI deployments in the world, and it is the reason hardware vendors like NVIDIA are designing systems the way they are today.

Memory bandwidth is slowing your AI system more than your GPU is

When a CPU or GPU requests data from memory and that data is not available in cache, the processor waits while the computation units sit idle. In a consumer application, that idle time seems like a minor inconvenience. But in an AI system processing large tensors continuously, it accumulates into a significant fraction of total runtime.

“AI workloads are becoming increasingly memory bandwidth limited,” Ledin shares, pointing to a dynamic that is reshaping how AI hardware gets built. “It is taking more time to bring data into the GPU or TPU memory than it is taking for the computation to take place on the data.” The raw ability to multiply matrices is no longer the binding constraint; getting the data to the multipliers fast enough is.
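A back-of-the-envelope check makes the claim concrete. The sketch below compares the time to do the arithmetic with the time to move the bytes. The accelerator figures (100 TFLOP/s peak, 1 TB/s memory bandwidth) are assumed round numbers for illustration, not the specifications of any real part, and `bound_by` is a hypothetical helper.

```python
def bound_by(flops: float, bytes_moved: float,
             peak_flops_per_s: float, bandwidth_bytes_per_s: float) -> str:
    """Return which resource takes longer for a kernel: 'memory' or 'compute'."""
    compute_s = flops / peak_flops_per_s
    memory_s = bytes_moved / bandwidth_bytes_per_s
    return "memory" if memory_s > compute_s else "compute"


PEAK = 100e12       # assumed peak throughput, FLOP/s
BANDWIDTH = 1e12    # assumed memory bandwidth, bytes/s
n = 4096

# Matrix-vector product with 2-byte weights: about 2*n*n FLOPs over about 2*n*n bytes.
matvec = bound_by(2 * n * n, 2 * n * n, PEAK, BANDWIDTH)

# Matrix-matrix product: about 2*n**3 FLOPs over three n*n matrices of 2-byte values.
matmul = bound_by(2 * n**3, 3 * 2 * n * n, PEAK, BANDWIDTH)
```

Under these assumptions the matrix-vector case, which resembles generating one token at a time, comes out memory bound, while the large matrix-matrix case comes out compute bound. The hardware is the same in both cases; only the ratio of arithmetic to bytes moved differs.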

This is exactly why high bandwidth memory exists. HBM modules are stacks of RAM chips built into a cube, physically close to the processing units, with far higher data transfer rates than conventional DRAM. “On a TPU card, you typically have several of these HBM modules,” Ledin explains, “and they have a far higher data rate for transferring data in and out of the GPU processing components than on a typical consumer grade GPU.” The engineering bet being made with systems like NVIDIA’s Blackwell architecture is that memory bandwidth is worth more than raw core count, because the cores are already faster than the data can reach them.

But there is a side effect that touches anyone buying consumer hardware. “A lot of the production capacity for memory is going into these high bandwidth memory modules, which cost a lot more for the purchaser and make a lot more money for the vendor,” Ledin observes. That is a direct reason DDR5 has been difficult to find and expensive when available. The memory fabs are prioritizing the more profitable HBM production, and consumer DRAM is downstream of that decision.

The hardware cost your cloud bill is hiding

Most software engineers, especially those working in cloud environments, treat the hardware as someone else’s concern. The abstraction is good enough, the managed services handle the infrastructure, and the code runs somewhere. Ledin’s argument is that this hands-off relationship with hardware has a real cost that shows up in performance and in cloud bills.

“If your code is accessing memory in inefficient patterns, if you are not using the cache memory within the processor in an effective manner, and if you are just moving data around more than is necessary, that can all have significant performance impacts,” he warns. The CPU requests data from memory, and if it is not in cache, it waits. “A lot of the time it is unavoidable, but the amount of latency can be minimized by different ways of optimizing algorithms.”

The mechanics are specific. When a modern CPU reads from DRAM, even a single byte triggers a 64-byte cache line transfer. The processor brings in a block of adjacent memory whether it needs all of it or not. If the algorithm then jumps to a different memory location, causes that block to be evicted from cache, and later needs it again, it has to re-read it from DRAM. That is wasted time. “For best efficiency, you would want your code to be working with data from that block before it moves on to something else,” Ledin explains, “rather than bouncing around to other memory locations.”
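The effect of access pattern on memory traffic can be counted directly. The sketch below is a simplified model: it assumes the array starts on a cache-line boundary and ignores prefetching and eviction, and `cache_lines_touched` is a hypothetical helper. It only counts how many distinct 64-byte lines a loop has to pull in.

```python
CACHE_LINE_BYTES = 64  # line size stated in the text


def cache_lines_touched(num_elements: int, element_bytes: int, stride_elements: int) -> int:
    """Count distinct cache lines read when visiting num_elements elements at a stride."""
    lines = set()
    for i in range(num_elements):
        first_byte = i * stride_elements * element_bytes
        last_byte = first_byte + element_bytes - 1
        # An unaligned element can straddle two lines, so count every line it covers.
        lines.update(range(first_byte // CACHE_LINE_BYTES,
                           last_byte // CACHE_LINE_BYTES + 1))
    return len(lines)


sequential = cache_lines_touched(1024, 8, 1)   # 1024 eight-byte values, back to back
strided = cache_lines_touched(1024, 8, 16)     # same count, one value every 128 bytes
```

Reading 1024 adjacent 8-byte values touches 128 lines, because each line delivers eight useful values. Reading the same number of values spaced 128 bytes apart touches 1024 lines, so seven-eighths of every transfer is data the loop never uses.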

In a cloud environment, this inefficiency does not just slow things down. It costs money, and there is no incentive for cloud providers to surface it clearly. “You are paying for the usage of the system whether the CPU is actually crunching instructions or the CPU is idle waiting for a data item to come in from memory,” he points out. The cloud bill does not distinguish between productive cycles and stall cycles. Engineers who understand cache locality can write code that reduces stalls and therefore reduces cost, not just latency. Optimizing for cost comes down to understanding your memory access patterns and engineering around them, not just choosing the right managed tooling stack.

Drawing from his engineering work across embedded and production systems, Ledin shares a useful example. A Linux web server called Tux, which ran in kernel space to avoid user-to-kernel data transfers, developed a performance problem under high load because its per-request state data grew large enough to exceed the CPU’s level two cache. “Performance dropped off sharply,” he recalls. Engineers analyzed the cache behavior, restructured the data layout to keep per-request state smaller, and did the same for instruction caching by batching related processing together. “Fixes that they implemented increased the application performance by about 40%.” No new hardware, no architectural overhaul. Just understanding where the memory ceiling was and designing around it.

GPUs are the right tool, but not always for the reason you think

The assumption that GPUs are the correct architecture for AI workloads is not wrong, but it is incomplete in a way that matters for engineers making infrastructure decisions. Ledin draws a distinction that is often glossed over in the mainstream conversation about AI hardware.

“GPUs are probably the ideal architecture today for people and small companies that want to run language models locally,” he says, drawing from personal experience. He recently ran the Gemma 4 26-billion-parameter model on an NVIDIA RTX 4090, and for that use case the GPU is the right tool. But for larger-scale deployments running the much larger frontier models, the picture is different. “The trend there is for dedicated TPUs,” he notes.

The distinction matters because GPUs carry silicon dedicated to graphics work that has nothing to do with tensor operations. A consumer GPU has hardware for real-time video rendering, gaming pipelines, and display output. A TPU does not. “TPUs do not use up silicon for that purpose and focus everything on the tensor work,” Ledin explains. When you are running thousands of inference requests at scale, that difference in silicon allocation translates directly into efficiency at the workload that actually matters.

There is also the SIMT execution model to understand. Modern NVIDIA GPUs run threads in groups of 32, called warps, in lockstep, with every thread in the group executing the same instruction on different data. This is efficient for linear, parallel workloads. When those threads hit a branch, a conditional where some threads take the if path and some take the else path, the hardware executes one side, then goes back and executes the other. “You basically have effectively a pipeline stall where it has to go back and execute a different thread in that kind of situation,” Ledin highlights. The flexibility is there, but it comes at a cost. “Avoiding branching if possible can have a significant impact on performance.”
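The cost of divergence can be illustrated with a toy model. The sketch below is not how GPU hardware schedules work in detail: it assumes each side of an if/else costs one pass per group of 32 threads, and `warp_passes` and `total_passes` are hypothetical helpers. A group whose threads all agree needs one pass, and a group that splits needs two.

```python
from typing import List

WARP_SIZE = 32  # threads that execute in lockstep, as stated in the text


def warp_passes(branch_taken: List[bool]) -> int:
    """Passes one group needs for an if/else: 1 if all threads agree, 2 if they diverge."""
    if len(branch_taken) != WARP_SIZE:
        raise ValueError("expected exactly one full group of 32 threads")
    return len(set(branch_taken))


def total_passes(conditions: List[bool]) -> int:
    """Sum passes over consecutive groups of 32 threads."""
    return sum(warp_passes(conditions[i:i + WARP_SIZE])
               for i in range(0, len(conditions), WARP_SIZE))


# 64 threads where the branch alternates thread by thread: both groups diverge.
alternating = total_passes([i % 2 == 0 for i in range(64)])

# 64 threads where the branch splits exactly on the group boundary: no divergence.
aligned = total_passes([i < 32 for i in range(64)])
```

In both cases half of the threads take each path, yet the alternating pattern needs four passes and the aligned pattern needs two. Arranging data so that neighboring threads make the same decision is one way to follow the advice to avoid branching where possible.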

For engineers deciding where to run inference workloads, Ledin offers a practical heuristic. “The GPU only really becomes attractive when you have enough work for it to do that it can be parallelized and enough that it will amortize the costs associated with moving data onto the GPU, launching the kernels, and doing the management work to transfer data to and from the GPU.” If the workload is not large enough to keep the GPU busy, the CPU implementation may be faster because it avoids all that overhead entirely.
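The heuristic reduces to a break-even calculation. The sketch below uses a deliberately simple linear cost model with integer microsecond costs. All four parameters are hypothetical values you would measure for your own workload, and `break_even_items` is a hypothetical helper, not a function of any GPU library.

```python
from typing import Optional


def cpu_time_us(n_items: int, cpu_us_per_item: int) -> int:
    return n_items * cpu_us_per_item


def gpu_time_us(n_items: int, gpu_us_per_item: int,
                fixed_overhead_us: int, transfer_us_per_item: int) -> int:
    # Fixed launch and management cost, plus per-item transfer and compute.
    return fixed_overhead_us + n_items * (transfer_us_per_item + gpu_us_per_item)


def break_even_items(cpu_us_per_item: int, gpu_us_per_item: int,
                     fixed_overhead_us: int, transfer_us_per_item: int) -> Optional[int]:
    """Smallest batch size at which the GPU is strictly faster, or None if it never is."""
    saving_per_item = cpu_us_per_item - (gpu_us_per_item + transfer_us_per_item)
    if saving_per_item <= 0:
        return None  # transfer plus GPU compute costs at least as much as the CPU per item
    return fixed_overhead_us // saving_per_item + 1


# Assumed costs: CPU 10 us/item, GPU 1 us/item, 4 us/item transfer, 5000 us fixed overhead.
threshold = break_even_items(10, 1, 5000, 4)
```

With these assumed numbers the GPU only wins once a batch exceeds 1000 items. If per-item transfer cost alone approaches the CPU’s per-item compute cost, the function returns `None`, which corresponds to the case where the CPU implementation is faster at any size.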

Frameworks are hiding costs that engineers need to see

Frameworks and libraries have made it possible to build sophisticated AI systems without ever thinking about what is happening in hardware. That is mostly a good thing. The abstraction accelerates development and reduces mistakes. But there is a point where abstraction stops being a benefit and starts hiding costs that need to be visible.

“Where it becomes dangerous to use too much abstraction is when it obscures what is happening with the data layout in memory and the execution patterns,” Ledin cautions. In performance-critical applications, the framework is making decisions about how data is structured and how the processor interacts with it. If the engineer does not know what those decisions are, they cannot tell when they are working against the hardware.

The practical approach Ledin recommends is a two-layer architecture. “Use the most expressive code at the edges of the system, and in the core, use more performance-aware code.” The boundary between those layers is not always obvious in advance, and finding it usually requires benchmarking rather than reasoning. But the principle is clear: abstractions are appropriate where they preserve meaning across the team, and they become a problem where they hide costs that affect the system’s ability to meet its requirements.

One specific pattern worth knowing is the array of structures versus structure of arrays tradeoff. A common data layout is an array of objects, where each object holds all the fields for one entity. For CPU cache efficiency, it can be significantly better to restructure this as a structure of arrays, where each field is stored as a separate array for all entities. “That might have a big impact on performance,” Ledin notes, because the CPU cache loads contiguous memory, and if the algorithm is operating on one field across many entities, the structure of arrays layout means each cache load is full of useful data rather than fields the algorithm is not touching.
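The two layouts look like this in code. The sketch is in Python to show the shape of the transformation rather than to measure it: Python objects are not packed like C structs, so the speedup itself would show up in a language with direct control over memory. `Particle`, `Particles`, and `to_soa` are hypothetical names.

```python
from array import array
from dataclasses import dataclass
from typing import List


@dataclass
class Particle:
    """Array of structures: one record per entity, all fields together."""
    x: float
    y: float
    z: float
    mass: float


@dataclass
class Particles:
    """Structure of arrays: one contiguous array of doubles per field."""
    x: array
    y: array
    z: array
    mass: array


def to_soa(items: List[Particle]) -> Particles:
    return Particles(
        x=array("d", (p.x for p in items)),
        y=array("d", (p.y for p in items)),
        z=array("d", (p.z for p in items)),
        mass=array("d", (p.mass for p in items)),
    )


def total_mass_aos(items: List[Particle]) -> float:
    return sum(p.mass for p in items)   # walks every record to reach one field


def total_mass_soa(ps: Particles) -> float:
    return sum(ps.mass)                 # walks a single contiguous array
```

In a packed layout with four 8-byte fields per record, a 64-byte cache line holds two records, so a loop that reads only `mass` uses 16 of every 64 bytes it loads. In the structure-of-arrays layout the same line holds eight `mass` values and every byte is used.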

The skills that will matter when the hardware changes again

The specific technologies that matter five years from now are difficult to predict, and Ledin is honest about that. “Four years ago when the previous version of my book came out, it was not at all clear to me, or I think a lot of people, what was going to be happening with AI in the coming years,” he says. Predicting which hardware architectures or AI frameworks will dominate is not the point. Building the mental model to understand them when they appear is.

The foundational skill is the ability to reason across abstraction layers. “The way to really understand the system requires the ability to reason across all of the abstraction layers from the software framework that you are working on at the top level, all the way down to the hardware that runs the code,” Ledin underscores. That does not mean reading assembly code for every application. It means understanding how pipelines and caches work and orienting code to work within those environments rather than against them.

The other shift is heterogeneous computing. Writing code that runs on a CPU is no longer sufficient context for many engineering problems. “It is also becoming more critical to understand heterogeneous computing environments,” Ledin says. “It is not just writing code that runs on a CPU. You might also have code that interacts with the GPU if you are running a parallelized algorithm on that, whether it is a language model or something else.” Domain-specific accelerators, TPUs, RISC-V implementations, and specialized inference chips are all becoming part of the environments that production engineers have to reason about. The engineers who will be most effective in that landscape are the ones who understand why those architectures make the tradeoffs they do, not just how to call their APIs.

This article is based on Deep Engineering #46. The full issue includes additional insights from Jim Ledin on modern computer architecture and AI infrastructure.

Why Senior Engineers Fail System Design Interviews

Saqib Jan — Tue, 19 May 2026 20:19:10 GMT

Most engineers presume that because they know their tech stack well enough, the system design interview will be easy. And why should they think any differently? They have shipped distributed systems at scale, debugged race conditions at 3am, and made the architectural calls that kept production stable under pressure. But then they walk into a system design interview with confidence and walk out having failed, often without understanding exactly why.

Archit Agarwal, Principal Member of Technical Staff at Oracle where he builds ultra-low-latency authorization services in Go, has interviewed hundreds of engineers. His assessment of why experienced engineers fail is direct: they do not fail because they do not know what Kafka is or how DynamoDB handles consistency. They fail because of how they communicate. That single factor, how clearly and deliberately an engineer narrates their thinking, determines the outcome of most system design interviews more than any technical knowledge does.

They jump to solutions before understanding the problem

Agarwal described a pattern he sees play out repeatedly across interviews at every level of seniority. An interviewer gives a problem and within thirty seconds the candidate is already saying “I’ll use Redis, I’ll use Kafka, let’s go with microservices.” The interviewer has not said anything about scale. The candidate has not asked how many users the system needs to support, whether it is read-heavy or write-heavy, what the latency requirements are, or whether there are compliance constraints based on the geography of operation. They have skipped the part of the conversation that actually determines what should be built.

Those questions are not warm-up questions. They are the questions that drive the architecture. Nonfunctional requirements determine architecture, not the other way around. How many requests per second, what consistency model you need, whether you have a strict latency ceiling: these are the inputs. The architecture is the output. Engineers who skip to the output without gathering the inputs are designing in a vacuum, and the interviewer can see it the moment it happens. Agarwal’s recommendation is to spend the first one to two minutes of any system design interview doing nothing but alignment: gather functional requirements on what is being built and what the user actually needs, then gather nonfunctional requirements on scale, consistency, latency, and compliance. If you ask the right questions in those first two minutes, Agarwal says, you have already impressed the interviewer. They are listening properly now, engaged, and following where you are going rather than waiting for you to stumble.

They design everything at Google scale

Senior engineers have worked on large systems and that experience is genuinely valuable, but it also creates a bias that hurts them in interviews: the instinct to design for the most demanding possible version of any problem, whether the problem actually requires it or not. Agarwal is direct about this. Not every system needs to scale to Google. If you are designing an internal tool that will only ever be used by your company’s engineers, you do not need multi-region deployment, and you do not even need cloud infrastructure. You could run it on a local area network and it would be perfectly adequate for the problem at hand. The engineer who reaches for global infrastructure for a problem that does not need it is demonstrating a failure of judgment, not a depth of knowledge.

Good system design is about matching the architecture to the requirements you gathered in those first two minutes, not about showcasing every pattern you have ever learned across a career. The interviewer is not evaluating whether you know how to design at Google scale. They are evaluating whether you understand when to use which level of complexity and why, and that distinction is entirely invisible if you default to maximum complexity regardless of the constraints in front of you.

They go quiet when they are thinking

Senior engineers are often comfortable sitting with a difficult problem for several minutes before speaking, and in a production context that is a perfectly reasonable way to work through something complex. In a system design interview it reads as disengagement, and the interviewer has no way to tell whether you are making progress or whether you are stuck. Agarwal uses a phrase that reframes what good communication looks like in this context: the interviewer needs to be able to follow your brain’s commit history. Every decision you make, every trade-off you consider and reject, every assumption you surface and then validate or invalidate, should be spoken out loud as you make it, not as a performance or a monologue but as a live narration of your actual reasoning as it happens.

This serves two distinct purposes. It gives the interviewer genuine insight into how you think rather than just what conclusion you eventually reached, which is what they are actually evaluating. And it forces you to be more precise about your own reasoning, because articulating a decision out loud surfaces the assumptions underneath it in a way that thinking silently does not. Agarwal’s observation is that engineers who think out loud often catch their own errors in real time and self-correct naturally, and that self-correction is not a weakness. It is exactly the kind of flexible, honest thinking the interviewer is looking for.

They defend their design when constraints change

Experienced engineers have ownership instincts built over years of shipping and defending decisions in production. When they have built something they defend it, and in most professional contexts that instinct is appropriate. In a system design interview it becomes a liability the moment the interviewer introduces a constraint change mid-session, which Agarwal says he genuinely enjoys doing precisely because it reveals something important about the candidate.

Changing constraints are the normal reality of production engineering. Requirements shift, scale changes, new compliance requirements appear, and the ability to absorb a change, restate it clearly to confirm alignment, identify which parts of the design need updating and which parts remain intact, and then restructure calmly is exactly the capability that distinguishes an engineer who can operate in a real production environment from one who can only design under controlled conditions. The engineers who struggle here are the ones who treat the curveball as an attack on their design and respond by defending the original rather than adapting to the new information. Agarwal’s point is unambiguous: the interviewer is not trying to invalidate your architecture. They are trying to see whether you can hold your design lightly enough to change it when the situation demands it, which is something you will be required to do repeatedly in any engineering role worth having.

They use jargon to sound credible instead of clarity to be understood

Senior engineers have large vocabularies built from years of working across complex systems. Distributed systems, eventual consistency, CQRS, saga pattern, two-phase commit. These are real concepts with real meanings and knowing them is genuinely useful. But using them in rapid succession without grounding them in the specific problem being discussed is a signal that the engineer is performing knowledge rather than applying it, and experienced interviewers recognise the difference immediately.

Agarwal’s standard for communication in a system design interview is demanding but correct: your explanation should be clear enough that even a junior engineer could follow the reasoning without needing to already know the answer. Not dumbed down, and not simplified to the point of inaccuracy, but clear enough that every choice is grounded in the specific requirements of the system being designed rather than in a general desire to demonstrate familiarity with advanced concepts. The engineers who stand out in Agarwal’s interviews are not the ones with the most impressive vocabulary. They are the ones who make him feel like he is sitting with another engineer genuinely working through a problem together, which is exactly what a system design interview is supposed to be.

The full conversation with Archit Agarwal is now live on Deep Engineering.

Rust Is Hard for the Engineers with the Most Experience

Saqib Jan — Mon, 18 May 2026 16:07:41 GMT

Rust, who would have thought, has ranked as the most loved programming language in the Stack Overflow developer survey for nine consecutive years. Honestly, I must admit this is an unusual kind of statistic because it measures not just adoption but retention. The engineers who use Rust want to keep using it, and that pattern has only deepened even as the language moved from systems programming curiosity to production infrastructure at companies including Amazon, Google, Meta, and Microsoft. Interestingly, the Linux kernel now carries Rust code without the experimental label it held for years. Debian’s APT package manager is also introducing hard Rust dependencies this year.

The performance benchmarks and the memory safety arguments have been made, tested in production, and largely validated. But what the benchmarks do not explain is why so many experienced engineers find Rust genuinely difficult to work with, why the teams that adopt it often go through a period where velocity drops before it recovers, and what it actually takes to get good at it rather than just competent. These are the questions that sit behind the adoption numbers and they matter more than the numbers do for any engineer thinking seriously about where Rust fits in their work.

Evan Williams, author of Design Patterns and Best Practices in Rust, has been writing software for more than 40 years and came to Rust while building a hardware system that needed to be rock solid and run in a remote location without anyone able to access it. Francesco Ciulla, author of The Rust Programming Handbook and a Docker Captain who previously worked at the European Space Agency on the Copernicus project, started publishing Rust content in 2022 and 2023, earlier than most, and has since watched the language’s adoption from the inside. Their perspectives on Rust come from different parts of the stack and different kinds of work, but on the questions that actually trip engineers up, they are in close agreement.

We interviewed Evan Williams and Francesco Ciulla separately for issues of the Deep Engineering newsletter.

The engineer who struggles most is usually the most experienced one

The reasonable assumption when a team introduces Rust is that the senior engineers will pick it up fastest. They have the most context, the most pattern recognition, and the most experience navigating unfamiliar codebases. In practice, the opposite tends to happen, and both Williams and Ciulla have seen it play out firsthand.

“The more experienced you are, the more years you have doing something in some other language, the more trouble you’re likely to have,” Williams says, “because you have patterns of thought that come from those languages that you don’t even realize are there.” The problem is not that experienced engineers consciously try to apply Java or C++ patterns to Rust. The problem is that those patterns are invisible to them, baked in over years of use until they no longer register as choices at all. The engineer is not making a decision when they reach for inheritance or shared mutable state. They are doing what has always worked, and Rust will not let them.

Ciulla puts it more directly. “Even if you are a senior developer, even if you have twenty years of experience, if you want to try to learn Rust comparing it to other programming languages, you will fail, because it’s like learning something which is completely new.” He makes the case that this is not a reason to avoid Rust but a reason to go in with a specific kind of openness, one that experienced engineers often find harder to maintain than junior ones do precisely because they have more to unlearn. A developer learning Rust as their second or third language has far less of a competing mental model to discard. A senior engineer with a decade of experience in Java has to dismantle instincts that have been reliable for years before they can build new ones, and that dismantling is the work that most people underestimate going in.

Trusting the compiler is not a beginner tip

The first thing most engineers do when the borrow checker rejects their code is look for the minimum change that will make it compile. That is the right approach in almost every other language and the wrong one in Rust, and it is where a significant amount of early frustration comes from.

“The golden rule is to trust the compiler, especially at the beginning,” Ciulla says. What he means is not passive acceptance but active reading. The borrow checker is not producing noise. It is producing information about what the program’s structure requires, and engineers who learn to read it that way move through the learning curve faster than engineers who treat every error as an obstacle to clear. The difference is subtle at first and significant over time.

Williams argues that the borrow checker is doing something more useful than preventing bugs. “The borrow checker is your friend because it prevents you from making a messy design. It prevents you from making a broken design. It prevents you from writing whole classes of bugs that you will then spend many hours trying to find,” he explains. “I have found it to be an incredible partner in writing code that allows me to sleep at night.” The reason it works this way is that Rust’s ownership rules enforce a discipline that experienced engineers in other languages apply selectively and inconsistently because those languages do not require it. A value has one owner. References are either shared and immutable or exclusive and mutable, never both. The compiler will not proceed until the code is explicit about who owns what and when.

“The principles that the borrow checker forces you to adhere to in Rust are the exact principles that you should be using in every programming language,” Williams reasons. “But you don’t have to. So it’s very easy to not think about those things.” That observation reframes what the borrow checker is. It is not an imposed restriction. It is a discipline that good engineers apply in other languages by habit and judgment, made non-negotiable and automatic in Rust.
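The ownership rules described above can be sketched in a few lines. This is a minimal illustration rather than anything taken from either interview, and every function name in it is made up for the example. The commented-out lines are the ones the compiler would reject.

```rust
/// Takes ownership of the vector: the caller cannot use it afterwards.
fn consume(readings: Vec<u32>) -> usize {
    readings.len()
}

/// Shared, immutable borrow: any number of these may coexist.
fn total(readings: &[u32]) -> u32 {
    readings.iter().sum()
}

/// Exclusive, mutable borrow: only one may exist, with no shared borrows alongside it.
fn record(readings: &mut Vec<u32>, value: u32) {
    readings.push(value);
}

fn ownership_demo() -> (u32, usize) {
    let mut readings = vec![1, 2, 3];

    // Two shared borrows at the same time are fine.
    let first = &readings;
    let second = &readings;
    let sum_before = total(first) + total(second);

    // The shared borrows are not used again, so an exclusive borrow is allowed here.
    record(&mut readings, 4);

    // Uncommenting the next line would not compile (error E0502): `first` would
    // still be a live shared borrow across the mutable borrow above.
    // let _ = first.len();

    // Ownership moves into `consume`; `readings` is unusable after this call.
    let count = consume(readings);
    // let _ = readings.len(); // would not compile (error E0382: use after move)

    (sum_before, count)
}
```

Calling `ownership_demo()` returns the sum computed through the two shared borrows and the element count after the push. The point of the sketch is the two commented-out lines: in many other languages both would be accepted and would only misbehave, if at all, at runtime.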

The discipline extends from individual functions to the shape of the whole system. A program that handles ownership correctly at the function level has to handle it correctly across modules, across threads, and across component boundaries, because the same rules apply everywhere. “You need to think about who controls what, how it is controlled, and you need to start from the very beginning thinking about the boundaries of your program and the system architecture, dividing things up into areas of responsibility,” Williams underscores. “Because unlike Python or Java, you can’t have links going all over the place. The borrow checker is never going to accept that.” The result is that well-written Rust systems tend toward a specific architectural shape: data flows in one direction, ownership chains move forward and do not loop back, and the behavior of the system is legible from its structure in a way that systems with shared mutable state often are not.

The most underutilized expression of what this makes possible is the typestate pattern. It uses the type system to encode the state of a value at compile time in a way that makes invalid state transitions not just errors but programs that cannot be compiled at all. Williams reflects on it with visible enthusiasm. “It’s a way of developing state machines and systems that have state that evolves where invalid state transitions aren’t just errors, they’re impossible to write. The compiler won’t compile them,” he says. “It represents a huge advance in the way that such systems are written because now instead of runtime errors, you have a state machine that is guaranteed to work because every transition either is a valid transition or it won’t even compile. That’s an amazing thing.” The pattern was not invented for Rust, but the language’s ownership system and type handling make it practical in a way that other languages do not, and for systems where invalid state transitions are genuinely dangerous rather than merely inconvenient, it is one of the most concrete expressions of what Rust makes possible.
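A small sketch makes the typestate idea concrete. The valve below is a hypothetical example, not one from Williams’s book: its state is a type parameter, each transition consumes the old value and returns a new one, and methods exist only on the states where they are valid.

```rust
use std::marker::PhantomData;

// State markers: zero-sized types that exist only at compile time.
struct Closed;
struct Open;

/// A hypothetical valve whose state is part of its type.
struct Valve<State> {
    litres_passed: u32,
    _state: PhantomData<State>,
}

impl Valve<Closed> {
    fn new() -> Self {
        Valve { litres_passed: 0, _state: PhantomData }
    }

    /// Consumes the closed valve and returns an open one.
    fn open(self) -> Valve<Open> {
        Valve { litres_passed: self.litres_passed, _state: PhantomData }
    }
}

impl Valve<Open> {
    /// Only an open valve can pass liquid.
    fn pass(mut self, litres: u32) -> Self {
        self.litres_passed += litres;
        self
    }

    /// Consumes the open valve and returns a closed one.
    fn close(self) -> Valve<Closed> {
        Valve { litres_passed: self.litres_passed, _state: PhantomData }
    }
}

impl<State> Valve<State> {
    /// Reading the counter is valid in every state.
    fn litres_passed(&self) -> u32 {
        self.litres_passed
    }
}

fn typestate_demo() -> u32 {
    let valve = Valve::<Closed>::new().open().pass(5).pass(7).close();
    // valve.pass(1); // would not compile: no method `pass` on `Valve<Closed>`
    valve.litres_passed()
}
```

Because `open` and `close` take `self` by value, the old state cannot be reused after a transition, which is where the ownership system does the work. The marker types carry no data, so the state tracking has no runtime cost.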

What Rust actually gives you in production

Ciulla’s case for Rust is grounded in things he measured when running a Rust web server on his own machine. It was consuming four megabytes at rest and five in production. “If you have a droplet with one gigabyte of RAM, you can have 200 plus services,” he notes, “of course in idle, but this proves that if you have a service that consumes a lot of RAM, it is worth thinking about.” For teams running infrastructure where memory costs money and density matters, the difference between a Rust service and an equivalent service in a garbage-collected language is not marginal.
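The arithmetic behind the “200 plus” figure is simple enough to sketch. The function name is illustrative, one gigabyte is taken as 1,024 megabytes, and the estimate ignores the memory the operating system itself needs, so it is an upper bound rather than a capacity plan.

```rust
/// Rough count of idle services of a given footprint that fit in a memory budget.
/// Returns `None` if the per-service footprint is zero.
fn services_that_fit(total_ram_mb: u32, per_service_mb: u32) -> Option<u32> {
    total_ram_mb.checked_div(per_service_mb)
}
```

With the five-megabyte footprint Ciulla measured, `services_that_fit(1024, 5)` gives 204, which matches his estimate.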

The latency story is also specific. “By not having a garbage collector on the back end side, you basically have a flat latency,” Ciulla observes. “If a user makes an HTTP request when the garbage collector starts, it will experience a higher latency. Rust removes that problem entirely.” Go and Node.js both have garbage collectors that pause for collection cycles, and even short pauses are enough to introduce latency spikes for users who are unlucky enough to send a request at the wrong moment. Rust’s absence of a garbage collector means the latency profile is predictable rather than probabilistic, which matters significantly for services where consistency is as important as average throughput.

The deployment model is simpler than most engineers expect going in. A Rust project built with cargo produces a standalone binary for the target architecture, which packages cleanly into a container image. “If you build the executable when you build the Docker image, you have something which is just deployable everywhere,” Ciulla says. “A Linux executable running in a Docker container. That’s the dream.” The operational benefit is smaller images, faster startup, and a runtime with almost no overhead beyond the binary itself.

Williams approaches the production question from the correctness angle rather than the performance angle. The systems where Rust earns its place most clearly are the ones where failure has a real cost. “Systems that are mission critical in some way or other are really key Rust use cases,” he says. “All of these features combine into a whole that make Rust a really powerful language for doing things that have to work. Things where failure is monetarily or in human cost even a terrible problem.” The memory safety and the ownership model and the compile-time guarantees are not separate features. They are different expressions of the same underlying commitment: the program either demonstrates its correctness to the compiler or it does not compile.

Williams also reflects on something unexpected he discovered while writing the early chapters of his book, the ones covering what not to do in Rust. He went back and deliberately tried to write bad code, the kind of code that would illustrate the mistakes he was cautioning against, and found it harder than he expected. “When I went back and tried to write bad code in Rust, it was much harder than writing the good code,” he recalls. “That’s an interesting perspective that just didn’t even occur to me.” The language’s constraints push code toward a particular shape so consistently that departing from it requires actively working against the grain of the language rather than simply making a poor choice.

“The biggest benefit in Rust is about the lack of the debugging depth. You spend more time thinking up front, but you spend almost zero time chasing segfaults or memory leaks in production,” Ciulla remarks. “And we always underestimate this part. We always talk about the efficiency of the code, but if you need less time to debug your code, you’re basically writing more logic at the end of the day.” The upfront investment in getting the types and the ownership right is real, but the downstream debugging cost it removes is larger and does not diminish as the team becomes more experienced. It is simply gone.

Where to start and where Rust is the wrong tool

On the practical question of how to bring Rust into an existing codebase, both Williams and Ciulla give advice that converges almost exactly despite coming from different engineering contexts. Neither recommends starting with a rewrite.

“The best way to introduce Rust in a big project is to find that hard part that’s the bottleneck and try to write one single service in Rust,” Ciulla says. “And then you will see, probably slowly, Rust might take over your code base, but I mean this in a good sense.” Williams makes the same point with a specific warning about the temptation to go faster. “What you don’t want to do is jump into saying, we’re just going to rewrite our project in Rust now. Pick a small piece, focus on that, gain confidence and mastery of the language, and then use that to build upon it and start bringing in more things,” he says. Starting with a bounded, non-critical component gives the team room to move through the learning curve without the pressure of a production incident concentrating everyone’s attention on the wrong things.

Ciulla adds something worth noting about the AI-assisted workflow that is becoming standard for many engineers. “In this AI era, everyone is rushing stuff with AI, but you still need the validation,” he says. “Okay, AI wrote this Rust service, but now who decides if this is okay to put in production? Of course, you need the validation of an expert.” The Rust compiler catches a large class of errors automatically, but the errors that survive it, logic errors rather than memory errors, still require someone who understands the language well enough to see what the code is actually doing. Having at least one engineer on the team who knows Rust well enough to review AI-generated code is not optional.
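A hypothetical example of the kind of error that survives the compiler: both functions below are memory safe and type check cleanly, but only one of them does what its name says. The pricing scenario and the function names are invented for illustration.

```rust
/// Intended to return the price after a percentage discount.
/// It compiles without warnings but is wrong: it returns the discount
/// amount, not the discounted price.
fn discounted_price_buggy(price_cents: u64, percent_off: u64) -> u64 {
    price_cents * percent_off / 100
}

/// Corrected version. The discount is capped at 100 percent so the
/// subtraction cannot underflow.
fn discounted_price(price_cents: u64, percent_off: u64) -> u64 {
    let percent = percent_off.min(100);
    price_cents - price_cents * percent / 100
}
```

For a 1,000-cent item at 20 percent off, the first function returns 200 and the second returns 800. Nothing in the borrow checker or the type system distinguishes them, which is why a reviewer and a test suite are still needed.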

Both are also direct about when Rust is not the right choice. Ciulla points to tight deadlines and fast prototyping as the clearest case against it. “If you need fast prototyping, you are familiar already with Java, JavaScript, why don’t you use it?” he says. “When the deadline is so close, probably it’s not the best way to try something new because something would go wrong, especially if you’re not an expert.”

Williams points to user interfaces as an area where the ecosystem is still catching up and the tooling gaps are large enough to make other languages more practical. “Doing a website in Rust is still kind of a feat,” he notes, “and it’s an awful lot easier to use the tools that everybody else is using to accomplish that goal.” The Python data science ecosystem is another area Ciulla names directly: the libraries are simply better established there, and using Rust for data science work means building against a thinner set of available tools than Python provides.

Where the ecosystem is headed

Ciulla has a clear view of where Rust will grow most in the near term, and his prediction lands in a direction that surprises most of the Rust community. “I think the next big wave might be in web development,” he says, adding that he is aware this is an unpopular position in a community that still thinks of Rust primarily as a systems language. His reasoning is grounded in what he has been seeing directly: companies with hundreds of developers reaching out to tell him they are moving services to Rust for their web backends. “I get this news because I’m well known for talking about Rust and being quite vocal about it,” he observes. “I’m not talking about a person just doing this on a random Saturday night. I’m talking about companies that have hundreds of developers.” He points to Axum as the framework that has matured to the point where he would now use it in a production SaaS product, which he says was not true two years ago. Embedded systems, in his view, have already crossed the threshold where Rust’s place is settled.

Williams takes a longer view on how the ecosystem will evolve. “The ecosystem is going to get richer and people are going to be branching out in the set of use cases, hitting areas that right now Rust has relatively weak support for,” he says. “As larger and larger projects are built, there is going to be more refinement of the language itself, but more importantly, more refinement of the use of the language.” The patterns that make Rust work well at scale are still being discovered and codified. The language is stable, but the understanding of how to use it well is still developing, and that development is happening inside the teams building the largest Rust codebases.

What both conversations point toward is a language whose difficulty and whose value come from the same source. Rust is hard to learn for experienced engineers because it refuses to accommodate the habits that made them experienced. It is valuable in production because that same refusal, enforced by the compiler on every build, produces code whose behavior is predictable, whose data flows are legible, and whose failure modes are constrained to things the language cannot check rather than things the engineer forgot to check. The engineers who get the most out of it are the ones who stop trying to carry their existing instincts across and start letting the compiler teach them what the program actually needs.

In case you missed it

Here’s the full interview video featuring Evan Williams.

Clean Code Is a Trap, Decompose Instead for Physics and Performance

Saqib Jan — Thu, 23 Apr 2026 15:15:59 GMT

Engineering teams obsess over clean code because they want software to look organized and logical in the text editor. Principles like SOLID get followed strictly, and hours get spent debating folder structures, because it feels like the disciplined way to build software. But this desire for logical cleanliness often leads into a trap where teams build systems that are beautiful to read but terrible to run.

The most maintainable codebases are not the ones that adhere to a style guide. They are the ones that respect the physical and cognitive reality of the environment they live in.

We spoke separately with two notable engineers who think about this from very different directions. Sam Morley, a mathematician and C++ researcher at the University of Oxford, approaches software from the ground up, where the cost of every abstraction shows up immediately in performance metrics. Sándor Dargó, a senior software engineer at Spotify who works on large-scale C++ systems, approaches it from the maintainability side, where the cost of every abstraction shows up in the engineers who have to live with the code months or years later.

The two conversations happened at different times and on different topics, but both arrived at the same conclusion: logical cleanliness is not the goal. Understanding what the machine and the team actually need is.

Your CPU does not care how tidy your objects look

Sam Morley’s starting point is hardware, and his argument is that the way most engineers are taught to structure code works directly against the way processors are designed to access memory.

The instinct is to group data into objects because it models the real world. A Player class holds position, health, velocity, and inventory in one contiguous block, because those things belong together conceptually. But the CPU fetches data in contiguous blocks called cache lines, and if the object structure fills that cache line with data the processor does not need for the current operation, the application pays for it in cycles. The cost is invisible in code review but shows up immediately in a profiler under load.

Morley points to the Structure of Arrays pattern, common in game development, as the counterintuitive solution. Instead of an array of Player objects, you create separate arrays for positions, health values, and velocities. This looks messy to a developer trained in object-oriented design. It violates the instinct to keep related data together, and it produces code that does not map neatly onto the real-world entities it represents. But it allows the CPU to process data significantly faster because every byte in a fetched cache line is a byte the processor actually needs. Cache locality, not conceptual tidiness, determines throughput under real conditions.
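The two layouts can be sketched side by side. The example is written in Rust for illustration, although Morley’s work is in C++, and the field sizes are assumptions chosen to make the contrast visible. Cache lines are commonly 64 bytes on current desktop and server processors.

```rust
/// Array-of-structures layout: each player's fields sit together in memory.
/// A movement update touches only 16 of this struct's 84 bytes.
struct Player {
    position: [f32; 2],
    velocity: [f32; 2],
    health: u32,
    inventory: [u32; 16], // cold data that a movement update never reads
}

/// Structure-of-arrays layout: one contiguous array per field.
struct Players {
    positions: Vec<[f32; 2]>,
    velocities: Vec<[f32; 2]>,
    healths: Vec<u32>,
    inventories: Vec<[u32; 16]>,
}

/// Movement update over the object layout: every fetched cache line also
/// drags in health and inventory bytes the loop does not need.
fn step_aos(players: &mut [Player], dt: f32) {
    for p in players.iter_mut() {
        p.position[0] += p.velocity[0] * dt;
        p.position[1] += p.velocity[1] * dt;
    }
}

/// Movement update over the split layout: the loop streams through two
/// dense arrays and never touches health or inventory data.
fn step_soa(players: &mut Players, dt: f32) {
    for (pos, vel) in players.positions.iter_mut().zip(players.velocities.iter()) {
        pos[0] += vel[0] * dt;
        pos[1] += vel[1] * dt;
    }
}
```

Both functions compute the same result. Whether the second is meaningfully faster depends on the workload and the hardware, so the choice is worth confirming with a profiler rather than assuming.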

Morley’s recommendation is direct: be willing to break clean object models when the hardware requires it. The machine is not going to adapt to the abstraction. The abstraction has to adapt to the machine. And this is not a concern limited to embedded engineers or game studios. It is a reality for any C++ system under sustained load, and the gap between what looks clean and what runs efficiently widens as the scale increases. Teams that do not understand this distinction tend to optimize the wrong things when performance problems eventually surface.

Clever code is a debt that Future You will have to repay

Morley’s second argument shifts from CPU cost to cognitive cost, and it is the more insidious of the two because it compounds slowly and invisibly until a maintenance crisis makes it visible all at once.

His framing here is precise. Future You is a completely different person who has lost all the context that made the current design feel obvious at the time it was written. The engineer writing the code holds the whole system in their head. The engineer returning to it six months later does not. And the engineer reading it for the first time never did. Every clever abstraction that felt natural in the moment of writing becomes a reconstruction problem for every reader who comes after.

Template-heavy code and metaprogramming are the most common forms of what Morley calls Wizardry. The name is apt because Wizardry works by concealment. The complexity does not disappear when abstracted away. It becomes invisible until someone needs to debug or extend the system, at which point the engineer is starting from a significant disadvantage with no clear view of how data actually moves through the code. What Morley advocates instead is Process Awareness: code that exposes the data flow clearly rather than hiding it behind layers of indirection. Not short code or smart code. Code whose execution model is obvious to the next engineer who reads it, regardless of whether that engineer was involved in writing it.

The practical implication is to treat Future You as a first-class stakeholder in every design decision. Documentation that explains what the code does is far less valuable than documentation that explains why it is structured the way it is, because the what is usually legible from the code itself. The why rarely is.

When cognitive load becomes your biggest bug

Sándor Dargó approaches the same problem from a different direction but arrives at the same place. His work at Spotify on large-scale C++ systems has given him a practitioner’s view of what happens to codebases over time when cognitive cost is not treated as a first-class engineering concern from the start.

For Dargó, the thread connecting clean code, binary size, undefined behavior, and C++ language evolution is a single idea: reducing complexity in real-world systems. Not as an aesthetic preference, but as a measurable engineering outcome with consequences for how fast teams can move, how safely they can refactor, and how much institutional knowledge survives when people leave. “If you think about clean code, it clearly reduces the cognitive load,” Dargó said during a recent Deep Engineering interview. “If you think about binary size, it might reduce operational cost. New standards like C++23 and C++26 reduce boilerplate and enable safer, more readable abstractions. All of these topics make large C++ systems more maintainable and more evolvable.”

The connection between these concerns is not accidental. Binary size reduction often leads teams toward simpler code as a side effect, because the practices that reduce binary size, avoiding unnecessary template instantiation, being deliberate about what gets inlined, minimizing heavy type erasure, also tend to reduce the number of moving parts an engineer has to hold in mind. The discipline required to keep a binary small and the discipline required to keep a codebase readable are more closely related than most teams realize until they have worked on both problems at the same time.

Dargó’s warning is about the human cost of poor abstraction choices, and in his experience, teams routinely optimize the wrong things because they measure the wrong variables. The heap allocation is visible. The cost of a network request made inside a loop is harder to see until a profiler makes it undeniable. During our interview, Dargó cited Amdahl’s Law to make the point concrete: the overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that part is actually used. The engineers spending time on heap allocations while making network requests in a loop are not being careless. They are solving the problem they can see. The discipline is in learning to find the problem that actually matters, which requires measurement rather than intuition. “If your code takes a long time to execute due to network latency, then relatively speaking, the heap allocation is not so slow anymore,” Dargó said. “Don’t worry about things that don’t really matter in a given environment.”
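Amdahl’s Law can be written as a one-line function. In the sketch below, `p` is the fraction of total runtime spent in the part being optimized and `s` is how many times faster that part becomes. The percentages used afterwards are hypothetical numbers chosen to mirror Dargó’s example, not measurements from any real system.

```rust
/// Overall speedup when a fraction `p` of the runtime (0.0 to 1.0)
/// is made `s` times faster, according to Amdahl's Law.
fn amdahl_speedup(p: f64, s: f64) -> f64 {
    1.0 / ((1.0 - p) + p / s)
}
```

Suppose heap allocation accounts for 2 percent of a request and network waits for 90 percent. Making allocation ten times faster gives `amdahl_speedup(0.02, 10.0)`, an overall gain of under 2 percent. Halving the network time gives `amdahl_speedup(0.9, 2.0)`, roughly a 1.8x improvement. Even an infinitely fast allocator could not beat 1 / 0.98, which is about 1.02x.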

Write it readable first, then measure, then and only then optimize

Dargó’s practical framework for navigating these trade-offs is structured around a clear hierarchy of defaults, and the first default is unambiguous: readable code comes first.

His reasoning is grounded in a simple observation that engineering culture tends to underweight. Engineers read code far more often than they write it. Every decision that makes code harder to read imposes a recurring cost on every future reader, and that cost accumulates over the lifetime of the codebase. Defaulting to readability is not a concession to comfort. It is an engineering position with compounding returns, because code that is easy to read is code that is easy to reason about, and code that is easy to reason about is code that is safer to change.

The second principle follows directly: if optimization is necessary, measure before touching anything. The trap is optimizing before a measurement has confirmed that the thing being optimized is the actual problem. This wastes time, introduces unnecessary complexity, and often leaves the real bottleneck untouched. Measure first, identify the hot path, and only then begin the optimization work. Once the hot path is identified, keep it isolated and document the reasoning behind every trade-off made there. Not documentation that explains what the code does, but documentation that explains why it is structured the way it is, so the next engineer understands what they would be giving up if they cleaned it up.
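As a rough illustration of measuring before optimizing, the sketch below times individual stages with the standard library’s wall-clock timer and reports the slowest one. The helper names and the stage labels are assumptions made for the example, and a real investigation would normally use a proper profiler and repeated runs rather than a single timing.

```rust
use std::time::{Duration, Instant};

/// Runs `work` once and returns its result with the elapsed wall-clock time.
fn timed<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = work();
    (result, start.elapsed())
}

/// Returns the label of the slowest stage, or `None` if nothing was measured.
fn slowest_stage<'a>(timings: &[(&'a str, Duration)]) -> Option<&'a str> {
    timings
        .iter()
        .max_by_key(|(_, elapsed)| *elapsed)
        .map(|(name, _)| *name)
}
```

A caller would wrap each stage in `timed`, collect the labelled durations, and pass them to `slowest_stage` before deciding where to spend optimization effort. The stage it names is the candidate hot path, and any unusual code written there is the code that needs a comment explaining why.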

Dargó has been on the receiving end of the alternative. He came into a codebase, saw code that looked wrong, began cleaning it up, and realized too late that the seemingly redundant choice was affecting binary size in a way that mattered for the system. Pull requests had already merged before the context became clear. “Make trade-offs conscious,” Dargó said. “Make them explicit in code reviews, but also in the code itself. If you sacrifice the clarity you aim for, document why. Because otherwise someone later will come in and make it cleaner, unaware of why certain choices were made.”

And this principle has become more critical in the age of agent-assisted development. If engineers can miss the intent behind an undocumented trade-off, then an AI agent working on the same codebase will miss it with far greater confidence. Agents read what is in the code. They do not have access to the Slack conversation where the binary size constraint was first discussed, or the code review thread that was resolved and then deleted. The context has to be in the code, because that is the only place every future reader, human or agent, will reliably look.

The invisible tax that is getting harder to ignore

Morley and Dargó are describing the same underlying problem from different directions. Every time an engineer has to reconstruct context that was lost, the system has failed them. Morley calls it the Future You constraint. Dargó calls it cognitive load. The mechanism is identical in both cases, and the cost is real even when it does not appear in any metric the team currently tracks.

This cost has become harder to ignore in the last year or two, and not only because systems have grown more complex. Dargó observed during the same session that the shift to AI-assisted development has made context switching materially worse for most engineers, and the profession has not yet fully reckoned with what that means for how software gets built. Engineers are managing multiple agent sessions simultaneously, jumping between prompts and code reviews, moving from one incomplete task to another before any of them reach resolution. The flow state that reliable engineering has always depended on, the gradual accumulation of a mental model, the ability to hold a system’s behavior in mind long enough to reason about it clearly, gets interrupted more frequently and at shorter intervals than at any point in most engineers’ careers.

“We became, often, just prompters,” Dargó said. “Many of us complained even before that we are living in a world of constant context switching. But it just became even worse. You keep jumping from one window to another, from one meeting to another, because others are also moving faster. At least they think they move faster.”

The irony embedded in that observation is significant. The tools promising to accelerate delivery are simultaneously increasing the interruption rate that undermines the deep work required to produce reliable software. Speed and depth are being traded against each other, and the trade is often invisible until the consequences show up in the codebase months later.

In our live interview, Dargó also referenced a research finding that makes the dynamic concrete. Engineers who adopt AI-assisted workflows tend to ship more code early on, because the friction of writing has dropped. But code quality drops alongside it, and the initial speed advantage disappears within a few months as technical debt accumulates faster than it can be serviced. “In the beginning you ship more code, because it became so much easier. But you don’t just ship more code. You ship worse code. And that gain in speed is vanishing after a few months because you start accumulating technical debt at the same time. What first seemed faster becomes not faster, but the debt stays,” Dargó said.

The answer is not to reject the tools or return to slower workflows. It is to be deliberate about what the tools are being used for and what gets left behind when they are used. Code that was generated quickly but carries no trace of why it is structured the way it is will cost someone considerably when the context is gone. The practices Morley and Dargó both advocate, keeping the hot path isolated, documenting the reasoning behind trade-offs, defaulting to the readable option unless a measurement says otherwise, are not conservative instincts. They are the engineering habits that make fast development sustainable over time rather than just in the short sprint.

And so, what this actually adds up to

Morley and Dargó are pointing toward the same conclusion from different vantage points: engineering quality cannot be measured by how organized the code looks in the editor.

Morley’s measure is hardware efficiency. Does the code respect the physical reality of how the processor accesses memory, and does it make the execution model visible to the next reader, or does it hide it behind abstractions that feel clever now but become maintenance burdens later? Dargó’s measure is team sustainability. Does the code reduce the cognitive load of the people who maintain it over time, and does it make trade-offs explicit so future engineers and future agents can understand what they would be changing if they touched it?

Clean code is not a trap because readability is wrong. It is a trap because readability without an understanding of what matters in the specific environment produces systems optimized for the wrong audience. The abstractions that feel clean in the editor are often the ones costing the most in production. And the ones that look strange in a code review are often the ones that matter most to the system’s actual behavior.

The test of a piece of code is not whether it looks clean. It is whether it helps the machine run correctly, and whether it helps the next engineer understand why it runs that way. Those two questions do not always have the same answer, but they are always worth asking together, and always worth asking before the code is written rather than after the pull request is merged.

Agentic AI Is Redefining Edge Infrastructure

Saqib Jan — Wed, 25 Mar 2026 18:13:57 GMT

Artificial intelligence is entering a new phase with agentic AI, where autonomous systems perceive, decide, act, and learn without constant human oversight, operating independently across distributed environments while collaborating with other agents in real time.

This shift from centralized AI models to distributed, autonomous agents requires a fundamental rethinking of WAN infrastructure architecture. Previous AI patterns such as centralized training clusters, cloud-based inference, and hub-and-spoke data flows are inadequate for agentic systems that must operate at the edge with speed, autonomy, and resilience.

And in these environments, the WAN is no longer just a means of connecting branch sites to core data centers. It becomes the essential fabric enabling edge agents to synchronize data, share insights, and coordinate actions, making WAN performance, availability, and adaptability critical to agentic AI effectiveness.

Distributed intelligence is edge-centric

Lee Peterson, VP of Secure WAN Product Management at Cisco, explains where the pressure lands first. Edge environments routinely face unpredictable connectivity, and agents operating in those conditions cannot wait for centralized systems to respond.

Peterson points to concrete scenarios where this plays out, from autonomous vehicle navigation systems to intelligent manufacturing floors to retail environments where AI agents manage inventory, pricing, and customer experience simultaneously. In each of these cases, he reasons, the decisions that matter most are the ones that have to be made in milliseconds, based on local conditions, often where connectivity to centralized systems is intermittent or constrained.

But the connectivity assumption is where many organizations get it wrong. Peterson recommends designing for intermittent or constrained WAN conditions rather than treating reliable connectivity as a given, and ensuring real-time path selection for critical systems such as point-of-sale, inventory sync, and IoT devices so that agents can perform automatic remediation during WAN degradation without waiting on human intervention.
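A minimal sketch of what real-time path selection could look like, assuming each site measures latency and loss per WAN link. The `PathHealth` type, the `LIMITS` thresholds, and the link names are all hypothetical illustrations, not anything Peterson or Cisco specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PathHealth:
    name: str            # hypothetical link label
    latency_ms: float    # most recent measured round-trip time
    loss_pct: float      # most recent measured packet loss
    up: bool             # whether the link currently passes health probes


# Hypothetical per-application limits; real values depend on the workload.
LIMITS = {
    "point-of-sale": {"max_latency_ms": 150.0, "max_loss_pct": 1.0},
    "inventory-sync": {"max_latency_ms": 500.0, "max_loss_pct": 3.0},
}


def select_path(app: str, paths: list[PathHealth]) -> Optional[PathHealth]:
    """Return the lowest-latency path that meets the app's limits, else None."""
    limit = LIMITS[app]
    usable = [
        p for p in paths
        if p.up
        and p.latency_ms <= limit["max_latency_ms"]
        and p.loss_pct <= limit["max_loss_pct"]
    ]
    # None signals that the agent should fall back to local-only operation.
    return min(usable, key=lambda p: p.latency_ms) if usable else None


paths = [
    PathHealth("primary", latency_ms=400.0, loss_pct=0.5, up=True),
    PathHealth("lte-backup", latency_ms=90.0, loss_pct=0.8, up=True),
]
pos_choice = select_path("point-of-sale", paths)
```

Here the degraded primary link fails the point-of-sale latency limit, so traffic moves to the backup without anyone intervening. Returning `None` when no path qualifies makes the offline case an explicit state the agent must handle rather than an error.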

Unlike traditional AI models operating on data in controlled environments, he notes, agentic systems exist in the physical world where latency is measured in milliseconds and decisions have immediate consequences. Sending data hundreds of miles to a cloud data center for processing, Peterson argues, is structurally incompatible with the real-time autonomy these systems require, because the agent must process information, evaluate options, and act locally, right where the action is happening.

And the scale of coordination compounds this further. A smart city deployment might involve thousands of agents managing traffic flow, energy distribution, and public safety simultaneously, and Peterson underscores that these agents need to share insights and coordinate actions even when network connectivity degrades.

Organizations that continue to architect around centralized control will find their agentic deployments constrained at precisely the moments that matter most, because this distributed intelligence model is inherently edge-centric and the infrastructure needs to reflect that from the start.

Compute at the edge: the foundation of agent autonomy

Agentic AI requires compute resources co-located with data sources and decision points, which means deploying high-performance processing across thousands of distributed locations including retail, manufacturing, healthcare, and transportation.

The workload requirements are diverse and demanding, covering agents performing rapid inference on streaming data, conducting local model fine-tuning based on environmental feedback, and coordinating with peer agents across locations in real time. In retail, Peterson notes, this might translate to supporting smart shelves, computer-vision inventory systems, digital signage, loss-prevention analytics, and customer-flow optimization directly at each store location, which is a significant compute footprint by any measure.

But powerful edge compute alone cannot deliver the full potential of agentic AI, and Peterson is direct about why. Without equally sophisticated networking, autonomous agents remain isolated, unable to coordinate with peers, synchronize insights, or maintain collective intelligence across distributed environments. The two investments have to be planned together, not sequenced, because the value of edge compute depends almost entirely on the quality of the network that connects it.

Networking at the edge: the nervous system of distributed intelligence

Just as compute provides the processing foundation for autonomous decisions, networking forms the connective tissue enabling multi-agent coordination. Peterson is specific about what agentic AI requires from it. Low-latency communication between distributed agents, efficient data synchronization, security across untrusted environments, and effective network partitioning are not aspirational requirements but operational ones, and the gap between meeting them and not meeting them is the gap between a functioning agentic system and an isolated one.

Consider a manufacturing environment where dozens of AI agents coordinate production: vision systems inspect components, robots adjust operations in real time, and predictive maintenance agents analyze telemetry from across the floor. Peterson uses this kind of environment to ground the networking argument, because these agents must communicate with millisecond latency and maintain coordinated operation even if connectivity to central systems is temporarily lost. His architectural recommendation is specific: integrate high-performance networking directly into edge compute infrastructure so that agents can communicate with one another at low latency and high bandwidth, rather than routing every interaction through distant aggregation points. Designing networking and compute together is what makes real-time coordination possible.
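One way to picture an agent that keeps working when the link to central systems drops is a local-first loop with a store-and-forward outbox. The sketch below is illustrative only: the `LocalFirstAgent` class, the `uplink` function, and the 5% defect threshold are assumptions made for the example.

```python
from collections import deque
from typing import Callable


class LocalFirstAgent:
    """Acts on local data immediately and queues updates for later sync."""

    def __init__(self, send: Callable[[dict], bool], max_queued: int = 1000):
        self._send = send  # hypothetical uplink; returns True on success
        # Bounded queue: the oldest entries drop if an outage lasts too long.
        self._outbox = deque(maxlen=max_queued)

    def handle(self, reading: dict) -> str:
        # The decision uses only local state, so it never waits on the WAN.
        action = "stop_line" if reading["defect_rate"] > 0.05 else "continue"
        self._outbox.append({"reading": reading, "action": action})
        self.flush()
        return action

    def flush(self) -> int:
        """Deliver queued updates in order, stopping at the first failure."""
        sent = 0
        while self._outbox:
            if not self._send(self._outbox[0]):
                break
            self._outbox.popleft()
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        return len(self._outbox)


link = {"up": False}
delivered: list[dict] = []


def uplink(message: dict) -> bool:
    if not link["up"]:
        return False
    delivered.append(message)
    return True


agent = LocalFirstAgent(uplink)
first = agent.handle({"defect_rate": 0.08})   # decided locally while offline
link["up"] = True
second = agent.handle({"defect_rate": 0.01})  # also flushes the backlog
```

The first decision is made and acted on during the outage. Central systems learn about it only after the link returns, which is the ordering the scenario calls for.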

On security, Peterson is equally precise and equally unambiguous. These systems require cryptographic identity for every agent, encrypted communication, hardware-based roots of trust, and zero-trust architectures designed into both layers from the ground up, ensuring the integrity of autonomous decisions affecting physical systems and human safety in critical infrastructures such as healthcare and transportation. Not as hardening added after deployment, but as a design constraint from day one.
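As a simplified illustration of per-agent identity and deny-by-default verification, the sketch below signs each message with a key tied to the sending agent. It is an assumption-laden stand-in: production systems would more likely use asymmetric keys held in hardware, plus encryption in transit, and the `AGENT_KEYS` registry and agent name here are hypothetical.

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical registry mapping agent IDs to per-agent secret keys.
AGENT_KEYS = {"vision-01": secrets.token_bytes(32)}


def sign(agent_id: str, payload: dict) -> dict:
    """Attach an HMAC-SHA256 tag computed with the sending agent's key."""
    body = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(AGENT_KEYS[agent_id], body, hashlib.sha256).hexdigest()
    return {"agent_id": agent_id, "payload": payload, "tag": tag}


def verify(message: dict) -> bool:
    """Accept a message only if its sender is known and its tag matches."""
    key = AGENT_KEYS.get(message.get("agent_id"))
    if key is None:
        return False  # unknown identity: deny by default
    body = json.dumps(message["payload"], sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the tag matched.
    return hmac.compare_digest(expected, message["tag"])


msg = sign("vision-01", {"part": "A17", "verdict": "reject"})
tampered = {**msg, "payload": {"part": "A17", "verdict": "accept"}}
unknown = {**msg, "agent_id": "rogue-99"}
```

A receiving agent would act on `msg` but reject both `tampered` and `unknown`, so an altered inspection verdict never reaches the robot that acts on it.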

The convergence of compute and networking at the edge

Peterson frames this moment as an inflection point for enterprise infrastructure strategy, and the practical implication is straightforward even if the work is not. Organizations cannot simply extend cloud architectures to edge locations and expect agentic systems to thrive, because the autonomous, distributed, real-time nature of these systems demands infrastructure where compute and networking are designed together to support local intelligence, agent coordination, and secure operation across thousands of diverse locations.

And there is a visibility dimension that Peterson adds, one that often gets missed in these conversations. As organizations deploy distributed AI agents across vast, heterogeneous environments, continuous visibility into WAN performance, network health, and application performance at each edge location becomes indispensable. Without it, blind spots undermine the autonomy and resilience that agentic AI requires, and teams lose the ability to detect issues proactively, optimize operations, and assure reliable service delivery before degradation affects outcomes.

Of the choices organizations face right now, Peterson is clear about which ones carry the most weight. Infrastructure decisions made today will determine whether organizations lead this transformation or spend years retrofitting, and the convergence of compute and networking at the edge, he concludes, is the essential foundation upon which the next generation of autonomous, intelligent systems will be built.

Benchmarks Are Making AI Coding Look Safer Than It Is

Saqib Jan — Wed, 04 Feb 2026 18:02:22 GMT

Most technical leaders are optimizing for speed. AI agents now generate code fast enough to reshape how teams ship software. So teams contending with shorter deadlines and shrinking budgets are integrating them into delivery pipelines to increase velocity.

If you are an engineering leader, you have likely seen the SWE-bench leaderboard. It is the current industry standard for ranking AI coding agents. It scores agents based on whether they can produce a patch that passes a test suite. If it does, the agent gets a gold star.

But there is a deeper and often overlooked problem that creates a blind spot for enterprise teams.

Most teams treat these scores as a proxy for real engineering readiness. But speed is not the same as quality, and true velocity requires both. Passing tests is not the same as writing safe, maintainable code, and the gap shows up later as security debt, brittle systems, and review fatigue.

The Pass/Fail Trap

Benchmarks like SWE-bench are designed to test code generation rather than code quality. They ask if the agent can generate a solution that satisfies the immediate requirement.

They do not ask if the code is maintainable or if it introduces a hidden security vulnerability. They also ignore whether the new code breaks the architectural pattern of the rest of the application.

Itamar Friedman, CEO and co-founder of Qodo, the AI code review platform, says this creates a false sense of security for technical leaders.

“SWE-bench is a benchmark that is meant mostly to check code generation capabilities. You can get a really good grade with quite shitty code. It will pass because it implements the requirements and passes the test. But maybe the code is not maintainable. Maybe it includes a security issue.”

The Illusion of Speed

In the past, humans wrote code slowly and other humans reviewed it just as slowly. Now that AI agents are writing code at lightning speed, developers are opening two to five times more pull requests than they did a year ago.

This creates a phenomenon called quality rot. Even if AI-generated code has the same defect rate as human-written code, producing ten times as much of it means producing roughly ten times as many bugs.
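The arithmetic behind that claim is easy to check. The figures below are hypothetical, chosen only to show that a constant defect rate applied to more code yields proportionally more defects.

```python
def expected_defects(lines_of_code: int, defects_per_kloc: float) -> float:
    """Expected defect count at a fixed rate per thousand lines of code."""
    return lines_of_code / 1000 * defects_per_kloc


DEFECTS_PER_KLOC = 5.0  # hypothetical rate, held equal for humans and AI

baseline = expected_defects(20_000, DEFECTS_PER_KLOC)      # human-paced output
with_agents = expected_defects(200_000, DEFECTS_PER_KLOC)  # ten times the code
ratio = with_agents / baseline
```

Review capacity does not scale the same way, which is why the extra defects tend to reach production rather than get caught.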

Friedman argues that relying on a “generation benchmark” to solve this is dangerous. He compares software development to accounting to show why the roles must be separate.

“You have bookkeeping and you have auditing. Ideally, you have two different people that are experts. One is doing the bookkeeping and the other is doing the auditing to verify the quality. Using the same agent to do both tasks is counterproductive.”

The Hidden Risk of Review Fatigue

When AI agents generate thousands of lines of code in minutes, human reviewers naturally get overwhelmed. They start skimming the code and often trust the AI simply because the test suite passed.

This is exactly where bugs slip in. A generalist model like GPT-5 might fix a logic bug but accidentally hardcode a credential or use a deprecated library.

If you rely on the same model to review the code it just wrote, you are essentially asking the fox to guard the hen house. A generalist model might be creative enough to solve the problem, but it lacks the rigid structure needed to audit safety.

What You Should Do

You need to stop obsessing over which model has the highest SWE-bench score and instead build a system of checks for your AI.

First, do not trust the generalist model to police itself. You should use specialized agents where one agent writes the code and a completely different agent reviews it against a strict policy.
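A rough sketch of the separation: the reviewer below is deliberately not a generative model at all, just a strict policy applied to whatever patch the writing agent produced. The rule names, the regular expressions, and the list of modules treated as deprecated are hypothetical, and a real policy would be far broader.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern


# Hypothetical policy, kept tiny for illustration.
POLICY = [
    Rule(
        "hardcoded-credential",
        re.compile(r"(?i)\b(password|secret|api_key|token)\s*=\s*['\"][^'\"]+['\"]"),
    ),
    Rule(
        "deprecated-import",
        re.compile(r"^\s*import\s+(imp|distutils)\b", re.MULTILINE),
    ),
]


def review(patch: str, tests_passed: bool) -> list[str]:
    """Return policy violations; passing tests alone never clears a patch."""
    findings = [rule.name for rule in POLICY if rule.pattern.search(patch)]
    if not tests_passed:
        findings.append("tests-failing")
    return findings


risky_patch = (
    "import os\n"
    'API_KEY = "sk-test-123"\n'
    "\n"
    "def total(items):\n"
    "    return sum(items)\n"
)
clean_patch = "def total(items):\n    return sum(items)\n"

risky_findings = review(risky_patch, tests_passed=True)
clean_findings = review(clean_patch, tests_passed=True)
```

The risky patch passes its tests and would score well on a pass/fail benchmark, yet the reviewer still flags it. That is the gap a separate auditing step is meant to close.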

Second, you should measure the number of valid bugs your AI catches in PRs rather than just how many PRs it opens.
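One possible way to track that, assuming each AI review finding is later labeled by a human as a real bug or not. The `Finding` type and the metric names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    pr_id: int
    confirmed: bool  # hypothetical label: a human agreed this was a real bug


def review_metrics(prs_opened: int, findings: list[Finding]) -> dict:
    """Summarize review quality instead of raw PR volume."""
    valid = sum(1 for f in findings if f.confirmed)
    return {
        "prs_opened": prs_opened,
        "valid_bugs_caught": valid,
        # Share of findings that were real; low values mean noisy reviews.
        "precision": valid / len(findings) if findings else 0.0,
        "valid_bugs_per_pr": valid / prs_opened if prs_opened else 0.0,
    }


findings = [
    Finding(101, True),
    Finding(101, False),
    Finding(102, True),
    Finding(103, False),
]
metrics = review_metrics(prs_opened=4, findings=findings)
```

Tracking precision alongside the raw count matters because a reviewer that flags everything would catch many bugs while worsening review fatigue.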

Finally, you need to treat your AI pipeline like a government rather than a single employee. Friedman emphasizes that a single agent is never enough to ensure enterprise trust.

“You need a system. A system like a country. There are policies, rules, and a police.”

The future is not about faster coding but about smarter reviewing.