Sovereign AI and Agentic Infrastructure with Rick Spencer
On running AI disconnected from the internet, the three-tier framework SUSE uses to match tools to work, why output metrics are vanity metrics, and MCP as a control layer for enterprise infrastructure
Most engineering organizations adopting AI do it without compliance regimes scrutinizing every decision. SUSE works under exactly that scrutiny, and the way it solved for AI adoption under strict data sovereignty requirements is instructive for any team that cares about where its data goes and what its AI actually costs.
Rick Spencer is the General Manager for Technology and Product at SUSE, where he leads the engineering teams behind the company’s full product portfolio, from SUSE Linux Enterprise and Multi-Linux Manager to the cloud native stack of Rancher, RKE2, K3s, and SUSE AI.
SUSE has one of the longest and deepest open source infrastructure histories in the industry, and its enterprise customers are operating under strict compliance regimes. Rick joined Deep Engineering Live to talk about how SUSE adopted AI agents without breaking its promises on data sovereignty, the framework his teams use to decide which AI tools fit which work, why he rejects output-based developer metrics, and the role MCP now plays in managing enterprise infrastructure.
Watch the full conversation below or read the full interview.
This session was recorded offline as part of the Deep Engineering Interview Series. The transcript below has been lightly edited for clarity and readability.
Q. Tell us a little about yourself and what you do at SUSE, and the kinds of engineering challenges your teams are working through right now.
Rick Spencer: I’m the General Manager for Technology and Product at SUSE. That means I lead the engineering teams for all the products that we offer to customers. That includes Linux, like SUSE Linux Enterprise, SUSE Linux Enterprise Server for SAP, and Multi-Linux Manager. We also have a suite of cloud native products like RKE2, K3s, and Rancher of course. We have a lot of things built on top of both of those, like the application collection, which are certified Kubernetes applications that you can run. We have other products composed of those building blocks like SUSE AI that you can use to run your own sovereign AI stack. There is SUSE Edge, and SUSE Edge’s cousins like SUSE Telco and SUSE Industrial Solutions.
Our customers tend to be enterprises with pretty serious enterprise requirements. They work under compliance regimes, they typically face a lot of scrutiny, they need important things like L3 support, reliable lifecycle models, a lot of predictability, high quality, and low CVE counts. So we take the open source software in the world and we create packages of it that are usable by enterprise companies.
Q. SUSE has a longer and deeper open source infrastructure history than most companies in the space. When AI agents started becoming the real workflow tool for engineers, how did that land internally, and what did adoption actually look like on the ground?
Rick Spencer: There is a lot to unpack here, so let me try to go at least somewhat systematically. All the software that we write is open source, so we are not worried about leaking the code. We publish the code. That was not the concern. But there are things like, let’s say you are debugging a customer environment. You do not want to let your engineers just take those logs and send them to random AI bots. We promise that we won’t do that. They trust us to not do those kinds of things. So there was a phase that we went through trying to figure out how to use AI in an effective way that maintained our promises to the customers.
Part of that solution was really realizing that engineers are going to use AI to go faster wherever they can. It is not like, oh, don’t use AI. That would just not be workable. So it was really about setting up our engineering management team to coach those engineers effectively. Besides keeping promises of data sovereignty, costs can also really run out of control. We see a lot of people run into that. For us, we never really had the problem of engineers deleting things in production. Our engineers tend to be very cautious. But it is easy to rack up pretty big bills on Anthropic and Copilot and so on.
A big part of the solution was that we have our own sovereign AI. We make this thing called SUSE AI, which is a stack you can use to manage AI workloads on your own infrastructure. We use that pretty heavily internally and we run Llama on it. If you are doing things, and we have a few different places that we do that, it is all within our private infrastructure, so we make sure there is no chance that any data can escape. We only use models which can be vetted effectively.
Then there was more to it on the coaching and oversight side. What we ended up doing is getting pretty precise about how engineers use AI. We broke it down into three categories. The first is using it for your daily work, which is your statement completion and your debugging and that kind of thing. The second is using it for agentics, which is relieving your toil, letting agents take care of some things that used to create a lot of work or interruptions. The last part I call curve jumping. That is when you are going from zero to infinity, doing things with AI that you would not have tried before, like solving really deep problems in one big step.
We created a framework around those three different kinds of uses, and then we help engineering managers help their engineers pattern match. Okay, if I just need statement completion and debugging help, these are the tools I can use for that. If there is a sovereignty aspect to it, these are the tools for that work. These are good tools for this kind of agent. And then these are the frontier models that we provide to you for those curve jumping capabilities. It sounds very organized now, but there was a lot of experimentation and a lot of rapid innovation from the engineers. Some of the early adopter engineers led, and then we went back and tried to create some order out of that so we could spread what we learned around.
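The pattern matching Rick describes can be pictured as a small routing function. The sketch below is illustrative only: the names `WorkKind`, `Task`, and `pick_tier`, and the tier labels `self-hosted`, `low-cost`, and `frontier`, are assumptions made for this example and are not SUSE's actual tooling or policy.

```python
from dataclasses import dataclass
from enum import Enum


class WorkKind(Enum):
    """The three kinds of AI use described in the interview."""
    DAILY = "daily work"              # statement completion, debugging
    AGENTIC = "agentics"              # agents that relieve toil
    CURVE_JUMPING = "curve jumping"   # deep problems solved in one big step


@dataclass(frozen=True)
class Task:
    kind: WorkKind
    touches_customer_data: bool  # the "sovereignty aspect" of the work


def pick_tier(task: Task) -> str:
    """Return a hypothetical tool tier for a task (labels are assumptions)."""
    # Sovereignty is checked first: customer data stays on private infrastructure.
    if task.touches_customer_data:
        return "self-hosted"
    # Frontier models are reserved for curve jumping work.
    if task.kind is WorkKind.CURVE_JUMPING:
        return "frontier"
    # Daily work and routine agents do not need a frontier model.
    return "low-cost"
```

Checking sovereignty before anything else reflects the order of concerns in the interview: the promise to customers constrains the choice first, and cost and capability are weighed only after that.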
Q. Digital sovereignty is central to how SUSE thinks about its stack. How does that principle shape where AI can and cannot go inside your engineering workflows?
Rick Spencer: I don’t think it’s can or cannot. It’s more so how. Let me give you an example. For digital sovereignty, a lot of the things we build, we actually build in something called the internal build service, which is an instance of something called Open Build Service, which is a service we provide to everybody. Kubernetes is built on it. There are thousands and thousands of things built on it. The interesting thing in terms of digital sovereignty is that all the builds are offline. They literally are not connected to the internet. This is super important because you need to be able to prove that nothing happened during the build process. That is a lot easier to do if there was no internet connection.
So we have to go around the hard way. We have to make sure all the sources are there. You cannot pull anything live in the heat of the moment during build time. You cannot run post-install scripts. Now if you want to apply AI in that environment, let’s say you want to backport a patch to previous stable releases, can you do that in a sovereign way? Yes, you can, as long as we are running AI in a way that is able to run disconnected from the internet and we have complete visibility into everything it is doing. These things are not easy, but we have decades of experience with it that we can apply. In some cases we even train our own models to accomplish these things, and that way we know the model does not have some naughty time bombs built into it.
Q. When your engineers started integrating AI agents, where did it deliver productivity gains, and where did it create new problems?
Rick Spencer: Let me give you some examples. My favorite example isn’t really about code. We are an open source company, and we have all of this code within our dominion of control, in GitHub and here and there and everywhere. I don’t know if you remember the Trivy attacks last month, and just a spate of toolchain attacks. We had a response process for that, but it was predicated on those things occurring occasionally, not twice a week. So our security team wrote an agent that scans certain sites every hour. It says, okay, there is a reported compromised package, typically in npm, sometimes in PyPI, different places. It finds those, and then it scans all of our open source code to see if we are using it anywhere. If we are, it writes a report and notifies us on Slack. So we know right away where to pay attention.
Fortunately, so far we have not really been impacted because we have really good hygiene around our toolchains. But bad guys are really smart and work really hard, so we still want to stay super vigilant. That has just been such a huge relief, because now if you see a report of a toolchain attack, our agent was on it before we even knew about it. It saves so much toil, because we don’t have to send people out to check if we are using this package or do a search in this area of GitHub.
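The matching step at the heart of such an agent can be sketched in a few lines. This is a minimal illustration, not SUSE's agent: it assumes the list of compromised package names has already been collected from advisory sites, that the repositories are checked out under one local directory, and that only npm `package.json` manifests are inspected. The function names `find_affected_repos` and `format_report` are hypothetical, and fetching advisories and posting to Slack are left out.

```python
import json
from pathlib import Path
from typing import Dict, List, Set


def find_affected_repos(compromised: Set[str], checkout_root: Path) -> Dict[str, List[str]]:
    """Map each directory holding a package.json to the compromised packages it declares."""
    hits: Dict[str, List[str]] = {}
    for manifest in checkout_root.rglob("package.json"):
        if "node_modules" in manifest.parts:
            continue  # skip vendored copies; only first-party manifests are checked
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed manifest: ignore in this sketch
        if not isinstance(data, dict):
            continue
        declared = set(data.get("dependencies") or {}) | set(data.get("devDependencies") or {})
        matched = sorted(declared & compromised)
        if matched:
            hits[str(manifest.parent.relative_to(checkout_root))] = matched
    return hits


def format_report(hits: Dict[str, List[str]]) -> str:
    """Render the findings as plain text suitable for a chat notification."""
    if not hits:
        return "No compromised packages found."
    lines = ["Compromised packages in use:"]
    for repo in sorted(hits):
        lines.append(f"- {repo}: {', '.join(hits[repo])}")
    return "\n".join(lines)
```

A real scanner would also need to read lockfiles to catch transitive dependencies and compare affected version ranges, which this sketch does not attempt.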
There are other areas like CVE mitigation. A CVE comes in, and an agent examines it. Is it even applicable? A lot of times the CVE comes in and the package is in the repo, but it is not being exposed in any way that it would matter. There is this thing called VEX, which is basically a file you provide along with the CVE database to explain whether the vulnerability is impacting you or not. That is really hard to do at the scale that CVEs are coming in, but the agents can do that for us pretty easily. That means we are focusing our attention not on keeping up with the crush of CVE reports, but on the actual vulnerabilities. Our attention is reserved to actually keep our customers safe.
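For readers who have not seen VEX, a single statement can be pictured roughly as follows. The sketch borrows field names and status values from the OpenVEX format as an assumption; the interview does not say which VEX format SUSE publishes. The helper `vex_statement`, the placeholder CVE identifier, and the package URL are hypothetical, and requiring a justification for every `not_affected` statement is a simplification.

```python
from typing import Any, Dict, Optional

# Status values as used by OpenVEX-style documents.
VEX_STATUSES = {"not_affected", "affected", "fixed", "under_investigation"}


def vex_statement(
    vulnerability: str,
    product: str,
    status: str,
    justification: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one VEX-style statement: does this vulnerability affect this product?"""
    if status not in VEX_STATUSES:
        raise ValueError(f"unknown VEX status: {status}")
    # Simplification: a "not affected" claim must say why.
    if status == "not_affected" and not justification:
        raise ValueError("a not_affected statement needs a justification")
    statement: Dict[str, Any] = {
        "vulnerability": {"name": vulnerability},
        "products": [{"@id": product}],
        "status": status,
    }
    if justification:
        statement["justification"] = justification
    return statement


# Placeholder identifiers, for illustration only.
example = vex_statement(
    vulnerability="CVE-0000-00000",
    product="pkg:generic/example-package@1.0.0",
    status="not_affected",
    justification="vulnerable_code_not_in_execute_path",
)
```

This is the case Rick describes: the package is in the repo, but the vulnerable code is never reached, so the statement records that conclusion in a machine-readable form.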
Q. How do you think about measuring the impact of AI on your engineering teams?
Rick Spencer: We might have a different view on that than some people. We are really tending away from measurements that measure output and utilization, and we are trying to focus on impact. What that means is we don’t have leaderboards that show every developer and how many lines of code their agents submitted. I consider that garbage vanity metrics. Not helpful.
One of the main things we want to do is measure the impact of our use of AI without it being an extra burden on the development team. A lot of the tools out there assume you are a proprietary software company where everyone is working on a single code base, which is just not how an open source enterprise works. We are working on hundreds, if not thousands, of repositories all the time, and the work to maintain them is very different. So those utilization numbers and those developer-to-developer comparisons just don’t have value. I’d rather have engineers working than reporting.
So we are setting up metrics for different things, like how fast CVEs are being addressed, how fast patches are being backported, how fast our L3 responses are getting closed while maintaining the same NPS. These are things where we have applied AI to different areas, so let’s focus on the business impact, not on the utilization.
That said, we are working on a set of dashboards right now so an engineering manager can look at the cost and utilization on their team to help with coaching. Let’s say an engineering manager has an eight person team. Hey, I noticed we are burning a lot of tokens. What are we actually doing that is burning that many tokens? I’m not sure we are getting value out of that. Or, hey, we have these seats for this LLM or code assistant we bought, but we are not using them. Are there areas where we could be? So we are definitely measuring the value from a business perspective, but we are really trying to decentralize and allow engineering managers to guide their teams on getting the most value out of AI, without it becoming a leaderboard game where developers feel exposed in some game that is not about providing value to customers.
Q. Whenever we talk about AI agents, we cannot avoid MCP. Why does it matter to so many companies today, and what does it unlock for engineering teams in your kind of environment?
Rick Spencer: MCP is critical, actually. If anyone is listening and does not know what I mean, a Model Context Protocol server is a little bit of code that runs and offers to an LLM, hey, here is some context you can use, and here are some tools, some actual things you can do. That turns a chatbot into an agent, because an agent can actually do things.
MCP does a few things that are really important. The first is just ease of use. A good MCP server provides structure to the LLM so that it is way easier to write a prompt to get the results you want. The LLM sees the MCP server is for this purpose, and these are the kinds of information the humans want out of it. You don’t have to include all that in the prompt. And it is actually really easy to write an MCP server with an LLM. If you have a decent model, it does not even need to be top tier. You say, hey, we have this bit of software we want to control with agents, this is our use case, and it will write an MCP server for you pretty easily. Then you can have a human go in and edit it.
But there is another part to it. We ship MCP servers with all of our products, and we think this is really important. In our view, the world is moving to a new paradigm. Before, as an administrator, you would think about all the applications you use to monitor and control your servers, your Kubernetes clusters, your workloads. Now we are moving into a mode where you don’t think that way. You think about writing agents, or chatting with the infrastructure to get the information you need, and then it is able to take action on your behalf without you having to worry about the specific syntax.
For that to work, those MCP servers have to have really good human knowledge encoded into them. If you think about our MCP servers for SLES, for Rancher, for Multi-Linux Manager, the key is that the experts in using that tool have crafted those MCP servers. It would be like, instead of you sitting down in front of a chatbot saying I need to figure out how to use Rancher, you are sitting down with the whole Rancher development team telling you how to prompt the chatbot. That encoding of knowledge makes the agentics way more powerful, because it is not guessing. Otherwise it has to look at the raw APIs and make a bunch of guesses, and there is no way as a human you will know if that is the right thing to do.
All that said, there is another really important thing MCP servers do, which is provide a place where you can, as an enterprise, bring some sanity and control to the usage. If you have MCP servers running, they are just servers. That means you can apply ACLs to them. You can say the MCP server for this user is allowed to use these tools and not these tools. You can log the use of the MCP servers. We have our own gateway, but we also partnered with a company called Stacklok that we talked a lot about at the last SUSECON. There are different gateways you can put into place as an enterprise to keep the MCP servers under control. You don’t give the LLMs access directly to tools, only the MCP servers, and then you can have that oversight and meet your compliance needs.
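The control point described here, per-user tool permissions plus logging, can be sketched as a thin wrapper in front of the tools. This is a toy model under stated assumptions: it does not speak the actual MCP wire protocol and is not SUSE's or Stacklok's gateway. The class `ToolGateway`, the tool names `list_nodes` and `delete_cluster`, and the user names are all hypothetical.

```python
import logging
from typing import Any, Callable, Dict, Set

logger = logging.getLogger("mcp_gateway_sketch")


class ToolGateway:
    """Deny-by-default access control and audit logging in front of a set of tools."""

    def __init__(self, tools: Dict[str, Callable[..., Any]], acl: Dict[str, Set[str]]) -> None:
        self._tools = tools
        self._acl = acl  # user name -> names of the tools that user may call

    def call(self, user: str, tool: str, **arguments: Any) -> Any:
        # A user with no ACL entry gets an empty allowlist, so the default is deny.
        allowed = tool in self._acl.get(user, set())
        # Every attempt is logged, whether it is allowed or not.
        logger.info("user=%s tool=%s allowed=%s", user, tool, allowed)
        if not allowed:
            raise PermissionError(f"{user} may not call {tool}")
        if tool not in self._tools:
            raise KeyError(f"unknown tool: {tool}")
        return self._tools[tool](**arguments)


# Hypothetical tools and users, for illustration only.
gateway = ToolGateway(
    tools={
        "list_nodes": lambda: ["node-a", "node-b"],
        "delete_cluster": lambda name: f"deleted {name}",
    },
    acl={"alice": {"list_nodes"}},
)
```

With this configuration, `gateway.call("alice", "list_nodes")` returns the node list, while `gateway.call("alice", "delete_cluster", name="prod")` raises `PermissionError`. The destructive tool is never reachable for that user, whatever the prompt says.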
Even at a low level, you can put the MCP server in what I call jail. You can say, on the server, here is the user for this MCP server, here is a systemd process that only presents the actual compute resources it needs. Because you have to be thinking, for every MCP server you are running, there is an LLM out there trying to use it, and who knows what kind of prompt injections people are running. MCP servers also guard against things like the AI hallucinating something and deleting your production server, because you simply don’t provide that tool to it. This to me is one of the main roles SUSE has to play as part of this disruption, because we are bringing this agentic notion of how to manage all of your infrastructure.
Q. Cost is a real constraint when you are running AI at any scale. What does a practical cost mitigation approach look like for an engineering organization working the way SUSE does?
Rick Spencer: I can speak from our own experience. The fact of the matter is, if you are using a self-hosted AI, sure, you spend a lot on the big iron, and you are probably paying a company like SUSE for support. But nonetheless, there is a maximum cost there. Then the real question is, do you have the observability in place to make sure it is being utilized fully? That is a very different conversation. Are we getting full utilization out of our fixed costs, where we never have to worry about overrunning?
There is digital sovereignty, and sometimes they call this cost sovereignty, because no one can come back later and say, oh, by the way, we are changing our model. We have some suppliers where a lot of our developers were using seat-based pricing, and then over time they let us know, in plenty of time, that they are moving to usage-based pricing. That is a big change. We did not have sovereignty over the way they price it, whereas if you are hosting your own, you have that sovereignty over the pricing. So it is something to think about.
Another thing we use a lot is circuit breakers. Hey, we just noticed our Claude usage in the last minute was way too high, or Gemini, whatever you are using. That keeps runaway agents in check. It can be very frustrating for developers if they are trying to get work done and every single minute they are getting rate limited, but we are talking about cost controls, so you need to do the thing.
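One simple way to get the behavior described here is a sliding-window budget check placed in front of every model call. The sketch below is an assumption about how such a breaker could work, not a description of SUSE's implementation; the class name `SpendCircuitBreaker`, the token-based budget, and the 60-second default window are hypothetical choices.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class SpendCircuitBreaker:
    """Refuse requests once token usage inside a sliding time window exceeds a budget."""

    def __init__(
        self,
        max_tokens_per_window: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_tokens_per_window
        self._window = window_seconds
        self._clock = clock  # injectable so the breaker can be tested without sleeping
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)

    def allow(self, tokens: int) -> bool:
        """Record the request and return True if it fits the budget, else return False."""
        now = self._clock()
        # Drop usage records that have aged out of the window.
        while self._events and now - self._events[0][0] >= self._window:
            self._events.popleft()
        used = sum(count for _, count in self._events)
        if used + tokens > self._max:
            return False  # breaker trips: the caller should back off or alert
        self._events.append((now, tokens))
        return True
```

A caller would check `breaker.allow(estimated_tokens)` before each request and skip or queue the request when it returns `False`. Because old usage ages out of the window, a runaway agent is stopped quickly while normal work resumes on its own once the burst has passed.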
The other thing to say is that we are big believers in frontier models. We are not saying don’t use frontier models, but it is important to use them for the right things. You do not need a frontier model to understand your Python module and give you code completion. You just don’t need it for that. The frontier models are really for when you are in that curve jumping, super strategic mode. We have projects where we spend tens of thousands of dollars on frontier models, but they generated, who knows, a million or two million dollars in value, so the cost benefit was definitely there.
One thing we do with frontier models is, let’s say we need an agent for something. We use the frontier model to create an agent that can then be run on a much lower cost model. It will say, sure, I’ll write the Python scripts it needs to use, so it doesn’t have to try to do that inference every time. I’ll write the context file that works for that model. So you can start with the frontier model and then tell it to do things with your less expensive models, or even your own models that are in your own infrastructure.
When you see that needle move, you see people start adopting it, and you’ll see step functions in utilization of tokens. For a certain engineer, the penny drops: as a developer, they suddenly realize they are in a new paradigm where they are empowered to be ten times, a hundred times more effective using these tools. You can see these little jumps day to day. Oh, somebody figured it out. Someone figured it out. So then you need to go back, because you don’t want to stop them from getting that 100X improvement. You need to give them the right tools for the job.


