Knowledge Graphs, GraphRAG, and Real-Time AI in Production with David Knickerbocker
On intentional engineering, real-time knowledge graphs, and why graph and NLP belong together
This conversation with David Knickerbocker keeps returning to a single conviction: the best engineering starts with intentional problem definition, and most AI failures happen when teams rush to use a tool before understanding what they are actually trying to build.
Knickerbocker has worked across cybersecurity, data operations at Intel, McAfee’s AI research team, and healthcare IT, and went on to found Bert Intelligence and Grooveseeker. He is the author of Network Science with Python, published by Packt, which argued, years before GraphRAG became mainstream, that graphs and natural language processing belong together as a single discipline. He has been writing code since he was six years old and spent twenty-eight years living in Okinawa, Japan, before returning to the United States.
The conversation covers what it actually takes to build a knowledge graph system whose data is at most about a minute old, why his Verdant Eye system treats knowledge as claims rather than facts, how graph anchoring shrinks the space for hallucination in ways that similarity-based retrieval cannot, and why deliberately forgetting old data is not a failure mode but a design principle. He also walks through his purpose-built testing philosophy, his three production GraphRAG systems, and what working with open source intelligence in adversarial environments teaches about AI that engineers working with clean datasets never have to confront.
You can watch the full conversation below or read on for the complete Q&A transcript.
1. Most AI systems treat knowledge as a static snapshot. You have built your Verdant Eye system around the idea that knowledge should update continuously. What does it actually take to engineer a knowledge graph that stays fresh, and where does the real difficulty lie?
David Knickerbocker: For me it is not so much about what breaks. It is about how do I actually do this, and how do I engineer it. Everything in data science and engineering really starts with problem definition. You start with what you are trying to do. If you want to build a world AI and be able to answer questions about things that happened a minute ago, then that is your problem statement. And so then you think about how to get that data into the database so that it is there and it is fresh. But then you also have to get AI to be able to use that data, so there are kind of two sides to this coin.
It really comes back to intentional engineering. The AI industry feels very shiny and very new, but there is a lot of old-school discipline that is still extremely useful to me. I am a very intentional designer, developer, and engineer. You start with the idea and go through ideation; from ideation you create your spec; from the spec you do your project management, assign tasks, and do the work. It feels like vanilla old-school engineering to me.
The approaches I use are KISS (keep it simple) and YAGNI (you are not gonna need it). When you are a minimalist engineer and you think in MVPs, you are building the minimum viable product you are aiming for. When you build the minimal thing, it is much easier to test and validate that it works than if you throw a whole bunch of spaghetti at the wall and see what happens. Nothing really breaks on my side because I am an old-school engineer and I am intentional with everything.
2. Freshness and accuracy often pull in opposite directions. Something that just arrived may not yet be trustworthy, while something stale may still be reliable. How do you design a system that balances recency and trust, and what signals do you use to make that call?
David Knickerbocker: In the world of open source intelligence, it has less to do with right and wrong. It has less to do with facts. What I am looking for with open source intelligence is really claims of what is going on in the world. You can have two different groups that are in opposition to each other. One group will say one thing is the truth, another group will say something else is the truth, and they will be in direct conflict. I do not make that decision, and I do not allow my AI to make the decision about what is true or false either. I am more interested in what people are claiming is going on in the world.
If you take what is claimed and cluster it, you can see that this thing is happening over here and this bad thing has happened over there. I think in terms of ribbons. I come from natural language processing, so I think about clusters not as baskets or clumps of stuff but more like ribbons. You have a whole bunch of information, and the top ribbon might be “this bad thing happened.” The next ribbon might be “this event is happening at the library.” The next might be “a punk rock show is happening at this nightclub.”
The trueness and the falseness is a much later thing than the awareness of what is being said. That is how I think about it.
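Clustering claims into ribbons, without ever ruling on which claim is true, can be illustrated with a minimal sketch. It assumes claims arrive as short text snippets tagged with a source, and it uses a simple greedy word-overlap (Jaccard) clustering in place of whatever the real system uses. The `Claim` type, the `ribbons` function, and the 0.3 threshold are hypothetical; the interview does not describe the Verdant Eye internals.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    source: str  # who is making the claim
    text: str    # what they claim; deliberately never labeled true or false


def tokens(text: str) -> set[str]:
    # Crude bag of words: lowercase, strip punctuation, drop short words.
    return {w.strip(".,!?").lower() for w in text.split() if len(w) > 3}


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ribbons(claims: list[Claim], threshold: float = 0.3) -> list[list[Claim]]:
    """Greedy single-pass (leader) clustering: a claim joins the first ribbon
    whose first claim is similar enough, otherwise it starts a new ribbon.
    Conflicting claims about the same event end up side by side."""
    groups: list[tuple[set[str], list[Claim]]] = []
    for claim in claims:
        toks = tokens(claim.text)
        for leader, members in groups:
            if jaccard(toks, leader) >= threshold:
                members.append(claim)
                break
        else:
            groups.append((toks, [claim]))
    return [members for _, members in groups]


claims = [
    Claim("group_a", "Bridge closed downtown after protest"),
    Claim("group_b", "Bridge closed downtown after police action"),
    Claim("venue", "Punk rock show tonight at nightclub"),
]
for ribbon in ribbons(claims):
    print([c.source for c in ribbon])  # ['group_a', 'group_b'] then ['venue']
```

The two conflicting accounts of the bridge closure share a ribbon, and neither is discarded; deciding which one is right is left to a later stage.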
3. What does real awareness mean in practice at the data ingestion layer? And how is your system different from just running an agent with a search tool?
David Knickerbocker: If I use my GraphRAG and I say what has happened in Portland in the last hour, or the last five minutes, or the last minute, it will be able to answer that question. And if nothing has been reported in the last minute then there is just nothing to report. An empty dataset is better than a hallucination.
My systems are constantly getting data. When I was building my GraphRAG system, one of the questions I used for calibration was simply “what is the latest information?”, because I wanted to see that the latest information was coming through. That prompt is very reliable. The answer that comes back is anywhere from a few seconds to about a minute and a half old. The Internet moves at the speed the Internet moves.
I liken it to the difference between a snapshot and a movie. If you use a tool to do a search and find out something, you are getting a snapshot of time. My systems capture the heartbeat of the Internet themselves and they are always listening. It is much more like a movie compared to a photograph. When you are talking to companies that need urgent information and you can run a query and it comes back thirty seconds old with something that was just seen on the Internet, that looks really different from spinning up agents and using tools to hit a search engine. A search engine will give you a few answers. My systems are always listening and always capturing. I can rewind the Internet itself and play it back forward again.
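The snapshot-versus-movie distinction can be sketched under one assumption: every observation is stored with the time it was seen. A trailing-window query then answers “what happened in the last N seconds” and legitimately returns nothing when nothing was seen, while replaying an arbitrary interval is what “rewinding” amounts to. The `EventLog` class and its methods are hypothetical; a production system would sit on a time-indexed database rather than in-memory lists.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Append-only log of observations kept sorted by the time they were seen."""
    times: list[float] = field(default_factory=list)
    items: list[str] = field(default_factory=list)

    def append(self, ts: float, item: str) -> None:
        # Insert in timestamp order so slightly late arrivals still land correctly.
        i = bisect.bisect_right(self.times, ts)
        self.times.insert(i, ts)
        self.items.insert(i, item)

    def replay(self, start: float, end: float) -> list[str]:
        """Everything observed with start <= ts <= end, oldest first."""
        lo = bisect.bisect_left(self.times, start)
        hi = bisect.bisect_right(self.times, end)
        return self.items[lo:hi]

    def last(self, seconds: float, now: float) -> list[str]:
        """Trailing window; an empty list means nothing was reported, not an error."""
        return self.replay(now - seconds, now)


log = EventLog()
log.append(100.0, "jazz set announced")
log.append(160.0, "road closure reported")
log.append(130.0, "punk show moved to 9pm")

print(log.last(60, now=170.0))  # ['punk show moved to 9pm', 'road closure reported']
print(log.last(5, now=170.0))   # []
print(log.replay(0.0, 200.0))   # all three, ordered by timestamp
```

Returning an empty list for a quiet window is the code-level version of “an empty dataset is better than a hallucination.”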
4. Where do most engineering teams underestimate the cost of building graph systems? And what failure mode do you keep seeing repeated?
David Knickerbocker: I remember research I did back in 2012 and there was a famous finding that most tech problems are actually people problems. They are not tech problems. That comes down to communication, interpersonal skills, things like that. But getting to the technical side of things, one thing that used to drive me nuts was the rush to use graph databases before they were even understood.
This bothered me so much in 2020 and 2021 that I actually wrote a book called Network Science with Python. I wrote it because I was annoyed watching teams spend months building graph databases and then not really getting further than populating the graph database. Things are supposed to start when you populate the graph database. That is not the end.
At that time I was using graphs at Intel for data flow mapping and for source code analysis of legacy code, mapping how legacy code created outputs across thousands of scripts and hundreds of servers. I got well known for this at Intel and McAfee. But I was never invited to the cool kid graph database parties. I was always just doing stuff with graphs, using them to map out data flows and to fix production outages. Dead serious stuff. And it was really frustrating watching teams get stuck because the graph skill was not there.
I think that failure is a common one with what is going on today too. There is this rush to use agents before even understanding AI. And if the understanding is not there, then it is just wishful thinking. You are saying please work, please work, please work. And if you do not know how it works, you cannot tell whether it ran correctly or just ran. There is a huge difference between it ran and it ran correctly.
5. You have argued for years that graph and NLP belong together as a single discipline. GraphRAG is now proving that in mainstream AI. What did teams building with NLP alone consistently get wrong that a graph layer would have fixed?
David Knickerbocker: Language and graphs go together. Similarity in language is not equal to same. I will say that one more time. Similarity is not equal to same. Similar sounding things can be very, very different from each other. A graph kind of anchors things into a piece of context.
This was really clear to me even when I worked in data operations, because there is a lot of language that goes on in servers. It is not just look at the file, look at the blah blah blah. There is a lot that goes into those log files. If you have a hundred servers then multiple people created the different log files. There is quite a lot of natural language in log files and source code and all kinds of production things. Even working in data operations at Intel, not even as a data scientist, I was seeing language everywhere and already mapping out how production systems were working.
Graphs show you where things go. But all of the context about what a node actually represents is often carried by the language itself. It was crystal clear to me a long, long time ago that graphs and language go together. When I was writing this book I was even afraid that people were going to hate it. You know, it is three years later and it is rated 4.9 out of five or whatever. But it was so unusual when I was writing it, because nobody was really talking about how graphs and language go together the way I was. At the time I was doing a series called a hundred days of NLP, natural language processing. Even back then, using Twitter data, I was realizing that you cannot do natural language processing and leave off graph. It is ridiculous to even do that. The reverse is also true. If you are working with social media data, you see person A talk to person B about this thing happening. What do you have if you throw away the language? Almost nothing. All you have got is a bare graph. All of the context is gone. It was crystal clear to me in 2017, and it frustrated me for several years.
6. Your first NLP and graph experiment was eight years ago. How has entity extraction and relationship linking changed since then, and what has stayed the same?
David Knickerbocker: The very first one I actually used was the book of Genesis from the Bible. I am not religious, but it is ancient text. It blew my mind that I could pull families out of ancient text and actually map it as a graph. I did this in 2018 and it is still on my GitHub. I can actually go back to my first code and see what I did.
I am sure it was part-of-speech tagging, because that was before my book and I had no idea what I was doing. I just kind of made it work. Builders build. You figure out how to do it the first time and then figure out how to do it better after that. There is my small visualization window, where I was just adding color and trying to size the nodes. Very manual. But then you scroll down to cell 25 and you get to PageRank, where I am mapping out who the main entities are. That is where the notebook gets important. Network science is more important to me than visualization, because with network science you get to do things programmatically. If I want to know whether the punk rock scene in Portland is growing or shrinking, I do not want to visualize that. I want to do it programmatically: turn it into a graph, do time series analysis, and know whether the graph is actually increasing or decreasing in density.
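Answering “is the scene growing or shrinking?” programmatically can be sketched as computing graph density for each time slice and fitting a trend line. This is a minimal, standard-library-only sketch, not the author's notebook: it assumes weekly snapshots of an undirected band/venue co-occurrence graph, uses density 2m / (n(n - 1)) for n nodes and m edges, and takes an ordinary least-squares slope as the trend. The band and venue names are made up.

```python
from __future__ import annotations


def to_edges(pairs: list[tuple[str, str]]) -> set[frozenset[str]]:
    # Undirected simple graph: unordered pairs, self-loops dropped.
    return {frozenset(p) for p in pairs if p[0] != p[1]}


def density(edges: set[frozenset[str]]) -> float:
    """Edges present divided by edges possible: 2m / (n(n - 1))."""
    n = len({node for e in edges for node in e})
    return 2 * len(edges) / (n * (n - 1)) if n > 1 else 0.0


def trend(values: list[float]) -> float:
    """Least-squares slope of values against their index; > 0 means rising."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var if var else 0.0


# Hypothetical weekly snapshots of a band/venue co-occurrence graph.
weeks = [
    to_edges([("band_a", "venue_x"), ("band_b", "venue_x")]),
    to_edges([("band_a", "venue_x"), ("band_b", "venue_x"), ("band_a", "band_b")]),
    to_edges([("band_a", "venue_x"), ("band_b", "venue_x"), ("band_a", "band_b"),
              ("band_c", "venue_x"), ("band_c", "band_a")]),
]
densities = [density(w) for w in weeks]
print([round(d, 3) for d in densities])  # [0.667, 1.0, 0.833]
print("increasing" if trend(densities) > 0 else "decreasing or flat")
```

Density alone can fall while a scene grows (new nodes add many possible edges), so a real analysis would likely track node and edge counts alongside it.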
What has changed is really how you create the graph and how you visualize it. Back then it was part-of-speech tagging with a ton of cleanup. That evolved into using spaCy models. And then LLMs changed the game, because it is painful to download twelve different spaCy models when you can just use an LLM these days. Entity extraction has improved a lot since 2015. Mostly, I have to throw away less. There is less cleaning to do.
But there is a dangerous side to this. With older NLP, people were critical because there was something messy in there. When you are using LLMs, everything just looks perfect. And that is kind of a dangerous downside. People are a little too trusting of LLMs compared to how they treated older NLP. The cleanliness is real but it creates false confidence.
What has stayed the same is the network science and the mathematics. PageRank is still very important. Betweenness centrality is still very important. Community detection is still very important. My book is not going to go out of date because of that. The things that change are really how you create the graph and how you visualize it.
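Because PageRank is the first of these durable measures, a dependency-free power-iteration sketch shows the mechanics. In practice NetworkX's `pagerank` and `betweenness_centrality` functions cover this; the toy directed graph below borrows names from Genesis, but its edges are invented for illustration, not extracted from the text.

```python
from __future__ import annotations


def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[str, float]:
    """Power-iteration PageRank on a directed graph given as adjacency lists.
    Nodes with no out-links (dangling) spread their rank evenly over all nodes."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        # Teleport share plus redistributed dangling rank, identical for every node.
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in graph.items():
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Toy directed "mentions" graph; names from Genesis, edges invented.
mentions = {
    "Abraham": ["Isaac"],
    "Isaac": ["Abraham", "Jacob"],
    "Jacob": ["Isaac"],
    "Esau": ["Isaac"],
}
ranks = pagerank(mentions)
for name, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} {score:.3f}")
```

Isaac, who receives every inbound link, comes out on top; Esau, with no inbound links, keeps only the teleport share.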
7. GraphRAG is often sold on the promise of reducing hallucinations. What does it actually take to get from fewer hallucinations to genuinely accountable output where you can trace a claim back to a source?
David Knickerbocker: My system is about claims. The node is attached to the claim that it makes, so there is no hallucination there. The hallucination space is smaller with nodes because you are starting with a node and you are traversing it. You are starting with your anchor space and going from there.
If you are wondering what jazz events happened in Portland, Oregon, you are connected to the Oregon node, connected to the Portland node, connected to the jazz node. There is very little chance for hallucination. But if you are just using a vanilla RAG system, it is only going to look for similarity. In a GraphRAG system, if there is no match, the output is that there is no match. There is no opportunity to hallucinate. With a similarity-based system, by contrast, there can be similarity even if it is only a single word in a paragraph. Similarity is rarely a clean zero. That is a really frustrating thing to me as an NLP person.
I like to have the discipline of a graph. It is the same discipline I felt from data operations, because you cannot mess up when you work in data operations. When the database is down, you have to fix it. If you come up with some similarity-based bull for your manager, he is going to be mad at you. You fix the problem when the database is down. That discipline of a graph is what I feel GraphRAG gives AI, rooting its answers in physical spaces, and that really reduces the opportunity for hallucination. There is less for it to bulk up around.
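The contrast between anchored traversal and similarity search can be made concrete with a toy example. Everything here is hypothetical: the node names, the `graph_lookup` and `similarity_lookup` helpers, and word-overlap scoring standing in for embedding similarity. It only illustrates the failure mode described above: the graph returns an empty answer for a genre it has no node for, while top-k similarity still returns its best partial match.

```python
from __future__ import annotations

# Hypothetical toy graph: places, genres, and the events they anchor.
EDGES = {
    ("Oregon", "Portland"),
    ("Portland", "event:jazz_night"),
    ("jazz", "event:jazz_night"),
    ("Portland", "event:punk_show"),
    ("punk", "event:punk_show"),
}
DESCRIPTIONS = {
    "event:jazz_night": "Late jazz night at a Portland club",
    "event:punk_show": "Punk rock show at a Portland nightclub",
}


def neighbors(node: str) -> set[str]:
    # Treat edges as undirected for lookup purposes.
    return {b for a, b in EDGES if a == node} | {a for a, b in EDGES if b == node}


def graph_lookup(state: str, city: str, genre: str) -> list[str]:
    """Traverse from anchors: state -> city, then keep events linked to both
    the city and the genre. No path means an empty answer."""
    if city not in neighbors(state):
        return []
    hits = neighbors(city) & neighbors(genre)
    return sorted(e for e in hits if e.startswith("event:"))


def similarity_lookup(query: str, k: int = 1) -> list[str]:
    """Stand-in for vector search: rank by shared words and return the top k,
    however weak the best match is."""
    q = set(query.lower().split())

    def score(event: str) -> int:
        return len(q & set(DESCRIPTIONS[event].lower().split()))

    return sorted(DESCRIPTIONS, key=score, reverse=True)[:k]


print(graph_lookup("Oregon", "Portland", "jazz"))    # ['event:jazz_night']
print(graph_lookup("Oregon", "Portland", "blues"))   # [] (no blues node, no answer)
print(similarity_lookup("blues night in Portland"))  # ['event:jazz_night'] (a wrong match)
```

The similarity search matches on “night” and “Portland” and confidently returns a jazz event for a blues question; the graph has no blues anchor, so it returns nothing.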
8. Temporal drift is a real problem in knowledge graphs. Facts become outdated, relationships change, and the graph can silently become wrong. How do you detect and handle contradiction and drift at scale without requiring an engineer to review everything?
David Knickerbocker: My system does not judge, and my system is about awareness. I think about a living system. You are a living system. I am a living system. And you do not remember everything you have ever been told. I cannot remember what I had for breakfast. Our brains are naturally throwing away old information and naturally learning new information, making room for that new information. When I build systems, I like to think about how life does it, and then I try to build that kind of thing into it.
My system is called the Verdant Eye. Not the Verdant Brain. The Verdant Eye sees, and it does not contain eternal memory, because that is not what an eye does. An eye sees. When the scene changes, the scene changes. What is in front of your eyeballs changes all the time. Your eyes do not need to be recalibrated. The thing has just changed.
Operationally, if you give a system infinite memory, your database bills are going to skyrocket for the rest of your life. It is never going to be possible. Think about data operations, think about transactional databases. These living systems have been with us for a long time. Anybody who has worked in data operations knows how living systems work because they have worked on living systems, they just do not call them that. In a transactional database you operate off of what you need, and data that is not needed eventually gets archived. In a human body, memories eventually fade away. If I stop thinking about a thing, it will eventually go away.
When I am building artificial intelligence, I am never tempted to build something with infinite memory forever, like the machine in The Hitchhiker’s Guide to the Galaxy. I do not want to build a super AI. I want to build AI that actually serves us human beings. I want to build AI that does not boil the ocean, that can be bootstrapped by individuals, that does not cost a trillion dollars.
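Deliberate forgetting maps naturally onto a time-to-live policy. This is a minimal sketch, assuming each relationship records when it was last observed: seeing a relationship again refreshes it, and a periodic sweep moves anything older than the retention window out of the live graph and into an archive, much as a transactional system archives data it no longer needs. The `observe` and `forget` functions and the one-hour window are hypothetical; many databases also offer native TTL or expiry features.

```python
from __future__ import annotations


def observe(graph: dict[tuple[str, str], float], source: str, target: str, ts: float) -> None:
    """Record (or refresh) a relationship; seeing it again keeps it alive."""
    key = (source, target)
    graph[key] = max(ts, graph.get(key, ts))


def forget(graph: dict[tuple[str, str], float], now: float, retention: float):
    """Split the graph into (live, archived). Anything not re-observed within
    the retention window leaves the live graph instead of being kept forever."""
    live = {k: t for k, t in graph.items() if now - t <= retention}
    archived = {k: t for k, t in graph.items() if now - t > retention}
    return live, archived


g = {}  # (source, target) -> last time observed, in seconds
observe(g, "band_a", "venue_x", ts=0.0)
observe(g, "band_b", "venue_y", ts=0.0)
observe(g, "band_a", "venue_x", ts=3000.0)  # re-observed, so refreshed

live, archived = forget(g, now=4000.0, retention=3600.0)  # one-hour window
print(live)      # {('band_a', 'venue_x'): 3000.0}
print(archived)  # {('band_b', 'venue_y'): 0.0}
```

Storage then scales with what is currently being seen rather than with everything ever seen, which is the cost argument made above.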
9. You have built your own testing frameworks for GraphRAG rather than relying on standard benchmarks. What outcomes are you testing for, and how do you know when a system is actually working?
David Knickerbocker: Everything I do is intentional. There is a really cool intelligence report I read a couple of years ago that said even datasets need to be designed for the purpose they will serve. Down to the dataset, you should be able to visualize how somebody is actually going to use that data. There is no testing framework anybody else can give me that is going to be fit for purpose for what I am trying to build, because I am not trying to build general intelligence. I am trying to build intelligence that serves a specific purpose.
There is a scene in Rick and Morty that is one of my favorite scenes. Rick makes a little robot, and the robot wakes up and asks, “What is my purpose?” Rick says, “You pass butter.” The robot asks again thirty seconds later. Rick says, “You pass butter.” And the robot says, “Oh my god.” But that is the entire purpose of that robot. Its whole purpose in life is to pass butter.
I have three GraphRAG systems right now, and each one is independent. The Verdant Intelligence system is for high-level situational awareness, looking down on the world: what is going on in Michigan, what is going on in Oregon, what is going on in California. My second system is called Grooveseeker, and that is street-level intelligence. Not what is going on in Oregon, but where the punk rock event is happening tonight and on which street in Oregon. That graph system has a very different set of rules from the Verdant Intelligence one. My third system holds thirty years of artificial intelligence research. When I am building these systems and want to understand what people did twenty years ago, I can just talk to that graph and find out. Each one of these goes through its own testing.
For the Grooveseeker system, I set up a couple hundred questions and go through multiple rounds of the same question to make sure queries come back correct and reliable. If it is not hallucinating, it is doing well. If it gets me to the right location, it is good. If it gets me there at the right date and time, it is good. The final test of my world AI came when I stopped proving it in articles and just used it to go to a punk rock show. I downloaded my data, asked what was going on in Portland from March 10 to March 13, found five events I wanted to go to, narrowed them down to one, bought the ticket online, went to the show, saw all the bands, and hung out with one of them. My AI did not send me to a nonexistent venue. It made a real memory for my family. That is how I know it works.
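The repeated-question approach can be sketched as a small harness. It assumes a hypothetical `answer` function that returns a (venue, date) pair for a question; the question, venue names, date, and three-round default are all made up. It checks the properties named above: the answer is stable across rounds, it never invents a venue, and it points to the right place on the right date.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Expected:
    venue: str
    date: str  # ISO date string, e.g. "2026-03-12"


def evaluate(answer: Callable[[str], tuple[str, str]], cases: dict[str, Expected],
             known_venues: set[str], rounds: int = 3) -> dict[str, dict[str, bool]]:
    """Ask every question `rounds` times and check that answers are stable,
    grounded in known venues, and correct on both venue and date."""
    report = {}
    for question, exp in cases.items():
        answers = [answer(question) for _ in range(rounds)]
        top_count = Counter(answers).most_common(1)[0][1]
        report[question] = {
            "stable": top_count == rounds,
            "grounded": all(venue in known_venues for venue, _ in answers),
            "right_venue": all(venue == exp.venue for venue, _ in answers),
            "right_date": all(date == exp.date for _, date in answers),
        }
    return report


# Hypothetical stub standing in for a real GraphRAG query.
def stub_answer(question: str) -> tuple[str, str]:
    return ("Venue X", "2026-03-12")


cases = {"Where is the punk show on March 12?": Expected("Venue X", "2026-03-12")}
report = evaluate(stub_answer, cases, known_venues={"Venue X", "Venue Y"})
print(report)
```

Keeping “grounded” separate from “right_venue” distinguishes a hallucinated venue from a real but wrong one, which call for different fixes.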
10. You are working with open source intelligence, which means dealing with adversarial sources, deception, and deliberately misleading data at scale. What does designing for that environment teach you about AI that engineers working with clean datasets never have to confront?
David Knickerbocker: I really encourage AI people to learn a little bit about open source intelligence. If you are going to build artificial intelligence to understand the world, the open source intelligence community has been using natural language processing and graphs to understand the world for quite a while. There is a lot to learn from them.
The real world is a messy space. It is not just that websites can disagree with each other. Websites also have malware. If you point your servers at websites and just download everything on them, you need to be prepared for the consequences of downloading malware. There are all kinds of things when you are dealing with the Internet.
My systems do not care who is right or wrong. They are observers. My systems will see three sides to the same story. There will be the left side and there will be the right side, and then sometimes there is something really extreme. And it does not mean that any one of them is wrong. I wrote an article about open source intelligence recently and I mentioned that bigger clusters are not more important than smaller clusters. In open source intelligence, everything matters top to bottom. If you are using an agent to do an Internet search you are going to get back what the search engine gave you, maybe ten things. If I use my API and say what happened in Oregon in February 2026, I am going to come back with ten thousand things. My APIs do not return ten. They return full context. That is a difference in completeness. It gets back to the snapshot versus movie idea. I can rewind the Internet itself and play it back forward.
The judgment part needs to be downstream. My system is a fast layer for AI, and it does not do the judgment thing. But there are certain things that are still best left to human beings. If something from an extreme source suggested that something dangerous was heading toward your community, that would be an actionable insight, and you would go to the mayor or the police or someone like that. I am not going to build that kind of automation into the system. There are certain parts of being a human being that I like keeping.
11. What advice would you give to engineers who are starting to build knowledge graph and GraphRAG systems today? And what should they not do?
David Knickerbocker: First of all, ask why you are doing anything before you do anything. Do not follow crowds just to follow crowds. You should have a good reason. I do believe that GraphRAG is something you should probably just start with because it is more reliable in my opinion than vanilla RAG. But that is my opinion.
I am not a crowd follower type. I am a bit of a rebellious type, but I think there is a lot of creativity in being like that. The AI space is a very creative space. If you follow the crowd, you are going to do what everybody else is doing. If you sit outside, look at plants, and think about nature, you can hit insights you will never get from following the crowd. You will not get them by just looking at LinkedIn, seeing what everybody else is doing, and reading the same books as everybody else. It is very important to actually be grounded in the world and to think about life itself. If you are going to do anything with intelligence, you might as well think about real intelligence. These language models are nothing compared to what is in my backyard. Those plants are not passing tokens. They are not complaining about maxing out their tokens. They are trying to collapse everything to the bare minimum.
My own philosophy of AI I call absolute zero. Collapse everything to zero. My world AI has zero storage, zero AWS cost, and my AI bills are extremely low, because I just collapse everything down to the bare minimum. And that is also the reason why I have real-time AI, because I was able to collapse everything down to the minimum.
I encourage people to read the old stuff too. Some of the best insights come from old papers. My graph partner and I were talking about how he got an insight from something thirty years old. A couple of years ago I created something based on Claude Shannon’s information theory, and it was a different implementation from anything else. You can do these kinds of things if you are an original thinker. If you only follow crowds, you are not going to do anything except follow the crowd. If you try to create a product and you are no different from any of your competitors, then what are you doing? I just encourage independence. Get back to the science. AI has to be rooted in science and engineering. When it gets really loud, that is not always the best time to pay attention to the loudness.


