Deep Engineering #28: Sam Keen on Making AI…

Nov 27, 2025

How AI agents remember, forget, and relearn: from enterprise memory trilemmas to hybrid architectures and real-world tools.

2 Comments

Rainbow Roxy

Dec 1

This article comes at the perfect time, as I've been thinking deeply about the practical challenges of agentic AI deployments beyond the hype. Your hypothesis that control and memory, rather than raw model capability, are the real bottleneck is truly insightful. I wonder which specific architectural approaches or memory mechanisms are proving most effective at bridging the significant gap between benchmark scores and real-world enterprise task completion.

Sam Keen

Dec 1 (edited)

Great question. We do see a lot of bench-maxing, and on many benchmarks 3-5 models may be within 2-10% of each other. Away from the benchmark, in highly variable real-world workflows, who knows which of those models will perform better? I think it really comes down to good evals, something I need to dig into more.
