The Needle, the Haystack, and the Cognitive Tax of AI

I saw a demo a few weeks ago in which a self-service, AI-powered helpdesk chatbot was asked to diagnose a laptop connectivity issue. The vendor had put it in verbose mode to show off all the work the agent was doing on the back end. It pulled six months of related tickets from the ITSM system, four internal KB articles on wireless troubleshooting, the full chat transcripts from the user's last three interactions, and event logs going back 90 days. It was thorough. It was impressive. And the answer it produced was confidently, almost elegantly, wrong.

The AI didn't fail because it was stupid. It failed because it was overwhelmed. There's a term gaining traction in the AI research community for this failure mode: context rot. And if you're building or buying AI-powered ITSM, you need to understand it, because it's about to become the most common reason your AI investments underperform.

The Garage You Can't Work In

Context rot is what happens when you give a capable reasoning system more input than it can productively use. The intuition most people have is that more data should produce better answers. That intuition is wrong.

A useful analogy, for me, is organizing a garage. You buy bins, label everything, mount pegboard, arrange tools by category and frequency of use. It looks incredible. But six months later you realize you spend more time maintaining the organization system than actually working on projects. The workbench disappears under "helpful" structure. You've optimized for completeness at the expense of function.

LLMs hit the same wall. Research from Chroma, which evaluated 18 leading models, found that model performance doesn't just plateau as input length grows; it actively degrades. Nor is the degradation simply a retrieval problem. In one experiment, researchers replaced all non-essential tokens with blank spaces, making the relevant information trivially easy to find, and performance still dropped. The issue isn't that the model can't locate the needle. The issue is that the haystack itself imposes a cognitive tax.
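The blank-space manipulation is easy to picture in code. The sketch below is a simplified, word-level approximation written for illustration, not the researchers' code (real experiments operate on model tokens): every position outside a hypothetical `essential` set is replaced with a blank, so the input keeps its length while the relevant content becomes trivial to spot.

```python
def blank_out_distractors(tokens: list[str], essential: set[int]) -> list[str]:
    """Replace every token not at an essential position with a single space.

    The result has exactly as many items as the input, so input length is
    held constant while the relevant tokens become trivial to locate.
    """
    return [tok if i in essential else " " for i, tok in enumerate(tokens)]


# Hypothetical haystack; the "needle" is "driver 4.2 was replaced" (positions 5-8).
haystack = "the printer queue stalled because driver 4.2 was replaced overnight".split()
masked = blank_out_distractors(haystack, essential={5, 6, 7, 8})

print(len(masked) == len(haystack))  # True: length is unchanged
print([t for t in masked if t != " "])  # ['driver', '4.2', 'was', 'replaced']
```

Comparing a model's accuracy on the original and masked inputs separates the cost of sheer length from the cost of finding the needle.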

For IT service management, this has enormous implications.

How Context Rot Shows Up in the Help Desk

The typical architecture for an AI-assisted service desk works like this: a user reports an issue (or an autonomic system detects one), and the AI is given a bundle of contextually relevant information to help it diagnose and resolve the problem. That bundle usually includes some combination of ticket history, knowledge base articles, device configuration data, chat transcripts, runbook excerpts, and event logs.
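A minimal sketch of that "give it everything" bundle, assuming a hypothetical `ContextItem` record per source. It concatenates every item newest-first with no relevance filter or size limit, and estimates size with a crude four-characters-per-token heuristic (an assumption; real tokenizers vary by model):

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ContextItem:
    source: str  # e.g. "ticket", "kb", "chat", "event_log" (illustrative labels)
    text: str
    timestamp: datetime


def rough_token_count(text: str) -> int:
    # Crude heuristic of ~4 characters per token; not a real tokenizer.
    return max(1, len(text) // 4)


def build_naive_bundle(items: list[ContextItem]) -> tuple[str, int]:
    """Concatenate every item, newest first, with no filtering and no budget."""
    ordered = sorted(items, key=lambda it: it.timestamp, reverse=True)
    prompt = "\n\n".join(
        f"[{it.source} @ {it.timestamp:%Y-%m-%d}]\n{it.text}" for it in ordered
    )
    return prompt, rough_token_count(prompt)


items = [
    ContextItem("ticket", "VPN drops after resume from sleep.", datetime(2024, 5, 1)),
    ContextItem("kb", "KB: Restarting the print spooler.", datetime(2023, 1, 10)),
    ContextItem("event_log", "Spooler service terminated unexpectedly.", datetime(2024, 5, 14)),
]
prompt, size = build_naive_bundle(items)
print(prompt.splitlines()[0])  # [event_log @ 2024-05-14]
```

Nothing here asks whether an item bears on the current issue: the freshest item simply goes first, and the bundle grows with every source you connect.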

The instinct is sound. You want the AI to have everything it might need. But "everything it might need" quickly becomes "everything remotely related," and that's where things go sideways. I've seen this play out in three predictable ways. First, the AI latches onto the most recent information as if recency equals causality. A user had a VPN issue two weeks ago, and now they have a printing problem, but the AI treats the VPN ticket as context for the printer diagnosis because it's the freshest signal. Second, the AI produces answers that sound authoritative but are oddly generic — the kind of response that could apply to almost any device in almost any environment. That's a hallmark of a model that's averaging across too many inputs instead of reasoning about the specific situation. Third, the AI parrots a KB article nearly verbatim, treating the documented resolution as gospel even when the environment has changed since the article was written.

That third failure mode is worth dwelling on, because it points to the second (and arguably deeper) problem with how some enterprises and vendors are deploying AI for ITSM.

Knowledge Bases Aren’t Wisdom Bases

Knowledge bases have always had a credibility problem in IT. Everyone knows they go stale. The rate of change in an enterprise environment outpaces the rate at which documentation gets updated, and so the KB becomes a repository of increasingly approximate guidance. That's the traditional definition of "rot": staleness.

But there's a more fundamental issue with feeding KB content to LLMs, and it has nothing to do with whether the articles are up to date. The issue is that knowledge bases encode what happened before and what we did about it then. They're historical artifacts. And in end-user computing, history is a remarkably poor predictor of the present.

Think about what changes on an enterprise endpoint in any given week without IT explicitly deploying anything. Browser extensions update silently. SaaS applications push backend changes that alter client-side behavior. Plug-ins auto-update. OS micro-patches apply in the background. Security agents refresh their rule sets. Drivers get swapped during routine maintenance windows. The user installs a new collaboration tool their team decided to try.

None of these changes are captured in a knowledge base. None of them generate tickets — until something breaks. And when something does break, the diagnostic picture is completely different from the last time a "similar" issue was reported, because the underlying environment has shifted in ways that the historical record doesn't reflect.

This is the core limitation of deterministic, history-based approaches to IT support. Runbooks assume the environment is static between documented states. KB articles assume the diagnostic path that worked last quarter still applies. Automated scripts assume the preconditions they were written for still hold. In a world where the enterprise desktop is a living, constantly mutating system, those assumptions are increasingly unsafe.
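Runbooks and KB articles silently assume their preconditions still hold. A minimal sketch of making that assumption explicit by checking a documented step against live endpoint state before applying it; the `RunbookStep` type and field names such as `wifi_driver` and `os_build` are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class RunbookStep:
    description: str
    # Component -> value the step's author assumed when the step was written.
    preconditions: Dict[str, str] = field(default_factory=dict)


def stale_preconditions(
    step: RunbookStep, live_state: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return each precondition whose documented value no longer matches the live state.

    Values are (documented, observed); observed is None if the key is absent.
    """
    return {
        key: (expected, live_state.get(key))
        for key, expected in step.preconditions.items()
        if live_state.get(key) != expected
    }


step = RunbookStep(
    "Reinstall Wi-Fi driver from the approved package",
    {"wifi_driver": "22.2.0", "os_build": "22631"},
)
live = {"wifi_driver": "23.0.1", "os_build": "22631"}  # hypothetical live snapshot

drift = stale_preconditions(step, live)
print(drift)  # {'wifi_driver': ('22.2.0', '23.0.1')}
```

A non-empty result signals that the documented path was written for a different machine than the one in front of you, and that diagnosis should start from current state.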

The problem gets worse when you feed this historical content to an LLM. You're combining a reasoning engine that degrades with input volume (context rot) with input that is structurally misaligned with the actual state of the endpoint (the KB fallacy). The AI confidently applies yesterday's playbook to today's environment, and the result feels like intelligence but functions like guesswork.

What Site Reliability Engineering Already Knows

The interesting thing is that another discipline figured this out years ago. Site reliability engineering was born from a simple observation: you can't keep complex systems reliable by writing better runbooks. The rate of change is too high, the failure modes are too novel, and the interactions between components are too dynamic for any static documentation to stay useful for long.

SRE's answer was to shift from prescriptive remediation — "when X happens, do Y" — to observability-driven response. You instrument the system, define service level objectives that describe what "healthy" looks like, monitor against those SLOs in real time, and when something degrades, you diagnose from current observed state — not from a playbook that describes what the system looked like six months ago.
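To make "define what healthy looks like and monitor against it" concrete, here is a minimal sketch of evaluating a latency SLO and its error budget, the standard SRE bookkeeping. The metric (app-launch time), threshold, target, and samples are illustrative assumptions, not recommended values.

```python
import math


def slo_report(latencies_ms: list[float], threshold_ms: float, target: float) -> dict:
    """Evaluate a latency SLO: at least `target` of events must finish within `threshold_ms`.

    Returns the SLI (fraction of good events), whether the SLO is met, and the
    fraction of the error budget consumed (1.0 means the budget is exactly spent).
    """
    if not latencies_ms:
        raise ValueError("no observations")
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    total = len(latencies_ms)
    good = sum(1 for x in latencies_ms if x <= threshold_ms)
    bad = total - good
    sli = good / total
    allowed_bad = (1 - target) * total  # the error budget, expressed in events
    if allowed_bad == 0:
        budget_used = 0.0 if bad == 0 else math.inf
    else:
        budget_used = bad / allowed_bad
    return {"sli": sli, "meets_slo": sli >= target, "error_budget_used": budget_used}


# Hypothetical app-launch times (ms) from one endpoint; SLO: 90% under 3 s.
samples = [1200, 1800, 2500, 900, 4100, 1500, 2200, 3500, 1700, 1100]
report = slo_report(samples, threshold_ms=3000, target=0.9)
print(report["sli"], report["meets_slo"])  # 0.8 False
print(round(report["error_budget_used"], 2))  # 2.0
```

The point is the shape of the check: health is a measured ratio against a declared target, evaluated on current observations rather than on a description of past incidents.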

Our customers tell us the enterprise endpoint is overdue for the same shift. The modern managed desktop is every bit as complex and dynamic as a cloud-native microservice — arguably more so, because it sits at the intersection of corporate IT, user behavior, and a constant stream of third-party changes that no one is coordinating. Applying SRE principles to endpoint management means accepting that the environment changes faster than documentation can track, that "similar" incidents often have completely different root causes, and that the only reliable foundation for diagnosis is what the system is actually doing right now.

This is where edge-first digital employee experience telemetry changes the equation. Continuous, real-time data collected from the endpoint itself gives you something no knowledge base or ticket archive can: a current, factual picture of the device state at the moment the issue occurs. It tells you which processes are running, what changed recently, which resources are constrained, what's different about this machine compared with a healthy peer, and how many systems in your digital estate are similarly affected.
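The "different from a healthy peer" and "how many systems are similarly affected" questions can be sketched with a robust outlier test over a fleet metric. Each device is compared against the fleet median (the fleet includes the device itself, which a median-based baseline tolerates well). The metric (boot time), device names, and the 3.5 cutoff (a commonly cited threshold for modified z-scores) are assumptions for illustration; a real system would compare many signals, not one.

```python
import math
from statistics import median


def robust_z(value: float, fleet: list[float]) -> float:
    """How far `value` sits from the fleet median, in units of scaled MAD.

    Scaling the median absolute deviation by 1.4826 makes it comparable to a
    standard deviation when the data are roughly normal, while staying
    insensitive to the outliers we are trying to find.
    """
    med = median(fleet)
    mad = median([abs(x - med) for x in fleet])
    scale = 1.4826 * mad
    if scale == 0:
        return 0.0 if value == med else math.copysign(math.inf, value - med)
    return (value - med) / scale


def affected_devices(metric_by_device: dict[str, float], cutoff: float = 3.5) -> list[str]:
    """Devices whose metric deviates from the fleet baseline by more than `cutoff`."""
    fleet = list(metric_by_device.values())
    return sorted(
        device
        for device, value in metric_by_device.items()
        if abs(robust_z(value, fleet)) > cutoff
    )


# Hypothetical boot times in seconds across a small fleet.
boot_time_s = {
    "LT-001": 42, "LT-002": 45, "LT-003": 40, "LT-004": 44,
    "LT-005": 118, "LT-006": 43, "LT-007": 121,
}
outliers = affected_devices(boot_time_s)
print(outliers, len(outliers))  # ['LT-005', 'LT-007'] 2
```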

That's the SLO equivalent for the endpoint — a live, quantified definition of health that you can reason against in real time, rather than a static description of what "normal" used to look like.

Where This Leaves Us

We’re at an inflection point. The first generation of AI-for-ITSM implementations followed a predictable pattern: take a powerful LLM, give it access to everything, and expect it to figure things out. That approach delivered impressive demos and inconsistent production results.

The next generation needs to be built on two principles that the research now clearly supports. First, context is a finite resource that must be curated, not maximized. Giving an AI more information doesn't make it smarter and — past a threshold — it can make it worse. The systems that perform best will be the ones designed around context discipline, not context abundance.
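A minimal sketch of "curated, not maximized": drop items below a relevance floor, then fill a fixed token budget. The relevance scores, token counts, floor, and budget are all hypothetical inputs; how to score relevance well is the hard part and is left out here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    label: str
    relevance: float  # 0..1 from some scorer you trust (hypothetical here)
    tokens: int


def curate(candidates: list[Candidate], budget: int, min_relevance: float = 0.5) -> list[Candidate]:
    """Drop weakly relevant items, then greedily fill a fixed token budget.

    Items are ranked by relevance per token, so a short, highly relevant item
    beats a long, moderately relevant one. Greedy selection is a simple
    heuristic for this knapsack-style problem, not an optimal solver.
    """
    eligible = [c for c in candidates if c.relevance >= min_relevance]
    eligible.sort(key=lambda c: c.relevance / max(c.tokens, 1), reverse=True)
    chosen, used = [], 0
    for c in eligible:
        if used + c.tokens <= budget:
            chosen.append(c)
            used += c.tokens
    return chosen


pool = [
    Candidate("live telemetry snapshot", 0.95, 800),
    Candidate("changes in last 24h", 0.90, 300),
    Candidate("KB: wireless troubleshooting", 0.55, 2500),
    Candidate("VPN ticket from two weeks ago", 0.20, 600),
    Candidate("90 days of event logs", 0.40, 40000),
]
print([c.label for c in curate(pool, budget=2000)])
# ['changes in last 24h', 'live telemetry snapshot']
```

The stale VPN ticket and the 90-day log dump never reach the model, and the KB article is admitted only if there is room after the higher-value inputs.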

Second, the inputs that matter most for endpoint diagnosis are the ones that describe the present, not the past. Real-time telemetry from the edge — capturing what the device is actually doing, what changed, and what's different — is structurally better input for an LLM than any knowledge base, no matter how well-maintained. The rate of non-IT-deployed change on the modern enterprise desktop is simply too fast for history-based deterministic approaches to keep up.
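"What changed" is exactly the kind of present-tense input argued for here. A minimal sketch, assuming the endpoint agent can report a component-to-version inventory at two points in time (the snapshot format and names are hypothetical, not any product's API), of reducing two snapshots to a short change list that fits easily in a prompt:

```python
def summarize_changes(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Compare two inventory snapshots (component -> version) and list what changed."""
    lines = []
    for name in sorted(before.keys() | after.keys()):
        old, new = before.get(name), after.get(name)
        if old == new:
            continue  # unchanged components add nothing to the prompt
        if old is None:
            lines.append(f"added {name} {new}")
        elif new is None:
            lines.append(f"removed {name} {old}")
        else:
            lines.append(f"updated {name} {old} -> {new}")
    return lines


# Hypothetical snapshots taken yesterday and at the moment the issue was reported.
yesterday = {"Chrome": "124.0", "Zoom": "6.0.2", "WiFi driver": "22.2.0"}
now = {"Chrome": "125.0", "Teams": "24.1", "WiFi driver": "22.2.0"}
for line in summarize_changes(yesterday, now):
    print(line)
```

Unchanged components drop out entirely, so the output stays small no matter how large the inventory is.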

SRE learned this lesson with servers and microservices a decade ago. The endpoint is the last major surface in enterprise IT still being managed like it's the turn of the millennium — with static documentation, prescriptive scripts, and the assumption that what worked last time will work again. Context rot is real. It's already degrading the performance of AI systems in production. But it's a solvable problem — by building systems that observe what's actually happening and reason from there.

Source: Lakeside Software
