Fixing Agent Drift and Reliability
Published: 2026-02-23 · 7 min read
Agent stacks rarely fail in one dramatic moment. The API doesn't throw a 500. The cron doesn't crash. The model doesn't refuse. Instead, something subtler happens: the system keeps running, outputs keep appearing, but the outputs stop being right. Status reports diverge from reality. Tasks are marked done without artifacts. The agent starts narrating its process instead of completing it. Everything looks operational from the dashboard and is quietly broken underneath.
This is drift. It has specific causes, specific signals, and a specific recovery sequence. Here are all three.
The Signals
Drift announces itself early if you know what to look for:
- Verbosity increases without utility increasing. Responses get longer. The agent explains what it's about to do before doing it. Summaries of summaries appear in outputs that used to be direct.
- Status reports disagree with live runtime state. The agent says a cron job is running; it isn't. The agent says a task completed; there's no artifact. The agent says a model is active; the config says otherwise.
- Instructions are technically followed but spiritually missed. The rule said "send a daily brief"; the agent sends something that fits that description but no longer contains what the brief was supposed to contain.
- Completion counts rise while outcomes stall. Queue items get processed. Nothing changes in the actual business. Activity theater is a classic late-stage drift indicator.
- Double replies, double tool calls, redundant check-ins. The agent starts narrating, then executing, then confirming, then confirming the confirmation. Each message is technically compliant. The aggregate behavior is broken.
Any one of these in isolation might be noise. Two or more at the same time is a drift event. Treat it as one.
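The two-or-more rule above can be encoded as a trivial aggregation check. This is a minimal sketch; the signal names and the idea of representing observations as a set are illustrative assumptions, not part of any particular framework:

```python
# Hypothetical signal names corresponding to the five indicators above.
DRIFT_SIGNALS = {
    "verbosity_up",        # responses get longer without added utility
    "status_mismatch",     # reported state disagrees with runtime state
    "letter_not_spirit",   # rule followed in form, missed in intent
    "activity_theater",    # completions rise while outcomes stall
    "redundant_messages",  # double replies, double tool calls
}

def is_drift_event(observed: set[str]) -> bool:
    """One signal in isolation may be noise; two or more concurrent
    signals count as a drift event and should be treated as one."""
    return len(observed & DRIFT_SIGNALS) >= 2
```

The point of making this a hard threshold rather than a judgment call is that drift is easiest to rationalize away exactly when it is happening.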
What Causes It
arXiv 2601.11653 identifies the root cause with precision: constraint drift. Over long interactions, agents don't forget their rules — they lose context priority on them. Newer information crowds out earlier instructions. Rules that were followed at session start are violated by session end because the model's effective attention window for those rules has shrunk.
The compounding factors:
Instruction accumulation. Operating files grow over time. Each new rule feels like an improvement. Cumulatively they create conflict surface. When multiple rules apply to the same situation and produce different answers, the model adjudicates inconsistently. What looks like smart flexibility is actually unreliable behavior under pressure.
Context rot. Simply filling a context window without active management degrades performance non-linearly. The research is explicit: this approach "collapsed under real workloads." Performance degraded, retrieval was expensive, costs compounded. The longer a session runs without compaction or structured memory, the worse the rot gets.
Noisy recall. Vector similarity retrieval surfaces contextually adjacent but factually irrelevant content. Every retrieved memory is treated as equally valid. Contaminated reasoning produces confident errors — the worst kind, because they're hard to catch from the outside.
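One mitigation is to stop treating every retrieved memory as equally valid: gate recall on a similarity floor and require provenance. A minimal sketch, assuming retrieval hits arrive as dicts with `score` and `source` fields (both names are assumptions, as is the threshold value):

```python
def filter_recall(hits: list[dict], min_score: float = 0.8,
                  require_source: bool = True) -> list[dict]:
    """Drop retrieved memories that are merely adjacent rather than
    relevant: below the similarity floor, or lacking a verifiable source.
    Better to recall nothing than to contaminate reasoning."""
    kept = []
    for hit in hits:
        if hit.get("score", 0.0) < min_score:
            continue  # contextually adjacent, not factually relevant
        if require_source and not hit.get("source"):
            continue  # no provenance, no trust
        kept.append(hit)
    return kept
```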
No tool failure memory. Agents that don't log tool failures retry the same dead endpoints every session. arXiv 2601.14192 puts a name on this pattern and notes that efficiency-oriented agents should have RL-style learning about which tools fail in which contexts. In practice: a simple failure log covers most of the gap.
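The simple failure log mentioned above fits in a few lines. This is one possible shape, assuming an append-only JSONL file (the path and field names are illustrative):

```python
import json
import time
from pathlib import Path

LOG = Path("tool_failures.jsonl")  # assumed location

def record_failure(tool: str, target: str, reason: str) -> None:
    """Append one structured entry: date, tool, what failed, why."""
    entry = {"date": time.strftime("%Y-%m-%d"), "tool": tool,
             "target": target, "reason": reason}
    with LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def known_failure(tool: str, target: str) -> bool:
    """Check the log before retrying anything that has failed before."""
    if not LOG.exists():
        return False
    for line in LOG.read_text().splitlines():
        entry = json.loads(line)
        if entry["tool"] == tool and entry["target"] == target:
            return True
    return False
```

The agent calls `known_failure` before every retry and `record_failure` after every structural failure; that alone stops the dead-endpoint loop without any RL machinery.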
The Recovery Sequence
When a drift event is confirmed, the recovery order matters. Don't start adding rules — that's what caused the problem. Start by reducing scope:
- Freeze new features and configs. Nothing new until the system is stable. Every additional change during a drift event adds signal noise and makes root cause harder to isolate.
- Run live verification on core jobs. Don't read status docs — run the actual commands. Pull cron state, check active processes, verify model assignments. Replace any stale status notes with runtime snapshots. What you wrote two weeks ago is historical; what the system is doing right now is truth.
- Diff against the last known-good baseline. What changed since the system was working? Operating files, model routing, scheduled jobs, active skills. The diff usually points directly at the cause.
- Strip operating rules to a minimum. Our reset protocol brings rules down to four to six core principles. Remove anything that's redundant with something else, anything that only applied to a case that no longer exists, anything that requires interpretation to apply. If two rules apply to the same situation, pick the cleaner one and delete the other.
- Pin model routing explicitly. No defaults. No ambiguity. Every scheduled job gets an explicit model assignment. When local models are involved, verify they're healthy before assigning work — local inference under memory pressure produces silent failures that are harder to debug than a straightforward API error.
- Run behavior verification before resuming normal operations. Five consecutive behavior tests against expected outputs. Don't call the system stable until it passes all five.
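The final verification step can be expressed as a consecutive-pass gate. A sketch under stated assumptions: `checks` are zero-argument callables returning True/False, and the five-run requirement is the one from the protocol above:

```python
def stable(checks: list, runs_required: int = 5) -> bool:
    """Run the full behavior suite repeatedly; the system counts as
    stable only after `runs_required` consecutive all-pass runs.
    A single failed run means: do not resume normal operations."""
    for _ in range(runs_required):
        if not all(check() for check in checks):
            return False  # back to the recovery sequence
    return True
```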
The Replanning Trigger
One of the most common failure patterns during recovery — and in normal operations — is what arXiv 2601.22311 calls the planning failure: agents pick the first plausible approach and execute it step by step without accounting for downstream consequences. When the approach hits a wall, they retry. And retry again. Each retry is a fresh attempt at the same broken path.
The fix is a replanning trigger: when a task hits an unexpected blocker mid-execution, the correct response is to stop completely and re-evaluate. Not retry. Not escalate. Stop and ask three questions:
- Is the original goal still achievable via this approach?
- Has this blocker revealed information that changes the approach?
- Is there a better path from the current state than pushing through?
Only then resume — or change direction. Retrying the same broken approach without replanning isn't persistence. It's wasted compute and a signal that the plan was wrong from step one.
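The three questions reduce to a small decision function. This is a sketch of the trigger logic, not a prescribed API; the argument names and return labels are illustrative assumptions:

```python
def on_blocker(goal_still_achievable: bool,
               blocker_changes_approach: bool,
               better_path_exists: bool) -> str:
    """Decide the response to an unexpected mid-execution blocker.
    The default is never 'retry the same path'."""
    if better_path_exists or blocker_changes_approach:
        return "replan"    # new information: change direction
    if goal_still_achievable:
        return "resume"    # same approach survives re-evaluation
    return "escalate"      # goal unreachable this way: surface to a human
```

Note that "retry" is not a reachable outcome: every path through the function goes through an explicit stop-and-evaluate step first.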
Hard Rules That Prevent Regression
Once a drift event is resolved, the goal is to keep it from recurring. The structural rules that prevent regression:
- One change, one verification. Never ship two config changes in the same session without testing the first. Agent systems are sensitive to instruction surface area. Changes interact in ways that aren't obvious until they're both live.
- No completion without artifact. If there is no file path, run ID, sent timestamp, or command output, the task didn't complete. The agent's confidence in its own reporting is not sufficient evidence. The artifact is.
- Archived notes never trigger action. Historical context is context. It can inform decisions. It cannot confirm current state. Before acting on anything written more than a session ago, verify it live.
- Recurring core failures are P1 incidents, not known issues. If the same job fails twice in a row, it's an incident. It gets a root cause analysis and a fix before anything else moves forward. Logging a failure and moving on is how "known issues" accumulate into system collapse.
- Write tool failures to memory. When a tool call fails in a way that reveals a structural limit — blocked URL, dead API endpoint, auth failure — write it to a failure log immediately: date, tool, what failed, why, what to avoid next time. Check the log before retrying anything that has failed before.
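The no-completion-without-artifact rule is mechanical enough to enforce in code. A minimal sketch, assuming tasks carry an `artifact` dict with `kind` and `path`/`value` fields (all field names are illustrative assumptions):

```python
from pathlib import Path

def completed(task: dict) -> bool:
    """A task counts as done only with a verifiable artifact: an existing
    file path, a run ID, a sent timestamp, or captured command output.
    The agent's confidence in its own report is not evidence."""
    artifact = task.get("artifact")
    if not artifact:
        return False
    if artifact.get("kind") == "file":
        return Path(artifact.get("path", "")).exists()
    return (artifact.get("kind") in {"run_id", "sent_ts", "command_output"}
            and bool(artifact.get("value")))
```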
The Living Soul Protocol
One class of drift deserves specific protection: behavioral constraint drift. This is when an agent's core operating identity — what it will and won't do, what its values are — gradually shifts over long interactions. The fix isn't constant re-reading of rules. It's making those rules structurally hard to modify.
We implement what we call a Living Soul Protocol: the agent's core identity file is read unconditionally at session start and cannot be modified by the agent itself. If someone asks the agent to change its own rules, it refuses and flags the request. Any actual changes go through a human-controlled edit process with a documented rationale. The rules don't drift because the agent doesn't touch them.
Most agents don't have this. Their operating rules live in the same mutable context as everything else. The most important behavioral constraints are the ones most at risk of drift.
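One way to make the identity file structurally hard to modify is a pinned-hash check at session start. This is a sketch, not the protocol's actual implementation; the file names are hypothetical, and the pinned hash is assumed to be written only through the human-controlled edit process:

```python
import hashlib
from pathlib import Path

IDENTITY_FILE = Path("SOUL.md")          # hypothetical core identity file
PINNED_SHA256 = Path("SOUL.md.sha256")   # written only by a human edit process

def verify_identity() -> bool:
    """At session start, confirm the identity file matches the pinned hash.
    A mismatch means it was modified outside the controlled process:
    refuse to proceed and flag the discrepancy."""
    actual = hashlib.sha256(IDENTITY_FILE.read_bytes()).hexdigest()
    return actual == PINNED_SHA256.read_text().strip()
```

The agent never writes either file; drift is prevented not by vigilance but by the absence of a write path.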
Reliability Is the Prerequisite
Every operator running agent systems for clients eventually learns the same lesson: reliability is the product, not a feature. The model quality, the clever retrieval system, the elaborate skill library — none of it matters if the system can't be trusted to run correctly when no one's watching it.
Fixing drift isn't perfectionism. It's prerequisite infrastructure. The mistake we see teams make is polishing features while drift is actively accumulating. By the time they notice, recovery takes longer than building from a clean baseline would have.
The right sequence is: make it reliable first. Then make it capable. Reliability earns the trust that lets you add capability without breaking it.
— Ridley Research & Consulting, February 2026