Fixing Agent Drift and Reliability

Published: 2026-02-23 · 7 min read

Agent stacks rarely fail in one dramatic moment. The API doesn't throw a 500. The cron doesn't crash. The model doesn't refuse. Instead, something subtler happens: the system keeps running, outputs keep appearing, but the outputs stop being right. Status reports diverge from reality. Tasks are marked done without artifacts. The agent starts narrating its process instead of completing it. Everything looks operational from the dashboard and is quietly broken underneath.

This is drift. It has specific causes, specific signals, and a specific recovery sequence. Here's all three.

The Signals

Drift announces itself early if you know what to look for:

Verbosity increases without utility increasing. Responses get longer. The agent explains what it's about to do before doing it. Summaries of summaries appear in outputs that used to be direct.
Status reports disagree with live runtime state. The agent says a cron job is running; it isn't. The agent says a task completed; there's no artifact. The agent says a model is active; the config says otherwise.
Instructions are technically followed but spiritually missed. The rule said "send a daily brief"; the agent sends something that fits that description but no longer contains what the brief was supposed to contain.
Completion counts rise while outcomes stall. Queue items get processed. Nothing changes in the actual business. Activity theater is a classic late-stage drift indicator.
Double replies, double tool calls, redundant check-ins. The agent starts narrating, then executing, then confirming, then confirming the confirmation. Each message is technically compliant. The aggregate behavior is broken.

Any one of these in isolation might be noise. Two or more at the same time is a drift event. Treat it as one.

What Causes It

The root cause is almost always the same thing: too many rules competing with each other. Over time, operating files grow. Every new rule feels like an improvement in the moment. What you end up with is a system where three different rules apply to the same situation and point in different directions — so the agent picks one inconsistently depending on what's most prominent in context.

It's not the model getting dumber. It's the instructions getting noisier.

Instruction accumulation. This is the most common cause. You add rules as problems come up. By month two, the operating file is twice as long as it needs to be, half the rules contradict each other in edge cases, and the agent spends cognitive budget resolving conflicts instead of just doing the work.

Context rot. Sessions get long. Memory gets disorganized. The agent starts pulling outdated context into current decisions. Things that were true three weeks ago pollute things that are true now. I've seen the agent report a cron job as active when it had been removed days earlier — pulled from stale memory and presented as fact.

No failure memory. Agents that don't log tool failures retry the same dead endpoints every session, every time. I had a web fetch hitting a blocked URL for two weeks before I noticed. The fix took thirty seconds once I saw it. The problem was I had no log telling me it kept happening.

The Recovery Sequence

When a drift event is confirmed, the recovery order matters. Don't start adding rules — that's what caused the problem. Start by reducing scope:

Freeze new features and configs. Nothing new until the system is stable. Every additional change during a drift event adds signal noise and makes root cause harder to isolate.
Run live verification on core jobs. Don't read status docs — run the actual commands. Pull cron state, check active processes, verify model assignments. Replace any stale status notes with runtime snapshots. What you wrote two weeks ago is historical; what the system is doing right now is truth.
Diff against the last known-good baseline. What changed since the system was working? Operating files, model routing, scheduled jobs, active skills. The diff usually points directly at the cause.
Strip operating rules to a minimum. Our reset protocol brings rules down to four to six core principles. Remove anything that's redundant with something else, anything that only applied to a case that no longer exists, anything that requires interpretation to apply. If two rules apply to the same situation, pick the cleaner one and delete the other.
Pin model routing explicitly. No defaults. No ambiguity. Every scheduled job gets an explicit model assignment. When local models are involved, verify they're healthy before assigning work — local inference under memory pressure produces silent failures that are harder to debug than a straightforward API error.
Run behavior verification before resuming normal operations. Five consecutive behavior tests against expected outputs. Don't call the system stable until it passes all five.

Stop Retrying the Same Broken Approach

One pattern I kept seeing: the agent hits a blocker, retries, hits the same blocker, retries again. Nothing changes between attempts. It just runs the same broken path three times and reports failure each time.

The fix is forcing a stop before any retry. When something blocks mid-execution, don't immediately try again — stop and ask three questions:

Is the original goal still achievable via this approach?
Has this blocker revealed information that changes the approach?
Is there a better path from the current state than pushing through?

Only then resume — or change direction. Retrying the same broken approach without replanning isn't persistence. It's wasted compute and a signal that the plan was wrong from step one.

Hard Rules That Prevent Regression

Once a drift event is resolved, the goal is to not repeat it. The structural rules that prevent regression:

One change, one verification. Never ship two config changes in the same session without testing the first. Agent systems are sensitive to instruction surface area. Changes interact in ways that aren't obvious until they're both live.
No completion without artifact. If there is no file path, run ID, sent timestamp, or command output, the task didn't complete. The agent's confidence in its own reporting is not sufficient evidence. The artifact is.
Archived notes never trigger action. Historical context is context. It can inform decisions. It cannot confirm current state. Before acting on anything written more than a session ago, verify it live.
Recurring core failures are P1 incidents, not known issues. If the same job fails twice in a row, it's an incident. It gets a root cause analysis and a fix before anything else moves forward. Logging a failure and moving on is how "known issues" accumulate into system collapse.
Write tool failures to memory. When a tool call fails in a way that reveals a structural limit — blocked URL, dead API endpoint, auth failure — write it to a failure log immediately: date, tool, what failed, why, what to avoid next time. Check the log before retrying anything that has failed before.

The Living Soul Protocol

One class of drift deserves specific protection: behavioral constraint drift. This is when an agent's core operating identity — what it will and won't do, what its values are — gradually shifts over long interactions. The fix isn't constant re-reading of rules. It's making those rules structurally hard to modify.

We implement what we call a Living Soul Protocol: the agent's core identity file is read unconditionally at session start and cannot be modified by the agent itself. If someone asks the agent to change its own rules, it refuses and flags the request. Any actual changes go through a human-controlled edit process with a documented rationale. The rules don't drift because the agent doesn't touch them.

Most agents don't have this. Their operating rules live in the same mutable context as everything else. The most important behavioral constraints are the ones most at risk of drift.

Reliability Is the Prerequisite

Every operator running agent systems for clients eventually learns the same lesson: reliability is the product, not a feature. The model quality, the clever retrieval system, the elaborate skill library — none of it matters if the system can't be trusted to run correctly when no one's watching it.

Fixing drift isn't perfectionism. It's prerequisite infrastructure. The mistake we see teams make is polishing features while drift is actively accumulating. By the time they notice, recovery takes longer than building from a clean baseline would have.

The right sequence is: make it reliable first. Then make it capable. Reliability earns the trust that lets you add capability without breaking it.

If this was useful and you have questions, email me at deacon@ridleyresearch.com.