What It Actually Takes to Run AI for a Dozen Businesses
I run AI agent infrastructure for about a dozen small business clients. Real estate agents, a roofing company, a financial advisor, a couple of property managers. Each one has an agent running on their own machine — Mac mini or Windows PC — connected to their Telegram, email, calendar, and CRM. They use it daily. I'm the one-person team maintaining all of it.
If you're considering doing something similar — either as a service or just for your own business — here's what I've actually learned. Not the pitch version. The field notes version.
Version Drift Is the Job
The single most persistent problem isn't model quality or prompt engineering. It's version drift. A client's machine gets a couple weeks behind on updates. Things look fine until they don't. Then when something breaks, the fix that works on my machine doesn't work on theirs — because the fix imports a module that doesn't exist in their older version of the codebase.
I once spent three days diagnosing a broken agent that turned out to be 366 commits behind current. The new code I'd written assumed functions that hadn't been written yet on that machine. The errors were confusing. Nothing pointed cleanly at the real problem. It looked like a dozen possible issues before I thought to check the version gap.
That incident changed how I think about maintenance. Every time I write new code now, the first question is: does this exist on every machine it needs to run on? The second question is: if someone goes two weeks without updating, does this still work? If the answer to either is no, I need to handle that before I ship.
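Those two questions can be expressed directly in code. Here's a minimal sketch of the defensive pattern, assuming a hypothetical agent codebase where newer modules may not yet exist on a machine that's behind. The names `agent.reporting`, `send_summary`, and `send_plain_digest` are illustrative, not from the actual codebase:

```python
import importlib


def send_plain_digest(client: str, text: str) -> str:
    # Old code path that exists on every machine in the fleet.
    return f"[plain] {client}: {text}"


def send_digest(client: str, text: str) -> str:
    # Prefer the newer reporting module if this machine has it.
    # On an out-of-date machine the import fails, and the agent
    # degrades to the old path instead of crashing with a
    # confusing traceback.
    try:
        reporting = importlib.import_module("agent.reporting")  # hypothetical new module
    except ImportError:
        reporting = None

    if reporting is not None and hasattr(reporting, "send_summary"):
        return reporting.send_summary(client, text)
    return send_plain_digest(client, text)
```

The point isn't the specific mechanism — a version pin or feature flag works too — but that the fallback is decided where the new code is called, so a machine two weeks behind still does something sensible.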
Silence Is the Signal
Clients don't tell you when something breaks. They just stop using it. You find out on a health check, not from a complaint.
This took me longer to internalize than it should have. I kept waiting for someone to message me saying "the agent isn't working." That almost never happens. What happens instead is the usage just quietly drops off. You pull a log, and you can see exactly when it stopped — usually days ago. The client adapted their workflow around the broken thing and moved on.
So the monitoring has to be proactive. I run health checks that flag machines that haven't had meaningful activity for longer than expected. That's usually the first sign something is wrong. By the time you notice it from a support request, you've already burned the client's goodwill.
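The check itself is simple. A sketch, assuming a per-machine record of the last meaningful agent activity (the machine names and the two-day threshold are illustrative):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical record of each machine's last meaningful activity,
# as it might come out of the pulled logs.
last_activity = {
    "realtor-01": datetime.now(timezone.utc) - timedelta(hours=6),
    "roofing-01": datetime.now(timezone.utc) - timedelta(days=4),
    "advisor-01": datetime.now(timezone.utc) - timedelta(hours=20),
}


def stale_machines(activity: dict, max_silence: timedelta = timedelta(days=2)) -> list:
    """Flag machines whose agent has been quiet longer than expected."""
    now = datetime.now(timezone.utc)
    return sorted(
        name for name, last in activity.items()
        if now - last > max_silence
    )
```

What counts as "meaningful activity" and how long is "too long" varies per client, which is why the threshold is a parameter rather than a constant.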
Knowing When Not to Change Something
The hardest thing about this work isn't building the agent. It's knowing when to leave it alone.
Clients develop patterns with their agent. The agent learns those patterns. The way a client phrases a request, the way they expect a response to be formatted, the tone they're used to — all of that is invisible infrastructure. An "improvement" that breaks the familiar rhythm is a bad trade even if the new behavior is technically better.
I've shipped things that were objectively cleaner implementations and had clients confused or frustrated because it didn't feel like what they were used to. The agents that have the most satisfied clients aren't necessarily the most capable ones — they're the ones that have been stable long enough to feel like a natural extension of how that person works.
Before pushing any behavioral change now, I think about it like editing someone's workflow instead of improving a feature. If the client hasn't asked for it, I have a high bar for whether it's worth the disruption.
The iMessage Incident
I need to talk about this one directly because it's the mistake I think about the most.
I set up iMessage integration for a real estate client. The channel config had allowFrom: * — any sender. The idea was to make it easy; the client could message the agent from any number. What I didn't fully account for is that an iMessage integration on macOS reads the device's entire Messages database. Not just one thread. All of them.
The agent started replying to everyone who had messaged that client. Twenty-plus of their real estate customers got AI-generated responses. Some of them were mid-deal. I caught it on a health check, not from a complaint, which at least meant I could get ahead of it — but the damage was real and so was the embarrassment.
That single config mistake is now the reason every iMessage channel I set up gets an explicit allowlist with only the client's own phone number, regardless of what the client prefers or asks for. Some configurations shouldn't be user preference. They should be fixed defaults based on what you know the failure mode looks like.
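The lesson reduces to a small piece of enforcement code. A sketch of an inbound-message gate, assuming a channel config dict — the key name `allow_from` and the phone numbers are illustrative. The important choices: an empty or missing allowlist means the agent answers no one, never everyone, and the wildcard is refused outright:

```python
OWNER_NUMBER = "+15550100"  # illustrative: the client's own number


def allowed_sender(sender: str, config: dict) -> bool:
    """Reject any sender not explicitly allowlisted.

    Fail closed: a missing or empty allowlist means nobody gets a
    reply, and a '*' wildcard is stripped rather than honored, so a
    permissive config can't silently open the whole Messages database.
    """
    allowlist = config.get("allow_from") or []
    allowlist = [n for n in allowlist if n != "*"]  # never honor the wildcard
    return sender in allowlist


config = {"allow_from": [OWNER_NUMBER]}
```

With this shape, the dangerous configuration isn't merely discouraged — it's unrepresentable at runtime, which is what "fixed defaults based on the failure mode" means in practice.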
The Economics Only Work Because of the Fleet Model
Here's something I don't think gets said enough in the "AI for SMB" conversation: one person cannot run 12 production environments the old-fashioned way. The math doesn't work.
What makes it viable is the overlay model — a declared-state configuration layer that lets me push a change to all 12 machines in a single operation. When a security patch needs to go out, I write it once and deploy it everywhere. When I find a bug, I fix it once. When I want to add a new capability, I build it once and it propagates.
Without that, I'd be SSHing into each machine individually. Twelve machines, each with their own quirks, their own version state, their own local patches. That's not 12 clients — that's a full-time job just keeping the lights on, before you do any actual work. The infrastructure pattern is what makes the service model possible at all.
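The core of the overlay model is just a deep merge: one declared base config for the whole fleet, plus a small per-machine overlay on top. A minimal sketch — the config keys, model name, and machine names are all illustrative, not the real deployment format:

```python
def merge(base: dict, overlay: dict) -> dict:
    """Deep-merge overlay onto base; overlay values win on conflict."""
    out = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)  # recurse into nested sections
        else:
            out[key] = value
    return out


# Declared once, applies to every machine.
base = {
    "model": "example-model",
    "channels": {"telegram": True, "imessage": False},
}

# Per-machine deltas stay tiny: only what differs from the base.
overlays = {
    "realtor-01": {"channels": {"imessage": True}},
    "roofing-01": {},
}

fleet = {name: merge(base, ov) for name, ov in overlays.items()}
```

A change to `base` is the "write it once" step: regenerate `fleet` and every machine's effective config picks it up, while each machine's quirks stay isolated in its own small overlay instead of accumulating as untracked local patches.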
What Clients Actually Care About
After all of this — the version management, the health monitoring, the incident response, the fleet tooling — here's what clients actually care about: did it respond to my message? Did it do what I asked?
That's it. The infrastructure is completely invisible to them. They don't know about version drift or overlay deployments or iMessage allowlists. They know whether their agent answered when they sent a message and whether the answer was useful.
That invisibility is the job. When it's working, there's nothing to see. The only time the infrastructure becomes visible is when something goes wrong — and by then, the relationship is already stressed. The whole point of all the operational work is to make sure that moment comes as rarely as possible.
If you're thinking about running AI infrastructure for other people, that's the frame I'd carry into it. You're not selling AI. You're selling reliability. The AI is just what runs underneath.
— Ridley Research, April 2026