What Agent Architecture Actually Looks Like After 6 Months
Published: 2026-03-06 · 9 min read · Series: Building in the Open
Most AI agent content falls into one of two categories: beginner tutorials that get you to "hello world," or theoretical architecture diagrams that have clearly never run in production. There's almost nothing in between.
This is that middle ground — what it feels like to run a multi-agent system for months under real workloads and real deadlines, and what actually compounds.
What We'd Do Differently
If we were starting over:
- Start with memory architecture first. Retrofitting domain-separated memory later is expensive and painful.
- Define trust boundaries before autonomy. Document what agents can do without approval, and what always requires sign-off.
- Build coordination before adding the second agent. The handoff protocol gets hard fast once multiple specialists are active.
- Invest in boring reliability early. Heartbeats, checkpoints, and recovery rules are why operations survive bad days.
The Plateau Problem Is Real, and It Hits Fast
Every setup starts hot. You add a prompt, maybe memory, and it feels amazing for two weeks.
Then quality plateaus. Context drifts. Decisions from last week disappear. Output stays coherent but doesn't compound. That's where most teams stall.
The difference between "has an AI agent" and "has an AI operation that improves over time" is architecture, not model choice.
Lesson 1: Identity Is Not Cosmetic
Identity is a stability mechanism. A clear role, principles, and operating posture make decision quality consistent when instructions are incomplete.
Under ambiguity, identity-rich agents make better judgment calls. Identity-poor agents either ask too many follow-ups or guess wrong.
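One way to make identity a mechanism rather than flavor text is to treat it as structured data that gets injected into every session. A minimal sketch, assuming a prompt-assembly step you control; the `AgentIdentity` class and field names are illustrative, not any particular framework's API:

```python
from dataclasses import dataclass, field

@dataclass
class AgentIdentity:
    """Stable identity injected into every session (fields are illustrative)."""
    role: str
    principles: list = field(default_factory=list)
    posture: str = "ask before acting outside scope"

    def to_system_preamble(self) -> str:
        """Render the same identity block at the top of every system prompt."""
        lines = [f"Role: {self.role}", f"Operating posture: {self.posture}"]
        lines += [f"Principle: {p}" for p in self.principles]
        return "\n".join(lines)

ident = AgentIdentity(
    role="Release operations specialist",
    principles=["Prefer reversible actions", "Escalate on ambiguity"],
)
print(ident.to_system_preamble())
```

The point of versioning this as data instead of hand-editing prompt text: every session starts from the same role, posture, and principles, which is what keeps judgment calls consistent under ambiguity.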
Lesson 2: Memory Is the Multiplier (and the Hardest Problem)
Memory failure is usually silent. Sessions feel fine individually, but nothing compounds across weeks.
- Domain separation beats one giant file. Keep operations, relationships, projects, and durable knowledge distinct.
- Cold-start recovery is non-negotiable. Any session should rebuild critical context from files alone.
- Chat memory evaporates. If it isn't written into persistent memory, it dies with the session.
- Active distillation beats passive logs. Promote important facts proactively, don't wait to be asked.
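The four rules above can be sketched as a single store: one file per domain, proactive promotion of facts, and a cold-start path that rebuilds context from disk alone. This is a minimal sketch under assumed conventions (JSON files, the four domain names from the list); a real system would add locking, schemas, and retention rules:

```python
import json
from pathlib import Path

DOMAINS = ("operations", "relationships", "projects", "knowledge")  # illustrative split

class MemoryStore:
    """Domain-separated persistent memory: one file per domain."""

    def __init__(self, root: Path):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, domain: str) -> Path:
        assert domain in DOMAINS, f"unknown domain: {domain}"
        return self.root / f"{domain}.json"

    def promote(self, domain: str, fact: str) -> None:
        """Active distillation: write the fact down now, not at session end."""
        facts = self.recall(domain)
        if fact not in facts:  # idempotent, so agents can promote aggressively
            facts.append(fact)
            self._path(domain).write_text(json.dumps(facts, indent=2))

    def recall(self, domain: str) -> list:
        p = self._path(domain)
        return json.loads(p.read_text()) if p.exists() else []

    def cold_start(self) -> dict:
        """Rebuild critical context from files alone, with no chat history."""
        return {d: self.recall(d) for d in DOMAINS}
```

The cold-start test is the one that matters: if a brand-new session calling `cold_start()` can't operate, the memory architecture is leaning on chat history that will evaporate.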
Lesson 3: Security Is Architecture
External content is untrusted input. Prompt injection is a design concern, not a hypothetical edge case.
Trust boundaries must be explicit: what runs autonomously, what requires approval, and what happens when instructions conflict with operating rules.
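"Explicit" here means the boundary lives in data, not in prompt wording. A minimal sketch of a trust-boundary gate, with an invented policy table and action names; the key moves are default-deny for unknown actions and demoting autonomy whenever the triggering content came from outside:

```python
from enum import Enum

class Trust(Enum):
    AUTONOMOUS = "autonomous"      # runs without asking
    NEEDS_APPROVAL = "approval"    # always pauses for sign-off
    FORBIDDEN = "forbidden"        # never runs, even if instructed

# Illustrative policy; the real boundaries are your own operating rules.
POLICY = {
    "read_internal_docs": Trust.AUTONOMOUS,
    "send_external_email": Trust.NEEDS_APPROVAL,
    "delete_production_data": Trust.FORBIDDEN,
}

def gate(action: str, content_is_external: bool = False) -> Trust:
    """External content is untrusted input: it can inform, never escalate."""
    level = POLICY.get(action, Trust.NEEDS_APPROVAL)  # unknown actions default to approval
    if content_is_external and level is Trust.AUTONOMOUS:
        return Trust.NEEDS_APPROVAL  # an injected "run this now" gets demoted, not obeyed
    return level
```

That last branch is the prompt-injection defense in miniature: instructions arriving inside external content can never grant more autonomy than the policy already allows.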
Lesson 4: Multi-Agent Is a Coordination Problem
Single-agent systems hit context ceilings. Multi-agent systems hit coordination ceilings.
The failure mode usually isn't capability. It's handoffs: ownership ambiguity, false completion signals, and missing verification loops.
The handoff protocol matters more than any individual prompt.
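A handoff protocol can be smaller than it sounds. This sketch encodes the three failure modes directly: one owner at a time, "done" is a claim until verified, and the owner can't verify their own work. The `Handoff` class is illustrative, not a library API:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Handoff:
    """Explicit handoff record: one owner, done only after verification."""
    task: str
    owner: str                          # exactly one owner at any moment
    verified_by: Optional[str] = None
    done: bool = False
    log: list = field(default_factory=list)

    def transfer(self, new_owner: str) -> None:
        self.log.append(f"{self.owner} -> {new_owner}")  # ownership is never ambiguous
        self.owner = new_owner

    def claim_done(self, agent: str) -> None:
        # "Done" from the owner is a claim, not a fact, until verified.
        if agent != self.owner:
            raise PermissionError(f"{agent} does not own {self.task!r}")
        self.log.append(f"{agent} claims completion")

    def verify(self, reviewer: str) -> None:
        if reviewer == self.owner:
            raise ValueError("owner cannot verify their own completion")
        self.verified_by = reviewer
        self.done = True
```

Nothing here is clever; the value is that false completion becomes structurally impossible instead of something a prompt politely asks agents to avoid.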
Lesson 5: Boring Infrastructure Wins
Heartbeats, checkpoints, and recovery procedures feel like overhead until a late-night failure proves they're the only reason your operation is recoverable.
Agents that surface blockers, checkpoint state, and maintain themselves between prompts are dramatically more reliable than purely reactive agents.
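The boring infrastructure is also the simplest to sketch. Here is a minimal checkpoint-plus-heartbeat shape, assuming a single state file per agent; the `Supervisor` name and JSON layout are inventions for illustration:

```python
import json
import time
from pathlib import Path

class Supervisor:
    """Heartbeat + checkpoint: if an agent dies, resume from the last good state."""

    def __init__(self, state_file: Path, stale_after: float = 30.0):
        self.state_file = Path(state_file)
        self.stale_after = stale_after  # seconds without a heartbeat before alarm

    def checkpoint(self, state: dict) -> None:
        """Every checkpoint doubles as a heartbeat via its timestamp."""
        payload = {"ts": time.time(), "state": state}
        self.state_file.write_text(json.dumps(payload))

    def recover(self) -> dict:
        """Return the last checkpoint, flagged stale if the heartbeat lapsed."""
        payload = json.loads(self.state_file.read_text())
        payload["stale"] = (time.time() - payload["ts"]) > self.stale_after
        return payload
```

On a bad day, `recover()` is the whole point: the late-night restart picks up a concrete step number and blocker list instead of a blank context window.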
A Useful Maturity Framework
One reference worth using: Claw Score by Atlas Forge. It grades stacks across identity, memory, security, autonomy, proactive patterns, and learning architecture.
The weighting tracks reality: memory and security carry disproportionate impact. The gap in the framework is multi-agent coordination, which deserves its own explicit dimension.
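To make the "weighting tracks reality" point concrete, here is a hypothetical weighted rubric with coordination added as its own dimension. The weights below are our illustration, not the actual Claw Score rubric:

```python
# Illustrative weights only -- not the actual Claw Score rubric.
WEIGHTS = {
    "identity": 0.10,
    "memory": 0.25,        # memory and security deliberately carry the most weight
    "security": 0.25,
    "autonomy": 0.10,
    "proactive": 0.10,
    "learning": 0.10,
    "coordination": 0.10,  # the missing dimension argued for above
}

def maturity_score(grades: dict) -> float:
    """Weighted average of 0-10 dimension grades; missing dimensions score 0."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must total 100%
    return sum(WEIGHTS[d] * grades.get(d, 0) for d in WEIGHTS)
```

The shape matters more than the numbers: a stack that aces prompting but scores zero on memory and security caps out at half the scale, which matches what we've seen in practice.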
What Comes Next
This is the first post in a series. We'll break each lesson into concrete implementation patterns with examples from live operations.
If you want this built for your operation instead of assembled from scratch by trial and error, that's exactly what Ridley Research does.