Your harness, your memory
Agent harnesses have become the foundational infrastructure layer for production LLM systems, but most teams don't realize they're making a critical architectural decision when they choose one. The harness you pick determines not just how your agent executes tasks today, but also who owns the memory and state that make your agent valuable over time.
The harness landscape has evolved rapidly. Early 2023 was dominated by simple RAG chains. By mid-2023, we had stateful orchestration frameworks. Now we're seeing full agent harnesses like Claude Code, whose leaked source reportedly runs to 512k lines of code. That's not scaffolding that will get absorbed into models; it's the orchestration layer that manages tool calling, context windows, compaction strategies, and state persistence. Even when OpenAI and Anthropic add web search to their APIs, they're not baking it into the model. They're running lightweight harnesses behind those endpoints that orchestrate tool calls.
The critical insight is that memory isn't a pluggable component you swap in. It's fundamentally tied to harness architecture. Short-term memory like conversation history and tool results lives in the harness's context management. Long-term memory like user preferences and cross-session state must be read, updated, and compacted by the harness. The harness decides what survives compaction, how skill metadata gets exposed, whether agents can modify their own instructions, and how filesystem state is represented. There are no standard abstractions here yet because memory for production agents is still nascent. Most teams ship an MVP agent first, then add personalization later.
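To make this concrete, here's a minimal sketch of harness-owned memory. Everything in it is hypothetical: the `Turn` and `HarnessMemory` names, the pinning rule, and the summarize-and-keep-recent compaction policy are invented for illustration. The point it demonstrates is that the harness, not the model, decides what survives compaction and what gets written to long-term state.

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    role: str             # "user", "assistant", or "tool"
    content: str
    pinned: bool = False  # harness policy: pinned turns survive compaction

@dataclass
class HarnessMemory:
    """Illustrative harness-owned memory (not any real SDK)."""
    context: list = field(default_factory=list)    # short-term: the context window
    long_term: dict = field(default_factory=dict)  # long-term: prefs, cross-session state

    def add_turn(self, turn: Turn) -> None:
        self.context.append(turn)

    def compact(self, max_turns: int) -> None:
        # One possible policy: keep pinned turns and the most recent ones,
        # collapse everything else into a single synthetic summary turn.
        if len(self.context) <= max_turns:
            return
        pinned = [t for t in self.context if t.pinned]
        recent = [t for t in self.context if not t.pinned][-max_turns:]
        dropped = len(self.context) - len(pinned) - len(recent)
        summary = Turn("tool", f"[summary of {dropped} earlier turns]", pinned=True)
        self.context = pinned + [summary] + recent

    def remember(self, key: str, value: str) -> None:
        # Long-term memory only exists if the harness explicitly writes it.
        self.long_term[key] = value

mem = HarnessMemory()
for i in range(10):
    mem.add_turn(Turn("user", f"message {i}"))
mem.compact(max_turns=3)          # 7 old turns collapse into one summary
mem.remember("tone", "concise")   # cross-session state, written by the harness
```

Swap the compaction policy and the agent's effective memory changes, even though the model and prompts are identical; that coupling is exactly why memory isn't a pluggable component.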
This creates a lock-in problem that's more severe than model lock-in. Switching between OpenAI and Anthropic APIs is straightforward: you adjust prompts and maybe tweak temperature settings. But switching harnesses means losing access to accumulated memory. If you're using OpenAI's Responses API or Anthropic's server-side compaction, your conversation state lives on their servers, and you can't resume those threads with a different model provider. If you're using a closed harness like the Claude Agent SDK, the memory format and interaction patterns are opaque: you don't know how artifacts are structured or how the harness uses them, which makes migration effectively impossible.
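The portability difference is easy to see in code. Both functions below are illustrative stand-ins, not any real provider SDK: `chat_stateless` shows client-held state you can replay against any provider, while `chat_serverside` shows the opaque-thread pattern that ties your history to one vendor.

```python
# Portable: the client owns the transcript and replays it every turn,
# so any provider that accepts a message list can resume the session.
def chat_stateless(provider_call, history, user_msg):
    history = history + [{"role": "user", "content": user_msg}]
    reply = provider_call(history)  # full history travels with the request
    return history + [{"role": "assistant", "content": reply}]

# Locked in: the provider owns the thread; the client holds only an
# opaque ID that means nothing to any other vendor.
def chat_serverside(provider_resume, thread_id, user_msg):
    reply, new_thread_id = provider_resume(thread_id, user_msg)
    return reply, new_thread_id

# Stub provider for illustration: reports how many turns it received.
def echo_provider(history):
    return f"saw {len(history)} turns"

history = chat_stateless(echo_provider, [], "hello")
```

In the stateless version, migrating vendors means pointing `provider_call` at a new endpoint; in the server-side version, the accumulated thread simply isn't yours to move.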
The worst case is fully managed agent platforms like Anthropic's Claude Managed Agents, where the entire harness and memory system sits behind an API. You have zero visibility into how memory is stored, no control over what's exposed, and no portability. Even ostensibly open harnesses like Codex generate encrypted compaction summaries that only work within the OpenAI ecosystem. Model providers are incentivized to move more functionality behind proprietary APIs precisely because memory creates stickiness that raw model access doesn't provide.
This matters because memory is what transforms a generic agent into a differentiated product. Without memory, your agent is replicable by anyone with the same tools and prompts. With memory, you're building a proprietary dataset of user interactions and preferences that compounds over time. That dataset is your moat. Losing it means starting from scratch on personalization, tone, and learned behaviors.
The practical recommendation for teams building production agents is straightforward: prioritize open harnesses where you control the memory layer. Use stateless APIs or self-hosted solutions where you own the persistence layer. Treat memory architecture as a first-class concern in your design, not something you'll figure out later. The switching costs only increase as your users interact more with your system.
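As a sketch of what "own the persistence layer" can mean in practice, here's a minimal self-hosted memory store backed by SQLite. The `OwnedMemoryStore` class and its schema are invented for illustration; the design point is that memory lives in a file you control and can be exported wholesale when you swap harnesses.

```python
import json
import sqlite3

class OwnedMemoryStore:
    """Illustrative self-hosted persistence layer: long-term agent memory
    in a local SQLite file, independent of any particular harness."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memory ("
            "user_id TEXT, key TEXT, value TEXT, "
            "PRIMARY KEY (user_id, key))"
        )

    def put(self, user_id: str, key: str, value) -> None:
        # JSON-encode values so arbitrary structures round-trip.
        self.db.execute(
            "INSERT OR REPLACE INTO memory VALUES (?, ?, ?)",
            (user_id, key, json.dumps(value)),
        )
        self.db.commit()

    def get(self, user_id: str, key: str, default=None):
        row = self.db.execute(
            "SELECT value FROM memory WHERE user_id = ? AND key = ?",
            (user_id, key),
        ).fetchone()
        return json.loads(row[0]) if row else default

    def export(self, user_id: str) -> dict:
        # Portability escape hatch: dump everything for migration
        # into whatever harness comes next.
        rows = self.db.execute(
            "SELECT key, value FROM memory WHERE user_id = ?", (user_id,)
        )
        return {k: json.loads(v) for k, v in rows}

store = OwnedMemoryStore()
store.put("u1", "tone", "concise")
store.put("u1", "timezone", "UTC")
```

Whatever harness you run on top, keeping an `export`-style escape hatch in your own schema is what keeps the accumulated memory (and the moat it represents) yours.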