AI tools have evolved dramatically over the last few years:
- 2022: LLMs as tools for generation; simple prompt-in, output-out interactions.
- 2023: Chatbots become mainstream. Tool use arrives.
- 2024: Agentic workflows arrive. Tool use becomes common, contexts grow longer, and agents execute tasks over multiple steps.
- 2025: Longer time horizon agents become common. OpenClaw launches.
- 2026: Memory becomes a necessity for agent orchestration.
Agents are capable of far more than they were a scant few years ago. But their users still have to deal with seemingly basic issues:
- The agent forgets something you explained to it yesterday.
- It doesn't learn from things your team-mate has explained to it.
- The agent gets stuck down a rabbit hole; it gets itself out again, but the next day it takes the same wrong path, burning credits along the way.
- You constantly need to write more into a central, ever-growing AGENTS.md file, which consumes tokens on every request; that's a poor fit for more esoteric pieces of information (see the sketch after this list).
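To make that last point concrete, here's a back-of-the-envelope calculation. The figures are illustrative assumptions, not measurements:

```python
# Rough overhead of a static, ever-growing AGENTS.md.
# Both figures below are assumed for illustration.
AGENTS_MD_TOKENS = 4_000   # size of the instructions file in tokens
REQUESTS_PER_DAY = 500     # requests the agent makes in a day

overhead = AGENTS_MD_TOKENS * REQUESTS_PER_DAY
print(f"{overhead:,} tokens/day re-reading the same file")
# -> 2,000,000 tokens/day, paid whether or not the esoteric advice is relevant
```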
This is because agents lack a memory system that serves their needs (and often have no memory system at all). Systems designed several years ago for explicit user/bot interaction don't work for long-running agents, especially ones that may never interact with a user directly.
Better prompts aren't enough
Prompt engineering remains an effective way of directing an agent on its task, but it isn't the same as learned knowledge and experience. A single static prompt can't describe everything an agent needs to know about every situation that might come up, in the same way that an instruction to an engineer about a problem can't contain their career's worth of learning, facts, and proficiency.
To solve this, agents need access to memory. Just as with humans working on a task, they need to be able to remember as they go: not just at the start of a problem, but as the problem space is explored and more information is learned, so that they can make connections that weren't possible initially. It's a similar phenomenon to our "shower thoughts"; often the connection can't be made until we have sufficient context available[^1].
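As a rough sketch of what "remembering as you go" might look like inside an agent loop (all names here are hypothetical, not a real API):

```python
def run_task(agent, memory, task: str) -> str:
    """Illustrative agent loop: memory is queried after every step,
    not just once up front, so newly gathered context can surface
    connections that weren't retrievable at the start."""
    context = memory.recall(query=task)
    while not agent.done():
        step = agent.next_step(task, context)
        # Re-query with what was just learned; the "shower thought"
        # only becomes reachable once this extra context exists.
        context += memory.recall(query=step.summary)
    return agent.result()
```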
What should this memory layer look like?
To solve these problems, agents need durable long-term memory. That memory needs to be shareable between different instances: local files on disk may help your coding agent, but they won't help your team-mate's, and many deployed agents aren't in a position to write files to disk at all. It should also span different backends; changing the model your agent uses shouldn't lobotomise its memory.
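A minimal sketch of what such an interface might look like, assuming a backend-agnostic store shared across a team (the names are ours for illustration, not Volary's actual API):

```python
from abc import ABC, abstractmethod

class MemoryStore(ABC):
    """Durable, shareable memory that outlives any single agent
    instance and any single model backend."""

    @abstractmethod
    def remember(self, team_id: str, content: str, tags: list[str]) -> None:
        """Persist a learning so any agent on the team can use it,
        whether it runs on your machine, a team-mate's, or a server."""

    @abstractmethod
    def recall(self, team_id: str, query: str, limit: int = 5) -> list[str]:
        """Retrieve the most relevant stored learnings for the task
        at hand; nothing here depends on which model is in use."""
```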
Current thinking holds that there are many different kinds of memory, each useful in different ways. We'll discuss this more in a future article; for now, suffice it to say that simply remembering and recalling a bunch of RAG-style facts isn't enough to truly guide an agent.
As well as being an open area of research, this is a challenging systems-engineering problem. The memory system needs to monitor agent transcripts, process them asynchronously (accounting for parallel activity from the same agent), and surface concise, useful learnings for future runs.
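Here's a sketch of that pipeline under stated assumptions: the event shape and the distillation step are ours for illustration, and a real system would use an LLM to distil transcripts rather than the stub below.

```python
import asyncio
from collections import defaultdict

async def consolidate(events: asyncio.Queue, store) -> None:
    """Background consolidator: drains transcript events off the
    agent's critical path, groups them by run so parallel activity
    from the same agent doesn't interleave, then writes distilled
    learnings back to the shared store for future runs."""
    runs: dict[str, list[dict]] = defaultdict(list)
    while True:
        event = await events.get()
        runs[event["run_id"]].append(event)
        if event.get("final"):  # run finished: distil and persist
            learning = distil(runs.pop(event["run_id"]))
            store.remember(event["team_id"], learning, tags=["auto"])

def distil(transcript: list[dict]) -> str:
    """Placeholder; a real implementation would summarise the
    transcript into a concise, reusable learning."""
    return f"learning distilled from {len(transcript)} events"
```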
At Volary, we've built this memory layer. We're exploring the future of what this should look like for agents, and how that will work for teams. Join the discussion in our Slack community.
This is the first of a series of articles exploring Volary and the future of AI memory. The second article takes a deeper dive into different kinds of memory and how we treat them.
[^1]: With humans, organising memories takes time, and much of that work happens as we sleep, which is one reason ideas might strike us in the shower. Agents don't sleep or shower, but there is value in an asynchronous process that organises memories; we'll come back to this in a later article.
