Guide April 18, 2026 · 18 mins · The D23 Team

Agent Memory Patterns: Persistent Context Across Long-Running Workflows

Learn agent memory patterns for persistent context in long-running workflows. Explore storage, retrieval, and architectural best practices for production AI systems.

Understanding Agent Memory: Why Context Persistence Matters

When you deploy an AI agent to solve complex, multi-step problems—whether it’s querying your data warehouse, generating dashboards, or orchestrating analytics workflows—the agent needs to remember what it has already learned, what decisions it has made, and what context it discovered along the way. Without persistent memory, each step starts from scratch. The agent forgets the user’s intent, loses track of intermediate results, and cannot build on previous discoveries.

This is where agent memory patterns become critical. Unlike traditional stateless APIs that process a single request and discard context, long-running agents operate across multiple interactions, sometimes spanning hours or days. They need to maintain a coherent understanding of the problem domain, user preferences, data schemas, and the results of prior computations.

Think of it like a data analyst working on a complex project. On day one, they explore the schema, run test queries, and document findings. On day two, they build on that knowledge to create dashboards. Without notes, they’d have to re-explore everything. Agent memory is that notebook—a persistent store that lets the agent pick up where it left off, avoid redundant work, and make better decisions because it understands the full context.

Research from the AI agent community, including work on AI Agents Need Memory Control Over More Context, demonstrates that bounded, controllable memory systems are essential for agents operating beyond the limits of a single context window. Modern approaches like OpenSearch as an Agentic Memory Solution show how to architect memory systems that scale with agent complexity and interaction volume.

The Core Challenge: Context Windows and Long-Lived Workflows

Large language models (LLMs) that power modern AI agents operate within fixed context windows—typically 4K, 8K, 32K, 100K, or 200K tokens depending on the model. A token is roughly a word or fraction thereof. Even with the largest models, a long-running workflow can accumulate far more context than the window can hold at once.

Consider a practical scenario: An AI agent embedded in your analytics platform needs to:

  • Understand your entire data schema (thousands of columns across dozens of tables)
  • Remember previous queries the user has run
  • Track which tables are slow, which are unreliable, and which need special handling
  • Maintain the user’s preferences (metric definitions, business logic, forbidden queries)
  • Keep notes on intermediate results and derived insights
  • Learn from past mistakes (bad joins, incorrect aggregations)

If you try to stuff all of this into the context window alongside the current user request, you’ll quickly hit the limit. The agent will either lose critical information, respond slowly due to token overhead, or fail entirely.

Persistent memory solves this by externalizing context. Instead of keeping everything in the LLM’s context, you store structured information in a database or vector store, and the agent retrieves only what it needs for the current task. This pattern is discussed extensively in Building Effective AI Agents: Architecture Patterns and Implementation Frameworks, which outlines how to manage state beyond context windows in dynamic workflows.

Types of Agent Memory: Categorizing Storage Patterns

Agent memory is not monolithic. Different types of information require different storage and retrieval strategies. Understanding these categories helps you design systems that are both performant and maintainable.

Working Memory: The Agent’s Scratch Pad

Working memory is ephemeral context used within a single conversation or task. It includes:

  • The current user request and its variations
  • Intermediate results from tool calls
  • Reasoning steps and decision trees
  • Temporary variables and state
  • The conversation history for the current session

Working memory typically lives in the LLM’s context window or in a short-lived session store (Redis, in-memory cache, or a database with a TTL). It’s fast to access, small enough to fit in context, and discarded when the task completes or the session ends.

Example: An agent is generating a dashboard. The user asks, “Show me revenue by region for Q4.” The agent retrieves the revenue table schema (stored elsewhere), notes that it needs to filter by quarter, and keeps the current request in working memory while it constructs the SQL. Once the dashboard is generated, that working memory is no longer needed.
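The working-memory pattern above can be sketched with a small TTL store. In production this would typically be Redis (which has native key expiry) or another session cache; here a plain dict with timestamps illustrates the idea. The `WorkingMemory` class and its keys are illustrative, not a real API.

```python
# Minimal sketch of a working-memory store with TTL expiry, assuming lazy
# eviction on read is acceptable (a background sweeper is a common alternative).
import time

class WorkingMemory:
    """Ephemeral per-session scratch pad; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def set(self, key: str, value: object) -> None:
        self._store[key] = (time.monotonic(), value)

    def get(self, key: str, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired: evict lazily on read
            return default
        return value
```

Once the dashboard is generated, nothing needs to be cleaned up explicitly—the entries simply age out.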

Episodic Memory: Session and Conversation History

Episodic memory captures the sequence of events within a session—what the user asked, what the agent did, what the results were. This is essential for:

  • Answering follow-up questions (“Can you also show me last quarter?”)
  • Debugging agent behavior (“Why did the agent choose that table?”)
  • Learning from the session to improve future interactions
  • Compliance and audit trails

Episodic memory is typically stored in a database with a session or conversation ID. It persists longer than working memory (days or weeks) and is searchable. Many teams use PostgreSQL, MongoDB, or specialized conversation stores for this.

Example: A user asks the agent to create a KPI dashboard. The agent runs three queries, encounters an error on the second, corrects it, and succeeds. All of this—the queries, the error, the correction—is logged in episodic memory. If the user returns a week later and asks about that dashboard, the agent can retrieve the history and understand what was built and why.
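A minimal sketch of that episodic log, using SQLite as a stand-in for PostgreSQL: each event (request, query, error, correction) is appended with a session ID so a later session can retrieve the full history. The schema and function names are illustrative.

```python
# Episodic-memory log: append events per session, replayable in order.
import json
import sqlite3

def open_episodic_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            session_id TEXT NOT NULL,
            kind TEXT NOT NULL,        -- 'user_request', 'query', 'error', ...
            payload TEXT NOT NULL,     -- JSON blob with the details
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        )""")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_events_session ON events(session_id)")
    return conn

def log_event(conn, session_id: str, kind: str, payload: dict) -> None:
    conn.execute("INSERT INTO events (session_id, kind, payload) VALUES (?, ?, ?)",
                 (session_id, kind, json.dumps(payload)))

def session_history(conn, session_id: str) -> list[tuple[str, dict]]:
    rows = conn.execute(
        "SELECT kind, payload FROM events WHERE session_id = ? ORDER BY id",
        (session_id,))
    return [(kind, json.loads(payload)) for kind, payload in rows]
```

The index on `session_id` is what makes "what happened in that session last week?" a cheap lookup rather than a scan.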

Semantic Memory: Knowledge About the Domain

Semantic memory is stable, domain-specific knowledge that transcends individual sessions. It includes:

  • Data schema definitions and relationships
  • Business logic and metric definitions
  • Data quality notes and warnings
  • Historical insights and patterns
  • User roles and permissions
  • Standard query templates and best practices

Semantic memory is long-lived, shared across users, and rarely changes. It’s often stored in a vector database (for semantic search) or a relational database with indexing for fast retrieval. This is the kind of knowledge that makes an agent smarter over time.

Example: Your agent learns that the customers table has a signup_date field that is sometimes NULL for legacy accounts, and that queries should filter these out. This knowledge is stored in semantic memory. Every future agent instance benefits from this learning without having to rediscover it.
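The retrieval side of semantic memory can be sketched with similarity search. A real system would use a vector database and a learned embedding model; the bag-of-words "embedding" below only illustrates the lookup step, and all names are illustrative.

```python
# Toy semantic-memory search: rank stored notes by cosine similarity to a query.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: word counts as a sparse vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticMemory:
    def __init__(self):
        self._notes: list[tuple[Counter, str]] = []

    def add(self, note: str) -> None:
        self._notes.append((embed(note), note))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self._notes, key=lambda n: cosine(q, n[0]), reverse=True)
        return [note for _, note in ranked[:k]]
```

The point is the interface: the agent asks a question in its own words and gets back the closest stored knowledge, whether or not the wording matches exactly.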

Procedural Memory: How to Do Things

Procedural memory captures patterns, strategies, and methods that work. It includes:

  • Successful query patterns for common requests
  • Tool-use sequences that work well
  • Error recovery strategies
  • Optimization techniques

Procedural memory is often implicit in the agent’s behavior, but making it explicit—storing it as templates, examples, or decision rules—improves consistency and performance. Building Agentic AI Workflows with MCP, AgentCore, and Bedrock discusses how to structure tool definitions and memory systems to enable effective procedural learning.
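Making procedural memory explicit can be as simple as storing parameterized templates for patterns that have worked before. A hypothetical sketch (the template name and fields are invented for illustration):

```python
# Explicit procedural memory: successful query shapes stored as named
# templates the agent fills in, rather than regenerating SQL from scratch.
QUERY_TEMPLATES = {
    "metric_by_dimension": (
        "SELECT {dimension}, SUM({metric}) AS total "
        "FROM {table} WHERE {date_col} BETWEEN '{start}' AND '{end}' "
        "GROUP BY {dimension} ORDER BY total DESC"
    ),
}

def render_template(name: str, **fields) -> str:
    return QUERY_TEMPLATES[name].format(**fields)
```

A template library like this doubles as documentation of "the way we query revenue here," which improves consistency across agent runs.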

Storage Backends: Choosing the Right Infrastructure

Once you’ve categorized your memory, you need to store it somewhere. Different backends have different tradeoffs.

Relational Databases (PostgreSQL, MySQL)

Relational databases are reliable, queryable, and familiar. They’re excellent for:

  • Episodic memory (conversation logs, session history)
  • Structured semantic memory (schema definitions, business rules)
  • Audit trails and compliance

Tradeoffs: Relational queries are precise but can be slow for unstructured text search. You’ll need to add full-text search indexes or use an external search engine for semantic queries.

Vector Databases (Pinecone, Weaviate, Milvus)

Vector databases excel at semantic search—finding similar concepts even if the exact words don’t match. They’re ideal for:

  • Semantic memory (finding relevant past queries, insights, or patterns)
  • Similarity-based retrieval (“Find insights similar to this user’s question”)
  • Deduplication and clustering of knowledge

Tradeoffs: Vector databases require embedding your text (converting it to numerical vectors), which adds latency and cost. They’re less precise for structured queries and harder to update without re-embedding.

Graph Databases (Neo4j)

Graph databases are powerful for representing relationships—between tables, metrics, users, and concepts. They’re useful for:

  • Data lineage and dependency tracking
  • Understanding how metrics relate to underlying data
  • Permission and role hierarchies
  • Recommendation systems (“Users who asked this also asked that”)

Tradeoffs: Graph databases require careful schema design and can be overkill for simple use cases. Query performance depends heavily on graph traversal patterns.

Search Engines (OpenSearch, Elasticsearch)

Search engines combine the best of relational and vector approaches. As noted in OpenSearch as an Agentic Memory Solution, modern search platforms like OpenSearch 3.3 provide native support for agentic memory systems, enabling both full-text and semantic search alongside structured metadata.

Tradeoffs: Search engines require operational overhead (indexing, cluster management) but provide excellent flexibility for hybrid queries.

In-Memory Caches (Redis, Memcached)

For working memory and frequently accessed data, in-memory caches are unbeatable. They’re fast, simple, and ideal for:

  • Session state and working memory
  • Frequently accessed semantic memory (hot data)
  • Rate limiting and quotas

Tradeoffs: In-memory caches are volatile (data is lost on restart) and limited by RAM. They require careful eviction policies to avoid memory bloat.

Architectural Patterns: How to Structure Agent Memory Systems

With storage backends in mind, let’s look at common architectural patterns for agent memory. These patterns determine how your agent stores, retrieves, and updates memory.

The Retrieval-Augmented Generation (RAG) Pattern

RAG is the most common pattern for adding persistent memory to agents. The flow is:

  1. User makes a request
  2. Agent retrieves relevant context from memory (semantic search, keyword lookup, etc.)
  3. Agent augments the user request with retrieved context
  4. Agent runs the LLM with the augmented prompt
  5. LLM generates a response informed by persistent memory
  6. Agent stores new information back to memory

This pattern is straightforward and works well for read-heavy workloads (analytics queries, dashboard generation). The challenge is deciding what to retrieve and ensuring you retrieve enough context without overwhelming the LLM.

Example in D23: When a user asks your embedded analytics agent to “Show me revenue by region,” the agent retrieves the relevant schema definitions, past queries about revenue, and business logic around revenue calculations. It augments the request with this context, generates a better SQL query, and stores the new query in episodic memory for future reference.
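The six RAG steps can be sketched end to end. The retrieval and generation functions are stubs: a real system would call a vector store and an LLM; here they are plain functions so the control flow is visible. All names are illustrative.

```python
# The RAG loop: retrieve -> augment -> generate -> store back to memory.
def keyword_retrieve(request: str, memory: list[str], k: int = 2) -> list[str]:
    # Stand-in for semantic search: rank memory by word overlap with the request.
    words = set(request.lower().split())
    scored = sorted(memory,
                    key=lambda m: len(words & set(m.lower().split())),
                    reverse=True)
    return scored[:k]

def rag_step(request: str, memory: list[str], generate) -> str:
    context = keyword_retrieve(request, memory)   # 2. retrieve relevant context
    prompt = "\n".join(context + [request])       # 3. augment the request
    response = generate(prompt)                   # 4-5. run the LLM (stubbed)
    memory.append(f"{request} -> {response}")     # 6. store back to memory
    return response
```

Swap `keyword_retrieve` for a vector-store query and `generate` for a model call, and the skeleton is the same one used in production RAG agents.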

The Memory Update Pattern

RAG assumes memory is read-only within a task. But agents often need to update memory as they learn. The memory update pattern adds explicit write operations:

  1. Agent performs an action (runs a query, encounters an error, discovers a pattern)
  2. Agent decides whether to update memory (explicit decision logic)
  3. Agent writes to memory (structured update, not just appending)
  4. Future agent instances benefit from the update

This requires careful design to avoid memory corruption and inconsistency. You need:

  • Versioning: Track which version of semantic memory is current
  • Conflict resolution: Handle cases where two agents update the same memory simultaneously
  • Validation: Ensure updates are accurate before storing
  • Rollback: Ability to revert bad updates
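Those four requirements can be sketched together. Each key holds an append-only version history (enabling rollback), updates are validated before storing, and an optimistic check on the expected version is a simple conflict-resolution strategy when two agents race. The class and validation rule are illustrative.

```python
# Memory-update pattern: versioned, validated writes with rollback.
class VersionedMemory:
    def __init__(self):
        self._versions: dict[str, list[str]] = {}

    def current(self, key: str):
        """Return (version, value); version 0 means no entry yet."""
        history = self._versions.get(key, [])
        return len(history), (history[-1] if history else None)

    def update(self, key: str, value: str, expected_version: int) -> bool:
        version, _ = self.current(key)
        if version != expected_version:   # another agent wrote first: reject
            return False
        if not value.strip():             # validation: no empty knowledge
            return False
        self._versions.setdefault(key, []).append(value)
        return True

    def rollback(self, key: str) -> None:
        if self._versions.get(key):
            self._versions[key].pop()     # revert to the previous version
```

The losing writer in a conflict can re-read, merge, and retry—the same compare-and-set discipline used in optimistic database concurrency.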

The Hierarchical Memory Pattern

Not all memory is equally important. Hierarchical memory organizes information by relevance and specificity:

  • Level 1 (Hot): Most relevant, frequently accessed memory in fast storage (cache)
  • Level 2 (Warm): Moderately relevant, in primary storage (database or vector store)
  • Level 3 (Cold): Rarely accessed, in archival storage (data warehouse, S3)

The agent retrieves from Level 1 first, falls back to Level 2 if needed, and rarely accesses Level 3. This balances performance with cost.

Example: Current session history is in cache (hot). Schema definitions and recent insights are in a vector database (warm). Historical conversation logs from six months ago are in a data warehouse (cold).
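The tiered lookup reduces to a fall-through: check the hot cache first, then warm storage, then the cold archive, promoting hits into the hot tier. The plain dicts below stand in for Redis, a vector database, and a data warehouse respectively.

```python
# Hierarchical memory lookup with promotion on access.
def tiered_get(key, hot: dict, warm: dict, cold: dict):
    if key in hot:
        return hot[key]
    for tier in (warm, cold):
        if key in tier:
            hot[key] = tier[key]   # promote to the hot tier on access
            return tier[key]
    return None
```

Promotion is what keeps the hot tier aligned with actual access patterns; pair it with an eviction policy (see the bounded context pattern) so the hot tier stays small.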

The Transcript Replay Pattern

Instead of storing only the final state, transcript replay stores the full sequence of events and reconstructs state on demand. This is inspired by event sourcing and is discussed in AI Agents Need Memory Control Over More Context.

  1. Agent stores every action and result (transcript)
  2. When needed, replay the transcript to reconstruct state
  3. Summarize or compress the transcript to avoid infinite growth

This pattern is robust (you never lose information) but can be slow (replaying a long transcript takes time). It’s often combined with periodic snapshots: store the full transcript, but also store compressed summaries at intervals.
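The replay step is a fold over the event log: state is whatever you compute from the transcript, so it can always be reconstructed. The event types and state fields below are invented for illustration.

```python
# Event-sourcing sketch: reconstruct agent state by replaying the transcript.
def replay(transcript: list[dict]) -> dict:
    state = {"queries_run": 0, "errors": 0, "notes": []}
    for event in transcript:
        if event["type"] == "query":
            state["queries_run"] += 1
        elif event["type"] == "error":
            state["errors"] += 1
        elif event["type"] == "note":
            state["notes"].append(event["text"])
    return state
```

A periodic snapshot is just a saved `state` plus the index it covers, letting replay start from the snapshot instead of event zero.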

The Bounded Context Pattern

Long-running workflows can accumulate unbounded memory. The bounded context pattern explicitly limits memory size:

  1. Define a maximum memory size (tokens, bytes, or entries)
  2. When memory exceeds the limit, apply a compression or eviction policy
  3. Common policies: oldest-first (FIFO), least-recently-used (LRU), or importance-weighted

This prevents memory bloat but requires careful policy design to avoid losing critical information. Research on AI Agents Need Memory Control Over More Context explores bounded memory systems for exactly this reason.
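An LRU-evicting bounded memory is a few lines on top of `OrderedDict`. FIFO or importance-weighted policies would swap the eviction rule; the explicit cap is the point of the pattern.

```python
# Bounded-context sketch: memory capped at max_entries with LRU eviction.
from collections import OrderedDict

class BoundedMemory:
    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self._data: OrderedDict[str, str] = OrderedDict()

    def put(self, key: str, value: str) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)   # evict least recently used

    def get(self, key: str):
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # touching an entry refreshes it
        return self._data[key]
```

Making `max_entries` a visible, tunable parameter (rather than a constant buried in code) is half the value of the pattern.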

Implementation Considerations: Making Memory Systems Production-Ready

Moving from theory to practice requires attention to reliability, security, and performance.

Consistency and Coherence

When multiple agent instances or users interact with the same memory, consistency becomes critical. You need to decide:

  • Strong consistency: All agents see the same memory state (slower, more reliable)
  • Eventual consistency: Agents may temporarily see stale memory (faster, risk of divergence)
  • Read-your-writes: Agents see their own updates immediately (middle ground)

For analytics workflows, eventual consistency is often acceptable (a dashboard built from slightly stale schema knowledge is still useful). But for critical business logic, strong consistency may be necessary.

Security and Access Control

Memory can contain sensitive information: query results, user preferences, data schema details. You need:

  • Encryption at rest: Memory stored on disk should be encrypted
  • Encryption in transit: Memory retrieved over the network should be encrypted
  • Access control: Only authorized agents and users can read/write specific memory
  • Audit logging: Track who accessed what memory and when

The OWASP Top 10 for Agentic Applications 2026 highlights memory and context poisoning as a critical risk in agentic systems. An attacker who can corrupt memory can cause agents to make bad decisions. Implement validation, versioning, and audit trails to mitigate this.

Latency and Performance

Memory retrieval adds latency to every agent action. Strategies to minimize impact:

  • Caching: Cache frequently accessed memory in fast storage
  • Batch retrieval: Fetch multiple memory items in one query
  • Async retrieval: Fetch memory in parallel with other operations
  • Compression: Store compressed memory to reduce transfer time
  • Indexing: Build indexes on memory for fast lookup

For embedded analytics in D23, latency matters. Users expect dashboards to load quickly. Memory retrieval should be sub-second, which means careful indexing and caching.

Cost Management

Persistent memory systems have costs: storage, retrieval queries, vector embeddings, and infrastructure. To manage costs:

  • Tiering: Store hot data in fast (expensive) systems, cold data in slow (cheap) systems
  • Compression: Summarize old memory to reduce storage
  • Selective storage: Not everything needs to be stored (working memory, for instance, can be discarded)
  • Deduplication: Avoid storing the same information multiple times

Vector embeddings are particularly expensive (they require API calls or local compute). Consider whether every piece of memory needs to be embedded, or if keyword search is sufficient for some data.
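Deduplication before embedding is a cheap win: hash each normalized entry and skip anything already stored, so the same insight is never embedded (or billed) twice. A minimal sketch, assuming case and surrounding whitespace are the only normalization needed:

```python
# Content-hash deduplication: drop repeated memory entries before embedding.
import hashlib

def dedupe(entries: list[str]) -> list[str]:
    seen: set[str] = set()
    unique = []
    for entry in entries:
        digest = hashlib.sha256(entry.strip().lower().encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(entry)
    return unique
```

Near-duplicate detection (same insight, different wording) needs similarity search instead, which is where the vector store itself can help.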

Advanced Patterns: Multi-Agent and Distributed Memory

As your system scales, you’ll encounter challenges with multiple agents, multiple users, and distributed systems.

Shared Memory for Multi-Agent Coordination

When multiple agents work on the same problem, they need to coordinate. Shared memory enables this:

  • Agent A explores the schema and stores findings in shared memory
  • Agent B reads Agent A’s findings and continues from there
  • Agent C validates the work and updates shared memory with quality notes

This requires strong consistency (all agents must see the same view) and careful conflict resolution. Tools like Memory in LangGraph provide patterns for multi-actor agent workflows with persistent state.

Hierarchical and Federated Memory

In large organizations, you might have:

  • Local memory: Each team/agent maintains its own memory
  • Shared memory: Teams share common knowledge (schema, metrics, best practices)
  • Global memory: Organization-wide insights and standards

This requires federation—a way to query and update memory across boundaries while maintaining security and consistency. It’s complex but necessary for scaling agent systems across large organizations.

Memory Compression and Summarization

Long-running workflows generate enormous amounts of memory. Compression keeps it manageable:

  • Summarization: Compress episodic memory by summarizing conversations
  • Clustering: Group similar insights and store only the representative ones
  • Archival: Move old memory to cold storage
  • Forgetting: Explicitly delete memory that’s no longer relevant

The challenge is doing this without losing critical information. LLMs are good at summarization, but they can miss nuances. Hybrid approaches (LLM summarization + human review for critical memory) are common in production systems.

Real-World Example: Agent Memory in Analytics Workflows

Let’s walk through a concrete example of agent memory in an analytics context, relevant to how D23 and similar platforms operate.

Scenario: A startup uses an embedded analytics agent to help non-technical users explore their product data. The agent needs to:

  1. Understand the data schema (100+ tables, 1000+ columns)
  2. Remember user preferences (which metrics matter, which tables are slow)
  3. Track previous queries and insights
  4. Learn from mistakes (bad joins, incorrect aggregations)
  5. Maintain conversation context across sessions

Memory Architecture:

  • Working Memory (Redis): Current session state, active conversation, intermediate query results. TTL: 1 hour.
  • Episodic Memory (PostgreSQL): Conversation logs, query history, results. Retention: 1 year. Indexed by user, session, and date.
  • Semantic Memory (Vector Database): Schema definitions, metric definitions, business logic, past insights. Updated when users correct the agent or when admins add new knowledge.
  • Procedural Memory (LLM Context): Embedded in the agent’s system prompt and fine-tuning—patterns for common queries, error recovery strategies.

Workflow:

  1. User asks: “Show me revenue by customer segment for this month.”
  2. Agent retrieves working memory (current session context)
  3. Agent searches semantic memory for:
    • Revenue table definition and relationships
    • Customer segment metric definition
    • Any warnings about this table (e.g., “includes test customers, filter them out”)
  4. Agent searches episodic memory for similar past queries
  5. Agent generates SQL, augmented with retrieved context
  6. Agent runs the query and gets results
  7. Agent stores the new query in episodic memory and notes any new insights in semantic memory
  8. Agent returns results to user

If the user returns a week later:

  1. User asks: “Can you show me the same thing for last month?”
  2. Agent retrieves episodic memory and finds the previous query
  3. Agent modifies the query (change date filter) rather than regenerating from scratch
  4. Agent is faster and more consistent because it has memory

This example shows how agent memory reduces latency, improves consistency, and enables learning. For platforms like D23 that embed analytics into products, this matters enormously—every second of latency affects user experience.

Best Practices and Lessons Learned

Based on research and real-world deployment, here are key best practices:

Start Simple, Evolve Gradually

Don’t build a complex multi-tier memory system from day one. Start with episodic memory (conversation logs) and working memory (session state). Add semantic memory and vector search once you understand your use case. Add hierarchical tiering and compression once you hit performance or cost problems.

Make Memory Explicit and Queryable

Memory should be inspectable. Users and developers should be able to ask:

  • “What does the agent know about this table?”
  • “Why did the agent make that decision?”
  • “What has the agent learned from past queries?”

This requires storing memory in queryable formats (not just embeddings) and providing tools to inspect it.

Version and Validate Memory Updates

When agents update memory, validate the update and track versions. This enables rollback if something goes wrong and helps debug agent behavior.

Monitor Memory Quality

Memory can degrade over time. Implement monitoring to catch:

  • Stale or outdated information
  • Contradictions (conflicting information in memory)
  • Poisoning (corrupted or malicious updates)

Use Bounded Memory with Explicit Policies

Set maximum memory sizes and define eviction policies. Make these policies explicit and tunable, not hidden in code.

Combine Multiple Memory Types

Rare is the system that uses only one memory type. Most production systems combine episodic (conversation logs), semantic (knowledge base), and working memory (session state) in a carefully orchestrated way.

Integration with Modern AI Frameworks

If you’re building agents, you likely use frameworks like LangChain, LlamaIndex, or Anthropic’s APIs. These frameworks increasingly support memory patterns:

Agent Capabilities from Anthropic covers building agents with memory systems for maintaining context and state across extended interactions. Similarly, Memory in LangGraph provides specific patterns for persistent state in multi-actor workflows.

When evaluating frameworks, check:

  • Does it support custom memory backends?
  • Can you retrieve memory from external systems?
  • Does it handle memory versioning and conflict resolution?
  • Is memory queryable and inspectable?

Frameworks that treat memory as a first-class citizen (not an afterthought) are easier to scale and debug.

Conclusion: Memory as Infrastructure

Agent memory is not optional—it’s foundational infrastructure for long-running, intelligent systems. Whether you’re building autonomous analytics agents, embedding BI into your product, or orchestrating complex data workflows, you need to think carefully about how your agents store, retrieve, and update context.

The patterns discussed here—RAG, memory updates, hierarchical storage, bounded contexts, and multi-agent coordination—are proven approaches used in production systems. Start with the simplest pattern that solves your problem, measure performance and costs, and evolve as you learn.

For organizations using platforms like D23, which manages Apache Superset with AI and API-first design, understanding these patterns helps you architect better analytics workflows. When you embed self-serve BI or AI-powered analytics into your product, persistent agent memory becomes the difference between a system that learns and improves and one that starts from scratch every time.

The research community continues to advance memory systems for agents—Agent-Memory-Paper-List on GitHub provides a curated list of papers on this topic if you want to dive deeper. As these systems mature, they’ll enable agents to solve increasingly complex problems, maintain richer context, and operate more reliably in production environments.

The key insight: in a world of bounded context windows and long-running workflows, memory is not a luxury—it’s the engine that makes intelligent agents practical and reliable.