When you're building pipelines
Entity resolution by inference. Corrections that don't stick. Memory regressions you absorb because the architecture won't.
One of three operational modes of the same person: someone building an operating system for their own AI agents. The other two are operating mode and infrastructure debugging mode.
You enter this mode when you're wiring together multi-step agent pipelines. You ship agents that work in demos. In production, entity resolution drifts, memory regresses, and when something goes wrong you can't trace why. Half your engineering effort goes toward compensating for a memory layer that doesn't hold its shape. You're an inference babysitter. Neotoma puts you on solid ground.
Escaping: Inference babysitter — absorbing variance the architecture doesn't handle
Into: Builder who ships on solid ground
Tax you pay: Prompt engineering workarounds, dedup hacks, memory regression fixes
What you get back: Product velocity, shipping confidence, roadmap ambition
Same question, different outcome
Without a state layer, agents return stale or wrong data. With Neotoma, every response reads from versioned, schema-bound state.
Agent starts from zero every session
The agent accumulated context across a multi-turn workflow, then the session ended. Next session, everything was gone: no accumulated facts, no entity history, no continuity.
Two agents, conflicting state
A research agent and a writing agent both updated the same entity. Without versioned writes, one silently overwrote the other, and the final output mixed stale and current data.
Can't trace output back to source
An orchestration pipeline produced an incorrect recommendation. Without provenance links, the team couldn't determine which step introduced the error or what data it was based on.
Duplicate entities, divergent state
Multiple agent sessions created separate records for the same real-world entity. Without canonical resolution, downstream agents reasoned over duplicates with conflicting attributes.
Why this happens
Failure modes without a memory guarantee
Agents forget between sessions
Token-based and conversation-only memory resets every session. Agents can't accumulate facts, track entity evolution, or reference decisions from prior runs. Each session starts from scratch or depends on brittle prompt injection.
No provenance means no trust
When an agent produces an output, there's no way to trace it back to the source data that informed it. Debugging, evaluation, and compliance all require knowing why the agent said what it said, and most memory systems can't answer that.
State mutates silently across pipeline steps
In multi-step orchestration, one agent's write can silently overwrite another's. Without versioned, hash-based state evolution, pipelines produce non-reproducible results that can't be replayed or audited.
Client data in memory services you don't control
Agent workflows route customer data through external memory APIs with no visibility into storage location, access controls, or retention policies. When a client asks "where is my data stored?", most agent architectures can't answer.
Memory locked to one framework
LangChain memory doesn't port to CrewAI. CrewAI state doesn't port to custom orchestration. Every framework ships its own memory layer with its own API, its own storage format, and no interoperability. Switching frameworks means starting state management over.
AI needs
What you need from your AI tools, and what current tools don't provide.
- Cross-session, cross-agent state that persists beyond token windows
- Deterministic memory: same input, same output; no silent mutation
- Full provenance linking every agent output to its source facts
- Structured entity resolution so agents reason over canonical data, not duplicates
- Audit trail for eval, debugging, and compliance across pipeline steps
How Neotoma solves this
Neotoma removes the tax your team pays compensating for unreliable memory. Entities resolve once and persist. State evolves through versioned, auditable transitions. Every fact traces to provenance. Cross-session state with full audit trail.
Deterministic, versioned memory substrate
Every state transition is content-addressed and versioned. Same input always produces the same output. No silent mutation; agents and pipelines can be replayed and audited.
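A minimal sketch of what content-addressed, versioned state evolution looks like in principle. This is illustrative Python, not Neotoma's actual API; `VersionedStore`, `commit`, and the field names are hypothetical.

```python
import hashlib
import json

def content_hash(state: dict) -> str:
    # Deterministic hash: the same state dict always yields the same ID,
    # because keys are sorted before serialization.
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

class VersionedStore:
    """Append-only store: every transition becomes a new, hash-addressed version."""

    def __init__(self):
        self.versions = {}   # hash -> snapshot
        self.head = None

    def commit(self, state: dict) -> str:
        # Each version links to its parent, so history can be replayed and audited.
        snapshot = {"state": state, "parent": self.head}
        h = content_hash(snapshot)
        self.versions[h] = snapshot
        self.head = h
        return h

store = VersionedStore()
v1 = store.commit({"entity": "acme", "status": "prospect"})
v2 = store.commit({"entity": "acme", "status": "customer"})
# Nothing is overwritten: v1 is still retrievable, and v2 records its parent.
assert store.versions[v2]["parent"] == v1
```

The design choice this illustrates: because versions are addressed by content hash rather than mutated in place, "same input, same output" falls out for free, and a silent overwrite is structurally impossible.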
Full provenance and audit trail
Every entity, relationship, and fact links back to the observation that created it. Query "where did this come from?" and get a traceable chain from output to source.
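The provenance chain can be pictured as parent links walked backward from an output. A sketch under stated assumptions: the record kinds and `parent` field below are hypothetical, not Neotoma's schema.

```python
# Illustrative provenance chain: each derived record points at the record it
# came from, terminating at a raw observation. (Sketch only, not a real schema.)
records = {
    "obs-1":  {"kind": "observation", "source": "crm_export.csv", "parent": None},
    "fact-1": {"kind": "fact", "value": "acme is a customer", "parent": "obs-1"},
    "out-1":  {"kind": "agent_output", "value": "recommend renewal", "parent": "fact-1"},
}

def trace(record_id: str) -> list[str]:
    """Walk parent links from an agent output back to its originating observation."""
    chain = []
    while record_id is not None:
        chain.append(record_id)
        record_id = records[record_id]["parent"]
    return chain

# "Where did this come from?" resolves to output -> fact -> observation.
assert trace("out-1") == ["out-1", "fact-1", "obs-1"]
```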
MCP-native, MIT-licensed, no lock-in
MIT-licensed. No token, no vendor lock-in, no proprietary memory format. Your state is yours: stored locally, accessible via MCP from any compatible tool, portable across frameworks by design.
Cross-session entity resolution
Agents accumulate facts across sessions without duplication. Entity resolution ensures canonical IDs, typed relationships, and timelines that survive agent restarts, handoffs, and pipeline re-runs.
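To make the dedup claim concrete, here is a toy resolver that maps session-local mentions to one canonical ID and merges facts onto it. Real entity resolution uses far richer matching than the key normalization shown here; everything below is an illustrative assumption, not Neotoma's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class EntityIndex:
    """Toy canonical-entity index: one ID per real-world entity, merged facts."""
    canonical: dict = field(default_factory=dict)   # normalized key -> canonical id
    facts: dict = field(default_factory=dict)       # canonical id -> merged attributes

    def resolve(self, name: str) -> str:
        # Naive normalization stands in for real matching rules.
        key = name.strip().lower()
        if key not in self.canonical:
            self.canonical[key] = f"ent-{len(self.canonical) + 1}"
        return self.canonical[key]

    def record(self, name: str, **attrs) -> str:
        # Facts written in separate sessions land on the same canonical entity.
        eid = self.resolve(name)
        self.facts.setdefault(eid, {}).update(attrs)
        return eid

idx = EntityIndex()
a = idx.record("Acme Corp", industry="manufacturing")   # session 1
b = idx.record("acme corp", status="customer")          # session 2
assert a == b
assert idx.facts[a] == {"industry": "manufacturing", "status": "customer"}
```

The point of the sketch: once writes resolve to a canonical ID before they land, downstream agents see one entity with a merged timeline instead of two divergent duplicates.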
What actually changes
You stop compensating for memory and start building on top of it. New features compound instead of regressing. You add a capability and it works across sessions because the state it depends on persists.
You ship to more users and the entity graph gets richer, not messier, because schema constraints and merge rules handle what used to be manual cleanup. Your roadmap shifts from memory regression fixes to new capabilities.
A customer reports an issue and you trace it to a specific observation in thirty seconds. You start trusting your own system enough to build ambitiously on it.
How is this different from what you're already using?
Framework-native memory
LangChain, CrewAI, and custom frameworks each ship their own memory abstraction. None are portable. None version state. None provide provenance. Switching frameworks means starting state management over.
RAG / vector retrieval
Retrieval finds things at query time by similarity. It doesn't persist canonical entities, maintain provenance, or guarantee the same result twice. Retrieval and a truth layer solve different problems and will coexist.
Provider-hosted memory
ChatGPT memory, Claude memory: conversation-scoped, provider-bound, non-deterministic. No cross-platform access, no correction mechanism, no audit trail.
Retrieval and state are different paradigms, not a feature gap. Embedding-based search and agentic search both optimize for flexible, on-demand access. A truth layer optimizes for consistency and verifiability. If your agents need to reason over canonical entities across sessions, not just find relevant context within one, you need a state layer underneath the retrieval.
Data types for durable memory
The entity types you'll store most often.
agent_session
Session state, context windows, and accumulated facts across agent runs
action
Agent actions with inputs, outputs, and provenance links
pipeline
Multi-step orchestration workflows with step-level audit
evaluation
Eval results, benchmarks, and regression tracking
audit_event
Immutable log of state transitions and entity mutations
tool_config
Agent tool configurations, MCP server bindings, and runtime parameters
entity_graph
Resolved entities with typed relationships and temporal evolution
runbook
Operational procedures and agent behavioral rules
When you don't need this
For single-session, stateless agent tasks (one-shot summarization, code generation, document Q&A), retrieval is sufficient and simpler. Neotoma is for agents that accumulate facts across sessions, resolve entities, track commitments, and need to explain their reasoning after the fact.
Other modes
The same person operates in multiple modes. The tax differs; the architecture that removes it is the same.
In building mode, the tax is prompt engineering workarounds, dedup hacks, and memory regression fixes. Neotoma removes that tax and gives you a deterministic, provenance-backed substrate to build on. Same architecture removes the tax in every mode.
Built because retrieval kept re-inferring entities that should have been resolved once and persisted.
Deep dive: Why agent memory needs more than RAG