When you're building pipelines
Your agent guesses entities every session. Corrections don't persist. Memory regressions ship because the architecture can't prevent them.
You wire together multi-step agent pipelines. They work in demos. In production, entities drift, memory regresses, and when something goes wrong you can't trace why. Half your effort goes toward compensating for a memory layer that doesn't hold its shape. Neotoma puts you on solid ground.
Escaping
Babysitting inference - absorbing variance the architecture doesn't handle
Into
Builder who ships on solid ground
Tax you pay
Prompt workarounds, dedup hacks, memory regression fixes
What you get back
Product velocity, shipping confidence, roadmap ambition
Same question, different outcome
Without a state layer, agents return stale or wrong data. With Neotoma, every response reads from versioned, schema-bound state.
Agent starts from zero every session
The agent built up context over a multi-turn workflow, then the session ended. Next session, everything was gone - no accumulated facts, no continuity.
Two agents, conflicting state
A research agent and a writing agent both updated the same record. One silently overwrote the other, and the final output mixed stale and current data.
Can't trace output back to source
A pipeline produced an incorrect recommendation. Without provenance, there was no way to determine which step introduced the error or what data it was based on.
Duplicate records, divergent state
Multiple sessions created separate records for the same person. Downstream agents reasoned over duplicates with conflicting details.
Why this happens
Failure modes without a memory guarantee
Agents forget between sessions
Conversation-scoped memory resets every session. Agents can't accumulate facts, track how records evolve, or reference decisions from prior runs. Each session starts from scratch.
No way to trace why the agent said what it said
When an agent produces an output, there's no link back to the data that informed it. Debugging, evaluation, and compliance all require knowing why - and most memory systems can't answer that.
State changes silently across pipeline steps
In multi-step pipelines, one agent's write can silently overwrite another's. Without versioned state, pipelines produce results that can't be replayed or compared.
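The overwrite is easy to reproduce with nothing more exotic than a shared dict and two read-then-write-back agents. This is a toy sketch, not Neotoma or any framework's API; all names are hypothetical:

```python
# Two agents read the same snapshot of a shared record, then each writes a
# whole record back. Last write wins; the other update silently vanishes.
shared_state = {"acme_corp": {"status": "prospect"}}

before = dict(shared_state["acme_corp"])           # both agents start from here

research_update = {**before, "employees": 250}     # research agent enriches
writing_update = {**before, "status": "customer"}  # writing agent updates status

shared_state["acme_corp"] = research_update
shared_state["acme_corp"] = writing_update         # discards "employees", no error raised

assert "employees" not in shared_state["acme_corp"]
```

Nothing failed loudly: the research agent's write is simply gone, and downstream steps read a record that mixes one agent's view with a stale snapshot.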
Data in memory services you can't audit
Agent workflows route data through external memory APIs with no visibility into storage, access controls, or retention. When someone asks "where is my data?", most agent setups can't answer.
Memory locked to one framework
LangChain memory doesn't port to CrewAI. CrewAI state doesn't port to custom orchestration. Every framework ships its own memory with its own API and no interoperability.
AI needs
What you need from your AI tools, and what current tools don't provide.
- Memory that persists across sessions and agents
- Same input, same output - no silent changes between runs
- Every agent output traces back to the facts it was based on
- Entity resolution so agents reason over canonical records, not duplicates
- Audit trail for debugging and compliance across pipeline steps
How Neotoma solves this
Neotoma removes the tax you pay compensating for unreliable memory. Entities resolve once and persist. Every fact traces back to its source. New features compound instead of regressing.
Persistent, versioned memory
Every state change is versioned. Same inputs always produce the same output. No silent overwrites - pipelines can be replayed and compared.
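The core idea can be sketched as an append-only log where writes never destroy prior versions and any old state can be read back. This is an illustrative minimal model, not Neotoma's actual storage or API:

```python
# Minimal append-only versioned store: every put() creates a new version,
# and get() can read state as of any earlier version for replay/comparison.
from dataclasses import dataclass, field

@dataclass
class VersionedStore:
    history: list = field(default_factory=list)  # append-only list of (key, value)

    def put(self, key, value):
        self.history.append((key, value))        # never overwrite in place
        return len(self.history) - 1             # version id of this write

    def get(self, key, at_version=None):
        # Latest value for key as of a given version (or the newest overall).
        end = len(self.history) if at_version is None else at_version + 1
        for k, v in reversed(self.history[:end]):
            if k == key:
                return v
        return None

store = VersionedStore()
v0 = store.put("acme:status", "prospect")
v1 = store.put("acme:status", "customer")

assert store.get("acme:status") == "customer"                   # current state
assert store.get("acme:status", at_version=v0) == "prospect"    # replay old state
```

Because writes append instead of overwriting, a pipeline run can be replayed against the exact state it saw, and two runs can be diffed version by version.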
Full provenance and audit trail
Every record and relationship links back to where it came from. Ask "where did this come from?" and get a traceable chain from output to source.
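A provenance chain is just a walk over source links from an output record back to an external source. The record shape and IDs below are hypothetical, shown only to make the idea concrete:

```python
# Each derived record carries a link to what it was derived from.
# Walking those links answers "where did this come from?" mechanically.
records = {
    "rec:summary-17": {"value": "Acme renewed", "derived_from": "rec:email-42"},
    "rec:email-42":   {"value": "raw email body", "derived_from": "src:imap-inbox"},
}

def provenance_chain(record_id):
    # Follow derived_from links until we leave the record store (an external source).
    chain = [record_id]
    while record_id in records:
        record_id = records[record_id]["derived_from"]
        chain.append(record_id)
    return chain

assert provenance_chain("rec:summary-17") == [
    "rec:summary-17", "rec:email-42", "src:imap-inbox",
]
```

The same walk serves debugging (which step introduced the error), evaluation (what data grounded this output), and compliance (what was this answer based on).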
Open source, no lock-in
MIT-licensed. No vendor lock-in, no proprietary memory format. Your data is yours - stored locally, accessible from any MCP-compatible tool, portable across frameworks.
Entity resolution across sessions
Agents accumulate facts across sessions without creating duplicates. Canonical IDs, typed relationships, and timelines survive restarts, handoffs, and pipeline re-runs.
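The anti-duplication behavior can be sketched as keying records on a normalized identity so new sessions enrich one canonical record instead of minting another. The normalization rule here (lowercase email) is a deliberately simplified stand-in:

```python
# Dedup via canonical IDs: sessions that mention the same person resolve to
# one record and merge their facts, instead of creating divergent duplicates.
canonical = {}  # normalized identity key -> canonical record

def normalize(name, email):
    return (email or name).strip().lower()

def resolve(name, email=None, **facts):
    key = normalize(name, email)
    record = canonical.setdefault(key, {"id": f"ent:{key}", "name": name, "facts": {}})
    record["facts"].update(facts)   # later sessions enrich, not duplicate
    return record

a = resolve("Dana Smith", "dana@acme.example", role="CTO")        # session 1
b = resolve("dana smith", "dana@acme.example", company="Acme")    # session 2

assert a is b                                                     # one record, not two
assert a["facts"] == {"role": "CTO", "company": "Acme"}
```

Real resolution is harder than an email key, but the invariant is the same: downstream agents always reason over the canonical record.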
What actually changes
You stop compensating for memory and start building on top of it. New features compound instead of regressing. You add a capability and it works across sessions because the state it depends on persists.
The records graph gets richer as usage grows, not messier - constraints and resolution rules handle what used to be manual cleanup. Your roadmap shifts from memory regression fixes to new capabilities.
When something goes wrong, you trace it to a specific record in thirty seconds. You start trusting your own system enough to build ambitiously on it.
How is this different from what you're already using?
Framework-native memory
LangChain, CrewAI, and custom frameworks each ship their own memory. None are portable across tools. None version state. None provide provenance. Switching frameworks means starting over.
Retrieval / vector search
Retrieval finds relevant context at query time by similarity. It doesn't persist canonical records, track provenance, or guarantee the same result twice. Retrieval and a persistent memory layer solve different problems.
Provider-hosted memory
ChatGPT memory, Claude memory: conversation-scoped, provider-bound, non-auditable. No cross-tool access, no correction mechanism, no trail.
Retrieval and persistent memory are different paradigms. Retrieval optimizes for flexible, on-demand access. Persistent memory optimizes for consistency and verifiability. If your agents need to reason over canonical records across sessions - not just find relevant context within one - you need a persistent layer underneath retrieval.
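The difference shows up in a toy contrast: similarity retrieval over note snapshots can surface whichever text best matches the query, even when it is stale, while a canonical lookup is unambiguous. Word overlap stands in for vector similarity here, purely for illustration:

```python
# Similarity retrieval over snapshots vs. a canonical record lookup.
snapshots = [
    "Acme Corp status prospect contacted in March",   # stale note, verbose
    "Acme status customer",                           # current, terse
]
current = {"acme": "customer"}   # persistent layer: one canonical record

def retrieve(query):
    # Word-overlap scoring as a stand-in for embedding similarity.
    q = set(query.lower().split())
    return max(snapshots, key=lambda s: len(q & set(s.lower().split())))

hit = retrieve("what is the Acme Corp status")
assert "prospect" in hit             # retrieval surfaced the stale snapshot
assert current["acme"] == "customer" # canonical lookup is current by construction
```

Neither tool is wrong: retrieval answers "what text is most relevant", the persistent layer answers "what is true now", which is why the two compose rather than compete.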
Data types for better recall
The entity types you'll store most often.
agent_session
Session state and accumulated facts across agent runs
action
Agent actions with inputs, outputs, and provenance links
pipeline
Multi-step workflows with step-level audit
evaluation
Eval results, benchmarks, and regression tracking
audit_event
Immutable log of state transitions and corrections
tool_config
Agent tool configurations and runtime parameters
entity_graph
Resolved records with typed relationships and temporal evolution
runbook
Operational procedures and agent behavioral rules
When you don't need this
For single-session, stateless tasks (one-shot summarization, code generation, document Q&A), retrieval is sufficient. Neotoma is for agents that accumulate facts across sessions, resolve entities, track commitments, and need to explain their reasoning after the fact.
Other modes
The same person operates in multiple modes. The tax differs; the architecture that removes it is the same.
The tax is prompt workarounds, dedup hacks, and memory regressions. Neotoma removes that tax and gives you a persistent, provenance-backed substrate to build on.
Built because retrieval kept re-guessing entities that should have been resolved once and persisted.
Deep dive: Why agent memory needs more than RAG