When you're debugging infrastructure
Two runs. Same inputs. Different state. No replay, no diff, no explanation.
You're in this mode when something breaks and you need to understand why. Your observability stack watches everything except the thing that matters: what the agent believed and why. Debugging means reconstructing truth from scattered logs and hoping to reproduce the issue. Neotoma gives you replayable state so debugging takes minutes, not days.
Escaping: Log archaeologist, reverse-engineering truth from logs
Into: Platform engineer with replayable state
Tax you pay: Writing glue (checkpoint logic, custom diffing, state serialization)
What you get back: Debugging speed, platform design time, sleep
Same question, different outcome
Without a state layer, agents can return stale or wrong data with nothing to flag it. With Neotoma, every response reads from versioned, schema-bound state.
Same pipeline, different results
Two runs of the same pipeline with identical inputs returned different state. Without versioned state, there was no way to detect or explain the drift.
Invisible overwrite, broken downstream
An upstream agent silently overwrote a shared record. Downstream consumers read the overwritten value with no sign that it had changed, and produced incorrect output.
Missing provenance, failed audit
An evaluation needed to trace an agent's output to its source data. Without an immutable log, the trail had to be reconstructed manually from scattered logs.
Can't reconstruct state after failure
A production agent crashed mid-run. In-memory state was lost. Without an append-only log, there was no way to reconstruct what the agent knew at the time of failure.
Why this happens
Failure modes without a memory guarantee
Agent runs aren't reproducible
Two runs with identical inputs produce different results. State changes between sessions are invisible - no versioned history to compare, no log to replay. Debugging means reading logs and guessing.
State changes are invisible
Values overwrite in place. When a record changes, the previous value is gone. No diff, no provenance, no way to know which agent or pipeline step introduced the change.
No audit trail for compliance or evaluation
Evaluation needs to compare outputs against known-good state. Compliance needs to trace decisions to source data. Without an immutable log, both require manual reconstruction.
Memory locked to one vendor's runtime
Each agent runtime ships its own memory: none portable, none interoperable. Switching frameworks means rebuilding state management from scratch.
No data residency guarantees
Agent state flows through third-party APIs with no contractual guarantee about where it's stored or who can access it. For teams with compliance obligations, opaque provider memory is a gap that manual audits cannot close.
If you can't replay an agent run, you can't debug it. If you can't debug it, you can't iterate. Neotoma makes agent state inspectable, diffable, and replayable - so your debugging cycle is minutes, not days.
AI needs
What you need from your AI tools, and what current tools don't provide.
- Same inputs always produce the same state - no silent drift
- Full provenance chain from agent outputs back to source data
- Replayable timelines for debugging production failures
- Validation at write time so bad data is rejected before it spreads
- Append-only log for complete state reconstruction after failure
How Neotoma solves this
Neotoma replaces the glue you write by hand (checkpoint logic, state serialization, custom diffing) with built-in primitives. Replay any timeline, diff any state, trace any output to its source.
Deterministic state
Every state change is versioned. Same inputs always produce the same state - no ordering sensitivity, no silent drift. Agent runs become reproducible by construction.
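To make "no ordering sensitivity" concrete, here is a minimal sketch of one way to get it: fold observations into state in a canonical order rather than arrival order, then fingerprint the result. This is an illustration under stated assumptions, not Neotoma's implementation; `reduce_state`, `state_hash`, and the observation fields (`id`, `observed_at`, `field`, `value`) are hypothetical names.

```python
import hashlib
import json
from typing import Iterable


def reduce_state(observations: Iterable[dict]) -> dict:
    """Fold observations into state in a canonical order, not arrival order."""
    state: dict = {}
    # Sorting by (observed_at, id) makes the result independent of the order
    # in which observations happened to arrive. Field names are hypothetical.
    for obs in sorted(observations, key=lambda o: (o["observed_at"], o["id"])):
        state[obs["field"]] = obs["value"]
    return state


def state_hash(state: dict) -> str:
    """Stable fingerprint of a state: canonical JSON, then SHA-256."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


observations = [
    {"id": "a1", "observed_at": 1, "field": "status", "value": "queued"},
    {"id": "a2", "observed_at": 2, "field": "status", "value": "running"},
    {"id": "a3", "observed_at": 2, "field": "owner", "value": "agent-7"},
]

# Same inputs, different arrival order.
run_one = reduce_state(observations)
run_two = reduce_state(list(reversed(observations)))
```

Comparing `state_hash(run_one)` with `state_hash(run_two)` is a cheap drift check: if two runs with identical inputs ever produce different fingerprints, something other than the inputs influenced the state.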
Append-only log
Records are immutable. Corrections add new data; they never overwrite. The full state can be reconstructed from the log at any point in time.
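The append-only idea can be sketched in a few lines. The example below is a toy in-memory log, assuming a simple per-field entry format; `AppendOnlyLog`, `LogEntry`, and `state_at` are hypothetical names for illustration and are not Neotoma's API.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)  # frozen: entry attributes cannot be reassigned
class LogEntry:
    seq: int
    record_id: str
    field: str
    value: Any


class AppendOnlyLog:
    """Toy in-memory log: entries are only ever appended, never edited."""

    def __init__(self) -> None:
        self._entries: list[LogEntry] = []

    def append(self, record_id: str, field: str, value: Any) -> LogEntry:
        entry = LogEntry(len(self._entries) + 1, record_id, field, value)
        self._entries.append(entry)
        return entry

    def state_at(self, seq: int) -> dict[str, dict[str, Any]]:
        """Rebuild state by replaying every entry up to and including seq."""
        state: dict[str, dict[str, Any]] = {}
        for entry in self._entries:
            if entry.seq > seq:
                break
            state.setdefault(entry.record_id, {})[entry.field] = entry.value
        return state


log = AppendOnlyLog()
log.append("invoice-1", "amount", 100)
log.append("invoice-1", "amount", 120)  # a correction is a new entry

before = log.state_at(1)  # state as it was before the correction
after = log.state_at(2)   # state including the correction
```

Because the first entry is still in the log, the pre-correction state stays reconstructable; an in-place overwrite would have destroyed it.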
Full provenance and replayable timeline
Every record links back to where it came from. Replay the timeline to any historical state. Diff versions to see what changed and when.
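As a rough sketch of what "diff versions" and "links back to where it came from" can mean in practice, the example below keeps a source label on each version and computes a field-level diff. The `Version` type, the `diff` helper, and the `run-41/step-2` style source labels are assumptions made for illustration, not Neotoma's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    number: int
    data: dict   # frozen blocks reassignment; the dict itself is not deep-frozen
    source: str  # hypothetical provenance label: which run and step wrote this


def diff(old: dict, new: dict) -> dict:
    """Return {field: (before, after)} for every field whose value differs.

    A field that is absent on one side is reported with None on that side.
    """
    changes = {}
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


history = [
    Version(1, {"status": "open", "owner": "agent-a"}, source="run-41/step-2"),
    Version(2, {"status": "closed", "owner": "agent-a"}, source="run-42/step-5"),
]

changes = diff(history[0].data, history[1].data)
introduced_by = history[1].source
```

Here `changes` answers "what changed" and `introduced_by` answers "which step changed it", the two questions a post-mortem usually starts with.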
Validation at write time
Records enforce validation rules at write time. Bad data is rejected before it enters the memory graph - preventing garbage-in-garbage-out across runtimes.
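A minimal sketch of validation at write time, assuming a schema that maps field names to expected Python types. `ACTION_SCHEMA`, `ValidationError`, `validate`, and `write` are hypothetical; a real schema layer would cover more than presence and type checks.

```python
# Hypothetical schema: field name -> expected Python type.
ACTION_SCHEMA = {"agent_id": str, "tool": str, "duration_ms": int}


class ValidationError(ValueError):
    """Raised when a record does not match its schema."""


def validate(record: dict, schema: dict) -> None:
    missing = schema.keys() - record.keys()
    if missing:
        raise ValidationError(f"missing fields: {sorted(missing)}")
    for field, expected in schema.items():
        if not isinstance(record[field], expected):
            raise ValidationError(
                f"{field}: expected {expected.__name__}, "
                f"got {type(record[field]).__name__}"
            )


store: list = []


def write(record: dict) -> None:
    """Validate first, append second: a bad record never reaches the store."""
    validate(record, ACTION_SCHEMA)
    store.append(dict(record))


write({"agent_id": "agent-7", "tool": "search", "duration_ms": 120})

try:
    write({"agent_id": "agent-7", "tool": "search", "duration_ms": "fast"})
    rejected = False
except ValidationError:
    rejected = True
```

Rejecting at the write boundary keeps the check in one place, so downstream readers do not each need their own defensive parsing.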
What actually changes
You stop writing glue. Checkpoint logic, state serialization, custom diffing, retry handlers - the guarantees you've been hand-rolling become primitives you build on.
When something fails, you query the timeline instead of reconstructing it from logs. Post-mortems take thirty minutes because provenance answers "what changed and when" directly.
The job shifts from reactive firefighting to proactive platform design.
Key differences
How your needs differ from the Building pipelines mode:
- Focus: infrastructure and platform layer - below application agents, above compute
- Adoption motion: evaluate guarantees first, then standardize across teams
- Success metric: reproducible runs and auditability, not just better agent outputs
Data types for better remembrance
The entity types you'll store most often.
- agent_session: Session state with versioned context and accumulated facts
- action: Agent actions with inputs, outputs, timestamps, and provenance
- pipeline: Multi-step workflows with step-level state tracking
- evaluation: Eval results, benchmarks, and regression tracking
- audit_event: Immutable log of state transitions and corrections
- tool_config: Agent tool configurations and runtime parameters
- entity_graph: Resolved records with typed relationships and temporal evolution
- runbook: Operational procedures and agent behavioral rules with version history
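For a sense of what one of these types might look like in code, here is a sketch of an `action` record based only on its description above (inputs, outputs, timestamps, provenance). Every field name is a guess for illustration; this is not Neotoma's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # blocks attribute reassignment; nested dicts stay mutable
class Action:
    agent_session_id: str  # hypothetical link to an agent_session
    inputs: dict
    outputs: dict
    started_at: datetime
    finished_at: datetime
    source: str            # hypothetical provenance label


action = Action(
    agent_session_id="session-1",
    inputs={"query": "open invoices"},
    outputs={"count": 3},
    started_at=datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    finished_at=datetime(2024, 1, 1, 12, 0, 2, tzinfo=timezone.utc),
    source="pipeline/step-3",
)

duration_seconds = (action.finished_at - action.started_at).total_seconds()
```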
When you don't need this
If your agents are stateless request-response (no accumulated context, no record tracking), standard logging and tracing are sufficient. Neotoma is for when agents accumulate state across sessions and pipeline steps, and you need that state to be reproducible, traceable, and auditable.
Other modes
The same person operates in multiple modes. The tax differs; the architecture that removes it is the same.
Operating mode
Operating across tools
Every session starts from zero. You re-explain context, re-prompt corrections, re-establish what your agent already knew.
Building mode
Building pipelines
Your agent guesses entities every session. Corrections don't persist. Memory regressions ship because the architecture can't prevent them.
The tax is writing glue: checkpoint logic, custom diffing, state serialization. Neotoma removes that tax and gives you primitives to build on instead.
Built because I hit every failure mode on this page while running a twelve-server agent stack against a production monorepo.
Deep dive: Building structural barriers that incumbents can't copy