Neotoma

Entity snapshots

An entity snapshot is the deterministic reducer output for one entity: the system's current best answer to 'given every observation we have, what is the truth right now?' Snapshots are derived, cached, and recomputed; observations are the durable ground truth. Every snapshot field carries provenance back to the observation that produced it, and a snapshot can optionally carry an embedding for semantic search.

Source → Interpretation → Observation → Snapshot. Entity snapshots are the rightmost layer of the truth model: the merged view derived from observations, with provenance and embeddings attached.

Schema

entity_snapshots table (Postgres / hosted)

CREATE TABLE entity_snapshots (
  entity_id TEXT PRIMARY KEY,
  entity_type TEXT NOT NULL,
  schema_version TEXT NOT NULL,
  snapshot JSONB NOT NULL,
  computed_at TIMESTAMPTZ NOT NULL,
  observation_count INTEGER NOT NULL,
  last_observation_at TIMESTAMPTZ NOT NULL,
  provenance JSONB NOT NULL,
  user_id UUID NOT NULL,
  embedding vector(1536)
);
| Field | Type | Purpose |
| --- | --- | --- |
| entity_id | TEXT | Foreign key to entities.id and PK; at most one snapshot per entity |
| entity_type | TEXT | Mirrors entities.entity_type so reads avoid the join |
| schema_version | TEXT | Schema version this snapshot was computed against |
| snapshot | JSONB | The merged, current truth for the entity, computed by the reducer |
| provenance | JSONB | Map field → observation_id; one entry per snapshot field; drives 'where did this come from?' views |
| observation_count | INTEGER | Number of observations the snapshot was computed from |
| last_observation_at | TIMESTAMPTZ | Timestamp of the newest observation included in the snapshot |
| computed_at | TIMESTAMPTZ | Wall-clock time of the most recent reducer run |
| embedding | vector(1536) | Optional embedding of the snapshot for semantic similarity search; partial ivfflat index covers non-null rows |
| user_id | UUID | Owner; mirrors entities.user_id for RLS |

Deterministic by construction

Same observations + same schema + same reducer config ⇒ same snapshot, byte-for-byte (modulo computed_at). Re-running the reducer on unchanged inputs never changes a field. This is what lets Neotoma replay historical state, audit truth, and detect non-determinism in custom reducers.
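One way to check this property is to run a reducer twice and compare the two outputs with computed_at excluded. The following is a minimal sketch, not Neotoma's own tooling: the `SnapshotRow` type, `canonicalize`, and `isSameSnapshot` are hypothetical names, and the key-sorted serialization is an assumed normalization so that JSON key order does not produce false differences.

```typescript
// Hypothetical row shape mirroring the entity_snapshots columns used in the comparison.
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };

type SnapshotRow = {
  entity_id: string;
  schema_version: string;
  snapshot: { [field: string]: JsonValue };
  provenance: { [field: string]: string };
  observation_count: number;
  last_observation_at: string;
  computed_at: string;
};

// Serialize with object keys sorted so that key order never affects the result.
function canonicalize(value: JsonValue): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonicalize(item)).join(",") + "]";
  }
  const entries = Object.entries(value).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return "{" + entries.map(([k, v]) => JSON.stringify(k) + ":" + canonicalize(v)).join(",") + "}";
}

// Copy every column except computed_at, the one value allowed to differ between runs.
function withoutComputedAt(row: SnapshotRow): JsonValue {
  return {
    entity_id: row.entity_id,
    schema_version: row.schema_version,
    snapshot: row.snapshot,
    provenance: row.provenance,
    observation_count: row.observation_count,
    last_observation_at: row.last_observation_at,
  };
}

// True when two reducer runs produced the same snapshot, ignoring computed_at.
function isSameSnapshot(a: SnapshotRow, b: SnapshotRow): boolean {
  return canonicalize(withoutComputedAt(a)) === canonicalize(withoutComputedAt(b));
}
```

A custom reducer that fails this comparison on identical inputs is non-deterministic and would break replay.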

Provenance map

provenance is a JSONB object whose keys are snapshot field names and whose values are the observation_id that produced each field's value. From there the chain is fully resolvable: observation → source (raw bytes) and observation → interpretation (model, prompt, schema version). Every snapshot field has exactly one provenance entry; no field is unsourced.
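The resolution chain can be illustrated with a small lookup. This is a sketch under assumptions: the `TracedObservation` shape and its `source_id` and `interpretation_id` link columns are hypothetical, since the observation schema is not shown here, and `traceField` is an illustrative helper rather than a Neotoma API.

```typescript
// Hypothetical observation shape; only the linking ids matter for this sketch.
type TracedObservation = { id: string; source_id: string; interpretation_id: string };

type FieldTrace = {
  field: string;
  observation_id: string;
  source_id: string;
  interpretation_id: string;
};

// Follow one snapshot field back to the observation, source, and interpretation behind it.
function traceField(
  field: string,
  provenance: Record<string, string>,
  observations: Map<string, TracedObservation>,
): FieldTrace {
  const observationId = provenance[field];
  if (observationId === undefined) {
    throw new Error(`no provenance entry for field "${field}"`);
  }
  const observation = observations.get(observationId);
  if (observation === undefined) {
    throw new Error(`provenance points at unknown observation "${observationId}"`);
  }
  return {
    field,
    observation_id: observation.id,
    source_id: observation.source_id,
    interpretation_id: observation.interpretation_id,
  };
}
```

A 'where did this come from?' view would call something like `traceField("email", row.provenance, observationsById)` and then load the source bytes and interpretation record by the returned ids.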

When the reducer runs

The reducer recomputes a snapshot when its observation set changes: a new observation arrives, a reinterpretation completes, an entity merge rewrites observations from a loser entity to a winner, or a schema upgrade requires recomputation against a new schema_version. Reads never trigger recomputation; snapshots are cached state.
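The split between recomputation and reads can be sketched with an in-memory store. Everything here is illustrative: the `ReducerObservation` shape, the `SnapshotStore` class, and in particular the merge policy (newest observation wins per field, ties broken by observation id) are assumptions, since the actual reducer rules are not described in this section.

```typescript
// Hypothetical observation shape; timestamps are ISO 8601 UTC strings so they sort lexically.
type FieldValue = string | number | boolean;
type ReducerObservation = { id: string; observed_at: string; fields: Record<string, FieldValue> };

type CachedSnapshot = {
  snapshot: Record<string, FieldValue>;
  provenance: Record<string, string>;
  observation_count: number;
  last_observation_at: string;
  computed_at: string;
};

// Total order over observations, so the result never depends on input order.
function compareObservations(a: ReducerObservation, b: ReducerObservation): number {
  if (a.observed_at !== b.observed_at) return a.observed_at < b.observed_at ? -1 : 1;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

// Assumed policy: apply observations oldest to newest, so the newest value wins per field.
function reduceObservations(observations: ReducerObservation[], now: string): CachedSnapshot {
  if (observations.length === 0) throw new Error("cannot reduce an empty observation set");
  const ordered = [...observations].sort(compareObservations);
  const snapshot: Record<string, FieldValue> = {};
  const provenance: Record<string, string> = {};
  let lastObservationAt = "";
  for (const observation of ordered) {
    for (const [field, value] of Object.entries(observation.fields)) {
      snapshot[field] = value;
      provenance[field] = observation.id; // remember which observation supplied the value
    }
    lastObservationAt = observation.observed_at;
  }
  return {
    snapshot,
    provenance,
    observation_count: ordered.length,
    last_observation_at: lastObservationAt,
    computed_at: now,
  };
}

class SnapshotStore {
  private cache = new Map<string, CachedSnapshot>();
  reducerRuns = 0;

  // Called only when the observation set changes (new observation, reinterpretation, merge, schema upgrade).
  recompute(entityId: string, observations: ReducerObservation[], now: string): void {
    this.reducerRuns += 1;
    this.cache.set(entityId, reduceObservations(observations, now));
  }

  // Reads return cached state and never invoke the reducer.
  read(entityId: string): CachedSnapshot | undefined {
    return this.cache.get(entityId);
  }
}
```

Passing `now` in as an argument keeps the reducer a pure function of its inputs, which is one way to confine wall-clock time to computed_at.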

Embeddings and vector parity

Snapshots optionally carry a 1536-dimensional embedding for semantic similarity search. The cosine ivfflat index (lists=100) is partial: it only covers rows where embedding is not null. In local SQLite mode the column is mirrored into a sqlite-vec virtual table (entity_embeddings_vec) plus a join table (entity_embedding_rows) so KNN queries get the same shape as hosted.
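To make the query shape concrete, here is a brute-force sketch of the ranking that a cosine KNN query approximates: rows without an embedding are skipped, and the rest are ordered by cosine distance to the query vector. It uses neither pgvector nor sqlite-vec; `EmbeddedSnapshot`, `cosineDistance`, and `nearestSnapshots` are hypothetical names, and the vectors are assumed to be non-zero.

```typescript
// Hypothetical row shape: an entity id plus its optional embedding.
type EmbeddedSnapshot = { entity_id: string; embedding: number[] | null };

// Cosine distance = 1 - cosine similarity; assumes both vectors are non-zero.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the ids of the k rows closest to the query, ignoring rows with no embedding.
function nearestSnapshots(rows: EmbeddedSnapshot[], query: number[], k: number): string[] {
  const scored: { entity_id: string; distance: number }[] = [];
  for (const row of rows) {
    if (row.embedding === null) continue; // mirrors the partial index: null rows are not searchable
    scored.push({ entity_id: row.entity_id, distance: cosineDistance(row.embedding, query) });
  }
  scored.sort((p, q) => p.distance - q.distance);
  return scored.slice(0, k).map((s) => s.entity_id);
}
```

An ivfflat index returns approximate neighbours, so hosted results can differ slightly from this exact ranking.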

Merge deletes the loser, recomputes the winner

When two entities are merged, the loser's snapshot is deleted (the loser entity has no more observations pointing at it), and the winner's snapshot is recomputed deterministically from the union of observations. Reads filter merged entities by default, so the loser disappears from default views without any retroactive rewriting of history.
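The merge sequence can be sketched in three steps: repoint observations, delete the loser's snapshot, and recompute the winner. This is an in-memory illustration under assumptions, not the actual implementation: `MergeState` and `mergeEntities` are hypothetical names, and the reducer is passed in as a parameter so the sketch does not presume any particular merge policy.

```typescript
// Hypothetical minimal shapes: an observation knows which entity it belongs to.
type MergeObservation = { id: string; entity_id: string };

type MergeState<S> = {
  observations: MergeObservation[];
  snapshots: Map<string, S>;
};

function mergeEntities<S>(
  state: MergeState<S>,
  loserId: string,
  winnerId: string,
  reduce: (observations: MergeObservation[]) => S,
): void {
  if (loserId === winnerId) throw new Error("cannot merge an entity into itself");

  // 1. Rewrite the loser's observations so they point at the winner.
  for (const observation of state.observations) {
    if (observation.entity_id === loserId) observation.entity_id = winnerId;
  }

  // 2. The loser now has no observations, so its snapshot is deleted.
  state.snapshots.delete(loserId);

  // 3. Recompute the winner from the union of observations.
  const winnerObservations = state.observations.filter((o) => o.entity_id === winnerId);
  state.snapshots.set(winnerId, reduce(winnerObservations));
}
```

For example, with a toy reducer such as `(obs) => obs.length`, merging an entity with two observations into one with a single observation leaves one snapshot whose value is 3.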

Invariants

Every entity snapshot satisfies the following constraints:

MUST

  • Have exactly one row per entity (PK on entity_id)
  • Be byte-for-byte reproducible from observations + schema + reducer config (modulo computed_at)
  • Carry a provenance entry for every field in snapshot
  • Stamp schema_version, observation_count, last_observation_at, and computed_at on every recomputation
  • Be filtered by user ownership on every read path

MUST NOT

  • Be edited directly by clients or agents; every change is the reducer reacting to new observations
  • Survive an entity merge on the merged-from side; the loser's snapshot is deleted and the winner's is recomputed
  • Be treated as durable ground truth; observations are durable, snapshots are derived
  • Carry a value in snapshot without a corresponding provenance entry
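The invariants that can be checked from a single row lend themselves to a small validator. The sketch below is an assumption-laden illustration: `CheckedRow` and `findViolations` are hypothetical names, timestamps are assumed to arrive as ISO 8601 strings, and invariants that depend on other tables or on the read path (one row per entity, ownership filtering, no direct edits) are out of its scope.

```typescript
// Hypothetical row shape with the columns that the per-row invariants mention.
type CheckedRow = {
  entity_id: string;
  schema_version: string;
  snapshot: Record<string, unknown>;
  provenance: Record<string, string>;
  observation_count: number;
  last_observation_at: string;
  computed_at: string;
};

// Return a human-readable list of violated invariants; an empty list means the row passes.
function findViolations(row: CheckedRow): string[] {
  const violations: string[] = [];

  // Every field in snapshot must have a provenance entry.
  for (const field of Object.keys(row.snapshot)) {
    if (!Object.prototype.hasOwnProperty.call(row.provenance, field)) {
      violations.push(`snapshot field "${field}" has no provenance entry`);
    }
  }

  // Each recomputation must stamp these columns.
  if (row.schema_version === "") violations.push("schema_version is empty");
  if (!Number.isInteger(row.observation_count)) violations.push("observation_count is not an integer");
  for (const stamp of ["last_observation_at", "computed_at"] as const) {
    if (Number.isNaN(Date.parse(row[stamp]))) violations.push(`${stamp} is not a valid timestamp`);
  }

  return violations;
}
```

A check like this could run in tests for custom reducers, where a missing provenance entry is the most likely mistake.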

Where to go next

  • All primitive record types: index of sources, interpretations, observations, relationships, and timeline events
  • Architecture: how the primitives compose into Neotoma's deterministic state
  • Terminology: canonical glossary of terms used across Neotoma docs