Entities

An entity is the canonical, durable row that every observation, relationship, and timeline event ultimately points at. The entities table itself is small and stable: aliases, identity decisions, and merge history live here. The rich, current view of an entity lives in entity snapshots, recomputed deterministically from observations.

Entities sit next to the truth pipeline. Observations describe entities; the reducer composes those observations into entity snapshots. Without a stable entities row, observations would have no durable target to attach to.

Schema

entities table (Postgres / hosted)

CREATE TABLE entities (
  id TEXT PRIMARY KEY,                                -- Deterministic hash-based ID
  entity_type TEXT NOT NULL,                          -- 'person', 'company', 'location', 'invoice', …
  canonical_name TEXT NOT NULL,                       -- Normalized name
  aliases JSONB DEFAULT '[]',                         -- Array of alternate names
  metadata JSONB DEFAULT '{}',
  first_seen_at TIMESTAMP WITH TIME ZONE,
  last_seen_at TIMESTAMP WITH TIME ZONE,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  user_id UUID NOT NULL,
  merged_to_entity_id TEXT REFERENCES entities(id),
  merged_at TIMESTAMPTZ
);
| Field | Type | Purpose |
| --- | --- | --- |
| id | TEXT | Deterministic hash-based ID derived from entity_type + canonical_name + user_id; same identity collapses to the same row |
| entity_type | TEXT | Canonical type label (person, company, location, invoice, …) |
| canonical_name | TEXT | Normalized name used as part of the identity hash |
| aliases | JSONB | Alternate spellings, legal names, handles; additive, never destructive |
| metadata | JSONB | Identity-level metadata (e.g. external IDs); not the rich snapshot view |
| first_seen_at | TIMESTAMPTZ | Earliest observed_at across observations for this entity |
| last_seen_at | TIMESTAMPTZ | Most recent observed_at across observations |
| user_id | UUID | Owner; combined with the identity hash this enforces per-user identity isolation and RLS |
| merged_to_entity_id | TEXT | Set on merge, points at the surviving entity; the merged-from row stays so historical observations resolve |
| merged_at | TIMESTAMPTZ | When the merge happened; reads filter merged entities from default queries |
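
As an illustration of how first_seen_at and last_seen_at relate to observations, the following is a minimal sketch that takes the earliest and latest observed_at for one entity. The ObservationTimes shape and the seenWindow helper are hypothetical names introduced here; only entity_id and observed_at are assumed to exist on an observation.

```typescript
// Hypothetical observation shape: only the fields this sketch needs.
interface ObservationTimes {
  entity_id: string;
  observed_at: string; // ISO 8601 timestamp
}

interface SeenWindow {
  first_seen_at: string | null;
  last_seen_at: string | null;
}

// Earliest and latest observed_at among the observations for one entity.
// Returns nulls when the entity has no (parseable) observations.
function seenWindow(entityId: string, observations: ObservationTimes[]): SeenWindow {
  let min = Number.POSITIVE_INFINITY;
  let max = Number.NEGATIVE_INFINITY;
  for (const obs of observations) {
    if (obs.entity_id !== entityId) continue;
    const t = Date.parse(obs.observed_at);
    if (Number.isNaN(t)) continue; // skip unparseable timestamps
    if (t < min) min = t;
    if (t > max) max = t;
  }
  if (min === Number.POSITIVE_INFINITY) {
    return { first_seen_at: null, last_seen_at: null };
  }
  return {
    first_seen_at: new Date(min).toISOString(),
    last_seen_at: new Date(max).toISOString(),
  };
}
```

Because the result depends only on the set of observations, not on their arrival order, recomputing it after out-of-order writes gives the same window.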

Deterministic, hash-based identity

Entity ids are derived from (entity_type, canonical_name, user_id), not generated at random. Re-resolving the same identity converges on the same row, which is what makes ingestion idempotent and lets out-of-order writes attach to the right entity without coordination.
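
A minimal sketch of deterministic ID derivation follows. The source only states that the id is a hash over (entity_type, canonical_name, user_id); the choice of SHA-256, the ent_ prefix, the 24-character truncation, and the normalizeName rules are all assumptions made for illustration, and deriveEntityId is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

// Assumed normalization: trim, collapse internal whitespace, lowercase.
// The actual canonicalization rules are not specified in this document.
function normalizeName(raw: string): string {
  return raw.trim().replace(/\s+/g, " ").toLowerCase();
}

// Derive an id from the identity triple. The same triple always yields
// the same id, so repeated or out-of-order writes converge on one row.
function deriveEntityId(entityType: string, canonicalName: string, userId: string): string {
  // JSON-encoding the array keeps field boundaries unambiguous, so
  // ("ab", "c", ...) and ("a", "bc", ...) never produce the same payload.
  const payload = JSON.stringify([entityType, normalizeName(canonicalName), userId]);
  const digest = createHash("sha256").update(payload, "utf8").digest("hex");
  // Prefix and length are illustrative choices, not the real id format.
  return `ent_${digest.slice(0, 24)}`;
}
```

Including user_id in the hashed payload is what keeps identity per-user: the same type and name under two different owners produce two different ids.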

Why the row stays small

Rich, multi-field truth lives in entity_snapshots, which is recomputed by the reducer. The entities row carries only what has to be durable: identity, aliases, merge state, and ownership. If snapshots are lost they can be rebuilt; the entities row cannot.

Merge: a repair mechanism, not write-time resolution

Neotoma does not attempt perfect entity resolution at write time. Duplicates are repaired with merge_entities(from_id, to_id): observations pointing at the loser are rewritten to the winner, the loser's snapshot is deleted, the winner's is recomputed, and the loser row is marked merged with merged_to_entity_id and merged_at set. The loser row stays so historical observations and relationships still resolve.
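
The merge steps described above can be sketched against an in-memory store. This is not Neotoma's implementation: the MergeStore shape, the guard checks, and the placeholder recomputeSnapshot (later observations overwrite earlier fields) are assumptions made so the example runs on its own. A real implementation would presumably run these steps in a single database transaction.

```typescript
interface MergeEntity {
  id: string;
  user_id: string;
  merged_to_entity_id: string | null;
  merged_at: string | null;
}

interface MergeObservation {
  id: string;
  entity_id: string;
  fields: Record<string, unknown>;
}

// Hypothetical in-memory stand-in for the entities, observations,
// and entity_snapshots tables.
interface MergeStore {
  entities: Map<string, MergeEntity>;
  observations: MergeObservation[];
  snapshots: Map<string, Record<string, unknown>>;
}

// Placeholder reducer: folds observation fields in array order.
// The real reducer's rules are not described in this document.
function recomputeSnapshot(store: MergeStore, entityId: string): Record<string, unknown> {
  const snapshot: Record<string, unknown> = {};
  for (const obs of store.observations) {
    if (obs.entity_id === entityId) Object.assign(snapshot, obs.fields);
  }
  return snapshot;
}

function mergeEntities(store: MergeStore, fromId: string, toId: string, now: Date = new Date()): void {
  const loser = store.entities.get(fromId);
  const winner = store.entities.get(toId);
  // Guard checks below are assumptions, not documented behavior.
  if (!loser || !winner) throw new Error("both entities must exist");
  if (fromId === toId) throw new Error("cannot merge an entity into itself");
  if (loser.user_id !== winner.user_id) throw new Error("cannot merge across users");
  if (loser.merged_to_entity_id !== null || winner.merged_to_entity_id !== null) {
    throw new Error("cannot merge an already-merged entity");
  }

  // 1. Rewrite observations pointing at the loser to the winner.
  for (const obs of store.observations) {
    if (obs.entity_id === fromId) obs.entity_id = toId;
  }
  // 2. Delete the loser's snapshot and recompute the winner's.
  store.snapshots.delete(fromId);
  store.snapshots.set(toId, recomputeSnapshot(store, toId));
  // 3. Mark the loser as merged; the row itself is kept.
  loser.merged_to_entity_id = toId;
  loser.merged_at = now.toISOString();
}
```

Note that nothing is removed from the entities map: the loser row remains as a pointer to the survivor, which is what lets older references still resolve.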

User isolation

RLS on entities filters by user_id. Identity is per-user: two users can independently have an entity for the same canonical name without colliding. All reads from MCP, HTTP, and CLI go through this filter, and merged entities are excluded from default queries.
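
To make the effect of the default read path concrete, here is a small application-level sketch of the same two conditions: rows are scoped to the current user, and merged rows are hidden unless explicitly requested. In the hosted deployment the user scoping is done by RLS in Postgres, not by code like this; ScopedEntity, defaultVisible, and the includeMerged flag are hypothetical names used only for illustration.

```typescript
interface ScopedEntity {
  id: string;
  user_id: string;
  canonical_name: string;
  merged_to_entity_id: string | null;
}

// Mirrors the default query behavior described above:
// only the caller's rows, and only rows that have not been merged away.
function defaultVisible(
  rows: ScopedEntity[],
  currentUserId: string,
  includeMerged: boolean = false, // hypothetical opt-in for history views
): ScopedEntity[] {
  return rows.filter(
    (row) =>
      row.user_id === currentUserId &&
      (includeMerged || row.merged_to_entity_id === null),
  );
}
```

Two users holding an entity with the same canonical name never see each other's row, because the user filter is applied before anything else.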

Invariants

Every entity satisfies the following constraints:

MUST

  • Carry a non-null entity_type, canonical_name, and user_id
  • Have a deterministic, hash-derived id so re-resolving the same identity returns the same row
  • Be the foreign-key target for every observation, relationship, and timeline event for that identity
  • Be repaired via merge (never destructive deletion); merged rows stay so history resolves
  • Pass attribution policy enforcement before write

MUST NOT

  • Be edited destructively after creation; aliases and metadata are additive
  • Carry the rich, merged truth view (that lives in entity_snapshots)
  • Be deduplicated across user boundaries; identity is per-user
  • Be hard-deleted as a routine merge step; the merged-from row stays for provenance
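
The "additive, never destructive" rule for aliases can be illustrated with a small sketch. It assumes aliases are stored as an array of strings (the schema only says JSONB array of alternate names); AliasedEntity and addAliases are hypothetical names, and the trimming and de-duplication rules are illustrative choices.

```typescript
interface AliasedEntity {
  canonical_name: string;
  aliases: string[];
}

// Returns a copy of the entity with new aliases appended.
// Existing aliases are never removed or reordered.
function addAliases(entity: AliasedEntity, incoming: string[]): AliasedEntity {
  const seen = new Set(entity.aliases);
  const merged = [...entity.aliases];
  for (const raw of incoming) {
    const alias = raw.trim();
    // Skip blanks, the canonical name itself, and anything already present.
    if (alias === "" || alias === entity.canonical_name || seen.has(alias)) continue;
    seen.add(alias);
    merged.push(alias);
  }
  return { ...entity, aliases: merged };
}
```

Because the function only appends, applying the same alias update twice leaves the list unchanged the second time.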

Where to go next

  • All primitive record types: index of sources, interpretations, observations, relationships, and timeline events
  • Architecture: how the primitives compose into Neotoma's deterministic state
  • Terminology: canonical glossary of terms used across Neotoma docs