Entities

An entity is the canonical, durable row that every observation, relationship, and timeline event ultimately points at. The entities table itself is small and stable; aliases, identity decisions, and merge history live here. The rich, current view of an entity lives in entity snapshots, which are recomputed deterministically from observations.

Entities sit next to the truth pipeline: observations describe entities, and the reducer composes those observations into entity snapshots. Without a stable entities row, observations would have no durable target to attach to.

Schema

entities table (Postgres / hosted)

CREATE TABLE entities (
  id TEXT PRIMARY KEY,                              -- Deterministic hash-based ID
  entity_type TEXT NOT NULL,                        -- 'person', 'company', 'location', 'invoice', …
  canonical_name TEXT NOT NULL,                     -- Normalized name
  aliases JSONB DEFAULT '[]',                       -- Array of alternate names
  metadata JSONB DEFAULT '{}',                      -- Identity-level metadata (e.g. external IDs)
  first_seen_at TIMESTAMPTZ,                        -- Earliest observed_at across observations
  last_seen_at TIMESTAMPTZ,                         -- Most recent observed_at across observations
  created_at TIMESTAMPTZ DEFAULT NOW(),             -- Row creation time
  user_id UUID NOT NULL,                            -- Owner; basis for per-user RLS
  merged_to_entity_id TEXT REFERENCES entities(id), -- Surviving entity, set on merge
  merged_at TIMESTAMPTZ                             -- When the merge happened
);

Field	Type	Purpose
`id`	`TEXT`	Deterministic hash-based ID derived from entity_type + canonical_name + user_id; the same identity always collapses to the same row
`entity_type`	`TEXT`	Canonical type label (person, company, location, invoice, …)
`canonical_name`	`TEXT`	Normalized name; part of the identity hash
`aliases`	`JSONB`	Alternate spellings, legal names, and handles; additive, never destructive
`metadata`	`JSONB`	Identity-level metadata (e.g. external IDs); not the rich snapshot view
`first_seen_at`	`TIMESTAMPTZ`	Earliest observed_at across this entity's observations
`last_seen_at`	`TIMESTAMPTZ`	Most recent observed_at across this entity's observations
`created_at`	`TIMESTAMPTZ`	Row creation time; defaults to NOW()
`user_id`	`UUID`	Owner; part of the identity hash and the key RLS filters on, which together enforce per-user identity isolation
`merged_to_entity_id`	`TEXT`	Set on merge to point at the surviving entity; the merged-from row stays so historical observations still resolve
`merged_at`	`TIMESTAMPTZ`	When the merge happened; default queries exclude merged entities
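The "additive, never destructive" rule for `aliases`, together with the earliest/most-recent semantics of the seen-at columns, can be pictured as a fold over incoming observations. The code below is a hypothetical TypeScript illustration, not Neotoma's code. `recordObservation` and `IdentityFields` are names invented here, and the snake_case field names simply mirror the columns above.

```typescript
// Identity-level fields touched by ingestion; names mirror the entities columns.
interface IdentityFields {
  aliases: string[];
  first_seen_at: Date | null;
  last_seen_at: Date | null;
}

// Hypothetical helper: fold one observation into the identity fields.
// Aliases are only ever appended, and the seen-at window only ever widens,
// so the resulting window is the same whatever order observations arrive in.
function recordObservation(
  entity: IdentityFields,
  observedAt: Date,
  alias?: string,
): IdentityFields {
  const t = observedAt.getTime();
  const aliases =
    alias !== undefined && !entity.aliases.includes(alias)
      ? [...entity.aliases, alias] // append a copy; never replace or remove
      : entity.aliases;
  const first_seen_at =
    entity.first_seen_at === null || t < entity.first_seen_at.getTime()
      ? observedAt
      : entity.first_seen_at;
  const last_seen_at =
    entity.last_seen_at === null || t > entity.last_seen_at.getTime()
      ? observedAt
      : entity.last_seen_at;
  return { aliases, first_seen_at, last_seen_at };
}
```

The helper returns a new object rather than mutating its input, which keeps the "nothing is overwritten" property easy to test.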

Deterministic, hash-based identity

Entity ids are derived from (entity_type, canonical_name, user_id), not generated at random. Re-resolving the same identity converges on the same row, which is what makes ingestion idempotent and lets out-of-order writes attach to the right entity without coordination.
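Here is a minimal sketch of how such an id could be derived and then used for an idempotent get-or-create. It assumes SHA-256 over the identity triple. The normalization rule, the `ent_` prefix, the NUL separator, and the names `normalizeName`, `entityId`, and `resolveEntity` are all hypothetical; this page does not specify Neotoma's actual hash scheme. The sketch runs under Node.js, using `node:crypto`.

```typescript
import { createHash } from "node:crypto";

interface EntityRow {
  id: string;
  entity_type: string;
  canonical_name: string;
  user_id: string;
}

// Hypothetical normalization rule: trim, collapse whitespace, lowercase.
function normalizeName(raw: string): string {
  return raw.trim().replace(/\s+/g, " ").toLowerCase();
}

// Hash the identity triple into a stable id. SHA-256, the NUL separator and
// the "ent_" prefix are illustrative choices, not a documented Neotoma format.
function entityId(entityType: string, canonicalName: string, userId: string): string {
  const key = [entityType, canonicalName, userId].join("\u0000");
  return "ent_" + createHash("sha256").update(key, "utf8").digest("hex").slice(0, 32);
}

// Idempotent get-or-create: resolving the same identity again returns the
// existing row instead of creating a second one.
function resolveEntity(
  table: Map<string, EntityRow>,
  entityType: string,
  rawName: string,
  userId: string,
): EntityRow {
  const canonicalName = normalizeName(rawName);
  const id = entityId(entityType, canonicalName, userId);
  const existing = table.get(id);
  if (existing !== undefined) return existing;
  const row: EntityRow = { id, entity_type: entityType, canonical_name: canonicalName, user_id: userId };
  table.set(id, row);
  return row;
}
```

The id is a pure function of its inputs, so two writers resolving the same company compute the same key. In Postgres, the primary key on `id` would then turn a second insert into a conflict rather than a duplicate row.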

Why the row stays small

Rich, multi-field truth lives in entity_snapshots, which the reducer recomputes. The entities row carries only what has to be durable: identity, aliases, merge state, and ownership. If snapshots are lost, they can be rebuilt from observations; the entities row cannot.
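Snapshots are derived data, so rebuilding one amounts to re-running a pure function over the stored observations. The toy reducer below assumes a per-field "latest observation wins" policy, with ties broken by observation id. The `Observation` shape and `reduceSnapshot` are hypothetical, and Neotoma's reducer may apply different merge rules. What the sketch does show is the property that matters here: the output does not depend on the order in which observations are read.

```typescript
// Hypothetical observation shape: one field value for one entity at one time.
interface Observation {
  id: string;
  entity_id: string;
  field: string;
  value: unknown;
  observed_at: string; // ISO-8601 timestamp
}

// Toy deterministic reducer: per field, the latest observation wins, and equal
// timestamps are ordered by observation id (assumed unique), so input order
// never changes the result.
function reduceSnapshot(entityId: string, observations: Observation[]): Record<string, unknown> {
  const ordered = observations
    .filter((o) => o.entity_id === entityId) // filter copies, so the input is not mutated by sort
    .sort((a, b) => {
      const dt = Date.parse(a.observed_at) - Date.parse(b.observed_at);
      if (dt !== 0) return dt;
      return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
    });
  const snapshot: Record<string, unknown> = {};
  for (const o of ordered) snapshot[o.field] = o.value; // later observations overwrite earlier ones
  return snapshot;
}
```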

Merge: a repair mechanism, not write-time resolution

Neotoma does not attempt perfect entity resolution at write time. Duplicates are repaired with merge_entities(from_id, to_id): observations pointing at the loser are rewritten to the winner, the loser's snapshot is deleted, the winner's is recomputed, and the loser row is marked merged with merged_to_entity_id and merged_at set. The loser row stays so historical observations and relationships still resolve.
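The four merge steps can be sketched against an in-memory store. Everything in this code is hypothetical: the `Store` shape, the `mergeEntities` signature (a TypeScript stand-in for `merge_entities(from_id, to_id)`), the simplified snapshot recompute, and the guards that refuse self-merges and re-merging an already-merged row. A database implementation would presumably run the steps in a single transaction. The `entity_merges` audit table is not modeled.

```typescript
interface EntityRow {
  id: string;
  merged_to_entity_id: string | null;
  merged_at: Date | null;
}

interface Observation {
  id: string;
  entity_id: string;
  field: string;
  value: unknown;
}

// Hypothetical in-memory stand-in for the entities, observations,
// and entity_snapshots tables.
interface Store {
  entities: Map<string, EntityRow>;
  observations: Observation[];
  snapshots: Map<string, Record<string, unknown>>;
}

// Simplified recompute: for each field, the last observation in array order wins.
function recomputeSnapshot(store: Store, entityId: string): void {
  const snapshot: Record<string, unknown> = {};
  for (const o of store.observations) {
    if (o.entity_id === entityId) snapshot[o.field] = o.value;
  }
  store.snapshots.set(entityId, snapshot);
}

// Sketch of merge_entities(from_id, to_id): repoint, drop, recompute, mark.
function mergeEntities(store: Store, fromId: string, toId: string, now: Date = new Date()): void {
  if (fromId === toId) throw new Error("cannot merge an entity into itself");
  const from = store.entities.get(fromId);
  const to = store.entities.get(toId);
  if (from === undefined || to === undefined) throw new Error("unknown entity id");
  if (from.merged_to_entity_id !== null || to.merged_to_entity_id !== null) {
    throw new Error("entity already merged");
  }
  // 1. Rewrite the loser's observations to point at the winner.
  for (const o of store.observations) {
    if (o.entity_id === fromId) o.entity_id = toId;
  }
  // 2. Delete the loser's snapshot.
  store.snapshots.delete(fromId);
  // 3. Recompute the winner's snapshot from its now larger set of observations.
  recomputeSnapshot(store, toId);
  // 4. Keep the loser row but mark it merged, so old references still resolve.
  from.merged_to_entity_id = toId;
  from.merged_at = now;
}
```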

User isolation

RLS on entities filters by user_id. Identity is per-user: two users can independently have an entity with the same canonical name without colliding. All reads from MCP, HTTP, and CLI go through this filter, and merged entities are excluded from default queries.
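The effect of the default read path can be mirrored in application code as a filter over rows. This is only an illustration of the resulting semantics, under assumed row shapes. In Postgres the `user_id` check is enforced by RLS itself, and `defaultEntityQuery` and its `includeMerged` flag are hypothetical names.

```typescript
interface EntityRow {
  id: string;
  user_id: string;
  canonical_name: string;
  merged_to_entity_id: string | null;
}

// Mirrors the default read path: only the caller's rows and, unless asked
// otherwise, only rows that have not been merged into another entity.
// includeMerged is a hypothetical escape hatch for history views.
function defaultEntityQuery(rows: EntityRow[], userId: string, includeMerged = false): EntityRow[] {
  return rows.filter(
    (r) => r.user_id === userId && (includeMerged || r.merged_to_entity_id === null),
  );
}
```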

Invariants

Every entity satisfies the following constraints:

MUST

Carry a non-null entity_type, canonical_name, and user_id
Have a deterministic, hash-derived id so re-resolving the same identity returns the same row
Be the foreign-key target for every observation, relationship, and timeline event for that identity
Be repaired via merge, never by destructive deletion; merged rows stay so history still resolves
Pass attribution policy enforcement before write

MUST NOT

Be edited destructively after creation; aliases and metadata are additive
Carry the rich, merged truth view (that lives in entity_snapshots)
Be deduplicated across user boundaries; identity is per-user
Be hard-deleted as a routine merge step; the merged-from row stays for provenance
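Several of these rules can be checked mechanically before an update is written. The hypothetical guard below makes three assumptions. First, it treats the identity columns as immutable, since changing them would no longer match the hash-derived id. Second, it requires those columns to be non-empty. Third, it reads "additive" strictly: existing aliases must survive, and existing metadata keys may neither disappear nor change value. Attribution policy enforcement is not modeled, and `checkEntityUpdate` is an invented name.

```typescript
interface EntityRow {
  id: string;
  entity_type: string;
  canonical_name: string;
  user_id: string;
  aliases: string[];
  metadata: Record<string, unknown>;
}

const IDENTITY_COLUMNS = ["id", "entity_type", "canonical_name", "user_id"] as const;

// Hypothetical pre-write guard. Returns human-readable violations;
// an empty array means the update may proceed.
function checkEntityUpdate(before: EntityRow, after: EntityRow): string[] {
  const violations: string[] = [];
  // Identity columns must be present and unchanged.
  for (const col of IDENTITY_COLUMNS) {
    if (!after[col]) violations.push(`${col} must be non-empty`);
    else if (after[col] !== before[col]) violations.push(`${col} is immutable`);
  }
  // Aliases are additive: every existing alias must still be present.
  for (const alias of before.aliases) {
    if (!after.aliases.includes(alias)) violations.push(`alias removed: ${alias}`);
  }
  // Metadata is additive: existing keys keep their values; new keys are fine.
  for (const [key, value] of Object.entries(before.metadata)) {
    if (!(key in after.metadata)) {
      violations.push(`metadata key removed: ${key}`);
    } else if (JSON.stringify(after.metadata[key]) !== JSON.stringify(value)) {
      violations.push(`metadata key overwritten: ${key}`);
    }
  }
  return violations;
}
```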

Entities subsystem doc: identity, aliases, merge tracking, RLS
Entity snapshots: reducer output that composes observations into the entity's current truth
Observations: granular facts that describe entities
Relationships: typed graph edges between entities
Entity merge: detailed merge mechanics and the entity_merges audit table
Schema: authoritative DDL for entities

Where to go next

All primitive record types: index of sources, interpretations, observations, relationships, and timeline events
Architecture: how the primitives compose into Neotoma's deterministic state
Terminology: canonical glossary of terms used across Neotoma docs