Observations
An observation is a granular, source-specific statement about an entity at a point in time. Observations are the only thing the reducer reads to compute an entity snapshot, and they are the only place ground truth lives. Snapshots are derived; observations are durable.
Source → Interpretation → Observation → Snapshot. Observations are the third layer, the immutable atoms of truth. Reducers merge them deterministically into snapshots.
Schema
Observation shape (conceptual)
| Field | Type | Purpose |
|---|---|---|
| entity_id | string | The entity this observation is about (resolved during ingestion, user-scoped) |
| schema_version | string | Active entity schema version at extraction time |
| source_id | string | What raw content produced this observation |
| interpretation_id | string \| null | Which extraction run produced this; NULL for structured store_structured writes |
| observed_at | Date | When this observation was made (or extracted from the source) |
| specificity_score | number | Reducer tie-break: how specific this observation is for its fields |
| source_priority | number | 0 (AI) / 100 (structured agent) / 1000 (user correction). Corrections always win. |
| fields | object | The actual granular facts, whatever the schema admits at this version |
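As a rough illustration, the conceptual shape in the table could be written as a TypeScript interface. This is only a sketch: the field names and types come from the table, while the example identifiers and values are invented and the real storage types may differ.

```typescript
// Conceptual shape only; field names and types follow the table above.
interface Observation {
  entity_id: string;
  schema_version: string;
  source_id: string;
  interpretation_id: string | null; // null for structured writes
  observed_at: Date;
  specificity_score: number;
  source_priority: number; // 0 (AI), 100 (structured agent), 1000 (user correction)
  fields: Record<string, unknown>;
}

// Hypothetical example: every identifier and value below is invented.
const example: Observation = {
  entity_id: "ent_123",
  schema_version: "1.0",
  source_id: "src_abc",
  interpretation_id: null, // structured write, so no extraction run
  observed_at: new Date("2024-01-15T10:00:00Z"),
  specificity_score: 0.8,
  source_priority: 100,
  fields: { name: "Acme Corp" },
};
```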
The three-layer truth model
Source carries raw bytes. Interpretation carries the audit trail of how those bytes were read. Observations carry granular facts that link back to both. Snapshots are deterministically composed from all observations for an entity by the reducer. This separation is what lets Neotoma reinterpret without rewriting history and replay snapshots at any point in time.
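One way to picture point-in-time replay is to hide later observations from the reducer and recompute. The sketch below assumes replay filters on observed_at, which the text does not confirm; snapshotAsOf and the stand-in lastWriteWins reducer are hypothetical names used for illustration, not Neotoma's API.

```typescript
interface Observation {
  observed_at: Date;
  fields: Record<string, unknown>;
}

type Snapshot = Record<string, unknown>;
type Reducer = (observations: readonly Observation[]) => Snapshot;

// Replays a snapshot "as of" a timestamp by passing the reducer only the
// observations made at or before that time. Assumption: observed_at is the
// replay axis.
function snapshotAsOf(
  observations: readonly Observation[],
  asOf: Date,
  reduce: Reducer
): Snapshot {
  const visible = observations.filter(
    (o) => o.observed_at.getTime() <= asOf.getTime()
  );
  return reduce(visible);
}

// Stand-in reducer for this demo only: later observations overwrite earlier
// ones. The real reducer also considers priority and specificity.
const lastWriteWins: Reducer = (observations) => {
  const sorted = observations
    .slice()
    .sort((a, b) => a.observed_at.getTime() - b.observed_at.getTime());
  return sorted.reduce<Snapshot>((acc, o) => ({ ...acc, ...o.fields }), {});
};

// Invented history for one entity.
const history: Observation[] = [
  { observed_at: new Date("2024-01-01T00:00:00Z"), fields: { status: "draft" } },
  { observed_at: new Date("2024-03-01T00:00:00Z"), fields: { status: "published" } },
];

const february = snapshotAsOf(history, new Date("2024-02-01T00:00:00Z"), lastWriteWins);
const april = snapshotAsOf(history, new Date("2024-04-01T00:00:00Z"), lastWriteWins);
```

Because the observations themselves are never edited, both snapshots can be recomputed from the same history at any later date.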
Immutability and provenance
Observations are never mutated. Every observation links to its source_id and interpretation_id, so any field on any snapshot can be traced back to the exact bytes and extractor that produced it. Reinterpretation, correction, and re-ingest always produce new observations, so the audit trail grows monotonically.
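The append-only behaviour can be sketched with a small in-memory log. ObservationLog is a hypothetical class for illustration only; the actual store is a database, and Object.freeze here is shallow, so it merely approximates immutability.

```typescript
interface Observation {
  entity_id: string;
  source_id: string;
  interpretation_id: string | null;
  fields: Record<string, unknown>;
}

// Hypothetical append-only log: entries can be added but never edited.
class ObservationLog {
  private readonly entries: Readonly<Observation>[] = [];

  // Stores a frozen copy so callers cannot mutate the stored observation.
  // The freeze is shallow: nested values inside fields are not frozen.
  append(obs: Observation): Readonly<Observation> {
    const frozen = Object.freeze({
      ...obs,
      fields: Object.freeze({ ...obs.fields }),
    });
    this.entries.push(frozen);
    return frozen;
  }

  // Returns a new array holding every observation recorded for the entity.
  forEntity(entityId: string): Readonly<Observation>[] {
    return this.entries.filter((o) => o.entity_id === entityId);
  }
}

const log = new ObservationLog();

// Invented identifiers: a first extraction run, then a reinterpretation.
const first = log.append({
  entity_id: "ent_1",
  source_id: "src_1",
  interpretation_id: "int_1",
  fields: { name: "Acme" },
});

// Reinterpreting the same source appends a new observation instead of
// editing the first one.
const reinterpreted = log.append({
  entity_id: "ent_1",
  source_id: "src_1",
  interpretation_id: "int_2",
  fields: { name: "Acme Corp" },
});
```

Both entries keep their source_id and interpretation_id, so either value of name can still be traced to the run that produced it.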
Source priority and merging
When two observations disagree about the same field, the reducer picks the winner by comparing (source_priority, specificity_score, observed_at) in that order. User corrections write at priority 1000 and always win, structured agent writes at 100, and AI interpretations at 0. The ordering is deterministic and surfaceable: the Inspector shows which observation produced each snapshot field.
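A minimal sketch of this per-field merge follows. It assumes that the higher value wins for each key of the tuple and that a later observed_at beats an earlier one; neither direction is spelled out above. compareObservations and reduceSnapshot are hypothetical names, not the actual reducer.

```typescript
interface Observation {
  source_priority: number;
  specificity_score: number;
  observed_at: Date;
  fields: Record<string, unknown>;
}

// Compares (source_priority, specificity_score, observed_at) in order.
// Assumption: higher wins for every key, so a positive result means a wins.
function compareObservations(a: Observation, b: Observation): number {
  return (
    a.source_priority - b.source_priority ||
    a.specificity_score - b.specificity_score ||
    a.observed_at.getTime() - b.observed_at.getTime()
  );
}

// Picks a winning observation for each field, then copies its value.
// On a full three-way tie the earlier-listed observation is kept; a real
// reducer would need one more stable tie-break to stay order-independent.
function reduceSnapshot(observations: Observation[]): Record<string, unknown> {
  const winners: Record<string, Observation> = {};
  for (const obs of observations) {
    for (const key of Object.keys(obs.fields)) {
      const current = winners[key];
      if (current === undefined || compareObservations(obs, current) > 0) {
        winners[key] = obs;
      }
    }
  }
  const snapshot: Record<string, unknown> = {};
  for (const key of Object.keys(winners)) {
    snapshot[key] = winners[key].fields[key];
  }
  return snapshot;
}

// Invented data: a recent AI reading versus an older user correction.
const aiReading: Observation = {
  source_priority: 0,
  specificity_score: 0.9,
  observed_at: new Date("2024-06-01T00:00:00Z"),
  fields: { name: "Acme", city: "Berlin" },
};
const userCorrection: Observation = {
  source_priority: 1000,
  specificity_score: 0.1,
  observed_at: new Date("2024-01-01T00:00:00Z"),
  fields: { name: "Acme Corp" },
};

const merged = reduceSnapshot([aiReading, userCorrection]);
```

In this example the correction wins name despite being older and less specific, while city survives from the AI reading because no other observation covers it.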
Where observations come from
Three writers create observations: structured store_structured calls (source_priority 100, interpretation_id NULL), AI interpretation pipelines on completion (source_priority 0, interpretation_id set), and explicit user corrections via correct() (source_priority 1000). All writes flow through service_role on the MCP server.
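The three writers can be summarised as a lookup from writer kind to the provenance values described above. This is a sketch: the Writer labels and provenanceFor are hypothetical, and since the text does not say what interpretation_id a user correction carries, the value is passed through unchanged for that case.

```typescript
// Hypothetical labels for the three writers described above.
type Writer = "ai_interpretation" | "structured" | "user_correction";

// Priorities as stated in the text.
const SOURCE_PRIORITY: Record<Writer, number> = {
  ai_interpretation: 0,
  structured: 100,
  user_correction: 1000,
};

interface Provenance {
  source_priority: number;
  interpretation_id: string | null;
}

// Derives the provenance fields for a new observation and rejects the
// combinations the text rules out.
function provenanceFor(writer: Writer, interpretationId: string | null): Provenance {
  if (writer === "structured" && interpretationId !== null) {
    throw new Error("structured writes must have a null interpretation_id");
  }
  if (writer === "ai_interpretation" && interpretationId === null) {
    throw new Error("AI interpretation writes must set interpretation_id");
  }
  // Assumption: user corrections pass interpretation_id through unchanged.
  return {
    source_priority: SOURCE_PRIORITY[writer],
    interpretation_id: interpretationId,
  };
}

// Invented run id for the AI case.
const structuredWrite = provenanceFor("structured", null);
const aiWrite = provenanceFor("ai_interpretation", "int_1");
const correctionWrite = provenanceFor("user_correction", null);
```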
Invariants
Every observation satisfies the following constraints:
MUST
- Link to a non-null source_id and a resolved entity_id
- Be immutable: corrections and reinterpretations create new observations, not edits
- Carry a source_priority that determines reducer tie-breaks deterministically
- Pass attribution policy enforcement before write
MUST NOT
- Be mutated after creation
- Be created without a corresponding sources row (structured writes still own a synthetic source)
- Use confidence directly in reducer merge logic; that role is reserved for source_priority and specificity_score
- Be returned across user boundaries; every read filters through source ownership
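Several of these constraints lend themselves to a pre-write check and a read filter. The sketch below is illustrative only: WriteContext, findViolations, readableBy, and the attributionAllowed hook are hypothetical names, and the real enforcement presumably lives in the database and the MCP server rather than in application code like this.

```typescript
interface Observation {
  entity_id: string;
  source_id: string;
  source_priority: number;
  fields: Record<string, unknown>;
}

// Hypothetical context: known sources rows and an attribution policy hook.
interface WriteContext {
  sourceOwners: { [sourceId: string]: string | undefined }; // source_id -> owner
  attributionAllowed: (obs: Observation) => boolean;
}

// Returns every invariant the observation would violate; empty means writable.
function findViolations(obs: Observation, ctx: WriteContext): string[] {
  const problems: string[] = [];
  if (!obs.source_id) {
    problems.push("missing source_id");
  } else if (ctx.sourceOwners[obs.source_id] === undefined) {
    problems.push("no sources row");
  }
  if (!obs.entity_id) {
    problems.push("unresolved entity_id");
  }
  if (typeof obs.source_priority !== "number" || !isFinite(obs.source_priority)) {
    problems.push("missing source_priority");
  }
  if (!ctx.attributionAllowed(obs)) {
    problems.push("attribution policy rejected");
  }
  return problems;
}

// Read filter: only observations whose source belongs to the user are returned.
function readableBy(userId: string, observations: Observation[], ctx: WriteContext): Observation[] {
  return observations.filter((o) => ctx.sourceOwners[o.source_id] === userId);
}

// Invented data: two users, each owning one source.
const ctx: WriteContext = {
  sourceOwners: { src_a: "user_a", src_b: "user_b" },
  attributionAllowed: () => true,
};
const obsA: Observation = { entity_id: "ent_1", source_id: "src_a", source_priority: 100, fields: {} };
const obsB: Observation = { entity_id: "ent_2", source_id: "src_b", source_priority: 0, fields: {} };
const orphan: Observation = { entity_id: "ent_3", source_id: "src_missing", source_priority: 0, fields: {} };
```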
Related
- Observation architecture: three-layer model, lifecycle, snapshot computation, provenance
- Sources: raw artifact each observation links back to
- Interpretations: extraction run each AI-produced observation belongs to
- Relationships: edges follow the same observation-snapshot pattern
- Versioned history: why immutable observations matter for agent memory
Where to go next
- All primitive record types: index of sources, interpretations, observations, relationships, and timeline events
- Architecture: how the primitives compose into Neotoma's deterministic state
- Terminology: canonical glossary of terms used across Neotoma docs