Interpretations

An interpretation is a versioned attempt to extract structured information from a single source. It exists as a first-class record so the system can audit how data was extracted, reinterpret without rewriting history, and track extraction quality over time. Structured agent writes (already-structured payloads) skip interpretations entirely.

Source → Interpretation → Observation → Snapshot. Interpretations are the second layer: they record which model, prompt, and schema version produced which observations.

Schema

interpretations table (Postgres / hosted)

```sql
CREATE TABLE interpretations (
  id                      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  source_id               UUID NOT NULL REFERENCES sources(id),
  interpretation_config   JSONB NOT NULL,
  status                  TEXT NOT NULL DEFAULT 'pending',
  error_message           TEXT,
  extracted_entities      JSONB DEFAULT '[]',
  confidence              NUMERIC(3,2),
  unknown_field_count     INTEGER NOT NULL DEFAULT 0,
  extraction_completeness TEXT DEFAULT 'unknown',
  started_at              TIMESTAMPTZ,
  completed_at            TIMESTAMPTZ,
  created_at              TIMESTAMPTZ DEFAULT NOW(),
  archived_at             TIMESTAMPTZ,
  user_id                 UUID NOT NULL
);
```
| Field | Type | Purpose |
| --- | --- | --- |
| id | UUID | Referenced by observations.interpretation_id for full provenance |
| source_id | UUID | The source that was interpreted |
| interpretation_config | JSONB | Audit log of model, model_version, extractor_type, prompt_version, temperature, schema_version |
| status | TEXT | State machine: pending → running → completed \| failed |
| confidence | NUMERIC(3,2) | Aggregate model self-reported confidence in [0.00, 1.00]; advisory, not authoritative |
| unknown_field_count | INTEGER | Count of extracted fields that did not match the active schema and were routed to raw_fragments |
| extraction_completeness | TEXT | complete / partial / unknown; coverage signal for the source |
| archived_at | TIMESTAMPTZ | Set when a newer interpretation supersedes this one; the row stays queryable |

interpretation_config is an audit log, not a replay contract

interpretation_config captures the model, prompt, extractor type, and schema version active at run start. Re-running with the same config can produce different outputs because LLM weights drift, network conditions affect tokenisation, and tools the extractor calls may themselves be non-deterministic. What Neotoma guarantees is that whichever output happened is permanently linked to the config that produced it.
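As a sketch, an interpretation_config payload might look like the following. The field names come from the schema table above; the concrete values are illustrative assumptions, not Neotoma's actual defaults:

```typescript
// Shape of interpretation_config as described above.
// All values below are hypothetical examples.
interface InterpretationConfig {
  model: string;
  model_version: string;
  extractor_type: string;
  prompt_version: string;
  temperature: number;
  schema_version: string;
}

// Captured once at run start and never rewritten, so whatever output
// the run produced stays permanently linked to this config.
const exampleConfig: InterpretationConfig = {
  model: "example-llm",       // hypothetical model name
  model_version: "2025-01",   // hypothetical version tag
  extractor_type: "document", // hypothetical extractor type
  prompt_version: "v3",
  temperature: 0.0,
  schema_version: "7",
};
```

Even with temperature pinned to 0.0, the audit-log framing above still applies: the config explains the run, it does not promise to reproduce it.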

Status state machine

pending (created, not started) → running (started_at set) → completed | failed (terminal). Terminal states never transition back; reruns create new rows. confidence, unknown_field_count, extraction_completeness, and completed_at are written on the terminal transition.
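The transition rules above can be sketched as a small lookup table. This is a minimal illustration of the state machine, not Neotoma's implementation; the function name is mine:

```typescript
type Status = "pending" | "running" | "completed" | "failed";

// Allowed transitions: pending → running → completed | failed.
// Terminal states map to an empty set, so reruns must create new rows.
const transitions: Record<Status, Status[]> = {
  pending: ["running"],
  running: ["completed", "failed"],
  completed: [],
  failed: [],
};

function canTransition(from: Status, to: Status): boolean {
  return transitions[from].includes(to);
}
```

Because completed and failed map to empty arrays, no edit to an existing row can ever move it out of a terminal state; the only way forward is a new interpretation row.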

Reinterpretation creates new rows, never mutates

Reinterpretation always creates a new interpretation and new observations. The prior interpretation gets archived_at marked but its observations remain queryable in observation history. The reducer chooses between competing observations using source_priority, specificity_score, and observed_at; corrections (priority 1000) always win.
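A minimal sketch of that tie-break order, assuming the reducer checks corrections first, then source_priority, then specificity_score, then observed_at (the row shape and function name here are my assumptions):

```typescript
interface Observation {
  source_priority: number; // corrections use 1000
  specificity_score: number;
  observed_at: string; // ISO 8601 timestamp
}

const CORRECTION_PRIORITY = 1000;

// Picks the winning observation between two candidates.
// Corrections always win; otherwise higher source_priority,
// then higher specificity_score, then the most recent observed_at.
function pickWinner(a: Observation, b: Observation): Observation {
  if (a.source_priority === CORRECTION_PRIORITY) return a;
  if (b.source_priority === CORRECTION_PRIORITY) return b;
  if (a.source_priority !== b.source_priority)
    return a.source_priority > b.source_priority ? a : b;
  if (a.specificity_score !== b.specificity_score)
    return a.specificity_score > b.specificity_score ? a : b;
  // ISO timestamps compare correctly as strings.
  return a.observed_at >= b.observed_at ? a : b;
}
```

Note that the archived interpretation's observations still flow into this comparison; archiving only marks supersession, it never removes candidates from history.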

Quality signals

unknown_field_count flags schema drift: sustained spikes mean the schema is missing real-world fields and should be evolved via update_schema_incremental. extraction_completeness (complete / partial / unknown) is set by the extractor at run end. confidence is advisory only; the reducer MUST NOT use it for merge decisions.
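One way to detect a sustained spike rather than a one-off outlier is to require every interpretation in a recent window to exceed a threshold. This is a hedged sketch; the window and threshold are arbitrary example choices, not documented Neotoma values:

```typescript
// Flags likely schema drift when every interpretation in a recent
// window routed at least `threshold` fields to raw_fragments.
// A single noisy source won't trip it; a sustained run will.
function schemaDriftSuspected(
  recentUnknownCounts: number[], // unknown_field_count per run, newest last
  threshold: number = 3, // arbitrary example threshold
): boolean {
  return (
    recentUnknownCounts.length > 0 &&
    recentUnknownCounts.every((n) => n >= threshold)
  );
}
```

A signal like this would prompt a schema evolution via update_schema_incremental rather than any change to the interpretations themselves.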

Invariants

Every interpretation satisfies the following constraints:

MUST

  • Carry a non-null source_id, interpretation_config, and user_id
  • Capture model / extractor / prompt / schema version in interpretation_config at run start
  • Be immutable in identifying fields after write; only status, timing, quality, and archived_at may change
  • Pass attribution policy enforcement before write

MUST NOT

  • Be mutated in a way that retroactively changes which observations a row produced
  • Be hard-deleted (use archived_at) outside of explicit user-initiated source deletion
  • Be created without a corresponding sources row
  • Be assumed deterministic for replay, only audit-log linkage to config is guaranteed
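The non-null MUST rules could be checked before write along these lines. A sketch only: the row shape and function name are assumptions, not Neotoma's API:

```typescript
interface InterpretationRow {
  source_id: string | null;
  interpretation_config: object | null;
  user_id: string | null;
}

// Returns the names of required fields that are missing, per the
// MUST invariants above; an empty result means the row may proceed
// to attribution policy enforcement.
function missingRequiredFields(row: InterpretationRow): string[] {
  const missing: string[] = [];
  if (!row.source_id) missing.push("source_id");
  if (!row.interpretation_config) missing.push("interpretation_config");
  if (!row.user_id) missing.push("user_id");
  return missing;
}
```

In practice the same constraints are also enforced at the database layer by the NOT NULL and REFERENCES clauses in the schema above.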

Where to go next

  • All primitive record types: index of sources, interpretations, observations, relationships, and timeline events
  • Architecture: how the primitives compose into Neotoma's deterministic state
  • Terminology: canonical glossary of terms used across Neotoma docs