Interpretations

An interpretation is a versioned attempt to extract structured information from a single source. It exists as a first-class record so the system can audit how data was extracted, reinterpret without rewriting history, and track extraction quality over time. Structured agent writes (already-structured payloads) skip interpretations entirely.

Source → Interpretation → Observation → Snapshot. Interpretations are the second layer: they record which model, prompt, and schema version produced which observations.

Schema

interpretations table (Postgres / hosted)

```sql
CREATE TABLE interpretations (
  id                      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  source_id               UUID NOT NULL REFERENCES sources(id),
  interpretation_config   JSONB NOT NULL,
  status                  TEXT NOT NULL DEFAULT 'pending',
  error_message           TEXT,
  extracted_entities      JSONB DEFAULT '[]',
  confidence              NUMERIC(3,2),
  unknown_field_count     INTEGER NOT NULL DEFAULT 0,
  extraction_completeness TEXT DEFAULT 'unknown',
  started_at              TIMESTAMPTZ,
  completed_at            TIMESTAMPTZ,
  created_at              TIMESTAMPTZ DEFAULT NOW(),
  archived_at             TIMESTAMPTZ,
  user_id                 UUID NOT NULL
);
```
| Field | Type | Purpose |
| --- | --- | --- |
| id | UUID | Referenced by observations.interpretation_id for full provenance |
| source_id | UUID | The source that was interpreted |
| interpretation_config | JSONB | Audit log of model, model_version, extractor_type, prompt_version, temperature, schema_version |
| status | TEXT | State machine: pending → running → completed \| failed |
| confidence | NUMERIC(3,2) | Aggregate model self-reported confidence in [0.00, 1.00]; advisory, not authoritative |
| unknown_field_count | INTEGER | Count of extracted fields that did not match the active schema and were routed to raw_fragments |
| extraction_completeness | TEXT | complete / partial / unknown; coverage signal for the source |
| archived_at | TIMESTAMPTZ | Set when a newer interpretation supersedes this one; the row stays queryable |

interpretation_config is an audit log, not a replay contract

interpretation_config captures the model, prompt, extractor type, and schema version active at run start. Re-running with the same config can produce different outputs because LLM weights drift, network conditions affect tokenisation, and tools the extractor calls may themselves be non-deterministic. What Neotoma guarantees is that whichever output happened is permanently linked to the config that produced it.
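As a sketch, an interpretation_config payload might look like the following. The field names come from the schema table above; the concrete values are illustrative assumptions, not Neotoma's actual defaults:

```typescript
// Shape of interpretation_config as described above.
// All values below are hypothetical examples.
interface InterpretationConfig {
  model: string;
  model_version: string;
  extractor_type: string;
  prompt_version: string;
  temperature: number;
  schema_version: string;
}

// Captured once at run start and never rewritten, so whatever output
// the run produced stays permanently linked to this config.
const exampleConfig: InterpretationConfig = {
  model: "example-llm",       // hypothetical model name
  model_version: "2025-01",   // hypothetical version tag
  extractor_type: "document", // hypothetical extractor type
  prompt_version: "v3",
  temperature: 0.0,
  schema_version: "7",
};
```

Even with temperature pinned to 0.0, the audit-log framing above still applies: the config explains the run, it does not promise to reproduce it.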

Status state machine

pending (created, not started) → running (started_at set) → completed | failed (terminal). Terminal states never transition back; reruns create new rows. confidence, unknown_field_count, extraction_completeness, and completed_at are written on the terminal transition.
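The transition rules above can be sketched as a small lookup table. This is a minimal illustration of the state machine, not Neotoma's implementation; the function name is mine:

```typescript
type Status = "pending" | "running" | "completed" | "failed";

// Allowed transitions: pending → running → completed | failed.
// Terminal states map to an empty set, so reruns must create new rows.
const transitions: Record<Status, Status[]> = {
  pending: ["running"],
  running: ["completed", "failed"],
  completed: [],
  failed: [],
};

function canTransition(from: Status, to: Status): boolean {
  return transitions[from].includes(to);
}
```

Because completed and failed map to empty arrays, no edit to an existing row can ever move it out of a terminal state; the only way forward is a new interpretation row.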

Reinterpretation creates new rows, never mutates

Reinterpretation always creates a new interpretation and new observations. The prior interpretation gets archived_at marked but its observations remain queryable in observation history. The reducer chooses between competing observations using source_priority, specificity_score, and observed_at; corrections (priority 1000) always win.
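A minimal sketch of that tie-break order, assuming the reducer checks corrections first, then source_priority, then specificity_score, then observed_at (the row shape and function name here are my assumptions):

```typescript
interface Observation {
  source_priority: number; // corrections use 1000
  specificity_score: number;
  observed_at: string; // ISO 8601 timestamp
}

const CORRECTION_PRIORITY = 1000;

// Picks the winning observation between two candidates.
// Corrections always win; otherwise higher source_priority,
// then higher specificity_score, then the most recent observed_at.
function pickWinner(a: Observation, b: Observation): Observation {
  if (a.source_priority === CORRECTION_PRIORITY) return a;
  if (b.source_priority === CORRECTION_PRIORITY) return b;
  if (a.source_priority !== b.source_priority)
    return a.source_priority > b.source_priority ? a : b;
  if (a.specificity_score !== b.specificity_score)
    return a.specificity_score > b.specificity_score ? a : b;
  // ISO timestamps compare correctly as strings.
  return a.observed_at >= b.observed_at ? a : b;
}
```

Note that the archived interpretation's observations still flow into this comparison; archiving only marks supersession, it never removes candidates from history.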

Quality signals

unknown_field_count flags schema drift: sustained spikes mean the schema is missing real-world fields and should be evolved via update_schema_incremental. extraction_completeness (complete / partial / unknown) is set by the extractor at run end. confidence is advisory only; the reducer MUST NOT use it for merge decisions.
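One way to detect a sustained spike rather than a one-off outlier is to require every interpretation in a recent window to exceed a threshold. This is a hedged sketch; the window and threshold are arbitrary example choices, not documented Neotoma values:

```typescript
// Flags likely schema drift when every interpretation in a recent
// window routed at least `threshold` fields to raw_fragments.
// A single noisy source won't trip it; a sustained run will.
function schemaDriftSuspected(
  recentUnknownCounts: number[], // unknown_field_count per run, newest last
  threshold: number = 3, // arbitrary example threshold
): boolean {
  return (
    recentUnknownCounts.length > 0 &&
    recentUnknownCounts.every((n) => n >= threshold)
  );
}
```

A signal like this would prompt a schema evolution via update_schema_incremental rather than any change to the interpretations themselves.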

Invariants

Every interpretation satisfies the following constraints:

MUST

  • Carry a non-null source_id, interpretation_config, and user_id
  • Capture model / extractor / prompt / schema version in interpretation_config at run start
  • Be immutable in identifying fields after write; only status, timing, quality, and archived_at may change
  • Pass attribution policy enforcement before write

MUST NOT

  • Be mutated in a way that retroactively changes which observations a row produced
  • Be hard-deleted (use archived_at) outside of explicit user-initiated source deletion
  • Be created without a corresponding sources row
  • Be assumed deterministic for replay, only audit-log linkage to config is guaranteed
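The non-null MUST rules could be checked before write along these lines. A sketch only: the row shape and function name are assumptions, not Neotoma's API:

```typescript
interface InterpretationRow {
  source_id: string | null;
  interpretation_config: object | null;
  user_id: string | null;
}

// Returns the names of required fields that are missing, per the
// MUST invariants above; an empty result means the row may proceed
// to attribution policy enforcement.
function missingRequiredFields(row: InterpretationRow): string[] {
  const missing: string[] = [];
  if (!row.source_id) missing.push("source_id");
  if (!row.interpretation_config) missing.push("interpretation_config");
  if (!row.user_id) missing.push("user_id");
  return missing;
}
```

In practice the same constraints are also enforced at the database layer by the NOT NULL and REFERENCES clauses in the schema above.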

Where to go next

  • All primitive record types: index of sources, interpretations, observations, relationships, and timeline events
  • Architecture: how the primitives compose into Neotoma's deterministic state
  • Terminology: canonical glossary of terms used across Neotoma docs