Interpretations
An interpretation is a versioned attempt to extract structured information from a single source. It exists as a first-class record so the system can audit how data was extracted, reinterpret without rewriting history, and track extraction quality over time. Structured agent writes (already-structured payloads) skip interpretations entirely.
The pipeline runs Source → Interpretation → Observation → Snapshot. Interpretations are the second layer: they record which model, prompt, and schema version produced which observations.
Schema
interpretations table (Postgres / hosted)
| Field | Type | Purpose |
|---|---|---|
| id | UUID | Referenced by observations.interpretation_id for full provenance |
| source_id | UUID | The source that was interpreted |
| interpretation_config | JSONB | Audit log of model, model_version, extractor_type, prompt_version, temperature, schema_version |
| status | TEXT | State machine: pending → running → completed or failed |
| confidence | NUMERIC(3,2) | Aggregate model self-reported confidence in [0.00, 1.00]; advisory, not authoritative |
| unknown_field_count | INTEGER | Count of extracted fields that did not match the active schema and were routed to raw_fragments |
| extraction_completeness | TEXT | complete / partial / unknown; coverage signal for the source |
| archived_at | TIMESTAMPTZ | Set when a newer interpretation supersedes this one; the row stays queryable |
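For readers working in application code, the table above can be mirrored as a type. This is a minimal sketch, not the actual definition in the Neotoma codebase: the interface names, the choice of string for UUID and timestamp columns, the nullability of the quality fields, and the isValidConfidence helper are all assumptions made for illustration.

```typescript
// Hypothetical TypeScript mirror of the interpretations table.
// Column names come from the table; type mappings and nullability are assumptions.
type InterpretationStatus = "pending" | "running" | "completed" | "failed";
type ExtractionCompleteness = "complete" | "partial" | "unknown";

interface InterpretationConfig {
  model: string;
  model_version: string;
  extractor_type: string;
  prompt_version: string;
  temperature: number;
  schema_version: string;
}

interface InterpretationRow {
  id: string; // UUID
  source_id: string; // UUID
  interpretation_config: InterpretationConfig; // JSONB
  status: InterpretationStatus;
  confidence: number | null; // NUMERIC(3,2), assumed null until a terminal state
  unknown_field_count: number | null;
  extraction_completeness: ExtractionCompleteness | null;
  archived_at: string | null; // TIMESTAMPTZ as an ISO 8601 string
}

// Checks that a value fits the documented range [0.00, 1.00]
// with at most two decimal places, as NUMERIC(3,2) would store it.
function isValidConfidence(value: number): boolean {
  return value >= 0 && value <= 1 && Math.round(value * 100) / 100 === value;
}
```

A check like isValidConfidence only guards the stored shape of the value; it says nothing about whether the model's self-reported confidence is trustworthy.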
interpretation_config is an audit log, not a replay contract
interpretation_config captures the model, prompt, extractor type, and schema version active at run start. Re-running with the same config can still produce different outputs: hosted model weights can change behind the same model name, sampling and other inference-time conditions can change which tokens are generated, and tools the extractor calls may themselves be non-deterministic. What Neotoma guarantees is that whichever output was produced is permanently linked to the config that produced it.
Status state machine
pending (created, not started) → running (started_at set) → completed | failed (terminal). Terminal states never transition back; reruns create new rows. confidence, unknown_field_count, extraction_completeness, and completed_at are written on the terminal transition.
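The lifecycle above can be sketched as a small transition table. This is an illustrative sketch only: the names ALLOWED_TRANSITIONS, canTransition, and transition are hypothetical, and the table encodes exactly the arrows written above (for example, it assumes pending cannot move directly to failed, which the text does not state either way).

```typescript
type Status = "pending" | "running" | "completed" | "failed";

// Allowed next states for each status; terminal states have none.
const ALLOWED_TRANSITIONS: Record<Status, readonly Status[]> = {
  pending: ["running"],
  running: ["completed", "failed"],
  completed: [],
  failed: [],
};

function canTransition(from: Status, to: Status): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}

// Returns the new status, or throws if the move is not allowed.
// A rerun is modelled by creating a new row, never by leaving a terminal state.
function transition(from: Status, to: Status): Status {
  if (!canTransition(from, to)) {
    throw new Error(`illegal transition: ${from} -> ${to}`);
  }
  return to;
}
```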
Reinterpretation creates new rows, never mutates
Reinterpretation always creates a new interpretation and new observations. The prior interpretation has archived_at set, but its observations remain queryable in observation history. The reducer chooses between competing observations using source_priority, specificity_score, and observed_at; corrections (priority 1000) always win.
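The append-only behaviour described above can be sketched over an in-memory list of rows. This is not the code in src/services/interpretation.ts: the reinterpret function here, its parameters, and the decision to set archived_at at the moment the new row is created are assumptions made for illustration.

```typescript
interface Interpretation {
  id: string;
  source_id: string;
  interpretation_config: Record<string, unknown>;
  status: "pending" | "running" | "completed" | "failed";
  archived_at: string | null;
}

// Hypothetical in-memory reinterpretation: prior rows for the source are kept
// and marked archived; a fresh pending row is appended. Nothing is deleted.
function reinterpret(
  rows: readonly Interpretation[],
  sourceId: string,
  config: Record<string, unknown>,
  newId: string,
  now: string,
): Interpretation[] {
  const archived = rows.map((row) =>
    row.source_id === sourceId && row.archived_at === null
      ? { ...row, archived_at: now }
      : row,
  );
  const fresh: Interpretation = {
    id: newId,
    source_id: sourceId,
    interpretation_config: config,
    status: "pending",
    archived_at: null,
  };
  return [...archived, fresh];
}
```

Because the function returns a new array and copies any row it archives, the caller's original rows are left untouched, which mirrors the "never mutates" rule at the level of this sketch.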
Quality signals
unknown_field_count flags schema drift: sustained spikes mean the schema is missing real-world fields and should be evolved via update_schema_incremental. extraction_completeness (complete/partial/unknown) is set by the extractor at run end. confidence is advisory only; the reducer MUST NOT use it for merge decisions.
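One way to turn "sustained spikes" into a check is to look at the most recent runs for a source type and ask whether every one of them exceeded a threshold. The text does not define what counts as sustained, so the hasSustainedDrift helper below, its threshold, and its window size are hypothetical choices, not Neotoma behaviour.

```typescript
// Hypothetical drift check: true only when each of the last `window`
// unknown_field_count values is above `threshold`. A single spike is ignored.
function hasSustainedDrift(
  counts: readonly number[],
  threshold: number,
  window: number,
): boolean {
  if (window <= 0 || counts.length < window) {
    return false;
  }
  return counts.slice(-window).every((count) => count > threshold);
}
```

Requiring every run in the window to exceed the threshold keeps a single noisy document from triggering a schema change; a rolling average would be a reasonable alternative.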
Invariants
Every interpretation satisfies the following constraints:
MUST
- Carry a non-null source_id, interpretation_config, and user_id
- Capture model / extractor / prompt / schema version in interpretation_config at run start
- Be immutable in identifying fields after write; only status, timing, quality fields, and archived_at may change
- Pass attribution policy enforcement before write
MUST NOT
- Be mutated in a way that retroactively changes which observations a row produced
- Be hard-deleted (use archived_at) outside of explicit user-initiated source deletion
- Be created without a corresponding sources row
- Be assumed deterministic for replay; only audit-log linkage to config is guaranteed
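Two of the invariants above lend themselves to a mechanical check: required fields at creation, and which fields may change afterwards. The sketch below is illustrative only. The names validateDraft and illegalUpdates are hypothetical, and the mapping of "timing" and "quality" to started_at, completed_at, confidence, unknown_field_count, and extraction_completeness is inferred from the rest of this page rather than stated as a list.

```typescript
interface InterpretationDraft {
  source_id?: string | null;
  user_id?: string | null;
  interpretation_config?: Record<string, unknown> | null;
}

// Config keys assumed to be required, based on the MUST list above.
const REQUIRED_CONFIG_KEYS = [
  "model",
  "extractor_type",
  "prompt_version",
  "schema_version",
] as const;

// Returns a list of violations; an empty list means the draft may be written.
function validateDraft(draft: InterpretationDraft): string[] {
  const errors: string[] = [];
  if (!draft.source_id) errors.push("source_id is required");
  if (!draft.user_id) errors.push("user_id is required");
  const config = draft.interpretation_config;
  if (!config) {
    errors.push("interpretation_config is required");
  } else {
    for (const key of REQUIRED_CONFIG_KEYS) {
      if (config[key] === undefined || config[key] === null) {
        errors.push(`interpretation_config.${key} is required`);
      }
    }
  }
  return errors;
}

// Fields assumed to be mutable after write (status, timing, quality, archived_at).
const MUTABLE_FIELDS = new Set([
  "status",
  "started_at",
  "completed_at",
  "confidence",
  "unknown_field_count",
  "extraction_completeness",
  "archived_at",
]);

// Returns the keys in an update patch that touch identifying fields.
function illegalUpdates(patch: Record<string, unknown>): string[] {
  return Object.keys(patch).filter((key) => !MUTABLE_FIELDS.has(key));
}
```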
Related
- Interpretations subsystem doc: full schema, status lifecycle, quality signals
- Sources: raw artifact every interpretation points back to
- Observations: granular facts produced by completed interpretations
- MCP spec, reinterpret: the reinterpret(source_id, interpretation_config?) tool
- Implementation: src/services/interpretation.ts, create / status transitions
Where to go next
- All primitive record types: index of sources, interpretations, observations, relationships, and timeline events
- Architecture: how the primitives compose into Neotoma's deterministic state
- Terminology: canonical glossary of terms used across Neotoma docs