<!--
  Full-page Markdown export (rendered HTML → GFM).
  Source: https://neotoma.io/primitives/sources
  Generated: 2026-04-27T12:50:25.427Z
-->
# Sources

A source is the raw, content-addressed artifact that every other primitive ultimately traces back to: a file you uploaded, a webhook payload, a structured agent write. Sources are deduplicated per user by SHA-256 content hash so the same bytes are never stored twice.

Source → Interpretation → Observation → Snapshot. Sources are the leftmost, immutable foundation of the three-layer truth model.

## Schema

sources table (Postgres / hosted):

    CREATE TABLE sources (
      id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
      content_hash TEXT NOT NULL,
      storage_url TEXT NOT NULL,
      storage_status TEXT NOT NULL DEFAULT 'uploaded',
      mime_type TEXT NOT NULL,
      file_name TEXT,
      byte_size INTEGER NOT NULL,
      source_type TEXT NOT NULL,
      source_agent_id TEXT,
      source_metadata JSONB DEFAULT '{}',
      created_at TIMESTAMPTZ DEFAULT NOW(),
      user_id UUID NOT NULL,
      CONSTRAINT unique_content_per_user UNIQUE(content_hash, user_id)
    );

| Field | Type | Purpose |
| --- | --- | --- |
| id | UUID | Stable source identifier referenced by every observation, interpretation, and timeline event derived from it |
| content\_hash | TEXT | SHA-256 of the raw bytes; combined with user\_id it is the deduplication key |
| storage\_url | TEXT | Where the bytes actually live (object storage, local disk, …) |
| storage\_status | TEXT | uploaded / pending / failed; ingestion uses this to gate downstream interpretation |
| mime\_type | TEXT | Used to choose the right interpreter and to render the source back to humans |
| byte\_size | INTEGER | Quota accounting, integrity sanity-check |
| source\_type | TEXT | Classifier (file, http, structured, …) used by the read path and Inspector |
| source\_agent\_id | TEXT | Optional attribution of the writing agent (AAuth tier, clientInfo) |
| source\_metadata | JSONB | Free-form provenance (URL, headers, capture tool, etc.) |
| user\_id | UUID | Owner; combined with content\_hash this enforces per-user dedupe and RLS |

## Per-user content addressing

Two writes of identical bytes by the same user collapse to a single sources row via the unique (content\_hash, user\_id) constraint. Two different users uploading the same bytes get two distinct sources rows: deduplication is intentionally not cross-user so privacy boundaries remain intact and per-user storage accounting stays accurate.
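The per-user collapse described above can be sketched in memory. This is a minimal illustration, not Neotoma's implementation: `SourceStore`, `ingest`, and the hex encoding of the hash are assumptions, and a `Map` keyed by user and hash stands in for the `unique_content_per_user` constraint.

```typescript
import { createHash, randomUUID } from "node:crypto";

// Hypothetical in-memory stand-in for the sources table; only the
// columns needed to show deduplication are modelled.
interface SourceRow {
  id: string;
  contentHash: string;
  userId: string;
  byteSize: number;
}

// SHA-256 of the raw bytes. Hex encoding is an assumption of this sketch.
function contentHash(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

class SourceStore {
  // Keyed by user id plus content hash, mirroring UNIQUE(content_hash, user_id).
  private rows = new Map<string, SourceRow>();

  // Returns the existing row when this user has already stored these bytes.
  ingest(userId: string, bytes: Uint8Array): { row: SourceRow; created: boolean } {
    const hash = contentHash(bytes);
    const key = `${userId}:${hash}`;
    const existing = this.rows.get(key);
    if (existing) return { row: existing, created: false };

    const row: SourceRow = {
      id: randomUUID(),
      contentHash: hash,
      userId,
      byteSize: bytes.byteLength,
    };
    this.rows.set(key, row);
    return { row, created: true };
  }

  count(): number {
    return this.rows.size;
  }
}
```

Ingesting the same bytes twice as one user yields one row, while a second user ingesting those bytes gets a row of their own with the same `contentHash` and a different `id`.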

## Lifecycle

Sources are created by the ingest path, consumed by zero or more interpretations, and (if the user explicitly deletes them) cascade-removed along with their interpretations, observations, and timeline events. Reinterpretation never touches the sources row; it creates a new interpretations row pointing at the same source.
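A rough in-memory model of this lifecycle may help. Everything here is hypothetical (the `Lifecycle` class, its method names, and the `version` counter are assumptions of the sketch), and only interpretations are modelled; observations and timeline events would cascade the same way.

```typescript
// Hypothetical in-memory model; names are illustrative, not Neotoma's API.
interface Source {
  readonly id: string;
  readonly contentHash: string;
}

interface Interpretation {
  id: number;
  sourceId: string;
  version: number; // assumed counter, one per interpretation of a source
}

class Lifecycle {
  private sources = new Map<string, Source>();
  private interpretations: Interpretation[] = [];
  private nextId = 1;

  addSource(source: Source): void {
    // Store a frozen copy so later steps cannot mutate the row.
    this.sources.set(source.id, Object.freeze({ ...source }));
  }

  // Reinterpretation appends a new interpretation; the source is only read.
  interpret(sourceId: string): Interpretation {
    if (!this.sources.has(sourceId)) {
      throw new Error(`unknown source ${sourceId}`);
    }
    const version = this.interpretationsFor(sourceId).length + 1;
    const row: Interpretation = { id: this.nextId++, sourceId, version };
    this.interpretations.push(row);
    return row;
  }

  interpretationsFor(sourceId: string): Interpretation[] {
    return this.interpretations.filter((i) => i.sourceId === sourceId);
  }

  getSource(sourceId: string): Source | undefined {
    return this.sources.get(sourceId);
  }

  // Explicit user deletion removes the source and everything derived from it.
  deleteSource(sourceId: string): void {
    this.sources.delete(sourceId);
    this.interpretations = this.interpretations.filter(
      (i) => i.sourceId !== sourceId,
    );
  }
}
```

In the hosted schema the cascade would presumably be expressed with foreign keys rather than application code; the sketch only shows the intended effect.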

## Row-level security

All downstream reads filter by source\_id ∈ caller's owned sources. Even where user\_id is denormalised onto downstream rows, the source-scoped filter is the security boundary. Only the MCP server writes sources via service\_role; clients never insert directly.
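The source-scoped filter can be illustrated with plain arrays. This is a sketch under assumptions: `readableObservations` and the row shapes are hypothetical, and the real boundary is enforced by database policies rather than application code.

```typescript
// Hypothetical row shapes; only the fields relevant to the filter are shown.
interface Source {
  id: string;
  userId: string;
}

interface Observation {
  id: string;
  sourceId: string;
  userId: string; // denormalised copy, deliberately not trusted below
}

// The set of source ids the caller owns.
function ownedSourceIds(sources: Source[], callerId: string): Set<string> {
  return new Set(sources.filter((s) => s.userId === callerId).map((s) => s.id));
}

// A downstream read: keep only rows whose source belongs to the caller.
function readableObservations(
  observations: Observation[],
  sources: Source[],
  callerId: string,
): Observation[] {
  const owned = ownedSourceIds(sources, callerId);
  return observations.filter((o) => owned.has(o.sourceId));
}
```

Because the filter consults source ownership instead of the denormalised `userId`, an observation that carries the wrong `userId` still cannot leak across users.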

## Invariants

Every source satisfies the following constraints:

MUST

-   Carry a non-null content\_hash, byte\_size, mime\_type, source\_type, and user\_id
-   Be deduplicated per user: repeat ingest of identical bytes returns the existing row
-   Be referenced by every interpretation, observation, and timeline event derived from it (FK enforced)
-   Be deletable only via explicit user action, which cascades to all derived primitives

MUST NOT

-   Be mutated after upload: bytes and metadata are append-only
-   Be deduped across user boundaries: content addressing is per-user
-   Carry interpreted/extracted data: extraction lives on observations and interpretations
-   Be exposed via APIs that bypass the source-ownership filter
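Two of the invariants above (the non-null fields and immutability after upload) can be checked mechanically. The sketch below is one possible shape, not Neotoma's code: `SourceCandidate`, `missingRequiredFields`, and `sealSource` are hypothetical names, and the camelCase fields are assumed to map onto the snake\_case columns.

```typescript
// Hypothetical candidate row before it is written; fields may be absent.
interface SourceCandidate {
  contentHash?: string | null;
  byteSize?: number | null;
  mimeType?: string | null;
  sourceType?: string | null;
  userId?: string | null;
  fileName?: string | null; // nullable in the schema, so not required
}

// Fields the MUST list requires to be non-null.
const REQUIRED = [
  "contentHash",
  "byteSize",
  "mimeType",
  "sourceType",
  "userId",
] as const;

// Names of required fields that are null or undefined (0 and "" count as present).
function missingRequiredFields(candidate: SourceCandidate): string[] {
  return REQUIRED.filter(
    (field) => candidate[field] === null || candidate[field] === undefined,
  );
}

// Reject incomplete candidates, then freeze a copy so it cannot be mutated.
function sealSource<T extends SourceCandidate>(candidate: T): Readonly<T> {
  const missing = missingRequiredFields(candidate);
  if (missing.length > 0) {
    throw new Error(`missing required fields: ${missing.join(", ")}`);
  }
  return Object.freeze({ ...candidate });
}
```

`Object.freeze` is shallow, so it only approximates the rule; in the hosted schema the `NOT NULL` constraints and the write path carry the real guarantees.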

## Related

-   [Sources subsystem doc](https://github.com/markmhendrickson/neotoma/blob/main/docs/subsystems/sources.md): Full source-and-interpretation lifecycle, MCP tools, quota model
-   [Interpretations](/primitives/interpretations): Versioned extraction attempts that consume a source
-   [Observations](/primitives/observations): Granular facts produced from a source via an interpretation
-   [Timeline events](/primitives/timeline-events): Source-anchored temporal records derived from extracted dates
-   [Determinism doctrine](https://github.com/markmhendrickson/neotoma/blob/main/docs/architecture/determinism.md): Where sources sit on the deterministic-vs-non-deterministic boundary

## Where to go next

-   [All primitive record types](/primitives): index of sources, interpretations, observations, relationships, and timeline events
-   [Architecture](/architecture): how the primitives compose into Neotoma's deterministic state
-   [Terminology](/terminology): canonical glossary of terms used across Neotoma docs