Storage Layers | Schemas (Markdown)

<!--
  Full-page Markdown export (rendered HTML → GFM).
  Source: https://neotoma.io/ru/schemas/storage-layers
  Generated: 2026-04-27T12:50:41.786Z
-->
# Storage layers

Neotoma uses a three-layer storage model so users can upload anything without losing data, while the queryable state layer stays schema-compliant and deterministic. Every extraction touches all three layers: the original bytes go to raw\_text, schema-defined fields go to observation properties, and everything else goes to raw\_fragments rather than being silently dropped.

The model spans the boundary between sources and observations: ingestion partitions every extraction into these three layers before writing.

## Schema

Three-layer extraction shape (TypeScript)

```typescript
// Shape of the value returned by extractAndValidate()
// (`extraction` is only an illustrative binding name)
const extraction = {
  // Layer 1: raw_text, immutable original bytes, lives on the source
  // (already stored on the sources row by the time extraction runs)

  // Layer 2: properties, schema-compliant only, deterministic, queryable
  properties: {
    schema_version: "1.0",
    invoice_number: "INV-001",
    amount: 1500.0,
    currency: "USD",
    date_issued: "2024-01-15T00:00:00Z",
    vendor_name: "Acme Corp",
  },

  // Layer 3: extraction_metadata, preservation layer
  // (unknown fields, warnings, and quality metrics preserved in raw_fragments)
  extraction_metadata: {
    unknown_fields: {
      purchase_order: "PO-789",
      internal_cost_center: "CC-456",
    },
    warnings: [
      {
        type: "unknown_field",
        field: "purchase_order",
        message: "Field not defined for type 'invoice', preserved in extraction_metadata",
      },
    ],
    extraction_quality: {
      fields_extracted_count: 7,
      fields_filtered_count: 2,
      matched_patterns: ["invoice_number_pattern", "amount_due_pattern"],
    },
  },
};
```

## Layer 1: raw\_text on the source

The source's raw bytes are immutable and content-addressed (SHA-256 + user\_id). They never change after upload, never carry interpreted data, and are the artifact every reinterpretation reads from. Schema evolution therefore does not require re-uploading: the same source can be reinterpreted under a newer schema version at any time.
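Content addressing can be illustrated with a short sketch. It assumes the address is the hex SHA-256 digest of the raw bytes prefixed by user\_id. The `sourceKey` helper and the `user_id:hash` layout are hypothetical; the page only states that the address combines SHA-256 and user\_id.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a source address from user_id + SHA-256.
// Assumption: hex digest of the raw bytes, prefixed by user_id.
// Neotoma's actual key layout may differ.
function sourceKey(userId: string, rawBytes: Uint8Array): string {
  const digest = createHash("sha256").update(rawBytes).digest("hex");
  return `${userId}:${digest}`;
}

const bytes = new TextEncoder().encode("Invoice INV-001, amount due 1500 USD");

// Same bytes + same user -> same address, on every upload.
const a = sourceKey("user-1", bytes);
const b = sourceKey("user-1", bytes);
// Identical bytes from a different user -> a different address.
const c = sourceKey("user-2", bytes);

console.log(a === b); // true
console.log(a === c); // false
```

Because the address is derived only from the bytes and the owner, it can never drift from the content it names, which is what lets every later reinterpretation trust it.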


## Layer 2: observation.properties (schema-compliant)

Only fields defined in the active schema\_definition land in properties. Each properties payload includes schema\_version. This is the layer queries hit (JSONB indexed), the layer entity extraction reads from, and the layer the reducer composes into snapshots. By construction it is deterministic: same input bytes + same schema\_version + same converters ⇒ same properties.
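The determinism property can be sketched as a pure function. This assumes a schema is a map from field name to an optional converter. `SchemaDefinition`, `buildProperties`, and the comma-stripping converter are hypothetical illustrations, not Neotoma's API. The sketch shows why the output can depend on nothing but the input, the schema\_version, and the converters.

```typescript
// Hypothetical types; not Neotoma's actual definitions.
type Converter = (raw: unknown) => unknown;

interface SchemaDefinition {
  schemaVersion: string;
  // Field name -> converter; undefined means "store the value as extracted".
  fields: Record<string, Converter | undefined>;
}

// Keep only schema-defined fields, run each through its converter, and stamp
// schema_version. No clocks, randomness, or I/O: the same extracted input,
// schema_version, and converters always produce the same properties.
function buildProperties(
  extracted: Record<string, unknown>,
  schema: SchemaDefinition,
): Record<string, unknown> {
  const props: Record<string, unknown> = { schema_version: schema.schemaVersion };
  // Sorted iteration keeps key order stable as well.
  for (const field of Object.keys(schema.fields).sort()) {
    if (!(field in extracted)) continue;
    const convert = schema.fields[field] ?? ((v: unknown) => v);
    props[field] = convert(extracted[field]);
  }
  return props;
}

const invoiceSchema: SchemaDefinition = {
  schemaVersion: "1.0",
  fields: {
    invoice_number: undefined,
    // Hypothetical converter: "1,500.00" -> 1500
    amount: (v) => Number(String(v).replace(/,/g, "")),
    currency: undefined,
  },
};

const extracted = {
  invoice_number: "INV-001",
  amount: "1,500.00",
  currency: "USD",
  purchase_order: "PO-789", // not in the schema, so it never reaches properties
};

const p1 = buildProperties(extracted, invoiceSchema);
const p2 = buildProperties(extracted, invoiceSchema);
console.log(JSON.stringify(p1) === JSON.stringify(p2)); // true
console.log(p1.amount); // 1500
```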


## Layer 3: raw\_fragments (preservation)

Anything extracted that doesn't match the active schema goes to raw\_fragments: unknown fields, the original values of converted fields, validation warnings, and extraction quality metrics. raw\_fragments is the substrate that auto-enhancement and the schema expansion architecture analyse to suggest schema upgrades. Because nothing is dropped, schema evolution is non-destructive: re-adding a field surfaces its historical values via schema-projection filtering.
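Here is a minimal sketch of how schema-projection filtering could surface historical values once a field joins the schema. It assumes each raw fragment records its observation id, field name, value, and reason. `RawFragment`, `projectFragments`, and the `"unknown_field"` reason value are hypothetical; `converted_value_original` is the only fragment reason named on this page.

```typescript
// Hypothetical fragment record; field names are assumptions.
interface RawFragment {
  observationId: string;
  field: string;
  value: unknown;
  reason: "unknown_field" | "converted_value_original";
}

// For each observation, return stored values of fields the new schema now
// defines. Fragments are only read, never mutated or deleted.
function projectFragments(
  fragments: readonly RawFragment[],
  schemaFields: ReadonlySet<string>,
): Map<string, Record<string, unknown>> {
  const byObservation = new Map<string, Record<string, unknown>>();
  for (const frag of fragments) {
    // Assumption: only unknown fields are projected; converted_value_original
    // fragments hold pre-conversion inputs of fields already in the schema.
    if (frag.reason !== "unknown_field" || !schemaFields.has(frag.field)) continue;
    const entry: Record<string, unknown> = byObservation.get(frag.observationId) ?? {};
    entry[frag.field] = frag.value;
    byObservation.set(frag.observationId, entry);
  }
  return byObservation;
}

const fragments: RawFragment[] = [
  { observationId: "obs-1", field: "purchase_order", value: "PO-789", reason: "unknown_field" },
  { observationId: "obs-1", field: "internal_cost_center", value: "CC-456", reason: "unknown_field" },
  { observationId: "obs-2", field: "purchase_order", value: "PO-790", reason: "unknown_field" },
];

// A hypothetical schema version that adds purchase_order but not internal_cost_center.
const surfaced = projectFragments(fragments, new Set(["invoice_number", "amount", "purchase_order"]));
console.log(surfaced.get("obs-1")); // { purchase_order: 'PO-789' }
console.log(surfaced.get("obs-2")); // { purchase_order: 'PO-790' }
```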


## Partition rules

-   Fields named in the active schema → properties.
-   Fields not named in the schema → raw\_fragments as unknown fields.
-   Missing required fields → a warning on the observation, not a write rejection; the system always preserves what was extracted.
-   Observations are never rejected for unknown fields or for missing optional fields.
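The partition rules can be sketched as a single pure function. This assumes a schema lists its field names and the subset that is required. `partitionExtraction`, the types, and the `missing_required_field` warning type are hypothetical. The `unknown_field` warning mirrors the one in the extraction shape at the top of this page.

```typescript
interface FieldSchema {
  entityType: string;
  schemaVersion: string;
  fields: string[]; // every field defined for this type
  required: string[]; // subset of fields that must be present
}

interface Warning {
  type: "unknown_field" | "missing_required_field";
  field: string;
  message: string;
}

interface Partitioned {
  properties: Record<string, unknown>;
  unknownFields: Record<string, unknown>;
  warnings: Warning[];
}

// Route each extracted field by name only. Values pass through untouched,
// and nothing here can reject the write.
function partitionExtraction(extracted: Record<string, unknown>, schema: FieldSchema): Partitioned {
  const known = new Set(schema.fields);
  const result: Partitioned = {
    properties: { schema_version: schema.schemaVersion },
    unknownFields: {},
    warnings: [],
  };
  for (const [field, value] of Object.entries(extracted)) {
    if (known.has(field)) {
      result.properties[field] = value;
    } else {
      result.unknownFields[field] = value;
      result.warnings.push({
        type: "unknown_field",
        field,
        message: `Field not defined for type '${schema.entityType}', preserved in extraction_metadata`,
      });
    }
  }
  // Missing required fields only add warnings.
  for (const field of schema.required) {
    if (!(field in extracted)) {
      result.warnings.push({
        type: "missing_required_field",
        field,
        message: `Required field missing for type '${schema.entityType}'`,
      });
    }
  }
  return result;
}

// Hypothetical invoice schema; the required list is illustrative.
const invoice: FieldSchema = {
  entityType: "invoice",
  schemaVersion: "1.0",
  fields: ["invoice_number", "amount", "currency", "date_issued", "vendor_name"],
  required: ["invoice_number", "amount"],
};

const out = partitionExtraction({ amount: 1500, currency: "USD", purchase_order: "PO-789" }, invoice);
// out.properties:    { schema_version: "1.0", amount: 1500, currency: "USD" }
// out.unknownFields: { purchase_order: "PO-789" }
// out.warnings:      unknown_field (purchase_order), missing_required_field (invoice_number)
```

The function never throws and never returns a rejection. Even with invoice\_number missing, an observation can still be written from the result.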


## What entity extraction and snapshot composition read

Entity extraction and snapshot composition read from properties only. raw\_fragments is explicitly excluded from snapshot composition: it is a holding area, not a query target. This is what keeps snapshots deterministic and schema-aligned even when extraction surfaces extra data.
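The sketch below shows a reducer that composes a snapshot from properties alone. The observation shape and the last-write-wins merge by `observedAt` are assumptions for illustration. Neotoma's schemas define merge policies that this sketch does not model. The point it illustrates is that `rawFragments` is never read.

```typescript
// Hypothetical observation shape.
interface Observation {
  id: string;
  observedAt: string; // ISO-8601 UTC timestamp (assumed field)
  properties: Record<string, unknown>;
  rawFragments: Record<string, unknown>; // present on the record, ignored below
}

// Total order: by timestamp, ties broken by id, so input order never matters.
function compareObservations(a: Observation, b: Observation): number {
  if (a.observedAt !== b.observedAt) return a.observedAt < b.observedAt ? -1 : 1;
  if (a.id !== b.id) return a.id < b.id ? -1 : 1;
  return 0;
}

// Fold properties in order; later values overwrite earlier ones.
// rawFragments is deliberately never touched.
function composeSnapshot(observations: readonly Observation[]): Record<string, unknown> {
  const snapshot: Record<string, unknown> = {};
  for (const obs of [...observations].sort(compareObservations)) {
    for (const [field, value] of Object.entries(obs.properties)) {
      snapshot[field] = value;
    }
  }
  return snapshot;
}

const snap = composeSnapshot([
  {
    id: "obs-2",
    observedAt: "2024-02-01T00:00:00Z",
    properties: { schema_version: "1.0", amount: 1600 },
    rawFragments: { purchase_order: "PO-789" },
  },
  {
    id: "obs-1",
    observedAt: "2024-01-15T00:00:00Z",
    properties: { schema_version: "1.0", invoice_number: "INV-001", amount: 1500 },
    rawFragments: {},
  },
]);
// snap: { schema_version: "1.0", invoice_number: "INV-001", amount: 1600 }
```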

## Invariants

MUST

-   Preserve all extracted data: unknown fields go to raw\_fragments and are never discarded
-   Always create an observation, even on missing-required-field warnings
-   Stamp schema\_version on every properties payload
-   Pull entity extraction and snapshot composition fields from properties only
-   Mirror converter inputs into raw\_fragments with reason converted\_value\_original

MUST NOT

-   Reject observations for unknown fields
-   Reject observations for missing optional fields
-   Store unknown or non-schema fields in observation.properties
-   Use raw\_fragments for entity extraction or snapshot composition
-   Modify or guess field values during partitioning
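Several of these invariants can be checked mechanically after partitioning. A minimal sketch follows. It assumes the partition output matches the extraction shape at the top of this page, with `properties` plus unknown fields under `extraction_metadata`. `checkPartitionInvariants` is a hypothetical helper, not part of Neotoma's API.

```typescript
interface PartitionOutput {
  properties: Record<string, unknown>;
  extraction_metadata: { unknown_fields: Record<string, unknown> };
}

// Hypothetical checker: returns violated invariants (empty when valid).
function checkPartitionInvariants(
  extracted: Record<string, unknown>,
  out: PartitionOutput,
  schemaFields: ReadonlySet<string>,
): string[] {
  const violations: string[] = [];
  const unknown = out.extraction_metadata.unknown_fields;
  // MUST stamp schema_version on every properties payload.
  if (typeof out.properties["schema_version"] !== "string") {
    violations.push("properties is missing schema_version");
  }
  // MUST NOT store unknown or non-schema fields in properties.
  for (const field of Object.keys(out.properties)) {
    if (field !== "schema_version" && !schemaFields.has(field)) {
      violations.push(`non-schema field in properties: ${field}`);
    }
  }
  for (const field of Object.keys(extracted)) {
    // MUST preserve all extracted data: every field lands in one of the layers.
    if (!(field in out.properties) && !(field in unknown)) {
      violations.push(`extracted field dropped: ${field}`);
    }
    // MUST NOT modify values: unknown fields are preserved verbatim.
    if (field in unknown && !Object.is(unknown[field], extracted[field])) {
      violations.push(`unknown field modified: ${field}`);
    }
  }
  return violations;
}

const schemaFields = new Set(["invoice_number", "amount", "currency", "date_issued", "vendor_name"]);
const extractedFields = { invoice_number: "INV-001", amount: 1500, purchase_order: "PO-789" };

const good: PartitionOutput = {
  properties: { schema_version: "1.0", invoice_number: "INV-001", amount: 1500 },
  extraction_metadata: { unknown_fields: { purchase_order: "PO-789" } },
};
const bad: PartitionOutput = {
  properties: { invoice_number: "INV-001", amount: 1500, purchase_order: "PO-789" },
  extraction_metadata: { unknown_fields: {} },
};

console.log(checkPartitionInvariants(extractedFields, good, schemaFields)); // []
// Two violations: missing schema_version, and purchase_order stored in properties.
console.log(checkPartitionInvariants(extractedFields, bad, schemaFields));
```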

## Related

-   [Schema handling architecture](https://github.com/markmhendrickson/neotoma/blob/main/docs/architecture/schema_handling.md): Three-layer model, partition logic, validation rules
-   [Schema registry](/schemas/registry): Where the active schema\_definition lives
-   [Observations](/primitives/observations): How properties on observations feed the reducer
-   [Sources](/primitives/sources): Layer 1, content-addressed raw bytes
-   [Schema expansion](https://github.com/markmhendrickson/neotoma/blob/main/docs/architecture/schema_expansion.md): How raw\_fragments seed automatic schema growth

## Where to go next

-   [All schema concepts](/schemas): registry, merge policies, storage layers, versioning
-   [Primitive record types](/primitives): sources, observations, snapshots, and the rest of Neotoma's atoms
-   [Schema management workflows](/schema-management): CLI commands for listing, validating, and evolving schemas