<!--
Full-page Markdown export (rendered HTML → GFM).
Source: https://neotoma.io/fr/schemas/storage-layers
Generated: 2026-04-27T12:50:41.563Z
-->
# Storage layers
Neotoma uses a three-layer storage model so that users can upload anything without losing data, while the queryable state layer stays schema-compliant and deterministic. Every extraction touches all three layers: the original bytes go to raw\_text, schema-defined fields go to observation properties, and everything else goes to raw\_fragments rather than being silently dropped.
The model spans the boundary between sources and observations: ingestion partitions every extraction into these three layers before writing anything.
## Schema
Three-layer extraction shape (TypeScript)
```typescript
// Shape returned by extractAndValidate()
const extractionResult = {
  // Layer 1: raw_text, immutable original bytes, lives on the source
  // (already stored on the sources row by the time extraction runs)

  // Layer 2: properties, schema-compliant only, deterministic, queryable
  properties: {
    schema_version: "1.0",
    invoice_number: "INV-001",
    amount: 1500.0,
    currency: "USD",
    date_issued: "2024-01-15T00:00:00Z",
    vendor_name: "Acme Corp",
  },
  // Layer 3: extraction_metadata, preservation layer
  extraction_metadata: {
    unknown_fields: {
      purchase_order: "PO-789",
      internal_cost_center: "CC-456",
    },
    warnings: [
      {
        type: "unknown_field",
        field: "purchase_order",
        message: "Field not defined for type 'invoice', preserved in extraction_metadata",
      },
    ],
    extraction_quality: {
      fields_extracted_count: 7,
      fields_filtered_count: 2,
      matched_patterns: ["invoice_number_pattern", "amount_due_pattern"],
    },
  },
};
```
## Layer 1: raw\_text on the source
The source's raw bytes are immutable and content-addressed (SHA-256 of the content plus user\_id). They never change after upload, never carry interpreted data, and are the artifact every reinterpretation reads from. Schema evolution therefore does not require re-uploading; the same source can be reinterpreted under a newer schema version at any time.
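As a rough illustration of content addressing, the sketch below derives a source key from the SHA-256 digest of the raw bytes plus the uploading user's id, and stores bytes write-once. The names `sourceKey` and `SourceStore`, and the `<user_id>:<hash>` key format, are hypothetical; Neotoma's actual key layout and storage may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical key format: "<user_id>:<sha256 hex of the raw bytes>".
// Identical bytes uploaded by the same user always map to the same key.
function sourceKey(userId: string, bytes: Uint8Array): string {
  const digest = createHash("sha256").update(bytes).digest("hex");
  return `${userId}:${digest}`;
}

// Write-once store: each key is written at most once, so stored bytes never change.
class SourceStore {
  private readonly sources = new Map<string, Uint8Array>();

  // Returns the content address; re-uploading identical bytes is a no-op.
  put(userId: string, bytes: Uint8Array): string {
    const key = sourceKey(userId, bytes);
    if (!this.sources.has(key)) {
      // Copy so later mutation of the caller's buffer cannot alter the stored source.
      this.sources.set(key, Uint8Array.from(bytes));
    }
    return key;
  }

  get(key: string): Uint8Array | undefined {
    const stored = this.sources.get(key);
    // Hand out a copy so readers cannot mutate the stored source either.
    return stored === undefined ? undefined : Uint8Array.from(stored);
  }
}
```

Under these assumptions, uploading the same bytes twice returns the same key, while the same bytes uploaded by a different user get a different key because user\_id is part of the address.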
## Layer 2: observation.properties (schema-compliant)
Only fields defined in the active schema\_definition land in properties, and each properties payload includes its schema\_version. This is the layer that queries hit (indexed JSONB), the layer entity extraction reads from, and the layer the reducer composes into snapshots. By construction it is deterministic: the same input bytes, the same schema\_version, and the same converters always yield the same properties.
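One way to exercise the determinism claim in a test is to fingerprint each properties payload with a key-order-independent serialization, then assert that two extraction runs over the same bytes, schema\_version, and converters produce the same fingerprint. This is a minimal sketch assuming JSON-compatible values; `stableStringify` and `propertiesFingerprint` are hypothetical helpers, not part of Neotoma's API.

```typescript
import { createHash } from "node:crypto";

// Every properties payload carries the schema_version it was produced under.
type Properties = { schema_version: string } & Record<string, unknown>;

// Serialize with object keys sorted, so payloads that differ only in key
// order produce identical text. Assumes JSON-compatible values only.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(obj[key])}`);
    return `{${fields.join(",")}}`;
  }
  return JSON.stringify(value);
}

// SHA-256 over the stable serialization: equal payloads give equal fingerprints.
function propertiesFingerprint(properties: Properties): string {
  return createHash("sha256").update(stableStringify(properties)).digest("hex");
}
```

Because schema\_version is stamped into the payload, fingerprints are only comparable across runs that used the same schema version.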
## Layer 3: raw\_fragments (preservation)
Anything extracted that doesn't match the active schema goes to raw\_fragments: unknown fields, the original values of converted fields, validation warnings, and extraction quality metrics. raw\_fragments is the substrate that auto-enhancement and the schema expansion architecture analyse to suggest schema upgrades. Because nothing is dropped, schema evolution is non-destructive: re-adding a field to the schema surfaces its historical values via schema-projection filtering.
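The sketch below illustrates the projection idea: scan preserved fragments and pull out the ones whose field name the schema now defines, without deleting anything. The `RawFragment` shape (including the `unknown_field` reason value) and `projectFragments` are assumptions made for illustration; the real schema-projection filtering is described in the schema handling architecture and may work differently.

```typescript
// Hypothetical shape of one preserved fragment.
interface RawFragment {
  observation_id: string;
  field: string;
  value: unknown;
  reason: "unknown_field" | "converted_value_original";
}

// For each observation, collect preserved values of fields that the schema
// now defines. Only unknown-field fragments are projected: a
// converted_value_original fragment mirrors a value whose converted form
// already lives in properties. The input array is only read, never modified.
function projectFragments(
  fragments: readonly RawFragment[],
  schemaFields: ReadonlySet<string>,
): Map<string, Record<string, unknown>> {
  const surfaced = new Map<string, Record<string, unknown>>();
  for (const fragment of fragments) {
    if (fragment.reason !== "unknown_field" || !schemaFields.has(fragment.field)) {
      continue;
    }
    const values = surfaced.get(fragment.observation_id) ?? {};
    values[fragment.field] = fragment.value;
    surfaced.set(fragment.observation_id, values);
  }
  return surfaced;
}
```

Under these assumptions, a schema version that defines purchase\_order would surface "PO-789" for the invoice observation from the schema example without any re-upload.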
## Partition rules
- Fields named in the active schema → properties.
- Fields not named in the schema → raw\_fragments, as unknown fields.
- Missing required fields → a warning on the observation, not a write rejection; the system always preserves what was extracted.
- Missing optional fields and unknown fields never cause an observation to be rejected.
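A minimal sketch of these partition rules, assuming a simplified schema shape (a version string plus a map of field names to a `required` flag). `SchemaDefinition`, `partition`, and the `missing_required_field` warning type are hypothetical; converters are left out to keep the example small, so values are copied as-is.

```typescript
// Hypothetical, simplified schema definition for one entity type.
interface SchemaDefinition {
  schema_version: string;
  fields: Record<string, { required: boolean }>;
}

interface PartitionWarning {
  type: "unknown_field" | "missing_required_field";
  field: string;
}

interface PartitionResult {
  properties: Record<string, unknown>; // always includes schema_version
  unknown_fields: Record<string, unknown>;
  warnings: PartitionWarning[];
}

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Splits extracted fields between properties and unknown_fields.
// It never throws: unknown fields and missing required fields become warnings.
function partition(
  extracted: Record<string, unknown>,
  schema: SchemaDefinition,
): PartitionResult {
  const properties: Record<string, unknown> = { schema_version: schema.schema_version };
  const unknown_fields: Record<string, unknown> = {};
  const warnings: PartitionWarning[] = [];

  for (const [field, value] of Object.entries(extracted)) {
    if (hasOwn(schema.fields, field)) {
      properties[field] = value; // copied as-is, never modified or guessed
    } else {
      unknown_fields[field] = value; // preserved, never dropped
      warnings.push({ type: "unknown_field", field });
    }
  }

  // Only required fields are checked; a missing optional field is not an issue.
  for (const [field, definition] of Object.entries(schema.fields)) {
    if (definition.required && !hasOwn(extracted, field)) {
      warnings.push({ type: "missing_required_field", field });
    }
  }

  return { properties, unknown_fields, warnings };
}
```

Because `partition` always returns a result, the caller can always create the observation and attach the warnings to it.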
## What entity extraction and snapshot composition read
Entity extraction and snapshot computation read from properties only. raw\_fragments is explicitly excluded from snapshot composition: it is a holding area, not a query target. This separation keeps snapshots deterministic and schema-aligned even when extraction surfaces extra data.
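To make the read boundary concrete, here is a sketch of a reducer that composes a snapshot from observations while ignoring raw\_fragments entirely. It assumes a simple last-write-wins policy ordered by a hypothetical `observed_at` timestamp purely for illustration; Neotoma's actual merge policies are a separate schema concept, and the `Observation` shape here is an assumption.

```typescript
// Hypothetical observation shape: only `properties` is read below.
interface Observation {
  observed_at: string; // ISO-8601 timestamp
  properties: Record<string, unknown>;
  raw_fragments: Record<string, unknown>; // preserved, but never read here
}

// Last-write-wins over properties only. raw_fragments is deliberately not
// touched, so extra extracted data can never leak into the snapshot.
function composeSnapshot(observations: readonly Observation[]): Record<string, unknown> {
  // Sorting first makes the result independent of arrival order (ties aside).
  const ordered = [...observations].sort(
    (a, b) => Date.parse(a.observed_at) - Date.parse(b.observed_at),
  );
  const snapshot: Record<string, unknown> = {};
  for (const observation of ordered) {
    Object.assign(snapshot, observation.properties);
  }
  return snapshot;
}
```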
## Invariants
MUST
- Preserve all extracted data: unknown fields go to raw\_fragments and are never discarded
- Always create an observation, even when a missing required field triggers a warning
- Stamp schema\_version on every properties payload
- Pull entity extraction and snapshot composition fields from properties only
- Mirror each converted field's original (pre-conversion) value into raw\_fragments with reason converted\_value\_original
MUST NOT
- Reject observations for unknown fields
- Reject observations for missing optional fields
- Store unknown or non-schema fields in observation.properties
- Use raw\_fragments for entity extraction or snapshot composition
- Modify or guess field values during partitioning
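The MUST / MUST NOT lists translate naturally into assertions over a partition result. The sketch below checks a few of them: schema\_version is stamped, properties holds only schema fields, and every extracted value survives unchanged in properties, in the unknown fields, or (when a converter changed it) as a converted\_value\_original mirror. The `PartitionOutput` shape and `checkInvariants` are hypothetical, and values are compared with `===`, so the sketch assumes primitive field values.

```typescript
// Hypothetical output of partitioning one extraction.
interface PartitionOutput {
  properties: Record<string, unknown>;
  unknown_fields: Record<string, unknown>;
  // Pre-conversion values keyed by field (reason: converted_value_original).
  converted_value_original: Record<string, unknown>;
}

// Returns human-readable violations; an empty array means every check passed.
function checkInvariants(
  extracted: Record<string, unknown>,
  schemaFields: ReadonlySet<string>,
  output: PartitionOutput,
): string[] {
  const violations: string[] = [];

  if (typeof output.properties.schema_version !== "string") {
    violations.push("properties is missing schema_version");
  }
  for (const field of Object.keys(output.properties)) {
    if (field !== "schema_version" && !schemaFields.has(field)) {
      violations.push(`non-schema field in properties: ${field}`);
    }
  }
  for (const [field, value] of Object.entries(extracted)) {
    if (schemaFields.has(field)) {
      const kept = output.properties[field] === value;
      const mirrored = output.converted_value_original[field] === value;
      if (!kept && !mirrored) {
        violations.push(`value changed without a converted_value_original mirror: ${field}`);
      }
    } else if (output.unknown_fields[field] !== value) {
      violations.push(`unknown field dropped or altered: ${field}`);
    }
  }
  return violations;
}
```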
## Related
- [Schema handling architecture](https://github.com/markmhendrickson/neotoma/blob/main/docs/architecture/schema_handling.md): Three-layer model, partition logic, validation rules
- [Schema registry](/schemas/registry): Where the active schema\_definition lives
- [Observations](/primitives/observations): How properties on observations feed the reducer
- [Sources](/primitives/sources): Layer 1, content-addressed raw bytes
- [Schema expansion](https://github.com/markmhendrickson/neotoma/blob/main/docs/architecture/schema_expansion.md): How raw\_fragments seed automatic schema growth
## Where to go next
- [All schema concepts](/schemas): registry, merge policies, storage layers, versioning
- [Primitive record types](/primitives): sources, observations, snapshots, and the rest of Neotoma's atoms
- [Schema management workflows](/schema-management): CLI commands for listing, validating, and evolving schemas