Schema registry
The schema registry is the table that holds every versioned entity schema in Neotoma. It is config-driven by design: domain-specific schemas (contact, invoice, task, …) live as data, not code, so schemas can evolve at runtime without redeploys. Every schema row pairs a field-by-field schema_definition with a reducer_config that controls how observations merge into the entity snapshot.
The registry is read on every observation write, every snapshot recomputation, and every schema-projection filter. It sits between the storage layer (sources/observations) and the deterministic reducer.
Schema
schema_registry table (Postgres / hosted)
| Field | Type | Purpose |
|---|---|---|
| entity_type | TEXT | Domain type label (contact, invoice, task, conversation_message, …) |
| schema_version | TEXT | Semantic version (1.0.0, 1.1.0, 2.0.0); unique per entity_type |
| schema_definition | JSONB | Field map: name → { type, required?, validator?, converters?, description? } |
| reducer_config | JSONB | Per-field merge_policies the reducer uses to compose observations into snapshots |
| active | BOOLEAN | Exactly one active row per entity_type (per scope) at a time; new writes pick this up immediately |
| scope | TEXT | global (shared) or user (per-user override that wins when caller's user_id matches) |
| user_id | UUID | Set when scope = 'user'; lets one tenant evolve their schema without affecting others |
Schema definition format
schema_definition is a JSONB object with a single fields key. Each field carries a type (string | number | date | boolean | array | object), an optional required flag, an optional validator function name, an optional preserveCase flag for canonicalization, an optional description, and an optional converters list for deterministic type coercion (e.g. nanosecond timestamp → ISO 8601 date). The shape is intentionally narrow: schemas describe data; they do not run code.
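As a rough illustration, the shape described above could be modelled in TypeScript along these lines. The type names (FieldDefinition, SchemaDefinition), the example contact fields, the validator name email, and the assumption that converters are listed by name are all hypothetical; only the field attributes themselves come from the description above.

```typescript
// Hypothetical types mirroring the schema_definition shape described above.
type FieldType = "string" | "number" | "date" | "boolean" | "array" | "object";

interface FieldDefinition {
  type: FieldType;
  required?: boolean;
  validator?: string; // a validator function *name*, never the function itself
  preserveCase?: boolean; // canonicalization hint
  description?: string;
  converters?: string[]; // assumption: converters are referenced by name
}

interface SchemaDefinition {
  fields: Record<string, FieldDefinition>;
}

// Illustrative contact schema; the field names are made up for this example.
const contactSchema: SchemaDefinition = {
  fields: {
    name: { type: "string", required: true, preserveCase: true },
    email: { type: "string", validator: "email" },
    created_at: { type: "date", converters: ["timestamp_nanos_to_iso"] },
  },
};

// Because the definition is plain data, it can be inspected without
// executing anything that belongs to the schema itself.
function requiredFields(schema: SchemaDefinition): string[] {
  return Object.keys(schema.fields).filter(
    (name) => schema.fields[name]?.required === true,
  );
}
```

Since the whole definition survives a JSON round trip, it can be stored in a JSONB column unchanged.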
Field type converters
Converters reconcile real-world data (numeric timestamps, stringified booleans, nested arrays) with the declared field type without losing the original value. A converter is one of a small registry of named, deterministic functions (timestamp_nanos_to_iso, string_to_number, …). Successful conversions land in observations under the schema-typed field; the original value is mirrored into raw_fragments with reason converted_value_original so reprocessing remains lossless.
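The conversion flow above can be sketched as follows. The converter names timestamp_nanos_to_iso and string_to_number and the reason converted_value_original come from the text; the function bodies, the ConvertResult and RawFragment shapes, and the convertField helper are assumptions made for illustration, not Neotoma's actual implementation.

```typescript
// Sketch only: converter bodies and the fragment shape are assumptions.
type ConvertResult = { ok: true; value: string | number } | { ok: false };
type Converter = (value: unknown) => ConvertResult;

// Nanoseconds since the Unix epoch -> ISO 8601 string (millisecond precision).
function timestampNanosToIso(value: unknown): ConvertResult {
  const digits =
    typeof value === "string" || typeof value === "number" ? String(value) : "";
  if (!/^\d{7,}$/.test(digits)) return { ok: false };
  const millis = Number(digits.slice(0, -6)); // drop the last 6 digits: ns -> ms
  const date = new Date(millis);
  return isNaN(date.getTime())
    ? { ok: false }
    : { ok: true, value: date.toISOString() };
}

// Numeric string -> number; empty or non-numeric strings are rejected.
function stringToNumber(value: unknown): ConvertResult {
  if (typeof value !== "string" || value.trim() === "") return { ok: false };
  const n = Number(value);
  return isFinite(n) ? { ok: true, value: n } : { ok: false };
}

// A small registry of named, deterministic functions.
const CONVERTERS = new Map<string, Converter>([
  ["timestamp_nanos_to_iso", timestampNanosToIso],
  ["string_to_number", stringToNumber],
]);

interface RawFragment {
  field: string;
  value: unknown;
  reason: string;
}

interface ConversionOutcome {
  converted?: string | number; // goes to the observation's typed field
  fragment?: RawFragment; // mirrors the original value for lossless reprocessing
}

// Try each named converter in order; the first success wins.
function convertField(
  field: string,
  value: unknown,
  converterNames: string[],
): ConversionOutcome {
  for (const name of converterNames) {
    const convert = CONVERTERS.get(name);
    if (convert === undefined) continue; // unknown names are skipped in this sketch
    const result = convert(value);
    if (result.ok) {
      return {
        converted: result.value,
        fragment: { field, value, reason: "converted_value_original" },
      };
    }
  }
  return {}; // nothing converted; the caller decides what happens to the value
}
```

For example, convertField("created_at", "1700000000000000000", ["timestamp_nanos_to_iso"]) yields the converted value "2023-11-14T22:13:20.000Z" alongside a fragment that still holds the original string.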
Global vs user-specific schemas
Schemas resolve user-specific first, global second. A user-specific schema row (scope = 'user', user_id = caller) lets a tenant pilot new fields or stricter validators without affecting other users. When a user-specific pattern proves out across many users with consistent types, it can be promoted to a global schema via reconciliation.
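The resolution order can be pictured with a small lookup over in-memory rows. This is a minimal sketch, assuming a simplified row shape and a hypothetical resolveActiveSchema helper; the real lookup presumably runs as a query against the schema_registry table.

```typescript
// Simplified row shape; only the columns needed for resolution are modelled.
interface SchemaRow {
  entity_type: string;
  schema_version: string;
  active: boolean;
  scope: "global" | "user";
  user_id: string | null;
}

// Hypothetical helper: user-specific first, global second.
function resolveActiveSchema(
  rows: SchemaRow[],
  entityType: string,
  callerUserId?: string,
): SchemaRow | undefined {
  const candidates = rows.filter(
    (row) => row.entity_type === entityType && row.active,
  );
  const userRow =
    callerUserId === undefined
      ? undefined
      : candidates.find(
          (row) => row.scope === "user" && row.user_id === callerUserId,
        );
  // Fall back to the shared schema when the caller has no override.
  return userRow ?? candidates.find((row) => row.scope === "global");
}
```

A caller with an override sees their own version, while every other caller keeps resolving to the global row.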
Auto-enhancement from raw_fragments
Unknown fields encountered at extraction time go to raw_fragments. With auto-enhancement enabled, the system analyses fragment frequency, type consistency, and source diversity, then promotes high-confidence fields (≥95% type consistency, ≥2 sources, ≥3 occurrences by default) into the active schema as a minor version bump. Field blacklists, name validators, and idempotency guards keep noise out.
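The promotion thresholds above can be sketched as a pure function over fragment samples. The three default thresholds are taken from the text; the FragmentSample shape, the eligibleFields and bumpMinor helpers, and the choice to measure type consistency as the share of the most common type are assumptions. Blacklists, name validators, and idempotency guards are left out.

```typescript
// Hypothetical sample shape: one entry per raw_fragments occurrence.
interface FragmentSample {
  field: string;
  valueType: string; // e.g. "string", "number"
  sourceId: string;
}

interface Thresholds {
  minTypeConsistency: number;
  minSources: number;
  minOccurrences: number;
}

// Defaults quoted in the text: >=95% consistency, >=2 sources, >=3 occurrences.
const DEFAULT_THRESHOLDS: Thresholds = {
  minTypeConsistency: 0.95,
  minSources: 2,
  minOccurrences: 3,
};

function eligibleFields(
  samples: FragmentSample[],
  thresholds: Thresholds = DEFAULT_THRESHOLDS,
): string[] {
  // Group samples by field name.
  const byField = new Map<string, FragmentSample[]>();
  for (const sample of samples) {
    const list = byField.get(sample.field) ?? [];
    list.push(sample);
    byField.set(sample.field, list);
  }

  const eligible: string[] = [];
  byField.forEach((list, field) => {
    const typeCounts = new Map<string, number>();
    const sources = new Set<string>();
    for (const sample of list) {
      typeCounts.set(sample.valueType, (typeCounts.get(sample.valueType) ?? 0) + 1);
      sources.add(sample.sourceId);
    }
    // Assumption: consistency = share of the most common observed type.
    const dominant = Math.max(...Array.from(typeCounts.values()));
    const consistency = dominant / list.length;
    if (
      list.length >= thresholds.minOccurrences &&
      sources.size >= thresholds.minSources &&
      consistency >= thresholds.minTypeConsistency
    ) {
      eligible.push(field);
    }
  });
  return eligible;
}

// Promotion is described as a minor version bump, e.g. 1.0.0 -> 1.1.0.
function bumpMinor(version: string): string {
  const parts = version.split(".").map(Number);
  const major = parts[0] ?? 0;
  const minor = parts[1] ?? 0;
  return `${major}.${minor + 1}.0`;
}
```

A field seen three times with one type across two sources qualifies; the same field seen only in a single source, or with mixed types, does not.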
Service interface
register() inserts a new (entity_type, schema_version) row. activate() flips active = true on one version and false on the others within the same scope. updateSchemaIncremental() is the safe upgrade path: pass fields_to_add and/or fields_to_remove, optionally bump the version, optionally migrate historical raw_fragments. loadActiveSchema() is the read used by ingestion and the reducer.
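To show how these four calls fit together, here is an in-memory sketch. The method names come from the text, but every signature, the IncrementalUpdate shape, and the InMemorySchemaRegistry class are hypothetical. Two simplifications are assumptions of this sketch rather than documented behaviour: scope and user_id are omitted, and updateSchemaIncremental always requires a new version (the text treats the bump as optional) and skips raw_fragments migration.

```typescript
// In-memory sketch; signatures are hypothetical and scope/user_id are omitted.
interface FieldDef {
  type: string;
}

interface SchemaRow {
  entity_type: string;
  schema_version: string;
  schema_definition: { fields: Record<string, FieldDef> };
  reducer_config: { merge_policies: Record<string, unknown> };
  active: boolean;
}

interface IncrementalUpdate {
  fields_to_add?: Record<string, FieldDef>;
  fields_to_remove?: string[];
  new_version: string; // required here; optional in the text
}

class InMemorySchemaRegistry {
  private rows: SchemaRow[] = [];

  // Insert a new (entity_type, schema_version) row. Assumption: starts inactive.
  register(row: Omit<SchemaRow, "active">): SchemaRow {
    const duplicate = this.rows.some(
      (r) =>
        r.entity_type === row.entity_type &&
        r.schema_version === row.schema_version,
    );
    if (duplicate) {
      throw new Error(`${row.entity_type}@${row.schema_version} already registered`);
    }
    const stored: SchemaRow = { ...row, active: false };
    this.rows.push(stored);
    return stored;
  }

  // Flip active on for one version and off for the others of the same type.
  activate(entityType: string, version: string): void {
    const target = this.rows.find(
      (r) => r.entity_type === entityType && r.schema_version === version,
    );
    if (target === undefined) {
      throw new Error(`unknown schema ${entityType}@${version}`);
    }
    for (const r of this.rows) {
      if (r.entity_type === entityType) r.active = r === target;
    }
  }

  loadActiveSchema(entityType: string): SchemaRow | undefined {
    return this.rows.find((r) => r.entity_type === entityType && r.active);
  }

  // Derive a new version from the active one instead of editing it in place.
  updateSchemaIncremental(entityType: string, update: IncrementalUpdate): SchemaRow {
    const current = this.loadActiveSchema(entityType);
    if (current === undefined) {
      throw new Error(`no active schema for ${entityType}`);
    }
    const fields = {
      ...current.schema_definition.fields,
      ...(update.fields_to_add ?? {}),
    };
    for (const name of update.fields_to_remove ?? []) delete fields[name];
    const next = this.register({
      entity_type: entityType,
      schema_version: update.new_version,
      schema_definition: { fields },
      reducer_config: current.reducer_config,
    });
    this.activate(entityType, next.schema_version);
    return next;
  }
}
```

The previous version stays in the registry untouched, so observations that reference it by schema_version remain interpretable.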
Invariants
MUST
- Carry a non-null entity_type, schema_version, schema_definition, and reducer_config
- Have at most one active row per (entity_type, scope, user_id) combination
- Be referenced by every observation via observation.schema_version (immutable on observations)
- Be the single source of truth for both validation and reducer merge policies
- Validate every converter against CONVERTER_REGISTRY before registration
MUST NOT
- Mutate schema_definition or reducer_config in place; register a new schema_version instead
- Allow more than one active version per entity_type within the same scope
- Carry merge logic, which lives in the reducer; a schema row holds only declarative merge_policies
- Be edited from outside the schema registry service
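Two of the invariants above lend themselves to mechanical checks: converter names must exist in CONVERTER_REGISTRY, and at most one row may be active per (entity_type, scope, user_id). The sketch below is illustrative only. CONVERTER_REGISTRY is named in the invariants, but its contents here are limited to the two converter names this page mentions, and the RegistryRow shape and both helper functions are hypothetical.

```typescript
// Assumption: only the two converter names mentioned on this page are listed.
const CONVERTER_REGISTRY = new Set<string>([
  "timestamp_nanos_to_iso",
  "string_to_number",
]);

interface RegistryRow {
  entity_type: string;
  schema_version: string;
  scope: "global" | "user";
  user_id: string | null;
  active: boolean;
  schema_definition: {
    fields: Record<string, { type: string; converters?: string[] }>;
  };
}

// Returns "field: converter" for every converter missing from the registry.
function unknownConverters(row: RegistryRow): string[] {
  const unknown: string[] = [];
  const fields = row.schema_definition.fields;
  for (const name of Object.keys(fields)) {
    for (const converter of fields[name]?.converters ?? []) {
      if (!CONVERTER_REGISTRY.has(converter)) {
        unknown.push(`${name}: ${converter}`);
      }
    }
  }
  return unknown;
}

// Returns the (entity_type|scope|user_id) keys that have more than one active row.
function duplicateActiveKeys(rows: RegistryRow[]): string[] {
  const counts = new Map<string, number>();
  for (const row of rows) {
    if (!row.active) continue;
    const key = [row.entity_type, row.scope, row.user_id ?? ""].join("|");
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  const duplicates: string[] = [];
  counts.forEach((count, key) => {
    if (count > 1) duplicates.push(key);
  });
  return duplicates;
}
```

A registration path could reject any row for which unknownConverters returns a non-empty list, and an audit job could alert when duplicateActiveKeys finds anything.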
Related
- Schema registry doc: Full table definition, definition format, service interface
- Merge policies: How reducer_config drives deterministic snapshot merging
- Storage layers: Three-layer storage (raw_text, properties, raw_fragments)
- Versioning & evolution: Semver rules, breaking changes, schema snapshot exports
- Schema definitions (code): src/services/schema_definitions.ts, current source of truth in code
Where to go next
- All schema concepts: registry, merge policies, storage layers, versioning
- Primitive record types: sources, observations, snapshots, and the rest of Neotoma's atoms
- Schema management workflows: CLI commands for listing, validating, and evolving schemas