# Agent instructions: Retrieval, provenance, and tasks

## Retrieval

- **Query shape.** `retrieve_entity_by_identifier` for concrete identifiers (names, emails, ids, exact titles); `retrieve_entities` scoped by `entity_type` with an explicit limit or time window for plural/category queries ("last N transactions", "recent tasks").
- **Guardrails.** Start with small, targeted queries. Avoid broad scans unless necessary. Use retrieved facts when relevant; if bounded retrieval finds nothing, proceed normally without inventing memory-backed claims.
- **Publication-recency.** For "recently published" questions, sort by publication timestamp (`published_date` / `published_at`) descending, not by observation recency. Use a sufficient page size and dedupe by `entity_id`.
- **Entity-type cardinality.** For "how many entities per type" questions, answer from `getStats` / `GET /stats` → `entities_by_type` first. `list_entity_types` reports schema field width, not row counts; never substitute one for the other.
- **Bounded completeness.** For list/count answers from entity graphs, check likely equivalent containers/identifiers and relationship variants, dedupe by `entity_id`, and report the reconciled total (or note remaining ambiguity).
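
The publication-recency and dedupe rules above can be sketched as a small post-processing step. This is a minimal illustration, not Neotoma's implementation: it assumes rows arrive as plain dicts, that `published_at` / `published_date` hold ISO 8601 strings in a single timezone, and the helper names `published_ts` and `recently_published` (and the `observed_at` field) are hypothetical.

```python
from datetime import datetime


def published_ts(record):
    """Return the publication timestamp, or None when the record has none."""
    raw = record.get("published_at") or record.get("published_date")
    return datetime.fromisoformat(raw) if raw else None


def recently_published(records, limit=5):
    """Newest first by publication time (not observation time), one row per entity_id."""
    dated = [r for r in records if published_ts(r) is not None]
    dated.sort(key=published_ts, reverse=True)

    seen, unique = set(), []
    for record in dated:
        if record["entity_id"] not in seen:
            seen.add(record["entity_id"])
            unique.append(record)
    return unique[:limit]


# Hypothetical rows: "a" was observed most recently, but "b" was published most recently.
rows = [
    {"entity_id": "a", "published_at": "2024-03-01T09:00:00", "observed_at": "2024-06-01T00:00:00"},
    {"entity_id": "b", "published_date": "2024-05-10", "observed_at": "2024-05-11T00:00:00"},
    {"entity_id": "a", "published_at": "2024-03-01T09:00:00", "observed_at": "2024-06-02T00:00:00"},
    {"entity_id": "c", "observed_at": "2024-06-03T00:00:00"},
]
top_ids = [r["entity_id"] for r in recently_published(rows)]
```

Deduplication runs before the limit is applied, so repeated observations of one entity cannot crowd distinct entities out of the page.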

## Provenance

- **Source provenance is required.** Every entity carries traceable source data. For file-derived data, use the combined store path (entities + `file_path` or `file_content+mime_type`) and include `source_file`. For API or tool-sourced data, set `data_source` (tool, endpoint, date) and store the raw payload as `api_response_data`. FORBIDDEN: storing entities with no traceable source unless the data is purely user-stated in chat.
- **Three-layer analysis.** When analyzing a named entity from source material, persist all three layers in the same turn: (1) the raw source artifact, (2) the named entity updated with sourced facts, (3) a synthesized note/report capturing derived conclusions. Link with `REFERS_TO` or `EMBEDS`.
- **Reuse pre-existing sources.** If a raw source already exists in Neotoma, retrieve it and link the current conversation-derived entities to it in the same turn. Do not rely on an earlier store remaining discoverable without a relationship.
- **Source content retrieval.** Files stored via the combined path are downloadable at `GET /sources/:id/content`; observations carry `source_id` for linkage. UIs should expose this endpoint so users can inspect the original artifact.
- **Unstructured payload retention.** User-provided files, paths, @-references, attachments, uploads, and pasted blobs MUST be persisted in the same turn via the unstructured path with the attachment recipe. Host-only copies (Desktop, Downloads, repo folders) are not sufficient retention.
- **Synthesized deliverables.** Reviews, reports, plans, audits, comparative analyses, legal/competitive/market/technical research are stored as a structured entity (e.g. `legal_research`, `competitive_analysis`, `technical_research`, `report`) with `title`, `subject`, `conclusion`, `key_findings`, `sources`, `caveats`, and `research_date`. Do not respond with findings without storing them in the same turn.
- **Analysis durability.** When asked for analysis or a briefing, do not rely only on chat message rows. Persist a structured note/report/research entity rich enough to reconstruct the answer, then link it to the analyzed entity and source.
- **Agent-authored deliverables.** When the agent creates or materially edits a markdown, text, JSON, CSV, or similar file that is the substantive deliverable, store the file via the combined path, persist a structured entity describing it, and link the file asset, deliverable entity, and originating message. Repo-only or working-tree copies are not durable.
- **Session-derived artifacts.** Any entity created from the current conversation in a separate store call MUST be linked back via `REFERS_TO` in the same turn (from the prompting user message or from the new entity to the conversation). Multi-file loops must not end the turn until every new entity is linked.
- **Per-turn linkage invariant.** Every non-bookkeeping entity touched in a turn MUST carry a `REFERS_TO` edge from either the user message (creates/updates) or the assistant message (reply-cited).
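
The per-turn linkage invariant lends itself to an end-of-turn check. The sketch below assumes edges are available as `(source_id, relationship_type, target_id)` tuples and that bookkeeping entities have already been filtered out by the caller; the function name `unlinked_entities` and all ids are hypothetical and not part of any Neotoma API.

```python
def unlinked_entities(touched_ids, edges, message_ids):
    """Return touched entity ids that lack a REFERS_TO edge from a turn message.

    touched_ids: non-bookkeeping entities created, updated, or cited this turn.
    edges:       (source_id, relationship_type, target_id) tuples (assumed shape).
    message_ids: ids of the user and assistant messages for this turn.
    """
    linked = {
        target
        for source, rel, target in edges
        if rel == "REFERS_TO" and source in message_ids
    }
    return [eid for eid in touched_ids if eid not in linked]


# Hypothetical turn: the source artifact is embedded by the report
# but has no REFERS_TO edge from either message.
touched = ["report_1", "contact_7", "source_3"]
edges = [
    ("msg_user", "REFERS_TO", "report_1"),
    ("msg_assistant", "REFERS_TO", "contact_7"),
    ("report_1", "EMBEDS", "source_3"),
]
missing = unlinked_entities(touched, edges, {"msg_user", "msg_assistant"})
```

A non-empty result means the turn should not end yet: each listed entity still needs its `REFERS_TO` edge.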

## Tasks and commitments

- **Base rule.** Create a task when the user expresses intent, obligation, or future action ("I need to", "remind me", deadlines). Set `due_date` when available and link to the relevant person or entity.
- **Outreach and reply-drafting.** When you produce or refine outbound text that commits the user to a future step with a named counterparty ("I'll reach out when…", "I'll send X after Y", "I'll loop back once…"), create a task and link it to the counterparty contact via `REFERS_TO`. Reuse the contact after retrieval; create if missing. Closers without a concrete follow-up do not require a task.
- **Scheduling cues.** When email, chat, screenshot, or pasted text implies arranging a future meeting or call ("pencil in", "another for \[month\]", "sync again", "catch up later"), create a task in the same extraction/store turn. Set `due_date` when a month or date is inferable; link the task to the relevant contact.
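
As a rough illustration of the rules above, the sketch below turns a commitment or scheduling cue into a task draft. It is a simplification under stated assumptions: real cue detection is a judgment call rather than substring matching, the cue list only echoes the examples quoted in this section, and `draft_task` plus every field name other than `due_date` and `REFERS_TO` is hypothetical.

```python
# Cue phrases taken from the examples in this section; not an exhaustive list.
COMMITMENT_CUES = (
    "i need to", "remind me", "i'll reach out", "i'll send", "i'll loop back",
    "pencil in", "sync again", "catch up later",
)


def draft_task(text, contact_id=None, due_date=None):
    """Return a task draft when the text carries a follow-up cue, else None."""
    lowered = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    if not any(cue in lowered for cue in COMMITMENT_CUES):
        return None  # a closer without a concrete follow-up needs no task

    task = {"entity_type": "task", "title": text.strip()}
    if due_date is not None:
        task["due_date"] = due_date  # set only when a date is stated or inferable

    relationships = []
    if contact_id is not None:
        # Link the task to the counterparty contact.
        relationships.append({"type": "REFERS_TO", "target": contact_id})
    return {"entity": task, "relationships": relationships}


followup = draft_task(
    "I'll send the deck after the board meeting",
    contact_id="contact_7",
    due_date="2025-03-01",
)
closer = draft_task("Thanks, talk soon!")
```

Here `followup` carries both a `due_date` and a `REFERS_TO` link to the contact, while `closer` is `None` because the text commits the user to nothing.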