Agent instructions: Retrieval, provenance, and tasks

Retrieval

Query shape. Use retrieve_entity_by_identifier for concrete identifiers (names, emails, ids, exact titles). Use retrieve_entities, scoped by entity_type with an explicit limit or time window, for plural or category queries ("last N transactions", "recent tasks").
Guardrails. Start with small, targeted queries. Avoid broad scans unless necessary. Use retrieved facts when relevant; if bounded retrieval finds nothing, proceed normally without inventing memory-backed claims.
Publication-recency. For "recently published" questions, sort by publication timestamp (published_date / published_at) descending, not by observation recency. Use a sufficient page size and dedupe by entity_id.
Entity-type cardinality. For "how many entities per type" questions, answer from getStats / GET /stats → entities_by_type first. list_entity_types reports schema field width, not row counts; never substitute one for the other.
Bounded completeness. For list/count answers from entity graphs, check likely equivalent containers/identifiers and relationship variants, dedupe by entity_id, and report the reconciled total (or note remaining ambiguity).
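The publication-recency and dedupe rules above can be sketched as a small post-processing step over retrieved records. This is an illustration only: it assumes each record is a plain dict carrying entity_id plus an ISO-8601 published_at or published_date string, and the helper names are hypothetical rather than part of any Neotoma client.

```python
from datetime import datetime, timezone


def _published(record):
    """Return the publication timestamp as an aware datetime, or None if absent."""
    # Assumption: timestamps are ISO-8601 strings under one of these two keys.
    raw = record.get("published_at") or record.get("published_date")
    if not raw:
        return None
    parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Treat naive values (for example date-only strings) as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def most_recently_published(records, limit=10):
    """Dedupe by entity_id, then sort by publication time, newest first."""
    newest = {}
    for record in records:
        ts = _published(record)
        if ts is None:
            continue  # no publication timestamp, so it cannot be ranked
        kept = newest.get(record["entity_id"])
        if kept is None or ts > kept[0]:
            newest[record["entity_id"]] = (ts, record)
    ranked = sorted(newest.values(), key=lambda pair: pair[0], reverse=True)
    return [record for _, record in ranked[:limit]]
```

Ranking on the publication field rather than on observation order is the point of the sketch: a record observed yesterday but published last year should not outrank one published last week.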

Provenance

Source provenance is required. Every entity carries traceable source data. For file-derived data, use the combined store path (entities + file_path or file_content+mime_type) and include source_file. For API or tool-sourced data, set data_source (tool, endpoint, date) and store the raw payload as api_response_data. FORBIDDEN: storing entities with no traceable source unless the data is purely user-stated in chat.
Three-layer analysis. When analyzing a named entity from source material, persist all three layers in the same turn: (1) the raw source artifact, (2) the named entity updated with sourced facts, (3) a synthesized note/report capturing derived conclusions. Link with REFERS_TO or EMBEDS.
Reuse pre-existing sources. If a raw source already exists in Neotoma, retrieve it and link the current conversation-derived entities to it in the same turn. Do not rely on an earlier store remaining discoverable without a relationship.
Source content retrieval. Files stored via the combined path are downloadable at GET /sources/:id/content; observations carry source_id for linkage. UIs should expose this endpoint so users can inspect the original artifact.
Unstructured payload retention. User-provided files, paths, @-references, attachments, uploads, and pasted blobs MUST be persisted in the same turn via the unstructured path with the attachment recipe. Host-only copies (Desktop, Downloads, repo folders) are not sufficient retention.
Synthesized deliverables. Reviews, reports, plans, audits, comparative analyses, and legal/competitive/market/technical research are stored as a structured entity (e.g. legal_research, competitive_analysis, technical_research, report) with title, subject, conclusion, key_findings, sources, caveats, and research_date. Do not respond with findings without storing them in the same turn.
Analysis durability. When asked for analysis or a briefing, do not rely only on chat message rows. Persist a structured note/report/research entity rich enough to reconstruct the answer, then link it to the analyzed entity and source.
Agent-authored deliverables. When the agent creates or materially edits a markdown, text, JSON, CSV, or similar file that is the substantive deliverable, store the file via the combined path, persist a structured entity describing it, and link the file asset, deliverable entity, and originating message. Repo-only or working-tree copies are not durable.
Session-derived artifacts. Any entity created from the current conversation in a separate store call MUST be linked back via REFERS_TO in the same turn (from the prompting user message or from the new entity to the conversation). Multi-file loops must not end the turn until every new entity is linked.
Per-turn linkage invariant. Every non-bookkeeping entity touched in a turn MUST carry a REFERS_TO edge from either the user message (creates/updates) or the assistant message (reply-cited).
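As a rough illustration of the source-provenance rule and the per-turn linkage invariant, the checks below could run before a turn ends. The field names source_file, data_source, and api_response_data come from the rules above; the dict shapes, the edge keys (type, source, target), and both function names are assumptions made for this sketch, not Neotoma's actual schema.

```python
def has_traceable_source(entity, user_stated=False):
    """Return True when the entity satisfies the source-provenance rule."""
    if user_stated:
        # Purely user-stated chat data is the one documented exception.
        return True
    if entity.get("source_file"):
        # File-derived data stored via the combined path.
        return True
    # API or tool-sourced data needs both the descriptor and the raw payload.
    return bool(entity.get("data_source")) and "api_response_data" in entity


def unlinked_entities(touched_ids, edges, message_ids):
    """List touched entity ids that lack a REFERS_TO edge from a turn message."""
    linked = {
        edge["target"]
        for edge in edges
        if edge["type"] == "REFERS_TO" and edge["source"] in message_ids
    }
    return [entity_id for entity_id in touched_ids if entity_id not in linked]
```

A non-empty result from unlinked_entities means the turn is not finished: each listed entity still needs an edge from the user or assistant message.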

Tasks and commitments

Base rule. Create a task when the user expresses intent, obligation, or future action ("I need to", "remind me", deadlines). Set due_date when available and link to the relevant person or entity.
Outreach and reply-drafting. When you produce or refine outbound text that commits the user to a future step with a named counterparty ("I'll reach out when…", "I'll send X after Y", "I'll loop back once…"), create a task and link it to the counterparty contact via REFERS_TO. Reuse the contact after retrieval; create if missing. Closers without a concrete follow-up do not require a task.
Scheduling cues. When email, chat, screenshot, or pasted text implies arranging a future meeting or call ("pencil in", "another for [month]", "sync again", "catch up later"), create a task in the same extraction/store turn. Set due_date when a month or date is inferable; link the task to the relevant contact.
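The task rules above reduce to a small payload: a task entity, an optional due_date, and a REFERS_TO link to the contact. The sketch below shows one possible shape. Only due_date, REFERS_TO, and the task type are taken from the rules; the function name, the remaining keys, and the way the link refers to the not-yet-stored task are hypothetical.

```python
from datetime import date


def build_followup_task(title, contact_id, due_date=None):
    """Assemble a task entity and its REFERS_TO link to a contact."""
    task = {"entity_type": "task", "title": title}
    if due_date is not None:
        # Set due_date only when a date or month is actually inferable.
        task["due_date"] = due_date.isoformat()
    link = {
        "relationship_type": "REFERS_TO",
        "source": "task",  # placeholder for the task created in the same store call
        "target_entity_id": contact_id,
    }
    return {"entity": task, "relationships": [link]}


# Example: "I'll send the draft after the review" with a known deadline.
payload = build_followup_task("Send draft after review", "contact_123", date(2025, 3, 14))
```

Omitting due_date when nothing is inferable, rather than guessing one, keeps the task honest about what the source text actually said.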