Agent instructions: Retrieval, provenance, and tasks


Retrieval

  • Query shape. Use retrieve_entity_by_identifier for concrete identifiers (names, emails, ids, exact titles). For plural or category queries ("last N transactions", "recent tasks"), use retrieve_entities scoped by entity_type with an explicit limit or time window.
  • Guardrails. Start with small, targeted queries. Avoid broad scans unless necessary. Use retrieved facts when relevant; if bounded retrieval finds nothing, proceed normally without inventing memory-backed claims.
  • Publication-recency. For "recently published" questions, sort by publication timestamp (published_date / published_at) descending, not by observation recency. Use a sufficient page size and dedupe by entity_id.
  • Entity-type cardinality. For "how many entities per type" questions, answer from the entities_by_type counts returned by getStats / GET /stats first. list_entity_types reports schema field width, not row counts; never substitute one for the other.
  • Bounded completeness. For list/count answers from entity graphs, check likely equivalent containers/identifiers and relationship variants, dedupe by entity_id, and report the reconciled total (or note remaining ambiguity).
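The publication-recency and bounded-completeness rules can be sketched as two small helpers over already-retrieved rows. This is a minimal illustration, not the Neotoma API: it assumes each row is a plain dict carrying entity_id and an optional published_date or published_at as a naive ISO-8601 string, and the function names recently_published and reconciled_total are hypothetical.

```python
from datetime import datetime


def recently_published(entities, limit):
    """Return up to `limit` distinct entities, newest publication first.

    Sorts on published_date (falling back to published_at), never on
    observation time, then keeps the first row seen for each entity_id.
    """
    def published(entity):
        raw = entity.get("published_date") or entity.get("published_at")
        # Assumes naive ISO-8601 strings; rows without a timestamp sort last.
        return datetime.fromisoformat(raw) if raw else datetime.min

    seen, result = set(), []
    for entity in sorted(entities, key=published, reverse=True):
        if len(result) >= limit:
            break
        if entity["entity_id"] in seen:
            continue  # duplicate row for an entity already kept
        seen.add(entity["entity_id"])
        result.append(entity)
    return result


def reconciled_total(*result_sets):
    """Count distinct entity_ids across results from equivalent queries
    (alternate containers, identifiers, or relationship variants)."""
    return len({e["entity_id"] for results in result_sets for e in results})
```

Sorting before deduplicating means the row kept for each entity is the one with the newest publication timestamp, so a stale duplicate cannot displace it.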

Provenance

  • Source provenance is required. Every entity carries traceable source data. For file-derived data, use the combined store path (entities + file_path or file_content+mime_type) and include source_file. For API or tool-sourced data, set data_source (tool, endpoint, date) and store the raw payload as api_response_data. FORBIDDEN: storing entities with no traceable source unless the data is purely user-stated in chat.
  • Three-layer analysis. When analyzing a named entity from source material, persist all three layers in the same turn: (1) the raw source artifact, (2) the named entity updated with sourced facts, (3) a synthesized note/report capturing derived conclusions. Link with REFERS_TO or EMBEDS.
  • Reuse pre-existing sources. If a raw source already exists in Neotoma, retrieve it and link the current conversation-derived entities to it in the same turn; do not rely on an earlier store remaining discoverable without a relationship.
  • Source content retrieval. Files stored via the combined path are downloadable at GET /sources/:id/content; observations carry source_id for linkage. UIs should expose this endpoint so users can inspect the original artifact.
  • Unstructured payload retention. User-provided files, paths, @-references, attachments, uploads, and pasted blobs MUST be persisted in the same turn via the unstructured path with the attachment recipe. Host-only copies (Desktop, Downloads, repo folders) are not sufficient retention.
  • Synthesized deliverables. Reviews, reports, plans, audits, comparative analyses, and legal/competitive/market/technical research must be stored as a structured entity (e.g. legal_research, competitive_analysis, technical_research, report) with title, subject, conclusion, key_findings, sources, caveats, and research_date. Do not respond with findings without storing them in the same turn.
  • Analysis durability. When asked for analysis or a briefing, do not rely only on chat message rows. Persist a structured note, report, or research entity rich enough to reconstruct the answer, then link it to the analyzed entity and its source.
  • Agent-authored deliverables. When the agent creates or materially edits a markdown, text, JSON, CSV, or similar file that is the substantive deliverable, store the file via the combined path, persist a structured entity describing it, and link the file asset, deliverable entity, and originating message. Repo-only or working-tree copies are not durable.
  • Session-derived artifacts. Any entity created from the current conversation in a separate store call MUST be linked back via REFERS_TO in the same turn (from the prompting user message or from the new entity to the conversation). Multi-file loops must not end the turn until every new entity is linked.
  • Per-turn linkage invariant. Every non-bookkeeping entity touched in a turn MUST carry a REFERS_TO edge from either the user message (creates/updates) or the assistant message (reply-cited).
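A pre-reply check for the provenance and per-turn linkage rules might look like the sketch below. It is hypothetical throughout: the Entity dataclass, the user_stated flag, the (source_id, relationship, target_id) edge triples, and the bookkeeping_types set are illustrative assumptions, not Neotoma's data model. Only the field names source_file and data_source and the REFERS_TO relationship come from the rules above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    entity_id: str
    entity_type: str
    source_file: Optional[str] = None   # set for file-derived data
    data_source: Optional[dict] = None  # tool, endpoint, date for API/tool data
    user_stated: bool = False           # hypothetical flag: purely user-stated in chat


def has_traceable_source(entity):
    """True if the entity carries a file source, an API/tool source, or is user-stated."""
    return bool(entity.source_file or entity.data_source or entity.user_stated)


def turn_violations(touched, edges, user_msg_id, assistant_msg_id, bookkeeping_types):
    """Check provenance and the per-turn linkage invariant for one turn.

    `edges` is a list of (source_id, relationship, target_id) triples.
    Returns the ids of non-bookkeeping entities that lack a traceable source
    and those that lack a REFERS_TO edge from the user or assistant message.
    """
    linked = {
        target
        for source, relationship, target in edges
        if relationship == "REFERS_TO" and source in (user_msg_id, assistant_msg_id)
    }
    checked = [e for e in touched if e.entity_type not in bookkeeping_types]
    return {
        "missing_source": [e.entity_id for e in checked if not has_traceable_source(e)],
        "unlinked": [e.entity_id for e in checked if e.entity_id not in linked],
    }
```

An agent loop could run this before ending the turn and keep storing or linking until both lists are empty; note that an EMBEDS edge alone does not satisfy the REFERS_TO invariant in this sketch.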

Tasks and commitments

  • Base rule. Create a task when the user expresses intent, obligation, or future action ("I need to", "remind me", deadlines). Set due_date when available and link to the relevant person or entity.
  • Outreach and reply-drafting. When you produce or refine outbound text that commits the user to a future step with a named counterparty ("I'll reach out when…", "I'll send X after Y", "I'll loop back once…"), create a task and link it to the counterparty contact via REFERS_TO. Reuse the contact after retrieval; create if missing. Closers without a concrete follow-up do not require a task.
  • Scheduling cues. When email, chat, screenshot, or pasted text implies arranging a future meeting or call ("pencil in", "another for [month]", "sync again", "catch up later"), create a task in the same extraction/store turn. Set due_date when a month or date is inferable; link the task to the relevant contact.
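The scheduling-cue rule can be sketched as a cue check, a due-date guess, and a task record linked to the contact. The cue phrases come from the rule above; everything else is an assumption for illustration: the helper names, the task dict shape, and the policy of using the first day of the next occurrence of a named month (or today, if that month is the current one).

```python
import calendar
import re
from datetime import date

# Cue phrases taken from the scheduling rule; a real detector would need more.
CUES = ("pencil in", "another for", "sync again", "catch up later")
MONTHS = {name.lower(): i for i, name in enumerate(calendar.month_name) if name}


def has_scheduling_cue(text):
    lowered = text.lower()
    return any(cue in lowered for cue in CUES)


def infer_due_date(text, today):
    """Hypothetical policy: first day of the next occurrence of a named month,
    or today if the named month is the current one. None if no month appears."""
    for word in re.findall(r"[A-Za-z]+", text):
        month = MONTHS.get(word.lower())
        if month:
            year = today.year if month >= today.month else today.year + 1
            return max(date(year, month, 1), today)
    return None


def scheduling_task(text, contact_id, today):
    """Build a task plus a REFERS_TO link to the counterparty contact."""
    task = {"entity_type": "task", "title": f"Schedule follow-up: {text.strip()}"}
    due = infer_due_date(text, today)
    if due:
        task["due_date"] = due.isoformat()
    return task, [("REFERS_TO", contact_id)]
```

Plain keyword matching is deliberately naive: it reads the modal verb "may" as the month May, so a production extractor would need context-aware date parsing.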