Neotoma

Agent instructions: Display, attribution, and conventions


Communication and display

  • Silent storage default. Do not mention storage, memory, or linking unless the user asked, except when a turn created, updated, or retrieved entities (then the display rule below applies). Do not narrate internal persistence ("Storing the conversation first"); when a confirmation is warranted, use memory-related language ("remember", "recall", "stored in memory").
  • Proactive storage. Store when the user states relevant information; store first, then respond. Do not skip storage just because the user did not ask to save.
  • Repo canon is additive. When asked to capture a durable principle or standing rule in a repo document, persist the durable fact in Neotoma in the same turn, then update the repo document if the repo is also a source of truth. "This belongs in the repo" is not a reason to skip Neotoma.
  • External tool store-first. When you pull data from any external source (email, calendar, search, web fetch, scrape, API, file read), extract and store people, companies, locations, events, tasks, notifications, device status, and relationships in the same turn, before responding. Create tasks for action items.
  • Depth of capture. List/summary responses are index rows, not the final payload. Hydrate via the matching detail endpoint and persist both layers under api_response_data.list and api_response_data.detail.
    • Scope cap: hydrate up to ~10 items per turn unless the user asks for more.
    • Size cap: if a detail body exceeds ~100 KB, persist via the unstructured path and link with EMBEDS.
    • Sensitivity: for email/DM/document sources, persist the body but do not echo it into chat beyond what answering requires.
    • Idempotent upgrade: hydrate via correct on an existing entity_id rather than creating a duplicate.
    • Tool-capability awareness: only hydrate when a detail endpoint exists and is cheap. Otherwise persist what the list returned and set capture_depth: "summary_only" for later enrichment.
  • Embedded entity extraction. Once a payload is in hand, scan for first-class entities (a subscription charge in a billing email → transaction; a meeting proposal → event + task; an order in a receipt → one order_item per line). Store each alongside the container in the same call, with its own data_source, source_quote, and normalized fields. Link container→embedded with REFERS_TO. Cap embedded extractions per container at ~20 items.
  • User identity. When the user provides or implies their identity (name, email, "me"), store as contact or person in the same turn.
  • Extract-all. Extract every distinct entity from the user: people, tasks, events, commitments, preferences, possessions, relationships, places. For container+asset, use EMBEDS when the asset is in Neotoma; otherwise store only a reference on the container.
  • Display rule. When a turn creates, updates, or retrieves non-bookkeeping entities, render a section headed 🧠 Neotoma with a horizontal rule above it.
    • Groups: only non-empty Created (N), Updated (N), Retrieved (N), and Ambiguous (N); Ambiguous appears when the store response includes warnings[] with code: "HEURISTIC_MERGE".
    • Store disambiguation: entities created or observation-updated this turn (including external-tool ingestion) appear under Created or Updated, never under Retrieved.
    • Bullet format: each bullet starts with one schema-typed emoji (✅ task, 👤 contact, 🏢 company, 📅 event, ✉️ email_message, 🧾 receipt, 💸 transaction, 📝 note, 📍 place, 📎 file_asset, 🔍 research, 💬 product_feedback; default 🗂️), uses a short primary label, omits verbs already in the group header, and ends with the schema entity_type in inline-code parentheses.
    • Empty state: before rendering an empty-state or Suggestions block, run a final capture pass, store any concrete candidate (synthesized note/report, implied task, authored artifact) and render it under Created/Updated instead of suggesting it.
    • Override scope: the display rule overrides the silent-storage default and the no-emoji style for this disclosure only; do not narrate internal sequencing.
  • Weekly value surfacing. When the conversation is the first of the day or the user has not interacted for several days, run a bounded retrieval (recent time window or list_timeline_events for the past 7 days) and surface a 1-2 sentence summary. Do not surface this more than once per day.
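
The display rule above is mechanical enough to sketch. The following is an illustrative rendering helper, not part of the Neotoma API: the group order, emoji map, and bullet shape come from this document, while the function signature and input shape are assumptions.

```python
# Emoji map from the bullet-format rule; 🗂️ is the documented default.
EMOJI = {
    "task": "✅", "contact": "👤", "company": "🏢", "event": "📅",
    "email_message": "✉️", "receipt": "🧾", "transaction": "💸",
    "note": "📝", "place": "📍", "file_asset": "📎",
    "research": "🔍", "product_feedback": "💬",
}

def render_neotoma_section(groups: dict) -> str:
    """groups maps 'Created'/'Updated'/'Retrieved'/'Ambiguous' to lists of
    (label, entity_type) pairs; empty groups are omitted entirely."""
    lines = ["---", "🧠 Neotoma"]  # horizontal rule above the header
    for name in ("Created", "Updated", "Retrieved", "Ambiguous"):
        items = groups.get(name, [])
        if not items:
            continue  # only non-empty groups are rendered
        lines.append(f"{name} ({len(items)})")
        for label, entity_type in items:
            emoji = EMOJI.get(entity_type, "🗂️")
            lines.append(f"  {emoji} {label} (`{entity_type}`)")
    return "\n".join(lines)
```

The verb lives in the group header ("Created (1)"), so the bullets carry only the label and the schema type in inline code, matching the bullet-format rule.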

Attribution and agent identity

Every write to Neotoma is attributed per row and surfaces in the Inspector, /stats, and audit trails. Self-identification is a user-facing contract; see the AAuth reference for the full attribution flow.

  • Preferred: AAuth. Sign requests with AAuth (RFC 9421 HTTP Message Signatures plus an aa-agent+jwt agent token). Verified agents render with a hardware trust badge for ES256/EdDSA keys or a software badge for other algorithms. Honoured on /mcp, direct write routes, and /session; the same identity threads into the write-path services regardless of transport.
  • Fallback: clientInfo. When AAuth is unavailable, set clientInfo.name and clientInfo.version on the MCP initialize handshake to a recognisable identifier (e.g. cursor-agent + build, claude-code + release). Generic values like mcp, client, unknown, or anonymous are normalised to the anonymous tier.
  • Optional free-form label. Scripts and CI jobs may include agent_label or agent_id on the payload. Copied to provenance but never used for authorization.
  • Do not spoof. Copying another agent's clientInfo, reusing a public-key thumbprint, or inventing agent_sub/agent_iss pairs is a policy breach. Future releases will enforce per-tier ACLs.
  • Inspector contract. The Inspector exposes an Agent column and filter across entities, observations, relationships, sources, timeline events, and interpretations; the Settings page summarises attribution coverage.
  • Preflight your session. Before enabling writes from a new client or proxy, call get_session_identity (or GET /session, or neotoma auth session) and verify attribution.tier is software/hardware and eligible_for_trusted_writes is true.
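
The preflight check above can be sketched as a small guard. This is an assumption-laden illustration: the field names attribution.tier and eligible_for_trusted_writes come from this document, but the exact response shape of get_session_identity (which fields nest where) is not specified here, so the helper reads both defensively.

```python
def writes_allowed(session: dict) -> bool:
    """Return True only when the session identity clears both documented
    preflight gates: a software/hardware tier and trusted-write eligibility."""
    attribution = session.get("attribution", {})
    tier_ok = attribution.get("tier") in ("software", "hardware")
    # eligible_for_trusted_writes is assumed top-level; adjust if your
    # session payload nests it differently.
    return tier_ok and session.get("eligible_for_trusted_writes") is True
```

An anonymous-tier session fails the check even if it claims write eligibility, which is the point of verifying both fields before enabling writes from a new client or proxy.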

Conventions

  • Transport precedence. When both neotoma (prod) and neotoma-dev MCP servers are available, default to neotoma; use neotoma-dev only when the user requests dev or the task is clearly dev-only.
  • Avoid get_authenticated_user unless the next action needs it.
  • Pre-check before storing. Look for existing records by name/type before inserting; reuse the existing entity_id for relationships if found.
  • Include all fields from the source when storing.
  • User scope vs CLI. MCP infers user_id from authentication; omit it on tool calls. Operators using NEOTOMA_API_ONLY can scope reads via --user-id or NEOTOMA_USER_ID in the CLI.
  • Combined-store remote files. When Neotoma is reachable over HTTP but not on the same host as the attachment, use file_content + mime_type (and original_filename when known); file_path only works when the server can read that path.
  • Store resolution errors. On ERR_STORE_RESOLUTION_FAILED, inspect issues[].hint for payload-shape guidance and surface the error envelope. Do not treat entities_created=0 as success when replayed is not true and no entities matched.
  • Structured vs unstructured path. Use the entities array for conversation-sourced data (omit original_filename). For tool- or file-derived data, prefer the combined path so the raw source is preserved. Use the unstructured path for raw file preservation; pass raw bytes, do not interpret.
  • CLI parity. In CLI backup mode, entities search accepts a positional identifier or --identifier; structured store accepts --entities/--file and the alias --json=<json>.
  • CLI backup transport. When MCP is available or when reconciling counts with MCP, run CLI reads/writes via API transport (--api-only or explicit --base-url); do not rely on the offline-default transport unless the user requests local data.
  • Summarization after MCP actions. Follow the display rule exactly. Do not dump raw snapshot keys or internal ids into prose; do not repeat the same phrase as both thought and section heading.
  • Update check. At session start, optionally call npm_check_update with packageName: "neotoma" and prompt the user to upgrade if an update is available.
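
The pre-check-before-storing convention amounts to find-or-create on (name, entity_type). The sketch below models it against an in-memory stand-in for Neotoma; a real agent would route the search and store through the MCP tools, whose call shapes are not reproduced here.

```python
import uuid

class FakeStore:
    """In-memory stand-in for the entity store, for illustration only."""

    def __init__(self):
        self.entities: dict = {}

    def find(self, name: str, entity_type: str):
        # Pre-check: look for an existing record by name/type.
        for entity_id, e in self.entities.items():
            if e["name"] == name and e["entity_type"] == entity_type:
                return entity_id
        return None

    def store(self, name: str, entity_type: str, **fields) -> str:
        # Reuse the existing entity_id instead of inserting a duplicate;
        # new fields enrich the existing record.
        existing = self.find(name, entity_type)
        if existing is not None:
            self.entities[existing].update(fields)
            return existing
        entity_id = str(uuid.uuid4())
        self.entities[entity_id] = {"name": name, "entity_type": entity_type, **fields}
        return entity_id
```

Because the returned entity_id is stable across repeated stores, relationships created later in the turn attach to the same record rather than to a duplicate.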

Feedback reporting

When you hit friction using Neotoma (a failing tool call, an opaque error, a missing surface, or a doc gap), submit feedback proactively via submit_feedback. This is how fixes get prioritized.

  • Reporting modes. Default is proactive. If the user runs neotoma feedback mode consent, ask once per submission; if the mode is off, only submit when the user explicitly asks.
  • PII redaction. Redact emails, phone numbers, API tokens, UUIDs, and home-directory path fragments with <LABEL:hash> placeholders before submission. The server applies a backstop redaction pass and returns redaction_preview for audit.
  • metadata.environment. MUST include at minimum neotoma_version, client_name, os; add tool_name, invocation_shape, error_message, and best-effort error_class/hit_count when applicable.
  • Persist a product_feedback record. Immediately after submit_feedback returns, store or update a Neotoma product_feedback entity with feedback_id, access_token, kind, title, submitted_at, next_check_suggested_at, and current status. Treat access_token as sensitive: keep it inside Neotoma and never expose it in logs or user-visible prose.
  • Polling. Poll via get_feedback_status(access_token); respect next_check_suggested_at and do not poll more frequently. The token is single-purpose; do not share or log it beyond your own agent context.
  • Upgrade and verification. When upgrade_guidance is present, treat it as actionable: run or propose install_commands, follow verification_steps, then re-attempt the original invocation. If verification_request is present, submit a kind=fix_verification follow-up with parent_feedback_id and verification_outcome by verify_by; silence is treated as unable_to_verify.
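
The client-side redaction pass can be sketched directly from the <LABEL:hash> placeholder convention above. The patterns below are an illustrative subset (email, UUID, home-directory fragment); real phone-number and API-token detection needs broader patterns, and the server's backstop pass remains the safety net either way.

```python
import hashlib
import re

# Illustrative detectors; deliberately simple, not exhaustive.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "UUID": re.compile(r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
                       r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"),
    "HOME_PATH": re.compile(r"/(?:home|Users)/[^\s/]+"),
}

def redact(text: str) -> str:
    """Replace each PII match with a <LABEL:hash> placeholder, where the
    hash is a short digest of the original value (stable for audit)."""
    for label, pattern in PATTERNS.items():
        def replace(m, label=label):
            digest = hashlib.sha256(m.group(0).encode()).hexdigest()[:8]
            return f"<{label}:{digest}>"
        text = pattern.sub(replace, text)
    return text
```

Hashing rather than blanking keeps placeholders stable across a report, so two mentions of the same email redact to the same token and the redaction_preview stays auditable.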

Errors and recovery

  • Store retry policy. If store_structured fails, retry once with the same payload. If it fails again, surface the error to the user ("Storage failed: [error message]") before responding with any retrieved data. Do not silently skip storage and respond as if it succeeded.
  • SQLite corruption. On errors such as "database disk image is malformed", SQLITE_CORRUPT, btreeInitPage, or failed integrity checks, tell the user the local SQLite file is likely corrupted and suggest neotoma storage recover-db first, then neotoma storage recover-db --recover after the user stops Neotoma. Do not auto-swap the recovered DB without explicit approval.
  • getStats unreachable. If getStats is unreachable when answering entity-type cardinality questions, state that explicitly rather than substituting an expensive per-type count or a schema-width value.
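
The retry policy above is a retry-once-then-surface wrapper. In this sketch, store_fn stands in for the real store_structured tool call; the "Storage failed: [error message]" wording follows the rule above.

```python
def store_with_retry(store_fn, payload):
    """Attempt the store, retry once with the identical payload, and
    surface the failure rather than silently skipping storage."""
    try:
        return store_fn(payload)
    except Exception as first_error:
        try:
            return store_fn(payload)  # single retry, same payload
        except Exception:
            # Surfaced to the user before any retrieved data is shown.
            raise RuntimeError(f"Storage failed: {first_error}")
```

The key behavior is that a double failure is loud: the caller cannot proceed as if storage succeeded, which is exactly what the policy forbids.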

Onboarding

  • Discovery flow. When Neotoma has little or no data (first run or empty state), follow the install workflow:
    1. Ask the user which data types matter most (project files, chat transcripts, meeting notes, journals, code context, email, financial docs, custom paths) and which mode they prefer (quick win, guided, power user).
    2. Discover high-value local files by shallow scan, ranked by entity density, temporal signals, recency, and relationship potential.
    3. Group results into domains and explain why each was selected.
    4. Confirm per-folder or per-file with a reconstruction preview.
    5. Ingest confirmed files and reconstruct the strongest timeline with provenance.
    6. Show the timeline immediately, not a file count.
    7. Offer one targeted follow-up plus 2-4 leveraged next actions.
    8. Demonstrate correction.
  • Output rule (Installation Aha). After first-run ingestion, the first visible output MUST be a reconstructed timeline with provenance, not a file count. Format: "[Entity name], Timeline reconstructed from [N] sources" followed by dated events each with "Source: [filename], [location]".
  • Chat transcript discovery. Check for chat transcript exports (ChatGPT JSON, Slack exports, Claude history, meeting transcripts). They are the highest-signal ingestion source: they encode decisions, commitments, and project discussions with timestamps, which makes them ideal for timeline reconstruction.
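
The Installation Aha output rule pins down the exact strings, so it can be sketched as a formatter. The header and "Source: [filename], [location]" lines follow the format above; the event dictionary shape (date, summary, filename, location) is an assumption for illustration.

```python
def render_timeline(entity_name: str, events: list) -> str:
    """Render the first-run output: a reconstructed timeline with
    per-event provenance, never a bare file count."""
    sources = {(e["filename"], e["location"]) for e in events}
    lines = [f"{entity_name}, Timeline reconstructed from {len(sources)} sources"]
    for e in sorted(events, key=lambda e: e["date"]):  # ISO dates sort lexically
        lines.append(f"{e['date']}: {e['summary']}")
        lines.append(f"  Source: {e['filename']}, {e['location']}")
    return "\n".join(lines)
```

Note that the header counts distinct sources, not events, so ten events reconstructed from two files still reads "from 2 sources".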