<!--
Full-page Markdown export (rendered HTML → GFM).
Source: https://neotoma.io/ru/agent-instructions/display-conventions
Generated: 2026-04-28T13:34:39.495Z
-->
# Agent instructions: Display, attribution, and conventions
[← Agent instructions](/agent-instructions)
## Communication and display
- **Silent storage default.** Do not mention storage, memory, or linking unless the user asked, except when a turn created, updated, or retrieved entities (then the display rule below applies). Do not narrate internal persistence ("Storing the conversation first"). When confirming, use memory-related language ("remember", "recall", "stored in memory").
- **Proactive storage.** Store when the user states relevant information; store first, then respond. Do not skip storage just because the user did not ask to save.
- **Repo canon is additive.** When asked to capture a durable principle or standing rule in a repo document, persist the durable fact in Neotoma in the same turn, then update the repo document if the repo is also a source of truth. "This belongs in the repo" is not a reason to skip Neotoma.
- **External tool store-first.** When you pull data from any external source (email, calendar, search, web fetch, scrape, API, file read), extract and store people, companies, locations, events, tasks, notifications, device status, and relationships in the same turn, before responding. Create tasks for action items.
- **Depth of capture.** List/summary responses are index rows, not the final payload. Hydrate via the matching detail endpoint and persist both layers under `api_response_data.list` and `api_response_data.detail`.
- *Scope cap:* hydrate up to ~10 items per turn unless the user asks for more.
- *Size cap:* if a detail body exceeds ~100 KB, persist via the unstructured path and link with `EMBEDS`.
- *Sensitivity:* for email/DM/document sources, persist the body but do not echo it into chat beyond what answering requires.
- *Idempotent upgrade:* hydrate via `correct` on an existing `entity_id` rather than creating a duplicate.
- *Tool-capability awareness:* only hydrate when a detail endpoint exists and is cheap. Otherwise persist what the list returned and set `capture_depth: "summary_only"` for later enrichment.
- **Embedded entity extraction.** Once a payload is in hand, scan for first-class entities (a subscription charge in a billing email → `transaction`; a meeting proposal → `event` + `task`; an order in a receipt → one `order_item` per line). Store each alongside the container in the same call, with its own `data_source`, `source_quote`, and normalized fields. Link container→embedded with `REFERS_TO`. Cap embedded extractions per container at ~20 items.
- **User identity.** When the user provides or implies their identity (name, email, "me"), store as contact or person in the same turn.
- **Extract-all.** Extract every distinct entity from the user: people, tasks, events, commitments, preferences, possessions, relationships, places. For container+asset, use `EMBEDS` when the asset is in Neotoma; otherwise store only a reference on the container.
- **Display rule.** When a turn creates, updates, or retrieves non-bookkeeping entities, render a section headed `🧠 Neotoma` with a horizontal rule above it.
- *Groups:* only non-empty `Created (N)`, `Updated (N)`, `Retrieved (N)`, and `Ambiguous (N)`; `Ambiguous` appears when the store response includes `warnings[]` with `code: "HEURISTIC_MERGE"`.
- *Store disambiguation:* entities created or observation-updated this turn (including external-tool ingestion) appear under `Created` or `Updated`, never under `Retrieved`.
- *Bullet format:* each bullet starts with one schema-typed emoji (✅ task, 👤 contact, 🏢 company, 📅 event, ✉️ email\_message, 🧾 receipt, 💸 transaction, 📝 note, 📍 place, 📎 file\_asset, 🔍 research, 💬 product\_feedback; default 🗂️), uses a short primary label, omits verbs already in the group header, and ends with the schema `entity_type` in inline-code parentheses.
- *Empty state:* before rendering an empty-state or `Suggestions` block, run a final capture pass, store any concrete candidate (synthesized note/report, implied task, authored artifact) and render it under `Created`/`Updated` instead of suggesting it.
- *Override scope:* the display rule overrides the silent-storage default and the no-emoji style for this disclosure only; do not narrate internal sequencing.
- **Weekly value surfacing.** When the conversation is the first of the day or the user has not interacted for several days, run a bounded retrieval (recent time window or `list_timeline_events` for the past 7 days) and surface a 1-2 sentence summary. Do not surface this more than once per day.
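The display rule and bullet format above can be sketched as a small renderer. This is a minimal illustration, not the actual Neotoma client API: the `(group, label, entity_type)` tuple shape and the function name are assumptions; only the emoji mapping, group ordering, non-empty-group rule, and bullet layout come from the rule itself.

```python
# Minimal sketch of the 🧠 Neotoma display rule. Input shape is
# illustrative: a list of (group, label, entity_type) tuples.
EMOJI = {
    "task": "✅", "contact": "👤", "company": "🏢", "event": "📅",
    "email_message": "✉️", "receipt": "🧾", "transaction": "💸",
    "note": "📝", "place": "📍", "file_asset": "📎",
    "research": "🔍", "product_feedback": "💬",
}

def render_neotoma_block(items):
    """items: list of (group, label, entity_type);
    group is one of Created / Updated / Retrieved / Ambiguous."""
    groups = {}
    for group, label, entity_type in items:
        groups.setdefault(group, []).append((label, entity_type))
    lines = ["---", "🧠 Neotoma"]  # horizontal rule above the header
    for group in ("Created", "Updated", "Retrieved", "Ambiguous"):
        rows = groups.get(group)
        if not rows:  # only non-empty groups are rendered
            continue
        lines.append(f"{group} ({len(rows)})")
        for label, entity_type in rows:
            emoji = EMOJI.get(entity_type, "🗂️")  # default 🗂️
            # Short primary label, no verb (it is in the group header),
            # schema entity_type in inline-code parentheses.
            lines.append(f"- {emoji} {label} (`{entity_type}`)")
    return "\n".join(lines)
```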
---
## Attribution and agent identity
Every write to Neotoma is attributed per row and surfaces in the Inspector, `/stats`, and audit trails. Self-identification is a user-facing contract; see the [AAuth reference](/aauth) for the full attribution flow.
- **Preferred, AAuth.** Sign requests with AAuth (RFC 9421 HTTP Message Signatures plus an `aa-agent+jwt` agent token). Verified agents render with a `hardware` trust badge for ES256/EdDSA keys or `software` for other algorithms. Honoured on `/mcp`, direct write routes, and `/session`; the same identity threads into the write-path services regardless of transport.
- **Fallback, clientInfo.** When AAuth is unavailable, set `clientInfo.name` and `clientInfo.version` on the MCP `initialize` handshake to a recognisable identifier (e.g. `cursor-agent` + build, `claude-code` + release). Generic values like `mcp`, `client`, `unknown`, or `anonymous` are normalised to the `anonymous` tier.
- **Optional free-form label.** Scripts and CI jobs may include `agent_label` or `agent_id` on the payload. Copied to provenance but never used for authorization.
- **Do not spoof.** Copying another agent's `clientInfo`, reusing a public-key thumbprint, or inventing `agent_sub`/`agent_iss` pairs is a policy breach. Future releases will enforce per-tier ACLs.
- **Inspector contract.** The Inspector exposes an Agent column and filter across entities, observations, relationships, sources, timeline events, and interpretations; the Settings page summarises attribution coverage.
- **Preflight your session.** Before enabling writes from a new client or proxy, call `get_session_identity` (or `GET /session`, or `neotoma auth session`) and verify `attribution.tier` is `software`/`hardware` and `eligible_for_trusted_writes` is true.
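The preflight check above reduces to a small predicate. A sketch under stated assumptions: the exact response payload of `get_session_identity` is not specified here, so the field placement (`attribution.tier` nested, `eligible_for_trusted_writes` at the top level) and the function name are illustrative.

```python
# Sketch of the write-preflight check on a get_session_identity
# response. Field placement is assumed from the prose, not a schema.
def writes_allowed(session: dict) -> bool:
    attribution = session.get("attribution", {})
    return (
        attribution.get("tier") in ("software", "hardware")
        and session.get("eligible_for_trusted_writes") is True
    )
```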
---
## Conventions
- **Transport precedence.** When both `neotoma` (prod) and `neotoma-dev` MCP servers are available, default to `neotoma`; use `neotoma-dev` only when the user requests dev or the task is clearly dev-only.
- **Avoid `get_authenticated_user`** unless the next action needs it.
- **Pre-check before storing.** Look for existing records by name/type before inserting; reuse the existing `entity_id` for relationships if found.
- **Include all fields from the source** when storing.
- **User scope vs CLI.** MCP infers `user_id` from authentication; omit it on tool calls. Operators using `NEOTOMA_API_ONLY` can scope reads via `--user-id` or `NEOTOMA_USER_ID` in the CLI.
- **Combined-store remote files.** When Neotoma is reachable over HTTP but not on the same host as the attachment, use `file_content + mime_type` (and `original_filename` when known); `file_path` only works when the server can read that path.
- **Store resolution errors.** On `ERR_STORE_RESOLUTION_FAILED`, inspect `issues[].hint` for payload-shape guidance and surface the error envelope. Do not treat `entities_created=0` as success when `replayed` is not true and no entities matched.
- **Structured vs unstructured path.** Use the entities array for conversation-sourced data (omit `original_filename`). For tool- or file-derived data, prefer the combined path so the raw source is preserved. Use the unstructured path for raw file preservation; pass raw bytes, do not interpret.
- **CLI parity.** In CLI backup mode, `entities search` accepts a positional identifier or `--identifier`; structured `store` accepts `--entities`/`--file` and the alias `--json=<json>`.
- **CLI backup transport.** When MCP is available or when reconciling counts with MCP, run CLI reads/writes via API transport (`--api-only` or explicit `--base-url`); do not rely on the offline-default transport unless the user requests local data.
- **Summarization after MCP actions.** Follow the display rule exactly. Do not dump raw snapshot keys or internal ids into prose; do not repeat the same phrase as both thought and section heading.
- **Update check.** At session start, optionally call `npm_check_update` with `packageName: "neotoma"` and prompt the user to upgrade if an update is available.
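The combined-store rule for remote files can be sketched as a payload builder. Assumptions are labeled: the function name, the `server_is_local` flag, and the use of base64 for `file_content` are illustrative; only the `file_path` vs `file_content + mime_type + original_filename` split comes from the convention above.

```python
import base64
import os

def build_file_payload(path: str, mime_type: str, server_is_local: bool) -> dict:
    """Sketch of the combined-store choice: file_path only works when the
    Neotoma server can read that path; otherwise inline the bytes.
    base64 encoding of file_content is an assumption, not a documented wire format."""
    if server_is_local:
        return {"file_path": path, "mime_type": mime_type}
    with open(path, "rb") as f:
        content = base64.b64encode(f.read()).decode("ascii")
    return {
        "file_content": content,
        "mime_type": mime_type,
        "original_filename": os.path.basename(path),  # include when known
    }
```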
---
## Feedback reporting
When you hit friction using Neotoma (a failing tool call, an opaque error, a missing surface, or a doc gap), submit feedback proactively via `submit_feedback`. This is how fixes get prioritized.
- **Reporting modes.** Default is proactive. If the user runs `neotoma feedback mode consent`, ask once per submission; if `off`, only submit when the user explicitly asks.
- **PII redaction.** Redact emails, phone numbers, API tokens, UUIDs, and home-directory path fragments with `<LABEL:hash>` placeholders before submission. The server applies a backstop redaction pass and returns `redaction_preview` for audit.
- **metadata.environment.** MUST include at minimum `neotoma_version`, `client_name`, `os`; add `tool_name`, `invocation_shape`, `error_message`, and best-effort `error_class`/`hit_count` when applicable.
- **Persist a product\_feedback record.** Immediately after `submit_feedback` returns, store or update a Neotoma `product_feedback` entity with `feedback_id`, `access_token`, `kind`, title, `submitted_at`, `next_check_suggested_at`, and current status. Treat `access_token` as sensitive: keep it inside Neotoma and never surface it in logs or user-visible prose.
- **Polling.** Poll via `get_feedback_status(access_token)`; respect `next_check_suggested_at` and do not poll more frequently. The token is single-purpose; do not share or log it beyond your own agent context.
- **Upgrade and verification.** When `upgrade_guidance` is present, treat it as actionable: run or propose `install_commands`, follow `verification_steps`, then re-attempt the original invocation. If `verification_request` is present, submit a `kind=fix_verification` follow-up with `parent_feedback_id` and `verification_outcome` by `verify_by`; silence is treated as `unable_to_verify`.
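The client-side PII redaction pass above can be sketched with regexes and hashed placeholders. The pattern set and the hash length are illustrative assumptions; the `<LABEL:hash>` placeholder shape is the documented one, and the server still applies its own backstop pass.

```python
import hashlib
import re

# Illustrative pattern set; the real redactor also covers phone
# numbers and API tokens, which need more careful patterns.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "UUID": re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
        r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    ),
    "HOME_PATH": re.compile(r"/(?:home|Users)/[\w.-]+"),
}

def redact(text: str) -> str:
    """Replace each match with a <LABEL:hash> placeholder. Using the
    first 8 hex chars of SHA-256 of the match is an assumption; it keeps
    placeholders stable so repeated values stay correlatable in a report."""
    for label, pattern in PATTERNS.items():
        def repl(m, label=label):
            digest = hashlib.sha256(m.group(0).encode()).hexdigest()[:8]
            return f"<{label}:{digest}>"
        text = pattern.sub(repl, text)
    return text
```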
---
## Errors and recovery
- **Store retry policy.** If `store_structured` fails, retry once with the same payload. If it fails again, surface the error to the user ("Storage failed: \[error message\]") before responding with any retrieved data. Do not silently skip storage and respond as if it succeeded.
- **SQLite corruption.** On `database disk image is malformed`, `SQLITE_CORRUPT`, `btreeInitPage`, or failed integrity checks, tell the user the local SQLite file is likely corrupted and suggest `neotoma storage recover-db` first, then `neotoma storage recover-db --recover` after the user stops Neotoma. Do not auto-swap the recovered DB without explicit approval.
- **getStats unreachable.** If `getStats` is unreachable when answering entity-type cardinality questions, state that explicitly rather than substituting an expensive per-type count or a schema-width value.
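The store retry policy above is simple enough to sketch directly. The wrapper name and the use of an exception to surface failure are illustrative; the policy itself (one retry with the identical payload, then report "Storage failed: ..." rather than silently continuing) is the rule.

```python
def store_with_retry(store_fn, payload):
    """Sketch of the store retry policy: one retry with the same payload,
    then surface the error (modeled here as raising) instead of silently
    skipping storage and responding as if it succeeded."""
    try:
        return store_fn(payload)
    except Exception:
        try:
            return store_fn(payload)  # single retry, identical payload
        except Exception as err:
            raise RuntimeError(f"Storage failed: {err}")
```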
---
## Onboarding
- **Discovery flow.** When Neotoma has little or no data (first run or empty state), follow the install workflow: (1) ask the user which data types matter most (project files, chat transcripts, meeting notes, journals, code context, email, financial docs, custom paths) and which mode they prefer (quick win, guided, power user); (2) discover high-value local files by shallow scan, ranked by entity density, temporal signals, recency, and relationship potential; (3) group results into domains and explain why each was selected; (4) confirm per-folder or per-file with a reconstruction preview; (5) ingest confirmed files and reconstruct the strongest timeline with provenance; (6) show the timeline immediately, not a file count; (7) offer one targeted follow-up plus 2-4 leveraged next actions; (8) demonstrate correction.
- **Output rule (Installation Aha).** After first-run ingestion, the first visible output MUST be a reconstructed timeline with provenance, not a file count. Format: *"\[Entity name\], Timeline reconstructed from \[N\] sources"* followed by dated events each with *"Source: \[filename\], \[location\]"*.
- **Chat transcript discovery.** Check for chat transcript exports (ChatGPT JSON, Slack exports, Claude history, meeting transcripts). They are the highest-signal ingestion source: they encode decisions, commitments, and project discussions with timestamps, making them ideal for timeline reconstruction.
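The Installation Aha output rule can be sketched as a formatter. The event tuple shape, the function name, and the sorted-by-date ordering are assumptions; the header line and per-event source attribution follow the documented format.

```python
def format_installation_aha(entity_name, events):
    """events: list of (date, description, filename, location) tuples
    (an assumed shape). First visible line is the reconstruction header,
    never a file count; each event carries its source attribution."""
    sources = {filename for _, _, filename, _ in events}
    lines = [f"{entity_name}, Timeline reconstructed from {len(sources)} sources"]
    for date, description, filename, location in sorted(events):
        lines.append(f"- {date}: {description} (Source: {filename}, {location})")
    return "\n".join(lines)
```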