
Research Agent

Thinklio Built-in Agent Specification Version 0.1 | March 2026


1. Purpose and Problem Statement

The Research Agent is a producer agent. Its job is to find, retrieve, and structure source material — it does not write, summarise for end users, or make editorial judgements. Everything it produces is intended for consumption by another agent (typically a Writer Agent) or stored as a reference set attached to a Task, Item, or Note in the Thinklio data model.

The problem it solves is the gap between "I need to write something well-sourced" and "I have a curated, structured list of references ready to use." Without an agent handling this step, users either skip sourcing entirely, do it manually and inconsistently, or rely on a writer agent to retrieve sources inline — which produces poor results because retrieval and synthesis are distinct cognitive tasks that benefit from separation.

The Research Agent cleanly owns the retrieval step, hands off a structured output, and can be scheduled to keep that output fresh over time.


2. Relationship to Other Agents

The Research Agent sits at the top of a common pipeline:

Research Agent  →  Writer Agent  →  Checking Agent  →  Output

It can also be invoked standalone by a user who simply wants a source list, without any downstream writing task.

Checking Agent (to be specified separately)

Before this pipeline is considered complete, a Checking Agent specification is needed. Its responsibilities will include:

  • Fact-checking claims in writer output against the source list produced by the Research Agent
  • Verifying that cited references actually support the claims made
  • Confirming that the correct voice, tone, and style have been used
  • Proofreading for grammar, consistency, and formatting
  • Flagging hallucinated or misrepresented citations

This is a critical quality gate. It should be specified alongside the Writer Agent so the three form a coherent set.


3. Invocation Modes

The Research Agent can be invoked in two ways:

Programmatic (agent-to-agent)

A calling agent — typically a coordinator or orchestrator — passes parameters directly. The Research Agent runs, returns its structured output, and the calling agent decides what to do with it. No UI is shown to the user unless the coordinator chooses to surface progress.

Standalone (user-initiated)

A user opens the Research Agent directly from the agent library or from within a Task/Item/Note context. They interact with a simple UI to configure the run, review results, and optionally save the source list to the data model.


4. Configuration

Configuration operates at two levels: admin (hard limits set per workspace or deployment) and run-time (parameters set by the user or calling agent, constrained by admin limits).

4.1 Admin Configuration

These settings are configured by a workspace admin and cannot be overridden at run time.

Setting                 Description                                                                     Example default
Max references          Hard ceiling on references returned in any single run                           50
Allowed source types    Which of academic / general / news can be used in this workspace                All
Allowed APIs            Which backend APIs are enabled (Tavily, Semantic Scholar, CrossRef, etc.)       All
Max scheduled agents    How many concurrent scheduled Research Agent instances a workspace can run      10
Min update interval     Shortest permitted re-run frequency for scheduled instances                     24 hours
Detail level ceiling    Whether full-document extraction is permitted (storage and cost implications)   All levels

4.2 Run-time Parameters

These are set by the user in the UI or passed by a calling agent. All values are clamped to admin limits.

Parameter          Type        Description
prompt             string      The research question or topic. Free text.
keywords           string[]    Optional explicit keywords to anchor retrieval. If omitted, extracted from the prompt.
source_type        enum        academic, general, or news. Drives API selection.
num_references     integer     Target number of references to return. Clamped to the admin maximum.
detail_level       enum        citation_summary, citation_extract, or facts_list. See section 6.
language           string      BCP 47 language tag. Defaults to the workspace default.
date_from          date        Restrict sources to those published after this date. Optional.
date_to            date        Restrict sources to those published before this date. Optional.
update_frequency   duration    If set, schedules recurring re-runs. Null means one-shot.
save_to            reference   Optional link to a Task, Item, Note, or Organisation record to attach results to.
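
The clamping rule ("all values are clamped to admin limits") can be sketched as follows. The limit names and the dict-based parameter shape are illustrative, not part of a defined Thinklio API:

```python
# Sketch of run-time parameter clamping against admin limits.
# Limit keys and parameter names here are illustrative only.

ADMIN_LIMITS = {
    "max_references": 50,
    "allowed_source_types": {"academic", "general", "news"},
    "min_update_interval_hours": 24,
}

def clamp_params(params, limits=ADMIN_LIMITS):
    clamped = dict(params)
    # Hard ceiling on references returned in a single run.
    clamped["num_references"] = min(
        params.get("num_references", 10), limits["max_references"]
    )
    # Source types disabled for the workspace are rejected outright.
    if params.get("source_type") not in limits["allowed_source_types"]:
        raise ValueError(f"source_type not permitted: {params.get('source_type')}")
    # Scheduled runs may not exceed the permitted re-run frequency.
    freq = params.get("update_frequency_hours")
    if freq is not None:
        clamped["update_frequency_hours"] = max(
            freq, limits["min_update_interval_hours"]
        )
    return clamped

run = clamp_params({"source_type": "academic", "num_references": 80,
                    "update_frequency_hours": 6})
```

Note that the reference count is silently clamped while a disallowed source type is an error; whether disallowed types should instead fall back to a default is a design choice this sketch does not settle.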

5. API Routing

The source_type parameter determines which backend APIs are used. Multiple APIs may be queried and results merged and deduplicated.

Source type   Primary APIs                 Notes
academic      Semantic Scholar, CrossRef   Prioritises peer-reviewed material. CrossRef provides DOI resolution and citation metadata; Semantic Scholar adds semantic search and citation graph data.
general       Tavily                       Broad web search. Suitable for topic overviews, industry context, and general reference.
news          Tavily (news mode)           Filtered to news sources with date-range awareness. Useful for current-events context.

A run can only have one source_type. If a coordinator agent needs both academic and general sources, it should invoke two Research Agent instances and merge the outputs.
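
The merge step a coordinator would perform after running two instances can be sketched as below. Keying on DOI where present and on normalised URL otherwise is an assumption, not specified behaviour:

```python
# Sketch: merge multiple Research Agent runs (e.g. academic + general)
# and deduplicate. A source is considered a duplicate if either its DOI
# or its normalised URL has already been seen.
def merge_source_lists(*runs):
    seen = set()
    merged = []
    for run in runs:
        for src in run:
            keys = {k for k in (src.get("doi"),
                                src["url"].rstrip("/").lower()) if k}
            if keys & seen:
                continue           # already have this source
            seen |= keys
            merged.append(src)
    # Downstream agents typically want the most relevant sources first.
    merged.sort(key=lambda s: s.get("relevance_score", 0.0), reverse=True)
    return merged

academic = [{"doi": "10.1000/x1", "url": "https://a.example/x1",
             "relevance_score": 0.9}]
general = [{"doi": None, "url": "https://a.example/x1/", "relevance_score": 0.7},
           {"doi": None, "url": "https://b.example/y", "relevance_score": 0.8}]
merged = merge_source_lists(academic, general)
```

Here the general-web copy of the first source is dropped because its URL matches, leaving two sources ordered by relevance.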

Future APIs to consider: PubMed (clinical/biomedical), arXiv (preprints), CORE (open access aggregator).


6. Detail Levels

The detail_level parameter controls how much content is extracted per source, and therefore what a downstream agent has to work with.

citation_summary

Returns the citation metadata and a short abstract or AI-generated summary (2–4 sentences). Suitable when the writer agent needs to know what sources exist and their general relevance, but will do its own synthesis.

Output per source:

  • Title, authors, publication, date, DOI/URL
  • Abstract or short summary
  • Relevance score (internal, based on match to prompt/keywords)

citation_extract

Returns citation metadata plus a substantial extract — either the full abstract and key sections if available, or a longer AI-generated summary sufficient to write from without accessing the original. Suitable when the writer agent needs substantive content to draw on.

Output per source:

  • Everything in citation_summary
  • Extended extract or full abstract
  • Key claims or findings (bulleted)
  • Methodology note where relevant (academic sources)

facts_list

Returns citation metadata plus a structured list of discrete facts or statements extracted from the source. Designed for cases where the downstream agent (writer or otherwise) needs to treat the Research Agent output as a complete working source — no further retrieval required.

Output per source:

  • Everything in citation_summary
  • Numbered list of factual statements, each traceable to the source
  • Each statement tagged with a confidence level (high / medium / inferred)


7. Output Structure

The Research Agent always returns a structured source list. This is a typed output, not prose.

SourceList
├── run_id              UUID
├── prompt              string
├── keywords            string[]
├── source_type         enum
├── detail_level        enum
├── generated_at        timestamp
├── next_run_at         timestamp | null
└── sources[]
    ├── source_id       UUID
    ├── title           string
    ├── authors         string[]
    ├── publication     string
    ├── date            date
    ├── doi             string | null
    ├── url             string
    ├── relevance_score float (0–1)
    ├── summary         string               (all levels)
    ├── extract         string | null        (citation_extract, facts_list)
    └── facts[]         Fact[] | null        (facts_list only)
        ├── statement   string
        └── confidence  enum (high|medium|inferred)
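
The tree above can be mirrored as Python dataclasses. Field names follow the tree; the concrete types chosen for dates, timestamps, and IDs (plain strings) are assumptions for the sketch:

```python
# SourceList schema from section 7, mirrored as dataclasses.
# String types for IDs/dates are placeholder assumptions.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Fact:
    statement: str
    confidence: str                      # "high" | "medium" | "inferred"

@dataclass
class Source:
    source_id: str
    title: str
    authors: list
    publication: str
    date: str
    url: str
    relevance_score: float               # 0-1
    summary: str                         # present at all detail levels
    doi: Optional[str] = None
    extract: Optional[str] = None        # citation_extract, facts_list
    facts: Optional[list] = None         # facts_list only

@dataclass
class SourceList:
    run_id: str
    prompt: str
    keywords: list
    source_type: str
    detail_level: str
    generated_at: str
    next_run_at: Optional[str] = None    # null for one-shot runs
    sources: list = field(default_factory=list)

sl = SourceList(run_id="r1", prompt="q", keywords=[], source_type="academic",
                detail_level="facts_list", generated_at="2026-03-01T00:00:00Z")
sl.sources.append(Source(source_id="s1", title="T", authors=["A"],
                         publication="P", date="2025-01-01",
                         url="https://example.org/t", relevance_score=0.8,
                         summary="...", facts=[Fact("X", "high")]))
```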

This output can be:

  • Returned inline to a calling agent
  • Stored as a Note attached to a Task, Item, Organisation, or Person record
  • Displayed in the UI for user review and manual saving
  • Versioned if the agent is scheduled (each run produces a new snapshot)


8. Scheduled Runs

When update_frequency is set, the Research Agent becomes a persistent scheduled process. This has implications for the data model.

  • A Task is created in the Thinklio data model representing the scheduled research job
  • Each run appends a new result snapshot; previous snapshots are retained (configurable retention window)
  • The Task status reflects the last run: completed, failed, running, scheduled
  • A calling agent or user can subscribe to updates — when a new run completes, downstream agents or notifications can be triggered
  • The schedule can be paused, modified, or cancelled from the UI or by a calling agent

Scheduling is primarily useful for:

  • Monitoring a topic for new publications or news (e.g. a clinical guideline area, a competitor, an emerging technology)
  • Keeping a knowledge base section current without manual intervention
  • Powering a newsletter or digest agent that runs on a matching schedule
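
Computing the next run time with the admin floor applied can be sketched as follows; the 24-hour constant and the function name are illustrative:

```python
# Sketch: next-run computation for a scheduled instance, enforcing
# the admin "min update interval" floor from section 4.1.
from datetime import datetime, timedelta

MIN_UPDATE_INTERVAL = timedelta(hours=24)   # illustrative admin setting

def next_run_at(last_run, update_frequency):
    # A requested frequency shorter than the admin floor is clamped up,
    # matching the run-time clamping rule in section 4.2.
    effective = max(update_frequency, MIN_UPDATE_INTERVAL)
    return last_run + effective

last = datetime(2026, 3, 1, 8, 0)
nxt = next_run_at(last, timedelta(hours=6))   # user asked for 6h; floor is 24h
```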


9. User Interface

The Research Agent UI is used in standalone mode. When invoked programmatically, no UI is shown unless the coordinator agent explicitly requests a progress view.

9.1 Configuration Screen

Presented when the agent is opened or when a user initiates a new run.

Fields shown:

  • Research question / prompt (multiline text, required)
  • Keywords (tag input, optional — auto-populated from prompt with ability to edit)
  • Source type (segmented control: Academic / General / News)
  • Number of references (slider or numeric input, capped at admin max, default shown)
  • Detail level (segmented control with short descriptions of each option)
  • Date range (optional from/to date pickers)
  • Language (dropdown, defaults to workspace language)
  • Schedule (toggle — off by default; if on, shows frequency selector and save-to picker)
  • Save results to (optional — search/select a Task, Item, Note, or record to attach to)

Behaviour:

  • Prompt is the only required field
  • Keywords, source type, and detail level have sensible defaults
  • Admin limits are silently enforced — the UI never shows options beyond what is permitted

9.2 Progress View

Shown while the agent is running.

  • Status message (e.g. "Querying Semantic Scholar…", "Processing 24 results…")
  • Progress indicator (indeterminate unless the API supports pagination status)
  • Cancel button

9.3 Results View

Shown when the run completes.

  • Summary bar: number of sources found, source type, detail level, run time
  • Source list — each source shown as a card:
      ◦ Title (linked to DOI/URL)
      ◦ Authors and publication
      ◦ Date
      ◦ Relevance score (visual indicator, e.g. coloured bar)
      ◦ Summary or extract (expandable)
      ◦ Facts list (expandable, facts_list mode only)
  • Filter/sort controls: by relevance, by date, by publication
  • Actions per source: remove from list, flag as preferred, open original
  • Bulk actions: save all to [record], export as markdown, copy citation list
  • Re-run button (repeats with same configuration)
  • Edit configuration button (returns to config screen with current settings pre-filled)

9.4 Scheduled Instance View

For runs where update_frequency is set, an additional view shows:

  • Next scheduled run time
  • Run history (date, number of sources, status)
  • New / changed sources since last run (diff view)
  • Pause / resume / cancel schedule controls
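
The "new / changed sources since last run" diff could work roughly as follows, assuming source_id is stable across runs (in practice DOI/URL matching may be needed when IDs are regenerated per run):

```python
# Sketch of the scheduled-instance diff view, keyed on source_id.
# The "changed" test here compares summaries only; a real
# implementation might compare extracts and facts as well.
def diff_runs(previous, current):
    prev = {s["source_id"]: s for s in previous}
    curr_ids = {s["source_id"] for s in current}
    new, changed = [], []
    for src in current:
        old = prev.get(src["source_id"])
        if old is None:
            new.append(src)                       # not seen last run
        elif src.get("summary") != old.get("summary"):
            changed.append(src)                   # content updated
    removed = [s for sid, s in prev.items() if sid not in curr_ids]
    return {"new": new, "changed": changed, "removed": removed}

prev_run = [{"source_id": "a", "summary": "v1"},
            {"source_id": "b", "summary": "x"}]
curr_run = [{"source_id": "a", "summary": "v2"},
            {"source_id": "c", "summary": "y"}]
delta = diff_runs(prev_run, curr_run)
```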

10. Data Model Integration

The Research Agent interacts with the Thinklio data model as follows:

Data object    Interaction
Task           Created to represent a scheduled run. Status, timestamps, and run history are stored here.
Note           Source list output can be saved as a Note, attached to any record.
Item           A research run can be initiated from or saved to an Item (e.g. a support or enquiry record needing a sourced response).
Organisation   Research on an organisation (e.g. competitor analysis) can be attached directly.
Person         Research relevant to a person record (e.g. a subject matter expert or author) can be linked.
Tags           Source lists and scheduled tasks can be tagged for retrieval and filtering.

11. Use Cases

UC-1: Writer agent pipeline

A coordinator agent receives a content brief. It invokes the Research Agent with source_type: academic, detail_level: facts_list, and num_references: 15. The Research Agent returns a structured source list. The coordinator passes this to a Writer Agent, which produces a draft article drawing only on the facts provided. The Checking Agent then verifies claims against the source list before the draft is returned to the user.
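
The UC-1 flow can be sketched end to end with stub agents. Every function below is a stand-in for whatever real invocation API exists, and the checking step is reduced to a trivial containment test rather than genuine fact verification:

```python
# UC-1 pipeline sketch: Research -> Writer -> Checking.
# All three functions are illustrative stubs, not real agent APIs.

def research_agent(prompt, **params):
    # Would return the structured SourceList of section 7;
    # here a single source with one high-confidence fact.
    return {"prompt": prompt,
            "sources": [{"source_id": "s1",
                         "facts": [{"statement": "F1",
                                    "confidence": "high"}]}]}

def writer_agent(brief, source_list):
    # Drafts only from the facts provided, as UC-1 requires.
    facts = [f["statement"]
             for s in source_list["sources"] for f in s["facts"]]
    return f"Draft on {brief}, citing: {', '.join(facts)}"

def checking_agent(draft, source_list):
    # Real checking verifies claims against sources (section 2);
    # this stub only confirms each fact appears in the draft.
    return all(f["statement"] in draft
               for s in source_list["sources"] for f in s["facts"])

sources = research_agent("grid storage", source_type="academic",
                         detail_level="facts_list", num_references=15)
draft = writer_agent("grid storage", sources)
passed = checking_agent(draft, sources)
```

The value of the sketch is the data handoff: the same SourceList object flows unchanged from research through writing to checking.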

UC-2: Standalone literature review

A user working on a clinical topic opens the Research Agent directly. They enter a prompt, select Academic, set detail level to citation_extract, and request 20 references. They review the results, remove three that are not relevant, and save the list as a Note attached to an existing Task in their workspace.

UC-3: Ongoing topic monitoring

A user sets up a scheduled Research Agent for a news topic, running daily. Each morning the agent queries for new articles, flags sources not seen in previous runs, and stores the updated list. A downstream notification agent summarises new findings and sends a digest.

UC-4: Organisational research

A coordinator agent is preparing a business development brief on a target organisation. It invokes the Research Agent with source_type: general, saves the output to the Organisation record, and passes it to a Writer Agent to produce a one-page overview.

UC-5: Evidence gathering for an Item

A user has an open enquiry (Item) requiring a sourced response. They initiate a Research Agent run from within the Item context, with save_to set to that Item. The source list is attached and used by a responding agent or the user directly when composing the reply.


12. Open Questions

  • Should a single run be able to mix source types (e.g. academic + news), or is that always handled by running two instances? The current model says two instances — this should be confirmed before implementation.
  • What is the versioning and retention policy for scheduled run snapshots? Storage cost scales with detail level and frequency.
  • Should the relevance score be surfaced to the user, or kept as an internal ordering mechanism only?
  • For facts_list mode, confidence tagging is AI-generated. Is this sufficient, or do we need a separate validation pass before downstream agents consume it?
  • CrossRef requires a polite pool email. How is this managed per workspace vs. per deployment?

Next: Writer Agent specification | See also: Checking Agent (to be specified)