Research Agent¶
Thinklio Built-in Agent Specification Version 0.1 | March 2026
1. Purpose and Problem Statement¶
The Research Agent is a producer agent. Its job is to find, retrieve, and structure source material — it does not write, summarise for end users, or make editorial judgements. Everything it produces is intended for consumption by another agent (typically a Writer Agent) or stored as a reference set attached to a Task, Item, or Note in the Thinklio data model.
The problem it solves is the gap between "I need to write something well-sourced" and "I have a curated, structured list of references ready to use." Without an agent handling this step, users either skip sourcing entirely, do it manually and inconsistently, or rely on a writer agent to retrieve sources inline — which produces poor results because retrieval and synthesis are distinct cognitive tasks that benefit from separation.
The Research Agent cleanly owns the retrieval step, hands off a structured output, and can be scheduled to keep that output fresh over time.
2. Relationship to Other Agents¶
The Research Agent sits at the top of a common pipeline: Research Agent → Writer Agent → Checking Agent.
It can also be invoked standalone by a user who simply wants a source list, without any downstream writing task.
**Checking Agent (to be specified separately).** Before this pipeline is considered complete, a Checking Agent specification is needed. Its responsibilities will include:
- Fact-checking claims in writer output against the source list produced by the Research Agent
- Verifying that cited references actually support the claims made
- Confirming the correct voice, tone, and style have been used
- Proofreading for grammar, consistency, and formatting
- Flagging hallucinated or misrepresented citations
This is a critical quality gate. It should be specified alongside the Writer Agent so the three form a coherent set.
3. Invocation Modes¶
The Research Agent can be invoked in two ways:
**Programmatic (agent-to-agent).** A calling agent — typically a coordinator or orchestrator — passes parameters directly. The Research Agent runs, returns its structured output, and the calling agent decides what to do with it. No UI is shown to the user unless the coordinator chooses to surface progress.
**Standalone (user-initiated).** A user opens the Research Agent directly from the agent library or from within a Task/Item/Note context. They interact with a simple UI to configure the run, review results, and optionally save the source list to the data model.
4. Configuration¶
Configuration operates at two levels: admin (hard limits set per workspace or deployment) and run-time (parameters set by the user or calling agent, constrained by admin limits).
4.1 Admin Configuration¶
These settings are configured by a workspace admin and cannot be overridden at run time.
| Setting | Description | Example Default |
|---|---|---|
| Max references | Hard ceiling on references returned in any single run | 50 |
| Allowed source types | Which of academic / general / news can be used in this workspace | All |
| Allowed APIs | Which backend APIs are enabled (Tavily, Semantic Scholar, CrossRef, etc.) | All |
| Max scheduled agents | How many concurrent scheduled Research Agent instances a workspace can run | 10 |
| Min update interval | Shortest permitted re-run frequency for scheduled instances | 24 hours |
| Detail level ceiling | Whether full-document extraction is permitted (storage and cost implications) | All levels |
4.2 Run-time Parameters¶
These are set by the user in the UI or passed by a calling agent. All values are clamped to admin limits.
| Parameter | Type | Description |
|---|---|---|
| `prompt` | string | The research question or topic. Free text. |
| `keywords` | string[] | Optional explicit keywords to anchor retrieval. If omitted, extracted from the prompt. |
| `source_type` | enum | `academic`, `general`, or `news`. Drives API selection. |
| `num_references` | integer | Target number of references to return. Clamped to the admin max. |
| `detail_level` | enum | `citation_summary`, `citation_extract`, or `facts_list`. See section 6. |
| `language` | string | BCP 47 language tag. Defaults to the workspace default. |
| `date_from` | date | Restrict sources to those published after this date. Optional. |
| `date_to` | date | Restrict sources to those published before this date. Optional. |
| `update_frequency` | duration | If set, schedules recurring re-runs. Null means one-shot. |
| `save_to` | reference | Optional link to a Task, Item, Note, or Organisation record to attach results to. |
5. API Routing¶
The source_type parameter determines which backend APIs are used. Multiple APIs may be queried and results merged and deduplicated.
| Source Type | Primary APIs | Notes |
|---|---|---|
| `academic` | Semantic Scholar, CrossRef | Prioritises peer-reviewed material. CrossRef provides DOI resolution and citation metadata. Semantic Scholar adds semantic search and citation graph data. |
| `general` | Tavily | Broad web search. Suitable for topic overviews, industry context, general reference. |
| `news` | Tavily (news mode) | Filtered to news sources with date-range awareness. Useful for current events context. |
A run can only have one source_type. If a coordinator agent needs both academic and general sources, it should invoke two Research Agent instances and merge the outputs.
Future APIs to consider: PubMed (clinical/biomedical), arXiv (preprints), CORE (open access aggregator).
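The routing table and the merge-and-deduplicate step can be sketched as below. The routing keys mirror the table above; the internal API identifiers and the `dedupe` helper are hypothetical names, and the real implementation would need a more robust identity key (e.g. normalised URLs).

```python
# source_type → backend API routing (mirrors the table in section 5)
API_ROUTING = {
    "academic": ["semantic_scholar", "crossref"],
    "general": ["tavily"],
    "news": ["tavily_news"],
}


def dedupe(results: list[dict]) -> list[dict]:
    """Merge results from multiple APIs, deduplicating by DOI, then URL."""
    seen, merged = set(), []
    for r in results:
        key = r.get("doi") or r.get("url")
        if key and key not in seen:
            seen.add(key)
            merged.append(r)
    return merged


hits = [
    {"title": "A", "doi": "10.1/x", "url": "https://a.example"},
    {"title": "A (mirror)", "doi": "10.1/x", "url": "https://mirror.example/a"},
    {"title": "B", "doi": None, "url": "https://b.example"},
]
print([r["title"] for r in dedupe(hits)])  # → ['A', 'B']
```

DOI-first deduplication matters for the `academic` type, where Semantic Scholar and CrossRef will frequently return the same paper under different URLs.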
6. Detail Levels¶
The detail_level parameter controls how much content is extracted per source, and therefore what a downstream agent has to work with.
citation_summary¶
Returns the citation metadata and a short abstract or AI-generated summary (2–4 sentences). Suitable when the writer agent needs to know what sources exist and their general relevance, but will do its own synthesis.
Output per source:
- Title, authors, publication, date, DOI/URL
- Abstract or short summary
- Relevance score (internal, based on match to prompt/keywords)
citation_extract¶
Returns citation metadata plus a substantial extract — either the full abstract and key sections if available, or a longer AI-generated summary sufficient to write from without accessing the original. Suitable when the writer agent needs substantive content to draw on.
Output per source:
- Everything in citation_summary
- Extended extract or full abstract
- Key claims or findings (bulleted)
- Methodology note where relevant (academic sources)
facts_list¶
Returns citation metadata plus a structured list of discrete facts or statements extracted from the source. Designed for cases where the downstream agent (writer or otherwise) needs to treat the Research Agent output as a complete working source — no further retrieval required.
Output per source:
- Everything in citation_summary
- Numbered list of factual statements, each traceable to the source
- Each statement tagged with confidence level (high / medium / inferred)
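The three levels are cumulative: each includes everything from `citation_summary` and adds to it. A minimal sketch of the per-source field sets, using the field names from section 7 (the helper itself is illustrative, not part of the spec):

```python
# Fields present at every detail level (the citation_summary baseline)
BASE_FIELDS = [
    "title", "authors", "publication", "date", "doi", "url",
    "relevance_score", "summary",
]


def fields_for(detail_level: str) -> list[str]:
    """Return the per-source fields populated at a given detail level."""
    fields = list(BASE_FIELDS)
    # citation_extract and facts_list both add an extended extract
    if detail_level in ("citation_extract", "facts_list"):
        fields.append("extract")
    # facts_list alone adds the structured facts (statement + confidence)
    if detail_level == "facts_list":
        fields.append("facts")
    return fields
```

This cumulative structure is what lets a downstream agent treat a `facts_list` run as a strict superset of a `citation_summary` run.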
7. Output Structure¶
The Research Agent always returns a structured source list. This is a typed output, not prose.
SourceList
├── run_id UUID
├── prompt string
├── keywords string[]
├── source_type enum
├── detail_level enum
├── generated_at timestamp
├── next_run_at timestamp | null
└── sources[]
    ├── source_id        UUID
    ├── title            string
    ├── authors          string[]
    ├── publication      string
    ├── date             date
    ├── doi              string | null
    ├── url              string
    ├── relevance_score  float (0–1)
    ├── summary          string (all levels)
    ├── extract          string | null (citation_extract, facts_list)
    └── facts[]          Fact[] | null (facts_list only)
        ├── statement    string
        └── confidence   enum (high|medium|inferred)
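The schema above can be expressed as typed records. The following is a sketch in Python dataclasses, mirroring the field names and optionality in section 7 — illustrative only, not the actual Thinklio type definitions.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Optional


@dataclass
class Fact:
    statement: str
    confidence: str  # "high" | "medium" | "inferred"


@dataclass
class Source:
    source_id: str
    title: str
    authors: list[str]
    publication: str
    date: Optional[date]
    doi: Optional[str]
    url: str
    relevance_score: float               # 0–1
    summary: str                         # populated at all detail levels
    extract: Optional[str] = None        # citation_extract, facts_list
    facts: Optional[list[Fact]] = None   # facts_list only


@dataclass
class SourceList:
    run_id: str
    prompt: str
    keywords: list[str]
    source_type: str
    detail_level: str
    generated_at: datetime
    next_run_at: Optional[datetime] = None   # null for one-shot runs
    sources: list[Source] = field(default_factory=list)
```

Making `extract` and `facts` optional (rather than defining three output types) keeps the contract stable across detail levels, per the cumulative structure in section 6.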
This output can be:
- Returned inline to a calling agent
- Stored as a Note attached to a Task, Item, Organisation, or Person record
- Displayed in the UI for user review and manual saving
- Versioned if the agent is scheduled (each run produces a new snapshot)
8. Scheduled Runs¶
When update_frequency is set, the Research Agent becomes a persistent scheduled process. This has implications for the data model.
- A Task is created in the Thinklio data model representing the scheduled research job
- Each run appends a new result snapshot; previous snapshots are retained (configurable retention window)
- The Task status reflects the last run: `completed`, `failed`, `running`, `scheduled`
- A calling agent or user can subscribe to updates — when a new run completes, downstream agents or notifications can be triggered
- The schedule can be paused, modified, or cancelled from the UI or by a calling agent
Scheduling is primarily useful for:
- Monitoring a topic for new publications or news (e.g. a clinical guideline area, a competitor, an emerging technology)
- Keeping a knowledge base section current without manual intervention
- Powering a newsletter or digest agent that runs on a matching schedule
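Computing `next_run_at` for a scheduled instance reduces to clamping the requested frequency against the admin minimum interval. A minimal sketch, assuming an hour-based duration representation (the function name and signature are illustrative):

```python
from datetime import datetime, timedelta


def next_run_at(last_run: datetime, update_frequency_hours: float,
                min_interval_hours: float = 24) -> datetime:
    """Next snapshot time, never sooner than the admin minimum interval."""
    effective = max(update_frequency_hours, min_interval_hours)
    return last_run + timedelta(hours=effective)


# A user requests a 6-hour refresh; the 24-hour admin floor applies
print(next_run_at(datetime(2026, 3, 1), 6))   # → 2026-03-02 00:00:00
print(next_run_at(datetime(2026, 3, 1), 48))  # → 2026-03-03 00:00:00
```

The clamped value is what gets written into the `next_run_at` field of the output (section 7) and shown in the scheduled instance view (section 9.4).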
9. User Interface¶
The Research Agent UI is used in standalone mode. When invoked programmatically, no UI is shown unless the coordinator agent explicitly requests a progress view.
9.1 Configuration Screen¶
Presented when the agent is opened or when a user initiates a new run.
Fields shown:
- Research question / prompt (multiline text, required)
- Keywords (tag input, optional — auto-populated from prompt with ability to edit)
- Source type (segmented control: Academic / General / News)
- Number of references (slider or numeric input, capped at admin max, default shown)
- Detail level (segmented control with short descriptions of each option)
- Date range (optional from/to date pickers)
- Language (dropdown, defaults to workspace language)
- Schedule (toggle — off by default; if on, shows frequency selector and save-to picker)
- Save results to (optional — search/select a Task, Item, Note, or record to attach to)
Behaviour:
- Prompt is the only required field
- Keywords, source type, and detail level have sensible defaults
- Admin limits are silently enforced — the UI never shows options beyond what is permitted
9.2 Progress View¶
Shown while the agent is running.
- Status message (e.g. "Querying Semantic Scholar…", "Processing 24 results…")
- Progress indicator (indeterminate unless the API supports pagination status)
- Cancel button
9.3 Results View¶
Shown when the run completes.
- Summary bar: number of sources found, source type, detail level, run time
- Source list — each source shown as a card:
- Title (linked to DOI/URL)
- Authors and publication
- Date
- Relevance score (visual indicator, e.g. coloured bar)
- Summary or extract (expandable)
- Facts list (expandable, facts_list mode only)
- Filter/sort controls: by relevance, by date, by publication
- Actions per source: remove from list, flag as preferred, open original
- Bulk actions: save all to [record], export as markdown, copy citation list
- Re-run button (repeats with same configuration)
- Edit configuration button (returns to config screen with current settings pre-filled)
9.4 Scheduled Instance View¶
For runs where update_frequency is set, an additional view shows:
- Next scheduled run time
- Run history (date, number of sources, status)
- New / changed sources since last run (diff view)
- Pause / resume / cancel schedule controls
10. Data Model Integration¶
The Research Agent interacts with the Thinklio data model as follows:
| Data Object | Interaction |
|---|---|
| Task | Created to represent a scheduled run. Status, timestamps, and run history stored here. |
| Note | Source list output can be saved as a Note, attached to any record. |
| Item | A research run can be initiated from or saved to an Item (e.g. a support or enquiry record needing a sourced response). |
| Organisation | Research on an organisation (e.g. competitor analysis) can be attached directly. |
| Person | Research relevant to a person record (e.g. a subject matter expert or author) can be linked. |
| Tags | Source lists and scheduled tasks can be tagged for retrieval and filtering. |
11. Use Cases¶
UC-1: Writer agent pipeline¶
A coordinator agent receives a content brief. It invokes the Research Agent with source_type: academic, detail_level: facts_list, and num_references: 15. The Research Agent returns a structured source list. The coordinator passes this to a Writer Agent, which produces a draft article drawing only on the facts provided. The Checking Agent then verifies claims against the source list before the draft is returned to the user.
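The UC-1 flow can be sketched as a coordinator function. All three agent callables here are stand-ins for whatever invocation mechanism Thinklio provides — the names and signatures are assumptions for illustration only.

```python
def run_pipeline(brief: str, research_agent, writer_agent, checking_agent):
    """UC-1: Research → Writer → Checking, driven by a coordinator."""
    # Step 1: retrieval — structured source list, no prose
    source_list = research_agent(
        prompt=brief,
        source_type="academic",
        detail_level="facts_list",
        num_references=15,
    )
    # Step 2: synthesis — the writer draws only on the facts provided
    draft = writer_agent(brief=brief, sources=source_list)
    # Step 3: quality gate — claims verified against the same source list
    report = checking_agent(draft=draft, sources=source_list)
    return draft, report
```

The key design point is that the Checking Agent receives the *same* source list the Writer Agent worked from, which is what makes hallucinated-citation detection tractable.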
UC-2: Standalone literature review¶
A user working on a clinical topic opens the Research Agent directly. They enter a prompt, select Academic, set detail level to citation_extract, and request 20 references. They review the results, remove three that are not relevant, and save the list as a Note attached to an existing Task in their workspace.
UC-3: Ongoing topic monitoring¶
A user sets up a scheduled Research Agent for a news topic, running daily. Each morning the agent queries for new articles, flags sources not seen in previous runs, and stores the updated list. A downstream notification agent summarises new findings and sends a digest.
UC-4: Organisational research¶
A coordinator agent is preparing a business development brief on a target organisation. It invokes the Research Agent with source_type: general, saves the output to the Organisation record, and passes it to a Writer Agent to produce a one-page overview.
UC-5: Evidence gathering for an Item¶
A user has an open enquiry (Item) requiring a sourced response. They initiate a Research Agent run from within the Item context, with save_to set to that Item. The source list is attached and used by a responding agent or the user directly when composing the reply.
12. Open Questions¶
- Should a single run be able to mix source types (e.g. academic + news), or is that always handled by running two instances? The current model says two instances — this should be confirmed before implementation.
- What is the versioning and retention policy for scheduled run snapshots? Storage cost scales with detail level and frequency.
- Should the relevance score be surfaced to the user, or kept as an internal ordering mechanism only?
- For `facts_list` mode, confidence tagging is AI-generated. Is this sufficient, or do we need a separate validation pass before downstream agents consume it?
- CrossRef requires a polite pool email. How is this managed per workspace vs. per deployment?
Next: Writer Agent specification | See also: Checking Agent (to be specified)