Agents Catalogue & Platform Services
Document 08 | Version: 1.0.0 | Date: April 2026 | Status: Active
This document describes the starter pack of built-in agents, the integration and tooling infrastructure that powers them, the predictive planning system that helps agents learn from past executions, and the platform services layer that manages LLM models, external service credentials, credit-based billing, and administration.
The agent catalogue defines 23 pre-configured templates organised into four groups (core specialists, coordinators, data and knowledge agents, and organisational specialists). The implementation logistics section maps each agent to its external integration dependencies and establishes a realistic build sequence. The predictive planning system captures execution outcomes, builds Bayesian and (later) ML-based scoring models, and feeds historical performance data back to agents at decision time. The platform services layer manages the external service registry, LLM model selection, per-account API key overrides (BYOK), and a USD-denominated credit ledger.
For the agent architecture, extensibility model, and delegation mechanics, see 03 Agent Architecture & Extensibility. For entity definitions (AgentTemplate, Tool, AgentTool), see 04 Data Model. For credential storage via the Convex-era secrets vault, see 07 Security & Governance. For the durable execution harness that runs every agent interaction, see 06 Events, Channels & Messaging.
Table of Contents
- Starter Agent Catalogue
- Agent Implementation Logistics
- Predictive Planning & Execution Learning
- Platform Services & LLM Configuration
- Credit-Based Billing
- Platform Administration
- Implementation Phases
- Revision History
1. Starter Agent Catalogue
The starter pack consists of 23 built-in platform agents available to all Thinklio customers. Each agent is a pre-configured template that customers can deploy immediately, customise, and compose into larger arrangements using Agent Studio.
The catalogue is organised into four groups:
- Group 1: Core Specialists -- standalone agents that do one thing well; the building blocks for coordination.
- Group 2: Coordinator Agents -- orchestrate specialists to handle broader, multi-step workflows.
- Group 3: Data & Knowledge Agents -- handle structured and unstructured information.
- Group 4: Organisational Specialists -- vertical agents suited to specific business functions.
Each entry defines the agent's capabilities, knowledge layer usage, capability level, and suggested settings. Where an agent is Google Workspace or Microsoft 365 compatible, tool assignments are noted as vendor-agnostic by default -- operators configure the actual integration at deployment time.
1.1 Group 1: Core Specialists
These agents are purpose-built for a single domain. They are useful standalone and form the reusable building blocks that coordinator agents delegate to.
Mail Agent
Tagline: Manages your inbox so you don't have to.
Reads, summarises, triages, and responds to email on behalf of a user or team. Handles routine correspondence autonomously (acknowledging receipt, sending standard responses) and prepares drafts for review on anything that requires a human decision. Works with any email provider via the configured integration.
Capabilities: Read inbox and retrieve messages by sender, subject, date range, or label. Summarise unread mail and flag items requiring action. Classify messages by urgency, topic, and required response type. Create draft replies and new emails (for review or direct send). Send emails with appropriate permission level. Apply labels, archive, or flag messages. Search mail history for prior correspondence. Detect and surface time-sensitive messages (deadlines, meeting requests, approvals).
Knowledge layers: User (personal communication preferences, tone, frequent contacts, recurring topics), Team (shared contacts, project context, standard response patterns), Account (approved communication templates, compliance restrictions on outbound mail content).
Settings:
capability_level: tools_only
tool_trust_required: low_risk_write (drafting and sending)
default_execution_mode: immediate
Permissions:
allow: read, summarise, label, archive, create_draft
require_approval: send
Suggested per-assignment restrictions:
Personal: Full access including send
Team (shared): Drafts only -- require human approval before send
Account-wide: Read and summarise only
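The allow / require_approval split above can be read as a three-way decision per action. The following is a minimal sketch, not the platform's implementation: `PermissionPolicy` and `Decision` are hypothetical names, and the default-deny rule for unlisted actions is an assumption, since the entry does not say how unlisted actions are treated.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


@dataclass(frozen=True)
class PermissionPolicy:
    """Hypothetical container for the Permissions block of a catalogue entry."""
    allow: frozenset = frozenset()
    require_approval: frozenset = frozenset()
    deny: frozenset = frozenset()

    def decide(self, action: str) -> Decision:
        # Explicit deny wins, then approval gates, then plain allows.
        if action in self.deny:
            return Decision.DENY
        if action in self.require_approval:
            return Decision.REQUIRE_APPROVAL
        if action in self.allow:
            return Decision.ALLOW
        # Assumption: an action that is not listed anywhere is denied.
        return Decision.DENY


# Values copied from the Mail Agent entry above.
MAIL_AGENT_POLICY = PermissionPolicy(
    allow=frozenset({"read", "summarise", "label", "archive", "create_draft"}),
    require_approval=frozenset({"send"}),
)
```

Under this reading, `MAIL_AGENT_POLICY.decide("send")` yields `Decision.REQUIRE_APPROVAL`, which matches the "drafts for review" behaviour described for shared assignments.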
Calendar Agent
Tagline: Finds time, books meetings, and keeps your schedule coherent.
Manages calendars across individuals and teams. Finds available time, schedules meetings with internal and external attendees, resolves conflicts, and sends invitations. At the read-only trust level it provides scheduling intelligence without making changes -- useful as a delegate within coordinators that need to check availability before committing.
Capabilities: Read calendar events across one or more calendars. Find free time within a given window, applying working-hours preferences. Detect and surface scheduling conflicts. Create, update, and cancel calendar events. Send and manage meeting invitations. Check attendee availability across a team. Suggest optimal meeting times based on preferences and patterns. Manage recurring events.
Knowledge layers: User (working hours, meeting preferences, blocked time patterns, preferred meeting lengths), Team (team members' availability patterns, shared calendars, recurring team rituals), Account (company holidays, meeting norms, blocked account-wide periods).
Settings:
capability_level: tools_only
tool_trust_required: low_risk_write (creating and updating events)
default_execution_mode: immediate
Permissions:
allow: read, find_free_time, check_conflicts
require_approval: create_event, cancel_event, send_invitation
Suggested per-assignment restrictions:
Personal: Full access including create and cancel
Team delegate: find_free_time and check_conflicts only
Read-only context: read and check_conflicts only
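The find_free_time capability can be illustrated with a simple interval sweep. This is a sketch under stated assumptions: `find_free_slots` is a hypothetical helper, busy periods arrive as plain (start, end) pairs, and working-hours preferences are modelled by having the caller pass a window already clipped to working hours.

```python
from datetime import datetime, timedelta


def find_free_slots(busy, window_start, window_end, min_length):
    """Return (start, end) gaps of at least min_length inside the window.

    busy is a list of (start, end) datetime pairs; overlaps are tolerated.
    """
    free = []
    cursor = window_start  # earliest moment not yet known to be busy
    for start, end in sorted(busy):
        if end <= cursor:
            continue  # event lies entirely before the cursor
        if start >= window_end:
            break  # this and all later events start after the window
        if start - cursor >= min_length:
            free.append((cursor, start))
        cursor = max(cursor, end)
    if window_end - cursor >= min_length:
        free.append((cursor, window_end))
    return free
```

A coordinator with read-only calendar access could call a helper like this to propose times, leaving event creation to an approved step.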
Task Agent
Tagline: Creates, tracks, and closes tasks without manual overhead.
Manages to-do lists, project tasks, and action items across whatever task management integration is configured. Creates tasks from chat, tracks status, sends reminders, and surfaces overdue or upcoming items. Commonly used as a delegate by coordinator agents that need to capture follow-up actions.
Capabilities: Create tasks with title, description, due date, assignee, and priority. Update task status, due dates, and ownership. Retrieve open, overdue, and upcoming tasks for a user or team. Organise tasks into projects or lists. Set and trigger reminders. Extract and create tasks from unstructured input (e.g. meeting notes, email threads). Produce daily or weekly task summaries. Flag tasks that are blocked or overdue.
Knowledge layers: User (personal workload patterns, preferred task organisation style, recurring task types), Team (shared projects, team members' active tasks, workload distribution), Account (project structure, task categorisation standards).
Settings:
capability_level: tools_only
tool_trust_required: low_risk_write
default_execution_mode: immediate
Permissions:
allow: read, create_task, update_task, set_reminder
require_approval: delete_task, reassign_task
Suggested per-assignment restrictions:
Personal: Full access
Team delegate: Create and update only; no delete
Read-only: Read and summarise only
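Retrieving overdue and upcoming items reduces to a date comparison over open tasks. The sketch below assumes a hypothetical `Task` record and a seven-day "upcoming" horizon; neither is defined by the catalogue entry.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Task:
    title: str
    due: Optional[date] = None
    done: bool = False


def summarise_tasks(tasks, today, horizon_days=7):
    """Split open, dated tasks into overdue and upcoming lists."""
    horizon = today + timedelta(days=horizon_days)
    open_dated = [t for t in tasks if not t.done and t.due is not None]
    return {
        # Due strictly before today and still open.
        "overdue": [t for t in open_dated if t.due < today],
        # Due today or within the horizon (assumed: 7 days).
        "upcoming": [t for t in open_dated if today <= t.due <= horizon],
    }
```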
Research Agent
Tagline: Searches, synthesises, and delivers structured briefings on any topic.
Performs multi-step web research on behalf of users and teams. Searches, reads, evaluates sources, and synthesises findings into structured briefings. For longer research tasks it operates in deferred mode, dispatching work and delivering results when ready. One of the most commonly used delegates in coordinator configurations.
Capabilities: Perform targeted web searches on a topic or question. Read and extract content from web pages and documents. Evaluate and compare sources. Synthesise findings into structured briefings (summary, key points, sources). Answer specific factual questions with cited sources. Monitor a topic across multiple sources (when used with the Monitor Agent). Save findings to team or user knowledge layer. Produce comparative analyses.
Knowledge layers: Agent (research methodology, source quality heuristics, output templates), User (topic depth preferences, preferred output format, saved research history), Team (prior research the team has commissioned, related project context).
Settings:
capability_level: workflow
tool_trust_required: read (web and documents only)
default_execution_mode: deferred (longer research tasks); immediate (quick factual lookups)
Permissions:
allow: web_search, read_url, read_document, write_knowledge_fact
deny: send_email, create_event, external_write_tools
Suggested per-assignment restrictions:
Personal: Full access
Team delegate: Full access; knowledge writes scoped to team layer only
Restricted: Web search and read only; no knowledge writes
Writing Agent
Tagline: Drafts, edits, and refines written content in your voice.
Produces and improves written content across formats -- emails, reports, proposals, summaries, and long-form documents. Adapts tone and style to context and audience, and can learn individual and organisational voice from the knowledge layers. Frequently used as a delegate within coordinator agents that need to produce polished output.
Capabilities: Draft documents, emails, reports, proposals, and summaries from a brief or outline. Edit existing content for clarity, tone, and conciseness. Proofread for grammar, spelling, and consistency. Adapt writing style for audience (executive, technical, customer-facing). Reformat content. Maintain a consistent account voice using account knowledge. Transform unstructured notes or transcripts into polished documents. Produce multiple variants for comparison.
Knowledge layers: Agent (writing conventions, format templates, style guidance), User (personal writing style, preferred tone, frequently used phrases and structures), Account (brand voice, approved terminology, communication templates, style guide).
Settings:
capability_level: workflow
tool_trust_required: read (no external write tools by default)
default_execution_mode: immediate (short content); deferred (long documents)
Permissions:
allow: create_document, read_document, web_search (for factual grounding)
deny: send_email, external_system_writes
Suggested per-assignment restrictions:
Personal: Full access
Team: Full access; save outputs to shared drive
Regulated context: Drafts only; no direct publish or send
Document Agent
Tagline: Reads, answers questions about, and extracts insight from documents.
Reads files (PDFs, Word documents, spreadsheets, slide decks) and answers questions about their content. Can summarise long documents, compare versions, extract structured data, and flag inconsistencies or compliance gaps. Distinct from the Writing Agent, which produces content -- the Document Agent primarily consumes and analyses it.
Capabilities: Read and summarise documents in any common format. Answer questions about a document's content. Extract specific data points, tables, or lists. Compare two or more documents and highlight differences. Flag sections that are inconsistent, ambiguous, or potentially non-compliant. Index documents into the knowledge layer for ongoing retrieval. Identify action items or commitments. Produce structured summaries (executive summary, key decisions, open questions).
Knowledge layers: User (documents the user has previously asked about, personal context for interpreting ambiguous content), Team (team document library context, related documents for cross-reference), Account (policy documents and reference material used to check compliance).
Settings:
capability_level: workflow
tool_trust_required: read
default_execution_mode: immediate (short docs); deferred (large documents or batch processing)
Permissions:
allow: read_document, read_url, write_knowledge_fact
deny: delete_document, send_email, external_writes
Suggested per-assignment restrictions:
Personal: Full access
Team: Read and index only; no delete
Compliance context: Read only; knowledge writes require approval
Chat Agent
Tagline: A plain-language interface to everything the account knows.
The most direct interface in the catalogue -- a conversational window onto the full knowledge stack. Has no tools, no external integrations, and no delegation. What sets it apart from a generic LLM chat is context: it draws on all configured knowledge layers (user, team, and account, including indexed documents) to give answers grounded in this organisation's specifics. The UI is intentionally simple -- a plain chat window -- making it an appropriate first-touch interface for users who do not yet know which specialist agent they need.
Capabilities: Answer questions across any topic using model knowledge and all configured knowledge layers. Surface relevant account context -- policies, procedures, project history, indexed document content -- without the user needing to specify where to look. Help users think through problems, decisions, and ideas conversationally. Summarise, explain, or reframe information. Draft short-form content within the chat. Reason through multi-step problems step by step. Adapt communication style to user preferences. Suggest a more appropriate specialist agent when a request is better handled elsewhere.
Knowledge layers: User (active; communication preferences, conversational history, frequently asked topics, personal context), Team (active; project context, accumulated team knowledge, prior research and decisions), Account (active; policies, procedures, indexed documents, reference material).
Settings:
capability_level: tools_only
tool_trust_required: none (knowledge read only)
default_execution_mode: immediate
Permissions:
allow: model_inference, read_user_knowledge, read_team_knowledge, read_account_knowledge
deny: all_external_tools, web_search, file_write, send_email, create_event, create_task
Suggested per-assignment restrictions:
Personal: Full knowledge access across all layers
Team-shared: Team and account layers only; no personal user layer
Restricted context: Account layer only (policy and reference lookup)
The Chat Agent has no tool access by design. If a request requires an external action -- searching the web, sending an email, creating a task -- the user should be directed to the Personal Assistant or the relevant specialist.
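The per-assignment restrictions for the Chat Agent amount to a filter on which knowledge layers are readable. A minimal sketch, assuming hypothetical assignment keys (`personal`, `team_shared`, `restricted`) and a hypothetical `KnowledgeFact` record:

```python
from dataclasses import dataclass

# Layer visibility per assignment type, copied from the Chat Agent's
# suggested per-assignment restrictions. The key names are hypothetical.
CHAT_AGENT_LAYERS = {
    "personal": frozenset({"user", "team", "account"}),
    "team_shared": frozenset({"team", "account"}),
    "restricted": frozenset({"account"}),
}


@dataclass(frozen=True)
class KnowledgeFact:
    layer: str  # "user", "team", or "account"
    text: str


def visible_facts(facts, assignment):
    """Keep only the facts whose layer is readable under this assignment."""
    allowed = CHAT_AGENT_LAYERS[assignment]
    return [f for f in facts if f.layer in allowed]
```

Filtering before retrieval results reach the model keeps personal context out of team-shared deployments regardless of what the prompt asks for.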
Coach Agent
Tagline: A configurable knowledge companion, trained on your content.
A general-purpose persona agent designed to be given a name, a domain, and a library of relevant content, then deployed as a specialised guide for that domain. Out of the box it carries no domain knowledge -- it gains its value when an account or user provides documents, sets a persona prompt, and names it. A fitness operator loads exercise science PDFs and calls it "Alex". A compliance team loads regulatory guidance and calls it "ComplianceDesk". The same underlying agent architecture supports both.
The Chat Agent is for open-ended chat across all account knowledge. The Coach Agent is focused: it is positioned in a domain, anchored to a curated library, and speaks with a consistent persona.
The HR Agent and Onboarding Agent in this catalogue are pre-configured instances of the Coach Agent pattern -- each with a fixed domain, default library assignments, and sensible defaults for their context. The Coach Agent is the general template a Thinklio Studio user would reach for when building a domain agent not already covered by the catalogue.
Capabilities: Answer questions grounded in the content of the assigned library, citing specific documents or sections where appropriate. Maintain a consistent persona (name, tone, communication style) configured at deployment. Draw on account knowledge layers for account context alongside library content. Guide a user through a process or topic progressively across a multi-turn chat. Acknowledge the boundaries of its knowledge and redirect to other agents or human contacts when appropriate. Surface summaries, overviews, and key points from library documents on request.
Knowledge layers: User (active; communication preferences, prior topics discussed with this agent), Team (active; team-level context), Account (active; account policies and general reference material), Library (active, primary; one or more Libraries configured at deployment provide the domain-specific knowledge corpus).
The Coach Agent's domain knowledge is delivered through the library system. At deployment, library_assignments in the agent template specifies which libraries the agent draws from and in what priority order. Account-scoped libraries are created by uploading documents via the media API and running the chunk_and_index processor to index them. Platform-scoped libraries (where available) are pre-built corpora for common domains. See 05 Persistence, Storage & Ingestion for the full library architecture.
Settings:
capability_level: tools_only
tool_trust_required: none (knowledge read only)
default_execution_mode: immediate
Persona configuration (set at deployment or in Studio):
agent_name: (e.g. "Alex", "ComplianceDesk", "FitCoach")
persona_tone: professional | friendly | coaching | authoritative
domain_focus: free text description of the agent's focus area
system_prompt_addendum: additional instructions for this instance
Library configuration (set at deployment):
library_assignments: [{library_id, priority}]
Permissions:
allow: model_inference, read_user_knowledge, read_team_knowledge, read_account_knowledge, read_library
deny: all_external_tools, web_search, file_write, send_email, create_event, create_task
Suggested per-assignment restrictions:
Domain expert persona: Full library access; all knowledge layers
FAQ / helpdesk persona: Library access; account layer only; no user layer
Guided learning persona: Full library access; chat history depth increased for multi-session continuity
Without library content loaded, the Coach Agent falls back to model knowledge and account knowledge layers. It will still function as a conversational agent, but its domain authority depends on a well-populated library.
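The library_assignments list and the fallback behaviour can be sketched as follows. The helper names are hypothetical, and the ordering rule (a lower priority number is consulted first) is an assumption; the template only states that assignments carry a priority.

```python
def ordered_library_ids(library_assignments):
    """Return library ids in the order the agent would consult them."""
    # Assumption: a lower priority number means "consult first".
    ranked = sorted(library_assignments, key=lambda a: a["priority"])
    return [a["library_id"] for a in ranked]


def knowledge_plan(library_assignments):
    """Describe where a Coach Agent instance would draw its answers from."""
    libraries = ordered_library_ids(library_assignments)
    return {
        "libraries": libraries,
        # With no libraries assigned the agent still answers, but only from
        # model knowledge and the account knowledge layers.
        "fallback_only": not libraries,
    }
```

For example, `knowledge_plan([])` reports `fallback_only: True`, the state in which the agent functions conversationally but has no domain corpus behind it.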
1.2 Group 2: Coordinator Agents
Coordinator agents orchestrate two or more specialists to handle multi-step workflows. They are the "front door" agents that users interact with most, delegating the specifics to the appropriate specialists behind the scenes. All coordinator agents require workflow capability level or higher.
Personal Assistant
Tagline: Your intelligent front door -- handles requests, delegates to specialists, follows up.
The Personal Assistant is the generalist coordinator for individual users. It handles the full range of day-to-day requests -- from "what's on my plate today" to "research this topic and draft a briefing" -- by delegating to the appropriate specialist agents. It maintains strong user-level knowledge, learns preferences over time, and synthesises results from multiple delegates into coherent responses.
Capabilities: Triage incoming requests and route them to the appropriate specialist agent. Coordinate multi-step tasks across Mail, Calendar, Task, and Research agents. Provide daily briefings (unread mail summary, today's calendar, open tasks, reminders). Synthesise results from multiple delegate agents into a single response. Track open items and follow up on pending work. Manage the user's context and preferences through the user knowledge layer. Handle general questions and chat directly when no delegation is needed. Escalate to the user when a decision is required before proceeding.
Knowledge layers: User (very active; accumulates preferences, patterns, priorities, and personal context across all interactions), Team (project context, shared priorities, team norms), Account (policies and procedures the PA needs to operate within).
Delegates to: Mail Agent, Calendar Agent, Task Agent, Research Agent, Writing Agent, Document Agent.
Settings:
capability_level: workflow (learning recommended)
tool_trust_required: low_risk_write (via delegates)
default_execution_mode: mixed (immediate for quick lookups; deferred for research tasks)
Permissions:
allow: delegate_to_all_assigned_specialists
require_approval: send_email, create_event, delete_task
Suggested delegation restrictions (per delegate):
Mail Agent: drafts only (no autonomous send by default)
Calendar Agent: find_free_time and check_conflicts only (user must confirm before creating events)
Task Agent: full create and update
Research Agent: full access
Writing Agent: full access
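One way to read the per-delegate restrictions is as an intersection: a delegate invoked by the coordinator may perform only the actions that both its own entry and the coordinator's restriction permit. This is a sketch of that interpretation, not a confirmed rule; `effective_actions` and `FULL_ACCESS` are hypothetical names.

```python
# Actions taken from the Calendar Agent entry (allow plus require_approval).
CALENDAR_ACTIONS = {
    "read", "find_free_time", "check_conflicts",
    "create_event", "cancel_event", "send_invitation",
}

# Restriction the Personal Assistant places on its Calendar Agent delegate.
PA_CALENDAR_RESTRICTION = {"find_free_time", "check_conflicts"}

FULL_ACCESS = None  # hypothetical marker for "full access"


def effective_actions(delegate_actions, restriction):
    """Actions a delegate may perform when invoked by a coordinator."""
    if restriction is FULL_ACCESS:
        return set(delegate_actions)
    # Assumption: a restriction can narrow a delegate's actions, never widen them.
    return set(delegate_actions) & set(restriction)
```

With these values the Personal Assistant can ask the Calendar Agent for availability, but `create_event` falls outside the effective set and returns to the user for confirmation.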
Meeting Agent
Tagline: Prepares agendas, captures notes, and turns meetings into action.
The Meeting Agent handles the full meeting lifecycle -- before, during, and after. Before a meeting it prepares an agenda and briefs attendees. After a meeting it processes notes or transcripts to extract action items, decisions, and follow-up tasks, then sends a summary to attendees. It coordinates Calendar, Mail, Task, and Writing agents to do this seamlessly.
Capabilities: Prepare structured agendas from a meeting brief or prior discussion. Brief the organiser on attendees, context, and prior meeting history. Process meeting notes or transcripts (text input or uploaded file). Extract action items, decisions, open questions, and owners from notes. Create tasks from extracted action items. Draft and send meeting summary emails to attendees. Schedule follow-up meetings when needed. Maintain a searchable history of past meetings and their outcomes in the team knowledge layer.
Knowledge layers: User (personal meeting preferences, recurring attendees, typical agenda formats), Team (project history, team member roles, prior meeting outcomes, running decisions log), Account (meeting norms, approved agenda templates, relevant policies).
Delegates to: Calendar Agent, Mail Agent, Task Agent, Writing Agent.
Settings:
capability_level: workflow
tool_trust_required: low_risk_write (via delegates)
default_execution_mode: immediate (preparation and extraction); deferred (transcript processing for long meetings)
Permissions:
allow: read_calendar, create_task, write_knowledge_fact
require_approval: send_summary_email, create_follow_up_event
Suggested delegation restrictions:
Mail Agent: drafts only (require approval before send)
Calendar Agent: read and check_conflicts only (separate approval for creating follow-up events)
Task Agent: full create access for extracted action items
Writing Agent: full access for summaries and agendas
Project Coordinator
Tagline: Keeps projects moving -- tracks status, surfaces blockers, coordinates the team.
The Project Coordinator is the team-level equivalent of the Personal Assistant. It maintains an overview of a project's status, surfaces blockers and overdue items, coordinates scheduling across the team, and produces status reports. It is deployed at the team level and draws heavily on the team knowledge layer, which accumulates project context over time.
Capabilities: Provide project status summaries on demand and on a schedule. Track task completion across the team and surface overdue or blocked items. Identify and escalate risks and blockers. Schedule project milestones, reviews, and team check-ins. Produce weekly status reports for stakeholders. Coordinate cross-team dependencies (scheduling, communication). Maintain a running decisions log and open questions register in team knowledge. Onboard new team members to project context.
Knowledge layers: Team (very active; accumulates all project decisions, status, context, client details, and team history), Account (project governance policies, reporting templates, escalation procedures), Agent (project management methodology and status report formats).
Delegates to: Calendar Agent, Task Agent, Research Agent, Writing Agent, Mail Agent.
Settings:
capability_level: workflow
tool_trust_required: low_risk_write (via delegates)
default_execution_mode: mixed
Permissions:
allow: read_tasks, create_task, write_knowledge_fact, read_calendar, draft_email
require_approval: send_status_report, create_milestone_event
Suggested delegation restrictions:
Mail Agent: drafts only (team lead approves before send)
Calendar Agent: create_event allowed (for internal team scheduling)
Task Agent: full access
Writing Agent: full access for reports; drafts only for external comms
Briefing Agent
Tagline: Knows who you're meeting before you walk in the room.
The Briefing Agent compiles structured pre-meeting briefings from publicly available sources. Given a name, organisation, or meeting invite, it researches the relevant individuals and produces a formatted one-pager (or multi-person brief for group meetings) covering professional background, current role, qualifications, and inferred conversational preferences. A custom brief -- what you're there to discuss, what you're pitching, what outcome you need -- focuses the output on what's most relevant to your specific meeting.
Capabilities: Research individuals from public sources (company websites, LinkedIn, news, published interviews, academic profiles, public filings). Research companies from public sources (about pages, annual reports, press releases, news coverage, board and executive listings). Compile individual profiles: name, photo URL, current title and employer, career history summary, qualifications and credentials, notable work or publications, known interests and positions. Infer conversational preferences from public signals: topics they frequently speak or write about, known causes or affiliations, public positions, things notably absent from their public profile. Handle group briefings: company overview plus a profile card per named individual (board, executive team, or meeting attendees). Accept a custom brief describing the meeting purpose, what is being pitched or discussed, and any specific angles to prioritise, and weight the output accordingly. Flag confidence level for inferred preferences (distinguishing "stated publicly" from "inferred from patterns"). Note when information could not be found rather than speculating. Output as a formatted document (PDF or HTML recommended for photo embedding; markdown for plain text environments). Save briefings to user or team knowledge layer for reuse.
Knowledge layers: User (prior briefings prepared for this user; meeting context preferences and output format preferences), Team (prior research on contacts and organisations the team has met; relationship history), Account (nil -- this agent works from public sources, not internal knowledge).
Delegates to: Research Agent (for web research and source synthesis), Writing Agent (for formatting and polishing the final output), Document Agent (for reading any uploaded background material the user provides).
Input parameters (custom brief prompt):
The briefing request should include as much of the following as possible:
Who you are meeting: [Name / Organisation / LinkedIn URL / meeting invite]
Meeting purpose: [What the meeting is for -- pitch, catch-up, negotiation, interview]
Your context: [Who you are, what you're offering or seeking]
Focus areas: [Specific topics, angles, or questions to prioritise]
Output format: [One-pager / multi-person group brief / executive summary]
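The input parameters above map naturally onto a small request record. The sketch below is illustrative only: `BriefingRequest` and its field names are hypothetical, and no field is treated as mandatory because the text asks for "as much of the following as possible".

```python
from dataclasses import dataclass, field, fields


@dataclass
class BriefingRequest:
    """Hypothetical record mirroring the custom brief prompt."""
    who: str = ""                 # name, organisation, LinkedIn URL, or invite
    meeting_purpose: str = ""     # pitch, catch-up, negotiation, interview
    your_context: str = ""        # who you are, what you offer or seek
    focus_areas: list = field(default_factory=list)
    output_format: str = ""       # one-pager, group brief, executive summary

    def missing(self):
        """Names of the fields the requester left empty, in prompt order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

An agent could use `missing()` to ask a single follow-up question before starting deferred research, rather than guessing at the meeting purpose.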
Settings:
capability_level: workflow
tool_trust_required: read (web research and document read only)
default_execution_mode: deferred (research takes time; deliver when ready)
Permissions:
allow: web_search, read_url, read_document (uploaded background material), create_document, write_knowledge_fact
deny: read_internal_crm_without_approval, send_email, access_private_data
Privacy note: This agent uses publicly available information only. It does not access private records, internal systems, or data the subject has not made public. Inferred preferences are flagged as inference, not fact.
Output format guidance:
Plain text / markdown: Suitable for Telegram delivery or quick reads
PDF / HTML: Required for photo embedding and formatted layout (configure the Writing Agent or Document Agent as delegate for formatted output)
Suggested per-assignment restrictions:
Personal: Full access; save to user knowledge
Team: Full access; save to team knowledge
Restricted context: Deliver briefing only; no knowledge writes
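The requirement to separate "stated publicly" from "inferred from patterns", and to report gaps instead of speculating, can be made explicit in the output structure. A minimal sketch with hypothetical names (`Basis`, `Preference`, `render_preferences`):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Basis(Enum):
    STATED = "stated publicly"
    INFERRED = "inferred from patterns"


@dataclass(frozen=True)
class Preference:
    claim: str
    basis: Basis
    source: Optional[str] = None  # where the claim was found, if known


def render_preferences(preferences):
    """Render preference lines, labelling each with its evidential basis."""
    if not preferences:
        # Report the gap explicitly instead of speculating.
        return ["No preferences found in public sources."]
    lines = []
    for p in preferences:
        suffix = f"; source: {p.source}" if p.source else ""
        lines.append(f"- {p.claim} [{p.basis.value}{suffix}]")
    return lines
```

Carrying the basis on every item means the label survives any later reformatting by a delegate such as the Writing Agent.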
1.3 Group 3: Data & Knowledge Agents
These agents handle structured and unstructured information -- reading data, managing knowledge, and producing formatted reports.
Data Agent
Tagline: Reads your data and tells you what it means.
The Data Agent works with structured data -- spreadsheets, CSV exports, database query results -- to answer questions, surface trends, and produce summaries. It does not replace a full analytics platform but handles the common case of "I have this spreadsheet, tell me what's going on." Results can be presented as prose summaries, tables, or chart descriptions.
Capabilities: Read and parse spreadsheets, CSV files, and tabular data. Answer natural-language questions about data content. Compute aggregations (totals, averages, counts, percentage changes). Identify trends, anomalies, and outliers. Compare datasets across time periods or categories. Produce plain-language summaries of findings. Generate structured data tables from prose or unstructured sources. Flag data quality issues (missing values, inconsistencies, duplicates).
Knowledge layers: User (preferred analysis framing, recurring reports, data interpretation preferences), Team (data context -- what the numbers mean in this team's work), Account (data definitions, calculation methodologies, reporting standards).
Settings:
capability_level: workflow
tool_trust_required: read (read-only data access by default)
default_execution_mode: immediate (small datasets); deferred (large files or complex analyses)
Permissions:
allow: read_file, read_spreadsheet, create_document (for reports)
deny: write_spreadsheet, delete_file (by default)
require_approval: write_spreadsheet (if data correction is in scope)
Suggested per-assignment restrictions:
Personal: Full read; write on request
Team: Read only
Finance context: Read only; all outputs require review before distribution
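The aggregation and data-quality capabilities can be illustrated with the standard library alone. This is a sketch, not the agent's actual tooling: `profile_csv` and `percent_change` are hypothetical helpers, and the sample data is invented for the example.

```python
import csv
import io


def percent_change(previous, current):
    """Percentage change from previous to current."""
    if previous == 0:
        raise ValueError("percentage change is undefined when previous is 0")
    return (current - previous) / abs(previous) * 100.0


def profile_csv(text, value_column):
    """Aggregate one numeric column and flag basic data quality issues."""
    rows = list(csv.DictReader(io.StringIO(text)))
    values, missing = [], 0
    for row in rows:
        raw = (row.get(value_column) or "").strip()
        if raw == "":
            missing += 1  # empty cell counts as a missing value
        else:
            values.append(float(raw))
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(row.items())
        if key in seen:
            duplicates += 1  # exact repeat of an earlier row
        seen.add(key)
    total = sum(values)
    return {
        "count": len(values),
        "total": total,
        "mean": total / len(values) if values else None,
        "missing": missing,
        "duplicates": duplicates,
    }


# Invented sample data, for illustration only.
SAMPLE = "month,revenue\nJan,100\nFeb,120\nFeb,120\nMar,\n"
```

Running `profile_csv(SAMPLE, "revenue")` on the sample surfaces one missing value and one duplicated row alongside the totals, which is the kind of caveat the agent would attach to its summary.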
Knowledge Base Agent
Tagline: The team's institutional memory -- answers questions from what the account knows.
The Knowledge Base Agent is the primary interface for querying the team and account knowledge layers. It answers questions by drawing on indexed knowledge facts, documents, and prior interaction history. Unlike the Research Agent (which searches the web), the Knowledge Base Agent works from what the account already knows. It is particularly valuable for onboarding, policy lookups, and maintaining institutional continuity.
Capabilities: Answer questions by retrieving and synthesising from team and account knowledge layers. Indicate confidence and cite the source fact or document for every answer. Flag when knowledge is stale, conflicting, or absent (and suggest an update). Accept and index new knowledge contributions from users. Provide a summary of what the team knows about a given topic, client, or project. Surface related knowledge when answering a question. Suggest knowledge gaps based on patterns of unanswered questions.
Knowledge layers: Account (primary; policies, procedures, reference material), Team (primary; accumulated project and client context), Agent (retrieval methodology, confidence scoring, knowledge gap patterns).
Settings:
capability_level: tools_only
tool_trust_required: read (knowledge read); low_risk_write (for accepting new contributions)
default_execution_mode: immediate
Permissions:
allow: read_knowledge, write_knowledge_fact (contributions only)
require_approval: modify_knowledge_fact, delete_knowledge_fact
deny: web_search, external_writes
Suggested per-assignment restrictions:
Team member: Full read; write contributions allowed
External-facing: Read only; account layer only (no team layer exposed)
Admin: Full read and write including modifications
Report Writer Agent¶
Tagline: Data in, formatted report out.
The Report Writer takes structured or semi-structured input -- database query results, CSV exports, JSON data, research findings, or any combination -- and produces a complete, formatted report document. It is a structured output agent: its purpose is not to discuss data or surface findings conversationally, but to reason about what narrative the data supports and render that into a publication-ready document. It is distinct from the Writing Agent (which produces general prose from a brief) and the Data Agent (which surfaces analysis in plain text). The Report Writer's primary output formats at launch are HTML/CSS for PDF rendering via DocRaptor, and Markdown for lightweight or version-controlled delivery. Google Docs and Microsoft Word are planned for a future release.
Capabilities: Accept structured data inputs: CSV, JSON, spreadsheet exports, database query results, or pasted tabular data. Accept an optional report brief specifying audience, purpose, key questions, required sections, and output format. Reason about the data: identify the principal narrative, surface significant trends, flag anomalies, and determine what to foreground versus background. Select an appropriate report structure for the audience (executive summary, full analytical report, operational digest). Write narrative prose sections, callout figures, section summaries, and a concluding interpretation. Produce formatted data tables and annotated figures within the document. Apply account-level report templates, style guidelines, and branding from the knowledge layer. Output as styled HTML/CSS for PDF rendering via DocRaptor. Output as Markdown for lightweight or version-controlled delivery. (Planned) Output to Google Docs or Microsoft Word via configured integration. Produce multi-section reports with consistent structure and internal cross-references.
Knowledge layers: Account (primary; report templates, style guide, branding, standard section structures, approved table and figure styles), Team (recurring report formats, standard KPI definitions, known data source context, audience preferences), User (preferred output format, personal report style preferences, frequently reported data types), Agent (data-to-narrative reasoning methodology, format selection heuristics, output quality standards).
Delegates to: Data Agent (for analytical depth on complex or large datasets), Writing Agent (for polishing narrative prose sections).
Settings:
capability_level: workflow
tool_trust_required: read (data sources and knowledge)
low_risk_write (document creation)
default_execution_mode: immediate (short reports, simple data)
deferred (multi-section reports, large datasets)
Permissions:
allow: read_file, read_data, read_knowledge,
create_document
deny: send_email, modify_source_data,
external_system_writes
Output format permissions (configure at deployment):
html_css: enabled (rendered to PDF via DocRaptor)
markdown: enabled
google_docs: disabled (planned)
word_docx: disabled (planned)
Suggested per-assignment restrictions:
Personal: Full access; any configured output format
Team: Full access; outputs saved to shared drive
Regulated context: Draft output only; review required
before distribution
1.4 Group 4: Organisational Specialists¶
These agents are suited to specific business functions. They operate at team or account scope and typically draw heavily on the account knowledge layer.
HR Agent¶
Tagline: Answers HR questions, guides processes, and connects people to the right resources.
The HR Agent is a first-line resource for employee questions about policies, entitlements, processes, and procedures. It draws on the account knowledge layer (which HR admins maintain) to give accurate, policy-grounded responses. It does not make HR decisions -- it interprets policy and routes complex or sensitive cases to a human. It is deliberately conservative with write access.
The HR Agent is a pre-configured instance of the Coach Agent pattern, with a fixed domain (HR policy and process), default library assignments for HR documentation, and appropriate permission restrictions. See the Coach Agent entry in Group 1 for the underlying architecture.
Capabilities: Answer questions about leave entitlements, policies, and procedures. Guide employees through standard HR processes (onboarding, offboarding, performance reviews, leave requests). Explain benefits, remuneration, and entitlement policies. Help employees find the right form, document, or point of contact. Draft standard HR communications (offer letters, policy acknowledgements) from templates. Escalate sensitive or complex cases to an HR team member. Log queries for HR team visibility (anonymised where appropriate).
Knowledge layers: Account (primary; HR policies, procedure guides, entitlement schedules, approved templates), User (minimal; session context only; no personal HR data stored in user knowledge layer).
Settings:
capability_level: tools_only
tool_trust_required: read (policy retrieval)
low_risk_write (drafting from templates only)
default_execution_mode: immediate
Permissions:
allow: read_knowledge, create_document (from templates only)
require_approval: send_hr_communication, modify_hr_record
deny: read_personal_records_without_approval, web_search
Important: User knowledge writes are disabled. This agent does not
accumulate personal data about individual employees.
Suggested per-assignment restrictions:
Employee self-service: Policy read and process guidance only
HR admin: Full access including template drafting
Manager: Read policies; draft communications with approval
Finance Agent¶
Tagline: Tracks spending, categorises expenses, and produces financial summaries.
The Finance Agent helps individuals and teams manage expense tracking, budget monitoring, and financial reporting. It reads financial data from connected sources, categorises transactions, flags anomalies, and produces structured summaries. It operates in read-only mode by default; any write actions (updating records, submitting claims) require explicit approval.
Capabilities: Read expense records, bank statements, and budget data from connected sources. Categorise transactions against a chart of accounts or expense policy. Compare actual spend against budget by period, category, or team. Flag anomalies, duplicate transactions, and policy breaches. Produce expense summaries and reports for a given period. Draft expense claims or reimbursement requests. Reconcile receipts against expense records. Provide budget utilisation summaries for managers.
Knowledge layers: User (personal expense patterns, recurring expense categories, preferred report format), Team (team budget allocations, project cost codes, shared expense policies), Account (chart of accounts, expense policy, approval thresholds, finance calendar).
Settings:
capability_level: workflow
tool_trust_required: read (default -- financial data read only)
high_risk_write (for any submission or payment action)
default_execution_mode: immediate (summaries)
deferred (reconciliation runs, period-end reports)
Permissions:
allow: read_financial_data, create_document (for reports)
require_approval: submit_expense_claim, update_financial_record
deny: initiate_payment, delete_financial_record
Suggested per-assignment restrictions:
Personal: Full read; draft claims (with approval to submit)
Team manager: Full read across team; summary reports only
Finance team: Full access including submission workflows
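The allow / require_approval / deny lists used throughout the catalogue can be read as a three-way decision. A minimal sketch of that reading, using the Finance Agent's lists from the entry above: the `Decision` enum, the `evaluate` function, the precedence order (deny first, then approval, then allow), and the default-deny treatment of unlisted actions are all assumptions made for illustration, not the platform's confirmed behaviour.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


# Finance Agent permission lists, copied from the catalogue entry.
FINANCE_PERMISSIONS = {
    "allow": {"read_financial_data", "create_document"},
    "require_approval": {"submit_expense_claim", "update_financial_record"},
    "deny": {"initiate_payment", "delete_financial_record"},
}


def evaluate(action: str, permissions: dict[str, set[str]]) -> Decision:
    """Resolve an action name to a decision.

    Assumed precedence: deny wins over require_approval, which wins over allow.
    """
    if action in permissions.get("deny", set()):
        return Decision.DENY
    if action in permissions.get("require_approval", set()):
        return Decision.REQUIRE_APPROVAL
    if action in permissions.get("allow", set()):
        return Decision.ALLOW
    # Assumption: an action that appears in no list is denied (default-deny).
    return Decision.DENY
```

Under these assumptions, `evaluate("submit_expense_claim", FINANCE_PERMISSIONS)` yields `Decision.REQUIRE_APPROVAL`, which matches the entry's statement that write actions need explicit approval.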
Support Triage Agent¶
Tagline: Receives requests, classifies them, routes them, and keeps requestors informed.
The Support Triage Agent is designed for teams that handle a steady stream of inbound requests -- internal IT support, customer service, facilities, or any other service function. It classifies incoming requests, routes them to the right queue or person, creates tickets, and keeps requestors updated on status. It can handle first-line responses autonomously and escalates when a human is needed.
Capabilities: Receive and acknowledge inbound requests via any configured channel. Classify requests by type, urgency, and required skill. Route requests to the appropriate team member, queue, or escalation path. Create tickets in the configured task or ticketing system. Draft first-line responses from a knowledge base (FAQ, known issue library). Resolve simple requests without escalation (knowledge-based answers). Update requestors on ticket status and estimated resolution. Flag SLA breaches and overdue tickets. Produce support queue summaries and trend reports for team leads.
Knowledge layers: Account (service catalogue, routing rules, escalation policies, approved response templates), Team (team members' specialisations and availability, known issue backlog, resolution history), Agent (classification models, routing heuristics, SLA thresholds).
Settings:
capability_level: workflow
tool_trust_required: low_risk_write (ticket creation, status updates)
default_execution_mode: immediate (triage and first response)
deferred (human handoff jobs)
Permissions:
allow: read_inbound, create_ticket, update_ticket,
send_acknowledgement, read_knowledge
require_approval: close_ticket, send_resolution_response,
escalate_to_external_team
deny: access_personal_customer_data_without_approval
Suggested per-assignment restrictions:
External-facing: Acknowledgement and status updates only;
all substantive responses require approval
Internal IT: Full triage and first-line resolution
Customer service: Full triage; escalation requires team lead approval
Content Agent¶
Tagline: Produces on-brand content across channels -- social, blog, email, and more.
The Content Agent drafts, adapts, and manages content for marketing and communications. It works from the account knowledge layer's brand guidelines and voice documentation to produce content that stays on-message across channels. It can adapt a single piece of source material into formats appropriate for different audiences and platforms. It is typically deployed at team or account level, not personal.
Capabilities: Draft social media posts (LinkedIn, X/Twitter, and others) from a brief or source material. Write blog posts and articles from an outline or research brief. Produce email newsletters and campaign copy. Adapt existing content for different audiences, channels, or tones. Maintain brand voice consistency using account-level brand guidelines. Suggest content angles and topics based on a given theme or objective. Produce multiple variants of copy for testing. Check content against account style guide and flag deviations.
Knowledge layers: Account (primary; brand voice, style guide, approved terminology, content templates, past published content), Team (campaign context, product messaging, target audiences), Agent (channel-specific best practices and format guidance).
Settings:
capability_level: workflow
tool_trust_required: read (brand and reference material)
low_risk_write (drafts and document creation)
default_execution_mode: immediate
Permissions:
allow: read_knowledge, create_document, web_search
(for factual research and competitive awareness)
require_approval: publish_content, send_campaign
deny: post_to_social_directly (drafts only until approved)
Suggested per-assignment restrictions:
Copywriter: Full draft access; publish requires approval
Marketing manager: Full access including publish approval
Agency/contractor: Draft only; all output reviewed before use
Customer Intelligence Agent¶
Tagline: Knows your customers -- briefs you before calls, logs interactions, surfaces opportunities.
The Customer Intelligence Agent is the interface between the team and its CRM data. It prepares meeting briefs (who you're meeting, what they've bought, what's outstanding), logs interaction notes back to the CRM, flags follow-up opportunities, and produces account summaries. It connects to the CRM through the configured external tool integration.
Capabilities: Retrieve and summarise customer and account records from the CRM. Prepare pre-meeting briefs (contact history, outstanding issues, prior chats). Log meeting notes and interaction summaries back to the CRM. Flag accounts with open opportunities, upcoming renewals, or overdue follow-ups. Produce account health summaries for a portfolio or territory. Answer questions about a specific account's history, purchases, or status. Surface upsell or cross-sell signals based on account data. Sync meeting outcomes to the CRM automatically.
Knowledge layers: Team (sales process, account ownership, relationship context, deal history), Account (CRM data schema, sales methodology, account tiers and policies), User (personal relationship notes, private context about specific contacts).
Settings:
capability_level: workflow
tool_trust_required: read (CRM read, default)
low_risk_write (logging notes back)
default_execution_mode: immediate
Permissions:
allow: read_crm, write_interaction_log, create_task
require_approval: update_account_record, create_opportunity,
delete_contact
deny: export_customer_data_without_approval
Suggested per-assignment restrictions:
Sales rep: Full read; log notes; create tasks
Manager: Full read across team; no write
External reviewer: Anonymised summaries only
Onboarding Agent¶
Tagline: Gets new team members up to speed quickly and without burdening the rest of the team.
The Onboarding Agent guides new employees through their first days and weeks. It answers questions, surfaces relevant policies and procedures, assigns onboarding tasks, tracks progress, and escalates blockers. It draws heavily on the account knowledge layer, which administrators maintain with onboarding content. It reduces the time senior team members spend on routine orientation activities.
Like the HR Agent, the Onboarding Agent is a pre-configured instance of the Coach Agent pattern, with a fixed domain (employee onboarding), default library assignments for onboarding content, and task management capabilities enabled.
Capabilities: Welcome new starters and guide them through a structured onboarding sequence. Answer questions about policies, benefits, tools, and procedures. Assign and track onboarding tasks (reading, account setup, introductions). Surface relevant documentation contextually (e.g. "now that you've completed payroll setup, here's the expense policy"). Introduce the new starter to relevant team members and suggest intro meetings. Flag onboarding blockers (e.g. access not granted, task not completed after N days). Produce onboarding progress reports for HR or team leads. Capture feedback on the onboarding experience for continuous improvement.
Knowledge layers: Account (onboarding sequences, policies, procedures, tools documentation, team structure), Team (team-specific onboarding steps, project context, who does what), User (the new starter's progress, questions asked, tasks completed).
Settings:
capability_level: workflow
tool_trust_required: read (knowledge retrieval)
low_risk_write (task assignment and progress tracking)
default_execution_mode: immediate
Permissions:
allow: read_knowledge, create_task, update_task,
read_calendar (for scheduling intro meetings)
require_approval: send_external_communication,
modify_onboarding_sequence
deny: read_payroll_data, read_personal_hr_records
Suggested per-assignment restrictions:
New starter: Full access for self-guided onboarding
HR admin: Full access including sequence modification
Manager: Progress read only; escalation alerts
Monitor Agent¶
Tagline: Watches for conditions you define and alerts you when they're met.
The Monitor Agent runs in the background, watching for conditions across connected systems, inbound data, and job results. When a condition is met -- a budget threshold crossed, a job completing, an inbound message matching a pattern, a metric exceeding a limit -- it triggers an alert or a follow-up action. It is most useful at team and account level for operational awareness and for closing the loop on deferred work.
Capabilities: Monitor job completion and deliver results to a specified agent or user. Watch for threshold conditions on budget, usage, error rates, or custom metrics. Monitor inbound channels for messages matching specified patterns or keywords. Trigger alert notifications via any configured channel (email, Telegram, web chat). Escalate alerts that have not been acknowledged within a configured time window. Produce periodic status digests (hourly, daily, weekly) summarising monitored conditions. Register as a job observer on behalf of another agent or user. Maintain a log of all triggered alerts and their acknowledgement status.
Knowledge layers: Team (monitored conditions, alert preferences, escalation paths, prior alert history), Account (alert policies, escalation procedures, SLA thresholds).
Settings:
capability_level: tools_only
tool_trust_required: read (read-only access to monitored systems)
low_risk_write (sending alert notifications)
default_execution_mode: deferred (monitors are inherently background processes)
Permissions:
allow: read_jobs, read_metrics, register_job_observer,
send_notification, read_budget
require_approval: trigger_external_action, modify_alert_thresholds
deny: modify_monitored_systems, delete_records
Suggested per-assignment restrictions:
Team operational: Standard alert thresholds; notify team channel
Account admin: Full access to all account-level metrics and jobs
Read-only context: Notification delivery only; no escalation actions
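Two of the Monitor Agent's behaviours, threshold conditions and escalation of unacknowledged alerts, can be sketched as follows. The class names, field names, and the reading of "exceeding a limit" as strictly greater than are assumptions for illustration; the actual condition model is not specified in this entry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ThresholdCondition:
    metric: str    # hypothetical metric name, e.g. "budget_spend_usd"
    limit: float

    def is_met(self, value: float) -> bool:
        # Assumption: "exceeding a limit" means strictly greater than.
        return value > self.limit


@dataclass
class Alert:
    condition: ThresholdCondition
    triggered_at: datetime
    acknowledged_at: Optional[datetime] = None

    def needs_escalation(self, now: datetime, window: timedelta) -> bool:
        # Escalate only if nobody acknowledged within the configured window.
        return self.acknowledged_at is None and now - self.triggered_at >= window


def check(condition: ThresholdCondition, value: float, now: datetime) -> Optional[Alert]:
    """Return a new Alert if the condition is met, otherwise None."""
    return Alert(condition, now) if condition.is_met(value) else None
```

A monitor loop would call `check` on each new reading and `needs_escalation` on each open alert; how readings arrive and where alerts are delivered is left to the configured channels.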
Branding Agent¶
Tagline: Turns a rough idea into a coherent visual and verbal identity.
The Branding Agent works interactively with a user to develop a brand identity for a project, product, app, or business. It begins by asking structured clarifying questions to understand the concept, audience, values, and tone, then produces a brand specification covering colour palette, typography, logo and icon directions, and brand voice guidelines. The output is a formatted brand brief suitable for handing to a designer or feeding directly into an image generation prompt. It does not produce actual graphics; it produces the specification that drives them.
Capabilities: Ask structured clarifying questions to establish brand context: what the thing is, who it's for, what personality it should have, what it should feel like, what it should not feel like, and any existing constraints (e.g. existing colours, names, or assets to work around). Suggest a primary colour palette (two to three colours with hex codes, names, and rationale). Suggest supporting and accent colours with usage guidance. Recommend typography pairings (heading and body typefaces) with rationale and fallback options. Describe logo and icon concept directions (three to five distinct directions, each with a concept description, visual metaphor, style reference, and suggested mood). Produce brand voice guidelines: tone adjectives, dos and don'ts, example phrases that are on-brand versus off-brand. Generate a consolidated brand brief document ready for a designer or image generation tool. Suggest naming directions if the project is unnamed or the name is under consideration. Iterate on any element based on feedback. Optionally produce image generation prompts (for Midjourney, DALL-E, or similar) based on the agreed logo concept directions.
Knowledge layers: Agent (design principles, colour theory, typography conventions, brand strategy frameworks, logo concept vocabulary), User (prior branding work the user has done; their aesthetic preferences and known dislikes; brand briefs previously produced), Account (existing brand elements if the request is a brand extension rather than a new identity; approved design system components).
Delegates to: Writing Agent (for polishing the final brand brief document and voice guidelines).
Execution note: This agent is one of the few in the starter pack that is inherently interactive before it can produce output. It should always begin with a clarifying question sequence before generating any brand elements. The quality of the output depends directly on the quality of the brief gathered. The minimum viable brief requires: what the thing does, who uses it, and three adjectives that describe how it should feel.
Input parameters (initial prompt):
The user's starting prompt can be as brief as a sentence or as detailed as a full brief. The agent fills gaps through questions. Typical useful starting information:
What it is: [Product / app / business / event / project name and description]
Audience: [Who will encounter this brand]
Personality: [Adjectives -- what should it feel like?]
Anti-personality: [What should it definitely NOT feel like?]
Constraints: [Any existing assets, colours, or names to work around]
Output needed: [Full brand spec / colour palette only / logo directions only]
Designer handoff: [Yes -- produce a PDF brief / No -- working doc is fine]
Settings:
capability_level: workflow
tool_trust_required: read (reference research, design inspiration lookup)
default_execution_mode: interactive (clarifying questions before generation)
deferred (final brand brief document production)
Permissions:
allow: web_search (competitor and inspiration research),
create_document, write_knowledge_fact
deny: send_email, external_system_writes
Optional integration:
image_generation_tool: If configured, the agent can pass agreed logo
concept directions directly to an image generation
tool as structured prompts. Disabled by default.
Suggested per-assignment restrictions:
Personal / solo founder: Full interactive access; full output
Team (design review): Output to shared drive; changes require
design lead sign-off before brand is adopted
Agency / builder: Full access; outputs scoped to client context
in team knowledge layer
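The Branding Agent's input parameters and the minimum viable brief from its execution note can be sketched as a small data structure. The field names below mirror the input parameter labels in the entry; the `BrandBrief` class, its defaults, and the `missing_for_minimum` helper are hypothetical and only illustrate how the agent might decide which clarifying questions are still needed.

```python
from dataclasses import dataclass, field


@dataclass
class BrandBrief:
    what_it_is: str = ""
    audience: str = ""
    personality: list[str] = field(default_factory=list)       # adjectives
    anti_personality: list[str] = field(default_factory=list)
    constraints: str = ""
    output_needed: str = "full brand spec"                      # assumed default
    designer_handoff: bool = False

    def missing_for_minimum(self) -> list[str]:
        """List the fields still needed for the minimum viable brief:
        what the thing does, who uses it, and three adjectives."""
        missing = []
        if not self.what_it_is.strip():
            missing.append("what_it_is")
        if not self.audience.strip():
            missing.append("audience")
        if len(self.personality) < 3:
            missing.append("personality")
        return missing
```

An empty result from `missing_for_minimum()` would indicate that the agent has enough to start generating brand elements; a non-empty result names the gaps to ask about first.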
1.5 Summary Table¶
| # | Agent | Group | Context | Capability Level | Delegates To |
|---|---|---|---|---|---|
| 1 | Mail Agent | Specialist | Personal / Team | tools_only | -- |
| 2 | Calendar Agent | Specialist | Personal / Team | tools_only | -- |
| 3 | Task Agent | Specialist | Personal / Team | tools_only | -- |
| 4 | Research Agent | Specialist | Personal / Team | workflow | -- |
| 5 | Writing Agent | Specialist | Personal / Team | workflow | -- |
| 6 | Document Agent | Specialist | Personal / Team / Account | workflow | -- |
| 7 | Personal Assistant | Coordinator | Personal | workflow / learning | Mail, Calendar, Task, Research, Writing, Document |
| 8 | Meeting Agent | Coordinator | Personal / Team | workflow | Calendar, Mail, Task, Writing |
| 9 | Project Coordinator | Coordinator | Team | workflow | Calendar, Task, Research, Writing, Mail |
| 10 | Briefing Agent | Coordinator | Personal / Team | workflow | Research, Writing, Document |
| 11 | Data Agent | Data & Knowledge | Team / Account | workflow | -- |
| 12 | Knowledge Base Agent | Data & Knowledge | Team / Account | tools_only | -- |
| 13 | HR Agent | Org Specialist | Account | tools_only | -- |
| 14 | Finance Agent | Org Specialist | Personal / Team / Account | workflow | -- |
| 15 | Support Triage Agent | Org Specialist | Team / Account | workflow | Task, Mail, Knowledge Base |
| 16 | Content Agent | Org Specialist | Team / Account | workflow | -- |
| 17 | Customer Intelligence Agent | Org Specialist | Team / Account | workflow | Task, Calendar |
| 18 | Onboarding Agent | Org Specialist | Account | workflow | Task, Calendar, Knowledge Base |
| 19 | Monitor Agent | System | Team / Account | tools_only | -- |
| 20 | Branding Agent | Creative | Personal / Team | workflow | Writing |
| 21 | Chat Agent | Specialist | Personal / Team / Account | tools_only | -- |
| 22 | Report Writer Agent | Data & Knowledge | Personal / Team / Account | workflow | Data, Writing |
| 23 | Coach Agent | Specialist | Personal / Team / Account | tools_only | -- |
1.6 Upcoming Agents¶
The following agents are planned but not yet fully specified. Design decisions need to be resolved before formal catalogue entries are written.
Visualiser Agent (design decisions pending)¶
Planned group: Data & Knowledge
Tagline: Turns data and descriptions into charts, diagrams, and infographics.
The Visualiser Agent takes structured data, a process description, or a conceptual prompt and produces a visual output -- charts, workflow diagrams, infographics, relationship maps, or timelines. At launch it is display-only: the agent generates a visual artefact that the platform renders. A later release would introduce interactivity (filterable charts, clickable workflow steps, drill-down). It is a natural delegate of the Report Writer Agent, which can embed Visualiser outputs within formatted reports.
Planned capabilities (v1, display only): Generate workflow and process diagrams from a description or structured input. Generate data charts (bar, line, pie, scatter, and others) from tabular data. Produce infographics and relationship/entity maps. Produce org charts and simple timelines. Apply account brand colours and approved styles from the knowledge layer. Select the appropriate visualisation type for the data and audience. Output in the configured format (see design decisions below).
Planned capabilities (v2, interactive, future release): Filterable and drill-down charts. Clickable workflow steps with contextual detail. Animated data stories. Parameterised outputs the platform can make interactive without re-invoking the agent.
Open design decisions (v1):
- Output format. Three realistic approaches for v1. Mermaid markup: model-native, platform renders; excellent for diagrams and workflows, limited for data charts. Vega-Lite spec: declarative config the model generates from data, platform renders via a Vega renderer; better separation of concerns for charts. SVG: model generates directly; flexible but gets unwieldy for complex data visualisations. Likely answer: Mermaid for diagrams and workflows; Vega-Lite for data charts; SVG as fallback for custom infographics. To confirm.
- Rendering responsibility. Does the platform build a renderer, or does the agent output something already renderable in-browser (SVG, HTML)? Mermaid and Vega-Lite require a client-side rendering library. SVG and HTML do not. Decision affects the platform engineering scope for v1.
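If the "likely answer" on output format is adopted, format selection reduces to a small lookup. The sketch below assumes that outcome, which is still to be confirmed; the visualisation kind names and both function names are hypothetical.

```python
# Assumption: the unconfirmed "likely answer" above is the chosen policy.
DIAGRAM_KINDS = {"workflow", "process_diagram"}
CHART_KINDS = {"bar", "line", "pie", "scatter"}


def choose_output_format(kind: str) -> str:
    """Map a visualisation kind to an output format."""
    if kind in DIAGRAM_KINDS:
        return "mermaid"
    if kind in CHART_KINDS:
        return "vega-lite"
    # Everything else (e.g. custom infographics) falls back to SVG.
    return "svg"


def needs_client_renderer(output_format: str) -> bool:
    """Mermaid and Vega-Lite need a client-side rendering library; SVG and HTML do not."""
    return output_format in {"mermaid", "vega-lite"}
```

Keeping the two questions in separate functions reflects that they are separate design decisions: the first is about what the model emits, the second about platform engineering scope.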
Open design decisions (v2):
- Interactivity model. Two approaches. Model generates component code (React, D3): more powerful, but requires a safe sandboxed execution environment. Platform adds an interaction layer over static output: simpler to implement, more limited in capability.
1.7 Implementation Notes¶
Phasing. Not all 23 agents need to ship simultaneously. A suggested launch order:
- Phase 1 (foundation): Mail, Calendar, Task, Personal Assistant, Chat Agent, Coach Agent -- the personal productivity core plus the first document-grounded persona agent.
- Phase 2 (team value): Research, Writing, Document, Meeting Agent, Briefing Agent, Knowledge Base Agent.
- Phase 3 (org functions): Project Coordinator, Support Triage, Onboarding, HR, Monitor, Report Writer Agent.
- Phase 4 (vertical depth): Finance, Content, Customer Intelligence, Data Agent, Branding Agent.
Coordination dependencies. The four coordinator agents (Personal Assistant, Meeting Agent, Project Coordinator, Briefing Agent) require at least their core specialist delegates to be available. They should not be enabled unless the relevant specialists are also deployed.
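The dependency rule can be checked mechanically before a coordinator is enabled. A minimal sketch, using the "Delegates To" column of the summary table: the function names are hypothetical, and treating every listed delegate as required is a strict assumption, since the catalogue does not say which delegates count as "core".

```python
# Delegate lists copied from the "Delegates To" column of the summary table.
DELEGATES = {
    "Personal Assistant": {"Mail", "Calendar", "Task", "Research", "Writing", "Document"},
    "Meeting Agent": {"Calendar", "Mail", "Task", "Writing"},
    "Project Coordinator": {"Calendar", "Task", "Research", "Writing", "Mail"},
    "Briefing Agent": {"Research", "Writing", "Document"},
}


def missing_delegates(agent: str, deployed: set[str]) -> list[str]:
    """Return the delegates an agent needs that are not yet deployed."""
    # Assumption: every listed delegate is required, not only a "core" subset.
    return sorted(DELEGATES.get(agent, set()) - deployed)


def can_enable(agent: str, deployed: set[str]) -> bool:
    return not missing_delegates(agent, deployed)
```

With only Mail, Calendar, and Task deployed, `missing_delegates("Personal Assistant", ...)` reports Document, Research, and Writing under this strict reading. A looser definition of "core" delegates would change that result, so the threshold is a decision for the operator.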
Template vs live agents. These entries define AgentTemplate records. Customers deploy live agents from these templates, then customise them. The templates set sensible defaults; operators adjust for their context.
Knowledge seeding. Several agents (HR, Onboarding, Knowledge Base, Content, Coach) benefit from account knowledge being seeded before the agent is useful. The template should include a prompt to the deploying admin to provide initial knowledge documents before activating the agent. For the Coach Agent specifically, an empty library renders the agent effectively a less-focused version of the Chat Agent -- seeding the library is the deployment step that makes it purposeful.
Coach, HR, and Onboarding pattern. The HR Agent and Onboarding Agent are pre-configured instances of the same architectural pattern as the Coach Agent. All three use the library system for domain knowledge, with account-scoped libraries managed by the admin. They are presented as separate catalogue entries because they have different default configurations, risk profiles, and user expectations, but they share the same underlying template mechanism. See 05 Persistence, Storage & Ingestion for the library system.
2. Agent Implementation Logistics¶
The agent catalogue defines what each agent does. This section defines what needs to be built and connected to make each agent actually work: external integrations, tool abstractions, the implementation sequence, and operational considerations.
2.1 Integration Landscape¶
Every agent capability resolves either to LLM reasoning or to a tool call. These fall into four categories:
| Category | Description | Examples |
|---|---|---|
| Platform tools | Built into Thinklio, no external dependency | memory_store, memory_search, current_time, write_knowledge_fact |
| LLM-native tools | Capabilities that are purely LLM reasoning, no tool call needed | Summarisation, drafting, classification, analysis, question answering |
| External integration tools | Connect to third-party services via API | Gmail, Google Calendar, Todoist, Jira, CRM, etc. |
| Document tools | Read and process uploaded documents | read_document, search_documents (backed by the document ingestion system) |
Most agents are a mix of LLM-native reasoning and tool calls. An agent like the Writing Agent is almost entirely LLM-native -- it reasons, drafts, and edits without needing any external system. An agent like the Calendar Agent is almost entirely tool-dependent -- it is useless without a calendar integration.
This distinction matters for implementation sequencing: LLM-native agents can ship immediately, whereas tool-dependent agents can ship only once their integrations exist.
2.2 External Integration Map¶
Email¶
| Provider | API | Auth | Notes |
|---|---|---|---|
| Gmail | Gmail API (REST) | OAuth 2.0 (user consent) | Read, send, label, search. Requires per-user OAuth grant. |
| Microsoft 365 | Microsoft Graph API | OAuth 2.0 (org or user) | Same capabilities. Enterprise customers likely use this. |
| Generic IMAP/SMTP | IMAP for read, SMTP for send | Username/password or app password | Fallback for providers without REST APIs. Limited compared to Gmail/Graph. |
Tool abstractions needed:
email_read Read messages (inbox, by sender, by label, by date range)
email_search Search messages by query
email_send Send a new email
email_reply Reply to an existing thread
email_draft Create a draft (no send)
email_label Apply/remove labels or folders
email_archive Archive a message
email_flag Flag/star a message
The tool registry stores the abstract tool (e.g. email_read). The tool's config field specifies which provider implementation to use. At deployment time, the operator configures the provider and credentials. The agent calls email_read regardless of whether it is backed by Gmail or Graph.
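The abstract-tool pattern described above can be sketched as a lookup from the tool's config to a provider implementation. This is a simplified illustration, not the actual Tool entity from 04 Data Model: the `Tool` dataclass, the `PROVIDERS` table, the provider keys, and the stub readers are all hypothetical, and the stubs return placeholder strings rather than calling the Gmail or Graph APIs.

```python
from dataclasses import dataclass
from typing import Callable

# A provider implementation takes a query and returns matching messages.
EmailReader = Callable[[str], list[str]]


def _gmail_read(query: str) -> list[str]:
    # Stub: a real implementation would call the Gmail API here.
    return [f"gmail:{query}"]


def _graph_read(query: str) -> list[str]:
    # Stub: a real implementation would call Microsoft Graph here.
    return [f"graph:{query}"]


# Hypothetical provider keys; the real registry may name these differently.
PROVIDERS: dict[str, EmailReader] = {
    "gmail": _gmail_read,
    "microsoft_graph": _graph_read,
}


@dataclass
class Tool:
    name: str                 # abstract tool name, e.g. "email_read"
    config: dict[str, str]    # operator-supplied; selects the provider

    def call(self, query: str) -> list[str]:
        provider = self.config.get("provider")
        impl = PROVIDERS.get(provider) if provider else None
        if impl is None:
            raise ValueError(f"no implementation for provider {provider!r}")
        return impl(query)
```

The agent only ever sees `Tool("email_read", ...).call(...)`; switching an account from Gmail to Graph is a config change, with no change to the agent.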
Agents that need this: Mail Agent, Personal Assistant, Meeting Agent, Project Coordinator, Support Triage Agent.
Calendar¶
| Provider | API | Auth | Notes |
|---|---|---|---|
| Google Calendar | Google Calendar API (REST) | OAuth 2.0 | Full CRUD, free/busy queries, recurring events |
| Microsoft 365 | Microsoft Graph API (calendar) | OAuth 2.0 | Same capabilities, different API shape |
| CalDAV | CalDAV protocol | Various | Open standard, used by some self-hosted solutions |
Tool abstractions needed:
calendar_read Read events (by date range, calendar)
calendar_find_free_time Find available slots across calendars
calendar_check_conflicts Check for conflicts at a proposed time
calendar_create_event Create a new event
calendar_update_event Update an existing event
calendar_cancel_event Cancel/delete an event
calendar_send_invite Send meeting invitations
calendar_rsvp Respond to an invitation
Agents that need this: Calendar Agent, Personal Assistant, Meeting Agent, Project Coordinator, Customer Intelligence Agent, Onboarding Agent.
Task Management¶
| Provider | API | Auth | Notes |
|---|---|---|---|
| Todoist | Todoist REST API v2 | API token | Personal task management, simple and clean |
| Jira | Jira REST API | OAuth 2.0 or API token | Enterprise project management, complex schema |
| Asana | Asana REST API | OAuth 2.0 or PAT | Team task management, project-oriented |
| Linear | Linear GraphQL API | API key | Modern dev-oriented project management |
| Trello | Trello REST API | API key + token | Board-based, simpler than Jira |
| Internal | Thinklio knowledge layer | Platform native | Tasks stored as knowledge facts (lightweight fallback) |
Tool abstractions needed:
task_create Create a task (title, description, due date, assignee, priority)
task_update Update task fields
task_complete Mark a task as complete
task_delete Delete a task
task_list List tasks (by assignee, project, status, due date)
task_search Search tasks by keyword
task_assign Assign/reassign a task
task_set_reminder Set a reminder on a task
Agents that need this: Task Agent, Personal Assistant, Meeting Agent, Project Coordinator, Support Triage Agent, Onboarding Agent.
CRM¶
| Provider | API | Auth | Notes |
|---|---|---|---|
| HubSpot | HubSpot API v3 | OAuth 2.0 or API key | Contacts, companies, deals, activities |
| Salesforce | Salesforce REST API | OAuth 2.0 | Enterprise CRM, complex object model |
| Pipedrive | Pipedrive REST API | API token | Sales-focused, simpler model |
| Internal | Thinklio knowledge layer | Platform native | CRM data stored as knowledge facts (lightweight) |
Tool abstractions needed:
crm_read_contact Read a contact/customer record
crm_search_contacts Search contacts by name, email, company
crm_read_account Read a company/account record
crm_log_interaction Log a meeting note, call, or interaction
crm_read_deals Read opportunities/deals for an account
crm_create_task Create a follow-up task linked to a contact
crm_read_history Read interaction history for a contact
Agents that need this: Customer Intelligence Agent, Support Triage Agent (optional).
Web Search and Research¶
| Provider | API | Auth | Notes |
|---|---|---|---|
| Tavily | Tavily Search API | API key | Built for AI agents, returns structured results |
| Serper | Serper API | API key | Google Search results via API |
| Brave Search | Brave Search API | API key | Privacy-focused, good for general search |
| Direct fetch | HTTP GET + readability extraction | None | Read specific URLs, extract content |
Tool abstractions needed:
web_search Search the web for a query (returns title, snippet, URL)
web_read_url Read and extract content from a specific URL
web_read_multiple Read multiple URLs (batch)
Agents that need this: Research Agent, Writing Agent, Briefing Agent, Content Agent, Data Agent (for reference data).
File Storage and Document Access¶
| Provider | API | Auth | Notes |
|---|---|---|---|
| Thinklio R2 | Internal document system | Platform auth | Primary. See 05 Persistence, Storage & Ingestion. |
| Google Drive | Google Drive API | OAuth 2.0 | Read/write files in user's Drive |
| Microsoft OneDrive/SharePoint | Microsoft Graph API | OAuth 2.0 | Enterprise document access |
| Direct upload | Upload API | Platform auth | Files uploaded directly to Thinklio |
Tool abstractions needed:
document_upload Upload a file to Thinklio storage
document_read Read/extract content from a stored document
document_search Search document chunks by semantic query
document_list List documents for an agent/scope
document_delete Delete a document and derived content
Agents that need this: Document Agent, Knowledge Base Agent, HR Agent, Finance Agent, all agents that consume uploaded reference material.
Notifications and Messaging¶
| Provider | API | Auth | Notes |
|---|---|---|---|
| Telegram | Telegram Bot API | Bot token | Already implemented |
| Slack | Slack Web API | OAuth 2.0 or bot token | Team notifications, channel messages |
| Email | Via email integration | (same as email) | Notification delivery via email |
| Web push | Platform websocket/SSE | Platform auth | Real-time notifications in web UI |
Tool abstractions needed:
notify_user Send a notification to a user via their preferred channel
notify_team Send a notification to a team channel
Agents that need this: Monitor Agent, Support Triage Agent, Personal Assistant, any agent that delivers deferred results.
2.3 Agent-to-Integration Dependency Matrix¶
| Agent | Email | Calendar | Tasks | CRM | Web Search | Documents | Notifications | LLM-Only |
|---|---|---|---|---|---|---|---|---|
| Mail Agent | Required | -- | -- | -- | -- | -- | -- | -- |
| Calendar Agent | -- | Required | -- | -- | -- | -- | -- | -- |
| Task Agent | -- | -- | Required | -- | -- | -- | -- | -- |
| Research Agent | -- | -- | -- | -- | Required | -- | -- | -- |
| Writing Agent | -- | -- | -- | -- | Optional | Optional | -- | Primary |
| Document Agent | -- | -- | -- | -- | -- | Required | -- | -- |
| Personal Assistant | Via delegates | Via delegates | Via delegates | -- | Via delegates | -- | Optional | Routing/synthesis |
| Meeting Agent | Via delegates | Via delegates | Via delegates | -- | -- | Optional | -- | Extraction/synthesis |
| Project Coordinator | Via delegates | Via delegates | Via delegates | -- | Optional | -- | Optional | Status/synthesis |
| Briefing Agent | -- | -- | -- | -- | Via delegates | Via delegates | -- | Synthesis |
| Data Agent | -- | -- | -- | -- | -- | Required | -- | Analysis |
| Knowledge Base Agent | -- | -- | -- | -- | -- | Required | -- | Primary |
| HR Agent | -- | -- | -- | -- | -- | Required | -- | Primary |
| Finance Agent | -- | -- | -- | -- | -- | Required | -- | Analysis |
| Support Triage Agent | Optional | -- | Required | Optional | -- | Optional | Required | Classification |
| Content Agent | -- | -- | -- | -- | Optional | Optional | -- | Primary |
| Customer Intelligence | -- | Optional | Optional | Required | -- | -- | -- | Synthesis |
| Onboarding Agent | -- | Optional | Required | -- | -- | Required | -- | Guidance |
| Monitor Agent | -- | -- | -- | -- | -- | -- | Required | -- |
| Branding Agent | -- | -- | -- | -- | Optional | -- | -- | Primary |
| Chat Agent | -- | -- | -- | -- | -- | -- | -- | Primary |
| Report Writer Agent | -- | -- | -- | -- | -- | Required | -- | Analysis |
| Coach Agent | -- | -- | -- | -- | -- | Required | -- | Primary |
Key: Required means the agent is not useful without this integration. Optional means it enhances the agent but is not required. Via delegates means the coordinator accesses this through specialist delegates. Primary means the agent's value is mostly LLM reasoning, not tool integration.
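The matrix can also be read programmatically when deciding which agents a given set of integrations unblocks. The following is a minimal sketch, assuming a hypothetical `REQUIRED` mapping that transcribes only the Required cells for a handful of rows above; the names are illustrative and not part of the platform schema.

```python
# Hypothetical transcription of the "Required" cells for a few matrix rows.
REQUIRED: dict[str, set[str]] = {
    "Mail Agent": {"email"},
    "Calendar Agent": {"calendar"},
    "Task Agent": {"tasks"},
    "Research Agent": {"web_search"},
    "Support Triage Agent": {"tasks", "notifications"},
    "Onboarding Agent": {"tasks", "documents"},
    "Chat Agent": set(),  # LLM-only: no required integration
}


def shippable(available: set[str]) -> list[str]:
    """Return agents whose Required integrations are all available."""
    return sorted(
        agent for agent, needs in REQUIRED.items() if needs <= available
    )


# With only task management live, Support Triage is still blocked
# because it also requires notifications.
print(shippable({"tasks"}))
```

Optional and Via delegates cells are deliberately ignored here: they affect how useful an agent is, not whether it can ship.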
2.4 Implementation Sequence¶
Based on the dependency matrix and the realistic order of what can be built, the integration rollout follows six waves.
Wave 1: LLM-native agents (no external integrations needed). These agents work with just the platform's built-in capabilities (knowledge facts, document chunks, chat history, LLM reasoning). They can ship as soon as the document ingestion system and agent templates are in place.
| Agent | What it needs | Status |
|---|---|---|
| Writing Agent | LLM + knowledge layers | Ready now |
| Chat Agent | LLM + knowledge layers | Ready now |
| Coach Agent | Document ingestion + knowledge retrieval + library | Needs doc ingestion |
| Knowledge Base Agent | Document ingestion + knowledge retrieval | Needs doc ingestion |
| HR Agent | Document ingestion + knowledge retrieval | Needs doc ingestion |
| Content Agent | LLM + knowledge layers + optional web search | Ready now (basic) |
| Branding Agent | LLM + knowledge layers | Ready now |
| Document Agent | Document ingestion | Needs doc ingestion |
| Data Agent | Document ingestion (for spreadsheet/CSV) | Needs doc ingestion |
The gate for most of these is the document ingestion system. See 05 Persistence, Storage & Ingestion.
Wave 2: Web search integration. Adding web_search and web_read_url unlocks the research-oriented agents.
| Agent | What it unlocks |
|---|---|
| Research Agent | Full web search, read, and synthesise workflow |
| Briefing Agent | Person/organisation research from public sources |
| Content Agent (enhanced) | Competitive research, fact-checking |
| Writing Agent (enhanced) | Factual grounding from web sources |
One web search provider is needed (Tavily recommended for AI agent use), plus URL content extraction.
Wave 3: Task management integration.
| Agent | What it unlocks |
|---|---|
| Task Agent | Full task CRUD |
| Personal Assistant | Delegate to Task Agent |
| Meeting Agent | Extract action items and create tasks |
| Project Coordinator | Task tracking, status, blockers |
| Onboarding Agent | Onboarding task sequences |
| Support Triage Agent | Ticket creation and tracking |
One task provider to start (Todoist for simplicity, or Jira for enterprise). The internal fallback (tasks as knowledge facts) provides basic capability without external integration.
Wave 4: Calendar integration.
| Agent | What it unlocks |
|---|---|
| Calendar Agent | Full calendar CRUD |
| Personal Assistant | Delegate for scheduling |
| Meeting Agent | Pre-meeting briefs, follow-up scheduling |
| Project Coordinator | Milestone and review scheduling |
Google Calendar API (OAuth) first, with Microsoft Graph as the enterprise follow-on.
Wave 5: Email integration.
| Agent | What it unlocks |
|---|---|
| Mail Agent | Full email management |
| Personal Assistant | Delegate for email |
| Meeting Agent | Send summaries |
| Project Coordinator | Send status reports |
Gmail API (OAuth). Email is powerful but complex (OAuth flows, send permissions, compliance considerations), so it is worth deferring until the simpler integrations are proven.
Wave 6: CRM and advanced integrations.
| Agent | What it unlocks |
|---|---|
| Customer Intelligence Agent | CRM data access |
| Support Triage Agent (enhanced) | Customer context from CRM |
| Monitor Agent | Full system monitoring with notifications |
HubSpot or Salesforce API. Enterprise-specific and likely driven by early customer requirements.
2.5 Tool Abstraction Architecture¶
Vendor-Agnostic Tool Layer¶
Each integration domain (email, calendar, tasks, CRM) has a vendor-agnostic tool interface and one or more provider implementations.
Tool Registry
|
+-- email_read (abstract)
|   +-- GmailProvider
|   +-- GraphProvider
|   +-- IMAPProvider
|
+-- calendar_read (abstract)
|   +-- GoogleCalendarProvider
|   +-- GraphCalendarProvider
|   +-- CalDAVProvider
|
+-- task_create (abstract)
    +-- TodoistProvider
    +-- JiraProvider
    +-- InternalProvider
Provider Configuration¶
Each tool's config field in the database specifies the active provider and its credentials:
{
  "provider": "gmail",
  "oauth_token_ref": "vault:gmail-token-user-123",
  "scopes": ["https://www.googleapis.com/auth/gmail.readonly"],
  "rate_limit": { "max_per_minute": 30 }
}
Provider credentials are stored securely (initially in environment variables or the database with encryption, later via the secrets vault). OAuth tokens are managed per-user -- each user who wants email integration must complete an OAuth consent flow.
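To illustrate how an abstract tool call might resolve to a provider through the config above, here is a minimal sketch. The `PROVIDERS` table, the `call_tool` function, and both provider functions are hypothetical; the provider bodies are placeholders rather than real Gmail or Graph API calls.

```python
from typing import Callable


# Hypothetical provider implementations of the abstract email_read tool.
def gmail_email_read(query: str) -> list[str]:
    return [f"gmail:{query}"]  # placeholder for a real Gmail API call


def graph_email_read(query: str) -> list[str]:
    return [f"graph:{query}"]  # placeholder for a real Microsoft Graph call


# Abstract tool name -> provider name -> implementation.
PROVIDERS: dict[str, dict[str, Callable[[str], list[str]]]] = {
    "email_read": {"gmail": gmail_email_read, "graph": graph_email_read},
}


def call_tool(tool: str, config: dict, query: str) -> list[str]:
    """Dispatch an abstract tool call to the provider named in its config."""
    try:
        impl = PROVIDERS[tool][config["provider"]]
    except KeyError as exc:
        raise LookupError(f"no provider configured for {tool}") from exc
    return impl(query)


config = {"provider": "gmail", "oauth_token_ref": "vault:gmail-token-user-123"}
print(call_tool("email_read", config, "from:alice"))
```

The agent only ever names `email_read`; swapping Gmail for Graph is a config change, which is the property the vendor-agnostic layer is meant to guarantee.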
Internal Fallback¶
For every integration domain, there is an internal fallback that stores data in Thinklio's own knowledge layer or database. This ensures agents are functional (at a basic level) even without external integrations:
| Domain | Internal fallback |
|---|---|
| Tasks | Store tasks as knowledge facts with category "task" |
| Calendar | Not feasible as internal-only -- calendar needs real calendar data |
| Email | Not feasible as internal-only -- email needs a real mailbox |
| CRM | Store contact notes as knowledge facts |
| Documents | Thinklio document ingestion system (primary, not a fallback) |
| Web search | Not feasible as internal-only -- needs real web access |
2.6 OAuth Flow for User-Scoped Integrations¶
Email, calendar, and some CRM integrations require per-user OAuth consent. The flow:
- User navigates to settings in the Thinklio UI (or receives a setup link).
- User clicks "Connect Gmail" (or similar).
- Thinklio redirects to the provider's OAuth consent screen.
- User grants access.
- Thinklio receives the OAuth token and stores it securely.
- The agent's tools are now active for this user's scope.
This requires OAuth client registration with each provider (Google Cloud project, Azure app registration, etc.), token storage and refresh handling, scope management (request minimum needed scopes), and revocation handling when a user disconnects.
For v1, the operator configures API keys or tokens manually. OAuth flows are a Phase 2 concern for integrations.
2.7 System Prompt Strategy¶
Each agent template includes a system prompt that defines its personality, capabilities, and constraints. System prompts are not included in the catalogue but are a critical implementation artefact.
System prompt components:
- Identity -- who the agent is and what it does.
- Capabilities -- what tools it has and when to use them.
- Constraints -- what it must not do, trust levels, scope limitations.
- Knowledge guidance -- how to use its knowledge layers.
- Output format -- how to structure responses for the expected channel.
- Delegation guidance (coordinators only) -- when to delegate versus handle directly, which delegate for which task.
Example structure:
You are {agent_name}, a {description}.
## Your capabilities
You have access to the following tools:
{dynamically injected tool list}
## How you work
{behavioural guidance specific to this agent}
## What you know
{knowledge layer guidance -- what's in your context and how to use it}
## Constraints
- {trust level constraints}
- {scope constraints}
- {content policy constraints}
## Output
{format guidance for responses}
System prompts will be developed iteratively through testing. Initial versions should be functional but conservative -- it is easier to loosen constraints than to tighten them after users develop expectations.
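The dynamically injected tool list can be pictured as simple template substitution. This is a sketch only: the `PROMPT` template is a shortened version of the example structure above, and `render_prompt` is a hypothetical helper, not an existing platform function.

```python
from string import Template

# Shortened, hypothetical version of the example structure above.
PROMPT = Template(
    "You are $agent_name, a $description.\n"
    "## Your capabilities\n"
    "You have access to the following tools:\n"
    "$tool_list\n"
    "## Constraints\n"
    "$constraints\n"
)


def render_prompt(agent_name: str, description: str,
                  tools: dict[str, str], constraints: list[str]) -> str:
    """Fill the template; tools maps tool name -> one-line description."""
    tool_list = "\n".join(f"- {name}: {desc}" for name, desc in tools.items())
    constraint_list = "\n".join(f"- {c}" for c in constraints)
    return PROMPT.substitute(
        agent_name=agent_name,
        description=description,
        tool_list=tool_list,
        constraints=constraint_list,
    )


prompt = render_prompt(
    "Calendar Agent",
    "scheduling specialist",
    {"calendar_read": "Read events (by date range, calendar)"},
    ["Ask for approval before creating events."],
)
print(prompt)
```

Because the tool list is rendered from whatever tools are attached at deployment time, a stub-tier agent and an integrated agent can share one template.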
2.8 Coordinator Delegation Configuration¶
The coordinator agents (Personal Assistant, Meeting Agent, Project Coordinator, Briefing Agent) need pre-configured delegation relationships. These are defined in the AgentTemplate.delegation_config field.
Example: Personal Assistant template delegation:
{
  "delegates": [
    {
      "tool_slug": "mail_agent",
      "restrictions": { "send": { "require_approval": true } }
    },
    {
      "tool_slug": "calendar_agent",
      "restrictions": { "create_event": { "require_approval": true } }
    },
    {
      "tool_slug": "task_agent",
      "restrictions": {}
    },
    {
      "tool_slug": "research_agent",
      "restrictions": {}
    },
    {
      "tool_slug": "writing_agent",
      "restrictions": {}
    }
  ]
}
When a customer deploys a Personal Assistant from this template, the system creates the PA agent from the template, registers each delegate agent as a tool (type agent) if not already registered, attaches the delegate tools to the PA with the configured restrictions, and runs cycle detection to verify the delegation graph is acyclic. The customer can then modify restrictions per-assignment in Agent Studio.
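The cycle detection step can be sketched as a depth-first search over the delegation graph. The graph representation (agent slug mapped to the slugs it delegates to) and the function name `find_cycle` are assumptions made for illustration; the platform's actual check may differ.

```python
from typing import Optional


def find_cycle(graph: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one delegation cycle as a list of slugs, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {node: WHITE for node in graph}
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        colour[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


pa_graph = {
    "personal_assistant": ["mail_agent", "calendar_agent", "task_agent"],
    "mail_agent": [],
    "calendar_agent": [],
    "task_agent": [],
}
print(find_cycle(pa_graph))
```

Returning the offending path, rather than a bare boolean, lets Agent Studio tell the customer which assignment closed the loop.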
2.9 Stub Implementation Strategy¶
For initial deployment and testing, every agent ships with at least stub functionality:
| Implementation tier | What works | What does not work |
|---|---|---|
| Stub | System prompt, LLM reasoning, knowledge retrieval, chat. Agent can discuss its domain and answer questions from its knowledge. | No tool calls -- cannot actually read email, check calendar, create tasks, etc. |
| Platform tools | Above + memory_store, memory_search, current_time, document_search | No external integrations |
| Integrated | Above + actual external tool calls (email, calendar, tasks, CRM, web) | Full functionality |
Every agent can ship at the stub tier immediately. The system prompt should acknowledge the agent's current capabilities honestly. For example: "I'm your Calendar Agent. I can help you think about scheduling and time management. When calendar integration is connected, I'll be able to directly read and manage your calendar. For now, I can help you plan your schedule and I'll remember your preferences for when the integration is live."
This approach lets testers interact with every agent from day one, even before integrations exist. Feedback on personality, knowledge behaviour, and delegation routing is just as valuable as feedback on tool execution.
2.10 Monitoring and Quality¶
Per-Agent Metrics¶
Track for each deployed agent: interactions per day/week, average response time, tool call success/failure rate (per tool), knowledge retrieval hit rate, user satisfaction (if feedback mechanism exists), cost per interaction, and delegation success rate (coordinators).
Common Failure Modes¶
| Failure | Detection | Mitigation |
|---|---|---|
| External API down | Tool execution error rate spike | Circuit breaker, fallback to knowledge-only |
| OAuth token expired | 401 from provider | Automatic refresh, notify user if refresh fails |
| Rate limited by provider | 429 from provider | Backoff, queue requests, alert operator |
| LLM hallucination | User feedback, knowledge mismatch | Improve system prompts, increase knowledge coverage |
| Delegation loop | Depth limit exceeded | Already handled by delegation governance |
| Knowledge empty | Low retrieval hit rate | Prompt operator to seed knowledge / upload documents |
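The backoff mitigation for provider rate limits might look like the following sketch. `RateLimited` is a hypothetical exception standing in for a 429 response, and the delay schedule (doubling from one second, capped at thirty) is an assumed default rather than a documented platform setting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised when a provider answers 429."""


def backoff_delays(max_retries: int, base: float = 1.0,
                   cap: float = 30.0) -> list[float]:
    """Exponential delays in seconds: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * 2 ** i) for i in range(max_retries)]


def call_with_backoff(fn: Callable[[], T], max_retries: int = 4,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimited; re-raise once retries are exhausted."""
    for delay in backoff_delays(max_retries):
        try:
            return fn()
        except RateLimited:
            sleep(delay)
    return fn()  # final attempt; a failure here propagates to the caller


print(backoff_delays(6))
```

A production version would normally add random jitter to the delays so that many agents hitting the same provider do not retry in lockstep.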
3. Predictive Planning & Execution Learning¶
3.1 Problem Statement¶
Thinklio agents make decisions at the think step of every interaction (see 06 Events, Channels & Messaging). Given a user's message and the assembled context, the LLM decides which tools to call, in what order, and with what parameters. This decision is currently stateless: the agent has no structured knowledge of whether a similar approach has worked before, how much it cost, how long it took, or whether users were satisfied with the result.
The platform already records everything needed to learn from past executions. Every step is persisted with its state, cost, duration, and outcome. Every interaction has a terminal state (success, failed, timeout). User feedback (thumbs up/down) is captured at the interaction level. Job outcomes are tracked through the job state machine. But none of this data feeds back into future decision-making.
Without execution learning, agents repeat the same mistakes. An agent that tries tool A for a particular task and fails will try tool A again next time, because it has no memory of the failure. An agent that discovers a three-step approach works better than a five-step approach for a given task type has no way to carry that knowledge forward. Cost and latency vary unpredictably because the agent cannot prefer cheaper or faster approaches that have proven equally effective.
The predictive planning system observes agent executions, records structured outcomes, builds a statistical model of what works, and makes that model available to agents at decision time. The agent remains in control of its decisions; the system provides scores, not instructions. It does not override agent reasoning, does not introduce a separate execution path (it integrates with the existing Harness), is not a data warehouse or analytics platform (it is a real-time scoring service with a learning backend), and avoids premature ML complexity (the initial system must work with sparse data).
3.2 Value Progression¶
This system is an investment with a long payoff curve. The infrastructure built in the first phase delivers little direct value, but every day it runs it accumulates the data that makes later phases transformative.
Stage 1: Data Collection and Bayesian Scoring (implement now). Build the outcome collection pipeline, the canonical plan model, and the Bayesian scoring service. Start recording every execution outcome from day one. The Bayesian model provides initial scores with low confidence, improving as data accumulates. Value: low. The scores are advisory and based on small samples. The real value is the data being captured, which cannot be collected retroactively.
Stage 2: Review, Tune, and Validate (ongoing, months 2 to 6). As the dataset grows, review the Bayesian scores against actual outcomes. Tune the hierarchy weights, the success definition, the decay parameters, and the scope key structure. Validate that the scores correlate with genuinely better outcomes by comparing scored versus unscored agent performance. This is not a separate build phase. It is an operational discipline that runs alongside Stage 1, requiring periodic human review of the score data, not additional engineering. Value: medium. The Bayesian scores become reliable enough to influence agent behaviour.
Stage 3: ML Training (implement when data justifies). When an account has accumulated sufficient execution history (target: 10,000+ outcomes, 50+ distinct plans, 3+ months of operation), train a gradient-boosted model on the feature set. The ML model captures patterns the Bayesian model cannot: feature interactions, non-linear effects, and cross-plan generalisation. Value: medium-high.
Stage 4: Transition from Bayesian to ML (gradual or instant). Shift scoring weight from the Bayesian model to the ML model. The gradual strategy increases ml_weight in the blending formula incrementally (e.g. 0.1 per week) while monitoring prediction accuracy on a rolling validation window, rolling back if accuracy degrades. The instant strategy switches entirely to ML scoring if the model demonstrates statistically significant improvement over the Bayesian baseline on held-out data (measured by log-loss, calibrated over at least 1,000 predictions), with the Bayesian model retained as a fallback. The choice should be made per account based on data volume and risk tolerance. Value: high. Agents reliably choose better plans, cost decreases, success rates increase, and the platform can demonstrate measurable improvement over time.
Stage 5: Autonomous Plan Suggestion (future). The scoring service evolves from "score these candidates" to "here are the candidates you should consider." Given a task description and agent capabilities, the system suggests optimal plans the agent has not generated on its own, synthesised from patterns across the entire execution history. The agent still decides, but the decision space is pre-filtered and ranked. Value: highest. Agents benefit from collective platform intelligence. A new agent can immediately access the distilled experience of thousands of prior executions across the platform.
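Stage 4's gradual strategy can be pictured as a weighted blend of the two models. The sketch below assumes the blend is a convex combination, which this section does not specify. Only `ml_weight` and the 0.1 step come from the text above; the function names and the one-step rollback rule are hypothetical.

```python
def blended_score(bayes_p: float, ml_p: float, ml_weight: float) -> float:
    """Convex combination of the two model scores (assumed form of the blend)."""
    if not 0.0 <= ml_weight <= 1.0:
        raise ValueError("ml_weight must be in [0, 1]")
    return (1.0 - ml_weight) * bayes_p + ml_weight * ml_p


def next_ml_weight(current: float, accuracy_ok: bool, step: float = 0.1) -> float:
    """Raise the weight by one step per review period, or roll back one step."""
    proposed = current + step if accuracy_ok else current - step
    return round(min(1.0, max(0.0, proposed)), 6)


weight = 0.0
for accuracy_ok in [True, True, False, True]:
    weight = next_ml_weight(weight, accuracy_ok)
print(weight, blended_score(0.6, 0.8, weight))
```

With `ml_weight` at 0 the system behaves exactly as in Stage 1, so the Bayesian model stays available as a fallback at every point in the transition.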
3.3 Core Concepts¶
What is a "Plan"?¶
A plan is the sequence of tool calls an agent intends to make in response to a user's message. At the think step, the LLM generates a structured plan before executing it. A plan consists of a tool sequence (which tools to call and in what order, e.g. "calendar_lookup, then web_search, then compose_response"), an execution mode (immediate, deferred, or interactive per step), and a parameters pattern (the structural shape of tool parameters, not the specific values, which are instance-dependent).
Plans are canonicalised by stripping instance-specific values (user IDs, specific dates, search queries) and retaining the structural signature. Two interactions that both do "calendar_lookup, web_search, compose_response" with different search queries are executing the same canonical plan.
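A minimal sketch of canonicalisation follows, assuming each step is a dict with hypothetical keys `tool`, `mode`, and `params`. Parameter values are replaced by their type names so that only the structural shape contributes to the SHA-256 hash.

```python
import hashlib
import json


def canonicalise(steps: list[dict]) -> dict:
    """Keep tool order, execution mode, and parameter keys/types; drop values."""
    return {
        "tool_sequence": [s["tool"] for s in steps],
        "execution_modes": [s.get("mode", "immediate") for s in steps],
        "parameter_schema": [
            {k: type(v).__name__ for k, v in sorted(s.get("params", {}).items())}
            for s in steps
        ],
    }


def plan_hash(steps: list[dict]) -> str:
    """SHA-256 over a stable JSON encoding of the canonical structure."""
    encoded = json.dumps(canonicalise(steps), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


plan_a = [
    {"tool": "calendar_lookup", "params": {"date": "2026-04-01"}},
    {"tool": "web_search", "params": {"query": "venue reviews"}},
    {"tool": "compose_response"},
]
plan_b = [
    {"tool": "calendar_lookup", "params": {"date": "2026-05-12"}},
    {"tool": "web_search", "params": {"query": "train times"}},
    {"tool": "compose_response"},
]
print(plan_hash(plan_a) == plan_hash(plan_b))
```

The stable encoding (sorted keys, fixed separators) matters: without it, two structurally identical plans could serialise differently and be counted as separate canonical plans.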
What is "Success"?¶
Success is measured on a composite scale, not a binary. The inputs:
| Signal | Source | Weight | Available From |
|---|---|---|---|
| Interaction completed without error | Interaction state | Baseline | Day one |
| All steps succeeded | Step states | Baseline | Day one |
| User gave thumbs up | Feedback event | High | Day one |
| User gave thumbs down | Feedback event | High (negative) | Day one |
| Job resolved successfully | Job state | Medium | When jobs are used |
| User continued the chat | Session activity | Low positive | Day one |
| User abandoned the chat | Session inactivity timeout | Low negative | Day one |
The composite score is a weighted sum normalised to [0, 1]. The weights are configurable per account (with sensible defaults). In Stage 1, the system uses a simplified binary: success if the interaction completed without error and the user did not give a thumbs down; failure otherwise. The composite score is a Stage 3 enhancement.
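Both success definitions can be sketched in a few lines. The Stage 1 rule follows the text directly. The composite version is an assumption throughout: the numeric weights are invented to respect the High/Medium/Low ordering in the table, and min-max normalisation is one possible way to map the weighted sum onto [0, 1].

```python
def stage1_success(completed_without_error: bool, feedback: str) -> bool:
    """Stage 1 binary rule: no error and no thumbs-down."""
    return completed_without_error and feedback != "thumbs_down"


# Hypothetical default weights; only their relative ordering comes from the table.
DEFAULT_WEIGHTS: dict[str, float] = {
    "completed": 1.0,
    "all_steps_succeeded": 1.0,
    "thumbs_up": 3.0,
    "thumbs_down": -3.0,
    "job_resolved": 2.0,
    "chat_continued": 0.5,
    "chat_abandoned": -0.5,
}


def composite_score(signals: dict[str, bool],
                    weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of observed signals, min-max normalised to [0, 1]."""
    raw = sum(w for name, w in weights.items() if signals.get(name))
    lowest = sum(w for w in weights.values() if w < 0)
    highest = sum(w for w in weights.values() if w > 0)
    return (raw - lowest) / (highest - lowest)


print(stage1_success(True, "none"))
print(round(composite_score({"completed": True, "thumbs_up": True}), 3))
```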
What is the "Context"?¶
Plans do not succeed or fail in isolation. The same plan may work well for one task type and poorly for another. The context captures the conditions under which a plan was executed: which agent (or agent template) was running, a lightweight task classification of the user's intent (derived from the think step's reasoning, not a separate classifier), which tools were available to the agent at execution time, and which channel the interaction came through (web, Telegram, email, etc.). Context is used to scope statistics. The system answers "how well does this plan work for this agent type on this kind of task?" rather than "how well does this plan work globally?"
3.4 Architecture¶
The system has three components: the Outcome Collector (listens to execution events and records structured outcomes), the Score Service (provides real-time plan scoring via an internal API), and the Learning Engine (updates statistical models from accumulated outcomes).
Harness (doc 06 Workflow component)
|
+-- writes event.kind = "interaction.completed" --+
+-- writes event.kind = "step.completed" ---------+
|
v
Outcome Collector
(scheduled Convex function)
|
v
execution_outcome table
|
| (at the think step, before tool selection)
|
+-- Score Service (Convex query) <--- plan_score table
| ^
| |
+-- Learning Engine ----------------+
(scheduled Convex function)
Integration with the Harness¶
The system hooks into the Harness at two points.
After execution (passive, event-driven). The Outcome Collector is a Convex scheduled function that reads the event table for kinds interaction.completed, step.completed, job.resolved, and job.failed (see 06 Events, Channels & Messaging for the event model). It runs on a short cadence (every 30 seconds), batching new events into structured execution_outcome rows. It is entirely passive and adds no latency to the execution path. The read is cursor-tracked so the function processes each event exactly once.
Before tool selection (active, synchronous). At the think step, when the agent has generated one or more candidate plans, it can call the Score Service for historical performance data. The Score Service is a Convex query that reads plan_score rows keyed by plan hash and context. The agent's system prompt includes the returned scores as additional context for the LLM's decision. The LLM remains free to ignore the scores.
The scoring call is a plain Convex query, typically served from the reactive query cache with sub-millisecond overhead after the first call. If the Score Service returns no rows (cold start, new plan, scoped context never seen), the agent proceeds without scores. This is a degraded mode, not a failure.
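The cold-start behaviour can be illustrated with a small lookup sketch. `PLAN_SCORES` is a hypothetical in-memory stand-in for plan_score rows, and `score_context` is an invented helper showing how returned scores might be rendered into prompt text; neither is the actual Convex query.

```python
from typing import Optional

# Hypothetical stand-in for plan_score rows, keyed by (plan_hash, scope_key).
PLAN_SCORES: dict[tuple[str, str], dict] = {
    ("abc123", "agent:a1:task:scheduling"): {
        "mean_probability": 0.82,
        "confidence": 0.74,
        "sample_size": 41,
    },
}


def lookup_score(plan_hash: str, scope_key: str) -> Optional[dict]:
    """Return the score row for this plan and scope, or None if never seen."""
    return PLAN_SCORES.get((plan_hash, scope_key))


def score_context(candidates: list[str], scope_key: str) -> str:
    """Render advisory scores for the prompt; empty string on cold start."""
    lines = []
    for plan_hash in candidates:
        row = lookup_score(plan_hash, scope_key)
        if row is not None:
            lines.append(
                f"plan {plan_hash}: p(success)={row['mean_probability']:.2f}, "
                f"confidence={row['confidence']:.2f}, n={row['sample_size']}"
            )
    if not lines:
        return ""  # degraded mode: the agent proceeds without scores
    return "Historical plan performance (advisory):\n" + "\n".join(lines)


print(score_context(["abc123", "zzz999"], "agent:a1:task:scheduling"))
```

Unknown plans are silently omitted rather than reported with a default score, so the LLM is never shown a number that has no observations behind it.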
Event Kinds Consumed¶
The Outcome Collector reads the event table for the following kinds (documented in 06 Events, Channels & Messaging section 3):
| Event kind | Purpose |
|---|---|
| interaction.completed | Capture full interaction outcome, plan structure, cost, duration. |
| step.completed | Capture per-step outcomes for granular analysis. |
| job.resolved | Capture deferred work outcomes. |
| job.failed | Capture deferred work failures. |
These are existing event kinds. The Outcome Collector is a new reader over the shared event table; no new infrastructure is needed.
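The cursor-tracked read performed by the Outcome Collector can be sketched as follows. The sketch assumes each event carries a monotonically increasing integer `seq`, which is a hypothetical field; the real event table may order events differently.

```python
CONSUMED_KINDS = {
    "interaction.completed",
    "step.completed",
    "job.resolved",
    "job.failed",
}


def collect_batch(events: list[dict], cursor: int,
                  limit: int = 100) -> tuple[list[dict], int]:
    """Return (relevant events, new cursor) for events with seq > cursor."""
    pending = sorted(
        (e for e in events if e["seq"] > cursor), key=lambda e: e["seq"]
    )[:limit]
    relevant = [e for e in pending if e["kind"] in CONSUMED_KINDS]
    # Advance past irrelevant events too, so they are not rescanned next run.
    new_cursor = pending[-1]["seq"] if pending else cursor
    return relevant, new_cursor


events = [
    {"seq": 1, "kind": "interaction.completed"},
    {"seq": 2, "kind": "message.received"},
    {"seq": 3, "kind": "step.completed"},
]
batch, cursor = collect_batch(events, cursor=0)
print([e["seq"] for e in batch], cursor)
```

Exactly-once processing depends on the new cursor being persisted in the same transaction that writes the execution_outcome rows; if the two can diverge, the collector degrades to at-least-once and the UNIQUE(interaction_id) constraint becomes the safeguard against duplicates.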
3.5 Data Model¶
All tables live in the database alongside the existing schema. All tenant-scoped tables include account_id with RLS policies enforcing isolation.
canonical_plan¶
Stores the structural signature of each unique plan the system has observed.
| Field | Type | Description |
|---|---|---|
| id | UUID | PK |
| account_id | UUID | FK to account |
| plan_hash | text | SHA-256 of the canonicalised plan structure. Used for fast lookup. |
| tool_sequence | text[] | Ordered array of tool names (e.g. ['calendar_lookup', 'web_search', 'compose_response']) |
| execution_modes | text[] | Parallel array of execution modes per tool (e.g. ['immediate', 'immediate', 'immediate']) |
| parameter_schema | JSONB | Structural shape of parameters (types and keys, not values) |
| step_count | integer | Number of tool calls in the plan |
| first_seen_at | timestamp | When this plan was first observed |
| last_seen_at | timestamp | When this plan was most recently executed |
| execution_count | integer | Total number of times this plan has been executed (denormalised for fast reads) |
| created_at | timestamp | |
Constraints: UNIQUE(account_id, plan_hash) for one canonical record per unique plan structure per account. Indexed on (account_id, plan_hash) for lookup during scoring. RLS policy: account members can read plans from their own account.
execution_outcome¶
Records the outcome of every interaction that involved tool calls.
| Field | Type | Description |
|---|---|---|
| id | UUID | PK |
| account_id | UUID | FK to account |
| interaction_id | UUID | FK to interaction |
| plan_id | UUID | FK to canonical_plan |
| agent_id | UUID | FK to agent |
| agent_template_id | UUID | FK to agent_template (nullable) |
| task_classification | text | Lightweight intent category from the think step |
| channel_type | text | Channel the interaction came through |
| success | boolean | Binary outcome (Stage 1: completed without error and no thumbs-down) |
| composite_score | numeric(4,3) | Weighted outcome score in [0, 1] (Stage 3, nullable until then) |
| feedback | text | thumbs_up, thumbs_down, or none |
| total_cost | numeric(10,6) | Total interaction cost in USD |
| total_duration_ms | integer | Total interaction duration in milliseconds |
| step_count | integer | Number of steps executed |
| steps_succeeded | integer | Number of steps that completed successfully |
| steps_failed | integer | Number of steps that failed |
| metadata | JSONB | Additional context (tool versions, model used, etc.) |
| created_at | timestamp | |
Constraints: UNIQUE(interaction_id) for one outcome record per interaction. Indexed on (account_id, plan_id, created_at) for aggregation queries. Indexed on (account_id, agent_id, task_classification) for context-scoped lookups. RLS policy: account members can read outcomes from their own account; the Learning Engine service role can read across accounts for global prior calculation (with de-identification).
plan_score¶
Stores the current Bayesian posterior for each plan in each context. This is the primary table the Score Service reads from.
| Field | Type | Description |
|---|---|---|
| id | UUID | PK |
| account_id | UUID | FK to account |
| plan_id | UUID | FK to canonical_plan |
| scope_key | text | Context scope identifier (e.g. agent:{id}:task:{classification}) |
| alpha | numeric(10,4) | Beta distribution alpha parameter (successes + prior) |
| beta | numeric(10,4) | Beta distribution beta parameter (failures + prior) |
| mean_probability | numeric(4,3) | alpha / (alpha + beta), precomputed for fast reads |
| confidence | numeric(4,3) | 1 - Var(Beta(alpha, beta)) / Var(Beta(1, 1)); 0 at the uniform prior, approaching 1 as observations accumulate |
| sample_size | integer | Number of observations backing this score |
| mean_cost | numeric(10,6) | Average cost of executions using this plan in this scope |
| mean_duration_ms | integer | Average duration of executions using this plan in this scope |
| last_updated_at | timestamp | When the Learning Engine last recalculated this score |
| created_at | timestamp | |
Constraints: UNIQUE(account_id, plan_id, scope_key) for one score per plan per context scope per account. Indexed on (account_id, scope_key) for Score Service lookups.
Relationship to Existing Tables¶
The execution learning system reads from but does not modify existing tables: interaction (source of interaction state, duration, and session context), step (source of per-step outcomes, costs, and tool call details), job and subjob (source of deferred work outcomes), and agent and agent_template (agent identity and type for context scoping).
The three new tables (canonical_plan, execution_outcome, plan_score) are append-mostly. execution_outcome is write-once (one record per completed interaction). plan_score is updated by the Learning Engine on a schedule. canonical_plan grows as new plan structures are observed.
3.6 Bayesian Scoring Model (Stage 1)¶
Why Bayesian?¶
The system starts with sparse data. A new account might have tens or hundreds of interactions, not millions. Classical ML approaches need large datasets to generalise. A Bayesian approach works from the first observation: it starts with a prior belief, updates it with each outcome, and produces a probability estimate with an explicit measure of confidence.
The Beta-Binomial model is the natural choice for binary outcomes (success/failure). The Beta distribution is the conjugate prior for the Binomial likelihood, which means updates are a simple arithmetic operation, not an optimisation problem.
The Model¶
For each (plan, context scope) pair, maintain a Beta distribution parameterised by alpha and beta:
Prior: Beta(alpha_0, beta_0) where alpha_0 and beta_0 encode the prior belief about success probability.
Update rule: On success, alpha increments by 1. On failure, beta increments by 1.
Posterior mean (probability of success): P = alpha / (alpha + beta).
Confidence: C = 1 - Var(Beta(alpha, beta)) / Var(Beta(1, 1)) where Var(Beta(alpha, beta)) = alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1)). This normalises confidence to [0, 1] where 0 is maximum uncertainty (uniform prior) and 1 approaches certainty.
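The update rule, posterior mean, and confidence formula above can be sketched in a few lines. This is a minimal illustration, not the Learning Engine's code; the class name `BetaScore` is hypothetical, and the uniform prior Beta(1, 1) is used only to keep the example small.

```python
from dataclasses import dataclass

# Var(Beta(1, 1)) = 1/12, the normalising constant in the confidence formula.
UNIFORM_VARIANCE = 1.0 / 12.0


@dataclass
class BetaScore:
    """Hypothetical holder for one (plan, context scope) posterior."""
    alpha: float = 1.0
    beta: float = 1.0

    def update(self, success: bool) -> None:
        # Conjugate update: success bumps alpha, failure bumps beta.
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        total = self.alpha + self.beta
        return (self.alpha * self.beta) / (total ** 2 * (total + 1.0))

    @property
    def confidence(self) -> float:
        return 1.0 - self.variance / UNIFORM_VARIANCE


score = BetaScore()
for outcome in [True, True, True, False]:
    score.update(outcome)
print(round(score.mean, 3), round(score.confidence, 3))
```

Three successes and one failure on top of the uniform prior give Beta(4, 2), so the mean is 4/6 and the confidence is already well above zero after only four observations.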
Hierarchical Priors¶
A brand-new plan in a brand-new account has no observations. Rather than starting from a uniform prior (alpha_0 = 1, beta_0 = 1), the system uses hierarchical smoothing to inherit knowledge from broader scopes:
Level 1 (global): Beta(alpha_global, beta_global)
All outcomes across all accounts (de-identified).
Updated daily by the Learning Engine.
Level 2 (account): Beta(alpha_account, beta_account)
All outcomes within this account, regardless of agent or task.
Updated hourly.
Level 3 (agent): Beta(alpha_agent, beta_agent)
Outcomes for this specific agent within this account.
Updated every 15 minutes.
Level 4 (context): Beta(alpha_context, beta_context)
Outcomes for this agent on this task classification.
Updated on every new outcome.
When scoring a plan at Level 4, if the sample size is below a threshold (default: 10), the prior is pulled from Level 3. If Level 3 is also sparse, it pulls from Level 2, and so on. The blending formula:
effective_alpha = alpha_context + weight * alpha_parent
effective_beta = beta_context + weight * beta_parent
where weight = max(0, 1 - sample_size / threshold). As the context accumulates its own observations, the parent prior's influence fades to zero.
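A small sketch of the blending formula follows. `blend_with_parent` is a direct transcription of the two equations; `blend_chain` is an assumption about how the "and so on" fallback could be applied, folding from the broadest level down to the most specific one. Both function names and the tuple layout are hypothetical.

```python
def blend_with_parent(alpha_context, beta_context, sample_size,
                      alpha_parent, beta_parent, threshold=10):
    """Apply the blending formula for one level and its parent."""
    weight = max(0.0, 1.0 - sample_size / threshold)
    return (alpha_context + weight * alpha_parent,
            beta_context + weight * beta_parent)


def blend_chain(levels, threshold=10):
    """Fold levels ordered broadest first: [(alpha, beta, sample_size), ...].

    Assumption: each level is blended with the already-blended result of the
    level above it, so a sparse parent still carries its own parent's prior.
    """
    alpha, beta, _ = levels[0]
    for level_alpha, level_beta, sample_size in levels[1:]:
        alpha, beta = blend_with_parent(level_alpha, level_beta, sample_size,
                                        alpha, beta, threshold)
    return alpha, beta


# Context has 4 observations, so the parent keeps 60% of its weight.
single = blend_with_parent(3, 1, 4, 40, 10)

# Global -> account (dense) -> agent (sparse) -> context (very sparse).
chained = blend_chain([(80, 20, 98), (40, 10, 48), (5, 1, 4), (2, 1, 1)])
print(single, chained)
```

Once a level reaches the threshold its weight is zero, which is why the dense account level in the example ignores the global prior entirely.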
Scope Keys¶
The scope_key in plan_score encodes the hierarchy level:
| Level | Scope Key Pattern | Example |
|---|---|---|
| Global | global | global |
| Account | account:{account_id} | account:a1b2c3 |
| Agent | account:{account_id}:agent:{agent_id} | account:a1b2c3:agent:d4e5f6 |
| Context | account:{account_id}:agent:{agent_id}:task:{classification} | account:a1b2c3:agent:d4e5f6:task:schedule_meeting |
The Learning Engine maintains scores at all four levels. The Score Service reads the most specific level available and blends with parent levels as described above.
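As an illustration of the key patterns, the helper below builds the four scope keys for one request, ordered from most specific to broadest, which is the order in which a reader would fall back. The function name `scope_keys` is hypothetical.

```python
def scope_keys(account_id: str, agent_id: str, classification: str) -> list:
    """Return scope keys from most specific (context) to broadest (global)."""
    account = f"account:{account_id}"
    agent = f"{account}:agent:{agent_id}"
    context = f"{agent}:task:{classification}"
    return [context, agent, account, "global"]


keys = scope_keys("a1b2c3", "d4e5f6", "schedule_meeting")
for key in keys:
    print(key)
```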
3.7 Score Service¶
Internal API¶
The Score Service is an internal endpoint within the Gateway service (not a separate microservice). It exposes a single method:
POST /internal/plan-scores
Called by the Harness during the think step when the agent has generated candidate plans.
Request:
{
"account_id": "uuid",
"agent_id": "uuid",
"task_classification": "schedule_meeting",
"channel_type": "webchat",
"candidates": [
{
"tool_sequence": ["calendar_lookup", "compose_response"],
"execution_modes": ["immediate", "immediate"]
},
{
"tool_sequence": ["calendar_lookup", "web_search", "compose_response"],
"execution_modes": ["immediate", "immediate", "immediate"]
}
]
}
Response:
{
"scores": [
{
"plan_hash": "sha256...",
"probability": 0.82,
"confidence": 0.65,
"sample_size": 47,
"mean_cost_usd": 0.0034,
"mean_duration_ms": 2100
},
{
"plan_hash": "sha256...",
"probability": 0.71,
"confidence": 0.31,
"sample_size": 12,
"mean_cost_usd": 0.0051,
"mean_duration_ms": 3400
}
],
"source": "bayesian_v1"
}
Performance target: p99 latency under 5 ms. The Score Service is a Convex query over the plan_score table, served from Convex's reactive query cache after the first call. Convex invalidates the cache automatically when the Learning Engine writes new scores; there is no separate cache to manage. Cache warm-up is implicit: the first call loads the row, subsequent calls within the same subscription are free.
Caching via Convex query cache¶
The plan_score rows are read by a Convex query keyed on (accountId, scope_key, planHash). Convex memoises query results for every subscribed caller and re-evaluates only when a relevant write happens. This replaces the Redis cache that existed in the legacy architecture:
- Key: the query arguments (accountId, scope_key, planHash) form the cache key.
- Value: the plan_score row content returned by the query.
- Invalidation: automatic. When the Learning Engine writes a new score, every subscription reading that row re-evaluates on the next microtask.
- Cold start: a row that has never been read is fetched once from the database and cached for subsequent reads.
Agent Integration¶
The Harness injects plan scores into the agent's context at the think step. The agent's system prompt includes a section like:
## Historical Plan Performance
Based on past executions of similar tasks, here is the performance data for
approaches you might consider:
| Approach | Success Rate | Confidence | Avg Cost | Avg Time |
|----------|-------------|------------|----------|----------|
| calendar_lookup -> compose | 82% | High (47 obs) | $0.003 | 2.1s |
| calendar_lookup -> web_search -> compose | 71% | Medium (12 obs) | $0.005 | 3.4s |
Use this data to inform your tool selection, but apply your own judgement.
Low-confidence scores are based on limited observations and may not be reliable.
The agent is explicitly told that scores are advisory. The LLM may choose a lower-scoring plan if it has good reason (e.g. the user asked for something the higher-scoring plan cannot do).
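One way the Harness could turn a Score Service response into a prompt table row is sketched below. The confidence cut-offs (0.6 and 0.3) are assumptions chosen only so that the sample response values map to the labels in the example; the function names are hypothetical, and the sketch prints full tool names rather than the shortened ones shown in the example prompt.

```python
def confidence_label(confidence: float) -> str:
    # Assumed thresholds: the source does not define where High/Medium/Low split.
    if confidence >= 0.6:
        return "High"
    if confidence >= 0.3:
        return "Medium"
    return "Low"


def render_row(tool_sequence, score) -> str:
    """Format one candidate plan and its score as a markdown table row."""
    approach = " -> ".join(tool_sequence)
    return (
        f"| {approach} | {score['probability']:.0%} | "
        f"{confidence_label(score['confidence'])} ({score['sample_size']} obs) | "
        f"${score['mean_cost_usd']:.3f} | {score['mean_duration_ms'] / 1000:.1f}s |"
    )


row = render_row(
    ["calendar_lookup", "compose_response"],
    {"probability": 0.82, "confidence": 0.65, "sample_size": 47,
     "mean_cost_usd": 0.0034, "mean_duration_ms": 2100},
)
print(row)
```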
3.8 Outcome Collector¶
Event Processing¶
The Outcome Collector is a Convex scheduled function (internalMutation invoked by a cron every 30 seconds). It reads the event table from a persisted cursor, processing each new event exactly once. The cursor is stored in a collector_state row keyed on the collector name.
When an interaction.completed event is read:
- Extract the plan. Query the step table for this interaction's act steps. Build the tool sequence and execution modes from the step records.
- Canonicalise. Strip instance-specific parameter values, compute the plan hash.
- Find or create the canonical plan. Look up canonical_plan by (accountId, planHash). If not found, insert a new record.
- Determine success. Check the interaction state (success/failed), look for a feedback event on this interaction (thumbs up/down), and compute the binary outcome.
- Write the outcome. Insert into execution_outcome.
- Trigger scoring update. Write an internal planning.outcome_recorded event so the Learning Engine schedules an incremental update for the affected (plan, scope) rows.
Interactions without tool calls (a simple conversational response with no act steps) have no plan to record. The Outcome Collector skips these.
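The extract and canonicalise steps might look like the sketch below. The step record fields (`type`, `tool`, `mode`, `params`) and the choice of hashing a sorted JSON payload are assumptions for illustration; the source specifies only that parameter values are stripped and a hash is computed.

```python
import hashlib
import json


def canonical_plan_hash(steps):
    """Return a SHA-256 plan hash, or None when there are no act steps."""
    act_steps = [s for s in steps if s["type"] == "act"]
    if not act_steps:
        return None  # purely conversational interaction: nothing to record
    canonical = {
        # Parameter values are deliberately dropped; only structure remains.
        "tool_sequence": [s["tool"] for s in act_steps],
        "execution_modes": [s["mode"] for s in act_steps],
    }
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


monday = [
    {"type": "think"},
    {"type": "act", "tool": "calendar_lookup", "mode": "immediate",
     "params": {"day": "monday"}},
    {"type": "act", "tool": "compose_response", "mode": "immediate",
     "params": {}},
]
friday = [
    {"type": "act", "tool": "calendar_lookup", "mode": "immediate",
     "params": {"day": "friday"}},
    {"type": "act", "tool": "compose_response", "mode": "immediate",
     "params": {}},
]
print(canonical_plan_hash(monday) == canonical_plan_hash(friday))
```

Two interactions that use the same tools in the same order map to the same canonical plan even though their parameters differ, which is what lets outcomes accumulate against one plan_score row.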
User feedback (thumbs up/down) may arrive after the interaction has completed. The Outcome Collector reads feedback.recorded events on the same pass and updates the corresponding execution_outcome record. If feedback arrives after the Learning Engine has already processed the outcome, the Learning Engine picks up the correction on its next pass.
3.9 Learning Engine¶
Update Schedule¶
The Learning Engine runs as a periodic background job:
| Task | Frequency | Scope |
|---|---|---|
| Update Level 4 (context) scores | On every new outcome | Affected plan + context only |
| Update Level 3 (agent) scores | Every 15 minutes | All plans for agents with new outcomes |
| Update Level 2 (account) scores | Every hour | All plans in accounts with new outcomes |
| Update Level 1 (global) scores | Daily | All plans across all accounts (de-identified) |
Level 4 updates are triggered by the planning.outcome_recorded event and execute immediately. Higher-level updates are batched for efficiency.
Score Calculation¶
For each (plan, scope) being updated: query execution_outcome for all outcomes matching this plan and scope since the last update, count successes and failures, apply the update rule (alpha increments by successes, beta increments by failures), recompute mean_probability, confidence, mean_cost, and mean_duration, and write the updated plan_score row. Convex invalidates every subscription reading this row automatically; no explicit cache refresh step is needed.
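The calculation can be sketched as a pure function over one plan_score row and the outcomes recorded since the last update. The outcome field names (`success`, `cost`, `duration_ms`) are hypothetical, and the running-mean treatment of cost and duration is an assumption; the source only says these values are recomputed.

```python
UNIFORM_VARIANCE = 1.0 / 12.0  # Var(Beta(1, 1))


def apply_outcomes(score: dict, outcomes: list) -> dict:
    """Return an updated copy of a plan_score row after folding in outcomes."""
    if not outcomes:
        return dict(score)
    successes = sum(1 for o in outcomes if o["success"])
    failures = len(outcomes) - successes
    alpha = score["alpha"] + successes
    beta = score["beta"] + failures
    old_n = score["sample_size"]
    new_n = old_n + len(outcomes)

    def running_mean(old_mean, field):
        # Weight the stored mean by its sample size, then add the new values.
        return (old_mean * old_n + sum(o[field] for o in outcomes)) / new_n

    total = alpha + beta
    variance = alpha * beta / (total ** 2 * (total + 1))
    return {
        **score,
        "alpha": alpha,
        "beta": beta,
        "mean_probability": alpha / total,
        "confidence": 1.0 - variance / UNIFORM_VARIANCE,
        "sample_size": new_n,
        "mean_cost": running_mean(score["mean_cost"], "cost"),
        "mean_duration_ms": round(
            running_mean(score["mean_duration_ms"], "duration_ms")),
    }


row = {"alpha": 1.0, "beta": 1.0, "sample_size": 0,
       "mean_cost": 0.0, "mean_duration_ms": 0}
updated = apply_outcomes(row, [
    {"success": True, "cost": 0.002, "duration_ms": 2000},
    {"success": True, "cost": 0.004, "duration_ms": 2200},
    {"success": False, "cost": 0.003, "duration_ms": 2100},
])
print(updated)
```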
Score Decay¶
Plans that have not been executed recently should have their confidence decay over time. The platform evolves, tools change, and a plan that worked six months ago may not work today. The Learning Engine applies a decay factor on each scheduled update, where decay_factor = exp(-days_since_last_execution / half_life) and half_life is configurable (default: 90 days). With this exponential form, half_life is strictly an e-folding time: the factor falls to about 0.37, not 0.5, after half_life days. The decay gradually pulls old scores back towards the prior, ensuring stale data does not dominate.
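The source gives the decay factor but not how it is applied to alpha and beta. One form consistent with "pulling old scores back towards the prior" is to shrink the distance between the current parameters and the prior parameters by the factor. The sketch below assumes exactly that; the function name and the shrinkage form are hypothetical.

```python
import math


def decay_towards_prior(alpha, beta, prior_alpha, prior_beta,
                        days_since_last_execution, half_life=90.0):
    """Shrink (alpha, beta) towards the prior by the exponential decay factor.

    Assumed form: the evidence above the prior is multiplied by decay_factor.
    """
    decay_factor = math.exp(-days_since_last_execution / half_life)
    return (prior_alpha + (alpha - prior_alpha) * decay_factor,
            prior_beta + (beta - prior_beta) * decay_factor)


fresh = decay_towards_prior(41.0, 11.0, 1.0, 1.0, days_since_last_execution=0)
stale = decay_towards_prior(41.0, 11.0, 1.0, 1.0, days_since_last_execution=90)
print(fresh, stale)
```

A plan executed today keeps its full evidence, while one idle for 90 days keeps roughly 37% of it, so both the sample weight and the confidence drop without the success rate being forced to change abruptly.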
3.10 Machine Learning Layer (Stage 3)¶
Stage 3 introduces a supervised learning model that runs alongside the Bayesian system. It does not replace it; the two systems produce independent scores that are blended.
Activation Thresholds¶
The ML layer should be activated when an account has accumulated at least 10,000 execution outcomes, at least 50 distinct canonical plans have been observed, and the Bayesian system has been running for at least 3 months. These thresholds can be adjusted. The point is that ML needs enough data to generalise beyond what the Bayesian model already captures.
Feature Engineering¶
The ML model uses features derived from the execution context and plan structure:
| Feature Group | Features | Source |
|---|---|---|
| Plan structure | tool_count, tool_sequence_hash, has_deferred_steps, has_interactive_steps | canonical_plan |
| Tool usage | tool_frequency (per tool), tool co-occurrence pairs | canonical_plan + execution_outcome |
| Context | agent_template_id, task_classification, channel_type, hour_of_day, day_of_week | execution_outcome |
| Historical | bayesian_probability, bayesian_confidence, bayesian_sample_size | plan_score |
| Cost/performance | historical_mean_cost, historical_mean_duration, cost_vs_account_average | plan_score |
The Bayesian scores are included as features for the ML model. This allows the ML model to learn when to trust the Bayesian estimate and when to override it.
Model Choice¶
Gradient-boosted decision trees (XGBoost or LightGBM) are the recommended starting point. They handle mixed feature types, are interpretable via feature importance, train quickly on moderate datasets, and do not require GPU infrastructure. Logistic regression serves as the baseline for comparison.
Training Pipeline¶
Training data consists of all execution_outcome records with their features, labelled by success/failure (Stage 1 binary) or composite_score (Stage 3). Training runs weekly, using all data from the past 6 months (with decay weighting). Validation uses a time-based split (train on older data, validate on recent data) to prevent leakage. Serialised model artefacts are stored in Cloudflare R2, versioned. The trained model is loaded into memory by the Score Service at startup and refreshed when a new version is available.
Score Blending¶
When both Bayesian and ML scores are available, the final score is a weighted blend:
final_score = (1 - ml_weight) * bayesian_score + ml_weight * ml_score
where ml_weight starts at 0 (Bayesian only) and is increased gradually as the ML model proves its accuracy on held-out data. The ML model must demonstrate a statistically significant improvement over the Bayesian baseline before its weight is increased. This is measured by comparing log-loss on a rolling validation window.
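A weighted blend of this kind, together with the log-loss comparison used to gate it, can be sketched as follows. The convex-combination form and the function names are assumptions; the significance test itself is out of scope for the sketch.

```python
import math


def blend_scores(bayesian_score, ml_score, ml_weight):
    """Convex combination of the two scores; ml_weight = 0 means Bayesian only."""
    if not 0.0 <= ml_weight <= 1.0:
        raise ValueError("ml_weight must be in [0, 1]")
    if ml_score is None:  # ML layer not active for this account
        return bayesian_score
    return (1.0 - ml_weight) * bayesian_score + ml_weight * ml_score


def log_loss(labels, probabilities, eps=1e-12):
    """Mean binary cross-entropy; lower is better."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


blended = blend_scores(0.82, 0.90, ml_weight=0.25)
bayes_loss = log_loss([1, 0, 1], [0.6, 0.4, 0.7])
ml_loss = log_loss([1, 0, 1], [0.9, 0.1, 0.8])
print(blended, bayes_loss > ml_loss)
```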
3.11 Governance and Privacy¶
Data Isolation¶
All execution outcomes and plan scores are scoped by account_id with RLS policies. No account can see another account's execution history or plan scores.
The one exception is the global prior (Level 1), which aggregates across accounts. This aggregation uses only plan structure (tool sequence), binary outcome (success/failure), and cost/duration. It does not include user identifiers, message content, tool parameters, agent names, or any account-identifying information. The aggregation is performed by a service-role query that strips account_id before writing to the global prior table.
Opt-in/Opt-out¶
Accounts can opt out of the global learning pool via an account setting. Opting out means the account's outcomes are not included in global prior calculations. The account still benefits from its own account-level, agent-level, and context-level scores. The account still receives global prior estimates (since those are aggregated from other participating accounts). There is no penalty for opting out beyond losing the ability to contribute to (and slightly improve) the global prior.
Auditability¶
Every plan score served to an agent is logged with the interaction ID that requested the score, the candidates submitted, the scores returned, and the source model (bayesian_v1, ml_v1, blended). This allows retrospective analysis of whether scores influenced agent decisions and whether those decisions were better than unscored alternatives.
Data Retention¶
- execution_outcome: retained for 12 months, then archived (compressed, moved to cold storage in R2).
- canonical_plan: retained indefinitely (lightweight, grows slowly).
- plan_score: retained indefinitely (overwritten in place by the Learning Engine).
- Score request logs: retained for 3 months.
Retention periods are configurable per account. Enterprise accounts may require longer retention for compliance.
4. Platform Services & LLM Configuration¶
This section specifies the architecture for external service management, LLM model selection, and how the platform resolves API credentials for every external call.
4.1 Two Billing Modes (Per Service)¶
Every external service the platform uses (LLMs, video, voice, PDF, search, etc.) follows the same pattern:
- Platform-managed (default). Thinklio provides the API key. The account's pre-paid credit balance is debited at actual cost + 2% platform margin. The account sees real USD costs in their usage history.
- Bring Your Own Key (BYOK). The account provides their own API key (stored in the secrets vault; see 07 Security & Governance). No credits are deducted for that service. If their key fails or runs out of credit with the provider, the service call fails.
This is configured per service, not globally. An account could use Thinklio's LLM credits but bring their own Twilio key, or vice versa.
4.2 Data Model¶
Platform Service Registry¶
platform_service stores the app-admin-managed registry of all external services.
| Field | Type | Description |
|---|---|---|
| slug | TEXT UNIQUE | e.g. openrouter, twilio, tavus |
| name | TEXT | Display name |
| category | TEXT | llm, embeddings, video, voice_sms, pdf, search, email, crm, tasks, storage, other |
| description | TEXT | What this service does |
| website_url | TEXT | Link to provider's site |
| credential_type | TEXT | api_key, oauth, none |
| platform_key_ref | TEXT | Vault secret name for Thinklio's own key (NULL if none) |
| platform_key_available | BOOLEAN | Whether Thinklio has a platform key |
| is_active | BOOLEAN | |
Initial services seeded at launch: OpenRouter (LLM, platform key available, default), Anthropic and OpenAI (LLM, no platform key, BYOK only), Voyage AI (embeddings, platform key), Postmark (email, platform key), Tavily (search, platform key), Tavus (video), Twilio (voice/SMS), DocRaptor (PDF), Todoist, HubSpot, Google Calendar, Gmail (integrations), and Cloudflare R2 (storage, platform key).
LLM Model Registry¶
llm_model stores the curated list of available models, managed by app admins.
| Field | Type | Description |
|---|---|---|
| service_slug | TEXT | Which provider (e.g. openrouter) |
| model_id | TEXT | Provider's model ID (e.g. anthropic/claude-opus-4-6) |
| display_name | TEXT | Human-readable name |
| provider_name | TEXT | Model maker (Anthropic, OpenAI, Google, Meta) |
| recommended_tier | TEXT | deep, general, mini, any |
| input_cost_per_million | NUMERIC | USD per million input tokens |
| output_cost_per_million | NUMERIC | USD per million output tokens |
| context_window | INTEGER | Max tokens |
| is_enabled | BOOLEAN | |
| is_default_deep | BOOLEAN | Platform default for deep tier |
| is_default_general | BOOLEAN | Platform default for general tier |
| is_default_mini | BOOLEAN | Platform default for mini tier |
Three tiers: Deep for complex reasoning, multi-step planning, and delegation (default: Claude Opus 4.6). General for most interactions (default: Claude Sonnet 4.6). Mini for lightweight tasks such as summaries, keyword extraction, and simple classification (default: Claude Haiku 4.5).
Account Service Config¶
account_service_config stores per-account, per-service API key overrides.
| Field | Type | Description |
|---|---|---|
| account_id | UUID | |
| service_slug | TEXT | |
| credentials_ref | TEXT | Vault reference (NULL = use platform key) |
| is_active | BOOLEAN | |
| settings | JSONB | Service-specific config |
Account LLM Preferences¶
account_llm_preference stores which model each account uses for each tier.
| Field | Type | Description |
|---|---|---|
| account_id | UUID | |
| tier | TEXT | deep, general, mini |
| model_id | UUID | FK to llm_model |
If no preference is set, platform defaults apply.
Agent LLM Tier¶
Each agent has an llm_tier field (deep, general, mini) defaulting to general. This determines which model tier is used for that agent's interactions.
4.3 Runtime Resolution¶
LLM Call Resolution¶
The platform resolves which model and credential to use for each LLM call in seven steps:
- The agent has an llm_tier (deep/general/mini).
- Look up account_llm_preference for that tier.
- If no preference, use the platform default from the llm_model table.
- The model's service_slug identifies which provider (openrouter/anthropic/openai).
- Look up account_service_config for that service.
- If credentials_ref is set, fetch the key from the vault and use the account's key (no credit deduction).
- If NULL, fetch platform_key_ref from the vault and use the platform key, then deduct credits.
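The seven steps above can be sketched as a single resolver over in-memory lookups. The dictionaries stand in for the account_llm_preference, llm_model, account_service_config, and platform_service tables; the `Resolution` type, the function name, and the vault reference strings are hypothetical, and the vault fetch itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Resolution:
    model_id: str
    service_slug: str
    key_ref: str          # vault reference to fetch at call time
    deduct_credits: bool


def resolve_llm_call(agent_tier, account_prefs, platform_defaults,
                     models, account_configs, services):
    # Steps 1-3: account preference for the agent's tier, else platform default.
    model_id = account_prefs.get(agent_tier) or platform_defaults[agent_tier]
    # Step 4: the model row names its provider.
    service_slug = models[model_id]["service_slug"]
    # Steps 5-6: an account override means BYOK and no credit deduction.
    credentials_ref = account_configs.get(service_slug)
    if credentials_ref is not None:
        return Resolution(model_id, service_slug, credentials_ref, False)
    # Step 7: fall back to the platform key and deduct credits.
    platform_key_ref = services[service_slug]["platform_key_ref"]
    if platform_key_ref is None:
        raise LookupError(f"no credential configured for {service_slug}")
    return Resolution(model_id, service_slug, platform_key_ref, True)


models = {"anthropic/claude-opus-4-6": {"service_slug": "openrouter"}}
defaults = {"deep": "anthropic/claude-opus-4-6"}
services = {"openrouter": {"platform_key_ref": "platform/openrouter"}}

managed = resolve_llm_call("deep", {}, defaults, models, {}, services)
byok = resolve_llm_call("deep", {}, defaults, models,
                        {"openrouter": "acct-1/openrouter"}, services)
print(managed, byok)
```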
Non-LLM Service Resolution¶
- Look up account_service_config for the service slug.
- If credentials_ref is set, fetch from the vault (no credits).
- If NULL and platform_key_available is true, fetch the platform key (deduct credits).
- If NULL and no platform key, the service is unavailable.
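The same decision for non-LLM services is smaller, since there is no tier or model lookup. In this sketch the function name and the returned dictionary shape are hypothetical, and `None` stands for "service unavailable".

```python
def resolve_service(service: dict, account_config):
    """Pick the credential source for one external service call.

    `service` mirrors a platform_service row; `account_config` mirrors an
    account_service_config row, or None when the account has no row.
    """
    credentials_ref = account_config.get("credentials_ref") if account_config else None
    if credentials_ref:
        return {"key_ref": credentials_ref, "deduct_credits": False}
    if service["platform_key_available"]:
        return {"key_ref": service["platform_key_ref"], "deduct_credits": True}
    return None  # neither an own key nor a platform key: unavailable


tavily = {"slug": "tavily", "platform_key_available": True,
          "platform_key_ref": "platform/tavily"}
twilio = {"slug": "twilio", "platform_key_available": False,
          "platform_key_ref": None}

print(resolve_service(tavily, None))
print(resolve_service(twilio, {"credentials_ref": "acct-1/twilio"}))
print(resolve_service(twilio, None))
```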
4.4 Vault and Credential Configuration¶
All API keys, whether platform-supplied or account-supplied (BYOK), resolve through the secrets vault described in 07 Security & Governance. The platform_service row carries a platform_key_ref pointing at a vault entry; the account_service_config row carries a credentials_ref for account-supplied keys. Resolution at turn time is: account override first, then platform default, then a clean error if neither is configured.
Bootstrap credentials for the Convex deployment itself (Convex deploy key, Clerk publishable and secret keys, the vault master encryption key) live as Convex environment variables managed via npx convex env set. These are deployment-bootstrap secrets rather than runtime service credentials, and they are invariant for the life of the deployment. Everything else resolves through the vault so rotation is a data-only operation.
4.5 Credential Security¶
Account API keys are never stored in plaintext in any Convex table. All keys live encrypted in the secrets vault; tables reference them by vault name (credentials_ref or platform_key_ref). The Convex governance middleware resolves these references to usable credentials only at the point of outbound call, under caller-scoped authorisation. OAuth tokens follow the same vault pattern with additional refresh-token handling in the oauth_token table. See 07 Security & Governance for the full vault model and MCP credential scoping.
5. Credit-Based Billing¶
5.1 Credit Ledger¶
credit_ledger records every credit movement.
| Field | Type | Description |
|---|---|---|
| account_id | UUID | |
| type | TEXT | purchase, usage, refund, adjustment |
| amount | NUMERIC(12,6) | Positive = credit added, negative = deducted |
| balance_after | NUMERIC(12,6) | Running balance |
| description | TEXT | Human-readable line item |
| service_slug | TEXT | Which service was used |
| interaction_id | UUID | For LLM usage tracking |
account.credit_balance provides a denormalised current balance for fast reads.
All costs are shown in real USD. No abstract credit units. Users see "$0.003" not "3 credits." The balance is a USD prepaid balance.
5.2 Credit Deduction¶
Deduction is atomic: the balance check, the balance update, and the ledger write happen in a single transaction. If the balance is insufficient, the deduction fails and nothing is written.
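The pattern can be illustrated with an in-memory SQLite database standing in for the real store. This is a sketch under stated assumptions: amounts are held as integer micro-USD (which maps exactly onto six decimal places), the table layouts are trimmed to the fields needed, and the function and exception names are hypothetical.

```python
import sqlite3


class InsufficientCredit(Exception):
    pass


def deduct_credits(conn, account_id, cost_micro_usd, service_slug, description):
    """Check, debit, and write the ledger row in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE account SET credit_balance = credit_balance - ? "
            "WHERE id = ? AND credit_balance >= ?",
            (cost_micro_usd, account_id, cost_micro_usd),
        )
        if cur.rowcount != 1:
            raise InsufficientCredit(account_id)
        balance_after = conn.execute(
            "SELECT credit_balance FROM account WHERE id = ?", (account_id,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO credit_ledger "
            "(account_id, type, amount, balance_after, description, service_slug) "
            "VALUES (?, 'usage', ?, ?, ?, ?)",
            (account_id, -cost_micro_usd, balance_after, description, service_slug),
        )
    return balance_after


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id TEXT PRIMARY KEY, credit_balance INTEGER NOT NULL);
    CREATE TABLE credit_ledger (
        account_id TEXT, type TEXT, amount INTEGER, balance_after INTEGER,
        description TEXT, service_slug TEXT);
    INSERT INTO account VALUES ('acct-1', 24370000);  -- $24.37
""")
remaining = deduct_credits(conn, "acct-1", 3060, "openrouter", "LLM call")
print(remaining)
```

Folding the balance check into the UPDATE's WHERE clause means the check and the debit cannot be separated by a concurrent writer, and the surrounding transaction keeps the ledger row in step with the balance.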
5.3 Platform Config¶
platform_config is a single-row table for global platform settings.
| Field | Type | Description |
|---|---|---|
| status | TEXT | online, maintenance, offline |
| status_reason | TEXT | Shown to users during downtime |
| estimated_return | TIMESTAMPTZ | When service is expected back |
| platform_margin_percent | NUMERIC | Default 2.0% |
5.4 Kill Switch¶
Every API request (except /health and /v1/admin/platform/config) passes through kill switch middleware. If platform_config.status is not online, return HTTP 503 with:
{
"error": {
"code": "platform_unavailable",
"message": "Scheduled maintenance in progress",
"status": "maintenance",
"estimated_return": "2026-03-21T06:00:00Z"
}
}
Status is cached for 10 seconds to avoid database hits on every request.
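A framework-neutral sketch of the middleware check with the 10-second cache is shown below. The class name, the `load_config` callback, and the mapping of status_reason onto the response message are assumptions; a real deployment would wire `check` into its request pipeline.

```python
import time


class KillSwitch:
    """Gate requests on platform_config.status, caching the row briefly."""

    EXEMPT_PATHS = {"/health", "/v1/admin/platform/config"}

    def __init__(self, load_config, ttl_seconds=10.0, clock=time.monotonic):
        self._load_config = load_config  # callable returning the config row
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached = None
        self._loaded_at = 0.0

    def _config(self):
        now = self._clock()
        if self._cached is None or now - self._loaded_at >= self._ttl:
            self._cached = self._load_config()
            self._loaded_at = now
        return self._cached

    def check(self, path):
        """Return None to let the request through, or (503, body) to block it."""
        if path in self.EXEMPT_PATHS:
            return None
        config = self._config()
        if config["status"] == "online":
            return None
        return 503, {
            "error": {
                "code": "platform_unavailable",
                "message": config["status_reason"],  # assumed mapping
                "status": config["status"],
                "estimated_return": config["estimated_return"],
            }
        }


switch = KillSwitch(lambda: {
    "status": "maintenance",
    "status_reason": "Scheduled maintenance in progress",
    "estimated_return": "2026-03-21T06:00:00Z",
})
print(switch.check("/health"), switch.check("/v1/anything"))
```

Passing the clock in as a parameter keeps the cache expiry testable without sleeping.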
6. Platform Administration¶
6.1 App Admin Role¶
user_profile.is_app_admin is a boolean platform-level flag, independent of account roles. App admins can manage platform-wide configuration, services, models, and (in future) account suspension.
6.2 API Endpoints¶
App Admin (requires is_app_admin)¶
GET /v1/admin/platform/config Read platform status
PATCH /v1/admin/platform/config Update kill switch / margin
GET /v1/admin/platform/services List all services
POST /v1/admin/platform/services Add a service
PATCH /v1/admin/platform/services/{slug} Update a service
DELETE /v1/admin/platform/services/{slug} Disable a service
GET /v1/admin/platform/models List all LLM models
POST /v1/admin/platform/models Add a model
PATCH /v1/admin/platform/models/{id} Update model (costs, defaults, enable/disable)
Account Settings (requires account membership)¶
GET /v1/accounts/services?account_id=... List services with own-key status
POST /v1/accounts/services?account_id=... Set/update account API key for a service
GET /v1/accounts/models?account_id=... List account's LLM tier preferences
POST /v1/accounts/models?account_id=... Set model for a tier
GET /v1/accounts/credits?account_id=... Balance + recent ledger entries
Platform Config (any authenticated user)¶
6.3 UI/UX Changes¶
Settings Page -- New Tabs¶
Models Tab (visible to editor and above). Shows three sections: Deep, General, Mini. Each shows the current model (or "Platform default: Claude Sonnet 4.6"). Dropdown to select from the curated list, filtered by recommended_tier. Each model card shows display name, provider, cost per million tokens, and context window. "Reset to default" button per tier.
Services & Keys Tab (visible to admin and owner). List of all platform services grouped by category. Each shows name, category, and a status indicator (green = using own key, blue = using platform key, grey = unavailable). "Add Key" button opens a form: paste API key, stored in the vault, credentials_ref set. "Remove Key" button removes credentials_ref, falling back to the platform key. OAuth services (Google Calendar, Gmail) show "Connect" button instead of key input. Services without a platform key and no own key show as "Not configured."
Subscription Tab (visible to admin and owner). Current balance displayed prominently (e.g. "$24.37 remaining"). "Add Funds" button (links to payment flow, future). Usage breakdown chart: cost by service over time. Recent transactions table from the credit ledger: date, description, service, amount, balance.
App Admin Tab (visible only when is_app_admin = true). Platform Status card with current status, reason field, estimated return picker, and save button. Services management: add/edit/remove services, set platform key references. Models management: add/edit models, set costs, toggle defaults, enable/disable. Accounts list with search/filter and suspend/unsuspend actions (future). Platform Margin setting with current percentage and edit button.
Role Mapping for Tab Visibility¶
| Tab | viewer | editor | admin | owner | app_admin |
|---|---|---|---|---|---|
| Profile | yes | yes | yes | yes | yes |
| Account | yes | yes | yes | yes | yes |
| Models | | yes | yes | yes | yes |
| Services & Keys | | | yes | yes | yes |
| Subscription | | | yes | yes | yes |
| App Admin | | | | | yes |
Agent Studio Changes¶
Add an LLM Tier dropdown to the Agent Studio form (Deep / General / Mini) with a tooltip explaining the cost/capability tradeoff. Default: General.
6.4 Bootstrap vs Steady State¶
| Phase | API Keys Source | Config Source |
|---|---|---|
| Bootstrap (initial deploy) | Env vars (legacy fallback) | Env vars + platform_config table |
| Steady state (target) | Secrets vault only | Vault + platform_config + platform_service |
The transition is gradual and non-breaking. Each service can be migrated independently from environment variable to vault.
7. Implementation Phases¶
Phase 1: Agent Templates and LLM-Native Agents¶
- Create the AgentTemplate structure and deploy it to the template registry.
- Implement system prompt injection from template configuration.
- Deploy Wave 1 agents (Writing, Chat, Coach, Branding, Content) as LLM-native stubs.
- Implement the Coach Agent's library system integration for document-grounded personas.
- Deploy Knowledge Base, HR, Document, and Data agents once the document ingestion system is available.
- Seed initial account knowledge for HR and Onboarding agent testing.
Phase 2: Web Search and Research Agents¶
- Integrate a web search provider (Tavily recommended).
- Implement web_search, web_read_url, and web_read_multiple tool abstractions.
- Deploy Research Agent and Briefing Agent with full web research capability.
- Enhance Writing and Content agents with web-grounded factual lookup.
Phase 3: Task and Calendar Integration¶
- Implement vendor-agnostic tool abstractions for task management and calendar.
- Integrate the first task provider (Todoist or Jira) and calendar provider (Google Calendar).
- Deploy Task Agent, Calendar Agent, and the first coordinator (Personal Assistant).
- Deploy Meeting Agent, Project Coordinator, Onboarding Agent, and Support Triage Agent.
- Implement OAuth consent flow for user-scoped integrations.
- Implement coordinator delegation configuration and cycle detection.
Phase 4: Email and CRM Integration¶
- Implement email tool abstractions and integrate Gmail API.
- Deploy Mail Agent with full email management.
- Integrate CRM provider (HubSpot or Salesforce) and deploy Customer Intelligence Agent.
- Enhance Monitor Agent with full notification delivery across all channels.
- Deploy Finance Agent when financial data integrations are available.
Phase 5: Predictive Planning System¶
- Create canonical_plan, execution_outcome, and plan_score tables with RLS policies.
- Deploy the Outcome Collector worker subscribing to interaction and step completion events.
- Deploy the Learning Engine with Bayesian updates at all four hierarchy levels and score decay.
- Deploy the Score Service as a Convex query over plan_score, backed by Convex's reactive query cache.
- Integrate score injection into the Harness think step.
- Add account settings for opt-in/opt-out of global learning.
- Begin Stage 2 tuning and validation once 1,000+ outcomes accumulate.
Phase 6: Platform Services and Credit Billing¶
- Create platform_service, llm_model, account_service_config, account_llm_preference, credit_ledger, and platform_config tables.
- Seed the platform service registry and LLM model catalogue.
- Implement LLM call resolution (seven-step resolver) and non-LLM service resolution.
- Implement atomic credit deduction with balance check and ledger write.
- Deploy kill switch middleware.
- Build Settings UI: Models tab, Services & Keys tab, Subscription tab, App Admin tab.
- Add LLM Tier dropdown to Agent Studio.
- Migrate platform API keys from environment variables to the secrets vault.
- Deploy BYOK support for accounts with their own API keys.
8. Revision History¶
| Version | Date | Description |
|---|---|---|
| 1.0.0 | April 2026 | Consolidated from old docs 19 (Starter Agent Catalogue v01), 21 (Agent Implementation Logistics v01), 30 (Predictive Planning System v01), and 31 (Platform Services LLM Credits Admin v01). |