Agents Catalogue & Platform Services¶

Document 08 | Version: 1.0.0 | Date: April 2026 | Status: Active

This document describes the starter pack of built-in agents, the integration and tooling infrastructure that powers them, the predictive planning system that helps agents learn from past executions, and the platform services layer that manages LLM models, external service credentials, credit-based billing, and administration.

The agent catalogue defines 23 pre-configured templates organised into four groups (core specialists, coordinators, data and knowledge agents, and organisational specialists). The implementation logistics section maps each agent to its external integration dependencies and establishes a realistic build sequence. The predictive planning system captures execution outcomes, builds Bayesian and (later) ML-based scoring models, and feeds historical performance data back to agents at decision time. The platform services layer manages the external service registry, LLM model selection, per-account API key overrides (BYOK), and a USD-denominated credit ledger.

For the agent architecture, extensibility model, and delegation mechanics, see 03 Agent Architecture & Extensibility. For entity definitions (AgentTemplate, Tool, AgentTool), see 04 Data Model. For credential storage via the Convex-era secrets vault, see 07 Security & Governance. For the durable execution harness that runs every agent interaction, see 06 Events, Channels & Messaging.

Table of Contents

Starter Agent Catalogue
Agent Implementation Logistics
Predictive Planning & Execution Learning
Platform Services & LLM Configuration
Credit-Based Billing
Platform Administration
Implementation Phases
Revision History

1. Starter Agent Catalogue¶

The starter pack consists of 23 built-in platform agents available to all Thinklio customers. Each agent is a pre-configured template that customers can deploy immediately, customise, and compose into larger arrangements using Agent Studio.

The catalogue is organised into four groups:

Group 1: Core Specialists -- standalone agents that do one thing well; the building blocks for coordination.
Group 2: Coordinator Agents -- orchestrate specialists to handle broader, multi-step workflows.
Group 3: Data & Knowledge Agents -- handle structured and unstructured information.
Group 4: Organisational Specialists -- vertical agents suited to specific business functions.

Each entry defines the agent's capabilities, knowledge layer usage, capability level, and suggested settings. Where an agent is Google Workspace or Microsoft 365 compatible, tool assignments are noted as vendor-agnostic by default -- operators configure the actual integration at deployment time.

1.1 Group 1: Core Specialists¶

These agents are purpose-built for a single domain. They are useful standalone and form the reusable building blocks that coordinator agents delegate to.

Mail Agent¶

Tagline: Manages your inbox so you don't have to.

Reads, summarises, triages, and responds to email on behalf of a user or team. Handles routine correspondence autonomously (acknowledging receipt, sending standard responses) and prepares drafts for review on anything that requires a human decision. Works with any email provider via the configured integration.

Capabilities: Read inbox and retrieve messages by sender, subject, date range, or label. Summarise unread mail and flag items requiring action. Classify messages by urgency, topic, and required response type. Create draft replies and new emails (for review or direct send). Send emails with appropriate permission level. Apply labels, archive, or flag messages. Search mail history for prior correspondence. Detect and surface time-sensitive messages (deadlines, meeting requests, approvals).

Knowledge layers: User (personal communication preferences, tone, frequent contacts, recurring topics), Team (shared contacts, project context, standard response patterns), Account (approved communication templates, compliance restrictions on outbound mail content).

Settings:

capability_level:        tools_only
tool_trust_required:     low_risk_write  (drafting and sending)
default_execution_mode:  immediate

Permissions:
  allow:   read, summarise, label, archive, create_draft
  require_approval: send

Suggested per-assignment restrictions:
  Personal:       Full access including send
  Team (shared):  Drafts only -- require human approval before send
  Account-wide:   Read and summarise only
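
The allow / require_approval split in the Mail Agent settings above can be read as a three-way decision per action. The following is a minimal sketch of that reading, not the platform's implementation: the `Permissions` and `Decision` names are hypothetical, and the default-deny rule for unlisted actions is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


@dataclass(frozen=True)
class Permissions:
    """Hypothetical container mirroring the allow / require_approval / deny lists."""
    allow: frozenset = frozenset()
    require_approval: frozenset = frozenset()
    deny: frozenset = frozenset()

    def decide(self, action: str) -> Decision:
        # An explicit deny wins, then the approval gate, then allow.
        if action in self.deny:
            return Decision.DENY
        if action in self.require_approval:
            return Decision.REQUIRE_APPROVAL
        if action in self.allow:
            return Decision.ALLOW
        # Assumption: anything not listed is denied.
        return Decision.DENY


# Values copied from the Mail Agent permissions block.
MAIL_AGENT = Permissions(
    allow=frozenset({"read", "summarise", "label", "archive", "create_draft"}),
    require_approval=frozenset({"send"}),
)
```

Under this sketch, `MAIL_AGENT.decide("send")` yields `Decision.REQUIRE_APPROVAL`, which matches the "Drafts only" posture suggested for shared team assignments.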

Calendar Agent¶

Tagline: Finds time, books meetings, and keeps your schedule coherent.

Manages calendars across individuals and teams. Finds available time, schedules meetings with internal and external attendees, resolves conflicts, and sends invitations. At the read-only trust level it provides scheduling intelligence without making changes -- useful as a delegate within coordinators that need to check availability before committing.

Capabilities: Read calendar events across one or more calendars. Find free time within a given window, applying working-hours preferences. Detect and surface scheduling conflicts. Create, update, and cancel calendar events. Send and manage meeting invitations. Check attendee availability across a team. Suggest optimal meeting times based on preferences and patterns. Manage recurring events.

Knowledge layers: User (working hours, meeting preferences, blocked time patterns, preferred meeting lengths), Team (team members' availability patterns, shared calendars, recurring team rituals), Account (company holidays, meeting norms, blocked account-wide periods).

Settings:

capability_level:        tools_only
tool_trust_required:     low_risk_write  (creating and updating events)
default_execution_mode:  immediate

Permissions:
  allow:   read, find_free_time, check_conflicts
  require_approval: create_event, cancel_event, send_invitation

Suggested per-assignment restrictions:
  Personal:           Full access including create and cancel
  Team delegate:      find_free_time and check_conflicts only
  Read-only context:  read and check_conflicts only
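
One way to picture the per-assignment restrictions above is as a narrowing of the template's permissions. The sketch below assumes restrictions can only remove actions and never lift an approval gate; the scope keys and the `effective_permissions` helper are hypothetical.

```python
# Template-level decisions, copied from the Calendar Agent permissions block.
TEMPLATE_ACTIONS = {
    "read": "allow",
    "find_free_time": "allow",
    "check_conflicts": "allow",
    "create_event": "require_approval",
    "cancel_event": "require_approval",
    "send_invitation": "require_approval",
}

# Hypothetical scope names for the three suggested assignments.
ASSIGNMENT_SCOPES = {
    "personal": set(TEMPLATE_ACTIONS),
    "team_delegate": {"find_free_time", "check_conflicts"},
    "read_only_context": {"read", "check_conflicts"},
}


def effective_permissions(scope: str) -> dict[str, str]:
    """Keep only the template actions that the assignment scope still permits."""
    permitted = ASSIGNMENT_SCOPES[scope]
    return {
        action: decision
        for action, decision in TEMPLATE_ACTIONS.items()
        if action in permitted
    }
```

A coordinator that only needs availability checks would be given the `team_delegate` scope, leaving no path to `create_event` at all.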

Task Agent¶

Tagline: Creates, tracks, and closes tasks without manual overhead.

Manages to-do lists, project tasks, and action items across whatever task management integration is configured. Creates tasks from chat, tracks status, sends reminders, and surfaces overdue or upcoming items. Commonly used as a delegate by coordinator agents that need to capture follow-up actions.

Capabilities: Create tasks with title, description, due date, assignee, and priority. Update task status, due dates, and ownership. Retrieve open, overdue, and upcoming tasks for a user or team. Organise tasks into projects or lists. Set and trigger reminders. Extract and create tasks from unstructured input (e.g. meeting notes, email threads). Produce daily or weekly task summaries. Flag tasks that are blocked or overdue.

Knowledge layers: User (personal workload patterns, preferred task organisation style, recurring task types), Team (shared projects, team members' active tasks, workload distribution), Account (project structure, task categorisation standards).

Settings:

capability_level:        tools_only
tool_trust_required:     low_risk_write
default_execution_mode:  immediate

Permissions:
  allow:   read, create_task, update_task, set_reminder
  require_approval: delete_task, reassign_task

Suggested per-assignment restrictions:
  Personal:         Full access
  Team delegate:    Create and update only; no delete
  Read-only:        Read and summarise only

Research Agent¶

Tagline: Searches, synthesises, and delivers structured briefings on any topic.

Performs multi-step web research on behalf of users and teams. Searches, reads, evaluates sources, and synthesises findings into structured briefings. For longer research tasks it operates in deferred mode, dispatching work and delivering results when ready. One of the most commonly used delegates in coordinator configurations.

Capabilities: Perform targeted web searches on a topic or question. Read and extract content from web pages and documents. Evaluate and compare sources. Synthesise findings into structured briefings (summary, key points, sources). Answer specific factual questions with cited sources. Monitor a topic across multiple sources (when used with the Monitor Agent). Save findings to team or user knowledge layer. Produce comparative analyses.

Knowledge layers: Agent (research methodology, source quality heuristics, output templates), User (topic depth preferences, preferred output format, saved research history), Team (prior research the team has commissioned, related project context).

Settings:

capability_level:        workflow
tool_trust_required:     read  (web and documents only)
default_execution_mode:  deferred  (longer research tasks)
                         immediate  (quick factual lookups)

Permissions:
  allow:   web_search, read_url, read_document, write_knowledge_fact
  deny:    send_email, create_event, external_write_tools

Suggested per-assignment restrictions:
  Personal:       Full access
  Team delegate:  Full access; knowledge writes scoped to team layer only
  Restricted:     Web search and read only; no knowledge writes
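
The Research Agent's settings list two execution modes. How the platform picks between them is not specified here, so the following is only an illustrative heuristic: `ResearchRequest`, its fields, and the search limit are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ResearchRequest:
    question: str
    planned_searches: int   # hypothetical estimate produced by a planner
    needs_synthesis: bool   # True when a structured briefing is requested


def choose_execution_mode(req: ResearchRequest, immediate_search_limit: int = 2) -> str:
    """Return "immediate" for quick factual lookups, "deferred" otherwise."""
    if req.needs_synthesis or req.planned_searches > immediate_search_limit:
        return "deferred"
    return "immediate"
```

A single-search factual question stays immediate, while anything that asks for a synthesised briefing is dispatched and delivered when ready.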

Writing Agent¶

Tagline: Drafts, edits, and refines written content in your voice.

Produces and improves written content across formats -- emails, reports, proposals, summaries, and long-form documents. Adapts tone and style to context and audience, and can learn individual and organisational voice from the knowledge layers. Frequently used as a delegate within coordinator agents that need to produce polished output.

Capabilities: Draft documents, emails, reports, proposals, and summaries from a brief or outline. Edit existing content for clarity, tone, and conciseness. Proofread for grammar, spelling, and consistency. Adapt writing style for audience (executive, technical, customer-facing). Reformat content. Maintain a consistent account voice using account knowledge. Transform unstructured notes or transcripts into polished documents. Produce multiple variants for comparison.

Knowledge layers: Agent (writing conventions, format templates, style guidance), User (personal writing style, preferred tone, frequently used phrases and structures), Account (brand voice, approved terminology, communication templates, style guide).

Settings:

capability_level:        workflow
tool_trust_required:     read  (no external write tools by default)
default_execution_mode:  immediate  (short content)
                         deferred   (long documents)

Permissions:
  allow:   create_document, read_document, web_search (for factual grounding)
  deny:    send_email, external_system_writes

Suggested per-assignment restrictions:
  Personal:           Full access
  Team:               Full access; save outputs to shared drive
  Regulated context:  Drafts only; no direct publish or send

Document Agent¶

Tagline: Reads, answers questions about, and extracts insight from documents.

Reads files (PDFs, Word documents, spreadsheets, slide decks) and answers questions about their content. Can summarise long documents, compare versions, extract structured data, and flag inconsistencies or compliance gaps. Unlike the Writing Agent, which produces content, the Document Agent primarily consumes and analyses it.

Capabilities: Read and summarise documents in any common format. Answer questions about a document's content. Extract specific data points, tables, or lists. Compare two or more documents and highlight differences. Flag sections that are inconsistent, ambiguous, or potentially non-compliant. Index documents into the knowledge layer for ongoing retrieval. Identify action items or commitments. Produce structured summaries (executive summary, key decisions, open questions).

Knowledge layers: User (documents the user has previously asked about, personal context for interpreting ambiguous content), Team (team document library context, related documents for cross-reference), Account (policy documents and reference material used to check compliance).

Settings:

capability_level:        workflow
tool_trust_required:     read
default_execution_mode:  immediate  (short docs)
                         deferred   (large documents or batch processing)

Permissions:
  allow:   read_document, read_url, write_knowledge_fact
  deny:    delete_document, send_email, external_writes

Suggested per-assignment restrictions:
  Personal:           Full access
  Team:               Read and index only; no delete
  Compliance context: Read only; knowledge writes require approval

Chat Agent¶

Tagline: A plain-language interface to everything the account knows.

The Chat Agent is the most direct interface in the catalogue: a conversational window onto the full knowledge stack. It has no tools, no external integrations, and no delegation. What sets it apart from a generic LLM chat is context: it draws on all configured knowledge layers (user, team, and account, including indexed documents) to give answers grounded in this organisation's specifics. The UI is intentionally simple -- a plain chat window -- making it an appropriate first-touch interface for users who do not yet know which specialist agent they need.

Capabilities: Answer questions across any topic using model knowledge and all configured knowledge layers. Surface relevant account context -- policies, procedures, project history, indexed document content -- without the user needing to specify where to look. Help users think through problems, decisions, and ideas conversationally. Summarise, explain, or reframe information. Draft short-form content within the chat. Reason through multi-step problems step by step. Adapt communication style to user preferences. Suggest a more appropriate specialist agent when a request is better handled elsewhere.

Knowledge layers: User (active; communication preferences, conversational history, frequently asked topics, personal context), Team (active; project context, accumulated team knowledge, prior research and decisions), Account (active; policies, procedures, indexed documents, reference material).

Settings:

capability_level:        tools_only
tool_trust_required:     none  (knowledge read only)
default_execution_mode:  immediate

Permissions:
  allow:   model_inference, read_user_knowledge,
           read_team_knowledge, read_account_knowledge
  deny:    all_external_tools, web_search, file_write,
           send_email, create_event, create_task

Suggested per-assignment restrictions:
  Personal:           Full knowledge access across all layers
  Team-shared:        Team and account layers only; no personal user layer
  Restricted context: Account layer only (policy and reference lookup)

The Chat Agent has no tool access by design. If a request requires an external action -- searching the web, sending an email, creating a task -- the user should be directed to the Personal Assistant or the relevant specialist.
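
The per-assignment restrictions for the Chat Agent amount to choosing which knowledge layers are visible. A small sketch of that filter follows; the assignment keys and the shape of a fact (a dict with a `layer` field) are assumptions made for illustration.

```python
# Layers visible under each suggested assignment (hypothetical key names).
LAYERS_BY_ASSIGNMENT = {
    "personal": ("user", "team", "account"),
    "team_shared": ("team", "account"),
    "restricted": ("account",),
}


def visible_facts(facts: list[dict], assignment: str) -> list[dict]:
    """Drop any fact whose layer the assignment does not expose."""
    layers = LAYERS_BY_ASSIGNMENT[assignment]
    return [fact for fact in facts if fact["layer"] in layers]


# Made-up sample facts, one per layer.
SAMPLE_FACTS = [
    {"layer": "user", "text": "Prefers short answers"},
    {"layer": "team", "text": "Project Alpha ships in Q3"},
    {"layer": "account", "text": "Expenses need a receipt"},
]
```

Filtering before retrieval, rather than after generation, keeps personal user-layer content out of the model context entirely in team-shared deployments.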

Coach Agent¶

Tagline: A configurable knowledge companion, trained on your content.

A general-purpose persona agent designed to be given a name, a domain, and a library of relevant content, then deployed as a specialised guide for that domain. Out of the box it has no domain expertise -- it gains its value when an account or user provides documents, sets a persona prompt, and names it. A fitness operator loads exercise science PDFs and calls it "Alex". A compliance team loads regulatory guidance and calls it "ComplianceDesk". The same underlying agent architecture supports both.

The Chat Agent is for open-ended chat across all account knowledge. The Coach Agent is focused: it is positioned in a domain, anchored to a curated library, and speaks with a consistent persona.

The HR Agent and Onboarding Agent in this catalogue are pre-configured instances of the Coach Agent pattern -- each with a fixed domain, default library assignments, and sensible defaults for their context. The Coach Agent is the general template a Thinklio Studio user would reach for when building a domain agent not already covered by the catalogue.

Capabilities: Answer questions grounded in the content of the assigned library, citing specific documents or sections where appropriate. Maintain a consistent persona (name, tone, communication style) configured at deployment. Draw on account knowledge layers for account context alongside library content. Guide a user through a process or topic progressively across a multi-turn chat. Acknowledge the boundaries of its knowledge and redirect to other agents or human contacts when appropriate. Surface summaries, overviews, and key points from library documents on request.

Knowledge layers: User (active; communication preferences, prior topics discussed with this agent), Team (active; team-level context), Account (active; account policies and general reference material), Library (active, primary; one or more Libraries configured at deployment provide the domain-specific knowledge corpus).

The Coach Agent's domain knowledge is delivered through the library system. At deployment, library_assignments in the agent template specifies which libraries the agent draws from and in what priority order. Account-scoped libraries are created by uploading documents via the media API and running the chunk_and_index processor to index them. Platform-scoped libraries (where available) are pre-built corpora for common domains. See 05 Persistence, Storage & Ingestion for the full library architecture.

Settings:

capability_level:        tools_only
tool_trust_required:     none  (knowledge read only)
default_execution_mode:  immediate

Persona configuration (set at deployment or in Studio):
  agent_name:            (e.g. "Alex", "ComplianceDesk", "FitCoach")
  persona_tone:          professional | friendly | coaching | authoritative
  domain_focus:          free text description of the agent's focus area
  system_prompt_addendum: additional instructions for this instance

Library configuration (set at deployment):
  library_assignments:   [{library_id, priority}]

Permissions:
  allow:   model_inference, read_user_knowledge,
           read_team_knowledge, read_account_knowledge,
           read_library
  deny:    all_external_tools, web_search, file_write,
           send_email, create_event, create_task

Suggested per-assignment restrictions:
  Domain expert persona:   Full library access; all knowledge layers
  FAQ / helpdesk persona:  Library access; account layer only; no user layer
  Guided learning persona: Full library access; chat history depth
                           increased for multi-session continuity

Without library content loaded, the Coach Agent falls back to model knowledge and account knowledge layers. It will still function as a conversational agent, but its domain authority depends on a well-populated library.
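
The priority ordering in `library_assignments` and the fallback described above can be sketched as follows. This assumes that a lower `priority` number is consulted first, which the text does not state, and the source labels `account_knowledge` and `model_knowledge` are placeholders rather than platform identifiers.

```python
def ordered_library_ids(library_assignments: list[dict]) -> list[str]:
    """Return library ids in the order they would be consulted."""
    # Assumption: a lower `priority` number means "consult first".
    ranked = sorted(library_assignments, key=lambda a: a["priority"])
    return [a["library_id"] for a in ranked]


def knowledge_sources(library_assignments: list[dict]) -> list[str]:
    """Libraries first (primary), then account knowledge, then model knowledge.

    With no libraries assigned, only the two fallback sources remain.
    """
    return ordered_library_ids(library_assignments) + [
        "account_knowledge",
        "model_knowledge",
    ]
```

With an empty `library_assignments` list the agent still has something to answer from, but nothing domain-specific, which is the situation the paragraph above warns about.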

1.2 Group 2: Coordinator Agents¶

Coordinator agents orchestrate two or more specialists to handle multi-step workflows. They are the "front door" agents that users interact with most, delegating the specifics to the appropriate specialists behind the scenes. All coordinator agents require workflow capability level or higher.

Personal Assistant¶

Tagline: Your intelligent front door -- handles requests, delegates to specialists, follows up.

The Personal Assistant is the generalist coordinator for individual users. It handles the full range of day-to-day requests -- from "what's on my plate today" to "research this topic and draft a briefing" -- by delegating to the appropriate specialist agents. It maintains strong user-level knowledge, learns preferences over time, and synthesises results from multiple delegates into coherent responses.

Capabilities: Triage incoming requests and route them to the appropriate specialist agent. Coordinate multi-step tasks across Mail, Calendar, Task, and Research agents. Provide daily briefings (unread mail summary, today's calendar, open tasks, reminders). Synthesise results from multiple delegate agents into a single response. Track open items and follow up on pending work. Manage the user's context and preferences through the user knowledge layer. Handle general questions and chat directly when no delegation is needed. Escalate to the user when a decision is required before proceeding.

Knowledge layers: User (very active; accumulates preferences, patterns, priorities, and personal context across all interactions), Team (project context, shared priorities, team norms), Account (policies and procedures the PA needs to operate within).

Delegates to: Mail Agent, Calendar Agent, Task Agent, Research Agent, Writing Agent, Document Agent.

Settings:

capability_level:        workflow  (learning recommended)
tool_trust_required:     low_risk_write  (via delegates)
default_execution_mode:  mixed  (immediate for quick lookups; deferred for research tasks)

Permissions:
  allow:   delegate_to_all_assigned_specialists
  require_approval: send_email, create_event, delete_task

Suggested delegation restrictions (per delegate):
  Mail Agent:       drafts only (no autonomous send by default)
  Calendar Agent:   find_free_time and check_conflicts only
                    (user must confirm before creating events)
  Task Agent:       full create and update
  Research Agent:   full access
  Writing Agent:    full access
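
The daily briefing is a fan-out to several delegates followed by a merge into one response. The sketch below shows that shape with `asyncio.gather`; the three delegate functions are stubs standing in for real agent calls, and none of the names come from the platform.

```python
import asyncio


async def mail_summary() -> str:
    return "3 unread, 1 needs a reply"      # stub for the Mail Agent


async def todays_calendar() -> str:
    return "2 meetings, first at 10:00"     # stub for the Calendar Agent


async def open_tasks() -> str:
    raise TimeoutError("task service slow")  # stub simulating a failed delegate


DELEGATES = {"mail": mail_summary, "calendar": todays_calendar, "tasks": open_tasks}


async def daily_briefing(delegates: dict) -> str:
    """Call every delegate concurrently and merge the results into one reply."""
    names = list(delegates)
    results = await asyncio.gather(
        *(delegates[name]() for name in names),
        return_exceptions=True,  # one failing delegate must not sink the briefing
    )
    lines = []
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            lines.append(f"{name}: unavailable")
        else:
            lines.append(f"{name}: {result}")
    return "\n".join(lines)
```

Running `asyncio.run(daily_briefing(DELEGATES))` produces one line per delegate, with the failed one reported as unavailable instead of aborting the whole briefing.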

Meeting Agent¶

Tagline: Prepares agendas, captures notes, and turns meetings into action.

The Meeting Agent handles the full meeting lifecycle -- before, during, and after. Before a meeting it prepares an agenda and briefs attendees. After a meeting it processes notes or transcripts to extract action items, decisions, and follow-up tasks, then sends a summary to attendees. It coordinates Calendar, Mail, Task, and Writing agents to do this seamlessly.

Capabilities: Prepare structured agendas from a meeting brief or prior discussion. Brief the organiser on attendees, context, and prior meeting history. Process meeting notes or transcripts (text input or uploaded file). Extract action items, decisions, open questions, and owners from notes. Create tasks from extracted action items. Draft and send meeting summary emails to attendees. Schedule follow-up meetings when needed. Maintain a searchable history of past meetings and their outcomes in the team knowledge layer.

Knowledge layers: User (personal meeting preferences, recurring attendees, typical agenda formats), Team (project history, team member roles, prior meeting outcomes, running decisions log), Account (meeting norms, approved agenda templates, relevant policies).

Delegates to: Calendar Agent, Mail Agent, Task Agent, Writing Agent.

Settings:

capability_level:        workflow
tool_trust_required:     low_risk_write  (via delegates)
default_execution_mode:  immediate  (preparation and extraction)
                         deferred   (transcript processing for long meetings)

Permissions:
  allow:   read_calendar, create_task, write_knowledge_fact
  require_approval: send_summary_email, create_follow_up_event

Suggested delegation restrictions:
  Mail Agent:     drafts only (require approval before send)
  Calendar Agent: read and check_conflicts only
                  (separate approval for creating follow-up events)
  Task Agent:     full create access for extracted action items
  Writing Agent:  full access for summaries and agendas

Project Coordinator¶

Tagline: Keeps projects moving -- tracks status, surfaces blockers, coordinates the team.

The Project Coordinator is the team-level equivalent of the Personal Assistant. It maintains an overview of a project's status, surfaces blockers and overdue items, coordinates scheduling across the team, and produces status reports. It is deployed at the team level and draws heavily on the team knowledge layer, which accumulates project context over time.

Capabilities: Provide project status summaries on demand and on a schedule. Track task completion across the team and surface overdue or blocked items. Identify and escalate risks and blockers. Schedule project milestones, reviews, and team check-ins. Produce weekly status reports for stakeholders. Coordinate cross-team dependencies (scheduling, communication). Maintain a running decisions log and open questions register in team knowledge. Onboard new team members to project context.

Knowledge layers: Team (very active; accumulates all project decisions, status, context, client details, and team history), Account (project governance policies, reporting templates, escalation procedures), Agent (project management methodology and status report formats).

Delegates to: Calendar Agent, Task Agent, Research Agent, Writing Agent, Mail Agent.

Settings:

capability_level:        workflow
tool_trust_required:     low_risk_write  (via delegates)
default_execution_mode:  mixed

Permissions:
  allow:   read_tasks, create_task, write_knowledge_fact,
           read_calendar, draft_email
  require_approval: send_status_report, create_milestone_event

Suggested delegation restrictions:
  Mail Agent:     drafts only (team lead approves before send)
  Calendar Agent: create_event allowed (for internal team scheduling)
  Task Agent:     full access
  Writing Agent:  full access for reports; drafts only for external comms
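
Surfacing overdue and blocked items is a simple filter once tasks are available in a structured form. A minimal sketch follows, assuming a hypothetical `Task` record with a due date and a status of `open`, `blocked`, or `done`.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Task:
    title: str
    due: date
    status: str  # assumed values: "open", "blocked", "done"


def needs_attention(tasks: list[Task], today: date) -> dict[str, list[str]]:
    """Group task titles into overdue and blocked for a status summary."""
    overdue = [t.title for t in tasks if t.status != "done" and t.due < today]
    blocked = [t.title for t in tasks if t.status == "blocked"]
    return {"overdue": overdue, "blocked": blocked}


# Made-up sample data.
SAMPLE_TASKS = [
    Task("Draft spec", date(2026, 4, 1), "done"),
    Task("Vendor review", date(2026, 4, 3), "blocked"),
    Task("Budget sign-off", date(2026, 4, 20), "open"),
]
```

A task can appear in both lists, as "Vendor review" does when the report date is after 3 April, which is usually what a project lead wants to see.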

Briefing Agent¶

Tagline: Knows who you're meeting before you walk in the room.

The Briefing Agent compiles structured pre-meeting briefings from publicly available sources. Given a name, organisation, or meeting invite, it researches the relevant individuals and produces a formatted one-pager (or multi-person brief for group meetings) covering professional background, current role, qualifications, and inferred conversational preferences. A custom brief -- what you're there to discuss, what you're pitching, what outcome you need -- focuses the output on what's most relevant to your specific meeting.

Capabilities: Research individuals from public sources (company websites, LinkedIn, news, published interviews, academic profiles, public filings). Research companies from public sources (about pages, annual reports, press releases, news coverage, board and executive listings). Compile individual profiles: name, photo URL, current title and employer, career history summary, qualifications and credentials, notable work or publications, known interests and positions. Infer conversational preferences from public signals: topics they frequently speak or write about, known causes or affiliations, public positions, things notably absent from their public profile. Handle group briefings: company overview plus a profile card per named individual (board, executive team, or meeting attendees). Accept a custom brief describing the meeting purpose, what is being pitched or discussed, and any specific angles to prioritise, and weight the output accordingly. Flag confidence level for inferred preferences (distinguishing "stated publicly" from "inferred from patterns"). Note when information could not be found rather than speculating. Output as a formatted document (PDF or HTML recommended for photo embedding; markdown for plain text environments). Save briefings to user or team knowledge layer for reuse.

Knowledge layers: User (prior briefings prepared for this user; meeting context preferences and output format preferences), Team (prior research on contacts and organisations the team has met; relationship history), Account (not used -- this agent works from public sources, not internal knowledge).

Delegates to: Research Agent (for web research and source synthesis), Writing Agent (for formatting and polishing the final output), Document Agent (for reading any uploaded background material the user provides).

Input parameters (custom brief prompt):

The briefing request should include as much of the following as possible:

Who you are meeting:   [Name / Organisation / LinkedIn URL / meeting invite]
Meeting purpose:       [What the meeting is for -- pitch, catch-up, negotiation, interview]
Your context:          [Who you are, what you're offering or seeking]
Focus areas:           [Specific topics, angles, or questions to prioritise]
Output format:         [One-pager / multi-person group brief / executive summary]
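
The five inputs above map naturally onto a small request object. The sketch below is one possible shape, not a defined schema: the field names and the `one-pager` default are assumptions. Its `missing()` helper lets the agent ask for whatever was left out before starting research.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class BriefingRequest:
    who: str                                # name, organisation, URL, or invite
    meeting_purpose: Optional[str] = None   # pitch, catch-up, negotiation, interview
    your_context: Optional[str] = None      # who you are, what you offer or seek
    focus_areas: Optional[str] = None       # topics or angles to prioritise
    output_format: str = "one-pager"        # assumed default

    def missing(self) -> list[str]:
        """Names of fields left empty, in declaration order."""
        return [f.name for f in fields(self) if getattr(self, f.name) in (None, "")]
```

Only `who` is required in this sketch, reflecting the instruction to supply "as much of the following as possible" rather than all of it.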

Settings:

capability_level:        workflow
tool_trust_required:     read  (web research and document read only)
default_execution_mode:  deferred  (research takes time; deliver when ready)

Permissions:
  allow:   web_search, read_url, read_document (uploaded background material),
           create_document, write_knowledge_fact
  deny:    read_internal_crm_without_approval, send_email,
           access_private_data

Privacy note: This agent uses publicly available information only.
              It does not access private records, internal systems,
              or data the subject has not made public. Inferred
              preferences are flagged as inference, not fact.

Output format guidance:
  Plain text / markdown:  Suitable for Telegram delivery or quick reads
  PDF / HTML:             Required for photo embedding and formatted layout
                          (configure the Writing Agent or Document Agent
                          as delegate for formatted output)

Suggested per-assignment restrictions:
  Personal:             Full access; save to user knowledge
  Team:                 Full access; save to team knowledge
  Restricted context:   Deliver briefing only; no knowledge writes

1.3 Group 3: Data & Knowledge Agents¶

These agents handle structured and unstructured information -- reading data, managing knowledge, and producing formatted reports.

Data Agent¶

Tagline: Reads your data and tells you what it means.

The Data Agent works with structured data -- spreadsheets, CSV exports, database query results -- to answer questions, surface trends, and produce summaries. It does not replace a full analytics platform but handles the common case of "I have this spreadsheet, tell me what's going on." Results can be presented as prose summaries, tables, or chart descriptions.

Capabilities: Read and parse spreadsheets, CSV files, and tabular data. Answer natural-language questions about data content. Compute aggregations (totals, averages, counts, percentage changes). Identify trends, anomalies, and outliers. Compare datasets across time periods or categories. Produce plain-language summaries of findings. Generate structured data tables from prose or unstructured sources. Flag data quality issues (missing values, inconsistencies, duplicates).

Knowledge layers: User (preferred analysis framing, recurring reports, data interpretation preferences), Team (data context -- what the numbers mean in this team's work), Account (data definitions, calculation methodologies, reporting standards).

Settings:

capability_level:        workflow
tool_trust_required:     read  (read-only data access by default)
default_execution_mode:  immediate  (small datasets)
                         deferred   (large files or complex analyses)

Permissions:
  allow:   read_file, read_spreadsheet, create_document (for reports)
  deny:    write_spreadsheet, delete_file (by default)
  require_approval: write_spreadsheet  (replaces the default deny when data correction is in scope)

Suggested per-assignment restrictions:
  Personal:           Full read; write on request
  Team:               Read only
  Finance context:    Read only; all outputs require review before distribution
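
The aggregation and data-quality capabilities listed for the Data Agent can be illustrated with the standard library alone. This is a sketch under simple assumptions: the input is CSV text with a header row, every non-empty cell in the chosen column is numeric, and the `profile_csv` name is hypothetical.

```python
import csv
import io
from collections import Counter


def profile_csv(text: str, numeric_column: str) -> dict:
    """Compute basic aggregates and flag missing values and duplicate rows."""
    rows = list(csv.DictReader(io.StringIO(text)))
    values, missing = [], 0
    for row in rows:
        raw = (row.get(numeric_column) or "").strip()
        if raw == "":
            missing += 1          # empty cell counts as a missing value
        else:
            values.append(float(raw))
    # Rows that are identical in every column count as duplicates.
    counts = Counter(tuple(row.items()) for row in rows)
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    return {
        "rows": len(rows),
        "total": sum(values),
        "average": sum(values) / len(values) if values else None,
        "missing": missing,
        "duplicate_rows": duplicates,
    }


# Made-up sample with one duplicate row and one missing value.
SAMPLE_CSV = "region,revenue\nnorth,100\nsouth,300\nsouth,300\neast,\n"
```

Reporting the quality flags alongside the totals matters because the duplicated `south` row inflates the total, which a prose summary alone would hide.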

Knowledge Base Agent¶

Tagline: The team's institutional memory -- answers questions from what the account knows.

The Knowledge Base Agent is the primary interface for querying the team and account knowledge layers. It answers questions by drawing on indexed knowledge facts, documents, and prior interaction history. Unlike the Research Agent (which searches the web), the Knowledge Base Agent works from what the account already knows. It is particularly valuable for onboarding, policy lookups, and maintaining institutional continuity.

Capabilities: Answer questions by retrieving and synthesising from team and account knowledge layers. Indicate confidence and cite the source fact or document for every answer. Flag when knowledge is stale, conflicting, or absent (and suggest an update). Accept and index new knowledge contributions from users. Provide a summary of what the team knows about a given topic, client, or project. Surface related knowledge when answering a question. Suggest knowledge gaps based on patterns of unanswered questions.

Knowledge layers: Account (primary; policies, procedures, reference material), Team (primary; accumulated project and client context), Agent (retrieval methodology, confidence scoring, knowledge gap patterns).

Settings:

capability_level:        tools_only
tool_trust_required:     read  (knowledge read)
                         low_risk_write  (for accepting new contributions)
default_execution_mode:  immediate

Permissions:
  allow:   read_knowledge, write_knowledge_fact (contributions only)
  require_approval: modify_knowledge_fact, delete_knowledge_fact
  deny:    web_search, external_writes

Suggested per-assignment restrictions:
  Team member:      Full read; write contributions allowed
  External-facing:  Read only; account layer only (no team layer exposed)
  Admin:            Full read and write including modifications

Report Writer Agent¶

Tagline: Data in, formatted report out.

The Report Writer takes structured or semi-structured input -- database query results, CSV exports, JSON data, research findings, or any combination -- and produces a complete, formatted report document. It is a structured output agent: its purpose is not to discuss data or surface findings conversationally, but to reason about what narrative the data supports and render that into a publication-ready document. It is distinct from the Writing Agent (which produces general prose from a brief) and the Data Agent (which surfaces analysis in plain text). The Report Writer's primary output formats at launch are HTML/CSS for PDF rendering via DocRaptor, and Markdown for lightweight or version-controlled delivery. Google Docs and Microsoft Word are planned for a future release.

Capabilities: Accept structured data inputs: CSV, JSON, spreadsheet exports, database query results, or pasted tabular data. Accept an optional report brief specifying audience, purpose, key questions, required sections, and output format. Reason about the data: identify the principal narrative, surface significant trends, flag anomalies, and determine what to foreground versus background. Select an appropriate report structure for the audience (executive summary, full analytical report, operational digest). Write narrative prose sections, callout figures, section summaries, and a concluding interpretation. Produce formatted data tables and annotated figures within the document. Apply account-level report templates, style guidelines, and branding from the knowledge layer. Output as styled HTML/CSS for PDF rendering via DocRaptor. Output as Markdown for lightweight or version-controlled delivery. (Planned) Output to Google Docs or Microsoft Word via configured integration. Produce multi-section reports with consistent structure and internal cross-references.

Knowledge layers: Account (primary; report templates, style guide, branding, standard section structures, approved table and figure styles), Team (recurring report formats, standard KPI definitions, known data source context, audience preferences), User (preferred output format, personal report style preferences, frequently reported data types), Agent (data-to-narrative reasoning methodology, format selection heuristics, output quality standards).

Delegates to: Data Agent (for analytical depth on complex or large datasets), Writing Agent (for polishing narrative prose sections).

Settings:

capability_level:        workflow
tool_trust_required:     read  (data sources and knowledge)
                         low_risk_write  (document creation)
default_execution_mode:  immediate  (short reports, simple data)
                         deferred   (multi-section reports, large datasets)

Permissions:
  allow:   read_file, read_data, read_knowledge,
           create_document
  deny:    send_email, modify_source_data,
           external_system_writes

Output format permissions (configure at deployment):
  html_css:       enabled  (rendered to PDF via DocRaptor)
  markdown:       enabled
  google_docs:    disabled  (planned)
  word_docx:      disabled  (planned)

Suggested per-assignment restrictions:
  Personal:           Full access; any configured output format
  Team:               Full access; outputs saved to shared drive
  Regulated context:  Draft output only; review required
                      before distribution
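
The output format permissions above act as a per-deployment switch. A minimal sketch of how a requested format might be validated against them, assuming a plain dictionary mirrors the block above; `OUTPUT_FORMATS`, `select_output_format`, and the choice of exception types are hypothetical.

```python
# Mirrors the launch defaults listed above; operators would override per deployment.
OUTPUT_FORMATS = {
    "html_css": True,      # rendered to PDF via DocRaptor
    "markdown": True,
    "google_docs": False,  # planned
    "word_docx": False,    # planned
}


def select_output_format(requested: str, formats: dict = OUTPUT_FORMATS) -> str:
    """Return the requested format if it is known and enabled, otherwise raise."""
    if requested not in formats:
        raise ValueError(f"unknown output format: {requested}")
    if not formats[requested]:
        raise PermissionError(f"output format is disabled for this deployment: {requested}")
    return requested
```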

1.4 Group 4: Organisational Specialists¶

These agents are suited to specific business functions. They operate at team or account scope and typically draw heavily on the account knowledge layer.

HR Agent¶

Tagline: Answers HR questions, guides processes, and connects people to the right resources.

The HR Agent is a first-line resource for employee questions about policies, entitlements, processes, and procedures. It draws on the account knowledge layer (which HR admins maintain) to give accurate, policy-grounded responses. It does not make HR decisions -- it interprets policy and routes complex or sensitive cases to a human. It is deliberately conservative with write access.

The HR Agent is a pre-configured instance of the Coach Agent pattern, with a fixed domain (HR policy and process), default library assignments for HR documentation, and appropriate permission restrictions. See the Coach Agent entry in Group 1 for the underlying architecture.

Capabilities: Answer questions about leave entitlements, policies, and procedures. Guide employees through standard HR processes (onboarding, offboarding, performance reviews, leave requests). Explain benefits, remuneration, and entitlement policies. Help employees find the right form, document, or point of contact. Draft standard HR communications (offer letters, policy acknowledgements) from templates. Escalate sensitive or complex cases to an HR team member. Log queries for HR team visibility (anonymised where appropriate).

Knowledge layers: Account (primary; HR policies, procedure guides, entitlement schedules, approved templates), User (minimal; session context only; no personal HR data stored in user knowledge layer).

Settings:

capability_level:        tools_only
tool_trust_required:     read  (policy retrieval)
                         low_risk_write  (drafting from templates only)
default_execution_mode:  immediate

Permissions:
  allow:   read_knowledge, create_document (from templates only)
  require_approval: send_hr_communication, modify_hr_record
  deny:    read_personal_records_without_approval, web_search

Important: User knowledge writes are disabled. This agent does not
           accumulate personal data about individual employees.

Suggested per-assignment restrictions:
  Employee self-service:   Policy read and process guidance only
  HR admin:                Full access including template drafting
  Manager:                 Read policies; draft communications with approval

Finance Agent¶

Tagline: Tracks spending, categorises expenses, and produces financial summaries.

The Finance Agent helps individuals and teams manage expense tracking, budget monitoring, and financial reporting. It reads financial data from connected sources, categorises transactions, flags anomalies, and produces structured summaries. It operates in read-only mode by default; any write actions (updating records, submitting claims) require explicit approval.

Capabilities: Read expense records, bank statements, and budget data from connected sources. Categorise transactions against a chart of accounts or expense policy. Compare actual spend against budget by period, category, or team. Flag anomalies, duplicate transactions, and policy breaches. Produce expense summaries and reports for a given period. Draft expense claims or reimbursement requests. Reconcile receipts against expense records. Provide budget utilisation summaries for managers.

Knowledge layers: User (personal expense patterns, recurring expense categories, preferred report format), Team (team budget allocations, project cost codes, shared expense policies), Account (chart of accounts, expense policy, approval thresholds, finance calendar).

Settings:

capability_level:        workflow
tool_trust_required:     read  (default -- financial data read only)
                         high_risk_write  (for any submission or payment action)
default_execution_mode:  immediate  (summaries)
                         deferred   (reconciliation runs, period-end reports)

Permissions:
  allow:   read_financial_data, create_document (for reports)
  require_approval: submit_expense_claim, update_financial_record
  deny:    initiate_payment, delete_financial_record

Suggested per-assignment restrictions:
  Personal:           Full read; draft claims (with approval to submit)
  Team manager:       Full read across team; summary reports only
  Finance team:       Full access including submission workflows
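
Two of the capabilities listed above, comparing actual spend against budget and flagging duplicate transactions, reduce to simple aggregations. The sketch below illustrates them under stated assumptions: the `Transaction` shape is hypothetical, a duplicate is taken to mean an exact match on date, merchant, category, and amount, and every budget is assumed to be non-zero.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    day: date
    merchant: str
    category: str
    amount: Decimal


def spend_vs_budget(transactions, budgets):
    """Return {category: (spent, budget, utilisation)} for each budgeted category."""
    spent = defaultdict(Decimal)  # Decimal() is zero, so unseen categories start at 0
    for tx in transactions:
        spent[tx.category] += tx.amount
    return {
        category: (spent[category], budget, spent[category] / budget)
        for category, budget in budgets.items()
    }


def possible_duplicates(transactions):
    """Return transactions that appear more than once with identical fields."""
    counts = Counter(transactions)
    return [tx for tx, n in counts.items() if n > 1]
```

Using `Decimal` rather than floats avoids rounding drift in monetary totals. A real deployment would likely need fuzzier duplicate matching (for example, near-identical merchant names), which this sketch does not attempt.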

Support Triage Agent¶

Tagline: Receives requests, classifies them, routes them, and keeps requestors informed.

The Support Triage Agent is designed for teams that handle a regular volume of inbound requests -- internal IT support, customer service, facilities, or any other service function. It classifies incoming requests, routes them to the right queue or person, creates tickets, and keeps requestors updated on status. It can handle first-line responses autonomously and escalates when a human is needed.

Capabilities: Receive and acknowledge inbound requests via any configured channel. Classify requests by type, urgency, and required skill. Route requests to the appropriate team member, queue, or escalation path. Create tickets in the configured task or ticketing system. Draft first-line responses from a knowledge base (FAQ, known issue library). Resolve simple requests without escalation (knowledge-based answers). Update requestors on ticket status and estimated resolution. Flag SLA breaches and overdue tickets. Produce support queue summaries and trend reports for team leads.

Knowledge layers: Account (service catalogue, routing rules, escalation policies, approved response templates), Team (team members' specialisations and availability, known issue backlog, resolution history), Agent (classification models, routing heuristics, SLA thresholds).

Settings:

capability_level:        workflow
tool_trust_required:     low_risk_write  (ticket creation, status updates)
default_execution_mode:  immediate  (triage and first response)
                         deferred   (human handoff jobs)

Permissions:
  allow:   read_inbound, create_ticket, update_ticket,
           send_acknowledgement, read_knowledge
  require_approval: close_ticket, send_resolution_response,
                    escalate_to_external_team
  deny:    access_personal_customer_data_without_approval

Suggested per-assignment restrictions:
  External-facing:    Acknowledgement and status updates only;
                      all substantive responses require approval
  Internal IT:        Full triage and first-line resolution
  Customer service:   Full triage; escalation requires team lead approval

Content Agent¶

Tagline: Produces on-brand content across channels -- social, blog, email, and more.

The Content Agent drafts, adapts, and manages content for marketing and communications. It works from the account knowledge layer's brand guidelines and voice documentation to produce content that stays on-message across channels. It can adapt a single piece of source material into formats appropriate for different audiences and platforms. It is typically deployed at team or account level rather than for personal use.

Capabilities: Draft social media posts (LinkedIn, X/Twitter, and others) from a brief or source material. Write blog posts and articles from an outline or research brief. Produce email newsletters and campaign copy. Adapt existing content for different audiences, channels, or tones. Maintain brand voice consistency using account-level brand guidelines. Suggest content angles and topics based on a given theme or objective. Produce multiple variants of copy for testing. Check content against account style guide and flag deviations.

Knowledge layers: Account (primary; brand voice, style guide, approved terminology, content templates, past published content), Team (campaign context, product messaging, target audiences), Agent (channel-specific best practices and format guidance).

Settings:

capability_level:        workflow
tool_trust_required:     read  (brand and reference material)
                         low_risk_write  (drafts and document creation)
default_execution_mode:  immediate

Permissions:
  allow:   read_knowledge, create_document, web_search
           (for factual research and competitive awareness)
  require_approval: publish_content, send_campaign
  deny:    post_to_social_directly (drafts only until approved)

Suggested per-assignment restrictions:
  Copywriter:         Full draft access; publish requires approval
  Marketing manager:  Full access including publish approval
  Agency/contractor:  Draft only; all output reviewed before use

Customer Intelligence Agent¶

Tagline: Knows your customers -- briefs you before calls, logs interactions, surfaces opportunities.

The Customer Intelligence Agent is the interface between the team and its CRM data. It prepares meeting briefs (who you're meeting, what they've bought, what's outstanding), logs interaction notes back to the CRM, flags follow-up opportunities, and produces account summaries. It connects to any CRM via the configured external tool integration.

Capabilities: Retrieve and summarise customer and account records from the CRM. Prepare pre-meeting briefs (contact history, outstanding issues, prior chats). Log meeting notes and interaction summaries back to the CRM. Flag accounts with open opportunities, upcoming renewals, or overdue follow-ups. Produce account health summaries for a portfolio or territory. Answer questions about a specific account's history, purchases, or status. Surface upsell or cross-sell signals based on account data. Sync meeting outcomes to the CRM automatically.

Knowledge layers: Team (sales process, account ownership, relationship context, deal history), Account (CRM data schema, sales methodology, account tiers and policies), User (personal relationship notes, private context about specific contacts).

Settings:

capability_level:        workflow
tool_trust_required:     read  (CRM read, default)
                         low_risk_write  (logging notes back)
default_execution_mode:  immediate

Permissions:
  allow:   read_crm, write_interaction_log, create_task
  require_approval: update_account_record, create_opportunity,
                    delete_contact
  deny:    export_customer_data_without_approval

Suggested per-assignment restrictions:
  Sales rep:          Full read; log notes; create tasks
  Manager:            Full read across team; no write
  External reviewer:  Anonymised summaries only

Onboarding Agent¶

Tagline: Gets new team members up to speed quickly and without burdening the rest of the team.

The Onboarding Agent guides new employees through their first days and weeks. It answers questions, surfaces relevant policies and procedures, assigns onboarding tasks, tracks progress, and escalates blockers. It draws heavily on the account knowledge layer, which administrators maintain with onboarding content. It reduces the time senior team members spend on routine orientation activities.

Like the HR Agent, the Onboarding Agent is a pre-configured instance of the Coach Agent pattern, with a fixed domain (employee onboarding), default library assignments for onboarding content, and task management capabilities enabled.

Capabilities: Welcome new starters and guide them through a structured onboarding sequence. Answer questions about policies, benefits, tools, and procedures. Assign and track onboarding tasks (reading, account setup, introductions). Surface relevant documentation contextually (e.g. "now that you've completed payroll setup, here's the expense policy"). Introduce the new starter to relevant team members and suggest intro meetings. Flag onboarding blockers (e.g. access not granted, task not completed after N days). Produce onboarding progress reports for HR or team leads. Capture feedback on the onboarding experience for continuous improvement.

Knowledge layers: Account (onboarding sequences, policies, procedures, tools documentation, team structure), Team (team-specific onboarding steps, project context, who does what), User (the new starter's progress, questions asked, tasks completed).

Settings:

capability_level:        workflow
tool_trust_required:     read  (knowledge retrieval)
                         low_risk_write  (task assignment and progress tracking)
default_execution_mode:  immediate

Permissions:
  allow:   read_knowledge, create_task, update_task,
           read_calendar (for scheduling intro meetings)
  require_approval: send_external_communication,
                    modify_onboarding_sequence
  deny:    read_payroll_data, read_personal_hr_records

Suggested per-assignment restrictions:
  New starter:        Full access for self-guided onboarding
  HR admin:           Full access including sequence modification
  Manager:            Progress read only; escalation alerts

Monitor Agent¶

Tagline: Watches for conditions you define and alerts you when they're met.

The Monitor Agent runs in the background, watching for conditions across connected systems, inbound data, and job results. When a condition is met -- a budget threshold crossed, a job completing, an inbound message matching a pattern, a metric exceeding a limit -- it triggers an alert or a follow-up action. It is most useful at team and account level for operational awareness and for closing the loop on deferred work.

Capabilities: Monitor job completion and deliver results to a specified agent or user. Watch for threshold conditions on budget, usage, error rates, or custom metrics. Monitor inbound channels for messages matching specified patterns or keywords. Trigger alert notifications via any configured channel (email, Telegram, web chat). Escalate alerts that have not been acknowledged within a configured time window. Produce periodic status digests (hourly, daily, weekly) summarising monitored conditions. Register as a job observer on behalf of another agent or user. Maintain a log of all triggered alerts and their acknowledgement status.

Knowledge layers: Team (monitored conditions, alert preferences, escalation paths, prior alert history), Account (alert policies, escalation procedures, SLA thresholds).

Settings:

capability_level:        tools_only
tool_trust_required:     read  (read-only access to monitored systems)
                         low_risk_write  (sending alert notifications)
default_execution_mode:  deferred  (monitors are inherently background processes)

Permissions:
  allow:   read_jobs, read_metrics, register_job_observer,
           send_notification, read_budget
  require_approval: trigger_external_action, modify_alert_thresholds
  deny:    modify_monitored_systems, delete_records

Suggested per-assignment restrictions:
  Team operational:   Standard alert thresholds; notify team channel
  Account admin:      Full access to all account-level metrics and jobs
  Read-only context:  Notification delivery only; no escalation actions
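
The threshold and escalation behaviour described above can be sketched in a few lines. This is an illustration only: the `Alert` shape, the function names, and the rule that a condition fires when a value strictly exceeds its limit are assumptions rather than the platform's actual model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Alert:
    metric: str
    value: float
    triggered_at: datetime
    acknowledged_at: Optional[datetime] = None
    escalated: bool = False


def check_threshold(metric: str, value: float, limit: float, now: datetime) -> Optional[Alert]:
    """Return an Alert when value exceeds limit, otherwise None."""
    if value > limit:
        return Alert(metric=metric, value=value, triggered_at=now)
    return None


def needs_escalation(alert: Alert, now: datetime, window: timedelta) -> bool:
    """True when the alert is still unacknowledged once the configured window has elapsed."""
    return (
        alert.acknowledged_at is None
        and not alert.escalated
        and now - alert.triggered_at >= window
    )
```

Passing `now` explicitly, instead of reading the clock inside the functions, keeps the checks deterministic and easy to test.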

Branding Agent¶

Tagline: Turns a rough idea into a coherent visual and verbal identity.

The Branding Agent works interactively with a user to develop a brand identity for a project, product, app, or business. It begins by asking structured clarifying questions to understand the concept, audience, values, and tone, then produces a brand specification covering colour palette, typography, logo and icon directions, and brand voice guidelines. The output is a formatted brand brief suitable for handing to a designer or feeding directly into an image generation prompt. It does not produce actual graphics; it produces the specification that drives them.

Capabilities: Ask structured clarifying questions to establish brand context: what the thing is, who it's for, what personality it should have, what it should feel like, what it should not feel like, and any existing constraints (e.g. existing colours, names, or assets to work around). Suggest a primary colour palette (two to three colours with hex codes, names, and rationale). Suggest supporting and accent colours with usage guidance. Recommend typography pairings (heading and body typefaces) with rationale and fallback options. Describe logo and icon concept directions (three to five distinct directions, each with a concept description, visual metaphor, style reference, and suggested mood). Produce brand voice guidelines: tone adjectives, dos and don'ts, example phrases that are on-brand versus off-brand. Generate a consolidated brand brief document ready for a designer or image generation tool. Suggest naming directions if the project is unnamed or the name is under consideration. Iterate on any element based on feedback. Optionally produce image generation prompts (for Midjourney, DALL-E, or similar) based on the agreed logo concept directions.

Knowledge layers: Agent (design principles, colour theory, typography conventions, brand strategy frameworks, logo concept vocabulary), User (prior branding work the user has done; their aesthetic preferences and known dislikes; brand briefs previously produced), Account (existing brand elements if the request is a brand extension rather than a new identity; approved design system components).

Delegates to: Writing Agent (for polishing the final brand brief document and voice guidelines).

Execution note: This agent is one of the few in the starter pack that is inherently interactive before it can produce output. It should always begin with a clarifying question sequence before generating any brand elements. The quality of the output depends directly on the quality of the brief gathered. At a minimum, a viable brief requires: what the thing does, who uses it, and three adjectives that describe how it should feel.

Input parameters (initial prompt):

The user's starting prompt can be as brief as a sentence or as detailed as a full brief. The agent fills gaps through questions. Typical useful starting information:

What it is:          [Product / app / business / event / project name and description]
Audience:            [Who will encounter this brand]
Personality:         [Adjectives -- what should it feel like?]
Anti-personality:    [What should it definitely NOT feel like?]
Constraints:         [Any existing assets, colours, or names to work around]
Output needed:       [Full brand spec / colour palette only / logo directions only]
Designer handoff:    [Yes -- produce a PDF brief / No -- working doc is fine]
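
The execution note's minimum brief (what the thing does, who uses it, and three adjectives) suggests a simple gap check before any generation begins. A hedged sketch follows; the `BrandBrief` fields and the question wording are hypothetical, and in practice the agent would phrase its questions through the LLM rather than from fixed strings.

```python
from dataclasses import dataclass, field


@dataclass
class BrandBrief:
    what_it_is: str = ""
    audience: str = ""
    personality: list = field(default_factory=list)  # adjectives


def missing_brief_questions(brief: BrandBrief) -> list:
    """Return clarifying questions for each part of the minimum brief still missing."""
    questions = []
    if not brief.what_it_is.strip():
        questions.append("What is it, and what does it do?")
    if not brief.audience.strip():
        questions.append("Who will encounter this brand?")
    if len(brief.personality) < 3:
        needed = 3 - len(brief.personality)
        questions.append(f"Which {needed} more adjective(s) describe how it should feel?")
    return questions
```

An empty result means the minimum brief is satisfied and generation can proceed; any other result is the next round of clarifying questions.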

Settings:

capability_level:        workflow
tool_trust_required:     read  (reference research, design inspiration lookup)
default_execution_mode:  interactive  (clarifying questions before generation)
                         deferred     (final brand brief document production)

Permissions:
  allow:   web_search (competitor and inspiration research),
           create_document, write_knowledge_fact
  deny:    send_email, external_system_writes

Optional integration:
  image_generation_tool:  If configured, the agent can pass agreed logo
                          concept directions directly to an image generation
                          tool as structured prompts. Disabled by default.

Suggested per-assignment restrictions:
  Personal / solo founder:   Full interactive access; full output
  Team (design review):      Output to shared drive; changes require
                             design lead sign-off before brand is adopted
  Agency / builder:          Full access; outputs scoped to client context
                             in team knowledge layer

1.5 Summary Table¶

#	Agent	Group	Context	Capability Level	Delegates To
1	Mail Agent	Specialist	Personal / Team	tools_only	--
2	Calendar Agent	Specialist	Personal / Team	tools_only	--
3	Task Agent	Specialist	Personal / Team	tools_only	--
4	Research Agent	Specialist	Personal / Team	workflow	--
5	Writing Agent	Specialist	Personal / Team	workflow	--
6	Document Agent	Specialist	Personal / Team / Account	workflow	--
7	Personal Assistant	Coordinator	Personal	workflow / learning	Mail, Calendar, Task, Research, Writing, Document
8	Meeting Agent	Coordinator	Personal / Team	workflow	Calendar, Mail, Task, Writing
9	Project Coordinator	Coordinator	Team	workflow	Calendar, Task, Research, Writing, Mail
10	Briefing Agent	Coordinator	Personal / Team	workflow	Research, Writing, Document
11	Data Agent	Data & Knowledge	Team / Account	workflow	--
12	Knowledge Base Agent	Data & Knowledge	Team / Account	tools_only	--
13	HR Agent	Org Specialist	Account	tools_only	--
14	Finance Agent	Org Specialist	Personal / Team / Account	workflow	--
15	Support Triage Agent	Org Specialist	Team / Account	workflow	Task, Mail, Knowledge Base
16	Content Agent	Org Specialist	Team / Account	workflow	--
17	Customer Intelligence Agent	Org Specialist	Team / Account	workflow	Task, Calendar
18	Onboarding Agent	Org Specialist	Account	workflow	Task, Calendar, Knowledge Base
19	Monitor Agent	Org Specialist	Team / Account	tools_only	--
20	Branding Agent	Org Specialist	Personal / Team	workflow	Writing
21	Chat Agent	Specialist	Personal / Team / Account	tools_only	--
22	Report Writer Agent	Data & Knowledge	Personal / Team / Account	workflow	Data, Writing
23	Coach Agent	Specialist	Personal / Team / Account	tools_only	--

1.6 Upcoming Agents¶

The following agents are planned but not yet fully specified. Design decisions need to be resolved before formal catalogue entries are written.

Visualiser Agent (design decisions pending)¶

Planned group: Data & Knowledge

Tagline: Turns data and descriptions into charts, diagrams, and infographics.

The Visualiser Agent takes structured data, a process description, or a conceptual prompt and produces a visual output -- charts, workflow diagrams, infographics, relationship maps, or timelines. At launch it is display-only: the agent generates a visual artefact that the platform renders. A later release would introduce interactivity (filterable charts, clickable workflow steps, drill-down). It is a natural delegate of the Report Writer Agent, which can embed Visualiser outputs within formatted reports.

Planned capabilities (v1, display only): Generate workflow and process diagrams from a description or structured input. Generate data charts (bar, line, pie, scatter, and others) from tabular data. Produce infographics and relationship/entity maps. Produce org charts and simple timelines. Apply account brand colours and approved styles from the knowledge layer. Select the appropriate visualisation type for the data and audience. Output in the configured format (see design decisions below).

Planned capabilities (v2, interactive, future release): Filterable and drill-down charts. Clickable workflow steps with contextual detail. Animated data stories. Parameterised outputs the platform can make interactive without re-invoking the agent.

Open design decisions (v1):

Output format. Three realistic approaches for v1. Mermaid markup: model-native, platform renders; excellent for diagrams and workflows, limited for data charts. Vega-Lite spec: declarative config the model generates from data, platform renders via a Vega renderer; better separation of concerns for charts. SVG: model generates directly; flexible but gets unwieldy for complex data visualisations. Likely answer: Mermaid for diagrams and workflows; Vega-Lite for data charts; SVG as fallback for custom infographics. To confirm.
Rendering responsibility. Does the platform build a renderer, or does the agent output something already renderable in-browser (SVG, HTML)? Mermaid and Vega-Lite require a client-side rendering library. SVG and HTML do not. Decision affects the platform engineering scope for v1.
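
To make the output format options concrete, the sketch below follows the tentative split described above (Mermaid for diagrams and workflows, Vega-Lite for data charts, SVG as fallback) and shows what the agent's output might look like in the first two cases. The function names and the set of visualisation kinds are assumptions, and the split itself is still to be confirmed.

```python
def choose_format(kind: str) -> str:
    """Map a visualisation kind to the tentative v1 output format."""
    if kind in {"workflow", "process_diagram"}:
        return "mermaid"
    if kind in {"bar", "line", "pie", "scatter"}:
        return "vega-lite"
    return "svg"  # fallback for custom infographics


def mermaid_flow(steps: list) -> str:
    """Render an ordered list of step labels as a top-down Mermaid flowchart."""
    lines = ["flowchart TD"]
    for i, step in enumerate(steps):
        lines.append(f'    s{i}["{step}"]')
    for i in range(len(steps) - 1):
        lines.append(f"    s{i} --> s{i + 1}")
    return "\n".join(lines)


def bar_chart_spec(rows: list, x_field: str, y_field: str) -> dict:
    """Build a minimal Vega-Lite bar chart spec from tabular rows."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "nominal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }
```

Both outputs are plain text or JSON that the platform would hand to a client-side renderer, which is the engineering cost the rendering-responsibility decision has to weigh.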

Open design decisions (v2):

Interactivity model. Two approaches. Model generates component code (React, D3): more powerful, but requires a safe sandboxed execution environment. Platform adds an interaction layer over static output: simpler to implement, more limited in capability.

1.7 Implementation Notes¶

Phasing. Not all 23 agents need to ship simultaneously. A suggested launch order:

Phase 1 (foundation): Mail, Calendar, Task, Personal Assistant, Chat Agent, Coach Agent -- the personal productivity core plus the first document-grounded persona agent.
Phase 2 (team value): Research, Writing, Document, Meeting Agent, Briefing Agent, Knowledge Base Agent.
Phase 3 (org functions): Project Coordinator, Support Triage, Onboarding, HR, Monitor, Report Writer Agent.
Phase 4 (vertical depth): Finance, Content, Customer Intelligence, Data Agent, Branding Agent.

Coordination dependencies. The four coordinator agents (Personal Assistant, Meeting Agent, Project Coordinator, Briefing Agent) require at least their core specialist delegates to be available. They should not be enabled unless the relevant specialists are also deployed.
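
A deployment-time check for this rule could look like the following sketch. The delegate sets are copied from the summary table in 1.5; treating every listed delegate as required is an assumption that is stricter than "core" delegates, and the function names are hypothetical.

```python
# Delegate lists as given in the 1.5 summary table.
DELEGATES = {
    "Personal Assistant": {"Mail", "Calendar", "Task", "Research", "Writing", "Document"},
    "Meeting Agent": {"Calendar", "Mail", "Task", "Writing"},
    "Project Coordinator": {"Calendar", "Task", "Research", "Writing", "Mail"},
    "Briefing Agent": {"Research", "Writing", "Document"},
}


def missing_delegates(coordinator: str, deployed) -> list:
    """Return the delegates a coordinator still lacks, in alphabetical order."""
    return sorted(DELEGATES[coordinator] - set(deployed))


def can_enable(coordinator: str, deployed) -> bool:
    """A coordinator may be enabled only when none of its delegates are missing."""
    return not missing_delegates(coordinator, deployed)
```

If some delegates are optional for a given coordinator, narrowing its set in `DELEGATES` to the core specialists would relax the check accordingly.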

Template vs live agents. These entries define AgentTemplate records. Customers deploy live agents from these templates, then customise them. The templates set sensible defaults; operators adjust for their context.

Knowledge seeding. Several agents (HR, Onboarding, Knowledge Base, Content, Coach) benefit from account knowledge being seeded before the agent is useful. The template should include a prompt to the deploying admin to provide initial knowledge documents before activating the agent. For the Coach Agent specifically, an empty library renders the agent effectively a less-focused version of the Chat Agent -- seeding the library is the deployment step that makes it purposeful.

Coach, HR, and Onboarding pattern. The HR Agent and Onboarding Agent are pre-configured instances of the same architectural pattern as the Coach Agent. All three use the library system for domain knowledge, with account-scoped libraries managed by the admin. They are presented as separate catalogue entries because they have different default configurations, risk profiles, and user expectations, but they share the same underlying template mechanism. See 05 Persistence, Storage & Ingestion for the library system.

2. Agent Implementation Logistics¶

The agent catalogue defines what each agent does. This section defines what needs to be built and connected to make each agent actually work: external integrations, tool abstractions, the implementation sequence, and operational considerations.

2.1 Integration Landscape¶

Every agent's capabilities ultimately resolve to tool calls. The tools fall into four categories:

Category	Description	Examples
Platform tools	Built into Thinklio, no external dependency	memory_store, memory_search, current_time, write_knowledge_fact
LLM-native tools	Capabilities that are purely LLM reasoning, no tool call needed	Summarisation, drafting, classification, analysis, question answering
External integration tools	Connect to third-party services via API	Gmail, Google Calendar, Todoist, Jira, CRM, etc.
Document tools	Read and process uploaded documents	read_document, search_documents (backed by the document ingestion system)

Most agents are a mix of LLM-native reasoning and tool calls. An agent like the Writing Agent is almost entirely LLM-native -- it reasons, drafts, and edits without needing any external system. An agent like the Calendar Agent is almost entirely tool-dependent -- it is useless without a calendar integration.

This distinction matters for implementation sequencing: LLM-native agents can ship immediately, whereas tool-dependent agents can ship only once their integrations exist.

2.2 External Integration Map¶

Email¶

Provider	API	Auth	Notes
Gmail	Gmail API (REST)	OAuth 2.0 (user consent)	Read, send, label, search. Requires per-user OAuth grant.
Microsoft 365	Microsoft Graph API	OAuth 2.0 (org or user)	Same capabilities. Enterprise customers likely use this.
Generic IMAP/SMTP	IMAP for read, SMTP for send	Username/password or app password	Fallback for providers without REST APIs. Limited compared to Gmail/Graph.

Tool abstractions needed:

email_read          Read messages (inbox, by sender, by label, by date range)
email_search        Search messages by query
email_send          Send a new email
email_reply         Reply to an existing thread
email_draft         Create a draft (no send)
email_label         Apply/remove labels or folders
email_archive       Archive a message
email_flag          Flag/star a message

The tool registry stores the abstract tool (e.g. email_read). The tool's config field specifies which provider implementation to use. At deployment time, the operator configures the provider and credentials. The agent calls email_read regardless of whether it is backed by Gmail or Graph.

Agents that need this: Mail Agent, Personal Assistant, Meeting Agent, Project Coordinator, Support Triage Agent.

Calendar¶

Provider	API	Auth	Notes
Google Calendar	Google Calendar API (REST)	OAuth 2.0	Full CRUD, free/busy queries, recurring events
Microsoft 365	Microsoft Graph API (calendar)	OAuth 2.0	Same capabilities, different API shape
CalDAV	CalDAV protocol	Various	Open standard, used by some self-hosted solutions

Tool abstractions needed:

calendar_read           Read events (by date range, calendar)
calendar_find_free_time Find available slots across calendars
calendar_check_conflicts Check for conflicts at a proposed time
calendar_create_event   Create a new event
calendar_update_event   Update an existing event
calendar_cancel_event   Cancel/delete an event
calendar_send_invite    Send meeting invitations
calendar_rsvp           Respond to an invitation

Agents that need this: Calendar Agent, Personal Assistant, Meeting Agent, Project Coordinator, Customer Intelligence Agent, Onboarding Agent.

Task Management¶

Provider	API	Auth	Notes
Todoist	Todoist REST API v2	API token	Personal task management, simple and clean
Jira	Jira REST API	OAuth 2.0 or API token	Enterprise project management, complex schema
Asana	Asana REST API	OAuth 2.0 or PAT	Team task management, project-oriented
Linear	Linear GraphQL API	API key	Modern dev-oriented project management
Trello	Trello REST API	API key + token	Board-based, simpler than Jira
Internal	Thinklio knowledge layer	Platform native	Tasks stored as knowledge facts (lightweight fallback)

Tool abstractions needed:

task_create         Create a task (title, description, due date, assignee, priority)
task_update         Update task fields
task_complete       Mark a task as complete
task_delete         Delete a task
task_list           List tasks (by assignee, project, status, due date)
task_search         Search tasks by keyword
task_assign         Assign/reassign a task
task_set_reminder   Set a reminder on a task

Agents that need this: Task Agent, Personal Assistant, Meeting Agent, Project Coordinator, Support Triage Agent, Onboarding Agent.

CRM¶

Provider	API	Auth	Notes
HubSpot	HubSpot API v3	OAuth 2.0 or private app access token	Contacts, companies, deals, activities
Salesforce	Salesforce REST API	OAuth 2.0	Enterprise CRM, complex object model
Pipedrive	Pipedrive REST API	API token	Sales-focused, simpler model
Internal	Thinklio knowledge layer	Platform native	CRM data stored as knowledge facts (lightweight)

Tool abstractions needed:

crm_read_contact        Read a contact/customer record
crm_search_contacts     Search contacts by name, email, company
crm_read_account        Read a company/account record
crm_log_interaction     Log a meeting note, call, or interaction
crm_read_deals          Read opportunities/deals for an account
crm_create_task         Create a follow-up task linked to a contact
crm_read_history        Read interaction history for a contact

Agents that need this: Customer Intelligence Agent, Support Triage Agent (optional).

Web Search and Research¶

Provider	API	Auth	Notes
Tavily	Tavily Search API	API key	Built for AI agents, returns structured results
Serper	Serper API	API key	Google Search results via API
Brave Search	Brave Search API	API key	Privacy-focused, good for general search
Direct fetch	HTTP GET + readability extraction	None	Read specific URLs, extract content

Tool abstractions needed:

web_search          Search the web for a query (returns title, snippet, URL)
web_read_url        Read and extract content from a specific URL
web_read_multiple   Read multiple URLs (batch)

Agents that need this: Research Agent, Writing Agent, Briefing Agent, Content Agent, Data Agent (for reference data).

File Storage and Document Access¶

Provider	API	Auth	Notes
Thinklio R2	Internal document system	Platform auth	Primary. See 05 Persistence, Storage & Ingestion.
Google Drive	Google Drive API	OAuth 2.0	Read/write files in user's Drive
Microsoft OneDrive/SharePoint	Microsoft Graph API	OAuth 2.0	Enterprise document access
Direct upload	Upload API	Platform auth	Files uploaded directly to Thinklio

Tool abstractions needed:

document_upload         Upload a file to Thinklio storage
document_read           Read/extract content from a stored document
document_search         Search document chunks by semantic query
document_list           List documents for an agent/scope
document_delete         Delete a document and derived content

Agents that need this: Document Agent, Knowledge Base Agent, HR Agent, Finance Agent, all agents that consume uploaded reference material.

Notifications and Messaging¶

Provider	API	Auth	Notes
Telegram	Telegram Bot API	Bot token	Already implemented
Slack	Slack Web API	OAuth 2.0 or bot token	Team notifications, channel messages
Email	Via email integration	(same as email)	Notification delivery via email
Web push	Platform websocket/SSE	Platform auth	Real-time notifications in web UI

Tool abstractions needed:

notify_user         Send a notification to a user via their preferred channel
notify_team         Send a notification to a team channel

Agents that need this: Monitor Agent, Support Triage Agent, Personal Assistant, any agent that delivers deferred results.

2.3 Agent-to-Integration Dependency Matrix¶

Agent	Email	Calendar	Tasks	CRM	Web Search	Documents	Notifications	LLM-Only
Mail Agent	Required	--	--	--	--	--	--	--
Calendar Agent	--	Required	--	--	--	--	--	--
Task Agent	--	--	Required	--	--	--	--	--
Research Agent	--	--	--	--	Required	--	--	--
Writing Agent	--	--	--	--	Optional	Optional	--	Primary
Document Agent	--	--	--	--	--	Required	--	--
Personal Assistant	Via delegates	Via delegates	Via delegates	--	Via delegates	--	Optional	Routing/synthesis
Meeting Agent	Via delegates	Via delegates	Via delegates	--	--	Optional	--	Extraction/synthesis
Project Coordinator	Via delegates	Via delegates	Via delegates	--	Optional	--	Optional	Status/synthesis
Briefing Agent	--	--	--	--	Via delegates	Via delegates	--	Synthesis
Data Agent	--	--	--	--	--	Required	--	Analysis
Knowledge Base Agent	--	--	--	--	--	Required	--	Primary
HR Agent	--	--	--	--	--	Required	--	Primary
Finance Agent	--	--	--	--	--	Required	--	Analysis
Support Triage Agent	Optional	--	Required	Optional	--	Optional	Required	Classification
Content Agent	--	--	--	--	Optional	Optional	--	Primary
Customer Intelligence	--	Optional	Optional	Required	--	--	--	Synthesis
Onboarding Agent	--	Optional	Required	--	--	Required	--	Guidance
Monitor Agent	--	--	--	--	--	--	Required	--
Branding Agent	--	--	--	--	Optional	--	--	Primary
Chat Agent	--	--	--	--	--	--	--	Primary
Report Writer Agent	--	--	--	--	--	Required	--	Analysis
Coach Agent	--	--	--	--	--	Required	--	Primary

Key: Required means the agent is not useful without this integration. Optional means it enhances the agent but is not required. Via delegates means the coordinator accesses this through specialist delegates. Primary means the agent's value is mostly LLM reasoning, not tool integration.

2.4 Implementation Sequence¶

Based on the dependency matrix and the realistic order of what can be built, the integration rollout follows six waves.

Wave 1: LLM-native agents (no external integrations needed). These agents work with just the platform's built-in capabilities (knowledge facts, document chunks, chat history, LLM reasoning). They can ship as soon as the document ingestion system and agent templates are in place.

Agent	What it needs	Status
Writing Agent	LLM + knowledge layers	Ready now
Chat Agent	LLM + knowledge layers	Ready now
Coach Agent	Document ingestion + knowledge retrieval + library	Needs doc ingestion
Knowledge Base Agent	Document ingestion + knowledge retrieval	Needs doc ingestion
HR Agent	Document ingestion + knowledge retrieval	Needs doc ingestion
Content Agent	LLM + knowledge layers + optional web search	Ready now (basic)
Branding Agent	LLM + knowledge layers	Ready now
Document Agent	Document ingestion	Needs doc ingestion
Data Agent	Document ingestion (for spreadsheet/CSV)	Needs doc ingestion

The gate for most of these is the document ingestion system. See 05 Persistence, Storage & Ingestion.

Wave 2: Web search integration. Adding web_search and web_read_url unlocks the research-oriented agents.

Agent	What it unlocks
Research Agent	Full web search, read, and synthesise workflow
Briefing Agent	Person/organisation research from public sources
Content Agent (enhanced)	Competitive research, fact-checking
Writing Agent (enhanced)	Factual grounding from web sources

One web search provider is needed (Tavily recommended for AI agent use), plus URL content extraction.

Wave 3: Task management integration.

Agent	What it unlocks
Task Agent	Full task CRUD
Personal Assistant	Delegate to Task Agent
Meeting Agent	Extract action items and create tasks
Project Coordinator	Task tracking, status, blockers
Onboarding Agent	Onboarding task sequences
Support Triage Agent	Ticket creation and tracking

One task provider is enough to start (Todoist for simplicity, or Jira for enterprise). The internal fallback (tasks as knowledge facts) provides basic capability without external integration.

Wave 4: Calendar integration.

Agent	What it unlocks
Calendar Agent	Full calendar CRUD
Personal Assistant	Delegate for scheduling
Meeting Agent	Pre-meeting briefs, follow-up scheduling
Project Coordinator	Milestone and review scheduling

The Google Calendar API (OAuth) comes first, with Microsoft Graph as the enterprise follow-on.

Wave 5: Email integration.

Agent	What it unlocks
Mail Agent	Full email management
Personal Assistant	Delegate for email
Meeting Agent	Send summaries
Project Coordinator	Send status reports

The Gmail API (OAuth) comes first. Email is powerful but complex -- OAuth flows, send permissions, and compliance considerations all apply -- so it is worth deferring until the simpler integrations are proven.

Wave 6: CRM and advanced integrations.

Agent	What it unlocks
Customer Intelligence Agent	CRM data access
Support Triage Agent (enhanced)	Customer context from CRM
Monitor Agent	Full system monitoring with notifications

The first CRM provider is HubSpot or Salesforce. The choice is enterprise-specific and likely to be driven by early customer requirements.

2.5 Tool Abstraction Architecture¶

Vendor-Agnostic Tool Layer¶

Each integration domain (email, calendar, tasks, CRM) has a vendor-agnostic tool interface and one or more provider implementations.

Tool Registry
    |
    +-- email_read (abstract)
    |       +-- GmailProvider
    |       +-- GraphProvider
    |       +-- IMAPProvider
    |
    +-- calendar_read (abstract)
    |       +-- GoogleCalendarProvider
    |       +-- GraphCalendarProvider
    |       +-- CalDAVProvider
    |
    +-- task_create (abstract)
            +-- TodoistProvider
            +-- JiraProvider
            +-- InternalProvider
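
The layering above can be pictured as a registry that maps an abstract tool name to whichever provider the deployment has configured. The following is a minimal sketch only: `EmailProvider`, `FakeGmailProvider`, and `ToolRegistry` are hypothetical names introduced for illustration, not Thinklio's actual classes, and the fake provider makes no network calls.

```python
from dataclasses import dataclass, field
from typing import Protocol


class EmailProvider(Protocol):
    """Hypothetical interface that every email provider would implement."""

    def read(self, label: str) -> list[str]: ...


@dataclass
class FakeGmailProvider:
    """Stand-in for a Gmail-backed provider (in-memory, no network)."""

    mailbox: dict[str, list[str]] = field(default_factory=dict)

    def read(self, label: str) -> list[str]:
        return list(self.mailbox.get(label, []))


@dataclass
class ToolRegistry:
    """Maps an abstract tool name to the provider chosen at deployment time."""

    providers: dict[str, EmailProvider] = field(default_factory=dict)

    def configure(self, tool: str, provider: EmailProvider) -> None:
        self.providers[tool] = provider

    def email_read(self, label: str) -> list[str]:
        # The agent calls the abstract tool; the registry picks the backend.
        provider = self.providers.get("email_read")
        if provider is None:
            raise LookupError("no provider configured for email_read")
        return provider.read(label)


registry = ToolRegistry()
registry.configure("email_read", FakeGmailProvider({"inbox": ["Welcome"]}))
messages = registry.email_read("inbox")
```

Swapping Gmail for Graph or IMAP would then mean registering a different provider object, with no change to the agent's call site.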

Provider Configuration¶

Each tool's config field in the database specifies the active provider and its credentials:

{
    "provider": "gmail",
    "oauth_token_ref": "vault:gmail-token-user-123",
    "scopes": ["https://www.googleapis.com/auth/gmail.readonly"],
    "rate_limit": { "max_per_minute": 30 }
}

Provider credentials are stored securely (initially in environment variables or the database with encryption, later via the secrets vault). OAuth tokens are managed per-user -- each user who wants email integration must complete an OAuth consent flow.
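
As an illustration of how a provider implementation might consume this config, the sketch below parses the example and enforces `rate_limit.max_per_minute` with a sliding window. The `SlidingWindowLimiter` class, the `split_secret_ref` helper, and the reading of `vault:` as a `store:name` prefix are assumptions made for the example; the source does not specify how the limit or the reference format is handled.

```python
import json
from collections import deque

CONFIG = json.loads("""
{
    "provider": "gmail",
    "oauth_token_ref": "vault:gmail-token-user-123",
    "scopes": ["https://www.googleapis.com/auth/gmail.readonly"],
    "rate_limit": { "max_per_minute": 30 }
}
""")


class SlidingWindowLimiter:
    """Allows at most max_per_minute calls in any 60-second window."""

    def __init__(self, max_per_minute: int) -> None:
        self.max_per_minute = max_per_minute
        self.calls = deque()  # timestamps (seconds) of accepted calls

    def allow(self, now: float) -> bool:
        # Drop timestamps that have left the 60-second window.
        while self.calls and now - self.calls[0] >= 60.0:
            self.calls.popleft()
        if len(self.calls) >= self.max_per_minute:
            return False
        self.calls.append(now)
        return True


def split_secret_ref(ref: str) -> tuple:
    """Split 'store:name' into (store, name). The format is assumed."""
    store, _, name = ref.partition(":")
    return store, name


limiter = SlidingWindowLimiter(CONFIG["rate_limit"]["max_per_minute"])
```

Passing `now` explicitly keeps the limiter deterministic under test; a production version would read a monotonic clock instead.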

Internal Fallback¶

For every integration domain, there is an internal fallback that stores data in Thinklio's own knowledge layer or database. This ensures agents are functional (at a basic level) even without external integrations:

Domain	Internal fallback
Tasks	Store tasks as knowledge facts with category "task"
Calendar	Not feasible as internal-only -- calendar needs real calendar data
Email	Not feasible as internal-only -- email needs a real mailbox
CRM	Store contact notes as knowledge facts
Documents	Thinklio document ingestion system (primary, not a fallback)
Web search	Not feasible as internal-only -- needs real web access

2.6 OAuth Flow for User-Scoped Integrations¶

Email, calendar, and some CRM integrations require per-user OAuth consent. The flow:

1. User navigates to settings in the Thinklio UI (or receives a setup link).
2. User clicks "Connect Gmail" (or similar).
3. Thinklio redirects to the provider's OAuth consent screen.
4. User grants access.
5. Thinklio receives the OAuth token and stores it securely.
6. The agent's tools are now active for this user's scope.

This requires OAuth client registration with each provider (Google Cloud project, Azure app registration, etc.), token storage and refresh handling, scope management (request minimum needed scopes), and revocation handling when a user disconnects.

For v1, the operator configures API keys and tokens manually. OAuth flows are a Phase 2 concern for integrations.

2.7 System Prompt Strategy¶

Each agent template includes a system prompt that defines its personality, capabilities, and constraints. System prompts are not included in the catalogue but are a critical implementation artefact.

System prompt components:

Identity -- who the agent is and what it does.
Capabilities -- what tools it has and when to use them.
Constraints -- what it must not do, trust levels, scope limitations.
Knowledge guidance -- how to use its knowledge layers.
Output format -- how to structure responses for the expected channel.
Delegation guidance (coordinators only) -- when to delegate versus handle directly, which delegate for which task.

Example structure:

You are {agent_name}, a {description}.

## Your capabilities
You have access to the following tools:
{dynamically injected tool list}

## How you work
{behavioural guidance specific to this agent}

## What you know
{knowledge layer guidance -- what's in your context and how to use it}

## Constraints
- {trust level constraints}
- {scope constraints}
- {content policy constraints}

## Output
{format guidance for responses}

System prompts will be developed iteratively through testing. Initial versions should be functional but conservative -- it is easier to loosen constraints than to tighten them after users develop expectations.
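
The example structure can be rendered with ordinary string templating. The sketch below covers only the identity, capabilities, and constraints sections; `render_system_prompt` and the `$placeholder` syntax are choices made for this illustration, not the platform's actual prompt builder.

```python
from string import Template

PROMPT_TEMPLATE = Template("""You are $agent_name, a $description.

## Your capabilities
You have access to the following tools:
$tool_list

## Constraints
$constraints
""")


def render_system_prompt(agent_name, description, tools, constraints):
    """Fill the template; tools maps tool name to a one-line summary."""
    tool_list = "\n".join(f"- {name}: {summary}" for name, summary in tools.items())
    constraint_list = "\n".join(f"- {item}" for item in constraints)
    return PROMPT_TEMPLATE.substitute(
        agent_name=agent_name,
        description=description,
        tool_list=tool_list,
        constraints=constraint_list,
    )


prompt = render_system_prompt(
    "Calendar Agent",
    "scheduling assistant",
    {"calendar_read": "Read events (by date range, calendar)"},
    ["Ask for approval before creating events"],
)
```

Because the tool list is injected at render time, the same template stays accurate as an agent moves from the stub tier to the integrated tier.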

2.8 Coordinator Delegation Configuration¶

The coordinator agents (Personal Assistant, Meeting Agent, Project Coordinator, Briefing Agent) need pre-configured delegation relationships. These are defined in the AgentTemplate.delegation_config field.

Example: Personal Assistant template delegation:

{
    "delegates": [
        {
            "tool_slug": "mail_agent",
            "restrictions": { "send": { "require_approval": true } }
        },
        {
            "tool_slug": "calendar_agent",
            "restrictions": { "create_event": { "require_approval": true } }
        },
        {
            "tool_slug": "task_agent",
            "restrictions": {}
        },
        {
            "tool_slug": "research_agent",
            "restrictions": {}
        },
        {
            "tool_slug": "writing_agent",
            "restrictions": {}
        }
    ]
}

When a customer deploys a Personal Assistant from this template, the system creates the PA agent from the template, registers each delegate agent as a tool (type agent) if not already registered, attaches the delegate tools to the PA with the configured restrictions, and runs cycle detection to verify the delegation graph is acyclic. The customer can then modify restrictions per-assignment in Agent Studio.
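
The cycle detection step can be sketched as a depth-first search over the delegation graph. This is one standard approach, shown under the assumption that the graph is available as a mapping from agent slug to delegate slugs; `find_cycle` is a hypothetical helper, and the source does not say which algorithm the platform uses.

```python
from typing import Optional


def find_cycle(delegates: dict) -> Optional[list]:
    """Return one delegation cycle as a list of slugs, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {node: WHITE for node in delegates}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for child in delegates.get(node, []):
            state = colour.get(child, WHITE)
            if state == GREY:
                # Back edge: child is already on the current path.
                return path[path.index(child):] + [child]
            if state == WHITE:
                found = visit(child)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(delegates):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


pa_graph = {
    "personal_assistant": ["mail_agent", "calendar_agent", "task_agent",
                           "research_agent", "writing_agent"],
}
looped = {
    "personal_assistant": ["meeting_agent"],
    "meeting_agent": ["personal_assistant"],
}
```

Returning the offending path, rather than a bare boolean, lets Agent Studio show the customer which assignment closed the loop.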

2.9 Stub Implementation Strategy¶

For initial deployment and testing, every agent ships with at least stub functionality:

Implementation tier	What works	What does not work
Stub	System prompt, LLM reasoning, knowledge retrieval, chat. Agent can discuss its domain and answer questions from its knowledge.	No tool calls -- cannot actually read email, check calendar, create tasks, etc.
Platform tools	Above + memory_store, memory_search, current_time, document_search	No external integrations
Integrated	Above + actual external tool calls (email, calendar, tasks, CRM, web)	Nothing; this tier is full functionality

Every agent can ship at the stub tier immediately. The system prompt should acknowledge the agent's current capabilities honestly. For example: "I'm your Calendar Agent. I can help you think about scheduling and time management. When calendar integration is connected, I'll be able to directly read and manage your calendar. For now, I can help you plan your schedule and I'll remember your preferences for when the integration is live."

This approach lets testers interact with every agent from day one, even before integrations exist. Feedback on personality, knowledge behaviour, and delegation routing is just as valuable as feedback on tool execution.

2.10 Monitoring and Quality¶

Per-Agent Metrics¶

Track for each deployed agent: interactions per day/week, average response time, tool call success/failure rate (per tool), knowledge retrieval hit rate, user satisfaction (if feedback mechanism exists), cost per interaction, and delegation success rate (coordinators).

Common Failure Modes¶

Failure	Detection	Mitigation
External API down	Tool execution error rate spike	Circuit breaker, fallback to knowledge-only
OAuth token expired	401 from provider	Automatic refresh, notify user if refresh fails
Rate limited by provider	429 from provider	Backoff, queue requests, alert operator
LLM hallucination	User feedback, knowledge mismatch	Improve system prompts, increase knowledge coverage
Delegation loop	Depth limit exceeded	Already handled by delegation governance
Knowledge empty	Low retrieval hit rate	Prompt operator to seed knowledge / upload documents
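
The circuit breaker mitigation for the "External API down" row could look roughly like the sketch below. The threshold, the cool-down period, and the names `CircuitBreaker` and `call_with_fallback` are assumptions for illustration; the source names the pattern but not its parameters.

```python
class CircuitBreaker:
    """Opens after repeated failures, then permits a trial call after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at = None  # time the breaker opened, or None if closed

    def allow(self, now):
        if self.opened_at is None:
            return True
        # After the cool-down, let a trial call through (half-open state).
        return now - self.opened_at >= self.reset_after_s

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now


def call_with_fallback(breaker, tool, fallback, now):
    """Run tool() unless the breaker is open; use fallback() on failure."""
    if not breaker.allow(now):
        return fallback()
    try:
        result = tool()
    except Exception:
        breaker.record_failure(now)
        return fallback()
    breaker.record_success()
    return result
```

Here `fallback` stands for the knowledge-only response path, so the agent keeps answering from its knowledge layers while the provider is unavailable.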

3. Predictive Planning & Execution Learning¶

3.1 Problem Statement¶

Thinklio agents make decisions at the think step of every interaction (see 06 Events, Channels & Messaging). Given a user's message and the assembled context, the LLM decides which tools to call, in what order, and with what parameters. This decision is currently stateless: the agent has no structured knowledge of whether a similar approach has worked before, how much it cost, how long it took, or whether users were satisfied with the result.

The platform already records everything needed to learn from past executions. Every step is persisted with its state, cost, duration, and outcome. Every interaction has a terminal state (success, failed, timeout). User feedback (thumbs up/down) is captured at the interaction level. Job outcomes are tracked through the job state machine. But none of this data feeds back into future decision-making.

Without execution learning, agents repeat the same mistakes. An agent that tries tool A for a particular task and fails will try tool A again next time, because it has no memory of the failure. An agent that discovers a three-step approach works better than a five-step approach for a given task type has no way to carry that knowledge forward. Cost and latency vary unpredictably because the agent cannot prefer cheaper or faster approaches that have proven equally effective.

The predictive planning system observes agent executions, records structured outcomes, builds a statistical model of what works, and makes that model available to agents at decision time. The agent remains in control of its decisions; the system provides scores, not instructions. Four boundaries follow from this. The system does not override agent reasoning. It does not introduce a separate execution path; it integrates with the existing Harness. It is not a data warehouse or analytics platform; it is a real-time scoring service with a learning backend. And it avoids premature ML complexity, because the initial system must work with sparse data.

3.2 Value Progression¶

This system is an investment with a long payoff curve. The infrastructure built in the first phase delivers little direct value, but every day it runs it accumulates the data that makes later phases transformative.

Stage 1: Data Collection and Bayesian Scoring (implement now). Build the outcome collection pipeline, the canonical plan model, and the Bayesian scoring service. Start recording every execution outcome from day one. The Bayesian model provides initial scores with low confidence, improving as data accumulates. Value: low. The scores are advisory and based on small samples. The real value is the data being captured, which cannot be collected retroactively.

Stage 2: Review, Tune, and Validate (ongoing, months 2 to 6). As the dataset grows, review the Bayesian scores against actual outcomes. Tune the hierarchy weights, the success definition, the decay parameters, and the scope key structure. Validate that the scores correlate with genuinely better outcomes by comparing scored versus unscored agent performance. This is not a separate build phase. It is an operational discipline that runs alongside Stage 1, requiring periodic human review of the score data, not additional engineering. Value: medium. The Bayesian scores become reliable enough to influence agent behaviour.

Stage 3: ML Training (implement when data justifies). When an account has accumulated sufficient execution history (target: 10,000+ outcomes, 50+ distinct plans, 3+ months of operation), train a gradient-boosted model on the feature set. The ML model captures patterns the Bayesian model cannot: feature interactions, non-linear effects, and cross-plan generalisation. Value: medium-high.

Stage 4: Transition from Bayesian to ML (gradual or instant). Shift scoring weight from the Bayesian model to the ML model. The gradual strategy increases ml_weight in the blending formula incrementally (e.g. 0.1 per week) while monitoring prediction accuracy on a rolling validation window, rolling back if accuracy degrades. The instant strategy switches entirely to ML scoring if the model demonstrates statistically significant improvement over the Bayesian baseline on held-out data (measured by log-loss over at least 1,000 held-out predictions), with the Bayesian model retained as a fallback. The choice should be made per account based on data volume and risk tolerance. Value: high. Agents reliably choose better plans, cost decreases, success rates increase, and the platform can demonstrate measurable improvement over time.

Stage 5: Autonomous Plan Suggestion (future). The scoring service evolves from "score these candidates" to "here are the candidates you should consider." Given a task description and agent capabilities, the system suggests optimal plans the agent has not generated on its own, synthesised from patterns across the entire execution history. The agent still decides, but the decision space is pre-filtered and ranked. Value: highest. Agents benefit from collective platform intelligence. A new agent can immediately access the distilled experience of thousands of prior executions across the platform.
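
The gradual strategy in Stage 4 can be illustrated with a linear blend. This section does not define the blending formula itself, so the convex combination below is an assumption, as are the function names and the rule of stepping back by one increment when log-loss worsens.

```python
def blended_score(bayes_score, ml_score, ml_weight):
    """Assumed blend: (1 - w) * Bayesian score + w * ML score."""
    if not 0.0 <= ml_weight <= 1.0:
        raise ValueError("ml_weight must be in [0, 1]")
    return (1.0 - ml_weight) * bayes_score + ml_weight * ml_score


def next_ml_weight(current, step, new_log_loss, baseline_log_loss):
    """Advance the weight if log-loss has not degraded, otherwise step back."""
    if new_log_loss <= baseline_log_loss:
        return min(1.0, current + step)
    return max(0.0, current - step)


# Example: weight 0.2, weekly step 0.1, validation log-loss improved.
weight = next_ml_weight(0.2, 0.1, new_log_loss=0.40, baseline_log_loss=0.45)
score = blended_score(bayes_score=0.6, ml_score=0.8, ml_weight=0.5)
```

With `ml_weight` at 0 the score is purely Bayesian, which is also the fallback position if the ML model is withdrawn.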

3.3 Core Concepts¶

What is a "Plan"?¶

A plan is the sequence of tool calls an agent intends to make in response to a user's message. At the think step, the LLM generates a structured plan before executing it. A plan consists of a tool sequence (which tools to call and in what order, e.g. "calendar_lookup, then web_search, then compose_response"), an execution mode (immediate, deferred, or interactive per step), and a parameters pattern (the structural shape of tool parameters, not the specific values, which are instance-dependent).

Plans are canonicalised by stripping instance-specific values (user IDs, specific dates, search queries) and retaining the structural signature. Two interactions that both do "calendar_lookup, web_search, compose_response" with different search queries are executing the same canonical plan.
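
A minimal sketch of canonicalisation, assuming each plan step is a dictionary with `tool`, `mode`, and `params` keys (a hypothetical shape chosen for the example). Parameter values are replaced by their type names so that only the structural signature feeds the SHA-256 hash.

```python
import hashlib
import json


def canonicalise(steps):
    """Keep tool name, mode, and parameter keys with types; drop values."""
    return [
        {
            "tool": step["tool"],
            "mode": step["mode"],
            "params": {key: type(value).__name__
                       for key, value in sorted(step["params"].items())},
        }
        for step in steps
    ]


def plan_hash(steps):
    """SHA-256 hex digest of the canonical plan, stable across instances."""
    canonical = json.dumps(canonicalise(steps), sort_keys=True,
                           separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


plan_a = [
    {"tool": "calendar_lookup", "mode": "immediate", "params": {"date": "2026-04-01"}},
    {"tool": "web_search", "mode": "immediate", "params": {"query": "venue reviews"}},
]
plan_b = [
    {"tool": "calendar_lookup", "mode": "immediate", "params": {"date": "2026-05-12"}},
    {"tool": "web_search", "mode": "immediate", "params": {"query": "train times"}},
]
```

`plan_a` and `plan_b` differ only in instance values, so they hash to the same canonical plan; reordering the tools would produce a different hash.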

What is "Success"?¶

Success is measured on a composite scale, not a binary. The inputs:

Signal	Source	Weight	Available From
Interaction completed without error	Interaction state	Baseline	Day one
All steps succeeded	Step states	Baseline	Day one
User gave thumbs up	Feedback event	High	Day one
User gave thumbs down	Feedback event	High (negative)	Day one
Job resolved successfully	Job state	Medium	When jobs are used
User continued the chat	Session activity	Low positive	Day one
User abandoned the chat	Session inactivity timeout	Low negative	Day one

The composite score is a weighted sum normalised to [0, 1]. The weights are configurable per account (with sensible defaults). In Stage 1, the system uses a simplified binary: success if the interaction completed without error and the user did not give a thumbs down; failure otherwise. The composite score is a Stage 3 enhancement.
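
The two scoring rules can be sketched side by side. The numeric weights below are hypothetical, since the table only ranks signals as Baseline, High, Medium, or Low, and the min-max rescaling used to land in [0, 1] is one possible normalisation, not necessarily the one the platform adopts.

```python
# Hypothetical default weights; the source gives only qualitative ranks.
DEFAULT_WEIGHTS = {
    "completed_without_error": 1.0,
    "all_steps_succeeded": 1.0,
    "thumbs_up": 2.0,
    "thumbs_down": -2.0,
    "job_resolved": 1.0,
    "chat_continued": 0.5,
    "chat_abandoned": -0.5,
}


def stage1_success(completed_without_error, feedback):
    """Stage 1 rule: completed without error and no thumbs-down."""
    return completed_without_error and feedback != "thumbs_down"


def composite_score(signals, weights=DEFAULT_WEIGHTS):
    """Weighted sum of the signals that fired, rescaled to [0, 1]."""
    lowest = sum(w for w in weights.values() if w < 0)
    highest = sum(w for w in weights.values() if w > 0)
    if highest == lowest:
        return 0.0  # degenerate weights: no information to score
    raw = sum(weights[name] for name in signals)
    return (raw - lowest) / (highest - lowest)


example = composite_score({"completed_without_error", "all_steps_succeeded"})
```

Under these assumed weights, an interaction that merely completes cleanly scores in the middle of the range, and explicit feedback moves it furthest in either direction.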

What is the "Context"?¶

Plans do not succeed or fail in isolation. The same plan may work well for one task type and poorly for another. The context captures the conditions under which a plan was executed: which agent (or agent template) was running, a lightweight task classification of the user's intent (derived from the think step's reasoning, not a separate classifier), which tools were available to the agent at execution time, and which channel the interaction came through (web, Telegram, email, etc.). Context is used to scope statistics. The system answers "how well does this plan work for this agent type on this kind of task?" rather than "how well does this plan work globally?"

3.4 Architecture¶

The system has three components: the Outcome Collector (listens to execution events and records structured outcomes), the Score Service (provides real-time plan scoring via an internal API), and the Learning Engine (updates statistical models from accumulated outcomes).

Harness (doc 06 Workflow component)
    |
    +-- writes event.kind = "interaction.completed" --+
    +-- writes event.kind = "step.completed" ---------+
                                                      |
                                                      v
                                           Outcome Collector
                                           (scheduled Convex function)
                                                      |
                                                      v
                                         execution_outcome table
    |
    |   (at the think step, before tool selection)
    |
    +-- Score Service (Convex query) <--- plan_score table
            |                                   ^
            |                                   |
            +-- Learning Engine ----------------+
                (scheduled Convex function)

Integration with the Harness¶

The system hooks into the Harness at two points.

After execution (passive, event-driven). The Outcome Collector is a Convex scheduled function that reads the event table for kinds interaction.completed, step.completed, job.resolved, and job.failed (see 06 Events, Channels & Messaging for the event model). It runs on a short cadence (every 30 seconds), batching new events into structured execution_outcome rows. It is entirely passive and adds no latency to the execution path. The read is cursor-tracked so the function processes each event exactly once.

Before tool selection (active, synchronous). At the think step, when the agent has generated one or more candidate plans, it can call the Score Service for historical performance data. The Score Service is a Convex query that reads plan_score rows keyed by plan hash and context. The agent's system prompt includes the returned scores as additional context for the LLM's decision. The LLM remains free to ignore the scores.

The scoring call is a plain Convex query, typically served from the reactive query cache with sub-millisecond overhead after the first call. If the Score Service returns no rows (cold start, new plan, scoped context never seen), the agent proceeds without scores. This is a degraded mode, not a failure.
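
The cursor-tracked read used by the Outcome Collector can be illustrated in plain Python. This sketch assumes each event carries a monotonically increasing `seq` field, which is a hypothetical name; the real collector is a Convex scheduled function and its event shape is defined in document 06.

```python
from dataclasses import dataclass, field

CONSUMED_KINDS = {
    "interaction.completed",
    "step.completed",
    "job.resolved",
    "job.failed",
}


@dataclass
class OutcomeCollector:
    cursor: int = 0  # seq of the last event processed
    outcomes: list = field(default_factory=list)

    def run_once(self, events, batch_size=100):
        """Process events past the cursor in order, then advance the cursor."""
        pending = sorted(
            (e for e in events if e["seq"] > self.cursor),
            key=lambda e: e["seq"],
        )[:batch_size]
        for event in pending:
            if event["kind"] in CONSUMED_KINDS:
                self.outcomes.append({"seq": event["seq"], "kind": event["kind"]})
            # Advance past every event seen, including ignored kinds.
            self.cursor = event["seq"]
        return len(pending)


events = [
    {"seq": 1, "kind": "step.completed"},
    {"seq": 2, "kind": "message.received"},
    {"seq": 3, "kind": "interaction.completed"},
]
collector = OutcomeCollector()
first_run = collector.run_once(events)
second_run = collector.run_once(events)
```

The second run finds nothing past the cursor, which is what gives each event a single processing pass. In a real deployment the outcome writes and the cursor update would need to commit together for that guarantee to hold.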

Event Kinds Consumed¶

The Outcome Collector reads the event table for the following kinds (documented in 06 Events, Channels & Messaging section 3):

Event kind	Purpose
`interaction.completed`	Capture full interaction outcome, plan structure, cost, duration.
`step.completed`	Capture per-step outcomes for granular analysis.
`job.resolved`	Capture deferred work outcomes.
`job.failed`	Capture deferred work failures.

These are existing event kinds. The Outcome Collector is a new reader over the shared event table; no new infrastructure is needed.

3.5 Data Model¶

All tables live in the database alongside the existing schema. All tenant-scoped tables include account_id with RLS policies enforcing isolation.

`canonical_plan`¶

Stores the structural signature of each unique plan the system has observed.

Field	Type	Description
id	UUID	PK
account_id	UUID	FK to account
plan_hash	text	SHA-256 of the canonicalised plan structure. Used for fast lookup.
tool_sequence	text[]	Ordered array of tool names (e.g. `['calendar_lookup', 'web_search', 'compose_response']`)
execution_modes	text[]	Parallel array of execution modes per tool (e.g. `['immediate', 'immediate', 'immediate']`)
parameter_schema	JSONB	Structural shape of parameters (types and keys, not values)
step_count	integer	Number of tool calls in the plan
first_seen_at	timestamp	When this plan was first observed
last_seen_at	timestamp	When this plan was most recently executed
execution_count	integer	Total number of times this plan has been executed (denormalised for fast reads)
created_at	timestamp	When this record was created

Constraints: UNIQUE(account_id, plan_hash) for one canonical record per unique plan structure per account. Indexed on (account_id, plan_hash) for lookup during scoring. RLS policy: account members can read plans from their own account.

`execution_outcome`¶

Records the outcome of every interaction that involved tool calls.

Field	Type	Description
id	UUID	PK
account_id	UUID	FK to account
interaction_id	UUID	FK to interaction
plan_id	UUID	FK to canonical_plan
agent_id	UUID	FK to agent
agent_template_id	UUID	FK to agent_template (nullable)
task_classification	text	Lightweight intent category from the think step
channel_type	text	Channel the interaction came through
success	boolean	Binary outcome (Stage 1: completed without error and no thumbs-down)
composite_score	numeric(4,3)	Weighted outcome score in [0, 1] (Stage 3, nullable until then)
feedback	text	`thumbs_up`, `thumbs_down`, or `none`
total_cost	numeric(10,6)	Total interaction cost in USD
total_duration_ms	integer	Total interaction duration in milliseconds
step_count	integer	Number of steps executed
steps_succeeded	integer	Number of steps that completed successfully
steps_failed	integer	Number of steps that failed
metadata	JSONB	Additional context (tool versions, model used, etc.)
created_at	timestamp	When this record was created

Constraints: UNIQUE(interaction_id) for one outcome record per interaction. Indexed on (account_id, plan_id, created_at) for aggregation queries. Indexed on (account_id, agent_id, task_classification) for context-scoped lookups. RLS policy: account members can read outcomes from their own account; the Learning Engine service role can read across accounts for global prior calculation (with de-identification).

`plan_score`¶

Stores the current Bayesian posterior for each plan in each context. This is the primary table the Score Service reads from.

Field	Type	Description
id	UUID	PK
account_id	UUID	FK to account
plan_id	UUID	FK to canonical_plan
scope_key	text	Context scope identifier (e.g. `agent:{id}:task:{classification}`)
alpha	numeric(10,4)	Beta distribution alpha parameter (successes + prior)
beta	numeric(10,4)	Beta distribution beta parameter (failures + prior)
mean_probability	numeric(4,3)	alpha / (alpha + beta), precomputed for fast reads
confidence	numeric(4,3)	1 minus variance of the Beta distribution, normalised to [0, 1]
sample_size	integer	Number of observations backing this score
mean_cost	numeric(10,6)	Average cost of executions using this plan in this scope
mean_duration_ms	integer	Average duration of executions using this plan in this scope
last_updated_at	timestamp	When the Learning Engine last recalculated this score
created_at	timestamp

Constraints: UNIQUE(account_id, plan_id, scope_key) for one score per plan per context scope per account. Indexed on (account_id, scope_key) for Score Service lookups.

Relationship to Existing Tables¶

The execution learning system reads from but does not modify existing tables: interaction (source of interaction state, duration, and session context), step (source of per-step outcomes, costs, and tool call details), job and subjob (source of deferred work outcomes), and agent and agent_template (agent identity and type for context scoping).

The three new tables (canonical_plan, execution_outcome, plan_score) are append-mostly. execution_outcome is written once per completed interaction; the only later change is a feedback correction when a thumbs up/down arrives after completion. plan_score is updated in place by the Learning Engine. canonical_plan grows as new plan structures are observed.

3.6 Bayesian Scoring Model (Stage 1)¶

Why Bayesian?¶

The system starts with sparse data. A new account might have tens or hundreds of interactions, not millions. Classical ML approaches need large datasets to generalise. A Bayesian approach works from the first observation: it starts with a prior belief, updates it with each outcome, and produces a probability estimate with an explicit measure of confidence.

The Beta-Binomial model is the natural choice for binary outcomes (success/failure). The Beta distribution is the conjugate prior for the Binomial likelihood, which means updates are a simple arithmetic operation, not an optimisation problem.

The Model¶

For each (plan, context scope) pair, maintain a Beta distribution parameterised by alpha and beta:

Prior: Beta(alpha_0, beta_0) where alpha_0 and beta_0 encode the prior belief about success probability.

Update rule: On success, alpha increments by 1. On failure, beta increments by 1.

Posterior mean (probability of success): P = alpha / (alpha + beta).

Confidence: C = 1 - Var(Beta(alpha, beta)) / Var(Beta(1, 1)) where Var(Beta(alpha, beta)) = alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1)) and Var(Beta(1, 1)) = 1/12. For alpha, beta >= 1 this normalises confidence to [0, 1], where 0 is maximum uncertainty (uniform prior) and values approaching 1 indicate near-certainty.
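The update rule, posterior mean, and confidence above can be sketched in a few lines. This is a minimal illustration rather than the platform's implementation; the class name `BetaScore` and its uniform default prior are assumptions made for the example.

```python
from dataclasses import dataclass

# Variance of the uniform prior Beta(1, 1), used to normalise confidence.
UNIFORM_VARIANCE = 1.0 / 12.0


@dataclass
class BetaScore:
    """Hypothetical holder for one (plan, context scope) posterior."""
    alpha: float = 1.0  # prior + observed successes
    beta: float = 1.0   # prior + observed failures

    def update(self, success: bool) -> None:
        # Conjugate update: one increment per observed outcome.
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Posterior mean P = alpha / (alpha + beta).
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        total = self.alpha + self.beta
        return (self.alpha * self.beta) / (total ** 2 * (total + 1.0))

    @property
    def confidence(self) -> float:
        # C = 1 - Var(Beta(alpha, beta)) / Var(Beta(1, 1)).
        return 1.0 - self.variance / UNIFORM_VARIANCE


score = BetaScore()
for outcome in [True] * 8 + [False] * 2:
    score.update(outcome)
print(round(score.mean, 3), round(score.confidence, 3))
```

With eight successes and two failures on top of a uniform prior, the posterior is Beta(9, 3), so the mean is 0.75 and the confidence is roughly 0.83.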

Hierarchical Priors¶

A brand-new plan in a brand-new account has no observations. Rather than starting from a uniform prior (alpha_0 = 1, beta_0 = 1), the system uses hierarchical smoothing to inherit knowledge from broader scopes:

Level 1 (global):     Beta(alpha_global, beta_global)
    All outcomes across all accounts (de-identified).
    Updated daily by the Learning Engine.

Level 2 (account):    Beta(alpha_account, beta_account)
    All outcomes within this account, regardless of agent or task.
    Updated hourly.

Level 3 (agent):      Beta(alpha_agent, beta_agent)
    Outcomes for this specific agent within this account.
    Updated every 15 minutes.

Level 4 (context):    Beta(alpha_context, beta_context)
    Outcomes for this agent on this task classification.
    Updated on every new outcome.

When scoring a plan at Level 4, if the sample size is below a threshold (default: 10), the prior is pulled from Level 3. If Level 3 is also sparse, it pulls from Level 2, and so on. The blending formula:

effective_alpha = alpha_context + weight * alpha_parent
effective_beta  = beta_context  + weight * beta_parent

where weight = max(0, 1 - sample_size / threshold). As the context accumulates its own observations, the parent prior's influence fades to zero.
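As a rough sketch of one blending step, assuming the parent's alpha and beta have already been resolved (the function name and argument order are illustrative only):

```python
from typing import Tuple


def blend_with_parent(
    alpha_context: float,
    beta_context: float,
    sample_size: int,
    alpha_parent: float,
    beta_parent: float,
    threshold: int = 10,
) -> Tuple[float, float]:
    """Blend a sparse context posterior with its parent-level posterior."""
    # The parent's influence fades linearly and reaches zero at the threshold.
    weight = max(0.0, 1.0 - sample_size / threshold)
    effective_alpha = alpha_context + weight * alpha_parent
    effective_beta = beta_context + weight * beta_parent
    return effective_alpha, effective_beta


# Four observations at the context level, so the parent contributes at weight 0.6.
alpha, beta = blend_with_parent(3.0, 1.0, 4, 40.0, 10.0)
print(alpha, beta, round(alpha / (alpha + beta), 3))
```

In this example the effective parameters come out at about (27, 7), giving a blended success probability near 0.79. Once the context has ten or more observations the function returns the context parameters unchanged.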

Scope Keys¶

The scope_key in plan_score encodes the hierarchy level:

Level	Scope Key Pattern	Example
Global	`global`	`global`
Account	`account:{account_id}`	`account:a1b2c3`
Agent	`account:{account_id}:agent:{agent_id}`	`account:a1b2c3:agent:d4e5f6`
Context	`account:{account_id}:agent:{agent_id}:task:{classification}`	`account:a1b2c3:agent:d4e5f6:task:schedule_meeting`

The Learning Engine maintains scores at all four levels. The Score Service reads the most specific level available and blends with parent levels as described above.
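A small helper can generate the four keys for a lookup, ordered from most specific to broadest so a caller can walk up the hierarchy. The function name is hypothetical; the key patterns follow the table above.

```python
from typing import List


def scope_keys(account_id: str, agent_id: str, task_classification: str) -> List[str]:
    """Return scope keys from Level 4 (context) up to Level 1 (global)."""
    account = f"account:{account_id}"
    agent = f"{account}:agent:{agent_id}"
    context = f"{agent}:task:{task_classification}"
    return [context, agent, account, "global"]


for key in scope_keys("a1b2c3", "d4e5f6", "schedule_meeting"):
    print(key)
```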

3.7 Score Service¶

Internal API¶

The Score Service is an internal endpoint within the Gateway service (not a separate microservice). It exposes a single method:

POST /internal/plan-scores

Called by the Harness during the think step when the agent has generated candidate plans.

Request:

{
    "account_id": "uuid",
    "agent_id": "uuid",
    "task_classification": "schedule_meeting",
    "channel_type": "webchat",
    "candidates": [
        {
            "tool_sequence": ["calendar_lookup", "compose_response"],
            "execution_modes": ["immediate", "immediate"]
        },
        {
            "tool_sequence": ["calendar_lookup", "web_search", "compose_response"],
            "execution_modes": ["immediate", "immediate", "immediate"]
        }
    ]
}

Response:

{
    "scores": [
        {
            "plan_hash": "sha256...",
            "probability": 0.82,
            "confidence": 0.65,
            "sample_size": 47,
            "mean_cost_usd": 0.0034,
            "mean_duration_ms": 2100
        },
        {
            "plan_hash": "sha256...",
            "probability": 0.71,
            "confidence": 0.31,
            "sample_size": 12,
            "mean_cost_usd": 0.0051,
            "mean_duration_ms": 3400
        }
    ],
    "source": "bayesian_v1"
}

Performance target: p99 latency under 5 ms. The Score Service is a Convex query over the plan_score table, served from Convex's reactive query cache after the first call. Convex invalidates the cache automatically when the Learning Engine writes new scores; there is no separate cache to manage. Cache warm-up is implicit: the first call loads the row, subsequent calls within the same subscription are free.

Caching via Convex query cache¶

The plan_score rows are read by a Convex query keyed on (accountId, scope_key, planHash). Convex memoises query results for every subscribed caller and re-evaluates only when a relevant write happens. This replaces the Redis cache that existed in the legacy architecture:

Key: the query arguments (accountId, scope_key, planHash) form the cache key.
Value: the plan_score row content returned by the query.
Invalidation: automatic. When the Learning Engine writes a new score, every subscription reading that row re-evaluates on the next microtask.
Cold start: a row that has never been read is fetched once from the database and cached for subsequent reads.

Agent Integration¶

The Harness injects plan scores into the agent's context at the think step. The agent's system prompt includes a section like:

## Historical Plan Performance

Based on past executions of similar tasks, here is the performance data for
approaches you might consider:

| Approach | Success Rate | Confidence | Avg Cost | Avg Time |
|----------|-------------|------------|----------|----------|
| calendar_lookup -> compose | 82% | High (47 obs) | $0.003 | 2.1s |
| calendar_lookup -> web_search -> compose | 71% | Medium (12 obs) | $0.005 | 3.4s |

Use this data to inform your tool selection, but apply your own judgement.
Low-confidence scores are based on limited observations and may not be reliable.

The agent is explicitly told that scores are advisory. The LLM may choose a lower-scoring plan if it has good reason (e.g. the user asked for something the higher-scoring plan cannot do).

3.8 Outcome Collector¶

Event Processing¶

The Outcome Collector is a Convex scheduled function (internalMutation invoked by a cron every 30 seconds). It reads the event table from a persisted cursor, processing each new event exactly once. The cursor is stored in a collector_state row keyed on the collector name.

When an interaction.completed event is read:

Extract the plan. Query the step table for this interaction's act steps. Build the tool sequence and execution modes from the step records.
Canonicalise. Strip instance-specific parameter values, compute the plan hash.
Find or create the canonical plan. Look up canonical_plan by (accountId, planHash). If not found, insert a new record.
Determine success. Check the interaction state (success/failed), look for a feedback event on this interaction (thumbs up/down), and compute the binary outcome.
Write the outcome. Insert into execution_outcome.
Trigger scoring update. Write an internal planning.outcome_recorded event so the Learning Engine schedules an incremental update for the affected (plan, scope) rows.

Interactions without tool calls (a simple conversational response with no act steps) have no plan to record. The Outcome Collector skips these.
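The extraction, canonicalisation, and success steps might look like the sketch below. The step field names (`kind`, `tool`, `execution_mode`, `params`) and the choice to hash a sorted-key JSON encoding are assumptions for illustration; the authoritative step schema is in 04 Data Model, and the only detail taken from this document is that the plan hash is a SHA-256 digest.

```python
import hashlib
import json
from typing import Dict, List, Optional, Tuple


def extract_plan(steps: List[Dict]) -> Optional[Tuple[List[str], List[str]]]:
    """Build (tool_sequence, execution_modes) from act steps, or None if there are none."""
    acts = [s for s in steps if s.get("kind") == "act"]
    if not acts:
        return None  # purely conversational interaction: nothing to record
    # Instance-specific parameter values are deliberately left out.
    return [s["tool"] for s in acts], [s["execution_mode"] for s in acts]


def plan_hash(tool_sequence: List[str], execution_modes: List[str]) -> str:
    """Hash the canonical plan structure (assumed encoding: compact, sorted-key JSON)."""
    canonical = json.dumps(
        {"execution_modes": execution_modes, "tool_sequence": tool_sequence},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_success(interaction_state: str, feedback: str) -> bool:
    """Stage 1 binary outcome: completed without error and no thumbs-down."""
    return interaction_state == "success" and feedback != "thumbs_down"


steps = [
    {"kind": "think"},
    {"kind": "act", "tool": "calendar_lookup", "execution_mode": "immediate",
     "params": {"date": "2026-04-02"}},
    {"kind": "act", "tool": "compose_response", "execution_mode": "immediate",
     "params": {}},
]
plan = extract_plan(steps)
if plan is not None:
    print(plan_hash(*plan), is_success("success", "none"))
```

Because parameters are stripped before hashing, two interactions that call the same tools in the same order map to the same canonical plan regardless of the dates or names involved.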

User feedback (thumbs up/down) may arrive after the interaction has completed. The Outcome Collector reads feedback.recorded events on the same pass and updates the corresponding execution_outcome record. If feedback arrives after the Learning Engine has already processed the outcome, the Learning Engine picks up the correction on its next pass.

3.9 Learning Engine¶

Update Schedule¶

The Learning Engine runs as a periodic background job:

Task	Frequency	Scope
Update Level 4 (context) scores	On every new outcome	Affected plan + context only
Update Level 3 (agent) scores	Every 15 minutes	All plans for agents with new outcomes
Update Level 2 (account) scores	Every hour	All plans in accounts with new outcomes
Update Level 1 (global) scores	Daily	All plans across all accounts (de-identified)

Level 4 updates are triggered by the planning.outcome_recorded event and execute immediately. Higher-level updates are batched for efficiency.

Score Calculation¶

For each (plan, scope) being updated: query execution_outcome for all outcomes matching this plan and scope since the last update, count successes and failures, apply the update rule (alpha increments by successes, beta increments by failures), recompute mean_probability, confidence, mean_cost, and mean_duration, and write the updated plan_score row. Convex invalidates every subscription reading this row automatically; no explicit cache refresh step is needed.
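The recalculation can be expressed as a pure function over the current plan_score row and the batch of new outcomes. This is a sketch under assumptions: rows are plain dictionaries using the column names from the data model, and the means are maintained as running averages weighted by sample_size, which the text does not specify.

```python
from typing import Dict, List


def apply_outcomes(score: Dict, outcomes: List[Dict]) -> Dict:
    """Return an updated plan_score row after folding in new execution outcomes."""
    if not outcomes:
        return dict(score)

    successes = sum(1 for o in outcomes if o["success"])
    failures = len(outcomes) - successes
    alpha = score["alpha"] + successes
    beta = score["beta"] + failures

    # Running averages weighted by how many observations back each side.
    n_old = score["sample_size"]
    n_new = n_old + len(outcomes)
    cost_sum = score["mean_cost"] * n_old + sum(o["total_cost"] for o in outcomes)
    duration_sum = score["mean_duration_ms"] * n_old + sum(
        o["total_duration_ms"] for o in outcomes
    )

    total = alpha + beta
    variance = (alpha * beta) / (total ** 2 * (total + 1.0))
    return {
        **score,
        "alpha": alpha,
        "beta": beta,
        "mean_probability": alpha / total,
        "confidence": 1.0 - 12.0 * variance,  # Var(Beta(1, 1)) = 1/12
        "sample_size": n_new,
        "mean_cost": cost_sum / n_new,
        "mean_duration_ms": round(duration_sum / n_new),
    }


row = {"alpha": 1.0, "beta": 1.0, "sample_size": 0, "mean_cost": 0.0, "mean_duration_ms": 0}
batch = [
    {"success": True, "total_cost": 0.002, "total_duration_ms": 1000},
    {"success": True, "total_cost": 0.004, "total_duration_ms": 2000},
    {"success": False, "total_cost": 0.006, "total_duration_ms": 3000},
]
print(apply_outcomes(row, batch))
```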

Score Decay¶

Plans that have not been executed recently should have their confidence decay over time. The platform evolves, tools change, and a plan that worked six months ago may not work today. The Learning Engine applies a decay factor on each scheduled update:

alpha_decayed = 1 + (alpha - 1) * decay_factor
beta_decayed  = 1 + (beta  - 1) * decay_factor

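where decay_factor = exp(-days_since_last_execution / half_life) and half_life is configurable (default: 90 days). Strictly, with this formula half_life acts as an e-folding time constant rather than a true half-life: the accumulated evidence falls to about 37% (1/e) after 90 days and to 50% after about 62 days. This gradually pulls old scores back towards the uniform prior Beta(1, 1), ensuring stale data does not dominate.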
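The decay step can be sketched directly from the two formulas above. The function names are illustrative; the default of 90 days is the one stated in the text.

```python
import math
from typing import Tuple


def decay_factor(days_since_last_execution: float, half_life: float = 90.0) -> float:
    """Exponential decay factor exactly as written in the formula above."""
    return math.exp(-days_since_last_execution / half_life)


def decay_score(alpha: float, beta: float, days: float,
                half_life: float = 90.0) -> Tuple[float, float]:
    """Shrink accumulated evidence towards Beta(1, 1) without changing the prior itself."""
    factor = decay_factor(days, half_life)
    return 1.0 + (alpha - 1.0) * factor, 1.0 + (beta - 1.0) * factor


alpha, beta = decay_score(41.0, 11.0, days=90.0)
print(round(alpha, 2), round(beta, 2), round(alpha / (alpha + beta), 3))
```

A plan with 40 successes and 10 failures that has not run for 90 days ends up with roughly 14.7 effective successes and 3.7 effective failures. Its mean barely moves (from about 0.79 to about 0.77), but the smaller effective sample lowers its confidence.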

3.10 Machine Learning Layer (Stage 3)¶

Stage 3 introduces a supervised learning model that runs alongside the Bayesian system. It does not replace it; the two systems produce independent scores that are blended.

Activation Thresholds¶

The ML layer should be activated when an account has accumulated at least 10,000 execution outcomes, at least 50 distinct canonical plans have been observed, and the Bayesian system has been running for at least 3 months. These thresholds can be adjusted. The point is that ML needs enough data to generalise beyond what the Bayesian model already captures.

Feature Engineering¶

The ML model uses features derived from the execution context and plan structure:

Feature Group	Features	Source
Plan structure	tool_count, tool_sequence_hash, has_deferred_steps, has_interactive_steps	`canonical_plan`
Tool usage	tool_frequency (per tool), tool co-occurrence pairs	`canonical_plan` + `execution_outcome`
Context	agent_template_id, task_classification, channel_type, hour_of_day, day_of_week	`execution_outcome`
Historical	bayesian_probability, bayesian_confidence, bayesian_sample_size	`plan_score`
Cost/performance	historical_mean_cost, historical_mean_duration, cost_vs_account_average	`plan_score`

The Bayesian scores are included as features for the ML model. This allows the ML model to learn when to trust the Bayesian estimate and when to override it.

Model Choice¶

Gradient-boosted decision trees (XGBoost or LightGBM) are the recommended starting point. They handle mixed feature types, are interpretable via feature importance, train quickly on moderate datasets, and do not require GPU infrastructure. Logistic regression serves as the baseline for comparison.

Training Pipeline¶

Training data consists of all execution_outcome records with their features, labelled by success/failure (Stage 1 binary) or composite_score (Stage 3). Training runs weekly, using all data from the past 6 months (with decay weighting). Validation uses a time-based split (train on older data, validate on recent data) to prevent leakage. Serialised model artefacts are stored in Cloudflare R2, versioned. The trained model is loaded into memory by the Score Service at startup and refreshed when a new version is available.

Score Blending¶

When both Bayesian and ML scores are available, the final score is a weighted blend:

final_score = (1 - ml_weight) * bayesian_score + ml_weight * ml_score

where ml_weight starts at 0 (Bayesian only) and is increased gradually as the ML model proves its accuracy on held-out data. The ML model must demonstrate a statistically significant improvement over the Bayesian baseline before its weight is increased. This is measured by comparing log-loss on a rolling validation window.
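A minimal sketch of the blend and of the log-loss comparison used to justify raising ml_weight. The helper names are assumptions, and the significance test itself is left out.

```python
import math
from typing import Sequence


def blend_scores(bayesian_score: float, ml_score: float, ml_weight: float) -> float:
    """final_score = (1 - ml_weight) * bayesian_score + ml_weight * ml_score."""
    if not 0.0 <= ml_weight <= 1.0:
        raise ValueError("ml_weight must be in [0, 1]")
    return (1.0 - ml_weight) * bayesian_score + ml_weight * ml_score


def log_loss(labels: Sequence[int], probabilities: Sequence[float],
             eps: float = 1e-12) -> float:
    """Mean negative log-likelihood of binary labels; lower is better."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Compare both models on the same held-out window before raising ml_weight.
labels = [1, 0, 1, 1]
bayes_loss = log_loss(labels, [0.7, 0.4, 0.7, 0.7])
ml_loss = log_loss(labels, [0.9, 0.2, 0.8, 0.85])
print(ml_loss < bayes_loss, blend_scores(0.8, 0.6, 0.25))
```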

3.11 Governance and Privacy¶

Data Isolation¶

All execution outcomes and plan scores are scoped by account_id with RLS policies. No account can see another account's execution history or plan scores.

The one exception is the global prior (Level 1), which aggregates across accounts. This aggregation uses only plan structure (tool sequence), binary outcome (success/failure), and cost/duration. It does not include user identifiers, message content, tool parameters, agent names, or any account-identifying information. The aggregation is performed by a service-role query that strips account_id before writing to the global prior table.

Opt-in/Opt-out¶

Accounts can opt out of the global learning pool via an account setting. Opting out means the account's outcomes are not included in global prior calculations. The account still benefits from its own account-level, agent-level, and context-level scores. The account still receives global prior estimates (since those are aggregated from other participating accounts). There is no penalty for opting out beyond losing the ability to contribute to (and slightly improve) the global prior.

Auditability¶

Every plan score served to an agent is logged with the interaction ID that requested the score, the candidates submitted, the scores returned, and the source model (bayesian_v1, ml_v1, blended). This allows retrospective analysis of whether scores influenced agent decisions and whether those decisions were better than unscored alternatives.

Data Retention¶

execution_outcome: retained for 12 months, then archived (compressed, moved to cold storage in R2).
canonical_plan: retained indefinitely (lightweight, grows slowly).
plan_score: retained indefinitely (overwritten in place by the Learning Engine).
Score request logs: retained for 3 months.

Retention periods are configurable per account. Enterprise accounts may require longer retention for compliance.

4. Platform Services & LLM Configuration¶

This section specifies the architecture for external service management, LLM model selection, and how the platform resolves API credentials for every external call.

4.1 Two Billing Modes (Per Service)¶

Every external service the platform uses (LLMs, video, voice, PDF, search, etc.) follows the same pattern:

Platform-managed (default). Thinklio provides the API key. The account's pre-paid credit balance is debited at actual cost + 2% platform margin. The account sees real USD costs in their usage history.
Bring Your Own Key (BYOK). The account provides their own API key (stored in the secrets vault; see 07 Security & Governance). No credits are deducted for that service. If their key fails or runs out of credit with the provider, the service call fails.

This is configured per service, not globally. An account could use Thinklio's LLM credits but bring their own Twilio key, or vice versa.

4.2 Data Model¶

Platform Service Registry¶

platform_service stores the app-admin-managed registry of all external services.

Field	Type	Description
slug	TEXT UNIQUE	e.g. `openrouter`, `twilio`, `tavus`
name	TEXT	Display name
category	TEXT	`llm`, `embeddings`, `video`, `voice_sms`, `pdf`, `search`, `email`, `crm`, `tasks`, `storage`, `other`
description	TEXT	What this service does
website_url	TEXT	Link to provider's site
credential_type	TEXT	`api_key`, `oauth`, `none`
platform_key_ref	TEXT	Vault secret name for Thinklio's own key (NULL if none)
platform_key_available	BOOLEAN	Whether Thinklio has a platform key
is_active	BOOLEAN

Initial services seeded at launch: OpenRouter (LLM, platform key available, default), Anthropic and OpenAI (LLM, no platform key, BYOK only), Voyage AI (embeddings, platform key), Postmark (email, platform key), Tavily (search, platform key), Tavus (video), Twilio (voice/SMS), DocRaptor (PDF), Todoist, HubSpot, Google Calendar, Gmail (integrations), and Cloudflare R2 (storage, platform key).

LLM Model Registry¶

llm_model stores the curated list of available models, managed by app admins.

Field	Type	Description
service_slug	TEXT	Which provider (e.g. `openrouter`)
model_id	TEXT	Provider's model ID (e.g. `anthropic/claude-opus-4-6`)
display_name	TEXT	Human-readable name
provider_name	TEXT	Model maker (Anthropic, OpenAI, Google, Meta)
recommended_tier	TEXT	`deep`, `general`, `mini`, `any`
input_cost_per_million	NUMERIC	USD per million input tokens
output_cost_per_million	NUMERIC	USD per million output tokens
context_window	INTEGER	Max tokens
is_enabled	BOOLEAN
is_default_deep	BOOLEAN	Platform default for deep tier
is_default_general	BOOLEAN	Platform default for general tier
is_default_mini	BOOLEAN	Platform default for mini tier

Three tiers: Deep for complex reasoning, multi-step planning, and delegation (default: Claude Opus 4.6). General for most interactions (default: Claude Sonnet 4.6). Mini for lightweight tasks such as summaries, keyword extraction, and simple classification (default: Claude Haiku 4.5).

Account Service Config¶

account_service_config stores per-account, per-service API key overrides.

Field	Type	Description
account_id	UUID
service_slug	TEXT
credentials_ref	TEXT	Vault reference (NULL = use platform key)
is_active	BOOLEAN
settings	JSONB	Service-specific config

Account LLM Preferences¶

account_llm_preference stores which model each account uses for each tier.

Field	Type	Description
account_id	UUID
tier	TEXT	`deep`, `general`, `mini`
model_id	UUID	FK to llm_model

If no preference is set, platform defaults apply.

Agent LLM Tier¶

Each agent has an llm_tier field (deep, general, mini) defaulting to general. This determines which model tier is used for that agent's interactions.

4.3 Runtime Resolution¶

LLM Call Resolution¶

The platform resolves which model and credential to use for each LLM call in seven steps:

The agent has an llm_tier (deep/general/mini).
Look up account_llm_preference for that tier.
If no preference, use the platform default from the llm_model table.
The model's service_slug identifies which provider (openrouter/anthropic/openai).
Look up account_service_config for that service.
If credentials_ref is set, fetch the key from the vault and use the account's key (no credit deduction).
If NULL, fetch platform_key_ref from the vault and use the platform key, then deduct credits.
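The seven steps can be sketched as a single resolver over in-memory stand-ins for the tables. Everything here is illustrative: the dictionaries replace database lookups, the vault fetch is reduced to returning the reference, and the vault names and row keys in the example data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LlmResolution:
    model_id: str         # provider's model ID
    service_slug: str     # which provider handles the call
    key_ref: str          # vault reference to resolve at call time
    deduct_credits: bool  # True only when the platform key is used


def resolve_llm_call(
    llm_tier: str,
    account_llm_preference: Dict[str, str],  # tier -> llm_model row key
    platform_defaults: Dict[str, str],       # tier -> llm_model row key
    llm_models: Dict[str, Dict],             # row key -> llm_model row
    account_service_config: Dict[str, Dict], # service_slug -> config row
    platform_services: Dict[str, Dict],      # service_slug -> platform_service row
) -> LlmResolution:
    # Steps 1-3: tier -> account preference, else platform default.
    model_key = account_llm_preference.get(llm_tier) or platform_defaults[llm_tier]
    model = llm_models[model_key]
    # Step 4: the model row names its provider.
    slug = model["service_slug"]
    # Steps 5-6: an account-supplied key wins and incurs no credit deduction.
    config: Optional[Dict] = account_service_config.get(slug)
    if config and config.get("credentials_ref"):
        return LlmResolution(model["model_id"], slug, config["credentials_ref"], False)
    # Step 7: fall back to the platform key and deduct credits.
    platform_ref = platform_services[slug].get("platform_key_ref")
    if not platform_ref:
        raise LookupError(f"no credential configured for service '{slug}'")
    return LlmResolution(model["model_id"], slug, platform_ref, True)


models = {"m-opus": {"model_id": "anthropic/claude-opus-4-6", "service_slug": "openrouter"}}
services = {"openrouter": {"platform_key_ref": "vault/openrouter-platform"}}
print(resolve_llm_call("deep", {}, {"deep": "m-opus"}, models, {}, services))
```

The final LookupError covers the case where neither an account key nor a platform key exists, matching the "clean error" behaviour described for credential resolution.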

Non-LLM Service Resolution¶

Look up account_service_config for the service slug.
If credentials_ref is set, fetch from the vault (no credits).
If NULL and platform_key_available is true, fetch the platform key (deduct credits).
If NULL and no platform key, the service is unavailable.
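The non-LLM path is the same decision without the model lookup. A hedged sketch, again using dictionaries in place of table reads and returning the vault reference rather than the secret; the function name and return shape are assumptions.

```python
from typing import Dict, Optional, Tuple


def resolve_service(
    service_slug: str,
    account_service_config: Dict[str, Dict],
    platform_services: Dict[str, Dict],
) -> Optional[Tuple[str, bool]]:
    """Return (vault_reference, deduct_credits), or None if the service is unavailable."""
    config = account_service_config.get(service_slug) or {}
    if config.get("credentials_ref"):
        # Account-supplied key: no credits are deducted.
        return config["credentials_ref"], False
    service = platform_services.get(service_slug) or {}
    if service.get("platform_key_available") and service.get("platform_key_ref"):
        # Platform key: usage is charged to the credit balance.
        return service["platform_key_ref"], True
    return None


# Hypothetical registry rows for illustration only.
registry = {
    "search-demo": {"platform_key_available": True, "platform_key_ref": "vault/search-demo"},
    "video-demo": {"platform_key_available": False, "platform_key_ref": None},
}
print(resolve_service("search-demo", {}, registry))
print(resolve_service("video-demo", {}, registry))
```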

4.4 Vault and Credential Configuration¶

All API keys, whether platform-supplied or account-supplied (BYOK), resolve through the secrets vault described in 07 Security & Governance. The platform_service row carries a platform_key_ref pointing at a vault entry; the account_service_config row carries a credentials_ref for account-supplied keys. Resolution at turn time is: account override first, then platform default, then a clean error if neither is configured.

Bootstrap credentials for the Convex deployment itself (Convex deploy key, Clerk publishable and secret keys, the vault master encryption key) live as Convex environment variables managed via npx convex env set. These are deployment-bootstrap secrets rather than runtime service credentials, and they are invariant for the life of the deployment. Everything else resolves through the vault so rotation is a data-only operation.

4.5 Credential Security¶

Account API keys are never stored in plaintext in any Convex table. All keys live encrypted in the secrets vault; tables reference them by vault name (credentials_ref or platform_key_ref). The Convex governance middleware resolves these references to usable credentials only at the point of outbound call, under caller-scoped authorisation. OAuth tokens follow the same vault pattern with additional refresh-token handling in the oauth_token table. See 07 Security & Governance for the full vault model and MCP credential scoping.

5. Credit-Based Billing¶

5.1 Credit Ledger¶

credit_ledger records every credit movement.

Field	Type	Description
account_id	UUID
type	TEXT	`purchase`, `usage`, `refund`, `adjustment`
amount	NUMERIC(12,6)	Positive = credit added, negative = deducted
balance_after	NUMERIC(12,6)	Running balance
description	TEXT	Human-readable line item
service_slug	TEXT	Which service was used
interaction_id	UUID	For LLM usage tracking

account.credit_balance provides a denormalised current balance for fast reads.

All costs are shown in real USD. No abstract credit units. Users see "$0.003" not "3 credits." The balance is a USD prepaid balance.

5.2 Credit Deduction¶

deducted_amount = actual_api_cost_usd * (1 + platform_margin_percent / 100)

Deduction is atomic: balance check + update + ledger write in a single transaction. Fails if insufficient balance.
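The following sketch shows the arithmetic and the check-then-write ordering using Python's `decimal` module. It is an in-memory stand-in: the lock plays the role of the database transaction, and the class name, the rounding mode, and the six-decimal quantisation (chosen to match NUMERIC(12,6)) are assumptions.

```python
import threading
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, List, Optional

SIX_PLACES = Decimal("0.000001")


class InsufficientCredit(Exception):
    pass


class CreditAccount:
    def __init__(self, balance: str) -> None:
        self.balance = Decimal(balance)
        self.ledger: List[Dict] = []
        self._lock = threading.Lock()  # stand-in for a single database transaction

    def deduct(self, actual_api_cost_usd: str, service_slug: str,
               platform_margin_percent: str = "2.0",
               interaction_id: Optional[str] = None) -> Decimal:
        margin = Decimal(1) + Decimal(platform_margin_percent) / Decimal(100)
        amount = (Decimal(actual_api_cost_usd) * margin).quantize(
            SIX_PLACES, rounding=ROUND_HALF_UP
        )
        with self._lock:
            # Balance check, balance update, and ledger write happen together.
            if amount > self.balance:
                raise InsufficientCredit(f"need {amount}, have {self.balance}")
            self.balance -= amount
            self.ledger.append({
                "type": "usage",
                "amount": -amount,  # negative = deducted
                "balance_after": self.balance,
                "service_slug": service_slug,
                "interaction_id": interaction_id,
            })
        return amount


account = CreditAccount("1.00")
charged = account.deduct("0.003", "openrouter")
print(charged, account.balance)
```

A $0.003 call at the default 2% margin is charged as $0.003060, leaving $0.996940 from a $1.00 balance. Decimal arithmetic avoids the rounding drift that binary floats would introduce into the ledger.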

5.3 Platform Config¶

platform_config is a single-row table for global platform settings.

Field	Type	Description
status	TEXT	`online`, `maintenance`, `offline`
status_reason	TEXT	Shown to users during downtime
estimated_return	TIMESTAMPTZ	When service is expected back
platform_margin_percent	NUMERIC	Default 2.0%

5.4 Kill Switch¶

Every API request (except /health and /v1/admin/platform/config) passes through kill switch middleware. If platform_config.status is not online, return HTTP 503 with:

{
  "error": {
    "code": "platform_unavailable",
    "message": "Scheduled maintenance in progress",
    "status": "maintenance",
    "estimated_return": "2026-03-21T06:00:00Z"
  }
}

Status is cached for 10 seconds to avoid database hits on every request.
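A framework-neutral sketch of the middleware check with the 10-second cache. The class, the injected `load_config` callable, and the fallback message are assumptions; a real deployment would wire this into its HTTP layer.

```python
import time
from typing import Callable, Dict, Optional, Tuple

EXEMPT_PATHS = {"/health", "/v1/admin/platform/config"}


class KillSwitch:
    def __init__(self, load_config: Callable[[], Dict], ttl_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load_config = load_config  # reads the platform_config row
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached: Optional[Dict] = None
        self._expires_at = 0.0

    def _config(self) -> Dict:
        now = self._clock()
        if self._cached is None or now >= self._expires_at:
            # Cache miss or expiry: hit the database once, then reuse for the TTL.
            self._cached = self._load_config()
            self._expires_at = now + self._ttl
        return self._cached

    def check(self, path: str) -> Optional[Tuple[int, Dict]]:
        """Return None to let the request through, or (503, body) to reject it."""
        if path in EXEMPT_PATHS:
            return None
        config = self._config()
        if config["status"] == "online":
            return None
        return 503, {
            "error": {
                "code": "platform_unavailable",
                "message": config.get("status_reason") or "Platform unavailable",
                "status": config["status"],
                "estimated_return": config.get("estimated_return"),
            }
        }


switch = KillSwitch(lambda: {"status": "maintenance",
                             "status_reason": "Scheduled maintenance in progress",
                             "estimated_return": "2026-03-21T06:00:00Z"})
print(switch.check("/health"), switch.check("/v1/accounts/credits"))
```

The trade-off of the cache is that a status change can take up to 10 seconds to reach every request, which the text accepts in exchange for avoiding a database read per request.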

6. Platform Administration¶

6.1 App Admin Role¶

user_profile.is_app_admin is a boolean platform-level flag, independent of account roles. App admins can manage platform-wide configuration, services, models, and (in future) account suspension.

6.2 API Endpoints¶

App Admin (requires `is_app_admin`)¶

GET    /v1/admin/platform/config          Read platform status
PATCH  /v1/admin/platform/config          Update kill switch / margin

GET    /v1/admin/platform/services        List all services
POST   /v1/admin/platform/services        Add a service
PATCH  /v1/admin/platform/services/{slug} Update a service
DELETE /v1/admin/platform/services/{slug} Disable a service

GET    /v1/admin/platform/models          List all LLM models
POST   /v1/admin/platform/models          Add a model
PATCH  /v1/admin/platform/models/{id}     Update model (costs, defaults, enable/disable)

Account Settings (requires account membership)¶

GET  /v1/accounts/services?account_id=... List services with own-key status
POST /v1/accounts/services?account_id=... Set/update account API key for a service

GET  /v1/accounts/models?account_id=...   List account's LLM tier preferences
POST /v1/accounts/models?account_id=...   Set model for a tier

GET  /v1/accounts/credits?account_id=...  Balance + recent ledger entries

Platform Config (any authenticated user)¶

GET /v1/admin/platform/config             Returns status, reason, estimated_return

6.3 UI/UX Changes¶

Settings Page -- New Tabs¶

Models Tab (visible to editor and above). Shows three sections: Deep, General, Mini. Each shows the current model (or "Platform default: Claude Sonnet 4.6"). Dropdown to select from the curated list, filtered by recommended_tier. Each model card shows display name, provider, cost per million tokens, and context window. "Reset to default" button per tier.

Services & Keys Tab (visible to admin and owner). List of all platform services grouped by category. Each shows name, category, and a status indicator (green = using own key, blue = using platform key, grey = unavailable). "Add Key" button opens a form: the pasted API key is stored in the vault and credentials_ref is set. "Remove Key" button clears credentials_ref, falling back to the platform key. OAuth services (Google Calendar, Gmail) show a "Connect" button instead of a key input. Services with neither a platform key nor an account key show as "Not configured."

Subscription Tab (visible to admin and owner). Current balance displayed prominently (e.g. "$24.37 remaining"). "Add Funds" button (links to payment flow, future). Usage breakdown chart: cost by service over time. Recent transactions table from the credit ledger: date, description, service, amount, balance.

App Admin Tab (visible only when is_app_admin = true). Platform Status card with current status, reason field, estimated return picker, and save button. Services management: add/edit/remove services, set platform key references. Models management: add/edit models, set costs, toggle defaults, enable/disable. Accounts list with search/filter and suspend/unsuspend actions (future). Platform Margin setting with current percentage and edit button.

Role Mapping for Tab Visibility¶

Tab	viewer	editor	admin	owner	app_admin
Profile	yes	yes	yes	yes	yes
Account	yes	yes	yes	yes	yes
Models	no	yes	yes	yes	yes
Services & Keys	no	no	yes	yes	yes
Subscription	no	no	yes	yes	yes
App Admin	no	no	no	no	yes

Agent Studio Changes¶

Add an LLM Tier dropdown to the Agent Studio form (Deep / General / Mini) with a tooltip explaining the cost/capability tradeoff. Default: General.

6.4 Bootstrap vs Steady State¶

Phase	API Keys Source	Config Source
Bootstrap (initial deploy)	Env vars (legacy fallback)	Env vars + `platform_config` table
Steady state (target)	Secrets vault only	Vault + `platform_config` + `platform_service`

The transition is gradual and non-breaking. Each service can be migrated independently from environment variable to vault.

7. Implementation Phases¶

Phase 1: Agent Templates and LLM-Native Agents¶

Create the AgentTemplate structure and deploy it to the template registry.
Implement system prompt injection from template configuration.
Deploy Wave 1 agents (Writing, Chat, Coach, Branding, Content) as LLM-native stubs.
Implement the Coach Agent's library system integration for document-grounded personas.
Deploy Knowledge Base, HR, Document, and Data agents once the document ingestion system is available.
Seed initial account knowledge for HR and Onboarding agent testing.

Phase 2: Web Search and Research Agents¶

Integrate a web search provider (Tavily recommended).
Implement web_search, web_read_url, and web_read_multiple tool abstractions.
Deploy Research Agent and Briefing Agent with full web research capability.
Enhance Writing and Content agents with web-grounded factual lookup.

Phase 3: Task and Calendar Integration¶

Implement vendor-agnostic tool abstractions for task management and calendar.
Integrate the first task provider (Todoist or Jira) and calendar provider (Google Calendar).
Deploy Task Agent, Calendar Agent, and the first coordinator (Personal Assistant).
Deploy Meeting Agent, Project Coordinator, Onboarding Agent, and Support Triage Agent.
Implement OAuth consent flow for user-scoped integrations.
Implement coordinator delegation configuration and cycle detection.

Phase 4: Email and CRM Integration¶

Implement email tool abstractions and integrate Gmail API.
Deploy Mail Agent with full email management.
Integrate CRM provider (HubSpot or Salesforce) and deploy Customer Intelligence Agent.
Enhance Monitor Agent with full notification delivery across all channels.
Deploy Finance Agent when financial data integrations are available.

Phase 5: Predictive Planning System¶

Create canonical_plan, execution_outcome, and plan_score tables with RLS policies.
Deploy the Outcome Collector worker subscribing to interaction and step completion events.
Deploy the Learning Engine with Bayesian updates at all four hierarchy levels and score decay.
Deploy the Score Service as a Convex query over plan_score, backed by Convex's reactive query cache.
Integrate score injection into the Harness think step.
Add account settings for opt-in/opt-out of global learning.
Begin Stage 2 tuning and validation once 1,000+ outcomes accumulate.

Phase 6: Platform Services and Credit Billing¶

Create platform_service, llm_model, account_service_config, account_llm_preference, credit_ledger, and platform_config tables.
Seed the platform service registry and LLM model catalogue.
Implement LLM call resolution (seven-step resolver) and non-LLM service resolution.
Implement atomic credit deduction with balance check and ledger write.
Deploy kill switch middleware.
Build Settings UI: Models tab, Services & Keys tab, Subscription tab, App Admin tab.
Add LLM Tier dropdown to Agent Studio.
Migrate platform API keys from environment variables to the secrets vault.
Deploy BYOK support for accounts with their own API keys.

8. Revision History¶

Version	Date	Description
1.0.0	April 2026	Consolidated from old docs 19 (Starter Agent Catalogue v01), 21 (Agent Implementation Logistics v01), 30 (Predictive Planning System v01), and 31 (Platform Services LLM Credits Admin v01).
