Agent Architecture & Extensibility

Overview

This document is the canonical reference for Thinklio's agent layer: what an agent is, how agents are configured and executed, how they compose with one another, how the platform triages input for performance, how attention is surfaced to users, and how the system is extended with new tools, external orchestration, MCP servers, and custom Convex components.

An agent in Thinklio is a persistent, durable entity with an identity, a system prompt, a set of tools, delegation rules, a knowledge context drawn from four layers, channel connections, and a governance envelope. Agents are not sessions or prompts. They persist across conversations, accumulate knowledge, and maintain consistent behaviour according to their configuration. From a technical standpoint, every agent is described by a universal manifest format regardless of whether it ships as a platform built-in, is composed in the Agent Studio, or runs externally behind an HTTP endpoint.

The execution model divides into two branches. Platform-executed agents are driven entirely by an LLM system prompt and the Thinklio Harness (now implemented as Convex Workflow-wrapped actions with the @convex-dev/agent component). Externally-executed agents run behind an HTTP endpoint operated by a developer or third party. From the composition perspective the two are interchangeable: each exposes the same interface and each may be invoked as a delegate by other agents.

Agent composition is built on the agent-as-tool pattern. A coordinator agent delegates to specialist agents through the same tool-calling interface it uses to invoke any other tool. Delegation is governed by depth limits, cycle prevention, per-assignment tool restrictions, and the full policy stack. For work that outlives a single interaction, the job system (now modelled as Convex Workflow steps with waitForEvent) tracks deferred tasks, partial output, observer notifications, timeouts, and cancellation.

To avoid wasting full LLM reasoning cycles on simple inputs, a three-tier smart input triage system classifies every inbound message computationally before scheduling an agent turn. High-confidence read-only lookups and greetings are handled in sub-200ms at Tier 2; everything else flows to the standard Tier 3 agent reasoning path enriched by the parser's extracted intent and entities.
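The tier decision can be sketched as a pure routing function. This is an illustrative sketch only: the greeting list, the read-only intent set, and the parser heuristics are assumptions, not the platform's actual classifier.

```typescript
// Illustrative sketch of three-tier triage routing. The heuristics
// (greeting list, read-only intents) are assumptions for illustration.
type TriageResult =
  | { tier: 2; intent: string }                        // fast path, no agent turn
  | { tier: 3; intent: string; entities: string[] };   // full agent reasoning

const GREETINGS = ["hi", "hello", "hey", "good morning"];
const READ_ONLY_INTENTS = ["greeting", "status_lookup"];

function parseIntent(message: string): { intent: string; entities: string[] } {
  const text = message.trim().toLowerCase();
  if (GREETINGS.some((g) => text === g || text.startsWith(g + " "))) {
    return { intent: "greeting", entities: [] };
  }
  if (text.startsWith("what is the status of ")) {
    return { intent: "status_lookup", entities: [text.slice(22)] };
  }
  return { intent: "unknown", entities: [] };
}

function triage(message: string): TriageResult {
  const { intent, entities } = parseIntent(message);
  // High-confidence read-only intents short-circuit at Tier 2; everything
  // else takes the Tier 3 reasoning path, enriched with the parser output.
  if (READ_ONLY_INTENTS.includes(intent)) {
    return { tier: 2, intent };
  }
  return { tier: 3, intent, entities };
}
```

The key design point is that triage is purely computational: no LLM call happens before the tier decision, which is what keeps the fast path under 200ms.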

On the output side, the attention surfacing system ensures agents reduce rather than increase cognitive load. A briefing snapshot, roster urgency indicators, and decaying notifications work together so the user sees exactly what needs their attention without being overwhelmed. The Radar agent provides on-demand cross-agent inspection as a safety net.

Extensibility is organised into five layers: stateless tools (internal, external REST, or MCP), reasoning agents, external orchestration (n8n, Go services, generic dispatch/callback), custom Convex components, and the platform agent catalogue. The Agent Studio is the composition and configuration surface that sits above all five layers. No custom code is required to create, customise, or compose agents; code only enters the picture when a new tool implementation, a custom Convex component, or a high-throughput Go service is needed.

This document consolidates seven earlier documents (15, 33, 37, 40, 42, 45, 46) into a single reference. For agent governance details see doc 07 Security & Governance. For individual agent specifications see the agent-specs/ directory and doc 08 Agents Catalogue & Platform Services. For the underlying data model see doc 04 Data Model. For knowledge persistence and ingestion see doc 05 Persistence, Storage & Ingestion.


1. Purpose and context

This document is the single reference for the agent layer of the Thinklio platform. It answers:

  • What is an agent, and how is one configured?
  • How are agents executed, whether on-platform or externally?
  • How do agents compose with each other through delegation and the job system?
  • How does the platform triage inbound messages for performance?
  • How does the platform surface agent activity without overwhelming the user?
  • How is the system extended with new tools, external services, MCP servers, and custom components?

For the technical execution harness, Convex component APIs, and step lifecycle details, see doc 02 System Architecture and doc 11 Convex Reference. For individual agent specifications, see the agent-specs/ directory and doc 08 Agents Catalogue. For governance, trust levels, and policy enforcement, see doc 07 Security & Governance. For the knowledge data model and ingestion pipeline, see doc 05 Persistence, Storage & Ingestion.

2. Design principles

Twelve principles govern the agent architecture. Several originate from the extensibility and composition model; the rest from the attention surfacing and triage designs.

Simple agents pay no complexity tax. An agent that answers questions from its knowledge and calls a quick API tool never touches the job system or workflow machinery. The infrastructure exists only when needed.

The agent reasons, the system tracks. The job system manages state, notifications, and lifecycle. Business logic about what to do with results, partial output, and failures belongs in the agent's reasoning, not in the infrastructure.

Observation, not ownership. Any number of agents or system processes can observe a job. The job does not know or care who is watching. This decouples coordination from execution.

Deferred work is still governed. Jobs inherit the governance context (budget, policies, audit trail) of the interaction that created them. Deferred execution is not an escape hatch from governance.

Universal event model. All input, regardless of source (channel message, external webhook, scheduled trigger, dispatch callback, internal system event), is normalised into a single event shape before any agent logic processes it. Agent logic never inspects the transport; it consumes the normalised event.

Performance tiers. Not every input requires the same processing weight. The triage system routes events through three tiers so that quick things never pay the workflow overhead and complex things get the full reasoning path.

Immediate responsiveness. Users must always know the system has received their input and is working on it. Tier 1/2 responses appear within 500ms. Tier 3 responses show a "thinking" indicator immediately. Deferred work gets an immediate acknowledgement followed by the final result.

Graceful degradation. When a multi-step task partially fails, the agent reasons about what it has and what it is missing, then decides whether to deliver partial results or retry. Infrastructure handles transient failures transparently; the agent handles non-transient failures naturally.

Quiet means good. An empty briefing, a roster of green dots, a clear notification panel: these are success states, not failure states. Agents exist to reduce cognitive load, and the surfacing system must avoid creating cognitive load.

Attention is finite. Every signal the platform sends consumes some of the user's attention budget. Signals that turn out to be unimportant waste attention and train users to ignore future signals. The platform must be selective about what it surfaces and honest about urgency.

Agents manage complexity, not users. Users should not need to configure complex rules about what to surface. The agents themselves decide what is relevant and urgent within their domain. The platform aggregates these signals; the user consumes a single, coherent picture.

No duplicate signals. If something appears in the briefing, it does not need a notification unless it is time-sensitive. If it is a roster dot colour, it does not also need a banner.

3. What is an agent

An agent in Thinklio is an AI assistant with a persistent identity, a body of knowledge, a set of capabilities, and a governance envelope. It is not a session or a prompt. It is a durable entity that persists across conversations, accumulates knowledge, and maintains consistent behaviour according to its configuration.

Every agent has:

  • An identity. A name, slug, description, accent colour, avatar, and unique ID. The identity is how the agent presents itself across all channels.
  • A system prompt. The core instruction that shapes the agent's reasoning, tone, domain expertise, and behavioural rules.
  • A tool set. The external capabilities the agent can invoke: data lookups, document creation, calendar operations, API calls, MCP tool calls, and so on. Tools are registered in the tool table and assigned to agents via the agent_tool join table.
  • A delegation set. The other agents it may invoke to delegate specialist tasks, declared in the agent configuration and resolved through the agent-as-tool pattern.
  • A knowledge base. Structured context drawn from four knowledge layers: agent, account, team, and user (see section 11).
  • Channel connections. The communication channels on which the agent is reachable, with per-channel trigger modes (mention-only, proactive, silent).
  • A governance envelope. The policies, trust level, cost limits, and approval rules that bound what the agent can do, enforced by the layered governance hierarchy (account, team, user).

From the platform's perspective, an agent is a record in the agent table linked to an optional agent_catalog entry that carries its default configuration. Platform-executed agents are driven by the Harness (Convex Workflow-wrapped Agent component turns). Externally-executed agents expose an HTTP endpoint that the platform calls.

4. Execution model

4.1 Platform-executed agents

Most agents on Thinklio are platform-executed. Their logic is entirely expressed as a system prompt and a set of tool and delegation assignments. The Harness assembles the agent's context at runtime, calls the LLM with the system prompt and assembled context, executes any tool calls the LLM produces, and delivers the response.

The Harness persists every step (think, act, respond) before and after execution via the Workflow component's step journalling. This makes platform-executed agents automatically durable: if execution fails partway through, it can resume from the last completed step without re-executing earlier steps or incurring duplicate costs.

Platform-executed agents are configured, not coded. All reasoning happens inside the Harness using the LLM. The agent itself is a collection of configuration.

This covers all built-in agents from the platform catalogue, all Studio agents (which are reconfigurations of built-in agents), and any agent that can be fully described by a system prompt plus a set of tool and delegation assignments.

4.2 Externally-executed agents

External agents run outside the Thinklio platform, behind an HTTP endpoint operated by the developer or third party. The platform sends a context bundle (message history, knowledge context, tool set) to the endpoint and receives a structured response. The external agent may be arbitrarily complex internally, using any LLM, framework, or architecture, as long as it conforms to the execution contract defined in section 6.

From the composition perspective, external agents and platform-executed agents are interchangeable. Each exposes the same interface and each may be called as a delegate by other agents.

The distinction between "external tool" and "external agent" is a spectrum rather than a hard line. A CRM lookup tool takes parameters and returns data. An external research agent takes a brief, runs for minutes, calls multiple APIs, and returns a structured report. Both use the same execution contract. The difference is in capability level and what the platform expects from them.

4.3 Steps and durability

Every agent turn proceeds through discrete steps managed by the Harness:

  1. think: the LLM receives the assembled context and produces a reasoning trace and action plan.
  2. act: tool calls and delegation requests are executed.
  3. respond: the agent's output is assembled and delivered to the originating channel.

Steps are persisted before and after execution via the Workflow component's journal. A failed step can be retried. Completed steps are never re-executed. This guarantees that even long-running, multi-step agent interactions are fully resumable without data loss or duplicate side-effects.
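The resume guarantee can be sketched as journal-backed step execution. This is a minimal in-memory, synchronous stand-in for the Workflow component's persisted journal (real steps are asynchronous); the names here are illustrative.

```typescript
// Sketch: journal-backed step execution. An in-memory Map stands in for
// the Workflow component's persisted journal; real steps are async.
type StepName = "think" | "act" | "respond";

interface Journal {
  completed: Map<StepName, unknown>; // step name -> persisted result
}

function runStep<T>(journal: Journal, name: StepName, execute: () => T): T {
  // Completed steps are never re-executed: replay the journalled result.
  if (journal.completed.has(name)) {
    return journal.completed.get(name) as T;
  }
  const result = execute();          // a failed step throws and can be retried
  journal.completed.set(name, result); // persist after successful execution
  return result;
}
```

On resume after a failure, re-running the same sequence of `runStep` calls replays completed steps from the journal and executes only the remaining ones, so no duplicate side-effects or costs are incurred.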

5. Universal agent manifest

Every agent, regardless of execution model, is described by the same definition. This is the portable, declarative format that works whether the agent ships built-in, is created in the Studio, or is installed from an external source.

5.1 Manifest format

# Agent Manifest: thinklio-agent.yaml
version: "1"
kind: agent

# Identity
name: "Emily"
slug: "emily"
description: "Executive assistant specialising in calendar management and email triage"
accent_colour: "#B85A24"
avatar_url: null

# Execution
execution:
  type: platform | external

  # For platform-executed agents:
  system_prompt: |
    You are Emily, an executive assistant...
  system_prompt_ref: "./prompts/emily.md"

  # For externally-executed agents:
  endpoint_url: "https://agents.example.com/emily/execute"
  health_check_url: "https://agents.example.com/emily/health"
  auth_config:
    type: bearer | hmac | api_key
    secret_name: "emily_agent_secret"
  timeout_seconds: 30
  supports_deferred: true

# Capabilities
capability_level: tools_only | workflow | experimental | learning
model_preference:
  provider: "anthropic"
  model: "claude-sonnet-4-6"

# Channels
channels:
  - web
  - email
  - telegram
  - api

# Tools
tools:
  - knowledge_retrieval
  - document_generator
  - web_search
  - calendar_api
  - email_send

# Delegation
delegates:
  - agent_ref: "scout"
    purpose: "Research tasks"
    restrictions:
      max_cost_per_delegation: 5.00
      allowed_tools: ["web_search", "knowledge_retrieval"]
  - agent_ref: "quill"
    purpose: "Document drafting"

# Knowledge
knowledge:
  library_assignments:
    - library_ref: "company-policies"
      priority: 1
    - library_ref: "project-docs"
      priority: 2
  seed_facts:
    - subject: "Preferred meeting length"
      value: "30 minutes unless specified otherwise"
      category: "preference"

# User configuration
configuration_schema:
  - field: "preferred_name"
    label: "What should I call you?"
    type: text
    required: true
  - field: "calendar_provider"
    label: "Calendar service"
    type: select
    options: ["Google Calendar", "Microsoft Outlook", "Apple Calendar"]
    required: true

# Views
views:
  - type: kanban
    label: "Board"
    icon: "Kanban"
    data_source: { entity: task, scope: inherit }
    mapping:
      columns: { field: status, values: [todo, in_progress, done] }
      card_title: title
      card_fields: [priority, due_date, assigned_to]
  - type: calendar
    label: "Calendar"
    icon: "CalendarBlank"
    data_source: { entity: task, scope: inherit }
    mapping: { date_field: due_date, title_field: title, colour_field: priority }

# Governance
governance:
  trust_level: standard
  budget_limit_per_interaction: 2.00
  requires_approval_for:
    - external_email
    - financial_transaction

# Metadata
metadata:
  origin: built_in | custom | installed
  author: "Thinklio"
  version: "1.0.0"
  source_url: null
  tags: ["assistant", "calendar", "email"]
  parent_template: null

5.2 Section reference

Identity is what users see: the name, description, colour, and avatar. Identical regardless of provenance.

Execution is the critical fork. For platform-executed agents, this contains the system prompt (inline or as a file reference). For externally-executed agents, this contains the endpoint URL, authentication, timeout, and whether the agent supports deferred work. The Harness reads this section to decide how to run the agent.

Capabilities tells the Harness what the agent is allowed to do. The capability_level governs whether the agent can use tools, compose with other agents, run experimental features, or access the learning system. Model preference is optional and overrides the account default.

Tools lists which platform tools the agent has access to. For platform-executed agents, these are the tools the LLM can invoke at the act step. For external agents, these are tools the external agent can call back into Thinklio to use via the Platform API.

Delegation defines which other agents this agent can invoke as tools (the agent-as-tool pattern from section 7). Each delegation entry can carry restrictions: cost caps, allowed tools for the delegate, and purpose descriptions that help the LLM decide when to delegate.

Knowledge configures which document libraries the agent draws from and any seed facts to pre-populate. Library references use slugs, resolved at deployment time against the account's available libraries.

Configuration schema defines fields that end users must complete before using the agent (the onboarding flow). This is how an agent becomes personalised: the PA asks for your preferred name and calendar provider; the Coach Agent asks for your domain and goals.

Views declares the agent's structured UI surfaces beyond conversation (see section 18).

Governance sets per-agent trust and spending limits, integrating with the security model (doc 07) and the policy engine.

Metadata tracks provenance. origin distinguishes built-in from custom from installed. parent_template links Studio derivatives back to their base template. version enables the platform to detect when an agent definition has been updated.

5.3 Studio agent mapping

The Agent Studio is a visual editor: when an admin creates an agent, the Studio produces an agent manifest. The mapping:

Studio Field          Manifest Field
Name                  name
Description           description
Accent colour         accent_colour
Base Template         metadata.parent_template
System Prompt         execution.system_prompt
Channels              channels
Tools                 tools
User Configuration    configuration_schema

The Studio currently does not expose delegation, knowledge library assignments, governance overrides, or model preferences. These are available in the manifest format for agents that need them, and the Studio can be extended to expose them later. For now, Studio agents inherit delegation and knowledge configuration from their parent template.

5.4 Built-in agent mapping

Each agent in the platform catalogue should be expressible as a manifest file. The platform ships with these manifests bundled in the codebase at a conventional path (e.g. agents/built-in/aria.yaml, agents/built-in/scout.yaml). On first boot or when a new account is created, the platform reads the manifests and creates agent records from them. The manifests are version-controlled and deployed with the application.

5.5 External agent mapping

An external agent uses the same manifest but with execution.type: external. The key differences:

  • execution.endpoint_url replaces execution.system_prompt. The Harness sends work to this URL instead of calling an LLM.
  • execution.auth_config defines how Thinklio authenticates to the external endpoint.
  • execution.supports_deferred tells the Harness whether it can dispatch long-running work to this agent.
  • tools lists platform tools the external agent is allowed to call back into via the Platform API, reversing the direction: instead of the Harness calling tools on behalf of the agent, the external agent calls Thinklio's tools directly.
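The execution fork can be sketched as a dispatch on execution.type. The field names below follow the manifest format above; the dispatch targets are illustrative placeholders, not the Harness's real internals.

```typescript
// Sketch: the Harness's execution fork, keyed on execution.type.
// Field names follow the manifest; dispatch targets are illustrative.
interface PlatformExecution {
  type: "platform";
  system_prompt: string;
}

interface ExternalExecution {
  type: "external";
  endpoint_url: string;
  timeout_seconds: number;
  supports_deferred: boolean;
}

type Execution = PlatformExecution | ExternalExecution;

function describeExecutionPath(execution: Execution): string {
  switch (execution.type) {
    case "platform":
      // Run the LLM inside the Harness with the configured system prompt.
      return "llm:" + execution.system_prompt.slice(0, 20);
    case "external":
      // POST the context bundle to the developer-operated endpoint.
      return "http:" + execution.endpoint_url;
  }
}
```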

6. External agent execution contract

6.1 The spectrum

External agents sit on a spectrum from simple to autonomous:

Level              Input/Output                  State                  Duration           Callbacks
Simple Tool        Parameters in, result out     Stateless              Seconds            None
Stateful Tool      Parameters in, result out     May maintain state     Seconds            None
Autonomous Agent   Brief in, result eventually   Manages own workflow   Minutes to hours   May call back into Thinklio APIs

The execution contract supports all three. The difference is in which parts of the contract the agent uses.

6.2 Synchronous execution

Identical to the tool execution contract from the developer docs. Thinklio sends a POST request, the agent returns a result.

Request:

{
    "call_id": "tc_01hxyz...",
    "agent_name": "crm-lookup",
    "type": "execute",
    "input": {
        "message": "Look up customer Acme Corp",
        "parameters": {
            "customer_id": "cust_abc123"
        }
    },
    "context": {
        "account_id": "acc_01...",
        "agent_id": "agt_01...",
        "interaction_id": "int_01...",
        "user_id": "usr_01...",
        "channel_type": "webchat"
    }
}

Response:

{
    "result": {
        "content": "Acme Corp is an enterprise customer on the $4,900/month plan.",
        "structured_data": {
            "customer_name": "Acme Corp",
            "plan": "enterprise",
            "mrr": 4900
        }
    }
}

The type: "execute" field distinguishes this from other request types. The input.message field carries the natural language instruction from the calling agent, while input.parameters carries structured data if the agent defines a parameter schema.

6.3 Deferred execution

For agents that need more time, the contract supports a deferred pattern. The external agent accepts the work and returns a job reference. When the work is complete, it calls back to Thinklio with the result.

Deferred response from the external agent:

{
    "deferred": true,
    "job_reference": "ext_job_abc123",
    "estimated_duration_seconds": 300,
    "status_url": "https://agents.example.com/deep-research/jobs/ext_job_abc123"
}

When the external agent returns "deferred": true, the Harness creates a job record and the interaction completes with a "working on it" response to the user.
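Branching on the two response shapes might look like the following sketch, where `createJob` and `deliverResult` are illustrative stand-ins for the real job system and delivery path:

```typescript
// Sketch: branching on an external agent's response. The handler
// callbacks are illustrative stand-ins for the real job system.
interface DeferredResponse {
  deferred: true;
  job_reference: string;
  estimated_duration_seconds?: number;
  status_url?: string;
}

interface ImmediateResponse {
  result: { content: string; structured_data?: unknown };
}

type ExternalAgentResponse = DeferredResponse | ImmediateResponse;

function handleExternalResponse(
  response: ExternalAgentResponse,
  createJob: (jobReference: string) => void,
  deliverResult: (content: string) => void,
): "job_created" | "delivered" {
  if ("deferred" in response) {
    // Record the job; the interaction completes with a
    // "working on it" acknowledgement to the user.
    createJob(response.job_reference);
    return "job_created";
  }
  deliverResult(response.result.content);
  return "delivered";
}
```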

Callback (external agent calls Thinklio when done):

POST https://api.thinklio.com/v1/callbacks/{call_id}
Authorization: Bearer <callback_token>
{
    "job_reference": "ext_job_abc123",
    "status": "resolved",
    "result": {
        "content": "Here is the competitive landscape report...",
        "artefacts": [
            {
                "title": "Battery Storage Competitive Analysis",
                "type": "document",
                "content_url": "https://agents.example.com/results/report_xyz.pdf"
            }
        ]
    }
}

The callback triggers a follow-up interaction where the calling agent processes the result.

Status polling (optional). The Harness can poll status_url for progress updates, feeding them into the Job Progress Card in the UI.

6.4 Platform API callbacks

A sophisticated external agent may need to call back into Thinklio during execution: to retrieve knowledge, check a user's calendar, or delegate to another Thinklio agent. The Platform API serves this purpose. The external agent authenticates using the callback token provided in the execution request context.

Available callback capabilities (subject to the agent's tools list in its manifest):

  • Knowledge retrieval (query the agent's assigned libraries)
  • Tool invocation (call platform tools the agent is authorised to use)
  • Agent delegation (invoke other Thinklio agents, if the agent has delegation permissions)
  • Artefact storage (upload results to the platform's storage)

The callback token is scoped to the current interaction and inherits the governance context (budget limits, trust level, policy constraints) of the calling agent. An external agent cannot exceed the permissions defined in its manifest.

6.5 Request verification and health checks

All requests from Thinklio include an X-Thinklio-Signature header (HMAC-SHA256 of the request body, signed with the agent's secret). External agents should verify this signature before processing requests.

Thinklio sends periodic GET requests to the agent's health_check_url. The agent should return 200 if healthy. Unhealthy agents are circuit-broken: the Harness will not send new work to an unhealthy external agent, and the platform notifies the account admin via the existing webhook notification system.
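Signature verification on the agent side might look like this sketch, using Node's crypto module. The header semantics (HMAC-SHA256 over the raw body, keyed with the agent's secret) follow the text above; the constant-time comparison is standard practice rather than something the contract mandates.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch: verifying the X-Thinklio-Signature header on an inbound request.
// HMAC-SHA256 over the raw request body, keyed with the agent's secret.
function signBody(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

function verifySignature(body: string, signature: string, secret: string): boolean {
  const expected = Buffer.from(signBody(body, secret), "hex");
  const received = Buffer.from(signature, "hex");
  // Constant-time comparison avoids leaking information via timing.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Verification must run against the raw request body, before any JSON parsing or re-serialisation, since re-serialised JSON may not be byte-identical to what was signed.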

7. Agent composition and delegation

7.1 Agent-as-tool pattern

An agent can be registered as a tool available to other agents. From the invoking agent's perspective, delegating to another agent is the same as calling any other tool: it is an act step in the Harness, evaluated by the policy engine, tracked for cost, and recorded in the audit trail.

The tool table supports a tool type of agent alongside internal, external, and mcp:

tool (type: agent)
    slug                "scheduler_agent"
    name                "Scheduler Agent"
    type                agent
    trustLevelRequired  As per the delegate agent's most sensitive capability
    schema              The invocation contract (structured input)
    returnSchema        The result contract (structured output)
    config              { "agent_id": "..." }

When a coordinator's think step decides to delegate to the scheduler, the Harness creates an act step with tool type agent. The tool service resolves the agent ID from the tool config, creates or continues a thread for the delegate agent, and runs a full agent turn. The result flows back to the coordinator's context.

This means the tool invocation interface is uniform across all capability types (internal function, external API, MCP tool, agent delegation), but agents and tools remain distinct entities with different configuration, governance, and lifecycle.
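The uniform dispatch can be sketched as a switch on the tool's type. The executor strings returned here are illustrative placeholders for the real tool service, HTTP client, MCP client, and delegate turn runner.

```typescript
// Sketch: uniform act-step dispatch across the four tool types.
// Returned executor descriptors are illustrative placeholders.
type ToolType = "internal" | "external" | "mcp" | "agent";

interface ToolRecord {
  slug: string;
  type: ToolType;
  config: { endpoint_url?: string; server?: string; agent_id?: string };
}

function resolveExecutor(tool: ToolRecord): string {
  switch (tool.type) {
    case "internal":
      return `internal-fn:${tool.slug}`;
    case "external":
      return `http:${tool.config.endpoint_url}`;
    case "mcp":
      return `mcp:${tool.config.server}/${tool.slug}`;
    case "agent":
      // Agent delegation: resolve the delegate's ID from the tool config
      // and run a full agent turn on the delegate's own thread.
      return `agent-turn:${tool.config.agent_id}`;
  }
}
```

The invoking agent's LLM never sees this branching; from its perspective every capability is one more tool in the definition list.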

7.2 Invocation contract

Every agent-as-tool defines a contract: what structured input it accepts and what structured output it returns. This is defined in the tool's schema and returnSchema, the same as any other tool.

The contract serves two purposes:

  1. For the invoking agent. The LLM knows what parameters to provide and what to expect back. The parameter schema appears in the tool definition within the invoking agent's context, just like any other tool.

  2. For the delegate agent. The structured input becomes part of the delegate's context. The delegate's system prompt and knowledge handle the reasoning. The structured output is what the delegate's Harness produces as the act step's result for the invoking agent.

Example contract for a scheduler agent:

{
    "schema": {
        "type": "object",
        "properties": {
            "action": {
                "enum": ["find_free_time", "create_event", "check_conflicts"]
            },
            "time_range": {
                "type": "object",
                "properties": {
                    "start": { "type": "string", "format": "date-time" },
                    "end": { "type": "string", "format": "date-time" }
                }
            },
            "context": { "type": "string" }
        },
        "required": ["action"]
    },
    "returnSchema": {
        "type": "object",
        "properties": {
            "status": { "enum": ["success", "partial", "failed"] },
            "slots": { "type": "array" },
            "event_created": { "type": "object" },
            "message": { "type": "string" }
        }
    }
}

7.3 Step execution modes

The durable execution Harness supports three explicit execution modes for act steps. The step state machine (created, running, success/failed) remains unchanged. What changes is how the Harness orchestrates what happens after a step completes.

Immediate mode. The default. The step executes synchronously within the interaction. The Harness waits for the result before proceeding to the next step. Examples: calendar lookup, web search, quick API call, knowledge retrieval. The entire interaction completes within seconds.

Deferred mode. The step dispatches work to an external execution engine and completes immediately with a job reference. The actual work happens outside the Harness. When the work finishes, a callback triggers a new follow-up interaction where the agent evaluates the result. Examples: long-running n8n workflows, human handoffs, external system integrations with unpredictable response times.

Interactive mode. The step returns a result, and the Harness feeds it back into a new think step where the agent decides what to do next. Multiple think/act/observe cycles within a single interaction, with the agent making routing decisions at each think step.

Mixed mode. A single interaction can use different modes for different steps. A PA might perform an immediate calendar check, dispatch a research job (deferred), and respond to the user acknowledging both. The Harness tracks mode per step. Immediate and interactive steps execute within the current interaction. Deferred steps create jobs and complete immediately. The interaction succeeds when all its steps have succeeded (including deferred steps that succeed by dispatching, not by completing the work).
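The mixed-mode completion rule can be sketched as a per-step success criterion. The `dispatched` flag is an illustrative assumption for how a deferred step records its hand-off; the real step state machine is described in doc 02.

```typescript
// Sketch: when a mixed-mode interaction counts as succeeded. The
// `dispatched` flag is an illustrative stand-in for the real step record.
interface Step {
  mode: "immediate" | "deferred" | "interactive";
  status: "created" | "running" | "success" | "failed";
  dispatched?: boolean; // deferred steps: job handed off successfully
}

function stepSucceeded(s: Step): boolean {
  // Deferred steps succeed by dispatching, not by completing the work;
  // immediate and interactive steps succeed when their work finishes.
  return s.mode === "deferred" ? s.dispatched === true : s.status === "success";
}

function interactionSucceeded(steps: Step[]): boolean {
  return steps.every(stepSucceeded);
}
```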

When an agent invokes another agent as a tool, the delegation can also use any of these modes:

  • Immediate delegation. The delegate executes its full Harness cycle synchronously. Suitable when the delegate's work is quick.
  • Deferred delegation. The delegate dispatches deferred work. The invoking agent's act step creates a job. Results arrive later through a follow-up interaction.
  • Interactive delegation. The invoking agent delegates, evaluates the result, and makes further delegation decisions within the same interaction.

7.4 Delegation context and knowledge isolation

When an agent delegates to another agent, the delegate receives:

  1. Its own context layers. The delegate's knowledge (agent, account, team, user) is assembled as normal, scoped to the assignment context.
  2. The invocation payload. The structured input from the invoking agent, conforming to the parameter schema. This is the delegate's "brief".
  3. Relevant conversation context (optional). The invoking agent may include a summary of relevant conversation context in the invocation payload's context field. The invoking agent's job is to distil the request, not to forward the entire conversation history.

The delegate does not receive the invoking agent's full context, knowledge layers, or conversation history. This preserves knowledge isolation between agents and prevents context window bloat in the delegate's LLM call.

When a delegate agent extracts knowledge during its interaction, the knowledge is scoped to the delegate's assignment context, following the standard knowledge layer rules. The invoking agent does not automatically receive knowledge extracted by the delegate. Cross-agent knowledge sharing happens through the normal interaction flow, not through a backdoor.
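The isolation rule can be sketched by making the delegate's context a structure with no slot for the invoker's history, so exclusion is structural rather than a filtering step. Type names here are illustrative.

```typescript
// Sketch: assembling the delegate's context. The invoking agent's
// conversation history and knowledge layers are structurally absent.
interface InvocationPayload {
  action: string;
  context?: string; // distilled summary written by the invoking agent
  [key: string]: unknown;
}

interface DelegateContext {
  knowledgeLayers: string[]; // the delegate's own agent/account/team/user layers
  brief: InvocationPayload;  // the structured invocation payload
}

function buildDelegateContext(
  delegateLayers: string[],
  payload: InvocationPayload,
): DelegateContext {
  return {
    knowledgeLayers: delegateLayers,
    brief: payload,
    // No field exists for the invoker's history, so isolation cannot
    // be bypassed by forgetting to filter.
  };
}
```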

7.5 Depth limits and cycle prevention

Delegation depth limit. A governance setting (max_delegation_depth) on the account policy limits how many levels of agent-to-agent delegation are permitted. Default: 3. When a delegation would exceed this depth, the policy engine denies the act step with reason delegation_depth_exceeded.

Cycle detection operates at two levels:

  1. Configuration time. When an agent-as-tool is registered in Agent Studio, the system checks the delegation graph for cycles. If adding the tool would create a cycle, the configuration is rejected with an explanation.
  2. Runtime. Each delegation carries a delegation chain (list of agent IDs in the current call stack). If a delegation would invoke an agent already in the chain, the policy engine denies it with reason delegation_cycle_detected.

7.6 Per-assignment tool restrictions

When an agent is assigned to a context (user, team, or account), the assignment can restrict the agent's tool capabilities below its configured maximum. This applies to both regular tools and agent-as-tool delegations.

The agent_assignment table carries a toolRestrictions field containing overrides that narrow (never widen) the agent's configured tool permissions:

{
    "scheduler_agent": {
        "allowed_actions": ["find_free_time", "check_conflicts"],
        "denied_actions": ["create_event"]
    },
    "search_web": {
        "max_calls_per_interaction": 3
    },
    "send_email": {
        "blocked": true
    }
}

The policy engine evaluates tool access in order:

  1. Agent configuration. Does the agent have this tool assigned? (Maximum capability.)
  2. Assignment restrictions. Does the assignment narrow the tool's permissions for this context?
  3. Account policies. Do account-level policies impose further restrictions?

Each layer can only restrict, never expand. An assignment cannot grant tool access that the agent does not already have. An account policy cannot grant permissions that the assignment has removed.

For agent-as-tool delegations, assignment restrictions can limit what actions the delegate agent is asked to perform. The scheduler agent might be configured with full calendar capabilities, but when used as a delegate from a particular team's PA, the assignment restriction limits it to read-only operations.
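A minimal sketch of the restrict-only evaluation, assuming a simple in-memory representation. The restriction field names mirror the toolRestrictions example above; the function itself is hypothetical, and the account-policy layer (which would narrow further in the same way) is elided.

```typescript
interface ToolRestriction {
  blocked?: boolean;
  allowed_actions?: string[];
  denied_actions?: string[];
}

function isActionPermitted(
  agentTools: Record<string, string[]>,        // tool -> actions in agent config
  assignment: Record<string, ToolRestriction>, // per-assignment overrides
  tool: string,
  action: string
): boolean {
  // Layer 1: agent configuration is the maximum capability.
  const configured = agentTools[tool];
  if (!configured || !configured.includes(action)) return false;

  // Layer 2: assignment restrictions can only narrow, never widen.
  const r = assignment[tool];
  if (r) {
    if (r.blocked) return false;
    if (r.allowed_actions && !r.allowed_actions.includes(action)) return false;
    if (r.denied_actions && r.denied_actions.includes(action)) return false;
  }

  // Layer 3 (account policies) would apply further narrowing here.
  return true;
}
```

Because each layer only subtracts, the order of layers 2 and 3 does not change the result; the ordering matters only for reporting which layer denied the action.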

7.7 Coordinator pattern

A coordinator agent is a platform-executed agent whose primary purpose is orchestration rather than domain work. It receives requests, decomposes them into tasks, delegates to appropriate specialists, synthesises results, and delivers a unified response.

Coordinators have large context windows and are configured with a broad delegation set. They typically have minimal domain knowledge of their own. Their value is in understanding how to route work and synthesise results. The Personal Assistant agent is the canonical example.

Agent Studio provides a visual interface for composing agents. Admins select a base agent (typically a coordinator), configure its delegation set from available agents, write or adapt the system prompt, assign channels, and set the governance envelope. The result is a new agent configuration that can be deployed immediately.

Templates for composed agents (e.g. "Personal Assistant with Scheduler and Research") pre-configure delegation relationships and tool assignments, allowing non-technical users to build useful composite agents without understanding the underlying architecture.

7.8 Delegation audit trail

Every delegation creates a child interaction linked to the parent:

interaction (delegation)
    id                      Convex _id
    parentInteractionId     the invoking agent's interaction ID
    agentId                 the delegate agent
    delegationDepth         current depth in the delegation chain
    ...other standard interaction fields

The audit trail shows the full chain: user message, coordinator interaction, delegate interaction, and the result flowing back. Cost aggregation follows the chain: the delegate's interaction cost becomes the act step's cost in the invoking interaction, which rolls up to the user/team/account attribution.

8. Job system

8.1 Purpose

The job system manages units of work that outlive a single interaction. A job is created when a deferred act step dispatches work, and it persists until the work completes, fails, times out, or is cancelled. Jobs are the coordination mechanism between the dispatching interaction and the follow-up interaction that processes results.

In the Convex-first architecture, the job lifecycle is modelled as Workflow steps with waitForEvent for the pause/resume cycle. The conceptual model (job entity, subjobs, observers, state machine) is preserved but implemented natively in Convex rather than in Redis and PostgreSQL.

8.2 Job and subjob entities

job
    id                  Convex _id
    type                string          research, handoff, workflow, delegation, etc.
    createdByAgent      reference       agent that created this job
    createdByInteraction reference      interaction that spawned this job
    sessionId           string          conversational context for follow-up
    state               enum            pending, dispatched, in_progress,
                                        resolved, failed, cancelled, timed_out
    hasUsefulOutput     boolean         whether partial output worth notifying about
    dispatchTarget      object          webhook URL, agent ID, external system config
    dispatchPayload     object          structured brief/parameters for the executor
    contextBundle       object          accumulated state from prior steps for
                                        follow-up interactions
    usefulnessRule      string          rule identifier (default: any_completed_subjob)
    timeoutAt           number          deadline for the timeout monitor
    createdAt           number
    updatedAt           number
subjob
    id                  Convex _id
    jobId               reference       parent job
    label               string          human-readable description
    order               number          sequence within the job
    state               enum            pending, running, completed, failed
    resultData          object          output from this subjob
    errorData           object          failure details (if failed)
    startedAt           number
    completedAt         number

A simple deferred job (dispatch one workflow, get one result) has a single subjob. A complex job (generate five articles) has five subjobs. The subjob structure allows the system to track granular progress and enables partial output notification.

The contextBundle carries forward state that follow-up interactions need. When a PA checks the calendar (immediate) and then dispatches research (deferred), the calendar result is stored in the job's context bundle. The follow-up interaction triggered by the research callback has access to both the research output (from the job's subjobs) and the calendar result (from the context bundle).

8.3 Job state machine

pending -> dispatched -> in_progress -> resolved
                                     -> failed
                                     -> cancelled
                                     -> timed_out

pending: Job record created, not yet dispatched.

dispatched: Work sent to the execution engine, awaiting acknowledgement.

in_progress: Execution engine has acknowledged. Subjobs are running. While in this state, the job may set hasUsefulOutput = true when at least one subjob has completed with usable output.

resolved: All subjobs have reached a terminal state and at least one subjob succeeded. The observing agent evaluates the subjob results to determine the meaning: full success, acceptable partial result, or effective failure despite technical completion.

failed: All subjobs have reached a terminal state and none succeeded.

cancelled: Explicitly cancelled by an agent or user. Cancellation is best-effort; the execution engine may or may not be able to halt in-flight work.

timed_out: The timeout deadline was reached with subjobs still in non-terminal states. The timeout monitor transitions the job and notifies all observers.
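The state machine can be captured as an explicit transition table. The forward path is taken from the diagram; allowing cancellation and timeout from any non-terminal state follows the prose in sections 8.3, 8.6, and 8.7. The guard function itself is illustrative, not the actual implementation.

```typescript
type JobState =
  | "pending" | "dispatched" | "in_progress"
  | "resolved" | "failed" | "cancelled" | "timed_out";

// Terminal states have no outgoing transitions; cancelling a terminal
// job is a no-op, as described above.
const TRANSITIONS: Record<JobState, JobState[]> = {
  pending:     ["dispatched", "cancelled", "timed_out"],
  dispatched:  ["in_progress", "cancelled", "timed_out"],
  in_progress: ["resolved", "failed", "cancelled", "timed_out"],
  resolved:    [],
  failed:      [],
  cancelled:   [],
  timed_out:   [],
};

function canTransition(from: JobState, to: JobState): boolean {
  return TRANSITIONS[from].includes(to);
}
```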

8.4 Partial output and usefulness

When a subjob completes, the job evaluates whether useful output is now available. The default rule (any_completed_subjob) is:

  • At least one subjob has reached completed state with non-null resultData.
  • The job is still in_progress (other subjobs are still pending or running).

When these conditions are met, hasUsefulOutput is set to true and qualifying observers are notified.

Custom usefulness rules can be defined for more complex job types:

  • Minimum count. At least N subjobs must complete before output is considered useful.
  • Ordered dependency. Subjobs 1 to 3 must complete before the output is useful (a pipeline where later stages depend on earlier ones).
  • Specific required. Particular subjobs (identified by label or order) must be among the completed set.
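The default any_completed_subjob rule is small enough to sketch directly. The types here are illustrative rather than the actual Convex schema.

```typescript
type SubjobState = "pending" | "running" | "completed" | "failed";

interface Subjob {
  state: SubjobState;
  resultData: object | null;
}

// Default rule: at least one subjob completed with non-null output
// while the job itself is still in_progress.
function hasUsefulOutput(jobState: string, subjobs: Subjob[]): boolean {
  if (jobState !== "in_progress") return false;
  return subjobs.some(s => s.state === "completed" && s.resultData !== null);
}
```

Custom rules (minimum count, ordered dependency, specific required) would replace the `some` predicate while keeping the same signature.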

8.5 Observer model

The observer model decouples job creation from job consumption. Multiple agents, monitoring systems, or administrative processes can register interest in a job and receive notifications when its state changes.

job_observer
    jobId               reference       parent job
    observerType        enum            agent, system
    observerId          string          agent ID or system process identifier
    assignmentId        string          which assignment context governs permissions
    notifyOn            enum            completion_only, failure_only,
                                        partial_and_completion, all_changes
    callbackMetadata    object          context the observer needs when notified
    registeredAt        number

The creating agent is automatically registered as an observer. Additional observers can be added by any agent with visibility into the job (governed by assignment context and account policies).

Notifications pass through a three-layer filtering chain:

  1. Job-level filter. The job evaluates whether a state change is worth broadcasting. Only meaningful state changes propagate. Terminal states always propagate.
  2. Observer-level filter. Each observer's notifyOn preference determines which notifications reach it. Terminal state notifications bypass observer preferences; all observers are always notified of terminal states.
  3. Agent-level reasoning. The agent receives the notification, queries the current job state and subjob results, and decides whether to act.
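The observer-level filter (layer 2) might look like the following sketch. The terminal-state bypass is from the text; the partial_output event name is an assumption used for illustration.

```typescript
type NotifyOn =
  | "completion_only" | "failure_only"
  | "partial_and_completion" | "all_changes";

const TERMINAL = ["resolved", "failed", "cancelled", "timed_out"];

function shouldNotify(notifyOn: NotifyOn, event: string): boolean {
  // Terminal states bypass observer preferences: everyone is notified.
  if (TERMINAL.includes(event)) return true;
  switch (notifyOn) {
    case "all_changes":
      return true;
    case "partial_and_completion":
      return event === "partial_output"; // assumed event name
    default:
      // completion_only / failure_only wait for a terminal state.
      return false;
  }
}
```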

8.6 Timeout monitoring

A scheduled Convex function monitors active jobs:

  1. Scan for jobs where timeoutAt < now() and state is non-terminal.
  2. Transition matching jobs to timed_out.
  3. Transition any running subjobs within timed-out jobs to failed with error reason parent_job_timed_out.
  4. Notify all observers.

Default timeout: 30 minutes. Maximum configurable timeout: 24 hours. For human handoff jobs, timeouts should be set generously.
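The monitor's scan logic is written here as a pure function over in-memory records for clarity; in the real system this would be a scheduled Convex function querying the job and subjob tables. Types and names are illustrative.

```typescript
interface Job { id: string; state: string; timeoutAt: number }
interface Subjob { jobId: string; state: string; errorReason?: string }

const TERMINAL = new Set(["resolved", "failed", "cancelled", "timed_out"]);

// Returns the IDs of jobs that were timed out in this sweep.
function sweepTimeouts(now: number, jobs: Job[], subjobs: Subjob[]): string[] {
  const timedOut: string[] = [];
  for (const job of jobs) {
    // Step 1: deadline passed and state is non-terminal.
    if (job.timeoutAt < now && !TERMINAL.has(job.state)) {
      job.state = "timed_out"; // step 2
      timedOut.push(job.id);
      // Step 3: fail any non-terminal subjobs of the timed-out job.
      for (const s of subjobs) {
        if (s.jobId === job.id && (s.state === "pending" || s.state === "running")) {
          s.state = "failed";
          s.errorReason = "parent_job_timed_out";
        }
      }
      // Step 4: observer notification would be dispatched here.
    }
  }
  return timedOut;
}
```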

8.7 Cancellation

Cancellation is a first-class operation.

User-initiated. The user tells the agent to cancel a pending job. The agent recognises the cancellation intent and calls a cancel action.

Agent-initiated. The agent decides to cancel a job based on new information (user changed their mind, budget is exhausted, a prerequisite failed).

System-initiated. The timeout monitor cancels timed-out jobs. An admin pauses an agent, which cancels its pending jobs.

Cancellation flow:

  1. Job state transitions to cancelled.
  2. A cancel signal is sent to the execution engine (best-effort).
  3. All observers are notified.
  4. Running subjobs are transitioned to failed with error reason job_cancelled.

Cancellation of a job that has already reached a terminal state is a no-op.

9. Smart input triage

9.1 Three-tier model

Every user message currently follows the same path: context assembly, LLM call, tool execution, LLM call, response. This takes 2.5 to 8 seconds regardless of complexity. Most messages do not need a full LLM reasoning cycle. The triage system classifies every inbound message and routes it to the appropriate processing tier.

User Input
    |
+-------------------------------------+
| Tier 1: Computational Parser (<10ms)|
| Intent, entities, confidence score  |
+-----+-------------------------------+
      |
      +-- HIGH confidence + known pattern -> Tier 2 (direct execution)
      +-- MEDIUM confidence -> Tier 2 (small LLM confirms, then execute)
      +-- LOW confidence -> Tier 3 (full LLM reasoning)
      |
+-----+-------------------------------+
| Tier 2: Fast Execution (<500ms)     |
| Direct tool/knowledge, template,    |
| or small local LLM                  |
+-----+-------------------------------+
      |
      +-- Resolved -> respond directly
      +-- Needs reasoning -> Tier 3
      |
+-----+-------------------------------+
| Tier 3: Full LLM Reasoning (2-8s)   |
| Claude via OpenRouter, multi-step   |
| tool use, complex composition       |
+-------------------------------------+

9.2 Tier 1: computational parser

The parser runs inline as a Convex mutation alongside the message write. It classifies intent, extracts entities, and computes a confidence score without any LLM call.

Intent categories:

Intent              Pattern                               Example                                  Action
knowledge_lookup    Question + known entity/fact type     "What's Ella's phone number?"            Knowledge cache lookup
knowledge_store     Imperative + entity + value           "Remember Dave's birthday is March 5"    Route to Tier 3 (write operation)
greeting            Standard greeting patterns            "Hello", "Hi Atlas"                      Template response
meta_question       Questions about the agent/platform    "What can you do?"                       Template or search_tools
tool_request        Clear tool invocation                 "Search the web for..."                  Route to Tier 3 unless read-only
time_query          Time/date questions                   "What time is it?"                       Direct current_time tool
complex_reasoning   Multi-step, open-ended, creative      "Write a report on..."                   Full LLM
conversation        Continuing a thread, follow-up        "Tell me more"                           Full LLM with history

Signal extraction uses three methods:

  • Lexicon-based (O(1) lookups). Knowledge verbs ("what is", "tell me", "show me"), store verbs ("remember", "note", "save"), tool verbs ("search", "create", "send"), greeting patterns, meta patterns, time patterns.
  • Regex-based. Entity patterns (possessive names, @mentions), value patterns (phone numbers, emails, dates, amounts), question patterns (who/what/where/when/how + subject).
  • Context-based. Known entities from the knowledge cache (names, categories), known agent names and capabilities, session history (is this a follow-up?).

Scoring algorithm. Each intent category starts with a base score. Signals accumulate additively. The base score for complex_reasoning is 0.2, making it the default when no strong signals are present. Agent-specific signal weights are cached per agent and loaded from the agent config before context assembly begins (not during context assembly, which would create a circular dependency).
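The additive accumulation described above can be sketched as follows. Apart from the 0.2 base for complex_reasoning, which is from the text, the numbers and names are invented for illustration.

```typescript
type Intent = string;

// Each intent starts at its base score; matched signals add their
// (possibly agent-specific) weights on top.
function scoreIntents(
  baseScores: Record<Intent, number>,
  signals: Array<{ intent: Intent; weight: number }>
): Record<Intent, number> {
  const scores = { ...baseScores };
  for (const s of signals) {
    scores[s.intent] = (scores[s.intent] ?? 0) + s.weight;
  }
  return scores;
}
```

With no strong signals, every intent keeps its base score and complex_reasoning's 0.2 wins by default, which is exactly the fallback behaviour the text describes.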

9.3 Tier 2: fast execution

Based on the parser result, execute without a full LLM call:

Knowledge lookup (high confidence). Parser identifies intent, entity, and fact type. Direct query against knowledge cache. Response generated from template. Target: ~50ms.

Greeting (high confidence). Template response. Target: ~10ms.

Time/date query. Direct tool call. Target: ~50ms.

Medium confidence (0.4 to 0.7). Tier 2 is subdivided:

  • Tier 2a (deterministic). Knowledge lookup, greeting, time query. No LLM involvement.
  • Tier 2b (small LLM for phrasing, not reasoning). Deferred until traffic data justifies it. If implemented, the small LLM generates natural-language response text from structured results, not decisions.

9.4 Tier 3: full LLM reasoning

The current agent turn path, unchanged. Used when confidence is low (below 0.4), when the intent is complex_reasoning or conversation, or when Tier 2 execution fails.

Even for messages that route to Tier 3, the Tier 1 parse result adds value. The extracted intent and entities seed context assembly: the Harness already knows "this is likely a knowledge lookup for a contact named Dave," which changes which knowledge gets pre-loaded and which tools get surfaced.

9.5 Safety override

Hard rule. Tier 2 direct execution is restricted to read-only operations. Any intent that would invoke a tool with trustLevelRequired of standard or higher must route to Tier 3 regardless of confidence. This is enforced in the router, not the caller. The tool table's existing trustLevelRequired field is the authority for this classification.

Safe for Tier 2: knowledge_lookup, greeting, meta_question, time_query.

Always Tier 3: tool_request (unless the tool is read-only), knowledge_store, complex_reasoning, conversation.
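The override can be expressed as a single router-side predicate. The trust ladder used here (read_only below standard below elevated) is an assumption; the text only specifies that trustLevelRequired of standard or higher forces Tier 3.

```typescript
const TIER2_SAFE_INTENTS = new Set([
  "knowledge_lookup", "greeting", "meta_question", "time_query",
]);

// Assumed ordering of trust levels; only "standard or higher" is
// specified by the design.
const TRUST_ORDER = ["read_only", "standard", "elevated"];

function mustRouteToTier3(intent: string, trustLevelRequired?: string): boolean {
  // Intents outside the safe set always go to Tier 3.
  if (!TIER2_SAFE_INTENTS.has(intent)) return true;
  // Any tool needing standard trust or higher is never run in Tier 2,
  // regardless of confidence.
  if (trustLevelRequired !== undefined) {
    return TRUST_ORDER.indexOf(trustLevelRequired) >= TRUST_ORDER.indexOf("standard");
  }
  return false;
}
```

Enforcing this in the router (rather than trusting the caller) means a misclassified or malicious client parse can never trigger a write from Tier 2.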

9.6 Confidence metric

Replace a single confidence score with two values:

  • strength: the absolute score of the winning intent (0.0 to 1.0, normalised).
  • ambiguity: the gap between the top two scores (0.0 to 1.0; high means unambiguous).

Both must pass thresholds for Tier 2 execution:

  • strength >= 0.7 AND ambiguity >= 0.3 results in Tier 2.
  • Otherwise results in Tier 3.

All scores are normalised to sum to 1.0 before computing these values. This distinguishes two failure modes: a weak signal (low strength) versus genuine ambiguity between intents (low ambiguity), which call for different handling.
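A sketch of the routing decision using the thresholds above; the function name is illustrative.

```typescript
function routeTier(scores: Record<string, number>): "tier2" | "tier3" {
  const total = Object.values(scores).reduce((a, b) => a + b, 0);
  // Normalise so scores sum to 1.0, then rank descending.
  const sorted = Object.values(scores)
    .map(v => (total > 0 ? v / total : 0))
    .sort((a, b) => b - a);
  const strength = sorted[0] ?? 0;                 // winning intent's score
  const ambiguity = strength - (sorted[1] ?? 0);   // gap to the runner-up
  return strength >= 0.7 && ambiguity >= 0.3 ? "tier2" : "tier3";
}
```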

9.7 Client-side parser architecture

The parser runs in the client for instant feedback and sends the classification alongside the message to the server.

Shared lexicon format. A single JSON file per locale, loaded at app initialisation. English only for v1. Small payload (~5KB), loaded once per session.

Platform implementations:

Platform        Language     Role
Web app         TypeScript   Client-side classification + UI hints
Convex server   TypeScript   Authoritative: re-verifies, executes
Flutter         Dart         Client-side (future)

All implementations load the same lexicon and share test cases. The server is authoritative; client classification can be overridden.

Client-server protocol. The web app sends the parse result alongside the message:

{
  "agent_id": "...",
  "content": "What's Ella's phone number?",
  "triage": {
    "intent": "knowledge_lookup",
    "strength": 0.85,
    "ambiguity": 0.6,
    "entities": {
      "subject": "Ella",
      "fact_type": "phone number"
    }
  }
}

The server can trust the client classification (for read-only, high confidence), re-parse and override (for writes or medium confidence), or ignore triage entirely (for channels that do not send it, e.g. Telegram).

Multilingual support. v1 ships with English lexicon only. Non-English input produces low strength scores, which routes it to Tier 3 reliably. This is a deliberate design boundary. Per-language lexicons are a clean extension of the architecture, loading the appropriate lexicon based on the user's locale setting.

Fallthrough on failure. Explicit contract: Tier 2 failure of any kind (cache down, entity not found, template missing, tool error) falls through to Tier 3 silently, with no error surfaced to the user. Tier 2 is an optimisation; Tier 3 is the guaranteed path.

Audit trail for Tier 2. All Tier 2 completions write a lightweight event record and a minimal interaction record so the interaction appears in the activity page and usage tracking, with total_cost: 0 (no LLM tokens consumed) and a single "triage" step.

9.8 Triage implementation plan

The strongest recommendation from the design review is to start with Tier 1 + Tier 3 only, letting observed traffic data drive the Tier 2 build-out.

Phase 1: Parser + Tier 3 enrichment. Build computational parser. Parse every message, log results. Pass parse hints to Tier 3 context assembly (seed knowledge loading, tool prioritisation). Inline handling only for trivially deterministic cases at a very high threshold (0.85+): greetings, explicit knowledge lookups where entity and fact type are both in the message.

Phase 2: Observe and measure. Run Phase 1 for sufficient volume. Analyse logs: what percentage of messages could Tier 2 have handled? What false positive rate would various thresholds produce? Decide which Tier 2a capabilities to add based on data.

Phase 3: Tier 2a build-out (if data supports). Add direct execution for patterns validated by Phase 2 data. Read-only operations only (safety override).

Phase 4: Tier 2b local LLM (if data supports). Only if Phase 2/3 data shows significant volume of messages that need natural-language response generation but not full reasoning.

10. Attention surfacing

10.1 Philosophy

The fundamental tension: agents exist to reduce cognitive load, but surfacing their work creates cognitive load. If the platform surfaces exactly what the user needs to know, when they need to know it, and stays quiet the rest of the time, the product is genuinely valuable. If it becomes noisy and overwhelming, the product is irritating.

Core principles:

Quiet means good. An empty briefing, a roster of green dots, a clear notification panel: these are success states. The product actively communicates this. When things are under control, Thinklio feels calm.

Attention is finite. Three rules: (1) Don't cry wolf. If something is marked urgent, it must genuinely need immediate action. (2) Absence is information. Not showing something is a deliberate choice. (3) Let things expire. Not everything deserves indefinite attention. A notification that was not acted on within three days probably was not important enough to persist.

Three channels, one picture. The platform surfaces information through three complementary channels, each with a distinct role. They should not duplicate each other.

Channel             Nature                     When checked                    Purpose
Briefing            Pull (user-initiated)      Start of day, periodically      Comprehensive snapshot: "here's your day"
Roster indicators   Ambient (always visible)   Glanceable, no action needed    At-a-glance state: "which agents need me?"
Notifications       Push (system-initiated)    Real-time, interruptive         Time-sensitive events: "this just happened and can't wait"

If something appears in the briefing, it does not need a notification unless it is time-sensitive. Most things surface through the briefing and roster dots, with notifications reserved for events that genuinely cannot wait.

10.2 The briefing

The briefing answers: "what do I need to know right now?" It is a snapshot, not a chronological feed. It does not append or accumulate. Each time the user sees it, it shows the present moment.

Section 1: Needs Attention (top, only shown if non-empty). A consolidated list of things that need the user's action right now. Cross-agent, sorted by urgency. Each item is one line: agent avatar, short description, link. Maximum 5 to 7 items visible with an "and N more..." expand option. If empty, the section does not appear. The absence of the section is the good news.

Section 2: Today (always shown, compact). The user's immediate context: calendar events, tasks due today (titles only), mail summary (one line). Three tight blocks, not three separate cards.

Section 3: Agent Updates (only if there is something new). One-line catch-up from agents that have something to report since the user last checked. Agents with nothing new do not appear.

What is not on the briefing: balance/credits (visible in the sidebar), knowledge snapshot (belongs on the knowledge page), active jobs (belong on the activity page; only surface on the briefing if a job needs attention).

The briefing gets shorter as users get busier. A new user sees more because everything is novel. An experienced user's briefing only surfaces exceptions.

10.3 Roster indicators

The agent roster in the sidebar uses coloured dots to communicate each agent's state:

Dot colour       Meaning                           Examples
Red              Needs attention now               Overdue tasks, pending approvals, failed jobs, expired secrets, SLA breaches
Amber            Something to be aware of          Due today, in-progress high-priority items, expiring soon
Green            Active, nothing needs attention   Agent is running, all is well
Grey             Dormant                           No recent activity, nothing pending
Blue (pulsing)   Working                           Agent is currently processing

Roster ordering. Pinned agents appear first in user-defined pin order. Remaining agents are sorted by urgency: red, amber, green, grey. Within the same urgency, sorted by most recent activity.

Pinning is a simple toggle. Pinned agents stay at the top regardless of urgency, but their dots still reflect their actual state.

10.4 Notifications with decay

Notifications are for events that cannot wait until the next briefing check. They are interruptive by nature and should be used sparingly.

Lifecycle:

Age                      Visual state             Behaviour
Fresh (< 1 hour)         Full colour, prominent   Counts toward unread badge
Ageing (1 to 24 hours)   Slightly faded           Still counts toward badge
Stale (1 to 3 days)      Noticeably pale          No longer counts toward badge
Expired (> 3 days)       Gone                     Automatically removed

There is no persistent "unread" guilt. Notifications either get acted on, or they naturally fade and disappear.
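The lifecycle reduces to a small age classifier. Thresholds are from the table above; names are illustrative.

```typescript
type DecayState = "fresh" | "ageing" | "stale" | "expired";

const HOUR = 3_600_000; // milliseconds

function decayState(ageMs: number): DecayState {
  if (ageMs < 1 * HOUR) return "fresh";    // full colour, prominent
  if (ageMs < 24 * HOUR) return "ageing";  // slightly faded
  if (ageMs < 72 * HOUR) return "stale";   // noticeably pale
  return "expired";                        // removed entirely
}

// Only fresh and ageing notifications count toward the unread badge.
function countsTowardBadge(state: DecayState): boolean {
  return state === "fresh" || state === "ageing";
}
```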

What warrants a notification: approval requests, job failures, SLA breaches, agent-initiated messages that require a response.

What does not warrant a notification: job completions, new knowledge facts, routine agent activity, balance updates.

Push notifications (mobile/email). Follow the same philosophy. Only the most critical items trigger push. Defaults are conservative.

10.5 The Radar agent

Radar is a specialist agent that provides a comprehensive, on-demand inspection of everything across all agents. It answers: "what should I be focusing on right now?" and "have I missed anything?"

Views:

  • Chat. Natural language queries. "What's urgent?" "What did I miss today?" Radar delegates to other agents to gather current state and synthesises a prioritised response.
  • Action List. A sorted, filterable list of everything needing attention across all agents. Each item shows urgency indicator, source agent, description, due date, and a direct link.
  • Timeline. A forward-looking view of the next seven days: upcoming deadlines, scheduled events, expected deliveries, expiring credentials.

Radar delegates to other agents using the agent-as-tool pattern. Each agent returns a structured summary. Radar ranks, deduplicates, and presents a unified list.

Radar does not replace the briefing. The briefing is automatic and passive (there when you open the app). Radar is active (you ask it). The briefing is the newspaper on your doorstep; Radar is the analyst you call when you want a deeper look.

10.6 PA agent role in attention surfacing

The personal assistant has a special role. It is the agent the user interacts with most frequently and it orchestrates the other agents.

Proactive prompting. The PA can proactively surface things in conversation. "By the way, you have an overdue compliance task. Want me to check on it?" This uses the same signals that drive the briefing and roster dots, delivered conversationally at appropriate moments.

Delegation to Radar. When the user asks the PA "what should I be doing?", the PA delegates to Radar. Radar has the cross-agent inspection capability; the PA has the conversational relationship.

Morning briefing delivery. In channels that support proactive messages (Telegram, email), the PA can deliver a morning briefing message. This is opt-in and configurable (daily, weekdays only, specific time).

10.7 Urgency model

Each agent reports its urgency state to the platform. This drives the roster dot colour, the briefing "Needs Attention" section, and Radar's prioritisation.

Urgency levels:

Level        Meaning                     Roster dot   Briefing
critical     Requires immediate action   Red          Always shown
attention    Should be addressed soon    Amber        Shown if space
active       Working normally            Green        Not shown
idle         Nothing happening           Grey         Not shown
processing   Currently executing         Blue pulse   Not shown

Domain-specific urgency rules (examples):

  • Taskmaster. critical: any overdue task. attention: tasks due today or high-priority tasks due within 2 days. active: tasks exist but none urgent. idle: no tasks.
  • Dispatch. critical: SLA breach or urgent-priority open items. attention: high-priority items in progress or items waiting > 24 hours.
  • Keeper. critical: any expired secret. attention: secrets expiring within 30 days.
  • Rolodex. critical: follow-ups overdue by > 3 days. attention: follow-ups due today or overdue by 1 to 3 days.

Urgency aggregation. The platform collects all critical and attention items from all agents, sorted by: critical before attention, then by deadline (nearest first), then by agent priority (user-configurable, defaults to alphabetical).
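The aggregation order can be expressed as a comparator. Field names are illustrative, not the actual schema.

```typescript
interface AttentionItem {
  level: "critical" | "attention";
  deadline: number;      // epoch ms; nearest first
  agentPriority: number; // user-configurable rank (lower = higher priority)
}

function sortAttention(items: AttentionItem[]): AttentionItem[] {
  return [...items].sort((a, b) => {
    // 1. critical before attention
    if (a.level !== b.level) return a.level === "critical" ? -1 : 1;
    // 2. nearest deadline first
    if (a.deadline !== b.deadline) return a.deadline - b.deadline;
    // 3. agent priority as the tie-breaker
    return a.agentPriority - b.agentPriority;
  });
}
```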

10.8 Empty states

Every view that can be empty should communicate something positive and informative. Empty states are not error states; they are success states.

Principles: affirm the good ("No outstanding tasks" rather than "No tasks found"), be specific, do not suggest unnecessary action, keep it brief (one line, not a paragraph).

View                          Empty state text
Briefing: Needs Attention     (section not shown)
Briefing: Agent Updates       (section not shown)
Task Board (all done)         All tasks complete.
Dispatch Board (no items)     No open tickets. Your queue is clear.
Radar Action List (empty)     Nothing urgent. You're on track.
Radar Timeline (empty week)   Clear week ahead.
Notification Panel (empty)    All caught up.

11. Knowledge architecture in agent context

Thinklio agents do not work from a single flat knowledge base. Knowledge is organised into four layers that combine at runtime to give the agent rich, relevant context while maintaining strict privacy and access boundaries.

Layer 1: Agent Knowledge. The domain expertise intrinsic to the agent: its system prompt, embedded instructions, fine-tuned behavioural rules, and agent-level knowledge items. Agent knowledge defines what the agent knows how to do. This layer makes agents specialised rather than generic.

Layer 2: Account Knowledge. The authoritative "how we do things here" layer. Policies, procedures, guidelines, compliance rules, brand voice standards, and reference material curated by account admins. Account knowledge overrides all other layers where there is a conflict. This is the governance anchor for the knowledge system.

Layer 3: Team Knowledge. Grows organically from the team's collective interactions with agents. Project context, client details, decisions made, lessons learned, shared documents. Every member of the team has access to team knowledge through agents assigned to that team.

Layer 4: User Knowledge. Strictly personal. Individual preferences, private context, personal notes. Only the user who provided it can access it. This context is never shared with other users or exposed at the team or account level.

Resolution at runtime. When the Harness assembles context for an agent turn, it resolves knowledge from all four layers according to a hierarchy: account policies (override all), then agent knowledge (domain expertise), then team knowledge (collective context), then user knowledge (personal context). Conflicts are resolved top-down.

Knowledge scope from channel context. Knowledge retrieval is scoped by where the conversation is happening, not by agent configuration. An organisation channel provides account knowledge. A team channel provides account + team knowledge. A private or DM channel provides account + user knowledge. A group channel provides account knowledge plus the team knowledge of all represented teams. The agent's own knowledge layer always applies regardless of channel. Users never have to think about which knowledge base is active.

Knowledge items are stored in the knowledge_item entity (doc 04) and retrieved via semantic search using the RAG component with namespaced indexes. For the full knowledge data model and ingestion pipeline, see doc 05 Persistence, Storage & Ingestion.

12. Channels and multi-channel delivery

Thinklio agents are channel-agnostic. The same agent can be reachable through multiple communication channels simultaneously. A team agent might be accessible via Telegram for quick interactions, through the web app for structured work, and via email for asynchronous briefings. The agent maintains context across all of these channels within the same conversation scope.

Channel identity. Users connect their channel identities (phone number, Telegram ID, email address) to their Thinklio account. Once connected, messages arriving from any linked channel are routed to the same interaction context and attributed to the correct user.

Channel assignment. An agent is made accessible on a channel by adding it as a member of the appropriate channel record. Multiple agents can share a channel (routed by keyword, context, or explicit @mention invocation), or a channel can be dedicated to a single agent.

Supported channels:

Channel                 Status        Notes
Web (app.thinklio.ai)   Implemented   Agent Studio and web chat
Telegram                Planned       Primary external channel
Email (Postmark)        Designed      Bidirectional via agent aliases
WhatsApp                Planned       Future phase
Voice                   Planned       Future capability
Platform API            Designed      Programmatic agent invocation

Multi-channel delivery. When an agent produces a response, it is delivered to the originating channel by default. For proactive messages (agent-initiated contact, job completion notifications), the platform selects the best delivery channel based on user preference and channel availability. The same response payload is formatted appropriately for each channel: a rich card on the web app, formatted text on Telegram, an email thread continuation on email.

Notification and interruption control. Users have fine-grained control over how agent activity surfaces:

  • Inline reply. Visible to everyone in the channel (default for mention-triggered responses).
  • Threaded reply. Visible but contained, does not clutter the main stream.
  • Direct message. Only the requesting user sees it.
  • Silent completion. The agent completes a task and updates a status indicator without posting a message.

Notification preferences are configurable per user per channel, with sensible defaults that the organisation can set.

13. Trigger types

Agents can be activated by five trigger types. All produce the same normalised event and flow through the same triage and execution path.

13.1 Channel message triggers

The primary trigger. A user sends a message in a channel where an agent is a member. The agent's trigger mode (mention, proactive, silent) determines whether it activates. Trigger mode is set per agent-channel membership and respects the governance hierarchy: the organisation sets which modes are available, and users or teams can tighten (but not loosen) within those bounds.

13.2 External event triggers

An external system sends a webhook or API call that should activate an agent. Examples: new appointment created in a patient management system, Stripe payment failed, email received via Postmark inbound.

Implementation: Convex HTTP endpoints receive the webhook, normalise it into the universal event format, identify the target agent and channel (via configuration in an event_trigger table), and schedule an agent turn.
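
The normalisation step can be sketched as follows. The universal event shape, field names, and the source-specific lookups are illustrative assumptions, not the real event format; in the deployed system this would run inside the Convex HTTP endpoint before the event_trigger lookup.

```typescript
// Hedged sketch: normalising an inbound webhook body into a universal event.
interface UniversalEvent {
  sourceType: string; // e.g. "stripe", "postmark", "internal"
  eventKind: string;  // source-specific event name, normalised
  receivedAt: number; // ms since epoch
  payload: unknown;   // original body, preserved for the agent turn
}

export function normaliseWebhook(
  sourceType: string,
  body: Record<string, unknown>,
  now: number,
): UniversalEvent {
  // Different sources put the event name in different fields; map the common ones.
  const eventKind =
    (body["type"] as string | undefined) ??       // Stripe-style payloads
    (body["RecordType"] as string | undefined) ?? // Postmark-style payloads
    "unknown";
  return { sourceType, eventKind, receivedAt: now, payload: body };
}
```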

13.3 Scheduled triggers

Agents that run on a cron. A daily briefing, a weekly report, a nightly data reconciliation. Convex cron scheduling invokes a function that creates or continues a thread for the agent, runs a turn with a system-generated prompt, and posts the result to the configured channel.
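
A registration sketch for crons.ts, assuming the standard Convex cron API. The function reference and arguments are illustrative; in practice the schedule would be driven by the scheduled_agent_run table rather than hard-coded entries.

```typescript
// Sketch of cron registration for a scheduled agent run (illustrative names).
import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";

const crons = cronJobs();

// e.g. a daily briefing at 07:00 UTC; runScheduledTurn is a hypothetical entry
// point that creates or continues the agent's thread and posts to its channel.
crons.daily(
  "daily-briefing",
  { hourUTC: 7, minuteUTC: 0 },
  internal.agents.runScheduledTurn,
  { agentSlug: "briefing-agent" },
);

export default crons;
```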

13.4 Dispatch callback triggers

When an external orchestration (n8n, Go service) completes and calls back, the callback handler can optionally trigger a follow-up agent turn to process the result and notify the user.

13.5 Internal system event triggers

Events generated within Thinklio itself: document ingestion complete, knowledge base updated, policy changed. These use the same event_trigger mechanism as external events but with internal source types.

14. Convex component integration

Five Convex components replace what would otherwise be thousands of lines of custom infrastructure. Here is what each one does and what Thinklio builds around it.

14.1 Agent component

@convex-dev/agent handles thread-based LLM orchestration. Creates threads, persists messages, manages the LLM call loop (prompt, response, tool calls, continuation), streams output, and provides hooks for cross-cutting concerns.

const agent = new Agent(components.agent, {
  name: "research",
  chat: openai.chat("gpt-4o"),
  textEmbeddingModel: openai.embedding("text-embedding-3-small"),
  instructions: "You are a research assistant...",
  tools: [searchKnowledge, webSearch],
  maxSteps: 10,
  usageHandler: async (ctx, args) => { /* cost accounting */ },
  rawResponseHandler: async (ctx, args) => { /* audit trail */ },
  contextHandler: async (ctx, args) => { /* knowledge injection */ },
});

Critical insight. The Agent component maintains its own message history in component-managed tables. The message table is the user-facing conversation record. These are two separate stores with different purposes. The Agent thread is the execution context; channel messages are the UI record. They are bridged: when a user sends a message in a channel, it is written to the message table and also fed to the agent's thread. When the agent produces a response, it is written to the message table for the channel.

14.2 Workflow component

@convex-dev/workflow handles durable step-by-step execution with journalling. Each step (query, mutation, action, or nested workflow) is recorded. If execution fails, it resumes from the last incomplete step. Supports sleeping, awaiting external events, parallelism limits, and retry with backoff.

const workflowManager = new WorkflowManager(components.workflow);

export const agentTurnWorkflow = workflowManager.define({
  args: { channelId: v.id("channel"), messageId: v.id("message"), agentId: v.string() },
  handler: async (step, { channelId, messageId, agentId }) => {
    const context = await step.runQuery(internal.agents.assembleContext, { agentId, channelId });
    await step.runMutation(internal.governance.checkPolicies, { accountId: context.accountId, agentId });
    const result = await step.runAction(internal.agents.runLLM, { agentId, threadId: context.threadId, channelId });
    await step.runMutation(internal.messages.writeAgentResponse, { channelId, agentId, content: result.text });
    await step.runMutation(internal.audit.record, { event: "agent_turn", agentId, channelId });
  },
});

Every agent turn triggered by a user message runs inside a workflow. This provides durability (LLM timeouts do not lose work), observability (every step is queryable), and the foundation for multi-step agent tasks.

14.3 RAG component

@convex-dev/rag handles namespaced semantic search with chunking, embedding, importance weighting, and chunk context. Manages its own vector store within the Convex component.

Namespace strategy for the four knowledge layers:

account:{orgId}        Account-wide policies, brand voice, compliance
agent:{agentId}        Agent-specific domain knowledge
team:{teamId}          Team project context, client details
user:{userId}          Personal preferences, private notes

At query time, all applicable namespaces are searched and results merged with layer priority (account content wins over agent, which wins over team, which wins over user).
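
The merge step can be sketched as a pure function. The result shape and tie-breaking rule are assumptions: layers are ordered by priority (account first), with ties within a layer broken by relevance score.

```typescript
// Hedged sketch of the layer-priority merge over RAG search results.
const LAYER_ORDER = ["account", "agent", "team", "user"] as const;
type Layer = (typeof LAYER_ORDER)[number];

interface KnowledgeHit {
  layer: Layer;
  score: number; // semantic similarity from the RAG search
  text: string;
}

export function mergeByLayerPriority(hits: KnowledgeHit[], limit: number): KnowledgeHit[] {
  return [...hits]
    .sort((a, b) => {
      // Lower layer index wins: account over agent over team over user.
      const byLayer = LAYER_ORDER.indexOf(a.layer) - LAYER_ORDER.indexOf(b.layer);
      return byLayer !== 0 ? byLayer : b.score - a.score;
    })
    .slice(0, limit);
}
```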

14.4 Rate Limiter and Sharded Counter

@convex-dev/rate-limiter enforces per-account and per-user token budgets. Called inside the Agent component's usageHandler callback to gate LLM usage in real time. Supports token bucket and fixed window strategies.

@convex-dev/sharded-counter tracks cumulative token usage, interaction counts, and tool call counts per account. Read by budget-checking queries and the billing/admin dashboard.

14.5 The three callbacks

The Agent component's callback system is the integration seam where Thinklio's cross-cutting concerns plug in.

usageHandler (cost accounting). Called after every LLM response with token counts. Increments sharded counters per account and per agent. Enforces rate limits (throws if over budget).
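
The accounting logic can be sketched with the counter and limiter injected as plain functions, which keeps it testable in isolation. The key formats, the Usage shape, and the makeUsageHandler factory are all assumptions for illustration, not the component's actual types.

```typescript
// Hedged sketch of usageHandler accounting with injected dependencies.
type Usage = { totalTokens: number };

export const usageKeys = (accountId: string, agentId: string) => [
  `tokens:account:${accountId}`,
  `tokens:agent:${agentId}`,
];

export function makeUsageHandler(deps: {
  addToCounter: (key: string, n: number) => Promise<void>;          // sharded counter
  checkLimit: (key: string, n: number) => Promise<{ ok: boolean }>; // rate limiter
}) {
  return async (args: { accountId: string; agentId: string; usage: Usage }) => {
    // Record cumulative usage per account and per agent.
    for (const key of usageKeys(args.accountId, args.agentId)) {
      await deps.addToCounter(key, args.usage.totalTokens);
    }
    // Gate further LLM usage: throw if this account is over budget.
    const { ok } = await deps.checkLimit(`budget:${args.accountId}`, args.usage.totalTokens);
    if (!ok) throw new Error(`Token budget exceeded for account ${args.accountId}`);
  };
}
```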

rawResponseHandler (audit trail). Called after every LLM response with the raw provider output. Writes an audit record with model, token counts, and timing.

contextHandler (knowledge injection). Called before every LLM call. Derives knowledge scope from the channel context (not from agent configuration). Searches all applicable RAG namespaces. Merges results with layer priority. Prepends account policies and knowledge to the message array the model will see.

contextHandler: async (ctx, { agentName, threadId, allMessages }) => {
  const agentConfig = await getAgentByName(ctx, agentName);
  const channelContext = await getChannelContextForThread(ctx, threadId);

  const lastUserMessage = allMessages.filter(m => m.role === "user").at(-1)?.content ?? "";
  const knowledge = await mergeKnowledgeLayers(ctx, rag, {
    query: lastUserMessage,
    accountId: channelContext.accountId,
    agentId: agentConfig._id,
    teamId: channelContext.teamId,
    userId: channelContext.requestingUserId,
  });

  const policies = await getAccountPolicies(ctx, channelContext.accountId);

  return [
    { role: "system", content: formatPolicies(policies) },
    { role: "system", content: formatKnowledge(knowledge) },
    ...allMessages,
  ];
},

14.6 Channel to agent thread bridge

The channel-to-thread bridge is the key architectural join. Users see channels and messages. The Agent component sees threads and messages.

Mapping table:

agent_thread: defineTable({
  channelId: v.id("channel"),
  agentId: v.id("agent"),
  threadId: v.string(),
  triggerMode: v.union(
    v.literal("mention"),
    v.literal("proactive"),
    v.literal("silent"),
  ),
  lastActiveAt: v.number(),
})
  .index("by_channel_agent", ["channelId", "agentId"])
  .index("by_thread", ["threadId"]),

When an agent is first triggered in a channel, a thread is created and the mapping recorded. Subsequent messages in that channel reuse the same thread via continueThread, giving the agent full conversation context.

Trigger flow:

  1. Message arrives in channel.
  2. Mutation writes message to the message table.
  3. Trigger/scheduler checks: is any agent a member of this channel?
  4. For each agent member, evaluate trigger mode: mention (check for @mention), proactive (always schedule), silent (skip unless explicitly mentioned).
  5. For each triggered agent: schedule an agentTurn workflow, look up or create the agent_thread mapping, feed recent channel messages into the Agent thread, generate response, determine delivery mode (inline, threaded, DM, or silent), write response to the message table.
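
Step 4's trigger-mode check can be sketched as a pure function. The inputs are simplified: mention detection and the per-membership mode lookup live elsewhere in the real flow, and the only difference between mention and silent here is downstream delivery, not activation.

```typescript
// Hedged sketch of the per-agent trigger-mode evaluation (step 4 above).
type TriggerMode = "mention" | "proactive" | "silent";

export function shouldTrigger(mode: TriggerMode, explicitlyMentioned: boolean): boolean {
  switch (mode) {
    case "mention":
      return explicitlyMentioned; // only when @mentioned
    case "proactive":
      return true;                // always schedule a turn
    case "silent":
      return explicitlyMentioned; // skip unless explicitly mentioned
  }
}
```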

15. External orchestration

Some work is better handled outside Convex: complex multi-step workflows with branching logic (n8n), high-throughput compute or queue management (Go services), or integrations with systems that require long-lived connections.

The dispatch/callback contract. All external orchestration follows the same pattern regardless of the target system:

1. DISPATCH
   Agent (via workflow step) sends HTTP POST to external system:
   {
     callbackUrl: "https://<convex-deployment>.convex.site/dispatch/callback",
     correlationId: "<workflow-event-key>",
     taskType: "generate-report" | "process-queue" | "run-n8n-workflow",
     payload: { ... task-specific data ... },
     timeout: 300000
   }

2. PAUSE
   Workflow calls step.waitForEvent(correlationId) and suspends.
   No Convex resources consumed while waiting.

3. PROGRESS (optional)
   External system POSTs status updates to /dispatch/progress
   Written to a dispatch_log table, UI subscribes reactively.

4. CALLBACK
   External system POSTs the result to /dispatch/callback
   Convex HTTP endpoint resumes the workflow with the result.

5. TIMEOUT
   If no callback arrives within the configured timeout, the workflow
   resumes with an error. The agent can notify the user and optionally
   retry or abandon.
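
Step 1 of the contract can be sketched as a payload builder. The helper name is illustrative; the field names and the 300000 ms default mirror the dispatch example above.

```typescript
// Hedged sketch: constructing the dispatch payload from the contract's step 1.
type TaskType = "generate-report" | "process-queue" | "run-n8n-workflow";

export function buildDispatch(opts: {
  deploymentUrl: string;  // e.g. "https://<convex-deployment>.convex.site"
  correlationId: string;  // the workflow event key later passed to waitForEvent
  taskType: TaskType;
  payload: Record<string, unknown>;
  timeoutMs?: number;
}) {
  return {
    callbackUrl: `${opts.deploymentUrl}/dispatch/callback`,
    correlationId: opts.correlationId,
    taskType: opts.taskType,
    payload: opts.payload,
    timeout: opts.timeoutMs ?? 300_000, // 5 minutes, matching the example above
  };
}
```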

When to use external orchestration:

  • n8n. Complex branching workflows that are easier to model visually than in code. Multi-system orchestration. Workflows that non-developers need to modify.
  • Go services. High-throughput queue processing, compute-intensive document transformation, real-time data streaming, anything requiring persistent connections or high concurrency beyond Convex action limits.
  • Generic external. Any HTTP-accessible service that accepts a task and calls back with a result.

16. MCP integration

The Model Context Protocol provides a standardised way for external systems to expose tools to LLM-based agents. Thinklio supports MCP servers as a tool source, alongside internal and external REST tools.

How it works. An MCP server is registered per account (or platform-wide for built-in servers) in the mcp_server table. When registered, the system connects to the server and discovers the available tools. These tools are stored in the discoveredTools field and can be assigned to agents through the standard agent_tool mechanism, with the tool record's type set to "mcp" and a reference to the MCP server.

Why MCP matters. For systems that expose multiple capabilities (e.g. a patient management system), MCP eliminates per-tool integration work. Instead of registering each REST endpoint as a separate external tool, the organisation deploys an MCP server and all tools are discovered automatically.

Thinklio itself can expose an MCP server, making its own capabilities (knowledge search, task management, channel messaging) available to external agents and tools. This is a natural extension of the external API surface.

Tool resolution order. When resolveAgentTools() builds the tool set for an agent turn:

  1. Internal tools (Convex function references, resolved directly)
  2. External REST tools (HTTP endpoints, wrapped in a fetch action)
  3. MCP tools (discovered from MCP servers, wrapped in the MCP client protocol)
  4. Delegation tools (agent references, wrapped in the delegation mechanism)

All four produce the same createTool() format for the AI SDK. The agent sees a flat list of tools with no type distinction.
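
The flattening step can be sketched as follows. The record and descriptor shapes are assumptions; in the real implementation each entry is wrapped in createTool() for the AI SDK rather than tagged with a label.

```typescript
// Hedged sketch of resolveAgentTools(): four record types collapse into one
// uniform descriptor list, so the agent sees no type distinction.
interface ToolRecord {
  name: string;
  type: "internal" | "external" | "mcp" | "agent";
}

interface ResolvedTool {
  name: string;
  // A label standing in for the wrapper createTool() would attach.
  wrapper: "function-ref" | "fetch-action" | "mcp-client" | "delegation";
}

const WRAPPERS: Record<ToolRecord["type"], ResolvedTool["wrapper"]> = {
  internal: "function-ref", // Convex function references, resolved directly
  external: "fetch-action", // HTTP endpoints, wrapped in a fetch action
  mcp: "mcp-client",        // discovered tools, wrapped in the MCP client protocol
  agent: "delegation",      // agent references, wrapped in the delegation mechanism
};

export function resolveAgentTools(records: ToolRecord[]): ResolvedTool[] {
  return records.map((r) => ({ name: r.name, wrapper: WRAPPERS[r.type] }));
}
```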

17. The five extensibility layers

Thinklio's functionality is constructed from five layers, each with a clear purpose and boundary.

Layer 1: Tools (stateless capabilities). A tool is a function that takes parameters and returns a result. No reasoning, no state, no system prompt, no autonomy. Three tool types (internal, external REST, MCP), one invocation interface. All registered in the tool table and resolved through resolveAgentTools(). Trust levels on tools (read, standard, elevated, admin) determine what class of agent can invoke them.

Layer 2: Agents (reasoning entities). An agent is a persistent entity with reasoning, state, and autonomy. Constructed entirely through configuration: a system prompt, tools, delegation rules, knowledge context, and a governance envelope. Three construction patterns: catalogue agent (installed from the platform catalogue), composed agent (a coordinator delegating to other agents), custom agent (bespoke system prompt and tool set).

Layer 3: External orchestration (n8n, Go services). Work that is better handled outside Convex. The dispatch/callback contract (section 15) handles all external orchestration uniformly.

Layer 4: Custom Convex components. Reusable infrastructure that benefits from data isolation. Candidates: Governance Engine (policy evaluation and caching), Usage Metering (wraps sharded counter and rate limiter with billing logic), Document Parser (tiered parsing pipeline), Dispatch Manager (dispatch/callback lifecycle). What should not be a component: individual agents, business-specific tool implementations, channel adapters.

Layer 5: Platform agent catalogue. The library of pre-built agents. Catalogue entries are templates, not running agents. When an organisation installs one, an agent record is created with the template defaults. The organisation can then customise anything.

The Agent Studio sits above all five layers as the composition and configuration surface. No custom code is required. Code only enters the picture when a new tool implementation, a custom component, or a Go service is needed.

18. Agent views as content type

18.1 Two data paths

Thinklio channels pass content to and from agents. Text, audio, documents, and UI are all content types. A channel renders what it can and ignores what it cannot. UI is not special; it follows the same pattern as every other content type.

The architectural decision that makes agent UI work: there are two ways data flows between the user and the agent's underlying data.

Conversational path (agent-mediated):

User message -> Harness -> think -> act (tool call) -> database -> observe -> respond

The agent is in the loop. Every operation goes through reasoning, governance, and cost tracking.

View path (direct):

UI component -> Platform API -> database -> UI component

The agent is not in the loop. The UI reads and writes data directly through the same API endpoints the agent's tools use. The user's JWT authenticates the request. RLS policies enforce scope and visibility.

Both paths operate on the same data through the same API. When the agent creates a task via a tool call, the kanban view sees it immediately. When the user drags a card in the kanban, the agent's next interaction sees the updated status. There is no sync problem because there is only one source of truth.

The conversational path is governed (budget limits, trust levels, approval requirements). The view path is authenticated and authorised (RLS, roles) but not governed in the same way, because the user is performing the action directly.

18.2 View definitions in the manifest

The agent manifest includes a views section declaring what visual representations the agent supports. Each view maps the agent's data to a standard view type with a specific configuration.

Each view definition specifies three things: where to get the data (data_source), how to display it (mapping), and what the user can change (mutations). If mutations is omitted, the view is display-only.

Example for a task agent's kanban view:

views:
  - type: kanban
    label: "Board"
    icon: "Kanban"
    data_source:
      entity: task
      scope: inherit
      api_endpoint: "/v1/tasks"
      realtime_table: "task"
      filters:
        status: ["todo", "in_progress", "done"]
    mapping:
      columns:
        field: status
        values:
          - key: todo
            label: "To Do"
          - key: in_progress
            label: "In Progress"
          - key: done
            label: "Done"
      card_title: title
      card_fields: [priority, due_date, assigned_to]
      card_accent: priority
    mutations:
      drag_column: { field: status }
      inline_edit: [title, priority, due_date]

18.3 Standard view types

The platform ships with renderers for a fixed set of view types. Each renderer is a React component that accepts a view definition, fetches data, subscribes to real-time updates, and handles mutations.

View Type Use Case Read Path Write Path
kanban Status-based workflow boards GET with filter PATCH on drag/inline edit
calendar Date-based scheduling views GET with date range PATCH on drag to new date
table Sortable, filterable lists GET with sort/filter PATCH on inline cell edit
summary Aggregate statistics and charts GET with aggregation Read-only
detail Single-record view GET by ID PATCH on field edit
timeline Chronological event sequences GET with date range Read-only

All view types use TanStack React Query for data fetching and caching, Convex real-time subscriptions for live updates, and standard PATCH requests to the Platform API for mutations.

18.4 Custom view components

For agents that need something the standard types cannot express, the manifest supports custom view components loaded into a sandboxed iframe.

views:
  - type: custom
    label: "Pipeline"
    icon: "Funnel"
    component_url: "https://agents.example.com/crm-agent/views/pipeline.js"
    sandbox: true
    permissions: ["read:contacts", "write:contacts"]

Custom components run in an iframe with sandbox="allow-scripts". Communication uses postMessage. The iframe cannot access the parent DOM, cookies, or localStorage. It receives the platform's CSS custom properties via theme injection and a scoped API token that only permits the declared operations.
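
The permission gate on that bridge can be sketched as a pure check. The mapping of HTTP methods to read/write and the gateRequest helper are assumptions; permission strings follow the "read:contacts" / "write:contacts" form from the manifest above.

```typescript
// Hedged sketch: checking an iframe's API request against its declared
// permissions before the scoped token is used.
export function gateRequest(
  declared: string[],
  request: { method: "GET" | "PATCH" | "POST" | "DELETE"; entity: string },
): boolean {
  // GET is a read; everything else mutates and needs the write grant.
  const op = request.method === "GET" ? "read" : "write";
  return declared.includes(`${op}:${request.entity}`);
}
```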

Quality and security mitigations: loading timeout (5 seconds), Content Security Policy within the iframe, bundle size limits, and an optional review/approval step.

18.5 Interaction between views and conversation

Views and conversation are two surfaces over the same data. They reinforce each other:

  • Conversation can reference views. "You have 3 overdue tasks. [View on board]" switches to the Board tab with a filter applied.
  • Views can trigger conversation. A context menu on a kanban card: "Ask agent about this task" switches to Chat and sends a contextual message.
  • Changes propagate both ways. Agent creates a task via tool call: the kanban updates via real-time subscription. User drags a card to "Done": the agent's next interaction sees the updated status.
  • Governance diverges by path. Actions through conversation are agent-mediated and subject to the full governance model. Actions through views are user-direct and subject to authentication and authorisation but not agent governance.

Invocation. Two paths to a view: (1) from the agent's own page (tabs in the header: Chat, Board, Calendar, List, Summary), or (2) from anywhere else in the app as an overlay (briefing page, another agent's conversation, activity feed, notifications). Both produce navigable URLs so deep links work.

19. Agent taxonomy and platform catalogue

Agents are organised into two tiers reflecting their structural complexity.

Tier 1: Fundamental agents

Tier 1 agents are atomic and self-contained. Each uses a focused tool set and operates independently without requiring other agents as delegates. They are the building blocks that Tier 2 agents compose.

Agent Domain Key Tools
Calendar Agent Scheduling getEvents, createEvent, findFreeTime
Mail Agent Email sendEmail, readInbox, searchMail
Taskmaster Task management createTask, listTasks, updateTask
Rolodex Contact management structured contact and relationship graph
Keeper Secrets/credentials secure credential storage, proxy execution
Messenger Messaging channel messaging
Knowledge Base Agent Knowledge retrieval searchKnowledge, addKnowledge
Research Agent Web research webSearch, searchKnowledge, summarise
Fact Checker Verification webSearch, findCitations, crossReference
Data Agent Data analysis queryData, createChart, exportTable
Writer Agent Content generation searchKnowledge, generateContent

Tier 2: Applied agents

Tier 2 agents are compositions or specialisations. They delegate to Tier 1 agents to accomplish their goals.

Agent Composes / Specialises
Executive Assistant Mail, Calendar, Taskmaster, Knowledge
Report Writer Research, Data, Writer
Support Triage Knowledge, Mail
Briefing Agent Research, Data, Knowledge
Newsletter / Case Study Writer, Research
Enquiry Agent Mail, Knowledge Base
Persona Agent (base) Knowledge Base (domain-specialised pattern)
Coordinator Agents PA, Meeting Agent, Project Coordinator, Briefing Agent
Scribe / Digest Agents Writer, Knowledge Base
Business Intelligence Agents Data, Research
Creative Agents Writer, Data
Radar Cross-agent inspection (delegates broadly to all Tier 1)
CEO Agent Broad delegation across BI, Research, Coordinator
Board Agents CEO, BI, Research

Catalogue structure. The agent_catalog table stores platform-level templates with no accountId. Each entry includes default system prompt, model, tool assignments, trust level, trigger mode, and tier classification. When an organisation installs a catalogue agent, the system creates an agent record with those defaults linked back to the catalogue entry. The organisation can then customise the system prompt, model, trust level, tool assignments, and governance policies.

Custom agents are created directly in the agent table with no catalogue reference. A podiatry practice might create a Practice Assistant (using external REST tools for patient lookup and appointments, delegating to Mail and Calendar) or a Patient Recall Agent. These follow the same patterns as catalogue agents.

For full individual agent specifications see the agent-specs/ directory and doc 08 Agents Catalogue & Platform Services.

20. Agent lifecycle

20.1 How agents enter the platform

Built-in agents. Shipped as manifest files in the platform codebase. Loaded on boot. Cannot be deleted by accounts, but can be activated or deactivated. Origin: built_in.

Studio agents. Created through the Agent Studio UI. The Studio generates a manifest internally and writes the corresponding agent record. These can be edited, promoted through deployment states, and deleted. Origin: custom.

External agents (via API). Registered through the Platform API by submitting a manifest or equivalent JSON. The registration goes through an approval flow. Once approved, the agent appears in the account's agent gallery. Origin: installed.

External agents (via manifest upload). An admin uploads a thinklio-agent.yaml file through the Agent Studio. The Studio parses, validates, and creates the registration. A convenience path for developers who author agents as files.

20.2 Deployment states

All agents progress through the same states regardless of origin:

  1. In Development. Being built or registered, not visible to users.
  2. Testing. Being validated, may be promoted.
  3. Available. Ready for activation by admins.
  4. Active. Deployed and accessible to users.

Built-in agents skip to Available or Active on first boot. Studio agents start in In Development. External agents start in In Development (pending approval) and move to Available on approval.

20.3 Assignment

An agent becomes active for a user or team when it is assigned. Assignments are managed through the account admin interface:

  • User assignment. The agent is available for personal use by a specific user.
  • Team assignment. The agent is available to all members of a team.
  • Account assignment. The agent is available across the entire account.

20.4 Versioning

Agent configuration changes take effect on the next turn, not mid-turn. When an agent turn starts, the agentFactory reads the current configuration and snapshots it for the duration of that turn. If someone updates the agent's configuration while a turn is running, the running turn completes with the original config and the next turn picks up the changes.

No explicit version numbers are needed on the agent record. Convex's document update semantics and the snapshot-at-turn-start pattern provide the correct behaviour. The audit_event table records which configuration was active for each turn, providing a de facto version history.

For the agent catalogue, template updates from the platform do not automatically propagate to installed instances. The organisation's customisations are preserved. A "sync with catalogue" action can be offered in the admin UI.

20.5 Deactivation and deletion

An agent can be deactivated (no new interactions accepted) without losing its history. Deletion removes the agent and its configuration but retains the interaction history and event log for audit purposes. Knowledge items associated with a deleted agent are moved to account-level custody.

21. Agent governance summary

The full governance model is documented in doc 07 Security & Governance. This section summarises the agent-specific governance concerns.

21.1 Trust levels

Every agent operates at a trust level that determines what classes of tool it can invoke without additional approval.

Level Description
read Read-only access to data and information. No external writes or mutations.
standard Can invoke tools that modify data within Thinklio (task creation, note writing).
elevated Can invoke tools that interact with external systems (email, calendar, CRM writes).
admin Can invoke platform management operations. Requires explicit account-level grant.

21.2 Governance hierarchy

Policy enforcement follows a strict layering model:

  1. Account policies set the ceiling. Non-negotiable constraints set by the organisation admin.
  2. Team policies can tighten within account bounds. A team lead might restrict available agents or lower token budgets.
  3. User preferences can tighten further. A user might set all agents to mention-only or disable proactive behaviour.

No layer can loosen a restriction set by a layer above it. This keeps the security model comprehensible and auditable. Policies are enforced by the Harness at the act step, before any tool call or delegation proceeds. A policy violation produces a failed step with the violation reason recorded.

21.3 Cost controls

Agents operate within a cost envelope set at account, team, and user levels. The platform tracks LLM token consumption, tool call costs, and delegation costs in real time. If an agent turn would exceed the remaining budget, it is halted before the cost is incurred.

All costs are metered per interaction step and attributed to the correct user, team, and account via the usage tracking system.

21.4 Audit trail

Every agent action (every think step, tool call, delegation, and response) is recorded as an immutable event in the audit_event table. Account admins can retrieve the full history of what an agent did, in what order, with what inputs and outputs, and at what cost. This audit capability is non-optional and applies to all agents regardless of trust level.

22. Durability and recovery

22.1 Workflow durability

Every Tier 3 agent turn runs inside a Convex Workflow. The workflow component journals each step (context assembly, policy check, LLM call, tool execution, response write). If execution fails mid-turn: transient failures (network timeout, database conflict) are retried automatically with backoff, the workflow resumes from the last incomplete step, and completed steps are not re-executed (their results are replayed from the journal).

22.2 Tool call recovery

Individual tool calls within an agent turn can fail independently. The agent component's tool loop handles this: failed tool calls are surfaced to the agent as error results in the conversation context, the agent can reason about the failure (retry, skip, try an alternative approach, or deliver partial results), and the maxSteps configuration prevents infinite retry loops.

22.3 Dispatch recovery

For external orchestration, the dispatch_log table provides recovery. A scheduled function periodically scans for dispatches in dispatched or running status that have exceeded their timeout. Timed-out dispatches are marked as timed_out and the waiting workflow is resumed with an error. The agent can then decide whether to retry, notify the user, or abandon.
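
The predicate the periodic scan applies can be sketched as follows. The field names follow the dispatch_log description above but are assumptions about the exact schema.

```typescript
// Hedged sketch: has an in-flight dispatch exceeded its timeout?
interface DispatchRow {
  status: "dispatched" | "running" | "completed" | "timed_out" | "failed";
  dispatchedAt: number; // ms since epoch
  timeoutMs: number;
}

export function hasTimedOut(row: DispatchRow, now: number): boolean {
  const inFlight = row.status === "dispatched" || row.status === "running";
  return inFlight && now - row.dispatchedAt > row.timeoutMs;
}
```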

22.4 Thread recovery

Using the @convex-dev/agent component's built-in recovery: interrupted threads (e.g. from a server restart during an agent turn) can be detected and resumed. A periodic recovery function scans for threads with pending tool calls and resumes execution.

23. Schema summary

The agent layer introduces the following tables (all using singular naming per project convention). For the full data model and field definitions see doc 04.

Table Purpose Key fields
agent_catalog Platform-level agent templates name, slug, category, defaultSystemPrompt, defaultModel, defaultTrustLevel, defaultToolIds, tier, version
agent Account-level agent configuration accountId, name, slug, systemPrompt, model, trustLevel, defaultTriggerMode, status, delegationSet, maxDelegationDepth
tool Tool registry (internal, external, MCP, agent) name, type, schema, handler/endpoint/mcpServerRef, trustLevelRequired, requiresApproval
agent_tool Agent-to-tool assignment (join table) agentId, toolId, enabled, config
agent_thread Channel-to-Agent-component-thread mapping channelId, agentId, threadId, triggerMode, lastActiveAt
agent_assignment Agent-to-scope assignment agentId, scope (user/team/account), scopeId, toolRestrictions
user_agent_config Per-user per-agent personal settings userId, agentId, credentials, preferences, overrides
account_policy Account-level governance policies accountId, type, rule, enabled, priority
team_policy Team-level governance policies (tighten-only) teamId, accountId, type, rule, enabled, priority
audit_event Immutable audit log accountId, event, actorType, actorId, detail, timestamp
dispatch_log External orchestration tracking accountId, correlationId, targetType, targetUrl, status, payload, result
event_trigger Webhook and event trigger configuration accountId, sourceType, sourceFilter, agentId, channelId, enabled
scheduled_agent_run Cron-scheduled agent runs accountId, agentId, channelId, schedule (cron), prompt, timezone, enabled
mcp_server MCP server registry accountId, name, url, authConfig, status, discoveredTools
job Deferred work tracking type, createdByAgent, state, dispatchTarget, contextBundle, timeoutAt
subjob Granular work units within a job jobId, label, order, state, resultData, errorData
job_observer Observer registrations on jobs jobId, observerType, observerId, notifyOn, callbackMetadata

24. Implementation order

Starting from the working foundation (Clerk + messaging + middleware), the suggested build sequence is as follows. Each step produces something testable.

Step 1: Install and register components. Install @convex-dev/agent, @convex-dev/workflow, @convex-dev/rag, @convex-dev/rate-limiter, @convex-dev/sharded-counter. Create convex.config.ts. Verify: npx convex dev picks up the components without errors.
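The registration file for Step 1 follows the standard Convex component pattern: each component exports its own convex.config, and the app mounts them with app.use. A sketch:

```typescript
// convex/convex.config.ts -- registers the five components from Step 1.
import { defineApp } from "convex/server";
import agent from "@convex-dev/agent/convex.config";
import workflow from "@convex-dev/workflow/convex.config";
import rag from "@convex-dev/rag/convex.config";
import rateLimiter from "@convex-dev/rate-limiter/convex.config";
import shardedCounter from "@convex-dev/sharded-counter/convex.config";

const app = defineApp();
app.use(agent);
app.use(workflow);
app.use(rag);
app.use(rateLimiter);
app.use(shardedCounter);

export default app;
```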

Step 2: Agent schema and CRUD. Add all agent-related tables to schema.ts. Build basic CRUD functions: agent.create, agent.list, agent.get, agent.update, agent.listForAccount, agent.installFromCatalog, userAgentConfig.get, userAgentConfig.upsert.

Step 3: Minimal agent turn (no tools, no knowledge). Build the simplest possible agent turn: user sends message, agent responds with plain LLM output. Create agentFactory.ts (instantiates Agent from config), agentExecution.ts (trigger evaluation, thread lookup/creation, message feed, response write), and agentTrigger.ts (trigger mode evaluation). Wire the trigger into message.send.
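The trigger evaluation in agentTrigger.ts can be a pure function, which keeps it unit-testable before any LLM wiring exists. The trigger-mode names and the message shape below are assumptions for illustration, not the final contract:

```typescript
// A sketch of the trigger-mode check agentTrigger.ts might perform.
// Modes and the IncomingMessage shape are illustrative assumptions.
type TriggerMode = "mention" | "always" | "keyword";

interface IncomingMessage {
  text: string;
  mentionedAgentSlugs: string[];
}

export function shouldTriggerAgent(
  agentSlug: string,
  mode: TriggerMode,
  message: IncomingMessage,
  keywords: string[] = [],
): boolean {
  switch (mode) {
    case "always":
      // Agent responds to every message in the channel.
      return true;
    case "mention":
      return message.mentionedAgentSlugs.includes(agentSlug);
    case "keyword":
      return keywords.some((k) =>
        message.text.toLowerCase().includes(k.toLowerCase()),
      );
  }
}
```

message.send would call this before scheduling an agent turn, so non-triggering messages never touch the LLM path.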

Step 4: Streaming. Integrate Persistent Text Streaming so agent responses appear token-by-token.

Step 5: Tool framework. Build resolveAgentTools(). Create starter tools (getCurrentTime, searchMessages, listChannelMembers). Pass resolved tools to the Agent instance.
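The core of resolveAgentTools() is a join from agent_tool assignments to the tool registry, filtered by the agent's trust level. The row shapes and the numeric trust ordering below are assumptions sketched for illustration:

```typescript
// A sketch of resolveAgentTools(): join enabled agent_tool rows to the tool
// registry and drop tools above the agent's trust level. Shapes are assumed.
interface ToolDef {
  id: string;
  name: string;
  trustLevelRequired: number;
}

interface AgentToolRow {
  toolId: string;
  enabled: boolean;
}

export function resolveAgentTools(
  agentTrustLevel: number,
  assignments: AgentToolRow[],
  registry: Map<string, ToolDef>,
): ToolDef[] {
  return assignments
    .filter((a) => a.enabled)
    .map((a) => registry.get(a.toolId))
    // Drop dangling assignments whose tool no longer exists in the registry.
    .filter((t): t is ToolDef => t !== undefined)
    .filter((t) => t.trustLevelRequired <= agentTrustLevel);
}
```

The resolved list is what gets handed to the Agent instance, so a disabled or over-privileged tool simply never appears in the model's tool schema.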

Step 6: Workflow wrapping. Wrap agent turns in a Workflow for durability. Define agentTurnWorkflow with steps. Build a workflow status query for the UI.

Step 7: Cost accounting. Wire the real usageHandler. Configure Sharded Counter and Rate Limiter. Increment counters on every LLM call. Check rate limits. Build a usage query for the admin dashboard.
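Before incrementing the sharded counter, the usageHandler needs a cost figure for the call. A minimal sketch of that arithmetic, with placeholder per-million-token rates (real rates would come from a model pricing table):

```typescript
// A sketch of per-call cost accounting inside a usageHandler.
// Rates are placeholder numbers, not real model pricing.
interface Usage {
  promptTokens: number;
  completionTokens: number;
}

interface ModelRates {
  inputPerMillion: number;  // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

export function costForCall(usage: Usage, rates: ModelRates): number {
  const input = (usage.promptTokens / 1_000_000) * rates.inputPerMillion;
  const output = (usage.completionTokens / 1_000_000) * rates.outputPerMillion;
  return input + output;
}
```

The returned figure would be added to the account's counter shard and checked against the rate limiter's cost budget.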

Step 8: Audit trail. Wire the real rawResponseHandler. Add audit_event and account_policy tables. Write audit records on every agent generation.

Step 9: Knowledge (RAG). Configure the RAG component with namespace strategy. Build mergeKnowledgeLayers(). Wire into contextHandler. Create a searchKnowledge tool for agent-driven search. Build basic knowledge CRUD.
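The heart of mergeKnowledgeLayers() is combining search hits from the knowledge layers without duplicates. One plausible sketch, assuming layers are queried in priority order (most specific first) and entries carry a relevance score; the entry shape is an assumption:

```typescript
// A sketch of mergeKnowledgeLayers(): de-duplicate by entry id, keeping the
// hit from the higher-priority layer, then rank by score and truncate.
interface KnowledgeEntry {
  id: string;
  text: string;
  score: number;
}

export function mergeKnowledgeLayers(
  layers: KnowledgeEntry[][], // ordered most specific -> least specific
  limit: number,
): KnowledgeEntry[] {
  const seen = new Set<string>();
  const merged: KnowledgeEntry[] = [];
  for (const layer of layers) {
    for (const entry of layer) {
      if (!seen.has(entry.id)) {
        seen.add(entry.id);
        merged.push(entry);
      }
    }
  }
  // Rank the de-duplicated pool by relevance and keep the top entries.
  return merged.sort((a, b) => b.score - a.score).slice(0, limit);
}
```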

Step 10: Document ingestion pipeline. File upload to R2. Background action to parse documents. Feed parsed text to RAG. Track ingestion status.

Step 11: Delegation. Build the delegation tool type. Enforce delegationSet and maxDelegationDepth. Delegate results flow back to the coordinator's context.
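The delegation guard described in Step 11 can be expressed as one pure check run before the delegate tool is invoked. The shapes and reason strings below are illustrative:

```typescript
// A sketch of the delegation guard: verify delegationSet membership, the
// depth limit, and the absence of cycles in the call chain.
interface DelegationCheck {
  allowed: boolean;
  reason?: string;
}

export function checkDelegation(
  delegateId: string,
  delegationSet: string[],
  callChain: string[],       // agent ids from root coordinator to current agent
  maxDelegationDepth: number,
): DelegationCheck {
  if (!delegationSet.includes(delegateId)) {
    return { allowed: false, reason: "delegate not in delegationSet" };
  }
  if (callChain.length >= maxDelegationDepth) {
    return { allowed: false, reason: "maxDelegationDepth exceeded" };
  }
  if (callChain.includes(delegateId)) {
    return { allowed: false, reason: "cycle detected in delegation chain" };
  }
  return { allowed: true };
}
```

A denial is returned to the coordinator as a tool error rather than thrown, so the model can reason about the refusal.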

Step 12: Governance policies. Implement the layered policy enforcement model (account, team, user). Policy CRUD. Policy evaluation in the workflow's policy check step. Content restrictions, tool restrictions, trigger mode restrictions, cost limit policies.
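The tighten-only rule for team policies has a simple shape for tool restrictions: a lower layer can only intersect, never widen, the set the layer above allows. A sketch under assumed shapes (user-level layering would compose the same way):

```typescript
// A sketch of tighten-only policy layering for tool access: a team policy can
// remove tools the account allows but never add ones it denies.
interface ToolPolicy {
  allowedTools: Set<string>;
}

export function effectiveToolPolicy(
  account: ToolPolicy,
  team?: ToolPolicy,
): ToolPolicy {
  if (!team) return { allowedTools: new Set(account.allowedTools) };
  // Intersection: a tool survives only if every layer allows it.
  const allowedTools = new Set(
    [...account.allowedTools].filter((t) => team.allowedTools.has(t)),
  );
  return { allowedTools };
}
```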

25. Scenarios and execution patterns

25.1 Standalone constrained agent (Scheduler)

A single-purpose agent with calendar tools. No delegation, no jobs. Execution: immediate mode only. Governance: per-assignment tool restrictions determine read-only vs full access. Job system involvement: none.

25.2 Agent as workflow trigger (Research Agent)

The agent reasons about a research brief and dispatches to an n8n workflow. Execution: deferred mode. Job created with five subjobs (one per article). Agent registered as observer with partial_and_completion. After the third article completes, hasUsefulOutput flips and the agent is notified. The follow-up interaction evaluates partial output against the user's deadline.

25.3 PA with sequential delegation

A personal assistant checks the calendar, then conditionally dispatches research. Execution: interactive mode with mixed delegation. The PA stores the calendar result in the job's context bundle. Follow-up interaction has both the research output and the calendar context.

25.4 Customer-facing support agent

A support agent handles family enquiries about aged care services. Execution: interactive mode with potential deferred handoff. The escalation creates a job with a single subjob (human resolution). The support agent and a monitoring agent register as observers. Channel adaptation: response formatting adapts to the surface (structured card on web, plain text on WhatsApp).

25.5 User-composed agent in Agent Studio

A team lead builds a "Project Coordinator" by selecting a PA template, attaching scheduler and research agents as delegates, and uploading project documents as knowledge. Agent Studio validates the delegation graph for cycles. The team lead can restrict the research agent to specific content categories.

25.6 Shared specialist with multiple consumers

The same scheduler agent serves the finance team (read-only), engineering team (full access via their PA), and the CEO's personal PA (full access including cross-calendar events). Each consumer accesses the scheduler through a different agent_assignment. Knowledge acquired in one context is scoped to that assignment and does not leak to another.

25.7 End-to-end multi-layer example

A user in a team channel asks the Practice Assistant: "Can you check if Mrs Chen has any appointments next week and send her a reminder about her orthotics fitting?"

  1. Universal event. Message written to message table.
  2. Triage. Tier 1 classifies as tool_request with high confidence. Falls through to Tier 3 because it requires writes.
  3. Agent turn scheduled. Practice Assistant is @mentioned. Workflow started.
  4. Context assembly. contextHandler derives knowledge scope from the team channel.
  5. Reasoning and tool execution. Agent calls lookupPatient (Layer 1, external REST), calls getAppointments (Layer 1, external REST), finds the appointment, decides to delegate the reminder to the Mail Agent (Layer 2, agent delegation).
  6. Delegation. Mail Agent applies its governance (account policy: draft and confirm before sending), drafts the email, surfaces the confirmation.
  7. User confirms. Mail Agent sends via sendEmail (Layer 1, external tool via Postmark).
  8. Response. Practice Assistant posts the summary.
  9. Audit. Every step recorded: tool calls, delegation, governance check, email send, token usage.

26. Open questions

  • Manifest storage in Convex. Should the YAML manifest be stored as-is in a string field on the agent table, or decomposed into normalised fields? Storing the raw manifest preserves portability (exportable). Decomposing makes querying easier. A hybrid approach (store both) adds redundancy but gives the best of both.
  • External agent authentication for callbacks. When an external agent calls back into the Platform API: short-lived JWT issued at execution time (simplest), longer-lived API key (more flexible for background work), or both depending on the use case?
  • External agent state management. The contract does not specify how external agents manage their own state across interactions. Options: the external agent manages entirely (Thinklio does not care), Thinklio provides a state storage API, or the context bundle carries forward state.
  • Composition depth for external agents. Can an external agent delegate to another external agent via the Platform API? Does the existing delegation_depth mechanism extend to external agents?
  • Manifest validation depth. Schema validation (always), semantic validation (referenced tools/agents exist), and execution validation (health check passes, endpoint reachable): how much should run synchronously at registration time vs asynchronously?
  • Studio extensions: next iteration. Candidates for the Agent Studio's next exposed manifest fields: delegation configuration, knowledge library assignments, model preference, governance overrides, capability level.
  • Unification of tools and agents in the Integration API. Tools and agents share structurally similar execution contracts. Should they be unified as "capabilities" with different capability levels? One registration endpoint, one execution contract, one health monitoring system.
  • Plugin and extension architecture. Cowork mode and IDE integrations introduce user-installable plugins. Design pass needed for how plugin skills, MCP connectors, and commands register into an account. Tentative placement: doc 09 External API & Tool Integration.
  • Doc 03 splitting. At ~216 KB this is the largest consolidated document. Pre-approved split options: (a) extract attention surfacing (section 10) and smart input triage (section 9) into a sibling "Agent Behaviour & Triage" doc, or (b) extract the job system (section 8) and delegation mechanics into a sibling "Agent Composition & Job System" doc. Decision deferred to the review gate after step 4 of the consolidation plan.

27. Revision history

  • 2026-03-14: Original Job System & Agent Composition (doc 15 v0.1.0) published.
  • 2026-03-20: Agent Definition & Extensibility discussion document (doc 33 v0.1.0) published.
  • 2026-03-26: Agent Architecture conceptual reference (doc 40 v0.1.0) published.
  • 2026-03-26: Attention Surfacing: Philosophy, Components & Implementation (doc 37 v0.1.0) published.
  • 2026-03-28: Smart Input Triage: Computational Parsing and Progressive Execution (doc 42 v0.2.0) published, incorporating design review.
  • 2026-04-02: Agent Component Integration Plan (doc 45 v0.2.0) published.
  • 2026-04-02: Agent Extensibility & Composability Model (doc 46 v0.1.0) published.
  • 2026-04-16: Consolidated docs 15, 33, 37, 40, 42, 45, and 46 into this document (v1.0.0). All sources archived.