System Architecture¶
Overview¶
Thinklio is a multi-tenant AI agent platform. This document is the canonical reference for the system's architecture: the platform stack, the service topology, the execution model, the communication patterns, and the deployment topology. It consolidates the former Architecture Overview (old doc 02), System Architecture (old doc 04), Convex Migration Architecture (old doc 43), Greenfield Convex-First Architecture (old doc 44), and Execution Tiers and Workflow Budget (old doc 54) into a single authoritative reference.
The platform runs on three managed services. Convex (Cloud, Ireland) is the reactive TypeScript backend that owns all application logic and data: schema, server functions, reactive queries, real-time subscriptions, native vector and full-text search, durable workflows, scheduled jobs, and HTTP endpoints. Clerk is the identity layer, managing user registration, authentication, organisations, roles, permissions, and pre-built auth UI. Cloudflare R2 is the object store for documents, uploads, and media, accessed via presigned URLs. Clients connect directly to Convex over WebSocket for reactive state and to Clerk for auth, with ConvexProviderWithClerk on the web (Next.js 15, React 19) keeping the two in sync. The mobile stack (Flutter, planned) uses the Clerk Dart SDK and convex_flutter for the equivalent experience.
The core abstraction is messaging: a channel is a conversation space with participants, and users and agents are both first-class participants in channels. A direct chat with an agent, a team room where an agent observes and contributes, and an organisation-wide agent service are all the same structural thing: messages in a channel. This unifies user-agent interaction, agent-to-agent delegation, and team collaboration under a single model.
The agent execution loop is durable but tiered. Interactive work, which is to say any channel where a human is waiting for a response, runs on a fast path using direct Convex mutations and actions with the Action Retrier wrapping the LLM call. This preserves responsiveness and avoids burning Convex's shared Workflow and Workpool slot pool. Durable work, which is work that genuinely needs step-by-step journalling and crash recovery, runs on the Workflow component: delegation chains, inbound email processing, multi-step document ingestion, scheduled agent turns, and promoted interactive turns that require durability. Background work that outgrows the Convex slot ceiling migrates to an external queue (Google Cloud Tasks or a BullMQ worker on Hetzner) that calls back into Convex. The three tiers together give the platform a concrete growth path from hundreds to thousands to millions of interactions without a fundamental architecture change.
Governance, tenancy, cost control, and audit are implemented as middleware over the Convex function layer. Every query and mutation that touches account-scoped data is wrapped with a customQuery or customMutation from convex-helpers that validates the caller's Clerk identity, resolves the active organisation, and injects an assertPermission helper and tenant-scoped context. Account policies live as regular Convex documents, which means reactive queries cache them automatically and a policy change propagates to every active evaluator on the next query cycle. Audit records are written via database triggers on significant mutations (agent messages, tool calls, delegations, policy evaluations, membership changes) and streamed out of Convex via Fivetran CDC to an external Postgres or data warehouse for long-term retention and compliance reporting. Real-time observability uses Convex log streams piped to Axiom.
Knowledge is organised into four layers: Agent (the agent's own domain expertise, system prompt, and learned workflows), Account (organisation-wide policies, procedures, and brand voice), Team (collective project context and client details), and User (individual preferences and private notes). Account policies override everything else. Agent knowledge comes next, then team, then user. The RAG component holds embeddings in four namespaces aligned with these layers, and context assembly at agent turn time retrieves from every applicable namespace and merges results per the precedence rule. User knowledge is private to the individual user within a team; team knowledge is private between teams within an account. Knowledge extracted inside a delegate's interaction is scoped to the delegate's context, never the coordinator's.
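The precedence rule can be sketched as a pure merge function. Everything below is illustrative (the types and names are hypothetical, not the platform's API): results from each applicable namespace are merged so that an Account entry wins over an Agent entry for the same key, then Team, then User.

```typescript
// Illustrative sketch of the merge step in context assembly.
// Precedence: Account > Agent > Team > User. Names are hypothetical.
type Layer = "account" | "agent" | "team" | "user";

interface KnowledgeHit {
  layer: Layer;
  key: string; // topic or chunk identifier
  content: string;
}

const PRECEDENCE: Layer[] = ["account", "agent", "team", "user"];

function mergeByPrecedence(hits: KnowledgeHit[]): KnowledgeHit[] {
  const chosen = new Map<string, KnowledgeHit>();
  // Visit layers from highest to lowest precedence; the first layer to
  // claim a key wins, so lower layers never override it.
  for (const layer of PRECEDENCE) {
    for (const hit of hits.filter((h) => h.layer === layer)) {
      if (!chosen.has(hit.key)) chosen.set(hit.key, hit);
    }
  }
  return [...chosen.values()];
}

const merged = mergeByPrecedence([
  { layer: "user", key: "tone", content: "casual" },
  { layer: "account", key: "tone", content: "formal brand voice" },
  { layer: "team", key: "client", content: "Acme project context" },
]);
// merged keeps the account-level "tone" entry and the team "client" entry
```

The real assembly step works on retrieval results from the RAG component's four namespaces, but the precedence logic is the same shape.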
The Convex component ecosystem does most of the heavy lifting that would otherwise be custom infrastructure. The Agent component handles the LLM round-trip, prompt assembly, tool call loop, and streaming. The Workflow component provides durable step-by-step journalling and crash recovery. The RAG component owns namespaced retrieval. The Workpool component manages parallelism for document ingestion and notification fan-out. Rate Limiter covers API throttling. Persistent Text Streaming streams LLM output token-by-token over the same WebSocket subscription that delivers stored messages. Crons covers scheduled work. Aggregate and Sharded Counter power real-time usage counters. Action Retrier covers resilient external API calls. Migrations handles schema evolution on live data. Community components (Audit Log, Webhook Sender, LLM Cache, Expo Push Notifications) fill specific needs.
The deployment topology is essentially just managed services. Convex Cloud (Ireland) runs the backend; Clerk runs identity; Cloudflare R2 stores files. Next.js is hosted on Vercel or equivalent; the Flutter app publishes to the App Store and Play Store once the mobile stack is validated. Fivetran streams audit data to an external Postgres for compliance. Axiom receives Convex log streams. There are no servers to provision, patch, or monitor on the hot path. For enterprise customers requiring data isolation, the Convex open-source backend runs in Docker on Hetzner with Postgres as the backing store; Clerk Cloud stays in place or customer IdPs connect via OIDC. The enterprise deployment cost envelope is approximately 50 to 150 euros per month for compute plus the Clerk subscription.
The current Thinklio build sits on this Convex-first stack. The previous Go-service architecture (Gateway, Agent, Context, Tool, Queue, Usage services over Supabase Postgres and Redis Streams) is being retired. That design, and the two-plane Convex migration proposal that preceded the greenfield rebuild, are preserved in sections 20 and 21 of this document as archival context so that nothing is lost and so that engineers maintaining any legacy deployment still have a reference.
Table of contents¶
- 1. Purpose and context
- 2. Architectural history
- 3. Design principles
- 4. Platform stack
- 5. High-level component map
- 6. Authentication and organisation model
- 7. Messaging architecture
- 8. Agent architecture (summary)
- 9. Knowledge architecture (summary)
- 10. Governance and policy (summary)
- 11. File storage
- 12. Execution tiers and workflow budget
- 13. Communication patterns
- 14. Convex component map
- 15. Observability and data export
- 16. Deployment topology
- 17. Schema summary
- 18. Implementation plan (summary)
- 19. Resolved design questions
- 20. Legacy architecture (archival)
- 21. Migration context (archival)
- 22. Open questions
- 23. Revision history
1. Purpose and context¶
This document describes the full Thinklio architecture, from design principles and high-level component layout through to internal service design, deployment topology, and operational concerns. It is the authoritative reference for how the system is assembled. It does not repeat, in full, material that belongs in adjacent documents: the detailed schema lives in doc 04 Data Model, persistence and storage in doc 05 Persistence, Storage & Ingestion, the event and messaging model in doc 06 Events, Channels & Messaging, security and governance depth in doc 07 Security & Governance, the agent architecture in doc 03 Agent Architecture & Extensibility, and Convex platform specifics in doc 11 Convex Reference. Where this document touches those topics, it does so at architecture level with pointers down into the detail.
The original doc 02 (Architecture Overview) and doc 04 (System Architecture) were separate public-facing and internal documents that maintained parallel copies of overlapping material and drifted. They were merged on 26 March 2026 into a single authoritative architecture document and then further absorbed the Convex-first material (doc 44), the execution tiers design (doc 54), and the two-plane migration proposal (doc 43) into this consolidation on 16 April 2026. This document supersedes all five.
2. Architectural history¶
Thinklio's architecture has evolved through three recognisable stages. This history matters because it explains why the codebase still contains Go service entry points, Supabase migrations, and Redis patterns alongside the current Convex code, and because some of the conceptual work done during the earlier stages remains valid in the current design.
2.1 Stage 1: Go services on Supabase and Redis¶
The original design (old doc 04) was a distributed Go service architecture: a Gateway service handled channel protocols and external API surfaces; an Agent service ran the durable execution harness and LLM calls; a Context service assembled multi-layer knowledge; a Tool service evaluated policies and executed tool calls; a Queue service managed asynchronous work and scheduled tasks; a Usage service metered cost and enforced budgets. Services communicated over Redis Streams as an event bus and synchronous HTTP for tight couplings. All persistent state lived in Supabase Postgres with pgvector for semantic search, row-level security for tenant isolation, Supabase Auth for identity, and Supabase Vault for credentials. Cloudflare R2 provided object storage.
This design was deployable and correct. During early development, all logical services ran inside a single Go binary (cmd/server) on a Hetzner VPS managed by Coolify, with external Supabase Cloud (Frankfurt) and Cloudflare R2. Splitting into independent service binaries for general availability required only a deployment change, not an architectural change, because services communicated through events and explicit HTTP APIs.
The design's weakness was structural. The agent hot path crossed eight network hops: Gateway -> Redis -> Agent -> Postgres -> Redis -> LLM -> Postgres -> Redis -> Gateway. A three-tier caching strategy (in-process, Redis, Postgres) existed specifically to mitigate the latency of that chain, and the caching strategy added its own complexity: TTL management, invalidation bugs, stale data risks. For an AI agent platform where the interaction loop is the product, the latency floor was a problem.
Beyond latency, four secondary pressures were accumulating. Supabase was being used as a Postgres host without utilising Supabase Realtime, Storage, Edge Functions, or PostgREST, which meant the platform paid for a managed stack while using a fraction of it. The custom HarnessExecutor was approximately 200 lines of careful Go that replicated functionality available in purpose-built workflow engines. Knowledge retrieval via pgvector worked but was a general-purpose database doing vector similarity rather than a purpose-built engine. Redis Streams as the event bus for the agent execution path added a relay hop for client-facing delivery.
2.2 Stage 2: The two-plane Convex migration (proposed, not adopted)¶
The first attempt at addressing the latency and complexity pressures was a two-plane design (old doc 43): move the agent execution hot path to Convex while keeping the Go platform services, Postgres, and Redis for governance, billing, administration, and audit. Convex would own thread and message state, context assembly, knowledge retrieval via RAG namespaces, the durable execution loop, LLM orchestration, streaming, and background processing. Go would retain authentication via Keycloak, the policy service for trust-level and delegation checks, the usage service for cost metering, the audit event store, and platform administration. The two planes would communicate over HTTP.
The two-plane design was analysed, prototyped in part during the knowledge-layer phase, and ultimately superseded. The reasons for the supersession were practical. Maintaining two backend languages (Go and TypeScript) for a small team was expensive. The policy check HTTP call from Convex to Go on every tool execution introduced a coordination point that only mattered for budget, trust-level, and delegation checks, and those could be implemented as Convex custom function middleware reading cached policy documents with zero network overhead. The authentication migration from Supabase Auth to Keycloak was a significant piece of work, and Clerk offered the same capability plus pre-built UI plus first-class Convex integration with a much smaller operational surface. Postgres self-hosting for the system-of-record plane added infrastructure that was not obviously better than Convex plus Fivetran CDC streaming to an external warehouse.
The two-plane design is preserved as archival context in section 21 because the analysis of what Convex does well (reactive queries, native workflows, vector and text search, component ecosystem, enterprise trajectory) and the list of integration points and risks remain valid.
2.3 Stage 3: The Convex-first greenfield rebuild (current)¶
The current architecture (old doc 44, consolidated here) is a greenfield rebuild centred on three managed services: Convex for all application logic and data, Clerk for authentication and organisation management, and Cloudflare R2 for file storage. Redis is eliminated. Go services are eliminated from the initial build, with the option to add thin Go services later if specific integration needs arise. The architecture is messaging-first: the core abstraction is a channel with user and agent participants. All agent interaction, knowledge retrieval, tool execution, and governance enforcement happens within this messaging model, powered by Convex's reactive database and component ecosystem.
The goal is maximum performance, minimum moving parts, and a codebase that is straightforward for both humans and LLMs to reason about.
Execution tiering (old doc 54) was added on 6 April 2026 to manage the Convex Professional plan's 100-slot ceiling on concurrent Workflow and Workpool executions. Interactive work runs on Tier 1 (fast path, zero slots). Durable work runs on Tier 2 (Workflow component, slotted and budgeted). Background volume migrates to Tier 3 (external queue) when monitoring shows it needs to. The tiering model preserves responsiveness for users while giving the platform a concrete path to scale beyond the Professional plan.
3. Design principles¶
These principles express how the current architecture makes decisions. Several principles carry forward from the original Go-service design because they are independent of the specific stack.
3.1 Messaging-first¶
Every interaction in Thinklio flows through the messaging system. Agents are first-class participants in channels alongside users. A user asking an agent a question, an agent delivering a research report, and a team discussing a project with an agent observing and contributing are all the same thing: messages in a channel. This unifies the interaction model and eliminates the distinction between "chatting with an agent" and "using the platform."
3.2 Reactive by default¶
Convex's reactive query engine means every client subscription automatically updates when underlying data changes. There is no caching layer to build or maintain, no pub/sub system, no event bus. When a message is written, every subscriber to that channel sees it. When a policy rule changes, every function that reads that rule sees the new value on its next evaluation. Reactivity is structural, not bolted on.
3.3 Single source of truth¶
All application state lives in one Convex project. One schema, one set of functions, one type system from database through to client. There are no cross-service contracts to maintain, no cache invalidation bugs, no eventual consistency between stores. The entire data model and all business logic are visible in a single codebase.
3.4 Managed infrastructure¶
Authentication (Clerk), application logic and data (Convex), and file storage (R2) are all managed services. There are no servers to provision, patch, or monitor on the hot path. The operational surface is configuration, not infrastructure. This lets a small team focus entirely on product.
3.5 Governance as middleware¶
Policy enforcement, tenant isolation, cost controls, and audit logging are implemented as Convex custom function middleware. They execute in-process on every function call, reading cached policy documents with zero network overhead. Governance is pervasive but invisible to the developer writing business logic.
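The wrapper pattern can be sketched without the real convex-helpers API (all names below are hypothetical): a higher-order function resolves the caller's identity once, rejects calls with no active organisation, and hands the business-logic handler a tenant-scoped context.

```typescript
// Hypothetical sketch of governance-as-middleware. The real implementation
// uses customQuery/customMutation from convex-helpers; this shows the shape.
interface Identity {
  userId: string;
  orgId: string | null;
  permissions: string[];
}

interface AccountCtx {
  accountId: string;
  assertPermission: (p: string) => void;
}

function withAccount<A, R>(
  handler: (ctx: AccountCtx, args: A) => R,
): (identity: Identity | null, args: A) => R {
  return (identity, args) => {
    if (!identity || !identity.orgId) throw new Error("No active organisation");
    const ctx: AccountCtx = {
      accountId: identity.orgId,
      assertPermission: (p) => {
        if (!identity.permissions.includes(p)) {
          throw new Error(`Missing permission: ${p}`);
        }
      },
    };
    return handler(ctx, args);
  };
}

// Business logic only ever sees the scoped context.
const listAgents = withAccount((ctx, _args: {}) => {
  ctx.assertPermission("agent:use");
  return `agents for ${ctx.accountId}`;
});
```

In the real middleware the identity comes from `ctx.auth.getUserIdentity()` and the permission set from cached policy documents, so the check adds no network hop.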
3.6 Event-sourced audit¶
Even though the hot-path data model is reactive, the audit trail is still event-sourced. Every significant mutation (agent message written, tool call executed, delegation opened, policy evaluated, membership change) produces an immutable audit record with actor, resource, detail, and timestamp. Audit records are written via database triggers so they cannot be forgotten, and they stream out of Convex via Fivetran CDC for long-term retention. This preserves complete audit trails, enables event replay for compliance investigations, and provides the source data for usage analytics.
3.7 Channel agnosticism¶
The platform abstracts communication channels behind a universal internal message format. A message from the web app, a Telegram webhook, an email via Postmark, a voice transcription, and an API call all become the same kind of message once they cross the bridge adapter. Agents are channel-unaware: they process messages and produce responses. The bridge layer handles channel-specific protocols, formatting, and identity linking.
3.8 Multi-tenancy by design¶
Every table, every index, every function is scoped to an account (Clerk organisation) or a sub-scope within one (team, user, channel). Isolation is enforced at the application layer in every Convex query and mutation via the accountQuery / accountMutation wrappers, which assert an active Clerk organisation and inject the account context. A four-tier role model (owner, admin, editor, viewer in the legacy design; admin, member plus custom permissions in Clerk) governs access within each account. Multi-tenancy is structural, not bolted on.
3.9 Unified capability model¶
Tools and agents share a single manifest format and execution contract. From the platform's perspective, invoking a Convex-internal tool, calling an external tool over HTTP, and delegating to another agent all follow the same pattern: a capability is described by a manifest, executed through the Workflow component, and tracked with identical governance, cost attribution, and audit trail. This unification simplifies composition and eliminates special cases.
3.10 Interactive work is sacred¶
Chat-like interactions, which is to say any channel where a human is waiting for a response, must never compete with background processing for execution capacity. These interactions use the fast path: direct mutations and actions, no Workflow overhead. This principle drives the execution tier model in section 12.
3.11 Workflows for durability, not convenience¶
A Workflow slot should only be consumed when the work genuinely requires step-by-step journalling and crash recovery. If the failure mode is "the user sees no response and retries," a direct action with the Action Retrier is sufficient and cheaper. Workflow usage is budgeted.
3.12 External queues as escape hatch, not crutch¶
The external queue path (Tier 3) exists for when volume demands it, not as a default architectural choice. Moving work external adds operational complexity: a queue to manage, a worker to deploy, callback endpoints to secure. Each channel or job type starts Convex-native and migrates external only when monitoring shows it needs to.
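The three principles above (3.10 through 3.12) reduce to a small decision function. This is an illustrative sketch, not platform code; the field names are invented for the example.

```typescript
// Illustrative tier selection per sections 3.10-3.12. Names are hypothetical.
type Tier = 1 | 2 | 3;

interface Job {
  humanWaiting: boolean;      // is a human watching the channel right now?
  needsJournalling: boolean;  // does partial completion have value on crash?
  exceedsSlotBudget: boolean; // has monitoring shown Workflow slot pressure?
}

function selectTier(job: Job): Tier {
  if (job.humanWaiting) return 1;      // fast path: direct mutations/actions
  if (job.exceedsSlotBudget) return 3; // external queue calling back into Convex
  if (job.needsJournalling) return 2;  // Workflow component, slotted and budgeted
  return 1;                            // default: stay Convex-native
}
```

The ordering encodes the principles: interactive work never competes for slots, and work only leaves Convex when monitoring shows it must.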
4. Platform stack¶
| Layer | Technology | Purpose |
|---|---|---|
| Auth and identity | Clerk | User auth, organisation management, RBAC, SSO, pre-built UI components |
| Application platform | Convex (Cloud, Ireland) | Database, server functions, reactive queries, real-time subscriptions, vector search, full-text search, scheduling, durable workflows, HTTP endpoints |
| File storage | Cloudflare R2 | Document uploads, generated files, media. Accessed via presigned URLs. |
| Client, web | Next.js 15, React 19, TypeScript, Tailwind CSS 4 | Web application at app.thinklio.ai |
| Client, mobile | Flutter | iOS/Android app using convex_flutter for reactive subscriptions and Clerk Dart SDK for auth |
| LLM providers | OpenRouter / Anthropic API | Called from Convex actions via HTTP |
| Observability | Axiom (via Convex log streams) | Function execution logs, performance metrics, alerting |
| Analytics archive | Postgres or data warehouse (via Fivetran CDC) | Long-term audit trail, usage analytics, compliance archive |
| Email delivery | Postmark | Transactional email, inbound email channel |
4.1 What is not in the stack¶
| Removed | Replaced by |
|---|---|
| Redis | Convex reactive caching + Rate Limiter component |
| Supabase (Postgres + Auth) | Convex (data) + Clerk (auth) |
| Go services (Gateway, Agent, Context, Tool, Queue, Usage) | Convex functions + components |
| Custom event bus (Redis Streams) | Convex reactivity + triggers |
| Custom HarnessExecutor | Convex Workflow component |
| pgvector | Convex native vector search + RAG component |
| Multi-tier caching (L1/L2/L3) | Convex reactive query cache (automatic) |
| Keycloak (from the two-plane proposal) | Clerk |
5. High-level component map¶
┌─────────────────────────────────────────────────────────────────┐
│ Managed Services │
│ │
│ ┌──────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Clerk │ │ Convex │ │ Cloudflare │ │
│ │ (Auth) │ │ (Ireland) │ │ R2 (EU) │ │
│ │ │ │ │ │ │ │
│ │ Users │ │ Schema │ │ Documents │ │
│ │ Orgs │ │ Functions │ │ Uploads │ │
│ │ RBAC │ │ Workflows │ │ Media │ │
│ │ Webhooks │ │ RAG │ │ │ │
│ │ SSO │ │ Vector/text │ │ Presigned │ │
│ │ │ │ Crons │ │ URLs │ │
│ │ │ │ HTTP actions │ │ │ │
│ └────┬─────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
└────────┼──────────────────┼─────────────────────┼────────────────┘
│ │ │
┌────▼──────────────────▼─────────────────────▼────┐
│ Client Applications │
│ │
│ ┌─────────────┐ ┌──────────────┐ │
│ │ Next.js │ │ Flutter │ │
│ │ Web App │ │ Mobile App │ │
│ │ │ │ │ │
│ │ Convex + │ │ convex_ │ │
│ │ Clerk │ │ flutter + │ │
│ │ providers │ │ Clerk Dart │ │
│ └─────────────┘ └──────────────┘ │
└───────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────┐
│ Data Export (Async) │
│ │
│ Convex ──CDC──▶ Fivetran ──▶ Postgres / DW │
│ Convex ──log streams──▶ Axiom │
│ Convex ──webhooks──▶ External systems │
└───────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────┐
│ External Channel Bridges │
│ │
│ Postmark ──▶ Convex HTTP action ──▶ Channel │
│ Telegram ──▶ Convex HTTP action ──▶ Channel │
│ (WhatsApp, voice, SMS: future) │
└───────────────────────────────────────────────────┘
Single Convex project. Single Clerk instance. No servers to manage on the hot path. Bridges are Convex HTTP actions that translate external channel protocols into internal messages; they add no separate infrastructure until volume demands Tier 3 promotion.
6. Authentication and organisation model¶
6.1 Clerk as the identity layer¶
Clerk manages all user identity concerns: registration, login, MFA, session management, and the pre-built UI components for sign-in, user profile, and organisation management. Thinklio does not build any auth UI or manage any credentials. The Clerk Dart SDK ships for Flutter; ConvexProviderWithClerk handles the React side.
6.2 Organisation model¶
Clerk Organisations map directly to Thinklio accounts:
| Clerk concept | Thinklio concept |
|---|---|
| Organisation | Account |
| Organisation member | Account user |
| Organisation role | Account role |
| Organisation permission | Account permission |
| Active organisation | Active account (session context) |
Users can belong to multiple organisations and switch between them. Clerk provides the active organisation context in every session, which Convex receives via the JWT.
6.3 Roles and permissions¶
The free Clerk tier provides Admin and Member roles with custom permissions. This covers early development and initial customers. As Thinklio's permission model matures, the Enhanced B2B add-on ($100/month) unlocks custom roles, role sets (different role configurations per organisation tier), and unlimited members per organisation.
Initial permission set, defined in Clerk and enforced in Convex middleware:
| Permission | Description |
|---|---|
| `agent:manage` | Create, edit, delete agents |
| `agent:use` | Interact with assigned agents |
| `knowledge:manage` | Upload, edit, delete knowledge items |
| `knowledge:read` | Search and retrieve knowledge |
| `channel:manage` | Create, archive, configure channels |
| `channel:read` | View and participate in channels |
| `team:manage` | Create, edit teams and membership |
| `billing:manage` | View and manage subscription and billing |
| `billing:read` | View billing information |
| `admin:full` | Full account administration |
6.4 Convex integration¶
Clerk's first-class Convex integration is configured in convex/auth.config.ts:
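A minimal auth.config.ts follows Clerk's documented Convex integration pattern; the domain below is a placeholder for the deployment's actual Clerk issuer URL.

```typescript
// convex/auth.config.ts -- Clerk as a Convex auth provider.
// The domain is a placeholder; substitute the deployment's Clerk issuer URL.
export default {
  providers: [
    {
      domain: "https://<your-clerk-subdomain>.clerk.accounts.dev",
      applicationID: "convex",
    },
  ],
};
```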
In Convex functions, identity is read from ctx.auth:
const identity = await ctx.auth.getUserIdentity();
// identity.orgId → Clerk organisation ID (= Thinklio account ID)
// identity.orgRole → "org:admin" | "org:member" | custom
// identity.subject → Clerk user ID
The ConvexProviderWithClerk React component keeps auth state synchronised on the client. The Clerk Dart SDK does the same for Flutter.
6.5 Clerk webhooks and Convex sync¶
Key identity events sync from Clerk to Convex via a webhook HTTP action:
| Clerk event | Convex action |
|---|---|
| `organization.created` | Create `account_record`, seed starter agents |
| `organization.deleted` | Soft-delete account, archive data |
| `organizationMembership.created` | Create `user_profile`, assign default agents |
| `organizationMembership.deleted` | Remove from channels, revoke assignments |
| `user.updated` | Sync display name, avatar |
Svix (Clerk's webhook infrastructure) provides automatic retries with exponential backoff, webhook signature verification, and a delivery log. The Convex HTTP action receiving webhooks processes events idempotently using the event ID to deduplicate. A weekly cron calls the Clerk API to list current organisation memberships and syncs any discrepancies, which handles the edge case of a prolonged outage exhausting Svix's retry window. Real-world usage will inform whether the reconciliation frequency needs adjustment.
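Idempotent processing reduces to a dedupe check keyed on the event ID before any side effects run. A hypothetical sketch (in the real system the seen-set is a Convex table, not in-memory state):

```typescript
// Hypothetical sketch of idempotent webhook handling keyed on the event ID.
const processed = new Set<string>();
const applied: string[] = [];

function handleClerkEvent(eventId: string, type: string): "applied" | "duplicate" {
  if (processed.has(eventId)) return "duplicate"; // Svix retry; skip side effects
  processed.add(eventId);
  applied.push(type); // stand-in for the real mutation (create account, etc.)
  return "applied";
}

handleClerkEvent("evt_1", "organization.created");
const second = handleClerkEvent("evt_1", "organization.created"); // Svix retry
// second === "duplicate"; the side effect ran exactly once
```

Because Svix retries with backoff, the same event can legitimately arrive more than once; the dedupe check is what makes retries safe.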
7. Messaging architecture¶
The messaging system is the backbone of Thinklio. All user-agent interaction, team collaboration, and agent-to-agent communication flows through it. This section covers the model at architecture level; doc 06 Events, Channels & Messaging carries the full detail including notification rules, mention semantics, read state, presence, and the messaging UX state of the art.
7.1 Channels¶
A channel is a conversation space with participants, messages, and threads. Channels are scoped to an account (Clerk organisation).
| Channel type | Participants | Visibility | Use case |
|---|---|---|---|
| `direct` | Exactly 2 (user-to-user or user-to-agent) | Private to participants | 1:1 chat with an agent or another user |
| `private_group` | Invited users and/or agents | Private to members | Project team with embedded agents |
| `public_group` | Any account member can join | Visible to all account members | Company-wide announcements with agent assistance |
| `team` | All members of a team plus assigned agents | Visible to team members | Team workspace with team agents |
| `organisation` | All account members plus account-level agents | Visible to all account members | Organisation-wide agent services |
Schema sketch for the channel table (full schema in doc 04):
channel: defineTable({
accountId: v.string(), // Clerk org ID
type: v.union(
v.literal("direct"),
v.literal("private_group"),
v.literal("public_group"),
v.literal("team"),
v.literal("organisation"),
),
name: v.optional(v.string()),
description: v.optional(v.string()),
teamId: v.optional(v.id("team")),
createdBy: v.string(), // Clerk user ID
archived: v.boolean(),
metadata: v.optional(v.any()),
})
.index("by_account", ["accountId"])
.index("by_account_type", ["accountId", "type"])
.index("by_team", ["teamId"]),
7.2 Channel members¶
Members can be users or agents. Both are first-class participants.
channel_member: defineTable({
channelId: v.id("channel"),
memberType: v.union(v.literal("user"), v.literal("agent")),
memberId: v.string(), // Clerk user ID or Convex agent ID
role: v.union(
v.literal("owner"),
v.literal("admin"),
v.literal("member"),
v.literal("observer"), // can read but not write (useful for monitoring agents)
),
notifications: v.union(
v.literal("all"),
v.literal("mentions"),
v.literal("none"),
),
joinedAt: v.number(),
lastReadAt: v.optional(v.number()),
})
.index("by_channel", ["channelId"])
.index("by_member", ["memberType", "memberId"])
.index("by_channel_member", ["channelId", "memberType", "memberId"]),
7.3 Messages¶
Messages are the atomic unit of communication. A message can be from a user, an agent, or the system.
message: defineTable({
channelId: v.id("channel"),
threadId: v.optional(v.id("message")), // parent message for threaded replies
authorType: v.union(
v.literal("user"),
v.literal("agent"),
v.literal("system"),
),
authorId: v.string(),
content: v.string(),
contentType: v.union(
v.literal("text"),
v.literal("markdown"),
v.literal("rich"), // structured content (cards, buttons, etc.)
v.literal("file"), // file attachment reference
v.literal("action"), // tool call result, delegation result, etc.
v.literal("thinking"), // agent reasoning trace (visible to admins)
),
attachments: v.optional(v.array(v.object({
fileId: v.string(), // R2 object key
fileName: v.string(),
fileType: v.string(),
fileSize: v.number(),
url: v.optional(v.string()),
}))),
metadata: v.optional(v.any()),
edited: v.boolean(),
deleted: v.boolean(),
})
.index("by_channel", ["channelId", "_creationTime"])
.index("by_channel_thread", ["channelId", "threadId", "_creationTime"])
.index("by_author", ["authorType", "authorId", "_creationTime"])
.searchIndex("search_content", {
searchField: "content",
filterFields: ["channelId", "authorType"],
}),
7.4 Threads¶
Threads are implemented as messages with a threadId pointing to the parent message. A threaded conversation is retrieved by querying messages with a given threadId. The parent message itself has no threadId; it is the thread root.
This avoids a separate threads table and keeps the query model simple: all content is messages, some of which are threaded.
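The retrieval rule can be sketched as a filter over plain values (the types here are hypothetical stand-ins for message rows):

```typescript
// Hypothetical sketch: a thread is the root message (no threadId) plus
// every message whose threadId points at it.
interface Msg {
  id: string;
  threadId?: string; // absent on thread roots and unthreaded messages
  content: string;
}

function threadOf(messages: Msg[], rootId: string): Msg[] {
  const root = messages.find((m) => m.id === rootId && m.threadId === undefined);
  if (!root) throw new Error("Not a thread root");
  const replies = messages.filter((m) => m.threadId === rootId);
  return [root, ...replies];
}

const msgs: Msg[] = [
  { id: "m1", content: "root" },
  { id: "m2", threadId: "m1", content: "reply 1" },
  { id: "m3", content: "unrelated" },
];
const thread = threadOf(msgs, "m1"); // root plus its one reply
```

In Convex this filter is served by the by_channel_thread index on the message table, so thread retrieval is a single indexed query.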
7.5 Agents as channel participants¶
When an agent is added to a channel, it becomes a member just like a user. The agent can:
- Observe all messages in the channel (if its role permits)
- Respond when mentioned or addressed
- Proactively contribute based on its configuration (e.g., a monitoring agent that alerts when it detects a relevant pattern)
- Be mentioned by users with `@AgentName` syntax
- Mention other agents in the same channel to trigger cross-agent interaction
An agent's behaviour in a channel is governed by its configuration (system prompt, tools, knowledge layers) and the account's governance policies. The agent does not need to know what "channel" means in the infrastructure sense: it receives messages and produces responses within its execution context.
7.6 Real-time delivery¶
Clients subscribe to channel messages via Convex reactive queries.
When any participant (user or agent) writes a message, every subscriber sees it instantly via WebSocket. No polling, no pub/sub configuration, no event bus. This is Convex's core value proposition.
For LLM streaming (agent typing in real time), the Persistent Text Streaming component streams token-by-token output over the same WebSocket while persisting the final content to the message table.
7.7 External channel bridges¶
External messaging channels (Telegram, WhatsApp, email) are implemented as bridges: Convex HTTP actions that receive inbound webhooks and translate them to internal messages.
External Channel -> HTTP action (bridge) -> Internal message -> Channel
                                                                   |
                                Agent processes <- Reactive subscription
                                                                   |
                                Response message -> HTTP action (bridge) -> External Channel
Each bridge is a pair of functions: an inbound HTTP action (webhook receiver) and an outbound action (message sender). Bridges are thin translators; all business logic remains in the core messaging system. Bridge identity mapping (linking a Telegram user ID to a Clerk user) is stored in a channel_identity table.
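Because bridges are thin translators, the inbound half reduces to a pure mapping from the external payload to the internal message shape. A sketch for Telegram, with a hypothetical InternalMessage type (the chat.id/from.id/text fields are standard Telegram Bot API webhook fields):

```typescript
// Hypothetical internal message shape a bridge produces before identity
// mapping (channel_identity) resolves the external user to a Clerk user.
type InternalMessage = {
  externalChannel: "telegram";
  externalChatId: string;
  externalUserId: string;
  content: string;
};

// Translate a Telegram webhook update into an internal message.
// Returns null for updates the bridge ignores (edits, stickers, etc.).
function translateTelegramUpdate(update: {
  message?: { chat: { id: number }; from?: { id: number }; text?: string };
}): InternalMessage | null {
  const msg = update.message;
  if (!msg?.text || !msg.from) return null;
  return {
    externalChannel: "telegram",
    externalChatId: String(msg.chat.id),
    externalUserId: String(msg.from.id),
    content: msg.text,
  };
}
```

Keeping the translation pure makes it unit-testable independently of the HTTP action that wraps it.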
Bridge execution tier is declared per channel (see section 12). Telegram and any future conversational channel run Tier 1. Postmark inbound email runs Tier 2 because it is a multi-step pipeline (parse, resolve sender, resolve agent, create or continue thread, run agent turn, send response) whose partial completion has value and needs journalling rather than restart.
8. Agent architecture (summary)¶
Doc 03 Agent Architecture & Extensibility is the authoritative reference. This section gives the architecture-level summary so this document stands on its own.
8.1 Agent model¶
An agent is a persistent entity with an identity, configuration, and capabilities. Agents are account-scoped.
agent: defineTable({
accountId: v.string(),
name: v.string(),
description: v.optional(v.string()),
avatar: v.optional(v.string()),
systemPrompt: v.string(),
model: v.string(), // e.g., "anthropic/claude-sonnet-4"
modelConfig: v.optional(v.object({
temperature: v.optional(v.number()),
maxTokens: v.optional(v.number()),
})),
trustLevel: v.union(
v.literal("read"),
v.literal("standard"),
v.literal("elevated"),
v.literal("admin"),
),
status: v.union(
v.literal("active"),
v.literal("inactive"),
v.literal("draft"),
),
tier: v.union(v.literal("fundamental"), v.literal("applied")),
delegationSet: v.optional(v.array(v.id("agent"))),
maxDelegationDepth: v.optional(v.number()),
contextBudget: v.optional(v.number()), // max tokens for knowledge context
createdBy: v.string(),
metadata: v.optional(v.any()),
})
.index("by_account", ["accountId"])
.index("by_account_status", ["accountId", "status"]),
8.2 Tools¶
Tools are capabilities that agents can invoke. The unified capability model is preserved: internal functions, external API calls, and agent delegation all follow the same pattern.
tool: defineTable({
accountId: v.optional(v.string()), // absent = platform-wide tool
name: v.string(),
description: v.string(),
type: v.union(
v.literal("internal"), // Convex function
v.literal("external"), // External API via HTTP action
v.literal("agent"), // Delegate to another agent
),
schema: v.any(), // JSON Schema for parameters
endpoint: v.optional(v.string()),
handler: v.optional(v.string()), // Convex function reference for internal tools
requiresApproval: v.boolean(),
trustLevelRequired: v.union(
v.literal("read"),
v.literal("standard"),
v.literal("elevated"),
v.literal("admin"),
),
})
.index("by_account", ["accountId"])
.index("by_name", ["name"]),
agent_tool: defineTable({
agentId: v.id("agent"),
toolId: v.id("tool"),
enabled: v.boolean(),
config: v.optional(v.any()), // per-agent tool configuration overrides
})
.index("by_agent", ["agentId"])
.index("by_tool", ["toolId"]),
8.3 Agent execution flow¶
When a message triggers an agent (via mention, direct channel, or proactive rule), the execution flow is:
1. Trigger -> Message arrives in channel where agent is a participant
2. Tier routing -> agentRouter selects Tier 1 (fast path) or Tier 2 (durable workflow)
3. Context -> Assemble: system prompt + knowledge (RAG query) + recent messages + tool definitions
4. Policy check -> Middleware validates: account policies, trust level, cost budget (in-process cached queries)
5. LLM call -> Action calls LLM provider via HTTP, streams response via Persistent Text Streaming
6. Tool calls -> If the LLM requests tools: validate against trust level, execute, return results
7. Delegation -> If the LLM delegates: create a sub-workflow for the delegate agent (always Tier 2)
8. Response -> Final response written as a message in the channel
9. Audit -> Trigger writes `interaction` record and `step` details to audit tables
10. Metering -> Sharded counter increments for token usage, tool calls, etc.
On Tier 1 (fast path), steps run as direct mutations and actions with the Action Retrier wrapping the LLM call. On Tier 2 (durable), each step is journalled by the Workflow component and execution resumes from the last completed step after a crash.
Simplified Tier 2 execution:
export const executeAgentTurn = workflow(
{ name: "agentTurn" },
async (step, { channelId, messageId, agentId }) => {
const context = await step.run("assembleContext", async () => {
const agent = await getAgent(agentId);
const messages = await getRecentMessages(channelId, 50);
const knowledge = await ragQuery(agent, messages);
const tools = await getAgentTools(agentId);
return { agent, messages, knowledge, tools };
});
await step.run("policyCheck", async () => {
await assertAccountPolicy(context.agent.accountId, {
action: "agent_turn",
agentId,
trustLevel: context.agent.trustLevel,
});
await assertBudget(context.agent.accountId, agentId);
});
const llmResult = await step.run("llmCall", async () => {
return await callLLM({
model: context.agent.model,
systemPrompt: context.agent.systemPrompt,
messages: context.messages,
knowledge: context.knowledge,
tools: context.tools,
channelId, // for streaming output
});
});
let result = llmResult;
while ((result.toolCalls?.length ?? 0) > 0) {
const toolResults = await step.run("toolExecution", async () => {
return await executeToolCalls(result.toolCalls, context.agent);
});
result = await step.run("llmContinuation", async () => {
return await callLLM({ ...context, toolResults });
});
}
await step.run("writeResponse", async () => {
await writeMessage(channelId, agentId, result.content);
await recordInteraction(channelId, messageId, agentId, result);
await incrementUsage(context.agent.accountId, result.usage);
});
}
);
8.4 Agent composition and delegation¶
Agents are composed through delegation: one agent invokes another as a tool, passing through the same Workflow steps, policy evaluation, cost tracking, and audit trail as any other tool call.
Key properties:
- Delegates operate under the invoking context's restrictions (permissions narrow, never widen)
- Delegation depth is limited by account policy to prevent unbounded chains (maxDelegationDepth)
- Cycle detection operates at both configuration time and runtime
- Knowledge isolation is preserved: the delegate assembles its own context
- Costs roll up through the delegation chain to the originating user, team, and account
- Each agent in a delegation chain runs as its own Workflow so the coordinator can resume if a delegate crashes
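The depth and cycle rules can be sketched as a runtime guard evaluated before a delegate Workflow is spawned. This is a hypothetical helper (assertDelegationAllowed is not a name from the codebase); the real check also runs at configuration time:

```typescript
// Guard a delegation request against depth and cycle violations.
// `chain` is the list of agent IDs from the originating agent down to the
// current delegator; `maxDepth` comes from account policy (maxDelegationDepth).
function assertDelegationAllowed(
  chain: string[],
  delegateId: string,
  maxDepth: number,
): void {
  if (chain.includes(delegateId)) {
    // Runtime cycle detection: the delegate already appears in the chain.
    throw new Error(`Delegation cycle: ${[...chain, delegateId].join(" -> ")}`);
  }
  if (chain.length >= maxDepth) {
    throw new Error(`Delegation depth ${chain.length} reached limit ${maxDepth}`);
  }
}
```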
8.5 Agent assignment¶
Agents are assigned to users, teams, or the entire account. Assignments control which agents appear in a user's available agent list and which agents can be added to channels.
agent_assignment: defineTable({
agentId: v.id("agent"),
scope: v.union(
v.literal("user"),
v.literal("team"),
v.literal("account"),
),
scopeId: v.string(), // Clerk user ID, team ID, or Clerk org ID
toolRestrictions: v.optional(v.array(v.id("tool"))), // subset of agent's tools allowed for this assignment
})
.index("by_agent", ["agentId"])
.index("by_scope", ["scope", "scopeId"]),
Per-assignment tool restrictions allow the same agent to serve different contexts with appropriately scoped capabilities.
8.6 Agent capability levels¶
Capability levels describe how much freedom an agent has to compose its own solutions, carried forward from the original platform design:
- Knowledge and tools: answers questions and performs discrete actions within configured tool access.
- Workflow composition: chains tools to solve multi-step problems; may delegate to specialist agents.
- Experimental problem-solving: broader tool library; tries approaches beyond explicit configuration; governance limits scope.
- Learning: recognises successful workflows and codifies them as reviewable, shareable reusable patterns.
All capability levels use the same Workflow infrastructure, governance middleware, and accounting.
9. Knowledge architecture (summary)¶
Doc 05 Persistence, Storage & Ingestion is the authoritative reference for knowledge storage, the ingestion pipeline, document intelligence, and derived knowledge. This section gives the architecture-level summary.
9.1 Four-layer knowledge model¶
Every agent interaction draws from up to four knowledge layers:
| Layer | Owner | Mutability | Visibility | Examples |
|---|---|---|---|---|
| Agent | Platform / agent creator | Configured at setup, updated via learning | All users of this agent | Domain expertise, skills, system prompt, learned workflows |
| Account | Account admins | Curated, mostly static | All account members using this agent | Policies, procedures, compliance rules, brand guidelines |
| Team | Collective (team members) | Grows from interactions | Team members only | Project context, client details, shared decisions |
| User | Individual user | Grows from user's interactions | Private to that user only | Personal preferences, individual context, private notes |
Precedence: account policies override all other layers; agent, team, and user knowledge follow in descending priority.
Privacy: User knowledge is never visible to other users, even within the same team. Team knowledge is isolated between teams within the same account.
Portability: When a user leaves a team, their user knowledge goes with them. Contributions to team knowledge remain with the team.
Delegation isolation: Knowledge extracted during a delegate agent's interaction is scoped to the delegate's context.
9.2 Knowledge storage¶
Knowledge items are stored in Convex with vector embeddings for semantic retrieval:
knowledge_item: defineTable({
accountId: v.string(),
scope: v.union(
v.literal("account"),
v.literal("agent"),
v.literal("team"),
v.literal("user"),
),
scopeId: v.string(),
title: v.optional(v.string()),
content: v.string(),
source: v.optional(v.string()), // document ID, URL, manual entry, etc.
importance: v.optional(v.number()), // 0.0 to 1.0 weighting for retrieval ranking
embedding: v.array(v.float64()),
metadata: v.optional(v.any()),
})
.index("by_scope", ["scope", "scopeId"])
.index("by_account", ["accountId"])
.vectorIndex("by_embedding", {
vectorField: "embedding",
dimensions: 1536,
filterFields: ["accountId", "scope", "scopeId"],
})
.searchIndex("search_content", {
searchField: "content",
filterFields: ["accountId", "scope", "scopeId"],
}),
9.3 RAG component integration¶
The Convex RAG component provides the retrieval layer with namespacing, importance weighting, and chunk context. Each knowledge layer maps to a namespace:
const rag = new RAG(components.rag, {
  textEmbeddingModel: openai.embedding("text-embedding-3-small"),
  embeddingDimension: 1536,
});
// Namespaces are not configured up front; each knowledge layer supplies
// its namespace string per call, e.g.:
//   rag.add(ctx, { namespace: `team:${teamId}`, text });
//   rag.search(ctx, { namespace: `user:${userId}`, query, limit: 10 });
At query time, the agent's context assembly retrieves from all applicable namespaces and merges results according to the resolution hierarchy (account overrides agent overrides team overrides user).
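The merge step can be sketched as a pure ranking function over retrieval hits, ordering by layer precedence first and relevance second. The Hit shape and function name are hypothetical:

```typescript
// Hypothetical retrieval hit returned from one knowledge namespace.
type Hit = {
  layer: "account" | "agent" | "team" | "user";
  content: string;
  score: number; // relevance score from vector search
};

// Resolution hierarchy: account overrides agent overrides team overrides user.
const LAYER_RANK = { account: 0, agent: 1, team: 2, user: 3 } as const;

// Merge hits from all applicable namespaces: layer precedence first,
// relevance score second within each layer.
function mergeByPrecedence(hits: Hit[]): Hit[] {
  return [...hits].sort(
    (a, b) => LAYER_RANK[a.layer] - LAYER_RANK[b.layer] || b.score - a.score,
  );
}
```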
9.4 Token budget allocation¶
Token budget is allocated in priority order:
- System prompt and agent knowledge (fixed)
- Account policies (fixed, always included in full)
- Job context when a job is running (fixed when present)
- Conversation history (dynamic, recent messages prioritised)
- Team knowledge (dynamic, relevance-ranked)
- User knowledge (dynamic, relevance-ranked)
Context assembly fits within the agent's contextBudget and the model's context window.
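The priority-order allocation can be sketched as a greedy pass over budget segments: fixed segments are taken in full (or assembly fails), dynamic segments absorb whatever remains. Segment names and the allocateBudget helper are illustrative, not codebase names:

```typescript
// One context segment competing for the token budget, listed in priority order.
type Segment = { name: string; tokens: number; fixed: boolean };

// Allocate a token budget greedily: fixed segments must fit in full,
// dynamic segments (history, team/user knowledge) take what is left.
function allocateBudget(segments: Segment[], budget: number): Map<string, number> {
  const granted = new Map<string, number>();
  let remaining = budget;
  for (const s of segments) {
    if (s.fixed) {
      if (s.tokens > remaining) throw new Error(`Budget too small for ${s.name}`);
      granted.set(s.name, s.tokens);
      remaining -= s.tokens;
    } else {
      const take = Math.min(s.tokens, remaining);
      granted.set(s.name, take);
      remaining -= take;
    }
  }
  return granted;
}
```

With a 100-token budget and a 40-token fixed system segment, a 50-token history segment is granted in full and a 30-token team segment is truncated to the remaining 10.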
9.5 Document ingestion¶
Documents uploaded to R2 are processed through an ingestion pipeline managed by the Workpool component:
1. Upload -> File stored in R2, metadata written to Convex
2. Parse -> Action extracts text (PDF, DOCX, etc.) using appropriate parser
3. Chunk -> Text split into semantic chunks with overlap
4. Embed -> Action calls embedding model (e.g., text-embedding-3-small)
5. Store -> Chunks stored as knowledge_item records with embeddings
6. Derive -> Optional: LLM extracts structured facts from chunks
The Workpool manages parallelism (configurable concurrency limit), retries with backoff, and provides reactive job status so the UI can show ingestion progress. See doc 05 for the detailed pipeline, document intelligence, and derived-knowledge workflows.
// Document processing with Workpool
const pool = new Workpool(components.documentPool, {
  maxParallelism: 10,
  retryActionsByDefault: true,
  defaultRetryBehavior: { maxAttempts: 3, initialBackoffMs: 1000, base: 2 },
});
// Enqueue document for processing
await pool.enqueueAction(ctx, internal.documents.processDocument, {
  documentId,
  accountId,
  scope: "team",
  scopeId: teamId,
});
10. Governance and policy (summary)¶
Doc 07 Security & Governance is the authoritative reference. This section gives the architecture-level summary so the system's enforcement model is visible here.
10.1 Governance as custom function middleware¶
All governance enforcement happens via Convex custom functions from convex-helpers. Every query and mutation is wrapped with middleware that validates the caller's identity, organisation membership, and permissions before the function body executes.
import { customQuery, customMutation } from "convex-helpers/server/customFunctions";
// Base authenticated wrapper
export const authedQuery = customQuery(query, {
args: {},
input: async (ctx) => {
const identity = await ctx.auth.getUserIdentity();
if (!identity) throw new ConvexError("Unauthenticated");
return {
ctx: {
userId: identity.subject,
accountId: identity.orgId,
role: identity.orgRole,
},
args: {},
};
},
});
// Account-scoped wrapper (adds tenant isolation)
export const accountQuery = customQuery(query, {
args: {},
input: async (ctx) => {
const identity = await ctx.auth.getUserIdentity();
if (!identity) throw new ConvexError("Unauthenticated");
if (!identity.orgId) throw new ConvexError("No active organisation");
return {
ctx: {
userId: identity.subject,
accountId: identity.orgId,
role: identity.orgRole,
assertPermission: (permission: string) => {
if (!hasPermission(identity, permission)) {
throw new ConvexError(`Missing permission: ${permission}`);
}
},
},
args: {},
};
},
});
Every function that accesses account-scoped data uses accountQuery or accountMutation, which ensures the caller has an active organisation and injects the account context. The function then filters data by accountId and validates permissions.
10.2 Account policies¶
Account policies are stored as Convex documents. Because they are read by queries, the reactive engine caches them automatically. A policy change propagates to all active function evaluations without any manual cache invalidation.
account_policy: defineTable({
accountId: v.string(),
type: v.union(
v.literal("content_restriction"),
v.literal("tool_restriction"),
v.literal("delegation_restriction"),
v.literal("cost_limit"),
v.literal("operating_hours"),
v.literal("approval_gate"),
),
rule: v.any(), // policy-type-specific rule definition
enabled: v.boolean(),
priority: v.number(),
})
.index("by_account", ["accountId"])
.index("by_account_type", ["accountId", "type"]),
The policy check step in agent execution reads applicable policies via a cached query and evaluates them in-process. No HTTP call, no Redis lookup, no latency.
10.3 Cost controls and usage metering¶
Usage is tracked in real time using the Sharded Counter and Aggregate components:
const counters = new ShardedCounter(components.shardedCounter);
// After each agent turn (counter keys are namespaced per account):
await counters.add(ctx, `tokens:${accountId}`, result.usage.totalTokens);
await counters.add(ctx, `tool_calls:${accountId}`, result.usage.toolCalls);
await counters.add(ctx, `interactions:${accountId}`, 1);
Budget limits are account policies. The policy check step reads the current counter value and compares it to the limit before allowing execution to proceed.
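The comparison itself is simple; a sketch with a soft-warning threshold added for illustration (checkBudget and the 0.8 warning ratio are assumptions, not codebase values — in the real system `used` comes from the sharded counter and `limit` from a cached account_policy query):

```typescript
type BudgetStatus = "ok" | "warning" | "exceeded";

// Evaluate metered usage against a cost-limit policy. The warning band
// supports soft alerts before execution is actually blocked.
function checkBudget(used: number, limit: number, warnRatio = 0.8): BudgetStatus {
  if (used >= limit) return "exceeded";
  if (used >= limit * warnRatio) return "warning";
  return "ok";
}
```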
10.4 Audit trail¶
Every agent action is recorded via database triggers from convex-helpers:
const triggers = new Triggers();
triggers.register("message", async (ctx, change) => {
if (change.operation === "insert" && change.newDoc.authorType === "agent") {
await ctx.runMutation(internal.audit.record, {
accountId: change.newDoc.accountId,
event: "agent_message",
agentId: change.newDoc.authorId,
channelId: change.newDoc.channelId,
messageId: change.id,
timestamp: Date.now(),
});
}
});
Audit records are stored in a Convex table for immediate querying. For long-term retention and compliance, Fivetran CDC streams the audit_event table to an external Postgres instance or data warehouse.
audit_event: defineTable({
accountId: v.string(),
event: v.string(),
actorType: v.union(v.literal("user"), v.literal("agent"), v.literal("system")),
actorId: v.string(),
resourceType: v.optional(v.string()),
resourceId: v.optional(v.string()),
detail: v.optional(v.any()),
timestamp: v.number(),
})
.index("by_account", ["accountId", "timestamp"])
.index("by_actor", ["actorType", "actorId", "timestamp"])
.index("by_event", ["event", "timestamp"]),
10.5 Data isolation¶
- Database: application-layer tenant isolation enforced by every accountQuery/accountMutation wrapper filtering by accountId
- Application: runtime context assertions in every service function
- Delegation: delegate agents assemble their own context; the invoking agent's full context is not forwarded
For the full security model including threat model, trust levels, rate limiting, GDPR readiness, and incident response, see doc 07 Security & Governance.
11. File storage¶
11.1 Cloudflare R2¶
R2 is the file storage layer. Files are stored with account-scoped key prefixes:
{accountId}/{context}/{fileId}.{ext}
Examples:
acc_123/documents/doc_456.pdf
acc_123/avatars/user_789.jpg
acc_123/exports/report_012.xlsx
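The key scheme can be captured in a small helper so every upload path produces consistent, account-scoped keys. A sketch (buildR2Key is a hypothetical name; the context values follow the plural path segments shown above):

```typescript
// Contexts used as R2 key path segments, per the scheme
// {accountId}/{context}/{fileId}.{ext}.
type FileContext = "documents" | "avatars" | "attachments" | "exports";

// Build an account-scoped R2 object key. Centralising this keeps tenant
// prefixing consistent across upload, ingestion, and export code paths.
function buildR2Key(
  accountId: string,
  context: FileContext,
  fileId: string,
  ext: string,
): string {
  return `${accountId}/${context}/${fileId}.${ext}`;
}
```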
11.2 File metadata in Convex¶
File metadata is tracked in Convex for indexing, access control, and ingestion pipeline management:
file: defineTable({
accountId: v.string(),
r2Key: v.string(),
fileName: v.string(),
fileType: v.string(), // MIME type
fileSize: v.number(),
uploadedBy: v.string(), // Clerk user ID
context: v.union(
v.literal("document"),
v.literal("avatar"),
v.literal("attachment"),
v.literal("export"),
),
ingestionStatus: v.optional(v.union(
v.literal("pending"),
v.literal("processing"),
v.literal("complete"),
v.literal("failed"),
)),
metadata: v.optional(v.any()),
})
.index("by_account", ["accountId"])
.index("by_ingestion", ["ingestionStatus"]),
11.3 Upload flow¶
Upload uses presigned URLs to avoid routing file bytes through Convex:
1. Client requests upload URL -> Convex mutation creates file record, generates presigned R2 URL
2. Client uploads directly to R2 -> No server proxy
3. Client confirms upload -> Convex mutation marks file as uploaded
4. If document -> Workpool enqueues ingestion pipeline
For the complete storage and ingestion design, including bucket tiers (platform-shared, enterprise-dedicated, account-supplied BYOB), presigned URL TTLs, retention policies, and document intelligence, see doc 05 Persistence, Storage & Ingestion.
12. Execution tiers and workflow budget¶
This section consolidates the execution tier model (former doc 54). It is the authoritative reference for how work is routed through Convex and how the Workflow slot budget is allocated.
12.1 Problem statement¶
Convex's Professional plan caps the total number of concurrent Workflow and Workpool executions at 100 across the entire deployment. Every execution slot consumed by a background task is one fewer slot available for an interactive agent turn. If agent turns, email processing, knowledge indexing, notifications, and batch jobs all compete for the same 100 slots, interactive responsiveness degrades under load and the platform hits hard limits far before it reaches meaningful user scale.
The tiered execution model separates interactive work from durable work, allocates the workflow budget deliberately, and defines the growth path for when volume exceeds what the Convex-native pool can handle.
12.2 Tier 1: fast path (no Workflow)¶
Mechanism: Convex mutation -> scheduled action (via ctx.scheduler.runAfter(0, ...)) or inline action, with Action Retrier wrapping the LLM call.
Characteristics:
- Zero workflow slots consumed.
- No journalling or step recovery. If the action crashes, the thinking indicator is cleaned up by a timeout and the user can resend.
- The LLM call itself is retried (transient failures, rate limits) via the Action Retrier component with exponential backoff.
- Response is written directly as a mutation on completion.
- Latency target: sub-second to first token (streaming), under 30 seconds for full response.
Used for:
- Web app chat messages (user sends a message in a channel, agent responds).
- Telegram inbound messages (webhook -> HTTP action -> mutation + scheduled action).
- Any future real-time channel (WhatsApp, voice transcription, SMS) where a human is actively waiting.
- Typing indicators and presence (already handled by the Presence component, no actions involved).
Flow:
User message arrives
-> mutation: write message to channel, write thinking indicator, schedule agent action
-> action: assemble context, check policy, call LLM (with Action Retrier), handle tool calls
-> mutation (from action): write agent response, remove thinking indicator, record audit, increment usage counters
If the action fails after all retries, a cleanup mutation removes the thinking indicator and optionally writes a system message ("I wasn't able to respond, please try again"). The user's message is preserved; nothing is lost.
When to promote to Tier 2: When the agent turn involves delegation to another agent, multi-step tool chains that take more than 60 seconds, or any step where partial completion produces value that would be lost on crash.
12.3 Tier 2: durable workflow¶
Mechanism: Convex Workflow component. Each step is journalled. Crash recovery resumes from the last completed step.
Characteristics:
- Consumes one workflow slot for the duration of execution.
- Steps are individually retried per the Workflow's retry policy.
- Sub-workflows can be spawned for delegation, each consuming their own slot.
- Completion callbacks can trigger downstream work (e.g., notifications).
Used for:
- Agent delegation chains (coordinator -> delegate -> sub-delegate). Each agent in the chain runs as a Workflow so that the coordinator can resume if a delegate crashes.
- Multi-step tool execution sequences where intermediate results have value (for example, an agent runs a database query, processes the results, then calls an external API with the processed data; losing the intermediate result on crash means re-running an expensive query).
- Inbound email processing: parse, resolve sender, resolve agent, create or continue thread, run agent turn, send response. Six steps, any of which can fail independently, and partial completion (e.g., thread created but response not sent) needs recovery, not restart.
- Complex document processing pipelines where chunking, embedding, and indexing happen in sequence.
Slot budget allocation (target steady state):
| Workload | Allocated slots | Notes |
|---|---|---|
| Agent delegation chains | 15 | Assumes max 5 concurrent delegations at depth <= 3 |
| Inbound email processing | 10 | Burst capacity for email; most accounts receive < 10 concurrent emails |
| Document/knowledge indexing | 10 | Via Workpool with parallelism capped at 10 |
| Complex agent turns (promoted from Tier 1) | 10 | Overflow for turns that need durability |
| Total allocated | 45 | |
| Headroom (unallocated) | 55 | 55% headroom for spikes and future workloads |
12.4 Tier 3: external queue¶
Mechanism: Convex HTTP action receives the trigger, writes the payload to the database, and enqueues a job to an external queue (initially Google Cloud Tasks or a simple BullMQ worker on the existing Hetzner infrastructure). The external worker processes the job and calls back into Convex via authenticated internal actions to read context and write results.
Characteristics:
- Zero workflow slots consumed.
- Adds operational surface: the queue and worker must be deployed, monitored, and scaled.
- Higher latency (network round-trips between worker and Convex).
- Suitable for high-volume, latency-tolerant workloads.
Used for (when volume demands migration from Tier 2):
- High-volume inbound email (e.g., a customer account processing hundreds of emails per hour).
- Bulk document ingestion (uploading a large document library for indexing).
- Webhook-heavy channel integrations (WhatsApp Business API with high message volume).
- Scheduled batch operations (nightly report generation, bulk data exports).
Migration trigger: A workload moves from Tier 2 to Tier 3 when monitoring shows it is consistently consuming more than 80% of its allocated Tier 2 budget, or when the total Tier 2 usage regularly exceeds 70 slots (70% of the ceiling).
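As a point-in-time sketch, the migration trigger reduces to two comparisons (the real decision uses sustained monitoring windows, not a single sample; shouldMigrateToTier3 is a hypothetical helper):

```typescript
// Decide whether a workload is a Tier 3 migration candidate, per the
// triggers above: >80% of its own Tier 2 allocation, or total Tier 2
// usage above 70 of the 100-slot ceiling.
function shouldMigrateToTier3(
  workloadSlotsUsed: number,
  workloadAllocation: number,
  totalTier2SlotsUsed: number,
): boolean {
  return (
    workloadSlotsUsed > workloadAllocation * 0.8 ||
    totalTier2SlotsUsed > 70
  );
}
```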
Callback pattern:
External worker picks up job
-> HTTP POST to Convex httpAction (authenticated via shared secret or JWT)
-> httpAction calls internal mutation/action to process and write results
-> Result is visible via Convex's reactive queries (clients update automatically)
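For the shared-secret variant of the callback authentication, the comparison should be constant-time so the secret cannot be recovered through timing differences. A sketch of the verification step only (the surrounding httpAction wiring is omitted; verifyCallbackSecret is a hypothetical name):

```typescript
import { timingSafeEqual } from "node:crypto";

// Verify the shared secret presented by an external worker's callback.
// timingSafeEqual requires equal-length buffers, so length is checked first;
// an early length mismatch reveals only the length, not the contents.
function verifyCallbackSecret(presented: string, expected: string): boolean {
  const a = Buffer.from(presented);
  const b = Buffer.from(expected);
  if (a.length !== b.length) return false;
  return timingSafeEqual(a, b);
}
```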
12.5 Notifications¶
Notifications are a separate concern from the work that triggers them. They are never part of the same Workflow as the work itself.
Why separate. A notification is a fan-out operation: one event (agent responded, job completed, email delivered) may need to reach multiple recipients across multiple channels (in-app, push notification, email digest). The originating Workflow should not be held open while notifications fan out, nor should a notification delivery failure cause the originating work to retry.
Notification flow:
Work completes (Tier 1 action or Tier 2 workflow step)
-> completion mutation writes the result AND schedules a notification action
-> notification action: resolve recipients, resolve delivery channels per recipient preferences
-> for each channel: deliver (in-app: mutation; push: external API call; email: Postmark API call)
Notification delivery uses the Action Retrier for external channels (push, email) so that transient failures are handled without consuming workflow slots. In-app notifications are simple mutations and are effectively free.
For high fan-out scenarios (e.g., a notification going to 50 team members), the notification action can use a Workpool to parallelise delivery with a conservative concurrency limit (for example, 5 concurrent deliveries).
Notification budget. Notifications should consume minimal workflow budget. The primary path (in-app plus single push or email per recipient) uses no workflow slots at all, just a scheduled action with the Action Retrier. The Workpool path for high fan-out should be allocated no more than 5 slots in the overall budget, and only activates for genuinely large fan-outs.
12.6 Channel classification¶
Each channel type is assigned a default execution tier. The tier can be overridden per account if that account's volume demands it.
| Channel | Default tier | Rationale |
|---|---|---|
| Web app (app.thinklio.ai) | Tier 1 | User is watching the screen. Must be instant. |
| Telegram | Tier 1 | Conversational channel. User expects fast replies. |
| WhatsApp (future) | Tier 1 | Same as Telegram. |
| Voice transcription (future) | Tier 1 | Real-time channel. |
| SMS (future) | Tier 1 | Conversational channel. |
| Email inbound (Postmark) | Tier 2 | Multi-step processing, no real-time expectation. User expects minutes, not seconds. |
| Email outbound (Postmark) | Tier 2 | Delivery tracking, retry semantics. |
| API-triggered agent turns | Tier 1 or 2 | Depends on caller expectations. Default Tier 1; promote to Tier 2 if the caller requests durable execution via an API flag. |
| Scheduled/cron agent turns | Tier 2 | No human waiting. Durability matters more than speed. |
| Bulk document ingestion | Tier 2 (-> Tier 3) | Background work. First candidate for external queue migration. |
| Batch operations (exports, reports) | Tier 2 (-> Tier 3) | Background work. Second candidate for external queue migration. |
| Webhook integrations (future) | Tier 2 (-> Tier 3) | Volume-dependent. Start durable, migrate external when volume demands. |
12.7 Implementation: fast path refactor¶
The earliest agentWorkflow.ts implementation wrapped every agent turn in a Workflow. It was refactored into a two-path execution model.
Component registration. The Action Retrier is a peer dependency of the Workflow component but is not registered by default; to use it on the fast path it must be registered in convex/convex.config.ts. The Workpool component must likewise be registered for Tier 2 parallelised workloads (document indexing, high-fan-out notifications).
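A registration sketch, assuming the standard component packages (@convex-dev/workflow, @convex-dev/action-retrier, @convex-dev/workpool) and a hypothetical "documentPool" instance name:

```typescript
// convex/convex.config.ts
import { defineApp } from "convex/server";
import workflow from "@convex-dev/workflow/convex.config";
import actionRetrier from "@convex-dev/action-retrier/convex.config";
import workpool from "@convex-dev/workpool/convex.config";

const app = defineApp();
app.use(workflow);       // Tier 2 durable workflows
app.use(actionRetrier);  // Tier 1 fast-path LLM retries
app.use(workpool, { name: "documentPool" }); // parallelised document indexing

export default app;
```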
File structure:
convex/
agentExecution.ts -- Tier 1 fast path: direct action for interactive agent turns
agentWorkflow.ts -- Tier 2 durable path: Workflow for complex/delegated turns
agentExecutionHelpers.ts -- Shared helpers (context assembly, policy checks, LLM calling)
agentRouter.ts -- Decides which tier to use for a given turn
notifications.ts -- Notification fan-out (separate from agent execution)
Routing logic. When a message arrives in a channel where an agent is a participant, agentRouter decides the execution tier:
function selectTier(context: {
agent: Agent;
channel: Channel;
message: Message;
delegationDepth: number;
}): "fast" | "durable" {
// Delegation always uses durable path
if (context.delegationDepth > 0) return "durable";
// Agent explicitly configured for durable execution
if (context.agent.executionMode === "durable") return "durable";
// Channel type override
if (context.channel.type === "email") return "durable";
// Scheduled/cron triggers
if (context.message.triggerType === "scheduled") return "durable";
// Default: fast path for all interactive channels
return "fast";
}
Action Retrier configuration. The LLM call on the fast path uses the Action Retrier with these defaults:
const llmRetrier = new ActionRetrier(components.actionRetrier, {
initialBackoffMs: 500,
base: 2,
maxFailures: 3,
});
This gives retry attempts at approximately 500ms, 1s, and 2s, fast enough to feel responsive, with enough retries to survive transient LLM provider hiccups. The total worst-case delay before failure is approximately 4 seconds, which is acceptable for an interactive turn (the user sees a thinking indicator during this time).
Thinking indicator lifecycle. On the fast path, the thinking indicator is managed by two mutations:
- Write indicator: called before scheduling the agent action. Writes a temporary message with type: "thinking" to the channel.
- Clear indicator: called by the agent action on completion (success or failure). Replaces the thinking message with the agent's response, or removes it and writes a system error message.
A safety net cron runs every 60 seconds and clears any thinking indicators older than 120 seconds. This handles the case where the agent action crashes hard enough that the clear mutation never fires.
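The sweep's selection logic is a simple age filter; a sketch with hypothetical names (the real cron would query thinking messages from Convex and delete or replace each stale one):

```typescript
// Hypothetical thinking-indicator record as the safety-net cron sees it.
type Indicator = { messageId: string; createdAt: number };

// Select indicators older than maxAgeMs (120s by default, matching the
// safety-net threshold) so the sweep can clear them.
function staleIndicators(
  indicators: Indicator[],
  now: number,
  maxAgeMs = 120_000,
): Indicator[] {
  return indicators.filter((i) => now - i.createdAt > maxAgeMs);
}
```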
12.8 Monitoring and alerts¶
Slot usage dashboard. The deployment should track and expose:
- Current workflow slot usage (total and per workload category).
- Peak slot usage over rolling 1-hour and 24-hour windows.
- Slot usage by account (to identify accounts that may need Tier 3 migration).
Alert thresholds:
| Metric | Warning | Critical |
|---|---|---|
| Total slot usage | > 60 sustained for 5 minutes | > 80 sustained for 2 minutes |
| Single workload exceeding its allocated budget | > 80% of allocation for 10 minutes | > 100% of allocation |
| Tier 1 action failure rate | > 5% over 5 minutes | > 15% over 2 minutes |
| Thinking indicator timeout rate | > 2% over 15 minutes | > 10% over 5 minutes |
Logging. All tier routing decisions should be logged with the channel type, agent ID, account ID, and selected tier. This provides the data needed to make informed Tier 2 to Tier 3 migration decisions.
12.9 Growth path¶
Phase 1: current (Professional plan, 100 slots). Implement the three-tier model as described. All channels start at their default tier. Monitor slot usage closely. Expected comfortable capacity: dozens of concurrent interactive users with moderate background processing.
Phase 2: scaling within Convex. If slot pressure grows, the first lever is optimising slot hold time. Shorter workflows (fewer steps, faster steps) free slots sooner. The second lever is moving workloads from Tier 2 to Tier 3 (external queue) based on the migration triggers defined above.
Phase 3: Enterprise plan or self-hosted. Convex's Enterprise plan provides higher concurrency limits. Alternatively, Convex self-hosted on Hetzner removes the slot ceiling entirely; the limit becomes the server's available resources. This is the long-term path for large deployments.
Phase 4: hybrid architecture. For enterprise customers with extreme volume, a hybrid model: Convex handles all interactive work and real-time state, while a dedicated job processing service (Go or Node.js on Hetzner) handles high-volume background work via Tier 3. Convex remains the single source of truth for all data; the external workers are stateless processors that read from and write to Convex.
12.10 Decision record¶
| Decision | Rationale | Alternatives considered |
|---|---|---|
| Interactive agent turns bypass Workflow | Preserves slot budget for work that needs durability; interactive turns have a simple failure mode (retry) | Keeping all turns in Workflow (simpler code, but burns slots unnecessarily) |
| Action Retrier for LLM calls on fast path | Handles transient LLM failures without Workflow overhead | No retry (too fragile), custom retry logic (reinvents the component) |
| Notifications always separate from originating work | Prevents fan-out from holding workflow slots; decouples failure domains | Inline notification at end of workflow (simpler but couples failure modes) |
| Email processing starts at Tier 2 | Multi-step pipeline benefits from journalling; volume is low initially | Tier 1 (too fragile for multi-step email), Tier 3 (premature complexity) |
| 40% headroom target | Provides buffer for traffic spikes without requiring immediate scaling action | Lower headroom (risky), higher headroom (wastes available capacity) |
| External queue as volume-triggered migration | Avoids premature complexity while providing a clear growth path | External queue from day one (too much infra for current scale), no external queue (hits ceiling eventually) |
12.11 Slot budget summary¶
| Category | Tier | Slots | Notes |
|---|---|---|---|
| Interactive agent turns (web, Telegram, etc.) | 1 | 0 | No workflow slots consumed |
| Presence / typing indicators | 1 | 0 | Handled by Presence component |
| Notifications (standard) | 1 | 0 | Scheduled actions with Action Retrier |
| Notifications (high fan-out) | 2 | 5 | Workpool for 50+ recipient notifications |
| Agent delegation chains | 2 | 15 | Max 5 concurrent × depth 3 |
| Inbound email processing | 2 | 10 | Multi-step pipeline |
| Outbound email delivery | 2 | 5 | Separate from notifications |
| Document/knowledge indexing | 2 | 10 | Workpool with parallelism cap |
| Complex agent turns (promoted) | 2 | 10 | Overflow from Tier 1 |
| Total allocated | — | 55 | |
| Unallocated headroom | — | 45 | 45% of ceiling |
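The table's arithmetic can be kept honest in code. A sketch that treats the allocations as data and checks them against the 100-slot ceiling (the names are illustrative):

```typescript
// Tier 2 slot allocations from the table above, as data.
const allocations: Record<string, number> = {
  highFanOutNotifications: 5,
  delegationChains: 15,
  inboundEmail: 10,
  outboundEmail: 5,
  documentIndexing: 10,
  promotedAgentTurns: 10,
};

const SLOT_CEILING = 100; // Professional plan workflow slot pool

function totalAllocated(a: Record<string, number>): number {
  return Object.values(a).reduce((sum, n) => sum + n, 0);
}

function headroom(a: Record<string, number>): number {
  return SLOT_CEILING - totalAllocated(a);
}
```

Keeping the budget as data makes the monitoring alerts in section 12.8 ("single workload exceeding its allocated budget") straightforward to evaluate against live slot counts.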
13. Communication patterns¶
13.1 Inbound message flow¶
External Channel (web app, Telegram, Email, API)
|
v
Convex HTTP action or Convex mutation (client direct)
|- Validate webhook signature (for external channels)
|- Resolve or create user identity (Clerk + channel_identity)
|- Edge rate limit check (Rate Limiter component)
|
v
Message mutation
|- Write message to channel
|- Write thinking indicator (if agent is addressed)
|- Select execution tier (agentRouter)
|
v
Tier 1 scheduled action OR Tier 2 workflow trigger
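The fork at the bottom of this flow can be sketched as a pure dispatch decision (the mechanism strings and function shape are illustrative, not the real scheduler API):

```typescript
type Tier = "fast" | "durable";

interface Dispatch {
  mechanism: string;
  holdsWorkflowSlot: boolean;
}

// What the message mutation schedules after tier selection. Illustrative:
// the real code calls the Convex scheduler or the Workflow component.
function dispatchFor(tier: Tier): Dispatch {
  if (tier === "fast") {
    // Tier 1: schedule a plain action (Action Retrier wraps the LLM call);
    // no Workflow slot is consumed.
    return { mechanism: "scheduled action", holdsWorkflowSlot: false };
  }
  // Tier 2: start a durable workflow with step journalling; a slot is held
  // for the workflow's lifetime.
  return { mechanism: "workflow trigger", holdsWorkflowSlot: true };
}
```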
13.2 Agent execution flow (within Convex)¶
Agent execution (Tier 1 action or Tier 2 workflow)
|- Context assembly (recent messages + RAG search across namespaces)
|- Policy evaluation (middleware on every function; budget pre-check from counters)
|- Think step (LLM call via provider action, streamed via Persistent Text Streaming)
|- For each tool call:
| |- Tool trust level + account policy check (in-process)
| |- If allowed: execute tool (Convex internal, external HTTP, or agent delegation)
| |- If require-approval: wait for approval mutation (Tier 2 only)
|- Observe step (LLM synthesises tool results)
|- Respond step (persist message, stream to clients)
|- Audit (trigger writes to audit_event)
|- Usage increment (Sharded Counter)
|- Extract step (Workpool enqueues knowledge extraction, Tier 2 only)
13.3 Outbound delivery flow¶
Web/mobile clients. Direct Convex WebSocket subscription. No bridge involvement. Client subscribes to channel messages; streaming and stored messages arrive via the same subscription.
External channels (Telegram, email). Convex invokes the channel's outbound bridge (an action) with the response payload and channel routing information. The bridge translates to the channel-specific protocol and calls the external API (Telegram Bot API, Postmark). Delivery retries are handled by the Action Retrier.
13.4 Cost reporting flow¶
Workflow step or action completes
|
|- Update Convex Sharded Counter (real-time usage)
|- Update Aggregate component (dashboard metrics)
|- Write audit event (audit_event table)
There is no separate cost service; counters live in Convex. Budget limits are policies that read counters. Fivetran CDC streams the audit_event table and any usage projections to an external warehouse for billing reconciliation and analytics.
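The counter-as-policy-input idea can be sketched with an in-memory stand-in (the real implementation is the Sharded Counter component; this class is illustrative only):

```typescript
// In-memory stand-in for per-account usage counters. In production this is
// the Sharded Counter component; budget policies read the same counters the
// usage path writes, rather than owning separate state.
class UsageMeter {
  private counters = new Map<string, number>();

  add(accountId: string, tokens: number): void {
    this.counters.set(accountId, (this.counters.get(accountId) ?? 0) + tokens);
  }

  // Budget pre-check performed before an agent turn executes.
  withinBudget(accountId: string, limit: number): boolean {
    return (this.counters.get(accountId) ?? 0) < limit;
  }
}
```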
13.5 Clerk webhook sync flow¶
Clerk event (org.created, membership.created, user.updated, etc.)
|
v
Svix webhook delivery (signed HTTP POST)
|
v
Convex HTTP action
|- Verify signature
|- Deduplicate by event ID
|- Mutate Convex tables (account_record, user_profile, channel_member)
A weekly cron reconciles Clerk organisation membership with Convex state to catch drift.
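The dedup-by-event-ID step can be sketched as a pure function (in the real handler the seen-set is a Convex table and the "apply" branch mutates account_record, user_profile, and channel_member):

```typescript
// Sketch of idempotent webhook processing keyed by event ID. Illustrative:
// the real handler stores seen IDs in a Convex table, not an in-memory set,
// so redeliveries across handler instances are still deduplicated.
function makeWebhookProcessor() {
  const seen = new Set<string>();
  let applied = 0;
  return {
    handle(eventId: string): "applied" | "duplicate" {
      if (seen.has(eventId)) return "duplicate"; // Svix redelivery: no-op
      seen.add(eventId);
      applied++; // mutate Convex tables here in the real handler
      return "applied";
    },
    appliedCount: () => applied,
  };
}
```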
13.6 Data export flow¶
Convex mutation
|
|- Fivetran CDC (continuous, near-real-time): audit_event, usage projections -> Postgres / warehouse
|- Convex log streams (continuous): function execution events -> Axiom
|- Convex triggers (on relevant events): fire internal actions that POST to external webhooks via Webhook Sender
14. Convex component map¶
The following Convex components replace custom-built infrastructure:
| Component | Replaces | Purpose in Thinklio |
|---|---|---|
| Agent | Custom Agent Service + Context Service | Thread management, LLM orchestration, tool execution |
| Workflow | Custom HarnessExecutor | Durable step-by-step agent execution with journalling |
| RAG | pgvector + custom retrieval | Namespaced semantic search across four knowledge layers |
| Workpool | Custom Queue Service + worker pools | Document processing parallelism, background job management |
| Rate Limiter | Redis rate limiting | API throttling, per-account and per-user limits |
| Persistent Text Streaming | Custom WebSocket streaming | Real-time LLM output in chat |
| Crons | Custom scheduled tasks | Recurring agent work, usage reporting, cleanup |
| Aggregate | Custom analytics queries | Real-time usage totals, dashboard metrics |
| Sharded Counter | Redis counters | High-throughput token/interaction metering |
| Migrations | Custom migration scripts | Schema evolution on live data |
| Action Retrier | Custom retry logic | Resilient external API calls (LLM, R2, webhooks) |
| Presence | (new capability) | Typing indicators, online status |
Community components:
| Component | Purpose in Thinklio |
|---|---|
| Audit Log | Structured audit trail with querying |
| Webhook Sender | Outbound webhook delivery with signing and retries |
| LLM Cache | Cache LLM responses for repeated queries (cost optimisation) |
| Expo Push Notifications | Mobile push notifications for Flutter app |
Full component coverage, version pins, and integration patterns live in doc 11 Convex Reference.
15. Observability and data export¶
15.1 Real-time observability¶
Convex Log Streams (Pro plan) push function execution events to Axiom or Datadog. This covers:
- Function execution latency and error rates
- Console output from all functions
- Concurrency and scheduler statistics
- Storage usage
15.2 Audit and compliance export¶
Fivetran CDC streams the audit_event table (and any other tables needed for compliance) from Convex to an external destination:
- Postgres for compliance teams that need SQL access to audit data
- Snowflake/BigQuery for business intelligence and usage analytics
This is one-way, continuous, and near-real-time. It satisfies the requirement for an external audit trail without burdening the Convex instance with analytical queries.
15.3 Custom webhook export¶
For data that needs to reach custom systems (billing, CRM sync, alerting), Convex triggers fire on relevant mutations and call internal actions that POST to external webhooks. The Webhook Sender community component handles retry, signing, and delivery confirmation.
15.4 Structured logging¶
Convex functions emit structured JSON logs with consistent fields: functionName, traceId, accountId, userId, agentId, level, message, timestamp. For delegations: delegationDepth, parentInteractionId. For jobs: jobId, jobState. Sensitive data (user content, credentials, raw tool payloads) is redacted at the logging helper level.
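The redaction-at-helper-level rule can be sketched in a few lines (field names beyond those listed above are illustrative):

```typescript
// Sketch of a structured logging helper that redacts sensitive fields before
// serialisation, per the conventions above. Field names are illustrative.
const REDACTED_FIELDS = new Set(["content", "credentials", "toolPayload"]);

function structuredLog(fields: Record<string, unknown>): string {
  const safe: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(fields)) {
    safe[key] = REDACTED_FIELDS.has(key) ? "[redacted]" : value;
  }
  return JSON.stringify(safe);
}
```

Because redaction happens inside the helper, no call site can accidentally emit raw user content or credentials to the log stream.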
15.5 Metrics¶
Convex exposes function-level metrics through its dashboard and log streams:
- Request rates, latencies, error rates per function
- Queue depths and processing rates (Workpool)
- Workflow slot utilisation
- Cache hit rates (reactive query cache)
- Active subscription count
- Budget utilisation (from Sharded Counter via Aggregate)
- Active delegation depth distribution
15.6 Health checks¶
Convex itself manages platform health. Application-level health endpoints are exposed as HTTP actions that probe the critical paths: can we read an account record, can we write a small audit event, can we enqueue a Workpool task, can we reach the LLM provider. These are used by external uptime monitoring (e.g., Better Uptime) and by the admin dashboard.
16. Deployment topology¶
16.1 Primary deployment (cloud)¶
┌──────────────────────────────────────────────────────────┐
│ Managed Services │
│ │
│ ┌──────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Clerk │ │ Convex │ │ Cloudflare │ │
│ │ (Auth) │ │ (Ireland) │ │ R2 (EU) │ │
│ └────┬─────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
└────────┼────────────────┼───────────────────┼─────────────┘
│ │ │
┌────▼────────────────▼───────────────────▼────┐
│ Client Applications │
│ │
│ ┌─────────────┐ ┌──────────────┐ │
│ │ Next.js │ │ Flutter │ │
│ │ Web App │ │ Mobile App │ │
│ └─────────────┘ └──────────────┘ │
└───────────────────────────────────────────────┘
┌───────────────────────────────────────────────┐
│ Data Export (Async) │
│ │
│ Convex ──CDC──▶ Fivetran ──▶ Postgres/DW │
│ Convex ──log streams──▶ Axiom │
└───────────────────────────────────────────────┘
Single Convex project. Single Clerk instance. No servers to manage.
16.2 Enterprise self-hosted option¶
For customers requiring data isolation:
┌──────────────────────────────────────────────────────────┐
│ Customer Hetzner Cluster (EU) │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ CPX32/CPX62 │ │ CPX32 │ │
│ │ Convex │ │ Postgres │ │
│ │ (self-hosted)│──▶│ (backing │ │
│ │ │ │ store) │ │
│ └──────────────┘ └──────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ CPX32 │ │ Cloudflare │ │
│ │ Workload 2 │ │ R2 (EU) │ │
│ │ (optional) │ │ │ │
│ └──────────────┘ └──────────────┘ │
│ │
│ Auth: Clerk Cloud (or customer IdP via OIDC) │
│ Cost: ~50 to 150 euros/month for compute │
└──────────────────────────────────────────────────────────┘
Self-hosted Convex instances use Postgres as the backing store. If workload partitioning is needed, a second Convex instance handles document processing. Authentication remains via Clerk Cloud for most cases; customers with their own IdP connect via OIDC through Clerk or directly via Convex's custom OIDC support.
16.3 Release and environment topology¶
Thinklio runs three Convex environments in normal operation:
- Development (`dev`): each engineer runs their own Convex dev deployment via `npx convex dev`. Clerk uses a development instance.
- Preview (`preview`): a shared Convex deployment used for pull request preview environments, seeded with anonymised production-like data.
- Production (`prod`): the live Convex Cloud deployment. Deploys are triggered by CI on successful merge to `main`, with schema migrations gated by the Migrations component.
Clerk production is mirrored in preview with the same JWT template configuration. R2 production and preview use separate buckets with identical naming conventions to avoid cross-environment leakage.
17. Schema summary¶
Complete table listing for the single Convex project. Detailed schema definitions, relationships, indexes, and invariants live in doc 04 Data Model; the table below is a summary for architectural orientation.
| Table | Purpose | Key indexes |
|---|---|---|
| `channel` | Conversation spaces | by_account, by_account_type, by_team |
| `channel_member` | User and agent participation | by_channel, by_member |
| `message` | All messages (text, files, actions, thinking) | by_channel, by_channel_thread, search_content |
| `agent` | Agent definitions and configuration | by_account, by_account_status |
| `tool` | Tool definitions (internal, external, agent-type) | by_account, by_name |
| `agent_tool` | Agent-to-tool assignments | by_agent, by_tool |
| `agent_assignment` | Agent-to-scope assignments (user, team, account) | by_agent, by_scope |
| `knowledge_item` | Knowledge across four layers with embeddings | by_scope, by_account, by_embedding (vector), search_content |
| `file` | R2 file metadata and ingestion status | by_account, by_ingestion |
| `team` | Teams within accounts | by_account |
| `team_member` | Team membership | by_team, by_user |
| `account_policy` | Governance rules per account | by_account, by_account_type |
| `audit_event` | Immutable action log | by_account, by_actor, by_event |
| `channel_identity` | External channel identity linking | by_user, by_channel_type_external_id |
| `interaction` | Agent execution records | by_channel, by_agent |
| `step` | Execution step journal | by_interaction |
| `user_profile` | Synced from Clerk via webhook | by_clerk_id |
| `account_record` | Account-level settings and defaults | by_clerk_org_id |
18. Implementation plan (summary)¶
Doc 13 Implementation Plan & Status holds the living status, milestone tracking, and current sprint focus. This section summarises the phases for architectural orientation.
Phase 1: foundation¶
Set up Convex project, define schema, configure Clerk integration, implement auth middleware and tenant isolation. Build the messaging system (channels, members, messages) with reactive queries. Connect the Next.js web app to Convex via ConvexProviderWithClerk. Deploy a working multi-tenant chat system with no agents. In parallel, begin the Flutter proof-of-concept: validate Clerk Dart SDK plus convex_flutter integration, establish whether the three-way integration is production-viable.
Exit criteria: Users can sign up via Clerk, create or join an organisation, create channels, send messages, and see them in real time on web. Flutter PoC confirms or rejects the mobile stack.
Phase 2: agents¶
Implement the agent model, tool framework, and execution workflow using the Convex Agent component for LLM orchestration and Workflow component for durable execution. Add agent participants to channels. Build the LLM integration via Convex actions (OpenRouter/Anthropic). Implement Persistent Text Streaming for real-time agent responses. Deploy a system where users can chat with agents in channels.
Exit criteria: Users can add agents to channels, mention them, receive streamed responses, and agents can execute tools.
Phase 3: knowledge¶
Implement the four-layer knowledge model with RAG component. Build the document ingestion pipeline with Workpool (upload to R2 -> parse -> chunk -> embed via OpenAI -> store as knowledge items). Connect knowledge retrieval to agent execution context assembly. Deploy agents that draw on account, team, and user knowledge.
Exit criteria: Users can upload documents, knowledge is ingested and searchable, agents retrieve relevant knowledge during conversations.
Phase 4: governance, channels, and mobile¶
Implement account policies, cost controls (Sharded Counter plus budget limits), and audit trail (triggers plus audit_event table). Build the admin interface for policy management. Connect Fivetran CDC for audit export to Postgres/warehouse. Add external channel bridges (Telegram first, as Convex HTTP actions). Ship the Flutter mobile app (or native apps per Phase 1 findings). Begin parallel web and mobile development cadence.
Exit criteria: Account admins can set policies, usage is metered, audit trail exports to external store, Telegram bridge operational, mobile app in production.
Phase 5: enterprise and scale¶
Validate self-hosted Convex deployment on Hetzner (CPX32 for testing, CPX62 for production). Document the enterprise installation process. If external channel bridge throughput requires it, build a Go bidirectional bridge service for Telegram/WhatsApp queuing. Evaluate whether any other Go services are needed for specific integrations. Build the workload partitioning option (separate Convex instance for document processing) for high-throughput enterprise customers.
Exit criteria: Self-hosted deployment validated and documented, enterprise customers can run isolated instances, scale constraints identified and mitigated.
19. Resolved design questions¶
These questions were raised during architecture design and resolved through analysis and discussion. They are preserved here because the reasoning still informs current decisions.
19.1 Convex Agent component vs custom orchestration¶
Resolution: use the Agent component for LLM orchestration inside workflow steps; the channel and message tables are the user-facing messaging layer.
The Agent component handles the LLM round-trip: prompt assembly, tool call loop, and streaming. The messaging system handles everything else: channel membership, threading, search, permissions. The Agent component's internal thread state is an implementation detail of execution, not the conversation record of truth. If the Agent component's model evolves to be a closer fit, we can lean into it more. A fully custom orchestration layer remains an option if needed.
19.2 External channel bridges at scale¶
Resolution: start with Convex HTTP actions as bridges. Migrate to a Go bidirectional bridge service later if throughput demands it.
512 concurrent actions on the Pro plan is sufficient for early and moderate scale. Bridge HTTP actions should be thin (validate webhook, write inbound message as mutation, return 200) with agent execution triggered asynchronously via a scheduled workflow. This frees the bridge action's concurrency slot quickly. If external channel volume outgrows Convex action concurrency, a Go bridge service can sit in front: it receives webhooks, queues them, and feeds messages into Convex at a controlled rate. This is a scale optimisation, not a design change.
19.3 Embedding model hosting¶
Resolution: start with OpenAI embeddings API (text-embedding-3-small, 1536 dimensions). Add local embedding as an enterprise option in Phase 5.
External API calls are the simplest path: low latency (roughly 50 to 200 ms), minimal cost ($0.02/M tokens), well-understood quality. The embedding endpoint is behind an abstraction in the ingestion pipeline, so swapping to a local model (e.g., all-MiniLM-L6-v2 via a Python sidecar) for airgapped enterprise deployments is a configuration change, not an architecture change. Starting at 1536 dimensions keeps the quality ceiling high; we can re-embed at lower dimensions later if vector storage cost becomes a factor.
19.4 Convex document size limit (1MB)¶
Resolution: non-issue with sensible chunk sizing.
RAG best practice puts chunks at 500 to 2,000 tokens (roughly 2,000 to 8,000 characters). At that size, each knowledge item document including its embedding (roughly 12KB for 1536 floats) is well under 100KB. Raw documents live in R2; only parsed chunks live in Convex. Message content gets an application-layer length limit (50,000 characters); anything longer should be uploaded as a file attachment.
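The sizes quoted above follow from simple arithmetic; a sketch (the 4-characters-per-token figure is the rough conversion the paragraph assumes):

```typescript
// Back-of-envelope sizes from the paragraph above.
const EMBEDDING_DIMS = 1536;
const BYTES_PER_FLOAT = 8; // Convex stores numbers as 64-bit floats
const embeddingBytes = EMBEDDING_DIMS * BYTES_PER_FLOAT; // 12,288 bytes ≈ 12KB

// A 2,000-token chunk at ~4 characters per token is ~8,000 characters.
const maxChunkChars = 2000 * 4;

// Worst-case knowledge item: chunk text plus its embedding.
const worstCaseDocBytes = embeddingBytes + maxChunkChars; // ~20KB, well under 100KB
```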
19.5 Clerk webhook reliability¶
Resolution: rely on Svix (Clerk's webhook infrastructure) for delivery with retries. Add a weekly reconciliation cron as a safety net.
Svix provides automatic retries with exponential backoff, webhook signature verification, and a delivery log. The Convex HTTP action receiving webhooks processes events idempotently using the event ID to deduplicate. A Convex cron job periodically calls the Clerk API to list current organisation memberships and syncs any discrepancies. This handles the edge case of a prolonged outage exhausting Svix's retry window. Real-world usage will inform whether the reconciliation frequency needs adjustment.
19.6 Mobile client strategy¶
Resolution: build the React/Next.js web app first with full Convex plus Clerk integration. Begin Flutter validation immediately in Phase 1 to test the Clerk Dart SDK plus convex_flutter three-way integration. Maintain parallel web and Flutter development from there.
The Flutter proof-of-concept should establish: Clerk auth working in Dart, Convex reactive subscriptions working via convex_flutter, and the auth token flowing through correctly. If the integration proves solid, Flutter becomes the primary mobile platform. If it proves unreliable, Plan B is native apps in Swift (iOS) and Kotlin (Android), potentially using Kotlin Multiplatform (KMP) for shared business logic. Clerk and Convex both ship native Swift and Kotlin SDKs (as of February 2026), so the native path is well-supported.
20. Legacy architecture (archival)¶
This section preserves the Go-service architecture (old doc 04) that preceded the Convex-first rebuild. It is retained for reference because migration scripts, historical commits, and the Go cmd binaries still exist in the repository, and because some conceptual work from the legacy design (event envelopes, event types, service responsibilities, harness semantics, deployment topology) informs the current system. The legacy stack is being retired. Do not treat this section as current design guidance for new work. For new work, refer to sections 3 through 19 of this document.
20.1 Legacy design principles¶
The legacy architecture was an event-sourced distributed Go service system. Its original design principles carried forward into the Convex-first era where still applicable.
Event-sourced core. Every interaction, decision, and state change in Thinklio was captured as an immutable event. The system's state at any point in time could be reconstructed from its event history. This provided complete audit trails, event replay for recovery and testing, decoupled services that communicated through events, and natural support for asynchronous distributed processing. The audit trail principle carries forward via the audit_event table and triggers; the cross-service event bus was retired when services collapsed into Convex.
Durable execution. Agent operations ran inside a durable execution harness that tracked each reasoning step independently. Steps were persisted before execution and results recorded on completion. If a process was interrupted, it resumed from the last completed step. The Convex Workflow component is the current embodiment of this principle.
Channel agnosticism, multi-tenancy, unified capability model. These carry forward unchanged; see section 3.
20.2 Legacy service map¶
┌─────────────────────────────────────────────────────────┐
│                    External Channels                    │
│        Telegram · Email · Web Chat · Voice · API        │
└────────────────────────────┬────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────┐
│                     Gateway Service                     │
│     Channel adapters · API surface routing · Auth ·     │
│            Rate limiting · Webhook delivery             │
└────────────────────────────┬────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────┐
│                        Event Bus                        │
│      Event routing · Stream persistence · Pub/sub       │
│                     (Redis Streams)                     │
└──┬──────────┬──────────┬──────────┬──────────┬──────────┘
   │          │          │          │          │
┌──▼───┐   ┌──▼────┐  ┌──▼───┐   ┌──▼───┐   ┌──▼───┐
│Agent │   │Context│  │ Tool │   │Queue │   │Usage │
│Svc   │   │Svc    │  │ Svc  │   │ Svc  │   │ Svc  │
└──┬───┘   └──┬────┘  └──┬───┘   └──┬───┘   └──┬───┘
   │          │          │          │          │
┌──▼──────────▼──────────▼──────────▼──────────▼─────────┐
│                       Data Layer                       │
│         Supabase Cloud · Redis · Cloudflare R2         │
└────────────────────────────────────────────────────────┘
20.3 Legacy Gateway Service¶
Purpose: channel abstraction, authentication, rate limiting, API surface routing.
Responsibilities:
- Accept inbound messages from all connected channels
- Route requests across the three external API surfaces (Channel, Platform, Integration)
- Authenticate requests, normalise channel-specific payloads to universal events
- Publish events to the event bus
- Subscribe to response events and deliver outbound messages via the appropriate channel
- Enforce rate limits per user, per channel, and per API key
- Handle webhook delivery for Platform and Integration API subscribers
Design:
Gateway
├── ChannelRegistry # Registered channel adapters
│ ├── TelegramAdapter
│ ├── APIAdapter # Channel API surface
│ └── WebChatAdapter
├── APISurfaceRouter # Routes to Channel, Platform, or Integration API handlers
├── Authenticator # Channel-specific → platform identity
│ # API key → account/service account identity
├── RateLimiter # Token bucket per user/channel/API key
├── WebhookDelivery # Outbound webhook dispatch
├── EventPublisher # Publishes to event bus
├── ResponseSubscriber # Listens for outbound events
└── HealthChecker
Channel adapter interface:
type ChannelAdapter interface {
Name() string
Capabilities() []string
HandleInbound(payload []byte) (*UniversalEvent, error)
SendOutbound(event *UniversalEvent) error
Start(ctx context.Context) error
Stop(ctx context.Context) error
}
Scaling: stateless. Multiple instances behind a load balancer. Rate limit state in Redis.
20.4 Legacy Agent Service¶
Purpose: AI reasoning, LLM interaction, durable execution orchestration, job notification handling.
Responsibilities:
- Receive message events from the event bus
- Receive job state change events and create follow-up interactions
- Create interaction and step records (harness initialisation)
- Request context assembly from the context service
- Execute the think, act, observe loop with step-level durability
- Make LLM calls with assembled context
- Decide on tool use and delegate to the tool service (including agent-as-tool delegations)
- Dispatch deferred work and create job records
- Handle interaction resumption after failures
Design:
Agent Service
├── InteractionManager # Creates and manages interaction lifecycle
├── HarnessExecutor # Runs the step state machine
│ ├── ContextStep # Requests context assembly
│ ├── ThinkStep # LLM call for reasoning
│ ├── ActStep # Tool execution delegation
│ │ ├── ImmediateMode # Synchronous execution
│ │ ├── DeferredMode # Job dispatch; step succeeds on dispatch
│ │ └── InteractiveMode # Result feeds back to think step
│ ├── ObserveStep # LLM call for synthesis
│ ├── RespondStep # Publishes response event
│ └── ExtractStep # Queues knowledge extraction
├── JobNotificationHandler # Receives job.state_changed events, creates follow-up interactions
├── LLMClient # Multi-provider LLM abstraction (OpenRouter / Anthropic)
├── StepStore # Persists step state to PostgreSQL
├── JobStore # Redis-backed active job storage
└── CostTracker # Records per-step costs
LLM provider abstraction:
type LLMProvider interface {
Complete(ctx context.Context, req CompletionRequest) (*CompletionResponse, error)
EstimateCost(req CompletionRequest) CostEstimate
Name() string
Available() bool
}
Scaling: CPU/memory intensive due to LLM calls and context processing.
20.5 Legacy Context Service¶
Purpose: knowledge retrieval and context assembly across all four layers.
Responsibilities:
- Assemble multi-layered context for a given agent, user, team, and account
- Retrieve and rank knowledge facts using keyword and vector similarity
- Apply token budgeting to fit context within model limits
- Cache assembled contexts for active sessions
- Handle conversation history retrieval and summarisation references
- Include job context (context_bundle and subjob results) for follow-up interactions
Design:
Context Service
├── ContextAssembler # Orchestrates multi-layer assembly
│ ├── AgentKnowledgeLoader
│ ├── AccountKnowledgeLoader
│ ├── TeamKnowledgeLoader
│ ├── UserKnowledgeLoader
│ └── JobContextLoader # Injects job results and context_bundle
├── TokenBudgeter # Allocates token budget across layers
├── VectorSearcher # Semantic similarity retrieval (pgvector)
├── HistoryRetriever # Conversation history with summarisation
├── ContextCache # Redis-backed cache for active sessions
└── API # HTTP endpoints for context requests
Token budget allocation order matched the current design (system prompt, account policies, job context, history, team, user).
Scaling: memory intensive. Scale based on context complexity and concurrent sessions.
20.6 Legacy Tool Service¶
Purpose: tool registry, policy enforcement, tool execution, delegation management.
Responsibilities:
- Maintain the tool registry (static and dynamically registered tools)
- Evaluate tool execution requests against the policy engine, including per-assignment restrictions
- Execute tools: internal, external API calls, or agent-as-tool delegations
- For agent-type tools: validate delegation chain (depth and cycles), create child interaction
- Handle approval workflows for high-risk tools
- Circuit-break unhealthy external tools
Design:
Tool Service
├── ToolRegistry # Tool definitions and metadata (static + dynamic)
├── PolicyEngine # Trust-based access evaluation
│ ├── TrustLevelRule
│ ├── BudgetRule
│ ├── RateLimitRule
│ ├── ApprovalRule
│ ├── DelegationDepthRule # Checks max_delegation_depth
│ └── DelegationCycleRule # Checks delegation chain for cycles
├── AssignmentRestrictions # Evaluates per-assignment tool narrowing
├── ToolExecutor
│ ├── InternalExecutor # Platform-provided tools
│ ├── ExternalExecutor # Webhook/API-based tools
│ └── AgentExecutor # Creates child interaction for delegate agent
├── ApprovalManager # Manages approval workflows
├── ResultCache # Caches tool results by input hash
├── HealthMonitor # Monitors external tool availability
├── CircuitBreaker
└── AuditLogger
Policy decision flow:
Request -> TrustLevelRule -> AssignmentRestrictions -> BudgetRule -> RateLimitRule -> DelegationRules -> ApprovalRule -> Execute

Each of the first five rules can short-circuit with deny; ApprovalRule can instead return require_approval.
Scaling: I/O intensive. Scale based on tool execution volume.
20.7 Legacy Queue Service¶
Purpose: asynchronous task processing, scheduled jobs, deferred work.
Responsibilities:
- Manage worker pools per task type with configurable concurrency
- Execute tasks with retry logic and exponential backoff
- Handle scheduled tasks (digests, check-ins, periodic summaries)
- Manage dead letter queue for permanently failed tasks
- Run the job timeout monitor (scans for expired jobs)
Design:
Queue Service
├── TaskRouter # Routes tasks to appropriate workers
├── WorkerPoolManager
│ ├── KnowledgeExtractionWorker
│ ├── SummarisationWorker
│ ├── DigestWorker
│ ├── ScheduledTaskWorker
│ └── NotificationWorker
├── Scheduler # Cron-style scheduled task management
├── JobTimeoutMonitor # Scans Redis for expired jobs, transitions to timed_out
├── RetryManager # Exponential backoff, max attempts
├── DeadLetterQueue
└── MetricsCollector
Scaling: task-volume based. Worker pools scale independently per task type.
20.8 Legacy Usage Service¶
Purpose: cost metering, budget enforcement, usage reporting.
Responsibilities:
- Receive cost events from all services
- Aggregate costs per step, interaction, user, team, account, and agent
- Aggregate costs across delegation chains
- Enforce budget limits with pre-execution checks
- Generate usage reports and billing data
- Emit budget warning and exceeded events
Design:
Usage Service
├── CostAggregator # Step-level aggregation; delegation chain rollup
├── BudgetEnforcer
│ ├── AccountBudget
│ ├── TeamBudget
│ └── UserBudget
├── UsageReporter # Generates usage reports
├── AlertEmitter # Budget warning/exceeded events
└── BillingExporter # Data for billing system integration
Scaling: write-heavy. Scale based on interaction volume.
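The BudgetEnforcer's pre-execution check can be sketched as follows. The `Budget` shape and the account-before-team-before-user ordering are illustrative assumptions consistent with the hierarchy above, not the exact production types.

```go
package main

import "fmt"

// Budget pairs a spending limit with the cost already accrued this period.
type Budget struct {
	Limit, Spent float64 // in account currency units
}

// CheckBudgets runs the pre-execution check in narrowing order
// (account -> team -> user) and reports the first level that would be
// exceeded by the estimated cost. A Limit of 0 means "no limit set".
func CheckBudgets(estimated float64, levels map[string]Budget) (ok bool, exceededAt string) {
	for _, level := range []string{"account", "team", "user"} {
		b, set := levels[level]
		if !set || b.Limit == 0 {
			continue
		}
		if b.Spent+estimated > b.Limit {
			return false, level
		}
	}
	return true, ""
}

func main() {
	levels := map[string]Budget{
		"account": {Limit: 100, Spent: 40},
		"team":    {Limit: 10, Spent: 9.5},
	}
	fmt.Println(CheckBudgets(1.0, levels)) // the team budget would be exceeded
}
```

A denial here surfaces as the `budget_exceeded` step failure reason, and crossing a warning threshold emits the `budget.warning` event before the hard limit is hit.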
20.9 Legacy data layer¶
Supabase Cloud. Managed PostgreSQL with pgvector, row-level security, Vault, and Auth. The durable store and system of record. Stored events, knowledge facts, step execution records, agent configurations, user profiles, team structures, terminal jobs, and all persistent state. Row-level security enforced tenant isolation at the database level. Supabase Auth provided authentication. Supabase Vault managed credentials for external service integrations, API keys, and OAuth tokens.
Redis. Caching, event streaming, and active job storage. Multi-tier caching (session state, knowledge facts, context, tool results) with configurable TTLs and invalidation strategies. Redis Streams provided the event bus for inter-service communication. Also handled rate limiting state, ephemeral session data, and the operational store for active jobs (flushed to PostgreSQL when terminal).
Cloudflare R2. S3-compatible object storage with multi-bucket architecture. Three bucket tiers: platform-shared (default), enterprise-dedicated (isolated per account), and account-supplied (BYOB, accounts provide their own R2 or S3 credentials). Stored file attachments, document uploads, and media associated with conversations. R2 carries forward into the current architecture unchanged.
What was not used from Supabase. To avoid vendor lock-in and maintain architectural clarity, the Go stack deliberately did not use Supabase Realtime (Redis Streams handled the event bus), Supabase Storage (R2 was used instead), Supabase Edge Functions (services were Go on Hetzner), PostgREST API (Go services used direct PostgreSQL connections via pgx), or Supabase client libraries for the backend (backend used pgx directly; frontend apps used Supabase client for auth only).
20.10 Legacy execution model¶
The durable execution harness. When a message arrived for an agent, the harness created an interaction, a tracked unit of work consisting of one or more steps. Each step had an independent lifecycle:
Step states:        created -> running -> success
                                       -> failed (reason in metadata)
Interaction state:  derived from constituent steps
The harness persisted each step's state before and after execution. Failure reasons (timeout, error, governance denial, budget exceeded, user cancellation, system interruption, delegation_depth_exceeded, delegation_cycle_detected) were captured as metadata.
This model provided resumability (pick up from the last completed step after a crash), auditability (complete record of every decision and outcome), cost attribution (each step's cost independently tracked), and retry granularity (retry a single failed step, not the entire interaction).
Step execution modes. Act steps (tool calls) supported three execution modes:
- Immediate. The default. The tool call executed synchronously within the interaction.
- Deferred. Dispatched work to an external engine and completed immediately with a job reference. Results arrived via a follow-up interaction when work was complete.
- Interactive. The step result fed back into a new reasoning cycle, enabling multi-turn sequential decision-making within a single interaction.
A single interaction could mix modes.
The job system. For work that outlived a single interaction, a job represented a unit of deferred work, containing one or more subjobs. Jobs progressed: pending -> dispatched -> in_progress -> resolved / failed / cancelled / timed_out. Multiple observers could register interest in a job and receive notifications on state changes. Partial output notifications alerted observers when useful intermediate results were available. The creating agent received results via a follow-up interaction in the same conversational session. Jobs inherited the governance context of the interaction that created them.
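The job state machine and its observer notifications can be sketched as follows. The transition table matches the progression above; the `Job` type and observer callback shape are illustrative, not the production types.

```go
package main

import "fmt"

// Legal job transitions from the legacy model:
// pending -> dispatched -> in_progress -> resolved | failed | cancelled | timed_out.
var transitions = map[string][]string{
	"pending":     {"dispatched", "cancelled"},
	"dispatched":  {"in_progress", "cancelled", "timed_out"},
	"in_progress": {"resolved", "failed", "cancelled", "timed_out"},
}

// Job carries its state and the observers registered for change notifications.
type Job struct {
	ID        string
	State     string
	observers []func(jobID, from, to string)
}

func (j *Job) Observe(fn func(jobID, from, to string)) { j.observers = append(j.observers, fn) }

// Transition enforces the state machine and fans out job.state_changed
// notifications to every registered observer.
func (j *Job) Transition(to string) error {
	for _, next := range transitions[j.State] {
		if next == to {
			from := j.State
			j.State = to
			for _, fn := range j.observers {
				fn(j.ID, from, to)
			}
			return nil
		}
	}
	return fmt.Errorf("illegal transition %s -> %s", j.State, to)
}

func main() {
	j := &Job{ID: "job-1", State: "pending"}
	j.Observe(func(id, from, to string) { fmt.Println("job.state_changed", id, from, to) })
	fmt.Println(j.Transition("dispatched"))
	fmt.Println(j.Transition("resolved")) // illegal: must pass through in_progress
}
```

Terminal states are absent from the transition table, which makes them absorbing: once a job is resolved, failed, cancelled, or timed out, no further transition is legal.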
In the current architecture, the job concept is carried by Convex Workflows (for multi-step durable work) and Workpool (for parallel background jobs) with reactive queries replacing observer notifications. See doc 03 Agent Architecture for the current delegation and long-running-work model.
20.11 Legacy communication patterns¶
Event bus (Redis Streams). Stream naming convention: events:{event_type}:{agent_id}. Consumer groups ensured at-least-once delivery, load distribution across service instances, independent consumption, and acknowledgement tracking.
Event envelope:
type Event struct {
ID string `json:"id"`
Type string `json:"type"`
Source string `json:"source"`
AgentID string `json:"agent_id"`
UserID string `json:"user_id,omitempty"`
TeamID string `json:"team_id,omitempty"`
AccountID string `json:"account_id,omitempty"`
SessionID string `json:"session_id,omitempty"`
ParentID string `json:"parent_id,omitempty"`
Payload json.RawMessage `json:"payload"`
Metadata EventMetadata `json:"metadata"`
CreatedAt time.Time `json:"created_at"`
}
type EventMetadata struct {
TraceID string `json:"trace_id"`
Version string `json:"version"`
Priority int `json:"priority"`
}
Core event types:
message.received # Inbound message from channel
message.response # Outbound response to channel
interaction.started # Harness initiated
interaction.completed # Harness finished
step.created # Step registered
step.running # Step execution started
step.completed # Step finished
tool.requested # Tool execution requested
tool.completed # Tool execution finished
knowledge.extracted # New facts extracted
budget.warning # Usage threshold reached
budget.exceeded # Budget limit hit
agent.paused # Kill switch activated
agent.resumed # Agent reactivated
job.created # Job record created
job.dispatched # Work sent to execution engine
job.state_changed # State transition (observer notification)
job.resolved # Terminal: all subjobs done, some succeeded
job.failed # Terminal: all subjobs done, none succeeded
job.cancelled # Terminal: explicitly cancelled
job.timed_out # Terminal: deadline exceeded
note.shared # Note visibility extended to a user or team
comm.dispatched # Outbound communication dispatched
mention.created # @ mention in a note or message
Most of these event types are preserved semantically in the current architecture, either as mutations on the corresponding table (e.g. message inserts fire a trigger that writes an audit event) or as workflow step transitions. The Redis Streams naming convention and consumer-group coordination are retired; Convex reactivity replaces the pub/sub pattern.
Service-to-service communication (legacy):
- Agent -> Context Service: synchronous HTTP. Agent needed context before it could think.
- Agent -> Tool Service: synchronous HTTP. Agent needed tool result before it could observe.
- Agent -> Job Store (Redis): synchronous. Job creation and state queries.
- Agent -> Queue Service: asynchronous event. Extraction could happen later.
- Agent -> Usage Service: asynchronous event. Cost recording was fire-and-forget.
- Gateway -> Event Bus: asynchronous. Inbound messages published.
- Event Bus -> Gateway: asynchronous. Responses delivered.
- Gateway -> External Webhooks: asynchronous. Webhook delivery for Platform and Integration APIs.
20.12 Legacy monorepo structure¶
thinklio/
├── cmd/ # Service entry points
│ ├── gateway/
│ ├── agent/
│ ├── context/
│ ├── tools/
│ ├── queue/
│ ├── usage/
│ └── server/ # Single-binary entrypoint (legacy deployment mode)
├── internal/ # Internal Go packages (29 packages across four groups)
│ ├── # Core
│ ├── auth/ # Authentication, identity, JWT validation
│ ├── config/ # Configuration management
│ ├── database/ # Supabase/pgx connection, query helpers
│ ├── health/ # Health check endpoints
│ ├── tenant/ # Tenant context, RLS assertion helpers
│ ├── # Execution
│ ├── harness/ # Durable execution harness
│ ├── event/ # Event types, bus client, serialisation
│ ├── jobs/ # Job system, subjobs, observers
│ ├── planning/ # Predictive planning, Bayesian scoring
│ ├── # Features
│ ├── api/ # API handler registration
│ ├── admin/ # Admin API handlers
│ ├── channel/ # Channel identity and linking
│ ├── comms/ # Communication dispatch
│ ├── documents/ # Document ingestion and processing
│ ├── email/ # Email channel (Postmark)
│ ├── feedback/ # Execution outcome collection
│ ├── knowledge/ # Knowledge layer abstractions
│ ├── llm/ # Multi-provider LLM abstraction
│ ├── notification/ # User and agent notifications
│ ├── oauth/ # OAuth flows
│ ├── platform/ # Platform services and config
│ ├── storage/ # Cloudflare R2 object storage
│ ├── telegram/ # Telegram adapter
│ ├── templates/ # Response and message templates
│ ├── tools/ # Tool registry and execution
│ ├── usage/ # Cost metering and budget enforcement
│ └── webhooks/ # Webhook delivery
├── pkg/ # Public Go packages
│ ├── integrations/ # External service integrations
│ │ ├── calendar/ # Google Calendar
│ │ ├── crm/ # HubSpot
│ │ ├── gmail/ # Gmail
│ │ ├── tasks/ # Todoist
│ │ ├── tavily/ # Web search
│ │ └── web/ # Web scraper
│ └── api/ # Shared API type definitions
├── apps/ # Client applications
│ ├── web/ # Next.js 15 / React 19 web app (app.thinklio.ai)
│ └── mobile/ # Flutter mobile app
├── migrations/ # PostgreSQL migration SQL files (legacy)
├── deploy/ # Docker, Compose, deployment configs
│ ├── docker/
│ ├── coolify/
│ └── scripts/
├── docs/ # This document set
└── tests/ # Integration and E2E tests
├── integration/
└── e2e/
20.13 Legacy deployment¶
Single-binary deployment. During early development, all logical services ran within a single Go binary (cmd/server):
Hetzner VPS (Ubuntu 24.04, managed via Coolify)
├── Docker Compose
│ ├── thinklio (single Go binary, all services; port 8080)
│ ├── redis (Redis 7, port 6379)
│ └── monitoring (Prometheus + Grafana, planned)
├── Nginx (reverse proxy, SSL termination)
└── Coolify (deployment management)
External services:
├── Supabase Cloud (Frankfurt, EU)
│ ├── PostgreSQL 16 + pgvector
│ ├── Supabase Auth
│ └── Supabase Vault
└── Cloudflare R2 (global)
└── Multi-bucket object storage
Latency between Hetzner Nuremberg and Supabase Frankfurt was around 2 to 5 ms.
Service split (general availability). When the platform would have moved to GA under the legacy design, services would split into independent binaries. This required only configuration changes, no architectural changes, because services communicated through events and explicit HTTP APIs, not in-process function calls.
Hetzner VPS 1 (Edge)             Hetzner VPS 2 (Compute)
├── gateway (port 8001)          ├── agent (port 8002)
├── nginx                        └── context (port 8003)
└── redis

Hetzner VPS 3 (Background)       External
├── tools (port 8004)            ├── Supabase Cloud (PostgreSQL + Auth)
├── queue (port 8005)            └── Cloudflare R2
└── usage (port 8006)
Container strategy. Each service was built into a minimal container with a target image size under 20MB:
# Build stage: CGO disabled so the binary is static and needs no libc at runtime
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o service ./cmd/{service}
# Runtime stage: minimal Alpine plus CA certificates for outbound TLS
FROM alpine:3.19
RUN apk --no-cache add ca-certificates
COPY --from=builder /app/service /usr/local/bin/
USER nobody
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=3s CMD wget -q --spider http://localhost:8080/health || exit 1
CMD ["service"]
20.14 Legacy configuration¶
All services read configuration from environment variables with sensible defaults:
type Config struct {
// Service identity
ServiceName string
ServicePort int
// Supabase / database
SupabaseURL string // Project URL (for auth and REST)
SupabaseServiceKey string // service_role key (bypasses RLS for backend ops)
DatabaseURL string // Direct connection URL (migrations, admin)
DatabasePoolURL string // Supavisor pooled URL (application queries)
// Redis
RedisURL string
// Object storage
R2AccountID string
R2AccessKeyID string
R2SecretAccessKey string
R2DefaultBucket string
// Service discovery (split-binary mode)
ContextServiceURL string
ToolServiceURL string
// LLM
OpenRouterAPIKey string
AnthropicAPIKey string
// Job system
JobTimeoutCheckInterval time.Duration
DefaultJobTimeout time.Duration
MaxJobTimeout time.Duration
// Delegation
DefaultMaxDelegationDepth int
// Observability
LogLevel string
TraceEndpoint string
MetricsPort int
// Feature flags
Features map[string]bool
}
20.15 Legacy observability¶
Structured logging. All services used structured JSON logging with consistent fields: service, trace_id, agent_id, user_id, level, message, timestamp. For delegations: delegation_depth, parent_interaction_id. For jobs: job_id, job_state. Sensitive data was never logged.
Distributed tracing. OpenTelemetry integration across all services. Trace context propagated through event metadata and HTTP headers. Delegation chains produced linked traces showing the full execution path across agent boundaries.
Metrics. Prometheus-compatible endpoints on every service: request rates, latencies, error rates, queue depths and processing rates, cache hit rates, database connection pool stats, budget utilisation, active job count, dispatch rate, completion rate, delegation depth distribution.
Health checks. Every service exposed /health returning service status and dependency checks (database connectivity, Redis connectivity, event bus reachability).
20.16 Legacy technology stack¶
| Concern | Technology |
|---|---|
| Services | Go (all services) |
| Database | Supabase Cloud (managed PostgreSQL 16 + pgvector + RLS + Vault) |
| Auth | Supabase Auth |
| Cache / Event Bus | Redis 7 with Streams |
| Object Storage | Cloudflare R2 (multi-bucket) |
| Web app | Next.js 15, React 19, TypeScript, Tailwind CSS v4 |
| Mobile app | Flutter / Dart |
| Containerisation | Docker + Docker Compose |
| Infrastructure | Hetzner Cloud (EU), managed via Coolify |
| LLM | OpenRouter / Anthropic API |
20.17 Legacy API surfaces¶
Thinklio's legacy design exposed three external API surfaces. The current design preserves the three-surface model at the application layer; surfaces are implemented as Convex HTTP actions rather than Go handlers.
Channel API. Conversational access. An external system acts as a channel adapter, sending messages to agents on behalf of users and receiving responses. Full governance applies.
Platform API. Orchestration and management. Programmatic control over agent lifecycle, job dispatch and observation, knowledge management, and usage reporting.
Integration API. Bidirectional capability exchange. External systems register as tools available to agents (with governance approval), subscribe to platform events via webhooks, and invoke agents via Platform API job dispatch.
All three surfaces resolve to the same authorisation model, governance framework, cost attribution, and audit trail. See doc 09 External API & Tool Integration for the current surface-by-surface contract.
20.18 Legacy predictive planning¶
The legacy platform collected execution outcomes and used Bayesian scoring to evaluate candidate plans based on historical performance, context similarity, and cost efficiency. Over time, agents learned which approaches worked best for specific request types without manual tuning. The predictive planning data model and scoring algorithm are preserved and carry forward to the current architecture with outcomes stored in Convex (see doc 08 Agents Catalogue & Platform Services for the current implementation and doc 04 Data Model for the current schema).
21. Migration context (archival)¶
This section preserves the history of the two-plane migration proposal (old doc 43) that analysed Convex adoption before the greenfield rebuild was chosen. It is retained because the Convex capability analysis, integration risks, and phased migration concepts remain useful reference material. The two-plane architecture was not adopted; the greenfield rebuild described in sections 3 through 19 was chosen instead.
21.1 Why the migration was proposed¶
The legacy architecture routed every agent interaction through a chain of Go services, a Redis event bus, and Supabase Postgres. A typical agent turn followed the path: Gateway -> Redis Streams publish -> Agent Service consumes -> Postgres read (context assembly) -> Redis cache check -> LLM call -> Postgres write (step state) -> Redis publish (response) -> Gateway consumes -> channel delivery. Each hop added latency. The multi-tier caching strategy (legacy doc 06) existed specifically to mitigate the cost of this chain, but it introduced its own complexity: TTL management, cache invalidation, stale data risks, and a large surface area for subtle bugs.
For an AI agent platform, the interaction loop is the product. If Thinklio's agents feel slow, if there is perceptible delay between a user's message and the first token of a response, no amount of governance sophistication or knowledge architecture will matter. Users will leave.
The core performance bottleneck was structural, not tunable. The legacy architecture was designed for correctness and flexibility (event sourcing, durable execution, multi-tenancy via RLS), and it achieved those goals. But the communication patterns between services (publish/subscribe via Redis Streams, synchronous HTTP between Agent and Context services, write-through caching with Postgres as the durable store) imposed a latency floor that could not be eliminated without changing the architecture.
Additional pressures. Beyond raw latency, the legacy architecture had secondary pressures:
- Supabase dependency without full utilisation. The platform used Supabase for two things: Postgres hosting and authentication. It explicitly did not use Supabase Realtime, Storage, Edge Functions, or PostgREST. This meant Thinklio paid for a managed platform while using it primarily as a database host. Removing Supabase Auth was a prerequisite for any architecture change, and doing so opened the door to a more fundamental rethink.
- Custom durable execution harness. The HarnessExecutor was approximately 200 lines of careful Go that managed step state persistence, crash recovery, execution mode routing (immediate/deferred/interactive), and resumption. It worked, but it was custom infrastructure that had to be maintained, tested, and extended as the agent execution model grew more complex.
- Knowledge retrieval via pgvector. Semantic search used pgvector on Supabase Postgres. This worked but required managing embeddings as table columns, maintaining vector indexes, and accepting the performance characteristics of a general-purpose database doing vector similarity.
- Redis Streams as event bus. Redis Streams provided the inter-service communication layer. It was reliable but required consumer group management, message acknowledgement tracking, dead letter handling, and careful attention to stream trimming. For the client-facing delivery path (streaming agent responses to web/mobile), it added an unnecessary relay hop.
21.2 Why Convex¶
The two-plane proposal identified the following Convex properties as strong fits for the agent execution workload:
- Reactive queries. Every query function in Convex tracks its data dependencies. When any dependency changes, the query re-runs and every client subscription updates automatically. This eliminates the entire caching layer for agent-facing data. When a knowledge fact is updated, every active agent session that depends on it sees the change in the next query cycle. No TTLs, no invalidation, no stale data.
- Native durable workflows. The Workflow component persists each step's arguments and return values in a journal. On failure or restart, the workflow replays completed steps from the journal rather than re-executing them. Steps can be queries, mutations, or actions. Workflows can pause indefinitely waiting for external events without consuming resources. This replaces the custom HarnessExecutor with a platform feature.
- Built-in vector and text search. Convex provides native vector search (up to 256 results per query, with filter expressions) and full-text search. The RAG component adds namespaced document search with importance weighting, chunk context retrieval, and graceful migrations. The Agent component integrates both for automatic context assembly from message history.
- WebSocket-based streaming. Convex streams data to clients over persistent WebSocket connections, not HTTP streams. Every subscribed client receives updates simultaneously. The Persistent Text Streaming component streams LLM output token-by-token while persisting the final text. No WebSocket infrastructure to build, no connection state to manage.
- Agent component. A purpose-built abstraction for AI agents that manages threads, messages, tool calls, multi-step reasoning, context assembly (recent messages + search), and streaming. Agents are defined declaratively with a model, instructions, and tools. The component handles persistence, ordering, and search automatically.
- Component ecosystem. First-party components for Workflow, Workpool, Rate Limiter, Crons, Aggregate, Sharded Counter, Action Retrier, and Persistent Text Streaming. Community components for Audit Log, Webhook Sender, Message Queue, and LLM Cache.
- Self-hosted option. The Convex backend is open source (Apache-2.0) and runs in Docker with the same code as the cloud service. Self-hosted deployments can use SQLite or Postgres as the backing store.
- Enterprise trajectory. Convex has a growing base of enterprise clients and raised $24M USD. The platform is SOC 2 Type II compliant, HIPAA compliant, and GDPR verified. Cloud hosting includes an EU region (Ireland), which aligns with Thinklio's data residency requirements.
21.3 Two-plane architecture (superseded by section 19 greenfield rebuild)¶
The two-plane proposal would have kept the Go services for authentication, policy, usage, and admin while moving agent execution to Convex. The agent plane (Convex) would handle thread and message management, context assembly, knowledge retrieval, the durable execution loop, LLM orchestration, streaming, and background task processing. The platform plane (Go plus Postgres plus Redis) would handle identity, accounts, governance, cost metering, billing, audit logging, channel protocol handling, and system-of-record data. The two planes would communicate over HTTP.
This design was analysed in depth and partially prototyped during the knowledge-layer phase. It was ultimately superseded by the greenfield Convex-first rebuild (sections 3 through 19) because:
- Maintaining two backend languages for a small team was expensive.
- The HTTP policy check from Convex to Go on every tool execution introduced a coordination point and added latency to every tool call; the same check can be implemented as in-process Convex middleware reading cached policy documents with negligible overhead.
- The authentication migration from Supabase Auth to Keycloak was a significant piece of work, and Clerk offered the same capability plus pre-built UI plus first-class Convex integration with a much smaller operational surface.
- Postgres self-hosting for the system-of-record plane added infrastructure that was not obviously better than Convex plus Fivetran CDC streaming to an external warehouse.
21.4 Risks identified in the two-plane analysis¶
These risks were identified and analysed during the migration proposal. Most carry forward to the greenfield rebuild in reduced form; they are listed here for traceability.
| Risk | Mitigation |
|---|---|
| Two-language codebase | Greenfield rebuild eliminates this; the backend is TypeScript on Convex. |
| Platform maturity | Convex open-source self-hosted option provides an escape hatch; system-of-record replicated to external Postgres via Fivetran. |
| Cross-plane latency | Greenfield rebuild eliminates this; policy checks are in-process Convex middleware. |
| Data consistency between planes | Greenfield rebuild eliminates this; all data lives in one Convex project. |
| Loss of database-level tenant isolation | Consistent accountQuery middleware in every Convex function; automated test suite verifies cross-tenant isolation; periodic audit queries. |
| Vendor dependency | Convex is open source and self-hostable. System-of-record replicated externally. |
| Auth migration | Clerk chosen over Keycloak; migration path is simpler with pre-built UI and managed service. |
| Vector search limits (256 results) | Adequate for context assembly (typical LLM context windows consume 10 to 50 chunks). Pre-filtering via RAG namespaces reduces the search space. |
21.5 Questions that remained open at the time of supersession¶
The two-plane proposal left the following questions open. The greenfield rebuild resolved most of them:
- Convex Cloud pricing at scale. Monitored during Phase 1; reviewed against Professional plan allocation and Enterprise plan pricing. Tier 3 external queue path exists for workloads that outgrow the Convex-native budget.
- Auth provider choice. Resolved: Clerk chosen over Keycloak and Convex Auth for managed operational surface, pre-built UI, first-class Convex integration, and enterprise features.
- Agent configuration source of truth. Resolved: all agent configuration lives in Convex in the current architecture. There is no separate source of truth.
- Job system mapping. Resolved: Workflow component covers multi-step durable work; Workpool covers parallel background work; reactive queries replace observer notifications.
- External agent execution contract. Open: externally-executed agents call an HTTP endpoint operated by the developer. The current contract calls the external endpoint from a Convex action inside the agent Workflow. See doc 03 Agent Architecture for the current contract.
- Predictive planning. Resolved: execution outcomes are written to Convex and scored by a Convex function reading the outcome history. See doc 08 Agents Catalogue & Platform Services.
22. Open questions¶
- Plugin and extension architecture. Cowork mode and IDE integrations introduce the notion of user-installable plugins. A dedicated design pass is needed for how plugin skills, MCP connectors, and commands register into an account, how they are scoped by permissions, and how they coexist with account-level agent tools. Tentative placement: doc 09 External API & Tool Integration.
- Cross-account agent sharing. There is currently no mechanism for a vendor to publish an agent that another Thinklio account can install without re-configuration. The design for this (marketplace, manifest, trust model, update propagation) is deferred.
- Multi-region residency. Convex Cloud currently deploys to one region per project. Customers with strict residency requirements outside the EU region need the self-hosted path. A multi-region managed option would require per-region Convex projects with a routing layer, which is not currently planned.
- Voice channel bridge. The Tier 1 classification assumes near-real-time transcription. The concrete bridge design (whisper service, streaming ASR, partial transcription, end-of-utterance detection) is deferred.
- Cost model for Tier 3 external queue. The break-even point between upgrading Convex plans and running an external queue depends on workload shape. A concrete monitoring dashboard plus a cost model is needed before Tier 3 migration becomes real.
23. Revision history¶
| Date | Change |
|---|---|
| 2026-03-22 | Original System Architecture (doc 04 v0.2.0) published |
| 2026-03-26 | Merged Architecture Overview (doc 02) into System Architecture (doc 04 v0.3.0) |
| 2026-03-31 | Proposed: two-plane Convex migration architecture (doc 43 v0.1.0) |
| 2026-04-01 | Proposed: greenfield Convex-first architecture (doc 44 v0.1.0) |
| 2026-04-06 | Added: execution tiers and workflow budget (doc 54 v0.1.0) |
| 2026-04-16 | Consolidated docs 02, 04, 43, 44, and 54 into this document (v1.0.0). All sources archived. |