Tenancy & Deployment Topology

Overview

Thinklio is multi-tenant. The tenancy boundary is the account (a Clerk organisation), with teams and users nested beneath it. Budget enforcement and governance policies act at all three layers, with account policies overriding lower layers.

This document is authoritative for how tenants are isolated, where they live, and how they move. It covers three things:

  1. What we do now — a single Convex deployment serving all tenants, with isolation enforced in application code.
  2. Why — the reasoning and the scale envelope that makes this the correct default.
  3. Future options — how a tenant graduates to a dedicated Convex deployment or a customer-owned Convex account, how new tenants are provisioned in each model, and the procedure for migrating a tenant out of the shared deployment.

For the data model of accounts, teams, and users see Data Model. For the isolation, credential, and policy mechanics see Security & Governance. For the overall service topology see System Architecture. The decision recorded here is ADR-022 in the Decision Log.


1. Tenancy model

1.1 The hierarchy

account (Clerk organisation)   ← tenancy boundary; billing + governance root
  ├── team                     ← collective scoping for knowledge, agents, data
  │     └── team_member        ← user ∈ team, with role
  ├── account_user             ← user ∈ account, with role (admin | member)
  └── (all tenant data)        ← every row carries accountId
  • An account maps one-to-one to a Clerk organisation. It is the billing entity, the governance root, and the unit of isolation.
  • A team is a group within an account that owns collective knowledge, shared agents, and team-scoped data.
  • A user has a single global identity (Clerk user) and may belong to multiple accounts.

1.2 Where budget and governance act

Layer | Budget | Governance
--- | --- | ---
Account | Credit balance, monthly spend caps | account_policies: highest authority; override all lower layers
Team | Per-team budget allocation | team_policies: apply within the team
User | Per-user / per-assignment budget | User-scoped preferences and restrictions

Account policies are the ceiling: a team or user can be more restricted than the account allows, never less. See Security & Governance for the policy evaluation order.
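The ceiling rule can be illustrated with a small sketch. The LayerPolicy shape, its field names, and resolveEffectivePolicy are hypothetical stand-ins, not the actual account_policies / team_policies schema; the point is only that each lower layer can tighten a limit and never loosen it.

```typescript
// Hypothetical policy shape for illustration only; the real account_policies
// and team_policies schemas are defined elsewhere and may differ.
interface LayerPolicy {
  monthlyCapCredits?: number; // undefined = this layer sets no cap
  allowedTools?: string[];    // undefined = this layer adds no restriction
}

// Resolve the effective policy across account, team, and user layers.
// A lower layer may only narrow what the layers above it allow.
function resolveEffectivePolicy(
  account: LayerPolicy,
  team: LayerPolicy = {},
  user: LayerPolicy = {},
): LayerPolicy {
  const layers = [account, team, user];

  // Spend cap: the smallest cap set at any layer wins.
  let cap: number | undefined;
  for (const layer of layers) {
    if (layer.monthlyCapCredits !== undefined) {
      cap = cap === undefined
        ? layer.monthlyCapCredits
        : Math.min(cap, layer.monthlyCapCredits);
    }
  }

  // Tool allow-list: intersect every layer that defines one.
  let tools: string[] | undefined;
  for (const layer of layers) {
    const allowed = layer.allowedTools;
    if (allowed !== undefined) {
      tools = tools === undefined
        ? allowed.slice()
        : tools.filter((t) => allowed.indexOf(t) !== -1);
    }
  }

  return { monthlyCapCredits: cap, allowedTools: tools };
}

const effective = resolveEffectivePolicy(
  { monthlyCapCredits: 1000, allowedTools: ["search", "email"] },
  { monthlyCapCredits: 5000 },              // team asks for more: has no effect
  { allowedTools: ["search", "calendar"] }, // user cannot add "calendar"
);
```

Here the effective cap stays at the account's 1000 credits and the only permitted tool is "search", because minimum and intersection can only narrow what the account allows.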


2. Current model — single pooled Convex deployment

All tenants share one Convex deployment. Tenancy is enforced entirely in application code; Convex provides no row-level security at the database layer, so isolation is a property of our middleware discipline, not of the platform.

2.1 The isolation boundary

The tenant boundary rides on the verified Clerk JWT, not on any client-supplied argument. The account-scoped function wrappers in convex/lib/middleware.ts derive accountId from the org claim in the authenticated identity:

accountQuery / accountMutation
  → ctx.auth.getUserIdentity()          (verified JWT)
  → orgId = identity["o.id"]            (Clerk org claim)
  → ctx.accountId = orgId               (injected; not an argument)

Because accountId comes from the token, a caller cannot request another tenant's data by passing a different ID. This is isolation-by-construction at the function boundary.
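As a rough, self-contained model of that flow: the types below are stand-ins for what Convex and convex/lib/middleware.ts provide, and deriveAccountCtx / withAccount are hypothetical names, not the real wrappers. Only the identity["o.id"] lookup is taken from the flow above.

```typescript
// Stand-in types so the sketch runs on its own. In the real code the identity
// comes from ctx.auth.getUserIdentity() and the wrappers live in
// convex/lib/middleware.ts.
type VerifiedIdentity = { subject: string; [claim: string]: unknown };
interface AccountCtx { userId: string; accountId: string }

// Derive the tenant context from the verified identity only.
function deriveAccountCtx(identity: VerifiedIdentity | null): AccountCtx {
  if (identity === null) throw new Error("Unauthenticated");
  const orgId = identity["o.id"]; // Clerk org claim, as in the flow above
  if (typeof orgId !== "string" || orgId.length === 0) {
    throw new Error("No active organisation in token");
  }
  return { userId: identity.subject, accountId: orgId };
}

// Wrap a handler so it receives the derived context. Whatever the client
// sends in args is plain data; it is never consulted to select the tenant.
function withAccount<Args, Result>(
  handler: (ctx: AccountCtx, args: Args) => Result,
): (identity: VerifiedIdentity | null, args: Args) => Result {
  return (identity, args) => handler(deriveAccountCtx(identity), args);
}

// Example handler: a client-supplied accountId in args has no effect.
const whichAccount = withAccount(
  (ctx: AccountCtx, _args: { accountId?: string }) => ctx.accountId,
);
```

Calling whichAccount with a token for org_A and args of { accountId: "org_B" } still resolves to org_A, which is the property this section relies on.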

2.2 The invariants that keep it safe

The single deployment is safe only while these hold. They are non-negotiable:

  1. No raw ctx.db in business logic. Every data-touching query/mutation goes through accountQuery / accountMutation (or the authed* variants for account-agnostic identity work). Raw query / mutation on tenant tables is forbidden.
  2. Every index on a tenant table leads with the tenant boundary: accountId (or a parent owned by an account, e.g. channelId). No unscoped scans, no JS-side filtering across tenants.
  3. Re-check ownership after db.get. A direct fetch by Id bypasses the index scope, so handlers re-assert doc.accountId === ctx.accountId before returning (see getTask in tasksCrud.ts).

A single violation of (1) or (2) is a potential cross-tenant data leak whose blast radius is all tenants. These invariants should be enforced by lint/convention, not memory.
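Invariant (3) is the easiest to forget, so a shared helper is one way to make it hard to skip. The sketch below uses an in-memory record as a stand-in for ctx.db.get; getOwned and the document shapes are illustrative assumptions, not the code in tasksCrud.ts.

```typescript
// Minimal stand-ins for tenant documents; real handlers would call
// ctx.db.get(id) inside an accountQuery.
interface TenantDoc { _id: string; accountId: string }
interface TaskDoc extends TenantDoc { title: string }

// Fetch by id, then re-assert ownership. A missing document and a document
// owned by another account both come back as null, so a caller cannot use
// ids to probe for the existence of other tenants' data.
function getOwned<T extends TenantDoc>(
  table: Record<string, T>,
  id: string,
  ctxAccountId: string,
): T | null {
  const doc = table[id];
  if (doc === undefined || doc.accountId !== ctxAccountId) return null;
  return doc;
}

const tasksTable: Record<string, TaskDoc> = {
  t1: { _id: "t1", accountId: "org_A", title: "Renew contract" },
  t2: { _id: "t2", accountId: "org_B", title: "Quarterly review" },
};

const ownTask = getOwned(tasksTable, "t1", "org_A");     // the task
const foreignTask = getOwned(tasksTable, "t2", "org_A"); // null
```

Returning null rather than throwing a distinct "forbidden" error is a design choice in this sketch; either works as long as the ownership check always runs before the document leaves the handler.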

2.3 What is shared vs isolated

  • Isolated per account: all tenant data — channels, messages, tasks, contacts, items, notes, knowledge, files, agents, policies, usage. Every row carries accountId and is reachable only through the scoped wrappers.
  • Shared across all tenants (by design): the agent catalog (templates), platform LLM configuration, the Convex component infrastructure (Agent, RAG, Workflow, Rate Limiter), and the deployed code itself. These are platform-level, not tenant data.

2.4 Noisy-neighbour control

Because function compute and platform rate limits are shared in a pooled deployment, one tenant could in principle starve others. Two existing mechanisms cap this without needing physical isolation:

  • Per-account budgets (the governance layer) limit how much LLM/tool work a tenant can drive.
  • The Rate Limiter component throttles per-account request volume against shared function capacity.

This is a genuine advantage of having built governance early: the budget layer doubles as the multi-tenant fairness control.
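To make the per-account throttling idea concrete, here is a minimal token-bucket sketch keyed by accountId. It is not the Rate Limiter component's API; createPerAccountLimiter and its parameters are assumptions chosen for illustration.

```typescript
// Illustrative per-account token bucket. This is NOT the Rate Limiter
// component's API; names and numbers are assumptions.
interface Bucket { tokens: number; updatedAtMs: number }

function createPerAccountLimiter(capacity: number, refillPerSecond: number) {
  // Null-prototype object so account ids never collide with built-in keys.
  const buckets: Record<string, Bucket> = Object.create(null);

  // Returns true if the request may proceed, consuming one token.
  return function tryAcquire(accountId: string, nowMs: number): boolean {
    const prev = buckets[accountId] || { tokens: capacity, updatedAtMs: nowMs };
    const elapsedSec = Math.max(0, nowMs - prev.updatedAtMs) / 1000;
    const tokens = Math.min(capacity, prev.tokens + elapsedSec * refillPerSecond);
    const allowed = tokens >= 1;
    buckets[accountId] = {
      tokens: allowed ? tokens - 1 : tokens,
      updatedAtMs: nowMs,
    };
    return allowed;
  };
}

// Burst of 2 requests per account, refilling at 1 request per second.
const tryAcquire = createPerAccountLimiter(2, 1);
```

Because each account has its own bucket, a tenant that exhausts its burst is throttled without affecting the tokens available to any other tenant.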

2.5 Known residual risks

Risk | Mitigation
--- | ---
Middleware bug leaks across tenants | Invariants (§2.2) enforced by lint; code review; export/delete tooling (§6.1) limits exposure window
Noisy neighbour | Per-account budgets + Rate Limiter (§2.4)
Per-deployment scaling limits if a few "whale" tenants dominate document/throughput volume | Graduation to a dedicated deployment (§4); not a concern at current scale (§3)
Per-tenant hard-delete (GDPR) is a careful cascade rather than a DROP | Build the cascade once, reuse as the migration mechanism (§6)

3. Why single instance — and the scale envelope

The expected scale is the decisive factor. Thinklio serves B2B accounts (organisations of tens to low-hundreds of users). We expect a handful of accounts initially and, even with strong success, on the order of a few thousand at the ceiling.

A few thousand small orgs is comfortably within one Convex deployment. The constraint that matters is not the number of accounts but total document volume and hot-path read amplification — both controlled by the index discipline in §2.2.

Against that, the cost of running thousands of separate deployments — provisioning, migrations across N deployments, version skew, lost usage pooling, and the control-plane needed to manage them — is prohibitive for a small team and buys isolation the long tail does not need.

Decision: stay single-instance and pooled by default. Treat dedicated isolation as a premium capability offered per tenant on demand, not the global architecture. Recorded as ADR-022.


4. Future options — deployment topologies

We do not choose single-vs-separate globally. We adopt the cell pattern: every tenant starts pooled and graduates to stronger isolation only when there is a reason. Three topologies, in increasing isolation:

Tier | Topology | Convex account | When | Isolation
--- | --- | --- | --- | ---
T1: Pooled (default) | Shared deployment, app-layer isolation | Thinklio's | All SMB / self-serve | Logical
T2: Dedicated | One deployment per tenant | Thinklio's | Enterprise contract, compliance, data residency, scale | Physical (data + compute)
T3: Customer-owned | One deployment in the customer's own Convex account | Customer's | Full data sovereignty / procurement requirement | Physical + administrative

Triggers for graduating a tenant from T1:

  • Contractual / compliance — the customer requires their data physically isolated or in their own account.
  • Data residency — a region requirement a shared deployment cannot satisfy.
  • Scale / noisy neighbour — a whale tenant whose volume justifies its own deployment.

4.1 The enabler: deployment-agnostic code

The cell pattern only works if the same codebase runs unchanged in any deployment. "Which deployment" and "which Clerk" must be configuration (environment), never assumptions baked into code. The codebase essentially meets this requirement today, and it must not regress:

  • No hardcoded deployment URLs or single-tenant assumptions in functions.
  • Auth config (auth.config.ts) reads the Clerk issuer/keys from env, so a deployment can point at the shared Clerk (filtered to one org) or a customer's own Clerk.
  • The agent catalog and other shared seed data are reproducible via seed functions so a fresh deployment can be stood up identically.
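One way to keep these properties honest is to funnel every deployment-specific value through a single validated config loader. The sketch below is an assumption about how that could look: the variable names (CONVEX_URL, CLERK_JWT_ISSUER_DOMAIN, R2_BUCKET, CLERK_RESTRICT_ORG_ID) and the DeploymentConfig shape are hypothetical, not the project's actual settings.

```typescript
// Hypothetical config shape and variable names, for illustration only.
interface DeploymentConfig {
  convexUrl: string;
  clerkIssuerUrl: string;
  r2Bucket: string;
  // Set when a shared Clerk instance is filtered to a single organisation.
  restrictToOrgId?: string;
}

type Env = Record<string, string | undefined>;

// Read all deployment-specific values from the environment and fail fast,
// reporting every missing key at once instead of one per restart.
function loadDeploymentConfig(env: Env): DeploymentConfig {
  const missing: string[] = [];
  const required = (key: string): string => {
    const value = env[key];
    if (value === undefined || value === "") {
      missing.push(key);
      return "";
    }
    return value;
  };

  const config: DeploymentConfig = {
    convexUrl: required("CONVEX_URL"),
    clerkIssuerUrl: required("CLERK_JWT_ISSUER_DOMAIN"),
    r2Bucket: required("R2_BUCKET"),
    restrictToOrgId: env["CLERK_RESTRICT_ORG_ID"] || undefined,
  };

  if (missing.length > 0) {
    throw new Error("Missing required env vars: " + missing.join(", "));
  }
  return config;
}
```

In a real deployment the argument would be the runtime's environment object (for example process.env). Passing it in explicitly keeps the loader testable and makes it obvious that nothing deployment-specific is read from anywhere else.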

5. Provisioning a new tenant

5.1 Pooled (T1) — the default, instant

A new account requires no provisioning. The Clerk organization.created webhook (http.ts/clerk-webhook) writes an account_records row. The tenant is live immediately; all data is created lazily through the scoped wrappers. This is the entire point of pooling.
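A minimal sketch of the webhook's core step, assuming the handler should be idempotent because webhook deliveries can be repeated. The AccountRecord fields and onOrganizationCreated are hypothetical; an in-memory record stands in for the account_records table, and signature verification of the incoming request is assumed to have happened before this point.

```typescript
// Subset of Clerk's organization.created payload used by this sketch.
interface OrgCreatedEvent {
  type: "organization.created";
  data: { id: string; name: string };
}

// Hypothetical row shape; the real account_records schema may differ.
interface AccountRecord { accountId: string; name: string; createdAtMs: number }

// Insert the account row if absent. A repeated delivery of the same event
// is a no-op rather than a duplicate row.
function onOrganizationCreated(
  accountRecords: Record<string, AccountRecord>,
  evt: OrgCreatedEvent,
  nowMs: number,
): "created" | "already_exists" {
  if (accountRecords[evt.data.id] !== undefined) return "already_exists";
  accountRecords[evt.data.id] = {
    accountId: evt.data.id,
    name: evt.data.name,
    createdAtMs: nowMs,
  };
  return "created";
}
```

Nothing else is created at this point: every other row for the tenant appears later, on first use, through the scoped wrappers.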

5.2 Dedicated (T2) / customer-owned (T3)

Standing up an isolated deployment is a deliberate, scripted process (future tooling — not yet built):

  1. Create the deployment. A new Convex deployment in Thinklio's account (T2) or the customer's account (T3).
  2. Configure environment. Set the Convex URL, R2 storage credentials/bucket, LLM provider keys, and Clerk issuer (shared Clerk filtered to the one org, or the customer's own Clerk).
  3. Deploy code. Push the same codebase; run seed to populate the agent catalog and platform defaults.
  4. Register in the control plane. A tenant-routing layer (future) maps the account → its deployment so the web/app clients connect to the right Convex URL. Until this exists, T2/T3 is a manual cut-over.
  5. Verify isolation, auth, and a smoke-test agent turn before routing live traffic.

The control plane (a small registry of accountId → deploymentURL plus the provisioning scripts) is the main net-new component T2/T3 requires. It is deliberately deferred until the first enterprise deal justifies it.
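Since the control plane is not built yet, the following is only a sketch of what the registry lookup might look like. DeploymentEntry, resolveDeployment, and the example URLs are all hypothetical; the one behaviour taken from this document is that an account with no registry entry lives in the pool.

```typescript
// Hypothetical registry shape; the control plane does not exist yet.
type Tier = "T1" | "T2" | "T3";
interface DeploymentEntry { tier: Tier; deploymentUrl: string }

// Resolve which deployment an account's clients should connect to.
function resolveDeployment(
  registry: Record<string, DeploymentEntry>,
  pooledUrl: string,
  accountId: string,
): DeploymentEntry {
  // Own-property check so ids such as "constructor" never match by accident.
  const entry = Object.prototype.hasOwnProperty.call(registry, accountId)
    ? registry[accountId]
    : undefined;
  // No entry means the account has not graduated: it lives in the pool.
  return entry !== undefined ? entry : { tier: "T1", deploymentUrl: pooledUrl };
}

const exampleRegistry: Record<string, DeploymentEntry> = {
  org_big: { tier: "T2", deploymentUrl: "https://dedicated.example.invalid" },
};
```

Defaulting to the pool keeps T1 provisioning free of any control-plane write, so only graduated tenants ever need a registry entry.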


6. Migration — moving a tenant out of the pool

This is the path for a tenant that starts pooled (T1) and later needs a dedicated or customer-owned deployment (T2/T3), and it reuses the GDPR export/delete machinery. The ability to lift a tenant cleanly out of the pool is what de-risks the entire single-instance bet — we are never trapped.

6.1 The two reusable primitives

Both are required for GDPR independently of migration, so they are not migration-only cost:

  • Per-tenant export — serialise the full subgraph of one accountId: every tenant table, plus the tenant's Agent-component threads/messages, RAG embeddings, and R2 file objects.
  • Per-tenant hard-delete (cascade) — remove every trace of one accountId from a deployment.
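The hard-delete cascade is likely to run in bounded batches rather than one large transaction. The sketch below models that shape with in-memory arrays standing in for tables; hardDeleteBatch, the table names, and the batch size are illustrative assumptions, and component data (Agent, RAG) and R2 objects are out of scope here.

```typescript
// In-memory stand-ins for tenant tables.
interface TenantRow { _id: string; accountId: string }
type TenantTables = Record<string, TenantRow[]>;

// Delete up to `batchSize` rows owned by `accountId`, visiting tables in the
// given order (children before parents). Returns how many rows were removed;
// the caller repeats until the result is 0.
function hardDeleteBatch(
  tables: TenantTables,
  deleteOrder: string[],
  accountId: string,
  batchSize: number,
): number {
  let remaining = batchSize;
  for (const tableName of deleteOrder) {
    if (remaining === 0) break;
    const rows = tables[tableName] || [];
    const kept: TenantRow[] = [];
    for (const row of rows) {
      if (row.accountId === accountId && remaining > 0) {
        remaining -= 1; // row is dropped
      } else {
        kept.push(row);
      }
    }
    tables[tableName] = kept;
  }
  return batchSize - remaining;
}
```

A driver would call this repeatedly (for example from a scheduled job) until it returns 0, which doubles as the completion signal for the cascade.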

6.2 Procedure (pooled → dedicated)

  1. Export the tenant subgraph from the shared deployment.
  2. Stand up the target deployment (§5.2).
  3. Import with ID remapping (see §6.3).
  4. Cut over Clerk/routing so the account's clients point at the new deployment (via the control plane).
  5. Verify parity — record counts, a sample of chats, agent turns, file fetches.
  6. Hard-delete the tenant from the shared deployment once parity is confirmed and a retention window has passed.
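For the record-count part of step 5, a simple comparison of per-table counts between the export and the imported target is a reasonable first gate. findCountMismatches and the TableCounts shape are hypothetical; matching counts are necessary but not sufficient, which is why the step also samples chats, agent turns, and file fetches.

```typescript
// Per-table record counts, e.g. { messages: 1204, tasks: 87 }.
type TableCounts = Record<string, number>;

// Compare counts from the source export and the target import. Returns one
// human-readable line per table whose counts differ (empty array = parity).
function findCountMismatches(source: TableCounts, target: TableCounts): string[] {
  // Union of table names, so a table missing on either side is reported.
  const tableNames: Record<string, true> = {};
  for (const t of Object.keys(source)) tableNames[t] = true;
  for (const t of Object.keys(target)) tableNames[t] = true;

  const mismatches: string[] = [];
  for (const t of Object.keys(tableNames).sort()) {
    const s = source[t] || 0;
    const d = target[t] || 0;
    if (s !== d) mismatches.push(`${t}: source=${s}, target=${d}`);
  }
  return mismatches;
}
```

An empty result would be the precondition for starting the retention window in step 6, not for the delete itself.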

6.3 What makes migration non-trivial

These are the real engineering costs and must be designed into the export/import tooling:

  • Convex IDs are deployment-specific. Every v.id(...) foreign key changes on import. The importer must insert in dependency order and maintain an old-ID → new-ID map, rewriting all references. This is the bulk of the work.
  • Agent-component state. @convex-dev/agent stores threads and messages in its own component tables; these must be exported and re-imported (or chats re-seeded) or the tenant loses conversational history.
  • RAG embeddings. Vector entries in the RAG component either move with the export or are re-embedded on import (re-embedding is simpler but costs tokens and changes vector IDs).
  • R2 file objects. If the tenant uses the platform_shared bucket, objects must be copied to the target's bucket and files.r2Key rewritten. If the tenant already uses an account_supplied bucket, the objects may stay in place and only references move.
  • Referential integrity across components. Tenant tables, Agent component, and RAG component must be exported at a consistent point; agent execution for the tenant should be quiesced during the final cut-over.
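The ID-remapping step from the first bullet can be sketched as follows. The ExportedRow format, importWithRemap, and the insert callback (a stand-in for ctx.db.insert on the target deployment) are assumptions for illustration, not an existing tool.

```typescript
// Hypothetical export format: each row names the fields that hold
// references (v.id(...) foreign keys) to other exported rows.
interface ExportedRow {
  table: string;
  oldId: string;
  fields: Record<string, unknown>;
  refFields: string[]; // keys in `fields` whose value is an old id
}

// Import rows table by table (parents before children), rewriting each
// reference through the old-id to new-id map built so far.
// `insert` stands in for ctx.db.insert and returns the newly assigned id.
function importWithRemap(
  rows: ExportedRow[],
  tableOrder: string[],
  insert: (table: string, fields: Record<string, unknown>) => string,
): Record<string, string> {
  const idMap: Record<string, string> = Object.create(null);
  for (const table of tableOrder) {
    for (const row of rows) {
      if (row.table !== table) continue;
      const fields: Record<string, unknown> = { ...row.fields };
      for (const ref of row.refFields) {
        const oldRef = fields[ref];
        if (oldRef === undefined || oldRef === null) continue; // optional reference
        const newRef = typeof oldRef === "string" ? idMap[oldRef] : undefined;
        if (newRef === undefined) {
          // The referenced row has not been imported yet: the table order
          // is wrong or the export is incomplete.
          throw new Error(`Unresolved reference ${row.table}.${ref} -> ${String(oldRef)}`);
        }
        fields[ref] = newRef;
      }
      idMap[row.oldId] = insert(table, fields);
    }
  }
  return idMap;
}
```

This sketch handles only single-valued references across tables. Self-references within a table, reference cycles, and arrays of ids would need a second patch-up pass after insertion, and the returned map is what the R2 key and component-state rewrites would also consume.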

7. Action items to keep options open

These are cheap now and expensive later. Do them while pooled:

  1. Enforce the no-unscoped-ctx.db invariant (§2.2) via lint/convention, not review alone.
  2. Keep code deployment-agnostic (§4.1) — all deployment/Clerk specifics in env.
  3. Build per-tenant export + hard-delete cascade (§6.1) — required for GDPR, doubles as the migration mechanism.
  4. Defer the control plane (§5.2 step 4) until the first T2/T3 tenant — but design export/import (§6.3) now so it is ready when that deal lands.

Cross-references