Tenancy & Deployment Topology
Overview
Thinklio is multi-tenant. The tenancy boundary is the account (a Clerk organisation), with teams and users nested beneath it. Budget enforcement and governance policies act at all three layers, with account policies overriding lower layers.
This document is authoritative for how tenants are isolated, where they live, and how they move. It covers three things:
- What we do now — a single Convex deployment serving all tenants, with isolation enforced in application code.
- Why — the reasoning and the scale envelope that makes this the correct default.
- Future options — how a tenant graduates to a dedicated Convex deployment or a customer-owned Convex account, how new tenants are provisioned in each model, and the procedure for migrating a tenant out of the shared deployment.
For the data model of accounts, teams, and users see Data Model. For the isolation, credential, and policy mechanics see Security & Governance. For the overall service topology see System Architecture. The decision recorded here is ADR-022 in the Decision Log.
1. Tenancy model
1.1 The hierarchy
account (Clerk organisation) ← tenancy boundary; billing + governance root
├── team ← collective scoping for knowledge, agents, data
│ └── team_member ← user ∈ team, with role
├── account_user ← user ∈ account, with role (admin | member)
└── (all tenant data) ← every row carries accountId
- An account maps one-to-one to a Clerk organisation. It is the billing entity, the governance root, and the unit of isolation.
- A team is a group within an account that owns collective knowledge, shared agents, and team-scoped data.
- A user has a single global identity (Clerk user) and may belong to multiple accounts.
1.2 Where budget and governance act
| Layer | Budget | Governance |
|---|---|---|
| Account | Credit balance, monthly spend caps | account_policies — highest authority; override all lower layers |
| Team | Per-team budget allocation | team_policies — apply within the team |
| User | Per-user / per-assignment budget | User-scoped preferences and restrictions |
Account policies are the ceiling: a team or user can be more restricted than the account allows, never less. See Security & Governance § for the policy evaluation order.
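One way to read the ceiling rule is that the effective limit is the most restrictive value set at any layer. The sketch below assumes each layer expresses its budget as an optional numeric cap; the `LayerCaps` shape and the `effectiveCap` name are hypothetical and are not taken from the codebase, and the real evaluation order is the one documented in Security & Governance.

```typescript
// Hypothetical shape: each layer may set a spend cap.
// undefined means "no cap configured at this layer".
interface LayerCaps {
  account?: number;
  team?: number;
  user?: number;
}

// The effective cap is the smallest cap defined at any layer, so a team or
// user value can lower the account ceiling but can never raise it.
function effectiveCap(caps: LayerCaps): number | undefined {
  const defined = [caps.account, caps.team, caps.user].filter(
    (cap): cap is number => cap !== undefined,
  );
  return defined.length === 0 ? undefined : Math.min(...defined);
}
```

With an account cap of 100, a team cap of 250, and a user cap of 40, the sketch yields 40: the team's looser value is ignored and the user's tighter value wins.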
2. Current model: single pooled Convex deployment
All tenants share one Convex deployment. Tenancy is enforced entirely in application code; Convex provides no row-level security at the database layer, so isolation is a property of our middleware discipline, not of the platform.
2.1 The isolation boundary
The tenant boundary rides on the verified Clerk JWT, not on any client-supplied argument. The account-scoped function wrappers in convex/lib/middleware.ts derive accountId from the org claim in the authenticated identity:
accountQuery / accountMutation
→ ctx.auth.getUserIdentity() (verified JWT)
→ orgId = identity["o.id"] (Clerk org claim)
→ ctx.accountId = orgId (injected; not an argument)
Because accountId comes from the token, a caller cannot request another tenant's data by passing a different ID. This is isolation-by-construction at the function boundary.
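The flow above can be sketched as a higher-order wrapper. This is a simplified illustration and not the contents of convex/lib/middleware.ts: `AuthCtx`, `AccountCtx`, and `withAccount` are hypothetical stand-ins for the Convex context types and for the real accountQuery / accountMutation wrappers, and only the `"o.id"` claim lookup is taken from the text.

```typescript
// Stand-ins for the Convex context types (assumptions for this sketch).
type Identity = Record<string, unknown>;
interface AuthCtx {
  auth: { getUserIdentity(): Promise<Identity | null> };
}
type AccountCtx = AuthCtx & { accountId: string };

// Wraps a handler so that accountId is taken from the verified identity
// and injected into ctx. Nothing in args can influence it.
function withAccount<Args, Result>(
  handler: (ctx: AccountCtx, args: Args) => Promise<Result>,
): (ctx: AuthCtx, args: Args) => Promise<Result> {
  return async (ctx, args) => {
    const identity = await ctx.auth.getUserIdentity();
    if (identity === null) {
      throw new Error("Not authenticated");
    }
    const orgId = identity["o.id"]; // Clerk org claim, as named above
    if (typeof orgId !== "string" || orgId.length === 0) {
      throw new Error("No active organisation in token");
    }
    return handler({ ...ctx, accountId: orgId }, args);
  };
}
```

Even if a caller smuggles an `accountId` field into `args`, the handler only ever sees the value derived from the token on `ctx.accountId`.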
2.2 The invariants that keep it safe
The single deployment is safe only while these hold. They are non-negotiable:
1. No raw ctx.db in business logic. Every data-touching query/mutation goes through accountQuery / accountMutation (or the authed* variants for account-agnostic identity work). Raw query / mutation on tenant tables is forbidden.
2. Every index on a tenant table leads with the tenant boundary: accountId (or a parent owned by an account, e.g. channelId). No unscoped scans, no JS-side filtering across tenants.
3. Re-check ownership after db.get. A direct fetch by Id bypasses the index scope, so handlers re-assert doc.accountId === ctx.accountId before returning (see getTask in tasksCrud.ts).
A single violation of (1) or (2) is a potential cross-tenant data leak whose blast radius is all tenants. These invariants should be enforced by lint/convention, not memory.
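The ownership re-check in the third invariant is small enough to centralise in a helper. The sketch below assumes a minimal document shape; `TenantDoc` and `ownedOrNull` are hypothetical names and are not claimed to exist in tasksCrud.ts.

```typescript
// Hypothetical minimal document shape: every tenant row carries accountId.
interface TenantDoc {
  _id: string;
  accountId: string;
}

// Returns the document only if it belongs to the caller's account.
// In this sketch a missing document and a foreign document both yield null,
// so a caller cannot use the result to probe for another tenant's IDs.
function ownedOrNull<T extends TenantDoc>(
  doc: T | null,
  accountId: string,
): T | null {
  if (doc === null || doc.accountId !== accountId) {
    return null;
  }
  return doc;
}
```

A handler would pass the result of its direct fetch and `ctx.accountId` through this check before returning anything to the client.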
2.3 What is shared vs isolated
- Isolated per account: all tenant data, including channels, messages, tasks, contacts, items, notes, knowledge, files, agents, policies, usage. Every row carries accountId and is reachable only through the scoped wrappers.
- Shared across all tenants (by design): the agent catalog (templates), platform LLM configuration, the Convex component infrastructure (Agent, RAG, Workflow, Rate Limiter), and the deployed code itself. These are platform-level, not tenant data.
2.4 Noisy-neighbour control
Because function compute and platform rate limits are shared in a pooled deployment, one tenant could in principle starve others. Two existing mechanisms cap this without needing physical isolation:
- Per-account budgets (the governance layer) limit how much LLM/tool work a tenant can drive.
- The Rate Limiter component throttles per-account request volume against shared function capacity.
This is a genuine advantage of having built governance early: the budget layer doubles as the multi-tenant fairness control.
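To make the per-account throttling idea concrete, here is an illustrative in-memory token bucket keyed by accountId. It is not the Rate Limiter component's API; the class name, the capacity, and the refill rate are assumptions chosen for the example.

```typescript
// Illustrative token bucket keyed by accountId (not the component's API).
class AccountRateLimiter {
  private readonly buckets = new Map<string, { tokens: number; updatedAt: number }>();
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
  }

  // Returns true and consumes one token if the account is under its limit.
  // Each account has its own bucket, so one tenant cannot drain another's.
  tryConsume(accountId: string, nowMs: number): boolean {
    const bucket = this.buckets.get(accountId) ?? {
      tokens: this.capacity,
      updatedAt: nowMs,
    };
    // Refill in proportion to elapsed time, never above capacity.
    const elapsedSeconds = (nowMs - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(
      this.capacity,
      bucket.tokens + elapsedSeconds * this.refillPerSecond,
    );
    bucket.updatedAt = nowMs;
    const allowed = bucket.tokens >= 1;
    if (allowed) {
      bucket.tokens -= 1;
    }
    this.buckets.set(accountId, bucket);
    return allowed;
  }
}
```

The point of the sketch is the keying: a burst from one account exhausts only that account's bucket, while other accounts keep their full allowance.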
2.5 Known residual risks
| Risk | Mitigation |
|---|---|
| Middleware bug leaks across tenants | Invariants §2.2 enforced by lint; code review; export/delete tooling (§6.1) limits exposure window |
| Noisy neighbour | Per-account budgets + Rate Limiter (§2.4) |
| Per-deployment scaling limits if a few "whale" tenants dominate document/throughput volume | Graduation to a dedicated deployment (§4); not a concern at current scale (§3) |
| Per-tenant hard-delete (GDPR) is a careful cascade rather than a DROP | Build the cascade once, reuse as the migration mechanism (§6.1, §6.2) |
3. Why single instance, and the scale envelope
The expected scale is the decisive factor. Thinklio serves B2B accounts (organisations of tens to low-hundreds of users). We expect a handful of accounts initially and, even with strong success, on the order of a few thousand at the ceiling.
A few thousand small orgs is comfortably within one Convex deployment. The constraint that matters is not the number of accounts but total document volume and hot-path read amplification — both controlled by the index discipline in §2.2.
Against that, the cost of running thousands of separate deployments — provisioning, migrations across N deployments, version skew, lost usage pooling, and the control-plane needed to manage them — is prohibitive for a small team and buys isolation the long tail does not need.
Decision: stay single-instance and pooled by default. Treat dedicated isolation as a premium capability offered per tenant on demand, not the global architecture. Recorded as ADR-022.
4. Future options: deployment topologies
We do not choose single-vs-separate globally. We adopt the cell pattern: every tenant starts pooled and graduates to stronger isolation only when there is a reason. Three topologies, in increasing isolation:
| Tier | Topology | Convex account | When | Isolation |
|---|---|---|---|---|
| T1 — Pooled (default) | Shared deployment, app-layer isolation | Thinklio's | All SMB / self-serve | Logical |
| T2 — Dedicated | One deployment per tenant | Thinklio's | Enterprise contract, compliance, data residency, scale | Physical (data + compute) |
| T3 — Customer-owned | One deployment in the customer's own Convex account | Customer's | Full data sovereignty / procurement requirement | Physical + administrative |
Triggers for graduating a tenant from T1:
- Contractual / compliance — the customer requires their data physically isolated or in their own account.
- Data residency — a region requirement a shared deployment cannot satisfy.
- Scale / noisy neighbour — a whale tenant whose volume justifies its own deployment.
4.1 The enabler: deployment-agnostic code
The cell pattern only works if the same codebase runs unchanged in any deployment. "Which deployment" and "which Clerk" must be configuration (environment), never assumptions baked into code. We are essentially here today and must not regress:
- No hardcoded deployment URLs or single-tenant assumptions in functions.
- Auth config (auth.config.ts) reads the Clerk issuer/keys from env, so a deployment can point at the shared Clerk (filtered to one org) or a customer's own Clerk.
- The agent catalog and other shared seed data are reproducible via seed functions so a fresh deployment can be stood up identically.
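A minimal sketch of the "configuration, never assumptions" rule: deployment specifics are read from an environment map and validated up front. The variable names `CONVEX_URL` and `CLERK_JWT_ISSUER_DOMAIN` and the `loadDeploymentConfig` function are assumptions for illustration; the actual keys read by auth.config.ts may differ.

```typescript
// Hypothetical config shape; real deployments carry more settings than this.
interface DeploymentConfig {
  convexUrl: string;
  clerkIssuer: string;
}

// Reads deployment specifics from an environment map and fails fast when one
// is missing, so nothing deployment-specific is hardcoded in function code.
function loadDeploymentConfig(
  env: Record<string, string | undefined>,
): DeploymentConfig {
  const required = (key: string): string => {
    const value = env[key];
    if (value === undefined || value === "") {
      throw new Error(`Missing required environment variable: ${key}`);
    }
    return value;
  };
  return {
    convexUrl: required("CONVEX_URL"), // assumed variable name
    clerkIssuer: required("CLERK_JWT_ISSUER_DOMAIN"), // assumed variable name
  };
}
```

Taking the environment as a parameter keeps the function testable and makes it obvious that the same code can be pointed at a pooled, dedicated, or customer-owned deployment.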
5. Provisioning a new tenant
5.1 Pooled (T1): the default, instant
A new account requires no provisioning. The Clerk organization.created webhook (http.ts → /clerk-webhook) writes an account_records row. The tenant is live immediately; all data is created lazily through the scoped wrappers. This is the entire point of pooling.
5.2 Dedicated (T2) / customer-owned (T3)
Standing up an isolated deployment is a deliberate, scripted process (future tooling — not yet built):
- Create the deployment. A new Convex deployment in Thinklio's account (T2) or the customer's account (T3).
- Configure environment. Set the Convex URL, R2 storage credentials/bucket, LLM provider keys, and Clerk issuer (shared Clerk filtered to the one org, or the customer's own Clerk).
- Deploy code. Push the same codebase; run seed to populate the agent catalog and platform defaults.
- Register in the control plane. A tenant-routing layer (future) maps the account → its deployment so the web/app clients connect to the right Convex URL. Until this exists, T2/T3 is a manual cut-over.
- Verify isolation, auth, and a smoke-test agent turn before routing live traffic.
The control plane (a small registry of accountId → deploymentURL plus the provisioning scripts) is the main net-new component T2/T3 requires. It is deliberately deferred until the first enterprise deal justifies it.
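Since the control plane is not yet built, the following is only a sketch of what the accountId → deploymentURL registry could look like. `DeploymentRegistry` and its methods are hypothetical, and the fallback to the pooled URL is an assumption that mirrors "every tenant starts pooled".

```typescript
// Hypothetical registry: accounts without an entry stay on the pooled (T1)
// deployment; only graduated (T2/T3) accounts are listed explicitly.
class DeploymentRegistry {
  private readonly dedicated = new Map<string, string>();
  private readonly pooledUrl: string;

  constructor(pooledUrl: string) {
    this.pooledUrl = pooledUrl;
  }

  // Records that an account now lives in its own deployment.
  register(accountId: string, deploymentUrl: string): void {
    this.dedicated.set(accountId, deploymentUrl);
  }

  // Resolves the Convex URL a client should connect to for this account.
  resolve(accountId: string): string {
    return this.dedicated.get(accountId) ?? this.pooledUrl;
  }
}
```

Defaulting to the pooled URL means new T1 accounts need no registry entry, which preserves the zero-provisioning property described in §5.1.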
6. Migration: moving a tenant out of the pool
This is the path for a tenant that starts pooled (T1) and later needs a dedicated or customer-owned deployment (T2/T3), and it reuses the GDPR export/delete machinery. The ability to lift a tenant cleanly out of the pool is what de-risks the entire single-instance bet — we are never trapped.
6.1 The two reusable primitives
Both are required for GDPR independently of migration, so they are not migration-only cost:
- Per-tenant export: serialise the full subgraph of one accountId, meaning every tenant table plus the tenant's Agent-component threads/messages, RAG embeddings, and R2 file objects.
- Per-tenant hard-delete (cascade): remove every trace of one accountId from a deployment.
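The shape of the two primitives can be shown over plain in-memory tables. This is a deliberately reduced sketch: `Row`, `Tables`, `exportTenant`, and `hardDeleteTenant` are hypothetical, the filtering stands in for walks over accountId-leading indexes, and component state (Agent, RAG) and R2 objects are left out entirely.

```typescript
// Hypothetical in-memory model: table name -> rows, each row tagged with accountId.
interface Row {
  _id: string;
  accountId: string;
}
type Tables = Record<string, Row[]>;

// Collects every row owned by accountId, table by table, without mutating the source.
function exportTenant(tables: Tables, accountId: string): Tables {
  const out: Tables = {};
  for (const [name, rows] of Object.entries(tables)) {
    out[name] = rows.filter((row) => row.accountId === accountId);
  }
  return out;
}

// Removes every row owned by accountId in place and returns how many were deleted.
function hardDeleteTenant(tables: Tables, accountId: string): number {
  let deleted = 0;
  for (const name of Object.keys(tables)) {
    const rows = tables[name] ?? [];
    const kept = rows.filter((row) => row.accountId !== accountId);
    deleted += rows.length - kept.length;
    tables[name] = kept;
  }
  return deleted;
}
```

Because both functions select on the same predicate, the number of rows exported and the number deleted should match, which gives a cheap consistency check between the two primitives.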
6.2 Procedure (pooled → dedicated)
- Export the tenant subgraph from the shared deployment.
- Stand up the target deployment (§5.2).
- Import with ID remapping (see §6.3).
- Cut over Clerk/routing so the account's clients point at the new deployment (via the control plane).
- Verify parity — record counts, a sample of chats, agent turns, file fetches.
- Hard-delete the tenant from the shared deployment once parity is confirmed and a retention window has passed.
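The record-count part of the parity check can be sketched as a comparison of per-table counts taken from the source and target deployments. `Counts` and `parityMismatches` are hypothetical names, and how the counts are gathered is outside the sketch; sampled chats, agent turns, and file fetches would still need their own checks.

```typescript
// Hypothetical input: table name -> number of rows for the tenant.
type Counts = Record<string, number>;

// Returns a human-readable line for every table whose counts differ.
// A table missing on one side is treated as having zero rows there.
function parityMismatches(source: Counts, target: Counts): string[] {
  const tables = Array.from(
    new Set([...Object.keys(source), ...Object.keys(target)]),
  ).sort();
  const mismatches: string[] = [];
  for (const table of tables) {
    const sourceCount = source[table] ?? 0;
    const targetCount = target[table] ?? 0;
    if (sourceCount !== targetCount) {
      mismatches.push(`${table}: source=${sourceCount} target=${targetCount}`);
    }
  }
  return mismatches;
}
```

An empty result is a precondition for the hard-delete step, not proof of parity on its own.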
6.3 What makes migration non-trivial
These are the real engineering costs and must be designed into the export/import tooling:
- Convex IDs are deployment-specific. Every v.id(...) foreign key changes on import. The importer must insert in dependency order and maintain an old-ID → new-ID map, rewriting all references. This is the bulk of the work.
- Agent-component state. @convex-dev/agent stores threads and messages in its own component tables; these must be exported and re-imported (or chats re-seeded) or the tenant loses conversational history.
- RAG embeddings. Vector entries in the RAG component either move with the export or are re-embedded on import (re-embedding is simpler but costs tokens and changes vector IDs).
- R2 file objects. If the tenant uses the platform_shared bucket, objects must be copied to the target's bucket and files.r2Key rewritten. If the tenant already uses an account_supplied bucket, the objects may stay in place and only references move.
- Referential integrity across components. Tenant tables, Agent component, and RAG component must be exported at a consistent point; agent execution for the tenant should be quiesced during the final cut-over.
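The ID-remapping step from the first bullet can be sketched as follows. The export format here is an assumption: `ExportedRow` and its `refs` list of foreign-key field names are hypothetical, and `insert` stands in for whatever call writes a row to the target deployment and returns its new ID.

```typescript
// Hypothetical exported row. `refs` names the fields in `fields` that hold
// another row's old ID (the v.id(...) foreign keys).
interface ExportedRow {
  table: string;
  id: string;
  fields: Record<string, unknown>;
  refs: string[];
}

// Inserts rows in dependency order, rewriting each reference through the
// old-ID -> new-ID map, and returns that map.
function importWithRemap(
  rows: ExportedRow[],
  insert: (table: string, fields: Record<string, unknown>) => string,
): Map<string, string> {
  const idMap = new Map<string, string>();
  let pending = rows;
  while (pending.length > 0) {
    const deferred: ExportedRow[] = [];
    for (const row of pending) {
      // A row is ready once every non-null reference has already been imported.
      const ready = row.refs.every((field) => {
        const oldId = row.fields[field];
        return oldId === null || oldId === undefined || idMap.has(String(oldId));
      });
      if (!ready) {
        deferred.push(row);
        continue;
      }
      const fields = { ...row.fields };
      for (const field of row.refs) {
        const oldId = fields[field];
        if (oldId !== null && oldId !== undefined) {
          fields[field] = idMap.get(String(oldId));
        }
      }
      idMap.set(row.id, insert(row.table, fields));
    }
    // No progress in a full pass means a cycle or a reference to a row that
    // was not exported; cycles would need an insert-then-patch second phase.
    if (deferred.length === pending.length) {
      throw new Error("Unresolvable references in export");
    }
    pending = deferred;
  }
  return idMap;
}
```

The repeated-pass loop avoids needing a precomputed table ordering: rows are simply deferred until their parents exist, at the cost of extra passes on deeply nested data.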
7. Action items to keep options open
These are cheap now and expensive later. Do them while pooled:
- Enforce the no-unscoped-ctx.db invariant (§2.2) via lint/convention, not review alone.
- Keep code deployment-agnostic (§4.1): all deployment/Clerk specifics in env.
- Build per-tenant export + hard-delete cascade (§6.1) — required for GDPR, doubles as the migration mechanism.
- Defer the control plane (§5.2 step 4) until the first T2/T3 tenant — but design export/import (§6.3) now so it is ready when that deal lands.
Cross-references
- 02 System Architecture — service topology and deployment context.
- 04 Data Model — account, team, user, and tenant-table definitions.
- 07 Security & Governance — isolation middleware, policy evaluation, credential scoping.
- Decision Log — ADR-022 records the single-instance decision.