Events, Channels & Messaging¶

Precis¶

Events, the durable harness, and channels are the runtime spine of Thinklio. Every inbound message, every agent turn, every tool call, and every outbound reply flows through this layer. The current design is Convex-native: events are documents in an event table that agents and services subscribe to via reactive queries, durable execution is the Convex Workflow component, and channel ingress and egress run as Convex HTTP actions bridging external transports (Telegram, Postmark, planned WhatsApp and SMS) to the reactive core.

This document replaces the Redis Streams event bus and the Go HarnessExecutor with their Convex-era equivalents. The event envelope, step semantics, channel identity model, multi-channel delivery algorithm, and messaging UX conventions carry forward; the implementation changes. A dedicated Go service for channel ingress or the API surface remains a Tier 3 promotion candidate per the execution-tier model in 02 System Architecture, taken only when analytics show Convex HTTP action latency, cold starts, or inbound rate limits constraining a specific path. It is not the default.

Entity definitions (including event, interaction, step, channel, user_channel) live in 04 Data Model. The Convex platform capabilities backing this design are in 11 Convex Reference. The system architecture that places messaging at the centre is in 02 System Architecture. The retired Redis/Go design is preserved under archive/legacy-event-system-harness-design.md.

This event/channel layer is Thinklio's implementation of a cross-product contract — the Interaction Protocol (dev/docs/interaction-protocol.md, outside this repo), shared with Twikka and Couple Tools. The protocol is authoritative for the product-neutral envelope/kind taxonomy, the four input lanes (structured / language / media / ambient), and the affordance round-trip; the rendered-message and affordance specifics for Thinklio are in 16 Chats, Channels & Identity §8.

Table of contents¶

Purpose and scope
Runtime layer overview
The event model
Event distribution via reactive queries
The durable harness on the Workflow component
Step lifecycle and resumability
Channel identity model
Channel connection lifecycle
Channel ingress via HTTP actions
Multi-channel response delivery
Email channel: bidirectional Postmark
Admin control of channels
Messaging UX conventions
Tier 3 promotion to Go services
Edge cases, security, and failure modes
Monitoring
Implementation phases
Revision history

1. Purpose and scope¶

Scope:

The event model: shape of events, envelope, types, scope rules.
Event distribution: how subscribers (agents, services, UI) see events as they land.
The durable harness: how Convex's Workflow component runs agent turns so they survive crashes, Convex restarts, and scale events.
Channel identity: how external users (phone numbers, email addresses, Telegram IDs) map to platform users.
Channel ingress: how inbound traffic reaches Convex from Postmark, Telegram, and future transports.
Multi-channel response delivery: which channel an agent reply goes out through and why.
Messaging UX: conventions the client applications follow for message display, typing indicators, receipts.

Out of scope:

The agent execution contract, prompt construction, tool invocation semantics: see 03 Agent Architecture & Extensibility.
Storage, persistence, and media ingestion: see 05 Persistence, Storage & Ingestion.
Security policies, governance enforcement, and credential management: see 07 Security & Governance.
The three API surfaces (Channel, Platform, Integration): see 09 External API & Tool Integration.

The retired Redis Streams + Go HarnessExecutor design is preserved under archive/legacy-event-system-harness-design.md.

2. Runtime layer overview¶

Three concepts, one system:

Concept	What it is in the current design	Backed by
Event	A durable record of something that happened: message received, step completed, delegation opened, tool called, policy evaluated, credit consumed. Documents in the `event` table.	Convex database
Harness	The mechanism that runs an agent turn step by step, persisting each step, resuming after failure, enforcing governance at every boundary.	Convex Workflow component
Channel	A conceptual surface through which users and agents exchange messages: DM, group chat, email thread, Telegram chat. External transports plug in via HTTP actions.	Convex tables (`channel`, `user_channel`, `channel_identity`) + HTTP actions

These are not three independent services. They are three views of a single reactive data layer. Writing an event fires subscribers; subscribers include the harness (which may advance a workflow step), the channel delivery path (which may fan out a response), and the client UI (which updates live). There is no queue to drain, no consumer group to coordinate, no backpressure to tune explicitly.

The event bus that was Redis Streams in the legacy design collapses into this reactive model. The HarnessExecutor that was a Go goroutine pool is now a Convex Workflow. The channel bridge services that were Go processes are now Convex HTTP actions.

3. The event model¶

Every significant platform action produces an event. Events are immutable, tenant-scoped, append-only, and durable for the full retention window set by the account's plan.

3.1 Event envelope¶

event: defineTable({
  accountId: v.string(),
  kind: v.string(),                    // "message.received", "step.completed", etc.
  actor: v.object({
    kind: v.union(v.literal("user"), v.literal("agent"), v.literal("system")),
    id: v.string(),
  }),
  target: v.optional(v.object({
    kind: v.string(),                  // "channel", "interaction", "job", ...
    id: v.string(),
  })),
  payload: v.any(),                    // kind-specific structured body
  causationId: v.optional(v.id("event")),  // the event that caused this one
  correlationId: v.optional(v.string()),   // groups related events (turn, session)
  at: v.number(),                       // millis since epoch
})
  .index("by_account_at", ["accountId", "at"])
  .index("by_kind_at", ["accountId", "kind", "at"])
  .index("by_correlation", ["correlationId"])
  .index("by_target", ["target.kind", "target.id"]);

3.2 Event kinds¶

Events use a hierarchical namespace domain.action. The core domains:

Domain	Representative kinds
`message`	`message.received`, `message.sent`, `message.redacted`
`interaction`	`interaction.started`, `interaction.completed`, `interaction.failed`
`step`	`step.started`, `step.completed`, `step.retried`
`delegation`	`delegation.opened`, `delegation.replied`, `delegation.closed`
`tool`	`tool.invoked`, `tool.returned`, `tool.failed`
`policy`	`policy.evaluated`, `policy.blocked`, `policy.overridden`
`channel`	`channel.connected`, `channel.disconnected`, `channel.suspended`
`credit`	`credit.consumed`, `credit.granted`
`account`	`account.created`, `account.deactivated`

New kinds are added freely. Consumers ignore kinds they do not recognise.

3.3 Properties¶

Immutable. Events never update. Corrections are new events.
Tenant-scoped. Every event carries an accountId; there is no cross-account event.
Indexed for time and kind. The two hot query shapes are "recent events for this account" and "recent events of this kind".
Causation chains. The causationId field lets a consumer walk backwards through a chain: this tool call was caused by this step, which was caused by this message. This is how the audit UI reconstructs a turn.
Correlation IDs group related events across causation chains. A user turn has one correlation ID spanning the message, the resulting steps, any delegation, the final response.
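The causation walk described above can be sketched in a few lines. This is an illustrative, in-memory version only: the EventRow shape and the walkCausationChain helper are assumptions made for the example, not the audit UI's actual code, and a real implementation would read rows from the event table instead of a Map.

```typescript
// Minimal in-memory stand-in for rows of the `event` table (illustrative only).
interface EventRow {
  id: string;
  kind: string;
  causationId?: string;
  correlationId?: string;
}

// Walk backwards from one event to the root of its causation chain.
// The result is ordered from the starting event to the root cause.
function walkCausationChain(events: Map<string, EventRow>, startId: string): EventRow[] {
  const chain: EventRow[] = [];
  const seen = new Set<string>(); // guards against accidental cycles
  let current = events.get(startId);
  while (current !== undefined && !seen.has(current.id)) {
    chain.push(current);
    seen.add(current.id);
    current = current.causationId !== undefined ? events.get(current.causationId) : undefined;
  }
  return chain;
}

// One user turn: every event shares a correlation ID, and each points at its cause.
const sampleEvents = new Map<string, EventRow>([
  ["e1", { id: "e1", kind: "message.received", correlationId: "turn-1" }],
  ["e2", { id: "e2", kind: "step.started", causationId: "e1", correlationId: "turn-1" }],
  ["e3", { id: "e3", kind: "tool.invoked", causationId: "e2", correlationId: "turn-1" }],
]);

const chainKinds = walkCausationChain(sampleEvents, "e3").map((e) => e.kind);
```

Starting from the tool call, the walk yields the tool call, the step that caused it, and the originating message, in that order.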

3.4 Event vs audit event¶

Audit events live in a separate audit_event table with a different schema, different retention, and different indexing optimised for compliance queries. The two exist because audit has different needs: redaction rules, longer retention, export for external storage, stricter access. A single action can produce both an event and an audit event; the two are written by separate code paths. See 07 Security & Governance for the audit model.

4. Event distribution via reactive queries¶

There is no event bus as a separate system. Distribution is the Convex reactive query engine: every consumer subscribes to a query that reads events, and the query result updates automatically when new events land.

4.1 Query patterns¶

Recent events for a channel's UI:

export const recentChannelEvents = query({
  args: { chatId: v.id("chats") },
  handler: async (ctx, { chatId }) => {
    await requireChatMember(ctx, chatId);
    return ctx.db
      .query("event")
      .withIndex("by_target", q =>
        q.eq("target.kind", "channel").eq("target.id", chatId)
      )
      .order("desc")
      .take(100);
  },
});

Events of a kind for an account (admin dashboard):

export const auditEventsOfKind = query({
  args: { kind: v.string() },
  handler: async (ctx, { kind }) => {
    const { accountId } = await requireAccountAdmin(ctx);
    return ctx.db
      .query("event")
      .withIndex("by_kind_at", q =>
        q.eq("accountId", accountId).eq("kind", kind)
      )
      .order("desc")
      .take(500);
  },
});

Client subscribers to these queries get live updates. When any mutation writes a matching event, Convex re-runs the affected queries and pushes the new results to connected clients. There is no pub/sub to configure.

4.2 Server-side subscribers¶

Agents and platform services are server-side subscribers of the same event table. The harness (section 5) reads event state inside its workflow steps; when a step waits for an external reply, it checks for a matching event and proceeds when one lands. The Outcome Collector that feeds the predictive planner (see 08 Agents Catalogue & Platform Services section 3) is a scheduled Convex function that reads the event table periodically, aggregates interaction outcomes, and writes execution_outcome rows.

4.3 Fan-out¶

Fan-out happens implicitly via query subscriptions. One event write reaches:

Every UI client subscribed to a matching query (live).
Every workflow whose current step reads the event kind (on next poll or step resumption).
Every webhook subscription (via the webhook delivery path in section 10).
Every downstream Convex scheduled function reading the event table on its cadence.

There is no broker to manage, no consumer group to balance, no partition rebalance.

4.4 Retention¶

Events persist 90 days by default. The platform admin configures longer retention per account for compliance. A nightly sweep archives events older than the retention window to R2 (JSONL files, one per account per day) and hard-deletes the Convex rows. Archived events remain queryable via an ad-hoc restore action; they are not reactive.
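As a rough sketch of the two decisions the nightly sweep makes per event (is it past retention, and which archive file does it belong to), assuming a hypothetical key layout of events/{accountId}/{YYYY-MM-DD}.jsonl. The helper names and the key format are illustrative; only the 90-day default and the one-file-per-account-per-day rule come from the text above.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// True when an event's timestamp falls outside the retention window.
// 90 days is the documented default; accounts may be configured with more.
function isPastRetention(eventAtMs: number, nowMs: number, retentionDays: number = 90): boolean {
  return nowMs - eventAtMs > retentionDays * DAY_MS;
}

// Hypothetical object key: one JSONL file per account per UTC day.
function archiveKey(accountId: string, eventAtMs: number): string {
  const day = new Date(eventAtMs).toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
  return `events/${accountId}/${day}.jsonl`;
}

const retentionNow = Date.UTC(2025, 5, 1); // 1 June 2025
const oldEventAt = Date.UTC(2025, 0, 15, 13, 30); // 15 January 2025, 13:30 UTC

const shouldArchive = isPastRetention(oldEventAt, retentionNow);
const targetKey = archiveKey("acme", oldEventAt);
```

Bucketing by UTC day keeps the archive layout independent of the account's time zone; whether the real sweep does this is not stated here.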

5. The durable harness on the Workflow component¶

Agent turns run as Convex workflows. The Workflow component (documented in 11 Convex Reference) provides the durability, step-level state, retry, and resumption semantics that the legacy Go HarnessExecutor provided, without running any of our own infrastructure.

5.1 Why a Workflow¶

A simple mutation is not sufficient for a turn: a turn may take seconds to minutes, call external LLMs, call tools that call other LLMs, delegate to sub-agents, and must survive Convex process restarts without losing state or duplicating side effects. Workflows give step durability, idempotency enforcement, scheduled resumption, and observability.

5.2 Workflow shape¶

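// convex/harness.ts
import { WorkflowManager } from "@convex-dev/workflow";
import { v } from "convex/values";
import { api, components } from "./_generated/api";

// Manager bound to the installed Workflow component.
export const workflow = new WorkflowManager(components.workflow);

export const runTurn = workflow.define({
  args: {
    interactionId: v.id("interaction"),
  },
  // The explicit return type avoids circular type inference through `api`.
  handler: async (step, { interactionId }): Promise<void> => {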
    const turn = await step.runQuery(api.harness.loadTurn, { interactionId });
    await step.runMutation(api.harness.recordStart, { interactionId });

    const policy = await step.runQuery(api.governance.resolvePolicy, {
      accountId: turn.accountId,
      agentId: turn.agentId,
    });

    const retrieval = await step.runAction(api.retrieval.gather, {
      query: turn.userMessage,
      agentId: turn.agentId,
    });

    const plan = await step.runAction(api.llm.plan, {
      turn,
      retrieval,
      policy,
    });

    for (const action of plan.actions) {
      await step.runAction(api.tools.invoke, {
        turn,
        action,
        policy,
      });
    }

    await step.runMutation(api.harness.recordComplete, {
      interactionId,
      plan,
    });
  },
});

Each step.runX call is a durable step. Its input and output are persisted before and after execution. If the Convex process dies mid-step, the workflow resumes by re-executing that step with its persisted input. Steps are expected to be idempotent, and the harness enforces idempotency keys for the side-effectful ones (tool calls, message writes).

5.3 Invocation¶

A turn starts when an inbound message lands:

channel HTTP action -> mutation "messages.receive"
  -> insert message into message table
  -> insert event "message.received"
  -> insert interaction row (status = running)
  -> workflow.start(runTurn, { interactionId })

Workflow start is a scheduled action that enqueues the turn; it returns immediately. The actual execution runs asynchronously against Tier 2 Workflow slots (see 02 System Architecture section 12).

5.4 Relationship to the legacy harness¶

The conceptual model is identical: a turn is a sequence of think-act-observe steps with durability at every boundary. The implementation moved from "a Go struct with a state machine backed by Postgres rows and Redis locks" to "a Workflow component whose step state is a first-class Convex primitive". The observable behaviour from the agent's perspective is unchanged.

6. Step lifecycle and resumability¶

6.1 Step states¶

A step moves through: pending (input persisted, not yet run) → running → completed or failed. On failed, a retry policy decides whether to re-enter pending with a backoff or to abandon the workflow.
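The state progression can be written down as a small transition table. This is a sketch for orientation only: the Workflow component owns the real state machine, and the names below (StepState, canTransition, afterFailure) are made up for the example.

```typescript
type StepState = "pending" | "running" | "completed" | "failed";

// Legal moves between step states, as described in the text.
const allowedTransitions: Record<StepState, StepState[]> = {
  pending: ["running"],
  running: ["completed", "failed"],
  completed: [],
  failed: ["pending"], // only when the retry policy allows another attempt
};

function canTransition(from: StepState, to: StepState): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}

type AfterFailure =
  | { next: "pending"; retryCount: number }
  | { next: "abandon-workflow" };

// Retry policy decision: re-enter pending while retries remain, else abandon.
function afterFailure(retryCount: number, maxRetries: number): AfterFailure {
  return retryCount < maxRetries
    ? { next: "pending", retryCount: retryCount + 1 }
    : { next: "abandon-workflow" };
}
```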

6.2 Step persistence¶

Every step records:

Step kind (query, mutation, action, scheduled resume).
Input: the exact args passed to the step function.
Output: the result, on success.
Error: message, classification, retry count, on failure.
Start time, end time.

The Workflow component stores these rows in a dedicated internal table. They are queryable via the component's admin API, which Thinklio surfaces in the admin UI for any workflow that reached a terminal state, plus any in-flight workflow that a user might want to inspect.

6.3 Retry and backoff¶

Steps declare their retry policy at definition time. Defaults:

Step kind	Retries	Backoff / notes
`runQuery`	0	Queries are deterministic; failure is a bug.
`runMutation`	2	Short backoff (1s, 5s). Failure after two retries abandons the workflow.
`runAction`	Configurable	Typical: 3 retries with exponential backoff (30s, 2m, 10m).
External LLM call	5	Exponential with jitter, capped at 30 min.
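For the external LLM row, "exponential with jitter, capped at 30 min" could look roughly like the sketch below. The 60-second base delay and the choice of full jitter are assumptions for illustration; only the five retries and the 30-minute cap come from the table.

```typescript
// Delay before retry number `attempt` (1-based): base * 2^(attempt - 1),
// capped at capMs, then scaled by a random factor ("full jitter").
// `random` is injectable so the schedule can be inspected deterministically.
function backoffDelayMs(
  attempt: number,
  baseMs: number,
  capMs: number,
  random: () => number = Math.random,
): number {
  const exponential = baseMs * Math.pow(2, attempt - 1);
  const capped = Math.min(exponential, capMs);
  return Math.floor(random() * capped);
}

const BASE_MS = 60 * 1000; // assumed base delay, not from the table
const CAP_MS = 30 * 60 * 1000; // documented cap of 30 minutes

// Upper bound of each of the five retries (jitter factor pinned to 1).
const llmUpperBounds = [1, 2, 3, 4, 5].map((attempt) =>
  backoffDelayMs(attempt, BASE_MS, CAP_MS, () => 1),
);
```

Jitter spreads retries from many concurrent turns so they do not hit a recovering provider at the same instant; the cap keeps a long outage from pushing the next attempt out indefinitely.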

6.4 Idempotency¶

Side-effectful steps carry idempotency keys derived from the workflow ID and step index. A retry re-emits the same key. External systems that honour idempotency (Postmark, Telegram, the LLM providers) receive duplicate keys and ignore duplicates. For systems that do not, the harness inserts a mutation-level guard: "have I already recorded this side effect?"
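A minimal sketch of the key derivation and the "have I already recorded this side effect?" guard. The key format and the SideEffectLedger class are hypothetical; in Convex the ledger would be a table read and written inside a single mutation, so the check and the record happen in one transaction.

```typescript
// Assumed key format: workflow ID and step index joined with a colon.
function idempotencyKey(workflowId: string, stepIndex: number): string {
  return `${workflowId}:${stepIndex}`;
}

// In-memory stand-in for a table of recorded side effects.
class SideEffectLedger {
  private readonly results = new Map<string, string>();

  // Runs `effect` at most once per key; a retry gets the recorded result back.
  runOnce(key: string, effect: () => string): string {
    const existing = this.results.get(key);
    if (existing !== undefined) return existing;
    const result = effect();
    this.results.set(key, result);
    return result;
  }
}

let sends = 0;
const sendMessage = (): string => {
  sends += 1;
  return `sent#${sends}`;
};

const ledger = new SideEffectLedger();
const stepKey = idempotencyKey("wf_123", 4);
const firstResult = ledger.runOnce(stepKey, sendMessage);
const retryResult = ledger.runOnce(stepKey, sendMessage); // same key: no second send
```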

6.5 Resumption¶

A workflow is resumable at any step. On Convex process restart, Convex's scheduler re-delivers in-flight workflows to a worker, which resumes at the last recorded step. For long-lived workflows (delegation, human-in-the-loop waits), the workflow can be explicitly suspended with step.sleep or an event wait; it releases its slot while suspended, then reclaims it and resumes when the wait resolves.

6.6 Observability¶

The admin UI surfaces, per workflow:

Current step and its input/output.
Step history with durations.
Retry counts.
Elapsed wall-clock time vs active time.
Reason for suspension, if suspended.

This was an investment we explicitly made when migrating from the Go harness: observability into turn execution is critical when turns span minutes.

7. Channel identity model¶

External users arrive via external identities: a Telegram user ID, an email address, a phone number. Internally, Thinklio only cares about platform users. The channel identity layer maps between the two.

7.1 Entities¶

Table	Purpose
`channel`	A conceptual channel (DM, group, email thread). Carries type, settings, participants.
`user_channel`	A user's participation in a channel. Many-to-many between `user_profile` and `channel`.
`channel_identity`	A mapping from an external identity to a platform user within a channel type. For example, the mapping from a Telegram user ID to a Thinklio user, valid only when the user is transacting over Telegram.

7.2 Identity resolution¶

When an inbound message arrives, the ingress HTTP action extracts the external identity and resolves it:

inbound.telegram { from: tgUserId, text }
  -> channel_identity lookup by (channelType=telegram, externalId=tgUserId)
  -> if found: resolve to platform userId
  -> if not found: invoke onboarding flow (see 08 Agents Catalogue sec 2.12)
  -> identify target channel (DM, group) -> chatId
  -> mutation messages.receive { userId, chatId, text, channelType }

The resolution runs as a Convex query and so benefits from the reactive query cache: a repeat lookup for the same external identity within a short window is typically served from cache instead of re-reading the table.
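The lookup step of the flow above, reduced to an in-memory sketch. The types and helper names are illustrative assumptions; the point is that the lookup key is the pair (channelType, externalId), so the same external ID on a different transport does not resolve.

```typescript
type ChannelType = "web" | "email" | "telegram" | "api";

interface ChannelIdentity {
  channelType: ChannelType;
  externalId: string;
  userId: string;
}

type Resolution =
  | { outcome: "resolved"; userId: string }
  | { outcome: "onboarding"; channelType: ChannelType; externalId: string };

function identityKey(channelType: ChannelType, externalId: string): string {
  return `${channelType}:${externalId}`;
}

// Stand-in for an index on channel_identity by (channelType, externalId).
function buildIdentityIndex(rows: ChannelIdentity[]): Map<string, string> {
  const index = new Map<string, string>();
  for (const row of rows) {
    index.set(identityKey(row.channelType, row.externalId), row.userId);
  }
  return index;
}

// Known identity resolves to a platform user; unknown identity goes to onboarding.
function resolveIdentity(
  index: Map<string, string>,
  channelType: ChannelType,
  externalId: string,
): Resolution {
  const userId = index.get(identityKey(channelType, externalId));
  return userId !== undefined
    ? { outcome: "resolved", userId }
    : { outcome: "onboarding", channelType, externalId };
}

const identityIndex = buildIdentityIndex([
  { channelType: "telegram", externalId: "4242", userId: "user_1" },
]);
const knownSender = resolveIdentity(identityIndex, "telegram", "4242");
const unknownSender = resolveIdentity(identityIndex, "email", "4242");
```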

7.3 Multiple channels per user¶

A single platform user commonly has channels across multiple transports: a Telegram DM, an email alias, a web-app session. Each is a separate user_channel row, all linked to the same user_profile. Delivery decisions (section 10) use the set of active user_channel rows to choose where to send a response.

7.4 Channel types¶

The channel type is an enum:

web — the Thinklio web or mobile app chat surface.
email — bidirectional email via Postmark.
telegram — inbound and outbound via Telegram Bot API.
api — programmatic Channel API client (headless).
whatsapp — planned, not yet implemented.
sms — planned, not yet implemented.
voice — planned, significant design work pending.

8. Channel connection lifecycle¶

8.1 States¶

A user_channel row carries a lifecycle status:

pending — the connection has been initiated but not verified. Email: confirmation link not yet clicked. Telegram: bot added but first message not received.
active — verified and usable.
paused — user or admin has temporarily disabled delivery on this channel.
revoked — user or admin has revoked permission. Historical messages retained; no further delivery.

8.2 Connection flow¶

user invokes "connect Telegram"
  -> platform creates user_channel { status: pending, verificationToken }
  -> user messages bot with the token
  -> bot webhook (HTTP action) verifies, transitions to active
  -> channel_identity row created linking TG user ID to platform user

Analogous flows exist for email (verify link in a confirmation email) and for API keys (provisioned by the user via the platform API).

8.3 Admin control¶

Account admins can suspend or revoke the channels of users in their account. Platform admins can do the same platform-wide. Suspend transitions to paused; revoke transitions to revoked. A suspended channel delivers no new agent output; a revoked channel is disconnected and must go through the connection flow again to be reinstated.

8.4 Inactivity¶

A channel idle for 12 months transitions to paused automatically via a scheduled sweep. Accounts can override the inactivity threshold.
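Taken together, sections 8.1 to 8.4 imply a small state machine for user_channel.status. The sketch below is one reading of it: the action names are invented for the example, and the resume transition (paused back to active) is an assumption, since the text only says pausing is temporary.

```typescript
type ChannelStatus = "pending" | "active" | "paused" | "revoked";
type ChannelAction = "verify" | "pause" | "resume" | "revoke";

// Returns the next status, or null when the action is not valid in this state.
// "pause" covers user pause, admin suspend, and the inactivity sweep alike.
function nextChannelStatus(current: ChannelStatus, action: ChannelAction): ChannelStatus | null {
  if (current === "revoked") return null; // terminal: reinstating needs a fresh connection flow
  switch (action) {
    case "verify":
      return current === "pending" ? "active" : null;
    case "pause":
      return current === "active" ? "paused" : null;
    case "resume": // assumed inverse of pause
      return current === "paused" ? "active" : null;
    case "revoke":
      return "revoked";
  }
}
```

Returning null for invalid moves lets the calling mutation reject the request without writing anything.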

9. Channel ingress via HTTP actions¶

Every external transport reaches Convex through an HTTP action. An HTTP action is a Convex function that receives an HTTP request, authenticates it, and triggers mutations or actions to process the payload.

9.1 Shape¶

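// convex/http.ts
import { httpRouter } from "convex/server";
import { httpAction } from "./_generated/server";
import { api } from "./_generated/api";

const http = httpRouter();

http.route({
  // Convex routes match an exact `path` or a `pathPrefix`; there are no
  // `:param` segments, so the bot ID is read from the URL instead.
  pathPrefix: "/webhook/telegram/",
  method: "POST",
  handler: httpAction(async (ctx, request) => {
    const botId = new URL(request.url).pathname.split("/").pop() ?? "";

    // Verify before parsing the body or touching the database (section 9.4).
    // verifyTelegramSignature is a project helper, assumed to throw on mismatch.
    try {
      await verifyTelegramSignature(request, botId);
    } catch {
      return new Response(null, { status: 401 });
    }

    // Convert to internal event shape and hand off.
    const body = await request.json();
    await ctx.runMutation(api.channels.telegram.ingest, {
      botId,
      update: body,
    });

    return new Response("ok", { status: 200 });
  }),
});

export default http;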

The HTTP action is the transport-specific adapter. It validates the webhook signature, extracts the payload, and hands it to a transport-specific mutation that:

Resolves the sender via channel_identity.
Identifies or creates the channel.
Writes the inbound message.
Emits message.received.
Starts the harness workflow.

9.2 Transports currently routed through HTTP actions¶

Postmark inbound email (webhook).
Telegram Bot API (webhook).
The Channel API programmatic client.

Planned:

WhatsApp Business webhook.
Twilio SMS webhook.

9.3 Rate limiting¶

Convex's platform rate limits apply to HTTP actions. For transports with bursty input (group chats, email blasts), a lightweight token bucket runs inside the ingress mutation to smooth the write rate on the hot path. Heavier smoothing, if ever required, is done by moving the transport into a dedicated Go service (section 14).
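A token bucket of the kind mentioned above, sketched as a plain class with an injected clock. The capacity and refill rate are placeholder values, and inside an ingress mutation the bucket state would be a document read and updated in the same transaction, not an in-memory object.

```typescript
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so a first burst is admitted
    this.lastRefillMs = nowMs;
  }

  // Refills for the elapsed time, then spends one token if one is available.
  tryConsume(nowMs: number): boolean {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Burst of three writes at t=0 against a bucket of two, then one a second later.
const bucket = new TokenBucket(2, 1, 0);
const burst = [bucket.tryConsume(0), bucket.tryConsume(0), bucket.tryConsume(0)];
const afterOneSecond = bucket.tryConsume(1000);
```

What happens to a rejected write (defer, drop, or respond with a retry hint) is a policy choice this sketch leaves open.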

9.4 Signature verification¶

Every inbound webhook verifies the transport's signature scheme:

Postmark: HMAC SHA256 over the body using the webhook token.
Telegram: secret token in header.
Custom API clients: Thinklio API key validated via Clerk-linked metadata or the platform API key store.

Verification happens before any database work.

9.5 Why HTTP actions instead of a dedicated service¶

Convex HTTP actions today handle every inbound rate Thinklio has observed. They start cold in the tens of milliseconds, scale out automatically, and remove an entire deployment and observability surface. They are also trivially testable. The trade-offs (cold-start latency, per-action timeouts, limited concurrency ceiling) are well within what the current product requires. When any of those ceilings start to bite, the Tier 3 promotion path in section 14 is ready.

10. Multi-channel response delivery¶

An agent composes one response. The delivery path chooses which of the user's channels it goes out on.

10.1 Delivery decision¶

Inputs:

The user's active user_channel rows.
The channel the inbound message arrived on (strong default to reply on the same channel).
Agent-configured delivery preferences (some agents always reply via email, for example).
User preferences (a user can configure "never over SMS").
The reply's content shape (rich attachments favour channels that render them).

Algorithm:

1. If the inbound channel is usable (active, user has not opted out of this reply kind): use it.
2. Else: prefer the user's highest-priority active channel.
3. If no channel is usable: queue the reply on the user's web inbox and emit a fallback notification.
4. If the reply exceeds channel limits (SMS length, Telegram attachment size): split or degrade.
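Steps 1 to 3 of the algorithm can be sketched as a pure function. The UserChannel shape, the numeric priority field (lower means preferred), and the optedOut flag are assumptions for the example; step 4 (split or degrade) is transport-specific and left out.

```typescript
interface UserChannel {
  id: string;
  type: "web" | "email" | "telegram" | "api";
  status: "pending" | "active" | "paused" | "revoked";
  priority: number; // assumed convention: lower number = higher priority
  optedOut: boolean; // user has opted out of this reply kind on this channel
}

type Delivery =
  | { via: "channel"; channelId: string }
  | { via: "web-inbox-fallback" };

function isUsable(channel: UserChannel): boolean {
  return channel.status === "active" && !channel.optedOut;
}

function chooseDelivery(channels: UserChannel[], inboundChannelId: string): Delivery {
  // Step 1: reply on the inbound channel when it is usable.
  const inbound = channels.find((c) => c.id === inboundChannelId);
  if (inbound !== undefined && isUsable(inbound)) {
    return { via: "channel", channelId: inbound.id };
  }
  // Step 2: otherwise the highest-priority usable channel.
  const best = channels.filter(isUsable).sort((a, b) => a.priority - b.priority)[0];
  if (best !== undefined) {
    return { via: "channel", channelId: best.id };
  }
  // Step 3: nothing usable, so queue on the web inbox and notify.
  return { via: "web-inbox-fallback" };
}

const telegramDm: UserChannel = { id: "tg", type: "telegram", status: "active", priority: 2, optedOut: false };
const emailAlias: UserChannel = { id: "mail", type: "email", status: "active", priority: 1, optedOut: false };

const sameChannel = chooseDelivery([telegramDm, emailAlias], "tg");
const rerouted = chooseDelivery([{ ...telegramDm, status: "paused" }, emailAlias], "tg");
```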

10.2 Delivery mutations¶

mutation messages.send { interactionId, replyBody, targetChannel }
  -> insert message into message table
  -> insert event "message.sent"
  -> enqueue channel-specific egress action

The egress action formats the message for the specific transport and calls the transport's outbound API (Postmark send, Telegram send, web push). Egress actions are idempotent: duplicate invocations with the same messageId do not duplicate sends.

10.3 Rich content and degradation¶

Agents produce responses with structured content (text, tables, citations, attachments). Each transport renders what it can:

Web app: full fidelity.
Email: HTML with inline attachments.
Telegram: Markdown with attachment support up to transport limits.
SMS (planned): plain text only; long responses truncated with a "see the app" link.

Degradation rules live in channel_config and are applied in egress actions.

10.4 Delivery confirmations¶

When a transport provides delivery confirmation (Postmark bounce/delivery webhooks, Telegram delivery status), the inbound HTTP action writes a delivery.confirmed or delivery.failed event. The UI surfaces delivery state in the chat view.

10.5 Webhook delivery¶

External Integration API subscribers (see 09 External API & Tool Integration) receive events via webhooks. Webhook delivery is a Convex action triggered by a scheduled function that reads the event table against each active subscription, signs the payload, POSTs it, and records the delivery result. Failed deliveries retry with exponential backoff.

11. Email channel: bidirectional Postmark¶

Thinklio runs bidirectional email through Postmark. Every agent has a human-readable email alias; users interact with agents by emailing the alias as if it were a person.

11.1 Alias shape¶

{agentHandle}@{accountSubdomain}.thinklio.com

Examples:
dion@acme.thinklio.com          the Dion research agent for account "acme"
coach@acme.thinklio.com         the Coach agent
finance@acme.thinklio.com       the Finance agent

Reply-all threads preserve multi-party context: a user CCs a colleague, the agent threads the colleague in naturally.
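Resolving an inbound recipient address to an agent and account starts with parsing the alias. A minimal sketch follows; the permitted characters in the handle and subdomain are assumptions, since the alias rules are not spelled out here.

```typescript
interface AgentAlias {
  agentHandle: string;
  accountSubdomain: string;
}

// Parses "{agentHandle}@{accountSubdomain}.thinklio.com".
// Returns null for any address that does not fit that shape.
function parseAgentAlias(address: string): AgentAlias | null {
  const pattern = /^([a-z0-9][a-z0-9._-]*)@([a-z0-9][a-z0-9-]*)\.thinklio\.com$/;
  const match = pattern.exec(address.trim().toLowerCase());
  if (match === null) return null;
  const [, agentHandle, accountSubdomain] = match;
  if (agentHandle === undefined || accountSubdomain === undefined) return null;
  return { agentHandle, accountSubdomain };
}

const dionAlias = parseAgentAlias("Dion@acme.thinklio.com");
const foreignAddress = parseAgentAlias("dion@acme.example.com");
```

Lower-casing before matching treats Dion@ and dion@ as the same alias, which matches how most mail systems handle addresses in practice.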

11.2 Inbound flow¶

user sends email to dion@acme.thinklio.com
  -> Postmark routes to inbound webhook
  -> HTTP action verifies signature, extracts sender, subject, body, attachments
  -> channel_identity resolves sender email to platform user
  -> channel resolution: lookup or create an email thread channel by Postmark MessageID / InReplyTo
  -> attachments saved to R2 via presigned upload (see 05 Persistence sec 9)
  -> message row inserted, event emitted
  -> harness workflow starts for Dion agent

11.3 Outbound flow¶

The agent reply is formatted as HTML + plain-text multipart, signed with DKIM via Postmark, and sent to the thread's participants. Threading headers tie the reply to the original message so it appears in the user's existing email thread.

11.4 Signed replies¶

Emails sent by Thinklio carry a DKIM signature proving authenticity. The account's SPF record includes Postmark, avoiding spam classification. For enterprise accounts with strict email policies, Thinklio supports a "send via your own Postmark server" configuration so the emails come from the account's own DKIM/SPF footprint.

11.5 Bounce and complaint handling¶

Postmark's bounce and spam-complaint webhooks write delivery.bounced and delivery.complained events. The channel lifecycle transitions email channels with repeated bounces to paused. Complaints (spam reports) transition the channel to revoked immediately.

11.6 Limits¶

Postmark's attachment size cap is 10 MB per message. Thinklio's own limits are set at 20 MB per message (attachments stored in R2; the email carries a signed download link for anything over Postmark's cap).
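The two limits combine into a three-way decision. The sketch below is a simplification under stated assumptions: it treats the summed attachment size as the message size, takes 1 MB as 1024 * 1024 bytes, and switches the whole message to download links once the Postmark cap is exceeded; the actual egress rules may decide per attachment.

```typescript
const MB = 1024 * 1024; // assumed definition of a megabyte
const POSTMARK_CAP_BYTES = 10 * MB;
const THINKLIO_CAP_BYTES = 20 * MB;

type AttachmentPlan = "inline" | "signed-links" | "reject";

// inline: attach directly; signed-links: keep files in R2 and link to them;
// reject: over Thinklio's own per-message limit.
function planAttachments(attachmentSizes: number[]): AttachmentPlan {
  const total = attachmentSizes.reduce((sum, size) => sum + size, 0);
  if (total > THINKLIO_CAP_BYTES) return "reject";
  if (total > POSTMARK_CAP_BYTES) return "signed-links";
  return "inline";
}

const smallReport = planAttachments([2 * MB, 3 * MB]);
const largeDeck = planAttachments([6 * MB, 6 * MB]);
const tooLarge = planAttachments([15 * MB, 6 * MB]);
```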

12. Admin control of channels¶

12.1 Admin capabilities¶

Role	Can
Platform admin	Suspend any user globally; configure platform-level channel policies; disable an entire transport across the platform.
Account admin	Suspend channels for users in their account; configure account channel policies; rotate channel credentials (for example, Postmark server token).
User	Pause or revoke their own channels; update channel preferences (delivery preferences, do-not-disturb).

12.2 Audit¶

Every admin action against a channel produces an audit_event. Suspensions include actor, target, reason, timestamp. See 07 Security & Governance for the audit model.

12.3 Emergency controls¶

The platform admin can disable an entire transport platform-wide in an emergency (credential compromise, abuse event). This is a single-mutation toggle that the ingress HTTP actions check on every inbound. A disabled transport refuses inbound and sends a canned response.

13. Messaging UX conventions¶

The client applications (10 Client Applications & UX) implement a consistent messaging UX across surfaces. The conventions drive both the schema shape and the reactive-query patterns this layer supports.

13.1 Delivery receipts¶

Every message has a delivery state: pending, delivered, read, failed. Reactive queries surface the state live. On the web app the sender sees a ticked indicator; on mobile a native-style receipt. The state updates flow from the same event pipeline: delivery.confirmed events update the message row, subscribers re-render.

13.2 Typing indicators¶

An agent producing a response writes a message.typing event at the start of the turn and clears it when the response is written. Subscribers render a typing indicator. The event is short-lived and not persisted long term (swept hourly).

13.3 Read receipts¶

Read state is tracked per user per message in a message_read join table. Reading a message in the UI writes a read marker via a debounced mutation. Cross-channel read tracking aligns with each transport's native semantics: email messages are marked read when opened in the app, not via mail client tracking pixels.

13.4 Threads and replies¶

Threaded replies are modelled as messages with a replyToId. Client UIs render threads based on this link. Across transports, thread affinity depends on transport capability: email preserves threading natively via In-Reply-To; Telegram uses reply-to message IDs; SMS does not preserve threading and responses are linearised.

13.5 Mentions¶

In-channel mentions (of users, agents, tasks, items) are recorded as message_mention rows linking the message to the target entity. This enables live notifications ("you were mentioned by the Coach agent in the Q3 planning thread") and surfaces in the target entity's activity feed.

13.6 Presence¶

Presence (who is online) is derived from the UI's last-active timestamp. It is not a hot-path concern and is updated on a throttled cadence. Agents do not have presence; they are always available.

13.7 Drafts¶

Drafts are local to the client. They are not persisted to Convex, since they are personal, private, and volatile. Mobile clients store drafts in platform-native storage.

14. Tier 3 promotion to Go services¶

A dedicated Go service is not part of the current operational stack, but it is a reserved promotion path for specific runtime domains where Convex HTTP actions or Workflow slots are no longer sufficient.

14.1 Candidates for promotion¶

Domain	Why a Go service might be warranted later
Channel ingress	Very high inbound rate (SMS blast, WhatsApp business-scale traffic), long-running streaming connections (voice), or transports requiring persistent sockets not modelled well by HTTP actions.
External API surface	Sustained request rate approaching Convex HTTP action concurrency limits; heavy per-request CPU work (media transcoding, ML-enriched parsing) better served near the edge.
Egress connection pooling	Outbound calls to a single provider at rates that Convex actions cannot pool efficiently.
Voice	Real-time voice streams are a poor fit for Convex's request-response model and are a natural candidate for a Go or native service.

14.2 Promotion criteria¶

Promotion is analytics-driven. A domain is promoted when monitoring shows:

Sustained Convex HTTP action latency above product target (for example, p95 above 1 second) attributable to cold starts or concurrency contention.
Workflow slot occupancy sustained above 90% with the same shape of workload.
Transport-specific needs (persistent sockets, streaming, very long-lived connections) that fit badly inside the action model.

None of these have been observed at current scale.
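The criteria above can be expressed as a check over monitoring data. The metric field names are hypothetical; the thresholds (p95 above 1 second, occupancy above 90%) are the examples given in this section, and "sustained" is assumed to have been established before the numbers reach this function.

```typescript
interface DomainMetrics {
  p95LatencyMs: number;
  latencyFromColdStartOrContention: boolean; // attribution done upstream
  workflowSlotOccupancy: number; // sustained fraction, 0..1
  needsPersistentConnections: boolean; // sockets, streaming, long-lived connections
}

// Returns the criteria a domain currently meets; an empty list means no case for promotion.
function promotionReasons(
  metrics: DomainMetrics,
  targetP95Ms = 1000,
  occupancyThreshold = 0.9,
): string[] {
  const reasons: string[] = [];
  if (metrics.p95LatencyMs > targetP95Ms && metrics.latencyFromColdStartOrContention) {
    reasons.push("latency");
  }
  if (metrics.workflowSlotOccupancy > occupancyThreshold) {
    reasons.push("slot-occupancy");
  }
  if (metrics.needsPersistentConnections) {
    reasons.push("transport-fit");
  }
  return reasons;
}

const currentIngress = promotionReasons({
  p95LatencyMs: 300,
  latencyFromColdStartOrContention: false,
  workflowSlotOccupancy: 0.4,
  needsPersistentConnections: false,
});
```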

14.3 Promotion shape¶

A promoted Go service sits in front of Convex, not beside it. It runs close to the transport, does the transport-specific work (parse the inbound, verify the signature, buffer the outbound stream), and emits events into Convex via the same mutations that HTTP actions use. The reactive core remains the source of truth. The promoted service is stateless.

14.4 Reversal¶

A promoted service can be demoted back to HTTP actions if analytics show the justification has disappeared. Demotion is the Tier 3 → Tier 2 path described in 02 System Architecture section 12, applied to this layer.

15. Edge cases, security, and failure modes¶

15.1 Double-delivery¶

A webhook that retries can cause duplicate inbound messages. Ingress mutations deduplicate by transport-specific message ID (Postmark MessageID, Telegram update ID). Duplicates are dropped silently.

15.2 Out-of-order arrival¶

Webhooks can arrive out of order. The ingress mutation orders messages by the transport's own sequencing (for example, Postmark's Date header) rather than by arrival time, so the channel timeline reflects true send order.
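A minimal sketch of ordering by transport timestamp rather than arrival time. The `TimelineEntry` shape and the tie-break on arrival time are assumptions made for illustration; the source does not specify how equal timestamps are resolved.

```typescript
// Hypothetical timeline entry; field names are illustrative only.
interface TimelineEntry {
  id: string;
  sentAt: number; // transport timestamp in ms since epoch (e.g. a parsed Date header)
  receivedAt: number; // arrival time at ingress, in ms since epoch
}

// Returns a new timeline with `entry` placed in send order.
// Ties on sentAt fall back to arrival order (an assumption of this sketch).
function insertInSendOrder(
  timeline: readonly TimelineEntry[],
  entry: TimelineEntry,
): TimelineEntry[] {
  // Find the first existing entry that must come after the new one.
  const idx = timeline.findIndex(
    (e) =>
      e.sentAt > entry.sentAt ||
      (e.sentAt === entry.sentAt && e.receivedAt > entry.receivedAt),
  );
  const at = idx === -1 ? timeline.length : idx;
  return [...timeline.slice(0, at), entry, ...timeline.slice(at)];
}
```

A message that was sent earlier but arrives later is slotted ahead of messages already on the timeline, which is the behaviour described above.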

15.3 Spoofed inbound¶

Without signature verification, a well-formed POST could impersonate a transport. Every HTTP action verifies the transport's signature before any database work. Unverified requests receive a 401 with no body.
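As one illustration of verify-before-work, the sketch below checks a shared secret presented in a request header. Telegram's Bot API lets `setWebhook` register a `secret_token` that is echoed back in the `X-Telegram-Bot-Api-Secret-Token` header; whether Thinklio relies on this mechanism is an assumption here, and `handleInbound`, `verifyTelegramWebhook`, and `constantTimeEqual` are hypothetical names. Headers are assumed to be lower-cased by the caller.

```typescript
// Constant-time string comparison: always scans the longer length so that
// timing does not reveal where the first mismatch occurs.
function constantTimeEqual(a: string, b: string): boolean {
  const len = Math.max(a.length, b.length);
  let diff = a.length ^ b.length;
  for (let i = 0; i < len; i++) {
    // charCodeAt returns NaN past the end of a string; `| 0` coerces it to 0.
    diff |= (a.charCodeAt(i) | 0) ^ (b.charCodeAt(i) | 0);
  }
  return diff === 0;
}

type Headers = Record<string, string | undefined>;

// True only when the presented secret matches the configured one.
function verifyTelegramWebhook(headers: Headers, expectedSecret: string): boolean {
  const presented = headers["x-telegram-bot-api-secret-token"];
  if (presented === undefined || expectedSecret.length === 0) {
    return false; // missing header or unconfigured secret: fail closed
  }
  return constantTimeEqual(presented, expectedSecret);
}

// Verification runs before any other work; failures get a 401 with no body.
function handleInbound(
  headers: Headers,
  expectedSecret: string,
  process: () => void,
): { status: number; body: string | null } {
  if (!verifyTelegramWebhook(headers, expectedSecret)) {
    return { status: 401, body: null };
  }
  process(); // database work happens only after verification succeeds
  return { status: 200, body: "ok" };
}
```

Other transports use different schemes, so the verification function would be chosen per transport; the fail-closed ordering is the part that carries over.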

15.4 Credential compromise¶

A compromised transport credential (Postmark token, Telegram bot token) is rotated at the transport and then updated in Convex via the platform admin UI. The new token becomes active immediately, and any in-flight webhooks still verifying with the old token are rejected. The secrets vault (07 Security & Governance) holds the credentials.

15.5 User suspension¶

Every HTTP action checks user_profile.status before dispatching. A suspended user receives a canned response (or no response at all, depending on policy) and no event fires into the harness. Because the check is on the user profile, it applies regardless of which user_channel the message arrived on.
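The gate can be sketched as a pure decision function. The status values, policy names, and reply text below are assumptions for illustration; only the `user_profile.status` check itself comes from the design.

```typescript
// Hypothetical status and policy values; the real enums may differ.
type UserStatus = "active" | "suspended";
type SuspensionPolicy = "canned_reply" | "silent";

type GateDecision =
  | { kind: "dispatch" } // fire the event into the harness
  | { kind: "reply"; text: string } // answer with a canned response, no event
  | { kind: "drop" }; // no response and no event

// Decides what ingress does with an inbound message, based only on the
// user's profile status, not on which channel the message arrived on.
function gateInbound(status: UserStatus, policy: SuspensionPolicy): GateDecision {
  if (status !== "suspended") {
    return { kind: "dispatch" };
  }
  if (policy === "canned_reply") {
    return { kind: "reply", text: "This account is currently suspended." };
  }
  return { kind: "drop" };
}
```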

15.6 Outbound provider outage¶

If Postmark or Telegram is down, outbound egress actions fail and retry on the standard action backoff. Once retries are abandoned, the message is marked delivery.failed and a fallback notification is surfaced in the user's web inbox. Outages of under an hour typically resolve without user-visible impact.
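The retry-then-fail flow can be sketched as below. The exponential schedule, the `RetryPolicy` fields, and the `notifyWebInbox` flag are assumptions; the source only says that egress retries on "the standard action backoff" without giving its parameters. Delays are recorded rather than slept so the sketch stays synchronous.

```typescript
type DeliveryState = "delivered" | "delivery.failed";

// Hypothetical policy shape; real backoff parameters are not given in this document.
interface RetryPolicy {
  maxAttempts: number;
  baseDelayMs: number;
  factor: number;
}

interface DeliveryOutcome {
  state: DeliveryState;
  attempts: number;
  waitedMs: number[]; // delay that would precede each retry
  notifyWebInbox: boolean; // fallback notification only after abandonment
}

// `trySend` returns true on success. A real egress action would schedule the
// next attempt after each delay instead of looping.
function deliverWithRetry(
  trySend: (attempt: number) => boolean,
  policy: RetryPolicy,
): DeliveryOutcome {
  const waitedMs: number[] = [];
  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    if (trySend(attempt)) {
      return { state: "delivered", attempts: attempt, waitedMs, notifyWebInbox: false };
    }
    if (attempt < policy.maxAttempts) {
      // Exponential backoff: base, base*factor, base*factor^2, ...
      waitedMs.push(policy.baseDelayMs * Math.pow(policy.factor, attempt - 1));
    }
  }
  return {
    state: "delivery.failed",
    attempts: policy.maxAttempts,
    waitedMs,
    notifyWebInbox: true,
  };
}
```

With a schedule whose total span exceeds a short outage, a provider that recovers partway through is reached by a later attempt and the user never sees the failure.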

15.7 Workflow runaway¶

A workflow that fails to make progress (step retries exhausted, then retries exhausted again) is marked abandoned and an audit event is emitted. The admin UI surfaces abandoned workflows; a platform admin can re-enqueue them manually. Abandoned workflows do not reclaim their slots automatically; explicit action is required so that a real product bug is not masked.

16. Monitoring¶

16.1 Key metrics¶

Metric	What it tells you
Inbound rate per transport	Traffic shape and seasonality.
Webhook signature failure rate	Credential health or attempted spoofing.
HTTP action latency p50/p95/p99	Ingress responsiveness.
Workflow slot occupancy	Proximity to Tier 2 saturation.
Workflow success rate by agent	Harness reliability.
Average turn wall-clock	End-to-end latency from inbound to final reply.
Outbound delivery success rate	Egress reliability per transport.
Event table growth rate	Retention pressure and archive cadence.
Bounce/complaint rate (email)	Sender reputation.

16.2 Alerting¶

Webhook signature failure rate > 1% over 15 minutes (suggests spoofing or credential drift).
Workflow slot occupancy > 90% for 5 minutes (Tier 3 promotion trigger).
Outbound delivery success rate < 98% for any transport over 15 minutes (transport outage).
Abandoned workflows accumulating faster than they are reviewed.
Event table growth anomaly (spike > 5x baseline for 10 minutes; likely runaway or abuse).
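The thresholds above can be expressed as a single evaluation pass over windowed metrics. This is a sketch under assumptions: the `WindowedMetrics` field names are hypothetical, "for N minutes" is modelled as the minimum value observed across the window, and "accumulating faster than they are reviewed" is modelled as created exceeding reviewed within the window.

```typescript
// Hypothetical pre-aggregated metrics; rates are fractions in [0, 1].
interface WindowedMetrics {
  signatureFailureRate15m: number;
  minSlotOccupancy5m: number; // lowest occupancy seen in the last 5 minutes
  deliverySuccessRate15m: Record<string, number>; // keyed by transport
  abandonedCreated: number; // abandoned workflows created in the window
  abandonedReviewed: number; // abandoned workflows reviewed in the window
  minEventGrowthRatio10m: number; // lowest growth/baseline ratio in the last 10 minutes
}

// Returns the names of every alert whose threshold is breached.
function evaluateAlerts(m: WindowedMetrics): string[] {
  const alerts: string[] = [];
  if (m.signatureFailureRate15m > 0.01) {
    alerts.push("webhook_signature_failures");
  }
  // Using the window minimum means occupancy stayed above 90% throughout.
  if (m.minSlotOccupancy5m > 0.9) {
    alerts.push("workflow_slot_occupancy");
  }
  for (const transport of Object.keys(m.deliverySuccessRate15m)) {
    if (m.deliverySuccessRate15m[transport] < 0.98) {
      alerts.push(`outbound_delivery:${transport}`);
    }
  }
  if (m.abandonedCreated > m.abandonedReviewed) {
    alerts.push("abandoned_workflows_backlog");
  }
  if (m.minEventGrowthRatio10m > 5) {
    alerts.push("event_table_growth_anomaly");
  }
  return alerts;
}
```

All comparisons are strict, so a metric sitting exactly on its threshold does not fire.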

16.3 Admin surfaces¶

The admin surfaces cover per-account channel health, per-agent turn success rate, a per-workflow step view, and the event timeline for a given turn. These are implemented in the admin dashboard; see 12 Developer Guide.

17. Implementation phases¶

Phase 1: Reactive core (complete)¶

Event table, reactive subscription patterns, standard indexes.
Workflow-backed harness with step-level durability.
Telegram ingress via HTTP action.
Web-app inbox as the primary channel surface.

Phase 2: Email channel¶

Postmark inbound webhook and outbound send.
Bidirectional threading.
Bounce and complaint handling.
Enterprise own-Postmark configuration.

Phase 3: Multi-channel delivery¶

Delivery decision algorithm productionised.
Delivery receipts, read receipts, typing indicators surfaced live.
Webhook delivery for Integration API subscribers.

Phase 4: New transports¶

WhatsApp Business.
Twilio SMS.
Programmatic Channel API (first external developers).

Phase 5: Scale-out¶

Tier 3 promotion of the highest-volume transport if analytics justify it.
Voice transport design and pilot.
Cross-transport mentions and unified notifications.

Status tracking is in 13 Implementation Plan & Status.

18. Revision history¶

Date	Version	Change
2026-04-17	1.0.0	Initial consolidated release. Merged old docs 07 (Event System & Harness v03), 25 (Channel Architecture v01), 27 (Email Channel Postmark v01), and 49 (Messaging UX v01).
2026-04-17	2.0.0	Full rewrite for Convex-first event, harness, and channel model. The previous version (Redis Streams + Go HarnessExecutor + Go channel bridges) is preserved under `archive/legacy-event-system-harness-design.md`. This version covers the event table as the reactive substrate, the Convex Workflow component as the durable harness, HTTP actions for channel ingress, and the Tier 3 promotion path for dedicated Go services as an analytics-driven future option.