ADR-0020: Outbound Slack delivery is decoupled via a Postgres queue
ODD Platform accepts a Slack message with 202, persists it, and delivers it from a background worker that holds a Postgres advisory lock — so delivery needs no message broker, only Postgres.
Status
Accepted. Reconstructed from the codebase on 2026-05-30; the decision is live in the source today.
Context
When a user posts a Data Collaboration message to Slack, the platform calls an external API (chat.postMessage) that can be slow, rate-limited, or temporarily down. Doing that call inline on the request thread would couple the user's HTTP latency to Slack's, and would lose the message if the request failed mid-call. The platform also runs in multi-replica deployments, so naive background delivery would risk two replicas sending the same message twice.
The platform needed durable, decoupled, exactly-once-per-message delivery — ideally without adding a message broker to its runtime dependencies (ODD's posture is Postgres-as-only-runtime-dependency).
Decision
The post-message request persists the message and returns 202 Accepted; a background worker delivers it later under a Postgres advisory lock. The controller does not call Slack inline — it creates the message row and responds 202 (ResponseEntity.status(HttpStatus.ACCEPTED)), signalling "accepted, delivery is asynchronous."
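The accept-then-deliver split can be sketched in plain Java. This is a minimal illustration, not the platform's controller: `AcceptSketch`, `Message`, `Response`, and the in-memory list standing in for the message table are all hypothetical names, and the real endpoint returns `ResponseEntity.status(HttpStatus.ACCEPTED)` as described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical stand-ins; none of these names come from the ODD codebase.
final class AcceptSketch {
    record Message(UUID id, String text, String state) {}
    record Response(int status, Message body) {}

    // Stand-in for the message table in Postgres.
    private final List<Message> table = new ArrayList<>();

    // Persist first, then acknowledge. No provider call happens on this path.
    Response postMessage(String text) {
        Message message = new Message(UUID.randomUUID(), text, "PENDING");
        table.add(message);
        return new Response(202, message); // 202 Accepted: delivery is asynchronous
    }

    // Number of rows waiting for the background worker.
    int pendingCount() {
        return table.size();
    }
}
```

The point of the shape is that the only work on the request path is the insert, so the response time depends on the database and not on Slack.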
Delivery is handled by DataCollaborationMessageSenderJob, which acquires a Postgres advisory lock (leaderElectionManager.acquire(senderMessageAdvisoryLockId, true), the blocking form) before draining the queue. The advisory lock is the cross-replica coordination primitive: only the lock-holding replica sends, so a message is delivered once across the whole deployment. The lock id is an operator-tunable property (datacollaboration.sender-message-advisory-lock-id, default 120), drawn from a disjoint per-subsystem namespace so the platform's several single-leader workers don't collide.
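This ADR does not show how leaderElectionManager is implemented, so the following is only a sketch of what a blocking versus non-blocking acquire could look like over JDBC. It uses Postgres's built-in session-level functions `pg_advisory_lock`, `pg_try_advisory_lock`, and `pg_advisory_unlock`; the class `AdvisoryLockSketch` and its method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hypothetical helper; not the platform's LeaderElectionManager.
final class AdvisoryLockSketch {
    // The blocking form waits until the lock is granted; the try form returns at once.
    static String lockSql(boolean blocking) {
        return blocking ? "SELECT pg_advisory_lock(?)" : "SELECT pg_try_advisory_lock(?)";
    }

    // Returns the connection whose session holds the lock, or null if a
    // non-blocking attempt did not get it. The lock lives as long as the session.
    static Connection acquire(DataSource ds, long lockId, boolean blocking) throws SQLException {
        Connection conn = ds.getConnection();
        boolean held = false;
        try (PreparedStatement ps = conn.prepareStatement(lockSql(blocking))) {
            ps.setLong(1, lockId);
            try (ResultSet rs = ps.executeQuery()) {
                // pg_advisory_lock returns void once granted; the try form returns a boolean.
                held = rs.next() && (blocking || rs.getBoolean(1));
            }
            return held ? conn : null;
        } finally {
            if (!held) {
                conn.close();
            }
        }
    }

    // Unlock explicitly before closing: with a connection pool, close() may
    // return the session to the pool without ending it.
    static void release(Connection conn, long lockId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement("SELECT pg_advisory_unlock(?)")) {
            ps.setLong(1, lockId);
            ps.execute();
        } finally {
            conn.close();
        }
    }
}
```

A worker would call `acquire(ds, 120L, true)` with the configured lock id, drain the queue while it holds the returned connection, and call `release` when done. Other replicas calling the blocking form simply wait.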
The worker drains candidates one at a time, calls the provider client, and on failure retries up to datacollaboration.sending-messages-retry-count (default 3, incrementing a per-message try-count) before marking the message failed; on success it records the provider message timestamp. Choosing a Postgres advisory lock over Redis/Kafka/SQS is the load-bearing decision — it keeps delivery coordination inside the database the platform already requires.
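The drain-and-retry behaviour described above can be sketched with an in-memory queue. This is an illustration under assumptions, not the job's actual code: `SenderJobSketch`, `Message`, `State`, and `ProviderClient` are hypothetical, the sketch retries a failed message immediately, and it reads the retry count as the number of retries allowed after the first attempt (the real comparison in shouldRetry may differ).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-ins; not the platform's DataCollaborationMessageSenderJob.
final class SenderJobSketch {
    enum State { PENDING, SENT, FAILED }

    static final class Message {
        final String text;
        State state = State.PENDING;
        int tryCount = 0;   // incremented on each failed attempt that will be retried
        String providerTs;  // provider message timestamp, recorded on success
        Message(String text) { this.text = text; }
    }

    interface ProviderClient {
        String postMessage(String text) throws Exception;
    }

    // Drains candidates one at a time. Assumes the caller already holds the leader lock.
    static void drain(Deque<Message> queue, ProviderClient client, int retryCount) {
        Message m;
        while ((m = queue.peekFirst()) != null) {
            try {
                m.providerTs = client.postMessage(m.text);
                m.state = State.SENT;
                queue.pollFirst();
            } catch (Exception e) {
                if (m.tryCount < retryCount) {
                    m.tryCount++;          // stays at the head and is tried again
                } else {
                    m.state = State.FAILED; // retry budget exhausted
                    queue.pollFirst();
                }
            }
        }
    }

    public static void main(String[] args) {
        Deque<Message> queue = new ArrayDeque<>();
        Message m = new Message("hello");
        queue.add(m);
        int[] calls = {0};
        // Fake provider: the first call fails, the second succeeds.
        drain(queue, text -> {
            if (++calls[0] == 1) throw new Exception("rate limited");
            return "1700000000.000100";
        }, 3);
        System.out.println(m.state + ", failed attempts: " + m.tryCount);
    }
}
```

Under this reading, a message whose provider call always fails is attempted four times with the default of 3 (one initial attempt plus three retries) before it is marked failed.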
Consequences
The user's request latency is decoupled from Slack's — the post returns as soon as the message is durably queued, and a transient Slack outage delays delivery rather than failing the request.
Delivery is once-per-message cluster-wide without a broker: the advisory lock serialises sending to a single replica. The same single-leader-via-Postgres-advisory-lock mechanism coordinates the notifications WAL consumer (ADR-0043); the two use distinct lock ids from the shared namespace.
Because sending is single-leader, adding replicas does not increase Slack delivery throughput — outbound delivery is intentionally serialised, not horizontally scaled.
A caller that received 202 cannot observe final delivery success from that response; terminal state lives on the message row (delivered, or failed after the retry budget). Surfacing post-202 failure to the user is a known limitation of the decoupled model, not a property of this decision.
Evidence
odd-platform-api/.../datacollaboration/controller/DataCollaborationController.java:34-39: postMessageInSlack creates the message and returns ResponseEntity.status(HttpStatus.ACCEPTED).body(message); no inline Slack call.
odd-platform-api/.../datacollaboration/job/DataCollaborationMessageSenderJob.java:93-95: acquireLeaderElectionConnection() calls leaderElectionManager.acquire(dataCollaborationProperties.getSenderMessageAdvisoryLockId(), true) (blocking) before the drain loop.
odd-platform-api/.../datacollaboration/job/DataCollaborationMessageSenderJob.java:36-67: the drain loop polls getSendingCandidate(), calls postMessage(...), and on exception either retries (incrementMessageTryCount) or calls markMessageAsFailed; at :89-91, shouldRetry bounds retries by getSendingMessagesRetryCount().
odd-platform-api/src/main/resources/application.yml:202,204: sender-message-advisory-lock-id: 120 and sending-messages-retry-count: 3 as operator-tunable properties.
See also
Data Collaboration — the feature and its message lifecycle.
ADR-0019 — Data Collaboration ships disabled by default — the feature must be enabled before this delivery path runs.
ADR-0043 — Notification WAL consumer is a leader-elected singleton — the same Postgres-advisory-lock single-leader mechanism, applied to notification delivery.