# ADR-0020: Outbound Slack delivery is decoupled via a Postgres queue

## Status

**Accepted.** Reconstructed from the codebase on 2026-05-30; the decision is live in the source today.

## Context

When a user posts a Data Collaboration message to Slack, the platform calls an external API (`chat.postMessage`) that can be slow, rate-limited, or temporarily down. Doing that call inline on the request thread would couple the user's HTTP latency to Slack's, and would lose the message if the request failed mid-call. The platform also runs in multi-replica deployments, so naive background delivery would risk two replicas sending the same message twice.

The platform needed durable, decoupled, exactly-once-per-message delivery — ideally without adding a message broker to its runtime dependencies (ODD's posture is Postgres-as-only-runtime-dependency).

## Decision

**The post-message request persists the message and returns `202 Accepted`; a background worker delivers it later under a Postgres advisory lock.** The controller does not call Slack inline — it creates the message row and responds `202` (`ResponseEntity.status(HttpStatus.ACCEPTED)`), signalling "accepted, delivery is asynchronous."
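
The request path can be sketched in plain Java as follows. This is a minimal illustration, not the platform's controller: `AcceptThenQueueSketch`, `QueuedMessage`, `Response`, and the in-memory list standing in for the message table are all hypothetical names, and the real code uses Spring's `ResponseEntity` and a Postgres table instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "persist, then answer 202"; no Slack call happens here.
final class AcceptThenQueueSketch {
    static final int HTTP_ACCEPTED = 202;

    record QueuedMessage(long id, String channel, String text) {}
    record Response(int status, QueuedMessage body) {}

    // Assumption: an in-memory list stands in for the durable message table.
    private final List<QueuedMessage> messageTable = new ArrayList<>();

    Response postMessage(String channel, String text) {
        QueuedMessage row = new QueuedMessage(messageTable.size() + 1, channel, text);
        messageTable.add(row);                    // the real system writes a row to Postgres
        return new Response(HTTP_ACCEPTED, row);  // delivery is left to the background worker
    }

    int queuedCount() {
        return messageTable.size();
    }
}
```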

Delivery is handled by `DataCollaborationMessageSenderJob`, which **acquires a Postgres advisory lock** (`leaderElectionManager.acquire(senderMessageAdvisoryLockId, true)`, the blocking form) before draining the queue. The advisory lock is the cross-replica coordination primitive: only the lock-holding replica sends, so a message is delivered once across the whole deployment. The lock id is an operator-tunable property (`datacollaboration.sender-message-advisory-lock-id`, default `120`), drawn from a disjoint per-subsystem namespace so the platform's several single-leader workers don't collide.
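
As a rough illustration of the coordination primitive, the sketch below takes a session-level Postgres advisory lock over JDBC. `pg_advisory_lock` and `pg_try_advisory_lock` are real Postgres functions, but the class and method names here are hypothetical, and how `leaderElectionManager` issues the lock internally is an assumption rather than something the cited sources show.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper; not the platform's LeaderElectionManager.
final class AdvisoryLockSketch {

    // The blocking form waits until the lock is free; the try form returns immediately.
    static String lockSql(boolean blocking) {
        return blocking ? "SELECT pg_advisory_lock(?)" : "SELECT pg_try_advisory_lock(?)";
    }

    // Returns true once this session holds the lock. A session-level advisory lock
    // stays held until it is explicitly unlocked or the connection closes, so the
    // caller must keep this connection open for as long as it acts as leader.
    static boolean acquire(Connection connection, long lockId, boolean blocking) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(lockSql(blocking))) {
            statement.setLong(1, lockId);
            try (ResultSet resultSet = statement.executeQuery()) {
                resultSet.next();
                // pg_advisory_lock returns void; pg_try_advisory_lock returns a boolean.
                return blocking || resultSet.getBoolean(1);
            }
        }
    }
}
```

Under these assumptions, a replica would call `AdvisoryLockSketch.acquire(connection, 120L, true)` and start draining only after the call returns; every other replica blocks on the same call until the holder's session ends.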

The worker drains candidates one at a time and calls the provider client for each. On failure it increments a per-message try count and **retries up to `datacollaboration.sending-messages-retry-count`** (default `3`) before marking the message failed; on success it records the provider message timestamp. Choosing a Postgres advisory lock over Redis/Kafka/SQS is the load-bearing decision: it keeps delivery coordination inside the database the platform already requires.
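
The drain-and-retry behaviour can be sketched with an in-memory queue. This is a simplified model under stated assumptions, not the job's actual code: all type and method names are hypothetical, failed messages are re-queued at the tail, and the retry budget is assumed to exclude the first attempt (the real `shouldRetry` comparison may differ).

```java
import java.util.Deque;

// Hypothetical model of the sender job's drain loop.
final class DrainLoopSketch {
    enum State { PENDING, SENT, FAILED }

    static final class Message {
        final String text;
        int tryCount = 0;
        State state = State.PENDING;
        String providerTimestamp; // recorded on success
        Message(String text) { this.text = text; }
    }

    interface ProviderClient {
        // Returns the provider's message timestamp, or throws on delivery failure.
        String postMessage(Message message) throws Exception;
    }

    // Assumption: the caller already holds the advisory lock, so only one replica runs this.
    static void drain(Deque<Message> queue, ProviderClient client, int retryCount) {
        Message message;
        while ((message = queue.poll()) != null) {
            try {
                message.providerTimestamp = client.postMessage(message);
                message.state = State.SENT;
            } catch (Exception e) {
                if (message.tryCount < retryCount) {
                    message.tryCount++;        // spend one retry and try again later
                    queue.addLast(message);
                } else {
                    message.state = State.FAILED; // retry budget exhausted
                }
            }
        }
    }
}
```

With `retryCount = 3` and a client that always throws, this sketch makes four attempts in total (one initial, three retries) before the message ends up `FAILED`.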

## Consequences

* The user's request latency is decoupled from Slack's — the post returns as soon as the message is durably queued, and a transient Slack outage delays delivery rather than failing the request.
* Delivery is **once-per-message cluster-wide** without a broker: the advisory lock serialises sending to a single replica. The same mechanism, a single leader elected via a Postgres advisory lock, coordinates the notification WAL consumer (ADR-0043); the two use distinct lock ids from the shared namespace.
* Because sending is single-leader, adding replicas does **not** increase Slack delivery throughput — outbound delivery is intentionally serialised, not horizontally scaled.
* A caller that received `202` cannot observe final delivery success from that response; terminal state lives on the message row (delivered, or failed after the retry budget). The difficulty of surfacing a post-`202` failure to the user is a known limitation of any decoupled delivery model, not something specific to this decision.

## Evidence

* `odd-platform-api/.../datacollaboration/controller/DataCollaborationController.java:34-39` — `postMessageInSlack` creates the message and returns `ResponseEntity.status(HttpStatus.ACCEPTED).body(message)`; no inline Slack call.
* `odd-platform-api/.../datacollaboration/job/DataCollaborationMessageSenderJob.java:93-95` — `acquireLeaderElectionConnection()` calls `leaderElectionManager.acquire(dataCollaborationProperties.getSenderMessageAdvisoryLockId(), true)` (blocking) before the drain loop.
* `odd-platform-api/.../datacollaboration/job/DataCollaborationMessageSenderJob.java:36-67` — the drain loop: poll `getSendingCandidate()`, `postMessage(...)`, and on exception retry (`incrementMessageTryCount`) or `markMessageAsFailed`; `:89-91` — `shouldRetry` bounds retries by `getSendingMessagesRetryCount()`.
* `odd-platform-api/src/main/resources/application.yml:202,204` — `sender-message-advisory-lock-id: 120` and `sending-messages-retry-count: 3` as operator-tunable properties.

## See also

* [Data Collaboration](/features/active-platform-features/data-collaboration.md) — the feature and its message lifecycle.
* [ADR-0019 — Data Collaboration ships disabled by default](/developer-guides/architecture-decision-log/adr-0019-data-collaboration-disabled-by-default.md) — the feature must be enabled before this delivery path runs.
* [ADR-0043 — Notification WAL consumer is a leader-elected singleton](/developer-guides/architecture-decision-log/adr-0043-notification-wal-single-leader.md) — the same Postgres-advisory-lock single-leader mechanism, applied to notification delivery.