
# ADR-0043: The notification WAL consumer is a leader-elected singleton

## Status

**Accepted.** Reconstructed from the codebase on 2026-05-30; the decision is live in the source today.

## Context

Notifications are driven by the Postgres write-ahead log (WAL): when an alert row is written, a logical-replication consumer reads the change and fans it out to the notification channels. In a multi-replica deployment, if every replica consumed the WAL, every alert would be delivered N times, once per replica. The platform needs exactly-once, cluster-wide consumption and, in keeping with its Postgres-as-only-runtime-dependency posture, must achieve it without introducing an external coordinator (ZooKeeper/Consul/etcd) for leader election.

## Decision

**The WAL consumer runs on a single thread that holds a Postgres advisory lock; only the lock-holding replica consumes.** At application startup (`ApplicationReadyEvent`), `NotificationSubscriberStarter` submits the subscriber to a single-thread executor whose thread is named `notification-subscriber-thread`. The subscriber's first action is `leaderElectionManager.acquire(walProperties.getAdvisoryLockId(), true)`. This is the **blocking** form, so a replica that is not the leader waits here and does not read the WAL for as long as another replica holds the lock. Only the replica that holds the advisory lock opens the logical-replication stream and processes messages.
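To illustrate what a blocking acquire involves, the sketch below takes a session-level advisory lock over plain JDBC with `SELECT pg_advisory_lock(?)`. It is not the platform's `PostgreSQLLeaderElectionManagerImpl`: the class name `AdvisoryLockSketch`, the method `acquireBlocking`, and the use of a `DataSource` are assumptions made for the example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

/** Hypothetical helper; names and structure are illustrative, not the platform's code. */
public final class AdvisoryLockSketch {

    // pg_advisory_lock waits until the lock is granted; pg_try_advisory_lock would return immediately.
    static final String LOCK_SQL = "SELECT pg_advisory_lock(?)";

    private AdvisoryLockSketch() {
    }

    /**
     * Blocks until the advisory lock is held, then returns the connection that owns it.
     * The lock is session-level, so it lasts until the caller closes this connection.
     */
    static Connection acquireBlocking(DataSource dataSource, long lockId) throws SQLException {
        Connection connection = dataSource.getConnection();
        try (PreparedStatement statement = connection.prepareStatement(LOCK_SQL)) {
            statement.setLong(1, lockId);
            statement.execute(); // does not return while another session holds the lock
            return connection;   // closing the statement does not release the lock
        } catch (SQLException | RuntimeException e) {
            connection.close();  // do not leak the session if the acquire failed
            throw e;
        }
    }
}
```

In this sketch the caller keeps the returned connection open for as long as it wants to remain leader; closing it ends the session, at which point Postgres releases the lock.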

The advisory lock id is operator-tunable (`notifications.wal.advisory-lock-id`, default `100`), drawn from the same disjoint per-subsystem namespace as the Data Collaboration sender (ADR-0020). On the leader, consumption is single-threaded by construction (one executor thread), so WAL messages are processed in order. If the leader dies, its database session ends and Postgres releases the session-level lock; the blocked acquire of a waiting replica then returns, and that replica takes over.

This is the **same single-leader-via-Postgres-advisory-lock mechanism** as ADR-0020's outbound Slack sender; the two are instances of one cluster-coordination convention, each keyed by a distinct lock id.
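The election and failover behaviour described in this section can be simulated inside one JVM. The sketch below is a model only: a `Semaphore` with one permit stands in for the Postgres advisory lock, and the `Replica` class, its latches, and `runFailoverDemo` are hypothetical names introduced for the example. Only the single-thread executor and the thread name follow the description above.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class SingleLeaderSketch {

    /** Hypothetical stand-in for one application replica and its subscriber. */
    static final class Replica {
        final String name;
        final CountDownLatch leading = new CountDownLatch(1);  // opens once this replica holds the lock
        final CountDownLatch shutdown = new CountDownLatch(1); // opens when the replica should stop
        final ExecutorService executor = Executors.newSingleThreadExecutor(
                r -> new Thread(r, "notification-subscriber-thread"));

        Replica(String name) {
            this.name = name;
        }

        void start(Semaphore advisoryLock, List<String> leaders) {
            executor.submit(() -> {
                try {
                    advisoryLock.acquire(); // blocking acquire: standbys wait here
                    try {
                        leaders.add(name + " on " + Thread.currentThread().getName());
                        leading.countDown();
                        shutdown.await();   // placeholder for the replication-stream loop
                    } finally {
                        advisoryLock.release(); // models Postgres freeing the lock at session end
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        void stop() {
            shutdown.countDown();
            executor.shutdown();
        }
    }

    /** Starts two replicas, stops the first leader, and returns who led, in order. */
    static List<String> runFailoverDemo() {
        Semaphore advisoryLock = new Semaphore(1); // one permit models one cluster-wide lock
        List<String> leaders = new CopyOnWriteArrayList<>();
        Replica a = new Replica("replica-a");
        Replica b = new Replica("replica-b");
        try {
            a.start(advisoryLock, leaders);
            a.leading.await();                     // replica-a is the leader
            b.start(advisoryLock, leaders);        // replica-b blocks on the lock
            if (b.leading.await(200, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("standby must not lead while the leader is alive");
            }
            a.stop();                              // the leader goes away and the lock is freed
            b.leading.await();                     // the standby's pending acquire returns
            return leaders;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            a.stop();
            b.stop();
        }
    }

    public static void main(String[] args) {
        runFailoverDemo().forEach(System.out::println);
    }
}
```

Running `main` prints `replica-a on notification-subscriber-thread` followed by `replica-b on notification-subscriber-thread`: the standby never polls, it simply resumes from its blocked acquire once the lock is free.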

## Consequences

* Each alert is consumed once cluster-wide: non-leader replicas block on the lock and never double-deliver, with no external coordinator — Postgres is the only dependency.
* Consumption does not scale horizontally: adding replicas adds standbys, not parallel consumers, and throughput is bounded by the single consumer thread. This is intentional (ordering and exactly-once delivery are preferred over throughput).
* Failover is automatic via advisory-lock release semantics: when the leader dies its session ends, Postgres frees the lock, and a standby's pending blocking acquire returns, making that standby the new leader.
* Because the single thread both reads the WAL and drives fan-out, a sender that blocks the thread would stall consumption — which is exactly why fan-out is fail-soft (ADR-0042), so one slow/broken channel cannot wedge the WAL.
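The last point can be made concrete with a minimal sketch of fail-soft fan-out on a single thread. It is an assumption-laden illustration, not the ADR-0042 implementation: `FailSoftFanOutSketch`, `fanOut`, and the modelling of channels as `Consumer<String>` are hypothetical. It covers only channels that throw; a channel that blocks indefinitely would additionally need a timeout, which is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class FailSoftFanOutSketch {

    /**
     * Delivers one message to every channel in order on the calling thread.
     * A channel that throws is recorded and skipped, so later channels still run.
     */
    static List<String> fanOut(String message, List<Consumer<String>> channels) {
        List<String> failures = new ArrayList<>();
        for (int i = 0; i < channels.size(); i++) {
            try {
                channels.get(i).accept(message);
            } catch (RuntimeException e) {
                // Fail-soft: note the failure and carry on with the next channel.
                failures.add("channel " + i + ": " + e.getMessage());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> delivered = new ArrayList<>();
        List<Consumer<String>> channels = List.of(
                delivered::add,
                m -> { throw new IllegalStateException("webhook down"); },
                m -> delivered.add("email:" + m));

        List<String> failures = fanOut("alert-1", channels);
        System.out.println(delivered); // the first and third channels both received the message
        System.out.println(failures);  // the second channel's failure was recorded
    }
}
```

Because the failure is contained inside the loop, the consumer thread returns to reading the WAL after each message instead of terminating on the first broken channel.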

## Evidence

* `odd-platform-api/.../notification/NotificationSubscriberStarter.java:21-23` — `Executors.newSingleThreadExecutor(r -> new Thread(r, "notification-subscriber-thread"))`; `:30-35` — `@EventListener(ApplicationReadyEvent.class)` submits the subscriber at startup.
* `odd-platform-api/.../notification/NotificationSubscriber.java:47` — `leaderElectionManager.acquire(walProperties.getAdvisoryLockId(), true)` (blocking acquire) wraps the replication-stream loop; non-leaders block here.
* `odd-platform-api/.../leaderelection/PostgreSQLLeaderElectionManagerImpl.java:21-23` — `acquire(...)` prepares and `execute()`s `SELECT pg_advisory_lock(<id>)` — the **blocking** Postgres lock function (not the `try_` variant), so the call returns only once the lock is held; the connection is then returned and kept open to hold the lock for the session.
* `odd-platform-api/src/main/resources/application.yml:177` — `advisory-lock-id: 100` under `notifications.wal`, the operator-tunable lock id.

## See also

* [ADR-0020 — Outbound Slack delivery is decoupled via a Postgres queue](/developer-guides/architecture-decision-log/adr-0020-decoupled-outbound-slack-delivery.md) — the same single-leader Postgres-advisory-lock mechanism, applied to outbound delivery (distinct lock id).
* [ADR-0042 — Notification fan-out is fail-soft per channel](/developer-guides/architecture-decision-log/adr-0042-notification-fail-soft-fan-out.md) — keeps a bad channel from stalling this single consumer thread.
* [ADR-0044 — Postgres replication artefacts are lazy-created, never dropped](/developer-guides/architecture-decision-log/adr-0044-postgres-artefact-lazy-create-no-drop.md) — the slot and publication this consumer relies on.