# ADR-0043: The notification WAL consumer is a leader-elected singleton

## Status

**Accepted.** Reconstructed from the codebase on 2026-05-30; the decision is live in the source today.

## Context

Notifications are driven by the Postgres write-ahead log: when an alert row is written, a logical-replication consumer reads it and fans it out. In a multi-replica deployment, if every replica consumed the WAL, every alert would be delivered N times. The platform needs exactly-once-cluster-wide consumption — and, in keeping with its Postgres-as-only-runtime-dependency posture, without introducing an external coordinator (ZooKeeper/Consul/etcd) for leader election.

## Decision

**The WAL consumer runs on a single thread that holds a Postgres advisory lock; only the lock-holding replica consumes.** At application startup (`ApplicationReadyEvent`), `NotificationSubscriberStarter` submits the subscriber to a single-thread executor whose thread is named `notification-subscriber-thread`. The subscriber's first action is `leaderElectionManager.acquire(walProperties.getAdvisoryLockId(), true)` — the **blocking** form, so a replica that is not the leader blocks here and never reads the WAL. Only the replica that holds the advisory lock opens the logical-replication stream and processes messages.
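
The gating behaviour described above can be illustrated without a database. The following is a minimal sketch, not the platform's code: the class `SingleLeaderSketch`, its methods, and the use of a `java.util.concurrent.Semaphore` as a stand-in for the Postgres advisory lock are all assumptions made for illustration. Only the thread name and the "blocking acquire first, consume second" ordering come from the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: several in-process "replicas" compete for one lock. */
final class SingleLeaderSketch {

    // Assumption: a one-permit semaphore stands in for the advisory lock.
    private final Semaphore advisoryLock = new Semaphore(1);
    // Names of replicas that got past the lock and started "consuming".
    private final List<String> consumers = new CopyOnWriteArrayList<>();
    private final CountDownLatch leaderElected = new CountDownLatch(1);
    private final CountDownLatch stop = new CountDownLatch(1);

    /** One replica: a single named thread whose first action is a blocking acquire. */
    ExecutorService startReplica(String replicaName) {
        ExecutorService executor = Executors.newSingleThreadExecutor(
                r -> new Thread(r, "notification-subscriber-thread"));
        executor.submit(() -> {
            try {
                advisoryLock.acquire();           // non-leaders park here
                try {
                    consumers.add(replicaName);   // only the lock holder gets this far
                    leaderElected.countDown();
                    stop.await();                 // keep holding the lock while "alive"
                } finally {
                    advisoryLock.release();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shutdown path
            }
        });
        return executor;
    }

    /** Starts the replicas and counts how many are consuming while the leader is alive. */
    static int consumersWhileLeaderAlive(int replicaCount) {
        SingleLeaderSketch cluster = new SingleLeaderSketch();
        List<ExecutorService> executors = new ArrayList<>();
        try {
            for (int i = 0; i < replicaCount; i++) {
                executors.add(cluster.startReplica("replica-" + i));
            }
            if (!cluster.leaderElected.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("no leader elected");
            }
            Thread.sleep(200); // give standbys a chance to (wrongly) pass the lock
            return cluster.consumers.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            // Interrupts the leader and every parked standby so all threads exit.
            executors.forEach(ExecutorService::shutdownNow);
        }
    }

    public static void main(String[] args) {
        System.out.println("consumers while leader alive: " + consumersWhileLeaderAlive(3));
    }
}
```

In this sketch the standbys never reach the consuming step while the first lock holder is alive, which is the property the blocking acquire is meant to provide across replicas.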

The advisory lock id is operator-tunable (`notifications.wal.advisory-lock-id`, default `100`), drawn from the same disjoint per-subsystem namespace as the Data Collaboration sender (ADR-0020). On the leader, consumption is single-threaded by construction (one executor thread), so WAL messages are processed in order. The lock is session-level: if the leader dies, its database session ends, Postgres releases the lock, and a waiting replica's blocked acquire returns so that it takes over.
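
As a rough illustration of how a session-level advisory lock can be taken over plain JDBC, consider the sketch below. It is not the code in `PostgreSQLLeaderElectionManagerImpl`: the class `AdvisoryLockSketch`, its method signature, the use of `DriverManager`, and the parameterised form of the statement are assumptions. Only the `pg_advisory_lock` function and the default id `100` come from this ADR.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical sketch of a blocking, session-level advisory lock acquire. */
final class AdvisoryLockSketch {

    // Blocking variant: the statement returns only once this session holds the lock.
    static final String LOCK_SQL = "SELECT pg_advisory_lock(?)";

    // Mirrors the documented default of notifications.wal.advisory-lock-id.
    static final long DEFAULT_LOCK_ID = 100L;

    /**
     * Opens a dedicated connection and blocks until it holds the lock.
     * The caller must keep the returned connection open for as long as it
     * wants to stay leader; closing it ends the session and frees the lock.
     */
    static Connection acquire(String jdbcUrl, String user, String password, long lockId)
            throws SQLException {
        Connection connection = DriverManager.getConnection(jdbcUrl, user, password);
        try (PreparedStatement statement = connection.prepareStatement(LOCK_SQL)) {
            statement.setLong(1, lockId);
            statement.execute(); // parks here while another session holds the lock
            return connection;
        } catch (SQLException e) {
            connection.close(); // do not leak the session if the acquire fails
            throw e;
        }
    }
}
```

A caller would typically wrap the returned connection in try-with-resources around its consume loop, so that leaving the loop for any reason closes the session and lets a standby proceed.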

This is the **same single-leader-via-Postgres-advisory-lock mechanism** as ADR-0020's outbound Slack sender; the two are instances of one cluster-coordination convention, each keyed by a distinct lock id.

## Consequences

* Each alert is consumed once cluster-wide: non-leader replicas block on the lock and never double-deliver, with no external coordinator — Postgres is the only dependency.
* Consumption does not scale horizontally: adding replicas adds standbys that wait on the lock, not parallel consumers, and throughput is bounded by the single consumer thread. This is intentional (ordering + exactly-once over throughput).
* Failover is automatic via advisory-lock release semantics: when the leader dies, its session ends and Postgres frees the lock, at which point one standby's blocked `acquire` call returns and that replica takes over.
* Because the single thread both reads the WAL and drives fan-out, a sender that blocks the thread would stall consumption — which is exactly why fan-out is fail-soft (ADR-0042), so one slow/broken channel cannot wedge the WAL.
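
The failover consequence can be sketched in the same in-memory style. This is an illustration under stated assumptions, not the platform's behaviour under test: `FailoverSketch` and its methods are hypothetical, a one-permit `Semaphore` again stands in for the advisory lock, and "dying" is modelled as the leader releasing the permit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: a standby parked on the lock takes over when the leader goes away. */
final class FailoverSketch {

    /** A replica blocks on the lock, leads until told to "die", then frees the lock. */
    static Thread replica(String name, Semaphore lock, List<String> leaders,
                          CountDownLatch leading, CountDownLatch die) {
        Thread thread = new Thread(() -> {
            try {
                lock.acquire();             // standby parks here
                try {
                    leaders.add(name);      // record that this replica became leader
                    leading.countDown();
                    die.await();            // "alive" until the latch is released
                } finally {
                    lock.release();         // models the session ending
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        thread.setDaemon(true);             // never keep the JVM alive on failure
        return thread;
    }

    private static void awaitOrFail(CountDownLatch latch, String what) throws InterruptedException {
        if (!latch.await(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("timed out waiting for " + what);
        }
    }

    /** Returns the order in which the two replicas held leadership. */
    static List<String> leadershipOrder() {
        Semaphore lock = new Semaphore(1);
        List<String> leaders = new CopyOnWriteArrayList<>();
        CountDownLatch aLeading = new CountDownLatch(1);
        CountDownLatch aDies = new CountDownLatch(1);
        CountDownLatch bLeading = new CountDownLatch(1);
        CountDownLatch bDies = new CountDownLatch(1);
        Thread a = replica("replica-a", lock, leaders, aLeading, aDies);
        Thread b = replica("replica-b", lock, leaders, bLeading, bDies);
        try {
            a.start();
            awaitOrFail(aLeading, "replica-a to lead");
            b.start();
            // Wait until replica-b is actually parked on the lock.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (!lock.hasQueuedThreads() && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            aDies.countDown();              // leader goes away, lock is freed
            awaitOrFail(bLeading, "replica-b to take over");
            bDies.countDown();
            a.join();
            b.join();
            return new ArrayList<>(leaders);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(leadershipOrder());
    }
}
```

The standby does nothing active to detect the failure; it is simply released from its blocked acquire once the lock becomes free.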

## Evidence

* `odd-platform-api/.../notification/NotificationSubscriberStarter.java:21-23` — `Executors.newSingleThreadExecutor(r -> new Thread(r, "notification-subscriber-thread"))`; `:30-35` — `@EventListener(ApplicationReadyEvent.class)` submits the subscriber at startup.
* `odd-platform-api/.../notification/NotificationSubscriber.java:47` — `leaderElectionManager.acquire(walProperties.getAdvisoryLockId(), true)` (blocking acquire) wraps the replication-stream loop; non-leaders block here.
* `odd-platform-api/.../leaderelection/PostgreSQLLeaderElectionManagerImpl.java:21-23` — `acquire(...)` prepares and `execute()`s `SELECT pg_advisory_lock(<id>)` — the **blocking** Postgres lock function (not the `try_` variant), so the call returns only once the lock is held; the connection is then returned and kept open to hold the lock for the session.
* `odd-platform-api/src/main/resources/application.yml:177` — `advisory-lock-id: 100` under `notifications.wal`, the operator-tunable lock id.

## See also

* [ADR-0020 — Outbound Slack delivery is decoupled via a Postgres queue](/developer-guides/architecture-decision-log/adr-0020-decoupled-outbound-slack-delivery.md) — the same single-leader Postgres-advisory-lock mechanism, applied to outbound delivery (distinct lock id).
* [ADR-0042 — Notification fan-out is fail-soft per channel](/developer-guides/architecture-decision-log/adr-0042-notification-fail-soft-fan-out.md) — keeps a bad channel from stalling this single consumer thread.
* [ADR-0044 — Postgres replication artefacts are lazy-created, never dropped](/developer-guides/architecture-decision-log/adr-0044-postgres-artefact-lazy-create-no-drop.md) — the slot and publication this consumer relies on.

