# ADR-0042: Notification fan-out is fail-soft per channel

## Status

**Accepted.** Reconstructed from the codebase on 2026-05-30; the decision is live in the source today.

## Context

A single alert can fan out to several notification channels (Slack, webhook, email). Any one of them can fail transiently — a webhook endpoint is down, an SMTP server times out. The platform has to decide what happens to the *other* channels, and to the *next* alert, when one send fails: stop the whole fan-out, or carry on.

## Decision

**Fan-out is fail-soft per channel: a send failure is caught, logged at ERROR naming the channel, and the loop continues to the next channel.** `AlertNotificationMessageProcessor.process` iterates the configured senders and calls each inside a `try/catch (NotificationSenderException)`. On exception it logs the failing channel's `receiverId()` and does not rethrow, so the remaining senders still run and the next WAL message is still processed.

The decision encodes "one bad channel does not block the others" as the operational stance. The alternative — let the first failure abort the fan-out — would couple every channel's delivery to the least reliable one, and would stall WAL progress behind a single bad endpoint.
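The loop shape described above can be sketched as follows. This is a minimal, self-contained illustration, not the platform's source: the `NotificationSender` interface and `NotificationSenderException` here are simplified stand-ins, `stubSender` is a hypothetical test double, `System.err` replaces the real logger, and the returned list of failed receiver ids exists only to make the behavior observable in the sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class FailSoftFanOutSketch {

    /** Hypothetical stand-in for the platform's NotificationSenderException. */
    static class NotificationSenderException extends Exception {
        NotificationSenderException(String message) {
            super(message);
        }
    }

    /** Simplified, assumed sender contract: only send() and receiverId(). */
    interface NotificationSender<T> {
        void send(T message) throws NotificationSenderException;

        String receiverId();
    }

    /**
     * Calls every sender in order. A failure is logged and the loop moves on.
     * Returning the failed receiver ids is an addition for this sketch only.
     */
    static <T> List<String> fanOut(List<NotificationSender<T>> senders, T message) {
        List<String> failedReceivers = new ArrayList<>();
        for (NotificationSender<T> sender : senders) {
            try {
                sender.send(message);
            } catch (NotificationSenderException e) {
                // Stand-in for log.error(...): name the channel, do not rethrow.
                System.err.printf("Error occurred while sending notification via %s: %s%n",
                    sender.receiverId(), e.getMessage());
                failedReceivers.add(sender.receiverId());
            }
        }
        return failedReceivers;
    }

    /** Hypothetical test double: records the delivery, or throws when told to fail. */
    static NotificationSender<String> stubSender(String id, boolean fails, List<String> delivered) {
        return new NotificationSender<String>() {
            @Override
            public void send(String message) throws NotificationSenderException {
                if (fails) {
                    throw new NotificationSenderException("endpoint unavailable");
                }
                delivered.add(id);
            }

            @Override
            public String receiverId() {
                return id;
            }
        };
    }

    public static void main(String[] args) {
        List<String> delivered = new ArrayList<>();
        List<NotificationSender<String>> senders = List.of(
            stubSender("slack", false, delivered),
            stubSender("webhook", true, delivered),
            stubSender("email", false, delivered));

        List<String> failed = fanOut(senders, "alert: data quality test failed");

        System.out.println("delivered=" + delivered + " failed=" + failed);
    }
}
```

In this sketch the failing `webhook` sender sits in the middle of the list, and both `slack` and `email` are still delivered. Under the rejected alternative, where the first failure aborts the fan-out, `email` would never be attempted.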

## Consequences

* A misconfigured or down channel does not stop the others: Slack still gets the alert if the webhook is failing, and the WAL keeps advancing rather than wedging behind a failed send.
* 📌 **Partial failure is operator-visible only in logs.** The failure is caught and logged rather than surfaced, and the platform keeps no delivery-status record, counter, or alert for a channel that keeps failing, so an operator learns of a dead channel only by inspecting ERROR logs. Closing that blind spot (a delivery audit trail or a failure metric) would be an additive change that does not alter the fail-soft stance.
* The stance is consistent with the platform's broader "best-effort across a list of independent operations" convention (the same continue-on-failure shape used by the partition-management orchestrator).
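To illustrate why a failure metric would be additive, here is one possible shape for a per-channel failure counter. Nothing like it exists in the platform today according to this ADR; the class name `ChannelFailureCounterSketch` and its methods are hypothetical, and a real implementation would more likely publish to whatever metrics library the platform uses.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-channel failure counter; not part of the platform. */
final class ChannelFailureCounterSketch {

    // One adder per receiver id, created lazily on the first failure.
    private final Map<String, LongAdder> failuresByChannel = new ConcurrentHashMap<>();

    /** Would be called from the catch block, next to the existing ERROR log. */
    void recordFailure(String receiverId) {
        failuresByChannel.computeIfAbsent(receiverId, id -> new LongAdder()).increment();
    }

    /** Number of failures recorded for the channel, or 0 if it never failed. */
    long failures(String receiverId) {
        LongAdder adder = failuresByChannel.get(receiverId);
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        ChannelFailureCounterSketch counter = new ChannelFailureCounterSketch();
        counter.recordFailure("webhook");
        counter.recordFailure("webhook");

        System.out.println("webhook failures: " + counter.failures("webhook"));
        System.out.println("slack failures: " + counter.failures("slack"));
    }
}
```

Because the counter would only be touched inside the existing `catch` block, the loop would still continue to the next sender exactly as before; the change would add visibility without altering control flow.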

## Evidence

* `odd-platform-api/.../notification/processor/AlertNotificationMessageProcessor.java:26-35` — the fan-out loop: `for (… notificationSender : notificationSenders) { try { notificationSender.send(notificationMessage); } catch (NotificationSenderException e) { log.error(…"Error occurred while sending notification via %s"…, notificationSender.receiverId()…); } }` — caught, logged, loop continues; no rethrow.
* `odd-platform-api/.../notification/processor/AlertNotificationMessageProcessor.java:19` — `private final List<NotificationSender<AlertNotificationMessage>> notificationSenders;` — the fan-out target is the list of activated channel senders (per ADR-0041).

## See also

* [ADR-0041 — Notification channels activate by the presence of their keys](/developer-guides/architecture-decision-log/adr-0041-notification-per-channel-presence-activation.md) — what populates the list of senders this fan-out iterates.
* [ADR-0043 — Notification WAL consumer is a leader-elected singleton](/developer-guides/architecture-decision-log/adr-0043-notification-wal-single-leader.md) — the WAL loop whose progress fail-soft fan-out protects.

