# Configure ODD Platform

This page is the post-deployment configuration reference for the running Platform — every `application.yml` key the Platform consumes. For the deployment path itself (Docker Compose, Helm, AWS EKS, build from source), start at [Deployment Options](/configuration-and-deployment/deployment.md).

## Configuration approaches

There are two ways to configure the Platform:

* **Environment variables** suit simple, single-value entries.
* **YAML** is handy when you need to define a complex configuration block (e.g. OAuth2 authentication or logging levels).

<details>

<summary>YAML entries vs. environment variables</summary>

The following example defines a configuration block in YAML and then expresses the same settings as environment variables.

YAML:

```yaml
spring:
    datasource:
        url: URL
        username: USERNAME
        password: PASSWORD
    custom-datasource:
        url: URL
        username: USERNAME
        password: PASSWORD
```

To configure the Platform using environment variables, join the nested keys with underscores, replace hyphens with underscores, and uppercase every word, like so:

* `SPRING_DATASOURCE_URL=URL`
* `SPRING_DATASOURCE_USERNAME=USERNAME`
* `SPRING_DATASOURCE_PASSWORD=PASSWORD`
* `SPRING_CUSTOM_DATASOURCE_URL=URL`
* `SPRING_CUSTOM_DATASOURCE_USERNAME=USERNAME`
* `SPRING_CUSTOM_DATASOURCE_PASSWORD=PASSWORD`

</details>
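
As an illustration of a block that reads more naturally in YAML, the sketch below sets per-logger levels with Spring Boot's standard `logging.level.*` keys. The logger names and levels are illustrative assumptions, not values the Platform requires.

```yaml
# Illustrative only: the logger names and levels below are assumptions
# chosen for the example, not Platform requirements.
logging:
    level:
        root: INFO
        org.springframework.security: DEBUG
```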

## Connect your database

ODD Platform uses PostgreSQL, and only PostgreSQL, for all of its features. Define the following properties to connect ODD Platform to the database:

* `spring.datasource.url`: [JDBC string](https://jdbc.postgresql.org/documentation/80/connect.html) of your PostgreSQL database. Default value is `jdbc:postgresql://127.0.0.1:5432/odd-platform`
* `spring.datasource.username`: your PostgreSQL user's name. Default value is `odd-platform`
* `spring.datasource.password`: your PostgreSQL user's password. Default value is `odd-platform-password`. **Override this before any non-localhost deployment**; see [Management endpoint exposure and credential hygiene](#management-endpoint-exposure-and-credential-hygiene) for why the operator must not leave the shipped default in place.

The following properties are optional and configure the PostgreSQL connection used to store Lookup Tables. Each of the three keys is declared in `R2DBCConfiguration` as `@Value("${spring.custom-datasource.X:}")`. The trailing colon with no value means **the @Value default is the empty string**, not the JDBC URL / username / password values listed below. When a key is unset (or blank), the bean factory falls back to the corresponding primary `spring.datasource.*` value at startup. The values below are therefore the **fallback** an operator observes with a default deployment, not the `spring.custom-datasource.*` keys' own defaults, so overriding `spring.datasource.url` also changes what `spring.custom-datasource.url` resolves to:

* `spring.custom-datasource.url`: [JDBC string](https://jdbc.postgresql.org/documentation/80/connect.html) of the PostgreSQL database where Lookup Tables are stored. Falls back to `spring.datasource.url` when unset; the platform's primary `spring.datasource.url` default is `jdbc:postgresql://127.0.0.1:5432/odd-platform`. Note: you can specify any {database\_host}, {database\_port}, or {database\_name}, but the schema in which Lookup Tables are stored is always lookup\_tables\_schema.
* `spring.custom-datasource.username`: your PostgreSQL user's name for custom-datasource. Falls back to `spring.datasource.username` when unset; the platform's primary `spring.datasource.username` default is `odd-platform`.
* `spring.custom-datasource.password`: your PostgreSQL user's password for custom-datasource. Falls back to `spring.datasource.password` when unset; the platform's primary `spring.datasource.password` default is `odd-platform-password`.

With these values in place, the database connection block looks like this:

{% tabs %}
{% tab title="YAML" %}

```yaml
spring:
    datasource:
        url: jdbc:postgresql://{database_host}:{database_port}/{database_name}
        username: {database_username}
        password: {database_password}
    # [OPTIONAL]
    custom-datasource:
        url: jdbc:postgresql://{database_host}:{database_port}/{database_name}
        username: {database_username}
        password: {database_password}
```

{% endtab %}

{% tab title="Environment variables" %}

```
SPRING_DATASOURCE_URL=jdbc:postgresql://{database_host}:{database_port}/{database_name}
SPRING_DATASOURCE_USERNAME={database_username}
SPRING_DATASOURCE_PASSWORD={database_password}
# [OPTIONAL]
SPRING_CUSTOM_DATASOURCE_URL=jdbc:postgresql://{database_host}:{database_port}/{database_name}
SPRING_CUSTOM_DATASOURCE_USERNAME={database_username}
SPRING_CUSTOM_DATASOURCE_PASSWORD={database_password}
```

{% endtab %}
{% endtabs %}

## Security

Please follow the [Enable security](/configuration-and-deployment/enable-security.md) section for enabling security in ODD Platform.

### Management endpoint exposure and credential hygiene

The platform's Spring Boot Actuator endpoints (`/actuator/**`) are intentionally **whitelisted ahead of the authentication chain** in every `auth.type`, and the shipped configuration enables the `env` and `info` endpoints. The shipped database password is a well-known string. Together, these defaults leave a default deployment that is exposed on an untrusted network only a short step away from full PostgreSQL compromise. The mitigations below are the operator's responsibility today. For the *monitoring* use of these endpoints (wiring liveness/readiness probes to `/actuator/health` and scraping `/actuator/prometheus`), see [Health and monitoring](/configuration-and-deployment/health-and-monitoring.md).

#### `/actuator/**` is anonymously reachable in every auth mode

`SecurityConstants.WHITELIST_PATHS` contains `/actuator/**`, so the path is reachable before the auth chain runs in `DISABLED`, `LOGIN_FORM`, `OAUTH2`, and `LDAP` alike. The shipped `application.yml` enables `management.endpoint.env.enabled=true` but sets **no** `management.endpoint.env.show-values`, so the Spring Boot default (`NEVER`) applies: `/actuator/env` redacts *every* property **value** (`******`) for every caller, authenticated or not, including `spring.datasource.url`. What an unauthenticated caller scraping `/actuator/env` *does* learn is the **configuration-key schema**, that is, which keys and property sources are present: which OAuth2 providers are wired (by their key prefixes), whether LDAP is configured, whether REMOTE attachment storage is set up, and that a JDBC datasource is configured (the key, not its value). Values stay masked unless an operator sets `show-values` to `WHEN_AUTHORIZED` or `ALWAYS`; the exposure to mitigate is the unauthenticated reachability of the endpoint and the configuration schema it reveals.

Apply at least one of the mitigations below for any deployment reachable from outside a fully-trusted network:

| Mitigation                 | How                                                                                                               | Recommended for                                                                       |
| -------------------------- | ----------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------- |
| Separate management port   | Set `management.server.port: 8081` and route `:8081` only on your internal management network.                    | All production deployments.                                                           |
| Firewall the actuator path | Add a reverse-proxy rule rejecting `/actuator/**` from the public CIDR range; allow only your monitoring network. | Single-port deployments where a separate management port is infeasible.               |
| Restrict default exposure  | Set `management.endpoints.web.exposure.include: health,prometheus` (drop `env, info`).                            | All production deployments — combine with one of the network-level mitigations above. |

A platform-side default-restriction is tracked upstream; until it ships, do not rely on the platform default for any reachable deployment.
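
The first and third mitigations in the table can be applied in one configuration change. A minimal sketch, assuming port `8081` (the example value from the table) is routed only on an internal management network:

```yaml
management:
    server:
        port: 8081                           # actuator endpoints move off the main application port
    endpoints:
        web:
            exposure:
                include: health,prometheus   # env and info are no longer exposed over HTTP
```

The network rule that keeps `:8081` private still has to be enforced outside the platform, for example in a security group, a Kubernetes NetworkPolicy, or the reverse proxy.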

#### The database password ships with a well-known default

`application.yml` ships `spring.datasource.password: odd-platform-password` as the default. An operator deploying ODD without explicitly overriding the property deploys with a public, documented credential. (The JDBC URL *value* is masked at `/actuator/env` under the default `show-values: NEVER` described above — but the password itself is public, and a database whose host is co-located with or guessable from the deployment topology is then one well-known credential away from compromise.) Override `spring.datasource.password` (and `spring.custom-datasource.password` if `spring.custom-datasource.*` is configured separately from the primary datasource) before exposing the platform on any non-localhost network. This is the same class of silent-insecure-default risk that previously affected attachment storage on container restart — read once, override before deployment, never assume the shipped default is safe.

#### Configuration-properties classes include credentials in `toString()`

`ODDLDAPProperties` and `ODDOAuth2Properties.OAuth2Provider` carry Lombok's `@Data` annotation alongside their `password` and `clientSecret` fields, respectively. `@Data` generates a `toString()` that includes every field verbatim — there is no `@ToString.Exclude` on any credential field today. A future log statement (`log.info("loaded properties: {}", properties)`) or an exception handler that emits properties on boot failure would write LDAP passwords and OAuth client secrets in cleartext to log infrastructure. Treat the platform's application logs as credential-sensitive: route them to an audit-grade log sink, redact at the log-pipeline tier if you cannot guarantee end-to-end access control, and do not store them in unrestricted long-term archives. A platform-side `@ToString.Exclude` rollout across credential fields is tracked upstream.

## Select session provider

ODD Platform stores HTTP session state in one of three places: the platform JVM (in-memory), the platform's PostgreSQL database, or an external Redis data store. The provider is selected with `session.provider` (`SESSION_PROVIDER` env var) and accepts one of three values:

* `IN_MEMORY` — sessions live in a `ConcurrentHashMap` inside the JVM. **ODD Platform defaults to this value.**
* `INTERNAL_POSTGRESQL` — sessions are persisted to the platform's PostgreSQL database (`SPRING_SESSION` / `SPRING_SESSION_ATTRIBUTES` tables).
* `REDIS` — sessions are persisted to an external [Redis](https://redis.io/) data store via Spring Session's `@EnableRedisWebSession`.

Quick selection guidance:

* Single-instance deployment where a forced re-login after every restart is acceptable → `IN_MEMORY`
* Multi-instance deployment or persistence across restarts is required → `INTERNAL_POSTGRESQL` (no extra infrastructure) or `REDIS` (if you already operate Redis or need sub-millisecond session reads)

Each provider has operator-visible characteristics that affect sizing, multi-instance behavior, and connection wiring. Read the relevant subsection before deploying.

### `IN_MEMORY` (default)

Sessions are kept in a `ConcurrentHashMap` inside the platform JVM, wrapped by Spring Session's `ReactiveMapSessionRepository`. Suitable for local development and single-instance evaluations where session loss on restart is acceptable.

#### Characteristics & caveats

* **Sessions are lost on every platform restart.** The session map lives in heap; any restart (deploy, crash, container recycle) clears it and forces every authenticated user to log in again.
* **No multi-instance support.** Two ODD Platform instances behind a load balancer each maintain a separate session map. A request that lands on a different instance than the one that authenticated the user appears unauthenticated. **Collector data-source registration is especially affected** — the `/ingestion/datasources` filter writes a `collectorId` into the request's session, and the subsequent `POST /ingestion/datasources` handler reads it back; if the two requests hit different replicas, the handler raises an `IllegalStateException("Collector id is null")` returned to the collector as HTTP 500. For multi-replica deployments choose `INTERNAL_POSTGRESQL` or `REDIS`.
* **Eviction is by Spring Session expiry only.** The repository wraps a raw `ConcurrentHashMap` with no secondary eviction policy (no LRU, no max-entries cap). A long-running platform with many short-lived sessions accumulates map entries until each entry's TTL elapses; high-traffic deployments running with the shipped default `spring.session.timeout: -1` (no timeout) accumulate sessions indefinitely. **Set a finite `spring.session.timeout`** (see [Session lifetime](#session-lifetime-spring-session-timeout) below) to bound the in-memory footprint.

{% tabs %}
{% tab title="YAML" %}

```yaml
session:
  provider: IN_MEMORY
```

{% endtab %}

{% tab title="Environment variables" %}

```
SESSION_PROVIDER=IN_MEMORY
```

{% endtab %}
{% endtabs %}

### `INTERNAL_POSTGRESQL`

Sessions are persisted in the platform's own PostgreSQL database, in the `SPRING_SESSION` and `SPRING_SESSION_ATTRIBUTES` tables. ODD Platform implements a custom JOOQ-based reactive `JooqSessionRepository` for this provider — the standard `spring.session.jdbc.*` Spring Session keys do **not** apply. Connection settings reuse the existing platform `spring.datasource.*` configuration; no additional database wiring is required.

#### Characteristics & caveats

* **Sessions survive platform restarts.** Authenticated users remain logged in across deploys (until their session row's TTL has passed).
* **Multi-instance support.** All ODD Platform instances point at the same database, share the session tables, and can serve requests for any authenticated user regardless of which instance answered the original login.
* **Expired-session cleanup runs hourly and is not configurable.** A `@Scheduled(fixedRate = 1, timeUnit = HOURS)` housekeeping job (`PostgreSQLSessionHousekeepingJobHandler.deleteExpiredSessions`) deletes rows whose `EXPIRY_TIME` is in the past from both `SPRING_SESSION` and `SPRING_SESSION_ATTRIBUTES`. Expired session rows therefore remain in the tables for **up to one hour past their TTL** before being cleaned. The cadence is hardcoded — there is no config key to tune it.
* **Sizing implication.** When sizing the database (connection pool, disk, vacuum schedule), assume the session tables hold the high-water-mark count of authenticated users plus up to one hour of post-expiry stragglers. For high-cardinality / short-TTL deployments (many users, short `spring.session.timeout`), the post-expiry overhang can dominate steady-state row count.

{% tabs %}
{% tab title="YAML" %}

```yaml
session:
  provider: INTERNAL_POSTGRESQL
```

{% endtab %}

{% tab title="Environment variables" %}

```
SESSION_PROVIDER=INTERNAL_POSTGRESQL
```

{% endtab %}
{% endtabs %}

### `REDIS`

Sessions are persisted to an external Redis data store via Spring Session's `@EnableRedisWebSession`. Suitable for multi-instance deployments that already operate Redis, or that need sub-millisecond session reads. ODD Platform does not bundle Redis; the operator must provide a Redis 6+ instance and supply its connection settings under the `spring.data.redis.*` namespace (Spring Boot 3.x; the legacy `spring.redis.*` prefix from Spring Boot 2.x has been removed and will not bind).

#### Characteristics & caveats

* **Sessions survive platform restarts and span instances** — same persistence behavior as `INTERNAL_POSTGRESQL`, but reads and writes happen against Redis directly.
* **Connection wiring is operator-supplied.** Unlike `INTERNAL_POSTGRESQL` (which reuses the platform's existing PostgreSQL connection), Redis settings must be configured separately. ODD Platform's `application.yml` ships **no Redis defaults** — every operator deploying with `REDIS` must set at least the host and port, plus credentials and TLS for any production deployment.
* **TLS, pool sizing, and command timeouts inherit Spring Data Redis defaults** unless explicitly overridden. For managed Redis providers (AWS ElastiCache, Redis Cloud, Azure Cache for Redis) and any TLS-required Redis deployment, set `spring.data.redis.ssl.enabled: true`. For high-concurrency deployments, tune the Lettuce connection pool with `spring.data.redis.lettuce.pool.*`.
* **Eviction is delegated to Redis.** ODD Platform does not run a housekeeping job for Redis-stored sessions; the Redis server's own per-key TTL and `maxmemory-policy` govern session eviction. Configure your Redis instance accordingly.

{% hint style="warning" %}
**The** [**health endpoint**](/configuration-and-deployment/health-and-monitoring.md) **is blind to Redis by default.** With `REDIS` selected, every authenticated request depends on Redis — but the bundled configuration ships `management.health.redis.enabled: false`, and the `REDIS` session wiring registers no health contributor of its own. A Redis outage (server down, unreachable, or evicting under `maxmemory`) therefore returns errors to every logged-in user while `/actuator/health` keeps reporting `UP` — a load balancer or Kubernetes readiness probe pointed at it keeps routing traffic to a platform that cannot serve a single authenticated request. If you deploy with `REDIS`, set `management.health.redis.enabled: true` so the Redis indicator participates in the health verdict, and do not rely on a bare `/actuator/health` probe alone to detect a session-store outage.
{% endhint %}
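
A minimal sketch of the override described in the warning above, shown alongside the provider selection:

```yaml
session:
  provider: REDIS
management:
  health:
    redis:
      enabled: true   # let the Redis indicator contribute to the /actuator/health verdict
```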

#### Required and optional connection keys (Spring Boot 3.x — `spring.data.redis.*`)

* `spring.data.redis.host`: Redis host. Defaults to `localhost`.
* `spring.data.redis.port`: Redis port. Defaults to `6379`.
* `spring.data.redis.username`: Redis ACL username. Optional; omit for password-only or no-auth Redis.
* `spring.data.redis.password`: Redis password. Optional but recommended for any production deployment.
* `spring.data.redis.database`: Redis logical database index. Defaults to `0`.
* `spring.data.redis.ssl.enabled`: enable TLS for the Redis connection. Boolean, defaults to `false`. Set to `true` for any managed-Redis or TLS-terminated Redis deployment.
* `spring.data.redis.timeout`: command timeout. Duration string (for example `5s`). Defaults to Spring Data Redis's internal default.
* `spring.data.redis.lettuce.pool.*`: Lettuce connection-pool sizing (`max-active`, `max-idle`, `min-idle`, `max-wait`). Optional; tune for high-concurrency deployments.

ODD Platform does not extend or override Spring Boot's Redis property catalogue — the full set of keys recognized under `spring.data.redis.*` in your Spring Boot version applies as-is.

{% tabs %}
{% tab title="YAML" %}

```yaml
session:
  provider: REDIS
spring:
  data:
    redis:
      host: redis.your-domain.com
      port: 6380
      username: odd-platform
      password: ${REDIS_PASSWORD}
      database: 0
      ssl:
        enabled: true
      timeout: 5s
```

{% endtab %}

{% tab title="Environment variables" %}

```
SESSION_PROVIDER=REDIS
SPRING_DATA_REDIS_HOST=redis.your-domain.com
SPRING_DATA_REDIS_PORT=6380
SPRING_DATA_REDIS_USERNAME=odd-platform
SPRING_DATA_REDIS_PASSWORD=...
SPRING_DATA_REDIS_DATABASE=0
SPRING_DATA_REDIS_SSL_ENABLED=true
SPRING_DATA_REDIS_TIMEOUT=5s
```

{% endtab %}
{% endtabs %}
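
For the Lettuce pool keys listed above, a hedged sketch follows. The numbers are illustrative assumptions, not recommendations; size them against your own request concurrency and your Redis server's connection limit.

```yaml
spring:
  data:
    redis:
      lettuce:
        pool:
          max-active: 16   # upper bound on connections handed out at once (assumed value)
          max-idle: 8      # idle connections kept ready for reuse (assumed value)
          min-idle: 2      # target minimum of idle connections (assumed value)
          max-wait: 2s     # how long a caller waits for a free connection (assumed value)
```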

{% hint style="warning" %}
**`spring.redis.*` (the Spring Boot 2.x prefix) is silently ignored.** Spring Boot 3.x removed the `spring.redis.*` namespace and relocated all Redis properties under `spring.data.redis.*`. Configuration written against the older prefix does not bind, so the platform falls back to the `localhost:6379` defaults; the symptom is connection failures, with your real Redis instance never contacted and no obvious "wrong key" error. Migrate any pre-3.x configuration to `spring.data.redis.*` (and `SPRING_DATA_REDIS_*` for env vars).
{% endhint %}

### Session lifetime (`spring.session.timeout`)

Spring Session's timeout controls how long an authenticated session remains valid between requests. ODD Platform's shipped default is `-1`, which means **sessions never expire**.

{% hint style="warning" %}
**`spring.session.timeout: -1` means sessions never expire.** A user who logs in once remains authenticated until their session record is explicitly invalidated (logout, cache eviction, or — for `IN_MEMORY` — platform restart). For any deployment that is internet-facing or serves multiple users, set `spring.session.timeout` to a finite duration so stolen cookies and forgotten sessions eventually lapse.
{% endhint %}

* `spring.session.timeout`: session idle timeout. Duration string (for example `30m`, `8h`, `1d`). Defaults to `-1` (no timeout). Applies to all three providers (`IN_MEMORY`, `INTERNAL_POSTGRESQL`, `REDIS`).

{% tabs %}
{% tab title="YAML" %}

```yaml
spring:
    session:
        timeout: 30m
```

{% endtab %}

{% tab title="Environment variables" %}

```
SPRING_SESSION_TIMEOUT=30m
```

{% endtab %}
{% endtabs %}

### Cookie attributes (`Secure`, `SameSite`, `HttpOnly`)

ODD Platform does **not** stamp `Secure`, `SameSite`, or `HttpOnly` attributes on the session cookie at the application tier: there is no `CookieWebSessionIdResolver` bean in the platform's session configuration today. The browser-side cookie posture is therefore whatever Spring's defaults for the `SESSION` cookie produce (no `Secure`, no `SameSite` directive, `HttpOnly` set), which is unsuitable for any internet-facing deployment.

Operators **must** stamp the production attributes at the deployment topology layer — typically the TLS-terminating reverse proxy or load balancer. For nginx, the directive looks like:

```nginx
location / {
    proxy_pass http://odd-platform:8080;
    proxy_cookie_path / "/; Secure; HttpOnly; SameSite=Strict";
}
```

Match the equivalent for your ingress controller (Traefik, Envoy, Cloud Load Balancer, etc.). Until a platform-side default-stamping bean ships upstream, this stamping is the operator's responsibility — running ODD over plain HTTP or behind a permissive proxy means the session cookie travels in clear and is vulnerable to cross-site-request and cookie-leak attacks regardless of which `auth.type` is configured.

### Java-serialised session attributes under `INTERNAL_POSTGRESQL`

The `INTERNAL_POSTGRESQL` provider stores session attribute values as raw bytes produced by Java's native `SerializationUtils.serialize` / `.deserialize`. Java native serialisation has a well-known deserialisation-gadget surface — code paths reachable on attribute load are influenced by the byte stream, so a write-access compromise of the `SPRING_SESSION_ATTRIBUTES` table yields a deserialisation entry point on the next session read.

Defence-in-depth recommendations for deployments running `INTERNAL_POSTGRESQL`:

* Restrict write access to the `SPRING_SESSION_ATTRIBUTES` table to a single platform service account; do not share database credentials with other applications that store data in the same Postgres instance.
* Deploy the platform's PostgreSQL with strong network segmentation — the database should not be reachable from any service except the platform itself.
* If you cannot guarantee write-access isolation, prefer the `REDIS` provider — Spring Session's Redis serialiser uses a string-key Jackson JSON serialiser rather than Java native serialisation.

A platform-side migration to JSON serialisation for session attributes is tracked upstream.

## Enable Metrics

ODD Platform can represent some of the metadata it ingests as time-series charts — for example, row counts on a MySQL table or the on-disk size of a Redshift database. Metrics handling splits into two independent concerns that share the `metrics.*` config namespace but do different jobs:

* **Storage** (`metrics.storage`) — the storage tier the platform uses for ingested metrics. This selects where the platform **writes** metric points as they arrive from collectors **and** where it **reads them back** when rendering UI charts. Both directions hit the same backend — you cannot write to one and read from another.
* **Export** (`metrics.export.*`) — where the platform **pushes metrics out** as OpenTelemetry telemetry, for long-term retention and dashboarding in your observability stack.

Configure the two independently; it is valid (and common) to run with `INTERNAL_POSTGRES` storage and no OTLP export, or with `PROMETHEUS` storage and OTLP export disabled, or any other combination.

### Metric storage backend

`metrics.storage` selects the storage tier for metric writes and reads:

* `INTERNAL_POSTGRES` (default) — metrics are **written to and read from** the ODD Platform's own PostgreSQL database (`metric_series` / `metric_point` tables). Zero additional infrastructure; suitable for most single-cluster deployments.
* `PROMETHEUS` — metrics are **remote-written to** an external Prometheus instance (via the [Prometheus remote-write protocol](https://prometheus.io/docs/specs/remote_write_spec/) at `/api/v1/write`, using Snappy-compressed Protobuf-encoded write requests) **and queried from** the same instance (via the [instant-query API](https://prometheus.io/docs/prometheus/latest/querying/api/#instant-queries) at `/api/v1/query`). Suitable when you already run Prometheus for observability and want to avoid storing duplicate metric data in ODD's PostgreSQL.

`metrics.prometheus-host` is the base URL of the Prometheus instance and is only consulted when `metrics.storage=PROMETHEUS`. Both `/api/v1/write` and `/api/v1/query` are called on this single host. Defaults to `http://localhost:9090`.

{% hint style="warning" %}
**`metrics.storage=PROMETHEUS` requires `metrics.prometheus-host` to be set.** The platform validates this at startup — if `metrics.prometheus-host` is empty (or unset) while `metrics.storage=PROMETHEUS`, ODD Platform fails to start with `IllegalStateException: Prometheus host is not defined`. Set it to the Prometheus base URL (for example `http://prometheus:9090`) in the same configuration change that flips the storage backend.
{% endhint %}

{% hint style="warning" %}
**The Prometheus instance must accept remote-write AND queries on the same endpoint.** ODD Platform does not support splitting read and write paths across different hosts.

* **Prometheus server flag** — `--web.enable-remote-write-receiver` must be enabled on the Prometheus process. It is **disabled by default** in Prometheus v2.33+; without it, every ODD Platform metric write returns `404 Not Found` and is silently dropped. The ingestion API still returns `200` to the collector because the remote-write happens downstream of the HTTP acknowledgement, so collector logs will not surface the failure — the symptom is empty charts in the UI.
* **Endpoint must support both paths** — `POST /api/v1/write` (for writes) and `GET /api/v1/query` (for reads) must both resolve to the same Prometheus-compatible host.
* **Read-only Prometheus-compatible backends do not work.** A [Thanos](https://thanos.io/) querier, [Mimir](https://grafana.com/oss/mimir/) in query-only mode, or any other backend that exposes `/api/v1/query` but rejects `/api/v1/write` cannot be used as a `metrics.storage=PROMETHEUS` target. Point `metrics.prometheus-host` at the write-accepting Prometheus instance itself (or at a Mimir distributor that terminates both paths).
  {% endhint %}
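
If Prometheus runs under Docker Compose, the receiver flag can be passed on the command line. This is a sketch under assumptions: the service name, image tag, and config path are illustrative, and because overriding `command` replaces the image's default arguments, the config-file flag is restated.

```yaml
services:
    prometheus:                                    # hypothetical service name
        image: prom/prometheus:latest              # assumed image tag
        command:
            - --config.file=/etc/prometheus/prometheus.yml
            - --web.enable-remote-write-receiver   # accept POST /api/v1/write
        ports:
            - "9090:9090"
```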

{% tabs %}
{% tab title="YAML" %}

```yaml
metrics:
    storage: PROMETHEUS        # INTERNAL_POSTGRES (default) or PROMETHEUS
    prometheus-host: http://prometheus:9090
```

{% endtab %}

{% tab title="Environment variables" %}

```
METRICS_STORAGE=PROMETHEUS
METRICS_PROMETHEUS_HOST=http://prometheus:9090
```

{% endtab %}
{% endtabs %}

{% hint style="danger" %}
**Multi-tenant deployments cannot share an `INTERNAL_POSTGRES` instance — the default backend has no tenant column.** The `odd.tenant-id` configuration is **only** appended to Prometheus series (see [Prometheus tenant label](#prometheus-tenant-label-odd-tenant-id) below); on `INTERNAL_POSTGRES` the metric tables (`metric_series`, `metric_point`, `metric_entity`) have no `tenant_id` column at all. Two ODD Platform deployments writing to the same Postgres instance see each other's metrics on every entity's Metrics tab — there is no platform-side filter. If your deployment needs metric isolation across tenants, choose `PROMETHEUS` storage and configure `odd.tenant-id` per deployment, or run each deployment against its own Postgres instance / schema. The same class of silent-default risk that previously affected attachment storage on container restart applies here. The operator-facing framing of this caveat — including the workflow guidance for choosing between the two backends — is on [Active platform features → Metrics Ingestion](/features/active-platform-features/metrics-ingestion.md#known-operator-caveats).
{% endhint %}
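
For the Prometheus-based isolation described above, each deployment carries its own tenant identifier. A minimal sketch, where `team-a` is a hypothetical tenant value:

```yaml
odd:
    tenant-id: team-a        # hypothetical value; use a distinct one per deployment
metrics:
    storage: PROMETHEUS
    prometheus-host: http://prometheus:9090
```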

{% hint style="warning" %}
**Switching `metrics.storage` after a deployment has been live is one-way — historical metric data does not migrate.** The two storage backends are independent stores; the platform writes to whichever is configured and reads from the same one. After a switch (either direction), the previously-stored history remains in the old backend but is **no longer queryable from the platform UI or API**. Plan storage-backend changes as one-time cutovers and annotate the cutover date in your runbook — the Metrics tab on each entity will show no data older than the switch. Operator-facing framing on [Active platform features → Metrics Ingestion](/features/active-platform-features/metrics-ingestion.md#known-operator-caveats).
{% endhint %}

### Metric export to OTLP

Independent of where metrics are stored, ODD Platform can push metrics as OpenTelemetry telemetry to an [OTLP collector](https://opentelemetry.io/docs/collector/). Downstream you can forward that stream to [Prometheus](https://prometheus.io/), [New Relic](https://newrelic.com/), or any backend that accepts [OTLP exporters](https://aws-otel.github.io/docs/components/otlp-exporter).

* `metrics.export.enabled`: must be set to `true` to build and wire the OTLP exporter bean. Defaults to `false`.
* `metrics.export.otlp-endpoint`: OTLP collector endpoint (gRPC). Defaults to `http://localhost:4317`.

{% tabs %}
{% tab title="YAML" %}

```yaml
metrics:
    export:
        enabled: true
        otlp-endpoint: {otlp-endpoint-url}
```

{% endtab %}

{% tab title="Environment variables" %}

```
METRICS_EXPORT_ENABLED=true
METRICS_EXPORT_OTLP_ENDPOINT={otlp-endpoint-url}
```

{% endtab %}
{% endtabs %}
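
Because storage and export are independent, both can be set in a single `metrics` block. The sketch below keeps the default storage backend and turns on OTLP export; the collector address `http://otel-collector:4317` is a hypothetical hostname.

```yaml
metrics:
    storage: INTERNAL_POSTGRES                      # charts are written to and read from the platform database
    export:
        enabled: true                               # additionally push telemetry out over OTLP
        otlp-endpoint: http://otel-collector:4317   # hypothetical collector address (gRPC)
```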

## Enable Alert Notifications

Any alert created inside the platform can be delivered through a generic webhook, a [Slack incoming webhook](https://docs.slack.dev/messaging/sending-messages-using-incoming-webhooks), email (via [Google SMTP](https://support.google.com/a/answer/176600?hl=en), [AWS SMTP](https://repost.aws/knowledge-center/ses-set-up-connect-smtp), etc.), or any combination of these. Each notification contains:

1. Name of the entity on which the alert was raised
2. Data source and namespace of the entity
3. Owners of the entity
4. Possibly affected entities

ODD Platform's outbound notification delivery tails the `alert` table through a PostgreSQL logical-replication slot. Because the slot durably tracks its position in the write-ahead log, delivery resumes from the last unprocessed alert after a platform restart or a transient interruption of the database connection — alerts raised during the downtime are delivered once delivery catches up, not dropped. Alert *creation* itself is a plain database insert and does not depend on replication; this prerequisite applies only to outbound notification delivery. To enable it, the underlying PostgreSQL database must be configured for logical replication.

For the user-facing description of the alerting feature — alert types, the per-entity alert tabs, the lifecycle, and per-entity halt configuration — see [Active platform features → Alerting](/features/active-platform-features/alerting.md). For the user-facing description of the outbound notification channels (Slack incoming webhook, email, generic webhook) and the Prometheus AlertManager inbound webhook, see [Active platform features → Notifications](/features/active-platform-features/notifications.md).

{% hint style="info" %}
**Slack here is the outgoing alert webhook, not the Discussions Slack app.** The alert-notifications integration is a one-way [Slack incoming webhook](https://docs.slack.dev/messaging/sending-messages-using-incoming-webhooks) — the platform POSTs alert messages to a channel via `notifications.receivers.slack.url`. It is **distinct** from the [full Slack app](#enable-data-collaboration) used by Data Collaboration for in-app per-entity discussion threads (OAuth + Events API; bidirectional). Each integration is configured separately: enabling the alert webhook does not surface the **Discussions** tab on data-entity pages, and enabling Data Collaboration does not route alerts. See [Main Concepts → Terms & Aliases](/introduction/main-concepts.md#terms-and-aliases) for the side-by-side comparison.
{% endhint %}

### PostgreSQL Configuration

The PostgreSQL database must be [configured](https://www.postgresql.org/docs/current/config-setting.html) for logical replication, and the database user must be granted replication permissions, before the Platform can use the replication mechanism.

#### Database settings

To configure the database, add the following entries to the `postgresql.conf` file:

```
max_wal_senders = 1
wal_keep_size = 16
wal_level = logical
max_replication_slots = 1
```

Or if the replication mechanism is already configured, just increment the `max_wal_senders` and `max_replication_slots` numbers.
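
A value in `postgresql.conf` can be shadowed by `ALTER SYSTEM` or left at a built-in default, so it is safer to check what PostgreSQL actually reports (for example via `SHOW wal_level;` or the `pg_settings` view). The sketch below is a hypothetical helper, not part of ODD Platform; it assumes you have already collected the effective values into a dictionary with whatever client you use.

```python
def replication_problems(effective: dict) -> list:
    """Compare effective PostgreSQL settings with what notifications need.

    `effective` maps setting names to the strings PostgreSQL reports,
    e.g. {"wal_level": "logical", "max_wal_senders": "10"}.
    """
    problems = []
    if effective.get("wal_level") != "logical":
        problems.append("wal_level must be 'logical'")
    for name in ("max_wal_senders", "max_replication_slots"):
        raw = str(effective.get(name, ""))
        # A missing or non-numeric value is reported rather than guessed.
        if not raw.isdigit() or int(raw) < 1:
            problems.append(f"{name} must be at least 1")
    return problems
```

An empty list means the three settings satisfy the requirements above. All three settings take effect only after a PostgreSQL server restart.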

#### Database user permissions

The ODD Platform database user must be granted replication permissions:

```sql
ALTER ROLE {database_username} WITH REPLICATION;
```

{% hint style="info" %}
User permissions and database configuration may vary from one on-demand/cloud provider to another.

For instance, in AWS RDS, PostgreSQL instances are managed services where certain aspects of replication management are automated to minimize the risk of misconfiguration. As a result, some settings are either not exposed or are changed differently than in a standard PostgreSQL setup. To enable notifications in such an environment, follow these steps (only the differences are listed):

1. Set the `rds.logical_replication` parameter to `1` in your database instance's Parameter Group, instead of directly modifying the `wal_level` parameter.
2. Ensure the ODD user connecting to the database has the `rds_replication` role. The Master username of the database typically already has this role by default. If you use a different username, you may need to assign the role with `GRANT rds_replication TO {your_database_username};`.
3. If you set `max_wal_senders` to 5 (the minimal value mentioned in the Parameter Group) and then keep getting messages like "The parameter max_wal_senders was set to a value incompatible with replication. It has been adjusted from 5 to 55" in the events list of the database instance, consider setting the parameter in the Parameter Group to the value named in the message, so that RDS no longer adjusts it automatically.
{% endhint %}

### ODD Platform configuration

The following variables need to be defined:

* `notifications.enabled`: must be set to `true`. Defaults to `false`. **Feature toggling**: this value is captured at JVM boot and frozen for the lifetime of the process; restart the JVM for a change to take effect. The same boot-immutable pattern applies to every platform-feature flag in this document — see [Features → Data Collaboration](/features/features.md#data-collaboration) for the catalogue and the chrome-invariance framing.
* `notifications.message.downstream-entities-depth`: how many levels of the lineage graph the platform traverses downstream when collecting possibly affected data entities for a notification. Defaults to `1`
* `notifications.wal.advisory-lock-id`: ODD Platform uses a [PostgreSQL advisory lock](https://www.postgresql.org/docs/current/explicit-locking.html#ADVISORY-LOCKS) to make sure that, when the Platform is scaled horizontally, only one instance processes alert messages. This setting defines the advisory lock id. Defaults to `100`
* `notifications.wal.replication-slot-name`: name of the PostgreSQL replication slot. The slot is created if it doesn't exist yet. Defaults to `odd_platform_replication_slot`
* `notifications.wal.publication-name`: name of the PostgreSQL publication. The publication is created if it doesn't exist yet. Defaults to `odd_platform_publication_alert`
* `notifications.receivers.slack.url`: [Slack incoming webhook](https://docs.slack.dev/messaging/sending-messages-using-incoming-webhooks) URL. The clickable links rendered inside Slack messages use [`odd.platform-base-url`](#odd-platform-base-url) — there is **no** `notifications.receivers.slack.*` base-URL setting.
* `notifications.receivers.webhook.url`: Generic webhook URL
* `notifications.receivers.email.host`: the SMTP server.
* `notifications.receivers.email.port`: the port of the SMTP server (for example `587` for STARTTLS)
* `notifications.receivers.email.protocol`: the email transport protocol. **Use the lowercase value `smtp`** — any other value (including uppercase `SMTP`) silently disables STARTTLS and SMTP AUTH; see the caveat below.
* `notifications.receivers.email.smtp.auth`: a boolean value (true or false) indicating whether the SMTP server requires authentication
* `notifications.receivers.email.smtp.starttls`: a boolean indicating whether to use STARTTLS, a security protocol that upgrades an unencrypted connection to an encrypted one
* `notifications.receivers.email.password`: the password used for email authentication
* `notifications.receivers.email.sender`: the email address sending the notifications
* `notifications.receivers.email.notification.emails`: the list of recipients for the email notifications

{% hint style="warning" %}
**A generic-webhook receiver must respond `HTTP 200` — `201` / `202` / `204` are treated as a delivery failure and the alert is dropped.** The platform's webhook sender treats any response status other than exactly `200 OK` as a failed delivery, so a receiver that returns `202 Accepted` (a common async-ingest convention) silently loses alerts with no operator-visible cause. Configure the endpoint behind `notifications.receivers.webhook.url` to return `200` on accept.
{% endhint %}
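
As an illustration of a receiver that satisfies this requirement, here is a minimal sketch built on Python's standard `http.server`. The class and function names are hypothetical, and the sketch makes no assumption about the notification body beyond it being an HTTP POST; it only shows acknowledging with exactly `200` before doing any slow work.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class AlertWebhookHandler(BaseHTTPRequestHandler):
    """Accepts POSTed alert notifications and always acknowledges with 200."""

    received = []  # in-memory buffer, for illustration only

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body or b"null")
        except json.JSONDecodeError:
            payload = body.decode("utf-8", errors="replace")
        # Hand off to a queue or worker in real use; keep this path fast.
        self.received.append(payload)

        # Acknowledge with exactly 200: 201/202/204 count as failed delivery.
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_server(host="127.0.0.1", port=0):
    """Build the server; call serve_forever() on the result to run it."""
    return HTTPServer((host, port), AlertWebhookHandler)
```

Running `make_server("0.0.0.0", 9000).serve_forever()` would expose the receiver on port 9000; that address would then go into `notifications.receivers.webhook.url`.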

{% hint style="warning" %}
**The email `protocol` value must be the lowercase string `smtp` for STARTTLS and SMTP AUTH to engage.** The platform sets `mail.smtp.auth` and `mail.smtp.starttls.enable` only when `notifications.receivers.email.protocol` equals `smtp` exactly. Any other value — including uppercase `SMTP` — takes a fall-through branch that sets neither, so authentication and STARTTLS never engage and credentials can transit unauthenticated and unencrypted, with no boot warning. Always configure `protocol: smtp` (lowercase).
{% endhint %}

#### `odd.platform-base-url`

ODD Platform URL exposed to **three** internal consumers — the Slack-notification sender, the email-notification sender, and the integration-parameter substitution context. The two notification senders use it to build clickable links inside alert messages (the generic webhook receiver does **not** consume this key — it gets the full alert payload directly and is expected to construct any URLs it needs from that payload). The platform also substitutes the resolved value as the `platform_url` parameter in integration configurations — this is how Airflow plugins, dbt artifacts, and similar integrations resolve their reference to the ODD platform URL at runtime. **Defaults are inconsistent across consumers**: the notification senders default to `http://localhost:8080`, while the integration-substitution context defaults to the placeholder string `http://your.odd.platform`. Both defaults are unreachable from outside the host machine; set this key to your real deployment URL (for example `https://odd.your-domain.com`) in any non-local environment.

{% hint style="warning" %}
**Operators deploying integrations must set `ODD_PLATFORM_BASE_URL` even if alert notifications are disabled.** The integration-parameter substitution context reads the same key to populate the `platform_url` parameter exposed to integration configurations. If the key is unset, integrations that reference `platform_url` receive the literal string `http://your.odd.platform` — a placeholder that will not connect to anything — and the integration will fail in confusing ways at runtime with no error from ODD Platform itself.
{% endhint %}
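
A deployment script can catch an unset or placeholder value before the platform starts. The following is a minimal sketch of such a check; `base_url_problems` is a hypothetical helper, and the set of flagged hosts is simply the two defaults named above plus the loopback address.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hosts taken from the defaults described above; none is reachable remotely.
PLACEHOLDER_HOSTS = {"localhost", "127.0.0.1", "your.odd.platform"}


def base_url_problems(value: Optional[str]) -> list:
    """Return reasons why `value` is unsuitable as odd.platform-base-url."""
    if not value:
        return ["value is unset, so each consumer falls back to its own default"]
    parts = urlsplit(value)
    problems = []
    if parts.scheme not in ("http", "https"):
        problems.append("URL must start with http:// or https://")
    if not parts.hostname:
        problems.append("URL has no host")
    elif parts.hostname in PLACEHOLDER_HOSTS:
        problems.append(f"host {parts.hostname!r} is a local or placeholder address")
    return problems
```

For instance, `base_url_problems(os.environ.get("ODD_PLATFORM_BASE_URL"))` (after `import os`) returns an empty list only for a value such as `https://odd.your-domain.com`.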

ODD Platform configuration would look like this:

{% tabs %}
{% tab title="YAML" %}

```yaml
notifications:
  enabled: true
  message:
    downstream-entities-depth: {downstream_entities_depth_to_fetch}
  wal:
    advisory-lock-id: {postgresql_advisory_lock_id}
    replication-slot-name: {postgresql_replication_slot_name}
    publication-name: {postgresql_publication_name}
  receivers:
    slack:
      url: {slack_incoming_webhook_url}
    webhook:
      url: {webhook_url}
    email: 
      host: {host} 
      port: {port}
      protocol: {protocol}  # use lowercase smtp; any other value disables STARTTLS and SMTP AUTH
      smtp: 
        auth: true # Set to true if SMTP server requires authentication 
        starttls: true # Set to true to enable STARTTLS 
      password: {email_password}
      sender: {sender_email} 
      notification: 
        emails: {1@mail.com,2@mail.com}   
odd:
  platform-base-url: {platform_url}
```

{% endtab %}

{% tab title="Environment variables" %}

```
NOTIFICATIONS_ENABLED=true
NOTIFICATIONS_MESSAGE_DOWNSTREAM_ENTITIES_DEPTH={downstream_entities_depth_to_fetch}
NOTIFICATIONS_WAL_ADVISORY_LOCK_ID={postgresql_advisory_lock_id}
NOTIFICATIONS_WAL_REPLICATION_SLOT_NAME={postgresql_replication_slot_name}
NOTIFICATIONS_WAL_PUBLICATION_NAME={postgresql_publication_name}
NOTIFICATIONS_RECEIVERS_SLACK_URL={slack_incoming_webhook_url}
NOTIFICATIONS_RECEIVERS_WEBHOOK_URL={webhook_url}
NOTIFICATIONS_RECEIVERS_EMAIL_HOST={host}
NOTIFICATIONS_RECEIVERS_EMAIL_PORT={port}
# Use lowercase smtp; any other value disables STARTTLS and SMTP AUTH
NOTIFICATIONS_RECEIVERS_EMAIL_PROTOCOL={protocol}
# Set to true if SMTP server requires authentication
NOTIFICATIONS_RECEIVERS_EMAIL_SMTP_AUTH=true
# Set to true to enable STARTTLS
NOTIFICATIONS_RECEIVERS_EMAIL_SMTP_STARTTLS=true
NOTIFICATIONS_RECEIVERS_EMAIL_PASSWORD={email_password}
NOTIFICATIONS_RECEIVERS_EMAIL_SENDER={sender_email}
NOTIFICATIONS_RECEIVERS_EMAIL_NOTIFICATION_EMAILS={1@mail.com,2@mail.com}
ODD_PLATFORM_BASE_URL={platform_url}
```

{% endtab %}
{% endtabs %}

### Example: Gmail SMTP

A minimal, working configuration for Gmail's SMTP over STARTTLS. Gmail requires an [**app password**](https://support.google.com/accounts/answer/185833) (generated from your Google account with 2-Step Verification enabled) — your regular account password will not work.

{% tabs %}
{% tab title="YAML" %}

```yaml
notifications:
  enabled: true
  wal:
    advisory-lock-id: 100
    replication-slot-name: odd_platform_replication_slot
    publication-name: odd_platform_publication_alert
  receivers:
    email:
      host: smtp.gmail.com
      port: 587
      protocol: smtp
      smtp:
        auth: true
        starttls: true
      sender: odd-alerts@your-domain.com
      password: {gmail_app_password}
      notification:
        emails: ops@your-domain.com,data-team@your-domain.com
odd:
  platform-base-url: https://odd.your-domain.com
```

{% endtab %}

{% tab title="Environment variables" %}

```
NOTIFICATIONS_ENABLED=true
NOTIFICATIONS_WAL_ADVISORY_LOCK_ID=100
NOTIFICATIONS_WAL_REPLICATION_SLOT_NAME=odd_platform_replication_slot
NOTIFICATIONS_WAL_PUBLICATION_NAME=odd_platform_publication_alert
NOTIFICATIONS_RECEIVERS_EMAIL_HOST=smtp.gmail.com
NOTIFICATIONS_RECEIVERS_EMAIL_PORT=587
NOTIFICATIONS_RECEIVERS_EMAIL_PROTOCOL=smtp
NOTIFICATIONS_RECEIVERS_EMAIL_SMTP_AUTH=true
NOTIFICATIONS_RECEIVERS_EMAIL_SMTP_STARTTLS=true
NOTIFICATIONS_RECEIVERS_EMAIL_SENDER=odd-alerts@your-domain.com
NOTIFICATIONS_RECEIVERS_EMAIL_PASSWORD={gmail_app_password}
NOTIFICATIONS_RECEIVERS_EMAIL_NOTIFICATION_EMAILS=ops@your-domain.com,data-team@your-domain.com
ODD_PLATFORM_BASE_URL=https://odd.your-domain.com
```

{% endtab %}
{% endtabs %}
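
The relay settings can be exercised outside ODD Platform before they are deployed. The sketch below uses Python's standard `smtplib`; both function names are hypothetical. It assumes the sender address doubles as the SMTP login, which is an assumption of this example and not something documented above. It also sets an explicit timeout, so an unreachable relay fails fast during the check.

```python
import smtplib
import ssl


def email_setting_problems(protocol: str, port: int, starttls: bool) -> list:
    """Static checks mirroring the caveats documented for the email receiver."""
    problems = []
    if protocol != "smtp":
        problems.append("protocol must be exactly 'smtp' (lowercase)")
    if port == 465:
        problems.append("port 465 is implicit TLS; use a STARTTLS port such as 587")
    if not starttls:
        problems.append("starttls is off, so credentials would travel unencrypted")
    return problems


def smtp_preflight(host: str, port: int, username: str, password: str,
                   timeout_s: float = 10.0) -> None:
    """Connect, upgrade via STARTTLS, and log in; raises on any failure."""
    context = ssl.create_default_context()
    with smtplib.SMTP(host, port, timeout=timeout_s) as server:
        server.ehlo()
        server.starttls(context=context)  # fails if STARTTLS is not offered
        server.ehlo()
        server.login(username, password)  # assumption: sender is the login
```

With the Gmail values above, the call would be `smtp_preflight("smtp.gmail.com", 587, "odd-alerts@your-domain.com", app_password)`, where `app_password` holds the app password.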

### Known limitations

ODD Platform builds its `JavaMailSender` with only the keys documented above. The JavaMail session inherits defaults for every other SMTP parameter, and several of those defaults are operator-hostile in production deployments. None of the following is currently exposed as an ODD configuration key — where a workaround exists it is noted, but the limitations are real and should drive your choice of SMTP relay.

{% hint style="warning" %}
**SMTP timeouts are unset — an unreachable SMTP server will hang notification delivery.** The JavaMail defaults for `mail.smtp.connectiontimeout`, `mail.smtp.timeout` (read), and `mail.smtp.writetimeout` are **infinite**. If the configured SMTP host is unreachable, slow, or stalls mid-response, the notification thread blocks until the TCP stack eventually tears the connection down — there is no application-level timeout to cut it short. Use an SMTP relay you control (or a trusted managed service) and monitor its availability separately from ODD Platform.
{% endhint %}

{% hint style="warning" %}
**Only STARTTLS is supported — implicit-TLS ports (e.g. Gmail port 465, many corporate relays) will not work.** ODD Platform exposes `notifications.receivers.email.smtp.starttls` but does not expose `mail.smtp.ssl.enable`, which is the JavaMail flag required to open an implicit-TLS connection. If your SMTP server only accepts connections on an implicit-TLS port, you must front it with a STARTTLS-capable relay (port 587 is the common choice). Gmail over port 587 with STARTTLS (the example above) works; Gmail over port 465 does not.
{% endhint %}

{% hint style="warning" %}
**Self-signed or internal-CA SMTP certificates require a JVM-level workaround.** `mail.smtp.ssl.trust` is not exposed as an ODD configuration key. If your SMTP relay presents a certificate signed by a private CA, the connection will fail certificate validation unless you either (a) add the CA to the JVM truststore of the ODD Platform container (`$JAVA_HOME/lib/security/cacerts` or a `-Djavax.net.ssl.trustStore=...` override) before starting the process, or (b) use an SMTP relay with a publicly-trusted certificate. There is no configuration-file path to this.
{% endhint %}

{% hint style="warning" %}
**Non-ASCII subjects and bodies may be mangled.** The MIME message is built without an explicit charset, so JavaMail falls back to the JVM default. Containers that do not set `file.encoding` or `LANG` explicitly can end up with `US-ASCII` defaults, which corrupt non-Latin alert content. If your alert text includes non-ASCII characters, set `JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8` on the ODD Platform container.
{% endhint %}

{% hint style="danger" %}
**Silent partial delivery: if one recipient fails, subsequent recipients are skipped.** `EmailNotificationSender` iterates over the recipient list in `notifications.receivers.email.notification.emails` and calls the SMTP server once per recipient. If recipient N fails (bad address, mailbox full, server-side policy rejection), the exception is wrapped as a `RuntimeException` and the loop terminates — recipients N+1, N+2, … **never receive the alert**. There is no retry and no partial-failure metric. Keep the recipient list short, use distribution lists on the SMTP side for fan-out, and validate addresses before adding them to the list.
{% endhint %}
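
Because one bad entry can block every recipient after it, it is worth screening the list before it reaches the configuration. The following is a rough sketch, assuming the comma-separated format used in the examples above; `split_recipients` is a hypothetical helper, and the pattern is a deliberately loose syntax check that cannot tell whether a mailbox exists or will accept mail.

```python
import re

# Loose shape check: something@something.tld, no spaces or commas.
_ADDRESS = re.compile(r"^[^@\s,]+@[^@\s,]+\.[^@\s,]+$")


def split_recipients(raw: str) -> tuple:
    """Split a comma-separated recipient string into (valid, suspicious)."""
    valid, suspicious = [], []
    for item in raw.split(","):
        address = item.strip()
        if not address:
            continue  # ignore empty entries such as a trailing comma
        (valid if _ADDRESS.match(address) else suspicious).append(address)
    return valid, suspicious
```

Any entry that lands in the second list should be fixed or removed before the value is written to `notifications.receivers.email.notification.emails`.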

### Cleaning up

{% hint style="danger" %}
ODD Platform **doesn't clean up** the replication slot it has created. If you need to disable the Alert Notification functionality, perform the following steps in addition to disabling the feature on the ODD Platform side.
{% endhint %}

To remove the replication slot and publication, run these SQL statements against the database:

* ```sql
  SELECT pg_drop_replication_slot('<replication_slot_name>');
  ```

  where `<replication_slot_name>` is the name of the replication slot defined in ODD Platform. The default is `odd_platform_replication_slot`.
* ```sql
  DROP PUBLICATION IF EXISTS <publication_name>;
  ```

  where `<publication_name>` is the name of the publication defined in ODD Platform. The default is `odd_platform_publication_alert`.

## Prometheus AlertManager Integration

In addition to raising alerts internally (failed jobs, data-quality tests, schema changes, distribution anomalies — see the [Alerting](/features/active-platform-features/alerting.md) feature), ODD Platform exposes an **inbound webhook** that accepts Prometheus [AlertManager](https://prometheus.io/docs/alerting/latest/alertmanager/) notifications. Each inbound alert becomes a **Distribution Anomaly** alert on the referenced data entity, visible in the Alerts section and on the entity's page.

### Endpoint

```
POST /ingestion/alert/alertmanager
```

Response: `204 No Content` on success. The endpoint consumes the AlertManager webhook body and always returns empty.

### Payload shape

The platform accepts a subset of the [AlertManager webhook schema](https://prometheus.io/docs/alerting/latest/configuration/#webhook_config) — specifically `alerts[].labels`, `alerts[].generatorURL`, and `alerts[].startsAt`. Other top-level AlertManager fields (`version`, `status`, `receiver`, `groupLabels`, `commonLabels`, …) are accepted and ignored.

```json
{
  "alerts": [
    {
      "labels": {
        "entity_oddrn": "//postgresql/host/pg-host/databases/shop/schemas/public/tables/orders",
        "alertname": "OrdersRowCountDropped"
      },
      "generatorURL": "https://prometheus.example.com/graph?g0.expr=...",
      "startsAt": "2026-04-24T12:34:56"
    }
  ]
}
```

{% hint style="warning" %}
**The `entity_oddrn` label is required for the alert to route to a data entity.** ODD Platform reads `alerts[].labels["entity_oddrn"]` to determine which data entity the alert belongs to. An alert submitted without this label is stored with an empty owner, will not appear on any entity's page, and is effectively orphaned. Configure your AlertManager route or your alerting rules to include the target entity's ODDRN as a label.
{% endhint %}

### Example AlertManager receiver configuration

A minimal `alertmanager.yml` receiver forwarding every alert to ODD Platform:

```yaml
route:
  group_by: ['job']
  group_wait: 1s
  group_interval: 5m
  repeat_interval: 12h
  receiver: odd-platform
receivers:
  - name: odd-platform
    webhook_configs:
      - url: 'http://odd-platform:8080/ingestion/alert/alertmanager'
```

The reference example shipped with the platform is at [`docker/examples/config/alertmanager.yaml`](https://github.com/opendatadiscovery/odd-platform/blob/main/docker/examples/config/alertmanager.yaml) in the odd-platform repo. To make an alert route to a specific entity, attach `entity_oddrn` as a label in your Prometheus alerting rules — for example:

```yaml
groups:
  - name: orders
    rules:
      - alert: OrdersRowCountDropped
        expr: row_count{table="orders"} < 1000
        labels:
          entity_oddrn: "//postgresql/host/pg-host/databases/shop/schemas/public/tables/orders"
        annotations:
          summary: "Orders table row count dropped below 1000"
```
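
To exercise the endpoint without waiting for AlertManager to fire, a payload of the shape shown earlier can be posted by hand. The following is a minimal sketch using only the Python standard library; `build_alert` and `send_alert` are hypothetical helper names, and the `startsAt` format is copied from the sample payload above.

```python
import json
import urllib.request


def build_alert(entity_oddrn: str, alertname: str,
                generator_url: str, starts_at: str) -> dict:
    """Build the subset of the AlertManager webhook body the platform reads."""
    return {
        "alerts": [
            {
                "labels": {"entity_oddrn": entity_oddrn, "alertname": alertname},
                "generatorURL": generator_url,
                "startsAt": starts_at,
            }
        ]
    }


def send_alert(platform_url: str, payload: dict, timeout_s: float = 10.0) -> int:
    """POST the payload to the inbound webhook and return the HTTP status."""
    request = urllib.request.Request(
        platform_url.rstrip("/") + "/ingestion/alert/alertmanager",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status
```

Against a running platform, `send_alert("http://odd-platform:8080", build_alert(...))` should return `204`, after which the alert should appear on the entity named by `entity_oddrn`.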

### Authentication

{% hint style="danger" %}
**The AlertManager webhook endpoint is not authenticated.** ODD Platform whitelists the entire `/ingestion/**` namespace in Spring Security, and the ingestion auth filter controlled by `auth.ingestion.filter.enabled` only guards `/ingestion/entities` (POST) — it does **not** cover `/ingestion/alert/alertmanager`. Anyone with network reach to the platform can POST arbitrary AlertManager-shaped payloads and create alerts on any data entity whose ODDRN they can guess. Toggling `auth.ingestion.filter.enabled` has no effect on this endpoint.
{% endhint %}

Because no application-level authentication is enforced on this endpoint today, protect it at the perimeter. Any of these approaches works:

* **Network segmentation** — expose ODD Platform only on a private network or VPN; in Kubernetes, keep AlertManager and the platform in the same cluster and use a NetworkPolicy so only the AlertManager pod can reach `/ingestion/alert/alertmanager`.
* **Reverse proxy with its own authentication** — put an authenticating proxy in front of ODD Platform (for example, nginx with `auth_request` delegating to an SSO sidecar, or Envoy with `ext_authz`) and require AlertManager to present a proxy-validated credential on every webhook call.
* **mTLS termination** — require client certificates on `/ingestion/alert/alertmanager` at the ingress or load balancer layer, and issue a certificate only to the AlertManager pod.

A platform-side fix to extend the ingestion auth filter to cover this endpoint is tracked upstream. Until it ships, apply one of the perimeter controls above for any deployment where the platform's network is not fully trusted.

For the broader ingestion-auth model — what `auth.ingestion.filter.enabled` does cover, the [per-endpoint deployment matrix](/configuration-and-deployment/enable-security.md#deployment-matrix-per-endpoint-per-auth-config) showing reachability under each `auth.type` value, and the write-shape caveats on the [statistics endpoint](/configuration-and-deployment/enable-security.md#statistics-endpoint-write-shape-and-replay-behaviour) — see [Enable security](/configuration-and-deployment/enable-security.md) and [Server-to-server (S2S) API keys](/configuration-and-deployment/enable-security/authentication/s2s.md).

## Enable Data Collaboration

The Data Collaboration feature lets users start a discussion about a specific data entity in a messenger directly from ODD Platform. Thread replies are tracked and stored by ODD Platform, so users can retrieve a conversation's context and decisions from one place.

For the user-facing description of the feature — the per-entity **Discussions** tab, how a discussion flows from the platform out to Slack and back, the message-lifecycle model — see [Active platform features → Data Collaboration](/features/active-platform-features/data-collaboration.md).

At the moment, ODD Platform supports only Slack as a target messenger. It uses Slack APIs to send messages and the [Slack Events API](https://docs.slack.dev/apis/events-api/) to receive thread replies.

{% hint style="info" %}
**Slack here is the full Slack app for in-app discussions, not the alert webhook.** The Data Collaboration integration uses an OAuth-token-driven Slack app (`datacollaboration.slack-oauth-token`) and the [Slack Events API](https://docs.slack.dev/apis/events-api/) webhook to read replies back into the platform — bidirectional. It is **distinct** from the [outgoing alert webhook](#enable-alert-notifications) used by alert notifications (`notifications.receivers.slack.url`, one-way write only). Each integration is configured separately: enabling this one does not route alerts, and enabling the alert webhook does not surface the **Discussions** tab on data-entity pages. See [Main Concepts → Terms & Aliases](/introduction/main-concepts.md#terms-and-aliases) for the side-by-side comparison.
{% endhint %}

### Creating Slack application

Go to the [Slack apps](https://api.slack.com/apps) website and click on `Create New App -> From an app manifest`

<figure><img src="/files/F6wguAALdeeQu1JHMlqj" alt=""><figcaption><p>Creating an app</p></figcaption></figure>

Select a workspace you want to add an application to and click `Next`

<figure><img src="/files/3xLSCQ1NmAO6Kwy0Fr6n" alt=""><figcaption><p>Selecting a workspace to install application to</p></figcaption></figure>

Enter the following manifest into the YAML section, replace `<ODD_PLATFORM_BASE_URL>` with the host name of your ODD Platform deployment (the manifest already supplies the `https://` prefix), and click `Next`.

The four bot scopes below match exactly what the platform exercises today (`channels:history` and `channels:read` for reading messages and metadata, `chat:write` for posting via the OAuth bot token, `users:read` for resolving user display names). Previous versions of this manifest also requested `incoming-webhook`. That scope was a copy-paste leftover from a Slack example and was never used by the platform; if you are reinstalling or auditing scopes, you can safely omit it.

```yaml
display_information:
  name: ODD Data Collaboration
features:
  bot_user:
    display_name: ODD Data Collaboration
    always_online: false
oauth_config:
  scopes:
    bot:
      - channels:history
      - channels:read
      - chat:write
      - users:read
settings:
  event_subscriptions:
    request_url: https://<ODD_PLATFORM_BASE_URL>/api/slack/events
    bot_events:
      - message.channels
  org_deploy_enabled: false
  socket_mode_enabled: false
  token_rotation_enabled: false
```

<figure><img src="/files/LivoebXjoALDHIFThuQT" alt=""><figcaption><p>Inserting a YAML manifest</p></figcaption></figure>

Review your application's scopes and permissions and click `Create`

<figure><img src="/files/QennrbEOGBHdiwdGZnlO" alt=""><figcaption><p>Reviewing scopes and permissions</p></figcaption></figure>

Follow Slack's instructions to install the application into your workspace, and you should be good to go.

### ODD Platform configuration

The following variables need to be defined:

* `datacollaboration.enabled`: must be set to `true`. Defaults to `false`. **Feature toggling**: this value is captured at JVM boot and frozen for the lifetime of the process — runtime configuration changes (for example via Spring Boot Actuator's `/actuator/refresh`) are not reflected by the feature resolver or by the platform's feature-active endpoint. Restart the JVM process for a change to take effect. Top-level UI navigation tabs (Data Modelling and adjacent surfaces) remain visible regardless of this setting; the per-page affordances inside those tabs do honour the flag. See [Features → Data Collaboration](/features/features.md#data-collaboration) for the chrome-invariance caveat.
* `datacollaboration.receive-event-advisory-lock-id`: PostgreSQL advisory lock id for a job, which translates events from messengers to messages. Defaults to `110`
* `datacollaboration.sender-message-advisory-lock-id`: PostgreSQL advisory lock id for a job, which sends messages created in the platform to messengers. Defaults to `120`
* `datacollaboration.message-partition-period`: time interval in days for a message table partition in PostgreSQL. Defaults to `30`
* `datacollaboration.sending-messages-retry-count`: how many times the Platform will attempt to send a message to provider. Cannot be less than zero. Defaults to `3`
* `datacollaboration.slack-oauth-token`: Slack application OAuth token used for communicating with Slack. Can be retrieved in the `OAuth & Permissions` section of a Slack application.

  <figure><img src="/files/AzMss9tW2uVxkLA7hlzh" alt=""><figcaption><p>Retrieving OAuth Token</p></figcaption></figure>

{% tabs %}
{% tab title="YAML" %}

```yaml
datacollaboration:
  receive-event-advisory-lock-id: {receive_event_advisory_lock_id}
  sender-message-advisory-lock-id: {sender_message_advisory_lock_id}
  message-partition-period: {message_partition_period}
  sending-messages-retry-count: {sending_messages_retry_count}
  enabled: true
  slack-oauth-token: {slack_oauth_token}

odd:
  platform-base-url: {platform_url}
```

{% endtab %}

{% tab title="Environment variables" %}

```
DATACOLLABORATION_ENABLED=true
DATACOLLABORATION_RECEIVE_EVENT_ADVISORY_LOCK_ID={receive_event_advisory_lock_id}
DATACOLLABORATION_SENDER_MESSAGE_ADVISORY_LOCK_ID={sender_message_advisory_lock_id}
DATACOLLABORATION_MESSAGE_PARTITION_PERIOD={message_partition_period}
DATACOLLABORATION_SENDING_MESSAGES_RETRY_COUNT={sending_messages_retry_count}
DATACOLLABORATION_SLACK_OAUTH_TOKEN={slack_oauth_token}
ODD_PLATFORM_BASE_URL={platform_url}
```

{% endtab %}
{% endtabs %}

### Known limitations

#### Slack at-least-once delivery surfaces as duplicate messages

Slack's Events API retries an event delivery whenever the platform's `POST /api/slack/events` handler does not return a 2xx acknowledgement within roughly three seconds — the API guarantees at-least-once delivery, not exactly-once. ODD Platform does not currently deduplicate incoming events: the `message_provider_event` table has no `UNIQUE (provider, event_id)` constraint, and the INSERT in `ReactiveMessageRepository.createMessageEvent` issues no `ON CONFLICT` clause. The result is that occasional Slack retries — which happen routinely on transient network or processing delays — insert duplicate rows; the downstream processor materialises a child `message` row for each, so the same Slack reply can appear two or more times on the data-entity **Discussions** tab.

**Operator-side mitigation today.** Until the platform-side dedup ships upstream, audit `message_provider_event` for `(provider, event_id)` duplicates as a one-off clean-up baseline; the duplicate rows are safe to delete after confirming the downstream `message` rows have been similarly deduplicated. Long-term, expect the platform to add the `UNIQUE` constraint + `ON CONFLICT DO NOTHING` on the INSERT — track the upstream issue if you depend on exactly-once delivery.

#### Slack Events webhook has no signature verification

ODD Platform does not verify Slack's `X-Slack-Signature` header on incoming `/api/slack/events` callbacks. Any caller on the network that can reach the platform's events endpoint can submit Slack-shaped payloads and have them processed as if they came from Slack. Restrict network reach to the platform's `/api/slack/events` path to Slack's IP ranges at your reverse proxy, or terminate at a proxy that verifies the signature itself; a platform-side verifier is tracked upstream.
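
For a proxy or sidecar that verifies the signature itself, Slack's published scheme is an HMAC-SHA256 over `v0:{timestamp}:{raw body}` keyed with the app's signing secret, compared against the `X-Slack-Signature` header, with the timestamp taken from `X-Slack-Request-Timestamp`. The following is a minimal sketch of that check; the function names and the five-minute replay window are choices of this example, and nothing here is part of ODD Platform.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 60 * 5  # reject requests older than five minutes


def slack_signature(signing_secret: str, timestamp: str, body: bytes) -> str:
    """Compute the value Slack would send in X-Slack-Signature."""
    base = b"v0:" + timestamp.encode("utf-8") + b":" + body
    digest = hmac.new(signing_secret.encode("utf-8"), base, hashlib.sha256)
    return "v0=" + digest.hexdigest()


def is_valid_slack_request(signing_secret: str, timestamp: str, body: bytes,
                           signature: str, now: float = None) -> bool:
    """Check freshness and signature of a raw, unparsed request body."""
    now = time.time() if now is None else now
    try:
        sent_at = int(timestamp)
    except (TypeError, ValueError):
        return False
    if abs(now - sent_at) > MAX_SKEW_SECONDS:
        return False  # too old or too far in the future: possible replay
    expected = slack_signature(signing_secret, timestamp, body)
    # Constant-time comparison on bytes avoids timing side channels.
    return hmac.compare_digest(expected.encode("utf-8"),
                               (signature or "").encode("utf-8"))
```

The check must run on the raw request bytes before any JSON parsing or re-serialisation, since even a whitespace change alters the digest. Requests that fail it should be rejected at the proxy and never forwarded to `/api/slack/events`.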

#### Message table partition period

`datacollaboration.message-partition-period` (default `30`) is read by `MessageTablePartitionManager` (`@Value("${datacollaboration.message-partition-period:30}")`), separately from `DataCollaborationProperties`, which only carries the two advisory-lock IDs and the retry count. The partition manager creates a new PostgreSQL partition for the messages table every N days; lowering the value increases partition churn, while raising it reduces the partition count but enlarges each partition.

### API surface

The full HTTP API for Data Collaboration is documented at [API Reference → Data Collaboration](/developer-guides/api-reference/data-collaboration.md) — 7 routes across three groups (outbound to the provider, per-entity threads & history, inbound webhook from Slack), all gated by `@ConditionalOnDataCollaboration` and returning `404 Not Found` when `datacollaboration.enabled=false`.

## Housekeeping Settings Configuration

ODD Platform runs a background **housekeeping job** that permanently deletes stale data on a schedule. The job fires every **15 minutes**, is guarded by a ShedLock so only one platform instance runs it at a time in a multi-instance deployment, and iterates through five cleanup tasks: resolved alerts, search-facet history, soft-deleted data entities, empty `activity` table partitions, and empty `message` table partitions. The first three consume the `housekeeping.ttl.*` keys below; the two partition reapers do not consume any TTL key — they drop empty past partitions when the partition-rotation orchestrator advances the partition window (see [Activity-feed partitioning](#activity-feed-partitioning-odd-activity-partition-period) for the partition WIDTH key, and the [Advisory-lock registry](#advisory-lock-registry) for the orchestrator's leader election).

### Configuration keys

* `housekeeping.enabled`: enables the background job. Defaults to `true`. See the caveat below before disabling.
* `housekeeping.ttl.resolved_alerts_days`: how many days an alert in `RESOLVED_AUTOMATICALLY` status is kept after its status-update timestamp before the housekeeping job permanently deletes it (alongside its chunk records). Integer, days. Defaults to `30`. **Note:** the retention window is intended to apply to both `RESOLVED` (manual) and `RESOLVED_AUTOMATICALLY` (system) states, but a known platform bug currently exempts manual resolutions from the retention check — manual `RESOLVED` alerts are hard-deleted on the next housekeeping run regardless of this value. See [Alerting → Auto-cleanup of resolved alerts](/features/active-platform-features/alerting.md#auto-cleanup-of-resolved-alerts) for the operator-side workaround.
* `housekeeping.ttl.search_facets_days`: how many days a saved search-facet entry is kept past its `last_accessed_at` timestamp before being deleted. Integer, days. Defaults to `30`.
* `housekeeping.ttl.data_entity_delete_days`: how many days a data entity with status `DELETED` is kept after its status-update timestamp. After this, the entity and its cascading related rows — metadata values, ownerships, lineage, tags, terms, alerts, messages, metrics, **attachment files (including objects in S3 / MinIO storage)**, task runs, group relations, and (for datasets) dataset structure and enum values — are **permanently and irreversibly deleted** on the next housekeeping cycle, with no restore path. Integer, days. Defaults to `30`. The retention clock is the entity's `status_updated_at` timestamp, which the soft-delete path stamps at the moment the entity is moved to `DELETED` — so the key is honoured exactly as documented; a default install purges `DELETED` entities 30 days after deletion. See [Data entity statuses → soft-delete TTL](/features/data-discovery/statuses.md#the-soft-delete-ttl) for the user-facing lifecycle (a separate, cosmetic `status_updated_at` mapper defect affects only non-`DELETED` transitions and does not change this retention behaviour).

For the user-facing entity lifecycle (how operators set `DELETED` and the other status states from the UI), see [Data entity statuses](/features/data-discovery/statuses.md).

{% hint style="warning" %}
**Disabling housekeeping (`housekeeping.enabled: false`) stops all five cleanup jobs.** Resolved alerts, search-facet history, soft-deleted data entities, and empty `activity` / `message` partitions will accumulate indefinitely and the PostgreSQL database will grow without bound. Leave the job enabled in production; disable only for debugging or offline migrations, and re-enable (or run a manual cleanup) afterwards.
{% endhint %}

{% hint style="danger" %}
**The Java-side default for every `housekeeping.ttl.*` key is `0`. A partial-override deployment silently wipes historical data on the next 15-minute cycle.** The shipped `application.yml` supplies `30` for each of the three TTL keys, so a default install behaves as documented. But the `HousekeepingTTLProperties` class declares the fields as `private int` with no field initialiser, so each field keeps the primitive `int` default of `0` whenever an operator-supplied override (a typical Helm-chart values overlay, `--spring.config.location` pointing to a profile that omits the `housekeeping:` block, a Spring Cloud Config slice, a Kubernetes ConfigMap mount) does not re-supply the block. With `0`, the housekeeping cycle computes `cutoff = now() - 0 days = now()` and deletes every resolved alert, every search-facet entry, and every soft-deleted data entity (cascading through \~25 child tables, including S3 attachments). The platform emits no boot warning, no log line above DEBUG, and no Prometheus counter; operators discover the loss only when the data is gone.

**Always re-supply the full `housekeeping:` block in Helm/Kustomize overlays**, OR set explicit non-zero values for every TTL key, AND verify post-restart by sampling `pg_stat_user_tables.n_tup_del` after one cycle. This is the same class of silent-default risk that previously affected attachment storage on container restart — read once, configure explicitly, never trust a partial overlay to inherit the bundled YAML defaults.
{% endhint %}

{% hint style="warning" %}
**Manual `RESOLVED` alerts are deleted without regard to `resolved_alerts_days`.** A jOOQ operator-precedence issue in `AlertHousekeepingJob` causes the emitted SQL to read as `WHERE (STATUS='RESOLVED') OR (STATUS='RESOLVED_AUTOMATICALLY' AND STATUS_UPDATED_AT <= cutoff)`, so the TTL predicate binds only to the auto-resolved branch and every manually resolved alert matches the delete on the next housekeeping run. Operators relying on compliance-style retention (SOC2, SOX, HIPAA) for manual resolutions cannot use `resolved_alerts_days` as the retention floor today. Track the upstream fix; until it ships, treat manual `RESOLVED` alerts as having no platform-side retention window. For the operator-side workaround, see [Alerting → Auto-cleanup of resolved alerts](/features/active-platform-features/alerting.md#auto-cleanup-of-resolved-alerts).
{% endhint %}

{% hint style="warning" %}
**Housekeeping deletions are unobservable on a default deployment — no metric, no audit trail, DEBUG-only logs.** The subsystem that permanently deletes data exposes no operational telemetry: there is no metrics counter or gauge for any of the five jobs (nothing housekeeping-related appears at [`/actuator/prometheus`](/configuration-and-deployment/health-and-monitoring.md#prometheus-metrics)), no structured audit event records what was deleted and when, and every per-job deletion count is logged at `DEBUG` — below the shipped `info` default for the package — so a default deployment emits **nothing** on a successful cycle (only failures log at `ERROR`). Three consequences: (a) you cannot observe that deletions are happening or at what volume; (b) there is no signal that would reveal a **stuck or wedged cycle** (for example, one blocked behind a held advisory lock); (c) a compliance requirement that "deletions are logged/audited" is **not** satisfied out of the box — and raising the log level still yields best-effort log lines, not a durable audit trail. To observe deletions, set `logging.level.org.opendatadiscovery.oddplatform.housekeeping: debug` (see [Logging Settings Configuration](#logging-settings-configuration)); to verify a cycle ran at all, sample `pg_stat_user_tables.n_tup_del` across a 15-minute window, as in the TTL caveat above.
{% endhint %}
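
The logging override named in the hint above, written out as a YAML block. This is a sketch that changes only the housekeeping package and leaves every other logger at its configured level:

```yaml
logging:
  level:
    # Surfaces the per-job deletion counts, which are logged at DEBUG.
    # These are best-effort log lines, not a durable audit trail.
    org.opendatadiscovery.oddplatform.housekeeping: debug
```

The equivalent environment variable follows the same convention as the other logging keys on this page: `LOGGING_LEVEL_ORG_OPENDATADISCOVERY_ODDPLATFORM_HOUSEKEEPING=debug`.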

{% hint style="info" %}
**The session-housekeeping job runs N× redundantly on multi-replica deployments.** Spring's `PostgreSQLSessionHousekeepingJobHandler` fires hourly with `@Scheduled(fixedRate = 1, timeUnit = HOURS)` and has **no leader-election guard**: no `@SchedulerLock` and no advisory-lock acquisition. This is inconsistent with the rest of the platform's scheduled jobs, which are guarded either by ShedLock (housekeeping) or by the [Advisory-lock registry](#advisory-lock-registry). On an `INTERNAL_POSTGRESQL` session-provider deployment with N replicas, every replica runs the `DELETE FROM SPRING_SESSION WHERE expiry_time < now()` query every hour. The deletes are idempotent, so data integrity is unaffected; the operator cost is N× redundant database load. Note that with the shipped default `spring.session.timeout: -1` ([sessions never expire](#session-lifetime-spring-session-timeout)), the job is a no-op regardless of replica count.
{% endhint %}

{% tabs %}
{% tab title="YAML" %}

```yaml
housekeeping:
  enabled: true
  ttl:
    resolved_alerts_days: 30
    search_facets_days: 30
    data_entity_delete_days: 30
```

{% endtab %}

{% tab title="Environment variables" %}

```
HOUSEKEEPING_ENABLED=true
HOUSEKEEPING_TTL_RESOLVED_ALERTS_DAYS=30
HOUSEKEEPING_TTL_SEARCH_FACETS_DAYS=30
HOUSEKEEPING_TTL_DATA_ENTITY_DELETE_DAYS=30
```

{% endtab %}
{% endtabs %}

## Advisory-lock registry

Several ODD Platform subsystems use PostgreSQL **advisory locks** to ensure that only one platform replica runs a given background loop at a time (the leader-election pattern for multi-replica deployments). Each subsystem owns one or more advisory-lock IDs, configured via dedicated `*.advisory-lock-id` keys. Operators overriding any of these IDs in a deployment overlay must treat them as a **single flat namespace across the platform** — collisions are not detected at startup and produce silent feature wedges (see the warning below).

| Configuration key                                   | Default ID | Owning subsystem                                                                                       | `@ConfigurationProperties` class | Single-leader role                                     |
| --------------------------------------------------- | ---------- | ------------------------------------------------------------------------------------------------------ | -------------------------------- | ------------------------------------------------------ |
| `notifications.wal.advisory-lock-id`                | `100`      | Notifications subscriber that reads from the WAL `replication-slot-name` and dispatches alert messages | `OddNotificationsProperties`     | One platform replica subscribes to the WAL stream      |
| `partition.advisory-lock-id`                        | `90`       | Partition orchestrator that creates next-period partitions on `activity` and `message` tables          | `PartitionProperties`            | One platform replica advances the partition window     |
| `datacollaboration.receive-event-advisory-lock-id`  | `110`      | Data Collaboration inbound event reader (Slack Events → `message_provider_event` queue)                | `DataCollaborationProperties`    | One platform replica drains the inbound event queue    |
| `datacollaboration.sender-message-advisory-lock-id` | `120`      | Data Collaboration outbound message sender (`message` queue → Slack)                                   | `DataCollaborationProperties`    | One platform replica drains the outbound message queue |

**`partition.advisory-lock-id` is deliberately shared between two managers** — `ActivityTablePartitionManager` and `MessageTablePartitionManager` both acquire ID `90`. This is intentional: one platform replica is elected as the global partition leader and serialises the partition-rotation work for both tables. Treat it as one logical leader, not two colliding subsystems.

{% hint style="warning" %}
**`pg_advisory_lock` blocks forever — on collision a subsystem wedges silently with no diagnostic signal.** The platform's leader-election manager executes the **blocking** variant of `pg_advisory_lock` against the configured ID — there is no `pg_try_advisory_lock` fast-path, no `statement_timeout`, no fallback to a degraded-mode bean. If two subsystems are configured to use the same advisory-lock ID (typically because an operator overrode one key in a Helm overlay and unintentionally matched another), the second subsystem's startup thread enters PostgreSQL lock-wait state and never returns. The Spring container does not detect the wedge — Spring's bean construction returned, so `/actuator/beans` and `/actuator/health` continue to report "running" — the wedge surfaces only as **one feature silently not working** (notifications stop arriving, Data Collaboration thread replies stop flowing, partition rotation stops creating future partitions). Operators MUST audit any per-environment advisory-lock-id override against the table above before applying it.

A platform-side fail-fast wrapper (`pg_try_advisory_lock` + a configurable timeout + a `subsystem_leader_state` Prometheus gauge + a boot-time INFO log enumerating the registry) is tracked upstream.
{% endhint %}

The Housekeeping orchestrator (see [Housekeeping Settings Configuration](#housekeeping-settings-configuration) above) does not appear in this table because it uses **ShedLock** (a Spring-side distributed-lock library) rather than a PostgreSQL advisory lock. ShedLock writes to a `shedlock` table to coordinate the leaders, so its multi-replica behaviour is documented separately.
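
When an overlay needs to override any advisory-lock ID, it can help to write all four keys out together so a collision is visible at review time. The block below is a sketch that restates the defaults from the registry table above, assuming each key nests in YAML exactly as its dotted name suggests:

```yaml
# All four IDs share one flat namespace: no two values may be equal.
notifications:
  wal:
    advisory-lock-id: 100 # notifications WAL subscriber
partition:
  advisory-lock-id: 90 # shared by the activity and message partition managers
datacollaboration:
  receive-event-advisory-lock-id: 110 # inbound Slack event reader
  sender-message-advisory-lock-id: 120 # outbound message sender
```

Keeping the full set in one place is a review aid only; the platform itself still performs no collision check at startup.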

## Platform-level settings (`odd.*`)

The `odd.*` namespace groups four platform-wide settings that do not belong to any subsystem: stale-metadata detection, the optional Prometheus tenant label, the Activity-feed partitioning period, and a list of additional navigation links surfaced in the App Info menu. A fifth key in the same namespace, `odd.platform-base-url`, is documented above in [Enable Alert Notifications → `odd.platform-base-url`](#odd-platform-base-url) — that section is the primary operator-facing context where the key is introduced, but the same key is also consumed by the integration-parameter substitution context, so any non-local deployment must set it regardless of which subsystems (notifications, integrations, or both) are enabled.

### Detecting stale metadata

Stale metadata is metadata that has not been refreshed from its source for longer than an operator-defined window. This typically happens when a collector is paused, deactivated, or failing to reach the source system. When the platform judges an entity to be stale, the UI surfaces it with a "Stale" indicator so users can distinguish data whose freshness is uncertain from actively-maintained metadata. For the user-facing surface (where the indicator appears, how the freshness signal differs from runtime alerts), see [Stale-metadata indicator](/features/data-discovery/metadata-stale.md).

* `odd.data-entity-stale-period`: number of days after the entity's last successful ingestion before it is labeled "Stale" in the UI and API. Integer, days. Defaults to `7`.

Operators running collectors on schedules longer than a week should raise this value to match the collector cadence — otherwise entities that were ingested successfully will be flagged stale between runs.

{% tabs %}
{% tab title="YAML" %}

```yaml
odd:
  data-entity-stale-period: 7 # days
```

{% endtab %}

{% tab title="Environment variables" %}

```
ODD_DATA_ENTITY_STALE_PERIOD=7
```

{% endtab %}
{% endtabs %}

### Prometheus tenant label (`odd.tenant-id`)

When [`metrics.storage`](#metric-storage-backend) is set to `PROMETHEUS`, the platform appends `tenant_id={value}` as a label on every Prometheus instant query it issues. This lets a single shared Prometheus instance serve metric data for multiple ODD Platform deployments without their metric series colliding — each deployment queries only its own tenant-labeled series.

* `odd.tenant-id`: tenant identifier appended as a Prometheus query label. String, no default (empty means no label is applied, and the Prometheus query returns series across all tenants). Ignored when `metrics.storage=INTERNAL_POSTGRES`.

{% tabs %}
{% tab title="YAML" %}

```yaml
odd:
  tenant-id: my-odd-deployment
```

{% endtab %}

{% tab title="Environment variables" %}

```
ODD_TENANT_ID=my-odd-deployment
```

{% endtab %}
{% endtabs %}

### Activity-feed partitioning (`odd.activity.partition-period`)

The ODD Platform `activity` table is range-partitioned on a rolling date window; `odd.activity.partition-period` sets the partition width in days. The default creates a new partition every 30 days, which is appropriate for most deployments. Operators running high-volume deployments (millions of activity events per day) can tune this downward to narrow partitions — smaller partitions speed up vacuum and partition-prune operations on the activity feed.

* `odd.activity.partition-period`: partition width in days for the `activity` table. Integer, days. Defaults to `30`.

{% tabs %}
{% tab title="YAML" %}

```yaml
odd:
  activity:
    partition-period: 30
```

{% endtab %}

{% tab title="Environment variables" %}

```
ODD_ACTIVITY_PARTITION_PERIOD=30
```

{% endtab %}
{% endtabs %}

### Additional navigation links (`odd.links`)

Operators can attach a list of arbitrary navigation links — pointers to internal wikis, runbooks, dashboards, or any other page teams should reach from inside ODD Platform. The platform UI surfaces them inside the App Info menu (the popup behind the **information icon** in the top-right toolbar). Each link renders as a menu item showing its title and opens the configured URL in a new tab when clicked.

* `odd.links`: list of link objects. Each entry has two required fields:
  * `title`: the menu-item label shown in the App Info menu. String, required.
  * `url`: the absolute URL the menu item opens in a new tab. String, required.

Defaults to an empty list — when unset, the App Info menu omits the additional-links section entirely.

{% tabs %}
{% tab title="YAML" %}

```yaml
odd:
  links:
    - title: Internal Wiki
      url: https://wiki.example.com/data-platform
    - title: On-call Runbook
      url: https://runbook.example.com/odd
```

{% endtab %}

{% tab title="Environment variables" %}

```
ODD_LINKS_0_TITLE=Internal Wiki
ODD_LINKS_0_URL=https://wiki.example.com/data-platform
ODD_LINKS_1_TITLE=On-call Runbook
ODD_LINKS_1_URL=https://runbook.example.com/odd
```

{% endtab %}
{% endtabs %}

{% hint style="info" %}
The links are exposed to the UI through the authenticated `GET /api/links` endpoint and are visible to every user signed in to the platform. Use them for navigation hints only — do not embed credentials, session tokens, or one-time secrets in link URLs, since any logged-in user can read them.
{% endhint %}

#### Validation and operator-link risks

Three known limitations apply to `odd.links` and the App Info menu that renders them. None of these is blocking for typical operator-curated link sets, but all three matter when the link source is less trusted (free-text Helm chart overrides, multi-tenant config templates, anything an end-user can influence).

{% hint style="warning" %}
**`odd.links` is not validated at config-load time.** The `AdditionalLinkProperties` record declares `title` and `url` as plain `String` with no `@NotBlank`, no `@URL`, no `@Pattern`, and no `@PostConstruct validate()`. The platform accepts and renders:

* Missing `title` — the menu item appears as an invisible-but-clickable area.
* Missing `url` — the menu item renders as a non-clickable `<a>`.
* `javascript:`, `data:`, `file:`, or `vbscript:` schemes — modern browsers will sandbox or refuse, but the platform does not reject these at config time.
* Relative paths (`wiki.internal`) — the browser interprets them as relative to the current ODD page, producing surprising navigation.

**Treat the `odd.links` config as a security-relevant input.** Validate the URLs in your Helm chart values before applying, restrict edit access to operators only, and never permit non-operators to author the override. A platform-side validator that rejects non-`http(s)` schemes and enforces `@NotBlank` is tracked upstream.
{% endhint %}

{% hint style="warning" %}
**The App Info menu links render `target="_blank"` without `rel="noopener noreferrer"`.** All five link sites in the App Info menu (operator-configured `odd.links` entries, the ODD Platform version link to GitHub, the Documentation link, the Slack link, and the Feedback link) open in a new tab without the `rel` attribute that isolates the destination from `window.opener`. A page at any of those destinations, including the operator-configured links above, can run `window.opener.location = "https://phishing.example.com"` in the background, replacing the parent ODD tab with an attacker-controlled login page (reverse tabnabbing). The risk is operator-amplified: an unvalidated link (per the previous caveat) combined with this one is a workspace-wide tabnabbing vector. A platform-side `rel` fix is tracked upstream.
{% endhint %}

{% hint style="info" %}
**The App Info menu is not keyboard- or touch-accessible today.** The information-icon button declares `aria-haspopup="true"` and `aria-controls={menuId}` — surfaces that announce keyboard accessibility to assistive technology — but the open handler is wired only on `onMouseEnter`. There is no `onClick`, no `onKeyDown`, and no `onFocus`. Touch-device users (iOS Safari, Android Chrome) do not generate `mouseenter`; keyboard-only and screen-reader users cannot open the menu. The Documentation, Slack, Feedback, and operator-configured `odd.links` destinations are unreachable through the menu for these audiences — direct URLs are the workaround until a platform-side `onClick` / `onKeyDown` fix ships. Operators serving keyboard-only or screen-reader audiences should treat this as a known WCAG 2.1 SC 2.1.1 limitation.
{% endhint %}

## Attachment Storage Configuration

ODD Platform allows users to attach files and links to data entities from the UI. This section covers the operator-facing configuration for **where** those uploaded files are stored. For the user-facing upload workflow (what users can attach, the per-entity Attachments tab, the `DATA_ENTITY_ATTACHMENT_MANAGE` permission), see [Attachments and links](/features/data-discovery/attachments.md).

{% hint style="danger" %}
**The default `LOCAL` storage mode is ephemeral.** Attachments are written to `/tmp/odd/attachments` inside the ODD Platform container filesystem. Any container or pod restart — routine deployment, node drain, crash, Kubernetes eviction — permanently deletes all uploaded files.

**Use `REMOTE` (S3 / MinIO) storage for any Kubernetes or Docker deployment where users will actually upload attachments.** `LOCAL` mode is suitable only for single-host evaluations or local development where losing attachments on restart is acceptable.
{% endhint %}

### Configuration keys

* `attachment.storage`: storage backend. One of `LOCAL` or `REMOTE`. Defaults to `LOCAL`.
* `attachment.max-file-size`: the per-file upload limit the **UI enforces before upload**, in **megabytes**. Defaults to `20`. The platform surfaces this value to the web UI as a client-side pre-upload size check; the server does **not** re-validate per-file size on the upload path. `spring.codec.max-in-memory-size` (below) bounds only the in-memory buffer for a single request/chunk — and because attachment uploads are chunked and streamed to disk, it is **not** a ceiling on the assembled file. There is therefore **no effective server-side total-file size limit**: a direct (non-UI) API caller can exceed `attachment.max-file-size` by any amount. See the hint below if raising this above 20 MB.
* `attachment.local.path`: filesystem directory where attachments are written when `storage=LOCAL`. Defaults to `/tmp/odd/attachments` (ephemeral — see warning above).
* `attachment.remote.url`: S3-compatible endpoint URL when `storage=REMOTE` (for example `https://s3.us-east-1.amazonaws.com` for AWS S3 or `http://minio:9000` for a MinIO service). See the **Known limitations (REMOTE mode)** subsection below before choosing your endpoint — in particular the `us-east-1` restriction for AWS S3 and the chunked-upload staging behavior.
* `attachment.remote.access-key`: access key for the S3-compatible bucket.
* `attachment.remote.secret-key`: secret key for the S3-compatible bucket.
* `attachment.remote.bucket`: bucket name used to store attachment objects. The bucket must already exist — ODD Platform does not create it.
* `spring.codec.max-in-memory-size`: platform-wide cap on the in-memory buffer Spring WebFlux uses when reading a **single request body / upload chunk**. Defaults to `20MB`. A single chunk larger than this fails at the codec layer; because uploads are chunked, this does **not** bound the total assembled file. Accepts a size string (`20MB`, `100MB`, `1GB`).

{% hint style="warning" %}
**When raising `attachment.max-file-size`, keep `spring.codec.max-in-memory-size` at least as large as a single upload chunk.** Both keys ship with the same `20 MB` default, so a default install needs no adjustment. If you raise `attachment.max-file-size` to allow larger uploads (for example `100 MB`), make sure `spring.codec.max-in-memory-size` is at least the size of a single upload **chunk**; otherwise a chunk above `20 MB` fails at the WebFlux codec layer with `DataBufferLimitException`. This codec bound applies per chunk, not to the total file (see the per-file note above): the platform enforces no server-side cap on the assembled file size.
{% endhint %}
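
A sketch of raising both keys together for the `100 MB` case mentioned in the hint above. The chunk size the UI uses is not stated on this page, so the codec value here is a deliberately conservative assumption (as large as the whole file); a smaller value may be sufficient if chunks are smaller:

```yaml
attachment:
  max-file-size: 100 # MB; client-side pre-upload check only
spring:
  codec:
    # Per-request / per-chunk in-memory buffer ceiling.
    # Assumed upper bound: a single chunk no larger than the file limit.
    max-in-memory-size: 100MB
```

Raising the codec limit raises the per-request buffer ceiling for the whole platform, not only for attachment uploads, so size it with available heap in mind.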

### Example: REMOTE storage with S3-compatible backend (MinIO or AWS S3)

{% tabs %}
{% tab title="YAML" %}

```yaml
attachment:
  storage: REMOTE
  max-file-size: 50 # MB
  remote:
    url: {s3_endpoint_url}
    access-key: {access_key}
    secret-key: {secret_key}
    bucket: {bucket_name}
```

{% endtab %}

{% tab title="Environment variables" %}

```
ATTACHMENT_STORAGE=REMOTE
ATTACHMENT_MAX_FILE_SIZE=50
ATTACHMENT_REMOTE_URL={s3_endpoint_url}
ATTACHMENT_REMOTE_ACCESS_KEY={access_key}
ATTACHMENT_REMOTE_SECRET_KEY={secret_key}
ATTACHMENT_REMOTE_BUCKET={bucket_name}
```

{% endtab %}
{% endtabs %}

### Known limitations (REMOTE mode)

ODD Platform builds its `MinioAsyncClient` with only the endpoint and credentials documented above. The MinIO Java SDK inherits defaults for every other parameter, and the attachment-upload code path carries a small amount of additional behavior that is not configurable. None of the following is currently exposed as an ODD configuration key — plan your deployment around these limits rather than assuming a config flag will fix them.

{% hint style="warning" %}
**AWS S3 region pinned to `us-east-1`.** The attachment client is built without an explicit region, so it uses the MinIO Java SDK's default region (`us-east-1`) for request signing. Against **AWS S3 this means only buckets in `us-east-1` work** — buckets in any other region fail signature validation with errors such as `AuthorizationHeaderMalformed` or `PermanentRedirect`. If you need AWS S3 in another region, either host your bucket in `us-east-1` or use a MinIO server in front of it. Self-hosted MinIO and most other S3-compatible services ignore the region header and are unaffected.
{% endhint %}

{% hint style="warning" %}
**HTTP client timeouts are the MinIO SDK defaults (\~5 minutes), not configurable.** ODD Platform does not supply a custom `OkHttpClient` to the MinIO builder, so the SDK's built-in defaults apply: roughly a 5-minute read/write timeout. A single large upload whose end-to-end wall time (network transfer + S3 ingest) exceeds that limit fails with a socket-timeout error even though the content was being streamed successfully. If your users upload near the `attachment.max-file-size` limit over a slow link, keep `attachment.max-file-size` below the size a typical upload can complete inside 5 minutes at your network's real throughput.
{% endhint %}

{% hint style="danger" %}
**Chunked uploads are assembled on the container's local filesystem before they are sent to `REMOTE` storage — a mid-upload container restart loses the staged chunks.** The UI splits large files into chunks and uploads each chunk individually; the platform writes each chunk to a hardcoded local directory — `/tmp/odd/chunks`, **independent of `attachment.local.path`**, so pointing `attachment.local.path` at a durable path does not move chunk staging — and reassembles the full file there before streaming it to the S3-compatible backend. **This is true even when `attachment.storage=REMOTE`.** If the ODD Platform container is restarted, evicted, or rescheduled during an in-flight chunked upload, the local directory is wiped and the partial upload is unrecoverable — the user must re-upload from scratch. In Kubernetes deployments, either mount a persistent volume at the chunk-staging directory (`/tmp/odd/chunks`) or limit the maximum upload size so single-request uploads are the norm. The `LOCAL`-mode ephemeral warning above applies to chunk-staging in `REMOTE` mode as well.
{% endhint %}
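
For the Kubernetes option mentioned in the hint above, a pod-template fragment along these lines mounts a persistent volume at the chunk-staging directory. It is a sketch only: the container name, volume name, and claim name are hypothetical, and the PersistentVolumeClaim is assumed to exist already with an access mode that suits your replica count:

```yaml
# Fragment of a Deployment manifest; merge into your own pod template.
spec:
  template:
    spec:
      containers:
        - name: odd-platform # hypothetical container name
          volumeMounts:
            - name: odd-chunks
              mountPath: /tmp/odd/chunks # hardcoded chunk-staging directory
      volumes:
        - name: odd-chunks
          persistentVolumeClaim:
            claimName: odd-chunks-pvc # hypothetical, pre-created claim
```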

{% hint style="warning" %}
**No retry on transient S3 / MinIO errors.** Put, get, and remove operations against the bucket do not retry on transient failures — a single 503 from S3, a connection reset from the network, or a short MinIO outage surfaces as a failed operation with no automatic recovery. If your alerting pipeline treats attachment failures as user-impacting errors, add retry at the infrastructure layer (for example an S3-proxy sidecar with retry) rather than expecting the platform to paper over it.
{% endhint %}

{% hint style="warning" %}
**No IAM-role support today: static access-key / secret-key are the only credentials path.** The platform's MinIO client builder calls `.credentials(accessKey, secretKey)` against the values configured above; it does not call `.credentialsProvider(...)` with a provider that walks the AWS default credentials chain. Operators on AWS EKS using IAM Roles for Service Accounts (IRSA), on ECS task roles, or on any other AWS-native credential-injection mechanism that expects the SDK to walk that chain (environment variables → web-identity-token → EC2 instance metadata → ECS task-role) get **no automatic credential resolution**. Static `attachment.remote.access-key` + `attachment.remote.secret-key` values must be supplied via `application.yml`, Helm secrets, or the equivalent operator-managed credential store, and rotated on the operator's own cadence. This is itself a credential-hygiene concern in deployments where IAM-role injection is the standard. The upstream fix is a conditional switch: if no static credentials are supplied, call `.credentialsProvider(...)` with a chained provider to enable IAM-role workflows. The doc-side caveat is in place until that lands.
{% endhint %}

### Example: LOCAL storage (single-host / local evaluation only)

{% tabs %}
{% tab title="YAML" %}

```yaml
attachment:
  storage: LOCAL
  max-file-size: 20 # MB
  local:
    path: /var/lib/odd/attachments
```

{% endtab %}

{% tab title="Environment variables" %}

```
ATTACHMENT_STORAGE=LOCAL
ATTACHMENT_MAX_FILE_SIZE=20
ATTACHMENT_LOCAL_PATH=/var/lib/odd/attachments
```

{% endtab %}
{% endtabs %}

If you keep `LOCAL` mode, override `attachment.local.path` to a persistent volume mount rather than the default `/tmp/odd/attachments`, and confirm the volume is actually persistent across restarts in your deployment topology.

## Logging Settings Configuration

Logs provide detailed information about errors in the application, helping operators quickly identify and fix problems. Setting up logging is recommended for system reliability, effective monitoring, and troubleshooting.\
The following snippet sets the log level for the main ODD Platform packages:

{% tabs %}
{% tab title="YAML" %}

```yaml
logging:
  level:
    org.springframework.transaction.interceptor: info
    org.jooq.tools.LoggerListener: info
    io.r2dbc.postgresql.QUERY: info
    io.r2dbc.postgresql.PARAM: info
    org.opendatadiscovery.oddplatform.notification: info
    org.opendatadiscovery.oddplatform.housekeeping: info
    org.opendatadiscovery.oddplatform.partition: info
    org.opendatadiscovery.oddplatform.datacollaboration: info
    org.opendatadiscovery.oddplatform.service.ingestion: info
```

{% endtab %}

{% tab title="Environment variables" %}

```
LOGGING_LEVEL_ORG_SPRINGFRAMEWORK_TRANSACTION_INTERCEPTOR=info
LOGGING_LEVEL_ORG_JOOQ_TOOLS_LOGGERLISTENER=info
LOGGING_LEVEL_IO_R2DBC_POSTGRESQL_QUERY=info
LOGGING_LEVEL_IO_R2DBC_POSTGRESQL_PARAM=info
LOGGING_LEVEL_ORG_OPENDATADISCOVERY_ODDPLATFORM_NOTIFICATION=info
LOGGING_LEVEL_ORG_OPENDATADISCOVERY_ODDPLATFORM_HOUSEKEEPING=info
LOGGING_LEVEL_ORG_OPENDATADISCOVERY_ODDPLATFORM_PARTITION=info
LOGGING_LEVEL_ORG_OPENDATADISCOVERY_ODDPLATFORM_DATACOLLABORATION=info
LOGGING_LEVEL_ORG_OPENDATADISCOVERY_ODDPLATFORM_SERVICE_INGESTION=info
```

{% endtab %}
{% endtabs %}

Setting the logging level to `info` surfaces useful messages about the platform's operation without the volume of `trace` or `debug`, and without hiding the informational messages that `warn` and higher levels suppress.\
Adjust the level per package as needed to get more or less detail.

## GenAI Configuration

The platform can proxy natural-language questions to an external AI service via three keys under the `genai` prefix (`@ConfigurationProperties("genai")` per `GenAIProperties.java`). The feature is **disabled by default** and is **API-only** today (no in-app UI affordance calls the endpoint).

* `genai.enabled` (boolean, env `GENAI_ENABLED`) — feature toggle. **Default `false`** (set explicitly at `application.yml` line 18). When `false`, `POST /api/genai/ask` returns HTTP 400 with the message "Gen AI is disabled". **Feature toggling**: this value is captured at JVM boot — restart the JVM process for a change to take effect; runtime configuration changes are not honoured. See [Features → Data Collaboration](/features/features.md#data-collaboration) for the platform-wide boot-immutability caveat.
* `genai.url` (string, env `GENAI_URL`) — base URL of the external AI service. The platform's `genAiWebClient` is built at startup with this as `baseUrl` and POSTs each request to `{genai.url}/query_data`. **No `@ConfigurationProperties` default — the field has no initializer in `GenAIProperties.java`, so its Java default is `null`.** The example in `application.yml` line 19 (`# url: http://localhost:5000`) is commented out, not a default.
* `genai.request_timeout` (integer, env `GENAI_REQUEST_TIMEOUT`) — outbound response timeout, **in minutes**. Wired into `WebClientConfiguration.java:23` as `Duration.ofMinutes(genAIProperties.getRequestTimeout())`. **No `@ConfigurationProperties` default — the Java primitive `int` default is `0`, which means immediate timeout.** The example in `application.yml` line 20 (`# request_timeout: 2`) is commented out, not a default.

{% hint style="warning" %}
**Setting only `genai.enabled=true` will silently misconfigure the feature.** With `url` defaulting to `null` and `request_timeout` defaulting to `0`, the WebClient is built with no `baseUrl` and a `Duration.ofMinutes(0)` timeout — every `POST /api/genai/ask` will fail before the external service has a chance to respond. Always set all three keys when enabling.

`WebClientConfiguration` reads `genai.url` and `genai.request_timeout` once at startup when constructing the `genAiWebClient` Spring bean. Changing those values requires a Platform restart.
{% endhint %}

A working configuration block:

{% tabs %}
{% tab title="application.yml" %}

```yaml
genai:
  enabled: true
  url: "http://my-ai-service.internal:5000"
  request_timeout: 5     # minutes
```

{% endtab %}

{% tab title="Environment variables" %}

```bash
GENAI_ENABLED=true
GENAI_URL=http://my-ai-service.internal:5000
GENAI_REQUEST_TIMEOUT=5
```

{% endtab %}
{% endtabs %}
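
Because enabling the feature without the other two keys fails only at request time, a pre-start check can catch the gap earlier. The function below is a sketch under stated assumptions, not something ODD Platform ships: the name `genai_preflight` is hypothetical, it inspects only the environment-variable form of the keys, and it treats only the literal `true` as enabled, which is a simplification of how Spring binds booleans.

```bash
# Hypothetical preflight helper; not part of ODD Platform.
# Returns 0 when the GenAI settings are consistent, 1 otherwise.
genai_preflight() {
  # Documented default is false, so an unset variable means the feature is off.
  if [ "${GENAI_ENABLED:-false}" != "true" ]; then
    echo "GenAI is disabled; nothing to check"
    return 0
  fi
  # genai.url has no default, so it must be supplied explicitly.
  if [ -z "${GENAI_URL:-}" ]; then
    echo "GENAI_URL is not set" >&2
    return 1
  fi
  # genai.request_timeout is in minutes; unset, zero, or non-numeric is rejected.
  case "${GENAI_REQUEST_TIMEOUT:-0}" in
    ''|*[!0-9]*|0)
      echo "GENAI_REQUEST_TIMEOUT must be a positive integer (minutes)" >&2
      return 1
      ;;
  esac
  echo "GenAI configuration looks complete"
}
```

Calling `genai_preflight || exit 1` from a container entrypoint or deployment script, before the Platform starts, stops a half-configured rollout at the point where it is cheapest to fix.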

The platform sends **no authentication** to the external AI service and does **not retry**. See the dedicated [GenAI assistant](/features/active-platform-features/genai.md) page for the external service contract (`POST /query_data` with JSON `{"question": "..."}`), the platform's `/api/genai/ask` request/response schemas, and the per-error behavior.
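
To confirm that the external service is reachable before involving the Platform, the request described above can be reproduced by hand. This is a rough sketch: the function name `genai_smoke_test` is hypothetical, the `Content-Type` header is an assumption based on the JSON body, the one-minute fallback timeout is an arbitrary choice for the sketch, and the response is printed as-is because its schema is documented on the GenAI assistant page rather than here.

```bash
# Hypothetical helper: POST one question straight to the external AI service,
# bypassing ODD Platform. No authentication is sent, mirroring the Platform.
genai_smoke_test() {
  # curl's --max-time is in seconds; genai.request_timeout is in minutes.
  # "${GENAI_URL%/}" drops one trailing slash to avoid a double slash in the path.
  # The question is inserted verbatim, so it must not contain double quotes.
  curl -sS --max-time "$(( ${GENAI_REQUEST_TIMEOUT:-1} * 60 ))" \
    -X POST "${GENAI_URL%/}/query_data" \
    -H 'Content-Type: application/json' \
    -d "{\"question\": \"$1\"}"
}
```

With `GENAI_URL` exported, `genai_smoke_test 'your question here'` sends a single request; a connection error at this step points at the external service or the network rather than at the Platform.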

## Machine-to-Machine (M2M) Tokens Configuration

ODD Platform supports a static API-key authentication mode for non-UI callers (CI/CD jobs, ingestion pipelines, automation scripts) — also referred to as Machine-to-Machine (M2M) tokens. It is **disabled by default**.

For the full configuration keys, the header contract, the curl example, and security considerations (token rotation, HTTPS, blast radius), see [Server-to-server (S2S) authentication](/configuration-and-deployment/enable-security/authentication/s2s.md).


