Configure ODD Platform

This section describes how to configure ODD Platform so that all of its features are available.

This page is the post-deployment configuration reference for the running Platform — every application.yml key the Platform consumes. For the deployment path itself (Docker Compose, Helm, AWS EKS, build from source), start at Deployment Options.

Configuration approaches

There are two ways to configure the Platform:

  • Environment variables suit simple, single-value entries.

  • YAML is convenient when you need to define a complex configuration block (e.g., OAuth2 authentication or logging levels).

YAML entries vs. environment variables

The example below shows the same configuration block expressed first as YAML and then as environment variables.

YAML:

spring:
    datasource:
        url: URL
        username: USERNAME
        password: PASSWORD
    custom-datasource:
        url: URL
        username: USERNAME
        password: PASSWORD

To configure the Platform using environment variables, join the nested keys with underscores, replace any hyphens with underscores, and uppercase the result, like so:

  • SPRING_DATASOURCE_URL=URL

  • SPRING_DATASOURCE_USERNAME=USERNAME

  • SPRING_DATASOURCE_PASSWORD=PASSWORD

  • SPRING_CUSTOM_DATASOURCE_URL=URL

  • SPRING_CUSTOM_DATASOURCE_USERNAME=USERNAME

  • SPRING_CUSTOM_DATASOURCE_PASSWORD=PASSWORD

Connect your database

ODD Platform uses PostgreSQL, and only PostgreSQL, for all of its features. Define the following variables to connect ODD Platform to the database:

  • spring.datasource.url: JDBC string of your PostgreSQL database. Default value is jdbc:postgresql://127.0.0.1:5432/odd-platform

  • spring.datasource.username: your PostgreSQL user's name. Default value is odd-platform

  • spring.datasource.password: your PostgreSQL user's password. Default value is odd-platform-password. Override this before any non-localhost deployment; see Management endpoint exposure and credential hygiene for why the operator must replace the shipped default.

These variables are optional and will be used to connect to PostgreSQL and store Lookup Tables. Each of the three keys is declared in R2DBCConfiguration as @Value("${spring.custom-datasource.X:}") — the trailing colon with no value means the @Value default is the empty string, not the JDBC URL / username / password values listed below. When a key is unset (or blank), the bean factory falls back to the corresponding primary spring.datasource.* value at startup. The values below are therefore the fallback an operator observes with a default deployment, not the spring.custom-datasource.* keys' own defaults — so overriding spring.datasource.url will also change what spring.custom-datasource.url resolves to:

  • spring.custom-datasource.url: JDBC string of the PostgreSQL database where Lookup Tables are stored. Falls back to spring.datasource.url when unset; the platform's primary spring.datasource.url default is jdbc:postgresql://127.0.0.1:5432/odd-platform. Note: you can specify any {database_host}, {database_port}, or {database_name}, but the schema in which Lookup Tables are stored is always lookup_tables_schema.

  • spring.custom-datasource.username: your PostgreSQL user's name for custom-datasource. Falls back to spring.datasource.username when unset; the platform's primary spring.datasource.username default is odd-platform.

  • spring.custom-datasource.password: your PostgreSQL user's password for custom-datasource. Falls back to spring.datasource.password when unset; the platform's primary spring.datasource.password default is odd-platform-password.

The resulting database connection block would look like this:
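A minimal sketch of such a block, assembled from the keys listed above. The host name db.example.internal and the change-me passwords are placeholders, not values shipped with the platform; the custom-datasource block is optional and can be left out to fall back to the primary datasource.

```yaml
spring:
  datasource:
    # Primary platform database (placeholder host; use your own).
    url: jdbc:postgresql://db.example.internal:5432/odd-platform
    username: odd-platform
    password: change-me          # placeholder; do not keep the shipped default
  custom-datasource:
    # Optional: database holding Lookup Tables (schema is always lookup_tables_schema).
    # When these keys are unset, the spring.datasource.* values are used instead.
    url: jdbc:postgresql://db.example.internal:5432/odd-platform
    username: odd-platform
    password: change-me          # placeholder
```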

Security

To enable security in ODD Platform, follow the Enable security section.

Management endpoint exposure and credential hygiene

The platform's Spring Boot Actuator endpoints (/actuator/**) are intentionally whitelisted ahead of the authentication chain in every auth.type, and the shipped configuration enables the env and info endpoints. The shipped database password is a well-known string. Together these defaults turn a default deployment into a one-line-away-from-full-PostgreSQL-compromise system if exposed on a non-trusted network. The mitigations below are the operator's responsibility today. For the monitoring use of these endpoints — wiring liveness/readiness probes to /actuator/health and scraping /actuator/prometheus — see Health and monitoring.

/actuator/** is anonymously reachable in every auth mode

SecurityConstants.WHITELIST_PATHS contains /actuator/**, so the path is reachable before the auth chain runs in DISABLED, LOGIN_FORM, OAUTH2, and LDAP alike. The shipped application.yml sets management.endpoint.env.enabled=true but sets no management.endpoint.env.show-values, so the Spring Boot default (NEVER) applies: /actuator/env redacts every property value (******) for every caller, authenticated or not, including spring.datasource.url. What an unauthenticated caller scraping /actuator/env does learn is the configuration-key schema, that is, which keys and property sources are present: which OAuth2 providers are wired (by their key prefixes), whether LDAP is configured, whether REMOTE attachment storage is set up, and that a JDBC datasource is configured (the key, not its value). Values stay masked unless an operator sets show-values to WHEN_AUTHORIZED or ALWAYS; the exposure to mitigate is the unauthenticated reachability of the endpoint and the configuration schema it reveals.

Apply at least one of the mitigations below for any deployment reachable from outside a fully-trusted network:

| Mitigation | How | Recommended for |
| --- | --- | --- |
| Separate management port | Set management.server.port: 8081 and route :8081 only on your internal management network. | All production deployments. |
| Firewall the actuator path | Add a reverse-proxy rule rejecting /actuator/** from the public CIDR range; allow only your monitoring network. | Single-port deployments where a separate management port is infeasible. |
| Restrict default exposure | Set management.endpoints.web.exposure.include: health,prometheus (drop env, info). | All production deployments; combine with one of the network-level mitigations above. |
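As a sketch, the two configuration-level mitigations (separate management port and restricted exposure) can be combined in one application.yml fragment. Port 8081 is the value used in the mitigation list; routing that port only to an internal network still has to be done in your own infrastructure.

```yaml
management:
  server:
    port: 8081                   # actuator served on its own port; expose it only internally
  endpoints:
    web:
      exposure:
        include: health,prometheus   # env and info are no longer exposed over HTTP
```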

A platform-side default-restriction is tracked upstream; until it ships, do not rely on the platform default for any reachable deployment.

The database password ships with a well-known default

application.yml ships spring.datasource.password: odd-platform-password as the default. An operator deploying ODD without explicitly overriding the property deploys with a public, documented credential. (The JDBC URL value is masked at /actuator/env under the default show-values: NEVER described above — but the password itself is public, and a database whose host is co-located with or guessable from the deployment topology is then one well-known credential away from compromise.) Override spring.datasource.password (and spring.custom-datasource.password if spring.custom-datasource.* is configured separately from the primary datasource) before exposing the platform on any non-localhost network. This is the same class of silent-insecure-default risk that previously affected attachment storage on container restart — read once, override before deployment, never assume the shipped default is safe.

Configuration-properties classes include credentials in toString()

ODDLDAPProperties and ODDOAuth2Properties.OAuth2Provider carry Lombok's @Data annotation alongside their password and clientSecret fields, respectively. @Data generates a toString() that includes every field verbatim — there is no @ToString.Exclude on any credential field today. A future log statement (log.info("loaded properties: {}", properties)) or an exception handler that emits properties on boot failure would write LDAP passwords and OAuth client secrets in cleartext to log infrastructure. Treat the platform's application logs as credential-sensitive: route them to an audit-grade log sink, redact at the log-pipeline tier if you cannot guarantee end-to-end access control, and do not store them in unrestricted long-term archives. A platform-side @ToString.Exclude rollout across credential fields is tracked upstream.

Select session provider

ODD Platform stores HTTP session state in one of three places: the platform JVM (in-memory), the platform's PostgreSQL database, or an external Redis data store. The provider is selected with session.provider (SESSION_PROVIDER env var) and accepts one of three values:

  • IN_MEMORY — sessions live in a ConcurrentHashMap inside the JVM. ODD Platform defaults to this value.

  • INTERNAL_POSTGRESQL — sessions are persisted to the platform's PostgreSQL database (SPRING_SESSION / SPRING_SESSION_ATTRIBUTES tables).

  • REDIS — sessions are persisted to an external Redis data store via Spring Session's @EnableRedisWebSession.

Quick selection guidance:

  • Single-instance deployment, restart-tolerant logout acceptable → IN_MEMORY

  • Multi-instance deployment or persistence across restarts is required → INTERNAL_POSTGRESQL (no extra infrastructure) or REDIS (if you already operate Redis or need sub-millisecond session reads)

Each provider has operator-visible characteristics that affect sizing, multi-instance behavior, and connection wiring. Read the relevant subsection before deploying.
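For illustration, selecting a provider is a single key. The sketch below picks INTERNAL_POSTGRESQL; the choice itself is only an example.

```yaml
# Equivalent environment variable: SESSION_PROVIDER=INTERNAL_POSTGRESQL
session:
  provider: INTERNAL_POSTGRESQL  # one of IN_MEMORY (default), INTERNAL_POSTGRESQL, REDIS
```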

IN_MEMORY (default)

Sessions are kept in a ConcurrentHashMap inside the platform JVM, wrapped by Spring Session's ReactiveMapSessionRepository. Suitable for local development and single-instance evaluations where session loss on restart is acceptable.

Characteristics & caveats

  • Sessions are lost on every platform restart. The session map lives in heap; any restart (deploy, crash, container recycle) clears it and forces every authenticated user to log in again.

  • No multi-instance support. Two ODD Platform instances behind a load balancer each maintain a separate session map. A request that lands on a different instance than the one that authenticated the user appears unauthenticated. Collector data-source registration is especially affected — the /ingestion/datasources filter writes a collectorId into the request's session, and the subsequent POST /ingestion/datasources handler reads it back; if the two requests hit different replicas, the handler raises an IllegalStateException("Collector id is null") returned to the collector as HTTP 500. For multi-replica deployments choose INTERNAL_POSTGRESQL or REDIS.

  • Eviction is by Spring Session expiry only. The repository wraps a raw ConcurrentHashMap with no secondary eviction policy (no LRU, no max-entries cap). A long-running platform with many short-lived sessions accumulates map entries until each entry's TTL elapses; high-traffic deployments running with the shipped default spring.session.timeout: -1 (no timeout) accumulate sessions indefinitely. Set a finite spring.session.timeout (see Session lifetime below) to bound the in-memory footprint.

INTERNAL_POSTGRESQL

Sessions are persisted in the platform's own PostgreSQL database, in the SPRING_SESSION and SPRING_SESSION_ATTRIBUTES tables. ODD Platform implements a custom JOOQ-based reactive JooqSessionRepository for this provider — the standard spring.session.jdbc.* Spring Session keys do not apply. Connection settings reuse the existing platform spring.datasource.* configuration; no additional database wiring is required.

Characteristics & caveats

  • Sessions survive platform restarts. Authenticated users remain logged in across deploys (until their session row's TTL has passed).

  • Multi-instance support. All ODD Platform instances point at the same database, share the session tables, and can serve requests for any authenticated user regardless of which instance answered the original login.

  • Expired-session cleanup runs hourly and is not configurable. A @Scheduled(fixedRate = 1, timeUnit = HOURS) housekeeping job (PostgreSQLSessionHousekeepingJobHandler.deleteExpiredSessions) deletes rows whose EXPIRY_TIME is in the past from both SPRING_SESSION and SPRING_SESSION_ATTRIBUTES. Expired session rows therefore remain in the tables for up to one hour past their TTL before being cleaned. The cadence is hardcoded — there is no config key to tune it.

  • Sizing implication. When sizing the database (connection pool, disk, vacuum schedule), assume the session tables hold the high-water-mark count of authenticated users plus up to one hour of post-expiry stragglers. For high-cardinality / short-TTL deployments (many users, short spring.session.timeout), the post-expiry overhang can dominate steady-state row count.

REDIS

Sessions are persisted to an external Redis data store via Spring Session's @EnableRedisWebSession. Suitable for multi-instance deployments that already operate Redis, or that need sub-millisecond session reads. ODD Platform does not bundle Redis; the operator must provide a Redis 6+ instance and supply its connection settings under the spring.data.redis.* namespace (Spring Boot 3.x; the legacy spring.redis.* prefix from Spring Boot 2.x has been removed and will not bind).

Characteristics & caveats

  • Sessions survive platform restarts and span instances — same persistence behavior as INTERNAL_POSTGRESQL, but reads and writes happen against Redis directly.

  • Connection wiring is operator-supplied. Unlike INTERNAL_POSTGRESQL (which reuses the platform's existing PostgreSQL connection), Redis settings must be configured separately. ODD Platform's application.yml ships no Redis defaults — every operator deploying with REDIS must set at least the host and port, plus credentials and TLS for any production deployment.

  • TLS, pool sizing, and command timeouts inherit Spring Data Redis defaults unless explicitly overridden. For managed Redis providers (AWS ElastiCache, Redis Cloud, Azure Cache for Redis) and any TLS-required Redis deployment, set spring.data.redis.ssl.enabled: true. For high-concurrency deployments, tune the Lettuce connection pool with spring.data.redis.lettuce.pool.*.

  • Eviction is delegated to Redis. ODD Platform does not run a housekeeping job for Redis-stored sessions; the Redis server's own per-key TTL and maxmemory-policy govern session eviction. Configure your Redis instance accordingly.

Required and optional connection keys (Spring Boot 3.x — spring.data.redis.*)

  • spring.data.redis.host: Redis host. Defaults to localhost.

  • spring.data.redis.port: Redis port. Defaults to 6379.

  • spring.data.redis.username: Redis ACL username. Optional; omit for password-only or no-auth Redis.

  • spring.data.redis.password: Redis password. Optional but recommended for any production deployment.

  • spring.data.redis.database: Redis logical database index. Defaults to 0.

  • spring.data.redis.ssl.enabled: enable TLS for the Redis connection. Boolean, defaults to false. Set to true for any managed-Redis or TLS-terminated Redis deployment.

  • spring.data.redis.timeout: command timeout. Duration string (for example 5s). Defaults to Spring Data Redis's internal default.

  • spring.data.redis.lettuce.pool.*: Lettuce connection-pool sizing (max-active, max-idle, min-idle, max-wait). Optional; tune for high-concurrency deployments.

ODD Platform does not extend or override Spring Boot's Redis property catalogue — the full set of keys recognized under spring.data.redis.* in your Spring Boot version applies as-is.
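A sketch of a REDIS provider configuration using the keys listed above. The host name, user name, password, and pool numbers are illustrative assumptions, not recommendations or shipped defaults; size the pool for your own workload.

```yaml
session:
  provider: REDIS
spring:
  data:
    redis:
      host: redis.example.internal   # hypothetical host
      port: 6379
      username: odd-platform         # optional ACL user (hypothetical name)
      password: change-me            # placeholder
      database: 0
      ssl:
        enabled: true                # required by most managed Redis services
      timeout: 5s                    # command timeout
      lettuce:
        pool:                        # illustrative sizing only
          max-active: 16
          max-idle: 8
          min-idle: 2
          max-wait: 2s
```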

Session lifetime (spring.session.timeout)

Spring Session's timeout controls how long an authenticated session remains valid between requests. ODD Platform's shipped default is -1, which means sessions never expire.

  • spring.session.timeout: session idle timeout. Duration string (for example 30m, 8h, 1d). Defaults to -1 (no timeout). Applies to all three providers (IN_MEMORY, INTERNAL_POSTGRESQL, REDIS).
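A minimal sketch of a finite session lifetime. The 8h value is only an example; choose a duration that matches your own security policy.

```yaml
spring:
  session:
    timeout: 8h                  # replaces the shipped -1 (sessions never expire)
```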

ODD Platform does not stamp Secure, SameSite, or HttpOnly attributes on the session cookie at the application tier: there is no CookieWebSessionIdResolver bean in the platform's session configuration today. The browser-side cookie posture is therefore whatever Spring applies to the SESSION cookie by default (no Secure, no SameSite directive, HttpOnly set), which is unsuitable for any internet-facing deployment.

Operators must stamp the production attributes at the deployment topology layer — typically the TLS-terminating reverse proxy or load balancer. For nginx, the directive looks like:

Match the equivalent for your ingress controller (Traefik, Envoy, Cloud Load Balancer, etc.). Until a platform-side default-stamping bean ships upstream, this stamping is the operator's responsibility — running ODD over plain HTTP or behind a permissive proxy means the session cookie travels in clear and is vulnerable to cross-site-request and cookie-leak attacks regardless of which auth.type is configured.

Java-serialised session attributes under INTERNAL_POSTGRESQL

The INTERNAL_POSTGRESQL provider stores session attribute values as raw bytes produced by Java's native SerializationUtils.serialize / .deserialize. Java native serialisation has a well-known deserialisation-gadget surface — code paths reachable on attribute load are influenced by the byte stream, so a write-access compromise of the SPRING_SESSION_ATTRIBUTES table yields a deserialisation entry point on the next session read.

Defence-in-depth recommendations for deployments running INTERNAL_POSTGRESQL:

  • Restrict write access to the SPRING_SESSION_ATTRIBUTES table to a single platform service account; do not share database credentials with other applications that store data in the same Postgres instance.

  • Deploy the platform's PostgreSQL with strong network segmentation — the database should not be reachable from any service except the platform itself.

  • If you cannot guarantee write-access isolation, prefer the REDIS provider — Spring Session's Redis serialiser uses a string-key Jackson JSON serialiser rather than Java native serialisation.

A platform-side migration to JSON serialisation for session attributes is tracked upstream.

Enable Metrics

ODD Platform can represent some of the metadata it ingests as time-series charts — for example, row counts on a MySQL table or the on-disk size of a Redshift database. Metrics handling splits into two independent concerns that share the metrics.* config namespace but do different jobs:

  • Storage (metrics.storage) — the storage tier the platform uses for ingested metrics. This selects where the platform writes metric points as they arrive from collectors and where it reads them back when rendering UI charts. Both directions hit the same backend — you cannot write to one and read from another.

  • Export (metrics.export.*) — where the platform pushes metrics out as OpenTelemetry telemetry, for long-term retention and dashboarding in your observability stack.

Configure the two independently; it is valid (and common) to run with INTERNAL_POSTGRES storage and no OTLP export, or with PROMETHEUS storage and OTLP export disabled, or any other combination.

Metric storage backend

metrics.storage selects the storage tier for metric writes and reads:

  • INTERNAL_POSTGRES (default) — metrics are written to and read from the ODD Platform's own PostgreSQL database (metric_series / metric_point tables). Zero additional infrastructure; suitable for most single-cluster deployments.

  • PROMETHEUS — metrics are remote-written to an external Prometheus instance (via the Prometheus remote-write protocol at /api/v1/write, using Snappy-compressed Protobuf-encoded write requests) and queried from the same instance (via the instant-query API at /api/v1/query). Suitable when you already run Prometheus for observability and want to avoid storing duplicate metric data in ODD's PostgreSQL.

metrics.prometheus-host is the base URL of the Prometheus instance and is only consulted when metrics.storage=PROMETHEUS. Both /api/v1/write and /api/v1/query are called on this single host. Defaults to http://localhost:9090.
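As a sketch, switching metric storage to an external Prometheus instance takes two keys. The host name below is a placeholder.

```yaml
metrics:
  storage: PROMETHEUS
  # Base URL only; the platform calls /api/v1/write and /api/v1/query on this host.
  prometheus-host: http://prometheus.example.internal:9090   # hypothetical host
```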

Metric export to OTLP

Independent of where metrics are stored, ODD Platform can push metrics as OpenTelemetry telemetry to an OTLP collector. Downstream you can forward that stream to Prometheus, New Relic, or any backend that accepts OTLP exporters.

  • metrics.export.enabled: must be set to true to build and wire the OTLP exporter bean. Defaults to false.

  • metrics.export.otlp-endpoint: OTLP collector endpoint (gRPC). Defaults to http://localhost:4317.
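A minimal sketch of enabling OTLP export, assuming a collector reachable at the placeholder host below.

```yaml
metrics:
  export:
    enabled: true
    otlp-endpoint: http://otel-collector.example.internal:4317   # hypothetical gRPC endpoint
```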

Enable Alert Notifications

Any alert created inside the platform can be sent via webhook, Slack incoming webhook, and/or email notifications (via Google SMTP, AWS SMTP, etc.). Such notifications contain information such as:

  1. Name of the entity on which the alert was created

  2. Data source and namespace of the entity

  3. Owners of the entity

  4. Possibly affected entities

ODD Platform's outbound notification delivery tails the alert table through a PostgreSQL logical-replication slot. Because the slot durably tracks its position in the write-ahead log, delivery resumes from the last unprocessed alert after a platform restart or a transient interruption of the database connection — alerts raised during the downtime are delivered once delivery catches up, not dropped. Alert creation itself is a plain database insert and does not depend on replication; this prerequisite applies only to outbound notification delivery. To enable it, the underlying PostgreSQL database must be configured for logical replication.

For the user-facing description of the alerting feature — alert types, the per-entity alert tabs, the lifecycle, and per-entity halt configuration — see Active platform features → Alerting. For the user-facing description of the outbound notification channels (Slack incoming webhook, email, generic webhook) and the Prometheus AlertManager inbound webhook, see Active platform features → Notifications.

Slack here is the outgoing alert webhook, not the Discussions Slack app. The alert-notifications integration is a one-way Slack incoming webhook — the platform POSTs alert messages to a channel via notifications.receivers.slack.url. It is distinct from the full Slack app used by Data Collaboration for in-app per-entity discussion threads (OAuth + Events API; bidirectional). Each integration is configured separately: enabling the alert webhook does not surface the Discussions tab on data-entity pages, and enabling Data Collaboration does not route alerts. See Main Concepts → Terms & Aliases for the side-by-side comparison.

PostgreSQL Configuration

The PostgreSQL database must be configured for logical replication so that the Platform can use its replication mechanism, and the database user must be granted replication permissions.

Database settings

To configure the database, add the following entries to the postgresql.conf file:

Or, if the replication mechanism is already configured, just increment the max_wal_senders and max_replication_slots values.

Database user permissions

The ODD Platform database user must be granted replication permissions:

User permissions and database configuration may vary from one on-demand/cloud provider to another.

For instance, in AWS RDS, PostgreSQL instances are managed services where certain aspects of replication management are automated to minimize the risk of misconfiguration. Because of this, some settings are either not exposed or are changed differently than in a standard PostgreSQL setup. To enable notifications in such an environment, follow these steps (only the differences are listed):

  1. Set the rds.logical_replication parameter to 1 in your database instance's Parameter Group, instead of directly modifying the wal_level parameter.

  2. Ensure the ODD user connecting to the database has the rds_replication role. The Master username of the database typically already has this role by default. If you use a different username, you may need to assign the role with the command GRANT rds_replication TO {your_database_username};

  3. If you changed max_wal_senders to 5 (the minimal value mentioned in the Parameter Group) and keep getting messages like "The parameter max_wal_senders was set to a value incompatible with replication. It has been adjusted from 5 to 55" in the events list of the database instance, consider setting the parameter to the mentioned value in the Parameter Group so that RDS stops adjusting it automatically.

ODD Platform configuration

The following variables need to be defined:

  • notifications.enabled: must be set to true. Defaults to false. Feature toggling: this value is captured at JVM boot and frozen for the lifetime of the process; restart the JVM for a change to take effect. The same boot-immutable pattern applies to every platform-feature flag in this document — see Features → Data Collaboration for the catalogue and the chrome-invariance framing.

  • notifications.message.downstream-entities-depth: limits how many lineage-graph levels downstream the platform traverses when fetching affected data entities. Defaults to 1.

  • notifications.wal.advisory-lock-id: ODD Platform uses a PostgreSQL advisory lock to make sure that, when the Platform is horizontally scaled, only one instance processes alert messages. This setting defines the advisory lock ID. Defaults to 100.

  • notifications.wal.replication-slot-name: name of the PostgreSQL replication slot, which is created if it doesn't exist yet. Defaults to odd_platform_replication_slot.

  • notifications.wal.publication-name: name of the PostgreSQL publication, which is created if it doesn't exist yet. Defaults to odd_platform_publication_alert.

  • notifications.receivers.slack.url: Slack incoming webhook URL. The clickable links rendered inside Slack messages use odd.platform-base-url — there is no notifications.receivers.slack.* base-URL setting.

  • notifications.receivers.webhook.url: Generic webhook URL

  • notifications.receivers.email.host: the SMTP server.

  • notifications.receivers.email.port: the port on which the SMTP server listens.

  • notifications.receivers.email.protocol: the email transport protocol. Use the lowercase value smtp — any other value (including uppercase SMTP) silently disables STARTTLS and SMTP AUTH; see the caveat below.

  • notifications.receivers.email.smtp.auth: a boolean value (true or false) indicating whether the SMTP server requires authentication

  • notifications.receivers.email.smtp.starttls: a boolean indicating whether to use STARTTLS, a security protocol that upgrades an unencrypted connection to an encrypted one

  • notifications.receivers.email.password: the password used for email authentication

  • notifications.receivers.email.sender: the email address sending the notifications

  • notifications.receivers.email.notification.emails: the list of recipients for the email notifications

odd.platform-base-url

ODD Platform URL exposed to three internal consumers — the Slack-notification sender, the email-notification sender, and the integration-parameter substitution context. The two notification senders use it to build clickable links inside alert messages (the generic webhook receiver does not consume this key — it gets the full alert payload directly and is expected to construct any URLs it needs from that payload). The platform also substitutes the resolved value as the platform_url parameter in integration configurations — this is how Airflow plugins, dbt artifacts, and similar integrations resolve their reference to the ODD platform URL at runtime. Defaults are inconsistent across consumers: the notification senders default to http://localhost:8080, while the integration-substitution context defaults to the placeholder string http://your.odd.platform. Both defaults are unreachable from outside the host machine; set this key to your real deployment URL (for example https://odd.your-domain.com) in any non-local environment.

ODD Platform configuration would look like this:
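A sketch of such a configuration, built only from the keys documented above. Every URL, host, address, and password is a placeholder. The comma-separated form of the recipient list is an assumption about how the list is bound; check it against your platform version.

```yaml
odd:
  platform-base-url: https://odd.your-domain.com    # used for links in Slack and email messages
notifications:
  enabled: true
  message:
    downstream-entities-depth: 1
  wal:
    advisory-lock-id: 100
    replication-slot-name: odd_platform_replication_slot
    publication-name: odd_platform_publication_alert
  receivers:
    slack:
      url: "https://hooks.slack.com/services/<your-webhook-path>"   # placeholder
    webhook:
      url: https://alerts.example.internal/odd      # hypothetical generic webhook
    email:
      host: smtp.example.internal                   # hypothetical SMTP relay
      port: 587
      protocol: smtp                                # keep lowercase
      smtp:
        auth: true
        starttls: true
      password: change-me                           # placeholder
      sender: odd-alerts@example.com
      notification:
        emails: alice@example.com,bob@example.com   # assumed comma-separated form
```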

Example: Gmail SMTP

A minimal, working configuration for Gmail's SMTP over STARTTLS. Gmail requires an app password (generated from your Google account with 2-Step Verification enabled) — your regular account password will not work.
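A sketch of that configuration, using Gmail's public SMTP endpoint (smtp.gmail.com, port 587 for STARTTLS). The sender address, recipient, and app-password value are placeholders.

```yaml
notifications:
  enabled: true
  receivers:
    email:
      host: smtp.gmail.com
      port: 587                          # STARTTLS submission port
      protocol: smtp                     # must be lowercase
      smtp:
        auth: true
        starttls: true
      password: "<gmail-app-password>"   # app password, not the account password
      sender: your.account@gmail.com     # placeholder
      notification:
        emails: data-team@example.com    # placeholder recipient
```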

Known limitations

ODD Platform builds its JavaMailSender with only the keys documented above. The JavaMail session inherits defaults for every other SMTP parameter, and several of those defaults are operator-hostile in production deployments. None of the following is currently exposed as an ODD configuration key — where a workaround exists it is noted, but the limitations are real and should drive your choice of SMTP relay.

Cleaning up

To remove the replication slot and publication, run these SQL queries against the database:

  • where <> is the name of the replication slot defined in ODD Platform. The default is odd_platform_replication_slot.

  • where <> is the name of the publication defined in ODD Platform. The default is odd_platform_publication_alert.

Prometheus AlertManager Integration

In addition to raising alerts internally (failed jobs, data-quality tests, schema changes, distribution anomalies — see the Alerting feature), ODD Platform exposes an inbound webhook that accepts Prometheus AlertManager notifications. Each inbound alert becomes a Distribution Anomaly alert on the referenced data entity, visible in the Alerts section and on the entity's page.

Endpoint

Response: 204 No Content on success. The endpoint consumes the AlertManager webhook body and always returns empty.

Payload shape

The platform accepts a subset of the AlertManager webhook schema — specifically alerts[].labels, alerts[].generatorURL, and alerts[].startsAt. Other top-level AlertManager fields (version, status, receiver, groupLabels, commonLabels, …) are accepted and ignored.

Example AlertManager receiver configuration

A minimal alertmanager.yml receiver forwarding every alert to ODD Platform:
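A sketch of such a receiver, assuming the platform is reachable from AlertManager at the hypothetical address http://odd-platform:8080. The receiver name is also illustrative; the path is the inbound webhook path used by this integration.

```yaml
route:
  receiver: odd-platform               # send every alert to the receiver below
receivers:
  - name: odd-platform                 # illustrative receiver name
    webhook_configs:
      - url: http://odd-platform:8080/ingestion/alert/alertmanager   # hypothetical host and port
```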

The reference example shipped with the platform is at docker/examples/config/alertmanager.yaml in the odd-platform repo. To make an alert route to a specific entity, attach entity_oddrn as a label in your Prometheus alerting rules — for example:
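A sketch of a Prometheus alerting rule carrying the entity_oddrn label. The rule name, metric name, threshold, and ODDRN value are all hypothetical; use the ODDRN of the actual data entity as shown in ODD Platform.

```yaml
groups:
  - name: odd-example-rules            # hypothetical group name
    rules:
      - alert: OrdersRowCountDropped   # hypothetical alert
        expr: example_table_row_count{table="orders"} < 1000   # hypothetical metric
        for: 5m
        labels:
          # Routes the resulting alert to one data entity (hypothetical ODDRN).
          entity_oddrn: "//postgresql/host/db.example.internal/databases/shop/schemas/public/tables/orders"
```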

Authentication

Because no application-level authentication is enforced on this endpoint today, protect it at the perimeter. Any of these approaches works:

  • Network segmentation — expose ODD Platform only on a private network or VPN; in Kubernetes, keep AlertManager and the platform in the same cluster and use a NetworkPolicy so only the AlertManager pod can reach /ingestion/alert/alertmanager.

  • Reverse proxy with its own authentication — put an authenticating proxy in front of ODD Platform (for example, nginx with auth_request delegating to an SSO sidecar, or Envoy with ext_authz) and require AlertManager to present a proxy-validated credential on every webhook call.

  • mTLS termination — require client certificates on /ingestion/alert/alertmanager at the ingress or load balancer layer, and issue a certificate only to the AlertManager pod.

A platform-side fix to extend the ingestion auth filter to cover this endpoint is tracked upstream. Until it ships, apply one of the perimeter controls above for any deployment where the platform's network is not fully trusted.

For the broader ingestion-auth model — what auth.ingestion.filter.enabled does cover, the per-endpoint deployment matrix showing reachability under each auth.type value, and the write-shape caveats on the statistics endpoint — see Enable security and Server-to-server (S2S) API keys.

Enable Data Collaboration

The Data Collaboration feature lets users start a discussion about a specific data entity in a messenger directly from ODD Platform. Thread replies are tracked and saved by ODD Platform, so users can retrieve a conversation's context and decisions from one place.

For the user-facing description of the feature — the per-entity Discussions tab, how a discussion flows from the platform out to Slack and back, the message-lifecycle model — see Active platform features → Data Collaboration.

At the moment ODD Platform supports only Slack as a target messenger. It uses Slack APIs to send messages and the Slack Events API to receive thread replies.

Slack here is the full Slack app for in-app discussions, not the alert webhook. The Data Collaboration integration uses an OAuth-token-driven Slack app (datacollaboration.slack-oauth-token) and the Slack Events API webhook to read replies back into the platform — bidirectional. It is distinct from the outgoing alert webhook used by alert notifications (notifications.receivers.slack.url, one-way write only). Each integration is configured separately: enabling this one does not route alerts, and enabling the alert webhook does not surface the Discussions tab on data-entity pages. See Main Concepts → Terms & Aliases for the side-by-side comparison.

Creating Slack application

Go to the Slack apps website and click on Create New App -> From an app manifest

Creating an app

Select a workspace you want to add an application to and click Next

Selecting a workspace to install application to

Enter the following manifest into the YAML section, replace <ODD_PLATFORM_BASE_URL> with the URL of your ODD Platform deployment, and click Next.

The four bot scopes below match exactly what the platform exercises today (channels:history and channels:read for reading messages and metadata, chat:write for posting via the OAuth bot token, users:read for resolving user display names). Previous versions of this manifest also requested incoming-webhook — that scope was copy-paste leftover from a Slack example and was never used by the platform; if you are reinstalling or auditing scopes, you can safely omit it.
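
A manifest sketch consistent with the description above. It requests only the four bot scopes listed and points the Events API at the /api/slack/events path named later on this page. The display names, the message.channels event subscription, and the boolean settings are assumptions based on the standard Slack app-manifest schema, so compare against the manifest shipped with your platform version before using it.

```yaml
display_information:
  name: ODD Data Collaboration            # assumed display name
features:
  bot_user:
    display_name: ODD Data Collaboration  # assumed bot name
    always_online: true
oauth_config:
  scopes:
    bot:
      - channels:history   # read messages and thread replies
      - channels:read      # read channel metadata
      - chat:write         # post via the OAuth bot token
      - users:read         # resolve user display names
settings:
  event_subscriptions:
    # Replace the placeholder with the URL of your ODD Platform deployment.
    request_url: <ODD_PLATFORM_BASE_URL>/api/slack/events
    bot_events:
      - message.channels   # assumed event type for channel thread replies
  org_deploy_enabled: false
  socket_mode_enabled: false
  token_rotation_enabled: false
```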

Inserting a YAML manifest

Review your application's scopes and permissions and click Create

Reviewing scopes and permissions

Follow Slack's instructions to install the application into your workspace.

ODD Platform configuration

The following variables need to be defined:

  • datacollaboration.enabled: must be set to true. Defaults to false. Feature toggling: this value is captured at JVM boot and frozen for the lifetime of the process — runtime configuration changes (for example via Spring Boot Actuator's /actuator/refresh) are not reflected by the feature resolver or by the platform's feature-active endpoint. Restart the JVM process for a change to take effect. Top-level UI navigation tabs (Data Modelling and adjacent surfaces) remain visible regardless of this setting; the per-page affordances inside those tabs do honour the flag. See Features → Data Collaboration for the chrome-invariance caveat.

  • datacollaboration.receive-event-advisory-lock-id: PostgreSQL advisory lock ID for the job that translates messenger events into messages. Defaults to 110.

  • datacollaboration.sender-message-advisory-lock-id: PostgreSQL advisory lock ID for the job that sends messages created in the platform to messengers. Defaults to 120.

  • datacollaboration.message-partition-period: time interval, in days, covered by each message table partition in PostgreSQL. Defaults to 30.

  • datacollaboration.sending-messages-retry-count: how many times the Platform attempts to send a message to the provider. Cannot be less than zero. Defaults to 3.

  • datacollaboration.slack-oauth-token: Slack application OAuth token used for communicating with Slack. It can be retrieved in the OAuth & Permissions section of a Slack application.

    Retrieving OAuth Token
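
Put together, the keys above can be expressed as one YAML block. This is a sketch that enables the feature and otherwise restates the documented defaults; SLACK_OAUTH_TOKEN is a hypothetical environment variable name, used here so that the token is not committed to the file.

```yaml
datacollaboration:
  enabled: true                          # default is false; requires a restart to change
  receive-event-advisory-lock-id: 110    # documented default
  sender-message-advisory-lock-id: 120   # documented default
  message-partition-period: 30           # days per message table partition
  sending-messages-retry-count: 3        # must not be negative
  # Resolved from the environment at startup (hypothetical variable name).
  slack-oauth-token: ${SLACK_OAUTH_TOKEN}
```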

Known limitations

Slack at-least-once delivery surfaces as duplicate messages

Slack's Events API retries an event delivery whenever the platform's POST /api/slack/events handler does not return a 2xx acknowledgement within roughly three seconds — the API guarantees at-least-once delivery, not exactly-once. ODD Platform does not currently deduplicate incoming events: the message_provider_event table has no UNIQUE (provider, event_id) constraint, and the INSERT in ReactiveMessageRepository.createMessageEvent issues no ON CONFLICT clause. The result is that occasional Slack retries — which happen routinely on transient network or processing delays — insert duplicate rows; the downstream processor materialises a child message row for each, so the same Slack reply can appear two or more times on the data-entity Discussions tab.

Operator-side mitigation today. Until the platform-side dedup ships upstream, audit message_provider_event for (provider, event_id) duplicates as a one-off clean-up baseline; the duplicate rows are safe to delete after confirming the downstream message rows have been similarly deduplicated. Long-term, expect the platform to add the UNIQUE constraint + ON CONFLICT DO NOTHING on the INSERT — track the upstream issue if you depend on exactly-once delivery.

Slack Events webhook has no signature verification

ODD Platform does not verify Slack's X-Slack-Signature header on incoming /api/slack/events callbacks. Any caller on the network that can reach the platform's events endpoint can submit Slack-shaped payloads and have them processed as if they came from Slack. As a mitigation, restrict access to the /api/slack/events path at your reverse proxy to Slack's IP ranges, or terminate at a proxy that verifies the signature itself. A platform-side verifier is tracked upstream.

datacollaboration.message-partition-period (default 30) is read by MessageTablePartitionManager (@Value("${datacollaboration.message-partition-period:30}")) — separate from DataCollaborationProperties, which only carries the two advisory-lock IDs and the retry count. The partition manager creates a new PostgreSQL partition for the messages table every N days; lowering the value increases partition churn, raising it reduces partition count but enlarges each partition.

API surface

The full HTTP API for Data Collaboration is documented at API Reference → Data Collaboration — 7 routes across three groups (outbound to the provider, per-entity threads & history, inbound webhook from Slack), all gated by @ConditionalOnDataCollaboration and returning 404 Not Found when datacollaboration.enabled=false.

Housekeeping Settings Configuration

ODD Platform runs a background housekeeping job that permanently deletes stale data on a schedule. The job fires every 15 minutes, is guarded by a ShedLock so only one platform instance runs it at a time in a multi-instance deployment, and iterates through five cleanup tasks: resolved alerts, search-facet history, soft-deleted data entities, empty activity table partitions, and empty message table partitions. The first three consume the housekeeping.ttl.* keys below; the two partition reapers do not consume any TTL key — they drop empty past partitions when the partition-rotation orchestrator advances the partition window (see Activity-feed partitioning for the partition WIDTH key, and the Advisory-lock registry for the orchestrator's leader election).

Configuration keys

  • housekeeping.enabled: enables the background job. Defaults to true. See the caveat below before disabling.

  • housekeeping.ttl.resolved_alerts_days: how many days an alert in RESOLVED_AUTOMATICALLY status is kept after its status-update timestamp before the housekeeping job permanently deletes it (alongside its chunk records). Integer, days. Defaults to 30. Note: the retention window is intended to apply to both RESOLVED (manual) and RESOLVED_AUTOMATICALLY (system) states, but a known platform bug currently exempts manual resolutions from the retention check — manual RESOLVED alerts are hard-deleted on the next housekeeping run regardless of this value. See Alerting → Auto-cleanup of resolved alerts for the operator-side workaround.

  • housekeeping.ttl.search_facets_days: how many days a saved search-facet entry is kept past its last_accessed_at timestamp before being deleted. Integer, days. Defaults to 30.

  • housekeeping.ttl.data_entity_delete_days: how many days a data entity with status DELETED is kept after its status-update timestamp. After this, the entity and its cascading related rows — metadata values, ownerships, lineage, tags, terms, alerts, messages, metrics, attachment files (including objects in S3 / MinIO storage), task runs, group relations, and (for datasets) dataset structure and enum values — are permanently and irreversibly deleted on the next housekeeping cycle, with no restore path. Integer, days. Defaults to 30. The retention clock is the entity's status_updated_at timestamp, which the soft-delete path stamps at the moment the entity is moved to DELETED — so the key is honoured exactly as documented; a default install purges DELETED entities 30 days after deletion. See Data entity statuses → soft-delete TTL for the user-facing lifecycle (a separate, cosmetic status_updated_at mapper defect affects only non-DELETED transitions and does not change this retention behaviour).
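
For reference, the same keys as a YAML block. The values shown are the documented defaults, so this sketch changes nothing on a default install; adjust individual TTLs as needed.

```yaml
housekeeping:
  enabled: true
  ttl:
    resolved_alerts_days: 30      # retention for RESOLVED_AUTOMATICALLY alerts
    search_facets_days: 30        # retention past last_accessed_at
    data_entity_delete_days: 30   # days a DELETED entity is kept before the irreversible purge
```

Note that the TTL keys use underscores rather than hyphens, exactly as listed above.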

For the user-facing entity lifecycle (how operators set DELETED and the other status states from the UI), see Data entity statuses.

The session-housekeeping job runs N× redundantly on multi-replica deployments. Spring's PostgreSQLSessionHousekeepingJobHandler fires hourly with @Scheduled(fixedRate = 1, timeUnit = HOURS) and has no leader-election guard — no @SchedulerLock, no advisory-lock acquisition (inconsistent with the rest of the platform's scheduled jobs, which join the Advisory-lock registry). On an INTERNAL_POSTGRESQL session-provider deployment with N replicas, every replica runs the DELETE FROM SPRING_SESSION WHERE expiry_time < now() query every hour. The deletes are idempotent so data integrity is fine — the operator cost is N× redundant database load. Note that with the shipped default spring.session.timeout: -1 (sessions never expire), the job is a no-op regardless of leader count.

Advisory-lock registry

Several ODD Platform subsystems use PostgreSQL advisory locks to ensure that only one platform replica runs a given background loop at a time (the leader-election pattern for multi-replica deployments). Each subsystem owns one or more advisory-lock IDs, configured via dedicated *.advisory-lock-id keys. Operators overriding any of these IDs in a deployment overlay must treat them as a single flat namespace across the platform — collisions are not detected at startup and produce silent feature wedges (see the warning below).

| Configuration key | Default ID | Owning subsystem | @ConfigurationProperties class | Single-leader role |
| --- | --- | --- | --- | --- |
| notifications.wal.advisory-lock-id | 100 | Notifications subscriber that reads from the WAL replication-slot-name and dispatches alert messages | OddNotificationsProperties | One platform replica subscribes to the WAL stream |
| partition.advisory-lock-id | 90 | Partition orchestrator that creates next-period partitions on activity and message tables | PartitionProperties | One platform replica advances the partition window |
| datacollaboration.receive-event-advisory-lock-id | 110 | Data Collaboration inbound event reader (Slack Events → message_provider_event queue) | DataCollaborationProperties | One platform replica drains the inbound event queue |
| datacollaboration.sender-message-advisory-lock-id | 120 | Data Collaboration outbound message sender (message queue → Slack) | DataCollaborationProperties | One platform replica drains the outbound message queue |

partition.advisory-lock-id is deliberately shared between two managers. ActivityTablePartitionManager and MessageTablePartitionManager both acquire ID 90. This is intentional: one platform replica is elected as the global partition leader and serialises the partition-rotation work for both tables. Treat it as one logical leader, not two colliding subsystems.

The Housekeeping orchestrator (see Housekeeping Settings Configuration above) does not appear in this table because it uses ShedLock (a Spring-side distributed-lock library) rather than a PostgreSQL advisory lock. ShedLock writes to a shedlock table to coordinate the leaders, so its multi-replica behaviour is documented separately.
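
Viewed as YAML, the registry's four keys and their default IDs look like the sketch below. When overriding any of them in a deployment overlay, it may help to keep all four in one place like this so that a duplicate value is easy to spot by eye, since nothing checks for collisions at startup.

```yaml
# Default advisory-lock IDs; every value must stay unique across the platform.
notifications:
  wal:
    advisory-lock-id: 100
partition:
  advisory-lock-id: 90          # shared on purpose by both partition managers
datacollaboration:
  receive-event-advisory-lock-id: 110
  sender-message-advisory-lock-id: 120
```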

Platform-level settings (odd.*)

The odd.* namespace groups four platform-wide settings that do not belong to any subsystem: stale-metadata detection, the optional Prometheus tenant label, the Activity-feed partitioning period, and a list of additional navigation links surfaced in the App Info menu. A fifth key in the same namespace, odd.platform-base-url, is documented above in Enable Alert Notifications → odd.platform-base-url — that section is the primary operator-facing context where the key is introduced, but the same key is also consumed by the integration-parameter substitution context, so any non-local deployment must set it regardless of which subsystems (notifications, integrations, or both) are enabled.

Detecting stale metadata

Stale metadata is metadata that has not been refreshed from its source for longer than an operator-defined window. This typically happens when a collector is paused, deactivated, or failing to reach the source system. When the platform judges an entity to be stale, the UI surfaces it with a "Stale" indicator so users can distinguish data whose freshness is uncertain from actively-maintained metadata. For the user-facing surface (where the indicator appears, how the freshness signal differs from runtime alerts), see Stale-metadata indicator.

  • odd.data-entity-stale-period: number of days after the entity's last successful ingestion before it is labeled "Stale" in the UI and API. Integer, days. Defaults to 7.

Operators running collectors on schedules longer than a week should raise this value to match the collector cadence — otherwise entities that were ingested successfully will be flagged stale between runs.
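
As an illustration, assume a hypothetical collector that runs every 14 days. Setting the window one day longer than the cadence is a judgment call that leaves room for a late run; it is not a platform recommendation.

```yaml
odd:
  data-entity-stale-period: 15   # days; default is 7
```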

Prometheus tenant label (odd.tenant-id)

When metrics.storage is set to PROMETHEUS, the platform appends tenant_id={value} as a label on every Prometheus instant query it issues. This lets a single shared Prometheus instance serve metric data for multiple ODD Platform deployments without their metric series colliding — each deployment queries only its own tenant-labeled series.

  • odd.tenant-id: tenant identifier appended as a Prometheus query label. String, no default (empty means no label is applied, and the Prometheus query returns series across all tenants). Ignored when metrics.storage=INTERNAL_POSTGRES.
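
A sketch of the two keys working together. The tenant value odd-prod is a hypothetical identifier, and the Prometheus connection settings are omitted because they are not covered in this section.

```yaml
metrics:
  storage: PROMETHEUS
odd:
  tenant-id: odd-prod   # hypothetical; appended to queries as tenant_id=odd-prod
```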

Activity-feed partitioning (odd.activity.partition-period)

The ODD Platform activity table is range-partitioned on a rolling date window; odd.activity.partition-period sets the partition width in days. The default creates a new partition every 30 days, which is appropriate for most deployments. Operators running high-volume deployments (millions of activity events per day) can tune this downward to narrow partitions — smaller partitions speed up vacuum and partition-prune operations on the activity feed.

  • odd.activity.partition-period: partition width in days for the activity table. Integer, days. Defaults to 30.
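
For a high-volume deployment, narrowing the partitions might look like the sketch below. The value 7 is an arbitrary example, not a tested recommendation.

```yaml
odd:
  activity:
    partition-period: 7   # days per activity table partition; default is 30
```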

Additional navigation links (odd.links)

Operators can attach a list of arbitrary navigation links, such as pointers to internal wikis, runbooks, dashboards, or any other page teams should reach from inside ODD Platform. The platform UI surfaces them inside the App Info menu (the popup behind the information icon in the top-right toolbar). Each link renders as a menu item showing its title and opens the configured URL in a new tab when clicked.

  • odd.links: list of link objects. Each entry has two required fields:

    • title: the menu-item label shown in the App Info menu. String, required.

    • url: the absolute URL the menu item opens in a new tab. String, required.

Defaults to an empty list — when unset, the App Info menu omits the additional-links section entirely.
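
A minimal sketch of an odd.links list with two entries. The titles and URLs are hypothetical; only the title and url field names come from the description above.

```yaml
odd:
  links:
    - title: Data platform runbook                      # menu-item label
      url: https://wiki.example.com/data-platform/runbook
    - title: Pipeline dashboards
      url: https://grafana.example.com/d/pipelines
```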

The links are exposed to the UI through the authenticated GET /api/links endpoint and are visible to every user signed in to the platform. Use them for navigation hints only — do not embed credentials, session tokens, or one-time secrets in link URLs, since any logged-in user can read them.

Three known limitations apply to odd.links and the App Info menu that renders them. None of these is blocking for typical operator-curated link sets, but all three matter when the link source is less trusted (free-text Helm chart overrides, multi-tenant config templates, anything an end-user can influence).

The App Info menu is not keyboard- or touch-accessible today. The information-icon button declares aria-haspopup="true" and aria-controls={menuId} — surfaces that announce keyboard accessibility to assistive technology — but the open handler is wired only on onMouseEnter. There is no onClick, no onKeyDown, and no onFocus. Touch-device users (iOS Safari, Android Chrome) do not generate mouseenter; keyboard-only and screen-reader users cannot open the menu. The Documentation, Slack, Feedback, and operator-configured odd.links destinations are unreachable through the menu for these audiences — direct URLs are the workaround until a platform-side onClick / onKeyDown fix ships. Operators serving keyboard-only or screen-reader audiences should treat this as a known WCAG 2.1 SC 2.1.1 limitation.

Attachment Storage Configuration

ODD Platform allows users to attach files and links to data entities from the UI. This section covers the operator-facing configuration for where those uploaded files are stored. For the user-facing upload workflow (what users can attach, the per-entity Attachments tab, the DATA_ENTITY_ATTACHMENT_MANAGE permission), see Attachments and links.

Configuration keys

  • attachment.storage: storage backend. One of LOCAL or REMOTE. Defaults to LOCAL.

  • attachment.max-file-size: the per-file upload limit the UI enforces before upload, in megabytes. Defaults to 20. The platform surfaces this value to the web UI as a client-side pre-upload size check; the server does not re-validate per-file size on the upload path. spring.codec.max-in-memory-size (below) bounds only the in-memory buffer for a single request/chunk — and because attachment uploads are chunked and streamed to disk, it is not a ceiling on the assembled file. There is therefore no effective server-side total-file size limit: a direct (non-UI) API caller can exceed attachment.max-file-size by any amount. See the hint below if raising this above 20 MB.

  • attachment.local.path: filesystem directory where attachments are written when storage=LOCAL. Defaults to /tmp/odd/attachments (ephemeral — see warning above).

  • attachment.remote.url: S3-compatible endpoint URL when storage=REMOTE (for example https://s3.us-east-1.amazonaws.com for AWS S3 or http://minio:9000 for a MinIO service). See the Known limitations (REMOTE mode) subsection below before choosing your endpoint — in particular the us-east-1 restriction for AWS S3 and the chunked-upload staging behavior.

  • attachment.remote.access-key: access key for the S3-compatible bucket.

  • attachment.remote.secret-key: secret key for the S3-compatible bucket.

  • attachment.remote.bucket: bucket name used to store attachment objects. The bucket must already exist — ODD Platform does not create it.

  • spring.codec.max-in-memory-size: platform-wide cap on the in-memory buffer Spring WebFlux uses when reading a single request body / upload chunk. Defaults to 20MB. A single chunk larger than this fails at the codec layer; because uploads are chunked, this does not bound the total assembled file. Accepts a size string (20MB, 100MB, 1GB).

Example: REMOTE storage with S3-compatible backend (MinIO or AWS S3)
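
A minimal sketch built only from the keys listed above. The bucket name and the two environment variable names are hypothetical, and the MinIO endpoint is the example value quoted in the attachment.remote.url description.

```yaml
attachment:
  storage: REMOTE
  max-file-size: 20                 # megabytes; enforced by the UI only
  remote:
    url: http://minio:9000          # S3-compatible endpoint
    # Credentials resolved from the environment (hypothetical variable names).
    access-key: ${ATTACHMENT_ACCESS_KEY}
    secret-key: ${ATTACHMENT_SECRET_KEY}
    bucket: odd-attachments         # hypothetical; must already exist
```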

Known limitations (REMOTE mode)

ODD Platform builds its MinioAsyncClient with only the endpoint and credentials documented above. The MinIO Java SDK inherits defaults for every other parameter, and the attachment-upload code path carries a small amount of additional behavior that is not configurable. None of the following is currently exposed as an ODD configuration key — plan your deployment around these limits rather than assuming a config flag will fix them.

Example: LOCAL storage (single-host / local evaluation only)

If you keep LOCAL mode, override attachment.local.path to a persistent volume mount rather than the default /tmp/odd/attachments, and confirm the volume is actually persistent across restarts in your deployment topology.
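
A sketch of that override, assuming a persistent volume is mounted at the hypothetical path /var/lib/odd/attachments:

```yaml
attachment:
  storage: LOCAL
  local:
    path: /var/lib/odd/attachments   # hypothetical mount point of a persistent volume
```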

Logging Settings Configuration

Logs provide detailed information about errors in the application, helping users quickly identify and fix problems. Setting up logging is recommended for operational excellence, system reliability, effective monitoring, and troubleshooting. Here is a code snippet for setting up logs in ODD Platform:
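
A minimal sketch using Spring Boot's standard logging.level keys. The root entry applies to every logger; the package-level entry assumes the platform's classes live under org.opendatadiscovery.oddplatform, which should be verified against your build.

```yaml
logging:
  level:
    root: info
    # Assumed package name for the platform's own classes.
    org.opendatadiscovery.oddplatform: info
```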

Setting the logging level to info surfaces useful messages about the platform's operation without the excess detail of trace or debug, and without losing the context that warn or higher levels suppress. Adjust the level as needed to get more or less information.

GenAI Configuration

The platform can proxy natural-language questions to an external AI service via three keys under the genai prefix (@ConfigurationProperties("genai") per GenAIProperties.java). The feature is disabled by default and is API-only today (no in-app UI affordance calls the endpoint).

  • genai.enabled (boolean, env GENAI_ENABLED) — feature toggle. Default false (set explicitly at application.yml line 18). When false, POST /api/genai/ask returns HTTP 400 with the message "Gen AI is disabled". Feature toggling: this value is captured at JVM boot — restart the JVM process for a change to take effect; runtime configuration changes are not honoured. See Features → Data Collaboration for the platform-wide boot-immutability caveat.

  • genai.url (string, env GENAI_URL) — base URL of the external AI service. The platform's genAiWebClient is built at startup with this as baseUrl and POSTs each request to {genai.url}/query_data. No @ConfigurationProperties default — the field has no initializer in GenAIProperties.java, so its Java default is null. The example in application.yml line 19 (# url: http://localhost:5000) is commented out, not a default.

  • genai.request_timeout (integer, env GENAI_REQUEST_TIMEOUT) — outbound response timeout, in minutes. Wired into WebClientConfiguration.java:23 as Duration.ofMinutes(genAIProperties.getRequestTimeout()). No @ConfigurationProperties default — the Java primitive int default is 0, which means immediate timeout. The example in application.yml line 20 (# request_timeout: 2) is commented out, not a default.

A working configuration block:
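
The sketch below enables the feature and sets both keys that have no usable default. The URL and timeout are the values from the commented-out examples in application.yml cited above; the URL assumes an AI service listening on localhost port 5000.

```yaml
genai:
  enabled: true
  url: http://localhost:5000   # requests are POSTed to {genai.url}/query_data
  request_timeout: 2           # minutes; leaving it unset yields 0, an immediate timeout
```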

The platform sends no authentication to the external AI service and does not retry. See the dedicated GenAI assistant page for the external service contract (POST /query_data with JSON {"question": "..."}), the platform's /api/genai/ask request/response schemas, and the per-error behavior.

Machine-to-Machine (M2M) Tokens Configuration

ODD Platform supports a static API-key authentication mode for non-UI callers (CI/CD jobs, ingestion pipelines, automation scripts) — also referred to as Machine-to-Machine (M2M) tokens. It is disabled by default.

For the full configuration keys, the header contract, the curl example, and security considerations (token rotation, HTTPS, blast radius), see Server-to-server (S2S) authentication.
