Configure ODD Platform
This section defines how to configure ODD Platform in order to leverage all of its functionality and features.
This page is the post-deployment configuration reference for the running Platform — every application.yml key the Platform consumes. For the deployment path itself (Docker Compose, Helm, AWS EKS, build from source), start at Deployment Options.
Configuration approaches
There are two ways to configure the Platform:
Environment variables, which suit simple entries.
YAML, which comes in handy when you need to define a complex configuration block (e.g., OAuth2 authentication or logging levels).
YAML entries VS environment variables
Here is an example of how to define the following block and configure the Platform with it using environment variables.
YAML:
spring:
  datasource:
    url: URL
    username: USERNAME
    password: PASSWORD
  custom-datasource:
    url: URL
    username: USERNAME
    password: PASSWORD
To configure the Platform using environment variables, join the nested key names with underscores (hyphens also become underscores) and uppercase the result, like so:
SPRING_DATASOURCE_URL=URL
SPRING_DATASOURCE_USERNAME=USERNAME
SPRING_DATASOURCE_PASSWORD=PASSWORD
SPRING_CUSTOM_DATASOURCE_URL=URL
SPRING_CUSTOM_DATASOURCE_USERNAME=USERNAME
SPRING_CUSTOM_DATASOURCE_PASSWORD=PASSWORD
Connect your database
ODD Platform uses PostgreSQL, and only PostgreSQL, for all of its features. Define these variables to connect ODD Platform to the database:
spring.datasource.url: JDBC string of your PostgreSQL database. Default value is jdbc:postgresql://127.0.0.1:5432/odd-platform
spring.datasource.username: your PostgreSQL user's name. Default value is odd-platform
spring.datasource.password: your PostgreSQL user's password. Default value is odd-platform-password. Override this before any non-localhost deployment; see Management endpoint exposure and credential hygiene for why overriding the shipped default is the operator's responsibility.
These variables are optional and will be used to connect to PostgreSQL and store Lookup Tables. Each of the three keys is declared in R2DBCConfiguration as @Value("${spring.custom-datasource.X:}") — the trailing colon with no value means the @Value default is the empty string, not the JDBC URL / username / password values listed below. When a key is unset (or blank), the bean factory falls back to the corresponding primary spring.datasource.* value at startup. The values below are therefore the fallback an operator observes with a default deployment, not the spring.custom-datasource.* keys' own defaults — so overriding spring.datasource.url will also change what spring.custom-datasource.url resolves to:
spring.custom-datasource.url: JDBC string of the PostgreSQL database where Lookup Tables are stored. Falls back to spring.datasource.url when unset; the platform's primary spring.datasource.url default is jdbc:postgresql://127.0.0.1:5432/odd-platform. Note: you can specify any {database_host}, {database_port} or {database_name}, but the schema where Lookup Tables are stored is always lookup_tables_schema.
spring.custom-datasource.username: your PostgreSQL user's name for custom-datasource. Falls back to spring.datasource.username when unset; the platform's primary spring.datasource.username default is odd-platform.
spring.custom-datasource.password: your PostgreSQL user's password for custom-datasource. Falls back to spring.datasource.password when unset; the platform's primary spring.datasource.password default is odd-platform-password.
Your database connection block would then look like this:
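A minimal sketch of such a block, using only the keys and shipped defaults listed above. The commented-out custom-datasource entries illustrate the fallback behavior described earlier; treat the values as placeholders to replace, not as recommended settings.

```yaml
spring:
  datasource:
    # Shipped defaults as documented above; override all three outside localhost.
    url: jdbc:postgresql://127.0.0.1:5432/odd-platform
    username: odd-platform
    password: odd-platform-password
  # Optional. When these keys are unset or blank, the platform falls back
  # to the spring.datasource.* values above for Lookup Tables storage.
  # custom-datasource:
  #   url: jdbc:postgresql://127.0.0.1:5432/odd-platform
  #   username: odd-platform
  #   password: odd-platform-password
```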
Security
Please follow the Enable security section for enabling security in ODD Platform.
Management endpoint exposure and credential hygiene
The platform's Spring Boot Actuator endpoints (/actuator/**) are intentionally whitelisted ahead of the authentication chain in every auth.type, and the shipped configuration enables the env and info endpoints. The shipped database password is a well-known string. Together these defaults turn a default deployment into a one-line-away-from-full-PostgreSQL-compromise system if exposed on a non-trusted network. The mitigations below are the operator's responsibility today. For the monitoring use of these endpoints — wiring liveness/readiness probes to /actuator/health and scraping /actuator/prometheus — see Health and monitoring.
/actuator/** is anonymously reachable in every auth mode
SecurityConstants.WHITELIST_PATHS contains /actuator/**, so the endpoints are reachable before the auth chain runs in DISABLED, LOGIN_FORM, OAUTH2, and LDAP alike. The shipped application.yml enables management.endpoint.env.enabled=true but sets no management.endpoint.env.show-values, so the Spring Boot default (NEVER) applies: /actuator/env redacts every property value (******) for every caller, authenticated or not, including spring.datasource.url. What an unauthenticated caller scraping /actuator/env does learn is the configuration-key schema, that is, which keys and property sources are present: which OAuth2 providers are wired (by their key prefixes), whether LDAP is configured, whether REMOTE attachment storage is set up, and that a JDBC datasource is configured (the key, not its value). Values stay masked unless an operator sets show-values to WHEN_AUTHORIZED or ALWAYS; the exposure to mitigate is the unauthenticated reachability of the endpoint and the configuration schema it reveals.
Apply at least one of the mitigations below for any deployment reachable from outside a fully-trusted network:
Separate management port: set management.server.port: 8081 and route :8081 only on your internal management network. Applies to all production deployments.
Firewall the actuator path: add a reverse-proxy rule rejecting /actuator/** from the public CIDR range; allow only your monitoring network. Applies to single-port deployments where a separate management port is infeasible.
Restrict default exposure: set management.endpoints.web.exposure.include: health,prometheus (drop env, info). Applies to all production deployments; combine with one of the network-level mitigations above.
A platform-side default-restriction is tracked upstream; until it ships, do not rely on the platform default for any reachable deployment.
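As a rough illustration, the two application-level mitigations above (separate management port and restricted exposure) could be combined in one YAML fragment. Both keys are standard Spring Boot Actuator properties named in this section; the port value 8081 is the example used above, not a requirement.

```yaml
management:
  server:
    # Serve actuator endpoints on a separate port that is routed
    # only on the internal management network.
    port: 8081
  endpoints:
    web:
      exposure:
        # Expose only what monitoring needs; env and info are dropped.
        include: health,prometheus
```

The reverse-proxy rule for /actuator/** still has to be configured outside the platform, since it lives in the proxy rather than in application.yml.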
The database password ships with a well-known default
application.yml ships spring.datasource.password: odd-platform-password as the default. An operator deploying ODD without explicitly overriding the property deploys with a public, documented credential. (The JDBC URL value is masked at /actuator/env under the default show-values: NEVER described above — but the password itself is public, and a database whose host is co-located with or guessable from the deployment topology is then one well-known credential away from compromise.) Override spring.datasource.password (and spring.custom-datasource.password if spring.custom-datasource.* is configured separately from the primary datasource) before exposing the platform on any non-localhost network. This is the same class of silent-insecure-default risk that previously affected attachment storage on container restart — read once, override before deployment, never assume the shipped default is safe.
Configuration-properties classes include credentials in toString()
ODDLDAPProperties and ODDOAuth2Properties.OAuth2Provider carry Lombok's @Data annotation alongside their password and clientSecret fields, respectively. @Data generates a toString() that includes every field verbatim; there is no @ToString.Exclude on any credential field today. A future log statement (log.info("loaded properties: {}", properties)) or an exception handler that emits properties on boot failure would write LDAP passwords and OAuth client secrets in cleartext to log infrastructure. Treat the platform's application logs as credential-sensitive: route them to an audit-grade log sink, redact at the log-pipeline tier if you cannot guarantee end-to-end access control, and do not store them in unrestricted long-term archives. A platform-side @ToString.Exclude rollout across credential fields is tracked upstream.
Select session provider
ODD Platform stores HTTP session state in one of three places: the platform JVM (in-memory), the platform's PostgreSQL database, or an external Redis data store. The provider is selected with session.provider (SESSION_PROVIDER env var) and accepts one of three values:
IN_MEMORY: sessions live in a ConcurrentHashMap inside the JVM. ODD Platform defaults to this value.
INTERNAL_POSTGRESQL: sessions are persisted to the platform's PostgreSQL database (SPRING_SESSION / SPRING_SESSION_ATTRIBUTES tables).
REDIS: sessions are persisted to an external Redis data store via Spring Session's @EnableRedisWebSession.
Quick selection guidance:
Single-instance deployment where logging users out on restart is acceptable → IN_MEMORY
Multi-instance deployment, or persistence across restarts is required → INTERNAL_POSTGRESQL (no extra infrastructure) or REDIS (if you already operate Redis or need sub-millisecond session reads)
Each provider has operator-visible characteristics that affect sizing, multi-instance behavior, and connection wiring. Read the relevant subsection before deploying.
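For orientation, selecting a provider is a single key. The sketch below picks INTERNAL_POSTGRESQL purely as an example; the equivalent environment variable is SESSION_PROVIDER, as noted above.

```yaml
session:
  # One of IN_MEMORY (default), INTERNAL_POSTGRESQL, REDIS.
  provider: INTERNAL_POSTGRESQL
```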
IN_MEMORY (default)
Sessions are kept in a ConcurrentHashMap inside the platform JVM, wrapped by Spring Session's ReactiveMapSessionRepository. Suitable for local development and single-instance evaluations where session loss on restart is acceptable.
Characteristics & caveats
Sessions are lost on every platform restart. The session map lives in heap; any restart (deploy, crash, container recycle) clears it and forces every authenticated user to log in again.
No multi-instance support. Two ODD Platform instances behind a load balancer each maintain a separate session map. A request that lands on a different instance than the one that authenticated the user appears unauthenticated. Collector data-source registration is especially affected: the /ingestion/datasources filter writes a collectorId into the request's session, and the subsequent POST /ingestion/datasources handler reads it back; if the two requests hit different replicas, the handler raises an IllegalStateException("Collector id is null") returned to the collector as HTTP 500. For multi-replica deployments choose INTERNAL_POSTGRESQL or REDIS.
Eviction is by Spring Session expiry only. The repository wraps a raw ConcurrentHashMap with no secondary eviction policy (no LRU, no max-entries cap). A long-running platform with many short-lived sessions accumulates map entries until each entry's TTL elapses; high-traffic deployments running with the shipped default spring.session.timeout: -1 (no timeout) accumulate sessions indefinitely. Set a finite spring.session.timeout (see Session lifetime below) to bound the in-memory footprint.
INTERNAL_POSTGRESQL
Sessions are persisted in the platform's own PostgreSQL database, in the SPRING_SESSION and SPRING_SESSION_ATTRIBUTES tables. ODD Platform implements a custom JOOQ-based reactive JooqSessionRepository for this provider, so the standard spring.session.jdbc.* Spring Session keys do not apply. Connection settings reuse the existing platform spring.datasource.* configuration; no additional database wiring is required.
Characteristics & caveats
Sessions survive platform restarts. Authenticated users remain logged in across deploys (until their session row's TTL has passed).
Multi-instance support. All ODD Platform instances point at the same database, share the session tables, and can serve requests for any authenticated user regardless of which instance answered the original login.
Expired-session cleanup runs hourly and is not configurable. A @Scheduled(fixedRate = 1, timeUnit = HOURS) housekeeping job (PostgreSQLSessionHousekeepingJobHandler.deleteExpiredSessions) deletes rows whose EXPIRY_TIME is in the past from both SPRING_SESSION and SPRING_SESSION_ATTRIBUTES. Expired session rows therefore remain in the tables for up to one hour past their TTL before being cleaned. The cadence is hardcoded; there is no config key to tune it.
Sizing implication. When sizing the database (connection pool, disk, vacuum schedule), assume the session tables hold the high-water-mark count of authenticated users plus up to one hour of post-expiry stragglers. For high-cardinality / short-TTL deployments (many users, short spring.session.timeout), the post-expiry overhang can dominate steady-state row count.
REDIS
Sessions are persisted to an external Redis data store via Spring Session's @EnableRedisWebSession. Suitable for multi-instance deployments that already operate Redis, or that need sub-millisecond session reads. ODD Platform does not bundle Redis; the operator must provide a Redis 6+ instance and supply its connection settings under the spring.data.redis.* namespace (Spring Boot 3.x; the legacy spring.redis.* prefix from Spring Boot 2.x has been removed and will not bind).
Characteristics & caveats
Sessions survive platform restarts and span instances: same persistence behavior as INTERNAL_POSTGRESQL, but reads and writes happen against Redis directly.
Connection wiring is operator-supplied. Unlike INTERNAL_POSTGRESQL (which reuses the platform's existing PostgreSQL connection), Redis settings must be configured separately. ODD Platform's application.yml ships no Redis defaults; every operator deploying with REDIS must set at least the host and port, plus credentials and TLS for any production deployment.
TLS, pool sizing, and command timeouts inherit Spring Data Redis defaults unless explicitly overridden. For managed Redis providers (AWS ElastiCache, Redis Cloud, Azure Cache for Redis) and any TLS-required Redis deployment, set spring.data.redis.ssl.enabled: true. For high-concurrency deployments, tune the Lettuce connection pool with spring.data.redis.lettuce.pool.*.
Eviction is delegated to Redis. ODD Platform does not run a housekeeping job for Redis-stored sessions; the Redis server's own per-key TTL and maxmemory-policy govern session eviction. Configure your Redis instance accordingly.
The health endpoint is blind to Redis by default. With REDIS selected, every authenticated request depends on Redis — but the bundled configuration ships management.health.redis.enabled: false, and the REDIS session wiring registers no health contributor of its own. A Redis outage (server down, unreachable, or evicting under maxmemory) therefore returns errors to every logged-in user while /actuator/health keeps reporting UP — a load balancer or Kubernetes readiness probe pointed at it keeps routing traffic to a platform that cannot serve a single authenticated request. If you deploy with REDIS, set management.health.redis.enabled: true so the Redis indicator participates in the health verdict, and do not rely on a bare /actuator/health probe alone to detect a session-store outage.
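A short sketch of the recommendation above: select the REDIS provider and turn the Redis health indicator on in the same change. Both keys appear in this section; nothing else is assumed.

```yaml
session:
  provider: REDIS
management:
  health:
    redis:
      # Shipped as false; enabling it lets a Redis outage
      # show up in the /actuator/health verdict.
      enabled: true
```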
Required and optional connection keys (Spring Boot 3.x — spring.data.redis.*)
spring.data.redis.host: Redis host. Defaults to localhost.
spring.data.redis.port: Redis port. Defaults to 6379.
spring.data.redis.username: Redis ACL username. Optional; omit for password-only or no-auth Redis.
spring.data.redis.password: Redis password. Optional but recommended for any production deployment.
spring.data.redis.database: Redis logical database index. Defaults to 0.
spring.data.redis.ssl.enabled: enable TLS for the Redis connection. Boolean, defaults to false. Set to true for any managed-Redis or TLS-terminated Redis deployment.
spring.data.redis.timeout: command timeout. Duration string (for example 5s). Defaults to Spring Data Redis's internal default.
spring.data.redis.lettuce.pool.*: Lettuce connection-pool sizing (max-active, max-idle, min-idle, max-wait). Optional; tune for high-concurrency deployments.
ODD Platform does not extend or override Spring Boot's Redis property catalogue — the full set of keys recognized under spring.data.redis.* in your Spring Boot version applies as-is.
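The connection keys listed above might be combined as follows for a TLS-enabled Redis. This is a sketch only: the host name and the REDIS_PASSWORD environment variable are hypothetical placeholders, and the timeout is the example value from the list rather than a tuned recommendation.

```yaml
spring:
  data:
    redis:
      host: redis.internal.example    # hypothetical host name
      port: 6379
      password: ${REDIS_PASSWORD}     # hypothetical env var holding the secret
      database: 0
      ssl:
        enabled: true                 # required for managed or TLS-only Redis
      timeout: 5s                     # command timeout, example value
```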
spring.redis.* (the Spring Boot 2.x prefix) is silently ignored. Spring Boot 3.x removed the spring.redis.* namespace and relocated all Redis properties under spring.data.redis.*. Configuration written against the older prefix will not bind, the platform falls back to localhost:6379 defaults, and the symptom is connection failures against your real Redis instance with no obvious "wrong key" error. Migrate any pre-3.x configuration to spring.data.redis.* (and SPRING_DATA_REDIS_* for env vars).
Session lifetime (spring.session.timeout)
Spring Session's timeout controls how long an authenticated session remains valid between requests. ODD Platform's shipped default is -1, which means sessions never expire.
spring.session.timeout: -1 means sessions never expire. A user who logs in once remains authenticated until their session record is explicitly invalidated (logout, cache eviction, or — for IN_MEMORY — platform restart). For any deployment that is internet-facing or serves multiple users, set spring.session.timeout to a finite duration so stolen cookies and forgotten sessions eventually lapse.
spring.session.timeout: session idle timeout. Duration string (for example 30m, 8h, 1d). Defaults to -1 (no timeout). Applies to all three providers (IN_MEMORY, INTERNAL_POSTGRESQL, REDIS).
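For example, a finite idle timeout could be set like this. The 8h value is one of the example durations above, chosen here only for illustration; pick a duration that matches your own security policy.

```yaml
spring:
  session:
    # Idle timeout; replaces the shipped default of -1 (never expire).
    timeout: 8h
```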
Cookie attributes (Secure, SameSite, HttpOnly)
ODD Platform does not stamp Secure, SameSite, or HttpOnly attributes on the session cookie at the application tier; there is no CookieWebSessionIdResolver bean in the platform's session configuration today. The browser-side cookie posture is therefore whatever Spring's defaults for the SESSION cookie are (no Secure, no SameSite directive, HttpOnly set), which is unsuitable for any internet-facing deployment.
Operators must stamp the production attributes at the deployment topology layer — typically the TLS-terminating reverse proxy or load balancer. For nginx, the directive looks like:
Match the equivalent for your ingress controller (Traefik, Envoy, Cloud Load Balancer, etc.). Until a platform-side default-stamping bean ships upstream, this stamping is the operator's responsibility — running ODD over plain HTTP or behind a permissive proxy means the session cookie travels in clear and is vulnerable to cross-site-request and cookie-leak attacks regardless of which auth.type is configured.
Java-serialised session attributes under INTERNAL_POSTGRESQL
The INTERNAL_POSTGRESQL provider stores session attribute values as raw bytes produced by Java's native SerializationUtils.serialize / .deserialize. Java native serialisation has a well-known deserialisation-gadget surface: code paths reachable on attribute load are influenced by the byte stream, so a write-access compromise of the SPRING_SESSION_ATTRIBUTES table yields a deserialisation entry point on the next session read.
Defence-in-depth recommendations for deployments running INTERNAL_POSTGRESQL:
Restrict write access to the SPRING_SESSION_ATTRIBUTES table to a single platform service account; do not share database credentials with other applications that store data in the same Postgres instance.
Deploy the platform's PostgreSQL with strong network segmentation; the database should not be reachable from any service except the platform itself.
If you cannot guarantee write-access isolation, prefer the REDIS provider, since Spring Session's Redis serialiser uses a string-key Jackson JSON serialiser rather than Java native serialisation.
A platform-side migration to JSON serialisation for session attributes is tracked upstream.
Enable Metrics
ODD Platform can represent some of the metadata it ingests as time-series charts — for example, row counts on a MySQL table or the on-disk size of a Redshift database. Metrics handling splits into two independent concerns that share the metrics.* config namespace but do different jobs:
Storage (metrics.storage): the storage tier the platform uses for ingested metrics. This selects where the platform writes metric points as they arrive from collectors and where it reads them back when rendering UI charts. Both directions hit the same backend; you cannot write to one and read from another.
Export (metrics.export.*): where the platform pushes metrics out as OpenTelemetry telemetry, for long-term retention and dashboarding in your observability stack.
Configure the two independently; it is valid (and common) to run with INTERNAL_POSTGRES storage and no OTLP export, or with PROMETHEUS storage and OTLP export disabled, or any other combination.
Metric storage backend
metrics.storage selects the storage tier for metric writes and reads:
INTERNAL_POSTGRES (default): metrics are written to and read from the ODD Platform's own PostgreSQL database (metric_series / metric_point tables). Zero additional infrastructure; suitable for most single-cluster deployments.
PROMETHEUS: metrics are remote-written to an external Prometheus instance (via the Prometheus remote-write protocol at /api/v1/write, using Snappy-compressed Protobuf-encoded write requests) and queried from the same instance (via the instant-query API at /api/v1/query). Suitable when you already run Prometheus for observability and want to avoid storing duplicate metric data in ODD's PostgreSQL.
metrics.prometheus-host is the base URL of the Prometheus instance and is only consulted when metrics.storage=PROMETHEUS. Both /api/v1/write and /api/v1/query are called on this single host. Defaults to http://localhost:9090.
metrics.storage=PROMETHEUS requires metrics.prometheus-host to be set. The platform validates this at startup — if metrics.prometheus-host is empty (or unset) while metrics.storage=PROMETHEUS, ODD Platform fails to start with IllegalStateException: Prometheus host is not defined. Set it to the Prometheus base URL (for example http://prometheus:9090) in the same configuration change that flips the storage backend.
The Prometheus instance must accept remote-write AND queries on the same endpoint. ODD Platform does not support splitting read and write paths across different hosts.
Prometheus server flag: --web.enable-remote-write-receiver must be enabled on the Prometheus process. It is disabled by default in Prometheus v2.33+; without it, every ODD Platform metric write returns 404 Not Found and is silently dropped. The ingestion API still returns 200 to the collector because the remote-write happens downstream of the HTTP acknowledgement, so collector logs will not surface the failure; the symptom is empty charts in the UI.
Endpoint must support both paths: POST /api/v1/write (for writes) and GET /api/v1/query (for reads) must both resolve to the same Prometheus-compatible host.
Read-only Prometheus-compatible backends do not work. A Thanos querier, Mimir in query-only mode, or any other backend that exposes /api/v1/query but rejects /api/v1/write cannot be used as a metrics.storage=PROMETHEUS target. Point metrics.prometheus-host at the write-accepting Prometheus instance itself (or at a Mimir distributor that terminates both paths).
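Putting the requirements above together, switching to Prometheus storage could look like the fragment below. The host value reuses the example URL from this section; it stands in for whichever write-accepting Prometheus instance you operate.

```yaml
metrics:
  storage: PROMETHEUS
  # Must be set together with storage: PROMETHEUS, otherwise startup fails.
  # Both /api/v1/write and /api/v1/query are called on this base URL.
  prometheus-host: http://prometheus:9090
```

The --web.enable-remote-write-receiver flag is set on the Prometheus process itself, so it does not appear in the platform's application.yml.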
Multi-tenant deployments cannot share an INTERNAL_POSTGRES instance — the default backend has no tenant column. The odd.tenant-id configuration is only appended to Prometheus series (see Prometheus tenant label below); on INTERNAL_POSTGRES the metric tables (metric_series, metric_point, metric_entity) have no tenant_id column at all. Two ODD Platform deployments writing to the same Postgres instance see each other's metrics on every entity's Metrics tab — there is no platform-side filter. If your deployment needs metric isolation across tenants, choose PROMETHEUS storage and configure odd.tenant-id per deployment, or run each deployment against its own Postgres instance / schema. The same class of silent-default risk that previously affected attachment storage on container restart applies here. The operator-facing framing of this caveat — including the workflow guidance for choosing between the two backends — is on Active platform features → Metrics Ingestion.
Switching metrics.storage after a deployment has been live is one-way — historical metric data does not migrate. The two storage backends are independent stores; the platform writes to whichever is configured and reads from the same one. After a switch (either direction), the previously-stored history remains in the old backend but is no longer queryable from the platform UI or API. Plan storage-backend changes as one-time cutovers and annotate the cutover date in your runbook — the Metrics tab on each entity will show no data older than the switch. Operator-facing framing on Active platform features → Metrics Ingestion.
Metric export to OTLP
Independent of where metrics are stored, ODD Platform can push metrics as OpenTelemetry telemetry to an OTLP collector. Downstream you can forward that stream to Prometheus, New Relic, or any backend that accepts OTLP exporters.
metrics.export.enabled: must be set to true to build and wire the OTLP exporter bean. Defaults to false.
metrics.export.otlp-endpoint: OTLP collector endpoint (gRPC). Defaults to http://localhost:4317.
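A minimal sketch of enabling the export, assuming an OTLP collector is reachable over gRPC. The otel-collector host name is a hypothetical placeholder; only the two keys come from the list above.

```yaml
metrics:
  export:
    enabled: true
    # gRPC endpoint of your OTLP collector (host name is hypothetical).
    otlp-endpoint: http://otel-collector:4317
```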
Enable Alert Notifications
Any alert that is created inside the platform can be sent via webhook and/or Slack incoming webhook and/or email notifications (via Google SMTP, AWS SMTP, etc). Such notifications contain information such as:
Name of the entity upon which alert has been created
Data source and namespace of an entity
Owners of an entity
Possibly affected entities
ODD Platform's outbound notification delivery tails the alert table through a PostgreSQL logical-replication slot. Because the slot durably tracks its position in the write-ahead log, delivery resumes from the last unprocessed alert after a platform restart or a transient interruption of the database connection — alerts raised during the downtime are delivered once delivery catches up, not dropped. Alert creation itself is a plain database insert and does not depend on replication; this prerequisite applies only to outbound notification delivery. To enable it, the underlying PostgreSQL database must be configured for logical replication.
For the user-facing description of the alerting feature — alert types, the per-entity alert tabs, the lifecycle, and per-entity halt configuration — see Active platform features → Alerting. For the user-facing description of the outbound notification channels (Slack incoming webhook, email, generic webhook) and the Prometheus AlertManager inbound webhook, see Active platform features → Notifications.
Slack here is the outgoing alert webhook, not the Discussions Slack app. The alert-notifications integration is a one-way Slack incoming webhook — the platform POSTs alert messages to a channel via notifications.receivers.slack.url. It is distinct from the full Slack app used by Data Collaboration for in-app per-entity discussion threads (OAuth + Events API; bidirectional). Each integration is configured separately: enabling the alert webhook does not surface the Discussions tab on data-entity pages, and enabling Data Collaboration does not route alerts. See Main Concepts → Terms & Aliases for the side-by-side comparison.
PostgreSQL Configuration
The PostgreSQL database must be configured to support the Platform's replication mechanism, and the database user must be granted replication permissions.
Database settings
To configure the database, add entries to the postgresql.conf file that set wal_level to logical and keep max_wal_senders and max_replication_slots at 1 or higher.
Or if the replication mechanism is already configured, just increment the max_wal_senders and max_replication_slots numbers.
Database user permissions
The ODD Platform database user must be granted replication permissions, for example with ALTER ROLE {your_database_username} WITH REPLICATION;
User permissions and database configuration may vary from one on-demand/cloud provider to another.
For instance, in AWS RDS, PostgreSQL instances are managed services where certain aspects of replication management are automated. This is done to minimize the risk of misconfiguration. Due to this managed nature, some settings are either not exposed or are altered differently compared to a standard PostgreSQL setup. To enable notifications in such an environment, follow these steps (only differences are mentioned):
1. Alter the rds.logical_replication parameter in your database instance's Parameter Group by setting it to 1, instead of directly modifying the wal_level parameter.
2. Ensure the ODD user connecting to the database has the rds_replication role. The Master username of the database typically already has this role by default. If using a different username, you may need to assign the necessary role using the command GRANT rds_replication TO {your_database_username};
3. If you changed max_wal_senders to 5 (mentioned as the minimal value in the Parameter Group) and then keep getting messages like "The parameter max_wal_senders was set to a value incompatible with replication. It has been adjusted from 5 to 55" in the events list of the database instance, consider setting the parameter in the parameter group to the value mentioned in the message, so that RDS no longer changes it automatically.
ODD Platform configuration
The following variables need to be defined:
notifications.enabled: must be set to true. Defaults to false. Feature toggling: this value is captured at JVM boot and frozen for the lifetime of the process; restart the JVM for a change to take effect. The same boot-immutable pattern applies to every platform-feature flag in this document; see Features → Data Collaboration for the catalogue and the chrome-invariance framing.
notifications.message.downstream-entities-depth: limits how many lineage-graph levels of affected data entities are fetched. Defaults to 1.
notifications.wal.advisory-lock-id: ODD Platform uses a PostgreSQL advisory lock to make sure that, when scaled horizontally, only one instance of the Platform processes alert messages. This setting defines the advisory lock id. Defaults to 100.
notifications.wal.replication-slot-name: PostgreSQL replication slot name; the slot is created if it doesn't exist yet. Defaults to odd_platform_replication_slot.
notifications.wal.publication-name: PostgreSQL publication name; the publication is created if it doesn't exist yet. Defaults to odd_platform_publication_alert.
notifications.receivers.slack.url: Slack incoming webhook URL. The clickable links rendered inside Slack messages use odd.platform-base-url; there is no notifications.receivers.slack.* base-URL setting.
notifications.receivers.webhook.url: generic webhook URL.
notifications.receivers.email.host: the SMTP server.
notifications.receivers.email.port: the port used for the email protocol (SMTP, IMAP, or POP3).
notifications.receivers.email.protocol: the email transport protocol. Use the lowercase value smtp; any other value (including uppercase SMTP) silently disables STARTTLS and SMTP AUTH, see the caveat below.
notifications.receivers.email.smtp.auth: a boolean value (true or false) indicating whether the SMTP server requires authentication.
notifications.receivers.email.smtp.starttls: a boolean indicating whether to use STARTTLS, a security protocol that upgrades an unencrypted connection to an encrypted one.
notifications.receivers.email.password: the password used for email authentication.
notifications.receivers.email.sender: the email address sending the notifications.
notifications.receivers.email.notification.emails: the list of recipients for the email notifications.
A generic-webhook receiver must respond HTTP 200 — 201 / 202 / 204 are treated as a delivery failure and the alert is dropped. The platform's webhook sender treats any response status other than exactly 200 OK as a failed delivery, so a receiver that returns 202 Accepted (a common async-ingest convention) silently loses alerts with no operator-visible cause. Configure the endpoint behind notifications.receivers.webhook.url to return 200 on accept.
The email protocol value must be the lowercase string smtp for STARTTLS and SMTP AUTH to engage. The platform sets mail.smtp.auth and mail.smtp.starttls.enable only when notifications.receivers.email.protocol equals smtp exactly. Any other value — including uppercase SMTP — takes a fall-through branch that sets neither, so authentication and STARTTLS never engage and credentials can transit unauthenticated and unencrypted, with no boot warning. Always configure protocol: smtp (lowercase).
odd.platform-base-url
ODD Platform URL exposed to three internal consumers: the Slack-notification sender, the email-notification sender, and the integration-parameter substitution context. The two notification senders use it to build clickable links inside alert messages (the generic webhook receiver does not consume this key; it gets the full alert payload directly and is expected to construct any URLs it needs from that payload). The platform also substitutes the resolved value as the platform_url parameter in integration configurations; this is how Airflow plugins, dbt artifacts, and similar integrations resolve their reference to the ODD Platform URL at runtime. Defaults are inconsistent across consumers: the notification senders default to http://localhost:8080, while the integration-substitution context defaults to the placeholder string http://your.odd.platform. Neither default is reachable from outside the host machine; set this key to your real deployment URL (for example https://odd.your-domain.com) in any non-local environment.
Operators deploying integrations must set ODD_PLATFORM_BASE_URL even if alert notifications are disabled. The integration-parameter substitution context reads the same key to populate the platform_url parameter exposed to integration configurations. If the key is unset, integrations that reference platform_url receive the literal string http://your.odd.platform — a placeholder that will not connect to anything — and the integration will fail in confusing ways at runtime with no error from ODD Platform itself.
ODD Platform configuration would look like this:
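A sketch of such a block, assembled only from the keys listed in this section. The two receiver URLs and the base URL are placeholders rather than values shipped with the platform, and the wal values simply restate the documented defaults.

```yaml
notifications:
  enabled: true
  message:
    downstream-entities-depth: 1
  wal:
    advisory-lock-id: 100
    replication-slot-name: odd_platform_replication_slot
    publication-name: odd_platform_publication_alert
  receivers:
    slack:
      # Placeholder: paste the incoming-webhook URL Slack generated for your app.
      url: https://hooks.slack.com/services/REPLACE/WITH/YOURS
    webhook:
      # Placeholder endpoint; it must answer exactly 200 OK on accept.
      url: https://alerts.example.com/odd
odd:
  # Used for links in Slack and email alerts and for the platform_url integration parameter.
  platform-base-url: https://odd.your-domain.com
```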
Example: Gmail SMTP
A minimal, working configuration for Gmail's SMTP over STARTTLS. Gmail requires an app password (generated from your Google account with 2-Step Verification enabled) — your regular account password will not work.
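A sketch of the email receiver block for this scenario, using only keys documented above. smtp.gmail.com on port 587 is Gmail's public STARTTLS submission endpoint. The addresses and the password are placeholders, and two details are assumptions rather than confirmed platform behaviour: that the sender should be the Gmail account that owns the app password, and that recipients can be written as one comma-separated string.

```yaml
notifications:
  enabled: true
  receivers:
    email:
      host: smtp.gmail.com
      port: 587
      protocol: smtp                          # must be lowercase
      smtp:
        auth: true
        starttls: true
      sender: alerts@example.com              # placeholder; assumed to be the Gmail account itself
      password: REPLACE_WITH_APP_PASSWORD     # Google app password, not the account password
      notification:
        # Assumption: comma-separated recipients; verify the accepted format for your version.
        emails: data-team@example.com,oncall@example.com
```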
Known limitations
ODD Platform builds its JavaMailSender with only the keys documented above. The JavaMail session inherits defaults for every other SMTP parameter, and several of those defaults are operator-hostile in production deployments. None of the following is currently exposed as an ODD configuration key — where a workaround exists it is noted, but the limitations are real and should drive your choice of SMTP relay.
SMTP timeouts are unset — an unreachable SMTP server will hang notification delivery. The JavaMail defaults for mail.smtp.connectiontimeout, mail.smtp.timeout (read), and mail.smtp.writetimeout are infinite. If the configured SMTP host is unreachable, slow, or stalls mid-response, the notification thread blocks until the TCP stack eventually tears the connection down — there is no application-level timeout to cut it short. Use an SMTP relay you control (or a trusted managed service) and monitor its availability separately from ODD Platform.
Only STARTTLS is supported — implicit-TLS ports (e.g. Gmail port 465, many corporate relays) will not work. ODD Platform exposes notifications.receivers.email.smtp.starttls but does not expose mail.smtp.ssl.enable, which is the JavaMail flag required to open an implicit-TLS connection. If your SMTP server only accepts connections on an implicit-TLS port, you must front it with a STARTTLS-capable relay (port 587 is the common choice). Gmail over port 587 with STARTTLS (the example above) works; Gmail over port 465 does not.
Self-signed or internal-CA SMTP certificates require a JVM-level workaround. mail.smtp.ssl.trust is not exposed as an ODD configuration key. If your SMTP relay presents a certificate signed by a private CA, the connection will fail certificate validation unless you either (a) add the CA to the JVM truststore of the ODD Platform container ($JAVA_HOME/lib/security/cacerts or a -Djavax.net.ssl.trustStore=... override) before starting the process, or (b) use an SMTP relay with a publicly-trusted certificate. There is no configuration-file path to this.
Non-ASCII subjects and bodies may be mangled. The MIME message is built without an explicit charset, so JavaMail falls back to the JVM default. Containers that do not set file.encoding or LANG explicitly can end up with US-ASCII defaults, which corrupt non-Latin alert content. If your alert text includes non-ASCII characters, set JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8 on the ODD Platform container.
Silent partial delivery: if one recipient fails, subsequent recipients are skipped. EmailNotificationSender iterates over the recipient list in notifications.receivers.email.notification.emails and calls the SMTP server once per recipient. If recipient N fails (bad address, mailbox full, server-side policy rejection), the exception is wrapped as a RuntimeException and the loop terminates — recipients N+1, N+2, … never receive the alert. There is no retry and no partial-failure metric. Keep the recipient list short, use distribution lists on the SMTP side for fan-out, and validate addresses before adding them to the list.
Cleaning up
ODD Platform does not clean up the replication slot it has created. If you need to disable the Alert Notification functionality, perform the following steps in addition to disabling the feature on the ODD Platform side.
To remove the replication slot and the publication, SQL statements that drop both must be run against the database. In PostgreSQL these are SELECT pg_drop_replication_slot('<replication_slot_name>'); and DROP PUBLICATION <publication_name>;
where
<replication_slot_name> is the name of the replication slot defined in ODD Platform. Default is odd_platform_replication_slot
<publication_name> is the name of the publication defined in ODD Platform. Default is odd_platform_publication_alert
Prometheus AlertManager Integration
In addition to raising alerts internally (failed jobs, data-quality tests, schema changes, distribution anomalies — see the Alerting feature), ODD Platform exposes an inbound webhook that accepts Prometheus AlertManager notifications. Each inbound alert becomes a Distribution Anomaly alert on the referenced data entity, visible in the Alerts section and on the entity's page.
Endpoint
POST /ingestion/alert/alertmanager
Response: 204 No Content on success. The endpoint consumes the AlertManager webhook body and always returns an empty response.
Payload shape
The platform accepts a subset of the AlertManager webhook schema — specifically alerts[].labels, alerts[].generatorURL, and alerts[].startsAt. Other top-level AlertManager fields (version, status, receiver, groupLabels, commonLabels, …) are accepted and ignored.
The entity_oddrn label is required for the alert to route to a data entity. ODD Platform reads alerts[].labels["entity_oddrn"] to determine which data entity the alert belongs to. An alert submitted without this label is stored with an empty owner, will not appear on any entity's page, and is effectively orphaned. Configure your AlertManager route or your alerting rules to include the target entity's ODDRN as a label.
Example AlertManager receiver configuration
A minimal alertmanager.yml receiver forwarding every alert to ODD Platform:
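A sketch of such a receiver, assuming the platform is reachable from AlertManager at http://odd-platform:8080 (a hypothetical in-cluster address). The receiver name is likewise illustrative; route, receivers, and webhook_configs are standard AlertManager configuration fields.

```yaml
route:
  # Send every alert to the single receiver defined below.
  receiver: odd-platform

receivers:
  - name: odd-platform
    webhook_configs:
      # Hypothetical host and port; the path is the platform's AlertManager webhook.
      - url: http://odd-platform:8080/ingestion/alert/alertmanager
```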
The reference example shipped with the platform is at docker/examples/config/alertmanager.yaml in the odd-platform repo. To make an alert route to a specific entity, attach entity_oddrn as a label in your Prometheus alerting rules — for example:
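A sketch of a Prometheus alerting rule that carries the label. The metric name, threshold, and ODDRN value are hypothetical; only the entity_oddrn label key comes from the platform.

```yaml
groups:
  - name: odd-platform-examples
    rules:
      - alert: OrdersRowCountDrop
        # Hypothetical metric and threshold.
        expr: orders_row_count < 1000
        for: 10m
        labels:
          # Hypothetical ODDRN; use the ODDRN of the entity as it appears in your catalog.
          entity_oddrn: "//postgresql/host/db.example.com/databases/shop/schemas/public/tables/orders"
        annotations:
          summary: Row count for the orders table dropped below the expected floor
```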
Authentication
The AlertManager webhook endpoint is not authenticated. ODD Platform whitelists the entire /ingestion/** namespace in Spring Security, and the ingestion auth filter controlled by auth.ingestion.filter.enabled only guards /ingestion/entities (POST) — it does not cover /ingestion/alert/alertmanager. Anyone with network reach to the platform can POST arbitrary AlertManager-shaped payloads and create alerts on any data entity whose ODDRN they can guess. Toggling auth.ingestion.filter.enabled has no effect on this endpoint.
Because no application-level authentication is enforced on this endpoint today, protect it at the perimeter. Any of these approaches works:
Network segmentation: expose ODD Platform only on a private network or VPN; in Kubernetes, keep AlertManager and the platform in the same cluster and use a NetworkPolicy so only the AlertManager pod can reach /ingestion/alert/alertmanager.
Reverse proxy with its own authentication: put an authenticating proxy in front of ODD Platform (for example, nginx with auth_request delegating to an SSO sidecar, or Envoy with ext_authz) and require AlertManager to present a proxy-validated credential on every webhook call.
mTLS termination: require client certificates on /ingestion/alert/alertmanager at the ingress or load balancer layer, and issue a certificate only to the AlertManager pod.
A platform-side fix to extend the ingestion auth filter to cover this endpoint is tracked upstream. Until it ships, apply one of the perimeter controls above for any deployment where the platform's network is not fully trusted.
For the broader ingestion-auth model — what auth.ingestion.filter.enabled does cover, the per-endpoint deployment matrix showing reachability under each auth.type value, and the write-shape caveats on the statistics endpoint — see Enable security and Server-to-server (S2S) API keys.
Enable Data Collaboration
The Data Collaboration feature lets users start a discussion about a specific data entity in a messenger directly from ODD Platform. Thread replies are tracked and saved by ODD Platform, so users can retrieve a conversation's context and decisions from one place.
For the user-facing description of the feature — the per-entity Discussions tab, how a discussion flows from the platform out to Slack and back, the message-lifecycle model — see Active platform features → Data Collaboration.
At the moment ODD Platform supports only Slack as a target messenger. It uses the Slack APIs to send messages and the Slack Events API to receive thread replies.
Slack here is the full Slack app for in-app discussions, not the alert webhook. The Data Collaboration integration uses an OAuth-token-driven Slack app (datacollaboration.slack-oauth-token) and the Slack Events API webhook to read replies back into the platform — bidirectional. It is distinct from the outgoing alert webhook used by alert notifications (notifications.receivers.slack.url, one-way write only). Each integration is configured separately: enabling this one does not route alerts, and enabling the alert webhook does not surface the Discussions tab on data-entity pages. See Main Concepts → Terms & Aliases for the side-by-side comparison.
Creating Slack application
Go to the Slack apps website and click on Create New App -> From an app manifest

Select a workspace you want to add an application to and click Next

Enter the following manifest into the YAML section, replace <ODD_PLATFORM_BASE_URL> with the URL of your ODD Platform deployment, and click Next.
The four bot scopes below match exactly what the platform exercises today (channels:history and channels:read for reading messages and metadata, chat:write for posting via the OAuth bot token, users:read for resolving user display names). Previous versions of this manifest also requested incoming-webhook — that scope was copy-paste leftover from a Slack example and was never used by the platform; if you are reinstalling or auditing scopes, you can safely omit it.

Review your application's scopes and permissions and click Create

Proceed with Slack's instructions on how to install the application into the workspace, and you should be good to go.
ODD Platform configuration
The following variables need to be defined:
datacollaboration.enabled: must be set to true. Defaults to false. Feature toggling: this value is captured at JVM boot and frozen for the lifetime of the process; runtime configuration changes (for example via Spring Boot Actuator's /actuator/refresh) are not reflected by the feature resolver or by the platform's feature-active endpoint. Restart the JVM process for a change to take effect. Top-level UI navigation tabs (Data Modelling and adjacent surfaces) remain visible regardless of this setting; the per-page affordances inside those tabs do honour the flag. See Features → Data Collaboration for the chrome-invariance caveat.
datacollaboration.receive-event-advisory-lock-id: PostgreSQL advisory lock id for the job that translates events from messengers into messages. Defaults to 110
datacollaboration.sender-message-advisory-lock-id: PostgreSQL advisory lock id for the job that sends messages created in the platform to messengers. Defaults to 120
datacollaboration.message-partition-period: time interval in days for a message table partition in PostgreSQL. Defaults to 30
datacollaboration.sending-messages-retry-count: how many times the Platform will attempt to send a message to the provider. Cannot be less than zero. Defaults to 3
datacollaboration.slack-oauth-token: Slack application OAuth token used for communicating with Slack. Can be retrieved in the OAuth & Permissions section of a Slack application.
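A sketch of the corresponding YAML block, built from the keys above with their documented defaults. The token value is a placeholder (Slack bot tokens start with xoxb-), and in practice it is better supplied through a secret than committed to a file.

```yaml
datacollaboration:
  enabled: true
  receive-event-advisory-lock-id: 110
  sender-message-advisory-lock-id: 120
  message-partition-period: 30        # days per message table partition
  sending-messages-retry-count: 3
  slack-oauth-token: xoxb-REPLACE-ME  # placeholder bot token
```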
Retrieving OAuth Token
Known limitations
Slack at-least-once delivery surfaces as duplicate messages
Slack's Events API retries an event delivery whenever the platform's POST /api/slack/events handler does not return a 2xx acknowledgement within roughly three seconds — the API guarantees at-least-once delivery, not exactly-once. ODD Platform does not currently deduplicate incoming events: the message_provider_event table has no UNIQUE (provider, event_id) constraint, and the INSERT in ReactiveMessageRepository.createMessageEvent issues no ON CONFLICT clause. The result is that occasional Slack retries — which happen routinely on transient network or processing delays — insert duplicate rows; the downstream processor materialises a child message row for each, so the same Slack reply can appear two or more times on the data-entity Discussions tab.
Operator-side mitigation today. Until the platform-side dedup ships upstream, audit message_provider_event for (provider, event_id) duplicates as a one-off clean-up baseline; the duplicate rows are safe to delete after confirming the downstream message rows have been similarly deduplicated. Long-term, expect the platform to add the UNIQUE constraint + ON CONFLICT DO NOTHING on the INSERT — track the upstream issue if you depend on exactly-once delivery.
Slack Events webhook has no signature verification
ODD Platform does not verify Slack's X-Slack-Signature header on incoming /api/slack/events callbacks. Any caller on the network that can reach the platform's events endpoint can submit Slack-shaped payloads and have them processed as if they came from Slack. Restrict network reach to the platform's /api/slack/events path to Slack's IP ranges at your reverse proxy, or terminate at a proxy that verifies the signature itself; a platform-side verifier is tracked upstream.
datacollaboration.message-partition-period (default 30) is read by MessageTablePartitionManager (@Value("${datacollaboration.message-partition-period:30}")) — separate from DataCollaborationProperties, which only carries the two advisory-lock IDs and the retry count. The partition manager creates a new PostgreSQL partition for the messages table every N days; lowering the value increases partition churn, raising it reduces partition count but enlarges each partition.
API surface
The full HTTP API for Data Collaboration is documented at API Reference → Data Collaboration — 7 routes across three groups (outbound to the provider, per-entity threads & history, inbound webhook from Slack), all gated by @ConditionalOnDataCollaboration and returning 404 Not Found when datacollaboration.enabled=false.
Housekeeping Settings Configuration
ODD Platform runs a background housekeeping job that permanently deletes stale data on a schedule. The job fires every 15 minutes, is guarded by a ShedLock so only one platform instance runs it at a time in a multi-instance deployment, and iterates through five cleanup tasks: resolved alerts, search-facet history, soft-deleted data entities, empty activity table partitions, and empty message table partitions. The first three consume the housekeeping.ttl.* keys below; the two partition reapers do not consume any TTL key — they drop empty past partitions when the partition-rotation orchestrator advances the partition window (see Activity-feed partitioning for the partition WIDTH key, and the Advisory-lock registry for the orchestrator's leader election).
Configuration keys
housekeeping.enabled: enables the background job. Defaults to true. See the caveat below before disabling.
housekeeping.ttl.resolved_alerts_days: how many days an alert in RESOLVED_AUTOMATICALLY status is kept after its status-update timestamp before the housekeeping job permanently deletes it (alongside its chunk records). Integer, days. Defaults to 30. Note: the retention window is intended to apply to both RESOLVED (manual) and RESOLVED_AUTOMATICALLY (system) states, but a known platform bug currently exempts manual resolutions from the retention check, so manual RESOLVED alerts are hard-deleted on the next housekeeping run regardless of this value. See Alerting → Auto-cleanup of resolved alerts for the operator-side workaround.
housekeeping.ttl.search_facets_days: how many days a saved search-facet entry is kept past its last_accessed_at timestamp before being deleted. Integer, days. Defaults to 30.
housekeeping.ttl.data_entity_delete_days: how many days a data entity with status DELETED is kept after its status-update timestamp. After this, the entity and its cascading related rows (metadata values, ownerships, lineage, tags, terms, alerts, messages, metrics, attachment files including objects in S3 / MinIO storage, task runs, group relations, and, for datasets, dataset structure and enum values) are permanently and irreversibly deleted on the next housekeeping cycle, with no restore path. Integer, days. Defaults to 30. The retention clock is the entity's status_updated_at timestamp, which the soft-delete path stamps at the moment the entity is moved to DELETED, so the key is honoured exactly as documented; a default install purges DELETED entities 30 days after deletion. See Data entity statuses → soft-delete TTL for the user-facing lifecycle (a separate, cosmetic status_updated_at mapper defect affects only non-DELETED transitions and does not change this retention behaviour).
For the user-facing entity lifecycle (how operators set DELETED and the other status states from the UI), see Data entity statuses.
Disabling housekeeping (housekeeping.enabled: false) stops all five cleanup jobs. Resolved alerts, search-facet history, soft-deleted data entities, and empty activity / message partitions will accumulate indefinitely and the PostgreSQL database will grow without bound. Leave the job enabled in production; disable only for debugging or offline migrations, and re-enable (or run a manual cleanup) afterwards.
The Java-side default for every housekeeping.ttl.* key is 0. A partial-override deployment silently wipes historical data on the next 15-minute cycle. The shipped application.yml supplies 30 for each of the three TTL keys, so a default install behaves as documented. But the HousekeepingTTLProperties class declares the fields as private int with no field initialiser — Spring binds primitive int default 0 if an operator-supplied override (typical Helm-chart values overlay, --spring.config.location to a profile that omits the housekeeping: block, Spring Cloud Config slice, Kubernetes ConfigMap mount) does not re-supply the block. With 0, the housekeeping cycle computes cutoff = now() - 0 days = now() and deletes every RESOLVED alert + every search-facet entry + every soft-deleted data entity (and cascades through ~25 child tables, including S3 attachments). The platform emits no boot warning, no log line above DEBUG, no Prometheus counter — operators discover the loss only when the data is gone.
Always re-supply the full housekeeping: block in Helm/Kustomize overlays, OR set explicit non-zero values for every TTL key, AND verify post-restart by sampling pg_stat_user_tables.n_tup_del after one cycle. This is the same class of silent-default risk that previously affected attachment storage on container restart — read once, configure explicitly, never trust a partial overlay to inherit the bundled YAML defaults.
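A sketch of the block to carry in every overlay, using the documented key names and the shipped value of 30 for each TTL. The values are illustrative; what matters is that all three TTL keys are present and non-zero.

```yaml
housekeeping:
  enabled: true
  ttl:
    resolved_alerts_days: 30      # never leave unset: the Java-side default is 0
    search_facets_days: 30
    data_entity_delete_days: 30
```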
Manual RESOLVED alerts are not governed by resolved_alerts_days. A jOOQ operator-precedence issue in AlertHousekeepingJob causes the emitted SQL to read as WHERE (STATUS='RESOLVED') OR (STATUS='RESOLVED_AUTOMATICALLY' AND STATUS_UPDATED_AT <= cutoff), so the TTL predicate binds only to the auto-resolved branch and every manual RESOLVED alert matches on each cycle regardless of its age. Operators relying on compliance-style retention (SOC2, SOX, HIPAA) for manual resolutions cannot use resolved_alerts_days as the retention floor today. Track the upstream fix; until it ships, treat manual RESOLVED alerts as having no platform-side retention window. See the operator-side workaround on Alerting → Auto-cleanup of resolved alerts.
Housekeeping deletions are unobservable on a default deployment — no metric, no audit trail, DEBUG-only logs. The subsystem that permanently deletes data exposes no operational telemetry: there is no metrics counter or gauge for any of the five jobs (nothing housekeeping-related appears at /actuator/prometheus), no structured audit event records what was deleted and when, and every per-job deletion count is logged at DEBUG — below the shipped info default for the package — so a default deployment emits nothing on a successful cycle (only failures log at ERROR). Three consequences: (a) you cannot observe that deletions are happening or at what volume; (b) there is no signal that would reveal a stuck or wedged cycle (for example, one blocked behind a held advisory lock); (c) a compliance requirement that "deletions are logged/audited" is not satisfied out of the box — and raising the log level still yields best-effort log lines, not a durable audit trail. To observe deletions, set logging.level.org.opendatadiscovery.oddplatform.housekeeping: debug (see Logging Settings Configuration); to verify a cycle ran at all, sample pg_stat_user_tables.n_tup_del across a 15-minute window, as in the TTL caveat above.
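The logging override mentioned above, written as a YAML fragment. This is a minimal sketch; it only raises the level for the housekeeping package and leaves every other logger at its configured level.

```yaml
logging:
  level:
    # Per-job deletion counts are logged at DEBUG, below the shipped info default.
    org.opendatadiscovery.oddplatform.housekeeping: debug
```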
The session-housekeeping job runs N× redundantly on multi-replica deployments. Spring's PostgreSQLSessionHousekeepingJobHandler fires hourly with @Scheduled(fixedRate = 1, timeUnit = HOURS) and has no leader-election guard — no @SchedulerLock, no advisory-lock acquisition (inconsistent with the rest of the platform's scheduled jobs, which join the Advisory-lock registry). On an INTERNAL_POSTGRESQL session-provider deployment with N replicas, every replica runs the DELETE FROM SPRING_SESSION WHERE expiry_time < now() query every hour. The deletes are idempotent so data integrity is fine — the operator cost is N× redundant database load. Note that with the shipped default spring.session.timeout: -1 (sessions never expire), the job is a no-op regardless of leader count.
Advisory-lock registry
Several ODD Platform subsystems use PostgreSQL advisory locks to ensure that only one platform replica runs a given background loop at a time (the leader-election pattern for multi-replica deployments). Each subsystem owns one or more advisory-lock IDs, configured via dedicated *.advisory-lock-id keys. Operators overriding any of these IDs in a deployment overlay must treat them as a single flat namespace across the platform — collisions are not detected at startup and produce silent feature wedges (see the warning below).
| Configuration key | Default ID | Owning subsystem | @ConfigurationProperties class | Single-leader role |
| --- | --- | --- | --- | --- |
| notifications.wal.advisory-lock-id | 100 | Notifications subscriber that reads from the WAL replication-slot-name and dispatches alert messages | OddNotificationsProperties | One platform replica subscribes to the WAL stream |
| partition.advisory-lock-id | 90 | Partition orchestrator that creates next-period partitions on activity and message tables | PartitionProperties | One platform replica advances the partition window |
| datacollaboration.receive-event-advisory-lock-id | 110 | Data Collaboration inbound event reader (Slack Events → message_provider_event queue) | DataCollaborationProperties | One platform replica drains the inbound event queue |
| datacollaboration.sender-message-advisory-lock-id | 120 | Data Collaboration outbound message sender (message queue → Slack) | DataCollaborationProperties | One platform replica drains the outbound message queue |
partition.advisory-lock-id is deliberately shared between two managers — ActivityTablePartitionManager and MessageTablePartitionManager both acquire ID 90. This is intentional: one platform replica is elected as the global partition leader and serialises the partition-rotation work for both tables. Treat it as one logical leader, not two colliding subsystems.
pg_advisory_lock blocks forever — on collision a subsystem wedges silently with no diagnostic signal. The platform's leader-election manager executes the blocking variant of pg_advisory_lock against the configured ID — there is no pg_try_advisory_lock fast-path, no statement_timeout, no fallback to a degraded-mode bean. If two subsystems are configured to use the same advisory-lock ID (typically because an operator overrode one key in a Helm overlay and unintentionally matched another), the second subsystem's startup thread enters PostgreSQL lock-wait state and never returns. The Spring container does not detect the wedge — Spring's bean construction returned, so /actuator/beans and /actuator/health continue to report "running" — the wedge surfaces only as one feature silently not working (notifications stop arriving, Data Collaboration thread replies stop flowing, partition rotation stops creating future partitions). Operators MUST audit any per-environment advisory-lock-id override against the table above before applying it.
A platform-side fail-fast wrapper (pg_try_advisory_lock + a configurable timeout + a subsystem_leader_state Prometheus gauge + a boot-time INFO log enumerating the registry) is tracked upstream.
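One way to make collisions easy to spot in review is to pin all four IDs in the same overlay, even when only one is being changed. The sketch below restates the documented defaults; any replacement values are up to the operator, as long as the four stay distinct.

```yaml
# All advisory-lock IDs share one flat namespace; keep every value unique.
partition:
  advisory-lock-id: 90
notifications:
  wal:
    advisory-lock-id: 100
datacollaboration:
  receive-event-advisory-lock-id: 110
  sender-message-advisory-lock-id: 120
```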
The Housekeeping orchestrator (see Housekeeping Settings Configuration above) does not appear in this table because it uses ShedLock (a Spring-side distributed-lock library) rather than a PostgreSQL advisory lock. ShedLock writes to a shedlock table to coordinate the leaders, so its multi-replica behaviour is documented separately.
Platform-level settings (odd.*)
The odd.* namespace groups four platform-wide settings that do not belong to any subsystem: stale-metadata detection, the optional Prometheus tenant label, the Activity-feed partitioning period, and a list of additional navigation links surfaced in the App Info menu. A fifth key in the same namespace, odd.platform-base-url, is documented above in Enable Alert Notifications → odd.platform-base-url. That section is the primary operator-facing context where the key is introduced, but the same key is also consumed by the integration-parameter substitution context, so any non-local deployment must set it regardless of which subsystems (notifications, integrations, or both) are enabled.
Detecting stale metadata
Stale metadata is metadata that has not been refreshed from its source for longer than an operator-defined window. This typically happens when a collector is paused, deactivated, or failing to reach the source system. When the platform judges an entity to be stale, the UI surfaces it with a "Stale" indicator so users can distinguish data whose freshness is uncertain from actively-maintained metadata. For the user-facing surface (where the indicator appears, how the freshness signal differs from runtime alerts), see Stale-metadata indicator.
odd.data-entity-stale-period: number of days after the entity's last successful ingestion before it is labeled "Stale" in the UI and API. Integer, days. Defaults to 7.
Operators running collectors on schedules longer than a week should raise this value to match the collector cadence — otherwise entities that were ingested successfully will be flagged stale between runs.
Prometheus tenant label (odd.tenant-id)
When metrics.storage is set to PROMETHEUS, the platform appends tenant_id={value} as a label on every Prometheus instant query it issues. This lets a single shared Prometheus instance serve metric data for multiple ODD Platform deployments without their metric series colliding: each deployment queries only its own tenant-labeled series.
odd.tenant-id: tenant identifier appended as a Prometheus query label. String, no default (empty means no label is applied, and the Prometheus query returns series across all tenants). Ignored when metrics.storage=INTERNAL_POSTGRES.
Activity-feed partitioning (odd.activity.partition-period)
The ODD Platform activity table is range-partitioned on a rolling date window; odd.activity.partition-period sets the partition width in days. The default creates a new partition every 30 days, which is appropriate for most deployments. Operators running high-volume deployments (millions of activity events per day) can tune this downward to narrow partitions; smaller partitions speed up vacuum and partition-prune operations on the activity feed.
odd.activity.partition-period: partition width in days for the activity table. Integer, days. Defaults to 30.
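The three scalar odd.* settings described so far can be sketched together as follows. The stale period and tenant name are hypothetical values chosen for illustration; the partition period restates the documented default.

```yaml
metrics:
  storage: PROMETHEUS            # odd.tenant-id is ignored with INTERNAL_POSTGRES
odd:
  data-entity-stale-period: 14   # days; hypothetical value for a fortnightly collector
  tenant-id: analytics-prod      # hypothetical tenant name
  activity:
    partition-period: 30         # days; the documented default
```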
Additional navigation links (odd.links)
Operators can attach a list of arbitrary navigation links: pointers to internal wikis, runbooks, dashboards, or any other page teams should reach from inside ODD Platform. The platform UI surfaces them inside the App Info menu (the popup behind the information icon in the top-right toolbar). Each link renders as a menu item showing its title and opens the configured URL in a new tab when clicked.
odd.links: list of link objects. Each entry has two required fields:
title: the menu-item label shown in the App Info menu. String, required.
url: the absolute URL the menu item opens in a new tab. String, required.
Defaults to an empty list — when unset, the App Info menu omits the additional-links section entirely.
The links are exposed to the UI through the authenticated GET /api/links endpoint and are visible to every user signed in to the platform. Use them for navigation hints only — do not embed credentials, session tokens, or one-time secrets in link URLs, since any logged-in user can read them.
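A sketch of an odd.links block with two entries. The titles and URLs are hypothetical; both fields are given for every entry, and every URL is absolute and uses https.

```yaml
odd:
  links:
    - title: Data platform runbook
      url: https://wiki.example.com/data-platform/runbook
    - title: Pipeline dashboards
      url: https://grafana.example.com/d/pipelines
```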
Validation and operator-link risks
Three known limitations apply to odd.links and the App Info menu that renders them. None of these is blocking for typical operator-curated link sets, but all three matter when the link source is less trusted (free-text Helm chart overrides, multi-tenant config templates, anything an end-user can influence).
odd.links is not validated at config-load time. The AdditionalLinkProperties record declares title and url as plain String with no @NotBlank, no @URL, no @Pattern, and no @PostConstruct validate(). The platform accepts and renders:
Missing title: the menu item appears as an invisible-but-clickable area.
Missing url: the menu item renders as a non-clickable <a>.
javascript:, data:, file:, or vbscript: schemes (modern browsers will sandbox or refuse them, but the platform does not reject these at config time).
Relative paths (wiki.internal): the browser interprets them as relative to the current ODD page, producing surprising navigation.
Treat the odd.links config as a security-relevant input. Validate the URLs in your Helm chart values before applying, restrict edit access to operators only, and never permit non-operators to author the override. A platform-side validator that rejects non-http(s) schemes and enforces @NotBlank is tracked upstream.
The App Info menu links render target="_blank" without rel="noopener noreferrer". All five link sites in the App Info menu (operator-configured odd.links entries, the ODD Platform version link to GitHub, the Documentation link, the Slack link, and the Feedback link) open in a new tab without the rel attribute that isolates the destination from window.opener. A page at any of those destinations — including the operator-configured links above — can run window.opener.location = "phishing.example.com" in the background, replacing the parent ODD tab with an attacker-controlled login page (reverse-tabnabbing). Risk is operator-amplified — an unvalidated link (per the previous caveat) combined with this caveat is a workspace-wide tabnabbing vector. A platform-side rel fix is tracked upstream.
The App Info menu is not keyboard- or touch-accessible today. The information-icon button declares aria-haspopup="true" and aria-controls={menuId} — surfaces that announce keyboard accessibility to assistive technology — but the open handler is wired only on onMouseEnter. There is no onClick, no onKeyDown, and no onFocus. Touch-device users (iOS Safari, Android Chrome) do not generate mouseenter; keyboard-only and screen-reader users cannot open the menu. The Documentation, Slack, Feedback, and operator-configured odd.links destinations are unreachable through the menu for these audiences — direct URLs are the workaround until a platform-side onClick / onKeyDown fix ships. Operators serving keyboard-only or screen-reader audiences should treat this as a known WCAG 2.1 SC 2.1.1 limitation.
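As an illustration of the first limitation above, a well-formed odd.links block might look like the sketch below. It assumes odd.links binds as a list of entries with the title and url fields described for AdditionalLinkProperties; the titles and hostnames are hypothetical placeholders. Every entry has a non-blank title and an absolute https URL, which is what the platform does not check for you today.

```yaml
# Sketch only: titles and hostnames are hypothetical placeholders.
odd:
  links:
    - title: Internal wiki              # non-blank, so the menu item is visible
      url: https://wiki.example.com     # absolute https URL, not a relative path
    - title: Data platform runbook
      url: https://runbooks.example.com/odd
```

Reviewing this block in the same change process as the rest of your Helm values keeps link authorship with operators.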
Attachment Storage Configuration
ODD Platform allows users to attach files and links to data entities from the UI. This section covers the operator-facing configuration for where those uploaded files are stored. For the user-facing upload workflow (what users can attach, the per-entity Attachments tab, the DATA_ENTITY_ATTACHMENT_MANAGE permission), see Attachments and links.
The default LOCAL storage mode is ephemeral. Attachments are written to /tmp/odd/attachments inside the ODD Platform container filesystem. Any container or pod restart — routine deployment, node drain, crash, Kubernetes eviction — permanently deletes all uploaded files.
Use REMOTE (S3 / MinIO) storage for any Kubernetes or Docker deployment where users will actually upload attachments. LOCAL mode is suitable only for single-host evaluations or local development where losing attachments on restart is acceptable.
Configuration keys
attachment.storage: storage backend. One of LOCAL or REMOTE. Defaults to LOCAL.
attachment.max-file-size: the per-file upload limit the UI enforces before upload, in megabytes. Defaults to 20. The platform surfaces this value to the web UI as a client-side pre-upload size check; the server does not re-validate per-file size on the upload path. spring.codec.max-in-memory-size (below) bounds only the in-memory buffer for a single request/chunk, and because attachment uploads are chunked and streamed to disk, it is not a ceiling on the assembled file. There is therefore no effective server-side total-file size limit: a direct (non-UI) API caller can exceed attachment.max-file-size by any amount. See the hint below if raising this above 20 MB.
attachment.local.path: filesystem directory where attachments are written when storage=LOCAL. Defaults to /tmp/odd/attachments (ephemeral; see warning above).
attachment.remote.url: S3-compatible endpoint URL when storage=REMOTE (for example https://s3.us-east-1.amazonaws.com for AWS S3 or http://minio:9000 for a MinIO service). See the Known limitations (REMOTE mode) subsection below before choosing your endpoint, in particular the us-east-1 restriction for AWS S3 and the chunked-upload staging behavior.
attachment.remote.access-key: access key for the S3-compatible bucket.
attachment.remote.secret-key: secret key for the S3-compatible bucket.
attachment.remote.bucket: bucket name used to store attachment objects. The bucket must already exist; ODD Platform does not create it.
spring.codec.max-in-memory-size: platform-wide cap on the in-memory buffer Spring WebFlux uses when reading a single request body / upload chunk. Defaults to 20MB. A single chunk larger than this fails at the codec layer; because uploads are chunked, this does not bound the total assembled file. Accepts a size string (20MB, 100MB, 1GB).
attachment.max-file-size must not exceed spring.codec.max-in-memory-size. Both ship with the same 20 MB default, so the attachment cap is effective out of the box. If you raise attachment.max-file-size to allow larger uploads — for example 100 MB — you must raise spring.codec.max-in-memory-size to at least the size of a single upload chunk, otherwise a chunk above 20 MB fails at the WebFlux codec layer with DataBufferLimitException. This codec bound applies per chunk, not to the total file (see the per-file note above): the platform enforces no server-side cap on the assembled file size.
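A minimal sketch of raising both limits together, using the 100 MB figure from the hint above. Setting the codec buffer equal to the attachment limit is a conservative assumption made here because the chunk size is not documented on this page; a smaller codec value may be enough if chunks are smaller.

```yaml
# Sketch only: raises the UI-enforced limit and the WebFlux codec buffer together.
attachment:
  max-file-size: 100            # megabytes; client-side pre-upload check
spring:
  codec:
    max-in-memory-size: 100MB   # per request/chunk buffer, not a total-file cap
```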
Example: REMOTE storage with S3-compatible backend (MinIO or AWS S3)
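The block below is a minimal sketch assembled from the keys listed under Configuration keys. The endpoint reuses the http://minio:9000 example from that list; the bucket name and credential values are hypothetical placeholders to replace with your own.

```yaml
# Sketch only: bucket name and credentials are placeholders.
attachment:
  storage: REMOTE
  remote:
    url: http://minio:9000                  # or https://s3.us-east-1.amazonaws.com for AWS S3
    access-key: "REPLACE_WITH_ACCESS_KEY"
    secret-key: "REPLACE_WITH_SECRET_KEY"
    bucket: odd-attachments                 # must already exist; the platform does not create it
```

Following the environment-variable convention described earlier on this page, the same settings would be expected under names such as ATTACHMENT_STORAGE and ATTACHMENT_REMOTE_URL.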
Known limitations (REMOTE mode)
ODD Platform builds its MinioAsyncClient with only the endpoint and credentials documented above. The MinIO Java SDK inherits defaults for every other parameter, and the attachment-upload code path carries a small amount of additional behavior that is not configurable. None of the following is currently exposed as an ODD configuration key — plan your deployment around these limits rather than assuming a config flag will fix them.
AWS S3 region pinned to us-east-1. The attachment client is built without an explicit region, so it uses the MinIO Java SDK's default region (us-east-1) for request signing. Against AWS S3 this means only buckets in us-east-1 work — buckets in any other region fail signature validation with errors such as AuthorizationHeaderMalformed or PermanentRedirect. If you need AWS S3 in another region, either host your bucket in us-east-1 or use a MinIO server in front of it. Self-hosted MinIO and most other S3-compatible services ignore the region header and are unaffected.
HTTP client timeouts are the MinIO SDK defaults (~5 minutes), not configurable. ODD Platform does not supply a custom OkHttpClient to the MinIO builder, so the SDK's built-in defaults apply: roughly a 5-minute read/write timeout. A single large upload whose end-to-end wall time (network transfer + S3 ingest) exceeds that limit fails with a socket-timeout error even though the content was being streamed successfully. If your users upload near the attachment.max-file-size limit over a slow link, keep attachment.max-file-size below the size a typical upload can complete inside 5 minutes at your network's real throughput.
Chunked uploads are assembled on the container's local filesystem before they are sent to REMOTE storage — a mid-upload container restart loses the staged chunks. The UI splits large files into chunks and uploads each chunk individually; the platform writes each chunk to a hardcoded local directory — /tmp/odd/chunks, independent of attachment.local.path, so pointing attachment.local.path at a durable path does not move chunk staging — and reassembles the full file there before streaming it to the S3-compatible backend. This is true even when attachment.storage=REMOTE. If the ODD Platform container is restarted, evicted, or rescheduled during an in-flight chunked upload, the local directory is wiped and the partial upload is unrecoverable — the user must re-upload from scratch. In Kubernetes deployments, either mount a persistent volume at the chunk-staging directory (/tmp/odd/chunks) or limit the maximum upload size so single-request uploads are the norm. The LOCAL-mode ephemeral warning above applies to chunk-staging in REMOTE mode as well.
No retry on transient S3 / MinIO errors. Put, get, and remove operations against the bucket do not retry on transient failures — a single 503 from S3, a connection reset from the network, or a short MinIO outage surfaces as a failed operation with no automatic recovery. If your alerting pipeline treats attachment failures as user-impacting errors, add retry at the infrastructure layer (for example an S3-proxy sidecar with retry) rather than expecting the platform to paper over it.
No IAM-role support today — static access-key / secret-key are the only credentials path. The platform's MinIO client builder calls .credentials(accessKey, secretKey) against the values configured above; it does not call .credentialsProvider(...) with the AWS SDK's DefaultCredentialsProvider. Operators on AWS EKS using IAM Roles for Service Accounts (IRSA), on ECS task roles, or on any other AWS-native credential-injection mechanism that expects the SDK to walk the default credentials chain — environment variables → web-identity-token → EC2 instance metadata → ECS task-role — get no automatic credential resolution. Static attachment.remote.access-key + attachment.remote.secret-key values must be supplied via application.yml, Helm secrets, or the equivalent operator-managed credential store, and rotated on the operator's own cadence. This is itself a credential-hygiene concern in deployments where IAM-role injection is the standard. The upstream fix is a conditional switch — if no static credentials are supplied, call .credentialsProvider(DefaultAWSCredentialsProviderChain.getInstance()) to enable IAM-role workflows; the doc-side caveat is in place until that lands.
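For the chunk-staging limitation above, one way to mount a persistent volume at /tmp/odd/chunks in Kubernetes is sketched below. This is a Deployment fragment, not a complete manifest: the container name, volume name, and PersistentVolumeClaim name are hypothetical, and the claim is assumed to exist already.

```yaml
# Sketch only: Deployment fragment; all names below are hypothetical.
spec:
  template:
    spec:
      containers:
        - name: odd-platform
          volumeMounts:
            - name: chunk-staging
              mountPath: /tmp/odd/chunks    # hardcoded chunk-staging directory
      volumes:
        - name: chunk-staging
          persistentVolumeClaim:
            claimName: odd-platform-chunks  # assumed pre-created PVC
```

Whether staged chunks survive a reschedule still depends on the access mode and storage class behind the claim, so verify this in your own topology.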
Example: LOCAL storage (single-host / local evaluation only)
If you keep LOCAL mode, override attachment.local.path to a persistent volume mount rather than the default /tmp/odd/attachments, and confirm the volume is actually persistent across restarts in your deployment topology.
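A minimal sketch of a LOCAL configuration that follows this advice. The path /var/lib/odd/attachments is a hypothetical mount point; substitute the directory where your persistent volume is actually mounted.

```yaml
# Sketch only: the path is a hypothetical persistent mount point.
attachment:
  storage: LOCAL
  local:
    path: /var/lib/odd/attachments   # overrides the ephemeral default /tmp/odd/attachments
```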
Logging Settings Configuration
Logs provide detailed information about errors in the application, helping users quickly identify and fix problems. Configuring logging is recommended for reliable operation, effective monitoring, and troubleshooting. Here is a code snippet for setting up logs in ODD Platform:
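The snippet below is a minimal sketch that uses Spring Boot's standard logging.level keys. The logger name org.opendatadiscovery.oddplatform is an assumption about the platform's base package; adjust it, or add further logger names, to match what you see in your own log output.

```yaml
# Sketch only: the platform logger name is an assumed base package.
logging:
  level:
    root: info
    org.opendatadiscovery.oddplatform: info
```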
Setting the logging level to info surfaces useful messages about the platform's functioning without the excess detail of trace or debug, and without hiding important events as warn or a higher level would.
Adjust the logging level as needed to get more or less information for your specific requirements.
GenAI Configuration
The platform can proxy natural-language questions to an external AI service via three keys under the genai prefix (@ConfigurationProperties("genai") per GenAIProperties.java). The feature is disabled by default and is API-only today (no in-app UI affordance calls the endpoint).
genai.enabled (boolean, env GENAI_ENABLED): feature toggle. Default false (set explicitly at application.yml line 18). When false, POST /api/genai/ask returns HTTP 400 with the message "Gen AI is disabled". Feature toggling: this value is captured at JVM boot; restart the JVM process for a change to take effect, because runtime configuration changes are not honoured. See Features → Data Collaboration for the platform-wide boot-immutability caveat.
genai.url (string, env GENAI_URL): base URL of the external AI service. The platform's genAiWebClient is built at startup with this as baseUrl and POSTs each request to {genai.url}/query_data. No @ConfigurationProperties default: the field has no initializer in GenAIProperties.java, so its Java default is null. The example in application.yml line 19 (# url: http://localhost:5000) is commented out, not a default.
genai.request_timeout (integer, env GENAI_REQUEST_TIMEOUT): outbound response timeout, in minutes. Wired into WebClientConfiguration.java:23 as Duration.ofMinutes(genAIProperties.getRequestTimeout()). No @ConfigurationProperties default: the Java primitive int default is 0, which means immediate timeout. The example in application.yml line 20 (# request_timeout: 2) is commented out, not a default.
Setting only genai.enabled=true will silently misconfigure the feature. With url defaulting to null and request_timeout defaulting to 0, the WebClient is built with no baseUrl and a Duration.ofMinutes(0) timeout — every POST /api/genai/ask will fail before the external service has a chance to respond. Always set all three keys when enabling.
WebClientConfiguration reads genai.url and genai.request_timeout once at startup when constructing the genAiWebClient Spring bean. Changing those values requires a Platform restart.
A working configuration block:
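The sketch below sets all three keys. The url and request_timeout values are taken from the commented-out examples in application.yml mentioned above; point url at your own AI service, since http://localhost:5000 is only an illustrative endpoint.

```yaml
# Sketch only: values mirror the commented-out examples in application.yml.
genai:
  enabled: true
  url: http://localhost:5000   # base URL; requests go to {genai.url}/query_data
  request_timeout: 2           # minutes
```

The equivalent environment variables are GENAI_ENABLED, GENAI_URL, and GENAI_REQUEST_TIMEOUT.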
The platform sends no authentication to the external AI service and does not retry. See the dedicated GenAI assistant page for the external service contract (POST /query_data with JSON {"question": "..."}), the platform's /api/genai/ask request/response schemas, and the per-error behavior.
Machine-to-Machine (M2M) Tokens Configuration
ODD Platform supports a static API-key authentication mode for non-UI callers (CI/CD jobs, ingestion pipelines, automation scripts) — also referred to as Machine-to-Machine (M2M) tokens. It is disabled by default.
For the full configuration keys, the header contract, the curl example, and security considerations (token rotation, HTTPS, blast radius), see Server-to-server (S2S) authentication.