Configure ODD Platform

This section describes how to configure ODD Platform so that you can use all of its features.

Configuration approaches

There are two ways to configure the Platform:

Environment variables, which suit simple entries.
YAML, which comes in handy when you need to define a complex configuration block (e.g., OAuth2 authentication or logging levels).

YAML entries VS environment variables

The following example defines a configuration block in YAML and then expresses the same settings as environment variables.

YAML:

spring:
    datasource:
        url: URL
        username: USERNAME
        password: PASSWORD
    custom-datasource:
        url: URL
        username: USERNAME
        password: PASSWORD

To configure the Platform using environment variables, join the nested keys with underscores, replace hyphens with underscores and uppercase the result, like so:

SPRING_DATASOURCE_URL=URL
SPRING_DATASOURCE_USERNAME=USERNAME
SPRING_DATASOURCE_PASSWORD=PASSWORD
SPRING_CUSTOM_DATASOURCE_URL=URL
SPRING_CUSTOM_DATASOURCE_USERNAME=USERNAME
SPRING_CUSTOM_DATASOURCE_PASSWORD=PASSWORD
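
The naming rule can be sketched in a few lines of Python. The helper `to_env_name` is hypothetical and not part of ODD Platform; it only illustrates the mapping used in the example above, assuming that both dots and hyphens become underscores.

```python
def to_env_name(property_path: str) -> str:
    """Convert a dotted property path to an environment variable name."""
    # Dots separate nesting levels and hyphens appear inside key names.
    # Both become underscores, then the whole name is uppercased.
    return property_path.replace(".", "_").replace("-", "_").upper()


if __name__ == "__main__":
    for path in ("spring.datasource.url", "spring.custom-datasource.username"):
        print(f"{path} -> {to_env_name(path)}")
```

Running the script prints the environment variable name for each property path, which can be handy when translating a larger YAML block by hand.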

Connect your database

ODD Platform uses a PostgreSQL database, and only PostgreSQL, for all of its features. Define the following variables to connect ODD Platform to the database:

spring.datasource.url: JDBC string of your PostgreSQL database. Default value is jdbc:postgresql://127.0.0.1:5432/odd-platform
spring.datasource.username: your PostgreSQL user's name. Default value is odd-platform
spring.datasource.password: your PostgreSQL user's password. Default value is odd-platform-password

The following variables are optional (by default, they have the same values as spring.datasource). They are used to connect to the PostgreSQL database that stores Lookup Tables:

spring.custom-datasource.url: JDBC string of the PostgreSQL database that stores Lookup Tables. Default value is jdbc:postgresql://127.0.0.1:5432/odd-platform. Note: you can specify any {database_host}, {database_port} or {database_name}, but the schema in which Lookup Tables are stored is always lookup_tables_schema.
spring.custom-datasource.username: your PostgreSQL user's name for custom-datasource. Default value is odd-platform
spring.custom-datasource.password: your PostgreSQL user's password for custom-datasource. Default value is odd-platform-password

The database connection block would then look like this:

spring:
    datasource:
        url: jdbc:postgresql://{database_host}:{database_port}/{database_name}
        username: {database_username}
        password: {database_password}
    # [OPTIONAL]
    custom-datasource:
        url: jdbc:postgresql://{database_host}:{database_port}/{database_name}
        username: {database_username}
        password: {database_password}

SPRING_DATASOURCE_URL=jdbc:postgresql://{database_host}:{database_port}/{database_name}
SPRING_DATASOURCE_USERNAME={database_username}
SPRING_DATASOURCE_PASSWORD={database_password}
# [OPTIONAL]
SPRING_CUSTOM_DATASOURCE_URL=jdbc:postgresql://{database_host}:{database_port}/{database_name}
SPRING_CUSTOM_DATASOURCE_USERNAME={database_username}
SPRING_CUSTOM_DATASOURCE_PASSWORD={database_password}
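
The fallback behaviour described above can be sketched as follows. This is an illustration only: `resolve_datasources` is a hypothetical helper, and it assumes that an unset custom-datasource value falls back to the effective spring.datasource value, which is one reading of "the same value as spring.datasource".

```python
from typing import Mapping

# Default values as documented in this section.
MAIN_DEFAULTS = {
    "url": "jdbc:postgresql://127.0.0.1:5432/odd-platform",
    "username": "odd-platform",
    "password": "odd-platform-password",
}


def resolve_datasources(env: Mapping[str, str]) -> dict:
    """Return the effective main and custom datasource settings."""
    main = {
        key: env.get(f"SPRING_DATASOURCE_{key.upper()}", default)
        for key, default in MAIN_DEFAULTS.items()
    }
    # Assumption: unset custom-datasource values reuse the main datasource.
    custom = {
        key: env.get(f"SPRING_CUSTOM_DATASOURCE_{key.upper()}", main[key])
        for key in MAIN_DEFAULTS
    }
    return {"datasource": main, "custom-datasource": custom}
```

Passing `os.environ` to the function shows which connection settings would be in effect for a given environment under that assumption.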

Security

To enable security in ODD Platform, follow the Enable security section.

Select session provider

ODD Platform can keep user sessions in memory, in the PostgreSQL database or in Redis. Set the session provider via the session.provider variable, which accepts the following values:

IN_MEMORY: Local in-memory storage. ODD Platform defaults to this value
INTERNAL_POSTGRESQL: Underlying PostgreSQL database
REDIS: Redis data-store.

If you run only one instance of ODD Platform and can tolerate users being logged out each time the Platform restarts, the best choice is IN_MEMORY.

If you already have Redis in your infrastructure or are willing to install it, the best choice is REDIS.

Otherwise, INTERNAL_POSTGRESQL is the best pick.
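
The guidance above amounts to a small decision rule, sketched here in Python. The function and its parameter names are hypothetical and exist only to make the order of the checks explicit.

```python
def choose_session_provider(
    single_instance: bool,
    tolerate_logouts_on_restart: bool,
    redis_available: bool,
) -> str:
    """Pick a session.provider value following the guidance in this section."""
    # One instance and acceptable logouts on restart: in-memory is enough.
    if single_instance and tolerate_logouts_on_restart:
        return "IN_MEMORY"
    # Redis already present (or planned): use it for sessions.
    if redis_available:
        return "REDIS"
    # Otherwise reuse the PostgreSQL database the Platform already needs.
    return "INTERNAL_POSTGRESQL"
```

For example, a multi-instance deployment without Redis resolves to INTERNAL_POSTGRESQL.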

In memory (default)

session:
    provider: IN_MEMORY

Internal PostgreSQL

session:
    provider: INTERNAL_POSTGRESQL

Redis

To connect to Redis, define the following variables:

spring.redis.host: Redis host
spring.redis.port: Redis port
spring.redis.username: Redis user's name
spring.redis.password: Redis user's password
spring.redis.database: Redis database index

YAML for Redis session provider

spring:
    redis:
        host: {redis_host}
        port: {redis_port}
        username: {redis_username}
        password: {redis_password}
        database: {redis_database}
session:
    provider: REDIS

In memory (default)

SESSION_PROVIDER=IN_MEMORY

Internal PostgreSQL

SESSION_PROVIDER=INTERNAL_POSTGRESQL

Redis

Environment variables for Redis session provider:

SESSION_PROVIDER=REDIS
SPRING_REDIS_HOST={redis_host}
SPRING_REDIS_PORT={redis_port}
SPRING_REDIS_USERNAME={redis_username}
SPRING_REDIS_PASSWORD={redis_password}
SPRING_REDIS_DATABASE={redis_database}

Enable Metrics

Some of the metadata that ODD Platform ingests is conveniently represented as a time-series chart, for example the amount of data in a MySQL table or the physical size of a Redshift database. ODD Platform pushes such metadata to an OTLP collector as telemetry so that charts can be built in Prometheus, New Relic or any other backend that supports OTLP exporters. Set the following variables to use this functionality:

metrics.export.enabled: Must be set to true
metrics.export.otlp-endpoint: OTLP Collector endpoint

metrics:
    export:
        enabled: true
        otlp-endpoint: {otlp-endpoint-url}

METRICS_EXPORT_ENABLED=true
METRICS_EXPORT_OTLP_ENDPOINT={otlp-endpoint-url}

Enable Alert Notifications

Any alert created inside the Platform can be sent via a generic webhook, a Slack incoming webhook and/or email (via Google SMTP, AWS SMTP, etc.). Such notifications contain information such as:

Name of the entity on which the alert was created
Data source and namespace of the entity
Owners of the entity
Possibly affected entities

ODD Platform uses the PostgreSQL replication mechanism so that a notification can be sent even if a network lag occurs or the Platform crashes. To enable this functionality, the underlying PostgreSQL database needs to be configured as well.

PostgreSQL Configuration

To use the replication mechanism, the PostgreSQL database must be configured accordingly and the database user must be granted replication permissions.

Database settings

To configure the database, add the following entries to the postgresql.conf file:

max_wal_senders = 1
wal_keep_size = 16
wal_level = logical
max_replication_slots = 1

If the replication mechanism is already configured, just increment the max_wal_senders and max_replication_slots values.
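
A quick sanity check of a postgresql.conf file could look like the sketch below. Both helpers are hypothetical and deliberately simple: they only understand plain `name = value` lines, they only see entries written explicitly in the file (not the server's built-in defaults), and they check the minimum values listed above.

```python
def parse_conf(text: str) -> dict:
    """Parse 'name = value' lines, ignoring comments and blank lines."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip("'")
    return settings


def check_replication_settings(conf_text: str) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    settings = parse_conf(conf_text)
    problems = []
    if settings.get("wal_level") != "logical":
        problems.append(
            f"wal_level: expected logical, found {settings.get('wal_level')!r}"
        )
    for name in ("max_wal_senders", "max_replication_slots"):
        value = settings.get(name)
        if value is None or not value.isdigit() or int(value) < 1:
            problems.append(f"{name}: expected at least 1, found {value!r}")
    return problems
```

Calling `check_replication_settings(open("postgresql.conf").read())` returns an empty list when all three settings are present with suitable values.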

Database user permissions

The ODD Platform database user must be granted replication permissions:

ALTER ROLE {database_username} WITH REPLICATION;

User permissions and database configuration may vary from one on-demand/cloud provider to another.

For instance, in AWS RDS, PostgreSQL instances are managed services in which certain aspects of replication management are automated to minimize the risk of misconfiguration. Because of this, some settings are either not exposed or are changed differently than in a standard PostgreSQL setup. To enable notifications in such an environment, follow these steps (only the differences are mentioned):

1. Set the rds.logical_replication parameter to 1 in your database instance's Parameter Group instead of modifying the wal_level parameter directly.
2. Ensure that the ODD user connecting to the database has the rds_replication role. The Master username of the database typically has this role by default. If you use a different username, you may need to assign the role with the command GRANT rds_replication TO {your_database_username};
3. If you changed max_wal_senders to 5 (the minimal value mentioned in the Parameter Group) and keep getting messages like "The parameter max_wal_senders was set to a value incompatible with replication. It has been adjusted from 5 to 55" in the events list of the database instance, consider setting the parameter in the Parameter Group to the value mentioned in the message so that RDS stops changing it automatically.

ODD Platform configuration

The following variables need to be defined:

notifications.enabled: must be set to true. Defaults to false
notifications.message.downstream-entities-depth: limits how many levels of the lineage graph are traversed when fetching affected data entities. Defaults to 1
notifications.wal.advisory-lock-id: ODD Platform uses a PostgreSQL advisory lock to make sure that, in case of horizontal scaling, only one instance of the Platform processes alert messages. This setting defines the advisory lock id. Defaults to 100
notifications.wal.replication-slot-name: name of the PostgreSQL replication slot, which will be created if it doesn't exist yet. Defaults to odd_platform_replication_slot
notifications.wal.publication-name: name of the PostgreSQL publication, which will be created if it doesn't exist yet. Defaults to odd_platform_publication_alert
notifications.receivers.slack.url: Slack incoming webhook URL
notifications.receivers.webhook.url: Generic webhook URL
notifications.receivers.email.host: the SMTP server.
notifications.receivers.email.port: the port used for the email protocol (SMTP, IMAP, or POP3)
notifications.receivers.email.protocol: the email protocol (e.g., SMTP, SMTPS, IMAP, IMAPS, POP3, POP3S)
notifications.receivers.email.smtp.auth: a boolean value (true or false) indicating whether the SMTP server requires authentication
notifications.receivers.email.smtp.starttls: a boolean indicating whether to use STARTTLS, a security protocol that upgrades an unencrypted connection to an encrypted one
notifications.receivers.email.password: the password used for email authentication
notifications.receivers.email.sender: the email address sending the notifications
notifications.receivers.email.notification.emails: the list of recipients for the email notifications
odd.platform-base-url: ODD Platform URL to be used in alert messages' hyperlinks.

ODD Platform configuration would look like this:

notifications:
  enabled: true
  message:
    downstream-entities-depth: {downstream_entities_depth_to_fetch}
  wal:
    advisory-lock-id: {postgresql_advisory_lock_id}
    replication-slot-name: {postgresql_replication_slot_name}
    publication-name: {postgresql_publication_name}
  receivers:
    slack:
      url: {slack_incoming_webhook_url}
    webhook:
      url: {webhook_url}
    email: 
      host: {host} 
      port: {port}
      protocol: {protocol}  # SMTP, SMTPS, IMAP, IMAPS, POP3, POP3S 
      smtp: 
        auth: true # Set to true if SMTP server requires authentication 
        starttls: true # Set to true to enable STARTTLS 
      password: {email_password}
      sender: {sender_email}
      notification:
        emails: {recipient_email_1,recipient_email_2}
odd:
  platform-base-url: {platform_url}

NOTIFICATIONS_ENABLED=true
NOTIFICATIONS_MESSAGE_DOWNSTREAM_ENTITIES_DEPTH={downstream_entities_depth_to_fetch}
NOTIFICATIONS_WAL_ADVISORY_LOCK_ID={postgresql_advisory_lock_id}
NOTIFICATIONS_WAL_REPLICATION_SLOT_NAME={postgresql_replication_slot_name}
NOTIFICATIONS_WAL_PUBLICATION_NAME={postgresql_publication_name}
NOTIFICATIONS_RECEIVERS_SLACK_URL={slack_incoming_webhook_url}
NOTIFICATIONS_RECEIVERS_WEBHOOK_URL={webhook_url}
NOTIFICATIONS_RECEIVERS_EMAIL_HOST={host}
NOTIFICATIONS_RECEIVERS_EMAIL_PORT={port}
NOTIFICATIONS_RECEIVERS_EMAIL_PROTOCOL={protocol} # SMTP, SMTPS, IMAP, IMAPS, POP3, POP3S
NOTIFICATIONS_RECEIVERS_EMAIL_SMTP_AUTH=true      # Set to true if SMTP server requires authentication
NOTIFICATIONS_RECEIVERS_EMAIL_SMTP_STARTTLS=true  # Set to true to enable STARTTLS
NOTIFICATIONS_RECEIVERS_EMAIL_PASSWORD={email_password}
NOTIFICATIONS_RECEIVERS_EMAIL_SENDER={sender_email}
NOTIFICATIONS_RECEIVERS_EMAIL_NOTIFICATION_EMAILS={recipient_email_1,recipient_email_2}
ODD_PLATFORM_BASE_URL={odd_platform_url}

Cleaning up

ODD Platform doesn't clean up the replication slot it has created. If you need to disable the Alert Notification functionality, perform the following steps in addition to disabling the feature on the ODD Platform side.

In order to remove replication slot and publication, these SQL queries must be run against the database:

```
SELECT pg_drop_replication_slot('<replication_slot_name>');
```
where <replication_slot_name> is the name of the replication slot defined in ODD Platform. Default is odd_platform_replication_slot
```
DROP PUBLICATION IF EXISTS <publication_name>;
```
where <publication_name> is the name of the publication defined in ODD Platform. Default is odd_platform_publication_alert
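
If the slot and publication names were customized, the two statements can be generated rather than edited by hand. The helper below is a hypothetical sketch: it only builds the SQL text and does not connect to a database, and it conservatively accepts only lowercase letters, digits and underscores in both names.

```python
import re

# Conservative assumption: accept only lowercase letters, digits, underscores.
_SAFE_NAME = re.compile(r"[a-z_][a-z0-9_]*")


def cleanup_statements(
    slot_name: str = "odd_platform_replication_slot",
    publication_name: str = "odd_platform_publication_alert",
) -> list:
    """Build the SQL statements that drop the slot and the publication."""
    for name in (slot_name, publication_name):
        if not _SAFE_NAME.fullmatch(name):
            raise ValueError(f"unexpected characters in name: {name!r}")
    return [
        f"SELECT pg_drop_replication_slot('{slot_name}');",
        f"DROP PUBLICATION IF EXISTS {publication_name};",
    ]


if __name__ == "__main__":
    print("\n".join(cleanup_statements()))
```

With no arguments the script prints the statements for the default names listed above.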

Enable Data Collaboration

The Data Collaboration feature allows users to start a discussion about a specific data entity in a messenger directly from ODD Platform. Thread replies are tracked and saved by ODD Platform, so users can retrieve a conversation's context and decisions from one place.

At the moment, ODD Platform supports only Slack as a target messenger. It uses Slack APIs to send messages and the Slack Events API to receive thread replies.

Creating Slack application

Go to the Slack apps website and click on Create New App -> From an app manifest

Select a workspace you want to add an application to and click Next

Enter the following manifest into the YAML section, replace <ODD_PLATFORM_BASE_URL> with the URL of your ODD Platform deployment and click Next

display_information:
  name: ODD Data Collaboration
features:
  bot_user:
    display_name: ODD Data Collaboration
    always_online: false
oauth_config:
  scopes:
    bot:
      - channels:history
      - channels:read
      - chat:write
      - users:read
      - incoming-webhook
settings:
  event_subscriptions:
    request_url: https://<ODD_PLATFORM_BASE_URL>/api/slack/events
    bot_events:
      - message.channels
  org_deploy_enabled: false
  socket_mode_enabled: false
  token_rotation_enabled: false

Review your application's scopes and permissions and click Create

Follow Slack's instructions to install the application into the workspace, and you should be good to go.

ODD Platform configuration

The following variables need to be defined:

datacollaboration.enabled: must be set to true. Defaults to false
datacollaboration.receive-event-advisory-lock-id: PostgreSQL advisory lock id for a job, which translates events from messengers to messages. Defaults to 110
datacollaboration.sender-message-advisory-lock-id: PostgreSQL advisory lock id for a job, which sends messages created in the platform to messengers. Defaults to 120
datacollaboration.message-partition-period: time interval in days for a message table partition in PostgreSQL. Defaults to 30
datacollaboration.sending-messages-retry-count: how many times the Platform will attempt to send a message to the provider. Cannot be less than zero. Defaults to 3
datacollaboration.slack-oauth-token: Slack application OAuth token used for communicating with Slack. Can be retrieved in the OAuth & Permissions section of a Slack application.

Figure: Retrieving OAuth Token

datacollaboration:
  receive-event-advisory-lock-id: {receive_event_advisory_lock_id}
  sender-message-advisory-lock-id: {sender_message_advisory_lock_id}
  message-partition-period: {message_partition_period}
  sending-messages-retry-count: {sending_messages_retry_count}
  enabled: true
  slack-oauth-token: {slack_oauth_token}

odd:
  platform-base-url: {platform_url}

DATACOLLABORATION_ENABLED=true
DATACOLLABORATION_RECEIVE_EVENT_ADVISORY_LOCK_ID={receive_event_advisory_lock_id}
DATACOLLABORATION_SENDER_MESSAGE_ADVISORY_LOCK_ID={sender_message_advisory_lock_id}
DATACOLLABORATION_MESSAGE_PARTITION_PERIOD={message_partition_period}
DATACOLLABORATION_SENDING_MESSAGES_RETRY_COUNT={sending_messages_retry_count}
DATACOLLABORATION_SLACK_OAUTH_TOKEN={slack_oauth_token}
ODD_PLATFORM_BASE_URL={odd_platform_base_url}

Housekeeping Settings Configuration

When the Housekeeping module is enabled (enabled: true), the Platform runs automated maintenance tasks. The Time-To-Live (TTL) settings define the retention period for the following data categories:

Resolved Alerts: data related to resolved alerts will be retained for 30 days.
Search Facets: historical search facets data will be maintained for 30 days.
Data Entity Deletion: information about deleted data entities will be preserved for 30 days.

These settings ensure that unnecessary or stale data is automatically cleaned up after the specified time periods. Adjusting these TTL values allows for customization based on specific business needs and data retention policies.

housekeeping:
  enabled: true
  ttl:
    resolved_alerts_days: 30
    search_facets_days: 30
    data_entity_delete_days: 30

HOUSEKEEPING_ENABLED=true
HOUSEKEEPING_TTL_RESOLVED_ALERTS_DAYS=30
HOUSEKEEPING_TTL_SEARCH_FACETS_DAYS=30
HOUSEKEEPING_TTL_DATA_ENTITY_DELETE_DAYS=30

Detecting Stale Metadata

Stale metadata is metadata that has become outdated within ODD Platform. It indicates that no updates have arrived from the source, which can happen when a collector is not functioning as planned, when a collector has been deactivated, or when other issues in the source data system make the metadata unavailable.

By default, the stale period is set to 7 days in the configuration file.

odd:
#  platform-base-url:
  tenant-id:
  data-entity-stale-period: 7 # days
  activity:
    partition-period: 30

ODD_DATA_ENTITY_STALE_PERIOD=7
ODD_ACTIVITY_PARTITION_PERIOD=30

This means that if the Platform last received information about an item from the source more than 7 days ago, the item is labeled as "Stale" within the Platform. ODD users can adjust this period to suit their needs, choosing either a shorter or a longer timeframe.
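
The staleness rule can be illustrated with a short Python sketch. The function `is_stale` and its parameters are hypothetical; how ODD Platform evaluates the period internally is not described here, so treat this as one reading of the rule, in which exactly 7 days does not yet count as stale.

```python
from datetime import datetime, timedelta, timezone


def is_stale(
    last_ingested_at: datetime,
    now: datetime,
    stale_period_days: int = 7,
) -> bool:
    """Return True if the last update is older than the stale period."""
    return now - last_ingested_at > timedelta(days=stale_period_days)


if __name__ == "__main__":
    now = datetime(2024, 1, 10, tzinfo=timezone.utc)
    last_update = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(is_stale(last_update, now))
```

Shortening `stale_period_days` makes entities turn stale sooner, which corresponds to lowering data-entity-stale-period in the configuration.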

Logging Settings Configuration

Logs provide detailed information about errors in the application helping its users quickly identify and fix problems. Setting up logging is recommended for ensuring operational excellence, system reliability, effective monitoring and troubleshooting. Here is a code snippet for setting up logs in ODD Platform:

logging:
  level:
    org.springframework.transaction.interceptor: info
    org.jooq.tools.LoggerListener: info
    io.r2dbc.postgresql.QUERY: info
    io.r2dbc.postgresql.PARAM: info
    org.opendatadiscovery.oddplatform.notification: info
    org.opendatadiscovery.oddplatform.housekeeping: info
    org.opendatadiscovery.oddplatform.partition: info
    org.opendatadiscovery.oddplatform.datacollaboration: info
    org.opendatadiscovery.oddplatform.service.ingestion: info

LOGGING_LEVEL_ORG_SPRINGFRAMEWORK_TRANSACTION_INTERCEPTOR=info
LOGGING_LEVEL_ORG_JOOQ_TOOLS_LOGGERLISTENER=info
LOGGING_LEVEL_IO_R2DBC_POSTGRESQL_QUERY=info
LOGGING_LEVEL_IO_R2DBC_POSTGRESQL_PARAM=info
LOGGING_LEVEL_ORG_OPENDATADISCOVERY_ODDPLATFORM_NOTIFICATION=info
LOGGING_LEVEL_ORG_OPENDATADISCOVERY_ODDPLATFORM_HOUSEKEEPING=info
LOGGING_LEVEL_ORG_OPENDATADISCOVERY_ODDPLATFORM_PARTITION=info
LOGGING_LEVEL_ORG_OPENDATADISCOVERY_ODDPLATFORM_DATACOLLABORATION=info
LOGGING_LEVEL_ORG_OPENDATADISCOVERY_ODDPLATFORM_SERVICE_INGESTION=info

Setting the logging level to info allows you to see useful messages about the platform’s functioning without being overwhelmed by too much detail as with trace or debug or missing important issues as with warn or higher level. However, feel free to adjust the logging level as needed to get more or less information based on your specific requirements.

Machine-to-Machine (M2M) Tokens Configuration

For M2M communication with the API, a secret is provided to ODD Platform before deployment. This allows requests to bypass identity providers: when a request is sent to the API with the correct secret, the API serves it without involving an identity provider.

auth:
  s2s:
    enabled: true
    token: stringExample

AUTH_S2S_ENABLED=true 
AUTH_S2S_TOKEN=stringExample

This functionality is not the preferred method and is disabled by default, but it can be enabled and configured when needed.
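
This section does not say how the token should be produced. One option, shown here as an assumption rather than a Platform requirement, is to generate a long random string with Python's standard `secrets` module; the helper name `generate_s2s_token` is hypothetical.

```python
import secrets


def generate_s2s_token(num_bytes: int = 32) -> str:
    """Return a URL-safe random string suitable for use as a shared secret."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Print the two environment variables with a freshly generated token.
    print("AUTH_S2S_ENABLED=true")
    print(f"AUTH_S2S_TOKEN={generate_s2s_token()}")
```

Keep the generated value out of version control and pass it to the deployment through your usual secret management mechanism.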
