Configure ODD Platform

This section describes how to configure ODD Platform so that you can use all of its features.

Configuration approaches

There are two ways to configure the Platform:

  • Environment variables are convenient for simple entries.

  • YAML comes in handy when you need to define a complex configuration block (e.g. OAuth2 authentication or logging levels).

YAML entries vs. environment variables

The example below shows a YAML block, followed by the environment variables that configure the Platform in the same way.

YAML:

spring:
    datasource:
        url: URL
        username: USERNAME
        password: PASSWORD
    custom-datasource:
        url: URL
        username: USERNAME
        password: PASSWORD

To configure the Platform using environment variables, join the nested keys with underscores, replace hyphens with underscores and uppercase the result, like so:

  • SPRING_DATASOURCE_URL=URL

  • SPRING_DATASOURCE_USERNAME=USERNAME

  • SPRING_DATASOURCE_PASSWORD=PASSWORD

  • SPRING_CUSTOM_DATASOURCE_URL=URL

  • SPRING_CUSTOM_DATASOURCE_USERNAME=USERNAME

  • SPRING_CUSTOM_DATASOURCE_PASSWORD=PASSWORD
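The naming rule can also be expressed as a small helper. This is a minimal sketch and not part of ODD Platform: the function name to_env_var is hypothetical, and the rule is inferred from the examples listed above.

```python
def to_env_var(property_key: str) -> str:
    """Convert a dotted property key into the environment variable form shown above."""
    # Dots separate the YAML nesting levels; hyphens inside a key also become underscores.
    return property_key.replace(".", "_").replace("-", "_").upper()


# Print the mapping for two of the keys used in the example above.
for key in ("spring.datasource.url", "spring.custom-datasource.password"):
    print(f"{key} -> {to_env_var(key)}")
```

Writing keys in the dotted form (spring.datasource.url) and converting them this way helps avoid typos when a long list of variables has to be passed to a container.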

Connect your database

ODD Platform uses PostgreSQL, and only PostgreSQL, for all of its features. Define the following variables to connect ODD Platform to the database:

  • spring.datasource.url: JDBC string of your PostgreSQL database. Default value is jdbc:postgresql://127.0.0.1:5432/odd-platform

  • spring.datasource.username: your PostgreSQL user's name. Default value is odd-platform

  • spring.datasource.password: your PostgreSQL user's password. Default value is odd-platform-password

The following variables are optional (by default, they have the same values as spring.datasource). They are used to connect to PostgreSQL and store Lookup Tables:

  • spring.custom-datasource.url: JDBC string of the PostgreSQL database where Lookup Tables are stored. Default value is jdbc:postgresql://127.0.0.1:5432/odd-platform. Note: you can specify any {database_host}, {database_port} or {database_name}, but the schema where Lookup Tables are stored is always lookup_tables_schema.

  • spring.custom-datasource.username: your PostgreSQL user's name for custom-datasource. Default value is odd-platform

  • spring.custom-datasource.password: your PostgreSQL user's password for custom-datasource. Default value is odd-platform-password

The database connection block would then look like this:

spring:
    datasource:
        url: jdbc:postgresql://{database_host}:{database_port}/{database_name}
        username: {database_username}
        password: {database_password}
    # [OPTIONAL]
    custom-datasource:
        url: jdbc:postgresql://{database_host}:{database_port}/{database_name}
        username: {database_username}
        password: {database_password}
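The relationship between the two blocks can be illustrated with a short sketch. It is not ODD Platform code: the helper names jdbc_url and effective_custom_datasource are hypothetical, and the sketch assumes that every custom-datasource value left unset falls back to the corresponding spring.datasource value.

```python
from typing import Optional


def jdbc_url(host: str, port: int, database: str) -> str:
    """Build a PostgreSQL JDBC string in the format used above."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def effective_custom_datasource(datasource: dict, custom: Optional[dict] = None) -> dict:
    """Return the settings used for Lookup Tables.

    Assumption: any key missing from `custom` takes the value from `datasource`.
    """
    return {**datasource, **(custom or {})}


# Main datasource using the default values listed above.
datasource = {
    "url": jdbc_url("127.0.0.1", 5432, "odd-platform"),
    "username": "odd-platform",
    "password": "odd-platform-password",
}

# No custom-datasource block defined: Lookup Tables use the main datasource.
print(effective_custom_datasource(datasource))
```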

Security

To enable security in ODD Platform, follow the Enable security section.

Select session provider

ODD Platform can keep users' sessions in memory, in the PostgreSQL database or in Redis. The session provider is set via the session.provider variable, which accepts the following values:

  • IN_MEMORY: Local in-memory storage. ODD Platform defaults to this value

  • INTERNAL_POSTGRESQL: Underlying PostgreSQL database

  • REDIS: Redis data store

If you run only one instance of ODD Platform and can tolerate users being logged out each time the Platform restarts, the best choice is IN_MEMORY.

If you already have Redis in your infrastructure or are willing to install it, the best choice is REDIS.

Otherwise, INTERNAL_POSTGRESQL is the best pick.
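The recommendation above can be summarised as a small decision function. This is an illustrative sketch only: the function and parameter names are hypothetical, and checking the conditions in the order in which the text lists them is an assumption.

```python
def choose_session_provider(
    single_instance: bool,
    tolerate_logout_on_restart: bool,
    redis_available: bool,
) -> str:
    """Return the session.provider value suggested by the guidance above."""
    # One instance, and losing sessions on restart is acceptable.
    if single_instance and tolerate_logout_on_restart:
        return "IN_MEMORY"
    # Redis is already in place, or the team is willing to install it.
    if redis_available:
        return "REDIS"
    # Fall back to the database the Platform already depends on.
    return "INTERNAL_POSTGRESQL"


print(choose_session_provider(True, True, False))
```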

In memory (default)

session:
    provider: IN_MEMORY

Internal PostgreSQL

session:
    provider: INTERNAL_POSTGRESQL

Redis

To connect to Redis, define the following variables:

  • spring.redis.host: Redis host

  • spring.redis.port: Redis port

  • spring.redis.username: Redis user's name

  • spring.redis.password: Redis user's password

  • spring.redis.database: Redis database index

YAML for Redis session provider

spring:
    redis:
        host: {redis_host}
        port: {redis_port}
        username: {redis_username}
        password: {redis_password}
session:
    provider: REDIS

Enable Metrics

Some of the metadata ODD Platform ingests can be conveniently represented as a time-series chart, for example, the amount of data in a MySQL table or the physical size of a Redshift database. ODD Platform pushes this metadata to an OTLP collector as telemetry, so that charts can be created in Prometheus, New Relic or any other backend that supports OTLP exporters. Set the following variables to use this functionality:

  • metrics.export.enabled: Must be set to true

  • metrics.export.otlp-endpoint: OTLP Collector endpoint

metrics:
    export:
        enabled: true
        otlp-endpoint: {otlp-endpoint-url}

Enable Alert Notifications

Any alert created inside the Platform can be sent via a generic webhook, a Slack incoming webhook and/or email (via Google SMTP, AWS SMTP, etc.). A notification contains the following information:

  1. Name of the entity for which the alert was created

  2. Data source and namespace of the entity

  3. Owners of the entity

  4. Possibly affected entities

ODD Platform uses the PostgreSQL replication mechanism so that a notification is still sent even if network lag occurs or the Platform crashes. To enable this functionality, the underlying PostgreSQL database needs to be configured as well.

PostgreSQL Configuration

The PostgreSQL database must be configured to support the replication mechanism the Platform relies on, and the database user must be granted replication permissions.

Database settings

To configure the database, add the following entries to the postgresql.conf file:

max_wal_senders = 1
wal_keep_size = 16
wal_level = logical
max_replication_slots = 1

If the replication mechanism is already configured, increment the max_wal_senders and max_replication_slots values instead.
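The two cases (a fresh database and one where replication is already in use) can be sketched as follows. The helper name settings_for_odd is hypothetical, and the sketch only computes the target values; it does not read or write postgresql.conf.

```python
from typing import Optional

# Values listed above for a database with no replication configured yet.
FRESH_SETTINGS = {
    "max_wal_senders": 1,
    "wal_keep_size": 16,
    "wal_level": "logical",
    "max_replication_slots": 1,
}


def settings_for_odd(existing: Optional[dict] = None) -> dict:
    """Return the postgresql.conf entries to apply for ODD Platform."""
    if existing is None:
        return dict(FRESH_SETTINGS)
    # Replication is already configured: reserve one more WAL sender and
    # one more replication slot, and leave everything else untouched.
    updated = dict(existing)
    updated["max_wal_senders"] = existing["max_wal_senders"] + 1
    updated["max_replication_slots"] = existing["max_replication_slots"] + 1
    return updated


print(settings_for_odd({"max_wal_senders": 5, "max_replication_slots": 3}))
```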

Database user permissions

The ODD Platform database user must be granted replication permissions:

ALTER ROLE {database_username} WITH REPLICATION;

User permissions and database configuration may vary from one on-demand/cloud provider to another.

For instance, in AWS RDS, PostgreSQL instances are managed services where certain aspects of replication management are automated to minimize the risk of misconfiguration. Because of this, some settings are either not exposed or are changed differently than in a standard PostgreSQL setup. To enable notifications in such an environment, follow these steps (only the differences are mentioned):

  1. Instead of modifying the wal_level parameter directly, set the rds.logical_replication parameter to 1 in your database instance's Parameter Group.

  2. Ensure the ODD user connecting to the database has the rds_replication role. The Master username of the database typically has this role by default. If you use a different username, you may need to grant the role with the command GRANT rds_replication TO {your_database_username};

  3. If you set max_wal_senders to 5 (the minimal value mentioned in the Parameter Group) and then keep getting messages like "The parameter max_wal_senders was set to a value incompatible with replication. It has been adjusted from 5 to 55" in the events list of the database instance, consider setting the parameter in the Parameter Group to the value mentioned in the message, so that RDS stops changing it automatically.

ODD Platform configuration

The following variables need to be defined:

  • notifications.enabled: must be set to true. Defaults to false

  • notifications.message.downstream-entities-depth: limits how many lineage graph levels are traversed when fetching affected data entities. Defaults to 1

  • notifications.wal.advisory-lock-id: ODD Platform uses a PostgreSQL advisory lock to make sure that, when the Platform is scaled horizontally, only one instance processes alert messages. This setting defines the advisory lock id. Defaults to 100

  • notifications.wal.replication-slot-name: name of the PostgreSQL replication slot, which will be created if it doesn't exist yet. Defaults to odd_platform_replication_slot

  • notifications.wal.publication-name: name of the PostgreSQL publication, which will be created if it doesn't exist yet. Defaults to odd_platform_publication_alert

  • notifications.receivers.slack.url: Slack incoming webhook URL

  • notifications.receivers.webhook.url: Generic webhook URL

  • notifications.receivers.email.sender: email address, which will be used for sending email

  • notifications.receivers.email.password: password for the sender email address

  • notifications.receivers.email.smtp: SMTP host, which is used for sending email

  • notifications.receivers.email.port: SMTP host port

  • notifications.receivers.email.notification.emails: email addresses, which will receive notifications from ODD

  • odd.platform-base-url: ODD Platform URL to be used in alert messages' hyperlinks.

ODD Platform configuration would look like this:

notifications:
  enabled: true
  message:
    downstream-entities-depth: {downstream_entities_depth_to_fetch}
  wal:
    advisory-lock-id: {postgresql_advisory_lock_id}
    replication-slot-name: {postgresql_replication_slot_name}
    publication-name: {postgresql_publication_name}
  receivers:
    slack:
      url: {slack_incoming_webhook_url}
    webhook:
      url: {webhook_url}
    email:
      sender: {your_sender_email}
      password: {password}
      smtp: {smtp_host}
      port: {smtp_port}
      notification:
        emails: {your_1@mail.com,your_2@mail.com}  
odd:
  platform-base-url: {platform_url}

Cleaning up

ODD Platform doesn't clean up the replication slot it has created. If you need to disable the Alert Notifications functionality, perform the following steps in addition to disabling the feature on the ODD Platform side.

To remove the replication slot and publication, run these SQL queries against the database:

  • SELECT pg_drop_replication_slot('<replication_slot_name>');

    where <replication_slot_name> is the name of the replication slot defined in the ODD Platform. Default is odd_platform_replication_slot

  • DROP PUBLICATION IF EXISTS <publication_name>;

    where <publication_name> is the name of the publication defined in the ODD Platform. Default is odd_platform_publication_alert
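If the cleanup has to be scripted, the two queries above can be generated from the configured names. This is a minimal sketch: cleanup_statements is a hypothetical helper, it only builds the SQL text (running it against the database is left to your client of choice), and restricting names to lowercase letters, digits and underscores is a simplifying assumption.

```python
import re

# Simplifying assumption: plain lowercase names, so no quoting is needed.
_SIMPLE_NAME = re.compile(r"[a-z_][a-z0-9_]*")


def cleanup_statements(
    slot_name: str = "odd_platform_replication_slot",
    publication_name: str = "odd_platform_publication_alert",
) -> list:
    """Build the cleanup queries for the given slot and publication names."""
    for name in (slot_name, publication_name):
        if not _SIMPLE_NAME.fullmatch(name):
            raise ValueError(f"unsupported name: {name!r}")
    return [
        f"SELECT pg_drop_replication_slot('{slot_name}');",
        f"DROP PUBLICATION IF EXISTS {publication_name};",
    ]


# With no arguments, the statements use the default names listed above.
for statement in cleanup_statements():
    print(statement)
```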

Enable Data Collaboration

The Data Collaboration feature allows users to start a discussion about a specific data entity in a messenger directly from ODD Platform. Thread replies are tracked and saved by ODD Platform, so users can retrieve the context and decisions of a conversation in one place.

At the moment, ODD Platform supports only Slack as a target messenger. It uses Slack APIs to send messages and the Slack Events API to receive thread replies to those messages.

Creating Slack application

Go to the Slack apps website and click Create New App -> From an app manifest.

Select the workspace you want to add the application to and click Next.

Enter the following manifest into the YAML section, replacing <ODD_PLATFORM_BASE_URL> with the URL of your ODD Platform deployment, and click Next.

display_information:
  name: ODD Data Collaboration
features:
  bot_user:
    display_name: ODD Data Collaboration
    always_online: false
oauth_config:
  scopes:
    bot:
      - channels:history
      - channels:read
      - chat:write
      - users:read
      - incoming-webhook
settings:
  event_subscriptions:
    request_url: https://<ODD_PLATFORM_BASE_URL>/api/slack/events
    bot_events:
      - message.channels
  org_deploy_enabled: false
  socket_mode_enabled: false
  token_rotation_enabled: false

Review your application's scopes and permissions and click Create.

Follow Slack's instructions on how to install the application into the workspace, and you should be good to go.

ODD Platform configuration

The following variables need to be defined:

  • datacollaboration.enabled: must be set to true. Defaults to false

  • datacollaboration.receive-event-advisory-lock-id: PostgreSQL advisory lock id for a job, which translates events from messengers to messages. Defaults to 110

  • datacollaboration.sender-message-advisory-lock-id: PostgreSQL advisory lock id for a job, which sends messages created in the platform to messengers. Defaults to 120

  • datacollaboration.message-partition-period: time interval in days for a message table partition in PostgreSQL. Defaults to 30

  • datacollaboration.sending-messages-retry-count: how many times the Platform will attempt to send a message to provider. Cannot be less than zero. Defaults to 3

  • datacollaboration.slack-oauth-token: Slack application OAuth token used for communicating with Slack. Can be retrieved in the OAuth & Permissions section of a Slack application.

datacollaboration:
  receive-event-advisory-lock-id: {receive_event_advisory_lock_id}
  sender-message-advisory-lock-id: {sender_message_advisory_lock_id}
  message-partition-period: {message_partition_period}
  sending-messages-retry-count: {sending-messages-retry-count}
  enabled: true
  slack-oauth-token: {slack_oauth_token}

odd:
  platform-base-url: {platform_url}

Housekeeping Settings Configuration

In the example below, the Housekeeping module is enabled (enabled: true), which allows automated maintenance tasks to run. The Time-To-Live (TTL) settings define the retention period for the following data categories:

  • Resolved Alerts: data related to resolved alerts will be retained for 30 days.

  • Search Facets: historical search facets data will be maintained for 30 days.

  • Data Entity Deletion: information about deleted data entities will be preserved for 30 days.

These settings ensure that unnecessary or stale data is automatically cleaned up after the specified time periods. Adjusting these TTL values allows for customization based on specific business needs and data retention policies.

housekeeping:
  enabled: true
  ttl:
    resolved_alerts_days: 30
    search_facets_days: 30
    data_entity_delete_days: 30

Detecting Stale Metadata

Stale metadata is metadata that has become outdated within ODD Platform. It indicates that no updates have been received from the source, which can happen when a collector is not functioning as planned, when the collector has been deactivated, or when other issues in the source data system make the metadata unavailable.

By default, the stale period is set to 7 days in the configuration file.

odd:
#  platform-base-url:
  tenant-id:
  data-entity-stale-period: 7 # days
  activity:
    partition-period: 30

This means that if the Platform last received information about an item from its source more than 7 days ago, the item is labeled as "Stale" within the Platform. ODD users can adjust this period, making it shorter or longer, to better suit their needs.
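The rule can be illustrated with a short check. This sketch is not the Platform's implementation: is_stale and its parameters are hypothetical names, and it assumes the comparison is made against the time of the last update received from the source.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_update: datetime, now: datetime, stale_period_days: int = 7) -> bool:
    """Return True when the last update is older than the stale period."""
    return now - last_update > timedelta(days=stale_period_days)


now = datetime(2024, 1, 15, tzinfo=timezone.utc)

# Last update 8 days ago: older than the default 7-day period.
print(is_stale(datetime(2024, 1, 7, tzinfo=timezone.utc), now))

# Last update 5 days ago: still within the period.
print(is_stale(datetime(2024, 1, 10, tzinfo=timezone.utc), now))
```

Passing a different stale_period_days mirrors changing data-entity-stale-period in the configuration.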
