# Data Collaboration

ODD Platform's **Data Collaboration** feature lets users start in-app discussions about a specific data entity, with replies tracked back into the platform from a messenger. Conversations stay attached to the entity that anchored them, so an operator returning to a dataset months later can read the original threads — context, decisions, and follow-ups — without leaving the catalog.

The feature is exposed as a **Discussions** tab on every data entity's detail page (per-entity scope; not a global hub). At the moment Slack is the only supported messenger; integration with other providers would extend the same per-entity threading model.

The feature is **disabled by default** (`datacollaboration.enabled=false`). While it is disabled, every `/api/datacollaboration/...` route and the `/api/slack/events` webhook return `404 Not Found`, because the controllers are gated by `@ConditionalOnDataCollaboration`.

{% hint style="warning" %}
**The Discussions tab is visible even when Data Collaboration is disabled.** The tab itself is currently rendered regardless of the flag (only hidden when the entity status is deleted). On a deployment that has not configured Data Collaboration the tab is visible but its content fails to load — operators who do not plan to enable the feature should be aware of this until the platform gates tab visibility on the flag.
{% endhint %}

## How a discussion flows

* A user opens the **Discussions** tab on a data entity, picks a Slack channel from the bot's autocomplete, and posts a thread-root message. The platform writes the message to its own message store and queues it for delivery into Slack.
* A background sender (`DataCollaborationMessageSenderJob`) drains the queue. Each message is retried up to `datacollaboration.sending-messages-retry-count` times (default `3`) before being marked failed.
* When a Slack user replies in the thread, Slack POSTs an `event_callback` to the platform's `/api/slack/events` webhook. The platform translates the reply into an in-platform message linked to the same data entity, and the new reply appears under the original thread on the Discussions tab.
* Per-entity history is queryable: thread-root messages and replies are paginated by entity ID. The platform retains its own copy of every message, so historical threads stay available even if the Slack channel rotates or the Slack workspace is migrated.
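The bounded retry in the second step can be pictured with a minimal sketch. This is not the platform's `DataCollaborationMessageSenderJob`: `QueuedMessage`, `drain`, the `send` callable, and the state names are hypothetical, and the sketch assumes the configured count bounds the total number of delivery attempts per message.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Mirrors the documented default of datacollaboration.sending-messages-retry-count.
RETRY_COUNT = 3


@dataclass
class QueuedMessage:
    """Hypothetical stand-in for a message waiting to be delivered to Slack."""
    text: str
    attempts: int = 0
    state: str = "PENDING"  # illustrative states: PENDING -> SENT or FAILED


def drain(queue: Iterable[QueuedMessage],
          send: Callable[[QueuedMessage], None],
          retry_count: int = RETRY_COUNT) -> None:
    """Try each pending message until it is sent or its attempt budget is spent.

    Assumption: retry_count is the maximum number of attempts per message.
    """
    for message in queue:
        while message.state == "PENDING":
            message.attempts += 1
            try:
                send(message)
                message.state = "SENT"
            except Exception:
                # Give up only once the attempt budget is exhausted.
                if message.attempts >= retry_count:
                    message.state = "FAILED"
```

A real sender would typically spread attempts across scheduled runs rather than loop immediately; the sketch only shows how a fixed budget turns a persistent delivery error into a terminal failed state.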

## Slack — the full Slack app, not the alert webhook

The Slack integration that powers Data Collaboration is a **full Slack app**, distinct from the outgoing webhook the [Notifications](/features/active-platform-features/notifications.md) subsystem uses for alert messages. The data-collaboration app uses OAuth (`datacollaboration.slack-oauth-token`) and the [Slack Events API](https://docs.slack.dev/apis/events-api/) to read replies back into the platform — bidirectional. The alerting Slack integration is a one-way [incoming webhook](https://docs.slack.dev/messaging/sending-messages-using-incoming-webhooks) that the platform POSTs alert messages to (`notifications.receivers.slack.url`) — no replies, no thread state.

An operator who has already set up the alert webhook still needs to set up the Slack app separately to enable Discussions; the two configurations are independent. See [Main Concepts → Terms & Aliases](/introduction/main-concepts.md#terms-and-aliases) for the canonical side-by-side comparison.

## Where to set it up

* **Operator setup** — config keys, the Slack app manifest, advisory-lock IDs, message-partition rationale, retry behavior: [Configure ODD Platform → Enable Data Collaboration](/configuration-and-deployment/odd-platform.md#enable-data-collaboration).
* **HTTP API** — the seven `/api/datacollaboration/...` routes plus the inbound `/api/slack/events` webhook contract, channel autocomplete, per-entity message paging: [API Reference → Data Collaboration](/developer-guides/api-reference/data-collaboration.md).

## Known operator caveats

The Data Collaboration surface — and specifically the Slack integration that powers it today — carries several behaviours that are non-obvious from the configuration reference and the feature description above. Each item below states what an operator might assume, what the platform actually does, and what to do today.

{% hint style="danger" %}
**The `POST /api/slack/events` webhook does not verify Slack request signatures, so any internet caller can forge an event the platform will treat as Slack.** Slack's documented protocol requires the receiver to compute an HMAC-SHA256, keyed with the app's signing secret, over the string `v0:<timestamp>:<raw request body>` and to compare the result with the `X-Slack-Signature` header; the timestamp arrives in `X-Slack-Request-Timestamp`. The platform's event-API controller reads the request body as a raw `Mono<String>` and dispatches it to the Slack event parser **without reading either header**: a code-side grep across the platform for `X-Slack-Signature`, `signing.secret`, `signingSecret`, `verifySignature`, and `HMAC.SHA256` returns zero matches. Any caller who can reach `/api/slack/events` over the network can `POST` a forged `event_callback` payload that the platform treats as a legitimate Slack reply, materialising arbitrary messages in the per-entity Discussions tab and triggering the same downstream processing as a real Slack delivery.

**Mitigation today.** Deploy the platform behind a reverse proxy that performs Slack HMAC-SHA256 verification on `/api/slack/events` before forwarding the request — typically a small middleware in front of an ingress controller or a dedicated Lambda-edge-style verifier. The upstream platform-side fix adds signature verification to the event-API controller itself; until it lands, the perimeter check is the only protection.
{% endhint %}
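The perimeter check recommended in the signature caveat above could look roughly like the following. It is a sketch of Slack's documented `v0` signing scheme, not code from the platform; the function name, its parameters, and the way the signing secret reaches the verifier are assumptions made for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional

# Slack's documentation suggests rejecting requests more than five minutes old.
MAX_SKEW_SECONDS = 60 * 5


def is_valid_slack_request(signing_secret: str, timestamp: str, raw_body: bytes,
                           signature: str, now: Optional[float] = None) -> bool:
    """Return True only if the request carries a fresh, correctly signed body.

    timestamp is the X-Slack-Request-Timestamp header value, signature is the
    X-Slack-Signature header value, raw_body is the unparsed request body.
    """
    now = time.time() if now is None else now
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    # Reject stale (or far-future) timestamps to limit replay of captured requests.
    if abs(now - sent_at) > MAX_SKEW_SECONDS:
        return False
    # Slack signs the string "v0:<timestamp>:<raw body>" with the signing secret.
    base = b"v0:" + timestamp.encode() + b":" + raw_body
    expected = "v0=" + hmac.new(signing_secret.encode(), base,
                                hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

A proxy or middleware would call this with the raw bytes it received, before any JSON parsing, and forward the request to `/api/slack/events` only when it returns `True`.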

{% hint style="danger" %}
**Enabling Data Collaboration is the only access control on its routes — none of them carry an RBAC permission, and under `auth.type=DISABLED` the whole surface is anonymous.** There is no `TERM_*`-style permission for Data Collaboration: the platform's security-rule registry has no entry for the `/api/datacollaboration/...` routes or the per-entity message-history routes, so every one of them falls through to the global "any authenticated user" rule (the inbound webhook is allow-listed entirely, see the caveat above). On a `LOGIN_FORM`, `OAUTH2`, or `LDAP` deployment that means any logged-in user can post into and read every entity's discussions regardless of the Policy grants you have authored. On an `auth.type=DISABLED` deployment the global rule permits everyone, so flipping `datacollaboration.enabled=true` publishes the entire read-and-post surface to anonymous callers — the flag the feature description frames as the on/off switch is also, in that mode, the only thing between the internet and the discussion store.

**What to do.** Do not enable Data Collaboration on an internet-reachable deployment running `auth.type=DISABLED`. Run a real authentication mode, and combine it with the perimeter signature check above for the inbound webhook. Treat "Data Collaboration enabled" as "every authenticated user can use Discussions on every entity" when authoring roles — there is no per-entity or per-permission scoping today.
{% endhint %}

{% hint style="warning" %}
**The `datacollaboration.slack-oauth-token` bot OAuth token is the only thing standing between the platform and `chat:write` / `channels:read` / `channels:history` / `users:read` / `incoming-webhook` access on the Slack workspace.** The platform constructs a singleton Slack API client at boot from the token; the singleton is reused across every outbound Slack call for the platform's lifetime. Spring Boot masks **every** property value in `/actuator/env` by default (`show-values` defaults to `NEVER`), so the token value is not exposed there — but that masking is the only protection: there is no environment-variable rotation hook, no token-lifecycle integration, no fail-closed behaviour on Slack-side revocation. Any compromise of the platform process, the underlying config store, or a configured property value leaks **workspace channel enumeration plus post-as-bot capability across the bot's full OAuth scope**.

**Operator workflow.** Treat the OAuth token the same way you treat database credentials: store it only in your secrets manager (not in plaintext YAML), rotate it on a schedule, and restart the platform on every rotation so the singleton picks up the new value. If the token is revoked on the Slack side (admin action, app removal), the platform keeps calling Slack with the revoked token until it is restarted, because there is no platform-side detection of revocation. After a Slack-side revocation, restart the platform to surface the error and rotate to a fresh token.
{% endhint %}

{% hint style="warning" %}
**Slack delivers events at-least-once, so the platform may materialise duplicate child messages for the same Slack thread reply.** Slack's Events API documents an at-least-once delivery contract; the platform writes incoming events to a `message_provider_event` table without a unique constraint on `(provider, event_id)` and inserts each event with a plain `INSERT` (no `ON CONFLICT` clause). A duplicate delivery for the same Slack `event_ts` inserts an additional row; the downstream message processor materialises an additional child message linked to the same per-entity thread. The sender-job retry described under "How a discussion flows" covers the **outbound** path; this caveat concerns **inbound** duplicates, which arise from a different mechanism.

**Operator-visible signal.** Two near-identical thread replies appearing seconds apart with the same author, text, and timestamp on the per-entity Discussions tab. The platform does not deduplicate them today; the upstream fix adds a unique constraint plus `ON CONFLICT DO NOTHING` on the event-insert path.
{% endhint %}
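The idea behind the upstream fix named in the duplicate-delivery caveat (a unique constraint plus `ON CONFLICT DO NOTHING`) can be illustrated with a small self-contained sketch. It uses an in-memory SQLite database purely as a stand-in; the table layout is hypothetical apart from the `provider` and `event_id` column names, and `record_event` is an illustrative helper, not a platform function.

```python
import sqlite3

# Hypothetical, simplified stand-in for the event table. Only the table name
# and the provider / event_id columns come from the caveat above.
SCHEMA = """
CREATE TABLE message_provider_event (
    provider TEXT NOT NULL,
    event_id TEXT NOT NULL,
    payload  TEXT NOT NULL,
    UNIQUE (provider, event_id)
)
"""


def record_event(conn: sqlite3.Connection, provider: str,
                 event_id: str, payload: str) -> bool:
    """Insert an event once; return False when it was already recorded."""
    cursor = conn.execute(
        "INSERT INTO message_provider_event (provider, event_id, payload) "
        "VALUES (?, ?, ?) "
        "ON CONFLICT (provider, event_id) DO NOTHING",
        (provider, event_id, payload),
    )
    # rowcount is 1 for a fresh insert and 0 when the conflict clause skipped it.
    return cursor.rowcount == 1
```

With this shape, a redelivered event is a no-op at the storage layer, so a downstream processor that only acts on newly inserted rows would not materialise a second child message.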

{% hint style="info" %}
**Slack channel autocomplete is cached for 60 seconds, matches on a prefix, and lists only channels the bot already belongs to.** The platform's Slack channel lookup uses a Caffeine async-loading cache keyed on a fixed sentinel, with `expireAfterWrite(1, MINUTES)`. When an operator invites the platform's bot to a new Slack channel, the channel becomes reachable from the bot's perspective immediately on the Slack side — but the platform's autocomplete continues to return the cached list (which does not include the new channel) for up to 60 seconds. After the cache TTL expires, the next autocomplete request triggers a fresh fetch and the channel appears.

Two further behaviours shape what the autocomplete shows. The typed filter is a **prefix** match (`startsWith`), not a substring match — typing the middle of a channel name returns no suggestions, so users must type from the start of the channel name. And the list only includes **public channels the bot has been added to**; private channels, DMs, and archived channels are never offered, so a channel that is missing from the picker usually means the bot has not been invited to it.

If you cannot wait 60 seconds (operator demo, time-sensitive setup), restarting the platform forces the cache to rebuild on first call.
{% endhint %}
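The two autocomplete behaviours described in the caveat above, the 60-second staleness window and the prefix-only filter, can be reproduced with a simplified model. The platform itself uses a Caffeine async-loading cache; the `ChannelCache` class below is a hypothetical synchronous approximation, the injected `fetch` and `clock` callables are assumptions made so the behaviour can be exercised without Slack, and case-sensitive matching is likewise assumed.

```python
import time
from typing import Callable, List, Optional


class ChannelCache:
    """Single-entry cache that refetches the channel list after a fixed TTL."""

    def __init__(self, fetch: Callable[[], List[str]],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._channels: List[str] = []
        self._loaded_at: Optional[float] = None

    def channels(self) -> List[str]:
        now = self._clock()
        # Reload on first use or once the entry is older than the TTL,
        # approximating expire-after-write semantics.
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._channels = list(self._fetch())
            self._loaded_at = now
        return self._channels

    def suggest(self, typed: str) -> List[str]:
        # Prefix match only: text from the middle of a name matches nothing.
        return [name for name in self.channels() if name.startswith(typed)]
```

For example, with cached channels `data-eng` and `data-quality`, `suggest("data-")` returns both names while `suggest("eng")` returns an empty list, and a channel added on the Slack side shows up only after the TTL has elapsed and the next call refetches.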

## Where to next

* For the Slack alert webhook (a different Slack integration on the same workspace) → [Notifications → Slack incoming webhook](/features/active-platform-features/notifications.md#slack-incoming-webhook).
* For the system-driven per-entity event stream the platform emits alongside user-authored discussion messages → [Activity Feed](/features/active-platform-features/activity-feed.md).
* For the canonical side-by-side comparison of the two Slack integrations → [Main Concepts → Terms & Aliases](/introduction/main-concepts.md#terms-and-aliases).

