# Architecture

This page is the **structural mental model** of an Open Data Discovery deployment: what runs where, how metadata flows from a source system to a user's screen, and which architectural concerns cross every component. It is the front door: read it before [Features](/features/features.md), [Integrations](/integrations/integrations.md), and [Configure ODD Platform](/configuration-and-deployment/odd-platform.md). For the producer-side vocabulary used here (Adapter, Plugin, Collector, Push adapter), see [Main Concepts → The architecture chain](/introduction/main-concepts.md#the-architecture-chain); this page itself uses **client-server topology** terms (Push-client, Collector, Platform, Server).

![](/files/B58jM0c4DqDZUzXvvIkB)

## Data flow

Metadata moves through five stages between a source system and a catalog user:

1. **Produce.** A source system has metadata that needs to surface in the catalog — a database schema, a job graph, a dbt manifest, a Spark lineage event, a Lookup-Table row.
2. **Ingest.** A producer (Collector or Push-client) sends metadata to the platform's Ingestion API. Pull producers (Collectors) poll on a schedule; push producers (in-process plugins, gateways, SDK callers) emit on the source's own cadence. Both speak the [ODD Specification](https://github.com/opendatadiscovery/opendatadiscovery-specification) — the wire contract.
3. **Store.** The platform writes the metadata to PostgreSQL keyed by [ODDRN](/introduction/main-concepts.md#oddrn). The same ODDRN means the same entity across ingests, across producers, and over time; that shared identity is what makes cross-system lineage possible.
4. **Query.** UI calls and external scripts hit the Platform API (`/api/...`). Reads serve the catalog (search, lineage, alerts, glossary, query examples, relationships); writes mutate the catalog (ownership, tags, alert status, halt configuration, lookup-table rows). See the [API Reference hub](/developer-guides/api-reference.md) for the full surface.
5. **Render.** The platform UI (served from the same process) renders the catalog: search, entity pages, lineage graphs, alert tabs, the Directory drill-down, the Catalog Overview home page.

The Push-client / Collector split is **only at stage 2** — every later stage is identical regardless of which producer family fed the catalog.
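
To make the Ingest stage concrete, here is a minimal sketch of what a producer might assemble before calling the Ingestion API. The endpoint path (`/ingestion/entities`), the envelope field names (`data_source_oddrn`, `items`), the bearer-token header, the host, and the sample ODDRNs are all assumptions for illustration; the ODD Specification is the authority on the real contract. The request is built but deliberately not sent.

```python
import json
import urllib.request

# Assumed values for illustration only; verify against the ODD Specification.
PLATFORM_URL = "http://localhost:8080"   # hypothetical Platform address
INGESTION_PATH = "/ingestion/entities"   # assumed Ingestion API path


def build_entity_list(data_source_oddrn, entities):
    """Wrap entities in the envelope a producer would send at the Ingest stage."""
    return {"data_source_oddrn": data_source_oddrn, "items": list(entities)}


def build_ingest_request(payload, token=None):
    """Create (but do not send) the HTTP request for the Ingestion API."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: producer traffic is authorised with a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        PLATFORM_URL + INGESTION_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# One illustrative entity; the ODDRN shape shown here is an example, not a spec.
table = {
    "oddrn": "//postgresql/host/db.example.com/databases/shop/schemas/public/tables/orders",
    "name": "orders",
    "type": "TABLE",
}
payload = build_entity_list("//postgresql/host/db.example.com/databases/shop", [table])
request = build_ingest_request(payload, token="example-token")
# urllib.request.urlopen(request) would perform the actual call.
```

Whether the caller is a Collector on a schedule or a Push-client reacting to an event, the request would look the same, which is why every stage after Ingest is producer-agnostic.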

## Deployment topology

| Component                            | What it is                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | What an operator deploys                                                                                                     | Configuration home                                                                                                                           |
| ------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| **Platform (Server)**                | The Spring-Boot application: Ingestion API, Platform API, UI, scheduled jobs (housekeeping, alerting, data-collaboration sender).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | One Platform process plus PostgreSQL.                                                                                        | [`configuration-and-deployment/odd-platform.md`](/configuration-and-deployment/odd-platform.md).                                             |
| **Collector**                        | Container of **pull** adapters plus the runtime around them (adapter launcher, logger, Platform-API client, scheduler). The canonical implementation is [`odd-collector`](https://github.com/opendatadiscovery/odd-collectors/tree/main/odd-collector); cloud-specific siblings are [`odd-collector-aws`](https://github.com/opendatadiscovery/odd-collectors/tree/main/odd-collector-aws), [`odd-collector-azure`](https://github.com/opendatadiscovery/odd-collectors/tree/main/odd-collector-azure), [`odd-collector-gcp`](https://github.com/opendatadiscovery/odd-collectors/tree/main/odd-collector-gcp), and [`odd-collector-profiler`](https://github.com/opendatadiscovery/odd-collector-profiler). | One Collector container per cloud / source-family group, each holding many configured **plugins** (one per source instance). | [`developer-guides/build-and-run/build-and-run-odd-collectors.md`](/developer-guides/build-and-run/build-and-run-odd-collectors.md).         |
| **Push-client (in-process plugin)**  | A push-strategy adapter that runs **inside** the source system's runtime — a dbt plugin, an Airflow plugin, a Great Expectations checkpoint action, a Spark listener, an `odd-cli` invocation.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | Installed alongside the source application; emits metadata on the source's own cadence.                                      | Per-tool repos under [opendatadiscovery on GitHub](https://github.com/opendatadiscovery); see [Integrations](/integrations/integrations.md). |
| **Push-client (standalone gateway)** | A push-strategy adapter that runs as its own service. Source systems push over an externally-defined wire protocol (today: OpenTelemetry/OTLP for [`odd-tracing-gateway`](/integrations/integrations/odd-tracing-gateway.md)); the gateway processes the input and exposes the inferred entities for the Platform / a collector to pull through the standard adapter-contract entities API. | One gateway process per network perimeter that needs aggregated push ingress. | [`integrations/integrations/odd-tracing-gateway.md`](/integrations/integrations/odd-tracing-gateway.md). |
| **UI**                               | Single-page React application served from the Platform process at `/`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | Same process as the Platform — operators do not deploy the UI separately.                                                    | (Configured indirectly through `odd.platform-base-url`; see [Configure ODD Platform](/configuration-and-deployment/odd-platform.md).)        |

Two components are centralised: the Platform (one server, one PostgreSQL) and the UI (served from the same process). The producer side is distributed: every Collector and every Push-client lives in or beside its source system. ODD scales to many sources because the producer side is horizontally distributable while the catalog stays a single coherent surface.

## Cross-cutting concerns

Each concern below gets a landing-level pointer only; the linked pages are the canonical homes with the full operator detail.

* **Authentication.** UI / Platform-API auth (Disabled / Login form / OAUTH2 / LDAP) plus separate Server-to-server (S2S) tokens for programmatic clients plus an independent Ingestion-API filter for producer traffic. See [Enable security](/configuration-and-deployment/enable-security.md) — the three surfaces are deliberately decoupled.
* **Alerting.** Platform-detected (failed jobs, failed DQ tests, schema-incompatible changes, distribution anomalies) and externally-injected (Prometheus AlertManager via `/ingestion/alert/alertmanager`). Dispatch goes to in-app tabs, optional Slack webhook, optional email. See [Active platform features → Alerting](/features/active-platform-features/alerting.md) and [Active platform features → Notifications](/features/active-platform-features/notifications.md), with the operator-side configuration on [Configure ODD Platform → Enable Alert Notifications](/configuration-and-deployment/odd-platform.md#enable-alert-notifications).
* **Lineage.** Cross-system upstream / downstream graphs at entity granularity, plus group lineage for Data Entity Groups (including ML experiments). See [Data Lineage → Data Objects Lineage](/features/data-lineage/data-objects.md) and the [API Reference → Lineage](/developer-guides/api-reference/lineage.md) sub-page.
* **Search.** Free-text plus seven facets — Datasource, Type, Namespace, Owner, Tag, Groups, Statuses. Complemented by the Directory's hierarchy-driven browse. See [Data Discovery](/features/data-discovery.md) and the dedicated [Search and Filtering](/features/data-discovery/search.md) page for the per-facet semantics and the per-result transparency icons.
* **Attachments.** Per-entity files (PNG / PDF / docs) stored either on the local file system or in a remote S3-compatible bucket. The default is **local file system** (`./attachments/`); switch explicitly to `REMOTE` for production deployments. See [Configure ODD Platform → Attachment storage](/configuration-and-deployment/odd-platform.md#attachment-storage) for the operator caveats.
* **Data Collaboration.** Optional Slack-based per-entity discussion threads (full Slack app via OAuth + Events API webhook). Distinct from the alert webhook. See [Active platform features → Data Collaboration](/features/active-platform-features/data-collaboration.md) and [Configure ODD Platform → Enable Data Collaboration](/configuration-and-deployment/odd-platform.md#enable-data-collaboration).
* **GenAI proxy.** Optional thin proxy from the platform to an external AI service the operator runs. The platform itself does not embed an LLM. See [Active platform features → GenAI assistant](/features/active-platform-features/genai.md).
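
As an illustration of the externally-injected path in the Alerting bullet above, the sketch below builds a webhook body in the general shape Prometheus AlertManager sends to webhook receivers. The `/ingestion/alert/alertmanager` path comes from this page; the host, the alert name, the sample ODDRN, and in particular the placement of `entity_oddrn` as an alert label are assumptions to verify against the Alerting page before relying on them.

```python
import json
from datetime import datetime, timezone

# Path taken from this page; the host is a hypothetical local Platform.
WEBHOOK_URL = "http://localhost:8080/ingestion/alert/alertmanager"


def build_alert_payload(entity_oddrn, alert_name, summary, started_at=None):
    """Build an AlertManager-style webhook body for one firing alert."""
    started_at = started_at or datetime.now(timezone.utc)
    return {
        "version": "4",
        "status": "firing",
        "alerts": [
            {
                "status": "firing",
                # Assumption: the ODDRN travels as a label so the platform
                # can attach the alert to the matching catalog entity.
                "labels": {"alertname": alert_name, "entity_oddrn": entity_oddrn},
                "annotations": {"summary": summary},
                "startsAt": started_at.isoformat(),
            }
        ],
    }


payload = build_alert_payload(
    "//postgresql/host/db.example.com/databases/shop/schemas/public/tables/orders",
    alert_name="OrdersTableStale",            # hypothetical alert name
    summary="No new rows for 6 hours",
    started_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
body = json.dumps(payload, indent=2)  # what would be POSTed to WEBHOOK_URL
```

In practice AlertManager generates this body itself; a hand-built payload like this is mainly useful for testing a receiver before wiring up real alert rules.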

## Pull vs Push — when to choose which

Both topologies feed the same catalog through the same Ingestion API. The choice is operational:

* **Pull (Collector)** when the source is a **passive data store** (database, warehouse, BI tool, ML registry, message broker) and you want **point-in-time snapshots on a cadence**. The Collector drives; the source has no awareness of the catalog. Most data-source integrations work this way.
* **Push (Push-client)** when the source is an **already-running application** that you can instrument — Airflow DAGs, dbt runs, Spark jobs, Great Expectations validations, your own services calling `odd-cli` — and you want **per-run lineage and results reported as they happen**. The source drives; latency from event to catalog is bounded by the producer's own emit cadence.
* **Both at once** is normal: a pull Collector indexes the warehouse catalog while an Airflow Push-client reports per-run lineage on top of it.

For the in-the-spec view of push-strategy producers, see the [push model section of the ODD Specification](https://github.com/opendatadiscovery/opendatadiscovery-specification/blob/main/specification/specification.md#push-model).
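
The difference in who drives can be shown with a toy simulation. Every class and method name below (`FakeIngestionApi`, `PullCollector`, `PushClient`, `on_run_finished`) is hypothetical and stands in for real components; the sketch only models the control flow, not the actual ODD client libraries.

```python
from dataclasses import dataclass, field


@dataclass
class FakeIngestionApi:
    """Stand-in for the Ingestion API; records what each producer sent."""
    received: list = field(default_factory=list)

    def ingest(self, producer, entities):
        self.received.append((producer, list(entities)))


class PullCollector:
    """Pull style: the collector drives and snapshots a passive source."""

    def __init__(self, api, read_catalog):
        self.api = api
        self.read_catalog = read_catalog  # callable returning the current state

    def run_once(self):
        # A real collector would invoke this from its scheduler.
        self.api.ingest("collector", self.read_catalog())


class PushClient:
    """Push style: the source application drives and reports each run."""

    def __init__(self, api):
        self.api = api

    def on_run_finished(self, job_name, inputs, outputs):
        run = {"job": job_name, "inputs": inputs, "outputs": outputs}
        self.api.ingest("push-client", [run])


api = FakeIngestionApi()
warehouse_tables = ["orders", "customers"]

# Scheduled snapshot of a passive warehouse catalog.
collector = PullCollector(api, lambda: [{"table": t} for t in warehouse_tables])
collector.run_once()

# Per-run lineage reported by an instrumented application as it happens.
pipeline_hook = PushClient(api)
pipeline_hook.on_run_finished("daily_orders", inputs=["orders"], outputs=["orders_daily"])
```

Both producers end up calling the same `ingest` entry point, mirroring the point that the two topologies feed one catalog and can run side by side.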

## ODDRN

**ODDRN** (Open Data Discovery Resource Name) is the stable string that identifies every entity in the system — a dataset, a column, a data source, a pipeline run. Producers generate an ODDRN for each entity they emit so the platform can recognise the same entity across ingests, across producers, and over time. ODDRN is what makes cross-system lineage possible, what makes idempotent ingests possible, and what gives the AlertManager webhook its `entity_oddrn` routing key.

Operators rarely interact with ODDRNs directly — they become relevant when authoring a custom adapter. See [ODDRN](/introduction/main-concepts.md#oddrn) for the format, examples, and the generator libraries for Python and Java; see [Build a custom collector](/developer-guides/build-and-run/custom-collectors.md) for the end-to-end Python pattern.
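
Why a stable identifier yields idempotent ingests and cross-producer merging can be sketched with an in-memory store keyed by ODDRN. `CatalogStore`, its merge rule, the field names, and the sample ODDRN are hypothetical; the real platform persists to PostgreSQL and applies its own merge semantics.

```python
class CatalogStore:
    """Toy catalog: one record per ODDRN, merged across ingests."""

    def __init__(self):
        self._entities = {}

    def upsert(self, entity):
        """Insert or update the record identified by the entity's ODDRN."""
        oddrn = entity["oddrn"]
        merged = {**self._entities.get(oddrn, {}), **entity}
        self._entities[oddrn] = merged
        return merged

    def get(self, oddrn):
        return self._entities[oddrn]

    def __len__(self):
        return len(self._entities)


ORDERS = "//postgresql/host/db.example.com/databases/shop/schemas/public/tables/orders"

store = CatalogStore()

# A pull Collector reports the table's schema.
store.upsert({"oddrn": ORDERS, "name": "orders", "columns": ["id", "total"]})

# A Push-client later reports run information for the same ODDRN.
store.upsert({"oddrn": ORDERS, "last_run": "daily_orders"})

# Re-sending the first payload changes nothing: the ingest is idempotent.
store.upsert({"oddrn": ORDERS, "name": "orders", "columns": ["id", "total"]})

orders = store.get(ORDERS)
```

After the three calls the store still holds a single entity, carrying both the schema from the Collector and the run information from the Push-client.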

## Where to read the code

The mapping from this overview to the actual code lives in the workspace's navigation domain pages — `navigation/domains/{feature}.md` files maintain controller / service / configuration / UI pointers per feature so a reader does not need to grep. The contributor-facing entry points on the public doc tree are:

* [GitHub organization overview](/developer-guides/github-organization-overview.md) — every ODD repository with a one-line summary.
* [Build and run](/developer-guides/build-and-run.md) — Platform and Collector build / deploy walkthroughs, plus the [Build a custom collector](/developer-guides/build-and-run/custom-collectors.md) developer guide.
* [Main Concepts](/introduction/main-concepts.md) — the producer-side vocabulary (Adapter / Plugin / Collector / Push adapter / Data source) and the Data Governance map (which pillars ODD covers, which are roadmap).
* [API Reference](/developer-guides/api-reference.md) — the canonical hub for every HTTP endpoint, with per-feature sub-pages.

