# Main Concepts

This page introduces the core vocabulary of the Open Data Discovery (ODD) project. It is a **map** — each concept gets a short definition and a link to its canonical deep-dive page.

> **Not the Business Glossary.** ODD Platform ships an in-app **Business Glossary** feature (term entities you can link to datasets, term-to-term relationships, ownership). That is a different thing from this docs page. See the [Business Glossary](/features/data-glossary/business-glossary.md) feature page for the product feature.

## The architecture chain

Metadata flows from data systems into the platform along two paths — **pull** (a collector polls the source) and **push** (an adapter embedded inside the source's runtime emits directly to the platform):

* **Pull path:** Data source ← Pull adapter (wrapped as a **Plugin** inside a **Collector**) → ODD Platform
* **Push path:** Data source's application runtime → **Push adapter** → ODD Platform

The producer-side concepts:

* **Data source** — a system holding data or data-adjacent metadata: a database, a warehouse, a BI tool, an ML training registry, an orchestrator.
* **Adapter** — a set of scripts that map metadata from a source system (PostgreSQL, MySQL, Airflow, Kafka, …) to the ODD specification — Data Entities, data types, lineage edges, quality tests. An adapter's job is **extract-and-map**, nothing more. Adapters come in two flavours: **pull** (reads from the source on a schedule) and **push** (emits from inside the source's runtime). An adapter never runs alone; it is either hosted by a collector (pull) or packaged as a push adapter (push).
* **Plugin** — a *configured* adapter instance inside a collector. One plugin carries one adapter's connection and schedule settings (source host, database, credentials, cadence). A single collector can host many plugins — multiple instances of the same adapter type (e.g., two PostgreSQL plugins pointing at different hosts or databases) or plugins for different adapter types.
* **Collector** — a container of **pull** adapters plus the runtime around them: adapter launcher, logging system, Platform-API client, configuration reader, scheduling. A collector is what you deploy; the pull adapters inside it are the mappers, and each one is configured via a plugin. The canonical implementation is `odd-collector` with 40+ bundled pull adapters; specialist collectors exist for AWS, GCP, Azure, and data profiling. A collector is **not** a synonym for "pull adapter".
* **Push adapter** (also known as **push-client**) — a push-strategy adapter; the source initiates the data flow and the adapter knows the platform's endpoint. Push adapters ship in three deployment shapes:

  * **In-process plugin / extension** — embedded in the source system's own runtime: a dbt plugin, a Great Expectations checkpoint action, an Airflow plugin, a Spark listener. The most common shape today.
  * **Standalone gateway** — a separate service that source systems push to (today's only example: [`odd-tracing-gateway`](/integrations/integrations/odd-tracing-gateway.md), which receives OpenTelemetry traces). From the operator's point of view this is a push integration; on the Platform side, the Platform pulls the inferred entities from the gateway.
  * **Direct SDK / CLI use** — push via a CLI or library call from custom code (`odd-cli` invocation, custom Python using `odd-models-package`).

  All three shapes are extract-and-map adapters; what differs from a pull adapter is deployment topology — the adapter does not live in a collector container.
* **ODD Platform** — the central server: stores the metadata, provides search, lineage, ownership, alerts, DQ dashboards, and the UI.

Pick **pull** when the source is a data store and you want point-in-time snapshots on a cadence; most data-source integrations work this way, since the source is passive and the collector drives. Pick **push** when the source is an application already running code you can instrument (Airflow DAGs, dbt runs, Spark jobs, Great Expectations validations) and you want each run's lineage and results reported as they happen. Some ecosystems combine both: a pull collector indexes the catalog while a push-client reports per-run lineage.
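To make the adapter / plugin / collector relationship concrete, here is a minimal Python sketch of the pull path. It is a toy model for illustration, not the `odd-collector` code: `postgresql_adapter`, `Plugin`, `Collector`, `run_once`, and the settings keys are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# An adapter is modelled as a plain function: connection settings in,
# mapped entities out (extract-and-map, nothing more).
Adapter = Callable[[Dict[str, str]], List[dict]]


def postgresql_adapter(settings: Dict[str, str]) -> List[dict]:
    """Hypothetical pull adapter; a real one would query the source system."""
    return [{"name": f"{settings['database']}.public.ex_table", "type": "TABLE"}]


@dataclass
class Plugin:
    """A configured adapter instance: one adapter plus its settings."""
    name: str
    adapter: Adapter
    settings: Dict[str, str]


@dataclass
class Collector:
    """Hosts many plugins and supplies the runtime around them."""
    plugins: List[Plugin] = field(default_factory=list)

    def run_once(self) -> Dict[str, List[dict]]:
        # A real collector would do this on a schedule and send each
        # result to the Platform; here the results are simply returned.
        return {p.name: p.adapter(p.settings) for p in self.plugins}


# Two plugins of the same adapter type, pointing at different hosts.
collector = Collector(plugins=[
    Plugin("pg_sales", postgresql_adapter, {"host": "10.0.0.5", "database": "sales"}),
    Plugin("pg_hr", postgresql_adapter, {"host": "10.0.0.6", "database": "hr"}),
])
results = collector.run_once()
```

The sketch keeps the adapter free of scheduling and transport concerns, which mirrors the split described above: the collector is what you deploy, and each plugin only carries configuration.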

See [Architecture.md](/introduction/architecture.md) for the diagram, [developer-guides/build-and-run/build-and-run-odd-collectors.md](/developer-guides/build-and-run/build-and-run-odd-collectors.md) for deployment detail, and the [specification's push-model note](https://github.com/opendatadiscovery/opendatadiscovery-specification/blob/main/specification/specification.md#push-model) for protocol-level detail.

## ODDRN

**ODDRN** (**Open Data Discovery Resource Name**) is the unique, stable string that identifies every entity in the system — a dataset, a column, a data source, a pipeline run, a transformer. Producers (collectors, push adapters, custom agents) must generate an ODDRN for each entity they report so the platform can recognise the same entity across ingests, across producers, and over time. ODDRNs are what make cross-system lineage possible.

**Format.** Every ODDRN starts with a double slash and the data-source family, followed by the connection coordinates that uniquely locate the entity in the world — host for self-hosted databases, AWS account ID + region for cloud services, etc. The format follows REST URL conventions:

```
//postgresql/host/1.2.3.4/databases/ex_database/schemas/public/tables/ex_table
```

where:

* `1.2.3.4` — the PostgreSQL instance host
* `ex_database` — the target database
* `public` — the target schema
* `ex_table` — the target table
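The alternating key/value shape of the format can be sketched in a few lines of Python. This is an illustration of the string layout only; `build_oddrn` and `parse_oddrn` are hypothetical helpers, and a real producer would normally rely on the generator libraries rather than hand-rolled string handling.

```python
from typing import List, Tuple


def build_oddrn(family: str, parts: List[Tuple[str, str]]) -> str:
    """Join a data-source family and ordered (key, value) pairs into an ODDRN-shaped string."""
    segments = [family]
    for key, value in parts:
        segments.extend([key, value])
    return "//" + "/".join(segments)


def parse_oddrn(oddrn: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Inverse of build_oddrn; assumes that values contain no '/' characters."""
    if not oddrn.startswith("//"):
        raise ValueError("an ODDRN starts with a double slash")
    family, *rest = oddrn[2:].split("/")
    if len(rest) % 2:
        raise ValueError("expected alternating key/value segments")
    return family, list(zip(rest[0::2], rest[1::2]))


# Schema-level identifier built from the coordinates used in the example above.
schema_oddrn = build_oddrn(
    "postgresql",
    [("host", "1.2.3.4"), ("databases", "ex_database"), ("schemas", "public")],
)
family, coordinates = parse_oddrn(schema_oddrn)
```

Here `schema_oddrn` is `//postgresql/host/1.2.3.4/databases/ex_database/schemas/public`, and parsing it returns the family `postgresql` with the three coordinate pairs in order.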

**Usage.** ODDRNs power the [Ingestion API](#odd-specification) — the same string identifying the same entity across ingests is what lets the platform decide whether to create new entities, update existing ones, or delete obsolete ones on each payload. Operators rarely see ODDRNs directly; they become relevant when writing a custom agent. To assist, ODD ships open-source generator libraries for [Python](https://pypi.org/project/oddrn-generator/) and [Java](https://mvnrepository.com/artifact/org.opendatadiscovery/oddrn-generator-java); the [Build a custom collector](/developer-guides/build-and-run/custom-collectors.md) walkthrough covers the Python pattern end-to-end, including which `Generator` subclass to use per source family.
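The create / update / delete decision can be pictured as a set comparison keyed by ODDRN. The sketch below is a simplified illustration under stated assumptions, not the platform's actual ingestion logic: `plan_ingest` is a hypothetical function, entities are plain dicts, and the comparison ignores any scoping rules the platform applies.

```python
from typing import Dict, List


def plan_ingest(known: Dict[str, dict], incoming: Dict[str, dict]) -> Dict[str, List[str]]:
    """Compare already-known entities with a new payload, both keyed by ODDRN."""
    known_ids = set(known)
    incoming_ids = set(incoming)
    return {
        # ODDRNs seen for the first time
        "create": sorted(incoming_ids - known_ids),
        # ODDRNs seen before whose reported metadata changed
        "update": sorted(o for o in incoming_ids & known_ids if incoming[o] != known[o]),
        # ODDRNs that the producer no longer reports
        "delete": sorted(known_ids - incoming_ids),
    }


base = "//postgresql/host/1.2.3.4/databases"
known = {
    f"{base}/sales": {"name": "sales"},
    f"{base}/hr": {"name": "hr"},
    f"{base}/legacy": {"name": "legacy"},
}
incoming = {
    f"{base}/sales": {"name": "sales"},
    f"{base}/hr": {"name": "hr", "description": "HR data"},
    f"{base}/finance": {"name": "finance"},
}
plan = plan_ingest(known, incoming)
```

In this example `finance` is created, `hr` is updated, `legacy` is deleted, and `sales` is left untouched. The comparison only works because each ODDRN is byte-for-byte identical between ingests.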

**Known limitation.** All producers that call the Ingestion API must use the **same** ODDRN string for the **same** entity. Since ODDRNs encode connection coordinates, agents reporting on the same data infrastructure must agree on hostnames or static IPs. If multiple agents touch the same source, coordinate identifiers across your deployment.
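One way to enforce that agreement is a shared host-alias table that every custom agent applies before building identifiers. The following Python sketch assumes such a convention; `CANONICAL_HOSTS`, `canonical_host`, and `database_oddrn` are hypothetical names, and nothing here is provided by ODD itself.

```python
# Hypothetical alias table shared by all agents in a deployment.
CANONICAL_HOSTS = {
    "10.0.0.5": "db.internal",
    "db": "db.internal",
}


def canonical_host(host: str) -> str:
    """Map any known alias (IP, short name, mixed case) to one agreed hostname."""
    lowered = host.lower()
    return CANONICAL_HOSTS.get(lowered, lowered)


def database_oddrn(host: str, database: str) -> str:
    """Build a database-level ODDRN-shaped string from a normalised host."""
    return f"//postgresql/host/{canonical_host(host)}/databases/{database}"


# Two agents that reach the same server through different addresses.
from_agent_a = database_oddrn("10.0.0.5", "sales")
from_agent_b = database_oddrn("DB.internal", "sales")
```

Without the normalisation step the two agents would emit different strings, and the platform would treat them as two unrelated entities.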

## ODD Specification

The [**ODD Specification**](https://github.com/opendatadiscovery/opendatadiscovery-specification) is the wire contract between producers (collectors, push-clients) and the platform — the Ingestion API schema. It decouples the two sides: any producer that speaks the specification can feed any compliant platform. This is what makes custom agents and third-party collectors possible.
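As a rough picture of what a producer sends, the Python sketch below assembles a small entity-list payload and serialises it to JSON. The field names (`data_source_oddrn`, `items`, `oddrn`, `name`, `type`) and the `DATABASE_SERVICE` type value are assumptions to verify against the specification before use; `make_entity_list` is a hypothetical helper, and no request is actually sent.

```python
import json
from typing import Dict, List


def make_entity_list(data_source_oddrn: str, entities: List[Dict[str, str]]) -> Dict[str, object]:
    """Wrap entity dicts in an envelope that names the data source they belong to."""
    for entity in entities:
        # Every reported entity needs its own ODDRN (see the ODDRN section).
        if not entity.get("oddrn", "").startswith("//"):
            raise ValueError(f"entity without a valid ODDRN: {entity!r}")
    return {"data_source_oddrn": data_source_oddrn, "items": list(entities)}


payload = make_entity_list(
    "//postgresql/host/1.2.3.4",
    [
        {
            "oddrn": "//postgresql/host/1.2.3.4/databases/ex_database",
            "name": "ex_database",
            "type": "DATABASE_SERVICE",  # assumed enum value, check the spec
        }
    ],
)

# The JSON body a producer would send to the platform's ingestion endpoint.
body = json.dumps(payload, indent=2)
```

Because the contract is just this schema, the producer can be a collector, a push adapter, or a few lines of custom code, as long as the payload validates against the specification.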

## Data Governance map

A structured view of how ODD's functionality maps onto recognised data governance pillars. Use this to answer "does ODD do X?" for your governance framework.

* **Data Discovery** — *available.* The core of the platform: catalog search with multiple facets, entity pages (datasets, transformers, consumers, quality tests, ML models), tags, ownership, and the Directory view. See the [Data Discovery](/features/data-discovery.md) pillar landing for the four entry paths (Search, Directory, Tagging, Data Entity Groups & Domains) and the Catalog Overview home page.
* **Data Lineage** — *available.* Upstream and downstream lineage across the full entity model, not just datasets — pipelines, ML experiments, and quality tests all participate, plus microservices traced through OpenTelemetry. See the [Data Lineage](/features/data-lineage.md) pillar landing.
* **Data Quality** — *available.* Per-entity test results surfaced on entity pages, the catalog-wide Data Quality dashboard, and operator-set Minor / Major / Critical SLA statuses. See the [Data Quality](/features/data-quality.md) pillar landing and [Visibility for Data Quality Engineer](/use-cases/use-cases/dq-visibility.md).
* **Data Modelling** — *partially available.* [Data Entity Groups](/features/data-discovery/groups-domains.md) (DEGs) provide logical grouping, and entity relationship / ERD views are available today. Schema evolution signals (triggered by backwards-incompatible changes) are surfaced in alerts. See [Dataset schema diff](/features/data-discovery/schema-diff.md) and the [Data Modelling](/features/data-modelling.md) pillar.
* **Data Glossary** — *available.* The in-app **Business Glossary** feature — term entities with term-to-term and term-to-data-entity linking, ownership, tags. Distinct from this Main Concepts page: Business Glossary is a product feature, Main Concepts is documentation. See the [Data Glossary](/features/data-glossary.md) pillar landing and the [Business Glossary](/features/data-glossary/business-glossary.md) reference.
* **Master Data Management (incl. Reference Data Management)** — *partially available.* **Lookup Tables** provide operator-managed reference data as first-class entities in the catalog. Full MDM semantics (golden records, survivorship rules, stewardship workflows) are not part of ODD today — what ships is reference-data management. See the [Master Data Management](/features/master-data-management.md) pillar landing and the [Lookup Tables](/features/master-data-management/lookup-tables.md) feature page.
* **Data Cost** — *roadmap.* Cost attribution to datasets, pipelines, and owners is not implemented today.
* **Data Security (governance-level)** — *roadmap.* Data classification, sensitivity tagging, PII/PHI handling, and fine-grained data-access control sit on the roadmap. **This is different from platform-access security** (who can log in, what roles they have, what policies apply to the UI/API) — that is already shipped and documented under [configuration-and-deployment/enable-security/README.md](/configuration-and-deployment/enable-security.md).

### Pillar differentiation

The six available / partially-available pillars are conceptually distinct because each captures a different *operator workflow*:

* **Data Discovery** is **location-oriented** — finding existing entities by search, browse, or home-page surfacing. Entities come from collectors and push adapters; this pillar provides the navigation paths into the catalog.
* **Data Modelling** is **contract-oriented** — describing how a dataset is queried ([Query Examples](/features/data-modelling/query-examples.md)) and connected ([Relationships / ERDs](/features/data-modelling/relationships.md)). The dataset itself comes from outside; the platform records intent and structure on top.
* **Master Data Management** is **operator-curated reference data** — the canonical lookup tables managed inside the platform. There is no external source; the platform is the system of record.
* **Data Lineage** is **connection-oriented** — describing how entities flow into and out of each other across pipelines and microservices. The lineage is the cross-pillar record because every entity has a structure, a meaning, a location, a quality signal, *and* a lineage.
* **Data Glossary** is **meaning-oriented** — naming and describing the concepts the data represents. Terms are first-class catalog entities with their own lifecycle, ownership, RBAC, and search surface; not metadata attached to other entities.
* **Data Quality** is **correctness-oriented** — test results, anomaly classes, dataset SLAs. Every catalogued dataset has a quality story, even if it is only "no checks defined".

That difference shows up in where the data lives: Data Modelling artefacts attach to existing entities; Master Data artefacts *are* entities (Lookup Tables exist as Data Entities of type `LOOKUP_TABLE`); Data Quality results are pushed in by external frameworks; Lineage edges are computed from the connection graph. The six pillars sit alongside each other in the Data Governance map above, not nested.

## AI aspects

ODD integrates AI/GenAI capabilities in a few places:

* **GenAI assistant** — opt-in proxy from a single platform endpoint to an external AI service the operator runs (the platform does not embed an LLM). API-only today. See the [GenAI assistant](/features/active-platform-features/genai.md) page for configuration, the external service contract, and operator caveats.
* **Data profiling** — automatic statistical profiles for datasets (null ratios, distributions, cardinality) via `odd-collector-profiler`. Surfaces on entity pages.
* **ML experiment / model lineage** — experiments and trained models are first-class entities with their own lineage edges; useful for reproducibility and governance of ML pipelines.
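For readers unfamiliar with the profiling statistics named above, the Python sketch below computes a null ratio and a cardinality for one column. It only illustrates what those numbers mean; `profile_column` is a hypothetical function and does not reflect how `odd-collector-profiler` is implemented.

```python
from typing import Dict, List, Optional


def profile_column(values: List[Optional[str]]) -> Dict[str, float]:
    """Compute simple per-column statistics; None stands for a NULL value."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    return {
        "row_count": total,
        # Share of rows with no value; 0.0 for an empty column.
        "null_ratio": (total - len(non_null)) / total if total else 0.0,
        # Number of distinct non-null values.
        "cardinality": len(set(non_null)),
    }


profile = profile_column(["DE", "FR", None, "DE", None, "US", "FR", "DE"])
```

For this eight-row column the null ratio is 0.25 and the cardinality is 3.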

## Terms & Aliases

A living record of synonyms and aliases users may search for. If you know a feature by a different name, start here.

| Canonical term                        | Also known as                                                             | What it is                                                                                                                                                                                                                                                                                                                                                                                                                              | Details                                                                                                                                    |
| ------------------------------------- | ------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| Server-to-server (S2S) authentication | Machine-to-machine (M2M) tokens, M2M auth                                 | Static API-key authentication for programmatic clients                                                                                                                                                                                                                                                                                                                                                                                  | [s2s.md](/configuration-and-deployment/enable-security/authentication/s2s.md)                                                              |
| Ingestion authentication filter       | Ingestion filter, ingestion API key                                       | Token-based auth for `/ingestion/**` — independent of UI auth, off by default                                                                                                                                                                                                                                                                                                                                                           | Ingestion authentication in [Enable security](/configuration-and-deployment/enable-security.md)                                            |
| Collector secrets backend             | Alternative secrets backend                                               | Store collector credentials in an external secret store (AWS SSM) instead of YAML                                                                                                                                                                                                                                                                                                                                                       | [collectors-secrets-backend.md](/configuration-and-deployment/collectors-secrets-backend.md)                                               |
| ODDRN                                 | Open Data Discovery Resource Name                                         | Stable string identifying every entity in the system                                                                                                                                                                                                                                                                                                                                                                                    | [ODDRN section above](#oddrn)                                                                                                              |
| Business Glossary (feature)           | Glossary, Terms                                                           | In-app feature for managing term entities and linking them to datasets — not this Main Concepts page                                                                                                                                                                                                                                                                                                                                    | [Business Glossary](/features/data-glossary/business-glossary.md) under [Data Glossary](/features/data-glossary.md)                        |
| Data Entity Group                     | DEG                                                                       | Logical grouping of data entities inside the catalog                                                                                                                                                                                                                                                                                                                                                                                    | [Data Entity Groups & Domains](/features/data-discovery/groups-domains.md) under [Data Discovery](/features/data-discovery.md)             |
| ML Experiments                        | ML Experiment Logging (deprecated)                                        | A Data Entity Group collecting the entities produced by one training run — inputs, jobs, models, artifacts. Catalog view, not a metrics tracker.                                                                                                                                                                                                                                                                                        | [Data Entity Groups & Domains → Relationship to ML Experiments](/features/data-discovery/groups-domains.md#relationship-to-ml-experiments) |
| ODD Specification                     | Ingestion API spec, ingress API                                           | Wire contract between producers and the platform                                                                                                                                                                                                                                                                                                                                                                                        | [opendatadiscovery-specification](https://github.com/opendatadiscovery/opendatadiscovery-specification)                                    |
| Integration                           | —                                                                         | Umbrella term for any path metadata takes from a source into the Platform — collectors (pull) and push adapters (push). Prefer in user-facing prose unless direction (pull/push) matters.                                                                                                                                                                                                                                               | [Integrations hub](/integrations/integrations.md)                                                                                          |
| Adapter                               | —                                                                         | Source→spec mapper (push or pull); extract-and-map only, never runs alone. Classified by **strategy** (pull / push) and **deployment shape**.                                                                                                                                                                                                                                                                                           | [The architecture chain](#the-architecture-chain)                                                                                          |
| Pull adapter                          | —                                                                         | Pull-strategy adapter — reads from the source on a cadence; the adapter knows the source endpoint and credentials. Today always paired with the collector-hosted deployment shape (configured via a plugin).                                                                                                                                                                                                                            | [The architecture chain](#the-architecture-chain)                                                                                          |
| Plugin                                | Adapter instance, adapter config                                          | A configured pull-adapter instance inside a collector. Push adapters do not use the plugin term.                                                                                                                                                                                                                                                                                                                                        | [The architecture chain](#the-architecture-chain)                                                                                          |
| Collector                             | Pull-adapter container (informal)                                         | Container of pull adapters + runtime; the collector-hosted deployment shape. **Not** a synonym for "pull adapter".                                                                                                                                                                                                                                                                                                                      | [The architecture chain](#the-architecture-chain)                                                                                          |
| Push adapter                          | Push-client (client-server framing)                                       | Push-strategy adapter — the source initiates the data flow. Three deployment shapes: **in-process plugin / extension** (dbt, GE, Airflow, Spark), **standalone gateway** (`odd-tracing-gateway`), **direct SDK / CLI use** (`odd-cli`). Used when the discussion is about extract-and-map mechanics.                                                                                                                                    | [The architecture chain](#the-architecture-chain)                                                                                          |
| Push-client                           | Push adapter (extract-and-map framing)                                    | Same component as **Push adapter**, framed from client-server topology — a producer-side client of the Platform server using push strategy. Used when the discussion is about deployment topology or network position.                                                                                                                                                                                                                  | [Architecture.md](/introduction/architecture.md)                                                                                           |
| Standalone gateway                    | Push-adapter standalone shape, OTel gateway, tracing gateway              | A push-adapter deployment shape: a separate service that source systems push to over an externally-defined wire protocol (today: OpenTelemetry/OTLP), with the Platform pulling the inferred entities. Today's only example is [`odd-tracing-gateway`](/integrations/integrations/odd-tracing-gateway.md). Distinct from in-process plugins (which live inside a source tool's runtime) and from collectors (which host pull adapters). | [`integrations/auxiliary/odd-tracing-gateway.md`](/integrations/integrations/odd-tracing-gateway.md)                                       |
| Catalog Overview page                 | Overview page, Main page, Data Entity Report (deprecated as page synonym) | The catalog's home page — main search, top tags, domains, the per-class **Entities** report, directory, and (when auth is on) owner association. Distinct from a data entity's own **Overview tab**, which is the per-entity landing view inside a detail page.                                                                                                                                                                         | [Data Discovery](/features/data-discovery.md) (the bucket landing the home page surfaces)                                                  |
| Master Data Management                | MDM, Reference Data Management, Reference Data                            | The Data Governance pillar covering operator-curated reference data managed inside the platform. ODD ships the Reference-Data subset (Lookup Tables); golden records / survivorship / stewardship workflows are not part of ODD today.                                                                                                                                                                                                  | [master-data-management.md](/features/master-data-management.md)                                                                           |
| Lookup Tables                         | Reference tables, Master Data tables                                      | Operator-curated reference tables managed inside the platform — schema, data, RBAC, API surface. Exposed in the catalog as Data Entities of type `LOOKUP_TABLE`. UI section: **Master Data** top-level tab.                                                                                                                                                                                                                             | [lookup-tables.md](/features/master-data-management/lookup-tables.md)                                                                      |
| Slack alert webhook                   | Slack notifications, Slack incoming webhook                               | Outgoing-only HTTP POST of alert messages into a Slack channel via `notifications.receivers.slack.url`. One-way write — no thread state, no replies read back. **Distinct** from the Slack collaboration app. Consumer: `SlackNotificationSender` (gated by `@ConditionalOnProperty(name = "notifications.receivers.slack.url")`).                                                                                                      | [Enable Alert Notifications](/configuration-and-deployment/odd-platform.md#enable-alert-notifications)                                     |
| Slack collaboration app               | Slack Events API, Slack OAuth integration, Data Collaboration Slack       | Full Slack app for in-app per-entity discussion threads — OAuth (`datacollaboration.slack-oauth-token`) plus the [Slack Events API](https://docs.slack.dev/apis/events-api/) webhook to read replies back into the platform; bidirectional. **Distinct** from the Slack alert webhook. Routes gated by `@ConditionalOnDataCollaboration` (returns `404 Not Found` when `datacollaboration.enabled=false`).                              | [Enable Data Collaboration](/configuration-and-deployment/odd-platform.md#enable-data-collaboration)                                       |

New aliases get added as they're discovered. If you notice a term that is missing or ambiguous, open an issue or a PR — the goal is that searching any common name lands you on the right page.

