Overview

Hub for every way metadata reaches the ODD Platform — pull adapters (collector-hosted), push adapters (in-process plugins, standalone gateways, direct SDK use).

An integration is any path metadata takes from a source system into the ODD Platform. ODD ships two strategies — pull (a collector polls the source on a schedule) and push (a push adapter lives inside or alongside the source and emits as the source runs). Pick by where the work happens: pull when the source is a passive data store you want snapshotted on a cadence, push when the source is an application or a stream you want reporting per-run lineage and results in real time.

Push adapters ship in three deployment shapes:

  • In-process plugin / extension — the adapter is embedded inside the source tool's own runtime (odd-airflow-2, odd-dbt, odd-spark-adapter, odd-great-expectations). Operators install the adapter into the existing source application.

  • Standalone gateway — the adapter is its own service that source systems push to over an externally-defined wire protocol (today: odd-tracing-gateway over OpenTelemetry/OTLP). Operators deploy the gateway as a separate process and point their existing observability pipeline at it.

  • Direct SDK / CLI use — the adapter is invoked as a CLI or a library call from custom code (odd-cli).

Pull vs push at a glance

| Integration | Strategy | Deployment shape | What it integrates |
| --- | --- | --- | --- |
| odd-collector | pull | collector-hosted | 41 generic adapters: databases, BI tools, streams, MLOps |
| odd-collector-aws | pull | collector-hosted | 11 AWS adapters: Glue, S3, Athena, Kinesis, SageMaker, … |
| odd-collector-azure | pull | collector-hosted | 4 Azure adapters: PowerBI, Azure SQL, Blob Storage, Data Factory |
| odd-collector-gcp | pull | collector-hosted | 4 GCP adapters: BigQuery, BigTable, GCS, GCS Delta |
| odd-collector-profiler | pull | collector-hosted | Statistical data profiling for Postgres / Azure SQL |
| odd-airflow-2 | push | in-process plugin | Airflow DAG / task / lineage metadata via a Listener |
| odd-dbt | push | in-process plugin | dbt model lineage and test results |
| odd-spark-adapter | push | in-process plugin | Spark job lineage (RDD, JDBC, Kafka batch, Snowflake, S3 Delta) |
| odd-great-expectations | push | in-process plugin | Great Expectations checkpoint results |
| odd-cli | push | direct SDK / CLI | Local files and ad-hoc dataset metadata |
| odd-tracing-gateway | push | standalone gateway | Microservice identities and dependencies inferred from OpenTelemetry traces (HTTP, JDBC, Kafka, gRPC, AWS) |

The same vocabulary appears in Main Concepts: a collector is the deployable container for pull adapters; a push adapter runs inside the source's runtime, beside it as a standalone gateway, or as a direct SDK / CLI call; a plugin is one configured pull-adapter instance inside a collector. "Pull adapter" is not a synonym for "collector": pull adapters live inside collectors, and one collector can host several of them.

Which integration do I need?

  • A database, data warehouse, or BI tool (PostgreSQL, MySQL, Snowflake, Redshift, Tableau, …) → odd-collector.

  • An AWS service (Glue, S3, Athena, Kinesis, …) → odd-collector-aws.

  • An Azure service (PowerBI, Azure SQL, Blob Storage, Data Factory) → odd-collector-azure.

  • A GCP service (BigQuery, GCS, BigTable) → odd-collector-gcp.

  • Dataset statistics / profiling for Postgres or Azure SQL → odd-collector-profiler.

  • An Airflow scheduler running DAGs you want lineage for → odd-airflow-2.

  • dbt models and tests you want surfaced in the catalog → odd-dbt.

  • Spark jobs you want lineage from → odd-spark-adapter.

  • Great Expectations quality results → odd-great-expectations.

  • Local CSV / Parquet files, or an ad-hoc push from a script or CI step → odd-cli.

  • Microservices instrumented with OpenTelemetry — identities, HTTP / JDBC / Kafka / gRPC / AWS-SDK dependencies inferred from distributed traces → odd-tracing-gateway. Reach for this when your stack already collects OpenTelemetry traces and you want the catalog to also reflect the microservices and the dependencies your existing observability pipeline already sees.

A single deployment commonly mixes strategies and shapes — e.g., one odd-collector container ingesting your warehouses on a schedule, odd-airflow-2 reporting DAG-level lineage as the orchestrator runs, and odd-tracing-gateway populating microservice identities from your OpenTelemetry pipeline. The platform is the same on the receiving end; pick per source.

Common configuration (collectors)

All collectors share the same top-level configuration schema, defined once in the SDK. The full reference, with every field, lives in Build and run ODD Collectors → Full configuration reference; the abridged shape is:
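As a rough sketch of that shape, using only the top-level fields named on this page (platform_host_url, token, default_pulling_interval, plugins): the host, database, user, and plugin name below are hypothetical placeholders, and the PostgreSQL connection keys are assumptions rather than a copy of the SDK schema. The full configuration reference remains authoritative.

```yaml
# collector_config.yaml (abridged sketch; every value is a placeholder)
platform_host_url: "http://localhost:8080"   # assumed address of the ODD Platform
token: "<COLLECTOR_TOKEN>"                   # issued under Management → Collectors
default_pulling_interval: 10                 # assumed value; see the full reference for the unit
plugins:
  - type: postgresql             # adapter type
    name: postgresql_main        # hypothetical name; must be unique within the collector
    host: db.example.internal    # hypothetical connection settings (assumed key names)
    port: 5432
    database: analytics
    user: odd_reader
    password: "<PASSWORD>"
```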

Push adapters are configured separately by their host tool (Airflow Connection, Spark configs, dbt env vars, GE action block) — they do not consume collector_config.yaml. See each push-adapter page for the per-tool configuration.

One collector hosts many plugins

A single collector instance — one container, one process — hosts as many plugins as you list in plugins:. Plugins can mix adapter types, and you can add multiple plugins of the same type to ingest from several sources of the same kind (three PostgreSQL databases on different hosts, two S3 buckets in different accounts, …). Each plugin needs a unique name; that's the discriminator the collector uses in logs and metrics.

Running two or more plugins of the same type is a routine deployment pattern: one container can cover your full pull-side surface, so you do not need one container per source.
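A minimal sketch of that pattern, assuming two PostgreSQL databases on different hosts. The hosts, databases, credentials, and plugin names are hypothetical, and the connection key names are assumptions; the point is only that both entries share a type and differ by name.

```yaml
# plugins: block of collector_config.yaml (sketch; all values are placeholders)
plugins:
  - type: postgresql
    name: postgresql_orders        # unique name, shown in collector logs and metrics
    host: orders-db.example.internal
    port: 5432
    database: orders
    user: odd_reader
    password: "<PASSWORD_ORDERS>"
  - type: postgresql               # same adapter type, second source
    name: postgresql_billing       # must differ from the first plugin's name
    host: billing-db.example.internal
    port: 5432
    database: billing
    user: odd_reader
    password: "<PASSWORD_BILLING>"
  # plugins of other adapter types can be listed in the same block
```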

Beyond connection settings: per-adapter features

Many pull adapters expose features that go past "connect and read schema". Two of the most-used ones are surfaced once here so you know to look for them on individual adapter pages:

  • Ingestion filters: schemas_filter (PostgreSQL, Snowflake), filename_filter (S3, Azure Blob Storage, GCS), datasets_filter (BigQuery), pipeline_filter (Azure Data Factory) and similar. Each takes regex include / exclude lists. When omitted, the default is "include everything", i.e. the adapter ingests every schema / file / dataset it can see. Use filters to scope a plugin to the slice you actually want catalogued. See the dedicated Ingestion filters page for the per-key shape, the include / exclude interaction rule, and a worked PostgreSQL example.

  • Foreign-key (ERD) relationships — PostgreSQL and Snowflake plugins emit ENTITY_RELATIONSHIP entities for tables connected by foreign keys (cross-schema relations included). The platform renders these as ERD diagrams on the dataset detail page. Other adapters do not currently extract foreign-key relationships.
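To make the ingestion-filter bullet concrete, here is a hedged sketch of a schemas_filter on a PostgreSQL plugin. The include / exclude key names are assumed from the "regex include / exclude lists" description above, and the patterns and plugin name are hypothetical; the Ingestion filters page defines the exact shape and how the two lists interact.

```yaml
# One plugin entry with a schema filter (sketch; key names under schemas_filter are assumed)
plugins:
  - type: postgresql
    name: postgresql_sales         # hypothetical plugin name
    # connection settings omitted for brevity
    schemas_filter:
      include: ["^sales_.*"]       # regex patterns for schemas to ingest
      exclude: [".*_tmp$"]         # regex patterns for schemas to skip
```

Leaving schemas_filter out entirely falls back to the default described above: every schema the adapter can see is ingested.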

The full per-adapter capability matrix (which adapters support filters, which support ERD, which have additional knobs like dataset partitioning) lives on the per-collector pages.

Secrets backend (optional)

Any field in collector_config.yaml can be sourced from AWS SSM Parameter Store instead of inline YAML — see Collector secrets backend. Only odd-collector (the generic collector) ships with a Secrets Backend hook today; the cloud and profiler collectors read configuration from YAML and environment variables.

Integration Wizard (in-app UI)

To shorten the path from "I picked an integration" to "I have a working collector_config.yaml snippet", the platform ships an Integration Wizard under Management → Integrations. The wizard is data-driven by manifests on the platform's classpath (META-INF/wizard/*.yaml) and exposes the same set through GET /api/integrations / GET /api/integrations/{integration_id}. For each integration the wizard shows a description, walks the operator through prerequisites, and renders a parameterised YAML snippet — fill in host / port / credentials and copy the result into the plugins: block of your collector_config.yaml.

The wizard is a starting point, not a replacement for collector_config.yaml: it generates one plugin's worth of YAML, not the full file. Operators still hand-author platform_host_url, token, default_pulling_interval, additional plugins, filters, and any secrets-backend references. For the per-card flow, the static-parameter substitution context (today only platform_url, resolved from odd.platform-base-url), and the API surface, see Integration Wizard.

Token and datasource registration

Integrations, pull or push, authenticate to the platform with a collector token issued by the ODD Platform; the one exception listed below is odd-spark-adapter, which has no static token and identifies itself by its oddrn.key. The token is created in the UI under Management → Collectors (see Try locally → Create Collector entity for the step-by-step). The same flow issues the token for pull and push integrations alike; what differs is how each integration supplies it:

| Integration | How the token is supplied |
| --- | --- |
| odd-collector* (all pull collectors) | token: <COLLECTOR_TOKEN> field in collector_config.yaml, or TOKEN environment variable |
| odd-airflow-2 | Airflow Connection named odd, password field |
| odd-dbt | ODD_PLATFORM_TOKEN env var (or --platform-token flag) |
| odd-spark-adapter | spark.odd.host.url / spark.odd.oddrn.key Spark configuration (no static token; the JAR identifies itself by oddrn.key) |
| odd-great-expectations | platform_token field in the ODDAction block |
| odd-cli | ODD_PLATFORM_TOKEN env var |

On the platform side, every integration registers its data sources via POST /ingestion/datasources, which the platform exposes as part of the Ingress API. Pull collectors call this from their SDK; push adapters call it (or rely on the platform recognising entity ODDRNs implicitly on first push) per the ODD Specification. The endpoint is unauthenticated by default — see Enable security → Ingestion authentication for the production posture.
