Overview
Hub for every way metadata reaches the ODD Platform — pull adapters (collector-hosted), push adapters (in-process plugins, standalone gateways, direct SDK use).
An integration is any path metadata takes from a source system into the ODD Platform. ODD ships two strategies — pull (a collector polls the source on a schedule) and push (a push adapter lives inside or alongside the source and emits as the source runs). Pick by where the work happens: pull when the source is a passive data store you want snapshotted on a cadence, push when the source is an application or a stream you want reporting per-run lineage and results in real time.
Push adapters ship in three deployment shapes:
In-process plugin / extension: the adapter is embedded inside the source tool's own runtime (odd-airflow-2, odd-dbt, odd-spark-adapter, odd-great-expectations). Operators install the adapter into the existing source application.
Standalone gateway: the adapter is its own service that source systems push to over an externally defined wire protocol (today: odd-tracing-gateway over OpenTelemetry/OTLP). Operators deploy the gateway as a separate process and point their existing observability pipeline at it.
Direct SDK / CLI use: the adapter is invoked as a CLI or a library call from custom code (odd-cli).
Pull vs push at a glance
| Integration | Strategy | Deployment shape | Covers |
| --- | --- | --- | --- |
| odd-collector | pull | collector-hosted | 41 generic adapters: databases, BI tools, streams, MLOps |
| odd-collector-aws | pull | collector-hosted | 11 AWS adapters: Glue, S3, Athena, Kinesis, SageMaker, … |
| odd-collector-azure | pull | collector-hosted | 4 Azure adapters: PowerBI, Azure SQL, Blob Storage, Data Factory |
| odd-collector-gcp | pull | collector-hosted | 4 GCP adapters: BigQuery, BigTable, GCS, GCS Delta |
| odd-collector-profiler | pull | collector-hosted | Statistical data profiling for Postgres / Azure SQL |
| odd-airflow-2 | push | in-process plugin | Airflow DAG / task / lineage metadata via a Listener |
| odd-spark-adapter | push | in-process plugin | Spark job lineage (RDD, JDBC, Kafka batch, Snowflake, S3 Delta) |
| odd-great-expectations | push | in-process plugin | Great Expectations checkpoint results |
| odd-tracing-gateway | push | standalone gateway | Microservice identities and dependencies inferred from OpenTelemetry traces (HTTP, JDBC, Kafka, gRPC, AWS) |
The same vocabulary appears in Main Concepts: a collector is the deployable container for pull adapters; a push adapter runs inside the source's runtime, beside it as a standalone gateway, or as a direct SDK / CLI call; a plugin is one configured pull-adapter instance inside a collector. "Pull adapter" is not a synonym for "collector": pull adapters live inside collectors, and one collector can host several.
Which integration do I need?
A database, data warehouse, or BI tool (PostgreSQL, MySQL, Snowflake, Redshift, Tableau, …) → odd-collector.
An AWS service (Glue, S3, Athena, Kinesis, …) → odd-collector-aws.
An Azure service (PowerBI, Azure SQL, Blob Storage, Data Factory) → odd-collector-azure.
A GCP service (BigQuery, GCS, BigTable) → odd-collector-gcp.
Dataset statistics / profiling for Postgres or Azure SQL → odd-collector-profiler.
An Airflow scheduler running DAGs you want lineage for → odd-airflow-2.
dbt models and tests you want surfaced in the catalog → odd-dbt.
Spark jobs you want lineage from → odd-spark-adapter.
Great Expectations quality results → odd-great-expectations.
Local CSV / Parquet files, or an ad-hoc push from a script or CI step → odd-cli.
Microservices instrumented with OpenTelemetry (identities and HTTP / JDBC / Kafka / gRPC / AWS-SDK dependencies inferred from distributed traces) → odd-tracing-gateway. Reach for this when your stack already collects OpenTelemetry traces and you want the catalog to also reflect the microservices and the dependencies your existing observability pipeline already sees.
A single deployment commonly mixes strategies and shapes — e.g., one odd-collector container ingesting your warehouses on a schedule, odd-airflow-2 reporting DAG-level lineage as the orchestrator runs, and odd-tracing-gateway populating microservice identities from your OpenTelemetry pipeline. The platform is the same on the receiving end; pick per source.
Common configuration (collectors)
All collectors share the same top-level configuration schema, defined once in the SDK. The full reference, with every field, lives in Build and run ODD Collectors → Full configuration reference; the abridged shape is:
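A minimal sketch of that shape follows. The top-level keys platform_host_url, token, default_pulling_interval and plugins, and the per-plugin name, are the ones named on this page. The type key, the connection fields, the interval unit and every value are illustrative assumptions, so treat the full configuration reference as the authoritative schema.

```yaml
# collector_config.yaml (abridged sketch; all values are placeholders)
platform_host_url: "http://localhost:8080"  # ODD Platform base URL (assumed local address)
token: "<COLLECTOR_TOKEN>"                   # collector token issued by the platform
default_pulling_interval: 10                 # polling cadence; unit assumed to be minutes
plugins:
  - type: postgresql            # adapter type (key name is an assumption)
    name: warehouse_postgres    # unique plugin name, used in logs and metrics
    host: "db.example.internal" # hypothetical connection settings below
    port: 5432
    database: "analytics"
    user: "odd_reader"
    password: "<PASSWORD>"
```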
Push adapters are configured separately by their host tool (Airflow Connection, Spark configs, dbt env vars, GE action block) — they do not consume collector_config.yaml. See each push-adapter page for the per-tool configuration.
One collector hosts many plugins
A single collector instance — one container, one process — hosts as many plugins as you list in plugins:. Plugins can mix adapter types, and you can add multiple plugins of the same type to ingest from several sources of the same kind (three PostgreSQL databases on different hosts, two S3 buckets in different accounts, …). Each plugin needs a unique name; that's the discriminator the collector uses in logs and metrics.
Running two or more plugins of the same type is a routine deployment pattern: one container scales to your full pull-side surface, so you don't need one container per source.
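As a hedged illustration of one collector hosting several plugins, the plugins: block below mixes two PostgreSQL sources on different hosts with one MySQL source. The type key, the connection field names, the hosts and the plugin names are assumptions made for the example; only plugins: and the requirement of a unique name per plugin come from this page.

```yaml
# plugins: block of collector_config.yaml (sketch; field names and values are illustrative)
plugins:
  - type: postgresql
    name: postgres_orders        # unique name distinguishes this plugin in logs and metrics
    host: "orders-db.example.internal"
    port: 5432
    database: "orders"
    user: "odd_reader"
    password: "<PASSWORD>"
  - type: postgresql             # same adapter type, different source
    name: postgres_billing
    host: "billing-db.example.internal"
    port: 5432
    database: "billing"
    user: "odd_reader"
    password: "<PASSWORD>"
  - type: mysql                  # a different adapter type in the same collector
    name: mysql_crm
    host: "crm-db.example.internal"
    port: 3306
    database: "crm"
    user: "odd_reader"
    password: "<PASSWORD>"
```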
Beyond connection settings: per-adapter features
Many pull adapters expose features that go past "connect and read schema". Two of the most-used ones are surfaced once here so you know to look for them on individual adapter pages:
Ingestion filters: schemas_filter (PostgreSQL, Snowflake), filename_filter (S3, Azure Blob Storage, GCS), datasets_filter (BigQuery), pipeline_filter (Azure Data Factory) and similar. Each takes regex include / exclude lists. When omitted, the default is "include everything", i.e. the adapter ingests every schema / file / dataset it can see. Use filters to scope a plugin to the slice you actually want catalogued. See the dedicated Ingestion filters page for the per-key shape, the include / exclude interaction rule, and a worked PostgreSQL example.
Foreign-key (ERD) relationships: PostgreSQL and Snowflake plugins emit ENTITY_RELATIONSHIP entities for tables connected by foreign keys (cross-schema relations included). The platform renders these as ERD diagrams on the dataset detail page. Other adapters do not currently extract foreign-key relationships.
The full per-adapter capability matrix (which adapters support filters, which support ERD, which have additional knobs like dataset partitioning) lives on the per-collector pages.
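To make the ingestion-filter idea concrete, here is a rough sketch of schemas_filter on a PostgreSQL plugin. It assumes include and exclude are list-valued keys nested under schemas_filter; the regexes, plugin name and connection fields are hypothetical, and the exact shape and the include / exclude interaction rule are defined on the Ingestion filters page, not here.

```yaml
# One plugin entry scoped with an ingestion filter (sketch; shape assumed from the description above)
plugins:
  - type: postgresql
    name: warehouse_postgres
    host: "db.example.internal"
    port: 5432
    database: "analytics"
    user: "odd_reader"
    password: "<PASSWORD>"
    schemas_filter:
      include:             # regexes naming the schemas to consider
        - "^public$"
        - "^sales_.*"
      exclude:             # regexes naming the schemas to skip
        - "^sales_tmp.*"
```

Omitting schemas_filter entirely falls back to the default described above: every schema the adapter can see is ingested.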
Secrets backend (optional)
Any field in collector_config.yaml can be sourced from AWS SSM Parameter Store instead of inline YAML — see Collector secrets backend. Only odd-collector (the generic collector) ships with a Secrets Backend hook today; the cloud and profiler collectors read configuration from YAML and environment variables.
Integration Wizard (in-app UI)
To shorten the path from "I picked an integration" to "I have a working collector_config.yaml snippet", the platform ships an Integration Wizard under Management → Integrations. The wizard is data-driven by manifests on the platform's classpath (META-INF/wizard/*.yaml) and exposes the same set through GET /api/integrations / GET /api/integrations/{integration_id}. For each integration the wizard shows a description, walks the operator through prerequisites, and renders a parameterised YAML snippet — fill in host / port / credentials and copy the result into the plugins: block of your collector_config.yaml.
The wizard is a starting point, not a replacement for collector_config.yaml: it generates one plugin's worth of YAML, not the full file. Operators still hand-author platform_host_url, token, default_pulling_interval, additional plugins, filters, and any secrets-backend references. For the per-card flow, the static-parameter substitution context (today only platform_url, resolved from odd.platform-base-url), and the API surface, see Integration Wizard.
Token and datasource registration
Every integration — pull or push — authenticates to the platform with a collector token issued by the ODD Platform. The token is created in the UI under Management → Collectors (see Try locally → Create Collector entity for the step-by-step). The same flow issues the token regardless of whether the integration that consumes it is pull or push; what differs is how each integration consumes the token:
| Integration | How it consumes the token |
| --- | --- |
| odd-collector* (all pull collectors) | token: <COLLECTOR_TOKEN> field in collector_config.yaml, or TOKEN environment variable |
| odd-airflow-2 | Airflow Connection named odd, password field |
| odd-dbt | ODD_PLATFORM_TOKEN env var (or --platform-token flag) |
| odd-spark-adapter | spark.odd.host.url / spark.odd.oddrn.key Spark configuration (no static token; the JAR identifies itself by oddrn.key) |
| odd-great-expectations | platform_token field in the ODDAction block |
| odd-cli | ODD_PLATFORM_TOKEN env var |
On the platform side, every integration registers its data sources via POST /ingestion/datasources, which the platform exposes as part of the Ingress API. Pull collectors call this from their SDK; push adapters call it (or rely on the platform recognising entity ODDRNs implicitly on first push) per the ODD Specification. The endpoint is unauthenticated by default — see Enable security → Ingestion authentication for the production posture.
Where to next
Scoping what a plugin ingests → Ingestion filters (regex include / exclude per plugin).
Bootstrapping a collector_config.yaml snippet from the in-app wizard → Integration Wizard.
Storing collector secrets in AWS SSM → Collector secrets backend.
Building / running a collector locally → Build and run ODD Collectors.
The wire contract between any integration and the platform → ODD Specification.
Authoring a brand-new adapter (when an existing one doesn't fit) → Build a custom collector. The SDK lives at odd-collectors/odd-collector-sdk.
Existing repository overview — GitHub organization overview lists every ODD repo with one-line summaries.