odd-airflow-2

Airflow plugin that captures DAG, task, and task-run metadata via Airflow Listeners and pushes it to the ODD Platform.

Status: Stable, listener-based. Distributed as odd-airflow2-integration on PyPI; the repo's default branch is master and the latest release at the time of writing is v0.0.8.

odd-airflow-2 is a push adapter for Apache Airflow 2.x. It runs inside the Airflow scheduler process as an Airflow Listener — DAG / task / task-run metadata is captured as Airflow emits its own lifecycle events, and lineage is collected from each task's inlets / outlets attributes. There's no DAG-side instrumentation — once the plugin is installed and the platform connection is configured, every DAG benefits.

For the broader pull-vs-push picture, start at the Integrations hub.

Requirements

  • Apache Airflow 2.5.1 or later (the listener API matured in 2.5; earlier 2.x versions are not supported by this plugin).

  • Python 3.9 or later. The plugin runs in Airflow's own Python environment, so there is no separate Python version to manage.

  • A collector token issued by the ODD Platform, created in the UI under Management → Collectors (see Try locally → Create Collector entity).
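Both version requirements can be checked before installing. The following is a minimal sketch that uses only the standard library. It reads the installed version of the apache-airflow distribution and compares only the numeric release part, so a pre-release such as 2.5.1rc1 counts as 2.5.1. The helper names are hypothetical, and the check must run in the same Python environment as Airflow.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional, Tuple

MIN_PYTHON = (3, 9)
MIN_AIRFLOW = (2, 5, 1)


def release_tuple(ver: str) -> Tuple[int, ...]:
    """Numeric release part of a version string, e.g. '2.10.0rc1' -> (2, 10, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def installed_airflow_version() -> Optional[str]:
    """Version of apache-airflow installed in this environment, or None."""
    try:
        return version("apache-airflow")
    except PackageNotFoundError:
        return None


def requirement_problems() -> List[str]:
    """Unmet requirements for odd-airflow2-integration; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {sys.version.split()[0]} is older than 3.9")
    airflow_version = installed_airflow_version()
    if airflow_version is None:
        problems.append("apache-airflow is not installed in this environment")
    elif release_tuple(airflow_version) < MIN_AIRFLOW:
        problems.append(f"Airflow {airflow_version} is older than 2.5.1")
    return problems


if __name__ == "__main__":
    for line in requirement_problems() or ["all requirements met"]:
        print(line)
```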

For Airflow 1.x (or any setup where the Listener API isn't available), use the legacy odd-airflow-adapter repo instead. It is a separate implementation and is not maintained on the same release cadence as odd-airflow-2.

Installation

Install alongside Airflow in the same Python environment:

pip install odd-airflow2-integration
# or
poetry add odd-airflow2-integration

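The package registers its Airflow plugin through a Python entry point in the airflow.plugins group. Airflow discovers it automatically once the package is installed, so no airflow.cfg plugin entry is required.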
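You can confirm that the installed package is visible to Airflow's plugin discovery by listing the airflow.plugins entry-point group. Airflow's own `airflow plugins` CLI command is the more direct check. The following is a minimal standard-library sketch; the exact entry-point name this package registers is not documented here, so the sketch lists the whole group. The helper name is hypothetical.

```python
from importlib.metadata import entry_points
from typing import List


def airflow_plugin_entry_points() -> List[str]:
    """Return 'name = module:attr' for each entry point in the airflow.plugins group."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+: selectable entry points
        group = eps.select(group="airflow.plugins")
    else:  # Python 3.9: a plain dict keyed by group name
        group = eps.get("airflow.plugins", [])
    return [f"{ep.name} = {ep.value}" for ep in group]


if __name__ == "__main__":
    for line in airflow_plugin_entry_points():
        print(line)
```

Run it with the same interpreter that runs the scheduler. An empty result means the package is not installed in that environment.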

Configuration

The plugin reads the platform URL and collector token from an Airflow Connection that must be named odd:

| Connection field | Value |
|------------------|-------|
| Conn Id | odd |
| Conn Type | HTTP |
| Host | ODD Platform host (e.g. odd-platform.internal) |
| Port | optional: the port, if the platform is not on the standard HTTP/S port |
| Password | the collector token issued by the platform |

Define this connection before the scheduler starts — the Listener loads it once at scheduler startup and does not refresh it dynamically.
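Instead of using the UI, you can define the connection through an environment variable. Airflow maps AIRFLOW_CONN_<CONN_ID> (here AIRFLOW_CONN_ODD) to a connection, and since Airflow 2.3 the value may be a JSON object. The following is a minimal sketch that builds that value. It assumes the plugin resolves odd through Airflow's normal connection lookup, which also consults environment variables. The helper name, host, port, and token are placeholders. The variable must be in the scheduler's environment before the scheduler starts.

```python
import json
import shlex
from typing import Dict, Optional


def odd_connection_env(host: str, token: str, port: Optional[int] = None) -> Dict[str, str]:
    """Build the env-var mapping that defines the Airflow connection ``odd``."""
    # JSON connection format (Airflow 2.3+): conn_type, host, password, optional port.
    conn: Dict[str, object] = {"conn_type": "http", "host": host, "password": token}
    if port is not None:  # only needed when the platform is not on the standard port
        conn["port"] = port
    return {"AIRFLOW_CONN_ODD": json.dumps(conn)}


if __name__ == "__main__":
    # Placeholder values: substitute your platform host and collector token.
    env = odd_connection_env("odd-platform.internal", "<collector-token>", 8080)
    for name, value in env.items():
        print(f"export {name}={shlex.quote(value)}")
```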

What gets sent

  • DAGs — definition, schedule, owner, tags.

  • Tasks — operator, task ID, configuration.

  • Task runs — start / end timestamps, status (success / failure / skipped / …).

  • Lineage edges — derived from each task's inlets and outlets attributes.

A typical lineage-aware task:

Or with the operator API:

The string passed to inlets / outlets is an ODDRN — the cross-system identifier ODD uses to recognise the same entity across ingests.

Known limitations

  • Airflow 2.5.1 minimum. Earlier 2.x releases lacked the Listener hooks this plugin depends on.

  • inlets / outlets do not support templating. Airflow's template_fields mechanism does not apply to these attributes — values are read verbatim at task definition time, not at runtime.

  • Connection name is fixed. The plugin looks up the connection by the literal name odd; renaming it disables the integration silently.

  • Connection loaded at scheduler start. Changing the platform URL or token requires a scheduler restart for the Listener to pick up the new value.

  • Airflow 1.x is not supported by this package; use the legacy odd-airflow-adapter repo (separate maintenance).

  • Repo default branch is master, not main. URLs that hit /blob/main/... 404 — use /blob/master/... (or browse the repo root).

Troubleshooting

  • No metadata appears in the platform. Check that the odd Airflow connection is defined, the Password field carries the token, and the scheduler was restarted after defining the connection.

  • Lineage edges missing. Confirm inlets / outlets are set on the relevant tasks, that they are valid ODDRNs, and that the values are constants — not templated.

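  • Verbose logging. Airflow's standard [logging] logging_level option (environment variable AIRFLOW__LOGGING__LOGGING_LEVEL) also controls the plugin's logs; set it to DEBUG for more detail.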
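The lineage checks listed under Troubleshooting (valid ODDRNs, constant rather than templated values) can be automated, for example in a CI test. The following is a minimal sketch. The helper lineage_problems is hypothetical, and the sketch assumes ODDRNs start with "//" (as in //postgresql/host/...). Adjust that prefix if your identifiers differ.

```python
import re
from typing import Iterable, List

# Assumption: ODDRNs begin with "//".
ODDRN_PREFIX = "//"
# Jinja markers; Airflow does not render templates in inlets/outlets.
TEMPLATE_MARKERS = re.compile(r"\{\{|\{%")


def lineage_problems(task_id: str, inlets: Iterable[object], outlets: Iterable[object]) -> List[str]:
    """Describe inlet/outlet values that are unlikely to produce lineage edges."""
    problems = []
    for kind, values in (("inlet", inlets), ("outlet", outlets)):
        for value in values:
            if not isinstance(value, str):
                problems.append(f"{task_id}: {kind} {value!r} is not a string ODDRN")
            elif TEMPLATE_MARKERS.search(value):
                problems.append(f"{task_id}: {kind} {value!r} looks templated; it will not be rendered")
            elif not value.startswith(ODDRN_PREFIX):
                problems.append(f"{task_id}: {kind} {value!r} does not look like an ODDRN")
    return problems
```

In CI you might load DagBag(include_examples=False) from airflow.models and call lineage_problems(task.task_id, task.inlets, task.outlets) for every task in every DAG. Because the values are read verbatim at definition time, checking the parsed DAGs is sufficient.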
