# odd-collector (generic)

{% hint style="info" %}
**Status: Stable.** Released as a tagged Docker image; the underlying SDK is the same one all `odd-collector-*` collectors share.
{% endhint %}

`odd-collector` is the general-purpose pull collector. It bundles 41 adapters covering relational databases, data warehouses, NoSQL stores, message brokers, BI tools, MLOps platforms, and a few catalog / orchestration sources. One container instance can host any combination of those adapters as plugins, including multiple plugins of the same type pointing at different sources.

For the broader pull-vs-push picture and the shared collector configuration schema, start at the [Integrations hub](/integrations/integrations.md). For deployment-side detail (build, Docker, env vars), see [Build and run ODD Collectors](/developer-guides/build-and-run/build-and-run-odd-collectors.md).

## Supported adapters

The table below lists the 41 adapters registered in `odd_collector/domain/plugin.py` (`PLUGIN_FACTORY`). Every adapter has per-field documentation on this page: three (`postgresql`, `snowflake`, `kafka`) get longer spotlights with deployment guidance and feature notes, and the remaining 38 are catalogued in the [per-adapter configuration reference](#per-adapter-configuration-reference) section.

| Type literal    | Source system                    | Spotlighted below |
| --------------- | -------------------------------- | ----------------- |
| `airbyte`       | Airbyte                          |                   |
| `cassandra`     | Apache Cassandra                 |                   |
| `ckan`          | CKAN                             |                   |
| `clickhouse`    | ClickHouse                       |                   |
| `cockroachdb`   | CockroachDB                      |                   |
| `couchbase`     | Couchbase                        |                   |
| `cubejs`        | Cube.js                          |                   |
| `databricks`    | Databricks (Unity Catalog)       |                   |
| `dbt`           | dbt Cloud (catalog import)       |                   |
| `druid`         | Apache Druid                     |                   |
| `duckdb`        | DuckDB                           |                   |
| `elasticsearch` | Elasticsearch                    |                   |
| `feast`         | Feast feature store              |                   |
| `fivetran`      | Fivetran                         |                   |
| `hive`          | Apache Hive                      |                   |
| `kafka`         | Apache Kafka                     | ✓                 |
| `kubeflow`      | Kubeflow Pipelines               |                   |
| `metabase`      | Metabase                         |                   |
| `mlflow`        | MLflow                           |                   |
| `mode`          | Mode Analytics                   |                   |
| `mongodb`       | MongoDB                          |                   |
| `mssql`         | Microsoft SQL Server             |                   |
| `mysql`         | MySQL / MariaDB                  |                   |
| `neo4j`         | Neo4j                            |                   |
| `odbc`          | Generic ODBC source              |                   |
| `odd_adapter`   | Another ODD Platform (federated) |                   |
| `opensearch`    | OpenSearch                       |                   |
| `oracle`        | Oracle Database                  |                   |
| `postgresql`    | PostgreSQL (incl. pgvector)      | ✓                 |
| `presto`        | Presto                           |                   |
| `redash`        | Redash                           |                   |
| `redshift`      | Amazon Redshift                  |                   |
| `scylladb`      | ScyllaDB                         |                   |
| `singlestore`   | SingleStore                      |                   |
| `snowflake`     | Snowflake                        | ✓                 |
| `sqlite`        | SQLite                           |                   |
| `superset`      | Apache Superset                  |                   |
| `tableau`       | Tableau                          |                   |
| `tarantool`     | Tarantool                        |                   |
| `trino`         | Trino                            |                   |
| `vertica`       | Vertica                          |                   |

The canonical YAML for each adapter lives at [`odd-collectors/odd-collector/config_examples/`](https://github.com/opendatadiscovery/odd-collectors/tree/main/odd-collector/config_examples) — one file per adapter, named after the type literal. The Pydantic models that define the accepted fields live at [`odd-collectors/odd-collector/odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); read those when an example field is unclear.

{% hint style="warning" %}
The "Implemented adapters" table in the repo's top-level README lags behind the code: `databricks`, `couchbase`, `opensearch`, and `oracle` are present in `PLUGIN_FACTORY` but were missing from that table at the time of writing. Use the type literal table above (or `PLUGIN_FACTORY` in `plugin.py`) as the authoritative inventory.
{% endhint %}

## Installation

```bash
docker pull ghcr.io/opendatadiscovery/odd-collector:latest
```

Mount a `collector_config.yaml` at `/app/collector_config.yaml` inside the container. A reference Compose snippet is in the [generic collector README](https://github.com/opendatadiscovery/odd-collectors/tree/main/odd-collector#docker-compose-example) and a from-source build flow is in [Build and run ODD Collectors](/developer-guides/build-and-run/build-and-run-odd-collectors.md#build-odd-collector-into-docker-container).
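
As a rough sketch of how the mount fits together, the Compose fragment below wires the image and the config path named above into one service. The service name `odd-collector`, the relative host path, and the `PG_PASSWORD` variable are illustrative assumptions; any variable your config reads through `!ENV` has to be passed into the container in a similar way. Treat the README's Compose example as the reference.

```yaml
services:
  odd-collector:                      # hypothetical service name
    image: ghcr.io/opendatadiscovery/odd-collector:latest
    volumes:
      # host file -> path the collector reads inside the container (read-only)
      - ./collector_config.yaml:/app/collector_config.yaml:ro
    environment:
      # forwarded from the host shell or an .env file;
      # consumed by `!ENV ${PG_PASSWORD}` in collector_config.yaml
      PG_PASSWORD: ${PG_PASSWORD}
```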

## Minimal config

The smallest `collector_config.yaml` that runs the collector:

```yaml
platform_host_url: http://localhost:8080
token: <COLLECTOR_TOKEN>          # see Token and datasource registration on the hub
default_pulling_interval: 10      # minutes; omit to run once and exit
plugins:
  - type: postgresql
    name: warehouse_main
    host: pg.internal
    port: 5432
    database: warehouse
    user: odd_reader
    password: !ENV ${PG_PASSWORD}
```

The shared top-level fields (`platform_host_url`, `token`, `default_pulling_interval`, `plugins`, plus the optional `connection_timeout_seconds` / `chunk_size` / `misfire_grace_time` / `max_instances` / `verify_ssl`) are documented once at [Build and run ODD Collectors → Full configuration reference](/developer-guides/build-and-run/build-and-run-odd-collectors.md#full-configuration-reference). Only the `plugins[*]` shape varies per adapter — the rest of this page covers that.
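
For orientation, a config that sets the optional top-level fields might look like the sketch below. The field names are the ones listed above; the values are illustrative placeholders rather than documented defaults, so check the linked reference for the actual defaults, units, and semantics.

```yaml
platform_host_url: http://localhost:8080
token: <COLLECTOR_TOKEN>
default_pulling_interval: 10
connection_timeout_seconds: 120   # illustrative value, not the default
chunk_size: 100                   # illustrative value, not the default
misfire_grace_time: 60            # illustrative value, not the default
max_instances: 1                  # illustrative value, not the default
verify_ssl: true                  # illustrative value, not the default
plugins:
  - type: postgresql
    name: warehouse_main
    host: pg.internal
    database: warehouse
    user: odd_reader
    password: !ENV ${PG_PASSWORD}
```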

{% hint style="warning" %}
**Regenerating the collector token has no grace window — it stops ingestion until you update this file and restart.** The `token` above is a long-lived platform credential. Regenerating it (in **Management → Collectors**) is an in-place overwrite: every running collector still using the old value starts getting `401`s the instant the new token commits, and ingestion stops until you set the new value here and restart the collector. The token is also stored and returned in plaintext on the platform side, and regeneration is not recorded in the audit trail. See [Management → Collectors known caveats](/features/management.md#collectors-known-caveats) for the full token contract before rotating a token that running collectors depend on.
{% endhint %}
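
One way to turn a rotation into a restart-only change is to keep the token out of the file and read it from the environment. The sketch below assumes the `!ENV` tag is resolved for top-level fields the same way it is for plugin fields, and `ODD_COLLECTOR_TOKEN` is a hypothetical variable name; verify both against your collector version before relying on it.

```yaml
platform_host_url: http://localhost:8080
token: !ENV ${ODD_COLLECTOR_TOKEN}   # hypothetical variable name; assumes !ENV works at top level
default_pulling_interval: 10
plugins:
  - type: postgresql
    name: warehouse_main
    host: pg.internal
    port: 5432
    database: warehouse
    user: odd_reader
    password: !ENV ${PG_PASSWORD}
```

With this layout, rotating the token means updating the variable and restarting the container. The lack of a grace window described above still applies.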

## Multiple plugins in one container

`plugins` is a list: add as many entries as you need, mixing types freely. Running several plugins of the **same type** (e.g. several PostgreSQL databases on different hosts) alongside plugins of other types is a common pattern:

```yaml
plugins:
  - type: postgresql
    name: warehouse_eu
    host: pg-eu.internal
    port: 5432
    database: warehouse
    user: odd_reader
    password: !ENV ${PG_EU_PASSWORD}
  - type: postgresql
    name: warehouse_us
    host: pg-us.internal
    port: 5432
    database: warehouse
    user: odd_reader
    password: !ENV ${PG_US_PASSWORD}
  - type: snowflake
    name: dwh_snowflake
    account: ab12345.eu-central-1
    warehouse: COMPUTE_WH
    database: PROD
    user: ODD_READER
    password: !ENV ${SNOWFLAKE_PASSWORD}
```

Each plugin's `name` must be unique within the file — the collector uses it to log per-plugin progress and to wire each plugin to its own scheduled job. The `default_pulling_interval` applies to every plugin uniformly; per-plugin overrides are not supported.

## Spotlight: PostgreSQL (`type: postgresql`)

Pulls schemas, tables, columns, foreign-key relationships, and (with `pgvector` installed in the source) vector indexes. PostgreSQL tables containing at least one `vector`-typed column are classified as the `Vector Store` dataset type — see [Vector Store metadata](/features/data-discovery/vector-stores.md) for the user-facing classification, the dedicated icon in the catalog, and the `Vector` column data type rendering on the Structure tab.

| Field                    | Type            | Required | Default  | Description                                                         |
| ------------------------ | --------------- | -------- | -------- | ------------------------------------------------------------------- |
| `name`                   | string          | yes      | —        | Operator-chosen unique plugin name.                                 |
| `host`                   | string          | yes      | —        | PostgreSQL server hostname.                                         |
| `port`                   | integer         | no       | `5432`   | TCP port.                                                           |
| `database`               | string          | yes      | —        | Database to scan; one plugin = one database.                        |
| `user`                   | string          | yes      | —        | Login. The user needs read on the system catalogs you want indexed. |
| `password`               | string (Secret) | no       | empty    | Password. Use `!ENV ${VAR}` to source from an environment variable. |
| `schemas_filter.include` | list of regex   | no       | `[".*"]` | Schemas to include.                                                 |
| `schemas_filter.exclude` | list of regex   | no       | `[]`     | Schemas to drop after `include` matches.                            |

Source: [`PostgreSQLPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/postgresql.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/postgresql.yaml).

```yaml
plugins:
  - type: postgresql
    name: warehouse_main
    host: pg.internal
    port: 5432
    database: warehouse
    user: odd_reader
    password: !ENV ${PG_PASSWORD}
    schemas_filter:
      include: ["public", "analytics_.*"]
      exclude: ["analytics_archive_.*"]
```

The PostgreSQL adapter extracts foreign-key relationships and emits them as `ENTITY_RELATIONSHIP` entities — these render as ERD edges on the dataset detail page in the catalog. Cross-schema foreign keys are supported.

## Spotlight: Snowflake (`type: snowflake`)

Pulls databases, schemas, tables, views, columns, and foreign-key relationships.

| Field                    | Type            | Required | Default  | Description                                                                                                                     |
| ------------------------ | --------------- | -------- | -------- | ------------------------------------------------------------------------------------------------------------------------------- |
| `name`                   | string          | yes      | —        | Operator-chosen unique plugin name.                                                                                             |
| `account`                | string          | yes      | —        | Snowflake account identifier (e.g. `ab12345.eu-central-1`). The adapter derives the host as `{ACCOUNT}.snowflakecomputing.com`. |
| `warehouse`              | string          | yes      | —        | Compute warehouse used for the catalog query.                                                                                   |
| `database`               | string          | yes      | —        | Database to scan.                                                                                                               |
| `user`                   | string          | yes      | —        | Snowflake login.                                                                                                                |
| `password`               | string (Secret) | yes      | —        | Password.                                                                                                                       |
| `schemas_filter.include` | list of regex   | no       | `[".*"]` | Schemas to include.                                                                                                             |
| `schemas_filter.exclude` | list of regex   | no       | `[]`     | Schemas to drop after `include` matches.                                                                                        |

Source: [`SnowflakePlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/snowflake.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/snowflake.yaml).

```yaml
plugins:
  - type: snowflake
    name: dwh_snowflake
    account: ab12345.eu-central-1
    warehouse: COMPUTE_WH
    database: PROD
    user: ODD_READER
    password: !ENV ${SNOWFLAKE_PASSWORD}
    schemas_filter:
      include: [".*"]
      exclude: ["TEMP_.*", "SCRATCH"]
```

Like PostgreSQL, the Snowflake adapter extracts foreign-key constraints and emits `ENTITY_RELATIONSHIP` entities.

## Spotlight: Kafka (`type: kafka`)

Pulls Kafka topics and (when a Confluent-compatible Schema Registry is reachable) the registered schemas.

| Field                  | Type    | Required | Default | Description                                                                                                       |
| ---------------------- | ------- | -------- | ------- | ----------------------------------------------------------------------------------------------------------------- |
| `name`                 | string  | yes      | —       | Operator-chosen unique plugin name.                                                                               |
| `host`                 | string  | yes      | —       | Bootstrap broker host.                                                                                            |
| `port`                 | integer | yes      | —       | Bootstrap broker port.                                                                                            |
| `broker_conf`          | dict    | yes      | —       | Passed to `confluent_kafka.AdminClient` — e.g. SASL credentials, SSL settings.                                    |
| `schema_registry_conf` | dict    | no       | `{}`    | Passed to the Schema Registry client — e.g. URL, basic auth. When empty, the adapter does not query the registry. |

Source: [`KafkaPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/kafka.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/kafka.yaml).

```yaml
plugins:
  - type: kafka
    name: events_kafka
    host: kafka.internal
    port: 9092
    broker_conf:
      security.protocol: SASL_SSL
      sasl.mechanism: PLAIN
      sasl.username: !ENV ${KAFKA_USER}
      sasl.password: !ENV ${KAFKA_PASSWORD}
    schema_registry_conf:
      url: https://schema-registry.internal:8081
      basic.auth.user.info: !ENV ${SR_USER_INFO}
```
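
When no Schema Registry is available, `schema_registry_conf` can be left out and the adapter skips the registry. The sketch below assumes an unauthenticated broker on a hypothetical host; `security.protocol: PLAINTEXT` is a standard librdkafka setting, included here only so the required `broker_conf` dict has explicit content.

```yaml
plugins:
  - type: kafka
    name: dev_kafka                  # hypothetical plugin name
    host: kafka-dev.internal         # hypothetical host
    port: 9092
    broker_conf:
      security.protocol: PLAINTEXT   # no SASL / TLS; assumption for a dev cluster
    # schema_registry_conf omitted: topics only, no registry lookups
```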

## Per-adapter configuration reference

The three spotlights above cover the deployment-shape questions; this section enumerates the per-field config schema for the remaining 38 adapters. Field names, types, and defaults are sourced from the Pydantic plugin classes in [`odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); each adapter links to its `config_examples/{type}.yaml` reference YAML where one exists. Two adapters — `mode` and `opensearch` — have no upstream config example; their tables come from the Pydantic model alone and the per-section note flags the gap.

Common shapes used across the families below:

* **`BasePlugin`** — every plugin carries `name` (required, operator-chosen, unique within the file). The optional metadata fields `description` and `namespace` are accepted by every plugin and omitted from the per-adapter tables to save space.
* **`DatabasePlugin` base** — adds `host: str` (required), `port: str` (required, often overridden to `int` by subclasses), `database: str | null` (optional in the base; many subclasses redeclare it as required), `user: str` (required), `password: str` (required, redeclared by most subclasses as `SecretStr` with an empty default).
* **`WithHost`** — adds only `host: str`. **`WithPort`** — adds only `port: str`. Both are mixed in by adapters that don't fit the full `DatabasePlugin` shape.

Each table below repeats every field the adapter accepts so that an entry is self-contained — operators don't need to chase the inheritance chain in `plugin.py`.
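
As a sketch of the optional `BasePlugin` metadata fields on a plugin entry: the field names come from the list above, while the values shown, and any assumption about how the platform displays them, are illustrative only.

```yaml
plugins:
  - type: mysql
    name: legacy_billing
    description: Billing database for the legacy invoicing system   # illustrative free text
    namespace: finance                                              # illustrative value
    host: mysql.internal
    port: 3306
    database: billing
    user: odd_reader
    password: !ENV ${MYSQL_PASSWORD}
```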

### Relational databases

#### Microsoft SQL Server (`type: mssql`)

Pulls schemas, tables, views, and columns from a Microsoft SQL Server / Azure SQL Server source via the SQL catalog views.

| Field      | Type            | Required | Default | Description                                                         |
| ---------- | --------------- | -------- | ------- | ------------------------------------------------------------------- |
| `name`     | string          | yes      | —       | Operator-chosen unique plugin name.                                 |
| `host`     | string          | yes      | —       | SQL Server host.                                                    |
| `port`     | integer         | yes      | —       | TCP port (typical: `1433`).                                         |
| `database` | string          | yes      | —       | Database to scan; one plugin = one database.                        |
| `user`     | string          | yes      | —       | SQL login.                                                          |
| `password` | string (Secret) | no       | empty   | Password. Use `!ENV ${VAR}` to source from an environment variable. |

Source: [`MSSQLPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/mssql.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/mssql.yaml).

```yaml
plugins:
  - type: mssql
    name: prod_sqlserver
    host: sqlserver.internal
    port: 1433
    database: warehouse
    user: odd_reader
    password: !ENV ${MSSQL_PASSWORD}
```

#### MySQL / MariaDB (`type: mysql`)

Pulls schemas, tables, views, and columns. Compatible with MariaDB.

| Field          | Type            | Required | Default | Description                                                                                                          |
| -------------- | --------------- | -------- | ------- | -------------------------------------------------------------------------------------------------------------------- |
| `name`         | string          | yes      | —       | Operator-chosen unique plugin name.                                                                                  |
| `host`         | string          | yes      | —       | MySQL server hostname.                                                                                               |
| `port`         | integer         | yes      | —       | TCP port (typical: `3306`).                                                                                          |
| `database`     | string          | yes      | —       | Database to scan.                                                                                                    |
| `user`         | string          | yes      | —       | Login.                                                                                                               |
| `password`     | string (Secret) | no       | empty   | Password.                                                                                                            |
| `ssl_disabled` | boolean         | no       | `false` | When `true`, disables TLS to the server — typically used only for local development against an unencrypted instance. |

Source: [`MySQLPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/mysql.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/mysql.yaml).

```yaml
plugins:
  - type: mysql
    name: legacy_billing
    host: mysql.internal
    port: 3306
    database: billing
    user: odd_reader
    password: !ENV ${MYSQL_PASSWORD}
```
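
For a local development instance without TLS, the `ssl_disabled` flag from the table can be set explicitly. A minimal sketch, with a hypothetical plugin name, host, and credentials:

```yaml
plugins:
  - type: mysql
    name: local_mysql               # hypothetical plugin name
    host: localhost
    port: 3306
    database: billing
    user: root                      # illustrative; prefer a read-only user outside local dev
    password: !ENV ${MYSQL_PASSWORD}
    ssl_disabled: true              # unencrypted connection; local development only
```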

#### ClickHouse (`type: clickhouse`)

Pulls databases, tables, and columns from a ClickHouse cluster.

| Field             | Type            | Required | Default | Description                                                                                                                       |
| ----------------- | --------------- | -------- | ------- | --------------------------------------------------------------------------------------------------------------------------------- |
| `name`            | string          | yes      | —       | Operator-chosen unique plugin name.                                                                                               |
| `host`            | string          | yes      | —       | ClickHouse server hostname.                                                                                                       |
| `port`            | integer or null | yes      | —       | HTTP (`8123`) or native (`9000`) port. The Pydantic model accepts `null`, but every reference example provides an explicit value. |
| `database`        | string or null  | no       | —       | Database to scan. When unset, the connection's default database is used.                                                          |
| `user`            | string          | yes      | —       | Login.                                                                                                                            |
| `password`        | string (Secret) | yes      | —       | Password.                                                                                                                         |
| `secure`          | boolean         | no       | `false` | Toggles TLS on the connection. Set to `true` for ClickHouse Cloud or any TLS-fronted deployment.                                  |
| `verify`          | boolean         | no       | `true`  | Whether to verify the server certificate when `secure: true`. Set to `false` only for self-signed certs on local clusters.        |
| `server_hostname` | string or null  | no       | `null`  | Optional hostname for SNI / certificate validation; defaults to the value of `host`.                                              |
| `query_limit`     | integer or null | no       | `0`     | Optional row cap applied to internal catalog queries. `0` means no limit.                                                         |

Source: [`ClickhousePlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/clickhouse.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/clickhouse.yaml).

```yaml
plugins:
  - type: clickhouse
    name: clickhouse_main
    host: clickhouse.internal
    port: 8123
    database: default
    user: default
    password: !ENV ${CLICKHOUSE_PASSWORD}
    secure: false
```
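
For a TLS-fronted deployment, the `secure` and `verify` fields from the table combine as in the sketch below. The host name is hypothetical, and port `8443` is ClickHouse's conventional HTTPS port rather than something the adapter mandates; use whatever port your deployment exposes.

```yaml
plugins:
  - type: clickhouse
    name: clickhouse_tls             # hypothetical plugin name
    host: clickhouse.example.com     # hypothetical host
    port: 8443                       # conventional ClickHouse HTTPS port
    database: default
    user: odd_reader
    password: !ENV ${CLICKHOUSE_PASSWORD}
    secure: true                     # enable TLS on the connection
    verify: true                     # keep certificate verification on outside local clusters
```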

#### Amazon Redshift (`type: redshift`)

Pulls schemas, tables, views, and columns from an Amazon Redshift cluster.

| Field                | Type                   | Required | Default | Description                                                                                                                                                                                              |
| -------------------- | ---------------------- | -------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`               | string                 | yes      | —       | Operator-chosen unique plugin name.                                                                                                                                                                      |
| `host`               | string                 | yes      | —       | Redshift cluster endpoint (`{cluster}.{region}.redshift.amazonaws.com`).                                                                                                                                 |
| `port`               | string                 | yes      | —       | TCP port as a string (typical: `"5439"`).                                                                                                                                                                |
| `database`           | string or null         | no       | —       | Database to scan.                                                                                                                                                                                        |
| `user`               | string                 | yes      | —       | Login.                                                                                                                                                                                                   |
| `password`           | string (Secret)        | yes      | —       | Password.                                                                                                                                                                                                |
| `schemas`            | list of string or null | no       | `null`  | Allowlist of schema names. When omitted, every non-system schema is ingested. **Literal name list**, not a regex filter — different from the `schemas_filter` available on `postgresql` and `snowflake`. |
| `connection_timeout` | integer or null        | no       | `10`    | Connection timeout in seconds.                                                                                                                                                                           |

Source: [`RedshiftPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/redshift.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/redshift.yaml).

```yaml
plugins:
  - type: redshift
    name: warehouse_redshift
    host: my-cluster.abc123.us-east-1.redshift.amazonaws.com
    port: "5439"
    database: warehouse
    user: odd_reader
    password: !ENV ${REDSHIFT_PASSWORD}
    schemas: ["public", "analytics"]
    connection_timeout: 10
```

#### CockroachDB (`type: cockroachdb`)

Pulls schemas, tables, columns, and foreign-key relationships. Inherits from the PostgreSQL plugin — same field shape plus the same `schemas_filter` regex behavior; ERD edges are emitted for cross-schema foreign keys exactly as on PostgreSQL.

| Field                    | Type            | Required | Default  | Description                                                                                                                      |
| ------------------------ | --------------- | -------- | -------- | -------------------------------------------------------------------------------------------------------------------------------- |
| `name`                   | string          | yes      | —        | Operator-chosen unique plugin name.                                                                                              |
| `host`                   | string          | yes      | —        | CockroachDB SQL endpoint.                                                                                                        |
| `port`                   | integer         | no       | `5432`   | TCP port. CockroachDB's typical SQL port is `26257`; the model default is the PostgreSQL port inherited from `PostgreSQLPlugin`. |
| `database`               | string          | yes      | —        | Database to scan.                                                                                                                |
| `user`                   | string          | yes      | —        | Login.                                                                                                                           |
| `password`               | string (Secret) | no       | empty    | Password.                                                                                                                        |
| `schemas_filter.include` | list of regex   | no       | `[".*"]` | Schemas to include.                                                                                                              |
| `schemas_filter.exclude` | list of regex   | no       | `[]`     | Schemas to drop after `include` matches.                                                                                         |

Source: [`CockroachDBPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py) (extends `PostgreSQLPlugin`); reference YAML at [`config_examples/cocroachdb.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/cocroachdb.yaml). The upstream filename has a `cocroach` typo — the type literal `cockroachdb` is correct and is what you write in `collector_config.yaml`.

```yaml
plugins:
  - type: cockroachdb
    name: orders_crdb
    host: crdb.internal
    port: 26257
    database: orders
    user: odd_reader
    password: !ENV ${CRDB_PASSWORD}
```
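
Because the plugin inherits the PostgreSQL field shape, the same `schemas_filter` block can be added. A sketch with hypothetical schema names:

```yaml
plugins:
  - type: cockroachdb
    name: orders_crdb
    host: crdb.internal
    port: 26257
    database: orders
    user: odd_reader
    password: !ENV ${CRDB_PASSWORD}
    schemas_filter:
      include: ["public", "orders_.*"]   # hypothetical schema patterns
      exclude: ["orders_tmp_.*"]         # dropped after include matches
```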

#### Vertica (`type: vertica`)

Pulls schemas, tables, views, and columns from a Vertica analytic database.

| Field      | Type           | Required | Default | Description                         |
| ---------- | -------------- | -------- | ------- | ----------------------------------- |
| `name`     | string         | yes      | —       | Operator-chosen unique plugin name. |
| `host`     | string         | yes      | —       | Vertica host.                       |
| `port`     | string         | yes      | —       | TCP port (typical: `"5433"`).       |
| `database` | string or null | no       | —       | Database to scan.                   |
| `user`     | string         | yes      | —       | Login.                              |
| `password` | string         | yes      | —       | Password.                           |

Source: [`VerticaPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/vertica.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/vertica.yaml).

```yaml
plugins:
  - type: vertica
    name: vertica_main
    host: vertica.internal
    port: "5433"
    database: warehouse
    user: odd_reader
    password: !ENV ${VERTICA_PASSWORD}
```

#### SingleStore (`type: singlestore`)

Pulls schemas, tables, views, and columns from a SingleStore (formerly MemSQL) cluster. SingleStore is wire-compatible with MySQL and listens on the MySQL port (`3306`) by default.

| Field          | Type            | Required | Default | Description                                                        |
| -------------- | --------------- | -------- | ------- | ------------------------------------------------------------------ |
| `name`         | string          | yes      | —       | Operator-chosen unique plugin name.                                |
| `host`         | string          | yes      | —       | SingleStore host.                                                  |
| `port`         | string          | yes      | —       | TCP port (typical: `"3306"`).                                      |
| `database`     | string or null  | no       | —       | Database to scan.                                                  |
| `user`         | string          | yes      | —       | Login.                                                             |
| `password`     | string          | yes      | —       | Password.                                                          |
| `ssl_disabled` | boolean or null | no       | `false` | Disables TLS to the server — typically only for local development. |

Source: [`SingleStorePlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/singlestore.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/singlestore.yaml).

```yaml
plugins:
  - type: singlestore
    name: singlestore_main
    host: singlestore.internal
    port: "3306"
    database: warehouse
    user: odd_reader
    password: !ENV ${SINGLESTORE_PASSWORD}
```
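For a local development cluster without TLS, the same configuration can set `ssl_disabled`. This is a sketch based on the field table above; the plugin name and host are placeholders.

```yaml
plugins:
  - type: singlestore
    name: singlestore_dev
    host: localhost
    port: "3306"
    database: warehouse
    user: odd_reader
    password: !ENV ${SINGLESTORE_PASSWORD}
    ssl_disabled: true   # turns TLS off; keep the default (false) outside local development
```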

#### Oracle Database (`type: oracle`)

Pulls schemas (one per Oracle user), tables, views, and columns from an Oracle Database.

| Field        | Type            | Required | Default | Description                                                                                                                                                                                                |
| ------------ | --------------- | -------- | ------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`       | string          | yes      | —       | Operator-chosen unique plugin name.                                                                                                                                                                        |
| `host`       | string          | yes      | —       | Oracle server hostname.                                                                                                                                                                                    |
| `port`       | string          | yes      | —       | TCP port (typical: `"1521"`).                                                                                                                                                                              |
| `user`       | string          | yes      | —       | Oracle login (becomes the schema name in Oracle's data model).                                                                                                                                             |
| `service`    | string          | yes      | —       | Oracle service name (e.g., `XEPDB1`). Use the service name, not the SID.                                                                                                                                   |
| `password`   | string (Secret) | yes      | —       | Password.                                                                                                                                                                                                  |
| `thick_mode` | boolean or null | no       | `false` | When `true`, switches the underlying Oracle client to thick mode (requires the Oracle Instant Client to be installed in the container). Default thin mode is pure Python and works without Instant Client. |

Source: [`OraclePlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/oracle.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/oracle.yaml).

```yaml
plugins:
  - type: oracle
    name: oracle_main
    host: oracle.internal
    port: "1521"
    user: odd_reader
    service: XEPDB1
    password: !ENV ${ORACLE_PASSWORD}
    thick_mode: false
```
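When thick mode is needed, only `thick_mode` changes. The sketch below assumes the Oracle Instant Client is already installed in the collector image, as the field description requires; the plugin name is a placeholder and the other values repeat the example above.

```yaml
plugins:
  - type: oracle
    name: oracle_thick
    host: oracle.internal
    port: "1521"
    user: odd_reader
    service: XEPDB1
    password: !ENV ${ORACLE_PASSWORD}
    thick_mode: true   # requires Oracle Instant Client inside the container
```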

#### Generic ODBC source (`type: odbc`)

Pulls schemas, tables, and columns from any source reachable through an ODBC driver registered on the collector container. Useful for sources without a dedicated adapter.

| Field      | Type                    | Required | Default                              | Description                                                                                                                                                                                                                                                                                                     |
| ---------- | ----------------------- | -------- | ------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`     | string                  | yes      | —                                    | Operator-chosen unique plugin name.                                                                                                                                                                                                                                                                             |
| `host`     | string                  | yes      | —                                    | Source hostname.                                                                                                                                                                                                                                                                                                |
| `port`     | string                  | yes      | —                                    | TCP port.                                                                                                                                                                                                                                                                                                       |
| `database` | string                  | yes      | —                                    | Database to scan.                                                                                                                                                                                                                                                                                               |
| `user`     | string                  | yes      | —                                    | Login.                                                                                                                                                                                                                                                                                                          |
| `password` | string (Secret) or null | no       | —                                    | Password.                                                                                                                                                                                                                                                                                                       |
| `driver`   | string                  | no       | `"{ODBC Driver 17s for SQL Server}"` | ODBC driver name as registered in `odbcinst.ini` on the container. **The upstream default contains a typo (`17s` should be `17`)** — always set this field explicitly to the driver string for your environment (e.g., `{ODBC Driver 17 for SQL Server}` or your platform's equivalent). See Known limitations. |

Source: [`OdbcPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/odbc.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/odbc.yaml).

```yaml
plugins:
  - type: odbc
    name: legacy_odbc
    host: odbc.internal
    port: "1433"
    database: legacy
    user: odd_reader
    password: !ENV ${ODBC_PASSWORD}
    driver: "{ODBC Driver 17 for SQL Server}"
```

#### SQLite (`type: sqlite`)

Reads a SQLite database file from a local path on the collector container. In-memory SQLite databases are not supported: an in-memory database is private to the connection that created it, so there is no file for the collector to open.

| Field         | Type               | Required | Default | Description                                                                                                                           |
| ------------- | ------------------ | -------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| `name`        | string             | yes      | —       | Operator-chosen unique plugin name.                                                                                                   |
| `data_source` | string (file path) | yes      | —       | Absolute path to the `.db` file inside the container. The file must exist at startup; the model uses Pydantic's `FilePath` validator. |

Source: [`SQLitePlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/sqlite.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/sqlite.yaml).

```yaml
plugins:
  - type: sqlite
    name: local_sqlite
    data_source: /data/local/file.db
```
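Because `data_source` must point at a file that already exists inside the container, the database file usually has to be mounted in. A minimal Docker Compose sketch follows; the service name, image reference, and host path are hypothetical, and only the container-side path has to match `data_source`.

```yaml
services:
  odd-collector:
    image: odd-collector:local   # hypothetical; use the image you actually deploy
    volumes:
      # Host path is hypothetical; the container path matches data_source.
      - ./data/file.db:/data/local/file.db
```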

### Wide-column, document, and key-value stores

#### MongoDB (`type: mongodb`)

Catalogs MongoDB databases, collections, and inferred field types.

| Field      | Type           | Required | Default | Description                                                                                                                                                         |
| ---------- | -------------- | -------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`     | string         | yes      | —       | Operator-chosen unique plugin name.                                                                                                                                 |
| `host`     | string         | yes      | —       | MongoDB host or seed-list host.                                                                                                                                     |
| `port`     | string         | yes      | —       | TCP port (typical: `"27017"`).                                                                                                                                      |
| `database` | string or null | no       | —       | Database to scan.                                                                                                                                                   |
| `user`     | string         | yes      | —       | Login.                                                                                                                                                              |
| `password` | string         | yes      | —       | Password.                                                                                                                                                           |
| `protocol` | string         | yes      | —       | Connection scheme passed to the MongoDB driver — `mongodb` for direct host/port connections, `mongodb+srv` for SRV-resolved seed lists (typical for MongoDB Atlas). |

Source: [`MongoDBPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/mongodb.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/mongodb.yaml).

```yaml
plugins:
  - type: mongodb
    name: mongo_orders
    host: mongo.internal
    port: "27017"
    database: orders
    user: odd_reader
    password: !ENV ${MONGO_PASSWORD}
    protocol: mongodb
```
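For an SRV-resolved deployment such as MongoDB Atlas, the `protocol` value changes to `mongodb+srv`. The sketch below uses a hypothetical Atlas hostname and plugin name. `port` stays in place because the model requires it; how the adapter uses it for SRV connections is an open assumption here.

```yaml
plugins:
  - type: mongodb
    name: mongo_atlas_orders
    host: cluster0.example.mongodb.net   # hypothetical Atlas SRV hostname
    port: "27017"                        # still required by the plugin model
    database: orders
    user: odd_reader
    password: !ENV ${MONGO_PASSWORD}
    protocol: mongodb+srv                # SRV-resolved seed list
```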

#### Apache Cassandra (`type: cassandra`)

Catalogs keyspaces, tables, and columns from a Cassandra cluster.

| Field            | Type           | Required | Default | Description                                                                                                     |
| ---------------- | -------------- | -------- | ------- | --------------------------------------------------------------------------------------------------------------- |
| `name`           | string         | yes      | —       | Operator-chosen unique plugin name.                                                                             |
| `host`           | string         | yes      | —       | Cassandra contact host.                                                                                         |
| `port`           | string         | yes      | —       | TCP port (typical: `"9042"`).                                                                                   |
| `database`       | string or null | no       | —       | Keyspace name; one plugin scans one keyspace when supplied.                                                     |
| `user`           | string         | yes      | —       | Login.                                                                                                          |
| `password`       | string         | yes      | —       | Password.                                                                                                       |
| `contact_points` | list of string | no       | `[]`    | Additional contact points for the driver's initial connection. Empty list means the driver uses `host` only.    |

Source: [`CassandraPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/cassandra.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/cassandra.yaml).

```yaml
plugins:
  - type: cassandra
    name: cassandra_events
    host: cassandra-1.internal
    port: "9042"
    database: events
    user: odd_reader
    password: !ENV ${CASSANDRA_PASSWORD}
    contact_points: ["cassandra-2.internal", "cassandra-3.internal"]
```

#### ScyllaDB (`type: scylladb`)

Catalogs keyspaces, tables, and columns from a ScyllaDB cluster. The field shape matches Cassandra because ScyllaDB speaks the same CQL wire protocol and works with Cassandra drivers.

| Field            | Type           | Required | Default | Description                         |
| ---------------- | -------------- | -------- | ------- | ----------------------------------- |
| `name`           | string         | yes      | —       | Operator-chosen unique plugin name. |
| `host`           | string         | yes      | —       | Scylla contact host.                |
| `port`           | string         | yes      | —       | TCP port (typical: `"9042"`).       |
| `database`       | string or null | no       | —       | Keyspace name.                      |
| `user`           | string         | yes      | —       | Login.                              |
| `password`       | string         | yes      | —       | Password.                           |
| `contact_points` | list of string | no       | `[]`    | Additional contact-host endpoints.  |

Source: [`ScyllaDBPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/scylladb.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/scylladb.yaml).

```yaml
plugins:
  - type: scylladb
    name: scylla_events
    host: scylla-1.internal
    port: "9042"
    database: events
    user: odd_reader
    password: !ENV ${SCYLLA_PASSWORD}
    contact_points: ["scylla-2.internal", "scylla-3.internal"]
```

#### Tarantool (`type: tarantool`)

Catalogs spaces and indexes from a Tarantool instance. Uses the standard `DatabasePlugin` shape with no Tarantool-specific fields.

| Field      | Type           | Required | Default | Description                             |
| ---------- | -------------- | -------- | ------- | --------------------------------------- |
| `name`     | string         | yes      | —       | Operator-chosen unique plugin name.     |
| `host`     | string         | yes      | —       | Tarantool host.                         |
| `port`     | string         | yes      | —       | TCP port (typical: `"3301"`).           |
| `database` | string or null | no       | —       | Database / space-collection identifier. |
| `user`     | string         | yes      | —       | Login.                                  |
| `password` | string         | yes      | —       | Password.                               |

Source: [`TarantoolPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/tarantool.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/tarantool.yaml).

```yaml
plugins:
  - type: tarantool
    name: tarantool_main
    host: tarantool.internal
    port: "3301"
    user: odd_reader
    password: !ENV ${TARANTOOL_PASSWORD}
```

#### Couchbase (`type: couchbase`)

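Catalogs Couchbase buckets and, when sampling is enabled, infers document field types. Couchbase is schemaless, so the adapter samples `sample_size` documents per collection to derive a structural view. With the default `sample_size` of `0`, sampling is off and only metadata is collected.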

| Field               | Type            | Required | Default | Description                                                                                                               |
| ------------------- | --------------- | -------- | ------- | ------------------------------------------------------------------------------------------------------------------------- |
| `name`              | string          | yes      | —       | Operator-chosen unique plugin name.                                                                                       |
| `host`              | string          | yes      | —       | Couchbase connection string (e.g., `couchbase://node1.internal,node2.internal`).                                          |
| `bucket`            | string          | yes      | —       | Bucket name; one plugin scans one bucket.                                                                                 |
| `user`              | string          | yes      | —       | Login.                                                                                                                    |
| `password`          | string (Secret) | yes      | —       | Password.                                                                                                                 |
| `sample_size`       | integer or null | no       | `0`     | Number of documents to sample per collection for schema inference. `0` disables sampling and uses the metadata-only view. |
| `num_sample_values` | integer or null | no       | `10`    | When sampling is on, number of value examples to retain per inferred field.                                               |

Source: [`CouchbasePlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/couchbase.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/couchbase.yaml).

```yaml
plugins:
  - type: couchbase
    name: couchbase_orders
    host: couchbase://couchbase.internal
    bucket: orders
    user: odd_reader
    password: !ENV ${COUCHBASE_PASSWORD}
    sample_size: 100
    num_sample_values: 10
```
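Leaving out both sampling fields keeps the defaults from the table, so `sample_size` stays at `0` and the adapter collects the metadata-only view. A minimal sketch with the same placeholder values (the plugin name is hypothetical):

```yaml
plugins:
  - type: couchbase
    name: couchbase_orders_metadata
    host: couchbase://couchbase.internal
    bucket: orders
    user: odd_reader
    password: !ENV ${COUCHBASE_PASSWORD}
    # sample_size omitted: defaults to 0, which disables sampling
```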

#### Neo4j (`type: neo4j`)

Catalogs Neo4j databases, node labels, and relationship types. Uses the standard `DatabasePlugin` shape; the typical Bolt port is `7687`.

| Field      | Type           | Required | Default | Description                                |
| ---------- | -------------- | -------- | ------- | ------------------------------------------ |
| `name`     | string         | yes      | —       | Operator-chosen unique plugin name.        |
| `host`     | string         | yes      | —       | Neo4j host.                                |
| `port`     | string         | yes      | —       | Bolt port (typical: `"7687"`).             |
| `database` | string or null | no       | —       | Database name (Neo4j 4.x+ multi-database). |
| `user`     | string         | yes      | —       | Login.                                     |
| `password` | string         | yes      | —       | Password.                                  |

Source: [`Neo4jPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/neo4j.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/neo4j.yaml).

```yaml
plugins:
  - type: neo4j
    name: neo4j_graph
    host: neo4j.internal
    port: "7687"
    database: neo4j
    user: neo4j
    password: !ENV ${NEO4J_PASSWORD}
```

### Search engines

#### Elasticsearch (`type: elasticsearch`)

Catalogs Elasticsearch indices and field mappings.

| Field          | Type            | Required | Default | Description                                                                                                                                          |
| -------------- | --------------- | -------- | ------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`         | string          | yes      | —       | Operator-chosen unique plugin name.                                                                                                                  |
| `host`         | string          | yes      | —       | Elasticsearch host (typically including scheme — e.g., `https://es.internal`).                                                                       |
| `port`         | integer         | yes      | —       | TCP port (typical: `9200`).                                                                                                                          |
| `username`     | string          | yes      | —       | Login.                                                                                                                                               |
| `password`     | string (Secret) | yes      | —       | Password.                                                                                                                                            |
| `verify_certs` | boolean or null | no       | `null`  | Whether to verify TLS certificates on the Elasticsearch endpoint. `null` defers to the Elasticsearch client default (verify when scheme is `https`). |
| `ca_certs`     | string or null  | no       | `null`  | Optional path to a CA bundle file inside the container.                                                                                              |

Source: [`ElasticsearchPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/elasticsearch.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/elasticsearch.yaml).

```yaml
plugins:
  - type: elasticsearch
    name: es_logs
    host: https://es.internal
    port: 9200
    username: elastic
    password: !ENV ${ES_PASSWORD}
    verify_certs: true
    ca_certs: /etc/ssl/certs/ca-bundle.crt
```

#### OpenSearch (`type: opensearch`)

Catalogs OpenSearch indices and field mappings.

{% hint style="info" %}
**No `config_examples/opensearch.yaml` file exists upstream.** The fields below are read directly from `OpensearchPlugin` in `plugin.py`; the YAML below is hand-crafted from that model.
{% endhint %}

| Field           | Type                    | Required | Default | Description                                                                                                     |
| --------------- | ----------------------- | -------- | ------- | --------------------------------------------------------------------------------------------------------------- |
| `name`          | string                  | yes      | —       | Operator-chosen unique plugin name.                                                                             |
| `host`          | string                  | yes      | —       | OpenSearch host (include scheme when using HTTPS).                                                              |
| `port`          | integer or null         | no       | `443`   | TCP port. The model defaults to `443` (the typical AWS OpenSearch Service port); set to `9200` for self-hosted. |
| `http_compress` | boolean or null         | no       | `true`  | Whether to gzip request bodies.                                                                                 |
| `use_ssl`       | boolean or null         | no       | `true`  | Toggle TLS on the connection.                                                                                   |
| `username`      | string or null          | yes      | —       | Login. The model is `Optional[str]` with no explicit default — provide a value (or `null`) at config time.      |
| `password`      | string (Secret) or null | yes      | —       | Password. Same Pydantic shape as `username` — provide a value or `null`.                                        |
| `verify_certs`  | boolean or null         | no       | `null`  | Whether to verify TLS certificates. `null` defers to the OpenSearch client default.                             |
| `ca_certs`      | string or null          | no       | `null`  | Optional path to a CA bundle file inside the container.                                                         |

Source: [`OpensearchPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py).

```yaml
plugins:
  - type: opensearch
    name: opensearch_logs
    host: https://search-mydomain.us-east-1.es.amazonaws.com
    port: 443
    use_ssl: true
    username: opensearch_admin
    password: !ENV ${OPENSEARCH_PASSWORD}
    verify_certs: true
```
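For a self-hosted cluster, the field table suggests overriding the `443` default. The sketch below assumes a hypothetical development cluster at `opensearch.internal` that serves plain HTTP without authentication, so TLS is turned off and `username` and `password` are passed as explicit `null` values.

```yaml
plugins:
  - type: opensearch
    name: opensearch_dev
    host: opensearch.internal   # hypothetical host
    port: 9200                  # self-hosted port instead of the 443 default
    use_ssl: false              # assumption: this cluster serves plain HTTP
    username: null              # marked required in the table, so pass null when unused
    password: null
```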

### Analytics engines and warehouses

#### Databricks Unity Catalog (`type: databricks`)

Ingests catalogs, schemas, tables, and columns from Databricks Unity Catalog via the Databricks workspace REST API.

| Field       | Type                   | Required | Default | Description                                                                                                                      |
| ----------- | ---------------------- | -------- | ------- | -------------------------------------------------------------------------------------------------------------------------------- |
| `name`      | string                 | yes      | —       | Operator-chosen unique plugin name.                                                                                              |
| `workspace` | string                 | yes      | —       | Databricks workspace URL (e.g., `https://adb-1234567890.0.azuredatabricks.net`).                                                 |
| `token`     | string (Secret)        | yes      | —       | Databricks personal access token (PAT) or service-principal token authorized for Unity Catalog.                                  |
| `catalogs`  | list of string or null | no       | `null`  | Allowlist of Unity Catalog catalogs. When omitted, every catalog the token can see is ingested. Literal name list — not a regex. |

Source: [`DatabricksPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/databricks_unity_catalog.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/databricks_unity_catalog.yaml). The upstream filename is `databricks_unity_catalog.yaml` while the type literal is the shorter `databricks` — write `type: databricks` in `collector_config.yaml`.

```yaml
plugins:
  - type: databricks
    name: databricks_main
    workspace: https://adb-1234567890.0.azuredatabricks.net
    token: !ENV ${DATABRICKS_TOKEN}
    catalogs: ["main", "analytics"]
```
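Omitting `catalogs` ingests every catalog the token can see, per the field table. A minimal sketch with the same placeholder workspace (the plugin name is hypothetical):

```yaml
plugins:
  - type: databricks
    name: databricks_all_catalogs
    workspace: https://adb-1234567890.0.azuredatabricks.net
    token: !ENV ${DATABRICKS_TOKEN}
    # catalogs omitted: defaults to null, so no allowlist is applied
```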

#### DuckDB (`type: duckdb`)

Reads one or more DuckDB database files from local paths on the collector container. A single plugin can scan multiple files or whole directories of `.db` files.

| Field   | Type                        | Required | Default       | Description                                                                                                |
| ------- | --------------------------- | -------- | ------------- | ---------------------------------------------------------------------------------------------------------- |
| `name`  | string                      | yes      | —             | Operator-chosen unique plugin name.                                                                        |
| `paths` | list of string (file paths) | yes      | —             | List of paths to `.db` files **or** directories containing `.db` files. Each path is opened independently. |
| `host`  | string or null              | no       | `"localhost"` | Logical hostname used when generating ODDRNs for the catalog entries.                                      |

Source: [`DuckDBPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/duckdb.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/duckdb.yaml).

```yaml
plugins:
  - type: duckdb
    name: duckdb_local
    paths:
      - /data/analytics/warehouse.db
      - /data/analytics/extras/
    host: analytics-runner
```

#### Presto (`type: presto`)

Catalogs schemas, tables, and columns from a Presto coordinator.

| Field          | Type           | Required | Default | Description                                                                                                                                                                                 |
| -------------- | -------------- | -------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`         | string         | yes      | —       | Operator-chosen unique plugin name.                                                                                                                                                         |
| `host`         | string         | yes      | —       | Presto coordinator host.                                                                                                                                                                    |
| `port`         | integer        | yes      | —       | Coordinator HTTP port (typical: `8080` or `8081`).                                                                                                                                          |
| `user`         | string         | yes      | —       | User identity (Presto authenticates by user header by default).                                                                                                                             |
| `principal_id` | string or null | yes      | —       | Optional principal identifier for LDAP-configured clusters. The model is `Optional[str]` with no default — pass `null` (or empty string, as the upstream example does) when not using LDAP. |
| `password`     | string or null | yes      | —       | LDAP password. Same Pydantic shape as `principal_id` — pass `null` / empty string on non-LDAP clusters.                                                                                     |

Source: [`PrestoPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/presto.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/presto.yaml).

```yaml
plugins:
  - type: presto
    name: presto_main
    host: presto.internal
    port: 8081
    user: odd_reader
    principal_id: null
    password: null
```
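For an LDAP-configured cluster, the same block carries real values in the two nullable fields instead of `null`. This is a minimal sketch based only on the field table above; the plugin name, principal value, and the `PRESTO_PASSWORD` variable name are assumptions chosen for illustration.

```yaml
plugins:
  - type: presto
    name: presto_ldap                     # hypothetical plugin name
    host: presto.internal
    port: 8081
    user: odd_reader
    principal_id: odd_reader              # hypothetical LDAP principal
    password: !ENV ${PRESTO_PASSWORD}     # hypothetical variable name
```

Reading the password through `!ENV` keeps the secret out of the config file, in line with the other examples in this reference.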

#### Trino (`type: trino`)

Catalogs schemas, tables, and columns from a Trino coordinator. Trino is a fork of Presto (formerly PrestoSQL), so the configuration mirrors the Presto adapter, minus `principal_id`.

| Field      | Type           | Required | Default | Description                                                     |
| ---------- | -------------- | -------- | ------- | --------------------------------------------------------------- |
| `name`     | string         | yes      | —       | Operator-chosen unique plugin name.                             |
| `host`     | string         | yes      | —       | Trino coordinator host.                                         |
| `port`     | integer        | yes      | —       | Coordinator HTTP port (typical: `8080` / `8081`).               |
| `user`     | string         | yes      | —       | User identity.                                                  |
| `password` | string or null | yes      | —       | LDAP password. Pass `null` / empty string on non-LDAP clusters. |

Source: [`TrinoPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/trino.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/trino.yaml).

```yaml
plugins:
  - type: trino
    name: trino_main
    host: trino.internal
    port: 8081
    user: odd_reader
    password: null
```
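On an LDAP-backed cluster the only change is a non-null `password`. The sketch below assumes a hypothetical plugin name and a hypothetical `TRINO_PASSWORD` environment variable; every field comes from the table above.

```yaml
plugins:
  - type: trino
    name: trino_ldap                     # hypothetical plugin name
    host: trino.internal
    port: 8081
    user: odd_reader
    password: !ENV ${TRINO_PASSWORD}     # hypothetical variable name
```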

#### Apache Druid (`type: druid`)

Catalogs Druid datasources via the broker API.

| Field  | Type    | Required | Default | Description                         |
| ------ | ------- | -------- | ------- | ----------------------------------- |
| `name` | string  | yes      | —       | Operator-chosen unique plugin name. |
| `host` | string  | yes      | —       | Druid broker host.                  |
| `port` | integer | yes      | —       | Broker HTTP port (typical: `8082`). |

Source: [`DruidPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/druid.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/druid.yaml).

```yaml
plugins:
  - type: druid
    name: druid_main
    host: druid-broker.internal
    port: 8082
```

#### Apache Hive (`type: hive`)

Catalogs Hive databases, tables, and columns via HiveServer2. Configuration is grouped under a nested `connection_params` object: HiveServer2 supports enough transport and auth combinations that the adapter exposes the full set of HS2 connection options.

| Field                                     | Type            | Required | Default | Description                                                                                                                                 |
| ----------------------------------------- | --------------- | -------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`                                    | string          | yes      | —       | Operator-chosen unique plugin name.                                                                                                         |
| `count_statistics`                        | boolean         | no       | `false` | Whether to collect row-count statistics (`SELECT COUNT(*)`) per table. Off by default — these queries can be expensive on large warehouses. |
| `connection_params`                       | object          | yes      | —       | Nested HS2 connection block — see fields below.                                                                                             |
| `connection_params.host`                  | string          | yes      | —       | HiveServer2 host.                                                                                                                           |
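| `connection_params.port`                  | integer or null | no       | `null`  | HS2 port. When left `null`, the client falls back to `10000` for binary transport (`scheme` unset) and to `1000` for `scheme: http` / `https` (per the upstream HS2 client convention). HiveServer2's HTTP listener usually runs on a different port, so set `port` explicitly when using HTTP transport. |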
| `connection_params.database`              | string          | yes      | —       | Hive database to scan.                                                                                                                      |
| `connection_params.scheme`                | string or null  | no       | `null`  | HS2 transport — `"http"` or `"https"` for HTTP transport; `null` for binary transport.                                                      |
| `connection_params.auth`                  | string or null  | no       | `null`  | Auth mode — one of `"BASIC"`, `"NOSASL"`, `"KERBEROS"`, `"NONE"`. Defaults to `NONE` when omitted.                                          |
| `connection_params.username`              | string or null  | no       | `null`  | Username. Used with `auth: LDAP` or `auth: CUSTOM`.                                                                                         |
| `connection_params.password`              | string or null  | no       | `null`  | Password. Used with `auth: LDAP` or `auth: CUSTOM`.                                                                                         |
| `connection_params.kerberos_service_name` | string or null  | no       | `null`  | Used with `auth: KERBEROS` only.                                                                                                            |
| `connection_params.configuration`         | object or null  | no       | `null`  | Free-form dict of Hive session configuration overrides.                                                                                     |
| `connection_params.check_hostname`        | string or null  | no       | `null`  | TLS hostname check toggle as a string `"true"` / `"false"`.                                                                                 |
| `connection_params.ssl_cert`              | string or null  | no       | `null`  | Path to a CA / client certificate file inside the container.                                                                                |

Source: [`HivePlugin` and `HiveConnectionParams` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/hive.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/hive.yaml).

```yaml
plugins:
  - type: hive
    name: hive_main
    count_statistics: false
    connection_params:
      host: hive.internal
      port: 10000
      database: default
      auth: NONE
```
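The optional `connection_params` fields are easier to read in context. The sketch below shows two hypothetical plugin entries built only from fields in the table above: one using Kerberos over binary transport with row-count statistics enabled, and one using HTTPS transport with an explicit port and certificate. Host names, the service name, the session override, the port `10001`, and the certificate path are all assumptions for illustration, not values taken from the upstream example.

```yaml
plugins:
  # Binary transport with Kerberos
  - type: hive
    name: hive_kerberos                   # hypothetical plugin name
    count_statistics: true                # enables per-table row counts; can be expensive
    connection_params:
      host: hive.internal
      port: 10000
      database: default
      auth: KERBEROS
      kerberos_service_name: hive         # assumed service name
      configuration:
        mapreduce.job.queuename: metadata # illustrative session override
  # HTTPS transport with an explicit port and certificate
  - type: hive
    name: hive_https                      # hypothetical plugin name
    connection_params:
      host: hive-gateway.internal
      port: 10001                         # assumed; set explicitly for HTTP transport
      database: default
      scheme: https
      check_hostname: "true"              # string, not boolean, per the table
      ssl_cert: /etc/ssl/certs/hive-ca.pem   # hypothetical path inside the container
```

Each entry scans a single database, so scanning several databases on one HiveServer2 would take one plugin entry per database, each with its own unique `name`.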

### Business intelligence and dashboards

#### Tableau (`type: tableau`)

Catalogs Tableau site content (workspaces, projects, dashboards, sheets) via the Tableau REST API.

| Field             | Type                    | Required | Default | Description                                                                                                                   |
| ----------------- | ----------------------- | -------- | ------- | ----------------------------------------------------------------------------------------------------------------------------- |
| `name`            | string                  | yes      | —       | Operator-chosen unique plugin name.                                                                                           |
| `server`          | string                  | yes      | —       | Tableau Server / Tableau Cloud URL.                                                                                           |
| `site`            | string or null          | yes      | —       | Tableau site name (empty string for the default site). The model is `Optional[str]` with no default — pass an explicit value. |
| `user`            | string or null          | yes      | —       | Username. Pass `null` if authenticating via `token_name` / `token_value`.                                                     |
| `password`        | string (Secret) or null | yes      | —       | Password. Pass `null` if authenticating via PAT.                                                                              |
| `token_name`      | string or null          | yes      | —       | Personal access token name (for 2FA / SSO accounts that can't use password auth).                                             |
| `token_value`     | string (Secret) or null | yes      | —       | Personal access token value.                                                                                                  |
| `pagination_size` | integer                 | no       | `10`    | Page size for the REST API. Larger values reduce request count but increase per-request latency; tune for very large sites.   |

Source: [`TableauPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/tableau.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/tableau.yaml).

```yaml
plugins:
  - type: tableau
    name: tableau_main
    server: https://tableau.internal
    site: analytics
    user: null
    password: null
    token_name: odd-reader-pat
    token_value: !ENV ${TABLEAU_TOKEN}
    pagination_size: 50
```
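The example above authenticates with a personal access token. For username and password auth the nullable fields swap roles, as in this sketch. The plugin name, user, and `TABLEAU_PASSWORD` variable are assumptions; the empty `site` value follows the table's note about the default site.

```yaml
plugins:
  - type: tableau
    name: tableau_default_site              # hypothetical plugin name
    server: https://tableau.internal
    site: ""                                # empty string selects the default site
    user: odd_reader                        # hypothetical account
    password: !ENV ${TABLEAU_PASSWORD}      # hypothetical variable name
    token_name: null                        # must still be passed explicitly
    token_value: null
```

`pagination_size` is omitted here, so the default of `10` applies.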

#### Apache Superset (`type: superset`)

Catalogs Superset datasets, dashboards, and charts via the Superset REST API.

| Field      | Type            | Required | Default | Description                                 |
| ---------- | --------------- | -------- | ------- | ------------------------------------------- |
| `name`     | string          | yes      | —       | Operator-chosen unique plugin name.         |
| `server`   | string          | yes      | —       | Superset base URL (include trailing slash). |
| `username` | string          | yes      | —       | Superset login.                             |
| `password` | string (Secret) | yes      | —       | Password.                                   |

Source: [`SupersetPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/superset.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/superset.yaml).

```yaml
plugins:
  - type: superset
    name: superset_main
    server: https://superset.internal/
    username: admin
    password: !ENV ${SUPERSET_PASSWORD}
```

#### Metabase (`type: metabase`)

Catalogs Metabase dashboards, questions, and the underlying datasets they reference.

| Field      | Type            | Required | Default | Description                         |
| ---------- | --------------- | -------- | ------- | ----------------------------------- |
| `name`     | string          | yes      | —       | Operator-chosen unique plugin name. |
| `host`     | string          | yes      | —       | Metabase host.                      |
| `port`     | string          | yes      | —       | TCP port.                           |
| `login`    | string          | yes      | —       | Metabase login email.               |
| `password` | string (Secret) | yes      | —       | Password.                           |

Source: [`MetabasePlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/metabase.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/metabase.yaml).

```yaml
plugins:
  - type: metabase
    name: metabase_main
    host: metabase.internal
    port: "3000"
    login: odd-reader@example.com
    password: !ENV ${METABASE_PASSWORD}
```

#### Redash (`type: redash`)

Catalogs Redash queries and dashboards via the Redash API.

| Field     | Type   | Required | Default | Description                                                                                 |
| --------- | ------ | -------- | ------- | ------------------------------------------------------------------------------------------- |
| `name`    | string | yes      | —       | Operator-chosen unique plugin name.                                                         |
| `server`  | string | yes      | —       | Redash server base URL.                                                                     |
| `api_key` | string | yes      | —       | Redash API key (account-scoped — gives the adapter access to whatever the account can see). |

Source: [`RedashPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/redash.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/redash.yaml).

```yaml
plugins:
  - type: redash
    name: redash_main
    server: https://redash.internal
    api_key: !ENV ${REDASH_API_KEY}
```

#### Mode Analytics (`type: mode`)

Catalogs Mode reports and the underlying datasets they reference.

{% hint style="info" %}
**No `config_examples/mode.yaml` file exists upstream.** The fields below are read directly from `ModePlugin` in `plugin.py`; the YAML below is hand-crafted from that model.
{% endhint %}

| Field         | Type                    | Required | Default | Description                                                                                                           |
| ------------- | ----------------------- | -------- | ------- | --------------------------------------------------------------------------------------------------------------------- |
| `name`        | string                  | yes      | —       | Operator-chosen unique plugin name.                                                                                   |
| `host`        | string                  | yes      | —       | Mode workspace host (e.g., `https://app.mode.com`).                                                                   |
| `account`     | string                  | yes      | —       | Mode account / workspace identifier.                                                                                  |
| `data_source` | string                  | yes      | —       | Mode data-source identifier the adapter should report against.                                                        |
| `token`       | string (Secret) or null | yes      | —       | API token. The model is `Optional[SecretStr]` with no default — pass a value or `null` if relying on `password` auth. |
| `password`    | string (Secret) or null | yes      | —       | Password (legacy auth). Pass `null` if using token auth.                                                              |

Source: [`ModePlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py).

```yaml
plugins:
  - type: mode
    name: mode_main
    host: https://app.mode.com
    account: my_workspace
    data_source: my_warehouse
    token: !ENV ${MODE_TOKEN}
    password: null
```

#### Cube.js (`type: cubejs`)

Catalogs Cube.js cubes and members; uses the cube's underlying SQL data source to resolve lineage from cube measures back to the source columns.

| Field                            | Type                    | Required    | Default | Description                                                                                                                                                                                                                                                                                                                                                                                             |
| -------------------------------- | ----------------------- | ----------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`                           | string                  | yes         | —       | Operator-chosen unique plugin name.                                                                                                                                                                                                                                                                                                                                                                     |
| `host`                           | string                  | yes         | —       | Cube.js server base URL.                                                                                                                                                                                                                                                                                                                                                                                |
| `dev_mode`                       | boolean                 | no          | `false` | When `true`, the adapter relaxes auth — `token` may be `null`. In production (`dev_mode: false`), `token` is required and the adapter raises `ValueError` on startup if it isn't set.                                                                                                                                                                                                                   |
| `token`                          | string (Secret) or null | conditional | `null`  | Cube.js auth token — required unless `dev_mode: true`.                                                                                                                                                                                                                                                                                                                                                  |
| `predefined_datasource`          | object                  | yes         | —       | Sub-object describing the SQL data source backing the cubes — used by the adapter's SQL parser to generate lineage-edge ODDRNs. Only `postgres` and `clickhouse` are recognised types (see [`PostgresDatasource` / `ClickHouseDatasource` in `predefined_data_source.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/predefined_data_source.py)). |
| `predefined_datasource.type`     | string                  | yes         | —       | `"postgres"` or `"clickhouse"`.                                                                                                                                                                                                                                                                                                                                                                         |
| `predefined_datasource.host`     | string or null          | no          | `null`  | Source host — used as the lineage ODDRN host.                                                                                                                                                                                                                                                                                                                                                           |
| `predefined_datasource.database` | string or null          | no          | `null`  | Source database.                                                                                                                                                                                                                                                                                                                                                                                        |

Source: [`CubeJSPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/cubejs.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/cubejs.yaml).

```yaml
plugins:
  - type: cubejs
    name: cubejs_main
    host: http://cube.internal:4000
    dev_mode: false
    token: !ENV ${CUBEJS_TOKEN}
    predefined_datasource:
      type: postgres
      host: pg.internal
      database: warehouse
```
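For a local or development Cube.js instance, `dev_mode: true` lets the plugin start without a token. The following sketch also switches the backing source to the other recognised type, `clickhouse`. Host names and the database name are assumptions for illustration.

```yaml
plugins:
  - type: cubejs
    name: cubejs_dev                   # hypothetical plugin name
    host: http://localhost:4000        # assumed local Cube.js address
    dev_mode: true                     # token may be omitted (defaults to null)
    predefined_datasource:
      type: clickhouse
      host: clickhouse.internal        # used as the lineage ODDRN host
      database: analytics
```

Per the table above, switching `dev_mode` back to `false` without adding a `token` makes the adapter raise `ValueError` on startup.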

### Catalog, ingestion, and federation

#### CKAN (`type: ckan`)

Catalogs CKAN packages and resources from the CKAN action API.

| Field           | Type                    | Required | Default | Description                                                                                                                                       |
| --------------- | ----------------------- | -------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`          | string                  | yes      | —       | Operator-chosen unique plugin name.                                                                                                               |
| `host`          | string                  | yes      | —       | CKAN host.                                                                                                                                        |
| `port`          | string                  | yes      | —       | TCP port.                                                                                                                                         |
| `ckan_endpoint` | string                  | no       | empty   | Optional path prefix between the host and the CKAN action API (e.g., `"/additional/endpoint"`). When the API is mounted at the root, leave empty. |
| `token`         | string (Secret) or null | no       | `null`  | CKAN auth token. Some action endpoints require authorization.                                                                                     |

Source: [`CKANPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/ckan.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/ckan.yaml).

```yaml
plugins:
  - type: ckan
    name: ckan_main
    host: ckan.internal
    port: "80"
    ckan_endpoint: ""
    token: !ENV ${CKAN_TOKEN}
```
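When the CKAN action API is mounted under a path prefix and does not require authorization, the configuration might look like the sketch below. The prefix reuses the illustrative value from the table; the plugin name is an assumption.

```yaml
plugins:
  - type: ckan
    name: ckan_prefixed                      # hypothetical plugin name
    host: ckan.internal
    port: "80"                               # string, per the table
    ckan_endpoint: "/additional/endpoint"    # path prefix in front of the action API
    # token omitted: defaults to null
```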

#### Airbyte (`type: airbyte`)

Catalogs Airbyte connectors, sources, destinations, and the lineage edges between them.

| Field               | Type           | Required | Default | Description                                                                                                                                                                                                                                                                            |
| ------------------- | -------------- | -------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`              | string         | yes      | —       | Operator-chosen unique plugin name.                                                                                                                                                                                                                                                    |
| `host`              | string         | yes      | —       | Airbyte API host.                                                                                                                                                                                                                                                                      |
| `port`              | string         | yes      | —       | Airbyte API port (typical: `"8000"`).                                                                                                                                                                                                                                                  |
| `user`              | string or null | yes      | —       | Airbyte username. The model is `Optional[str]` with no default — pass a value or `null` for unauthenticated deployments.                                                                                                                                                               |
| `password`          | string or null | yes      | —       | Airbyte password. Same Pydantic shape as `user`.                                                                                                                                                                                                                                       |
| `platform_host_url` | string         | yes      | —       | The ODD Platform URL the adapter advertises in generated ODDRNs for downstream destinations. **This is a per-plugin field on `AirbytePlugin` that overlaps with the collector-level `platform_host_url`** at the top of `collector_config.yaml` — both must be set when using Airbyte. |
| `store_raw_tables`  | boolean        | no       | `true`  | Whether to ingest Airbyte's `_airbyte_raw_*` staging tables. Set to `false` to keep them out of the catalog.                                                                                                                                                                           |

Source: [`AirbytePlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/airbyte.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/airbyte.yaml).

```yaml
plugins:
  - type: airbyte
    name: airbyte_main
    host: airbyte.internal
    port: "8000"
    platform_host_url: http://odd-platform.internal:8080
    user: airbyte
    password: !ENV ${AIRBYTE_PASSWORD}
    store_raw_tables: false
```
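Because `platform_host_url` appears at two levels, it can help to see both in one file. This sketch shows an unauthenticated Airbyte deployment with the collector-level and per-plugin values side by side; other collector-level settings are deliberately left out, and the plugin name and URLs are assumptions.

```yaml
platform_host_url: http://odd-platform.internal:8080     # collector-level setting
# ...other collector-level settings omitted...
plugins:
  - type: airbyte
    name: airbyte_open                                    # hypothetical plugin name
    host: airbyte.internal
    port: "8000"
    platform_host_url: http://odd-platform.internal:8080  # per-plugin field on AirbytePlugin
    user: null                                            # required field, so pass null explicitly
    password: null
    # store_raw_tables omitted: defaults to true
```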

#### Fivetran (`type: fivetran`)

Catalogs a single Fivetran connector and its destination via the Fivetran REST API.

| Field            | Type            | Required | Default                      | Description                                                                                                     |
| ---------------- | --------------- | -------- | ---------------------------- | --------------------------------------------------------------------------------------------------------------- |
| `name`           | string          | yes      | —                            | Operator-chosen unique plugin name.                                                                             |
| `base_url`       | string          | no       | `"https://api.fivetran.com"` | Fivetran API base URL. Override only for Fivetran's regional API endpoints.                                     |
| `api_key`        | string          | yes      | —                            | Fivetran API key.                                                                                               |
| `api_secret`     | string (Secret) | yes      | —                            | Fivetran API secret.                                                                                            |
| `connector_id`   | string          | yes      | —                            | Fivetran connector identifier — one plugin = one connector. Add a second plugin entry per additional connector. |
| `destination_id` | string          | yes      | —                            | Fivetran destination identifier corresponding to the connector.                                                 |

Source: [`FivetranPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/fivetran.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/fivetran.yaml).

```yaml
plugins:
  - type: fivetran
    name: fivetran_orders
    api_key: !ENV ${FIVETRAN_API_KEY}
    api_secret: !ENV ${FIVETRAN_API_SECRET}
    connector_id: orders_connector
    destination_id: warehouse_destination
```
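Since each plugin entry covers exactly one connector, cataloging several connectors means repeating the entry with a different `name` and `connector_id`. A sketch with two hypothetical connectors that share one destination and one set of credentials:

```yaml
plugins:
  - type: fivetran
    name: fivetran_orders                    # plugin names must be unique
    api_key: !ENV ${FIVETRAN_API_KEY}
    api_secret: !ENV ${FIVETRAN_API_SECRET}
    connector_id: orders_connector           # hypothetical connector ID
    destination_id: warehouse_destination    # hypothetical destination ID
  - type: fivetran
    name: fivetran_customers
    api_key: !ENV ${FIVETRAN_API_KEY}
    api_secret: !ENV ${FIVETRAN_API_SECRET}
    connector_id: customers_connector        # hypothetical connector ID
    destination_id: warehouse_destination
```

`base_url` is omitted in both entries, so the default `https://api.fivetran.com` is used.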

#### dbt Cloud catalog import (`type: dbt`)

Pulls dbt model lineage and metadata via a pre-uploaded `catalog.json` on a host the adapter can reach. This is the **pull** dbt adapter — distinct from [`odd-dbt`](/integrations/integrations/odd-dbt.md), the push-strategy adapter that emits live test results from dbt runs.

| Field             | Type   | Required | Default | Description                                                                                  |
| ----------------- | ------ | -------- | ------- | -------------------------------------------------------------------------------------------- |
| `name`            | string | yes      | —       | Operator-chosen unique plugin name.                                                          |
| `host`            | string | yes      | —       | Logical host used for ODDRN generation — typically the dbt Cloud / dbt Core deployment host. |
| `odd_catalog_url` | string | yes      | —       | URL the adapter fetches the `catalog.json` from.                                             |

Source: [`DbtPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/dbt.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/dbt.yaml).

```yaml
plugins:
  - type: dbt
    name: dbt_catalog
    host: dbt.internal
    odd_catalog_url: https://dbt.internal/catalog.json
```

#### Federated ODD Platform (`type: odd_adapter`)

Pulls metadata from another ODD Platform instance, so that a child platform's catalog can be federated into a parent platform.

| Field               | Type   | Required | Default | Description                                                                                                    |
| ------------------- | ------ | -------- | ------- | -------------------------------------------------------------------------------------------------------------- |
| `name`              | string | yes      | —       | Operator-chosen unique plugin name.                                                                            |
| `host`              | string | yes      | —       | URL of the source ODD service that implements the `odd_adapter` Ingress API.                                   |
| `data_source_oddrn` | string | yes      | —       | The ODDRN to advertise as the federated data source root (e.g., `//my_adapter/host/source-platform.internal`). |

Source: [`OddAdapterPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/odd_adapter.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/odd_adapter.yaml).

```yaml
plugins:
  - type: odd_adapter
    name: federated_eu
    host: http://odd-platform-eu.internal:8000
    data_source_oddrn: //my_adapter/host/odd-platform-eu.internal:8000
```

### Machine learning platforms

#### MLflow (`type: mlflow`)

Catalogs MLflow experiments, runs, and registered models from the MLflow tracking and model-registry APIs.

| Field                | Type                   | Required | Default | Description                                                                                                 |
| -------------------- | ---------------------- | -------- | ------- | ----------------------------------------------------------------------------------------------------------- |
| `name`               | string                 | yes      | —       | Operator-chosen unique plugin name.                                                                         |
| `dev_mode`           | boolean                | no       | `false` | Adapter-side dev mode toggle.                                                                               |
| `tracking_uri`       | string                 | yes      | —       | MLflow tracking server URI.                                                                                 |
| `registry_uri`       | string                 | yes      | —       | MLflow model-registry URI (often the same as `tracking_uri`).                                               |
| `filter_experiments` | list of string or null | no       | `null`  | Allowlist of experiment names. When omitted, every experiment is ingested. Literal name list — not a regex. |

Source: [`MlflowPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/mlflow.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/mlflow.yaml).

```yaml
plugins:
  - type: mlflow
    name: mlflow_main
    dev_mode: false
    tracking_uri: https://mlflow.internal
    registry_uri: https://mlflow.internal
    filter_experiments: ["churn_v2", "fraud_detection"]
```

#### Feast feature store (`type: feast`)

Catalogs Feast feature views and entities by reading the Feast repo definition from a path on the collector container.

| Field       | Type   | Required | Default | Description                                                    |
| ----------- | ------ | -------- | ------- | -------------------------------------------------------------- |
| `name`      | string | yes      | —       | Operator-chosen unique plugin name.                            |
| `host`      | string | yes      | —       | Logical host used for ODDRN generation.                        |
| `repo_path` | string | yes      | —       | Path to a checked-out Feast feature-repo inside the container. |

Source: [`FeastPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/feast.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/feast.yaml).

```yaml
plugins:
  - type: feast
    name: feast_features
    host: feast.internal
    repo_path: /opt/feast/feature_repo
```
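
Since `repo_path` must resolve inside the collector container, the feature repo has to be mounted or baked into the image. A hedged Docker Compose sketch of the mount approach; the image reference, the in-container config location `/app/collector_config.yaml`, and the host-side paths are all assumptions to verify against [Build and run ODD Collectors](/developer-guides/build-and-run/build-and-run-odd-collectors.md):

```yaml
services:
  odd-collector:
    # Assumed image reference; pin a released tag in real deployments.
    image: ghcr.io/opendatadiscovery/odd-collector:latest
    volumes:
      # Assumed in-container config location.
      - ./collector_config.yaml:/app/collector_config.yaml
      # Host checkout of the Feast repo (hypothetical host path), mounted at
      # the same path the plugin's repo_path points to.
      - ./feature_repo:/opt/feast/feature_repo
```

The container-side path of the second mount is the value that goes into `repo_path`.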

#### Kubeflow Pipelines (`type: kubeflow`)

Catalogs Kubeflow pipelines, runs, and the lineage edges between them.

| Field             | Type           | Required | Default | Description                                                                                                                                                                                                                   |
| ----------------- | -------------- | -------- | ------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`            | string         | yes      | —       | Operator-chosen unique plugin name.                                                                                                                                                                                           |
| `host`            | string         | yes      | —       | Kubeflow Pipelines host (typically the KFP UI URL).                                                                                                                                                                           |
| `namespace`       | string         | yes      | —       | **Kubernetes namespace** Kubeflow runs in — not the same as ODD's `namespace` metadata field. The Kubeflow plugin redeclares `namespace` as required at the plugin level, which shadows BasePlugin's optional metadata field. |
| `session_cookie0` | string or null | yes      | —       | First half of the KFP session cookie (Istio AuthService split-cookie pattern). The model is `Optional[str]` with no default — provide a value or `null`.                                                                      |
| `session_cookie1` | string or null | yes      | —       | Second half of the KFP session cookie.                                                                                                                                                                                        |

Source: [`KubeflowPlugin` in `odd_collector/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/odd_collector/domain/plugin.py); reference YAML at [`config_examples/kubeflow.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector/config_examples/kubeflow.yaml).

```yaml
plugins:
  - type: kubeflow
    name: kubeflow_main
    host: https://kfp.internal
    namespace: kubeflow-user
    session_cookie0: !ENV ${KFP_COOKIE_0}
    session_cookie1: !ENV ${KFP_COOKIE_1}
```
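
Per the table above, the two cookie fields accept `null` but cannot simply be left out. A sketch of the explicit-null form, assuming a KFP deployment that does not need the session cookie at all (whether yours does depends on its auth setup):

```yaml
plugins:
  - type: kubeflow
    name: kubeflow_main
    host: https://kfp.internal
    namespace: kubeflow-user    # Kubernetes namespace, not ODD's metadata namespace
    # Both keys stay present with an explicit null instead of being omitted.
    session_cookie0: null
    session_cookie1: null
```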

## Per-adapter feature matrix

Cross-cutting capabilities and where they apply across the 41-adapter set:

| Feature                                                        | Where it applies                                                                                                                                                                                                                                                        | What it does                                                                                                                                                                                                                                                                                                     |
| -------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Regex ingestion filters (`Filter`)**                         | `postgresql.schemas_filter`, `snowflake.schemas_filter`, `cockroachdb.schemas_filter` (inherited from `PostgreSQLPlugin`)                                                                                                                                               | Regex `include` / `exclude` lists scope which schemas / objects the adapter sees. Defaults to "include everything" when omitted. Source: [`Filter` in `odd_collector_sdk/domain/filter.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-sdk/odd_collector_sdk/domain/filter.py). |
| **Literal-name allowlist filters**                             | `redshift.schemas` (schemas), `databricks.catalogs` (Unity Catalog catalogs), `mlflow.filter_experiments` (experiment names)                                                                                                                                            | Plain list of names the adapter restricts ingestion to. **Not** a regex — literal exact match. When omitted, the adapter ingests everything visible to the credentials.                                                                                                                                          |
| **ERD relationships (foreign keys)**                           | `postgresql`, `snowflake`, `cockroachdb` (via PostgreSQL inheritance)                                                                                                                                                                                                   | The adapter emits `ENTITY_RELATIONSHIP` entities for tables connected by foreign keys, including cross-schema. The platform renders these as ERD edges on dataset detail pages. No other adapter currently extracts foreign-key relationships.                                                                   |
| **TLS toggles on the source connection**                       | `clickhouse.secure` + `clickhouse.verify`, `mysql.ssl_disabled`, `singlestore.ssl_disabled`, `elasticsearch.verify_certs` + `elasticsearch.ca_certs`, `opensearch.use_ssl` + `opensearch.verify_certs` + `opensearch.ca_certs`                                          | Per-adapter knobs for TLS toggling and certificate validation. Defaults are tuned per adapter (see each section above) — only override for self-signed certs on local clusters or for unencrypted local development.                                                                                             |
| **Token-based auth (PAT / API token alternative to password)** | `tableau.token_name` + `tableau.token_value`, `databricks.token`, `redash.api_key`, `fivetran.api_key` + `fivetran.api_secret`, `cubejs.token`, `mlflow` (via `tracking_uri` auth), `ckan.token`, `mode.token`, `kubeflow.session_cookie0` + `kubeflow.session_cookie1` | Replaces username/password auth; required when the source enforces SSO / 2FA / MFA on user accounts.                                                                                                                                                                                                             |
| **Schema inference via document sampling**                     | `couchbase.sample_size` + `couchbase.num_sample_values`                                                                                                                                                                                                                 | The adapter samples N documents per collection to derive a structural view of fields and value types. Defaults to no sampling (`sample_size: 0`) — set explicitly to enable.                                                                                                                                     |
| **Sub-object connection block (advanced auth surface)**        | `hive.connection_params` (full HS2 connection knob set), `cubejs.predefined_datasource` (postgres / clickhouse only — used to resolve cube-to-source lineage)                                                                                                           | Some adapters expose a nested object instead of flat fields when the auth or lineage surface needs more knobs than a flat schema supports.                                                                                                                                                                       |
| **Multiple file paths in one plugin**                          | `duckdb.paths`                                                                                                                                                                                                                                                          | DuckDB accepts a list of `.db` files or directories of `.db` files in one plugin — every file is opened independently. Other file-source adapters (`sqlite`) take a single path.                                                                                                                                 |
| **Special operating modes**                                    | `oracle.thick_mode` (Oracle Instant Client vs. pure-Python), `cubejs.dev_mode` (relax token requirement), `mlflow.dev_mode`                                                                                                                                             | Adapter-level toggles that alter runtime behaviour or auth strictness; safe defaults are off.                                                                                                                                                                                                                    |

Other adapters either do not expose regex filters (their plugin models carry no SDK `Filter` field) or do not emit relationships. For the filter mechanism's user-facing explanation (include / exclude semantics, when filters apply, default behaviour without filters), see [Ingestion filters](/integrations/integrations/ingestion-filters.md). The full cross-adapter capability matrix (which adapter exposes which filter, which emits which relationship type) lives in the [`odd-collectors` monorepo README](https://github.com/opendatadiscovery/odd-collectors#ingestion-filters-configuration); check that table when planning a new deployment.
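
The difference between the two filter styles in the matrix is easiest to see side by side. A sketch, assuming illustrative schema patterns and omitting the PostgreSQL connection fields (see the `postgresql` section for those); how each pattern is matched against a schema name is described on the Ingestion filters page, not here:

```yaml
plugins:
  # Regex filter: include / exclude entries are patterns, not names.
  - type: postgresql
    name: postgres_filtered          # hypothetical plugin name
    # ...connection fields omitted...
    schemas_filter:
      include: ["sales_.*"]          # illustrative pattern
      exclude: [".*_tmp"]            # illustrative pattern
  # Literal allowlist: entries are exact names, no pattern syntax.
  - type: mlflow
    name: mlflow_main
    tracking_uri: https://mlflow.internal
    registry_uri: https://mlflow.internal
    filter_experiments: ["churn_v2", "fraud_detection"]
```

Writing `"churn_.*"` in `filter_experiments` would look for an experiment literally named `churn_.*`, because that field does exact matching.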

## Known limitations

* **README drift on the source repo**: as flagged above, the upstream README's adapter table omits four adapters (`databricks`, `couchbase`, `opensearch`, `oracle`) that exist in `PLUGIN_FACTORY`. This is a docs gap on the collector repo, not a missing capability — those four adapters work; they're just under-advertised.
* **Foreign-key extraction is limited to PostgreSQL, Snowflake, and CockroachDB** (the last via PostgreSQL inheritance) today. ClickHouse, MySQL, MSSQL, and others extract schemas and columns but not foreign-key relationships.
* **No per-plugin `pulling_interval`**: every plugin in the file shares `default_pulling_interval`. Splitting workloads with different cadences requires running multiple collector containers, each with its own config.
* **M1 / Apple Silicon build issues**: `pyodbc`, `confluent-kafka`, and `grpcio` need extra environment variables to build natively. See the [generic collector README → M1 building issue](https://github.com/opendatadiscovery/odd-collectors/tree/main/odd-collector#m1-building-issue).
* **`odbc.driver` upstream typo**: `OdbcPlugin.driver` defaults to `"{ODBC Driver 17s for SQL Server}"`; the `s` after `17` is a typo. Always set `driver:` explicitly in the plugin config to the driver name registered on your container (e.g., `{ODBC Driver 17 for SQL Server}` or your platform's equivalent). Without an explicit value, the connection attempt fails because no ODBC driver matches the misspelled string. The reference YAML at `config_examples/odbc.yaml` uses the correct value, so copy from there rather than relying on the model default.
* **Missing upstream config examples for `mode` and `opensearch`**: both adapters are present in `PLUGIN_FACTORY` and shipped, but `odd-collectors/odd-collector/config_examples/` does not contain a `mode.yaml` or `opensearch.yaml`. The per-adapter sections above include hand-crafted YAML examples derived from the Pydantic models for both.
* **`config_examples/cocroachdb.yaml` filename typo**: the file containing the CockroachDB reference YAML is `cocroachdb.yaml` (missing the `k`). The type literal (`cockroachdb`) is correct — the file's contents work as-is; only the filename is misspelled.
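
The workaround for the shared `default_pulling_interval` is one config file per cadence, each run by its own collector container. A sketch with two YAML documents standing in for two separate files; the file names, interval values, and environment variable names are hypothetical, and the top-level `platform_host_url` and `token` keys are assumed from the shared collector configuration schema (check that page for exact names and interval units):

```yaml
# File 1 (hypothetical name: collector_frequent.yaml), used by container A
default_pulling_interval: 10         # illustrative value
platform_host_url: !ENV ${PLATFORM_HOST_URL}
token: !ENV ${COLLECTOR_TOKEN_FREQUENT}
plugins:
  - type: mlflow
    name: mlflow_main
    tracking_uri: https://mlflow.internal
    registry_uri: https://mlflow.internal
---
# File 2 (hypothetical name: collector_daily.yaml), used by container B
default_pulling_interval: 1440       # illustrative value
platform_host_url: !ENV ${PLATFORM_HOST_URL}
token: !ENV ${COLLECTOR_TOKEN_DAILY}
plugins:
  - type: dbt
    name: dbt_catalog
    host: dbt.internal
    odd_catalog_url: https://dbt.internal/catalog.json
```

Each file is a complete collector config on its own; the `---` separator is only there to show both in one block.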

## Where to next

* [`odd-collector-aws`](/integrations/integrations/odd-collector-aws.md) — when your source is an AWS managed service.
* [`odd-collector-azure`](/integrations/integrations/odd-collector-azure.md) / [`odd-collector-gcp`](/integrations/integrations/odd-collector-gcp.md) — for Azure / GCP.
* [`odd-collector-profiler`](/integrations/integrations/odd-collector-profiler.md) — when you want statistical profiles on a Postgres / Azure SQL source.
* [Collector secrets backend](/configuration-and-deployment/collectors-secrets-backend.md) — to source any field from AWS SSM instead of inline YAML.
* [Build and run ODD Collectors](/developer-guides/build-and-run/build-and-run-odd-collectors.md) — full SDK config reference and from-source build / run instructions.

