odd-collector (generic)

Generic ODD Collector — 41 pull adapters for databases, data warehouses, BI tools, streams, and MLOps platforms.

Status: Stable. Released as a tagged Docker image; the underlying SDK is the same one all odd-collector-* collectors share.

odd-collector is the general-purpose pull collector. It bundles 41 adapters covering relational databases, data warehouses, NoSQL stores, message brokers, BI tools, MLOps platforms, and a few catalog / orchestration sources. One container instance can host any combination of those adapters as plugins, including multiple plugins of the same type pointing at different sources.

For the broader pull-vs-push picture and the shared collector configuration schema, start at the Integrations hub. For deployment-side detail (build, Docker, env vars), see Build and run ODD Collectors.

Supported adapters

The 41 adapters listed here are registered in odd_collector/domain/plugin.py (PLUGIN_FACTORY). Every adapter has per-field documentation on this page: three (postgresql, snowflake, kafka) get longer deep-dive spotlights with deployment guidance and feature notes, and the remaining 38 are catalogued in the per-adapter configuration reference section.

| Type literal | Source system | Spotlighted below |
| --- | --- | --- |
| airbyte | Airbyte | |
| cassandra | Apache Cassandra | |
| ckan | CKAN | |
| clickhouse | ClickHouse | |
| cockroachdb | CockroachDB | |
| couchbase | Couchbase | |
| cubejs | Cube.js | |
| databricks | Databricks (Unity Catalog) | |
| dbt | dbt Cloud (catalog import) | |
| druid | Apache Druid | |
| duckdb | DuckDB | |
| elasticsearch | Elasticsearch | |
| feast | Feast feature store | |
| fivetran | Fivetran | |
| hive | Apache Hive | |
| kafka | Apache Kafka | yes |
| kubeflow | Kubeflow Pipelines | |
| metabase | Metabase | |
| mlflow | MLflow | |
| mode | Mode Analytics | |
| mongodb | MongoDB | |
| mssql | Microsoft SQL Server | |
| mysql | MySQL / MariaDB | |
| neo4j | Neo4j | |
| odbc | Generic ODBC source | |
| odd_adapter | Another ODD Platform (federated) | |
| opensearch | OpenSearch | |
| oracle | Oracle Database | |
| postgresql | PostgreSQL (incl. pgvector) | yes |
| presto | Presto | |
| redash | Redash | |
| redshift | Amazon Redshift | |
| scylladb | ScyllaDB | |
| singlestore | SingleStore | |
| snowflake | Snowflake | yes |
| sqlite | SQLite | |
| superset | Apache Superset | |
| tableau | Tableau | |
| tarantool | Tarantool | |
| trino | Trino | |
| vertica | Vertica | |

The canonical YAML for each adapter lives at odd-collectors/odd-collector/config_examples/ — one file per adapter, named after the type literal. The Pydantic models that define the accepted fields live at odd-collectors/odd-collector/odd_collector/domain/plugin.py; read those when an example field is unclear.

Installation

Mount a collector_config.yaml at /app/collector_config.yaml inside the container. A reference Compose snippet is in the generic collector README and a from-source build flow is in Build and run ODD Collectors.

Minimal config

The smallest collector_config.yaml that runs the collector:
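A minimal sketch of such a file, assuming a single PostgreSQL plugin. The platform URL, host, database, user, and environment-variable names are placeholders chosen for illustration, and the interval value is arbitrary; using !ENV for the token is an assumption carried over from the password fields documented on this page.

```yaml
# Shared top-level fields (placeholder values).
platform_host_url: http://odd-platform:8080    # assumed ODD Platform address
token: !ENV ${ODD_COLLECTOR_TOKEN}             # hypothetical env var holding the collector token
default_pulling_interval: 10                   # illustrative value; applies to every plugin
plugins:
  - type: postgresql
    name: postgres_main                        # must be unique within the file
    host: db.internal                          # hypothetical host
    port: 5432
    database: analytics
    user: odd_reader
    password: !ENV ${POSTGRES_PASSWORD}        # sourced from an environment variable
```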

The shared top-level fields (platform_host_url, token, default_pulling_interval, plugins, plus the optional connection_timeout_seconds / chunk_size / misfire_grace_time / max_instances / verify_ssl) are documented once at Build and run ODD Collectors → Full configuration reference. Only the plugins[*] shape varies per adapter — the rest of this page covers that.

Multiple plugins in one container

plugins is a list — add as many entries as you need, mixing types freely. Two plugins of the same type (e.g. several PostgreSQL databases on different hosts) is the common pattern:
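A sketch of that pattern, assuming two PostgreSQL databases and one MySQL database. All hosts, names, and environment variables are hypothetical; the PostgreSQL entries omit port and rely on the documented default of 5432.

```yaml
plugins:
  # Two plugins of the same type, each pointing at a different host.
  - type: postgresql
    name: postgres_orders
    host: orders-db.internal
    database: orders
    user: odd_reader
    password: !ENV ${ORDERS_DB_PASSWORD}
  - type: postgresql
    name: postgres_billing
    host: billing-db.internal
    database: billing
    user: odd_reader
    password: !ENV ${BILLING_DB_PASSWORD}
  # A plugin of a different type in the same container.
  - type: mysql
    name: mysql_legacy
    host: legacy-db.internal
    port: 3306
    database: legacy
    user: odd_reader
    password: !ENV ${LEGACY_DB_PASSWORD}
```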

Each plugin's name must be unique within the file — the collector uses it to log per-plugin progress and to wire each plugin to its own scheduled job. The default_pulling_interval applies to every plugin uniformly; per-plugin overrides are not supported.

Spotlight: PostgreSQL (type: postgresql)

Pulls schemas, tables, columns, foreign-key relationships, and (with pgvector installed in the source) vector indexes. PostgreSQL tables containing at least one vector-typed column are classified as the Vector Store dataset type — see Vector Store metadata for the user-facing classification, the dedicated icon in the catalog, and the Vector column data type rendering on the Structure tab.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | PostgreSQL server hostname. |
| port | integer | no | 5432 | TCP port. |
| database | string | yes | | Database to scan; one plugin = one database. |
| user | string | yes | | Login. The user needs read access to the system catalogs you want indexed. |
| password | string (Secret) | no | empty | Password. Use !ENV ${VAR} to source from an environment variable. |
| schemas_filter.include | list of regex | no | [".*"] | Schemas to include. |
| schemas_filter.exclude | list of regex | no | [] | Schemas to drop after include matches. |

Source: PostgreSQLPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/postgresql.yaml.

The PostgreSQL adapter extracts foreign-key relationships and emits them as ENTITY_RELATIONSHIP entities — these render as ERD edges on the dataset detail page in the catalog. Cross-schema foreign keys are supported.
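A sketch of a PostgreSQL plugin entry that uses the schema filter. The nested include / exclude layout is inferred from the dotted field names in the table, and the host, database, and regex patterns are hypothetical.

```yaml
plugins:
  - type: postgresql
    name: postgres_sales
    host: sales-db.internal              # hypothetical host
    port: 5432
    database: sales                      # one plugin = one database
    user: odd_reader
    password: !ENV ${SALES_DB_PASSWORD}
    schemas_filter:
      include: ["^public$", "^sales_.*"] # regex patterns for schemas to include
      exclude: [".*_tmp$"]               # dropped after include matches
```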

Spotlight: Snowflake (type: snowflake)

Pulls databases, schemas, tables, views, columns, and foreign-key relationships.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| account | string | yes | | Snowflake account identifier (e.g. ab12345.eu-central-1). The adapter derives the host as {ACCOUNT}.snowflakecomputing.com. |
| warehouse | string | yes | | Compute warehouse used for the catalog query. |
| database | string | yes | | Database to scan. |
| user | string | yes | | Snowflake login. |
| password | string (Secret) | yes | | Password. |
| schemas_filter.include | list of regex | no | [".*"] | Schemas to include. |
| schemas_filter.exclude | list of regex | no | [] | Schemas to drop after include matches. |

Source: SnowflakePlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/snowflake.yaml.

Like PostgreSQL, the Snowflake adapter extracts foreign-key constraints and emits ENTITY_RELATIONSHIP entities.
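A sketch of a Snowflake plugin entry built from the fields above. The warehouse, database, user, and environment-variable names are hypothetical, and the nested schemas_filter layout is inferred from the dotted field names.

```yaml
plugins:
  - type: snowflake
    name: snowflake_prod
    account: ab12345.eu-central-1        # host is derived as {ACCOUNT}.snowflakecomputing.com
    warehouse: CATALOG_WH                # hypothetical warehouse used for the catalog query
    database: ANALYTICS
    user: ODD_READER
    password: !ENV ${SNOWFLAKE_PASSWORD}
    schemas_filter:
      exclude: ["^INFORMATION_SCHEMA$"]  # illustrative pattern
```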

Spotlight: Kafka (type: kafka)

Pulls Kafka topics and (when a Confluent-compatible Schema Registry is reachable) the registered schemas.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | Bootstrap broker host. |
| port | integer | yes | | Bootstrap broker port. |
| broker_conf | dict | yes | | Passed to confluent_kafka.AdminClient (e.g. SASL credentials, SSL settings). |
| schema_registry_conf | dict | no | {} | Passed to the Schema Registry client (e.g. URL, basic auth). When empty, the adapter does not query the registry. |

Source: KafkaPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/kafka.yaml.
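A sketch of a Kafka plugin entry with SASL credentials and a Schema Registry. The keys inside broker_conf and schema_registry_conf are standard confluent-kafka client properties; whether bootstrap.servers must be repeated inside broker_conf is an assumption, and all hosts and environment variables are hypothetical.

```yaml
plugins:
  - type: kafka
    name: kafka_events
    host: broker-1.internal
    port: 9092
    broker_conf:                                   # handed to confluent_kafka.AdminClient
      bootstrap.servers: broker-1.internal:9092    # assumed to be needed here as well
      security.protocol: SASL_SSL
      sasl.mechanism: PLAIN
      sasl.username: odd_reader
      sasl.password: !ENV ${KAFKA_PASSWORD}
    schema_registry_conf:                          # omit to skip the registry
      url: https://schema-registry.internal:8081
      basic.auth.user.info: !ENV ${SCHEMA_REGISTRY_USER_INFO}   # "user:password"
```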

Per-adapter configuration reference

The three spotlights above cover the deployment-shape questions; this section enumerates the per-field config schema for the remaining 38 adapters. Field names, types, and defaults are sourced from the Pydantic plugin classes in odd_collector/domain/plugin.py; each adapter links to its config_examples/{type}.yaml reference YAML where one exists. Two adapters — mode and opensearch — have no upstream config example; their tables come from the Pydantic model alone and the per-section note flags the gap.

Common shapes used across the families below:

  • BasePlugin — every plugin carries name (required, operator-chosen, unique within the file). The optional metadata fields description and namespace are accepted by every plugin and omitted from the per-adapter tables to save space.

  • DatabasePlugin base — adds host: str (required), port: str (required, often overridden to int by subclasses), database: str | null (optional in the base; many subclasses redeclare it as required), user: str (required), password: str (required, redeclared by most subclasses as SecretStr with an empty default).

  • WithHost — adds only host: str. WithPort — adds only port: str. Both are mixed in by adapters that don't fit the full DatabasePlugin shape.

Each table below repeats every field the adapter accepts so that an entry is self-contained — operators don't need to chase the inheritance chain in plugin.py.

Relational databases

Microsoft SQL Server (type: mssql)

Pulls schemas, tables, views, and columns from a Microsoft SQL Server / Azure SQL Server source via the SQL catalog views.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | SQL Server host. |
| port | integer | yes | | TCP port (typical: 1433). |
| database | string | yes | | Database to scan; one plugin = one database. |
| user | string | yes | | SQL login. |
| password | string (Secret) | no | empty | Password. Use !ENV ${VAR} to source from an environment variable. |

Source: MSSQLPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/mssql.yaml.

MySQL / MariaDB (type: mysql)

Pulls schemas, tables, views, and columns. Compatible with MariaDB.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | MySQL server hostname. |
| port | integer | yes | | TCP port (typical: 3306). |
| database | string | yes | | Database to scan. |
| user | string | yes | | Login. |
| password | string (Secret) | no | empty | Password. |
| ssl_disabled | boolean | no | false | When true, disables TLS to the server; typically used only for local development against an unencrypted instance. |

Source: MySQLPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/mysql.yaml.

ClickHouse (type: clickhouse)

Pulls databases, tables, and columns from a ClickHouse cluster.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | ClickHouse server hostname. |
| port | integer or null | yes | | HTTP (8123) or native (9000) port. The Pydantic model accepts null, but every reference example provides an explicit value. |
| database | string or null | no | | Database to scan. When unset, the connection's default database is used. |
| user | string | yes | | Login. |
| password | string (Secret) | yes | | Password. |
| secure | boolean | no | false | Toggles TLS on the connection. Set to true for ClickHouse Cloud or any TLS-fronted deployment. |
| verify | boolean | no | true | Whether to verify the server certificate when secure: true. Set to false only for self-signed certs on local clusters. |
| server_hostname | string or null | no | null | Optional hostname for SNI / certificate validation; defaults to the value of host. |
| query_limit | integer or null | no | 0 | Optional row cap applied to internal catalog queries. 0 means no limit. |

Source: ClickhousePlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/clickhouse.yaml.

Amazon Redshift (type: redshift)

Pulls schemas, tables, views, and columns from an Amazon Redshift cluster.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | Redshift cluster endpoint ({cluster}.{region}.redshift.amazonaws.com). |
| port | string | yes | | TCP port as a string (typical: "5439"). |
| database | string or null | no | | Database to scan. |
| user | string | yes | | Login. |
| password | string (Secret) | yes | | Password. |
| schemas | list of string or null | no | null | Allowlist of schema names. When omitted, every non-system schema is ingested. Literal name list, not a regex filter, unlike the schemas_filter available on postgresql and snowflake. |
| connection_timeout | integer or null | no | 10 | Connection timeout in seconds. |

Source: RedshiftPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/redshift.yaml.
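A sketch of a Redshift plugin entry that shows the literal schema allowlist and the string-typed port. The endpoint, database, and schema names are placeholders.

```yaml
plugins:
  - type: redshift
    name: redshift_dw
    host: analytics.eu-west-1.redshift.amazonaws.com   # placeholder in the {cluster}.{region} form
    port: "5439"                                       # string, not integer
    database: dw
    user: odd_reader
    password: !ENV ${REDSHIFT_PASSWORD}
    schemas:                                           # literal names, no regex matching
      - public
      - reporting
    connection_timeout: 10                             # seconds
```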

CockroachDB (type: cockroachdb)

Pulls schemas, tables, columns, and foreign-key relationships. Inherits from the PostgreSQL plugin — same field shape plus the same schemas_filter regex behavior; ERD edges are emitted for cross-schema foreign keys exactly as on PostgreSQL.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | CockroachDB SQL endpoint. |
| port | integer | no | 5432 | TCP port. CockroachDB's typical SQL port is 26257; the model default is the PostgreSQL port inherited from PostgreSQLPlugin. |
| database | string | yes | | Database to scan. |
| user | string | yes | | Login. |
| password | string (Secret) | no | empty | Password. |
| schemas_filter.include | list of regex | no | [".*"] | Schemas to include. |
| schemas_filter.exclude | list of regex | no | [] | Schemas to drop after include matches. |

Source: CockroachDBPlugin in odd_collector/domain/plugin.py (extends PostgreSQLPlugin); reference YAML at config_examples/cocroachdb.yaml. The upstream filename has a cocroach typo — the type literal cockroachdb is correct and is what you write in collector_config.yaml.

Vertica (type: vertica)

Pulls schemas, tables, views, and columns from a Vertica analytic database.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | Vertica host. |
| port | string | yes | | TCP port (typical: "5433"). |
| database | string or null | no | | Database to scan. |
| user | string | yes | | Login. |
| password | string | yes | | Password. |

Source: VerticaPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/vertica.yaml.

SingleStore (type: singlestore)

Pulls schemas, tables, views, and columns from a SingleStore (formerly MemSQL) cluster. Wire-compatible with MySQL.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | SingleStore host. |
| port | string | yes | | TCP port. |
| database | string or null | no | | Database to scan. |
| user | string | yes | | Login. |
| password | string | yes | | Password. |
| ssl_disabled | boolean or null | no | false | Disables TLS to the server; typically only for local development. |

Source: SingleStorePlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/singlestore.yaml.

Oracle Database (type: oracle)

Pulls schemas (one per Oracle user), tables, views, and columns from an Oracle Database.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | Oracle server hostname. |
| port | string | yes | | TCP port (typical: "1521"). |
| user | string | yes | | Oracle login (becomes the schema name in Oracle's data model). |
| service | string | yes | | Oracle service name (e.g., XEPDB1). Use the service name, not the SID. |
| password | string (Secret) | yes | | Password. |
| thick_mode | boolean or null | no | false | When true, switches the underlying Oracle client to thick mode (requires the Oracle Instant Client to be installed in the container). Default thin mode is pure Python and works without Instant Client. |

Source: OraclePlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/oracle.yaml.

Generic ODBC source (type: odbc)

Pulls schemas, tables, and columns from any source reachable through an ODBC driver registered on the collector container. Useful for sources without a dedicated adapter.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | Source hostname. |
| port | string | yes | | TCP port. |
| database | string | yes | | Database to scan. |
| user | string | yes | | Login. |
| password | string (Secret) or null | no | | Password. |
| driver | string | no | "{ODBC Driver 17s for SQL Server}" | ODBC driver name as registered in odbcinst.ini on the container. The upstream default contains a typo (17s should be 17); always set this field explicitly to the driver string for your environment (e.g., {ODBC Driver 17 for SQL Server} or your platform's equivalent). See Known limitations. |

Source: OdbcPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/odbc.yaml.
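A sketch of an ODBC plugin entry that sets driver explicitly rather than relying on the upstream default. The host, database, and credentials are hypothetical, and the driver string must match an entry in odbcinst.ini on your container.

```yaml
plugins:
  - type: odbc
    name: odbc_reporting
    host: sql.internal
    port: "1433"                                 # string-typed port
    database: reporting
    user: odd_reader
    password: !ENV ${ODBC_PASSWORD}
    driver: "{ODBC Driver 17 for SQL Server}"    # explicit value avoids the "17s" default
```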

SQLite (type: sqlite)

Reads a SQLite database file from a local path on the collector container. In-memory SQLite databases are not supported (each connection sees its own private DB).

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| data_source | string (file path) | yes | | Absolute path to the .db file inside the container. The file must exist at startup; the model uses Pydantic's FilePath validator. |

Source: SQLitePlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/sqlite.yaml.

Wide-column, document, and key-value stores

MongoDB (type: mongodb)

Catalogs MongoDB databases, collections, and inferred field types.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | MongoDB host or seed-list host. |
| port | string | yes | | TCP port (typical: "27017"). |
| database | string or null | no | | Database to scan. |
| user | string | yes | | Login. |
| password | string | yes | | Password. |
| protocol | string | yes | | Connection scheme passed to the MongoDB driver: mongodb for direct host/port connections, mongodb+srv for SRV-resolved seed lists (typical for MongoDB Atlas). |

Source: MongoDBPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/mongodb.yaml.

Apache Cassandra (type: cassandra)

Catalogs keyspaces, tables, and columns from a Cassandra cluster.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | Cassandra contact host. |
| port | string | yes | | TCP port (typical: "9042"). |
| database | string or null | no | | Keyspace name; one plugin scans one keyspace when supplied. |
| user | string | yes | | Login. |
| password | string | yes | | Password. |
| contact_points | list of string | no | [] | Additional contact-host endpoints for the cluster's gossip layer. Empty list means the driver uses host only. |

Source: CassandraPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/cassandra.yaml.

ScyllaDB (type: scylladb)

Catalogs keyspaces, tables, and columns from a ScyllaDB cluster. Same field shape as Cassandra — Scylla is wire-compatible with the Cassandra driver.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | Scylla contact host. |
| port | string | yes | | TCP port (typical: "9042"). |
| database | string or null | no | | Keyspace name. |
| user | string | yes | | Login. |
| password | string | yes | | Password. |
| contact_points | list of string | no | [] | Additional contact-host endpoints. |

Source: ScyllaDBPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/scylladb.yaml.

Tarantool (type: tarantool)

Catalogs spaces and indexes from a Tarantool instance. Uses the standard DatabasePlugin shape with no Tarantool-specific fields.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | Tarantool host. |
| port | string | yes | | TCP port (typical: "3301"). |
| database | string or null | no | | Database / space-collection identifier. |
| user | string | yes | | Login. |
| password | string | yes | | Password. |

Source: TarantoolPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/tarantool.yaml.

Couchbase (type: couchbase)

Catalogs Couchbase buckets and infers document field types by sampling. Couchbase is schemaless, so when sample_size is greater than 0 the adapter samples that many documents per collection to derive a structural view.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | Couchbase connection string (e.g., couchbase://node1.internal,node2.internal). |
| bucket | string | yes | | Bucket name; one plugin scans one bucket. |
| user | string | yes | | Login. |
| password | string (Secret) | yes | | Password. |
| sample_size | integer or null | no | 0 | Number of documents to sample per collection for schema inference. 0 disables sampling and uses the metadata-only view. |
| num_sample_values | integer or null | no | 10 | When sampling is on, number of value examples to retain per inferred field. |

Source: CouchbasePlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/couchbase.yaml.

Neo4j (type: neo4j)

Catalogs Neo4j databases, node labels, and relationship types. Uses the standard DatabasePlugin shape; the typical Bolt port is 7687.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | Neo4j host. |
| port | string | yes | | Bolt port (typical: "7687"). |
| database | string or null | no | | Database name (Neo4j 4.x+ multi-database). |
| user | string | yes | | Login. |
| password | string | yes | | Password. |

Source: Neo4jPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/neo4j.yaml.

Search engines

Elasticsearch (type: elasticsearch)

Catalogs Elasticsearch indices and field mappings.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | Elasticsearch host (typically including scheme, e.g., https://es.internal). |
| port | integer | yes | | TCP port (typical: 9200). |
| username | string | yes | | Login. |
| password | string (Secret) | yes | | Password. |
| verify_certs | boolean or null | no | null | Whether to verify TLS certificates on the Elasticsearch endpoint. null defers to the Elasticsearch client default (verify when scheme is https). |
| ca_certs | string or null | no | null | Optional path to a CA bundle file inside the container. |

Source: ElasticsearchPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/elasticsearch.yaml.

OpenSearch (type: opensearch)

Catalogs OpenSearch indices and field mappings.

No config_examples/opensearch.yaml file exists upstream. The fields below are read directly from OpensearchPlugin in plugin.py; the YAML below is hand-crafted from that model.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | OpenSearch host (include scheme when using HTTPS). |
| port | integer or null | no | 443 | TCP port. The model defaults to 443 (the typical AWS OpenSearch Service port); set to 9200 for self-hosted. |
| http_compress | boolean or null | no | true | Whether to gzip request bodies. |
| use_ssl | boolean or null | no | true | Toggle TLS on the connection. |
| username | string or null | yes | | Login. The model is Optional[str] with no explicit default; provide a value (or null) at config time. |
| password | string (Secret) or null | yes | | Password. Same Pydantic shape as username; provide a value or null. |
| verify_certs | boolean or null | no | null | Whether to verify TLS certificates. null defers to the OpenSearch client default. |
| ca_certs | string or null | no | null | Optional path to a CA bundle file inside the container. |

Source: OpensearchPlugin in odd_collector/domain/plugin.py.
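A sketch of an OpenSearch plugin entry for a self-hosted cluster, written from the field table above rather than from an upstream example. The host, CA bundle path, and environment variable are hypothetical.

```yaml
plugins:
  - type: opensearch
    name: opensearch_logs
    host: https://search.internal              # scheme included because TLS is on
    port: 9200                                 # overrides the model default of 443
    use_ssl: true
    http_compress: true
    username: odd_reader                       # must be present; use null if unauthenticated
    password: !ENV ${OPENSEARCH_PASSWORD}
    verify_certs: true
    ca_certs: /etc/ssl/certs/internal-ca.pem   # hypothetical path inside the container
```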

Analytics engines and warehouses

Databricks Unity Catalog (type: databricks)

Catalogs Databricks Unity Catalog catalogs, schemas, tables, and columns via the Databricks workspace REST API.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| workspace | string | yes | | Databricks workspace URL (e.g., https://adb-1234567890.0.azuredatabricks.net). |
| token | string (Secret) | yes | | Databricks personal access token (PAT) or service-principal token authorized for Unity Catalog. |
| catalogs | list of string or null | no | null | Allowlist of Unity Catalog catalogs. When omitted, every catalog the token can see is ingested. Literal name list, not a regex. |

Source: DatabricksPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/databricks_unity_catalog.yaml. The upstream filename is databricks_unity_catalog.yaml while the type literal is the shorter databricks — write type: databricks in collector_config.yaml.

DuckDB (type: duckdb)

Reads one or more DuckDB database files from local paths on the collector container; can scan multiple files or whole directories of .db files in a single plugin.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| paths | list of string (file paths) | yes | | List of paths to .db files or directories containing .db files. Each path is opened independently. |
| host | string or null | no | "localhost" | Logical hostname used when generating ODDRNs for the catalog entries. |

Source: DuckDBPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/duckdb.yaml.

Presto (type: presto)

Catalogs schemas, tables, and columns from a Presto coordinator.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | Presto coordinator host. |
| port | integer | yes | | Coordinator HTTP port (typical: 8080 or 8081). |
| user | string | yes | | User identity (Presto authenticates by user header by default). |
| principal_id | string or null | yes | | Optional principal identifier for LDAP-configured clusters. The model is Optional[str] with no default; pass null (or empty string, as the upstream example does) when not using LDAP. |
| password | string or null | yes | | LDAP password. Same Pydantic shape as principal_id; pass null / empty string on non-LDAP clusters. |

Source: PrestoPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/presto.yaml.

Trino (type: trino)

Catalogs schemas, tables, and columns from a Trino coordinator. Wire-compatible with Presto (same client family).

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | Trino coordinator host. |
| port | integer | yes | | Coordinator HTTP port (typical: 8080 / 8081). |
| user | string | yes | | User identity. |
| password | string or null | yes | | LDAP password. Pass null / empty string on non-LDAP clusters. |

Source: TrinoPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/trino.yaml.

Apache Druid (type: druid)

Catalogs Druid datasources via the broker API.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | Druid broker host. |
| port | integer | yes | | Broker HTTP port (typical: 8082). |

Source: DruidPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/druid.yaml.

Apache Hive (type: hive)

Catalogs Hive databases, tables, and columns via HiveServer2. Configuration is grouped under a nested connection_params object — Hive's auth surface is varied enough that the adapter exposes the full HS2 connection knob set.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| count_statistics | boolean | no | false | Whether to collect row-count statistics (SELECT COUNT(*)) per table. Off by default, because these queries can be expensive on large warehouses. |
| connection_params | object | yes | | Nested HS2 connection block; see fields below. |
| connection_params.host | string | yes | | HiveServer2 host. |
| connection_params.port | integer or null | no | null | HS2 port. Defaults to 10000 when scheme is unset, 1000 when scheme: http / https (per the upstream HS2 client convention). |
| connection_params.database | string | yes | | Hive database to scan. |
| connection_params.scheme | string or null | no | null | HS2 transport: "http" or "https" for HTTP transport; null for binary transport. |
| connection_params.auth | string or null | no | null | Auth mode: one of "BASIC", "NOSASL", "KERBEROS", "NONE". Defaults to NONE when omitted. |
| connection_params.username | string or null | no | null | Username. Used with auth: LDAP or auth: CUSTOM. |
| connection_params.password | string or null | no | null | Password. Used with auth: LDAP or auth: CUSTOM. |
| connection_params.kerberos_service_name | string or null | no | null | Used with auth: KERBEROS only. |
| connection_params.configuration | object or null | no | null | Free-form dict of Hive session configuration overrides. |
| connection_params.check_hostname | string or null | no | null | TLS hostname check toggle as a string "true" / "false". |
| connection_params.ssl_cert | string or null | no | null | Path to a CA / client certificate file inside the container. |

Source: HivePlugin and HiveConnectionParams in odd_collector/domain/plugin.py; reference YAML at config_examples/hive.yaml.
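A sketch of a Hive plugin entry that shows the nested connection_params block with Kerberos auth over the binary transport. The host, database, and service name are hypothetical.

```yaml
plugins:
  - type: hive
    name: hive_warehouse
    count_statistics: false            # keep COUNT(*) queries off
    connection_params:
      host: hs2.internal               # hypothetical HiveServer2 host
      port: 10000                      # explicit, matches the binary-transport default
      database: default
      auth: KERBEROS
      kerberos_service_name: hive      # only meaningful with auth: KERBEROS
```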

Business intelligence and dashboards

Tableau (type: tableau)

Catalogs Tableau site content (workspaces, projects, dashboards, sheets) via the Tableau REST API.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| server | string | yes | | Tableau Server / Tableau Cloud URL. |
| site | string or null | yes | | Tableau site name (empty string for the default site). The model is Optional[str] with no default, so pass an explicit value. |
| user | string or null | yes | | Username. Pass null if authenticating via token_name / token_value. |
| password | string (Secret) or null | yes | | Password. Pass null if authenticating via PAT. |
| token_name | string or null | yes | | Personal access token name (for 2FA / SSO accounts that can't use password auth). |
| token_value | string (Secret) or null | yes | | Personal access token value. |
| pagination_size | integer | no | 10 | Page size for the REST API. Larger values reduce request count but increase per-request latency; tune for very large sites. |

Source: TableauPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/tableau.yaml.
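A sketch of a Tableau plugin entry that authenticates with a personal access token. Because the optional fields have no defaults, user and password are passed as explicit nulls; the server URL, token name, and environment variable are hypothetical.

```yaml
plugins:
  - type: tableau
    name: tableau_main
    server: https://tableau.example.com
    site: ""                            # empty string selects the default site
    user: null                          # not used with PAT auth, but must be present
    password: null
    token_name: odd-collector           # hypothetical PAT name
    token_value: !ENV ${TABLEAU_PAT}
    pagination_size: 10
```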

Apache Superset (type: superset)

Catalogs Superset datasets, dashboards, and charts via the Superset REST API.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| server | string | yes | | Superset base URL (include trailing slash). |
| username | string | yes | | Superset login. |
| password | string (Secret) | yes | | Password. |

Source: SupersetPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/superset.yaml.

Metabase (type: metabase)

Catalogs Metabase dashboards, questions, and the underlying datasets they reference.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | Metabase host. |
| port | string | yes | | TCP port. |
| login | string | yes | | Metabase login email. |
| password | string (Secret) | yes | | Password. |

Source: MetabasePlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/metabase.yaml.

Redash (type: redash)

Catalogs Redash queries and dashboards via the Redash API.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| server | string | yes | | Redash server base URL. |
| api_key | string | yes | | Redash API key (account-scoped: gives the adapter access to whatever the account can see). |

Source: RedashPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/redash.yaml.

Mode Analytics (type: mode)

Catalogs Mode reports and the underlying datasets they reference.

No config_examples/mode.yaml file exists upstream. The fields below are read directly from ModePlugin in plugin.py; the YAML below is hand-crafted from that model.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | Mode workspace host (e.g., https://app.mode.com). |
| account | string | yes | | Mode account / workspace identifier. |
| data_source | string | yes | | Mode data-source identifier the adapter should report against. |
| token | string (Secret) or null | yes | | API token. The model is Optional[SecretStr] with no default; pass a value or null if relying on password auth. |
| password | string (Secret) or null | yes | | Password (legacy auth). Pass null if using token auth. |

Source: ModePlugin in odd_collector/domain/plugin.py.
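A sketch of a Mode plugin entry using token auth, written from the field table above rather than from an upstream example. The account, data-source identifier, and environment variable are hypothetical.

```yaml
plugins:
  - type: mode
    name: mode_reports
    host: https://app.mode.com
    account: my_workspace              # hypothetical workspace identifier
    data_source: my_data_source        # hypothetical data-source identifier
    token: !ENV ${MODE_TOKEN}
    password: null                     # explicit null because token auth is used
```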

Cube.js (type: cubejs)

Catalogs Cube.js cubes and members; uses the cube's underlying SQL data source to resolve lineage from cube measures back to the source columns.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| host | string | yes | | Cube.js server base URL. |
| dev_mode | boolean | no | false | When true, the adapter relaxes auth and token may be null. In production (dev_mode: false), token is required and the adapter raises ValueError on startup if it isn't set. |
| token | string (Secret) or null | conditional | null | Cube.js auth token; required unless dev_mode: true. |
| predefined_datasource | object | yes | | Sub-object describing the SQL data source backing the cubes; used by the adapter's SQL parser to generate lineage-edge ODDRNs. Only postgres and clickhouse are recognised types (see PostgresDatasource / ClickHouseDatasource in predefined_data_source.py). |
| predefined_datasource.type | string | yes | | "postgres" or "clickhouse". |
| predefined_datasource.host | string or null | no | null | Source host; used as the lineage ODDRN host. |
| predefined_datasource.database | string or null | no | null | Source database. |

Source: CubeJSPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/cubejs.yaml.
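
A minimal sketch of a production-mode Cube.js plugin entry, built from the field list above. Host names, the database name, and the token are placeholders; the upstream config_examples/cubejs.yaml remains the authoritative reference.

```yaml
plugins:
  - type: cubejs
    name: cubejs_prod
    host: "http://cubejs.internal:4000"   # placeholder Cube.js base URL
    dev_mode: false                       # token is required in this mode
    token: "<cubejs-token>"               # placeholder secret
    predefined_datasource:
      type: postgres                      # "postgres" or "clickhouse"
      host: "pg.internal"                 # placeholder; used as the lineage ODDRN host
      database: "analytics"               # placeholder source database
```

Setting dev_mode: true would allow token: null, which is only appropriate for local development.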

Catalog, ingestion, and federation

CKAN (type: ckan)

Catalogs CKAN packages and resources from the CKAN action API.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes |  | Operator-chosen unique plugin name. |
| host | string | yes |  | CKAN host. |
| port | string | yes |  | TCP port. |
| ckan_endpoint | string | no | empty | Optional path prefix between the host and the CKAN action API (e.g., "/additional/endpoint"). When the API is mounted at the root, leave empty. |
| token | string (Secret) or null | no | null | CKAN auth token. Some action endpoints require authorization. |

Source: CKANPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/ckan.yaml.
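
A minimal sketch of a CKAN plugin entry using the fields above. The host, port, and token are placeholders; note that port is typed as a string in the model, so it is quoted here.

```yaml
plugins:
  - type: ckan
    name: ckan_open_data
    host: "ckan.example.org"              # placeholder host
    port: "443"                           # string, not an integer
    ckan_endpoint: ""                     # leave empty when the action API is mounted at the root
    token: "<ckan-token>"                 # placeholder; may be omitted or null for public endpoints
```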

Airbyte (type: airbyte)

Catalogs Airbyte connectors, sources, destinations, and the lineage edges between them.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes |  | Operator-chosen unique plugin name. |
| host | string | yes |  | Airbyte API host. |
| port | string | yes |  | Airbyte API port (typical: "8000"). |
| user | string or null | yes |  | Airbyte username. The model is Optional[str] with no default: pass a value, or null for unauthenticated deployments. |
| password | string or null | yes |  | Airbyte password. Same Pydantic shape as user. |
| platform_host_url | string | yes |  | The ODD Platform URL the adapter advertises in generated ODDRNs for downstream destinations. This is a per-plugin field on AirbytePlugin that overlaps with the collector-level platform_host_url at the top of collector_config.yaml; both must be set when using Airbyte. |
| store_raw_tables | boolean | no | true | Whether to ingest Airbyte's _airbyte_raw_* staging tables. Set to false to keep them out of the catalog. |

Source: AirbytePlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/airbyte.yaml.
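
A minimal sketch showing platform_host_url set at both the collector level and the plugin level, as the field description above requires. All URLs and host names are placeholders, and other collector-level keys are omitted.

```yaml
platform_host_url: "http://odd-platform.internal:8080"      # collector-level (placeholder)
plugins:
  - type: airbyte
    name: airbyte_main
    host: "airbyte.internal"                                # placeholder Airbyte API host
    port: "8000"
    user: null                                              # no default, so null must be explicit
    password: null
    platform_host_url: "http://odd-platform.internal:8080"  # plugin-level copy
    store_raw_tables: false                                 # keep _airbyte_raw_* tables out of the catalog
```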

Fivetran (type: fivetran)

Catalogs a single Fivetran connector and its destination via the Fivetran REST API.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes |  | Operator-chosen unique plugin name. |
| base_url | string | no | "https://api.fivetran.com" | Fivetran API base URL. Override only for Fivetran's regional API endpoints. |
| api_key | string | yes |  | Fivetran API key. |
| api_secret | string (Secret) | yes |  | Fivetran API secret. |
| connector_id | string | yes |  | Fivetran connector identifier. One plugin maps to one connector; add a second plugin entry per additional connector. |
| destination_id | string | yes |  | Fivetran destination identifier corresponding to the connector. |

Source: FivetranPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/fivetran.yaml.
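
Since one plugin covers one connector, cataloguing two connectors means two plugin entries. A minimal sketch under that reading; every identifier and credential below is a placeholder, and base_url is left at its default.

```yaml
plugins:
  - type: fivetran
    name: fivetran_orders
    api_key: "<fivetran-api-key>"
    api_secret: "<fivetran-api-secret>"
    connector_id: "<connector-id-1>"
    destination_id: "<destination-id>"
  - type: fivetran
    name: fivetran_customers              # names must stay unique across plugins
    api_key: "<fivetran-api-key>"
    api_secret: "<fivetran-api-secret>"
    connector_id: "<connector-id-2>"
    destination_id: "<destination-id>"
```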

dbt Cloud catalog import (type: dbt)

Pulls dbt model lineage and metadata via a pre-uploaded catalog.json on a host the adapter can reach. This is the pull dbt adapter — distinct from odd-dbt, the push-strategy adapter that emits live test results from dbt runs.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes |  | Operator-chosen unique plugin name. |
| host | string | yes |  | Logical host used for ODDRN generation, typically the dbt Cloud / dbt Core deployment host. |
| odd_catalog_url | string | yes |  | URL the adapter fetches the catalog.json from. |

Source: DbtPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/dbt.yaml.
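
A minimal sketch of a dbt plugin entry using the three fields above. The host and the catalog URL are placeholders for wherever the catalog.json has been uploaded.

```yaml
plugins:
  - type: dbt
    name: dbt_catalog
    host: "dbt.internal"                                    # placeholder logical host for ODDRNs
    odd_catalog_url: "https://artifacts.example.org/dbt/catalog.json"   # placeholder URL
```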

Federated ODD Platform (type: odd_adapter)

Pulls metadata from another ODD Platform instance — federate a child platform's catalog into a parent platform.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes |  | Operator-chosen unique plugin name. |
| host | string | yes |  | URL of the source ODD service that implements the odd_adapter Ingress API. |
| data_source_oddrn | string | yes |  | The ODDRN to advertise as the federated data source root (e.g., //my_adapter/host/source-platform.internal). |

Source: OddAdapterPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/odd_adapter.yaml.
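
A minimal sketch of a federation entry. The ODDRN is the example value from the field description; the host URL is a placeholder for the child service.

```yaml
plugins:
  - type: odd_adapter
    name: child_platform
    host: "http://source-platform.internal:8080"            # placeholder URL of the source ODD service
    data_source_oddrn: "//my_adapter/host/source-platform.internal"
```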

Machine learning platforms

MLflow (type: mlflow)

Catalogs MLflow experiments, runs, and registered models from the MLflow tracking and model-registry APIs.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes |  | Operator-chosen unique plugin name. |
| dev_mode | boolean | no | false | Adapter-side dev mode toggle. |
| tracking_uri | string | yes |  | MLflow tracking server URI. |
| registry_uri | string | yes |  | MLflow model-registry URI (often the same as tracking_uri). |
| filter_experiments | list of string or null | no | null | Allowlist of experiment names. When omitted, every experiment is ingested. Literal name list, not a regex. |

Source: MlflowPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/mlflow.yaml.
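
A minimal sketch of an MLflow plugin entry that restricts ingestion to two experiments. The URIs and experiment names are placeholders; the names are matched literally, so they should be spelled exactly as they appear in MLflow.

```yaml
plugins:
  - type: mlflow
    name: mlflow_tracking
    tracking_uri: "http://mlflow.internal:5000"             # placeholder
    registry_uri: "http://mlflow.internal:5000"             # often the same as tracking_uri
    filter_experiments:                                     # literal names, not regex patterns
      - "churn-model"
      - "pricing-model"
```

Dropping filter_experiments (or setting it to null) ingests every experiment.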

Feast feature store (type: feast)

Catalogs Feast feature views and entities by reading the Feast repo definition from a path on the collector container.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes |  | Operator-chosen unique plugin name. |
| host | string | yes |  | Logical host used for ODDRN generation. |
| repo_path | string | yes |  | Path to a checked-out Feast feature-repo inside the container. |

Source: FeastPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/feast.yaml.
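
A minimal sketch of a Feast plugin entry. The host is a placeholder, and the repo path is a hypothetical mount point: the feature repo has to be present at that path inside the collector container (for example via a volume mount).

```yaml
plugins:
  - type: feast
    name: feast_features
    host: "feast.internal"                # placeholder logical host for ODDRNs
    repo_path: "/opt/feast/feature_repo"  # hypothetical path inside the container
```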

Kubeflow Pipelines (type: kubeflow)

Catalogs Kubeflow pipelines, runs, and the lineage edges between them.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes |  | Operator-chosen unique plugin name. |
| host | string | yes |  | Kubeflow Pipelines host (typically the KFP UI URL). |
| namespace | string | yes |  | Kubernetes namespace Kubeflow runs in. This is not the same as ODD's namespace metadata field: the Kubeflow plugin redeclares namespace as required at the plugin level, which shadows BasePlugin's optional metadata field. |
| session_cookie0 | string or null | yes |  | First half of the KFP session cookie (Istio AuthService split-cookie pattern). The model is Optional[str] with no default: provide a value or null. |
| session_cookie1 | string or null | yes |  | Second half of the KFP session cookie. |

Source: KubeflowPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/kubeflow.yaml.
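
A minimal sketch of a Kubeflow plugin entry. The host, namespace, and cookie values are placeholders; both cookie keys have no default, so they appear explicitly even if a deployment sets them to null.

```yaml
plugins:
  - type: kubeflow
    name: kubeflow_pipelines
    host: "https://kubeflow.internal"            # placeholder KFP UI URL
    namespace: "kubeflow-user"                   # Kubernetes namespace, not ODD's namespace metadata
    session_cookie0: "<first-cookie-half>"       # placeholder; null is accepted by the model
    session_cookie1: "<second-cookie-half>"      # placeholder; null is accepted by the model
```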

Per-adapter feature matrix

Cross-cutting capabilities and where they apply across the 41-adapter set:

| Feature | Where it applies | What it does |
| --- | --- | --- |
| Regex ingestion filters (Filter) | postgresql.schemas_filter, snowflake.schemas_filter, cockroachdb.schemas_filter (inherited from PostgreSQLPlugin) | Regex include / exclude lists scope which schemas / objects the adapter sees. Defaults to "include everything" when omitted. Source: Filter in odd_collector_sdk/domain/filter.py. |
| Literal-name allowlist filters | redshift.schemas (schemas), databricks.catalogs (Unity Catalog catalogs), mlflow.filter_experiments (experiment names) | Plain list of names the adapter restricts ingestion to. Not a regex; literal exact match. When omitted, the adapter ingests everything visible to the credentials. |
| ERD relationships (foreign keys) | postgresql, snowflake, cockroachdb (via PostgreSQL inheritance) | The adapter emits ENTITY_RELATIONSHIP entities for tables connected by foreign keys, including cross-schema. The platform renders these as ERD edges on dataset detail pages. No other adapter currently extracts foreign-key relationships. |
| TLS toggles on the source connection | clickhouse.secure + clickhouse.verify, mysql.ssl_disabled, singlestore.ssl_disabled, elasticsearch.verify_certs + elasticsearch.ca_certs, opensearch.use_ssl + opensearch.verify_certs + opensearch.ca_certs | Per-adapter knobs for TLS toggling and certificate validation. Defaults are tuned per adapter (see each section above); only override for self-signed certs on local clusters or for unencrypted local development. |
| Token-based auth (PAT / API token alternative to password) | tableau.token_name + tableau.token_value, databricks.token, redash.api_key, fivetran.api_key + fivetran.api_secret, cubejs.token, mlflow (via tracking_uri auth), ckan.token, mode.token, kubeflow.session_cookie0 + kubeflow.session_cookie1 | Replaces username/password auth; required when the source enforces SSO / 2FA / MFA on user accounts. |
| Schema inference via document sampling | couchbase.sample_size + couchbase.num_sample_values | The adapter samples N documents per collection to derive a structural view of fields and value types. Defaults to no sampling (sample_size: 0); set explicitly to enable. |
| Sub-object connection block (advanced auth surface) | hive.connection_params (full HS2 connection knob set), cubejs.predefined_datasource (postgres / clickhouse only, used to resolve cube-to-source lineage) | Some adapters expose a nested object instead of flat fields when the auth or lineage surface needs more knobs than a flat schema supports. |
| Multiple file paths in one plugin | duckdb.paths | DuckDB accepts a list of .db files or directories of .db files in one plugin; every file is opened independently. Other file-source adapters (sqlite) take a single path. |
| Special operating modes | oracle.thick_mode (Oracle Instant Client vs. pure-Python), cubejs.dev_mode (relax token requirement), mlflow.dev_mode | Adapter-level toggles that alter runtime behaviour or auth strictness; safe defaults are off. |

Other adapters either do not expose filters (the SDK ones don't carry a Filter field) or do not emit relationships. For the filter mechanism's user-facing explanation (include / exclude semantics, when filters apply, default behaviour without filters), see Ingestion filters. The full cross-adapter capability matrix — which adapter exposes which filter, which emits which relationship type — lives on the odd-collectors monorepo README; check that table when planning a new deployment.
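
To contrast the two filter styles in the matrix, here is a hedged sketch. It assumes the Filter object takes include and exclude keys, following the "include / exclude lists" description above; connection fields are omitted, and all schema names and patterns are illustrative.

```yaml
plugins:
  - type: postgresql
    name: pg_filtered
    # connection fields omitted; see the postgresql section
    schemas_filter:                       # regex filter (Filter)
      include: ["analytics_.*", "public"]
      exclude: [".*_tmp"]
  - type: redshift
    name: redshift_filtered
    # connection fields omitted
    schemas: ["analytics", "reporting"]   # literal names, exact match only
```

A pattern such as "analytics_.*" only has meaning in the regex style; in a literal allowlist it would be treated as a schema name.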

Known limitations

  • README drift on the source repo: as flagged above, the upstream README's adapter table omits four adapters (databricks, couchbase, opensearch, oracle) that exist in PLUGIN_FACTORY. This is a docs gap on the collector repo, not a missing capability — those four adapters work; they're just under-advertised.

  • Foreign-key extraction is limited to PostgreSQL, Snowflake, and CockroachDB (the last via PostgreSQL inheritance) today. ClickHouse, MySQL, MSSQL, and others extract schemas and columns but not foreign-key relationships.

  • No per-plugin pulling_interval: every plugin in the file shares default_pulling_interval. Splitting workloads with different cadences requires running multiple collector containers, each with its own config.

  • M1 / Apple Silicon build issues: pyodbc, confluent-kafka, and grpcio need extra environment variables to build natively. See the generic collector README → M1 building issue.

  • odbc.driver upstream typo: OdbcPlugin.driver defaults to "{ODBC Driver 17s for SQL Server}", with a stray s after 17. Always set driver: explicitly in the plugin config to the registered driver name on your container (e.g., {ODBC Driver 17 for SQL Server} or your platform's equivalent). Without an explicit value, the adapter's connection attempt fails because no ODBC driver matches the misspelled string. The reference YAML at config_examples/odbc.yaml uses the correct value, so copy from there rather than relying on the model default.

  • Missing upstream config examples for mode and opensearch: both adapters are present in PLUGIN_FACTORY and shipped, but odd-collectors/odd-collector/config_examples/ does not contain a mode.yaml or opensearch.yaml. The per-adapter sections above include hand-crafted YAML examples derived from the Pydantic models for both.

  • config_examples/cocroachdb.yaml filename typo: the file containing the CockroachDB reference YAML is cocroachdb.yaml (missing the k). The type literal (cockroachdb) is correct — the file's contents work as-is; only the filename is misspelled.
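
The pulling-interval limitation above can be worked around with one config file per cadence, each mounted into its own collector container. A minimal sketch under that assumption: the file names are hypothetical, the interval values are illustrative, the hosts and paths are placeholders, and the remaining collector-level keys are omitted. The --- line only separates the two files for display.

```yaml
# collector-frequent.yaml (hypothetical file name), used by the first container
default_pulling_interval: 10              # illustrative value
plugins:
  - type: mlflow
    name: mlflow_frequent
    tracking_uri: "http://mlflow.internal:5000"
    registry_uri: "http://mlflow.internal:5000"
---
# collector-infrequent.yaml (hypothetical file name), used by the second container
default_pulling_interval: 720             # illustrative value
plugins:
  - type: feast
    name: feast_infrequent
    host: "feast.internal"
    repo_path: "/opt/feast/feature_repo"
```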
