odd-collector (generic)
Generic ODD Collector — 41 pull adapters for databases, data warehouses, BI tools, streams, and MLOps platforms.
Status: Stable. Released as a tagged Docker image; the underlying SDK is the same one all odd-collector-* collectors share.
odd-collector is the general-purpose pull collector. It bundles 41 adapters covering relational databases, data warehouses, NoSQL stores, message brokers, BI tools, MLOps platforms, and a few catalog / orchestration sources. One container instance can host any combination of those adapters as plugins, including multiple plugins of the same type pointing at different sources.
For the broader pull-vs-push picture and the shared collector configuration schema, start at the Integrations hub. For deployment-side detail (build, Docker, env vars), see Build and run ODD Collectors.
Supported adapters
The 41 adapters below are registered in odd_collector/domain/plugin.py (PLUGIN_FACTORY). Every adapter has per-field documentation on this page: three (postgresql, snowflake, kafka) get longer spotlights with deployment guidance and feature notes, and the remaining 38 are catalogued in the per-adapter configuration reference section.
| Type literal | Source | Spotlight |
| --- | --- | --- |
| airbyte | Airbyte | |
| cassandra | Apache Cassandra | |
| ckan | CKAN | |
| clickhouse | ClickHouse | |
| cockroachdb | CockroachDB | |
| couchbase | Couchbase | |
| cubejs | Cube.js | |
| databricks | Databricks (Unity Catalog) | |
| dbt | dbt Cloud (catalog import) | |
| druid | Apache Druid | |
| duckdb | DuckDB | |
| elasticsearch | Elasticsearch | |
| feast | Feast feature store | |
| fivetran | Fivetran | |
| hive | Apache Hive | |
| kafka | Apache Kafka | ✓ |
| kubeflow | Kubeflow Pipelines | |
| metabase | Metabase | |
| mlflow | MLflow | |
| mode | Mode Analytics | |
| mongodb | MongoDB | |
| mssql | Microsoft SQL Server | |
| mysql | MySQL / MariaDB | |
| neo4j | Neo4j | |
| odbc | Generic ODBC source | |
| odd_adapter | Another ODD Platform (federated) | |
| opensearch | OpenSearch | |
| oracle | Oracle Database | |
| postgresql | PostgreSQL (incl. pgvector) | ✓ |
| presto | Presto | |
| redash | Redash | |
| redshift | Amazon Redshift | |
| scylladb | ScyllaDB | |
| singlestore | SingleStore | |
| snowflake | Snowflake | ✓ |
| sqlite | SQLite | |
| superset | Apache Superset | |
| tableau | Tableau | |
| tarantool | Tarantool | |
| trino | Trino | |
| vertica | Vertica | |
The canonical YAML for each adapter lives at odd-collectors/odd-collector/config_examples/, one file per adapter, usually named after the type literal (the per-adapter sections flag the exceptions). The Pydantic models that define the accepted fields live at odd-collectors/odd-collector/odd_collector/domain/plugin.py; read those when an example field is unclear.
The "Implemented adapters" table in the repo's top-level README lags behind the code: databricks, couchbase, opensearch, and oracle are present in PLUGIN_FACTORY but missing from the README table at the time of writing. Use the type literal table above (or PLUGIN_FACTORY in plugin.py) as the authoritative inventory.
Installation
Mount a collector_config.yaml at /app/collector_config.yaml inside the container. A reference Compose snippet is in the generic collector README and a from-source build flow is in Build and run ODD Collectors.
Minimal config
The smallest collector_config.yaml that runs the collector:
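A minimal sketch of such a file, assuming a platform reachable at http://localhost:8080 and a single PostgreSQL plugin. The hostnames, the token placeholder, the interval value, and the POSTGRES_PASSWORD variable name are all illustrative assumptions, not values taken from the upstream example.

```yaml
# Illustrative values only; replace every placeholder before use.
platform_host_url: http://localhost:8080   # assumed platform address
token: "<collector-token>"                 # placeholder for the collector token
default_pulling_interval: 10               # placeholder; check the configuration reference for units
plugins:
  - type: postgresql
    name: postgresql_main                  # must be unique within the file
    host: db.internal                      # hypothetical host
    database: analytics                    # hypothetical database
    user: odd_reader                       # hypothetical login
    password: !ENV ${POSTGRES_PASSWORD}    # hypothetical environment variable
```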
The shared top-level fields (platform_host_url, token, default_pulling_interval, plugins, plus the optional connection_timeout_seconds / chunk_size / misfire_grace_time / max_instances / verify_ssl) are documented once at Build and run ODD Collectors → Full configuration reference. Only the plugins[*] shape varies per adapter — the rest of this page covers that.
Multiple plugins in one container
plugins is a list: add as many entries as you need, mixing types freely. Running several plugins of the same type (e.g. several PostgreSQL databases on different hosts) is the common pattern:
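As a sketch, assuming two PostgreSQL servers and one MySQL server with hypothetical hostnames, databases, and environment variable names, the plugins list could look like this:

```yaml
plugins:
  - type: postgresql
    name: postgresql_orders            # each name is unique within the file
    host: orders-db.internal
    database: orders
    user: odd_reader
    password: !ENV ${ORDERS_DB_PASSWORD}
  - type: postgresql
    name: postgresql_billing           # same type, different source
    host: billing-db.internal
    database: billing
    user: odd_reader
    password: !ENV ${BILLING_DB_PASSWORD}
  - type: mysql
    name: mysql_crm                    # a different adapter type in the same container
    host: crm-db.internal
    port: 3306
    database: crm
    user: odd_reader
    password: !ENV ${CRM_DB_PASSWORD}
```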
Each plugin's name must be unique within the file — the collector uses it to log per-plugin progress and to wire each plugin to its own scheduled job. The default_pulling_interval applies to every plugin uniformly; per-plugin overrides are not supported.
Spotlight: PostgreSQL (type: postgresql)
Pulls schemas, tables, columns, foreign-key relationships, and (with pgvector installed in the source) vector indexes. PostgreSQL tables containing at least one vector-typed column are classified as the Vector Store dataset type; see Vector Store metadata for the user-facing classification, the dedicated icon in the catalog, and the Vector column data type rendering on the Structure tab.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | PostgreSQL server hostname. |
| port | integer | no | 5432 | TCP port. |
| database | string | yes | - | Database to scan; one plugin = one database. |
| user | string | yes | - | Login. The user needs read access to the system catalogs you want indexed. |
| password | string (Secret) | no | empty | Password. Use !ENV ${VAR} to source from an environment variable. |
| schemas_filter.include | list of regex | no | [".*"] | Schemas to include. |
| schemas_filter.exclude | list of regex | no | [] | Schemas to drop after include matches. |
Source: PostgreSQLPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/postgresql.yaml.
The PostgreSQL adapter extracts foreign-key relationships and emits them as ENTITY_RELATIONSHIP entities — these render as ERD edges on the dataset detail page in the catalog. Cross-schema foreign keys are supported.
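The schemas_filter fields above could be combined as follows. This is a sketch, assuming a hypothetical host, database, and schema naming scheme; the regexes are examples only, and exclusions are applied after the include matches.

```yaml
plugins:
  - type: postgresql
    name: postgresql_analytics
    host: pg.internal                    # hypothetical host
    port: 5432
    database: analytics
    user: odd_reader
    password: !ENV ${POSTGRES_PASSWORD}  # hypothetical environment variable
    schemas_filter:
      include:
        - "sales_.*"                     # intended to keep the sales_* schemas
        - "public"
      exclude:
        - "sales_tmp.*"                  # then drop temporary sales schemas
```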
Spotlight: Snowflake (type: snowflake)
Pulls databases, schemas, tables, views, columns, and foreign-key relationships.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| account | string | yes | - | Snowflake account identifier (e.g. ab12345.eu-central-1). The adapter derives the host as {ACCOUNT}.snowflakecomputing.com. |
| warehouse | string | yes | - | Compute warehouse used for the catalog query. |
| database | string | yes | - | Database to scan. |
| user | string | yes | - | Snowflake login. |
| password | string (Secret) | yes | - | Password. |
| schemas_filter.include | list of regex | no | [".*"] | Schemas to include. |
| schemas_filter.exclude | list of regex | no | [] | Schemas to drop after include matches. |
Source: SnowflakePlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/snowflake.yaml.
Like PostgreSQL, the Snowflake adapter extracts foreign-key constraints and emits ENTITY_RELATIONSHIP entities.
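A sketch of a Snowflake plugin entry built from the fields above. The warehouse, database, login, and environment variable names are hypothetical; only the account format comes from the field description.

```yaml
plugins:
  - type: snowflake
    name: snowflake_prod
    account: ab12345.eu-central-1          # host is derived as ab12345.eu-central-1.snowflakecomputing.com
    warehouse: COMPUTE_WH                  # hypothetical warehouse name
    database: ANALYTICS                    # hypothetical database
    user: ODD_READER                       # hypothetical login
    password: !ENV ${SNOWFLAKE_PASSWORD}   # hypothetical environment variable
    schemas_filter:
      exclude:
        - "INFORMATION_SCHEMA"             # example: skip the built-in metadata schema
```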
Spotlight: Kafka (type: kafka)
Pulls Kafka topics and (when a Confluent-compatible Schema Registry is reachable) the registered schemas.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | Bootstrap broker host. |
| port | integer | yes | - | Bootstrap broker port. |
| broker_conf | dict | yes | - | Passed to confluent_kafka.AdminClient, e.g. SASL credentials or SSL settings. |
| schema_registry_conf | dict | no | {} | Passed to the Schema Registry client, e.g. URL or basic auth. When empty, the adapter does not query the registry. |
Source: KafkaPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/kafka.yaml.
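A sketch of a SASL-secured Kafka entry. The keys inside broker_conf and schema_registry_conf are standard librdkafka and Confluent Schema Registry client settings; the hostnames, the credentials, and the assumptions that bootstrap.servers should repeat host and port and that !ENV substitution works inside nested dicts are illustrative and should be checked against config_examples/kafka.yaml.

```yaml
plugins:
  - type: kafka
    name: kafka_main
    host: broker-1.internal                      # hypothetical broker
    port: 9092
    broker_conf:
      bootstrap.servers: broker-1.internal:9092  # assumption: repeated here for the admin client
      security.protocol: SASL_SSL
      sasl.mechanisms: PLAIN
      sasl.username: odd_reader                  # hypothetical credentials
      sasl.password: !ENV ${KAFKA_PASSWORD}
    schema_registry_conf:                        # omit (or leave empty) to skip the registry
      url: https://schema-registry.internal:8081
      basic.auth.user.info: !ENV ${SCHEMA_REGISTRY_USER_INFO}  # "user:password"
```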
Per-adapter configuration reference
The three spotlights above cover the deployment-shape questions; this section enumerates the per-field config schema for the remaining 38 adapters. Field names, types, and defaults are sourced from the Pydantic plugin classes in odd_collector/domain/plugin.py; each adapter links to its config_examples/{type}.yaml reference YAML where one exists. Two adapters — mode and opensearch — have no upstream config example; their tables come from the Pydantic model alone and the per-section note flags the gap.
Common shapes used across the families below:
BasePlugin: every plugin carries name (required, operator-chosen, unique within the file). The optional metadata fields description and namespace are accepted by every plugin and omitted from the per-adapter tables to save space.
DatabasePlugin base: adds host: str (required), port: str (required, often overridden to int by subclasses), database: str | null (optional in the base; many subclasses redeclare it as required), user: str (required), password: str (required, redeclared by most subclasses as SecretStr with an empty default).
WithHost: adds only host: str.
WithPort: adds only port: str. Both WithHost and WithPort are mixed in by adapters that don't fit the full DatabasePlugin shape.
Each table below repeats every field the adapter accepts so that an entry is self-contained — operators don't need to chase the inheritance chain in plugin.py.
Relational databases
Microsoft SQL Server (type: mssql)
Pulls schemas, tables, views, and columns from a Microsoft SQL Server / Azure SQL Server source via the SQL catalog views.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | SQL Server host. |
| port | integer | yes | - | TCP port (typical: 1433). |
| database | string | yes | - | Database to scan; one plugin = one database. |
| user | string | yes | - | SQL login. |
| password | string (Secret) | no | empty | Password. Use !ENV ${VAR} to source from an environment variable. |
Source: MSSQLPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/mssql.yaml.
MySQL / MariaDB (type: mysql)
Pulls schemas, tables, views, and columns. Compatible with MariaDB.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | MySQL server hostname. |
| port | integer | yes | - | TCP port (typical: 3306). |
| database | string | yes | - | Database to scan. |
| user | string | yes | - | Login. |
| password | string (Secret) | no | empty | Password. |
| ssl_disabled | boolean | no | false | When true, disables TLS to the server; typically used only for local development against an unencrypted instance. |
Source: MySQLPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/mysql.yaml.
ClickHouse (type: clickhouse)
Pulls databases, tables, and columns from a ClickHouse cluster.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | ClickHouse server hostname. |
| port | integer or null | yes | - | HTTP (8123) or native (9000) port. The Pydantic model accepts null, but every reference example provides an explicit value. |
| database | string or null | no | - | Database to scan. When unset, the connection's default database is used. |
| user | string | yes | - | Login. |
| password | string (Secret) | yes | - | Password. |
| secure | boolean | no | false | Toggles TLS on the connection. Set to true for ClickHouse Cloud or any TLS-fronted deployment. |
| verify | boolean | no | true | Whether to verify the server certificate when secure: true. Set to false only for self-signed certs on local clusters. |
| server_hostname | string or null | no | null | Optional hostname for SNI / certificate validation; defaults to the value of host. |
| query_limit | integer or null | no | 0 | Optional row cap applied to internal catalog queries. 0 means no limit. |
Source: ClickhousePlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/clickhouse.yaml.
Amazon Redshift (type: redshift)
Pulls schemas, tables, views, and columns from an Amazon Redshift cluster.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | Redshift cluster endpoint ({cluster}.{region}.redshift.amazonaws.com). |
| port | string | yes | - | TCP port as a string (typical: "5439"). |
| database | string or null | no | - | Database to scan. |
| user | string | yes | - | Login. |
| password | string (Secret) | yes | - | Password. |
| schemas | list of string or null | no | null | Allowlist of schema names. When omitted, every non-system schema is ingested. Literal name list, not a regex filter, unlike the schemas_filter available on postgresql and snowflake. |
| connection_timeout | integer or null | no | 10 | Connection timeout in seconds. |
Source: RedshiftPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/redshift.yaml.
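A sketch showing the literal schemas allowlist and the string-typed port. The endpoint, database, schema names, and environment variable are hypothetical.

```yaml
plugins:
  - type: redshift
    name: redshift_dwh
    host: analytics.eu-west-1.redshift.amazonaws.com  # hypothetical endpoint
    port: "5439"                                      # the model types this field as a string
    database: dev
    user: odd_reader
    password: !ENV ${REDSHIFT_PASSWORD}
    schemas:                                          # literal names, not regexes
      - public
      - reporting
    connection_timeout: 10                            # seconds
```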
CockroachDB (type: cockroachdb)
Pulls schemas, tables, columns, and foreign-key relationships. Inherits from the PostgreSQL plugin: same field shape plus the same schemas_filter regex behavior; ERD edges are emitted for cross-schema foreign keys exactly as on PostgreSQL.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | CockroachDB SQL endpoint. |
| port | integer | no | 5432 | TCP port. CockroachDB's typical SQL port is 26257; the model default is the PostgreSQL port inherited from PostgreSQLPlugin. |
| database | string | yes | - | Database to scan. |
| user | string | yes | - | Login. |
| password | string (Secret) | no | empty | Password. |
| schemas_filter.include | list of regex | no | [".*"] | Schemas to include. |
| schemas_filter.exclude | list of regex | no | [] | Schemas to drop after include matches. |
Source: CockroachDBPlugin in odd_collector/domain/plugin.py (extends PostgreSQLPlugin); reference YAML at config_examples/cocroachdb.yaml. The upstream filename has a cocroach typo — the type literal cockroachdb is correct and is what you write in collector_config.yaml.
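Because the inherited port default is the PostgreSQL one, a CockroachDB entry will usually want an explicit port. A sketch with hypothetical host, database, and credentials:

```yaml
plugins:
  - type: cockroachdb                    # the type literal, not the misspelled example filename
    name: cockroachdb_main
    host: crdb.internal                  # hypothetical SQL endpoint
    port: 26257                          # override the inherited default of 5432
    database: defaultdb                  # hypothetical database
    user: odd_reader
    password: !ENV ${COCKROACH_PASSWORD} # hypothetical environment variable
```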
Vertica (type: vertica)
Pulls schemas, tables, views, and columns from a Vertica analytic database.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | Vertica host. |
| port | string | yes | - | TCP port (typical: "5433"). |
| database | string or null | no | - | Database to scan. |
| user | string | yes | - | Login. |
| password | string | yes | - | Password. |
Source: VerticaPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/vertica.yaml.
SingleStore (type: singlestore)
Pulls schemas, tables, views, and columns from a SingleStore (formerly MemSQL) cluster. Wire-compatible with MySQL.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | SingleStore host. |
| port | string | yes | - | TCP port. |
| database | string or null | no | - | Database to scan. |
| user | string | yes | - | Login. |
| password | string | yes | - | Password. |
| ssl_disabled | boolean or null | no | false | Disables TLS to the server; typically only for local development. |
Source: SingleStorePlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/singlestore.yaml.
Oracle Database (type: oracle)
Pulls schemas (one per Oracle user), tables, views, and columns from an Oracle Database.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | Oracle server hostname. |
| port | string | yes | - | TCP port (typical: "1521"). |
| user | string | yes | - | Oracle login (becomes the schema name in Oracle's data model). |
| service | string | yes | - | Oracle service name (e.g., XEPDB1). Use the service name, not the SID. |
| password | string (Secret) | yes | - | Password. |
| thick_mode | boolean or null | no | false | When true, switches the underlying Oracle client to thick mode (requires the Oracle Instant Client to be installed in the container). Default thin mode is pure Python and works without Instant Client. |
Source: OraclePlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/oracle.yaml.
Generic ODBC source (type: odbc)
Pulls schemas, tables, and columns from any source reachable through an ODBC driver registered on the collector container. Useful for sources without a dedicated adapter.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | Source hostname. |
| port | string | yes | - | TCP port. |
| database | string | yes | - | Database to scan. |
| user | string | yes | - | Login. |
| password | string (Secret) or null | no | - | Password. |
| driver | string | no | "{ODBC Driver 17s for SQL Server}" | ODBC driver name as registered in odbcinst.ini on the container. The upstream default contains a typo (17s should be 17), so always set this field explicitly to the driver string for your environment (e.g., {ODBC Driver 17 for SQL Server} or your platform's equivalent). See Known limitations. |
Source: OdbcPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/odbc.yaml.
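A sketch of an ODBC entry that sets driver explicitly rather than relying on the default. The host, port, database, and credentials are hypothetical, and the driver string has to match whatever is registered in odbcinst.ini on your container.

```yaml
plugins:
  - type: odbc
    name: odbc_sqlserver
    host: sql.internal                           # hypothetical host
    port: "1433"                                 # the model types this field as a string
    database: sales
    user: odd_reader
    password: !ENV ${ODBC_PASSWORD}              # hypothetical environment variable
    driver: "{ODBC Driver 17 for SQL Server}"    # set explicitly; do not rely on the model default
```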
SQLite (type: sqlite)
Reads a SQLite database file from a local path on the collector container. In-memory SQLite databases are not supported (each connection sees its own private DB).
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| data_source | string (file path) | yes | - | Absolute path to the .db file inside the container. The file must exist at startup; the model uses Pydantic's FilePath validator. |
Source: SQLitePlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/sqlite.yaml.
Wide-column, document, and key-value stores
MongoDB (type: mongodb)
Catalogs MongoDB databases, collections, and inferred field types.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | MongoDB host or seed-list host. |
| port | string | yes | - | TCP port (typical: "27017"). |
| database | string or null | no | - | Database to scan. |
| user | string | yes | - | Login. |
| password | string | yes | - | Password. |
| protocol | string | yes | - | Connection scheme passed to the MongoDB driver: mongodb for direct host/port connections, mongodb+srv for SRV-resolved seed lists (typical for MongoDB Atlas). |
Source: MongoDBPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/mongodb.yaml.
Apache Cassandra (type: cassandra)
Catalogs keyspaces, tables, and columns from a Cassandra cluster.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | Cassandra contact host. |
| port | string | yes | - | TCP port (typical: "9042"). |
| database | string or null | no | - | Keyspace name; one plugin scans one keyspace when supplied. |
| user | string | yes | - | Login. |
| password | string | yes | - | Password. |
| contact_points | list of string | no | [] | Additional contact-host endpoints for the cluster's gossip layer. Empty list means the driver uses host only. |
Source: CassandraPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/cassandra.yaml.
ScyllaDB (type: scylladb)
Catalogs keyspaces, tables, and columns from a ScyllaDB cluster. Same field shape as Cassandra; Scylla is wire-compatible with the Cassandra driver.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | Scylla contact host. |
| port | string | yes | - | TCP port (typical: "9042"). |
| database | string or null | no | - | Keyspace name. |
| user | string | yes | - | Login. |
| password | string | yes | - | Password. |
| contact_points | list of string | no | [] | Additional contact-host endpoints. |
Source: ScyllaDBPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/scylladb.yaml.
Tarantool (type: tarantool)
Catalogs spaces and indexes from a Tarantool instance. Uses the standard DatabasePlugin shape with no Tarantool-specific fields.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | Tarantool host. |
| port | string | yes | - | TCP port (typical: "3301"). |
| database | string or null | no | - | Database / space-collection identifier. |
| user | string | yes | - | Login. |
| password | string | yes | - | Password. |
Source: TarantoolPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/tarantool.yaml.
Couchbase (type: couchbase)
Catalogs Couchbase buckets and infers document field types by sampling. Couchbase is schemaless, so the adapter samples N documents per collection to derive a structural view.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | Couchbase connection string (e.g., couchbase://node1.internal,node2.internal). |
| bucket | string | yes | - | Bucket name; one plugin scans one bucket. |
| user | string | yes | - | Login. |
| password | string (Secret) | yes | - | Password. |
| sample_size | integer or null | no | 0 | Number of documents to sample per collection for schema inference. 0 disables sampling and uses the metadata-only view. |
| num_sample_values | integer or null | no | 10 | When sampling is on, number of value examples to retain per inferred field. |
Source: CouchbasePlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/couchbase.yaml.
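Since sample_size defaults to 0, schema inference only happens when it is raised. A sketch with sampling switched on; the bucket name, sample sizes, and environment variable are hypothetical:

```yaml
plugins:
  - type: couchbase
    name: couchbase_orders
    host: couchbase://node1.internal,node2.internal  # connection string, as in the field description
    bucket: orders                                   # hypothetical bucket; one plugin scans one bucket
    user: odd_reader
    password: !ENV ${COUCHBASE_PASSWORD}             # hypothetical environment variable
    sample_size: 100                                 # sample 100 documents per collection (0 disables sampling)
    num_sample_values: 5                             # keep 5 example values per inferred field
```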
Neo4j (type: neo4j)
Catalogs Neo4j databases, node labels, and relationship types. Uses the standard DatabasePlugin shape; the typical Bolt port is 7687.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | Neo4j host. |
| port | string | yes | - | Bolt port (typical: "7687"). |
| database | string or null | no | - | Database name (Neo4j 4.x+ multi-database). |
| user | string | yes | - | Login. |
| password | string | yes | - | Password. |
Source: Neo4jPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/neo4j.yaml.
Search engines
Elasticsearch (type: elasticsearch)
Catalogs Elasticsearch indices and field mappings.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | Elasticsearch host (typically including the scheme, e.g. https://es.internal). |
| port | integer | yes | - | TCP port (typical: 9200). |
| username | string | yes | - | Login. |
| password | string (Secret) | yes | - | Password. |
| verify_certs | boolean or null | no | null | Whether to verify TLS certificates on the Elasticsearch endpoint. null defers to the Elasticsearch client default (verify when scheme is https). |
| ca_certs | string or null | no | null | Optional path to a CA bundle file inside the container. |
Source: ElasticsearchPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/elasticsearch.yaml.
OpenSearch (type: opensearch)
Catalogs OpenSearch indices and field mappings.
No config_examples/opensearch.yaml file exists upstream. The fields below are read directly from OpensearchPlugin in plugin.py, so any YAML for this adapter has to be hand-crafted from that model.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | OpenSearch host (include scheme when using HTTPS). |
| port | integer or null | no | 443 | TCP port. The model defaults to 443 (the typical AWS OpenSearch Service port); set to 9200 for self-hosted. |
| http_compress | boolean or null | no | true | Whether to gzip request bodies. |
| use_ssl | boolean or null | no | true | Toggle TLS on the connection. |
| username | string or null | yes | - | Login. The model is Optional[str] with no explicit default, so provide a value (or null) at config time. |
| password | string (Secret) or null | yes | - | Password. Same Pydantic shape as username: provide a value or null. |
| verify_certs | boolean or null | no | null | Whether to verify TLS certificates. null defers to the OpenSearch client default. |
| ca_certs | string or null | no | null | Optional path to a CA bundle file inside the container. |
Source: OpensearchPlugin in odd_collector/domain/plugin.py.
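A hand-written sketch derived from the field list above rather than from an upstream example, assuming a self-hosted cluster behind TLS. The hostname, CA bundle path, and credentials are hypothetical.

```yaml
plugins:
  - type: opensearch
    name: opensearch_logs
    host: https://search.internal           # hypothetical; scheme included because TLS is on
    port: 9200                              # self-hosted; the model default is 443
    use_ssl: true
    http_compress: true
    username: odd_reader                    # no model default, so always state it (or null)
    password: !ENV ${OPENSEARCH_PASSWORD}   # hypothetical environment variable
    verify_certs: true
    ca_certs: /app/certs/ca.pem             # hypothetical path inside the container
```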
Analytics engines and warehouses
Databricks Unity Catalog (type: databricks)
Catalogs Databricks Unity Catalog catalogs, schemas, tables, and columns via the Databricks workspace REST API.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| workspace | string | yes | - | Databricks workspace URL (e.g., https://adb-1234567890.0.azuredatabricks.net). |
| token | string (Secret) | yes | - | Databricks personal access token (PAT) or service-principal token authorized for Unity Catalog. |
| catalogs | list of string or null | no | null | Allowlist of Unity Catalog catalogs. When omitted, every catalog the token can see is ingested. Literal name list, not a regex. |
Source: DatabricksPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/databricks_unity_catalog.yaml. The upstream filename is databricks_unity_catalog.yaml while the type literal is the shorter databricks — write type: databricks in collector_config.yaml.
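A sketch of a Databricks entry restricted to two catalogs. The catalog names and environment variable are hypothetical; the workspace URL reuses the example from the field description.

```yaml
plugins:
  - type: databricks                                     # short type literal, not the example filename
    name: databricks_uc
    workspace: https://adb-1234567890.0.azuredatabricks.net
    token: !ENV ${DATABRICKS_TOKEN}                      # hypothetical environment variable
    catalogs:                                            # literal names; omit to ingest every visible catalog
      - main
      - finance
```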
DuckDB (type: duckdb)
Reads one or more DuckDB database files from local paths on the collector container; can scan multiple files or whole directories of .db files in a single plugin.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| paths | list of string (file paths) | yes | - | List of paths to .db files or directories containing .db files. Each path is opened independently. |
| host | string or null | no | "localhost" | Logical hostname used when generating ODDRNs for the catalog entries. |
Source: DuckDBPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/duckdb.yaml.
Presto (type: presto)
Catalogs schemas, tables, and columns from a Presto coordinator.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | Presto coordinator host. |
| port | integer | yes | - | Coordinator HTTP port (typical: 8080 or 8081). |
| user | string | yes | - | User identity (Presto authenticates by user header by default). |
| principal_id | string or null | yes | - | Optional principal identifier for LDAP-configured clusters. The model is Optional[str] with no default, so pass null (or empty string, as the upstream example does) when not using LDAP. |
| password | string or null | yes | - | LDAP password. Same Pydantic shape as principal_id: pass null / empty string on non-LDAP clusters. |
Source: PrestoPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/presto.yaml.
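Because principal_id and password have no model default, a non-LDAP cluster still has to state them. A sketch with a hypothetical coordinator host and user:

```yaml
plugins:
  - type: presto
    name: presto_main
    host: presto.internal   # hypothetical coordinator host
    port: 8080
    user: odd_reader
    principal_id: null      # not using LDAP, but the field must still be present
    password: null          # same: explicit null on non-LDAP clusters
```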
Trino (type: trino)
Catalogs schemas, tables, and columns from a Trino coordinator. Wire-compatible with Presto (same client family).
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | Trino coordinator host. |
| port | integer | yes | - | Coordinator HTTP port (typical: 8080 / 8081). |
| user | string | yes | - | User identity. |
| password | string or null | yes | - | LDAP password. Pass null / empty string on non-LDAP clusters. |
Source: TrinoPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/trino.yaml.
Apache Druid (type: druid)
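Catalogs Druid datasources via the broker API.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | Druid broker host. |
| port | integer | yes | - | Broker HTTP port (typical: 8082). |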
Source: DruidPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/druid.yaml.
Apache Hive (type: hive)
Catalogs Hive databases, tables, and columns via HiveServer2. Configuration is grouped under a nested connection_params object, because Hive's auth surface is varied enough that the adapter exposes the full HS2 connection knob set.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| count_statistics | boolean | no | false | Whether to collect row-count statistics (SELECT COUNT(*)) per table. Off by default because these queries can be expensive on large warehouses. |
| connection_params | object | yes | - | Nested HS2 connection block; see the connection_params.* fields below. |
| connection_params.host | string | yes | - | HiveServer2 host. |
| connection_params.port | integer or null | no | null | HS2 port. Defaults to 10000 when scheme is unset, 1000 when scheme: http / https (per the upstream HS2 client convention). |
| connection_params.database | string | yes | - | Hive database to scan. |
| connection_params.scheme | string or null | no | null | HS2 transport: "http" or "https" for HTTP transport; null for binary transport. |
| connection_params.auth | string or null | no | null | Auth mode, e.g. "BASIC", "NOSASL", "KERBEROS", "NONE", or the "LDAP" / "CUSTOM" modes that the username and password fields refer to. Defaults to NONE when omitted. |
| connection_params.username | string or null | no | null | Username. Used with auth: LDAP or auth: CUSTOM. |
| connection_params.password | string or null | no | null | Password. Used with auth: LDAP or auth: CUSTOM. |
| connection_params.kerberos_service_name | string or null | no | null | Used with auth: KERBEROS only. |
| connection_params.configuration | object or null | no | null | Free-form dict of Hive session configuration overrides. |
| connection_params.check_hostname | string or null | no | null | TLS hostname check toggle as a string "true" / "false". |
| connection_params.ssl_cert | string or null | no | null | Path to a CA / client certificate file inside the container. |
Source: HivePlugin and HiveConnectionParams in odd_collector/domain/plugin.py; reference YAML at config_examples/hive.yaml.
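A sketch of the nested shape, assuming binary transport (scheme omitted) with Kerberos authentication. The host, database, and service name are hypothetical.

```yaml
plugins:
  - type: hive
    name: hive_warehouse
    count_statistics: false          # leave off unless COUNT(*) per table is affordable
    connection_params:               # every HS2 setting lives under this block
      host: hs2.internal             # hypothetical HiveServer2 host
      port: 10000
      database: default
      auth: KERBEROS
      kerberos_service_name: hive    # hypothetical service name; only read with auth: KERBEROS
```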
Business intelligence and dashboards
Tableau (type: tableau)
Catalogs Tableau site content (workspaces, projects, dashboards, sheets) via the Tableau REST API.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| server | string | yes | - | Tableau Server / Tableau Cloud URL. |
| site | string or null | yes | - | Tableau site name (empty string for the default site). The model is Optional[str] with no default, so pass an explicit value. |
| user | string or null | yes | - | Username. Pass null if authenticating via token_name / token_value. |
| password | string (Secret) or null | yes | - | Password. Pass null if authenticating via PAT. |
| token_name | string or null | yes | - | Personal access token name (for 2FA / SSO accounts that can't use password auth). |
| token_value | string (Secret) or null | yes | - | Personal access token value. |
| pagination_size | integer | no | 10 | Page size for the REST API. Larger values reduce request count but increase per-request latency; tune for very large sites. |
Source: TableauPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/tableau.yaml.
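A sketch of personal-access-token authentication, where the unused user and password fields are still stated as null. The server URL, token name, page size, and environment variable are hypothetical.

```yaml
plugins:
  - type: tableau
    name: tableau_main
    server: https://tableau.example.com        # hypothetical server URL
    site: ""                                   # empty string selects the default site
    user: null                                 # not used with PAT auth, but must be present
    password: null
    token_name: odd-collector                  # hypothetical PAT name
    token_value: !ENV ${TABLEAU_TOKEN_VALUE}   # hypothetical environment variable
    pagination_size: 100                       # raised from the default of 10 for a large site
```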
Apache Superset (type: superset)
Catalogs Superset datasets, dashboards, and charts via the Superset REST API.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| server | string | yes | - | Superset base URL (include trailing slash). |
| username | string | yes | - | Superset login. |
| password | string (Secret) | yes | - | Password. |
Source: SupersetPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/superset.yaml.
Metabase (type: metabase)
Catalogs Metabase dashboards, questions, and the underlying datasets they reference.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | Metabase host. |
| port | string | yes | - | TCP port. |
| login | string | yes | - | Metabase login email. |
| password | string (Secret) | yes | - | Password. |
Source: MetabasePlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/metabase.yaml.
Redash (type: redash)
Catalogs Redash queries and dashboards via the Redash API.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| server | string | yes | - | Redash server base URL. |
| api_key | string | yes | - | Redash API key (account-scoped: it gives the adapter access to whatever the account can see). |
Source: RedashPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/redash.yaml.
Mode Analytics (type: mode)
Catalogs Mode reports and the underlying datasets they reference.
No config_examples/mode.yaml file exists upstream. The fields below are read directly from ModePlugin in plugin.py, so any YAML for this adapter has to be hand-crafted from that model.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | Mode workspace host (e.g., https://app.mode.com). |
| account | string | yes | - | Mode account / workspace identifier. |
| data_source | string | yes | - | Mode data-source identifier the adapter should report against. |
| token | string (Secret) or null | yes | - | API token. The model is Optional[SecretStr] with no default, so pass a value, or null if relying on password auth. |
| password | string (Secret) or null | yes | - | Password (legacy auth). Pass null if using token auth. |
Source: ModePlugin in odd_collector/domain/plugin.py.
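A hand-written sketch derived from the field list above rather than from an upstream example, assuming token auth. The account, data-source identifier, and environment variable are hypothetical.

```yaml
plugins:
  - type: mode
    name: mode_reports
    host: https://app.mode.com
    account: my_workspace          # hypothetical account / workspace identifier
    data_source: my_data_source    # hypothetical data-source identifier
    token: !ENV ${MODE_TOKEN}      # hypothetical environment variable
    password: null                 # no model default, so state null when using token auth
```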
Cube.js (type: cubejs)
Catalogs Cube.js cubes and members; uses the cube's underlying SQL data source to resolve lineage from cube measures back to the source columns.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | Cube.js server base URL. |
| dev_mode | boolean | no | false | When true, the adapter relaxes auth and token may be null. In production (dev_mode: false), token is required and the adapter raises ValueError on startup if it isn't set. |
| token | string (Secret) or null | conditional | null | Cube.js auth token; required unless dev_mode: true. |
| predefined_datasource | object | yes | - | Sub-object describing the SQL data source backing the cubes, used by the adapter's SQL parser to generate lineage-edge ODDRNs. Only postgres and clickhouse are recognised types (see PostgresDatasource / ClickHouseDatasource in predefined_data_source.py). |
| predefined_datasource.type | string | yes | - | "postgres" or "clickhouse". |
| predefined_datasource.host | string or null | no | null | Source host, used as the lineage ODDRN host. |
| predefined_datasource.database | string or null | no | null | Source database. |
Source: CubeJSPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/cubejs.yaml.
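A sketch of a production-mode entry with a PostgreSQL-backed predefined_datasource. The Cube.js URL, the source host and database, and the environment variable are hypothetical.

```yaml
plugins:
  - type: cubejs
    name: cubejs_main
    host: http://cube.internal:4000   # hypothetical Cube.js base URL
    dev_mode: false                   # production mode, so token must be set
    token: !ENV ${CUBEJS_TOKEN}       # hypothetical environment variable
    predefined_datasource:
      type: postgres                  # "postgres" or "clickhouse"
      host: pg.internal               # hypothetical; becomes the host in lineage ODDRNs
      database: analytics             # hypothetical source database
```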
Catalog, ingestion, and federation
CKAN (type: ckan)
Catalogs CKAN packages and resources from the CKAN action API.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | CKAN host. |
| port | string | yes | - | TCP port. |
| ckan_endpoint | string | no | empty | Optional path prefix between the host and the CKAN action API (e.g., "/additional/endpoint"). When the API is mounted at the root, leave empty. |
| token | string (Secret) or null | no | null | CKAN auth token. Some action endpoints require authorization. |
Source: CKANPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/ckan.yaml.
Airbyte (type: airbyte)
Catalogs Airbyte connectors, sources, destinations, and the lineage edges between them.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Operator-chosen unique plugin name. |
| host | string | yes | - | Airbyte API host. |
| port | string | yes | - | Airbyte API port (typical: "8000"). |
| user | string or null | yes | - | Airbyte username. The model is Optional[str] with no default, so pass a value, or null for unauthenticated deployments. |
| password | string or null | yes | - | Airbyte password. Same Pydantic shape as user. |
| platform_host_url | string | yes | - | The ODD Platform URL the adapter advertises in generated ODDRNs for downstream destinations. This is a per-plugin field on AirbytePlugin that overlaps with the collector-level platform_host_url at the top of collector_config.yaml; both must be set when using Airbyte. |
| store_raw_tables | boolean | no | true | Whether to ingest Airbyte's _airbyte_raw_* staging tables. Set to false to keep them out of the catalog. |
Source: AirbytePlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/airbyte.yaml.
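The overlap between the two platform_host_url settings is easier to see in one file. The sketch below assumes an unauthenticated Airbyte deployment; the URLs, host, and plugin name are placeholders, and other collector-level keys are left out.

```yaml
platform_host_url: http://odd-platform.internal:8080     # collector-level setting (placeholder URL)
plugins:
  - type: airbyte
    name: airbyte_adapter                                 # hypothetical plugin name
    host: airbyte.internal.example                        # placeholder Airbyte API host
    port: "8000"
    user: null                                            # key is required; null means no auth
    password: null                                        # same shape as user
    platform_host_url: http://odd-platform.internal:8080  # per-plugin field on AirbytePlugin
    store_raw_tables: false                               # skip _airbyte_raw_* staging tables
```

Note that user and password still appear with explicit null values, because the fields have no default.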
Fivetran (type: fivetran)
Catalogs a single Fivetran connector and its destination via the Fivetran REST API.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | — | Operator-chosen unique plugin name. |
| base_url | string | no | "https://api.fivetran.com" | Fivetran API base URL. Override only for Fivetran's regional API endpoints. |
| api_key | string | yes | — | Fivetran API key. |
| api_secret | string (Secret) | yes | — | Fivetran API secret. |
| connector_id | string | yes | — | Fivetran connector identifier; one plugin = one connector. Add a second plugin entry per additional connector. |
| destination_id | string | yes | — | Fivetran destination identifier corresponding to the connector. |
Source: FivetranPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/fivetran.yaml.
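Because one plugin maps to one connector, cataloguing two connectors means two plugin entries. The sketch below illustrates that pattern; all names, keys, and identifiers are placeholders, and base_url is omitted so the documented default applies.

```yaml
plugins:
  - type: fivetran
    name: fivetran_orders                  # hypothetical plugin name
    api_key: "<fivetran-api-key>"          # placeholder
    api_secret: "<fivetran-api-secret>"    # placeholder
    connector_id: "<connector-id-a>"       # first connector
    destination_id: "<destination-id>"
  - type: fivetran
    name: fivetran_billing                 # plugin names must stay unique
    api_key: "<fivetran-api-key>"
    api_secret: "<fivetran-api-secret>"
    connector_id: "<connector-id-b>"       # second connector
    destination_id: "<destination-id>"
```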
dbt Cloud catalog import (type: dbt)
Pulls dbt model lineage and metadata via a pre-uploaded catalog.json on a host the adapter can reach. This is the pull dbt adapter, distinct from odd-dbt, the push-strategy adapter that emits live test results from dbt runs.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | — | Operator-chosen unique plugin name. |
| host | string | yes | — | Logical host used for ODDRN generation, typically the dbt Cloud / dbt Core deployment host. |
| odd_catalog_url | string | yes | — | URL the adapter fetches the catalog.json from. |
Source: DbtPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/dbt.yaml.
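A minimal sketch of the dbt pull plugin, assuming the catalog.json has been uploaded to an internal artifact host. Both the host and the URL are hypothetical placeholders.

```yaml
plugins:
  - type: dbt
    name: dbt_adapter                      # hypothetical plugin name
    host: dbt.internal.example             # logical host used in generated ODDRNs
    odd_catalog_url: https://artifacts.internal.example/dbt/catalog.json   # placeholder URL
```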
Federated ODD Platform (type: odd_adapter)
Pulls metadata from another ODD Platform instance, federating a child platform's catalog into a parent platform.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | — | Operator-chosen unique plugin name. |
| host | string | yes | — | URL of the source ODD service that implements the odd_adapter Ingress API. |
| data_source_oddrn | string | yes | — | The ODDRN to advertise as the federated data source root (e.g., //my_adapter/host/source-platform.internal). |
Source: OddAdapterPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/odd_adapter.yaml.
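A sketch of a federation entry that reuses the ODDRN example from the field description. The plugin name and the host URL (including its port) are assumptions made for illustration.

```yaml
plugins:
  - type: odd_adapter
    name: child_platform                              # hypothetical plugin name
    host: http://source-platform.internal:8080        # placeholder URL of the source ODD service
    data_source_oddrn: //my_adapter/host/source-platform.internal
```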
Machine learning platforms
MLflow (type: mlflow)
Catalogs MLflow experiments, runs, and registered models from the MLflow tracking and model-registry APIs.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | — | Operator-chosen unique plugin name. |
| dev_mode | boolean | no | false | Adapter-side dev mode toggle. |
| tracking_uri | string | yes | — | MLflow tracking server URI. |
| registry_uri | string | yes | — | MLflow model-registry URI (often the same as tracking_uri). |
| filter_experiments | list of string or null | no | null | Allowlist of experiment names. When omitted, every experiment is ingested. Literal name list, not a regex. |
Source: MlflowPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/mlflow.yaml.
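A sketch of an MLflow entry that restricts ingestion to two experiments. The URIs and experiment names are hypothetical; the names are matched literally, as described above.

```yaml
plugins:
  - type: mlflow
    name: mlflow_adapter                                  # hypothetical plugin name
    dev_mode: false
    tracking_uri: http://mlflow.internal.example:5000     # placeholder URI
    registry_uri: http://mlflow.internal.example:5000     # often the same as tracking_uri
    filter_experiments:                                   # literal names, not regexes
      - churn-model
      - fraud-detection
```

Leaving filter_experiments out (or null) would ingest every experiment.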
Feast feature store (type: feast)
Catalogs Feast feature views and entities by reading the Feast repo definition from a path on the collector container.
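| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | — | Operator-chosen unique plugin name. |
| host | string | yes | — | Logical host used for ODDRN generation. |
| repo_path | string | yes | — | Path to a checked-out Feast feature-repo inside the container. |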
Source: FeastPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/feast.yaml.
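A minimal sketch of a Feast entry. The path is a hypothetical location; it assumes the feature repo has been mounted or copied into the collector container at that path.

```yaml
plugins:
  - type: feast
    name: feast_adapter                    # hypothetical plugin name
    host: feast.internal.example           # logical host used in generated ODDRNs
    repo_path: /opt/feast/feature_repo     # placeholder path inside the container
```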
Kubeflow Pipelines (type: kubeflow)
Catalogs Kubeflow pipelines, runs, and the lineage edges between them.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | — | Operator-chosen unique plugin name. |
| host | string | yes | — | Kubeflow Pipelines host (typically the KFP UI URL). |
| namespace | string | yes | — | Kubernetes namespace Kubeflow runs in, not the same as ODD's namespace metadata field. The Kubeflow plugin redeclares namespace as required at the plugin level, which shadows BasePlugin's optional metadata field. |
| session_cookie0 | string or null | yes | — | First half of the KFP session cookie (Istio AuthService split-cookie pattern). The model is Optional[str] with no default: provide a value or null. |
| session_cookie1 | string or null | yes | — | Second half of the KFP session cookie. |
Source: KubeflowPlugin in odd_collector/domain/plugin.py; reference YAML at config_examples/kubeflow.yaml.
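A sketch of a Kubeflow entry showing the split session cookie and the Kubernetes namespace. The host, namespace, and cookie values are placeholders, not values taken from the reference YAML.

```yaml
plugins:
  - type: kubeflow
    name: kubeflow_adapter                     # hypothetical plugin name
    host: https://kubeflow.internal.example    # placeholder KFP UI URL
    namespace: kubeflow-user                   # Kubernetes namespace, not ODD's metadata namespace
    session_cookie0: "<first-cookie-half>"     # key is required; null is accepted
    session_cookie1: "<second-cookie-half>"    # key is required; null is accepted
```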
Per-adapter feature matrix
Cross-cutting capabilities and where they apply across the 41-adapter set:
| Capability | Adapters and fields | Notes |
| --- | --- | --- |
| Regex ingestion filters (Filter) | postgresql.schemas_filter, snowflake.schemas_filter, cockroachdb.schemas_filter (inherited from PostgreSQLPlugin) | Regex include / exclude lists scope which schemas / objects the adapter sees. Defaults to "include everything" when omitted. Source: Filter in odd_collector_sdk/domain/filter.py. |
| Literal-name allowlist filters | redshift.schemas (schemas), databricks.catalogs (Unity Catalog catalogs), mlflow.filter_experiments (experiment names) | Plain list of names the adapter restricts ingestion to. Not a regex: literal exact match. When omitted, the adapter ingests everything visible to the credentials. |
| ERD relationships (foreign keys) | postgresql, snowflake, cockroachdb (via PostgreSQL inheritance) | The adapter emits ENTITY_RELATIONSHIP entities for tables connected by foreign keys, including cross-schema. The platform renders these as ERD edges on dataset detail pages. No other adapter currently extracts foreign-key relationships. |
| TLS toggles on the source connection | clickhouse.secure + clickhouse.verify, mysql.ssl_disabled, singlestore.ssl_disabled, elasticsearch.verify_certs + elasticsearch.ca_certs, opensearch.use_ssl + opensearch.verify_certs + opensearch.ca_certs | Per-adapter knobs for TLS toggling and certificate validation. Defaults are tuned per adapter (see each section above); only override for self-signed certs on local clusters or for unencrypted local development. |
| Token-based auth (PAT / API token alternative to password) | tableau.token_name + tableau.token_value, databricks.token, redash.api_key, fivetran.api_key + fivetran.api_secret, cubejs.token, mlflow (via tracking_uri auth), ckan.token, mode.token, kubeflow.session_cookie0 + kubeflow.session_cookie1 | Replaces username/password auth; required when the source enforces SSO / 2FA / MFA on user accounts. |
| Schema inference via document sampling | couchbase.sample_size + couchbase.num_sample_values | The adapter samples N documents per collection to derive a structural view of fields and value types. Defaults to no sampling (sample_size: 0); set it explicitly to enable. |
| Sub-object connection block (advanced auth surface) | hive.connection_params (full HS2 connection knob set), cubejs.predefined_datasource (postgres / clickhouse only, used to resolve cube-to-source lineage) | Some adapters expose a nested object instead of flat fields when the auth or lineage surface needs more knobs than a flat schema supports. |
| Multiple file paths in one plugin | duckdb.paths | DuckDB accepts a list of .db files or directories of .db files in one plugin; every file is opened independently. Other file-source adapters (sqlite) take a single path. |
| Special operating modes | oracle.thick_mode (Oracle Instant Client vs. pure-Python), cubejs.dev_mode (relax token requirement), mlflow.dev_mode | Adapter-level toggles that alter runtime behaviour or auth strictness; safe defaults are off. |
Other adapters either do not expose filters (the SDK ones don't carry a Filter field) or do not emit relationships. For the filter mechanism's user-facing explanation (include / exclude semantics, when filters apply, default behaviour without filters), see Ingestion filters. The full cross-adapter capability matrix — which adapter exposes which filter, which emits which relationship type — lives on the odd-collectors monorepo README; check that table when planning a new deployment.
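To make the regex filter row concrete, here is a hedged sketch of a schemas_filter block on a PostgreSQL plugin. It assumes the Filter object's keys are named include and exclude, matching the "include / exclude lists" wording above; the patterns and plugin name are hypothetical, and the connection fields are omitted.

```yaml
plugins:
  - type: postgresql
    name: postgres_adapter       # hypothetical plugin name
    # connection fields omitted; see the PostgreSQL section
    schemas_filter:
      include:                   # assumed key name; regex patterns
        - "^analytics_.*"
      exclude:                   # assumed key name; regex patterns
        - ".*_tmp$"
```

Under these assumptions the adapter would see schemas whose names start with analytics_ and skip any that end in _tmp; the Ingestion filters page remains the reference for exact matching semantics.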
Known limitations
- README drift on the source repo: as flagged above, the upstream README's adapter table omits four adapters (databricks, couchbase, opensearch, oracle) that exist in PLUGIN_FACTORY. This is a docs gap on the collector repo, not a missing capability: those four adapters work; they're just under-advertised.
- Foreign-key extraction is PostgreSQL / Snowflake only today (CockroachDB inherits it from the PostgreSQL plugin). ClickHouse, MySQL, MSSQL, and others extract schemas and columns but not foreign-key relationships.
- No per-plugin pulling_interval: every plugin in the file shares default_pulling_interval. Splitting workloads with different cadences requires running multiple collector containers, each with its own config.
- M1 / Apple Silicon build issues: pyodbc, confluent-kafka, and grpcio need extra environment variables to build natively. See the generic collector README → M1 building issue.
- odbc.driver upstream typo: OdbcPlugin.driver defaults to "{ODBC Driver 17s for SQL Server}" (with an extra s); the s should not be in the driver string. Always set driver: explicitly in the plugin config to the registered driver name on your container (e.g., {ODBC Driver 17 for SQL Server} or your platform's equivalent). Without an explicit value, the adapter's connection attempt fails because no ODBC driver matches the typoed string. The reference YAML at config_examples/odbc.yaml uses the correct value, so copy from there rather than relying on the model default.
- Missing upstream config examples for mode and opensearch: both adapters are present in PLUGIN_FACTORY and shipped, but odd-collectors/odd-collector/config_examples/ does not contain a mode.yaml or opensearch.yaml. The per-adapter sections above include hand-crafted YAML examples derived from the Pydantic models for both.
- config_examples/cocroachdb.yaml filename typo: the file containing the CockroachDB reference YAML is cocroachdb.yaml (missing the k). The type literal (cockroachdb) is correct and the file's contents work as-is; only the filename is misspelled.
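For the odbc.driver item, a sketch of the explicit override might look like the fragment below. The plugin name is hypothetical and the remaining connection fields are omitted; the driver string must match whatever driver name is actually registered in your container.

```yaml
plugins:
  - type: odbc
    name: odbc_adapter                           # hypothetical plugin name
    # other connection fields omitted; see config_examples/odbc.yaml
    driver: "{ODBC Driver 17 for SQL Server}"    # explicit value; avoids the typoed model default
```

The value is quoted because an unquoted leading { would be read by YAML as the start of a flow mapping.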
Where to next
- odd-collector-aws: when your source is an AWS managed service.
- odd-collector-azure / odd-collector-gcp: for Azure / GCP.
- odd-collector-profiler: when you want statistical profiles on a Postgres / Azure SQL source.
- Collector secrets backend: to source any field from AWS SSM instead of inline YAML.
- Build and run ODD Collectors: full SDK config reference and from-source build / run instructions.
Last updated