Ingestion filters
Ingestion filters — collector-side regex include / exclude rules that scope what schemas, tables, files, datasets, or pipelines a plugin ingests. Configured per-plugin in `collector_config.yaml`.
Pull adapters in ODD's collectors ingest everything they can see by default — every schema in a database, every file in a bucket, every dataset in a warehouse. Ingestion filters scope a plugin to a slice of that surface using regex include / exclude rules, so an operator can keep the catalog focused on what their teams actually care about.
This page covers the filter mechanism — the per-key shape, how include and exclude interact, and a worked PostgreSQL example. For per-adapter filter coverage (which adapter exposes which filter keys), see the per-collector pages under Integrations.
Not the same as the platform's "ingestion filter". The ODD Platform has a separate, unrelated feature that also carries the name ingestion filter — a token-based authentication gate on the /ingestion/entities endpoint, enabled with the auth.ingestion.filter.enabled setting (off by default). It controls who may push ingestion requests to the platform, not what a collector reads from a source. The filters on this page are collector-side and decide which schemas, tables, files, datasets, or pipelines an adapter ingests; they have nothing to do with authentication. If you came here to secure the ingestion endpoint, see Ingestion authentication instead.
Where filters are configured
Filters live in collector_config.yaml under the per-plugin block — not at the collector level. Each plugin type exposes its own filter keys named after the dimension being filtered:
- schemas_filter: PostgreSQL, Snowflake (filter by database schema).
- filename_filter: S3, Azure Blob Storage, GCS (filter by file path / name).
- datasets_filter: BigQuery (filter by dataset).
- pipeline_filter: Azure Data Factory (filter by pipeline name).
Other adapters expose filters under names that match their domain. The shape — include and exclude regex lists — is consistent across them.
Shape of a filter
Every filter takes two regex lists:
    schemas_filter:
      include: ['regex_1', 'regex_2', ...]
      exclude: ['regex_1', 'regex_2', ...]

- include: the plugin only ingests items matching at least one regex in the list. If include is set and no regex matches, the item is skipped.
- exclude: the plugin skips items matching at least one regex in the list, even if they matched include.
When both lists are set, the rule is "included AND not excluded":
1. The item must match at least one include pattern.
2. The item must match zero exclude patterns.
Either list is optional. Omitting include means "include everything that isn't excluded". Omitting both turns filtering off entirely, which means "ingest everything the adapter can see" (the default).
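The "included AND not excluded" rule can be sketched as a small predicate. This is a minimal illustration of the semantics described above, not the collector's actual implementation; the function name is_ingested and its parameters are hypothetical, and treating an empty list the same as an omitted one is an assumption.

```python
import re
from typing import Iterable, Optional


def is_ingested(
    name: str,
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> bool:
    """Return True if `name` passes an include / exclude regex filter.

    Hypothetical helper: it mirrors the rule described on this page,
    not code taken from odd-collectors.
    """
    include = list(include or [])  # assumption: empty list == omitted
    exclude = list(exclude or [])

    # Rule 1: when include is set, at least one pattern must match.
    if include and not any(re.search(p, name) for p in include):
        return False

    # Rule 2: no exclude pattern may match, even if include matched.
    if any(re.search(p, name) for p in exclude):
        return False

    return True


# Hypothetical schema names, for illustration only.
print(is_ingested("finance", include=["^fin"], exclude=["_old$"]))
print(is_ingested("finance_old", include=["^fin"], exclude=["_old$"]))
print(is_ingested("sales"))  # no filter configured
```

With no lists supplied, the predicate accepts every name, which corresponds to the unfiltered default.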
Patterns are regular expressions, not glob patterns. Anchor with ^ / $ if you need exact-prefix or exact-suffix matching; otherwise the regex matches anywhere in the candidate string.
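The effect of anchoring, and the difference from glob matching, can be seen in a short sketch. It assumes the "matches anywhere" behaviour corresponds to Python's re.search; the candidate names are made up for illustration.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical candidate names.
names = ["prod_copy", "data_in_prod", "reproduce"]


def matching(pattern: str, candidates: list) -> list:
    """Return the candidates in which `pattern` matches anywhere."""
    return [c for c in candidates if re.search(pattern, c)]


unanchored = matching("prod", names)   # substring match: all three names
prefix = matching("^prod", names)      # only names starting with "prod"
suffix = matching("prod$", names)      # only names ending with "prod"

# A glob-looking pattern behaves differently when read as a regex:
# "test*" means "tes" followed by zero or more "t", found anywhere.
as_regex = bool(re.search("test*", "latest_backup"))
as_glob = fnmatchcase("latest_backup", "test*")

print(unanchored, prefix, suffix)
print(as_regex, as_glob)
```

A pattern written with glob habits, such as test*, therefore usually needs rewriting as ^test before it scopes a filter the way the operator intended.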
Worked example — PostgreSQL schemas_filter
Suppose a PostgreSQL source has these schemas:
- test_prod
- application_dev
- data_in_prod
- test_data_in_prod_for_application
With a schemas_filter on the PostgreSQL plugin whose include list is ['test', '^in.*prod'] and whose exclude list is ['prod$'], the plugin processes each schema as follows:
- test_prod → matches include[0] (test) ✓ → matches exclude[0] (prod$) ✗: excluded.
- application_dev → matches no include rule: skipped (not included).
- data_in_prod → matches no include rule: skipped (not included). (Note: ^in.*prod requires the schema name to start with in, which data_in_prod does not.)
- test_data_in_prod_for_application → matches include[0] (test) ✓ → matches no exclude rule ✓: ingested.
Net effect: only test_data_in_prod_for_application is ingested. The other three are filtered out at collection time and never appear in the catalog.
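The walkthrough can be reproduced with a few lines of Python. This is a sketch under assumptions: the patterns are the ones named in the walkthrough (test and ^in.*prod as include patterns, prod$ as the exclude pattern), matching is done with re.search, and the decide function is a hypothetical stand-in for the collector's logic.

```python
import re

# Patterns named in the walkthrough; their exact position in the
# include list beyond include[0] is an assumption.
include = ["test", "^in.*prod"]
exclude = ["prod$"]

schemas = [
    "test_prod",
    "application_dev",
    "data_in_prod",
    "test_data_in_prod_for_application",
]


def decide(name: str) -> str:
    """Classify one schema name the way the walkthrough does."""
    if not any(re.search(p, name) for p in include):
        return "skipped (not included)"
    if any(re.search(p, name) for p in exclude):
        return "excluded"
    return "ingested"


decisions = {schema: decide(schema) for schema in schemas}
for schema, decision in decisions.items():
    print(f"{schema}: {decision}")
```

Running the candidate names through a script like this before editing collector_config.yaml is a cheap way to check that a filter keeps and drops what was intended.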
When filters apply
Filters apply at ingestion time, on the collector side — the platform never sees the filtered-out items. This means:
Filtered-out items consume zero database storage, zero search index, zero entity-page rendering cost. The filter is not a UI hide; it is a non-ingest.
Changing a filter rule and restarting the collector does not retroactively remove already-ingested items. To prune previously-ingested items the operator must also delete them from the platform (manual delete, or a controlled re-ingest after the filter change clears them from the source-truthy set).
Per-source coverage on Management → Datasources reflects what the filter let through — the entity counts there are post-filter.
Default behaviour without filters
When a plugin's filter block is absent or empty, the plugin ingests everything the source exposes. This is the default — for a fresh deployment, every schema, file, or dataset shows up in the catalog until the operator scopes the surface down.
Per-adapter coverage
Most pull adapters that read from sources with multiple "namespaceable" dimensions (schemas, datasets, paths, projects, ...) expose a corresponding filter. The complete adapter-by-adapter capability list lives in the odd-collectors repository's filtering documentation. When in doubt, consult the per-adapter page under Integrations for the exact key name.
Where to next
- odd-collector (generic): the collector hosting most pull adapters that expose ingestion filters.
- Common configuration (collectors) → Beyond connection settings: the integrations hub's brief on per-adapter feature surfaces, including filters.
- Build and run ODD Collectors → Full configuration reference: the canonical reference for the collector-config schema.
- Integrations overview: the bucket landing this page sits under.