odd-collector-azure
Azure-services pull collector — adapters for PowerBI, Azure SQL, Blob Storage, and Data Factory.
Status: Stable. Released as a tagged Docker image alongside the rest of the odd-collectors monorepo.
odd-collector-azure packages adapters for Azure managed services. Like the other pull collectors, it ships as a daemon container that hosts one or more configured plugins; one container can host multiple plugins of any combination of types.
For the broader pull-vs-push picture, start at the Integrations hub. For deployment-side detail, see Build and run ODD Collectors.
Supported adapters
The four adapters registered in PLUGIN_FACTORY (odd_collector_azure/domain/plugin.py):
powerbi: Microsoft PowerBI (workspaces, datasets, reports, dashboards) ✓
azure_sql: Azure SQL Database
blob_storage: Azure Blob Storage ✓
azure_data_factory: Azure Data Factory pipelines
The reference YAML for each adapter lives at odd-collectors/odd-collector-azure/config_examples/. The Pydantic models that define accepted fields live at odd-collector-azure/odd_collector_azure/domain/plugin.py.
Installation
docker pull ghcr.io/opendatadiscovery/odd-collector-azure:latest
Mount a collector_config.yaml at /app/collector_config.yaml. A reference Compose snippet is in the azure collector README.
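A minimal Compose sketch of that mount. It assumes only what is stated here (the image name and the /app/collector_config.yaml path) plus the two secrets that the minimal PowerBI config reads through !ENV tags, supplied from the host shell or a .env file. The service name is hypothetical, and the snippet in the azure collector README remains the reference.

```yaml
# Hypothetical docker-compose.yaml sketch; the service name is illustrative.
services:
  odd-collector-azure:
    image: ghcr.io/opendatadiscovery/odd-collector-azure:latest
    volumes:
      # Mount the collector config at the path the container reads it from.
      - ./collector_config.yaml:/app/collector_config.yaml
    environment:
      # Values are interpolated by Compose from the host shell or .env,
      # then resolved inside the container by the !ENV tags in the config.
      - POWERBI_CLIENT_SECRET=${POWERBI_CLIENT_SECRET}
      - POWERBI_PASSWORD=${POWERBI_PASSWORD}
```

Keeping secrets in the environment rather than in collector_config.yaml lets the same config file be committed or shared without exposing credentials.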
Minimal config
platform_host_url: http://localhost:8080
token: <COLLECTOR_TOKEN>
default_pulling_interval: 10
plugins:
  - type: powerbi
    name: powerbi_main
    client_id: <AAD_APP_CLIENT_ID>
    client_secret: !ENV ${POWERBI_CLIENT_SECRET}
    username: <POWERBI_USER_EMAIL>
    password: !ENV ${POWERBI_PASSWORD}
    domain: yourdomain.com
Multiple plugins in one container
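A sketch of one container hosting two plugins, assuming two databases on the same Azure SQL server. Because one azure_sql plugin scans one database, each database gets its own entry with a unique name. The plugin names, database names, and the AZURE_SQL_PASSWORD variable are hypothetical; entries of different types (for example the powerbi plugin above) can sit in the same plugins list.

```yaml
platform_host_url: http://localhost:8080
token: <COLLECTOR_TOKEN>
default_pulling_interval: 10
plugins:
  # Hypothetical example: one azure_sql plugin per database.
  - type: azure_sql
    name: azure_sql_sales
    server: <server>.database.windows.net
    port: "1433"            # port is str in the Pydantic model, so quote it
    database: sales
    username: <SQL_LOGIN>
    password: !ENV ${AZURE_SQL_PASSWORD}
  - type: azure_sql
    name: azure_sql_hr      # plugin names must be unique
    server: <server>.database.windows.net
    port: "1433"
    database: hr
    username: <SQL_LOGIN>
    password: !ENV ${AZURE_SQL_PASSWORD}
```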
Spotlight: PowerBI (type: powerbi)
Pulls workspaces, datasets, reports, and dashboards from PowerBI via the Azure AD-authenticated REST API.
name (string, required): Operator-chosen unique plugin name.
client_id (string, required): Client ID of the Azure AD app registration.
client_secret (string, required): Client secret of the Azure AD app registration.
username (string, required): PowerBI account email.
password (string, required): PowerBI account password.
domain (string, required): Tenant domain (e.g. yourdomain.com).
Source: PowerBiPlugin / AzurePlugin in odd-collector-azure/.../plugin.py; reference YAML at config_examples/power_bi.yaml.
Spotlight: Blob Storage (type: blob_storage)
Pulls Blob containers and infers dataset schema from the objects (files) inside them. dataset_config follows the same shape as in the AWS S3 plugin: each entry names a container and a prefix that together define one dataset.
name (string, required): Operator-chosen unique plugin name.
account_name (string, required): Storage account name.
account_key (string, Secret; one of account_key / connection_string is required): Account key. Use either account_key or connection_string.
connection_string (string, Secret; one of account_key / connection_string is required): Full connection string from the Azure portal.
dataset_config (list of objects, required): List of { container, prefix } entries.
file_filter.include (list of regex, optional, default [".*"]): File-name patterns to include.
file_filter.exclude (list of regex, optional, default []): File-name patterns to drop after include matches.
Source: BlobPlugin in odd-collector-azure/.../plugin.py; reference YAML at config_examples/blob_storage.yaml.
The legacy datasets: field on BlobPlugin is deprecated and rejected at validation time: if datasets: is present, the Pydantic validator raises an error on startup. Use dataset_config instead.
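A sketch of a blob_storage plugin that uses dataset_config and both filter lists. It assumes file_filter sits at plugin level with nested include / exclude keys, as the dotted field names above suggest. The account, container, prefix, and pattern values are hypothetical.

```yaml
plugins:
  - type: blob_storage
    name: blob_raw_zone                       # hypothetical name
    account_name: <STORAGE_ACCOUNT_NAME>
    # Provide account_key OR connection_string, not both.
    account_key: !ENV ${AZURE_STORAGE_ACCOUNT_KEY}
    # dataset_config, not the rejected legacy "datasets:" field.
    dataset_config:
      - container: raw
        prefix: events/
      - container: raw
        prefix: orders/
    file_filter:
      # Keep Parquet and CSV files...
      include: [".*\\.parquet$", ".*\\.csv$"]
      # ...then drop temporary files among those matches.
      exclude: [".*_tmp.*"]
```

Inside double-quoted YAML strings, \\ yields a single backslash, so the first include pattern reaches the regex engine as .*\.parquet$.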
Spotlight: Azure SQL (type: azure_sql)
Pulls databases, schemas, tables, views, and columns from Azure SQL Database via SQL Server's catalog views.
AzureSQLPlugin does not inherit from AzurePlugin (the AAD-app pattern that PowerBI uses): Azure SQL authenticates with a SQL login (username + password), not with client_id + client_secret + tenant. Use a SQL Server contained user or an AAD password-grant user; AAD interactive and managed-identity authentication are not supported by this adapter.
name (string, required): Operator-chosen unique plugin name.
server (string, required): SQL Server host. For Azure SQL Database use <server>.database.windows.net; for a local instance, localhost.
port (string, required): TCP port, as a string (the Pydantic model declares port: str). The typical Azure SQL value is "1433".
database (string, required): Database to scan; one plugin scans one database.
username (string, required): SQL login.
password (string, required): Password.
encrypt (string, optional, default "yes"): TLS toggle for the SQL connection, "yes" or "no". Azure SQL requires TLS; leave the default unless connecting to an unencrypted local SQL Server.
trust_server_certificate (string, optional, default "no"): Skip certificate validation, "yes" or "no". Set to "yes" only for local development against a self-signed certificate.
connection_timeout (string, optional, default "30"): Driver-level connection timeout in seconds, as a string.
The TLS-related fields (encrypt, trust_server_certificate, connection_timeout) are typed as str in the Pydantic model, and the README expects literal string values such as "yes", "no", and "30" rather than booleans or integers. Quote them in YAML: YAML 1.1 parsers such as PyYAML read unquoted yes / no as booleans and 30 as an integer.
Source: AzureSQLPlugin in odd-collector-azure/.../plugin.py; reference YAML at config_examples/azure_sql.yaml.
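A sketch of an azure_sql plugin for local development against a SQL Server instance with a self-signed certificate, showing the quoted string values for the TLS fields. The plugin name, database placeholder, and SQL_PASSWORD variable are hypothetical; against Azure SQL Database, leave trust_server_certificate at its "no" default.

```yaml
plugins:
  - type: azure_sql
    name: local_sqlserver            # hypothetical name
    server: localhost
    port: "1433"
    database: <DATABASE>
    username: <SQL_LOGIN>
    password: !ENV ${SQL_PASSWORD}
    # String-typed knobs: quoted so they are not parsed as bool / int.
    encrypt: "yes"
    trust_server_certificate: "yes"  # local self-signed cert only
    connection_timeout: "30"
```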
Spotlight: Azure Data Factory (type: azure_data_factory)
Pulls Azure Data Factory pipelines, including pipeline-to-dataset lineage where a pipeline's activities reference an already catalogued dataset.
The adapter authenticates via DefaultAzureCredential: credentials are sourced from the container's environment, not from inline plugin fields. For a service principal, the standard environment variables are AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET. Without them, DefaultAzureCredential falls back to other sources in its chain, such as managed identity when running on Azure infrastructure, or Azure CLI / VS Code sign-ins on a developer machine.
name (string, required): Operator-chosen unique plugin name.
subscription (string, required): Azure subscription ID containing the factory.
resource_group (string, required): Resource group containing the factory.
factory (string, required): Data Factory resource name; one plugin scans one factory.
pipeline_filter.include (list of regex, optional, default [".*"]): Pipeline names to include.
pipeline_filter.exclude (list of regex, optional, default []): Pipeline names to drop after include matches.
Source: DataFactoryPlugin in odd-collector-azure/.../plugin.py; reference YAML at config_examples/azure_data_factory.yaml.
There is no plugin-level client_id / client_secret field on azure_data_factory, so setting them inline in YAML has no effect: the adapter hands authentication to DefaultAzureCredential from azure-identity, which reads credentials from the container's environment. Set AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET on the container, or rely on workload identity (AKS) or managed identity (Container Apps).
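A sketch of the two halves of an azure_data_factory setup: the plugin entry, which carries no credentials, and a Compose fragment that puts the service-principal variables on the container. It assumes pipeline_filter nests include / exclude keys as the dotted field names suggest; the plugin name, exclude pattern, and service name are hypothetical.

```yaml
# collector_config.yaml: no credential fields on this plugin.
plugins:
  - type: azure_data_factory
    name: adf_main                     # hypothetical name
    subscription: <SUBSCRIPTION_ID>
    resource_group: <RESOURCE_GROUP>
    factory: <FACTORY_NAME>
    pipeline_filter:
      include: [".*"]
      exclude: ["^debug_.*"]           # drop debug pipelines after include
```

```yaml
# docker-compose.yaml fragment: read by DefaultAzureCredential at runtime.
services:
  odd-collector-azure:
    image: ghcr.io/opendatadiscovery/odd-collector-azure:latest
    environment:
      - AZURE_TENANT_ID=${AZURE_TENANT_ID}
      - AZURE_CLIENT_ID=${AZURE_CLIENT_ID}
      - AZURE_CLIENT_SECRET=${AZURE_CLIENT_SECRET}
```

When the container runs with a managed or workload identity instead, the environment block can be left out and DefaultAzureCredential picks up that identity.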
Per-adapter feature matrix
Ingestion filters (file_filter): blob_storage. Regex include / exclude lists; the default includes everything.
Ingestion filters (pipeline_filter): azure_data_factory. Regex include / exclude lists that scope which pipelines are catalogued.
DefaultAzureCredential auth chain: azure_data_factory. Reads AZURE_TENANT_ID / AZURE_CLIENT_ID / AZURE_CLIENT_SECRET from the environment, per the DefaultAzureCredential docs.
Connection string OR account key: blob_storage. Provide one of the two; both are not required.
TLS knobs (encrypt, trust_server_certificate, connection_timeout): azure_sql. String-typed; defaults "yes" / "no" / "30".
SQL-login auth (no AAD app): azure_sql. Username + password only; does not accept client_id / client_secret.
Source: PLUGIN_FACTORY in odd-collector-azure/.../plugin.py.
Known limitations
No service-principal-only PowerBI auth. PowerBI requires both an Azure AD app registration (client_id + client_secret) and a user account (username + password); this adapter has no service-principal-only path today. Plan for a user account whose password sign-in keeps working if MFA policies change.
blob_storage.datasets field rejected at validation. Use dataset_config instead.
azure_data_factory requires environment-variable auth. The adapter uses DefaultAzureCredential from azure-identity and does not accept inline credentials in the plugin config; set AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET as environment variables on the container.
No foreign-key / ERD extraction in any Azure adapter.
account_key and connection_string are inline plaintext fields by default. Source them from environment variables (!ENV) or hand-managed secret storage.
Where to next
odd-collector: generic collector for databases, BI, streams.
odd-collector-aws / odd-collector-gcp: sibling cloud collectors.
Build and run ODD Collectors: common SDK schema and from-source build flow.