odd-collector-azure

Azure-services pull collector — adapters for PowerBI, Azure SQL, Blob Storage, and Data Factory.

Status: Stable. Released as a tagged Docker image alongside the rest of the odd-collectors monorepo.

odd-collector-azure packages adapters for Azure managed services. Like the other pull collectors, it ships as a daemon container that hosts one or more configured plugins, in any combination of adapter types.

For the broader pull-vs-push picture, start at the Integrations hub. For deployment-side detail, see Build and run ODD Collectors.

Supported adapters

The 4 adapters registered in odd_collector_azure/domain/plugin.py (PLUGIN_FACTORY):

| Type literal | Azure service | Spotlighted below |
| --- | --- | --- |
| powerbi | Microsoft PowerBI (workspaces, datasets, reports, dashboards) | yes |
| azure_sql | Azure SQL Database | yes |
| blob_storage | Azure Blob Storage | yes |
| azure_data_factory | Azure Data Factory pipelines | yes |

The reference YAML for each adapter lives at odd-collectors/odd-collector-azure/config_examples/. The Pydantic models that define accepted fields live at odd-collector-azure/odd_collector_azure/domain/plugin.py.

Installation

docker pull ghcr.io/opendatadiscovery/odd-collector-azure:latest

Mount a collector_config.yaml at /app/collector_config.yaml. A reference Compose snippet is in the azure collector README.
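
As a rough illustration of the mount described above, a Compose service could look like the sketch below. This is not the README's snippet: the service name, the host-side path ./collector_config.yaml, and the restart policy are assumptions, and the two environment variables are only needed if your config references them through !ENV.

```yaml
# Hypothetical Compose sketch; the README's reference snippet is authoritative.
services:
  odd-collector-azure:            # service name is an assumption
    image: ghcr.io/opendatadiscovery/odd-collector-azure:latest
    restart: always               # assumption: keep the daemon running
    volumes:
      # Host path is an assumption; the container path comes from this page.
      - ./collector_config.yaml:/app/collector_config.yaml
    environment:
      # Passed through from the host shell so !ENV lookups in the config resolve.
      - POWERBI_CLIENT_SECRET=${POWERBI_CLIENT_SECRET}
      - POWERBI_PASSWORD=${POWERBI_PASSWORD}
```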

Minimal config

platform_host_url: http://localhost:8080
token: <COLLECTOR_TOKEN>
default_pulling_interval: 10
plugins:
  - type: powerbi
    name: powerbi_main
    client_id: <AAD_APP_CLIENT_ID>
    client_secret: !ENV ${POWERBI_CLIENT_SECRET}
    username: <POWERBI_ACCOUNT_EMAIL>
    password: !ENV ${POWERBI_PASSWORD}
    domain: yourdomain.com

Multiple plugins in one container
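
Because one container can host any combination of plugin types, a single collector_config.yaml can list several entries under plugins. The sketch below pairs an azure_sql plugin with an azure_data_factory plugin using only the required fields from the spotlight tables on this page; every plugin name, environment variable name, and placeholder value is hypothetical.

```yaml
platform_host_url: http://localhost:8080
token: <COLLECTOR_TOKEN>
default_pulling_interval: 10
plugins:
  # Each entry needs its own unique name; the names here are placeholders.
  - type: azure_sql
    name: azure_sql_main
    server: <server>.database.windows.net
    port: "1433"                  # string-typed in the Pydantic model
    database: <DATABASE_NAME>
    username: <SQL_LOGIN>
    password: !ENV ${AZURE_SQL_PASSWORD}   # hypothetical variable name
  - type: azure_data_factory
    name: adf_main
    subscription: <AZURE_SUBSCRIPTION_ID>
    resource_group: <RESOURCE_GROUP>
    factory: <FACTORY_NAME>
```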

Spotlight: PowerBI (type: powerbi)

Pulls workspaces, datasets, reports, and dashboards from PowerBI via the Azure AD-authenticated REST API.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| client_id | string | yes | | Client ID of the Azure AD app registration. |
| client_secret | string | yes | | Client secret of the Azure AD app registration. |
| username | string | yes | | PowerBI account email. |
| password | string | yes | | PowerBI account password. |
| domain | string | yes | | Tenant domain (e.g. yourdomain.com). |

Source: PowerBiPlugin / AzurePlugin in odd-collector-azure/.../plugin.py; reference YAML at config_examples/power_bi.yaml.

Spotlight: Blob Storage (type: blob_storage)

Pulls Blob containers and infers dataset schema from objects (files) inside them. The shape of dataset_config mirrors the AWS S3 plugin — the expected list pattern is container + prefix per dataset.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| account_name | string | yes | | Storage account name. |
| account_key | string (Secret) | one-of | | Account key. Use either account_key or connection_string. |
| connection_string | string (Secret) | one-of | | Full connection string from the Azure portal. |
| dataset_config | list of objects | yes | | List of { container, prefix } entries. |
| file_filter.include | list of regex | no | [".*"] | File-name patterns to include. |
| file_filter.exclude | list of regex | no | [] | File-name patterns to drop after include matches. |

Source: BlobPlugin in odd-collector-azure/.../plugin.py; reference YAML at config_examples/blob_storage.yaml.
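
A plugin entry built from the fields above might look like the following sketch. It assumes the dotted names file_filter.include and file_filter.exclude correspond to nested YAML keys, and it follows the table's description of dataset_config as a list of container + prefix entries. The container names, prefixes, patterns, and environment variable name are hypothetical; config_examples/blob_storage.yaml remains the reference.

```yaml
plugins:
  - type: blob_storage
    name: blob_main
    account_name: <STORAGE_ACCOUNT_NAME>
    # Supply either account_key or connection_string, not necessarily both.
    account_key: !ENV ${AZURE_STORAGE_ACCOUNT_KEY}   # hypothetical variable name
    dataset_config:
      - container: raw-data       # hypothetical container
        prefix: events/
      - container: curated
        prefix: reports/2024/
    file_filter:
      # Single quotes keep the backslash literal for the regex engine.
      include: ['.*\.parquet$', '.*\.csv$']
      exclude: ['.*_tmp.*']       # dropped after include matches
```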

Spotlight: Azure SQL (type: azure_sql)

Pulls databases, schemas, tables, views, and columns from Azure SQL Database via SQL Server's catalog views.

AzureSQLPlugin does not inherit from AzurePlugin (the AAD-app pattern that PowerBI uses): Azure SQL authenticates with a SQL login (username + password), not with client_id + client_secret + tenant. Use a SQL Server contained user or an AAD password-grant user; AAD interactive and managed-identity authentication are not supported by this adapter.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| server | string | yes | | SQL Server host. For Azure SQL Database use <server>.database.windows.net; for a local instance, localhost. |
| port | string | yes | | TCP port, as a string (the Pydantic model is port: str). Typical Azure SQL value is "1433". |
| database | string | yes | | Database to scan; one plugin = one database. |
| username | string | yes | | SQL login. |
| password | string | yes | | Password. |
| encrypt | string | no | "yes" | TLS toggle for the SQL connection: "yes" / "no". Azure SQL requires TLS; leave at default unless connecting to an unencrypted local SQL Server. |
| trust_server_certificate | string | no | "no" | Skip certificate validation: "yes" / "no". Set to "yes" only for local development against a self-signed cert. |
| connection_timeout | string | no | "30" | Driver-level connection timeout, in seconds, as a string. |

The connection-option fields (encrypt, trust_server_certificate, connection_timeout) are typed as str in the Pydantic model, and the README expects literal string values like "yes" / "no" / "30" rather than booleans or integers. Quote them in YAML: YAML 1.1 loaders read an unquoted yes or no as a boolean and an unquoted 30 as an integer.

Source: AzureSQLPlugin in odd-collector-azure/.../plugin.py; reference YAML at config_examples/azure_sql.yaml.
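
To make the quoting point concrete, here is a sketch of an azure_sql entry that sets the optional connection fields explicitly, using the defaults from the table. The plugin name, placeholder values, and environment variable name are hypothetical; config_examples/azure_sql.yaml is the reference.

```yaml
plugins:
  - type: azure_sql
    name: azure_sql_sales
    server: <server>.database.windows.net
    port: "1433"                      # quoted: the model expects a string
    database: <DATABASE_NAME>
    username: <SQL_LOGIN>
    password: !ENV ${AZURE_SQL_PASSWORD}   # hypothetical variable name
    # Optional fields, shown at their documented defaults and quoted so the
    # YAML loader keeps them as strings instead of booleans or integers.
    encrypt: "yes"
    trust_server_certificate: "no"
    connection_timeout: "30"
```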

Spotlight: Azure Data Factory (type: azure_data_factory)

Pulls Azure Data Factory pipelines, including pipeline-to-dataset lineage where the pipeline's activities reference a known catalogued dataset.

The adapter authenticates via DefaultAzureCredential: credentials are sourced from the container's environment, not from inline plugin fields. For a service principal, set AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET. If those are absent, DefaultAzureCredential falls back through the rest of its chain, for example managed identity when running on Azure infrastructure, or Azure CLI or VS Code credentials on a developer machine.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Operator-chosen unique plugin name. |
| subscription | string | yes | | Azure subscription ID containing the factory. |
| resource_group | string | yes | | Resource group containing the factory. |
| factory | string | yes | | Data Factory resource name; one plugin = one factory. |
| pipeline_filter.include | list of regex | no | [".*"] | Pipeline names to include. |
| pipeline_filter.exclude | list of regex | no | [] | Pipeline names to drop after include matches. |

Source: DataFactoryPlugin in odd-collector-azure/.../plugin.py; reference YAML at config_examples/azure_data_factory.yaml.
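
The entry below sketches how the fields above could be combined, assuming pipeline_filter.include and pipeline_filter.exclude map to nested YAML keys. Note that no credentials appear in the plugin: AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET would be set on the container instead. The plugin name and the regex patterns are hypothetical.

```yaml
plugins:
  - type: azure_data_factory
    name: adf_ingest
    subscription: <AZURE_SUBSCRIPTION_ID>
    resource_group: <RESOURCE_GROUP>
    factory: <FACTORY_NAME>          # one plugin = one factory
    pipeline_filter:
      include: ['^ingest_.*']        # catalogue only pipelines with this prefix
      exclude: ['.*_sandbox$']       # then drop sandbox pipelines
```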

Per-adapter feature matrix

| Feature | Where it applies |
| --- | --- |
| Ingestion filters (file_filter) | blob_storage. Regex include / exclude lists; default includes everything. |
| Ingestion filters (pipeline_filter) | azure_data_factory. Regex include / exclude lists scoping which pipelines are catalogued. |
| DefaultAzureCredential auth chain | azure_data_factory. Reads AZURE_TENANT_ID / AZURE_CLIENT_ID / AZURE_CLIENT_SECRET from the environment per the DefaultAzureCredential docs. |
| Connection string OR account key | blob_storage. Supply one of the two; both are not required. |
| TLS and timeout knobs (encrypt, trust_server_certificate, connection_timeout) | azure_sql. String-typed; defaults "yes" / "no" / "30". |
| SQL-login auth (no AAD app) | azure_sql. Username + password only; does not accept client_id / client_secret. |

Source: PLUGIN_FACTORY in odd-collector-azure/.../plugin.py.

Known limitations

  • No service-principal-only PowerBI auth. The PowerBI adapter requires both an Azure AD app registration (client_id + client_secret) and a user account (username + password); there is no service-principal-only path on this adapter today. Plan for an account that survives MFA changes.

  • blob_storage rejects a datasets field at validation. Use dataset_config (singular) instead.

  • azure_data_factory requires environment-variable auth. The adapter uses DefaultAzureCredential from azure-identity and does not accept inline credentials in the plugin config — set AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET as environment variables on the container.

  • No foreign-key / ERD extraction in any Azure adapter.

  • account_key and connection_string are inline plaintext fields by default — source them from environment variables (!ENV) or hand-managed secret storage.
