# odd-collector-aws

{% hint style="info" %}
**Status: Stable.** Released as a tagged Docker image alongside the rest of the `odd-collectors` monorepo.
{% endhint %}

`odd-collector-aws` packages adapters for AWS managed services. Like the other pull collectors, it ships as a daemon container, and a single container can host any number of configured plugins in any combination of types.

For the broader pull-vs-push picture, start at the [Integrations hub](/integrations/integrations.md). For deployment-side detail (build, Docker, env vars), see [Build and run ODD Collectors](/developer-guides/build-and-run/build-and-run-odd-collectors.md).

## Supported adapters

The collector registers 11 adapters in `odd_collector_aws/domain/plugin.py` (`PLUGIN_FACTORY`). Every adapter has per-field documentation below: two (`glue`, `s3`) get longer spotlights with deployment guidance and feature notes, and the remaining 9 are catalogued in the [per-adapter configuration reference](#per-adapter-configuration-reference) section.

| Type literal             | AWS service                    | Spotlighted below |
| ------------------------ | ------------------------------ | ----------------- |
| `athena`                 | Amazon Athena                  |                   |
| `dms`                    | AWS Database Migration Service |                   |
| `dynamodb`               | DynamoDB                       |                   |
| `glue`                   | AWS Glue Data Catalog          | ✓                 |
| `kinesis`                | Amazon Kinesis                 |                   |
| `quicksight`             | Amazon QuickSight              |                   |
| `s3`                     | Amazon S3 (object catalog)     | ✓                 |
| `s3_delta`               | Amazon S3 — Delta Lake tables  |                   |
| `sagemaker`              | Amazon SageMaker               |                   |
| `sagemaker_featurestore` | SageMaker Feature Store        |                   |
| `sqs`                    | Amazon SQS                     |                   |

The reference YAML for each adapter lives at [`odd-collectors/odd-collector-aws/config_examples/`](https://github.com/opendatadiscovery/odd-collectors/tree/main/odd-collector-aws/config_examples). The Pydantic models that define accepted fields live at [`odd-collector-aws/odd_collector_aws/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/odd_collector_aws/domain/plugin.py).

## Common AWS authentication

Every adapter inherits from `AwsPlugin` and accepts the same set of optional AWS auth fields. When unset, the underlying `boto3` client falls back to its standard credential chain — environment variables, `~/.aws/credentials`, EC2 / EKS instance profile, etc.

| Field                   | Type   | Default | Description                                                                          |
| ----------------------- | ------ | ------- | ------------------------------------------------------------------------------------ |
| `aws_access_key_id`     | string | `None`  | Static access key.                                                                   |
| `aws_secret_access_key` | string | `None`  | Static secret key.                                                                   |
| `aws_session_token`     | string | `None`  | Required when using temporary credentials.                                           |
| `aws_region`            | string | `None`  | AWS region. Required for region-bound services when no environment default exists.   |
| `aws_account_id`        | string | `None`  | Account ID. Required by `kinesis`; `sagemaker` requires the key to be present (the value may be `null`). |
| `profile_name`          | string | `None`  | Named profile from `~/.aws/credentials`.                                             |
| `aws_role_arn`          | string | `None`  | Role ARN to assume.                                                                  |
| `aws_role_session_name` | string | `None`  | Session name for the assumed role.                                                   |
| `endpoint_url`          | string | `None`  | Override the AWS endpoint — used for LocalStack and S3-compatible stores like MinIO. |

Source: [`AwsPlugin` base in `odd-collector-aws/odd_collector_aws/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/odd_collector_aws/domain/plugin.py).

{% hint style="warning" %}
The container image reads credentials from environment variables by convention (`AWS_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`). Inline YAML credentials work, but IAM-role-based deployments typically leave them empty; the [reference Compose file](https://github.com/opendatadiscovery/odd-collectors/tree/main/odd-collector-aws#docker-compose-example) passes them as environment variables only. Prefer IAM roles over static keys in production.
{% endhint %}
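
As a rough sketch of how the optional auth fields from the table above can be combined, the two entries below show a named profile and a set of temporary credentials. The plugin names and the profile name are placeholders, and `AWS_SESSION_TOKEN` is assumed to be exported in the collector's environment; the `!ENV` tag is the same environment-substitution syntax used in the other examples on this page.

```yaml
plugins:
  # Option 1: named profile from ~/.aws/credentials (profile name is a placeholder).
  - type: glue
    name: glue_profile
    aws_region: eu-central-1
    profile_name: odd-reader
  # Option 2: temporary credentials; the session token accompanies the key pair.
  - type: athena
    name: athena_temp_creds
    aws_region: eu-central-1
    aws_access_key_id: !ENV ${AWS_ACCESS_KEY_ID}
    aws_secret_access_key: !ENV ${AWS_SECRET_ACCESS_KEY}
    aws_session_token: !ENV ${AWS_SESSION_TOKEN}
```

For option 1, the credentials file has to be readable inside the container, for example through a read-only volume mount.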

## Installation

```bash
docker pull ghcr.io/opendatadiscovery/odd-collector-aws:latest
```

Mount a `collector_config.yaml` at `/app/collector_config.yaml`. A reference Compose snippet is in the [aws collector README](https://github.com/opendatadiscovery/odd-collectors/tree/main/odd-collector-aws#docker-compose-example).
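
A minimal Compose sketch of the mount described above might look like the following. It is not a copy of the reference Compose file: the service name and the restart policy are illustrative choices, and the environment variables are the three named in the authentication section, assumed to be set in the host shell or an `.env` file.

```yaml
services:
  odd-collector-aws:                 # service name is illustrative
    image: ghcr.io/opendatadiscovery/odd-collector-aws:latest
    restart: unless-stopped          # illustrative choice, not taken from the reference file
    volumes:
      # Mount the config at the path the collector reads, read-only.
      - ./collector_config.yaml:/app/collector_config.yaml:ro
    environment:
      # Passed through from the host shell or an .env file.
      - AWS_REGION=${AWS_REGION}
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
```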

## Minimal config

```yaml
platform_host_url: http://localhost:8080
token: <COLLECTOR_TOKEN>
default_pulling_interval: 10
plugins:
  - type: glue
    name: glue_main
    aws_region: eu-central-1
    # Static keys optional — falls back to the boto3 default credential chain.
```

## Multiple plugins in one container

A single `odd-collector-aws` instance commonly fans out across multiple AWS accounts or regions:

```yaml
plugins:
  - type: glue
    name: glue_eu
    aws_region: eu-central-1
    aws_role_arn: arn:aws:iam::111111111111:role/odd-reader
    aws_role_session_name: odd
  - type: glue
    name: glue_us
    aws_region: us-east-1
    aws_role_arn: arn:aws:iam::222222222222:role/odd-reader
    aws_role_session_name: odd
  - type: s3
    name: data_lake
    aws_region: eu-central-1
    dataset_config:
      bucket: my-data-lake
      prefix: gold/
```

Plugin `name` must be unique within the file.

## Spotlight: Glue (`type: glue`)

Pulls the AWS Glue Data Catalog — databases, tables, columns, partition keys.

| Field           | Type   | Required    | Default | Description                                      |
| --------------- | ------ | ----------- | ------- | ------------------------------------------------ |
| `name`          | string | yes         | —       | Operator-chosen unique plugin name.              |
| `aws_region`    | string | recommended | `None`  | AWS region. Glue is region-bound.                |
| AWS auth fields | —      | —           | —       | See the common AWS authentication section above. |

Source: [`GluePlugin` in `odd-collector-aws/.../plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/odd_collector_aws/domain/plugin.py); reference YAML at [`config_examples/glue.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/config_examples/glue.yaml).

```yaml
plugins:
  - type: glue
    name: glue_main
    aws_region: eu-central-1
    aws_access_key_id: !ENV ${AWS_ACCESS_KEY_ID}
    aws_secret_access_key: !ENV ${AWS_SECRET_ACCESS_KEY}
```

## Spotlight: S3 (`type: s3`)

Pulls a curated set of S3 objects (or folders treated as datasets) and infers their schema. Supports CSV, TSV, and Parquet files, including Hive-style partitioned folders.

| Field                              | Type          | Required | Default  | Description                                                                     |
| ---------------------------------- | ------------- | -------- | -------- | ------------------------------------------------------------------------------- |
| `name`                             | string        | yes      | —        | Operator-chosen unique plugin name.                                             |
| `dataset_config.bucket`            | string        | yes      | —        | S3 bucket name.                                                                 |
| `dataset_config.prefix`            | string        | no       | empty    | Path prefix inside the bucket.                                                  |
| `dataset_config.folder_as_dataset` | object        | no       | —        | Treat a folder as a single partitioned dataset (see partitioned example below). |
| `endpoint_url`                     | string        | no       | `None`   | Override for S3-compatible stores (MinIO, LocalStack).                          |
| `filename_filter.include`          | list of regex | no       | `[".*"]` | Object names to include.                                                        |
| `filename_filter.exclude`          | list of regex | no       | `[]`     | Object names to drop after `include` matches.                                   |
| AWS auth fields                    | —             | —        | —        | See the common AWS authentication section above.                                |

Source: [`S3Plugin` in `odd-collector-aws/.../plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/odd_collector_aws/domain/plugin.py); reference YAML at [`config_examples/s3.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/config_examples/s3.yaml).

```yaml
plugins:
  # Single object as a dataset.
  - type: s3
    name: orders_csv
    aws_region: eu-central-1
    dataset_config:
      bucket: my_bucket
      prefix: folder/subfolder/orders.csv
    filename_filter:
      include: [".*\\.csv$"]
      exclude: ["dev_.*"]
  # Hive-partitioned folder treated as one dataset.
  - type: s3
    name: events_partitioned
    aws_region: eu-central-1
    dataset_config:
      bucket: my_bucket
      prefix: events/
      folder_as_dataset:
        file_format: parquet
        flavor: hive
  # MinIO (S3-compatible) using endpoint_url.
  - type: s3
    name: dev_minio
    endpoint_url: http://localhost:9000
    aws_access_key_id: minioadmin
    aws_secret_access_key: minioadmin
    dataset_config:
      bucket: dev-bucket
```

{% hint style="warning" %}
The legacy `datasets:` field on `S3Plugin` is no longer supported: the Pydantic validator rejects it. Use `dataset_config` (singular), as the reference YAML does.
{% endhint %}

## Per-adapter configuration reference

The two spotlights above cover the deployment-shape questions; this section enumerates the per-field config schema for the remaining 9 adapters. Field names, types, and defaults are sourced from the Pydantic plugin classes in [`odd_collector_aws/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/odd_collector_aws/domain/plugin.py); each adapter links to its `config_examples/{type}.yaml` reference YAML.

Every adapter inherits from `AwsPlugin` (documented under [Common AWS authentication](#common-aws-authentication) above) and therefore accepts the standard AWS auth fields `aws_access_key_id`, `aws_secret_access_key`, `aws_session_token`, `aws_region`, `aws_account_id`, `profile_name`, `aws_role_arn`, `aws_role_session_name`, and `endpoint_url`. The per-adapter tables below list **only adapter-specific fields** and call out which AWS auth fields a given service requires in practice.

### Amazon Athena (`type: athena`)

Catalogs Athena workgroups, databases, tables, and views.

The plugin declares no fields beyond the `AwsPlugin` base — `aws_region` is required in practice (Athena is region-bound) and the rest of the AWS auth set follows the boto3 default credential chain when unset.

| Field           | Type   | Required | Default | Description                                                                                        |
| --------------- | ------ | -------- | ------- | -------------------------------------------------------------------------------------------------- |
| `name`          | string | yes      | —       | Operator-chosen unique plugin name.                                                                |
| AWS auth fields | —      | —        | —       | See [Common AWS authentication](#common-aws-authentication). `aws_region` is required in practice. |

Source: [`AthenaPlugin` in `odd_collector_aws/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/odd_collector_aws/domain/plugin.py); reference YAML at [`config_examples/athena.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/config_examples/athena.yaml).

```yaml
plugins:
  - type: athena
    name: athena_main
    aws_region: eu-central-1
    aws_access_key_id: !ENV ${AWS_ACCESS_KEY_ID}
    aws_secret_access_key: !ENV ${AWS_SECRET_ACCESS_KEY}
```

### AWS Database Migration Service (`type: dms`)

Catalogs DMS replication instances, endpoints, and tasks.

| Field           | Type   | Required | Default | Description                                                                                        |
| --------------- | ------ | -------- | ------- | -------------------------------------------------------------------------------------------------- |
| `name`          | string | yes      | —       | Operator-chosen unique plugin name.                                                                |
| AWS auth fields | —      | —        | —       | See [Common AWS authentication](#common-aws-authentication). `aws_region` is required in practice. |

Source: [`DmsPlugin` in `odd_collector_aws/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/odd_collector_aws/domain/plugin.py); reference YAML at [`config_examples/dms.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/config_examples/dms.yaml).

```yaml
plugins:
  - type: dms
    name: dms_main
    aws_region: eu-central-1
    aws_access_key_id: !ENV ${AWS_ACCESS_KEY_ID}
    aws_secret_access_key: !ENV ${AWS_SECRET_ACCESS_KEY}
```

### DynamoDB (`type: dynamodb`)

Catalogs DynamoDB tables and infers attribute types from a row sample. The adapter scopes to one region per plugin.

| Field            | Type                   | Required | Default | Description                                                                                                                                                                        |
| ---------------- | ---------------------- | -------- | ------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`           | string                 | yes      | —       | Operator-chosen unique plugin name.                                                                                                                                                |
| `exclude_tables` | list of string or null | no       | `[]`    | Literal table-name list to skip (e.g., to exclude internal / staging tables). Plain name match — not regex.                                                                        |
| AWS auth fields  | —                      | —        | —       | See [Common AWS authentication](#common-aws-authentication). `aws_region` is required in practice; `endpoint_url` works for LocalStack and other DynamoDB-compatible local stores. |

Source: [`DynamoDbPlugin` in `odd_collector_aws/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/odd_collector_aws/domain/plugin.py); reference YAML at [`config_examples/dynamodb.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/config_examples/dynamodb.yaml).

```yaml
plugins:
  - type: dynamodb
    name: dynamodb_main
    aws_region: eu-central-1
    aws_access_key_id: !ENV ${AWS_ACCESS_KEY_ID}
    aws_secret_access_key: !ENV ${AWS_SECRET_ACCESS_KEY}
    exclude_tables: ["staging_audit", "tmp_migration"]
```
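
For local testing, the `endpoint_url` note in the table above could translate into an entry like the one below. This is a sketch under assumptions: the plugin name is a placeholder, `4566` is LocalStack's default edge port, and the dummy `test` credentials rely on LocalStack not validating credentials in its default configuration.

```yaml
plugins:
  - type: dynamodb
    name: dynamodb_localstack             # hypothetical plugin name
    aws_region: us-east-1
    endpoint_url: http://localhost:4566   # LocalStack's default edge port
    aws_access_key_id: test               # dummy value, assumed not to be validated
    aws_secret_access_key: test
    exclude_tables: ["tmp_migration"]
```

When the collector itself runs in a container, `localhost` refers to that container, so the endpoint host usually needs to be the LocalStack service name or host address instead.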

### Amazon Kinesis (`type: kinesis`)

Catalogs Kinesis streams in one account / region per plugin.

| Field            | Type   | Required | Default | Description                                                                                                                                                                                              |
| ---------------- | ------ | -------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`           | string | yes      | —       | Operator-chosen unique plugin name.                                                                                                                                                                      |
| `aws_account_id` | string | **yes**  | —       | **Required for `kinesis`.** Other adapters inherit `aws_account_id` as `Optional[str]` from `AwsPlugin`; `KinesisPlugin` redeclares it as required. The collector errors out on startup if it isn't set. |
| AWS auth fields  | —      | —        | —       | See [Common AWS authentication](#common-aws-authentication). `aws_region` is required in practice.                                                                                                       |

Source: [`KinesisPlugin` in `odd_collector_aws/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/odd_collector_aws/domain/plugin.py); reference YAML at [`config_examples/kinesis.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/config_examples/kinesis.yaml).

```yaml
plugins:
  - type: kinesis
    name: kinesis_main
    aws_region: eu-central-1
    aws_account_id: "123456789012"
    aws_access_key_id: !ENV ${AWS_ACCESS_KEY_ID}
    aws_secret_access_key: !ENV ${AWS_SECRET_ACCESS_KEY}
```

### Amazon QuickSight (`type: quicksight`)

Catalogs QuickSight datasets, dashboards, and analyses.

| Field           | Type   | Required | Default | Description                                                                                        |
| --------------- | ------ | -------- | ------- | -------------------------------------------------------------------------------------------------- |
| `name`          | string | yes      | —       | Operator-chosen unique plugin name.                                                                |
| AWS auth fields | —      | —        | —       | See [Common AWS authentication](#common-aws-authentication). `aws_region` is required in practice. |

Source: [`QuicksightPlugin` in `odd_collector_aws/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/odd_collector_aws/domain/plugin.py); reference YAML at [`config_examples/quicksight.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/config_examples/quicksight.yaml).

```yaml
plugins:
  - type: quicksight
    name: quicksight_main
    aws_region: eu-central-1
    aws_access_key_id: !ENV ${AWS_ACCESS_KEY_ID}
    aws_secret_access_key: !ENV ${AWS_SECRET_ACCESS_KEY}
```

### S3 Delta Lake (`type: s3_delta`)

Catalogs Delta Lake tables stored in S3 (or any S3-compatible storage). The adapter reads the Delta `_delta_log/` to recover the table's evolved schema rather than inferring it from the underlying Parquet files.

| Field                         | Type                    | Required | Default  | Description                                                                                                                                                                                                                                                        |
| ----------------------------- | ----------------------- | -------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `name`                        | string                  | yes      | —        | Operator-chosen unique plugin name.                                                                                                                                                                                                                                |
| `delta_tables`                | object                  | yes      | —        | A **single** Delta-table descriptor (`bucket`, `prefix`, optional `filter` and `scheme`). This is one object, not a list, unlike `gcs_delta` on the GCP collector, which takes a list. To catalog multiple Delta tables, use multiple `s3_delta` plugin entries. |
| `delta_tables.bucket`         | string                  | yes      | —        | S3 bucket name.                                                                                                                                                                                                                                                    |
| `delta_tables.prefix`         | string                  | yes      | —        | Path prefix inside the bucket pointing at the Delta table root (the directory containing `_delta_log/`).                                                                                                                                                           |
| `delta_tables.filter.include` | list of regex           | no       | `[".*"]` | Per-table regex include list applied during enumeration.                                                                                                                                                                                                           |
| `delta_tables.filter.exclude` | list of regex           | no       | `[]`     | Per-table regex exclude list.                                                                                                                                                                                                                                      |
| `delta_tables.scheme`         | string (alias `schema`) | no       | `"s3"`   | Storage scheme. Defaults to `s3` — override only when pointing the adapter at a non-S3 Delta location. The model accepts `schema` as an alias for backward compatibility.                                                                                          |
| `endpoint_url`                | string or null          | no       | `null`   | Override for S3-compatible endpoints (LocalStack, MinIO). Re-declared on `S3DeltaPlugin` over the inherited `AwsPlugin` field for clarity.                                                                                                                         |
| `aws_storage_allow_http`      | boolean or null         | no       | `false`  | Permit plain-HTTP access to the storage backend. Set to `true` for local MinIO / LocalStack deployments using `http://`; leave `false` for production S3.                                                                                                          |
| AWS auth fields               | —                       | —        | —        | See [Common AWS authentication](#common-aws-authentication).                                                                                                                                                                                                       |

Source: [`S3DeltaPlugin` and `DeltaTableConfig` in `odd_collector_aws/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/odd_collector_aws/domain/plugin.py); reference YAML at [`config_examples/s3_delta.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/config_examples/s3_delta.yaml).

```yaml
plugins:
  # Production S3.
  - type: s3_delta
    name: lake_delta_prod
    aws_region: eu-central-1
    aws_access_key_id: !ENV ${AWS_ACCESS_KEY_ID}
    aws_secret_access_key: !ENV ${AWS_SECRET_ACCESS_KEY}
    delta_tables:
      bucket: data-lake
      prefix: gold/orders/
      filter:
        include: ["events"]
        exclude: ["_pii"]
  # MinIO (S3-compatible) for local dev.
  - type: s3_delta
    name: lake_delta_minio
    endpoint_url: http://localhost:9000
    aws_storage_allow_http: true
    aws_access_key_id: minioadmin
    aws_secret_access_key: minioadmin
    delta_tables:
      bucket: dev-bucket
      prefix: delta_data
```
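
Because `delta_tables` holds a single descriptor, cataloging several tables means repeating the plugin entry. A minimal sketch, with placeholder plugin names, bucket, and prefixes:

```yaml
plugins:
  # One s3_delta entry per Delta table root; names must stay unique.
  - type: s3_delta
    name: lake_delta_orders
    aws_region: eu-central-1
    delta_tables:
      bucket: data-lake
      prefix: gold/orders/
  - type: s3_delta
    name: lake_delta_customers
    aws_region: eu-central-1
    delta_tables:
      bucket: data-lake
      prefix: gold/customers/
```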

### Amazon SageMaker (`type: sagemaker`)

Catalogs SageMaker experiments, trials, and model artifacts.

{% hint style="warning" %}
**`SagemakerPlugin` re-declares the AWS auth fields and `experiments` without defaults**, which makes them required in Pydantic. The adapter will not start unless the YAML contains the keys `aws_access_key_id`, `aws_secret_access_key`, `aws_region`, `aws_session_token`, `aws_account_id`, and `experiments`. Each value may be `null` (to fall back to the boto3 credential chain, or to ingest every experiment), but the key itself must be present. No other AWS adapter in the collector behaves this way.
{% endhint %}

| Field                                                                   | Type                   | Required | Default | Description                                                                                                                                                                                                            |
| ----------------------------------------------------------------------- | ---------------------- | -------- | ------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`                                                                  | string                 | yes      | —       | Operator-chosen unique plugin name.                                                                                                                                                                                    |
| `aws_secret_access_key`                                                 | string or null         | yes      | —       | Required at the model level (re-declared without default).                                                                                                                                                             |
| `aws_access_key_id`                                                     | string or null         | yes      | —       | Required at the model level (re-declared without default).                                                                                                                                                             |
| `aws_region`                                                            | string or null         | yes      | —       | Required at the model level (re-declared without default).                                                                                                                                                             |
| `aws_session_token`                                                     | string or null         | yes      | —       | Required at the model level (re-declared without default).                                                                                                                                                             |
| `aws_account_id`                                                        | string or null         | yes      | —       | Required at the model level (re-declared without default).                                                                                                                                                             |
| `experiments`                                                           | list of string or null | yes      | —       | Allowlist of SageMaker experiment names to scope ingestion. The model is `Optional[list[str]]` with no default — pass an explicit list to scope or `null` to ingest every experiment. Literal name list — not a regex. |
| `profile_name`, `aws_role_arn`, `aws_role_session_name`, `endpoint_url` | —                      | —        | —       | Inherited from `AwsPlugin`; see [Common AWS authentication](#common-aws-authentication).                                                                                                                               |

Source: [`SagemakerPlugin` in `odd_collector_aws/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/odd_collector_aws/domain/plugin.py); reference YAML at [`config_examples/sagemaker.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/config_examples/sagemaker.yaml).

```yaml
plugins:
  - type: sagemaker
    name: sagemaker_main
    aws_region: eu-central-1
    aws_access_key_id: !ENV ${AWS_ACCESS_KEY_ID}
    aws_secret_access_key: !ENV ${AWS_SECRET_ACCESS_KEY}
    aws_session_token: null
    aws_account_id: "123456789012"
    experiments: ["churn_v2", "fraud_detection"]
```
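
If the intent is to rely on the boto3 credential chain and ingest every experiment, the required keys can be passed as explicit `null` values, as the warning above describes. A sketch with a placeholder plugin name; the region is set explicitly here only so the example does not depend on an environment default:

```yaml
plugins:
  - type: sagemaker
    name: sagemaker_all_experiments   # hypothetical plugin name
    aws_region: eu-central-1
    # Keys must be present; null defers to the boto3 credential chain.
    aws_access_key_id: null
    aws_secret_access_key: null
    aws_session_token: null
    aws_account_id: null
    experiments: null                 # null = ingest every experiment
```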

### SageMaker Feature Store (`type: sagemaker_featurestore`)

Catalogs SageMaker Feature Store feature groups and feature definitions.

| Field           | Type   | Required | Default | Description                                                                                        |
| --------------- | ------ | -------- | ------- | -------------------------------------------------------------------------------------------------- |
| `name`          | string | yes      | —       | Operator-chosen unique plugin name.                                                                |
| AWS auth fields | —      | —        | —       | See [Common AWS authentication](#common-aws-authentication). `aws_region` is required in practice. |

Source: [`SagemakerFeaturestorePlugin` in `odd_collector_aws/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/odd_collector_aws/domain/plugin.py); reference YAML at [`config_examples/sagemaker_featurestore.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/config_examples/sagemaker_featurestore.yaml).

```yaml
plugins:
  - type: sagemaker_featurestore
    name: sagemaker_features
    aws_region: eu-central-1
    aws_access_key_id: !ENV ${AWS_ACCESS_KEY_ID}
    aws_secret_access_key: !ENV ${AWS_SECRET_ACCESS_KEY}
```

### Amazon SQS (`type: sqs`)

Catalogs SQS queues in one region per plugin.

| Field           | Type   | Required | Default | Description                                                                                        |
| --------------- | ------ | -------- | ------- | -------------------------------------------------------------------------------------------------- |
| `name`          | string | yes      | —       | Operator-chosen unique plugin name.                                                                |
| AWS auth fields | —      | —        | —       | See [Common AWS authentication](#common-aws-authentication). `aws_region` is required in practice. |

Source: [`SQSPlugin` in `odd_collector_aws/domain/plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/odd_collector_aws/domain/plugin.py); reference YAML at [`config_examples/sqs.yaml`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/config_examples/sqs.yaml).

```yaml
plugins:
  - type: sqs
    name: sqs_main
    aws_region: eu-central-1
    aws_access_key_id: !ENV ${AWS_ACCESS_KEY_ID}
    aws_secret_access_key: !ENV ${AWS_SECRET_ACCESS_KEY}
```

## Per-adapter feature matrix

| Feature                                         | Where it applies                                                                                                                                                                   |
| ----------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Ingestion filters (`filename_filter`)**       | `s3`, `s3_delta` (via `delta_tables.filter`). Regex `include` / `exclude` lists; default includes everything.                                                                      |
| **Folder-as-dataset / Hive partitioning**       | `s3`. `dataset_config.folder_as_dataset` accepts `file_format` (`parquet` / `csv` / `tsv`), `flavor` (`hive` / `presto`), and an optional `field_names` list for non-Hive layouts. |
| **Cross-region credentials via `aws_role_arn`** | Every adapter inheriting from `AwsPlugin`.                                                                                                                                         |
| **`endpoint_url` override**                     | `s3`, `s3_delta`, `dynamodb` (and any AWS-SDK call boto3 routes through the configured endpoint). Used for LocalStack and MinIO.                                                   |
| **`exclude_tables`**                            | `dynamodb`. Plain list of table names to skip.                                                                                                                                     |
| **`aws_storage_allow_http` toggle**             | `s3_delta`. Enables plain-HTTP storage access (MinIO / LocalStack); off by default.                                                                                                |
| **Literal-name allowlist filters**              | `sagemaker.experiments` (experiment names). Plain list — not a regex. Required at the model level (no default); pass `null` to ingest every experiment, or a list to scope.        |
| **Required `aws_account_id`**                   | `kinesis`. Re-declared as required (the rest of the AWS adapters take `aws_account_id` as `Optional[str]`).                                                                        |

Source: [`PLUGIN_FACTORY` in `odd-collector-aws/.../plugin.py`](https://github.com/opendatadiscovery/odd-collectors/blob/main/odd-collector-aws/odd_collector_aws/domain/plugin.py).
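
As a rough illustration of how several rows of the matrix combine, the sketch below configures one `s3` plugin with regex ingestion filters and Hive-style folder-as-dataset, plus one `dynamodb` plugin pointed at a local emulator. The plugin names, bucket name, table name, regex patterns, and LocalStack URL are placeholders, and the `bucket` key under `dataset_config` is an assumption not confirmed by the matrix; check the `s3` spotlight and the files under `config_examples/` for the authoritative shape. The matrix does not say whether patterns are matched against the full object key or only the file name, so test filters against a small prefix first.

```yaml
plugins:
  # S3: regex include/exclude filters plus Hive-partitioned folder-as-dataset.
  - type: s3
    name: s3_lake                        # hypothetical plugin name
    aws_region: eu-central-1
    aws_access_key_id: !ENV ${AWS_ACCESS_KEY_ID}
    aws_secret_access_key: !ENV ${AWS_SECRET_ACCESS_KEY}
    filename_filter:
      include: ['.*\.parquet']           # illustrative pattern
      exclude: ['tmp_.*']                # illustrative pattern
    dataset_config:
      bucket: my-data-lake               # assumed key, placeholder bucket
      folder_as_dataset:
        file_format: parquet             # parquet / csv / tsv
        flavor: hive                     # hive / presto

  # DynamoDB against a local emulator, skipping one table by name.
  - type: dynamodb
    name: dynamodb_local                 # hypothetical plugin name
    aws_region: eu-central-1
    aws_access_key_id: !ENV ${AWS_ACCESS_KEY_ID}
    aws_secret_access_key: !ENV ${AWS_SECRET_ACCESS_KEY}
    endpoint_url: http://localhost:4566  # placeholder LocalStack endpoint
    exclude_tables:
      - scratch_table                    # placeholder table name
```

Both plugins sit in the same `plugins` list because one collector container can host any combination of plugin types.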

Unlike the generic `odd-collector`, the cloud collectors do **not** ship the AWS SSM secrets backend hook. See [Collector secrets backend](/configuration-and-deployment/collectors-secrets-backend.md) for the supported scope.

## Known limitations

* **Static credentials in YAML are not the recommended path.** IAM roles via `aws_role_arn` (or pod identity / instance profile) avoid leaking long-lived keys into config files. The reference Compose template wires credentials only as environment variables.
* **`s3.datasets` field rejected at validation.** Use `dataset_config` (singular). The collector errors out on startup if `datasets:` is present — see the `validate_datasets` validator on `S3Plugin`.
* **No foreign-key / ERD extraction** in any AWS adapter — that capability is PostgreSQL- and Snowflake-only on the generic collector.
* **`kinesis` requires `aws_account_id` explicitly.** It is the only adapter that re-declares the field as required; every other AWS adapter takes it as optional.
* **`sagemaker` re-declares the AWS auth fields without defaults**, making them effectively required even when you intend to inherit from the boto3 credential chain. Provide each field explicitly (use `null` when you want the boto3 fallback). This is an asymmetry with the rest of the adapters in the collector.
* **`s3_delta.delta_tables` is a single object**, not a list. To catalog multiple Delta tables in one collector, use multiple `s3_delta` plugin entries. This is an asymmetry with `gcs_delta` on the GCP collector, which takes a list of `delta_tables`.
* **`s3.dataset_config` is a single object**, not a list. To catalog multiple S3 buckets, use multiple `s3` plugin entries.
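
Two of the limitations above can be sketched in config form: an explicit `aws_account_id` on `kinesis`, and one `s3` plugin entry per bucket in place of a list under `dataset_config`. The plugin names, account ID, and bucket names are placeholders, and the `bucket` key is an assumption; compare against the files under `config_examples/` before relying on it.

```yaml
plugins:
  # kinesis: aws_account_id must be set explicitly.
  - type: kinesis
    name: kinesis_main                   # hypothetical plugin name
    aws_account_id: "123456789012"       # placeholder account ID
    aws_region: eu-central-1
    aws_access_key_id: !ENV ${AWS_ACCESS_KEY_ID}
    aws_secret_access_key: !ENV ${AWS_SECRET_ACCESS_KEY}

  # s3: dataset_config is a single object, so each bucket gets its own entry.
  - type: s3
    name: s3_raw                         # hypothetical plugin name
    aws_region: eu-central-1
    aws_access_key_id: !ENV ${AWS_ACCESS_KEY_ID}
    aws_secret_access_key: !ENV ${AWS_SECRET_ACCESS_KEY}
    dataset_config:
      bucket: raw-zone                   # assumed key, placeholder bucket

  - type: s3
    name: s3_curated                     # hypothetical plugin name
    aws_region: eu-central-1
    aws_access_key_id: !ENV ${AWS_ACCESS_KEY_ID}
    aws_secret_access_key: !ENV ${AWS_SECRET_ACCESS_KEY}
    dataset_config:
      bucket: curated-zone               # assumed key, placeholder bucket
```

The same one-entry-per-target pattern applies to `s3_delta`, with one plugin entry per Delta table.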

## Where to next

* [`odd-collector`](/integrations/integrations/odd-collector.md) — generic collector with PostgreSQL, Snowflake, etc.
* [`odd-collector-azure`](/integrations/integrations/odd-collector-azure.md) / [`odd-collector-gcp`](/integrations/integrations/odd-collector-gcp.md) — sibling cloud collectors.
* [Build and run ODD Collectors](/developer-guides/build-and-run/build-and-run-odd-collectors.md) — common SDK schema and from-source build flow.

