# Collector secrets backend

By default, every value in a collector's `collector_config.yaml` — the Platform token, per-plugin database passwords, cloud-provider credentials — lives in plaintext on disk. That is acceptable for local development, but for production deployments the collector SDK can load sensitive values from an external **secrets backend** (also referred to as the **alternative secrets backend**) instead, leaving only the backend pointer in the YAML file.

## Supported providers

| Provider                            | `secrets_backend.provider` value  |
| ----------------------------------- | --------------------------------- |
| AWS Systems Manager Parameter Store | `AWSSystemsManagerParameterStore` |

Only one provider is available today; additional providers can be plugged in via the `BaseSecretsBackend` abstract class in `odd-collector-sdk`.

## How the backend is consumed

When a `secrets_backend` block is present in `collector_config.yaml`, the loader fetches two things from the backend:

1. **Collector settings** — a single parameter whose YAML value is merged into the top-level collector settings (for example `token`, `platform_host_url`, `default_pulling_interval`).
2. **Plugin settings** — one parameter per plugin under a shared prefix; each parameter's YAML value is an individual plugin block.

The local YAML file is still parsed, but **values from the secrets backend win**. The local `plugins` list can contribute plugins whose names do not collide with plugins loaded from the backend, and the local top-level settings can contribute keys that are not set in the backend — but they can never override a key that is set in the backend. Treat the local file as a fallback, not an override.
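
The precedence rule above can be sketched in a few lines. This is an illustration of the described behaviour, not the SDK's actual loader: the function names `merge_settings` and `merge_plugins`, the sample values, and the ordering of the resulting plugin list are all assumptions.

```python
from typing import Any

Settings = dict[str, Any]
Plugin = dict[str, Any]


def merge_settings(local: Settings, backend: Settings) -> Settings:
    """Local keys fill gaps; any key set in the backend wins."""
    return {**local, **backend}


def merge_plugins(local: list[Plugin], backend: list[Plugin]) -> list[Plugin]:
    """Keep every backend plugin; add local plugins whose names do not collide."""
    backend_names = {plugin["name"] for plugin in backend}
    local_only = [plugin for plugin in local if plugin["name"] not in backend_names]
    return [*backend, *local_only]


# Illustrative values only (hypothetical hosts and plugin names).
local_settings = {"default_pulling_interval": 10, "platform_host_url": "http://localhost:8080"}
backend_settings = {"default_pulling_interval": 60, "token": "<platform_token>"}

local_plugins = [
    {"type": "postgresql", "name": "postgresql_adapter", "host": "localhost"},
    {"type": "postgresql", "name": "local_only_adapter", "host": "localhost"},
]
backend_plugins = [
    {"type": "postgresql", "name": "postgresql_adapter", "host": "db.internal.example.com"},
]

merged_settings = merge_settings(local_settings, backend_settings)
merged_plugins = merge_plugins(local_plugins, backend_plugins)
```

Here `default_pulling_interval` ends up as the backend's `60`, `platform_host_url` survives from the local file because the backend does not set it, and the local `postgresql_adapter` entry is discarded in favour of the backend's.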

## Configuration reference

All keys below live under `secrets_backend:` in `collector_config.yaml`.

| Key                                 | Type   | Default                                    | Description                                                          |
| ----------------------------------- | ------ | ------------------------------------------ | -------------------------------------------------------------------- |
| `provider`                          | string | *(required)*                               | Backend provider code. Must be `AWSSystemsManagerParameterStore`.    |
| `region_name`                       | string | *(resolved from env / IMDS — see below)*   | AWS region the SSM parameters live in.                               |
| `collector_settings_parameter_name` | string | `/odd/collector_config/collector_settings` | Full SSM parameter name holding the top-level collector settings.    |
| `collector_plugins_prefix`          | string | `/odd/collector_config/plugins`            | SSM parameter prefix under which one parameter per plugin is stored. |

The `region_name` is resolved in this order: `AWS_REGION` environment variable → `region_name` in `collector_config.yaml` → the EC2 IMDS (for collectors running on EC2 or EKS). If none of these resolves to a region, the collector process still starts, but the first SSM call raises `botocore.exceptions.NoRegionError` and the collector exits before ingestion begins. Set the region explicitly for non-EC2 deployments.
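
As a rough sketch of the resolution order described above (not the SDK's code), the lookup can be expressed as a first-match chain. The function `resolve_region` and its `imds_lookup` parameter are hypothetical names; the real IMDS query is left as an injected callable so the example stays self-contained.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_region(
    configured_region: Optional[str],
    env: Mapping[str, str] = os.environ,
    imds_lookup: Callable[[], Optional[str]] = lambda: None,
) -> Optional[str]:
    """Return the first region found, or None when every source is empty.

    Order: AWS_REGION env var, then region_name from the YAML file,
    then whatever the (injected, hypothetical) IMDS lookup returns.
    """
    return env.get("AWS_REGION") or configured_region or imds_lookup()


# The environment variable takes precedence over the YAML value.
from_env = resolve_region("eu-central-1", env={"AWS_REGION": "us-east-1"})
# Without the variable, the YAML value is used.
from_yaml = resolve_region("eu-central-1", env={})
# With neither, the IMDS lookup is consulted last.
from_imds = resolve_region(None, env={}, imds_lookup=lambda: "eu-west-1")
# Nothing resolves: this is the case that ends in NoRegionError.
unresolved = resolve_region(None, env={})
```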

{% hint style="info" %}
The SDK's settings loader uses `extra="allow"`, so any additional keys under `secrets_backend:` are forwarded to the provider constructor. Future providers may introduce their own keys without needing a schema change here.
{% endhint %}

## Parameter naming convention

For the AWS SSM provider, each plugin in the collector is stored as a separate SSM parameter under `collector_plugins_prefix`. The parameter's **name** is not significant to the loader — the SDK fetches every parameter under the prefix with a recursive `GetParametersByPath` call and parses each value as a plugin YAML block. The convention shown in the SSM example below uses the plugin `name` as the final path segment purely for operator readability; what matters is that each plugin's YAML is stored as its own parameter.

The **value** of each parameter is a YAML document with the same schema as a single entry in the `plugins:` list in `collector_config.yaml`.

## Worked example

### Step 1 — Store the collector settings in SSM

Create one `SecureString` parameter named `/odd/collector_config/collector_settings` with this value:

```yaml
default_pulling_interval: 60
platform_host_url: https://odd.internal.example.com
token: <platform_token>
```

### Step 2 — Store each plugin in SSM

For every data source the collector should pull, create one parameter under `/odd/collector_config/plugins/`. For example, a PostgreSQL plugin at `/odd/collector_config/plugins/postgresql_adapter`:

```yaml
type: postgresql
name: postgresql_adapter
description: ""
database: warehouse
host: db.internal.example.com
port: 5432
user: odd_ro
password: <db_password>
```
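
Steps 1 and 2 can also be scripted. The sketch below assumes a boto3 SSM client and uses its `put_parameter` call; the helper names `plugin_parameter_name` and `store_secure_parameter` are hypothetical, the plugin YAML is abbreviated, and `RecordingClient` is a stand-in so the example runs without AWS access.

```python
from typing import Any


def plugin_parameter_name(prefix: str, plugin_name: str) -> str:
    """Join the prefix and plugin name into a full SSM parameter name."""
    return f"{prefix.rstrip('/')}/{plugin_name}"


def store_secure_parameter(ssm: Any, name: str, value: str) -> None:
    """Create or overwrite one SecureString parameter."""
    ssm.put_parameter(Name=name, Value=value, Type="SecureString", Overwrite=True)


class RecordingClient:
    """Stand-in for a boto3 SSM client; it only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    def put_parameter(self, **kwargs: Any) -> dict[str, Any]:
        self.calls.append(kwargs)
        return {}


# Abbreviated plugin block; a real one carries the full schema shown above.
PLUGIN_YAML = """\
type: postgresql
name: postgresql_adapter
host: db.internal.example.com
port: 5432
"""

# With AWS access, replace this with: boto3.client("ssm", region_name="eu-central-1")
client = RecordingClient()
store_secure_parameter(
    client,
    plugin_parameter_name("/odd/collector_config/plugins", "postgresql_adapter"),
    PLUGIN_YAML,
)
```

`Overwrite=True` is a choice made for this sketch so that re-running the script updates existing parameters; drop it if accidental overwrites are a concern.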

### Step 3 — Point the collector at the backend

Keep only the backend pointer (and anything you intentionally want the local file to contribute) in `collector_config.yaml`:

```yaml
secrets_backend:
  provider: AWSSystemsManagerParameterStore
  region_name: eu-central-1
  collector_settings_parameter_name: /odd/collector_config/collector_settings
  collector_plugins_prefix: /odd/collector_config/plugins
```

### Step 4 — Run the collector

On startup the collector will:

1. Parse `collector_config.yaml`.
2. Connect to SSM using the configured region.
3. Fetch `/odd/collector_config/collector_settings` and merge its values into the top-level settings (SSM wins on conflicts).
4. Fetch every parameter under `/odd/collector_config/plugins/` and merge its plugins into the plugin list (SSM wins on name conflicts).
5. Validate the resulting config and start pulling.

## Required IAM permissions

The IAM identity the collector runs under — an EKS service account role, an EC2 instance role, or a user with static credentials — needs at least these permissions on the parameters you created:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetParameter",
        "ssm:GetParametersByPath"
      ],
      "Resource": [
        "arn:aws:ssm:{region}:{account_id}:parameter/odd/collector_config/collector_settings",
        "arn:aws:ssm:{region}:{account_id}:parameter/odd/collector_config/plugins/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt"
      ],
      "Resource": "arn:aws:kms:{region}:{account_id}:key/{kms_key_id}"
    }
  ]
}
```

The `kms:Decrypt` statement is only needed when the parameters were created as `SecureString` with a customer-managed KMS key (or with the AWS-managed `alias/aws/ssm` key). Plain `String` parameters do not require KMS permissions.

## Known limitations

The AWS SSM backend is a thin wrapper around `boto3`. The following behaviors come from the SDK defaults and cannot currently be overridden from `collector_config.yaml`. Review each against your deployment before relying on the backend in production.

{% hint style="danger" %}
**The backend loads at most 10 plugin parameters from SSM.** The SDK calls `ssm.get_parameters_by_path(..., Recursive=True)` once and reads `response["Parameters"]` without paginating. The AWS default `MaxResults` for this call is 10. A collector whose `collector_plugins_prefix` contains 11 or more plugin parameters will silently start with only the first 10; everything past that point is dropped with no error and no log warning.

Mitigations until the SDK adds pagination:

* Keep the plugin count at 10 or fewer per collector. Split larger fleets into multiple collectors, each with its own prefix.
* After deploying, compare the plugin count in SSM against the plugin count logged at collector startup — a mismatch means you have hit the cap.
  {% endhint %}
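
For the post-deploy comparison, the number of parameters actually stored under the prefix has to be counted with pagination, otherwise the check hits the same cap. The following is a sketch of an operator-side helper, not SDK code: `iter_parameters_by_path` is a hypothetical name, and `FakeSsm` imitates the paging behaviour of `get_parameters_by_path` (pages of 10, `NextToken` while more remain) so the example runs offline.

```python
from typing import Any, Iterator


def iter_parameters_by_path(ssm: Any, prefix: str) -> Iterator[dict[str, Any]]:
    """Yield every parameter under prefix, following NextToken across pages."""
    kwargs: dict[str, Any] = {"Path": prefix, "Recursive": True}
    while True:
        page = ssm.get_parameters_by_path(**kwargs)
        yield from page.get("Parameters", [])
        token = page.get("NextToken")
        if not token:
            return
        kwargs["NextToken"] = token


class FakeSsm:
    """Offline stand-in that pages results the way the real call does."""

    def __init__(self, names: list[str], page_size: int = 10) -> None:
        self._names = names
        self._page_size = page_size

    def get_parameters_by_path(self, **kwargs: Any) -> dict[str, Any]:
        start = int(kwargs.get("NextToken", 0))
        end = start + self._page_size
        page: dict[str, Any] = {
            "Parameters": [{"Name": name, "Value": ""} for name in self._names[start:end]]
        }
        if end < len(self._names):
            page["NextToken"] = str(end)
        return page


prefix = "/odd/collector_config/plugins"
fake = FakeSsm([f"{prefix}/plugin_{i}" for i in range(23)])

# A single unpaginated call sees only the first page.
single_call_count = len(fake.get_parameters_by_path(Path=prefix, Recursive=True)["Parameters"])
# Following NextToken sees every parameter.
paginated_count = sum(1 for _ in iter_parameters_by_path(fake, prefix))
```

With a real client (`boto3.client("ssm", ...)` in place of `FakeSsm`), compare `paginated_count` against the plugin count the collector logs at startup.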

{% hint style="warning" %}
**No custom SSM endpoint can be configured.** The SDK constructs the SSM client with `boto3.client("ssm", region_name=...)` and does not expose `endpoint_url`. Deployments that need to reach SSM via a VPC interface endpoint, a private DNS override, LocalStack, or any non-default endpoint must rely on system-level DNS / network configuration — there is no `secrets_backend.endpoint_url` key to set.
{% endhint %}

{% hint style="warning" %}
**No timeout or retry overrides.** The backend does not pass a `botocore.config.Config`, so botocore's default connect timeout, read timeout, and retry mode apply. On a partitioned or slow SSM endpoint the collector can block on startup for the full default timeout before failing. There is no `secrets_backend.connect_timeout` / `read_timeout` / `retries` key.
{% endhint %}

{% hint style="warning" %}
**SSM parameter values are parsed as YAML without schema validation at fetch time.** A malformed YAML document in any parameter under `collector_plugins_prefix` raises during `safe_load` and aborts collector startup with a generic parsing error rather than a message that identifies the offending parameter. Treat SSM parameter edits the same way you would treat edits to `collector_config.yaml`: validate the YAML locally before putting the parameter.
{% endhint %}
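
One way to make a bad parameter easy to find is a pre-flight check that parses each value separately and reports the parameter name alongside the error. The sketch below is an operator-side helper under stated assumptions: `find_unparseable` is a hypothetical name, and the parser is injected. With PyYAML installed you would pass `yaml.safe_load`; the demo passes `json.loads` purely as a standard-library stand-in, so its sample values are JSON.

```python
import json
from typing import Any, Callable


def find_unparseable(
    parameters: list[dict[str, str]],
    parse: Callable[[str], Any],
) -> list[str]:
    """Return one 'name: error' line per parameter whose value fails to parse."""
    problems = []
    for parameter in parameters:
        try:
            parse(parameter["Value"])
        except Exception as error:  # error types differ between parsers
            problems.append(f"{parameter['Name']}: {error}")
    return problems


# Sample data shaped like the "Parameters" entries returned by SSM.
parameters = [
    {"Name": "/odd/collector_config/plugins/postgresql_adapter",
     "Value": '{"type": "postgresql", "name": "postgresql_adapter"}'},
    {"Name": "/odd/collector_config/plugins/broken",
     "Value": '{"type": "postgresql", "name": '},
]

# Stand-in parser; for real plugin values use: find_unparseable(parameters, yaml.safe_load)
problems = find_unparseable(parameters, json.loads)
```

Each entry in `problems` starts with the full parameter name, which is the piece of information the startup error lacks.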

