# Custom metadata

Every data entity in the catalog carries a **Custom metadata** panel — a key/value list operators populate to capture per-entity facts the source system does not provide (cost-centre allocation, regulated-PII flag, downstream-consumer team, freshness SLA, anything you want to attach as a typed annotation). Two halves sit behind the panel: a **field catalogue** that defines the available keys for the deployment, and a per-entity **value set** that binds each key to a value for one specific entity.

This page covers both halves: how the catalogue is populated, how per-entity values are authored, how the two origins (operator-curated **INTERNAL** vs collector-ingested **EXTERNAL**) differ, and which permissions gate each operation. It also covers the load-bearing caveats: the silent-no-op write path, the dropped `active` flag, the absence of server-side type validation, the API path that overwrites EXTERNAL values, the ungated and unbounded catalogue enumeration, and the absence of activity-feed events on metadata mutations.

## Where to find it

Open any data entity's detail page → **Overview** tab. The **Metadata** panel renders in the main column below the [description](/features/data-discovery/entity-description.md) and [attachments](/features/data-discovery/attachments.md) panels. The panel shows a combined list — operator-curated fields and collector-ingested fields rendered side by side, distinguished by the field's **Origin** badge.

Operators with the [`DATA_ENTITY_CUSTOM_METADATA_CREATE`](/configuration-and-deployment/enable-security/authorization/permissions.md) permission see an **Add** affordance for assigning a new field value to the entity; the affordance opens an autocomplete picker over the deployment's field catalogue (with the option to type a new field name on miss — see the catalogue side-channel caveat below). Operators with `_UPDATE` see an in-place edit affordance on each existing value; operators with `_DELETE` see a remove affordance.

## The two halves

**Field catalogue (deployment-scoped vocabulary).** A `metadata_field` table holds one row per field name + type pair the deployment knows about. The catalogue is shared across every entity in the deployment — the same `cost_centre` field name resolves to the same `metadata_field.id` for every entity that uses it. Reading the catalogue is what powers the autocomplete picker when an operator adds a new value to an entity, and the catalogue read is the **only** read of the metadata surface that does not go through the per-entity endpoint.

**Per-entity value set (entity-scoped binding).** A `metadata_field_value` table holds one row per `(data_entity_id, metadata_field_id)` pair — the binding that says "this entity has this field set to this value." Operators author each row through the per-entity write endpoints listed below; the read side ships the bound values back as part of the entity's detail-page payload.

## Field types and origin

Fields carry two pieces of metadata beyond the name: a **type** (the value's shape) and an **origin** (who owns the field's existence in the catalogue).

**Supported field types** — the type is set when the field is first minted in the catalogue and is immutable thereafter. Seven types are selectable when authoring a field:

* `STRING` — free-text.
* `INTEGER`, `FLOAT` — numeric.
* `BOOLEAN` — `true` / `false`.
* `DATETIME` — ISO-8601 timestamp.
* `ARRAY` — a list of strings (each element rendered as a chip in the value display).
* `JSON` — arbitrary JSON document (rendered as collapsible tree in the value display).

The platform's internal type enum carries one more value, `UNKNOWN`, beyond the seven above (the public API `MetadataFieldType` enum exposes only the seven). `UNKNOWN` is a defensive fallback the ingestion parser assigns when it cannot classify a collector-supplied value into one of the seven shapes — it is not offered when authoring a field and is not a type an operator chooses.

The per-entity value side stores the value as a JSON-encoded string regardless of declared type; the type drives the value-editor's input shape and the display formatter. **The API does not enforce the declared type on write** — see the caveat below.

**Origin** — two values, mutually exclusive per field:

* **INTERNAL** — operator-curated. The field was minted by a catalog user authoring a value on an entity (see the auto-create-on-miss side-channel below) or by an explicit catalogue mutation. INTERNAL fields are the only ones surfaced in the autocomplete picker on the Add-value affordance.
* **EXTERNAL** — collector-ingested. The field came in attached to an entity via the ingestion pipeline (a collector's adapter mapped a source-side property into ODD's metadata schema). EXTERNAL fields render alongside INTERNAL fields in the entity's Metadata panel but cannot be edited or added from the **UI** — they are owned by the source system and refreshed on every ingestion pass. The **API** does not enforce that boundary, though — see the EXTERNAL-origin caveat below.

When an operator views an entity's Metadata panel, both origins render in the same list. The UI distinguishes them with an inline origin marker; the Add affordance only writes INTERNAL.

## Field naming is case-sensitive

Field names in the catalogue are **case-sensitive**. `cost_centre` and `Cost_centre` are two distinct rows in `metadata_field`; an operator who types `Cost_centre` into the autocomplete picker, sees no match, and accepts the auto-create-on-miss side-channel (below) mints a parallel field that operators searching for `cost_centre` will not find. The autocomplete query uses a case-insensitive substring match for the suggestion list, but the resolution against the catalogue is by exact-string match — autocomplete saves a few keystrokes; it does not protect against case-drift duplicates.

Treat field names as a controlled vocabulary that benefits from a documented naming convention (`snake_case` is the most common in deployments we have seen). A naming-convention drift is the most common source of "I added this field on entity X yesterday but I can't find it in the autocomplete on entity Y today" reports.
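The gap between a case-insensitive suggestion list and an exact-match resolution can be illustrated with a small sketch. It is an in-memory model, not the platform's code: `suggest`, `exact_match`, and `find_case_drift` are hypothetical helper names, and the catalogue is assumed to be a plain list of field names you have already fetched.

```python
from collections import defaultdict


def suggest(catalogue, query):
    """Case-insensitive substring match, as described for the suggestion list."""
    q = query.lower()
    return [name for name in catalogue if q in name.lower()]


def exact_match(catalogue, name):
    """Exact-string (case-sensitive) match, as described for resolution."""
    return name in catalogue


def find_case_drift(catalogue):
    """Group field names that differ only by letter case."""
    groups = defaultdict(set)
    for name in catalogue:
        groups[name.lower()].add(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


catalogue = ["cost_centre", "Cost_centre", "freshness_sla"]
print(suggest(catalogue, "COST"))             # both cost-centre spellings are suggested
print(exact_match(catalogue, "COST_CENTRE"))  # no exact row, so a create would mint a third field
print(find_case_drift(catalogue))             # the duplicates a periodic check could flag
```

Running `find_case_drift` over a periodic export of the catalogue is one way to catch naming-convention drift before operators start reporting missing fields.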

## Authoring per-entity values

Three operations exist on the per-entity side, each gated by a distinct permission:

| Operation                                                                                      | Endpoint                                                     | Permission                           |
| ---------------------------------------------------------------------------------------------- | ------------------------------------------------------------ | ------------------------------------ |
| Create one or more new field values on an entity (each carrying the field name + type + value) | `POST /api/dataentities/{id}/metadata`                       | `DATA_ENTITY_CUSTOM_METADATA_CREATE` |
| Update an existing field value on an entity (by field id)                                      | `PUT /api/dataentities/{id}/metadata/{metadata_field_id}`    | `DATA_ENTITY_CUSTOM_METADATA_UPDATE` |
| Delete a field value from an entity (by field id)                                              | `DELETE /api/dataentities/{id}/metadata/{metadata_field_id}` | `DATA_ENTITY_CUSTOM_METADATA_DELETE` |

The Create path takes a list of field objects (`name`, `type`, `value`) in the request body — each entry either resolves against an existing INTERNAL field in the catalogue (matched by exact name + type) or **mints a new INTERNAL field** in the catalogue on the spot. The Update and Delete paths are by field id and operate on the per-entity value row only — they never touch the catalogue.
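The resolve-or-mint behaviour of the Create path can be modelled in a few lines. This is a sketch under stated assumptions, not the platform's implementation: `Field`, `Catalogue`, and `resolve_or_mint` are hypothetical names, and the catalogue is reduced to an in-memory dictionary keyed on the exact `(name, type)` pair.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Field:
    id: int
    name: str
    type: str
    origin: str = "INTERNAL"  # the Create path only ever mints INTERNAL fields


class Catalogue:
    """In-memory stand-in for the INTERNAL part of the field catalogue."""

    def __init__(self):
        self._fields = {}       # (name, type) -> Field
        self._ids = count(1)    # hypothetical id sequence

    def resolve_or_mint(self, name, type_):
        key = (name, type_)     # exact, case-sensitive name plus type
        if key not in self._fields:
            self._fields[key] = Field(next(self._ids), name, type_)
        return self._fields[key]

    def __len__(self):
        return len(self._fields)


catalogue = Catalogue()
first = catalogue.resolve_or_mint("cost_centre", "STRING")
again = catalogue.resolve_or_mint("cost_centre", "STRING")    # resolves to the same row
drift = catalogue.resolve_or_mint("Cost_centre", "STRING")    # different case: new row
retyped = catalogue.resolve_or_mint("cost_centre", "INTEGER")  # different type: new row
```

The model makes the side effect visible: a miss on either the name or the type grows the shared catalogue, which is why the Create permission doubles as catalogue-write access.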

There is no operator-facing catalogue-maintenance UI: INTERNAL field rows are created as a side effect of the Create-value path, and the catalogue read endpoint (`GET /api/metadata/fields`) returns the full INTERNAL set, with an optional substring-filter parameter for autocomplete.

## Permissions

Three permissions gate the per-entity surface; the catalogue read is **not gated** by a custom-metadata permission (see the catalogue-enumeration caveat below).

| Permission                                                                                                         | What it gates                                                                                                                                               |
| ------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [`DATA_ENTITY_CUSTOM_METADATA_CREATE`](/configuration-and-deployment/enable-security/authorization/permissions.md) | The Add affordance + `POST /api/dataentities/{id}/metadata`. Includes the auto-create-on-miss side-channel that mints new INTERNAL fields in the catalogue. |
| [`DATA_ENTITY_CUSTOM_METADATA_UPDATE`](/configuration-and-deployment/enable-security/authorization/permissions.md) | The edit affordance on each value row + `PUT /api/dataentities/{id}/metadata/{metadata_field_id}`.                                                          |
| [`DATA_ENTITY_CUSTOM_METADATA_DELETE`](/configuration-and-deployment/enable-security/authorization/permissions.md) | The remove affordance on each value row + `DELETE /api/dataentities/{id}/metadata/{metadata_field_id}`.                                                     |

All three are scoped to the data entity in the URL — granting `_CREATE` on entity X does not grant it on entity Y. The catalogue read (`GET /api/metadata/fields`) is reachable by every authenticated caller and, under `auth.type=DISABLED`, by every anonymous caller (see [DISABLED authentication](/configuration-and-deployment/enable-security/authentication/disabled-authentication.md)).

## Activity trail

Custom-metadata mutations **emit no Activity Feed event today.** The [`ActivityEventTypeDto`](https://github.com/opendatadiscovery/odd-platform/blob/main/odd-platform-api/src/main/java/org/opendatadiscovery/oddplatform/dto/activity/ActivityEventTypeDto.java) enum carries `CUSTOM_METADATA_CREATED`, `CUSTOM_METADATA_UPDATED`, and `CUSTOM_METADATA_DELETED` values, but no code path emits any of them — they are dead enum entries. Operators looking at an entity's Activity tab will see description updates, tag updates, owner updates, and term assignments, but not metadata-value changes. See the caveat in the next section for the forensic-silence implications.

## Known limitations and operator caveats

{% hint style="warning" %}
**The Update path is a silent no-op when the value row does not pre-exist.** The platform's `PUT /api/dataentities/{id}/metadata/{metadata_field_id}` endpoint declares its operation as `upsertDataEntityMetadataFieldValue` in the OpenAPI spec, but the repository call behind it is a pure SQL `UPDATE` against `metadata_field_value` keyed on `(data_entity_id, metadata_field_id)` — no `INSERT ... ON CONFLICT` fallback. If the row does not exist (the field has never been assigned a value on this entity), the UPDATE matches zero rows, returns nothing, and the controller propagates an empty `Mono`. The HTTP response is `200 OK` with an empty body; the UI toast reads "Metadata successfully updated." even though nothing was written.

**What this means in practice.** Reconciliation pipelines that issue PUTs assuming upsert semantics (the operationId implies replace-or-create) silently lose writes for any field not previously assigned on the target entity. The bootstrap path that does work is `POST /api/dataentities/{id}/metadata` (the Create endpoint), which mints the field on the catalogue and binds the value to the entity in one call.

**Mitigation today.** Issue a GET preflight against the entity's metadata before any PUT; if the field id is not in the response, switch to a POST. The platform-side fix (true upsert semantics, or rejecting the PUT with a meaningful error when no row matches) is on the roadmap; until it lands, the client-side preflight is the only workaround.
{% endhint %}
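The preflight decision described in the hint above reduces to a membership test. The sketch below assumes the caller has already fetched the entity's metadata and extracted the set of field ids bound on it; `plan_write` is a hypothetical helper, and it returns only the method and path so that no request-body shape is invented here.

```python
from typing import Optional, Set, Tuple


def plan_write(entity_id: int,
               bound_field_ids: Set[int],
               field_id: Optional[int]) -> Tuple[str, str]:
    """Pick the endpoint that will actually persist a value.

    bound_field_ids: field ids already bound on the entity (from the GET preflight).
    field_id: the catalogue id of the target field, or None if it is not known yet.
    """
    base = f"/api/dataentities/{entity_id}/metadata"
    if field_id is not None and field_id in bound_field_ids:
        # A value row exists, so the UPDATE behind PUT will match it.
        return ("PUT", f"{base}/{field_id}")
    # No value row yet: PUT would be a silent no-op, so create instead.
    return ("POST", base)


print(plan_write(42, {7, 9}, 7))     # ('PUT', '/api/dataentities/42/metadata/7')
print(plan_write(42, {7, 9}, 11))    # ('POST', '/api/dataentities/42/metadata')
```

The preflight and the write are two separate calls, so a concurrent delete between them can still turn the PUT into a no-op; re-reading the entity after the write is the conservative check.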

{% hint style="warning" %}
**Every successful Update silently sets the `active` column on the value row to `NULL`.** The service layer constructs the persistence pojo without calling `setActive(...)`; the Java `Boolean` field stays null. The repository's UPDATE writes the null verbatim into the row's `active` column, overwriting whatever value the row carried previously. The column's database `DEFAULT TRUE` only fires on INSERT — it does not protect UPDATE — so every edited row ends up with `active IS NULL`.

**What this means in practice.** Any downstream code (in the platform, in a future feature, or in an external query) that filters `WHERE active = TRUE` will silently drop edited rows. The currently-shipping platform code does not appear to filter on `active` for the value rows, so this caveat is latent rather than user-visible today — but it is a foot-gun for anyone querying the table directly, building an external analytics view over it, or relying on future platform code to honour the column. Use `WHERE active IS DISTINCT FROM FALSE` (treats null as active) rather than `WHERE active = TRUE` when querying the table outside the platform's own service code.

The platform-side fix is either to call `setActive(true)` on the service-layer pojo before the UPDATE or to exclude the `active` column from the UPDATE's SET clause entirely.
{% endhint %}
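The interaction between a column default and an UPDATE that binds null can be reproduced outside the platform. The sketch below uses SQLite from the Python standard library as a stand-in for PostgreSQL, with a simplified `metadata_field_value` table; the schema is an assumption for illustration. SQLite's null-safe `IS NOT 0` plays the role that `IS DISTINCT FROM FALSE` plays in PostgreSQL.

```python
import sqlite3


def demo_active_null():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE metadata_field_value (
               data_entity_id    INTEGER,
               metadata_field_id INTEGER,
               value             TEXT,
               active            BOOLEAN DEFAULT 1,
               PRIMARY KEY (data_entity_id, metadata_field_id))"""
    )
    # INSERT omits `active`, so the column default applies to both rows.
    conn.execute(
        "INSERT INTO metadata_field_value (data_entity_id, metadata_field_id, value) "
        "VALUES (1, 10, 'a'), (1, 11, 'b')"
    )
    # The edit binds None for `active`, mimicking a pojo whose Boolean was never set.
    conn.execute(
        "UPDATE metadata_field_value SET value = ?, active = ? "
        "WHERE data_entity_id = ? AND metadata_field_id = ?",
        ("edited", None, 1, 10),
    )
    strict = conn.execute(
        "SELECT metadata_field_id FROM metadata_field_value "
        "WHERE active = 1 ORDER BY metadata_field_id"
    ).fetchall()
    tolerant = conn.execute(
        "SELECT metadata_field_id FROM metadata_field_value "
        "WHERE active IS NOT 0 ORDER BY metadata_field_id"
    ).fetchall()
    conn.close()
    return [r[0] for r in strict], [r[0] for r in tolerant]


strict_ids, tolerant_ids = demo_active_null()
print(strict_ids)    # the edited row (field 10) is missing
print(tolerant_ids)  # the null-tolerant filter keeps both rows
```

The strict filter loses the edited row because a comparison against null is never true, which is the behaviour the null-tolerant predicate is meant to avoid.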

{% hint style="warning" %}
**The API does not validate a value against its field's declared type.** The declared type (`INTEGER`, `BOOLEAN`, `DATETIME`, and so on) drives the UI value editor and the display formatter, but the write endpoints store whatever string the request body carries — there is no server-side type check. A `POST` or `PUT` can persist `"not a number"` on an `INTEGER` field or `"maybe"` on a `BOOLEAN` field, and the platform accepts it with a `200`.

**What this means in practice.** The UI editor is the only thing enforcing type shape; an SDK client, a `curl`, or a reconciliation pipeline writing directly to the API can land type-violating values that then render through a formatter expecting the declared type. Validate the value shape on the writer side before the call — the platform will not reject a mismatch for you.
{% endhint %}
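A writer-side check along the lines the hint above recommends might look like the following sketch. It assumes values travel as strings and that `ARRAY` and `JSON` values are JSON text; both are assumptions about the encoding, not confirmed platform behaviour, and `is_valid` is a hypothetical helper name.

```python
import json
import re
from datetime import datetime

_INT = re.compile(r"-?[0-9]+")
_FLOAT = re.compile(r"-?[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?")


def is_valid(field_type: str, raw: str) -> bool:
    """Return True if the string `raw` has the shape the declared type implies."""
    if field_type == "STRING":
        return True
    if field_type == "INTEGER":
        return _INT.fullmatch(raw) is not None
    if field_type == "FLOAT":
        return _FLOAT.fullmatch(raw) is not None
    if field_type == "BOOLEAN":
        return raw in ("true", "false")
    if field_type == "DATETIME":
        # fromisoformat only accepts a trailing "Z" from Python 3.11; normalise it.
        if raw.endswith("Z"):
            raw = raw[:-1] + "+00:00"
        try:
            datetime.fromisoformat(raw)
            return True
        except ValueError:
            return False
    if field_type in ("ARRAY", "JSON"):
        try:
            doc = json.loads(raw)
        except ValueError:
            return False
        if field_type == "ARRAY":
            # The page describes ARRAY as a list of strings.
            return isinstance(doc, list) and all(isinstance(x, str) for x in doc)
        return True
    return False  # unknown or unsupported type: refuse to write


print(is_valid("INTEGER", "not a number"))  # False
print(is_valid("BOOLEAN", "maybe"))         # False
```

Calling `is_valid` before every `POST` or `PUT` keeps type-violating values out of the catalogue even though the server would accept them.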

{% hint style="warning" %}
**The API lets an operator overwrite an EXTERNAL (collector-ingested) value; only the UI hides it.** The UI suppresses edit affordances on EXTERNAL fields, but the per-entity write endpoints (`POST /api/dataentities/{id}/metadata` and `PUT /api/dataentities/{id}/metadata/{metadata_field_id}`) do not check the field's origin. An operator with `DATA_ENTITY_CUSTOM_METADATA_UPDATE` can write a value onto an EXTERNAL field through the API directly.

**What this means in practice.** The overwrite is **not durable** — the next ingestion pass for that entity replaces the collector-owned value again, so a hand-edited EXTERNAL value silently reverts on the next collector run. Treat EXTERNAL fields as read-only in any integration even though the API does not enforce it; if a value needs to change permanently, change it at the source the collector ingests from.
{% endhint %}
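An integration can enforce the read-only convention itself by filtering its planned writes against each field's origin. The sketch below assumes the entity's metadata has been fetched into a list of dictionaries carrying `name` and `origin` keys; those key names and the `partition_writes` helper are hypothetical.

```python
def partition_writes(desired, entity_fields):
    """Split desired {field name: value} writes into allowed and skipped.

    entity_fields: the entity's current metadata, assumed to expose each
    field's name and origin ("INTERNAL" or "EXTERNAL").
    """
    external = {f["name"] for f in entity_fields if f.get("origin") == "EXTERNAL"}
    allowed = {name: value for name, value in desired.items() if name not in external}
    skipped = sorted(name for name in desired if name in external)
    return allowed, skipped


current = [
    {"name": "cost_centre", "origin": "INTERNAL"},
    {"name": "row_count", "origin": "EXTERNAL"},
]
allowed, skipped = partition_writes({"cost_centre": "CC-17", "row_count": "0"}, current)
print(allowed)  # only the INTERNAL field survives
print(skipped)  # EXTERNAL names to report back instead of writing
```

Surfacing the skipped names, rather than dropping them silently, tells the caller that the change belongs at the source the collector ingests from.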

{% hint style="danger" %}
**The catalogue read has no permission gate and no pagination, and the per-entity Create path mints new INTERNAL fields visible to every authenticated user.** Two compounding shapes here:

* `GET /api/metadata/fields` has **no entry** in the platform's security rules — the path falls through to the default `.authenticated()` matcher, so every authenticated caller can list the catalogue. Under `auth.type=DISABLED` (no authentication required at all), the endpoint is reachable by every anonymous caller too. The response carries every INTERNAL field name in the deployment.
* The same endpoint's SQL has no `LIMIT`, no `OFFSET`, and no `ORDER BY` clause. Every call returns the entire catalogue. The response's `PageInfo` is theatre — `total` is computed as `items.size()` on every call (so it always equals the response length, not the catalogue size) and `hasNext` is hardcoded `false`. SDK clients written from the OpenAPI spec build "load more" infinite-scroll workflows that never fire; the catalogue ships as a single response per call.

**What this means in practice.** Deployments with operator-named field schemas (`finance_cost_centre`, `marketing_attribution`, `pii_redaction_rule`, `aml_review_status`, anything that names team-internal taxonomy in the field name) leak the full vocabulary to every authenticated user — and to every anonymous user under DISABLED. A user with `DATA_ENTITY_CUSTOM_METADATA_CREATE` on a single entity can mint a new INTERNAL field through the Create-value path that becomes visible to every other user on their next autocomplete keystroke.

Production deployments with 10K+ INTERNAL field rows pay a 1–2 MB response on every autocomplete keystroke (the endpoint accepts a query parameter, but the filter is applied server-side after fetching the unbounded result set; there is no DB-level early termination).

**Mitigation today.** Treat custom metadata field names as deployment-public. If a field name itself encodes sensitive information about a team's workflow or taxonomy, do not put it in custom metadata — author it in a system the platform does not enumerate. Grant `DATA_ENTITY_CUSTOM_METADATA_CREATE` only to operators trusted to mint new vocabulary; the auto-create-on-miss side-channel makes the permission an indirect grant of catalogue-write access. The platform-side fix (introducing a `CUSTOM_METADATA_FIELD_READ` permission + adding a security rule for the catalogue endpoint + paginating the SQL + computing real `total` and `hasNext`) is on the roadmap.
{% endhint %}
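The pagination behaviour described in the hint above can be modelled to show why a "load more" loop never fires. This is an illustrative model, not the platform's code: `list_fields` is a hypothetical function, and the `items` / `pageInfo` key names are assumptions about the response shape.

```python
def list_fields(catalogue, query=None):
    """Model of the catalogue read: fetch everything, filter afterwards."""
    rows = list(catalogue)  # stands in for a SELECT with no LIMIT or OFFSET
    if query:
        q = query.lower()
        rows = [name for name in rows if q in name.lower()]
    return {
        "items": rows,
        # total mirrors the response length, and hasNext never becomes true
        "pageInfo": {"total": len(rows), "hasNext": False},
    }


catalogue = ["finance_cost_centre", "marketing_attribution",
             "pii_redaction_rule", "aml_review_status", "cost_owner"]
response = list_fields(catalogue, query="cost")
print(response["pageInfo"])  # total reflects the two matches, not the five rows fetched
```

A client written against this shape should treat each response as complete and avoid looping on `hasNext`.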

{% hint style="info" %}
**Custom-metadata mutations leave no audit trail in the Activity Feed.** Three dead enum values exist in the platform's `ActivityEventTypeDto` — `CUSTOM_METADATA_CREATED`, `CUSTOM_METADATA_UPDATED`, `CUSTOM_METADATA_DELETED` — but no code path emits any of them. The entity's Activity tab shows other mutations (description, tags, owners, terms) but not metadata-value writes or deletes. Same forensic-silence pattern as the [DEG-membership write paths](/features/data-discovery/groups-domains.md#managing-deg-membership) and the [`DATA_ENTITY_RELATION_UPDATED` dead enum](/features/active-platform-features/activity-feed.md#known-caveats).

For compliance teams that need a who-changed-what-when trail on custom metadata, instrument it externally: an API-gateway access log records the authenticated POST / PUT / DELETE calls, and the PostgreSQL `pgaudit` extension logs the statements that write `metadata_field_value` rows to the server log. See [Audit trail scope](/configuration-and-deployment/enable-security/audit-trail-scope.md) for the compensating-controls catalogue across every silent-mutation surface the platform carries today.
{% endhint %}
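If the gateway access log is the chosen compensating control, the metadata mutations have to be picked out of the general request stream. The sketch below classifies a method and path pair against the three per-entity endpoints listed on this page; `classify` is a hypothetical helper, and numeric ids are an assumption about the URL format.

```python
import re

_PATH = re.compile(
    r"/api/dataentities/(?P<entity_id>[0-9]+)/metadata(?:/(?P<field_id>[0-9]+))?"
)
_ACTIONS = {"POST": "create", "PUT": "update", "DELETE": "delete"}


def classify(method, path):
    """Return an audit record for a custom-metadata mutation, else None."""
    action = _ACTIONS.get(method.upper())
    match = _PATH.fullmatch(path)
    if action is None or match is None:
        return None
    field_id = match.group("field_id")
    # Create targets the collection; update and delete target one field id.
    if (action == "create") != (field_id is None):
        return None
    return {
        "action": action,
        "entity_id": int(match.group("entity_id")),
        "field_id": int(field_id) if field_id is not None else None,
    }


print(classify("PUT", "/api/dataentities/42/metadata/7"))
print(classify("GET", "/api/metadata/fields"))  # reads are not mutations
```

Joining each record with the gateway's authenticated principal and timestamp yields the who-changed-what-when trail that the Activity Feed does not provide.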

## Where to next

* [Entity description](/features/data-discovery/entity-description.md) — the sibling per-entity Overview surface; same Add / Edit affordance shape but a single free-text Markdown field rather than a typed key/value catalogue. Carries its own load-bearing caveat (no write-time HTML sanitisation across six Markdown surfaces).
* [Data entity detail page](/features/data-discovery/entity-detail-page.md) — the parent container for the Metadata panel; covers how the panel composes with the rest of the Overview tab.
* [Activity Feed](/features/active-platform-features/activity-feed.md) — the audit trail for entity-level mutations, and the canonical home for the forensic-silence framing that custom-metadata writes share with DEG-membership writes.
* [Audit trail scope](/configuration-and-deployment/enable-security/audit-trail-scope.md) — the compliance-facing summary of what the platform audits today and what it does not, including the compensating controls for the silent-mutation surfaces.
* [Permissions](/configuration-and-deployment/enable-security/authorization/permissions.md) — the canonical home for the three `DATA_ENTITY_CUSTOM_METADATA_*` permissions and the full per-resource gating story.