# Custom metadata

Every data entity in the catalog carries a **Custom metadata** panel — a key/value list operators populate to capture per-entity facts the source system does not provide (cost-centre allocation, regulated-PII flag, downstream-consumer team, freshness SLA, anything you want to attach as a typed annotation). Two halves sit behind the panel: a **field catalogue** that defines the available keys for the deployment, and a per-entity **value set** that binds each key to a value for one specific entity.

This page covers both halves: how the catalogue is populated, how per-entity values are authored, how the two origins (operator-curated **INTERNAL** vs collector-ingested **EXTERNAL**) differ, and the permission gates. It also covers the load-bearing caveats: the silent-no-op write path, the dropped `active` flag, the absence of server-side type validation, the API path that overwrites EXTERNAL values, the catalogue enumeration that no permission gates, and the absence of activity-feed events on metadata mutations.

## Where to find it

Open any data entity's detail page → **Overview** tab. The **Metadata** panel renders in the main column below the [description](/features/data-discovery/entity-description.md) and [attachments](/features/data-discovery/attachments.md) panels. The panel shows a combined list — operator-curated fields and collector-ingested fields rendered side by side, distinguished by the field's **Origin** badge.

Operators with the [`DATA_ENTITY_CUSTOM_METADATA_CREATE`](/configuration-and-deployment/enable-security/authorization/permissions.md) permission see an **Add** affordance for assigning a new field value to the entity; the affordance opens an autocomplete picker over the deployment's field catalogue (with the option to type a new field name on miss — see the catalogue side-channel caveat below). Operators with `_UPDATE` see an in-place edit affordance on each existing value; operators with `_DELETE` see a remove affordance.

## The two halves

**Field catalogue (deployment-scoped vocabulary).** A `metadata_field` table holds one row per field name + type pair the deployment knows about. The catalogue is shared across every entity in the deployment — the same `cost_centre` field name resolves to the same `metadata_field.id` for every entity that uses it. Reading the catalogue is what powers the autocomplete picker when an operator adds a new value to an entity, and the catalogue read is the **only** read of the metadata surface that does not go through the per-entity endpoint.

**Per-entity value set (entity-scoped binding).** A `metadata_field_value` table holds one row per `(data_entity_id, metadata_field_id)` pair — the binding that says "this entity has this field set to this value." Operators author each row through the per-entity write endpoints listed below; the read side ships the bound values back as part of the entity's detail-page payload.
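
The relationship between the two halves can be sketched as a toy in-memory model. This is an illustration only, not the platform's schema or code: the `Catalogue` class, its method names, and the sequential id allocation are hypothetical, and the only columns modelled are the ones this page names.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class MetadataField:
    """One catalogue row: a field name + type pair shared by the whole deployment."""
    id: int
    name: str
    type: str    # e.g. "STRING", "INTEGER"
    origin: str  # "INTERNAL" or "EXTERNAL"


@dataclass
class Catalogue:
    """Hypothetical stand-in for the metadata_field and metadata_field_value tables."""
    fields: Dict[Tuple[str, str], MetadataField] = field(default_factory=dict)
    values: Dict[Tuple[int, int], str] = field(default_factory=dict)

    def resolve_or_mint(self, name: str, type_: str) -> MetadataField:
        # Exact, case-sensitive match on (name, type); mint an INTERNAL row on miss.
        key = (name, type_)
        if key not in self.fields:
            self.fields[key] = MetadataField(len(self.fields) + 1, name, type_, "INTERNAL")
        return self.fields[key]

    def bind(self, data_entity_id: int, name: str, type_: str, value: str) -> MetadataField:
        # One value row per (data_entity_id, metadata_field_id) pair.
        resolved = self.resolve_or_mint(name, type_)
        self.values[(data_entity_id, resolved.id)] = value
        return resolved


catalogue = Catalogue()
first = catalogue.bind(1, "cost_centre", "STRING", "CC-42")
second = catalogue.bind(2, "cost_centre", "STRING", "CC-7")
```

Both entities end up bound to the same catalogue row (`first.id == second.id`), while each keeps its own value row.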

## Field types and origin

Fields carry two pieces of metadata beyond the name: a **type** (the value's shape) and an **origin** (who owns the field's existence in the catalogue).

**Supported field types** — the type is set when the field is first minted in the catalogue and is immutable thereafter. Seven types are selectable when authoring a field:

* `STRING` — free-text.
* `INTEGER`, `FLOAT` — numeric.
* `BOOLEAN` — `true` / `false`.
* `DATETIME` — ISO-8601 timestamp.
* `ARRAY` — a list of strings (each element rendered as a chip in the value display).
* `JSON` — arbitrary JSON document (rendered as collapsible tree in the value display).

The platform's internal type enum carries one more value, `UNKNOWN`, beyond the seven above (the public API `MetadataFieldType` enum exposes only the seven). `UNKNOWN` is a defensive fallback the ingestion parser assigns when it cannot classify a collector-supplied value into one of the seven shapes — it is not offered when authoring a field and is not a type an operator chooses.

The per-entity value side stores the value as a JSON-encoded string regardless of declared type; the type drives the value-editor's input shape and the display formatter. **The API does not enforce the declared type on write** — see the caveat below.

**Origin** — two values, mutually exclusive per field:

* **INTERNAL** — operator-curated. The field was minted by a catalog user authoring a value on an entity (see the auto-create-on-miss side-channel below) or by an explicit catalogue mutation. INTERNAL fields are the only ones surfaced in the autocomplete picker on the Add-value affordance.
* **EXTERNAL** — collector-ingested. The field came in attached to an entity via the ingestion pipeline (a collector's adapter mapped a source-side property into ODD's metadata schema). EXTERNAL fields render alongside INTERNAL fields in the entity's Metadata panel but cannot be edited or added from the **UI** — they are owned by the source system and refreshed on every ingestion pass. The **API** does not enforce that boundary, though — see the EXTERNAL-origin caveat below.

When an operator views an entity's Metadata panel, both origins render in the same list. The UI distinguishes them with an inline origin marker; the Add affordance only writes INTERNAL.

## Field naming is case-sensitive

Field names in the catalogue are **case-sensitive**. `cost_centre` and `Cost_centre` are two distinct rows in `metadata_field`; an operator who types `Cost_centre` into the autocomplete picker, sees no match, and accepts the auto-create-on-miss side-channel (below) mints a parallel field that operators searching for `cost_centre` will not find. The autocomplete query uses a case-insensitive substring match for the suggestion list, but the resolution against the catalogue is by exact-string match — autocomplete saves a few keystrokes; it does not protect against case-drift duplicates.

Treat field names as a controlled vocabulary that benefits from a documented naming convention (`snake_case` is the most common in deployments we have seen). Naming-convention drift is the most common source of "I added this field on entity X yesterday but I can't find it in the autocomplete on entity Y today" reports.
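
A naming convention is easier to keep when it is checked mechanically. The sketch below assumes you already hold a list of field names (for example, exported from the catalogue read). The helper names and the exact `snake_case` pattern are illustrative choices, not part of the platform.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Illustrative snake_case rule: lowercase words separated by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def is_snake_case(name: str) -> bool:
    return SNAKE_CASE.fullmatch(name) is not None


def case_drift_groups(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group names that differ only by letter case; keep groups with more than one spelling."""
    groups = defaultdict(set)
    for name in names:
        groups[name.casefold()].add(name)
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}


names = ["cost_centre", "Cost_centre", "freshness_sla", "PII_flag"]
drift = case_drift_groups(names)
non_conforming = [n for n in names if not is_snake_case(n)]
```

Here `drift` reports `cost_centre` and `Cost_centre` as one drifted group, and `non_conforming` lists `Cost_centre` and `PII_flag`.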

## Authoring per-entity values

Three operations exist on the per-entity side, each gated by a distinct permission:

| Operation                                                                                      | Endpoint                                                     | Permission                           |
| ---------------------------------------------------------------------------------------------- | ------------------------------------------------------------ | ------------------------------------ |
| Create one or more new field values on an entity (each carrying the field name + type + value) | `POST /api/dataentities/{id}/metadata`                       | `DATA_ENTITY_CUSTOM_METADATA_CREATE` |
| Update an existing field value on an entity (by field id)                                      | `PUT /api/dataentities/{id}/metadata/{metadata_field_id}`    | `DATA_ENTITY_CUSTOM_METADATA_UPDATE` |
| Delete a field value from an entity (by field id)                                              | `DELETE /api/dataentities/{id}/metadata/{metadata_field_id}` | `DATA_ENTITY_CUSTOM_METADATA_DELETE` |

The Create path takes a list of field objects (`name`, `type`, `value`) in the request body — each entry either resolves against an existing INTERNAL field in the catalogue (matched by exact name + type) or **mints a new INTERNAL field** in the catalogue on the spot. The Update and Delete paths are by field id and operate on the per-entity value row only — they never touch the catalogue.
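
As a rough illustration of the Create call, the sketch below builds (but does not send) the request. The base URL is a placeholder, authentication is omitted, and encoding the body as a JSON array of `name` / `type` / `value` objects is an assumption drawn from the description above; check the OpenAPI spec for the exact wire format.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_create_request(base_url: str, entity_id: int,
                         fields: List[Dict[str, Any]]) -> urllib.request.Request:
    """Build a POST to the Create endpoint without sending it."""
    for entry in fields:
        missing = {"name", "type", "value"} - entry.keys()
        if missing:
            raise ValueError(f"field entry missing keys: {sorted(missing)}")
    body = json.dumps(fields).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/dataentities/{entity_id}/metadata",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# "https://odd.example.com" is a hypothetical host used only for illustration.
request = build_create_request(
    "https://odd.example.com", 42,
    [{"name": "cost_centre", "type": "STRING", "value": "CC-42"}],
)
```

Passing the result to `urllib.request.urlopen` would send it. Remember that a name + type pair that does not match an existing INTERNAL field exactly mints a new catalogue row.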

There is no operator-facing catalogue-maintenance UI: INTERNAL field rows are created as a side effect of the Create-value path, and the catalogue read endpoint (`GET /api/metadata/fields`) returns the full INTERNAL set with optional substring-filter parameter for autocomplete.

## Permissions

Three permissions gate the per-entity surface; the catalogue read is **not gated** by a custom-metadata permission (see the catalogue-enumeration caveat below).

| Permission                                                                                                         | What it gates                                                                                                                                               |
| ------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [`DATA_ENTITY_CUSTOM_METADATA_CREATE`](/configuration-and-deployment/enable-security/authorization/permissions.md) | The Add affordance + `POST /api/dataentities/{id}/metadata`. Includes the auto-create-on-miss side-channel that mints new INTERNAL fields in the catalogue. |
| [`DATA_ENTITY_CUSTOM_METADATA_UPDATE`](/configuration-and-deployment/enable-security/authorization/permissions.md) | The edit affordance on each value row + `PUT /api/dataentities/{id}/metadata/{metadata_field_id}`.                                                          |
| [`DATA_ENTITY_CUSTOM_METADATA_DELETE`](/configuration-and-deployment/enable-security/authorization/permissions.md) | The remove affordance on each value row + `DELETE /api/dataentities/{id}/metadata/{metadata_field_id}`.                                                     |

All three are scoped to the data entity in the URL — granting `_CREATE` on entity X does not grant it on entity Y. The catalogue read (`GET /api/metadata/fields`) is reachable by every authenticated caller and, under `auth.type=DISABLED`, by every anonymous caller (see [DISABLED authentication](/configuration-and-deployment/enable-security/authentication/disabled-authentication.md)).

## Activity trail

Custom-metadata mutations **emit no Activity Feed event today.** The [`ActivityEventTypeDto`](https://github.com/opendatadiscovery/odd-platform/blob/main/odd-platform-api/src/main/java/org/opendatadiscovery/oddplatform/dto/activity/ActivityEventTypeDto.java) enum carries `CUSTOM_METADATA_CREATED`, `CUSTOM_METADATA_UPDATED`, and `CUSTOM_METADATA_DELETED` values, but no code path emits any of them — they are dead enum entries. Operators looking at an entity's Activity tab will see description updates, tag updates, owner updates, and term assignments, but not metadata-value changes. See the caveat in the next section for the forensic-silence implications.

## Known limitations and operator caveats

{% hint style="warning" %}
**The Update path is a silent no-op when the value row does not pre-exist.** The platform's `PUT /api/dataentities/{id}/metadata/{metadata_field_id}` endpoint declares its operation as `upsertDataEntityMetadataFieldValue` in the OpenAPI spec, but the repository call behind it is a pure SQL `UPDATE` against `metadata_field_value` keyed on `(data_entity_id, metadata_field_id)` — no `INSERT ... ON CONFLICT` fallback. If the row does not exist (the field has never been assigned a value on this entity), the UPDATE matches zero rows, returns nothing, and the controller propagates an empty `Mono`. The HTTP response is `200 OK` with an empty body; the UI toast reads "Metadata successfully updated." even though nothing was written.

**What this means in practice.** Reconciliation pipelines that issue PUTs assuming upsert semantics (the operationId implies replace-or-create) silently lose writes for any field not previously assigned on the target entity. The bootstrap path that does work is `POST /api/dataentities/{id}/metadata` (the Create endpoint), which mints the field on the catalogue and binds the value to the entity in one call.

**Mitigation today.** Issue a GET preflight against the entity's metadata before any PUT; if the field id is not in the response, switch to a POST. The platform-side fix (true upsert semantics, or rejecting the PUT with a meaningful error when no row matches) is on the roadmap; until it lands, the preflight is the only operator-side workaround.
{% endhint %}
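
The preflight decision reduces to a small pure function. In this sketch, `plan_write` is a hypothetical helper name, and how `assigned_field_ids` is obtained (which GET you call and how you parse it) is left to the caller.

```python
from typing import Iterable, Tuple


def plan_write(entity_id: int, field_id: int,
               assigned_field_ids: Iterable[int]) -> Tuple[str, str]:
    """Pick the verb and path for a write, given the field ids already bound to the entity."""
    if field_id in set(assigned_field_ids):
        # A value row exists, so the UPDATE behind the PUT will match it.
        return ("PUT", f"/api/dataentities/{entity_id}/metadata/{field_id}")
    # No row yet: a PUT would return 200 and write nothing, so create instead.
    return ("POST", f"/api/dataentities/{entity_id}/metadata")
```

Note that the POST branch needs the field's `name`, `type`, and `value` in the body rather than its id, so a reconciliation pipeline has to carry both.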

{% hint style="warning" %}
**Every successful Update silently sets the `active` column on the value row to `NULL`.** The service layer constructs the persistence pojo without calling `setActive(...)`; the Java `Boolean` field stays null. The repository's UPDATE writes the null verbatim into the row's `active` column, overwriting whatever value the row carried previously. The column's database `DEFAULT TRUE` only fires on INSERT — it does not protect UPDATE — so every edited row ends up with `active IS NULL`.

**What this means in practice.** Any downstream code (in the platform, in a future feature, or in an external query) that filters `WHERE active = TRUE` will silently drop edited rows. The currently-shipping platform code does not appear to filter on `active` for the value rows, so this caveat is latent rather than user-visible today — but it is a foot-gun for anyone querying the table directly, building an external analytics view over it, or relying on future platform code to honour the column. Use `WHERE active IS DISTINCT FROM FALSE` (treats null as active) rather than `WHERE active = TRUE` when querying the table outside the platform's own service code.

The platform-side fix is either to call `setActive(true)` on the service-layer pojo before the UPDATE, or to exclude the `active` column from the UPDATE's SET clause entirely.
{% endhint %}
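
When rows have already been fetched into application code, the same null-as-active rule can be applied client-side. This is a minimal sketch: the row dictionaries are made-up sample data, and `is_active` is a hypothetical helper that mirrors `active IS DISTINCT FROM FALSE`.

```python
from typing import Optional


def is_active(active: Optional[bool]) -> bool:
    """Mirror `active IS DISTINCT FROM FALSE`: NULL (None) and TRUE both count as active."""
    return active is not False


rows = [
    {"metadata_field_id": 1, "active": True},   # inserted, column default applied
    {"metadata_field_id": 2, "active": None},   # edited through the Update path
    {"metadata_field_id": 3, "active": False},
]
naive = [r["metadata_field_id"] for r in rows if r["active"] is True]
safe = [r["metadata_field_id"] for r in rows if is_active(r["active"])]
```

The naive filter keeps only field 1 and silently drops the edited row; the null-tolerant filter keeps fields 1 and 2.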

{% hint style="warning" %}
**The API does not validate a value against its field's declared type.** The declared type (`INTEGER`, `BOOLEAN`, `DATETIME`, and so on) drives the UI value editor and the display formatter, but the write endpoints store whatever string the request body carries — there is no server-side type check. A `POST` or `PUT` can persist `"not a number"` on an `INTEGER` field or `"maybe"` on a `BOOLEAN` field, and the platform accepts it with a `200`.

**What this means in practice.** The UI editor is the only thing enforcing type shape; an SDK client, a `curl`, or a reconciliation pipeline writing directly to the API can land type-violating values that then render through a formatter expecting the declared type. Validate the value shape on the writer side before the call — the platform will not reject a mismatch for you.
{% endhint %}
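
A writer-side check might look like the sketch below. It assumes values travel as strings, that `ARRAY` values are a JSON array of strings, and that `BOOLEAN` values are the literals `true` / `false`; none of these encodings is confirmed by the API, so adjust them to what your deployment actually stores. The numeric checks are deliberately lenient (they accept whatever Python's `int` and `float` accept), and `datetime.fromisoformat` accepts only a subset of ISO-8601 on older Python versions.

```python
import json
from datetime import datetime
from typing import Callable


def _parses(parser: Callable[[str], object], value: str) -> bool:
    try:
        parser(value)
        return True
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return False


def validate_value(field_type: str, value: str) -> bool:
    """Return True when the string `value` has the shape `field_type` declares."""
    if field_type == "STRING":
        return True
    if field_type == "INTEGER":
        return _parses(int, value)
    if field_type == "FLOAT":
        return _parses(float, value)
    if field_type == "BOOLEAN":
        return value in ("true", "false")
    if field_type == "DATETIME":
        return _parses(datetime.fromisoformat, value)
    if field_type == "JSON":
        return _parses(json.loads, value)
    if field_type == "ARRAY":
        if not _parses(json.loads, value):
            return False
        parsed = json.loads(value)
        return isinstance(parsed, list) and all(isinstance(item, str) for item in parsed)
    raise ValueError(f"unsupported field type: {field_type}")
```

Calling `validate_value` before every `POST` or `PUT` and refusing to send on `False` gives API writers roughly the same guard the UI editor provides.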

{% hint style="warning" %}
**The API lets an operator overwrite an EXTERNAL (collector-ingested) value; only the UI hides it.** The UI suppresses edit affordances on EXTERNAL fields, but the per-entity write endpoints (`POST /api/dataentities/{id}/metadata` and `PUT /api/dataentities/{id}/metadata/{metadata_field_id}`) do not check the field's origin. An operator with `DATA_ENTITY_CUSTOM_METADATA_UPDATE` can write a value onto an EXTERNAL field through the API directly.

**What this means in practice.** The overwrite is **not durable** — the next ingestion pass for that entity replaces the collector-owned value again, so a hand-edited EXTERNAL value silently reverts on the next collector run. Treat EXTERNAL fields as read-only in any integration even though the API does not enforce it; if a value needs to change permanently, change it at the source the collector ingests from.
{% endhint %}
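
An integration can enforce the read-only rule itself. The sketch below assumes each metadata entry the integration holds is a dictionary carrying an `origin` key with the value `INTERNAL` or `EXTERNAL`. That payload shape and both helper names are assumptions for illustration.

```python
from typing import Dict, Iterable, List


def writable_fields(entity_metadata: Iterable[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep only INTERNAL entries; EXTERNAL ones are owned by the collector."""
    return [entry for entry in entity_metadata if entry.get("origin") == "INTERNAL"]


def assert_writable(entry: Dict[str, object]) -> None:
    """Refuse to proceed with a write that the next ingestion pass would revert."""
    if entry.get("origin") != "INTERNAL":
        raise PermissionError(
            f"field {entry.get('name')!r} is not INTERNAL; change it at the source system"
        )


sample = [
    {"name": "cost_centre", "origin": "INTERNAL"},
    {"name": "table_row_count", "origin": "EXTERNAL"},
]
editable = writable_fields(sample)
```

Entries with a missing or unrecognised origin are treated as not writable, which errs on the side of leaving collector-owned values alone.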

{% hint style="danger" %}
**The catalogue read has no permission gate and no pagination, and the per-entity Create path mints new INTERNAL fields visible to every authenticated user.** Two problems compound here:

* `GET /api/metadata/fields` has **no entry** in the platform's security rules — the path falls through to the default `.authenticated()` matcher, so every authenticated caller can list the catalogue. Under `auth.type=DISABLED` (no authentication required at all), the endpoint is reachable by every anonymous caller too. The response carries every INTERNAL field name in the deployment.
* The same endpoint's SQL has no `LIMIT`, no `OFFSET`, and no `ORDER BY` clause. Every call returns the entire catalogue. The response's `PageInfo` is theatre — `total` is computed as `items.size()` on every call (so it always equals the response length, not the catalogue size) and `hasNext` is hardcoded `false`. SDK clients written from the OpenAPI spec build "load more" infinite-scroll workflows that never fire; the catalogue ships as a single response per call.

**What this means in practice.** Deployments with operator-named field schemas (`finance_cost_centre`, `marketing_attribution`, `pii_redaction_rule`, `aml_review_status`, anything that names team-internal taxonomy in the field name) leak the full vocabulary to every authenticated user — and to every anonymous user under DISABLED. A user with `DATA_ENTITY_CUSTOM_METADATA_CREATE` on a single entity can mint a new INTERNAL field through the Create-value path that becomes visible to every other user on their next autocomplete keystroke.

Production deployments with 10K+ INTERNAL field rows pay a 1–2 MB response on every autocomplete keystroke (the endpoint accepts a query parameter, but the filter is applied server-side after fetching the unbounded result set; there is no DB-level early termination).

**Mitigation today.** Treat custom metadata field names as deployment-public. If a field name itself encodes sensitive information about a team's workflow or taxonomy, do not put it in custom metadata — author it in a system the platform does not enumerate. Grant `DATA_ENTITY_CUSTOM_METADATA_CREATE` only to operators trusted to mint new vocabulary; the auto-create-on-miss side-channel makes the permission an indirect grant of catalogue-write access. The platform-side fix (introducing a `CUSTOM_METADATA_FIELD_READ` permission + adding a security rule for the catalogue endpoint + paginating the SQL + computing real `total` and `hasNext`) is on the roadmap.
{% endhint %}
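
On the client side, one way to avoid re-downloading the whole catalogue on every keystroke is to fetch it once and filter locally, ignoring `PageInfo` entirely. This is a sketch of that idea, not a platform recommendation: `CatalogueCache` is a hypothetical class, and `fetch_all` stands in for whatever call returns the list of field names.

```python
from typing import Callable, List, Optional


class CatalogueCache:
    """Fetch the field-name list once, then serve suggestions from memory."""

    def __init__(self, fetch_all: Callable[[], List[str]]) -> None:
        self._fetch_all = fetch_all
        self._names: Optional[List[str]] = None
        self.fetch_count = 0

    def invalidate(self) -> None:
        # Call after a Create, or on a timer, so newly minted fields show up.
        self._names = None

    def suggest(self, query: str, limit: int = 10) -> List[str]:
        if self._names is None:
            self._names = sorted(self._fetch_all())
            self.fetch_count += 1
        needle = query.casefold()
        # Case-insensitive substring match, like the suggestion list described above.
        return [name for name in self._names if needle in name.casefold()][:limit]


cache = CatalogueCache(lambda: ["cost_centre", "Cost_centre", "freshness_sla"])
cost_matches = cache.suggest("cost")
sla_matches = cache.suggest("sla")
```

The trade-off is staleness: fields minted by other users stay invisible until `invalidate()` runs, so pick a refresh policy that matches how often the vocabulary changes.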

{% hint style="info" %}
**Custom-metadata mutations leave no audit trail in the Activity Feed.** Three dead enum values exist in the platform's `ActivityEventTypeDto` — `CUSTOM_METADATA_CREATED`, `CUSTOM_METADATA_UPDATED`, `CUSTOM_METADATA_DELETED` — but no code path emits any of them. The entity's Activity tab shows other mutations (description, tags, owners, terms) but not metadata-value writes or deletes. Same forensic-silence pattern as the [DEG-membership write paths](/features/data-discovery/groups-domains.md#managing-deg-membership) and the [`DATA_ENTITY_RELATION_UPDATED` dead enum](/features/active-platform-features/activity-feed.md#known-caveats).

For compliance teams that need a who-changed-what-when trail on custom metadata, instrument it externally: an API-gateway access log records the authenticated POST / PUT / DELETE calls, and `pgaudit` records the statements that write `metadata_field_value` rows in the PostgreSQL server log (pgaudit emits through PostgreSQL's standard logging facility, not the WAL). See [Audit trail scope](/configuration-and-deployment/enable-security/audit-trail-scope.md) for the compensating-controls catalogue across every silent-mutation surface the platform carries today.
{% endhint %}
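
If the gateway access log is the chosen compensating control, the metadata mutations can be picked out with a pattern over the request line. The sketch below assumes each log line contains the HTTP method followed by the path; the surrounding log format in the sample is invented, and the names `MetadataMutation` and `metadata_mutations` are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class MetadataMutation(NamedTuple):
    method: str
    entity_id: int
    field_id: Optional[int]  # None for the Create endpoint


# Matches the three write endpoints from the table above; GET requests are ignored.
PATTERN = re.compile(
    r"\b(POST|PUT|DELETE) /api/dataentities/(\d+)/metadata(?:/(\d+))?(?=[\s?]|$)"
)


def metadata_mutations(lines: Iterable[str]) -> List[MetadataMutation]:
    found = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            field_id = int(match.group(3)) if match.group(3) else None
            found.append(MetadataMutation(match.group(1), int(match.group(2)), field_id))
    return found


sample_log = [
    '2024-05-01T12:00:00Z alice "POST /api/dataentities/42/metadata HTTP/1.1" 200',
    '2024-05-01T12:01:00Z bob "PUT /api/dataentities/42/metadata/7 HTTP/1.1" 200',
    '2024-05-01T12:02:00Z bob "GET /api/dataentities/42/metadata HTTP/1.1" 200',
    '2024-05-01T12:03:00Z eve "DELETE /api/dataentities/9/metadata/7 HTTP/1.1" 200',
]
mutations = metadata_mutations(sample_log)
```

Keep in mind that a `200` on a PUT does not prove a row changed (see the silent no-op caveat), so an access-log trail records attempts rather than confirmed writes.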

## Where to next

* [Entity description](/features/data-discovery/entity-description.md) — the sibling per-entity Overview surface; same Add / Edit affordance shape but a single free-text Markdown field rather than a typed key/value catalogue. Carries its own load-bearing caveat (no write-time HTML sanitisation across six Markdown surfaces).
* [Data entity detail page](/features/data-discovery/entity-detail-page.md) — the parent container for the Metadata panel; covers how the panel composes with the rest of the Overview tab.
* [Activity Feed](/features/active-platform-features/activity-feed.md) — the audit trail for entity-level mutations, and the canonical home for the forensic-silence framing that custom-metadata writes share with DEG-membership writes.
* [Audit trail scope](/configuration-and-deployment/enable-security/audit-trail-scope.md) — the compliance-facing summary of what the platform audits today and what it does not, including the compensating controls for the silent-mutation surfaces.
* [Permissions](/configuration-and-deployment/enable-security/authorization/permissions.md) — the canonical home for the three `DATA_ENTITY_CUSTOM_METADATA_*` permissions and the full per-resource gating story.

