# Manual Object Tagging

Tags are the platform's lightweight labelling mechanism — apply them to tables, datasets, columns, and quality tests to drive faceted search, organise the catalog by domain or stage, and signal special handling (`PII`, `Important`, `Deprecated`).

This page is the **read-side canonical home** for tagging — how operators apply, browse, and filter by tags. The operator-mutating side (curate the tag vocabulary, set the Important flag, manage namespace-scoped tags) lives at [Management → Tags](/features/management.md).

## What tags are

A tag is an operator-curated label that can be attached to any data entity or to any individual column. Tags drive:

* **Discovery.** The Tag facet on [Search and Filtering](/features/data-discovery/search.md) is one of the seven catalog filters; selecting one or more tags narrows the search to entities carrying any of them.
* **Organisation.** Tags are how operators encode lightweight, cross-cutting groupings that do not justify their own [Data Entity Group](/features/data-discovery/groups-domains.md).
* **Catalog Overview surfacing.** The most-used tags surface as the **Top tags** chip strip on the Catalog Overview home page — one-click filter into the catalog.
* **Important-flag visibility.** A tag flagged as **Important** in [Management → Tags](/features/management.md) is rendered visually distinct on entity pages and result rows, surfacing high-priority labels (`PII`, `Restricted`, `Deprecated`) without requiring operators to scan every tag chip.

## Applying tags

Apply tags both to data assets as a whole and to individual columns of datasets.

![](/files/3r4fQBsjG3S6HpZxYNY4)

The same UI flow applies at both granularities — open the entity (or column) detail surface, click the tag-management control, and pick from the existing tag vocabulary or create a new tag inline.

The platform exposes three `TAG_*` RBAC permissions:

| Permission   | Action                                      |
| ------------ | ------------------------------------------- |
| `TAG_CREATE` | Create a new tag in the catalog vocabulary. |
| `TAG_UPDATE` | Edit a tag's name or its Important flag.    |
| `TAG_DELETE` | Remove a tag from the catalog vocabulary.   |

Separately, the platform emits a `TAG_ASSIGNMENT_UPDATED` activity event whenever tag assignments on an entity change. This is **not** an RBAC permission: it is an entry on the `ActivityEventType` enum, surfaced on the [Activity Feed](/features/active-platform-features/activity-feed.md) for the affected entity, and it does not gate who can mutate tags.

For the platform-wide permission catalog and how to compose roles around these permissions, see [Permissions](/configuration-and-deployment/enable-security/authorization/permissions.md).

## Tag-driven discovery

Once tags are applied, three discovery paths rely on them:

* [**Search → Tag facet**](/features/data-discovery/search.md) — multi-select filter; results match entities carrying any of the selected tags.
* **Catalog Overview → Top tags** — one-click chip strip filtering into the catalog by the most-used tags across the deployment, rendered on the home page.
* **Tag-based per-entity badges** — tags appear on entity detail pages and in search result rows; Important-flagged tags render visually distinct.
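
The any-of semantics of the Tag facet can be modelled in a few lines. This is a minimal sketch over in-memory data, not the platform's query code; `Entity` and `filter_by_tags` are hypothetical names introduced for illustration, and treating an empty selection as "no filter" is an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for a catalog entity; not a platform type."""
    name: str
    tags: frozenset[str] = field(default_factory=frozenset)


def filter_by_tags(entities: list[Entity], selected: set[str]) -> list[Entity]:
    """Keep entities carrying at least one selected tag (any-of, not all-of)."""
    if not selected:
        return list(entities)  # assumption: an empty selection applies no filter
    return [e for e in entities if e.tags & selected]


catalog = [
    Entity("orders", frozenset({"PII", "finance"})),
    Entity("clicks", frozenset({"marketing"})),
    Entity("users", frozenset({"PII"})),
]
matches = [e.name for e in filter_by_tags(catalog, {"PII", "marketing"})]
print(matches)
```

Selecting more tags in this model can only widen the result set, which is the behaviour to expect from a multi-select facet that matches "any of" the chosen tags.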

## Operator workflow

The full lifecycle of a tag has three steps, split by design between the management surface and the catalog:

1. **Author the vocabulary** — go to [Management → Tags](/features/management.md) to create the canonical tag list, set the Important flag where appropriate, and govern the vocabulary across teams.
2. **Apply tags** — on entity detail pages, attach tags from the curated vocabulary to specific entities and columns.
3. **Narrow searches** — use the Tag facet on the Catalog page to find tagged entities.

Tags therefore appear in two places, each for a different user action. This page covers **applying tags to entities and finding entities by tag**. The [Management → Tags](/features/management.md) page is where operators **create and edit the tag vocabulary itself**: renaming tags, deleting them, and marking them as `Important` for higher list ordering.

## Known limitations and operator caveats

A few behaviours of the tagging surface are non-obvious from the UI alone. Each item below states what an operator might assume, what actually happens, and what to do today.

{% hint style="info" %}
**Fixed in 0.28.0 — "Top tags" and the Tag-facet seed list now rank by true popularity.** Releases up to 0.27.x truncated the tag directory to the requested page size **before** computing per-tag usage (the window ordered by `tag.id`), so once the directory exceeded the page size the strip showed the oldest tags re-ranked among themselves and younger, more-used tags never appeared (the empirical case: 35 tags, `size=30` — the 5 youngest absent regardless of usage). As of 0.28.0 the platform aggregates usage over the full directory first, then orders by usage count with tag id as a deterministic tiebreak, then paginates — the endpoint's "sorted by popularity" promise holds past one page and page boundaries are stable. No operator action needed; the pre-0.28.0 workaround (querying tag-to-entity relations directly for governance reviews) is no longer necessary.
{% endhint %}
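
The difference between the two orderings described in the callout above can be reproduced with a small model. This is a sketch under stated assumptions, not the platform's SQL: tag ids are assumed to grow with creation time, `usage_by_id` is a hypothetical map of tag id to usage count, and both function names are invented for illustration.

```python
def top_tags_truncate_first(usage_by_id: dict[int, int], size: int) -> list[int]:
    """Pre-0.28.0 behaviour as described: window by tag id, then rank the window."""
    window = sorted(usage_by_id)[:size]
    return sorted(window, key=lambda tag_id: (-usage_by_id[tag_id], tag_id))


def top_tags_aggregate_first(
    usage_by_id: dict[int, int], size: int, page: int = 1
) -> list[int]:
    """0.28.0 behaviour as described: rank the full directory, then paginate."""
    ranked = sorted(usage_by_id, key=lambda tag_id: (-usage_by_id[tag_id], tag_id))
    start = (page - 1) * size
    return ranked[start:start + size]


# 35 tags with ids 1..35; the five youngest (31..35) are the most used.
usage = {tag_id: 1 for tag_id in range(1, 31)}
usage.update({tag_id: 100 for tag_id in range(31, 36)})

old = top_tags_truncate_first(usage, size=30)
new = top_tags_aggregate_first(usage, size=30)
print(sorted(set(range(31, 36)) & set(old)))  # youngest tags surviving the old path
print(new[:5])                                # head of the new ranking
```

In this model the old path never returns ids 31 to 35 because they are cut before usage is considered, while the new path places them first and keeps page boundaries stable through the id tiebreak.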

{% hint style="warning" %}
**Five paths mint new tags into the global tag directory, not only `TAG_CREATE`.** An operator restricting `TAG_CREATE` to "vocabulary stewards" might assume that closes the directory to free-form additions. It does not. Every surface listed in the table that follows this callout silently creates a new tag row for any name that does not already exist in the catalog.

The three `*_TAGS_UPDATE` permissions and the collector ingestion path all call the platform's shared `getOrCreateTagsByName` helper, which creates rows for any novel names before attaching them to the target entity. Any user holding per-entity / per-term / per-column tag-update on a single entity (or any collector ingestion) can therefore enlarge the global tag vocabulary visible to every user via `GET /api/tags`, the Top-Tags strip, and the Tag-facet seed list.

**Mitigation today:** if vocabulary governance matters in your deployment, withhold the `*_TAGS_UPDATE` permissions from rank-and-file users; do not rely on `TAG_CREATE` alone. The collector ingestion path is not gated by RBAC and cannot be locked down through permissions — restrict it via the upstream collector configuration or by reviewing ingested tags periodically.
{% endhint %}

| Surface                                                      | Permission gating the surface           | Effect on the tag directory                                                              |
| ------------------------------------------------------------ | --------------------------------------- | ---------------------------------------------------------------------------------------- |
| `POST /api/tags`                                             | `TAG_CREATE`                            | The documented path.                                                                     |
| `PUT /api/dataentities/{id}/tags`                            | `DATA_ENTITY_TAGS_UPDATE`               | A novel tag name on an entity mints a new tag in the directory.                          |
| `PUT /api/terms/{id}/tags`                                   | `TERM_TAGS_UPDATE`                      | A novel tag name on a term mints a new tag in the directory.                             |
| `PUT /api/datasetfields/{id}/tags`                           | `DATASET_FIELD_TAGS_UPDATE`             | A novel tag name on a column mints a new tag in the directory.                           |
| Collector ingestion (`ExternalTagIngestionRequestProcessor`) | Collector token (no per-tag permission) | An ingested entity carrying tag names that do not yet exist mints them in the directory. |
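
The get-or-create behaviour behind the rows above can be illustrated with an in-memory model. This is a hedged sketch of the pattern only: `TagDirectory`, `get_or_create_by_name`, and `update_entity_tags` are hypothetical Python names, not the platform's `getOrCreateTagsByName` implementation, and no RBAC checks are modelled.

```python
class TagDirectory:
    """Hypothetical in-memory model of the global tag directory."""

    def __init__(self) -> None:
        self._ids_by_name: dict[str, int] = {}
        self._next_id = 1

    def get_or_create_by_name(self, names: list[str]) -> list[int]:
        """Return ids for the names, creating a row for every unseen name."""
        ids = []
        for name in names:
            if name not in self._ids_by_name:  # exact-string match, no normalisation
                self._ids_by_name[name] = self._next_id
                self._next_id += 1
            ids.append(self._ids_by_name[name])
        return ids

    def names(self) -> list[str]:
        return list(self._ids_by_name)


def update_entity_tags(
    directory: TagDirectory,
    entity_tags: dict[int, list[int]],
    entity_id: int,
    names: list[str],
) -> None:
    """Models a tag-update call: the directory grows without a create-tag step."""
    entity_tags[entity_id] = directory.get_or_create_by_name(names)


directory = TagDirectory()
directory.get_or_create_by_name(["PII"])  # curated vocabulary entry
entity_tags: dict[int, list[int]] = {}
update_entity_tags(directory, entity_tags, 42, ["PII", "my-adhoc-tag"])
print(directory.names())
```

The second call attaches a tag to a single entity, yet the directory itself now lists `my-adhoc-tag`, which is why restricting the create path alone does not freeze the vocabulary.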

{% hint style="info" %}
**Tag names are case-sensitive — `finance` and `Finance` are two separate tags.** The platform stores tag names verbatim. Two tags with names that differ only in capitalisation are distinct rows; entities tagged with one are not surfaced by a Tag-facet filter on the other. When seeding the catalog vocabulary on Management → Tags, settle a casing convention up front (uniform lowercase, Title-case, or all-uppercase) and audit `GET /api/tags` periodically for accidental near-duplicates — particularly after a collector ingestion run, which often emits framework-specific casing different from the operator-curated style.
{% endhint %}
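
A periodic audit for casing near-duplicates can be as simple as grouping names by their case-folded form. The sketch below assumes the tag names have already been extracted from a `GET /api/tags` response into a plain list; the response shape is not assumed here, and `case_collisions` is a hypothetical helper name.

```python
from collections import defaultdict


def case_collisions(tag_names: list[str]) -> list[list[str]]:
    """Group names that differ only by letter case; return groups of two or more."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in tag_names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(set(group)) > 1]


names = ["finance", "Finance", "PII", "pii", "marketing"]
print(case_collisions(names))
```

Each returned group is a candidate for merging under whichever casing convention the deployment has settled on.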

{% hint style="warning" %}
**Tag names are stored verbatim — there is no server-side trim, length cap, or character-set restriction on any write path.** None of the tag write paths normalises the incoming name: the create form's OpenAPI schema (`TagFormData.name`) is a bare string with no `maxLength` or `pattern`, and the shared service helper writes the raw name straight to the directory row. Two consequences beyond the casing caveat above:

* **Leading / trailing whitespace mints a distinct row.** Because matching is exact-string, `' tag '` (with surrounding spaces) and `'tag'` are two separate tags — the same trap as `finance` vs `Finance`, but harder to spot.
* **The global tag directory is a pollution / DoS surface.** Arbitrarily long or arbitrary-character names are accepted, and an over-long name or a flood of near-identical whitespace variants lands in `GET /api/tags`, the Catalog **Top tags** strip, and the Search Tag-facet seed list — surfaces every user sees, with no cap to bound them.

**Mitigation today:** settle a naming + casing + no-surrounding-whitespace convention up front, and audit `GET /api/tags` periodically for whitespace / over-long near-duplicates (especially after a collector ingestion run). The upstream platform fix is a server-side trim + length cap + a database `CHECK` constraint; a separate, sibling input-validation gap on the dataset-statistics ingestion endpoint is tracked independently.
{% endhint %}
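
Until a server-side fix exists, the naming convention can be checked client-side during the periodic audit. This is a minimal sketch: `MAX_TAG_LENGTH` is an arbitrary value chosen for illustration (the platform defines no cap), `check_tag_name` and `audit` are hypothetical helpers, and the list of names is assumed to have been extracted from `GET /api/tags` beforehand.

```python
MAX_TAG_LENGTH = 64  # assumption: the platform defines no cap, so pick your own


def check_tag_name(name: str, max_length: int = MAX_TAG_LENGTH) -> list[str]:
    """Return the convention violations for one tag name (empty list means clean)."""
    problems = []
    if name != name.strip():
        problems.append("surrounding whitespace")
    if not name.strip():
        problems.append("blank")
    if len(name) > max_length:
        problems.append("too long")
    if any(not ch.isprintable() for ch in name):
        problems.append("non-printable character")
    return problems


def audit(tag_names: list[str]) -> dict[str, list[str]]:
    """Map each offending name to its violations; clean names are omitted."""
    return {name: p for name in tag_names if (p := check_tag_name(name))}


report = audit(["tag", " tag ", "x" * 200, "bad\tname"])
for name, problems in report.items():
    print(repr(name[:20]), problems)
```

The same checks could run in a pre-ingestion step for collectors, since that path is not gated by RBAC.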

{% hint style="warning" %}
**The audit trail for tag changes is non-uniform across the three tag-assign endpoints.** The three tag-assign actions look symmetric, but they emit three different things to the [Activity Feed](/features/active-platform-features/activity-feed.md), as the table that follows this callout shows.

Entity and dataset-field tag changes are both fully audited: each event carries the before-and-after tag lists, under two different event types (`TAG_ASSIGNMENT_UPDATED` and `DATASET_FIELD_TAGS_UPDATED`). The gap is the term path: term tag changes are not in the Activity Feed at all and are observable only by polling the term's current tag list and diffing externally. Compliance / audit workflows depending on tag-change history must instrument the term path separately until the platform-side fix lands.
{% endhint %}

| Action                                                                    | Audit-feed event                                                                                      |
| ------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------- |
| Tagging a **data entity** (`PUT /api/dataentities/{id}/tags`)             | Emits a `TAG_ASSIGNMENT_UPDATED` event scoped to the entity, capturing the before-and-after tag list. |
| Tagging a **dataset field** / column (`PUT /api/datasetfields/{id}/tags`) | Emits a `DATASET_FIELD_TAGS_UPDATED` event capturing the before-and-after tag list.                   |
| Tagging a **term** (`PUT /api/terms/{id}/tags`)                           | Emits **no** activity event today.                                                                    |
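
For the term path, the external "poll and diff" approach could look like the sketch below. It compares two snapshots of term tag names that the caller has already collected; how the snapshots are fetched and stored is left out, and `TagChange` and `diff_term_tags` are hypothetical names, not platform APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagChange:
    """One term whose tag set differs between two snapshots."""
    term_id: int
    added: frozenset[str]
    removed: frozenset[str]


def diff_term_tags(
    previous: dict[int, set[str]], current: dict[int, set[str]]
) -> list[TagChange]:
    """Compare two snapshots of term id to tag names and report every change."""
    changes = []
    for term_id in sorted(previous.keys() | current.keys()):
        before = previous.get(term_id, set())
        after = current.get(term_id, set())
        if before != after:
            changes.append(
                TagChange(term_id, frozenset(after - before), frozenset(before - after))
            )
    return changes


yesterday = {1: {"PII"}, 2: {"finance"}}
today = {1: {"PII", "Restricted"}, 2: set(), 3: {"new"}}
changes = diff_term_tags(yesterday, today)
for change in changes:
    print(change.term_id, sorted(change.added), sorted(change.removed))
```

A diff like this only captures the net change between polls, so the polling interval bounds how precisely a change can be dated, and it cannot attribute the change to a user.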

## Where to next

* [Data entity detail page](/features/data-discovery/entity-detail-page.md) — the per-entity surface where the sidebar Tags panel lives and Important-flagged tags render visually distinct on entity rows.
* [Search and Filtering](/features/data-discovery/search.md) — where the Tag facet narrows the catalog.
* [Data Entity Groups & Domains](/features/data-discovery/groups-domains.md) — the heavier-weight grouping mechanism for related entities (datasets, transformers, quality tests).
* [Management](/features/management.md) — the operator-mutating side: tag vocabulary curation, Important flag, namespace scoping.
* [Activity Feed](/features/active-platform-features/activity-feed.md) — the audit trail for `TAG_ASSIGNMENT_UPDATED` + `DATASET_FIELD_TAGS_UPDATED` events (read the audit-asymmetry caveat above before relying on it).
* [Permissions](/configuration-and-deployment/enable-security/authorization/permissions.md) — the platform-wide permission catalog, including the three `TAG_*` rows plus the three `*_TAGS_UPDATE` side-channel rows.

