# Data Objects Lineage

ODD Platform renders upstream and downstream lineage across the full ODD entity model — not just datasets. Every data-entity class participates: datasets and views, transformers (ETL jobs, ML training jobs, microservices), transformer runs, data quality tests and their runs, consumers (BI dashboards, ML model artifacts), data inputs (API calls), data entity groups (including ML experiments), and entity relationships. See the [ODD Data Model](https://github.com/opendatadiscovery/opendatadiscovery-specification/blob/main/specification/specification.md#data-model-specification) for the canonical class reference.

![](/files/Og7MMpZ2ig24ZWPXvCuw)

## Where to find it in the UI

* **Lineage tab** on any data-entity detail page — opens the entity-centric graph with the entity at the centre and configurable upstream / downstream depth.
* **Group lineage** entry point on a [Data Entity Group](/features/data-discovery/groups-domains.md) detail page — opens the lineage of the group's *children*, not of the group itself (see below).

The graph supports pan, zoom, and on-click expansion of intermediate nodes; the per-entity depth and pre-expanded nodes are controlled by query parameters (next section).

## Query parameters

The per-entity lineage graph is parameterised by two optional query parameters — `lineage_depth` (the number of upstream / downstream hops to walk by default) and `expanded_entity_ids` (entities to expand inline rather than show as collapsed neighbour stubs). Types, defaults, and when to set each are documented at [API Reference → Lineage → Per-entity lineage](/developer-guides/architecture-decision-log/lineage.md).

{% hint style="info" %}
**Keep `expanded_entity_ids` lists short.** The platform places no upper bound on the number of ids you pass, but the ids go straight into a single database query as bound parameters. A very long list (low thousands and up) can exceed Postgres's bound-parameter limit for one statement and fail the request. The UI only ever expands the handful of groups a user has clicked, so this is a direct-API concern — expand in batches rather than passing thousands of ids at once.
{% endhint %}
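The batching advice above can be sketched as follows. This is a minimal illustration rather than platform code: the helper names and the batch size of 500 are assumptions, and how `expanded_entity_ids` is serialised on the wire is left to whichever HTTP client you use.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def batched_ids(ids: Iterable[int], batch_size: int = 500) -> Iterator[List[int]]:
    """Yield successive lists of at most `batch_size` ids (500 is an assumed value)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def expansion_queries(ids: Iterable[int], lineage_depth: int = 1,
                      batch_size: int = 500) -> List[Dict[str, object]]:
    """Build one set of query parameters per batch, each with an explicit depth."""
    return [
        {"lineage_depth": lineage_depth, "expanded_entity_ids": batch}
        for batch in batched_ids(ids, batch_size)
    ]
```

Each element of the returned list corresponds to one request; merging the per-batch graphs is left to the caller.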

## View-mode toggle (Compact / Full)

The lineage canvas exposes a Compact / Full view-mode toggle. The control is one label, but the two surfaces it appears on (Data Entity Group lineage and per-entity Hierarchy lineage) implement it differently:

* **On the Data Entity Group canvas** — flipping the toggle triggers a **layout re-run**. The graph briefly shows a loading spinner; nodes animate from their old positions to the new ones. The DEG-side layout engine recomputes from scratch.
* **On the per-entity Hierarchy canvas** — flipping the toggle is a **re-render only**, not a re-layout. The hierarchy renderer changes per-node size but keeps existing positions. The visible effect is an instant shape change, not an animation.

The same control, two observable behaviours — your mental model from one canvas does not transfer to the other.

{% hint style="warning" %}
**On a dense Hierarchy graph, switching modes can produce visual overlap.** Because the Hierarchy re-render keeps positions static while node size changes, dense graphs (many siblings at the same depth) can end up with nodes overlapping after the switch. Use Full mode for dense graphs, or expand sub-trees individually rather than toggling at the top level.
{% endhint %}

**Compact mode hides the DEG-Items button in the Hierarchy canvas.** When the per-entity canvas renders a [Data Entity Group](/features/data-discovery/groups-domains.md) node and the canvas is in Compact mode, the **DEG-Items** button (the entry point to drill into the group's members) is **hidden**. The drill-in affordance is silently lost; operators wanting to see a DEG node's members from the Hierarchy canvas must switch to **Full** mode (or use the per-entity Overview → Members surface). The trade-off is invisible until you try to click the affordance.

## Lineage depth — UI control vs API contract

The Hierarchy canvas's depth dropdown ranges from **1 to 20**. That dropdown is a UI-presentation choice — it is **not** the platform's contract.

* The UI always **sends `lineage_depth=1`** on the initial entity-lineage fetch and on the per-canvas "Load more" affordance, so UI users never hit the following failure; only direct API callers (curl / SDK / Swagger UI) can omit the parameter. The OpenAPI spec marks `lineage_depth` optional, but the per-entity endpoints have **no default**: omitting it makes the platform throw a `NullPointerException` before the graph is built and return **HTTP 500**, not an empty or default-depth graph. Always send an explicit `lineage_depth` when calling the per-entity endpoints directly.
* The URL query parameter `?d=` is parsed as a number and forwarded **unclamped**. Manually editing the URL to `?d=10000`, or opening a colleague's link that already carries `?d=50`, triggers a recursive walk of that depth (10000 or 50 hops) against the platform's lineage CTE; the dropdown's 20-entry array is never consulted.
* The controller carries `@Min(1)` validation but **no `@Max`**; the service tier hands the integer through unchanged; the repository tier consumes it directly as the CTE termination predicate. Of the layers the value passes through (UI, controller, service, repository), only the controller validates the lower bound, and none enforces an upper bound.
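Because the server enforces neither a default nor an upper bound, a direct API caller may want to enforce both on the client side. The sketch below is one way to do that. The endpoint path `/api/dataentities/{id}/lineage/{direction}`, the function name, and the cap of 20 (chosen to mirror the UI dropdown) are assumptions for illustration; check the API reference for the actual routes.

```python
from urllib.parse import urlencode

MAX_CLIENT_DEPTH = 20  # assumption: mirrors the upper end of the UI dropdown


def lineage_url(base_url: str, entity_id: int, direction: str,
                lineage_depth: int = 1) -> str:
    """Build a per-entity lineage URL that always carries an explicit, capped depth.

    The path layout is an assumed example, not a confirmed platform route.
    """
    if direction not in ("upstream", "downstream"):
        raise ValueError("direction must be 'upstream' or 'downstream'")
    if lineage_depth < 1:
        # Matches the documented @Min(1) check, but fails before any request is sent.
        raise ValueError("lineage_depth must be at least 1")
    depth = min(lineage_depth, MAX_CLIENT_DEPTH)  # client-side upper bound
    query = urlencode({"lineage_depth": depth})
    return f"{base_url.rstrip('/')}/api/dataentities/{entity_id}/lineage/{direction}?{query}"
```

Defaulting `lineage_depth` to 1 in the helper means a caller can never omit the parameter by accident, which is the condition that produces the HTTP 500 described above.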

**Operator-visible consequence.** Two patterns produce arbitrarily-deep walks without the operator touching the dropdown:

1. **Click-through depth compounding.** Clicking a graph node title navigates to the new lineage view with `?d` set to **that node's distance from the current root**. Drill five hops into a graph, click a leaf, and the new view fetches lineage to depth 5 around that leaf. Postgres CPU correlates with user clicks, not with the depth dropdown.
2. **URL-edited depth bypassing the cap.** If `?d=N` (N > 20) is present in the URL when the user clicks a node, the click-through propagation preserves `N` — the dropdown is never consulted on propagation. The dropdown's 20-entry array is a one-time guard at first-paint only.

The depth range the dropdown displays and the range the system actually accepts are not the same. Tune Postgres for the platform expecting deep-walk traffic from casual canvas clicks, not only from manual use of the depth dropdown.
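One mitigation for shared links is to normalise the `d` parameter before a URL is pasted or bookmarked. The helper below is a hypothetical sketch: the function name and the cap of 20 are assumptions, and the platform itself performs no such clamping.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def clamp_depth_param(url: str, max_depth: int = 20) -> str:
    """Return `url` with any `d` query parameter forced into the range 1..max_depth."""
    parts = urlsplit(url)
    pairs = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "d":
            try:
                depth = int(value)
            except ValueError:
                depth = 1  # unparseable depth falls back to the shallowest walk
            value = str(max(1, min(depth, max_depth)))
        pairs.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(pairs)))
```

Other query parameters pass through unchanged, so the helper can be applied to any lineage link without knowing the rest of its shape.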

{% hint style="warning" %}
**A deep walk is a memory- and CPU-amplification risk, not just a slow query.** The lineage walk is a recursive Postgres CTE whose **only** stop condition is the depth bound — there is **no cycle guard**. On cyclic or diamond-shaped graphs (the same node reachable by several paths) the recursion re-expands the same nodes at each level, inflating intermediate rows before the final de-duplication. The platform then loads the **entire** result graph into the application's memory at once before sending the response. A large `lineage_depth` against a branchy graph can therefore spike both Postgres CPU and platform heap well beyond what the row count of the final graph suggests. Keep `lineage_depth` as low as the task allows, and size both Postgres and the platform's JVM heap for the deepest walks your operators can trigger.
{% endhint %}
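The amplification on diamond-shaped graphs can be illustrated with a toy model. The sketch below is not the platform's SQL: it assumes every distinct path contributes its own intermediate row, and it uses a made-up chain of diamonds (`j0 -> {l0, r0} -> j1 -> ...`) to compare intermediate rows against the distinct nodes that remain after de-duplication.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def diamond_chain(diamonds: int) -> Dict[str, List[str]]:
    """Adjacency list for a chain of diamonds: j0 -> {l0, r0} -> j1 -> {l1, r1} -> ..."""
    edges: Dict[str, List[str]] = defaultdict(list)
    for i in range(diamonds):
        top, left, right, bottom = f"j{i}", f"l{i}", f"r{i}", f"j{i + 1}"
        edges[top] += [left, right]
        edges[left].append(bottom)
        edges[right].append(bottom)
    return edges


def walk_without_guard(edges: Dict[str, List[str]], root: str,
                       depth: int) -> Tuple[int, int]:
    """Depth-bounded walk with no visited set.

    Returns (intermediate_rows, distinct_nodes_reached), excluding the root.
    """
    frontier = [root]
    rows = 0
    seen = {root}
    for _ in range(depth):
        # Every path is expanded independently, so shared nodes appear repeatedly.
        frontier = [child for node in frontier for child in edges.get(node, [])]
        rows += len(frontier)
        seen.update(frontier)
    return rows, len(seen) - 1


rows, distinct = walk_without_guard(diamond_chain(10), "j0", 20)
print(rows, distinct)  # 4092 30
```

In this model, ten diamonds walked to depth 20 produce 4092 intermediate rows for a final graph of 30 nodes, and each additional diamond roughly doubles the intermediate count.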

## Group lineage

The dedicated lineage endpoint for [Data Entity Groups](/features/data-discovery/groups-domains.md) (including [ML experiments](/features/data-discovery/groups-domains.md#relationship-to-ml-experiments) and other logical containers) returns the lineage graph for the group's *children* rather than for the group itself. Operationally: a Finance DEG containing fifteen datasets and three ETL jobs returns the lineage union across those eighteen child entities, which is what an operator usually wants when reasoning about a domain or pipeline. The endpoint is documented at [API Reference → Lineage → Group lineage](/developer-guides/architecture-decision-log/lineage.md).

The group-lineage endpoint's request contract is **narrower** than the per-entity endpoints' — it carries only the path parameter (no `lineage_depth`, no `expanded_entity_ids`).

{% hint style="info" %}
**The group view is filtered in two ways that can hide edges you expect to see.**

* **Nested groups are dropped.** Any lineage edge that touches a Data Entity Group *inside* this group is filtered out, and the nested groups themselves are removed from the result. For a DEG that contains other DEGs, the response silently omits the nested DEGs; support for nested groups in group lineage is not implemented yet.
* **Only edges fully inside the group survive.** An edge is kept only when **both** of its endpoints are members of the group. Edges that cross the group boundary — an upstream source that feeds a member but is not itself a member, or a downstream consumer outside the group — are dropped. The group view shows internal flow between members, not the group's external upstream sources or downstream consumers. To see what feeds or consumes a member across the boundary, open that member with the per-entity upstream / downstream endpoints.
  {% endhint %}
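The two filters can be modelled in a few lines. This is a sketch of the documented behaviour, not the platform's implementation; the function name and the integer ids are illustrative assumptions.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # (source_id, target_id)


def group_lineage_view(edges: Iterable[Edge], members: Set[int],
                       nested_groups: Set[int]) -> Tuple[Set[int], List[Edge]]:
    """Apply the two documented filters to a group's candidate edges.

    1. Nested groups are removed, along with any edge that touches them.
    2. An edge survives only if both endpoints are (remaining) members.
    """
    visible_nodes = set(members) - set(nested_groups)
    visible_edges = [
        (source, target)
        for source, target in edges
        if source in visible_nodes and target in visible_nodes
    ]
    return visible_nodes, visible_edges


# Members 1, 2, 3 plus nested group 9; ids 50 and 100 sit outside the group.
nodes, kept = group_lineage_view(
    edges=[(50, 1), (1, 2), (2, 3), (3, 100), (2, 9)],
    members={1, 2, 3, 9},
    nested_groups={9},
)
print(sorted(nodes), kept)  # [1, 2, 3] [(1, 2), (2, 3)]
```

The boundary-crossing edges `(50, 1)` and `(3, 100)` and the nested-group edge `(2, 9)` all disappear, which is why a member's external sources and consumers have to be inspected through the per-entity endpoints.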

{% hint style="warning" %}
**A 404 from the group-lineage endpoint has three possible causes — you cannot tell them apart from the response.** `GET /api/dataentitygroups/{id}/lineage` returns the identical `404 Not Found` when (a) the id does not exist, (b) the group exists but has zero members, or (c) the id belongs to a data entity that is **not** a group at all. The sibling membership endpoint returns `200 OK` with an empty list on the zero-members condition instead, so a script polling both sees two different contracts for the same DEG. See the [API reference's Group lineage section](/developer-guides/architecture-decision-log/lineage.md) for the full 404-vs-200-empty asymmetry and the disambiguation steps.
{% endhint %}

## What participates

| Entity class                                         | Participates in lineage | Notes                                                                    |
| ---------------------------------------------------- | ----------------------- | ------------------------------------------------------------------------ |
| Dataset, View                                        | Yes                     | Native source-to-sink lineage edges from collectors / push adapters.     |
| Transformer (ETL job, ML training job, microservice) | Yes                     | Edges from input datasets to the transformer to output datasets.         |
| Transformer Run                                      | Yes                     | Run-level lineage for per-execution traceability.                        |
| Data Quality Test, Data Quality Test Run             | Yes                     | Test linkage to the dataset(s) under test.                               |
| Consumer (BI dashboard, ML model artifact, …)        | Yes                     | Downstream-only — consumers read but do not produce.                     |
| Data Input (API call, …)                             | Yes                     | Upstream-only — inputs feed the platform but are not produced inside it. |
| Data Entity Group, ML experiment                     | Yes (via Group lineage) | Group lineage returns the union over the group's children.               |
| Entity Relationship                                  | Yes                     | Surfaces foreign-key-style ERD edges as part of the graph.               |

The exact set of entity classes ingested into your platform depends on which collectors and push adapters are connected; see [Integrations](/integrations/integrations.md) for the per-source coverage.

## Access model

Lineage reads are **read-collaborative**: none of the three lineage endpoints (per-entity upstream, per-entity downstream, group lineage) applies an ownership filter, and none requires a specific permission beyond being signed in. Any authenticated user who knows — or guesses — a data-entity id or group id can read the full reachable lineage subgraph around it, regardless of which team owns the entities in that graph. Lineage edges expose cross-team pipeline structure, so treat the lineage surface as visible to every authenticated user of the catalog.

{% hint style="warning" %}
**Two access facts to plan around before exposing lineage to a multi-team or untrusted audience.**

* **No owner scoping.** There is no per-owner or per-team isolation on lineage reads. If team isolation matters, scope the deployment to a single team rather than relying on lineage to hide other teams' graphs.
* **`auth.type=DISABLED` makes lineage anonymous.** When the platform runs with authentication disabled, every endpoint — including all three lineage reads — is reachable by any unauthenticated client that can reach the network port. Do not run `DISABLED` on a network where untrusted clients can reach the platform. See [Authorization](/configuration-and-deployment/enable-security/authorization.md) for the platform-wide model.

The **group-lineage** endpoint has the widest reach: by walking group ids a caller can enumerate the cross-owner co-membership graph of the whole catalog. If lineage exposure is a concern, the group endpoint is the one to weigh most carefully.
{% endhint %}

## Where to next

* [Microservices Lineage](/features/data-lineage/microservices.md) — the OpenTelemetry-traced counterpart for microservice call graphs.
* [Data Entity Groups & Domains](/features/data-discovery/groups-domains.md) — what gets returned when you call Group lineage on a DEG.
* [API Reference → Lineage](/developer-guides/architecture-decision-log/lineage.md) — full HTTP surface (per-entity, group, microservices).
* [ODD Specification — Data Model](https://github.com/opendatadiscovery/opendatadiscovery-specification/blob/main/specification/specification.md#data-model-specification) — the canonical entity-class reference.
* [Main Concepts → Data Governance map](/introduction/main-concepts.md#data-governance-map) — Data Lineage's position among the governance pillars.

