GitHub organization overview

An overview of all repositories in the ODD GitHub organization.

odd-platform [link]

Serving as the backbone of the Open Data Discovery initiative, the odd-platform repository hosts the core application for the Next-Generation Data Discovery and Data Observability Platform. It is the central hub for all the components within the ODD ecosystem, orchestrating the discovery, observability, and monitoring of data across an organization.

oddrn-models-package [link]

This project generates Pydantic-based Python models and API clients from a given specification. Whenever the specification is updated, a GitHub Action automatically builds the package and publishes it to PyPI.

oddrn-generator [link]

A collection of helper classes for generating unique Oddrn identifiers for data source entities. Oddrn, or the Open Data Descriptor Resource Name, is a standardized naming convention for identifying data resources. The classes provide a consistent way to create and manage Oddrn values, so identifiers stay unique across different data sources and their associated entities.
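
To illustrate the idea, here is a minimal sketch of how a hierarchical resource name might be assembled from a data source's coordinates. The `ResourceNameBuilder` class, its method names, and the `//postgresql/host/...` path layout are assumptions made for this example. They are not the oddrn-generator API, so consult that repository for the actual classes.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceNameBuilder:
    """Hypothetical helper: builds a path-like identifier for a data entity."""

    source_type: str  # e.g. "postgresql"
    host: str         # e.g. "db.example.com"
    parts: list[tuple[str, str]] = field(default_factory=list)

    def with_part(self, kind: str, name: str) -> "ResourceNameBuilder":
        # Return a new builder so a shared prefix can be reused safely.
        return ResourceNameBuilder(
            self.source_type, self.host, [*self.parts, (kind, name)]
        )

    def build(self) -> str:
        # Join the fixed prefix with one "/kind/name" segment per part.
        prefix = f"//{self.source_type}/host/{self.host}"
        suffix = "".join(f"/{kind}/{name}" for kind, name in self.parts)
        return prefix + suffix


database = ResourceNameBuilder("postgresql", "db.example.com").with_part(
    "databases", "sales"
)
orders = database.with_part("tables", "orders").build()
customers = database.with_part("tables", "customers").build()
print(orders)
```

Because every entity under the same database shares one prefix, two tables can only collide if they have the same name, which is the kind of consistency a naming convention like this is meant to provide.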

odd-collectors [link]

A set of collectors, grouped by the type of data source they connect to:

  1. odd-collector [link]

    A general-purpose collector for data sources such as databases, BI tools, and APIs.

  2. odd-collector-aws [link]

    An AWS cloud-based service collector.

  3. odd-collector-gcp [link]

    A GCP cloud-based service collector.

  4. odd-collector-azure [link]

    An Azure cloud-based service collector.

  5. odd-collector-sdk [link]

    This repository contains the common classes used by collectors. A key one is "Collector", which reads the "collector-config.yaml" file, dynamically imports adapter modules, configures the scheduler, and executes adapters, so that collector setup and operation are handled in one place.
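
The flow described for the "Collector" class can be sketched in a few lines of plain Python. Everything here is an assumption for illustration: the `plugins`, `type`, and `name` config keys, the `demo_adapter_inmemory` module name, and the `Adapter.get_data_entities` method are hypothetical, a dict stands in for the parsed "collector-config.yaml", and a single pass replaces the scheduler. It is not the odd-collector-sdk implementation.

```python
import importlib
import sys
import types

# Stand-in for the parsed collector-config.yaml (keys are hypothetical).
config = {
    "plugins": [
        {"type": "inmemory", "name": "first_source"},
        {"type": "inmemory", "name": "second_source"},
    ],
}


class Adapter:
    """Hypothetical adapter: returns the entities found in one data source."""

    def __init__(self, plugin_config: dict) -> None:
        self.name = plugin_config["name"]

    def get_data_entities(self) -> list[str]:
        return [f"{self.name}/entity"]


# Register a fake adapter module so the example is self-contained.
# In a real project this would be a file such as demo_adapter_inmemory.py.
_module = types.ModuleType("demo_adapter_inmemory")
_module.Adapter = Adapter
sys.modules["demo_adapter_inmemory"] = _module


def load_adapters(cfg: dict) -> list:
    """Import one adapter module per plugin entry and instantiate it."""
    adapters = []
    for plugin in cfg["plugins"]:
        module = importlib.import_module(f"demo_adapter_{plugin['type']}")
        adapters.append(module.Adapter(plugin))
    return adapters


def run_once(adapters: list) -> list[str]:
    """Execute every adapter once; a scheduler would call this periodically."""
    collected = []
    for adapter in adapters:
        collected.extend(adapter.get_data_entities())
    return collected


entities = run_once(load_adapters(config))
print(entities)
```

Resolving the module name from the config's `type` value is what lets a new adapter be added without touching the loading code: only a new module and a new config entry are needed.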

odd-collector-profiler [link]

This repository uses DataProfiler to generate tags and run statistical analyses on datasets, which supports evaluating and categorizing the data they contain.

odd-cli [link]

This project provides a command-line interface (CLI) whose commands include retrieving metadata for local files and generating collector tokens, among others.

odd-dbt [link]

A project for fetching and ingesting data about dbt tests and model lineage. It is used to track the quality and lineage of your data models.

odd-great-expectations [link]

This tool provides a custom action for Great Expectations that captures test results and ingests them into OpenDataDiscovery, so that data quality test results are available on your OpenDataDiscovery platform.

odd-airflow [link]

This project provides patched classes that react to the execution of Apache Airflow jobs by extracting and transmitting metadata. These classes improve lineage tracing for both datasets and data sources.

odd-airflow-2 [link]

An Airflow plugin that tracks DAGs, individual tasks, and task runs, and sends this information to the platform. Tracking is implemented with Airflow Listeners.

odd-examples [link]

A collection of Docker Compose files for various use cases and deployment scenarios, such as setting up development environments or running multi-container applications.
