Build and run ODD Collectors

Developer guide on how to build and run ODD Collectors

For instructions on how to run the ODD Platform and ODD Collectors locally in a Docker environment, please follow the Try locally article.

ODD Collectors tech stack

There are 3 main collectors at the moment:

  • ODD Collector

  • ODD Collector AWS

  • ODD Collector GCP

While ODD Collector AWS and ODD Collector GCP use boto3 and the Google SDKs respectively, ODD Collector has a separate set of dependencies for each data source.

The general tech stack is:

  • Python

  • Poetry

  • asyncio

Prerequisites

Build ODD Collector into Docker container

Fork and clone the repository if you haven't done so already.

git clone https://github.com/{username}/odd-collector.git

Go into the repository's root directory

cd odd-collector

Run the following command, replacing <tag> with any tag name you'd like

docker build . -t odd-collector:<tag>

Run ODD Collector locally

Run ODD Platform locally as a target for ODD Collector

In order to run ODD Platform locally please follow this guide.

Activate environment

Go into the repository's root directory

cd odd-collector

Run the following command to create a local Python environment and install the dependencies

poetry install

Switch your Python context to the newly created environment.

poetry shell

Configure ODD Collector to send requests to the target catalog

Create a collector in the ODD Platform and copy the generated token using this guide.

Configure collector-config.yaml using the following as an example. Replace <COLLECTOR_TOKEN> with the token obtained in the previous step.

default_pulling_interval: 10
token: <COLLECTOR_TOKEN>
platform_host_url: http://localhost:8080
plugins:
  - type: my_adapter
    some_field_one: str
    some_field_two: int
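The example config can also be generated from a script, which helps avoid starting the collector with the placeholder token still in place. The following is a minimal sketch using only the standard library; the helper name render_collector_config and its default values are hypothetical, and my_adapter with its two fields is the same placeholder plugin used in the example above.

```python
from string import Template

# Assumption: the keys mirror the example config above. "my_adapter" and its
# two fields are placeholders, not a real ODD Collector adapter.
CONFIG_TEMPLATE = Template(
    "default_pulling_interval: $interval\n"
    "token: $token\n"
    "platform_host_url: $host\n"
    "plugins:\n"
    "  - type: my_adapter\n"
    "    some_field_one: $field_one\n"
    "    some_field_two: $field_two\n"
)


def render_collector_config(
    token: str,
    host: str = "http://localhost:8080",
    interval: int = 10,
    field_one: str = "example",
    field_two: int = 1,
) -> str:
    """Return the text of a collector-config.yaml file (hypothetical helper)."""
    # Refuse to render a config that still contains the placeholder token.
    if not token or token == "<COLLECTOR_TOKEN>":
        raise ValueError("replace <COLLECTOR_TOKEN> with the token from ODD Platform")
    # Values are inserted verbatim; quote them yourself if they contain
    # characters that are special in YAML.
    return CONFIG_TEMPLATE.substitute(
        interval=interval,
        token=token,
        host=host,
        field_one=field_one,
        field_two=field_two,
    )
```

Writing the returned string to collector-config.yaml in the repository's root directory gives a file with the same shape as the example.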

Run ODD Collector

Run ODD Collector locally using the following command:

sh ./start.sh

How to implement new integration

Create configuration entry for new integration

Add a new integration plugin class derived from BasePlugin and register it in PLUGIN_FACTORY.

domain/plugin.py
...
class MyAdapterPlugin(BasePlugin):
    type: Literal["my_adapter"]
    some_field_one: str
    some_field_two: int

...

PLUGIN_FACTORY: PluginFactory = {
    ...
    "my_adapter": MyAdapterPlugin,
}
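The registration step can be read as a lookup table keyed by the type field of each plugins entry in collector-config.yaml. This guide does not show how the SDK resolves plugins internally, so the following is only a standard-library sketch of the general idea: BasePluginStandIn and build_plugin are hypothetical names, and dataclasses stand in for the real BasePlugin.

```python
from dataclasses import dataclass
from typing import Any, Dict, Type


@dataclass
class BasePluginStandIn:
    """Hypothetical stand-in for BasePlugin, so the sketch runs without the SDK."""
    type: str


@dataclass
class MyAdapterPlugin(BasePluginStandIn):
    some_field_one: str
    some_field_two: int


# Maps the "type" value of a config entry to the class that describes it.
PLUGIN_FACTORY: Dict[str, Type[BasePluginStandIn]] = {
    "my_adapter": MyAdapterPlugin,
}


def build_plugin(entry: Dict[str, Any]) -> BasePluginStandIn:
    """Turn one parsed 'plugins' entry into a plugin object (hypothetical helper)."""
    plugin_type = entry.get("type")
    try:
        plugin_cls = PLUGIN_FACTORY[plugin_type]
    except KeyError:
        raise ValueError(f"unknown plugin type: {plugin_type!r}") from None
    # Unknown or missing fields raise TypeError from the dataclass constructor.
    return plugin_cls(**entry)
```

An entry whose type is not registered fails early with a clear error, which is the practical reason for adding the new class to the factory before writing the adapter.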

Implement new integration

Each adapter module (e.g. odd_collector.adapters.my_adapter) must contain an adapter.py file. That file must define a class derived from AbstractAdapter.

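adapters/my_adapter/adapter.py
from odd_collector_sdk.domain.adapter import AbstractAdapter
from odd_models.models import DataEntityList

# The plugin class registered in domain/plugin.py in the previous step
from odd_collector.domain.plugin import MyAdapterPlugin


class Adapter(AbstractAdapter):
    def __init__(
        self,
        config: MyAdapterPlugin
    ):
        # Keep the validated plugin config (connection settings, etc.)
        ...

    def get_data_source_oddrn(self) -> str:
        # Return the ODDRN that identifies this data source
        ...

    async def get_data_entity_list(self) -> DataEntityList:
        # Collect metadata from the source and return it as a DataEntityList
        ...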
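To make the shape of an adapter concrete, here is a minimal, self-contained sketch that runs without the SDK installed. Everything in it is an assumption made for illustration: AbstractAdapterStandIn and DataEntityListStandIn are hypothetical stand-ins for the real AbstractAdapter and DataEntityList, the ODDRN string format is made up, and the "tables" are generated rather than read from a data source.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass
class MyAdapterPlugin:
    """Stand-in for the plugin config class (hypothetical fields from the example)."""
    some_field_one: str
    some_field_two: int


@dataclass
class DataEntityListStandIn:
    """Hypothetical stand-in for DataEntityList."""
    data_source_oddrn: str
    items: List[str] = field(default_factory=list)


class AbstractAdapterStandIn(ABC):
    """Hypothetical stand-in exposing the two methods shown in the guide."""

    @abstractmethod
    def get_data_source_oddrn(self) -> str: ...

    @abstractmethod
    async def get_data_entity_list(self) -> DataEntityListStandIn: ...


class Adapter(AbstractAdapterStandIn):
    def __init__(self, config: MyAdapterPlugin):
        self._config = config

    def get_data_source_oddrn(self) -> str:
        # Made-up identifier format, used only for this sketch.
        return f"//my_adapter/host/{self._config.some_field_one}"

    async def get_data_entity_list(self) -> DataEntityListStandIn:
        # Placeholder for real asynchronous I/O against the data source.
        await asyncio.sleep(0)
        names = [f"table_{i}" for i in range(self._config.some_field_two)]
        return DataEntityListStandIn(self.get_data_source_oddrn(), names)
```

Calling asyncio.run(Adapter(MyAdapterPlugin("demo", 2)).get_data_entity_list()) drives the coroutine once, which is a convenient way to exercise an adapter's collection logic in isolation before running the whole collector.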

Make pull request to the origin repository

Please use this guide for making forks and pull requests.

Troubleshooting

Running ODD Collector on M1

The pyodbc, confluent-kafka and grpcio libraries can fail to install or build on M1 MacBooks.

Possible solution:

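The easiest way is to install the dependencies below with Homebrew and add all the export statements to your .bashrc/.zshrc file. The version numbers in the Cellar paths (2.3.11, 1.9.0) are examples; adjust them to the versions Homebrew installed on your machine.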

# pyodbc dependencies
brew install unixodbc freetds openssl
export LDFLAGS="-L/opt/homebrew/lib -L/opt/homebrew/Cellar/unixodbc/2.3.11/include -L/opt/homebrew/opt/freetds/lib -L/opt/homebrew/opt/openssl@3/lib"
export CFLAGS="-I/opt/homebrew/Cellar/unixodbc/2.3.11/include -I/opt/homebrew/opt/freetds/include"
export CPPFLAGS="-I/opt/homebrew/include -I/opt/homebrew/Cellar/unixodbc/2.3.11/include -I/opt/homebrew/opt/openssl@3/include"

# confluent-kafka
brew install librdkafka
export C_INCLUDE_PATH=/opt/homebrew/Cellar/librdkafka/1.9.0/include
export LIBRARY_PATH=/opt/homebrew/Cellar/librdkafka/1.9.0/lib
export PATH="/opt/homebrew/opt/openssl@3/bin:$PATH"

# grpcio
export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1
export GRPC_PYTHON_BUILD_SYSTEM_ZLIB=1
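After reinstalling the dependencies, a quick way to confirm that the three native libraries were built correctly is to try importing them from inside the Poetry environment. The following is a small sketch; the helper name check_imports is hypothetical, while the module names are the import names of the packages listed above (pyodbc, confluent_kafka for confluent-kafka, grpc for grpcio).

```python
import importlib
from typing import Dict, Iterable, Optional

# Import names of the packages mentioned in this section.
NATIVE_MODULES = ("pyodbc", "confluent_kafka", "grpc")


def check_imports(modules: Iterable[str] = NATIVE_MODULES) -> Dict[str, Optional[str]]:
    """Try to import each module; map its name to None on success or to an error text."""
    results: Dict[str, Optional[str]] = {}
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception as exc:
            # Usually ImportError; a missing shared library can also surface as OSError.
            results[name] = f"{type(exc).__name__}: {exc}"
        else:
            results[name] = None
    return results
```

Any entry in the returned dictionary that is not None names a library that still fails to load, together with the error message to search for.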
