Build and run ODD Collectors

Developer guide on how to build and run ODD Collectors

Last updated 2 years ago

For instructions on how to run the ODD Platform and ODD Collectors locally in a Docker environment, please follow the Try locally article.

ODD Collectors tech stack

There are 3 main collectors at the moment:

  • ODD Collector: covers databases, BI tools, data warehouses, etc.

  • ODD Collector AWS: covers AWS services

  • ODD Collector GCP: covers GCP services

While ODD Collector AWS and ODD Collector GCP use boto3 and Google SDKs respectively, ODD Collector has a separate set of dependencies for each data source.

The general tech stack is:

  • Python

  • Poetry

  • asyncio

Prerequisites

  • Python 3.9.1

  • Poetry 1.2.0

  • Docker Engine 19.03.0+ and docker-compose, preferably the latest

Build ODD Collector into Docker container

Fork and clone the repository if you haven't done so already.

git clone https://github.com/{username}/odd-collector.git

Go into the repository's root directory

cd odd-collector

Run the following command, replacing <tag> with any tag name you'd like

docker build . -t odd-collector:<tag>

Run ODD Collector locally

Run ODD Platform locally as a target for ODD Collector

Activate environment

Go into the repository's root directory

cd odd-collector

Run the following command to create a local Python environment and install the dependencies:

poetry install

Switch your Python context to the newly created environment:

poetry shell

Configure ODD Collector to send request to target catalog

default_pulling_interval: 10
token: <COLLECTOR_TOKEN>
platform_host_url: http://localhost:8080
plugins:
  - type: my_adapter
    some_field_one: str
    some_field_two: int
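
Before starting the collector, it can help to sanity-check the values that go into collector-config.yaml. The sketch below is a hypothetical helper and not part of ODD Collector: it assumes the YAML has already been loaded into a plain dict (for example with a YAML library) and only looks at the four top-level keys shown above.

```python
from typing import Any, Dict, List

# Top-level keys taken from the example configuration above.
REQUIRED_KEYS = ("default_pulling_interval", "token", "platform_host_url", "plugins")


def find_config_problems(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a collector config dict.

    Hypothetical helper for illustration only; it does not replace the
    validation that ODD Collector performs on startup.
    """
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]

    # The placeholder must be replaced with the token copied from ODD Platform.
    if config.get("token", "") == "<COLLECTOR_TOKEN>":
        problems.append("token is still the <COLLECTOR_TOKEN> placeholder")

    url = str(config.get("platform_host_url", ""))
    if url and not url.startswith(("http://", "https://")):
        problems.append("platform_host_url should start with http:// or https://")

    # Every plugin entry needs a type so that a plugin class can be selected.
    for index, plugin in enumerate(config.get("plugins") or []):
        if "type" not in plugin:
            problems.append(f"plugins[{index}] has no 'type' field")

    return problems


example = {
    "default_pulling_interval": 10,
    "token": "<COLLECTOR_TOKEN>",
    "platform_host_url": "http://localhost:8080",
    "plugins": [{"type": "my_adapter", "some_field_one": "str", "some_field_two": "int"}],
}
print(find_config_problems(example))
```

With the example dict, the only reported problem is the unreplaced token placeholder, which is the most common thing to forget in this step.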

Run ODD Collector

Run ODD Collector locally using the following command:

sh ./start.sh

How to implement new integration

Create configuration entry for new integration

Add a new integration plugin class derived from BasePlugin and register it in PLUGIN_FACTORY, keyed by the plugin's type:

domain/plugin.py
...
class MyAdapterPlugin(BasePlugin):
    type: Literal["my_adapter"]
    some_field_one: str
    some_field_two: int

...

PLUGIN_FACTORY: PluginFactory = {
    ...
    "my_adapter": MyAdapterPlugin,
}
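
To show how such a registry can be used, here is a minimal, self-contained sketch of the lookup-by-type idea. It is not the SDK's implementation: the BasePlugin below is a plain dataclass stand-in (an assumption made so the example runs without the SDK installed), and build_plugin is a hypothetical helper introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Type


@dataclass
class BasePlugin:
    """Stand-in for the real BasePlugin, reduced to the shared `type` field."""

    type: str


@dataclass
class MyAdapterPlugin(BasePlugin):
    some_field_one: str
    some_field_two: int


# Maps the `type` value of a plugin entry to the class that describes it.
PLUGIN_FACTORY: Dict[str, Type[BasePlugin]] = {
    "my_adapter": MyAdapterPlugin,
}


def build_plugin(raw: Dict[str, Any]) -> BasePlugin:
    """Pick the plugin class by `type` and construct it from the raw entry."""
    plugin_type = raw.get("type")
    if plugin_type not in PLUGIN_FACTORY:
        raise ValueError(f"unknown plugin type: {plugin_type!r}")
    return PLUGIN_FACTORY[plugin_type](**raw)


plugin = build_plugin({"type": "my_adapter", "some_field_one": "abc", "some_field_two": 1})
print(plugin)
```

An entry whose type is not registered raises ValueError here, which mirrors why a new integration has to be added to the registry before it can be referenced from the configuration.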

Implement new integration

Each adapter module (e.g. odd_collector.adapters.my_adapter) must contain an adapter.py file, and that file must define a class derived from AbstractAdapter.

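adapters/my_adapter/adapter.py
from odd_collector_sdk.domain.adapter import AbstractAdapter
from odd_models.models import DataEntityList

# Plugin class registered in domain/plugin.py (see the previous step)
from odd_collector.domain.plugin import MyAdapterPlugin


class Adapter(AbstractAdapter):
    def __init__(
        self,
        config: MyAdapterPlugin
    ):
        # Receives the plugin entry parsed from collector-config.yaml
        ...

    def get_data_source_oddrn(self) -> str:
        # Returns the ODDRN that identifies the data source itself
        ...

    async def get_data_entity_list(self) -> DataEntityList:
        # Collects metadata and returns it as a DataEntityList
        ...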
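
The skeleton above leaves all three methods empty. As a rough illustration of how they fit together, the following sketch fills them in with fake data. Everything in it is a simplified stand-in so that it runs without the SDK: the AbstractAdapter, DataEntityList and MyAdapterPlugin classes are local assumptions, and the ODDRN strings are illustrative only, not the format a real adapter must produce.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataEntityList:
    """Simplified stand-in for the real DataEntityList model."""

    data_source_oddrn: str
    items: List[str] = field(default_factory=list)


class AbstractAdapter(ABC):
    """Stand-in base class exposing the two methods from the skeleton."""

    @abstractmethod
    def get_data_source_oddrn(self) -> str: ...

    @abstractmethod
    async def get_data_entity_list(self) -> DataEntityList: ...


@dataclass
class MyAdapterPlugin:
    some_field_one: str
    some_field_two: int


class Adapter(AbstractAdapter):
    def __init__(self, config: MyAdapterPlugin) -> None:
        self.config = config

    def get_data_source_oddrn(self) -> str:
        # Illustrative identifier only; real ODDRNs are built per data source.
        return f"//my_adapter/{self.config.some_field_one}"

    async def get_data_entity_list(self) -> DataEntityList:
        # A real adapter would query its data source here; this one fakes it.
        await asyncio.sleep(0)
        source = self.get_data_source_oddrn()
        names = [f"table_{i}" for i in range(self.config.some_field_two)]
        return DataEntityList(
            data_source_oddrn=source,
            items=[f"{source}/{name}" for name in names],
        )


adapter = Adapter(MyAdapterPlugin(some_field_one="demo", some_field_two=2))
result = asyncio.run(adapter.get_data_entity_list())
print(result.data_source_oddrn, result.items)
```

Keeping get_data_entity_list asynchronous matches the asyncio-based tech stack listed earlier and lets slow metadata queries be awaited instead of blocking.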

Make pull request to the origin repository

Troubleshooting

Running ODD Collector on M1

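The libraries pyodbc, confluent-kafka and grpcio have known problems when installing and building the project on M1 MacBooks (see mkleehammer/pyodbc#846, confluentinc/confluent-kafka-python#1190 and grpc/grpc#25082).

Possible solution:

The easiest way is to add all of the export statements below to your .bashrc/.zshrc file: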

# pyodbc dependencies
brew install unixodbc freetds openssl

# confluent-kafka
export LDFLAGS="-L/opt/homebrew/lib -L/opt/homebrew/Cellar/unixodbc/2.3.11/include -L/opt/homebrew/opt/freetds/lib -L/opt/homebrew/opt/openssl@3/lib"
export CFLAGS="-I/opt/homebrew/Cellar/unixodbc/2.3.11/include -I/opt/homebrew/opt/freetds/include"
export CPPFLAGS="-I/opt/homebrew/include -I/opt/homebrew/Cellar/unixodbc/2.3.11/include -I/opt/homebrew/opt/openssl@3/include"
brew install librdkafka
export C_INCLUDE_PATH=/opt/homebrew/Cellar/librdkafka/1.9.0/include
export LIBRARY_PATH=/opt/homebrew/Cellar/librdkafka/1.9.0/lib
export PATH="/opt/homebrew/opt/openssl@3/bin:$PATH"

# grpcio
export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1
export GRPC_PYTHON_BUILD_SYSTEM_ZLIB=1
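
After reinstalling the dependencies, a quick way to see whether the three problematic packages ended up in the environment is to look them up by import name. The snippet below is a small sketch using only the standard library; missing_modules is a hypothetical helper, and it only checks that the modules can be located, not that their native libraries load correctly.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the names from module_names that cannot be located for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names of the pyodbc, confluent-kafka and grpcio packages.
print(missing_modules(["pyodbc", "confluent_kafka", "grpc"]))
```

An empty list means all three modules were found; any name that is printed still needs to be installed inside the Poetry environment.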

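To run ODD Platform locally as a target for ODD Collector, please follow this guide.

Then create a collector in the ODD Platform and copy the generated token using this guide.

To configure ODD Collector to send requests to the target catalog, set up collector-config.yaml using this as an example. Replace <COLLECTOR_TOKEN> with the token obtained in the previous step.

When making a pull request to the origin repository, please use this guide for making forks and pull requests.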
