Integration Guide

If your microservice produces data that needs to flow into the analytics universe, follow these steps.

  1. Not every database write needs to go to the data lake. Focus on:

    • Entities that feed compliance reports or dashboards
    • High-level business events (worker assigned, test completed, license issued)
    • Audit-relevant changes
  2. Add a background ADLS write that runs after the transactional operation completes. The service must respond to the caller without waiting for this write.

    # Pseudocode: runs after the DB commit
    now = utcnow()
    background_tasks.add_task(
        adls_client.ingest,
        container="angelis-datalake",
        path=(
            f"{SERVICE_NAMESPACE}/"
            f"{table_name}/"
            f"tenant={company_id}/"
            f"year={now.year}/month={now.month:02d}/day={now.day:02d}/"
            f"{uuid4()}.parquet"
        ),
        payload={**entity_data, "ingested_at": now, "source_service": SERVICE_NAME},
    )

    See Architecture — Layer 3 for the mandatory path pattern and file format rules.
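
The path pattern and the non-blocking requirement in this step can be sketched in plain Python. This is a minimal illustration, not the platform's client: `build_lake_path`, `ingest_safely`, the `"wfm"` namespace, and the sample table and tenant values are all hypothetical names chosen for the example. The wrapper logs and swallows any ingestion error so that a failed data lake write cannot surface in the caller's response.

```python
import logging
from datetime import datetime, timezone
from uuid import UUID, uuid4

logger = logging.getLogger("datalake")

# Hypothetical value for illustration; use your service's real namespace.
SERVICE_NAMESPACE = "wfm"


def build_lake_path(table_name, company_id, now, file_id=None):
    """Build the partitioned path from step 2 (tenant/year/month/day)."""
    file_id = file_id or uuid4()
    return (
        f"{SERVICE_NAMESPACE}/"
        f"{table_name}/"
        f"tenant={company_id}/"
        f"year={now.year}/month={now.month:02d}/day={now.day:02d}/"
        f"{file_id}.parquet"
    )


def ingest_safely(ingest, **kwargs):
    """Call the ingest function; log and swallow any failure.

    Returns True on success and False on failure, so a background
    runner never propagates a data lake error to the request path.
    """
    try:
        ingest(**kwargs)
        return True
    except Exception:
        logger.exception("Data lake write failed; the request is unaffected")
        return False


# A fixed timestamp and file id make the example deterministic.
example = build_lake_path(
    "assignments",
    "acme",
    datetime(2024, 3, 7, tzinfo=timezone.utc),
    file_id=UUID(int=1),
)
print(example)
```

With the pseudocode above, `ingest_safely` would be handed to `background_tasks.add_task` with `adls_client.ingest` as its first argument, assuming `add_task` forwards positional and keyword arguments to the task function.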

  3. Create a model that maps your ADLS data to a clean SQL table. Apply only structural transformations (casting, null handling, renaming) — no business logic, no joins.

    SELECT
        id,
        company_id,
        your_field,
        CAST(created_at AS TIMESTAMP) AS created_at,
        ingested_at
    FROM datalake.your_namespace.your_table
    WHERE deleted_at IS NULL

    Coordinate with the data platform team to register it under the correct schema namespace. See Architecture — Models for the full rules.
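
To make the "structural transformations only" rule concrete, the same model can be mirrored over plain Python dictionaries. This is only an illustration of what the SELECT above does, not how models are actually defined on the platform; `to_staging_row`, `staging_model`, and the sample rows are hypothetical. Note what is absent: no lookups against other tables and no derived business fields.

```python
from datetime import datetime


def to_staging_row(raw):
    """Project and cast one raw record, mirroring the SELECT list."""
    return {
        "id": raw["id"],
        "company_id": raw["company_id"],
        "your_field": raw.get("your_field"),  # a missing value stays None
        "created_at": datetime.fromisoformat(raw["created_at"]),  # cast
        "ingested_at": raw["ingested_at"],
    }


def staging_model(raw_rows):
    """Keep rows that are not soft-deleted, as in WHERE deleted_at IS NULL."""
    return [to_staging_row(r) for r in raw_rows if r.get("deleted_at") is None]


# Hypothetical sample data: one live row and one soft-deleted row.
raw_rows = [
    {
        "id": 1,
        "company_id": "acme",
        "your_field": "a",
        "created_at": "2024-03-07T10:00:00",
        "ingested_at": "2024-03-07T10:00:05",
        "deleted_at": None,
    },
    {
        "id": 2,
        "company_id": "acme",
        "your_field": None,
        "created_at": "2024-03-07T11:00:00",
        "ingested_at": "2024-03-07T11:00:05",
        "deleted_at": "2024-03-08T00:00:00",
    },
]

staged = staging_model(raw_rows)
print([row["id"] for row in staged])
```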

  4. If your entity enriches an existing business view, add it as a dimension to the relevant explore.

    If it represents a new domain, create a new explore with your entity as the central fact:

    explore_your_domain
    ├── fact: your_central_model
    ├── dim: wfm_profiles (if worker data is relevant)
    └── dim: wfm_companies (always include for tenant filtering)

    Always include company_id — Superset’s RLS depends on it. See Architecture — Explores for more.

  5. Register the explore as a Superset dataset. Apply the company_id RLS rule:

    clause: company_id = '{{ current_user_attribute("company_id") }}'

    Build charts and optionally embed the dashboard into the relevant frontend application.
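
The effect of the RLS rule can be sketched with an in-memory SQLite table. This is a simplified stand-in, not Superset's implementation: `render_rls_clause` imitates the template rendering of the clause above, `rows_visible_to` appends the rendered clause as a WHERE condition, and the table contents and tenant names are made up for the example.

```python
import sqlite3


def render_rls_clause(user_attributes):
    """Stand-in for rendering the step 5 clause for the current user."""
    return f"company_id = '{user_attributes['company_id']}'"


# Hypothetical explore with rows from two tenants.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE explore_your_domain (id INTEGER, company_id TEXT)")
conn.executemany(
    "INSERT INTO explore_your_domain VALUES (?, ?)",
    [(1, "acme"), (2, "acme"), (3, "globex")],
)


def rows_visible_to(user_attributes):
    """Return the ids a user can see once the tenant filter is applied."""
    clause = render_rls_clause(user_attributes)
    # Illustration only: the rendered clause is interpolated into the query
    # here; real code should not build SQL from untrusted strings.
    query = f"SELECT id FROM explore_your_domain WHERE {clause} ORDER BY id"
    return [row[0] for row in conn.execute(query)]


print(rows_visible_to({"company_id": "acme"}))    # [1, 2]
print(rows_visible_to({"company_id": "globex"}))  # [3]
```

Because the filter is keyed on company_id, an explore that omits that column has nothing for the rule to match against, which is why step 4 requires it.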


| Decision | Rationale |
|---|---|
| Analytics universe is read-only | Prevents analytics workloads from affecting transactional performance or integrity |
| ADLS as the single landing zone | Decouples services from the consumption tool; swap Dremio or Superset without re-ingesting |
| Services write to ADLS directly (current) | Simpler to implement; acceptable at current scale. Does not require additional infrastructure |
| Non-blocking ingestion | Customer-facing latency must not be affected by data lake write performance or failures |
| Explores as the Superset boundary | Keeps transformation logic in one place; visualization layer stays configuration-only |
| Tenant isolation at Superset RLS | Enforces multi-tenant security without embedding it into each dashboard |