Data Platform Architecture
The analytics universe is structured as a five-layer pipeline. Data flows in one direction — from transactional stores into the lake and upward to visualization — and never writes back.
```
Layer 1: Transactional DBs (PostgreSQL, MongoDB, MSSQL, MySQL)
        │
        ▼
Layer 2: Integration Layer (non-blocking background writes)
        │
        ▼
Layer 3: Data Lake — ADLS (raw, append-only Parquet files)
        │
        ▼
Layer 4: Consumption — Dremio (Models → Explores)
        │
        ▼
Layer 5: Visualization — Superset (dashboards, embedded, RLS-filtered)
```

Layer 1 — Transactional Persistence
The source of all data. Each microservice owns its database and is the sole writer to it. See the Data Platform Overview for the full list of transactional stores.
Layer 2 — Integration Layer
The bridge that moves data from transactional stores into the Data Lake. This layer is intentionally kept thin.
| Mechanism | When to use |
|---|---|
| In-process async write | Low-latency, simple payloads; acceptable if data lake ingestion lags slightly behind transactional commit |
| CDC (Change Data Capture) | High-volume tables; need to capture all changes including deletes |
| Scheduled jobs / ETL | Batch workloads; external systems that don’t support CDC |
| Event streams | Event-driven services that already publish domain events |
Current implementation: Services write directly to ADLS as a non-blocking side effect of the transactional operation. The transactional response is returned to the caller immediately; the ADLS write happens in a parallel background thread. The caller is never blocked waiting for the data lake ingestion to complete.
Layer 3 — Data Lake (ADLS)
Tool: Azure Data Lake Storage (ADLS)
All data entering the analytics universe lands here first. ADLS holds raw, append-only copies of transactional data — exactly as produced by the source service, with no transformation applied.
Design rules for ingestion
- File format: Parquet or Iceberg only. No JSON, CSV, or other formats.
- Path pattern (mandatory): `{namespace}/{table_name}/tenant={uuid}/year={YYYY}/month={MM}/day={DD}/(unknown).parquet`
- Namespace: the source application name (e.g. `workforce-management`, `safety-compliance`, `worker-monitoring`). This is the top-level grouping in the data lake and keeps tables from different services from colliding.
- File naming: flexible. When writing one record per file, a UUID is recommended to prevent collisions. When grouping multiple records into a single batch file, any unique, descriptive name is acceptable (e.g. `batch_2026-03-16T00:00:00Z.parquet`).
- Immutability: never update or delete files. Corrections are appended as new files; deduplication is handled downstream.
- Metadata fields: every record must carry `ingested_at` (timestamp of the ADLS write) and `source_service` alongside the business payload.
- Tenant field: `tenant` is encoded in the path and must also be present as a column in every record so Dremio can filter it independently of the partition.
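The rules above can be sketched as two small helpers: one that produces the mandatory path pattern and one that attaches the required metadata columns. The function names are illustrative, not part of the platform.

```python
import uuid
from datetime import datetime, timezone


def build_lake_path(namespace: str, table: str, tenant: str, ts: datetime) -> str:
    """Mandatory ADLS path pattern, one record per file with a UUID name."""
    return (
        f"{namespace}/{table}/tenant={tenant}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/{uuid.uuid4()}.parquet"
    )


def with_lake_metadata(payload: dict, tenant: str, source_service: str) -> dict:
    """Attach the required metadata columns to the business payload."""
    return {
        **payload,
        "tenant": tenant,  # must match the tenant= partition in the path
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "source_service": source_service,
    }
```

Note that `tenant` appears twice on purpose: once in the path (for partition pruning) and once as a column (so Dremio can filter it independently of the partition).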
Lake layout example
```
adls://angelis-datalake/
├── workforce-management/
│   ├── companies/
│   │   └── tenant=<uuid>/year=2026/month=03/day=16/
│   │       └── 7f3a1c2d-....parquet
│   ├── profiles/
│   │   └── tenant=<uuid>/year=2026/month=03/day=16/
│   │       └── b2e94f01-....parquet
│   └── assignments/
│       └── tenant=<uuid>/year=2026/month=03/day=16/
│           └── 1a8d3c55-....parquet
└── safety-compliance/
    └── form_submissions/
        └── tenant=<uuid>/year=2026/month=03/day=16/
            └── 9c0f2b44-....parquet
```

Layer 4 — Consumption Layer (Dremio)
Tool: Dremio
Dremio sits on top of ADLS and exposes a SQL interface for all downstream consumers (BI tools, dashboards, ad-hoc queries). It defines two sub-layers: Models and Explores.
Registering a new table in Dremio
Once data is loaded into ADLS, making it queryable in Dremio is purely configuration — no code required.
1. Create a new connection in Dremio for the Storage Account + Container combination.
2. Navigate to the folder in the Dremio data lake browser.
3. Right-click the folder and select “Convert to Table”.
4. (Optional) Enable a Reflection on the table to cache it and speed up downstream queries.
Models
Raw SQL mappings over the Data Lake tables. A model is a 1:1 structured view of a data lake entity — same columns, explicit types, no joins. It replaces opaque file paths with named, queryable tables.
```sql
-- Example model: wfm_profiles
SELECT
  id,
  company_id,
  rut,
  first_name,
  last_name,
  email,
  CAST(created_at AS TIMESTAMP) AS created_at,
  CAST(deleted_at AS TIMESTAMP) AS deleted_at,
  ingested_at
FROM datalake.workforce_management.profiles
WHERE deleted_at IS NULL
```

Rules for models:
- One model per source entity.
- Apply only structural transformations: casting, null handling, renaming to consistent naming conventions.
- No business logic, no joins.
- Always filter soft-deleted records (`deleted_at IS NULL`).
Explores
Star-schema views built on top of models. An explore combines a central fact model with all relevant dimension models through pre-defined joins, producing a single flat, fully-joined table ready for visualization.
```
explore_worker_compliance
├── fact: wfm_assignments       (central model)
├── dim:  wfm_profiles          (worker details)
├── dim:  wfm_companies         (company details)
├── dim:  wfm_projects          (project details)
└── dim:  saferapp_test_results (test outcomes)
```

Rules for explores:
- One explore per business question / domain area.
- Always include `company_id` so Superset can enforce tenant filtering.
- Explores are the only layer Superset queries. Never connect Superset directly to a model or raw ADLS path.
- Both models and explores can be cached by Dremio — use reflection / caching for explores that power frequent dashboards.
Layer 5 — Visualization Layer (Superset)
Tool: Apache Superset
Superset connects to Dremio explores and is the end-user interface for dashboards and charts.
Tenant isolation via Row-Level Security (RLS)
Every Superset dataset maps to a Dremio explore that contains company_id. An RLS filter is applied per user session so a user can only query rows where company_id matches their tenant. This is enforced in Superset, not in the application.
RLS rule example:

```
dataset: explore_worker_compliance
clause:  company_id = '{{ current_user_attribute("company_id") }}'
```

Embedded dashboards
Superset supports embedding dashboards directly into the Angelis frontend applications. Embedded dashboards carry the authenticated user’s context, which drives the RLS filter above. Users never see raw data from other tenants.
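As a sketch of how the embedding backend passes the user's context to Superset: it exchanges a payload like the one below for a short-lived guest token at Superset's `/api/v1/security/guest_token/` REST endpoint. The field names in the payload follow Superset's guest-token API; the function name and the explicit per-tenant RLS clause are assumptions about how Angelis would wire this up, not confirmed implementation details.

```python
def build_guest_token_payload(
    username: str, dashboard_uuid: str, company_id: str
) -> dict:
    """Payload for POST /api/v1/security/guest_token/ on Superset.

    The embedding backend authenticates as a service account, sends this
    payload, and hands the resulting guest token to the embedded dashboard.
    """
    return {
        "user": {"username": username},
        "resources": [{"type": "dashboard", "id": dashboard_uuid}],
        # Defense in depth: pin an explicit RLS clause to the user's tenant,
        # mirroring the dataset-level rule configured in Superset.
        "rls": [{"clause": f"company_id = '{company_id}'"}],
    }
```

The guest token is what carries "the authenticated user's context" into the embedded dashboard: every query issued through it is constrained by the `rls` clauses baked into the token.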
Styling
Superset’s theme is configurable. Dashboards should match the Angelis design system — primary colors, typography, and layout consistent with AdminCenter and Worker Web.