The Challenge
This fast-growing logistics company in Brazil had scaled significantly but was making decisions almost entirely on intuition. Data was scattered across multiple systems — a relational database, object storage, a document database, and a legacy cloud warehouse — but there was no way to bring it together, analyze it, or act on it.
The problems were cascading:
- No single source of truth — operational data lived in silos with no integration layer
- No analytics capability — business teams couldn't answer basic questions about operations, costs, or performance without asking engineering
- No data team — there was literally no one in the company whose job was data
- Incompatible source systems — the relational database, document store, file storage, and legacy cloud warehouse each had a different access pattern and data model, so nothing could be joined or compared directly
- Growing fast — the company needed data infrastructure that could scale with the business, not a quick hack that would need replacing in 6 months
The mandate was clear: design and build the entire data platform from scratch, and build the team to run it.
My Role & Approach
I came in as Tech Lead with a dual mandate: architect the platform and build the team. Here's how I approached it.
Architecture Design
I designed a layered architecture on a major cloud platform, chosen for its cost efficiency and serverless analytics services:
- Data Lake — cloud object storage as the raw ingestion layer, immutable copies of all source data, partitioned by date
- Data Warehouse — serverless analytical warehouse with dimensional tables, business logic, and aggregations
- Orchestration — Python-based workflow orchestration for pipeline scheduling, dependency management, monitoring, and alerting (see the DAG sketch after this list)
- Analytics — open-source BI tool for self-service dashboards and reports for business teams
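The platform's components are kept generic above, so the sketch below stands in Apache Airflow as the Python orchestrator; the DAG id, task names, and callables are illustrative assumptions, not the production code. What it shows is the layering: each stage is a separate, retryable task with an explicit dependency chain.

```python
# Minimal sketch of the lake -> warehouse -> marts layering, assuming Apache
# Airflow as the Python orchestrator. All names here are illustrative.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_to_lake(ds, **_):
    """Land an immutable, date-partitioned copy of source data in object storage."""
    print(f"writing raw extract to lake/source_x/dt={ds}/")


def load_to_warehouse(ds, **_):
    """Load the day's lake partition into warehouse staging tables."""
    print(f"loading lake/source_x/dt={ds}/ into staging")


def build_marts(ds, **_):
    """Apply business logic: dimensional tables and aggregations."""
    print(f"rebuilding marts for {ds}")


with DAG(
    dag_id="daily_platform_load",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    ingest = PythonOperator(task_id="ingest_to_lake", python_callable=ingest_to_lake)
    load = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)
    marts = PythonOperator(task_id="build_marts", python_callable=build_marts)

    ingest >> load >> marts  # lake -> warehouse -> marts dependency chain
```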
Migration Pipelines
The hardest part wasn't building new infrastructure — it was migrating data from four different source systems into a unified platform without disrupting operations:
- Relational DB → Warehouse — incremental extraction with change data capture, schema mapping, and data type transformations (see the extraction sketch after this list)
- Document DB → Warehouse — denormalization of nested document structures into relational models suitable for analytics (see the flattening sketch after this list)
- File storage → Lake → Warehouse — file format standardization and ingestion automation
- Legacy warehouse → New warehouse — migration of existing materialized views and historical data
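For the relational source, here is a minimal sketch of the incremental pattern, assuming a simple updated_at watermark; log-based CDC would swap the SELECT for a change-stream reader. The orders table, the updated_at column, and the land() callback are hypothetical placeholders.

```python
# Sketch of watermark-driven incremental extraction from the relational source.
# The orders table, updated_at column, and land() callback are hypothetical;
# log-based CDC would replace the SELECT with a change-stream reader.
import json
from pathlib import Path
from typing import Callable

import sqlalchemy as sa

WATERMARK = Path("state/orders_watermark.json")  # last point safely landed


def read_watermark() -> str:
    if WATERMARK.exists():
        return json.loads(WATERMARK.read_text())["updated_at"]
    return "1970-01-01 00:00:00"  # first run falls back to a full extract


def extract_increment(engine: sa.Engine, land: Callable[[list[dict]], None]) -> int:
    """Extract rows changed since the last run and hand them to land()."""
    query = sa.text("SELECT * FROM orders WHERE updated_at > :wm ORDER BY updated_at")
    with engine.connect() as conn:
        rows = [dict(r._mapping) for r in conn.execute(query, {"wm": read_watermark()})]
    if rows:
        land(rows)  # land the batch in the date-partitioned lake first
        # ...then advance the watermark; a failed run re-extracts the same
        # window instead of silently skipping it
        WATERMARK.parent.mkdir(parents=True, exist_ok=True)
        WATERMARK.write_text(json.dumps({"updated_at": str(rows[-1]["updated_at"])}))
    return len(rows)
```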
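The document-store migration came down to denormalization. The sketch below shows the pattern against an assumed shipment document shape (all field names invented): one parent row per document, plus one child row per element of an embedded array, which maps directly onto warehouse tables.

```python
# Sketch of denormalizing nested documents into flat, analytics-friendly rows.
# The shipment document shape is hypothetical; the pattern is the point:
# one parent row per document, one child row per embedded array element.
from typing import Any


def flatten_shipment(doc: dict[str, Any]) -> tuple[dict, list[dict]]:
    """Split one document into a shipments row plus shipment_events rows."""
    shipment = {
        "shipment_id": doc["_id"],
        "customer_id": doc.get("customer", {}).get("id"),
        "origin_city": doc.get("route", {}).get("origin"),
        "dest_city": doc.get("route", {}).get("destination"),
    }
    events = [
        {
            "shipment_id": doc["_id"],
            "event_seq": i,
            "status": ev.get("status"),
            "occurred_at": ev.get("ts"),
        }
        for i, ev in enumerate(doc.get("events", []))
    ]
    return shipment, events
```

In this sketch, the two outputs would feed a hypothetical shipments table and a shipment_events table downstream.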
All pipelines were built in Python with workflow orchestration, designed for idempotent re-runs and automated failure recovery.
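Idempotency mostly reduced to one rule: a run owns its date partition and replaces it wholesale. Below is a sketch of that delete-and-reload pattern, with hypothetical table and column names; identifiers are assumed to come from trusted pipeline config rather than user input.

```python
# Sketch of an idempotent warehouse load: a run owns one date partition (ds)
# and replaces it wholesale, so re-running a failed day never duplicates rows.
# Table and column names are hypothetical and assumed to come from trusted
# pipeline config, never from user input.
import sqlalchemy as sa


def load_partition(engine: sa.Engine, table: str, ds: str, rows: list[dict]) -> None:
    """Delete-and-reload the ds partition in a single transaction."""
    if not rows:
        return
    cols = list(rows[0])  # rows are expected to carry the ds column too
    insert = sa.text(
        f"INSERT INTO {table} ({', '.join(cols)}) "
        f"VALUES ({', '.join(':' + c for c in cols)})"
    )
    with engine.begin() as conn:  # one transaction: all-or-nothing
        conn.execute(sa.text(f"DELETE FROM {table} WHERE ds = :ds"), {"ds": ds})
        conn.execute(insert, rows)  # executemany over the row dicts
```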
Team Building
I hired and onboarded the company's first Data Team — defining roles, setting up development workflows, establishing code review practices, and creating documentation so the team could operate independently after my engagement ended.
Results
Within months, business teams went from "ask engineering and wait" to pulling their own reports. The data platform became the foundation for operational decision-making across the company — route optimization, cost analysis, performance tracking, and customer insights all became self-service.
Tech Stack
Cloud object storage (data lake), serverless analytical warehouse, Python, workflow orchestration, open-source BI.
Key Takeaway
Building a data platform from zero is as much an organizational challenge as a technical one. The technology choices matter less than the architecture principles: keep it simple, make it reliable, design for the team that will maintain it (not the one that built it).
The biggest mistake I see in greenfield data platforms is over-engineering for hypothetical scale. This company needed a platform that could grow with the business — not a system designed for Google-scale traffic on day one. The right architecture is the simplest one that meets current needs and has clear extension points for the future.
Need a data platform built from scratch?
I've done it multiple times — from architecture to implementation to team handoff. Let's talk about your situation.
Book a Discovery Call