The Challenge
This fast-growing logistics company in Brazil had scaled significantly but was making decisions almost entirely on intuition. Data was scattered across multiple systems — a relational database, object storage, a document database, and a legacy cloud warehouse — but there was no way to bring it together, analyze it, or act on it.
The problems were cascading:
- No single source of truth — operational data lived in silos with no integration layer
- No analytics capability — business teams couldn't answer basic questions about operations, costs, or performance without asking engineering
- No data team — there was literally no one in the company whose job was data
- Incompatible source systems — the relational database, document store, file storage, and legacy cloud warehouse each had a different access pattern and data model, so nothing could be joined or compared directly
- Growing fast — the company needed data infrastructure that could scale with the business, not a quick hack that would need replacing in 6 months
The mandate was clear: design and build the entire data platform from scratch, and build the team to run it.
My Role & Approach
I came in as Tech Lead with a dual mandate: architect the platform and build the team. Here's how I approached it.
Architecture Design
I designed a layered architecture on a major cloud platform, chosen for its cost efficiency and serverless analytics services:
- Data Lake — cloud object storage as the raw ingestion layer, immutable copies of all source data, partitioned by date
- Data Warehouse — serverless analytical warehouse with dimensional tables, business logic, and aggregations
- Orchestration — Python-based workflow orchestration for pipeline scheduling, dependency management, monitoring, and alerting (see the DAG sketch after this list)
- Analytics — open-source BI tool for self-service dashboards and reports for business teams
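The platform's components are kept generic above, so the sketch below stands in Apache Airflow as the Python orchestrator; the DAG id, task names, and callables are illustrative assumptions, not the production code. What it shows is the layering: each stage is a separate, retryable task with an explicit dependency chain.

```python
# Minimal sketch of the lake -> warehouse -> marts layering, assuming Apache
# Airflow as the Python orchestrator. All names here are illustrative.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_to_lake(ds, **_):
    """Land an immutable, date-partitioned copy of source data in object storage."""
    print(f"writing raw extract to lake/source_x/dt={ds}/")


def load_to_warehouse(ds, **_):
    """Load the day's lake partition into warehouse staging tables."""
    print(f"loading lake/source_x/dt={ds}/ into staging")


def build_marts(ds, **_):
    """Apply business logic: dimensional tables and aggregations."""
    print(f"rebuilding marts for {ds}")


with DAG(
    dag_id="daily_platform_load",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    ingest = PythonOperator(task_id="ingest_to_lake", python_callable=ingest_to_lake)
    load = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)
    marts = PythonOperator(task_id="build_marts", python_callable=build_marts)

    ingest >> load >> marts  # lake -> warehouse -> marts dependency chain
```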
Migration Pipelines
The hardest part wasn't building new infrastructure — it was migrating data from four different source systems into a unified platform without disrupting operations:
- Relational DB → Warehouse — incremental extraction with change data capture, schema mapping, and data type transformations (see the extraction sketch after this list)
- Document DB → Warehouse — denormalization of nested document structures into relational models suitable for analytics (see the flattening sketch after this list)
- File storage → Lake → Warehouse — file format standardization and ingestion automation
- Legacy warehouse → New warehouse — migration of existing materialized views and historical data
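For the relational source, here is a minimal sketch of the incremental pattern, assuming a simple updated_at watermark; log-based CDC would swap the SELECT for a change-stream reader. The orders table, the updated_at column, and the land() callback are hypothetical placeholders.

```python
# Sketch of watermark-driven incremental extraction from the relational source.
# The orders table, updated_at column, and land() callback are hypothetical;
# log-based CDC would replace the SELECT with a change-stream reader.
import json
from pathlib import Path
from typing import Callable

import sqlalchemy as sa

WATERMARK = Path("state/orders_watermark.json")  # last point safely landed


def read_watermark() -> str:
    if WATERMARK.exists():
        return json.loads(WATERMARK.read_text())["updated_at"]
    return "1970-01-01 00:00:00"  # first run falls back to a full extract


def extract_increment(engine: sa.Engine, land: Callable[[list[dict]], None]) -> int:
    """Extract rows changed since the last run and hand them to land()."""
    query = sa.text("SELECT * FROM orders WHERE updated_at > :wm ORDER BY updated_at")
    with engine.connect() as conn:
        rows = [dict(r._mapping) for r in conn.execute(query, {"wm": read_watermark()})]
    if rows:
        land(rows)  # land the batch in the date-partitioned lake first
        # ...then advance the watermark; a failed run re-extracts the same
        # window instead of silently skipping it
        WATERMARK.parent.mkdir(parents=True, exist_ok=True)
        WATERMARK.write_text(json.dumps({"updated_at": str(rows[-1]["updated_at"])}))
    return len(rows)
```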
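The document-store migration came down to denormalization. The sketch below shows the pattern against an assumed shipment document shape (all field names invented): one parent row per document, plus one child row per element of an embedded array, which maps directly onto warehouse tables.

```python
# Sketch of denormalizing nested documents into flat, analytics-friendly rows.
# The shipment document shape is hypothetical; the pattern is the point:
# one parent row per document, one child row per embedded array element.
from typing import Any


def flatten_shipment(doc: dict[str, Any]) -> tuple[dict, list[dict]]:
    """Split one document into a shipments row plus shipment_events rows."""
    shipment = {
        "shipment_id": doc["_id"],
        "customer_id": doc.get("customer", {}).get("id"),
        "origin_city": doc.get("route", {}).get("origin"),
        "dest_city": doc.get("route", {}).get("destination"),
    }
    events = [
        {
            "shipment_id": doc["_id"],
            "event_seq": i,
            "status": ev.get("status"),
            "occurred_at": ev.get("ts"),
        }
        for i, ev in enumerate(doc.get("events", []))
    ]
    return shipment, events
```

In this sketch, the two outputs would feed a hypothetical shipments table and a shipment_events table downstream.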
All pipelines were built in Python with workflow orchestration, designed for idempotent re-runs and automated failure recovery.
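Idempotency mostly reduced to one rule: a run owns its date partition and replaces it wholesale. Below is a sketch of that delete-and-reload pattern, with hypothetical table and column names; identifiers are assumed to come from trusted pipeline config rather than user input.

```python
# Sketch of an idempotent warehouse load: a run owns one date partition (ds)
# and replaces it wholesale, so re-running a failed day never duplicates rows.
# Table and column names are hypothetical and assumed to come from trusted
# pipeline config, never from user input.
import sqlalchemy as sa


def load_partition(engine: sa.Engine, table: str, ds: str, rows: list[dict]) -> None:
    """Delete-and-reload the ds partition in a single transaction."""
    if not rows:
        return
    cols = list(rows[0])  # rows are expected to carry the ds column too
    insert = sa.text(
        f"INSERT INTO {table} ({', '.join(cols)}) "
        f"VALUES ({', '.join(':' + c for c in cols)})"
    )
    with engine.begin() as conn:  # one transaction: all-or-nothing
        conn.execute(sa.text(f"DELETE FROM {table} WHERE ds = :ds"), {"ds": ds})
        conn.execute(insert, rows)  # executemany over the row dicts
```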
Team Building
I hired and onboarded the company's first Data Team — defining roles, setting up development workflows, establishing code review practices, and creating documentation so the team could operate independently after my engagement ended.
Results
Within months, business teams went from "ask engineering and wait" to pulling their own reports. The data platform became the foundation for operational decision-making across the company — route optimization, cost analysis, performance tracking, and customer insights all became self-service.
Tech Stack
Cloud object storage (data lake), serverless analytical warehouse, Python, workflow orchestration, open-source BI.
Key Takeaway
Building a data platform from zero is as much an organizational challenge as a technical one. The technology choices matter less than the architecture principles: keep it simple, make it reliable, design for the team that will maintain it (not the one that built it).
The biggest mistake I see in greenfield data platforms is over-engineering for hypothetical scale. This company needed a platform that could grow with the business — not a system designed for Google-scale traffic on day one. The right architecture is the simplest one that meets current needs and has clear extension points for the future.
Need a data platform built from scratch?
I've done it multiple times — from architecture to implementation to team handoff. Let's talk about your situation.
Book a Discovery Call