The Challenge

This company is a major B2B data intelligence platform with massive data infrastructure — terabytes of Sales and Finance data flowing through pipelines daily across multiple cloud providers. The data engineering team maintained hundreds of pipelines, models, and data products.

Like every data team in 2024-2025, we were being asked: "How do we use AI?" But the question underneath was more specific:

  • How do you integrate AI into existing data engineering workflows without breaking reliability?
  • How do you move beyond "ChatGPT for code completion" to AI that actually participates in the engineering process?
  • How do you build guardrails so AI agents can help without introducing risk into production systems?
  • How do you get a team of senior engineers to actually adopt AI tools — not as a toy, but as a core part of how they work?

Most teams were stuck at the "individual developers using ChatGPT" stage. I wanted to push further — to AI as a team-level engineering tool with real integration into our workflows.

My Role & Approach

I was the first on the team to systematically integrate AI into our data engineering practice. This wasn't a mandated initiative — it was bottom-up innovation that I drove and then evangelized across the team.

LLM Agents With Guardrails

I built and configured LLM-powered agents tailored to our specific engineering context — not generic coding assistants, but agents that understood our codebase, our conventions, our data models, and our deployment patterns. Key elements:

  • Custom skills — agents equipped with skills encoding our specific pipeline patterns, naming conventions, and testing requirements
  • Guardrails — explicit boundaries on what agents could and couldn't do: read production data but not modify it, suggest pipeline changes but require human review, generate tests but not skip them
  • Context management — using MCP (Model Context Protocol) to give agents access to relevant documentation, schema definitions, and pipeline metadata without overwhelming context windows
Agent Guardrails Framework — every agent action sits in an explicit capability zone:

  • Can do — read prod data, suggest changes, generate code, review PRs
  • With review — deploy suggestions, modify configs, update docs
  • Cannot — modify prod data, merge code, skip tests
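The capability zones above can be sketched as a simple deny-by-default policy gate. This is a minimal illustration, not our actual implementation; the action names mirror the zones in the framework, and `authorize` is a hypothetical helper:

```python
from enum import Enum

class Zone(Enum):
    CAN_DO = "can_do"
    WITH_REVIEW = "with_review"
    CANNOT = "cannot"

# Policy table mirroring the three capability zones above
POLICY = {
    "read_prod_data": Zone.CAN_DO,
    "suggest_changes": Zone.CAN_DO,
    "generate_code": Zone.CAN_DO,
    "review_pr": Zone.CAN_DO,
    "deploy_suggestion": Zone.WITH_REVIEW,
    "modify_config": Zone.WITH_REVIEW,
    "update_docs": Zone.WITH_REVIEW,
    "modify_prod_data": Zone.CANNOT,
    "merge_code": Zone.CANNOT,
    "skip_tests": Zone.CANNOT,
}

def authorize(action: str, human_approved: bool = False) -> bool:
    """Gate an agent action; unknown actions are denied by default."""
    zone = POLICY.get(action, Zone.CANNOT)
    if zone is Zone.CAN_DO:
        return True
    if zone is Zone.WITH_REVIEW:
        return human_approved
    return False
```

The important property is the default: an action the policy has never heard of lands in the "cannot" zone, so expanding what the agent may do is always a deliberate edit.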

Agentic Development Workflows

The real value wasn't in one-off code generation — it was in integrating AI into the workflow:

  • Pipeline development — agents that could scaffold new pipelines following our patterns, generate boilerplate, and pre-populate configurations
  • Code review assistance — agents that reviewed PRs for common data engineering pitfalls: missing null handling, schema drift risks, partition strategy issues
  • Incident response — agents with context about our monitoring setup that could help diagnose pipeline failures faster
  • Documentation generation — agents that kept pipeline documentation in sync with actual code changes
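To make the review-assistance idea concrete, here is a toy version of a pitfall checker an agent might run over SQL in a PR. The two rules shown (schema-drift-prone `SELECT *`, broken `= NULL` comparisons) are illustrative stand-ins, not our actual ruleset:

```python
import re

# Illustrative lint rules for common data engineering pitfalls
RULES = [
    (re.compile(r"\bselect\s+\*", re.I),
     "SELECT * is brittle under schema drift; list columns explicitly"),
    (re.compile(r"(=|!=|<>)\s*null\b", re.I),
     "comparisons to NULL always yield NULL; use IS [NOT] NULL"),
]

def review_sql(sql: str) -> list[str]:
    """Return human-readable warnings for each rule the SQL trips."""
    return [msg for pattern, msg in RULES if pattern.search(sql)]
```

In practice the agent pairs findings like these with the diff context so the reviewer sees why a line was flagged, but the human still decides.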

Model Context Protocol (MCP)

MCP was the key enabler. Instead of dumping entire codebases into prompts, I set up MCP servers that gave agents structured access to exactly the context they needed — DAG definitions, transformation model metadata, warehouse schema information, and pipeline run histories. This made agents dramatically more useful because they could reason about our specific infrastructure, not just generic patterns.
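The core idea is easiest to see in a stripped-down stand-in: instead of pasting a codebase into the prompt, the agent requests one named, scoped resource at a time. This sketch is purely illustrative (the resource names and `fetch_context` helper are hypothetical, and a real setup would use actual MCP servers):

```python
# Each resource resolves one narrow slice of context on demand.
# The lambdas stand in for real lookups against the orchestrator,
# warehouse, and monitoring systems.
CONTEXT_SOURCES = {
    "dag_definition": lambda pipeline: f"tasks and dependencies for {pipeline}",
    "warehouse_schema": lambda table: f"columns and types for {table}",
    "run_history": lambda pipeline: f"recent runs and failures for {pipeline}",
}

def fetch_context(resource: str, key: str, max_chars: int = 2_000) -> str:
    """Resolve one scoped resource, truncated to protect the context window."""
    if resource not in CONTEXT_SOURCES:
        raise KeyError(f"unknown resource: {resource}")
    return CONTEXT_SOURCES[resource](key)[:max_chars]
```

The payoff is twofold: the agent reasons over exactly the metadata that matters for the task, and the context window stays small enough that none of it gets lost.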

Results

  • First team member to ship AI-augmented data workflows to production
  • Pipelines moving terabytes per day maintained with AI-assisted development
  • Team-wide adoption of AI tools across the engineering team

The impact went beyond personal productivity. By demonstrating real, production-grade AI integration (not just demos), I helped shift the team's relationship with AI tools from curiosity to daily use. The patterns I established — guardrails, MCP context management, workflow integration — became the template for how the broader team adopted AI.

Tech Stack

AI: LLM agents with custom skills and guardrails
Protocol: Model Context Protocol (MCP)
Pipelines: Workflow orchestration, transformation framework, Python
Warehouse: Cloud analytical warehouse
Cloud: Multi-cloud (3 providers, IaC-managed)
Monitoring: Data observability platform + incident management

Key Takeaway

AI integration in data engineering is not about replacing engineers — it's about amplifying their judgment. The teams that will get the most value from AI are the ones that treat it as a tool within a structured workflow, not a magic box.

The three things that made this work:

  1. Guardrails first — define what the AI can't do before expanding what it can
  2. Context is everything — an AI agent without your specific codebase context is just a generic autocomplete. MCP changes the game.
  3. Workflow integration, not feature addition — the AI has to fit into how engineers already work, not require a new process

This is the intersection I specialize in now: the practical, production-grade integration of AI into data engineering — not as a demo, but as infrastructure.

Want to integrate AI into your data workflows?

I've done it at scale, in production, with real guardrails. Let me help your team do the same.

Book a Discovery Call