Over the past two years, I integrated LLM-powered agents into the data engineering workflows of a major B2B data platform — not as a side tool, but as a core part of how we built, maintained, and operated pipelines processing terabytes daily.
I built five agents: pipeline scaffolding, code review, incident response, documentation generation, and code generation. I wired them together with MCP (Model Context Protocol) so they could access our live infrastructure — schemas, DAG definitions, run histories, team conventions.
They were useful. Some were genuinely transformative. But every single one hit the same ceiling, and it took me months to understand what it was.
This is the honest account — what worked, where the ceiling was, and the thing I'd now do first on any new engagement.
The foundation: MCP changed how agents see infrastructure
The single biggest lever in the first phase was MCP. Before it, using an LLM on our specific codebase meant either dumping thousands of lines into a prompt or accepting generic advice. MCP let me build structured context servers that gave agents exactly what they needed:
- Pipeline metadata — DAG definitions, schedule configurations, dependency graphs, recent run histories
- Schema information — warehouse table schemas, column types, freshness metadata, lineage
- Model definitions — transformation SQL, documentation, test definitions, upstream/downstream dependencies
- Team conventions — naming patterns, partition strategies, testing requirements, PR checklists
With this context, agents went from "here's some generic Python" to "the upstream model refreshes at 06:00 UTC, so your dependency should wait until 06:30 to account for the 15-minute SLA buffer we use." Specificity changed everything.
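To make that concrete, here's a minimal sketch of what one of these context servers could look like, assuming the official MCP Python SDK's FastMCP interface. The tool names and the metadata_store module are illustrative stand-ins, not the exact servers I ran.

```python
# Minimal sketch of a pipeline-metadata MCP server using the Python SDK's FastMCP.
# metadata_store is a hypothetical wrapper around your orchestrator / warehouse APIs.
from mcp.server.fastmcp import FastMCP

import metadata_store  # hypothetical module, stands in for your own integration code

mcp = FastMCP("pipeline-context")

@mcp.tool()
def get_dag_definition(dag_id: str) -> dict:
    """Return the DAG definition, schedule, and dependency graph for one pipeline."""
    return metadata_store.dag(dag_id)

@mcp.tool()
def get_recent_runs(dag_id: str, limit: int = 20) -> list[dict]:
    """Return the most recent run records (status, duration, start time)."""
    return metadata_store.runs(dag_id, limit=limit)

@mcp.tool()
def get_table_schema(table: str) -> dict:
    """Return column names, types, freshness metadata, and lineage for a warehouse table."""
    return metadata_store.schema(table)

if __name__ == "__main__":
    mcp.run()
```

Each tool returns structured data the agent can reason over, which is what made answers like the 06:30 dependency example possible.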
Five agents, real results
On top of MCP, I built five agents, each integrated into an existing step of our engineering workflow. The results were real:
- Pipeline scaffolder — cut new pipeline setup from 45-60 minutes to 15 minutes. Consistent naming, proper partitioning, tests pre-populated.
- Code review assistant — caught mechanical issues (missing null checks, partition skew risks, schema drift) before human review, freeing senior engineers for architectural feedback.
- Incident response — assembled context briefings in seconds instead of 15-20 minutes of manual reconstruction at 3 AM. The most impactful agent for on-call quality of life.
- Documentation generator — kept the "what" in sync with code changes automatically. Humans wrote the "why."
MCP made all of this possible. Without it, agents gave generic advice. With it, they gave answers grounded in our specific infrastructure.
But then every agent hit the same wall.
The ceiling: agents could see the system, but they couldn't remember the organization
MCP gave agents access to what our system looked like right now — live schemas, current DAG definitions, recent run histories. That's infrastructure data. It's real-time and it's precise.
But there's an entire category of knowledge that doesn't live in infrastructure:
- Why was this pipeline designed this way? What constraints shaped the decision?
- The last three times we tried to migrate this service, what went wrong?
- This failure pattern recurs every quarter-end due to seasonal load spikes — but that's nowhere in the DAG definition
- The team decided six months ago to standardize on a specific approach — but that's in a Slack thread, not in code
- The senior engineer who built this system is on leave and no one else fully understands the token rotation mechanism
This is institutional knowledge — the accumulated wisdom of the team, scattered across people's heads, old Slack threads, postmortems, and design docs that nobody maintains. MCP can't see any of it. And every agent I built was limited by its absence.
The incident response agent could pull run histories and upstream status — but it couldn't know that this exact failure pattern happened last quarter and the root cause was an upstream schema change, not the pipeline itself. The code review agent caught missing null checks — but it couldn't flag that the proposed approach contradicted an architecture decision the team made six months ago. The scaffolder generated correct pipelines — but it couldn't account for a deprecation that existed only in a team planning document.
Every agent was technically correct and architecturally naive. They could see the system. They couldn't remember the organization.
The breakthrough: a knowledge base as the agent's memory
After leaving that engagement, I built the thing I wished I'd had from the start: a structured, LLM-maintained knowledge base that captures the institutional knowledge MCP can't see.
The concept builds on Andrej Karpathy's LLM Wiki pattern: instead of having agents re-derive understanding every time (the RAG approach), the LLM incrementally builds and maintains a persistent wiki. You feed it source documents — architecture decision records, postmortems, design docs, meeting notes — and it creates cross-referenced wiki pages with confidence scoring, contradiction detection, and lifecycle management.
The knowledge compounds over time. Every source ingested makes every future agent interaction smarter, because the synthesis is already done and maintained.
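To make the shape of this concrete, here's one way a wiki entry could be modeled. The fields are my own illustration of the ideas above (confidence scoring, contradiction detection, lifecycle management), not the template's actual schema.

```python
# Illustrative only: one possible shape for an LLM-maintained wiki entry.
# Field names are assumptions, not the template's actual schema.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class WikiEntry:
    slug: str                       # stable page id, e.g. "orders-pipeline-partitioning"
    summary: str                    # LLM-written synthesis of the topic
    sources: list[str]              # ADRs, postmortems, Slack exports it was derived from
    confidence: float               # how well-supported the synthesis is, 0.0 to 1.0
    contradicts: list[str] = field(default_factory=list)  # slugs of entries this one conflicts with
    superseded_by: str | None = None   # lifecycle: set when a newer decision replaces this page
    last_reviewed: date | None = None  # lifecycle: when the LLM last re-checked it against sources
```

The confidence and lifecycle fields are what separate this from a pile of documents: an agent can see not just what the team decided, but how well-supported that decision is and whether something newer has replaced it.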
I've open-sourced this as a reusable template: knowledge-base-template.
When you combine MCP (live infrastructure) with the knowledge base (institutional memory), agents get the full context picture.
This isn't theoretical. Here's what each agent looks like with the knowledge base:
- Incident response — doesn't just show run histories. It knows this pipeline has a seasonal quarter-end load spike, that the last time it failed this way the root cause was an upstream schema change, and that the team's preferred remediation is to increase partition count rather than cluster size.
- Code review — doesn't just catch null checks. It flags that the proposed approach contradicts ADR-047 (the team's decision to avoid cross-service joins), cites the architecture decision, and suggests an alternative that aligns.
- Scaffolder — doesn't just follow naming conventions. It knows that Service X is being deprecated next quarter, so it routes the new pipeline through Service Y instead. That deprecation plan is in the knowledge base, not in any config file.
- Documentation — doesn't just describe what a pipeline does. It links to the business decision that created it, the incident that shaped its error handling, and the design doc that explains its unusual partition strategy.
Same agents. Same MCP. But with the knowledge base, they produce work that reflects a year of institutional context instead of a cold read of the codebase.
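Mechanically, the combination is simple. Here's a rough sketch of how an incident-response agent might assemble both layers into one briefing; mcp_client and kb are hypothetical wrappers around the MCP servers and the knowledge-base search, not a specific implementation.

```python
# Rough sketch: merging live infrastructure (MCP) with institutional memory
# (knowledge base) before calling the model. Both client objects are assumed.
def build_incident_briefing(mcp_client, kb, dag_id: str, error_summary: str) -> str:
    # Eyes: what the system looks like right now, via the MCP servers.
    dag = mcp_client.call("get_dag_definition", dag_id=dag_id)
    runs = mcp_client.call("get_recent_runs", dag_id=dag_id, limit=10)

    # Memory: what the organization already knows, via the wiki.
    incidents = kb.search(f"past incidents and known failure modes for {dag_id}")
    decisions = kb.search(f"architecture decisions and preferred remediations for {dag_id}")

    return "\n\n".join([
        f"Failure: {error_summary}",
        f"DAG definition and schedule: {dag}",
        f"Last 10 runs: {runs}",
        f"Relevant institutional knowledge:\n{incidents}\n{decisions}",
    ])
```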
If I were starting today: knowledge base first
This is the key insight, and it inverts the order I followed: start with the knowledge base, then build agents on top of it.
Here's the sequence I'd follow on any new engagement:
- Week 1-2: Build the knowledge base. Ingest the documents that hold institutional knowledge — ADRs, postmortems, design docs, onboarding guides. Use the template to set up the three-layer architecture (raw sources, LLM-maintained wiki, schema). Three good ADRs in the wiki will improve agent output more than fifty raw code files. (A rough sketch of the ingestion step follows this list.)
- Week 2-3: Set up MCP. Connect agents to live infrastructure — schemas, orchestrator metadata, run histories. This is the "eyes" layer.
- Week 3-4: Build the first agent (incident response). It has the highest impact-to-effort ratio and the fastest team buy-in. With the knowledge base already in place, it's immediately useful — not just assembling context, but understanding it.
- Week 4+: Add agents incrementally. Code review, scaffolding, documentation. Each one benefits from the knowledge base that's been compounding since week 1.
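For the week 1-2 step, the ingestion itself can be as plain as a loop over your document folders. This is a sketch assuming a hypothetical wiki.ingest() entry point and directory names; the template's actual API may differ.

```python
# Week 1-2 sketch: feed institutional-knowledge documents to the LLM-maintained
# wiki. The directory names and wiki.ingest() call are illustrative assumptions.
from pathlib import Path

SOURCE_DIRS = ["docs/adr", "docs/postmortems", "docs/design", "docs/onboarding"]

def build_initial_knowledge_base(wiki) -> None:
    for directory in SOURCE_DIRS:
        for doc in sorted(Path(directory).glob("*.md")):
            # Each source gets synthesized into cross-referenced wiki pages,
            # with confidence updated and contradictions flagged along the way.
            wiki.ingest(source_path=doc, source_kind=Path(directory).name)
```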
The key difference: when I started in 2024, I built agents first and wished I'd had a knowledge base. If you start with the knowledge base, every agent you build later is immediately better — because the institutional memory is already there, compounding from day one.
The principles that held up
Across two years and five agents, three principles consistently separated what shipped from what got abandoned:
- Ground agents in specific context. Generic prompts produce generic advice; specificity changed everything.
- Integrate into existing workflow steps. The agents that stuck were wired into scaffolding, review, incident response, and documentation, not bolted on as new tools to learn.
- Build memory that compounds. Institutional knowledge captured once makes every later agent interaction smarter.
The bottom line
AI agents in data engineering are production-ready. But their value is determined entirely by the context infrastructure underneath them.
MCP gives agents eyes into your live infrastructure — the what. A knowledge base gives them memory of your organization — the why. Together, they're the full context layer that turns generic AI into your team's AI.
If I were starting a new AI integration engagement today, I'd start with the knowledge base. Not the agents. Not the MCP servers. The knowledge base — because it compounds from day one, and every agent you build on top of it is immediately better for it.
The teams that will win with AI agents aren't the ones with the best models. They're the ones that build the memory infrastructure to make those models genuinely understand their system.
I've written about the knowledge base strategy in depth: The Knowledge Base Strategy: Giving AI Agents a Memory That Compounds. And I've open-sourced the template: knowledge-base-template.
Want to build this for your data team?
I help teams design and implement the full context layer — knowledge base, MCP integration, and agents — so AI actually understands your system.