Agentic AI is now everywhere, and virtually every company wants to implement and use it. The rise of the digital colleague is slowly becoming a reality. Generative AI taught the world that for these systems to work they need access to good, curated, audited data with proper lineage. So the companies that dedicated the last decade to “Big Data” built data pipelines: the intricate processes that ingest, cleanse, transform, and serve massive streams of information. But there is a critical ingredient that has been flying under the radar for agents: skills. Just as data fuels decision-making, skills are the engine that drives intelligent action. In our era of autonomous agents, a parallel “Skills Pipeline” is essential. It is not enough to have data; our agents need a well-organized, continuously updated repository of expertise, a marketplace of skills, to truly execute complex tasks.
Why We Need a Skills Pipeline
Data is now Agentic Data
We’ve seen how agentic processes now live in every part of the data flow: from ingestion, where AI agents help gather and pre-process raw inputs, to transformation, where they refine and reassemble information. Traditional ETL processes rely on extensive rules that are difficult to maintain and support, while agents today can make decisions in low-risk environments and build those pipelines on the fly.
But at the end of the pipeline, the serving phase has traditionally been dominated by machine learning, analytics, and even reverse ETL. The fourth leg will be agentic data. The way agents consume data is different from the way traditional machine learning models consume it, and arguably it is still under development, whether through streaming APIs, discoverability, or new standards such as the ones being proposed now (like MCP). And the bigger leg of our future isn’t just serving data; it’s serving skills. Without access to a robust skillset, even the best data-driven insights will fall flat.
Skills as the New Currency
Think of skills as the intellectual capital that agentic AI relies on and the ability to use that capital. They can come from different vendors, be developed internally, or sourced from global expertise—mirroring the way data flows freely from various origins.
Your company can own terabytes of data yet lack the capacity to use them properly or beneficially, in the same way that an investor could have access to huge amounts of capital but make terrible investment decisions. Imagine taking a random sample of humans from different geographies, experience levels, and job profiles, and showing them a comprehensive financial statement and risk report for a given company. Most of those individuals will be able to read the statement and make sense of its general topic. Most people will understand basic concepts such as revenue or profit. But only a handful of them, with specific training and experience in finance or accounting, will be able to extract real insights from that report. Agents have the same problem. We develop agentic systems with “sensors”, or access to the real world: either with cameras and microphones, or with access to a vast number of datasets, including the whole internet. We give them the ability to read extremely fast, understand structured data such as tables, or even write quick programs to perform calculations. But the ability to truly understand the meaning behind the data needs to be taught.
Let me try to explain what I mean by this. Here is a bunch of text:
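For reference, one standard way to write the Schrödinger–Pauli Hamiltonian discussed below, for a charged spin-1/2 particle in an electromagnetic field (here $\vec{\sigma}$ are the Pauli matrices, $\vec{p}$ the momentum operator, $\vec{A}$ the vector potential, and $\phi$ the scalar potential), is:

```latex
\hat{H} = \frac{1}{2m}\left[\vec{\sigma}\cdot\left(\vec{p} - q\vec{A}\right)\right]^{2} + q\,\phi
```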

I am sure most of you will be able to read it. For the most part it uses well-recognized characters (English or Greek) and relatively basic arithmetic. Those who did math in high school will probably recognize vectors in the shape of a little arrow on top of letters, and so on. You can read the characters. But it is likely that only a few of you (physicists, mainly) will truly understand what it is: the full formulation of the Schrödinger-Pauli Hamiltonian. You have the data; the bits are there. Now, what it is and what you use it for… is a completely different story.
Here is another example, a bit less radical:

Most of you will identify this as a profit and loss statement, and the terms are relatively self-explanatory: revenue, cost and expenses, net loss, dividends, and so on. However, only some of you may identify real insights in it. What type of accounting do they use? Is there any explanation for the increase in G&A operating costs, detached from revenue growth? Where does the gain from liabilities extinguished come from? Everyone can read this and get a general sense of what is going on (the company is making or losing money). But only those trained in the art of finance and accounting can truly extract insights from these documents.
These are the types of skills agents need. They may be intellectual property of a given company (internal models on how to ‘understand’ information for example), same as the data itself. They may be public knowledge, or they may be sold in some kind of external or internal marketplace, where organizations can buy, sell, and exchange specialized expertise just as they do with data. However, this marketplace won’t work on its own; it must have a systematic process to ingest, normalize, transform, and serve these skills to our intelligent agents.
In short, agentic AI systems need both high-quality data and robust skillsets. Data tells agents what is happening, while skills provide the know-how to act intelligently. Without a parallel skills pipeline, agents will be forced to work with subpar or fragmented instructions, severely limiting their performance. Companies must therefore invest in building pipelines that treat skills with the same rigor as data, ensuring both are standardized and readily accessible.
Architecting the Skills Pipeline
Imagine a “big data pipeline” equivalent, but for agentic skills. Data engineering pipelines are typically divided into ingestion, transformation, and serving. Data usually comes from sources, whether internal systems such as an ERP or CRM, or external ones such as APIs or IoT devices. It needs to be transformed and normalized using ETL or ELT systems, and finally distributed and served to data analytics or machine learning systems. The “Big Data” industry has perfected this pipeline with many different options, allowing the world to serve huge amounts of distributed information, from big batches of data to streaming.
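As a toy illustration, the three phases can be sketched in a few lines of Python; the source and record shapes are invented for the example, not a real framework:

```python
# Minimal sketch of the classic three-phase data pipeline:
# ingest -> transform -> serve. All names and shapes are illustrative.

def ingest(sources):
    """Pull raw records from heterogeneous sources (ERP, CRM, APIs, IoT)."""
    return [record for source in sources for record in source]

def transform(records):
    """Normalize raw records into a common schema (the T in ETL/ELT)."""
    return [{"id": r["id"], "value": float(r["value"])} for r in records]

def serve(records):
    """Index records for downstream consumers: analytics, ML, and agents."""
    return {r["id"]: r["value"] for r in records}

erp = [{"id": "a", "value": "1.5"}]   # toy internal system
api = [{"id": "b", "value": "2.0"}]   # toy external feed
warehouse = serve(transform(ingest([erp, api])))
print(warehouse["b"])  # 2.0
```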

Agents are already providing support in each phase of the pipeline (for example with data normalization or web scraping), but they are also becoming one of the biggest consumers at the distribution end of the flow. While in the past analytics and dashboards and machine learning were its biggest consumers, now agents will be one of the main drivers of data consumption.

One of the biggest challenges companies face in leveraging LLMs is that they don’t have a comprehensive data strategy with the right governance and processes (in many cases they won’t even have a data warehouse, data lake, or any other system in place). So they can’t feed their systems properly and securely with company information. This is the first challenge companies need to address to move from a proof of concept to production-grade AI deployments: go back to the basics and do their homework.
Any company agent or digital colleague, whether developed in house, purchased, or licensed, will need robust data access to company systems with the right governance. In the same way that a manager or C-level executive has access to BI reports and dashboards, an agent will have access to certain systems. For example, a customer support agent should have full access to all the information about clients, previous interactions, product manuals and releases, company policies, and so on, and be able to log its output back into those systems, as in a “reverse ETL” solution.
A new data engineering pipeline for AI agent integration must distinguish between how agents access data, how they acquire skills, and how they communicate. For data access, agents require a standardized ingestion framework that connects to diverse data sources, including structured databases, APIs, proprietary tools, and real-time event streams. Interoperability frameworks like Anthropic’s Model Context Protocol (MCP) or LangChain’s Agent Protocol are first steps toward seamless integration across sources, enabling agents to query and retrieve the most relevant information dynamically. Additionally, standardized schemas and knowledge graphs allow agents to contextualize raw data into meaningful insights, ensuring consistency across multiple sources.
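To make the idea of a standardized ingestion framework concrete, here is a minimal sketch of a uniform adapter layer in Python. The `DataSource` interface and the toy sources are illustrative assumptions, not the actual MCP or Agent Protocol APIs:

```python
from abc import ABC, abstractmethod

class DataSource(ABC):
    """Hypothetical uniform contract: every source answers the same query call."""
    @abstractmethod
    def query(self, request: str) -> list[dict]: ...

class SqlSource(DataSource):
    def __init__(self, rows: list[dict]):
        self.rows = rows
    def query(self, request: str) -> list[dict]:
        # Toy stand-in for a SQL lookup: substring match on a topic column.
        return [r for r in self.rows if request.lower() in r["topic"]]

class ApiSource(DataSource):
    def __init__(self, docs: list[dict]):
        self.docs = docs
    def query(self, request: str) -> list[dict]:
        return [d for d in self.docs if request.lower() in d["topic"]]

def agent_retrieve(request: str, sources: list[DataSource]) -> list[dict]:
    """Fan the request out across all registered sources and merge the hits."""
    return [hit for s in sources for hit in s.query(request)]

sources = [
    SqlSource([{"topic": "invoices", "value": 42}]),
    ApiSource([{"topic": "invoices q3", "value": 7}]),
]
print(len(agent_retrieve("invoices", sources)))  # 2
```

The point of the abstraction is that the agent code (`agent_retrieve`) never changes when a new source type is plugged in.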

Skill acquisition, on the other hand, requires a distinct pipeline that mirrors data engineering best practices. Agents must ingest skills from multiple sources, including internal documentation, vendor APIs, training modules, and expert contributions. These raw inputs are transformed using semantic parsing and tagging techniques, ensuring alignment with a unified skills taxonomy. By normalizing skill descriptions, such as equating “negotiation” with “conflict resolution”, AI systems can ensure interoperability across different providers. This structured repository acts as a skills data lake, providing a scalable and queryable interface where agents can dynamically fetch, compare, or even update their abilities in real time via API gateways. Skills will also be licensed and will need to be metered and evaluated. In the same way we sell data feeds today, we will be selling learnable skills.
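A minimal sketch of that normalization step, with an invented alias map and registry (the taxonomy, record fields, and method names are assumptions for illustration):

```python
# Toy skill normalization against a unified taxonomy.
# The alias map and skill records are illustrative, not a real standard.

SKILL_ALIASES = {
    "conflict resolution": "negotiation",  # equate synonyms to one canonical id
    "bargaining": "negotiation",
    "p&l analysis": "financial-statement-analysis",
}

def normalize_skill(name: str) -> str:
    """Map a raw skill label to its canonical taxonomy id."""
    key = name.strip().lower()
    return SKILL_ALIASES.get(key, key)

class SkillRegistry:
    """A tiny 'skills data lake': ingest raw records, serve by canonical id."""
    def __init__(self):
        self.skills = {}
    def ingest(self, record: dict) -> None:
        canonical = normalize_skill(record["name"])
        self.skills.setdefault(canonical, []).append(record)
    def fetch(self, name: str) -> list[dict]:
        return self.skills.get(normalize_skill(name), [])

registry = SkillRegistry()
registry.ingest({"name": "Conflict Resolution", "vendor": "internal"})
registry.ingest({"name": "bargaining", "vendor": "marketplace"})
print(len(registry.fetch("negotiation")))  # 2
```

Metering and licensing would hang off the same registry: every `fetch` is a natural place to count usage per vendor.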
Once skills and data are integrated, agent-to-agent communication becomes the key to operationalizing these capabilities. AI agents need a standardized agent communication language or emerging open protocols to interact efficiently. This includes requesting expertise from other agents, sharing context-aware information, and collaborating in multi-step workflows. New frameworks like P3AI are appearing to bring agent-to-agent interoperability, enabling agents to transfer skills, distribute tasks, and leverage specialized capabilities across distributed AI ecosystems. Such architectures prevent siloed models, ensuring AI systems can work as modular, adaptable components in a broader computational network.
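As a rough sketch of what such an exchange could look like (the message fields and router are invented for illustration, not taken from P3AI or any other protocol):

```python
from dataclasses import dataclass, field

# Illustrative message envelope for agent-to-agent communication.
# Field names are assumptions, not a real protocol specification.

@dataclass
class AgentMessage:
    sender: str
    recipient: str
    intent: str                       # e.g. "request_skill", "share_context"
    payload: dict = field(default_factory=dict)

class Router:
    """Deliver messages to registered agents by name."""
    def __init__(self):
        self.agents = {}
    def register(self, name, handler):
        self.agents[name] = handler
    def send(self, msg: AgentMessage):
        return self.agents[msg.recipient](msg)

def finance_agent(msg: AgentMessage) -> dict:
    # A specialist agent that grants access to a skill it owns.
    if msg.intent == "request_skill":
        return {"skill": msg.payload["skill"], "granted": True}
    return {"error": "unknown intent"}

router = Router()
router.register("finance", finance_agent)
reply = router.send(AgentMessage("support", "finance", "request_skill",
                                 {"skill": "financial-statement-analysis"}))
print(reply["granted"])  # True
```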
Finally, the integration of these layers into agentic workflows ensures that AI agents can execute tasks dynamically. When an agent receives an objective—such as optimizing a supply chain or generating a compliance report—it queries both data and skill pipelines to determine the optimal execution path. By using feedback loops and performance-based learning, the system continuously refines both skill acquisition and data interpretation, ensuring the agent adapts to new business challenges. Ultimately, this approach establishes a scalable, evolving AI ecosystem where agents seamlessly ingest data, acquire new skills, and communicate effectively to drive autonomous decision-making.
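One simple way to implement such a feedback loop is an epsilon-greedy selector over candidate execution paths, reinforcing whichever path performs best; everything here (path names, reward signal) is an illustrative assumption:

```python
import random

random.seed(0)  # reproducible run for the example

class PathSelector:
    """Epsilon-greedy choice over candidate execution paths."""
    def __init__(self, paths, epsilon=0.1):
        self.scores = {p: 0.0 for p in paths}  # running average reward per path
        self.counts = {p: 0 for p in paths}
        self.epsilon = epsilon

    def choose(self) -> str:
        if random.random() < self.epsilon:            # explore occasionally
            return random.choice(list(self.scores))
        return max(self.scores, key=self.scores.get)  # otherwise exploit

    def feedback(self, path: str, reward: float) -> None:
        # Incremental running-average update after each execution.
        self.counts[path] += 1
        self.scores[path] += (reward - self.scores[path]) / self.counts[path]

selector = PathSelector(["query-warehouse", "call-vendor-api"])
for _ in range(50):
    path = selector.choose()
    reward = 1.0 if path == "query-warehouse" else 0.3  # simulated outcome
    selector.feedback(path, reward)

print(max(selector.scores, key=selector.scores.get))  # query-warehouse
```

The same shape works whether the "reward" is a human rating, a downstream test passing, or a business KPI.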
Community Efforts and Early Days of Interoperability
The good news is that the community is already working on interoperability standards for agentic systems. Several initiatives are emerging to tackle the challenge:
1. LangChain with Agent Protocol
LangChain is a prominent framework for building LLM applications, with its Agent Protocol announced in November 2024 as a common interface for agent communication. It includes APIs for task execution, threads for multi-turn interactions, and storage for long-term memory, aiming to standardize interactions across frameworks.
- Pros:
- Offers a standardized Agent Protocol, making it highly interoperable with other agent systems.
- Framework-agnostic, allowing developers using LangGraph, AutoGen, or custom frameworks to implement it.
- Provides integration guides for connecting with other frameworks like AutoGen and CrewAI, enhancing ecosystem compatibility.
- Includes LangGraph Studio for local development and debugging, facilitating practical implementation and an interface for pipeline creation.
- Cons:
- The Agent Protocol is relatively new, potentially leading to complexity for users unfamiliar with its concepts.
- Adoption is still evolving, which may limit immediate community support and mature integrations.
2. AutoGen
AutoGen, developed by Microsoft, is an open-source framework for building multi-agent conversational systems, updated in January 2025 to include cross-language interoperability. It supports Python and .NET, with plans for additional languages, and focuses on agent collaboration and observability.
- Pros:
- Cross-language interoperability is a key feature, enabling agents built in different programming languages to communicate, aligning with MCP’s goal of system integration.
- Offers a flexible architecture for multi-agent systems, with tools for observability and control, enhancing developer experience.
- Supports diverse conversation patterns, making it suitable for complex workflows.
- Cons:
- May be perceived as more aligned with Microsoft’s ecosystem, potentially limiting its appeal for users outside this environment.
- Setup and configuration can be complex for beginners, especially for cross-language implementations.
3. CrewAI
CrewAI is an open-source Python framework for orchestrating role-playing, autonomous AI agents, emphasizing team collaboration. It allows defining agents with specific roles, goals, and tools, and supports task delegation.
- Pros:
- Role-based architecture simplifies defining agent responsibilities, making it intuitive for building multi-agent teams.
- Supports task delegation between agents, enhancing internal collaboration efficiency.
- Integrates easily with external tools as well as LangChain tools and knowledge sources, providing some level of external system interaction.
- An enterprise version provides drag & drop / no-code capabilities.
- Cons:
- Lacks explicit interoperability protocols, focusing more on internal orchestration than broad system integration.
- May not match MCP’s level of standardization for connecting to external data sources.
4. Swarm
Swarm, an experimental framework by OpenAI, is a lightweight multi-agent orchestration tool launched in October 2024. It focuses on simplicity, with features like dynamic task handoffs and integration with other tools.
- Pros:
- Lightweight and easy to use, making it accessible for developers experimenting with multi-agent systems.
- Interoperable with other tools and frameworks, such as LangChain and Anthropic, enhancing ecosystem compatibility.
- Supports dynamic task handoffs, enabling efficient workflow management.
- Cons:
- Experimental status limits its production readiness, with potential for instability.
- Limited documentation and community support compared to established frameworks like LangChain.
5. AWS Multi-Agent Orchestrator
Introduced by AWS, this framework is designed for managing multiple AI agents and handling complex conversations, with dual language support (Python and TypeScript) and extensible architecture.
- Pros:
- Supports multiple programming languages, enhancing flexibility for developers.
- Extensible architecture allows integrating new agents or customizing existing ones, supporting interoperability.
- Universal deployment options, including AWS Lambda, local environments, and other cloud platforms, make it versatile.
- Cons:
- May be tied to AWS services, potentially limiting its appeal for users outside the AWS ecosystem.
- Relatively new, with less community adoption compared to LangChain or AutoGen as of March 2025.
6. Model Context Protocol (MCP) by Anthropic
MCP, announced by Anthropic in November 2024, is an open standard for connecting AI assistants to external systems, such as content repositories and business tools, to enhance context-awareness.
- Pros:
- Open standard promotes widespread adoption, encouraging community and industry buy-in.
- Specifically designed for connecting AI to diverse data sources, improving response quality.
- Backed by Anthropic, a leader in AI research, enhancing credibility.
- Cons:
- Relatively new, which may result in limited initial adoption and implementation by other frameworks.
- Requires other systems to adopt the protocol for full interoperability, potentially slowing progress.
Comparative Analysis
| Framework | Focus Area | Interoperability Level | Deployment Flexibility | Maturity Level |
| --- | --- | --- | --- | --- |
| LangChain with Agent Protocol | Standardization, Communication | High (Protocol-based) | Moderate (Studio, Platform) | Evolving (New Protocol) |
| AutoGen | Multi-Agent, Cross-Language | High (Cross-Language) | Moderate (Local, Cloud) | Established (Updated 2025) |
| CrewAI | Role-Based, Collaboration | Moderate (Tool Integration) | Low (Python Focus) | Established (Open Source) |
| Swarm | Lightweight, Experimental | Moderate (Tool Integration) | Low (Experimental) | Experimental (2024) |
| AWS Multi-Agent Orchestrator | Extensible, Multi-Language | Moderate (Extensible) | High (Universal) | New (2024) |
| MCP by Anthropic | Data Source Connection | High (Protocol-based) | High (Open Standard) | New (2024) |
LangChain and AutoGen offer high interoperability, aligning closely with MCP, while CrewAI and Swarm are more internally focused but still offer tool integration. AWS Multi-Agent Orchestrator and MCP emphasize flexibility and standardization, respectively.
Implications and Future Directions
The landscape of AI agent orchestration and interoperability is rapidly evolving, with frameworks like LangChain’s Agent Protocol and AutoGen’s cross-language support pushing the boundaries of what is possible. However, challenges remain, such as the experimental nature of Swarm and the newness of MCP, which may affect adoption rates. Future research should focus on standardizing protocols across frameworks to enhance interoperability, potentially leading to an “Internet of Agents” as envisioned by initiatives like AGNTCY (involving LangChain and Cisco), as well as Intel’s open-source “Open Platform for Enterprise AI”. Additionally, vertical and niche frameworks can excel at specific tasks (for example, Bolt, Lovable, and v0 for coding, and many others for design and prototyping, legal work, etc.).
For practitioners, selecting a framework depends on specific needs: LangChain and AutoGen for broad interoperability, CrewAI for role-based teams, and AWS Multi-Agent Orchestrator for AWS-centric deployments. MCP’s open standard could become a benchmark, but its success depends on community adoption. The ecosystem evolves rapidly: new entrants are expected, as well as new frameworks from the hyperscalers. There is not yet a globally accepted standard for inter-agent communication, skill usage, or data acquisition, but one is likely to come eventually.
Challenges and Future Directions
Interoperability is complex. As we build these pipelines, we must navigate:
- Diverse Data Formats: Skills come in various forms (text documents, video tutorials, certifications, and even informal knowledge-sharing platforms). Normalizing this heterogeneous input is nontrivial.
- Evolving Standards: With frameworks like MCP still in their infancy, early adopters will need to remain agile and ready to adjust as standards mature.
- Security and Privacy: Just as with data, ensuring that proprietary or sensitive skills (especially those developed internally) are securely managed is critical.
Looking ahead, the integration of a robust skills pipeline will not only empower AI agents but also democratize access to expertise, fostering a competitive internal marketplace where companies can choose the best skill sets available.
Conclusion
In the coming era of agentic AI, data pipelines are only half the story. For our intelligent agents to truly revolutionize how we work, they must be armed with both the raw data and the sophisticated skills required to process that data effectively. A parallel “Skills Pipeline” is essential—a system that ingests, normalizes, transforms, and serves skills from diverse sources, creating an internal marketplace of expertise.
While community efforts, such as the MCP initiatives and open API projects, are promising, interoperability in this space is still in its early days. The journey ahead will require continuous collaboration between developers, vendors, and industry bodies to establish robust standards. Companies need to think in two parallel lines now: a data pipeline and a skills pipeline. Both involve proprietary and third-party intellectual property, complex interconnections and interoperability, and one feeds the other.
It’s time to rethink our digital infrastructure. The future isn’t just about having more data—it’s about having the right skills, available on demand, to harness that data for intelligent, autonomous decision-making. Let’s build a world where our AI agents not only know what to do but know how to do it, powered by a dynamic, integrated skills pipeline.