This year, then, seems to be the year of multi-agent and multi-modal AI. Such systems involve dozens or perhaps hundreds of smaller, specialized models that communicate and work together, often autonomously. Basically, these agents work the way a perfect team in an organization should. Without a team leader watching all the time, every team member takes on their own role in carrying out a task. Each member also knows the right moment to hand a task over to a colleague who has more knowledge and experience for it.
Organizations are thus rapidly moving from using a few individual AI agents to using a "swarm" of AI agents. In customer service, for example, one agent specializes in summarizing telephone calls, a second searches the manual database, a third filters the right information from the CRM system, a fourth knows how to extract order and payment information from an ERP system, and a fifth does a compliance check so that sensitive data is not inadvertently leaked. A chatbot can then explain the gathered information to the customer.
In this way, multi-agent systems can take over many tasks: logistics planning, QA reporting, supply chain optimization, or the tracking and visibility of healthcare records. Agents no longer consult only text- and number-based sources, but increasingly process and generate images, audio and sometimes video simultaneously. Thus, true multi-modal agents are emerging. Market analyst IDC expects the average organization to deploy some 100 to 200 AI agents as digital colleagues within its operations by 2028.
However, there are still many open questions surrounding the reliability and security of such multi-agent, multi-modal systems. The danger of "AI proliferation" is also looming: more and more AI agents become active in the organization, leading to a lack of oversight and governance. Therein lies a task for the enterprise architect: merging the agent systems into a manageable platform. Then the organization itself has actually created a digital "sheep with five legs."