MetaGPT: Navigating the Future of Multi-Agent Large Language Models

Dive deep into MetaGPT, a multi-agent framework built on GPT models for collaborative code generation and problem-solving with Large Language Models.

MetaGPT: Multi-Agent Large Language Models


In every narrative of triumph, be it in the cerebral realm of code or the bustling synergy of a workplace, the protagonist is never a lone warrior but a harmonious collective. The adage, ‘Teamwork makes the dream work,’ holds true not just in human endeavours but also in the digital synapses of modern computational marvels. The realm of Large Language Models (LLMs) is no stranger to this camaraderie. The concept of cross-agent interaction isn't a new sorcery but a tested elixir in the computational saga. The chronicles of reinforcement learning and the ballet of swarm drones have long showcased how collaboration can morph into something magical, something transcendent. It's a realm where many digital entities come together to conjure solutions that are far superior to the solitary attempts of a single agent.

Embarking on a quest to create a digital rendition of the classic Tic Tac Toe game using MetaGPT, I set out to explore the zenith of collaborative magic that could be achieved. This post will dive into MetaGPT, a meta-programming framework built on OpenAI's GPT family of language models. Together, we'll explore how this system blends code, thought, and collaboration, allowing for a journey beyond mere game creation into a narrative of sophisticated solution crafting and a touch of self-reflection on the nuanced dance between code perfection and collaborative intelligence.

Key Takeaways for Business Leaders:
- Swift Execution: MetaGPT significantly accelerates the journey from idea conception to execution, transforming weeks of work into mere hours.
- Strategic Empowerment: Provides a robust mechanism for quickly producing user stories, competitive analysis, technical communication, and strategic plans, directly enhancing operational efficiency and product quality.
- Innovative Edge: Serves as a dynamic resource for navigating the tech landscape and driving innovation within your team.

Beyond Solo Acts: A Gaze into Contemporary Multi-Agent Approaches

In the endeavour to craft a comprehensive understanding of Multi-Agent Large Language Models (LLMs), it's imperative to delve into the essence of various contemporary approaches that have surfaced in the AI domain. Here's a brief juxtaposition of some notable multi-agent LLM frameworks, each with its unique strengths and limitations, transitioning into the focal point of our discussion - MetaGPT.

AutoGPT: The Solo Virtuoso

  • Strengths: Known for its ability to autonomously generate task-specific models, offering a semblance of logical reasoning in its operations.
  • Limitations: Despite its prowess in task reasoning, AutoGPT lacks a mechanism to validate task completion, often rendering a dissonance between its to-do list and the actual operational output.

AgentVerse: The Harmony Experts

  • Strengths: AgentVerse steps into the sphere with a promise of dynamic interaction between multiple agents, fostering a collaborative environment.
  • Limitations: The granularity of interactions and the extent to which these collaborations mirror real-world multi-agent dynamics remain areas seeking refinement.

MetaGPT: The Starry-eyed Maestro

  • Strengths: MetaGPT, the protagonist in our narrative, emerges as a meta-programming framework that transcends the conventional boundaries, orchestrating a symphony of collaborative agents each donning a unique role. In our quest, the Tic Tac Toe game unveils the finesse of MetaGPT's Product Manager, Architect, and Engineer agents, offering a tableau of collaborative intelligence.
  • Limitations: Our expedition also uncovers scope for improvement, particularly in the realm of variable scope management within the generated code, a spectacle showcased in the Engineer agent's narrative during the Tic Tac Toe saga.

As we traverse the nuances of MetaGPT through the lens of a simple yet telling game of Tic Tac Toe, the panorama of multi-agent collaboration unfolds, narrating tales of triumphs, tribulations, and the relentless pursuit of excellence in the ever-evolving domain of AI. The ensuing sections will meticulously dissect the intricacies of MetaGPT, juxtaposed against its contemporaries, offering a rich tapestry of insights into the boundless potential and the road ahead in multi-agent LLMs.

Unveiling the MetaGPT Framework: A Symphony of Digital Maestros

In the grand theatre of computational creativity, MetaGPT takes centre stage as a distinguished meta-programming framework, elegantly poised atop GPT models. Its essence is a meticulously designed dual-layered structure, in which each layer plays a distinctive yet complementary role in harmonising the cacophony of tasks into a melodious output.

Foundational Components Layer: This layer provides the basic building blocks required for individual agent operations and system-wide information exchange. Its components are Environment, Memory, Roles, Actions, and Tools.

Collaboration Layer: The collaboration layer takes the role of the conductor, leading the ensemble of agents towards collaborative problem-solving. This layer introduces important mechanisms such as Knowledge Sharing and Encapsulating Workflows, which create a cooperative environment among the agents.
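
To make the five foundational components a little more concrete, here is a minimal sketch of how they might fit together. Every class, field, and function name below is an assumption chosen for illustration; it is not MetaGPT's actual API, and the "tool" is a plain Python function standing in for whatever a real action would call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Message:
    sender: str
    content: str


@dataclass
class Memory:
    """Stores every message a role has seen (hypothetical structure)."""
    messages: List[Message] = field(default_factory=list)

    def add(self, message: Message) -> None:
        self.messages.append(message)


@dataclass
class Action:
    """A named unit of work; `run` stands in for an LLM or tool call."""
    name: str
    run: Callable[[str], str]


@dataclass
class Role:
    name: str
    action: Action
    memory: Memory = field(default_factory=Memory)

    def act(self, incoming: Message) -> Message:
        # Remember the input, then produce an output via the role's action.
        self.memory.add(incoming)
        return Message(sender=self.name, content=self.action.run(incoming.content))


@dataclass
class Environment:
    """Shared space where roles exchange messages."""
    roles: List[Role] = field(default_factory=list)
    history: List[Message] = field(default_factory=list)

    def publish(self, message: Message) -> None:
        self.history.append(message)


def to_upper(text: str) -> str:
    """A trivial 'tool' used by the action below."""
    return text.upper()


env = Environment()
shouter = Role(name="Shouter", action=Action(name="Shout", run=to_upper))
env.roles.append(shouter)

request = Message(sender="User", content="build tic tac toe")
env.publish(request)
reply = shouter.act(request)
env.publish(reply)
print(reply.content)  # BUILD TIC TAC TOE
```

The point of the sketch is the division of labour: the Environment holds what everyone can see, Memory holds what one role has seen, and a Role is little more than a name bound to an Action.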

The heart of MetaGPT beats through its Role Classes, each a virtuoso with a unique narrative and a crucial part to play in the collaborative saga. Roles are akin to the diverse pieces on a chessboard, each endowed with unique capabilities yet orchestrated towards a common objective. To initiate this process, a user simply provides a one-line requirement. For instance, my chosen task was:

Build a game of tic tac toe in Python with graphical interface

This classic game was selected for its fundamental nature, incorporating both an AI element and the capability for an AI agent to interact with a human player.

  • Product Manager: Envision the Product Manager as the visionary composer, orchestrating the broader goals and sketching the outlines of the digital opus MetaGPT is set to craft. This agent swiftly generates a comprehensive document, detailing Original Requirements, Product Goals, User Stories, and even a competitive analysis, complete with a quadrant chart. Furthermore, it breaks down the project into smaller, prioritised sub-tasks, laying the groundwork for efficient project execution.
Figure: Competitive Quadrant Chart
  • Architect: Next, the Architect steps in, a seasoned virtuoso responsible for crafting the structural blueprint of the project, ensuring all components are aligned for seamless execution. This role delves into the implementation approach, defining necessary files, data structures, interface definitions, and the overall program flow.
Figure: Interface definitions
Figure: Program Call Flow
  • Project Manager: The Project Manager, a maestro of the organisation, ensures that each note of the project is in its rightful place, ensuring a harmonious flow towards the finale. This role identifies necessary third-party packages (such as tkinter for the tic-tac-toe game) and integrates the detailed specifications into the file structure outlined by the Architect.
  • Engineer: The Engineer, a master craftsman in code, translates the collective vision into tangible digital outcomes. This agent implements the code and compiles all components, preparing them for deployment. Below is an example of the file created by the Engineer:

The ensemble of MetaGPT is further enriched by roles such as the QA Engineer, who ensures the quality of the final product; the UI Designer, responsible for the visual appeal; and the Sales Agent, who introduces the masterpiece to the audience. Each role plays a pivotal part in the grand narrative of MetaGPT, showcasing the framework's power in harmonising diverse skills and perspectives towards crafting innovative solutions.
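
The hand-off between the four core roles described above can be pictured as an assembly line in which each role consumes the previous role's artefact. The sketch below is only an illustration under that assumption: the functions are stubs that return strings where a real agent would call a language model, and none of the names come from MetaGPT itself.

```python
from typing import Callable, Dict, List, Tuple


# Each stub returns a string where a real agent would call an LLM.
def product_manager(requirement: str) -> str:
    return f"PRD for: {requirement}"


def architect(prd: str) -> str:
    return f"Design based on [{prd}]"


def project_manager(design: str) -> str:
    return f"Task list based on [{design}]"


def engineer(tasks: str) -> str:
    return f"Code implementing [{tasks}]"


# The order of this list encodes the hand-off order between roles.
PIPELINE: List[Tuple[str, Callable[[str], str]]] = [
    ("Product Manager", product_manager),
    ("Architect", architect),
    ("Project Manager", project_manager),
    ("Engineer", engineer),
]


def run_pipeline(requirement: str) -> Dict[str, str]:
    """Pass each role's artefact to the next role and keep all of them."""
    artefacts: Dict[str, str] = {}
    current = requirement
    for role_name, step in PIPELINE:
        current = step(current)
        artefacts[role_name] = current
    return artefacts


artefacts = run_pipeline("Build a game of tic tac toe in Python with graphical interface")
for role_name, artefact in artefacts.items():
    print(f"{role_name}: {artefact}")
```

Keeping every intermediate artefact, rather than only the final code, mirrors the way the requirements document, design, and task list all remain available for inspection after a run.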

Mechanisms of MetaGPT: The Technical Conduits of Collaboration

The allure of MetaGPT lies not just in its visionary design but in the technical mechanisms that drive its operational prowess. These mechanisms form the backbone of MetaGPT, orchestrating a realm where code, collaboration, and creativity converge into a harmonious ballet of digital solutions. Let’s delve into the technical choreography that defines the MetaGPT framework:

Prompt Engineering: The Lexicon of Collaboration

At the heart of MetaGPT's operations is the art and science of Prompt Engineering. This mechanism transmutes human Standard Operating Procedures (SOPs) into comprehensible prompts for GPT, acting as the maestro that conducts the language model to resonate with the desired operational tune.

  • Encoding SOPs: The essence of human procedural knowledge is encoded into prompts that GPT can decipher. This encoding is the bridge that melds human expertise with digital execution.
  • Quality Output Generation: By translating SOPs into a language that GPT understands, MetaGPT orchestrates the generation of high-quality outputs that conform to human standards.
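
As a rough illustration of what encoding an SOP into a prompt could look like, the sketch below turns an ordered list of document sections into explicit instructions. The section names are the ones the Product Manager's document contained in this post; the function `build_prompt` and the prompt wording are assumptions made for the example, not MetaGPT's real templates.

```python
from typing import List

# Section names taken from the Product Manager's output described in this post.
PRODUCT_MANAGER_SOP: List[str] = [
    "Original Requirements",
    "Product Goals",
    "User Stories",
    "Competitive Analysis",
]


def build_prompt(role: str, sop_sections: List[str], requirement: str) -> str:
    """Encode a role, its SOP, and the user's requirement as one prompt."""
    numbered = "\n".join(
        f"{index}. {section}" for index, section in enumerate(sop_sections, start=1)
    )
    return (
        f"You are a {role}.\n"
        f"Requirement: {requirement}\n"
        "Write a document with exactly these sections, in this order:\n"
        f"{numbered}\n"
    )


prompt = build_prompt(
    "Product Manager",
    PRODUCT_MANAGER_SOP,
    "Build a game of tic tac toe in Python with graphical interface",
)
print(prompt)
```

The procedure lives in data (the list of sections) rather than in free-form text, so the same SOP can be reused for any requirement.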

Collaborative Agent Module: The Orchestra Conductor

The Collaborative Agent Module is where the solo performances of individual agents are orchestrated into a collaborative symphony. It's the conductor that ensures each agent's expertise is harmonised into a cohesive narrative.

  • Message Routing: Efficient communication is facilitated through message routing, ensuring that the dialogue among agents is smooth, relevant, and timely.
  • Agent Coordination: A meticulous coordination among agents ensures that each agent’s contribution is synchronised with the collective goal.
  • Knowledge Sharing: The mechanism fosters a culture of knowledge sharing, enriching the collective understanding and fostering a collaborative problem-solving environment.
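
Message routing of this kind is often implemented as publish and subscribe: each agent registers interest in a kind of message and is invoked only when one arrives. The sketch below assumes that design; `MessageBus`, the topic names, and the handlers are hypothetical and are not taken from MetaGPT's code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List, Tuple


@dataclass
class Message:
    topic: str    # kind of artefact, e.g. "prd" or "design"
    sender: str
    content: str


class MessageBus:
    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Message], None]]] = defaultdict(list)
        self.log: List[Message] = []  # shared history every agent could read

    def subscribe(self, topic: str, handler: Callable[[Message], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, message: Message) -> None:
        # Record the message, then deliver it only to interested agents.
        self.log.append(message)
        for handler in list(self._subscribers[message.topic]):
            handler(message)


bus = MessageBus()
received: List[Tuple[str, str]] = []


def architect_inbox(message: Message) -> None:
    received.append(("Architect", message.content))
    bus.publish(Message("design", "Architect", f"design for {message.content}"))


def engineer_inbox(message: Message) -> None:
    received.append(("Engineer", message.content))


bus.subscribe("prd", architect_inbox)
bus.subscribe("design", engineer_inbox)
bus.publish(Message("prd", "Product Manager", "tic tac toe PRD"))
print(received)
```

Here the Engineer never sees the requirements document directly; it receives only the design it subscribed to, while `bus.log` keeps the full history available as shared knowledge.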

Meta Programming: The Sculptor of Code

Meta Programming in MetaGPT is akin to a code virtuoso capable of dynamic compositions. It's not just about writing code; it's about writing code that can manipulate other programs, enabling a dynamic manipulation at runtime.

  • Runtime Code Generation and Analysis: MetaGPT can conjure, analyse, and modify code on the fly, showcasing a level of dynamism that’s pivotal for tackling complex tasks.
  • Program Manipulation: By treating programs as data, MetaGPT can manipulate them to generate new programs, analyse existing ones, and transform code to meet evolving requirements.
  • Automated Code Transformation: This mechanism facilitates automated code transformation, ensuring the code is robust and adaptable to the changing contours of the problem landscape.
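
Treating programs as data is something Python supports directly through its standard `ast` module. The sketch below is a generic illustration of analysing and transforming code at runtime, not a description of how MetaGPT does it internally; the sample function and the rename are invented for the example, and `ast.unparse` requires Python 3.9 or later.

```python
import ast

SOURCE = '''
def check_win(board, mark):
    return any(all(cell == mark for cell in row) for row in board)
'''

# Analysis: parse the source into a tree and list the functions it defines.
tree = ast.parse(SOURCE)
function_names = [
    node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)
]


class RenameFunction(ast.NodeTransformer):
    """Transformation: rename one function definition."""

    def __init__(self, old: str, new: str) -> None:
        self.old = old
        self.new = new

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        self.generic_visit(node)
        if node.name == self.old:
            node.name = self.new
        return node


new_tree = ast.fix_missing_locations(
    RenameFunction("check_win", "has_row_win").visit(tree)
)
new_source = ast.unparse(new_tree)

# Generation: compile the transformed tree and call the new function.
namespace: dict = {}
exec(compile(new_tree, "<generated>", "exec"), namespace)
result = namespace["has_row_win"]([["X", "X", "X"], ["", "O", ""], ["O", "", ""]], "X")

print(function_names)
print(new_source)
print(result)
```

The same three steps (parse, transform, compile) apply whether the source was written by a person or generated by a language model.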

These three are the gears and levers that propel MetaGPT into the realms of enhanced robustness, reduced errors, and an ability to engineer software solutions adept at navigating the complex labyrinths of modern-day computational challenges.

Action Framework within MetaGPT: Choreographing Digital Expertise

In the theatre of MetaGPT, actions are the script that guides the performance of each role on the digital stage. The Action Framework is a meticulously crafted paradigm that encapsulates the essence of role-specific interactions, ensuring a harmonious ballet of computational processes towards achieving the desired outcomes. This section delves into the core mechanisms that constitute the Action Framework, namely role-specific prompt prefixing, LLM proxy integration, and the standardised output schema definition for actions.

  • Role-specific Prompt Prefixing: MetaGPT employs a strategy of prefixing prompts with role-specific identifiers, establishing a contextual groundwork for the ensuing interaction with the LLM proxy. The set_prefix() method configures identifiers for role-specific prompts, ensuring that the ensuing interactions are tuned to the nuances of the respective roles.
  • LLM Proxy Integration: Each action within MetaGPT houses an LLM proxy, which can be invoked via the ask() method, serving as a conduit for enriching action details using input context expressed in natural language prompts. Within the Action class, role-specific context parsing functions are devised to extract and provide sufficient contextual information from inputs to the LLMs, ensuring a focused and relevant interaction.
  • Standardised Output Schema Definition: MetaGPT emphasises defining standardised output schemas for actions, providing a structural representation for extracting structured data from action output. The structured data, encapsulated as messages and published to the environment, adheres to the predefined schemas, ensuring consistency, quality, and relevance in the generated outputs.
  • Ensuring Quality and Consistency: By adhering to a standardised output schema, MetaGPT ensures that the LLM’s behaviour is steered towards generating normalised outputs that align with real-world quality standards. The Action Framework is instrumental in guiding the LLM interactions, ensuring that the generated content not only meets the predefined standards but also aligns with the role-specific objectives and constraints.
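
The three mechanisms above can be sketched in a few lines. The method names `set_prefix` and `ask` are the ones mentioned in the text, but their signatures, the `Action` class, the `TaskList` schema, and the stubbed model reply (including the file names) are all assumptions made for illustration; MetaGPT's real classes may look quite different.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TaskList:
    """Output schema: the structured fields every reply must contain."""
    packages: List[str]
    files: List[str]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; always returns the same JSON reply."""
    return json.dumps(
        {"packages": ["tkinter"], "files": ["main.py", "game.py", "player.py"]}
    )


class Action:
    def __init__(self, name: str, llm: Callable[[str], str]) -> None:
        self.name = name
        self.llm = llm
        self.prefix = ""

    def set_prefix(self, role: str, goal: str) -> None:
        # Role-specific prefix prepended to every prompt this action sends.
        self.prefix = f"You are a {role}. Your goal is to {goal}.\n"

    def ask(self, context: str) -> str:
        # Forward the prefixed prompt to the LLM proxy.
        return self.llm(self.prefix + context)

    def run(self, context: str) -> TaskList:
        raw = self.ask(context + "\nReply as JSON with keys 'packages' and 'files'.")
        data = json.loads(raw)
        missing = {"packages", "files"} - set(data)
        if missing:
            raise ValueError(f"reply is missing fields: {sorted(missing)}")
        return TaskList(packages=list(data["packages"]), files=list(data["files"]))


action = Action("WriteTasks", fake_llm)
action.set_prefix("Project Manager", "list packages and files for the project")
task_list = action.run("Design: a tkinter tic tac toe game.")
print(task_list)
```

Validating the reply against a schema before publishing it is what lets downstream roles rely on the fields being present, instead of re-parsing free text.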

The Action Framework within MetaGPT is a testament to the refined approach towards orchestrating a coherent and goal-oriented interaction among digital agents. By instilling a structured, role-specific, and standardised schema-driven interaction paradigm, MetaGPT elevates the quality and relevance of the generated outputs.

The MetaGPT Workflow: A Digital Ballet from Conception to Culmination

Embarking on a journey with MetaGPT is akin to orchestrating a digital ballet, where each step is meticulously choreographed from the user's initial whisper of a requirement to the grand finale of a polished output. Here, we unravel the step-by-step choreography that unfolds in the MetaGPT workflow:

  • The Prelude: Capturing User Requirement: Our narrative commences with the user presenting MetaGPT with a one-line requirement, the seed from which the mighty oak of solution will germinate.
  • The Act of Interpretation: Prompt Engineering: The Prompt Engineering module takes the baton, encoding the user's requirement into a prompt that the underlying GPT model can comprehend, laying the first stone on the path of solution crafting.
  • The Ensemble Performance: Collaborative Agent Module: The Collaborative Agent Module steps into the limelight, distributing the encoded prompt to the relevant agents. It's the maestro ensuring each musician in the orchestra has the sheet music ready.
  • The Solo Acts: Individual Agent Processing: Each agent, now with the prompt, embarks on its solo performance, processing the prompt and generating its own unique output, adding its melody to the growing symphony.
  • The Harmonic Convergence: Output Consolidation: Once again, the Collaborative Agent Module takes centre stage, collecting the outputs from the agents. It's the moment where individual melodies start to harmonise into a collective tune. The module then meticulously weaves the individual outputs into a final output, a cohesive solution ready to be presented to the user.
  • The Grand Finale: Delivering the Output: The final act unveils as the consolidated output is gracefully handed over to the user, marking the culmination of a meticulously orchestrated workflow.
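
Read as a data flow, the six steps above amount to encode, distribute, process, and consolidate. The sketch below follows that reading with stubbed agents; the function names and output format are assumptions for illustration only, and a real run would replace each lambda with a model call.

```python
from typing import Callable, Dict


def encode_prompt(requirement: str) -> str:
    """Step 2: turn the one-line requirement into a prompt."""
    return f"Requirement: {requirement}"


# Stub agents: each maps the shared prompt to its own output.
AGENTS: Dict[str, Callable[[str], str]] = {
    "Product Manager": lambda prompt: f"user stories for <{prompt}>",
    "Architect": lambda prompt: f"file list and interfaces for <{prompt}>",
    "Engineer": lambda prompt: f"source code for <{prompt}>",
}


def run_workflow(requirement: str) -> str:
    prompt = encode_prompt(requirement)
    # Steps 3 and 4: distribute the prompt and let each agent process it.
    outputs = {name: agent(prompt) for name, agent in AGENTS.items()}
    # Step 5: consolidate the individual outputs into one result.
    return "\n".join(f"[{name}] {text}" for name, text in outputs.items())


# Steps 1 and 6: take the user's requirement and deliver the final output.
final_output = run_workflow("Build a game of tic tac toe in Python with graphical interface")
print(final_output)
```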

Through this workflow, MetaGPT unveils a narrative of how modern computational frameworks can translate a whisper of a requirement into a roar of a solution, each step a stride towards a future where collaborative intelligence defines the realm of possibilities.

Evaluation and Performance: The Litmus Test of MetaGPT

MetaGPT was put through the crucible of evaluation using two revered open-source benchmarks - HumanEval and MBPP. HumanEval includes 164 handwritten programming tasks, each furnished with a function specification, description, reference code, and multiple unit tests. MBPP, on the other hand, comprises 427 Python programming tasks, each carrying a description, reference code, and a battery of automated test cases. MetaGPT was compared against code-generation approaches such as Codex, CodeT, and the behemoth GPT-4. MetaGPT didn't just hold its ground but soared, setting a new state of the art and significantly outperforming the other methods on both benchmarks.
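
Benchmarks of this kind score a system by running each generated solution against the task's unit tests. The sketch below shows that idea on an invented four-task miniature benchmark; the tasks, candidate solutions, and the helper `passes_all` are hypothetical and do not come from HumanEval or MBPP. With one attempt per task, the resulting fraction corresponds to what is commonly reported as Pass@1.

```python
from typing import Any, Callable, List, Tuple

TestCase = Tuple[tuple, Any]  # (positional arguments, expected result)


def passes_all(candidate: Callable[..., Any], tests: List[TestCase]) -> bool:
    """A candidate solves a task only if every unit test passes."""
    try:
        return all(candidate(*args) == expected for args, expected in tests)
    except Exception:
        # A crash counts as a failure, just like a wrong answer.
        return False


# Invented tasks: (candidate solution, unit tests).
tasks: List[Tuple[Callable[..., Any], List[TestCase]]] = [
    (lambda a, b: a + b, [((1, 2), 3), ((0, 0), 0)]),   # correct
    (lambda xs: sorted(xs), [(([3, 1, 2],), [1, 2, 3])]),  # correct
    (lambda n: n * 2, [((3,), 9)]),                       # wrong: should square
    (lambda s: s[1], [(("",), "")]),                      # raises IndexError
]

solved = sum(passes_all(candidate, tests) for candidate, tests in tasks)
pass_rate = solved / len(tasks)
print(f"{solved}/{len(tasks)} tasks solved, pass rate {pass_rate:.2f}")
```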

*Figure source: Figure 7, MetaGPT paper

Not only this, MetaGPT exhibited a robust performance across a diverse set of tasks, achieving successful execution in all tasks, which was a feat that other frameworks like AutoGPT, LangChain, and AgentVerse found elusive.

Returning to the Tic Tac Toe example, while MetaGPT generated a comprehensive code, I encountered a hiccup during execution:

Traceback (most recent call last):
  File "/Users/am/Documents/metagpt-tic_tac_toe2/tic_tac_toe/", line 1, in <module>
    from player import Player
  File "/Users/am/Documents/metagpt-tic_tac_toe2/tic_tac_toe/", line 4, in <module>
    from game import Game
  File "/Users/am/Documents/metagpt-tic_tac_toe2/tic_tac_toe/", line 2, in <module>
    from player import Player
ImportError: cannot import name 'Player' from partially initialized module 'player' (most likely due to a circular import) (/Users/am/Documents/metagpt-tic_tac_toe2/tic_tac_toe/

This cyclic import issue likely stemmed from the context limit or what is referred to as the hallucinatory tendency in the MetaGPT paper. Despite this initial setback, it's important to highlight that with minimal (human-expert) adjustments, the code was swiftly corrected and ran seamlessly. This instance underscores MetaGPT's user-friendly design, ensuring that solutions are readily attainable even when faced with challenges.
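
For readers who have not met this error before, the sketch below reproduces the same kind of cycle in a temporary directory and then shows one common way to break it: making the dependency one-directional so that only `game` imports `player`. The module contents are invented for the demonstration; they are not the code MetaGPT generated, and this is not necessarily the adjustment that was applied to it.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import Dict

# Two modules that import each other at load time (invented contents).
BROKEN: Dict[str, str] = {
    "player.py": "from game import Game\n\nclass Player:\n    pass\n",
    "game.py": "from player import Player\n\nclass Game:\n    pass\n",
}

# One-directional dependency: game imports player, player imports nothing.
FIXED: Dict[str, str] = {
    "player.py": (
        "class Player:\n"
        "    def __init__(self, mark):\n"
        "        self.mark = mark\n"
    ),
    "game.py": (
        "from player import Player\n\n"
        "class Game:\n"
        "    def __init__(self):\n"
        "        self.players = [Player('X'), Player('O')]\n"
    ),
}


def try_import(files: Dict[str, str]) -> str:
    """Write the modules to a temp dir, import them, and report the outcome."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, source in files.items():
            Path(tmp, name).write_text(source)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            importlib.import_module("player")
            importlib.import_module("game")
            return "ok"
        except ImportError as exc:
            return f"ImportError: {exc}"
        finally:
            # Clean up so the two runs do not interfere with each other.
            sys.path.remove(tmp)
            for name in ("player", "game"):
                sys.modules.pop(name, None)


broken_result = try_import(BROKEN)
fixed_result = try_import(FIXED)
print(broken_result)
print(fixed_result)
```

In the broken layout, `player` starts loading, triggers `game`, and `game` then asks for a `Player` class that does not exist yet, which is exactly the "partially initialized module" situation in the traceback above.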


In this extensive exploration of MetaGPT, we've delved deep into the intricacies of its framework, uncovering the symphony of collaboration and innovation that it brings to the table. From the initial whispers of a user's requirement to the final polished output, MetaGPT has proven itself as a formidable force in the realm of Large Language Models, orchestrating a seamless blend of individual expertise and collective intelligence.

Despite facing challenges such as the circular import issue in our Tic Tac Toe example, MetaGPT demonstrated resilience and adaptability, showcasing that with a few tweaks, it could overcome hurdles and deliver a robust solution. This journey has not only been a testament to MetaGPT's technical prowess but also a narrative of the relentless pursuit of excellence in the ever-evolving domain of AI.

As we stand on the precipice of this digital renaissance, MetaGPT emerges as a beacon of collaborative intelligence, guiding us towards a future where the boundaries of what's possible are continually pushed, and the harmony between code and collaboration paints a canvas of endless possibilities. The dance between man and machine has never been more synchronised, and the future, undoubtedly, is bright.


Amita Kapoor: Author, Research & Code Development
Narotam Singh: Copy Editor, Design & Digital Management
