01 / Scientific foundation
Built on peer-reviewed research, not vibes.
MAD Studio operationalizes leading academic frameworks in multi-agent debate and related agent-collaboration methods, turning them from research notebooks into a production-grade workbench. Every protocol is traceable to a published methodology.
Paper 01 · MIT / Google Brain · ICML 2024
Improving Factuality and Reasoning in Language Models through Multiagent Debate
Du, Li, Torralba, Tenenbaum, Mordatch
Foundational result: agents critiquing each other across rounds converge on more factual, better-reasoned answers.
Paper 02 · Tencent AI Lab · EMNLP 2024
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
Liang, He, Jiao, Wang, et al.
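Identifies the Degeneration-of-Thought problem, where self-reflection stops producing new ideas once a model is confident in its answer. Shows that a judge-moderated, tit-for-tat debate between agents restores divergent thinking on hard reasoning tasks.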
Paper 03 · ACL 2025
M-MAD: Multidimensional Multi-Agent Debate for Translation Evaluation
Feng, Zhao, Lyu, Li, Tu, Wang
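Splits evaluation into distinct quality dimensions, runs a multi-agent debate within each one, and synthesizes the results into a final judgment. MAD Studio's per-dimension arbiter sweep, which powers its truth-seeking verdict scoring, is built on this design.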
Paper 04 · ICLR 2024
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Chan, Chen, Su, Yu, Xue, Zhang, Fu, Liu
Shows that a panel of LLM agents debating the quality of generated text agrees more closely with human judgments than a single-agent evaluator. Giving panelists diverse roles is key to the gain.
Paper 05 · UCL / Anthropic · ICML 2024 (Best Paper)
Debating with More Persuasive LLMs Leads to More Truthful Answers
Khan, Hughes, Valentine, Ruis, Sachan, Radhakrishnan, Grefenstette, Bowman, Rocktäschel, Perez
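Empirical evidence that debate helps weaker judges, both models and humans, identify the correct answer when stronger models argue opposing sides. Optimizing the debaters for persuasiveness raises judge accuracy further. This makes the paper a modern, capable-model test of the original debate-as-alignment thesis.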
Paper 06 · Microsoft · COLM 2024
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Wu, Bansal, Zhang, Wu, Li, Zhu, et al.
A framework for building LLM applications from role-specialized agents that cooperate through structured conversation. Its case studies in math, coding, question answering, and decision-making show gains over the baselines evaluated.
Paper 07 · Northeastern · NeurIPS 2023
Reflexion: Language Agents with Verbal Reinforcement Learning
Shinn, Cassano, Berman, Gopinath, Narasimhan, Yao
Agents reflect verbally on task feedback and carry those reflections into later attempts, improving performance without any weight updates. This is the direct precedent for Saga's recursive optimization passes.
Paper 08 · CMU · NeurIPS 2023
Self-Refine: Iterative Refinement with Self-Feedback
Madaan, Tandon, Gupta, Hallinan, Gao, Wiegreffe, Alon, et al.
A single model drafts an output, critiques it, and revises it in a loop, with no additional training. It is the minimal version of what multi-agent debate scales up across roles.
Paper 09 · Together AI · ICLR 2025
Mixture-of-Agents Enhances Large Language Model Capabilities
Wang, Wang, Athiwaratkun, Zhang, Zou
Layered multi-LLM collaboration in which each layer's agents refine the previous layer's outputs. An MoA built only from open-source models scores 65.1% on AlpacaEval 2.0, ahead of GPT-4 Omni's 57.5%.
Paper 10 · 2026
Demystifying Multi-Agent Debate: The Role of Confidence and Diversity
Choi, Zhu, Li, et al.
Pinpoints when multi-agent debate actually beats majority vote: diversity-aware initialization plus calibrated confidence communication. Directly informs MAD Studio's persona and confidence design.
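The papers above share a common loop. Several agents answer independently, read one another's answers, and revise over a few rounds. The final answers are then aggregated.

The following is a minimal Python sketch of that loop. It loosely follows the round structure in Du et al., with toy deterministic agents standing in for LLM calls. `Answer`, `debate`, `stubborn`, `persuadable`, and both voting functions are hypothetical illustrations, not MAD Studio's API or any paper's exact protocol.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    text: str
    confidence: float  # self-reported confidence in [0, 1]


# Hypothetical agent signature: (question, peers' previous answers) -> new answer.
# In a real system this would wrap an LLM call; here it is a plain function.
AgentFn = Callable[[str, list[Answer]], Answer]


def debate(question: str, agents: list[AgentFn], rounds: int = 2) -> list[Answer]:
    """Round 0: independent answers. Each later round: every agent sees all
    other agents' answers from the previous round and produces a new one."""
    answers = [agent(question, []) for agent in agents]
    for _ in range(rounds):
        answers = [
            agent(question, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    return answers


def majority_vote(answers: list[Answer]) -> str:
    """Most frequent answer text; ignores confidence."""
    return Counter(a.text for a in answers).most_common(1)[0][0]


def confidence_weighted_vote(answers: list[Answer]) -> str:
    """Answer text with the largest summed confidence."""
    scores: dict[str, float] = defaultdict(float)
    for a in answers:
        scores[a.text] += a.confidence
    return max(scores, key=lambda text: scores[text])


# --- Toy agents (hypothetical stand-ins for LLM debaters) ---

def stubborn(text: str, confidence: float) -> AgentFn:
    """Always repeats the same answer, whatever its peers say."""
    def agent(question: str, peers: list[Answer]) -> Answer:
        return Answer(text, confidence)
    return agent


def persuadable(text: str, confidence: float) -> AgentFn:
    """Adopts the most confident peer answer if that peer is more confident
    than this agent's own prior; otherwise keeps its initial answer."""
    def agent(question: str, peers: list[Answer]) -> Answer:
        best = max(peers, key=lambda a: a.confidence, default=None)
        if best is not None and best.confidence > confidence:
            # Hypothetical update rule: adopt the answer, slightly discounted.
            return Answer(best.text, round(best.confidence * 0.9, 3))
        return Answer(text, confidence)
    return agent


if __name__ == "__main__":
    agents = [stubborn("42", 0.9), persuadable("41", 0.4), persuadable("41", 0.3)]
    initial = debate("What is 6 x 7?", agents, rounds=0)
    print(majority_vote(initial))             # 41
    print(confidence_weighted_vote(initial))  # 42
    final = debate("What is 6 x 7?", agents, rounds=2)
    print([a.text for a in final])            # ['42', '42', '42']
```

With these toy agents, a majority vote over the round-0 answers picks "41". A confidence-weighted vote and two rounds of debate both settle on "42". That only helps when confidence tracks correctness: a confidently wrong stubborn agent would pull the group to its answer just as easily, which is why calibrated confidence matters.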