Multiagent Adversarial Collaborative Learning via Mean-Field Theory
Luo, Guiyang4; Zhang, Hui2,3; He, Haibo1; Li, Jinglin4; Wang, Fei-Yue2,5,6
Journal | IEEE TRANSACTIONS ON CYBERNETICS
Publication Date | 2021-10-01
Volume | 51; Issue: 10; Pages: 4994-5007
Keywords | Games; Training; Collaborative work; Task analysis; Nash equilibrium; Sociology; Statistics; Adversarial collaborative learning (ACL); friend-or-foe Q-learning; mean-field theory; multiagent reinforcement learning (MARL)
ISSN | 2168-2267
DOI | 10.1109/TCYB.2020.3025491
Corresponding Author | Luo, Guiyang (luoguiyang@bupt.edu.cn)
Abstract | Multiagent reinforcement learning (MARL) has recently attracted considerable attention from both academics and practitioners. Core issues, e.g., the curse of dimensionality due to the exponential growth of agent interactions and nonstationary environments due to simultaneous learning, hinder the large-scale proliferation of MARL. These problems deteriorate with an increased number of agents. To address these challenges, we propose an adversarial collaborative learning method in a mixed cooperative-competitive environment, exploiting friend-or-foe Q-learning and mean-field theory. We first treat neighbors of agent i as two coalitions (i's friend and opponent coalition, respectively), and convert the Markov game into a two-player zero-sum game with an extended action set. By exploiting mean-field theory, this new game simplifies the interactions as those between a single agent and the mean effects of friends and opponents. A neural network is employed to learn the optimal mean effects of these two coalitions, which are trained via adversarial max and min steps. In the max step, with fixed policies of opponents, we optimize the friends' mean action to maximize their rewards. In the min step, the mean action of opponents is trained to minimize the friends' rewards when the policies of friends are frozen. These two steps are proved to converge to a Nash equilibrium. Then, another neural network is applied to learn the best response of each agent toward the mean effects. Finally, the adversarial max and min steps can jointly optimize the two networks. Experiments on two platforms demonstrate the learning effectiveness and strength of our approach, especially with many agents.
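The alternating max/min steps described in the abstract can be sketched on a toy saddle-point problem. This is only an illustration, not the authors' method: the payoff function, its saddle at (1.0, 0.5), and all names below are invented for the sketch, and the scalar gradient updates stand in for the paper's neural-network training of the two coalitions' mean effects.

```python
# Toy sketch of the adversarial max/min steps: friends ascend their reward
# over the friend mean action mu_f while the opponent mean action mu_o is
# frozen, then opponents descend the same reward over mu_o with mu_f frozen.

def reward(mu_f, mu_o):
    """Friends' reward: concave in mu_f, convex in mu_o.

    The unique saddle point (Nash equilibrium) of this toy payoff
    is at mu_f = 1.0, mu_o = 0.5 (an arbitrary illustrative choice).
    """
    return -(mu_f - 1.0) ** 2 + (mu_o - 0.5) ** 2

def grad_f(mu_f):
    # Partial derivative of reward w.r.t. the friend mean action.
    return -2.0 * (mu_f - 1.0)

def grad_o(mu_o):
    # Partial derivative of reward w.r.t. the opponent mean action.
    return 2.0 * (mu_o - 0.5)

def adversarial_training(steps=200, lr=0.05):
    mu_f, mu_o = 0.0, 0.0  # initial mean actions of the two coalitions
    for _ in range(steps):
        # Max step: with opponents frozen, friends maximize their reward.
        mu_f += lr * grad_f(mu_f)
        # Min step: with friends frozen, opponents minimize friends' reward.
        mu_o -= lr * grad_o(mu_o)
    return mu_f, mu_o

mu_f, mu_o = adversarial_training()
print(round(mu_f, 3), round(mu_o, 3))  # approaches the saddle point (1.0, 0.5)
```

For this concave-convex payoff, the alternating updates contract toward the saddle point, mirroring the abstract's claim that the two steps converge to a Nash equilibrium; the paper replaces these closed-form gradients with two jointly optimized neural networks.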
Funding Projects | Natural Science Foundation of China [61876023]; National Science Foundation [ECCS 1917275]
WOS Keywords | COMPREHENSIVE SURVEY; CONTROL SCHEME; SYSTEM; DESIGN
WOS Research Areas | Automation & Control Systems; Computer Science
Language | English
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
WOS Record Number | WOS:000706832000023
Funding Organizations | Natural Science Foundation of China; National Science Foundation
Content Type | Journal Article
Source URL | [http://ir.ia.ac.cn/handle/173211/46227]
Collection | Institute of Automation_State Key Laboratory of Management and Control for Complex Systems_Advanced Control and Automation Team
Author Affiliations |
1. Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881 USA
2. Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
3. Tencent Res, Technol & Engn Grp, Beijing 100193, Peoples R China
4. Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100088, Peoples R China
5. Macau Univ Sci & Technol, Inst Syst Engn, Macau, Peoples R China
6. Qingdao Acad Intelligent Ind, Innovat Ctr Parallel Vis, Qingdao 266109, Peoples R China
Recommended Citation (GB/T 7714) | Luo, Guiyang, Zhang, Hui, He, Haibo, et al. Multiagent Adversarial Collaborative Learning via Mean-Field Theory[J]. IEEE TRANSACTIONS ON CYBERNETICS, 2021, 51(10): 4994-5007.
APA | Luo, Guiyang, Zhang, Hui, He, Haibo, Li, Jinglin, & Wang, Fei-Yue. (2021). Multiagent Adversarial Collaborative Learning via Mean-Field Theory. IEEE TRANSACTIONS ON CYBERNETICS, 51(10), 4994-5007.
MLA | Luo, Guiyang, et al. "Multiagent Adversarial Collaborative Learning via Mean-Field Theory". IEEE TRANSACTIONS ON CYBERNETICS 51.10 (2021): 4994-5007.