Yudi Ruan

MS Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Biography

I am pursuing my M.S. degree in College of Control Science and Engineering, Zhejiang University, Hangzhou, China. My major research interests include Deep Reinforcement Learning and motion planning.

Research and Interests

Deep Reinforcement Learning
Motion planning

Publications

Weiwei Liu, Wei Jing, Shanqi Liu, Yudi Ruan, Kexin Zhang, Jian Yang, and Yong Liu. Expert Demonstrations Guide Reward Decomposition for Multi-Agent Cooperation. Neural Computing and Applications, 35:19847-19863, 2023.
[BibTeX] [Abstract] [DOI] [PDF]

Humans are able to achieve good teamwork through collaboration, since the contributions of the actions from human team members are properly understood by each individual. Therefore, reasonable credit assignment is crucial for multi-agent cooperation. Although existing work uses value decomposition algorithms to mitigate the credit assignment problem, since they decompose the global value function at multi-agents’ local value function level, the overall evaluation of the value function can easily lead to approximation errors. Moreover, such strategies are vulnerable to sparse reward scenarios. In this paper, we propose to use expert demonstrations to guide the team reward decomposition at each time step, rather than value decomposition. The proposed method computes the reward ratio of each agent according to the similarity between the state-action pair of the agent and the expert demonstrations. In addition, under this setting, each agent can independently train its value function and evaluate its behavior, which makes the algorithm highly robust to team rewards. Moreover, the proposed method constrains the policy to collect data with similar distribution to the expert data during the exploration, which makes policy update more robust. We conduct extensive experiments to validate our proposed method in various MARL environments, the results show that our algorithm outperforms the state-of-the-art algorithms in most scenarios; our method is robust to various reward functions; and the trajectories by our policy is closer to that of the expert policy.

@article{liu2023edg,
title = {Expert Demonstrations Guide Reward Decomposition for Multi-Agent Cooperation},
author = {Weiwei Liu and Wei Jing and Shanqi Liu and Yudi Ruan and Kexin Zhang and Jian Yang and Yong Liu},
year = 2023,
journal = {Neural Computing and Applications},
volume = 35,
pages = {19847-19863},
doi = {10.1007/s00521-023-08785-6},
abstract = {Humans are able to achieve good teamwork through collaboration, since the contributions of the actions from human team members are properly understood by each individual. Therefore, reasonable credit assignment is crucial for multi-agent cooperation. Although existing work uses value decomposition algorithms to mitigate the credit assignment problem, since they decompose the global value function at multi-agents' local value function level, the overall evaluation of the value function can easily lead to approximation errors. Moreover, such strategies are vulnerable to sparse reward scenarios. In this paper, we propose to use expert demonstrations to guide the team reward decomposition at each time step, rather than value decomposition. The proposed method computes the reward ratio of each agent according to the similarity between the state-action pair of the agent and the expert demonstrations. In addition, under this setting, each agent can independently train its value function and evaluate its behavior, which makes the algorithm highly robust to team rewards. Moreover, the proposed method constrains the policy to collect data with similar distribution to the expert data during the exploration, which makes policy update more robust. We conduct extensive experiments to validate our proposed method in various MARL environments, the results show that our algorithm outperforms the state-of-the-art algorithms in most scenarios; our method is robust to various reward functions; and the trajectories by our policy is closer to that of the expert policy.}
}

Address

Yudi Ruan

Biography

Research and Interests

Publications

Latest Events

APRIL实验室刘勇教授荣获首届浙江省知识产权奖发明专利一等奖

APRIL实验室联合腾讯优图实验室斩获CVPR 2023视觉异常检测（VAND）挑战赛冠军

APRIL实验室刘勇教授入选全球前2%顶尖科学家榜单