Address

Room 101, Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Email: 11932061@zju.edu.cn

Weiwei Liu

PhD Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Biography

I am pursuing my Ph.D. at the College of Control Science and Engineering, Zhejiang University, Hangzhou, China.

Research Interests

  • Reinforcement Learning

Publications

  • Weiwei Liu, Wei Jing, Shanqi Liu, Yudi Ruan, Kexin Zhang, Jian Yang, and Yong Liu. Expert Demonstrations Guide Reward Decomposition for Multi-Agent Cooperation. Neural Computing and Applications, 35:19847-19863, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    Humans are able to achieve good teamwork through collaboration, since the contributions of the actions from human team members are properly understood by each individual. Therefore, reasonable credit assignment is crucial for multi-agent cooperation. Although existing work uses value decomposition algorithms to mitigate the credit assignment problem, since they decompose the global value function at multi-agents’ local value function level, the overall evaluation of the value function can easily lead to approximation errors. Moreover, such strategies are vulnerable to sparse reward scenarios. In this paper, we propose to use expert demonstrations to guide the team reward decomposition at each time step, rather than value decomposition. The proposed method computes the reward ratio of each agent according to the similarity between the state-action pair of the agent and the expert demonstrations. In addition, under this setting, each agent can independently train its value function and evaluate its behavior, which makes the algorithm highly robust to team rewards. Moreover, the proposed method constrains the policy to collect data with similar distribution to the expert data during the exploration, which makes policy update more robust. We conduct extensive experiments to validate our proposed method in various MARL environments, the results show that our algorithm outperforms the state-of-the-art algorithms in most scenarios; our method is robust to various reward functions; and the trajectories by our policy is closer to that of the expert policy.
    @article{liu2023edg,
    title = {Expert Demonstrations Guide Reward Decomposition for Multi-Agent Cooperation},
    author = {Weiwei Liu and Wei Jing and Shanqi Liu and Yudi Ruan and Kexin Zhang and Jian Yang and Yong Liu},
    year = 2023,
    journal = {Neural Computing and Applications},
    volume = 35,
    pages = {19847-19863},
    doi = {10.1007/s00521-023-08785-6},
    abstract = {Humans are able to achieve good teamwork through collaboration, since the contributions of the actions from human team members are properly understood by each individual. Therefore, reasonable credit assignment is crucial for multi-agent cooperation. Although existing work uses value decomposition algorithms to mitigate the credit assignment problem, since they decompose the global value function at multi-agents' local value function level, the overall evaluation of the value function can easily lead to approximation errors. Moreover, such strategies are vulnerable to sparse reward scenarios. In this paper, we propose to use expert demonstrations to guide the team reward decomposition at each time step, rather than value decomposition. The proposed method computes the reward ratio of each agent according to the similarity between the state-action pair of the agent and the expert demonstrations. In addition, under this setting, each agent can independently train its value function and evaluate its behavior, which makes the algorithm highly robust to team rewards. Moreover, the proposed method constrains the policy to collect data with similar distribution to the expert data during the exploration, which makes policy update more robust. We conduct extensive experiments to validate our proposed method in various MARL environments, the results show that our algorithm outperforms the state-of-the-art algorithms in most scenarios; our method is robust to various reward functions; and the trajectories by our policy is closer to that of the expert policy.}
    }
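    A minimal, illustrative sketch of the reward-splitting idea in the entry above (not the paper's implementation): the shared team reward is divided in proportion to how close each agent's state-action pair is to its nearest expert demonstration. The Gaussian kernel, the nearest-neighbour similarity, and all array shapes are assumptions made for this example.
      # Illustrative sketch: split a shared team reward among agents in proportion to how
      # similar each agent's state-action pair is to the nearest expert demonstration
      # (Gaussian-kernel similarity is an assumption, not the paper's metric).
      import numpy as np

      def decompose_team_reward(team_reward, agent_sa_pairs, expert_sa_pairs, sigma=1.0):
          """agent_sa_pairs: (n_agents, d) state-action vectors; expert_sa_pairs: (n_demo, d)."""
          similarities = []
          for sa in agent_sa_pairs:
              d2 = np.min(np.sum((expert_sa_pairs - sa) ** 2, axis=1))  # closest expert pair
              similarities.append(np.exp(-d2 / (2.0 * sigma ** 2)))     # Gaussian kernel
          similarities = np.asarray(similarities)
          ratios = similarities / (similarities.sum() + 1e-8)           # per-agent reward ratio
          return ratios * team_reward                                   # individual rewards

      # Example: three agents share a team reward of 10.
      rng = np.random.default_rng(0)
      print(decompose_team_reward(10.0, rng.normal(size=(3, 6)), rng.normal(size=(50, 6))))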
  • Weiwei Liu, Linpeng Peng, Licheng Wen, Jian Yang, and Yong Liu. Decomposing Shared Networks for Separate Cooperation with Multi-agent Reinforcement Learning. Information Sciences, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    Sharing network parameters between agents is an essential and typical operation to improve the scalability of multi-agent reinforcement learning algorithms. However, agents with different tasks sharing the same network parameters are not conducive to distinguishing the agents’ skills. In addition, the importance of communication between agents undertaking the same task is much higher than that with external agents. Therefore, we propose Dual Cooperation Networks (DCN). In order to distinguish whether agents undertake the same task, all agents are grouped according to their status through the graph neural network instead of the traditional proximity. The agent communicates within the group to achieve strong cooperation. After that, the global value function is decomposed by groups to facilitate cooperation between groups. Finally, we have verified it in simulation and physical hardware, and the algorithm has achieved excellent performance.
    @article{liu2023dsn,
    title = {Decomposing Shared Networks for Separate Cooperation with Multi-agent Reinforcement Learning},
    author = {Weiwei Liu and Linpeng Peng and Licheng Wen and Jian Yang and Yong Liu},
    year = 2023,
    journal = {Information Sciences},
    doi = {10.1016/j.ins.2023.119085},
    abstract = {Sharing network parameters between agents is an essential and typical operation to improve the scalability of multi-agent reinforcement learning algorithms. However, agents with different tasks sharing the same network parameters are not conducive to distinguishing the agents' skills. In addition, the importance of communication between agents undertaking the same task is much higher than that with external agents. Therefore, we propose Dual Cooperation Networks (DCN). In order to distinguish whether agents undertake the same task, all agents are grouped according to their status through the graph neural network instead of the traditional proximity. The agent communicates within the group to achieve strong cooperation. After that, the global value function is decomposed by groups to facilitate cooperation between groups. Finally, we have verified it in simulation and physical hardware, and the algorithm has achieved excellent performance.}
    }
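    The two-level structure described in the DCN abstract above can be illustrated with a small stand-in sketch: agents are grouped by their state features (k-means here replaces the paper's graph neural network), utilities are combined within each group, and the group values are then combined into a global value. The clustering choice, the purely additive combination, and all shapes are assumptions for illustration only.
      # Stand-in sketch for the grouping-then-decomposition idea (not DCN itself):
      # k-means over state features replaces the GNN grouping, and simple sums
      # replace the learned mixing networks.
      import numpy as np

      def kmeans(x, k, iters=20, seed=0):
          rng = np.random.default_rng(seed)
          centers = x[rng.choice(len(x), k, replace=False)]
          for _ in range(iters):
              labels = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
              centers = np.stack([x[labels == i].mean(0) if np.any(labels == i) else centers[i]
                                  for i in range(k)])
          return labels

      def grouped_value(agent_states, agent_utilities, n_groups=2):
          groups = kmeans(agent_states, n_groups)                        # group by state, not proximity
          group_values = [agent_utilities[groups == g].sum() for g in range(n_groups)]
          return groups, group_values, sum(group_values)                 # per-group and global values

      states = np.random.default_rng(1).normal(size=(6, 4))
      print(grouped_value(states, np.arange(6, dtype=float)))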
  • Shanqi Liu, Weiwei Liu, Wenzhou Chen, Guanzhong Tian, Jun Chen, Yao Tong, Junjie Cao, and Yong Liu. Learning Multi-Agent Cooperation via Considering Actions of Teammates. IEEE Transactions on Neural Networks and Learning Systems, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    Recently value-based centralized training with decentralized execution (CTDE) multi-agent reinforcement learning (MARL) methods have achieved excellent performance in cooperative tasks. However, the most representative method among these methods, Q-network MIXing (QMIX), restricts the joint action Q values to be a monotonic mixing of each agent's utilities. Furthermore, current methods cannot generalize to unseen environments or different agent configurations, which is known as ad hoc team play situation. In this work, we propose a novel Q values decomposition that considers both the return of an agent acting on its own and cooperating with other observable agents to address the nonmonotonic problem. Based on the decomposition, we propose a greedy action searching method that can improve exploration and is not affected by changes in observable agents or changes in the order of agents' actions. In this way, our method can adapt to ad hoc team play situation. Furthermore, we utilize an auxiliary loss related to environmental cognition consistency and a modified prioritized experience replay (PER) buffer to assist training. Our extensive experimental results show that our method achieves significant performance improvements in both challenging monotonic and nonmonotonic domains, and can handle the ad hoc team play situation perfectly.
    @article{liu2023lma,
    title = {Learning Multi-Agent Cooperation via Considering Actions of Teammates},
    author = {Shanqi Liu and Weiwei Liu and Wenzhou Chen and Guanzhong Tian and Jun Chen and Yao Tong and Junjie Cao and Yong Liu},
    year = 2023,
    journal = {IEEE Transactions on Neural Networks and Learning Systems},
    doi = {10.1109/TNNLS.2023.3262921},
    abstract = {Recently value-based centralized training with decentralized execution (CTDE) multi-agent reinforcement learning (MARL) methods have achieved excellent performance in cooperative tasks. However, the most representative method among these methods, Q-network MIXing (QMIX), restricts the joint action Q values to be a monotonic mixing of each agent's utilities. Furthermore, current methods cannot generalize to unseen environments or different agent configurations, which is known as ad hoc team play situation. In this work, we propose a novel Q values decomposition that considers both the return of an agent acting on its own and cooperating with other observable agents to address the nonmonotonic problem. Based on the decomposition, we propose a greedy action searching method that can improve exploration and is not affected by changes in observable agents or changes in the order of agents' actions. In this way, our method can adapt to ad hoc team play situation. Furthermore, we utilize an auxiliary loss related to environmental cognition consistency and a modified prioritized experience replay (PER) buffer to assist training. Our extensive experimental results show that our method achieves significant performance improvements in both challenging monotonic and nonmonotonic domains, and can handle the ad hoc team play situation perfectly.}
    }
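    A toy illustration of the decomposition and search described above: each agent's value is written as a "self" term plus pairwise "cooperation" terms with observable teammates, and a greedy coordinate search picks actions one agent at a time, with repeated sweeps to reduce sensitivity to agent order. The tabular Q arrays and the coordinate-ascent search are assumptions for this sketch, not the paper's network architecture.
      # Toy sketch: joint value = self terms + pairwise cooperation terms, maximised by a
      # greedy per-agent coordinate search (an assumption made here, not the paper's code).
      import numpy as np

      def joint_value(actions, q_self, q_coop):
          n = len(actions)
          value = sum(q_self[i][actions[i]] for i in range(n))
          value += sum(q_coop[i][j][actions[i], actions[j]]
                       for i in range(n) for j in range(n) if i != j)
          return value

      def greedy_action_search(q_self, q_coop, n_actions, sweeps=3):
          n, actions = len(q_self), [0] * len(q_self)
          for _ in range(sweeps):                                        # sweeps reduce order effects
              for i in range(n):
                  scores = [joint_value(actions[:i] + [a] + actions[i + 1:], q_self, q_coop)
                            for a in range(n_actions)]
                  actions[i] = int(np.argmax(scores))
          return actions

      rng = np.random.default_rng(0)
      n_agents, n_actions = 3, 4
      q_self = rng.normal(size=(n_agents, n_actions))
      q_coop = rng.normal(size=(n_agents, n_agents, n_actions, n_actions))
      acts = greedy_action_search(q_self, q_coop, n_actions)
      print(acts, joint_value(acts, q_self, q_coop))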
  • Gang Xu, Yansong Chen, Junjie Cao, Deye Zhu, Weiwei Liu, and Yong Liu. Multivehicle Motion Planning with Posture Constraints in Real World. IEEE/ASME Transactions on Mechatronics, 27(4):2125-2133, 2022.
    [BibTeX] [Abstract] [DOI] [PDF]
    This article addresses the posture constraints problem in multivehicle motion planning for specific applications such as ground exploration tasks. Unlike most of the related work in motion planning, this article investigates more practical applications in the real world for nonholonomic unmanned ground vehicles (UGVs). In this case, a strategy of diversion is designed to optimize the smoothness of motion. Considering the problem of the posture constraints, a postured collision avoidance algorithm is proposed for the motion planning of the multiple nonholonomic UGVs. Two simulation experiments were conducted to verify the effectiveness and analyze the quantitative performance of the proposed method. Then, the practicability of the proposed algorithm was verified with an experiment in a natural environment.
    @article{xu2022mmp,
    title = {Multivehicle Motion Planning with Posture Constraints in Real World},
    author = {Gang Xu and Yansong Chen and Junjie Cao and Deye Zhu and Weiwei Liu and Yong Liu},
    year = 2022,
    journal = {IEEE/ASME Transactions on Mechatronics},
    volume = {27},
    number = {4},
    pages = {2125-2133},
    doi = {10.1109/TMECH.2022.3173130},
    abstract = {This article addresses the posture constraints problem in multivehicle motion planning for specific applications such as ground exploration tasks. Unlike most of the related work in motion planning, this article investigates more practical applications in the real world for nonholonomic unmanned ground vehicles (UGVs). In this case, a strategy of diversion is designed to optimize the smoothness of motion. Considering the problem of the posture constraints, a postured collision avoidance algorithm is proposed for the motion planning of the multiple nonholonomic UGVs. Two simulation experiments were conducted to verify the effectiveness and analyze the quantitative performance of the proposed method. Then, the practicability of the proposed algorithm was verified with an experiment in a natural environment.}
    }
  • Weiwei Liu, Shanqi Liu, Junjie Cao, Qi Wang, Xiaolei Lang, and Yong Liu. Learning Communication for Cooperation in Dynamic Agent-Number Environment. IEEE/ASME Transactions on Mechatronics, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    The number of agents in many multi-agent systems in the real world changes all the time, such as storage robots and drone cluster systems. Still, most current multi-agent reinforcement learning algorithms are limited to fixed network dimensions, and prior knowledge is used to preset the number of agents in the training phase, which leads to a poor generalization of the algorithm. In addition, these algorithms use centralized training to solve the instability problem of multi-agent systems. However, the centralized learning of large-scale multi-agent reinforcement learning algorithms will lead to an explosion of network dimensions, which in turn leads to very limited scalability of centralized learning algorithms. To solve these two difficulties, we propose Group Centralized Training and Decentralized Execution-Unlimited Dynamic Agent-number Network (GCTDE-UDAN). Firstly, since we use the attention mechanism to select several leaders and establish a dynamic number of teams, and UDAN performs a non-linear combination of all agents’ Q values when performing value decomposition, it is not affected by changes in the number of agents. Moreover, our algorithm can unite any agent to form a group and conduct centralized training within the group, avoiding network dimension explosion caused by global centralized training of large-scale agents. Finally, we verified on the simulation and experimental platform that the algorithm can learn and perform cooperative behaviors in many dynamic multi-agent environments.
    @article{liu2021lcf,
    title = {Learning Communication for Cooperation in Dynamic Agent-Number Environment},
    author = {Weiwei Liu and Shanqi Liu and Junjie Cao and Qi Wang and Xiaolei Lang and Yong Liu},
    year = 2021,
    journal = {IEEE/ASME Transactions on Mechatronics},
    doi = {10.1109/TMECH.2021.3076080},
    abstract = {The number of agents in many multi-agent systems in the real world changes all the time, such as storage robots and drone cluster systems. Still, most current multi-agent reinforcement learning algorithms are limited to fixed network dimensions, and prior knowledge is used to preset the number of agents in the training phase, which leads to a poor generalization of the algorithm. In addition, these algorithms use centralized training to solve the instability problem of multi-agent systems. However, the centralized learning of large-scale multi-agent reinforcement learning algorithms will lead to an explosion of network dimensions, which in turn leads to very limited scalability of centralized learning algorithms. To solve these two difficulties, we propose Group Centralized Training and Decentralized Execution-Unlimited Dynamic Agent-number Network (GCTDE-UDAN). Firstly, since we use the attention mechanism to select several leaders and establish a dynamic number of teams, and UDAN performs a non-linear combination of all agents' Q values when performing value decomposition, it is not affected by changes in the number of agents. Moreover, our algorithm can unite any agent to form a group and conduct centralized training within the group, avoiding network dimension explosion caused by global centralized training of large-scale agents. Finally, we verified on the simulation and experimental platform that the algorithm can learn and perform cooperative behaviors in many dynamic multi-agent environments.}
    }
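    The leader-selection step described above can be sketched in a few lines: dot-product attention over agent feature vectors scores how much attention each agent receives, the top-scoring agents become leaders, and every agent joins the leader it attends to most, so the team structure adapts to however many agents are present. The attention form and leader count are assumptions; the UDAN value mixing itself is not reproduced here.
      # Hedged sketch of attention-based leader selection and dynamic teams
      # (illustrative only; not the paper's GCTDE-UDAN implementation).
      import numpy as np

      def softmax(x, axis=-1):
          z = x - x.max(axis=axis, keepdims=True)
          e = np.exp(z)
          return e / e.sum(axis=axis, keepdims=True)

      def form_teams(agent_feats, n_leaders=2):
          """agent_feats: (n_agents, d). Returns leader indices and each agent's leader."""
          scores = agent_feats @ agent_feats.T / np.sqrt(agent_feats.shape[1])
          importance = softmax(scores, axis=1).sum(axis=0)               # attention each agent receives
          leaders = np.argsort(importance)[-n_leaders:]                  # top-k agents become leaders
          attn_to_leaders = softmax(scores[:, leaders], axis=1)
          teams = leaders[np.argmax(attn_to_leaders, axis=1)]            # each agent follows one leader
          return leaders, teams

      feats = np.random.default_rng(2).normal(size=(5, 8))
      print(form_teams(feats))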
  • Weiwei Liu, Linpeng Peng, Junjie Cao, Xiaokuan Fu, Yong Liu, and Zaisheng Pan. Ensemble Bootstrapped Deep Deterministic Policy Gradient for Vision-Based Robotic Grasping. IEEE Access, 9:19916–19925, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    With sufficient practice, humans can grab objects they have never seen before through brain decision-making. However, the manipulators, which has a wide range of applications in industrial production, can still only grab specific objects. Because most of the grasp algorithms rely on prior knowledge such as hand-eye calibration results, object model features, and can only target specific types of objects. When the task scenario and the operation target change, it cannot perform effective redeployment. In order to solve the above problems, academia often uses reinforcement learning to train grasping algorithms. However, the method of reinforcement learning in the field of manipulators grasping mainly encounters these main problems: insufficient sample utilization, poor algorithm stability, and limited exploration. This article uses LfD, BC, and DDPG to improve sample utilization. Use multiple critics to integrate and evaluate input actions to solve the problem of algorithm instability. Finally, inspired by Thompson’s sampling idea, the input action is evaluated from different angles, which increases the algorithm’s exploration of the environment and reduces the number of interactions with the environment. EDDPG and EBDDPG algorithm is designed in the article. In order to further improve the generalization ability of the algorithm, this article does not use extra information that is difficult to obtain directly on the physical platform, such as the real coordinates of the target object and the continuous motion space at the end of the manipulator in the Cartesian coordinate system is used as the output of the decision. The simulation results show that, under the same number of interactions, the manipulators’ success rate in grabbing 1000 random objects has increased more than double and reached state-of-the-art(SOTA) performance.
    @article{liu2021ensemblebd,
    title = {Ensemble Bootstrapped Deep Deterministic Policy Gradient for Vision-Based Robotic Grasping},
    author = {Weiwei Liu and Linpeng Peng and Junjie Cao and Xiaokuan Fu and Yong Liu and Zaisheng Pan},
    year = 2021,
    journal = {IEEE Access},
    volume = 9,
    pages = {19916--19925},
    doi = {10.1109/ACCESS.2021.3049860},
    abstract = {With sufficient practice, humans can grab objects they have never seen before through brain decision-making. However, the manipulators, which has a wide range of applications in industrial production, can still only grab specific objects. Because most of the grasp algorithms rely on prior knowledge such as hand-eye calibration results, object model features, and can only target specific types of objects. When the task scenario and the operation target change, it cannot perform effective redeployment. In order to solve the above problems, academia often uses reinforcement learning to train grasping algorithms. However, the method of reinforcement learning in the field of manipulators grasping mainly encounters these main problems: insufficient sample utilization, poor algorithm stability, and limited exploration. This article uses LfD, BC, and DDPG to improve sample utilization. Use multiple critics to integrate and evaluate input actions to solve the problem of algorithm instability. Finally, inspired by Thompson's sampling idea, the input action is evaluated from different angles, which increases the algorithm's exploration of the environment and reduces the number of interactions with the environment. EDDPG and EBDDPG algorithm is designed in the article. In order to further improve the generalization ability of the algorithm, this article does not use extra information that is difficult to obtain directly on the physical platform, such as the real coordinates of the target object and the continuous motion space at the end of the manipulator in the Cartesian coordinate system is used as the output of the decision. The simulation results show that, under the same number of interactions, the manipulators' success rate in grabbing 1000 random objects has increased more than double and reached state-of-the-art(SOTA) performance.}
    }
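    The ensemble-of-critics idea highlighted above lends itself to a short PyTorch sketch: several bootstrapped critic heads score the same state-action pair, one randomly sampled head is followed during an episode (Thompson-sampling-style exploration), and the ensemble mean serves as the value estimate for updates. The network sizes, number of heads, and mean aggregation are assumptions; this is not the paper's EDDPG/EBDDPG code.
      # Illustrative ensemble critic with Thompson-sampling-style head selection
      # (sizes and aggregation rule are assumptions for this sketch).
      import torch
      import torch.nn as nn

      class EnsembleCritic(nn.Module):
          def __init__(self, state_dim, action_dim, n_heads=5, hidden=64):
              super().__init__()
              self.heads = nn.ModuleList(
                  nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                                nn.Linear(hidden, 1))
                  for _ in range(n_heads))

          def forward(self, state, action):
              x = torch.cat([state, action], dim=-1)
              return torch.stack([head(x) for head in self.heads], dim=0)  # (n_heads, batch, 1)

      critic = EnsembleCritic(state_dim=12, action_dim=4)
      state, action = torch.randn(32, 12), torch.randn(32, 4)
      q_all = critic(state, action)
      head = torch.randint(len(critic.heads), (1,)).item()                 # one head per episode
      q_explore = q_all[head]                                              # guides exploration
      q_target = q_all.mean(dim=0)                                         # ensemble estimate for updates
      print(q_explore.shape, q_target.shape)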
  • Weiwei Liu, Shanqi Liu, Jian Yang, and Yong Liu. Learning Intra-group Cooperation in Multi-agent Systems. In 2021 27th International Conference on Mechatronics and Machine Vision in Practice, pages 688-692, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    Reinforcement learning is one of the algorithms used in multi-agent systems to promote agent cooperation. However, most current multi-agent reinforcement learning algorithms improve the communication capabilities of agents for cooperation, but the overall communication is costly and even harmful due to bandwidth limitations. In addition, de-centralized execution cannot generate joint actions, which is not conducive to cooperation. Therefore, we proposed the Hierarchical Group Cooperation Network (HGCN). Advanced strategy, Group Network (GroNet), learns to group all agents based on their state rather than their location. The Low-level strategy, Group Cooperation Network (GCoNet), is a method of centralized training and centralized execution within a group, which effectively promotes agent collaboration. Finally, we validated our method in various experiments.
    @inproceedings{liu2021lig,
    title = {Learning Intra-group Cooperation in Multi-agent Systems},
    author = {Weiwei Liu and Shanqi Liu and Jian Yang and Yong Liu},
    year = 2021,
    booktitle = {2021 27th International Conference on Mechatronics and Machine Vision in Practice},
    pages = {688-692},
    doi = {10.1109/M2VIP49856.2021.9665049},
    abstract = {Reinforcement learning is one of the algorithms used in multi-agent systems to promote agent cooperation. However, most current multi-agent reinforcement learning algorithms improve the communication capabilities of agents for cooperation, but the overall communication is costly and even harmful due to bandwidth limitations. In addition, de-centralized execution cannot generate joint actions, which is not conducive to cooperation. Therefore, we proposed the Hierarchical Group Cooperation Network (HGCN). Advanced strategy, Group Network (GroNet), learns to group all agents based on their state rather than their location. The Low-level strategy, Group Cooperation Network (GCoNet), is a method of centralized training and centralized execution within a group, which effectively promotes agent collaboration. Finally, we validated our method in various experiments.}
    }
  • Junjie Cao, Weiwei Liu, Yong Liu, and Jian Yang. Generalize Robot Learning From Demonstration to Variant Scenarios With Evolutionary Policy Gradient. Frontiers in Neurorobotics, 14, 2020.
    [BibTeX] [Abstract] [DOI] [PDF]
    There has been substantial growth in research on the robot automation, which aims to make robots capable of directly interacting with the world or human. Robot learning for automation from human demonstration is central to such situation. However, the dependence of demonstration restricts robot to a fixed scenario, without the ability to explore in variant situations to accomplish the same task as in demonstration. Deep reinforcement learning methods may be a good method to make robot learning beyond human demonstration and fulfilling the task in unknown situations. The exploration is the core of such generalization to different environments. While the exploration in reinforcement learning may be ineffective and suffer from the problem of low sample efficiency. In this paper, we present Evolutionary Policy Gradient (EPG) to make robot learn from demonstration and perform goal oriented exploration efficiently. Through goal oriented exploration, our method can generalize robot learned skill to environments with different parameters. Our Evolutionary Policy Gradient combines parameter perturbation with policy gradient method in the framework of Evolutionary Algorithms (EAs) and can fuse the benefits of both, achieving effective and efficient exploration. With demonstration guiding the evolutionary process, robot can accelerate the goal oriented exploration to generalize its capability to variant scenarios. The experiments, carried out in robot control tasks in OpenAI Gym with dense and sparse rewards, show that our EPG is able to provide competitive performance over the original policy gradient methods and EAs. In the manipulator task, our robot can learn to open the door with vision in environments which are different from where the demonstrations are provided.
    @article{cao2020generalizerl,
    title = {Generalize Robot Learning From Demonstration to Variant Scenarios With Evolutionary Policy Gradient},
    author = {Junjie Cao and Weiwei Liu and Yong Liu and Jian Yang},
    year = 2020,
    journal = {Frontiers in Neurorobotics},
    volume = 14,
    doi = {10.3389/fnbot.2020.00021},
    abstract = {There has been substantial growth in research on the robot automation, which aims to make robots capable of directly interacting with the world or human. Robot learning for automation from human demonstration is central to such situation. However, the dependence of demonstration restricts robot to a fixed scenario, without the ability to explore in variant situations to accomplish the same task as in demonstration. Deep reinforcement learning methods may be a good method to make robot learning beyond human demonstration and fulfilling the task in unknown situations. The exploration is the core of such generalization to different environments. While the exploration in reinforcement learning may be ineffective and suffer from the problem of low sample efficiency. In this paper, we present Evolutionary Policy Gradient (EPG) to make robot learn from demonstration and perform goal oriented exploration efficiently. Through goal oriented exploration, our method can generalize robot learned skill to environments with different parameters. Our Evolutionary Policy Gradient combines parameter perturbation with policy gradient method in the framework of Evolutionary Algorithms (EAs) and can fuse the benefits of both, achieving effective and efficient exploration. With demonstration guiding the evolutionary process, robot can accelerate the goal oriented exploration to generalize its capability to variant scenarios. The experiments, carried out in robot control tasks in OpenAI Gym with dense and sparse rewards, show that our EPG is able to provide competitive performance over the original policy gradient methods and EAs. In the manipulator task, our robot can learn to open the door with vision in environments which are different from where the demonstrations are provided.}
    }
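    A minimal evolutionary-style sketch of the combination described in the last entry: a population of policy parameter vectors is perturbed with Gaussian noise (the evolutionary part), the fittest half is kept, and each survivor is then nudged by a gradient estimate (the policy-gradient part). The quadratic toy "fitness" stands in for episode return, the hand-written gradient stands in for a policy-gradient update, and all hyperparameters are assumptions; the demonstration-guided variant in the paper is not reproduced.
      # Toy sketch of parameter perturbation combined with gradient refinement
      # (illustrative only; the real EPG optimises episode return of a policy network).
      import numpy as np

      rng = np.random.default_rng(0)
      goal = rng.normal(size=10)                       # stand-in for "good" policy parameters

      def fitness(theta):
          return -np.sum((theta - goal) ** 2)          # placeholder for episode return

      def gradient_step(theta, lr=0.05):
          return theta - lr * 2.0 * (theta - goal)     # placeholder for a policy-gradient update

      population = [rng.normal(size=10) for _ in range(8)]
      for generation in range(50):
          candidates = population + [p + 0.1 * rng.normal(size=10) for p in population]  # perturb
          candidates.sort(key=fitness, reverse=True)                                     # select
          population = [gradient_step(p) for p in candidates[:8]]                        # refine
      print(round(float(fitness(population[0])), 4))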