Linpeng Peng

PhD Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Address

Room 101, Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Email: penglinpeng@zju.edu.cn

I am pursuing my Ph.D. degree at the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. My research lies at the intersection of computer vision and robotics. My goal is to develop intelligent algorithms and systems that enable robots to complete complex tasks and assist people.

Research Interests

  • Computer Vision
  • Robotic Manipulation
  • Machine Learning

Publications

  • Weiwei Liu, Linpeng Peng, Licheng Wen, Jian Yang, and Yong Liu. Decomposing Shared Networks for Separate Cooperation with Multi-agent Reinforcement Learning. Information Sciences, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    Sharing network parameters between agents is an essential and typical way to improve the scalability of multi-agent reinforcement learning algorithms. However, when agents with different tasks share the same network parameters, it becomes difficult to distinguish the agents' skills. In addition, communication between agents undertaking the same task is far more important than communication with external agents. We therefore propose Dual Cooperation Networks (DCN). To determine whether agents undertake the same task, all agents are grouped according to their status by a graph neural network, rather than by traditional spatial proximity. Agents communicate within their group to achieve strong cooperation; the global value function is then decomposed by group to facilitate cooperation between groups. Finally, we verify DCN in simulation and on physical hardware, where it achieves excellent performance. (See the illustrative sketch after this publication list.)
    @article{liu2023dsn,
    title = {Decomposing Shared Networks for Separate Cooperation with Multi-agent Reinforcement Learning},
    author = {Weiwei Liu and Linpeng Peng and Licheng Wen and Jian Yang and Yong Liu},
    year = 2023,
    journal = {Information Sciences},
    doi = {10.1016/j.ins.2023.119085},
    abstract = {Sharing network parameters between agents is an essential and typical way to improve the scalability of multi-agent reinforcement learning algorithms. However, when agents with different tasks share the same network parameters, it becomes difficult to distinguish the agents' skills. In addition, communication between agents undertaking the same task is far more important than communication with external agents. We therefore propose Dual Cooperation Networks (DCN). To determine whether agents undertake the same task, all agents are grouped according to their status by a graph neural network, rather than by traditional spatial proximity. Agents communicate within their group to achieve strong cooperation; the global value function is then decomposed by group to facilitate cooperation between groups. Finally, we verify DCN in simulation and on physical hardware, where it achieves excellent performance.}
    }
  • Weiwei Liu, Linpeng Peng, Junjie Cao, Xiaokuan Fu, Yong Liu, and Zaisheng Pan. Ensemble Bootstrapped Deep Deterministic Policy Gradient for Vision-Based Robotic Grasping. IEEE Access, 9:19916–19925, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    With sufficient practice, humans can grasp objects they have never seen before. Manipulators, despite their wide range of industrial applications, can still only grasp specific objects, because most grasping algorithms rely on prior knowledge such as hand-eye calibration results and object model features and only target specific object types; when the task scenario or the target changes, they cannot be redeployed effectively. Reinforcement learning is therefore often used to train grasping algorithms, but in manipulator grasping it faces three main problems: insufficient sample utilization, poor algorithm stability, and limited exploration. This article uses learning from demonstration (LfD), behavior cloning (BC), and DDPG to improve sample utilization, and it integrates multiple critics to evaluate input actions, addressing algorithm instability. Finally, inspired by Thompson sampling, input actions are evaluated from different perspectives, which increases the algorithm's exploration of the environment and reduces the number of interactions required. The EDDPG and EBDDPG algorithms are designed accordingly. To further improve generalization, the article avoids extra information that is difficult to obtain directly on a physical platform, such as the real coordinates of the target object, and uses the continuous motion space at the end of the manipulator in the Cartesian coordinate system as the decision output. Simulation results show that, under the same number of interactions, the manipulator's success rate in grasping 1000 random objects more than doubles, reaching state-of-the-art (SOTA) performance. (See the illustrative sketch after this publication list.)
    @article{liu2021ensemblebd,
    title = {Ensemble Bootstrapped Deep Deterministic Policy Gradient for Vision-Based Robotic Grasping},
    author = {Weiwei Liu and Linpeng Peng and Junjie Cao and Xiaokuan Fu and Yong Liu and Zaisheng Pan},
    year = 2021,
    journal = {IEEE Access},
    volume = 9,
    pages = {19916--19925},
    doi = {10.1109/ACCESS.2021.3049860},
    abstract = {With sufficient practice, humans can grasp objects they have never seen before. Manipulators, despite their wide range of industrial applications, can still only grasp specific objects, because most grasping algorithms rely on prior knowledge such as hand-eye calibration results and object model features and only target specific object types; when the task scenario or the target changes, they cannot be redeployed effectively. Reinforcement learning is therefore often used to train grasping algorithms, but in manipulator grasping it faces three main problems: insufficient sample utilization, poor algorithm stability, and limited exploration. This article uses learning from demonstration (LfD), behavior cloning (BC), and DDPG to improve sample utilization, and it integrates multiple critics to evaluate input actions, addressing algorithm instability. Finally, inspired by Thompson sampling, input actions are evaluated from different perspectives, which increases the algorithm's exploration of the environment and reduces the number of interactions required. The EDDPG and EBDDPG algorithms are designed accordingly. To further improve generalization, the article avoids extra information that is difficult to obtain directly on a physical platform, such as the real coordinates of the target object, and uses the continuous motion space at the end of the manipulator in the Cartesian coordinate system as the decision output. Simulation results show that, under the same number of interactions, the manipulator's success rate in grasping 1000 random objects more than doubles, reaching state-of-the-art (SOTA) performance.}
    }
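
The DCN abstract above outlines a concrete mechanism: group agents by status similarity via a graph neural network, communicate within groups, and decompose the global value function by group. The minimal PyTorch sketch below illustrates that idea under my own assumptions; the class name, layer sizes, and the thresholded cosine-similarity grouping rule are hypothetical, not the paper's implementation.

    import torch
    import torch.nn as nn

    # Hypothetical sketch of the DCN idea: group agents by status similarity,
    # pass messages within each group, and sum per-agent utilities into a
    # global value. Names and sizes are illustrative assumptions.
    class DCNSketch(nn.Module):
        def __init__(self, obs_dim, embed_dim=32, threshold=0.5):
            super().__init__()
            self.encoder = nn.Linear(obs_dim, embed_dim)    # per-agent status embedding
            self.message = nn.Linear(embed_dim, embed_dim)  # within-group message
            self.value_head = nn.Linear(embed_dim, 1)       # per-agent utility
            self.threshold = threshold

        def group_agents(self, h):
            # Adjacency from embedding similarity: agents with similar *status*
            # are grouped together, regardless of spatial distance.
            sim = torch.cosine_similarity(h.unsqueeze(1), h.unsqueeze(0), dim=-1)
            return (sim > self.threshold).float()           # (n_agents, n_agents)

        def forward(self, obs):
            # obs: (n_agents, obs_dim)
            h = torch.relu(self.encoder(obs))
            adj = self.group_agents(h)
            # One round of message passing restricted to each group (masked mean).
            msgs = adj @ self.message(h) / adj.sum(dim=1, keepdim=True).clamp(min=1.0)
            h = h + torch.relu(msgs)
            # Group-wise decomposition: per-agent utilities are summed into a
            # global value (VDN-style additive mixing).
            return self.value_head(h).squeeze(-1).sum()

    if __name__ == "__main__":
        model = DCNSketch(obs_dim=8)
        print(model(torch.randn(4, 8)).item())  # global value for 4 agents

Because the diagonal of the similarity matrix is always 1, each agent belongs to its own group, so the masked mean in the message-passing step is always well defined.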
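The EBDDPG abstract similarly lends itself to a short sketch: several bootstrapped critic heads score the same (state, action) pair, the ensemble mean gives a stable estimate for the actor update, and sampling a single head per episode drives Thompson-style exploration. Network sizes and all names below are assumptions for illustration, not the published code.

    import random

    import torch
    import torch.nn as nn

    # Hypothetical sketch of the ensemble-critic idea: several bootstrapped
    # critic heads evaluate the same input action from different "angles".
    class EnsembleCritic(nn.Module):
        def __init__(self, state_dim, action_dim, n_heads=5):
            super().__init__()
            self.heads = nn.ModuleList([
                nn.Sequential(
                    nn.Linear(state_dim + action_dim, 64),
                    nn.ReLU(),
                    nn.Linear(64, 1),
                )
                for _ in range(n_heads)
            ])

        def forward(self, state, action):
            x = torch.cat([state, action], dim=-1)
            # Stack every head's Q-estimate: (n_heads, batch, 1).
            return torch.stack([head(x) for head in self.heads], dim=0)

        def mean_q(self, state, action):
            # Ensemble average: the stable estimate used for the actor update.
            return self.forward(state, action).mean(dim=0)

        def sample_head(self):
            # Thompson-style exploration: commit to one head for a whole
            # episode, acting under a single plausible value hypothesis.
            return random.randrange(len(self.heads))

    if __name__ == "__main__":
        critic = EnsembleCritic(state_dim=10, action_dim=3)
        s, a = torch.randn(2, 10), torch.randn(2, 3)
        k = critic.sample_head()
        q_explore = critic(s, a)[k]     # head guiding this episode's actions
        q_stable = critic.mean_q(s, a)  # ensemble mean for the actor update
        print(q_explore.shape, q_stable.shape)

Training each head on its own bootstrap resample of the replay buffer (not shown) is what keeps the heads diverse enough for the sampled-head exploration to be informative.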