Address

Room 101, Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Email: wenzhouchen@zju.edu.cn

Wenzhou Chen

PhD Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Biography

I am pursuing my Ph.D. degree in the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. My major research interests include Deep Reinforcement Learning and Robot Navigation.

Research Interests

  • Deep Reinforcement Learning
  • Robot Navigation

Publications

  • Shanqi Liu, Weiwei Liu, Wenzhou Chen, Guanzhong Tian, Jun Chen, Yao Tong, Junjie Cao, and Yong Liu. Learning Multi-Agent Cooperation via Considering Actions of Teammates. IEEE Transactions on Neural Networks and Learning Systems, 35:11553–11564, 2024.
    Recently, value-based centralized training with decentralized execution (CTDE) multi-agent reinforcement learning (MARL) methods have achieved excellent performance in cooperative tasks. However, the most representative of these methods, Q-network MIXing (QMIX), restricts the joint action Q values to be a monotonic mixing of each agent's utilities. Furthermore, current methods cannot generalize to unseen environments or different agent configurations, which is known as the ad hoc team play situation. In this work, we propose a novel Q value decomposition that considers both the return of an agent acting on its own and the return of cooperating with other observable agents to address the nonmonotonic problem. Based on the decomposition, we propose a greedy action searching method that can improve exploration and is not affected by changes in observable agents or changes in the order of agents' actions. In this way, our method can adapt to the ad hoc team play situation. Furthermore, we utilize an auxiliary loss related to environmental cognition consistency and a modified prioritized experience replay (PER) buffer to assist training. Our extensive experimental results show that our method achieves significant performance improvements in both challenging monotonic and nonmonotonic domains, and can handle the ad hoc team play situation perfectly.
    @article{liu2024lma,
    title = {Learning Multi-Agent Cooperation via Considering Actions of Teammates},
    author = {Shanqi Liu and Weiwei Liu and Wenzhou Chen and Guanzhong Tian and Jun Chen and Yao Tong and Junjie Cao and Yong Liu},
    year = 2024,
    journal = {IEEE Transactions on Neural Networks and Learning Systems},
    volume = 35,
    pages = {11553--11564},
    doi = {10.1109/TNNLS.2023.3262921},
    }
  • Yeneng Lin, Mengmeng Wang, Wenzhou Chen, Wang Gao, Lei Li, and Yong Liu. Multiple Object Tracking of Drone Videos by a Temporal-Association Network with Separated-Tasks Structure. Remote Sensing, 14(16):3862, 2022.
    The task of multi-object tracking via deep learning methods for UAV videos has become an important research direction. However, with some current multiple object tracking methods, the relationship between object detection and tracking is not well handled, and decisions on how to make good use of temporal information can affect tracking performance as well. To improve the performance of multi-object tracking, this paper proposes an improved multiple object tracking model based on FairMOT. The proposed model contains a structure to separate the detection and ReID heads to decrease the influence between every function head. Additionally, we develop a temporal embedding structure to strengthen the representational ability of the model. By combining the temporal-association structure and separating different function heads, the model’s performance in object detection and tracking tasks is improved, which has been verified on the VisDrone2019 dataset. Compared with the original method, the proposed model improves MOTA by 4.9% and MOTP by 1.2% and has better tracking performance than models such as SORT and HDHNet on the UAV video dataset.
    @article{lin2022mot,
    title = {Multiple Object Tracking of Drone Videos by a Temporal-Association Network with Separated-Tasks Structure},
    author = {Yeneng Lin and Mengmeng Wang and Wenzhou Chen and Wang Gao and Lei Li and Yong Liu},
    year = 2022,
    journal = {Remote Sensing},
    volume = {14},
    number = {16},
    pages = {3862},
    doi = {10.3390/rs14163862},
    }
  • Shanqi Liu, Junjie Cao, Yujie Wang, Wenzhou Chen, and Yong Liu. Self-play reinforcement learning with comprehensive critic in computer games. Neurocomputing, 2021.
    Self-play reinforcement learning, where agents learn by playing with themselves, has been successfully applied in many game scenarios. However, the training procedure for self-play reinforcement learning is unstable and more sample-inefficient than (general) reinforcement learning, especially in imperfect information games. To improve the self-play training process, we incorporate a comprehensive critic into the policy gradient method to form a self-play actor-critic (SPAC) method for training agents to play computer games. We evaluate our method in four different environments in both competitive and cooperative tasks. The results show that the agent trained with our SPAC method outperforms those trained with deep deterministic policy gradient (DDPG) and proximal policy optimization (PPO) algorithms in many different evaluation approaches, which vindicates the effect of our comprehensive critic in the self-play training procedure.
    @article{liu2021spr,
    title = {Self-play reinforcement learning with comprehensive critic in computer games},
    author = {Shanqi Liu and Junjie Cao and Yujie Wang and Wenzhou Chen and Yong Liu},
    year = 2021,
    journal = {Neurocomputing},
    doi = {10.1016/j.neucom.2021.04.006},
    }
  • Sen Lin, Jianxin Huang, Wenzhou Chen, Wenlong Zhou, Jinhong Xu, Yong Liu, and Jinqiang Yao. Intelligent warehouse monitoring based on distributed system and edge computing. International Journal of Intelligent Robotics and Applications, 5:130–142, 2021.
    This paper mainly focuses on calculating the volume of materials in a warehouse where sand and gravel are stored, and on monitoring in real time whether materials are running short. Specifically, we propose the sandpile model and the point cloud projection obtained from the LiDAR sensors to calculate the material volume. We use distributed edge computing modules to build a centralized system and transmit data remotely through a high-power wireless network, which solves sensor placement and data transmission in a complex warehouse environment. Our centralized system can also reduce worker participation in a harsh factory environment. Furthermore, the point cloud data of the warehouse is colored to visualize the actual factory environment. Our centralized system has been deployed in a real factory environment and achieved good performance.
    @article{huang2021iwm,
    title = {Intelligent warehouse monitoring based on distributed system and edge computing},
    author = {Sen Lin and Jianxin Huang and Wenzhou Chen and Wenlong Zhou and Jinhong Xu and Yong Liu and Jinqiang Yao},
    year = 2021,
    journal = {International Journal of Intelligent Robotics and Applications},
    volume = 5,
    pages = {130--142},
    doi = {10.1007/s41315-021-00173-4},
    issue = 2,
    }
  • Wenzhou Chen, Jinhong Xu, Xiangrui Zhao, Yong Liu, and Jian Yang. Separated Sonar Localization System for Indoor Robot Navigation. IEEE Transactions on Industrial Electronics, 2021.
    This work addresses the task of mobile robot localization for indoor navigation. In this paper, we propose a novel indoor localization system based on separated sonar sensors which can be deployed in large-scale indoor environments conveniently. In our approach, the separated sonar receivers are deployed on the ceiling, and the mobile robot equipped with the separated sonar transmitters navigates in the indoor environment. The distance measurements between the receivers and the transmitters can be obtained in real time from the control board of the receivers with infrared synchronization. The positions of the mobile robot can be computed without accumulative error. The proposed localization method can achieve high precision in indoor localization tasks at a very low cost. We also present a calibration method based on simultaneous localization and mapping (SLAM) to initialize the positions of our system. To evaluate the feasibility and the dynamic accuracy of the proposed system, we construct our localization system in the Virtual Robot Experimentation Platform (V-REP) simulation platform and deploy this system in a real-world environment. Both the simulation and real-world experiments have demonstrated that our system can achieve centimeter-level accuracy. The localization accuracy of the proposed system is sufficient for robot indoor navigation.
    @article{chen2021separatedsl,
    title = {Separated Sonar Localization System for Indoor Robot Navigation},
    author = {Wenzhou Chen and Jinhong Xu and Xiangrui Zhao and Yong Liu and Jian Yang},
    year = 2021,
    journal = {IEEE Transactions on Industrial Electronics},
    doi = {10.1109/TIE.2020.2994856},
    }
  • Jiangning Zhang, Chao Xu, Jian Li, Wenzhou Chen, Yabiao Wang, Ying Tai, Shuo Chen, Chengjie Wang, Feiyue Huang, and Yong Liu. Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model. In Advances in Neural Information Processing Systems 34 – 35th Conference on Neural Information Processing Systems, pages 26674–26688, 2021.
    Inspired by biological evolution, we explain the rationality of Vision Transformer by analogy with the proven practical Evolutionary Algorithm (EA) and derive that both of them have consistent mathematical representation. Analogous to the dynamic local population in EA, we improve the existing transformer structure and propose a more efficient EAT model, and design task-related heads to deal with different tasks more flexibly. Moreover, we introduce the spatial-filling curve into the current vision transformer to sequence image data into a uniform sequential format. Thus we can design a unified EAT framework to address multi-modal tasks, separating the network architecture from the data format adaptation. Our approach achieves state-of-the-art results on the ImageNet classification task compared with recent vision transformer works while having smaller parameters and greater throughput. We further conduct multi-modal tasks to demonstrate the superiority of the unified EAT, e.g., Text-Based Image Retrieval, and our approach improves the rank-1 by +3.7 points over the baseline on the CSS dataset.
    @inproceedings{zhang2021analogous,
    title = {Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model},
    author = {Jiangning Zhang and Chao Xu and Jian Li and Wenzhou Chen and Yabiao Wang and Ying Tai and Shuo Chen and Chengjie Wang and Feiyue Huang and Yong Liu},
    year = 2021,
    booktitle = {Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems},
    pages = {26674--26688},
    }
  • Shanqi Liu, Junjie Cao, Wenzhou Chen, Licheng Wen, and Yong Liu. HILONet: Hierarchical Imitation Learning from Non-Aligned Observations. In 2021 IEEE 10th Data Driven Control and Learning Systems Conference, 2021.
    Learning from demonstrated observation-only trajectories in a non-time-aligned environment is challenging because most imitation learning methods aim to imitate experts by following the demonstration step-by-step. However, aligned demonstrations are seldom obtainable in real-world scenarios. In this work, we propose a new imitation learning approach called Hierarchical Imitation Learning from Observation (HILONet), which adopts a hierarchical structure to choose feasible sub-goals from demonstrated observations dynamically. Our method can solve all kinds of tasks by achieving these sub-goals, whether or not the task has a single goal position. We also present three different ways to increase sample efficiency in the hierarchical structure. We conduct extensive experiments using several environments. The results show improvements in both performance and learning efficiency.
    @inproceedings{liu2021hilonethi,
    title = {HILONet: Hierarchical Imitation Learning from Non-Aligned Observations},
    author = {Shanqi Liu and Junjie Cao and Wenzhou Chen and Licheng Wen and Yong Liu},
    year = 2021,
    booktitle = {2021 IEEE 10th Data Driven Control and Learning Systems Conference},
    doi = {10.48550/arXiv.2011.02671},
    }
  • Wenzhou Chen, Shizheng Zhou, Zaisheng Pan, Huixian Zheng, and Yong Liu. Mapless Collaborative Navigation for a Multi-Robot System Based on the Deep Reinforcement Learning. Applied Sciences, 9:4198, 2019.
    Compared with a single-robot system, a multi-robot system has higher efficiency and fault tolerance. The multi-robot system has great potential in some application scenarios, such as robot search, rescue, and escort tasks. Deep reinforcement learning provides a potential framework for multi-robot formation and collaborative navigation. This paper mainly studies the collaborative formation and navigation of multiple robots by using the deep reinforcement learning algorithm. The proposed method improves the classical Deep Deterministic Policy Gradient (DDPG) to address the single-robot mapless navigation task. We also extend the single-robot Deep Deterministic Policy Gradient algorithm to the multi-robot system, and obtain the Parallel Deep Deterministic Policy Gradient (PDDPG). By utilizing the 2D lidar sensor, the group of robots can accomplish the formation construction task and the collaborative formation navigation task. The experimental results on a Gazebo simulation platform illustrate that our method is capable of guiding mobile robots to construct the formation and keep the formation during group navigation, directly through raw lidar data inputs.
    @article{chen2019maplesscn,
    title = {Mapless Collaborative Navigation for a Multi-Robot System Based on the Deep Reinforcement Learning},
    author = {Wenzhou Chen and Shizheng Zhou and Zaisheng Pan and Huixian Zheng and Yong Liu},
    year = 2019,
    journal = {Applied Sciences},
    volume = 9,
    pages = 4198,
    doi = {10.3390/app9204198},
    }
  • Wenzhou Chen and Yong Liu. Active Planning of Robot Navigation for 3D Scene Exploration. In 2018 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), pages 516–520, 2018.
    This work addresses the active planning of robot navigation tasks for 3D scene exploration. 3D scene exploration is an old and difficult task in robotics. In this paper, we present a strategy to guide a mobile autonomous robot equipped with a camera to autonomously explore an unknown 3D scene. By merging the particle filter into 3D scene exploration, we address the robot navigation problem in a heuristic way, and generate a sequence of camera poses to cover the unknown 3D scene. First, we randomly generate a set of potential camera pose vectors. Then, we select among the vectors using our criteria. After determining the first camera pose vector, we generate the next group of vectors based on the former one. We then select the new camera pose vector and repeat the process thereafter. We verify the algorithm theoretically and show good performance in the simulation environment.
    @inproceedings{chen2018activepo,
    title = {Active Planning of Robot Navigation for 3D Scene Exploration},
    author = {Wenzhou Chen and Yong Liu},
    year = 2018,
    booktitle = {2018 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM)},
    pages = {516--520},
    doi = {10.1109/AIM.2018.8452299},
    }