Chengrui Zhu
PhD Student
Institute of Cyber-Systems and Control, Zhejiang University, China
Biography
I am pursuing my Ph.D. degree in the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. My main research interests are intelligent quadruped locomotion and reinforcement learning.
Research and Interests
- Intelligent quadruped locomotion
- Reinforcement learning
Publications
- Siqi Li, Jun Chen, Jingyang Xiang, Chengrui Zhu, Jiandang Yang, Xiaobin Wei, Yunliang Jiang, and Yong Liu. Automatic Data-Free Pruning via Channel Similarity Reconstruction. Neurocomputing, 661:131885, 2026. doi: 10.1016/j.neucom.2025.131885
  Abstract: Structured pruning methods are developed to bridge the gap between the massive scale of neural networks and the limited hardware resources. Most current structured pruning methods rely on training datasets to fine-tune the compressed model, resulting in high computational burdens and being inapplicable for scenarios with stringent requirements on privacy and security. As an alternative, some data-free methods have been proposed; however, these methods often require handcrafted parameter tuning and can only achieve inflexible reconstruction. In this paper, we propose the Automatic Data-Free Pruning (AutoDFP) method that achieves automatic pruning and reconstruction without fine-tuning. Our approach is based on the assumption that the loss of information can be partially compensated by retaining focused information from similar channels. Specifically, we formulate data-free pruning as an optimization problem, which can be effectively addressed through reinforcement learning. AutoDFP assesses the similarity of channels for each layer and provides this information to the reinforcement learning agent, guiding the pruning and reconstruction process of the network. We evaluate AutoDFP with multiple networks on multiple datasets, achieving impressive compression results. (A toy sketch of the channel-similarity reconstruction idea appears after this list.)
- Chengrui Zhu, Zhen Zhang, Weiwei Liu, Siqi Li, and Yong Liu. Learning Accurate and Robust Velocity Tracking for Quadrupedal Robots. Journal of Field Robotics, 2025. doi: 10.1002/rob.70028
- Siqi Li, Jun Chen, Shanqi Liu, Chengrui Zhu, Guanzhong Tian, and Yong Liu. MCMC: Multi-Constrained Model Compression via One-stage Envelope Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems, 36:3410-3422, 2025. doi: 10.1109/TNNLS.2024.3353763
  Abstract: Model compression methods are being developed to bridge the gap between the massive scale of neural networks and the limited hardware resources on edge devices. Since most real-world applications deployed on resource-limited hardware platforms typically have multiple hardware constraints simultaneously, most existing model compression approaches that only consider optimizing one single hardware objective are ineffective. In this article, we propose an automated pruning method called multi-constrained model compression (MCMC) that allows for the optimization of multiple hardware targets, such as latency, floating point operations (FLOPs), and memory usage, while minimizing the impact on accuracy. Specifically, we propose an improved multi-objective reinforcement learning (MORL) algorithm, the one-stage envelope deep deterministic policy gradient (DDPG) algorithm, to determine the pruning strategy for neural networks. Our improved one-stage envelope DDPG algorithm reduces exploration time and offers greater flexibility in adjusting target priorities, enhancing its suitability for pruning tasks. For instance, on the visual geometry group (VGG)-16 network, our method achieved an 80% reduction in FLOPs, a 2.31x reduction in memory usage, and a 1.92x acceleration, with an accuracy improvement of 0.09% compared with the baseline. For larger datasets, such as ImageNet, we reduced FLOPs by 50% for MobileNet-V1, resulting in a 4.7x faster speed and 1.48x memory compression, while maintaining the same accuracy. When applied to edge devices, such as the Jetson Xavier NX, our method resulted in a 71% reduction in FLOPs for MobileNet-V1, leading to a 1.63x faster speed, 1.64x memory compression, and an accuracy improvement.
- Dianyong Hou, Chengrui Zhu, Zhen Zhang, Zhibin Li, Chuang Guo, and Yong Liu. Efficient Learning of A Unified Policy For Whole-body Manipulation and Locomotion Skills. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025. doi: 10.1109/IROS60139.2025.11246644
  Abstract: Equipping quadruped robots with manipulators provides unique loco-manipulation capabilities, enabling diverse practical applications. This integration creates a more complex system that has increased difficulties in modeling and control. Reinforcement learning (RL) offers a promising solution to address these challenges by learning optimal control policies through interaction. Nevertheless, RL methods often struggle with local optima when exploring large solution spaces for motion and manipulation tasks. To overcome these limitations, we propose a novel approach that integrates an explicit kinematic model of the manipulator into the RL framework. This integration provides feedback on the mapping of the body postures to the manipulator’s workspace, guiding the RL exploration process and effectively mitigating the local optima issue. Our algorithm has been successfully deployed on a DeepRobotics X20 quadruped robot equipped with a Unitree Z1 manipulator, and extensive experimental results demonstrate the superior performance of this approach. We have established a project website to showcase our experiments.
- Chengrui Zhu, Zhen Zhang, Siqi Li, Qingpeng Li, and Yong Liu. Learning Symmetric Legged Locomotion via State Distribution Symmetrization. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025. doi: 10.1109/IROS60139.2025.11246183
  Abstract: Morphological symmetry is a fundamental characteristic of legged animals and robots. Most existing Deep Reinforcement Learning approaches for legged locomotion neglect to exploit this inherent symmetry, often producing unnatural and suboptimal behaviors such as dominant legs or non-periodic gaits. To address this limitation, we propose a novel learning-based framework to systematically optimize symmetry by state distribution symmetrization. First, we introduce the degree of asymmetry (DoA), a quantitative metric that measures the discrepancy between original and mirrored state distributions. Second, we develop an efficient computation method for DoA using gradient ascent with a trained discriminator network. This metric is then incorporated into a reinforcement learning framework by introducing it into the reward function, explicitly encouraging symmetry during policy training. We validate our framework with extensive experiments on quadrupedal and humanoid robots in simulated and real-world environments. Results demonstrate the efficacy of our approach for improving policy symmetry and overall locomotion performance. (A simplified sketch of the symmetry-penalty idea appears after this list.)
- Junhao Chen, Zhen Zhang, Chengrui Zhu, Xiaojun Hou, Tianyang Hu, Huifeng Wu, and Yong Liu. LITE: A Learning-Integrated Topological Explorer for Multi-Floor Indoor Environments. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025. doi: 10.1109/IROS60139.2025.11246317
  Abstract: This work focuses on multi-floor indoor exploration, which remains an open area of research. Compared to traditional methods, recent learning-based explorers have demonstrated significant potential due to their robust environmental learning and modeling capabilities, but most are restricted to 2D environments. In this paper, we propose a learning-integrated topological explorer, LITE, for multi-floor indoor environments. LITE decomposes the environment into a floor-stair topology, enabling seamless integration of learning or non-learning-based 2D exploration methods for 3D exploration. As we incrementally build the floor-stair topology during exploration using a YOLO11-based instance segmentation model, the agent can transition between floors through a finite state machine. Additionally, we implement an attention-based 2D exploration policy that utilizes an attention mechanism to capture spatial dependencies between different regions, thereby determining the next global goal for more efficient exploration. Extensive comparison and ablation studies conducted on the HM3D and MP3D datasets demonstrate that our proposed 2D exploration policy significantly outperforms all baseline explorers in terms of exploration efficiency. Furthermore, experiments in several 3D multi-floor environments indicate that our framework is compatible with various 2D exploration methods, facilitating effective multi-floor indoor exploration. Finally, we validate our method in the real world with a quadruped robot, highlighting its strong generalization capabilities.
- Baorun Li, Chengrui Zhu, Siyi Du, Bingran Chen, Jie Ren, Wenfei Wang, Yong Liu, and Jiajun Lv. L2Calib: SE(3)-Manifold Reinforcement Learning for Robust Extrinsic Calibration with Degenerate Motion Resilience. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025. doi: 10.1109/IROS60139.2025.11246454
  Abstract: Extrinsic calibration is essential for multi-sensor fusion, yet existing methods rely on structured targets or fully-excited data, limiting real-world applicability. Online calibration further suffers from weak excitation, leading to unreliable estimates. To address these limitations, we propose a reinforcement learning (RL)-based extrinsic calibration framework that formulates extrinsic calibration as a decision-making problem and directly optimizes SE(3) extrinsics to enhance odometry accuracy. Our approach leverages a probabilistic Bingham distribution to model 3D rotations, ensuring stable optimization while inherently retaining quaternion symmetry. A trajectory alignment reward mechanism enables robust calibration without structured targets by quantitatively evaluating the estimated tightly-coupled trajectory against a reference trajectory. Additionally, an automated data selection module filters uninformative samples, significantly improving efficiency and scalability for large-scale datasets. Extensive experiments on UAVs, UGVs, and handheld platforms demonstrate that our method outperforms traditional optimization-based approaches, achieving high-precision calibration even under weak excitation conditions. Our framework simplifies deployment on diverse robotic platforms by eliminating the need for high-quality initial extrinsics and enabling calibration from routine operating data. The code is available at https://github.com/APRIL-ZJU/learn-to-calibrate.
- Tianyang Hu, Zhen Zhang, Chengrui Zhu, Gang Xu, Yuchen Wu, Huifeng Wu, and Yong Liu. MARF: Cooperative Multi-Agent Path Finding with Reinforcement Learning and Frenet Lattice in Dynamic Environments. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 12607-12613, 2025. doi: 10.1109/ICRA55743.2025.11128009
  Abstract: Multi-agent path finding (MAPF) in dynamic and complex environments is a highly challenging task. Recent research has focused on the scalability of agent numbers or the complexity of the environment. Usually, they disregard the agents’ physical constraints or use a differential-drive model. However, this approach fails to adequately capture the kinematic and dynamic constraints of real-world vehicles, particularly those equipped with Ackermann steering. This paper presents a novel algorithm named MARF that combines multi-agent reinforcement learning (MARL) with a Frenet lattice planner. The MARL foundation endows the algorithm with enhanced generalization capabilities while preserving computational efficiency. By incorporating Frenet lattice trajectories into the action space of the MARL framework, agents are capable of generating smooth and feasible trajectories that respect the kinematic and dynamic constraints. In addition, we adopt a centralized training and decentralized execution (CTDE) framework, where a network of shared value functions enables efficient cooperation among agents during decision-making. Simulation results and real-world experiments in different scenarios demonstrate that our method achieves superior performance in terms of success rate, average speed, extra distance of trajectory, and computing time.
- Deye Zhu, Chengrui Zhu, Zhen Zhang, Shuo Xin, and Yong Liu. Learning Safe Locomotion for Quadrupedal Robots by Derived-Action Optimization. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 6870-6876, 2024. doi: 10.1109/IROS58592.2024.10802725
  Abstract: Deep reinforcement learning controllers with exteroception have enabled quadrupedal robots to traverse terrain robustly. However, most of these controllers heavily depend on complex reward functions and suffer from poor convergence. This work proposes a novel learning framework called derived-action optimization. The derived action is defined as a high-level representation of a policy and can be introduced into the reward function to guide decision-making behaviors. The proposed derived-action optimization method is applied to learn safer quadrupedal locomotion, achieving fast convergence and better performance. Specifically, we choose the foothold as the derived action and optimize the flatness of the terrain around the foothold to reduce potential sliding and collisions. Extensive experiments demonstrate the high safety and effectiveness of our method. (A minimal sketch of a foothold-flatness term appears after this list.)
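The short Python snippets below illustrate a few of the ideas described in the abstracts above. They are toy sketches under explicitly stated assumptions, not the published implementations.

First, the channel-similarity reconstruction idea behind AutoDFP: prune one channel and fold its downstream contribution into the most similar retained channel, so no fine-tuning data is needed. The two-layer linear model, the cosine-similarity matching, and the least-squares scale below are simplifying assumptions; the actual method operates on convolutional networks and uses a reinforcement learning agent to choose pruning ratios.

```python
# Toy sketch of channel-similarity reconstruction (illustrative only, not AutoDFP).
# Assumed setup: y = W2 @ (W1 @ x), no bias, no nonlinearity between the layers.
import numpy as np

def prune_with_similarity(W1, W2, prune_idx):
    """Remove output channel `prune_idx` of W1 and compensate through W2."""
    keep = [i for i in range(W1.shape[0]) if i != prune_idx]
    w_p = W1[prune_idx]
    # Cosine similarity between the pruned filter and each retained filter.
    sims = [W1[i] @ w_p / (np.linalg.norm(W1[i]) * np.linalg.norm(w_p) + 1e-12)
            for i in keep]
    nearest = keep[int(np.argmax(sims))]
    # Least-squares scale s so that w_p is approximated by s * W1[nearest].
    scale = (W1[nearest] @ w_p) / (W1[nearest] @ W1[nearest] + 1e-12)
    # Route the pruned channel's downstream weights to its nearest neighbor.
    W2 = W2.copy()
    W2[:, nearest] += scale * W2[:, prune_idx]
    return W1[keep], np.delete(W2, prune_idx, axis=1)

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(4, 8))
W1_p, W2_p = prune_with_similarity(W1, W2, prune_idx=3)
print(W1_p.shape, W2_p.shape)  # (7, 16) (4, 7)
```

With a nonlinearity between the layers this compensation is only approximate, which is why the similarity between the pruned and retained channels matters in the first place.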
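Second, the state-distribution symmetrization idea from the symmetric-locomotion paper: penalize the discrepancy between the distribution of visited states and its mirror image. The paper estimates the degree of asymmetry (DoA) with a trained discriminator and gradient ascent; the stand-in below only compares the first two moments of a state batch against the mirrored batch. The mirror matrix, the 4-D state layout, and the reward weight are assumptions made for this example.

```python
# Illustrative stand-in for a symmetry penalty (not the paper's DoA estimator).
import numpy as np

def asymmetry_score(states, mirror_matrix):
    """Crude discrepancy between a state batch and its mirrored counterpart."""
    mirrored = states @ mirror_matrix.T
    mean_gap = np.linalg.norm(states.mean(axis=0) - mirrored.mean(axis=0))
    cov_gap = np.linalg.norm(np.cov(states.T) - np.cov(mirrored.T))
    return mean_gap + cov_gap

def symmetrized_reward(task_reward, states, mirror_matrix, weight=0.1):
    """Task reward minus a penalty that grows with state-distribution asymmetry."""
    return task_reward - weight * asymmetry_score(states, mirror_matrix)

# Hypothetical 4-D state where dimensions (0, 1) and (2, 3) are left/right pairs.
M = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
batch = np.random.default_rng(1).normal(size=(256, 4))
print(symmetrized_reward(task_reward=1.0, states=batch, mirror_matrix=M))
```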
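Third, a flatness term in the spirit of derived-action optimization: the predicted foothold is treated as a derived action, and the reward penalizes terrain roughness in a small window around it. The heightmap resolution, the window radius, and the plain standard deviation used below are assumptions for illustration, not the paper's formulation.

```python
# Illustrative foothold-flatness penalty (assumed parameters, not from the paper).
import numpy as np

def foothold_flatness_penalty(heightmap, foothold_xy, cell=0.05, radius=2):
    """Std-dev of terrain height (m) in a small window around the foothold."""
    i = int(round(foothold_xy[0] / cell))
    j = int(round(foothold_xy[1] / cell))
    patch = heightmap[max(i - radius, 0):i + radius + 1,
                      max(j - radius, 0):j + radius + 1]
    return float(patch.std())

heightmap = np.zeros((40, 40))
heightmap[20:, :] = 0.15  # a 15 cm step edge across the terrain
flat = foothold_flatness_penalty(heightmap, foothold_xy=(0.5, 0.5))
edge = foothold_flatness_penalty(heightmap, foothold_xy=(1.0, 0.5))
print(flat, edge)  # the penalty is larger when the foothold straddles the step
```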
