Siqi Li
PhD Student
Institute of Cyber-Systems and Control, Zhejiang University, China
Biography
I am pursuing my Ph.D. degree at the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. My research interests include network pruning and reinforcement learning.
Research Interests
- Network pruning
- Reinforcement learning
Publications
- Siqi Li, Jun Chen, Jingyang Xiang, Chengrui Zhu, Jiandang Yang, Xiaobin Wei, Yunliang Jiang, and Yong Liu. Automatic Data-Free Pruning via Channel Similarity Reconstruction. Neurocomputing, 661:131885, 2026.
  [BibTeX] [Abstract] [DOI] [PDF]
  Abstract: Structured pruning methods are developed to bridge the gap between the massive scale of neural networks and the limited hardware resources. Most current structured pruning methods rely on training datasets to fine-tune the compressed model, resulting in high computational burdens and being inapplicable for scenarios with stringent requirements on privacy and security. As an alternative, some data-free methods have been proposed, however, these methods often require handcrafted parameter tuning and can only achieve inflexible reconstruction. In this paper, we propose the Automatic Data-Free Pruning (AutoDFP) method that achieves automatic pruning and reconstruction without fine-tuning. Our approach is based on the assumption that the loss of information can be partially compensated by retaining focused information from similar channels. Specifically, we formulate data-free pruning as an optimization problem, which can be effectively addressed through reinforcement learning. AutoDFP assesses the similarity of channels for each layer and provides this information to the reinforcement learning agent, guiding the pruning and reconstruction process of the network. We evaluate AutoDFP with multiple networks on multiple datasets, achieving impressive compression results.
  @article{li2026adf,
    title   = {Automatic Data-Free Pruning via Channel Similarity Reconstruction},
    author  = {Siqi Li and Jun Chen and Jingyang Xiang and Chengrui Zhu and Jiandang Yang and Xiaobin Wei and Yunliang Jiang and Yong Liu},
    year    = {2026},
    journal = {Neurocomputing},
    volume  = {661},
    pages   = {131885},
    doi     = {10.1016/j.neucom.2025.131885}
  }
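  A minimal sketch of the channel-similarity signal described in the AutoDFP abstract above, assuming a toy convolutional layer: flatten each output-channel filter and compare channels by cosine similarity, so that a pruned channel can be associated with its most similar retained one. The layer shape, random weights, and the neighbour-matching step are illustrative placeholders, not the paper's implementation.

  # Toy illustration of per-layer channel similarity (not the AutoDFP implementation).
  import torch

  torch.manual_seed(0)
  conv_weight = torch.randn(16, 8, 3, 3)            # (out_channels, in_channels, kH, kW), arbitrary sizes

  flat = conv_weight.flatten(start_dim=1)           # one row per output channel
  flat = torch.nn.functional.normalize(flat, dim=1)
  similarity = flat @ flat.t()                      # (16, 16) cosine-similarity matrix

  # For each channel, find its most similar *other* channel; a pruned channel's
  # information could then be folded into this neighbour during reconstruction.
  similarity.fill_diagonal_(-1.0)
  best_match = similarity.argmax(dim=1)
  print(best_match[:5])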
- Chengrui Zhu, Zhen Zhang, Weiwei Liu, Siqi Li, and Yong Liu. Learning Accurate and Robust Velocity Tracking for Quadrupedal Robots. Journal of Field Robotics, 2025.
  [BibTeX] [DOI]
  @article{zhu2025lar,
    title   = {Learning Accurate and Robust Velocity Tracking for Quadrupedal Robots},
    author  = {Chengrui Zhu and Zhen Zhang and Weiwei Liu and Siqi Li and Yong Liu},
    year    = {2025},
    journal = {Journal of Field Robotics},
    doi     = {10.1002/rob.70028}
  }
- Siqi Li, Jun Chen, Shanqi Liu, Chengrui Zhu, Guanzhong Tian, and Yong Liu. MCMC: Multi-Constrained Model Compression via One-stage Envelope Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems, 36:3410-3422, 2025.
  [BibTeX] [Abstract] [DOI] [PDF]
  Abstract: Model compression methods are being developed to bridge the gap between the massive scale of neural networks and the limited hardware resources on edge devices. Since most real-world applications deployed on resource-limited hardware platforms typically have multiple hardware constraints simultaneously, most existing model compression approaches that only consider optimizing one single hardware objective are ineffective. In this article, we propose an automated pruning method called multi-constrained model compression (MCMC) that allows for the optimization of multiple hardware targets, such as latency, floating point operations (FLOPs), and memory usage, while minimizing the impact on accuracy. Specifically, we propose an improved multi-objective reinforcement learning (MORL) algorithm, the one-stage envelope deep deterministic policy gradient (DDPG) algorithm, to determine the pruning strategy for neural networks. Our improved one-stage envelope DDPG algorithm reduces exploration time and offers greater flexibility in adjusting target priorities, enhancing its suitability for pruning tasks. For instance, on the visual geometry group (VGG)-16 network, our method achieved an 80% reduction in FLOPs, a 2.31x reduction in memory usage, and a 1.92x acceleration, with an accuracy improvement of 0.09% compared with the baseline. For larger datasets, such as ImageNet, we reduced FLOPs by 50% for MobileNet-V1, resulting in a 4.7x faster speed and 1.48x memory compression, while maintaining the same accuracy. When applied to edge devices, such as JETSON XAVIER NX, our method resulted in a 71% reduction in FLOPs for MobileNet-V1, leading to a 1.63x faster speed, 1.64x memory compression, and an accuracy improvement.
  @article{li2025mcmc,
    title   = {MCMC: Multi-Constrained Model Compression via One-stage Envelope Reinforcement Learning},
    author  = {Siqi Li and Jun Chen and Shanqi Liu and Chengrui Zhu and Guanzhong Tian and Yong Liu},
    year    = {2025},
    journal = {IEEE Transactions on Neural Networks and Learning Systems},
    volume  = {36},
    pages   = {3410-3422},
    doi     = {10.1109/TNNLS.2024.3353763}
  }
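  A hedged sketch of the multi-constraint idea behind MCMC: several normalized hardware costs (FLOPs, latency, memory) are folded into a single reward through a preference vector. The candidate strategies, cost values, and weights below are invented for illustration; the paper's one-stage envelope DDPG agent is not reproduced here.

  # Toy scalarization of multiple hardware objectives into one reward signal.
  # The preference vector and cost values are illustrative, not from the paper.
  import numpy as np

  def reward(accuracy, costs, preference):
      """Higher accuracy is good; normalized costs (FLOPs, latency, memory) are
      penalized according to a preference vector setting each constraint's priority."""
      costs = np.asarray(costs, dtype=float)
      preference = np.asarray(preference, dtype=float)
      preference = preference / preference.sum()
      return accuracy - float(preference @ costs)

  # Two candidate pruning strategies with (normalized FLOPs, latency, memory) after pruning.
  candidates = {"aggressive": (0.20, 0.45, 0.40), "mild": (0.55, 0.80, 0.70)}
  accuracies = {"aggressive": 0.690, "mild": 0.715}
  pref = (0.5, 0.3, 0.2)  # prioritize FLOPs, then latency, then memory

  for name in candidates:
      print(name, round(reward(accuracies[name], candidates[name], pref), 4))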
- Chengrui Zhu, Zhen Zhang, Siqi Li, Qingpeng Li, and Yong Liu. Learning Symmetric Legged Locomotion via State Distribution Symmetrization. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025.
  [BibTeX] [Abstract] [DOI]
  Abstract: Morphological symmetry is a fundamental characteristic of legged animals and robots. Most existing Deep Reinforcement Learning approaches for legged locomotion neglect to exploit this inherent symmetry, often producing unnatural and suboptimal behaviors such as dominant legs or non-periodic gaits. To address this limitation, we propose a novel learning-based framework to systematically optimize symmetry by state distribution symmetrization. First, we introduce the degree of asymmetry (DoA), a quantitative metric that measures the discrepancy between original and mirrored state distributions. Second, we develop an efficient computation method for DoA using gradient ascent with a trained discriminator network. This metric is then incorporated into a reinforcement learning framework by introducing it to the reward function, explicitly encouraging symmetry during policy training. We validate our framework with extensive experiments on quadrupedal and humanoid robots in simulated and real-world environments. Results demonstrate the efficacy of our approach for improving policy symmetry and overall locomotion performance.
  @inproceedings{zhu2025lsl,
    title     = {Learning Symmetric Legged Locomotion via State Distribution Symmetrization},
    author    = {Chengrui Zhu and Zhen Zhang and Siqi Li and Qingpeng Li and Yong Liu},
    year      = {2025},
    booktitle = {2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    doi       = {10.1109/IROS60139.2025.11246183}
  }
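  A toy illustration of the state-mirroring step underlying the degree-of-asymmetry (DoA) idea, under an assumed six-dimensional state layout; the real robot state, mirror transform, and discriminator-based DoA estimation in the paper are considerably more involved.

  # Toy mirror operator for a hypothetical quadruped state layout; the mean-distance
  # check at the end is only a crude stand-in for a distribution discrepancy, whereas
  # the paper estimates DoA with a trained discriminator.
  import numpy as np

  # Hypothetical state: [base lateral velocity, base roll rate,
  #                      front-left, front-right, rear-left, rear-right hip angles]
  def mirror_state(s):
      m = s.copy()
      m[0] = -s[0]                    # lateral velocity flips sign
      m[1] = -s[1]                    # roll rate flips sign
      m[2], m[3] = s[3], s[2]         # swap front-left / front-right joints
      m[4], m[5] = s[5], s[4]         # swap rear-left / rear-right joints
      return m

  states = np.random.default_rng(0).normal(size=(1000, 6))
  mirrored = np.apply_along_axis(mirror_state, 1, states)
  print(np.linalg.norm(states.mean(axis=0) - mirrored.mean(axis=0)))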
- Jiateng Wei, Quan Lu, Ning Jiang, Siqi Li, Jingyang Xiang, Jun Chen, and Yong Liu. Structured Optimal Brain Pruning for Large Language Models. In The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 13991-14007, 2024.
  [BibTeX] [Abstract] [DOI]
  Abstract: The massive parameters and computational demands hinder the widespread application of Large Language Models (LLMs). Network pruning provides a practical solution to this problem. However, existing pruning works for LLMs mainly focus on unstructured pruning or necessitate post-pruning fine-tuning. The former relies on special hardware to accelerate computation, while the latter may need substantial computational resources. In this paper, we introduce a retraining-free structured pruning method called SoBP (Structured Optimal Brain Pruning). It leverages global first-order information to select pruning structures, then refines them with a local greedy approach, and finally adopts module-wise reconstruction to mitigate information loss. We assess the effectiveness of SoBP across 14 models from 3 LLM families on 8 distinct datasets. Experimental results demonstrate that SoBP outperforms current state-of-the-art methods.
  @inproceedings{wei2024sob,
    title     = {Structured Optimal Brain Pruning for Large Language Models},
    author    = {Jiateng Wei and Quan Lu and Ning Jiang and Siqi Li and Jingyang Xiang and Jun Chen and Yong Liu},
    year      = {2024},
    booktitle = {The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    pages     = {13991-14007},
    doi       = {10.18653/v1/2024.emnlp-main.775}
  }
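  A minimal sketch of the global first-order importance signal mentioned in the SoBP abstract, assuming a tiny linear layer standing in for an LLM block: candidate structures are scored by summed |weight x gradient| saliency. The model, the grouping into four "heads", and the dummy loss are placeholders, not the paper's procedure.

  # Toy first-order (|weight * gradient|) importance score per structure.
  # The layer, the grouping into "heads", and the data are placeholders.
  import torch

  torch.manual_seed(0)
  layer = torch.nn.Linear(32, 32)
  x = torch.randn(64, 32)
  loss = layer(x).pow(2).mean()        # dummy objective just to obtain gradients
  loss.backward()

  saliency = (layer.weight * layer.weight.grad).abs().detach()   # (32, 32)
  head_scores = saliency.reshape(4, 8, 32).sum(dim=(1, 2))       # 4 groups of 8 output rows each
  print(head_scores)                                             # lower score => better pruning candidate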
- Jingyang Xiang, Zuohui Chen, Siqi Li, Qing Wu, and Yong Liu. OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks. In 18th European Conference on Computer Vision (ECCV), pages 1-18, 2024.
  [BibTeX] [Abstract] [DOI] [PDF]
  Abstract: Binary Neural Networks (BNNs) have been proven to be highly effective for deploying deep neural networks on mobile and embedded platforms. Most existing works focus on minimizing quantization errors, improving representation ability, or designing gradient approximations to alleviate gradient mismatch in BNNs, while leaving the weight sign flipping, a critical factor for achieving powerful BNNs, untouched. In this paper, we investigate the efficiency of weight sign updates in BNNs. We observe that, for vanilla BNNs, over 50% of the weights remain their signs unchanged during training, and these weights are not only distributed at the tails of the weight distribution but also universally present in the vicinity of zero. We refer to these weights as "silent weights", which slow down convergence and lead to a significant accuracy degradation. Theoretically, we reveal this is due to the independence of the BNNs gradient from the latent weight distribution. To address the issue, we propose Overcome Silent Weights (OvSW). OvSW first employs Adaptive Gradient Scaling (AGS) to establish a relationship between the gradient and the latent weight distribution, thereby improving the overall efficiency of weight sign updates. Additionally, we design Silence Awareness Decaying (SAD) to automatically identify "silent weights" by tracking weight flipping state, and apply an additional penalty to "silent weights" to facilitate their flipping. By efficiently updating weight signs, our method achieves faster convergence and state-of-the-art performance on CIFAR10 and ImageNet1K dataset with various architectures. For example, OvSW obtains 61.6% and 65.5% top-1 accuracy on the ImageNet1K using binarized ResNet18 and ResNet34 architecture respectively. Codes are available at https://github.com/JingyangXiang/OvSW.
  @inproceedings{xiang2024ovsw,
    title     = {OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks},
    author    = {Jingyang Xiang and Zuohui Chen and Siqi Li and Qing Wu and Yong Liu},
    year      = {2024},
    booktitle = {18th European Conference on Computer Vision (ECCV)},
    pages     = {1-18},
    doi       = {10.1007/978-3-031-73414-4_1}
  }
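  A toy tracker for the sign-flip bookkeeping that motivates OvSW: weights whose sign never changes over training play the role of the paper's "silent weights". The dummy objective, optimizer, and step count are arbitrary, and AGS/SAD themselves are not reproduced.

  # Track weight-sign flips across updates; weights that never flip are "silent".
  import torch

  torch.manual_seed(0)
  w = torch.randn(1000, requires_grad=True)
  opt = torch.optim.SGD([w], lr=0.1)
  prev_sign = torch.sign(w.detach())
  flipped = torch.zeros_like(w, dtype=torch.bool)

  for _ in range(100):
      opt.zero_grad()
      loss = (torch.sign(w.detach()) * w).mean()   # dummy objective, not a real BNN loss
      loss.backward()
      opt.step()
      cur_sign = torch.sign(w.detach())
      flipped |= cur_sign != prev_sign
      prev_sign = cur_sign

  print("fraction of weights that never flipped sign:",
        round((~flipped).float().mean().item(), 2))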
- Jingyang Xiang, Siqi Li, Junhao Chen, Zhuangzhi Chen, Tianxin Huang, Linpeng Peng, and Yong Liu. MaxQ: Multi-Axis Query for N:m Sparsity Network. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15845-15854, 2024.
  [BibTeX] [Abstract] [DOI] [PDF]
  Abstract: N:m sparsity has received increasing attention due to its remarkable performance and latency trade-off compared with structured and unstructured sparsity. However, existing N:m sparsity methods do not differentiate the relative importance of weights among blocks and leave important weights underappreciated. Besides, they directly apply N:m sparsity to the whole network, which will cause severe information loss. Thus, they are still sub-optimal. In this paper, we propose an efficient and effective Multi-Axis Query methodology, dubbed as MaxQ, to rectify these problems. During the training, MaxQ employs a dynamic approach to generate soft N:m masks, considering the weight importance across multiple axes. This method enhances the weights with more importance and ensures more effective updates. Meanwhile, a sparsity strategy that gradually increases the percentage of N:m weight blocks is applied, which allows the network to heal from the pruning-induced damage progressively. During the runtime, the N:m soft masks can be precomputed as constants and folded into weights without causing any distortion to the sparse pattern and incurring additional computational overhead. Comprehensive experiments demonstrate that MaxQ achieves consistent improvements across diverse CNN architectures in various computer vision tasks, including image classification, object detection and instance segmentation. For ResNet50 with 1:16 sparse pattern, MaxQ can achieve 74.6% top-1 accuracy on ImageNet and improve by over 2.8% over the state-of-the-art. Codes and checkpoints are available at https://github.com/JingyangXiang/MaxQ.
  @inproceedings{xiang2024maxq,
    title     = {MaxQ: Multi-Axis Query for N:m Sparsity Network},
    author    = {Jingyang Xiang and Siqi Li and Junhao Chen and Zhuangzhi Chen and Tianxin Huang and Linpeng Peng and Yong Liu},
    year      = {2024},
    booktitle = {2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    pages     = {15845-15854},
    doi       = {10.1109/CVPR52733.2024.01500}
  }
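  For context, a plain hard 2:4 magnitude mask illustrating the N:M sparsity pattern that MaxQ targets; MaxQ itself builds soft masks from multi-axis importance queries, which this baseline sketch does not attempt. The weight shape and the 2:4 setting are arbitrary.

  # Hard 2:4 magnitude mask: keep the 2 largest-magnitude weights in every group of 4.
  import torch

  torch.manual_seed(0)
  N, M = 2, 4
  weight = torch.randn(8, 16)                 # toy layer, (out, in)

  groups = weight.abs().reshape(-1, M)        # consecutive groups of M weights
  topk = groups.topk(N, dim=1).indices
  mask = torch.zeros_like(groups)
  mask.scatter_(1, topk, 1.0)
  mask = mask.reshape(weight.shape)

  print((mask.sum() / mask.numel()).item())   # == N / M = 0.5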
- Jingyang Xiang, Siqi Li, Jun Chen, Shipeng Bai, Yukai Ma, Guang Dai, and Yong Liu. SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading Acceleration. In 37th Conference on Neural Information Processing Systems (NeurIPS), pages 52033-52050, 2023.
  [BibTeX] [Abstract] [PDF]
  Abstract: The study of sparsity in Convolutional Neural Networks (CNNs) has become widespread to compress and accelerate models in environments with limited resources. By constraining N consecutive weights along the output channel to be group-wise non-zero, the recent network with 1×N sparsity has received tremendous popularity for its three outstanding advantages: 1) A large amount of storage space saving by a Block Sparse Row matrix. 2) Excellent performance at a high sparsity. 3) Significant speedups on CPUs with Advanced Vector Extensions. Recent work requires selecting and fine-tuning 1×N sparse weights based on dense pre-trained weights, leading to the problems such as expensive training cost and memory access, sub-optimal model quality, as well as unbalanced workload across threads (different sparsity across output channels). To overcome them, this paper proposes a novel Soft Uniform Block Pruning (SUBP) approach to train a uniform 1×N sparse structured network from scratch. Specifically, our approach tends to repeatedly allow pruned blocks to regrow to the network based on block angular redundancy and importance sampling in a uniform manner throughout the training process. It not only makes the model less dependent on pre-training, reduces the model redundancy and the risk of pruning the important blocks permanently but also achieves balanced workload. Empirically, on ImageNet, comprehensive experiments across various CNN architectures show that our SUBP consistently outperforms existing 1×N and structured sparsity methods based on pre-trained models or training from scratch. Source codes and models are available at https://github.com/JingyangXiang/SUBP.
  @inproceedings{xiang2023subp,
    title     = {SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading Acceleration},
    author    = {Jingyang Xiang and Siqi Li and Jun Chen and Shipeng Bai and Yukai Ma and Guang Dai and Yong Liu},
    year      = {2023},
    booktitle = {37th Conference on Neural Information Processing Systems (NeurIPS)},
    pages     = {52033-52050}
  }
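  A toy scoring of 1xN blocks (N consecutive weights along the output-channel axis) by L2 norm, with a median threshold as a stand-in pruning rule; SUBP's block regrowing via angular redundancy and importance sampling is not reproduced, and the weight shape and block length are arbitrary.

  # Score 1xN blocks (N consecutive output-channel weights per input position) by L2 norm.
  import torch

  torch.manual_seed(0)
  Nb = 4                                        # block length along output channels
  weight = torch.randn(32, 64)                  # toy 1x1-conv weight, (out, in)

  blocks = weight.reshape(32 // Nb, Nb, 64)     # (num_block_rows, Nb, in)
  block_norms = blocks.norm(dim=1)              # one score per 1xN block
  keep = block_norms >= block_norms.median()    # crude stand-in pruning rule

  mask = keep.unsqueeze(1).expand(-1, Nb, -1).reshape(weight.shape).float()
  print((mask.sum() / mask.numel()).item())     # roughly half of the 1xN blocks kept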
