Siqi Li
PhD Student
Institute of Cyber-Systems and Control, Zhejiang University, China
Biography
I am pursuing a Ph.D. degree in the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. My main research interests include network pruning.
Research Interests
- Network pruning
- Reinforcement learning
Publications
- Siqi Li, Jun Chen, Shanqi Liu, Chengrui Zhu, Guanzhong Tian, and Yong Liu. MCMC: Multi-Constrained Model Compression via One-stage Envelope Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems, 2024.
Model compression methods are being developed to bridge the gap between the massive scale of neural networks and the limited hardware resources on edge devices. Since most real-world applications deployed on resource-limited hardware platforms typically have multiple hardware constraints simultaneously, most existing model compression approaches that only consider optimizing one single hardware objective are ineffective. In this article, we propose an automated pruning method called multi-constrained model compression (MCMC) that allows for the optimization of multiple hardware targets, such as latency, floating point operations (FLOPs), and memory usage, while minimizing the impact on accuracy. Specifically, we propose an improved multi-objective reinforcement learning (MORL) algorithm, the one-stage envelope deep deterministic policy gradient (DDPG) algorithm, to determine the pruning strategy for neural networks. Our improved one-stage envelope DDPG algorithm reduces exploration time and offers greater flexibility in adjusting target priorities, enhancing its suitability for pruning tasks. For instance, on the visual geometry group (VGG)-16 network, our method achieved an 80% reduction in FLOPs, a 2.31x reduction in memory usage, and a 1.92x acceleration, with an accuracy improvement of 0.09% compared with the baseline. For larger datasets, such as ImageNet, we reduced FLOPs by 50% for MobileNet-V1, resulting in a 4.7x faster speed and 1.48x memory compression, while maintaining the same accuracy. When applied to edge devices, such as JETSON XAVIER NX, our method resulted in a 71% reduction in FLOPs for MobileNet-V1, leading to a 1.63x faster speed, 1.64x memory compression, and an accuracy improvement.
@article{li2024mcmc,
  title   = {MCMC: Multi-Constrained Model Compression via One-stage Envelope Reinforcement Learning},
  author  = {Siqi Li and Jun Chen and Shanqi Liu and Chengrui Zhu and Guanzhong Tian and Yong Liu},
  year    = 2024,
  journal = {IEEE Transactions on Neural Networks and Learning Systems},
  doi     = {10.1109/TNNLS.2024.3353763}
}
- Jingyang Xiang, Siqi Li, Junhao Chen, Zhuangzhi Chen, Tianxin Huang, Linpeng Peng, and Yong Liu. MaxQ: Multi-Axis Query for N:m Sparsity Network. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15845-15854, 2024.
N:m sparsity has received increasing attention due to its remarkable performance and latency trade-off compared with structured and unstructured sparsity. However, existing N:m sparsity methods do not differentiate the relative importance of weights among blocks and leave important weights underappreciated. Besides, they directly apply N:m sparsity to the whole network, which will cause severe information loss. Thus, they are still sub-optimal. In this paper, we propose an efficient and effective Multi-Axis Query methodology, dubbed as MaxQ, to rectify these problems. During the training, MaxQ employs a dynamic approach to generate soft N:m masks, considering the weight importance across multiple axes. This method enhances the weights with more importance and ensures more effective updates. Meanwhile, a sparsity strategy that gradually increases the percentage of N:m weight blocks is applied, which allows the network to heal from the pruning-induced damage progressively. During the runtime, the N:m soft masks can be precomputed as constants and folded into weights without causing any distortion to the sparse pattern and incurring additional computational overhead. Comprehensive experiments demonstrate that MaxQ achieves consistent improvements across diverse CNN architectures in various computer vision tasks, including image classification, object detection and instance segmentation. For ResNet50 with 1:16 sparse pattern, MaxQ can achieve 74.6% top-1 accuracy on ImageNet and improve by over 2.8% over the state-of-the-art. Codes and checkpoints are available at https://github.com/JingyangXiang/MaxQ.
@inproceedings{xiang2024maxq,
  title     = {MaxQ: Multi-Axis Query for N:m Sparsity Network},
  author    = {Jingyang Xiang and Siqi Li and Junhao Chen and Zhuangzhi Chen and Tianxin Huang and Linpeng Peng and Yong Liu},
  year      = 2024,
  booktitle = {2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages     = {15845--15854},
  doi       = {10.1109/CVPR52733.2024.01500}
}
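The N:m pattern mentioned in the MaxQ abstract above can be illustrated with a minimal sketch. This is not the MaxQ method (which uses soft, multi-axis masks during training); it is a plain hard magnitude mask, assuming that "N:m" means keeping N non-zero weights in every group of m consecutive weights. The function names `nm_sparsity_mask` and `apply_mask` are hypothetical and chosen only for this example.

```python
from typing import List


def nm_sparsity_mask(weights: List[float], n: int, m: int) -> List[int]:
    """Return a 0/1 mask that keeps the n largest-magnitude weights
    in every consecutive group of m weights (a hard N:m pattern)."""
    if not 0 < n <= m:
        raise ValueError("need 0 < n <= m")
    if len(weights) % m != 0:
        raise ValueError("len(weights) must be a multiple of m")

    mask = [0] * len(weights)
    for start in range(0, len(weights), m):
        group = range(start, start + m)
        # Rank indices in this group by |w| (descending); ties go to the earlier index.
        keep = sorted(group, key=lambda i: (-abs(weights[i]), i))[:n]
        for i in keep:
            mask[i] = 1
    return mask


def apply_mask(weights: List[float], mask: List[int]) -> List[float]:
    """Zero out every weight whose mask entry is 0."""
    return [w if k else 0.0 for w, k in zip(weights, mask)]


# Illustrative weights only (not taken from any model).
weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.0]
mask = nm_sparsity_mask(weights, n=2, m=4)
sparse = apply_mask(weights, mask)
```

With `n=2, m=4`, each group of four weights retains exactly two entries, so every group has the same sparsity regardless of how the magnitudes are distributed.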
- Jingyang Xiang, Siqi Li, Jun Chen, Shipeng Bai, Yukai Ma, Guang Dai, and Yong Liu. SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading Acceleration. In 37th Conference on Neural Information Processing Systems (NeurIPS), pages 52033-52050, 2023.
The study of sparsity in Convolutional Neural Networks (CNNs) has become widespread to compress and accelerate models in environments with limited resources. By constraining N consecutive weights along the output channel to be group-wise non-zero, the recent network with 1×N sparsity has received tremendous popularity for its three outstanding advantages: 1) A large amount of storage space saving by a Block Sparse Row matrix. 2) Excellent performance at a high sparsity. 3) Significant speedups on CPUs with Advanced Vector Extensions. Recent work requires selecting and fine-tuning 1×N sparse weights based on dense pre-trained weights, leading to the problems such as expensive training cost and memory access, sub-optimal model quality, as well as unbalanced workload across threads (different sparsity across output channels). To overcome them, this paper proposes a novel Soft Uniform Block Pruning (SUBP) approach to train a uniform 1×N sparse structured network from scratch. Specifically, our approach tends to repeatedly allow pruned blocks to regrow to the network based on block angular redundancy and importance sampling in a uniform manner throughout the training process. It not only makes the model less dependent on pre-training, reduces the model redundancy and the risk of pruning the important blocks permanently but also achieves balanced workload. Empirically, on ImageNet, comprehensive experiments across various CNN architectures show that our SUBP consistently outperforms existing 1×N and structured sparsity methods based on pre-trained models or training from scratch. Source codes and models are available at https://github.com/JingyangXiang/SUBP.
@inproceedings{xiang2023subp,
  title     = {SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading Acceleration},
  author    = {Jingyang Xiang and Siqi Li and Jun Chen and Shipeng Bai and Yukai Ma and Guang Dai and Yong Liu},
  year      = 2023,
  booktitle = {37th Conference on Neural Information Processing Systems (NeurIPS)},
  pages     = {52033--52050}
}
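The idea of a uniform 1×N pattern described in the SUBP abstract above can be sketched as follows. This is not the SUBP algorithm: it replaces block angular redundancy, importance sampling, and regrowth with a simple one-shot L1-norm ranking. It assumes a 2-D weight matrix indexed as `weight[out_channel][in_channel]`, and that a 1×N block is N consecutive output channels at one input-channel position. The name `uniform_1xn_mask` and the parameter `keep_per_group` are hypothetical.

```python
from typing import List


def uniform_1xn_mask(weight: List[List[float]], n: int,
                     keep_per_group: int) -> List[List[int]]:
    """Build a 0/1 mask with 1xN blocks, keeping the same number of
    blocks in every group of n consecutive output channels."""
    c_out, c_in = len(weight), len(weight[0])
    if c_out % n != 0:
        raise ValueError("number of output channels must be a multiple of n")
    if not 0 < keep_per_group <= c_in:
        raise ValueError("need 0 < keep_per_group <= number of input channels")

    mask = [[0] * c_in for _ in range(c_out)]
    for g in range(0, c_out, n):
        rows = range(g, g + n)
        # L1 norm of each 1xN block (one block per input-channel position).
        norms = [sum(abs(weight[r][c]) for r in rows) for c in range(c_in)]
        # Keep the strongest blocks; every group keeps the same count,
        # so each group carries the same amount of work.
        keep = sorted(range(c_in), key=lambda c: (-norms[c], c))[:keep_per_group]
        for c in keep:
            for r in rows:
                mask[r][c] = 1
    return mask


# Illustrative 4x3 weight matrix (4 output channels, 3 input channels).
weight = [
    [0.5, 0.1, -0.2],
    [0.4, -0.1, 0.3],
    [0.0, 0.9, 0.1],
    [0.1, -0.8, 0.2],
]
block_mask = uniform_1xn_mask(weight, n=2, keep_per_group=1)
```

Because `keep_per_group` is identical for all groups, every group of output channels ends up with the same number of non-zero blocks, which is the property the abstract associates with a balanced workload across threads.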