Jingyang Xiang
MS Student
Institute of Cyber-Systems and Control, Zhejiang University, China
I am pursuing my M.S. degree at the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. My major research interests are network pruning, network binarization, and contrastive learning.
Research Interests
- Network pruning
- Network binarization
- Contrastive learning
Publications
- Siqi Li, Jun Chen, Jingyang Xiang, Chengrui Zhu, Jiandang Yang, Xiaobin Wei, Yunliang Jiang, and Yong Liu. Automatic Data-Free Pruning via Channel Similarity Reconstruction. Neurocomputing, 661:131885, 2026.
Abstract: Structured pruning methods are developed to bridge the gap between the massive scale of neural networks and the limited hardware resources. Most current structured pruning methods rely on training datasets to fine-tune the compressed model, resulting in high computational burdens and being inapplicable for scenarios with stringent requirements on privacy and security. As an alternative, some data-free methods have been proposed, however, these methods often require handcrafted parameter tuning and can only achieve inflexible reconstruction. In this paper, we propose the Automatic Data-Free Pruning (AutoDFP) method that achieves automatic pruning and reconstruction without fine-tuning. Our approach is based on the assumption that the loss of information can be partially compensated by retaining focused information from similar channels. Specifically, we formulate data-free pruning as an optimization problem, which can be effectively addressed through reinforcement learning. AutoDFP assesses the similarity of channels for each layer and provides this information to the reinforcement learning agent, guiding the pruning and reconstruction process of the network. We evaluate AutoDFP with multiple networks on multiple datasets, achieving impressive compression results.
BibTeX:
@article{li2026adf,
  title   = {Automatic Data-Free Pruning via Channel Similarity Reconstruction},
  author  = {Siqi Li and Jun Chen and Jingyang Xiang and Chengrui Zhu and Jiandang Yang and Xiaobin Wei and Yunliang Jiang and Yong Liu},
  year    = 2026,
  journal = {Neurocomputing},
  volume  = 661,
  pages   = {131885},
  doi     = {10.1016/j.neucom.2025.131885}
}
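As a concrete illustration of the channel-similarity signal described above, the sketch below computes pairwise cosine similarity between the output-channel filters of a convolution weight. This is a minimal sketch under the assumption that cosine similarity over flattened filters is a reasonable proxy; it is not AutoDFP's exact statistic, and the function name is hypothetical.

    import torch
    import torch.nn.functional as F

    def channel_cosine_similarity(weight: torch.Tensor) -> torch.Tensor:
        """Pairwise cosine similarity between output-channel filters.

        weight: conv weight of shape (out_channels, in_channels, kH, kW).
        Returns an (out_channels, out_channels) similarity matrix.
        """
        flat = weight.flatten(start_dim=1)   # (C_out, C_in * kH * kW)
        flat = F.normalize(flat, dim=1)      # unit-norm rows
        return flat @ flat.t()               # cosine similarity matrix

    # Hypothetical usage: highly similar channel pairs are candidates whose
    # information can be partially retained when one of them is pruned.
    sim = channel_cosine_similarity(torch.randn(64, 32, 3, 3))
    print(sim.shape)  # torch.Size([64, 64])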
- Jun Chen, Jingyang Xiang, Tianxin Huang, Xiangrui Zhao, and Yong Liu. Hyperbolic Binary Neural Network. IEEE Transactions on Neural Networks and Learning Systems, 36:10325-10333, 2025.
Abstract: Binary neural network (BNN) converts full-precision weights and activations into their extreme 1-bit counterparts, making it particularly suitable for deployment on lightweight mobile devices. While BNNs are typically formulated as a constrained optimization problem and optimized in the binarized space, general neural networks are formulated as an unconstrained optimization problem and optimized in the continuous space. This article introduces the hyperbolic BNN (HBNN) by leveraging the framework of hyperbolic geometry to optimize the constrained problem. Specifically, we transform the constrained problem in hyperbolic space into an unconstrained one in Euclidean space using the Riemannian exponential map. On the other hand, we also propose the exponential parametrization cluster (EPC) method, which, compared with the Riemannian exponential map, shrinks the segment domain based on a diffeomorphism. This approach increases the probability of weight flips, thereby maximizing the information gain in BNNs. Experimental results on CIFAR10, CIFAR100, and ImageNet classification datasets with VGGsmall, ResNet18, and ResNet34 models illustrate the superior performance of our HBNN over state-of-the-art methods.
BibTeX:
@article{chen2025hbn,
  title   = {Hyperbolic Binary Neural Network},
  author  = {Jun Chen and Jingyang Xiang and Tianxin Huang and Xiangrui Zhao and Yong Liu},
  year    = 2025,
  journal = {IEEE Transactions on Neural Networks and Learning Systems},
  volume  = 36,
  pages   = {10325-10333},
  doi     = {10.1109/TNNLS.2024.3485115}
}
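For intuition about the geometric step in the abstract above, here is a minimal sketch of the Riemannian exponential map at the origin of the Poincaré ball, the standard map that sends an unconstrained Euclidean tangent vector to a point strictly inside the ball. How HBNN wires this into weight binarization, and its EPC variant, are not reproduced; the curvature handling follows the common textbook form rather than the paper's exact parametrization.

    import torch

    def expmap0(v: torch.Tensor, c: float = 1.0, eps: float = 1e-8) -> torch.Tensor:
        """Exponential map at the origin of a Poincare ball with curvature -c:
        exp_0(v) = tanh(sqrt(c) * ||v||) * v / (sqrt(c) * ||v||)."""
        sqrt_c = c ** 0.5
        norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
        return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

    # Unconstrained Euclidean vectors land strictly inside the unit ball,
    # turning a constrained problem into an unconstrained one.
    x = expmap0(torch.randn(4, 8))
    print(x.norm(dim=-1))  # every norm is < 1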
- Linpeng Peng, Rongyao Cai, Jingyang Xiang, Junyu Zhu, Weiwei Liu, Wang Gao, and Yong Liu. LiteGrasp: A Light Robotic Grasp Detection via Semi-Supervised Knowledge Distillation. IEEE Robotics and Automation Letters, 9:7995-8002, 2024.
Abstract: Grasping detection from single images in robotic applications poses a significant challenge. While contemporary deep learning techniques excel, their success often hinges on large annotated datasets and intricate network architectures. In this letter, we present LiteGrasp, a novel semi-supervised lightweight framework purpose-built for grasp detection, eliminating the necessity for exhaustive supervision and intricate networks. Our approach uses a limited amount of labeled data via a knowledge distillation method, introducing HRGrasp-Net, a model with high efficiency for extracting features and largely based on HRNet. We incorporate pseudo-label filtering within a mutual learning model set within a teacher-student paradigm. This enhances the transference of data from images with labels to those without. Additionally, we introduce the streamlined Lite HRGrasp-Net, acting as the student network which gains further distillation knowledge using a multi-level fusion cascade originating from HRGrasp-Net. Impressively, LiteGrasp thrives with just a fraction (4.3%) of HRGrasp-Net's original model size, and with limited labeled data relative to total data (25% ratio) across all benchmarks, regularly outperforming solely supervised and semi-supervised models. Taking just 6 ms for execution, LiteGrasp showcases exceptional accuracy (99.99% and 97.21% on Cornell and Jacquard data sets respectively), as well as an impressive 95.3% rate of success in grasping when deployed using a 6DoF UR5e robotic arm. These highlights underscore the effectiveness and efficiency of LiteGrasp for grasp detection, even under resource-limited conditions.
BibTeX:
@article{peng2024lal,
  title   = {LiteGrasp: A Light Robotic Grasp Detection via Semi-Supervised Knowledge Distillation},
  author  = {Linpeng Peng and Rongyao Cai and Jingyang Xiang and Junyu Zhu and Weiwei Liu and Wang Gao and Yong Liu},
  year    = 2024,
  journal = {IEEE Robotics and Automation Letters},
  volume  = 9,
  pages   = {7995-8002},
  doi     = {10.1109/LRA.2024.3436336}
}
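The semi-supervised ingredient in the abstract above, pseudo-label filtering inside a teacher-student loop, can be sketched generically as below. The confidence threshold, classification-style loss, and function name are illustrative assumptions, not LiteGrasp's actual grasp-detection heads.

    import torch
    import torch.nn.functional as F

    def distill_on_unlabeled(student_logits, teacher_logits, conf_thresh=0.9):
        """Keep only confident teacher predictions as pseudo-labels."""
        probs = teacher_logits.softmax(dim=1)
        conf, pseudo = probs.max(dim=1)
        mask = conf >= conf_thresh               # drop low-confidence labels
        if mask.sum() == 0:
            return teacher_logits.new_zeros(())  # nothing confident this batch
        return F.cross_entropy(student_logits[mask], pseudo[mask])

    loss = distill_on_unlabeled(torch.randn(16, 10), torch.randn(16, 10))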
- Jiateng Wei, Quan Lu, Ning Jiang, Siqi Li, Jingyang Xiang, Jun Chen, and Yong Liu. Structured Optimal Brain Pruning for Large Language Models. In The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 13991-14007, 2024.
Abstract: The massive parameters and computational demands hinder the widespread application of Large Language Models (LLMs). Network pruning provides a practical solution to this problem. However, existing pruning works for LLMs mainly focus on unstructured pruning or necessitate post-pruning fine-tuning. The former relies on special hardware to accelerate computation, while the latter may need substantial computational resources. In this paper, we introduce a retraining-free structured pruning method called SoBP (Structured Optimal Brain Pruning). It leverages global first-order information to select pruning structures, then refines them with a local greedy approach, and finally adopts module-wise reconstruction to mitigate information loss. We assess the effectiveness of SoBP across 14 models from 3 LLM families on 8 distinct datasets. Experimental results demonstrate that SoBP outperforms current state-of-the-art methods.
BibTeX:
@inproceedings{wei2024sob,
  title     = {Structured Optimal Brain Pruning for Large Language Models},
  author    = {Jiateng Wei and Quan Lu and Ning Jiang and Siqi Li and Jingyang Xiang and Jun Chen and Yong Liu},
  year      = 2024,
  booktitle = {The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  pages     = {13991-14007},
  doi       = {10.18653/v1/2024.emnlp-main.775}
}
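To make the "global first-order information" in the abstract above concrete, the sketch below scores each output row of a linear layer with a Taylor-style importance |sum(grad * weight)|. This is a common first-order criterion assumed here for illustration; it is not SoBP's exact selection rule, and the paper's local greedy refinement and module-wise reconstruction are not reproduced.

    import torch

    def first_order_importance(weight: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
        """Taylor-style score per output row: |sum over the row of grad * weight|."""
        return (grad * weight).sum(dim=1).abs()

    w = torch.randn(8, 16, requires_grad=True)
    loss = (w @ torch.randn(16, 4)).pow(2).mean()
    loss.backward()
    scores = first_order_importance(w.detach(), w.grad)
    print(scores.argsort())  # rows ranked from least to most important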
- Jingyang Xiang, Zuohui Chen, Siqi Li, Qing Wu, and Yong Liu. OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks. In 18th European Conference on Computer Vision (ECCV), pages 1-18, 2024.
Abstract: Binary Neural Networks (BNNs) have been proven to be highly effective for deploying deep neural networks on mobile and embedded platforms. Most existing works focus on minimizing quantization errors, improving representation ability, or designing gradient approximations to alleviate gradient mismatch in BNNs, while leaving the weight sign flipping, a critical factor for achieving powerful BNNs, untouched. In this paper, we investigate the efficiency of weight sign updates in BNNs. We observe that, for vanilla BNNs, over 50% of the weights remain their signs unchanged during training, and these weights are not only distributed at the tails of the weight distribution but also universally present in the vicinity of zero. We refer to these weights as "silent weights", which slow down convergence and lead to a significant accuracy degradation. Theoretically, we reveal this is due to the independence of the BNNs gradient from the latent weight distribution. To address the issue, we propose Overcome Silent Weights (OvSW). OvSW first employs Adaptive Gradient Scaling (AGS) to establish a relationship between the gradient and the latent weight distribution, thereby improving the overall efficiency of weight sign updates. Additionally, we design Silence Awareness Decaying (SAD) to automatically identify "silent weights" by tracking weight flipping state, and apply an additional penalty to "silent weights" to facilitate their flipping. By efficiently updating weight signs, our method achieves faster convergence and state-of-the-art performance on CIFAR10 and ImageNet1K dataset with various architectures. For example, OvSW obtains 61.6% and 65.5% top-1 accuracy on the ImageNet1K using binarized ResNet18 and ResNet34 architecture respectively. Codes are available at https://github.com/JingyangXiang/OvSW.
BibTeX:
@inproceedings{xiang2024ovsw,
  title     = {OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks},
  author    = {Jingyang Xiang and Zuohui Chen and Siqi Li and Qing Wu and Yong Liu},
  year      = 2024,
  booktitle = {18th European Conference on Computer Vision (ECCV)},
  pages     = {1-18},
  doi       = {10.1007/978-3-031-73414-4_1}
}
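The flip-state bookkeeping behind Silence Awareness Decaying can be sketched as below: an exponential moving average of per-weight sign flips flags weights that essentially never flip. The EMA momentum, threshold, and class name are illustrative assumptions, not the paper's settings.

    import torch

    class FlipTracker:
        """Track per-weight sign flips with an exponential moving average."""
        def __init__(self, weight: torch.Tensor, momentum: float = 0.99):
            self.prev_sign = weight.sign()
            self.flip_rate = torch.zeros_like(weight)
            self.momentum = momentum

        def update(self, weight: torch.Tensor) -> torch.Tensor:
            sign = weight.sign()
            flipped = (sign != self.prev_sign).float()
            self.prev_sign = sign
            self.flip_rate = self.momentum * self.flip_rate + (1 - self.momentum) * flipped
            return self.flip_rate < 1e-3  # mask of "silent" weights

    w = torch.randn(64)
    tracker = FlipTracker(w)
    silent = tracker.update(w - 0.01 * torch.randn(64))  # weights flagged as silent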
- Jingyang Xiang, Siqi Li, Junhao Chen, Zhuangzhi Chen, Tianxin Huang, Linpeng Peng, and Yong Liu. MaxQ: Multi-Axis Query for N:m Sparsity Network. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15845-15854, 2024.
Abstract: N:m sparsity has received increasing attention due to its remarkable performance and latency trade-off compared with structured and unstructured sparsity. However, existing N:m sparsity methods do not differentiate the relative importance of weights among blocks and leave important weights underappreciated. Besides, they directly apply N:m sparsity to the whole network, which will cause severe information loss. Thus, they are still sub-optimal. In this paper, we propose an efficient and effective Multi-Axis Query methodology, dubbed as MaxQ, to rectify these problems. During the training, MaxQ employs a dynamic approach to generate soft N:m masks, considering the weight importance across multiple axes. This method enhances the weights with more importance and ensures more effective updates. Meanwhile, a sparsity strategy that gradually increases the percentage of N:m weight blocks is applied, which allows the network to heal from the pruning-induced damage progressively. During the runtime, the N:m soft masks can be precomputed as constants and folded into weights without causing any distortion to the sparse pattern and incurring additional computational overhead. Comprehensive experiments demonstrate that MaxQ achieves consistent improvements across diverse CNN architectures in various computer vision tasks, including image classification, object detection and instance segmentation. For ResNet50 with 1:16 sparse pattern, MaxQ can achieve 74.6% top-1 accuracy on ImageNet and improve by over 2.8% over the state-of-the-art. Codes and checkpoints are available at https://github.com/JingyangXiang/MaxQ.
BibTeX:
@inproceedings{xiang2024maxq,
  title     = {MaxQ: Multi-Axis Query for N:m Sparsity Network},
  author    = {Jingyang Xiang and Siqi Li and Junhao Chen and Zhuangzhi Chen and Tianxin Huang and Linpeng Peng and Yong Liu},
  year      = 2024,
  booktitle = {2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages     = {15845-15854},
  doi       = {10.1109/CVPR52733.2024.01500}
}
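For reference, the underlying N:m pattern the abstract above builds on keeps the N largest-magnitude weights in every group of m consecutive weights (so 1:16 keeps 1 of every 16). The hard mask below is a minimal sketch of that pattern only, not MaxQ's soft, multi-axis, incrementally scheduled masks.

    import torch

    def nm_mask(weight: torch.Tensor, n: int = 1, m: int = 16) -> torch.Tensor:
        """Hard N:m mask; weight is (rows, cols) with cols divisible by m."""
        rows, cols = weight.shape
        groups = weight.abs().reshape(rows, cols // m, m)
        idx = groups.topk(n, dim=-1).indices  # top-n magnitudes per group
        mask = torch.zeros_like(groups)
        mask.scatter_(-1, idx, 1.0)
        return mask.reshape(rows, cols)

    mask = nm_mask(torch.randn(8, 64), n=1, m=16)
    print(mask.sum(dim=1))  # 4 nonzeros per row: one per group of 16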
- Jingyang Xiang, Siqi Li, Jun Chen, Shipeng Bai, Yukai Ma, Guang Dai, and Yong Liu. SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading Acceleration. In 37th Conference on Neural Information Processing Systems (NeurIPS), pages 52033-52050, 2023.
Abstract: The study of sparsity in Convolutional Neural Networks (CNNs) has become widespread to compress and accelerate models in environments with limited resources. By constraining N consecutive weights along the output channel to be group-wise non-zero, the recent network with 1×N sparsity has received tremendous popularity for its three outstanding advantages: 1) A large amount of storage space saving by a Block Sparse Row matrix. 2) Excellent performance at a high sparsity. 3) Significant speedups on CPUs with Advanced Vector Extensions. Recent work requires selecting and fine-tuning 1×N sparse weights based on dense pre-trained weights, leading to the problems such as expensive training cost and memory access, sub-optimal model quality, as well as unbalanced workload across threads (different sparsity across output channels). To overcome them, this paper proposes a novel Soft Uniform Block Pruning (SUBP) approach to train a uniform 1×N sparse structured network from scratch. Specifically, our approach tends to repeatedly allow pruned blocks to regrow to the network based on block angular redundancy and importance sampling in a uniform manner throughout the training process. It not only makes the model less dependent on pre-training, reduces the model redundancy and the risk of pruning the important blocks permanently but also achieves balanced workload. Empirically, on ImageNet, comprehensive experiments across various CNN architectures show that our SUBP consistently outperforms existing 1×N and structured sparsity methods based on pre-trained models or training from scratch. Source codes and models are available at https://github.com/JingyangXiang/SUBP.
BibTeX:
@inproceedings{xiang2023subp,
  title     = {SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading Acceleration},
  author    = {Jingyang Xiang and Siqi Li and Jun Chen and Shipeng Bai and Yukai Ma and Guang Dai and Yong Liu},
  year      = 2023,
  booktitle = {37th Conference on Neural Information Processing Systems (NeurIPS)},
  pages     = {52033-52050}
}
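The 1×N granularity above prunes or keeps N consecutive weights along the output-channel axis as one block. The sketch below scores such blocks by L1 norm and keeps a fixed fraction, a minimal stand-in for SUBP's soft regrowing via block angular redundancy and importance sampling; the block axis and keep rule are assumed for illustration.

    import torch

    def block_mask_1xn(weight: torch.Tensor, n: int = 4, keep_ratio: float = 0.5) -> torch.Tensor:
        """weight: (out_channels, in_features) with out_channels divisible by n."""
        c_out, c_in = weight.shape
        blocks = weight.abs().reshape(c_out // n, n, c_in).sum(dim=1)  # L1 per 1xN block
        k = max(1, int(keep_ratio * blocks.numel()))
        thresh = blocks.flatten().topk(k).values.min()
        keep = (blocks >= thresh).float()        # (c_out // n, c_in) block mask
        return keep.repeat_interleave(n, dim=0)  # expand back to weight shape

    mask = block_mask_1xn(torch.randn(64, 128), n=4, keep_ratio=0.5)
    print(mask.mean())  # ~0.5 of weights kept, in 1xN blocks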
