Jun Chen

PhD Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Address

Room 101, Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Email: junc@zju.edu.cn

Biography

I am pursuing my Ph.D. degree at the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. My main research interests include network quantization and network compression.

Research Interests

  • Deep Learning
  • Network Compression
  • Network Quantization

Publications

  • Siqi Li, Jun Chen, Shanqi Liu, Chengrui Zhu, Guanzhong Tian, and Yong Liu. MCMC: Multi-Constrained Model Compression via One-stage Envelope Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems, 2024.
    [BibTeX] [Abstract] [DOI]
    Model compression methods are being developed to bridge the gap between the massive scale of neural networks and the limited hardware resources on edge devices. Since most real-world applications deployed on resource-limited hardware platforms typically have multiple hardware constraints simultaneously, most existing model compression approaches that only consider optimizing one single hardware objective are ineffective. In this article, we propose an automated pruning method called multi-constrained model compression (MCMC) that allows for the optimization of multiple hardware targets, such as latency, floating point operations (FLOPs), and memory usage, while minimizing the impact on accuracy. Specifically, we propose an improved multi-objective reinforcement learning (MORL) algorithm, the one-stage envelope deep deterministic policy gradient (DDPG) algorithm, to determine the pruning strategy for neural networks. Our improved one-stage envelope DDPG algorithm reduces exploration time and offers greater flexibility in adjusting target priorities, enhancing its suitability for pruning tasks. For instance, on the visual geometry group (VGG)-16 network, our method achieved an 80% reduction in FLOPs, a 2.31x reduction in memory usage, and a 1.92x acceleration, with an accuracy improvement of 0.09% compared with the baseline. For larger datasets, such as ImageNet, we reduced FLOPs by 50% for MobileNet-V1, resulting in a 4.7x faster speed and 1.48x memory compression, while maintaining the same accuracy. When applied to edge devices, such as JETSON XAVIER NX, our method resulted in a 71% reduction in FLOPs for MobileNet-V1, leading to a 1.63x faster speed, 1.64x memory compression, and an accuracy improvement.
    @article{li2024mcmc,
    title = {MCMC: Multi-Constrained Model Compression via One-stage Envelope Reinforcement Learning},
    author = {Siqi Li and Jun Chen and Shanqi Liu and Chengrui Zhu and Guanzhong Tian and Yong Liu},
    year = 2024,
    journal = {IEEE Transactions on Neural Networks and Learning Systems},
    doi = {10.1109/TNNLS.2024.3353763},
    abstract = {Model compression methods are being developed to bridge the gap between the massive scale of neural networks and the limited hardware resources on edge devices. Since most real-world applications deployed on resource-limited hardware platforms typically have multiple hardware constraints simultaneously, most existing model compression approaches that only consider optimizing one single hardware objective are ineffective. In this article, we propose an automated pruning method called multi-constrained model compression (MCMC) that allows for the optimization of multiple hardware targets, such as latency, floating point operations (FLOPs), and memory usage, while minimizing the impact on accuracy. Specifically, we propose an improved multi-objective reinforcement learning (MORL) algorithm, the one-stage envelope deep deterministic policy gradient (DDPG) algorithm, to determine the pruning strategy for neural networks. Our improved one-stage envelope DDPG algorithm reduces exploration time and offers greater flexibility in adjusting target priorities, enhancing its suitability for pruning tasks. For instance, on the visual geometry group (VGG)-16 network, our method achieved an 80% reduction in FLOPs, a 2.31x reduction in memory usage, and a 1.92x acceleration, with an accuracy improvement of 0.09% compared with the baseline. For larger datasets, such as ImageNet, we reduced FLOPs by 50% for MobileNet-V1, resulting in a 4.7x faster speed and 1.48x memory compression, while maintaining the same accuracy. When applied to edge devices, such as JETSON XAVIER NX, our method resulted in a 71% reduction in FLOPs for MobileNet-V1, leading to a 1.63x faster speed, 1.64x memory compression, and an accuracy improvement.}
    }
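    Illustrative note: the following is a minimal NumPy sketch of the preference-weighted (linearly scalarized) reward idea behind multi-objective pruning, added here only for illustration. The objectives, weights, and the hardware_cost helper are hypothetical stand-ins; the paper's one-stage envelope DDPG algorithm is not reproduced.
    import numpy as np
    # Hypothetical per-layer pruning ratios proposed by an RL agent.
    pruning_ratios = np.array([0.3, 0.5, 0.2, 0.6])
    def hardware_cost(ratios):
        """Toy stand-in for profiling a pruned network: returns a reward vector
        over (accuracy, -latency, -FLOPs, -memory)."""
        kept = 1.0 - ratios
        accuracy = 0.92 - 0.1 * ratios.mean()   # accuracy degrades with pruning
        latency = kept.sum()                    # pretend latency tracks kept channels
        flops = (kept ** 2).sum()               # pretend FLOPs grow quadratically
        memory = kept.sum()                     # pretend memory tracks kept channels
        return np.array([accuracy, -latency, -flops, -memory])
    # Preference over (accuracy, latency, FLOPs, memory); envelope-style MORL
    # trains across many such preference vectors instead of fixing one up front.
    preference = np.array([0.4, 0.2, 0.2, 0.2])
    scalar_reward = float(preference @ hardware_cost(pruning_ratios))
    print(scalar_reward)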
  • Tianxin Huang, Qingyao Liu, Xiangrui Zhao, Jun Chen, and Yong Liu. Learnable Chamfer Distance for Point Cloud Reconstruction. Pattern Recognition Letters, 178:43-48, 2024.
    [BibTeX] [Abstract] [DOI] [PDF]
    As point clouds are 3D signals with permutation invariance, most existing works train their reconstruction networks by measuring shape differences with the average point-to-point distance between point clouds matched with predefined rules. However, the static matching rules may deviate from actual shape differences. Although some works propose dynamically updated learnable structures to replace matching rules, they need more iterations to converge well. In this work, we propose a simple but effective reconstruction loss, named Learnable Chamfer Distance (LCD) by dynamically paying attention to matching distances with different weight distributions controlled with a group of learnable networks. By training with adversarial strategy, LCD learns to search defects in reconstructed results and overcomes the weaknesses of static matching rules, while the performances at low iterations can also be guaranteed by the basic matching algorithm. Experiments on multiple reconstruction networks confirm that LCD can help achieve better reconstruction performances and extract more representative representations with faster convergence and comparable training efficiency.
    @article{huang2024lcd,
    title = {Learnable Chamfer Distance for Point Cloud Reconstruction},
    author = {Tianxin Huang and Qingyao Liu and Xiangrui Zhao and Jun Chen and Yong Liu},
    year = 2024,
    journal = {Pattern Recognition Letters},
    volume = 178,
    pages = {43-48},
    doi = {10.1016/j.patrec.2023.12.015},
    abstract = {As point clouds are 3D signals with permutation invariance, most existing works train their reconstruction networks by measuring shape differences with the average point-to-point distance between point clouds matched with predefined rules. However, the static matching rules may deviate from actual shape differences. Although some works propose dynamically -updated learnable structures to replace matching rules, they need more iterations to converge well. In this work, we propose a simple but effective reconstruction loss, named Learnable Chamfer Distance (LCD) by dynamically paying attention to matching distances with different weight distributions controlled with a group of learnable networks. By training with adversarial strategy, LCD learns to search defects in reconstructed results and overcomes the weaknesses of static matching rules, while the performances at low iterations can also be guaranteed by the basic matching algorithm. Experiments on multiple reconstruction networks confirm that LCD can help achieve better reconstruction performances and extract more representative representations with faster convergence and comparable training efficiency.}
    }
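    Illustrative note: the static matching rule that LCD builds on and improves is the ordinary symmetric Chamfer distance; a minimal PyTorch version is sketched below for context. The learnable re-weighting networks and adversarial training described in the paper are not included.
    import torch
    def chamfer_distance(p1, p2):
        """Symmetric Chamfer distance between point clouds p1 (B, N, 3) and p2 (B, M, 3)."""
        d = torch.cdist(p1, p2, p=2.0) ** 2       # pairwise squared distances (B, N, M)
        d1 = d.min(dim=2).values.mean(dim=1)      # p1 -> nearest point in p2
        d2 = d.min(dim=1).values.mean(dim=1)      # p2 -> nearest point in p1
        return (d1 + d2).mean()
    a, b = torch.rand(2, 1024, 3), torch.rand(2, 768, 3)
    print(chamfer_distance(a, b))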
  • Yuang Liu, Jun Chen, and Yong Liu. DCCD: Reducing Neural Network Redundancy via Distillation. IEEE Transactions on Neural Networks and Learning Systems, 35:10006-10017, 2024.
    [BibTeX] [Abstract] [DOI] [PDF]
    Deep neural models have achieved remarkable performance on various supervised and unsupervised learning tasks, but it is a challenge to deploy these large-size networks on resource-limited devices. As a representative type of model compression and acceleration methods, knowledge distillation (KD) solves this problem by transferring knowledge from heavy teachers to lightweight students. However, most distillation methods focus on imitating the responses of teacher networks but ignore the information redundancy of student networks. In this article, we propose a novel distillation framework difference-based channel contrastive distillation (DCCD), which introduces channel contrastive knowledge and dynamic difference knowledge into student networks for redundancy reduction. At the feature level, we construct an efficient contrastive objective that broadens student networks’ feature expression space and preserves richer information in the feature extraction stage. At the final output level, more detailed knowledge is extracted from teacher networks by making a difference between multiview augmented responses of the same instance. We enhance student networks to be more sensitive to minor dynamic changes. With the improvement of two aspects of DCCD, the student network gains contrastive and difference knowledge and reduces its overfitting and redundancy. Finally, we achieve surprising results that the student approaches and even outperforms the teacher in test accuracy on CIFAR-100. We reduce the top-1 error to 28.16% on ImageNet classification and 24.15% for cross-model transfer with ResNet-18. Empirical experiments and ablation studies on popular datasets show that our proposed method can achieve state-of-the-art accuracy compared with other distillation methods.
    @article{liu2024dccd,
    title = {DCCD: Reducing Neural Network Redundancy via Distillation},
    author = {Yuang Liu and Jun Chen and Yong Liu},
    year = 2024,
    journal = {IEEE Transactions on Neural Networks and Learning Systems},
    volume = 35,
    pages = {10006-10017},
    doi = {10.1109/TNNLS.2023.3238337},
    abstract = {Deep neural models have achieved remarkable performance on various supervised and unsupervised learning tasks, but it is a challenge to deploy these large-size networks on resource-limited devices. As a representative type of model compression and acceleration methods, knowledge distillation (KD) solves this problem by transferring knowledge from heavy teachers to lightweight students. However, most distillation methods focus on imitating the responses of teacher networks but ignore the information redundancy of student networks. In this article, we propose a novel distillation framework difference-based channel contrastive distillation (DCCD), which introduces channel contrastive knowledge and dynamic difference knowledge into student networks for redundancy reduction. At the feature level, we construct an efficient contrastive objective that broadens student networks' feature expression space and preserves richer information in the feature extraction stage. At the final output level, more detailed knowledge is extracted from teacher networks by making a difference between multiview augmented responses of the same instance. We enhance student networks to be more sensitive to minor dynamic changes. With the improvement of two aspects of DCCD, the student network gains contrastive and difference knowledge and reduces its overfitting and redundancy. Finally, we achieve surprising results that the student approaches and even outperforms the teacher in test accuracy on CIFAR-100. We reduce the top-1 error to 28.16% on ImageNet classification and 24.15% for cross-model transfer with ResNet-18. Empirical experiments and ablation studies on popular datasets show that our proposed method can achieve state-of-the-art accuracy compared with other distillation methods.}
    }
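    Illustrative note: the response-imitation baseline that distillation frameworks such as DCCD extend is the classic softened-logit KD loss; a minimal PyTorch version is sketched below. The channel contrastive and dynamic difference terms proposed in the paper are not reproduced.
    import torch
    import torch.nn.functional as F
    def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
        """Hinton-style knowledge distillation: softened teacher targets plus hard labels."""
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                        F.softmax(teacher_logits / T, dim=1),
                        reduction="batchmean") * (T * T)
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard
    s, t = torch.randn(8, 100), torch.randn(8, 100)   # e.g. 100 classes as in CIFAR-100
    y = torch.randint(0, 100, (8,))
    print(kd_loss(s, t, y))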
  • Shanqi Liu, Weiwei Liu, Wenzhou Chen, Guanzhong Tian, Jun Chen, Yao Tong, Junjie Cao, and Yong Liu. Learning Multi-Agent Cooperation via Considering Actions of Teammates. IEEE Transactions on Neural Networks and Learning Systems, 35:11553-11564, 2024.
    [BibTeX] [Abstract] [DOI] [PDF]
    Recently value-based centralized training with decentralized execution (CTDE) multi-agent reinforcement learning (MARL) methods have achieved excellent performance in cooperative tasks. However, the most representative method among these methods, Q-network MIXing (QMIX), restricts the joint action Q values to be a monotonic mixing of each agent's utilities. Furthermore, current methods cannot generalize to unseen environments or different agent configurations, which is known as ad hoc team play situation. In this work, we propose a novel Q values decomposition that considers both the return of an agent acting on its own and cooperating with other observable agents to address the nonmonotonic problem. Based on the decomposition, we propose a greedy action searching method that can improve exploration and is not affected by changes in observable agents or changes in the order of agents' actions. In this way, our method can adapt to ad hoc team play situation. Furthermore, we utilize an auxiliary loss related to environmental cognition consistency and a modified prioritized experience replay (PER) buffer to assist training. Our extensive experimental results show that our method achieves significant performance improvements in both challenging monotonic and nonmonotonic domains, and can handle the ad hoc team play situation perfectly.
    @article{liu2024lma,
    title = {Learning Multi-Agent Cooperation via Considering Actions of Teammates},
    author = {Shanqi Liu and Weiwei Liu and Wenzhou Chen and Guanzhong Tian and Jun Chen and Yao Tong and Junjie Cao and Yong Liu},
    year = 2024,
    journal = {IEEE Transactions on Neural Networks and Learning Systems},
    volume = 35,
    pages = {11553-11564},
    doi = {10.1109/TNNLS.2023.3262921},
    abstract = {Recently value-based centralized training with decentralized execution (CTDE) multi-agent reinforcement learning (MARL) methods have achieved excellent performance in cooperative tasks. However, the most representative method among these methods, Q-network MIXing (QMIX), restricts the joint action Q values to be a monotonic mixing of each agent ' s utilities. Furthermore, current methods cannot generalize to unseen environments or different agent configurations, which is known as ad hoc team play situation. In this work, we propose a novel Q values decomposition that considers both the return of an agent acting on its own and cooperating with other observable agents to address the nonmonotonic problem. Based on the decomposition, we propose a greedy action searching method that can improve exploration and is not affected by changes in observable agents or changes in the order of agents ' actions. In this way, our method can adapt to ad hoc team play situation. Furthermore, we utilize an auxiliary loss related to environmental cognition consistency and a modified prioritized experience replay (PER) buffer to assist training. Our extensive experimental results show that our method achieves significant performance improvements in both challenging monotonic and nonmonotonic domains, and can handle the ad hoc team play situation perfectly.}
    }
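    Illustrative note: to make the monotonicity restriction mentioned in the abstract concrete, the sketch below is a stripped-down QMIX-style mixer whose state-conditioned weights are forced non-negative with an absolute value, so the joint Q value is monotone in every agent's utility. It shows the constraint the paper relaxes, not the paper's own decomposition; all layer sizes are arbitrary.
    import torch
    import torch.nn as nn
    class MonotonicMixer(nn.Module):
        """QMIX-style mixing of per-agent utilities with non-negative weights."""
        def __init__(self, n_agents, state_dim, embed_dim=32):
            super().__init__()
            self.n_agents, self.embed_dim = n_agents, embed_dim
            self.w1 = nn.Linear(state_dim, n_agents * embed_dim)
            self.b1 = nn.Linear(state_dim, embed_dim)
            self.w2 = nn.Linear(state_dim, embed_dim)
            self.b2 = nn.Linear(state_dim, 1)
        def forward(self, agent_qs, state):
            # agent_qs: (B, n_agents), state: (B, state_dim)
            w1 = torch.abs(self.w1(state)).view(-1, self.n_agents, self.embed_dim)
            hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + self.b1(state).unsqueeze(1))
            w2 = torch.abs(self.w2(state)).view(-1, self.embed_dim, 1)
            return (torch.bmm(hidden, w2).squeeze(-1) + self.b2(state)).squeeze(-1)
    mixer = MonotonicMixer(n_agents=4, state_dim=16)
    print(mixer(torch.randn(8, 4), torch.randn(8, 16)).shape)  # torch.Size([8])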
  • Jun Chen, Haishan Ye, Mengmeng Wang, Tianxin Huang, Guang Dai, Ivor W. Tsang, and Yong Liu. Decentralized Riemannian Conjugate Gradient Method on the Stiefel Manifold. In 12th International Conference on Learning Representations (ICLR), 2024.
    [BibTeX] [Abstract]
    The conjugate gradient method is a crucial first-order optimization method that generally converges faster than the steepest descent method, and its computational cost is much lower than that of second-order methods. However, while various types of conjugate gradient methods have been studied in Euclidean spaces and on Riemannian manifolds, there is little study for those in distributed scenarios. This paper proposes a decentralized Riemannian conjugate gradient descent (DRCGD) method that aims at minimizing a global function over the Stiefel manifold. The optimization problem is distributed among a network of agents, where each agent is associated with a local function, and the communication between agents occurs over an undirected connected graph. Since the Stiefel manifold is a non-convex set, a global function is represented as a finite sum of possibly non-convex (but smooth) local functions. The proposed method is free from expensive Riemannian geometric operations such as retractions, exponential maps, and vector transports, thereby reducing the computational complexity required by each agent. To the best of our knowledge, DRCGD is the first decentralized Riemannian conjugate gradient algorithm to achieve global convergence over the Stiefel manifold.
    @inproceedings{chen2024drc,
    title = {Decentralized Riemannian Conjugate Gradient Method on the Stiefel Manifold},
    author = {Jun Chen and Haishan Ye and Mengmeng Wang and Tianxin Huang and Guang Dai and Ivor W Tsang and Yong Liu},
    year = 2024,
    booktitle = {12th International Conference on Learning Representations (ICLR)},
    abstract = {The conjugate gradient method is a crucial first-order optimization method that generally converges faster than the steepest descent method, and its computational cost is much lower than that of second-order methods. However, while various types of conjugate gradient methods have been studied in Euclidean spaces and on Riemannian manifolds, there is little study for those in distributed scenarios. This paper proposes a decentralized Riemannian conjugate gradient descent (DRCGD) method that aims at minimizing a global function over the Stiefel manifold. The optimization problem is distributed among a network of agents, where each agent is associated with a local function, and the communication between agents occurs over an undirected connected graph. Since the Stiefel manifold is a non-convex set, a global function is represented as a finite sum of possibly non-convex (but smooth) local functions. The proposed method is free from expensive Riemannian geometric operations such as retractions, exponential maps, and vector transports, thereby reducing the computational complexity required by each agent. To the best of our knowledge, DRCGD is the first decentralized Riemannian conjugate gradient algorithm to achieve global convergence over the Stiefel manifold.}
    }
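    Illustrative note: as generic background for optimization on the Stiefel manifold, the sketch below projects a Euclidean gradient onto the tangent space at a point X (grad = G - X sym(X^T G)) and checks the tangency condition. This is textbook machinery assumed for illustration only; the paper's decentralized, retraction-free conjugate gradient scheme is not shown.
    import numpy as np
    def sym(a):
        return 0.5 * (a + a.T)
    def riemannian_grad_stiefel(x, egrad):
        """Riemannian gradient on St(n, p) under the Euclidean metric:
        project the Euclidean gradient onto the tangent space at x."""
        return egrad - x @ sym(x.T @ egrad)
    rng = np.random.default_rng(0)
    x, _ = np.linalg.qr(rng.standard_normal((8, 3)))   # a random point on St(8, 3)
    g = rng.standard_normal((8, 3))                    # a Euclidean gradient at x
    rg = riemannian_grad_stiefel(x, g)
    print(np.allclose(x.T @ rg + rg.T @ x, 0.0, atol=1e-10))  # tangency check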
  • Mengmeng Wang, Jiazheng Xing, Boyuan Jiang, Jun Chen, Jianbiao Mei, Xingxing Zuo, Guang Dai, Jingdong Wang, and Yong Liu. A Multimodal, Multi-task Adapting Framework for Video Action Recognition. In 38th AAAI Conference on Artificial Intelligence (AAAI), pages 5517-5525, 2024.
    [BibTeX] [Abstract] [DOI] [PDF]
    Recently, the rise of large-scale vision-language pretrained models like CLIP, coupled with the technology of Parameter-Efficient FineTuning (PEFT), has captured substantial attraction in video action recognition. Nevertheless, prevailing approaches tend to prioritize strong supervised performance at the expense of compromising the models’ generalization capabilities during transfer. In this paper, we introduce a novel Multimodal, Multi-task CLIP adapting framework named M2-CLIP to address these challenges, preserving both high supervised performance and robust transferability. Firstly, to enhance the individual modality architectures, we introduce multimodal adapters to both the visual and text branches. Specifically, we design a novel visual TED-Adapter, that performs global Temporal Enhancement and local temporal Difference modeling to improve the temporal representation capabilities of the visual encoder. Moreover, we adopt text encoder adapters to strengthen the learning of semantic label information. Secondly, we design a multi-task decoder with a rich set of supervisory signals to adeptly satisfy the need for strong supervised performance and generalization within a multimodal framework. Experimental results validate the efficacy of our approach, demonstrating exceptional performance in supervised learning while maintaining strong generalization in zero-shot scenarios.
    @inproceedings{wang2024amm,
    title = {A Multimodal, Multi-task Adapting Framework for Video Action Recognition},
    author = {Mengmeng Wang and Jiazheng Xing and Boyuan Jiang and Jun Chen and Jianbiao Mei and Xingxing Zuo and Guang Dai and Jingdong Wang and Yong Liu},
    year = 2024,
    booktitle = {38th AAAI Conference on Artificial Intelligence (AAAI)},
    pages = {5517-5525},
    doi = {10.1609/aaai.v38i6.28361},
    abstract = {Recently, the rise of large-scale vision-language pretrained models like CLIP, coupled with the technology of Parameter-Efficient FineTuning (PEFT), has captured substantial attraction in video action recognition. Nevertheless, prevailing approaches tend to prioritize strong supervised performance at the expense of compromising the models' generalization capabilities during transfer. In this paper, we introduce a novel Multimodal, Multi-task CLIP adapting framework named M2-CLIP to address these challenges, preserving both high supervised performance and robust transferability. Firstly, to enhance the individual modality architectures, we introduce multimodal adapters to both the visual and text branches. Specifically, we design a novel visual TED-Adapter, that performs global Temporal Enhancement and local temporal Difference modeling to improve the temporal representation capabilities of the visual encoder. Moreover, we adopt text encoder adapters to strengthen the learning of semantic label information. Secondly, we design a multi-task decoder with a rich set of supervisory signals to adeptly satisfy the need for strong supervised performance and generalization within a multimodal framework. Experimental results validate the efficacy of our approach, demonstrating exceptional performance in supervised learning while maintaining strong generalization in zero-shot scenarios.}
    }
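    Illustrative note: the sketch below is a generic residual bottleneck adapter, the standard parameter-efficient fine-tuning building block that adapter-based methods start from. It is only that generic block; the paper's TED-Adapter with temporal enhancement and difference modelling, and its multi-task decoder, are not reproduced.
    import torch
    import torch.nn as nn
    class BottleneckAdapter(nn.Module):
        """Down-project, non-linearity, up-project, added back residually to frozen features."""
        def __init__(self, dim, reduction=4):
            super().__init__()
            self.down = nn.Linear(dim, dim // reduction)
            self.act = nn.GELU()
            self.up = nn.Linear(dim // reduction, dim)
            nn.init.zeros_(self.up.weight)   # start as an identity mapping
            nn.init.zeros_(self.up.bias)
        def forward(self, x):
            return x + self.up(self.act(self.down(x)))
    adapter = BottleneckAdapter(dim=512)
    print(adapter(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])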
  • Bofeng Jiang, Jun Chen, and Yong Liu. Single-Shot Pruning and Quantization for Hardware-Friendly Neural Network Acceleration. Engineering Applications of Artificial Intelligence, 126:106816, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    Applying CNN on embedded systems is challenging due to model size limitations. Pruning and quantization can help, but are time-consuming to apply separately. Our Single-Shot Pruning and Quantization strategy addresses these issues by quantizing and pruning in a single process. We evaluated our method on CIFAR-10 and CIFAR-100 datasets for image classification. Our model is 69.4% smaller with little accuracy loss, and runs 6-8 times faster on NVIDIA Xavier NX hardware.
    @article{jiang2023ssp,
    title = {Single-Shot Pruning and Quantization for Hardware-Friendly Neural Network Acceleration},
    author = {Bofeng Jiang and Jun Chen and Yong Liu},
    year = 2023,
    journal = {Engineering Applications of Artificial Intelligence},
    volume = 126,
    pages = {106816},
    doi = {10.1016/j.engappai.2023.106816},
    abstract = {Applying CNN on embedded systems is challenging due to model size limitations. Pruning and quantization can help, but are time-consuming to apply separately. Our Single-Shot Pruning and Quantization strategy addresses these issues by quantizing and pruning in a single process. We evaluated our method on CIFAR-10 and CIFAR-100 datasets for image classification. Our model is 69.4% smaller with little accuracy loss, and runs 6-8 times faster on NVIDIA Xavier NX hardware.}
    }
  • Jun Chen, Shipeng Bai, Tianxin Huang, Mengmeng Wang, Guanzhong Tian, and Yong Liu. Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning. Pattern Recognition, 143:109780, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    Neural network quantization is a very promising solution in the field of model compression, but its resulting accuracy highly depends on a training/fine-tuning process and requires the original data. This not only brings heavy computation and time costs but also is not conducive to privacy and sensitive information protection. Therefore, a few recent works are starting to focus on data-free quantization. However, data free quantization does not perform well while dealing with ultra-low precision quantization. Although researchers utilize generative methods of synthetic data to address this problem partially, data synthesis needs to take a lot of computation and time. In this paper, we propose a data-free mixed-precision compensation (DF-MPC) method to recover the performance of an ultra-low precision quantized model without any data and fine-tuning process. By assuming the quantized error caused by a low-precision quantized layer can be restored via the reconstruction of a high-precision quantized layer, we mathematically formulate the reconstruction loss between the pre-trained full-precision model and its layer-wise mixed-precision quantized model. Based on our formulation, we theoretically deduce the closed-form solution by minimizing the reconstruction loss of the feature maps. Since DF-MPC does not require any original/synthetic data, it is a more efficient method to approximate the full-precision model. Experimentally, our DF-MPC is able to achieve higher accuracy for an ultra-low precision quantized model compared to the recent methods without any data and fine-tuning process.
    @article{chen2023dfq,
    title = {Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning},
    author = {Jun Chen and Shipeng Bai and Tianxin Huang and Mengmeng Wang and Guanzhong Tian and Yong Liu},
    year = 2023,
    journal = {Pattern Recognition},
    volume = 143,
    pages = {109780},
    doi = {10.1016/j.patcog.2023.109780},
    abstract = {Neural network quantization is a very promising solution in the field of model compression, but its resulting accuracy highly depends on a training/fine-tuning process and requires the original data. This not only brings heavy computation and time costs but also is not conducive to privacy and sensitive information protection. Therefore, a few recent works are starting to focus on data-free quantization. However, data free quantization does not perform well while dealing with ultra-low precision quantization. Although researchers utilize generative methods of synthetic data to address this problem partially, data synthesis needs to take a lot of computation and time. In this paper, we propose a data-free mixed-precision compensation (DF-MPC) method to recover the performance of an ultra-low precision quantized model without any data and fine-tuning process. By assuming the quantized error caused by a low-precision quantized layer can be restored via the reconstruction of a high-precision quantized layer, we mathematically formulate the reconstruction loss between the pre-trained full-precision model and its layer-wise mixed-precision quantized model. Based on our formulation, we theoretically deduce the closed-form solution by minimizing the reconstruction loss of the feature maps. Since DF-MPC does not require any original/synthetic data, it is a more efficient method to approximate the full-precision model. Experimentally, our DF-MPC is able to achieve higher accuracy for an ultra-low precision quantized model compared to the recent methods without any data and fine-tuning process.}
    }
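    Illustrative note: the sketch below is plain per-tensor symmetric uniform quantization of a weight tensor (quantize, then dequantize to expose the error), the kind of low-precision quantizer whose error DF-MPC compensates. The closed-form mixed-precision compensation derived in the paper is not reproduced.
    import torch
    def quantize_weights(w, n_bits=4):
        """Per-tensor symmetric uniform quantization; returns fake-quantized weights and the step size."""
        qmax = 2 ** (n_bits - 1) - 1
        step = w.abs().max() / qmax
        w_q = torch.clamp(torch.round(w / step), -qmax - 1, qmax)
        return w_q * step, step
    w = torch.randn(64, 32)
    w_deq, step = quantize_weights(w, n_bits=2)
    print(step.item(), (w - w_deq).pow(2).mean().item())  # error grows at ultra-low bit-width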
  • Mengmeng Wang, Jiazheng Xing, Jing Su, Jun Chen, and Yong Liu. Learning SpatioTemporal and Motion Features in a Unified 2D Network for Action Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45:3347-3362, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    Recent methods for action recognition always apply 3D Convolutional Neural Networks (CNNs) to extract spatiotemporal features and introduce optical flows to present motion features. Although achieving state-of-the-art performance, they are expensive in both time and space. In this paper, we propose to represent both the two kinds of features in a unified 2D CNN without any 3D convolution or optical flows calculation. In particular, we first design a channel-wise spatiotemporal module to present the spatiotemporal features and a channel-wise motion module to encode feature-level motion features efficiently. Secondly, we combine these two modules and an identity mapping path into one united block that can easily replace the original residual block in the ResNet architecture, forming a simple yet effective network termed STM network by introducing very limited extra computation cost and parameters. Thirdly, we propose a novel Twins Training framework for action recognition by incorporating a correlation loss to optimize the inter-class and intra-class correlation and a siamese structure to fully stretch the training data. We extensively validate the proposed STM on both temporal-related datasets (i.e., Something-Something v1 & v2) and scene-related datasets (i.e., Kinetics-400, UCF-101, and HMDB-51). It achieves favorable results against state-of-the-art methods in all the datasets.
    @article{wang2022lsm,
    title = {Learning SpatioTemporal and Motion Features in a Unified 2D Network for Action Recognition},
    author = {Mengmeng Wang and Jiazheng Xing and Jing Su and Jun Chen and Yong Liu},
    year = 2023,
    journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
    volume = 45,
    pages = {3347-3362},
    doi = {10.1109/TPAMI.2022.3173658},
    abstract = {Recent methods for action recognition always apply 3D Convolutional Neural Networks (CNNs) to extract spatiotemporal features and introduce optical flows to present motion features. Although achieving state-of-the-art performance, they are expensive in both time and space. In this paper, we propose to represent both the two kinds of features in a unified 2D CNN without any 3D convolution or optical flows calculation. In particular, we first design a channel-wise spatiotemporal module to present the spatiotemporal features and a channel-wise motion module to encode feature-level motion features efficiently. Secondly, we combine these two modules and an identity mapping path into one united block that can easily replaces the original residual block in the ResNet architecture, forming a simple yet effective network termed STM network by introducing very limited extra computation cost and parameters. Thirdly, we propose a novel Twins Training framework for action recognition by incorporating a correlation loss to optimize the inter-class and intra-class correlation and a siamese structure to fully stretch the training data. We extensively validate the proposed STM on both temporal-related datasets (i.e., Something-Something v1 \& v2) and scene-related datasets (i.e., Kinetics-400, UCF-101, and HMDB-51). It achieves favorable results against state-of-the-art methods in all the datasets.}
    }
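    Illustrative note: feature-level motion cues of the kind the channel-wise motion module encodes can be approximated, at their simplest, by differencing features of adjacent frames; the sketch below shows only that bare idea with a hypothetical tensor layout. The actual STM modules and the Twins Training framework are not included.
    import torch
    def temporal_difference(features):
        """features: (B, T, C, H, W) frame features -> adjacent-frame differences, zero-padded to length T."""
        diff = features[:, 1:] - features[:, :-1]
        pad = torch.zeros_like(features[:, :1])
        return torch.cat([diff, pad], dim=1)
    x = torch.randn(2, 8, 64, 14, 14)
    print(temporal_difference(x).shape)  # torch.Size([2, 8, 64, 14, 14])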
  • Jingyang Xiang, Siqi Li, Jun Chen, Shipeng Bai, Yukai Ma, Guang Dai, and Yong Liu. SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading Acceleration. In 37th Conference on Neural Information Processing Systems (NeurIPS), pages 52033-52050, 2023.
    [BibTeX] [Abstract] [PDF]
    The study of sparsity in Convolutional Neural Networks (CNNs) has become widespread to compress and accelerate models in environments with limited resources. By constraining N consecutive weights along the output channel to be group-wise non-zero, the recent network with 1×N sparsity has received tremendous popularity for its three outstanding advantages: 1) A large amount of storage space saving by a Block Sparse Row matrix. 2) Excellent performance at a high sparsity. 3) Significant speedups on CPUs with Advanced Vector Extensions. Recent work requires selecting and fine-tuning 1×N sparse weights based on dense pre-trained weights, leading to the problems such as expensive training cost and memory access, sub-optimal model quality, as well as unbalanced workload across threads (different sparsity across output channels). To overcome them, this paper proposes a novel Soft Uniform Block Pruning (SUBP) approach to train a uniform 1×N sparse structured network from scratch. Specifically, our approach tends to repeatedly allow pruned blocks to regrow to the network based on block angular redundancy and importance sampling in a uniform manner throughout the training process. It not only makes the model less dependent on pre-training, reduces the model redundancy and the risk of pruning the important blocks permanently but also achieves balanced workload. Empirically, on ImageNet, comprehensive experiments across various CNN architectures show that our SUBP consistently outperforms existing 1×N and structured sparsity methods based on pre-trained models or training from scratch. Source codes and models are available at https://github.com/JingyangXiang/SUBP.
    @inproceedings{xiang2023subp,
    title = {SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading Acceleration},
    author = {Jingyang Xiang and Siqi Li and Jun Chen and Shipeng Bai and Yukai Ma and Guang Dai and Yong Liu},
    year = 2023,
    booktitle = {37th Conference on Neural Information Processing Systems (NeurIPS)},
    pages = {52033-52050},
    abstract = {The study of sparsity in Convolutional Neural Networks (CNNs) has become widespread to compress and accelerate models in environments with limited resources. By constraining N consecutive weights along the output channel to be group-wise non-zero, the recent network with 1×N sparsity has received tremendous popularity for its three outstanding advantages: 1) A large amount of storage space saving by a Block Sparse Row matrix. 2) Excellent performance at a high sparsity. 3) Significant speedups on CPUs with Advanced Vector Extensions. Recent work requires selecting and fine-tuning 1×N sparse weights based on dense pre-trained weights, leading to the problems such as expensive training cost and memory access, sub-optimal model quality, as well as unbalanced workload across threads (different sparsity across output channels). To overcome them, this paper proposes a novel Soft Uniform Block Pruning (SUBP) approach to train a uniform 1×N sparse structured network from scratch. Specifically, our approach tends to repeatedly allow pruned blocks to regrow to the network based on block angular redundancy and importance sampling in a uniform manner throughout the training process. It not only makes the model less dependent on pre-training, reduces the model redundancy and the risk of pruning the important blocks permanently but also achieves balanced workload. Empirically, on ImageNet, comprehensive experiments across various CNN architectures show that our SUBP consistently outperforms existing 1×N and structured sparsity methods based on pre-trained models or training from scratch. Source codes and models are available at https://github.com/JingyangXiang/SUBP.}
    }
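    Illustrative note: the sketch below prunes a weight matrix in blocks of N consecutive elements ranked by L1 norm, a one-shot magnitude baseline that conveys why block-wise (1xN-style) sparsity keeps memory access regular. The blocking axis here is a simplification, and the soft, regrowth-based training procedure proposed in SUBP is not reproduced.
    import torch
    def block_prune(weight, n=4, sparsity=0.5):
        """Zero whole blocks of n consecutive input-dimension weights with the smallest L1 norms."""
        c_out, c_in = weight.shape
        assert c_in % n == 0
        blocks = weight.reshape(c_out, c_in // n, n)
        scores = blocks.abs().sum(dim=-1)                  # L1 norm per block
        k = int(scores.numel() * sparsity)                 # number of blocks to prune
        threshold = scores.flatten().kthvalue(k).values
        mask = (scores > threshold).unsqueeze(-1).float()  # keep blocks above the threshold
        return (blocks * mask).reshape(c_out, c_in)
    w = torch.randn(16, 64)
    print((block_prune(w, n=4, sparsity=0.5) == 0).float().mean().item())  # about 0.5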
  • Xintian Shen, Jiangning Zhang, Jun Chen, Shipeng Bai, Yue Han, Yabiao Wang, Chengjie Wang, and Yong Liu. Learning Global-Aware Kernel for Image Harmonization. In 19th IEEE/CVF International Conference on Computer Vision (ICCV), pages 7501-7510, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    Image harmonization aims to solve the visual inconsistency problem in composited images by adaptively adjusting the foreground pixels with the background as references. Existing methods employ local color transformation or region matching between foreground and background, which neglects powerful proximity prior and independently distinguishes fore-/back-ground as a whole part for harmonization. As a result, they still show a limited performance across varied foreground objects and scenes. To address this issue, we propose a novel Global-aware Kernel Network (GKNet) to harmonize local regions with comprehensive consideration of long-distance background references. Specifically, GKNet includes two parts, i.e., harmony kernel prediction and harmony kernel modulation branches. The former includes a Long-distance Reference Extractor (LRE) to obtain long-distance context and Kernel Prediction Blocks (KPB) to predict multi-level harmony kernels by fusing global information with local features. To achieve this goal, a novel Selective Correlation Fusion (SCF) module is proposed to better select relevant long-distance background references for local harmonization. The latter employs the predicted kernels to harmonize foreground regions with local and global awareness. Abundant experiments demonstrate the superiority of our method for image harmonization over state-of-the-art methods, e.g., achieving 39.53dB PSNR that surpasses the best counterpart by +0.78dB ↑; decreasing fMSE/MSE by 11.5%↓/6.7%↓ compared with the SoTA method. Code will be available at here.
    @inproceedings{shen2023lga,
    title = {Learning Global-Aware Kernel for Image Harmonization},
    author = {Xintian Shen and Jiangning Zhang and Jun Chen and Shipeng Bai and Yue Han and Yabiao Wang and Chengjie Wang and Yong Liu},
    year = 2023,
    booktitle = {19th IEEE/CVF International Conference on Computer Vision (ICCV)},
    pages = {7501-7510},
    doi = {10.1109/ICCV51070.2023.00693},
    abstract = {Image harmonization aims to solve the visual inconsistency problem in composited images by adaptively adjusting the foreground pixels with the background as references. Existing methods employ local color transformation or region matching between foreground and background, which neglects powerful proximity prior and independently distinguishes fore-/back-ground as a whole part for harmonization. As a result, they still show a limited performance across varied foreground objects and scenes. To address this issue, we propose a novel Global-aware Kernel Net-work (GKNet) to harmonize local regions with comprehensive consideration of long-distance background references. Specifically, GKNet includes two parts, i.e., harmony kernel prediction and harmony kernel modulation branches. The former includes a Long-distance Reference Extractor (LRE) to obtain long-distance context and Kernel Prediction Blocks (KPB) to predict multi-level harmony kernels by fusing global information with local features. To achieve this goal, a novel Selective Correlation Fusion (SCF) module is proposed to better select relevant long-distance background references for local harmonization. The latter employs the predicted kernels to harmonize foreground regions with local and global awareness. Abundant experiments demonstrate the superiority of our method for image harmonization over state-of-the-art methods, e.g., achieving 39.53dB PSNR that surpasses the best counterpart by +0.78dB ↑; decreasing fMSE/MSE by 11.5%↓/6.7%↓ compared with the SoTA method. Code will be available at here.}
    }
  • Shipeng Bai, Jun Chen, Xintian Shen, Yixuan Qian, and Yong Liu. Unified Data-Free Compression: Pruning and Quantization without Fine-Tuning. In 19th IEEE/CVF International Conference on Computer Vision (ICCV), pages 5853-5862, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    Structured pruning and quantization are promising approaches for reducing the inference time and memory footprint of neural networks. However, most existing methods require the original training dataset to fine-tune the model. This not only brings heavy resource consumption but also is not possible for applications with sensitive or proprietary data due to privacy and security concerns. Therefore, a few data-free methods are proposed to address this problem, but they perform data-free pruning and quantization separately, which does not explore the complementarity of pruning and quantization. In this paper, we propose a novel framework named Unified Data-Free Compression(UDFC), which performs pruning and quantization simultaneously without any data and fine-tuning process. Specifically, UDFC starts with the assumption that the partial information of a damaged(e.g., pruned or quantized) channel can be preserved by a linear combination of other channels, and then derives the reconstruction form from the assumption to restore the information loss due to compression. Finally, we formulate the reconstruction error between the original network and its compressed network, and theoretically deduce the closed-form solution. We evaluate the UDFC on the large-scale image classification task and obtain significant improvements over various network architectures and compression methods. For example, we achieve a 20.54% accuracy improvement on ImageNet dataset compared to SOTA method with 30% pruning ratio and 6-bit quantization on ResNet-34. Code will be available at here.
    @inproceedings{bai2023udf,
    title = {Unified Data-Free Compression: Pruning and Quantization without Fine-Tuning},
    author = {Shipeng Bai and Jun Chen and Xintian Shen and Yixuan Qian and Yong Liu},
    year = 2023,
    booktitle = {19th IEEE/CVF International Conference on Computer Vision (ICCV)},
    pages = {5853-5862},
    doi = {10.1109/ICCV51070.2023.00540},
    abstract = {Structured pruning and quantization are promising approaches for reducing the inference time and memory footprint of neural networks. However, most existing methods require the original training dataset to fine-tune the model. This not only brings heavy resource consumption but also is not possible for applications with sensitive or proprietary data due to privacy and security concerns. Therefore, a few data-free methods are proposed to address this problem, but they perform data-free pruning and quantization separately, which does not explore the complementarity of pruning and quantization. In this paper, we propose a novel framework named Unified Data-Free Compression(UDFC), which performs pruning and quantization simultaneously without any data and fine-tuning process. Specifically, UDFC starts with the assumption that the partial information of a damaged(e.g., pruned or quantized) channel can be preserved by a linear combination of other channels, and then derives the reconstruction form from the assumption to restore the information loss due to compression. Finally, we formulate the reconstruction error between the original network and its compressed network, and theoretically deduce the closed-form solution. We evaluate the UDFC on the large-scale image classification task and obtain significant improvements over various network architectures and compression methods. For example, we achieve a 20.54% accuracy improvement on ImageNet dataset compared to SOTA method with 30% pruning ratio and 6-bit quantization on ResNet-34. Code will be available at here.}
    }
  • Tianxin Huang, Jun Chen, Jiangning Zhang, Yong Liu, and Jie Liang. Fast Point Cloud Sampling Network. Pattern Recognition Letters, 2022.
    [BibTeX] [Abstract] [DOI]
    The increasing number of points in 3D point clouds has brought great challenges for subsequent algorithm efficiencies. Down-sampling algorithms are adopted to simplify the data and accelerate the computation. Except the well-known random sampling and farthest distance sampling, some recent works have tried to learn a sampling pattern according to the downstream task, which helps generate sampled points by fully-connected networks with fixed output point numbers. In this condition, a progress-net structure covering all resolutions sampling networks or multiple separate sampling networks for different resolutions are required, which is inconvenient. In this work, we propose a novel learning-based point cloud sampling framework, named Fast point cloud sampling network (FPN), which drives initial randomly sampled points to better positions instead of generating coordinates. FPN can be used to sample points clouds to any resolution once trained by changing the number of initial randomly sampled points. Results on point cloud reconstruction and recognition confirm that FPN can reach state-of-the-art performances with much higher sampling efficiency than most existing sampling methods.
    @article{huang2022fast,
    title = {Fast Point Cloud Sampling Network},
    author = {Tianxin Huang and Jun Chen and Jiangning Zhang and Yong Liu and Jie Liang},
    year = 2022,
    journal = {Pattern Recognition Letters},
    doi = {10.1016/j.patrec.2022.11.006},
    abstract = {The increasing number of points in 3D point clouds has brought great challenges for subsequent algorithm efficiencies. Down-sampling algorithms are adopted to simplify the data and accelerate the computation. Except the well-known random sampling and farthest distance sampling, some recent works have tried to learn a sampling pattern according to the downstream task, which helps generate sampled points by fully-connected networks with fixed output point numbers. In this condition, a progress-net structure covering all resolutions sampling networks or multiple separate sampling networks for different resolutions are required, which is inconvenient. In this work, we propose a novel learning-based point cloud sampling framework, named Fast point cloud sampling network (FPN), which drives initial randomly sampled points to better positions instead of generating coordinates. FPN can be used to sample points clouds to any resolution once trained by changing the number of initial randomly sampled points. Results on point cloud reconstruction and recognition confirm that FPN can reach state-of-the-art performances with much higher sampling efficiency than most existing sampling methods.}
    }
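    Illustrative note: the sketch below is the classical farthest point sampling baseline mentioned in the abstract, implemented greedily in NumPy. It is the standard algorithm, not the learned sampler proposed in the paper.
    import numpy as np
    def farthest_point_sampling(points, n_samples):
        """Greedy FPS: pick n_samples indices from points (N, 3) that are approximately maximally spread out."""
        n = points.shape[0]
        selected = np.zeros(n_samples, dtype=np.int64)     # first pick is point 0 (arbitrary)
        dist = np.full(n, np.inf)
        for i in range(1, n_samples):
            d = np.sum((points - points[selected[i - 1]]) ** 2, axis=1)
            dist = np.minimum(dist, d)                     # distance to the already-selected set
            selected[i] = int(np.argmax(dist))
        return selected
    pts = np.random.default_rng(0).random((2048, 3))
    idx = farthest_point_sampling(pts, 256)
    print(idx.shape, len(set(idx.tolist())))  # (256,) 256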
  • Guanzhong Tian, Yiran Sun, Yuang Liu, Xianfang Zeng, Mengmeng Wang, Yong Liu, Jiangning Zhang, and Jun Chen. Adding before Pruning: Sparse Filter Fusion for Deep Convolutional Neural Networks via Auxiliary Attention. IEEE Transactions on Neural Networks and Learning Systems, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    Filter pruning is a significant feature selection technique to shrink the existing feature fusion schemes (especially on convolution calculation and model size), which helps to develop more efficient feature fusion models while maintaining state-of-the-art performance. In addition, it reduces the storage and computation requirements of deep neural networks (DNNs) and accelerates the inference process dramatically. Existing methods mainly rely on manual constraints such as normalization to select the filters. A typical pipeline comprises two stages: first pruning the original neural network and then fine-tuning the pruned model. However, choosing a manual criterion can be somehow tricky and stochastic. Moreover, directly regularizing and modifying filters in the pipeline suffer from being sensitive to the choice of hyperparameters, thus making the pruning procedure less robust. To address these challenges, we propose to handle the filter pruning issue through one stage: using an attention-based architecture that adaptively fuses the filter selection with filter learning in a unified network. Specifically, we present a pruning method named adding before pruning (ABP) to make the model focus on the filters of higher significance by training instead of man-made criteria such as norm, rank, etc. First, we add an auxiliary attention layer into the original model and set the significance scores in this layer to be binary. Furthermore, to propagate the gradients in the auxiliary attention layer, we design a specific gradient estimator and prove its effectiveness for convergence in the graph flow through mathematical derivation. In the end, to relieve the dependence on the complicated prior knowledge for designing the thresholding criterion, we simultaneously prune and train the filters to automatically eliminate network redundancy with recoverability. Extensive experimental results on the two typical image classification benchmarks, CIFAR-10 and ILSVRC-2012, illustrate that the proposed approach performs favorably against previous state-of-the-art filter pruning algorithms.
    @article{tian2021abp,
    title = {Adding before Pruning: Sparse Filter Fusion for Deep Convolutional Neural Networks via Auxiliary Attention},
    author = {Guanzhong Tian and Yiran Sun and Yuang Liu and Xianfang Zeng and Mengmeng Wang and Yong Liu and Jiangning Zhang and Jun Chen},
    year = 2021,
    journal = {IEEE Transactions on Neural Networks and Learning Systems},
    doi = {10.1109/TNNLS.2021.3106917},
    abstract = {Filter pruning is a significant feature selection technique to shrink the existing feature fusion schemes (especially on convolution calculation and model size), which helps to develop more efficient feature fusion models while maintaining state-of-the-art performance. In addition, it reduces the storage and computation requirements of deep neural networks (DNNs) and accelerates the inference process dramatically. Existing methods mainly rely on manual constraints such as normalization to select the filters. A typical pipeline comprises two stages: first pruning the original neural network and then fine-tuning the pruned model. However, choosing a manual criterion can be somehow tricky and stochastic. Moreover, directly regularizing and modifying filters in the pipeline suffer from being sensitive to the choice of hyperparameters, thus making the pruning procedure less robust. To address these challenges, we propose to handle the filter pruning issue through one stage: using an attention-based architecture that adaptively fuses the filter selection with filter learning in a unified network. Specifically, we present a pruning method named adding before pruning (ABP) to make the model focus on the filters of higher significance by training instead of man-made criteria such as norm, rank, etc. First, we add an auxiliary attention layer into the original model and set the significance scores in this layer to be binary. Furthermore, to propagate the gradients in the auxiliary attention layer, we design a specific gradient estimator and prove its effectiveness for convergence in the graph flow through mathematical derivation. In the end, to relieve the dependence on the complicated prior knowledge for designing the thresholding criterion, we simultaneously prune and train the filters to automatically eliminate network redundancy with recoverability. Extensive experimental results on the two typical image classification benchmarks, CIFAR-10 and ILSVRC-2012, illustrate that the proposed approach performs favorably against previous state-of-the-art filter pruning algorithms.}
    }
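    Illustrative note: the binary significance scores in the abstract need a surrogate gradient to be trainable; the sketch below shows the simplest such construction, a hard 0/1 gate with a straight-through (identity) gradient. This is only the generic straight-through baseline, not the specific estimator designed in the paper.
    import torch
    class BinaryGateSTE(torch.autograd.Function):
        """Hard 0/1 gate in the forward pass, straight-through gradient in the backward pass."""
        @staticmethod
        def forward(ctx, scores):
            return (scores > 0).float()
        @staticmethod
        def backward(ctx, grad_output):
            return grad_output            # pass the gradient straight through the threshold
    scores = torch.randn(8, requires_grad=True)   # one learnable score per filter
    gates = BinaryGateSTE.apply(scores)           # binary keep/prune decisions
    (gates * torch.randn(8)).sum().backward()     # stand-in for the task loss
    print(gates, scores.grad)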
  • Guanzhong Tian, Jun Chen, Xianfang Zeng, and Yong Liu. Pruning by Training: A Novel Deep Neural Network Compression Framework for Image Processing. IEEE Signal Processing Letters, 28:344–348, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    Filter pruning for a pre-trained convolutional neural network is most normally performed through human-made constraints or criteria such as norms, ranks, etc. Typically, the pruning pipeline comprises two-stage: first learn a sparse structure from the original model, then optimize the weights in the new prune model. One disadvantage of using human-made criteria to prune filters is that the design and selection of threshold criteria depend on complicated prior knowledge. Besides, the pruning process is less robust due to the impact of directly regularizing on filters. To address the problems mentioned, we propose an effective one-stage pruning framework: introducing a trainable collaborative layer to jointly prune and learn neural networks in one go. In our framework, we first add a binary collaborative layer for each original filter. Then, a new type of gradient estimator – asymptotic gradient estimator is first introduced to pass the gradient in the binary collaborative layer. Finally, we simultaneously learn the sparse structure and optimize the weights from the original model in the training process. Our evaluation results on typical benchmarks, CIFAR and ImageNet, demonstrate very promising results against other state-of-the-art filter pruning methods.
    @article{tian2021pbt,
    title = {Pruning by Training: A Novel Deep Neural Network Compression Framework for Image Processing},
    author = {Guanzhong Tian and Jun Chen and Xianfang Zeng and Yong Liu},
    year = 2021,
    journal = {IEEE Signal Processing Letters},
    volume = 28,
    pages = {344--348},
    doi = {10.1109/LSP.2021.3054315},
    abstract = {Filter pruning for a pre-trained convolutional neural network is most normally performed through human-made constraints or criteria such as norms, ranks, etc. Typically, the pruning pipeline comprises two-stage: first learn a sparse structure from the original model, then optimize the weights in the new prune model. One disadvantage of using human-made criteria to prune filters is that the design and selection of threshold criteria depend on complicated prior knowledge. Besides, the pruning process is less robust due to the impact of directly regularizing on filters. To address the problems mentioned, we propose an effective one-stage pruning framework: introducing a trainable collaborative layer to jointly prune and learn neural networks in one go. In our framework, we first add a binary collaborative layer for each original filter. Then, a new type of gradient estimator - asymptotic gradient estimator is first introduced to pass the gradient in the binary collaborative layer. Finally, we simultaneously learn the sparse structure and optimize the weights from the original model in the training process. Our evaluation results on typical benchmarks, CIFAR and ImageNet, demonstrate very promising results against other state-of-the-art filter pruning methods.}
    }
  • Jun Chen, Liang Liu, Yong Liu, and Xianfang Zeng. A Learning Framework for n-Bit Quantized Neural Networks Toward FPGAs. IEEE Transactions on Neural Networks and Learning Systems, 32:1067–1081, 2021.
    [BibTeX] [Abstract] [DOI] [arXiv] [PDF]
    The quantized neural network (QNN) is an efficient approach for network compression and can be widely used in the implementation of field-programmable gate arrays (FPGAs). This article proposes a novel learning framework for $n$ -bit QNNs, whose weights are constrained to the power of two. To solve the gradient vanishing problem, we propose a reconstructed gradient function for QNNs in the back-propagation algorithm that can directly get the real gradient rather than estimating an approximate gradient of the expected loss. We also propose a novel QNN structure named $n$ -BQ-NN, which uses shift operation to replace the multiply operation and is more suitable for the inference on FPGAs. Furthermore, we also design a shift vector processing element (SVPE) array to replace all 16-bit multiplications with SHIFT operations in convolution operation on FPGAs. We also carry out comparable experiments to evaluate our framework. The experimental results show that the quantized models of ResNet, DenseNet, and AlexNet through our learning framework can achieve almost the same accuracies with the original full-precision models. Moreover, when using our learning framework to train our $n$ -BQ-NN from scratch, it can achieve state-of-the-art results compared with typical low-precision QNNs. Experiments on Xilinx ZCU102 platform show that our $n$ -BQ-NN with our SVPE can execute 2.9 times faster than that with the vector processing element (VPE) in inference. As the SHIFT operation in our SVPE array will not consume digital signal processing (DSP) resources on FPGAs, the experiments have shown that the use of SVPE array also reduces average energy consumption to 68.7% of the VPE array with 16 bit.
    @article{chen2021alf,
    title = {A Learning Framework for n-Bit Quantized Neural Networks Toward FPGAs},
    author = {Jun Chen and Liang Liu and Yong Liu and Xianfang Zeng},
    year = 2021,
    journal = {IEEE Transactions on Neural Networks and Learning Systems},
    volume = 32,
    pages = {1067--1081},
    doi = {10.1109/TNNLS.2020.2980041},
    abstract = {The quantized neural network (QNN) is an efficient approach for network compression and can be widely used in the implementation of field-programmable gate arrays (FPGAs). This article proposes a novel learning framework for  $n$ -bit QNNs, whose weights are constrained to the power of two. To solve the gradient vanishing problem, we propose a reconstructed gradient function for QNNs in the back-propagation algorithm that can directly get the real gradient rather than estimating an approximate gradient of the expected loss. We also propose a novel QNN structure named  $n$ -BQ-NN, which uses shift operation to replace the multiply operation and is more suitable for the inference on FPGAs. Furthermore, we also design a shift vector processing element (SVPE) array to replace all 16-bit multiplications with SHIFT operations in convolution operation on FPGAs. We also carry out comparable experiments to evaluate our framework. The experimental results show that the quantized models of ResNet, DenseNet, and AlexNet through our learning framework can achieve almost the same accuracies with the original full-precision models. Moreover, when using our learning framework to train our  $n$ -BQ-NN from scratch, it can achieve state-of-the-art results compared with typical low-precision QNNs. Experiments on Xilinx ZCU102 platform show that our  $n$ -BQ-NN with our SVPE can execute 2.9 times faster than that with the vector processing element (VPE) in inference. As the SHIFT operation in our SVPE array will not consume digital signal processing (DSP) resources on FPGAs, the experiments have shown that the use of SVPE array also reduces average energy consumption to 68.7% of the VPE array with 16 bit.},
    arxiv = {http://arxiv.org/pdf/2004.02396}
    }
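    To make the power-of-two idea in the entry above concrete, here is a minimal, hypothetical Python sketch (not the paper's code or its SVPE design): it rounds weights to signed powers of two and evaluates a dot product with bit shifts instead of multiplications. The rounding rule, value ranges, and function names are assumptions for illustration only.

    import numpy as np

    def quantize_pow2(w):
        """Round each weight to the nearest signed power of two (illustrative rule, not the paper's)."""
        w = np.asarray(w, dtype=np.float64)
        sign = np.where(w < 0, -1, 1)
        exp = np.round(np.log2(np.abs(w) + 1e-12)).astype(int)
        return sign, exp, sign * np.exp2(exp)

    def shift_mul(x_int, exp):
        """x * 2^exp using shifts only; a negative exponent becomes a right shift,
        which truncates here much as fixed-point hardware would."""
        return x_int << exp if exp >= 0 else x_int >> (-exp)

    signs, exps, w_q = quantize_pow2([0.31, -0.12, 0.06, -0.9])
    acts = [7, 3, 12, 5]                      # toy integer activations
    # Multiply-free accumulation: each product is a shift plus a sign flip.
    y = sum(int(s) * shift_mul(a, int(e)) for s, e, a in zip(signs, exps, acts))
    print(w_q, y)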
  • Jun Chen, Yong Liu, Hao Zhang, Shengnan Hou, and Jian Yang. Propagating Asymptotic-Estimated Gradients for Low Bitwidth Quantized Neural Networks. IEEE Journal of Selected Topics in Signal Processing, 14:848–859, 2020.
    [BibTeX] [Abstract] [DOI] [arXiv] [PDF]
    Quantized neural networks (QNNs) can be useful for neural network acceleration and compression, but during the training process they pose a challenge: how to propagate the gradient of the loss function through a graph whose derivative is 0 almost everywhere. In response to this non-differentiable situation, we propose a novel Asymptotic-Quantized Estimator (AQE) to estimate the gradient. In particular, during back-propagation the graph that relates inputs to output remains smooth and differentiable. At the end of training, the weights and activations have been quantized to low precision because of the asymptotic behaviour of the AQE. Meanwhile, we propose an M-bit Inputs and N-bit Weights Network (MINW-Net) trained by the AQE, a quantized neural network with 1–3-bit weights and activations. In the inference phase, we can use XNOR or SHIFT operations instead of convolution operations to accelerate MINW-Net. Our experiments on the CIFAR datasets demonstrate that our AQE is well defined and that QNNs with the AQE perform better than those with the Straight-Through Estimator (STE). For example, for the same ConvNet with 1-bit weights and activations, our MINW-Net with the AQE achieves a prediction accuracy 1.5% higher than the Binarized Neural Network (BNN) with the STE. MINW-Net, trained from scratch by the AQE, achieves classification accuracy comparable to its 32-bit counterparts on the CIFAR test sets. Extensive experimental results on the ImageNet dataset show the great superiority of the proposed AQE, and our MINW-Net achieves results comparable with other state-of-the-art QNNs.
    @article{chen2020propagatingag,
    title = {Propagating Asymptotic-Estimated Gradients for Low Bitwidth Quantized Neural Networks},
    author = {Jun Chen and Yong Liu and Hao Zhang and Shengnan Hou and Jian Yang},
    year = 2020,
    journal = {IEEE Journal of Selected Topics in Signal Processing},
    volume = 14,
    pages = {848--859},
    doi = {10.1109/JSTSP.2020.2966327},
    abstract = {Quantized neural networks (QNNs) can be useful for neural network acceleration and compression, but during the training process they pose a challenge: how to propagate the gradient of the loss function through a graph whose derivative is 0 almost everywhere. In response to this non-differentiable situation, we propose a novel Asymptotic-Quantized Estimator (AQE) to estimate the gradient. In particular, during back-propagation the graph that relates inputs to output remains smooth and differentiable. At the end of training, the weights and activations have been quantized to low precision because of the asymptotic behaviour of the AQE. Meanwhile, we propose an M-bit Inputs and N-bit Weights Network (MINW-Net) trained by the AQE, a quantized neural network with 1–3-bit weights and activations. In the inference phase, we can use XNOR or SHIFT operations instead of convolution operations to accelerate MINW-Net. Our experiments on the CIFAR datasets demonstrate that our AQE is well defined and that QNNs with the AQE perform better than those with the Straight-Through Estimator (STE). For example, for the same ConvNet with 1-bit weights and activations, our MINW-Net with the AQE achieves a prediction accuracy 1.5% higher than the Binarized Neural Network (BNN) with the STE. MINW-Net, trained from scratch by the AQE, achieves classification accuracy comparable to its 32-bit counterparts on the CIFAR test sets. Extensive experimental results on the ImageNet dataset show the great superiority of the proposed AQE, and our MINW-Net achieves results comparable with other state-of-the-art QNNs.},
    arxiv = {http://arxiv.org/pdf/2003.04296}
    }
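    The contrast between the AQE and the STE described in the entry above can be pictured with a smooth surrogate whose sharpness is annealed during training. The sketch below is only an assumed illustration of that asymptotic idea (a tanh-based 1-bit quantizer with a temperature), not the estimator defined in the paper; the function names and the schedule are hypothetical.

    import numpy as np

    def soft_sign(w, t):
        """Smooth stand-in for sign(w); as the temperature t grows, tanh(t*w) -> sign(w)."""
        return np.tanh(t * w)

    def soft_sign_grad(w, t):
        """Exact derivative of the surrogate, so no straight-through approximation is needed."""
        return t * (1.0 - np.tanh(t * w) ** 2)

    w = np.array([-0.8, -0.1, 0.05, 0.6])
    for t in (1.0, 5.0, 50.0):            # anneal towards a hard 1-bit quantizer
        print(t, soft_sign(w, t).round(3), soft_sign_grad(w, t).round(3))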
  • Jun Chen, Jinhui Zhao, Wei Zhang, and Yong Liu. Tracking an Object over 200 FPS with the Fusion of Prior Probability and Kalman Filter. In 12th International Conference on Machine Learning and Computing (ICMLC), 2020.
    [BibTeX] [Abstract] [DOI] [PDF]
    Efficient object tracking is a challenging problem, as the tracker must distinguish the object with a learned appearance model as quickly as possible. In this paper, a novel robust approach that fuses the prediction of a Kalman filter with a prior probability is proposed for tracking arbitrary objects. Firstly, we obtain an image patch based on the prediction produced by fusing the prior probability and the Kalman filter. Secondly, samples derived from this image patch are described by Histogram of Oriented Gradients (HOG) features and fed into a support vector machine (SVM) to classify the object. Our approach has two advantages: efficient computation and a degree of robustness to interference. Fewer samples are drawn from the image patch than from the full image, which makes the SVM classification more efficient and suppresses interference from outside the patch. Experimentally, we evaluate our approach on a standard tracking benchmark of 50 video sequences and demonstrate our tracker's nearly state-of-the-art performance compared with 5 other trackers. Furthermore, because extracting samples and classifying HOG features is computationally very cheap, our tracker is much faster than these trackers: it achieves over 200 fps on an Intel i3 CPU for tracking an arbitrary object on the benchmark.
    @inproceedings{chen2020trackingao,
    title = {Tracking an Object over 200 FPS with the Fusion of Prior Probability and Kalman Filter},
    author = {Jun Chen and Jinhui Zhao and Wei Zhang and Yong Liu},
    year = 2020,
    booktitle = {12th International Conference on Machine Learning and Computing (ICMLC)},
    doi = {10.1145/3383972.3384011},
    abstract = {Efficient object tracking is a challenging problem, as the tracker must distinguish the object with a learned appearance model as quickly as possible. In this paper, a novel robust approach that fuses the prediction of a Kalman filter with a prior probability is proposed for tracking arbitrary objects. Firstly, we obtain an image patch based on the prediction produced by fusing the prior probability and the Kalman filter. Secondly, samples derived from this image patch are described by Histogram of Oriented Gradients (HOG) features and fed into a support vector machine (SVM) to classify the object. Our approach has two advantages: efficient computation and a degree of robustness to interference. Fewer samples are drawn from the image patch than from the full image, which makes the SVM classification more efficient and suppresses interference from outside the patch. Experimentally, we evaluate our approach on a standard tracking benchmark of 50 video sequences and demonstrate our tracker's nearly state-of-the-art performance compared with 5 other trackers. Furthermore, because extracting samples and classifying HOG features is computationally very cheap, our tracker is much faster than these trackers: it achieves over 200 fps on an Intel i3 CPU for tracking an arbitrary object on the benchmark.}
    }
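    For the prediction step described in the entry above, a constant-velocity Kalman filter is the standard formulation; the snippet below is a generic, assumed sketch of that predict step (the state layout, noise values, and variable names are illustrative, not taken from the paper), used only to show how a predicted position could centre the HOG/SVM search patch.

    import numpy as np

    dt = 1.0                                    # one frame between predictions
    F = np.array([[1, 0, dt, 0],                # state: [x, y, vx, vy]
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], dtype=float)
    Q = 0.01 * np.eye(4)                        # assumed process noise

    def predict(x, P):
        """Standard Kalman prediction: propagate the state and its covariance."""
        return F @ x, F @ P @ F.T + Q

    x = np.array([120.0, 80.0, 3.0, -1.0])      # current estimate (pixels, pixels/frame)
    P = np.eye(4)
    x_pred, P_pred = predict(x, P)
    patch_center = x_pred[:2]                   # crop the search patch around this point
    print(patch_center)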