Jun Chen

PhD Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Address

Room 101, Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Email: junc@zju.edu.cn

Biography

I am pursuing my Ph.D. degree in the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. My research interests include deep learning, network compression, and network quantization.

Research Interests

  • Deep Learning
  • Network Compression
  • Network Quantization

Publications

  • Bofeng Jiang, Jun Chen, and Yong Liu. Single-Shot Pruning and Quantization for Hardware-Friendly Neural Network Acceleration. Engineering Applications of Artificial Intelligence, 126, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    Applying CNN on embedded systems is challenging due to model size limitations. Pruning and quantization can help, but are time-consuming to apply separately. Our Single-Shot Pruning and Quantization strategy addresses these issues by quantizing and pruning in a single process. We evaluated our method on CIFAR-10 and CIFAR-100 datasets for image classification. Our model is 69.4% smaller with little accuracy loss, and runs 6-8 times faster on NVIDIA Xavier NX hardware.
    @article{jiang2023ssp,
    title = {Single-Shot Pruning and Quantization for Hardware-Friendly Neural Network Acceleration},
    author = {Bofeng Jiang and Jun Chen and Yong Liu},
    year = 2023,
    journal = {Engineering Applications of Artificial Intelligence},
    volume = 126,
    doi = {10.1016/j.engappai.2023.106816},
    abstract = {Applying CNN on embedded systems is challenging due to model size limitations. Pruning and quantization can help, but are time-consuming to apply separately. Our Single-Shot Pruning and Quantization strategy addresses these issues by quantizing and pruning in a single process. We evaluated our method on CIFAR-10 and CIFAR-100 datasets for image classification. Our model is 69.4% smaller with little accuracy loss, and runs 6-8 times faster on NVIDIA Xavier NX hardware.}
    }
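    A minimal NumPy sketch of the single-pass prune-and-quantize idea (illustrative only, not the paper's exact algorithm; the sparsity level, bit-width, and function name are assumptions):

    import numpy as np

    def prune_and_quantize(w, sparsity=0.5, bits=8):
        # Magnitude pruning: zero out the smallest |w| entries.
        k = int(sparsity * w.size)
        thresh = np.sort(np.abs(w), axis=None)[k]
        mask = (np.abs(w) >= thresh).astype(w.dtype)
        # Symmetric uniform quantization of the weights.
        scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
        w_q = np.round(w / scale) * scale
        return w_q * mask  # pruned and quantized in one shot

    w = np.random.randn(4, 16).astype(np.float32)
    print(prune_and_quantize(w, sparsity=0.7, bits=4))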
  • Jun Chen, Shipeng Bai, Tianxin Huang, Mengmeng Wang, Guanzhong Tian, and Yong Liu. Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning. Pattern Recognition, 143:109780, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    Neural network quantization is a very promising solution in the field of model compression, but its resulting accuracy highly depends on a training/fine-tuning process and requires the original data. This not only brings heavy computation and time costs but also is not conducive to privacy and sensitive information protection. Therefore, a few recent works are starting to focus on data-free quantization. However, data-free quantization does not perform well when dealing with ultra-low precision quantization. Although researchers utilize generative methods of synthetic data to address this problem partially, data synthesis takes a lot of computation and time. In this paper, we propose a data-free mixed-precision compensation (DF-MPC) method to recover the performance of an ultra-low precision quantized model without any data and fine-tuning process. By assuming the quantized error caused by a low-precision quantized layer can be restored via the reconstruction of a high-precision quantized layer, we mathematically formulate the reconstruction loss between the pre-trained full-precision model and its layer-wise mixed-precision quantized model. Based on our formulation, we theoretically deduce the closed-form solution by minimizing the reconstruction loss of the feature maps. Since DF-MPC does not require any original/synthetic data, it is a more efficient method to approximate the full-precision model. Experimentally, our DF-MPC is able to achieve higher accuracy for an ultra-low precision quantized model compared to the recent methods without any data and fine-tuning process.
    @article{chen2023dfq,
    title = {Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning},
    author = {Jun Chen and Shipeng Bai and Tianxin Huang and Mengmeng Wang and Guanzhong Tian and Yong Liu},
    year = 2023,
    journal = {Pattern Recognition},
    volume = 143,
    pages = {109780},
    doi = {10.1016/j.patcog.2023.109780},
    abstract = {Neural network quantization is a very promising solution in the field of model compression, but its resulting accuracy highly depends on a training/fine-tuning process and requires the original data. This not only brings heavy computation and time costs but also is not conducive to privacy and sensitive information protection. Therefore, a few recent works are starting to focus on data-free quantization. However, data-free quantization does not perform well when dealing with ultra-low precision quantization. Although researchers utilize generative methods of synthetic data to address this problem partially, data synthesis takes a lot of computation and time. In this paper, we propose a data-free mixed-precision compensation (DF-MPC) method to recover the performance of an ultra-low precision quantized model without any data and fine-tuning process. By assuming the quantized error caused by a low-precision quantized layer can be restored via the reconstruction of a high-precision quantized layer, we mathematically formulate the reconstruction loss between the pre-trained full-precision model and its layer-wise mixed-precision quantized model. Based on our formulation, we theoretically deduce the closed-form solution by minimizing the reconstruction loss of the feature maps. Since DF-MPC does not require any original/synthetic data, it is a more efficient method to approximate the full-precision model. Experimentally, our DF-MPC is able to achieve higher accuracy for an ultra-low precision quantized model compared to the recent methods without any data and fine-tuning process.}
    }
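    The closed-form flavor of the method can be pictured with ordinary least squares: absorb the error of a low-precision layer by re-solving the following layer. A hedged NumPy sketch (the random probe inputs and all variable names are my assumptions; the paper derives its solution analytically without any data):

    import numpy as np

    rng = np.random.default_rng(0)

    def quantize(w, bits):
        s = np.abs(w).max() / (2 ** (bits - 1) - 1)
        return np.round(w / s) * s

    W = rng.standard_normal((32, 64))      # full-precision layer 1
    V = rng.standard_normal((16, 32))      # full-precision layer 2
    W2 = quantize(W, 2)                    # ultra-low-precision layer 1
    X = rng.standard_normal((64, 512))     # random probe activations

    # Compensation: choose V' minimizing ||V(WX) - V'(W2 X)||_F (least squares).
    A, B = (W2 @ X).T, (V @ (W @ X)).T
    V_comp = np.linalg.lstsq(A, B, rcond=None)[0].T

    print(np.linalg.norm(V @ (W @ X) - V @ (W2 @ X)))        # uncompensated error
    print(np.linalg.norm(V @ (W @ X) - V_comp @ (W2 @ X)))   # smaller after compensation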
  • Shanqi Liu, Weiwei Liu, Wenzhou Chen, Guanzhong Tian, Jun Chen, Yao Tong, Junjie Cao, and Yong Liu. Learning Multi-Agent Cooperation via Considering Actions of Teammates. IEEE Transactions on Neural Networks and Learning Systems, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    Recently value-based centralized training with decentralized execution (CTDE) multi-agent reinforcement learning (MARL) methods have achieved excellent performance in cooperative tasks. However, the most representative method among these methods, Q-network MIXing (QMIX), restricts the joint action Q values to be a monotonic mixing of each agent's utilities. Furthermore, current methods cannot generalize to unseen environments or different agent configurations, which is known as ad hoc team play situation. In this work, we propose a novel Q values decomposition that considers both the return of an agent acting on its own and cooperating with other observable agents to address the nonmonotonic problem. Based on the decomposition, we propose a greedy action searching method that can improve exploration and is not affected by changes in observable agents or changes in the order of agents' actions. In this way, our method can adapt to ad hoc team play situation. Furthermore, we utilize an auxiliary loss related to environmental cognition consistency and a modified prioritized experience replay (PER) buffer to assist training. Our extensive experimental results show that our method achieves significant performance improvements in both challenging monotonic and nonmonotonic domains, and can handle the ad hoc team play situation perfectly.
    @article{liu2023lma,
    title = {Learning Multi-Agent Cooperation via Considering Actions of Teammates},
    author = {Shanqi Liu and Weiwei Liu and Wenzhou Chen and Guanzhong Tian and Jun Chen and Yao Tong and Junjie Cao and Yong Liu},
    year = 2023,
    journal = {IEEE Transactions on Neural Networks and Learning Systems},
    doi = {10.1109/TNNLS.2023.3262921},
    abstract = {Recently value-based centralized training with decentralized execution (CTDE) multi-agent reinforcement learning (MARL) methods have achieved excellent performance in cooperative tasks. However, the most representative method among these methods, Q-network MIXing (QMIX), restricts the joint action Q values to be a monotonic mixing of each agent's utilities. Furthermore, current methods cannot generalize to unseen environments or different agent configurations, which is known as ad hoc team play situation. In this work, we propose a novel Q values decomposition that considers both the return of an agent acting on its own and cooperating with other observable agents to address the nonmonotonic problem. Based on the decomposition, we propose a greedy action searching method that can improve exploration and is not affected by changes in observable agents or changes in the order of agents' actions. In this way, our method can adapt to ad hoc team play situation. Furthermore, we utilize an auxiliary loss related to environmental cognition consistency and a modified prioritized experience replay (PER) buffer to assist training. Our extensive experimental results show that our method achieves significant performance improvements in both challenging monotonic and nonmonotonic domains, and can handle the ad hoc team play situation perfectly.}
    }
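    The greedy action search can be pictured as coordinate ascent over a joint value that sums per-agent and pairwise-cooperation terms. A toy tabular Python sketch (the random tables stand in for learned networks; the decomposition shown is schematic, not the paper's exact one):

    import numpy as np

    rng = np.random.default_rng(1)
    n_agents, n_actions = 4, 5
    q_own = rng.standard_normal((n_agents, n_actions))
    q_coop = rng.standard_normal((n_agents, n_agents, n_actions, n_actions))

    def q_tot(acts):
        own = sum(q_own[i, a] for i, a in enumerate(acts))
        coop = sum(q_coop[i, j, acts[i], acts[j]]
                   for i in range(n_agents) for j in range(n_agents) if i != j)
        return own + coop

    # Sweep the agents, improving one action at a time, until no single change
    # raises the joint value; the sweep does not assume a fixed set of
    # teammates, which is what helps in ad hoc team play.
    acts, improved = [0] * n_agents, True
    while improved:
        improved = False
        for i in range(n_agents):
            best = max(range(n_actions),
                       key=lambda a: q_tot(acts[:i] + [a] + acts[i + 1:]))
            if best != acts[i]:
                acts[i], improved = best, True
    print(acts, q_tot(acts))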
  • Yuang Liu, Jun Chen, and Yong Liu. DCCD: Reducing Neural Network Redundancy via Distillation. IEEE Transactions on Neural Networks and Learning Systems, 2023.
    [BibTeX] [Abstract] [DOI]
    Deep neural models have achieved remarkable performance on various supervised and unsupervised learning tasks, but it is a challenge to deploy these large-size networks on resource-limited devices. As a representative type of model compression and acceleration methods, knowledge distillation (KD) solves this problem by transferring knowledge from heavy teachers to lightweight students. However, most distillation methods focus on imitating the responses of teacher networks but ignore the information redundancy of student networks. In this article, we propose a novel distillation framework, difference-based channel contrastive distillation (DCCD), which introduces channel contrastive knowledge and dynamic difference knowledge into student networks for redundancy reduction. At the feature level, we construct an efficient contrastive objective that broadens student networks’ feature expression space and preserves richer information in the feature extraction stage. At the final output level, more detailed knowledge is extracted from teacher networks by making a difference between multiview augmented responses of the same instance. We enhance student networks to be more sensitive to minor dynamic changes. With the improvement of two aspects of DCCD, the student network gains contrastive and difference knowledge and reduces its overfitting and redundancy. Finally, we achieve surprising results that the student approaches and even outperforms the teacher in test accuracy on CIFAR-100. We reduce the top-1 error to 28.16% on ImageNet classification and 24.15% for cross-model transfer with ResNet-18. Empirical experiments and ablation studies on popular datasets show that our proposed method can achieve state-of-the-art accuracy compared with other distillation methods.
    @article{liu2023dccd,
    title = {DCCD: Reducing Neural Network Redundancy via Distillation},
    author = {Yuang Liu and Jun Chen and Yong Liu},
    year = 2023,
    journal = {IEEE Transactions on Neural Networks and Learning Systems},
    doi = {10.1109/TNNLS.2023.3238337},
    abstract = {Deep neural models have achieved remarkable performance on various supervised and unsupervised learning tasks, but it is a challenge to deploy these large-size networks on resource-limited devices. As a representative type of model compression and acceleration methods, knowledge distillation (KD) solves this problem by transferring knowledge from heavy teachers to lightweight students. However, most distillation methods focus on imitating the responses of teacher networks but ignore the information redundancy of student networks. In this article, we propose a novel distillation framework, difference-based channel contrastive distillation (DCCD), which introduces channel contrastive knowledge and dynamic difference knowledge into student networks for redundancy reduction. At the feature level, we construct an efficient contrastive objective that broadens student networks' feature expression space and preserves richer information in the feature extraction stage. At the final output level, more detailed knowledge is extracted from teacher networks by making a difference between multiview augmented responses of the same instance. We enhance student networks to be more sensitive to minor dynamic changes. With the improvement of two aspects of DCCD, the student network gains contrastive and difference knowledge and reduces its overfitting and redundancy. Finally, we achieve surprising results that the student approaches and even outperforms the teacher in test accuracy on CIFAR-100. We reduce the top-1 error to 28.16% on ImageNet classification and 24.15% for cross-model transfer with ResNet-18. Empirical experiments and ablation studies on popular datasets show that our proposed method can achieve state-of-the-art accuracy compared with other distillation methods.}
    }
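    A hedged PyTorch sketch of a loss in this spirit: a standard response-distillation term plus a channel-wise contrastive term in which the matching teacher channel is the positive (the temperatures, shapes, and function name are illustrative; the full method also adds a multi-view difference term):

    import torch
    import torch.nn.functional as F

    def channel_contrastive_kd(s_feat, t_feat, s_logits, t_logits, T=4.0, tau=0.1):
        # Response distillation (standard KD term).
        kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                      F.softmax(t_logits / T, dim=1),
                      reduction="batchmean") * T * T
        # Channel contrast: same channel index = positive, others = negatives.
        b, c = s_feat.shape[:2]
        s = F.normalize(s_feat.flatten(2), dim=-1)       # (B, C, HW)
        t = F.normalize(t_feat.flatten(2), dim=-1)
        sim = torch.einsum("bcd,bed->bce", s, t) / tau   # (B, C, C)
        contrast = F.cross_entropy(sim.reshape(b * c, c),
                                   torch.arange(c).repeat(b))
        return kd + contrast

    s_feat, t_feat = torch.randn(8, 32, 7, 7), torch.randn(8, 32, 7, 7)
    s_logits, t_logits = torch.randn(8, 100), torch.randn(8, 100)
    print(channel_contrastive_kd(s_feat, t_feat, s_logits, t_logits))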
  • Mengmeng Wang, Jiazheng Xing, Jing Su, Jun Chen, and Yong Liu. Learning SpatioTemporal and Motion Features in a Unified 2D Network for Action Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45:3347-3362, 2022.
    [BibTeX] [Abstract] [DOI] [PDF]
    Recent methods for action recognition always apply 3D Convolutional Neural Networks (CNNs) to extract spatiotemporal features and introduce optical flows to present motion features. Although achieving state-of-the-art performance, they are expensive in both time and space. In this paper, we propose to represent both kinds of features in a unified 2D CNN without any 3D convolution or optical flow calculation. In particular, we first design a channel-wise spatiotemporal module to present the spatiotemporal features and a channel-wise motion module to encode feature-level motion features efficiently. Secondly, we combine these two modules and an identity mapping path into one united block that can easily replace the original residual block in the ResNet architecture, forming a simple yet effective network termed STM network by introducing very limited extra computation cost and parameters. Thirdly, we propose a novel Twins Training framework for action recognition by incorporating a correlation loss to optimize the inter-class and intra-class correlation and a siamese structure to fully stretch the training data. We extensively validate the proposed STM on both temporal-related datasets (i.e., Something-Something v1 & v2) and scene-related datasets (i.e., Kinetics-400, UCF-101, and HMDB-51). It achieves favorable results against state-of-the-art methods in all the datasets.
    @article{wang2022lsm,
    title = {Learning SpatioTemporal and Motion Features in a Unified 2D Network for Action Recognition},
    author = {Mengmeng Wang and Jiazheng Xing and Jing Su and Jun Chen and Yong Liu},
    year = 2022,
    journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
    volume = 45,
    pages = {3347--3362},
    doi = {10.1109/TPAMI.2022.3173658},
    abstract = {Recent methods for action recognition always apply 3D Convolutional Neural Networks (CNNs) to extract spatiotemporal features and introduce optical flows to present motion features. Although achieving state-of-the-art performance, they are expensive in both time and space. In this paper, we propose to represent both kinds of features in a unified 2D CNN without any 3D convolution or optical flow calculation. In particular, we first design a channel-wise spatiotemporal module to present the spatiotemporal features and a channel-wise motion module to encode feature-level motion features efficiently. Secondly, we combine these two modules and an identity mapping path into one united block that can easily replace the original residual block in the ResNet architecture, forming a simple yet effective network termed STM network by introducing very limited extra computation cost and parameters. Thirdly, we propose a novel Twins Training framework for action recognition by incorporating a correlation loss to optimize the inter-class and intra-class correlation and a siamese structure to fully stretch the training data. We extensively validate the proposed STM on both temporal-related datasets (i.e., Something-Something v1 \& v2) and scene-related datasets (i.e., Kinetics-400, UCF-101, and HMDB-51). It achieves favorable results against state-of-the-art methods in all the datasets.}
    }
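    The feature-level motion idea can be sketched in pure 2D operations: a depthwise convolution of the next frame's features minus the current frame's, with no 3D convolution or optical flow. A minimal PyTorch module (the layer sizes, class name, and padding policy are my choices, not the exact STM block):

    import torch
    import torch.nn as nn

    class ChannelWiseMotion(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.dw = nn.Conv2d(channels, channels, 3, padding=1,
                                groups=channels, bias=False)

        def forward(self, x):                     # x: (B, T, C, H, W)
            b, t, c, h, w = x.shape
            nxt = self.dw(x[:, 1:].reshape(-1, c, h, w)).reshape(b, t - 1, c, h, w)
            motion = nxt - x[:, :-1]              # feature-level motion
            # Zero-pad the last step so the temporal length is preserved.
            return torch.cat([motion, torch.zeros_like(x[:, :1])], dim=1)

    x = torch.randn(2, 8, 16, 14, 14)             # two 8-frame clips
    print(ChannelWiseMotion(16)(x).shape)         # (2, 8, 16, 14, 14)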
  • Tianxin Huang, Jun Chen, Jiangning Zhang, Yong Liu, and Jie Liang. Fast Point Cloud Sampling Network. Pattern Recognition Letters, 2022.
    [BibTeX] [Abstract] [DOI]
    The increasing number of points in 3D point clouds has brought great challenges for subsequent algorithm efficiencies. Down-sampling algorithms are adopted to simplify the data and accelerate the computation. Apart from the well-known random sampling and farthest distance sampling, some recent works have tried to learn a sampling pattern according to the downstream task, which helps generate sampled points by fully-connected networks with fixed output point numbers. In this condition, a progress-net structure covering all sampling resolutions or multiple separate sampling networks for different resolutions are required, which is inconvenient. In this work, we propose a novel learning-based point cloud sampling framework, named Fast point cloud sampling network (FPN), which drives initial randomly sampled points to better positions instead of generating coordinates. Once trained, FPN can sample point clouds to any resolution by changing the number of initial randomly sampled points. Results on point cloud reconstruction and recognition confirm that FPN can reach state-of-the-art performance with much higher sampling efficiency than most existing sampling methods.
    @article{huang2022fast,
    title = {Fast Point Cloud Sampling Network},
    author = {Tianxin Huang and Jun Chen and Jiangning Zhang and Yong Liu and Jie Liang},
    year = 2022,
    journal = {Pattern Recognition Letters},
    doi = {10.1016/j.patrec.2022.11.006},
    abstract = {The increasing number of points in 3D point clouds has brought great challenges for subsequent algorithm efficiencies. Down-sampling algorithms are adopted to simplify the data and accelerate the computation. Apart from the well-known random sampling and farthest distance sampling, some recent works have tried to learn a sampling pattern according to the downstream task, which helps generate sampled points by fully-connected networks with fixed output point numbers. In this condition, a progress-net structure covering all sampling resolutions or multiple separate sampling networks for different resolutions are required, which is inconvenient. In this work, we propose a novel learning-based point cloud sampling framework, named Fast point cloud sampling network (FPN), which drives initial randomly sampled points to better positions instead of generating coordinates. Once trained, FPN can sample point clouds to any resolution by changing the number of initial randomly sampled points. Results on point cloud reconstruction and recognition confirm that FPN can reach state-of-the-art performance with much higher sampling efficiency than most existing sampling methods.}
    }
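    The core trick, driving randomly sampled points to better positions rather than generating coordinates, fits in a few lines of PyTorch (the MLP width and class name are illustrative assumptions; the actual FPN also conditions on learned shape features):

    import torch
    import torch.nn as nn

    class OffsetSampler(nn.Module):
        def __init__(self):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                     nn.Linear(64, 3))

        def forward(self, pts, m):                     # pts: (B, N, 3)
            idx = torch.randperm(pts.shape[1])[:m]     # any m works once trained
            seed = pts[:, idx]                         # random initial samples
            return seed + self.mlp(seed)               # learned per-point offsets

    pts = torch.randn(4, 2048, 3)
    print(OffsetSampler()(pts, 256).shape)             # (4, 256, 3)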
  • Guanzhong Tian, Yiran Sun, Yuang Liu, Xianfang Zeng, Mengmeng Wang, Yong Liu, Jiangning Zhang, and Jun Chen. Adding before Pruning: Sparse Filter Fusion for Deep Convolutional Neural Networks via Auxiliary Attention. IEEE Transactions on Neural Networks and Learning Systems, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    Filter pruning is a significant feature selection technique to shrink the existing feature fusion schemes (especially on convolution calculation and model size), which helps to develop more efficient feature fusion models while maintaining state-of-the-art performance. In addition, it reduces the storage and computation requirements of deep neural networks (DNNs) and accelerates the inference process dramatically. Existing methods mainly rely on manual constraints such as normalization to select the filters. A typical pipeline comprises two stages: first pruning the original neural network and then fine-tuning the pruned model. However, choosing a manual criterion can be somehow tricky and stochastic. Moreover, directly regularizing and modifying filters in the pipeline suffer from being sensitive to the choice of hyperparameters, thus making the pruning procedure less robust. To address these challenges, we propose to handle the filter pruning issue through one stage: using an attention-based architecture that adaptively fuses the filter selection with filter learning in a unified network. Specifically, we present a pruning method named adding before pruning (ABP) to make the model focus on the filters of higher significance by training instead of man-made criteria such as norm, rank, etc. First, we add an auxiliary attention layer into the original model and set the significance scores in this layer to be binary. Furthermore, to propagate the gradients in the auxiliary attention layer, we design a specific gradient estimator and prove its effectiveness for convergence in the graph flow through mathematical derivation. In the end, to relieve the dependence on the complicated prior knowledge for designing the thresholding criterion, we simultaneously prune and train the filters to automatically eliminate network redundancy with recoverability. Extensive experimental results on the two typical image classification benchmarks, CIFAR-10 and ILSVRC-2012, illustrate that the proposed approach performs favorably against previous state-of-the-art filter pruning algorithms.
    @article{tian2021abp,
    title = {Adding before Pruning: Sparse Filter Fusion for Deep Convolutional Neural Networks via Auxiliary Attention},
    author = {Guanzhong Tian and Yiran Sun and Yuang Liu and Xianfang Zeng and Mengmeng Wang and Yong Liu and Jiangning Zhang and Jun Chen},
    year = 2021,
    journal = {IEEE Transactions on Neural Networks and Learning Systems},
    doi = {10.1109/TNNLS.2021.3106917},
    abstract = {Filter pruning is a significant feature selection technique to shrink the existing feature fusion schemes (especially on convolution calculation and model size), which helps to develop more efficient feature fusion models while maintaining state-of-the-art performance. In addition, it reduces the storage and computation requirements of deep neural networks (DNNs) and accelerates the inference process dramatically. Existing methods mainly rely on manual constraints such as normalization to select the filters. A typical pipeline comprises two stages: first pruning the original neural network and then fine-tuning the pruned model. However, choosing a manual criterion can be somehow tricky and stochastic. Moreover, directly regularizing and modifying filters in the pipeline suffer from being sensitive to the choice of hyperparameters, thus making the pruning procedure less robust. To address these challenges, we propose to handle the filter pruning issue through one stage: using an attention-based architecture that adaptively fuses the filter selection with filter learning in a unified network. Specifically, we present a pruning method named adding before pruning (ABP) to make the model focus on the filters of higher significance by training instead of man-made criteria such as norm, rank, etc. First, we add an auxiliary attention layer into the original model and set the significance scores in this layer to be binary. Furthermore, to propagate the gradients in the auxiliary attention layer, we design a specific gradient estimator and prove its effectiveness for convergence in the graph flow through mathematical derivation. In the end, to relieve the dependence on the complicated prior knowledge for designing the thresholding criterion, we simultaneously prune and train the filters to automatically eliminate network redundancy with recoverability. Extensive experimental results on the two typical image classification benchmarks, CIFAR-10 and ILSVRC-2012, illustrate that the proposed approach performs favorably against previous state-of-the-art filter pruning algorithms.}
    }
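    The auxiliary attention layer with binary significance scores needs a surrogate gradient to train. A hedged PyTorch sketch using a straight-through-style estimator as a stand-in (the paper designs its own estimator; the names and threshold here are mine):

    import torch

    class BinaryGate(torch.autograd.Function):
        @staticmethod
        def forward(ctx, scores):
            return (scores > 0).float()       # hard 0/1 significance
        @staticmethod
        def backward(ctx, grad_out):
            return grad_out                   # identity surrogate gradient

    scores = torch.zeros(64, requires_grad=True)    # one score per filter
    gates = BinaryGate.apply(scores)
    feat = torch.randn(8, 64, 14, 14)
    out = feat * gates.view(1, -1, 1, 1)            # gated (prunable) filters
    out.sum().backward()
    print(scores.grad.abs().sum() > 0)              # gradients reach the scores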
  • Guanzhong Tian, Jun Chen, Xianfang Zeng, and Yong Liu. Pruning by Training: A Novel Deep Neural Network Compression Framework for Image Processing. IEEE Signal Processing Letters, 28:344–348, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    Filter pruning for a pre-trained convolutional neural network is most normally performed through human-made constraints or criteria such as norms, ranks, etc. Typically, the pruning pipeline comprises two stages: first learn a sparse structure from the original model, then optimize the weights in the new pruned model. One disadvantage of using human-made criteria to prune filters is that the design and selection of threshold criteria depend on complicated prior knowledge. Besides, the pruning process is less robust due to the impact of directly regularizing on filters. To address the problems mentioned, we propose an effective one-stage pruning framework: introducing a trainable collaborative layer to jointly prune and learn neural networks in one go. In our framework, we first add a binary collaborative layer for each original filter. Then, a new type of gradient estimator, the asymptotic gradient estimator, is introduced to pass the gradient through the binary collaborative layer. Finally, we simultaneously learn the sparse structure and optimize the weights from the original model in the training process. Our evaluation results on typical benchmarks, CIFAR and ImageNet, demonstrate very promising results against other state-of-the-art filter pruning methods.
    @article{tian2021pbt,
    title = {Pruning by Training: A Novel Deep Neural Network Compression Framework for Image Processing},
    author = {Guanzhong Tian and Jun Chen and Xianfang Zeng and Yong Liu},
    year = 2021,
    journal = {IEEE Signal Processing Letters},
    volume = 28,
    pages = {344--348},
    doi = {10.1109/LSP.2021.3054315},
    abstract = {Filter pruning for a pre-trained convolutional neural network is most normally performed through human-made constraints or criteria such as norms, ranks, etc. Typically, the pruning pipeline comprises two stages: first learn a sparse structure from the original model, then optimize the weights in the new pruned model. One disadvantage of using human-made criteria to prune filters is that the design and selection of threshold criteria depend on complicated prior knowledge. Besides, the pruning process is less robust due to the impact of directly regularizing on filters. To address the problems mentioned, we propose an effective one-stage pruning framework: introducing a trainable collaborative layer to jointly prune and learn neural networks in one go. In our framework, we first add a binary collaborative layer for each original filter. Then, a new type of gradient estimator, the asymptotic gradient estimator, is introduced to pass the gradient through the binary collaborative layer. Finally, we simultaneously learn the sparse structure and optimize the weights from the original model in the training process. Our evaluation results on typical benchmarks, CIFAR and ImageNet, demonstrate very promising results against other state-of-the-art filter pruning methods.}
    }
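    One way to read the "asymptotic" estimator: a temperature-annealed sigmoid gate that is differentiable everywhere early in training and approaches a hard 0/1 step by the end, so pruning and weight learning happen in the same run. A short illustrative sketch (the schedule and function name are assumptions, not the paper's exact estimator):

    import torch

    def asymptotic_gate(score, t):
        return torch.sigmoid(t * score)   # t small: soft; t large: near-binary

    score = torch.tensor([-0.5, 0.01, 0.8], requires_grad=True)
    for t in (1.0, 10.0, 100.0):          # annealed over epochs in practice
        print(t, asymptotic_gate(score, t).detach().numpy().round(3))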
  • Jun Chen, Liang Liu, Yong Liu, and Xianfang Zeng. A Learning Framework for n-Bit Quantized Neural Networks Toward FPGAs. IEEE Transactions on Neural Networks and Learning Systems, 32:1067–1081, 2021.
    [BibTeX] [Abstract] [DOI] [arXiv] [PDF]
    The quantized neural network (QNN) is an efficient approach for network compression and can be widely used in the implementation of field-programmable gate arrays (FPGAs). This article proposes a novel learning framework for $n$-bit QNNs, whose weights are constrained to the power of two. To solve the gradient vanishing problem, we propose a reconstructed gradient function for QNNs in the back-propagation algorithm that can directly get the real gradient rather than estimating an approximate gradient of the expected loss. We also propose a novel QNN structure named $n$-BQ-NN, which uses shift operation to replace the multiply operation and is more suitable for the inference on FPGAs. Furthermore, we also design a shift vector processing element (SVPE) array to replace all 16-bit multiplications with SHIFT operations in convolution operation on FPGAs. We also carry out comparable experiments to evaluate our framework. The experimental results show that the quantized models of ResNet, DenseNet, and AlexNet through our learning framework can achieve almost the same accuracies with the original full-precision models. Moreover, when using our learning framework to train our $n$-BQ-NN from scratch, it can achieve state-of-the-art results compared with typical low-precision QNNs. Experiments on Xilinx ZCU102 platform show that our $n$-BQ-NN with our SVPE can execute 2.9 times faster than that with the vector processing element (VPE) in inference. As the SHIFT operation in our SVPE array will not consume digital signal processing (DSP) resources on FPGAs, the experiments have shown that the use of SVPE array also reduces average energy consumption to 68.7% of the VPE array with 16 bit.
    @article{chen2021alf,
    title = {A Learning Framework for n-Bit Quantized Neural Networks Toward FPGAs},
    author = {Jun Chen and Liang Liu and Yong Liu and Xianfang Zeng},
    year = 2021,
    journal = {IEEE Transactions on Neural Networks and Learning Systems},
    volume = 32,
    pages = {1067--1081},
    doi = {10.1109/TNNLS.2020.2980041},
    abstract = {The quantized neural network (QNN) is an efficient approach for network compression and can be widely used in the implementation of field-programmable gate arrays (FPGAs). This article proposes a novel learning framework for $n$-bit QNNs, whose weights are constrained to the power of two. To solve the gradient vanishing problem, we propose a reconstructed gradient function for QNNs in the back-propagation algorithm that can directly get the real gradient rather than estimating an approximate gradient of the expected loss. We also propose a novel QNN structure named $n$-BQ-NN, which uses shift operation to replace the multiply operation and is more suitable for the inference on FPGAs. Furthermore, we also design a shift vector processing element (SVPE) array to replace all 16-bit multiplications with SHIFT operations in convolution operation on FPGAs. We also carry out comparable experiments to evaluate our framework. The experimental results show that the quantized models of ResNet, DenseNet, and AlexNet through our learning framework can achieve almost the same accuracies with the original full-precision models. Moreover, when using our learning framework to train our $n$-BQ-NN from scratch, it can achieve state-of-the-art results compared with typical low-precision QNNs. Experiments on Xilinx ZCU102 platform show that our $n$-BQ-NN with our SVPE can execute 2.9 times faster than that with the vector processing element (VPE) in inference. As the SHIFT operation in our SVPE array will not consume digital signal processing (DSP) resources on FPGAs, the experiments have shown that the use of SVPE array also reduces average energy consumption to 68.7% of the VPE array with 16 bit.},
    arxiv = {http://arxiv.org/pdf/2004.02396}
    }
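    The power-of-two constraint is what turns multiplications into shifts. A NumPy sketch of such a quantizer (the exponent range and epsilon are illustrative):

    import numpy as np

    def pow2_quantize(w, n_bits=4):
        sign = np.sign(w)
        exp = np.clip(np.round(np.log2(np.abs(w) + 1e-12)),
                      -(2 ** (n_bits - 1)), 0)      # exponents in [-2^(n-1), 0]
        return sign * 2.0 ** exp, exp.astype(int)

    w = np.array([0.31, -0.07, 0.9, -0.002])
    wq, e = pow2_quantize(w)
    print(wq, e)   # every entry is +/- a power of two
    # For integer activations x, x * 2**e == x >> (-e): the FPGA-friendly shift.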
  • Jun Chen, Yong Liu, Hao Zhang, Shengnan Hou, and Jian Yang. Propagating Asymptotic-Estimated Gradients for Low Bitwidth Quantized Neural Networks. IEEE Journal of Selected Topics in Signal Processing, 14:848–859, 2020.
    [BibTeX] [Abstract] [DOI] [arXiv] [PDF]
    The quantized neural networks (QNNs) can be useful for neural network acceleration and compression, but during the training process they pose a challenge: how to propagate the gradient of the loss function through the graph flow with a derivative of 0 almost everywhere. In response to this non-differentiable situation, we propose a novel Asymptotic-Quantized Estimator (AQE) to estimate the gradient. In particular, during back-propagation, the graph that relates inputs to output remains smooth and differentiable. At the end of training, the weights and activations have been quantized to low precision because of the asymptotic behaviour of AQE. Meanwhile, we propose an M-bit Inputs and N-bit Weights Network (MINW-Net) trained by AQE, a quantized neural network with 1–3 bits weights and activations. In the inference phase, we can use XNOR or SHIFT operations instead of convolution operations to accelerate the MINW-Net. Our experiments on CIFAR datasets demonstrate that our AQE is well defined, and the QNNs with AQE perform better than those with the Straight-Through Estimator (STE). For example, in the case of the same ConvNet that has 1-bit weights and activations, our MINW-Net with AQE can achieve a prediction accuracy 1.5% higher than the Binarized Neural Network (BNN) with STE. The MINW-Net, which is trained from scratch by AQE, can achieve comparable classification accuracy as its 32-bit counterparts on the CIFAR test sets. Extensive experimental results on the ImageNet dataset show the great superiority of the proposed AQE, and our MINW-Net achieves comparable results with other state-of-the-art QNNs.
    @article{chen2020propagatingag,
    title = {Propagating Asymptotic-Estimated Gradients for Low Bitwidth Quantized Neural Networks},
    author = {Jun Chen and Yong Liu and Hao Zhang and Shengnan Hou and Jian Yang},
    year = 2020,
    journal = {IEEE Journal of Selected Topics in Signal Processing},
    volume = 14,
    pages = {848--859},
    doi = {10.1109/JSTSP.2020.2966327},
    abstract = {The quantized neural networks (QNNs) can be useful for neural network acceleration and compression, but during the training process they pose a challenge: how to propagate the gradient of the loss function through the graph flow with a derivative of 0 almost everywhere. In response to this non-differentiable situation, we propose a novel Asymptotic-Quantized Estimator (AQE) to estimate the gradient. In particular, during back-propagation, the graph that relates inputs to output remains smooth and differentiable. At the end of training, the weights and activations have been quantized to low precision because of the asymptotic behaviour of AQE. Meanwhile, we propose an M-bit Inputs and N-bit Weights Network (MINW-Net) trained by AQE, a quantized neural network with 1–3 bits weights and activations. In the inference phase, we can use XNOR or SHIFT operations instead of convolution operations to accelerate the MINW-Net. Our experiments on CIFAR datasets demonstrate that our AQE is well defined, and the QNNs with AQE perform better than those with the Straight-Through Estimator (STE). For example, in the case of the same ConvNet that has 1-bit weights and activations, our MINW-Net with AQE can achieve a prediction accuracy 1.5% higher than the Binarized Neural Network (BNN) with STE. The MINW-Net, which is trained from scratch by AQE, can achieve comparable classification accuracy as its 32-bit counterparts on the CIFAR test sets. Extensive experimental results on the ImageNet dataset show the great superiority of the proposed AQE, and our MINW-Net achieves comparable results with other state-of-the-art QNNs.},
    arxiv = {http://arxiv.org/pdf/2003.04296}
    }
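    The contrast with the straight-through estimator can be seen in a two-line surrogate: a smooth function that tends to the hard binarizer as its temperature grows, so a true, everywhere-defined gradient of the training loss is available throughout (a sketch of the asymptotic idea; the paper's exact estimator differs):

    import torch

    def aqe_style_binarize(x, beta):
        return torch.tanh(beta * x)       # -> sign(x) as beta -> infinity

    w = torch.tensor([-1.2, -0.1, 0.3, 2.0], requires_grad=True)
    for beta in (1.0, 5.0, 50.0):         # grown over training
        y = aqe_style_binarize(w, beta)
        y.sum().backward()
        print(beta, y.detach().numpy().round(2), w.grad.numpy().round(2))
        w.grad = None                     # reset between illustrative steps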
  • Jun Chen, Jinhui Zhao, Wei Zhang, and Yong Liu. Tracking an Object over 200 FPS with the Fusion of Prior Probability and Kalman Filter. In 12th International Conference on Machine Learning and Computing (ICMLC), 2020.
    [BibTeX] [Abstract] [DOI] [PDF]
    Efficient object tracking is a challenging problem, as it must distinguish the object with a learned appearance model as quickly as possible. In this paper, a novel robust approach fusing the prediction information of a Kalman filter and prior probability is proposed for tracking arbitrary objects. First, we obtain an image patch based on the predicted information by fusing the prior probability and the Kalman filter. Second, the samples derived from the obtained image patch are fed into a support vector machine (SVM) to classify the object, with features extracted from these samples by Histogram of Oriented Gradients (HOG). Our approach has two advantages: efficient computation and a certain anti-interference ability. The samples obtained from the image patch are fewer than those obtained from the whole image, which makes the SVM model more efficient in classification and reduces interference outside the image patch. Experimentally, we evaluate our approach on a standard tracking benchmark that includes 50 video sequences to demonstrate our tracker's nearly state-of-the-art performance compared with 5 trackers. Furthermore, because extracting samples and classifying HOG features is computationally very cheap, our tracker is much faster than these mentioned trackers. It achieves over 200 fps on an Intel i3 CPU for tracking an arbitrary object on the benchmark.
    @inproceedings{chen2020trackingao,
    title = {Tracking an Object over 200 FPS with the Fusion of Prior Probability and Kalman Filter},
    author = {Jun Chen and Jinhui Zhao and Wei Zhang and Yong Liu},
    year = 2020,
    booktitle = {12th International Conference on Machine Learning and Computing (ICMLC)},
    doi = {10.1145/3383972.3384011},
    abstract = {Efficient object tracking is a challenging problem, as it must distinguish the object with a learned appearance model as quickly as possible. In this paper, a novel robust approach fusing the prediction information of a Kalman filter and prior probability is proposed for tracking arbitrary objects. First, we obtain an image patch based on the predicted information by fusing the prior probability and the Kalman filter. Second, the samples derived from the obtained image patch are fed into a support vector machine (SVM) to classify the object, with features extracted from these samples by Histogram of Oriented Gradients (HOG). Our approach has two advantages: efficient computation and a certain anti-interference ability. The samples obtained from the image patch are fewer than those obtained from the whole image, which makes the SVM model more efficient in classification and reduces interference outside the image patch. Experimentally, we evaluate our approach on a standard tracking benchmark that includes 50 video sequences to demonstrate our tracker's nearly state-of-the-art performance compared with 5 trackers. Furthermore, because extracting samples and classifying HOG features is computationally very cheap, our tracker is much faster than these mentioned trackers. It achieves over 200 fps on an Intel i3 CPU for tracking an arbitrary object on the benchmark.}
    }
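    The prediction step that centers the search patch is a textbook constant-velocity Kalman filter. A self-contained NumPy sketch (the matrices and noise levels are illustrative):

    import numpy as np

    dt = 1.0
    F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                  [0, 0, 1, 0], [0, 0, 0, 1]], float)   # constant-velocity model
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)   # observe (x, y) only
    Q, R = 0.01 * np.eye(4), 1.0 * np.eye(2)

    x, P = np.zeros(4), np.eye(4)
    for z in ([1.0, 2.0], [2.1, 3.9], [3.0, 6.1]):      # per-frame detections
        x, P = F @ x, F @ P @ F.T + Q                   # predict: next patch center
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)    # Kalman gain
        x = x + K @ (np.asarray(z) - H @ x)             # update with the SVM+HOG hit
        P = (np.eye(4) - K @ H) @ P
    print(x[:2], x[2:])   # estimated position and velocity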