Linpeng Peng

PhD Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Address

Room 101, Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Email: penglinpeng@zju.edu.cn

I am pursuing my Ph.D. degree at the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. My research lies at the intersection of computer vision and robotics. My goal is to develop intelligent algorithms and systems that enable robots to complete complex tasks and assist people.

Research Interests

  • Computer Vision
  • Robotic Manipulation
  • Machine Learning

Publications

  • Linpeng Peng, Rongyao Cai, Jingyang Xiang, Junyu Zhu, Weiwei Liu, Wang Gao, and Yong Liu. LiteGrasp: A Light Robotic Grasp Detection via Semi-Supervised Knowledge Distillation. IEEE Robotics and Automation Letters, 9:7995-8002, 2024.
    Grasping detection from single images in robotic applications poses a significant challenge. While contemporary deep learning techniques excel, their success often hinges on large annotated datasets and intricate network architectures. In this letter, we present LiteGrasp, a novel semi-supervised lightweight framework purpose-built for grasp detection, eliminating the necessity for exhaustive supervision and intricate networks. Our approach uses a limited amount of labeled data via a knowledge distillation method, introducing HRGrasp-Net, a model with high efficiency for extracting features and largely based on HRNet. We incorporate pseudo-label filtering within a mutual learning model set within a teacher-student paradigm. This enhances the transference of data from images with labels to those without. Additionally, we introduce the streamlined Lite HRGrasp-Net, acting as the student network which gains further distillation knowledge using a multi-level fusion cascade originating from HRGrasp-Net. Impressively, LiteGrasp thrives with just a fraction (4.3%) of HRGrasp-Net’s original model size, and with limited labeled data relative to total data (25% ratio) across all benchmarks, regularly outperforming solely supervised and semi-supervised models. Taking just 6 ms for execution, LiteGrasp showcases exceptional accuracy (99.99% and 97.21% on Cornell and Jacquard data sets respectively), as well as an impressive 95.3% rate of success in grasping when deployed using a 6DoF UR5e robotic arm. These highlights underscore the effectiveness and efficiency of LiteGrasp for grasp detection, even under resource-limited conditions.
    @article{peng2024lal,
    title = {LiteGrasp: A Light Robotic Grasp Detection via Semi-Supervised Knowledge Distillation},
    author = {Linpeng Peng and Rongyao Cai and Jingyang Xiang and Junyu Zhu and Weiwei Liu and Wang Gao and Yong Liu},
    year = 2024,
    journal = {IEEE Robotics and Automation Letters},
    volume = 9,
    pages = {7995-8002},
    doi = {10.1109/LRA.2024.3436336}
    }
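    A minimal sketch of the teacher-student pseudo-labeling idea behind LiteGrasp, written in PyTorch. This illustrates the general technique under assumed interfaces (models returning a single grasp-quality tensor, an EMA-updated teacher, a fixed confidence threshold), not the paper's implementation:
      import torch
      import torch.nn.functional as F

      def ema_update(teacher, student, decay=0.999):
          # The teacher tracks an exponential moving average of the student.
          with torch.no_grad():
              for t, s in zip(teacher.parameters(), student.parameters()):
                  t.mul_(decay).add_(s, alpha=1 - decay)

      def semi_supervised_step(student, teacher, x_lab, y_lab, x_unlab, opt,
                               conf_thresh=0.9):
          # Supervised loss on the small labeled batch (grasp-quality maps).
          loss = F.mse_loss(student(x_lab), y_lab)
          # Teacher pseudo-labels the unlabeled images; keep only images whose
          # peak predicted quality clears the confidence threshold.
          with torch.no_grad():
              pseudo = teacher(x_unlab)
              keep = pseudo.flatten(1).amax(dim=1) > conf_thresh
          if keep.any():
              loss = loss + F.mse_loss(student(x_unlab[keep]), pseudo[keep])
          opt.zero_grad()
          loss.backward()
          opt.step()
          ema_update(teacher, student)
          return float(loss)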
  • Rongyao Cai, Wang Gao, Linpeng Peng, Zhengming Lu, Kexin Zhang, and Yong Liu. Debiased Contrastive Learning With Supervision Guidance for Industrial Fault Detection. IEEE Transactions on Industrial Informatics, 2024.
    The time series self-supervised contrastive learning framework has succeeded significantly in industrial fault detection scenarios. It typically consists of pretraining on abundant unlabeled data and fine-tuning on limited annotated data. However, the two-phase framework faces three challenges: Sampling bias, task-agnostic representation issue, and angular-centricity issue. These challenges hinder further development in industrial applications. This article introduces a debiased contrastive learning with supervision guidance (DCLSG) framework and applies it to industrial fault detection tasks. First, DCLSG employs channel augmentation to integrate temporal and frequency domain information. Pseudolabels based on momentum clustering operation are assigned to extracted representations, thereby mitigating the sampling bias raised by the selection of positive pairs. Second, the generated supervisory signal guides the pretraining phase, tackling the task-agnostic representation issue. Third, the angular-centricity issue is addressed using the proposed Gaussian distance metric measuring the radial distribution of representations. The experiments conducted on three industrial datasets (ISDB, CWRU, and practical datasets) validate the superior performance of the DCLSG compared to other fault detection methods.
    @article{cai2024dcl,
    title = {Debiased Contrastive Learning With Supervision Guidance for Industrial Fault Detection},
    author = {Rongyao Cai and Wang Gao and Linpeng Peng and Zhengming Lu and Kexin Zhang and Yong Liu},
    year = 2024,
    journal = {IEEE Transactions on Industrial Informatics},
    doi = {10.1109/TII.2024.3424561}
    }
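    The debiasing step can be sketched as a modified InfoNCE loss in which cluster-derived pseudo-labels remove likely false negatives from the negative set. A minimal PyTorch illustration (the momentum clustering, supervision guidance, and Gaussian distance metric are omitted; interfaces are assumed):
      import torch
      import torch.nn.functional as F

      def debiased_info_nce(z1, z2, pseudo_labels, temperature=0.1):
          # Two augmented views per sample; rows i and i+B are positives.
          z = F.normalize(torch.cat([z1, z2]), dim=1)          # (2B, D)
          labels = torch.cat([pseudo_labels, pseudo_labels])   # (2B,)
          sim = z @ z.t() / temperature
          n = z1.size(0)
          pos = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
          # Exclude self-similarity and same-cluster "false negatives",
          # but always keep each sample's own positive pair.
          mask = labels.unsqueeze(0) == labels.unsqueeze(1)
          mask.fill_diagonal_(True)
          mask[torch.arange(2 * n), pos] = False
          sim = sim.masked_fill(mask, float('-inf'))
          return F.cross_entropy(sim, pos)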
  • Jingyang Xiang, Siqi Li, Junhao Chen, Zhuangzhi Chen, Tianxin Huang, Linpeng Peng, and Yong Liu. MaxQ: Multi-Axis Query for N:m Sparsity Network. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15845-15854, 2024.
N:m sparsity has received increasing attention due to its remarkable performance and latency trade-off compared with structured and unstructured sparsity. However, existing N:m sparsity methods do not differentiate the relative importance of weights among blocks and leave important weights underappreciated. Besides, they directly apply N:m sparsity to the whole network, which will cause severe information loss. Thus, they are still sub-optimal. In this paper, we propose an efficient and effective Multi-Axis Query methodology, dubbed as MaxQ, to rectify these problems. During the training, MaxQ employs a dynamic approach to generate soft N:m masks, considering the weight importance across multiple axes. This method enhances the weights with more importance and ensures more effective updates. Meanwhile, a sparsity strategy that gradually increases the percentage of N:m weight blocks is applied, which allows the network to heal from the pruning-induced damage progressively. During the runtime, the N:m soft masks can be precomputed as constants and folded into weights without causing any distortion to the sparse pattern and incurring additional computational overhead. Comprehensive experiments demonstrate that MaxQ achieves consistent improvements across diverse CNN architectures in various computer vision tasks, including image classification, object detection and instance segmentation. For ResNet50 with 1:16 sparse pattern, MaxQ can achieve 74.6% top-1 accuracy on ImageNet and improve by over 2.8% over the state-of-the-art. Codes and checkpoints are available at https://github.com/JingyangXiang/MaxQ.
    @inproceedings{xiang2024maxq,
    title = {MaxQ: Multi-Axis Query for N:m Sparsity Network},
    author = {Jingyang Xiang and Siqi Li and Junhao Chen and Zhuangzhi Chen and Tianxin Huang and Linpeng Peng and Yong Liu},
    year = 2024,
    booktitle = {2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    pages = {15845-15854},
    doi = {10.1109/CVPR52733.2024.01500}
    }
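    The soft N:m mask at the core of the method can be sketched in a few lines: within each block of m consecutive weights, the top-n magnitudes receive mask values near 1 and the rest near 0, so the mask stays differentiable during training and collapses to a constant at runtime. A simplified single-axis PyTorch illustration (the multi-axis query and the incremental sparsity schedule are omitted):
      import torch

      def soft_nm_mask(weight, n=2, m=4, temperature=0.1):
          # Assumes weight.numel() is divisible by m (pad otherwise).
          w = weight.reshape(-1, m).abs()
          # Boundary between the kept top-n and the pruned rest, per block.
          thresh = w.kthvalue(m - n, dim=1, keepdim=True).values
          mask = torch.sigmoid((w - thresh) / temperature)
          return mask.reshape(weight.shape)

      # At runtime the mask is a constant and can be folded into the weights:
      # w_sparse = weight * soft_nm_mask(weight)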
  • Chencan Fu, Lin Li, Jianbiao Mei, Yukai Ma, Linpeng Peng, Xiangrui Zhao, and Yong Liu. A Coarse-to-Fine Place Recognition Approach using Attention-guided Descriptors and Overlap Estimation. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 8493-8499, 2024.
    Place recognition is a challenging but crucial task in robotics. Current description-based methods may be limited by representation capabilities, while pairwise similarity-based methods require exhaustive searches, which is time-consuming. In this paper, we present a novel coarse-to-fine approach to address these problems, which combines BEV (Bird’s Eye View) feature extraction, coarse-grained matching and fine-grained verification. In the coarse stage, our approach utilizes an attention-guided network to generate attention-guided descriptors. We then employ a fast affinity-based candidate selection process to identify the Top-K most similar candidates. In the fine stage, we estimate pairwise overlap among the narrowed-down place candidates to determine the final match. Experimental results on the KITTI and KITTI-360 datasets demonstrate that our approach outperforms state-of-the-art methods. The code will be released publicly soon.
    @inproceedings{fu2024ctf,
    title = {A Coarse-to-Fine Place Recognition Approach using Attention-guided Descriptors and Overlap Estimation},
    author = {Chencan Fu and Lin Li and Jianbiao Mei and Yukai Ma and Linpeng Peng and Xiangrui Zhao and Yong Liu},
    year = 2024,
    booktitle = {2024 IEEE International Conference on Robotics and Automation (ICRA)},
    pages = {8493-8499},
    doi = {10.1109/ICRA57147.2024.10611569}
    }
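    The coarse-to-fine pipeline reduces to two stages: rank the database by global-descriptor similarity, then re-rank only the Top-K candidates with a pairwise overlap estimate. A minimal NumPy sketch with a placeholder overlap_fn (in the paper this role is played by a learned overlap estimator over BEV features):
      import numpy as np

      def coarse_to_fine_match(query_desc, db_descs, query_scan, db_scans,
                               overlap_fn, k=10):
          # Coarse stage: cosine similarity of unit-norm global descriptors.
          sims = db_descs @ query_desc
          topk = np.argsort(-sims)[:k]
          # Fine stage: verify only the Top-K candidates by estimated overlap.
          overlaps = [overlap_fn(query_scan, db_scans[i]) for i in topk]
          return topk[int(np.argmax(overlaps))]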
  • Rongyao Cai, Linpeng Peng, Zhengming Lu, Kexin Zhang, and Yong Liu. DCS: Debiased Contrastive Learning with Weak Supervision for Time Series Classification. In 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 5625-5629, 2024.
Self-supervised contrastive learning (SSCL) has performed excellently on time series classification tasks. Most SSCL-based classification algorithms generate positive and negative samples in the time or frequency domains, focusing on mining similarities between them. However, two issues are not well addressed in the SSCL framework: the sampling bias and the task-agnostic representation problems. Sampling bias indicates fake negative sample selection in SSCL, and task-agnostic representation results in the unknown correlation between the extracted feature and downstream tasks. To address these issues, we propose the Debiased Contrastive learning with weak Supervision framework, abbreviated as DCS. It employs the clustering operation to remove fake negative samples and introduces weak supervisory signals into the SSCL framework to guide feature extraction. Additionally, we propose a channel augmentation method that allows the DCS to extract features from local and global perspectives simultaneously. Comprehensive experiments show that DCS achieves performance superior to state-of-the-art methods on widely used benchmark datasets.
    @inproceedings{cai2024dcs,
    title = {DCS: Debiased Contrastive Learning with Weak Supervision for Time Series Classification},
    author = {Rongyao Cai and Linpeng Peng and Zhengming Lu and Kexin Zhang and Yong Liu},
    year = 2024,
    booktitle = {2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    pages = {5625-5629},
    doi = {10.1109/ICASSP48485.2024.10446381}
    }
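    The channel augmentation idea, letting the encoder see a time series from local (time-domain) and global (frequency-domain) perspectives at once, can be sketched as below. This is a minimal PyTorch illustration under assumed shapes, not the paper's exact transform:
      import torch
      import torch.nn.functional as F

      def channel_augment(x):
          # x: (batch, length) raw series; add a spectral-magnitude channel.
          freq = torch.fft.rfft(x, dim=-1).abs()
          # Resample the spectrum to the time length so the two channels align.
          freq = F.interpolate(freq.unsqueeze(1), size=x.size(-1),
                               mode='linear', align_corners=False).squeeze(1)
          return torch.stack([x, freq], dim=1)      # (batch, 2, length)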
  • Weiwei Liu, Linpeng Peng, Licheng Wen, Jian Yang, and Yong Liu. Decomposing Shared Networks for Separate Cooperation with Multi-agent Reinforcement Learning. Information Sciences, 641:119085, 2023.
Sharing network parameters between agents is an essential and typical operation for improving the scalability of multi-agent reinforcement learning algorithms. However, having agents with different tasks share the same network parameters is not conducive to distinguishing the agents' skills. In addition, communication between agents undertaking the same task matters far more than communication with external agents. Therefore, we propose Dual Cooperation Networks (DCN). To distinguish whether agents undertake the same task, all agents are grouped according to their status through a graph neural network instead of traditional proximity. Agents communicate within their group to achieve strong cooperation. After that, the global value function is decomposed by groups to facilitate cooperation between groups. Finally, we verify the method in simulation and on physical hardware, where the algorithm achieves excellent performance.
    @article{liu2023dsn,
    title = {Decomposing Shared Networks for Separate Cooperation with Multi-agent Reinforcement Learning},
    author = {Weiwei Liu and Linpeng Peng and Licheng Wen and Jian Yang and Yong Liu},
    year = 2023,
    journal = {Information Sciences},
    volume = 641,
    pages = {119085},
    doi = {10.1016/j.ins.2023.119085}
    }
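    The group-wise value decomposition can be sketched as follows: per-agent utilities are first mixed within each group, and the group values are then mixed into a global Q. A minimal PyTorch illustration with a fixed group assignment (in the paper the grouping comes from a graph neural network over agent states, and the mixers are more expressive):
      import torch
      import torch.nn as nn

      class GroupedValueMixer(nn.Module):
          def __init__(self, n_groups):
              super().__init__()
              # One tiny mixer per group, plus a global mixer across groups.
              self.group_mix = nn.ModuleList(
                  nn.Linear(1, 1) for _ in range(n_groups))
              self.global_mix = nn.Linear(n_groups, 1)

          def forward(self, agent_qs, groups):
              # agent_qs: (batch, n_agents); groups: (n_agents,) group ids.
              group_vals = [mix(agent_qs[:, groups == g].sum(1, keepdim=True))
                            for g, mix in enumerate(self.group_mix)]
              return self.global_mix(torch.cat(group_vals, dim=1))  # global Q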
  • Weiwei Liu, Linpeng Peng, Junjie Cao, Xiaokuan Fu, Yong Liu, and Zaisheng Pan. Ensemble Bootstrapped Deep Deterministic Policy Gradient for Vision-Based Robotic Grasping. IEEE Access, 9:19916–19925, 2021.
With sufficient practice, humans can grasp objects they have never seen before through brain decision-making. However, manipulators, which have a wide range of applications in industrial production, can still only grasp specific objects, because most grasping algorithms rely on prior knowledge such as hand-eye calibration results and object model features and can only target specific types of objects. When the task scenario and the operation target change, they cannot be effectively redeployed. To solve these problems, reinforcement learning is often used to train grasping algorithms. However, reinforcement learning for manipulator grasping faces three main problems: insufficient sample utilization, poor algorithm stability, and limited exploration. This article uses LfD, BC, and DDPG to improve sample utilization, and employs multiple critics to evaluate input actions as an ensemble, addressing algorithm instability. Finally, inspired by Thompson sampling, the input action is evaluated from different angles, which increases the algorithm's exploration of the environment and reduces the number of interactions with the environment. The EDDPG and EBDDPG algorithms are designed in this article. To further improve generalization, the method does not use extra information that is difficult to obtain directly on a physical platform, such as the real coordinates of the target object; instead, the continuous motion space at the end of the manipulator in the Cartesian coordinate system is used as the output of the decision. Simulation results show that, under the same number of interactions, the manipulator's success rate in grasping 1000 random objects more than doubles and reaches state-of-the-art (SOTA) performance.
    @article{liu2021ensemblebd,
    title = {Ensemble Bootstrapped Deep Deterministic Policy Gradient for Vision-Based Robotic Grasping},
    author = {Weiwei Liu and Linpeng Peng and Junjie Cao and Xiaokuan Fu and Yong Liu and Zaisheng Pan},
    year = 2021,
    journal = {IEEE Access},
    volume = 9,
    pages = {19916--19925},
    doi = {10.1109/ACCESS.2021.3049860}
    }
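    The ensemble-critic idea can be sketched as several independently initialized Q-heads scoring the same state-action pair: averaging the heads stabilizes the value estimate, while training each head on a bootstrapped subset of the replay data (not shown) yields Thompson-style exploration. A minimal PyTorch illustration under assumed dimensions, not the paper's exact architecture:
      import torch
      import torch.nn as nn

      class EnsembleCritic(nn.Module):
          def __init__(self, obs_dim, act_dim, n_heads=5, hidden=256):
              super().__init__()
              self.heads = nn.ModuleList(
                  nn.Sequential(nn.Linear(obs_dim + act_dim, hidden),
                                nn.ReLU(), nn.Linear(hidden, 1))
                  for _ in range(n_heads))

          def forward(self, obs, act):
              x = torch.cat([obs, act], dim=-1)
              qs = torch.stack([h(x) for h in self.heads])   # (heads, B, 1)
              return qs.mean(0), qs   # ensemble estimate + per-head values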