Rongyao Cai
MS Student
Institute of Cyber-Systems and Control, Zhejiang University, China
Biography
I am pursuing my M.S. degree in the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. My main research interests include process control, data mining, and deep learning.
Research Interests
- Data Mining
- Process Control
Publications
- Linpeng Peng, Rongyao Cai, Jingyang Xiang, Junyu Zhu, Weiwei Liu, Wang Gao, and Yong Liu. LiteGrasp: A Light Robotic Grasp Detection via Semi-Supervised Knowledge Distillation. IEEE Robotics and Automation Letters, 9:7995-8002, 2024.
@article{peng2024lal, title = {LiteGrasp: A Light Robotic Grasp Detection via Semi-Supervised Knowledge Distillation}, author = {Linpeng Peng and Rongyao Cai and Jingyang Xiang and Junyu Zhu and Weiwei Liu and Wang Gao and Yong Liu}, year = 2024, journal = {IEEE Robotics and Automation Letters}, volume = 9, pages = {7995-8002}, doi = {10.1109/LRA.2024.3436336}, abstract = {Grasping detection from single images in robotic applications poses a significant challenge. While contemporary deep learning techniques excel, their success often hinges on large annotated datasets and intricate network architectures. In this letter, we present LiteGrasp, a novel semi-supervised lightweight framework purpose-built for grasp detection, eliminating the necessity for exhaustive supervision and intricate networks. Our approach uses a limited amount of labeled data via a knowledge distillation method, introducing HRGrasp-Net, a model with high efficiency for extracting features and largely based on HRNet. We incorporate pseudo-label filtering within a mutual learning model set within a teacher-student paradigm. This enhances the transference of data from images with labels to those without. Additionally, we introduce the streamlined Lite HRGrasp-Net, acting as the student network which gains further distillation knowledge using a multi-level fusion cascade originating from HRGrasp-Net. Impressively, LiteGrasp thrives with just a fraction (4.3%) of HRGrasp-Net's original model size, and with limited labeled data relative to total data (25% ratio) across all benchmarks, regularly outperforming solely supervised and semi-supervised models. Taking just 6 ms for execution, LiteGrasp showcases exceptional accuracy (99.99% and 97.21% on Cornell and Jacquard data sets respectively), as well as an impressive 95.3% rate of success in grasping when deployed using a 6DoF UR5e robotic arm. 
These highlights underscore the effectiveness and efficiency of LiteGrasp for grasp detection, even under resource-limited conditions.} }
- Rongyao Cai, Wang Gao, Linpeng Peng, Zhengming Lu, Kexin Zhang, and Yong Liu. Debiased Contrastive Learning With Supervision Guidance for Industrial Fault Detection. IEEE Transactions on Industrial Informatics, 2024.
@article{cai2024dcl, title = {Debiased Contrastive Learning With Supervision Guidance for Industrial Fault Detection}, author = {Rongyao Cai and Wang Gao and Linpeng Peng and Zhengming Lu and Kexin Zhang and Yong Liu}, year = 2024, journal = {IEEE Transactions on Industrial Informatics}, doi = {10.1109/TII.2024.3424561}, abstract = {The time series self-supervised contrastive learning framework has succeeded significantly in industrial fault detection scenarios. It typically consists of pretraining on abundant unlabeled data and fine-tuning on limited annotated data. However, the two-phase framework faces three challenges: Sampling bias, task-agnostic representation issue, and angular-centricity issue. These challenges hinder further development in industrial applications. This article introduces a debiased contrastive learning with supervision guidance (DCLSG) framework and applies it to industrial fault detection tasks. First, DCLSG employs channel augmentation to integrate temporal and frequency domain information. Pseudolabels based on momentum clustering operation are assigned to extracted representations, thereby mitigating the sampling bias raised by the selection of positive pairs. Second, the generated supervisory signal guides the pretraining phase, tackling the task-agnostic representation issue. Third, the angular-centricity issue is addressed using the proposed Gaussian distance metric measuring the radial distribution of representations. The experiments conducted on three industrial datasets (ISDB, CWRU, and practical datasets) validate the superior performance of the DCLSG compared to other fault detection methods.} }
- Kexin Zhang, Qingsong Wen, Chaoli Zhang, Rongyao Cai, Ming Jin, Yong Liu, James Y. Zhang, Yuxuan Liang, Guansong Pang, Dongjin Song, and Shirui Pan. Self-Supervised Learning for Time Series Analysis: Taxonomy, Progress, and Prospects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46:6775-6794, 2024.
@article{zhang2024ssl, title = {Self-Supervised Learning for Time Series Analysis: Taxonomy, Progress, and Prospects}, author = {Kexin Zhang and Qingsong Wen and Chaoli Zhang and Rongyao Cai and Ming Jin and Yong Liu and James Y. Zhang and Yuxuan Liang and Guansong Pang and Dongjin Song and Shirui Pan}, year = 2024, journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence}, volume = 46, pages = {6775-6794}, doi = {10.1109/TPAMI.2024.3387317}, abstract = {Self-supervised learning (SSL) has recently achieved impressive performance on various time series tasks. The most prominent advantage of SSL is that it reduces the dependence on labeled data. Based on the pre-training and fine-tuning strategy, even a small amount of labeled data can achieve high performance. Compared with many published self-supervised surveys on computer vision and natural language processing, a comprehensive survey for time series SSL is still missing. To fill this gap, we review current state-of-the-art SSL methods for time series data in this article. To this end, we first comprehensively review existing surveys related to SSL and time series, and then provide a new taxonomy of existing time series SSL methods by summarizing them from three perspectives: generative-based, contrastive-based, and adversarial-based. These methods are further divided into ten subcategories with detailed reviews and discussions about their key intuitions, main frameworks, advantages and disadvantages. To facilitate the experiments and validation of time series SSL methods, we also summarize datasets commonly used in time series forecasting, classification, anomaly detection, and clustering tasks. Finally, we present the future directions of SSL for time series analysis.} }
- Kexin Zhang, Rongyao Cai, Chunlin Zhou, and Yong Liu. Debiased Contrastive Learning for Time-Series Representation Learning and Fault Detection. IEEE Transactions on Industrial Informatics, 20:7641-7653, 2024.
@article{zhang2024dcl, title = {Debiased Contrastive Learning for Time-Series Representation Learning and Fault Detection}, author = {Kexin Zhang and Rongyao Cai and Chunlin Zhou and Yong Liu}, year = 2024, journal = {IEEE Transactions on Industrial Informatics}, volume = 20, pages = {7641-7653}, doi = {10.1109/TII.2024.3359409}, abstract = {Building reliable fault detection systems through deep neural networks is an appealing topic in industrial scenarios. In these contexts, the representations extracted by neural networks on available labeled time-series data can reflect system states. However, this endeavor remains challenging due to the necessity of labeled data. Self-supervised contrastive learning (SSCL) is one of the effective approaches to deal with this challenge, but existing SSCL-based models suffer from sampling bias and representation bias problems. This article introduces a debiased contrastive learning framework for time-series data and applies it to industrial fault detection tasks. This framework first develops the multigranularity augmented view generation method to generate augmented views at different granularities. It then introduces the momentum clustering contrastive learning strategy and the expert knowledge guidance mechanism to mitigate sampling bias and representation bias, respectively. Finally, the experiments on a public bearing fault detection dataset and a widely used valve stiction detection dataset show the effectiveness of the proposed feature learning framework.} }
- Rongyao Cai, Xiao Xv, Zhengming Lu, Kexin Zhang, and Yong Liu. Fusion Assessment of Safety and Security for Intelligent Industrial Unmanned Systems. In 7th International Symposium on Autonomous Systems (ISAS), 2024.
@inproceedings{cai2024fas, title = {Fusion Assessment of Safety and Security for Intelligent Industrial Unmanned Systems}, author = {Rongyao Cai and Xiao Xv and Zhengming Lu and Kexin Zhang and Yong Liu}, year = 2024, booktitle = {7th International Symposium on Autonomous Systems (ISAS)}, doi = {10.1109/ISAS61044.2024.10552597}, abstract = {Fault tree analysis is the most commonly used methodology in industrial safety analysis to predict the probability or frequency of system failure. Although fault tree analysis has been proposed for more than six decades, the assumptions used in most commercial fault tree analysis codes have not changed significantly, which limits the ability of the method to represent design, operation, and maintenance characteristics in the context of the increasing complexity and specialization of modern industrial systems. The basic setup of traditional fault trees is unable to include dependencies between events, time-varying failures, and repair rate realities to explain complex maintenance strategies. To address the above shortcomings, we propose a fusion tree model combining fault tree and attack tree, and simplify the causal structure of the fusion tree by modularization, and utilize the dynamic Markov model to represent the complex coupling relationship between components or nodes. Finally, we demonstrate the calculation process of fusion tree in pressure vessel systems with temporal control.} }
- Rongyao Cai, Linpeng Peng, Zhengming Lu, Kexin Zhang, and Yong Liu. DCS: Debiased Contrastive Learning with Weak Supervision for Time Series Classification. In 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 5625-5629, 2024.
@inproceedings{cai2024dcs, title = {DCS: Debiased Contrastive Learning with Weak Supervision for Time Series Classification}, author = {Rongyao Cai and Linpeng Peng and Zhengming Lu and Kexin Zhang and Yong Liu}, year = 2024, booktitle = {2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, pages = {5625-5629}, doi = {10.1109/ICASSP48485.2024.10446381}, abstract = {Self-supervised contrastive learning (SSCL) has performed excellently on time series classification tasks. Most SSCL-based classification algorithms generate positive and negative samples in the time or frequency domains, focusing on mining similarities between them. However, two issues are not well addressed in the SSCL framework: the sampling bias and the task-agnostic representation problems. Sampling bias indicates fake negative sample selection in SSCL, and task-agnostic representation results in the unknown correlation between the extracted feature and downstream tasks. To address the issues, we propose Debiased Contrastive learning with weak Supervision framework, abbreviated as DCS. It employs the clustering operation to remove fake negative samples and introduces weak supervisory signals into the SSCL framework to guide feature extraction. Additionally, we propose a channel augmentation method that allows the DCS to extract features from local and global perspectives simultaneously. The comprehensive experiments on the widely used datasets show that DCS achieves performance superior to state-of-the-art methods on the widely used popular benchmark datasets.} }
- Rongyao Cai, Kexin Zhang, and Yong Liu. Industrial Fault Detection Based on Time-Frequency Distillation Autoencoder. In The 42nd Chinese Control Conference (CCC), pages 5120-5125, 2023.
@inproceedings{cai2023ifd, title = {Industrial Fault Detection Based on Time-Frequency Distillation Autoencoder}, author = {Rongyao Cai and Kexin Zhang and Yong Liu}, year = 2023, booktitle = {The 42nd Chinese Control Conference (CCC)}, pages = {5120-5125}, doi = {10.23919/CCC58697.2023.10239980}, abstract = {Data-driven feature extraction is a crucial research area in control loop performance assessment (CLPA). Deep learning is a widely used technique for building feature learning models based on neural networks (NNs). However, most NN-based CLPA methods require a large amount of labeled data and do not fully leverage the potential of frequency features. We propose a novel model called time-frequency distillation autoencoder (TFDAE) to address these limitations. The TFDAE consists of a frequency distillation encoder and a representation extraction decoder. The encoder leverages self-supervised contrastive learning to learn time features that guide the distillation of key frequency information. Additionally, a multi-kernel pooling block is incorporated in the encoder, enabling multi-scale information refinement for time feature extraction. The decoder uses the distilled information to extract informative representations and reconstruct the original input series. Taking valve stiction detection in CLPA as the evaluation task, we developed a stiction detection method based on TFDAE. Finally, we evaluate our model on the benchmark dataset: International Stiction Data Base (ISDB), and the experimental results show that TFDAE outperforms traditional knowledge-based and recent NN-based methods.} }
- Kexin Zhang, Rongyao Cai, and Yong Liu. Industrial Fault Detection using Contrastive Representation Learning on Time-series Data. In The 22nd World Congress of the International Federation of Automatic Control (IFAC), pages 3197-3202, 2023.
@inproceedings{zhang2023ifd, title = {Industrial Fault Detection using Contrastive Representation Learning on Time-series Data}, author = {Kexin Zhang and Rongyao Cai and Yong Liu}, year = 2023, booktitle = {The 22nd World Congress of the International Federation of Automatic Control (IFAC)}, pages = {3197-3202}, doi = {10.1016/j.ifacol.2023.10.1456}, abstract = {Deep learning (DL) has been known as one of the effective techniques for building data-driven fault detection methods. The successful DL-based methods require the condition that massive labeled data are available, but this is sometimes an inevitable obstacle in real industrial environments. As one of the solutions, autoencoders (AEs) are widely adopted since AEs can extract features from unlabeled data. However, some challenges in AE-based fault detection methods remain, such as the design of encoder architecture, the computational cost, and the usage of the limited labeled data. This paper proposes a new industrial fault detection method through learning instance-level representation of time-series based on the self-supervised contrastive learning framework (SSCL). The proposed method uses dilated-causal-convolution-based encoder-only architecture to extract the information from industrial time-series data. A new data augmentation method for time-series data is proposed based on the temporal distance distribution, which is used to construct positive pairs in SSCL. Moreover, the encoder is alternately trained by the new weighted contrastive loss and the traditional classification loss. Finally, the experiments are conducted on the industrial data set and a semi-physical system, showing the effectiveness of the proposed method.} }