Address

Room 101, Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Email: xuemengyang@zju.edu.cn

Xuemeng Yang

MS Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Biography

I am pursuing my M.S. degree in the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. My major research interests include deep learning and 3D computer vision.

Research and Interests

  • Deep Learning
  • 3D Computer Vision

Publications

  • Tianxin Huang, Hao Zou, Jinhao Cui, Jiangning Zhang, Xuemeng Yang, Lin Li, and Yong Liu. Adaptive Recurrent Forward Network for Dense Point Cloud Completion. IEEE Transactions on Multimedia, 25:5903-5915, 2022.
    [BibTeX] [Abstract] [DOI] [PDF]
    Point cloud completion is an interesting and challenging task in 3D vision, which aims to recover complete shapes from sparse and incomplete point clouds. Existing completion networks often require a vast number of parameters and substantial computational costs to achieve a high performance level, which may limit their practical application. In this work, we propose a novel Adaptive efficient Recurrent Forward Network (ARFNet), which is composed of three parts: Recurrent Feature Extraction (RFE), Forward Dense Completion (FDC) and Raw Shape Protection (RSP). In an RFE, multiple short global features are extracted from incomplete point clouds, while a dense quantity of completed results are generated in a coarse-to-fine pipeline in the FDC. Finally, we propose the Adamerge module to preserve the details from the original models by merging the generated results with the original incomplete point clouds in the RSP. In addition, we introduce the Sampling Chamfer Distance to better capture the shapes of the models and the balanced expansion constraint to restrict the expansion distances from coarse to fine. According to the experiments on ShapeNet and KITTI, our network can achieve state-of-the-art completion performances on dense point clouds with fewer parameters, smaller model sizes, lower memory costs and a faster convergence.
    @article{huang2022arf,
    title = {Adaptive Recurrent Forward Network for Dense Point Cloud Completion},
    author = {Tianxin Huang and Hao Zou and Jinhao Cui and Jiangning Zhang and Xuemeng Yang and Lin Li and Yong Liu},
    year = 2022,
    journal = {IEEE Transactions on Multimedia},
    volume = {25},
    pages = {5903-5915},
    doi = {10.1109/TMM.2022.3200851},
    abstract = {Point cloud completion is an interesting and challenging task in 3D vision, which aims to recover complete shapes from sparse and incomplete point clouds. Existing completion networks often require a vast number of parameters and substantial computational costs to achieve a high performance level, which may limit their practical application. In this work, we propose a novel Adaptive efficient Recurrent Forward Network (ARFNet), which is composed of three parts: Recurrent Feature Extraction (RFE), Forward Dense Completion (FDC) and Raw Shape Protection (RSP). In an RFE, multiple short global features are extracted from incomplete point clouds, while a dense quantity of completed results are generated in a coarse-to-fine pipeline in the FDC. Finally, we propose the Adamerge module to preserve the details from the original models by merging the generated results with the original incomplete point clouds in the RSP. In addition, we introduce the Sampling Chamfer Distance to better capture the shapes of the models and the balanced expansion constraint to restrict the expansion distances from coarse to fine. According to the experiments on ShapeNet and KITTI, our network can achieve state-of-the-art completion performances on dense point clouds with fewer parameters, smaller model sizes, lower memory costs and a faster convergence.}
    }
  • Tianxin Huang, Xuemeng Yang, Jiangning Zhang, Jinhao Cui, Hao Zou, Jun Chen, Xiangrui Zhao, and Yong Liu. Learning to Train a Point Cloud Reconstruction Network Without Matching. In European Conference on Computer Vision (ECCV), 2022.
    [BibTeX] [Abstract] [DOI]
    Reconstruction networks for well-ordered data such as 2D images and 1D continuous signals are easy to optimize through element-wised squared errors, while permutation-arbitrary point clouds cannot be constrained directly because their points permutations are not fixed. Though existing works design algorithms to match two point clouds and evaluate shape errors based on matched results, they are limited by pre-defined matching processes. In this work, we propose a novel framework named PCLossNet which learns to train a point cloud reconstruction network without any matching. By training through an adversarial process together with the reconstruction network, PCLossNet can better explore the differences between point clouds and create more precise reconstruction results. Experiments on multiple datasets prove the superiority of our method, where PCLossNet can help networks achieve much lower reconstruction errors and extract more representative features, with about 4 times faster training efficiency than the commonly-used EMD loss. Our codes can be found in https://github.com/Tianxinhuang/PCLossNet.
    @inproceedings{huang2022ltt,
    title = {Learning to Train a Point Cloud Reconstruction Network Without Matching},
    author = {Tianxin Huang and Xuemeng Yang and Jiangning Zhang and Jinhao Cui and Hao Zou and Jun Chen and Xiangrui Zhao and Yong Liu},
    year = 2022,
    booktitle = {European Conference on Computer Vision (ECCV)},
    doi = {10.1007/978-3-031-19769-7_11},
    abstract = {Reconstruction networks for well-ordered data such as 2D images and 1D continuous signals are easy to optimize through element-wised squared errors, while permutation-arbitrary point clouds cannot be constrained directly because their points permutations are not fixed. Though existing works design algorithms to match two point clouds and evaluate shape errors based on matched results, they are limited by pre-defined matching processes. In this work, we propose a novel framework named PCLossNet which learns to train a point cloud reconstruction network without any matching. By training through an adversarial process together with the reconstruction network, PCLossNet can better explore the differences between point clouds and create more precise reconstruction results. Experiments on multiple datasets prove the superiority of our method, where PCLossNet can help networks achieve much lower reconstruction errors and extract more representative features, with about 4 times faster training efficiency than the commonly-used EMD loss. Our codes can be found in https://github.com/Tianxinhuang/PCLossNet.}
    }
  • Tianxin Huang, Hao Zou, Jinhao Cui, Xuemeng Yang, Mengmeng Wang, Xiangrui Zhao, Jiangning Zhang, Yi Yuan, Yifan Xu, and Yong Liu. RFNet: Recurrent Forward Network for Dense Point Cloud Completion. In 2021 International Conference on Computer Vision, pages 12488-12497, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    Point cloud completion is an interesting and challenging task in 3D vision, aiming to recover complete shapes from sparse and incomplete point clouds. Existing learning based methods often require vast computation cost to achieve excellent performance, which limits their practical applications. In this paper, we propose a novel Recurrent Forward Network (RFNet), which is composed of three modules: Recurrent Feature Extraction (RFE), Forward Dense Completion (FDC) and Raw Shape Protection (RSP). The RFE extracts multiple global features from the incomplete point clouds for different recurrent levels, and the FDC generates point clouds in a coarse-to-fine pipeline. The RSP introduces details from the original incomplete models to refine the completion results. Besides, we propose a Sampling Chamfer Distance to better capture the shapes of models and a new Balanced Expansion Constraint to restrict the expansion distances from coarse to fine. According to the experiments on ShapeNet and KITTI, our network can achieve the state-of-the-art with lower memory cost and faster convergence.
    @inproceedings{huang2021rfnetrf,
    title = {RFNet: Recurrent Forward Network for Dense Point Cloud Completion},
    author = {Tianxin Huang and Hao Zou and Jinhao Cui and Xuemeng Yang and Mengmeng Wang and Xiangrui Zhao and Jiangning Zhang and Yi Yuan and Yifan Xu and Yong Liu},
    year = 2021,
    booktitle = {2021 International Conference on Computer Vision},
    pages = {12488-12497},
    doi = {10.1109/ICCV48922.2021.01228},
    abstract = {Point cloud completion is an interesting and challenging task in 3D vision, aiming to recover complete shapes from sparse and incomplete point clouds. Existing learning based methods often require vast computation cost to achieve excellent performance, which limits their practical applications. In this paper, we propose a novel Recurrent Forward Network (RFNet), which is composed of three modules: Recurrent Feature Extraction (RFE), Forward Dense Completion (FDC) and Raw Shape Protection (RSP). The RFE extracts multiple global features from the incomplete point clouds for different recurrent levels, and the FDC generates point clouds in a coarse-to-fine pipeline. The RSP introduces details from the original incomplete models to refine the completion results. Besides, we propose a Sampling Chamfer Distance to better capture the shapes of models and a new Balanced Expansion Constraint to restrict the expansion distances from coarse to fine. According to the experiments on ShapeNet and KITTI, our network can achieve the state-of-the-art with lower memory cost and faster convergence.}
    }
  • Shanqi Liu, Licheng Wen, Jinhao Cui, Xuemeng Yang, Junjie Cao, and Yong Liu. Moving Forward in Formation: A Decentralized Hierarchical Learning Approach to Multi-Agent Moving Together. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 4777-4784, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    Multi-agent path finding in formation has many potential real-world applications like mobile warehouse robotics. However, previous multi-agent path finding (MAPF) methods hardly take formation into consideration. Furthermore, they are usually centralized planners and require the whole state of the environment. Other decentralized partially observable approaches to MAPF are reinforcement learning (RL) methods. However, these RL methods encounter difficulties when learning path finding and formation problems at the same time. In this paper, we propose a novel decentralized partially observable RL algorithm that uses a hierarchical structure to decompose the multi-objective task into unrelated ones. It also calculates a theoretical weight that makes each task's reward have equal influence on the final RL value function. Additionally, we introduce a communication method that helps agents cooperate with each other. Experiments in simulation show that our method outperforms other end-to-end RL methods and our method can naturally scale to large world sizes where centralized planners struggle. We also deploy and validate our method in a real-world scenario.
    @inproceedings{liu2021movingfi,
    title = {Moving Forward in Formation: A Decentralized Hierarchical Learning Approach to Multi-Agent Moving Together},
    author = {Shanqi Liu and Licheng Wen and Jinhao Cui and Xuemeng Yang and Junjie Cao and Yong Liu},
    year = 2021,
    booktitle = {2021 IEEE/RSJ International Conference on Intelligent Robots and Systems},
    pages = {4777-4784},
    doi = {10.1109/IROS51168.2021.9636224},
    abstract = {Multi-agent path finding in formation has many potential real-world applications like mobile warehouse robotics. However, previous multi-agent path finding (MAPF) methods hardly take formation into consideration. Furthermore, they are usually centralized planners and require the whole state of the environment. Other decentralized partially observable approaches to MAPF are reinforcement learning (RL) methods. However, these RL methods encounter difficulties when learning path finding and formation problems at the same time. In this paper, we propose a novel decentralized partially observable RL algorithm that uses a hierarchical structure to decompose the multi-objective task into unrelated ones. It also calculates a theoretical weight that makes each task's reward have equal influence on the final RL value function. Additionally, we introduce a communication method that helps agents cooperate with each other. Experiments in simulation show that our method outperforms other end-to-end RL methods and our method can naturally scale to large world sizes where centralized planners struggle. We also deploy and validate our method in a real-world scenario.}
    }
  • Hao Zou, Xuemeng Yang, Tianxin Huang, Chujuan Zhang, Yong Liu, Wanlong Li, Feng Wen, and Hongbo Zhang. Up-to-Down Network: Fusing Multi-Scale Context for 3D Semantic Scene Completion. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 16-23, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    An efficient 3D scene perception algorithm is a vital component for autonomous driving and robotics systems. In this paper, we focus on semantic scene completion, which is a task of jointly estimating the volumetric occupancy and semantic labels of objects. Since the real-world data is sparse and occluded, this is an extremely challenging task. We propose a novel framework, named Up-to-Down network (UDNet), to achieve the large-scale semantic scene completion with an encoder-decoder architecture for voxel grids. The novel up-to-down block can effectively aggregate multi-scale context information to improve labeling coherence, and the atrous spatial pyramid pooling module is leveraged to expand the receptive field while preserving detailed geometric information. Besides, the proposed multi-scale fusion mechanism efficiently aggregates global background information and improves the semantic completion accuracy. Moreover, to further satisfy the needs of different tasks, our UDNet can accomplish the multi-resolution semantic completion, achieving faster but coarser completion. Detailed experiments in the semantic scene completion benchmark of SemanticKITTI illustrate that our proposed framework surpasses the state-of-the-art methods with remarkable margins and a real-time inference speed by using only voxel grids as input.
    @inproceedings{zou2021utd,
    title = {Up-to-Down Network: Fusing Multi-Scale Context for 3D Semantic Scene Completion},
    author = {Hao Zou and Xuemeng Yang and Tianxin Huang and Chujuan Zhang and Yong Liu and Wanlong Li and Feng Wen and Hongbo Zhang},
    year = 2021,
    booktitle = {2021 IEEE/RSJ International Conference on Intelligent Robots and Systems},
    pages = {16-23},
    doi = {10.1109/IROS51168.2021.9635888},
    abstract = {An efficient 3D scene perception algorithm is a vital component for autonomous driving and robotics systems. In this paper, we focus on semantic scene completion, which is a task of jointly estimating the volumetric occupancy and semantic labels of objects. Since the real-world data is sparse and occluded, this is an extremely challenging task. We propose a novel framework, named Up-to-Down network (UDNet), to achieve the large-scale semantic scene completion with an encoder-decoder architecture for voxel grids. The novel up-to-down block can effectively aggregate multi-scale context information to improve labeling coherence, and the atrous spatial pyramid pooling module is leveraged to expand the receptive field while preserving detailed geometric information. Besides, the proposed multi-scale fusion mechanism efficiently aggregates global background information and improves the semantic completion accuracy. Moreover, to further satisfy the needs of different tasks, our UDNet can accomplish the multi-resolution semantic completion, achieving faster but coarser completion. Detailed experiments in the semantic scene completion benchmark of SemanticKITTI illustrate that our proposed framework surpasses the state-of-the-art methods with remarkable margins and a real-time inference speed by using only voxel grids as input.}
    }
  • Xuemeng Yang, Hao Zou, Xin Kong, Tianxin Huang, Yong Liu, Wanlong Li, Feng Wen, and Hongbo Zhang. Semantic Segmentation-assisted Scene Completion for LiDAR Point Clouds. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3555-3562, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    Outdoor scene completion is a challenging issue in 3D scene understanding, which plays an important role in intelligent robotics and autonomous driving. Due to the sparsity of LiDAR acquisition, it is far more complex for 3D scene completion and semantic segmentation. Since semantic features can provide constraints and semantic priors for completion tasks, the relationship between them is worth exploring. Therefore, we propose an end-to-end semantic segmentation-assisted scene completion network, including a 2D completion branch and a 3D semantic segmentation branch. Specifically, the network takes a raw point cloud as input, and merges the features from the segmentation branch into the completion branch hierarchically to provide semantic information. By adopting BEV representation and 3D sparse convolution, we can benefit from the lower operand while maintaining effective expression. Besides, the decoder of the segmentation branch is used as an auxiliary, which can be discarded in the inference stage to save computational consumption. Extensive experiments demonstrate that our method achieves competitive performance on SemanticKITTI dataset with low latency. Code and models will be released at https://github.com/jokester-zzz/SSA-SC.
    @inproceedings{yang2021ssa,
    title = {Semantic Segmentation-assisted Scene Completion for LiDAR Point Clouds},
    author = {Xuemeng Yang and Hao Zou and Xin Kong and Tianxin Huang and Yong Liu and Wanlong Li and Feng Wen and Hongbo Zhang},
    year = 2021,
    booktitle = {2021 IEEE/RSJ International Conference on Intelligent Robots and Systems},
    pages = {3555-3562},
    doi = {10.1109/IROS51168.2021.9636662},
    abstract = {Outdoor scene completion is a challenging issue in 3D scene understanding, which plays an important role in intelligent robotics and autonomous driving. Due to the sparsity of LiDAR acquisition, it is far more complex for 3D scene completion and semantic segmentation. Since semantic features can provide constraints and semantic priors for completion tasks, the relationship between them is worth exploring. Therefore, we propose an end-to-end semantic segmentation-assisted scene completion network, including a 2D completion branch and a 3D semantic segmentation branch. Specifically, the network takes a raw point cloud as input, and merges the features from the segmentation branch into the completion branch hierarchically to provide semantic information. By adopting BEV representation and 3D sparse convolution, we can benefit from the lower operand while maintaining effective expression. Besides, the decoder of the segmentation branch is used as an auxiliary, which can be discarded in the inference stage to save computational consumption. Extensive experiments demonstrate that our method achieves competitive performance on SemanticKITTI dataset with low latency. Code and models will be released at https://github.com/jokester-zzz/SSA-SC.}
    }
  • Jinhao Cui, Hao Zou, Xin Kong, Xuemeng Yang, Xiangrui Zhao, Yong Liu, Wanlong Li, Feng Wen, and Hongbo Zhang. PocoNet: SLAM-oriented 3D LiDAR Point Cloud Online Compression Network. In 2021 IEEE International Conference on Robotics and Automation, pages 1868-1874, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    In this paper, we present PocoNet: Point cloud Online COmpression NETwork to address the task of SLAM-oriented compression. The aim of this task is to select a compact subset of points with high priority to maintain localization accuracy. The key insight is that points with high priority have similar geometric features in SLAM scenarios. Hence, we tackle this task as point cloud segmentation to capture complex geometric information. We calculate observation counts by matching between maps and point clouds and divide them into different priority levels. Trained by labels annotated with such observation counts, the proposed network could evaluate the point-wise priority. Experiments are conducted by integrating our compression module into an existing SLAM system to evaluate compression ratios and localization performances. Experimental results on two different datasets verify the feasibility and generalization of our approach.
    @inproceedings{cui2021poconetso,
    title = {PocoNet: SLAM-oriented 3D LiDAR Point Cloud Online Compression Network},
    author = {Jinhao Cui and Hao Zou and Xin Kong and Xuemeng Yang and Xiangrui Zhao and Yong Liu and Wanlong Li and Feng Wen and Hongbo Zhang},
    year = 2021,
    booktitle = {2021 IEEE International Conference on Robotics and Automation},
    pages = {1868-1874},
    doi = {10.1109/ICRA48506.2021.9561309},
    abstract = {In this paper, we present PocoNet: Point cloud Online COmpression NETwork to address the task of SLAM-oriented compression. The aim of this task is to select a compact subset of points with high priority to maintain localization accuracy. The key insight is that points with high priority have similar geometric features in SLAM scenarios. Hence, we tackle this task as point cloud segmentation to capture complex geometric information. We calculate observation counts by matching between maps and point clouds and divide them into different priority levels. Trained by labels annotated with such observation counts, the proposed network could evaluate the point-wise priority. Experiments are conducted by integrating our compression module into an existing SLAM system to evaluate compression ratios and localization performances. Experimental results on two different datasets verify the feasibility and generalization of our approach.}
    }
  • Xin Kong, Xuemeng Yang, Guangyao Zhai, Xiangrui Zhao, Xianfang Zeng, Mengmeng Wang, Yong Liu, Wanlong Li, and Feng Wen. Semantic Graph Based Place Recognition for 3D Point Clouds. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 8216–8223, 2020.
    [BibTeX] [Abstract] [DOI] [arXiv] [PDF]
    Due to the difficulty in generating the effective descriptors which are robust to occlusion and viewpoint changes, place recognition for 3D point cloud remains an open issue. Unlike most of the existing methods that focus on extracting local, global, and statistical features of raw point clouds, our method aims at the semantic level that can be superior in terms of robustness to environmental changes. Inspired by the perspective of humans, who recognize scenes through identifying semantic objects and capturing their relations, this paper presents a novel semantic graph based approach for place recognition. First, we propose a novel semantic graph representation for the point cloud scenes by reserving the semantic and topological information of the raw point cloud. Thus, place recognition is modeled as a graph matching problem. Then we design a fast and effective graph similarity network to compute the similarity. Exhaustive evaluations on the KITTI dataset show that our approach is robust to the occlusion as well as viewpoint changes and outperforms the state-of-the-art methods with a large margin. Our code is available at: https://github.com/kxhit/SG_PR.
    @inproceedings{kong2020semanticgb,
    title = {Semantic Graph Based Place Recognition for 3D Point Clouds},
    author = {Xin Kong and Xuemeng Yang and Guangyao Zhai and Xiangrui Zhao and Xianfang Zeng and Mengmeng Wang and Yong Liu and Wanlong Li and Feng Wen},
    year = 2020,
    booktitle = {2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    pages = {8216--8223},
    doi = {10.1109/IROS45743.2020.9341060},
    abstract = {Due to the difficulty in generating the effective descriptors which are robust to occlusion and viewpoint changes, place recognition for 3D point cloud remains an open issue. Unlike most of the existing methods that focus on extracting local, global, and statistical features of raw point clouds, our method aims at the semantic level that can be superior in terms of robustness to environmental changes. Inspired by the perspective of humans, who recognize scenes through identifying semantic objects and capturing their relations, this paper presents a novel semantic graph based approach for place recognition. First, we propose a novel semantic graph representation for the point cloud scenes by reserving the semantic and topological information of the raw point cloud. Thus, place recognition is modeled as a graph matching problem. Then we design a fast and effective graph similarity network to compute the similarity. Exhaustive evaluations on the KITTI dataset show that our approach is robust to the occlusion as well as viewpoint changes and outperforms the state-of-the-art methods with a large margin. Our code is available at: https://github.com/kxhit/SG_PR.},
    arxiv = {https://arxiv.org/pdf/2008.11459.pdf}
    }
  • Licheng Wen, Jiaqing Yan, Xuemeng Yang, Yong Liu, and Yong Gu. Collision-free Trajectory Planning for Autonomous Surface Vehicle. In 2020 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), pages 1098–1105, 2020.
    [BibTeX] [Abstract] [DOI] [arXiv] [PDF]
    In this paper, we propose an efficient and accurate method for autonomous surface vehicles to generate a smooth and collision-free trajectory considering its dynamics constraints. We decouple the trajectory planning problem as a front-end feasible path searching and a back-end kinodynamic trajectory optimization. Firstly, we model the type of two-thrusts under-actuated surface vessel. Then we adopt a sampling-based path searching to find an asymptotic optimal path through the obstacle-surrounding environment and extract several waypoints from it. We apply a numerical optimization method in the back-end to generate the trajectory. From the perspective of security in the field voyage, we propose the sailing corridor method to guarantee the trajectory away from obstacles. Moreover, considering limited fuel ASV carrying, we design a numerical objective function which can optimize a fuel-saving trajectory. Finally, we validate and compare the proposed method in simulation environments and the results fit our expected trajectory.
    @inproceedings{wen2020collisionfreetp,
    title = {Collision-free Trajectory Planning for Autonomous Surface Vehicle},
    author = {Licheng Wen and Jiaqing Yan and Xuemeng Yang and Yong Liu and Yong Gu},
    year = 2020,
    booktitle = {2020 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM)},
    pages = {1098--1105},
    doi = {10.1109/AIM43001.2020.9158907},
    abstract = {In this paper, we propose an efficient and accurate method for autonomous surface vehicles to generate a smooth and collision-free trajectory considering its dynamics constraints. We decouple the trajectory planning problem as a front-end feasible path searching and a back-end kinodynamic trajectory optimization. Firstly, we model the type of two-thrusts under-actuated surface vessel. Then we adopt a sampling-based path searching to find an asymptotic optimal path through the obstacle-surrounding environment and extract several waypoints from it. We apply a numerical optimization method in the back-end to generate the trajectory. From the perspective of security in the field voyage, we propose the sailing corridor method to guarantee the trajectory away from obstacles. Moreover, considering limited fuel ASV carrying, we design a numerical objective function which can optimize a fuel-saving trajectory. Finally, we validate and compare the proposed method in simulation environments and the results fit our expected trajectory.},
    arxiv = {http://arxiv.org/pdf/2005.09857}
    }
  • Wei Ye, Jiasai Sun, Min Xu, Xuemeng Yang, Hongliang Li, and Yong Liu. Detecting Aging Substation Transformers by Audio Signal with Deep Neural Network. Lecture Notes in Computer Science, pages 70–82, 2020.
    [BibTeX] [Abstract] [DOI] [PDF]
    In order to monitor the aging of transformers and ensure the operational safety in substations, a practical detection system for indoor substation transformers based on the analysis of audio signal is designed, which uses computer technology instead of manpower to efficiently monitor the transformers' working states in real-time. Our work consists of a small and low cost AI-STBOX and an intelligent AI Cloud Platform. AI-STBOX is installed directionally in each transformer room for continuously collecting, compressing and uploading the transformers' audio data. The AI Cloud Platform receives audio data from AI-STBOX, analyses and organizes the data to low-dimensional speech features with STFT and Mel cepstrum analysis. By feeding the features into a powerful deep neural network, the system can quickly distinguish the working states of each substation transformer before it has serious faults. It can locate aging transformers, command the maintenance platform to quickly release the repair task, thus avoiding unforeseeable outages and minimizing planned downtimes. The approach has achieved excellent results in the substation aging transformers detection scene.
    @article{ye2020detectingas,
    title = {Detecting Aging Substation Transformers by Audio Signal with Deep Neural Network},
    author = {Wei Ye and Jiasai Sun and Min Xu and Xuemeng Yang and Hongliang Li and Yong Liu},
    year = 2020,
    journal = {Lecture Notes in Computer Science},
    pages = {70--82},
    doi = {10.1007/978-3-662-61510-2_7},
    abstract = {In order to monitor the aging of transformers and ensure the operational safety in substations, a practical detection system for indoor substation transformers based on the analysis of audio signal is designed, which uses computer technology instead of manpower to efficiently monitor the transformers' working states in real-time. Our work consists of a small and low cost AI-STBOX and an intelligent AI Cloud Platform. AI-STBOX is installed directionally in each transformer room for continuously collecting, compressing and uploading the transformers' audio data. The AI Cloud Platform receives audio data from AI-STBOX, analyses and organizes the data to low-dimensional speech features with STFT and Mel cepstrum analysis. By feeding the features into a powerful deep neural network, the system can quickly distinguish the working states of each substation transformer before it has serious faults. It can locate aging transformers, command the maintenance platform to quickly release the repair task, thus avoiding unforeseeable outages and minimizing planned downtimes. The approach has achieved excellent results in the substation aging transformers detection scene.}
    }