Xiangrui Zhao

PhD Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Address

Room 101, Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Email: xiangruizhao@zju.edu.cn

Biography

I am pursuing my Ph.D. degree at the College of Control Science and Engineering, Zhejiang University, after receiving my B.S. degree in Automation from Huazhong University of Science and Technology in 2018. My main research interests are machine learning in sensor fusion and SLAM.

Research and Interests

  • Sensor Fusion
  • Machine Learning in Sensor Fusion
  • SLAM

Publications

  • Laijian Li, Yukai Ma, Kai Tang, Xiangrui Zhao, Chao Chen, Jianxin Huang, Jianbiao Mei, and Yong Liu. Geo-localization with Transformer-based 2D-3D match Network. IEEE Robotics and Automation Letters (RA-L), 8:4855-4862, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    This letter presents a novel method for geographical localization by registering satellite maps with LiDAR point clouds. This method includes a Transformer-based 2D-3D matching network called D-GLSNet that directly matches LiDAR point clouds and satellite images through end-to-end learning. Without the need for feature point detection, D-GLSNet provides accurate pixel-to-point associations between the LiDAR point clouds and satellite images, from which we can then calculate the horizontal offset (Δx, Δy) and angular deviation Δθ_yaw between them, thereby achieving accurate registration. To demonstrate our network’s localization potential, we have designed a Geo-localization Node (GLN) that implements geographical localization and is plug-and-play in the SLAM system. Compared to GPS, GLN is less susceptible to external interference, such as building occlusion. In urban scenarios, our proposed D-GLSNet can output high-quality matching, enabling GLN to function stably and deliver more accurate localization results. Extensive experiments on the KITTI dataset show that our D-GLSNet method achieves a mean Relative Translation Error (RTE) of 1.43 m. Furthermore, our method outperforms state-of-the-art LiDAR-based geospatial localization methods when combined with odometry.
    @article{li2023glw,
    title = {Geo-localization with Transformer-based 2D-3D match Network},
    author = {Laijian Li and Yukai Ma and Kai Tang and Xiangrui Zhao and Chao Chen and Jianxin Huang and Jianbiao Mei and Yong Liu},
    year = 2023,
    journal = {IEEE Robotics and Automation Letters (RA-L)},
    volume = 8,
    pages = {4855-4862},
    doi = {10.1109/LRA.2023.3290526},
    abstract = {This letter presents a novel method for geographical localization by registering satellite maps with LiDAR point clouds. This method includes a Transformer-based 2D-3D matching network called D-GLSNet that directly matches LiDAR point clouds and satellite images through end-to-end learning. Without the need for feature point detection, D-GLSNet provides accurate pixel-to-point associations between the LiDAR point clouds and satellite images, from which we can then calculate the horizontal offset (Δx, Δy) and angular deviation Δθ_yaw between them, thereby achieving accurate registration. To demonstrate our network's localization potential, we have designed a Geo-localization Node (GLN) that implements geographical localization and is plug-and-play in the SLAM system. Compared to GPS, GLN is less susceptible to external interference, such as building occlusion. In urban scenarios, our proposed D-GLSNet can output high-quality matching, enabling GLN to function stably and deliver more accurate localization results. Extensive experiments on the KITTI dataset show that our D-GLSNet method achieves a mean Relative Translation Error (RTE) of 1.43 m. Furthermore, our method outperforms state-of-the-art LiDAR-based geospatial localization methods when combined with odometry.}
    }
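    A note on the math: once pixel-to-point correspondences are available, the registration step reduces to a 2D rigid alignment. A minimal numpy sketch of recovering (Δx, Δy, Δθ_yaw) with the closed-form SVD (Kabsch) solution, as my illustration rather than the authors' code:
      import numpy as np

      def align_2d(src, dst):
          """Recover the rotation (yaw) and translation mapping src onto dst.

          src, dst: (N, 2) arrays of matched planar coordinates, e.g. LiDAR
          points projected to the ground plane and their associated
          satellite-image positions in metric units.
          """
          src_c = src - src.mean(axis=0)
          dst_c = dst - dst.mean(axis=0)
          # Kabsch: SVD of the 2x2 cross-covariance gives the optimal rotation.
          U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
          d = np.sign(np.linalg.det(Vt.T @ U.T))
          R = Vt.T @ np.diag([1.0, d]) @ U.T
          t = dst.mean(axis=0) - R @ src.mean(axis=0)
          yaw = np.arctan2(R[1, 0], R[0, 0])
          return R, t, yaw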
  • Chao Chen, Yukai Ma, Jiajun Lv, Xiangrui Zhao, Laijian Li, Yong Liu, and Wang Gao. OL-SLAM: A Robust and Versatile System of Object Localization and SLAM. Sensors, 23:801, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    This paper proposes a real-time, versatile Simultaneous Localization and Mapping (SLAM) and object localization system, which fuses measurements from LiDAR, camera, Inertial Measurement Unit (IMU), and Global Positioning System (GPS). Our system can locate itself in an unknown environment and build a scene map, based on which we can also track and obtain the global location of objects of interest. Specifically, our SLAM subsystem consists of four parts: LiDAR-inertial odometry, visual-inertial odometry, GPS-inertial odometry, and global pose graph optimization. The target-tracking and positioning subsystem is developed based on YOLOv4. Benefiting from the use of the GPS sensor in the SLAM system, we can obtain the global positioning information of the target; this makes the system highly useful in military operations, rescue and disaster relief, and other scenarios.
    @article{chen2023ols,
    title = {OL-SLAM: A Robust and Versatile System of Object Localization and SLAM},
    author = {Chao Chen and Yukai Ma and Jiajun Lv and Xiangrui Zhao and Laijian Li and Yong Liu and Wang Gao},
    year = 2023,
    journal = {Sensors},
    volume = 23,
    pages = {801},
    doi = {10.3390/s23020801},
    abstract = {This paper proposes a real-time, versatile Simultaneous Localization and Mapping (SLAM) and object localization system, which fuses measurements from LiDAR, camera, Inertial Measurement Unit (IMU), and Global Positioning System (GPS). Our system can locate itself in an unknown environment and build a scene map, based on which we can also track and obtain the global location of objects of interest. Specifically, our SLAM subsystem consists of four parts: LiDAR-inertial odometry, visual-inertial odometry, GPS-inertial odometry, and global pose graph optimization. The target-tracking and positioning subsystem is developed based on YOLOv4. Benefiting from the use of the GPS sensor in the SLAM system, we can obtain the global positioning information of the target; this makes the system highly useful in military operations, rescue and disaster relief, and other scenarios.}
    }
  • Yukai Ma, Xiangrui Zhao, Han Li, Yaqing Gu, Xiaolei Lang, and Yong Liu. RoLM: Radar on LiDAR Map Localization. In 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    Multi-sensor fusion-based localization technology has achieved high accuracy in autonomous systems, and improving its robustness is the main current challenge. The most commonly used LiDAR and camera are weather-sensitive, while FMCW radar has strong adaptability but suffers from noise and ghost effects. In this paper, we propose a heterogeneous localization method, Radar on LiDAR Map (RoLM), which eliminates the accumulated error of radar odometry in real time to achieve higher localization accuracy without depending on loop closures. We embed the two sensor modalities into a density map and calculate the spatial vector similarity with offset to seek the corresponding place index among the candidates and compute the rotation and translation. We then run ICP on the LiDAR submap, initialized with this coarse alignment, to refine the match. Extensive experiments on the Mulran Radar Dataset, Oxford Radar RobotCar Dataset, and our own data verify the feasibility and effectiveness of our approach.
    @inproceedings{ma2023rol,
    title = {RoLM: Radar on LiDAR Map Localization},
    author = {Yukai Ma and Xiangrui Zhao and Han Li and Yaqing Gu and Xiaolei Lang and Yong Liu},
    year = 2023,
    booktitle = {2023 IEEE International Conference on Robotics and Automation (ICRA)},
    doi = {10.1109/ICRA48891.2023.10161203},
    abstract = {Multi-sensor fusion-based localization technology has achieved high accuracy in autonomous systems, and improving its robustness is the main current challenge. The most commonly used LiDAR and camera are weather-sensitive, while FMCW radar has strong adaptability but suffers from noise and ghost effects. In this paper, we propose a heterogeneous localization method, Radar on LiDAR Map (RoLM), which eliminates the accumulated error of radar odometry in real time to achieve higher localization accuracy without depending on loop closures. We embed the two sensor modalities into a density map and calculate the spatial vector similarity with offset to seek the corresponding place index among the candidates and compute the rotation and translation. We then run ICP on the LiDAR submap, initialized with this coarse alignment, to refine the match. Extensive experiments on the Mulran Radar Dataset, Oxford Radar RobotCar Dataset, and our own data verify the feasibility and effectiveness of our approach.}
    }
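    The "spatial vector similarity with offset" step can be pictured as a circular cross-correlation: shifting the sector axis of one ring descriptor sweeps candidate yaw offsets. A toy numpy sketch under that assumption (the descriptor layout and scoring are my illustration, not the paper's exact formulation):
      import numpy as np

      def best_yaw_similarity(query, candidate):
          """Score two ring descriptors of shape (n_rings, n_sectors) under
          every yaw offset; return the best cosine similarity and that yaw."""
          n_sectors = query.shape[1]
          q = query.ravel()
          qn = np.linalg.norm(q) + 1e-9
          best, best_shift = -1.0, 0
          for s in range(n_sectors):  # each column shift is one yaw hypothesis
              c = np.roll(candidate, s, axis=1).ravel()
              sim = float(q @ c) / (qn * (np.linalg.norm(c) + 1e-9))
              if sim > best:
                  best, best_shift = sim, s
          return best, 2 * np.pi * best_shift / n_sectors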
  • Xiangrui Zhao, Yu Liu, Zhengbo Wang, Kanzhi Wu, Gamini Dissanayake, and Yong Liu. TG: Accurate and Efficient RGB-D Feature with Texture and Geometric Information. IEEE/ASME Transactions on Mechatronics, 27(4):1973-1981, 2022.
    [BibTeX] [Abstract] [DOI] [PDF]
    Feature extraction and matching are the basis of many computer vision problems, such as image retrieval, object recognition, and visual odometry. In this article, we present a novel RGB-D feature with texture and geometric information (TG). It consists of a keypoint detector and a feature descriptor, which is accurate, efficient, and robust to scene variance. In keypoint detection, we build a simplified Gaussian image pyramid to extract the texture feature; meanwhile, the gradient of the point cloud is superimposed as the geometric feature. In feature description, texture information and spatial information are encoded in relative order to build a discriminative descriptor. We also construct a novel RGB-D benchmark dataset for RGB-D detector and descriptor evaluation under single variation. Comprehensive experiments are carried out to prove the superior performance of the proposed feature compared with state-of-the-art algorithms. The experimental results also demonstrate that our TG achieves better performance, especially in accuracy and computational efficiency, making it more suitable for real-time applications, e.g., visual odometry.
    @article{zhao2022tga,
    title = {TG: Accurate and Efficient RGB-D Feature with Texture and Geometric Information},
    author = {Xiangrui Zhao and Yu Liu and Zhengbo Wang and Kanzhi Wu and Gamini Dissanayake and Yong Liu},
    year = 2022,
    journal = {IEEE/ASME Transactions on Mechatronics},
    volume = {27},
    number = {4},
    pages = {1973-1981},
    doi = {10.1109/TMECH.2022.3175812},
    abstract = {Feature extraction and matching are the basis of many computer vision problems, such as image retrieval, object recognition, and visual odometry. In this article, we present a novel RGB-D feature with texture and geometric information (TG). It consists of a keypoint detector and a feature descriptor, which is accurate, efficient, and robust to scene variance. In keypoint detection, we build a simplified Gaussian image pyramid to extract the texture feature; meanwhile, the gradient of the point cloud is superimposed as the geometric feature. In feature description, texture information and spatial information are encoded in relative order to build a discriminative descriptor. We also construct a novel RGB-D benchmark dataset for RGB-D detector and descriptor evaluation under single variation. Comprehensive experiments are carried out to prove the superior performance of the proposed feature compared with state-of-the-art algorithms. The experimental results also demonstrate that our TG achieves better performance, especially in accuracy and computational efficiency, making it more suitable for real-time applications, e.g., visual odometry.}
    }
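    To make the texture-plus-geometry idea concrete, a hypothetical keypoint response could add a difference-of-Gaussians texture term to a depth-gradient geometry term. The DoG choice and the weighting below are my assumptions; the paper's actual detector differs in detail:
      import numpy as np
      from scipy.ndimage import gaussian_filter

      def tg_response(gray, depth, w_geo=0.5):
          """Toy keypoint response on aligned (H, W) intensity/depth images."""
          dog = gaussian_filter(gray, 1.0) - gaussian_filter(gray, 2.0)  # texture term
          gy, gx = np.gradient(depth)                                    # geometry term
          return np.abs(dog) + w_geo * np.hypot(gx, gy)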
  • Lin Li, Xin Kong, Xiangrui Zhao, Tianxin Huang, and Yong Liu. Semantic Scan Context: A Novel Semantic-based Loop-closure Method for LiDAR SLAM. Autonomous Robots, 46(4):535-551, 2022.
    [BibTeX] [Abstract] [DOI] [PDF]
    As one of the key technologies of SLAM, loop-closure detection can help eliminate the cumulative errors of the odometry. Many current LiDAR-based SLAM systems do not integrate a loop-closure detection module, so they inevitably suffer from cumulative errors. This paper proposes a semantic-based place recognition method called Semantic Scan Context (SSC), which consists of a two-step global ICP and a semantic-based descriptor. Thanks to the use of high-level semantic features, our descriptor can effectively encode scene information. The proposed two-step global ICP helps eliminate the influence of rotation and translation on descriptor matching and provides a good initial value for geometric verification. Further, we built a complete loop-closure detection module based on SSC and combined it with the well-known LOAM to form a full LiDAR SLAM system. Exhaustive experiments on the KITTI and KITTI-360 datasets show that our approach is competitive with the state-of-the-art methods, robust to the environment, and has good generalization ability. Our code is available at: https://github.com/lilin-hitcrt/SSC.
    @article{li2022ssc,
    title = {Semantic Scan Context: A Novel Semantic-based Loop-closure Method for LiDAR SLAM},
    author = {Lin Li and Xin Kong and Xiangrui Zhao and Tianxin Huang and Yong Liu},
    year = 2022,
    journal = {Autonomous Robots},
    volume = {46},
    number = {4},
    pages = {535-551},
    doi = {10.1007/s10514-022-10037-w},
    abstract = {As one of the key technologies of SLAM, loop-closure detection can help eliminate the cumulative errors of the odometry. Many current LiDAR-based SLAM systems do not integrate a loop-closure detection module, so they inevitably suffer from cumulative errors. This paper proposes a semantic-based place recognition method called Semantic Scan Context (SSC), which consists of a two-step global ICP and a semantic-based descriptor. Thanks to the use of high-level semantic features, our descriptor can effectively encode scene information. The proposed two-step global ICP helps eliminate the influence of rotation and translation on descriptor matching and provides a good initial value for geometric verification. Further, we built a complete loop-closure detection module based on SSC and combined it with the well-known LOAM to form a full LiDAR SLAM system. Exhaustive experiments on the KITTI and KITTI-360 datasets show that our approach is competitive with the state-of-the-art methods, robust to the environment, and has good generalization ability. Our code is available at: https://github.com/lilin-hitcrt/SSC.}
    }
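    The descriptor itself is easy to picture: a polar (ring, sector) grid over the scan in which each cell stores a semantic label rather than a low-level statistic. A minimal sketch, assuming the label of the highest point wins each cell (the paper's encoding rules differ):
      import numpy as np

      def semantic_scan_context(points, labels, n_rings=20, n_sectors=60, r_max=50.0):
          """points: (N, 3) LiDAR scan; labels: (N,) semantic class per point."""
          r = np.hypot(points[:, 0], points[:, 1])
          keep = r < r_max
          theta = np.arctan2(points[keep, 1], points[keep, 0]) + np.pi  # (0, 2*pi]
          ring = np.minimum((r[keep] / r_max * n_rings).astype(int), n_rings - 1)
          sector = np.minimum((theta / (2 * np.pi) * n_sectors).astype(int), n_sectors - 1)
          lbl, z = labels[keep], points[keep, 2]
          desc = np.zeros((n_rings, n_sectors), dtype=np.int32)
          for i in np.argsort(z):  # ascending height, so the highest point wins a cell
              desc[ring[i], sector[i]] = lbl[i]
          return desc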
  • Lin Li, Xin Kong, Xiangrui Zhao, Tianxin Huang, Wanlong Li, Feng Wen, Hongbo Zhang, and Yong Liu. RINet: Efficient 3D Lidar-Based Place Recognition Using Rotation Invariant Neural Network. IEEE Robotics and Automation Letters (RA-L), 7(2):4321-4328, 2022.
    [BibTeX] [Abstract] [DOI] [PDF]
    LiDAR-based place recognition (LPR) is one of the basic capabilities of robots, which can retrieve scenes from maps and identify previously visited locations based on 3D point clouds. As robots often pass the same place from different views, LPR methods are supposed to be robust to rotation, which is lacking in most current learning-based approaches. In this letter, we propose a rotation invariant neural network structure that can detect reverse loop closures even when the training data are all in the same direction. Specifically, we design a novel rotation equivariant global descriptor, which combines semantic and geometric features to improve description ability. Then a rotation invariant siamese neural network is implemented to predict the similarity of descriptor pairs. Our network is lightweight and can operate at more than 8000 FPS on an i7-9700 CPU. Exhaustive evaluations and robustness tests on the KITTI, KITTI-360, and NCLT datasets show that our approach can work stably in various scenarios and achieves state-of-the-art performance.
    @article{li2022rinet,
    title = {RINet: Efficient 3D Lidar-Based Place Recognition Using Rotation Invariant Neural Network},
    author = {Lin Li and Xin Kong and Xiangrui Zhao and Tianxin Huang and Wanlong Li and Feng Wen and Hongbo Zhang and Yong Liu},
    year = 2022,
    journal = {IEEE Robotics and Automation Letters (RA-L)},
    volume = {7},
    number = {2},
    pages = {4321-4328},
    doi = {10.1109/LRA.2022.3150499},
    abstract = {LiDAR-based place recognition (LPR) is one of the basic capabilities of robots, which can retrieve scenes from maps and identify previously visited locations based on 3D point clouds. As robots often pass the same place from different views, LPR methods are supposed to be robust to rotation, which is lacking in most current learning-based approaches. In this letter, we propose a rotation invariant neural network structure that can detect reverse loop closures even when the training data are all in the same direction. Specifically, we design a novel rotation equivariant global descriptor, which combines semantic and geometric features to improve description ability. Then a rotation invariant siamese neural network is implemented to predict the similarity of descriptor pairs. Our network is lightweight and can operate at more than 8000 FPS on an i7-9700 CPU. Exhaustive evaluations and robustness tests on the KITTI, KITTI-360, and NCLT datasets show that our approach can work stably in various scenarios and achieves state-of-the-art performance.}
    }
  • Xiangrui Zhao, Sheng Yang, Tianxin Huang, Jun Chen, Teng Ma, Mingyang Li, and Yong Liu. SuperLine3D: Self-supervised 3D Line Segmentation and Description for LiDAR Point Cloud. In European Conference on Computer Vision (ECCV), 2022.
    [BibTeX] [Abstract] [DOI]
    Poles and building edges are frequently observable objects on urban roads, conveying reliable hints for various computer vision tasks. To repetitively extract them as features and perform association between discrete LiDAR frames for registration, we propose the first learning-based feature segmentation and description model for 3D lines in LiDAR point clouds. To train our model without the time-consuming and tedious data labeling process, we first generate synthetic primitives for the basic appearance of target lines, and build an iterative line auto-labeling process to gradually refine line labels on real LiDAR scans. Our segmentation model can extract lines under arbitrary scale perturbations, and we use shared EdgeConv encoder layers to train the segmentation and descriptor heads jointly. Based on this model, we can build a highly available global registration module for point cloud registration in conditions without initial transformation hints. Experiments have demonstrated that our line-based registration method is highly competitive with state-of-the-art point-based approaches. Our code is available at https://github.com/zxrzju/SuperLine3D.git.
    @inproceedings{zhao2022sls,
    title = {SuperLine3D: Self-supervised 3D Line Segmentation and Description for LiDAR Point Cloud},
    author = {Xiangrui Zhao and Sheng Yang and Tianxin Huang and Jun Chen and Teng Ma and Mingyang Li and Yong Liu},
    year = 2022,
    booktitle = {European Conference on Computer Vision (ECCV)},
    doi = {10.1007/978-3-031-20077-9_16},
    abstract = {Poles and building edges are frequently observable objects on urban roads, conveying reliable hints for various computer vision tasks. To repetitively extract them as features and perform association between discrete LiDAR frames for registration, we propose the first learning-based feature segmentation and description model for 3D lines in LiDAR point clouds. To train our model without the time-consuming and tedious data labeling process, we first generate synthetic primitives for the basic appearance of target lines, and build an iterative line auto-labeling process to gradually refine line labels on real LiDAR scans. Our segmentation model can extract lines under arbitrary scale perturbations, and we use shared EdgeConv encoder layers to train the segmentation and descriptor heads jointly. Based on this model, we can build a highly available global registration module for point cloud registration in conditions without initial transformation hints. Experiments have demonstrated that our line-based registration method is highly competitive with state-of-the-art point-based approaches. Our code is available at https://github.com/zxrzju/SuperLine3D.git.}
    }
  • Jianxin Huang, Laijian Li, Xiangrui Zhao, Xiaolei Lang, Deye Zhu, and Yong Liu. LODM: Large-scale Online Dense Mapping for UAV. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022.
    [BibTeX] [Abstract] [DOI] [PDF]
    This paper proposes a method for online large-scale dense mapping. The UAV operates at a range of 150-250 meters, combining GPS and visual odometry to estimate scaled poses and sparse points. To use the depth of the sparse points for dense depth estimation, we propose Sparse Confidence Cascade View-Aggregation MVSNet (SCCVA-MVSNet), which projects the depth-converged points in the sliding window onto keyframes to obtain a sparse depth map, with the photometric error providing a sparse confidence map. Normalized convolution then yields a coarse depth and confidence, and the images of all keyframes, the coarse depth, and the confidence serve as the input of CVA-MVSNet to extract features and construct 3D cost volumes with adaptive view aggregation, balancing the different stereo baselines between the keyframes. Because the network utilizes sparse feature point information, its output better maintains scale consistency. Our experiments show that MVSNet using sparse feature point information outperforms image-only MVSNet, and our online reconstruction results are comparable to offline reconstruction methods. To benefit the research community, we open-source our code at https://github.com/hjxwhy/LODM.git
    @inproceedings{huang2022lls,
    title = {LODM: Large-scale Online Dense Mapping for UAV},
    author = {Jianxin Huang and Laijian Li and Xiangrui Zhao and Xiaolei Lang and Deye Zhu and Yong Liu},
    year = 2022,
    booktitle = {2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    doi = {10.1109/IROS47612.2022.9981994},
    abstract = {This paper proposes a method for online large-scale dense mapping. The UAV operates at a range of 150-250 meters, combining GPS and visual odometry to estimate scaled poses and sparse points. To use the depth of the sparse points for dense depth estimation, we propose Sparse Confidence Cascade View-Aggregation MVSNet (SCCVA-MVSNet), which projects the depth-converged points in the sliding window onto keyframes to obtain a sparse depth map, with the photometric error providing a sparse confidence map. Normalized convolution then yields a coarse depth and confidence, and the images of all keyframes, the coarse depth, and the confidence serve as the input of CVA-MVSNet to extract features and construct 3D cost volumes with adaptive view aggregation, balancing the different stereo baselines between the keyframes. Because the network utilizes sparse feature point information, its output better maintains scale consistency. Our experiments show that MVSNet using sparse feature point information outperforms image-only MVSNet, and our online reconstruction results are comparable to offline reconstruction methods. To benefit the research community, we open-source our code at https://github.com/hjxwhy/LODM.git}
    }
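    The sparse-depth input step is plain pinhole projection: sliding-window points are cast into a keyframe and written into an otherwise empty depth image. A minimal sketch (the variable names are mine):
      import numpy as np

      def sparse_depth_map(points_w, T_cw, K, h, w):
          """points_w: (N, 3) world points; T_cw: (4, 4) world-to-camera
          transform; K: (3, 3) intrinsics. Returns an (h, w) depth image
          where 0 marks pixels without a measurement."""
          p = (T_cw[:3, :3] @ points_w.T + T_cw[:3, 3:4]).T  # camera frame
          p = p[p[:, 2] > 0.1]                               # keep points in front
          uvz = (K @ p.T).T
          u = (uvz[:, 0] / uvz[:, 2]).astype(int)
          v = (uvz[:, 1] / uvz[:, 2]).astype(int)
          depth = np.zeros((h, w), dtype=np.float32)
          ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
          depth[v[ok], u[ok]] = p[ok, 2]
          return depth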
  • Jiangning Zhang, Chao Xu, Xiangrui Zhao, Liang Liu, Yong Liu, Jinqiang Yao, and Zaisheng Pan. Learning hierarchical and efficient Person re-identification for robotic navigation. International Journal of Intelligent Robotics and Applications, 5:104–118, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    Recent works on the person re-identification task mainly focus on model accuracy while ignoring factors related to efficiency, e.g., model size and latency, which are critical for practical application. In this paper, we propose a novel Hierarchical and Efficient Network (HENet) that learns an ensemble of hierarchical global, partial, and recovery features under the supervision of multiple loss combinations. To further improve robustness against irregular occlusion, we propose a new dataset augmentation approach, dubbed random polygon erasing, which randomly erases an irregular area of the input image to imitate missing body parts. We also propose an Efficiency Score (ES) metric to evaluate model efficiency. Extensive experiments on the Market1501, DukeMTMC-ReID, and CUHK03 datasets show the efficiency and superiority of our approach compared with epoch-making methods. We further deploy HENet on a robotic car, and the experimental result demonstrates the effectiveness of our method for robotic navigation.
    @article{zhang2021lha,
    title = {Learning hierarchical and efficient Person re-identification for robotic navigation},
    author = {Jiangning Zhang and Chao Xu and Xiangrui Zhao and Liang Liu and Yong Liu and Jinqiang Yao and Zaisheng Pan},
    year = 2021,
    journal = {International Journal of Intelligent Robotics and Applications},
    volume = 5,
    pages = {104--118},
    doi = {10.1007/s41315-021-00167-2},
    issue = 2,
    abstract = {Recent works on the person re-identification task mainly focus on model accuracy while ignoring factors related to efficiency, e.g., model size and latency, which are critical for practical application. In this paper, we propose a novel Hierarchical and Efficient Network (HENet) that learns an ensemble of hierarchical global, partial, and recovery features under the supervision of multiple loss combinations. To further improve robustness against irregular occlusion, we propose a new dataset augmentation approach, dubbed random polygon erasing, which randomly erases an irregular area of the input image to imitate missing body parts. We also propose an Efficiency Score (ES) metric to evaluate model efficiency. Extensive experiments on the Market1501, DukeMTMC-ReID, and CUHK03 datasets show the efficiency and superiority of our approach compared with epoch-making methods. We further deploy HENet on a robotic car, and the experimental result demonstrates the effectiveness of our method for robotic navigation.}
    }
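    Random polygon erasing is straightforward to re-create: sample an irregular polygon and overwrite the pixels inside it. A toy re-implementation under assumed parameters (not the paper's exact settings):
      import numpy as np
      from matplotlib.path import Path

      def random_polygon_erase(img, n_vertices=6, max_frac=0.3, rng=None):
          """Erase a random irregular polygon region of an (H, W[, 3]) image."""
          rng = rng or np.random.default_rng()
          h, w = img.shape[:2]
          cx, cy = rng.uniform(0, w), rng.uniform(0, h)
          radius = rng.uniform(0.1, max_frac) * min(h, w)
          angles = np.sort(rng.uniform(0, 2 * np.pi, n_vertices))
          r = radius * rng.uniform(0.5, 1.0, n_vertices)  # jitter the outline
          verts = np.stack([cx + r * np.cos(angles), cy + r * np.sin(angles)], axis=1)
          yy, xx = np.mgrid[0:h, 0:w]
          pts = np.stack([xx.ravel(), yy.ravel()], axis=1)
          inside = Path(verts).contains_points(pts).reshape(h, w)
          out = img.copy()
          out[inside] = rng.integers(0, 256)              # random fill value
          return out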
  • Wenzhou Chen, Jinhong Xu, Xiangrui Zhao, Yong Liu, and Jian Yang. Separated Sonar Localization System for Indoor Robot Navigation. IEEE Transactions on Industrial Electronics, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    This work addresses the task of mobile robot localization for indoor navigation. In this paper, we propose a novel indoor localization system based on separated sonar sensors, which can be conveniently deployed in large-scale indoor environments. In our approach, the separated sonar receivers are deployed on the ceiling, and the mobile robot, equipped with the separated sonar transmitters, navigates in the indoor environment. The distance measurements between the receivers and the transmitters can be obtained in real time from the control board of the receivers with infrared synchronization. The positions of the mobile robot can be computed without accumulative error, and the proposed localization method can achieve high precision in indoor localization tasks at a very low cost. We also present a calibration method based on simultaneous localization and mapping (SLAM) to initialize the positions of our system. To evaluate the feasibility and the dynamic accuracy of the proposed system, we construct our localization system in the Virtual Robot Experimentation Platform (V-REP) simulation platform and deploy it in a real-world environment. Both the simulation and real-world experiments demonstrate that our system can achieve centimeter-level accuracy, which is sufficient for robot indoor navigation.
    @article{chen2021separatedsl,
    title = {Separated Sonar Localization System for Indoor Robot Navigation},
    author = {Wenzhou Chen and Jinhong Xu and Xiangrui Zhao and Yong Liu and Jian Yang},
    year = 2021,
    journal = {IEEE Transactions on Industrial Electronics},
    doi = {10.1109/TIE.2020.2994856},
    abstract = {This work addresses the task of mobile robot localization for indoor navigation. In this paper, we propose a novel indoor localization system based on separated sonar sensors, which can be conveniently deployed in large-scale indoor environments. In our approach, the separated sonar receivers are deployed on the ceiling, and the mobile robot, equipped with the separated sonar transmitters, navigates in the indoor environment. The distance measurements between the receivers and the transmitters can be obtained in real time from the control board of the receivers with infrared synchronization. The positions of the mobile robot can be computed without accumulative error, and the proposed localization method can achieve high precision in indoor localization tasks at a very low cost. We also present a calibration method based on simultaneous localization and mapping (SLAM) to initialize the positions of our system. To evaluate the feasibility and the dynamic accuracy of the proposed system, we construct our localization system in the Virtual Robot Experimentation Platform (V-REP) simulation platform and deploy it in a real-world environment. Both the simulation and real-world experiments demonstrate that our system can achieve centimeter-level accuracy, which is sufficient for robot indoor navigation.}
    }
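    Given ranges from the robot-mounted transmitter to several fixed ceiling receivers, the position follows from classical multilateration: subtracting one sphere equation from the others removes the quadratic term and leaves a linear least-squares problem. A minimal sketch:
      import numpy as np

      def multilaterate(receivers, dists):
          """receivers: (M, 3) known positions, M >= 4; dists: (M,) ranges.
          Returns the transmitter position minimizing the linearized residual."""
          p0, d0 = receivers[0], dists[0]
          A = 2 * (receivers[1:] - p0)
          b = (d0 ** 2 - dists[1:] ** 2
               + np.sum(receivers[1:] ** 2, axis=1) - np.sum(p0 ** 2))
          x, *_ = np.linalg.lstsq(A, b, rcond=None)
          return x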
  • Tianxin Huang, Hao Zou, Jinhao Cui, Xuemeng Yang, Mengmeng Wang, Xiangrui Zhao, Jiangning Zhang, Yi Yuan, Yifan Xu, and Yong Liu. RFNet: Recurrent Forward Network for Dense Point Cloud Completion. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 12488-12497, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    Point cloud completion is an interesting and challenging task in 3D vision, aiming to recover complete shapes from sparse and incomplete point clouds. Existing learning-based methods often require vast computation costs to achieve excellent performance, which limits their practical applications. In this paper, we propose a novel Recurrent Forward Network (RFNet), which is composed of three modules: Recurrent Feature Extraction (RFE), Forward Dense Completion (FDC), and Raw Shape Protection (RSP). The RFE extracts multiple global features from the incomplete point clouds for different recurrent levels, and the FDC generates point clouds in a coarse-to-fine pipeline. The RSP introduces details from the original incomplete models to refine the completion results. Besides, we propose a Sampling Chamfer Distance to better capture the shapes of models and a new Balanced Expansion Constraint to restrict the expansion distances from coarse to fine. According to the experiments on ShapeNet and KITTI, our network achieves the state of the art with lower memory cost and faster convergence.
    @inproceedings{huang2021rfnetrf,
    title = {RFNet: Recurrent Forward Network for Dense Point Cloud Completion},
    author = {Tianxin Huang and Hao Zou and Jinhao Cui and Xuemeng Yang and Mengmeng Wang and Xiangrui Zhao and Jiangning Zhang and Yi Yuan and Yifan Xu and Yong Liu},
    year = 2021,
    booktitle = {2021 IEEE/CVF International Conference on Computer Vision (ICCV)},
    pages = {12488-12497},
    doi = {10.1109/ICCV48922.2021.01228},
    abstract = {Point cloud completion is an interesting and challenging task in 3D vision, aiming to recover complete shapes from sparse and incomplete point clouds. Existing learning-based methods often require vast computation costs to achieve excellent performance, which limits their practical applications. In this paper, we propose a novel Recurrent Forward Network (RFNet), which is composed of three modules: Recurrent Feature Extraction (RFE), Forward Dense Completion (FDC), and Raw Shape Protection (RSP). The RFE extracts multiple global features from the incomplete point clouds for different recurrent levels, and the FDC generates point clouds in a coarse-to-fine pipeline. The RSP introduces details from the original incomplete models to refine the completion results. Besides, we propose a Sampling Chamfer Distance to better capture the shapes of models and a new Balanced Expansion Constraint to restrict the expansion distances from coarse to fine. According to the experiments on ShapeNet and KITTI, our network achieves the state of the art with lower memory cost and faster convergence.}
    }
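    For reference, the plain symmetric Chamfer distance that the proposed Sampling Chamfer Distance builds on fits in a few lines (a dense O(NM) version, fine for small clouds):
      import numpy as np

      def chamfer_distance(a, b):
          """Mean squared distance from each point set (N, 3)/(M, 3) to its
          nearest neighbor in the other set, summed over both directions."""
          d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)  # (N, M)
          return d2.min(axis=1).mean() + d2.min(axis=0).mean()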
  • Lin Li, Xin Kong, Xiangrui Zhao, Tianxin Huang, and Yong Liu. SSC: Semantic Scan Context for Large-Scale Place Recognition. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2092-2099, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    Place recognition gives a SLAM system the ability to correct cumulative errors. Unlike images that contain rich texture features, point clouds consist of almost purely geometric information, which makes place recognition based on point clouds challenging. Existing works usually encode low-level features such as coordinates, normals, reflection intensity, etc., as local or global descriptors to represent scenes, and they often ignore the translation between point clouds when matching descriptors. Different from most existing methods, we explore the use of high-level features, namely semantics, to improve the descriptor’s representation ability. Also, when matching descriptors, we try to correct the translation between point clouds to improve accuracy. Concretely, we propose a novel global descriptor, Semantic Scan Context, which explores semantic information to represent scenes more effectively. We also present a two-step global semantic ICP to obtain the 3D pose (x, y, yaw) used to align the point clouds and improve matching performance. Our experiments on the KITTI dataset show that our approach outperforms the state-of-the-art methods by a large margin. Our code is available at: https://github.com/lilin-hitcrt/SSC.
    @inproceedings{li2021ssc,
    title = {SSC: Semantic Scan Context for Large-Scale Place Recognition},
    author = {Lin Li and Xin Kong and Xiangrui Zhao and Tianxin Huang and Yong Liu},
    year = 2021,
    booktitle = {2021 IEEE/RSJ International Conference on Intelligent Robots and Systems},
    pages = {2092-2099},
    doi = {10.1109/IROS51168.2021.9635904},
    abstract = {Place recognition gives a SLAM system the ability to correct cumulative errors. Unlike images that contain rich texture features, point clouds consist of almost purely geometric information, which makes place recognition based on point clouds challenging. Existing works usually encode low-level features such as coordinates, normals, reflection intensity, etc., as local or global descriptors to represent scenes, and they often ignore the translation between point clouds when matching descriptors. Different from most existing methods, we explore the use of high-level features, namely semantics, to improve the descriptor’s representation ability. Also, when matching descriptors, we try to correct the translation between point clouds to improve accuracy. Concretely, we propose a novel global descriptor, Semantic Scan Context, which explores semantic information to represent scenes more effectively. We also present a two-step global semantic ICP to obtain the 3D pose (x, y, yaw) used to align the point clouds and improve matching performance. Our experiments on the KITTI dataset show that our approach outperforms the state-of-the-art methods by a large margin. Our code is available at: https://github.com/lilin-hitcrt/SSC.}
    }
  • Jinhao Cui, Hao Zou, Xin Kong, Xuemeng Yang, Xiangrui Zhao, Yong Liu, Wanlong Li, Feng Wen, and Hongbo Zhang. PocoNet: SLAM-oriented 3D LiDAR Point Cloud Online Compression Network. In 2021 IEEE International Conference on Robotics and Automation, pages 1868-1874, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    In this paper, we present PocoNet: Point cloud Online COmpression NETwork, to address the task of SLAM-oriented compression. The aim of this task is to select a compact subset of points with high priority to maintain localization accuracy. The key insight is that points with high priority have similar geometric features in SLAM scenarios. Hence, we tackle this task as point cloud segmentation to capture complex geometric information. We calculate observation counts by matching between maps and point clouds and divide them into different priority levels. Trained by labels annotated with such observation counts, the proposed network can evaluate point-wise priority. Experiments are conducted by integrating our compression module into an existing SLAM system to evaluate compression ratios and localization performance. Experimental results on two different datasets verify the feasibility and generalization of our approach.
    @inproceedings{cui2021poconetso,
    title = {PocoNet: SLAM-oriented 3D LiDAR Point Cloud Online Compression Network},
    author = {Jinhao Cui and Hao Zou and Xin Kong and Xuemeng Yang and Xiangrui Zhao and Yong Liu and Wanlong Li and Feng Wen and Hongbo Zhang},
    year = 2021,
    booktitle = {2021 IEEE International Conference on Robotics and Automation},
    pages = {1868-1874},
    doi = {10.1109/ICRA48506.2021.9561309},
    abstract = {In this paper, we present PocoNet: Point cloud Online COmpression NETwork, to address the task of SLAM-oriented compression. The aim of this task is to select a compact subset of points with high priority to maintain localization accuracy. The key insight is that points with high priority have similar geometric features in SLAM scenarios. Hence, we tackle this task as point cloud segmentation to capture complex geometric information. We calculate observation counts by matching between maps and point clouds and divide them into different priority levels. Trained by labels annotated with such observation counts, the proposed network can evaluate point-wise priority. Experiments are conducted by integrating our compression module into an existing SLAM system to evaluate compression ratios and localization performance. Experimental results on two different datasets verify the feasibility and generalization of our approach.}
    }
  • Lin Li, Xin Kong, Xiangrui Zhao, and Yong Liu. SA-LOAM: Semantic-aided LiDAR SLAM with Loop Closure. In 2021 IEEE International Conference on Robotics and Automation, pages 7627-7634, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    LiDAR-based SLAM systems are admittedly more accurate and stable than others, while loop closure detection is still an open issue. With the development of 3D semantic segmentation for point clouds, semantic information can be obtained conveniently and steadily, which is essential for high-level intelligence and conducive to SLAM. In this paper, we present a novel semantic-aided LiDAR SLAM with loop closure based on LOAM, named SA-LOAM, which leverages semantics in odometry as well as loop closure detection. Specifically, we propose a semantic-assisted ICP, including semantic matching, downsampling, and plane constraints, and integrate a semantic graph-based place recognition method in our loop closure detection module. Benefiting from semantics, we can improve localization accuracy, detect loop closures effectively, and construct a globally consistent semantic map even in large-scale scenes. Extensive experiments on the KITTI and Ford Campus datasets show that our system significantly improves baseline performance, generalizes to unseen data, and achieves competitive results compared with state-of-the-art methods.
    @inproceedings{li2021ssa,
    title = {SA-LOAM: Semantic-aided LiDAR SLAM with Loop Closure},
    author = {Lin Li and Xin Kong and Xiangrui Zhao and Yong Liu},
    year = 2021,
    booktitle = {2021 IEEE International Conference on Robotics and Automation},
    pages = {7627-7634},
    doi = {10.1109/ICRA48506.2021.9560884},
    abstract = {LiDAR-based SLAM systems are admittedly more accurate and stable than others, while loop closure detection is still an open issue. With the development of 3D semantic segmentation for point clouds, semantic information can be obtained conveniently and steadily, which is essential for high-level intelligence and conducive to SLAM. In this paper, we present a novel semantic-aided LiDAR SLAM with loop closure based on LOAM, named SA-LOAM, which leverages semantics in odometry as well as loop closure detection. Specifically, we propose a semantic-assisted ICP, including semantic matching, downsampling, and plane constraints, and integrate a semantic graph-based place recognition method in our loop closure detection module. Benefiting from semantics, we can improve localization accuracy, detect loop closures effectively, and construct a globally consistent semantic map even in large-scale scenes. Extensive experiments on the KITTI and Ford Campus datasets show that our system significantly improves baseline performance, generalizes to unseen data, and achieves competitive results compared with state-of-the-art methods.}
    }
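    The "semantic matching" ingredient can be approximated by restricting nearest-neighbor association to points with identical labels, e.g. one KD-tree per class in the target cloud. A sketch of that idea alone (the actual SA-LOAM ICP adds downsampling and plane constraints on top):
      import numpy as np
      from scipy.spatial import cKDTree

      def semantic_matches(src, src_lbl, dst, dst_lbl, max_dist=1.0):
          """Return (i, j) index pairs matching src[i] to dst[j] within the
          same semantic class and within max_dist meters."""
          pairs = []
          for c in np.unique(src_lbl):
              dst_idx = np.flatnonzero(dst_lbl == c)
              if dst_idx.size == 0:
                  continue
              tree = cKDTree(dst[dst_idx])
              src_idx = np.flatnonzero(src_lbl == c)
              d, j = tree.query(src[src_idx], distance_upper_bound=max_dist)
              ok = np.isfinite(d)  # query marks misses with inf distance
              pairs.extend(zip(src_idx[ok], dst_idx[j[ok]]))
          return np.array(pairs)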
  • Xin Kong, Xuemeng Yang, Guangyao Zhai, Xiangrui Zhao, Xianfang Zeng, Mengmeng Wang, Yong Liu, Wanlong Li, and Feng Wen. Semantic Graph Based Place Recognition for 3D Point Clouds. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 8216–8223, 2020.
    [BibTeX] [Abstract] [DOI] [arXiv] [PDF]
    Due to the difficulty of generating effective descriptors that are robust to occlusion and viewpoint changes, place recognition for 3D point clouds remains an open issue. Unlike most existing methods that focus on extracting local, global, and statistical features of raw point clouds, our method aims at the semantic level, which can be superior in terms of robustness to environmental changes. Inspired by the perspective of humans, who recognize scenes through identifying semantic objects and capturing their relations, this paper presents a novel semantic graph based approach for place recognition. First, we propose a novel semantic graph representation for point cloud scenes by preserving the semantic and topological information of the raw point cloud. Thus, place recognition is modeled as a graph matching problem. Then we design a fast and effective graph similarity network to compute the similarity. Exhaustive evaluations on the KITTI dataset show that our approach is robust to occlusion as well as viewpoint changes and outperforms the state-of-the-art methods by a large margin. Our code is available at: https://github.com/kxhit/SG_PR.
    @inproceedings{kong2020semanticgb,
    title = {Semantic Graph Based Place Recognition for 3D Point Clouds},
    author = {Xin Kong and Xuemeng Yang and Guangyao Zhai and Xiangrui Zhao and Xianfang Zeng and Mengmeng Wang and Yong Liu and Wanlong Li and Feng Wen},
    year = 2020,
    booktitle = {2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    pages = {8216--8223},
    doi = {10.1109/IROS45743.2020.9341060},
    abstract = {Due to the difficulty of generating effective descriptors that are robust to occlusion and viewpoint changes, place recognition for 3D point clouds remains an open issue. Unlike most existing methods that focus on extracting local, global, and statistical features of raw point clouds, our method aims at the semantic level, which can be superior in terms of robustness to environmental changes. Inspired by the perspective of humans, who recognize scenes through identifying semantic objects and capturing their relations, this paper presents a novel semantic graph based approach for place recognition. First, we propose a novel semantic graph representation for point cloud scenes by preserving the semantic and topological information of the raw point cloud. Thus, place recognition is modeled as a graph matching problem. Then we design a fast and effective graph similarity network to compute the similarity. Exhaustive evaluations on the KITTI dataset show that our approach is robust to occlusion as well as viewpoint changes and outperforms the state-of-the-art methods by a large margin. Our code is available at: https://github.com/kxhit/SG_PR.},
    arxiv = {https://arxiv.org/pdf/2008.11459.pdf}
    }
  • Xiangrui Zhao, Lina Liu, Renjie Zheng, Wenlong Ye, and Yong Liu. A Robust Stereo Feature-aided Semi-direct SLAM System. Robotics and Autonomous Systems, 132:103597, 2020.
    [BibTeX] [Abstract] [DOI] [PDF]
    In autonomous driving, many intelligent perception technologies have been put into use. However, although visual SLAM has been developed for a long time, it still has problems with robustness, which limits its application. We propose a feature-aided semi-direct approach that combines the direct and indirect methods in visual SLAM to allow robust localization under various situations, including large-baseline motion, textureless environments, and great illumination changes. In our approach, we first calculate the inter-frame pose estimate by feature matching. Then we use direct alignment and a multi-scale pyramid, which employs the previous coarse estimate as a prior, to obtain a more precise result. To get more accurate photometric parameters, we combine the online photometric calibration method with visual odometry. Furthermore, we replace the Shi–Tomasi corner with the ORB feature, which is more robust to illumination. For extreme brightness changes, we employ the dark channel prior to weaken halation and maintain the consistency of the image. To evaluate our approach, we build a full stereo visual SLAM system. Experiments on a publicly available dataset and our mobile robot dataset indicate that our approach improves the accuracy and robustness of the SLAM system.
    @article{zhao2020ars,
    title = {A Robust Stereo Feature-aided Semi-direct SLAM System},
    author = {Xiangrui Zhao and Lina Liu and Renjie Zheng and Wenlong Ye and Yong Liu},
    year = 2020,
    journal = {Robotics and Autonomous Systems},
    volume = 132,
    pages = 103597,
    doi = {10.1016/j.robot.2020.103597},
    abstract = {In autonomous driving, many intelligent perception technologies have been put into use. However, although visual SLAM has been developed for a long time, it still has problems with robustness, which limits its application. We propose a feature-aided semi-direct approach that combines the direct and indirect methods in visual SLAM to allow robust localization under various situations, including large-baseline motion, textureless environments, and great illumination changes. In our approach, we first calculate the inter-frame pose estimate by feature matching. Then we use direct alignment and a multi-scale pyramid, which employs the previous coarse estimate as a prior, to obtain a more precise result. To get more accurate photometric parameters, we combine the online photometric calibration method with visual odometry. Furthermore, we replace the Shi–Tomasi corner with the ORB feature, which is more robust to illumination. For extreme brightness changes, we employ the dark channel prior to weaken halation and maintain the consistency of the image. To evaluate our approach, we build a full stereo visual SLAM system. Experiments on a publicly available dataset and our mobile robot dataset indicate that our approach improves the accuracy and robustness of the SLAM system.}
    }
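    The dark channel prior used above against halation is compact enough to state directly: the per-pixel channel minimum followed by a local minimum filter. A minimal sketch:
      import numpy as np
      from scipy.ndimage import minimum_filter

      def dark_channel(img, patch=15):
          """Dark channel of an (H, W, 3) image: channel-wise minimum, then
          a patch x patch local minimum."""
          return minimum_filter(img.min(axis=2), size=patch)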
  • Xiangrui Zhao, Chunfang Deng, Xin Kong, Jinhong Xu, and Yong Liu. Learning to Compensate for the Drift and Error of Gyroscope in Vehicle Localization. In 2020 IEEE Intelligent Vehicles Symposium (IV), pages 852–857, 2020.
    [BibTeX] [Abstract] [DOI] [PDF]
    Self-localization is an essential technology for autonomous vehicles. Building robust odometry in a GPS-denied environment is still challenging, especially when LiDAR and camera are uninformative. In this paper, we propose a learning-based approach to compensate for the drift of the gyroscope in vehicle localization. For a consumer-level MEMS gyroscope (stability ∼10°/h), our GyroNet can estimate the error of each measurement. For a high-precision Fiber Optics Gyroscope (stability ∼0.05°/h), we build a FoGNet, which can obtain the drift by observing data over a long time window. We perform comparative experiments on publicly available datasets. The results demonstrate that our GyroNet can obtain higher-precision angular velocity than traditional digital filters and static initialization methods. In vehicle localization, FoGNet can effectively correct the small drift of the Fiber Optics Gyroscope (FoG) and achieves better results than the state-of-the-art method.
    @inproceedings{zhao2020learningtc,
    title = {Learning to Compensate for the Drift and Error of Gyroscope in Vehicle Localization},
    author = {Xiangrui Zhao and Chunfang Deng and Xin Kong and Jinhong Xu and Yong Liu},
    year = 2020,
    booktitle = {2020 IEEE Intelligent Vehicles Symposium (IV)},
    pages = {852--857},
    doi = {10.1109/IV47402.2020.9304715},
    abstract = {Self-localization is an essential technology for autonomous vehicles. Building robust odometry in a GPS-denied environment is still challenging, especially when LiDAR and camera are uninformative. In this paper, we propose a learning-based approach to compensate for the drift of the gyroscope in vehicle localization. For a consumer-level MEMS gyroscope (stability ∼10°/h), our GyroNet can estimate the error of each measurement. For a high-precision Fiber Optics Gyroscope (stability ∼0.05°/h), we build a FoGNet, which can obtain the drift by observing data over a long time window. We perform comparative experiments on publicly available datasets. The results demonstrate that our GyroNet can obtain higher-precision angular velocity than traditional digital filters and static initialization methods. In vehicle localization, FoGNet can effectively correct the small drift of the Fiber Optics Gyroscope (FoG) and achieves better results than the state-of-the-art method.}
    }
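    For contrast with the learned correction, the static-initialization baseline mentioned in the abstract averages the gyroscope output while the vehicle is known to be stationary and subtracts that constant bias; GyroNet instead predicts a per-measurement error. A sketch of the baseline only:
      import numpy as np

      def static_bias_correct(gyro, still_mask):
          """gyro: (T, 3) angular rates in rad/s; still_mask: (T,) bool flags
          marking stationary samples. Returns bias-corrected rates."""
          bias = gyro[still_mask].mean(axis=0)  # constant-bias assumption
          return gyro - bias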
  • Xiangrui Zhao, Renjie Zheng, Wenlong Ye, Yong Liu, and Mingyang Li. A Robust Stereo Semi-direct SLAM System Based on Hybrid Pyramid. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5376–5382, 2019.
    [BibTeX] [Abstract] [DOI] [PDF]
    We propose a hybrid pyramid based approach to fuse the direct and indirect methods in visual SLAM, allowing robust localization under various situations including large-baseline motion, low-texture environments, and various illumination changes. In our approach, we first calculate a coarse inter-frame pose estimate by matching feature points. Subsequently, we use both direct image alignment and a multiscale pyramid method to refine the previous estimate and attain better precision. Furthermore, we perform online photometric calibration along with pose estimation to reduce unmodelled errors. To evaluate our approach, we conducted various real-world experiments on both public datasets and self-collected ones, implementing a full SLAM system with the proposed methods. The results show that our system improves both localization accuracy and robustness by a wide margin.
    @inproceedings{zhao2019ars,
    title = {A Robust Stereo Semi-direct SLAM System Based on Hybrid Pyramid},
    author = {Xiangrui Zhao and Renjie Zheng and Wenlong Ye and Yong Liu and Mingyang Li},
    year = 2019,
    booktitle = {2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    pages = {5376--5382},
    doi = {10.1109/IROS40897.2019.8968008},
    abstract = {We propose a hybrid pyramid based approach to fuse the direct and indirect methods in visual SLAM, allowing robust localization under various situations including large-baseline motion, low-texture environments, and various illumination changes. In our approach, we first calculate a coarse inter-frame pose estimate by matching feature points. Subsequently, we use both direct image alignment and a multiscale pyramid method to refine the previous estimate and attain better precision. Furthermore, we perform online photometric calibration along with pose estimation to reduce unmodelled errors. To evaluate our approach, we conducted various real-world experiments on both public datasets and self-collected ones, implementing a full SLAM system with the proposed methods. The results show that our system improves both localization accuracy and robustness by a wide margin.}
    }
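    The hybrid pyramid follows the generic coarse-to-fine pattern: run the direct photometric alignment from the smallest image level upward, seeding each level with the previous estimate and seeding the whole pyramid with the feature-based guess. A skeleton with align() left as a placeholder for the photometric optimizer (this is the general pattern, not the paper's implementation):
      import numpy as np
      from scipy.ndimage import zoom

      def coarse_to_fine(align, ref, cur, pose0, levels=4):
          """align(ref_img, cur_img, pose) -> refined pose; pose0 is e.g. the
          feature-based inter-frame estimate used to seed the pyramid."""
          pose = pose0
          for lvl in reversed(range(levels)):  # coarsest level first
              s = 0.5 ** lvl
              ref_s = zoom(ref, s, order=1)
              cur_s = zoom(cur, s, order=1)
              pose = align(ref_s, cur_s, pose)
          return pose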