Address

Room 105, Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Email: yukaima@zju.edu.cn

Yukai Ma

PhD Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Biography

I am pursuing my Ph.D. degree in the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. My main research interests include machine learning in sensor fusion and SLAM.

Research and Interests

  • Sensor Fusion
  • Machine Learning in Sensor Fusion
  • SLAM

Publications

  • Jianbiao Mei, Yu Yang, Mengmeng Wang, Junyu Zhu, Jongwon Ra, Yukai Ma, Laijian Li, and Yong Liu. Camera-Based 3D Semantic Scene Completion With Sparse Guidance Network. IEEE Transactions on Image Processing, 33:5468-5481, 2024.
    [BibTeX] [Abstract] [DOI] [PDF]
    Semantic scene completion (SSC) aims to predict the semantic occupancy of each voxel in the entire 3D scene from limited observations, which is an emerging and critical task for autonomous driving. Recently, many studies have turned to camera-based SSC solutions due to the richer visual cues and cost-effectiveness of cameras. However, existing methods usually rely on sophisticated and heavy 3D models to process the lifted 3D features directly, which are not discriminative enough for clear segmentation boundaries. In this paper, we adopt the dense-sparse-dense design and propose a one-stage camera-based SSC framework, termed SGN, to propagate semantics from the semantic-aware seed voxels to the whole scene based on spatial geometry cues. Firstly, to exploit depth-aware context and dynamically select sparse seed voxels, we redesign the sparse voxel proposal network to process points generated by depth prediction directly with the coarse-to-fine paradigm. Furthermore, by designing hybrid guidance (sparse semantic and geometry guidance) and effective voxel aggregation for spatial geometry cues, we enhance the feature separation between different categories and expedite the convergence of semantic propagation. Finally, we devise the multi-scale semantic propagation module for flexible receptive fields while reducing computational resources. Extensive experimental results on the SemanticKITTI and SSCBench-KITTI-360 datasets demonstrate the superiority of our SGN over existing state-of-the-art methods. Even our lightweight version SGN-L achieves notable scores of 14.80% mIoU and 45.45% IoU on SemanticKITTI validation with only 12.5 M parameters and 7.16 G training memory. Code is available at https://github.com/Jieqianyu/SGN.
    @article{mei2024cbs,
    title = {Camera-Based 3D Semantic Scene Completion With Sparse Guidance Network},
    author = {Jianbiao Mei and Yu Yang and Mengmeng Wang and Junyu Zhu and Jongwon Ra and Yukai Ma and Laijian Li and Yong Liu},
    year = 2024,
    journal = {IEEE Transactions on Image Processing},
    volume = 33,
    pages = {5468-5481},
    doi = {10.1109/TIP.2024.3461989},
    abstract = {Semantic scene completion (SSC) aims to predict the semantic occupancy of each voxel in the entire 3D scene from limited observations, which is an emerging and critical task for autonomous driving. Recently, many studies have turned to camera-based SSC solutions due to the richer visual cues and cost-effectiveness of cameras. However, existing methods usually rely on sophisticated and heavy 3D models to process the lifted 3D features directly, which are not discriminative enough for clear segmentation boundaries. In this paper, we adopt the dense-sparse-dense design and propose a one-stage camera-based SSC framework, termed SGN, to propagate semantics from the semantic-aware seed voxels to the whole scene based on spatial geometry cues. Firstly, to exploit depth-aware context and dynamically select sparse seed voxels, we redesign the sparse voxel proposal network to process points generated by depth prediction directly with the coarse-to-fine paradigm. Furthermore, by designing hybrid guidance (sparse semantic and geometry guidance) and effective voxel aggregation for spatial geometry cues, we enhance the feature separation between different categories and expedite the convergence of semantic propagation. Finally, we devise the multi-scale semantic propagation module for flexible receptive fields while reducing computational resources. Extensive experimental results on the SemanticKITTI and SSCBench-KITTI-360 datasets demonstrate the superiority of our SGN over existing state-of-the-art methods. Even our lightweight version SGN-L achieves notable scores of 14.80% mIoU and 45.45% IoU on SemanticKITTI validation with only 12.5 M parameters and 7.16 G training memory. Code is available at https://github.com/Jieqianyu/SGN.}
    }
  • Han Li, Yukai Ma, Yuehao Huang, Yaqing Gu, Weihua Xu, Yong Liu, and Xingxing Zuo. RIDERS: Radar-Infrared Depth Estimation for Robust Sensing. IEEE Transactions on Intelligent Transportation Systems, 2024.
    [BibTeX] [Abstract] [DOI]
    Dense depth recovery is crucial in autonomous driving, serving as a foundational element for obstacle avoidance, 3D object detection, and local path planning. Adverse weather conditions, including haze, dust, rain, snow, and darkness, introduce significant challenges to accurate dense depth estimation, thereby posing substantial safety risks in autonomous driving. These challenges are particularly pronounced for traditional depth estimation methods that rely on short electromagnetic wave sensors, such as visible spectrum cameras and near-infrared LiDAR, due to their susceptibility to diffraction noise and occlusion in such environments. To fundamentally overcome this issue, we present a novel approach for robust metric depth estimation by fusing a millimeter-wave radar and a monocular infrared thermal camera, which are capable of penetrating atmospheric particles and are unaffected by lighting conditions. Our proposed Radar-Infrared fusion method achieves highly accurate and finely detailed dense depth estimation through three stages, including monocular depth prediction with global scale alignment, quasi-dense radar augmentation by learning radar-pixel correspondences, and local scale refinement of dense depth using a scale map learner. Our method achieves exceptional visual quality and accurate metric estimation by addressing the challenges of ambiguity and misalignment that arise from directly fusing multi-modal long-wave features. We evaluate the performance of our approach on the NTU4DRadLM dataset and our self-collected challenging ZJU-Multispectrum dataset. Especially noteworthy is the unprecedented robustness demonstrated by our proposed method in smoky scenarios.
    @article{li2024riders,
    title = {RIDERS: Radar-Infrared Depth Estimation for Robust Sensing},
    author = {Han Li and Yukai Ma and Yuehao Huang and Yaqing Gu and Weihua Xu and Yong Liu and Xingxing Zuo},
    year = 2024,
    journal = {IEEE Transactions on Intelligent Transportation Systems},
    doi = {10.1109/TITS.2024.3432996},
    abstract = {Dense depth recovery is crucial in autonomous driving, serving as a foundational element for obstacle avoidance, 3D object detection, and local path planning. Adverse weather conditions, including haze, dust, rain, snow, and darkness, introduce significant challenges to accurate dense depth estimation, thereby posing substantial safety risks in autonomous driving. These challenges are particularly pronounced for traditional depth estimation methods that rely on short electromagnetic wave sensors, such as visible spectrum cameras and near-infrared LiDAR, due to their susceptibility to diffraction noise and occlusion in such environments. To fundamentally overcome this issue, we present a novel approach for robust metric depth estimation by fusing a millimeter-wave radar and a monocular infrared thermal camera, which are capable of penetrating atmospheric particles and are unaffected by lighting conditions. Our proposed Radar-Infrared fusion method achieves highly accurate and finely detailed dense depth estimation through three stages, including monocular depth prediction with global scale alignment, quasi-dense radar augmentation by learning radar-pixel correspondences, and local scale refinement of dense depth using a scale map learner. Our method achieves exceptional visual quality and accurate metric estimation by addressing the challenges of ambiguity and misalignment that arise from directly fusing multi-modal long-wave features. We evaluate the performance of our approach on the NTU4DRadLM dataset and our self-collected challenging ZJU-Multispectrum dataset. Especially noteworthy is the unprecedented robustness demonstrated by our proposed method in smoky scenarios.}
    }
  • Yukai Ma, Han Li, Xiangrui Zhao, Yaqing Gu, Xiaolei Lang, Laijian Li, and Yong Liu. FMCW Radar on LiDAR Map Localization in Structural Urban Environments. Journal of Field Robotics, 41:699-717, 2024.
    [BibTeX] [Abstract] [DOI] [PDF]
    Multisensor fusion-based localization technology has achieved high accuracy in autonomous systems. How to improve the robustness is the main challenge at present. The most commonly used LiDAR and camera are weather-sensitive, while the frequency-modulated continuous wave Radar has strong adaptability but suffers from noise and ghost effects. In this paper, we propose a heterogeneous localization method called Radar on LiDAR Map, which aims to enhance localization accuracy without relying on loop closures by mitigating the accumulated error in Radar odometry in real time. To accomplish this, we utilize LiDAR scans and ground truth paths as Teach paths and Radar scans as the trajectories to be estimated, referred to as Repeat paths. By establishing a correlation between the Radar and LiDAR scan data, we can enhance the accuracy of Radar odometry estimation. Our approach involves embedding the data from both Radar and LiDAR sensors into a density map. We calculate the spatial vector similarity with an offset to determine the corresponding place index within the candidate map and estimate the rotation and translation. To refine the alignment, we utilize the Iterative Closest Point algorithm to achieve optimal matching on the LiDAR submap. The estimated bias is subsequently incorporated into the Radar SLAM for optimizing the position map. We conducted extensive experiments on the MulRan Radar Dataset, Oxford Radar RobotCar Dataset, and our dataset to demonstrate the feasibility and effectiveness of our proposed approach. Our proposed scan projection descriptor achieves homogeneous and heterogeneous place recognition and works much better than existing methods. Its application to the Radar SLAM system also substantially improves the positioning accuracy. All sequences’ root mean square error is 2.53 m for positioning and 1.83 degrees for angle.
    @article{ma2024fmcw,
    title = {FMCW Radar on LiDAR Map Localization in Structural Urban Environments},
    author = {Yukai Ma and Han Li and Xiangrui Zhao and Yaqing Gu and Xiaolei Lang and Laijian Li and Yong Liu},
    year = 2024,
    journal = {Journal of Field Robotics},
    volume = 41,
    pages = {699-717},
    doi = {10.1002/rob.22291},
    abstract = {Multisensor fusion-based localization technology has achieved high accuracy in autonomous systems. How to improve the robustness is the main challenge at present. The most commonly used LiDAR and camera are weather-sensitive, while the frequency-modulated continuous wave Radar has strong adaptability but suffers from noise and ghost effects. In this paper, we propose a heterogeneous localization method called Radar on LiDAR Map, which aims to enhance localization accuracy without relying on loop closures by mitigating the accumulated error in Radar odometry in real time. To accomplish this, we utilize LiDAR scans and ground truth paths as Teach paths and Radar scans as the trajectories to be estimated, referred to as Repeat paths. By establishing a correlation between the Radar and LiDAR scan data, we can enhance the accuracy of Radar odometry estimation. Our approach involves embedding the data from both Radar and LiDAR sensors into a density map. We calculate the spatial vector similarity with an offset to determine the corresponding place index within the candidate map and estimate the rotation and translation. To refine the alignment, we utilize the Iterative Closest Point algorithm to achieve optimal matching on the LiDAR submap. The estimated bias is subsequently incorporated into the Radar SLAM for optimizing the position map. We conducted extensive experiments on the MulRan Radar Dataset, Oxford Radar RobotCar Dataset, and our dataset to demonstrate the feasibility and effectiveness of our proposed approach. Our proposed scan projection descriptor achieves homogeneous and heterogeneous place recognition and works much better than existing methods. Its application to the Radar SLAM system also substantially improves the positioning accuracy. All sequences' root mean square error is 2.53 m for positioning and 1.83 degrees for angle.}
    }
  • Han Li, Yukai Ma, Yaqing Gu, Kewei Hu, Yong Liu, and Xingxing Zuo. RadarCam-Depth: Radar-Camera Fusion for Depth Estimation with Learned Metric Scale. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 10665-10672, 2024.
    [BibTeX] [Abstract] [DOI] [PDF]
    We present a novel approach for metric dense depth estimation based on the fusion of a single-view image and a sparse, noisy Radar point cloud. The direct fusion of heterogeneous Radar and image data, or their encodings, tends to yield dense depth maps with significant artifacts, blurred boundaries, and suboptimal accuracy. To circumvent this issue, we learn to augment versatile and robust monocular depth prediction with the dense metric scale induced from sparse and noisy Radar data. We propose a Radar-Camera framework for highly accurate and fine-detailed dense depth estimation with four stages, including monocular depth prediction, global scale alignment of monocular depth with sparse Radar points, quasi-dense scale estimation through learning the association between Radar points and image patches, and local scale refinement of dense depth using a scale map learner. Our proposed method significantly outperforms the state-of-the-art Radar-Camera depth estimation methods by reducing the mean absolute error (MAE) of depth estimation by 25.6% and 40.2% on the challenging nuScenes dataset and our self-collected ZJU-4DRadarCam dataset, respectively. Our code and dataset will be released at https://github.com/MMOCKING/RadarCam-Depth.
    @inproceedings{li2024rcd,
    title = {RadarCam-Depth: Radar-Camera Fusion for Depth Estimation with Learned Metric Scale},
    author = {Han Li and Yukai Ma and Yaqing Gu and Kewei Hu and Yong Liu and Xingxing Zuo},
    year = 2024,
    booktitle = {2024 IEEE International Conference on Robotics and Automation (ICRA)},
    pages = {10665-10672},
    doi = {10.1109/ICRA57147.2024.10610929},
    abstract = {We present a novel approach for metric dense depth estimation based on the fusion of a single-view image and a sparse, noisy Radar point cloud. The direct fusion of heterogeneous Radar and image data, or their encodings, tends to yield dense depth maps with significant artifacts, blurred boundaries, and suboptimal accuracy. To circumvent this issue, we learn to augment versatile and robust monocular depth prediction with the dense metric scale induced from sparse and noisy Radar data. We propose a Radar-Camera framework for highly accurate and fine-detailed dense depth estimation with four stages, including monocular depth prediction, global scale alignment of monocular depth with sparse Radar points, quasi-dense scale estimation through learning the association between Radar points and image patches, and local scale refinement of dense depth using a scale map learner. Our proposed method significantly outperforms the state-of-the-art Radar-Camera depth estimation methods by reducing the mean absolute error (MAE) of depth estimation by 25.6% and 40.2% on the challenging nuScenes dataset and our self-collected ZJU-4DRadarCam dataset, respectively. Our code and dataset will be released at https://github.com/MMOCKING/RadarCam-Depth.}
    }
  • Chencan Fu, Lin Li, Jianbiao Mei, Yukai Ma, Linpeng Peng, Xiangrui Zhao, and Yong Liu. A Coarse-to-Fine Place Recognition Approach using Attention-guided Descriptors and Overlap Estimation. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 8493-8499, 2024.
    [BibTeX] [Abstract] [DOI] [PDF]
    Place recognition is a challenging but crucial task in robotics. Current description-based methods may be limited by representation capabilities, while pairwise similarity-based methods require exhaustive searches, which is time-consuming. In this paper, we present a novel coarse-to-fine approach to address these problems, which combines BEV (Bird’s Eye View) feature extraction, coarse-grained matching and fine-grained verification. In the coarse stage, our approach utilizes an attention-guided network to generate attention-guided descriptors. We then employ a fast affinity-based candidate selection process to identify the Top-K most similar candidates. In the fine stage, we estimate pairwise overlap among the narrowed-down place candidates to determine the final match. Experimental results on the KITTI and KITTI-360 datasets demonstrate that our approach outperforms state-of-the-art methods. The code will be released publicly soon.
    @inproceedings{fu2024ctf,
    title = {A Coarse-to-Fine Place Recognition Approach using Attention-guided Descriptors and Overlap Estimation},
    author = {Chencan Fu and Lin Li and Jianbiao Mei and Yukai Ma and Linpeng Peng and Xiangrui Zhao and Yong Liu},
    year = 2024,
    booktitle = {2024 IEEE International Conference on Robotics and Automation (ICRA)},
    pages = {8493-8499},
    doi = {10.1109/ICRA57147.2024.10611569},
    abstract = {Place recognition is a challenging but crucial task in robotics. Current description-based methods may be limited by representation capabilities, while pairwise similarity-based methods require exhaustive searches, which is time-consuming. In this paper, we present a novel coarse-to-fine approach to address these problems, which combines BEV (Bird's Eye View) feature extraction, coarse-grained matching and fine-grained verification. In the coarse stage, our approach utilizes an attention-guided network to generate attention-guided descriptors. We then employ a fast affinity-based candidate selection process to identify the Top-K most similar candidates. In the fine stage, we estimate pairwise overlap among the narrowed-down place candidates to determine the final match. Experimental results on the KITTI and KITTI-360 datasets demonstrate that our approach outperforms state-of-the-art methods. The code will be released publicly soon.}
    }
  • Xiaolei Lang, Chao Chen, Kai Tang, Yukai Ma, Jiajun Lv, Yong Liu, and Xingxing Zuo. Coco-LIC: Continuous-Time Tightly-Coupled LiDAR-Inertial-Camera Odometry using Non-Uniform B-spline. IEEE Robotics and Automation Letters, 8:7074-7081, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    In this paper, we propose an efficient continuous-time LiDAR-Inertial-Camera Odometry, utilizing non-uniform B-splines to tightly couple measurements from the LiDAR, IMU, and camera. In contrast to uniform B-spline-based continuous-time methods, our non-uniform B-spline approach offers significant advantages in terms of achieving real-time efficiency and high accuracy. This is accomplished by dynamically and adaptively placing control points, taking into account the varying dynamics of the motion. To enable efficient fusion of heterogeneous LiDAR-Inertial-Camera data within a short sliding-window optimization, we assign depth to visual pixels using corresponding map points from a global LiDAR map, and formulate frame-to-map reprojection factors for the associated pixels in the current image frame. This circumvents the necessity for depth optimization of visual pixels, which typically entails a lengthy sliding window with numerous control points for continuous-time trajectory estimation. We conduct dedicated experiments on real-world datasets to demonstrate the advantage and efficacy of adopting non-uniform continuous-time trajectory representation. Our LiDAR-Inertial-Camera odometry system is also extensively evaluated on both challenging scenarios with sensor degenerations and large-scale scenarios, and has shown comparable or higher accuracy than the state-of-the-art methods. The codebase of this paper will also be open-sourced at https://github.com/APRIL-ZJU/Coco-LIC.
    @article{lang2023lic,
    title = {Coco-LIC: Continuous-Time Tightly-Coupled LiDAR-Inertial-Camera Odometry using Non-Uniform B-spline},
    author = {Xiaolei Lang and Chao Chen and Kai Tang and Yukai Ma and Jiajun Lv and Yong Liu and Xingxing Zuo},
    year = 2023,
    journal = {IEEE Robotics and Automation Letters},
    volume = 8,
    pages = {7074-7081},
    doi = {10.1109/LRA.2023.3315542},
    abstract = {In this paper, we propose an efficient continuous-time LiDAR-Inertial-Camera Odometry, utilizing non-uniform B-splines to tightly couple measurements from the LiDAR, IMU, and camera. In contrast to uniform B-spline-based continuous-time methods, our non-uniform B-spline approach offers significant advantages in terms of achieving real-time efficiency and high accuracy. This is accomplished by dynamically and adaptively placing control points, taking into account the varying dynamics of the motion. To enable efficient fusion of heterogeneous LiDAR-Inertial-Camera data within a short sliding-window optimization, we assign depth to visual pixels using corresponding map points from a global LiDAR map, and formulate frame-to-map reprojection factors for the associated pixels in the current image frame. This circumvents the necessity for depth optimization of visual pixels, which typically entails a lengthy sliding window with numerous control points for continuous-time trajectory estimation. We conduct dedicated experiments on real-world datasets to demonstrate the advantage and efficacy of adopting non-uniform continuous-time trajectory representation. Our LiDAR-Inertial-Camera odometry system is also extensively evaluated on both challenging scenarios with sensor degenerations and large-scale scenarios, and has shown comparable or higher accuracy than the state-of-the-art methods. The codebase of this paper will also be open-sourced at https://github.com/APRIL-ZJU/Coco-LIC.}
    }
  • Laijian Li, Yukai Ma, Kai Tang, Xiangrui Zhao, Chao Chen, Jianxin Huang, Jianbiao Mei, and Yong Liu. Geo-localization with Transformer-based 2D-3D match Network. IEEE Robotics and Automation Letters, 8:4855-4862, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    This letter presents a novel method for geographical localization by registering satellite maps with LiDAR point clouds. This method includes a Transformer-based 2D-3D matching network called D-GLSNet that directly matches the LiDAR point clouds and satellite images through end-to-end learning. Without the need for feature point detection, D-GLSNet provides accurate pixel-to-point association between the LiDAR point clouds and satellite images. And then, we can easily calculate the horizontal offset (Δx,Δy) and angular deviation Δθyaw between them, thereby achieving accurate registration. To demonstrate our network’s localization potential, we have designed a Geo-localization Node (GLN) that implements geographical localization and is plug-and-play in the SLAM system. Compared to GPS, GLN is less susceptible to external interference, such as building occlusion. In urban scenarios, our proposed D-GLSNet can output high-quality matching, enabling GLN to function stably and deliver more accurate localization results. Extensive experiments on the KITTI dataset show that our D-GLSNet method achieves a mean Relative Translation Error (RTE) of 1.43 m. Furthermore, our method outperforms state-of-the-art LiDAR-based geospatial localization methods when combined with odometry.
    @article{li2023glw,
    title = {Geo-localization with Transformer-based 2D-3D match Network},
    author = {Laijian Li and Yukai Ma and Kai Tang and Xiangrui Zhao and Chao Chen and Jianxin Huang and Jianbiao Mei and Yong Liu},
    year = 2023,
    journal = {IEEE Robotics and Automation Letters},
    volume = 8,
    pages = {4855-4862},
    doi = {10.1109/LRA.2023.3290526},
    abstract = {This letter presents a novel method for geographical localization by registering satellite maps with LiDAR point clouds. This method includes a Transformer-based 2D-3D matching network called D-GLSNet that directly matches the LiDAR point clouds and satellite images through end-to-end learning. Without the need for feature point detection, D-GLSNet provides accurate pixel-to-point association between the LiDAR point clouds and satellite images. And then, we can easily calculate the horizontal offset (Δx,Δy) and angular deviation Δθyaw between them, thereby achieving accurate registration. To demonstrate our network's localization potential, we have designed a Geo-localization Node (GLN) that implements geographical localization and is plug-and-play in the SLAM system. Compared to GPS, GLN is less susceptible to external interference, such as building occlusion. In urban scenarios, our proposed D-GLSNet can output high-quality matching, enabling GLN to function stably and deliver more accurate localization results. Extensive experiments on the KITTI dataset show that our D-GLSNet method achieves a mean Relative Translation Error (RTE) of 1.43 m. Furthermore, our method outperforms state-of-the-art LiDAR-based geospatial localization methods when combined with odometry.}
    }
  • Chao Chen, Yukai Ma, Jiajun Lv, Xiangrui Zhao, Laijian Li, Yong Liu, and Wang Gao. OL-SLAM: A Robust and Versatile System of Object Localization and SLAM. Sensors, 23:801, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    This paper proposes a real-time, versatile Simultaneous Localization and Mapping (SLAM) and object localization system, which fuses measurements from LiDAR, camera, Inertial Measurement Unit (IMU), and Global Positioning System (GPS). Our system can locate itself in an unknown environment and build a scene map based on which we can also track and obtain the global location of objects of interest. Precisely, our SLAM subsystem consists of the following four parts: LiDAR-inertial odometry, Visual-inertial odometry, GPS-inertial odometry, and global pose graph optimization. The target-tracking and positioning subsystem is developed based on YOLOv4. Benefiting from the use of GPS sensor in the SLAM system, we can obtain the global positioning information of the target; therefore, it can be highly useful in military operations, rescue and disaster relief, and other scenarios.
    @article{chen2023ols,
    title = {OL-SLAM: A Robust and Versatile System of Object Localization and SLAM},
    author = {Chao Chen and Yukai Ma and Jiajun Lv and Xiangrui Zhao and Laijian Li and Yong Liu and Wang Gao},
    year = 2023,
    journal = {Sensors},
    volume = 23,
    pages = {801},
    doi = {10.3390/s23020801},
    abstract = {This paper proposes a real-time, versatile Simultaneous Localization and Mapping (SLAM) and object localization system, which fuses measurements from LiDAR, camera, Inertial Measurement Unit (IMU), and Global Positioning System (GPS). Our system can locate itself in an unknown environment and build a scene map based on which we can also track and obtain the global location of objects of interest. Precisely, our SLAM subsystem consists of the following four parts: LiDAR-inertial odometry, Visual-inertial odometry, GPS-inertial odometry, and global pose graph optimization. The target-tracking and positioning subsystem is developed based on YOLOv4. Benefiting from the use of GPS sensor in the SLAM system, we can obtain the global positioning information of the target; therefore, it can be highly useful in military operations, rescue and disaster relief, and other scenarios.}
    }
  • Jingyang Xiang, Siqi Li, Jun Chen, Shipeng Bai, Yukai Ma, Guang Dai, and Yong Liu. SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading Acceleration. In 37th Conference on Neural Information Processing Systems (NeurIPS), pages 52033-52050, 2023.
    [BibTeX] [Abstract] [PDF]
    The study of sparsity in Convolutional Neural Networks (CNNs) has become widespread to compress and accelerate models in environments with limited resources. By constraining N consecutive weights along the output channel to be group-wise non-zero, the recent network with 1×N sparsity has received tremendous popularity for its three outstanding advantages: 1) A large amount of storage space saving by a Block Sparse Row matrix. 2) Excellent performance at a high sparsity. 3) Significant speedups on CPUs with Advanced Vector Extensions. Recent work requires selecting and fine-tuning 1×N sparse weights based on dense pre-trained weights, leading to problems such as expensive training cost and memory access, sub-optimal model quality, as well as unbalanced workload across threads (different sparsity across output channels). To overcome them, this paper proposes a novel Soft Uniform Block Pruning (SUBP) approach to train a uniform 1×N sparse structured network from scratch. Specifically, our approach tends to repeatedly allow pruned blocks to regrow to the network based on block angular redundancy and importance sampling in a uniform manner throughout the training process. It not only makes the model less dependent on pre-training, reduces the model redundancy and the risk of pruning the important blocks permanently, but also achieves a balanced workload. Empirically, on ImageNet, comprehensive experiments across various CNN architectures show that our SUBP consistently outperforms existing 1×N and structured sparsity methods based on pre-trained models or training from scratch. Source codes and models are available at https://github.com/JingyangXiang/SUBP.
    @inproceedings{xiang2023subp,
    title = {SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading Acceleration},
    author = {Jingyang Xiang and Siqi Li and Jun Chen and Shipeng Bai and Yukai Ma and Guang Dai and Yong Liu},
    year = 2023,
    booktitle = {37th Conference on Neural Information Processing Systems (NeurIPS)},
    pages = {52033-52050},
    abstract = {The study of sparsity in Convolutional Neural Networks (CNNs) has become widespread to compress and accelerate models in environments with limited resources. By constraining N consecutive weights along the output channel to be group-wise non-zero, the recent network with 1×N sparsity has received tremendous popularity for its three outstanding advantages: 1) A large amount of storage space saving by a Block Sparse Row matrix. 2) Excellent performance at a high sparsity. 3) Significant speedups on CPUs with Advanced Vector Extensions. Recent work requires selecting and fine-tuning 1×N sparse weights based on dense pre-trained weights, leading to problems such as expensive training cost and memory access, sub-optimal model quality, as well as unbalanced workload across threads (different sparsity across output channels). To overcome them, this paper proposes a novel Soft Uniform Block Pruning (SUBP) approach to train a uniform 1×N sparse structured network from scratch. Specifically, our approach tends to repeatedly allow pruned blocks to regrow to the network based on block angular redundancy and importance sampling in a uniform manner throughout the training process. It not only makes the model less dependent on pre-training, reduces the model redundancy and the risk of pruning the important blocks permanently, but also achieves a balanced workload. Empirically, on ImageNet, comprehensive experiments across various CNN architectures show that our SUBP consistently outperforms existing 1×N and structured sparsity methods based on pre-trained models or training from scratch. Source codes and models are available at https://github.com/JingyangXiang/SUBP.}
    }
  • Chao Chen, Hangyu Wu, Yukai Ma, Jiajun Lv, Laijian Li, and Yong Liu. LiDAR-Inertial SLAM with Efficiently Extracted Planes. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1497-1504, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    This paper proposes a LiDAR-Inertial SLAM with efficiently extracted planes, which couples explicit planes in the odometry to improve accuracy and in the mapping for consistency. The proposed method consists of three parts: an efficient Point→Line→Plane extraction algorithm, a LiDAR-Inertial-Plane tightly coupled odometry, and a global plane-aided mapping. Specifically, we leverage the ring field of the LiDAR point cloud to accelerate the region-growing-based plane extraction algorithm. Then we tightly couple IMU pre-integration factors, LiDAR odometry factors, and explicit plane factors in the sliding window to obtain a more accurate initial pose for mapping. Finally, we maintain explicit planes in the global map, and enhance system consistency by optimizing the factor graph of odometry factors and plane observation factors. Experimental results show that our plane extraction method is efficient, and the proposed plane-aided LiDAR-Inertial SLAM significantly improves accuracy and consistency compared to other state-of-the-art algorithms with only a small increase in time consumption.
    @inproceedings{chen2023lidar,
    title = {LiDAR-Inertial SLAM with Efficiently Extracted Planes},
    author = {Chao Chen and Hangyu Wu and Yukai Ma and Jiajun Lv and Laijian Li and Yong Liu},
    year = 2023,
    booktitle = {2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    pages = {1497-1504},
    doi = {10.1109/IROS55552.2023.10342325},
    abstract = {This paper proposes a LiDAR-Inertial SLAM with efficiently extracted planes, which couples explicit planes in the odometry to improve accuracy and in the mapping for consistency. The proposed method consists of three parts: an efficient Point→Line→Plane extraction algorithm, a LiDAR-Inertial-Plane tightly coupled odometry, and a global plane-aided mapping. Specifically, we leverage the ring field of the LiDAR point cloud to accelerate the region-growing-based plane extraction algorithm. Then we tightly couple IMU pre-integration factors, LiDAR odometry factors, and explicit plane factors in the sliding window to obtain a more accurate initial pose for mapping. Finally, we maintain explicit planes in the global map, and enhance system consistency by optimizing the factor graph of odometry factors and plane observation factors. Experimental results show that our plane extraction method is efficient, and the proposed plane-aided LiDAR-Inertial SLAM significantly improves accuracy and consistency compared to other state-of-the-art algorithms with only a small increase in time consumption.}
    }
  • Yukai Ma, Xiangrui Zhao, Han Li, Yaqing Gu, Xiaolei Lang, and Yong Liu. RoLM: Radar on LiDAR Map Localization. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 3976-3982, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    Multi-sensor fusion-based localization technology has achieved high accuracy in autonomous systems; how to improve its robustness is the main challenge at present. The most commonly used LiDAR and camera are weather-sensitive, while the FMCW radar has strong adaptability but suffers from noise and ghost effects. In this paper, we propose a heterogeneous localization method of Radar on LiDAR Map (RoLM), which can eliminate the accumulated error of radar odometry in real-time to achieve higher localization accuracy without dependence on loop closures. We embed the two sensor modalities into a density map, calculate the spatial vector similarity with offset to seek the corresponding place index among the candidates, and compute the rotation and translation. We use ICP to pursue perfect matching on the LiDAR submap based on the coarse alignment. Extensive experiments on the MulRan Radar Dataset, the Oxford Radar RobotCar Dataset, and our own data verify the feasibility and effectiveness of our approach.
    @inproceedings{ma2023rol,
    title = {RoLM: Radar on LiDAR Map Localization},
    author = {Yukai Ma and Xiangrui Zhao and Han Li and Yaqing Gu and Xiaolei Lang and Yong Liu},
    year = 2023,
    booktitle = {2023 IEEE International Conference on Robotics and Automation (ICRA)},
    pages = {3976-3982},
    doi = {10.1109/ICRA48891.2023.10161203},
    abstract = {Multi-sensor fusion-based localization technology has achieved high accuracy in autonomous systems; how to improve its robustness is the main challenge at present. The most commonly used LiDAR and camera are weather-sensitive, while the FMCW radar has strong adaptability but suffers from noise and ghost effects. In this paper, we propose a heterogeneous localization method of Radar on LiDAR Map (RoLM), which can eliminate the accumulated error of radar odometry in real-time to achieve higher localization accuracy without dependence on loop closures. We embed the two sensor modalities into a density map, calculate the spatial vector similarity with offset to seek the corresponding place index among the candidates, and compute the rotation and translation. We use ICP to pursue perfect matching on the LiDAR submap based on the coarse alignment. Extensive experiments on the MulRan Radar Dataset, the Oxford Radar RobotCar Dataset, and our own data verify the feasibility and effectiveness of our approach.}
    }
  • Xiaolei Lang, Jiajun Lv, Jianxin Huang, Yukai Ma, Yong Liu, and Xingxing Zuo. Ctrl-VIO: Continuous-Time Visual-Inertial Odometry for Rolling Shutter Cameras. IEEE Robotics and Automation Letters (RA-L), 7(4):11537-11544, 2022.
    [BibTeX] [Abstract] [DOI] [PDF]
    In this letter, we propose a probabilistic continuous-time visual-inertial odometry (VIO) for rolling shutter cameras. The continuous-time trajectory formulation naturally facilitates the fusion of asynchronized high-frequency IMU data and motion-distorted rolling shutter images. To prevent intractable computation load, the proposed VIO is sliding-window and keyframe-based. We propose to probabilistically marginalize the control points to keep a constant number of keyframes in the sliding window. Furthermore, the line exposure time difference (line delay) of the rolling shutter camera can be online calibrated in our continuous-time VIO. To extensively examine the performance of our continuous-time VIO, experiments are conducted on the publicly available WHU-RSVI, TUM-RSVI, and SenseTime-RSVI rolling shutter datasets. The results demonstrate the proposed continuous-time VIO significantly outperforms the existing state-of-the-art VIO methods.
    @article{lang2022ctv,
    title = {Ctrl-VIO: Continuous-Time Visual-Inertial Odometry for Rolling Shutter Cameras},
    author = {Xiaolei Lang and Jiajun Lv and Jianxin Huang and Yukai Ma and Yong Liu and Xingxing Zuo},
    year = 2022,
    journal = {IEEE Robotics and Automation Letters (RA-L)},
    volume = {7},
    number = {4},
    pages = {11537-11544},
    doi = {10.1109/LRA.2022.3202349},
    abstract = {In this letter, we propose a probabilistic continuous-time visual-inertial odometry (VIO) for rolling shutter cameras. The continuous-time trajectory formulation naturally facilitates the fusion of asynchronized high-frequency IMU data and motion-distorted rolling shutter images. To prevent intractable computation load, the proposed VIO is sliding-window and keyframe-based. We propose to probabilistically marginalize the control points to keep a constant number of keyframes in the sliding window. Furthermore, the line exposure time difference (line delay) of the rolling shutter camera can be online calibrated in our continuous-time VIO. To extensively examine the performance of our continuous-time VIO, experiments are conducted on the publicly available WHU-RSVI, TUM-RSVI, and SenseTime-RSVI rolling shutter datasets. The results demonstrate the proposed continuous-time VIO significantly outperforms the existing state-of-the-art VIO methods.}
    }

Links