Address

Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Email: lilaijian@zju.edu.cn

Laijian Li

M.S. Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Biography

I am pursuing my M.S. degree at the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. My main research interest is Simultaneous Localization and Mapping (SLAM).

Research and Interests

  • Simultaneous Localization and Mapping (SLAM)

Publications

  • Jianbiao Mei, Yu Yang, Mengmeng Wang, Junyu Zhu, Jongwon Ra, Yukai Ma, Laijian Li, and Yong Liu. Camera-Based 3D Semantic Scene Completion With Sparse Guidance Network. IEEE Transactions on Image Processing, 33:5468-5481, 2024.
    [BibTeX] [Abstract] [DOI] [PDF]
    Semantic scene completion (SSC) aims to predict the semantic occupancy of each voxel in the entire 3D scene from limited observations, which is an emerging and critical task for autonomous driving. Recently, many studies have turned to camera-based SSC solutions due to the richer visual cues and cost-effectiveness of cameras. However, existing methods usually rely on sophisticated and heavy 3D models to process the lifted 3D features directly, which are not discriminative enough for clear segmentation boundaries. In this paper, we adopt the dense-sparse-dense design and propose a one-stage camera-based SSC framework, termed SGN, to propagate semantics from the semantic-aware seed voxels to the whole scene based on spatial geometry cues. Firstly, to exploit depth-aware context and dynamically select sparse seed voxels, we redesign the sparse voxel proposal network to process points generated by depth prediction directly with the coarse-to-fine paradigm. Furthermore, by designing hybrid guidance (sparse semantic and geometry guidance) and effective voxel aggregation for spatial geometry cues, we enhance the feature separation between different categories and expedite the convergence of semantic propagation. Finally, we devise the multi-scale semantic propagation module for flexible receptive fields while reducing the computation resources. Extensive experimental results on the SemanticKITTI and SSCBench-KITTI-360 datasets demonstrate the superiority of our SGN over existing state-of-the-art methods. Even our lightweight version SGN-L achieves notable scores of 14.80% mIoU and 45.45% IoU on SemanticKITTI validation with only 12.5 M parameters and 7.16 G training memory. Code is available at https://github.com/Jieqianyu/SGN.
    @article{mei2024cbs,
    title = {Camera-Based 3D Semantic Scene Completion With Sparse Guidance Network},
    author = {Jianbiao Mei and Yu Yang and Mengmeng Wang and Junyu Zhu and Jongwon Ra and Yukai Ma and Laijian Li and Yong Liu},
    year = 2024,
    journal = {IEEE Transactions on Image Processing},
    volume = 33,
    pages = {5468-5481},
    doi = {10.1109/TIP.2024.3461989},
    abstract = {Semantic scene completion (SSC) aims to predict the semantic occupancy of each voxel in the entire 3D scene from limited observations, which is an emerging and critical task for autonomous driving. Recently, many studies have turned to camera-based SSC solutions due to the richer visual cues and cost-effectiveness of cameras. However, existing methods usually rely on sophisticated and heavy 3D models to process the lifted 3D features directly, which are not discriminative enough for clear segmentation boundaries. In this paper, we adopt the dense-sparse-dense design and propose a one-stage camera-based SSC framework, termed SGN, to propagate semantics from the semantic-aware seed voxels to the whole scene based on spatial geometry cues. Firstly, to exploit depth-aware context and dynamically select sparse seed voxels, we redesign the sparse voxel proposal network to process points generated by depth prediction directly with the coarse-to-fine paradigm. Furthermore, by designing hybrid guidance (sparse semantic and geometry guidance) and effective voxel aggregation for spatial geometry cues, we enhance the feature separation between different categories and expedite the convergence of semantic propagation. Finally, we devise the multi-scale semantic propagation module for flexible receptive fields while reducing the computation resources. Extensive experimental results on the SemanticKITTI and SSCBench-KITTI-360 datasets demonstrate the superiority of our SGN over existing state-of-the-art methods. Even our lightweight version SGN-L achieves notable scores of 14.80% mIoU and 45.45% IoU on SemanticKITTI validation with only 12.5 M parameters and 7.16 G training memory. Code is available at https://github.com/Jieqianyu/SGN.}
    }
  • Yukai Ma, Han Li, Xiangrui Zhao, Yaqing Gu, Xiaolei Lang, Laijian Li, and Yong Liu. FMCW Radar on LiDAR Map Localization in Structural Urban Environments. Journal of Field Robotics, 41:699-717, 2024.
    [BibTeX] [Abstract] [DOI] [PDF]
    Multisensor fusion-based localization technology has achieved high accuracy in autonomous systems. How to improve the robustness is the main challenge at present. The most commonly used LiDAR and camera are weather-sensitive, while the frequency-modulated continuous wave Radar has strong adaptability but suffers from noise and ghost effects. In this paper, we propose a heterogeneous localization method called Radar on LiDAR Map, which aims to enhance localization accuracy without relying on loop closures by mitigating the accumulated error in Radar odometry in real time. To accomplish this, we utilize LiDAR scans and ground truth paths as Teach paths and Radar scans as the trajectories to be estimated, referred to as Repeat paths. By establishing a correlation between the Radar and LiDAR scan data, we can enhance the accuracy of Radar odometry estimation. Our approach involves embedding the data from both Radar and LiDAR sensors into a density map. We calculate the spatial vector similarity with an offset to determine the corresponding place index within the candidate map and estimate the rotation and translation. To refine the alignment, we utilize the Iterative Closest Point algorithm to achieve optimal matching on the LiDAR submap. The estimated bias is subsequently incorporated into the Radar SLAM for optimizing the position map. We conducted extensive experiments on the MulRan Radar Dataset, Oxford Radar RobotCar Dataset, and our own dataset to demonstrate the feasibility and effectiveness of our proposed approach. Our proposed scan projection descriptor achieves homogeneous and heterogeneous place recognition and works much better than existing methods. Its application to the Radar SLAM system also substantially improves the positioning accuracy. The root mean square error across all sequences is 2.53 m in position and 1.83 degrees in angle.
    @article{ma2024fmcw,
    title = {FMCW Radar on LiDAR Map Localization in Structural Urban Environments},
    author = {Yukai Ma and Han Li and Xiangrui Zhao and Yaqing Gu and Xiaolei Lang and Laijian Li and Yong Liu},
    year = 2024,
    journal = {Journal of Field Robotics},
    volume = 41,
    pages = {699-717},
    doi = {10.1002/rob.22291},
    abstract = {Multisensor fusion-based localization technology has achieved high accuracy in autonomous systems. How to improve the robustness is the main challenge at present. The most commonly used LiDAR and camera are weather-sensitive, while the frequency-modulated continuous wave Radar has strong adaptability but suffers from noise and ghost effects. In this paper, we propose a heterogeneous localization method called Radar on LiDAR Map, which aims to enhance localization accuracy without relying on loop closures by mitigating the accumulated error in Radar odometry in real time. To accomplish this, we utilize LiDAR scans and ground truth paths as Teach paths and Radar scans as the trajectories to be estimated, referred to as Repeat paths. By establishing a correlation between the Radar and LiDAR scan data, we can enhance the accuracy of Radar odometry estimation. Our approach involves embedding the data from both Radar and LiDAR sensors into a density map. We calculate the spatial vector similarity with an offset to determine the corresponding place index within the candidate map and estimate the rotation and translation. To refine the alignment, we utilize the Iterative Closest Point algorithm to achieve optimal matching on the LiDAR submap. The estimated bias is subsequently incorporated into the Radar SLAM for optimizing the position map. We conducted extensive experiments on the MulRan Radar Dataset, Oxford Radar RobotCar Dataset, and our own dataset to demonstrate the feasibility and effectiveness of our proposed approach. Our proposed scan projection descriptor achieves homogeneous and heterogeneous place recognition and works much better than existing methods. Its application to the Radar SLAM system also substantially improves the positioning accuracy. The root mean square error across all sequences is 2.53 m in position and 1.83 degrees in angle.}
    }
  • Laijian Li, Yukai Ma, Kai Tang, Xiangrui Zhao, Chao Chen, Jianxin Huang, Jianbiao Mei, and Yong Liu. Geo-localization with Transformer-based 2D-3D match Network. IEEE Robotics and Automation Letters (RA-L), 8:4855-4862, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    This letter presents a novel method for geographical localization by registering satellite maps with LiDAR point clouds. This method includes a Transformer-based 2D-3D matching network called D-GLSNet that directly matches the LiDAR point clouds and satellite images through end-to-end learning. Without the need for feature point detection, D-GLSNet provides accurate pixel-to-point association between the LiDAR point clouds and satellite images. And then, we can easily calculate the horizontal offset (Δx,Δy) and angular deviation Δθyaw between them, thereby achieving accurate registration. To demonstrate our network’s localization potential, we have designed a Geo-localization Node (GLN) that implements geographical localization and is plug-and-play in the SLAM system. Compared to GPS, GLN is less susceptible to external interference, such as building occlusion. In urban scenarios, our proposed D-GLSNet can output high-quality matching, enabling GLN to function stably and deliver more accurate localization results. Extensive experiments on the KITTI dataset show that our D-GLSNet method achieves a mean Relative Translation Error (RTE) of 1.43 m. Furthermore, our method outperforms state-of-the-art LiDAR-based geospatial localization methods when combined with odometry.
    @article{li2023glw,
    title = {Geo-localization with Transformer-based 2D-3D match Network},
    author = {Laijian Li and Yukai Ma and Kai Tang and Xiangrui Zhao and Chao Chen and Jianxin Huang and Jianbiao Mei and Yong Liu},
    year = 2023,
    journal = {IEEE Robotics and Automation Letters (RA-L)},
    volume = 8,
    pages = {4855-4862},
    doi = {10.1109/LRA.2023.3290526},
    abstract = {This letter presents a novel method for geographical localization by registering satellite maps with LiDAR point clouds. This method includes a Transformer-based 2D-3D matching network called D-GLSNet that directly matches the LiDAR point clouds and satellite images through end-to-end learning. Without the need for feature point detection, D-GLSNet provides accurate pixel-to-point association between the LiDAR point clouds and satellite images. And then, we can easily calculate the horizontal offset (Δx,Δy) and angular deviation Δθyaw between them, thereby achieving accurate registration. To demonstrate our network's localization potential, we have designed a Geo-localization Node (GLN) that implements geographical localization and is plug-and-play in the SLAM system. Compared to GPS, GLN is less susceptible to external interference, such as building occlusion. In urban scenarios, our proposed D-GLSNet can output high-quality matching, enabling GLN to function stably and deliver more accurate localization results. Extensive experiments on the KITTI dataset show that our D-GLSNet method achieves a mean Relative Translation Error (RTE) of 1.43 m. Furthermore, our method outperforms state-of-the-art LiDAR-based geospatial localization methods when combined with odometry.}
    }
  • Chao Chen, Yukai Ma, Jiajun Lv, Xiangrui Zhao, Laijian Li, Yong Liu, and Wang Gao. OL-SLAM: A Robust and Versatile System of Object Localization and SLAM. Sensors, 23:801, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    This paper proposes a real-time, versatile Simultaneous Localization and Mapping (SLAM) and object localization system, which fuses measurements from LiDAR, camera, Inertial Measurement Unit (IMU), and Global Positioning System (GPS). Our system can locate itself in an unknown environment and build a scene map, based on which we can also track and obtain the global location of objects of interest. Specifically, our SLAM subsystem consists of the following four parts: LiDAR-inertial odometry, Visual-inertial odometry, GPS-inertial odometry, and global pose graph optimization. The target-tracking and positioning subsystem is developed based on YOLOv4. Benefiting from the use of the GPS sensor in the SLAM system, we can obtain the global positioning information of the target; therefore, it can be highly useful in military operations, rescue and disaster relief, and other scenarios.
    @article{chen2023ols,
    title = {OL-SLAM: A Robust and Versatile System of Object Localization and SLAM},
    author = {Chao Chen and Yukai Ma and Jiajun Lv and Xiangrui Zhao and Laijian Li and Yong Liu and Wang Gao},
    year = 2023,
    journal = {Sensors},
    volume = 23,
    pages = {801},
    doi = {10.3390/s23020801},
    abstract = {This paper proposes a real-time, versatile Simultaneous Localization and Mapping (SLAM) and object localization system, which fuses measurements from LiDAR, camera, Inertial Measurement Unit (IMU), and Global Positioning System (GPS). Our system can locate itself in an unknown environment and build a scene map, based on which we can also track and obtain the global location of objects of interest. Specifically, our SLAM subsystem consists of the following four parts: LiDAR-inertial odometry, Visual-inertial odometry, GPS-inertial odometry, and global pose graph optimization. The target-tracking and positioning subsystem is developed based on YOLOv4. Benefiting from the use of the GPS sensor in the SLAM system, we can obtain the global positioning information of the target; therefore, it can be highly useful in military operations, rescue and disaster relief, and other scenarios.}
    }
  • Jianbiao Mei, Yu Yang, Mengmeng Wang, Zizhang Li, Xiaojun Hou, Jongwon Ra, Laijian Li, and Yong Liu. CenterLPS: Segment Instances by Centers for LiDAR Panoptic Segmentation. In 31st ACM International Conference on Multimedia (MM), pages 1884-1894, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    This paper focuses on LiDAR Panoptic Segmentation (LPS), which has attracted more attention recently due to its broad application prospect for autonomous driving and robotics. The mainstream LPS approaches either adopt a top-down strategy relying on 3D object detectors to discover instances or utilize time-consuming heuristic clustering algorithms to group instances in a bottom-up manner. Inspired by the center representation and kernel-based segmentation, we propose a new detection-free and clustering-free framework called CenterLPS, with the center-based instance encoding and decoding paradigm. Specifically, we propose a sparse center proposal network to generate the sparse 3D instance centers, as well as center feature embedding, which can well encode characteristics of instances. Then a center-aware transformer is applied to collect the context between different center feature embedding and around centers. Moreover, we generate the kernel weights based on the enhanced center feature embedding and initialize dynamic convolutions to decode the final instance masks. Finally, a mask fusion module is devised to unify the semantic and instance predictions and improve the panoptic quality. Extensive experiments on SemanticKITTI and nuScenes demonstrate the effectiveness of our proposed center-based framework CenterLPS.
    @inproceedings{mei2023lps,
    title = {CenterLPS: Segment Instances by Centers for LiDAR Panoptic Segmentation},
    author = {Jianbiao Mei and Yu Yang and Mengmeng Wang and Zizhang Li and Xiaojun Hou and Jongwon Ra and Laijian Li and Yong Liu},
    year = 2023,
    booktitle = {31st ACM International Conference on Multimedia (MM)},
    pages = {1884-1894},
    doi = {10.1145/3581783.3612080},
    abstract = {This paper focuses on LiDAR Panoptic Segmentation (LPS), which has attracted more attention recently due to its broad application prospect for autonomous driving and robotics. The mainstream LPS approaches either adopt a top-down strategy relying on 3D object detectors to discover instances or utilize time-consuming heuristic clustering algorithms to group instances in a bottom-up manner. Inspired by the center representation and kernel-based segmentation, we propose a new detection-free and clustering-free framework called CenterLPS, with the center-based instance encoding and decoding paradigm. Specifically, we propose a sparse center proposal network to generate the sparse 3D instance centers, as well as center feature embedding, which can well encode characteristics of instances. Then a center-aware transformer is applied to collect the context between different center feature embedding and around centers. Moreover, we generate the kernel weights based on the enhanced center feature embedding and initialize dynamic convolutions to decode the final instance masks. Finally, a mask fusion module is devised to unify the semantic and instance predictions and improve the panoptic quality. Extensive experiments on SemanticKITTI and nuScenes demonstrate the effectiveness of our proposed center-based framework CenterLPS.}
    }
  • Chao Chen, Hangyu Wu, Yukai Ma, Jiajun Lv, Laijian Li, and Yong Liu. LiDAR-Inertial SLAM with Efficiently Extracted Planes. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1497-1504, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    This paper proposes a LiDAR-Inertial SLAM with efficiently extracted planes, which couples explicit planes in the odometry to improve accuracy and in the mapping for consistency. The proposed method consists of three parts: an efficient Point→Line→Plane extraction algorithm, a LiDAR-Inertial-Plane tightly coupled odometry, and a global plane-aided mapping. Specifically, we leverage the ring field of the LiDAR point cloud to accelerate the region-growing-based plane extraction algorithm. Then we tightly couple IMU pre-integration factors, LiDAR odometry factors, and explicit plane factors in the sliding window to obtain a more accurate initial pose for mapping. Finally, we maintain explicit planes in the global map, and enhance system consistency by optimizing the factor graph of optimized odometry factors and plane observation factors. Experimental results show that our plane extraction method is efficient, and the proposed plane-aided LiDAR-Inertial SLAM significantly improves the accuracy and consistency compared to the other state-of-the-art algorithms with only a small increase in time consumption.
    @inproceedings{chen2023lidar,
    title = {LiDAR-Inertial SLAM with Efficiently Extracted Planes},
    author = {Chao Chen and Hangyu Wu and Yukai Ma and Jiajun Lv and Laijian Li and Yong Liu},
    year = 2023,
    booktitle = {2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    pages = {1497-1504},
    doi = {10.1109/IROS55552.2023.10342325},
    abstract = {This paper proposes a LiDAR-Inertial SLAM with efficiently extracted planes, which couples explicit planes in the odometry to improve accuracy and in the mapping for consistency. The proposed method consists of three parts: an efficient Point→Line→Plane extraction algorithm, a LiDAR-Inertial-Plane tightly coupled odometry, and a global plane-aided mapping. Specifically, we leverage the ring field of the LiDAR point cloud to accelerate the region-growing-based plane extraction algorithm. Then we tightly couple IMU pre-integration factors, LiDAR odometry factors, and explicit plane factors in the sliding window to obtain a more accurate initial pose for mapping. Finally, we maintain explicit planes in the global map, and enhance system consistency by optimizing the factor graph of optimized odometry factors and plane observation factors. Experimental results show that our plane extraction method is efficient, and the proposed plane-aided LiDAR-Inertial SLAM significantly improves the accuracy and consistency compared to the other state-of-the-art algorithms with only a small increase in time consumption.}
    }
  • Jianbiao Mei, Yu Yang, Mengmeng Wang, Xiaojun Hou, Laijian Li, and Yong Liu. PANet: LiDAR Panoptic Segmentation with Sparse Instance Proposal and Aggregation. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 7726-7733, 2023.
    [BibTeX] [Abstract] [DOI] [PDF]
    Reliable LiDAR panoptic segmentation (LPS), including both semantic and instance segmentation, is vital for many robotic applications, such as autonomous driving. This work proposes a new LPS framework named PANet to eliminate the dependency on the offset branch and improve the performance on large objects, which are always over-segmented by clustering algorithms. Firstly, we propose a non-learning Sparse Instance Proposal (SIP) module with the "sampling-shifting-grouping" scheme to directly group thing points into instances from the raw point cloud efficiently. More specifically, balanced point sampling is introduced to generate sparse seed points with more uniform point distribution over the distance range. And a shift module, termed bubble shifting, is proposed to shrink the seed points to the clustered centers. Then we utilize the connected component label algorithm to generate instance proposals. Furthermore, an instance aggregation module is devised to integrate potentially fragmented instances, improving the performance of the SIP module on large objects. Extensive experiments show that PANet achieves state-of-the-art performance among published works on the SemanticKITTI validation and nuScenes validation for the panoptic segmentation task. Code is available at https://github.com/Jieqianyu/PANet.git.
    @inproceedings{mei2023pan,
    title = {PANet: LiDAR Panoptic Segmentation with Sparse Instance Proposal and Aggregation},
    author = {Jianbiao Mei and Yu Yang and Mengmeng Wang and Xiaojun Hou and Laijian Li and Yong Liu},
    year = 2023,
    booktitle = {2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    pages = {7726-7733},
    doi = {10.1109/IROS55552.2023.10342468},
    abstract = {Reliable LiDAR panoptic segmentation (LPS), including both semantic and instance segmentation, is vital for many robotic applications, such as autonomous driving. This work proposes a new LPS framework named PANet to eliminate the dependency on the offset branch and improve the performance on large objects, which are always over-segmented by clustering algorithms. Firstly, we propose a non-learning Sparse Instance Proposal (SIP) module with the "sampling-shifting-grouping" scheme to directly group thing points into instances from the raw point cloud efficiently. More specifically, balanced point sampling is introduced to generate sparse seed points with more uniform point distribution over the distance range. And a shift module, termed bubble shifting, is proposed to shrink the seed points to the clustered centers. Then we utilize the connected component label algorithm to generate instance proposals. Furthermore, an instance aggregation module is devised to integrate potentially fragmented instances, improving the performance of the SIP module on large objects. Extensive experiments show that PANet achieves state-of-the-art performance among published works on the SemanticKITTI validation and nuScenes validation for the panoptic segmentation task. Code is available at https://github.com/Jieqianyu/PANet.git.}
    }
  • Jianxin Huang, Laijian Li, Xiangrui Zhao, Xiaolei Lang, Deye Zhu, and Yong Liu. LODM: Large-scale Online Dense Mapping for UAV. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022.
    [BibTeX] [Abstract] [DOI] [PDF]
    This paper proposes a method for online large-scale dense mapping. The UAV operates within a range of 150-250 meters, combining GPS and visual odometry to estimate the scaled pose and sparse points. To exploit the depth of sparse points for the depth map, we propose the Sparse Confidence Cascade View-Aggregation MVSNet (SCCVA-MVSNet), which projects the depth-converged points in the sliding window onto keyframes to obtain a sparse depth map. Sparse confidence is constructed from the photometric error, and coarse depth and confidence are obtained through normalized convolution. The images of all keyframes, the coarse depth, and the confidence then serve as the input of CVA-MVSNet to extract features and construct 3D cost volumes with adaptive view aggregation, balancing the different stereo baselines between the keyframes. Because our proposed network utilizes sparse feature point information, its output better maintains the consistency of the scale. Our experiments show that MVSNet using sparse feature point information outperforms image-only MVSNet, and our online reconstruction results are comparable to offline reconstruction methods. To benefit the research community, we open-source our code at https://github.com/hjxwhy/LODM.git
    @inproceedings{huang2022lls,
    title = {LODM: Large-scale Online Dense Mapping for UAV},
    author = {Jianxin Huang and Laijian Li and Xiangrui Zhao and Xiaolei Lang and Deye Zhu and Yong Liu},
    year = 2022,
    booktitle = {2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    doi = {10.1109/IROS47612.2022.9981994},
    abstract = {This paper proposes a method for online large-scale dense mapping. The UAV operates within a range of 150-250 meters, combining GPS and visual odometry to estimate the scaled pose and sparse points. To exploit the depth of sparse points for the depth map, we propose the Sparse Confidence Cascade View-Aggregation MVSNet (SCCVA-MVSNet), which projects the depth-converged points in the sliding window onto keyframes to obtain a sparse depth map. Sparse confidence is constructed from the photometric error, and coarse depth and confidence are obtained through normalized convolution. The images of all keyframes, the coarse depth, and the confidence then serve as the input of CVA-MVSNet to extract features and construct 3D cost volumes with adaptive view aggregation, balancing the different stereo baselines between the keyframes. Because our proposed network utilizes sparse feature point information, its output better maintains the consistency of the scale. Our experiments show that MVSNet using sparse feature point information outperforms image-only MVSNet, and our online reconstruction results are comparable to offline reconstruction methods. To benefit the research community, we open-source our code at https://github.com/hjxwhy/LODM.git}
    }