Address

Room 101, Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Email: xinkong@zju.edu.cn

Xin Kong

MS Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Biography

I am pursuing my master’s degree at the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. My major research interests include semantic segmentation for point clouds, 3D vision (3DV), graph convolutional networks (GCN), and the combination of deep learning and SLAM.

Research and Interests

  • Semantic Segmentation for Point Clouds
  • 3D Vision (3DV)
  • Graph Convolutional Networks (GCN)

Publications

  • Guangyao Zhai, Yu Zheng, Ziwei Xu, Xin Kong, Yong Liu, Benjamin Busam, Yi Ren, Nassir Navab, and Zhengyou Zhang. DA^2 Dataset: Toward Dexterity-Aware Dual-Arm Grasping. IEEE Robotics and Automation Letters (RA-L), 7(4):8941-8948, 2022.
    [BibTeX] [Abstract] [DOI] [PDF]
    In this paper, we introduce DA^2, the first large-scale dual-arm dexterity-aware dataset for the generation of optimal bimanual grasping pairs for arbitrary large objects. The dataset contains about 9M pairs of parallel-jaw grasps, generated from more than 6000 objects and each labeled with various grasp dexterity measures. In addition, we propose an end-to-end dual-arm grasp evaluation model trained on the rendered scenes from this dataset. We utilize the evaluation model as our baseline to show the value of this novel and nontrivial dataset by both online analysis and real robot experiments. All data and related code will be open-sourced at https://sites.google.com/view/da2dataset.
    @article{zhai2022ddt,
    title = {DA^2 Dataset: Toward Dexterity-Aware Dual-Arm Grasping},
    author = {Guangyao Zhai and Yu Zheng and Ziwei Xu and Xin Kong and Yong Liu and Benjamin Busam and Yi Ren and Nassir Navab and Zhengyou Zhang},
    year = 2022,
    journal = {IEEE Robotics and Automation Letters (RA-L)},
    volume = {7},
    number = {4},
    pages = {8941-8948},
    doi = {10.1109/LRA.2022.3189959},
    abstract = {In this paper, we introduce DA^2, the first large-scale dual-arm dexterity-aware dataset for the generation of optimal bimanual grasping pairs for arbitrary large objects. The dataset contains about 9M pairs of parallel-jaw grasps, generated from more than 6000 objects and each labeled with various grasp dexterity measures. In addition, we propose an end-to-end dual-arm grasp evaluation model trained on the rendered scenes from this dataset. We utilize the evaluation model as our baseline to show the value of this novel and nontrivial dataset by both online analysis and real robot experiments. All data and related code will be open-sourced at https://sites.google.com/view/da2dataset.}
    }
  • Lin Li, Xin Kong, Xiangrui Zhao, Tianxin Huang, and Yong Liu. Semantic Scan Context: A Novel Semantic-based Loop-closure Method for LiDAR SLAM. Autonomous Robots, 46(4):535-551, 2022.
    [BibTeX] [Abstract] [DOI] [PDF]
    As one of the key technologies of SLAM, loop-closure detection can help eliminate the cumulative errors of the odometry. Many of the current LiDAR-based SLAM systems do not integrate a loop-closure detection module, so they will inevitably suffer from cumulative errors. This paper proposes a semantic-based place recognition method called Semantic Scan Context (SSC), which consists of a two-step global ICP and a semantic-based descriptor. Thanks to the use of high-level semantic features, our descriptor can effectively encode scene information. The proposed two-step global ICP can help eliminate the influence of rotation and translation on descriptor matching and provide a good initial value for geometric verification. Further, we built a complete loop-closure detection module based on SSC and combined it with the well-known LOAM to form a full LiDAR SLAM system. Exhaustive experiments on the KITTI and KITTI-360 datasets show that our approach is competitive with state-of-the-art methods, robust to the environment, and has good generalization ability. Our code is available at: https://github.com/lilin-hitcrt/SSC.
    @article{li2022ssc,
    title = {Semantic Scan Context: A Novel Semantic-based Loop-closure Method for LiDAR SLAM},
    author = {Lin Li and Xin Kong and Xiangrui Zhao and Tianxin Huang and Yong Liu},
    year = 2022,
    journal = {Autonomous Robots},
    volume = {46},
    number = {4},
    pages = {535-551},
    doi = {10.1007/s10514-022-10037-w},
    abstract = {As one of the key technologies of SLAM, loop-closure detection can help eliminate the cumulative errors of the odometry. Many of the current LiDAR-based SLAM systems do not integrate a loop-closure detection module, so they will inevitably suffer from cumulative errors. This paper proposes a semantic-based place recognition method called Semantic Scan Context (SSC), which consists of a two-step global ICP and a semantic-based descriptor. Thanks to the use of high-level semantic features, our descriptor can effectively encode scene information. The proposed two-step global ICP can help eliminate the influence of rotation and translation on descriptor matching and provide a good initial value for geometric verification. Further, we built a complete loop-closure detection module based on SSC and combined it with the well-known LOAM to form a full LiDAR SLAM system. Exhaustive experiments on the KITTI and KITTI-360 datasets show that our approach is competitive with state-of-the-art methods, robust to the environment, and has good generalization ability. Our code is available at: https://github.com/lilin-hitcrt/SSC.}
    }
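    A minimal sketch of the scan-context idea behind SSC, assuming a labeled LiDAR scan as input (the grid sizes and the per-cell rule here are illustrative assumptions, not the paper's implementation):

    import numpy as np

    def semantic_scan_context(points, labels, num_rings=20, num_sectors=60, max_range=50.0):
        """Bin a labeled LiDAR scan (points: (N, 3), labels: (N,) class ids)
        into a ring x sector polar grid; each cell keeps a semantic summary."""
        r = np.hypot(points[:, 0], points[:, 1])
        theta = np.arctan2(points[:, 1], points[:, 0]) + np.pi  # [0, 2*pi]
        keep = r < max_range
        ring = (r[keep] / max_range * num_rings).astype(int).clip(0, num_rings - 1)
        sector = (theta[keep] / (2 * np.pi) * num_sectors).astype(int).clip(0, num_sectors - 1)
        desc = np.zeros((num_rings, num_sectors), dtype=int)
        for i, j, l in zip(ring, sector, labels[keep]):
            # largest class id as a cheap stand-in for "dominant label" per cell
            desc[i, j] = max(desc[i, j], int(l))
        return desc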
  • Lin Li, Xin Kong, Xiangrui Zhao, Tianxin Huang, Wanlong Li, Feng Wen, Hongbo Zhang, and Yong Liu. RINet: Efficient 3D Lidar-Based Place Recognition Using Rotation Invariant Neural Network. IEEE Robotics and Automation Letters (RA-L), 7(2):4321-4328, 2022.
    [BibTeX] [Abstract] [DOI] [PDF]
    LiDAR-based place recognition (LPR) is one of the basic capabilities of robots, which can retrieve scenes from maps and identify previously visited locations based on 3D point clouds. As robots often pass the same place from different views, LPR methods are supposed to be robust to rotation, which is lacking in most current learning-based approaches. In this letter, we propose a rotation-invariant neural network structure that can detect reverse loop closures even when the training data is all in the same direction. Specifically, we design a novel rotation-equivariant global descriptor, which combines semantic and geometric features to improve description ability. Then, a rotation-invariant Siamese neural network is implemented to predict the similarity of descriptor pairs. Our network is lightweight and can operate at more than 8000 FPS on an i7-9700 CPU. Exhaustive evaluations and robustness tests on the KITTI, KITTI-360, and NCLT datasets show that our approach can work stably in various scenarios and achieve state-of-the-art performance.
    @article{li2022rinet,
    title = {RINet: Efficient 3D Lidar-Based Place Recognition Using Rotation Invariant Neural Network},
    author = {Lin Li and Xin Kong and Xiangrui Zhao and Tianxin Huang and Wanlong Li and Feng Wen and Hongbo Zhang and Yong Liu},
    year = 2022,
    journal = {IEEE Robotics and Automation Letters (RA-L)},
    volume = {7},
    number = {2},
    pages = {4321-4328},
    doi = {10.1109/LRA.2022.3150499},
    abstract = {LiDAR-based place recognition (LPR) is one of the basic capabilities of robots, which can retrieve scenes from maps and identify previously visited locations based on 3D point clouds. As robots often pass the same place from different views, LPR methods are supposed to be robust to rotation, which is lacking in most current learning-based approaches. In this letter, we propose a rotation-invariant neural network structure that can detect reverse loop closures even when the training data is all in the same direction. Specifically, we design a novel rotation-equivariant global descriptor, which combines semantic and geometric features to improve description ability. Then, a rotation-invariant Siamese neural network is implemented to predict the similarity of descriptor pairs. Our network is lightweight and can operate at more than 8000 FPS on an i7-9700 CPU. Exhaustive evaluations and robustness tests on the KITTI, KITTI-360, and NCLT datasets show that our approach can work stably in various scenarios and achieve state-of-the-art performance.}
    }
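    One standard way to obtain the yaw-rotation invariance RINet targets is to score descriptor pairs over all circular sector shifts; this toy function (not the paper's Siamese network) shows why rotating the scan, which circularly shifts the descriptor columns, leaves the score unchanged:

    import numpy as np

    def rotation_invariant_similarity(desc_a, desc_b):
        """Cosine similarity of two (rings x sectors) descriptors, maximized
        over all circular sector shifts, so a pure yaw rotation of the scan
        does not change the score."""
        best = -1.0
        for shift in range(desc_a.shape[1]):
            shifted = np.roll(desc_b, shift, axis=1)
            num = float((desc_a * shifted).sum())
            den = np.linalg.norm(desc_a) * np.linalg.norm(shifted) + 1e-9
            best = max(best, num / den)
        return best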
  • Zhen Zhang, Jiaqing Yan, Xin Kong, Guangyao Zhai, and Yong Liu. Efficient Motion Planning based on Kinodynamic Model for Quadruped Robots Following Persons in Confined Spaces. IEEE/ASME Transactions on Mechatronics, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    Quadruped robots have superior terrain adaptability and more flexible movement capabilities than traditional robots. In this paper, we apply them to person-following tasks and propose an efficient motion planning scheme for quadruped robots to generate a flexible and effective trajectory in confined spaces. The method builds a real-time local costmap via onboard sensors, which involves both static and dynamic obstacles. We exploit a simplified kinodynamic model and formulate the friction pyramids formed by Ground Reaction Force (GRF) inequality constraints to ensure the executability of the optimized trajectory. In addition, we obtain the optimal following trajectory in the costmap based entirely on the robot's rectangular footprint description, which ensures that it can walk through narrow spaces while avoiding collisions. Finally, a receding horizon control strategy is employed to improve the robustness of motion in complex environments. The proposed motion planning framework is integrated on the quadruped robot JueYing and tested in simulation as well as real scenarios. The execution success rates in various scenes are all over 90%.
    @article{zhang2021emp,
    title = {Efficient Motion Planning based on Kinodynamic Model for Quadruped Robots Following Persons in Confined Spaces},
    author = {Zhen Zhang and Jiaqing Yan and Xin Kong and Guangyao Zhai and Yong Liu},
    year = 2021,
    journal = {IEEE/ASME Transactions on Mechatronics},
    doi = {10.1109/TMECH.2021.3083594},
    abstract = {Quadruped robots have superior terrain adaptability and more flexible movement capabilities than traditional robots. In this paper, we apply them to person-following tasks and propose an efficient motion planning scheme for quadruped robots to generate a flexible and effective trajectory in confined spaces. The method builds a real-time local costmap via onboard sensors, which involves both static and dynamic obstacles. We exploit a simplified kinodynamic model and formulate the friction pyramids formed by Ground Reaction Force (GRF) inequality constraints to ensure the executability of the optimized trajectory. In addition, we obtain the optimal following trajectory in the costmap based entirely on the robot's rectangular footprint description, which ensures that it can walk through narrow spaces while avoiding collisions. Finally, a receding horizon control strategy is employed to improve the robustness of motion in complex environments. The proposed motion planning framework is integrated on the quadruped robot JueYing and tested in simulation as well as real scenarios. The execution success rates in various scenes are all over 90\%.}
    }
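    The friction-pyramid constraint mentioned in the abstract linearizes the friction cone on each ground reaction force. A minimal check, assuming a flat ground normal (0, 0, 1) and a hypothetical friction coefficient mu (trajectory optimizers impose the same relations as linear inequality constraints):

    import numpy as np

    def friction_pyramid_ok(grf, mu=0.6):
        """Linearized friction cone (pyramid) on one ground reaction force
        f = (fx, fy, fz): fz >= 0, |fx| <= mu*fz, |fy| <= mu*fz."""
        fx, fy, fz = grf
        return fz >= 0 and abs(fx) <= mu * fz and abs(fy) <= mu * fz

    print(friction_pyramid_ok(np.array([10.0, -5.0, 100.0])))  # True
    print(friction_pyramid_ok(np.array([80.0, 0.0, 100.0])))   # False: would slip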
  • Xuemeng Yang, Hao Zou, Xin Kong, Tianxin Huang, Yong Liu, Wanlong Li, Feng Wen, and Hongbo Zhang. Semantic Segmentation-assisted Scene Completion for LiDAR Point Clouds. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3555-3562, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    Outdoor scene completion is a challenging issue in 3D scene understanding, which plays an important role in intelligent robotics and autonomous driving. Due to the sparsity of LiDAR acquisition, 3D scene completion and semantic segmentation are far more complex. Since semantic features can provide constraints and semantic priors for completion tasks, the relationship between them is worth exploring. Therefore, we propose an end-to-end semantic segmentation-assisted scene completion network, including a 2D completion branch and a 3D semantic segmentation branch. Specifically, the network takes a raw point cloud as input and merges the features from the segmentation branch into the completion branch hierarchically to provide semantic information. By adopting a BEV representation and 3D sparse convolution, we benefit from a lower operation count while maintaining effective expression. Besides, the decoder of the segmentation branch is used as an auxiliary, which can be discarded in the inference stage to save computational consumption. Extensive experiments demonstrate that our method achieves competitive performance on the SemanticKITTI dataset with low latency. Code and models will be released at https://github.com/jokester-zzz/SSA-SC.
    @inproceedings{yang2021ssa,
    title = {Semantic Segmentation-assisted Scene Completion for LiDAR Point Clouds},
    author = {Xuemeng Yang and Hao Zou and Xin Kong and Tianxin Huang and Yong Liu and Wanlong Li and Feng Wen and Hongbo Zhang},
    year = 2021,
    booktitle = {2021 IEEE/RSJ International Conference on Intelligent Robots and Systems},
    pages = {3555-3562},
    doi = {10.1109/IROS51168.2021.9636662},
    abstract = {Outdoor scene completion is a challenging issue in 3D scene understanding, which plays an important role in intelligent robotics and autonomous driving. Due to the sparsity of LiDAR acquisition, 3D scene completion and semantic segmentation are far more complex. Since semantic features can provide constraints and semantic priors for completion tasks, the relationship between them is worth exploring. Therefore, we propose an end-to-end semantic segmentation-assisted scene completion network, including a 2D completion branch and a 3D semantic segmentation branch. Specifically, the network takes a raw point cloud as input and merges the features from the segmentation branch into the completion branch hierarchically to provide semantic information. By adopting a BEV representation and 3D sparse convolution, we benefit from a lower operation count while maintaining effective expression. Besides, the decoder of the segmentation branch is used as an auxiliary, which can be discarded in the inference stage to save computational consumption. Extensive experiments demonstrate that our method achieves competitive performance on the SemanticKITTI dataset with low latency. Code and models will be released at https://github.com/jokester-zzz/SSA-SC.}
    }
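    A minimal sketch of the BEV rasterization that lets the completion branch use ordinary 2D convolutions; the ranges and cell size are illustrative assumptions, not the paper's configuration:

    import numpy as np

    def bev_grid(points, x_range=(0.0, 51.2), y_range=(-25.6, 25.6), cell=0.2):
        """Rasterize an (N, 3) point cloud into a BEV grid holding, per cell,
        occupancy and maximum height -- a compact 2D input for a completion
        branch built from 2D convolutions."""
        nx = int((x_range[1] - x_range[0]) / cell)
        ny = int((y_range[1] - y_range[0]) / cell)
        occ = np.zeros((nx, ny), dtype=np.float32)
        height = np.full((nx, ny), -np.inf, dtype=np.float32)
        ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
        iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
        ok = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
        for i, j, z in zip(ix[ok], iy[ok], points[ok, 2]):
            occ[i, j] = 1.0
            height[i, j] = max(height[i, j], z)
        height[occ == 0] = 0.0
        return np.stack([occ, height])  # (2, nx, ny) tensor for the 2D branch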
  • Lin Li, Xin Kong, Xiangrui Zhao, Tianxin Huang, and Yong Liu. SSC: Semantic Scan Context for Large-Scale Place Recognition. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2092-2099, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    Place recognition gives a SLAM system the ability to correct cumulative errors. Unlike images that contain rich texture features, point clouds are almost pure geometric information, which makes place recognition based on point clouds challenging. Existing works usually encode low-level features such as coordinates, normals, reflection intensity, etc., as local or global descriptors to represent scenes. Besides, they often ignore the translation between point clouds when matching descriptors. Different from most existing methods, we explore the use of high-level features, namely semantics, to improve the descriptor’s representation ability. Also, when matching descriptors, we try to correct the translation between point clouds to improve accuracy. Concretely, we propose a novel global descriptor, Semantic Scan Context, which explores semantic information to represent scenes more effectively. We also present a two-step global semantic ICP to obtain the 3D pose (x, y, yaw) used to align the point cloud and improve matching performance. Our experiments on the KITTI dataset show that our approach outperforms the state-of-the-art methods by a large margin. Our code is available at: https://github.com/lilin-hitcrt/SSC.
    @inproceedings{li2021ssc,
    title = {SSC: Semantic Scan Context for Large-Scale Place Recognition},
    author = {Lin Li and Xin Kong and Xiangrui Zhao and Tianxin Huang and Yong Liu},
    year = 2021,
    booktitle = {2021 IEEE/RSJ International Conference on Intelligent Robots and Systems},
    pages = {2092-2099},
    doi = {10.1109/IROS51168.2021.9635904},
    abstract = {Place recognition gives a SLAM system the ability to correct cumulative errors. Unlike images that contain rich texture features, point clouds are almost pure geometric information, which makes place recognition based on point clouds challenging. Existing works usually encode low-level features such as coordinates, normals, reflection intensity, etc., as local or global descriptors to represent scenes. Besides, they often ignore the translation between point clouds when matching descriptors. Different from most existing methods, we explore the use of high-level features, namely semantics, to improve the descriptor’s representation ability. Also, when matching descriptors, we try to correct the translation between point clouds to improve accuracy. Concretely, we propose a novel global descriptor, Semantic Scan Context, which explores semantic information to represent scenes more effectively. We also present a two-step global semantic ICP to obtain the 3D pose (x, y, yaw) used to align the point cloud and improve matching performance. Our experiments on the KITTI dataset show that our approach outperforms the state-of-the-art methods by a large margin. Our code is available at: https://github.com/lilin-hitcrt/SSC.}
    }
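    To illustrate the yaw part of the (x, y, yaw) alignment, a toy version of the idea (an illustration, not the paper's two-step semantic ICP): the sector shift that best correlates two polar descriptors maps directly to a relative yaw angle.

    import numpy as np

    def yaw_from_descriptors(desc_query, desc_candidate):
        """Estimate relative yaw between two scans from their polar
        descriptors: the best-correlating sector shift corresponds to
        a rotation of 2*pi * shift / num_sectors."""
        num_sectors = desc_query.shape[1]
        scores = [float((desc_query * np.roll(desc_candidate, s, axis=1)).sum())
                  for s in range(num_sectors)]
        best = int(np.argmax(scores))
        return 2 * np.pi * best / num_sectors

    # synthetic check: a 15-sector rotation of a random descriptor is recovered
    d = np.random.default_rng(0).random((20, 60))
    print(yaw_from_descriptors(np.roll(d, 15, axis=1), d))  # ~ 2*pi*15/60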
  • Jinhao Cui, Hao Zou, Xin Kong, Xuemeng Yang, Xiangrui Zhao, Yong Liu, Wanlong Li, Feng Wen, and Hongbo Zhang. PocoNet: SLAM-oriented 3D LiDAR Point Cloud Online Compression Network. In 2021 IEEE International Conference on Robotics and Automation, pages 1868-1874, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    In this paper, we present PocoNet: Point cloud Online COmpression NETwork to address the task of SLAM-oriented compression. The aim of this task is to select a compact subset of points with high priority to maintain localization accuracy. The key insight is that points with high priority have similar geometric features in SLAM scenarios. Hence, we tackle this task as point cloud segmentation to capture complex geometric information. We calculate observation counts by matching between maps and point clouds and divide them into different priority levels. Trained by labels annotated with such observation counts, the proposed network can evaluate the point-wise priority. Experiments are conducted by integrating our compression module into an existing SLAM system to evaluate compression ratios and localization performance. Experimental results on two different datasets verify the feasibility and generalization of our approach.
    @inproceedings{cui2021poconetso,
    title = {PocoNet: SLAM-oriented 3D LiDAR Point Cloud Online Compression Network},
    author = {Jinhao Cui and Hao Zou and Xin Kong and Xuemeng Yang and Xiangrui Zhao and Yong Liu and Wanlong Li and Feng Wen and Hongbo Zhang},
    year = 2021,
    booktitle = {2021 IEEE International Conference on Robotics and Automation},
    pages = {1868-1874},
    doi = {10.1109/ICRA48506.2021.9561309},
    abstract = {In this paper, we present PocoNet: Point cloud Online COmpression NETwork to address the task of SLAM-oriented compression. The aim of this task is to select a compact subset of points with high priority to maintain localization accuracy. The key insight is that points with high priority have similar geometric features in SLAM scenarios. Hence, we tackle this task as point cloud segmentation to capture complex geometric information. We calculate observation counts by matching between maps and point clouds and divide them into different priority levels. Trained by labels annotated with such observation counts, the proposed network can evaluate the point-wise priority. Experiments are conducted by integrating our compression module into an existing SLAM system to evaluate compression ratios and localization performance. Experimental results on two different datasets verify the feasibility and generalization of our approach.}
    }
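    A toy version of the labeling rule described in the abstract, assuming per-point observation counts are already available; the quantile binning and keep ratio are illustrative assumptions, not the paper's values:

    import numpy as np

    def priority_labels(observation_counts, num_levels=4):
        """Quantize per-point observation counts into priority levels via
        quantiles; level num_levels-1 = most often observed = keep first."""
        edges = np.quantile(observation_counts, np.linspace(0, 1, num_levels + 1)[1:-1])
        return np.digitize(observation_counts, edges)

    def compress(points, labels, keep_ratio=0.3):
        """Keep the highest-priority subset of a scan (a stand-in for the
        network's point-wise priority prediction)."""
        order = np.argsort(-labels, kind="stable")
        return points[order[: int(len(points) * keep_ratio)]]

    counts = np.random.default_rng(1).poisson(5, size=1000)
    print(np.bincount(priority_labels(counts)))  # roughly equal-sized levels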
  • Lin Li, Xin Kong, Xiangrui Zhao, and Yong Liu. SA-LOAM: Semantic-aided LiDAR SLAM with Loop Closure. In 2021 IEEE International Conference on Robotics and Automation, pages 7627-7634, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    LiDAR-based SLAM systems are admittedly more accurate and stable than others, while loop-closure detection is still an open issue. With the development of 3D semantic segmentation for point clouds, semantic information can be obtained conveniently and steadily, which is essential for high-level intelligence and conducive to SLAM. In this paper, we present a novel semantic-aided LiDAR SLAM with loop closure based on LOAM, named SA-LOAM, which leverages semantics in odometry as well as loop-closure detection. Specifically, we propose a semantic-assisted ICP, including semantic matching, downsampling and plane constraints, and integrate a semantic graph-based place recognition method into our loop-closure detection module. Benefiting from semantics, we can improve the localization accuracy, detect loop closures effectively, and construct a globally consistent semantic map even in large-scale scenes. Extensive experiments on the KITTI and Ford Campus datasets show that our system significantly improves baseline performance, generalizes to unseen data, and achieves competitive results compared with state-of-the-art methods.
    @inproceedings{li2021ssa,
    title = {SA-LOAM: Semantic-aided LiDAR SLAM with Loop Closure},
    author = {Lin Li and Xin Kong and Xiangrui Zhao and Yong Liu},
    year = 2021,
    booktitle = {2021 IEEE International Conference on Robotics and Automation},
    pages = {7627-7634},
    doi = {10.1109/ICRA48506.2021.9560884},
    abstract = {LiDAR-based SLAM systems are admittedly more accurate and stable than others, while loop-closure detection is still an open issue. With the development of 3D semantic segmentation for point clouds, semantic information can be obtained conveniently and steadily, which is essential for high-level intelligence and conducive to SLAM. In this paper, we present a novel semantic-aided LiDAR SLAM with loop closure based on LOAM, named SA-LOAM, which leverages semantics in odometry as well as loop-closure detection. Specifically, we propose a semantic-assisted ICP, including semantic matching, downsampling and plane constraints, and integrate a semantic graph-based place recognition method into our loop-closure detection module. Benefiting from semantics, we can improve the localization accuracy, detect loop closures effectively, and construct a globally consistent semantic map even in large-scale scenes. Extensive experiments on the KITTI and Ford Campus datasets show that our system significantly improves baseline performance, generalizes to unseen data, and achieves competitive results compared with state-of-the-art methods.}
    }
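    The "semantic matching" ingredient of the semantic-assisted ICP can be illustrated by restricting correspondences to points of the same class. A brute-force sketch (a real system would use per-class k-d trees; the function is hypothetical, not SA-LOAM's code):

    import numpy as np

    def semantic_nearest_neighbors(src_pts, src_lbl, tgt_pts, tgt_lbl):
        """ICP-style association that only pairs points of the same semantic
        class, discarding cross-class matches (e.g. pole-to-building) that
        plain nearest-neighbor ICP would happily create."""
        pairs = []
        for cls in np.intersect1d(src_lbl, tgt_lbl):
            s_idx = np.flatnonzero(src_lbl == cls)
            t_idx = np.flatnonzero(tgt_lbl == cls)
            # brute-force distances for clarity
            d = np.linalg.norm(src_pts[s_idx, None] - tgt_pts[None, t_idx], axis=2)
            pairs.extend(zip(s_idx, t_idx[np.argmin(d, axis=1)]))
        return pairs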
  • Guangyao Zhai, Zhen Zhang, Xin Kong, and Yong Liu. Efficient Pedestrian Following by Quadruped Robots. In 2021 IEEE International Conference on Robotics and Automation Workshop, 2021.
    [BibTeX] [Abstract] [PDF]
    Legged robots have superior terrain adaptability and more flexible movement capabilities than traditional wheeled robots. In this work, we use a quadruped robot as an example of legged robots to complete a pedestrian-following task in challenging scenarios. The whole system consists of two modules, perception and planning, relying on various onboard sensors.
    @inproceedings{zhai2021epf,
    title = {Efficient Pedestrian Following by Quadruped Robots},
    author = {Guangyao Zhai and Zhen Zhang and Xin Kong and Yong Liu},
    year = 2021,
    booktitle = {2021 IEEE International Conference on Robotics and Automation Workshop},
    abstract = {Legged robots have superior terrain adaptability and more flexible movement capabilities than traditional wheeled robots. In this work, we use a quadruped robot as an example of legged robots to complete a pedestrian-following task in challenging scenarios. The whole system consists of two modules, perception and planning, relying on various onboard sensors.}
    }
  • Xiaoyang Lyu, Liang Liu, Mengmeng Wang, Xin Kong, Lina Liu, Yong Liu, Xinxin Chen, and Yi Yuan. HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation. In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI), 2021.
    [BibTeX] [Abstract] [arXiv] [PDF]
    Self-supervised learning shows great potential in monocular depth estimation, using image sequences as the only source of supervision. Although people try to use the high-resolution image for depth estimation, the accuracy of prediction has not been significantly improved. In this work, we find the core reason comes from the inaccurate depth estimation in large gradient regions, making the bilinear interpolation error gradually disappear as the resolution increases. To obtain more accurate depth estimation in large gradient regions, it is necessary to obtain high-resolution features with spatial and semantic information. Therefore, we present an improved DepthNet, HR-Depth, with two effective strategies: (1) redesign the skip-connection in DepthNet to get better high-resolution features and (2) propose a feature fusion Squeeze-and-Excitation (fSE) module to fuse features more efficiently. Using ResNet-18 as the encoder, HR-Depth surpasses all previous state-of-the-art (SoTA) methods with the least parameters at both high and low resolution. Moreover, previous state-of-the-art methods are based on fairly complex and deep networks with a mass of parameters, which limits their real applications. Thus we also construct a lightweight network which uses MobileNetV3 as the encoder. Experiments show that the lightweight network can perform on par with many large models like Monodepth2 at high resolution with only 20% of the parameters. All codes and models will be available at this https URL.
    @inproceedings{lyu2020hrdepthhr,
    title = {HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation},
    author = {Xiaoyang Lyu and Liang Liu and Mengmeng Wang and Xin Kong and Lina Liu and Yong Liu and Xinxin Chen and Yi Yuan},
    year = 2021,
    booktitle = {Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI)},
    abstract = {Self-supervised learning shows great potential in monocular depth estimation, using image sequences as the only source of supervision. Although people try to use the high-resolution image for depth estimation, the accuracy of prediction has not been significantly improved. In this work, we find the core reason comes from the inaccurate depth estimation in large gradient regions, making the bilinear interpolation error gradually disappear as the resolution increases. To obtain more accurate depth estimation in large gradient regions, it is necessary to obtain high-resolution features with spatial and semantic information. Therefore, we present an improved DepthNet, HR-Depth, with two effective strategies: (1) redesign the skip-connection in DepthNet to get better high-resolution features and (2) propose a feature fusion Squeeze-and-Excitation (fSE) module to fuse features more efficiently. Using ResNet-18 as the encoder, HR-Depth surpasses all previous state-of-the-art (SoTA) methods with the least parameters at both high and low resolution. Moreover, previous state-of-the-art methods are based on fairly complex and deep networks with a mass of parameters, which limits their real applications. Thus we also construct a lightweight network which uses MobileNetV3 as the encoder. Experiments show that the lightweight network can perform on par with many large models like Monodepth2 at high resolution with only 20% of the parameters. All codes and models will be available at this https URL.},
    arxiv = {https://arxiv.org/pdf/2012.07356.pdf}
    }
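    A PyTorch sketch in the spirit of the fSE module: concatenate two feature maps, gate channels with squeezed global statistics, then project. Layer sizes and the exact layout are assumptions, not the paper's design:

    import torch
    import torch.nn as nn

    class FeatureFusionSE(nn.Module):
        """Fuse two feature maps with channel attention: squeeze (global
        average pool), excite (two-layer sigmoid gate), reweight the
        concatenated channels, then project back down."""
        def __init__(self, ch_a, ch_b, ch_out, reduction=16):
            super().__init__()
            ch = ch_a + ch_b
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
                nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid(),
            )
            self.proj = nn.Conv2d(ch, ch_out, 1)

        def forward(self, a, b):
            x = torch.cat([a, b], dim=1)
            return self.proj(x * self.gate(x))

    fse = FeatureFusionSE(64, 64, 64)
    out = fse(torch.randn(1, 64, 24, 80), torch.randn(1, 64, 24, 80))
    print(out.shape)  # torch.Size([1, 64, 24, 80])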
  • Xin Kong, Xuemeng Yang, Guangyao Zhai, Xiangrui Zhao, Xianfang Zeng, Mengmeng Wang, Yong Liu, Wanlong Li, and Feng Wen. Semantic Graph Based Place Recognition for 3D Point Clouds. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 8216-8223, 2020.
    [BibTeX] [Abstract] [DOI] [arXiv] [PDF]
    Due to the difficulty of generating effective descriptors that are robust to occlusion and viewpoint changes, place recognition for 3D point clouds remains an open issue. Unlike most of the existing methods that focus on extracting local, global, and statistical features of raw point clouds, our method aims at the semantic level, which can be superior in terms of robustness to environmental changes. Inspired by the perspective of humans, who recognize scenes through identifying semantic objects and capturing their relations, this paper presents a novel semantic graph based approach for place recognition. First, we propose a novel semantic graph representation for point cloud scenes that preserves the semantic and topological information of the raw point cloud. Place recognition is thus modeled as a graph matching problem. Then we design a fast and effective graph similarity network to compute the similarity. Exhaustive evaluations on the KITTI dataset show that our approach is robust to occlusion as well as viewpoint changes and outperforms the state-of-the-art methods by a large margin. Our code is available at: https://github.com/kxhit/SG_PR.
    @inproceedings{kong2020semanticgb,
    title = {Semantic Graph Based Place Recognition for 3D Point Clouds},
    author = {Xin Kong and Xuemeng Yang and Guangyao Zhai and Xiangrui Zhao and Xianfang Zeng and Mengmeng Wang and Yong Liu and Wanlong Li and Feng Wen},
    year = 2020,
    booktitle = {2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    pages = {8216-8223},
    doi = {10.1109/IROS45743.2020.9341060},
    abstract = {Due to the difficulty of generating effective descriptors that are robust to occlusion and viewpoint changes, place recognition for 3D point clouds remains an open issue. Unlike most of the existing methods that focus on extracting local, global, and statistical features of raw point clouds, our method aims at the semantic level, which can be superior in terms of robustness to environmental changes. Inspired by the perspective of humans, who recognize scenes through identifying semantic objects and capturing their relations, this paper presents a novel semantic graph based approach for place recognition. First, we propose a novel semantic graph representation for point cloud scenes that preserves the semantic and topological information of the raw point cloud. Place recognition is thus modeled as a graph matching problem. Then we design a fast and effective graph similarity network to compute the similarity. Exhaustive evaluations on the KITTI dataset show that our approach is robust to occlusion as well as viewpoint changes and outperforms the state-of-the-art methods by a large margin. Our code is available at: https://github.com/kxhit/SG_PR.},
    arxiv = {https://arxiv.org/pdf/2008.11459.pdf}
    }
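    A minimal sketch of the semantic graph representation, assuming instance centroids and class ids have already been extracted from the segmented scan; the edge radius is an illustrative assumption:

    import numpy as np

    def build_semantic_graph(centroids, classes, radius=20.0):
        """Represent a scene as a semantic graph: one node per segmented
        instance (centroid + class id), with an edge between instances
        within `radius` meters. Place recognition then becomes matching
        two such graphs with a similarity network."""
        n = len(centroids)
        nodes = [{"xyz": centroids[i], "cls": int(classes[i])} for i in range(n)]
        edges = [(i, j) for i in range(n) for j in range(i + 1, n)
                 if np.linalg.norm(centroids[i] - centroids[j]) < radius]
        return nodes, edges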
  • Xiangrui Zhao, Chunfang Deng, Xin Kong, Jinhong Xu, and Yong Liu. Learning to Compensate for the Drift and Error of Gyroscope in Vehicle Localization. In 2020 IEEE Intelligent Vehicles Symposium (IV), pages 852-857, 2020.
    [BibTeX] [Abstract] [DOI] [PDF]
    Self-localization is an essential technology for autonomous vehicles. Building robust odometry in a GPS-denied environment is still challenging, especially when LiDAR and camera are uninformative. In this paper, we propose a learning-based approach to cure the drift of the gyroscope for vehicle localization. For a consumer-level MEMS gyroscope (stability ∼10°/h), our GyroNet can estimate the error of each measurement. For a high-precision fiber optics gyroscope (stability ∼0.05°/h), we build a FoGNet which can obtain its drift by observing data over a long time window. We perform comparative experiments on publicly available datasets. The results demonstrate that our GyroNet can obtain higher-precision angular velocity than traditional digital filters and static initialization methods. In vehicle localization, FoGNet can effectively correct the small drift of the Fiber Optics Gyroscope (FoG) and achieves better results than the state-of-the-art method.
    @inproceedings{zhao2020learningtc,
    title = {Learning to Compensate for the Drift and Error of Gyroscope in Vehicle Localization},
    author = {Xiangrui Zhao and Chunfang Deng and Xin Kong and Jinhong Xu and Yong Liu},
    year = 2020,
    booktitle = {2020 IEEE Intelligent Vehicles Symposium (IV)},
    pages = {852-857},
    doi = {10.1109/IV47402.2020.9304715},
    abstract = {Self-localization is an essential technology for autonomous vehicles. Building robust odometry in a GPS-denied environment is still challenging, especially when LiDAR and camera are uninformative. In this paper, we propose a learning-based approach to cure the drift of the gyroscope for vehicle localization. For a consumer-level MEMS gyroscope (stability ∼10°/h), our GyroNet can estimate the error of each measurement. For a high-precision fiber optics gyroscope (stability ∼0.05°/h), we build a FoGNet which can obtain its drift by observing data over a long time window. We perform comparative experiments on publicly available datasets. The results demonstrate that our GyroNet can obtain higher-precision angular velocity than traditional digital filters and static initialization methods. In vehicle localization, FoGNet can effectively correct the small drift of the Fiber Optics Gyroscope (FoG) and achieves better results than the state-of-the-art method.}
    }
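    A toy stand-in for the GyroNet setup (the real architecture is not described here, so everything below is an assumption): a small 1-D conv network maps a window of 6-axis IMU samples to an estimate of the gyro error, which is then subtracted from the measurement.

    import torch
    import torch.nn as nn

    class GyroErrorNet(nn.Module):
        """Reads a (B, 6, T) window of gyro+accel samples and predicts the
        additive error (B, 3) of the central gyro measurement."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(6, 32, 5, padding=2), nn.ReLU(),
                nn.Conv1d(32, 32, 5, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                nn.Linear(32, 3),
            )

        def forward(self, imu_window):
            return self.net(imu_window)

    net = GyroErrorNet()
    raw_gyro = torch.randn(4, 3)                       # hypothetical measurements
    corrected = raw_gyro - net(torch.randn(4, 6, 200)) # subtract predicted error
    print(corrected.shape)  # torch.Size([4, 3])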
  • Hao Zou, Jinhao Cui, Xin Kong, Chujuan Zhang, Yong Liu, Feng Wen, and Wanlong Li. F-Siamese Tracker: A Frustum-based Double Siamese Network for 3D Single Object Tracking. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 8133-8139, 2020.
    [BibTeX] [Abstract] [DOI] [arXiv] [PDF]
    This paper presents F-Siamese Tracker, a novel approach for single object tracking prominently characterized by more robustly integrating 2D and 3D information to reduce redundant search space. A main challenge in 3D single object tracking is how to reduce search space for generating appropriate 3D candidates. Instead of solely relying on 3D proposals, our method first leverages a Siamese network applied on RGB images to produce 2D region proposals, which are then extruded into 3D viewing frustums. Besides, we perform an online accuracy validation on the 3D frustum to generate a refined point cloud search space, which can be embedded directly into the existing 3D tracking backbone. For efficiency, our approach gains better performance with fewer candidates by reducing the search space. In addition, benefiting from the online accuracy validation, for occasional cases with strong occlusions or very sparse points, our approach can still achieve high precision, even when the 2D Siamese tracker loses the target. This approach allows us to set a new state-of-the-art in 3D single object tracking by a significant margin on a sparse outdoor dataset (KITTI tracking). Moreover, experiments on 2D single object tracking show that our framework boosts 2D tracking performance as well.
    @inproceedings{zou2020fsiameseta,
    title = {F-Siamese Tracker: A Frustum-based Double Siamese Network for 3D Single Object Tracking},
    author = {Hao Zou and Jinhao Cui and Xin Kong and Chujuan Zhang and Yong Liu and Feng Wen and Wanlong Li},
    year = 2020,
    booktitle = {2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    pages = {8133-8139},
    doi = {10.1109/IROS45743.2020.9341120},
    abstract = {This paper presents F-Siamese Tracker, a novel approach for single object tracking prominently characterized by more robustly integrating 2D and 3D information to reduce redundant search space. A main challenge in 3D single object tracking is how to reduce search space for generating appropriate 3D candidates. Instead of solely relying on 3D proposals, our method first leverages a Siamese network applied on RGB images to produce 2D region proposals, which are then extruded into 3D viewing frustums. Besides, we perform an online accuracy validation on the 3D frustum to generate a refined point cloud search space, which can be embedded directly into the existing 3D tracking backbone. For efficiency, our approach gains better performance with fewer candidates by reducing the search space. In addition, benefiting from the online accuracy validation, for occasional cases with strong occlusions or very sparse points, our approach can still achieve high precision, even when the 2D Siamese tracker loses the target. This approach allows us to set a new state-of-the-art in 3D single object tracking by a significant margin on a sparse outdoor dataset (KITTI tracking). Moreover, experiments on 2D single object tracking show that our framework boosts 2D tracking performance as well.},
    arxiv = {https://arxiv.org/pdf/2010.11510.pdf}
    }
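    The frustum extrusion step can be sketched in a few lines: project camera-frame points with the intrinsics and keep those that fall inside the 2D region proposal (the box format and frame conventions are assumptions):

    import numpy as np

    def points_in_frustum(points_cam, box2d, K):
        """Extrude a 2D region proposal into a viewing frustum: project
        camera-frame points with intrinsics K (3x3) and keep those that
        land inside the 2D box (u1, v1, u2, v2), in front of the camera."""
        u1, v1, u2, v2 = box2d
        front = points_cam[points_cam[:, 2] > 0]
        uv = (K @ front.T).T
        u, v = uv[:, 0] / uv[:, 2], uv[:, 1] / uv[:, 2]
        mask = (u >= u1) & (u <= u2) & (v >= v1) & (v <= v2)
        return front[mask]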
  • Xin Kong, Guangyao Zhai, Baoquan Zhong, and Yong Liu. PASS3D: Precise and Accelerated Semantic Segmentation for 3D Point Cloud. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3467-3473, 2019.
    [BibTeX] [Abstract] [DOI] [arXiv] [PDF]
    In this paper, we propose PASS3D to achieve point-wise semantic segmentation for 3D point clouds. Our framework combines the efficiency of traditional geometric methods with the robustness of deep learning methods, consisting of two stages: at stage-1, our accelerated cluster proposal algorithm generates refined cluster proposals by segmenting point clouds without ground, producing less redundant proposals with higher recall in an extremely short time; at stage-2, we amplify and further process these proposals with a neural network to estimate a semantic label for each point, and we propose a novel data augmentation method to enhance the network's recognition capability for all categories, especially non-rigid objects. Evaluated on the KITTI raw dataset, PASS3D stands out against the state-of-the-art on some results, making it competent for 3D perception in autonomous driving systems. Our source code will be open-sourced. A video demonstration is available at https://www.youtube.com/watch?v=cukEqDuP_Qw.
    @inproceedings{kong2019pass3dpa,
    title = {PASS3D: Precise and Accelerated Semantic Segmentation for 3D Point Cloud},
    author = {Xin Kong and Guangyao Zhai and Baoquan Zhong and Yong Liu},
    year = 2019,
    booktitle = {2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    pages = {3467-3473},
    doi = {10.1109/IROS40897.2019.8968296},
    abstract = {In this paper, we propose PASS3D to achieve point-wise semantic segmentation for 3D point clouds. Our framework combines the efficiency of traditional geometric methods with the robustness of deep learning methods, consisting of two stages: at stage-1, our accelerated cluster proposal algorithm generates refined cluster proposals by segmenting point clouds without ground, producing less redundant proposals with higher recall in an extremely short time; at stage-2, we amplify and further process these proposals with a neural network to estimate a semantic label for each point, and we propose a novel data augmentation method to enhance the network's recognition capability for all categories, especially non-rigid objects. Evaluated on the KITTI raw dataset, PASS3D stands out against the state-of-the-art on some results, making it competent for 3D perception in autonomous driving systems. Our source code will be open-sourced. A video demonstration is available at https://www.youtube.com/watch?v=cukEqDuP_Qw.},
    arxiv = {http://arxiv.org/pdf/1909.01643}
    }
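    A rough sketch of the stage-1 pipeline shape, with a simple height threshold standing in for the paper's ground segmentation and scikit-learn's DBSCAN standing in for the accelerated clustering (parameters are illustrative assumptions):

    import numpy as np
    from sklearn.cluster import DBSCAN  # stand-in for the accelerated clustering

    def stage1_cluster_proposals(points, ground_z=-1.5, eps=0.5, min_pts=10):
        """Stage-1 sketch: drop (crudely thresholded) ground points, then
        Euclidean-cluster the remainder into object proposals that stage-2
        would crop, amplify, and label point-wise."""
        non_ground = points[points[:, 2] > ground_z]
        ids = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(non_ground)
        # ids == -1 marks noise; each other id is one cluster proposal
        return [non_ground[ids == k] for k in range(ids.max() + 1)]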