Address

Room 101, Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Email: zgyddzyx@zju.edu.cn

Guangyao Zhai

M.S. Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Biography

I am pursuing my M.S. degree at the Institute of Cyber-Systems and Control, Department of Control Science and Engineering, Zhejiang University. I received my B.Eng. degree from the School of Automation at Northwestern Polytechnical University in 2018. My current research interests include robotics perception, multi-object tracking, and geometric learning.

Research Interests

  • Robotics Perception
  • Geometric Learning

Publications

  • Guangyao Zhai, Yu Zheng, Ziwei Xu, Xin Kong, Yong Liu, Benjamin Busam, Yi Ren, Nassir Navab, and Zhengyou Zhang. DA^2 Dataset: Toward Dexterity-Aware Dual-Arm Grasping. IEEE Robotics and Automation Letters (RA-L), 7(4):8941-8948, 2022.
    [BibTeX] [Abstract] [DOI] [PDF]
    In this paper, we introduce DA^2, the first large-scale dual-arm dexterity-aware dataset for the generation of optimal bimanual grasping pairs for arbitrary large objects. The dataset contains about 9M pairs of parallel-jaw grasps, generated from more than 6000 objects and each labeled with various grasp dexterity measures. In addition, we propose an end-to-end dual-arm grasp evaluation model trained on the rendered scenes from this dataset. We utilize the evaluation model as our baseline to show the value of this novel and nontrivial dataset by both online analysis and real robot experiments. All data and related code will be open-sourced at https://sites.google.com/view/da2dataset.
    @article{zhai2022ddt,
    title = {DA^2 Dataset: Toward Dexterity-Aware Dual-Arm Grasping},
    author = {Guangyao Zhai and Yu Zheng and Ziwei Xu and Xin Kong and Yong Liu and Benjamin Busam and Yi Ren and Nassir Navab and Zhengyou Zhang},
    year = 2022,
    journal = {IEEE Robotics and Automation Letters (RA-L)},
    volume = {7},
    number = {4},
    pages = {8941-8948},
    doi = {10.1109/LRA.2022.3189959},
    abstract = {In this paper, we introduce DA^2, the first large-scale dual-arm dexterity-aware dataset for the generation of optimal bimanual grasping pairs for arbitrary large objects. The dataset contains about 9M pairs of parallel-jaw grasps, generated from more than 6000 objects and each labeled with various grasp dexterity measures. In addition, we propose an end-to-end dual-arm grasp evaluation model trained on the rendered scenes from this dataset. We utilize the evaluation model as our baseline to show the value of this novel and nontrivial dataset by both online analysis and real robot experiments. All data and related code will be open-sourced at https://sites.google.com/view/da2dataset.}
    }
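    To make the notion of a "dexterity-aware dual-arm grasp pair" concrete, here is a minimal sketch of how one record from such a dataset might be represented: two parallel-jaw grasp poses plus a scalar dexterity score. The field names and conventions are illustrative assumptions, not the actual DA^2 schema.
    # Illustrative only: a hypothetical record for one dual-arm grasp pair.
    # Field names and conventions are assumptions, not the DA^2 schema.
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class ParallelJawGrasp:
        pose: np.ndarray      # 4x4 homogeneous gripper pose in the object frame
        width: float          # jaw opening width in metres

    @dataclass
    class DualArmGraspPair:
        object_id: str
        left: ParallelJawGrasp
        right: ParallelJawGrasp
        dexterity: float      # scalar grasp-dexterity measure (higher is better)

    pair = DualArmGraspPair(
        object_id="table_0001",
        left=ParallelJawGrasp(pose=np.eye(4), width=0.08),
        right=ParallelJawGrasp(pose=np.eye(4), width=0.08),
        dexterity=0.73,
    )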
  • Zhen Zhang, Jiaqing Yan, Xin Kong, Guangyao Zhai, and Yong Liu. Efficient Motion Planning based on Kinodynamic Model for Quadruped Robots Following Persons in Confined Spaces. IEEE/ASME Transactions on Mechatronics, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    Quadruped robots have superior terrain adaptability and more flexible movement capabilities than traditional robots. In this paper, we apply them to person-following tasks and propose an efficient motion planning scheme for quadruped robots that generates flexible and effective trajectories in confined spaces. The method builds a real-time local costmap via onboard sensors, covering both static and dynamic obstacles. We exploit a simplified kinodynamic model and formulate friction-pyramid inequality constraints on the Ground Reaction Forces (GRFs) to ensure that the optimized trajectory is executable. In addition, we obtain the optimal following trajectory in the costmap based entirely on the robot's rectangular footprint description, which ensures that it can walk through narrow spaces without collision. Finally, a receding horizon control strategy is employed to improve the robustness of motion in complex environments. The proposed motion planning framework is integrated on the quadruped robot JueYing and tested in simulation as well as real scenarios. The execution success rates in various scenes are all over 90%.
    @article{zhang2021emp,
    title = {Efficient Motion Planning based on Kinodynamic Model for Quadruped Robots Following Persons in Confined Spaces},
    author = {Zhen Zhang and Jiaqing Yan and Xin Kong and Guangyao Zhai and Yong Liu},
    year = 2021,
    journal = {IEEE/ASME Transactions on Mechatronics},
    doi = {10.1109/TMECH.2021.3083594},
    abstract = {Quadruped robots have superior terrain adaptability and flexible movement capabilities than traditional robots. In this paper, we innovatively apply it in person-following tasks, and propose an efficient motion planning scheme for quadruped robots to generate a flexible and effective trajectory in confined spaces. The method builds a real-time local costmap via onboard sensors, which involves both static and dynamic obstacles. And we exploit a simplified kinodynamic model and formulate the friction pyramids formed by Ground Reaction Forces (GRFs) inequality constraints to ensure the executable of the optimized trajectory. In addition, we obtain the optimal following trajectory in the costmap completely based on the robots rectangular footprint description, which ensures that it can walk through the narrow spaces avoiding collision. Finally, a receding horizon control strategy is employed to improve the robustness of motion in complex environments. The proposed motion planning framework is integrated on the quadruped robot JueYing and tested in simulation as well as real scenarios. It shows that the execution success rates in various scenes are all over 90\%.}
    }
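    The friction-pyramid constraint mentioned in the abstract above can be illustrated with a short feasibility check on a single ground reaction force. This is a generic sketch under assumed values (friction coefficient, contact-frame convention), not the paper's formulation or code.
    # Generic friction-pyramid feasibility check for one ground reaction force (GRF).
    # Linearized (pyramid) constraints: f_z >= 0, |f_x| <= mu * f_z, |f_y| <= mu * f_z.
    # mu and the contact-frame convention are assumptions, not values from the paper.
    import numpy as np

    def grf_in_friction_pyramid(f: np.ndarray, mu: float = 0.6) -> bool:
        """Return True if the GRF f = [f_x, f_y, f_z] satisfies the friction pyramid."""
        fx, fy, fz = f
        return fz >= 0.0 and abs(fx) <= mu * fz and abs(fy) <= mu * fz

    print(grf_in_friction_pyramid(np.array([10.0, -5.0, 40.0])))   # True: inside the pyramid
    print(grf_in_friction_pyramid(np.array([30.0, 0.0, 40.0])))    # False: tangential force too large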
  • Guangyao Zhai, Zhen Zhang, Xin Kong, and Yong Liu. Efficient Pedestrian Following by Quadruped Robots. In 2021 IEEE International Conference on Robotics and Automation Workshop, 2021.
    [BibTeX] [Abstract] [PDF]
    Legged robots have superior terrain adaptability and more flexible movement capabilities than traditional wheeled robots. In this work, we use a quadruped robot as an example of legged robots to complete a pedestrian-following task in challenging scenarios. The whole system consists of two modules, perception and planning, relying on various onboard sensors.
    @inproceedings{zhai2021epf,
    title = {Efficient Pedestrian Following by Quadruped Robots},
    author = {Guangyao Zhai and Zhen Zhang and Xin Kong and Yong Liu},
    year = 2021,
    booktitle = {2021 IEEE International Conference on Robotics and Automation Workshop},
    abstract = {Legged robots have superior terrain adaptability and flexible movement capabilities than traditional wheeled robots. In this work, we use a quadruped robot as an example of legged robots to complete a pedestrian-following task in challenging scenarios. The whole system consists of two modules: the perception and planning module, relying on the various onboard sensors.}
    }
  • Xin Kong, Xuemeng Yang, Guangyao Zhai, Xiangrui Zhao, Xianfang Zeng, Mengmeng Wang, Yong Liu, Wanlong Li, and Feng Wen. Semantic Graph Based Place Recognition for 3D Point Clouds. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 8216–8223, 2020.
    [BibTeX] [Abstract] [DOI] [arXiv] [PDF]
    Due to the difficulty of generating effective descriptors that are robust to occlusion and viewpoint changes, place recognition for 3D point clouds remains an open issue. Unlike most of the existing methods that focus on extracting local, global, and statistical features of raw point clouds, our method works at the semantic level, which can be superior in terms of robustness to environmental changes. Inspired by the perspective of humans, who recognize scenes through identifying semantic objects and capturing their relations, this paper presents a novel semantic graph based approach for place recognition. First, we propose a novel semantic graph representation for point cloud scenes by preserving the semantic and topological information of the raw point cloud. Thus, place recognition is modeled as a graph matching problem. Then we design a fast and effective graph similarity network to compute the similarity. Exhaustive evaluations on the KITTI dataset show that our approach is robust to occlusion as well as viewpoint changes and outperforms the state-of-the-art methods by a large margin. Our code is available at: https://github.com/kxhit/SG_PR.
    @inproceedings{kong2020semanticgb,
    title = {Semantic Graph Based Place Recognition for 3D Point Clouds},
    author = {Xin Kong and Xuemeng Yang and Guangyao Zhai and Xiangrui Zhao and Xianfang Zeng and Mengmeng Wang and Yong Liu and Wanlong Li and Feng Wen},
    year = 2020,
    booktitle = {2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    pages = {8216--8223},
    doi = {10.1109/IROS45743.2020.9341060},
    abstract = {Due to the difficulty in generating the effective descriptors which are robust to occlusion and viewpoint changes, place recognition for 3D point cloud remains an open issue. Unlike most of the existing methods that focus on extracting local, global, and statistical features of raw point clouds, our method aims at the semantic level that can be superior in terms of robustness to environmental changes. Inspired by the perspective of humans, who recognize scenes through identifying semantic objects and capturing their relations, this paper presents a novel semantic graph based approach for place recognition. First, we propose a novel semantic graph representation for the point cloud scenes by reserving the semantic and topological information of the raw point cloud. Thus, place recognition is modeled as a graph matching problem. Then we design a fast and effective graph similarity network to compute the similarity. Exhaustive evaluations on the KITTI dataset show that our approach is robust to the occlusion as well as viewpoint changes and outperforms the state-of-the-art methods with a large margin. Our code is available at: https://github.com/kxhit/SG_PR.},
    arxiv = {https://arxiv.org/pdf/2008.11459.pdf}
    }
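    A rough sketch of the semantic-graph idea described in the entry above: each segmented object instance becomes a node carrying its semantic label and centroid, and edges connect instances that lie within some distance of each other. The thresholds and data layout are assumed simplifications for illustration, not the code released at the project's GitHub link.
    # Hypothetical construction of a semantic graph from instance-segmented points.
    # Thresholds and data layout are illustrative assumptions.
    import numpy as np

    def build_semantic_graph(instances, edge_radius=20.0):
        """instances: list of (semantic_label, (N, 3) point array) tuples.
        Returns node features [(label, centroid)] and an edge list of index pairs."""
        nodes = [(label, pts.mean(axis=0)) for label, pts in instances]
        edges = []
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                if np.linalg.norm(nodes[i][1] - nodes[j][1]) < edge_radius:
                    edges.append((i, j))
        return nodes, edges

    # Two graphs built this way could then be scored by a graph-similarity network
    # to decide whether they depict the same place.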
  • Guangyao Zhai, Liang Liu, Linjian Zhang, and Yong Liu. PoseConvGRU: A Monocular Approach for Visual Ego-motion Estimation by Learning. Pattern Recognition, 102:107187, 2020.
    [BibTeX] [Abstract] [DOI] [PDF]
    While many visual ego-motion algorithm variants have been proposed in the past decade, learning-based ego-motion estimation methods have seen increasing attention because of their desirable robustness to image noise and independence from camera calibration. In this work, we propose a data-driven approach of fully trainable visual ego-motion estimation for a monocular camera. We use an end-to-end learning approach, allowing the model to map directly from input image pairs to an estimate of ego-motion (parameterized as 6-DoF transformation matrices). We introduce a novel two-module Long-term Recurrent Convolutional Neural Network called PoseConvGRU, with an explicit sequence pose estimation loss, to achieve this. The feature-encoding module encodes the short-term motion feature in an image pair, while the memory-propagating module captures the long-term motion feature across consecutive image pairs. The visual memory is implemented with convolutional gated recurrent units, which allow propagating information over time. At each time step, two consecutive RGB images are stacked together to form a 6-channel tensor for module-1 to learn how to extract motion information and estimate poses. The sequence of output maps is then passed through a stacked ConvGRU module to generate the relative transformation pose of each image pair. We also augment the training data by randomly skipping frames to simulate velocity variation, which results in better performance in turning and high-velocity situations. We evaluate the performance of our proposed approach on the KITTI Visual Odometry benchmark. The experiments show a competitive performance of the proposed method against geometric methods and encourage further exploration of learning-based methods for estimating camera ego-motion, even though geometric methods demonstrate promising results.
    @article{zhai2020poseconvgruam,
    title = {PoseConvGRU: A Monocular Approach for Visual Ego-motion Estimation by Learning},
    author = {Guangyao Zhai and Liang Liu and Linjian Zhang and Yong Liu},
    year = 2020,
    journal = {Pattern Recognition},
    volume = {102},
    pages = {107187},
    doi = {10.1016/j.patcog.2019.107187},
    abstract = {While many visual ego-motion algorithm variants have been proposed in the past decade, learning based ego-motion estimation methods have seen an increasing attention because of its desirable properties of robustness to image noise and camera calibration independence. In this work, we propose a data-driven approach of fully trainable visual ego-motion estimation for a monocular camera. We use an end-to-end learning approach in allowing the model to map directly from input image pairs to an estimate of ego-motion (parameterized as 6-DoF transformation matrices). We introduce a novel two-module Long-term Recurrent Convolutional Neural Networks called PoseConvGRU, with an explicit sequence pose estimation loss to achieve this. The feature-encoding module encodes the short-term motion feature in an image pair, while the memory-propagating module captures the long-term motion feature in the consecutive image pairs. The visual memory is implemented with convolutional gated recurrent units, which allows propagating information over time. At each time step, two consecutive RGB images are stacked together to form a 6 channels tensor for module-1 to learn how to extract motion information and estimate poses. The sequence of output maps is then passed through a stacked ConvGRU module to generate the relative transformation pose of each image pair. We also augment the training data by randomly skipping frames to simulate the velocity variation which results in a better performance in turning and high-velocity situations. We evaluate the performance of our proposed approach on the KITTI Visual Odometry benchmark. The experiments show a competitive performance of the proposed method to the geometric method and encourage further exploration of learning based methods for the purpose of estimating camera ego-motion even though geometrical methods demonstrate promising results.}
    }
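    To make the "stack two RGB frames into a 6-channel tensor, encode it, then propagate through a convolutional GRU" idea from the entry above concrete, here is a minimal PyTorch-style sketch. The layer sizes, the ConvGRU cell, and the pooling pose head are generic assumptions; the actual PoseConvGRU architecture differs.
    # Minimal sketch of the frame-stacking + ConvGRU idea (not the actual PoseConvGRU model).
    import torch
    import torch.nn as nn

    class ConvGRUCell(nn.Module):
        def __init__(self, in_ch, hid_ch, k=3):
            super().__init__()
            self.hid_ch = hid_ch
            self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=k // 2)  # update/reset gates
            self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=k // 2)       # candidate state

        def forward(self, x, h=None):
            if h is None:
                h = x.new_zeros(x.size(0), self.hid_ch, x.size(2), x.size(3))
            z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
            h_new = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
            return (1 - z) * h + z * h_new

    encoder = nn.Sequential(nn.Conv2d(6, 32, 7, stride=2, padding=3), nn.ReLU(),
                            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU())
    gru = ConvGRUCell(64, 64)
    pose_head = nn.Linear(64, 6)  # 6-DoF relative pose (translation + rotation)

    frames = torch.rand(5, 3, 128, 416)          # a short monocular sequence
    h = None
    for t in range(frames.size(0) - 1):
        pair = torch.cat([frames[t], frames[t + 1]], dim=0).unsqueeze(0)  # 6-channel input
        h = gru(encoder(pair), h)                 # propagate visual memory over time
        pose = pose_head(h.mean(dim=(2, 3)))      # pool spatial dims, predict relative pose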
  • Xin Kong, Guangyao Zhai, Baoquan Zhong, and Yong Liu. PASS3D: Precise and Accelerated Semantic Segmentation for 3D Point Cloud. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3467–3473, 2019.
    [BibTeX] [Abstract] [DOI] [arXiv] [PDF]
    In this paper, we propose PASS3D to achieve point-wise semantic segmentation for 3D point clouds. Our framework combines the efficiency of traditional geometric methods with the robustness of deep learning methods and consists of two stages: in stage-1, our accelerated cluster proposal algorithm generates refined cluster proposals by segmenting point clouds without ground, producing less redundant proposals with higher recall in an extremely short time; in stage-2, we amplify and further process these proposals with a neural network to estimate a semantic label for each point, and we propose a novel data augmentation method to enhance the network's recognition capability for all categories, especially for non-rigid objects. Evaluated on the KITTI raw dataset, PASS3D stands out against the state of the art on some results, making it competent for 3D perception in autonomous driving systems. Our source code will be open-sourced. A video demonstration is available at https://www.youtube.com/watch?v=cukEqDuP_Qw.
    @inproceedings{kong2019pass3dpa,
    title = {PASS3D: Precise and Accelerated Semantic Segmentation for 3D Point Cloud},
    author = {Xin Kong and Guangyao Zhai and Baoquan Zhong and Yong Liu},
    year = 2019,
    booktitle = {2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    pages = {3467--3473},
    doi = {10.1109/IROS40897.2019.8968296},
    abstract = {In this paper, we propose PASS3D to achieve point-wise semantic segmentation for 3D point cloud. Our framework combines the efficiency of traditional geometric methods with robustness of deep learning methods, consisting of two stages: At stage -1, our accelerated cluster proposal algorithm will generate refined cluster proposals by segmenting point clouds without ground, capable of generating less redundant proposals with higher recall in an extremely short time; stage -2 we will amplify and further process these proposals by a neural network to estimate semantic label for each point and meanwhile propose a novel data augmentation method to enhance the network’s recognition capability for all categories especially for non-rigid objects. Evaluated on KITTI raw dataset, PASS3D stands out against the state-of-the-art on some results, making itself competent to 3D perception in autonomous driving system. Our source code will be open-sourced. A video demonstration is available at https://www.youtube.com/watch?v=cukEqDuP_Qw.},
    arxiv = {http://arxiv.org/pdf/1909.01643}
    }
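    A rough sketch of the two-stage idea in the entry above: remove ground points, cluster the remainder into proposals, and hand each (amplified) cluster to a point-wise classifier. A height threshold for ground removal and DBSCAN clustering are stand-ins for the paper's own accelerated proposal algorithm, purely for illustration.
    # Illustrative two-stage pipeline: geometric proposals, then a learned classifier.
    # The height threshold and DBSCAN are stand-ins for the paper's own algorithms.
    import numpy as np
    from sklearn.cluster import DBSCAN

    def cluster_proposals(points: np.ndarray, ground_z: float = -1.5):
        """points: (N, 3) LiDAR points. Returns a list of (M_i, 3) cluster proposals."""
        non_ground = points[points[:, 2] > ground_z]              # crude ground removal
        labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(non_ground)
        return [non_ground[labels == k] for k in set(labels) if k != -1]

    # Stage 2 (not shown): each proposal would be amplified and passed to a neural
    # network that predicts a semantic label for every point in the cluster.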
  • Liang Liu, Guangyao Zhai, Wenlong Ye, and Yong Liu. Unsupervised Learning of Scene Flow Estimation Fusing with Local Rigidity. In 28th International Joint Conference on Artificial Intelligence (IJCAI), 2019.
    [BibTeX] [Abstract] [DOI] [PDF]
    Scene flow estimation in dynamic scenes remains a challenging task. Computing scene flow from a combination of 2D optical flow and depth has been shown to be considerably faster with acceptable performance. In this work, we present a unified framework for joint unsupervised learning of stereo depth and optical flow with explicit local rigidity to estimate scene flow. We estimate camera motion directly by a Perspective-n-Point method from the optical flow and depth predictions, with a RANSAC outlier rejection scheme. In order to disambiguate the object motion and the camera motion in the scene, we distinguish the rigid region by the reprojection error and the photometric similarity. By joint learning with the local rigidity, both the depth and optical flow networks can be refined. This framework boosts all four tasks: depth, optical flow, camera motion estimation, and object motion segmentation. Through evaluation on the KITTI benchmark, we show that the proposed framework achieves state-of-the-art results among unsupervised methods. Our models and code are available at https://github.com/lliuz/unrigidflow.
    @inproceedings{liu2019unsupervisedlo,
    title = {Unsupervised Learning of Scene Flow Estimation Fusing with Local Rigidity},
    author = {Liang Liu and Guangyao Zhai and Wenlong Ye and Yong Liu},
    year = 2019,
    booktitle = {28th International Joint Conference on Artificial Intelligence (IJCAI)},
    doi = {10.24963/ijcai.2019/123},
    abstract = {Scene flow estimation in the dynamic scene remains a challenging task. Computing scene flow by a combination of 2D optical flow and depth has shown to be considerably faster with acceptable performance. In this work, we present a unified framework for joint unsupervised learning of stereo depth and optical flow with explicit local rigidity to estimate scene flow. We estimate camera motion directly by a Perspective-n-Point method from the optical flow and depth predictions, with RANSAC outlier rejection scheme. In order to disambiguate the object motion and the camera motion in the scene, we distinguish the rigid region by the re-project error and the photometric similarity. By joint learning with the local rigidity, both depth and optical networks can be refined. This framework boosts all four tasks: depth, optical flow, camera motion estimation, and object motion segmentation. Through the evaluation on the KITTI benchmark, we show that the proposed framework achieves state-of-the-art results amongst unsupervised methods. Our models and code are available at https://github.com/lliuz/unrigidflow.}
    }
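    The camera-motion step described in the entry above (a Perspective-n-Point solve on correspondences formed from predicted depth and optical flow, with RANSAC outlier rejection) can be sketched with OpenCV. The intrinsics matrix and the dense depth/flow array shapes here are assumptions for illustration, not the released code.
    # Sketch: recover camera motion from depth + optical flow via PnP with RANSAC.
    # Intrinsics K and the dense depth/flow arrays are assumed inputs.
    import cv2
    import numpy as np

    def camera_motion_from_flow_depth(depth, flow, K):
        """depth: (H, W) metric depth for frame t; flow: (H, W, 2) optical flow t -> t+1."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth.reshape(-1)
        # Back-project frame-t pixels to 3D using the pinhole model.
        x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]
        y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]
        pts3d = np.stack([x, y, z], axis=1).astype(np.float32)
        # Their predicted locations in frame t+1 come from the optical flow.
        pts2d = np.stack([u.reshape(-1) + flow[..., 0].reshape(-1),
                          v.reshape(-1) + flow[..., 1].reshape(-1)], axis=1).astype(np.float32)
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K.astype(np.float32), None)
        return rvec, tvec, inliers  # relative camera pose (axis-angle + translation) and inlier mask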