Zhen Zhang

PhD Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Address

Room 101, Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Email: zhenz@zju.edu.cn

Biography

I received my B.S. degree from Zhejiang University of Technology in 2018. I am currently pursuing the Ph.D. degree at the Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China, working with Prof. Yong Liu. My research interests include autonomous navigation of mobile robots, motion planning for quadruped robots, exploration planning, active SLAM, and multi-agent collaboration.

Research Interests

  • Autonomous Navigation
  • Motion Planning
  • Exploration Planning
  • Active SLAM

Publications

  • Shuo Xin, Zhen Zhang, Mengmeng Wang, Xiaojun Hou, Yaowei Guo, Xiao Kang, Liang Liu, and Yong Liu. Multi-modal 3D Human Tracking for Robots in Complex Environment with Siamese Point-Video Transformer. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 337-344, 2024.
    Tracking a specific person in a 3D scene is gaining momentum due to its numerous applications in robotics. Currently, most 3D trackers focus on driving scenarios with negligible jitter and uncomplicated surroundings, which results in their severe degeneration in complex environments, especially on jolting robot platforms (only a 20-60% success rate). To improve the accuracy, a Point-Video-based Transformer Tracking model (PVTrack) is presented for robots. It is the first multi-modal 3D human tracking work that incorporates point clouds together with RGB videos to achieve information complementarity. Moreover, PVTrack proposes the Siamese Point-Video Transformer for feature aggregation to overcome dynamic environments, which adaptively captures more target-aware information through a hierarchical attention mechanism. Considering the violent shaking on robots and rugged terrains, a lateral Human-aware Proposal Network is designed together with an Anti-shake Proposal Compensation module. It alleviates the disturbance caused by complex scenes as well as the particularity of the robot platform. Experiments show that our method achieves state-of-the-art performance on both KITTI/Waymo datasets and a quadruped robot for various indoor and outdoor scenes.
    @inproceedings{xin2024mmh,
    title = {Multi-modal 3D Human Tracking for Robots in Complex Environment with Siamese Point-Video Transformer},
    author = {Shuo Xin and Zhen Zhang and Mengmeng Wang and Xiaojun Hou and Yaowei Guo and Xiao Kang and Liang Liu and Yong Liu},
    year = 2024,
    booktitle = {2024 IEEE International Conference on Robotics and Automation (ICRA)},
    pages = {337-344},
    doi = {10.1109/ICRA57147.2024.10610979},
    }
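    The Siamese fusion design described in the abstract above is not spelled out on this page; purely as an illustration of the general pattern, the sketch below shows one common way a Siamese cross-modal attention block can fuse point-cloud and video features, with shared weights across branches. All module names, shapes, and parameters here are hypothetical, not taken from PVTrack.

      # Illustrative sketch only: a generic Siamese cross-modal attention
      # fusion block. Not the PVTrack architecture; names are hypothetical.
      import torch
      import torch.nn as nn

      class CrossModalFusion(nn.Module):
          def __init__(self, dim=256, heads=8):
              super().__init__()
              # Point tokens attend to video tokens and vice versa; the same
              # block (shared weights) serves template and search branches.
              self.p2v = nn.MultiheadAttention(dim, heads, batch_first=True)
              self.v2p = nn.MultiheadAttention(dim, heads, batch_first=True)
              self.norm_p = nn.LayerNorm(dim)
              self.norm_v = nn.LayerNorm(dim)

          def forward(self, point_feat, video_feat):
              # point_feat: (B, N_pts, dim); video_feat: (B, N_pix, dim)
              p, _ = self.p2v(point_feat, video_feat, video_feat)
              v, _ = self.v2p(video_feat, point_feat, point_feat)
              return self.norm_p(point_feat + p), self.norm_v(video_feat + v)

      fusion = CrossModalFusion()
      p_out, v_out = fusion(torch.randn(2, 128, 256), torch.randn(2, 196, 256))
      print(p_out.shape, v_out.shape)  # both keep their (B, N, dim) shapes

    Stacking such blocks at multiple feature resolutions is one generic way to realize hierarchical attention, though the paper's specific design may differ.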
  • Shuo Xin, Liang Liu, Xiao Kang, Zhen Zhang, Mengmeng Wang, and Yong Liu. Beyond Traditional Driving Scenes: A Robotic-Centric Paradigm for 2D+3D Human Tracking Using Siamese Transformer Network. In 7th International Symposium on Autonomous Systems (ISAS), 2024.
    3D human tracking plays a crucial role in autonomous intelligent systems. Current approaches focus on achieving higher performance on traditional driving datasets like KITTI, overlooking the jitteriness of the platform and the complexity of the environments. Once the scenarios are migrated to jolting robot platforms, they all degenerate severely, with only a 20-60% success rate, which greatly restricts the high-level application of autonomous systems. In this work, going beyond traditional flat scenes, we introduce the Multi-modal Human Tracking Paradigm (MHTrack), a unified multimodal transformer-based model that can effectively track the target person frame-by-frame in point and video sequences. Specifically, we design a speed-inertia module-assisted stabilization mechanism along with an alternate training strategy to better migrate the tracking algorithm to the robot platform. To capture more target-aware information, we combine the geometric and appearance features of point clouds and video frames based on a hierarchical Siamese Transformer Network. Additionally, considering the prior characteristics of the human category, we design a lateral cross-attention pyramid head for deeper feature aggregation and final 3D bounding box generation. Extensive experiments confirm that MHTrack significantly outperforms previous state-of-the-art methods on both open-source datasets and large-scale robotic datasets. Further analysis verifies each component’s effectiveness and shows the robotic-centric paradigm’s promising potential when deployed in dynamic robotic systems.
    @inproceedings{xin2024btd,
    title = {Beyond Traditional Driving Scenes: A Robotic-Centric Paradigm for 2D+3D Human Tracking Using Siamese Transformer Network},
    author = {Shuo Xin and Liang Liu and Xiao Kang and Zhen Zhang and Mengmeng Wang and Yong Liu},
    year = 2024,
    booktitle = {7th International Symposium on Autonomous Systems (ISAS)},
    doi = {10.1109/ISAS61044.2024.10552604},
    }
  • Zhen Zhang, Jiaqing Yan, Xin Kong, Guangyao Zhai, and Yong Liu. Efficient Motion Planning based on Kinodynamic Model for Quadruped Robots Following Persons in Confined Spaces. IEEE/ASME Transactions on Mechatronics, 2021.
    Quadruped robots have superior terrain adaptability and more flexible movement capabilities than traditional robots. In this paper, we apply them to person-following tasks and propose an efficient motion planning scheme for quadruped robots to generate flexible and effective trajectories in confined spaces. The method builds a real-time local costmap via onboard sensors, covering both static and dynamic obstacles. We exploit a simplified kinodynamic model and formulate friction-pyramid inequality constraints on the Ground Reaction Forces (GRFs) to ensure the optimized trajectory is executable. In addition, we obtain the optimal following trajectory in the costmap based entirely on the robot's rectangular footprint description, which ensures it can pass through narrow spaces without collision. Finally, a receding horizon control strategy is employed to improve the robustness of motion in complex environments. The proposed motion planning framework is integrated on the quadruped robot JueYing and tested in simulation as well as real scenarios, with execution success rates above 90% in all test scenes.
    @article{zhang2021emp,
    title = {Efficient Motion Planning based on Kinodynamic Model for Quadruped Robots Following Persons in Confined Spaces},
    author = {Zhen Zhang and Jiaqing Yan and Xin Kong and Guangyao Zhai and Yong Liu},
    year = 2021,
    journal = {IEEE/ASME Transactions on Mechatronics},
    doi = {10.1109/TMECH.2021.3083594},
    }
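    As background for the friction-pyramid constraints mentioned in the abstract above: in legged-robot planning, the friction cone on each stance foot's ground reaction force is commonly linearized into a pyramid so that the constraint set stays linear. A standard textbook form (a generic sketch; the paper's exact formulation may differ) is:

      \begin{aligned}
      f_z &\ge 0, \\
      -\mu f_z &\le f_x \le \mu f_z, \\
      -\mu f_z &\le f_y \le \mu f_z,
      \end{aligned}

    where (f_x, f_y, f_z) are the tangential and normal components of a foot's GRF and \mu is the friction coefficient; some variants use \mu/\sqrt{2} so that the pyramid is inscribed in the cone. Because these inequalities are linear, stacking them over the prediction horizon typically keeps a receding-horizon trajectory optimization tractable in real time.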
  • Guangyao Zhai, Zhen Zhang, Xin Kong, and Yong Liu. Efficient Pedestrian Following by Quadruped Robots. In 2021 IEEE International Conference on Robotics and Automation Workshop, 2021.
    Legged robots have superior terrain adaptability and more flexible movement capabilities than traditional wheeled robots. In this work, we use a quadruped robot as an example of legged robots to complete a pedestrian-following task in challenging scenarios. The whole system consists of two modules, perception and planning, both relying on various onboard sensors.
    @inproceedings{zhai2021epf,
    title = {Efficient Pedestrian Following by Quadruped Robots},
    author = {Guangyao Zhai and Zhen Zhang and Xin Kong and Yong Liu},
    year = 2021,
    booktitle = {2021 IEEE International Conference on Robotics and Automation Workshop},
    }
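    The workshop paper above describes the system only at the level of two cooperating modules; the sketch below is a generic sense-plan loop illustrating that pattern. It is not the authors' implementation, and every name in it (Robot, detect_person, plan_step) is a hypothetical stand-in.

      # Generic perception -> planning loop for pedestrian following.
      # Illustration only; all names are hypothetical stand-ins.
      import time

      class Robot:
          """Hypothetical robot handle."""
          def read_sensors(self):
              return {}                       # placeholder sensor payload
          def execute(self, command):
              print("executing", command)     # placeholder controller call

      def detect_person(sensor_data):
          # Perception module: fuse onboard sensor data into a target estimate.
          return {"position": (1.0, 0.0)}     # placeholder estimate

      def plan_step(target_position):
          # Planning module: one local motion command toward the target.
          return {"goal": target_position}

      def follow(robot, steps=50, rate_hz=10):
          # Re-sense and re-plan every cycle, receding-horizon style.
          for _ in range(steps):
              target = detect_person(robot.read_sensors())
              robot.execute(plan_step(target["position"]))
              time.sleep(1.0 / rate_hz)

      follow(Robot(), steps=3)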