Yuehao Huang

PhD Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Address

Room 105, Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Email: yuehaohuang@zju.edu.cn

Biography

I am pursuing my Ph.D. degree at the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. My main research interests include SLAM, embodied intelligence, and AIGC.

Research Interests

  • SLAM
  • Embodied Intelligence
  • AIGC

Publications

  • Yuehao Huang, Liang Liu, Shuangming Lei, Yukai Ma, Hao Su, Jianbiao Mei, Pengxiang Zhao, Yaqing Gu, Yong Liu, and Jiajun Lv. CogDDN: A Cognitive Demand-Driven Navigation with Decision Optimization and Dual-Process Thinking. In Proceedings of the 33rd ACM International Conference on Multimedia (MM), pages 5237–5246, 2025.
    Mobile robots are increasingly required to navigate and interact within unknown and unstructured environments to meet human demands. Demand-driven navigation (DDN) enables robots to identify and locate objects based on implicit human intent, even when object locations are unknown. However, traditional data-driven DDN methods rely on pre-collected data for model training and decision-making, limiting their generalization capability in unseen scenarios. In this paper, we propose CogDDN, a VLM-based framework that emulates the human cognitive and learning mechanisms by integrating fast and slow thinking systems and selectively identifying key objects essential to fulfilling user demands. CogDDN identifies appropriate target objects by semantically aligning detected objects with the given instructions. Furthermore, it incorporates a dual-process decision-making module, comprising a Heuristic Process for rapid, efficient decisions and an Analytic Process that analyzes past errors, accumulates them in a knowledge base, and continuously improves performance. Chain of Thought (CoT) reasoning strengthens the decision-making process. Extensive closed-loop evaluations on the AI2Thor simulator with the ProcThor dataset show that CogDDN outperforms single-view camera-only methods by 15%, demonstrating significant improvements in navigation accuracy and adaptability. The project page is available at https://yuehaohuang.github.io/CogDDN/.
    @inproceedings{huang2025cog,
    title = {CogDDN: A Cognitive Demand-Driven Navigation with Decision Optimization and Dual-Process Thinking},
    author = {Yuehao Huang and Liang Liu and Shuangming Lei and Yukai Ma and Hao Su and Jianbiao Mei and Pengxiang Zhao and Yaqing Gu and Yong Liu and Jiajun Lv},
    year = 2025,
    booktitle = {Proceedings of the 33rd ACM International Conference on Multimedia (MM)},
    pages = {5237--5246},
    doi = {10.1145/3746027.3755832},
    abstract = {Mobile robots are increasingly required to navigate and interact within unknown and unstructured environments to meet human demands. Demand-driven navigation (DDN) enables robots to identify and locate objects based on implicit human intent, even when object locations are unknown. However, traditional data-driven DDN methods rely on pre-collected data for model training and decision-making, limiting their generalization capability in unseen scenarios. In this paper, we propose CogDDN, a VLM-based framework that emulates the human cognitive and learning mechanisms by integrating fast and slow thinking systems and selectively identifying key objects essential to fulfilling user demands. CogDDN identifies appropriate target objects by semantically aligning detected objects with the given instructions. Furthermore, it incorporates a dual-process decision-making module, comprising a Heuristic Process for rapid, efficient decisions and an Analytic Process that analyzes past errors, accumulates them in a knowledge base, and continuously improves performance. Chain of Thought (CoT) reasoning strengthens the decision-making process. Extensive closed-loop evaluations on the AI2Thor simulator with the ProcThor dataset show that CogDDN outperforms single-view camera-only methods by 15%, demonstrating significant improvements in navigation accuracy and adaptability. The project page is available at https://yuehaohuang.github.io/CogDDN/.}
    }
  • Han Li, Yukai Ma, Yuehao Huang, Yaqing Gu, Weihua Xu, Yong Liu, and Xingxing Zuo. RIDERS: Radar-Infrared Depth Estimation for Robust Sensing. IEEE Transactions on Intelligent Transportation Systems, 25:18764–18778, 2024.
    Dense depth recovery is crucial in autonomous driving, serving as a foundational element for obstacle avoidance, 3D object detection, and local path planning. Adverse weather conditions, including haze, dust, rain, snow, and darkness, introduce significant challenges to accurate dense depth estimation, thereby posing substantial safety risks in autonomous driving. These challenges are particularly pronounced for traditional depth estimation methods that rely on short electromagnetic wave sensors, such as visible spectrum cameras and near-infrared LiDAR, due to their susceptibility to diffraction noise and occlusion in such environments. To fundamentally overcome this issue, we present a novel approach for robust metric depth estimation by fusing a millimeter-wave radar and a monocular infrared thermal camera, which are capable of penetrating atmospheric particles and unaffected by lighting conditions. Our proposed Radar-Infrared fusion method achieves highly accurate and finely detailed dense depth estimation through three stages, including monocular depth prediction with global scale alignment, quasi-dense radar augmentation by learning radar-pixels correspondences, and local scale refinement of dense depth using a scale map learner. Our method achieves exceptional visual quality and accurate metric estimation by addressing the challenges of ambiguity and misalignment that arise from directly fusing multi-modal long-wave features. We evaluate the performance of our approach on the NTU4DRadLM dataset and our self-collected challenging ZJU-Multispectrum dataset. Especially noteworthy is the unprecedented robustness demonstrated by our proposed method in smoky scenarios.
    @article{li2024riders,
    title = {RIDERS: Radar-Infrared Depth Estimation for Robust Sensing},
    author = {Han Li and Yukai Ma and Yuehao Huang and Yaqing Gu and Weihua Xu and Yong Liu and Xingxing Zuo},
    year = 2024,
    journal = {IEEE Transactions on Intelligent Transportation Systems},
    volume = 25,
    pages = {18764--18778},
    doi = {10.1109/TITS.2024.3432996},
    abstract = {Dense depth recovery is crucial in autonomous driving, serving as a foundational element for obstacle avoidance, 3D object detection, and local path planning. Adverse weather conditions, including haze, dust, rain, snow, and darkness, introduce significant challenges to accurate dense depth estimation, thereby posing substantial safety risks in autonomous driving. These challenges are particularly pronounced for traditional depth estimation methods that rely on short electromagnetic wave sensors, such as visible spectrum cameras and near-infrared LiDAR, due to their susceptibility to diffraction noise and occlusion in such environments. To fundamentally overcome this issue, we present a novel approach for robust metric depth estimation by fusing a millimeter-wave radar and a monocular infrared thermal camera, which are capable of penetrating atmospheric particles and unaffected by lighting conditions. Our proposed Radar-Infrared fusion method achieves highly accurate and finely detailed dense depth estimation through three stages, including monocular depth prediction with global scale alignment, quasi-dense radar augmentation by learning radar-pixels correspondences, and local scale refinement of dense depth using a scale map learner. Our method achieves exceptional visual quality and accurate metric estimation by addressing the challenges of ambiguity and misalignment that arise from directly fusing multi-modal long-wave features. We evaluate the performance of our approach on the NTU4DRadLM dataset and our self-collected challenging ZJU-Multispectrum dataset. Especially noteworthy is the unprecedented robustness demonstrated by our proposed method in smoky scenarios.}
    }
  • Kai Tang, Xiaolei Lang, Yukai Ma, Yuehao Huang, Laijian Li, Yong Liu, and Jiajun Lv. Monocular Event-Inertial Odometry with Adaptive decay-based Time Surface and Polarity-aware Tracking. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 12544–12551, 2024.
    Event cameras have garnered considerable attention due to their advantages over traditional cameras in low power consumption, high dynamic range, and no motion blur. This paper proposes a monocular event-inertial odometry incorporating an adaptive decay kernel-based time surface with polarity-aware tracking. We utilize an adaptive decay-based Time Surface to extract texture information from asynchronous events, which adapts to the dynamic characteristics of the event stream and enhances the representation of environmental textures. However, polarity-weighted time surfaces suffer from event polarity shifts during changes in motion direction. To mitigate its adverse effects on feature tracking, we optimize the feature tracking by incorporating an additional polarity-inverted time surface to enhance the robustness. Comparative analysis with visual-inertial and event-inertial odometry methods shows that our approach outperforms state-of-the-art techniques, with competitive results across various datasets.
    @inproceedings{tang2024meio,
    title = {Monocular Event-Inertial Odometry with Adaptive decay-based Time Surface and Polarity-aware Tracking},
    author = {Kai Tang and Xiaolei Lang and Yukai Ma and Yuehao Huang and Laijian Li and Yong Liu and Jiajun Lv},
    year = 2024,
    booktitle = {2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    pages = {12544--12551},
    doi = {10.1109/IROS58592.2024.10802605},
    abstract = {Event cameras have garnered considerable attention due to their advantages over traditional cameras in low power consumption, high dynamic range, and no motion blur. This paper proposes a monocular event-inertial odometry incorporating an adaptive decay kernel-based time surface with polarity-aware tracking. We utilize an adaptive decay-based Time Surface to extract texture information from asynchronous events, which adapts to the dynamic characteristics of the event stream and enhances the representation of environmental textures. However, polarity-weighted time surfaces suffer from event polarity shifts during changes in motion direction. To mitigate its adverse effects on feature tracking, we optimize the feature tracking by incorporating an additional polarity-inverted time surface to enhance the robustness. Comparative analysis with visual-inertial and event-inertial odometry methods shows that our approach outperforms state-of-the-art techniques, with competitive results across various datasets.}
    }