Hao Zou

MS Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Address

Room 101, Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Email: zouhao@zju.edu.cn

Biography

I am pursuing my M.S. degree at Zhejiang University, Hangzhou, China. My main research interests are object tracking and SLAM.

Research Interests

  • Object Tracking
  • SLAM

Publications

  • Tianxin Huang, Hao Zou, Jinhao Cui, Jiangning Zhang, Xuemeng Yang, Lin Li, and Yong Liu. Adaptive Recurrent Forward Network for Dense Point Cloud Completion. IEEE Transactions on Multimedia, 25:5903-5915, 2022.
    Point cloud completion is an interesting and challenging task in 3D vision, which aims to recover complete shapes from sparse and incomplete point clouds. Existing completion networks often require a vast number of parameters and substantial computational costs to achieve a high performance level, which may limit their practical application. In this work, we propose a novel Adaptive efficient Recurrent Forward Network (ARFNet), which is composed of three parts: Recurrent Feature Extraction (RFE), Forward Dense Completion (FDC) and Raw Shape Protection (RSP). In an RFE, multiple short global features are extracted from incomplete point clouds, while a dense quantity of completed results are generated in a coarse-to-fine pipeline in the FDC. Finally, we propose the Adamerge module to preserve the details from the original models by merging the generated results with the original incomplete point clouds in the RSP. In addition, we introduce the Sampling Chamfer Distance to better capture the shapes of the models and the balanced expansion constraint to restrict the expansion distances from coarse to fine. According to the experiments on ShapeNet and KITTI, our network can achieve state-of-the-art completion performances on dense point clouds with fewer parameters, smaller model sizes, lower memory costs and a faster convergence.
    @article{huang2022arf,
    title = {Adaptive Recurrent Forward Network for Dense Point Cloud Completion},
    author = {Tianxin Huang and Hao Zou and Jinhao Cui and Jiangning Zhang and Xuemeng Yang and Lin Li and Yong Liu},
    year = 2022,
    journal = {IEEE Transactions on Multimedia},
    volume = {25},
    pages = {5903-5915},
    doi = {10.1109/TMM.2022.3200851},
    abstract = {Point cloud completion is an interesting and challenging task in 3D vision, which aims to recover complete shapes from sparse and incomplete point clouds. Existing completion networks often require a vast number of parameters and substantial computational costs to achieve a high performance level, which may limit their practical application. In this work, we propose a novel Adaptive efficient Recurrent Forward Network (ARFNet), which is composed of three parts: Recurrent Feature Extraction (RFE), Forward Dense Completion (FDC) and Raw Shape Protection (RSP). In an RFE, multiple short global features are extracted from incomplete point clouds, while a dense quantity of completed results are generated in a coarse-to-fine pipeline in the FDC. Finally, we propose the Adamerge module to preserve the details from the original models by merging the generated results with the original incomplete point clouds in the RSP. In addition, we introduce the Sampling Chamfer Distance to better capture the shapes of the models and the balanced expansion constraint to restrict the expansion distances from coarse to fine. According to the experiments on ShapeNet and KITTI, our network can achieve state-of-the-art completion performances on dense point clouds with fewer parameters, smaller model sizes, lower memory costs and a faster convergence.}
    }
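    Illustrative note (not from the paper): the Sampling Chamfer Distance introduced above (and used again in RFNet below) builds on the standard symmetric Chamfer Distance between two point sets. A minimal NumPy sketch of that base metric is given here for orientation; the paper's sampling step and balanced expansion constraint are not reproduced, and the function name is a placeholder.
        import numpy as np

        def chamfer_distance(p, q):
            # Symmetric Chamfer Distance between point sets p (N, 3) and q (M, 3).
            # Pairwise squared distances, shape (N, M).
            d2 = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
            # Average nearest-neighbour distance in both directions.
            return d2.min(axis=1).mean() + d2.min(axis=0).mean()

        # Toy usage with random stand-ins for an incomplete and a completed cloud.
        rng = np.random.default_rng(0)
        incomplete = rng.standard_normal((1024, 3))
        completed = rng.standard_normal((2048, 3))
        print(chamfer_distance(incomplete, completed))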
  • Tianxin Huang, Xuemeng Yang, Jiangning Zhang, Jinhao Cui, Hao Zou, Jun Chen, Xiangrui Zhao, and Yong Liu. Learning to Train a Point Cloud Reconstruction Network Without Matching. In European Conference on Computer Vision (ECCV), 2022.
    Reconstruction networks for well-ordered data such as 2D images and 1D continuous signals are easy to optimize through element-wised squared errors, while permutation-arbitrary point clouds cannot be constrained directly because their points permutations are not fixed. Though existing works design algorithms to match two point clouds and evaluate shape errors based on matched results, they are limited by pre-defined matching processes. In this work, we propose a novel framework named PCLossNet which learns to train a point cloud reconstruction network without any matching. By training through an adversarial process together with the reconstruction network, PCLossNet can better explore the differences between point clouds and create more precise reconstruction results. Experiments on multiple datasets prove the superiority of our method, where PCLossNet can help networks achieve much lower reconstruction errors and extract more representative features, with about 4 times faster training efficiency than the commonly-used EMD loss. Our codes can be found in https://github.com/Tianxinhuang/PCLossNet.
    @inproceedings{huang2022ltt,
    title = {Learning to Train a Point Cloud Reconstruction Network Without Matching},
    author = {Tianxin Huang and Xuemeng Yang and Jiangning Zhang and Jinhao Cui and Hao Zou and Jun Chen and Xiangrui Zhao and Yong Liu},
    year = 2022,
    booktitle = {European Conference on Computer Vision (ECCV)},
    doi = {10.1007/978-3-031-19769-7_11},
    abstract = {Reconstruction networks for well-ordered data such as 2D images and 1D continuous signals are easy to optimize through element-wised squared errors, while permutation-arbitrary point clouds cannot be constrained directly because their points permutations are not fixed. Though existing works design algorithms to match two point clouds and evaluate shape errors based on matched results, they are limited by pre-defined matching processes. In this work, we propose a novel framework named PCLossNet which learns to train a point cloud reconstruction network without any matching. By training through an adversarial process together with the reconstruction network, PCLossNet can better explore the differences between point clouds and create more precise reconstruction results. Experiments on multiple datasets prove the superiority of our method, where PCLossNet can help networks achieve much lower reconstruction errors and extract more representative features, with about 4 times faster training efficiency than the commonly-used EMD loss. Our codes can be found in https://github.com/Tianxinhuang/PCLossNet.}
    }
  • Tianxin Huang, Hao Zou, Jinhao Cui, Xuemeng Yang, Mengmeng Wang, Xiangrui Zhao, Jiangning Zhang, Yi Yuan, Yifan Xu, and Yong Liu. RFNet: Recurrent Forward Network for Dense Point Cloud Completion. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 12488-12497, 2021.
    Point cloud completion is an interesting and challenging task in 3D vision, aiming to recover complete shapes from sparse and incomplete point clouds. Existing learning based methods often require vast computation cost to achieve excellent performance, which limits their practical applications. In this paper, we propose a novel Recurrent Forward Network (RFNet), which is composed of three modules: Recurrent Feature Extraction (RFE), Forward Dense Completion (FDC) and Raw Shape Protection (RSP). The RFE extracts multiple global features from the incomplete point clouds for different recurrent levels, and the FDC generates point clouds in a coarse-to-fine pipeline. The RSP introduces details from the original incomplete models to refine the completion results. Besides, we propose a Sampling Chamfer Distance to better capture the shapes of models and a new Balanced Expansion Constraint to restrict the expansion distances from coarse to fine. According to the experiments on ShapeNet and KITTI, our network can achieve the state-of-the-art with lower memory cost and faster convergence.
    @inproceedings{huang2021rfnetrf,
    title = {RFNet: Recurrent Forward Network for Dense Point Cloud Completion},
    author = {Tianxin Huang and Hao Zou and Jinhao Cui and Xuemeng Yang and Mengmeng Wang and Xiangrui Zhao and Jiangning Zhang and Yi Yuan and Yifan Xu and Yong Liu},
    year = 2021,
    booktitle = {2021 IEEE/CVF International Conference on Computer Vision (ICCV)},
    pages = {12488-12497},
    doi = {10.1109/ICCV48922.2021.01228},
    abstract = {Point cloud completion is an interesting and challenging task in 3D vision, aiming to recover complete shapes from sparse and incomplete point clouds. Existing learning based methods often require vast computation cost to achieve excellent performance, which limits their practical applications. In this paper, we propose a novel Recurrent Forward Network (RFNet), which is composed of three modules: Recurrent Feature Extraction (RFE), Forward Dense Completion (FDC) and Raw Shape Protection (RSP). The RFE extracts multiple global features from the incomplete point clouds for different recurrent levels, and the FDC generates point clouds in a coarse-to-fine pipeline. The RSP introduces details from the original incomplete models to refine the completion results. Besides, we propose a Sampling Chamfer Distance to better capture the shapes of models and a new Balanced Expansion Constraint to restrict the expansion distances from coarse to fine. According to the experiments on ShapeNet and KITTI, our network can achieve the state-of-the-art with lower memory cost and faster convergence.}
    }
  • Hao Zou, Xuemeng Yang, Tianxin Huang, Chujuan Zhang, Yong Liu, Wanlong Li, Feng Wen, and Hongbo Zhang. Up-to-Down Network: Fusing Multi-Scale Context for 3D Semantic Scene Completion. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 16-23, 2021.
    An efficient 3D scene perception algorithm is a vital component for autonomous driving and robotics systems. In this paper, we focus on semantic scene completion, which is a task of jointly estimating the volumetric occupancy and semantic labels of objects. Since the real-world data is sparse and occluded, this is an extremely challenging task. We propose a novel framework, named Up-to-Down network (UDNet), to achieve the large-scale semantic scene completion with an encoder-decoder architecture for voxel grids. The novel up-to-down block can effectively aggregate multi-scale context information to improve labeling coherence, and the atrous spatial pyramid pooling module is leveraged to expand the receptive field while preserving detailed geometric information. Besides, the proposed multi-scale fusion mechanism efficiently aggregates global background information and improves the semantic completion accuracy. Moreover, to further satisfy the needs of different tasks, our UDNet can accomplish the multi-resolution semantic completion, achieving faster but coarser completion. Detailed experiments in the semantic scene completion benchmark of SemanticKITTI illustrate that our proposed framework surpasses the state-of-the-art methods with remarkable margins and a real-time inference speed by using only voxel grids as input.
    @inproceedings{zou2021utd,
    title = {Up-to-Down Network: Fusing Multi-Scale Context for 3D Semantic Scene Completion},
    author = {Hao Zou and Xuemeng Yang and Tianxin Huang and Chujuan Zhang and Yong Liu and Wanlong Li and Feng Wen and Hongbo Zhang},
    year = 2021,
    booktitle = {2021 IEEE/RSJ International Conference on Intelligent Robots and Systems},
    pages = {16-23},
    doi = {10.1109/IROS51168.2021.9635888},
    abstract = {An efficient 3D scene perception algorithm is a vital component for autonomous driving and robotics systems. In this paper, we focus on semantic scene completion, which is a task of jointly estimating the volumetric occupancy and semantic labels of objects. Since the real-world data is sparse and occluded, this is an extremely challenging task. We propose a novel framework, named Up-to-Down network (UDNet), to achieve the large-scale semantic scene completion with an encoder-decoder architecture for voxel grids. The novel up-to-down block can effectively aggregate multi-scale context information to improve labeling coherence, and the atrous spatial pyramid pooling module is leveraged to expand the receptive field while preserving detailed geometric information. Besides, the proposed multi-scale fusion mechanism efficiently aggregates global background information and improves the semantic completion accuracy. Moreover, to further satisfy the needs of different tasks, our UDNet can accomplish the multi-resolution semantic completion, achieving faster but coarser completion. Detailed experiments in the semantic scene completion benchmark of SemanticKITTI illustrate that our proposed framework surpasses the state-of-the-art methods with remarkable margins and a real-time inference speed by using only voxel grids as input.}
    }
  • Hao Zou, Chujuan Zhang, Yong Liu, Wanlong Li, Feng Wen, and Hongbo Zhang. PointSiamRCNN: Target Aware Two-stage Siamese Tracker for Point Clouds. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 7029-7035, 2021.
    Currently, there have been many kinds of point-based 3D trackers, while voxel-based methods are still underexplored. In this paper, we first propose a voxel-based tracker, named PointSiamRCNN, improving tracking performance by embedding target information into the search region. Our framework is composed of two parts for achieving proposal generation and proposal refinement, which fully releases the potential of the two-stage object tracking. Specifically, it takes advantage of efficient feature learning of the voxel-based Siamese network and high-quality proposal generation of the Siamese region proposal network head. In the search region, the groundtruth annotations are utilized to realize semantic segmentation, which leads to more discriminative feature learning with pointwise supervisions. Furthermore, we propose the Self and Cross Attention Module for embedding target information into the search region. Finally, the multi-scale RoI pooling module is proposed to obtain compact representations from target-aware features for proposal refinement. Exhaustive experiments on the KITTI tracking dataset demonstrate that our framework reaches the competitive performance with the state-of-the-art 3D tracking methods and achieves the state-of-the-art in terms of BEV tracking.
    @inproceedings{zou2021pta,
    title = {PointSiamRCNN: Target Aware Two-stage Siamese Tracker for Point Clouds},
    author = {Hao Zou and Chujuan Zhang and Yong Liu and Wanlong Li and Feng Wen and Hongbo Zhang},
    year = 2021,
    booktitle = {2021 IEEE/RSJ International Conference on Intelligent Robots and Systems},
    pages = {7029-7035},
    doi = {10.1109/IROS51168.2021.9636863},
    abstract = {Currently, there have been many kinds of point-based 3D trackers, while voxel-based methods are still underexplored. In this paper, we first propose a voxel-based tracker, named PointSiamRCNN, improving tracking performance by embedding target information into the search region. Our framework is composed of two parts for achieving proposal generation and proposal refinement, which fully releases the potential of the two-stage object tracking. Specifically, it takes advantage of efficient feature learning of the voxel-based Siamese network and high-quality proposal generation of the Siamese region proposal network head. In the search region, the groundtruth annotations are utilized to realize semantic segmentation, which leads to more discriminative feature learning with pointwise supervisions. Furthermore, we propose the Self and Cross Attention Module for embedding target information into the search region. Finally, the multi-scale RoI pooling module is proposed to obtain compact representations from target-aware features for proposal refinement. Exhaustive experiments on the KITTI tracking dataset demonstrate that our framework reaches the competitive performance with the state-of-the-art 3D tracking methods and achieves the state-of-the-art in terms of BEV tracking.}
    }
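    For orientation only: the Self and Cross Attention Module mentioned above is not specified on this page, so the sketch below shows generic scaled dot-product cross-attention in which search-region features attend to target (template) features. The function name, dimensions, and random projection weights are illustrative assumptions, not the paper's design.
        import numpy as np

        def cross_attention(search_feats, target_feats, d_k=64, seed=0):
            # search_feats: (N_search, C_s), target_feats: (N_target, C_t).
            # Random projections stand in for learned weights (placeholders only).
            rng = np.random.default_rng(seed)
            c_s, c_t = search_feats.shape[1], target_feats.shape[1]
            w_q = rng.standard_normal((c_s, d_k)) / np.sqrt(c_s)
            w_k = rng.standard_normal((c_t, d_k)) / np.sqrt(c_t)
            w_v = rng.standard_normal((c_t, d_k)) / np.sqrt(c_t)
            q, k, v = search_feats @ w_q, target_feats @ w_k, target_feats @ w_v
            logits = q @ k.T / np.sqrt(d_k)                 # (N_search, N_target)
            weights = np.exp(logits - logits.max(axis=1, keepdims=True))
            weights /= weights.sum(axis=1, keepdims=True)   # softmax over target points
            return weights @ v                              # target-aware search features

        # Toy usage: 512 search points and 64 target points, each with 128-d features.
        rng = np.random.default_rng(1)
        out = cross_attention(rng.standard_normal((512, 128)), rng.standard_normal((64, 128)))
        print(out.shape)  # (512, 64)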
  • Xuemeng Yang, Hao Zou, Xin Kong, Tianxin Huang, Yong Liu, Wanlong Li, Feng Wen, and Hongbo Zhang. Semantic Segmentation-assisted Scene Completion for LiDAR Point Clouds. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3555-3562, 2021.
    Outdoor scene completion is a challenging issue in 3D scene understanding, which plays an important role in intelligent robotics and autonomous driving. Due to the sparsity of LiDAR acquisition, it is far more complex for 3D scene completion and semantic segmentation. Since semantic features can provide constraints and semantic priors for completion tasks, the relationship between them is worth exploring. Therefore, we propose an end-to-end semantic segmentation-assisted scene completion network, including a 2D completion branch and a 3D semantic segmentation branch. Specifically, the network takes a raw point cloud as input, and merges the features from the segmentation branch into the completion branch hierarchically to provide semantic information. By adopting BEV representation and 3D sparse convolution, we can benefit from the lower operand while maintaining effective expression. Besides, the decoder of the segmentation branch is used as an auxiliary, which can be discarded in the inference stage to save computational consumption. Extensive experiments demonstrate that our method achieves competitive performance on SemanticKITTI dataset with low latency. Code and models will be released at https://github.com/jokester-zzz/SSA-SC.
    @inproceedings{yang2021ssa,
    title = {Semantic Segmentation-assisted Scene Completion for LiDAR Point Clouds},
    author = {Xuemeng Yang and Hao Zou and Xin Kong and Tianxin Huang and Yong Liu and Wanlong Li and Feng Wen and Hongbo Zhang},
    year = 2021,
    booktitle = {2021 IEEE/RSJ International Conference on Intelligent Robots and Systems},
    pages = {3555-3562},
    doi = {10.1109/IROS51168.2021.9636662},
    abstract = {Outdoor scene completion is a challenging issue in 3D scene understanding, which plays an important role in intelligent robotics and autonomous driving. Due to the sparsity of LiDAR acquisition, it is far more complex for 3D scene completion and semantic segmentation. Since semantic features can provide constraints and semantic priors for completion tasks, the relationship between them is worth exploring. Therefore, we propose an end-to-end semantic segmentation-assisted scene completion network, including a 2D completion branch and a 3D semantic segmentation branch. Specifically, the network takes a raw point cloud as input, and merges the features from the segmentation branch into the completion branch hierarchically to provide semantic information. By adopting BEV representation and 3D sparse convolution, we can benefit from the lower operand while maintaining effective expression. Besides, the decoder of the segmentation branch is used as an auxiliary, which can be discarded in the inference stage to save computational consumption. Extensive experiments demonstrate that our method achieves competitive performance on SemanticKITTI dataset with low latency. Code and models will be released at https://github.com/jokester-zzz/SSA-SC.}
    }
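    As a hedged illustration of the BEV (bird's-eye-view) representation mentioned above, and not the paper's code: a LiDAR point cloud can be rasterized into a 2D occupancy grid roughly as follows; the ranges, resolution, and function name are assumed values.
        import numpy as np

        def bev_occupancy(points, x_range=(0.0, 51.2), y_range=(-25.6, 25.6), res=0.2):
            # Rasterize (N, 3) LiDAR points into a binary bird's-eye-view grid.
            nx = int(round((x_range[1] - x_range[0]) / res))
            ny = int(round((y_range[1] - y_range[0]) / res))
            ix = np.floor((points[:, 0] - x_range[0]) / res).astype(int)
            iy = np.floor((points[:, 1] - y_range[0]) / res).astype(int)
            valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
            grid = np.zeros((nx, ny), dtype=np.float32)
            grid[ix[valid], iy[valid]] = 1.0
            return grid

        # Toy usage: ~5k random points in front of the sensor.
        rng = np.random.default_rng(0)
        pts = rng.uniform([0.0, -25.6, -2.0], [51.2, 25.6, 1.0], size=(5000, 3))
        print(bev_occupancy(pts).sum(), "occupied cells")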
  • Jinhao Cui, Hao Zou, Xin Kong, Xuemeng Yang, Xiangrui Zhao, Yong Liu, Wanlong Li, Feng Wen, and Hongbo Zhang. PocoNet: SLAM-oriented 3D LiDAR Point Cloud Online Compression Network. In 2021 IEEE International Conference on Robotics and Automation, pages 1868-1874, 2021.
    In this paper, we present PocoNet: Point cloud Online COmpression NETwork to address the task of SLAM-oriented compression. The aim of this task is to select a compact subset of points with high priority to maintain localization accuracy. The key insight is that points with high priority have similar geometric features in SLAM scenarios. Hence, we tackle this task as point cloud segmentation to capture complex geometric information. We calculate observation counts by matching between maps and point clouds and divide them into different priority levels. Trained by labels annotated with such observation counts, the proposed network could evaluate the point-wise priority. Experiments are conducted by integrating our compression module into an existing SLAM system to evaluate compression ratios and localization performances. Experimental results on two different datasets verify the feasibility and generalization of our approach.
    @inproceedings{cui2021poconetso,
    title = {PocoNet: SLAM-oriented 3D LiDAR Point Cloud Online Compression Network},
    author = {Jinhao Cui and Hao Zou and Xin Kong and Xuemeng Yang and Xiangrui Zhao and Yong Liu and Wanlong Li and Feng Wen and Hongbo Zhang},
    year = 2021,
    booktitle = {2021 IEEE International Conference on Robotics and Automation},
    pages = {1868-1874},
    doi = {10.1109/ICRA48506.2021.9561309},
    abstract = {In this paper, we present PocoNet: Point cloud Online COmpression NETwork to address the task of SLAM-oriented compression. The aim of this task is to select a compact subset of points with high priority to maintain localization accuracy. The key insight is that points with high priority have similar geometric features in SLAM scenarios. Hence, we tackle this task as point cloud segmentation to capture complex geometric information. We calculate observation counts by matching between maps and point clouds and divide them into different priority levels. Trained by labels annotated with such observation counts, the proposed network could evaluate the point-wise priority. Experiments are conducted by integrating our compression module into an existing SLAM system to evaluate compression ratios and localization performances. Experimental results on two different datasets verify the feasibility and generalization of our approach.}
    }
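    A rough sketch of the selection idea described above (keeping a compact, high-priority subset of points), under the assumption that per-point priority scores are already available; the scoring network itself is not reproduced, and these names and ratios are not from the paper.
        import numpy as np

        def compress_by_priority(points, priority, keep_ratio=0.1):
            # Keep the highest-priority fraction of points (illustrative selection step only).
            k = max(1, int(len(points) * keep_ratio))
            idx = np.argpartition(-priority, k - 1)[:k]
            return points[idx]

        # Toy usage: random points with random priority scores.
        rng = np.random.default_rng(0)
        pts = rng.standard_normal((100000, 3))
        scores = rng.random(100000)
        print(compress_by_priority(pts, scores, keep_ratio=0.05).shape)  # (5000, 3)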
  • Hao Zou, Jinhao Cui, Xin Kong, Chujuan Zhang, Yong Liu, Feng Wen, and Wanlong Li. F-Siamese Tracker: A Frustum-based Double Siamese Network for 3D Single Object Tracking. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 8133-8139, 2020.
    This paper presents F-Siamese Tracker, a novel approach for single object tracking prominently characterized by more robustly integrating 2D and 3D information to reduce redundant search space. A main challenge in 3D single object tracking is how to reduce search space for generating appropriate 3D candidates. Instead of solely relying on 3D proposals, firstly, our method leverages the Siamese network applied on RGB images to produce 2D region proposals which are then extruded into 3D viewing frustums. Besides, we perform an on-line accuracy validation on the 3D frustum to generate refined point cloud searching space, which can be embedded directly into the existing 3D tracking backbone. For efficiency, our approach gains better performance with fewer candidates by reducing search space. In addition, benefited from introducing the online accuracy validation, for occasional cases with strong occlusions or very sparse points, our approach can still achieve high precision, even when the 2D Siamese tracker loses the target. This approach allows us to set a new state-of-the-art in 3D single object tracking by a significant margin on a sparse outdoor dataset (KITTI tracking). Moreover, experiments on 2D single object tracking show that our framework boosts 2D tracking performance as well.
    @inproceedings{zou2020fsiameseta,
    title = {F-Siamese Tracker: A Frustum-based Double Siamese Network for 3D Single Object Tracking},
    author = {Hao Zou and Jinhao Cui and Xin Kong and Chujuan Zhang and Yong Liu and Feng Wen and Wanlong Li},
    year = 2020,
    booktitle = {2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    pages = {8133--8139},
    doi = {10.1109/IROS45743.2020.9341120},
    abstract = {This paper presents F-Siamese Tracker, a novel approach for single object tracking prominently characterized by more robustly integrating 2D and 3D information to reduce redundant search space. A main challenge in 3D single object tracking is how to reduce search space for generating appropriate 3D candidates. Instead of solely relying on 3D proposals, firstly, our method leverages the Siamese network applied on RGB images to produce 2D region proposals which are then extruded into 3D viewing frustums. Besides, we perform an on-line accuracy validation on the 3D frustum to generate refined point cloud searching space, which can be embedded directly into the existing 3D tracking backbone. For efficiency, our approach gains better performance with fewer candidates by reducing search space. In addition, benefited from introducing the online accuracy validation, for occasional cases with strong occlusions or very sparse points, our approach can still achieve high precision, even when the 2D Siamese tracker loses the target. This approach allows us to set a new state-of-the-art in 3D single object tracking by a significant margin on a sparse outdoor dataset (KITTI tracking). Moreover, experiments on 2D single object tracking show that our framework boosts 2D tracking performance as well.},
    arxiv = {https://arxiv.org/pdf/2010.11510.pdf}
    }
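    A minimal sketch of the generic frustum-cropping step behind the entry above: a 2D region proposal plus pinhole intrinsics selects the points whose image projection falls inside the box. It assumes points already expressed in the camera frame; the intrinsics, box, and function name are made-up values, and the paper's on-line accuracy validation is not included.
        import numpy as np

        def frustum_crop(points_cam, K, box2d):
            # points_cam: (N, 3) points in the camera frame; K: (3, 3) intrinsics.
            # box2d: (x_min, y_min, x_max, y_max) 2D region proposal in pixels.
            x_min, y_min, x_max, y_max = box2d
            pts = points_cam[points_cam[:, 2] > 1e-6]       # keep points in front of the camera
            uv = (K @ pts.T).T
            uv = uv[:, :2] / uv[:, 2:3]                     # pinhole projection to pixels
            inside = ((uv[:, 0] >= x_min) & (uv[:, 0] <= x_max) &
                      (uv[:, 1] >= y_min) & (uv[:, 1] <= y_max))
            return pts[inside]

        # Toy usage with random points and made-up KITTI-like intrinsics.
        rng = np.random.default_rng(0)
        points = rng.uniform(-10, 10, size=(5000, 3)) + np.array([0.0, 0.0, 15.0])
        K = np.array([[721.5, 0.0, 609.6], [0.0, 721.5, 172.9], [0.0, 0.0, 1.0]])
        print(frustum_crop(points, K, (300, 100, 900, 250)).shape)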