Chujuan Zhang

MS Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Address

Room 101, Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Email: 21932124@zju.edu.cn

Biography

I am pursuing my master's degree at the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. My main research interest is Visual SLAM.

Research Interests

  • Visual SLAM

Publications

  • Hao Zou, Xuemeng Yang, Tianxin Huang, Chujuan Zhang, Yong Liu, Wanlong Li, Feng Wen, and Hongbo Zhang. Up-to-Down Network: Fusing Multi-Scale Context for 3D Semantic Scene Completion. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 16–23, 2021.
    An efficient 3D scene perception algorithm is a vital component of autonomous driving and robotics systems. In this paper, we focus on semantic scene completion, the task of jointly estimating the volumetric occupancy and semantic labels of objects. Since real-world data is sparse and occluded, this is an extremely challenging task. We propose a novel framework, named Up-to-Down Network (UDNet), to achieve large-scale semantic scene completion with an encoder-decoder architecture for voxel grids. The novel up-to-down block effectively aggregates multi-scale context information to improve labeling coherence, and an atrous spatial pyramid pooling module is leveraged to expand the receptive field while preserving detailed geometric information. Besides, the proposed multi-scale fusion mechanism efficiently aggregates global background information and improves semantic completion accuracy. Moreover, to further satisfy the needs of different tasks, our UDNet can accomplish multi-resolution semantic completion, achieving faster but coarser completion. Detailed experiments on the semantic scene completion benchmark of SemanticKITTI illustrate that our proposed framework surpasses the state-of-the-art methods by remarkable margins at real-time inference speed, using only voxel grids as input.
    @inproceedings{zou2021utd,
    title = {Up-to-Down Network: Fusing Multi-Scale Context for 3D Semantic Scene Completion},
    author = {Hao Zou and Xuemeng Yang and Tianxin Huang and Chujuan Zhang and Yong Liu and Wanlong Li and Feng Wen and Hongbo Zhang},
    year = 2021,
    booktitle = {2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    pages = {16--23},
    doi = {10.1109/IROS51168.2021.9635888},
    abstract = {An efficient 3D scene perception algorithm is a vital component of autonomous driving and robotics systems. In this paper, we focus on semantic scene completion, the task of jointly estimating the volumetric occupancy and semantic labels of objects. Since real-world data is sparse and occluded, this is an extremely challenging task. We propose a novel framework, named Up-to-Down Network (UDNet), to achieve large-scale semantic scene completion with an encoder-decoder architecture for voxel grids. The novel up-to-down block effectively aggregates multi-scale context information to improve labeling coherence, and an atrous spatial pyramid pooling module is leveraged to expand the receptive field while preserving detailed geometric information. Besides, the proposed multi-scale fusion mechanism efficiently aggregates global background information and improves semantic completion accuracy. Moreover, to further satisfy the needs of different tasks, our UDNet can accomplish multi-resolution semantic completion, achieving faster but coarser completion. Detailed experiments on the semantic scene completion benchmark of SemanticKITTI illustrate that our proposed framework surpasses the state-of-the-art methods by remarkable margins at real-time inference speed, using only voxel grids as input.}
    }
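    As a rough illustration of the atrous spatial pyramid pooling (ASPP) idea mentioned in the abstract, below is a minimal PyTorch sketch of a dilated-convolution pyramid over 3D voxel features. The dilation rates, channel widths, and tensor shapes are illustrative assumptions, not UDNet's actual configuration.

    import torch
    import torch.nn as nn

    class ASPP3D(nn.Module):
        """Parallel dilated 3D convolutions: wider receptive field, unchanged resolution."""

        def __init__(self, in_ch, out_ch, rates=(1, 2, 4)):
            super().__init__()
            # One dilated conv per rate; padding == dilation keeps the voxel
            # grid size unchanged for a 3x3x3 kernel.
            self.branches = nn.ModuleList([
                nn.Sequential(
                    nn.Conv3d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                    nn.BatchNorm3d(out_ch),
                    nn.ReLU(inplace=True),
                )
                for r in rates
            ])
            # A 1x1x1 conv fuses the multi-rate context into one feature map.
            self.project = nn.Conv3d(out_ch * len(rates), out_ch, 1)

        def forward(self, x):
            return self.project(torch.cat([b(x) for b in self.branches], dim=1))

    # Toy voxel-grid features: (batch, channels, depth, height, width).
    feats = torch.randn(1, 32, 16, 64, 64)
    print(ASPP3D(32, 32)(feats).shape)  # torch.Size([1, 32, 16, 64, 64])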
  • Hao Zou, Chujuan Zhang, Yong Liu, Wanlong Li, Feng Wen, and Hongbo Zhang. PointSiamRCNN: Target Aware Two-stage Siamese Tracker for Point Clouds. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 7029–7035, 2021.
    Currently, there have been many kinds of point-based 3D trackers, while voxel-based methods are still underexplored. In this paper, we first propose a voxel-based tracker, named PointSiamRCNN, improving tracking performance by embedding target information into the search region. Our framework is composed of two parts, proposal generation and proposal refinement, which fully releases the potential of two-stage object tracking. Specifically, it takes advantage of the efficient feature learning of the voxel-based Siamese network and the high-quality proposal generation of the Siamese region proposal network head. In the search region, ground-truth annotations are utilized for semantic segmentation, which leads to more discriminative feature learning with point-wise supervision. Furthermore, we propose the Self and Cross Attention Module for embedding target information into the search region. Finally, a multi-scale RoI pooling module is proposed to obtain compact representations from target-aware features for proposal refinement. Exhaustive experiments on the KITTI tracking dataset demonstrate that our framework reaches competitive performance with state-of-the-art 3D tracking methods and achieves the state of the art in BEV tracking.
    @inproceedings{zou2021pta,
    title = {PointSiamRCNN: Target Aware Two-stage Siamese Tracker for Point Clouds},
    author = {Hao Zou and Chujuan Zhang and Yong Liu and Wanlong Li and Feng Wen and Hongbo Zhang},
    year = 2021,
    booktitle = {2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    pages = {7029--7035},
    doi = {10.1109/IROS51168.2021.9636863},
    abstract = {Currently, there have been many kinds of point-based 3D trackers, while voxel-based methods are still underexplored. In this paper, we first propose a voxel-based tracker, named PointSiamRCNN, improving tracking performance by embedding target information into the search region. Our framework is composed of two parts, proposal generation and proposal refinement, which fully releases the potential of two-stage object tracking. Specifically, it takes advantage of the efficient feature learning of the voxel-based Siamese network and the high-quality proposal generation of the Siamese region proposal network head. In the search region, ground-truth annotations are utilized for semantic segmentation, which leads to more discriminative feature learning with point-wise supervision. Furthermore, we propose the Self and Cross Attention Module for embedding target information into the search region. Finally, a multi-scale RoI pooling module is proposed to obtain compact representations from target-aware features for proposal refinement. Exhaustive experiments on the KITTI tracking dataset demonstrate that our framework reaches competitive performance with state-of-the-art 3D tracking methods and achieves the state of the art in BEV tracking.}
    }
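    As a rough illustration of the target-embedding step described in the abstract, below is a single-head cross-attention sketch in PyTorch in which search-region features attend to template (target) features. The shapes, feature dimension, and residual design are assumptions for brevity and do not reproduce the paper's Self and Cross Attention Module.

    import torch
    import torch.nn as nn

    class CrossAttention(nn.Module):
        """Embed target (template) information into search-region features."""

        def __init__(self, dim):
            super().__init__()
            self.q = nn.Linear(dim, dim)  # queries from the search region
            self.k = nn.Linear(dim, dim)  # keys from the target template
            self.v = nn.Linear(dim, dim)  # values from the target template
            self.scale = dim ** -0.5

        def forward(self, search, template):
            # search: (B, Ns, C) search-region features
            # template: (B, Nt, C) features cropped around the tracked target
            attn = torch.softmax(
                self.q(search) @ self.k(template).transpose(1, 2) * self.scale,
                dim=-1)
            # Each search feature becomes a target-aware mixture of template values.
            return search + attn @ self.v(template)

    search = torch.randn(2, 1024, 64)
    template = torch.randn(2, 256, 64)
    print(CrossAttention(64)(search, template).shape)  # torch.Size([2, 1024, 64])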
  • Hao Zou, Jinhao Cui, Xin Kong, Chujuan Zhang, Yong Liu, Feng Wen, and Wanlong Li. F-Siamese Tracker: A Frustum-based Double Siamese Network for 3D Single Object Tracking. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 8133–8139, 2020.
    This paper presents F-Siamese Tracker, a novel approach for single object tracking characterized by robustly integrating 2D and 3D information to reduce redundant search space. A main challenge in 3D single object tracking is how to reduce the search space for generating appropriate 3D candidates. Instead of relying solely on 3D proposals, our method first leverages a Siamese network applied to RGB images to produce 2D region proposals, which are then extruded into 3D viewing frustums. We then perform an online accuracy validation on the 3D frustum to generate a refined point-cloud search space, which can be embedded directly into an existing 3D tracking backbone. By reducing the search space, our approach achieves better performance with fewer candidates. In addition, benefiting from the online accuracy validation, our approach can still achieve high precision in occasional cases with strong occlusions or very sparse points, even when the 2D Siamese tracker loses the target. This allows us to set a new state of the art in 3D single object tracking by a significant margin on a sparse outdoor dataset (KITTI tracking). Moreover, experiments on 2D single object tracking show that our framework boosts 2D tracking performance as well.
    @inproceedings{zou2020fsiameseta,
    title = {F-Siamese Tracker: A Frustum-based Double Siamese Network for 3D Single Object Tracking},
    author = {Hao Zou and Jinhao Cui and Xin Kong and Chujuan Zhang and Yong Liu and Feng Wen and Wanlong Li},
    year = 2020,
    booktitle = {2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    pages = {8133--8139},
    doi = {10.1109/IROS45743.2020.9341120},
    abstract = {This paper presents F-Siamese Tracker, a novel approach for single object tracking characterized by robustly integrating 2D and 3D information to reduce redundant search space. A main challenge in 3D single object tracking is how to reduce the search space for generating appropriate 3D candidates. Instead of relying solely on 3D proposals, our method first leverages a Siamese network applied to RGB images to produce 2D region proposals, which are then extruded into 3D viewing frustums. We then perform an online accuracy validation on the 3D frustum to generate a refined point-cloud search space, which can be embedded directly into an existing 3D tracking backbone. By reducing the search space, our approach achieves better performance with fewer candidates. In addition, benefiting from the online accuracy validation, our approach can still achieve high precision in occasional cases with strong occlusions or very sparse points, even when the 2D Siamese tracker loses the target. This allows us to set a new state of the art in 3D single object tracking by a significant margin on a sparse outdoor dataset (KITTI tracking). Moreover, experiments on 2D single object tracking show that our framework boosts 2D tracking performance as well.},
    arxiv = {https://arxiv.org/pdf/2010.11510.pdf}
    }
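    To make the frustum step in the abstract concrete, below is a minimal NumPy sketch that keeps only the points whose image projection falls inside the 2D box produced by the image-space tracker, which is what shrinks the 3D search space. The intrinsic matrix, box coordinates, and camera-frame assumption are all illustrative, not values from the paper.

    import numpy as np

    def frustum_points(points, box2d, K):
        """points: (N, 3) in camera coordinates; box2d: (x1, y1, x2, y2) in pixels."""
        pts = points[points[:, 2] > 0]          # keep points in front of the camera
        uvw = pts @ K.T                         # project onto the image plane
        u, v = uvw[:, 0] / uvw[:, 2], uvw[:, 1] / uvw[:, 2]
        x1, y1, x2, y2 = box2d
        inside = (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)
        return pts[inside]                      # the extruded 3D viewing frustum

    K = np.array([[721.5, 0.0, 609.6],          # KITTI-like intrinsics (assumed)
                  [0.0, 721.5, 172.9],
                  [0.0, 0.0, 1.0]])
    pts = np.random.uniform([-20.0, -2.0, 0.1], [20.0, 2.0, 40.0], size=(5000, 3))
    print(frustum_points(pts, (500, 100, 700, 250), K).shape)  # (M, 3), M <= 5000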