Address

Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Zizhang Li

M.S. Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Biography

I am pursuing my M.S. degree at the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. My research interests include object detection and segmentation.

Research Interests

  • Computer vision
  • Referring segmentation

Publications

  • Zizhang Li, Mengmeng Wang, Huaijin Pi, Kechun Xu, Jianbiao Mei, and Yong Liu. E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context. In European Conference on Computer Vision (ECCV), 2022.
    Recently, the image-wise implicit neural representation of videos, NeRV, has gained popularity for its promising results and swift speed compared to regular pixel-wise implicit representations. However, the redundant parameters within the network structure can cause a large model size when scaling up for desirable performance. The key reason of this phenomenon is the coupled formulation of NeRV, which outputs the spatial and temporal information of video frames directly from the frame index input. In this paper, we propose E-NeRV, which dramatically expedites NeRV by decomposing the image-wise implicit neural representation into separate spatial and temporal context. Under the guidance of this new formulation, our model greatly reduces the redundant model parameters, while retaining the representation ability. We experimentally find that our method can improve the performance to a large extent with fewer parameters, resulting in a more than 8× faster speed on convergence. Code is available at https://github.com/kyleleey/E-NeRV.
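    (A minimal conceptual sketch of this spatial-temporal decomposition appears under Illustrative Code Sketches below.)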
    @inproceedings{li2022ene,
    title = {E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context},
    author = {Zizhang Li and Mengmeng Wang and Huaijin Pi and Kechun Xu and Jianbiao Mei and Yong Liu},
    year = 2022,
    booktitle = {European Conference on Computer Vision (ECCV)},
    doi = {10.1007/978-3-031-19833-5_16},
    abstract = {Recently, the image-wise implicit neural representation of videos, NeRV, has gained popularity for its promising results and swift speed compared to regular pixel-wise implicit representations. However, the redundant parameters within the network structure can cause a large model size when scaling up for desirable performance. The key reason of this phenomenon is the coupled formulation of NeRV, which outputs the spatial and temporal information of video frames directly from the frame index input. In this paper, we propose E-NeRV, which dramatically expedites NeRV by decomposing the image-wise implicit neural representation into separate spatial and temporal context. Under the guidance of this new formulation, our model greatly reduces the redundant model parameters, while retaining the representation ability. We experimentally find that our method can improve the performance to a large extent with fewer parameters, resulting in a more than 8× faster speed on convergence. Code is available at https://github.com/kyleleey/E-NeRV.}
    }
  • Chenxin Tao, Zizhang Li, Xizhou Zhu, Gao Huang, Yong Liu, and Jifeng Dai. Searching Parameterized AP Loss for Object Detection. In Advances in Neural Information Processing Systems 34 – 35th Conference on Neural Information Processing Systems, pages 22021-22033, 2021.
    Loss functions play an important role in training deep-network-based object detectors. The most widely used evaluation metric for object detection is Average Precision (AP), which captures the performance of localization and classification sub-tasks simultaneously. However, due to the non-differentiable nature of the AP metric, traditional object detectors adopt separate differentiable losses for the two sub-tasks. Such a mis-alignment issue may well lead to performance degradation. To address this, existing works seek to design surrogate losses for the AP metric manually, which requires expertise and may still be sub-optimal. In this paper, we propose Parameterized AP Loss, where parameterized functions are introduced to substitute the non-differentiable components in the AP calculation. Different AP approximations are thus represented by a family of parameterized functions in a unified formula. Automatic parameter search algorithm is then employed to search for the optimal parameters. Extensive experiments on the COCO benchmark with three different object detectors (i.e., RetinaNet, Faster R-CNN, and Deformable DETR) demonstrate that the proposed Parameterized AP Loss consistently outperforms existing handcrafted losses. Code shall be released.
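    (A minimal conceptual sketch of such a parameterized AP surrogate appears under Illustrative Code Sketches below.)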
    @inproceedings{li2021spa,
    title = {Searching Parameterized AP Loss for Object Detection},
    author = {Chenxin Tao and Zizhang Li and Xizhou Zhu and Gao Huang and Yong Liu and Jifeng Dai},
    year = 2021,
    booktitle = {Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems},
    pages = {22021--22033},
    abstract = {Loss functions play an important role in training deep-network-based object detectors. The most widely used evaluation metric for object detection is Average Precision (AP), which captures the performance of localization and classification sub-tasks simultaneously. However, due to the non-differentiable nature of the AP metric, traditional object detectors adopt separate differentiable losses for the two sub-tasks. Such a mis-alignment issue may well lead to performance degradation. To address this, existing works seek to design surrogate losses for the AP metric manually, which requires expertise and may still be sub-optimal. In this paper, we propose Parameterized AP Loss, where parameterized functions are introduced to substitute the non-differentiable components in the AP calculation. Different AP approximations are thus represented by a family of parameterized functions in a unified formula. Automatic parameter search algorithm is then employed to search for the optimal parameters. Extensive experiments on the COCO benchmark with three different object detectors (i.e., RetinaNet, Faster R-CNN, and Deformable DETR) demonstrate that the proposed Parameterized AP Loss consistently outperforms existing handcrafted losses. Code shall be released.}
    }
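
Illustrative Code Sketches

The following is a minimal conceptual sketch, in PyTorch, of the disentangled formulation described in the E-NeRV abstract above: a shared learnable spatial feature grid is modulated channel-wise by a per-frame temporal embedding and then decoded into an RGB frame. All module names, sizes, and the modulation scheme are illustrative assumptions, not the paper's actual architecture; see the linked repository for the real implementation.

    import torch
    import torch.nn as nn

    class DisentangledVideoINR(nn.Module):
        # Conceptual sketch: one learnable spatial feature grid shared by
        # all frames, modulated by a temporal embedding of the frame
        # index, then decoded to a frame. Sizes are illustrative.
        def __init__(self, h=8, w=8, c=64, up=4):
            super().__init__()
            # Shared spatial context: a single feature map for all frames.
            self.spatial = nn.Parameter(torch.randn(1, c, h, w))
            # Temporal branch: frame index -> per-channel (scale, shift).
            self.temporal = nn.Sequential(
                nn.Linear(1, 128), nn.GELU(), nn.Linear(128, 2 * c))
            # Lightweight decoder: upsample modulated features to a frame.
            self.decoder = nn.Sequential(
                nn.Conv2d(c, 3 * up * up, 3, padding=1),
                nn.PixelShuffle(up),  # (3*up*up, h, w) -> (3, h*up, w*up)
                nn.Sigmoid())

        def forward(self, t):
            # t: (B, 1) frame indices normalized to [0, 1]
            scale, shift = self.temporal(t).chunk(2, dim=-1)
            feat = (self.spatial * (1 + scale[..., None, None])
                    + shift[..., None, None])
            return self.decoder(feat)

    model = DisentangledVideoINR()
    frames = model(torch.tensor([[0.0], [0.5], [1.0]]))  # three frame indices
    print(frames.shape)  # torch.Size([3, 3, 32, 32])

Because the spatial grid is shared across frames and only the small temporal MLP varies with the frame index, spatial and temporal context are kept in separate parameter groups, which is the sense in which the representation is disentangled.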
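
Likewise, a minimal sketch of the idea behind Parameterized AP Loss: the non-differentiable ranking comparisons inside the AP computation are replaced by a smooth parameterized function, so the surrogate admits gradients and its parameters can be searched. The sigmoid family and the sharpness parameter theta below are one illustrative assumption, not the paper's searched formulation.

    import torch

    def parameterized_step(x, theta=0.1):
        # Smooth, parameterized stand-in for the hard Heaviside
        # comparison inside AP; theta controls sharpness and is the
        # kind of parameter that could be searched.
        return torch.sigmoid(x / theta)

    def soft_ap_loss(scores, labels, theta=0.1):
        # Differentiable AP surrogate over one set of detections.
        # scores: (N,) confidences; labels: (N,) 1.0 = true positive.
        n = scores.numel()
        pos = labels.float()
        # D[i, j] ~ 1 when detection j outranks detection i.
        D = parameterized_step(scores.unsqueeze(0) - scores.unsqueeze(1), theta)
        D = D * (1.0 - torch.eye(n))  # drop self-comparisons
        above_all = D.sum(dim=1)                       # soft count ranked above i
        above_pos = (D * pos.unsqueeze(0)).sum(dim=1)  # ...that are positives
        precision = (1.0 + above_pos) / (1.0 + above_all)
        ap = (precision * pos).sum() / pos.sum().clamp(min=1.0)
        return 1.0 - ap

    scores = torch.tensor([0.9, 0.8, 0.3, 0.1], requires_grad=True)
    labels = torch.tensor([1.0, 0.0, 1.0, 0.0])
    loss = soft_ap_loss(scores, labels)
    loss.backward()  # gradients flow through the soft comparisons
    print(loss.item())

With a hard step in place of parameterized_step, the expression reduces to ordinary AP; the smooth version trades exactness for usable gradients, and theta becomes one searchable parameter of the surrogate family.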

Links

https://github.com/kyleleey