Address

Room 101, Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Email: linaliu@zju.edu.cn

Lina Liu

PhD Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Biography

I’m pursuing my M.S. degree in College of Control Science and Engineering, Zhejiang University, Hangzhou, China. My major research interests include depth estimation, OCR, object detection, etc.

Research and Interests

Depth Estimation
OCR

Publications

  • Lina Liu, Yiyi Liao, Yue Wang, Andreas Geiger, and Yong Liu. Learning Steering Kernels for Guided Depth Completion. IEEE Transactions on Image Processing, 30:2850–2861, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    This paper addresses the guided depth completion task in which the goal is to predict a dense depth map given a guidance RGB image and sparse depth measurements. Recent advances on this problem nurture hopes that one day we can acquire accurate and dense depth at a very low cost. A major challenge of guided depth completion is to effectively make use of extremely sparse measurements, e.g., measurements covering less than 1% of the image pixels. In this paper, we propose a fully differentiable model that avoids convolving on sparse tensors by jointly learning depth interpolation and refinement. More specifically, we propose a differentiable kernel regression layer that interpolates the sparse depth measurements via learned kernels. We further refine the interpolated depth map using a residual depth refinement layer which leads to improved performance compared to learning absolute depth prediction using a vanilla network. We provide experimental evidence that our differentiable kernel regression layer not only enables end-to-end training from very sparse measurements using standard convolutional network architectures, but also leads to better depth interpolation results compared to existing heuristically motivated methods. We demonstrate that our method outperforms many state-of-the-art guided depth completion techniques on both NYUv2 and KITTI. We further show the generalization ability of our method with respect to the density and spatial statistics of the sparse depth measurements.
    @article{liu2021lsk,
    title = {Learning Steering Kernels for Guided Depth Completion},
    author = {Lina Liu and Yiyi Liao and Yue Wang and Andreas Geiger and Yong Liu},
    year = 2021,
    journal = {IEEE Transactions on Image Processing},
    volume = 30,
    pages = {2850--2861},
    doi = {10.1109/TIP.2021.3055629},
    abstract = {This paper addresses the guided depth completion task in which the goal is to predict a dense depth map given a guidance RGB image and sparse depth measurements. Recent advances on this problem nurture hopes that one day we can acquire accurate and dense depth at a very low cost. A major challenge of guided depth completion is to effectively make use of extremely sparse measurements, e.g., measurements covering less than 1% of the image pixels. In this paper, we propose a fully differentiable model that avoids convolving on sparse tensors by jointly learning depth interpolation and refinement. More specifically, we propose a differentiable kernel regression layer that interpolates the sparse depth measurements via learned kernels. We further refine the interpolated depth map using a residual depth refinement layer which leads to improved performance compared to learning absolute depth prediction using a vanilla network. We provide experimental evidence that our differentiable kernel regression layer not only enables end-to-end training from very sparse measurements using standard convolutional network architectures, but also leads to better depth interpolation results compared to existing heuristically motivated methods. We demonstrate that our method outperforms many state-of-the-art guided depth completion techniques on both NYUv2 and KITTI. We further show the generalization ability of our method with respect to the density and spatial statistics of the sparse depth measurements.}
    }
  • Lina Liu, Xibin Song, Mengmeng Wang, Yong Liu, and Liangjun Zhang. Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation. In 2021 International Conference on Computer Vision, 2021.
    [BibTeX] [Abstract] [PDF]
    Remarkable results have been achieved by DCNN based self-supervised depth estimation approaches. However, most of these approaches can only handle either day-time or night-time images, while their performance degrades for all-day images due to large domain shift and the variation of illumination between day and night images. To relieve these limitations, we propose a domain-separated network for self-supervised depth estimation of all-day images. Specifically, to relieve the negative influence of disturbing terms (illumination, etc.), we partition the information of day and night image pairs into two complementary sub-spaces: private and invariant domains, where the former contains the unique information (illumination, etc.) of day and night images and the latter contains essential shared information (texture, etc.). Meanwhile, to guarantee that the day and night images contain the same information, the domain-separated network takes the day-time images and corresponding night-time images (generated by GAN) as input, and the private and invariant feature extractors are learned by orthogonality and similarity loss, where the domain gap can be alleviated, thus better depth maps can be expected. Meanwhile, the reconstruction and photometric losses are utilized to estimate complementary information and depth maps effectively. Experimental results demonstrate that our approach achieves state-of-the art depth estimation results for all-day images on the challenging Oxford RobotCar dataset, proving the superiority of our proposed approach. Code and data split are available at https://github.com/LINA-lln/ADDS-DepthNet.
    @inproceedings{liu2021selfsm,
    title = {Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation},
    author = {Lina Liu and Xibin Song and Mengmeng Wang and Yong Liu and Liangjun Zhang},
    year = 2021,
    booktitle = {2021 International Conference on Computer Vision},
    abstract = {Remarkable results have been achieved by DCNN based self-supervised depth estimation approaches. However, most of these approaches can only handle either day-time or night-time images, while their performance degrades for all-day images due to large domain shift and the variation of illumination between day and night images. To relieve these limitations, we propose a domain-separated network for self-supervised depth estimation of all-day images. Specifically, to relieve the negative influence of disturbing terms (illumination, etc.), we partition the information of day and night image pairs into two complementary sub-spaces: private and invariant domains, where the former contains the unique information (illumination, etc.) of day and night images and the latter contains essential shared information (texture, etc.). Meanwhile, to guarantee that the day and night images contain the same information, the domain-separated network takes the day-time images and corresponding night-time images (generated by GAN) as input, and the private and invariant feature extractors are learned by orthogonality and similarity loss, where the domain gap can be alleviated, thus better depth maps can be expected. Meanwhile, the reconstruction and photometric losses are utilized to estimate complementary information and depth maps effectively. Experimental results demonstrate that our approach achieves state-of-the art depth estimation results for all-day images on the challenging Oxford RobotCar dataset, proving the superiority of our proposed approach. Code and data split are available at https://github.com/LINA-lln/ADDS-DepthNet.}
    }
  • Lina Liu, Xibin Song, Xiaoyang Lyu, Junwei Diao, Mengmeng Wang, Yong Liu, and Liangjun Zhang. FCFR-Net: Feature Fusion based Coarse-to-Fine Residual Learning for Monocular Depth Completion. In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI), 2021.
    [BibTeX] [Abstract] [arXiv] [PDF]
    Depth completion aims to recover a dense depth map from a sparse depth map with the corresponding color image as input. Recent approaches mainly formulate the depth completion as a one-stage end-to-end learning task, which outputs dense depth maps directly. However, the feature extraction and supervision in one-stage frameworks are insufficient, limiting the performance of these approaches. To address this problem, we propose a novel end-to-end residual learning framework, which formulates the depth completion as a two-stage learning task, i.e., a sparse-to-coarse stage and a coarse-to-fine stage. First, a coarse dense depth map is obtained by a simple CNN framework. Then, a refined depth map is further obtained using a residual learning strategy in the coarse-to-fine stage with coarse depth map and color image as input. Specially, in the coarse-to-fine stage, a channel shuffle extraction operation is utilized to extract more representative features from color image and coarse depth map, and an energy based fusion operation is exploited to effectively fuse these features obtained by channel shuffle operation, thus leading to more accurate and refined depth maps. We achieve SoTA performance in RMSE on KITTI benchmark. Extensive experiments on other datasets future demonstrate the superiority of our approach over current state-of-the-art depth completion approaches.
    @inproceedings{liu2020fcfrnetff,
    title = {FCFR-Net: Feature Fusion based Coarse-to-Fine Residual Learning for Monocular Depth Completion},
    author = {Lina Liu and Xibin Song and Xiaoyang Lyu and Junwei Diao and Mengmeng Wang and Yong Liu and Liangjun Zhang},
    year = 2021,
    booktitle = {Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI)},
    abstract = {Depth completion aims to recover a dense depth map from a sparse depth map with the corresponding color image as input. Recent approaches mainly formulate the depth completion as a one-stage end-to-end learning task, which outputs dense depth maps directly. However, the feature extraction and supervision in one-stage frameworks are insufficient, limiting the performance of these approaches. To address this problem, we propose a novel end-to-end residual learning framework, which formulates the depth completion as a two-stage learning task, i.e., a sparse-to-coarse stage and a coarse-to-fine stage. First, a coarse dense depth map is obtained by a simple CNN framework. Then, a refined depth map is further obtained using a residual learning strategy in the coarse-to-fine stage with coarse depth map and color image as input. Specially, in the coarse-to-fine stage, a channel shuffle extraction operation is utilized to extract more representative features from color image and coarse depth map, and an energy based fusion operation is exploited to effectively fuse these features obtained by channel shuffle operation, thus leading to more accurate and refined depth maps. We achieve SoTA performance in RMSE on KITTI benchmark. Extensive experiments on other datasets future demonstrate the superiority of our approach over current state-of-the-art depth completion approaches.},
    arxiv = {https://arxiv.org/pdf/2012.08270.pdf}
    }
  • Xiaoyang Lyu, Liang Liu, Mengmeng Wang, Xin Kong, Lina Liu, Yong Liu, Xinxin Chen, and Yi Yuan. HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation. In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI), 2021.
    [BibTeX] [Abstract] [arXiv] [PDF]
    Self-supervised learning shows great potential in monoculardepth estimation, using image sequences as the only source ofsupervision. Although people try to use the high-resolutionimage for depth estimation, the accuracy of prediction hasnot been significantly improved. In this work, we find thecore reason comes from the inaccurate depth estimation inlarge gradient regions, making the bilinear interpolation er-ror gradually disappear as the resolution increases. To obtainmore accurate depth estimation in large gradient regions, itis necessary to obtain high-resolution features with spatialand semantic information. Therefore, we present an improvedDepthNet, HR-Depth, with two effective strategies: (1) re-design the skip-connection in DepthNet to get better high-resolution features and (2) propose feature fusion Squeeze-and-Excitation(fSE) module to fuse feature more efficiently.Using Resnet-18 as the encoder, HR-Depth surpasses all pre-vious state-of-the-art(SoTA) methods with the least param-eters at both high and low resolution. Moreover, previousstate-of-the-art methods are based on fairly complex and deepnetworks with a mass of parameters which limits their realapplications. Thus we also construct a lightweight networkwhich uses MobileNetV3 as encoder. Experiments show thatthe lightweight network can perform on par with many largemodels like Monodepth2 at high-resolution with only20%parameters. All codes and models will be available at this https URL.
    @inproceedings{lyu2020hrdepthhr,
    title = {HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation},
    author = {Xiaoyang Lyu and Liang Liu and Mengmeng Wang and Xin Kong and Lina Liu and Yong Liu and Xinxin Chen and Yi Yuan},
    year = 2021,
    booktitle = {Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI)},
    abstract = {Self-supervised learning shows great potential in monoculardepth estimation, using image sequences as the only source ofsupervision. Although people try to use the high-resolutionimage for depth estimation, the accuracy of prediction hasnot been significantly improved. In this work, we find thecore reason comes from the inaccurate depth estimation inlarge gradient regions, making the bilinear interpolation er-ror gradually disappear as the resolution increases. To obtainmore accurate depth estimation in large gradient regions, itis necessary to obtain high-resolution features with spatialand semantic information. Therefore, we present an improvedDepthNet, HR-Depth, with two effective strategies: (1) re-design the skip-connection in DepthNet to get better high-resolution features and (2) propose feature fusion Squeeze-and-Excitation(fSE) module to fuse feature more efficiently.Using Resnet-18 as the encoder, HR-Depth surpasses all pre-vious state-of-the-art(SoTA) methods with the least param-eters at both high and low resolution. Moreover, previousstate-of-the-art methods are based on fairly complex and deepnetworks with a mass of parameters which limits their realapplications. Thus we also construct a lightweight networkwhich uses MobileNetV3 as encoder. Experiments show thatthe lightweight network can perform on par with many largemodels like Monodepth2 at high-resolution with only20%parameters. All codes and models will be available at this https URL.},
    arxiv = {https://arxiv.org/pdf/2012.07356.pdf}
    }
  • Xiangrui Zhao, Lina Liu, Renjie Zheng, Wenlong Ye, and Yong Liu. A Robust Stereo Feature-aided Semi-direct SLAM System. Robotics and Autonomous Systems, 132:103597, 2020.
    [BibTeX] [Abstract] [DOI] [PDF]
    In autonomous driving, many intelligent perception technologies have been put in use. However, visual SLAM still has problems with robustness, which limits its application, although it has been developed for a long time. We propose a feature-aided semi-direct approach to combine the direct and indirect methods in visual SLAM to allow robust localization under various situations, including large-baseline motion, textureless environment, and great illumination changes. In our approach, we first calculate inter-frame pose estimation by feature matching. Then we use the direct alignment and a multi-scale pyramid, which employs the previous coarse estimation as a priori, to obtain a more precise result. To get more accurate photometric parameters, we combine the online photometric calibration method with visual odometry. Furthermore, we replace the Shi–Tomasi corner with the ORB feature, which is more robust to illumination. For extreme brightness change, we employ the dark channel prior to weaken the halation and maintain the consistency of the image. To evaluate our approach, we build a full stereo visual SLAM system. Experiments on the publicly available dataset and our mobile robot dataset indicate that our approach improves the accuracy and robustness of the SLAM system.
    @article{zhao2020ars,
    title = {A Robust Stereo Feature-aided Semi-direct SLAM System},
    author = {Xiangrui Zhao and Lina Liu and Renjie Zheng and Wenlong Ye and Yong Liu},
    year = 2020,
    journal = {Robotics and Autonomous Systems},
    volume = 132,
    pages = 103597,
    doi = {https://doi.org/10.1016/j.robot.2020.103597},
    abstract = {In autonomous driving, many intelligent perception technologies have been put in use. However, visual SLAM still has problems with robustness, which limits its application, although it has been developed for a long time. We propose a feature-aided semi-direct approach to combine the direct and indirect methods in visual SLAM to allow robust localization under various situations, including large-baseline motion, textureless environment, and great illumination changes. In our approach, we first calculate inter-frame pose estimation by feature matching. Then we use the direct alignment and a multi-scale pyramid, which employs the previous coarse estimation as a priori, to obtain a more precise result. To get more accurate photometric parameters, we combine the online photometric calibration method with visual odometry. Furthermore, we replace the Shi–Tomasi corner with the ORB feature, which is more robust to illumination. For extreme brightness change, we employ the dark channel prior to weaken the halation and maintain the consistency of the image. To evaluate our approach, we build a full stereo visual SLAM system. Experiments on the publicly available dataset and our mobile robot dataset indicate that our approach improves the accuracy and robustness of the SLAM system.}
    }