• Dense 3D reconstruction

    Dense 3D reconstruction of the Kaist-Urban-07 dataset, obtained by simply assembling the 2D LiDAR scans from a SICK LMS-511 along the continuous-time trajectory estimated by CLINS (see the sketch after this list).

  • Time-lapse Video Generation

    In this paper, we propose a novel end-to-end one-stage dynamic time-lapse video generation framework, i.e., DTVNet, to generate diversified time-lapse videos from a single landscape image.

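Below is a minimal sketch of the scan-assembly step from the first item above. It is an illustration, not the CLINS implementation: CLINS represents the trajectory with cumulative B-splines, for which simple slerp/lerp interpolation between keyframe poses stands in here, and the LiDAR-to-body extrinsic calibration is omitted. All names and values are hypothetical.

    import numpy as np
    from scipy.spatial.transform import Rotation, Slerp

    # Hypothetical keyframe trajectory (timestamps, rotations, positions).
    times = np.array([0.0, 0.1, 0.2])                                     # [s]
    rots = Rotation.from_euler("z", [0.0, 5.0, 10.0], degrees=True)
    trans = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [1.0, 0.0, 0.0]])  # [m]
    slerp = Slerp(times, rots)

    def pose_at(t):
        """World-from-sensor pose (R, p) at time t, by interpolation."""
        R = slerp([t])[0]
        p = np.array([np.interp(t, times, trans[:, k]) for k in range(3)])
        return R, p

    def assemble(scans):
        """Stack 2D scans into one 3D cloud using per-point timestamps.

        scans: iterable of (stamps (N,), points (N, 3)); the planar SICK
        points are embedded as (x, y, 0) in the sensor frame.
        """
        cloud = []
        for stamps, pts in scans:
            for t, q in zip(stamps, pts):
                R, p = pose_at(t)
                cloud.append(R.apply(q) + p)   # transform into the world frame
        return np.asarray(cloud)

Because every point is transformed with the pose at its own timestamp, the scans come out motion-compensated, which is exactly what a continuous-time trajectory buys over per-scan poses.
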
About Research Group

Welcome to the website of the APRIL Lab led by Prof. Yong Liu. Our lab was founded in December 2011 and is part of the Institute of Cyber-Systems and Control at Zhejiang University.

Our mission is to investigate the fundamental challenges and practical applications of robotics and computer vision for the benefit of all humanity. Our main interests encompass the areas of deep learning, computer vision, SLAM, and robotics.


Representative Publications

  • X. Zuo, W. Ye, Y. Yang, R. Zheng, T. Vidal-Calleja, G. Huang, and Y. Liu, “Multimodal localization: Stereo over LiDAR map,” Journal of Field Robotics, vol. 37, pp. 1003–1026, 2020.

    In this paper, we present a real-time high-precision visual localization system for an autonomous vehicle which employs only low-cost stereo cameras to localize the vehicle with a priori map built using a more expensive 3D LiDAR sensor. To this end, we construct two different visual maps: a sparse feature visual map for visual odometry (VO) based motion tracking, and a semidense visual map for registration with the prior LiDAR map. To register two point clouds sourced from different modalities (i.e., cameras and LiDAR), we leverage probabilistic weighted normal distributions transformation (ProW-NDT), by particularly taking into account the uncertainty of source point clouds. The registration results are then fused via pose graph optimization to correct the VO drift. Moreover, surfels extracted from the prior LiDAR map are used to refine the sparse 3D visual features that will further improve VO-based motion estimation. The proposed system has been tested extensively in both simulated and real-world experiments, showing that robust, high-precision, real-time localization can be achieved.

    @article{zuo2020multimodalls,
    title = {Multimodal localization: Stereo over LiDAR map},
    author = {Xingxing Zuo and Wenlong Ye and Yulin Yang and Renjie Zheng and Teresa Vidal-Calleja and Guoquan Huang and Yong Liu},
    year = 2020,
    journal = {Journal of Field Robotics},
    volume = 37,
    pages = {1003--1026},
    doi = {10.1002/rob.21936},
    abstract = {In this paper, we present a real-time high-precision visual localization system for an autonomous vehicle which employs only low-cost stereo cameras to localize the vehicle with a priori map built using a more expensive 3D LiDAR sensor. To this end, we construct two different visual maps: a sparse feature visual map for visual odometry (VO) based motion tracking, and a semidense visual map for registration with the prior LiDAR map. To register two point clouds sourced from different modalities (i.e., cameras and LiDAR), we leverage probabilistic weighted normal distributions transformation (ProW-NDT), by particularly taking into account the uncertainty of source point clouds. The registration results are then fused via pose graph optimization to correct the VO drift. Moreover, surfels extracted from the prior LiDAR map are used to refine the sparse 3D visual features that will further improve VO-based motion estimation. The proposed system has been tested extensively in both simulated and real-world experiments, showing that robust, high-precision, real-time localization can be achieved.}
    }
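
    The registration step above hinges on one idea: when scoring a source point against a voxel of the prior map's normal-distribution field, the point's own covariance is folded into the voxel covariance. A minimal sketch of that per-point score (an assumed form for illustration; the paper's exact weighting may differ):

    import numpy as np

    def ndt_point_score(q, Sigma_q, R, t, mu, Sigma_v):
        """NDT-style score of one source point against one map voxel.

        q, Sigma_q  -- source point and its covariance (the source-point
                       uncertainty that ProW-NDT weights by).
        R, t        -- current rigid-transform estimate.
        mu, Sigma_v -- mean and covariance of the matched voxel's normal
                       distribution from the prior LiDAR map.
        """
        r = R @ q + t - mu                    # residual in the map frame
        Sigma = Sigma_v + R @ Sigma_q @ R.T   # voxel cov + rotated point cov
        return float(np.exp(-0.5 * r @ np.linalg.solve(Sigma, r)))

    The transform (R, t) is then found by maximizing the sum of such scores over all matched points; uncertain source points yield flatter scores and so pull less on the solution.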

  • Y. Liao, Y. Wang, and Y. Liu, “Graph Regularized Auto-Encoders for Image Representation,” IEEE Transactions on Image Processing, vol. 26, pp. 2839–2852, 2017.

    Image representation has been intensively explored in the domain of computer vision for its significant influence on related tasks such as image clustering and classification. It is valuable to learn a low-dimensional representation of an image which preserves its inherent information from the original image space. From the perspective of manifold learning, this is implemented with the local invariant idea to capture the intrinsic low-dimensional manifold embedded in the high-dimensional input space. Inspired by the recent successes of deep architectures, we propose a local invariant deep nonlinear mapping algorithm, called graph regularized auto-encoder (GAE). With the graph regularization, the proposed method preserves the local connectivity from the original image space to the representation space, while the stacked auto-encoders provide an explicit encoding model for fast inference and powerful expressive capacity for complex modeling. Theoretical analysis shows that the graph regularizer penalizes the weighted Frobenius norm of the Jacobian matrix of the encoder mapping, where the weight matrix captures the local property in the input space. Furthermore, the underlying effects on the hidden representation space are revealed, providing an insightful explanation of the advantage of the proposed method. Finally, the experimental results on both clustering and classification tasks demonstrate the effectiveness of our GAE as well as the correctness of the proposed theoretical analysis, and they also suggest that GAE is a superior solution among current deep representation learning techniques, compared with variant auto-encoders and existing local invariant methods.

    @article{liao2017graphra,
    title = {Graph Regularized Auto-Encoders for Image Representation},
    author = {Yiyi Liao and Yue Wang and Yong Liu},
    year = 2017,
    journal = {IEEE Transactions on Image Processing},
    volume = 26,
    pages = {2839--2852},
    doi = {10.1109/TIP.2016.2605010},
    abstract = {Image representation has been intensively explored in the domain of computer vision for its significant influence on related tasks such as image clustering and classification. It is valuable to learn a low-dimensional representation of an image which preserves its inherent information from the original image space. From the perspective of manifold learning, this is implemented with the local invariant idea to capture the intrinsic low-dimensional manifold embedded in the high-dimensional input space. Inspired by the recent successes of deep architectures, we propose a local invariant deep nonlinear mapping algorithm, called graph regularized auto-encoder (GAE). With the graph regularization, the proposed method preserves the local connectivity from the original image space to the representation space, while the stacked auto-encoders provide an explicit encoding model for fast inference and powerful expressive capacity for complex modeling. Theoretical analysis shows that the graph regularizer penalizes the weighted Frobenius norm of the Jacobian matrix of the encoder mapping, where the weight matrix captures the local property in the input space. Furthermore, the underlying effects on the hidden representation space are revealed, providing an insightful explanation of the advantage of the proposed method. Finally, the experimental results on both clustering and classification tasks demonstrate the effectiveness of our GAE as well as the correctness of the proposed theoretical analysis, and they also suggest that GAE is a superior solution among current deep representation learning techniques, compared with variant auto-encoders and existing local invariant methods.}
    }
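
    The graph regularizer has a compact closed form: with hidden codes H and neighborhood weight matrix W, it is tr(H^T L H) = (1/2) * sum_ij W_ij * ||h_i - h_j||^2, where L is the graph Laplacian. A minimal numpy sketch of the GAE objective for a single encoder/decoder layer (hypothetical parameter names, sigmoid activations assumed):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gae_loss(X, W, We, be, Wd, bd, lam):
        """Reconstruction error plus graph regularization.

        X : (n, d) inputs, one image per row.
        W : (n, n) symmetric neighborhood weight matrix.
        We, be / Wd, bd : encoder / decoder parameters.
        lam : regularization trade-off.
        """
        H = sigmoid(X @ We + be)          # hidden representations
        X_hat = sigmoid(H @ Wd + bd)      # reconstructions
        recon = np.sum((X - X_hat) ** 2)
        L = np.diag(W.sum(axis=1)) - W    # graph Laplacian
        graph = np.trace(H.T @ L @ H)     # = 0.5 * sum_ij W_ij ||h_i - h_j||^2
        return recon + lam * graph

    Minimizing the second term keeps neighboring images close in code space, which is the local-invariance property the abstract refers to.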

  • Y. Wang, Y. Liu, Y. Liao, and R. Xiong, “Scalable Learning Framework for Traversable Region Detection Fusing With Appearance and Geometrical Information,” IEEE Transactions on Intelligent Transportation Systems, vol. 18, pp. 3267–3281, 2017.

    In this paper, we present an online learning framework for traversable region detection fusing both appearance and geometry information. Our framework proposes an appearance classifier supervised by the sparse geometric clues to capture the variation in online data, yielding dense detection results in real time. It provides superior detection performance using appearance information with a weak geometric prior and can be further improved with more geometry from external sensors. The learning process is divided into three steps: First, we construct features at the super-pixel level, which reduces the computational cost compared with pixel-level processing; we then classify the multi-scale super-pixels to vote for the label of each pixel. Second, we use a weighted extreme learning machine as our classifier to deal with the imbalanced data distribution, since the weak geometric prior only initializes the labels in a small region. Finally, we employ an online learning process so that our framework can adapt to changing scenes. Experimental results on three different styles of image sequences, i.e., shadow road, rain sequence, and variational sequence, demonstrate the adaptability, stability, and parameter insensitivity of our weak-geometry-motivated method. We further demonstrate the performance of the learning framework on five additional challenging data sets captured by Kinect V2 and a stereo camera, validating the method’s effectiveness and efficiency.

    @article{wang2017scalablelf,
    title = {Scalable Learning Framework for Traversable Region Detection Fusing With Appearance and Geometrical Information},
    author = {Yue Wang and Yong Liu and Yiyi Liao and Rong Xiong},
    year = 2017,
    journal = {IEEE Transactions on Intelligent Transportation Systems},
    volume = 18,
    pages = {3267--3281},
    doi = {10.1109/TITS.2017.2682218},
    abstract = {In this paper, we present an online learning framework for traversable region detection fusing both appearance and geometry information. Our framework proposes an appearance classifier supervised by the sparse geometric clues to capture the variation in online data, yielding dense detection results in real time. It provides superior detection performance using appearance information with a weak geometric prior and can be further improved with more geometry from external sensors. The learning process is divided into three steps: First, we construct features at the super-pixel level, which reduces the computational cost compared with pixel-level processing; we then classify the multi-scale super-pixels to vote for the label of each pixel. Second, we use a weighted extreme learning machine as our classifier to deal with the imbalanced data distribution, since the weak geometric prior only initializes the labels in a small region. Finally, we employ an online learning process so that our framework can adapt to changing scenes. Experimental results on three different styles of image sequences, i.e., shadow road, rain sequence, and variational sequence, demonstrate the adaptability, stability, and parameter insensitivity of our weak-geometry-motivated method. We further demonstrate the performance of the learning framework on five additional challenging data sets captured by Kinect V2 and a stereo camera, validating the method’s effectiveness and efficiency.}
    }
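
    The classifier at the core of the framework, a weighted extreme learning machine, has a one-line closed-form solution for its output weights: beta = (H^T W H + lambda*I)^-1 H^T W t, where W weights each sample to counter the class imbalance. A toy binary-classification sketch (inverse class-frequency weights assumed; the paper's weighting scheme may differ):

    import numpy as np

    def weighted_elm_fit(X, y, n_hidden=200, reg=1e-2, seed=0):
        """Weighted ELM: random hidden layer, weighted ridge readout.

        X : (n, d) features; y : (n,) int labels in {0, 1}.
        """
        rng = np.random.default_rng(seed)
        A = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
        b = rng.normal(size=n_hidden)                 # random biases
        H = np.tanh(X @ A + b)                        # hidden-layer outputs
        w = 1.0 / np.bincount(y)[y]                   # inverse class-frequency weights
        HW = H * w[:, None]                           # equals diag(w) @ H
        t = 2.0 * y - 1.0                             # +/-1 targets
        beta = np.linalg.solve(H.T @ HW + reg * np.eye(n_hidden), HW.T @ t)
        return A, b, beta

    def weighted_elm_predict(X, A, b, beta):
        return (np.tanh(X @ A + b) @ beta > 0).astype(int)

    The closed form makes retraining cheap, which is what allows the framework to relearn online as the scene changes.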

  • Y. Liu, R. Xiong, Y. Wang, H. Huang, X. Xie, X. Liu, and G. Zhang, “Stereo Visual-Inertial Odometry With Multiple Kalman Filters Ensemble,” IEEE Transactions on Industrial Electronics, vol. 63, pp. 6205–6216, 2016.

    In this paper, we present a stereo visual-inertial odometry algorithm assembled from three separate Kalman filters, i.e., an attitude filter, an orientation filter, and a position filter. Our algorithm carries out the orientation and position estimation with three filters working on different fusion intervals, which can provide more robustness even when the visual odometry estimation fails. In our orientation estimation, we propose an improved indirect Kalman filter, which uses the orientation error space represented by a unit quaternion as the state of the filter. The performance of the algorithm is demonstrated through extensive experimental results, including the benchmark KITTI datasets and some challenging datasets captured on a rough-terrain campus.

    @article{liu2016stereovo,
    title = {Stereo Visual-Inertial Odometry With Multiple Kalman Filters Ensemble},
    author = {Yong Liu and Rong Xiong and Yue Wang and Hong Huang and Xiaojia Xie and Xiaofeng Liu and Gaoming Zhang},
    year = 2016,
    journal = {IEEE Transactions on Industrial Electronics},
    volume = 63,
    pages = {6205--6216},
    doi = {10.1109/TIE.2016.2573765},
    abstract = {In this paper, we present a stereo visual-inertial odometry algorithm assembled from three separate Kalman filters, i.e., an attitude filter, an orientation filter, and a position filter. Our algorithm carries out the orientation and position estimation with three filters working on different fusion intervals, which can provide more robustness even when the visual odometry estimation fails. In our orientation estimation, we propose an improved indirect Kalman filter, which uses the orientation error space represented by a unit quaternion as the state of the filter. The performance of the algorithm is demonstrated through extensive experimental results, including the benchmark KITTI datasets and some challenging datasets captured on a rough-terrain campus.}
    }
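
    The "orientation error space represented by a unit quaternion" is the standard error-state device: the filter estimates a small rotation vector dtheta rather than the quaternion itself, with the error quaternion approximated as dq ~ [1, dtheta/2]. A minimal sketch of folding such a correction back into the reference quaternion (illustrative only; Hamilton [w, x, y, z] convention):

    import numpy as np

    def quat_mul(a, b):
        """Hamilton product of quaternions [w, x, y, z]."""
        aw, ax, ay, az = a
        bw, bx, by, bz = b
        return np.array([
            aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw,
        ])

    def apply_error_state(q_ref, dtheta):
        """Fold a small rotation-vector correction dtheta into q_ref.

        The indirect (error-state) filter estimates dtheta, not q itself,
        so the state stays minimal (3-dim) while q stays on the unit sphere.
        """
        dq = np.concatenate([[1.0], 0.5 * np.asarray(dtheta)])  # small-angle error quaternion
        q = quat_mul(dq, q_ref)
        return q / np.linalg.norm(q)                            # renormalize

    Keeping the filter state three-dimensional avoids carrying the quaternion's unit-norm constraint inside the filter and keeps its covariance well conditioned.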

  • Y. Liu, F. Tang, and Z. Zeng, “Feature Selection Based on Dependency Margin,” IEEE Transactions on Cybernetics, vol. 45, pp. 1209–1221, 2015.

    Feature selection tries to find a subset of features from a larger feature pool, such that the selected subset can provide the same or even better performance compared with using the whole set. Feature selection is usually a critical preprocessing step for many machine-learning applications such as clustering and classification. In this paper, we focus on feature selection for supervised classification, which aims at finding features that can best predict class labels. Traditional greedy search algorithms incrementally find features based on the relevance of candidate features and the class label. However, this may lead to suboptimal results when there are redundant features that may interfere with the selection. To solve this problem, we propose a subset selection algorithm that considers both the selected and remaining features’ relevances with the label. The intuition is that features which do not have better alternatives in the feature set should be selected first. We formulate the selection problem as maximizing the dependency margin, which is measured by the difference between the performance of the selected feature set and that of the remaining feature set. Extensive experiments on various data sets show the superiority of the proposed approach against traditional algorithms.

    @article{liu2015featuresb,
    title = {Feature Selection Based on Dependency Margin},
    author = {Yong Liu and Feng Tang and Zhiyong Zeng},
    year = 2015,
    journal = {IEEE Transactions on Cybernetics},
    volume = 45,
    pages = {1209--1221},
    doi = {10.1109/TCYB.2014.2347372},
    abstract = {Feature selection tries to find a subset of features from a larger feature pool, such that the selected subset can provide the same or even better performance compared with using the whole set. Feature selection is usually a critical preprocessing step for many machine-learning applications such as clustering and classification. In this paper, we focus on feature selection for supervised classification, which aims at finding features that can best predict class labels. Traditional greedy search algorithms incrementally find features based on the relevance of candidate features and the class label. However, this may lead to suboptimal results when there are redundant features that may interfere with the selection. To solve this problem, we propose a subset selection algorithm that considers both the selected and remaining features' relevances with the label. The intuition is that features which do not have better alternatives in the feature set should be selected first. We formulate the selection problem as maximizing the dependency margin, which is measured by the difference between the performance of the selected feature set and that of the remaining feature set. Extensive experiments on various data sets show the superiority of the proposed approach against traditional algorithms.}
    }
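
    A toy sketch of greedy selection driven by a dependency margin. Mutual information with the label stands in for the paper's dependency measure, and the margin is the dependency of the selected set minus that of the remaining set, as the abstract describes; the paper's actual criterion may differ in detail.

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    def dependency(X, y, idx):
        """Stand-in dependency of a feature subset with the label (summed MI)."""
        if len(idx) == 0:
            return 0.0
        return float(mutual_info_classif(X[:, list(idx)], y, random_state=0).sum())

    def select_by_margin(X, y, k):
        """Greedily grow the selected set to maximize its dependency margin."""
        selected, remaining = [], list(range(X.shape[1]))
        while len(selected) < k and remaining:
            def margin(f):
                rest = [g for g in remaining if g != f]
                return dependency(X, y, selected + [f]) - dependency(X, y, rest)
            best = max(remaining, key=margin)
            selected.append(best)
            remaining.remove(best)
        return selected

    Scoring the remaining set as well is what lets the criterion defer a feature whose information is still recoverable from better alternatives left in the pool.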

  • C. Xu, X. Wu, M. Wang, F. Qiu, Y. Liu, and J. Ren, “Improving Dynamic Gesture Recognition in Untrimmed Videos by An Online Lightweight Framework and A New Gesture Dataset ZJUGesture,” Neurocomputing, 2023.

    Human–computer interaction technology brings great convenience to people, and dynamic gesture recognition makes it possible for people to interact naturally with machines. However, recognizing gestures quickly and precisely in untrimmed videos remains a challenge in real-world systems since: (1) It is challenging to locate the temporal boundaries of performing gestures; (2) There are significant differences in performing gestures among different people, resulting in a variety of gestures; (3) There must be a trade-off between accuracy and computational cost. In this work, we propose an online lightweight two-stage framework, including a detection module and a gesture recognition module, to precisely detect and classify dynamic gestures in untrimmed videos. Specifically, we first design a low-power detection module to locate gestures in time series, then a temporal relational reasoning module is employed for gesture recognition. Moreover, we present a new dynamic gesture dataset named ZJUGesture, which contains nine classes of common gestures in various scenarios. Extensive experiments on the proposed ZJUGesture and 20-bn-Jester dataset demonstrate the attractive performance of our method with high accuracy and a low computational cost.

    @article{xu2023idg,
    title = {Improving Dynamic Gesture Recognition in Untrimmed Videos by An Online Lightweight Framework and A New Gesture Dataset ZJUGesture},
    author = {Chao Xu and Xia Wu and Mengmeng Wang and Feng Qiu and Yong Liu and Jun Ren},
    year = 2023,
    journal = {Neurocomputing},
    doi = {10.1016/j.neucom.2022.12.022},
    abstract = {Human–computer interaction technology brings great convenience to people, and dynamic gesture recognition makes it possible for people to interact naturally with machines. However, recognizing gestures quickly and precisely in untrimmed videos remains a challenge in real-world systems since: (1) It is challenging to locate the temporal boundaries of performing gestures; (2) There are significant differences in performing gestures among different people, resulting in a variety of gestures; (3) There must be a trade-off between accuracy and computational cost. In this work, we propose an online lightweight two-stage framework, including a detection module and a gesture recognition module, to precisely detect and classify dynamic gestures in untrimmed videos. Specifically, we first design a low-power detection module to locate gestures in time series, then a temporal relational reasoning module is employed for gesture recognition. Moreover, we present a new dynamic gesture dataset named ZJUGesture, which contains nine classes of common gestures in various scenarios. Extensive experiments on the proposed ZJUGesture and 20-bn-Jester dataset demonstrate the attractive performance of our method with high accuracy and a low computational cost.}
    }
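
    The two-stage control flow described above, a low-power detector gating a heavier recognizer, reduces to a few lines. The detector and recognizer below are placeholders for the paper's networks (hypothetical interfaces):

    from collections import deque

    class TwoStageGesturePipeline:
        """Online gating: detect cheaply on every frame, recognize only spans."""

        def __init__(self, detector, recognizer, context=8):
            self.detector = detector      # frame -> bool: gesture in progress?
            self.recognizer = recognizer  # list of frames -> class label
            self.recent = deque(maxlen=context)
            self.active = []

        def on_frame(self, frame):
            """Feed one frame; returns a label when a gesture span just ended."""
            self.recent.append(frame)
            if self.detector(frame):
                if not self.active:               # onset: keep recent context
                    self.active = list(self.recent)
                else:
                    self.active.append(frame)
                return None
            if self.active:                       # offset: classify the span
                label = self.recognizer(self.active)
                self.active = []
                return label
            return None

    Because the expensive temporal-reasoning module only runs when a span closes, the per-frame cost stays near that of the detector alone.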

  • L. Li, W. Ding, Y. Wen, Y. Liang, Y. Liu, and G. Wan, “A Unified BEV Model for Joint Learning 3D Local Features and Overlap Estimation,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023.
    @inproceedings{li2023bev,
    title = {A Unified BEV Model for Joint Learning 3D Local Features and Overlap Estimation},
    author = {Lin Li and Wendong Ding and Yongkun Wen and Yufei Liang and Yong Liu and Guowei Wan},
    year = 2023,
    booktitle = {2023 IEEE International Conference on Robotics and Automation (ICRA)}
    }

  • G. Xu, D. Zhu, J. Cao, Y. Liu, and J. Yang, “Shunted Collision Avoidance for Multi-UAV Motion Planning with Posture Constraints,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023.
    @inproceedings{xu2023sca,
    title = {Shunted Collision Avoidance for Multi-UAV Motion Planning with Posture Constraints},
    author = {Gang Xu and Deye Zhu and Junjie Cao and Yong Liu and Jian Yang},
    year = 2023,
    booktitle = {2023 IEEE International Conference on Robotics and Automation (ICRA)}
    }
