Chencan Fu

MS Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Address

Room 101, Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Email: chencan.fu@zju.edu.cn

Biography

I received my bachelor’s degree in Mechatronic Engineering from Harbin Institute of Technology, Harbin, China, and I am currently pursuing my M.S. degree in the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. My main research interests are sensor fusion and SLAM.

Research Interests

  • sensor fusion
  • SLAM

Publications

  • Chencan Fu, Yabiao Wang, Jiangning Zhang, Zhengkai Jiang, Xiaofeng Mao, Jiafu Wu, Weijian Cao, Chengjie Wang, Yanhao Ge, and Yong Liu. MambaGesture: Enhancing Co-Speech Gesture Generation with Mamba and Disentangled Multi-Modality Fusion. In 32nd ACM International Conference on Multimedia (MM), pages 10794-10803, 2024.
    Co-speech gesture generation is crucial for producing synchronized and realistic human gestures that accompany speech, enhancing the animation of lifelike avatars in virtual environments. While diffusion models have shown impressive capabilities, current approaches often overlook a wide range of modalities and their interactions, resulting in less dynamic and contextually varied gestures. To address these challenges, we present MambaGesture, a novel framework integrating a Mamba-based attention block, MambaAttn, with a multi-modality feature fusion module, SEAD. The MambaAttn block combines the sequential data processing strengths of the Mamba model with the contextual richness of attention mechanisms, enhancing the temporal coherence of generated gestures. SEAD adeptly fuses audio, text, style, and emotion modalities, employing disentanglement to deepen the fusion process and yield gestures with greater realism and diversity. Our approach, rigorously evaluated on the multi-modal BEAT dataset, demonstrates significant improvements in Fréchet Gesture Distance (FGD), diversity scores, and beat alignment, achieving state-of-the-art performance in co-speech gesture generation.
    @inproceedings{fu2024mge,
    title = {MambaGesture: Enhancing Co-Speech Gesture Generation with Mamba and Disentangled Multi-Modality Fusion},
    author = {Chencan Fu and Yabiao Wang and Jiangning Zhang and Zhengkai Jiang and Xiaofeng Mao and Jiafu Wu and Weijian Cao and Chengjie Wang and Yanhao Ge and Yong Liu},
    year = 2024,
    booktitle = {32nd ACM International Conference on Multimedia (MM)},
    pages = {10794-10803},
    doi = {10.1145/3664647.3680625},
    }
  • Chencan Fu, Lin Li, Jianbiao Mei, Yukai Ma, Linpeng Peng, Xiangrui Zhao, and Yong Liu. A Coarse-to-Fine Place Recognition Approach using Attention-guided Descriptors and Overlap Estimation. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 8493-8499, 2024.
    Place recognition is a challenging but crucial task in robotics. Current description-based methods may be limited by representation capabilities, while pairwise similarity-based methods require exhaustive searches, which is time-consuming. In this paper, we present a novel coarse-to-fine approach to address these problems, which combines BEV (Bird’s Eye View) feature extraction, coarse-grained matching and fine-grained verification. In the coarse stage, our approach utilizes an attention-guided network to generate attention-guided descriptors. We then employ a fast affinity-based candidate selection process to identify the Top-K most similar candidates. In the fine stage, we estimate pairwise overlap among the narrowed-down place candidates to determine the final match. Experimental results on the KITTI and KITTI-360 datasets demonstrate that our approach outperforms state-of-the-art methods. The code will be released publicly soon.
    @inproceedings{fu2024ctf,
    title = {A Coarse-to-Fine Place Recognition Approach using Attention-guided Descriptors and Overlap Estimation},
    author = {Chencan Fu and Lin Li and Jianbiao Mei and Yukai Ma and Linpeng Peng and Xiangrui Zhao and Yong Liu},
    year = 2024,
    booktitle = {2024 IEEE International Conference on Robotics and Automation (ICRA)},
    pages = {8493-8499},
    doi = {10.1109/ICRA57147.2024.10611569},
    }
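
To make the coarse-to-fine pipeline described in the ICRA 2024 abstract above more concrete, the sketch below illustrates the general retrieve-then-verify pattern it follows: a global descriptor shortlists the Top-K most similar database scans, and a pairwise overlap score then selects the final match. The functions describe and estimate_overlap are hypothetical placeholders standing in for the paper's attention-guided descriptor network and overlap estimator; this is only an illustrative sketch of the control flow under those assumptions, not the authors' implementation.

    import numpy as np

    def describe(scan: np.ndarray) -> np.ndarray:
        # Hypothetical stand-in for the attention-guided descriptor network.
        # Here: a trivial fixed-length statistic of an (N, 3) point cloud.
        return np.concatenate([scan.mean(axis=0), scan.std(axis=0)])

    def estimate_overlap(query_scan: np.ndarray, cand_scan: np.ndarray) -> float:
        # Hypothetical stand-in for the pairwise overlap estimator.
        # Here: a crude bounding-box intersection-over-union in the x-y plane.
        def box(s):
            return s[:, :2].min(axis=0), s[:, :2].max(axis=0)
        (qlo, qhi), (clo, chi) = box(query_scan), box(cand_scan)
        inter = np.prod(np.clip(np.minimum(qhi, chi) - np.maximum(qlo, clo), 0, None))
        union = np.prod(qhi - qlo) + np.prod(chi - clo) - inter
        return float(inter / union) if union > 0 else 0.0

    def coarse_to_fine_match(query_scan, database_scans, k=5):
        # Coarse stage: rank the database by descriptor affinity and keep the Top-K.
        q = describe(query_scan)
        descs = np.stack([describe(s) for s in database_scans])
        affinity = descs @ q / (np.linalg.norm(descs, axis=1) * np.linalg.norm(q) + 1e-12)
        topk = np.argsort(-affinity)[:k]
        # Fine stage: verify only the K candidates with the (more expensive) overlap score.
        overlaps = [estimate_overlap(query_scan, database_scans[i]) for i in topk]
        best = int(topk[int(np.argmax(overlaps))])
        return best, float(max(overlaps))

The value of the two-stage structure, as the abstract argues, is that the cheap descriptor comparison prunes the exhaustive pairwise search that similarity-only methods require, so the more expensive overlap check runs on only K candidates.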
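
The MambaGesture abstract above does not spell out the internal design of the MambaAttn block, so the following is only a generic illustration of the idea it names: combining a Mamba state-space layer (sequential processing) with a self-attention layer (contextual mixing) inside one residual block. It assumes the third-party mamba_ssm package and a CUDA-capable machine; all layer sizes are invented for the example.

    import torch
    import torch.nn as nn
    from mamba_ssm import Mamba  # third-party selective state-space layer

    class MambaAttnBlock(nn.Module):
        # Illustrative block only: a Mamba layer for temporal modelling followed by
        # self-attention, each behind a pre-norm residual connection.
        def __init__(self, dim: int = 256, n_heads: int = 4):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim)
            self.mamba = Mamba(d_model=dim)
            self.norm2 = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, frames, dim) sequence of per-frame gesture features
            x = x + self.mamba(self.norm1(x))
            h = self.norm2(x)
            attn_out, _ = self.attn(h, h, h, need_weights=False)
            return x + attn_out

    if __name__ == "__main__":
        device = "cuda"  # mamba_ssm's fused kernels expect CUDA tensors
        block = MambaAttnBlock(dim=256).to(device)
        motion = torch.randn(2, 120, 256, device=device)
        print(block(motion).shape)  # torch.Size([2, 120, 256])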