Address

Room 101, Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Email: corenel@zju.edu.cn

Yusu Pan

MS Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Biography

I am pursuing my master's degree at Zhejiang University, advised by Professor Yong Liu. My recent research focuses on using generative adversarial networks (GANs) to synthesize realistic images and videos, with applications to customization, visual manipulation, and beyond.

Research Interests

  • Computer Vision
  • Generative Adversarial Networks

Publications

  • Xianfang Zeng, Yusu Pan, Hao Zhang, Mengmeng Wang, Guanzhong Tian, and Yong Liu. Unpaired Salient Object Translation via Spatial Attention Prior. Neurocomputing, 2021.
    [BibTeX] [Abstract] [DOI] [PDF]
    With only set-level constraints, unpaired image translation is challenging in discovering the correct semantic-level correspondences between two domains. This limitation often results in false positives such as significantly changing the color and appearance of the background during image translation. To address this limitation, we propose the Spatial Attention-Aware Generative Adversarial Network (SAAGAN), a novel approach to jointly learn salient object discovery and translation. Specifically, our generator consists of (1) a spatial attention prediction branch and (2) an image translation branch. For the attention branch, we extract a spatial attention prior from a pre-trained classification network to provide weak supervision for object discovery. The proposed attention loss largely stabilizes the training process of the attention-guided generator. For the translation branch, we revise the classical adversarial loss for salient object translation, so that the discriminator only distinguishes the distribution of the object between the two domains. Moreover, we propose a fake sample augmentation strategy to provide extra spatial information for the discriminator. Our approach simultaneously locates the attention areas in each image and translates the related areas between the two domains. Extensive experiments and evaluations show that our model achieves more realistic mappings than state-of-the-art unpaired image translation methods.
    @article{zeng2021unpairedso,
    title = {Unpaired Salient Object Translation via Spatial Attention Prior},
    author = {Xianfang Zeng and Yusu Pan and Hao Zhang and Mengmeng Wang and Guanzhong Tian and Yong Liu},
    year = 2021,
    journal = {Neurocomputing},
    doi = {10.1016/j.neucom.2020.05.105},
    abstract = {With only set-level constraints, unpaired image translation is challenging in discovering the correct semantic-level correspondences between two domains. This limitation often results in false positives such as significantly changing color and appearance of the background during image translation. To address this limitation, we propose the Spatial Attention-Aware Generative Adversarial Network (SAAGAN), a novel approach to jointly learn salient object discovery and translation. Specifically, our generator consists of (1) spatial attention prediction branch and (2) image translation branch. For attention branch, we extract spatial attention prior from a pre-trained classification network to provide weak supervision for object discovery. The proposed attention loss can largely stabilize the training process of attention-guided generator. For translation branch, we revise classical adversarial loss for salient object translation. Such a discriminator only distinguish the distribution of the object between two domains. What is more, we propose a fake sample augmentation strategy to provide extra spatial information for discriminator. Our approach allows simultaneously locating the attention areas in each image and translating the related areas between two domains. Extensive experiments and evaluations show that our model can achieve more realistic mappings compared to state-of-the-art unpaired image translation methods.}
    }
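    The attention-guided compositing that the SAAGAN abstract describes — translate only the attended object, keep the background from the input — can be sketched in a few lines. This is a minimal illustrative PyTorch sketch, not the paper's code; the module names, layer sizes, and toy architecture are my own assumptions:

    ```python
    import torch
    import torch.nn as nn

    class ToyAttentionTranslator(nn.Module):
        """Illustrative generator with an attention branch and a translation
        branch, composited so only attended regions are translated."""
        def __init__(self, channels=3, hidden=16):
            super().__init__()
            # Attention branch: predicts a single-channel spatial mask in [0, 1].
            self.attention = nn.Sequential(
                nn.Conv2d(channels, hidden, 3, padding=1), nn.ReLU(),
                nn.Conv2d(hidden, 1, 3, padding=1), nn.Sigmoid(),
            )
            # Translation branch: proposes a fully translated image.
            self.translate = nn.Sequential(
                nn.Conv2d(channels, hidden, 3, padding=1), nn.ReLU(),
                nn.Conv2d(hidden, channels, 3, padding=1), nn.Tanh(),
            )

        def forward(self, x):
            mask = self.attention(x)        # (N, 1, H, W), broadcasts over channels
            translated = self.translate(x)  # (N, C, H, W)
            # Keep the background from the input; change only attended regions.
            return mask * translated + (1 - mask) * x, mask

    model = ToyAttentionTranslator()
    x = torch.randn(2, 3, 32, 32)
    out, mask = model(x)
    print(out.shape, mask.shape)  # torch.Size([2, 3, 32, 32]) torch.Size([2, 1, 32, 32])
    ```

    In the paper, the mask is additionally supervised by an attention prior from a pre-trained classifier; the sketch above only shows the compositing step.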
  • Xianfang Zeng, Yusu Pan, Mengmeng Wang, Jiangning Zhang, and Yong Liu. Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose. In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI), 2020.
    [BibTeX] [Abstract] [DOI] [arXiv] [PDF]
    Recent works have shown how realistic talking face images can be obtained under the supervision of geometry guidance, e.g., facial landmarks or boundaries. To alleviate the demand for manual annotations, in this paper we propose a novel self-supervised hybrid model (DAE-GAN) that learns to reenact faces naturally given large amounts of unlabeled videos. Our approach combines two deforming autoencoders with the latest advances in conditional generation. On the one hand, we adopt the deforming autoencoder to disentangle identity and pose representations. A strong prior in talking face videos is that each frame can be encoded as two parts: one for the video-specific identity and the other for the varying pose. Inspired by this, we utilize a multi-frame deforming autoencoder to learn a pose-invariant embedded face for each video. Meanwhile, a multi-scale deforming autoencoder is proposed to extract pose-related information for each frame. On the other hand, the conditional generator enhances fine details and overall realism, leveraging the disentangled features to generate photo-realistic and pose-alike face images. We evaluate our model on the VoxCeleb1 and RaFD datasets. Experimental results demonstrate the superior quality of the reenacted images and the flexibility of transferring facial movements between identities.
    @inproceedings{zeng2020realisticfr,
    title = {Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose},
    author = {Xianfang Zeng and Yusu Pan and Mengmeng Wang and Jiangning Zhang and Yong Liu},
    year = 2020,
    booktitle = {Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI)},
    doi = {10.1609/aaai.v34i07.6970},
    abstract = {Recent works have shown how realistic talking face images can be obtained under the supervision of geometry guidance, e.g., facial landmark or boundary. To alleviate the demand for manual annotations, in this paper, we propose a novel self-supervised hybrid model (DAE-GAN) that learns how to reenact face naturally given large amounts of unlabeled videos. Our approach combines two deforming autoencoders with the latest advances in the conditional generation. On the one hand, we adopt the deforming autoencoder to disentangle identity and pose representations. A strong prior in talking face videos is that each frame can be encoded as two parts: one for video-specific identity and the other for various poses. Inspired by that, we utilize a multi-frame deforming autoencoder to learn a pose-invariant embedded face for each video. Meanwhile, a multi-scale deforming autoencoder is proposed to extract pose-related information for each frame. On the other hand, the conditional generator allows for enhancing fine details and overall reality. It leverages the disentangled features to generate photo-realistic and pose-alike face images. We evaluate our model on VoxCeleb1 and RaFD dataset. Experiment results demonstrate the superior quality of reenacted images and the flexibility of transferring facial movements between identities.},
    arxiv = {https://arxiv.org/pdf/2003.12957.pdf}
    }
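    The identity/pose prior described in the abstract — one pose-invariant identity code shared by all frames of a video, one pose code per frame — can be sketched as follows. This is a minimal illustrative PyTorch sketch under my own assumptions about shapes and layers, not the published DAE-GAN implementation:

    ```python
    import torch
    import torch.nn as nn

    class ToyDisentangler(nn.Module):
        """Sketch of the talking-face prior: identity is averaged over all
        frames of a video (hence pose-invariant); pose is encoded per frame."""
        def __init__(self, channels=3, dim=32, size=16):
            super().__init__()
            self.size = size
            def make_encoder():
                return nn.Sequential(
                    nn.Conv2d(channels, dim, 4, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                )
            self.id_enc, self.pose_enc = make_encoder(), make_encoder()
            self.dec = nn.Linear(2 * dim, channels * size * size)

        def forward(self, frames):                  # frames: (N, T, C, H, W)
            n, t, c, h, w = frames.shape
            flat = frames.reshape(n * t, c, h, w)
            # Multi-frame identity: average per-frame codes over time, so the
            # result cannot depend on the pose of any single frame.
            ident = self.id_enc(flat).reshape(n, t, -1).mean(dim=1)   # (N, dim)
            pose = self.pose_enc(flat).reshape(n, t, -1)              # (N, T, dim)
            # Recombine: broadcast one identity code across all T poses.
            z = torch.cat([ident.unsqueeze(1).expand(-1, t, -1), pose], dim=-1)
            out = self.dec(z).reshape(n, t, c, self.size, self.size)
            return out, ident, pose

    model = ToyDisentangler()
    frames = torch.randn(2, 4, 3, 16, 16)          # 2 videos, 4 frames each
    recon, ident, pose = model(frames)
    ```

    Swapping the identity code between videos while keeping the pose codes is what enables transferring facial movements between identities.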
  • Jiangning Zhang, Xianfang Zeng, Mengmeng Wang, Yusu Pan, Liang Liu, Yong Liu, Yu Ding, and Changjie Fan. FReeNet: Multi-Identity Face Reenactment. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5325–5334, 2020.
    [BibTeX] [Abstract] [DOI] [arXiv] [PDF]
    This paper presents a novel multi-identity face reenactment framework, named FReeNet, to transfer facial expressions from an arbitrary source face to a target face with a shared model. The proposed FReeNet consists of two parts: a Unified Landmark Converter (ULC) and a Geometry-aware Generator (GAG). The ULC adopts an encoder-decoder architecture to efficiently convert expressions in a latent landmark space, which significantly narrows the gap between the face contours of the source and target identities. The GAG leverages the converted landmarks to reenact a photorealistic image with a reference image of the target person. Moreover, a new triplet perceptual loss is proposed to force the GAG module to learn appearance and geometry information simultaneously, which also enriches the facial details of the reenacted images. Further experiments demonstrate the superiority of our approach for generating photorealistic and expression-alike faces, as well as its flexibility for transferring facial expressions between identities.
    @inproceedings{zhang2020freenetmf,
    title = {FReeNet: Multi-Identity Face Reenactment},
    author = {Jiangning Zhang and Xianfang Zeng and Mengmeng Wang and Yusu Pan and Liang Liu and Yong Liu and Yu Ding and Changjie Fan},
    year = 2020,
    booktitle = {2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    pages = {5325--5334},
    doi = {10.1109/cvpr42600.2020.00537},
    abstract = {This paper presents a novel multi-identity face reenactment framework, named FReeNet, to transfer facial expressions from an arbitrary source face to a target face with a shared model. The proposed FReeNet consists of two parts: Unified Landmark Converter (ULC) and Geometry-aware Generator (GAG). The ULC adopts an encode-decoder architecture to efficiently convert expression in a latent landmark space, which significantly narrows the gap of the face contour between source and target identities. The GAG leverages the converted landmark to reenact the photorealistic image with a reference image of the target person. Moreover, a new triplet perceptual loss is proposed to force the GAG module to learn appearance and geometry information simultaneously, which also enriches facial details of the reenacted images. Further experiments demonstrate the superiority of our approach for generating photorealistic and expression-alike faces, as well as the flexibility for transferring facial expressions between identities.},
    arxiv = {http://arxiv.org/pdf/1905.11805}
    }
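    A ULC-style landmark conversion — encode the source expression into a latent code, then decode it conditioned on the target identity — can be sketched as below. This is a minimal illustrative PyTorch sketch; the layer sizes, embedding-based identity conditioning, and 68-point landmark assumption are my own, not the FReeNet code:

    ```python
    import torch
    import torch.nn as nn

    class ToyLandmarkConverter(nn.Module):
        """Sketch of a ULC-style converter: expression is encoded in a latent
        landmark space, then decoded for a chosen target identity."""
        def __init__(self, n_points=68, latent=64, n_ids=10):
            super().__init__()
            d = n_points * 2                     # (x, y) per landmark, flattened
            self.encode = nn.Sequential(
                nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, latent))
            self.id_embed = nn.Embedding(n_ids, latent)   # one code per identity
            self.decode = nn.Sequential(
                nn.Linear(2 * latent, 128), nn.ReLU(), nn.Linear(128, d))

        def forward(self, src_landmarks, target_id):
            z = self.encode(src_landmarks.flatten(1))     # latent expression code
            c = self.id_embed(target_id)                  # target identity code
            out = self.decode(torch.cat([z, c], dim=-1))
            return out.reshape(-1, src_landmarks.shape[1], 2)

    conv = ToyLandmarkConverter()
    src = torch.rand(4, 68, 2)                 # normalized source landmarks
    tgt_id = torch.tensor([0, 1, 2, 3])        # which identity to reenact
    converted = conv(src, tgt_id)
    print(converted.shape)  # torch.Size([4, 68, 2])
    ```

    Because identity enters only through the conditioning code, a single shared converter can serve multiple target identities, which is the "multi-identity" aspect the paper emphasizes.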

Education

  1. MSc, Control Science and Engineering, Zhejiang University, China (March 2020)
  2. BSc, Automation, Zhejiang University, China (June 2017)