Address

Room 101, Institute of Cyber-Systems and Control, Yuquan Campus, Zhejiang University, Hangzhou, Zhejiang, China

Contact Information

Email: jiatengwei@zju.edu.cn

Jiateng Wei

MS Student

Institute of Cyber-Systems and Control, Zhejiang University, China

Biography

I am pursuing my master’s degree in Control Engineering at Zhejiang University, Hangzhou, China. My main research interests lie in network pruning, quantization, and model deployment.

Research and Interests

  • Network Pruning
  • Neural Network Deployment

Publications

  • Jiateng Wei, Quan Lu, Ning Jiang, Siqi Li, Jingyang Xiang, Jun Chen, and Yong Liu. Structured Optimal Brain Pruning for Large Language Models. In The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 13991-14007, 2024.
    Abstract: The massive parameters and computational demands hinder the widespread application of Large Language Models (LLMs). Network pruning provides a practical solution to this problem. However, existing pruning works for LLMs mainly focus on unstructured pruning or necessitate post-pruning fine-tuning. The former relies on special hardware to accelerate computation, while the latter may need substantial computational resources. In this paper, we introduce a retraining-free structured pruning method called SoBP (Structured Optimal Brain Pruning). It leverages global first-order information to select pruning structures, then refines them with a local greedy approach, and finally adopts module-wise reconstruction to mitigate information loss. We assess the effectiveness of SoBP across 14 models from 3 LLM families on 8 distinct datasets. Experimental results demonstrate that SoBP outperforms current state-of-the-art methods.
    @inproceedings{wei2024sob,
    title = {Structured Optimal Brain Pruning for Large Language Models},
    author = {Jiateng Wei and Quan Lu and Ning Jiang and Siqi Li and Jingyang Xiang and Jun Chen and Yong Liu},
    year = 2024,
    booktitle = {The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    pages = {13991--14007},
    doi = {10.18653/v1/2024.emnlp-main.775},
    abstract = {The massive parameters and computational demands hinder the widespread application of Large Language Models (LLMs). Network pruning provides a practical solution to this problem. However, existing pruning works for LLMs mainly focus on unstructured pruning or necessitate post-pruning fine-tuning. The former relies on special hardware to accelerate computation, while the latter may need substantial computational resources. In this paper, we introduce a retraining-free structured pruning method called SoBP (Structured Optimal Brain Pruning). It leverages global first-order information to select pruning structures, then refines them with a local greedy approach, and finally adopts module-wise reconstruction to mitigate information loss. We assess the effectiveness of SoBP across 14 models from 3 LLM families on 8 distinct datasets. Experimental results demonstrate that SoBP outperforms current state-of-the-art methods.}
    }