Dr Shiwei Liu

Ph.D.
Pronouns: He / Him
Status: Research Fellow
Telephone: +44 1865 270744
Website: https://shiweiliuiiiiiii.github.io/
ORCID iD: https://orcid.org/0009-0001-1255-4436
Research groups
  • Machine Learning and Data Science
  • Numerical Analysis
Address
Mathematical Institute
University of Oxford
Andrew Wiles Building
Radcliffe Observatory Quarter
Woodstock Road
Oxford
OX2 6GG
Major / recent publications

Adriana Fernandez-Lopez, Honglie Chen, Pingchuan Ma, Lu Yin, Qiao Xiao, Stavros Petridis, Shiwei Liu, Maja Pantic. "MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization." Interspeech, 2024.

Qiao Xiao, Pingchuan Ma, Adriana Fernandez-Lopez, Boqian Wu, Lu Yin, Stavros Petridis, Mykola Pechenizkiy, Maja Pantic, Decebal Constantin Mocanu, Shiwei Liu. "Dynamic Data Pruning for Automatic Speech Recognition." Interspeech, 2024.

Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu, and Zhangyang Wang. "Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding." arXiv preprint arXiv:2403.04797 (2024).

Lu Yin, You Wu, Zhenyu Zhang, Cheng-Yu Hsieh, Yaqing Wang, Yiling Jia, Mykola Pechenizkiy, Yi Liang, Zhangyang Wang, and Shiwei Liu. "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity." The Forty-first International Conference on Machine Learning (ICML), 2024.

Lu Yin, Ajay Jaiswal, Shiwei Liu, Souvik Kundu, and Zhangyang Wang. "Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs Difficult Downstream Tasks in LLMs." The Forty-first International Conference on Machine Learning (ICML), 2024.

Yuxin Zhang, Yuxuan Du, Gen Luo, Yunshan Zhong, Zhenyu Zhang, Shiwei Liu, Rongrong Ji. "CaM: Cache Merging for Memory-efficient LLMs Inference." The Forty-first International Conference on Machine Learning (ICML), 2024.

Jie Ji, Gen Li, Lu Yin, Minghai Qin, Geng Yuan, Linke Guo, Shiwei Liu, Xiaolong Ma. "Advancing Dynamic Sparse Training by Exploring Optimization Opportunities." The Forty-first International Conference on Machine Learning (ICML), 2024.

Zhangheng Li, Shiwei Liu, Tianlong Chen, Ajay Kumar Jaiswal, Zhenyu Zhang, Dilin Wang, Raghuraman Krishnamoorthi, Shiyu Chang, Zhangyang Wang. "Sparse Cocktail: Co-Training Many Sparsity Patterns and Ratios at Once." The Forty-first International Conference on Machine Learning (ICML), 2024.

Yuxin Zhang, Lirui Zhao, Mingbao Lin, Yunyun Sun, Yiwu Yao, Xingjia Han, Jared Tanner, Shiwei Liu, and Rongrong Ji. "Dynamic sparse no training: Training-free fine-tuning for sparse llms." In The Twelfth International Conference on Learning Representations, 2024.

Gen Li, Lu Yin, Jie Ji, Wei Niu, Minghai Qin, Bin Ren, Linke Guo, Shiwei Liu, and Xiaolong Ma. "NeurRev: Train Better Sparse Neural Network Practically via Neuron Revitalization." In The Twelfth International Conference on Learning Representations, 2024.

Enneng Yang, Zhenyi Wang, Li Shen, Shiwei Liu, Guibing Guo, Xingwei Wang, and Dacheng Tao. "AdaMerging: Adaptive Model Merging for Multi-Task Learning." In The Twelfth International Conference on Learning Representations, 2024.

Hoang Pham, Shiwei Liu, Lichuan Xiang, Dung Le, Hongkai Wen, and Long Tran-Thanh. "Towards Data-Agnostic Pruning At Initialization: What Makes a Good Sparse Mask?" Advances in Neural Information Processing Systems 36 (2023).

Duc Hoang, Souvik Kundu, Shiwei Liu, and Zhangyang Wang. "Don’t just prune by magnitude! Your mask topology is a secret weapon." Advances in Neural Information Processing Systems 36 (2023): 65056-65068.

Lu Yin, Gen Li, Meng Fang, Li Shen, Tianjin Huang, Zhangyang Wang, Vlado Menkovski, Xiaolong Ma, Mykola Pechenizkiy, and Shiwei Liu. "Dynamic sparsity is channel-level sparsity learner." Advances in Neural Information Processing Systems 36 (2023).

Ajay Jaiswal, Shiwei Liu, Tianlong Chen, and Zhangyang Wang. "The emergence of essential sparsity in large pre-trained models: The weights that matter." Advances in Neural Information Processing Systems 36 (2023).

Tianjin Huang, Lu Yin, Zhenyu Zhang, Li Shen, Meng Fang, Mykola Pechenizkiy, Zhangyang Wang, and Shiwei Liu. "Are large kernels better teachers than transformers for convnets?" In International Conference on Machine Learning, pp. 14023-14038. PMLR, 2023.

Ajay Kumar Jaiswal, Shiwei Liu, Tianlong Chen, Ying Ding, and Zhangyang Wang. "Instant soup: Cheap pruning ensembles in a single pass can draw lottery tickets from large models." In International Conference on Machine Learning, pp. 14691-14701. PMLR, 2023.

Shiwei Liu, Tianlong Chen, Zhenyu Zhang, Xuxi Chen, Tianjin Huang, Ajay Jaiswal, and Zhangyang Wang. "Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!." ICLR 2023.

Shiwei Liu, Tianlong Chen, Xiaohan Chen, Xuxi Chen, Qiao Xiao, Boqian Wu, Tommi Kärkkäinen, Mykola Pechenizkiy, Decebal Mocanu, and Zhangyang Wang. "More convnets in the 2020s: Scaling up kernels beyond 51x51 using sparsity." ICLR 2023.

Shiwei Liu, Tianlong Chen, Xiaohan Chen, Li Shen, Decebal Constantin Mocanu, Zhangyang Wang, and Mykola Pechenizkiy. "The unreasonable effectiveness of random pruning: Return of the most naive baseline for sparse training." ICLR 2022.

Shiwei Liu, Tianlong Chen, Xiaohan Chen, Zahra Atashgahi, Lu Yin, Huanyu Kou, Li Shen, Mykola Pechenizkiy, Zhangyang Wang, and Decebal Constantin Mocanu. "Sparse training via boosting pruning plasticity with neuroregeneration." Advances in Neural Information Processing Systems 34 (2021): 9908-9922.

Shiwei Liu, Lu Yin, Decebal Constantin Mocanu, and Mykola Pechenizkiy. "Do we actually need dense over-parameterization? in-time over-parameterization in sparse training." In International Conference on Machine Learning, pp. 6989-7000. PMLR, 2021.

Preferred address

Mathematical Institute
University of Oxford
Oxford
OX2 6GG

Further details

Shiwei Liu is a Royal Society Newton International Fellow at the University of Oxford. Previously, he was a postdoctoral fellow at UT Austin and IFML. He obtained his Ph.D. from Eindhoven University of Technology (TU/e), the Netherlands.

I am open to collaborating with remote students and researchers who are interested in deep learning, efficient training and inference of LLMs, learning with sparsity, deep learning architectures, and related topics. Feel free to drop me an email if you are interested.

Recent publications
Revisiting Flatness-Aware Optimization in Continual Learning With Orthogonal Gradient Projection
Yang, E., Shen, L., Wang, Z., Liu, S., Guo, G., Wang, X., Tao, D. IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 47, issue 5, 3895-3907 (08 May 2025)
Data-Adaptive Weight-Ensembling for Multi-task Model Fusion
Tang, A., Shen, L., Luo, Y., Liu, S., Hu, H., Du, B., Tao, D. International Journal of Computer Vision, 1-17 (25 Apr 2025)
Full-Rank No More: Low-Rank Weight Training for Modern Speech Recognition Models
Fernandez-Lopez, A., Liu, S., Yin, L., Petridis, S., Pantic, M. volume 00, 1-5 (11 Apr 2025)
Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective
Jin, C., Huang, T., Zhang, Y., Pechenizkiy, M., Liu, S., Chen, T. Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, issue 4, 4111-4119 (11 Apr 2025)
Sparse Sounds: Exploring Low-Dimensionality in Music Generation Model
Wang, S., Liu, S. volume 00, 3224-3234 (18 Dec 2024)
Dynamic Data Pruning for Automatic Speech Recognition
Xiao, Q., Ma, P., Fernandez-Lopez, A., Wu, B., Yin, L., Petridis, S., Pechenizkiy, M., Pantic, M., Mocanu, D., Liu, S. 4488-4492 (01 Sep 2024)
MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization
Fernandez-Lopez, A., Chen, H., Ma, P., Yin, L., Xiao, Q., Petridis, S., Liu, S., Pantic, M. 2820-2824 (01 Sep 2024)
Dynamic sparse no training: training-free fine-tuning for sparse llms
Tanner, J. (15 Mar 2024)
E2ENet: Dynamic Sparse Feature Fusion for Accurate and Efficient 3D Medical Image Segmentation
Wu, B., Xiao, Q., Liu, S., Yin, L., Pechenizkiy, M., Mocanu, D., van Keulen, M., Mocanu, E. Advances in Neural Information Processing Systems, volume 37 (01 Jan 2024)
HRBP: Hardware-friendly Regrouping towards Block-based Pruning for Sparse CNN Training
Ma, H., Zhang, C., Xiang, L., Ma, X., Yuan, G., Zhang, W., Liu, S., Chen, T., Tao, D., Wang, Y., Wang, Z., Xie, X. Proceedings of Machine Learning Research, volume 234, 282-301 (01 Jan 2024)
Advancing Dynamic Sparse Training by Exploring Optimization Opportunities
Ji, J., Li, G., Yin, L., Qin, M., Yuan, G., Guo, L., Liu, S., Ma, X. Proceedings of Machine Learning Research, volume 235, 21606-21619 (01 Jan 2024)
NeurRev: Train Better Sparse Neural Network Practically via Neuron Revitalization
Li, G., Yin, L., Ji, J., Niu, W., Qin, M., Ren, B., Guo, L., Liu, S., Ma, X. 12th International Conference on Learning Representations, ICLR 2024 (01 Jan 2024)
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
Yin, L., Wu, Y., Zhang, Z., Hsieh, C., Wang, Y., Jia, Y., Li, G., Jaiswal, A., Pechenizkiy, M., Liang, Y., Bendersky, M., Wang, Z., Liu, S. Proceedings of Machine Learning Research, volume 235, 57101-57115 (01 Jan 2024)
FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping
Jaiswal, A., Hu, B., Yin, L., Ro, Y., Chen, T., Liu, S., Akella, A. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 16943-16956 (2024)
Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once
Li, Z., Liu, S., Chen, T., Jaiswal, A., Zhang, Z., Wang, D., Krishnamoorthi, R., Chang, S., Wang, Z. Proceedings of Machine Learning Research, volume 235, 28368-28386 (01 Jan 2024)
AdaMerging: Adaptive Model Merging for Multi-Task Learning
Yang, E., Wang, Z., Shen, L., Liu, S., Guo, G., Wang, X., Tao, D. 12th International Conference on Learning Representations, ICLR 2024 (01 Jan 2024)
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding
Zhang, Z., Chen, R., Liu, S., Yao, Z., Ruwase, O., Chen, B., Wu, X., Wang, Z. Advances in Neural Information Processing Systems, volume 37 (01 Jan 2024)
AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models
Lu, H., Zhou, Y., Liu, S., Wang, Z., Mahoney, M., Yang, Y. Advances in Neural Information Processing Systems, volume 37 (01 Jan 2024)
Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning
Bandari, A., Yin, L., Hsieh, C., Jaiswal, A., Chen, T., Shen, L., Krishna, R., Liu, S. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 18089-18099 (2024)
Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
Yin, L., Jaiswal, A., Liu, S., Kundu, S., Wang, Z. Proceedings of Machine Learning Research, volume 235, 57053-57068 (01 Jan 2024)
Research interests

Deep Learning

Machine Learning

Learning with Sparsity

Large Language Models

Deep Learning Architectures

Prizes, awards, and scholarships

Rising Star in AI, KAUST, 01/2024

Rising Star Award, Conference on Parsimony and Learning (CPAL), 10/2023

Best Ph.D. Dissertation Award Runner-Up, Informatics Europe, 10/2023

Newton International Fellowship Award, Royal Society & British Academy, 9/2023

Best Paper Award, Learning on Graphs Conference (LoG 2022), 11/2022

Cum Laude (Distinguished Ph.D. thesis, top 5%), Eindhoven University of Technology, NL, 4/2022
