Large Language Models (LLMs) have achieved impressive results. However, recent research shows that their deeper layers often contribute little, with effectiveness diminishing as depth increases. This pattern presents a significant opportunity for model compression.
In the first part of this seminar, we will explore how this phenomenon can be harnessed to make LLM compression and parameter-efficient fine-tuning more efficient. Despite these opportunities, the underutilization of deeper layers remains wasteful, squandering capacity that could otherwise improve model performance.
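As a rough illustration of how layer underutilization can be quantified (an illustrative sketch, not the speaker's specific method), one common diagnostic is to measure how much each layer actually changes its input: layers whose outputs are nearly identical to their inputs are natural candidates for pruning. The helper below assumes access to per-layer hidden states, as most transformer implementations can expose.

```python
import torch

def layer_change_scores(hidden_states: list[torch.Tensor]) -> list[float]:
    """Given per-layer hidden states [h_0, h_1, ..., h_L], each of shape
    [batch, seq_len, dim], score layer i by 1 - mean cosine similarity
    between its input h_i and output h_{i+1}. Low scores suggest the layer
    contributes little and may be prunable."""
    scores = []
    for h_in, h_out in zip(hidden_states[:-1], hidden_states[1:]):
        cos = torch.nn.functional.cosine_similarity(h_in, h_out, dim=-1)
        scores.append(1.0 - cos.mean().item())
    return scores

if __name__ == "__main__":
    # Toy usage: hypothetical residual updates whose magnitude shrinks with depth,
    # mimicking deeper layers that barely change the representation.
    torch.manual_seed(0)
    states = [torch.randn(2, 8, 16)]
    for depth in range(6):
        states.append(states[-1] + 0.5 ** depth * torch.randn_like(states[0]))
    print([round(s, 3) for s in layer_change_scores(states)])
```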
The second part of the talk will address the root cause of this ineffectiveness in deeper layers and propose a solution. We identify the issue as stemming from the prevalent use of Pre-Layer Normalization (Pre-LN) and introduce Mix-Layer Normalization (Mix-LN), which combines Pre-LN and Post-LN, as a new approach to mitigate this training deficiency.
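To make the contrast concrete, the sketch below shows the two normalization placements and one possible way to combine them in a single stack. The specific assignment of Post-LN to earlier blocks and Pre-LN to later ones, and all dimensions, are illustrative assumptions rather than details taken from the abstract.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One transformer block supporting either Pre-LN or Post-LN placement."""
    def __init__(self, dim: int, heads: int, norm_style: str):
        super().__init__()
        assert norm_style in ("pre", "post")
        self.norm_style = norm_style
        self.ln1, self.ln2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.norm_style == "pre":
            # Pre-LN: normalize before each sublayer; the residual path stays unnormalized.
            h = self.ln1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            x = x + self.mlp(self.ln2(x))
        else:
            # Post-LN: normalize after adding each sublayer's residual.
            x = self.ln1(x + self.attn(x, x, x, need_weights=False)[0])
            x = self.ln2(x + self.mlp(x))
        return x

class MixLNStack(nn.Module):
    """Hypothetical mixed-normalization stack: Post-LN for the first
    `post_layers` blocks, Pre-LN for the rest (an illustrative split)."""
    def __init__(self, dim: int = 64, heads: int = 4, depth: int = 8, post_layers: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList(
            Block(dim, heads, "post" if i < post_layers else "pre") for i in range(depth)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for blk in self.blocks:
            x = blk(x)
        return x

if __name__ == "__main__":
    model = MixLNStack()
    print(model(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```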
Seminar series
Date: Thu, 28 Nov 2024
Time: 14:00 - 15:00
Location: Lecture Room 3
Speaker: Shiwei Liu
Organisation: Oxford University