Large Language Models (LLMs) have achieved impressive results. However, recent research shows that their deeper layers often contribute little, with effectiveness diminishing as depth increases. This pattern presents a significant opportunity for model compression.
In the first part of this seminar, we will explore how this phenomenon can be harnessed to make LLM compression and parameter-efficient fine-tuning more efficient. Despite these opportunities, the underutilization of deeper layers remains wasteful, squandering capacity that could otherwise improve model performance.
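As a rough illustration of how layer underutilization can be quantified (an illustrative sketch, not the speaker's specific method), one common diagnostic is to measure how much each layer actually changes its input: layers whose outputs are nearly identical to their inputs are natural candidates for pruning. The helper below assumes access to per-layer hidden states, as most transformer implementations can expose.

```python
import torch

def layer_change_scores(hidden_states: list[torch.Tensor]) -> list[float]:
    """Given per-layer hidden states [h_0, h_1, ..., h_L], each of shape
    [batch, seq_len, dim], score layer i by 1 - mean cosine similarity
    between its input h_i and output h_{i+1}. Low scores suggest the layer
    contributes little and may be prunable."""
    scores = []
    for h_in, h_out in zip(hidden_states[:-1], hidden_states[1:]):
        cos = torch.nn.functional.cosine_similarity(h_in, h_out, dim=-1)
        scores.append(1.0 - cos.mean().item())
    return scores

if __name__ == "__main__":
    # Toy usage: hypothetical residual updates whose magnitude shrinks with depth,
    # mimicking deeper layers that barely change the representation.
    torch.manual_seed(0)
    states = [torch.randn(2, 8, 16)]
    for depth in range(6):
        states.append(states[-1] + 0.5 ** depth * torch.randn_like(states[0]))
    print([round(s, 3) for s in layer_change_scores(states)])
```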
The second part of the talk will address the root cause of this ineffectiveness in deeper layers and propose a solution. We identify the issue as stemming from the prevalent use of Pre-Layer Normalization (Pre-LN) and introduce Mix-Layer Normalization (Mix-LN), which combines Pre-LN and Post-LN, as a new approach to mitigate this training deficiency.
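To make the contrast concrete, the sketch below shows the two normalization placements and one possible way to combine them in a single stack. The specific assignment of Post-LN to earlier blocks and Pre-LN to later ones, and all dimensions, are illustrative assumptions rather than details taken from the abstract.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One transformer block supporting either Pre-LN or Post-LN placement."""
    def __init__(self, dim: int, heads: int, norm_style: str):
        super().__init__()
        assert norm_style in ("pre", "post")
        self.norm_style = norm_style
        self.ln1, self.ln2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.norm_style == "pre":
            # Pre-LN: normalize before each sublayer; the residual path stays unnormalized.
            h = self.ln1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            x = x + self.mlp(self.ln2(x))
        else:
            # Post-LN: normalize after adding each sublayer's residual.
            x = self.ln1(x + self.attn(x, x, x, need_weights=False)[0])
            x = self.ln2(x + self.mlp(x))
        return x

class MixLNStack(nn.Module):
    """Hypothetical mixed-normalization stack: Post-LN for the first
    `post_layers` blocks, Pre-LN for the rest (an illustrative split)."""
    def __init__(self, dim: int = 64, heads: int = 4, depth: int = 8, post_layers: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList(
            Block(dim, heads, "post" if i < post_layers else "pre") for i in range(depth)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for blk in self.blocks:
            x = blk(x)
        return x

if __name__ == "__main__":
    model = MixLNStack()
    print(model(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```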
Seminar series
Date: Thu, 28 Nov 2024
Time: 14:00 - 15:00
Location: Lecture Room 3
Speaker: Shiwei Liu
Organisation: Oxford University