Randomized Algorithms for Tensor CUR Approximations in Attention Mechanisms
Abstract
Speaker: Katherine Pearce
Attention mechanisms are a central component of transformer models, capturing contextual relationships between tokens in large language models. Although many of the underlying computations (e.g., the query, key, and value embeddings in multi-head attention) are inherently multi-way, classical transformer models are built on matrix-based formulations. In this talk, we discuss several ways to impose and exploit tensorial structure in the attention mechanisms of transformer models. We describe how tensor-based attention can capture higher-order contextual relationships among tokens. We then explore how randomized algorithms for computing tensor CUR decompositions can accelerate tensor-based attention and reduce its storage requirements.
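The abstract does not say which tensor CUR variant the talk uses. As one illustration, here is a minimal NumPy sketch of a randomized mode-wise ("Chidori-style") tensor CUR for a 3-way tensor T. It samples an index set I_k in each mode and takes the core R = T(I_1, I_2, I_3). For each mode k it forms C_k, the mode-k fibers of T whose other indices lie in the sampled sets, and U_k, the rows of C_k indexed by I_k. Then T ≈ R ×_1 (C_1 U_1^†) ×_2 (C_2 U_2^†) ×_3 (C_3 U_3^†), which is exact when each U_k has the same rank as the mode-k unfolding of T. Uniform sampling, the 2x oversampling, the truncation tolerance, and all function names (mode_unfold, mode_product, truncated_pinv, chidori_cur, random_indices) are assumptions for illustration, not the speaker's method.

```python
import numpy as np


def mode_unfold(T, mode):
    """Mode-k unfolding: mode `mode` becomes the rows, remaining modes are flattened."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def mode_product(T, M, mode):
    """Mode-k product T x_k M, where M has shape (new_dim, T.shape[mode])."""
    out = np.tensordot(M, np.moveaxis(T, mode, 0), axes=(1, 0))
    return np.moveaxis(out, 0, mode)


def truncated_pinv(M, rtol=1e-10):
    """Pseudo-inverse via SVD that discards singular values below rtol * largest."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    keep = s > rtol * s[0]
    return (Vt[keep].T / s[keep]) @ U[:, keep].T


def chidori_cur(T, idx):
    """Mode-wise (Chidori-style) tensor CUR from sampled index sets idx[k] per mode."""
    core = T[np.ix_(*idx)]  # R = T(I_1, ..., I_d)
    approx = core
    for k in range(T.ndim):
        # Keep mode k full and restrict every other mode to its sampled indices.
        sel = [np.arange(T.shape[j]) if j == k else idx[j] for j in range(T.ndim)]
        C_k = mode_unfold(T[np.ix_(*sel)], k)  # n_k x prod_{j != k} |I_j|
        U_k = mode_unfold(core, k)             # the rows I_k of C_k
        # Expand mode k of the running approximation from |I_k| to n_k.
        approx = mode_product(approx, C_k @ truncated_pinv(U_k), k)
    return approx


def random_indices(shape, sizes, rng):
    """Uniformly sample sizes[k] distinct indices in each mode (assumed sampling scheme)."""
    return [np.sort(rng.choice(n, size=s, replace=False)) for n, s in zip(shape, sizes)]


# Demo: a synthetic 30 x 25 x 20 tensor with multilinear rank (3, 4, 5).
rng = np.random.default_rng(0)
ranks = (3, 4, 5)
T = rng.standard_normal(ranks)
for k, n in enumerate((30, 25, 20)):
    T = mode_product(T, rng.standard_normal((n, ranks[k])), k)

idx = random_indices(T.shape, [2 * r for r in ranks], rng)  # 2x oversampling
approx = chidori_cur(T, idx)
rel_err = np.linalg.norm(approx - T) / np.linalg.norm(T)
print(f"relative error: {rel_err:.2e}")
```

Storing R and the three factors C_k U_k^† takes |I_1||I_2||I_3| + Σ n_k |I_k| numbers instead of n_1 n_2 n_3, and only the sampled fibers of T are ever read. On this exactly low-rank synthetic tensor the reconstruction error should be near machine precision; on real data the approximation is only as good as the sampled indices.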