Superpositional LLMs
LLMs' ability to “think in superposition” is currently underutilized
Paper: Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition (27 Pages)
Researchers from the University of Wisconsin-Madison, the University of Michigan, and Microsoft Research investigate "task superposition" in Large Language Models (LLMs).
Hmm... What's the background?
Task superposition is the ability of LLMs to perform multiple distinct in-context learning (ICL) tasks concurrently within a single inference call. This suggests that, beyond simply adapting to a single task on the fly, LLMs can manage and execute several tasks in parallel using their pretraining knowledge and the context they are given.
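To make that concrete, here is a minimal sketch of how you could probe this yourself: mix demonstrations of two toy tasks (uppercase the word vs. repeat the word) in one prompt and tally which rule the sampled continuations follow. The model choice (gpt2 via Hugging Face transformers), the toy tasks, and the sample count are assumptions for illustration, not the paper's setup, and a model this small may not show the effect reliably.

```python
# Minimal sketch: one mixed-task prompt, many sampled continuations.
# Assumptions: gpt2 via Hugging Face transformers, toy tasks invented for
# illustration; not the paper's experimental setup.
import torch
from collections import Counter
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Two demos of "uppercase the word" and two of "repeat the word".
prompt = (
    "cat -> CAT\n"
    "dog -> dog dog\n"
    "sun -> SUN\n"
    "sky -> sky sky\n"
    "tree ->"
)

inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        do_sample=True,
        max_new_tokens=5,
        num_return_sequences=50,
        pad_token_id=tok.eos_token_id,
    )

# Tally which task each sampled continuation follows.
counts = Counter()
prompt_len = inputs["input_ids"].shape[1]
for seq in outputs:
    completion = tok.decode(seq[prompt_len:]).strip()
    if completion.startswith("TREE"):
        counts["uppercase"] += 1
    elif completion.startswith("tree tree"):
        counts["repeat"] += 1
    else:
        counts["other"] += 1

print(counts)  # superposition shows up as non-trivial mass on more than one task
```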
Ok, so what is proposed in the research paper?
Notably, even when trained to learn one task at a time, LLMs can exhibit task superposition. This was demonstrated by training a GPT-2 model on retrieval tasks and observing its ability to generalize and perform multiple retrieval tasks simultaneously.
Theoretical analysis shows that the Transformer architecture has the inherent capacity for task superposition. The authors give an explicit seven-layer Transformer construction that implements multiple tasks in parallel and weights their outputs by the prevalence of each task within the input context.
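A toy way to picture what that weighting means: treat the model's answer distribution as a mixture of per-task output distributions, weighted by how often each task appears in the prompt. The numbers below are made up for illustration; this is not the paper's actual construction.

```python
# Toy mixture picture (assumption: made-up per-task distributions and demo
# counts; illustrates the prevalence-weighting idea only).
import numpy as np

answers = ["CAT", "cat cat", "chat"]          # candidate answers for the input "cat"
p_uppercase = np.array([0.98, 0.01, 0.01])    # task 1: uppercase the word
p_repeat    = np.array([0.01, 0.98, 0.01])    # task 2: repeat the word
p_translate = np.array([0.01, 0.01, 0.98])    # task 3: translate to French

# Suppose the prompt contains 4, 2, and 2 demonstrations of the three tasks.
alpha = np.array([4.0, 2.0, 2.0])
alpha /= alpha.sum()                          # prevalence weights: [0.5, 0.25, 0.25]

# Superposed output: a prevalence-weighted mixture of the per-task distributions.
p_out = alpha @ np.stack([p_uppercase, p_repeat, p_translate])
print(dict(zip(answers, p_out)))              # ~0.50 on "CAT", ~0.25 on each of the others
```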
Larger LLMs exhibit superior task superposition capabilities. As model size increases, they can solve more tasks concurrently and more accurately align their output distribution with the distribution of tasks in the input.
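One simple way to quantify that alignment is to compare the task mix in the sampled outputs against the task mix in the prompt, for example with a total-variation distance. The tallies below are hypothetical stand-ins for the sampling sketch above; the paper's exact metric may differ.

```python
# Sketch of one alignment measure (assumption: made-up tallies standing in
# for the sampling sketch above; not necessarily the paper's metric).
import numpy as np

in_context = np.array([0.5, 0.5])            # 2 uppercase vs. 2 repeat demos in the prompt
sampled = {"uppercase": 31, "repeat": 19}    # hypothetical tallies over 50 samples
observed = np.array([sampled["uppercase"], sampled["repeat"]]) / sum(sampled.values())

# Total-variation distance: 0 means the output mix matches the in-context mix exactly.
tv = 0.5 * np.abs(in_context - observed).sum()
print(f"TV distance: {tv:.2f}")              # 0.12 for these made-up counts
```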
What’s next?
A major limitation is "generation collapse," where the model focuses on a single task after generating the first token. Future research should focus on decoding strategies that prevent this collapse and allow LLMs to maintain their multi-task state during the entire generation process.
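Here is a rough sketch of how one could observe that collapse with the same toy prompt as above: the first answer token can carry probability mass for both tasks, but once one task's token is committed, the continuation tends to follow only that task. Again, gpt2 and the toy tasks are assumptions for illustration; the paper evaluates much larger models, and a model this small may not show the effect cleanly.

```python
# Rough sketch of probing "generation collapse" (assumptions: gpt2 and the
# toy mixed-task prompt from the earlier sketch; illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
prompt = "cat -> CAT\ndog -> dog dog\nsun -> SUN\nsky -> sky sky\nmaple ->"

def next_token_probs(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return torch.softmax(model(ids).logits[0, -1], dim=-1)

upper_id = tok.encode(" MAPLE")[0]   # first token of the uppercase answer
lower_id = tok.encode(" maple")[0]   # first token of the repeat answer

# At the first answer token, both tasks can receive probability mass...
p1 = next_token_probs(prompt)
print("before committing:", p1[upper_id].item(), p1[lower_id].item())

# ...but once a first token is committed, the rest of the generation tends to
# follow only that task, which is the collapse described in the paper.
p2 = next_token_probs(prompt + " maple")
print("after ' maple':", p2[lower_id].item(), p2[upper_id].item())
```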
Further investigation is needed to understand how LLMs internally represent and manage task superposition. While task vectors offer some insights, more research is required to uncover the full mechanisms at play.
LLMs' ability to “think in superposition” is currently underutilized
Learned something new? Consider sharing it!