WiM helps LLMs read between the lines
Paper: Writing in the Margins: Better Inference Pattern for Long Context Retrieval (16 Pages)
Github: https://github.com/writer/writing-in-the-margins
Researchers from Writer Inc propose an inference pattern called "Writing in the Margins" (WiM), which generates query-based extractive summaries ("margins") at each step of the chunked prefill process.
Hmm.. What's the background?
Large Language Models (LLMs) face challenges when processing lengthy input sequences due to their fixed context windows and attention mechanisms. This research endeavors to bridge the gap between efficient transformer architecture research and the development of new prompting strategies.
It specifically focuses on creating a novel key-value (KV) cache-aware reasoning pattern for existing long-context window LLMs. This is particularly relevant for retrieval-oriented tasks, where the context is substantial, and the instructional prompt is comparatively short.
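To make "KV cache-aware" concrete: during prefill the model builds a key-value (KV) cache over the context, and chunked prefill grows that cache one segment at a time instead of in a single pass. Below is a minimal sketch of this idea, assuming a Hugging Face causal LM; the model name ("gpt2"), placeholder text, and chunk size are illustrative assumptions, not the paper's setup:

```python
# Minimal sketch of chunked prefill, assuming a Hugging Face causal LM:
# the key-value (KV) cache is grown one segment at a time instead of
# prefilling the whole context in a single forward pass.
# "gpt2", the placeholder text, and the chunk size are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

long_context = "The quick brown fox jumps over the lazy dog. " * 20
input_ids = tokenizer(long_context, return_tensors="pt").input_ids

past_key_values = None
chunk_size = 64  # tokens per prefill segment
with torch.no_grad():
    for start in range(0, input_ids.shape[1], chunk_size):
        segment = input_ids[:, start:start + chunk_size]
        out = model(segment, past_key_values=past_key_values, use_cache=True)
        past_key_values = out.past_key_values  # cache now covers all segments so far

# The cache can now be reused to decode an answer (or, in WiM, to write a
# "margin" after each segment) without re-processing the earlier context.
```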
Ok, so what is proposed in the research paper?
Here's a breakdown of how WiM works:
The core idea of WiM is to divide the long context into smaller segments and process them sequentially, leveraging the "chunked prefill" mechanism
An extractive summary prompt instructs the model to extract relevant information from each segment related to the user's query
To avoid accumulating irrelevant information, WiM employs a classifier to determine the relevance of each margin to the user's query
The final prompt aggregates the relevant margins and presents them to the LLM along with the user's original instruction (a rough end-to-end sketch follows this list)
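Putting those steps together, here is a rough, model-agnostic sketch of the WiM loop. The `generate` callable is a hypothetical stand-in for any LLM completion call, and the prompts are illustrative, not the paper's exact templates:

```python
from typing import Callable, List

def writing_in_the_margins(
    generate: Callable[[str], str],  # hypothetical LLM call: prompt -> completion
    long_context: str,
    query: str,
    chunk_size: int = 4000,          # characters per segment, for illustration
) -> str:
    # 1. Split the long context into segments (in the paper this corresponds to
    #    chunked prefill, where each segment extends the shared KV cache).
    chunks = [long_context[i:i + chunk_size]
              for i in range(0, len(long_context), chunk_size)]

    margins: List[str] = []
    for chunk in chunks:
        # 2. Query-based extractive summary ("margin") for this segment.
        margin = generate(
            f"Context segment:\n{chunk}\n\n"
            f"Extract only the information relevant to: {query}"
        )
        # 3. Classify the margin's relevance; drop it if it adds nothing.
        verdict = generate(
            f"Question: {query}\nNote: {margin}\n"
            "Is this note relevant to the question? Answer YES or NO."
        )
        if verdict.strip().upper().startswith("YES"):
            margins.append(margin)

    # 4. Final prompt: aggregate the surviving margins with the original query.
    notes = "\n".join(f"- {m}" for m in margins)
    return generate(
        f"Notes extracted from the document:\n{notes}\n\n"
        f"Answer the question: {query}"
    )
```

Any completion wrapper can be plugged in as `generate`; in the paper's actual setup the margins are produced from the growing KV cache during chunked prefill, so the long context only needs to be read once.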
WiM consistently outperforms baseline methods (LLM and RAG) in multi-hop reasoning and aggregation tasks. It achieves an average of 7.5% improvement in accuracy for reasoning skills (HotpotQA, MultiHop-RAG) and more than a 30.0% increase in the F1-score for aggregation tasks (CWE).
What's next?
The researchers remark that these could be directions for future work:
KV Cache Management, i.e. refining the management of the key-value (KV) cache, could further enhance WiM's efficiency
Fine-Tuning for Extraction and Classification
WiM could be adapted to other modalities, such as vision or audio
So essentially,
WiM consistently outperforms baseline methods (LLM and RAG) in multi-hop reasoning and aggregation tasks.
Learned something new? Consider sharing with your friends!