WiM helps LLMs read between the lines
Paper: Writing in the Margins: Better Inference Pattern for Long Context Retrieval (16 Pages)
Github: https://github.com/writer/writing-in-the-margins
Researchers from Writer Inc propose an inference pattern called "Writing in the Margins" (WiM), which generates query-based extractive summaries ("margins") at each step of the chunked prefill process.
Hmm.. What's the background?
Large Language Models (LLMs) face challenges when processing lengthy input sequences due to their fixed context windows and attention mechanisms. This research endeavors to bridge the gap between efficient transformer architecture research and the development of new prompting strategies.
It specifically focuses on creating a novel key-value (KV) cache-aware reasoning pattern for existing long-context window LLMs. This is particularly relevant for retrieval-oriented tasks, where the context is substantial, and the instructional prompt is comparatively short.
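To make "KV cache-aware" concrete: during prefill the model builds a key-value (KV) cache over the context, and chunked prefill grows that cache one segment at a time instead of in a single pass. Below is a minimal sketch of this idea, assuming a Hugging Face causal LM; the model name ("gpt2"), placeholder text, and chunk size are illustrative assumptions, not the paper's setup:

```python
# Minimal sketch of chunked prefill, assuming a Hugging Face causal LM:
# the key-value (KV) cache is grown one segment at a time instead of
# prefilling the whole context in a single forward pass.
# "gpt2", the placeholder text, and the chunk size are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

long_context = "The quick brown fox jumps over the lazy dog. " * 20
input_ids = tokenizer(long_context, return_tensors="pt").input_ids

past_key_values = None
chunk_size = 64  # tokens per prefill segment
with torch.no_grad():
    for start in range(0, input_ids.shape[1], chunk_size):
        segment = input_ids[:, start:start + chunk_size]
        out = model(segment, past_key_values=past_key_values, use_cache=True)
        past_key_values = out.past_key_values  # cache now covers all segments so far

# The cache can now be reused to decode an answer (or, in WiM, to write a
# "margin" after each segment) without re-processing the earlier context.
```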
Ok, so what is proposed in the research paper?
Here's a breakdown of how WiM works:
The core idea of WiM is to divide the long context into smaller segments and process them sequentially, leveraging the "chunked prefill" mechanism
An extractive summary prompt instructs the model to extract relevant information from each segment related to the user's query
To avoid accumulating irrelevant information, WiM employs a classifier to determine the relevance of each margin to the user's query
The final prompt aggregates the relevant margins and presents them to the LLM along with the user's original instruction (a rough end-to-end sketch follows this list)
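Putting those steps together, here is a rough, model-agnostic sketch of the WiM loop. The `generate` callable is a hypothetical stand-in for any LLM completion call, and the prompts are illustrative, not the paper's exact templates:

```python
from typing import Callable, List

def writing_in_the_margins(
    generate: Callable[[str], str],  # hypothetical LLM call: prompt -> completion
    long_context: str,
    query: str,
    chunk_size: int = 4000,          # characters per segment, for illustration
) -> str:
    # 1. Split the long context into segments (in the paper this corresponds to
    #    chunked prefill, where each segment extends the shared KV cache).
    chunks = [long_context[i:i + chunk_size]
              for i in range(0, len(long_context), chunk_size)]

    margins: List[str] = []
    for chunk in chunks:
        # 2. Query-based extractive summary ("margin") for this segment.
        margin = generate(
            f"Context segment:\n{chunk}\n\n"
            f"Extract only the information relevant to: {query}"
        )
        # 3. Classify the margin's relevance; drop it if it adds nothing.
        verdict = generate(
            f"Question: {query}\nNote: {margin}\n"
            "Is this note relevant to the question? Answer YES or NO."
        )
        if verdict.strip().upper().startswith("YES"):
            margins.append(margin)

    # 4. Final prompt: aggregate the surviving margins with the original query.
    notes = "\n".join(f"- {m}" for m in margins)
    return generate(
        f"Notes extracted from the document:\n{notes}\n\n"
        f"Answer the question: {query}"
    )
```

Any completion wrapper can be plugged in as `generate`; in the paper's actual setup the margins are produced from the growing KV cache during chunked prefill, so the long context only needs to be read once.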
WiM consistently outperforms baseline methods (LLM and RAG) in multi-hop reasoning and aggregation tasks. It achieves an average of 7.5% improvement in accuracy for reasoning skills (HotpotQA, MultiHop-RAG) and more than a 30.0% increase in the F1-score for aggregation tasks (CWE).
What's next?
The researchers remark that these could be directions for future work:
KV Cache Management, i.e. refining the management of the key-value (KV) cache, could further enhance WiM's efficiency
Fine-Tuning for Extraction and Classification
WiM could be adapted to other modalities, such as vision or audio
So essentially,
WiM consistently outperforms baseline methods (LLM and RAG) in multi-hop reasoning and aggregation tasks.
Learned something new? Consider sharing with your friends!