Be a Scientific Scholar
OpenScholar generates scientific literature with citations (which could be better)
Paper: OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs (53 Pages)
Demo: https://openscholar.allen.ai/
Asai et al. introduce OpenScholar, an open, retrieval-augmented language model (LM) designed to synthesize information from scientific literature.
Hmm… What’s the background?
Scientists face a challenge in staying abreast of the latest research due to the ever-growing volume of scientific literature. Existing large language models are prone to fabricating citations and relying on outdated data. OpenScholar was created to address these limitations by providing a reliable, transparent, and up-to-date method for synthesizing information from scientific papers.
So what is proposed in the research paper?
The paper builds OpenScholar around several key components:
OpenScholar draws on a massive datastore of 45 million open-access scientific papers from Semantic Scholar, pre-processed into 237 million passages and embedded with a specially trained bi-encoder model (θbi). This domain-specific datastore, the largest open-sourced one to date, provides broader coverage and better retrieval than general-purpose datastores.
It employs a multi-stage retrieval process to identify and rank relevant passages.
It uses an iterative self-feedback mechanism to refine the generated response.
The authors also introduce ScholarQABench, a large-scale benchmark for evaluating literature search and synthesis systems.
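The retrieve-then-refine loop described above can be sketched in miniature. This is a toy illustration, not the paper's implementation: the `retrieve` function stands in for bi-encoder similarity search over the 237M-passage datastore using simple word overlap, and `generate_with_feedback` (a hypothetical name) mimics the self-feedback cycle by re-retrieving with the current draft and revising.

```python
def retrieve(query, passages, k=2):
    # Stand-in for bi-encoder retrieval: score each passage by word
    # overlap with the query and return the top-k. The real system
    # embeds passages with a trained bi-encoder and uses vector search.
    query_words = set(query.lower().split())
    scored = sorted(
        passages,
        key=lambda p: len(query_words & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

def generate_with_feedback(query, passages, rounds=2):
    # Toy self-feedback loop: draft an answer from retrieved context,
    # then on each round retrieve again using the draft plus the query
    # (so feedback can surface new evidence) and refine the draft.
    context = retrieve(query, passages)
    answer = f"Draft answer to '{query}' citing {len(context)} passages."
    for _ in range(rounds):
        context = retrieve(answer + " " + query, passages, k=3)
        answer = f"Refined answer to '{query}' citing {len(context)} passages."
    return answer, context

# Usage: the most relevant passages surface first, then get refined into an answer.
papers = [
    "retrieval augmented language models",
    "protein folding advances",
    "citation accuracy in language models",
]
top = retrieve("language models citation", papers, k=2)
answer, context = generate_with_feedback("language models citation", papers)
```

The key design point mirrored here is that retrieval happens more than once: later rounds condition on the model's own draft, which is how the self-feedback mechanism can pull in passages the initial query missed.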
OpenScholar, despite being an open and smaller model, outperforms both GPT-4o and PaperQA2 in correctness and citation accuracy when evaluated on ScholarQABench. It achieves citation accuracy on par with human experts, drastically reducing hallucinated citations (GPT-4o fabricates citations 78-90% of the time).
In human evaluations, experts preferred responses generated by OpenScholar (with both an 8B model and GPT-4o as the generator) over expert-written responses more than 50% of the time. Because it relies on efficient retrieval pipelines and lightweight models, OpenScholar is also significantly more cost-effective than proprietary alternatives like PaperQA2.
What’s next?
The developers acknowledge that OpenScholar doesn't always retrieve the most relevant papers and suggest incorporating citation networks and publication metadata to enhance retrieval.
The current evaluation dataset with human-written answers is relatively small, potentially introducing bias and limiting statistical power. Future work aims to expand the dataset's size and scope.
Learned something new? Consider sharing it!