Paper: HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale (23 Pages)
Github: https://github.com/FreedomIntelligence/HuatuoGPT-Vision
Researchers from Shenzhen want to build better multimodal large language models (MLLMs) for medicine. The paper introduces PubMedVision, a new large-scale medical multimodal dataset of 1.3 million medical VQA samples constructed from PubMed data, and HuatuoGPT-Vision, a medical MLLM trained on it.
Hmm.. What’s the background?
MLLMs, while demonstrating progress, exhibit limited performance in medical applications, especially with visual data. This limitation primarily stems from the low quality and low availability of datasets containing medical images paired with text.
PubMed, a free search engine that primarily accesses the MEDLINE database, is a valuable resource for medical image-text pairs thanks to its vast collection of de-identified data reflecting cutting-edge medical knowledge. However, this data contains a lot of noise.
OK, so what is proposed in the research paper?
This paper introduces PubMedVision, a large-scale, high-quality medical multimodal dataset comprising 1.3 million medical VQA (Visual Question Answering) samples from refined PubMed data.
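The dataset is built with an "unblinded" approach: instead of having a text-only LLM rewrite captions without ever seeing the images, it uses GPT-4V's multimodal capabilities to look at each image together with its text while denoising and reformatting the data. The result is a more accurate and larger-scale medical VQA dataset.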
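To make the idea concrete, here is a minimal sketch of what an unblinded reformatting loop could look like. It is not the authors' pipeline: the `ImageTextPair` and `VQASample` types, the caption-length filter, the prompt wording, and the `Q:`/`A:` reply format are all assumptions made for illustration, and `model` is a hypothetical callable standing in for whatever multimodal model (GPT-4V in the paper) receives the image together with the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ImageTextPair:
    image_path: str  # path to a figure extracted from a paper (hypothetical layout)
    caption: str     # raw caption or context text, possibly noisy


@dataclass
class VQASample:
    image_path: str
    question: str
    answer: str


# Hypothetical interface: the model receives the image AND the prompt,
# which is what makes the reformatting "unblinded".
MultimodalModel = Callable[[str, str], str]


def is_usable(pair: ImageTextPair, min_caption_words: int = 5) -> bool:
    """Crude noise filter (an assumption): drop pairs with no image or a tiny caption."""
    return bool(pair.image_path) and len(pair.caption.split()) >= min_caption_words


def build_prompt(caption: str) -> str:
    """Ask for one question/answer pair grounded in the image and its caption."""
    return (
        "You are given a medical image and its caption.\n"
        f"Caption: {caption}\n"
        "Write one question about the image and its answer, "
        "formatted as 'Q: ...' and 'A: ...' on separate lines."
    )


def parse_reply(reply: str) -> Optional[Tuple[str, str]]:
    """Extract the question and answer; return None if the reply is malformed."""
    question, answer = None, None
    for line in reply.splitlines():
        line = line.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:"):
            answer = line[2:].strip()
    if question and answer:
        return question, answer
    return None


def reformat_unblinded(pairs: List[ImageTextPair], model: MultimodalModel) -> List[VQASample]:
    """Filter noisy pairs, then let the model see each image while writing the VQA sample."""
    samples = []
    for pair in pairs:
        if not is_usable(pair):
            continue
        parsed = parse_reply(model(pair.image_path, build_prompt(pair.caption)))
        if parsed is not None:
            samples.append(VQASample(pair.image_path, parsed[0], parsed[1]))
    return samples
```

A "blinded" variant would pass only the caption to a text-only model; the one structural difference in this sketch is that `model` also receives `image_path`.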
HuatuoGPT-Vision, the MLLM trained on PubMedVision, shows significant improvements on benchmarks such as the MMMU Health & Medicine track. Manual evaluations by medical experts and empirical results further confirm that PubMedVision's data quality is superior to that of other medical VQA datasets and data construction methods.
What’s next?
The authors acknowledge the limitations of relying solely on GPT-4V for generating the PubMedVision dataset and encourage future research to explore using an ensemble of MLLMs with diverse architectures and training data. Future work could also involve developing more robust validation techniques or incorporating mechanisms to identify and correct inaccuracies introduced during the data generation process.
So essentially,
HuatuoGPT-Vision shows promise for medical applications with visual data!