So essentially,
"Platypus LLMs, trained on newly released OpenPlatypus dataset, are leading the HuggingFace OpenLLM LeaderBoard"
Paper: Platypus: Quick, Cheap, and Powerful Refinement of LLMs [17 Pages]
Researchers from Boston University have taken the top spot on HuggingFace's Open LLM Leaderboard. (August 2023)
The main question they tackled was:
How can we fully optimize the LLM dataset and training processes to produce superior LLM models?
In the paper, they describe the following as their main contributions:
Their curated dataset Open-Platypus (now released) for STEM tasks
Their process of fine-tuning and merging LoRA modules, which conserves the strong prior of pre-trained LLMs while bringing specific domain knowledge to the surface (see the merging sketch after this list)
Their methods of checking for test data leaks and contamination in the training data
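To make the LoRA-merging idea concrete, here is a minimal sketch of folding a fine-tuned LoRA adapter back into its base model with Hugging Face PEFT. The model and adapter names are placeholders, and this illustrates the general mechanism rather than the paper's exact merging recipe.

```python
# Minimal sketch: merge a trained LoRA adapter into its base model with PEFT.
# Repo names below are placeholders, not the paper's actual checkpoints.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_name = "meta-llama/Llama-2-13b-hf"        # assumed base model
adapter_name = "your-org/platypus-style-lora"  # hypothetical LoRA adapter repo

base = AutoModelForCausalLM.from_pretrained(base_name, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(base_name)

# Attach the trained LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base, adapter_name)

# Fold the low-rank updates into the base weights, yielding a plain
# Transformers model that can be saved, evaluated, or merged further.
merged = model.merge_and_unload()
merged.save_pretrained("platypus-merged-13b")
tokenizer.save_pretrained("platypus-merged-13b")
```

Because the low-rank updates are additive, merging them back is cheap compared to full fine-tuning, which is what makes combining several domain-specific adapters on one shared base practical.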
The Platypus family of LLMs achieves strong performance on quantitative LLM benchmarks across model sizes, using just a fraction of the fine-tuning data and overall compute required by other state-of-the-art fine-tuned LLMs.
A 13B Platypus model can be trained on a single A100 GPU with 25k questions in 5 hours. The dataset and code are available at https://platypus-llm.github.io
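For a sense of what such a single-GPU LoRA fine-tune looks like in practice, here is a rough sketch using Transformers and PEFT. The hyperparameters, target modules, and field names are illustrative assumptions rather than the paper's exact settings, and the dataset repo id is assumed.

```python
# Rough sketch of a single-GPU LoRA fine-tune; settings are illustrative only.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model

base_name = "meta-llama/Llama-2-13b-hf"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(base_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(base_name, torch_dtype=torch.float16)

# Low-rank adapters on a subset of projection layers; only these are trained.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["gate_proj", "up_proj", "down_proj"],  # illustrative choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()

# Open-Platypus as published on the Hugging Face Hub (repo id assumed).
data = load_dataset("garage-bAInd/Open-Platypus", split="train")

def tokenize(example):
    # Field names assumed to follow the usual instruction/output format.
    text = example["instruction"] + "\n" + example["output"]
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = data.map(tokenize, remove_columns=data.column_names)

args = TrainingArguments(
    output_dir="platypus-lora-13b",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    learning_rate=4e-4,
    fp16=True,
    logging_steps=10,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

Training only the adapter parameters is what keeps the run within a single A100's memory and a few GPU-hours, in line with the numbers quoted above.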
They curated the dataset with specific goals. By focusing on depth in specific areas, diversity of input prompts, and a small training set, they aimed to maximize the precision and relevance of their models' outputs.
To achieve this, they generated Open-Platypus, a content-filtered, instruction-tuned dataset drawn from a variety of open-source sources and focused on improving LLMs' STEM and logic knowledge. It is built from 11 open-source datasets and is composed mainly of human-designed questions, with only about 10% generated by an LLM. They reduced data redundancy, checked the training data for contamination against important LLM test sets, and described their filtering process so that others can avoid the same pitfalls.
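One way to implement the kind of redundancy and contamination check described above is to embed training and benchmark questions and flag near-duplicates by cosine similarity. The sketch below uses sentence-transformers; the embedding model and the 0.8 threshold are assumptions for illustration, not necessarily the paper's exact procedure.

```python
# Hedged sketch: flag training questions that look too similar to benchmark
# test questions. Model choice and threshold are assumptions.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

# Toy examples; in practice these would be the full training set and the
# benchmark test sets being screened against.
train_questions = [
    "What is the derivative of x^2 with respect to x?",
    "Name the largest planet in the solar system.",
]
test_questions = [
    "Differentiate f(x) = x^2.",
    "Which planet in our solar system is the largest?",
]

train_emb = encoder.encode(train_questions, convert_to_tensor=True,
                           normalize_embeddings=True)
test_emb = encoder.encode(test_questions, convert_to_tensor=True,
                          normalize_embeddings=True)

# Cosine similarity between every training question and every test question.
scores = util.cos_sim(train_emb, test_emb)

THRESHOLD = 0.8  # assumed cutoff for flagging potential contamination
flagged = [i for i, row in enumerate(scores) if row.max().item() >= THRESHOLD]
print(f"{len(flagged)} training questions flagged for manual review")
```

Items flagged this way can then be removed or reviewed by hand, which is the spirit of the leakage checks the authors describe.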
One significant limitation of this approach, especially for domain-specific models derived from large, pre-trained ones, is that fine-tuning can be time-consuming and costly, and retraining or further fine-tuning on additional datasets has its own constraints.
Here are the results from the paper:
Their future work might delve deeper into the nuances of model merging, especially for models with similar baseline scores. This includes exploring integration with Alpaca- and Orca-style datasets, examining the potential of QLoRA in the pipeline, investigating the LIMA strategy within the LoRA and PEFT landscape, and potentially leveraging models like Lazarus, a successful LoRA merge of six models.