Llama is a Pro?
So essentially,
the Llama Pro model won't forget old info when it's trained on new info
Paper: LLaMA Pro: Progressive LLaMA with Block Expansion (21 pages)
Researchers from The University of Hong Kong, Tencent PCG, Shanghai Jiao Tong University, and Beijing Language and Culture University want to make sure Llama models don't forget what they already know when they are trained on a new domain.
Hmm.. What's the background?
LLMs have demonstrated remarkable proficiency in a wide range of natural language processing tasks, but they often lack domain-specific knowledge. This has motivated research on adapting LLMs to specific domains through post-pretraining. Ideally, LLaMA should acquire new skills without compromising the old ones, unlike the forgetting we currently see when going from LLaMA to CodeLLaMA.
Ok, so what is proposed in the research paper?
The paper proposes a novel method for specializing LLMs in specific domains by expanding the model's capacity and fine-tuning the expanded blocks using domain-specific data.
The method is called "block expansion": it increases the LLM's capacity by interleaving new, identity-initialized Transformer blocks among the existing ones.
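To make that concrete, here is a minimal PyTorch-style sketch of block expansion. It assumes the decoder keeps its Transformer blocks in an nn.ModuleList with HuggingFace-style LLaMA attribute names (self_attn.o_proj, mlp.down_proj); the expand_blocks helper and the num_groups parameter are illustrative, not the authors' actual code.

```python
import copy
import torch.nn as nn

def expand_blocks(blocks: nn.ModuleList, num_groups: int) -> nn.ModuleList:
    """Append one identity-initialized copy of a block after each group of blocks."""
    group_size = len(blocks) // num_groups
    expanded = []
    for start in range(0, len(blocks), group_size):
        group = list(blocks[start:start + group_size])
        expanded.extend(group)

        # Copy the last block of the group and zero its output projections.
        # Because of the residual connections, the copied block then acts as
        # an identity function, so the expanded model initially produces the
        # same outputs as the original one.
        new_block = copy.deepcopy(group[-1])
        nn.init.zeros_(new_block.self_attn.o_proj.weight)  # attention output projection
        nn.init.zeros_(new_block.mlp.down_proj.weight)     # MLP down projection
        expanded.append(new_block)
    return nn.ModuleList(expanded)
```

In the paper, LLaMA2-7B's 32 blocks are expanded this way to 40, giving the 8.3B-parameter LLaMA Pro.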
These expanded blocks are then fine-tuned on domain-specific data, while the original blocks stay frozen, so the LLM adapts to the new domain efficiently.
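And a rough sketch of that fine-tuning step, under the same assumptions: the expanded model keeps its blocks in model.layers, new_block_indices records where the added blocks sit, and the learning rate is just a placeholder.

```python
import torch

def train_only_new_blocks(model, new_block_indices, lr=2e-4):
    """Freeze the original LLaMA weights and optimize only the added blocks."""
    # Freeze every parameter, including embeddings and the original blocks,
    # so the pretrained general-purpose knowledge cannot be overwritten.
    for param in model.parameters():
        param.requires_grad = False

    # Un-freeze only the newly added (identity-initialized) blocks.
    for idx in new_block_indices:
        for param in model.layers[idx].parameters():
            param.requires_grad = True

    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=lr)
```

Since the original blocks are never updated, the general abilities of the base model are preserved by construction, which is where the "won't forget old info" claim comes from.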
The resulting model, called LLaMA Pro, achieves state-of-the-art performance on a variety of benchmarks. The paper also empirically demonstrates the effectiveness of the method on the tasks of code generation, mathematics problem solving, and general-purpose reasoning.
And what's next?
The researchers acknowledge that one limitation is that it may be computationally expensive to train large language models with a large number of added blocks.
The researchers plan to address continual learning beyond the pretraining phase and to use parameter-efficient fine-tuning methods to reduce the computational cost of adapting LLMs.
So essentially,
the Llama Pro model won't forget old info when it's trained on new info