Paper #6
LLaMA: Open and Efficient Foundation Language Models (2023)
AI-generated
Meta's LLaMA showed that smaller models, trained efficiently on more data, can match or beat much larger proprietary models. By releasing the weights to the research community, it democratized access to powerful language models and spawned an entire ecosystem of open AI development.
LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. The key insight is that model performance depends not just on parameter count but on the amount of training data and compute. By training smaller models on far more tokens than the compute-optimal recipes of the time prescribed (up to 1.4 trillion tokens), LLaMA achieved performance comparable to much larger models while remaining cheaper to run at inference.
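As a rough sanity check (my own back-of-the-envelope sketch, not a figure from the paper), the widely used approximation that training a dense transformer costs about 6 FLOPs per parameter per token puts the largest LLaMA training run in the 10²³-FLOP range:

```python
# Rough training-compute estimate using the common approximation
# C ≈ 6 * N * D (6 FLOPs per parameter per training token).
# Parameter and token counts come from the LLaMA paper; the 6ND
# rule itself is an external rule of thumb, not something the
# paper reports.

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6 * n_params * n_tokens

# 65B params on 1.4T tokens; 13B params on 1.0T tokens.
llama_65b = train_flops(65e9, 1.4e12)
llama_13b = train_flops(13e9, 1.0e12)

print(f"LLaMA-65B: ~{llama_65b:.2e} FLOPs")  # ~5.46e+23
print(f"LLaMA-13B: ~{llama_13b:.2e} FLOPs")  # ~7.80e+22
```

The point of the arithmetic: spending extra compute as *more tokens on a smaller model*, rather than more parameters, buys a model that is permanently cheaper to serve.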
The LLaMA-13B model, with only 13 billion parameters, outperformed GPT-3 (175B parameters) on most benchmarks despite being more than 10× smaller. LLaMA-65B was competitive with the strongest models of the time, such as Chinchilla-70B and PaLM-540B.
LLaMA catalyzed the open-source AI movement. After its release, researchers and developers fine-tuned it into hundreds of variants: Alpaca, Vicuna, Koala, WizardLM, and many more. This created a thriving ecosystem where anyone could run, modify, and study powerful language models.
It proved that you do not need hundreds of billions of parameters to get excellent performance. Smart training decisions (more data, longer training) can compensate for fewer parameters. This has practical implications: smaller models are cheaper to run, faster to respond, and can run on consumer hardware.
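To make the "runs on consumer hardware" point concrete, here is a back-of-the-envelope sketch (my own illustration, not figures from the paper): raw weight storage is roughly bytes-per-parameter times parameter count, which is why 4-bit quantized 7B and 13B variants fit on a single consumer GPU.

```python
# Approximate memory needed just to hold the model weights at
# different precisions. Ignores activations, KV cache, and runtime
# overhead, so real requirements are higher; illustrative only.

GIB = 1024**3

def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Memory in GiB to store n_params weights at a given precision."""
    return n_params * bytes_per_param / GIB

for size_b in (7, 13, 33, 65):
    fp16 = weight_memory_gib(size_b * 1e9, 2)    # 16-bit weights
    q4 = weight_memory_gib(size_b * 1e9, 0.5)    # 4-bit quantized
    print(f"{size_b}B: ~{fp16:.1f} GiB fp16, ~{q4:.1f} GiB 4-bit")
```

At 4-bit precision the 7B model's weights occupy only a few GiB, which is what made the post-LLaMA wave of laptop and single-GPU inference tools practical.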
LLaMA 2 and LLaMA 3 followed, each expanding the open-source frontier further.
Authors: Hugo Touvron, Thibaut Lavril, Gautier Izacard, and 20 others (Meta AI).
Model sizes: 7B, 13B, 33B, and 65B parameters.
Training data: up to 1.4 trillion tokens from publicly available sources (1.0T for the 7B and 13B models; 1.4T for 33B and 65B).
Link to paper: https://arxiv.org/abs/2302.13971
Meta AI blog: "Introducing LLaMA" - https://ai.meta.com/blog/large-language-model-llama-meta-ai/
Hugging Face: LLaMA models - https://huggingface.co/meta-llama