Paper #8
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (2020)
AI-generated
RAG (Retrieval-Augmented Generation) combines a language model with an external knowledge retrieval system. Instead of relying solely on knowledge stored in its parameters, the model retrieves relevant documents at inference time and uses them to generate better, more factual responses.
RAG consists of two components: a retriever and a generator. The retriever searches a document corpus for passages relevant to the input query. The generator (a sequence-to-sequence model) takes both the original input and the retrieved passages as context and generates the output.
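The two-component flow can be sketched as a toy pipeline. Everything here is illustrative: the retriever is a simple word-overlap ranker standing in for a real dense retriever, and the generator just assembles a prompt where a real system would call a seq2seq model.

```python
from typing import List

def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Stand-in retriever: rank passages by word overlap with the query.
    A production RAG system would use a dense retriever over a vector index."""
    q = set(query.lower().split())
    return sorted(corpus, key=lambda p: -len(q & set(p.lower().split())))[:k]

def generate(query: str, passages: List[str]) -> str:
    """Stand-in generator: builds the combined input that a real
    sequence-to-sequence model would condition on."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "RAG was introduced by Facebook AI Research in 2020.",
    "Dense retrieval encodes text into vectors.",
    "Bananas are rich in potassium.",
]
query = "Who introduced RAG?"
prompt = generate(query, retrieve(query, corpus))
print(prompt)
```

The key design point is that retrieval happens per query at inference time, so the generator's context changes as the corpus changes, with no retraining.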
The retriever is a dense passage retriever (DPR) that encodes documents and queries into vectors and finds the closest matches by inner product. The generator is a pre-trained sequence-to-sequence model (BART in the paper) fine-tuned to use the retrieved context.
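Dense retrieval reduces to nearest-neighbor search in embedding space. The sketch below uses a toy bag-of-words "encoder" in place of DPR's BERT-based query and passage encoders, purely to show the encode-then-score mechanics.

```python
import numpy as np

def embed(text, vocab):
    """Toy bag-of-words 'encoder': a stand-in for a learned dense
    encoder. Returns a unit-normalized term-count vector."""
    vec = np.zeros(len(vocab))
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

corpus = [
    "the eiffel tower is in paris",
    "python is a programming language",
    "the moon orbits the earth",
]
vocab = {w: i for i, w in enumerate(sorted({w for d in corpus for w in d.split()}))}

# Encode all passages once, offline; at query time, encode the query
# and take the passages with the highest inner product.
doc_vecs = np.stack([embed(d, vocab) for d in corpus])

def retrieve(query, k=1):
    scores = doc_vecs @ embed(query, vocab)
    return [corpus[i] for i in np.argsort(-scores)[:k]]

print(retrieve("where is the eiffel tower"))
```

At real scale, the inner-product search is done with an approximate nearest-neighbor index (the paper uses FAISS) rather than a dense matrix multiply.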
RAG addresses two of the biggest problems with language models: hallucination and outdated knowledge. A standard LLM can draw only on knowledge from its training data, which has a cutoff date and is stored in model weights in a lossy, compressed form. RAG gives the model access to external, up-to-date, verifiable information at generation time.
This pattern is now ubiquitous in production AI applications. When you use ChatGPT with web browsing, Perplexity AI, or any enterprise AI that answers questions about your company's documents, you are using RAG.
The practical implications are enormous: you can build AI systems that answer questions about private databases, recent events, or specialized domains without retraining the model.
Authors: Patrick Lewis, Ethan Perez, Aleksandra Piktus, and 9 others (Facebook AI Research, University College London, New York University).
Link to paper: https://arxiv.org/abs/2005.11401
LangChain: RAG tutorial - https://python.langchain.com/docs/tutorials/rag/
Pinecone: "What is RAG?" - https://www.pinecone.io/learn/retrieval-augmented-generation/