Nvidia

LangChain with Local Llama 2 Model

9 snippets

Step 8 - Run a query and get LLM response with source

This sends a natural language question to the RetrievalQA chain and prints the LLM’s response and its supporting source document.

query = "When is the film Titanic being made ?"
#query = "Who is the director for the film?"
llm_response = qa_chain(query)
print('llm response after retrieve from KB, the answer is...
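To make the shape of the response concrete, here is a minimal sketch of formatting it. With source documents enabled, a RetrievalQA chain returns a dict with the keys "query", "result", and "source_documents". The `format_response` helper, the placeholder strings, and the "source" metadata key are assumptions for illustration, and `SimpleNamespace` stands in for a LangChain document so the sketch runs without a model.

```python
from types import SimpleNamespace


def format_response(llm_response):
    """Render the answer, then list where each supporting chunk came from."""
    lines = [llm_response["result"].strip(), "", "Sources:"]
    for doc in llm_response["source_documents"]:
        # 'source' as a metadata key is an assumption; fall back if absent.
        lines.append(f"- {doc.metadata.get('source', 'unknown')}")
    return "\n".join(lines)


# Mock response with the same keys a RetrievalQA chain returns when
# return_source_documents=True. All values are placeholders.
mock_response = {
    "query": "Who is the director for the film?",
    "result": " (placeholder answer text) ",
    "source_documents": [
        SimpleNamespace(
            page_content="(placeholder chunk)",
            metadata={"source": "./toy_data/Titanic_film.txt"},
        )
    ],
}

report = format_response(mock_response)
print(report)
```

With the real chain, the same helper could be called on `llm_response` directly, provided the stored documents carry the metadata key assumed here.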

Step 7 - Supply hf_llm and retriever into LangChain RetrievalQA

This wraps your hf_llm and FAISS retriever into a LangChain RetrievalQA chain, using the "stuff" chain type and enabling source document return.

# create the chain using RetrievalQA
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=hf_llm,  # supply meta llama2 model
    chain_type="stuff",
    retriever=retriever,  # using our own...
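The "stuff" chain type places every retrieved chunk into a single prompt and makes one LLM call. A rough sketch of that idea follows. The `build_stuff_prompt` helper and the prompt wording are hypothetical and are not LangChain's built-in template.

```python
def build_stuff_prompt(question, chunks):
    """Concatenate all retrieved chunks into one context block, then ask."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunks standing in for the retriever's output.
retrieved_chunks = ["chunk one text", "chunk two text", "chunk three text"]
prompt = build_stuff_prompt("Who is the director for the film?", retrieved_chunks)
print(prompt)
```

Because everything goes into one prompt, this approach only works while the combined chunks fit inside the model's context window, which is one reason to keep the number of retrieved chunks small.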

Step 6 - Load Llama-2-13b-chat-hf to GPU

Loads the HuggingFace Llama-2-13b-chat-hf model locally into GPU memory using HuggingFacePipeline with multi-GPU or single GPU support. Requires your HuggingFace auth token.

import torch
import transformers
from langchain import HuggingFacePipeline
from transformers import (
    AutoConfig,
    AutoModel,
    AutoModelForCausalLM,
    AutoTokenizer,
    GenerationConfig,
    LlamaForCausalLM,
    LlamaTokenizer,
    ...

Step 5 - Create Retriever from Vectorstore

Converts the FAISS vectorstore into a retriever for similarity-based semantic search that returns the 3 nearest neighbors (k=3) for each query.

retriever = store.as_retriever(
    search_type='similarity',
    search_kwargs={"k": 3}  # k is the number of nearest chunks to return; set to 3 here
)
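As an illustration of what a top-k similarity search does, the sketch below ranks toy vectors against a query in plain Python. The 2-D vectors and helper names are hypothetical, and cosine similarity is an assumption for the example. The actual FAISS index may rank by a different metric, such as L2 distance.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k vectors most similar to query_vec, best first."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical 2-D "embeddings" for four chunks.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
nearest = top_k([1.0, 0.0], toy_vectors, k=3)
print(nearest)  # [0, 1, 3]
```

A larger k gives the LLM more context at the cost of a longer prompt, while a smaller k risks missing the relevant chunk.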

Step 4 - Reload Vectorstore from Disk

Reloads the saved FAISS vector index and the serialized vectorstore object from disk, and reattaches the index.

# Load the LangChain.
from pathlib import Path
from langchain.text_splitter import CharacterTextSplitter
import faiss
from langchain.vectorstores import FAISS
import pickle

index = faiss.read_index("./toy_data/hf_embedding_docs.index")
with...
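The save-then-reattach pattern can be sketched with the standard library alone. Here a plain dict stands in for the vectorstore and a JSON file stands in for the FAISS index file. The file names `docs.index.json` and `store.pkl` and both helper functions are hypothetical. The sketch only shows the round trip, not the FAISS calls themselves.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_store(store, directory):
    """Write the index to its own file, then pickle the store without it."""
    directory = Path(directory)
    (directory / "docs.index.json").write_text(json.dumps(store["index"]))
    without_index = {**store, "index": None}
    with open(directory / "store.pkl", "wb") as f:
        pickle.dump(without_index, f)


def load_store(directory):
    """Read both files back and reattach the index to the unpickled store."""
    directory = Path(directory)
    index = json.loads((directory / "docs.index.json").read_text())
    with open(directory / "store.pkl", "rb") as f:
        store = pickle.load(f)
    store["index"] = index
    return store


# Toy store: two hypothetical vectors plus the texts they belong to.
toy_store = {"index": [[0.1, 0.2], [0.3, 0.4]], "texts": ["chunk a", "chunk b"]}
with tempfile.TemporaryDirectory() as tmp:
    save_store(toy_store, tmp)
    reloaded = load_store(tmp)
print(reloaded == toy_store)
```

Keeping the index in its own file means each part is stored in the format suited to it, and the reload step has to put the two back together before the store can be queried.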

Step 3 - Parse Documents into FAISS Vectorstore

Reads the .txt files, splits them into chunks, embeds the chunks with the HuggingFace embedding model, and saves the FAISS vector index along with a backup .pkl file.

import os
from tqdm import tqdm
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.text_splitter import CharacterTextSplitter
from pathlib import Path
import pickle

# Here we...
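To show what splitting into overlapping chunks means, here is a simplified stand-in written in plain Python. It slices fixed-size character windows and is not the algorithm used by RecursiveCharacterTextSplitter, which tries separators such as paragraph breaks first. The `split_text` helper and the chunk sizes are hypothetical.

```python
def split_text(text, chunk_size=100, chunk_overlap=20):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 3  # 30 characters
chunks = split_text(sample, chunk_size=12, chunk_overlap=4)
print(chunks)
```

The overlap repeats the tail of one chunk at the head of the next, so a sentence that straddles a boundary still appears intact in at least one chunk.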

Step 2 - Preview Toy Dataset Files 2

Displays the first three lines of the Titanic sample dataset before embedding.

!head -3 ./toy_data/Titanic_film.txt

Step 2 - Preview Toy Dataset Files 1

Displays the first line of the Sweden sample dataset before embedding.

!head -1 ./toy_data/Sweden.txt

Step 1 - Load HuggingFace Embedding

Initializes a sentence-transformer embedding model through HuggingFace on CUDA. The embeddings are used to build the FAISS vectorstore.

### load custom embedding and use it in Faiss
from langchain.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain.document_loaders import...