A shared folder with AI prompts and code snippets
From workspace: Nvidia
Team: Main
Total snippets: 9
Initializes a query engine from the index and performs a test query using natural language.
# Setup index query engine using LLM
query_engine = index.as_query_engine()

# Test out a query in natural language
response = query_engine.query("what is transformer engine?")
response.metadata
response.response
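For quick inspection, here is a minimal usage sketch of the Response object returned above (assuming the legacy llama_index API used across this folder, where source_nodes holds the retrieved chunks):

print(response.response)
for source in response.source_nodes:
    # Each NodeWithScore exposes the retrieval score and the chunk text
    print(source.score, source.node.get_text()[:100])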
Uses LlamaIndex’s SimpleDirectoryReader and loads the data into a VectorStoreIndex with a custom service context.
# create query engine with cross encoder reranker
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
import torch

documents = SimpleDirectoryReader("./toy_data").load_data()
index = ...
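The snippet is cut off. Given the "cross encoder reranker" comment and the description, a plausible continuation is sketched below; SentenceTransformerRerank, the cross-encoder model name, and the top-k values are assumptions based on the legacy llama_index API, not part of the original snippet:

from llama_index.indices.postprocessor import SentenceTransformerRerank

# Build the index with the custom service context, then rerank retrieved
# chunks with a cross-encoder before they reach the LLM
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
rerank = SentenceTransformerRerank(model="cross-encoder/ms-marco-MiniLM-L-2-v2", top_n=3)
query_engine = index.as_query_engine(similarity_top_k=10, node_postprocessors=[rerank])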
Creates a new ServiceContext using your HuggingFace LLM and embeddings, and sets it globally in the app.
# Create new service context instance
service_context = ServiceContext.from_defaults(
    chunk_size=1024,
    llm=llm,
    embed_model=embeddings
)
# And set the service context
set_global_service_context(service_context)
Imports the necessary components from llama_index to modify the global service context.
# Bring in stuff to change service context
from llama_index import set_global_service_context
from llama_index import ServiceContext
Wraps a locally loaded HuggingFace LLM with LlamaIndex using HuggingFaceLLM, applying the system and query wrapper prompts.
# Import the llama index HF Wrapper
from llama_index.llms import HuggingFaceLLM

# Create a HF LLM using the llama index wrapper
llm = HuggingFaceLLM(context_window=4096,
                     max_new_tokens=256,
                     system_prompt=system_prompt, ...
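The constructor call is truncated. A hedged completion, assuming the model and tokenizer objects loaded in the Llama-2 snippet at the end of this folder and the query wrapper from the SimpleInputPrompt snippet:

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,  # from the SimpleInputPrompt snippet
    model=model,          # HF model loaded earlier
    tokenizer=tokenizer,  # matching HF tokenizer
)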
Loads HuggingFace's all-MiniLM-L6-v2 embeddings via LangChain and wraps them for use in LlamaIndex with LangchainEmbedding.
# Create and download embeddings instance, wrapping the LangChain HuggingFace embedding for llama_index
# Bring in embeddings wrapper
from llama_index.embeddings import LangchainEmbedding
# Bring in HF embeddings - need these to represent document chunks
from ...
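The import is truncated. Based on the description, the continuation likely pulls HuggingFaceEmbeddings from LangChain and wraps it, roughly as follows (the model name is taken from the description; the rest is an assumption):

from langchain.embeddings import HuggingFaceEmbeddings

# Wrap the LangChain embedding so llama_index can call it
embeddings = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
)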
Creates a system-level prompt template and wraps a user query using SimpleInputPrompt from llama_index.
# Import the prompt wrapper...but for llama index
from llama_index.prompts.prompts import SimpleInputPrompt

# Create a system prompt
system_prompt = """<<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as...
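The prompt text is cut off. A sketch of the likely remainder, following the Llama-2 chat convention; the exact wording and the [/INST] template are assumptions:

... while being safe.
<</SYS>>"""

# Wrap each user query in Llama-2 instruction tokens
query_wrapper_prompt = SimpleInputPrompt("{query_str} [/INST]")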
Runs the LLM using generate() with a streamer and token limit, then decodes the generated token output back into human-readable text.
output = model.generate(**inputs, streamer=streamer, use_cache=True, max_new_tokens=100)

# Convert the output tokens back to text
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
output_text
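For context, inputs and streamer are not defined in this snippet; they would typically be built with standard transformers APIs along these lines (the prompt string is a hypothetical placeholder):

from transformers import TextStreamer

prompt = "what is transformer engine?"  # hypothetical example query
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
streamer = TextStreamer(tokenizer, skip_prompt=True)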
Loads Llama-2-13b-chat-hf from HuggingFace locally with GPU/CPU/Apple MPS fallback. Includes HuggingFace auth-token logic and dynamic GPU allocation.
# Uncomment the below if you have not yet installed the Python dependencies
# pip install accelerate transformers==4.33.1 --upgrade
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logger = ...
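The snippet is truncated before the loading logic. A hedged sketch of what the description refers to; the hf_token variable, dtype choice, and device-selection details are assumptions to adapt to your environment:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-13b-chat-hf"
hf_token = "<your HuggingFace auth token>"  # assumption: token handling not shown in snippet

# Fall back from CUDA to Apple MPS to CPU
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)

tokenizer = AutoTokenizer.from_pretrained(model_name, use_auth_token=hf_token)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    use_auth_token=hf_token,
    torch_dtype=torch.float16 if device != "cpu" else torch.float32,
    device_map="auto" if device == "cuda" else None,  # dynamic GPU allocation via accelerate
)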