A shared folder with AI prompts and code snippets
From workspace: Nvidia
Team: Main
Total snippets: 9
Initializes a query engine from the index and performs a test query using natural language.
# Setup index query engine using LLM
query_engine = index.as_query_engine()

# Test out a query in natural language
response = query_engine.query("what is transformer engine?")
response.metadata
response.response
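For quick inspection, here is a minimal usage sketch of the Response object returned above (assuming the legacy llama_index API used across this folder, where source_nodes holds the retrieved chunks):

print(response.response)
for source in response.source_nodes:
    # Each NodeWithScore exposes the retrieval score and the chunk text
    print(source.score, source.node.get_text()[:100])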
Uses LlamaIndex’s SimpleDirectoryReader and loads the data into a VectorStoreIndex with a custom service context.
# create query engine with cross encoder reranker
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
import torch

documents = SimpleDirectoryReader("./toy_data").load_data()
index = ...
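The snippet is cut off. Given the "cross encoder reranker" comment and the description, a plausible continuation is sketched below; SentenceTransformerRerank, the cross-encoder model name, and the top-k values are assumptions based on the legacy llama_index API, not part of the original snippet:

from llama_index.indices.postprocessor import SentenceTransformerRerank

# Build the index with the custom service context, then rerank retrieved
# chunks with a cross-encoder before they reach the LLM
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
rerank = SentenceTransformerRerank(model="cross-encoder/ms-marco-MiniLM-L-2-v2", top_n=3)
query_engine = index.as_query_engine(similarity_top_k=10, node_postprocessors=[rerank])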
Creates a new ServiceContext using your HuggingFace LLM and embeddings, and sets it globally in the app.
# Create new service context instance
service_context = ServiceContext.from_defaults(
    chunk_size=1024,
    llm=llm,
    embed_model=embeddings
)
# And set the service context
set_global_service_context(service_context)
Imports the necessary components from llama_index to modify the global service context.
# Bring in stuff to change service context
from llama_index import set_global_service_context
from llama_index import ServiceContext
Wraps a locally loaded HuggingFace LLM with LlamaIndex using HuggingFaceLLM, applying the system and query wrapper prompts.
# Import the llama index HF Wrapper
from llama_index.llms import HuggingFaceLLM

# Create a HF LLM using the llama index wrapper
llm = HuggingFaceLLM(context_window=4096,
                     max_new_tokens=256,
                     system_prompt=system_prompt, ...
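The constructor call is truncated. A hedged completion, assuming the model and tokenizer objects loaded in the Llama-2 snippet at the end of this folder and the query wrapper from the SimpleInputPrompt snippet:

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,  # from the SimpleInputPrompt snippet
    model=model,          # HF model loaded earlier
    tokenizer=tokenizer,  # matching HF tokenizer
)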
Loads HuggingFace's all-MiniLM-L6-v2 embeddings via LangChain and wraps them for use in LlamaIndex with LangchainEmbedding.
# Create and download embeddings instance, wrapping the LangChain HuggingFace embedding for llama_index
# Bring in embeddings wrapper
from llama_index.embeddings import LangchainEmbedding
# Bring in HF embeddings - need these to represent document chunks
from ...
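The import is truncated. Based on the description, the continuation likely pulls HuggingFaceEmbeddings from LangChain and wraps it, roughly as follows (the model name is taken from the description; the rest is an assumption):

from langchain.embeddings import HuggingFaceEmbeddings

# Wrap the LangChain embedding so llama_index can call it
embeddings = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
)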
Creates a system-level prompt template and wraps a user query using SimpleInputPrompt from llama_index.
# Import the prompt wrapper...but for llama index
from llama_index.prompts.prompts import SimpleInputPrompt

# Create a system prompt
system_prompt = """<<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as...
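The prompt text is cut off. A sketch of the likely remainder, following the Llama-2 chat convention; the exact wording and the [/INST] template are assumptions:

... while being safe.
<</SYS>>"""

# Wrap each user query in Llama-2 instruction tokens
query_wrapper_prompt = SimpleInputPrompt("{query_str} [/INST]")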
Runs the LLM using generate() with a streamer and token limit, then decodes the generated token output back into human-readable text.
output = model.generate(**inputs, streamer=streamer, use_cache=True, max_new_tokens=100)

# Convert the output tokens back to text
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
output_text
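For context, inputs and streamer are not defined in this snippet; they would typically be built with standard transformers APIs along these lines (the prompt string is a hypothetical placeholder):

from transformers import TextStreamer

prompt = "what is transformer engine?"  # hypothetical example query
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
streamer = TextStreamer(tokenizer, skip_prompt=True)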
Loads Llama-2-13b-chat-hf from HuggingFace locally with GPU/CPU/Apple MPS fallback. Includes HuggingFace auth-token logic and dynamic GPU allocation.
# Uncomment the below if you have not yet installed the Python dependencies
# pip install accelerate transformers==4.33.1 --upgrade
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logger = ...
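The snippet is truncated before the loading logic. A hedged sketch of what the description refers to; the hf_token variable, dtype choice, and device-selection details are assumptions to adapt to your environment:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-13b-chat-hf"
hf_token = "<your HuggingFace auth token>"  # assumption: token handling not shown in snippet

# Fall back from CUDA to Apple MPS to CPU
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)

tokenizer = AutoTokenizer.from_pretrained(model_name, use_auth_token=hf_token)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    use_auth_token=hf_token,
    torch_dtype=torch.float16 if device != "cpu" else torch.float32,
    device_map="auto" if device == "cuda" else None,  # dynamic GPU allocation via accelerate
)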