A shared folder with AI prompts and code snippets
From workspace: Nvidia
Team: Main
Total snippets: 15
Send a user query to the query engine and stream the response while measuring elapsed time.
import time

start_time = time.time()
response = query_engine.query("what is the context length of llama2?")
response.print_response_stream()
print(f"\n--- {time.time() - start_time} seconds ---")
Build a query engine from the vector index, assigning a custom prompt template and enabling streaming.
query_engine = index.as_query_engine(text_qa_template=qa_template, streaming=True)
Generate nodes from parsed documents and insert them into the vector index.
import time

start_time = time.time()
nodes = node_parser.get_nodes_from_documents(documents)
index.insert_nodes(nodes)
print(f"--- {time.time() - start_time} seconds ---")
Connect to a Milvus vector store, wire it into a storage context, and build a vector index for inserting parsed nodes.
from llama_index import VectorStoreIndex
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores import MilvusVectorStore

vector_store = MilvusVectorStore(uri="http://milvus:19538", dim=1024,...
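The snippet above is cut off; a minimal sketch of how this wiring typically continues, assuming an initially empty index that the node-insertion snippet then fills (the overwrite flag is an assumption):

vector_store = MilvusVectorStore(uri="http://milvus:19538", dim=1024, overwrite=False)  # overwrite flag is an assumption
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Start empty; nodes are added later via index.insert_nodes(...), and the
# globally set service context (see the snippet elsewhere in this folder)
# supplies the LLM and embedding model.
index = VectorStoreIndex([], storage_context=storage_context)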
Set the global service context in LlamaIndex to avoid passing it manually in each call.
from llama_index import set_global_service_context

set_global_service_context(service_context)
Bundle LLM, embed model, node parser, and prompt helper into a LlamaIndex ServiceContext.
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    node_parser=node_parser,
    prompt_helper=prompt_helper
)
Define a LangChain-compatible HuggingFace embedding model and wrap it for use in LlamaIndex.
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index.embeddings import LangchainEmbedding

# Running the model on CPU as we want to conserve GPU memory.
# In the production deployment (API server shown as part of the 5th notebook...
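The embedding snippet is truncated; a minimal sketch of the wrapping it describes, assuming the same intfloat/e5-large-v2 model used by the text splitter snippet (which also matches the Milvus dim=1024):

hf_embeddings = HuggingFaceEmbeddings(
    model_name="intfloat/e5-large-v2",  # assumption: same model as the text splitter snippet
    model_kwargs={"device": "cpu"},     # keep embeddings on CPU to conserve GPU memory
)
embed_model = LangchainEmbedding(hf_embeddings)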
Initialize PromptHelper from LlamaIndex to manage context window, output tokens, and chunking ratio.
from llama_index import PromptHelper

prompt_helper = PromptHelper(
    context_window=4096,
    num_output=256,
    chunk_overlap_ratio=0.1,
    chunk_size_limit=None
)
Set up a token-based text splitter using SentenceTransformers and initialize a NodeParser for LlamaIndex.
from langchain.text_splitter import SentenceTransformersTokenTextSplitter
from llama_index.node_parser import LangchainNodeParser

TEXT_SPLITTER_MODEL = "intfloat/e5-large-v2"
TEXT_SPLITTER_TOKENS_PER_CHUNK = 510
TEXT_SPLITTER_CHUNK_OVERLAP =...
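This snippet is truncated as well; a hedged sketch of how those constants are usually fed into the splitter and parser (the chunk-overlap value is an assumption, since the original is cut off):

TEXT_SPLITTER_CHUNK_OVERLAP = 200  # assumption: original value not visible
text_splitter = SentenceTransformersTokenTextSplitter(
    model_name=TEXT_SPLITTER_MODEL,
    tokens_per_chunk=TEXT_SPLITTER_TOKENS_PER_CHUNK,
    chunk_overlap=TEXT_SPLITTER_CHUNK_OVERLAP,
)
node_parser = LangchainNodeParser(text_splitter)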
Sample console log output showing NLTK data packages being downloaded and verified.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to /root/nltk_data...
[nltk_data] Package averaged_perceptron_tagger is...
Instantiate the loader, read the document, and measure the processing time.
import time

loader = UnstructuredReader()
start_time = time.time()
documents = loader.load_data(file="llama2_paper.pdf")
print(f"--- {time.time() - start_time} seconds ---")
Import the PDF loader module from Llama Hub.
from llama_hub.file.unstructured.base import UnstructuredReader
Download the Llama2 paper PDF from arXiv so it can be loaded with LlamaIndex’s UnstructuredReader from the Llama Hub and converted into a format ready for embedding.
! wget -O "llama2_paper.pdf" -nc --user-agent="Mozilla" https://arxiv.org/pdf/2307.09288.pdf
Create a structured prompt template for Llama2 using context and user question, formatted for use in LlamaIndex.
from llama_index import Prompt

LLAMA_PROMPT_TEMPLATE = (
    "<s>[INST] <<SYS>>"
    "Use the following context to answer the user's question. If you don't know the answer, just say that you don't know, don't try to make up an answer."
    ...
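The template string is cut off after the system instruction; a sketch of how a Llama-2 chat-style template is typically closed and wrapped into the qa_template consumed by the streaming query engine (placeholder names follow LlamaIndex's text-QA convention; the exact wording is an assumption):

LLAMA_PROMPT_TEMPLATE = (
    "<s>[INST] <<SYS>>"
    "Use the following context to answer the user's question. If you don't know the answer, just say that you don't know, don't try to make up an answer."
    "<</SYS>>"
    "Context: {context_str}"          # assumption: {context_str}/{query_str} are LlamaIndex's standard placeholders
    "Question: {query_str} [/INST]"
)
qa_template = Prompt(LLAMA_PROMPT_TEMPLATE)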
Custom integration of TensorRT-LLM with LangChain and LlamaIndex using the LangChainLLM wrapper.
from triton_trt_llm import TensorRTLLM
from llama_index.llms import LangChainLLM

trtllm = TensorRTLLM(server_url="llm:8001", model_name="ensemble", tokens=500)
llm = LangChainLLM(llm=trtllm)
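The wrapped llm here is the same object the ServiceContext snippet bundles with the embed model, node parser, and prompt helper, so the Triton-hosted TensorRT-LLM endpoint ends up serving every completion call issued by the query engine.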