A shared folder with AI prompts and code snippets
From workspace: Nvidia
Team: Main
Total snippets: 8
Demonstrates how to query the simplified RAG chain over the Triton documentation with multiple sequential questions, showcasing contextual follow-up.
# Question 1: What is Triton?
query = "What is Triton?"
result = qa({"question": query})
print(result.get("answer"))

# Question 2: Ask about ONNX support
query = "Does Triton support ONNX?"
result = qa({"question": query})
print(result.get("answer"))
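If the qa chain was built with the ConversationBufferMemory shown in the later snippets, the accumulated chat history can be inspected between turns to confirm that follow-up questions are answered in context. This is a hedged illustration, not part of the original snippet, and it assumes the chain carries a memory object:

# Hypothetical check: assumes qa was constructed with
# ConversationBufferMemory(memory_key="chat_history", return_messages=True).
for message in qa.memory.chat_memory.messages:
    print(f"{message.type}: {message.content}")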
Demonstrates a simpler retrieval-augmented generation (RAG) chain that uses a single LLM (LLaMA2 70B) both to condense follow-up questions and to generate answers over the retrieved documents.
llm = ChatNVIDIA(model="llama2_70b", temperature=0.1, max_tokens=1000, top_p=1.0) qa_prompt = QA_PROMPT doc_chain = load_qa_chain(llm, chain_type="stuff", prompt=QA_PROMPT) qa = ConversationalRetrievalChain.from_llm( llm=llm, ...
Follow up with a conversational question referencing prior context.
query = "But why?" result = qa({"question": query}) print(result.get("answer"))
Ask a more specific technical question about what interfaces Triton supports.
query = "What interfaces does Triton support?" result = qa({"question": query}) print(result.get("answer"))
Ask a general question about what Triton is.
query = "What is Triton?" result = qa({"question": query}) print(result.get("answer"))
Build a ConversationalRetrievalChain that uses two LLMs: Llama2 to condense follow-up questions and Mixtral (8x7B) to generate answers over the retrieved documents.
llm = ChatNVIDIA(model="llama2_70b") memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True) question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT) chat = ChatNVIDIA(model="mixtral_8x7b",...
Load vector data from a local FAISS store using the NVIDIA embedding model.
# Load the previously embedded documents from the local FAISS store
embedding_path = "embed/"
docsearch = FAISS.load_local(folder_path=embedding_path, embeddings=embedding_model)
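Depending on the installed LangChain version, FAISS.load_local may also require an explicit opt-in to pickle deserialization; if your version raises an error on the call above, the variant below should work (only load stores you created yourself):

docsearch = FAISS.load_local(
    folder_path=embedding_path,
    embeddings=embedding_model,
    allow_dangerous_deserialization=True,  # required by newer langchain-community releases
)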
Create the NVIDIAEmbeddings model for use in loading or generating vector data for RAG.
# Build the FAISS store (helper defined elsewhere in this workspace)
create_embeddings()
# Embedding model used when loading or querying the store
embedding_model = NVIDIAEmbeddings(model="nvolveqa_40k")
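create_embeddings() is a workspace helper rather than a library function, so its body is not shown here. A minimal sketch of what such a helper could do, assuming the documents have already been split into text chunks (the texts argument and dest_embed_dir default are hypothetical names):

from langchain_community.vectorstores import FAISS  # langchain.vectorstores on older versions
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

def create_embeddings(texts, dest_embed_dir="embed/"):
    # Hypothetical sketch: embed pre-chunked text with the NVIDIA embedding model
    # and persist a FAISS index that FAISS.load_local can read back later.
    embeddings = NVIDIAEmbeddings(model="nvolveqa_40k")
    store = FAISS.from_texts(texts, embedding=embeddings)
    store.save_local(folder_path=dest_embed_dir)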