Nvidia

Q&A with LangChain

Step 6 - streamed chain response

Compose and run a LangChain expression chain that retrieves context from the vector store, fills the prompt, and streams the model's response token by token.

from langchain_core.runnables import RunnablePassthrough
import time

chain = (
    {"context": vectorstore.as_retriever(), "question": RunnablePassthrough()}
    | LLAMA_PROMPT
    | llm
)
start_time = time.time()
for token in chain.stream(question): ...
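The snippet above is cut off inside the streaming loop. As a minimal sketch of what consuming a token stream and timing it can look like, the example below uses only the standard library. `fake_stream` is a hypothetical stand-in for `chain.stream(question)`, and `consume_stream` is an illustrative helper, not part of LangChain. It uses `time.perf_counter()` rather than `time.time()` because the former is monotonic.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_stream(text: str) -> Iterator[str]:
    """Hypothetical stand-in for chain.stream(question): yields one token at a time."""
    for word in text.split():
        yield word + " "


def consume_stream(tokens: Iterable[str]) -> Tuple[str, float, float]:
    """Collect streamed tokens.

    Returns (full text, seconds until the first token, total seconds).
    """
    start = time.perf_counter()
    first_token_at = None
    pieces = []
    for token in tokens:
        if first_token_at is None:
            # Record when the first token arrives.
            first_token_at = time.perf_counter()
        pieces.append(token)
    end = time.perf_counter()
    time_to_first = (first_token_at - start) if first_token_at is not None else 0.0
    return "".join(pieces), time_to_first, end - start


answer, ttft, total = consume_stream(fake_stream("tokens arrive one at a time"))
print(answer.strip())
print(f"first token after {ttft:.4f}s, full response after {total:.4f}s")
```

With a real chain, the same loop would iterate over `chain.stream(question)` instead of `fake_stream(...)`, assuming each streamed item is a string.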

Step 5 - vector search 2

Run a similarity search on the vector store using a natural language query to find semantically similar documents.

# Simple Example: Retrieve Documents from the Vector Database
# note: this is just for demonstration purposes of a similarity search
question = "Can you talk about safety evaluation of llama2 chat?"
docs =...
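To illustrate what a similarity search does, here is a small standard-library sketch that ranks stored vectors by cosine similarity to a query vector. The vectors, texts, and the `top_k` helper are made up for illustration. Cosine similarity is an assumption here: the metric a real vector store uses depends on how its index is configured.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          store: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored texts whose vectors are most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (hypothetical values, not model output).
toy_store = {
    "safety evaluation of the chat model": [0.9, 0.1, 0.0],
    "pretraining data mixture": [0.1, 0.9, 0.0],
    "hardware and carbon footprint": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.0, 0.0]

results = top_k(query_vector, toy_store, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

In the real pipeline the query vector would come from the embedding model and the store would hold one vector per document chunk.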

Step 5 - embeddings setup 1

Generate text embeddings using HuggingFace embeddings (intfloat/e5-large-v2) and prepare them for use in a vector store.

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Milvus
import torch
import time

#Running the model on CPU as we want to conserve gpu memory.
#In the production deployment (API server shown as part of the...

Step 4 - transform documents 1

Inspect the text content of a single document chunk (index 40) after splitting.

documents[40].page_content

Step 4 - transform documents 1

Chunk documents into smaller, semantically coherent pieces using a sentence-transformer-based text splitter.

import time
from langchain.text_splitter import SentenceTransformersTokenTextSplitter

TEXT_SPLITTER_MODEL = "intfloat/e5-large-v2"
TEXT_SPLITTER_TOKENS_PER_CHUNK = 510
TEXT_SPLITTER_CHUNCK_OVERLAP = 200

text_splitter =...
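The two constants above describe a sliding window: each chunk holds up to 510 tokens and shares 200 tokens with the previous chunk. The sketch below shows that idea on a plain list of tokens. It is an illustration only, not the library's implementation: the real splitter tokenizes with the model's own tokenizer, and `split_with_overlap` is a hypothetical helper.

```python
from typing import List

TOKENS_PER_CHUNK = 510  # same value as in the snippet above
CHUNK_OVERLAP = 200     # same value as in the snippet above


def split_with_overlap(tokens: List[str],
                       tokens_per_chunk: int,
                       overlap: int) -> List[List[str]]:
    """Split tokens into windows of tokens_per_chunk that share `overlap` tokens."""
    if not 0 <= overlap < tokens_per_chunk:
        raise ValueError("overlap must be non-negative and smaller than tokens_per_chunk")
    step = tokens_per_chunk - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + tokens_per_chunk])
        if start + tokens_per_chunk >= len(tokens):
            break  # this window already reached the end of the input
        start += step
    return chunks


# Placeholder tokens t0..t1199 stand in for real tokenizer output.
tokens = [f"t{i}" for i in range(1200)]
chunks = split_with_overlap(tokens, TOKENS_PER_CHUNK, CHUNK_OVERLAP)
print([len(chunk) for chunk in chunks])
```

The overlap trades index size for recall: a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some tokens twice.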

Step 3 - load documents 2

Load PDF documents into LangChain using the UnstructuredFileLoader to prepare for retrieval tasks.

from langchain.document_loaders import UnstructuredFileLoader

loader = UnstructuredFileLoader("llama2_paper.pdf")
data = loader.load()

Step 3 - load documents 1

Download the Llama 2 paper as a PDF so that it can be loaded into LangChain with the UnstructuredFileLoader.

! wget -O "llama2_paper.pdf" -nc --user-agent="Mozilla" https://arxiv.org/pdf/2307.09288.pdf

Step 2 - create prompt template

Define a custom prompt template compatible with Llama2 and LangChain’s PromptTemplate system.

from langchain.prompts import PromptTemplate

LLAMA_PROMPT_TEMPLATE = (
    "<s>[INST] <<SYS>>"
    "Use the following context to answer the user's question. If you don't know the answer, just say that you don't know, don't try to make up an...
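The template above is truncated, so the sketch below does not reconstruct it. It only shows, with plain `str.format`, how a Llama 2 chat-style prompt with a `<<SYS>>` block can be filled from a context and a question. `SKETCH_TEMPLATE`, `build_prompt`, and the default system text are hypothetical. The placeholder names `context` and `question` are assumed to match the keys used in the Step 6 chain.

```python
# Illustrative template in the Llama 2 chat layout; NOT the original template text.
SKETCH_TEMPLATE = (
    "<s>[INST] <<SYS>>\n"
    "{system}\n"
    "<</SYS>>\n\n"
    "Context: {context}\n"
    "Question: {question} [/INST]"
)


def build_prompt(context: str,
                 question: str,
                 system: str = "Answer using only the given context.") -> str:
    """Fill the sketch template with a system message, context, and question."""
    return SKETCH_TEMPLATE.format(system=system, context=context, question=question)


prompt = build_prompt(
    context="(retrieved passages would go here)",
    question="Can you talk about safety evaluation of llama2 chat?",
)
print(prompt)
```

LangChain's `PromptTemplate` plays the same role in the real pipeline, with the retriever supplying the context and the user's input supplying the question.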

Step 1 - integrate triton connector

Connect the TritonTensorRTLLM client to the TRT-LLM server hosting Llama 2 so that the model can be used from LangChain.

from langchain_nvidia_trt.llms import TritonTensorRTLLM

# Connect to the TRT-LLM Llama-2 model running on the Triton server at the url below
# Replace "llm" with the url of the system where llama2 is hosted
triton_url = "llm:8001"
pload = { ...