A shared folder with AI prompts and code snippets
From workspace: Nvidia
Team: Main
Total snippets: 14
Clears out any cached vector store data (for a fresh run).
!rm -rf data_*
Streams the LLM response for the given query and logs the time taken.
import time

start_time = time.time()
response = query_engine.query(query)
response.print_response_stream()
print(f"\n--- {time.time() - start_time} seconds ---")
Define a custom user query for the LLM to respond to.
query = "How do I setup a weaviate vector db? Give me a code sample please."
Creates an AutoMergingRetriever over the vector index and builds a streaming query engine that post-processes retrieved nodes with the previously defined token limiter.
from llama_index.retrievers import AutoMergingRetriever
from llama_index.query_engine import RetrieverQueryEngine

retriever = AutoMergingRetriever(
    index.as_retriever(similarity_top_k=12),
    storage_context=storage_context
)
query_engine = ...
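The snippet is cut off at the query engine construction. A minimal sketch of the missing part, assuming the legacy RetrieverQueryEngine.from_args helper and the LimitRetrievedNodesLength post-processor from the next snippet, could look like this:
# Sketch: wire the retriever into a streaming query engine
# (assumes LimitRetrievedNodesLength from the post-processor snippet).
query_engine = RetrieverQueryEngine.from_args(
    retriever,
    node_postprocessors=[LimitRetrievedNodesLength(limit=2500)],
    streaming=True,
)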
Custom post-processor that caps the total length of the retrieved nodes at a maximum token limit (2500 by default).
from typing import Callable, Optional
from llama_index.utils import globals_helper, get_tokenizer
from llama_index.schema import MetadataMode

class LimitRetrievedNodesLength:
    def __init__(self, limit: int = 2500, tokenizer: ...
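The class body is truncated. A minimal sketch of one possible implementation, assuming the default LlamaIndex tokenizer from globals_helper and a postprocess_nodes() hook, is shown here:
class LimitRetrievedNodesLength:
    # Sketch: keep nodes in retrieval order until the token budget is spent.
    def __init__(self, limit: int = 2500, tokenizer: Optional[Callable] = None):
        self._tokenizer = tokenizer or globals_helper.tokenizer
        self.limit = limit

    def postprocess_nodes(self, nodes, query_bundle=None):
        included_nodes = []
        current_length = 0
        for node in nodes:
            text = node.node.get_content(metadata_mode=MetadataMode.LLM)
            current_length += len(self._tokenizer(text))
            if current_length > self.limit:
                break
            included_nodes.append(node)
        return included_nodes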
For each documentation directory, parse nodes using load_markdown_docs() with hierarchical=True, store parent nodes in SimpleDocumentStore, and persist leaf nodes in VectorStoreIndex.
from llama_index import VectorStoreIndex, StorageContext, load_index_from_storage
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.storage.docstore import ...
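The snippet stops at the imports. A minimal sketch of the per-directory indexing loop, assuming load_markdown_docs() returns (nodes, leaf_nodes) when hierarchical=True and using hypothetical data_* persist directories, might look like this:
from llama_index.storage.docstore import SimpleDocumentStore

# Sketch: build and persist one hierarchical index per documentation folder.
for directory, description in docs_directories.items():
    nodes, leaf_nodes = load_markdown_docs(directory, hierarchical=True)

    # Parent nodes live in the docstore so AutoMergingRetriever can merge up to them.
    docstore = SimpleDocumentStore()
    docstore.add_documents(nodes)
    storage_context = StorageContext.from_defaults(docstore=docstore)

    # Only the leaf nodes are embedded and persisted in the vector index.
    index = VectorStoreIndex(leaf_nodes, storage_context=storage_context)
    index.storage_context.persist(persist_dir=f"./data_{directory.split('/')[-1]}")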
Define documentation folders and load documents using load_markdown_docs() with hierarchical parsing enabled.
docs_directories = {
    "./llama_docs_bot/docs/community": "Useful for information on community integrations with other libraries, vector dbs, and frameworks.",
    "./llama_docs_bot/docs/core_modules/agent_modules": "Useful for information on...
Load markdown docs from a directory and parse them into nodes with LlamaIndex, either hierarchically or with a simple flat parser. Hierarchical chunking produces a parent/child node structure.
from llama_index import SimpleDirectoryReader, Document
from llama_index.node_parser import HierarchicalNodeParser, SimpleNodeParser, get_leaf_nodes
from llama_index.schema import MetadataMode
from ...
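The function body is truncated. A minimal sketch of load_markdown_docs(), assuming the text_splitter_ids and text_splitter_map defined in the splitter snippet below and a (nodes, leaf_nodes) return value for the hierarchical case, could look like this:
def load_markdown_docs(filepath, hierarchical=True):
    # Sketch: read all markdown files under the directory.
    documents = SimpleDirectoryReader(
        input_dir=filepath,
        required_exts=[".md"],
        recursive=True,
    ).load_data()

    if hierarchical:
        # Parent/child chunking with the 1024- and 510-token splitters.
        parser = HierarchicalNodeParser.from_defaults(
            node_parser_ids=text_splitter_ids,
            node_parser_map=text_splitter_map,
        )
        nodes = parser.get_nodes_from_documents(documents)
        return nodes, get_leaf_nodes(nodes)

    # Flat chunking: a single level of nodes.
    parser = SimpleNodeParser.from_defaults()
    return parser.get_nodes_from_documents(documents)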
Clone the official llama_docs_bot GitHub repo, which contains the sample documentation used for chat.
!git clone https://github.com/run-llama/llama_docs_bot.git
Split text into multiple levels of nodes using a token splitter map with different chunk sizes.
from llama_index.text_splitter import TokenTextSplitter

text_splitter_ids = ["1024", "510"]
text_splitter_map = {}
for ids in text_splitter_ids:
    text_splitter_map[ids] = TokenTextSplitter(
        chunk_size=int(ids),
        ...
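The splitter constructor is cut off. A minimal sketch of the remaining arguments, assuming a chunk_overlap of 200 tokens (an assumption, not taken from the original), is below:
# Sketch: one TokenTextSplitter per hierarchy level, keyed by chunk size.
for ids in text_splitter_ids:
    text_splitter_map[ids] = TokenTextSplitter(
        chunk_size=int(ids),
        chunk_overlap=200,  # assumed overlap; tune for your documents
    )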
Create a ServiceContext instance with the LLM and embedding model, and set it globally for LlamaIndex.
from llama_index import ServiceContext, set_global_service_context

service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model
)
set_global_service_context(service_context)
Load a HuggingFace embedding model for inference on CPU and wrap it for use with LlamaIndex.
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index.embeddings import LangchainEmbedding

model_kwargs = {"device": "cpu"}
encode_kwargs = {"normalize_embeddings": False}
hf_embeddings = HuggingFaceEmbeddings(
    ...
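The constructor arguments are cut off. A minimal sketch of the rest, assuming a sentence-transformers model name that is not taken from the original, is shown here:
# Sketch: finish the embedding setup and wrap it for LlamaIndex.
hf_embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",  # assumed model; substitute your own
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)
embed_model = LangchainEmbedding(hf_embeddings)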
Define a structured prompt using Llama 2's instruction syntax and wrap it as a LlamaIndex prompt template.
from llama_index import Prompt

LLAMA_PROMPT_TEMPLATE = (
    "<s>[INST] <<SYS>>"
    "Use the following context to answer the user's question. If you don't know the answer, just say that you don't know, don't try to make up an answer."
    ...
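The template string is truncated. A minimal sketch of a complete version, assuming the standard {context_str} and {query_str} placeholders used by LlamaIndex text-QA templates, could read as follows; the wrapped template would typically be passed as text_qa_template when building the query engine:
# Sketch: close the system block and add the context/question slots.
LLAMA_PROMPT_TEMPLATE = (
    "<s>[INST] <<SYS>>"
    "Use the following context to answer the user's question. If you don't know the answer, "
    "just say that you don't know, don't try to make up an answer."
    "<</SYS>>"
    "\n\nContext:\n{context_str}\n\nQuestion: {query_str} [/INST]"
)
qa_template = Prompt(LLAMA_PROMPT_TEMPLATE)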
Connect to a TensorRT-LLM model served by Triton through a custom LangChain wrapper, then wrap it with LangChainLLM for LlamaIndex compatibility.
from triton_trt_llm import TensorRTLLM
from llama_index.llms import LangChainLLM

trtllm = TensorRTLLM(server_url="llm:8001", model_name="ensemble", tokens=500)
llm = LangChainLLM(llm=trtllm)