Nvidia

Basics: Prompt, Client, and Responses


Model load and infer

Loads the model and sends the prompt payload for generation

model_name = "ensemble"
client.load_model(model_name)
val = client.request(model_name, **pload)
print(val)
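The snippet above relies on `client` and `pload`, which are defined in the later snippets on this page. As a rough sketch of the call order only, the example below swaps in a hypothetical `StubClient` that records calls and echoes the prompt. It is not the real `HttpTritonClient`, and it makes no claim about what a live server returns; `run_inference` is likewise a helper name invented for illustration.

```python
class StubClient:
    """Hypothetical stand-in for HttpTritonClient; never contacts a server."""

    def __init__(self):
        self.loaded = set()
        self.calls = []

    def load_model(self, model_name):
        # Remember which models were asked to load.
        self.loaded.add(model_name)

    def request(self, model_name, **params):
        # Refuse requests for models that were never loaded.
        if model_name not in self.loaded:
            raise RuntimeError(f"model {model_name!r} has not been loaded")
        self.calls.append((model_name, params))
        # Echo the prompt back; a real server would return generated text.
        return f"[stub completion for: {params['prompt'][0][0]}]"


def run_inference(client, model_name, pload):
    """Load the model first, then send the payload as keyword arguments."""
    client.load_model(model_name)
    return client.request(model_name, **pload)


client = StubClient()
pload = {"prompt": [["What is Triton?"]], "tokens": 64}
val = run_inference(client, "ensemble", pload)
print(val)
```

The point of the sketch is the ordering: the client and payload must exist before `load_model` and `request` are called, and the payload dictionary is unpacked into keyword arguments with `**pload`.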

Payload definition

Defines prompt and generation parameters to send to the model

pload = {
    'prompt': [[prompt]],
    'tokens': 64,
    'temperature': 1.0,
    'top_k': 1,
    'top_p': 0,
    'beam_width': 1,
    'repetition_penalty': 1.0,
    'length_penalty': 1.0
}
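If the payload is built in more than one place, a small helper can keep the defaults together. The sketch below is one possible wrapper, not part of the original example: `build_payload` is a hypothetical name, the defaults simply mirror the values shown above, and the validation rules are assumptions rather than documented server requirements.

```python
def build_payload(prompt, tokens=64, temperature=1.0, top_k=1, top_p=0,
                  beam_width=1, repetition_penalty=1.0, length_penalty=1.0):
    """Return a payload dict shaped like the one in the snippet above."""
    # Assumed sanity checks; the server may accept or reject other values.
    if not isinstance(prompt, str) or not prompt:
        raise ValueError("prompt must be a non-empty string")
    if tokens < 1:
        raise ValueError("tokens must be at least 1")
    return {
        "prompt": [[prompt]],  # same nested-list wrapping as the original
        "tokens": tokens,
        "temperature": temperature,
        "top_k": top_k,
        "top_p": top_p,
        "beam_width": beam_width,
        "repetition_penalty": repetition_penalty,
        "length_penalty": length_penalty,
    }


pload = build_payload("What is Triton Inference Server?")
print(pload["prompt"], pload["tokens"])
```

With `top_k` set to 1, decoders typically pick the single most likely token at each step, so the `temperature` and `top_p` values are unlikely to change the output; this is a general property of top-k sampling rather than something the source states.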

Triton client init

Initializes the client that connects to the Triton Inference Server

triton_url = "llm:8000"
client = HttpTritonClient(triton_url)
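Before creating the client, it can help to confirm that the server is reachable. The sketch below uses only the Python standard library and the `v2/health/ready` endpoint from the KServe v2 HTTP protocol that Triton implements. The helper names `health_url` and `is_server_ready` are hypothetical, and the code assumes the server is served over plain HTTP and that the `llm` hostname resolves in your environment (for example, as a container service name).

```python
import http.client
import urllib.error
import urllib.request


def health_url(triton_url, scheme="http"):
    """Build the readiness URL from a host:port string such as "llm:8000"."""
    return f"{scheme}://{triton_url}/v2/health/ready"


def is_server_ready(triton_url, timeout=2.0):
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(triton_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Unreachable host, refused connection, timeout, or non-2xx status.
        return False


print(health_url("llm:8000"))
```

A caller could run `is_server_ready(triton_url)` in a retry loop and only construct `HttpTritonClient(triton_url)` once it returns `True`.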

Nemotron prompt template

Template for the full LLM input structure using <extra_id> tags.

NEMOTRON_PROMPT_TEMPLATE = (
    """<extra_id_0>System
{system}
<extra_id_1>User
{prompt}
<extra_id_1>Assistant
"""
)
system = "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Please...
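To show how the template is filled in, here is a minimal sketch using `str.format`. It assumes each tag and each field sits on its own line, which the flattened listing above does not confirm. The system message is truncated in the source, so the example uses a short placeholder of its own, and `build_prompt` is a hypothetical helper name.

```python
# Assumed layout: one tag or field per line, ending after the Assistant tag.
NEMOTRON_PROMPT_TEMPLATE = (
    "<extra_id_0>System\n"
    "{system}\n"
    "<extra_id_1>User\n"
    "{prompt}\n"
    "<extra_id_1>Assistant\n"
)


def build_prompt(system, user_message):
    """Fill the template's {system} and {prompt} fields."""
    return NEMOTRON_PROMPT_TEMPLATE.format(system=system, prompt=user_message)


# Placeholder system message for illustration, not the original text.
system = "You are a helpful assistant."
prompt = build_prompt(system, "What is Triton?")
print(prompt)
```

The resulting string is what would be wrapped as `[[prompt]]` in the payload, so the model receives the system text, the user turn, and an open Assistant turn to complete.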