A shared folder with AI prompts and code snippets
From workspace: Nvidia
Team: NVIDIA NIM
Total snippets: 3
Example Docker Command
The following command assumes you have stored a temporary AWS token in /home/usr/.aws and your CA certificate in /etc.
docker run --rm --runtime=nvidia --gpus=all -p 8000:8000 \
  -e NGC_API_KEY=$NGC_API_KEY \
  -v /home/usr/.aws:/tmp/.aws \
  -e AWS_SHARED_CREDENTIALS_FILE=/tmp/.aws/credentials \
  -e AWS_PROFILE=default \
  -e AWS_REGION=us-east-1 \
  ...
import time
import requests
import json

# Define your model endpoint URL
API_URL = "http://0.0.0.0:8000/v1/chat/completions"

# Function to send a request to the API and return the response time
def send_request(model, messages, max_tokens=15):
    ...
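The snippet above is truncated, so here is a minimal, self-contained sketch of how such a latency-measurement helper might look. It assumes the NIM server exposes the OpenAI-compatible /v1/chat/completions route shown in API_URL; the `build_payload` helper and the use of stdlib `urllib` (in place of `requests`) are illustrative choices, not from the original snippet.

```python
import json
import time
import urllib.request

API_URL = "http://0.0.0.0:8000/v1/chat/completions"

def build_payload(model, messages, max_tokens=15):
    # Assemble an OpenAI-style chat-completions request body.
    return {"model": model, "messages": messages, "max_tokens": max_tokens}

def send_request(model, messages, max_tokens=15, url=API_URL):
    # POST the payload and return (parsed response, elapsed seconds).
    body = json.dumps(build_payload(model, messages, max_tokens)).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data, time.perf_counter() - start
```

Calling `send_request` repeatedly with the same long prompt prefix is a simple way to compare cold and warm latencies against a running NIM container.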
In scenarios where more than 90% of the initial prompt is identical across multiple requests—differing only in the final tokens—implementing a key-value cache could substantially improve inference speed. This approach leverages a high degree of...
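The prefix-reuse idea can be illustrated with a toy client-side cache keyed on the shared prompt prefix. This is a conceptual sketch only: in a real deployment the key-value cache lives inside the inference engine and stores attention state, not string lengths; the `PrefixCache` class and its fields are hypothetical.

```python
import hashlib

class PrefixCache:
    """Toy illustration of prefix reuse: expensive work on the shared
    prefix is done once and reused for every request that starts with it."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, prefix):
        # Hash the prefix so lookups are cheap even for long prompts.
        return hashlib.sha256(prefix.encode()).hexdigest()

    def process(self, prompt, prefix_len):
        prefix, suffix = prompt[:prefix_len], prompt[prefix_len:]
        key = self._key(prefix)
        if key in self._store:
            self.hits += 1      # shared prefix seen before: reuse cached state
        else:
            self.misses += 1    # first sighting: "compute" and store it
            self._store[key] = len(prefix)  # stand-in for the cached KV state
        # Only the short, request-specific suffix needs fresh computation.
        return self._store[key] + len(suffix)

shared = "You are a helpful assistant. " * 30  # ~90% of every prompt
cache = PrefixCache()
for question in ["What is CUDA?", "What is NIM?", "What is TensorRT?"]:
    cache.process(shared + question, len(shared))

print(cache.hits, cache.misses)  # → 2 1
```

After the first request pays the full cost of the prefix, the two subsequent requests hit the cache and only the differing final tokens require new work, which is exactly the regime the paragraph above describes.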