A shared folder with AI prompts and code snippets
From workspace: Nvidia
Team: NVIDIA NIM
Total snippets: 4
The Llama Stack API supports tool calling, allowing the model to interact with external functions. Unlike the OpenAI API, the Llama Stack API only supports the tool choices "auto", "required", or None.
from inference import InferenceClient, process_chat_completion
from llama_toolchain.inference.api import ChatCompletionRequest, UserMessage, ToolDefinition, ToolParamDefinition
from llama_models.llama3.api.datatypes import SamplingParams,...
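A minimal sketch of how a complete tool-calling request might look. The endpoint URL, model id, and the chat_completion method name are illustrative assumptions (inference.py is shown only in truncated form below), and the ToolDefinition/ToolParamDefinition field names follow the llama_toolchain API:

from inference import InferenceClient, process_chat_completion
from llama_toolchain.inference.api import (
    ChatCompletionRequest, UserMessage, ToolDefinition, ToolParamDefinition
)
from llama_models.llama3.api.datatypes import SamplingParams

def chat_with_tools():
    # Assumed constructor argument; the real signature lives in inference.py.
    client = InferenceClient("http://localhost:8000")

    # Describe one external function the model is allowed to call.
    get_weather = ToolDefinition(
        tool_name="get_weather",
        description="Look up the current weather for a city",
        parameters={
            "city": ToolParamDefinition(
                param_type="str",
                description="Name of the city",
                required=True,
            )
        },
    )

    request = ChatCompletionRequest(
        model="meta/llama-3.1-8b-instruct",  # illustrative model id
        messages=[UserMessage(content="What is the weather in Berlin?")],
        sampling_params=SamplingParams(max_tokens=256),
        tools=[get_weather],
        tool_choice="auto",  # one of "auto", "required", or None
        stream=False,
    )
    process_chat_completion(client.chat_completion(request))

if __name__ == "__main__":
    chat_with_tools()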
For streaming responses, use the same structure:
from inference import InferenceClient, process_chat_completion
from llama_toolchain.inference.api import ChatCompletionRequest, UserMessage
from llama_models.llama3.api.datatypes import SamplingParams

def stream_chat():
    client =...
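A hedged sketch of how stream_chat might be completed, assuming the same InferenceClient and that process_chat_completion can handle individual stream chunks; the endpoint, model id, and method name are assumptions:

from inference import InferenceClient, process_chat_completion
from llama_toolchain.inference.api import ChatCompletionRequest, UserMessage
from llama_models.llama3.api.datatypes import SamplingParams

def stream_chat():
    client = InferenceClient("http://localhost:8000")  # assumed endpoint

    request = ChatCompletionRequest(
        model="meta/llama-3.1-8b-instruct",  # illustrative model id
        messages=[UserMessage(content="Write a short poem about GPUs.")],
        sampling_params=SamplingParams(max_tokens=128),
        stream=True,  # ask for incremental chunks instead of one response
    )

    # With stream=True the client is assumed to return an iterator of
    # ChatCompletionResponseStreamChunk objects.
    for chunk in client.chat_completion(request):
        process_chat_completion(chunk)

if __name__ == "__main__":
    stream_chat()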
The following basic usage example uses the common components from inference.py (shown at the end of this section):
from inference import InferenceClient, process_chat_completion
from llama_toolchain.inference.api import ChatCompletionRequest, UserMessage
from llama_models.llama3.api.datatypes import SamplingParams

def chat():
    client =...
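How the non-streaming chat function might be completed, again treating the endpoint, model id, and chat_completion method as assumptions:

from inference import InferenceClient, process_chat_completion
from llama_toolchain.inference.api import ChatCompletionRequest, UserMessage
from llama_models.llama3.api.datatypes import SamplingParams

def chat():
    client = InferenceClient("http://localhost:8000")  # assumed endpoint

    request = ChatCompletionRequest(
        model="meta/llama-3.1-8b-instruct",  # illustrative model id
        messages=[UserMessage(content="Explain NVIDIA NIM in one sentence.")],
        sampling_params=SamplingParams(max_tokens=256),
        stream=False,
    )

    # process_chat_completion is assumed to extract and print the reply.
    process_chat_completion(client.chat_completion(request))

if __name__ == "__main__":
    chat()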
The common components used by the examples above are stored in the file inference.py. This file contains the InferenceClient class and utility functions shared across the different examples. Here's the content of inference.py:
import json
from typing import Union, Generator

import requests

from llama_toolchain.apis.inference import (
    ChatCompletionRequest,
    ChatCompletionResponse,
    ChatCompletionResponseStreamChunk
)

class InferenceClient:
    def...
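A hedged reconstruction of the truncated class: a thin HTTP wrapper that POSTs the request to the server and either returns one parsed response or yields parsed stream chunks. The /inference/chat_completion route, the payload shape, the line-delimited streaming format, and the process_chat_completion helper are assumptions, not the documented wire protocol:

import json
from typing import Union, Generator

import requests

from llama_toolchain.apis.inference import (
    ChatCompletionRequest,
    ChatCompletionResponse,
    ChatCompletionResponseStreamChunk
)

class InferenceClient:
    def __init__(self, base_url: str):
        # Base URL of the inference server, e.g. "http://localhost:8000".
        self.base_url = base_url

    def chat_completion(
        self, request: ChatCompletionRequest
    ) -> Union[ChatCompletionResponse, Generator]:
        response = requests.post(
            f"{self.base_url}/inference/chat_completion",  # assumed route
            json=json.loads(request.json()),  # assumes a pydantic model
            stream=request.stream,
        )
        response.raise_for_status()
        if request.stream:
            # Assumed line-delimited JSON: one chunk per non-empty line.
            return (
                ChatCompletionResponseStreamChunk(**json.loads(line))
                for line in response.iter_lines()
                if line
            )
        return ChatCompletionResponse(**response.json())

def process_chat_completion(result) -> None:
    # Minimal assumed helper: print a full response or a single chunk.
    print(result)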