Getting Started with NVIDIA NIM: A Comprehensive Tutorial
NVIDIA NIM (NVIDIA Inference Microservices) is a powerful platform that provides data scientists and software engineers with easy access to a wide range of generative AI models. This tutorial will guide you through using NVIDIA NIM for various AI tasks, including chat completion, vector embeddings, text-to-image generation, and more.
Introduction to NVIDIA NIM
NVIDIA NIM offers a collection of AI models that can be consumed through NVIDIA's hosted API or deployed on your own infrastructure, whether a local machine or a private cloud. The models are packaged as GPU-optimized inference microservices designed to be easy to use and deploy.
To get started with NVIDIA NIM:
- Visit build.nvidia.com
- Create an account
- Apply for credits
- Generate an API key
Setting Up Your Environment
Before we begin, let's set up our Python environment with the necessary libraries:
!pip install openai
!pip install Pillow
Next, we'll import the required libraries and set up our API key:
from openai import OpenAI
import requests
import base64
import io
import json
from pathlib import Path
from PIL import Image
from dataclasses import dataclass, field, asdict
from typing import List, Optional, Dict, Any, Union
api_key_nim = 'your_api_key_here'
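Hardcoding the key is fine for quick experiments, but for anything you share it is safer to read it from an environment variable. A minimal sketch, assuming you have exported a variable named NVIDIA_API_KEY in your shell:

import os

# Fall back to a placeholder so the notebook still runs if the variable is not set.
api_key_nim = os.environ.get("NVIDIA_API_KEY", "your_api_key_here")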
Chat Completion with Llama3
Let's start by implementing a chat interface using NVIDIA's NIM API with the Llama3 model. We define a small ApiConfig for the endpoint and key, a ChatCompletionConfig for generation settings, and an AIClient that wraps the OpenAI SDK:
@dataclass
class ApiConfig:
    # Connection settings for the OpenAI-compatible NIM endpoint.
    base_url: str
    api_key: str

@dataclass
class ChatCompletionConfig:
    model: str
    temperature: float = 0.5
    top_p: float = 1.0
    max_tokens: int = 1024
    stream: bool = True

class AIClient:
    def __init__(self, api_config: ApiConfig, mode_config: Union[ChatCompletionConfig, "EmbeddingConfig"]):
        self.api_config = api_config
        self.mode_config = mode_config
        self.client = OpenAI(base_url=api_config.base_url, api_key=api_config.api_key)

    def generate_chat_response(self, prompt: str, conversation_history: List[dict]):
        # Append the new user prompt to the running conversation and call the API.
        messages = conversation_history + [{"role": "user", "content": prompt}]
        config = self.mode_config
        return self.client.chat.completions.create(
            model=config.model,
            messages=messages,
            temperature=config.temperature,
            top_p=config.top_p,
            max_tokens=config.max_tokens,
            stream=config.stream
        )
# Usage
llama3_chat_client = create_chat_client(ChatModel.LLAMA3_8B_INSTRUCT, api_key_nim)
llama3_chat_interface = AIInterface(llama3_chat_client)
llama3_chat_interface.display()
This code sets up a chat interface using the Llama3 model; create_chat_client, ChatModel, and AIInterface are convenience wrappers around AIClient that are not shown here. You can now interact with the model by asking questions or giving prompts.
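If you prefer to call AIClient directly rather than through those wrappers, a minimal sketch looks like this. The base URL https://integrate.api.nvidia.com/v1 and the model id meta/llama3-8b-instruct are assumptions to adjust for your setup:

chat_client = AIClient(
    api_config=ApiConfig(base_url="https://integrate.api.nvidia.com/v1", api_key=api_key_nim),
    mode_config=ChatCompletionConfig(model="meta/llama3-8b-instruct")
)

history: List[dict] = []
response = chat_client.generate_chat_response("Explain what NVIDIA NIM is in one sentence.", history)

# With stream=True the call returns chunks; print the content deltas as they arrive.
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Passing the growing conversation_history back on each turn is what gives the chat its multi-turn memory.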
Vector Embeddings
NVIDIA NIM provides multiple models for generating vector embeddings. Let's explore how to use them:
@dataclass
class EmbeddingConfig:
    model: str
    encoding_format: str = "float"
    input_type: Optional[str] = None
    truncate: Optional[str] = None
    extra_params: Dict[str, Any] = field(default_factory=dict)

class AIClient:  # continuation of the AIClient class defined above; only the new method is shown
    def generate_embedding(self, input_text: str):
        config = self.mode_config
        payload = {
            "input": [input_text],
            "model": config.model,
            "encoding_format": config.encoding_format,
        }
        # NIM-specific parameters (input_type, truncate) are passed via the OpenAI SDK's extra_body.
        extra_body = {}
        if config.input_type:
            extra_body["input_type"] = config.input_type
        if config.truncate:
            extra_body["truncate"] = config.truncate
        if extra_body:
            payload["extra_body"] = extra_body
        payload.update(config.extra_params)
        response = self.client.embeddings.create(**payload)
        return response.data[0].embedding
# Usage for different embedding models (create_embedding_client and EmbeddingModel are convenience helpers not shown here)
nv_embed_qa_client = create_embedding_client(EmbeddingModel.NV_EMBED_QA, api_key_nim)
nv_embed_v1_client = create_embedding_client(EmbeddingModel.NV_EMBED_V1, api_key_nim)
arctic_embed_client = create_embedding_client(EmbeddingModel.ARCTIC_EMBED_L, api_key_nim)
bge_m3_client = create_embedding_client(EmbeddingModel.BGE_M3, api_key_nim)
These embedding models can be used to convert text into vector representations, which are crucial for many NLP tasks and information retrieval systems.
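To see how these vectors behave, here is a small, library-free sketch that compares two embeddings with cosine similarity. It assumes one of the clients above (for example nv_embed_qa_client) is available and exposes generate_embedding as defined earlier:

import math

def cosine_similarity(a: List[float], b: List[float]) -> float:
    # Cosine similarity: dot(a, b) / (|a| * |b|); values near 1.0 mean very similar direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

emb_question = nv_embed_qa_client.generate_embedding("What is the memory bandwidth of the H100?")
emb_passage = nv_embed_qa_client.generate_embedding("The H100 delivers 3 TB/s of GPU memory bandwidth.")
print(f"Cosine similarity: {cosine_similarity(emb_question, emb_passage):.3f}")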
Reranking for Improved Search Results
NVIDIA NIM also offers a reranking model to improve search results:
@dataclass
class Query:
    text: str

@dataclass
class Passage:
    text: str

@dataclass
class ReRankingConfig:
    query: Query
    passages: List[Passage] = field(default_factory=list)
    model: str = "nv-rerank-qa-mistral-4b:1"

class NvidiaReRankingAPI:
    BASE_URL = "https://ai.api.nvidia.com/v1/retrieval/nvidia/reranking"

    def __init__(self, api_key: str):
        self.api_key = api_key

    def rerank(self, config: ReRankingConfig) -> dict:
        payload = {
            "model": config.model,
            "query": {"text": config.query.text},
            "passages": [{"text": passage.text} for passage in config.passages]
        }
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Accept": "application/json",
        }
        response = requests.post(self.BASE_URL, headers=headers, json=payload)
        response.raise_for_status()
        return response.json()
# Usage
reranking_api = NvidiaReRankingAPI(api_key_nim)
query = Query(text="What is the GPU memory bandwidth of H100 SXM?")
passages = [
    Passage(text="The Hopper GPU is paired with the Grace CPU using NVIDIA's ultra-fast chip-to-chip interconnect, delivering 900GB/s of bandwidth, 7X faster than PCIe Gen5."),
    Passage(text="A100 provides up to 20X higher performance over the prior generation and can be partitioned into seven GPU instances to dynamically adjust to shifting demands."),
    Passage(text="Accelerated servers with H100 deliver the compute power, along with 3 terabytes per second (TB/s) of memory bandwidth per GPU and scalability with NVLink and NVSwitch™.")
]
config = ReRankingConfig(query=query, passages=passages)
result = reranking_api.rerank(config)
print(result)
This reranking model helps determine which passages are most relevant to a given query, improving the quality of search results.
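A typical reranking response contains a list of rankings, each with the index of a passage and a relevance score. A minimal sketch for printing the passages in ranked order, assuming the response uses "rankings", "index", and "logit"/"score" keys (verify against your actual response):

rankings = result.get("rankings", [])
for rank, entry in enumerate(rankings, start=1):
    idx = entry["index"]
    score = entry.get("logit", entry.get("score"))
    # Show the ranked passage with a short preview of its text.
    print(f"{rank}. (score={score}) {passages[idx].text[:80]}...")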
Text-to-Image Generation with Stable Diffusion
NVIDIA NIM provides access to Stable Diffusion models for text-to-image generation:
@dataclass
class TextPrompt:
    text: str
    weight: float = 1.0

@dataclass
class StableDiffusionConfig:
    text_prompts: List[TextPrompt]
    negative_prompt: Optional[TextPrompt] = None
    cfg_scale: float = 5.0
    sampler: str = "K_DPM_2_ANCESTRAL"
    seed: int = 0
    steps: int = 25

class AIImageGenerator:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.base_url = "https://ai.api.nvidia.com/v1/genai/stabilityai"

    def generate_stable_diffusion(self, config: StableDiffusionConfig) -> dict:
        payload = {
            "text_prompts": [{
                "text": prompt.text,
                "weight": prompt.weight
            } for prompt in config.text_prompts],
            "cfg_scale": config.cfg_scale,
            "sampler": config.sampler,
            "seed": config.seed,
            "steps": config.steps
        }
        if config.negative_prompt:
            # The negative prompt is passed as an extra text prompt with a negative weight.
            payload["text_prompts"].append({
                "text": config.negative_prompt.text,
                "weight": config.negative_prompt.weight
            })
        return self._make_request("stable-diffusion-xl", payload)

    def _make_request(self, endpoint: str, payload: dict) -> dict:
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Accept": "application/json",
        }
        response = requests.post(f"{self.base_url}/{endpoint}", headers=headers, json=payload)
        response.raise_for_status()
        return response.json()
# Usage
generator = AIImageGenerator(api_key_nim)
sd_config = StableDiffusionConfig(
    text_prompts=[TextPrompt("An anthropomorphic duck wearing a cape and looking like a mage in the style of 90s video games, flat 2D cardboard cutouts, and children's cartoons. The duck will be used as a logo, easy to draw, with minimal details, evoking a retro and nostalgic feeling.")],
    negative_prompt=TextPrompt("new, modern, realistic", -1.0),
    seed=42,
    steps=30
)
sd_result = generator.generate_stable_diffusion(sd_config)

# Save the generated image
def save_base64_image(response_data: Dict[str, Any], output_path: Path, image_format: str = 'JPEG'):
    # The API returns the image as a base64-encoded string in the first artifact.
    base64_string = response_data['artifacts'][0]['base64']
    image_data = base64.b64decode(base64_string)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with Image.open(io.BytesIO(image_data)) as image:
        image.save(output_path, format=image_format)
    print(f"Image saved successfully to {output_path}")

output_file = Path('output/sd_result.jpg')
save_base64_image(sd_result, output_file)
This code generates an image based on the provided text prompt using Stable Diffusion XL.
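Because results depend heavily on the random seed, it is often useful to generate a few candidates and pick the best one. A small sketch reusing the generator and save_base64_image above; the shortened prompt and file names are illustrative:

for candidate_seed in (1, 7, 42):
    config_variant = StableDiffusionConfig(
        text_prompts=[TextPrompt("A duck mage logo, flat 2D, retro video game style")],
        seed=candidate_seed,
        steps=30
    )
    result_variant = generator.generate_stable_diffusion(config_variant)
    # Save each candidate under a seed-specific file name for easy comparison.
    save_base64_image(result_variant, Path(f"output/sd_seed_{candidate_seed}.jpg"))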
Image-to-Text Generation with LLaVA and NEVA
NVIDIA NIM also provides models for generating textual descriptions of images:
@dataclass
class ImageToTextConfig:
    model: str
    image_path: Path
    prompt: str
    max_tokens: int
    temperature: float
    top_p: float
    stream: bool = True
    seed: Optional[int] = None

class ImageToTextAPI:
    BASE_URL = "https://ai.api.nvidia.com/v1/vlm"

    def __init__(self, api_key: str):
        self.api_key = api_key

    def generate_text(self, config: ImageToTextConfig) -> str:
        image_b64 = self._encode_image(config.image_path)
        payload = self._create_payload(config, image_b64)
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Accept": "text/event-stream" if config.stream else "application/json"
        }
        invoke_url = f"{self.BASE_URL}/{config.model}"
        response = requests.post(invoke_url, headers=headers, json=payload, stream=config.stream)
        if config.stream:
            return self._process_stream(response)
        else:
            return response.json()

    def _encode_image(self, image_path: Path) -> str:
        with open(image_path, "rb") as f:
            return base64.b64encode(f.read()).decode()

    def _create_payload(self, config: ImageToTextConfig, image_b64: str) -> Dict[str, Any]:
        # The image is passed inline as a base64 data URI inside the user message.
        return {
            "messages": [{"role": "user", "content": f'{config.prompt} <img src="data:image/jpeg;base64,{image_b64}" />'}],
            "max_tokens": config.max_tokens,
            "temperature": config.temperature,
            "top_p": config.top_p,
            "stream": config.stream,
            "seed": config.seed
        }

    def _process_stream(self, response: requests.Response) -> str:
        # Parse the server-sent event stream and concatenate the content deltas.
        content = []
        for line in response.iter_lines():
            if line:
                line = line.decode('utf-8')
                if line.startswith("data: "):
                    try:
                        data = json.loads(line[6:])
                        if 'choices' in data and len(data['choices']) > 0:
                            delta = data['choices'][0].get('delta', {})
                            if 'content' in delta:
                                content.append(delta['content'])
                    except json.JSONDecodeError:
                        # The stream ends with a literal "[DONE]" sentinel.
                        if line[6:].strip() == "[DONE]":
                            break
        return ''.join(content).strip()
# Usage
api = ImageToTextAPI(api_key_nim)
llava_config = ImageToTextConfig(
    model="community/llava16-34b",
    image_path=Path("output/sd_result.jpg"),
    prompt="Describe the image.",
    max_tokens=512,
    temperature=1.00,
    top_p=0.70
)
llava_result = api.generate_text(llava_config)
print("LLaVA result:", llava_result)
neva_config = ImageToTextConfig(
    model="nvidia/neva-22b",
    image_path=Path("output/sd_result.jpg"),
    prompt="Describe what you see in this image.",
    max_tokens=1024,
    temperature=0.20,
    top_p=0.70,
    seed=0
)
neva_result = api.generate_text(neva_config)
print("NEVA result:", neva_result)
This code demonstrates how to use both LLaVA and NEVA models to generate textual descriptions of images.
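When stream=False, generate_text returns the parsed JSON response rather than a concatenated string. The payload typically follows the familiar chat-completion shape, so extracting the text usually looks like the sketch below; the key path is an assumption to verify against your actual response:

neva_config_sync = ImageToTextConfig(
    model="nvidia/neva-22b",
    image_path=Path("output/sd_result.jpg"),
    prompt="List the main objects in this image.",
    max_tokens=256,
    temperature=0.2,
    top_p=0.7,
    stream=False
)
neva_json = api.generate_text(neva_config_sync)
# Non-streaming responses generally nest the text under choices -> message -> content.
print(neva_json["choices"][0]["message"]["content"])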
Image-to-Video Generation with Stable Video Diffusion
Finally, let's explore how to generate short video clips from still images using NVIDIA's Stable Video Diffusion model:
@dataclass
class StableVideoDiffusionParams:
    image: str
    cfg_scale: float = field(default=2.5, metadata={"min": 0, "max": 100})
    seed: int = field(default=0, metadata={"min": 0, "max": 2**32 - 1})
    motion_bucket_id: int = field(default=127, metadata={"min": 0, "max": 127})

class ImageProcessor:
    @staticmethod
    def resize_and_encode(image_path: Path, max_size: tuple[int, int] = (300, 300)) -> str:
        # Downscale the image and re-encode it as a base64 PNG data URI for the request payload.
        with Image.open(image_path) as img:
            img.thumbnail(max_size, Image.LANCZOS)
            img = img.convert('RGB')
            buffered = io.BytesIO()
            img.save(buffered, format="PNG")
            img_str = base64.b64encode(buffered.getvalue()).decode()
            return f"data:image/png;base64,{img_str}"

class StableVideoDiffusionAPI:
    BASE_URL = "https://ai.api.nvidia.com/v1/genai/stabilityai/stable-video-diffusion"

    def __init__(self, api_key: str):
        self.api_key = api_key

    def send_request(self, params: StableVideoDiffusionParams) -> dict:
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        }
        response = requests.post(self.BASE_URL, headers=headers, json=asdict(params))
        response.raise_for_status()
        return response.json()

def generate_video(image_path: Path, api_key: str,
                   cfg_scale: Optional[float] = None,
                   seed: Optional[int] = None,
                   motion_bucket_id: Optional[int] = None) -> dict:
    image_data = ImageProcessor.resize_and_encode(image_path)
    params = StableVideoDiffusionParams(image=image_data)
    if cfg_scale is not None:
        params.cfg_scale = cfg_scale
    if seed is not None:
        params.seed = seed
    if motion_bucket_id is not None:
        params.motion_bucket_id = motion_bucket_id
    api = StableVideoDiffusionAPI(api_key)
    return api.send_request(params)
# Usage
result = generate_video(
    image_path=Path("output/sd_result.jpg"),
    api_key=api_key_nim,
    cfg_scale=2.0,
    seed=0,
    motion_bucket_id=127
)
# Save the generated video
def save_video_from_json(data, output_filename='output.mp4'):
    video_base64 = data['video']
    video_data = base64.b64decode(video_base64)
    with open(output_filename, 'wb') as f:
        f.write(video_data)
    print(f"Video saved as {output_filename}")
save_video_from_json(result, 'output_video.mp4')
This code generates a short video clip from the still image we created earlier using Stable Diffusion.
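To chain the two steps, a small helper can generate and save the clip in one call. This is a minimal sketch that reuses generate_video and save_video_from_json and assumes, as above, that the response carries the base64-encoded clip under the 'video' key:

def image_to_video(image_path: Path, api_key: str, output_filename: str = 'output_video.mp4') -> None:
    # Generate a short clip from a still image and write it to disk, failing loudly
    # if the response does not contain the expected 'video' field.
    result = generate_video(image_path=image_path, api_key=api_key)
    if 'video' not in result:
        raise KeyError(f"Unexpected response keys: {list(result)}")
    save_video_from_json(result, output_filename)

image_to_video(Path("output/sd_result.jpg"), api_key_nim)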
Conclusion
In this tutorial, we've explored a wide range of GenAI capabilities provided by NVIDIA NIM:
- Chat Completion with Llama3
- Vector Embeddings for text using multiple models
- Reranking for improved search results
- Text-to-Image generation with Stable Diffusion
- Image-to-Text generation with LLaVA and NEVA
- Image-to-Video generation with Stable Video Diffusion
These tools and models form a powerful toolkit for various AI applications, including:
- Natural Language Processing
- Information Retrieval
- Computer Vision
- Multimodal AI
As you continue your journey in AI Product Engineering, you'll find these capabilities essential for building sophisticated AI-powered products and services. Each of these models can be integrated into larger systems to create complex, intelligent applications.
Remember to always refer to the official NVIDIA documentation for the most up-to-date information on these services and any new features or models that may be introduced. As the field of AI is rapidly evolving, staying current with the latest developments and best practices is crucial.
Experiment with different parameters, combine these models in innovative ways, and always consider the ethical implications and potential biases in AI-generated content. Happy coding and exploring the world of GenAI with NVIDIA NIM!