
Getting Started with NVIDIA NIM: A Comprehensive Tutorial

Learn how to leverage NVIDIA NIM for various AI tasks including chat completion, vector embeddings, text-to-image generation, and more.

NVIDIA NIM (NVIDIA Inference Microservices) is a powerful platform that provides data scientists and software engineers with easy access to a wide range of generative AI models. This tutorial will guide you through using NVIDIA NIM for various AI tasks, including chat completion, vector embeddings, text-to-image generation, and more.

Introduction to NVIDIA NIM

NVIDIA NIM offers a collection of AI models that can be deployed in the NVIDIA Cloud, on your local machine, or in your private cloud. The models are packaged as optimized, containerized microservices and are designed to be easy to use and deploy.

To get started with NVIDIA NIM:

  1. Visit build.nvidia.com
  2. Create an account
  3. Apply for credits
  4. Generate an API key

Setting Up Your Environment

Before we begin, let's set up our Python environment with the necessary libraries:

!pip install openai
!pip install requests
!pip install Pillow

Next, we'll import the required libraries and set up our API key:

from openai import OpenAI
import requests
import base64
import json
from pathlib import Path
from PIL import Image
import io
from dataclasses import dataclass, field, asdict
from typing import List, Optional, Dict, Any, Union

api_key_nim = 'your_api_key_here'
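
Hardcoding the key is fine for a quick experiment, but a safer pattern is to read it from an environment variable. The variable name NVIDIA_API_KEY below is just a convention, not something required by NIM:

import os

# Optional: prefer an environment variable over a hardcoded key.
api_key_nim = os.environ.get("NVIDIA_API_KEY", api_key_nim)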

Chat Completion with Llama3

Let's start by implementing a chat interface using NVIDIA's NIM API with the Llama3 model:

@dataclass
class ChatCompletionConfig:
    model: str
    temperature: float = 0.5
    top_p: float = 1
    max_tokens: int = 1024
    stream: bool = True

class AIClient:
    def __init__(self, api_config: ApiConfig, mode_config: Union[ChatCompletionConfig, EmbeddingConfig]):
        self.api_config = api_config
        self.mode_config = mode_config
        self.client = OpenAI(base_url=api_config.base_url, api_key=api_config.api_key)

    def generate_chat_response(self, prompt: str, conversation_history: List[dict]):
        messages = conversation_history + [{"role": "user", "content": prompt}]
        config = self.mode_config
        return self.client.chat.completions.create(
            model=config.model,
            messages=messages,
            temperature=config.temperature,
            top_p=config.top_p,
            max_tokens=config.max_tokens,
            stream=config.stream
        )

# Usage
llama3_chat_client = create_chat_client(ChatModel.LLAMA3_8B_INSTRUCT, api_key_nim)
llama3_chat_interface = AIInterface(llama3_chat_client)
llama3_chat_interface.display()

This code sets up a chat interface using the Llama3 model. You can now interact with the model by asking questions or giving prompts.
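
The snippet above relies on a few helpers that aren't shown: ApiConfig, the ChatModel enum, create_chat_client, and the AIInterface chat widget. Below is a minimal sketch of what they might look like, assuming the OpenAI-compatible NIM endpoint at https://integrate.api.nvidia.com/v1 and the model identifier meta/llama3-8b-instruct; treat these names and values as placeholders, and define them before the AIClient class (the EmbeddingConfig referenced in the Union type is introduced in the next section). Instead of the AIInterface widget, the last few lines show how to consume the streamed response directly:

from enum import Enum

@dataclass
class ApiConfig:
    base_url: str
    api_key: str

class ChatModel(str, Enum):
    # Assumed model identifier; check build.nvidia.com for the current catalog.
    LLAMA3_8B_INSTRUCT = "meta/llama3-8b-instruct"

def create_chat_client(model: ChatModel, api_key: str) -> "AIClient":
    # Assumes the OpenAI-compatible NIM endpoint.
    api_config = ApiConfig(base_url="https://integrate.api.nvidia.com/v1", api_key=api_key)
    return AIClient(api_config, ChatCompletionConfig(model=model.value))

# Consume the streamed response without the AIInterface widget:
chat_client = create_chat_client(ChatModel.LLAMA3_8B_INSTRUCT, api_key_nim)
stream = chat_client.generate_chat_response("What is NVIDIA NIM in one sentence?", [])
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")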

Vector Embeddings

NVIDIA NIM provides multiple models for generating vector embeddings. Let's explore how to use them:

@dataclass
class EmbeddingConfig:
    model: str
    encoding_format: str = "float"
    input_type: Optional[str] = None
    truncate: Optional[str] = None
    extra_params: Dict[str, Any] = field(default_factory=dict)

class AIClient:  # continuing the AIClient class defined in the chat section
    def generate_embedding(self, input_text: str):
        config = self.mode_config
        payload = {
            "input": [input_text],
            "model": config.model,
            "encoding_format": config.encoding_format,
        }
        extra_body = {}
        if config.input_type:
            extra_body["input_type"] = config.input_type
        if config.truncate:
            extra_body["truncate"] = config.truncate
        if extra_body:
            payload["extra_body"] = extra_body
        payload.update(config.extra_params)

        response = self.client.embeddings.create(**payload)
        return response.data[0].embedding

# Usage for different embedding models
nv_embed_qa_client = create_embedding_client(EmbeddingModel.NV_EMBED_QA, api_key_nim)
nv_embed_v1_client = create_embedding_client(EmbeddingModel.NV_EMBED_V1, api_key_nim)
arctic_embed_client = create_embedding_client(EmbeddingModel.ARCTIC_EMBED_L, api_key_nim)
bge_m3_client = create_embedding_client(EmbeddingModel.BGE_M3, api_key_nim)

These embedding models can be used to convert text into vector representations, which are crucial for many NLP tasks and information retrieval systems.
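
As in the chat example, EmbeddingModel and create_embedding_client are helpers that aren't shown above. A rough sketch under the same assumptions follows; the enum values are placeholders for the model identifiers listed on build.nvidia.com and may differ from the current catalog:

class EmbeddingModel(str, Enum):
    # Placeholder identifiers; verify against build.nvidia.com.
    NV_EMBED_QA = "NV-Embed-QA"
    NV_EMBED_V1 = "nvidia/nv-embed-v1"
    ARCTIC_EMBED_L = "snowflake/arctic-embed-l"
    BGE_M3 = "baai/bge-m3"

def create_embedding_client(model: EmbeddingModel, api_key: str) -> "AIClient":
    api_config = ApiConfig(base_url="https://integrate.api.nvidia.com/v1", api_key=api_key)
    # input_type distinguishes "query" from "passage" embeddings on models that require it.
    return AIClient(api_config, EmbeddingConfig(model=model.value, input_type="query"))

# Example: embed a sentence and inspect the vector dimensionality
vector = nv_embed_v1_client.generate_embedding("What is NVIDIA NIM?")
print(len(vector))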

Reranking for Improved Search Results

NVIDIA NIM also offers a reranking model to improve search results:

@dataclass
class Query:
    text: str

@dataclass
class Passage:
    text: str

@dataclass
class ReRankingConfig:
    model: str = "nv-rerank-qa-mistral-4b:1"
    query: Query = field(default_factory=Query)
    passages: List[Passage] = field(default_factory=list)

class NvidiaReRankingAPI:
    BASE_URL = "https://ai.api.nvidia.com/v1/retrieval/nvidia/reranking"

    def __init__(self, api_key: str):
        self.api_key = api_key

    def rerank(self, config: ReRankingConfig) -> dict:
        payload = {
            "model": config.model,
            "query": {"text": config.query.text},
            "passages": [{"text": passage.text} for passage in config.passages]
        }

        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Accept": "application/json",
        }

        response = requests.post(self.BASE_URL, headers=headers, json=payload)
        response.raise_for_status()
        return response.json()

# Usage
reranking_api = NvidiaReRankingAPI(api_key_nim)
query = Query(text="What is the GPU memory bandwidth of H100 SXM?")
passages = [
    Passage(text="The Hopper GPU is paired with the Grace CPU using NVIDIA's ultra-fast chip-to-chip interconnect, delivering 900GB/s of bandwidth, 7X faster than PCIe Gen5."),
    Passage(text="A100 provides up to 20X higher performance over the prior generation and can be partitioned into seven GPU instances to dynamically adjust to shifting demands."),
    Passage(text="Accelerated servers with H100 deliver the compute powerโ€”along with 3 terabytes per second (TB/s) of memory bandwidth per GPU and scalability with NVLink and NVSwitchโ„ข.")
]

config = ReRankingConfig(query=query, passages=passages)
result = reranking_api.rerank(config)
print(result)

This reranking model helps determine which passages are most relevant to a given query, improving the quality of search results.
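
The exact response schema can change, so check the current API reference; assuming it returns a rankings list with index and logit fields, you could order the passages by relevance like this:

# Assumes a response of the form {"rankings": [{"index": ..., "logit": ...}, ...]};
# verify against the current reranking API reference.
for rank in sorted(result.get("rankings", []), key=lambda r: r["logit"], reverse=True):
    print(f"{rank['logit']:.2f}  {passages[rank['index']].text[:80]}...")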

Text-to-Image Generation with Stable Diffusion

NVIDIA NIM provides access to Stable Diffusion models for text-to-image generation:

@dataclass
class TextPrompt:
    text: str
    weight: float = 1.0

@dataclass
class StableDiffusionConfig:
    text_prompts: List[TextPrompt]
    negative_prompt: Optional[TextPrompt] = None
    cfg_scale: float = 5.0
    sampler: str = "K_DPM_2_ANCESTRAL"
    seed: int = 0
    steps: int = 25

class AIImageGenerator:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.base_url = "https://ai.api.nvidia.com/v1/genai/stabilityai"

    def generate_stable_diffusion(self, config: StableDiffusionConfig) -> dict:
        payload = {
            "text_prompts": [{
                "text": prompt.text,
                "weight": prompt.weight
            } for prompt in config.text_prompts],
            "cfg_scale": config.cfg_scale,
            "sampler": config.sampler,
            "seed": config.seed,
            "steps": config.steps
        }
        if config.negative_prompt:
            payload["text_prompts"].append({
                "text": config.negative_prompt.text,
                "weight": config.negative_prompt.weight
            })
        return self._make_request("stable-diffusion-xl", payload)

    def _make_request(self, endpoint: str, payload: dict) -> dict:
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Accept": "application/json",
        }
        response = requests.post(f"{self.base_url}/{endpoint}", headers=headers, json=payload)
        response.raise_for_status()
        return response.json()

# Usage
generator = AIImageGenerator(api_key_nim)
sd_config = StableDiffusionConfig(
    text_prompts=[TextPrompt("An anthropomorphic duck wearing a cape and looking like a mage in the style of 90s video games, flat 2D cardboard cutouts, and children's cartoons. The duck will be used as a logo, easy to draw, with minimal details, evoking a retro and nostalgic feeling.")],
    negative_prompt=TextPrompt("new, modern, realistic", -1),
    seed=42,
    steps=30
)
sd_result = generator.generate_stable_diffusion(sd_config)

# Save the generated image
def save_base64_image(response_data: Dict[str, Any], output_path: Path, image_format: str = 'JPEG'):
    base64_string = response_data['artifacts'][0]['base64']
    image_data = base64.b64decode(base64_string)
    output_path.parent.mkdir(parents=True, exist_ok=True)  # ensure the output directory exists
    with Image.open(io.BytesIO(image_data)) as image:
        image.save(output_path, format=image_format)
    print(f"Image saved successfully to {output_path}")

output_file = Path('output/sd_result.jpg')
save_base64_image(sd_result, output_file)

This code generates an image based on the provided text prompt using Stable Diffusion XL.
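
If you are running this in a notebook (as the !pip commands above suggest), you can preview the saved image inline with IPython's display helpers:

from IPython.display import Image as IPyImage, display

display(IPyImage(filename=str(output_file)))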

Image-to-Text Generation with LLaVA and NEVA

NVIDIA NIM also provides models for generating textual descriptions of images:

@dataclass
class ImageToTextConfig:
    model: str
    image_path: Path
    prompt: str
    max_tokens: int
    temperature: float
    top_p: float
    stream: bool = True
    seed: Optional[int] = None

class ImageToTextAPI:
    BASE_URL = "https://ai.api.nvidia.com/v1/vlm"

    def __init__(self, api_key: str):
        self.api_key = api_key

    def generate_text(self, config: ImageToTextConfig) -> str:
        image_b64 = self._encode_image(config.image_path)
        payload = self._create_payload(config, image_b64)
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Accept": "text/event-stream" if config.stream else "application/json"
        }
        invoke_url = f"{self.BASE_URL}/{config.model}"
        response = requests.post(invoke_url, headers=headers, json=payload, stream=config.stream)
        
        if config.stream:
            return self._process_stream(response)
        else:
            return response.json()

    def _encode_image(self, image_path: Path) -> str:
        with open(image_path, "rb") as f:
            return base64.b64encode(f.read()).decode()

    def _create_payload(self, config: ImageToTextConfig, image_b64: str) -> Dict[str, Any]:
        payload = {
            "messages": [{"role": "user", "content": f'{config.prompt} <img src="data:image/jpeg;base64,{image_b64}" />'}],
            "max_tokens": config.max_tokens,
            "temperature": config.temperature,
            "top_p": config.top_p,
            "stream": config.stream
        }
        if config.seed is not None:  # only send a seed when one was provided
            payload["seed"] = config.seed
        return payload

    def _process_stream(self, response: requests.Response) -> str:
        content = []
        for line in response.iter_lines():
            if line:
                line = line.decode('utf-8')
                if line.startswith("data: "):
                    try:
                        data = json.loads(line[6:])
                        if 'choices' in data and len(data['choices']) > 0:
                            delta = data['choices'][0].get('delta', {})
                            if 'content' in delta:
                                content.append(delta['content'])
                    except json.JSONDecodeError:
                        if line[6:].strip() == "[DONE]":
                            break
        return ''.join(content).strip()

# Usage
api = ImageToTextAPI(api_key_nim)

llava_config = ImageToTextConfig(
    model="community/llava16-34b",
    image_path=Path("output/sd_result.jpg"),
    prompt="Describe the image.",
    max_tokens=512,
    temperature=1.00,
    top_p=0.70
)

llava_result = api.generate_text(llava_config)
print("LLaVA result:", llava_result)

neva_config = ImageToTextConfig(
    model="nvidia/neva-22b",
    image_path=Path("output/sd_result.jpg"),
    prompt="Describe what you see in this image.",
    max_tokens=1024,
    temperature=0.20,
    top_p=0.70,
    seed=0
)

neva_result = api.generate_text(neva_config)
print("NEVA result:", neva_result)

This code demonstrates how to use both LLaVA and NEVA models to generate textual descriptions of images.

Image-to-Video Generation with Stable Video Diffusion

Finally, let's explore how to generate short video clips from still images using NVIDIA's Stable Video Diffusion model:

@dataclass
class StableVideoDiffusionParams:
    image: str
    cfg_scale: float = field(default=2.5, metadata={"min": 0, "max": 100})
    seed: int = field(default=0, metadata={"min": 0, "max": 2**32 - 1})
    motion_bucket_id: int = field(default=127, metadata={"min": 0, "max": 127})

class ImageProcessor:
    @staticmethod
    def resize_and_encode(image_path: Path, max_size: tuple[int, int] = (300, 300)) -> str:
        with Image.open(image_path) as img:
            img.thumbnail(max_size, Image.LANCZOS)
            img = img.convert('RGB')
            buffered = io.BytesIO()
            img.save(buffered, format="PNG")
            img_str = base64.b64encode(buffered.getvalue()).decode()
        return f"data:image/png;base64,{img_str}"

class StableVideoDiffusionAPI:
    BASE_URL = "https://ai.api.nvidia.com/v1/genai/stabilityai/stable-video-diffusion"

    def __init__(self, api_key: str):
        self.api_key = api_key

    def send_request(self, params: StableVideoDiffusionParams) -> dict:
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        }

        response = requests.post(self.BASE_URL, headers=headers, json=asdict(params))
        response.raise_for_status()
        return response.json()

def generate_video(image_path: Path, api_key: str,
                   cfg_scale: Optional[float] = None,
                   seed: Optional[int] = None,
                   motion_bucket_id: Optional[int] = None) -> dict:
    image_data = ImageProcessor.resize_and_encode(image_path)

    params = StableVideoDiffusionParams(image=image_data)
    if cfg_scale is not None:
        params.cfg_scale = cfg_scale
    if seed is not None:
        params.seed = seed
    if motion_bucket_id is not None:
        params.motion_bucket_id = motion_bucket_id

    api = StableVideoDiffusionAPI(api_key)
    return api.send_request(params)

# Usage
result = generate_video(
    image_path=Path("output/sd_result.jpg"),
    api_key=api_key_nim,
    cfg_scale=2.0,
    seed=0,
    motion_bucket_id=127
)

# Save the generated video
def save_video_from_json(data, output_filename='output.mp4'):
    video_base64 = data['video']
    video_data = base64.b64decode(video_base64)
    with open(output_filename, 'wb') as f:
        f.write(video_data)
    print(f"Video saved as {output_filename}")

save_video_from_json(result, 'output_video.mp4')

This code generates a short video clip from the still image we created earlier using Stable Diffusion.
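
If you are working in a notebook, you can also preview the clip inline:

from IPython.display import Video

Video('output_video.mp4', embed=True)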

Conclusion

In this tutorial, we've explored a wide range of GenAI capabilities provided by NVIDIA NIM:

  1. Chat Completion with Llama3
  2. Vector Embeddings for text using multiple models
  3. Reranking for improved search results
  4. Text-to-Image generation with Stable Diffusion
  5. Image-to-Text generation with LLaVA and NEVA
  6. Image-to-Video generation with Stable Video Diffusion

These tools and models form a powerful toolkit for various AI applications, including:

  • Natural Language Processing
  • Information Retrieval
  • Computer Vision
  • Multimodal AI

As you continue your journey in AI Product Engineering, you'll find these capabilities essential for building sophisticated AI-powered products and services. Each of these models can be integrated into larger systems to create complex, intelligent applications.

Remember to always refer to the official NVIDIA documentation for the most up-to-date information on these services and any new features or models that may be introduced. As the field of AI is rapidly evolving, staying current with the latest developments and best practices is crucial.

Experiment with different parameters, combine these models in innovative ways, and always consider the ethical implications and potential biases in AI-generated content. Happy coding and exploring the world of GenAI with NVIDIA NIM!

Rod Rivera
