This is a step-by-step Python tutorial on using Ubicloud’s managed AI Inference Endpoints. It targets notebook environments such as Google Colab and Jupyter Notebook, but the same code also runs as a plain Python script in a local pip environment. All scripts and examples are available in this Google Colab Notebook.

Overview of Concepts

  • Ubicloud AI Inference Endpoints: These endpoints host AI models that are compatible with the OpenAI API. This means you can use the familiar OpenAI Python SDK to interact with these models.

  • OpenAI Python SDK: The OpenAI SDK simplifies sending requests to an AI model and processing the responses. You’ll use it to send chat prompts, receive streaming or full responses, and even request structured (JSON) outputs.

  • API Key and Base URL: To authenticate and send requests to Ubicloud’s endpoints, you need your API key and the endpoint’s base URL (both available from your Ubicloud Dashboard).

  • Chat Completion modes: Ubicloud supports several ways to interact with the AI models:

    • Non-Streaming: Returns the full response in one go.

    • Streaming: Returns the response incrementally, which is useful for long answers or real-time processing.

    • JSON Output: Formats the response as a JSON object for structured data extraction.
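
As a rough sketch of how the three modes differ at the request level, each one is the same chat completion call with at most one extra keyword argument. The helper name mode_kwargs and the mode labels are hypothetical, chosen only for this illustration; the stream and response_format arguments are the ones used later in this tutorial.

```python
def mode_kwargs(mode: str) -> dict:
    """Return the extra keyword arguments for one chat completion mode.

    Hypothetical helper for illustration; it is not part of the OpenAI SDK.
    """
    extras = {
        "non-streaming": {},
        "streaming": {"stream": True},
        "json": {"response_format": {"type": "json_object"}},
    }
    if mode not in extras:
        raise ValueError(f"Unknown mode: {mode!r}")
    return extras[mode]


for mode in ("non-streaming", "streaming", "json"):
    print(mode, "->", mode_kwargs(mode))
```

The returned dictionary could be unpacked into the request, for example client.chat.completions.create(model=MODEL, messages=messages, **mode_kwargs("streaming")).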

Step 1: Install the OpenAI Python Package

In your Jupyter Notebook or Colab, start by installing (or upgrading) the openai package. You can do this using a cell with the following command:

%pip install openai --upgrade --quiet

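This command installs the latest version of the OpenAI SDK, which is fully compatible with Ubicloud’s endpoints. The examples below use the openai.OpenAI client class introduced in version 1.0 of the SDK, so upgrade if you have an older release installed.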

If you are using a terminal or a vanilla Python environment, you can run:

pip install openai --upgrade
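
To confirm which SDK version ended up installed, one option is to query the package metadata from Python. This is a minimal sketch using the standard library; the helper name installed_version is an assumption made for this example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if the SDK is installed, otherwise None.
print("openai version:", installed_version("openai"))
```

If this prints None, the install step did not run in the same environment as your Python interpreter.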

Step 2: Import Libraries and Set Up Your Environment

Import the necessary libraries and set up your environment. Here we use Colab’s built-in methods for retrieving stored user data. If you’re on another notebook platform, adjust the API key retrieval accordingly.

from google.colab import userdata  # This is specific to Colab; adjust if using another environment.
import json
import openai

# Retrieve your Ubicloud API key (replace with your method of storing or retrieving credentials)
INFERENCE_API_KEY = userdata.get("UBICLOUD_API_KEY")  # or simply: INFERENCE_API_KEY = "your_api_key_here"

# Define the model name and base URL from your Ubicloud Dashboard
MODEL = "llama-3-1-8b-it"  # Example model; update as needed.
BASE_URL = f"https://{MODEL}.ai.ubicloud.com/v1"

# Create the OpenAI client instance configured for Ubicloud endpoints.
client = openai.OpenAI(
    api_key=INFERENCE_API_KEY,
    base_url=BASE_URL
)

The INFERENCE_API_KEY is used to authenticate your requests. The MODEL and BASE_URL are specific to your chosen model and Ubicloud deployment.
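
Outside Colab, userdata is not available, so a common alternative is to read the key from an environment variable. The sketch below assumes you have exported a variable named UBICLOUD_API_KEY; the helper names load_api_key and build_base_url are hypothetical, and the URL pattern is simply the one shown above.

```python
import os


def load_api_key(env_var: str = "UBICLOUD_API_KEY", environ=None) -> str:
    """Read the API key from an environment mapping and fail loudly if it is missing."""
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Ubicloud API key.")
    return key


def build_base_url(model: str) -> str:
    """Build the endpoint URL using the pattern shown in this tutorial."""
    return f"https://{model}.ai.ubicloud.com/v1"


# Demonstration with a stand-in mapping, so the example runs without a real key.
demo_env = {"UBICLOUD_API_KEY": "demo-key"}
print(load_api_key(environ=demo_env))
print(build_base_url("llama-3-1-8b-it"))
```

In a real script you would call load_api_key() with no arguments, so that it reads from os.environ, and pass the result as api_key when creating the client.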

Step 3: Non-Streaming Chat Completion Example

In this example, you will send a simple chat message to the model and print the full response at once.

# Non-streaming chat completion: Send a simple message and get the full response.
completion = client.chat.completions.create(
    model=MODEL,
    messages=[
        {
            "role": "user",
            "content": "Say this is a test",
        },
    ],
)

# Print the response message from the model.
print(completion.choices[0].message.content)

The messages parameter is a list of dictionaries representing the conversation. The response is accessed through completion.choices[0].message.content.
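
Because messages is an ordinary list, a multi-turn conversation can be sketched by appending each new turn and sending the whole list again. The helper add_turn is hypothetical, and the assistant text below is a stand-in rather than real model output.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def add_turn(messages: list, role: str, content: str) -> list:
    """Return a new message list with one more turn appended (the input list is not modified)."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"Unsupported role: {role!r}")
    return messages + [{"role": role, "content": content}]


history = []
history = add_turn(history, "user", "Say this is a test")
# Stand-in for completion.choices[0].message.content from the first request.
history = add_turn(history, "assistant", "This is a test.")
history = add_turn(history, "user", "Now say it in French")

for message in history:
    print(f"{message['role']}: {message['content']}")
```

Passing history as the messages argument on the next request gives the model the earlier turns as context.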

Step 4: Streaming Chat Completion Example

This example demonstrates how to receive the model’s output in a streaming manner (chunk by chunk). This is especially useful when dealing with long responses or when you want to start processing output before the entire response is ready.

# Streaming chat completion: The response is returned in chunks.
stream = client.chat.completions.create(
    model=MODEL,
    messages=[
        {
            "role": "user",
            "content": "How do I print hello world in Python?",
        },
    ],
    stream=True,  # Enable streaming mode.
)

# Iterate over the response chunks and print them as they are received.
for chunk in stream:
    if not chunk.choices:
        continue  # Skip if there are no choices in this chunk.

    # delta.content can be None (for example on the final chunk), so fall back to "".
    print(chunk.choices[0].delta.content or "", end="", flush=True)

# Finally, print a newline after the streaming output.
print()

Setting stream=True tells the API to return partial results as they become available. The loop iterates over each chunk and prints the delta (the latest addition) of the message.
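
If you also need the complete text once streaming finishes, the deltas can be accumulated as they arrive. The sketch below uses stand-in chunk objects built with SimpleNamespace so that it runs without a network call; collect_stream and fake_chunk are hypothetical helper names, and the stand-ins only mimic the choices[0].delta.content shape used above.

```python
from types import SimpleNamespace


def collect_stream(stream) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # Some chunks may carry no choices.
        piece = chunk.choices[0].delta.content
        if piece:  # Skip None or empty deltas.
            parts.append(piece)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object with the same attribute shape as a streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_stream = [
    fake_chunk("Hello"),
    SimpleNamespace(choices=[]),  # chunk without choices
    fake_chunk(None),             # chunk without text
    fake_chunk(", world"),
]
print(collect_stream(demo_stream))  # prints "Hello, world"
```

With a real request, you would pass the stream object returned by client.chat.completions.create(..., stream=True) to collect_stream instead of demo_stream.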

Step 5: Chat Completion with JSON Output

In some cases, you might want the model to produce a structured output, such as a JSON object. This example shows how to request JSON output directly from the model.

json_completion = client.chat.completions.create(
    model=MODEL,
    messages=[
        {
            "role": "user",
            "content": """
I have three apples and two bananas.
Generate a JSON object with two fields: apple and banana.
Each field should represent the respective count of the mentioned fruits.
""",
        },
    ],
    response_format={"type": "json_object"},  # Request JSON output.
)

# Load the JSON string into a Python dictionary.
result = json.loads(json_completion.choices[0].message.content)
print(result)

The prompt instructs the model to output a JSON object, and the response_format parameter asks the endpoint to return one. The json.loads() function then converts the JSON-formatted string into a Python dictionary for further manipulation; it raises json.JSONDecodeError if the content is not valid JSON.
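
Since the content of the JSON object still depends on the model, it can be worth checking the parsed result before using it. This is a minimal sketch for the fruit example above; the helper name parse_fruit_counts and the sample strings are assumptions made for illustration, not output from a real endpoint.

```python
import json


def parse_fruit_counts(raw: str) -> dict:
    """Parse the model's JSON string and check that both expected fields are integers."""
    data = json.loads(raw)  # Raises json.JSONDecodeError (a ValueError) on malformed JSON.
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level.")
    counts = {}
    for field in ("apple", "banana"):
        value = data.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"Field {field!r} is missing or not an integer.")
        counts[field] = value
    return counts


# Stand-in for json_completion.choices[0].message.content.
sample = '{"apple": 3, "banana": 2}'
print(parse_fruit_counts(sample))  # prints {'apple': 3, 'banana': 2}
```

Catching ValueError around the call covers both malformed JSON and missing fields, which gives one place to retry the request or report the problem.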

Summary

By following these steps, you have learned how to:

  • Set up your environment: Install the OpenAI Python SDK and import required libraries.

  • Configure the client: Use your Ubicloud API key, model name, and base URL.

  • Perform chat completions: In both non-streaming and streaming modes.

  • Handle structured outputs: Request and process JSON output from the model.

This tutorial provides a foundation for using Ubicloud’s managed AI inference endpoints in your Python projects, leveraging the familiar OpenAI API interface. Feel free to customize the prompts, model names, and other parameters based on your specific use case and the models available in your Ubicloud Dashboard. Enjoy building with Ubicloud’s AI inference endpoints!