This doc will help you get started with Oracle Cloud Infrastructure (OCI) Generative AI chat models. OCI Generative AI is a fully managed service providing state-of-the-art, customizable large language models covering a wide range of use cases through a single API. Access ready-to-use pretrained models or create and host fine-tuned custom models on dedicated AI clusters. For detailed documentation, see the OCI Generative AI documentation and API reference.

Overview

Integration details

| Class | Package | Serializable | JS support | Downloads | Version |
| --- | --- | --- | --- | --- | --- |
| ChatOCIGenAI | langchain-oci | beta | ❌ | PyPI - Downloads | PyPI - Version |

Model features

| Tool calling | Structured output | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ✅ | ✅ | ✅ | ✅ (Gemini) | ✅ (Gemini) | ✅ | ✅ | | |

Setup

Installation

pip install -qU langchain-oci oci

Credentials

Set up authentication with the OCI CLI (creates ~/.oci/config):
oci setup config
For other auth methods (session tokens, instance principals), see OCI SDK authentication.
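If you use a non-default profile or a non-API-key method, ChatOCIGenAI accepts auth_type and auth_profile parameters. A configuration sketch (the endpoint, compartment OCID, and profile name here are illustrative):

```python
from langchain_oci import ChatOCIGenAI

# Use a named profile from ~/.oci/config instead of the default
llm = ChatOCIGenAI(
    model_id="meta.llama-3.3-70b-instruct",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.compartment.oc1..your-compartment-id",
    auth_type="API_KEY",     # or "SECURITY_TOKEN", "INSTANCE_PRINCIPAL", "RESOURCE_PRINCIPAL"
    auth_profile="DEFAULT",  # profile name in ~/.oci/config
)
```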

Instantiation

from langchain_oci import ChatOCIGenAI

llm = ChatOCIGenAI(
    model_id="meta.llama-3.3-70b-instruct",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.compartment.oc1..your-compartment-id",
    model_kwargs={"temperature": 0.7, "max_tokens": 500},  # Optional
)
Key parameters:
  • model_id - The model to use (see available models)
  • service_endpoint - Regional endpoint (us-chicago-1, eu-frankfurt-1, etc.)
  • compartment_id - Your OCI compartment OCID
  • model_kwargs - Model settings like temperature, max_tokens

Invocation

messages = [
    ("system", "You are a code review assistant."),
    ("human", """Review this Python function for security issues:

def login(username, password):
    query = f"SELECT * FROM users WHERE name='{username}' AND pass='{password}'"
    return db.execute(query)
"""),
]
response = llm.invoke(messages)
print(response.content)
This function has a critical SQL injection vulnerability. The username and password
are directly interpolated into the SQL query string, allowing attackers to bypass
authentication or extract data. Use parameterized queries instead:

    cursor.execute("SELECT * FROM users WHERE name=? AND pass=?", (username, password))
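The fix the model recommends can be verified with a minimal sqlite3 sketch (the table and injection payload are illustrative):

```python
import sqlite3

# In-memory database with one user
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, pass TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

# Classic injection payload: the trailing comment removes the password check
malicious = "alice' --"

# Vulnerable: string interpolation lets the payload rewrite the query
query = f"SELECT * FROM users WHERE name='{malicious}' AND pass='wrong'"
leaked = conn.execute(query).fetchall()
print(leaked)  # [('alice', 's3cret')] - authentication bypassed

# Safe: a parameterized query treats the payload as a literal value
rows = conn.execute(
    "SELECT * FROM users WHERE name=? AND pass=?", (malicious, "wrong")
).fetchall()
print(rows)    # [] - no match, injection fails
```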
Multi-turn conversations maintain context across messages:
messages = [
    ("user", "Analyze error rate spike at 14:30 UTC"),
    ("assistant", "The spike correlates with deploy-v2.1.3. Checking logs..."),
    ("user", "What was the root cause?"),
]
response = llm.invoke(messages)
# Model references previous context about deploy-v2.1.3

Streaming

Get responses as they’re generated:
for chunk in llm.stream("Explain Python generators in 3 sentences"):
    print(chunk.content, end="", flush=True)

Async

Process multiple requests concurrently for better throughput:
import asyncio

# Analyze multiple code files concurrently
async def analyze_codebase(files: list[str]) -> list:
    tasks = [llm.ainvoke(f"Find vulnerabilities in:\n{code}") for code in files]
    return await asyncio.gather(*tasks)

# Stream responses for real-time UI updates
async def stream_response():
    async for chunk in llm.astream("Explain async/await in Python"):
        print(chunk.content, end="", flush=True)

asyncio.run(stream_response())
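To see why gather improves throughput, here is a self-contained sketch with a stub coroutine standing in for llm.ainvoke (names and delays are illustrative):

```python
import asyncio
import time

# Stub for llm.ainvoke: simulates per-request network latency
async def fake_invoke(prompt: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return f"analysis of {prompt}"

async def main() -> tuple[list[str], float]:
    start = time.perf_counter()
    # Three "requests" with different latencies run concurrently
    results = await asyncio.gather(
        fake_invoke("a.py", 0.3),
        fake_invoke("b.py", 0.2),
        fake_invoke("c.py", 0.1),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
# gather preserves input order and takes ~max(delays), not sum(delays)
print(results)  # ['analysis of a.py', 'analysis of b.py', 'analysis of c.py']
```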

Tool Calling

Give models access to APIs, databases, and custom functions:
from langchain.tools import tool

@tool
def get_order_status(order_id: str) -> dict:
    """Check the status of a customer order.

    Args:
        order_id: The order ID to look up
    """
    # In production, query your database
    return {"order_id": order_id, "status": "shipped", "eta": "2024-03-15"}

@tool
def get_account_balance(account_id: str) -> dict:
    """Get current account balance.

    Args:
        account_id: The account ID
    """
    return {"account_id": account_id, "balance": 1250.00, "currency": "USD"}

# Bind tools to the model
tools = [get_order_status, get_account_balance]
llm_with_tools = llm.bind_tools(tools)

# Model decides which tool to call
response = llm_with_tools.invoke("What's the status of order ORD-12345?")

# Check if model wants to call a tool
if response.tool_calls:
    tool_call = response.tool_calls[0]
    print(f"Tool: {tool_call['name']}, Args: {tool_call['args']}")
    # Output: Tool: get_order_status, Args: {'order_id': 'ORD-12345'}
Complete tool execution loop - execute the tool and return results:
from langchain.messages import HumanMessage, AIMessage, ToolMessage

messages = [HumanMessage(content="What's the status of order ORD-12345?")]
response = llm_with_tools.invoke(messages)

# Execute each tool call and collect results
if response.tool_calls:
    messages.append(response)  # Add AI response with tool calls

    for tool_call in response.tool_calls:
        # Find and execute the tool
        tool_fn = {"get_order_status": get_order_status,
                   "get_account_balance": get_account_balance}[tool_call["name"]]
        result = tool_fn.invoke(tool_call["args"])

        # Add tool result to messages
        messages.append(ToolMessage(content=str(result), tool_call_id=tool_call["id"]))

    # Get final response with tool results
    final_response = llm_with_tools.invoke(messages)
    print(final_response.content)
    # Output: Order ORD-12345 has been shipped and is expected to arrive on March 15, 2024.
Parallel tool execution (Llama 4+) for concurrent API calls:
llm = ChatOCIGenAI(
    model_id="meta.llama-4-scout-17b-16e-instruct",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.compartment.oc1..your-compartment-id",
)
llm_with_tools = llm.bind_tools(tools, parallel_tool_calls=True)
# Model can call multiple tools at once, reducing latency

Structured Output

Parse unstructured text into typed data structures for processing:
from pydantic import BaseModel, Field
from typing import List, Literal

class SupportTicket(BaseModel):
    """Structured representation of a customer support ticket."""
    ticket_id: str
    severity: Literal["low", "medium", "high", "critical"]
    category: str = Field(description="e.g., billing, technical, account")
    description: str
    affected_services: List[str]

structured_llm = llm.with_structured_output(SupportTicket)

# Parse unstructured support email
email_text = """From: customer@example.com
Subject: URGENT - Cannot access production database

Our production API has been returning 500 errors for the past hour.
The database connection pool appears exhausted. This is affecting
our payment processing and user authentication services."""

ticket = structured_llm.invoke(email_text)
print(ticket.severity)           # "critical"
print(ticket.category)           # "technical"
print(ticket.affected_services)  # ["payment processing", "user authentication"]
Use for log parsing, invoice extraction, or data classification pipelines.
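As a sketch of the log-parsing case, a pipeline might target a schema like the following (LogEvent and its fields are hypothetical; validation is shown locally without calling the model):

```python
from typing import List, Literal
from pydantic import BaseModel, Field

class LogEvent(BaseModel):
    """Hypothetical target schema for a log-parsing pipeline."""
    timestamp: str
    level: Literal["DEBUG", "INFO", "WARN", "ERROR"]
    service: str = Field(description="Name of the emitting service")
    message: str
    tags: List[str] = []

# The kind of object structured_llm.invoke(log_line) would return,
# validated here from a plain dict:
event = LogEvent.model_validate({
    "timestamp": "2024-03-15T14:30:02Z",
    "level": "ERROR",
    "service": "payments",
    "message": "connection pool exhausted",
    "tags": ["db", "timeout"],
})
print(event.level)  # "ERROR"
```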

Vision & Multimodal

Process images for data extraction, analysis, and automation:
from langchain.messages import HumanMessage
from langchain_oci import ChatOCIGenAI, load_image

llm = ChatOCIGenAI(
    model_id="meta.llama-3.2-90b-vision-instruct",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.compartment.oc1..your-compartment-id",
)

# Analyze an architecture diagram
message = HumanMessage(content=[
    {"type": "text", "text": "List all services and their connections in this diagram."},
    load_image("./architecture_diagram.png"),  # Local file or URL
])
response = llm.invoke([message])
print(response.content)
The diagram shows 4 services:
1. API Gateway - receives external traffic, routes to internal services
2. Auth Service - handles authentication, connects to User DB
3. Order Service - processes orders, connects to Orders DB and Payment API
4. Notification Service - sends emails/SMS, triggered by Order Service
Use cases: diagram analysis, receipt/invoice parsing, chart data extraction, document processing.
Vision models: Llama 3.2 Vision (11B, 90B), Gemini 2.0/2.5, Grok 4, Cohere Command A.

Gemini Multimodal (PDF, Video, Audio)

Process documents, videos, and audio with Gemini models:
import base64
from langchain.messages import HumanMessage
from langchain_oci import ChatOCIGenAI

llm = ChatOCIGenAI(
    model_id="google.gemini-2.5-flash",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.compartment.oc1..your-compartment-id",
)

# Extract data from a PDF
with open("contract.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode()

message = HumanMessage(content=[
    {"type": "text", "text": "Extract the contract parties, effective date, and payment terms as JSON."},
    {"type": "document_url", "document_url": {"url": f"data:application/pdf;base64,{pdf_data}"}}
])
response = llm.invoke([message])
print(response.content)
{
  "parties": ["Acme Corp", "TechStart Inc"],
  "effective_date": "2024-01-15",
  "payment_terms": "Net 30, monthly invoicing"
}
Video/Audio analysis:
with open("meeting.mp4", "rb") as f:
    video_data = base64.b64encode(f.read()).decode()

message = HumanMessage(content=[
    {"type": "text", "text": "List the action items and who is responsible for each."},
    {"type": "video_url", "video_url": {"url": f"data:video/mp4;base64,{video_data}"}}
])
response = llm.invoke([message])
Supported formats: PDF, MP4/MOV video, MP3/WAV audio (Gemini 2.0/2.5 only)
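The data: URL pattern used in these examples can be checked standalone with a minimal stdlib round trip (the PDF bytes are fake):

```python
import base64

pdf_bytes = b"%PDF-1.7 fake document bytes"

# Encode to a base64 data URL, as in the examples above
data_url = f"data:application/pdf;base64,{base64.b64encode(pdf_bytes).decode()}"

# Split off the header and decode the payload back to the original bytes
header, payload = data_url.split(",", 1)
print(header)                                  # data:application/pdf;base64
print(base64.b64decode(payload) == pdf_bytes)  # True
```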

Configuration

Control model behavior with model_kwargs:
llm = ChatOCIGenAI(
    model_id="meta.llama-3.3-70b-instruct",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.compartment.oc1..your-compartment-id",
    model_kwargs={
        "temperature": 0.7,    # Creativity: 0 = deterministic, 1 = creative
        "max_tokens": 500,     # Maximum response length
        "top_p": 0.9,          # Nucleus sampling threshold
    },
)

Available Models

| Provider | Example Models | Key Features |
| --- | --- | --- |
| Meta | Llama 3.2/3.3/4 (Scout, Maverick) | Vision, parallel tools |
| Google | Gemini 2.0/2.5 Flash, Pro | PDF, video, audio |
| xAI | Grok 3, Grok 4 | Vision, reasoning |
| Cohere | Command R+, Command A | RAG, vision |
See the OCI model catalog for the complete list and regional availability.

API Reference

For detailed documentation of all ChatOCIGenAI features and configurations, head to the API reference.