Documentation

Everything you need to get started with GPT-OSS models. Apache 2.0 licensed open-weight models with configurable reasoning and agentic capabilities.

Installation

Get up and running with GPT-OSS in minutes using your preferred framework.

API Reference

Complete API documentation with examples and best practices.

Tutorials

Step-by-step guides for common use cases and advanced configurations.

Getting Started

1. Choose Your Model

Select the right model for your use case from our Apache 2.0 licensed models:

GPT-OSS-120B (117B parameters)

  • 5.1B active parameters (sparse Mixture-of-Experts architecture)
  • Best for production and high-reasoning use cases
  • Runs on a single H100 GPU with native MXFP4 quantization
  • Full chain-of-thought reasoning access
  • Advanced agentic capabilities and tool use

GPT-OSS-20B (21B parameters)

  • 3.6B active parameters for efficiency
  • Optimized for lower latency and local deployment
  • Runs within 16 GB of memory on consumer hardware
  • Fine-tunable on consumer-grade GPUs
  • Ideal for experimentation and specialized use cases
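
If you are unsure which checkpoint to load, a rough heuristic (the thresholds here are assumptions based on the hardware notes above, not official requirements) is to pick by available GPU memory:

import torch

# Hypothetical helper: choose a checkpoint based on available GPU memory
def pick_model() -> str:
    if torch.cuda.is_available():
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
        if vram_gb >= 80:  # e.g. a single H100
            return "openai/gpt-oss-120b"
    return "openai/gpt-oss-20b"  # fits within ~16 GB

print(pick_model())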

2. Installation Methods

Transformers

pip install transformers torch

vLLM (Recommended for Production)

pip install vllm

Ollama (Easy Setup)

ollama pull gpt-oss:20b

LM Studio & PyTorch/Triton

GPT-OSS also runs in the LM Studio GUI, and a PyTorch/Triton reference implementation is available for custom deployments.
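
For the vLLM path, a common production setup is to serve the model behind an OpenAI-compatible endpoint (for example, vllm serve openai/gpt-oss-20b) and query it with the standard openai client. A minimal sketch, assuming the server is listening on localhost:8000:

from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)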

3. Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer (device_map="auto" requires accelerate)
model_name = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Generate a response
inputs = tokenizer("Hello, how can I help you today?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(response)
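
GPT-OSS is a chat model trained on the harmony format, so for conversational use it is better to build the prompt with the tokenizer's chat template than to pass raw text. A minimal sketch, reusing the model and tokenizer loaded above:

# Build a harmony-formatted prompt from chat messages
messages = [
    {"role": "user", "content": "Explain MXFP4 quantization in one paragraph."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))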

Key Features

Harmony Response Format

Both models are trained on OpenAI's harmony response format, which structures output into separate channels for reasoning and final answers.

  • Improved reasoning consistency
  • Better structured responses
  • Enhanced agentic behavior
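
To see what the format looks like on the wire, you can render the chat template as plain text; this is a quick inspection trick using standard transformers APIs, with the tokenizer from the usage example above:

# Render the chat template without tokenizing to inspect harmony's structure
rendered = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hi!"}],
    add_generation_prompt=True,
    tokenize=False,
)
print(rendered)  # shows the role/channel markup the model was trained on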

Apache 2.0 License

Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.

  • Permissive, non-copyleft terms
  • Commercial deployment allowed
  • Full customization rights

Native Quantization

Built-in MXFP4 quantization allows efficient deployment with minimal performance loss.

  • Reduced memory footprint
  • Faster inference speeds
  • Maintained quality
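
A quick way to see the quantization savings is to compare the loaded model's memory footprint against its parameter count (get_memory_footprint is a standard transformers utility; exact figures depend on your install and hardware):

# Rough memory check for the model loaded in the usage example
params_b = sum(p.numel() for p in model.parameters()) / 1e9
mem_gb = model.get_memory_footprint() / 1e9
print(f"~{params_b:.1f}B parameters resident in ~{mem_gb:.1f} GB")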

Agentic Capabilities

Advanced tool use including web browsing, function calling, and complex task execution.

  • Function calling support
  • Web browsing capabilities
  • Multi-step task execution

Advanced Configuration

Reasoning Level Configuration

Configure the reasoning level based on your specific needs. GPT-OSS provides full access to chain-of-thought reasoning:

Low: Fast responses with minimal reasoning overhead
Medium: Balanced performance with moderate reasoning depth
High: Deep reasoning with full chain-of-thought visibility

# Reasoning level is set via the system prompt in the harmony format,
# not as a generate() argument
messages = [
    {"role": "system", "content": "Reasoning: medium"},  # "low", "medium", or "high"
    {"role": "user", "content": "Prove that the sum of two even numbers is even."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=2048,
    temperature=0.7,
    do_sample=True,
)

Function Calling

Enable agentic capabilities with function calling. The snippet below assumes the model is served behind an OpenAI-compatible endpoint (for example, vLLM as shown in the installation section) and uses the standard OpenAI tools schema:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Define available tools (OpenAI-style function schema)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather information",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"],
            },
        },
    }
]

# Let the model decide when to call the tool
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's the weather in NYC?"}],
    tools=tools,
    tool_choice="auto",
)
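
When the model decides to call the tool, the reply contains a structured tool call rather than prose. A minimal sketch of completing the round trip (the weather lookup itself is a stand-in for illustration):

import json

# Read the tool call the model requested
tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

# Stand-in tool result for illustration
weather = {"location": args["location"], "temp_c": 21, "conditions": "clear"}

# Return the tool output so the model can produce a final answer
followup = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "user", "content": "What's the weather in NYC?"},
        response.choices[0].message,
        {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(weather)},
    ],
    tools=tools,
)
print(followup.choices[0].message.content)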

Need Help?

Join our community for support, examples, and discussions about GPT-OSS.