Documentation

Everything you need to get started with GPT-OSS models. Apache 2.0 licensed open-weight models with configurable reasoning and agentic capabilities.

Installation

Get up and running with GPT-OSS in minutes using your preferred framework.

API Reference

Complete API documentation with examples and best practices.

Tutorials

Step-by-step guides for common use cases and advanced configurations.

Getting Started

1. Choose Your Model

Select the right model for your use case from our Apache 2.0 licensed models:

GPT-OSS-120B (117B parameters)

  • 5.1B active parameters (sparse Mixture-of-Experts architecture)
  • Best for production and high-reasoning use cases
  • Runs on a single H100 GPU with native MXFP4 quantization
  • Full chain-of-thought reasoning access
  • Advanced agentic capabilities and tool use

GPT-OSS-20B (21B parameters)

  • 3.6B active parameters for efficiency
  • Optimized for lower latency and local deployment
  • Runs within 16 GB of memory on consumer hardware
  • Fine-tunable on consumer-grade GPUs
  • Ideal for experimentation and specialized use cases
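
If you are unsure which checkpoint to load, a rough heuristic (the thresholds here are assumptions based on the hardware notes above, not official requirements) is to pick by available GPU memory:

import torch

# Hypothetical helper: choose a checkpoint based on available GPU memory
def pick_model() -> str:
    if torch.cuda.is_available():
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
        if vram_gb >= 80:  # e.g. a single H100
            return "openai/gpt-oss-120b"
    return "openai/gpt-oss-20b"  # fits within ~16 GB

print(pick_model())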

2. Installation Methods

Transformers

pip install transformers torch

vLLM (Recommended for Production)

pip install vllm

Ollama (Easy Setup)

ollama pull gpt-oss:20b

LM Studio & PyTorch/Triton

GPT-OSS also runs in the LM Studio GUI, and a PyTorch/Triton reference implementation is available for custom deployments.
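
For the vLLM path, a common production setup is to serve the model behind an OpenAI-compatible endpoint (for example, vllm serve openai/gpt-oss-20b) and query it with the standard openai client. A minimal sketch, assuming the server is listening on localhost:8000:

from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)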

3. Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer (device_map="auto" requires accelerate)
model_name = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Generate a response
inputs = tokenizer("Hello, how can I help you today?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(response)
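
GPT-OSS is a chat model trained on the harmony format, so for conversational use it is better to build the prompt with the tokenizer's chat template than to pass raw text. A minimal sketch, reusing the model and tokenizer loaded above:

# Build a harmony-formatted prompt from chat messages
messages = [
    {"role": "user", "content": "Explain MXFP4 quantization in one paragraph."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))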

Key Features

Harmony Response Format

Both models are trained on OpenAI's harmony response format, which structures output into separate channels for reasoning and final answers.

  • Improved reasoning consistency
  • Better structured responses
  • Enhanced agentic behavior
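
To see what the format looks like on the wire, you can render the chat template as plain text; this is a quick inspection trick using standard transformers APIs, with the tokenizer from the usage example above:

# Render the chat template without tokenizing to inspect harmony's structure
rendered = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hi!"}],
    add_generation_prompt=True,
    tokenize=False,
)
print(rendered)  # shows the role/channel markup the model was trained on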

Apache 2.0 License

Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.

  • Permissive, non-copyleft terms
  • Commercial deployment allowed
  • Full customization rights

Native Quantization

Built-in MXFP4 quantization allows efficient deployment with minimal performance loss.

  • Reduced memory footprint
  • Faster inference speeds
  • Maintained quality
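
A quick way to see the quantization savings is to compare the loaded model's memory footprint against its parameter count (get_memory_footprint is a standard transformers utility; exact figures depend on your install and hardware):

# Rough memory check for the model loaded in the usage example
params_b = sum(p.numel() for p in model.parameters()) / 1e9
mem_gb = model.get_memory_footprint() / 1e9
print(f"~{params_b:.1f}B parameters resident in ~{mem_gb:.1f} GB")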

Agentic Capabilities

Advanced tool use including web browsing, function calling, and complex task execution.

  • Function calling support
  • Web browsing capabilities
  • Multi-step task execution

Advanced Configuration

Reasoning Level Configuration

Configure the reasoning level based on your specific needs. GPT-OSS provides full access to chain-of-thought reasoning:

Low: Fast responses with minimal reasoning overhead
Medium: Balanced performance with moderate reasoning depth
High: Deep reasoning with full chain-of-thought visibility

# Reasoning level is set via the system prompt in the harmony format,
# not as a generate() argument
messages = [
    {"role": "system", "content": "Reasoning: medium"},  # "low", "medium", or "high"
    {"role": "user", "content": "Prove that the sum of two even numbers is even."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=2048,
    temperature=0.7,
    do_sample=True,
)

Function Calling

Enable agentic capabilities with function calling. The snippet below assumes the model is served behind an OpenAI-compatible endpoint (for example, vLLM as shown in the installation section) and uses the standard OpenAI tools schema:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Define available tools (OpenAI-style function schema)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather information",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"],
            },
        },
    }
]

# Let the model decide when to call the tool
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's the weather in NYC?"}],
    tools=tools,
    tool_choice="auto",
)
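
When the model decides to call the tool, the reply contains a structured tool call rather than prose. A minimal sketch of completing the round trip (the weather lookup itself is a stand-in for illustration):

import json

# Read the tool call the model requested
tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

# Stand-in tool result for illustration
weather = {"location": args["location"], "temp_c": 21, "conditions": "clear"}

# Return the tool output so the model can produce a final answer
followup = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "user", "content": "What's the weather in NYC?"},
        response.choices[0].message,
        {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(weather)},
    ],
    tools=tools,
)
print(followup.choices[0].message.content)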

Need Help?

Join our community for support, examples, and discussions about GPT-OSS.