Documentation
Everything you need to get started with the GPT-OSS models: Apache 2.0 licensed open-weight models with configurable reasoning levels and agentic capabilities.
Installation
Get up and running with GPT-OSS in minutes using your preferred framework.
API Reference
Complete API documentation with examples and best practices.
Tutorials
Step-by-step guides for common use cases and advanced configurations.
Getting Started
1. Choose Your Model
Select the right model for your use case from our Apache 2.0 licensed models:
GPT-OSS-120B (117B parameters)
- 5.1B active parameters per token (sparse mixture-of-experts architecture)
- Best for production and high-reasoning use cases
- Runs on a single H100 GPU with native MXFP4 quantization
- Full chain-of-thought reasoning access
- Advanced agentic capabilities and tool use
GPT-OSS-20B (21B parameters)
- 3.6B active parameters per token for efficiency
- Optimized for lower latency and local deployment
- Runs in as little as 16 GB of memory on consumer hardware
- Fine-tunable on consumer-grade GPUs
- Ideal for experimentation and specialized use cases
2. Installation Methods
pip install transformers torch   # Hugging Face Transformers
pip install vllm                 # vLLM (also provides an OpenAI-compatible server)
ollama pull gpt-oss:20b          # Ollama (local deployment)
LM Studio offers a GUI option, and reference PyTorch/Triton implementations support custom deployments.
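To expose a local OpenAI-compatible endpoint (used by the function-calling example further down), vLLM can serve the model directly; a minimal sketch, assuming the weights fit on your GPU:
# Serve an OpenAI-compatible API at http://localhost:8000/v1 (vLLM's default)
vllm serve openai/gpt-oss-20b
# Or chat interactively through Ollama after pulling the model
ollama run gpt-oss:20b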
3. Basic Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load model and tokenizer (device_map="auto" needs the accelerate package)
model_name = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
# Generate a response
inputs = tokenizer("Hello, how can I help you today?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
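Because GPT-OSS is a chat model trained on the harmony format, prompts are normally built with the tokenizer's chat template rather than raw strings. A minimal sketch reusing the model and tokenizer loaded above:
messages = [{"role": "user", "content": "Summarize MXFP4 quantization in two sentences."}]
# apply_chat_template renders the harmony prompt and tokenizes it in one step
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))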
Key Features
Harmony Response Format
The models are trained on OpenAI's harmony response format, which separates chain-of-thought, tool calls, and final answers into distinct channels for more reliable reasoning and coherent outputs. The snippet after this list shows the rendered format.
- Improved reasoning consistency
- Better structured responses
- Enhanced agentic behavior
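To see the format itself, render a prompt without tokenizing it; the printed string shows the harmony special tokens and channels the model was trained on (a minimal sketch using the tokenizer from the Basic Usage example):
messages = [
    {"role": "system", "content": "Reasoning: medium"},
    {"role": "user", "content": "Hello!"},
]
# tokenize=False returns the raw prompt string instead of token ids
print(tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False))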
Apache 2.0 License
Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
- No usage restrictions
- Commercial deployment allowed
- Full customization rights
Native Quantization
Built-in MXFP4 quantization allows efficient deployment with minimal quality loss; see the back-of-the-envelope sizing after this list.
- Reduced memory footprint
- Faster inference speeds
- Maintained quality
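As a rough check, MXFP4 stores 4-bit values with a shared scale per 32-element block, about 4.25 bits per weight; the sketch below applies that rate to all 117B parameters, although in practice only a subset of the weights is quantized, so treat the number as an approximation:
# Approximate weight memory for GPT-OSS-120B under MXFP4 (assumption: all weights quantized)
params = 117e9          # total parameter count
bits_per_weight = 4.25  # 4-bit elements + one 8-bit scale per 32-weight block
gib = params * bits_per_weight / 8 / 2**30
print(f"~{gib:.0f} GiB of weights")  # ~58 GiB, within a single 80 GB H100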
Agentic Capabilities
Advanced tool use including web browsing, function calling, and complex task execution.
- Function calling support
- Web browsing capabilities
- Multi-step task execution
Advanced Configuration
Reasoning Level Configuration
Configure the reasoning level (low, medium, or high) to trade response latency against reasoning depth. GPT-OSS provides full access to its chain-of-thought reasoning, and the level is set in the system prompt rather than as a generation parameter:
# Set the reasoning level via the system prompt ("low", "medium", or "high")
messages = [
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "Prove that the square root of 2 is irrational."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
# Standard sampling parameters still apply
outputs = model.generate(inputs, max_new_tokens=2048, temperature=0.7, do_sample=True)
Function Calling
Enable agentic capabilities with function calling. The example below assumes an OpenAI-compatible endpoint such as the local vLLM server from the installation section; the OpenAI Python client sends the tool schema alongside the chat messages:
# Define the available tools (OpenAI-style function schema)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather information",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }
]
# Call the model through the OpenAI-compatible server
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's the weather in NYC?"}],
    tools=tools,
    tool_choice="auto"
)
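If the model decides to call the tool, the response carries structured tool calls that you can execute and feed back as a "tool" message; a short sketch (any actual get_weather implementation is your own):
# Inspect the structured tool call returned by the model
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name)       # "get_weather"
print(tool_call.function.arguments)  # JSON string, e.g. '{"location": "NYC"}'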
Additional Resources
Need Help?
Join our community for support, examples, and discussions about GPT-OSS.