# babbage-002

**Current Snapshot:** babbage-002

GPT base models can understand and generate natural language or code but are not trained with instruction following. These models are intended as replacements for our original GPT-3 base models and use the legacy Completions API. Most customers should use GPT-3.5 or GPT-4.

## Snapshots

## Supported Tools

## Rate Limits

### babbage-002

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 10000 | 100000 |
| tier_2 | 5000 | 40000 | 200000 |
| tier_3 | 5000 | 80000 | 5000000 |
| tier_4 | 10000 | 300000 | 30000000 |
| tier_5 | 10000 | 1000000 | 150000000 |

# ChatGPT-4o

**Current Snapshot:** chatgpt-4o-latest

ChatGPT-4o points to the GPT-4o snapshot currently used in ChatGPT. We recommend using an API model like [GPT-5](/docs/models/gpt-5) or [GPT-4o](/docs/models/gpt-4o) for most API integrations, but feel free to use this ChatGPT-4o model to test our latest improvements for chat use cases.

## Snapshots

## Supported Tools

## Rate Limits

### chatgpt-4o-latest

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 30000 | 90000 |
| tier_2 | 5000 | 450000 | 1350000 |
| tier_3 | 5000 | 800000 | 50000000 |
| tier_4 | 10000 | 2000000 | 200000000 |
| tier_5 | 10000 | 30000000 | 5000000000 |

# codex-mini-latest

**Current Snapshot:** codex-mini-latest

codex-mini-latest is a fine-tuned version of o4-mini built specifically for use in Codex CLI. For direct use in the API, we recommend starting with gpt-4.1.
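For API use, requests to this model go through the Responses API. As a hedged sketch, a minimal request body might be assembled like this (field names follow the Responses API; the `build_responses_request` helper is illustrative, and nothing is sent over the network):

```python
import json

# Sketch: assemble a Responses API request body for codex-mini-latest.
# Only the payload is constructed here; no API call is made.
def build_responses_request(prompt: str, model: str = "codex-mini-latest") -> dict:
    return {
        "model": model,
        "instructions": "You are a coding assistant.",
        "input": prompt,
    }

body = build_responses_request("Write a shell one-liner that counts *.py files.")
print(json.dumps(body, indent=2))
```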
## Snapshots

## Supported Tools

## Rate Limits

### codex-mini-latest

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 1000 | 100000 | 1000000 |
| tier_2 | 2000 | 200000 | 2000000 |
| tier_3 | 5000 | 4000000 | 40000000 |
| tier_4 | 10000 | 10000000 | 1000000000 |
| tier_5 | 30000 | 150000000 | 15000000000 |

# computer-use-preview

**Current Snapshot:** computer-use-preview-2025-03-11

The computer-use-preview model is a specialized model for the computer use tool. It is trained to understand and execute computer tasks. See the [computer use guide](/docs/guides/tools-computer-use) for more information. This model is only usable in the [Responses API](/docs/api-reference/responses).

## Snapshots

### computer-use-preview-2025-03-11

- Context window size: 8192
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 1024
- Supported features: function_calling

## Supported Tools

## Rate Limits

### computer-use-preview

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_3 | 3000 | 20000000 | 450000000 |
| tier_4 | 3000 | 20000000 | 450000000 |
| tier_5 | 3000 | 20000000 | 450000000 |

# DALL·E 2

**Current Snapshot:** dall-e-2

DALL·E is an AI system that creates realistic images and art from a natural language description. Older than DALL·E 3, DALL·E 2 offers more control in prompting and more requests at once.

## Snapshots

## Supported Tools

## Rate Limits

### dall-e-2

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_free | 5 img/min | | |
| tier_1 | 500 img/min | | |
| tier_2 | 2500 img/min | | |
| tier_3 | 5000 img/min | | |
| tier_4 | 7500 img/min | | |
| tier_5 | 10000 img/min | | |

# DALL·E 3

**Current Snapshot:** dall-e-3

DALL·E is an AI system that creates realistic images and art from a natural language description.
Given a prompt, DALL·E 3 can create a new image at a specified size.

## Snapshots

## Supported Tools

## Rate Limits

### dall-e-3

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_free | 1 img/min | | |
| tier_1 | 500 img/min | | |
| tier_2 | 2500 img/min | | |
| tier_3 | 5000 img/min | | |
| tier_4 | 7500 img/min | | |
| tier_5 | 10000 img/min | | |

# davinci-002

**Current Snapshot:** davinci-002

GPT base models can understand and generate natural language or code but are not trained with instruction following. These models are intended as replacements for our original GPT-3 base models and use the legacy Completions API. Most customers should use GPT-3.5 or GPT-4.

## Snapshots

## Supported Tools

## Rate Limits

### davinci-002

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 10000 | 100000 |
| tier_2 | 5000 | 40000 | 200000 |
| tier_3 | 5000 | 80000 | 5000000 |
| tier_4 | 10000 | 300000 | 30000000 |
| tier_5 | 10000 | 1000000 | 150000000 |

# gpt-3.5-turbo-16k-0613

**Current Snapshot:** gpt-3.5-turbo-16k-0613

GPT-3.5 Turbo models can understand and generate natural language or code and have been optimized for chat using the Chat Completions API, but they work well for non-chat tasks as well. As of July 2024, use gpt-4o-mini in place of GPT-3.5 Turbo, as it is cheaper, more capable, multimodal, and just as fast. GPT-3.5 Turbo is still available for use in the API.

## Snapshots

## Supported Tools

## Rate Limits

### gpt-3.5-turbo-16k-0613

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 3500 | 200000 | 2000000 |
| tier_2 | 3500 | 2000000 | 5000000 |
| tier_3 | 3500 | 800000 | 50000000 |
| tier_4 | 10000 | 10000000 | 1000000000 |
| tier_5 | 10000 | 50000000 | 10000000000 |

# gpt-3.5-turbo-instruct

**Current Snapshot:** gpt-3.5-turbo-instruct

This model has similar capabilities to GPT-3-era models.
It is compatible with the legacy Completions endpoint, not Chat Completions.

## Snapshots

## Supported Tools

## Rate Limits

### gpt-3.5-turbo-instruct

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 3500 | 200000 | 2000000 |
| tier_2 | 3500 | 2000000 | 5000000 |
| tier_3 | 3500 | 800000 | 50000000 |
| tier_4 | 10000 | 10000000 | 1000000000 |
| tier_5 | 10000 | 50000000 | 10000000000 |

# GPT-3.5 Turbo

**Current Snapshot:** gpt-3.5-turbo-0125

GPT-3.5 Turbo models can understand and generate natural language or code and have been optimized for chat using the Chat Completions API, but they work well for non-chat tasks as well. As of July 2024, use gpt-4o-mini in place of GPT-3.5 Turbo, as it is cheaper, more capable, multimodal, and just as fast. GPT-3.5 Turbo is still available for use in the API.

## Snapshots

### gpt-3.5-turbo-0125

- Context window size: 16385
- Knowledge cutoff date: 2021-09-01
- Maximum output tokens: 4096
- Supported features: fine_tuning

### gpt-3.5-turbo-0613

- Context window size: 16385
- Knowledge cutoff date: 2021-09-01
- Maximum output tokens: 4096
- Supported features: fine_tuning

### gpt-3.5-turbo-1106

- Context window size: 16385
- Knowledge cutoff date: 2021-09-01
- Maximum output tokens: 4096
- Supported features: fine_tuning

### gpt-3.5-turbo-16k-0613

- Context window size: 16385
- Knowledge cutoff date: 2021-09-01
- Maximum output tokens: 4096
- Supported features: fine_tuning

### gpt-3.5-turbo-instruct

- Context window size: 4096
- Knowledge cutoff date: 2021-09-01
- Maximum output tokens: 4096
- Supported features: fine_tuning

## Supported Tools

## Rate Limits

### gpt-3.5-turbo

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 3500 | 200000 | 2000000 |
| tier_2 | 3500 | 2000000 | 5000000 |
| tier_3 | 3500 | 800000 | 50000000 |
| tier_4 | 10000 | 10000000 | 1000000000 |
| tier_5 | 10000 | 50000000 | 10000000000 |

# GPT-4.5 Preview (Deprecated)

**Current Snapshot:** gpt-4.5-preview-2025-02-27

Deprecated: a research preview of GPT-4.5. We recommend using gpt-4.1 or o3 models instead for most use cases.

## Snapshots

### gpt-4.5-preview-2025-02-27

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 16384
- Supported features: function_calling, structured_outputs, streaming, system_messages, evals, prompt_caching, image_input

## Supported Tools

## Rate Limits

### gpt-4.5-preview

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 1000 | 125000 | 50000 |
| tier_2 | 5000 | 250000 | 500000 |
| tier_3 | 5000 | 500000 | 50000000 |
| tier_4 | 10000 | 1000000 | 100000000 |
| tier_5 | 10000 | 2000000 | 5000000000 |

# GPT-4 Turbo Preview

**Current Snapshot:** gpt-4-0125-preview

This is a research preview of the GPT-4 Turbo model, an older high-intelligence GPT model.

## Snapshots

## Supported Tools

## Rate Limits

### gpt-4-turbo-preview

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 30000 | 90000 |
| tier_2 | 5000 | 450000 | 1350000 |
| tier_3 | 5000 | 600000 | 40000000 |
| tier_4 | 10000 | 800000 | 80000000 |
| tier_5 | 10000 | 2000000 | 300000000 |

# GPT-4 Turbo

**Current Snapshot:** gpt-4-turbo-2024-04-09

GPT-4 Turbo is the next generation of GPT-4, an older high-intelligence GPT model. It was designed to be a cheaper, better version of GPT-4. Today, we recommend using a newer model like GPT-4o.
## Snapshots

### gpt-4-turbo-2024-04-09

- Context window size: 128000
- Knowledge cutoff date: 2023-12-01
- Maximum output tokens: 4096
- Supported features: streaming, function_calling, image_input

## Supported Tools

## Rate Limits

### gpt-4-turbo

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 30000 | 90000 |
| tier_2 | 5000 | 450000 | 1350000 |
| tier_3 | 5000 | 600000 | 40000000 |
| tier_4 | 10000 | 800000 | 80000000 |
| tier_5 | 10000 | 2000000 | 300000000 |

# GPT-4.1 mini

**Current Snapshot:** gpt-4.1-mini-2025-04-14

GPT-4.1 mini excels at instruction following and tool calling. It features a 1M-token context window and low latency without a reasoning step. Note that we recommend starting with [GPT-5 mini](/docs/models/gpt-5-mini) for more complex tasks.

## Snapshots

### gpt-4.1-mini-2025-04-14

- Context window size: 1047576
- Knowledge cutoff date: 2024-06-01
- Maximum output tokens: 32768
- Supported features: predicted_outputs, streaming, function_calling, fine_tuning, file_search, file_uploads, web_search, structured_outputs, image_input

## Supported Tools

- function_calling
- web_search
- file_search
- code_interpreter
- mcp

## Rate Limits

### Standard

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| free | 3 | 40000 | |
| tier_1 | 500 | 200000 | 2000000 |
| tier_2 | 5000 | 2000000 | 20000000 |
| tier_3 | 5000 | 4000000 | 40000000 |
| tier_4 | 10000 | 10000000 | 1000000000 |
| tier_5 | 30000 | 150000000 | 15000000000 |

### Long Context (> 128k input tokens)

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 200 | 400000 | 5000000 |
| tier_2 | 500 | 1000000 | 40000000 |
| tier_3 | 1000 | 2000000 | 80000000 |
| tier_4 | 2000 | 10000000 | 200000000 |
| tier_5 | 8000 | 20000000 | 2000000000 |

# GPT-4.1 nano

**Current Snapshot:** gpt-4.1-nano-2025-04-14

GPT-4.1 nano excels at instruction following and tool calling. It features a 1M-token context window and low latency without a reasoning step. Note that we recommend starting with [GPT-5 nano](/docs/models/gpt-5-nano) for more complex tasks.

## Snapshots

### gpt-4.1-nano-2025-04-14

- Context window size: 1047576
- Knowledge cutoff date: 2024-06-01
- Maximum output tokens: 32768
- Supported features: predicted_outputs, streaming, function_calling, file_search, file_uploads, structured_outputs, image_input, prompt_caching, fine_tuning

## Supported Tools

- function_calling
- file_search
- image_generation
- code_interpreter
- mcp

## Rate Limits

### Standard

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| free | 3 | 40000 | |
| tier_1 | 500 | 200000 | 2000000 |
| tier_2 | 5000 | 2000000 | 20000000 |
| tier_3 | 5000 | 4000000 | 40000000 |
| tier_4 | 10000 | 10000000 | 1000000000 |
| tier_5 | 30000 | 150000000 | 15000000000 |

### Long Context (> 128k input tokens)

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 200 | 400000 | 5000000 |
| tier_2 | 500 | 1000000 | 40000000 |
| tier_3 | 1000 | 2000000 | 80000000 |
| tier_4 | 2000 | 10000000 | 200000000 |
| tier_5 | 8000 | 20000000 | 2000000000 |

# GPT-4.1

**Current Snapshot:** gpt-4.1-2025-04-14

GPT-4.1 excels at instruction following and tool calling, with broad knowledge across domains. It features a 1M-token context window and low latency without a reasoning step. Note that we recommend starting with [GPT-5](/docs/models/gpt-5) for complex tasks.
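The Standard and Long Context rate-limit tables for the GPT-4.1 family switch on whether a request's input exceeds 128k tokens. A minimal sketch of that routing, using the tier_1 rows transcribed from the GPT-4.1 mini tables on this page (the `tier1_limits` helper is illustrative):

```python
# Sketch: pick the applicable tier_1 rate-limit row for GPT-4.1 mini.
# Long-context limits apply when input exceeds 128k tokens; the values
# below are transcribed from this page's tables.
LONG_CONTEXT_THRESHOLD = 128_000

TIER_1_STANDARD = {"rpm": 500, "tpm": 200_000, "batch_queue": 2_000_000}
TIER_1_LONG_CONTEXT = {"rpm": 200, "tpm": 400_000, "batch_queue": 5_000_000}

def tier1_limits(input_tokens: int) -> dict:
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        return TIER_1_LONG_CONTEXT
    return TIER_1_STANDARD

print(tier1_limits(50_000))   # standard row applies
print(tier1_limits(300_000))  # long-context row applies
```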
## Snapshots

### gpt-4.1-2025-04-14

- Context window size: 1047576
- Knowledge cutoff date: 2024-06-01
- Maximum output tokens: 32768
- Supported features: streaming, structured_outputs, predicted_outputs, distillation, function_calling, file_search, file_uploads, image_input, web_search, fine_tuning, prompt_caching

### gpt-4.1-mini-2025-04-14

- Context window size: 1047576
- Knowledge cutoff date: 2024-06-01
- Maximum output tokens: 32768
- Supported features: predicted_outputs, streaming, function_calling, fine_tuning, file_search, file_uploads, web_search, structured_outputs, image_input

### gpt-4.1-nano-2025-04-14

- Context window size: 1047576
- Knowledge cutoff date: 2024-06-01
- Maximum output tokens: 32768
- Supported features: predicted_outputs, streaming, function_calling, file_search, file_uploads, structured_outputs, image_input, prompt_caching, fine_tuning

## Supported Tools

- function_calling
- web_search
- file_search
- image_generation
- code_interpreter
- mcp

## Rate Limits

### default

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 30000 | 90000 |
| tier_2 | 5000 | 450000 | 1350000 |
| tier_3 | 5000 | 800000 | 50000000 |
| tier_4 | 10000 | 2000000 | 200000000 |
| tier_5 | 10000 | 30000000 | 5000000000 |

### Long Context (> 128k input tokens)

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 100 | 200000 | 2000000 |
| tier_2 | 250 | 500000 | 20000000 |
| tier_3 | 500 | 1000000 | 40000000 |
| tier_4 | 1000 | 5000000 | 100000000 |
| tier_5 | 4000 | 10000000 | 1000000000 |

# GPT-4

**Current Snapshot:** gpt-4-0613

GPT-4 is an older version of a high-intelligence GPT model, usable in Chat Completions.
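Since gpt-4 is used through Chat Completions, a minimal request body looks roughly like the following sketch (field names follow the Chat Completions API; the payload is built locally and nothing is sent):

```python
import json

# Sketch: a minimal Chat Completions request body for gpt-4.
# Only the payload is constructed; no API call is made.
def build_chat_request(user_message: str, model: str = "gpt-4") -> dict:
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }

req = build_chat_request("Summarize the HTTP/2 handshake in two sentences.")
print(json.dumps(req, indent=2))
```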
## Snapshots

### gpt-4-0125-preview

- Context window size: 128000
- Knowledge cutoff date: 2023-12-01
- Maximum output tokens: 4096
- Supported features: fine_tuning

### gpt-4-0314

- Context window size: 8192
- Knowledge cutoff date: 2023-12-01
- Maximum output tokens: 8192
- Supported features: fine_tuning, streaming

### gpt-4-0613

- Context window size: 8192
- Knowledge cutoff date: 2023-12-01
- Maximum output tokens: 8192
- Supported features: fine_tuning, streaming

### gpt-4-1106-vision-preview

- Context window size: 128000
- Knowledge cutoff date: 2023-12-01
- Maximum output tokens: 4096
- Supported features: fine_tuning, streaming

### gpt-4-turbo-2024-04-09

- Context window size: 128000
- Knowledge cutoff date: 2023-12-01
- Maximum output tokens: 4096
- Supported features: streaming, function_calling, image_input

## Supported Tools

## Rate Limits

### gpt-4

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 10000 | 100000 |
| tier_2 | 5000 | 40000 | 200000 |
| tier_3 | 5000 | 80000 | 5000000 |
| tier_4 | 10000 | 300000 | 30000000 |
| tier_5 | 10000 | 1000000 | 150000000 |

# GPT-4o Audio

**Current Snapshot:** gpt-4o-audio-preview-2025-06-03

This is a preview release of the GPT-4o Audio models. These models support audio inputs and outputs and can be used in the Chat Completions REST API.
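In Chat Completions, audio is supplied as a base64-encoded `input_audio` content part. A hedged sketch of such a request payload (the content-part shape follows the Chat Completions audio documentation; the audio bytes below are a stand-in, and nothing is sent):

```python
import base64
import json

# Sketch: a Chat Completions request carrying base64-encoded audio for
# gpt-4o-audio-preview. The bytes below are a placeholder, not a real
# recording; only the payload is constructed.
fake_wav_bytes = b"RIFF....WAVEfmt "  # stand-in for real WAV data
encoded = base64.b64encode(fake_wav_bytes).decode("ascii")

request = {
    "model": "gpt-4o-audio-preview",
    "modalities": ["text", "audio"],
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe and summarize this clip."},
                {"type": "input_audio",
                 "input_audio": {"data": encoded, "format": "wav"}},
            ],
        }
    ],
}
print(json.dumps(request)[:80])
```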
## Snapshots

### gpt-4o-audio-preview-2024-10-01

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 16384
- Supported features: streaming, function_calling

### gpt-4o-audio-preview-2024-12-17

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 16384
- Supported features: streaming, function_calling

### gpt-4o-audio-preview-2025-06-03

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 16384
- Supported features: streaming, function_calling

## Supported Tools

## Rate Limits

### gpt-4o-audio-preview

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 30000 | 90000 |
| tier_2 | 5000 | 450000 | 1350000 |
| tier_3 | 5000 | 800000 | 50000000 |
| tier_4 | 10000 | 2000000 | 2000000 |
| tier_5 | 10000 | 30000000 | 5000000000 |

# GPT-4o mini Audio

**Current Snapshot:** gpt-4o-mini-audio-preview-2024-12-17

This is a preview release of the smaller GPT-4o Audio mini model. It accepts audio inputs and produces audio outputs via the REST API.

## Snapshots

### gpt-4o-mini-audio-preview-2024-12-17

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 16384
- Supported features: streaming, function_calling

## Supported Tools

- web_search
- file_search
- code_interpreter
- mcp

## Rate Limits

### gpt-4o-mini-audio-preview

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| free | 3 | 40000 | |
| tier_1 | 500 | 200000 | 2000000 |
| tier_2 | 5000 | 2000000 | 20000000 |
| tier_3 | 5000 | 4000000 | 40000000 |
| tier_4 | 10000 | 10000000 | 1000000000 |
| tier_5 | 30000 | 150000000 | 15000000000 |

# GPT-4o mini Realtime

**Current Snapshot:** gpt-4o-mini-realtime-preview-2024-12-17

This is a preview release of the GPT-4o-mini Realtime model, capable of responding to audio and text inputs in real time over WebRTC or a WebSocket interface.
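Realtime sessions are configured by exchanging JSON events over the connection. A hedged sketch of a `session.update` event as it would be serialized for the wire (the event shape follows the Realtime API; the instruction text and voice value are illustrative, and no socket is opened here):

```python
import json

# Sketch: a Realtime API "session.update" event configuring a
# gpt-4o-mini-realtime-preview session. The event is serialized as it
# would be sent over a WebSocket; no connection is opened.
session_update = {
    "type": "session.update",
    "session": {
        "instructions": "Answer briefly and speak slowly.",
        "voice": "alloy",  # illustrative voice choice
        "modalities": ["text", "audio"],
    },
}
wire_frame = json.dumps(session_update)
print(wire_frame[:60])
```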
## Snapshots

### gpt-4o-mini-realtime-preview-2024-12-17

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 4096
- Supported features: function_calling, prompt_caching

## Supported Tools

## Rate Limits

### gpt-4o-mini-realtime-preview

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 200 | 40000 | |
| tier_2 | 400 | 200000 | |
| tier_3 | 5000 | 800000 | |
| tier_4 | 10000 | 4000000 | |
| tier_5 | 20000 | 15000000 | |

# GPT-4o mini Search Preview

**Current Snapshot:** gpt-4o-mini-search-preview-2025-03-11

GPT-4o mini Search Preview is a specialized model trained to understand and execute [web search](/docs/guides/tools-web-search?api-mode=chat) queries with the Chat Completions API. In addition to token fees, web search queries incur a fee per tool call. Learn more on the [pricing](/docs/pricing) page.

## Snapshots

### gpt-4o-mini-search-preview-2025-03-11

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 16384
- Supported features: streaming, structured_outputs, image_input

## Supported Tools

## Rate Limits

### gpt-4o-mini-search-preview

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| free | 3 | 40000 | |
| tier_1 | 500 | 200000 | 2000000 |
| tier_2 | 5000 | 2000000 | 20000000 |
| tier_3 | 5000 | 4000000 | 40000000 |
| tier_4 | 10000 | 10000000 | 1000000000 |
| tier_5 | 30000 | 150000000 | 15000000000 |

# GPT-4o mini Transcribe

**Current Snapshot:** gpt-4o-mini-transcribe

GPT-4o mini Transcribe is a speech-to-text model that uses GPT-4o mini to transcribe audio. Compared to the original Whisper models, it offers a lower word error rate and better language recognition and accuracy. Use it for more accurate transcripts.
## Snapshots

## Supported Tools

## Rate Limits

### gpt-4o-mini-transcribe

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 50000 | |
| tier_2 | 2000 | 150000 | |
| tier_3 | 5000 | 600000 | |
| tier_4 | 10000 | 2000000 | |
| tier_5 | 10000 | 8000000 | |

# GPT-4o mini TTS

**Current Snapshot:** gpt-4o-mini-tts

GPT-4o mini TTS is a text-to-speech model built on GPT-4o mini, a fast and powerful language model. Use it to convert text to natural-sounding speech. The maximum number of input tokens is 2000.

## Snapshots

## Supported Tools

## Rate Limits

### gpt-4o-mini-tts

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 50000 | |
| tier_2 | 2000 | 150000 | |
| tier_3 | 5000 | 600000 | |
| tier_4 | 10000 | 2000000 | |
| tier_5 | 10000 | 8000000 | |

# GPT-4o mini

**Current Snapshot:** gpt-4o-mini-2024-07-18

GPT-4o mini (“o” for “omni”) is a fast, affordable small model for focused tasks. It accepts both text and image inputs, and produces text outputs (including Structured Outputs). It is ideal for fine-tuning, and outputs from a larger model like GPT-4o can be distilled to GPT-4o mini to produce similar results at lower cost and latency.
## Snapshots

### gpt-4o-mini-2024-07-18

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 16384
- Supported features: predicted_outputs, streaming, function_calling, fine_tuning, file_search, file_uploads, web_search, structured_outputs, image_input

### gpt-4o-mini-audio-preview-2024-12-17

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 16384
- Supported features: streaming, function_calling

### gpt-4o-mini-realtime-preview-2024-12-17

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 4096
- Supported features: function_calling, prompt_caching

### gpt-4o-mini-search-preview-2025-03-11

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 16384
- Supported features: streaming, structured_outputs, image_input

### gpt-4o-mini-transcribe

- Context window size: 16000
- Knowledge cutoff date: 2024-06-01
- Maximum output tokens: 2000

### gpt-4o-mini-tts

## Supported Tools

- function_calling
- web_search
- file_search
- image_generation
- code_interpreter
- mcp

## Rate Limits

### gpt-4o-mini

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| free | 3 | 40000 | |
| tier_1 | 500 | 200000 | 2000000 |
| tier_2 | 5000 | 2000000 | 20000000 |
| tier_3 | 5000 | 4000000 | 40000000 |
| tier_4 | 10000 | 10000000 | 1000000000 |
| tier_5 | 30000 | 150000000 | 15000000000 |

# GPT-4o Realtime

**Current Snapshot:** gpt-4o-realtime-preview-2025-06-03

This is a preview release of the GPT-4o Realtime model, capable of responding to audio and text inputs in real time over WebRTC or a WebSocket interface.
## Snapshots

### gpt-4o-realtime-preview-2024-10-01

- Context window size: 16000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 4096
- Supported features: function_calling, prompt_caching

### gpt-4o-realtime-preview-2024-12-17

- Context window size: 16000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 4096
- Supported features: function_calling, prompt_caching

### gpt-4o-realtime-preview-2025-06-03

- Context window size: 32000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 4096
- Supported features: function_calling, prompt_caching

## Supported Tools

## Rate Limits

### gpt-4o-realtime-preview

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 200 | 40000 | |
| tier_2 | 400 | 200000 | |
| tier_3 | 5000 | 800000 | |
| tier_4 | 10000 | 4000000 | |
| tier_5 | 20000 | 15000000 | |

# GPT-4o Search Preview

**Current Snapshot:** gpt-4o-search-preview-2025-03-11

GPT-4o Search Preview is a specialized model trained to understand and execute [web search](/docs/guides/tools-web-search?api-mode=chat) queries with the Chat Completions API. In addition to token fees, web search queries incur a fee per tool call. Learn more on the [pricing](/docs/pricing) page.

## Snapshots

### gpt-4o-search-preview-2025-03-11

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 16384
- Supported features: streaming, structured_outputs, image_input

## Supported Tools

## Rate Limits

### gpt-4o-search-preview

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 100 | 30000 | |
| tier_2 | 500 | 45000 | |
| tier_3 | 500 | 80000 | |
| tier_4 | 1000 | 200000 | |
| tier_5 | 1000 | 3000000 | |

# GPT-4o Transcribe

**Current Snapshot:** gpt-4o-transcribe

GPT-4o Transcribe is a speech-to-text model that uses GPT-4o to transcribe audio.
Compared to the original Whisper models, it offers a lower word error rate and better language recognition and accuracy. Use it for more accurate transcripts.

## Snapshots

## Supported Tools

## Rate Limits

### gpt-4o-transcribe

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 10000 | |
| tier_2 | 2000 | 100000 | |
| tier_3 | 5000 | 400000 | |
| tier_4 | 10000 | 2000000 | |
| tier_5 | 10000 | 6000000 | |

# GPT-4o

**Current Snapshot:** gpt-4o-2024-08-06

GPT-4o (“o” for “omni”) is our versatile, high-intelligence flagship model. It accepts both text and image inputs, and produces text outputs (including Structured Outputs). It is the best model for most tasks, and is our most capable model outside of our o-series models.

## Snapshots

### gpt-4o-2024-05-13

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 4096
- Supported features: streaming, function_calling, fine_tuning, file_search, file_uploads, image_input, web_search, predicted_outputs

### gpt-4o-2024-08-06

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 16384
- Supported features: streaming, structured_outputs, predicted_outputs, distillation, file_search, file_uploads, fine_tuning, function_calling, image_input, web_search

### gpt-4o-2024-11-20

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 16384
- Supported features: streaming, structured_outputs, predicted_outputs, distillation, function_calling, file_search, file_uploads, image_input, web_search

### gpt-4o-audio-preview-2024-10-01

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 16384
- Supported features: streaming, function_calling

### gpt-4o-audio-preview-2024-12-17

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 16384
- Supported features: streaming, function_calling

### gpt-4o-audio-preview-2025-06-03

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 16384
- Supported features: streaming, function_calling

### gpt-4o-mini-2024-07-18

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 16384
- Supported features: predicted_outputs, streaming, function_calling, fine_tuning, file_search, file_uploads, web_search, structured_outputs, image_input

### gpt-4o-mini-audio-preview-2024-12-17

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 16384
- Supported features: streaming, function_calling

### gpt-4o-mini-realtime-preview-2024-12-17

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 4096
- Supported features: function_calling, prompt_caching

### gpt-4o-mini-search-preview-2025-03-11

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 16384
- Supported features: streaming, structured_outputs, image_input

### gpt-4o-mini-transcribe

- Context window size: 16000
- Knowledge cutoff date: 2024-06-01
- Maximum output tokens: 2000

### gpt-4o-mini-tts

### gpt-4o-realtime-preview-2024-10-01

- Context window size: 16000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 4096
- Supported features: function_calling, prompt_caching

### gpt-4o-realtime-preview-2024-12-17

- Context window size: 16000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 4096
- Supported features: function_calling, prompt_caching

### gpt-4o-realtime-preview-2025-06-03

- Context window size: 32000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 4096
- Supported features: function_calling, prompt_caching

### gpt-4o-search-preview-2025-03-11

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 16384
- Supported features: streaming, structured_outputs, image_input

### gpt-4o-transcribe

- Context window size: 16000
- Knowledge cutoff date: 2024-06-01
- Maximum output tokens: 2000

## Supported Tools

- function_calling
- web_search
- file_search
- image_generation
- code_interpreter
- mcp

## Rate Limits

### gpt-4o

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 30000 | 90000 |
| tier_2 | 5000 | 450000 | 1350000 |
| tier_3 | 5000 | 800000 | 50000000 |
| tier_4 | 10000 | 2000000 | 200000000 |
| tier_5 | 10000 | 30000000 | 5000000000 |

# GPT-5 Chat

**Current Snapshot:** gpt-5-chat-latest

GPT-5 Chat points to the GPT-5 snapshot currently used in ChatGPT. We recommend [GPT-5](/docs/models/gpt-5) for most API usage, but feel free to use this GPT-5 Chat model to test our latest improvements for chat use cases.

## Snapshots

## Supported Tools

## Rate Limits

### gpt-5-chat-latest

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 30000 | 50000 |
| tier_2 | 5000 | 450000 | 1350000 |
| tier_3 | 5000 | 800000 | 100000000 |
| tier_4 | 10000 | 2000000 | 200000000 |
| tier_5 | 15000 | 40000000 | 15000000000 |

# GPT-5 mini

**Current Snapshot:** gpt-5-mini-2025-08-07

GPT-5 mini is a faster, more cost-efficient version of GPT-5. It's great for well-defined tasks and precise prompts. Learn more in our [GPT-5 usage guide](/docs/guides/gpt-5).
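The Batch Queue Limit column in the rate-limit tables on this page caps how many tokens can be queued for batch processing at once. A minimal sketch of a pre-flight check against that cap, using the tier_1 gpt-5-mini value from this page (the 4-characters-per-token estimator is a rough, illustrative heuristic, not the real tokenizer):

```python
# Sketch: check whether a planned batch fits under a tier's Batch Queue
# Limit, using the tier_1 gpt-5-mini value from this page. The token
# estimate (4 chars/token) is a rough heuristic for illustration only.
TIER_1_BATCH_QUEUE_LIMIT = 2_000_000  # gpt-5-mini, tier_1

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fits_in_batch_queue(prompts: list[str], queued_tokens: int = 0) -> bool:
    total = queued_tokens + sum(estimate_tokens(p) for p in prompts)
    return total <= TIER_1_BATCH_QUEUE_LIMIT

print(fits_in_batch_queue(["hello world"] * 10))
```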
## Snapshots

### gpt-5-mini-2025-08-07

- Context window size: 400000
- Knowledge cutoff date: 2024-05-31
- Maximum output tokens: 128000
- Supported features: streaming, function_calling, file_search, file_uploads, web_search, structured_outputs, image_input

## Supported Tools

- function_calling
- web_search
- file_search
- code_interpreter
- mcp

## Rate Limits

### gpt-5-mini

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 200000 | 2000000 |
| tier_2 | 5000 | 2000000 | 20000000 |
| tier_3 | 5000 | 4000000 | 40000000 |
| tier_4 | 10000 | 10000000 | 1000000000 |
| tier_5 | 30000 | 180000000 | 15000000000 |

# GPT-5 nano

**Current Snapshot:** gpt-5-nano-2025-08-07

GPT-5 nano is our fastest, cheapest version of GPT-5. It's great for summarization and classification tasks. Learn more in our [GPT-5 usage guide](/docs/guides/gpt-5).

## Snapshots

### gpt-5-nano-2025-08-07

- Context window size: 400000
- Knowledge cutoff date: 2024-05-31
- Maximum output tokens: 128000
- Supported features: streaming, function_calling, file_search, file_uploads, structured_outputs, image_input, prompt_caching, fine_tuning

## Supported Tools

- function_calling
- file_search
- image_generation
- code_interpreter
- mcp

## Rate Limits

### gpt-5-nano

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 200000 | 2000000 |
| tier_2 | 5000 | 2000000 | 20000000 |
| tier_3 | 5000 | 4000000 | 40000000 |
| tier_4 | 10000 | 10000000 | 1000000000 |
| tier_5 | 30000 | 180000000 | 15000000000 |

# GPT-5

**Current Snapshot:** gpt-5-2025-08-07

GPT-5 is our flagship model for coding, reasoning, and agentic tasks across domains. Learn more in our [GPT-5 usage guide](/docs/guides/gpt-5).
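GPT-5's snapshot details on this page state a 400000-token context window and 128000 maximum output tokens, which together bound how large a request can be. A minimal sketch of a validation step against those bounds (token counts are taken as given inputs rather than computed from text; the helper name is illustrative):

```python
# Sketch: validate a GPT-5 request against the documented limits
# (400000-token context window, 128000 max output tokens). Token
# counts are assumed to be known; they are not computed here.
CONTEXT_WINDOW = 400_000
MAX_OUTPUT_TOKENS = 128_000

def validate_request(input_tokens: int, max_output_tokens: int) -> list[str]:
    problems = []
    if max_output_tokens > MAX_OUTPUT_TOKENS:
        problems.append("max_output_tokens exceeds the 128000 cap")
    if input_tokens + max_output_tokens > CONTEXT_WINDOW:
        problems.append("input plus output exceeds the 400000 context window")
    return problems

print(validate_request(300_000, 128_000))
```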
## Snapshots

### gpt-5-2025-08-07

- Context window size: 400000
- Knowledge cutoff date: 2024-09-30
- Maximum output tokens: 128000
- Supported features: streaming, structured_outputs, distillation, function_calling, file_search, file_uploads, image_input, web_search, prompt_caching

### gpt-5-chat-latest

- Context window size: 128000
- Knowledge cutoff date: 2024-09-30
- Maximum output tokens: 16384
- Supported features: streaming, image_input

### gpt-5-mini-2025-08-07

- Context window size: 400000
- Knowledge cutoff date: 2024-05-31
- Maximum output tokens: 128000
- Supported features: streaming, function_calling, file_search, file_uploads, web_search, structured_outputs, image_input

### gpt-5-nano-2025-08-07

- Context window size: 400000
- Knowledge cutoff date: 2024-05-31
- Maximum output tokens: 128000
- Supported features: streaming, function_calling, file_search, file_uploads, structured_outputs, image_input, prompt_caching, fine_tuning

## Supported Tools

- function_calling
- web_search
- file_search
- image_generation
- code_interpreter
- mcp

## Rate Limits

### gpt-5

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 30000 | 90000 |
| tier_2 | 5000 | 450000 | 1350000 |
| tier_3 | 5000 | 800000 | 100000000 |
| tier_4 | 10000 | 2000000 | 200000000 |
| tier_5 | 15000 | 40000000 | 15000000000 |

# GPT Image 1

**Current Snapshot:** gpt-image-1

GPT Image 1 is our new state-of-the-art image generation model. It is a natively multimodal language model that accepts both text and image inputs, and produces image outputs.
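As a hedged sketch, an Images API request body for this model might be assembled as follows (field names follow the Images API; the `size` value is one commonly supported option and is illustrative here; the payload is built locally and nothing is sent):

```python
import json

# Sketch: an Images API request body for gpt-image-1. Only the payload
# is constructed; no API call is made.
def build_image_request(prompt: str, size: str = "1024x1024") -> dict:
    return {
        "model": "gpt-image-1",
        "prompt": prompt,
        "size": size,
        "n": 1,
    }

req = build_image_request("A watercolor lighthouse at dusk")
print(json.dumps(req))
```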
## Snapshots

## Supported Tools

## Rate Limits

### gpt-image-1

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | | 100000 | |
| tier_2 | | 250000 | |
| tier_3 | | 800000 | |
| tier_4 | | 3000000 | |
| tier_5 | | 8000000 | |

# gpt-oss-120b

**Current Snapshot:** gpt-oss-120b

`gpt-oss-120b` is our most powerful open-weight model, which fits into a single H100 GPU (117B parameters with 5.1B active parameters). [Download gpt-oss-120b on HuggingFace](https://huggingface.co/openai/gpt-oss-120b).

**Key features**

- **Permissive Apache 2.0 license:** Build freely without copyleft restrictions or patent risk, ideal for experimentation, customization, and commercial deployment.
- **Configurable reasoning effort:** Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
- **Full chain-of-thought:** Gain complete access to the model's reasoning process, facilitating easier debugging and increased trust in outputs.
- **Fine-tunable:** Fully customize models to your specific use case through parameter fine-tuning.
- **Agentic capabilities:** Use the models' native capabilities for function calling, web browsing, Python code execution, and structured outputs.

## Snapshots

## Supported Tools

- function_calling
- code_interpreter
- mcp
- web_search

## Rate Limits

### gpt-oss-120b

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | | | |
| tier_2 | | | |
| tier_3 | | | |
| tier_4 | | | |
| tier_5 | | | |

# gpt-oss-20b

**Current Snapshot:** gpt-oss-20b

`gpt-oss-20b` is our medium-sized open-weight model for low latency, local, or specialized use cases (21B parameters with 3.6B active parameters). [Download gpt-oss-20b on HuggingFace](https://huggingface.co/openai/gpt-oss-20b).
**Key features**

- **Permissive Apache 2.0 license:** Build freely without copyleft restrictions or patent risk, ideal for experimentation, customization, and commercial deployment.
- **Configurable reasoning effort:** Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
- **Full chain-of-thought:** Gain complete access to the model's reasoning process, facilitating easier debugging and increased trust in outputs.
- **Fine-tunable:** Fully customize models to your specific use case through parameter fine-tuning.
- **Agentic capabilities:** Use the models' native capabilities for function calling, web browsing, Python code execution, and structured outputs.

## Snapshots

## Supported Tools

- function_calling
- code_interpreter
- mcp
- web_search

## Rate Limits

### gpt-oss-20b

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | | | |
| tier_2 | | | |
| tier_3 | | | |
| tier_4 | | | |
| tier_5 | | | |

# o1-mini

**Current Snapshot:** o1-mini-2024-09-12

The o1 reasoning model is designed to solve hard problems across domains. o1-mini is a faster and more affordable reasoning model, but we recommend using the newer o3-mini model that features higher intelligence at the same latency and price as o1-mini.
## Snapshots

### o1-mini-2024-09-12

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 65536
- Supported features: streaming, file_search, file_uploads

## Supported Tools

- file_search
- code_interpreter
- mcp

## Rate Limits

### o1-mini

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 200000 | |
| tier_2 | 5000 | 2000000 | |
| tier_3 | 5000 | 4000000 | 40000000 |
| tier_4 | 10000 | 10000000 | 1000000000 |
| tier_5 | 30000 | 150000000 | 15000000000 |

# o1 Preview

**Current Snapshot:** o1-preview-2024-09-12

Research preview of the o1 series of models, trained with reinforcement learning to perform complex reasoning. o1 models think before they answer, producing a long internal chain of thought before responding to the user.

## Snapshots

### o1-preview-2024-09-12

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 32768
- Supported features: streaming, structured_outputs, file_search, function_calling, file_uploads

## Supported Tools

## Rate Limits

### o1-preview

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 30000 | |
| tier_2 | 5000 | 450000 | |
| tier_3 | 5000 | 800000 | 50000000 |
| tier_4 | 10000 | 2000000 | 200000000 |
| tier_5 | 10000 | 30000000 | 5000000000 |

# o1-pro

**Current Snapshot:** o1-pro-2025-03-19

The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide consistently better answers.

o1-pro is available in the [Responses API only](/docs/api-reference/responses) to enable support for multi-turn model interactions before responding to API requests, and other advanced API features in the future.
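In the Responses API, multi-turn interactions are built by threading each response's id into the next request via `previous_response_id`. A minimal sketch, assuming the official `openai` Python SDK; the helper names are illustrative:

```python
def followup_kwargs(prompt: str, previous_id, model: str = "o1-pro") -> dict:
    """Build responses.create() arguments, linking to the prior turn if any."""
    kwargs = {"model": model, "input": prompt}
    if previous_id is not None:
        kwargs["previous_response_id"] = previous_id  # ties this turn to the last
    return kwargs


def converse(prompts):
    from openai import OpenAI

    client = OpenAI()
    previous_id, outputs = None, []
    for prompt in prompts:
        response = client.responses.create(**followup_kwargs(prompt, previous_id))
        previous_id = response.id  # carry the thread forward
        outputs.append(response.output_text)
    return outputs
```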
## Snapshots

### o1-pro-2025-03-19

- Context window size: 200000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 100000
- Supported features: structured_outputs, function_calling, image_input

## Supported Tools

- function_calling
- file_search
- mcp

## Rate Limits

### o1-pro

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 30000 | 90000 |
| tier_2 | 5000 | 450000 | 1350000 |
| tier_3 | 5000 | 800000 | 50000000 |
| tier_4 | 10000 | 2000000 | 200000000 |
| tier_5 | 10000 | 30000000 | 5000000000 |

# o1

**Current Snapshot:** o1-2024-12-17

The o1 series of models are trained with reinforcement learning to perform complex reasoning. o1 models think before they answer, producing a long internal chain of thought before responding to the user.

## Snapshots

### o1-2024-12-17

- Context window size: 200000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 100000
- Supported features: streaming, structured_outputs, file_search, function_calling, file_uploads, image_input

### o1-mini-2024-09-12

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 65536
- Supported features: streaming, file_search, file_uploads

### o1-preview-2024-09-12

- Context window size: 128000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 32768
- Supported features: streaming, structured_outputs, file_search, function_calling, file_uploads

### o1-pro-2025-03-19

- Context window size: 200000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 100000
- Supported features: structured_outputs, function_calling, image_input

## Supported Tools

- function_calling
- file_search
- mcp

## Rate Limits

### o1

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 30000 | 90000 |
| tier_2 | 5000 | 450000 | 1350000 |
| tier_3 | 5000 | 800000 | 50000000 |
| tier_4 | 10000 | 2000000 | 200000000 |
| tier_5 | 10000 | 30000000 | 5000000000 |

# o3-deep-research

**Current Snapshot:** o3-deep-research-2025-06-26

o3-deep-research is our most advanced model for deep research, designed to tackle complex, multi-step research tasks. It can search and synthesize information from across the internet as well as from your own data, brought in through MCP connectors. Learn more about getting started with this model in our [deep research](/docs/guides/deep-research) guide.

## Snapshots

### o3-deep-research-2025-06-26

- Context window size: 200000
- Knowledge cutoff date: 2024-06-01
- Maximum output tokens: 100000
- Supported features: streaming, file_uploads, image_input, prompt_caching, evals, stored_completions

## Supported Tools

- web_search
- code_interpreter
- mcp

## Rate Limits

### o3-deep-research

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 200000 | 200000 |
| tier_2 | 5000 | 450000 | 300000 |
| tier_3 | 5000 | 800000 | 500000 |
| tier_4 | 10000 | 2000000 | 2000000 |
| tier_5 | 10000 | 30000000 | 10000000 |

# o3-mini

**Current Snapshot:** o3-mini-2025-01-31

o3-mini is our newest small reasoning model, providing high intelligence at the same cost and latency targets as o1-mini. o3-mini supports key developer features, like Structured Outputs, function calling, and Batch API.
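Structured Outputs constrain the model's reply to a strict JSON schema. A hedged sketch of using them with o3-mini via Chat Completions, assuming the official `openai` Python SDK; the `TicketTriage` schema is purely illustrative:

```python
def triage_response_format() -> dict:
    """A strict JSON schema response_format (illustrative example schema)."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": "TicketTriage",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "category": {"type": "string"},
                    "urgent": {"type": "boolean"},
                },
                "required": ["category", "urgent"],
                "additionalProperties": False,
            },
        },
    }


def triage(ticket: str) -> str:
    from openai import OpenAI

    client = OpenAI()
    completion = client.chat.completions.create(
        model="o3-mini",
        messages=[{"role": "user", "content": ticket}],
        response_format=triage_response_format(),
    )
    # With strict mode, the content is guaranteed to parse against the schema.
    return completion.choices[0].message.content
```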
## Snapshots

### o3-mini-2025-01-31

- Context window size: 200000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 100000
- Supported features: streaming, structured_outputs, function_calling, file_search, file_uploads

## Supported Tools

- function_calling
- file_search
- code_interpreter
- mcp
- image_generation

## Rate Limits

### o3-mini

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 1000 | 100000 | 1000000 |
| tier_2 | 2000 | 200000 | 2000000 |
| tier_3 | 5000 | 4000000 | 40000000 |
| tier_4 | 10000 | 10000000 | 1000000000 |
| tier_5 | 30000 | 150000000 | 15000000000 |

# o3-pro

**Current Snapshot:** o3-pro-2025-06-10

The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently better answers.

o3-pro is available in the [Responses API only](/docs/api-reference/responses) to enable support for multi-turn model interactions before responding to API requests, and other advanced API features in the future. Since o3-pro is designed to tackle tough problems, some requests may take several minutes to finish. To avoid timeouts, try using [background mode](/docs/guides/background).

## Snapshots

### o3-pro-2025-06-10

- Context window size: 200000
- Knowledge cutoff date: 2024-06-01
- Maximum output tokens: 100000
- Supported features: structured_outputs, function_calling, image_input

## Supported Tools

- function_calling
- file_search
- image_generation
- mcp
- web_search

## Rate Limits

### o3-pro

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 30000 | 90000 |
| tier_2 | 5000 | 450000 | 1350000 |
| tier_3 | 5000 | 800000 | 50000000 |
| tier_4 | 10000 | 2000000 | 200000000 |
| tier_5 | 10000 | 30000000 | 5000000000 |

# o3

**Current Snapshot:** o3-2025-04-16

o3 is a well-rounded and powerful model across domains.
It sets a new standard for math, science, coding, and visual reasoning tasks. It also excels at technical writing and instruction-following. Use it to think through multi-step problems that involve analysis across text, code, and images. o3 is succeeded by [GPT-5](/docs/models/gpt-5). Learn more about how to use our reasoning models in our [reasoning](/docs/guides/reasoning?api-mode=responses) guide.

## Snapshots

### o3-2025-04-16

- Context window size: 200000
- Knowledge cutoff date: 2024-06-01
- Maximum output tokens: 100000
- Supported features: streaming, structured_outputs, file_search, function_calling, file_uploads, image_input, prompt_caching, evals, stored_completions

### o3-deep-research-2025-06-26

- Context window size: 200000
- Knowledge cutoff date: 2024-06-01
- Maximum output tokens: 100000
- Supported features: streaming, file_uploads, image_input, prompt_caching, evals, stored_completions

### o3-mini-2025-01-31

- Context window size: 200000
- Knowledge cutoff date: 2023-10-01
- Maximum output tokens: 100000
- Supported features: streaming, structured_outputs, function_calling, file_search, file_uploads

### o3-pro-2025-06-10

- Context window size: 200000
- Knowledge cutoff date: 2024-06-01
- Maximum output tokens: 100000
- Supported features: structured_outputs, function_calling, image_input

## Supported Tools

- function_calling
- file_search
- image_generation
- code_interpreter
- mcp
- web_search

## Rate Limits

### o3

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | 30000 | 90000 |
| tier_2 | 5000 | 450000 | 1350000 |
| tier_3 | 5000 | 800000 | 50000000 |
| tier_4 | 10000 | 2000000 | 200000000 |
| tier_5 | 10000 | 30000000 | 5000000000 |

# o4-mini-deep-research

**Current Snapshot:** o4-mini-deep-research-2025-06-26

o4-mini-deep-research is our faster, more affordable deep research model, ideal for tackling complex, multi-step research tasks.
It can search and synthesize information from across the internet as well as from your own data, brought in through MCP connectors. Learn more about how to use this model in our [deep research](/docs/guides/deep-research) guide.

## Snapshots

### o4-mini-deep-research-2025-06-26

- Context window size: 200000
- Knowledge cutoff date: 2024-06-01
- Maximum output tokens: 100000
- Supported features: streaming, file_uploads, image_input, prompt_caching, evals, stored_completions

## Supported Tools

- web_search
- code_interpreter
- mcp

## Rate Limits

### o4-mini-deep-research

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 1000 | 200000 | 200000 |
| tier_2 | 2000 | 2000000 | 300000 |
| tier_3 | 5000 | 4000000 | 500000 |
| tier_4 | 10000 | 10000000 | 2000000 |
| tier_5 | 30000 | 150000000 | 10000000 |

# o4-mini

**Current Snapshot:** o4-mini-2025-04-16

o4-mini is our latest small o-series model. It's optimized for fast, effective reasoning with exceptionally efficient performance in coding and visual tasks. It's succeeded by [GPT-5 mini](/docs/models/gpt-5-mini). Learn more about how to use our reasoning models in our [reasoning](/docs/guides/reasoning?api-mode=responses) guide.
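Reasoning models like o4-mini let you trade answer quality against latency and token usage through the `reasoning.effort` request parameter. A minimal sketch of building such a request, assuming the official `openai` Python SDK:

```python
VALID_EFFORTS = ("low", "medium", "high")


def reasoning_kwargs(prompt: str, effort: str = "medium") -> dict:
    """Build responses.create() arguments with a chosen reasoning effort."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}")
    return {"model": "o4-mini", "input": prompt, "reasoning": {"effort": effort}}


def solve(prompt: str, effort: str = "medium") -> str:
    from openai import OpenAI

    client = OpenAI()
    response = client.responses.create(**reasoning_kwargs(prompt, effort))
    return response.output_text
```

Lower effort cuts latency and reasoning-token spend; higher effort helps on harder problems.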
## Snapshots

### o4-mini-2025-04-16

- Context window size: 200000
- Knowledge cutoff date: 2024-06-01
- Maximum output tokens: 100000
- Supported features: streaming, structured_outputs, function_calling, file_search, file_uploads, image_input, prompt_caching, evals, stored_completions, fine_tuning

### o4-mini-deep-research-2025-06-26

- Context window size: 200000
- Knowledge cutoff date: 2024-06-01
- Maximum output tokens: 100000
- Supported features: streaming, file_uploads, image_input, prompt_caching, evals, stored_completions

## Supported Tools

- function_calling
- file_search
- code_interpreter
- mcp
- web_search

## Rate Limits

### o4-mini

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 1000 | 100000 | 1000000 |
| tier_2 | 2000 | 2000000 | 2000000 |
| tier_3 | 5000 | 4000000 | 40000000 |
| tier_4 | 10000 | 10000000 | 1000000000 |
| tier_5 | 30000 | 150000000 | 15000000000 |

# omni-moderation

**Current Snapshot:** omni-moderation-2024-09-26

Moderation models are free models designed to detect harmful content. This model is our most capable moderation model, accepting images as input as well.

## Snapshots

## Supported Tools

## Rate Limits

### omni-moderation-latest

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| free | 250 | 10000 | |
| tier_1 | 500 | 10000 | |
| tier_2 | 500 | 20000 | |
| tier_3 | 1000 | 50000 | |
| tier_4 | 2000 | 250000 | |
| tier_5 | 5000 | 500000 | |

# text-embedding-3-large

**Current Snapshot:** text-embedding-3-large

text-embedding-3-large is our most capable embedding model for both English and non-English tasks. Embeddings are a numerical representation of text that can be used to measure the relatedness between two pieces of text. Embeddings are useful for search, clustering, recommendations, anomaly detection, and classification tasks.
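The "relatedness" mentioned above is typically measured with cosine similarity between embedding vectors (the vectors themselves come from the embeddings endpoint). A stdlib-only sketch of the comparison step:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Values close to 1 indicate closely related texts; values near 0 indicate little relationship.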
## Snapshots

## Supported Tools

## Rate Limits

### text-embedding-3-large

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| free | 100 | 40000 | |
| tier_1 | 3000 | 1000000 | 3000000 |
| tier_2 | 5000 | 1000000 | 20000000 |
| tier_3 | 5000 | 5000000 | 100000000 |
| tier_4 | 10000 | 5000000 | 500000000 |
| tier_5 | 10000 | 10000000 | 4000000000 |

# text-embedding-3-small

**Current Snapshot:** text-embedding-3-small

text-embedding-3-small is our improved, more performant version of our ada embedding model. Embeddings are a numerical representation of text that can be used to measure the relatedness between two pieces of text. Embeddings are useful for search, clustering, recommendations, anomaly detection, and classification tasks.

## Snapshots

## Supported Tools

## Rate Limits

### text-embedding-3-small

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| free | 100 | 40000 | |
| tier_1 | 3000 | 1000000 | 3000000 |
| tier_2 | 5000 | 1000000 | 20000000 |
| tier_3 | 5000 | 5000000 | 100000000 |
| tier_4 | 10000 | 5000000 | 500000000 |
| tier_5 | 10000 | 10000000 | 4000000000 |

# text-embedding-ada-002

**Current Snapshot:** text-embedding-ada-002

text-embedding-ada-002 is our improved, more performant version of our ada embedding model. Embeddings are a numerical representation of text that can be used to measure the relatedness between two pieces of text. Embeddings are useful for search, clustering, recommendations, anomaly detection, and classification tasks.
## Snapshots

## Supported Tools

## Rate Limits

### text-embedding-ada-002

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| free | 100 | 40000 | |
| tier_1 | 3000 | 1000000 | 3000000 |
| tier_2 | 5000 | 1000000 | 20000000 |
| tier_3 | 5000 | 5000000 | 100000000 |
| tier_4 | 10000 | 5000000 | 500000000 |
| tier_5 | 10000 | 10000000 | 4000000000 |

# text-moderation

**Current Snapshot:** text-moderation-007

Moderation models are free models designed to detect harmful content. This is our text-only moderation model; we expect omni-moderation-\* models to be the best default moving forward.

## Snapshots

## Supported Tools

## Rate Limits

# text-moderation-stable

**Current Snapshot:** text-moderation-007

Moderation models are free models designed to detect harmful content. This is our text-only moderation model; we expect omni-moderation-\* models to be the best default moving forward.

## Snapshots

## Supported Tools

## Rate Limits

# TTS-1 HD

**Current Snapshot:** tts-1-hd

TTS is a model that converts text to natural-sounding speech. The tts-1-hd model is optimized for high-quality text-to-speech use cases. Use it with the Speech endpoint in the Audio API.

## Snapshots

## Supported Tools

## Rate Limits

### tts-1-hd

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| tier_1 | 500 | | |
| tier_2 | 2500 | | |
| tier_3 | 5000 | | |
| tier_4 | 7500 | | |
| tier_5 | 10000 | | |

# TTS-1

**Current Snapshot:** tts-1

TTS is a model that converts text to natural-sounding speech. The tts-1 model is optimized for real-time text-to-speech use cases. Use it with the Speech endpoint in the Audio API.
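A hedged sketch of generating speech with tts-1 through the Audio API's Speech endpoint, assuming the official `openai` Python SDK. The 4096-character input limit used for chunking is an assumption for illustration; check the API reference for the current limit:

```python
def chunk_text(text: str, limit: int = 4096):
    """Split text into chunks no longer than `limit`, breaking on spaces."""
    words, chunks, current = text.split(), [], ""
    for word in words:
        candidate = f"{current} {word}".strip()
        if len(candidate) > limit and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def synthesize(text: str, path: str = "speech.mp3") -> None:
    from openai import OpenAI

    client = OpenAI()
    # Streams the first chunk's audio straight to disk.
    with client.audio.speech.with_streaming_response.create(
        model="tts-1", voice="alloy", input=chunk_text(text)[0]
    ) as response:
        response.stream_to_file(path)
```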
## Snapshots

## Supported Tools

## Rate Limits

### tts-1

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| free | 3 | | |
| tier_1 | 500 | | |
| tier_2 | 2500 | | |
| tier_3 | 5000 | | |
| tier_4 | 7500 | | |
| tier_5 | 10000 | | |

# Whisper

**Current Snapshot:** whisper-1

Whisper is a general-purpose speech recognition model, trained on a large dataset of diverse audio. You can also use it as a multitask model to perform multilingual speech recognition as well as speech translation and language identification.

## Snapshots

## Supported Tools

## Rate Limits

### whisper-1

| Tier | RPM | TPM | Batch Queue Limit |
| --- | --- | --- | --- |
| free | 3 | | |
| tier_1 | 500 | | |
| tier_2 | 2500 | | |
| tier_3 | 5000 | | |
| tier_4 | 7500 | | |
| tier_5 | 10000 | | |

# Latest models

**New:** Save on synchronous requests with [flex processing](/docs/guides/flex-processing).

## Text tokens

| Name | Input | Cached input | Output | Unit |
| --- | --- | --- | --- | --- |
| gpt-4.1 | 2 | 0.5 | 8 | 1M tokens |
| gpt-4.1 (batch) | 1 | | 4 | 1M tokens |
| gpt-4.1-2025-04-14 | 2 | 0.5 | 8 | 1M tokens |
| gpt-4.1-2025-04-14 (batch) | 1 | | 4 | 1M tokens |
| gpt-4.1-mini | 0.4 | 0.1 | 1.6 | 1M tokens |
| gpt-4.1-mini (batch) | 0.2 | | 0.8 | 1M tokens |
| gpt-4.1-mini-2025-04-14 | 0.4 | 0.1 | 1.6 | 1M tokens |
| gpt-4.1-mini-2025-04-14 (batch) | 0.2 | | 0.8 | 1M tokens |
| gpt-4.1-nano | 0.1 | 0.025 | 0.4 | 1M tokens |
| gpt-4.1-nano (batch) | 0.05 | | 0.2 | 1M tokens |
| gpt-4.1-nano-2025-04-14 | 0.1 | 0.025 | 0.4 | 1M tokens |
| gpt-4.1-nano-2025-04-14 (batch) | 0.05 | | 0.2 | 1M tokens |
| gpt-4.5-preview | 75 | 37.5 | 150 | 1M tokens |
| gpt-4.5-preview (batch) | 37.5 | | 75 | 1M tokens |
| gpt-4.5-preview-2025-02-27 | 75 | 37.5 | 150 | 1M tokens |
| gpt-4.5-preview-2025-02-27 (batch) | 37.5 | | 75 | 1M tokens |
| gpt-4o | 2.5 | 1.25 | 10 | 1M tokens |
| gpt-4o (batch) | 1.25 | | 5 | 1M tokens |
| gpt-4o-2024-11-20 | 2.5 | 1.25 | 10 | 1M tokens |
| gpt-4o-2024-11-20 (batch) | 1.25 | | 5 | 1M tokens |
| gpt-4o-2024-08-06 | 2.5 | 1.25 | 10 | 1M tokens |
| gpt-4o-2024-08-06 (batch) | 1.25 | | 5 | 1M tokens |
| gpt-4o-2024-05-13 | 5 | | 15 | 1M tokens |
| gpt-4o-2024-05-13 (batch) | 2.5 | | 7.5 | 1M tokens |
| gpt-4o-audio-preview | 2.5 | | 10 | 1M tokens |
| gpt-4o-audio-preview-2025-06-03 | 2.5 | | 10 | 1M tokens |
| gpt-4o-audio-preview-2024-12-17 | 2.5 | | 10 | 1M tokens |
| gpt-4o-audio-preview-2024-10-01 | 2.5 | | 10 | 1M tokens |
| gpt-4o-realtime-preview | 5 | 2.5 | 20 | 1M tokens |
| gpt-4o-realtime-preview-2025-06-03 | 5 | 2.5 | 20 | 1M tokens |
| gpt-4o-realtime-preview-2024-12-17 | 5 | 2.5 | 20 | 1M tokens |
| gpt-4o-realtime-preview-2024-10-01 | 5 | 2.5 | 20 | 1M tokens |
| gpt-4o-mini | 0.15 | 0.075 | 0.6 | 1M tokens |
| gpt-4o-mini (batch) | 0.075 | | 0.3 | 1M tokens |
| gpt-4o-mini-2024-07-18 | 0.15 | 0.075 | 0.6 | 1M tokens |
| gpt-4o-mini-2024-07-18 (batch) | 0.075 | | 0.3 | 1M tokens |
| gpt-4o-mini-audio-preview | 0.15 | | 0.6 | 1M tokens |
| gpt-4o-mini-audio-preview-2024-12-17 | 0.15 | | 0.6 | 1M tokens |
| gpt-4o-mini-realtime-preview | 0.6 | 0.3 | 2.4 | 1M tokens |
| gpt-4o-mini-realtime-preview-2024-12-17 | 0.6 | 0.3 | 2.4 | 1M tokens |
| o1 | 15 | 7.5 | 60 | 1M tokens |
| o1 (batch) | 7.5 | | 30 | 1M tokens |
| o1-2024-12-17 | 15 | 7.5 | 60 | 1M tokens |
| o1-2024-12-17 (batch) | 7.5 | | 30 | 1M tokens |
| o1-preview-2024-09-12 | 15 | 7.5 | 60 | 1M tokens |
| o1-preview-2024-09-12 (batch) | 7.5 | | 30 | 1M tokens |
| o1-pro | 150 | | 600 | 1M tokens |
| o1-pro (batch) | 75 | | 300 | 1M tokens |
| o1-pro-2025-03-19 | 150 | | 600 | 1M tokens |
| o1-pro-2025-03-19 (batch) | 75 | | 300 | 1M tokens |
| o3-pro | 20 | | 80 | 1M tokens |
| o3-pro (batch) | 10 | | 40 | 1M tokens |
| o3-pro-2025-06-10 | 20 | | 80 | 1M tokens |
| o3-pro-2025-06-10 (batch) | 10 | | 40 | 1M tokens |
| o3 | 2 | 0.5 | 8 | 1M tokens |
| o3 (batch) | 1 | | 4 | 1M tokens |
| o3-2025-04-16 | 2 | 0.5 | 8 | 1M tokens |
| o3-2025-04-16 (batch) | 1 | | 4 | 1M tokens |
| o3-deep-research | 10 | 2.5 | 40 | 1M tokens |
| o3-deep-research (batch) | 5 | | 20 | 1M tokens |
| o3-deep-research-2025-06-26 | 10 | 2.5 | 40 | 1M tokens |
| o3-deep-research-2025-06-26 (batch) | 5 | | 20 | 1M tokens |
| o4-mini | 1.1 | 0.275 | 4.4 | 1M tokens |
| o4-mini (batch) | 0.55 | | 2.2 | 1M tokens |
| o4-mini-2025-04-16 | 1.1 | 0.275 | 4.4 | 1M tokens |
| o4-mini-2025-04-16 (batch) | 0.55 | | 2.2 | 1M tokens |
| o4-mini-deep-research | 2 | 0.5 | 8 | 1M tokens |
| o4-mini-deep-research (batch) | 1 | | 4 | 1M tokens |
| o4-mini-deep-research-2025-06-26 | 2 | 0.5 | 8 | 1M tokens |
| o4-mini-deep-research-2025-06-26 (batch) | 1 | | 4 | 1M tokens |
| o3-mini | 1.1 | 0.55 | 4.4 | 1M tokens |
| o3-mini (batch) | 0.55 | | 2.2 | 1M tokens |
| o3-mini-2025-01-31 | 1.1 | 0.55 | 4.4 | 1M tokens |
| o3-mini-2025-01-31 (batch) | 0.55 | | 2.2 | 1M tokens |
| o1-mini | 1.1 | 0.55 | 4.4 | 1M tokens |
| o1-mini (batch) | 0.55 | | 2.2 | 1M tokens |
| o1-mini-2024-09-12 | 1.1 | 0.55 | 4.4 | 1M tokens |
| o1-mini-2024-09-12 (batch) | 0.55 | | 2.2 | 1M tokens |
| codex-mini-latest | 1.5 | 0.375 | 6 | 1M tokens |
| gpt-4o-mini-search-preview | 0.15 | | 0.6 | 1M tokens |
| gpt-4o-mini-search-preview-2025-03-11 | 0.15 | | 0.6 | 1M tokens |
| gpt-4o-search-preview | 2.5 | | 10 | 1M tokens |
| gpt-4o-search-preview-2025-03-11 | 2.5 | | 10 | 1M tokens |
| computer-use-preview | 3 | | 12 | 1M tokens |
| computer-use-preview (batch) | 1.5 | | 6 | 1M tokens |
| computer-use-preview-2025-03-11 | 3 | | 12 | 1M tokens |
| computer-use-preview-2025-03-11 (batch) | 1.5 | | 6 | 1M tokens |
| gpt-image-1 | 5 | 1.25 | | 1M tokens |
| gpt-5 | 1.25 | 0.125 | 10 | 1M tokens |
| gpt-5 (batch) | 0.625 | 0.0625 | 5 | 1M tokens |
| gpt-5-2025-08-07 | 1.25 | 0.125 | 10 | 1M tokens |
| gpt-5-2025-08-07 (batch) | 0.625 | 0.0625 | 5 | 1M tokens |
| gpt-5-latest | 1.25 | 0.125 | 10 | 1M tokens |
| gpt-5-mini | 0.25 | 0.025 | 2 | 1M tokens |
| gpt-5-mini (batch) | 0.125 | 0.0125 | 1 | 1M tokens |
| gpt-5-mini-2025-08-07 | 0.25 | 0.025 | 2 | 1M tokens |
| gpt-5-mini-2025-08-07 (batch) | 0.125 | 0.0125 | 1 | 1M tokens |
| gpt-5-nano | 0.05 | 0.005 | 0.4 | 1M tokens |
| gpt-5-nano (batch) | 0.025 | 0.0025 | 0.2 | 1M tokens |
| gpt-5-nano-2025-08-07 | 0.05 | 0.005 | 0.4 | 1M tokens |
| gpt-5-nano-2025-08-07 (batch) | 0.025 | 0.0025 | 0.2 | 1M tokens |

## Text tokens (Flex Processing)

| Name | Input | Cached input | Output | Unit |
| --- | --- | --- | --- | --- |
| o3 | 1 | 0.25 | 4 | 1M tokens |
| o3-2025-04-16 | 1 | 0.25 | 4 | 1M tokens |
| o4-mini | 0.55 | 0.1375 | 2.2 | 1M tokens |
| o4-mini-2025-04-16 | 0.55 | 0.1375 | 2.2 | 1M tokens |

## Audio tokens

| Name | Input | Cached input | Output | Unit |
| --- | --- | --- | --- | --- |
| gpt-4o-audio-preview | 40 | | 80 | 1M tokens |
| gpt-4o-audio-preview-2025-06-03 | 40 | | 80 | 1M tokens |
| gpt-4o-audio-preview-2024-12-17 | 40 | | 80 | 1M tokens |
| gpt-4o-audio-preview-2024-10-01 | 100 | | 200 | 1M tokens |
| gpt-4o-mini-audio-preview | 10 | | 20 | 1M tokens |
| gpt-4o-mini-audio-preview-2024-12-17 | 10 | | 20 | 1M tokens |
| gpt-4o-realtime-preview | 40 | 2.5 | 80 | 1M tokens |
| gpt-4o-realtime-preview-2025-06-03 | 40 | 2.5 | 80 | 1M tokens |
| gpt-4o-realtime-preview-2024-12-17 | 40 | 2.5 | 80 | 1M tokens |
| gpt-4o-realtime-preview-2024-10-01 | 100 | 20 | 200 | 1M tokens |
| gpt-4o-mini-realtime-preview | 10 | 0.3 | 20 | 1M tokens |
| gpt-4o-mini-realtime-preview-2024-12-17 | 10 | 0.3 | 20 | 1M tokens |

## Image tokens

| Name | Input | Cached input | Output | Unit |
| --- | --- | --- | --- | --- |
| gpt-image-1 | 10 | 2.5 | 40 | 1M tokens |

# Fine-tuning

Tokens used for model grading in reinforcement
fine-tuning are billed at that model's per-token rate. Inference discounts are available if you enable data sharing when creating the fine-tune job. [Learn more](https://help.openai.com/en/articles/10306912-sharing-feedback-evaluation-and-fine-tuning-data-and-api-inputs-and-outputs-with-openai#h_c93188c569).

| Name | Training | Input | Cached input | Output | Unit |
| --- | --- | --- | --- | --- | --- |
| o4-mini-2025-04-16 | $100.00 / hour | 4 | 1 | 16 | 1M tokens |
| o4-mini-2025-04-16 (batch) | | 2 | | 8 | 1M tokens |
| o4-mini-2025-04-16 with data sharing | $100.00 / hour | 2 | 0.5 | 8 | 1M tokens |
| o4-mini-2025-04-16 with data sharing (batch) | | 1 | | 4 | 1M tokens |
| gpt-4.1-2025-04-14 | 25 | 3 | 0.75 | 12 | 1M tokens |
| gpt-4.1-2025-04-14 (batch) | | 1.5 | | 6 | 1M tokens |
| gpt-4.1-mini-2025-04-14 | 5 | 0.8 | 0.2 | 3.2 | 1M tokens |
| gpt-4.1-mini-2025-04-14 (batch) | | 0.4 | | 1.6 | 1M tokens |
| gpt-4.1-nano-2025-04-14 | 1.5 | 0.2 | 0.05 | 0.8 | 1M tokens |
| gpt-4.1-nano-2025-04-14 (batch) | | 0.1 | | 0.4 | 1M tokens |
| gpt-4o-2024-08-06 | 25 | 3.75 | 1.875 | 15 | 1M tokens |
| gpt-4o-2024-08-06 (batch) | | 1.875 | | 7.5 | 1M tokens |
| gpt-4o-mini-2024-07-18 | 3 | 0.3 | 0.15 | 1.2 | 1M tokens |
| gpt-4o-mini-2024-07-18 (batch) | | 0.15 | | 0.6 | 1M tokens |
| gpt-3.5-turbo | 8 | 3 | | 6 | 1M tokens |
| gpt-3.5-turbo (batch) | | 1.5 | | 3 | 1M tokens |
| davinci-002 | 6 | 12 | | 12 | 1M tokens |
| davinci-002 (batch) | | 6 | | 6 | 1M tokens |
| babbage-002 | 0.4 | 1.6 | | 1.6 | 1M tokens |
| babbage-002 (batch) | | 0.8 | | 0.8 | 1M tokens |

# Built-in tools

The tokens used for built-in tools are billed at the chosen model's per-token rates. GB refers to binary gigabytes of storage (also known as gibibyte), where 1GB is 2^30 bytes.
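As a worked example of the File Search Storage rate in the table below ($0.10 per GB per day, with the first GB free, a GB meaning 2^30 bytes):

```python
def file_search_storage_cost(total_bytes: int, days: int, rate: float = 0.10) -> float:
    """Daily-rate storage cost in USD; the first 1GB (2**30 bytes) is free."""
    gb = total_bytes / 2**30
    billable_gb = max(0.0, gb - 1.0)  # subtract the free gigabyte
    return billable_gb * rate * days
```

For example, 3GB stored for 10 days bills 2 billable GB × $0.10 × 10 days = $2.00.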
**Web search content tokens:** Search content tokens are tokens retrieved from the search index and fed to the model alongside your prompt to generate an answer. For gpt-4o and gpt-4.1 models, these tokens are included in the $25/1K calls cost. For o3 and o4-mini models, you are billed for these tokens at input token rates on top of the $10/1K calls cost.

| Name | Cost | Unit |
| --- | --- | --- |
| Code Interpreter | 0.03 | container |
| File Search Storage | 0.1 | GB/day (1GB free) |
| File Search Tool Call - Responses API only | 2.5 | 1k calls (\*Does not apply on Assistants API) |
| Web Search - gpt-4o and gpt-4.1 models (including mini models) - Search content tokens free | 25 | 1k calls |
| Web Search - o3, o4-mini, o3-pro, and deep research models - Search content tokens billed at model rate | 10 | 1k calls |

# Transcription and speech generation

## Text tokens

| Name | Input | Output | Estimated cost (per minute) | Unit |
| --- | --- | --- | --- | --- |
| gpt-4o-mini-tts | 0.6 | | 0.015 | 1M tokens |
| gpt-4o-transcribe | 2.5 | 10 | 0.006 | 1M tokens |
| gpt-4o-mini-transcribe | 1.25 | 5 | 0.003 | 1M tokens |

## Audio tokens

| Name | Input | Output | Estimated cost (per minute) | Unit |
| --- | --- | --- | --- | --- |
| gpt-4o-mini-tts | | 12 | 0.015 | 1M tokens |
| gpt-4o-transcribe | 6 | | 0.006 | 1M tokens |
| gpt-4o-mini-transcribe | 3 | | 0.003 | 1M tokens |

## Other models

| Name | Use case | Cost | Unit |
| --- | --- | --- | --- |
| Whisper | Transcription | 0.006 | minute |
| TTS | Speech generation | 15 | 1M characters |
| TTS HD | Speech generation | 30 | 1M characters |

# Image generation

Please note that this pricing for GPT Image 1 does not include text and image tokens used in the image generation process, and only
reflects the output image tokens cost. For input text and image tokens, refer to the corresponding sections above. There are no additional costs for DALL·E 2 or DALL·E 3.

## Image generation

| Name | Quality | 1024x1024 | 1024x1536 | 1536x1024 | Unit |
| --- | --- | --- | --- | --- | --- |
| GPT Image 1 | Low | 0.011 | 0.016 | 0.016 | image |
| GPT Image 1 | Medium | 0.042 | 0.063 | 0.063 | image |
| GPT Image 1 | High | 0.167 | 0.25 | 0.25 | image |

## Image generation

| Name | Quality | 1024x1024 | 1024x1792 | 1792x1024 | Unit |
| --- | --- | --- | --- | --- | --- |
| DALL·E 3 | Standard | 0.04 | 0.08 | 0.08 | image |
| DALL·E 3 | HD | 0.08 | 0.12 | 0.12 | image |

## Image generation

| Name | Quality | 256x256 | 512x512 | 1024x1024 | Unit |
| --- | --- | --- | --- | --- | --- |
| DALL·E 2 | Standard | 0.016 | 0.018 | 0.02 | image |

# Embeddings

## Embeddings

| Name | Cost | Unit |
| --- | --- | --- |
| text-embedding-3-small | 0.02 | 1M tokens |
| text-embedding-3-small (batch) | 0.01 | 1M tokens |
| text-embedding-3-large | 0.13 | 1M tokens |
| text-embedding-3-large (batch) | 0.065 | 1M tokens |
| text-embedding-ada-002 | 0.1 | 1M tokens |
| text-embedding-ada-002 (batch) | 0.05 | 1M tokens |

# Moderation

| Name | Cost | Unit |
| --- | --- | --- |
| omni-moderation-latest | Free | 1M tokens |
| omni-moderation-2024-09-26 | Free | 1M tokens |
| text-moderation-latest | Free | 1M tokens |
| text-moderation-007 | Free | 1M tokens |

# Other models

## Text tokens

| Name | Input | Output | Unit |
| --- | --- | --- | --- |
| chatgpt-4o-latest | 5 | 15 | 1M tokens |
| gpt-4-turbo | 10 | 30 | 1M tokens |
| gpt-4-turbo (batch) | 5 | 15 | 1M tokens |
| gpt-4-turbo-2024-04-09 | 10 | 30 | 1M tokens |
| gpt-4-turbo-2024-04-09 (batch) | 5 | 15 | 1M tokens |
| gpt-4-0125-preview | 10 | 30 | 1M tokens |
| gpt-4-0125-preview (batch) | 5 | 15 | 1M tokens |
| gpt-4-1106-preview | 10 | 30 | 1M tokens |
| gpt-4-1106-preview (batch) | 5 | 15 | 1M tokens |
| gpt-4-1106-vision-preview | 10 | 30 | 1M tokens |
| gpt-4-1106-vision-preview (batch) | 5 | 15 | 1M tokens |
| gpt-4 | 30 | 60 | 1M tokens |
| gpt-4 (batch) | 15 | 30 | 1M tokens |
| gpt-4-0613 | 30 | 60 | 1M tokens |
| gpt-4-0613 (batch) | 15 | 30 | 1M tokens |
| gpt-4-0314 | 30 | 60 | 1M tokens |
| gpt-4-0314 (batch) | 15 | 30 | 1M tokens |
| gpt-4-32k | 60 | 120 | 1M tokens |
| gpt-4-32k (batch) | 30 | 60 | 1M tokens |
| gpt-3.5-turbo | 0.5 | 1.5 | 1M tokens |
| gpt-3.5-turbo (batch) | 0.25 | 0.75 | 1M tokens |
| gpt-3.5-turbo-0125 | 0.5 | 1.5 | 1M tokens |
| gpt-3.5-turbo-0125 (batch) | 0.25 | 0.75 | 1M tokens |
| gpt-3.5-turbo-1106 | 1 | 2 | 1M tokens |
| gpt-3.5-turbo-1106 (batch) | 0.5 | 1 | 1M tokens |
| gpt-3.5-turbo-0613 | 1.5 | 2 | 1M tokens |
| gpt-3.5-turbo-0613 (batch) | 0.75 | 1 | 1M tokens |
| gpt-3.5-0301 | 1.5 | 2 | 1M tokens |
| gpt-3.5-0301 (batch) | 0.75 | 1 | 1M tokens |
| gpt-3.5-turbo-instruct | 1.5 | 2 | 1M tokens |
| gpt-3.5-turbo-16k-0613 | 3 | 4 | 1M tokens |
| gpt-3.5-turbo-16k-0613 (batch) | 1.5 | 2 | 1M tokens |
| davinci-002 | 2 | 2 | 1M tokens |
| davinci-002 (batch) | 1 | 1 | 1M tokens |
| babbage-002 | 0.4 | 0.4 | 1M tokens |
| babbage-002 (batch) | 0.2 | 0.2 | 1M tokens |
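As a worked example of reading these tables: prices are USD per 1M tokens, so a request's cost is each token count times its rate, divided by 1,000,000. The sketch below assumes cached input tokens are billed at the cached rate *instead of* the full input rate; the default rates are gpt-4o's row (input 2.5, cached input 1.25, output 10):

```python
def request_cost(input_tokens, cached_tokens, output_tokens,
                 input_rate=2.5, cached_rate=1.25, output_rate=10.0):
    """Estimate the USD cost of one request from per-1M-token rates."""
    uncached = input_tokens - cached_tokens  # portion billed at the full rate
    return (uncached * input_rate
            + cached_tokens * cached_rate
            + output_tokens * output_rate) / 1_000_000
```

For example, 1M input tokens (400k of them cached) plus 100k output tokens on gpt-4o comes to $1.50 + $0.50 + $1.00 = $3.00.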