ChatGPT introduces new voice and image capabilities

OpenAI is introducing new voice and image capabilities in ChatGPT. With this update, users can hold voice conversations with the assistant and show it images. Back-and-forth conversation with human-like audio responses makes ChatGPT more interactive and intuitive. Users can snap a picture of a landmark while traveling, analyze images containing work-related data, or troubleshoot a problem by discussing a photo of it with the assistant. These features make ChatGPT more versatile and more convenient in users’ daily lives.

Voice and Image Capabilities in ChatGPT

Introduction to new voice and image capabilities

We are beginning to roll out new voice and image capabilities in ChatGPT. These features offer a more intuitive way to interact with the model: you can hold a voice conversation with ChatGPT or share an image to show it what you are talking about. Together they make ChatGPT a more versatile tool in your everyday life.

Benefits of voice and image in ChatGPT

Voice and image capabilities give you more ways to use ChatGPT. Whether you are traveling and want to discuss a landmark, planning a meal and want ChatGPT to look at the contents of your fridge, or working through a complex graph of work-related data, ChatGPT can now understand and discuss what you show it or say to it.

Rollout of Voice and Image Capabilities

Availability for Plus and Enterprise users

Over the next two weeks, we will roll out voice and image capabilities to Plus and Enterprise users.

Platforms for voice and image features

Voice capabilities will be available on iOS and Android, so you can talk with ChatGPT on the go. Image capabilities will be available on all platforms, so you can use them regardless of your device.

Voice Capabilities in ChatGPT

Engaging in conversation with ChatGPT

With the new voice capabilities, you can now engage in back-and-forth conversations with ChatGPT. Whether you need assistance while you’re on the move or want to settle a dinner table debate, ChatGPT is there to have a conversation with you.

Setting up voice features

To get started with voice, simply head to the Settings menu in the ChatGPT mobile app and opt into voice conversations through the New Features section. Once you have done that, you can tap the headphone button on the home screen to initiate a voice conversation.

Selecting preferred voice

ChatGPT offers five different voices for you to choose from. Whether you prefer a soft and gentle voice or a more enthusiastic tone, you can select the voice that best suits your preferences.

Text-to-speech model and voice actors

The voice capabilities in ChatGPT are powered by a new text-to-speech model that can generate human-like audio from text and a few seconds of sample speech. To ensure the highest quality voice experience, we collaborated with professional voice actors to create each of the voices available in ChatGPT.

Listening to voice samples

Curious about how the voices in ChatGPT sound? You can listen to voice samples to get a better idea of the different voices and choose the one that resonates with you.

Image Capabilities in ChatGPT

ChatGPT’s ability to understand and discuss images

In addition to voice capabilities, ChatGPT can now understand and discuss images. Whether you need help troubleshooting a technical issue, want to plan a meal based on the contents of your fridge, or need to analyze a complex graph, ChatGPT is here to assist you.

Using the drawing tool for image focus

To guide ChatGPT’s attention to a specific part of an image, you can use the drawing tool available in the ChatGPT mobile app. This feature allows you to ensure that ChatGPT focuses on the details that are most relevant to your query or discussion.

OpenAI’s Approach for Gradual Deployment

Safety concerns with voice technology

While voice technology opens up new possibilities, it also introduces new risks. To address these concerns, we have taken a measured approach to deployment. By focusing voice capabilities on a specific use case like voice chat and working closely with voice actors, we aim to mitigate potential risks such as impersonation or fraud.

Specific use case for voice chat

Voice chat is the primary use case for ChatGPT’s voice capabilities. We keep the scope of usage narrow and collaborate with trusted partners in a similar way. Spotify, for example, is using the technology for its Voice Translation feature. This approach helps ensure that the technology is used responsibly and beneficially.

Collaboration with other companies

OpenAI is actively collaborating with other companies to leverage voice capabilities in various domains. By working together, we can explore new applications and ensure that the technology is used responsibly and ethically.

Challenges with vision-based models

Like voice capabilities, vision-based models present their own challenges. These range from hallucinations about what an image contains to the risk of people relying on the model’s interpretation of images in critical domains. It is therefore crucial to test and refine these models thoroughly before broader deployment.

Testing for risk and responsible usage

Prior to wider deployment, we have conducted rigorous testing with red teamers and alpha testers to analyze risks associated with image input. This approach helps us identify potential issues and develop safeguards to ensure responsible usage of ChatGPT’s image capabilities.

Balancing Usefulness and Privacy in Vision Features

Assisting users with daily tasks

ChatGPT’s vision features are designed to assist users in their daily tasks. By analyzing images and providing insights or guidance related to the content, ChatGPT aims to be a helpful companion in various scenarios.

Less focus on analyzing and discussing people

To respect individuals’ privacy, we have implemented technical measures that limit ChatGPT’s ability to analyze people and make direct statements about them. ChatGPT may still discuss images that contain people, but it focuses on the general context rather than on specific individuals.

Technical measures for privacy

Privacy is a top priority. The technical measures we have taken are intended to keep ChatGPT from infringing on people’s privacy while preserving its usefulness, balancing helpful answers against respect for personal boundaries.

Improving safeguards with real-world usage and feedback

As ChatGPT’s voice and image features are adopted by users, their real-world usage and feedback will further inform the development of safeguards and privacy measures. The insights gained from user experiences will help us enhance the system and ensure that it remains a valuable tool while maintaining user privacy.

Transparency about Model Limitations

Specialized use cases and limitations

ChatGPT is a powerful tool, but it has limitations, and it should be used with them in mind. For specialized topics, such as research in a specific field, users should be aware of the model’s limitations and avoid relying solely on ChatGPT’s responses.

Advice for non-English users

ChatGPT performs well with English text but may struggle with other languages, especially those written in non-Roman scripts. Non-English users are therefore advised not to rely on ChatGPT for translation or transcription in languages that the model may not handle accurately.

Safety measures and verification process

To ensure the safety and responsible use of ChatGPT, proper verification processes are in place for certain high-risk use cases. OpenAI is committed to transparency and aims to educate users about the model’s limitations and potential risks associated with its usage.

Expansion of Access to Voice and Image Features

Upcoming availability for Plus and Enterprise users

ChatGPT’s voice and image capabilities will reach Plus and Enterprise users over the next two weeks.

Plans to roll out to other user groups

Following the rollout to Plus and Enterprise users, OpenAI has plans to make voice and image capabilities available to other user groups, including developers. This expansion will ensure that more users can benefit from the enhanced functionalities of ChatGPT.


The introduction of voice and image capabilities in ChatGPT marks an exciting milestone in the evolution of this AI tool. By enabling voice conversations and image understanding, ChatGPT becomes an even more powerful and versatile assistant. OpenAI is committed to deploying these features responsibly, and user feedback and real-world usage will be instrumental in refining and improving these capabilities. With the expansion of access to voice and image features, ChatGPT continues to empower users in their daily lives.


Voice Mode Core Research

Alec Radford, Tao Xu, Jong Wook Kim

Vision Deployment Core Research

Raul Puri, Jamie Kiros, Hyeonwoo Noh, Long Ouyang, Sandhini Agarwal