Apple has introduced Visual Intelligence, a new feature for the iPhone 16 lineup designed to change how users interact with their phone’s camera. The tool lets the camera recognize and extract text from images, such as signs or event posters, and it hands off to third-party services like Google Search and ChatGPT for more specific answers.
For instance, Visual Intelligence lets you point your camera at a restaurant to instantly retrieve details like operating hours, reservation options, and the menu. Similarly, by scanning an event poster, the AI can automatically extract the time and location, adding the details directly to your calendar.
A key feature is the integration with third-party apps. With ChatGPT, you can use AI to solve math problems or generate text based on what’s in the image. At the same time, integration with Google allows for a search experience similar to Google Lens, identifying objects or locations in the image and providing relevant results.
To use Visual Intelligence, tap the new Camera Control button on the iPhone 16. Apple also emphasized privacy, assuring users that photos captured with this feature will not be stored on Apple’s servers.
Visual search is the new AI trend.
Apple’s introduction of Visual Intelligence for the iPhone 16 lineup closely mirrors Google’s strategy with AI-powered visual search, particularly with Project Astra, which Google announced in May 2024. Both companies are enhancing how users interact with their surroundings via the camera, offering real-time information and AI-driven insights.
Like Apple’s Visual Intelligence, Project Astra lets users scan their environment and interact with what the camera sees, delivering details and actionable items on the spot.
Similarly, OpenAI has moved in the same direction with its mobile version of ChatGPT. The GPT-4o model is multimodal, meaning it can handle text and images. This allows users to send images during a conversation, where the AI can analyze and respond to visual inputs—similar to how Apple’s Visual Intelligence works, though focused more on conversational and creative tasks.
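As a rough sketch of what that multimodal input looks like in practice, the OpenAI Chat Completions API lets a single user message pair text with an inline image (sent as a base64 data URL). The helper below only builds the request payload; the model name and prompt are illustrative:

```python
import base64


def build_vision_message(prompt: str, image_bytes: bytes) -> dict:
    """Build a Chat Completions message that pairs text with an image.

    The image is embedded inline as a base64 data URL, following the
    Chat Completions content-part format used for vision-capable models
    such as gpt-4o.
    """
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
            },
        ],
    }


# Example: ask about a photo of an event poster (placeholder bytes here
# stand in for a real JPEG read from disk or the camera).
message = build_vision_message(
    "What time and place does this poster mention?",
    b"\xff\xd8\xff",
)
# The message would then be sent to the API, e.g.:
# client.chat.completions.create(model="gpt-4o", messages=[message])
```

The same payload shape works whether the image comes from a file, a camera frame, or a URL, which is what makes image-in-conversation features like this straightforward to build on the mobile side.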
These advancements reflect a broader trend in integrating AI and multimodal capabilities into smartphones, making visual interaction and AI responses a standard part of mobile experiences.