Meta launches Llama 3.2, an AI model that “understands” image and text

Meta introduced Llama 3.2, the latest version of its AI models, at the Meta Connect 2024 event on September 25. This release brings significant advancements, including vision models and compact text models, with enhanced capabilities for visual data interpretation. Developers can now access these models via Meta’s official website and Hugging Face.

The vision models in Llama 3.2 are designed to interpret data from images, graphs, and maps, enabling tasks like image recognition and automatic captioning. These models come in two sizes: 11 billion and 90 billion parameters. For example, they can analyze a company’s sales graph and determine which month had the best sales performance. For captioning, they can generate concise descriptions—one or two sentences—based on the details within an image.
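Since the models are available through Hugging Face, a captioning request like the sales-graph example above would typically be built as a chat message pairing an image with a text instruction. The sketch below shows that message structure; the model ID and the commented-out API calls are assumptions based on Hugging Face conventions, so check the model card on huggingface.co before use.

```python
# Minimal sketch of a captioning request to a Llama 3.2 vision model,
# following the chat-message convention used by Hugging Face transformers.
# The model ID below is an assumption; consult the official model card.

MODEL_ID = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed model name

def build_caption_request(question: str) -> list:
    """Build a chat message pairing one image with a text instruction."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},  # the image itself is passed to the processor separately
                {"type": "text", "text": question},
            ],
        }
    ]

messages = build_caption_request("Which month in this sales chart performed best?")

# With transformers installed, generation would look roughly like:
#   from transformers import AutoProcessor, MllamaForConditionalGeneration
#   processor = AutoProcessor.from_pretrained(MODEL_ID)
#   prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
```

The image placeholder in the message list marks where the visual input is injected; the actual pixels are handed to the processor alongside the rendered prompt.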


This update offers both open and closed model options, giving developers flexibility in choosing the most suitable tools for their AI tasks, whether it’s analyzing visuals or automating text generation.

Llama 3.2 for mobile devices

Llama 3.2 is also available in smaller versions designed for mobile devices and edge computing environments. These compact models, with 1 billion and 3 billion parameters, are optimized to run locally on phones and other resource-constrained hardware, providing efficient on-device AI processing.


These mobile-optimized models are compatible with chips from leading manufacturers such as Qualcomm and MediaTek and are designed to work with ARM architecture processors, which dominate smartphones thanks to their lower power consumption. As a result, AI tasks can run efficiently on small, battery-powered devices, making Llama 3.2 suitable for a range of mobile and edge computing applications.

Security and integration

To reinforce its commitment to safety and ethics in AI, Meta introduced Llama Guard 3, a security system designed to monitor the input and output of text and images from its models. This tool helps ensure that AI applications are developed and deployed responsibly, adding an extra layer of oversight to prevent misuse or unethical practices.
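The monitoring pattern described above can be sketched as a gate around the main model: the safety classifier checks the user input before generation and the model’s output before it is returned. In the sketch below, `classify` and `generate` are hypothetical stand-ins for real model calls, and the "safe"/"unsafe" verdict format is an assumption modeled on Meta's published Llama Guard conventions.

```python
# Hypothetical sketch of using a safety classifier such as Llama Guard 3
# as a gate around a main model. `classify` stands in for the real
# classifier call and is assumed to return "safe" or "unsafe" plus a
# policy category code; `generate` stands in for the main model.

def guarded_generate(user_input, classify, generate):
    """Run the main model only if the input passes the safety check,
    then re-check the generated output before returning it."""
    if classify(user_input).startswith("unsafe"):
        return "Request blocked by safety filter."
    output = generate(user_input)
    if classify(output).startswith("unsafe"):
        return "Response withheld by safety filter."
    return output

# Example with stub functions standing in for real model calls:
flagged = {"how do I pick a lock"}
result = guarded_generate(
    "how do I pick a lock",
    classify=lambda text: "unsafe S2" if text in flagged else "safe",
    generate=lambda text: f"Echo: {text}",
)
# result == "Request blocked by safety filter."
```

Checking both input and output means a harmless question that provokes an unsafe answer is still caught on the way out, which is the extra layer of oversight the article describes.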


Additionally, Meta launched Llama Stack, a toolkit designed to simplify the integration and customization of Llama models across different environments. Whether deployed in the cloud or locally on smartphones, these tools aim to streamline the use of Llama models, allowing developers to adapt them to specific needs in diverse operating environments.

Author
Rohit is a certified Microsoft Windows expert with a passion for simplifying technology. With years of hands-on experience and a knack for problem-solving, he is dedicated to helping individuals and businesses make the most of their Windows systems.