Google has officially rolled out Gemma 3n, its latest on-device AI model, first teased back in May 2025. What makes this launch exciting is that Gemma 3n brings full-scale multimodal processing (audio, video, image, and text) straight to smartphones and edge devices, all without needing constant internet or heavy cloud support. It's a big step forward for developers looking to bring powerful AI features to low-power devices with limited memory.

At the core of Gemma 3n is a new architecture called MatFormer, short for Matryoshka Transformer. Think Russian nesting dolls: smaller, fully functional models tucked inside bigger ones. This clever setup lets developers scale AI performance to the device's capability. You get two versions: E2B runs on just 2GB of RAM, and E4B works with around 3GB.
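The nesting-dolls idea is easiest to see in miniature. The toy sketch below shows the general principle of a smaller model living inside a larger one's weights; the dimensions and the simple corner-slicing rule are illustrative stand-ins, not Gemma 3n's actual MatFormer details.

```python
# Toy sketch of the Matryoshka idea behind MatFormer: a smaller, fully
# functional model shares the larger model's weights, so you "slice out"
# the inner model instead of training or storing it separately.
# Sizes and the slicing rule here are illustrative, not Gemma 3n's.

def make_layer(d):
    # Deterministic toy d x d weight matrix standing in for one layer.
    return [[(i * d + j) % 7 - 3 for j in range(d)] for i in range(d)]

def forward(weights, x):
    # One toy "layer": a plain matrix-vector product.
    return [sum(row[j] * x[j] for j in range(len(x))) for row in weights]

def nested_submodel(weights, d_small):
    # The inner doll: the top-left d_small x d_small block of the big
    # model's weights, shared rather than duplicated.
    return [row[:d_small] for row in weights[:d_small]]

D_BIG, D_SMALL = 8, 4               # stand-ins for the E4B vs E2B widths
w_big = make_layer(D_BIG)
w_small = nested_submodel(w_big, D_SMALL)

y = forward(w_small, [1.0] * D_SMALL)    # runs in the smaller footprint
assert len(y) == D_SMALL
assert w_small[0] == w_big[0][:D_SMALL]  # weights are literally shared
```

The payoff of this design is that a device ships one set of weights and picks the footprint it can afford at runtime.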
Despite packing 5 to 8 billion raw parameters, both versions behave like much smaller models when it comes to resource use. That's thanks to smart design choices like Per-Layer Embeddings (PLE), which shifts part of the load from the GPU to the CPU, helping save memory. Gemma 3n also features KV Cache Sharing, which speeds up the processing of long audio and video inputs by nearly 2x, perfect for real-time use cases like voice assistants and mobile video analysis.
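The caching principle behind that speed-up is simple to sketch: key/value pairs for input already processed are stored and reused, so each new step only pays for the new input. The `encode` function and cache layout below are toy stand-ins; Gemma 3n's actual KV Cache Sharing works inside the transformer's attention layers.

```python
# Minimal sketch of key/value caching: tokens already processed keep
# their (key, value) pairs, so a new step only encodes new tokens.
# encode() is a toy stand-in for the real, expensive projection.

calls = {"encode": 0}

def encode(token):
    # Pretend this is the costly per-token key/value computation.
    calls["encode"] += 1
    return (len(token), len(token) * 2)  # toy (key, value) pair

def process(tokens, kv_cache):
    # Encode only the tokens past the cached prefix; everything
    # earlier reuses its stored pair.
    for t in tokens[len(kv_cache):]:
        kv_cache.append(encode(t))
    return sum(k for k, _ in kv_cache)  # stand-in for an attention score

cache = []
process(["frame1", "frame2", "frame3"], cache)            # 3 encode calls
process(["frame1", "frame2", "frame3", "frame4"], cache)  # only 1 more
assert calls["encode"] == 4  # 4 total, not 7: old frames were reused
```

For a long audio or video stream, avoiding that recomputation on every step is where the near-2x gain comes from.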
Gemma 3n isn't just light on memory; it's stacked with serious capabilities. For speech-based features, it uses an audio encoder adapted from Google's Universal Speech Model, which means it can handle speech-to-text and even language translation directly on your phone. It's already showing solid results, especially when translating between English and European languages like Spanish, French, Italian, and Portuguese.
On the visual front, it's powered by Google's new MobileNet-V5, a lightweight but powerful vision encoder that can process video at up to 60fps on phones like the Pixel. That means smooth, real-time video analysis without breaking a sweat. And it's not just fast: it's also more accurate than older models.
Developers can plug into Gemma 3n using popular tools like Hugging Face Transformers, Ollama, MLX, llama.cpp, and more. Google's also kicked off the Gemma 3n Impact Challenge, offering a $150,000 prize pool for apps that showcase the model's offline magic.

The best part? Gemma 3n runs entirely offline. No cloud, no connection, just pure on-device AI. With support for over 140 languages and the ability to understand multimodal content in 35, it's a game-changer for building AI apps where connectivity is patchy or privacy is a priority. Want to try Gemma 3n for yourself? Here's how you can get started:
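One quick route is Ollama, one of the supported tools mentioned above. The model tags below are an assumption; check Ollama's model library for the exact names before running.

```shell
# Pull and chat with the 2GB-footprint variant locally.
# Tag name assumed -- verify it in Ollama's model library first.
ollama pull gemma3n:e2b
ollama run gemma3n:e2b "Translate 'good morning' into Spanish."
```

Swap in the `e4b` tag if your device has the extra gigabyte of RAM to spare; everything runs on-device either way.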