Google launches Gemma 4 open models for reasoning and agentic AI workflows

Details

Google introduced Gemma 4, a family of open-source models released under the Apache 2.0 license and designed for advanced reasoning and autonomous agent workflows across multiple hardware platforms.
Four model sizes released: 2B and 4B effective parameters (E2B, E4B) for mobile and edge devices; a 26B Mixture-of-Experts (MoE) model optimized for latency; and a 31B Dense model for maximum quality. The 31B ranks #3 and the 26B ranks #6 on Arena AI's open model leaderboard, outcompeting models 20x their size.
Native multimodal support includes text, image, video, and audio processing across all models, with 128K context windows for edge models and 256K for larger variants. All models trained on 140+ languages.
Enhanced capabilities include multi-step reasoning, function calling and structured JSON output for agent development, offline code generation, and OCR/chart understanding. Gemini Nano 4 (based on Gemma 4 E2B) delivers 4x faster performance and 60% lower battery consumption than the prior version.
Community adoption demonstrates traction: the prior Gemma generation was downloaded over 400 million times, with 100,000+ variants created. Day-one support spans Hugging Face, vLLM, llama.cpp, and NVIDIA NIM, with deployment options ranging from edge devices to Google Cloud TPUs.
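
To make the function-calling and structured JSON output item concrete, here is a minimal sketch of how an agent loop might consume such output. It assumes the model returns a JSON object with "name" and "arguments" fields; that shape, the get_weather tool, and the dispatch helper are all hypothetical illustrations, not Gemma 4's documented format or API.

```python
import json


def get_weather(city: str) -> str:
    # Hypothetical tool: a stub standing in for a real weather lookup.
    return f"Sunny in {city}"


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch(model_output: str) -> str:
    """Parse a JSON function call and run the matching registered tool."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("function call must be a JSON object")

    name = call.get("name")
    args = call.get("arguments", {})
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    return TOOLS[name](**args)


# Assumed shape of a structured function call (an illustration only).
sample = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
result = dispatch(sample)
print(result)
```

Validating the tool name and argument types before calling anything keeps a malformed or unexpected model response from reaching application code, which matters most when the agent runs locally without a server-side filter.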

Impact

Gemma 4 accelerates the shift toward local-first AI inference, enabling developers to build agentic applications on consumer hardware without cloud dependency. That matters most for latency-sensitive, privacy-focused, and cost-constrained deployments. With the MoE and dense models competing against proprietary 100B+ models, open-source efficiency gains look likely to pressure commercial model pricing. Enterprise adoption may also accelerate, particularly in regulated sectors that prioritize data sovereignty, because the Apache 2.0 license imposes fewer deployment restrictions than the licenses of competing open models.