Google launches the open-source Gemma 4 model: “On-device inference” boosts the efficiency of AI agent workflows


Google officially released Gemma 4 on April 2, 2026. One of the most powerful open-source models to date, Gemma 4 achieves significant breakthroughs in native function calling, agent workflow operation, and multimodal perception, and it adopts the business-friendly Apache 2.0 license, offering developers and enterprises worldwide unprecedented freedom and flexibility.

What is Gemma 4? Core features at a glance

Gemma 4 is an open-source large language model series developed by Google DeepMind, sharing technology with the Gemini series models. Key highlights include:

Advanced reasoning capabilities: Supports multi-step planning and deep logical reasoning, significantly outperforming other comparable open-source models on math and instruction-following benchmark tests.

Native agent workflows: Includes function calling, structured JSON output, and system instruction support, making it possible to drive autonomous AI agents that execute multi-step tasks directly.

On-device deployment: E2B and E4B versions are optimized for devices like mobile phones, and can run fully offline.

Comprehensive multimodal support: All versions natively support image and video inputs; E2B and E4B additionally support native audio input.

Long-context window: Edge models support a 128K token context window, while larger models reach up to 256K tokens, allowing you to pass an entire codebase or long-form documents in a single prompt.

High-quality code generation: Supports offline code writing, letting you turn your personal workstation into a local-first AI programming assistant.

Native training for 140+ languages: Supports more than 140 languages worldwide, helping developers build multilingual applications for international users.

Four model variants, covering every application scenario

Gemma 4 comes in four versions, optimized for different hardware environments and use cases:

Effective 2B (E2B): Designed specifically for mobile devices and IoT, supports a 128K context window, native audio input, and can run fully offline on edge devices such as Android phones and Raspberry Pi.

Effective 4B (E4B): Also optimized for the edge, it features multimodal capabilities, striking an excellent balance between inference performance and memory usage.

26B Mixture of Experts (MoE): During inference, only 3.8 billion parameters are activated, enabling high-speed reasoning with extremely low latency—ideal for local workstation deployments that prioritize throughput.

31B Dense: The flagship version, ranking third on the Arena AI text leaderboard. It provides the highest-quality outputs and can run end-to-end on a single 80GB NVIDIA H100 GPU.

Quantized versions of 26B MoE and 31B Dense can natively run on consumer graphics cards as well, bringing powerful AI inference capabilities to desktop environments for individual developers.
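As a rough illustration of how the four variants map onto hardware, the helper below picks a variant from an available memory budget. The variant names and hardware pairings follow the descriptions above; the specific memory thresholds are assumptions for illustration, not official requirements.

```python
# Hypothetical helper: map an available accelerator-memory budget (GB)
# to the Gemma 4 variant this article pairs with that hardware class.
# The numeric thresholds are rough assumptions, not official figures.

def pick_gemma4_variant(memory_gb: float, quantized: bool = False) -> str:
    """Return the variant suggested for a given memory budget."""
    if memory_gb >= 80:
        return "31B Dense"             # flagship, fits a single 80 GB H100
    if quantized and memory_gb >= 24:
        return "26B MoE (quantized)"   # consumer GPUs via quantization
    if memory_gb >= 8:
        return "E4B"                   # edge-optimized, multimodal
    return "E2B"                       # phones, Raspberry Pi, IoT

print(pick_gemma4_variant(80))        # 31B Dense
print(pick_gemma4_variant(24, True))  # 26B MoE (quantized)
print(pick_gemma4_variant(4))         # E2B
```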

Major breakthrough in on-device inference: goodbye API dependency

One of Gemma 4’s most attention-grabbing features is its emphasis on “on-device” inference capabilities. The E2B and E4B models are designed to maximize compute efficiency and memory efficiency, running on edge devices such as mobile phones, Raspberry Pi, and NVIDIA Jetson Orin Nano with near-zero latency.

This has a huge impact on developers. In the past, calling cloud AI APIs meant paying costs for every request, along with network latency and data privacy risks. With Gemma 4’s on-device inference capabilities, developers can run the model on their own hardware, greatly reducing API call costs, while also enjoying full data sovereignty and offline availability.

Google also partnered closely with the Pixel team and with mobile hardware partners such as Qualcomm and MediaTek to ensure that E2B/E4B achieve best-in-class performance on mainstream Android devices, and it opened the AICore Developer Preview to Android developers so they can build integrations for Gemini Nano 4.

Stronger AI agent workflows: native function calling improves efficiency

Gemma 4 also delivers native support for agentic workflows, one of the most notable feature upgrades over the previous generation. The model supports:

Native function calling: The model can directly call external tools and APIs to perform real actions, such as querying databases or calling third-party services.

Structured JSON output: Ensures the model’s output conforms to a specific format, making it easy to integrate seamlessly with backend systems.

Native system instructions: Developers can set model behavior at the system level, making the AI agent’s role configuration more stable and consistent.

These capabilities make Gemma 4 an all-around autonomous AI agent: it can not only answer questions, but also actively interact with tools and execute multi-step workflows automatically.
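The loop below sketches the pattern these features enable: the model emits a structured JSON tool call, the host dispatches it to a registered Python function, and the result flows back to the application. The model itself is stubbed out here, since invoking a real Gemma 4 endpoint is deployment-specific, and the JSON shape is an illustrative assumption rather than Gemma's documented schema.

```python
import json

# Registered tools the agent may call. A real deployment would expose
# database queries, third-party APIs, and similar actions.
TOOLS = {
    "get_weather": lambda city: {"city": city, "forecast": "sunny"},
}

def fake_model(prompt: str) -> str:
    # Stand-in for a Gemma 4 call; emits a structured tool request.
    # The {"tool": ..., "arguments": ...} shape is an assumption.
    return json.dumps({"tool": "get_weather", "arguments": {"city": "Taipei"}})

def run_agent_step(prompt: str) -> dict:
    """One agent turn: parse the model's JSON output and dispatch the tool."""
    call = json.loads(fake_model(prompt))
    tool = TOOLS[call["tool"]]
    return tool(**call["arguments"])

result = run_agent_step("What's the weather in Taipei?")
print(result)  # {'city': 'Taipei', 'forecast': 'sunny'}
```

Structured JSON output is what makes the `json.loads` step reliable: when the model is constrained to a known schema, the host can dispatch without fragile text parsing.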

All-around multimodal upgrade: vision, audio, and long-form text all covered

Every model in the Gemma 4 lineup includes native multimodal capabilities, greatly expanding the range of tasks it can handle.

Images and video

For visual understanding, all models support native processing of images and video, with support for variable resolutions. They perform especially well on visual tasks such as OCR (optical character recognition) and chart understanding.

Audio input

For audio, the E2B and E4B edge models additionally support native audio input, enabling direct speech recognition and understanding without needing an extra speech-to-text step.

Long context

For documents, edge models support a 128K token context window, while larger models provide up to 256K tokens—allowing developers to pass an entire codebase or long-form documents in a single prompt.
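To gauge whether a document or codebase fits in those windows, a common rough heuristic for English text is about four characters per token. The helper below applies that heuristic; the 4:1 ratio is an approximation, and real counts from Gemma's actual tokenizer will differ.

```python
# Rough check of whether a text fits a given context window.
# The ~4 chars/token ratio is a common English-text heuristic, not
# Gemma's tokenizer; use the real tokenizer for exact counts.

CHARS_PER_TOKEN = 4

def fits_context(text: str, window_tokens: int) -> bool:
    """Estimate whether `text` fits in a window of `window_tokens` tokens."""
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_tokens <= window_tokens

doc = "x" * 600_000  # ~150K estimated tokens
print(fits_context(doc, 128_000))  # False: exceeds the edge models' window
print(fits_context(doc, 256_000))  # True: fits the larger models' window
```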

Offline code generation

Gemma 4 supports high-quality offline code generation, turning a personal workstation into a local-first AI programming assistant.

140+ language support

Native training for more than 140 languages helps developers build applications that serve users globally.

Apache 2.0 license: a milestone in the open-source ecosystem

Gemma 4 is released under the Apache 2.0 license, one of the most business-friendly licenses in the open-source community. Developers and enterprises can freely use, modify, and distribute the model—whether deploying it in private infrastructure, hybrid cloud environments, or embedding it in commercial products—without additional restrictions.

Robust ecosystem support

Gemma 4 also receives comprehensive support from major industry tools, including Hugging Face (Transformers, TRL, Transformers.js), Ollama, vLLM, llama.cpp, MLX, LM Studio, NVIDIA NIM and NeMo, Keras, Vertex AI, and more.

Developers can download model weights directly via Hugging Face, Kaggle, or Ollama, and try the 31B and 26B MoE versions online in Google AI Studio, or test the E2B and E4B versions through the Google AI Edge Gallery.
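For local experimentation, servers such as Ollama and vLLM expose OpenAI-compatible chat endpoints. The sketch below assembles a request payload for such a server; the model tag `gemma4:e4b` is a hypothetical registry name, not a confirmed identifier, and the network call itself is omitted since it depends on a running local server.

```python
import json

def build_chat_request(model: str, prompt: str, json_mode: bool = False) -> dict:
    """Assemble an OpenAI-style chat payload for a local server
    (e.g. POST to http://localhost:11434/v1/chat/completions with Ollama)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if json_mode:
        # OpenAI-style switch for structured JSON output, supported by
        # several OpenAI-compatible local servers.
        payload["response_format"] = {"type": "json_object"}
    return payload

# "gemma4:e4b" is an assumed tag for illustration only.
req = build_chat_request("gemma4:e4b", "Extract the dates as JSON.", json_mode=True)
print(json.dumps(req, indent=2))
```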

For enterprises that need large-scale deployment, Google Cloud provides complete cloud solutions covering Vertex AI, Cloud Run, GKE, Sovereign Cloud, and TPU-accelerated inference services, eliminating limits on local compute capacity.

Lower costs without sacrificing capability: Gemma 4 is a compelling new choice for developers

The release of Gemma 4 is a milestone for open-source AI. It is an enterprise-grade, production-ready model that can run offline on a phone, call external tools to complete tasks autonomously, and handle long documents and multimodal inputs, all while remaining free for anyone to use.

For developers and enterprises that want to reduce API calling costs while retaining AI capabilities, Gemma 4 offers a highly compelling path.

This article first appeared on ABMedia.
