IBM Unveils Granite 4.0: Open-Source LLM Revolutionizing AI with Efficiency and Accessibility
October 2, 2025
IBM has launched Granite 4.0, a family of fully open-source, enterprise-grade large language models (LLMs) that emphasize high performance and cost efficiency, licensed under Apache 2.0 and ISO 42001-certified.
Target applications include summarization, data extraction, question answering, code generation, and multilingual interaction, with models optimized for edge deployment, even in-browser via WebGPU.
The launch marks a shift toward democratizing AI, making powerful, certified models more accessible and responsibly governed, which could accelerate adoption and inspire new hybrid architectures.
The core innovation of Granite 4.0 is its hybrid architecture, which interleaves Mamba-2 state-space layers with a small number of self-attention blocks at a 9:1 ratio (nine Mamba-2 blocks for every attention block), reducing RAM usage by over 70% for long-context and multi-session inference compared to traditional Transformers.
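The interleaving pattern can be illustrated with a short sketch. This is not IBM's implementation; it only shows what a 9:1 Mamba-2-to-attention layer stack looks like, with the layer count and the "every tenth layer" placement chosen here for illustration.

```python
# Illustrative sketch of a hybrid layer stack at the reported 9:1 ratio:
# nine Mamba-2 state-space blocks for every one self-attention block.
# The placement rule (every tenth layer is attention) is an assumption.

def hybrid_layer_pattern(num_layers: int) -> list[str]:
    """Return a layer-type list with one attention block per nine Mamba-2 blocks."""
    pattern = []
    for i in range(num_layers):
        # Every tenth layer is self-attention; the rest are Mamba-2 blocks.
        pattern.append("attention" if (i + 1) % 10 == 0 else "mamba2")
    return pattern

layers = hybrid_layer_pattern(40)
print(layers.count("mamba2"), layers.count("attention"))  # → 36 4
```

Because state-space layers keep a fixed-size recurrent state instead of a growing key-value cache, a stack dominated by Mamba-2 blocks is what drives the memory savings on long contexts.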
Available through platforms such as watsonx.ai, Hugging Face, and Docker Hub, the models are also accessible through partners including Dell and NVIDIA, with planned expansion to Amazon SageMaker and Microsoft Azure.
Granite 4.0 models are trained on samples up to 512K tokens and evaluated on sequences up to 128K tokens, with options for quantization, conversion to GGUF, and support for FP8 computations on compatible hardware.
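Why quantization and FP8 matter for deployment cost can be seen with back-of-envelope arithmetic on weight storage. This is a generic estimate, not an official IBM figure, and the 32-billion-parameter count is a hypothetical example; it also ignores activations and KV/state memory.

```python
# Back-of-envelope weight-memory estimate at different precisions.
# Illustrative only: the 32B parameter count is a hypothetical example,
# and this counts weights only (no activations or runtime state).

def weight_memory_gb(num_params_billion: float, bits_per_param: int) -> float:
    """Decimal gigabytes needed to store the weights at a given precision."""
    total_bytes = num_params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9

for label, bits in [("FP16", 16), ("FP8", 8), ("4-bit GGUF quant", 4)]:
    print(f"{label}: {weight_memory_gb(32, bits):.1f} GB")
```

Halving the bits per parameter halves the weight footprint, which is why FP8 on compatible hardware and 4-bit GGUF conversions let the same model fit on cheaper GPUs.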
Designed to be more affordable and accessible, these models can be deployed on cheaper GPUs, lowering infrastructure barriers and encouraging broader adoption among smaller organizations.
The models omit positional encoding (NoPE), simplifying their design without sacrificing performance on long-context tasks, and are highly optimized for efficiency, making them suitable for edge computing and resource-constrained environments.
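The effect of omitting positional encodings can be shown with a toy attention-score calculation. This is a generic sketch of the idea, not Granite's code: without a positional scheme such as RoPE rotating the query/key vectors, the attention logit between two tokens depends only on their content, and the vectors below are made-up values.

```python
import math

# Sketch of "NoPE" attention scoring: the logit between a query and a key
# is a pure content dot product, with no position-dependent term added or
# rotated in. Toy vectors, not real model activations.

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def attn_logits(q: list[float], keys: list[list[float]]) -> list[float]:
    """Scaled dot-product attention logits with no positional encoding."""
    d = len(q)
    return [dot(q, k) / math.sqrt(d) for k in keys]

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
print(attn_logits(q, keys))  # identical no matter where the tokens sit
```

Because no per-position term enters the computation, the same scoring rule applies at any sequence length, which is part of why the design stays simple for long-context use.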
These models are tailored for enterprise workflows such as multi-tool agents and customer support, focusing on efficiency, low latency, and scalability.
By activating only necessary parameters through a Mixture-of-Experts (MoE) routing strategy, Granite 4.0 models significantly reduce computational load and RAM usage, with examples showing over 70% RAM reduction for long-input tasks.
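Top-k expert routing, the generic mechanism behind MoE savings, can be sketched in a few lines. The details of Granite 4.0's router are not described in this summary, so the scoring and k=2 selection below are illustrative assumptions.

```python
import math

# Minimal sketch of top-k Mixture-of-Experts routing (generic technique;
# Granite 4.0's exact router is not specified here). Only the k selected
# experts execute, so compute scales with k, not with the expert count.

def top_k_route(scores: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Toy example: router scores for one token over 8 experts.
scores = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
routing = top_k_route(scores, k=2)
print(routing)  # experts 1 and 4 carry the combined weight
```

With k experts active out of many, only a fraction of the total parameters is touched per token, which is how total capacity can be large while the active compute and RAM stay small.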
Granite 4.0 models come in various sizes and configurations, including instruction-tuned and reasoning variants, with plans to release additional models like Granite Nano for edge deployment.
Trained on 22 trillion tokens, Granite 4.0 models outperform most open-weight models on benchmarks for instruction following, function calling, and retrieval-augmented generation.
IBM emphasizes safety and responsible AI, with certifications, bug bounty programs, cryptographic signing, and adherence to best practices for security and provenance.
Sources

Analytics India Magazine • Oct 3, 2025
IBM Launches Granite 4.0 Hybrid AI Models With Lower Memory and Hardware Costs