IBM Unveils Granite 4.0: Open-Source LLM Revolutionizing AI with Efficiency and Accessibility
October 2, 2025
IBM has launched Granite 4.0, a family of fully open-source, enterprise-grade large language models (LLMs) that emphasize high performance and cost efficiency, licensed under Apache 2.0 and ISO 42001-certified.
Target applications include summarization, data extraction, question answering, code generation, and multilingual interaction, with models optimized for edge deployment, even in-browser via WebGPU.
The launch marks a shift toward democratizing AI, making powerful, certified models more accessible and responsibly governed, which could accelerate adoption and inspire new hybrid architectures.
The core innovation of Granite 4.0 is its hybrid architecture, which interleaves Mamba-2 state-space layers with a small number of self-attention blocks at a 9:1 ratio (nine Mamba-2 blocks for every attention block), reducing RAM usage by over 70% for long-context and multi-session inference compared to traditional Transformers.
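The interleaving pattern can be illustrated with a short sketch. This is not IBM's implementation; it only shows what a 9:1 Mamba-2-to-attention layer stack looks like, with the layer count and the "every tenth layer" placement chosen here for illustration.

```python
# Illustrative sketch of a hybrid layer stack at the reported 9:1 ratio:
# nine Mamba-2 state-space blocks for every one self-attention block.
# The placement rule (every tenth layer is attention) is an assumption.

def hybrid_layer_pattern(num_layers: int) -> list[str]:
    """Return a layer-type list with one attention block per nine Mamba-2 blocks."""
    pattern = []
    for i in range(num_layers):
        # Every tenth layer is self-attention; the rest are Mamba-2 blocks.
        pattern.append("attention" if (i + 1) % 10 == 0 else "mamba2")
    return pattern

layers = hybrid_layer_pattern(40)
print(layers.count("mamba2"), layers.count("attention"))  # → 36 4
```

Because state-space layers keep a fixed-size recurrent state instead of a growing key-value cache, a stack dominated by Mamba-2 blocks is what drives the memory savings on long contexts.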
Available through platforms such as watsonx.ai, Hugging Face, and Docker Hub, the models are also accessible through partners including Dell and NVIDIA, with planned expansion to Amazon SageMaker and Microsoft Azure.
Granite 4.0 models are trained on samples up to 512K tokens and evaluated on sequences up to 128K tokens, with options for quantization, conversion to GGUF, and support for FP8 computations on compatible hardware.
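Why quantization and FP8 matter for deployment cost can be seen with back-of-envelope arithmetic on weight storage. This is a generic estimate, not an official IBM figure, and the 32-billion-parameter count is a hypothetical example; it also ignores activations and KV/state memory.

```python
# Back-of-envelope weight-memory estimate at different precisions.
# Illustrative only: the 32B parameter count is a hypothetical example,
# and this counts weights only (no activations or runtime state).

def weight_memory_gb(num_params_billion: float, bits_per_param: int) -> float:
    """Decimal gigabytes needed to store the weights at a given precision."""
    total_bytes = num_params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9

for label, bits in [("FP16", 16), ("FP8", 8), ("4-bit GGUF quant", 4)]:
    print(f"{label}: {weight_memory_gb(32, bits):.1f} GB")
```

Halving the bits per parameter halves the weight footprint, which is why FP8 on compatible hardware and 4-bit GGUF conversions let the same model fit on cheaper GPUs.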
Designed to be more affordable and accessible, these models can be deployed on cheaper GPUs, lowering infrastructure barriers and encouraging broader adoption among smaller organizations.
The models omit positional encoding (NoPE), simplifying their design without sacrificing performance on long-context tasks, and are highly optimized for efficiency, making them suitable for edge computing and resource-constrained environments.
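The effect of omitting positional encodings can be shown with a toy attention-score calculation. This is a generic sketch of the idea, not Granite's code: without a positional scheme such as RoPE rotating the query/key vectors, the attention logit between two tokens depends only on their content, and the vectors below are made-up values.

```python
import math

# Sketch of "NoPE" attention scoring: the logit between a query and a key
# is a pure content dot product, with no position-dependent term added or
# rotated in. Toy vectors, not real model activations.

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def attn_logits(q: list[float], keys: list[list[float]]) -> list[float]:
    """Scaled dot-product attention logits with no positional encoding."""
    d = len(q)
    return [dot(q, k) / math.sqrt(d) for k in keys]

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
print(attn_logits(q, keys))  # identical no matter where the tokens sit
```

Because no per-position term enters the computation, the same scoring rule applies at any sequence length, which is part of why the design stays simple for long-context use.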
These models are tailored for enterprise workflows such as multi-tool agents and customer support, focusing on efficiency, low latency, and scalability.
By activating only necessary parameters through a Mixture-of-Experts (MoE) routing strategy, Granite 4.0 models significantly reduce computational load and RAM usage, with examples showing over 70% RAM reduction for long-input tasks.
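Top-k expert routing, the generic mechanism behind MoE savings, can be sketched in a few lines. The details of Granite 4.0's router are not described in this summary, so the scoring and k=2 selection below are illustrative assumptions.

```python
import math

# Minimal sketch of top-k Mixture-of-Experts routing (generic technique;
# Granite 4.0's exact router is not specified here). Only the k selected
# experts execute, so compute scales with k, not with the expert count.

def top_k_route(scores: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Toy example: router scores for one token over 8 experts.
scores = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
routing = top_k_route(scores, k=2)
print(routing)  # experts 1 and 4 carry the combined weight
```

With k experts active out of many, only a fraction of the total parameters is touched per token, which is how total capacity can be large while the active compute and RAM stay small.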
Granite 4.0 models come in various sizes and configurations, including instruction-tuned and reasoning variants, with plans to release additional models like Granite Nano for edge deployment.
Trained on 22 trillion tokens, Granite 4.0 models outperform most open-weight models on benchmarks for instruction following, function calling, and retrieval-augmented generation.
IBM emphasizes safety and responsible AI, with certifications, bug bounty programs, cryptographic signing, and adherence to best practices for security and provenance.
Sources

Analytics India Magazine • Oct 3, 2025
IBM Launches Granite 4.0 Hybrid AI Models With Lower Memory and Hardware Costs