Details
- IBM launched Granite 4.1, a family of decoder-only dense LLMs in 3B, 8B, and 30B sizes. The models were trained from scratch on ~15T tokens using a five-phase pre-training schedule that extends the context length to 512K tokens, followed by supervised fine-tuning (SFT) on 4.1M curated samples and reinforcement learning (RL) with on-policy GRPO/DAPO.
- Built by IBM's Granite team, the models are released under Apache 2.0 on Hugging Face. The instruct versions show enhanced tool calling, instruction following, math, coding, and chat performance.
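- The architecture uses grouped-query attention (GQA), rotary position embeddings (RoPE), SwiGLU feed-forward layers, RMSNorm, and shared input/output embeddings. The 8B instruct model matches or beats the previous Granite 4.0-H-Small, a 32B-A9B MoE model (32B total, 9B active parameters), despite having fewer total parameters, which points to data quality mattering more than scale.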
- Granite 4.1 follows Granite 4.0. Multi-stage data curation, combining LLM-as-a-Judge scoring with filtering, improves post-training. The models also avoid long chain-of-thought (CoT) outputs, which keeps latency predictable and inference costs lower than with CoT-heavy approaches.
- Reported benchmarks include 68.29% HumanEval pass@1 for the 8B model. The models integrate with Azure AI and support fill-in-the-middle (FIM) code completion, positioning them as efficient open alternatives to proprietary SLMs.
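The GRPO/DAPO step in the RL stage can be illustrated with a minimal sketch, assuming one scalar reward per sampled response. It shows two ideas from the published GRPO and DAPO methods. The first is group-relative advantages, where each response's reward is normalized by the mean and standard deviation of its own group. The second is DAPO's decoupled ("clip-higher") bounds and its practice of dropping groups whose rewards are all identical. The function names, the use of population standard deviation, and the 0.2/0.28 clip values (the defaults reported for DAPO) are illustrative assumptions. They are not details of IBM's training setup.

```python
import statistics
from typing import List, Optional

def group_advantages(rewards: List[float], eps: float = 1e-6) -> Optional[List[float]]:
    """GRPO-style advantages: normalize each reward against its own sampling group.

    Assumption: population std; implementations differ on this detail.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std < eps:
        # DAPO-style dynamic sampling: a group where every response earned the
        # same reward gives zero advantage everywhere, so it is dropped (None).
        return None
    return [(r - mean) / std for r in rewards]

def clipped_surrogate(ratio: float, advantage: float,
                      eps_low: float = 0.2, eps_high: float = 0.28) -> float:
    """Per-token PPO-style clipped objective with DAPO's decoupled clip bounds.

    `ratio` is pi_new(token) / pi_old(token); the result is to be maximized.
    """
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)
```

In use, one would sample a group of responses per prompt, score them, call `group_advantages`, and skip or resample the prompt when it returns `None`. DAPO's token-level loss averaging and overlong-response shaping are not shown.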
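Two of the listed components, RMSNorm and the SwiGLU feed-forward block, are simple enough to sketch in plain Python. This follows the standard formulations. RMSNorm divides by the root mean square and applies a learned gain. SwiGLU computes down(SiLU(gate·x) times up·x), with an elementwise product in the middle. The list-based matrices and function names are assumptions chosen for readability. A real model would run these as batched tensor operations.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]

def rms_norm(x: Sequence[float], weight: Sequence[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm: divide by the root mean square of x, then apply a per-channel gain.

    Unlike LayerNorm, no mean is subtracted and there is no bias term.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

def silu(v: float) -> float:
    """SiLU (swish): v * sigmoid(v), with sigmoid written via tanh to avoid overflow."""
    return v * 0.5 * (1.0 + math.tanh(0.5 * v))

def matvec(m: Matrix, x: Sequence[float]) -> List[float]:
    """Dense matrix-vector product over nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def swiglu_ffn(x: Sequence[float], w_gate: Matrix, w_up: Matrix, w_down: Matrix) -> List[float]:
    """SwiGLU feed-forward: project down the elementwise product silu(gate @ x) * (up @ x)."""
    gate = matvec(w_gate, x)
    up = matvec(w_up, x)
    hidden = [silu(g) * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)
```

In a pre-norm transformer block, `rms_norm` is applied to the residual stream before attention and again before `swiglu_ffn`.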
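GQA matters most at a 512K-token context because it shrinks the key/value cache: several query heads share one KV head. The sketch maps query heads to their shared KV head and estimates cache size. The layer count, head counts, head dimension, and bf16 storage are hypothetical values chosen to give round numbers. They are not Granite 4.1's published configuration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AttnConfig:
    # Hypothetical example configuration, not Granite 4.1's actual hyperparameters.
    n_layers: int
    n_q_heads: int
    n_kv_heads: int
    head_dim: int
    bytes_per_elem: int = 2  # bf16/fp16 storage

    def __post_init__(self) -> None:
        if self.n_q_heads % self.n_kv_heads != 0:
            raise ValueError("n_q_heads must be a multiple of n_kv_heads")

def kv_head_for(q_head: int, cfg: AttnConfig) -> int:
    """Index of the shared KV head used by query head `q_head`."""
    group_size = cfg.n_q_heads // cfg.n_kv_heads
    return q_head // group_size

def kv_cache_bytes(cfg: AttnConfig, seq_len: int, batch: int = 1) -> int:
    """Bytes needed to cache keys and values for every layer and token (factor 2 = K plus V)."""
    return 2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim * cfg.bytes_per_elem * seq_len * batch

gqa = AttnConfig(n_layers=40, n_q_heads=32, n_kv_heads=8, head_dim=128)
mha = AttnConfig(n_layers=40, n_q_heads=32, n_kv_heads=32, head_dim=128)  # one K/V per query head
ctx = 512 * 1024
print(kv_cache_bytes(gqa, ctx) / 2**30, "GiB with GQA")  # 80.0 GiB with GQA
print(kv_cache_bytes(mha, ctx) / 2**30, "GiB with MHA")  # 320.0 GiB with MHA
```

With four query heads per KV head, the cache is a quarter of the multi-head-attention size. This is the property that makes very long contexts more practical to serve.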
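Multi-stage curation with an LLM judge can be pictured as a chain of cheap filters followed by a model-based score gate. In this sketch the judge is a placeholder callable; in practice it would be an LLM prompted to grade each sample. Every stage, the exact-match deduplication, the length limit, and the 7.0 threshold on a 0 to 10 scale are hypothetical choices. They are not IBM's published recipe.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass(frozen=True)
class Sample:
    prompt: str
    response: str

# Hypothetical judge interface: returns a quality score on a 0 to 10 scale.
Judge = Callable[[Sample], float]

def rule_filter(s: Sample, max_chars: int = 8000) -> bool:
    """Cheap heuristic stage that runs before any judge calls are spent."""
    return bool(s.prompt.strip()) and bool(s.response.strip()) and len(s.response) <= max_chars

def curate(samples: Iterable[Sample], judge: Judge, threshold: float = 7.0) -> List[Sample]:
    """Exact-dedup, then heuristic filter, then keep samples the judge scores >= threshold."""
    seen = set()
    kept: List[Sample] = []
    for s in samples:
        key = (s.prompt.strip().lower(), s.response.strip().lower())
        if key in seen or not rule_filter(s):
            continue  # duplicate or fails heuristics
        seen.add(key)
        if judge(s) >= threshold:  # LLM-as-a-Judge score gate
            kept.append(s)
    return kept
```

Ordering the stages from cheapest to most expensive keeps judge calls, usually the dominant cost, limited to samples that already passed the simple checks.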
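HumanEval pass@k is commonly computed with the unbiased estimator introduced alongside HumanEval. With n samples per problem, of which c pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. For k = 1 this reduces to c/n. As a worked check, if one greedy sample were scored per problem, 68.29% would correspond to 112 of HumanEval's 164 problems (112/164 ≈ 0.6829). The sampling setup behind the reported figure is not stated here, so that reading is an assumption.

```python
from math import comb
from typing import Sequence, Tuple

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def benchmark_pass_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Mean pass@k over problems, given (n, c) for each problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

# One greedy sample per problem: 112 solved and 52 failed, 164 problems in total.
greedy = [(1, 1)] * 112 + [(1, 0)] * 52
print(f"{100 * benchmark_pass_at_k(greedy, k=1):.2f}%")  # 68.29%
```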
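Fill-in-the-middle completion rearranges the code around the cursor into a prompt that the model continues. The sketch builds a prefix-suffix-middle (PSM) prompt and splices a completion back in. The sentinel token names are StarCoder-style placeholders assumed for illustration. The actual Granite 4.1 FIM tokens should be read from the model's tokenizer configuration on Hugging Face.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FimTokens:
    # ASSUMPTION: StarCoder-style sentinel names used for illustration only;
    # check the Granite 4.1 tokenizer's special tokens before relying on them.
    prefix: str = "<fim_prefix>"
    suffix: str = "<fim_suffix>"
    middle: str = "<fim_middle>"

def build_fim_prompt(before_cursor: str, after_cursor: str,
                     tokens: FimTokens = FimTokens()) -> str:
    """PSM ordering: prefix, then suffix, then the middle sentinel the model continues from."""
    return f"{tokens.prefix}{before_cursor}{tokens.suffix}{after_cursor}{tokens.middle}"

def splice(before_cursor: str, completion: str, after_cursor: str) -> str:
    """Insert the generated middle span back at the cursor position."""
    return before_cursor + completion + after_cursor

before = "def add(a, b):\n    return "
after = "\n\nprint(add(2, 3))\n"
prompt = build_fim_prompt(before, after)
# A model would generate text after the middle sentinel (here assumed to be "a + b")
# and stop at an end-of-text token; the completion is then spliced in.
print(splice(before, "a + b", after))
```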
Impact
Granite 4.1 advances open-source SLMs by showing that dense architectures trained on rigorously curated data can rival larger MoE models. This could accelerate enterprise adoption for on-device inference and tool-using agents. Over the next 12 to 24 months, it could shift R&D toward quality-focused training pipelines and strengthen developer ecosystems on Hugging Face, while pressuring competitors such as Mistral and Stability AI to match its efficiency gains.