Cohere Unveils Command A+: A Milestone in Sovereign AI with Hugging Face Integration and Enhanced Efficiency
May 22, 2026
Efficiency gains include up to 63% higher output throughput (tokens per second) and up to 17% shorter time to first token compared with Command A Reasoning under the same quantization and parallelism settings. W4A4 quantization is credited with a speedup of about 47% and roughly 13% lower latency.
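Throughput gains and latency reductions are not mirror images: a 63% increase in output tokens per second shortens the time per output token by 1 - 1/1.63, or about 38.7%, not 63%. The short Python sketch below makes that conversion explicit. The baseline figures (100 tokens/s and a 500 ms time to first token) are hypothetical placeholders, not measured numbers for Command A Reasoning; only the 63% and 17% figures come from the announcement.

```python
def per_token_time_reduction(throughput_gain: float) -> float:
    """Fractional cut in time per output token implied by a throughput gain.

    A gain g multiplies tokens/s by (1 + g), so seconds per token is
    multiplied by 1 / (1 + g).
    """
    return 1.0 - 1.0 / (1.0 + throughput_gain)


def scaled_figures(baseline_tps: float, baseline_ttft_ms: float,
                   tps_gain: float, ttft_cut: float) -> tuple[float, float]:
    """Apply a throughput gain and a TTFT reduction to (hypothetical) baselines."""
    return baseline_tps * (1.0 + tps_gain), baseline_ttft_ms * (1.0 - ttft_cut)


# Hypothetical baselines; only the 63% and 17% figures come from the announcement.
tps, ttft = scaled_figures(baseline_tps=100.0, baseline_ttft_ms=500.0,
                           tps_gain=0.63, ttft_cut=0.17)
print(f"{tps:.0f} tokens/s, {ttft:.0f} ms time to first token")
print(f"time per output token cut by {per_token_time_reduction(0.63):.1%}")
```

The same conversion applied to the roughly 47% W4A4 speedup gives about a 32% cut in time per output token.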
The model emphasizes hardware and memory efficiency through sparse expert routing and 16-bit, 8-bit, and 4-bit (W4A4) quantization. The 4-bit format is applied only to the experts in the Mixture-of-Experts (MoE) layers and is supported by quantization-aware distillation; it enables deployment on a single NVIDIA Blackwell B200 or two NVIDIA H100 GPUs and lowers inference costs.
The model is open-sourced as an enterprise-grade multimodal AI built for agentic tasks, reasoning, and multilingual processing, and in its W4A4 variant it is intended to run in private environments on two NVIDIA H100s or one Blackwell-generation B200.
Cohere positions Command A+ as a milestone for sovereign AI and an open-source accelerant for enterprise AI adoption. The model gained immediate ecosystem traction through its integration with Hugging Face and vLLM, and Cohere emphasizes that it can run securely inside private data environments.
Cohere introduces Command A+ as a 218-billion-parameter, decoder-only sparse MoE transformer optimized for complex reasoning, multimodal document processing, and agentic workflows. Its weights are released on Hugging Face under the Apache 2.0 license to promote sovereign AI.
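Sparse routing is what lets a 218-billion-parameter MoE model activate only a fraction of its weights for each token. The announcement does not describe Cohere's router, so the following Python sketch assumes the common top-k softmax gating scheme used in many sparse MoE transformers; the expert count, `k`, the toy experts, and the router scores are all hypothetical.

```python
import math
from typing import Callable

Vector = list[float]
Expert = Callable[[Vector], Vector]


def top_k_route(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Generic top-k softmax gating (an assumption, not Cohere's published router).

    Selects the k experts with the highest router logits and renormalizes their
    softmax weights so they sum to 1. Only these k experts run for the token.
    """
    if not 0 < k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in chosen]


def moe_forward(token: Vector, experts: list[Expert],
                router_logits: list[float], k: int = 2) -> Vector:
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    out = [0.0] * len(token)
    for idx, weight in top_k_route(router_logits, k):
        expert_out = experts[idx](token)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Four toy experts (hypothetical): each simply scales the token vector.
experts: list[Expert] = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 2.0]  # hypothetical router scores for one token

print(top_k_route(logits, k=2))                       # [(1, 0.5), (3, 0.5)]
print(moe_forward([1.0, 1.0], experts, logits, k=2))  # [3.0, 3.0]
```

Because only `k` of the experts execute per token, compute per token tracks the active experts rather than the full parameter count, while memory must still hold every expert, which is why the quantization of expert weights matters for deployment.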
Command A+ ships with a redesigned tokenizer that natively supports 48 languages and tokenizes non-European languages more efficiently. It produces fewer tokens for Arabic, Japanese, and Korean text, which lowers the cost of multilingual deployments.
The model natively generates citations and grounding spans to improve trust and reduce hallucinations, a key feature for enterprise use in the finance, healthcare, and legal sectors.
Command A+ combines reasoning, multimodal processing, multilingual support, and tool use in a single model: it handles 48 languages and image inputs, uses a 218-billion-parameter sparse MoE architecture, and accepts an input context of up to 128K tokens.
Command A+ is engineered for agentic tasks with strong external tool integration. It supports conversational tool use for connecting to internal APIs, search engines, and SQL databases, and it generates native citations through grounding spans that link back to source documents or database rows, providing traceability for regulated industries.
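Grounding spans tie a stretch of the generated answer to the sources that support it. The announcement gives no output schema, so the Python sketch below uses a hypothetical `Citation` record (character offsets into the answer plus source document or database-row IDs) to show how a reviewer in a regulated setting could trace each claim. The field names, helper functions, and example data are invented for illustration and are not Cohere's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical grounding span: answer[start:end] is supported by source_ids."""
    start: int
    end: int
    source_ids: tuple[str, ...]  # e.g. document IDs or database row keys


def render_with_citations(answer: str, citations: list[Citation]) -> str:
    """Insert bracketed source IDs after each cited span so a reviewer can trace it."""
    pieces: list[str] = []
    cursor = 0
    for c in sorted(citations, key=lambda span: span.start):
        if c.start < cursor or c.start >= c.end or c.end > len(answer):
            raise ValueError(f"invalid or overlapping span: {c}")
        pieces.append(answer[cursor:c.end])
        pieces.append(" [" + ", ".join(c.source_ids) + "]")
        cursor = c.end
    pieces.append(answer[cursor:])
    return "".join(pieces)


def uncited_sources(citations: list[Citation], retrieved: set[str]) -> set[str]:
    """Retrieved sources that no span cites, which an auditor may want to inspect."""
    cited = {s for c in citations for s in c.source_ids}
    return retrieved - cited


# Invented example data; the real citation format is not specified in the announcement.
answer = "Paris is the capital. It hosts the Louvre."
first, second = "Paris is the capital.", "It hosts the Louvre."
s2 = answer.index(second)
citations = [
    Citation(0, len(first), ("doc-12",)),
    Citation(s2, s2 + len(second), ("geo_db:row-88",)),
]
print(render_with_citations(answer, citations))
print(uncited_sources(citations, {"doc-12", "geo_db:row-88", "doc-40"}))
```

Keeping spans as offsets rather than copied text means the citation stays valid only for the exact answer string it was produced with, which is the property an audit trail needs.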
On multimodal benchmarks, Command A+ scores 63% on MMMU-Pro and 75.1% on MMMU, with further gains on MathVista and CharXiv (reasoning), signaling better document comprehension and multimodal processing than Command A Vision.
The public release includes three precision variants (BF16, FP8, and W4A4), and the minimum GPU requirement depends on the variant: four B200s or eight H100s for BF16, two B200s or four H100s for FP8, and one B200 or two H100s for W4A4.
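A rough weight-memory estimate shows why the GPU count scales with precision. The Python sketch below assumes 80 GB of memory per H100 and 192 GB per B200 (some B200 systems expose about 180 GB, which does not change the counts computed here), treats BF16 as 2 bytes, FP8 as 1 byte, and W4A4 as 0.5 bytes per parameter, and counts weights only: no KV cache, activations, or framework overhead. Because 4-bit precision applies only to the MoE experts, the W4A4 figure is a lower bound; the true footprint depends on the share of non-expert parameters, which the announcement does not give.

```python
import math

PARAMS = 218e9  # total parameter count from the announcement

# Bytes per parameter per variant. W4A4 is treated as 4-bit everywhere, which
# understates memory because non-expert weights stay at higher precision.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "W4A4": 0.5}

# Assumed memory capacity per GPU, in GB (1 GB = 1e9 bytes).
GPU_MEMORY_GB = {"H100": 80, "B200": 192}

# Minimum GPU counts as stated in the announcement.
STATED_MIN_GPUS = {
    "BF16": {"B200": 4, "H100": 8},
    "FP8": {"B200": 2, "H100": 4},
    "W4A4": {"B200": 1, "H100": 2},
}


def weight_gb(variant: str) -> float:
    """Approximate weight storage in GB for a precision variant."""
    return PARAMS * BYTES_PER_PARAM[variant] / 1e9


def min_gpus_for_weights(variant: str, gpu: str) -> int:
    """Smallest GPU count whose combined memory holds the weights alone."""
    return math.ceil(weight_gb(variant) / GPU_MEMORY_GB[gpu])


for variant, stated in STATED_MIN_GPUS.items():
    for gpu, count in stated.items():
        need = min_gpus_for_weights(variant, gpu)
        headroom = count * GPU_MEMORY_GB[gpu] - weight_gb(variant)
        print(f"{variant:5} {gpu}: weights {weight_gb(variant):.0f} GB, "
              f"weight-only minimum {need}, stated {count}, "
              f"headroom {headroom:.0f} GB")
```

Under these assumptions every stated configuration holds the weights with room to spare, and serving needs that headroom for the KV cache and activations, which grow with the 128K-token context. The announcement does not explain how the stated counts were derived, so this is a sanity check rather than a sizing guide.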
Summary based on 2 sources

