Alibaba's Qwen3.5 AI Surpasses GPT-5.2, Offers 8x Throughput and 60% Cost Reduction

February 16, 2026
  • Alibaba points to official technical details, model weights, and GitHub/social channels for updates.

  • Alibaba unveils Qwen3.5, a multimodal mixture-of-experts model that the company says outperforms GPT-5.2 and Claude 4.5 Opus on several benchmarks while delivering 8x higher throughput on large workloads and 60% lower operating costs.

  • Qwen3.5 is built with native multimodality across text, images, and video, and introduces agentic capabilities that let it autonomously take actions across mobile and desktop apps.

  • The model scales via sparse Mixture-of-Experts routing across a 60-layer stack and pairs it with an efficient hybrid architecture that combines Gated Delta Networks (a linear-attention component) with quadratic gated attention to reduce memory and compute; see the layer-stacking sketch after this list.

  • Key architectural highlights include long-context capability: a base context window of up to 256k tokens, with the hosted Qwen3.5-Plus extending to 1,000,000 tokens for long inputs, enabled by an asynchronous RL framework; this lets long documents and large codebases be handled without a full RAG pipeline.

  • Qwen3.5 supports over 210 languages and can process images and data visualizations, enabling rich multimodal input.

  • The available excerpt does not disclose the exact release date, deployment scope, or additional technical specs.

  • The architecture employs 512 MoE experts with 11 active per token (10 routed plus 1 shared) and a hidden size of 4,096, in a 60-layer configuration with a 3:1 ratio of gated attention to gated delta network layers and a vocabulary of roughly 248k tokens; see the routing sketch after this list.

  • Analysts warn that rising costs, data governance, regulatory changes, and heavy computing demands could affect long-term profitability and sustainability of large models.

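As a rough illustration of the hybrid layout described above, the sketch below builds a 60-layer plan that interleaves gated attention and gated delta network layers at the article's stated 3:1 ratio. The layer names and the repeating four-layer block are assumptions for illustration, not Alibaba's published implementation.

```python
# Hypothetical sketch: a 60-layer hybrid stack at the article's stated 3:1
# ratio of gated attention to gated delta network layers. Names and the
# repeating 4-layer block are illustrative assumptions, not Qwen3.5's layout.
NUM_LAYERS = 60
BLOCK = ["gated_attention"] * 3 + ["gated_delta_net"]  # 3:1 within each block of 4

def build_layer_plan(num_layers: int = NUM_LAYERS) -> list[str]:
    """Repeat the 3:1 block until the stack reaches num_layers."""
    plan: list[str] = []
    while len(plan) < num_layers:
        plan.extend(BLOCK)
    return plan[:num_layers]

plan = build_layer_plan()
print(len(plan))                        # 60
print(plan.count("gated_attention"),    # 45 quadratic-attention layers
      plan.count("gated_delta_net"))    # 15 linear-attention layers
```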
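
And a minimal routing sketch for the MoE figures above: each token passes through its top-10 routed experts plus one always-active shared expert, for 11 active in total. Module names and the toy dimensions are assumptions; the article's full scale (512 experts, hidden size 4,096) is noted in a comment.

```python
# Minimal sketch of top-k MoE routing with a shared expert, assuming a
# softmax router and standard FFN experts; not Qwen3.5's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Top-k routed experts plus one shared expert, per the article's figures."""

    def __init__(self, hidden: int, num_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.SiLU(),
                          nn.Linear(4 * hidden, hidden))
            for _ in range(num_experts)
        )
        # The "+1": a shared expert that every token passes through.
        self.shared = nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.SiLU(),
                                    nn.Linear(4 * hidden, hidden))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (num_tokens, hidden)
        gate = F.softmax(self.router(x), dim=-1)
        weights, idx = gate.topk(self.top_k, dim=-1)       # pick top-k routed experts
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize over the top-k
        routed = torch.zeros_like(x)
        for t in range(x.size(0)):                         # naive dispatch: clarity over speed
            for w, e in zip(weights[t], idx[t].tolist()):
                routed[t] = routed[t] + w * self.experts[e](x[t])
        return self.shared(x) + routed                     # k routed + 1 shared active

# Article scale would be SparseMoE(hidden=4096, num_experts=512, top_k=10);
# a toy size keeps the sketch cheap to run.
moe = SparseMoE(hidden=32, num_experts=16, top_k=4)
print(moe(torch.randn(5, 32)).shape)  # torch.Size([5, 32])
```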

Summary based on 14 sources

