Apple's New Multi-Token Framework Boosts LLM Speed by Up to 5x Without Sacrificing Quality
August 9, 2025
Apple has unveiled a 'multi-token prediction' (MTP) framework designed to enhance the performance of large language models (LLMs), enabling them to generate text up to five times faster while maintaining high output quality.
The approach is detailed in a paper titled 'Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential', which outlines the framework's capabilities.
Traditional autoregressive decoding generates text one token at a time, which makes longer sequences slow to produce. To address this, the researchers append special 'mask' tokens to the prompt, letting the model speculate on several upcoming tokens in a single pass.
The speculated tokens are then verified against standard autoregressive decoding, and only matching tokens are accepted; this verification step is what preserves the accuracy and relevance of the generated text, which is essential for both commercial and research applications of LLMs.
A complementary technique, gated LoRA adaptation, ensures that the speed gains do not compromise the quality of the generated content.
In tests with the open-source Tulu3-8B model, the new method demonstrated average speedups of 2-3x on general tasks and up to 5x on more predictable tasks such as coding and mathematics.
The implications of faster text generation are broad, potentially enhancing AI-driven trading systems, improving customer service at financial institutions, and enabling more efficient data analysis, which could increase market efficiency. While the exact financial impact of the framework remains to be seen, it is anticipated to influence future AI applications in the financial sector.
This development comes at a time when the AI industry is heavily focused on optimizing LLM performance, with companies such as OpenAI advancing their own models and decoding frameworks.
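The speculate-then-verify loop behind this kind of multi-token decoding can be sketched in miniature. The following is a hypothetical illustration, not Apple's implementation: `target_next` stands in for one autoregressive step of a model, `draft_k` stands in for the multi-token pass over appended mask slots, and the accept/reject rule is the standard speculative-decoding check that keeps the output identical to plain autoregressive decoding.

```python
# Hypothetical sketch of speculate-then-verify multi-token decoding.
# The toy "model" below is illustrative only; a real system would run
# a neural network for both the draft and the verification passes.

def target_next(context):
    # Stand-in for one autoregressive step: the "true" next token.
    return (context[-1] * 2 + 1) % 11

def draft_k(context, k):
    # Stand-in for the multi-token pass: the model fills k appended
    # 'mask' slots in one forward pass. Here the first two guesses are
    # correct and later ones degrade, mimicking falling draft accuracy.
    out, ctx = [], list(context)
    for i in range(k):
        guess = target_next(ctx) if i < 2 else (ctx[-1] + 7) % 11
        out.append(guess)
        ctx.append(guess)
    return out

def generate(context, n_tokens, k=4):
    ctx = list(context)
    steps = 0  # number of verification passes (~ sequential model calls)
    while len(ctx) - len(context) < n_tokens:
        drafts = draft_k(ctx, k)
        accepted = 0
        for tok in drafts:
            if tok == target_next(ctx):  # verify against autoregressive step
                ctx.append(tok)
                accepted += 1
            else:
                break  # reject this draft and everything after it
        if accepted == 0:
            ctx.append(target_next(ctx))  # fall back to standard decoding
        steps += 1
    return ctx[len(context):][:n_tokens], steps
```

Because every accepted token is exactly what autoregressive decoding would have produced, the output is unchanged; the speedup comes purely from needing fewer sequential passes.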
Summary based on 2 sources
Sources

9to5Mac • Aug 8, 2025
Apple researchers taught an LLM to predict tokens up to 5x faster
Ainvest • Aug 8, 2025
Apple's LLM Technology Boosts Prediction Speed