DeepSeek Unveils AI Model Revolutionizing Image-to-Text Conversion with Unprecedented Efficiency and Accuracy
October 21, 2025
DeepSeek has launched 'DeepSeek-OCR,' a multimodal AI model that uses visual perception to process large and complex documents efficiently, encoding document images as a compact set of vision tokens that the model then decodes into text.
The approach reduces computational load: decoding accuracy reaches up to 96.5% when the model uses only a tenth of the original token count (10x compression) and stays around 60% even at 20x compression.
DeepEncoder, the model's encoder component, plays a crucial role by turning high-resolution images into far fewer tokens: a 1024x1024 image, for example, drops from 4096 tokens to 256 (a 16x reduction), with the data passing through a CLIP-based component that links images and text.
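The token arithmetic in these figures can be checked with a short sketch. It assumes the 4096 figure comes from splitting the image into 16x16-pixel patches (the summary does not state the patch size), and it reads "compression" as the number of text tokens represented per vision token. The function names are illustrative and are not part of DeepSeek-OCR's code.

```python
def patch_tokens(width: int, height: int, patch: int = 16) -> int:
    """Tokens produced when an image is cut into patch x patch squares.

    patch=16 is an assumption chosen because it reproduces the 4096 figure.
    """
    return (width // patch) * (height // patch)


def compressed_tokens(tokens: int, factor: int = 16) -> int:
    """Tokens left after a fixed reduction factor (4096 -> 256 implies 16x)."""
    return tokens // factor


def text_tokens_covered(vision_tokens: int, compression_ratio: int) -> int:
    """Text tokens represented by the vision tokens at a given ratio."""
    return vision_tokens * compression_ratio


raw = patch_tokens(1024, 1024)      # 4096
vision = compressed_tokens(raw)     # 256
print(f"patch tokens: {raw}, vision tokens: {vision}")

# Accuracy figures are the ones reported in the summary, not measured here.
for ratio, reported in ((10, "up to 96.5%"), (20, "around 60%")):
    covered = text_tokens_covered(vision, ratio)
    print(f"{ratio}x: {vision} vision tokens ~ {covered} text tokens "
          f"(reported accuracy {reported})")
```

Under these assumptions, a single 256-token image would stand in for roughly 2,560 text tokens at 10x compression and 5,120 at 20x, which is where the reported accuracy drops.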
The technology is especially valuable for handling visual data such as tables, graphs, and diagrams in fields like finance, science, and medicine, and it makes long-context workloads easier to manage.
DeepSeek-OCR's high throughput can facilitate the creation of large training datasets for AI, with tools and models made publicly available for developers.
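To put the throughput claim in concrete terms, the sketch below converts the figure quoted in the Indian Express headline listed under Sources (200K pages of training data per day on a single GPU) into per-second rates. The corpus size in the usage example is hypothetical, and the calculation assumes the GPU runs continuously at the quoted rate.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def pages_per_second(pages_per_day: float) -> float:
    """Average pages processed each second at a given daily rate."""
    return pages_per_day / SECONDS_PER_DAY


def seconds_per_page(pages_per_day: float) -> float:
    """Average time spent on one page at a given daily rate."""
    return SECONDS_PER_DAY / pages_per_day


def days_for_corpus(total_pages: int, pages_per_day: int, gpus: int = 1) -> int:
    """Whole days needed for a corpus, assuming perfect scaling across GPUs."""
    return math.ceil(total_pages / (pages_per_day * gpus))


daily = 200_000  # figure from the headline, not independently verified
print(f"{pages_per_second(daily):.2f} pages/s")
print(f"{seconds_per_page(daily):.3f} s/page")

# Hypothetical archive of one million pages on a single GPU.
print(days_for_corpus(1_000_000, daily), "days")
```

At the quoted rate this works out to a little over two pages per second, so a hypothetical one-million-page archive would take about five days on one GPU.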
While promising, critics note that performance metrics are based on internal tests, and independent evaluations are needed to verify accuracy, stability, and reliability across different languages and complex layouts.
Industry expert Andrej Karpathy praised the model's 'vision token' system, suggesting it could eliminate traditional tokenizers and enable more flexible reasoning over complex data.
DeepSeek-OCR is open-source, accessible via a live demo, and can be deployed on personal or enterprise infrastructure, with code and weights available on GitHub, although it currently lacks an official API.
The South China Morning Post reported on the model on October 21, 2025, highlighting its potential to influence AI and large language model development.
Applications of DeepSeek-OCR span academic research, digital archiving, financial automation, and AI training data generation, helping digitize archives and create large, high-quality datasets.
The model can extract structured information from financial charts, recognize chemical formulas, and parse geometric figures, which points to broad STEM applications.
Its ability to drastically reduce token counts could influence future AI training datasets, lowering costs but also posing risks of errors during data compression.
Summary based on 22 sources
Sources

The Times Of India • Oct 21, 2025
Deepseek's new tool can extract text from photos of pages: What it means for users
The Indian Express • Oct 21, 2025
DeepSeek’s new AI model can generate 200K pages of training data daily on a single GPU
South China Morning Post • Oct 21, 2025
DeepSeek unveils AI model that uses visual perception to compress text input