The competition for document AI leadership is intensifying, and recent results from OmniDocBench v1.5 place PaddleOCR-VL at the forefront. As businesses increasingly require accurate, fast, and multilingual document processing, this model represents a significant advancement in handling complex documents across various sectors.
Leading the Benchmark Rankings
PaddleOCR-VL achieved an overall score of 92.6 on OmniDocBench v1.5, ahead of MinerU 2.5 (90.7), MonkeyOCR-pro-3B (88.9), Gemini-2.5 Pro (88.0), and GPT-4o (75.0). That is a 1.9-point lead over the nearest competitor and a 17.6-point lead over GPT-4o. The model outperformed both dedicated document-parsing systems and general-purpose vision-language models, which supports the case for its specialized approach.
In a recent announcement, PaddlePaddle highlighted that PaddleOCR-VL is a compact vision-language model delivering state-of-the-art accuracy across diverse tasks while maintaining industrial-grade efficiency. It supports 109 languages, handles complex layouts, and processes even small-scale text effectively.
Performance Across Document Tasks
The model leads or ranks near the top in each core document intelligence category. Its text score of 96.5 is the highest among the compared systems, including GPT-4o and Gemini-2.5 Pro. Its formula recognition score reaches 91.4, about three points above Gemini-2.5 Pro (88.3) and MinerU 2.5 (88.5). For table structure understanding it scores 89.9, among the best results for complex tabular data, and its reading order accuracy of 95.7 indicates that it reconstructs page layout reliably. Together, these results show that the system handles not only plain text but also mathematical notation, tables, and other structured document elements.
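The article does not say how the overall score is derived from the category scores. A minimal sketch, assuming the overall score is the unweighted mean of the text, formula, and table scores (with reading order reported separately), reproduces the 92.6 figure from the numbers quoted above. The function name and the aggregation rule are assumptions for illustration, not something the article confirms.

```python
# Category scores for PaddleOCR-VL as quoted in the article (0-100 scale).
text_score = 96.5
formula_score = 91.4
table_score = 89.9


def overall_score(text: float, formula: float, table: float) -> float:
    """Unweighted mean of three category scores.

    Assumption: this is how the benchmark aggregates its overall score;
    reading order accuracy is left out of the average.
    """
    return (text + formula + table) / 3


overall = overall_score(text_score, formula_score, table_score)
print(f"overall = {overall:.1f}")  # prints "overall = 92.6"
```

Under this assumption the three category scores are consistent with the headline result, which also means a one-point gain in any single category would move the overall score by about a third of a point.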
Technical Foundation
PaddleOCR-VL combines a NaViT-style dynamic-resolution vision encoder with the lightweight ERNIE-4.5-0.3B language model, achieving high performance at a compact total size of 0.9B parameters. The small footprint keeps inference fast without sacrificing accuracy, making the model practical for large-scale enterprise applications.
Industry Impact
Document parsing remains one of the most commercially valuable AI applications, powering use cases from financial services and legal automation to healthcare records and e-commerce data extraction. With its multilingual capabilities and flexible layout handling, PaddleOCR-VL offers a more targeted, efficient, and cost-effective alternative to general-purpose language models.