Unlimited OCR: One-Shot Long-Horizon Parsing System
Original: Unlimited OCR: One-Shot Long-Horizon Parsing
Why This Matters
Unlimited-OCR advances open-source OCR for lengthy documents, with practical applications in document digitization and information extraction.
Baidu releases Unlimited-OCR, an optical character recognition system for one-shot long-horizon parsing. The open-source model is available on GitHub and ModelScope, with a paper published on arXiv and inference support via Hugging Face Transformers.
Baidu has released Unlimited-OCR, an open-source optical character recognition system designed for long-horizon document parsing. The project is presented as an evolution of DeepSeek-OCR and introduces what the developers describe as "one-shot" parsing for extended documents.
The model is publicly available on GitHub in the baidu/Unlimited-OCR repository and through the ModelScope community platform. The research paper describing the technical approach was published on arXiv.
The system supports inference with the Hugging Face Transformers framework on NVIDIA GPUs. The listed environment is Python 3.12.3 with CUDA 12.9, torch 2.10.0, and transformers 4.57.1, plus Pillow and matplotlib for image processing and pymupdf for document handling. The repository includes inference code and model weights, so developers can integrate the OCR capabilities into their applications.
The project has garnered 2.9k stars and 181 forks on GitHub, indicating community interest in the technology.
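For readers setting up the listed environment, the version pins above can be checked before downloading any weights. The following is a minimal sketch using only the Python standard library; it is not part of the Unlimited-OCR repository. The helper names (check_environment, installed_version) are hypothetical, and the distribution names are assumed to match those on PyPI. It does not verify the Python or CUDA versions.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pins quoted in the article; distribution names are assumed to match PyPI.
REQUIRED = {
    "torch": "2.10.0",
    "transformers": "4.57.1",
}
# Dependencies the article mentions without a version.
UNPINNED = ("Pillow", "matplotlib", "pymupdf")


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_environment(required=REQUIRED, unpinned=UNPINNED, lookup=installed_version):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for name, wanted in required.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed (article lists {wanted})")
        # Ignore local build suffixes such as "+cu129" when comparing.
        elif found.split("+")[0] != wanted:
            problems.append(f"{name}: found {found}, article lists {wanted}")
    for name in unpinned:
        if lookup(name) is None:
            problems.append(f"{name}: not installed")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the listed versions"]:
        print(line)
```

The lookup parameter exists only so the check can be exercised without the packages installed; in normal use, calling check_environment() with no arguments inspects the current interpreter's environment.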