Mistral Releases OCR 4 with Advanced Document Intelligence

Original: Mistral OCR 4

Why This Matters

OCR 4 advances OCR capabilities for enterprise document processing and RAG pipelines, with multilingual support and structured output.

Mistral AI released OCR 4 on June 23, 2026. The optical character recognition model supports 170 languages and returns bounding boxes, block classification, and confidence scores alongside the extracted text. According to Mistral, independent annotators prefer OCR 4 over leading competitors, with win rates averaging 72%.

Mistral AI announced Mistral OCR 4, a document intelligence model designed for enterprise applications. The model extracts text from documents alongside bounding boxes, block classification (identifying titles, tables, equations, signatures, and more), and inline confidence scores. It supports 170 languages across 10 language groups and can run in a single container for fully self-hosted deployments.

According to Mistral, independent annotators prefer OCR 4 over every leading OCR and document-AI system tested, with win rates averaging 72%. The model also achieved the top overall score on OlmOCRBench, at 85.20.

OCR 4 serves as an ingestion component for enterprise search, retrieval-augmented generation (RAG), and domain-specific retrieval pipelines. It is integrated with Mistral Search Toolkit, an open-source composable search framework announced at the AI Now Summit, whose ingestion, retrieval, and evaluation workflow uses OCR 4's structured output. Mistral positions OCR 4 as a small, focused model rather than a broad document AI system, and provides guidance on when to use the model API versus Document AI solutions.
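To make the ingestion role concrete, here is a minimal sketch of how structured OCR output of this kind could be turned into RAG chunks. It does not call Mistral's API, and it does not reproduce the actual OCR 4 response schema: the `Block` type, its field names, the `build_chunks` helper, and the 0.8 confidence threshold are all assumptions made for illustration. Block labels start a new chunk at each title, and confidence scores route doubtful blocks to review instead of the index.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Block:
    """Hypothetical OCR block. Field names are assumptions, not Mistral's schema."""
    text: str
    kind: str                                 # e.g. "title", "paragraph", "signature"
    bbox: Tuple[float, float, float, float]   # assumed (x0, y0, x1, y1) page coordinates
    confidence: float                         # assumed range 0.0 to 1.0


def build_chunks(blocks: List[Block], min_confidence: float = 0.8):
    """Group blocks under the most recent title; set low-confidence blocks aside."""
    chunks: List[Dict] = []
    review: List[Block] = []
    current = None
    for block in blocks:
        if block.confidence < min_confidence:
            review.append(block)              # keep doubtful text out of the index
            continue
        if block.kind == "title":
            current = {"title": block.text, "body": []}
            chunks.append(current)
            continue
        if current is None:                   # body text that precedes any title
            current = {"title": "", "body": []}
            chunks.append(current)
        current["body"].append(block.text)
    return chunks, review


# Invented sample data standing in for one page of OCR output.
sample = [
    Block("Invoice", "title", (50, 40, 300, 70), 0.99),
    Block("Total due: 120 EUR", "paragraph", (50, 90, 400, 110), 0.95),
    Block("J. Dxe", "signature", (50, 500, 200, 540), 0.41),
    Block("Terms", "title", (50, 560, 300, 590), 0.97),
    Block("Net 30", "paragraph", (50, 600, 400, 620), 0.90),
]

chunks, review = build_chunks(sample)
for chunk in chunks:
    print(chunk["title"], "->", " ".join(chunk["body"]))
print("needs review:", [b.text for b in review])
```

In this sketch the bounding boxes are carried along unused; a real pipeline might store them with each chunk so that a search result can be highlighted on the source page.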

Source

mistral.ai