IBM has released Granite-Docling-258M, an open-source (Apache 2.0) vision-language model designed for end-to-end document conversion. The model targets layout-faithful extraction of tables, code, equations, lists, captions, and reading order, emitting a structured, machine-readable representation rather than lossy Markdown. Hugging Face hosts both a live demo and an MLX build for Apple Silicon.
What makes Granite-Docling different?
Granite-Docling replaces SmolDocling-256M as a ready-to-use product. IBM swapped the earlier backbone for an upgraded Granite 165M language model and a SigLIP2 (base, patch16-512) vision encoder, while retaining the Idefics3-style pixel-shuffle projector as the connector. At 258M parameters, the model shows consistent accuracy improvements across layout analysis (including full-page OCR), code, equations, and tables, and IBM has addressed the instability failure modes observed in the preview model.
The architecture and training pipeline
- Backbone: Idefics3-derived stack with SigLIP2 vision encoder → pixel-shuffle connector → Granite 165M LLM.
- Trainer: nanoVLM, a lightweight, all-PyTorch VLM training toolkit.
- Representation: Outputs DocTags, IBM's markup for unambiguous document structure (elements plus coordinates plus relationships), which downstream tools convert into Markdown/HTML/JSON.
- Compute: Trained on IBM's Blue Vela H100 cluster.
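The pixel-shuffle connector trades visual sequence length for channel depth before patch embeddings reach the LLM. Its effect on token count can be sketched with simple arithmetic; note the shuffle factor used below is a hypothetical value for illustration, not a figure published in this article:

```python
def visual_token_count(image_px: int, patch_px: int, shuffle_factor: int) -> int:
    """Number of visual tokens the LLM sees after pixel-shuffle.

    An (image_px x image_px) image is split into (image_px // patch_px)^2
    patches; pixel-shuffle then merges shuffle_factor^2 neighboring patch
    embeddings into a single token (sequence length shrinks, channel
    dimension grows by the same factor).
    """
    patches_per_side = image_px // patch_px
    total_patches = patches_per_side ** 2
    return total_patches // (shuffle_factor ** 2)

# SigLIP2 base, patch16-512: a 512 px image yields 32 x 32 = 1024 patches.
print(visual_token_count(512, 16, 1))  # 1024 patches before shuffling
# With a hypothetical shuffle factor of 4, the LLM sees only 64 tokens.
print(visual_token_count(512, 16, 4))  # 64
```

This is why small connectors of this kind keep prompt lengths manageable even at relatively high input resolutions.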
Quantified improvements (Granite-Docling-258M vs. SmolDocling-256M preview)
Evaluation with docling-eval, LMMS-Eval, and task-specific datasets:
- Layout: F1 0.86 vs. 0.85.
- Full-page OCR: F1 0.84 vs. 0.80; lower edit distance.
- Code recognition: F1 0.988 vs. 0.915; edit distance 0.013 vs. 0.114.
- Equation recognition: F1 0.968 vs. 0.947.
- Table recognition (FinTabNet, 150 dpi): TEDS-structure 0.97 vs. 0.82; TEDS with content 0.96 vs. 0.76.
- Other benchmarks: MMStar 0.30 vs. 0.17; OCRBench 500 vs. 338.
- Stability: avoids infinite generation loops more effectively (a production-oriented fix).
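The edit-distance figures above are typically normalized Levenshtein distances, where 0 means an exact match. A minimal stdlib sketch of that metric, assuming character-level comparison:

```python
def normalized_edit_distance(pred: str, ref: str) -> float:
    """Levenshtein distance divided by the longer string's length."""
    if not pred and not ref:
        return 0.0
    # Classic dynamic-programming Levenshtein, computed one row at a time.
    prev = list(range(len(ref) + 1))
    for i, pc in enumerate(pred, start=1):
        curr = [i]
        for j, rc in enumerate(ref, start=1):
            cost = 0 if pc == rc else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / max(len(pred), len(ref))

print(normalized_edit_distance("def f(x):", "def f(x):"))           # 0.0
print(round(normalized_edit_distance("def f(x)", "def f(y)"), 3))   # 0.125
```

On this scale, dropping from 0.114 to 0.013 on code recognition means transcribed code is very close to character-exact.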
Multilingual support
Granite-Docling adds experimental support for Japanese, Arabic, and Chinese. IBM classifies this support as early-stage; English remains the primary target language.
DocTags: how it changes document AI
Traditional OCR-to-Markdown pipelines can lose structural information, which complicates retrieval-augmented generation (RAG). Granite-Docling instead emits DocTags, a compact, LLM-friendly structural grammar that Docling converts into Markdown/HTML/JSON. DocTags preserves table topology, inline and floating math, code blocks, captions, and reading order.
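To make the idea concrete, here is a toy sketch of turning a DocTags-like string into Markdown. The tag names and location format below are simplified stand-ins, not the actual DocTags grammar; in practice Docling's own tooling performs this conversion:

```python
import re

# Hypothetical, simplified DocTags-style input: each element carries a
# role tag, a location token, and text content. Real DocTags is richer
# (tables, nesting, element relationships) and is parsed by Docling.
doctags = (
    "<section_header><loc_10_10_500_40>Quarterly Results</section_header>"
    "<text><loc_10_50_500_90>Revenue grew 12% year over year.</text>"
    "<code><loc_10_100_500_140>print('hello')</code>"
)

def doctags_to_markdown(src: str) -> str:
    """Map each tagged element to a Markdown construct, dropping coordinates."""
    fence = "`" * 3
    pattern = re.compile(r"<(\w+)><loc_[\d_]+>(.*?)</\1>", re.S)
    renderers = {
        "section_header": lambda t: f"## {t}",
        "text": lambda t: t,
        "code": lambda t: f"{fence}\n{t}\n{fence}",
    }
    parts = [renderers[tag](body) for tag, body in pattern.findall(src)]
    return "\n\n".join(parts)

print(doctags_to_markdown(doctags))
```

The key point is that coordinates and element roles survive until the final rendering step, so a consumer can choose Markdown, HTML, or JSON without re-running OCR.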
Inference and integration
- Docling integration (recommended): the Docling CLI/SDK pulls Granite-Docling automatically and converts Office documents, PDFs, and images into multiple output formats. IBM positions the model as a component of Docling pipelines rather than as a generic VLM.
- Runtimes: works with Transformers, vLLM, ONNX, and MLX; a dedicated MLX build is optimized for Apple Silicon. An interactive Hugging Face Space demo is available (ZeroGPU).
- License: Apache-2.0.
Why Granite-Docling?
Small VLMs that preserve structure reduce inference complexity and cost, which makes them well suited to enterprise document AI. Granite-Docling replaces several single-purpose models (layout, OCR, tables, code, equations) with one component that emits a richer intermediate representation, improving downstream retrieval. The measured gains in TEDS for tables, F1 for code and equations, and reduced instability make it a practical upgrade from SmolDocling for production workflows.
Summary
Granite-Docling-258M is a meaningful step forward in structure-preserving document AI. By combining IBM's Granite backbone, the SigLIP2 vision encoder, and the nanoVLM training framework, it delivers enterprise-ready performance across tables, equations, code, and multilingual text, all while remaining lightweight and open source under Apache 2.0. With measurable improvements over SmolDocling and tight integration into Docling pipelines, Granite-Docling provides a solid foundation for RAG workflows and document conversion where accuracy and reliability are critical.
Asif Razzaq is the CEO of Marktechpost Media Inc., a media platform known for its in-depth coverage of machine learning and deep learning news, reaching over 2 million views per month.

