IBM has released Granite-Docling-258M, an open-source (Apache 2.0) vision-language model designed for end-to-end document conversion. The model targets layout-faithful extraction of tables, code, equations, lists, captions, and reading order, emitting a structured, machine-readable representation rather than lossy Markdown. Hugging Face hosts both a live demo and an MLX build for Apple Silicon.
What makes Granite-Docling different?
Granite-Docling replaces SmolDocling-256M as a ready-to-use product. IBM swapped the earlier backbone for an upgraded Granite 165M language model and a SigLIP2 (base, patch16-512) vision encoder, while retaining the Idefics3-style pixel-shuffle projector as the connector. At 258M parameters, the model shows consistent accuracy improvements across layout analysis (including full-page OCR), code, equations, and tables, and IBM has addressed the instability failure modes observed in the preview model.
The architecture and training pipeline
- Backbone: Idefics3-derived stack with SigLIP2 vision encoder → pixel-shuffle connector → Granite 165M LLM.
- Trainer: nanoVLM, a lightweight, all-PyTorch VLM training toolkit.
- Representation: Outputs DocTags, IBM's markup for unambiguous document structure (elements plus coordinates plus relationships), which downstream tools convert into Markdown/HTML/JSON.
- Compute: Trained on IBM's Blue Vela H100 cluster.
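The pixel-shuffle connector trades visual sequence length for channel depth before patch embeddings reach the LLM. Its effect on token count can be sketched with simple arithmetic; note the shuffle factor used below is a hypothetical value for illustration, not a figure published in this article:

```python
def visual_token_count(image_px: int, patch_px: int, shuffle_factor: int) -> int:
    """Number of visual tokens the LLM sees after pixel-shuffle.

    An (image_px x image_px) image is split into (image_px // patch_px)^2
    patches; pixel-shuffle then merges shuffle_factor^2 neighboring patch
    embeddings into a single token (sequence length shrinks, channel
    dimension grows by the same factor).
    """
    patches_per_side = image_px // patch_px
    total_patches = patches_per_side ** 2
    return total_patches // (shuffle_factor ** 2)

# SigLIP2 base, patch16-512: a 512 px image yields 32 x 32 = 1024 patches.
print(visual_token_count(512, 16, 1))  # 1024 patches before shuffling
# With a hypothetical shuffle factor of 4, the LLM sees only 64 tokens.
print(visual_token_count(512, 16, 4))  # 64
```

This is why small connectors of this kind keep prompt lengths manageable even at relatively high input resolutions.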
Quantified improvements (Granite-Docling-258M vs. SmolDocling-256M preview)
Evaluation with docling-eval, LMMS-Eval, and task-specific datasets:
- Layout: F1 0.86 vs. 0.85.
- Full-page OCR: F1 0.84 vs. 0.80; lower edit distance.
- Code recognition: F1 0.988 vs. 0.915; edit distance 0.013 vs. 0.114.
- Equation recognition: F1 0.968 vs. 0.947.
- Table recognition (FinTabNet, 150 dpi): TEDS-structure 0.97 vs. 0.82; TEDS with content 0.96 vs. 0.76.
- Other benchmarks: MMStar 0.30 vs. 0.17; OCRBench 500 vs. 338.
- Stability: avoids infinite generation loops more effectively (a production-oriented fix).
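The edit-distance figures above are typically normalized Levenshtein distances, where 0 means an exact match. A minimal stdlib sketch of that metric, assuming character-level comparison:

```python
def normalized_edit_distance(pred: str, ref: str) -> float:
    """Levenshtein distance divided by the longer string's length."""
    if not pred and not ref:
        return 0.0
    # Classic dynamic-programming Levenshtein, computed one row at a time.
    prev = list(range(len(ref) + 1))
    for i, pc in enumerate(pred, start=1):
        curr = [i]
        for j, rc in enumerate(ref, start=1):
            cost = 0 if pc == rc else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / max(len(pred), len(ref))

print(normalized_edit_distance("def f(x):", "def f(x):"))           # 0.0
print(round(normalized_edit_distance("def f(x)", "def f(y)"), 3))   # 0.125
```

On this scale, dropping from 0.114 to 0.013 on code recognition means transcribed code is very close to character-exact.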
Multilingual support
Granite-Docling adds experimental support for Japanese, Arabic, and Chinese. IBM classifies this support as early-stage; English remains the primary target language.
DocTags: how it changes document AI
Traditional OCR-to-Markdown pipelines can lose structural information, which complicates retrieval-augmented generation (RAG). Granite-Docling instead emits DocTags, a compact, LLM-friendly structural grammar that Docling converts into Markdown/HTML/JSON. DocTags preserves table topology, inline and floating math, code blocks, captions, and reading order.
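To make the idea concrete, here is a toy sketch of turning a DocTags-like string into Markdown. The tag names and location format below are simplified stand-ins, not the actual DocTags grammar; in practice Docling's own tooling performs this conversion:

```python
import re

# Hypothetical, simplified DocTags-style input: each element carries a
# role tag, a location token, and text content. Real DocTags is richer
# (tables, nesting, element relationships) and is parsed by Docling.
doctags = (
    "<section_header><loc_10_10_500_40>Quarterly Results</section_header>"
    "<text><loc_10_50_500_90>Revenue grew 12% year over year.</text>"
    "<code><loc_10_100_500_140>print('hello')</code>"
)

def doctags_to_markdown(src: str) -> str:
    """Map each tagged element to a Markdown construct, dropping coordinates."""
    fence = "`" * 3
    pattern = re.compile(r"<(\w+)><loc_[\d_]+>(.*?)</\1>", re.S)
    renderers = {
        "section_header": lambda t: f"## {t}",
        "text": lambda t: t,
        "code": lambda t: f"{fence}\n{t}\n{fence}",
    }
    parts = [renderers[tag](body) for tag, body in pattern.findall(src)]
    return "\n\n".join(parts)

print(doctags_to_markdown(doctags))
```

The key point is that coordinates and element roles survive until the final rendering step, so a consumer can choose Markdown, HTML, or JSON without re-running OCR.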
Inference and integration
- Docling integration (recommended): the Docling CLI/SDK pulls Granite-Docling automatically and converts Office documents, PDFs, and images into multiple output formats. IBM positions the model as a component of Docling pipelines rather than as a generic VLM.
- Runtimes: works with Transformers, vLLM, ONNX, and MLX; a dedicated MLX build is optimized for Apple Silicon. An interactive Hugging Face Space demo is available (ZeroGPU).
- License: Apache-2.0.
Why Granite-Docling?
Small VLMs that preserve structure reduce inference complexity and cost, which makes them well suited to enterprise document AI. Granite-Docling replaces several single-purpose models (layout, OCR, tables, code, equations) with one component that emits a richer intermediate representation, improving downstream retrieval. The measured gains in TEDS for tables, F1 for code and equations, and reduced instability make it a practical upgrade from SmolDocling for production workflows.
Summary
Granite-Docling-258M is a meaningful step forward in structure-preserving document AI. By combining IBM's Granite backbone, the SigLIP2 vision encoder, and the nanoVLM training framework, it delivers enterprise-ready performance across tables, equations, code, and multilingual text, all while remaining lightweight and open source under Apache 2.0. With measurable improvements over SmolDocling and tight integration into Docling pipelines, Granite-Docling provides a solid foundation for RAG workflows and document conversion where accuracy and reliability are critical.
Asif Razzaq is the CEO of Marktechpost Media Inc., a media platform known for its in-depth coverage of machine learning and deep learning news, reaching over 2 million views per month.

