AI-trends.today

IBM AI Releases Granite-Docling-258M: An Open-Source, Enterprise-Ready Document AI Model

Tech · By Gavin Wallace · 18/09/2025 · 4 min read

IBM has released Granite-Docling-258M, an open-source (Apache 2.0) vision-language model designed for end-to-end document conversion. The model targets layout-faithful extraction of tables, code, equations, lists, captions, and reading order, emitting a structured, machine-readable representation rather than lossy Markdown. Hugging Face hosts both a live demo and an MLX build for Apple Silicon.

What's different from SmolDocling?

Granite-Docling replaces SmolDocling-256M as the ready-to-use model. IBM swapped the earlier backbone for an upgraded Granite 165M language model and the SigLIP2 (base, patch16-512) vision encoder, while retaining Idefics3's pixel-shuffle projector connector. At 258M parameters, the model shows consistent accuracy gains across layout analysis, full-page OCR, code, equations, and tables, and IBM has addressed the instability failure modes observed in the preview model.

Architecture and training pipeline

  • Backbone: Idefics3-derived stack with SigLIP2 vision encoder → pixel-shuffle connector → Granite 165M LLM.
  • Training stack: nanoVLM, a lightweight, all-PyTorch VLM training toolkit.
  • Representation: outputs DocTags, IBM's markup for unambiguous document structure (elements plus coordinates plus relationships) that downstream tools can convert into Markdown/HTML/JSON.
  • Compute: trained on IBM's Blue Vela H100 cluster.
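To make the DocTags idea concrete, here is a minimal, illustrative sketch of converting structural markup into Markdown. The tag set below (`title`, `text`, `code`) is a simplified assumption for illustration, not the actual DocTags grammar, which additionally encodes coordinates and element relationships.

```python
import re

# Illustrative only: a simplified stand-in for DocTags-style markup.
# Real DocTags output also carries location tokens and relationships.
SAMPLE = (
    "<doc>"
    "<title>Quarterly Report</title>"
    "<text>Revenue grew 12% year over year.</text>"
    "<code>print('hello')</code>"
    "</doc>"
)

def doctags_to_markdown(doctags: str) -> str:
    """Render the simplified markup above as Markdown."""
    renderers = {
        "title": lambda body: f"# {body}",
        "text": lambda body: body,
        "code": lambda body: f"```\n{body}\n```",
    }
    blocks = []
    for tag, body in re.findall(r"<(title|text|code)>(.*?)</\1>", doctags, re.S):
        blocks.append(renderers[tag](body))
    return "\n\n".join(blocks)

print(doctags_to_markdown(SAMPLE))
```

The point of the intermediate representation is exactly this separation: the model emits structure once, and downstream renderers decide how to serialize it.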

Quantified improvements (Granite-Docling-258M vs. SmolDocling-256M preview)

Evaluation with docling-eval, LMMS-Eval, and task-specific datasets:

  • Layout: F1 0.86 vs. 0.85.
  • Full-page OCR: F1 0.84 vs. 0.80, with lower edit distance.
  • Code recognition: F1 0.988 vs. 0.915; edit distance 0.013 vs. 0.114.
  • Equation recognition: F1 0.968 vs. 0.947.
  • Table recognition (FinTabNet, 150 dpi): TEDS-structure 0.97 vs. 0.96; TEDS with content 0.82 vs. 0.76.
  • Other benchmarks: MMStar 0.30 vs. 0.17; OCRBench 500 vs. 338.
  • Stability: avoids infinite generation loops more effectively (a production-oriented fix).
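The edit-distance figures above compare predicted text against the reference transcription. A common convention, and an assumption here about how docling-eval normalizes it, is Levenshtein distance divided by the length of the longer string, so 0.0 means a perfect match:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length."""
    if not pred and not ref:
        return 0.0
    return edit_distance(pred, ref) / max(len(pred), len(ref))
```

Under this convention, Granite-Docling's code-recognition score of 0.013 means its transcriptions differ from the reference by roughly one character in eighty.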

Multilingual support

Granite-Docling adds experimental support for Japanese, Arabic, and Chinese. IBM classifies this as early-stage work; English remains the primary target language.

DocTags: how it changes document AI

Traditional OCR-to-Markdown pipelines lose structural information, which complicates retrieval-augmented generation (RAG). Granite-Docling instead emits DocTags, a compact, LLM-friendly structural grammar, which Docling converts into Markdown/HTML/JSON. DocTags preserves table topology, inline and floating math, code blocks, captions, and reading order.
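Preserved structure pays off at indexing time: a RAG chunker can treat tables as atomic units instead of splitting them mid-row. The sketch below is a minimal illustration; the element shape (dicts with `type` and `content`) is an assumed post-conversion format, not Docling's actual API.

```python
def chunk_elements(elements, max_chars=200):
    """Group parsed document elements into chunks for a RAG index.

    `elements` is an assumed shape: dicts with a `type` ("text" or
    "table") and a `content` string. Tables are never split, so their
    topology survives into the retrieval index.
    """
    chunks, buf = [], ""
    for el in elements:
        if el["type"] == "table":
            if buf:
                chunks.append(buf)
                buf = ""
            chunks.append(el["content"])  # table stays one atomic chunk
        elif buf and len(buf) + len(el["content"]) > max_chars:
            chunks.append(buf)
            buf = el["content"]
        else:
            buf = (buf + "\n" + el["content"]).strip()
    if buf:
        chunks.append(buf)
    return chunks
```

With flat Markdown, the same chunker has no reliable way to tell where a table begins and ends, which is precisely the failure mode structure-preserving output avoids.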

Integration and inference

  • Docling integration (recommended): the Docling CLI/SDK pulls Granite-Docling automatically and converts Office documents, PDFs, and images into multiple output formats. IBM positions the model as a component of Docling pipelines rather than as a generic VLM.
  • Runtimes: works with Transformers, vLLM, ONNX, and MLX; a dedicated MLX build is optimized for Apple Silicon. An interactive Hugging Face Space demo is available (ZeroGPU).
  • License: Apache-2.0.

Why Granite-Docling?

Small VLMs that preserve document structure reduce inference complexity and cost, which makes them well suited to enterprise document AI. Granite-Docling replaces several single-purpose models (layout, OCR, table, code, equation) with one component that emits a richer intermediate representation, improving downstream retrieval. The measured gains in TEDS for tables, F1 for code and equations, and reduced instability make it a practical upgrade from SmolDocling for production workflows.

Summary

Granite-Docling-258M is a notable step forward in structure-preserving document AI. By combining IBM's Granite backbone, the SigLIP2 vision encoder, and the nanoVLM training framework, it delivers enterprise-ready performance across tables, equations, code, and multilingual text, while remaining lightweight and open source under Apache 2.0. With measurable improvements over SmolDocling and seamless integration within Docling pipelines, it provides a solid foundation for RAG workflows and document conversion where accuracy and reliability are critical.




