[tutorial] Docling + ibm-granite/granite-docling-258M
Note: This page is an AI-generated (gpt-5-mini-2025-08-07) translation from Traditional Chinese and may contain minor inaccuracies.
Introduction
This tutorial demonstrates how to use the Docling document conversion tool with IBM's newly released VLM, ibm-granite/granite-docling-258M,
to convert PDFs, images, and other files into structured Markdown or HTML for easier downstream use with LLMs.
Introducing the document conversion tool + Vision Language Model
Docling
- Input: PDF, DOCX, PPTX, XLSX, HTML, WAV, MP3, VTT, images (PNG, TIFF, JPEG, …)
- Output: a unified DoclingDocument
- Purpose: provide a unified and simple tool to convert various documents into a structured (LLM-Ready) format
Vision Language Model (VLM)
This test uses IBM's newly released VLM, ibm-granite/granite-docling-258M. Compared to the older ds4sd/SmolDocling-256M-preview, which has nearly the same number of parameters, the model size was reduced from 3.55 GB to 530 MB.
[Figure: VLM used by the older Docling]
[Figure: VLM used by the new Docling]
Practical steps
- Docling + ibm-granite/granite-docling-258M
uv add docling
Supported VLM models
- Old: vlm_model_specs.SMOLDOCLING_MLX
- New: vlm_model_specs.GRANITEDOCLING_MLX
Module overview
- VlmPipelineOptions: configuration for using a VLM for document conversion (e.g., model name, saving images, etc.)
- DocumentConverter: configures processing methods for different input formats
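A minimal sketch of how these two modules fit together, following the VLM pipeline usage shown in Docling's documentation. The helper names `select_vlm_spec_name` and `build_converter` are hypothetical, and the fallback spec name `GRANITEDOCLING_TRANSFORMERS` is an assumption about your installed Docling version; only the MLX spec names come from this tutorial.

```python
import platform
from typing import Optional


def select_vlm_spec_name(system: Optional[str] = None,
                         machine: Optional[str] = None) -> str:
    """Return the name of a vlm_model_specs attribute suited to this machine."""
    system = system or platform.system()
    machine = machine or platform.machine()
    # MLX acceleration targets Apple Silicon, i.e. macOS on arm64.
    if system == "Darwin" and machine == "arm64":
        return "GRANITEDOCLING_MLX"
    # Assumption: a Transformers-backed spec with this name exists
    # in the installed Docling version.
    return "GRANITEDOCLING_TRANSFORMERS"


def build_converter(spec_name: Optional[str] = None):
    """Wire a VLM spec into a DocumentConverter for PDF input."""
    # Imported lazily so this module still loads where Docling is not installed.
    from docling.datamodel import vlm_model_specs
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import VlmPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption
    from docling.pipeline.vlm_pipeline import VlmPipeline

    spec = getattr(vlm_model_specs, spec_name or select_vlm_spec_name())
    # VlmPipelineOptions carries the model choice.
    pipeline_options = VlmPipelineOptions(vlm_options=spec)
    # DocumentConverter maps each input format to a pipeline and its options.
    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(
                pipeline_cls=VlmPipeline,
                pipeline_options=pipeline_options,
            )
        }
    )
```

With Docling installed, `build_converter().convert("example.pdf").document` should yield a DoclingDocument; `example.pdf` is a placeholder path.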
Image handling modes
class ImageRefMode(str, Enum):
- PLACEHOLDER: use <!-- IMAGE --> to represent images (does not save the image)
- EMBEDDED: convert the image to base64 and store it directly in the converted document
- REFERENCED: reference the image via a URI
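To make the three modes concrete, here is a self-contained sketch of what each one could produce for a single image in Markdown. The enum below is a stand-in rather than Docling's own class (its string values are assumptions), and `render_image` is a hypothetical helper, not part of Docling.

```python
import base64
from enum import Enum


class ImageRefMode(str, Enum):
    # Stand-in for Docling's enum; these string values are assumptions.
    PLACEHOLDER = "placeholder"
    EMBEDDED = "embedded"
    REFERENCED = "referenced"


def render_image(mode: ImageRefMode, png_bytes: bytes, uri: str) -> str:
    """Return the Markdown a converter might emit for one image in each mode."""
    if mode is ImageRefMode.PLACEHOLDER:
        # No image data is kept, only a marker where the image was.
        return "<!-- IMAGE -->"
    if mode is ImageRefMode.EMBEDDED:
        # The image travels inside the document as a base64 data URI.
        payload = base64.b64encode(png_bytes).decode("ascii")
        return f"![Image](data:image/png;base64,{payload})"
    # REFERENCED: the document points at an image stored elsewhere.
    return f"![Image]({uri})"
```

The trade-off is mostly about size and portability: embedded output is a single self-contained file but grows with every image, while referenced output stays small but depends on the image files staying next to it.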
Full code
from pathlib import Path
Key takeaways
- Install docling and use the latest model ibm-granite/granite-docling-258M for recognition
- On Macs with M-series chips you can additionally install mlx-vlm and choose models that support acceleration
- Two output modes: export directly to Markdown or HTML
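The two output modes can be sketched as one small helper that writes both files from a converted document. `export_both` is a hypothetical name; it assumes the document exposes `save_as_markdown` and `save_as_html` methods accepting an `image_mode` keyword, as DoclingDocument does, and it leaves the choice of image mode to the caller.

```python
from pathlib import Path
from typing import Any, Tuple


def export_both(doc: Any, out_dir: Path, stem: str,
                image_mode: Any) -> Tuple[Path, Path]:
    """Write a converted document as both Markdown and HTML."""
    out_dir.mkdir(parents=True, exist_ok=True)
    md_path = out_dir / f"{stem}.md"
    html_path = out_dir / f"{stem}.html"
    # Both exports receive the same image handling mode.
    doc.save_as_markdown(md_path, image_mode=image_mode)
    doc.save_as_html(html_path, image_mode=image_mode)
    return md_path, html_path
```

With a real conversion result this would be called as something like `export_both(result.document, Path("output"), "report", ImageRefMode.REFERENCED)`, where `output` and `report` are placeholder names.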
References
[tutorial] Docling + ibm-granite/granite-docling-258M
https://hsiangjenli.github.io/blog/tutorial-docling-ibm-granite-granite-docling-258m.en/