[tutorial] Docling + ibm-granite/granite-docling-258M

Note: This page is an AI-generated (gpt-5-mini-2025-08-07) translation from Traditional Chinese and may contain minor inaccuracies.

📌 Introduction

This tutorial shows how to use Docling, a document conversion tool, with IBM's newly released VLM ibm-granite/granite-docling-258M to convert PDFs, images, and other files into structured Markdown or HTML for easier downstream use with LLMs.

🚀 Introducing the document conversion tool + Vision Language Model

Docling

  • Input: PDF, DOCX, PPTX, XLSX, HTML, WAV, MP3, VTT, images (PNG, TIFF, JPEG, …)
  • Output: a unified DoclingDocument
  • Purpose: provide a unified and simple tool to convert various documents into a structured (LLM-Ready) format
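
To make the input/output idea concrete, here is a minimal sketch of a single-file conversion with Docling's default pipeline (no VLM yet). The helper names looks_supported and to_markdown are made up for illustration, and the suffix set is just the input list above plus a few common aliases that I am assuming; some formats (audio, for example) may need extra dependencies.

```python
from pathlib import Path

# Suffixes taken from the input list above. ".jpg", ".tif" and ".htm" are
# common aliases added here as an assumption, not an official Docling list.
SUPPORTED_SUFFIXES = {
    ".pdf", ".docx", ".pptx", ".xlsx", ".html", ".htm",
    ".wav", ".mp3", ".vtt", ".png", ".tiff", ".tif", ".jpeg", ".jpg",
}


def looks_supported(path: str) -> bool:
    """Cheap pre-check on the file suffix (hypothetical helper)."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


def to_markdown(source: str) -> str:
    """Convert one file with Docling's default pipeline and return Markdown."""
    # Imported lazily so the pre-check above works without docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    # result.document is the unified DoclingDocument
    return result.document.export_to_markdown()


print(looks_supported("report.PDF"))
print(looks_supported("notes.txt"))
```

Calling to_markdown("report.pdf") would run the actual conversion; the pre-check only looks at the file name.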

Vision Language Model (VLM)

This test uses IBM's newly released VLM, ibm-granite/granite-docling-258M. Compared with the older ds4sd/SmolDocling-256M-preview, it has nearly the same number of parameters, but the model size dropped from 3.55 GB to 530 MB.
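
As a quick back-of-the-envelope check of those numbers, the sketch below works out the size reduction. It assumes decimal units (1 GB = 1000 MB) and takes the parameter counts from the model names (256M and 258M), which is my assumption rather than a measured value.

```python
# Sizes quoted above; decimal units assumed (1 GB = 1000 MB).
old_size_mb = 3.55 * 1000
new_size_mb = 530

# Parameter counts read off the model names (assumption).
old_params = 256e6
new_params = 258e6

shrink_factor = old_size_mb / new_size_mb
saved_fraction = 1 - new_size_mb / old_size_mb
old_bytes_per_param = old_size_mb * 1e6 / old_params
new_bytes_per_param = new_size_mb * 1e6 / new_params

print(f"about {shrink_factor:.1f}x smaller ({saved_fraction:.0%} saved)")
print(f"old: {old_bytes_per_param:.1f} bytes/param, new: {new_bytes_per_param:.1f} bytes/param")
```

Roughly 2 bytes per parameter is what 16-bit weights would take, though nothing here confirms how either checkpoint is actually stored.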

[Image: VLM used by the older Docling]

[Image: VLM used by the new Docling]

Practical steps

  • Docling + ibm-granite/granite-docling-258M
uv add docling
uv add mlx-vlm  # I use a Mac; only needed for acceleration on M-series chips

Supported VLM models

  • Old: vlm_model_specs.SMOLDOCLING_MLX
  • New: vlm_model_specs.GRANITEDOCLING_MLX
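
If the same script has to run on both a Mac and another machine, one option is to pick the spec by platform. This is only a sketch: pick_spec_name and load_spec are hypothetical helpers, and the *_TRANSFORMERS names are not mentioned in this post; I am assuming they are the non-MLX counterparts in vlm_model_specs, so check your installed docling version.

```python
import platform


def pick_spec_name(system: str, machine: str, family: str = "GRANITEDOCLING") -> str:
    """Return the vlm_model_specs attribute name for this machine (hypothetical helper)."""
    # Apple Silicon reports system "Darwin" and machine "arm64".
    on_apple_silicon = system == "Darwin" and machine == "arm64"
    backend = "MLX" if on_apple_silicon else "TRANSFORMERS"
    return f"{family}_{backend}"


def load_spec(name: str):
    """Look the spec up by name; needs docling (and mlx-vlm for the MLX variants)."""
    # Imported lazily so pick_spec_name works without docling installed.
    from docling.datamodel import vlm_model_specs

    return getattr(vlm_model_specs, name)


spec_name = pick_spec_name(platform.system(), platform.machine())
print(spec_name)
```

Passing family="SMOLDOCLING" selects the older model the same way.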

Module overview

  • VlmPipelineOptions: configuration for using a VLM for document conversion (e.g., model name, saving images, etc.)
  • DocumentConverter: configures processing methods for different input formats

Image handling modes

from enum import Enum


class ImageRefMode(str, Enum):
    """ImageRefMode."""

    PLACEHOLDER = "placeholder"  # just a place-holder
    EMBEDDED = "embedded"  # embed the image as a base64
    REFERENCED = "referenced"  # reference the image via uri

  • PLACEHOLDER: use a placeholder comment such as <!-- image --> to represent images (the image itself is not saved)
  • EMBEDDED: convert the image to base64 and store it directly in the converted document
  • REFERENCED: reference the image via a URI
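
To show what the three modes mean for the Markdown output, here is a standard-library-only illustration. It mimics the idea and is not Docling's implementation: render_image is a hypothetical function, and the PNG mime type and "Image" alt text are assumptions.

```python
import base64


def render_image(mode: str, image_bytes: bytes, uri: str) -> str:
    """Return the Markdown that would stand in for one image (illustration only)."""
    if mode == "placeholder":
        # No image data is kept, only a marker where the image was.
        return "<!-- image -->"
    if mode == "embedded":
        # The image bytes travel inside the document as a base64 data URI.
        payload = base64.b64encode(image_bytes).decode("ascii")
        return f"![Image](data:image/png;base64,{payload})"
    if mode == "referenced":
        # The document only points at a file or URL stored elsewhere.
        return f"![Image]({uri})"
    raise ValueError(f"unknown mode: {mode}")


fake_png = b"\x89PNG"  # first four bytes of a PNG file, enough for a demo
for mode in ("placeholder", "embedded", "referenced"):
    print(mode, "->", render_image(mode, fake_png, "artifacts/image_000.png"))
```

The trade-off: embedded output is a single self-contained file but grows quickly with many images, while referenced output stays small but needs the image files shipped alongside it.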

Full code

from pathlib import Path

from docling.datamodel import vlm_model_specs
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import VlmPipelineOptions
from docling.document_converter import (
    DocumentConverter,
    ImageFormatOption,
    PdfFormatOption,
)
from docling.pipeline.vlm_pipeline import VlmPipeline
from docling_core.types.doc import ImageRefMode

# New Granite Docling model in MLX format (needs mlx-vlm on an M-series Mac)
model = vlm_model_specs.GRANITEDOCLING_MLX

vlm_pipeline_options = VlmPipelineOptions(
    vlm_options=model,
    generate_page_images=True,
    generate_picture_images=False,
    images_scale=2.0,
    do_picture_description=True,
)

# Route both image and PDF inputs through the VLM pipeline
converter = DocumentConverter(
    format_options={
        InputFormat.IMAGE: ImageFormatOption(
            pipeline_cls=VlmPipeline,
            pipeline_options=vlm_pipeline_options,
        ),
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=VlmPipeline,
            pipeline_options=vlm_pipeline_options,
        ),
    }
)

# Regular, non-hidden files only: skips subdirectories and files such as .DS_Store
input_dir = Path("data/raw")
input_file_paths = sorted(
    p for p in input_dir.iterdir() if p.is_file() and not p.name.startswith(".")
)

# One output folder per model; "/" in the repo id is replaced with "_"
output_dir = Path(f"data/output/{model.repo_id.replace('/', '_')}")
output_dir.mkdir(parents=True, exist_ok=True)

for input_file_path in input_file_paths:
    output_file_path = output_dir / f"{input_file_path.stem}.md"

    doc = converter.convert(source=input_file_path, raises_on_error=True).document

    # Markdown export with images embedded as base64
    with open(output_file_path, "w", encoding="utf-8") as f:
        f.write(doc.export_to_markdown(image_mode=ImageRefMode.EMBEDDED))

    # HTML export in split page view, same base name with an .html suffix
    with open(output_file_path.with_suffix(".html"), "w", encoding="utf-8") as f:
        f.write(
            doc.export_to_html(image_mode=ImageRefMode.EMBEDDED, split_page_view=True)
        )
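
Because the loop above uses raises_on_error=True, the first file that fails stops the whole run. If you would rather keep going and review failures at the end, a small wrapper like the following is one way to do it. This is a sketch, not part of Docling: convert_batch and demo_convert are hypothetical names, and demo_convert is only a stand-in so the example runs without any model.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple


def convert_batch(
    paths: Iterable[Path],
    convert_one: Callable[[Path], str],
) -> Tuple[Dict[Path, str], List[Tuple[Path, str]]]:
    """Run convert_one on every path and collect results and failures separately."""
    converted: Dict[Path, str] = {}
    failed: List[Tuple[Path, str]] = []
    for path in paths:
        try:
            converted[path] = convert_one(path)
        except Exception as exc:  # keep going so one bad file does not stop the batch
            failed.append((path, f"{type(exc).__name__}: {exc}"))
    return converted, failed


def demo_convert(path: Path) -> str:
    """Stand-in for a real conversion; rejects anything that is not a PDF."""
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"unsupported file: {path.name}")
    return f"# {path.stem}"


converted, failed = convert_batch([Path("a.pdf"), Path(".DS_Store")], demo_convert)
print(converted)
print(failed)
```

With the converter defined above, convert_one could be something like lambda p: converter.convert(source=p, raises_on_error=True).document.export_to_markdown(image_mode=ImageRefMode.EMBEDDED).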

🔍 Key takeaways

  • Install docling and use the latest model ibm-granite/granite-docling-258M for recognition
  • On Macs with M-series chips you can additionally install mlx-vlm and choose models that support acceleration
  • Two output modes: export directly to Markdown or HTML

🔗 References

Author

Hsiang-Jen Li

Posted on

2025-09-23

Updated on

2025-09-23

Licensed under