{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\"Open" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "%pip install -q docling[vlm] ipython" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from docling.datamodel.base_models import InputFormat\n", "from docling.datamodel.pipeline_options import (\n", " PdfPipelineOptions,\n", " granite_picture_description,\n", " smolvlm_picture_description,\n", ")\n", "from docling.document_converter import DocumentConverter, PdfFormatOption" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "9d3bb7b3b4fd4640af40289dd7bf50d7", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Loading checkpoint shards: 0%| | 0/2 [00:00Picture #/pictures/0

Caption

Figure 1: Sketch of Docling's pipelines and usage model. Both PDF pipeline and simple pipeline build up a DoclingDocument representation, which can be further enriched. Downstream applications can utilize Docling's API to inspect, export, or chunk the document for various purposes.

Annotations

In this image we can see a poster with some text and images.
\n", "

Picture #/pictures/1


Caption

Figure 2: Dataset categories and sample counts for documents and pages.

Annotations

In this image we can see a pie chart. In the pie chart we can see the categories and the number of documents in each category.
\n", "

Picture #/pictures/2


Caption

Figure 3: Distribution of conversion times for all documents, ordered by number of pages in a document, on all system configurations. Every dot represents one document. Log/log scale is used to even the spacing, since both number of pages and conversion times have long-tail distributions.

Annotations

In this image we can see a graph. On the x-axis we can see the number of pages. On the y-axis we can see the seconds.
\n", "

Picture #/pictures/3


Caption

Figure 4: Contributions of PDF backend and AI models to the conversion time of a page (in seconds per page). Lower is better. Left: Ranges of time contributions for each model to pages it was applied on (i.e., OCR was applied only on pages with bitmaps, table structure was applied only on pages with tables). Right: Average time contribution to a page in the benchmark dataset (factoring in zero-time contribution for OCR and table structure models on pages without bitmaps or tables) .

Annotations

In this image we can see a bar chart and a line chart. In the bar chart we can see the values of Pdf Parse, OCR, Layout, Table Structure, Page Total and Page. In the line chart we can see the values of Pdf Parse, OCR, Layout, Table Structure, Page Total and Page.
\n", "

Picture #/pictures/4


Caption

Figure 5: Conversion time in seconds per page on our dataset in three scenarios, across all assets and system configurations. Lower bars are better. The configuration includes OCR and table structure recognition ( fast table option on Docling and MinerU, hi res in unstructured, as shown in table 1).

Annotations

In this image we can see a bar chart. In the chart we can see the CPU, Max, GPU, and sec/page.
\n" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [
"from docling_core.types.doc.document import PictureDescriptionData\n",
"from IPython import display\n",
"\n",
"html_buffer = []\n",
"# Display the first 5 pictures with their captions and generated annotations.\n",
"for pic in doc.pictures[:5]:\n",
"    html_item = (\n",
"        f\"<h3>Picture {pic.self_ref}</h3>\"\n",
"        f'<img src=\"{pic.image.uri!s}\" /><br />'\n",
"        f\"<h4>Caption</h4>{pic.caption_text(doc=doc)}<br />\"\n",
"    )\n",
"    for annotation in pic.annotations:\n",
"        # Keep only the picture descriptions produced by the VLM.\n",
"        if not isinstance(annotation, PictureDescriptionData):\n",
"            continue\n",
"        html_item += f\"<h4>Annotations</h4>{annotation.text}<br />\\n\"\n",
"    html_buffer.append(html_item)\n",
"display.HTML(\"<hr />\".join(html_buffer))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "docling-aMWN2FRM-py3.12", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 2 }