feat: Introduce support for GPU Accelerators (#593)

* Upgraded Layout Postprocessing, sending old code back to ERZ

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Implement hierachical cluster layout processing

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Pass nested cluster processing through full pipeline

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Pass nested clusters through GLM as payload

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Move to_docling_document from ds-glm to this repo

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Clean up imports again

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* feat(Accelerator): Introduce options to control the num_threads and device from API, envvars, CLI.
- Introduce the AcceleratorOptions, AcceleratorDevice and use them to set the device where the models run.
- Introduce the accelerator_utils with function to decide the device and resolve the AUTO setting.
- Refactor the way how the docling-ibm-models are called to match the new init signature of models.
- Translate the accelerator options to the specific inputs for third-party models.
- Extend the docling CLI with parameters to set the num_threads and device.
- Add new unit tests.
- Write new example how to use the accelerator options.

* fix: Improve the pydantic objects in the pipeline_options and imports.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: TableStructureModel: Refactor the artifacts path to use the new structure for fast/accurate model

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* Updated test ground-truth

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Updated test ground-truth (again), bugfix for empty layout

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: Do proper check to set the device in EasyOCR, RapidOCR.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* Rollback changes from main

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update test gt

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Remove unused debug settings

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Review fixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Nail the accelerator defaults for MPS

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
This commit is contained in:
Nikos Livathinos
2024-12-13 17:45:22 +01:00
committed by GitHub
parent 365a1e7b98
commit 19fad9261c
38 changed files with 384 additions and 93 deletions

View File

@@ -1 +1 @@
{"_name": "", "type": "pdf-document", "description": {"title": null, "abstract": null, "authors": null, "affiliations": null, "subjects": null, "keywords": null, "publication_date": null, "languages": null, "license": null, "publishers": null, "url_refs": null, "references": null, "publication": null, "reference_count": null, "citation_count": null, "citation_date": null, "advanced": null, "analytics": null, "logs": [], "collection": null, "acquisition": null}, "file-info": {"filename": "ocr_test.pdf", "filename-prov": null, "document-hash": "73f23122e9edbdb0a115b448e03c8064a0ea8bdc21d02917ce220cf032454f31", "#-pages": 1, "collection-name": null, "description": null, "page-hashes": [{"hash": "8c5c5b766c1bdb92242142ca37260089b02380f9c57729703350f646cdf4771e", "model": "default", "page": 1}]}, "main-text": [{"prov": [{"bbox": [70.90211486816406, 689.2166748046875, 504.87200927734375, 765.0995483398438], "page": 1, "span": [0, 94], "__ref_s3_data": null}], "text": "Docling bundles PDF document conversion to JSON and Markdown in an easy self contained package", "type": "paragraph", "name": "Text", "font": null}], "figures": [], "tables": [], "bitmaps": null, "equations": [], "footnotes": [], "page-dimensions": [{"height": 841.9216918945312, "page": 1, "width": 595.201171875}], "page-footers": [], "page-headers": [], "_s3_data": null, "identifiers": null}
{"_name": "", "type": "pdf-document", "description": {"title": null, "abstract": null, "authors": null, "affiliations": null, "subjects": null, "keywords": null, "publication_date": null, "languages": null, "license": null, "publishers": null, "url_refs": null, "references": null, "publication": null, "reference_count": null, "citation_count": null, "citation_date": null, "advanced": null, "analytics": null, "logs": [], "collection": null, "acquisition": null}, "file-info": {"filename": "ocr_test.pdf", "filename-prov": null, "document-hash": "80f38f5b87a84870681556176a9622186fd200dd32c5557be9e0c0af05b8bc61", "#-pages": 1, "collection-name": null, "description": null, "page-hashes": [{"hash": "14d896dc8bcb7ee7c08c0347eb6be8dcb92a3782501992f1ea14d2e58077d4e3", "model": "default", "page": 1}]}, "main-text": [{"prov": [{"bbox": [69.6796646118164, 689.012451171875, 504.87200927734375, 765.0995483398438], "page": 1, "span": [0, 94], "__ref_s3_data": null}], "text": "Docling bundles PDF document conversion to JSON and Markdown in an easy self contained package", "type": "paragraph", "payload": null, "name": "Text", "font": null}], "figures": [], "tables": [], "bitmaps": null, "equations": [], "footnotes": [], "page-dimensions": [{"height": 841.9216918945312, "page": 1, "width": 595.201171875}], "page-footers": [], "page-headers": [], "_s3_data": null, "identifiers": null}

View File

@@ -1 +1 @@
[{"page_no": 0, "size": {"width": 595.201171875, "height": 841.9216918945312}, "cells": [{"id": 0, "text": "Docling bundles PDF document conversion to", "bbox": {"l": 73.34702132031646, "t": 76.99999977896755, "r": 503.64955224479564, "b": 97.99999977896755, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "JSON and Markdown in an easy self contained", "bbox": {"l": 70.90211866351085, "t": 102.66666671251767, "r": 504.8720079864275, "b": 124.83139551297336, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "package", "bbox": {"l": 73.10852522817731, "t": 130.0013615789096, "r": 153.04479435252625, "b": 152.70503335218427, "coord_origin": "TOPLEFT"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "text", "bbox": {"l": 70.90211866351085, "t": 76.82212829589844, "r": 504.8720079864275, "b": 152.70503335218427, "coord_origin": "TOPLEFT"}, "confidence": 0.9715733528137207, "cells": [{"id": 0, "text": "Docling bundles PDF document conversion to", "bbox": {"l": 73.34702132031646, "t": 76.99999977896755, "r": 503.64955224479564, "b": 97.99999977896755, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "JSON and Markdown in an easy self contained", "bbox": {"l": 70.90211866351085, "t": 102.66666671251767, "r": 504.8720079864275, "b": 124.83139551297336, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "package", "bbox": {"l": 73.10852522817731, "t": 130.0013615789096, "r": 153.04479435252625, "b": 152.70503335218427, "coord_origin": "TOPLEFT"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "text", "id": 0, "page_no": 0, "cluster": {"id": 0, "label": "text", "bbox": {"l": 70.90211866351085, "t": 76.82212829589844, "r": 504.8720079864275, "b": 152.70503335218427, "coord_origin": "TOPLEFT"}, "confidence": 0.9715733528137207, "cells": [{"id": 0, "text": "Docling bundles PDF document conversion to", "bbox": {"l": 73.34702132031646, "t": 76.99999977896755, "r": 503.64955224479564, "b": 97.99999977896755, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "JSON and Markdown in an easy self contained", "bbox": {"l": 70.90211866351085, "t": 102.66666671251767, "r": 504.8720079864275, "b": 124.83139551297336, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "package", "bbox": {"l": 73.10852522817731, "t": 130.0013615789096, "r": 153.04479435252625, "b": 152.70503335218427, "coord_origin": "TOPLEFT"}}]}, "text": "Docling bundles PDF document conversion to JSON and Markdown in an easy self contained package"}], "body": [{"label": "text", "id": 0, "page_no": 0, "cluster": {"id": 0, "label": "text", "bbox": {"l": 70.90211866351085, "t": 76.82212829589844, "r": 504.8720079864275, "b": 152.70503335218427, "coord_origin": "TOPLEFT"}, "confidence": 0.9715733528137207, "cells": [{"id": 0, "text": "Docling bundles PDF document conversion to", "bbox": {"l": 73.34702132031646, "t": 76.99999977896755, "r": 503.64955224479564, "b": 97.99999977896755, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "JSON and Markdown in an easy self contained", "bbox": {"l": 70.90211866351085, "t": 102.66666671251767, "r": 504.8720079864275, "b": 124.83139551297336, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "package", "bbox": {"l": 73.10852522817731, "t": 130.0013615789096, "r": 153.04479435252625, "b": 152.70503335218427, "coord_origin": "TOPLEFT"}}]}, "text": "Docling bundles PDF document conversion to JSON and Markdown in an easy self contained package"}], "headers": []}}]
[{"page_no": 0, "size": {"width": 595.201171875, "height": 841.9216918945312}, "cells": [{"id": 0, "text": "Docling bundles PDF document conversion to", "bbox": {"l": 73.34702132031646, "t": 76.99999977896755, "r": 503.64955224479564, "b": 97.99999977896755, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "JSON and Markdown in an easy self contained", "bbox": {"l": 69.6796630536824, "t": 104.00000011573798, "r": 504.8720051760782, "b": 124.83139494707746, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "package", "bbox": {"l": 71.84193505100733, "t": 129.79712523204603, "r": 153.088934155825, "b": 152.90926970226087, "coord_origin": "TOPLEFT"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "text", "bbox": {"l": 69.6796630536824, "t": 76.82213592529297, "r": 504.8720051760782, "b": 152.90926970226087, "coord_origin": "TOPLEFT"}, "confidence": 0.9715732336044312, "cells": [{"id": 0, "text": "Docling bundles PDF document conversion to", "bbox": {"l": 73.34702132031646, "t": 76.99999977896755, "r": 503.64955224479564, "b": 97.99999977896755, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "JSON and Markdown in an easy self contained", "bbox": {"l": 69.6796630536824, "t": 104.00000011573798, "r": 504.8720051760782, "b": 124.83139494707746, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "package", "bbox": {"l": 71.84193505100733, "t": 129.79712523204603, "r": 153.088934155825, "b": 152.90926970226087, "coord_origin": "TOPLEFT"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "text", "id": 0, "page_no": 0, "cluster": {"id": 0, "label": "text", "bbox": {"l": 69.6796630536824, "t": 76.82213592529297, "r": 504.8720051760782, "b": 152.90926970226087, "coord_origin": "TOPLEFT"}, "confidence": 0.9715732336044312, "cells": [{"id": 0, "text": "Docling bundles PDF document conversion to", "bbox": {"l": 73.34702132031646, "t": 76.99999977896755, "r": 503.64955224479564, "b": 97.99999977896755, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "JSON and Markdown in an easy self contained", "bbox": {"l": 69.6796630536824, "t": 104.00000011573798, "r": 504.8720051760782, "b": 124.83139494707746, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "package", "bbox": {"l": 71.84193505100733, "t": 129.79712523204603, "r": 153.088934155825, "b": 152.90926970226087, "coord_origin": "TOPLEFT"}}]}, "text": "Docling bundles PDF document conversion to JSON and Markdown in an easy self contained package"}], "body": [{"label": "text", "id": 0, "page_no": 0, "cluster": {"id": 0, "label": "text", "bbox": {"l": 69.6796630536824, "t": 76.82213592529297, "r": 504.8720051760782, "b": 152.90926970226087, "coord_origin": "TOPLEFT"}, "confidence": 0.9715732336044312, "cells": [{"id": 0, "text": "Docling bundles PDF document conversion to", "bbox": {"l": 73.34702132031646, "t": 76.99999977896755, "r": 503.64955224479564, "b": 97.99999977896755, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "JSON and Markdown in an easy self contained", "bbox": {"l": 69.6796630536824, "t": 104.00000011573798, "r": 504.8720051760782, "b": 124.83139494707746, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "package", "bbox": {"l": 71.84193505100733, "t": 129.79712523204603, "r": 153.088934155825, "b": 152.90926970226087, "coord_origin": "TOPLEFT"}}]}, "text": "Docling bundles PDF document conversion to JSON and Markdown in an easy self contained package"}], "headers": []}}]

View File

@@ -1 +1 @@
{"schema_name": "DoclingDocument", "version": "1.0.0", "name": "ocr_test", "origin": {"mimetype": "application/pdf", "binary_hash": 14853448746796404529, "filename": "ocr_test.pdf", "uri": null}, "furniture": {"self_ref": "#/furniture", "parent": null, "children": [], "name": "_root_", "label": "unspecified"}, "body": {"self_ref": "#/body", "parent": null, "children": [{"cref": "#/texts/0"}], "name": "_root_", "label": "unspecified"}, "groups": [], "texts": [{"self_ref": "#/texts/0", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 70.90211486816406, "t": 765.0995483398438, "r": 504.87200927734375, "b": 689.2166748046875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 94]}], "orig": "Docling bundles PDF document conversion to JSON and Markdown in an easy self contained package", "text": "Docling bundles PDF document conversion to JSON and Markdown in an easy self contained package"}], "pictures": [], "tables": [], "key_value_items": [], "pages": {"1": {"size": {"width": 595.201171875, "height": 841.9216918945312}, "image": null, "page_no": 1}}}
{"schema_name": "DoclingDocument", "version": "1.0.0", "name": "ocr_test", "origin": {"mimetype": "application/pdf", "binary_hash": 14853448746796404529, "filename": "ocr_test.pdf", "uri": null}, "furniture": {"self_ref": "#/furniture", "parent": null, "children": [], "name": "_root_", "label": "unspecified"}, "body": {"self_ref": "#/body", "parent": null, "children": [{"cref": "#/texts/0"}], "name": "_root_", "label": "unspecified"}, "groups": [], "texts": [{"self_ref": "#/texts/0", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 69.6796646118164, "t": 765.0995483398438, "r": 504.87200927734375, "b": 689.012451171875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 94]}], "orig": "Docling bundles PDF document conversion to JSON and Markdown in an easy self contained package", "text": "Docling bundles PDF document conversion to JSON and Markdown in an easy self contained package"}], "pictures": [], "tables": [], "key_value_items": [], "pages": {"1": {"size": {"width": 595.201171875, "height": 841.9216918945312}, "image": null, "page_no": 1}}}

View File

@@ -1 +1 @@
[{"page_no": 0, "size": {"width": 595.201171875, "height": 841.9216918945312}, "cells": [{"id": 0, "text": "Docling bundles PDF document conversion to", "bbox": {"l": 73.34702132031646, "t": 76.99999977896755, "r": 503.64955224479564, "b": 97.99999977896755, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "JSON and Markdown in an easy self contained", "bbox": {"l": 70.90211866351085, "t": 102.66666671251767, "r": 504.8720079864275, "b": 124.83139551297336, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "package", "bbox": {"l": 73.10852522817731, "t": 130.0013615789096, "r": 153.04479435252625, "b": 152.70503335218427, "coord_origin": "TOPLEFT"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "text", "bbox": {"l": 70.90211866351085, "t": 76.82212829589844, "r": 504.8720079864275, "b": 152.70503335218427, "coord_origin": "TOPLEFT"}, "confidence": 0.9715733528137207, "cells": [{"id": 0, "text": "Docling bundles PDF document conversion to", "bbox": {"l": 73.34702132031646, "t": 76.99999977896755, "r": 503.64955224479564, "b": 97.99999977896755, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "JSON and Markdown in an easy self contained", "bbox": {"l": 70.90211866351085, "t": 102.66666671251767, "r": 504.8720079864275, "b": 124.83139551297336, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "package", "bbox": {"l": 73.10852522817731, "t": 130.0013615789096, "r": 153.04479435252625, "b": 152.70503335218427, "coord_origin": "TOPLEFT"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "text", "id": 0, "page_no": 0, "cluster": {"id": 0, "label": "text", "bbox": {"l": 70.90211866351085, "t": 76.82212829589844, "r": 504.8720079864275, "b": 152.70503335218427, "coord_origin": "TOPLEFT"}, "confidence": 0.9715733528137207, "cells": [{"id": 0, "text": "Docling bundles PDF document conversion to", "bbox": {"l": 73.34702132031646, "t": 76.99999977896755, "r": 503.64955224479564, "b": 97.99999977896755, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "JSON and Markdown in an easy self contained", "bbox": {"l": 70.90211866351085, "t": 102.66666671251767, "r": 504.8720079864275, "b": 124.83139551297336, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "package", "bbox": {"l": 73.10852522817731, "t": 130.0013615789096, "r": 153.04479435252625, "b": 152.70503335218427, "coord_origin": "TOPLEFT"}}]}, "text": "Docling bundles PDF document conversion to JSON and Markdown in an easy self contained package"}], "body": [{"label": "text", "id": 0, "page_no": 0, "cluster": {"id": 0, "label": "text", "bbox": {"l": 70.90211866351085, "t": 76.82212829589844, "r": 504.8720079864275, "b": 152.70503335218427, "coord_origin": "TOPLEFT"}, "confidence": 0.9715733528137207, "cells": [{"id": 0, "text": "Docling bundles PDF document conversion to", "bbox": {"l": 73.34702132031646, "t": 76.99999977896755, "r": 503.64955224479564, "b": 97.99999977896755, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "JSON and Markdown in an easy self contained", "bbox": {"l": 70.90211866351085, "t": 102.66666671251767, "r": 504.8720079864275, "b": 124.83139551297336, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "package", "bbox": {"l": 73.10852522817731, "t": 130.0013615789096, "r": 153.04479435252625, "b": 152.70503335218427, "coord_origin": "TOPLEFT"}}]}, "text": "Docling bundles PDF document conversion to JSON and Markdown in an easy self contained package"}], "headers": []}}]
[{"page_no": 0, "size": {"width": 595.201171875, "height": 841.9216918945312}, "cells": [{"id": 0, "text": "Docling bundles PDF document conversion to", "bbox": {"l": 73.34702132031646, "t": 76.99999977896755, "r": 503.64955224479564, "b": 97.99999977896755, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "JSON and Markdown in an easy self contained", "bbox": {"l": 69.6796630536824, "t": 104.00000011573798, "r": 504.8720051760782, "b": 124.83139494707746, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "package", "bbox": {"l": 71.84193505100733, "t": 129.79712523204603, "r": 153.088934155825, "b": 152.90926970226087, "coord_origin": "TOPLEFT"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "text", "bbox": {"l": 69.6796630536824, "t": 76.82213592529297, "r": 504.8720051760782, "b": 152.90926970226087, "coord_origin": "TOPLEFT"}, "confidence": 0.9715732336044312, "cells": [{"id": 0, "text": "Docling bundles PDF document conversion to", "bbox": {"l": 73.34702132031646, "t": 76.99999977896755, "r": 503.64955224479564, "b": 97.99999977896755, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "JSON and Markdown in an easy self contained", "bbox": {"l": 69.6796630536824, "t": 104.00000011573798, "r": 504.8720051760782, "b": 124.83139494707746, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "package", "bbox": {"l": 71.84193505100733, "t": 129.79712523204603, "r": 153.088934155825, "b": 152.90926970226087, "coord_origin": "TOPLEFT"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "text", "id": 0, "page_no": 0, "cluster": {"id": 0, "label": "text", "bbox": {"l": 69.6796630536824, "t": 76.82213592529297, "r": 504.8720051760782, "b": 152.90926970226087, "coord_origin": "TOPLEFT"}, "confidence": 0.9715732336044312, "cells": [{"id": 0, "text": "Docling bundles PDF document conversion to", "bbox": {"l": 73.34702132031646, "t": 76.99999977896755, "r": 503.64955224479564, "b": 97.99999977896755, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "JSON and Markdown in an easy self contained", "bbox": {"l": 69.6796630536824, "t": 104.00000011573798, "r": 504.8720051760782, "b": 124.83139494707746, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "package", "bbox": {"l": 71.84193505100733, "t": 129.79712523204603, "r": 153.088934155825, "b": 152.90926970226087, "coord_origin": "TOPLEFT"}}]}, "text": "Docling bundles PDF document conversion to JSON and Markdown in an easy self contained package"}], "body": [{"label": "text", "id": 0, "page_no": 0, "cluster": {"id": 0, "label": "text", "bbox": {"l": 69.6796630536824, "t": 76.82213592529297, "r": 504.8720051760782, "b": 152.90926970226087, "coord_origin": "TOPLEFT"}, "confidence": 0.9715732336044312, "cells": [{"id": 0, "text": "Docling bundles PDF document conversion to", "bbox": {"l": 73.34702132031646, "t": 76.99999977896755, "r": 503.64955224479564, "b": 97.99999977896755, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "JSON and Markdown in an easy self contained", "bbox": {"l": 69.6796630536824, "t": 104.00000011573798, "r": 504.8720051760782, "b": 124.83139494707746, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "package", "bbox": {"l": 71.84193505100733, "t": 129.79712523204603, "r": 153.088934155825, "b": 152.90926970226087, "coord_origin": "TOPLEFT"}}]}, "text": "Docling bundles PDF document conversion to JSON and Markdown in an easy self contained package"}], "headers": []}}]