docling/docs/supported_formats.md
Panos Vagenas 68272b987a docs: document Docling JSON parsing
Also:
- factored out and expanded supported formats
- reorged feature list

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2025-01-27 17:23:39 +01:00

Docling can parse various document formats into a unified representation (the Docling Document), which it can then export to several output formats. See Architecture for more details.

Below you can find a listing of all supported input and output formats.

Supported input formats

| Format | Description |
|--------|-------------|
| PDF | |
| DOCX, XLSX, PPTX | Default formats in MS Office 2007+, based on Office Open XML |
| Markdown | |
| AsciiDoc | |
| HTML, XHTML | |
| PNG, JPEG, TIFF, BMP | Image formats |

Schema-specific support:

| Format | Description |
|--------|-------------|
| USPTO XML | XML format followed by USPTO patents |
| PMC XML | XML format followed by PubMed Central® articles |
| Docling JSON | JSON-serialized Docling Document |

Supported output formats

| Format | Description |
|--------|-------------|
| HTML | Docling supports both image embedding and referencing |
| Markdown | |
| JSON | Lossless serialization of Docling Document |
| Doctags | |
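
As a rough illustration of how the listings above fit together, the sketch below checks a file's suffix against the general input formats and then converts it to Markdown. The `INPUT_FORMATS` mapping and both helper names are hypothetical, and the suffix spellings (such as `.adoc` or `.tif`) are assumptions made for this example; Docling performs its own format detection, which may differ. The schema-specific formats (USPTO XML, PMC XML, Docling JSON) are left out because a suffix alone does not identify them. The conversion helper assumes the `docling` package is installed and uses its `DocumentConverter` class.

```python
from pathlib import Path

# Hypothetical suffix-to-format lookup built from the input table above.
# The suffix spellings are assumptions for illustration only; Docling's
# own format detection may differ.
INPUT_FORMATS = {
    ".pdf": "PDF",
    ".docx": "DOCX",
    ".xlsx": "XLSX",
    ".pptx": "PPTX",
    ".md": "Markdown",
    ".adoc": "AsciiDoc",
    ".html": "HTML",
    ".xhtml": "XHTML",
    ".png": "PNG",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".tif": "TIFF",
    ".tiff": "TIFF",
    ".bmp": "BMP",
}


def guess_input_format(path):
    """Return the table label for a path's suffix, or None if unlisted."""
    return INPUT_FORMATS.get(Path(path).suffix.lower())


def convert_to_markdown(source):
    """Parse `source` with Docling and export the result as Markdown."""
    # Imported lazily so the lookup helper works without docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    return result.document.export_to_markdown()
```

A caller might first run `guess_input_format("report.pdf")` to decide whether a file is worth passing on, and then call `convert_to_markdown("report.pdf")`. The other output formats in the table are produced by different export methods on the same converted document.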