Docling parses documents and exports them to the desired format with ease and speed.
## Features
* 🗂️ Parsing of [multiple document formats][supported_formats] incl. PDF, DOCX, XLSX, HTML, images, & more
* 📑 Advanced PDF understanding including page layout, reading order & table structure
* 🧬 Unified, expressive [DoclingDocument][docling_document] representation format
* ↪️ Various [export formats][supported_formats] and options, including Markdown, HTML, and lossless JSON
* 🔒 Local execution capabilities for sensitive data and air-gapped environments
* 🤖 Plug-and-play [integrations][integrations] incl. LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
* 🔍 OCR support for scanned PDFs and images
* 💻 Simple and convenient CLI
### Coming soon
* ♾️ Equation & code extraction
* 📝 Metadata extraction, including title, authors, references & language
## Get started
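As a rough starting point, the sketch below converts a single document and writes the result as Markdown. It assumes Docling is installed (`pip install docling`) and uses the `DocumentConverter` class with `export_to_markdown()`. The helper names `convert_to_markdown` and `output_path_for`, as well as the input file `report.pdf`, are hypothetical and exist only for illustration.

```python
from pathlib import Path


def output_path_for(source: str, out_dir: str = ".") -> Path:
    """Hypothetical helper: derive the Markdown output path for a local file."""
    return Path(out_dir) / (Path(source).stem + ".md")


def convert_to_markdown(source: str) -> str:
    """Convert one document (local path or URL) to a Markdown string."""
    # Imported lazily so this module can be loaded without Docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)
    return result.document.export_to_markdown()


if __name__ == "__main__":
    source = "report.pdf"  # assumed input file, replace with your own
    target = output_path_for(source, out_dir="out")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(convert_to_markdown(source), encoding="utf-8")
    print(f"Wrote {target}")
```

The conversion runs locally, so the first call may take a while if models still need to be downloaded.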
## IBM ❤️ Open Source AI
Docling has been brought to you by IBM.
[supported_formats]: ./supported_formats.md
[docling_document]: ./concepts/docling_document.md
[integrations]: ./integrations/index.md