Docling parses documents and exports them to the desired format with ease and speed.
Features
- 🗂️ Parsing of multiple document formats incl. PDF, DOCX, XLSX, HTML, images, & more
- 📑 Advanced PDF understanding including page layout, reading order & table structure
- 🧬 Unified, expressive DoclingDocument representation format
- ↪️ Various export formats and options, including Markdown, HTML, and lossless JSON
- 🔒 Local execution capabilities for sensitive data and air-gapped environments
- 🤖 Plug-and-play integrations incl. LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
- 🔍 OCR support for scanned PDFs and images
- 💻 Simple and convenient CLI
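
The parse-and-export flow in the list above can be sketched in a few lines of Python. This is a minimal sketch, assuming the `docling` package is installed (for example with `pip install docling`) and that a local input file exists. The helper names `output_path_for` and `export_markdown` and the `out` directory are hypothetical, chosen only for illustration. The Docling calls used are `DocumentConverter`, `convert`, and `export_to_markdown`.

```python
from pathlib import Path


def output_path_for(source: str, out_dir: str = "out") -> Path:
    """Hypothetical helper: map an input file path to a Markdown output path."""
    return Path(out_dir) / (Path(source).stem + ".md")


def export_markdown(source: str, out_dir: str = "out") -> Path:
    """Convert one document with Docling and write the result as Markdown."""
    # Imported lazily so this module still loads where docling is not installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)  # parse the input document

    out_path = output_path_for(source, out_dir)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # result.document is the unified DoclingDocument representation
    out_path.write_text(result.document.export_to_markdown(), encoding="utf-8")
    return out_path
```

Calling `export_markdown("reports/q1.pdf")` would write `out/q1.md`, assuming that input file exists. The first conversion may take longer than later ones if models have to be fetched.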
Coming soon
- ♾️ Equation & code extraction
- 📝 Metadata extraction, including title, authors, references & language
Get started
- Concepts: Learn Docling fundamentals
- Examples: Try out recipes for various use cases, including conversion, RAG, and more
- Integrations: Check out integrations with popular frameworks and tools
- Reference: See more API details
IBM ❤️ Open Source AI
Docling has been brought to you by IBM.