Mirror of https://github.com/DS4SD/docling.git, synced 2025-07-27 20:44:16 +00:00
Docling
Docling bundles PDF document conversion to JSON and Markdown in an easy, self-contained package.
Features
- ⚡ Converts PDF, Word, PowerPoint or HTML documents to JSON or Markdown, stable and lightning-fast
- 📑 Understands detailed page layout and reading order, and recovers table structures
- 📝 Extracts metadata from the document, such as title, authors, references and language
- 🔍 Includes OCR support for scanned PDFs or image formats
- 🤖 Integrates easily with LLM app / RAG frameworks like LlamaIndex 🦙 & LangChain 🦜🔗
- 💻 Provides a simple and convenient CLI
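
As a rough illustration of the conversion described above, the sketch below turns one document into Markdown or JSON from Python. It assumes the `DocumentConverter` class with its `convert`, `export_to_markdown` and `export_to_dict` calls as found in recent Docling releases; older releases may expose different names, so check the installed version. The `output_path` and `convert` helpers are hypothetical names introduced only for this example.

```python
import json
import sys
from pathlib import Path


def output_path(source: str, fmt: str, out_dir: str = ".") -> Path:
    """Hypothetical helper: derive the output file name from the source name."""
    suffix = {"markdown": ".md", "json": ".json"}[fmt]
    return Path(out_dir) / (Path(source).stem + suffix)


def convert(source: str, fmt: str = "markdown") -> str:
    """Convert one document and return it as a Markdown or JSON string."""
    # Imported lazily so output_path() can be used without docling installed.
    # Assumption: this import path and these method names match the
    # installed Docling release.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    if fmt == "markdown":
        return result.document.export_to_markdown()
    return json.dumps(result.document.export_to_dict())


if __name__ == "__main__" and len(sys.argv) > 1:
    src = sys.argv[1]
    out = output_path(src, "markdown")
    out.write_text(convert(src), encoding="utf-8")
    print(f"wrote {out}")
```

Saved under a file name of your choice and run with a document path as its only argument, the script writes a Markdown file with the same base name into the current directory.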