docling/docs/index.md
Christoph Auer c123e5a812 Update docling-core pinnings
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-16 11:43:23 +02:00

Docling

arXiv | PyPI version | Python | Poetry | Code style: black | Imports: isort | Pydantic v2 | pre-commit | License: MIT

Docling bundles PDF document conversion to JSON and Markdown in an easy, self-contained package.

Features

  • Converts PDF, Word, PowerPoint or HTML documents to JSON or Markdown, with stable and fast conversion
  • 📑 Understands detailed page layout and reading order, and recovers table structures
  • 📝 Extracts metadata from the document, such as title, authors, references and language
  • 🔍 Includes OCR support for scanned PDFs or image formats
  • 🤖 Integrates easily with LLM app / RAG frameworks like LlamaIndex 🦙 & LangChain 🦜🔗
  • 💻 Provides a simple and convenient CLI
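
The features above can be sketched in a few lines of Python. This is a minimal sketch, not the project's reference usage: it assumes the `docling` package is installed and exposes a `DocumentConverter` class with `convert()` and `export_to_markdown()`, which may differ between versions. The helper `split_markdown_by_heading` is hypothetical and not part of Docling; it only illustrates one way the exported Markdown could be prepared for a RAG pipeline.

```python
import re
from typing import Dict, List


def convert_to_markdown(source: str) -> str:
    """Convert a document (local path or URL) to Markdown with Docling.

    Assumption: the installed docling version provides DocumentConverter
    with convert() and a result.document.export_to_markdown() method.
    """
    # Imported lazily so the chunking helper below works without docling.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)
    return result.document.export_to_markdown()


def split_markdown_by_heading(markdown: str) -> List[Dict[str, str]]:
    """Split Markdown into one chunk per heading (hypothetical helper).

    Text before the first heading becomes a chunk with an empty title.
    Limitation: '#' lines inside fenced code blocks are treated as headings.
    """
    chunks: List[Dict[str, str]] = []
    title = ""
    body: List[str] = []

    def flush() -> None:
        # Store the section collected so far, skipping empty ones.
        text = "\n".join(body).strip()
        if title or text:
            chunks.append({"title": title, "text": text})

    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            flush()
            title = match.group(2).strip()
            body = []
        else:
            body.append(line)
    flush()
    return chunks
```

A call such as `split_markdown_by_heading(convert_to_markdown("report.pdf"))` (with `report.pdf` as a placeholder path) would yield a list of titled text chunks that can be passed to an indexing step in a framework such as LlamaIndex or LangChain. Splitting on headings is a deliberately simple choice; a real pipeline would likely also cap chunk size.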