docling/docs/index.md
Christoph Auer fa5d972291 Merge remaining changes from main
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 10:52:16 +02:00

Docling

Docling bundles PDF document conversion to JSON and Markdown in an easy, self-contained package.

Features

  • Converts PDF documents to JSON or Markdown, quickly and reliably
  • 📑 Understands detailed page layout and reading order, and recovers table structures
  • 📝 Extracts metadata from the document, such as title, authors, references and language
  • 🔍 Includes OCR support for scanned PDFs
  • 🤖 Integrates easily with LLM app / RAG frameworks like LlamaIndex 🦙 & LangChain 🦜🔗
  • 💻 Provides a simple and convenient CLI
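
To illustrate the LLM / RAG point above, here is a minimal sketch of how converted Markdown might be split into heading-delimited chunks before being passed to an indexing framework. It uses only the Python standard library and assumes the Markdown output is already available as a plain string. The `Chunk` class, the `split_markdown_by_heading` helper, and the sample text are hypothetical and not part of Docling's API.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str  # heading text; "" for content that precedes the first heading
    text: str     # body text that follows the heading


# Matches ATX-style Markdown headings such as "# Title" or "## Methods".
_HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_markdown_by_heading(markdown: str) -> list[Chunk]:
    """Split a Markdown string into one chunk per heading (hypothetical helper).

    This is a simplification: it does not treat fenced code blocks specially,
    so a line starting with "#" inside a fence would be read as a heading.
    """
    chunks: list[Chunk] = []
    heading = ""
    body: list[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if heading or text:
            chunks.append(Chunk(heading, text))

    for line in markdown.splitlines():
        match = _HEADING.match(line)
        if match:
            flush()                    # close the previous chunk
            heading = match.group(2)   # start a new chunk under this heading
            body.clear()
        else:
            body.append(line)
    flush()                            # close the final chunk
    return chunks


# Sample Markdown standing in for converted output (made up for illustration).
sample = "# Title\n\nIntro text.\n\n## Methods\n\n| a | b |\n|---|---|\n| 1 | 2 |\n"
chunks = split_markdown_by_heading(sample)
for chunk in chunks:
    print(f"{chunk.heading}: {len(chunk.text)} characters")
```

Each resulting chunk keeps its heading alongside its body, so a downstream retriever can store the heading as metadata. Frameworks such as LlamaIndex and LangChain ship their own splitters, which would normally be preferable to a hand-rolled one.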