docling/docs/index.md
Panos Vagenas 68272b987a docs: document Docling JSON parsing
Also:
- factored out and expanded supported formats
- reorged feature list

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2025-01-27 17:23:39 +01:00


Docling


Docling parses documents and exports them to the desired format with ease and speed.

Features

  • 🗂️ Parsing of multiple document formats, including PDF, DOCX, XLSX, HTML, images & more
  • 📑 Advanced PDF understanding including page layout, reading order & table structure
  • 🧬 Unified, expressive DoclingDocument representation format
  • ↪️ Various export formats and options, including Markdown, HTML, and lossless JSON
  • 🔒 Local execution capabilities for sensitive data and air-gapped environments
  • 🤖 Plug-and-play integrations incl. LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
  • 🔍 OCR support for scanned PDFs and images
  • 💻 Simple and convenient CLI
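
As a rough illustration of the parse-then-export flow described in the feature list, here is a minimal sketch in Python. It assumes the `docling` package is installed and relies on its `DocumentConverter` class; the helper name `convert_document` and the shape of the returned dictionary are hypothetical and chosen only for this example.

```python
import json


def convert_document(source: str) -> dict:
    """Convert one document and return two of its exports (hypothetical helper)."""
    # Imported lazily so this module can be loaded even where docling is not installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)  # source: a local file path or a URL
    doc = result.document  # the unified DoclingDocument representation
    return {
        "markdown": doc.export_to_markdown(),
        # export_to_dict() yields a JSON-serializable dict of the full document
        "json": json.dumps(doc.export_to_dict()),
    }


if __name__ == "__main__":
    import sys

    # Usage: python convert_example.py <path-or-url>
    exports = convert_document(sys.argv[1])
    print(exports["markdown"])
```

The first conversion may download model files, so it can take noticeably longer than later runs.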

Coming soon

  • ♾️ Equation & code extraction
  • 📝 Metadata extraction, including title, authors, references & language

Get started

IBM ❤️ Open Source AI

Docling has been brought to you by IBM.