docs: document Docling JSON parsing (#819)

* docs: document Docling JSON parsing

  Also:
  - factored out and expanded supported formats
  - reorged feature list

  Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

* update feature list, minor fixes

  Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

---------

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2025-12-08 20:58:11 +00:00 · 2025-01-28 13:23:30 +01:00
parent 5139b48e4e
commit 6875913e34
5 changed files with 70 additions and 34 deletions
--- a/docs/index.md
+++ b/docs/index.md
@@ -14,20 +14,21 @@
 [![License MIT](https://img.shields.io/github/license/DS4SD/docling)](https://opensource.org/licenses/MIT)
 [![PyPI Downloads](https://static.pepy.tech/badge/docling/month)](https://pepy.tech/projects/docling)

-Docling parses documents and exports them to the desired format with ease and speed.
+Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.

 ## Features

-* 🗂️ Reads popular document formats (PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) and exports to HTML, Markdown and JSON (with embedded and referenced images)
-* 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
-* 🧩 Unified, expressive [DoclingDocument](./concepts/docling_document.md) representation format
-* 🤖 Plug-and-play [integrations](https://ds4sd.github.io/docling/integrations/) incl. LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
-* 🔍 OCR support for scanned PDFs
+* 🗂️ Parsing of [multiple document formats][supported_formats] incl. PDF, DOCX, XLSX, HTML, images, and more
+* 📑 Advanced PDF understanding incl. page layout, reading order, table structure, code, formulas, image classification, and more
+* 🧬 Unified, expressive [DoclingDocument][docling_document] representation format
+* ↪️ Various [export formats][supported_formats] and options, including Markdown, HTML, and lossless JSON
+* 🔒 Local execution capabilities for sensitive data and air-gapped environments
+* 🤖 Plug-and-play [integrations][integrations] incl. LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
+* 🔍 Extensive OCR support for scanned PDFs and images
 * 💻 Simple and convenient CLI

 ### Coming soon

-* ♾️ Equation & code extraction
 * 📝 Metadata extraction, including title, authors, references & language

 ## Get started
@@ -42,3 +43,7 @@ Docling parses documents and exports them to the desired format with ease and sp
 ## IBM ❤️ Open Source AI

 Docling has been brought to you by IBM.
+
+[supported_formats]: ./supported_formats.md
+[docling_document]: ./concepts/docling_document.md
+[integrations]: ./integrations/index.md