mirror of
https://github.com/DS4SD/docling.git
synced 2025-12-08 20:58:11 +00:00
feat!: Docling v2 (#117)
---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Maxim Lysak <mly@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
@@ -17,13 +17,13 @@
 [](https://github.com/pre-commit/pre-commit)
 [](https://opensource.org/licenses/MIT)
 
-Docling bundles PDF document conversion to JSON and Markdown in an easy, self-contained package.
+Docling parses documents and exports them to the desired format with ease and speed.
 
 ## Features
 
-* ⚡ Converts any PDF document to JSON or Markdown format, stable and lightning fast
-* 📑 Understands detailed page layout, reading order and recovers table structures
-* 📝 Extracts metadata from the document, such as title, authors, references and language
-* 🔍 Includes OCR support for scanned PDFs
-* 🤖 Integrates easily with LLM app / RAG frameworks like LlamaIndex 🦙 & LangChain 🦜🔗
-* 💻 Provides a simple and convenient CLI
+* 🗂️ Multi-format support for input (PDF, DOCX etc.) & output (Markdown, JSON etc.)
+* 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
+* 📝 Metadata extraction, including title, authors, references & language
+* 🤖 Seamless LlamaIndex 🦙 & LangChain 🦜🔗 integration for powerful RAG / QA applications
+* 🔍 OCR support for scanned PDFs
+* 💻 Simple and convenient CLI