diff --git a/README.md b/README.md
index 8ed7971a..256fc0c8 100644
--- a/README.md
+++ b/README.md
@@ -24,7 +24,7 @@ Docling parses documents and exports them to the desired format with ease and sp
 
 * 🗂️ Multi-format support for input (PDF, DOCX, PPTX, HTML, AsciiDoc, MarkDown) and output (Markdown, JSON, YAML)
 * 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
-* 🧩 Strongly typed Pydantic v2 data structure named [DoclingDocument](https://github.com/DS4SD/docling-core/blob/main/docling_core/types/doc/document.py#L945) which supports hierarchies and provides native iterators and chunkers.
+* 🧩 Strongly typed Pydantic v2 data structure named [DoclingDocument](https://ds4sd.github.io/docling/concepts/docling_document/) which supports hierarchies and provides native iterators and chunkers.
 * 📝 Metadata extraction, including title, authors, references & language
 * 🤖 Seamless LlamaIndex 🦙 & LangChain 🦜🔗 integration for powerful RAG / QA applications
 * 🔍 OCR support for scanned PDFs
diff --git a/docs/concepts/docling_document.md b/docs/concepts/docling_document.md
index 00b5452f..1ac46f55 100644
--- a/docs/concepts/docling_document.md
+++ b/docs/concepts/docling_document.md
@@ -7,6 +7,8 @@ pydantic datatype, which can express several features common to documents, such
 * Layout information (i.e. bounding boxes) for all items, if available
 * Provenance information
 
+The definition of the Pydantic types is implemented in the module `docling_core.types.doc`, more details in [source code definitions](https://github.com/DS4SD/docling-core/tree/main/docling_core/types/doc).
+
 It also brings a set of document construction APIs to build up a `DoclingDocument` from scratch.
 
 ## Example document structures