Mirror of https://github.com/DS4SD/docling.git (synced 2025-12-08 20:58:11 +00:00)
docs: extend chunking docs, add FAQ on token limit (#1053)
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
@@ -1,5 +1,18 @@
## Introduction

!!! note "Chunking approaches"

    Starting from a `DoclingDocument`, there are in principle two possible chunking
    approaches:

    1. exporting the `DoclingDocument` to Markdown (or a similar format) and then
       performing user-defined chunking as a post-processing step, or
    2. using native Docling chunkers, i.e. operating directly on the `DoclingDocument`.

    This page is about the latter, i.e. using native Docling chunkers.
    For an example of approach (1), see
    [this recipe](../examples/rag_langchain.ipynb), which covers the Markdown export mode.
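
To make approach (1) concrete, here is a minimal sketch of user-defined chunking applied to exported Markdown. The helper `chunk_markdown_by_heading` and the sample `markdown_text` are hypothetical and not part of Docling. In practice the string would be the Markdown export of a `DoclingDocument`, for example via `export_to_markdown()`.

```python
import re


def chunk_markdown_by_heading(markdown: str) -> list[str]:
    """Split Markdown text into chunks, starting a new chunk at each heading line.

    Hypothetical helper for illustration only; it is not a Docling API.
    """
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        # An ATX heading starts with 1-6 '#' characters followed by a space.
        if re.match(r"#{1,6} ", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that ended up empty (e.g. leading blank lines).
    return [chunk for chunk in chunks if chunk]


# Stand-in for the Markdown string exported from a DoclingDocument.
markdown_text = "# Title\n\nIntro text.\n\n## Section\n\nBody text.\n"
chunks = chunk_markdown_by_heading(markdown_text)
```

This sketch sees only the exported text, so it has no access to the document structure, and it would mis-split on `#` lines inside fenced code blocks.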

A *chunker* is a Docling abstraction that, given a
[`DoclingDocument`](./docling_document.md), returns a stream of chunks, each of which
captures some part of the document as a string accompanied by respective metadata.
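
The shape of this abstraction can be sketched with plain Python. The classes below (`ToyDocument`, `ToyChunk`, `ToyChunker`) are hypothetical stand-ins and not Docling's actual types. They only illustrate the contract described above: a document goes in, and a lazy stream of text-plus-metadata chunks comes out.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a DoclingDocument: a list of (heading, paragraph) pairs."""

    name: str
    items: list[tuple[str, str]]


@dataclass
class ToyChunk:
    """A chunk: some part of the document as a string, plus its metadata."""

    text: str
    meta: dict = field(default_factory=dict)


class ToyChunker:
    """Hypothetical chunker that emits one chunk per paragraph."""

    def chunk(self, doc: ToyDocument) -> Iterator[ToyChunk]:
        # A generator, so chunks are produced as a stream rather than all at once.
        for index, (heading, paragraph) in enumerate(doc.items):
            yield ToyChunk(
                text=paragraph,
                meta={"doc": doc.name, "heading": heading, "index": index},
            )


doc = ToyDocument(
    name="report",
    items=[
        ("Introduction", "Docling parses documents."),
        ("Usage", "Chunkers split them."),
    ],
)
toy_chunks = list(ToyChunker().chunk(doc))
```

Returning an iterator rather than a list lets a consumer process chunks one at a time, which matters for large documents.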