docs: extend chunking docs, add FAQ on token limit (#1053)

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
Panos Vagenas
2025-02-25 13:07:38 +01:00
committed by GitHub
parent 1b0ead6907
commit c84b973959
2 changed files with 50 additions and 1 deletion

@@ -1,5 +1,18 @@
## Introduction
!!! note "Chunking approaches"

    Starting from a `DoclingDocument`, there are in principle two possible chunking
    approaches:

    1. exporting the `DoclingDocument` to Markdown (or a similar format) and then
       performing user-defined chunking as a post-processing step, or
    2. using native Docling chunkers, i.e. operating directly on the `DoclingDocument`.

    This page is about the latter, i.e. using native Docling chunkers.
    For an example of approach (1), see e.g.
    [this recipe](../examples/rag_langchain.ipynb), which uses the Markdown export mode.

A *chunker* is a Docling abstraction that, given a
[`DoclingDocument`](./docling_document.md), returns a stream of chunks, each of which
captures some part of the document as a string together with the corresponding metadata.
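To make the abstraction more concrete, here is a toy model in plain Python. It assumes a document reduced to a flat list of headings and paragraphs and is *not* Docling's implementation: `ToyItem`, `ToyChunk` and `ToyChunker` are hypothetical names, and the single rule used (one chunk per paragraph, tagged with its enclosing heading path) is only an illustration. It shows the two properties described above: chunks come out as a stream (a generator), and each chunk pairs a string with metadata.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class ToyItem:
    """Hypothetical stand-in for a document item (a heading or a paragraph)."""
    kind: str        # "heading" or "paragraph"
    text: str
    level: int = 1   # heading depth; ignored for paragraphs


@dataclass
class ToyChunk:
    """Some part of the document as a string, plus metadata about it."""
    text: str
    meta: dict = field(default_factory=dict)


class ToyChunker:
    """Hypothetical chunker: one chunk per paragraph, tagged with its heading path."""

    def chunk(self, items: list[ToyItem]) -> Iterator[ToyChunk]:
        headings: dict[int, str] = {}  # currently open heading text, keyed by level
        for item in items:
            if item.kind == "heading":
                # A new heading closes any open headings at the same or deeper level.
                headings = {lvl: t for lvl, t in headings.items() if lvl < item.level}
                headings[item.level] = item.text
            else:
                path = [headings[lvl] for lvl in sorted(headings)]
                # Yield lazily, so callers consume chunks as a stream.
                yield ToyChunk(text=item.text, meta={"headings": path})


doc = [
    ToyItem("heading", "Introduction", level=1),
    ToyItem("paragraph", "Docling converts documents."),
    ToyItem("heading", "Chunking", level=2),
    ToyItem("paragraph", "Chunkers operate on the document."),
]

for c in ToyChunker().chunk(doc):
    print(c.meta["headings"], "|", c.text)
# prints:
# ['Introduction'] | Docling converts documents.
# ['Introduction', 'Chunking'] | Chunkers operate on the document.
```

With Docling itself, the corresponding step is calling `chunk(dl_doc=...)` on a native chunker (e.g. `HybridChunker` from `docling.chunking`) with a real `DoclingDocument`; the chunks it yields carry richer metadata than this sketch.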