mirror of https://github.com/DS4SD/docling.git (synced 2025-12-08 20:58:11 +00:00)
docs: add serialization docs, update chunking docs (#1556)
* docs: add serializers docs, update chunking docs

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* update notebook to improve MD table rendering

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

---------

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
@@ -31,7 +31,7 @@ The `BaseChunker` base class API defines that any chunker should provide the fol

 - `def chunk(self, dl_doc: DoclingDocument, **kwargs) -> Iterator[BaseChunk]`:
 Returning the chunks for the provided document.
-- `def serialize(self, chunk: BaseChunk) -> str`:
+- `def contextualize(self, chunk: BaseChunk) -> str`:
 Returning the potentially metadata-enriched serialization of the chunk, typically
 used to feed an embedding model (or generation model).

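Reviewer note: the hunk above documents a two-method chunker interface, `chunk` and `contextualize`. The following is a minimal, standard-library-only sketch of that shape, not docling's implementation. `ToyChunk`, `ToyChunker`, and the dict-based document are hypothetical stand-ins, and enriching a chunk by prepending its heading is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class ToyChunk:
    """Hypothetical stand-in for a chunk: raw text plus heading metadata."""

    text: str
    headings: list[str] = field(default_factory=list)


class ToyChunker:
    """Hypothetical chunker exposing the two methods named in the hunk."""

    def chunk(self, doc: dict[str, list[str]], **kwargs) -> Iterator[ToyChunk]:
        # Yield one chunk per paragraph, remembering the heading it sits under.
        for heading, paragraphs in doc.items():
            for paragraph in paragraphs:
                yield ToyChunk(text=paragraph, headings=[heading])

    def contextualize(self, chunk: ToyChunk) -> str:
        # Prepend the heading metadata to the text (assumed enrichment);
        # the result is the string one would hand to an embedding model.
        return "\n".join([*chunk.headings, chunk.text])


# Toy document: heading -> paragraphs (an assumed structure, not DoclingDocument).
doc = {
    "Install": ["Run pip install."],
    "Usage": ["Import the chunker.", "Call chunk()."],
}
chunker = ToyChunker()
chunks = list(chunker.chunk(doc))
enriched = [chunker.contextualize(c) for c in chunks]
print(enriched[0])
```

The sketch keeps the raw chunk text separate from the enriched string, so a caller can embed the enriched form while still displaying the original text.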
@@ -44,10 +44,14 @@ The `BaseChunker` base class API defines that any chunker should provide the fol
 from docling.chunking import HybridChunker
 ```
 - If you are only using the `docling-core` package, you must ensure to install
-the `chunking` extra, e.g.
+the `chunking` extra if you want to use HuggingFace tokenizers, e.g.
 ```shell
 pip install 'docling-core[chunking]'
 ```
+or the `chunking-openai` extra if you prefer Open AI tokenizers (tiktoken), e.g.
+```shell
+pip install 'docling-core[chunking-openai]'
+```
 and then you
 can import as follows:
 ```python