mirror of
https://github.com/DS4SD/docling.git
synced 2025-12-08 20:58:11 +00:00
docs: add architecture outline (#341)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
19
docs/concepts/architecture.md
Normal file
@@ -0,0 +1,19 @@
In a nutshell, Docling's architecture is outlined in the diagram above.

For each document format, the *document converter* knows which format-specific *backend* to use for parsing the document and which *pipeline* to use for orchestrating the execution, along with any relevant *options*.

!!! tip

    The document converter holds a default mapping, but this configuration is parametrizable: for the PDF format, for example, different backends and different pipeline options can be used. See [Usage](../usage.md#adjust-pipeline-features).
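The idea of a default mapping that can be overridden per format can be sketched in plain Python. This is a toy model, not Docling's API: `FormatOption`, `ToyConverter`, `DEFAULT_FORMAT_OPTIONS`, the backend and pipeline names, and the `do_ocr` option key are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FormatOption:
    """Hypothetical bundle: which backend parses, which pipeline orchestrates."""
    backend: str
    pipeline: str
    options: dict = field(default_factory=dict)


# Hypothetical default mapping from format to backend, pipeline and options.
DEFAULT_FORMAT_OPTIONS: Dict[str, FormatOption] = {
    "pdf": FormatOption(backend="default_pdf_backend", pipeline="pdf_pipeline"),
    "docx": FormatOption(backend="docx_backend", pipeline="simple_pipeline"),
}


class ToyConverter:
    def __init__(self, format_options: Optional[Dict[str, FormatOption]] = None):
        # Start from the defaults, then let per-format overrides win.
        self.format_options = {**DEFAULT_FORMAT_OPTIONS, **(format_options or {})}

    def plan(self, path: str) -> FormatOption:
        """Look up the backend/pipeline/options for a file, based on its suffix."""
        suffix = path.rsplit(".", 1)[-1].lower()
        try:
            return self.format_options[suffix]
        except KeyError:
            raise ValueError(f"unsupported format: {suffix}") from None


# Default behaviour versus a converter with a customized PDF entry.
default_plan = ToyConverter().plan("report.pdf")
custom = ToyConverter(
    {"pdf": FormatOption("alt_pdf_backend", "pdf_pipeline", {"do_ocr": False})}
)
custom_plan = custom.plan("report.PDF")
```

Only the PDF entry is replaced in `custom`; every other format keeps its default, which is the behaviour the tip describes.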

The *conversion result* contains the [*Docling document*](./docling_document.md), Docling's fundamental document representation.

Typical ways to use a Docling document include calling its *export methods* directly (for example, to Markdown or a dictionary) or having it chunked by a *chunker*.
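The two usage patterns, exporting and chunking, can be illustrated with a small stand-in document type. Everything below is an assumption made for the sketch: `ToyItem`, `ToyDocument`, `chunk_by_heading` and the method names are hypothetical and do not reproduce the real Docling document model.

```python
from dataclasses import asdict, dataclass
from typing import Iterator, List


@dataclass
class ToyItem:
    label: str  # either "heading" or "paragraph"
    text: str


@dataclass
class ToyDocument:
    items: List[ToyItem]

    def export_to_markdown(self) -> str:
        # Headings become "# ..." lines; paragraphs are emitted unchanged.
        lines = [
            f"# {item.text}" if item.label == "heading" else item.text
            for item in self.items
        ]
        return "\n\n".join(lines)

    def export_to_dict(self) -> dict:
        # asdict() recursively converts nested dataclasses into plain dicts.
        return asdict(self)


def chunk_by_heading(doc: ToyDocument) -> Iterator[str]:
    """Yield one chunk per heading, together with the paragraphs under it."""
    current: List[str] = []
    for item in doc.items:
        if item.label == "heading" and current:
            yield "\n".join(current)
            current = []
        current.append(item.text)
    if current:
        yield "\n".join(current)


doc = ToyDocument(
    [
        ToyItem("heading", "Intro"),
        ToyItem("paragraph", "Docling converts documents."),
        ToyItem("heading", "Usage"),
        ToyItem("paragraph", "Export or chunk the result."),
    ]
)
markdown = doc.export_to_markdown()
as_dict = doc.export_to_dict()
chunks = list(chunk_by_heading(doc))
```

The export methods live on the document itself, while the chunker is a separate component that consumes a document, which mirrors the split described above.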

For more details on Docling's architecture, see the [Docling Technical Report](https://arxiv.org/abs/2408.09869).

!!! note

    Components drawn with a dashed outline are base classes that can be subclassed for specialized implementations.
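As a rough illustration of the base-class pattern, the sketch below defines an abstract base and two specialized subclasses. Which components are dashed is visible only in the diagram, so a chunker is used here purely as an example; `BaseToyChunker`, `ParagraphChunker` and `FixedSizeChunker` are hypothetical names, not Docling classes.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseToyChunker(ABC):
    """Hypothetical base class: fixes the interface, leaves the strategy open."""

    @abstractmethod
    def chunk(self, text: str) -> List[str]:
        ...


class ParagraphChunker(BaseToyChunker):
    def chunk(self, text: str) -> List[str]:
        # Split on blank lines and drop empty fragments.
        return [part.strip() for part in text.split("\n\n") if part.strip()]


class FixedSizeChunker(BaseToyChunker):
    def __init__(self, size: int):
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size

    def chunk(self, text: str) -> List[str]:
        # Group whitespace-separated words into runs of at most `size` words.
        words = text.split()
        return [
            " ".join(words[i : i + self.size])
            for i in range(0, len(words), self.size)
        ]


def run(chunker: BaseToyChunker, text: str) -> List[str]:
    # Calling code depends only on the base-class interface.
    return chunker.chunk(text)


sample = "one two three\n\nfour five"
paragraph_chunks = run(ParagraphChunker(), sample)
fixed_chunks = run(FixedSizeChunker(2), sample)
```

Because `run` accepts any `BaseToyChunker`, a new strategy can be added by subclassing without touching the calling code.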