docs: add architecture outline (#341)

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Panos Vagenas
2024-11-15 12:52:41 +01:00
committed by GitHub
parent 835e077b02
commit 25fd149c38
7 changed files with 23 additions and 9 deletions

@@ -0,0 +1,19 @@
![docling_architecture](../assets/docling_arch.png)
The diagram above outlines Docling's architecture.
For each document format, the *document converter* knows which format-specific *backend* to employ for parsing the document and which *pipeline* to use for orchestrating the execution, along with any relevant *options*.
!!! tip
    While the document converter holds a default mapping, this configuration is parametrizable: for the PDF format, for example, different backends and different pipeline options can be used; see [Usage](../usage.md#adjust-pipeline-features).
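
The mapping described above can be pictured with a small, self-contained sketch. It is not Docling's actual API: every name in it (`Backend`, `Pipeline`, `PipelineOptions`, `FormatOption`, `Converter`, the `"txt"` format key) is hypothetical and chosen only to illustrate how a converter might look up a backend, a pipeline, and options per format, and how a caller might override the defaults.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Type


class Backend(ABC):
    """Hypothetical base class for format-specific parsers."""

    @abstractmethod
    def parse(self, raw: str) -> List[str]:
        ...


class PlainTextBackend(Backend):
    def parse(self, raw: str) -> List[str]:
        # One element per non-empty line.
        return [line.strip() for line in raw.splitlines() if line.strip()]


class UppercaseBackend(Backend):
    def parse(self, raw: str) -> List[str]:
        # Alternative backend for the same format, to show overriding.
        return [line.strip().upper() for line in raw.splitlines() if line.strip()]


@dataclass
class PipelineOptions:
    max_elements: Optional[int] = None  # hypothetical option


class Pipeline:
    """Hypothetical pipeline: orchestrates parsing and applies the options."""

    def run(self, backend: Backend, raw: str, options: PipelineOptions) -> List[str]:
        elements = backend.parse(raw)
        if options.max_elements is not None:
            elements = elements[: options.max_elements]
        return elements


@dataclass
class FormatOption:
    backend: Type[Backend]
    pipeline: Type[Pipeline] = Pipeline
    options: PipelineOptions = field(default_factory=PipelineOptions)


# Default mapping held by the converter (assumed for this sketch).
DEFAULT_FORMAT_OPTIONS: Dict[str, FormatOption] = {
    "txt": FormatOption(backend=PlainTextBackend),
}


class Converter:
    def __init__(self, format_options: Optional[Dict[str, FormatOption]] = None):
        # Caller-supplied entries override the defaults per format.
        self.format_options = {**DEFAULT_FORMAT_OPTIONS, **(format_options or {})}

    def convert(self, fmt: str, raw: str) -> List[str]:
        opt = self.format_options[fmt]
        return opt.pipeline().run(opt.backend(), raw, opt.options)


default_converter = Converter()
custom_converter = Converter(
    {"txt": FormatOption(backend=UppercaseBackend,
                         options=PipelineOptions(max_elements=1))}
)
```

Here `default_converter` uses the built-in mapping, while `custom_converter` swaps in a different backend and different pipeline options for the same format. For the real configuration interface, refer to the Usage page linked above.
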
The *conversion result* contains the [*Docling document*](./docling_document.md), Docling's fundamental document representation.
Typical uses of a Docling document include calling its *export methods* directly (for example, to Markdown or a dictionary) or having it chunked by a *chunker*.
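
As a rough illustration of these two usage patterns, the following toy sketch models a document with export methods and a simple chunker. It does not use Docling itself: `ToyItem`, `ToyDocument`, and `ToyHeadingChunker` are hypothetical stand-ins, and their method names and output shapes are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class ToyItem:
    label: str  # either "heading" or "paragraph" in this sketch
    text: str


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a unified document representation."""

    name: str
    items: List[ToyItem] = field(default_factory=list)

    def export_to_markdown(self) -> str:
        # Headings become level-1 Markdown headings; paragraphs stay as-is.
        blocks = [
            f"# {item.text}" if item.label == "heading" else item.text
            for item in self.items
        ]
        return "\n\n".join(blocks)

    def export_to_dict(self) -> Dict[str, object]:
        return {
            "name": self.name,
            "items": [{"label": i.label, "text": i.text} for i in self.items],
        }


class ToyHeadingChunker:
    """Hypothetical chunker: one chunk per paragraph, tagged with its heading."""

    def chunk(self, doc: ToyDocument) -> Iterator[Dict[str, Optional[str]]]:
        heading: Optional[str] = None
        for item in doc.items:
            if item.label == "heading":
                heading = item.text
            else:
                yield {"heading": heading, "text": item.text}


doc = ToyDocument(
    name="demo",
    items=[
        ToyItem("heading", "Intro"),
        ToyItem("paragraph", "Hello."),
        ToyItem("paragraph", "World."),
    ],
)
markdown = doc.export_to_markdown()
as_dict = doc.export_to_dict()
chunks = list(ToyHeadingChunker().chunk(doc))
```

The point of the sketch is the division of labour: the document owns its serializations, while chunking is a separate component that consumes the document.
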
For more details on Docling's architecture, check out the [Docling Technical Report](https://arxiv.org/abs/2408.09869).
!!! note
    Components drawn with a dashed outline are base classes that can be subclassed for specialized implementations.

@@ -1,3 +1 @@
In this area you can find guides on the main Docling concepts.
Use the navigation on the left to browse through them.
Use the navigation on the left to browse some core Docling concepts.