docs: Enrichment models (#1097)

* warning for develop examples

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add docs for enrichment models

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* minor reorg of top-level docs (#1098)

* minor reorg of top-level docs

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* fix typo [no ci]

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

---------

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

* trigger ci

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
This commit is contained in:
Michele Dolfi
2025-03-04 14:24:38 +01:00
committed by GitHub
parent b1e79cadc7
commit 357d41cc47
10 changed files with 250 additions and 20 deletions

View File

@@ -0,0 +1,35 @@
Docling can parse various documents formats into a unified representation (Docling
Document), which it can export to different formats too — check out
[Architecture](../concepts/architecture.md) for more details.
Below you can find a listing of all supported input and output formats.
## Supported input formats
| Format | Description |
|--------|-------------|
| PDF | |
| DOCX, XLSX, PPTX | Default formats in MS Office 2007+, based on Office Open XML |
| Markdown | |
| AsciiDoc | |
| HTML, XHTML | |
| CSV | |
| PNG, JPEG, TIFF, BMP | Image formats |
Schema-specific support:
| Format | Description |
|--------|-------------|
| USPTO XML | XML format followed by [USPTO](https://www.uspto.gov/patents) patents |
| JATS XML | XML format followed by [JATS](https://jats.nlm.nih.gov/) articles |
| Docling JSON | JSON-serialized [Docling Document](../concepts/docling_document.md) |
## Supported output formats
| Format | Description |
|--------|-------------|
| HTML | Both image embedding and referencing are supported |
| Markdown | |
| JSON | Lossless serialization of Docling Document |
| Text | Plain text, i.e. without Markdown markers |
| Doctags | |