mirror of
https://github.com/DS4SD/docling.git
synced 2025-12-08 20:58:11 +00:00
docs: Enrichment models (#1097)
* warning for develop examples

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add docs for enrichment models

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* minor reorg of top-level docs (#1098)

* minor reorg of top-level docs

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* fix typo

[no ci]

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

---------

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

* trigger ci

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
35
docs/usage/supported_formats.md
Normal file
@@ -0,0 +1,35 @@
Docling can parse various document formats into a unified representation (Docling
Document), which it can also export to different formats. See
[Architecture](../concepts/architecture.md) for more details.

Below is a listing of all supported input and output formats.

## Supported input formats

| Format | Description |
|--------|-------------|
| PDF | |
| DOCX, XLSX, PPTX | Default formats in MS Office 2007+, based on Office Open XML |
| Markdown | |
| AsciiDoc | |
| HTML, XHTML | |
| CSV | |
| PNG, JPEG, TIFF, BMP | Image formats |

Schema-specific support:

| Format | Description |
|--------|-------------|
| USPTO XML | XML format used for [USPTO](https://www.uspto.gov/patents) patents |
| JATS XML | XML format used for [JATS](https://jats.nlm.nih.gov/) articles |
| Docling JSON | JSON-serialized [Docling Document](../concepts/docling_document.md) |

## Supported output formats

| Format | Description |
|--------|-------------|
| HTML | Both image embedding and referencing are supported |
| Markdown | |
| JSON | Lossless serialization of Docling Document |
| Text | Plain text, i.e. without Markdown markers |
| Doctags | |
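
As a rough illustration of how the listings above might be used in a pre-processing script, the sketch below maps a file's extension to one of the listed input formats and derives an output file name for a chosen output format. It is not part of Docling's API: the helper names `detect_input_format` and `plan_output_path`, the extension-to-format mapping, and the output suffixes (in particular `.doctags`) are assumptions made for this example. The schema-specific formats (USPTO XML, JATS XML, Docling JSON) are left out because a file extension alone cannot identify them.

```python
from pathlib import Path

# Hypothetical lookup tables derived from the listings above. They are not part
# of Docling's API, and the extensions chosen here are assumptions.
INPUT_FORMATS = {
    ".pdf": "PDF",
    ".docx": "DOCX",
    ".xlsx": "XLSX",
    ".pptx": "PPTX",
    ".md": "Markdown",
    ".adoc": "AsciiDoc",
    ".asciidoc": "AsciiDoc",
    ".html": "HTML",
    ".htm": "HTML",
    ".xhtml": "XHTML",
    ".csv": "CSV",
    ".png": "PNG",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".tif": "TIFF",
    ".tiff": "TIFF",
    ".bmp": "BMP",
}

OUTPUT_SUFFIXES = {
    "HTML": ".html",
    "Markdown": ".md",
    "JSON": ".json",
    "Text": ".txt",
    "Doctags": ".doctags",  # assumed suffix, chosen only for this sketch
}


def detect_input_format(path):
    """Return the listed input format for a path, or None if unrecognized."""
    return INPUT_FORMATS.get(Path(path).suffix.lower())


def plan_output_path(path, output_format):
    """Return the output path for a listed input file and output format."""
    if detect_input_format(path) is None:
        raise ValueError(f"unsupported input format: {path}")
    if output_format not in OUTPUT_SUFFIXES:
        raise ValueError(f"unsupported output format: {output_format}")
    # Swap the input extension for the suffix of the requested output format.
    return Path(path).with_suffix(OUTPUT_SUFFIXES[output_format])
```

A script could call `detect_input_format` to skip unsupported files before handing the rest to the converter, and `plan_output_path` to decide where each exported result should be written.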