mirror of
https://github.com/DS4SD/docling.git
synced 2025-12-08 20:58:11 +00:00
feat: add a backend parser for WebVTT files (#2288)
* feat: add a backend parser for WebVTT files Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com> * docs: update README with VTT support Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com> * docs: add description to supported formats Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com> * chore: upgrade docling-core to unescape WebVTT in markdown Pin the new release of docling-core 2.48.2. Do not escape HTML reserved characters when exporting WebVTT documents to markdown. Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com> * test: add missing copyright notice Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com> --------- Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
This commit is contained in:
committed by
GitHub
parent
b5628f1227
commit
46efaaefee
5
docs/usage/supported_formats.md
vendored
5
docs/usage/supported_formats.md
vendored
@@ -11,10 +11,11 @@ Below you can find a listing of all supported input and output formats.
|
||||
| PDF | |
|
||||
| DOCX, XLSX, PPTX | Default formats in MS Office 2007+, based on Office Open XML |
|
||||
| Markdown | |
|
||||
| AsciiDoc | |
|
||||
| AsciiDoc | Human-readable, plain-text markup language for structured technical content |
|
||||
| HTML, XHTML | |
|
||||
| CSV | |
|
||||
| PNG, JPEG, TIFF, BMP, WEBP | Image formats |
|
||||
| WebVTT | Web Video Text Tracks format for displaying timed text |
|
||||
|
||||
Schema-specific support:
|
||||
|
||||
@@ -32,4 +33,4 @@ Schema-specific support:
|
||||
| Markdown | |
|
||||
| JSON | Lossless serialization of Docling Document |
|
||||
| Text | Plain text, i.e. without Markdown markers |
|
||||
| Doctags | |
|
||||
| [Doctags](https://arxiv.org/pdf/2503.11576) | Markup format for efficiently representing the full content and layout characteristics of a document |
|
||||
|
||||
Reference in New Issue
Block a user