feat: add a backend parser for WebVTT files (#2288)

* feat: add a backend parser for WebVTT files

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* docs: update README with VTT support

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* docs: add description to supported formats

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* chore: upgrade docling-core to unescape WebVTT in markdown

Pin the new release of docling-core 2.48.2.
Do not escape HTML reserved characters when exporting WebVTT documents to markdown.

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* test: add missing copyright notice

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

---------

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
This commit is contained in:
Cesar Berrospi Ramis
2025-09-22 15:24:34 +02:00
committed by GitHub
parent b5628f1227
commit 46efaaefee
23 changed files with 3969 additions and 34 deletions

15
tests/data/webvtt/webvtt_example_02.vtt vendored Normal file
View File

@@ -0,0 +1,15 @@
WEBVTT
NOTE Copyright © 2019 World Wide Web Consortium. https://www.w3.org/TR/webvtt1/
00:00.000 --> 00:02.000
<v.first.loud Esme>Its a blue apple tree!
00:02.000 --> 00:04.000
<v Mary>No way!
00:04.000 --> 00:06.000
<v Esme>Hee!</v> <i>laughter</i>
00:06.000 --> 00:08.000
<v.loud Mary>Thats awesome!