feat: add a backend parser for WebVTT files (#2288)

* feat: add a backend parser for WebVTT files

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* docs: update README with VTT support

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* docs: add description to supported formats

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* chore: upgrade docling-core to unescape WebVTT in markdown

Pin the new release of docling-core 2.48.2.
Do not escape HTML reserved characters when exporting WebVTT documents to markdown.

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* test: add missing copyright notice

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

---------

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
Cesar Berrospi Ramis
2025-09-22 15:24:34 +02:00
committed by GitHub
parent b5628f1227
commit 46efaaefee
23 changed files with 3969 additions and 34 deletions

@@ -0,0 +1,51 @@
00:11.000 --> 00:13.000
Roger Bingham: We are in New York City
00:13.000 --> 00:16.000
Roger Bingham: We’re actually at the Lucern Hotel, just down the street
00:16.000 --> 00:18.000
Roger Bingham: from the American Museum of Natural History
00:18.000 --> 00:20.000
Roger Bingham: And with me is Neil deGrasse Tyson
00:20.000 --> 00:22.000
Roger Bingham: Astrophysicist, Director of the Hayden Planetarium
00:22.000 --> 00:24.000
Roger Bingham: at the AMNH.
00:24.000 --> 00:26.000
Roger Bingham: Thank you for walking down here.
00:27.000 --> 00:30.000
Roger Bingham: And I want to do a follow-up on the last conversation we did.
00:30.000 --> 00:31.500
Roger Bingham: When we e-mailed—
00:30.500 --> 00:32.500
Neil deGrasse Tyson: Didn’t we talk about enough in that conversation?
00:32.000 --> 00:35.500
Roger Bingham: No! No no no no; 'cos 'cos obviously 'cos
00:32.500 --> 00:33.500
Neil deGrasse Tyson: *Laughs*
00:35.500 --> 00:38.000
Roger Bingham: You know I’m so excited my glasses are falling off here.
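
The cue layout shown above (a timing line of the form `mm:ss.ttt --> mm:ss.ttt` followed by a `Speaker: text` line) can be read with a few lines of Python. This is only a minimal sketch to illustrate the format, not the backend added by this commit: the names `Cue` and `parse_cues` are hypothetical, each cue is assumed to have exactly one payload line, and the `Speaker: ` prefix is an assumption taken from this sample rather than from the WebVTT specification.

```python
import re
from typing import NamedTuple, Optional

# A WebVTT timestamp is [hh:]mm:ss.ttt; the hours part is optional.
_TS = r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
_TIMING = re.compile(rf"^{_TS}\s+-->\s+{_TS}")


class Cue(NamedTuple):
    """Hypothetical container for one parsed cue."""
    start: float            # start time in seconds
    end: float              # end time in seconds
    speaker: Optional[str]  # None when the payload has no "Name: " prefix
    text: str


def _seconds(h, m, s, ms) -> float:
    """Convert the captured timestamp fields to seconds."""
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_cues(lines):
    """Parse timing lines, each followed by a single payload line."""
    cues = []
    it = iter(lines)
    for line in it:
        match = _TIMING.match(line.strip())
        if not match:
            continue  # skip anything that is not a timing line
        fields = match.groups()
        payload = next(it, "").strip()
        # Assumption from the sample: speaker and text are split by ": ".
        speaker, sep, text = payload.partition(": ")
        if not sep:
            speaker, text = None, payload
        cues.append(Cue(_seconds(*fields[:4]), _seconds(*fields[4:]), speaker, text))
    return cues
```

Because cues are kept as independent records, overlapping speech such as the 00:30.000 and 00:30.500 cues in the sample is preserved rather than merged; a consumer can compare `start` and `end` to detect the overlap.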