docling/tests/data/webvtt/webvtt_example_03.vtt
Cesar Berrospi Ramis 46efaaefee feat: add a backend parser for WebVTT files (#2288)
* feat: add a backend parser for WebVTT files

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* docs: update README with VTT support

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* docs: add description to supported formats

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* chore: upgrade docling-core to unescape WebVTT in markdown

Pin the new release of docling-core 2.48.2.
Do not escape HTML reserved characters when exporting WebVTT documents to markdown.

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* test: add missing copyright notice

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

---------

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
2025-09-22 15:24:34 +02:00

WEBVTT

62357a1d-d250-41d5-a1cf-6cc0eeceffcc/15-0
00:00:04.963 --> 00:00:08.571
<v Speaker A>OK,
I think now we should be recording</v>

62357a1d-d250-41d5-a1cf-6cc0eeceffcc/15-1
00:00:08.571 --> 00:00:09.403
<v Speaker A>properly.</v>

62357a1d-d250-41d5-a1cf-6cc0eeceffcc/16-0
00:00:10.683 --> 00:00:11.563
Good.

62357a1d-d250-41d5-a1cf-6cc0eeceffcc/17-0
00:00:13.363 --> 00:00:13.803
<v Speaker A>Yeah.</v>

62357a1d-d250-41d5-a1cf-6cc0eeceffcc/78-0
00:00:49.603 --> 00:00:53.363
<v Speaker B>I was also thinking.</v>

62357a1d-d250-41d5-a1cf-6cc0eeceffcc/113-0
00:00:54.963 --> 00:01:02.072
<v Speaker B>Would be maybe good to create items,</v>

62357a1d-d250-41d5-a1cf-6cc0eeceffcc/113-1
00:01:02.072 --> 00:01:06.811
<v Speaker B>some metadata,
some options that can be specific.</v>

62357a1d-d250-41d5-a1cf-6cc0eeceffcc/150-0
00:01:10.243 --> 00:01:13.014
<v Speaker A>Yeah,
I mean I think you went even more than</v>

62357a1d-d250-41d5-a1cf-6cc0eeceffcc/119-0
00:01:10.563 --> 00:01:12.643
<v Speaker B>But we preserved the atoms.</v>

62357a1d-d250-41d5-a1cf-6cc0eeceffcc/150-1
00:01:13.014 --> 00:01:15.907
<v Speaker A>than me.
I just opened the format.</v>

62357a1d-d250-41d5-a1cf-6cc0eeceffcc/197-1
00:01:50.222 --> 00:01:51.643
<v Speaker A>give it a try, yeah.</v>

62357a1d-d250-41d5-a1cf-6cc0eeceffcc/200-0
00:01:52.043 --> 00:01:55.043
<v Speaker B>Okay, talk to you later.</v>

62357a1d-d250-41d5-a1cf-6cc0eeceffcc/202-0
00:01:54.603 --> 00:01:55.283
<v Speaker A>See you.</v>
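
For orientation, here is a minimal sketch of how cue blocks like the ones in this fixture could be read. It is not the docling backend. It assumes plain Python with the standard library only, and it assumes cues are separated by blank lines, as the WebVTT format requires. The names `Cue`, `to_seconds`, and `parse_vtt` are hypothetical helpers introduced for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Timing line, e.g. "00:00:04.963 --> 00:00:08.571" (the hours part is optional).
TIMING = re.compile(
    r"^(?P<start>(?:\d{2,}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+"
    r"(?P<end>(?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)
# Voice span start tag, e.g. "<v Speaker A>".
VOICE = re.compile(r"<v(?:\.[^\s>]+)*\s+(?P<name>[^>]+)>")
# Any cue tag, used to strip markup from the payload.
TAG = re.compile(r"</?[^>]+>")


@dataclass
class Cue:  # hypothetical container, not a docling type
    identifier: Optional[str]
    start: float  # seconds
    end: float  # seconds
    speaker: Optional[str]
    text: str


def to_seconds(stamp: str) -> float:
    """Convert "hh:mm:ss.ttt" or "mm:ss.ttt" to seconds."""
    parts = stamp.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def parse_vtt(content: str) -> List[Cue]:
    """Split the file into blank-line-separated blocks and keep the cues."""
    cues: List[Cue] = []
    blocks = re.split(r"\n\s*\n", content.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        idx = next((i for i, ln in enumerate(lines) if TIMING.match(ln)), None)
        if idx is None:
            continue  # header, NOTE, or STYLE block: no timing line
        timing = TIMING.match(lines[idx])
        identifier = lines[idx - 1].strip() if idx > 0 else None
        payload = " ".join(ln.strip() for ln in lines[idx + 1:])
        voice = VOICE.search(payload)
        cues.append(
            Cue(
                identifier=identifier,
                start=to_seconds(timing["start"]),
                end=to_seconds(timing["end"]),
                speaker=voice["name"].strip() if voice else None,
                text=TAG.sub("", payload).strip(),
            )
        )
    return cues


SAMPLE = """WEBVTT

62357a1d-d250-41d5-a1cf-6cc0eeceffcc/15-0
00:00:04.963 --> 00:00:08.571
<v Speaker A>OK,
I think now we should be recording</v>

62357a1d-d250-41d5-a1cf-6cc0eeceffcc/16-0
00:00:10.683 --> 00:00:11.563
Good.
"""

for cue in parse_vtt(SAMPLE):
    print(cue.speaker, "|", cue.text)
```

The sketch joins multi-line payloads with a space and leaves `speaker` as `None` for cues without a voice tag, such as the "Good." cue. It does not handle cue settings, nested tags with special meaning, or overlapping cues in any particular way.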