Mirror of https://github.com/DS4SD/docling.git (synced 2025-12-09 13:18:24 +00:00)
feat: add a backend parser for WebVTT files (#2288)
* feat: add a backend parser for WebVTT files

  Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* docs: update README with VTT support

  Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* docs: add description to supported formats

  Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* chore: upgrade docling-core to unescape WebVTT in markdown

  Pin the new release of docling-core 2.48.2.
  Do not escape HTML reserved characters when exporting WebVTT documents to markdown.

  Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* test: add missing copyright notice

  Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

---------

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
Committed by: GitHub
Parent: b5628f1227
Commit: 46efaaefee
tests/data/webvtt/webvtt_example_01.vtt (new file, vendored, 42 lines added)
@@ -0,0 +1,42 @@
WEBVTT

NOTE Copyright © 2019 World Wide Web Consortium. https://www.w3.org/TR/webvtt1/

00:11.000 --> 00:13.000
<v Roger Bingham>We are in New York City

00:13.000 --> 00:16.000
<v Roger Bingham>We’re actually at the Lucern Hotel, just down the street

00:16.000 --> 00:18.000
<v Roger Bingham>from the American Museum of Natural History

00:18.000 --> 00:20.000
<v Roger Bingham>And with me is Neil deGrasse Tyson

00:20.000 --> 00:22.000
<v Roger Bingham>Astrophysicist, Director of the Hayden Planetarium

00:22.000 --> 00:24.000
<v Roger Bingham>at the AMNH.

00:24.000 --> 00:26.000
<v Roger Bingham>Thank you for walking down here.

00:27.000 --> 00:30.000
<v Roger Bingham>And I want to do a follow-up on the last conversation we did.

00:30.000 --> 00:31.500 align:right size:50%
<v Roger Bingham>When we e-mailed—

00:30.500 --> 00:32.500 align:left size:50%
<v Neil deGrasse Tyson>Didn’t we talk about enough in that conversation?

00:32.000 --> 00:35.500 align:right size:50%
<v Roger Bingham>No! No no no no; 'cos 'cos obviously 'cos

00:32.500 --> 00:33.500 align:left size:50%
<v Neil deGrasse Tyson><i>Laughs</i>

00:35.500 --> 00:38.000
<v Roger Bingham>You know I’m so excited my glasses are falling off here.
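The test file above exercises the main features a WebVTT parser has to handle: the `WEBVTT` header, a `NOTE` block, cue timings without an hours field, cue settings such as `align:right size:50%`, and voice spans (`<v Name>`). The following is a minimal, standard-library-only sketch of how such cues could be read. It is not the backend added by this commit; the names `Cue`, `parse_cues`, and `SAMPLE` are hypothetical and chosen for illustration, and the sketch ignores cue identifiers beyond skipping them, nested voice spans, and HTML character references.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Timestamp with an optional hours field: [hh:]mm:ss.ttt
_TS = r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
# Timing line: start --> end, optionally followed by cue settings.
_TIMING = re.compile(rf"^{_TS}[ \t]+-->[ \t]+{_TS}(?:[ \t]+(.*))?$")
# Leading voice span, e.g. "<v Roger Bingham>" (optional .class annotations).
_VOICE = re.compile(r"^<v(?:\.[^\s>]+)*\s+([^>]+)>")
# Any remaining markup tag such as <i>, </i>, </v>.
_TAG = re.compile(r"<[^>]+>")


@dataclass
class Cue:
    start: float  # seconds
    end: float  # seconds
    speaker: Optional[str]
    text: str
    settings: Dict[str, str] = field(default_factory=dict)


def _to_seconds(hours: Optional[str], minutes: str, seconds: str, millis: str) -> float:
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_cues(vtt_text: str) -> List[Cue]:
    """Return the cues found in a WebVTT string (illustrative sketch only)."""
    cues: List[Cue] = []
    normalized = vtt_text.replace("\r\n", "\n").strip()
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n{2,}", normalized):
        lines = block.split("\n")
        # The timing line is the first line, or the second if a cue identifier is present.
        idx = next((i for i, ln in enumerate(lines[:2]) if "-->" in ln), None)
        if idx is None:
            continue  # header, NOTE, STYLE, or REGION block
        match = _TIMING.match(lines[idx].strip())
        if match is None:
            continue  # malformed timing line: skip rather than guess
        payload = "\n".join(lines[idx + 1:])
        voice = _VOICE.match(payload)
        settings = dict(
            item.split(":", 1) for item in (match.group(9) or "").split() if ":" in item
        )
        cues.append(
            Cue(
                start=_to_seconds(*match.group(1, 2, 3, 4)),
                end=_to_seconds(*match.group(5, 6, 7, 8)),
                speaker=voice.group(1).strip() if voice else None,
                text=_TAG.sub("", payload).strip(),
                settings=settings,
            )
        )
    return cues


SAMPLE = """WEBVTT

NOTE sample

00:11.000 --> 00:13.000
<v Roger Bingham>We are in New York City

00:30.500 --> 00:32.500 align:left size:50%
<v Neil deGrasse Tyson><i>Laughs</i>
"""

for cue in parse_cues(SAMPLE):
    print(cue.start, cue.end, cue.speaker, cue.text, cue.settings)
```

Splitting on blank lines first keeps the header and `NOTE` blocks out of the cue logic, since only blocks containing a `-->` timing line are treated as cues. Overlapping cues, such as the 00:30.000 and 00:30.500 cues in the test file, come back in file order with their own start and end times, so any merging or interleaving is left to the caller.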