mirror of
https://github.com/DS4SD/docling.git
synced 2025-12-08 20:58:11 +00:00
feat: add a backend parser for WebVTT files (#2288)
* feat: add a backend parser for WebVTT files
  Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* docs: update README with VTT support
  Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* docs: add description to supported formats
  Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* chore: upgrade docling-core to unescape WebVTT in markdown
  Pin the new release of docling-core 2.48.2. Do not escape HTML reserved characters when exporting WebVTT documents to markdown.
  Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* test: add missing copyright notice
  Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
---------
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
committed by GitHub
parent b5628f1227
commit 46efaaefee
22
tests/data/groundtruth/docling_v2/webvtt_example_02.vtt.itxt
vendored
Normal file
@@ -0,0 +1,22 @@
item-0 at level 0: unspecified: group _root_
item-1 at level 1: section: group WebVTT cue block
item-2 at level 2: text: 00:00.000 --> 00:02.000
item-3 at level 2: inline: group WebVTT cue voice span
item-4 at level 3: text: Esme (first, loud):
item-5 at level 3: text: It’s a blue apple tree!
item-6 at level 1: section: group WebVTT cue block
item-7 at level 2: text: 00:02.000 --> 00:04.000
item-8 at level 2: inline: group WebVTT cue voice span
item-9 at level 3: text: Mary:
item-10 at level 3: text: No way!
item-11 at level 1: section: group WebVTT cue block
item-12 at level 2: text: 00:04.000 --> 00:06.000
item-13 at level 2: inline: group WebVTT cue voice span
item-14 at level 3: text: Esme:
item-15 at level 3: text: Hee!
item-16 at level 2: text: laughter
item-17 at level 1: section: group WebVTT cue block
item-18 at level 2: text: 00:06.000 --> 00:08.000
item-19 at level 2: inline: group WebVTT cue voice span
item-20 at level 3: text: Mary (loud):
item-21 at level 3: text: That’s awesome!
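Reviewer note: the groundtruth above pairs each voice span with a speaker label such as `Esme (first, loud):` followed by the spoken text. The sketch below illustrates one way such a label could be derived from a WebVTT voice span. It is not the backend added in this commit: `split_voice_span` and `VOICE_RE` are hypothetical names, and the cue payloads in the usage lines are assumed from the standard `<v.class Name>` voice-span syntax, since the source `.vtt` file is not part of this diff.

```python
import re

# Hypothetical pattern (not taken from docling): matches "<v.class1.class2 Name>text"
# with an optional closing "</v>" tag.
VOICE_RE = re.compile(
    r"<v(?P<classes>(?:\.[^\s.>]+)*)\s+(?P<name>[^>]+)>(?P<text>.*?)(?:</v>|$)",
    re.S,
)


def split_voice_span(payload: str) -> tuple[str, str]:
    """Split a voice-span payload into (speaker label, spoken text).

    Illustrative helper only; the label format mirrors the groundtruth lines,
    e.g. "Esme (first, loud):".
    """
    match = VOICE_RE.match(payload.strip())
    if match is None:
        raise ValueError(f"not a voice span: {payload!r}")
    # ".first.loud" -> ["first", "loud"]; an empty string yields no classes.
    classes = [c for c in match.group("classes").split(".") if c]
    name = match.group("name").strip()
    label = f"{name} ({', '.join(classes)}):" if classes else f"{name}:"
    return label, match.group("text").strip()


if __name__ == "__main__":
    # Assumed cue payloads, chosen to resemble the groundtruth items.
    for payload in ("<v.first.loud Esme>It’s a blue apple tree!", "<v Mary>No way!"):
        print(split_voice_span(payload))
```

Text after a closing `</v>` is left out of the returned pair, which would be consistent with `laughter` appearing as a separate level-2 item in the third cue block, though the actual parser may handle it differently.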