handle <code> tags as code blocks
Signed-off-by: FernandoSSI <fernandosi2005@gmail.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
Check if picture shape has an image attached in pptx backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* chore: bump version to 2.28.4 [skip ci]
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Improve text parsing
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* fix: Tesseract OCR CLI can't process images composed with numbers only (#1201)
fix wrong type text extracted by tesseract_ocr_cli_model
Signed-off-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Co-authored-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Flexibilize heading detection
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Fix trailing space
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Remove trailing space
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
---------
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
Signed-off-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Guilhem VERMOREL <83694424+guilhemvermorel@users.noreply.github.com>
Co-authored-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
fix wrong type text extracted by tesseract_ocr_cli_model
Signed-off-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Co-authored-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
Check if picture shape has an image attached in pptx backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
* chore: bump version to 2.28.4 [skip ci]
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Improve text parsing
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* fix: Tesseract OCR CLI can't process images composed with numbers only (#1201)
fix wrong type text extracted by tesseract_ocr_cli_model
Signed-off-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Co-authored-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Flexibilize heading detection
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Fix trailing space
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Remove trailing space
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
---------
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
Signed-off-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Guilhem VERMOREL <83694424+guilhemvermorel@users.noreply.github.com>
Co-authored-by: gvl4 <Guilhem.VERMOREL@3ds.com>
fix wrong type text extracted by tesseract_ocr_cli_model
Signed-off-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Co-authored-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Markdown fixes:
- properly propagate section header levels
- improve handling of list subroots without text
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
* Update document_converter.py
Fixing caching same class with different options by using composite key (class, options)
# TODO this will ignore if different options have been defined for the same pipeline class.
at row 292
Signed-off-by: mislavmartinic <mislav.martinic@pontistechnology.com>
* formatted script
* removed unnecessary hasattr check
* pre-commit chain run
---------
Signed-off-by: mislavmartinic <mislav.martinic@pontistechnology.com>
* Modifications to identify numbered headers
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Add style check
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
---------
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* feat: Add PPTX notes slides
Presenter notes may have useful information and should also be extracted.
Signed-off-by: Maciej Wieczorek <maciej@wieczorek.co>
* feat: Move presenter notes into furniture
Signed-off-by: Maciej Wieczorek <maciej@wieczorek.co>
---------
Signed-off-by: Maciej Wieczorek <maciej@wieczorek.co>
Update docling-core dependency to version 2.23.3.
Fix regression test of HTML backend after docling-core dependency update.
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* Fixing function return
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Add message
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
---------
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* point the auxiliary files to the community repo and add lfai in README
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* update docs index
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>