docling/tests/data
Matteo 3213b247ad
feat: Code and equation model for PDF and code blocks in markdown (#752)
* propagated changes for new CodeItem class

Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>

* Rebased branch on latest main. changes for CodeItem

Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>

* removed unused files

Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>

* chore: update lockfile

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* pin latest docling-core

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update docling-core pinning

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* pin docling-core

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use new add_code in backends and update typing in MD backend

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* added if statement for backend

Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>

* removed unused import

Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>

* removed print statements

Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>

* gt for new pdf

Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>

* Update docling/pipeline/standard_pdf_pipeline.py

Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Signed-off-by: Matteo <43417658+Matteo-Omenetti@users.noreply.github.com>

* fixed doc comment of __call__ function of code_formula_model

Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>

* fix artifacts_path type

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* move imports

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* move expansion_factor to base class

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Matteo <43417658+Matteo-Omenetti@users.noreply.github.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
2025-01-24 16:54:22 +01:00
Name | Last commit | Date
docx | fix: Fixes for wordx (#432) | 2024-11-26 14:44:43 +01:00
groundtruth | feat: Code and equation model for PDF and code blocks in markdown (#752) | 2025-01-24 16:54:22 +01:00
html | fix: fix duplicate title and heading + add e2e tests for html and docx (#186) | 2024-10-30 13:14:56 +01:00
md | feat: expose new hybrid chunker, update docs (#384) | 2024-12-09 08:28:29 +01:00
pptx | feat: Extracting picture data for raster images found in PPTX (#349) | 2024-11-18 15:22:28 +01:00
pubmed | feat: Create a backend to transform PubMed XML files to DoclingDocument (#557) | 2024-12-17 19:27:09 +01:00
uspto | feat: create a backend to parse USPTO patents into DoclingDocument (#606) | 2024-12-17 16:35:23 +01:00
xlsx | feat: added excel backend (#334) | 2024-11-19 12:21:17 +01:00
2203.01017v2.pdf | fix: Add unit tests (#51) | 2024-08-30 14:08:20 +02:00
2206.01062.pdf | fix: Add unit tests (#51) | 2024-08-30 14:08:20 +02:00
2305.03393v1-pg9-img.png | feat!: Docling v2 (#117) | 2024-10-16 21:02:03 +02:00
2305.03393v1-pg9.pdf | fix: Add unit tests (#51) | 2024-08-30 14:08:20 +02:00
2305.03393v1.pdf | fix: Add unit tests (#51) | 2024-08-30 14:08:20 +02:00
code_and_formula.pdf | feat: Code and equation model for PDF and code blocks in markdown (#752) | 2025-01-24 16:54:22 +01:00
redp5110_sampled.pdf | chore: make tests lighter (#228) | 2024-11-04 14:02:28 +01:00
test_01.asciidoc | feat: Support AsciiDoc and Markdown input format (#168) | 2024-10-23 16:14:26 +02:00
test_02.asciidoc | feat: Support AsciiDoc and Markdown input format (#168) | 2024-10-23 16:14:26 +02:00