Commit Graph

  • 6cd81e251a Include furniture, Update tests with furniture Christoph Auer 2025-02-20 14:22:47 +0100
  • 7f3a34194d test: avoid testing exact JSON Cesar Berrospi Ramis 2025-02-20 14:06:45 +0100
  • 26dda63555 sanitize text Michele Dolfi 2025-02-20 13:10:15 +0100
  • fa6b7eeec4 Push final lockfile Christoph Auer 2025-02-20 12:00:55 +0100
  • 9b4328c817 fix: Runtime error when Pandas Series is not always of string type fan 2025-02-20 18:40:31 +0800
  • a89c19105c Update tests with code Christoph Auer 2025-02-20 08:56:38 +0100
  • 53ee8ea1d8 Merge from main Christoph Auer 2025-02-19 17:27:40 +0100
  • 857d6c4292 Add normalization, update tests again Christoph Auer 2025-02-19 16:55:20 +0100
  • dfcc30dddb chore: Update tests and lockfile (#1021) Christoph Auer 2025-02-19 16:51:53 +0100
  • eb67337e51 Fixes, update tests Christoph Auer 2025-02-19 16:28:21 +0100
  • 7d55102605 Merge branch 'cau/fix-tests' of github.com:DS4SD/docling into cau/integrate-reading-order Christoph Auer 2025-02-19 15:55:53 +0100
  • c3ac8b392a Update tests and lockfile Christoph Auer 2025-02-19 15:54:07 +0100
  • 4e68da99b6 Add children to output after reading-order Christoph Auer 2025-02-19 15:28:43 +0100
  • 27c04007bc docs: revamp picture description example (#1015) Panos Vagenas 2025-02-19 11:28:54 +0100
  • d788bf2a6e Merge from main Christoph Auer 2025-02-19 10:29:39 +0100
  • 0df19cadc9 Improvements for visualization example (#1017) Michele Dolfi 2025-02-19 09:32:57 +0100
  • 287e621c7a show other vlm Michele Dolfi 2025-02-19 08:25:57 +0100
  • 753c12b29e show results with all models Michele Dolfi 2025-02-19 08:21:42 +0100
  • e69b02eeeb switch docs to notebook Michele Dolfi 2025-02-19 07:12:34 +0100
  • 413ffd18bd fix colab install, use granite and improve viz of description Michele Dolfi 2025-02-19 07:10:41 +0100
  • 122466f15b docs: revamp picture description example Panos Vagenas 2025-02-18 22:54:07 +0100
  • 27d59ad12e apply pre-commit after rebase Michele Dolfi 2025-02-18 13:20:22 +0100
  • 0fa9e14a7b add factory for ocr engines Michele Dolfi 2025-02-18 13:07:30 +0100
  • 7450050ace refactor: upgrade BeautifulSoup4 with type hints (#999) Cesar Berrospi Ramis 2025-02-18 11:30:47 +0100
  • 8606b598dc Merge from main Christoph Auer 2025-02-18 11:24:53 +0100
  • dadff50589 fix: Disable the TOKENIZERS_PARALLELISM in test_e2e_ocr_conversion.py to avoid warning messages from HF nli/fix_ocr_tests Nikos Livathinos 2025-02-18 10:58:11 +0100
  • cb4c69700c build: allow beautifulsoup4 version 4.12.3 Cesar Berrospi Ramis 2025-02-18 10:13:32 +0100
  • 4201049929 refactor: upgrade BeautifulSoup4 with type hints Cesar Berrospi Ramis 2025-02-17 16:09:00 +0100
  • 75db61127c chore: bump version to 2.23.0 [skip ci] v2.23.0 github-actions[bot] 2025-02-17 14:22:49 +0000
  • 6e75f0b5d3 fix: Revise DocTags, fix iterate_items to output content_layer in items (#965) Maxim Lysak 2025-02-17 14:11:55 +0100
  • 6118540adb Clean up imports Christoph Auer 2025-02-17 13:15:59 +0100
  • 0649e5b8c2 Update deps and lockfile Christoph Auer 2025-02-17 13:14:00 +0100
  • 92b461b9ab Merge branch 'main' of github.com:DS4SD/docling into dev/test_dt_cleanup Christoph Auer 2025-02-17 11:36:09 +0100
  • ed6631b60f Update test cases for office formats Christoph Auer 2025-02-17 11:36:02 +0100
  • 77eb77bdc2 feat: Support cuda:n GPU device allocation (#694) Ahmed Nassar 2025-02-17 11:31:13 +0100
  • 428b656793 feat(xml-jats): parse XML JATS documents (#967) Cesar Berrospi Ramis 2025-02-17 10:43:31 +0100
  • 69cb31799d Revert accelerator test options Signed-off-by: ahn <ahn@zurich.ibm.com> ahn 2025-02-17 10:20:25 +0100
  • 4b8396cde3 Fixed rebased issues Signed-off-by: ahn <ahn@zurich.ibm.com> ahn 2025-02-17 10:18:13 +0100
  • a8d1cdfaa5 Reset some options to default, removed EasyOCR model wrap. Signed-off-by: ahn <ahn@zurich.ibm.com> ahn 2025-02-17 09:48:00 +0100
  • 0bca78a84b chore: Accept AcceleratorDevice enum type Christoph Auer 2025-02-03 14:32:44 +0100
  • dd0728e646 Pydantic field validator and comment restored. ahn 2025-01-31 16:21:36 +0100
  • b9668877be Fixes pydantic exception with cuda:n Signed-off-by: ahn <ahn@zurich.ibm.com> ahn 2025-01-07 14:41:38 +0100
  • fd51a7fa1f Adding multi-gpu support, and cuda device allocation ahn 2025-01-03 17:02:33 +0100
  • e1436a8b05 test: validate actual docitems in tests (#966) Michele Dolfi 2025-02-14 17:47:53 +0100
  • a80d350b38 fix: Update fixes Christoph Auer 2025-02-14 15:46:26 +0100
  • f8600756ec disable test generation Michele Dolfi 2025-02-14 15:24:08 +0100
  • 38838a2998 remove verbose print Michele Dolfi 2025-02-14 15:18:25 +0100
  • 3486fbe9a9 fix: Fix code-formula model for new docling-core Christoph Auer 2025-02-14 15:17:59 +0100
  • 93eb9de871 chore(xml-jats): rename PubMed objects to JATS Cesar Berrospi Ramis 2025-02-14 14:59:57 +0100
  • f46c3990b0 fix: Fix code_formula test unit, update test-cases Christoph Auer 2025-02-14 14:54:21 +0100
  • 8ab99f9db7 validate actual docitems in tests Michele Dolfi 2025-02-14 14:33:22 +0100
  • 744bf25231 Testing fix for docling-core dt Maksym Lysak 2025-02-14 14:15:07 +0100
  • 011dd6ce96 feat(xml-jats): improve existing parser and extend features Cesar Berrospi Ramis 2025-02-14 13:34:09 +0100
  • c9ae5f6545 fix(xml-jats): replace new line character by a space Cesar Berrospi Ramis 2025-02-05 19:26:20 +0100
  • 21a99fc27d chore(xml-jats): separate authors and affiliations Cesar Berrospi Ramis 2025-02-05 17:46:46 +0100
  • b5b1ddca3b chore: Restore the orphan clusters Nikos Livathinos 2025-02-14 11:13:54 +0100
  • ffbde1d1b0 chore: bump version to 2.22.0 [skip ci] v2.22.0 github-actions[bot] 2025-02-14 08:53:20 +0000
  • 00d9405b0a feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) Tobias Strebitzer 2025-02-14 15:55:09 +0800
  • c04a58306c feat: Add validation for delimiters and tests for inconsistent csv files Tobias Strebitzer 2025-02-14 08:44:00 +0800
  • 7493d5b01f docs: update example Dockerfile with download CLI (#929) Michele Dolfi 2025-02-13 14:19:50 +0100
  • af19c03f6e fix: update Pillow constraints (#958) Michele Dolfi 2025-02-13 14:19:37 +0100
  • dd9bae29f3 update example Dockerfile with download CLI Michele Dolfi 2025-02-13 13:26:11 +0100
  • cf2d1b2ff3 update pillow and lock deps Michele Dolfi 2025-02-13 12:29:34 +0100
  • 2d66e99b69 docs: Examples for picture descriptions (#951) Michele Dolfi 2025-02-13 08:33:12 +0100
  • 1ca87f5d8c feat: Add support for various CSV dialects and update documentation Tobias Strebitzer 2025-02-13 08:37:02 +0800
  • 7a7cc1856b fix merge typo Michele Dolfi 2025-02-12 17:19:57 +0100
  • fae4d10479 updated poetry to ref correct branch in docling-core. Updated tests gt mao/doctags Matteo-Omenetti 2025-02-12 17:04:26 +0100
  • 4d403a6b0f add more examples for picture descriptions Michele Dolfi 2025-02-12 13:38:55 +0100
  • 2716c7d4ff feat: Introduce the enable_remote_services option to allow remote connections while processing (#941) Michele Dolfi 2025-02-12 15:18:01 +0100
  • 48777b17fa Updates for reading-order implementation Christoph Auer 2025-02-12 15:14:44 +0100
  • 654b806c69 rename to enable_remote_services Michele Dolfi 2025-02-12 14:35:52 +0100
  • 3db14e1c59 updated poetry to ref correct branch in docling-core Matteo-Omenetti 2025-02-12 13:48:15 +0100
  • 5101e2519e feat: allow artifacts_path to be defined as ENV (#940) Michele Dolfi 2025-02-12 13:08:37 +0100
  • e824996406 fix: Measure the layout mAP without the orphan clusters. Nikos Livathinos 2025-02-12 10:01:52 +0100
  • 79eed3ef08 docs: Add example and CSV format documentation Tobias Strebitzer 2025-02-12 12:11:32 +0800
  • d91ea7b186 test: Implement csv parsing and format tests Tobias Strebitzer 2025-02-12 11:37:47 +0800
  • d64f2bb0ab feat: Implement csv backend and format detection Tobias Strebitzer 2025-02-12 11:37:24 +0800
  • 9c3a70ac9c enhance docs Michele Dolfi 2025-02-11 13:24:01 +0100
  • 8acd9aae4a add check if artifacts_path exists and is dir Michele Dolfi 2025-02-11 13:01:20 +0100
  • 6b1d88b54a add option in the example Michele Dolfi 2025-02-11 12:55:33 +0100
  • 291c91fe96 feat: Introduce the allow_remote_services option to allow remote connections while processing Michele Dolfi 2025-02-11 12:51:36 +0100
  • c47ae700ec fix: Fix the initialization of the TesseractOcrModel (#935) Nikos Livathinos 2025-02-11 12:27:12 +0100
  • 686b3a0616 allow the artifacts_path to be defined as ENV Michele Dolfi 2025-02-11 10:32:30 +0100
  • cc0e64b0cb fix: Fix the initialization of the TesseractOcrModel Nikos Livathinos 2025-02-10 17:47:28 +0100
  • 27b896b938 Updates for reading-order implementation Christoph Auer 2025-02-10 16:59:52 +0100
  • a6ee5a4326 Add captions, footnotes and merges [skip ci] Christoph Auer 2025-02-10 14:08:00 +0100
  • 898a497e71 feat: add support for user-provided OCR model vdaleke 2025-02-07 14:36:28 +0300
  • 5aebaf58de Merge branch 'main' of github.com:DS4SD/docling into cau/integrate-reading-order Christoph Auer 2025-02-10 12:46:16 +0100
  • 46d7342671 Update lockfile [skip ci] Christoph Auer 2025-02-10 12:45:37 +0100
  • 2046ffbbb0 Merge from main Christoph Auer 2025-02-10 12:43:37 +0100
  • de462090e7 chore: bump version to 2.21.0 [skip ci] v2.21.0 github-actions[bot] 2025-02-10 11:43:05 +0000
  • cf78d5b7b9 feat: Add content_layer property to items to address body, furniture and other roles (#735) Christoph Auer 2025-02-10 12:07:49 +0100
  • 3850a45356 Update lock to final docling-core Christoph Auer 2025-02-10 11:28:44 +0100
  • 4b6e8bc910 Update lock Christoph Auer 2025-02-10 11:08:27 +0100
  • e49fa7ec4f Update lock Christoph Auer 2025-02-10 10:46:46 +0100
  • f875fbc6cf Update reading-order model branch Christoph Auer 2025-02-10 09:51:52 +0100
  • 3e26597995 chore: bump version to 2.20.0 [skip ci] v2.20.0 github-actions[bot] 2025-02-07 17:46:36 +0000
  • c18f47c5c0 fix: remove unused httpx (#919) Michele Dolfi 2025-02-07 17:51:31 +0100
  • 5156f8e197 remove more usage of httpx Michele Dolfi 2025-02-07 16:59:00 +0100
  • b686e5ab88 use requests instead of httpx Michele Dolfi 2025-02-07 16:58:11 +0100