mirror of
https://github.com/DS4SD/docling.git
synced 2025-12-08 20:58:11 +00:00
feat: Layout model specification and multiple choices (#1910)
* Establish layout_model spec and example instantations Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Updated naming Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Back to uppercase constants Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * fix deps issue with openai-whipser>numba>llvmlite Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Pull v1 changed test GT from main Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
This commit is contained in:
@@ -322,7 +322,7 @@ Computer Vision and Pattern Recognition , pages 658-666, 2019. 6
|
||||
- [35] Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. Image captioning with semantic attention. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 4651-4659, 2016. 4
|
||||
- [36] Xinyi Zheng, Doug Burdick, Lucian Popa, Peter Zhong, and Nancy Xin Ru Wang. Global table extractor (gte): A framework for joint table identification and cell structure recognition using visual context. Winter Conference for Applications in Computer Vision (WACV) , 2021. 2, 3
|
||||
- [37] Xu Zhong, Elaheh ShafieiBavani, and Antonio Jimeno Yepes. Image-based table recognition: Data, model,
|
||||
- and evaluation. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision ECCV 2020 , pages 564-580, Cham, 2020. Springer International Publishing. 2, 3, 7
|
||||
13. and evaluation. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision ECCV 2020 , pages 564-580, Cham, 2020. Springer International Publishing. 2, 3, 7
|
||||
- [38] Xu Zhong, Jianbin Tang, and Antonio Jimeno Yepes. Publaynet: Largest dataset ever for document layout analysis. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 1015-1022, 2019. 1
|
||||
|
||||
## TableFormer: Table Structure Understanding with Transformers Supplementary Material
|
||||
@@ -369,7 +369,7 @@ Here is a step-by-step description of the prediction postprocessing:
|
||||
1. Get the minimal grid dimensions - number of rows and columns for the predicted table structure. This represents the most granular grid for the underlying table structure.
|
||||
2. Generate pair-wise matches between the bounding boxes of the PDF cells and the predicted cells. The Intersection Over Union (IOU) metric is used to evaluate the quality of the matches.
|
||||
3. Use a carefully selected IOU threshold to designate the matches as "good" ones and "bad" ones.
|
||||
- 3.a. If all IOU scores in a column are below the threshold, discard all predictions (structure and bounding boxes) for that column.
|
||||
4. 3.a. If all IOU scores in a column are below the threshold, discard all predictions (structure and bounding boxes) for that column.
|
||||
4. Find the best-fitting content alignment for the predicted cells with good IOU per each column. The alignment of the column can be identified by the following formula:
|
||||
|
||||
<!-- formula-not-decoded -->
|
||||
|
||||
Reference in New Issue
Block a user