feat: Integrate ListItemMarkerProcessor into document assembly (#1825)

* Integrate ListItemMarkerProcessor into document assembly

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update to final version

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update all test cases

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Upgrade deps

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-12-09 13:18:24 +00:00 · 2025-07-01 10:04:58 +02:00
parent bdfee4e2d0
commit 56a0e104f7
24 changed files with 739 additions and 1675 deletions
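The ground-truth updates in this commit follow one pattern: a leading marker such as `(1)` or `[10]` moves out of `text` into `marker`, while `orig` keeps the full string. The sketch below only illustrates that pattern. It is not the actual ListItemMarkerProcessor implementation; the function name `split_marker` and the regular expression are assumptions that cover just the two marker styles visible in this diff.

```python
import re

# Hypothetical pattern: a leading "(n)" or "[n]" marker followed by whitespace.
# The real processor may recognize other marker styles; this covers only the
# two forms that appear in the updated ground truth.
_MARKER_RE = re.compile(r"^\s*(\(\d+\)|\[\d+\])\s+(.*)$", re.DOTALL)


def split_marker(orig: str) -> dict:
    """Return the orig/text/marker fields for one list item (illustrative only)."""
    match = _MARKER_RE.match(orig)
    if match is None:
        # No recognizable marker: text stays identical to orig.
        return {"orig": orig, "text": orig, "marker": ""}
    marker, text = match.groups()
    return {"orig": orig, "text": text, "marker": marker}


item = split_marker("(4) Connected sub-pictures are grouped together in one Picture object.")
print(item["marker"], "|", item["text"])
```

The sketch leaves every other field alone, which matches the diff: `enumerated` stays `false` in each changed entry.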
--- a/tests/data/groundtruth/docling_v2/2206.01062.json
+++ b/tests/data/groundtruth/docling_v2/2206.01062.json
@@ -10862,11 +10862,11 @@
        }
      ],
      "orig": "(1) Human Annotation : In contrast to PubLayNet and DocBank, we relied on human annotation instead of automation approaches to generate the data set.",
-      "text": "(1) Human Annotation : In contrast to PubLayNet and DocBank, we relied on human annotation instead of automation approaches to generate the data set.",
+      "text": "Human Annotation : In contrast to PubLayNet and DocBank, we relied on human annotation instead of automation approaches to generate the data set.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "(1)"
    },
    {
      "self_ref": "#/texts/356",
@@ -10893,11 +10893,11 @@
        }
      ],
      "orig": "(2) Large Layout Variability : We include diverse and complex layouts from a large variety of public sources.",
-      "text": "(2) Large Layout Variability : We include diverse and complex layouts from a large variety of public sources.",
+      "text": "Large Layout Variability : We include diverse and complex layouts from a large variety of public sources.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "(2)"
    },
    {
      "self_ref": "#/texts/357",
@@ -10924,11 +10924,11 @@
        }
      ],
      "orig": "(3) Detailed Label Set : We define 11 class labels to distinguish layout features in high detail. PubLayNet provides 5 labels; DocBank provides 13, although not a superset of ours.",
-      "text": "(3) Detailed Label Set : We define 11 class labels to distinguish layout features in high detail. PubLayNet provides 5 labels; DocBank provides 13, although not a superset of ours.",
+      "text": "Detailed Label Set : We define 11 class labels to distinguish layout features in high detail. PubLayNet provides 5 labels; DocBank provides 13, although not a superset of ours.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "(3)"
    },
    {
      "self_ref": "#/texts/358",
@@ -10955,11 +10955,11 @@
        }
      ],
      "orig": "(4) Redundant Annotations : A fraction of the pages in the DocLayNet data set carry more than one human annotation.",
-      "text": "(4) Redundant Annotations : A fraction of the pages in the DocLayNet data set carry more than one human annotation.",
+      "text": "Redundant Annotations : A fraction of the pages in the DocLayNet data set carry more than one human annotation.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "(4)"
    },
    {
      "self_ref": "#/texts/359",
@@ -11044,11 +11044,11 @@
        }
      ],
      "orig": "(5) Pre-defined Train-, Test- & Validation-set : Like DocBank, we provide fixed train-, test- & validation-sets to ensure proportional representation of the class-labels. Further, we prevent leakage of unique layouts across sets, which has a large effect on model accuracy scores.",
-      "text": "(5) Pre-defined Train-, Test- & Validation-set : Like DocBank, we provide fixed train-, test- & validation-sets to ensure proportional representation of the class-labels. Further, we prevent leakage of unique layouts across sets, which has a large effect on model accuracy scores.",
+      "text": "Pre-defined Train-, Test- & Validation-set : Like DocBank, we provide fixed train-, test- & validation-sets to ensure proportional representation of the class-labels. Further, we prevent leakage of unique layouts across sets, which has a large effect on model accuracy scores.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "(5)"
    },
    {
      "self_ref": "#/texts/362",
@@ -12426,11 +12426,11 @@
        }
      ],
      "orig": "(1) Every list-item is an individual object instance with class label List-item . This definition is different from PubLayNet and DocBank, where all list-items are grouped together into one List object.",
-      "text": "(1) Every list-item is an individual object instance with class label List-item . This definition is different from PubLayNet and DocBank, where all list-items are grouped together into one List object.",
+      "text": "Every list-item is an individual object instance with class label List-item . This definition is different from PubLayNet and DocBank, where all list-items are grouped together into one List object.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "(1)"
    },
    {
      "self_ref": "#/texts/409",
@@ -12457,11 +12457,11 @@
        }
      ],
      "orig": "(2) A List-item is a paragraph with hanging indentation. Singleline elements can qualify as List-item if the neighbour elements expose hanging indentation. Bullet or enumeration symbols are not a requirement.",
-      "text": "(2) A List-item is a paragraph with hanging indentation. Singleline elements can qualify as List-item if the neighbour elements expose hanging indentation. Bullet or enumeration symbols are not a requirement.",
+      "text": "A List-item is a paragraph with hanging indentation. Singleline elements can qualify as List-item if the neighbour elements expose hanging indentation. Bullet or enumeration symbols are not a requirement.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "(2)"
    },
    {
      "self_ref": "#/texts/410",
@@ -12488,11 +12488,11 @@
        }
      ],
      "orig": "(3) For every Caption , there must be exactly one corresponding Picture or Table .",
-      "text": "(3) For every Caption , there must be exactly one corresponding Picture or Table .",
+      "text": "For every Caption , there must be exactly one corresponding Picture or Table .",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "(3)"
    },
    {
      "self_ref": "#/texts/411",
@@ -12519,11 +12519,11 @@
        }
      ],
      "orig": "(4) Connected sub-pictures are grouped together in one Picture object.",
-      "text": "(4) Connected sub-pictures are grouped together in one Picture object.",
+      "text": "Connected sub-pictures are grouped together in one Picture object.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "(4)"
    },
    {
      "self_ref": "#/texts/412",
@@ -12550,11 +12550,11 @@
        }
      ],
      "orig": "(5) Formula numbers are included in a Formula object.",
-      "text": "(5) Formula numbers are included in a Formula object.",
+      "text": "Formula numbers are included in a Formula object.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "(5)"
    },
    {
      "self_ref": "#/texts/413",
@@ -12581,11 +12581,11 @@
        }
      ],
      "orig": "(6) Emphasised text (e.g. in italic or bold) at the beginning of a paragraph is not considered a Section-header , unless it appears exclusively on its own line.",
-      "text": "(6) Emphasised text (e.g. in italic or bold) at the beginning of a paragraph is not considered a Section-header , unless it appears exclusively on its own line.",
+      "text": "Emphasised text (e.g. in italic or bold) at the beginning of a paragraph is not considered a Section-header , unless it appears exclusively on its own line.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "(6)"
    },
    {
      "self_ref": "#/texts/414",
@@ -14709,11 +14709,11 @@
        }
      ],
      "orig": "[1] Max G\u00f6bel, Tamir Hassan, Ermelinda Oro, and Giorgio Orsi. Icdar 2013 table competition. In 2013 12th International Conference on Document Analysis and Recognition , pages 1449-1453, 2013.",
-      "text": "[1] Max G\u00f6bel, Tamir Hassan, Ermelinda Oro, and Giorgio Orsi. Icdar 2013 table competition. In 2013 12th International Conference on Document Analysis and Recognition , pages 1449-1453, 2013.",
+      "text": "Max G\u00f6bel, Tamir Hassan, Ermelinda Oro, and Giorgio Orsi. Icdar 2013 table competition. In 2013 12th International Conference on Document Analysis and Recognition , pages 1449-1453, 2013.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[1]"
    },
    {
      "self_ref": "#/texts/487",
@@ -14740,11 +14740,11 @@
        }
      ],
      "orig": "[2] Christian Clausner, Apostolos Antonacopoulos, and Stefan Pletschacher. Icdar2017 competition on recognition of documents with complex layouts rdcl2017. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) , volume 01, pages 1404-1410, 2017.",
-      "text": "[2] Christian Clausner, Apostolos Antonacopoulos, and Stefan Pletschacher. Icdar2017 competition on recognition of documents with complex layouts rdcl2017. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) , volume 01, pages 1404-1410, 2017.",
+      "text": "Christian Clausner, Apostolos Antonacopoulos, and Stefan Pletschacher. Icdar2017 competition on recognition of documents with complex layouts rdcl2017. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) , volume 01, pages 1404-1410, 2017.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[2]"
    },
    {
      "self_ref": "#/texts/488",
@@ -14771,11 +14771,11 @@
        }
      ],
      "orig": "[3] Herv\u00e9 D\u00e9jean, Jean-Luc Meunier, Liangcai Gao, Yilun Huang, Yu Fang, Florian Kleber, and Eva-Maria Lang. ICDAR 2019 Competition on Table Detection and Recognition (cTDaR), April 2019. http://sac.founderit.com/.",
-      "text": "[3] Herv\u00e9 D\u00e9jean, Jean-Luc Meunier, Liangcai Gao, Yilun Huang, Yu Fang, Florian Kleber, and Eva-Maria Lang. ICDAR 2019 Competition on Table Detection and Recognition (cTDaR), April 2019. http://sac.founderit.com/.",
+      "text": "Herv\u00e9 D\u00e9jean, Jean-Luc Meunier, Liangcai Gao, Yilun Huang, Yu Fang, Florian Kleber, and Eva-Maria Lang. ICDAR 2019 Competition on Table Detection and Recognition (cTDaR), April 2019. http://sac.founderit.com/.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[3]"
    },
    {
      "self_ref": "#/texts/489",
@@ -14802,11 +14802,11 @@
        }
      ],
      "orig": "[4] Antonio Jimeno Yepes, Peter Zhong, and Douglas Burdick. Competition on scientific literature parsing. In Proceedings of the International Conference on Document Analysis and Recognition , ICDAR, pages 605-617. LNCS 12824, SpringerVerlag, sep 2021.",
-      "text": "[4] Antonio Jimeno Yepes, Peter Zhong, and Douglas Burdick. Competition on scientific literature parsing. In Proceedings of the International Conference on Document Analysis and Recognition , ICDAR, pages 605-617. LNCS 12824, SpringerVerlag, sep 2021.",
+      "text": "Antonio Jimeno Yepes, Peter Zhong, and Douglas Burdick. Competition on scientific literature parsing. In Proceedings of the International Conference on Document Analysis and Recognition , ICDAR, pages 605-617. LNCS 12824, SpringerVerlag, sep 2021.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[4]"
    },
    {
      "self_ref": "#/texts/490",
@@ -14833,11 +14833,11 @@
        }
      ],
      "orig": "[5] Logan Markewich, Hao Zhang, Yubin Xing, Navid Lambert-Shirzad, Jiang Zhexin, Roy Lee, Zhi Li, and Seok-Bum Ko. Segmentation for document layout analysis: not dead yet. International Journal on Document Analysis and Recognition (IJDAR) , pages 1-11, 01 2022.",
-      "text": "[5] Logan Markewich, Hao Zhang, Yubin Xing, Navid Lambert-Shirzad, Jiang Zhexin, Roy Lee, Zhi Li, and Seok-Bum Ko. Segmentation for document layout analysis: not dead yet. International Journal on Document Analysis and Recognition (IJDAR) , pages 1-11, 01 2022.",
+      "text": "Logan Markewich, Hao Zhang, Yubin Xing, Navid Lambert-Shirzad, Jiang Zhexin, Roy Lee, Zhi Li, and Seok-Bum Ko. Segmentation for document layout analysis: not dead yet. International Journal on Document Analysis and Recognition (IJDAR) , pages 1-11, 01 2022.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[5]"
    },
    {
      "self_ref": "#/texts/491",
@@ -14864,11 +14864,11 @@
        }
      ],
      "orig": "[6] Xu Zhong, Jianbin Tang, and Antonio Jimeno-Yepes. Publaynet: Largest dataset ever for document layout analysis. In Proceedings of the International Conference on Document Analysis and Recognition , ICDAR, pages 1015-1022, sep 2019.",
-      "text": "[6] Xu Zhong, Jianbin Tang, and Antonio Jimeno-Yepes. Publaynet: Largest dataset ever for document layout analysis. In Proceedings of the International Conference on Document Analysis and Recognition , ICDAR, pages 1015-1022, sep 2019.",
+      "text": "Xu Zhong, Jianbin Tang, and Antonio Jimeno-Yepes. Publaynet: Largest dataset ever for document layout analysis. In Proceedings of the International Conference on Document Analysis and Recognition , ICDAR, pages 1015-1022, sep 2019.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[6]"
    },
    {
      "self_ref": "#/texts/492",
@@ -14895,11 +14895,11 @@
        }
      ],
      "orig": "[7] Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, and Ming Zhou. Docbank: A benchmark dataset for document layout analysis. In Proceedings of the 28th International Conference on Computational Linguistics , COLING, pages 949-960. International Committee on Computational Linguistics, dec 2020.",
-      "text": "[7] Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, and Ming Zhou. Docbank: A benchmark dataset for document layout analysis. In Proceedings of the 28th International Conference on Computational Linguistics , COLING, pages 949-960. International Committee on Computational Linguistics, dec 2020.",
+      "text": "Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, and Ming Zhou. Docbank: A benchmark dataset for document layout analysis. In Proceedings of the 28th International Conference on Computational Linguistics , COLING, pages 949-960. International Committee on Computational Linguistics, dec 2020.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[7]"
    },
    {
      "self_ref": "#/texts/493",
@@ -14926,11 +14926,11 @@
        }
      ],
      "orig": "[8] Riaz Ahmad, Muhammad Tanvir Afzal, and M. Qadir. Information extraction from pdf sources based on rule-based system using integrated formats. In SemWebEval@ESWC , 2016.",
-      "text": "[8] Riaz Ahmad, Muhammad Tanvir Afzal, and M. Qadir. Information extraction from pdf sources based on rule-based system using integrated formats. In SemWebEval@ESWC , 2016.",
+      "text": "Riaz Ahmad, Muhammad Tanvir Afzal, and M. Qadir. Information extraction from pdf sources based on rule-based system using integrated formats. In SemWebEval@ESWC , 2016.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[8]"
    },
    {
      "self_ref": "#/texts/494",
@@ -14957,11 +14957,11 @@
        }
      ],
      "orig": "[9] Ross B. Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition , CVPR, pages 580-587. IEEE Computer Society, jun 2014.",
-      "text": "[9] Ross B. Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition , CVPR, pages 580-587. IEEE Computer Society, jun 2014.",
+      "text": "Ross B. Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition , CVPR, pages 580-587. IEEE Computer Society, jun 2014.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[9]"
    },
    {
      "self_ref": "#/texts/495",
@@ -14988,11 +14988,11 @@
        }
      ],
      "orig": "[10] Ross B. Girshick. Fast R-CNN. In 2015 IEEE International Conference on Computer Vision , ICCV, pages 1440-1448. IEEE Computer Society, dec 2015.",
-      "text": "[10] Ross B. Girshick. Fast R-CNN. In 2015 IEEE International Conference on Computer Vision , ICCV, pages 1440-1448. IEEE Computer Society, dec 2015.",
+      "text": "Ross B. Girshick. Fast R-CNN. In 2015 IEEE International Conference on Computer Vision , ICCV, pages 1440-1448. IEEE Computer Society, dec 2015.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[10]"
    },
    {
      "self_ref": "#/texts/496",
@@ -15019,11 +15019,11 @@
        }
      ],
      "orig": "[11] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence , 39(6):1137-1149, 2017.",
-      "text": "[11] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence , 39(6):1137-1149, 2017.",
+      "text": "Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence , 39(6):1137-1149, 2017.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[11]"
    },
    {
      "self_ref": "#/texts/497",
@@ -15050,11 +15050,11 @@
        }
      ],
      "orig": "[12] Kaiming He, Georgia Gkioxari, Piotr Doll\u00e1r, and Ross B. Girshick. Mask R-CNN. In IEEE International Conference on Computer Vision , ICCV, pages 2980-2988. IEEE Computer Society, Oct 2017.",
-      "text": "[12] Kaiming He, Georgia Gkioxari, Piotr Doll\u00e1r, and Ross B. Girshick. Mask R-CNN. In IEEE International Conference on Computer Vision , ICCV, pages 2980-2988. IEEE Computer Society, Oct 2017.",
+      "text": "Kaiming He, Georgia Gkioxari, Piotr Doll\u00e1r, and Ross B. Girshick. Mask R-CNN. In IEEE International Conference on Computer Vision , ICCV, pages 2980-2988. IEEE Computer Society, Oct 2017.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[12]"
    },
    {
      "self_ref": "#/texts/498",
@@ -15081,11 +15081,11 @@
        }
      ],
      "orig": "[13] Glenn Jocher, Alex Stoken, Ayush Chaurasia, Jirka Borovec, NanoCode012, TaoXie, Yonghye Kwon, Kalen Michael, Liu Changyu, Jiacong Fang, Abhiram V, Laughing, tkianai, yxNONG, Piotr Skalski, Adam Hogan, Jebastin Nadar, imyhxy, Lorenzo Mammana, Alex Wang, Cristi Fati, Diego Montes, Jan Hajek, Laurentiu",
-      "text": "[13] Glenn Jocher, Alex Stoken, Ayush Chaurasia, Jirka Borovec, NanoCode012, TaoXie, Yonghye Kwon, Kalen Michael, Liu Changyu, Jiacong Fang, Abhiram V, Laughing, tkianai, yxNONG, Piotr Skalski, Adam Hogan, Jebastin Nadar, imyhxy, Lorenzo Mammana, Alex Wang, Cristi Fati, Diego Montes, Jan Hajek, Laurentiu",
+      "text": "Glenn Jocher, Alex Stoken, Ayush Chaurasia, Jirka Borovec, NanoCode012, TaoXie, Yonghye Kwon, Kalen Michael, Liu Changyu, Jiacong Fang, Abhiram V, Laughing, tkianai, yxNONG, Piotr Skalski, Adam Hogan, Jebastin Nadar, imyhxy, Lorenzo Mammana, Alex Wang, Cristi Fati, Diego Montes, Jan Hajek, Laurentiu",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[13]"
    },
    {
      "self_ref": "#/texts/499",
@@ -15576,11 +15576,11 @@
        }
      ],
      "orig": "[20] Shoubin Li, Xuyan Ma, Shuaiqun Pan, Jun Hu, Lin Shi, and Qing Wang. Vtlayout: Fusion of visual and text features for document layout analysis, 2021.",
-      "text": "[20] Shoubin Li, Xuyan Ma, Shuaiqun Pan, Jun Hu, Lin Shi, and Qing Wang. Vtlayout: Fusion of visual and text features for document layout analysis, 2021.",
+      "text": "Shoubin Li, Xuyan Ma, Shuaiqun Pan, Jun Hu, Lin Shi, and Qing Wang. Vtlayout: Fusion of visual and text features for document layout analysis, 2021.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[20]"
    },
    {
      "self_ref": "#/texts/516",
@@ -15607,11 +15607,11 @@
        }
      ],
      "orig": "[14] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. CoRR , abs/2005.12872, 2020.",
-      "text": "[14] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. CoRR , abs/2005.12872, 2020.",
+      "text": "Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. CoRR , abs/2005.12872, 2020.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[14]"
    },
    {
      "self_ref": "#/texts/517",
@@ -15638,11 +15638,11 @@
        }
      ],
      "orig": "[15] Mingxing Tan, Ruoming Pang, and Quoc V. Le. Efficientdet: Scalable and efficient object detection. CoRR , abs/1911.09070, 2019.",
-      "text": "[15] Mingxing Tan, Ruoming Pang, and Quoc V. Le. Efficientdet: Scalable and efficient object detection. CoRR , abs/1911.09070, 2019.",
+      "text": "Mingxing Tan, Ruoming Pang, and Quoc V. Le. Efficientdet: Scalable and efficient object detection. CoRR , abs/1911.09070, 2019.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[15]"
    },
    {
      "self_ref": "#/texts/518",
@@ -15669,11 +15669,11 @@
        }
      ],
      "orig": "[16] Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C. Lawrence Zitnick. Microsoft COCO: common objects in context, 2014.",
-      "text": "[16] Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C. Lawrence Zitnick. Microsoft COCO: common objects in context, 2014.",
+      "text": "Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C. Lawrence Zitnick. Microsoft COCO: common objects in context, 2014.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[16]"
    },
    {
      "self_ref": "#/texts/519",
@@ -15700,11 +15700,11 @@
        }
      ],
      "orig": "[17] Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. Detectron2, 2019.",
-      "text": "[17] Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. Detectron2, 2019.",
+      "text": "Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. Detectron2, 2019.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[17]"
    },
    {
      "self_ref": "#/texts/520",
@@ -15731,11 +15731,11 @@
        }
      ],
      "orig": "[18] Nikolaos Livathinos, Cesar Berrospi, Maksym Lysak, Viktor Kuropiatnyk, Ahmed Nassar, Andre Carvalho, Michele Dolfi, Christoph Auer, Kasper Dinkla, and Peter W. J. Staar. Robust pdf document conversion using recurrent neural networks. In Proceedings of the 35th Conference on Artificial Intelligence , AAAI, pages 1513715145, feb 2021.",
-      "text": "[18] Nikolaos Livathinos, Cesar Berrospi, Maksym Lysak, Viktor Kuropiatnyk, Ahmed Nassar, Andre Carvalho, Michele Dolfi, Christoph Auer, Kasper Dinkla, and Peter W. J. Staar. Robust pdf document conversion using recurrent neural networks. In Proceedings of the 35th Conference on Artificial Intelligence , AAAI, pages 1513715145, feb 2021.",
+      "text": "Nikolaos Livathinos, Cesar Berrospi, Maksym Lysak, Viktor Kuropiatnyk, Ahmed Nassar, Andre Carvalho, Michele Dolfi, Christoph Auer, Kasper Dinkla, and Peter W. J. Staar. Robust pdf document conversion using recurrent neural networks. In Proceedings of the 35th Conference on Artificial Intelligence , AAAI, pages 1513715145, feb 2021.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[18]"
    },
    {
      "self_ref": "#/texts/521",
@@ -15762,11 +15762,11 @@
        }
      ],
      "orig": "[19] Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and Ming Zhou. Layoutlm: Pre-training of text and layout for document image understanding. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , KDD, pages 1192-1200, New York, USA, 2020. Association for Computing Machinery.",
-      "text": "[19] Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and Ming Zhou. Layoutlm: Pre-training of text and layout for document image understanding. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , KDD, pages 1192-1200, New York, USA, 2020. Association for Computing Machinery.",
+      "text": "Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and Ming Zhou. Layoutlm: Pre-training of text and layout for document image understanding. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , KDD, pages 1192-1200, New York, USA, 2020. Association for Computing Machinery.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[19]"
    },
    {
      "self_ref": "#/texts/522",
@@ -15793,11 +15793,11 @@
        }
      ],
      "orig": "[21] Peng Zhang, Can Li, Liang Qiao, Zhanzhan Cheng, Shiliang Pu, Yi Niu, and Fei Wu. Vsr: A unified framework for document layout analysis combining vision, semantics and relations, 2021.",
-      "text": "[21] Peng Zhang, Can Li, Liang Qiao, Zhanzhan Cheng, Shiliang Pu, Yi Niu, and Fei Wu. Vsr: A unified framework for document layout analysis combining vision, semantics and relations, 2021.",
+      "text": "Peng Zhang, Can Li, Liang Qiao, Zhanzhan Cheng, Shiliang Pu, Yi Niu, and Fei Wu. Vsr: A unified framework for document layout analysis combining vision, semantics and relations, 2021.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[21]"
    },
    {
      "self_ref": "#/texts/523",
@@ -15824,11 +15824,11 @@
        }
      ],
      "orig": "[22] Peter W J Staar, Michele Dolfi, Christoph Auer, and Costas Bekas. Corpus conversion service: A machine learning platform to ingest documents at scale. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , KDD, pages 774-782. ACM, 2018.",
-      "text": "[22] Peter W J Staar, Michele Dolfi, Christoph Auer, and Costas Bekas. Corpus conversion service: A machine learning platform to ingest documents at scale. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , KDD, pages 774-782. ACM, 2018.",
+      "text": "Peter W J Staar, Michele Dolfi, Christoph Auer, and Costas Bekas. Corpus conversion service: A machine learning platform to ingest documents at scale. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , KDD, pages 774-782. ACM, 2018.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[22]"
    },
    {
      "self_ref": "#/texts/524",
@@ -15855,11 +15855,11 @@
        }
      ],
      "orig": "[23] Connor Shorten and Taghi M. Khoshgoftaar. A survey on image data augmentation for deep learning. Journal of Big Data , 6(1):60, 2019.",
-      "text": "[23] Connor Shorten and Taghi M. Khoshgoftaar. A survey on image data augmentation for deep learning. Journal of Big Data , 6(1):60, 2019.",
+      "text": "Connor Shorten and Taghi M. Khoshgoftaar. A survey on image data augmentation for deep learning. Journal of Big Data , 6(1):60, 2019.",
      "formatting": null,
      "hyperlink": null,
      "enumerated": false,
-      "marker": ""
+      "marker": "[23]"
    }
  ],
  "pictures": [