Updated test ground-truth (again), bugfix for empty layout

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

Christoph Auer 2024-12-09 13:50:04 +01:00
parent 731e48ea43
commit fbb28b851d
27 changed files with 70 additions and 275 deletions

@@ -236,6 +236,9 @@ class LayoutPostprocessor:
         # Initial cell assignment
         clusters = self._assign_cells_to_clusters(clusters)
+        # Remove clusters with no cells
+        clusters = [cluster for cluster in clusters if cluster.cells]
         # Handle orphaned cells
         unassigned = self._find_unassigned_cells(clusters)
         if unassigned:
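
The filter added in the `LayoutPostprocessor` hunk can be tried in isolation. The following is a minimal sketch, not the project's implementation: the `Cluster` dataclass and the `drop_empty_clusters` helper are hypothetical stand-ins, and the only assumption carried over from the diff is that a cluster exposes a `cells` sequence that is empty when no cells were assigned.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Hypothetical stand-in for a layout cluster; only `cells` matters here."""
    label: str
    cells: List[str] = field(default_factory=list)


def drop_empty_clusters(clusters: List[Cluster]) -> List[Cluster]:
    """Keep only clusters that received at least one cell (same test as the diff)."""
    return [cluster for cluster in clusters if cluster.cells]


clusters = [
    Cluster("text", ["cell-1", "cell-2"]),
    Cluster("picture"),  # no cells were assigned to this cluster
    Cluster("table", ["cell-3"]),
]
print([c.label for c in drop_empty_clusters(clusters)])  # ['text', 'table']
print(drop_empty_clusters([]))  # [] (an empty layout stays empty)
```

Because the comprehension relies on truthiness, it treats an empty list and `None` alike, which is presumably why it is safe to run on a page whose layout produced no cells at all.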

@@ -10,11 +10,11 @@
 <figure>
 <location><page_1><loc_52><loc_62><loc_88><loc_71></location>
 </figure>
-<subtitle-level-1><location><page_1><loc_52><loc_58><loc_79><loc_60></location>b. Red-annotation of bounding boxes, Blue-predictions by TableFormer</subtitle-level-1>
+<paragraph><location><page_1><loc_52><loc_58><loc_79><loc_60></location>- b. Red-annotation of bounding boxes, Blue-predictions by TableFormer</paragraph>
 <figure>
 <location><page_1><loc_51><loc_48><loc_88><loc_57></location>
 </figure>
-<subtitle-level-1><location><page_1><loc_52><loc_46><loc_80><loc_47></location>c. Structure predicted by TableFormer:</subtitle-level-1>
+<paragraph><location><page_1><loc_52><loc_46><loc_80><loc_47></location>- c. Structure predicted by TableFormer:</paragraph>
 <caption><location><page_1><loc_50><loc_29><loc_89><loc_35></location>Figure 1: Picture of a table with subtle, complex features such as (1) multi-column headers, (2) cell with multi-row text and (3) cells with no content. Image from PubTabNet evaluation set, filename: 'PMC2944238 004 02'.</caption>
 <figure>
 <location><page_1><loc_52><loc_37><loc_88><loc_45></location>
@@ -152,7 +152,7 @@
 <row_6><col_0><row_header>TableFormer</col_0><col_1><body>95.4</col_1><col_2><body>90.1</col_2><col_3><body>93.6</col_3></row_6>
 </table>
 <paragraph><location><page_8><loc_9><loc_89><loc_10><loc_90></location>- a.</paragraph>
-<paragraph><location><page_8><loc_11><loc_89><loc_82><loc_90></location>Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells</paragraph>
+<paragraph><location><page_8><loc_11><loc_89><loc_82><loc_90></location>- Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells</paragraph>
 <paragraph><location><page_8><loc_9><loc_87><loc_46><loc_88></location>Japanese language (previously unseen by TableFormer):</paragraph>
 <paragraph><location><page_8><loc_50><loc_87><loc_70><loc_88></location>Example table from FinTabNet:</paragraph>
 <figure>
@@ -283,7 +283,7 @@
 <paragraph><location><page_12><loc_8><loc_13><loc_47><loc_16></location>where c is one of { left, centroid, right } and x$_{c}$ is the xcoordinate for the corresponding point.</paragraph>
 <paragraph><location><page_12><loc_50><loc_13><loc_89><loc_16></location>- 9d. Intersect the orphan's bounding box with the column bands, and map the cell to the closest grid column.</paragraph>
 <paragraph><location><page_12><loc_8><loc_10><loc_47><loc_13></location>- 5. Use the alignment computed in step 4, to compute the median x -coordinate for all table columns and the me-</paragraph>
-<paragraph><location><page_12><loc_50><loc_10><loc_89><loc_13></location>9e. If the table cell under the identified row and column is not empty, extend its content with the content of the or-</paragraph>
+<paragraph><location><page_12><loc_50><loc_10><loc_89><loc_13></location>- 9e. If the table cell under the identified row and column is not empty, extend its content with the content of the or-</paragraph>
 <paragraph><location><page_12><loc_50><loc_21><loc_89><loc_23></location>- 9b. Intersect the orphan's bounding box with the row bands, and map the cell to the closest grid row.</paragraph>
 <paragraph><location><page_12><loc_50><loc_16><loc_89><loc_20></location>- 9c. Compute the left and right boundary of the vertical band for each grid column (min/max x coordinates per column).</paragraph>
 <paragraph><location><page_12><loc_50><loc_42><loc_89><loc_51></location>- 8. In some rare occasions, we have noticed that TableFormer can confuse a single column as two. When the postprocessing steps are applied, this results with two predicted columns pointing to the same PDF column. In such case we must de-duplicate the columns according to highest total column intersection score.</paragraph>
@@ -293,79 +293,22 @@
 <paragraph><location><page_13><loc_8><loc_89><loc_15><loc_91></location>phan cell.</paragraph>
 <paragraph><location><page_13><loc_8><loc_86><loc_47><loc_89></location>9f. Otherwise create a new structural cell and match it wit the orphan cell.</paragraph>
 <paragraph><location><page_13><loc_8><loc_83><loc_47><loc_86></location>Aditional images with examples of TableFormer predictions and post-processing can be found below.</paragraph>
-<subtitle-level-1><location><page_13><loc_14><loc_81><loc_18><loc_81></location></subtitle-level-1>
-<table>
-<location><page_13><loc_14><loc_73><loc_39><loc_80></location>
-</table>
-<subtitle-level-1><location><page_13><loc_14><loc_71><loc_30><loc_72></location></subtitle-level-1>
-<table>
-<location><page_13><loc_14><loc_63><loc_39><loc_70></location>
-</table>
-<subtitle-level-1><location><page_13><loc_14><loc_61><loc_27><loc_62></location></subtitle-level-1>
-<table>
-<location><page_13><loc_14><loc_54><loc_39><loc_61></location>
-</table>
-<subtitle-level-1><location><page_13><loc_14><loc_50><loc_27><loc_51></location></subtitle-level-1>
-<caption><location><page_13><loc_10><loc_35><loc_45><loc_37></location>Figure 8: Example of a table with multi-line header.</caption>
-<table>
-<location><page_13><loc_14><loc_38><loc_41><loc_50></location>
-<caption>Figure 8: Example of a table with multi-line header.</caption>
-</table>
-<subtitle-level-1><location><page_13><loc_51><loc_87><loc_54><loc_88></location></subtitle-level-1>
-<table>
-<location><page_13><loc_51><loc_83><loc_91><loc_87></location>
-</table>
-<subtitle-level-1><location><page_13><loc_51><loc_81><loc_62><loc_82></location></subtitle-level-1>
-<table>
-<location><page_13><loc_51><loc_77><loc_91><loc_80></location>
-</table>
-<subtitle-level-1><location><page_13><loc_51><loc_75><loc_60><loc_76></location></subtitle-level-1>
-<table>
-<location><page_13><loc_51><loc_71><loc_91><loc_75></location>
-</table>
-<subtitle-level-1><location><page_13><loc_51><loc_68><loc_60><loc_69></location></subtitle-level-1>
+<paragraph><location><page_13><loc_10><loc_35><loc_45><loc_37></location>Figure 8: Example of a table with multi-line header.</paragraph>
 <caption><location><page_13><loc_50><loc_59><loc_89><loc_61></location>Figure 9: Example of a table with big empty distance between cells.</caption>
 <figure>
 <location><page_13><loc_51><loc_63><loc_70><loc_68></location>
 <caption>Figure 9: Example of a table with big empty distance between cells.</caption>
 </figure>
-<subtitle-level-1><location><page_13><loc_55><loc_51><loc_58><loc_52></location></subtitle-level-1>
-<table>
-<location><page_13><loc_55><loc_45><loc_80><loc_51></location>
-</table>
-<subtitle-level-1><location><page_13><loc_55><loc_43><loc_69><loc_44></location></subtitle-level-1>
-<table>
-<location><page_13><loc_55><loc_37><loc_80><loc_43></location>
-</table>
-<subtitle-level-1><location><page_13><loc_55><loc_35><loc_67><loc_36></location></subtitle-level-1>
-<table>
-<location><page_13><loc_55><loc_28><loc_80><loc_34></location>
-</table>
-<subtitle-level-1><location><page_13><loc_55><loc_25><loc_66><loc_26></location></subtitle-level-1>
 <caption><location><page_13><loc_51><loc_13><loc_89><loc_14></location>Figure 10: Example of a complex table with empty cells.</caption>
 <figure>
 <location><page_13><loc_55><loc_16><loc_85><loc_25></location>
 <caption>Figure 10: Example of a complex table with empty cells.</caption>
 </figure>
-<subtitle-level-1><location><page_14><loc_8><loc_86><loc_12><loc_87></location></subtitle-level-1>
 <caption><location><page_14><loc_8><loc_52><loc_47><loc_55></location>Figure 11: Simple table with different style and empty cells.</caption>
 <figure>
 <location><page_14><loc_8><loc_56><loc_46><loc_87></location>
 <caption>Figure 11: Simple table with different style and empty cells.</caption>
 </figure>
-<subtitle-level-1><location><page_14><loc_8><loc_43><loc_11><loc_44></location></subtitle-level-1>
-<table>
-<location><page_14><loc_8><loc_38><loc_51><loc_43></location>
-</table>
-<subtitle-level-1><location><page_14><loc_8><loc_37><loc_20><loc_37></location></subtitle-level-1>
-<table>
-<location><page_14><loc_8><loc_32><loc_51><loc_36></location>
-</table>
-<subtitle-level-1><location><page_14><loc_8><loc_30><loc_18><loc_31></location></subtitle-level-1>
-<table>
-<location><page_14><loc_8><loc_25><loc_51><loc_30></location>
-</table>
-<subtitle-level-1><location><page_14><loc_8><loc_23><loc_18><loc_24></location></subtitle-level-1>
 <caption><location><page_14><loc_9><loc_14><loc_46><loc_15></location>Figure 12: Simple table predictions and post processing.</caption>
 <figure>
 <location><page_14><loc_8><loc_17><loc_29><loc_23></location>
@@ -376,32 +319,13 @@
 <location><page_14><loc_52><loc_55><loc_87><loc_89></location>
 <caption>Figure 13: Table predictions example on colorful table.</caption>
 </figure>
-<subtitle-level-1><location><page_14><loc_52><loc_46><loc_55><loc_46></location></subtitle-level-1>
-<table>
-<location><page_14><loc_52><loc_40><loc_85><loc_46></location>
-</table>
-<subtitle-level-1><location><page_14><loc_52><loc_38><loc_63><loc_39></location></subtitle-level-1>
-<table>
-<location><page_14><loc_52><loc_32><loc_85><loc_38></location>
-</table>
-<subtitle-level-1><location><page_14><loc_52><loc_31><loc_61><loc_32></location></subtitle-level-1>
-<table>
-<location><page_14><loc_52><loc_25><loc_85><loc_31></location>
-</table>
-<subtitle-level-1><location><page_14><loc_52><loc_23><loc_61><loc_23></location></subtitle-level-1>
-<caption><location><page_14><loc_56><loc_13><loc_83><loc_14></location>Figure 14: Example with multi-line text.</caption>
-<table>
-<location><page_14><loc_52><loc_16><loc_87><loc_23></location>
-<caption>Figure 14: Example with multi-line text.</caption>
-</table>
-<caption><location><page_15><loc_9><loc_67><loc_20><loc_68></location></caption>
+<paragraph><location><page_14><loc_56><loc_13><loc_83><loc_14></location>Figure 14: Example with multi-line text.</paragraph>
 <figure>
 <location><page_15><loc_9><loc_69><loc_46><loc_83></location>
 </figure>
 <figure>
 <location><page_15><loc_9><loc_53><loc_46><loc_67></location>
 </figure>
-<subtitle-level-1><location><page_15><loc_9><loc_51><loc_18><loc_52></location></subtitle-level-1>
 <figure>
 <location><page_15><loc_9><loc_37><loc_46><loc_51></location>
 </figure>
@@ -410,19 +334,9 @@
 <location><page_15><loc_8><loc_20><loc_52><loc_36></location>
 <caption>Figure 15: Example with triangular table.</caption>
 </figure>
-<caption><location><page_15><loc_53><loc_85><loc_57><loc_86></location></caption>
-<table>
-<location><page_15><loc_53><loc_72><loc_86><loc_85></location>
-</table>
-<subtitle-level-1><location><page_15><loc_53><loc_70><loc_70><loc_71></location></subtitle-level-1>
-<table>
-<location><page_15><loc_53><loc_57><loc_86><loc_69></location>
-</table>
-<subtitle-level-1><location><page_15><loc_53><loc_55><loc_67><loc_56></location></subtitle-level-1>
 <figure>
 <location><page_15><loc_53><loc_41><loc_86><loc_54></location>
 </figure>
-<subtitle-level-1><location><page_15><loc_58><loc_39><loc_73><loc_39></location></subtitle-level-1>
 <caption><location><page_15><loc_50><loc_15><loc_89><loc_18></location>Figure 16: Example of how post-processing helps to restore mis-aligned bounding boxes prediction artifact.</caption>
 <figure>
 <location><page_15><loc_58><loc_20><loc_81><loc_38></location>

File diff suppressed because one or more lines are too long

@@ -17,12 +17,12 @@ The occurrence of tables in documents is ubiquitous. They often summarise quanti
 <!-- image -->
-## b. Red-annotation of bounding boxes, Blue-predictions by TableFormer
+- b. Red-annotation of bounding boxes, Blue-predictions by TableFormer
 <!-- image -->
-## c. Structure predicted by TableFormer:
+- c. Structure predicted by TableFormer:
 Figure 1: Picture of a table with subtle, complex features such as (1) multi-column headers, (2) cell with multi-row text and (3) cells with no content. Image from PubTabNet evaluation set, filename: 'PMC2944238 004 02'.
 <!-- image -->
@@ -217,7 +217,7 @@ Table 4: Results of structure with content retrieved using cell detection on Pub
 - a.
-Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells
+- Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells
 Japanese language (previously unseen by TableFormer):
@@ -420,7 +420,7 @@ where c is one of { left, centroid, right } and x$_{c}$ is the xcoordinate for t
 - 5. Use the alignment computed in step 4, to compute the median x -coordinate for all table columns and the me-
-9e. If the table cell under the identified row and column is not empty, extend its content with the content of the or-
+- 9e. If the table cell under the identified row and column is not empty, extend its content with the content of the or-
 - 9b. Intersect the orphan's bounding box with the row bands, and map the cell to the closest grid row.
@ -440,7 +440,7 @@ phan cell.
Aditional images with examples of TableFormer predictions and post-processing can be found below. Aditional images with examples of TableFormer predictions and post-processing can be found below.
## Figure 8: Example of a table with multi-line header.
Figure 9: Example of a table with big empty distance between cells. Figure 9: Example of a table with big empty distance between cells.
<!-- image --> <!-- image -->
@@ -457,6 +457,8 @@ Figure 12: Simple table predictions and post processing.
 Figure 13: Table predictions example on colorful table.
 <!-- image -->
+Figure 14: Example with multi-line text.
 <!-- image -->

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

@@ -2,7 +2,8 @@
 <subtitle-level-1><location><page_1><loc_22><loc_82><loc_79><loc_85></location>Optimized Table Tokenization for Table Structure Recognition</subtitle-level-1>
 <paragraph><location><page_1><loc_23><loc_75><loc_78><loc_79></location>Maksym Lysak [0000 - 0002 - 3723 - $^{6960]}$, Ahmed Nassar[0000 - 0002 - 9468 - $^{0822]}$, Nikolaos Livathinos [0000 - 0001 - 8513 - $^{3491]}$, Christoph Auer[0000 - 0001 - 5761 - $^{0422]}$, [0000 - 0002 - 8088 - 0823]</paragraph>
 <paragraph><location><page_1><loc_38><loc_74><loc_49><loc_75></location>and Peter Staar</paragraph>
-<paragraph><location><page_1><loc_36><loc_70><loc_64><loc_73></location>{mly,ahn,nli,cau,taa}@zurich.ibm.com IBM Research</paragraph>
+<paragraph><location><page_1><loc_46><loc_72><loc_55><loc_73></location>IBM Research</paragraph>
+<paragraph><location><page_1><loc_36><loc_70><loc_64><loc_71></location>{mly,ahn,nli,cau,taa}@zurich.ibm.com</paragraph>
-<paragraph><location><page_1><loc_27><loc_41><loc_74><loc_66></location>Abstract. Extracting tables from documents is a crucial task in any document conversion pipeline. Recently, transformer-based models have demonstrated that table-structure can be recognized with impressive accuracy using Image-to-Markup-Sequence (Im2Seq) approaches. Taking only the image of a table, such models predict a sequence of tokens (e.g. in HTML, LaTeX) which represent the structure of the table. Since the token representation of the table structure has a significant impact on the accuracy and run-time performance of any Im2Seq model, we investigate in this paper how table-structure representation can be optimised. We propose a new, optimised table-structure language (OTSL) with a minimized vocabulary and specific rules. The benefits of OTSL are that it reduces the number of tokens to 5 (HTML needs 28+) and shortens the sequence length to half of HTML on average. Consequently, model accuracy improves significantly, inference time is halved compared to HTML-based models, and the predicted table structures are always syntactically correct. This in turn eliminates most post-processing needs. Popular table structure data-sets will be published in OTSL format to the community.</paragraph>
+<paragraph><location><page_1><loc_27><loc_41><loc_74><loc_66></location>Abstract. Extracting tables from documents is a crucial task in any document conversion pipeline. Recently, transformer-based models have demonstrated that table-structure can be recognized with impressive accuracy using Image-to-Markup-Sequence (Im2Seq) approaches. Taking only the image of a table, such models predict a sequence of tokens (e.g. in HTML, LaTeX) which represent the structure of the table. Since the token representation of the table structure has a significant impact on the accuracy and run-time performance of any Im2Seq model, we investigate in this paper how table-structure representation can be optimised.
+We propose a new, optimised table-structure language (OTSL) with a minimized vocabulary and specific rules. The benefits of OTSL are that it reduces the number of tokens to 5 (HTML needs 28+) and shortens the sequence length to half of HTML on average. Consequently, model accuracy improves significantly, inference time is halved compared to HTML-based models, and the predicted table structures are always syntactically correct. This in turn eliminates most post-processing needs. Popular table structure data-sets will be published in OTSL format to the community.</paragraph>
 <paragraph><location><page_1><loc_27><loc_37><loc_74><loc_40></location>Keywords: Table Structure Recognition · Data Representation · Transformers · Optimization.</paragraph>
 <subtitle-level-1><location><page_1><loc_22><loc_33><loc_37><loc_34></location>1 Introduction</subtitle-level-1>
@@ -56,7 +57,7 @@
 <paragraph><location><page_7><loc_22><loc_58><loc_59><loc_59></location>The OTSL representation follows these syntax rules:</paragraph>
 <paragraph><location><page_7><loc_23><loc_54><loc_79><loc_56></location>- 1. Left-looking cell rule : The left neighbour of an "L" cell must be either another "L" cell or a "C" cell.</paragraph>
 <paragraph><location><page_7><loc_23><loc_51><loc_79><loc_53></location>- 2. Up-looking cell rule : The upper neighbour of a "U" cell must be either another "U" cell or a "C" cell.</paragraph>
-<paragraph><location><page_7><loc_23><loc_49><loc_37><loc_50></location>- 3. Cross cell rule :</paragraph>
+<subtitle-level-1><location><page_7><loc_23><loc_49><loc_37><loc_50></location>3. Cross cell rule :</subtitle-level-1>
 <paragraph><location><page_7><loc_25><loc_44><loc_79><loc_49></location>- The left neighbour of an "X" cell must be either another "X" cell or a "U" cell, and the upper neighbour of an "X" cell must be either another "X" cell or an "L" cell.</paragraph>
 <paragraph><location><page_7><loc_23><loc_43><loc_78><loc_44></location>- 4. First row rule : Only "L" cells and "C" cells are allowed in the first row.</paragraph>
 <paragraph><location><page_7><loc_23><loc_40><loc_79><loc_43></location>- 5. First column rule : Only "U" cells and "C" cells are allowed in the first column.</paragraph>

File diff suppressed because one or more lines are too long

@@ -4,7 +4,9 @@ Maksym Lysak [0000 - 0002 - 3723 - $^{6960]}$, Ahmed Nassar[0000 - 0002 - 9468 -
 and Peter Staar
-{mly,ahn,nli,cau,taa}@zurich.ibm.com IBM Research
+IBM Research
+{mly,ahn,nli,cau,taa}@zurich.ibm.com
-Abstract. Extracting tables from documents is a crucial task in any document conversion pipeline. Recently, transformer-based models have demonstrated that table-structure can be recognized with impressive accuracy using Image-to-Markup-Sequence (Im2Seq) approaches. Taking only the image of a table, such models predict a sequence of tokens (e.g. in HTML, LaTeX) which represent the structure of the table. Since the token representation of the table structure has a significant impact on the accuracy and run-time performance of any Im2Seq model, we investigate in this paper how table-structure representation can be optimised. We propose a new, optimised table-structure language (OTSL) with a minimized vocabulary and specific rules. The benefits of OTSL are that it reduces the number of tokens to 5 (HTML needs 28+) and shortens the sequence length to half of HTML on average. Consequently, model accuracy improves significantly, inference time is halved compared to HTML-based models, and the predicted table structures are always syntactically correct. This in turn eliminates most post-processing needs. Popular table structure data-sets will be published in OTSL format to the community.
+Abstract. Extracting tables from documents is a crucial task in any document conversion pipeline. Recently, transformer-based models have demonstrated that table-structure can be recognized with impressive accuracy using Image-to-Markup-Sequence (Im2Seq) approaches. Taking only the image of a table, such models predict a sequence of tokens (e.g. in HTML, LaTeX) which represent the structure of the table. Since the token representation of the table structure has a significant impact on the accuracy and run-time performance of any Im2Seq model, we investigate in this paper how table-structure representation can be optimised. We propose a new, optimised table-structure language (OTSL) with a minimized vocabulary and specific rules.
+The benefits of OTSL are that it reduces the number of tokens to 5 (HTML needs 28+) and shortens the sequence length to half of HTML on average. Consequently, model accuracy improves significantly, inference time is halved compared to HTML-based models, and the predicted table structures are always syntactically correct. This in turn eliminates most post-processing needs. Popular table structure data-sets will be published in OTSL format to the community.
@@ -91,7 +93,7 @@ The OTSL representation follows these syntax rules:
 - 2. Up-looking cell rule : The upper neighbour of a "U" cell must be either another "U" cell or a "C" cell.
-- 3. Cross cell rule :
+## 3. Cross cell rule :
 - The left neighbour of an "X" cell must be either another "X" cell or a "U" cell, and the upper neighbour of an "X" cell must be either another "X" cell or an "L" cell.
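
The OTSL syntax rules quoted in this commit's ground-truth text (left-looking, up-looking, cross, first-row and first-column rules) can be checked mechanically. The sketch below is illustrative only and is not taken from the repository: the function name `otsl_violations` is hypothetical, and it assumes the table is already split into rows of single-letter cell tokens (`C`, `L`, `U`, `X`), leaving out any row-separator token.

```python
from typing import List, Optional


def otsl_violations(grid: List[List[str]]) -> List[str]:
    """Return one message per violation of rules 1-5 in a rectangular token grid."""
    problems: List[str] = []
    for r, row in enumerate(grid):
        for c, tok in enumerate(row):
            left: Optional[str] = row[c - 1] if c > 0 else None
            up: Optional[str] = grid[r - 1][c] if r > 0 else None
            # Rule 4: only "L" and "C" cells are allowed in the first row.
            if r == 0 and tok not in ("L", "C"):
                problems.append(f"({r},{c}): '{tok}' not allowed in first row")
            # Rule 5: only "U" and "C" cells are allowed in the first column.
            if c == 0 and tok not in ("U", "C"):
                problems.append(f"({r},{c}): '{tok}' not allowed in first column")
            # Rule 1: the left neighbour of an "L" cell must be "L" or "C".
            if tok == "L" and left is not None and left not in ("L", "C"):
                problems.append(f"({r},{c}): 'L' has left neighbour '{left}'")
            # Rule 2: the upper neighbour of a "U" cell must be "U" or "C".
            if tok == "U" and up is not None and up not in ("U", "C"):
                problems.append(f"({r},{c}): 'U' has upper neighbour '{up}'")
            # Rule 3: an "X" cell needs "X"/"U" to its left and "X"/"L" above.
            if tok == "X":
                if left is not None and left not in ("X", "U"):
                    problems.append(f"({r},{c}): 'X' has left neighbour '{left}'")
                if up is not None and up not in ("X", "L"):
                    problems.append(f"({r},{c}): 'X' has upper neighbour '{up}'")
    return problems


# A 2x2 span in the top-left corner, followed by ordinary cells.
valid = [["C", "L", "C"], ["U", "X", "C"], ["C", "C", "C"]]
# "U" in the first row and "L" in the first column both break a rule.
invalid = [["C", "U"], ["L", "C"]]
print(otsl_violations(valid))         # []
print(len(otsl_violations(invalid)))  # 2
```

Cells on the first row or first column have no upper or left neighbour, so the neighbour rules are skipped there and only rules 4 and 5 apply to them.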

File diff suppressed because one or more lines are too long

@@ -256,7 +256,7 @@
 <paragraph><location><page_13><loc_25><loc_58><loc_66><loc_59></location>- -Employees can see only their own unmasked TAX_ID.</paragraph>
 <paragraph><location><page_13><loc_25><loc_55><loc_89><loc_57></location>- -Managers see a masked version of TAX_ID with the first five characters replaced with the X character (for example, XXX-XX-1234).</paragraph>
 <paragraph><location><page_13><loc_25><loc_52><loc_87><loc_54></location>- -Any other person sees the entire TAX_ID as masked, for example, XXX-XX-XXXX.</paragraph>
-<paragraph><location><page_13><loc_25><loc_50><loc_87><loc_51></location>To implement this column mask, run the SQL statement that is shown in Example 3-9.</paragraph>
+<paragraph><location><page_13><loc_25><loc_50><loc_87><loc_51></location>- To implement this column mask, run the SQL statement that is shown in Example 3-9.</paragraph>
 <paragraph><location><page_13><loc_22><loc_48><loc_58><loc_49></location>Example 3-9 Creating a mask on the TAX_ID column</paragraph>
 <paragraph><location><page_13><loc_22><loc_14><loc_86><loc_47></location>CREATE MASK HR_SCHEMA.MASK_TAX_ID_ON_EMPLOYEES ON HR_SCHEMA.EMPLOYEES AS EMPLOYEES FOR COLUMN TAX_ID RETURN CASE WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'HR' ) = 1 THEN EMPLOYEES . TAX_ID WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'MGR' ) = 1 AND SESSION_USER = EMPLOYEES . USER_ID THEN EMPLOYEES . TAX_ID WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'MGR' ) = 1 AND SESSION_USER <> EMPLOYEES . USER_ID THEN ( 'XXX-XX-' CONCAT QSYS2 . SUBSTR ( EMPLOYEES . TAX_ID , 8 , 4 ) ) WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'EMP' ) = 1 THEN EMPLOYEES . TAX_ID ELSE 'XXX-XX-XXXX' END ENABLE ;</paragraph>
 <paragraph><location><page_14><loc_22><loc_90><loc_74><loc_91></location>- 3. Figure 3-10 shows the masks that are created in the HR_SCHEMA.</paragraph>
@@ -268,7 +268,7 @@
 <subtitle-level-1><location><page_14><loc_11><loc_73><loc_33><loc_74></location>3.6.6 Activating RCAC</subtitle-level-1>
 <paragraph><location><page_14><loc_22><loc_67><loc_89><loc_71></location>Now that you have created the row permission and the two column masks, RCAC must be activated. The row permission and the two column masks are enabled (last clause in the scripts), but now you must activate RCAC on the table. To do so, complete the following steps:</paragraph>
<paragraph><location><page_14><loc_22><loc_65><loc_67><loc_66></location>- 1. Run the SQL statements that are shown in Example 3-10.</paragraph> <paragraph><location><page_14><loc_22><loc_65><loc_67><loc_66></location>- 1. Run the SQL statements that are shown in Example 3-10.</paragraph>
<paragraph><location><page_14><loc_22><loc_62><loc_61><loc_63></location>Example 3-10 Activating RCAC on the EMPLOYEES table</paragraph> <subtitle-level-1><location><page_14><loc_22><loc_62><loc_61><loc_63></location>Example 3-10 Activating RCAC on the EMPLOYEES table</subtitle-level-1>
<paragraph><location><page_14><loc_22><loc_60><loc_62><loc_61></location>- /* Active Row Access Control (permissions) */</paragraph> <paragraph><location><page_14><loc_22><loc_60><loc_62><loc_61></location>- /* Active Row Access Control (permissions) */</paragraph>
<paragraph><location><page_14><loc_22><loc_58><loc_58><loc_60></location>- /* Active Column Access Control (masks)</paragraph> <paragraph><location><page_14><loc_22><loc_58><loc_58><loc_60></location>- /* Active Column Access Control (masks)</paragraph>
<paragraph><location><page_14><loc_60><loc_58><loc_62><loc_60></location>*/</paragraph> <paragraph><location><page_14><loc_60><loc_58><loc_62><loc_60></location>*/</paragraph>

File diff suppressed because one or more lines are too long

@ -368,7 +368,7 @@ WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'HR', 'EMP' ) = 1 THEN EMPLOYEES . D
- -Any other person sees the entire TAX_ID as masked, for example, XXX-XX-XXXX. - -Any other person sees the entire TAX_ID as masked, for example, XXX-XX-XXXX.
To implement this column mask, run the SQL statement that is shown in Example 3-9. - To implement this column mask, run the SQL statement that is shown in Example 3-9.
Example 3-9 Creating a mask on the TAX_ID column Example 3-9 Creating a mask on the TAX_ID column
@ -385,7 +385,7 @@ Now that you have created the row permission and the two column masks, RCAC must
- 1. Run the SQL statements that are shown in Example 3-10. - 1. Run the SQL statements that are shown in Example 3-10.
Example 3-10 Activating RCAC on the EMPLOYEES table ## Example 3-10 Activating RCAC on the EMPLOYEES table
- /* Active Row Access Control (permissions) */ - /* Active Row Access Control (permissions) */

File diff suppressed because one or more lines are too long

@ -10,11 +10,15 @@
<figure> <figure>
<location><page_1><loc_52><loc_62><loc_88><loc_71></location> <location><page_1><loc_52><loc_62><loc_88><loc_71></location>
</figure> </figure>
<section_header_level_1><location><page_1><loc_52><loc_58><loc_79><loc_60></location>b. Red-annotation of bounding boxes, Blue-predictions by TableFormer</section_header_level_1> <unordered_list>
<list_item><location><page_1><loc_52><loc_58><loc_79><loc_60></location>b. Red-annotation of bounding boxes, Blue-predictions by TableFormer</list_item>
</unordered_list>
<figure> <figure>
<location><page_1><loc_51><loc_48><loc_88><loc_57></location> <location><page_1><loc_51><loc_48><loc_88><loc_57></location>
</figure> </figure>
<section_header_level_1><location><page_1><loc_52><loc_46><loc_80><loc_47></location>c. Structure predicted by TableFormer:</section_header_level_1> <unordered_list>
<list_item><location><page_1><loc_52><loc_46><loc_80><loc_47></location>c. Structure predicted by TableFormer:</list_item>
</unordered_list>
<figure> <figure>
<location><page_1><loc_52><loc_37><loc_88><loc_45></location> <location><page_1><loc_52><loc_37><loc_88><loc_45></location>
<caption>Figure 1: Picture of a table with subtle, complex features such as (1) multi-column headers, (2) cell with multi-row text and (3) cells with no content. Image from PubTabNet evaluation set, filename: 'PMC2944238 004 02'.</caption> <caption>Figure 1: Picture of a table with subtle, complex features such as (1) multi-column headers, (2) cell with multi-row text and (3) cells with no content. Image from PubTabNet evaluation set, filename: 'PMC2944238 004 02'.</caption>
@ -150,8 +154,8 @@
</table> </table>
<unordered_list> <unordered_list>
<list_item><location><page_8><loc_9><loc_89><loc_10><loc_90></location>a.</list_item> <list_item><location><page_8><loc_9><loc_89><loc_10><loc_90></location>a.</list_item>
<list_item><location><page_8><loc_11><loc_89><loc_82><loc_90></location>Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells</list_item>
</unordered_list> </unordered_list>
<text><location><page_8><loc_11><loc_89><loc_82><loc_90></location>Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells</text>
<text><location><page_8><loc_9><loc_87><loc_46><loc_88></location>Japanese language (previously unseen by TableFormer):</text> <text><location><page_8><loc_9><loc_87><loc_46><loc_88></location>Japanese language (previously unseen by TableFormer):</text>
<text><location><page_8><loc_50><loc_87><loc_70><loc_88></location>Example table from FinTabNet:</text> <text><location><page_8><loc_50><loc_87><loc_70><loc_88></location>Example table from FinTabNet:</text>
<figure> <figure>
@ -297,9 +301,7 @@
<unordered_list> <unordered_list>
<list_item><location><page_12><loc_50><loc_13><loc_89><loc_16></location>9d. Intersect the orphan's bounding box with the column bands, and map the cell to the closest grid column.</list_item> <list_item><location><page_12><loc_50><loc_13><loc_89><loc_16></location>9d. Intersect the orphan's bounding box with the column bands, and map the cell to the closest grid column.</list_item>
<list_item><location><page_12><loc_8><loc_10><loc_47><loc_13></location>5. Use the alignment computed in step 4, to compute the median x -coordinate for all table columns and the me-</list_item> <list_item><location><page_12><loc_8><loc_10><loc_47><loc_13></location>5. Use the alignment computed in step 4, to compute the median x -coordinate for all table columns and the me-</list_item>
</unordered_list> <list_item><location><page_12><loc_50><loc_10><loc_89><loc_13></location>9e. If the table cell under the identified row and column is not empty, extend its content with the content of the or-</list_item>
<text><location><page_12><loc_50><loc_10><loc_89><loc_13></location>9e. If the table cell under the identified row and column is not empty, extend its content with the content of the or-</text>
<unordered_list>
<list_item><location><page_12><loc_50><loc_21><loc_89><loc_23></location>9b. Intersect the orphan's bounding box with the row bands, and map the cell to the closest grid row.</list_item> <list_item><location><page_12><loc_50><loc_21><loc_89><loc_23></location>9b. Intersect the orphan's bounding box with the row bands, and map the cell to the closest grid row.</list_item>
<list_item><location><page_12><loc_50><loc_16><loc_89><loc_20></location>9c. Compute the left and right boundary of the vertical band for each grid column (min/max x coordinates per column).</list_item> <list_item><location><page_12><loc_50><loc_16><loc_89><loc_20></location>9c. Compute the left and right boundary of the vertical band for each grid column (min/max x coordinates per column).</list_item>
<list_item><location><page_12><loc_50><loc_42><loc_89><loc_51></location>8. In some rare occasions, we have noticed that TableFormer can confuse a single column as two. When the postprocessing steps are applied, this results with two predicted columns pointing to the same PDF column. In such case we must de-duplicate the columns according to highest total column intersection score.</list_item> <list_item><location><page_12><loc_50><loc_42><loc_89><loc_51></location>8. In some rare occasions, we have noticed that TableFormer can confuse a single column as two. When the postprocessing steps are applied, this results with two predicted columns pointing to the same PDF column. In such case we must de-duplicate the columns according to highest total column intersection score.</list_item>
@ -312,75 +314,19 @@
<text><location><page_13><loc_8><loc_89><loc_15><loc_91></location>phan cell.</text> <text><location><page_13><loc_8><loc_89><loc_15><loc_91></location>phan cell.</text>
<text><location><page_13><loc_8><loc_86><loc_47><loc_89></location>9f. Otherwise create a new structural cell and match it wit the orphan cell.</text> <text><location><page_13><loc_8><loc_86><loc_47><loc_89></location>9f. Otherwise create a new structural cell and match it wit the orphan cell.</text>
<text><location><page_13><loc_8><loc_83><loc_47><loc_86></location>Aditional images with examples of TableFormer predictions and post-processing can be found below.</text> <text><location><page_13><loc_8><loc_83><loc_47><loc_86></location>Aditional images with examples of TableFormer predictions and post-processing can be found below.</text>
<section_header_level_1><location><page_13><loc_14><loc_81><loc_18><loc_81></location></section_header_level_1> <paragraph><location><page_13><loc_10><loc_35><loc_45><loc_37></location>Figure 8: Example of a table with multi-line header.</paragraph>
<table>
<location><page_13><loc_14><loc_73><loc_39><loc_80></location>
</table>
<section_header_level_1><location><page_13><loc_14><loc_71><loc_30><loc_72></location></section_header_level_1>
<table>
<location><page_13><loc_14><loc_63><loc_39><loc_70></location>
</table>
<section_header_level_1><location><page_13><loc_14><loc_61><loc_27><loc_62></location></section_header_level_1>
<table>
<location><page_13><loc_14><loc_54><loc_39><loc_61></location>
</table>
<section_header_level_1><location><page_13><loc_14><loc_50><loc_27><loc_51></location></section_header_level_1>
<table>
<location><page_13><loc_14><loc_38><loc_41><loc_50></location>
<caption>Figure 8: Example of a table with multi-line header.</caption>
</table>
<section_header_level_1><location><page_13><loc_51><loc_87><loc_54><loc_88></location></section_header_level_1>
<table>
<location><page_13><loc_51><loc_83><loc_91><loc_87></location>
</table>
<section_header_level_1><location><page_13><loc_51><loc_81><loc_62><loc_82></location></section_header_level_1>
<table>
<location><page_13><loc_51><loc_77><loc_91><loc_80></location>
</table>
<section_header_level_1><location><page_13><loc_51><loc_75><loc_60><loc_76></location></section_header_level_1>
<table>
<location><page_13><loc_51><loc_71><loc_91><loc_75></location>
</table>
<section_header_level_1><location><page_13><loc_51><loc_68><loc_60><loc_69></location></section_header_level_1>
<figure> <figure>
<location><page_13><loc_51><loc_63><loc_70><loc_68></location> <location><page_13><loc_51><loc_63><loc_70><loc_68></location>
<caption>Figure 9: Example of a table with big empty distance between cells.</caption> <caption>Figure 9: Example of a table with big empty distance between cells.</caption>
</figure> </figure>
<section_header_level_1><location><page_13><loc_55><loc_51><loc_58><loc_52></location></section_header_level_1>
<table>
<location><page_13><loc_55><loc_45><loc_80><loc_51></location>
</table>
<section_header_level_1><location><page_13><loc_55><loc_43><loc_69><loc_44></location></section_header_level_1>
<table>
<location><page_13><loc_55><loc_37><loc_80><loc_43></location>
</table>
<section_header_level_1><location><page_13><loc_55><loc_35><loc_67><loc_36></location></section_header_level_1>
<table>
<location><page_13><loc_55><loc_28><loc_80><loc_34></location>
</table>
<section_header_level_1><location><page_13><loc_55><loc_25><loc_66><loc_26></location></section_header_level_1>
<figure> <figure>
<location><page_13><loc_55><loc_16><loc_85><loc_25></location> <location><page_13><loc_55><loc_16><loc_85><loc_25></location>
<caption>Figure 10: Example of a complex table with empty cells.</caption> <caption>Figure 10: Example of a complex table with empty cells.</caption>
</figure> </figure>
<section_header_level_1><location><page_14><loc_8><loc_86><loc_12><loc_87></location></section_header_level_1>
<figure> <figure>
<location><page_14><loc_8><loc_56><loc_46><loc_87></location> <location><page_14><loc_8><loc_56><loc_46><loc_87></location>
<caption>Figure 11: Simple table with different style and empty cells.</caption> <caption>Figure 11: Simple table with different style and empty cells.</caption>
</figure> </figure>
<section_header_level_1><location><page_14><loc_8><loc_43><loc_11><loc_44></location></section_header_level_1>
<table>
<location><page_14><loc_8><loc_38><loc_51><loc_43></location>
</table>
<section_header_level_1><location><page_14><loc_8><loc_37><loc_20><loc_37></location></section_header_level_1>
<table>
<location><page_14><loc_8><loc_32><loc_51><loc_36></location>
</table>
<section_header_level_1><location><page_14><loc_8><loc_30><loc_18><loc_31></location></section_header_level_1>
<table>
<location><page_14><loc_8><loc_25><loc_51><loc_30></location>
</table>
<section_header_level_1><location><page_14><loc_8><loc_23><loc_18><loc_24></location></section_header_level_1>
<figure> <figure>
<location><page_14><loc_8><loc_17><loc_29><loc_23></location> <location><page_14><loc_8><loc_17><loc_29><loc_23></location>
<caption>Figure 12: Simple table predictions and post processing.</caption> <caption>Figure 12: Simple table predictions and post processing.</caption>
@ -389,30 +335,13 @@
<location><page_14><loc_52><loc_55><loc_87><loc_89></location> <location><page_14><loc_52><loc_55><loc_87><loc_89></location>
<caption>Figure 13: Table predictions example on colorful table.</caption> <caption>Figure 13: Table predictions example on colorful table.</caption>
</figure> </figure>
<section_header_level_1><location><page_14><loc_52><loc_46><loc_55><loc_46></location></section_header_level_1> <paragraph><location><page_14><loc_56><loc_13><loc_83><loc_14></location>Figure 14: Example with multi-line text.</paragraph>
<table>
<location><page_14><loc_52><loc_40><loc_85><loc_46></location>
</table>
<section_header_level_1><location><page_14><loc_52><loc_38><loc_63><loc_39></location></section_header_level_1>
<table>
<location><page_14><loc_52><loc_32><loc_85><loc_38></location>
</table>
<section_header_level_1><location><page_14><loc_52><loc_31><loc_61><loc_32></location></section_header_level_1>
<table>
<location><page_14><loc_52><loc_25><loc_85><loc_31></location>
</table>
<section_header_level_1><location><page_14><loc_52><loc_23><loc_61><loc_23></location></section_header_level_1>
<table>
<location><page_14><loc_52><loc_16><loc_87><loc_23></location>
<caption>Figure 14: Example with multi-line text.</caption>
</table>
<figure> <figure>
<location><page_15><loc_9><loc_69><loc_46><loc_83></location> <location><page_15><loc_9><loc_69><loc_46><loc_83></location>
</figure> </figure>
<figure> <figure>
<location><page_15><loc_9><loc_53><loc_46><loc_67></location> <location><page_15><loc_9><loc_53><loc_46><loc_67></location>
</figure> </figure>
<section_header_level_1><location><page_15><loc_9><loc_51><loc_18><loc_52></location></section_header_level_1>
<figure> <figure>
<location><page_15><loc_9><loc_37><loc_46><loc_51></location> <location><page_15><loc_9><loc_37><loc_46><loc_51></location>
</figure> </figure>
@ -420,18 +349,9 @@
<location><page_15><loc_8><loc_20><loc_52><loc_36></location> <location><page_15><loc_8><loc_20><loc_52><loc_36></location>
<caption>Figure 15: Example with triangular table.</caption> <caption>Figure 15: Example with triangular table.</caption>
</figure> </figure>
<table>
<location><page_15><loc_53><loc_72><loc_86><loc_85></location>
</table>
<section_header_level_1><location><page_15><loc_53><loc_70><loc_70><loc_71></location></section_header_level_1>
<table>
<location><page_15><loc_53><loc_57><loc_86><loc_69></location>
</table>
<section_header_level_1><location><page_15><loc_53><loc_55><loc_67><loc_56></location></section_header_level_1>
<figure> <figure>
<location><page_15><loc_53><loc_41><loc_86><loc_54></location> <location><page_15><loc_53><loc_41><loc_86><loc_54></location>
</figure> </figure>
<section_header_level_1><location><page_15><loc_58><loc_39><loc_73><loc_39></location></section_header_level_1>
<figure> <figure>
<location><page_15><loc_58><loc_20><loc_81><loc_38></location> <location><page_15><loc_58><loc_20><loc_81><loc_38></location>
<caption>Figure 16: Example of how post-processing helps to restore mis-aligned bounding boxes prediction artifact.</caption> <caption>Figure 16: Example of how post-processing helps to restore mis-aligned bounding boxes prediction artifact.</caption>

File diff suppressed because one or more lines are too long

@ -16,11 +16,11 @@ The occurrence of tables in documents is ubiquitous. They often summarise quanti
<!-- image --> <!-- image -->
## b. Red-annotation of bounding boxes, Blue-predictions by TableFormer - b. Red-annotation of bounding boxes, Blue-predictions by TableFormer
<!-- image --> <!-- image -->
## c. Structure predicted by TableFormer: - c. Structure predicted by TableFormer:
Figure 1: Picture of a table with subtle, complex features such as (1) multi-column headers, (2) cell with multi-row text and (3) cells with no content. Image from PubTabNet evaluation set, filename: 'PMC2944238 004 02'. Figure 1: Picture of a table with subtle, complex features such as (1) multi-column headers, (2) cell with multi-row text and (3) cells with no content. Image from PubTabNet evaluation set, filename: 'PMC2944238 004 02'.
@ -221,8 +221,7 @@ Table 4: Results of structure with content retrieved using cell detection on Pub
| TableFormer | 95.4 | 90.1 | 93.6 | | TableFormer | 95.4 | 90.1 | 93.6 |
- a. - a.
- Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells
Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells
Japanese language (previously unseen by TableFormer): Japanese language (previously unseen by TableFormer):
@ -381,9 +380,7 @@ where c is one of { left, centroid, right } and x$\_{c}$ is the xcoordinate for
- 9d. Intersect the orphan's bounding box with the column bands, and map the cell to the closest grid column. - 9d. Intersect the orphan's bounding box with the column bands, and map the cell to the closest grid column.
- 5. Use the alignment computed in step 4, to compute the median x -coordinate for all table columns and the me- - 5. Use the alignment computed in step 4, to compute the median x -coordinate for all table columns and the me-
- 9e. If the table cell under the identified row and column is not empty, extend its content with the content of the or-
9e. If the table cell under the identified row and column is not empty, extend its content with the content of the or-
- 9b. Intersect the orphan's bounding box with the row bands, and map the cell to the closest grid row. - 9b. Intersect the orphan's bounding box with the row bands, and map the cell to the closest grid row.
- 9c. Compute the left and right boundary of the vertical band for each grid column (min/max x coordinates per column). - 9c. Compute the left and right boundary of the vertical band for each grid column (min/max x coordinates per column).
- 8. In some rare occasions, we have noticed that TableFormer can confuse a single column as two. When the postprocessing steps are applied, this results with two predicted columns pointing to the same PDF column. In such case we must de-duplicate the columns according to highest total column intersection score. - 8. In some rare occasions, we have noticed that TableFormer can confuse a single column as two. When the postprocessing steps are applied, this results with two predicted columns pointing to the same PDF column. In such case we must de-duplicate the columns according to highest total column intersection score.
@ -399,54 +396,20 @@ phan cell.
Aditional images with examples of TableFormer predictions and post-processing can be found below. Aditional images with examples of TableFormer predictions and post-processing can be found below.
##
##
##
##
Figure 8: Example of a table with multi-line header. Figure 8: Example of a table with multi-line header.
##
##
##
##
Figure 9: Example of a table with big empty distance between cells. Figure 9: Example of a table with big empty distance between cells.
<!-- image --> <!-- image -->
##
##
##
##
Figure 10: Example of a complex table with empty cells. Figure 10: Example of a complex table with empty cells.
<!-- image --> <!-- image -->
##
Figure 11: Simple table with different style and empty cells. Figure 11: Simple table with different style and empty cells.
<!-- image --> <!-- image -->
##
##
##
##
Figure 12: Simple table predictions and post processing. Figure 12: Simple table predictions and post processing.
<!-- image --> <!-- image -->
@ -455,36 +418,20 @@ Figure 13: Table predictions example on colorful table.
<!-- image --> <!-- image -->
##
##
##
##
Figure 14: Example with multi-line text. Figure 14: Example with multi-line text.
<!-- image --> <!-- image -->
<!-- image --> <!-- image -->
##
<!-- image --> <!-- image -->
Figure 15: Example with triangular table. Figure 15: Example with triangular table.
<!-- image --> <!-- image -->
##
##
<!-- image --> <!-- image -->
##
Figure 16: Example of how post-processing helps to restore mis-aligned bounding boxes prediction artifact. Figure 16: Example of how post-processing helps to restore mis-aligned bounding boxes prediction artifact.
<!-- image --> <!-- image -->

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

@ -2,7 +2,8 @@
<section_header_level_1><location><page_1><loc_22><loc_82><loc_79><loc_85></location>Optimized Table Tokenization for Table Structure Recognition</section_header_level_1> <section_header_level_1><location><page_1><loc_22><loc_82><loc_79><loc_85></location>Optimized Table Tokenization for Table Structure Recognition</section_header_level_1>
<text><location><page_1><loc_23><loc_75><loc_78><loc_79></location>Maksym Lysak [0000 - 0002 - 3723 - $^{6960]}$, Ahmed Nassar[0000 - 0002 - 9468 - $^{0822]}$, Nikolaos Livathinos [0000 - 0001 - 8513 - $^{3491]}$, Christoph Auer[0000 - 0001 - 5761 - $^{0422]}$, [0000 - 0002 - 8088 - 0823]</text> <text><location><page_1><loc_23><loc_75><loc_78><loc_79></location>Maksym Lysak [0000 - 0002 - 3723 - $^{6960]}$, Ahmed Nassar[0000 - 0002 - 9468 - $^{0822]}$, Nikolaos Livathinos [0000 - 0001 - 8513 - $^{3491]}$, Christoph Auer[0000 - 0001 - 5761 - $^{0422]}$, [0000 - 0002 - 8088 - 0823]</text>
<text><location><page_1><loc_38><loc_74><loc_49><loc_75></location>and Peter Staar</text> <text><location><page_1><loc_38><loc_74><loc_49><loc_75></location>and Peter Staar</text>
<text><location><page_1><loc_36><loc_70><loc_64><loc_73></location>{mly,ahn,nli,cau,taa}@zurich.ibm.com IBM Research</text> <text><location><page_1><loc_46><loc_72><loc_55><loc_73></location>IBM Research</text>
<text><location><page_1><loc_36><loc_70><loc_64><loc_71></location>{mly,ahn,nli,cau,taa}@zurich.ibm.com</text>
<text><location><page_1><loc_27><loc_41><loc_74><loc_66></location>Abstract. Extracting tables from documents is a crucial task in any document conversion pipeline. Recently, transformer-based models have demonstrated that table-structure can be recognized with impressive accuracy using Image-to-Markup-Sequence (Im2Seq) approaches. Taking only the image of a table, such models predict a sequence of tokens (e.g. in HTML, LaTeX) which represent the structure of the table. Since the token representation of the table structure has a significant impact on the accuracy and run-time performance of any Im2Seq model, we investigate in this paper how table-structure representation can be optimised. We propose a new, optimised table-structure language (OTSL) with a minimized vocabulary and specific rules. The benefits of OTSL are that it reduces the number of tokens to 5 (HTML needs 28+) and shortens the sequence length to half of HTML on average. Consequently, model accuracy improves significantly, inference time is halved compared to HTML-based models, and the predicted table structures are always syntactically correct. This in turn eliminates most post-processing needs. Popular table structure data-sets will be published in OTSL format to the community.</text> <text><location><page_1><loc_27><loc_41><loc_74><loc_66></location>Abstract. Extracting tables from documents is a crucial task in any document conversion pipeline. Recently, transformer-based models have demonstrated that table-structure can be recognized with impressive accuracy using Image-to-Markup-Sequence (Im2Seq) approaches. Taking only the image of a table, such models predict a sequence of tokens (e.g. in HTML, LaTeX) which represent the structure of the table. Since the token representation of the table structure has a significant impact on the accuracy and run-time performance of any Im2Seq model, we investigate in this paper how table-structure representation can be optimised. 
We propose a new, optimised table-structure language (OTSL) with a minimized vocabulary and specific rules. The benefits of OTSL are that it reduces the number of tokens to 5 (HTML needs 28+) and shortens the sequence length to half of HTML on average. Consequently, model accuracy improves significantly, inference time is halved compared to HTML-based models, and the predicted table structures are always syntactically correct. This in turn eliminates most post-processing needs. Popular table structure data-sets will be published in OTSL format to the community.</text>
<text><location><page_1><loc_27><loc_37><loc_74><loc_40></location>Keywords: Table Structure Recognition · Data Representation · Transformers · Optimization.</text> <text><location><page_1><loc_27><loc_37><loc_74><loc_40></location>Keywords: Table Structure Recognition · Data Representation · Transformers · Optimization.</text>
<section_header_level_1><location><page_1><loc_22><loc_33><loc_37><loc_34></location>1 Introduction</section_header_level_1> <section_header_level_1><location><page_1><loc_22><loc_33><loc_37><loc_34></location>1 Introduction</section_header_level_1>
@ -56,7 +57,9 @@
<unordered_list> <unordered_list>
<list_item><location><page_7><loc_23><loc_54><loc_79><loc_56></location>1. Left-looking cell rule : The left neighbour of an "L" cell must be either another "L" cell or a "C" cell.</list_item> <list_item><location><page_7><loc_23><loc_54><loc_79><loc_56></location>1. Left-looking cell rule : The left neighbour of an "L" cell must be either another "L" cell or a "C" cell.</list_item>
<list_item><location><page_7><loc_23><loc_51><loc_79><loc_53></location>2. Up-looking cell rule : The upper neighbour of a "U" cell must be either another "U" cell or a "C" cell.</list_item> <list_item><location><page_7><loc_23><loc_51><loc_79><loc_53></location>2. Up-looking cell rule : The upper neighbour of a "U" cell must be either another "U" cell or a "C" cell.</list_item>
<list_item><location><page_7><loc_23><loc_49><loc_37><loc_50></location>3. Cross cell rule :</list_item> </unordered_list>
<section_header_level_1><location><page_7><loc_23><loc_49><loc_37><loc_50></location>3. Cross cell rule :</section_header_level_1>
<unordered_list>
<list_item><location><page_7><loc_25><loc_44><loc_79><loc_49></location>The left neighbour of an "X" cell must be either another "X" cell or a "U" cell, and the upper neighbour of an "X" cell must be either another "X" cell or an "L" cell.</list_item> <list_item><location><page_7><loc_25><loc_44><loc_79><loc_49></location>The left neighbour of an "X" cell must be either another "X" cell or a "U" cell, and the upper neighbour of an "X" cell must be either another "X" cell or an "L" cell.</list_item>
<list_item><location><page_7><loc_23><loc_43><loc_78><loc_44></location>4. First row rule : Only "L" cells and "C" cells are allowed in the first row.</list_item> <list_item><location><page_7><loc_23><loc_43><loc_78><loc_44></location>4. First row rule : Only "L" cells and "C" cells are allowed in the first row.</list_item>
<list_item><location><page_7><loc_23><loc_40><loc_79><loc_43></location>5. First column rule : Only "U" cells and "C" cells are allowed in the first column.</list_item> <list_item><location><page_7><loc_23><loc_40><loc_79><loc_43></location>5. First column rule : Only "U" cells and "C" cells are allowed in the first column.</list_item>

File diff suppressed because one or more lines are too long

@ -4,7 +4,9 @@ Maksym Lysak [0000 - 0002 - 3723 - $^{6960]}$, Ahmed Nassar[0000 - 0002 - 9468 -
and Peter Staar and Peter Staar
{mly,ahn,nli,cau,taa}@zurich.ibm.com IBM Research IBM Research
{mly,ahn,nli,cau,taa}@zurich.ibm.com
Abstract. Extracting tables from documents is a crucial task in any document conversion pipeline. Recently, transformer-based models have demonstrated that table-structure can be recognized with impressive accuracy using Image-to-Markup-Sequence (Im2Seq) approaches. Taking only the image of a table, such models predict a sequence of tokens (e.g. in HTML, LaTeX) which represent the structure of the table. Since the token representation of the table structure has a significant impact on the accuracy and run-time performance of any Im2Seq model, we investigate in this paper how table-structure representation can be optimised. We propose a new, optimised table-structure language (OTSL) with a minimized vocabulary and specific rules. The benefits of OTSL are that it reduces the number of tokens to 5 (HTML needs 28+) and shortens the sequence length to half of HTML on average. Consequently, model accuracy improves significantly, inference time is halved compared to HTML-based models, and the predicted table structures are always syntactically correct. This in turn eliminates most post-processing needs. Popular table structure data-sets will be published in OTSL format to the community. Abstract. Extracting tables from documents is a crucial task in any document conversion pipeline. Recently, transformer-based models have demonstrated that table-structure can be recognized with impressive accuracy using Image-to-Markup-Sequence (Im2Seq) approaches. Taking only the image of a table, such models predict a sequence of tokens (e.g. in HTML, LaTeX) which represent the structure of the table. Since the token representation of the table structure has a significant impact on the accuracy and run-time performance of any Im2Seq model, we investigate in this paper how table-structure representation can be optimised. We propose a new, optimised table-structure language (OTSL) with a minimized vocabulary and specific rules. 
The benefits of OTSL are that it reduces the number of tokens to 5 (HTML needs 28+) and shortens the sequence length to half of HTML on average. Consequently, model accuracy improves significantly, inference time is halved compared to HTML-based models, and the predicted table structures are always syntactically correct. This in turn eliminates most post-processing needs. Popular table structure data-sets will be published in OTSL format to the community.
@@ -88,7 +90,9 @@ The OTSL representation follows these syntax rules:
- 1. Left-looking cell rule : The left neighbour of an "L" cell must be either another "L" cell or a "C" cell. - 1. Left-looking cell rule : The left neighbour of an "L" cell must be either another "L" cell or a "C" cell.
- 2. Up-looking cell rule : The upper neighbour of a "U" cell must be either another "U" cell or a "C" cell. - 2. Up-looking cell rule : The upper neighbour of a "U" cell must be either another "U" cell or a "C" cell.
- 3. Cross cell rule :
## 3. Cross cell rule :
- The left neighbour of an "X" cell must be either another "X" cell or a "U" cell, and the upper neighbour of an "X" cell must be either another "X" cell or an "L" cell. - The left neighbour of an "X" cell must be either another "X" cell or a "U" cell, and the upper neighbour of an "X" cell must be either another "X" cell or an "L" cell.
- 4. First row rule : Only "L" cells and "C" cells are allowed in the first row. - 4. First row rule : Only "L" cells and "C" cells are allowed in the first row.
- 5. First column rule : Only "U" cells and "C" cells are allowed in the first column. - 5. First column rule : Only "U" cells and "C" cells are allowed in the first column.
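
The five rules listed above can be checked mechanically. The following is a minimal sketch, not part of the ground-truth file or of any published OTSL tooling: the function name `otsl_violations` and the list-of-rows grid representation are assumptions made for illustration, and the grid is assumed to be rectangular with the row-terminating token already stripped.

```python
def otsl_violations(grid):
    """Return (row, col, message) tuples for every rule broken in a token grid.

    `grid` is a hypothetical representation: a rectangular list of rows,
    each row a list of the cell tokens "C", "L", "U" or "X".
    """
    violations = []
    for r, row in enumerate(grid):
        for c, tok in enumerate(row):
            # Neighbours are None when the cell sits on the top or left border.
            left = row[c - 1] if c > 0 else None
            up = grid[r - 1][c] if r > 0 else None

            # Rule 4: first row may contain only "L" and "C".
            if r == 0 and tok not in ("L", "C"):
                violations.append((r, c, "first row allows only L and C"))
            # Rule 5: first column may contain only "U" and "C".
            if c == 0 and tok not in ("U", "C"):
                violations.append((r, c, "first column allows only U and C"))
            # Rule 1: left neighbour of "L" must be "L" or "C".
            if tok == "L" and left is not None and left not in ("L", "C"):
                violations.append((r, c, "left of L must be L or C"))
            # Rule 2: upper neighbour of "U" must be "U" or "C".
            if tok == "U" and up is not None and up not in ("U", "C"):
                violations.append((r, c, "above U must be U or C"))
            # Rule 3: "X" needs "X" or "U" on its left and "X" or "L" above.
            if tok == "X":
                if left is not None and left not in ("X", "U"):
                    violations.append((r, c, "left of X must be X or U"))
                if up is not None and up not in ("X", "L"):
                    violations.append((r, c, "above X must be X or L"))
    return violations


# A single cell spanning two rows and two columns satisfies all five rules.
merged_2x2 = [["C", "L"],
              ["U", "X"]]
print(otsl_violations(merged_2x2))
```

Border cells are handled by rules 4 and 5 alone, which is why the neighbour checks skip a missing neighbour instead of reporting it a second time.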

File diff suppressed because one or more lines are too long


@@ -265,8 +265,8 @@
<list_item><location><page_13><loc_25><loc_58><loc_66><loc_59></location>-Employees can see only their own unmasked TAX_ID.</list_item> <list_item><location><page_13><loc_25><loc_58><loc_66><loc_59></location>-Employees can see only their own unmasked TAX_ID.</list_item>
<list_item><location><page_13><loc_25><loc_55><loc_89><loc_57></location>-Managers see a masked version of TAX_ID with the first five characters replaced with the X character (for example, XXX-XX-1234).</list_item> <list_item><location><page_13><loc_25><loc_55><loc_89><loc_57></location>-Managers see a masked version of TAX_ID with the first five characters replaced with the X character (for example, XXX-XX-1234).</list_item>
<list_item><location><page_13><loc_25><loc_52><loc_87><loc_54></location>-Any other person sees the entire TAX_ID as masked, for example, XXX-XX-XXXX.</list_item> <list_item><location><page_13><loc_25><loc_52><loc_87><loc_54></location>-Any other person sees the entire TAX_ID as masked, for example, XXX-XX-XXXX.</list_item>
<list_item><location><page_13><loc_25><loc_50><loc_87><loc_51></location>To implement this column mask, run the SQL statement that is shown in Example 3-9.</list_item>
</unordered_list> </unordered_list>
<text><location><page_13><loc_25><loc_50><loc_87><loc_51></location>To implement this column mask, run the SQL statement that is shown in Example 3-9.</text>
<paragraph><location><page_13><loc_22><loc_48><loc_58><loc_49></location>Example 3-9 Creating a mask on the TAX_ID column</paragraph> <paragraph><location><page_13><loc_22><loc_48><loc_58><loc_49></location>Example 3-9 Creating a mask on the TAX_ID column</paragraph>
<code><location><page_13><loc_22><loc_14><loc_86><loc_47></location>CREATE MASK HR_SCHEMA.MASK_TAX_ID_ON_EMPLOYEES ON HR_SCHEMA.EMPLOYEES AS EMPLOYEES FOR COLUMN TAX_ID RETURN CASE WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'HR' ) = 1 THEN EMPLOYEES . TAX_ID WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'MGR' ) = 1 AND SESSION_USER = EMPLOYEES . USER_ID THEN EMPLOYEES . TAX_ID WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'MGR' ) = 1 AND SESSION_USER <> EMPLOYEES . USER_ID THEN ( 'XXX-XX-' CONCAT QSYS2 . SUBSTR ( EMPLOYEES . TAX_ID , 8 , 4 ) ) WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'EMP' ) = 1 THEN EMPLOYEES . TAX_ID ELSE 'XXX-XX-XXXX' END ENABLE ;</code> <code><location><page_13><loc_22><loc_14><loc_86><loc_47></location>CREATE MASK HR_SCHEMA.MASK_TAX_ID_ON_EMPLOYEES ON HR_SCHEMA.EMPLOYEES AS EMPLOYEES FOR COLUMN TAX_ID RETURN CASE WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'HR' ) = 1 THEN EMPLOYEES . TAX_ID WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'MGR' ) = 1 AND SESSION_USER = EMPLOYEES . USER_ID THEN EMPLOYEES . TAX_ID WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'MGR' ) = 1 AND SESSION_USER <> EMPLOYEES . USER_ID THEN ( 'XXX-XX-' CONCAT QSYS2 . SUBSTR ( EMPLOYEES . TAX_ID , 8 , 4 ) ) WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'EMP' ) = 1 THEN EMPLOYEES . TAX_ID ELSE 'XXX-XX-XXXX' END ENABLE ;</code>
<unordered_list> <unordered_list>
@@ -281,7 +281,7 @@
<unordered_list> <unordered_list>
<list_item><location><page_14><loc_22><loc_65><loc_67><loc_66></location>1. Run the SQL statements that are shown in Example 3-10.</list_item> <list_item><location><page_14><loc_22><loc_65><loc_67><loc_66></location>1. Run the SQL statements that are shown in Example 3-10.</list_item>
</unordered_list> </unordered_list>
<paragraph><location><page_14><loc_22><loc_62><loc_61><loc_63></location>Example 3-10 Activating RCAC on the EMPLOYEES table</paragraph> <section_header_level_1><location><page_14><loc_22><loc_62><loc_61><loc_63></location>Example 3-10 Activating RCAC on the EMPLOYEES table</section_header_level_1>
<unordered_list> <unordered_list>
<list_item><location><page_14><loc_22><loc_60><loc_62><loc_61></location>/* Active Row Access Control (permissions) */</list_item> <list_item><location><page_14><loc_22><loc_60><loc_62><loc_61></location>/* Active Row Access Control (permissions) */</list_item>
<list_item><location><page_14><loc_22><loc_58><loc_58><loc_60></location>/* Active Column Access Control (masks)</list_item> <list_item><location><page_14><loc_22><loc_58><loc_58><loc_60></location>/* Active Column Access Control (masks)</list_item>

File diff suppressed because one or more lines are too long


@@ -334,8 +334,7 @@ WHEN VERIFY\_GROUP\_FOR\_USER ( SESSION\_USER , 'HR', 'EMP' ) = 1 THEN EMPLOYEES
- -Employees can see only their own unmasked TAX\_ID. - -Employees can see only their own unmasked TAX\_ID.
- -Managers see a masked version of TAX\_ID with the first five characters replaced with the X character (for example, XXX-XX-1234). - -Managers see a masked version of TAX\_ID with the first five characters replaced with the X character (for example, XXX-XX-1234).
- -Any other person sees the entire TAX\_ID as masked, for example, XXX-XX-XXXX. - -Any other person sees the entire TAX\_ID as masked, for example, XXX-XX-XXXX.
- To implement this column mask, run the SQL statement that is shown in Example 3-9.
To implement this column mask, run the SQL statement that is shown in Example 3-9.
Example 3-9 Creating a mask on the TAX\_ID column Example 3-9 Creating a mask on the TAX\_ID column
@@ -355,7 +354,7 @@ Now that you have created the row permission and the two column masks, RCAC must
- 1. Run the SQL statements that are shown in Example 3-10. - 1. Run the SQL statements that are shown in Example 3-10.
Example 3-10 Activating RCAC on the EMPLOYEES table ## Example 3-10 Activating RCAC on the EMPLOYEES table
- /* Active Row Access Control (permissions) */ - /* Active Row Access Control (permissions) */
- /* Active Column Access Control (masks) - /* Active Column Access Control (masks)

File diff suppressed because one or more lines are too long