feat: Implement new reading-order model (#916)

* Implement new reading-order model, replacing DS GLM model (WIP)

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update reading-order model branch

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update lockfile [skip ci]

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add captions, footnotes and merges [skip ci]

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Updates for reading-order implementation

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Updates for reading-order implementation

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update tests and lockfile

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixes, update tests

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add normalization, update tests again

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update tests with code

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Push final lockfile

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* sanitize text

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Inlcude furniture, Update tests with furniture

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix content_layer assignment

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* chore: Delete empty file docling/models/ds_glm_model.py

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com>
This commit is contained in:
Christoph Auer
2025-02-20 17:51:17 +01:00
committed by GitHub
parent c031a7ae47
commit c93e36988f
67 changed files with 757 additions and 871 deletions

View File

@@ -6,7 +6,7 @@ Front cover
<!-- image -->
<!-- image -->
Front cover
## Contents
@@ -108,20 +108,20 @@ This paper was produced by the IBM DB2 for i Center of Excellence team in partne
<!-- image -->
<!-- image -->
Jim Bainbridge is a senior DB2 consultant on the DB2 for i Center of Excellence team in the IBM Lab Services and Training organization. His primary role is training and implementation services for IBM DB2 Web Query for i and business analytics. Jim began his career with IBM 30 years ago in the IBM Rochester Development Lab, where he developed cooperative processing products that paired IBM PCs with IBM S/36 and AS/.400 systems. In the years since, Jim has held numerous technical roles, including independent software vendors technical support on a broad range of IBM technologies and products, and supporting customers in the IBM Executive Briefing Center and IBM Project Office.
<!-- image -->
Hernando Bedoya is a Senior IT Specialist at STG Lab Services and Training in Rochester, Minnesota. He writes extensively and teaches IBM classes worldwide in all areas of DB2 for i. Before joining STG Lab Services, he worked in the ITSO for nine years writing multiple IBM Redbooksfi publications. He also worked for IBM Colombia as an IBM AS/400fi IT Specialist doing presales support for the Andean countries. He has 28 years of experience in the computing field and has taught database classes in Colombian universities. He holds a Master's degree in Computer Science from EAFIT, Colombia. His areas of expertise are database technology, performance, and data warehousing. Hernando can be contacted at hbedoya@us.ibm.com .
## Authors
<!-- image -->
Chapter 1.
1
Chapter 1.
## Securing and protecting IBM DB2 data
Recent news headlines are filled with reports of data breaches and cyber-attacks impacting global businesses of all sizes. The Identity Theft Resource Center$^{1}$ reports that almost 5000 data breaches have occurred since 2005, exposing over 600 million records of data. The financial cost of these data breaches is skyrocketing. Studies from the Ponemon Institute$^{2}$ revealed that the average cost of a data breach increased in 2013 by 15% globally and resulted in a brand equity loss of $9.4 million per attack. The average cost that is incurred for each lost record containing sensitive information increased more than 9% to $145 per record.
@@ -196,8 +196,6 @@ Table 2-1 FUNCTION\_USAGE view
To discover who has authorization to define and manage RCAC, you can use the query that is shown in Example 2-1.
Example 2-1 Query to determine who has authority to define and manage RCAC
SELECT
function\_id,
@@ -241,10 +239,10 @@ Table 2-2 Comparison of the different function usage IDs and *JOBCTL authority
| User action | *JOBCTL | QIBM\_DB\_SECADM | QIBM\_DB\_SQLADM | QIBM\_DB\_SYSMON | No Authority |
|--------------------------------------------------------------------------------|-----------|------------------|------------------|------------------|----------------|
| SET CURRENT DEGREE (SQL statement) | X | | X | | |
| CHGQRYA command targeting a different user's job | X | | X | | |
| STRDBMON or ENDDBMON commands targeting a different user's job | X | | X | | |
| CHGQRYA command targeting a different users job | X | | X | | |
| STRDBMON or ENDDBMON commands targeting a different users job | X | | X | | |
| STRDBMON or ENDDBMON commands targeting a job that matches the current user | X | | X | X | X |
| QUSRJOBI() API format 900 or System i Navigator's SQL Details for Job | X | | X | X | |
| QUSRJOBI() API format 900 or System i Navigators SQL Details for Job | X | | X | X | |
| Visual Explain within Run SQL scripts | X | | X | X | X |
| Visual Explain outside of Run SQL scripts | X | | X | | |
| ANALYZE PLAN CACHE procedure | X | | X | | |
@@ -253,7 +251,7 @@ Table 2-2 Comparison of the different function usage IDs and *JOBCTL authority
| MODIFY PLAN CACHE PROPERTIES procedure (currently does not check authority) | X | | X | | |
| CHANGE PLAN CACHE SIZE procedure (currently does not check authority) | X | | X | | |
The SQL CREATE PERMISSION statement that is shown in Figure 3-1 is used to define and initially enable or disable the row access rules.Figure 3-1 CREATE PERMISSION SQL statement
Figure 3-1 CREATE PERMISSION SQL statement
<!-- image -->
@@ -261,8 +259,6 @@ The SQL CREATE PERMISSION statement that is shown in Figure 3-1 is used to defin
A column mask is a database object that manifests a column value access control rule for a specific column in a specific table. It uses a CASE expression that describes what you see when you access the column. For example, a teller can see only the last four digits of a tax identification number.
Table 3-1 summarizes these special registers and their values.
Table 3-1 Special registers and their corresponding values
| Special register | Corresponding value |
@@ -336,8 +332,6 @@ WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'HR', 'EMP' ) = 1 THEN EMPLOYEES . D
- -Any other person sees the entire TAX\_ID as masked, for example, XXX-XX-XXXX.
- To implement this column mask, run the SQL statement that is shown in Example 3-9.
Example 3-9 Creating a mask on the TAX\_ID column
```
CREATE MASK HR_SCHEMA.MASK_TAX_ID_ON_EMPLOYEES ON HR_SCHEMA.EMPLOYEES AS EMPLOYEES FOR COLUMN TAX_ID RETURN CASE WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'HR' ) = 1 THEN EMPLOYEES . TAX_ID WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'MGR' ) = 1 AND SESSION_USER = EMPLOYEES . USER_ID THEN EMPLOYEES . TAX_ID WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'MGR' ) = 1 AND SESSION_USER <> EMPLOYEES . USER_ID THEN ( 'XXX-XX-' CONCAT QSYS2 . SUBSTR ( EMPLOYEES . TAX_ID , 8 , 4 ) ) WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'EMP' ) = 1 THEN EMPLOYEES . TAX_ID ELSE 'XXX-XX-XXXX' END ENABLE ;
```
@@ -374,13 +368,12 @@ Figure 3-11 Selecting the EMPLOYEES table from System i Navigator
<!-- image -->
- 2. Figure 4-68 shows the Visual Explain of the same SQL statement, but with RCAC enabled. It is clear that the implementation of the SQL statement is more complex because the row permission rule becomes part of the WHERE clause.
- 3. Compare the advised indexes that are provided by the Optimizer without RCAC and with RCAC enabled. Figure 4-69 shows the index advice for the SQL statement without RCAC enabled. The index being advised is for the ORDER BY clause.
Figure 4-68 Visual Explain with RCAC enabled
<!-- image -->
- 3. Compare the advised indexes that are provided by the Optimizer without RCAC and with RCAC enabled. Figure 4-69 shows the index advice for the SQL statement without RCAC enabled. The index being advised is for the ORDER BY clause.
Figure 4-69 Index advice with no RCAC
<!-- image -->
@@ -397,10 +390,10 @@ Implement roles and separation of duties
Leverage row permissions on the database
Protect columns by defining column masks
This IBM Redpaper publication provides information about the IBM i 7.2 feature of IBM DB2 for i Row and Column Access Control (RCAC). It offers a broad description of the function and advantages of controlling access to data in a comprehensive and transparent way. This publication helps you understand the capabilities of RCAC and provides examples of defining, creating, and implementing the row permissions and column masks in a relational database environment.
Protect columns by defining column masks
This paper is intended for database engineers, data-centric application developers, and security officers who want to design and implement RCAC as a part of their data control and governance policy. A solid background in IBM i object level security, DB2 for i relational database concepts, and SQL is assumed.
<!-- image -->