update to new docling-core and update test results with figures

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-07-26 20:14:47 +00:00 · 2024-09-23 20:22:19 +02:00
commit ddb20be002
parent d0d1ac0957
7 changed files with 504 additions and 6 deletions
--- a/poetry.lock
+++ b/poetry.lock
@@ -957,13 +957,13 @@ files = [
 [[package]]
 name = "docling-core"
-version = "1.5.0"
+version = "1.6.0"
 description = "A python library to define and validate data types in Docling."
 optional = false
 python-versions = "<4.0,>=3.9"
 files = [
-    {file = "docling_core-1.5.0-py3-none-any.whl", hash = "sha256:1a8bb4940ecbf98c6381298f3ad121d95aa8895883150a5dd113a348a0987d09"},
+    {file = "docling_core-1.6.0-py3-none-any.whl", hash = "sha256:a947a6585377ad9b74484adbe541383e0c3d55bf31176faef2fd72560fee24f0"},
-    {file = "docling_core-1.5.0.tar.gz", hash = "sha256:bc8ddbae16e2b740225f37758125eb95b9fcd4202542c4547a9683a7ad423e10"},
+    {file = "docling_core-1.6.0.tar.gz", hash = "sha256:e3325c12948f5ef426b8862189690db833dd49b0b335ffa399fcf2b54fbf2b44"},
 ]
 [package.dependencies]
@@ -7257,4 +7257,4 @@ examples = ["langchain-huggingface", "langchain-milvus", "langchain-text-splitte
 [metadata]
 lock-version = "2.0"
 python-versions = "^3.10"
-content-hash = "7ee1e9e99c23e075fb1f8722e4fc9e6c0b02a4282f4e67ebbcd75598720536b7"
+content-hash = "ecdd1d482db8b0bf76fa72b791048197a745ccdbbe7fdaa5a6f1e40306115166"
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -23,7 +23,7 @@ packages = [{include = "docling"}]
 [tool.poetry.dependencies]
 python = "^3.10"
 pydantic = "^2.0.0"
-docling-core = "^1.5.0"
+docling-core = "^1.6.0"
 docling-ibm-models = "^1.2.0"
 deepsearch-glm = "^0.21.1"
 filetype = "^1.2.0"
--- a/tests/data/2203.01017v2.md
+++ b/tests/data/2203.01017v2.md
@@ -20,6 +20,8 @@ Tables organize valuable content in a concise and compact representation. This c
 b. Red-annotation of bounding boxes, Blue-predictions by TableFormer
 <!-- image -->
 c. Structure predicted by TableFormer:
 Figure 1: Picture of a table with subtle, complex features such as (1) multi-column headers, (2) cell with multi-row text and (3) cells with no content. Image from PubTabNet evaluation set, filename: 'PMC2944238 004 02'.
@@ -77,6 +79,9 @@ We rely on large-scale datasets such as PubTabNet [37], FinTabNet [36], and Tabl
 Figure 2: Distribution of the tables across different table dimensions in PubTabNet + FinTabNet datasets
 <!-- image -->
 Figure 2: Distribution of the tables across different table dimensions in PubTabNet + FinTabNet datasets
 balance in the previous datasets.
 The PubTabNet dataset contains 509k tables delivered as annotated PNG images. The annotations consist of the table structure represented in HTML format, the tokenized text and its bounding boxes per table cell. Fig. 1 shows the appearance style of PubTabNet. Depending on its complexity, a table is characterized as "simple" when it does not contain row spans or column spans, otherwise it is "complex". The dataset is divided into Train and Val splits (roughly 98% and 2%). The Train split consists of 54% simple and 46% complex tables and the Val split of 51% and 49% respectively. The FinTabNet dataset contains 112k tables delivered as single-page PDF documents with mixed table structures and text content. Similarly to the PubTabNet, the annotations of FinTabNet include the table structure in HTML, the tokenized text and the bounding boxes on a table cell basis. The dataset is divided into Train, Test and Val splits (81%, 9.5%, 9.5%), and each one is almost equally divided into simple and complex tables (Train: 48% simple, 52% complex, Test: 48% simple, 52% complex, Test: 53% simple, 47% complex). Finally the TableBank dataset consists of 145k tables provided as JPEG images. The latter has annotations for the table structure, but only few with bounding boxes of the table cells. The entire dataset consists of simple tables and it is divided into 90% Train, 3% Test and 7% Val splits.
@@ -120,6 +125,12 @@ CNN Backbone Network. A ResNet-18 CNN is the backbone that receives the table im
 Figure 3: TableFormer takes in an image of the PDF and creates bounding box and HTML structure predictions that are synchronized. The bounding boxes grabs the content from the PDF and inserts it in the structure.
 <!-- image -->
 Figure 3: TableFormer takes in an image of the PDF and creates bounding box and HTML structure predictions that are synchronized. The bounding boxes grabs the content from the PDF and inserts it in the structure.
 Figure 4: Given an input image of a table, the Encoder produces fixed-length features that represent the input image. The features are then passed to both the Structure Decoder and Cell BBox Decoder . During training, the Structure Decoder receives 'tokenized tags' of the HTML code that represent the table structure. Afterwards, a transformer encoder and decoder architecture is employed to produce features that are received by a linear layer, and the Cell BBox Decoder. The linear layer is applied to the features to predict the tags. Simultaneously, the Cell BBox Decoder selects features referring to the data cells (' < td > ', ' < ') and passes them through an attention network, an MLP, and a linear layer to predict the bounding boxes.
 <!-- image -->
 Figure 4: Given an input image of a table, the Encoder produces fixed-length features that represent the input image. The features are then passed to both the Structure Decoder and Cell BBox Decoder . During training, the Structure Decoder receives 'tokenized tags' of the HTML code that represent the table structure. Afterwards, a transformer encoder and decoder architecture is employed to produce features that are received by a linear layer, and the Cell BBox Decoder. The linear layer is applied to the features to predict the tags. Simultaneously, the Cell BBox Decoder selects features referring to the data cells (' < td > ', ' < ') and passes them through an attention network, an MLP, and a linear layer to predict the bounding boxes.
 forming classification, and adding an adaptive pooling layer of size 28*28. ResNet by default downsamples the image resolution by 32 and then the encoded image is provided to both the Structure Decoder , and Cell BBox Decoder .
@@ -222,6 +233,10 @@ Japanese language (previously unseen by TableFormer):
 Example table from FinTabNet:
 <!-- image -->
 <!-- image -->
 b. Structure predicted by TableFormer, with superimposed matched PDF cell text:
 |                                                    |             | 論文ファイル   | 論文ファイル   | 参考文献   | 参考文献   |
@@ -249,6 +264,14 @@ Text is aligned to match original for ease of viewing
 Figure 5: One of the benefits of TableFormer is that it is language agnostic, as an example, the left part of the illustration demonstrates TableFormer predictions on previously unseen language (Japanese). Additionally, we see that TableFormer is robust to variability in style and content, right side of the illustration shows the example of the TableFormer prediction from the FinTabNet dataset.
 <!-- image -->
 Figure 5: One of the benefits of TableFormer is that it is language agnostic, as an example, the left part of the illustration demonstrates TableFormer predictions on previously unseen language (Japanese). Additionally, we see that TableFormer is robust to variability in style and content, right side of the illustration shows the example of the TableFormer prediction from the FinTabNet dataset.
 <!-- image -->
 Figure 6: An example of TableFormer predictions (bounding boxes and structure) from generated SynthTabNet table.
 <!-- image -->
 Figure 6: An example of TableFormer predictions (bounding boxes and structure) from generated SynthTabNet table.
 ## 5.5. Qualitative Analysis
@@ -381,6 +404,9 @@ Although TableFormer can predict the table structure and the bounding boxes for
 Figure 7: Distribution of the tables across different dimensions per dataset. Simple vs complex tables per dataset and split, strict vs non strict html structures per dataset and table complexity, missing bboxes per dataset and table complexity.
 <!-- image -->
 Figure 7: Distribution of the tables across different dimensions per dataset. Simple vs complex tables per dataset and split, strict vs non strict html structures per dataset and table complexity, missing bboxes per dataset and table complexity.
 · TableFormer output does not include the table cell content.
 · There are occasional inaccuracies in the predictions of the bounding boxes.
@@ -433,18 +459,46 @@ Figure 8: Example of a table with multi-line header.
 Figure 9: Example of a table with big empty distance between cells.
 <!-- image -->
 Figure 9: Example of a table with big empty distance between cells.
 Figure 10: Example of a complex table with empty cells.
 <!-- image -->
 Figure 10: Example of a complex table with empty cells.
 <!-- image -->
 Figure 11: Simple table with different style and empty cells.
 <!-- image -->
 Figure 11: Simple table with different style and empty cells.
 Figure 12: Simple table predictions and post processing.
 <!-- image -->
 Figure 12: Simple table predictions and post processing.
 Figure 13: Table predictions example on colorful table.
 Figure 14: Example with multi-line text.
 <!-- image -->
 Figure 14: Example with multi-line text.
 Figure 16: Example of how post-processing helps to restore mis-aligned bounding boxes prediction artifact.
 <!-- image -->
 Figure 16: Example of how post-processing helps to restore mis-aligned bounding boxes prediction artifact.
 <!-- image -->
 Figure 15: Example with triangular table.
 <!-- image -->
 Figure 15: Example with triangular table.
 Figure 17: Example of long table. End-to-end example from initial PDF cells to prediction of bounding boxes, post processing and prediction of structure.
 <!-- image -->
 Figure 17: Example of long table. End-to-end example from initial PDF cells to prediction of bounding boxes, post processing and prediction of structure.
--- a/tests/data/2206.01062.md
+++ b/tests/data/2206.01062.md
@@ -24,6 +24,9 @@ KDD '22, August 14-18, 2022, Washington, DC, USA © 2022 Copyright held by the o
 Figure 1: Four examples of complex page layouts across different document categories
 <!-- image -->
 Figure 1: Four examples of complex page layouts across different document categories
 ## KEYWORDS
 PDF document conversion, layout segmentation, object-detection, data set, Machine Learning
@@ -70,6 +73,9 @@ In addition to open intellectual property constraints for the source documents,
 Figure 2: Distribution of DocLayNet pages across document categories.
 <!-- image -->
 Figure 2: Distribution of DocLayNet pages across document categories.
 to a minimum, since they introduce difficulties in annotation (see Section 4). As a second condition, we focussed on medium to large documents ( > 10 pages) with technical content, dense in complex tables, figures, plots and captions. Such documents carry a lot of information value, but are often hard to analyse with high accuracy due to their challenging layouts. Counterexamples of documents not included in the dataset are receipts, invoices, hand-written documents or photographs showing "text in the wild".
 The pages in DocLayNet can be grouped into six distinct categories, namely Financial Reports , Manuals , Scientific Articles , Laws & Regulations , Patents and Government Tenders . Each document category was sourced from various repositories. For example, Financial Reports contain both free-style format annual reports 2 which expose company-specific, artistic layouts as well as the more formal SEC filings. The two largest categories ( Financial Reports and Manuals ) contain a large amount of free-style layouts in order to obtain maximum variability. In the other four categories, we boosted the variability by mixing documents from independent providers, such as different government websites or publishers. In Figure 2, we show the document categories contained in DocLayNet with their respective sizes.
@@ -108,6 +114,9 @@ Table 1: DocLayNet dataset overview. Along with the frequency of each class labe
 Figure 3: Corpus Conversion Service annotation user interface. The PDF page is shown in the background, with overlaid text-cells (in darker shades). The annotation boxes can be drawn by dragging a rectangle over each segment with the respective label from the palette on the right.
 <!-- image -->
 Figure 3: Corpus Conversion Service annotation user interface. The PDF page is shown in the background, with overlaid text-cells (in darker shades). The annotation boxes can be drawn by dragging a rectangle over each segment with the respective label from the palette on the right.
 we distributed the annotation workload and performed continuous quality controls. Phase one and two required a small team of experts only. For phases three and four, a group of 40 dedicated annotators were assembled and supervised.
 Phase 1: Data selection and preparation. Our inclusion criteria for documents were described in Section 3. A large effort went into ensuring that all documents are free to use. The data sources
@@ -142,6 +151,9 @@ Phase 3: Training. After a first trial with a small group of people, we realised
 Figure 4: Examples of plausible annotation alternatives for the same page. Criteria in our annotation guideline can resolve cases A to C, while the case D remains ambiguous.
 <!-- image -->
 Figure 4: Examples of plausible annotation alternatives for the same page. Criteria in our annotation guideline can resolve cases A to C, while the case D remains ambiguous.
 were carried out over a timeframe of 12 weeks, after which 8 of the 40 initially allocated annotators did not pass the bar.
 Phase 4: Production annotation. The previously selected 80K pages were annotated with the defined 11 class labels by 32 annotators. This production phase took around three months to complete. All annotations were created online through CCS, which visualises the programmatic PDF text-cells as an overlay on the page. The page annotation are obtained by drawing rectangular bounding-boxes, as shown in Figure 3. With regard to the annotation practices, we implemented a few constraints and capabilities on the tooling level. First, we only allow non-overlapping, vertically oriented, rectangular boxes. For the large majority of documents, this constraint was sufficient and it speeds up the annotation considerably in comparison with arbitrary segmentation shapes. Second, annotator staff were not able to see each other's annotations. This was enforced by design to avoid any bias in the annotation, which could skew the numbers of the inter-annotator agreement (see Table 1). We wanted
@@ -172,6 +184,9 @@ The primary goal of DocLayNet is to obtain high-quality ML models capable of acc
 Figure 5: Prediction performance (mAP@0.5-0.95) of a Mask R-CNN network with ResNet50 backbone trained on increasing fractions of the DocLayNet dataset. The learning curve flattens around the 80% mark, indicating that increasing the size of the DocLayNet dataset with similar data will not yield significantly better predictions.
 <!-- image -->
 Figure 5: Prediction performance (mAP@0.5-0.95) of a Mask R-CNN network with ResNet50 backbone trained on increasing fractions of the DocLayNet dataset. The learning curve flattens around the 80% mark, indicating that increasing the size of the DocLayNet dataset with similar data will not yield significantly better predictions.
 paper and leave the detailed evaluation of more recent methods mentioned in Section 2 for future work.
 In this section, we will present several aspects related to the performance of object detection models on DocLayNet. Similarly as in PubLayNet, we will evaluate the quality of their predictions using mean average precision (mAP) with 10 overlaps that range from 0.5 to 0.95 in steps of 0.05 (mAP@0.5-0.95). These scores are computed by leveraging the evaluation code provided by the COCO API [16].
@@ -298,6 +313,9 @@ To date, there is still a significant gap between human and ML accuracy on the l
 Figure 6: Example layout predictions on selected pages from the DocLayNet test-set. (A, D) exhibit favourable results on coloured backgrounds. (B, C) show accurate list-item and paragraph differentiation despite densely-spaced lines. (E) demonstrates good table and figure distinction. (F) shows predictions on a Chinese patent with multiple overlaps, label confusion and missing boxes.
 <!-- image -->
 Figure 6: Example layout predictions on selected pages from the DocLayNet test-set. (A, D) exhibit favourable results on coloured backgrounds. (B, C) show accurate list-item and paragraph differentiation despite densely-spaced lines. (E) demonstrates good table and figure distinction. (F) shows predictions on a Chinese patent with multiple overlaps, label confusion and missing boxes.
 Diaconu, Mai Thanh Minh, Marc, albinxavi, fatih, oleg, and wanghao yang. ultralytics/yolov5: v6.0 - yolov5n nano models, roboflow integration, tensorflow export, opencv dnn support, October 2021.
 [14] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. CoRR , abs/2005.12872, 2020.
--- a/tests/data/2305.03393v1.md
+++ b/tests/data/2305.03393v1.md
@@ -16,6 +16,9 @@ In modern document understanding systems [1,15], table extraction is typically a
 Fig. 1. Comparison between HTML and OTSL table structure representation: (A) table-example with complex row and column headers, including a 2D empty span, (B) minimal graphical representation of table structure using rectangular layout, (C) HTML representation, (D) OTSL representation. This example demonstrates many of the key-features of OTSL, namely its reduced vocabulary size (12 versus 5 in this case), its reduced sequence length (55 versus 30) and a enhanced internal structure (variable token sequence length per row in HTML versus a fixed length of rows in OTSL).
 <!-- image -->
 Fig. 1. Comparison between HTML and OTSL table structure representation: (A) table-example with complex row and column headers, including a 2D empty span, (B) minimal graphical representation of table structure using rectangular layout, (C) HTML representation, (D) OTSL representation. This example demonstrates many of the key-features of OTSL, namely its reduced vocabulary size (12 versus 5 in this case), its reduced sequence length (55 versus 30) and a enhanced internal structure (variable token sequence length per row in HTML versus a fixed length of rows in OTSL).
 today, table detection in documents is a well understood problem, and the latest state-of-the-art (SOTA) object detection methods provide an accuracy comparable to human observers [7,8,10,14,23]. On the other hand, the problem of table structure recognition (TSR) is a lot more challenging and remains a very active area of research, in which many novel machine learning algorithms are being explored [3,4,5,9,11,12,13,14,17,18,21,22].
 Recently emerging SOTA methods for table structure recognition employ transformer-based models, in which an image of the table is provided to the network in order to predict the structure of the table as a sequence of tokens. These image-to-sequence (Im2Seq) models are extremely powerful, since they allow for a purely data-driven solution. The tokens of the sequence typically belong to a markup language such as HTML, Latex or Markdown, which allow to describe table structure as rows, columns and spanning cells in various configurations. In Figure 1, we illustrate how HTML is used to represent the table-structure of a particular example table. Public table-structure data sets such as PubTabNet [22], and FinTabNet [21], which were created in a semi-automated way from paired PDF and HTML sources (e.g. PubMed Central), popularized primarily the use of HTML as ground-truth representation format for TSR.
@@ -44,6 +47,9 @@ ulary and can be interpreted as a table structure. For example, with the HTML to
 Fig. 2. Frequency of tokens in HTML and OTSL as they appear in PubTabNet.
 <!-- image -->
 Fig. 2. Frequency of tokens in HTML and OTSL as they appear in PubTabNet.
 Obviously, HTML and other general-purpose markup languages were not designed for Im2Seq models. As such, they have some serious drawbacks. First, the token vocabulary needs to be artificially large in order to describe all plausible tabular structures. Since most Im2Seq models use an autoregressive approach, they generate the sequence token by token. Therefore, to reduce inference time, a shorter sequence length is critical. Every table-cell is represented by at least two tokens ( <td> and </td> ). Furthermore, when tokenizing the HTML structure, one needs to explicitly enumerate possible column-spans and row-spans as words. In practice, this ends up requiring 28 different HTML tokens (when including column- and row-spans up to 10 cells) just to describe every table in the PubTabNet dataset. Clearly, not every token is equally represented, as is depicted in Figure 2. This skewed distribution of tokens in combination with variable token row-length makes it challenging for models to learn the HTML structure.
 Additionally, it would be desirable if the representation would easily allow an early detection of invalid sequences on-the-go, before the prediction of the entire table structure is completed. HTML is not well-suited for this purpose as the verification of incomplete sequences is non-trivial or even impossible.
@@ -78,6 +84,9 @@ A notable attribute of OTSL is that it has the capability of achieving lossless
 Fig. 3. OTSL description of table structure: A - table example; B - graphical representation of table structure; C - mapping structure on a grid; D - OTSL structure encoding; E - explanation on cell encoding
 <!-- image -->
 Fig. 3. OTSL description of table structure: A - table example; B - graphical representation of table structure; C - mapping structure on a grid; D - OTSL structure encoding; E - explanation on cell encoding
 ## 4.2 Language Syntax
 The OTSL representation follows these syntax rules:
@@ -112,6 +121,9 @@ To evaluate the impact of OTSL on prediction accuracy and inference times, we co
 Fig. 4. Architecture sketch of the TableFormer model, which is a representative for the Im2Seq approach.
 <!-- image -->
 Fig. 4. Architecture sketch of the TableFormer model, which is a representative for the Im2Seq approach.
 We rely on standard metrics such as Tree Edit Distance score (TEDs) for table structure prediction, and Mean Average Precision (mAP) with 0.75 Intersection Over Union (IOU) threshold for the bounding-box predictions of table cells. The predicted OTSL structures were converted back to HTML format in
 order to compute the TED score. Inference timing results for all experiments were obtained from the same machine on a single core with AMD EPYC 7763 CPU @2.45 GHz.
@@ -155,12 +167,18 @@ To illustrate the qualitative differences between OTSL and HTML, Figure 5 demons
 Fig. 5. The OTSL model produces more accurate bounding boxes with less overlap (E) than the HTML model (D), when predicting the structure of a sparse table (A), at twice the inference speed because of shorter sequence length (B),(C). "PMC2807444_006_00.png" PubTabNet. μ
 <!-- image -->
 Fig. 5. The OTSL model produces more accurate bounding boxes with less overlap (E) than the HTML model (D), when predicting the structure of a sparse table (A), at twice the inference speed because of shorter sequence length (B),(C). "PMC2807444_006_00.png" PubTabNet. μ
 μ
 ≥
 Fig. 6. Visualization of predicted structure and detected bounding boxes on a complex table with many rows. The OTSL model (B) captured repeating pattern of horizontally merged cells from the GT (A), unlike the HTML model (C). The HTML model also didn't complete the HTML sequence correctly and displayed a lot more of drift and overlap of bounding boxes. "PMC5406406_003_01.png" PubTabNet.
 <!-- image -->
 Fig. 6. Visualization of predicted structure and detected bounding boxes on a complex table with many rows. The OTSL model (B) captured repeating pattern of horizontally merged cells from the GT (A), unlike the HTML model (C). The HTML model also didn't complete the HTML sequence correctly and displayed a lot more of drift and overlap of bounding boxes. "PMC5406406_003_01.png" PubTabNet.
 ## 6 Conclusion
 We demonstrated that representing tables in HTML for the task of table structure recognition with Im2Seq models is ill-suited and has serious limitations. Furthermore, we presented in this paper an Optimized Table Structure Language (OTSL) which, when compared to commonly used general purpose languages, has several key benefits.
--- a/tests/data/redp5110.md
+++ b/tests/data/redp5110.md
@@ -1,7 +1,15 @@
 Front cover
 <!-- image -->
 ## Row and Column Access Control Support in IBM DB2 for i
 <!-- image -->
 <!-- image -->
 <!-- image -->
 International Technical Support Organization
 ## Row and Column Access Control Support in IBM DB2 for i
@@ -194,6 +202,8 @@ GLYPH<g115>GLYPH<g3> GLYPH<g53>GLYPH<g72>GLYPH<g79>GLYPH<g92>GLYPH<g3> GLYPH<g82
 GLYPH<g115>GLYPH<g3> GLYPH<g55> GLYPH<g68>GLYPH<g78>GLYPH<g72>GLYPH<g3> GLYPH<g68>GLYPH<g71>GLYPH<g89>GLYPH<g68>GLYPH<g81>GLYPH<g87>GLYPH<g68>GLYPH<g74>GLYPH<g72>GLYPH<g3> GLYPH<g82>GLYPH<g73>GLYPH<g3> GLYPH<g68>GLYPH<g70>GLYPH<g70>GLYPH<g72>GLYPH<g86>GLYPH<g86>GLYPH<g3> GLYPH<g87>GLYPH<g82>GLYPH<g3> GLYPH<g68> GLYPH<g3> GLYPH<g90>GLYPH<g82>GLYPH<g85>GLYPH<g79>GLYPH<g71>GLYPH<g90>GLYPH<g76>GLYPH<g71>GLYPH<g72>GLYPH<g3> GLYPH<g86>GLYPH<g82>GLYPH<g88>GLYPH<g85>GLYPH<g70>GLYPH<g72>GLYPH<g3> GLYPH<g82>GLYPH<g73>GLYPH<g3> GLYPH<g72>GLYPH<g91>GLYPH<g83>GLYPH<g72>GLYPH<g85>GLYPH<g87>GLYPH<g76>GLYPH<g86>GLYPH<g72>
 <!-- image -->
 Power Services
 ## DB2 for i Center of Excellence
@@ -252,6 +262,8 @@ Pricing depends on the scope of work. Learn more about the DB2 for i Center of E
 ibm.com GLYPH<g18>GLYPH<g86>GLYPH<g92>GLYPH<g86>GLYPH<g87>GLYPH<g72>GLYPH<g80>GLYPH<g86>GLYPH<g18>GLYPH<g86>GLYPH<g72>GLYPH<g85>GLYPH<g89>GLYPH<g76>GLYPH<g70>GLYPH<g72>GLYPH<g86>GLYPH<g18>GLYPH<g79>GLYPH<g68>GLYPH<g69>GLYPH<g86>GLYPH<g72>GLYPH<g85>GLYPH<g89>GLYPH<g76>GLYPH<g70>GLYPH<g72>GLYPH<g86>
 <!-- image -->
 © Copyright IBM Corporation 2013
 IBM Corporation
@@ -268,6 +280,8 @@ This document is current as of the initial date of publication and may be change
 Not all offerings are available in every country in which IBM operates.
 <!-- image -->
 Please Recycle
 ## Preface
@@ -278,12 +292,26 @@ This paper is intended for database engineers, data-centric application develope
 This paper was produced by the IBM DB2 for i Center of Excellence team in partnership with the International Technical Support Organization (ITSO), Rochester, Minnesota US.
 <!-- image -->
 <!-- image -->
 Jim Bainbridge is a senior DB2 consultant on the DB2 for i Center of Excellence team in the IBM Lab Services and Training organization. His primary role is training and implementation services for IBM DB2 Web Query for i and business analytics. Jim began his career with IBM 30 years ago in the IBM Rochester Development Lab, where he developed cooperative processing products that paired IBM PCs with IBM S/36 and AS/.400 systems. In the years since, Jim has held numerous technical roles, including independent software vendors technical support on a broad range of IBM technologies and products, and supporting customers in the IBM Executive Briefing Center and IBM Project Office.
 Hernando Bedoya is a Senior IT Specialist at STG Lab Services and Training in Rochester, Minnesota. He writes extensively and teaches IBM classes worldwide in all areas of DB2 for i. Before joining STG Lab Services, he worked in the ITSO for nine years writing multiple IBM Redbooksfi publications. He also worked for IBM Colombia as an IBM AS/400fi IT Specialist doing presales support for the Andean countries. He has 28 years of experience in the computing field and has taught database classes in Colombian universities. He holds a Master's degree in Computer Science from EAFIT, Colombia. His areas of expertise are database technology, performance, and data warehousing. Hernando can be contacted at hbedoya@us.ibm.com .
 ## Authors
 <!-- image -->
 <!-- image -->
 <!-- image -->
 <!-- image -->
 <!-- image -->
 Rob Bestgen is a member of the DB2 for i Center of Excellence team helping customers use the capabilities of DB2 for i. In addition, Rob is the chief architect of the DB2 SQL Query Engine (SQE) for DB2 for i and is the product development manager for DB2 Web Query for i.
 Mike Cain is a Senior Technical Staff Member within the IBM Systems and Technology Group. He is also the founder and team leader of the DB2 for i Center of Excellence in Rochester, Minnesota US. Before his current position, he worked as an IBM AS/400 Systems Engineer and technical consultant. Before joining IBM in 1988, Mike worked as a System/38 programmer and data processing manager for a property and casualty insurance company. Mike has 26 years of experience with IBM, engaging clients and Business Partners around the world. In addition to assisting clients, he uses his knowledge and experience to influence the IBM solution, development, and support processes.
@@ -294,6 +322,10 @@ Jim Denton is a senior consultant at the IBM DB2 for i Center of Excellence, whe
 Doug Mack is a DB2 for i and Business Intelligence Consultant in the IBM Power Systems™ Lab Services organization. Doug's 30+ year career with IBM spans many roles, including product development, technical sales support, Business Intelligence Sales Specialist, and DB2 for i Product Marketing Manager. Doug is a featured speaker at User Group conferences and meetings, IBM Technical Conferences, and Executive Briefings.
 <!-- image -->
 <!-- image -->
 Tom McKinley is an IBM Lab Services Consultant working on DB2 for IBM i in Rochester MN. His main focus is complex query performance that is associated with Business Intelligence running on Very Large Databases. He worked as a developer or performance analyst in the DB area from 1986 until 2006. Some of his major pieces of work include the Symmetric Multiple processing capabilities of DB2 for IBM i and Large Object Data types. In addition, he was on the original team that designed and built the SQL Query Engine. Before his database work, he worked on Licensed Internal Code for System 34 and System 36.
 Kent Milligan is a senior DB2 consultant on the DB2 for i Center of Excellence team within the IBM Lab Services and Training organization. His primary responsibility is helping software developers use the latest DB2 technologies and port applications from other databases to DB2 for i. After graduating from the University of Iowa, Kent spent the first eight years of his IBM career as a member of the DB2 development team in Rochester.
@@ -350,6 +382,8 @@ GLYPH<SM590000> Stay current on recent Redbooks publications with RSS Feeds:
 http://www.redbooks.ibm.com/rss.html
 <!-- image -->
 Chapter 1.
 ## Securing and protecting IBM DB2 data
@@ -402,6 +436,9 @@ As shown in Figure 1-1, it is an all-or-nothing access to the rows of a table.
 Figure 1-1 All-or-nothing access to the rows of a table
 <!-- image -->
 Figure 1-1 All-or-nothing access to the rows of a table
 Many businesses are trying to limit data access to a need-to-know basis. This security goal means that users should be given access only to the minimum set of data that is required to perform their job. Often, users with object-level access are given access to row and column values that are beyond what their business task requires because that object-level security provides an all-or-nothing solution. For example, object-level controls allow a manager to access data about all employees. Most security policies limit a manager to accessing data only for the employees that they manage.
 ## 1.3.1 Existing row and column control
@ -414,12 +451,17 @@ Even if you are willing to live with these performance and management issues, a
 Figure 1-2 Existing row and column controls
 <!-- image -->
 Figure 1-2 Existing row and column controls
 ## 1.3.2 New controls: Row and Column Access Control
 Based on the challenges that are associated with the existing technology available for controlling row and column access at a more granular level, IBM delivered new security support in the IBM i 7.2 release; this support is known as Row and Column Access Control (RCAC).
 The new DB2 RCAC support provides a method for controlling data access across all interfaces and all types of users with a data-centric solution. Moving security processing to the database layer makes it easier to build controls that meet your compliance policies. The RCAC support provides an additional layer of security that complements object-level authorizations to limit data access to a need-to-know basis. Therefore, it is critical that you first have a sound object-level security implementation in place.
 <!-- image -->
 Chapter 2.
 ## Roles and separation of duties
@ -595,6 +637,8 @@ Table 2-2 Comparison of the different function usage IDs and *JOBCTL authority
 | Edit Authorization List ( EDTAUTL ) CL command               |           | X                |                  |                  |                |
 | Work with Authorization Lists ( WRKAUTL ) CL command         |           | X                |                  |                  |                |
 <!-- image -->
 Chapter 3.
@ -647,6 +691,9 @@ The SQL CREATE PERMISSION statement that is shown in Figure 3-1 is used to defin
 Figure 3-1 CREATE PERMISSION SQL statement
 <!-- image -->
 Figure 3-1 CREATE PERMISSION SQL statement
 ## Column mask
 A column mask is a database object that manifests a column value access control rule for a specific column in a specific table. It uses a CASE expression that describes what you see when you access the column. For example, a teller can see only the last four digits of a tax identification number.
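The CASE-style rule described above can be illustrated outside the database. The following is a minimal Python sketch, not DB2 syntax: the role names (`HR`, `TELLER`) and the `XXX-XX-` output format are assumptions chosen for illustration, and the function only models what a caller would see for one column value.

```python
def mask_tax_id(tax_id: str, role: str) -> str:
    """Return the tax ID as a caller in the given role would see it.

    Each branch plays the part of one WHEN clause in a CASE expression;
    the final return is the ELSE branch. Role names are hypothetical.
    """
    if role == "HR":
        # Fully authorized role: the stored value is returned unchanged.
        return tax_id
    if role == "TELLER":
        # Partially authorized role: only the last four digits are visible.
        return "XXX-XX-" + tax_id[-4:]
    # Everyone else sees a fully masked value.
    return "XXX-XX-XXXX"


print(mask_tax_id("123-45-6789", "TELLER"))
```

The stored value never changes; only the value that is returned to the caller depends on who is asking.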
@ -655,6 +702,9 @@ Column masks replace the need to create and use views to implement access contro
 Figure 3-2 CREATE MASK SQL statement
 <!-- image -->
 Figure 3-2 CREATE MASK SQL statement
 ## 3.1.2 Enabling and activating RCAC
 You can enable, disable, or regenerate row permissions and column masks by using the SQL ALTER PERMISSION statement and the SQL ALTER MASK statement, as shown in Figure 3-3 on page 17.
@ -665,12 +715,18 @@ Note: An exclusive lock is required on the table object to perform the alter ope
 Figure 3-3 ALTER PERMISSION and ALTER MASK SQL statements
 <!-- image -->
 Figure 3-3 ALTER PERMISSION and ALTER MASK SQL statements
 You can activate and deactivate RCAC for new or existing tables by using the SQL ALTER TABLE statement (Figure 3-4). The ACTIVATE or DEACTIVATE clause must be the only option that is specified in the statement; no other alterations are permitted at the same time. Activating or deactivating effectively turns all RCAC processing for the table on or off. Only enabled row permissions and column masks take effect when activating RCAC.
 Note: An exclusive lock is required on the table object to perform the alter operation. All open cursors must be closed.
 Figure 3-4 ALTER TABLE SQL statement
 <!-- image -->
 Figure 3-4 ALTER TABLE SQL statement
 When row access control is activated on a table, a default permission is established for that table. The name of this permission is QIBM_DEFAULT_<table-name>_<schema-name>. This default permission contains a simple piece of logic (0=1), which is never true. The default permission effectively denies access to every user unless there is a permission defined that allows access explicitly. If row access control is activated on a table, and there is no permission that is defined, no one has permission to any rows. All queries against the table produce an empty set.
 It is possible to define, create, and enable multiple permissions on a table. Logically, all of the permissions are ORed together to form a comprehensive test of the user's ability to access the data. A column can have only one mask that is defined over it. From an implementation standpoint, it does not matter if you create the column masks first or the row permissions first.
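The way the default permission and user-defined permissions combine can be modeled in a few lines. This is a minimal Python sketch of the OR logic only, not of how DB2 evaluates permissions internally; the sample rows, user names, and the two example permissions are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, str]
Permission = Callable[[Row, str], bool]


def default_permission(row: Row, user: str) -> bool:
    # Mirrors the 0=1 search condition of the default permission: never true.
    return 0 == 1


def visible_rows(rows: List[Row], user: str,
                 permissions: Sequence[Permission] = ()) -> List[Row]:
    """Keep a row if ANY permission (ORed with the default) accepts it."""
    rules = [default_permission, *permissions]
    return [row for row in rows if any(rule(row, user) for rule in rules)]


# Two hypothetical permissions: users see their own row, managers see their staff.
def own_row(row: Row, user: str) -> bool:
    return row["user_id"] == user


def managed_row(row: Row, user: str) -> bool:
    return row["manager"] == user


EMPLOYEES = [
    {"user_id": "ALICE", "manager": "CAROL"},
    {"user_id": "BOB", "manager": "CAROL"},
    {"user_id": "CAROL", "manager": "DAVE"},
]

print(len(visible_rows(EMPLOYEES, "ALICE")))                          # no permissions defined
print(len(visible_rows(EMPLOYEES, "ALICE", [own_row, managed_row])))  # own row only
print(len(visible_rows(EMPLOYEES, "CAROL", [own_row, managed_row])))  # own row plus staff
```

With no user-defined permission, only the always-false default applies, so every query returns an empty set, which matches the behavior described above.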
@ -721,6 +777,9 @@ GLYPH<SM590000> When proc1 ends, the session reverts to its original state with
 Figure 3-5 Special registers and adopted authority
 <!-- image -->
 Figure 3-5 Special registers and adopted authority
 ## 3.2.2 Built-in global variables
 Built-in global variables are provided with the database manager and are used in SQL statements to retrieve scalar values that are associated with the variables.
@ -875,6 +934,9 @@ The result of this query is shown in Figure 3-7, which is the total number of em
 Figure 3-7 Number of employees
 <!-- image -->
 Figure 3-7 Number of employees
 2. Run a second SQL statement (shown in Example 3-6) that lists the employees. If you have read access to the table, you see all the rows no matter who you are.
 Example 3-6 Displaying the information of the Employees
@ -909,6 +971,9 @@ CREATE PERMISSION HR_SCHEMA.PERMISSION1_ON_EMPLOYEES ON HR_SCHEMA.EMPLOYEES AS E
 Figure 3-9 Row permissions that are shown in System i Navigator
 <!-- image -->
 Figure 3-9 Row permissions that are shown in System i Navigator
 ## 3.6.5 Defining and creating column masks
 Define the different masks for the columns that are sensitive by completing the following steps:
@ -952,6 +1017,9 @@ CREATE MASK HR_SCHEMA.MASK_TAX_ID_ON_EMPLOYEES ON HR_SCHEMA.EMPLOYEES AS EMPLOYE
 Figure 3-10 Column masks shown in System i Navigator
 <!-- image -->
 Figure 3-10 Column masks shown in System i Navigator
 ## 3.6.6 Activating RCAC
 Now that you have created the row permission and the two column masks, RCAC must be activated. The row permission and the two column masks are enabled (last clause in the scripts), but now you must activate RCAC on the table. To do so, complete the following steps:
@ -966,10 +1034,16 @@ Example 3-10 Activating RCAC on the EMPLOYEES table
 Figure 3-11 Selecting the EMPLOYEES table from System i Navigator
 <!-- image -->
 Figure 3-11 Selecting the EMPLOYEES table from System i Navigator
 3. The EMPLOYEES table definition is displayed, as shown in Figure 3-12. Note that the Row access control and Column access control options are checked.
 Figure 3-12 RCAC enabled on the EMPLOYEES table
 <!-- image -->
 Figure 3-12 RCAC enabled on the EMPLOYEES table
 ## 3.6.7 Demonstrating data access with RCAC
 You are now ready to start testing RCAC with the four different users. Complete the following steps:
@ -984,18 +1058,30 @@ SELECT COUNT(*) as ROW_COUNT FROM HR_SCHEMA.EMPLOYEES;
 Figure 3-13 Count of EMPLOYEES by HR
 <!-- image -->
 Figure 3-13 Count of EMPLOYEES by HR
 3. The result of the same query for a user who is logged on as TQSPENSER (Manager) is shown in Figure 3-14. TQSPENSER has five employees in his department and he can also see his own row, which is why the count is 6.
 Figure 3-14 Count of EMPLOYEES by a manager
 <!-- image -->
 Figure 3-14 Count of EMPLOYEES by a manager
 4. The result of the same query that is run by an employee (DSSMITH) gives the result that is shown in Figure 3-15. Each employee can see only his or her own data (row).
 Figure 3-15 Count of EMPLOYEES by an employee
 <!-- image -->
 Figure 3-15 Count of EMPLOYEES by an employee
 5. The result of the same query that is run by the Consultant/DBE gives the result that is shown in Figure 3-16. The consultants/DBE can manage and implement RCAC, but they do not see any rows at all.
 Figure 3-16 Count of EMPLOYEES by a consultant
 <!-- image -->
 Figure 3-16 Count of EMPLOYEES by a consultant
 Does the result make sense? Yes, it does because RCAC is enabled.
 6. Run queries against the EMPLOYEES table. The query that is used in this example runs and tests with the four different user profiles and is the same query that was run in 3.6.3, "Demonstrating data access without RCAC" on page 24. It is shown in Example 3-12.
@ -1058,6 +1144,8 @@ Figure 3-23 Employee on leave - Manager of Field Reps user
 Figure 3-24 Employees on leave - employee user
 <!-- image -->
 Chapter 4.
@ -1104,6 +1192,9 @@ GLYPH<SM590000> The row permission for the TRANSACTIONS table is based on the AC
 Figure 4-1 Internet banking example
 <!-- image -->
 Figure 4-1 Internet banking example
 ## 4.2 Description of the users roles and responsibilities
 During the requirements gathering phase, the following groups of users are identified and codified:
@ -1168,6 +1259,9 @@ Figure 4-4 shows the data model of the banking scenario that is used in this exa
 Figure 4-4 Data model of the banking scenario
 <!-- image -->
 Figure 4-4 Data model of the banking scenario
 This section covers the following steps:
 - Reviewing the tables that are used in this example
@ -1208,6 +1302,9 @@ To review the attributes of each table that is used in this banking example, com
 Figure 4-6 CUSTOMERS table attributes
 <!-- image -->
 Figure 4-6 CUSTOMERS table attributes
 3. Click the Columns tab to see the columns of the CUSTOMERS table, as shown in Figure 4-7.
 Figure 4-7 Column definitions of the CUSTOMERS table
@ -1216,10 +1313,16 @@ Figure 4-7 Column definitions of the CUSTOMERS table
 Figure 4-8 Reviewing the constraints on the CUSTOMERS table
 <!-- image -->
 Figure 4-8 Reviewing the constraints on the CUSTOMERS table
 5. Review the definition of the ACCOUNTS table. The definition of the ACCOUNTS table is shown in Figure 4-9. RCAC has not been defined for this table yet.
 Figure 4-9 ACCOUNTS table attributes
 <!-- image -->
 Figure 4-9 ACCOUNTS table attributes
 6. Click the Columns tab to see the columns of the ACCOUNTS table, as shown in Figure 4-10.
 Figure 4-10 Column definitions of the ACCOUNTS table
@ -1232,6 +1335,9 @@ Figure 4-11 Reviewing the constraints on the ACCOUNTS table
 Figure 4-12 TRANSACTIONS table attributes
 <!-- image -->
 Figure 4-12 TRANSACTIONS table attributes
 9. Click the Columns tab to see the columns of the TRANSACTIONS table, as shown in Figure 4-13.
 Figure 4-13 Column definitions of the TRANSACTIONS table
@ -1252,22 +1358,37 @@ Complete the following steps:
 Figure 4-15 Application administration
 <!-- image -->
 Figure 4-15 Application administration
 2. The Application Administration window opens, as shown in Figure 4-16. Click IBM i → Database and select the function usage ID of Database Security Administrator.
 Figure 4-16 Application administration for IBM i
 <!-- image -->
 Figure 4-16 Application administration for IBM i
 3. Click Customize for the function usage ID of Database Security Administrator, as shown in Figure 4-17.
 Figure 4-17 Customizing the Database Security Administrator function usage ID
 <!-- image -->
 Figure 4-17 Customizing the Database Security Administrator function usage ID
 4. The Customize Access window opens, as shown in Figure 4-18. Click the users that need to implement RCAC. For this example, HBEDOYA and MCAIN are selected. Click Add and then click OK.
 Figure 4-18 Customize Access window
 <!-- image -->
 Figure 4-18 Customize Access window
 5. The Application Administrator window opens again. The function usage ID of Database Security Administrator now has an X in the Customized Access column, as shown in Figure 4-19.
 Figure 4-19 Function usage ID Database Security Administrator customized
 <!-- image -->
 Figure 4-19 Function usage ID Database Security Administrator customized
 6. Run an SQL query that shows which user profiles are enabled to define RCAC. The SQL query is shown in Figure 4-20.
 Figure 4-20 Query to display user profiles with function usage ID for RCAC
@ -1282,16 +1403,25 @@ Complete the following steps:
 Figure 4-21 Creating group profiles
 <!-- image -->
 Figure 4-21 Creating group profiles
 2. The New Group window opens, as shown in Figure 4-22. For each new group, enter the Group name (ADMIN, CUSTOMER, TELLER, and DBE) and add the user profiles that are associated to this group by selecting the user profile and clicking Add.
 Figure 4-22 shows adding user TQSPENCER to the TELLER group profile.
 Figure 4-22 Creating group profiles and adding users
 <!-- image -->
 Figure 4-22 Creating group profiles and adding users
 3. After you create all the group profiles, you should see them listed in System i Navigator under Users and Groups → Groups, as shown in Figure 4-23.
 Figure 4-23 Newly created group profiles
 <!-- image -->
 Figure 4-23 Newly created group profiles
 ## 4.3.4 Creating the CUSTOMER_LOGIN_ID global variable
 In this step, you create a global variable that is used to capture the Customer_Login_ID information, which is required to validate the permissions. For more information about global variables, see 3.2.2, "Built-in global variables" on page 19.
@ -1302,18 +1432,30 @@ Complete the following steps:
 Figure 4-24 Creating a global variable
 <!-- image -->
 Figure 4-24 Creating a global variable
 2. The New Global Variable window opens, as shown in Figure 4-25. Enter the global variable name of CUSTOMER_LOGIN_ID, select the data type of VARCHAR, and leave the default value of NULL. This default value ensures that users that do not use the web interface do not have permission to access the data. Click OK.
 Figure 4-25 Creating a global variable called CUSTOMER_LOGIN_ID
 <!-- image -->
 Figure 4-25 Creating a global variable called CUSTOMER_LOGIN_ID
 3. Now that the global variable is created, assign permissions to the variable so that it can be set by the program. Right-click the CUSTOMER_LOGIN_ID global variable and select Permissions, as shown in Figure 4-26.
 Figure 4-26 Setting permissions on the CUSTOMER_LOGIN_ID global variable
 <!-- image -->
 Figure 4-26 Setting permissions on the CUSTOMER_LOGIN_ID global variable
 4. The Permissions window opens, as shown in Figure 4-27. Select Change authority for Webuser so that the application can set this global variable.
 Figure 4-27 Setting change permissions for Webuser on the CUSTOMER_LOGIN_ID global variable
 <!-- image -->
 Figure 4-27 Setting change permissions for Webuser on the CUSTOMER_LOGIN_ID global variable
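The reason a NULL default locks out users who bypass the web interface can be sketched as follows. This is a minimal Python model, not the actual row permission: the rule that a CUSTOMER user sees only the row whose login ID equals the global variable is an assumption made for illustration, and `None` stands in for SQL NULL, which never compares equal to any value.

```python
from typing import Iterable, Optional

# Group names are taken from the scenario; the rule logic itself is assumed.
FULL_ACCESS_GROUPS = {"DBE", "ADMIN", "TELLER"}


def customer_row_visible(row_login_id: str,
                         customer_login_id: Optional[str],
                         user_groups: Iterable[str]) -> bool:
    """Model of a row permission check on a customer row."""
    groups = set(user_groups)
    if groups & FULL_ACCESS_GROUPS:
        return True
    if "CUSTOMER" in groups:
        # A comparison with NULL is never true in SQL, so an unset
        # global variable cannot match any row.
        return customer_login_id is not None and row_login_id == customer_login_id
    return False


print(customer_row_visible("KLD72CQR8JG", "KLD72CQR8JG", ["CUSTOMER"]))  # variable set by the web app
print(customer_row_visible("KLD72CQR8JG", None, ["CUSTOMER"]))           # variable left at NULL
```

The login ID value is a made-up placeholder. The point is only that leaving the variable at its default yields no visible rows for a customer session.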
 ## 4.3.5 Defining and creating row permissions
 You are now ready to define the row permissions of the tables. Complete the following steps:
@ -1322,6 +1464,9 @@ You now ready to define the row permissions of the tables. Complete the followin
 Figure 4-28 Selecting new row permissions
 <!-- image -->
 Figure 4-28 Selecting new row permissions
 2. The New Row Permission window opens, as shown in Figure 4-29. Enter the information regarding the row permissions on the CUSTOMERS table. This row permission defines what is established in the following policy:
 -User profiles that belong to DBE, ADMIN, and TELLER group profiles can see all the rows.
@ -1334,6 +1479,9 @@ Select the Enabled option. Click OK .
 Figure 4-29 New row permissions on the CUSTOMERS table
 <!-- image -->
 Figure 4-29 New row permissions on the CUSTOMERS table
 3. Define the row permissions for the ACCOUNTS table. The New Row Permission window opens, as shown in Figure 4-30. Enter the information regarding the row permissions on the ACCOUNTS table. This row permission defines what is established in the following policy:
 -User profiles that belong to DBE, ADMIN and TELLER group profiles can see all the rows.
@ -1346,6 +1494,9 @@ Select the Enabled option. Click OK .
 Figure 4-30 New row permissions on the ACCOUNTS table
 <!-- image -->
 Figure 4-30 New row permissions on the ACCOUNTS table
 4. Define the row permissions on the TRANSACTIONS table. The New Row Permission window opens, as shown in Figure 4-31. Enter the information regarding the row permissions on the TRANSACTIONS table. This row permission defines what is established in the following policy:
 -User profiles that belong to DBE, ADMIN, and TELLER group profiles can see all of the rows.
@ -1358,10 +1509,16 @@ Note: You must join back to ACCOUNTS and then to CUSTOMERS by using a subquery t
 Figure 4-31 New row permissions on the TRANSACTIONS table
 <!-- image -->
 Figure 4-31 New row permissions on the TRANSACTIONS table
 5. To verify that the row permissions are enabled, from System i Navigator, click Row Permissions, as shown in Figure 4-32. The three row permissions are created and enabled.
 Figure 4-32 List of row permissions on BANK_SCHEMA
 <!-- image -->
 Figure 4-32 List of row permissions on BANK_SCHEMA
 ## 4.3.6 Defining and creating column masks
 This section defines the masks on the columns. Complete the following steps:
@ -1370,6 +1527,9 @@ This section defines the masks on the columns. Complete the following steps:
 Figure 4-33 Creating a column mask
 <!-- image -->
 Figure 4-33 Creating a column mask
 2. In the New Column Mask window, which is shown in Figure 4-34, enter the following information:
 -Select the CUSTOMERS table on which to create the column mask.
@ -1382,6 +1542,9 @@ Select the Enabled option. Click OK .
 Figure 4-34 Defining a column mask on the CUSTOMERS table
 <!-- image -->
 Figure 4-34 Defining a column mask on the CUSTOMERS table
 3. Repeat steps 1 on page 58 and 2 to create column masks for the following columns:
 -MASK_DRIVERS_LICENSE_ON_CUSTOMERS
@ -1400,6 +1563,9 @@ Figure 4-34 Defining a column mask on the CUSTOMERS table
 Figure 4-35 List of column masks on BANK_SCHEMA
 <!-- image -->
 Figure 4-35 List of column masks on BANK_SCHEMA
 ## 4.3.7 Restricting the inserting and updating of masked data
 This step defines the check constraints that support the column masks to make sure that on INSERTS or UPDATES, data is not written with a masked value. For more information about the propagation of masked data, see 6.8, "Avoiding propagation of masked data" on page 108.
@ -1410,10 +1576,16 @@ This step defines the check constraints that support the column masks to make su
 Figure 4-36 Definition of the CUSTOMERS table
 <!-- image -->
 Figure 4-36 Definition of the CUSTOMERS table
 2. From the CUSTOMERS definition window, click the Check Constraints tab and click Add, as shown in Figure 4-37.
 Figure 4-37 Adding a check constraint
 <!-- image -->
 Figure 4-37 Adding a check constraint
 3. The New Check Constraint window opens, as shown in Figure 4-38. Complete the following steps:
 a. Select the CUSTOMER_EMAIL column.
@ -1424,14 +1596,23 @@ c. Select the On update violation, preserve column value option and click OK .
 Figure 4-38 Specifying a new check constraint on the CUSTOMERS table
 <!-- image -->
 Figure 4-38 Specifying a new check constraint on the CUSTOMERS table
 4. Figure 4-39 shows that there is now a check constraint on the CUSTOMERS table that prevents any masked data from being updated to the CUSTOMER_EMAIL column.
 Figure 4-39 Check constraint on the CUSTOMERS table
 <!-- image -->
 Figure 4-39 Check constraint on the CUSTOMERS table
 5. Create all the other check constraints that are associated to each of the masks on the CUSTOMERS table. After this is done, these constraints should look like the ones that are shown in Figure 4-40.
 Figure 4-40 List of check constraints on the CUSTOMERS table
 <!-- image -->
 Figure 4-40 List of check constraints on the CUSTOMERS table
 ## 4.3.8 Activating row and column access control
 You are now ready to activate RCAC on all three tables in this example. Complete the following steps:
@ -1440,14 +1621,23 @@ You are now ready to activate RCAC on all three tables in this example. Complete
 Figure 4-41 Enabling RCAC on the CUSTOMERS table
 <!-- image -->
 Figure 4-41 Enabling RCAC on the CUSTOMERS table
 2. Enable RCAC on the ACCOUNTS table. Right-click the ACCOUNTS table and select Definition. As shown in Figure 4-42, make sure that you select Row access control and Column access control. Click OK.
 Figure 4-42 Enabling RCAC on ACCOUNTS
 <!-- image -->
 Figure 4-42 Enabling RCAC on ACCOUNTS
 3. Enable RCAC on the TRANSACTIONS table. Right-click the TRANSACTIONS table and select Definition. As shown in Figure 4-43, make sure that you select Row access control. Click OK.
 Figure 4-43 Enabling RCAC on TRANSACTIONS
 <!-- image -->
 Figure 4-43 Enabling RCAC on TRANSACTIONS
 ## 4.3.9 Reviewing row permissions
 This section displays all the row permissions after enabling RCAC. Complete the following steps:
@ -1456,14 +1646,23 @@ This section displays all the row permissions after enabling RCAC. Complete the
 Figure 4-44 Row permissions after enabling RCAC
 <!-- image -->
 Figure 4-44 Row permissions after enabling RCAC
 2. Look at one of the row permission definitions by right-clicking it and selecting Definition, as shown in Figure 4-45.
 Figure 4-45 Selecting row permission definition
 <!-- image -->
 Figure 4-45 Selecting row permission definition
 3. A window opens, as shown in Figure 4-46. Take note of the nonsensical search condition (0=1) of the QIBM_DEFAULT row permission. This permission is ORed with all of the others. It ensures that if someone does not meet the criteria of any other row permission, this condition is tested, and because it is always false, access is denied.
 Figure 4-46 Search condition of the QIBM_DEFAULT row permission
 <!-- image -->
 Figure 4-46 Search condition of the QIBM_DEFAULT row permission
 ## 4.3.10 Demonstrating data access with RCAC
 You are now ready to test the RCAC definitions. Run the following SQL statements with each type of user (DBE, SECURITY, TELLER, ADMIN, and WEBUSER):
@ -1508,6 +1707,9 @@ To test a SECURITY user, complete the following steps:
 Figure 4-50 SECURITY session user
 <!-- image -->
 Figure 4-50 SECURITY session user
 2. The number of rows in the CUSTOMERS table that the security officer can see is shown in Figure 4-51. The security officer cannot see any data at all.
 Figure 4-51 Number of rows that the security officer can see in the CUSTOMERS table
@ -1516,6 +1718,9 @@ Figure 4-51 Number of rows that the security officer can see in the CUSTOMERS ta
 Figure 4-52 SQL statement that is run by the SECURITY user - no results
 <!-- image -->
 Figure 4-52 SQL statement that is run by the SECURITY user - no results
 ## Data access for TELLER user with RCAC
 To test a Teller (TQSPENCER) user, complete the following steps:
@ -1528,6 +1733,9 @@ Figure 4-53 TELLER session user
 Figure 4-54 Number of rows that the TELLER user can see in the CUSTOMERS table
 <!-- image -->
 Figure 4-54 Number of rows that the TELLER user can see in the CUSTOMERS table
 3. The result of the third SQL statement is shown in Figure 4-55. Note the masked columns. The TELLER user, TQSPENCER, can see all the rows, but there are some columns where the result is masked.
 Figure 4-55 SQL statement that is run by the TELLER user with masked columns
@ -1540,6 +1748,9 @@ To test an ADMIN (VGLUCCHESS) user, complete the following steps:
 Figure 4-56 ADMIN session user
 <!-- image -->
 Figure 4-56 ADMIN session user
 2. The number of rows that the ADMIN user can see is shown in Figure 4-57. The ADMIN user can see all the rows.
 Figure 4-57 Number of rows that the ADMIN can see in the CUSTOMERS table
@ -1556,18 +1767,30 @@ To test a CUSTOMERS (WEBUSER) user that accesses the database by using the web a
 Figure 4-59 WEBUSER session user
 <!-- image -->
 Figure 4-59 WEBUSER session user
 2. A global variable (CUSTOMER_LOGIN_ID) is set by the web application and then is used to check the row permissions. Figure 4-60 shows setting the global variable by using the customer login ID.
 Figure 4-60 Setting the global variable CUSTOMER_LOGIN_ID
 <!-- image -->
 Figure 4-60 Setting the global variable CUSTOMER_LOGIN_ID
 3. Verify that the global variable was set with the correct value by clicking the Global Variable tab, as shown in Figure 4-61.
 Figure 4-61 Viewing the global variable value
 <!-- image -->
 Figure 4-61 Viewing the global variable value
 4. The number of rows that the WEBUSER can see is shown in Figure 4-62. This user can see only the one row that belongs to his web-based user ID.
 Figure 4-62 Number of rows that the WEBUSER can see in the CUSTOMERS table
 <!-- image -->
 Figure 4-62 Number of rows that the WEBUSER can see in the CUSTOMERS table
 5. The result of the third SQL statement is shown in Figure 4-63. There are no masked columns, and the user can see only one row, which is the user's own row.
 Figure 4-63 SQL statement that is run by WEBUSER - no masked columns
@ -1598,18 +1821,32 @@ This section looks at some other interesting information that is related to RCAC
 Figure 4-67 Visual Explain with no RCAC enabled
 <!-- image -->
 Figure 4-67 Visual Explain with no RCAC enabled
 2. Figure 4-68 shows the Visual Explain of the same SQL statement, but with RCAC enabled. It is clear that the implementation of the SQL statement is more complex because the row permission rule becomes part of the WHERE clause.
 Figure 4-68 Visual Explain with RCAC enabled
 <!-- image -->
 Figure 4-68 Visual Explain with RCAC enabled
 3. Compare the advised indexes that are provided by the Optimizer without RCAC and with RCAC enabled. Figure 4-69 shows the index advice for the SQL statement without RCAC enabled. The index being advised is for the ORDER BY clause.
 Figure 4-69 Index advice with no RCAC
 <!-- image -->
 Figure 4-69 Index advice with no RCAC
 4. Now, look at the advised indexes with RCAC enabled. As shown in Figure 4-70, there is an additional index being advised, which is basically for the row permission rule. For more information, see 6.4.2, "Index advisor" on page 99.
 Figure 4-70 Index advice with RCAC enabled
 <!-- image -->
 Figure 4-70 Index advice with RCAC enabled
 <!-- image -->
 Chapter 5.
@ -1696,6 +1933,9 @@ In this example, the application reads the data for an update to correct the mis
 Figure 5-1 Accidental update with masked values scenario
 <!-- image -->
 Figure 5-1 Accidental update with masked values scenario
 Obviously, careful planning and testing should be exercised to avoid accidental updates with masked values.
 DB2 for i also enhanced its check constraint support in the IBM i 7.2 release with a new ON UPDATE clause that allows the existing value to be preserved when a masked value is detected by a check constraint. Details about how to employ this new check constraint support can be found in 6.8.1, "Check constraint solution" on page 108.
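The preserve-on-violation behavior can be pictured with a small model. This is a minimal Python sketch under stated assumptions: the masked literal `****@****` and the function name are hypothetical, and the code imitates only the outcome (the stored value survives when a masked value is written back), not the DB2 check constraint syntax.

```python
# Hypothetical value that a column mask would return for an e-mail address.
MASKED_EMAIL = "****@****"


def apply_email_update(existing_value: str, new_value: str) -> str:
    """Return the value that ends up stored after an update attempt.

    If the incoming value is the masked literal, the update would violate
    the check constraint, so the existing value is preserved instead.
    """
    if new_value == MASKED_EMAIL:
        return existing_value
    return new_value


# An application that read masked data and writes it back does no damage.
print(apply_email_update("jane@example.com", MASKED_EMAIL))
# A genuine correction still goes through.
print(apply_email_update("jane@example.com", "jane.doe@example.com"))
```

This guards against the accidental update scenario without rejecting the whole statement, which is the design choice the ON UPDATE clause makes available.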
@ -1724,6 +1964,8 @@ If the target table has RCAC controls defined and activated, then the CPYF comma
 The CPYLIB command is enhanced with the same Access Control (ACCCTL) parameter as the CRTDUPOBJ command in the IBM i 7.2 release (see 5.4.1, "Create Duplicate Object (CRTDUPOBJ) command" on page 82). Row permissions and column masks are copied to the new object in the new library by default because the default value for the ACCCTL parameter is *ALL.
 <!-- image -->
 Chapter 6.
 ## Additional considerations
@ -1824,12 +2066,18 @@ Note: Column masks can influence an SQL INSERT or UPDATE . For example, you cann
 Figure 6-2 Masking differences between Fieldproc and RCAC
 <!-- image -->
 Figure 6-2 Masking differences between Fieldproc and RCAC
 ## 6.2 RCAC effects on data movement
 As described earlier and shown in Figure 6-3, RCAC is applied pervasively regardless of the data access programming interface, SQL statement, or IBM i command. The effects of RCAC on data movement scenarios can be profound and possibly problematic. It is important to understand these effects and make the appropriate adjustments to avoid incorrect results or data loss.
 Figure 6-3 RCAC and data movement
 <!-- image -->
 Figure 6-3 RCAC and data movement
 The "user" that is running the data movement application or process, whether it be a high availability (HA) scenario, an extract, transform, load (ETL) scenario, or just copying data from one file or table to another one, must have permission to all the source rows without masking, and not be restricted from putting rows into the target. Allowing the data movement application or process to bypass the RCAC rules must be based on a clear and concise understanding of the organization's object security and data access policy. Proper design, implementation, and testing are critical success factors when applying RCAC.
 Important: RCAC is applied to the table or physical file access. It is not applied to the journal receiver access. Any and all database transactions are represented in the journal regardless of RCAC row permissions and column masks. This makes it essential that IBM i security is used to ensure that only authorized personnel have access to the journaled data.
@ -1854,6 +2102,9 @@ For example, given a "source" table with a row permission defined as NAME <> 'CA
 Figure 6-4 RCAC effects on data movement from SOURCE
 <!-- image -->
 Figure 6-4 RCAC effects on data movement from SOURCE
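The effect of source-side RCAC on a copy operation can be reproduced with a toy model. This is a minimal Python sketch, not a DB2 interface: the row permission NAME <> 'CAIN' comes from the scenario, while the `CARD` column, its values, and the mask format are assumptions added for illustration.

```python
from typing import Dict, List

Row = Dict[str, str]

SOURCE: List[Row] = [
    {"NAME": "BEDOYA", "CARD": "1111-2222-3333-4444"},
    {"NAME": "CAIN", "CARD": "5555-6666-7777-8888"},
]


def read_with_rcac(rows: List[Row]) -> List[Row]:
    """Return what a restricted user reads from the source table."""
    result = []
    for row in rows:
        # Row permission: NAME <> 'CAIN' (rows that fail it are invisible).
        if row["NAME"] == "CAIN":
            continue
        # Hypothetical column mask: only the last four digits are returned.
        result.append({"NAME": row["NAME"],
                       "CARD": "****-****-****-" + row["CARD"][-4:]})
    return result


# Copying "everything" as a restricted user silently loses data.
target = read_with_rcac(SOURCE)
print(len(SOURCE), len(target))
print(target[0]["CARD"])
```

The target ends up with fewer rows than the source and with masked values stored as real data, which is why the user that runs a data movement process needs unrestricted, unmasked access to the source.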
 ## 6.2.2 Effects when RCAC is defined on the target table
 Example 6-2 shows a simple example that illustrates the effect of RCAC as defined on the target table.
@ -1866,6 +2117,9 @@ Given a "target" table with a row permission defined as NAME <> 'CAIN' and a col
 Figure 6-5 RCAC effects on data movement on TARGET
 <!-- image -->
 Figure 6-5 RCAC effects on data movement on TARGET
 ## 6.2.3 Effects when RCAC is defined on both source and target tables
 Example 6-3 shows a simple example that illustrates the effect of RCAC as defined on both the source and the target tables.
@ -1880,6 +2134,9 @@ Although the source rows where NAME <> 'CAIN' do satisfy the target table's perm
 Figure 6-6 RCAC effects on data movement on SOURCE and TARGET
 <!-- image -->
 Figure 6-6 RCAC effects on data movement on SOURCE and TARGET
 ## 6.3 RCAC effects on joins
 As mentioned previously, a fundamental concept of row permission is that it defines a logical subset of rows that a user or group of users is permitted to access and use. This subset becomes the new basis of any query against the table that has RCAC enabled.
@ -1890,40 +2147,61 @@ As shown in Figure 6-7, there are two different sets, set A and set B. However,
 Figure 6-7 Set A and set B with row permissions
 <!-- image -->
 Figure 6-7 Set A and set B with row permissions
 ## 6.3.1 Inner joins
 Inner join defines the intersection of two data sets. For a row to be returned from the inner join query, it must appear in both sets, as shown in Figure 6-8.
 Figure 6-8 Inner join without RCAC permission
 <!-- image -->
 Figure 6-8 Inner join without RCAC permission
 Given that row permission serves to logically eliminate rows from one or more sets, the result set from an inner join (and a subquery) can be different when RCAC is applied. RCAC can reduce the number of rows that are permitted to be accessed by the join, as shown in Figure 6-9.
 Effect of column masks on inner joins: Because column masks are applied after the query final results are determined, the masked value has no effect on the join processing and corresponding query result set.
 Figure 6-9 Inner join with RCAC permission
 <!-- image -->
 Figure 6-9 Inner join with RCAC permission
 ## 6.3.2 Outer joins
 Outer joins preserve one or both sides of two data sets. A row can be returned from the outer join query if it appears in the primary set (LEFT, RIGHT, or both in the case of FULL), as shown in Figure 6-10. Column values from the secondary set are returned if the row has a match in the primary set. Otherwise, NULL is returned for the column value by default.
 Figure 6-10 Outer join without RCAC permission
 <!-- image -->
 Figure 6-10 Outer join without RCAC permission
 Given that row permission serves to logically eliminate rows from one or more sets, more column values that are returned from the secondary table in an outer join can be NULL when RCAC is applied, as shown in Figure 6-11.
 Effect of column masks on outer joins: Because column masks are applied after the query final results are determined, the masked value has no effect on the join processing and corresponding query result set.
 Figure 6-11 Outer join with RCAC permission
 <!-- image -->
 Figure 6-11 Outer join with RCAC permission
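A minimal sketch of the same idea for a left outer join, using Python dictionaries. The row data, the `left_outer_join` helper, and the hidden key are assumptions made for illustration; Python `None` stands in for SQL NULL.

```python
# Hypothetical rows: primary maps key -> name, secondary maps key -> balance.
primary = {1: "Ann", 2: "Bob", 3: "Cy"}
secondary = {1: 100, 2: 250, 3: 75}


def left_outer_join(left, right):
    """Return (key, left value, right value or None) for every left row."""
    return [(k, v, right.get(k)) for k, v in sorted(left.items())]


# Without RCAC every primary row finds its match in the secondary set.
no_rcac = left_outer_join(primary, secondary)

# With RCAC: an assumed row permission hides key 2 in the secondary set.
# The primary row is still preserved, but its secondary column becomes None.
visible_secondary = {k: v for k, v in secondary.items() if k != 2}
with_rcac = left_outer_join(primary, visible_secondary)

print(no_rcac)
print(with_rcac)
```

The number of rows returned is the same in both cases; only the secondary column values differ.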
 ## 6.3.3 Exception joins
 Exception joins preserve one side of two data sets. A row can be returned from the exception join query if it appears in the primary set (LEFT or RIGHT) and the row does not appear in the secondary set, as shown in Figure 6-12. Column values from the secondary set are returned as NULL by default.
 Figure 6-12 Exception join without RCAC permission
 <!-- image -->
 Figure 6-12 Exception join without RCAC permission
 Because row permissions logically eliminate rows from one or more sets, more rows can appear to be exceptions when RCAC is applied, as shown in Figure 6-13. Also, because column masks are applied after the final query results are determined, the masked value has no effect on the join processing and corresponding query result set.
 Figure 6-13 Exception join with RCAC permission
 <!-- image -->
 Figure 6-13 Exception join with RCAC permission
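An exception join can be modeled as a set difference, which makes it easy to see why hiding rows in the secondary set produces more exceptions. The keys below are hypothetical, and the sketch does not reflect how the database engine implements the join.

```python
# Hypothetical join keys (not taken from the figures).
primary = {1, 2, 3, 4}
secondary = {2, 3, 4, 5}

# Exception join without RCAC: primary rows with no match in the secondary set.
exceptions_no_rcac = primary - secondary

# With RCAC: an assumed row permission hides key 3 in the secondary set,
# so the primary row with key 3 now looks like an exception.
visible_secondary = secondary - {3}
exceptions_with_rcac = primary - visible_secondary

print(sorted(exceptions_no_rcac))    # [1]
print(sorted(exceptions_with_rcac))  # [1, 3]
```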
 ## 6.4 Monitoring, analyzing, and debugging with RCAC
 It is assumed (and it is a critical success factor) that the database engineer or application developer has a thorough understanding of the DB2 for i Query Optimizer, Database Engine, and all the associated tools and techniques.
@ -1954,18 +2232,30 @@ Figure 6-14 shows how Visual Explain externalizes RCAC.
 Figure 6-14 Visual Explain indicating that RCAC is applied
 <!-- image -->
 Figure 6-14 Visual Explain indicating that RCAC is applied
 Figure 6-15 shows the main dashboard of an SQL Performance Monitor. Click Summary.
 Figure 6-15 SQL Performance Monitor
 <!-- image -->
 Figure 6-15 SQL Performance Monitor
 Figure 6-16 shows the summary of an SQL Performance Monitor with an indication that RCAC is applied.
 Figure 6-16 SQL Performance Monitor indicating that RCAC is applied
 <!-- image -->
 Figure 6-16 SQL Performance Monitor indicating that RCAC is applied
 Figure 6-17 shows the statements of an SQL Performance Monitor and how RCAC is externalized.
 Figure 6-17 SQL Performance Monitor showing statements and RCAC
 <!-- image -->
 Figure 6-17 SQL Performance Monitor showing statements and RCAC
 When implementing RCAC as part of a comprehensive and pervasive data access control initiative, consider that the database monitoring and analysis tools can collect literal values that are passed as part of SQL statements. These literal values can be viewed as part of the information collected. If any of the literals are based on or are used with masked columns, it is important to review the database engineer's policy for viewing these data elements. For example, suppose that column CUSTOMER_TAX_ID is deemed masked for the database engineer and the CUSTOMER_TAX_ID column is used in a predicate as follows:
 WHERE CUSTOMER_TAX_ID = '123-45-7890'
@ -1984,10 +2274,16 @@ For example, the query that is shown in Figure 6-18 produces index advice for th
 Figure 6-18 Index advice and RCAC
 <!-- image -->
 Figure 6-18 Index advice and RCAC
 In Figure 6-19, index advisor is showing an index for the ACCOUNTS and CUSTOMERS tables based on the RCAC rule text.
 Figure 6-19 Index advisor based on the RCAC rule
 <!-- image -->
 Figure 6-19 Index advisor based on the RCAC rule
 For more information about creating and using indexes, see IBM DB2 for i indexing methods and strategies, found at:
 http://www.ibm.com/partnerworld/wps/servlet/ContentHandler/stg_ast_sys_wp_db2_i_indexing_methods_strategies
@ -2080,10 +2376,16 @@ Any access to an SQL view that is over one or more tables that have RCAC also ha
 Figure 6-21 View definition and user query
 <!-- image -->
 Figure 6-21 View definition and user query
 What the query optimizer plans for and what the database engine runs are shown in Figure 6-22.
 Figure 6-22 Query rewrite with RCAC
 <!-- image -->
 Figure 6-22 Query rewrite with RCAC
 ## 6.5.2 Materialized query tables
 When the query to populate a materialized query table (MQT) is run by the system on either the create table or a refresh table, and one or more source tables have RCAC defined, the row permissions and column masks are ignored. This means that the MQT has all of the data.
@ -2160,10 +2462,16 @@ For programs that access records sequentially, in or out of key order, the added
 Figure 6-23 Native record access with no RCAC
 <!-- image -->
 Figure 6-23 Native record access with no RCAC
 Before the record, as identified by the key, is considered available, the RCAC logic must be run. If the record is rejected by RCAC, the next record in sequence that is permissible must be identified. This spinning through the records can take a long time and use many resources, as shown in Figure 6-24.
 Figure 6-24 Native record level access with RCAC
 <!-- image -->
 Figure 6-24 Native record level access with RCAC
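The cost of spinning through rejected records can be illustrated with a small model. This is a sketch under stated assumptions, not the native I/O interface: the `read_next_permitted` helper, the record layout, and the branch-based permission rule are all hypothetical.

```python
def read_next_permitted(records, start, is_permitted):
    """Return (index, record, checks) for the first permitted record at or after start.

    `checks` counts how many records had the permission rule evaluated.
    """
    checks = 0
    for i in range(start, len(records)):
        checks += 1
        if is_permitted(records[i]):
            return i, records[i], checks
    return None, None, checks


# Hypothetical keyed file: only every 100th record belongs to branch "A".
records = [{"key": k, "branch": "A" if k % 100 == 0 else "B"} for k in range(1, 301)]

# Assumed row permission: the user may see only branch "A" records.
index, record, checks = read_next_permitted(
    records, 0, lambda r: r["branch"] == "A"
)

print(record["key"], checks)  # 100 100
```

In this model a single read request evaluates the rule against 100 records before one is returned, which is the kind of overhead that performance testing should expose.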
 After the row permissions and column masks are designed and implemented, adequate performance and scalability testing are recommended.
 ## 6.7 Exclusive lock to implement RCAC (availability issues)
@ -2266,6 +2574,9 @@ Figure 6-25 illustrates that object level security is the first check and that R
 Figure 6-25 Object-level security and RCAC permissions
 <!-- image -->
 Figure 6-25 Object-level security and RCAC permissions
 To get access to the table and the rows, the user must pass the object level authority test and the RCAC permission test.
 The IBM i journal captures the transactional data and places an image of the row in the journal receiver. If the user has authority to the journal receiver, the row image can be viewed.
@ -2274,6 +2585,8 @@ Although the SQL Plan Cache data, the SQL Plan Cache Snapshot data, and the SQL
 The ability to monitor, analyze, debug, and tune data-centric applications effectively and efficiently requires some understanding of the underlying data, or at least the attributes of the data. The organization must be willing to reconcile the conflicting requirements of "restricting access to data", and "needing access to data".
 <!-- image -->
 Chapter 7.
 7
@ -2334,6 +2647,9 @@ For example, assume that the BANKSCHEMA library (which is the system name or sho
 Figure 7-1 Restoring tables to different schemas
 <!-- image -->
 Figure 7-1 Restoring tables to different schemas
 The only way to fix this issue is to re-create the row permission or column mask after the restore operation. Re-creation of the row permission or column mask is required only for definitions that reference other DB2 objects, but it is simpler to re-create all of the RCAC definitions instead of a subset. For example, generate the SQL by using System i Navigator, clear the "Schema qualify names for objects" option, select the "OR REPLACE clause" option, and then run the generated script.
 ## 7.2.2 Table migration
@ -2360,6 +2676,8 @@ GLYPH<SM590000> IBM i Version 7.2 Security Reference Guide , found at:
 http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_72/rzarl/rzarlkickoff.htm?lang=en
 <!-- image -->
 Chapter 8.
 ## Designing and planning for success
@ -2406,8 +2724,12 @@ To further assist you with understanding and implementing RCAC, the DB2 for i Ce
 If you are interested in engaging with the DB2 for i Center of Excellence, contact Mike Cain at mcain@us.ibm.com.
 <!-- image -->
 Appendix A.
 <!-- image -->
 ## Database definitions for the RCAC banking example
 This appendix provides the database definitions or DDLs to re-create the Row and Column Access Control (RCAC) scenario that is described in Chapter 4, "Implementing Row and Column Access Control: Banking example" on page 37. The script that is shown in Example A-1 is the DDL script that is used to implement this example.
@ -2486,6 +2808,10 @@ This paper is intended for database engineers, data-centric application develope
 REDP-5110-00
 <!-- image -->
 <!-- image -->
 INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION
 ## BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE
--- a/tests/data/redp5695.md
+++ b/tests/data/redp5695.md
@ -1,7 +1,13 @@
 Front cover
 <!-- image -->
 ## IBM Cloud Pak for Data on IBM Z
 <!-- image -->
 <!-- image -->
 ## Executive overview
 Most industries are susceptible to fraud, which poses a risk to both businesses and consumers. According to The National Health Care Anti-Fraud Association, health care fraud alone costs the nation around $68 billion annually.$^{1}$ This statistic does not include the numerous other industries where fraudulent activities occur daily. In addition, the growing amount of data that enterprises own makes it difficult for them to detect fraud. Businesses can benefit by using an analytical platform to fully integrate their data with artificial intelligence (AI) technology.
@ -38,6 +44,9 @@ Figure 1 on page 3 shows a picture of the IBM z16 mainframe.
 Figure 1 IBM z16
 <!-- image -->
 Figure 1 IBM z16
 ## IBM z16 and IBM LinuxONE Emperor 4 features
 IBM Z systems are based on enterprise mainframe technology. Starting with transaction-based workloads and databases, IBM Z has undergone tremendous transformations in its system design for many generations to build servers that cater to Linux-based workloads and security with a cyberresilient system, and support quantum computing and modernization by using a hybrid cloud with a focus on data and AI.
@ -46,12 +55,18 @@ Figure 2 provides a snapshot of the IBM Z processor roadmap, which depicts the j
 Figure 2 IBM Z: Processor roadmap
 <!-- image -->
 Figure 2 IBM Z: Processor roadmap
 The IBM z16 and IBM LinuxONE Emperor 4 are the latest of the IBM Z, and they are developed with a 'built to build' focus to provide a powerful, cyberresilient, open, and secure platform for business with an extra focus on sustainability to help build sustainable data centers. Although the z16 server can host both IBM z/OS® and Linux workloads, LinuxONE Emperor 4 is built to host Linux-only workloads with a focus on consolidation and resiliency. Depending on the workload, consolidation from numerous x86 servers into a LinuxONE Emperor 4 can help reduce energy consumption by 75% and data center floor space by 50%, which helps to achieve the sustainability goals of the organization.
 Figure 3 on page 5 shows a summary of the system design of IBM LinuxONE Emperor 4 with the IBM Telum™ processor. The IBM Telum processor chip is designed to run enterprise applications efficiently where their data resides to embed AI with super low latency. Higher bandwidth and I/O rates are supported through FCP Express cards with an endpoint security solution. The memory subsystem supports up to 40 TB of memory.
 Figure 3 System design of IBM z16 LinuxONE Emperor 4
 <!-- image -->
 Figure 3 System design of IBM z16 LinuxONE Emperor 4
 The IBM z16 and IBM LinuxONE Emperor 4 servers are built with 7-nm technology at a 5.2 GHz speed. They consist of four dual-chip modules (DCMs) per central processor complex (CPC) drawer, each of which is built with two 8-core Telum processor chips that have "first in the industry" on-chip acceleration for mid-transaction, real-time AI inferencing, which supports many different use cases, including fraud detection.
 Each core has access to a huge private 32 MB L2 cache where up to 16 MB of the L2 cache of an inactive core can be used as virtual cache (L3 / L4) by neighboring active cores on the chip. This cache helps address translation and access checking by prefetching the same virtual cache into the L2 cache. The virtual cache also includes Neural Network Processing Assist instructions and direct memory access with protection, and per chip GZIP compression.
@ -60,12 +75,18 @@ Figure 4 provides more information about the features of AI Accelerator integrat
 Figure 4 IBM z16 on-chip AI Accelerator integration with IBM Z processor cores
 <!-- image -->
 Figure 4 IBM z16 on-chip AI Accelerator integration with IBM Z processor cores
 The IBM z16 and IBM LinuxONE Emperor 4 server platforms are built with the hardware features that are shown in Figure 4 with data and AI workloads in mind. Regardless of where the ML and deep learning (DL) frameworks are used to build and train data and AI models, the inferencing on existing enterprise application data can happen alongside currently running enterprise business applications. CP4D 4.6 supports TensorFlow and IBM Snap ML frameworks, which are optimized to use the on-chip AI Accelerator during inferencing. Support for various other frameworks is planned for future releases.
 Figure 5 on page 7 shows the seamless integration of AI into existing enterprise workloads on the IBM z16 while leveraging the underlying hardware capabilities.
 Figure 5 Seamless integration
 <!-- image -->
 Figure 5 Seamless integration
 ## What is Cloud Pak for Data on IBM Z
 IBM Cloud Pak for Data allows enterprises to simplify, unify, and automate the delivery of data and AI. It categorizes the activities within the journey to AI as four rungs of the AI Ladder: Collect, Organize, Analyze, and Infuse. For more information about each of the AI Ladder rungs, see Become Data Driven with IBM Z Infused Data Fabric , REDP-5680.
@ -76,6 +97,9 @@ Figure 6 shows a solution overview of CP4D. The infrastructure alternatives are
 Figure 6 Solution overview of Cloud Pak for Data
 <!-- image -->
 Figure 6 Solution overview of Cloud Pak for Data
 We highlight the four main pillars that make IBM Z the correct infrastructure for CP4D:
 - Performance and Scale
@ -136,6 +160,9 @@ Figure 7 on page 11 provides an overview of the components that are supported on
 Figure 7 Developing, training, and deploying an AI model on Cloud Pak for Data on IBM Z and IBM LinuxONE
 <!-- image -->
 Figure 7 Developing, training, and deploying an AI model on Cloud Pak for Data on IBM Z and IBM LinuxONE
 In summary, here are some of the reasons why you should choose AI on IBM Z:
 - World-class AI inference platform for enterprise workloads:
@ -228,6 +255,9 @@ For example, a business can start testing a model before production for fairness
 Figure 8 Typical AI model lifecycle
 <!-- image -->
 Figure 8 Typical AI model lifecycle
 Due to regulations, more stakeholders adopt the typical AI model lifecycle to protect their brand from new end-to-end risks. To ensure various aspects of both regulatory compliance and security, the personas that must be involved include the chief financial officer (CFO), chief marketing officer (CMO), chief data officer (CDO), HR, and chief regulatory officer (CRO), along with the data engineers, data scientists, and business analysts, who build AI workflows.
 ## IBM governance solution for IBM Z
@ -280,44 +310,74 @@ Figure 9 on page 16 shows the end-to-end flow for a remote AI governance solutio
 Figure 9 Remote AI governance solution end-to-end flow
 <!-- image -->
 Figure 9 Remote AI governance solution end-to-end flow
 To achieve end-to-end AI governance, complete the following steps:
 1. Create a model entry in IBM OpenPages by using CP4D on an x86 platform, as shown in Figure 10.
 Figure 10 Creating a model entry in IBM OpenPages
 <!-- image -->
 Figure 10 Creating a model entry in IBM OpenPages
 2. Train a model by using Watson Studio and by using development tools such as Jupyter Notebook or JupyterLab on CP4D on Red Hat OpenShift on a virtual machine on IBM Z, as shown in Figure 11.
 Figure 11 Training an AI model by using Watson Studio
 <!-- image -->
 Figure 11 Training an AI model by using Watson Studio
 3. Deploy the model by using WML on CP4D on Red Hat OpenShift on a virtual machine on IBM Z, as shown in Figure 12.
 Figure 12 Deploying an AI model by using WML on Cloud Pak for Data
 <!-- image -->
 Figure 12 Deploying an AI model by using WML on Cloud Pak for Data
 4. Track the external model lifecycle by browsing through the Catalogs/Platform assets catalog by using AI Factsheets and OpenPages while using CP4D on an x86 platform, as shown in Figure 13. The external model (deployed on CP4D on Red Hat OpenShift on a virtual machine on IBM Z) is saved as a platform asset catalog on the x86 platform.
 Figure 13 External model
 <!-- image -->
 Figure 13 External model
 You can track the model through each stage of the model lifecycle, as shown in Figure 14, by using AI Factsheets and OpenPages.
 Figure 14 Tracking the model
 <!-- image -->
 Figure 14 Tracking the model
 You can see that the model facts are tracked and synchronized to IBM OpenPages for risk management, as shown in Figure 15.
 Figure 15 Model facts that are tracked and synchronized to IBM OpenPages on an x86 platform
 <!-- image -->
 Figure 15 Model facts that are tracked and synchronized to IBM OpenPages on an x86 platform
 5. Create an external model by using IBM OpenScale on the x86 platform, as shown in Figure 16.
 Figure 16 Creating an external model on an x86 platform
 <!-- image -->
 Figure 16 Creating an external model on an x86 platform
 IBM OpenScale provides a comprehensive dashboard that tracks fairness, quality monitoring, drift, and explainability of a model. Fairness determines whether your model produces biased outcomes. Quality determines how well your model predicts outcomes. Drift is the degradation of predictive performance over time. A sample is shown in Figure 17 on page 21.
 Figure 17 IBM OpenScale dashboard that is used to monitor the external model
 <!-- image -->
 Figure 17 IBM OpenScale dashboard that is used to monitor the external model
 You developed and deployed the AI model by using Watson Studio and WML on CP4D on Red Hat OpenShift on a virtual machine on IBM Z, and you achieved end-to-end AI model governance by leveraging AI Factsheets, OpenScale, and OpenPages on CP4D on an x86 platform. Figure 18 shows end-to-end AI governance when using IBM OpenPages, AI Factsheets, and OpenScale.
 Figure 18 Final result: End-to-end AI governance when using IBM OpenPages, AI Factsheets, and OpenScale
 <!-- image -->
 Figure 18 Final result: End-to-end AI governance when using IBM OpenPages, AI Factsheets, and OpenScale
 ## Use case 2: Credit default risk assessment
 In today's world, many individuals or businesses seeking loans to meet their growing business needs often look to financial institutions. Financial institutions can offer loans to individuals or businesses and charge interest based on the current market situations.
@ -336,6 +396,9 @@ Figure 19 on page 23 shows a sample architecture about how to design and develop
 Figure 19 Architecture for credit risk prediction by using an ML AI model on IBM Z
 <!-- image -->
 Figure 19 Architecture for credit risk prediction by using an ML AI model on IBM Z
 A data scientist can leverage Watson Studio to develop and train an AI model and WML to deploy and score the model. In this sample architecture, the WML Python run time leverages the ML framework, IBM Snap Machine Learning (Snap ML), for scoring, and it can leverage an integrated AI accelerator at the time of model import.
 Then, the banking loan approval team can send a loan applicant request to the IBM WebSphere Application Server, which can make a request to the AI inference endpoint. The AI inference engine scores the transaction and sends the result back to the loan approval team. Based on the results, the approval team can decide on whether to approve a loan or not, and also decide how much they can lend, timelines, and other factors.
@ -350,6 +413,9 @@ Figure 20 shows an architecture for predicting credit risk by using DL on IBM Z.
 Figure 20 Architecture for credit risk prediction by using DL on IBM Z
 <!-- image -->
 Figure 20 Architecture for credit risk prediction by using DL on IBM Z
 Data scientists can start creating and training a DL AI model by using a Jupyter Notebook instance and Watson Studio. Then, they can deploy the model by using WML on CP4D running on IBM Z, which provides an endpoint. Other applications, including the IBM WebSphere server, can produce credit risk results by using the model's endpoint.
 In summary, here are some considerations for developing real-time AI models, such as credit risk assessment:
@ -386,6 +452,9 @@ Figure 21 provides a high-level diagram of a clearing and settlement use case fo
 Figure 21 Clearing and settlement use case for financial transactions by using Cloud Pak for Data
 <!-- image -->
 Figure 21 Clearing and settlement use case for financial transactions by using Cloud Pak for Data
 Here are the steps of the high-level process flow:
 1. Create a connection to a database (for example, an IBM Db2® database) where the historical data will be used for ML model building.
@ -442,6 +511,9 @@ Figure 22 provides an overview of the inferencing architecture for the RUL of an
 Figure 22 Inferencing architecture on IBM Z
 <!-- image -->
 Figure 22 Inferencing architecture on IBM Z
 Because we are looking into data-driven model development, the data set of our target is the run-to-failure data of the engine. We are looking into a supervised learning problem, and we use regression techniques to learn from the data. DL techniques such as Long Short-Term Memory (LSTM) or Gated Recurrent Units (GRU) are our choice because we are looking into a time series data set. TensorFlow or PyTorch frameworks are leveraged to create models. AI governance monitors the data and model drift to maintain the model quality throughout the model's life.
 Open-source data from NASA was used to build the AI model, which then was deployed on CP4D. CP4D enables the data scientist's journey from modeling to deployment in a seamless process. Data engineers leverage Db2 to host the data set, which includes the training, testing, and validation data. Because the data is hosted on Db2, you can expect low latency while retrieving the data, and data security needs are served because Db2 is hosted on the IBM Z platform. Data is fetched by the data refinery to do the necessary pre-processing and data imputations. You can use the programming languages Golang or C++ for real-time predictions, depending on customer needs. For more information about this topic, see "Use case 3: Clearing and settlement" on page 25.
@ -462,6 +534,9 @@ Figure 23 on page 29 provides a more in-depth view of the architecture of an AI-
 Figure 23 In-depth architectural view
 <!-- image -->
 Figure 23 In-depth architectural view
 In summary, consider the following points while developing an AI-based predictive maintenance application:
 - CP4D offers a Python run time to build a custom solution stack, but also supports different components like Watson Studio, WML, Db2, Data Refinery, OpenScale, AI Factsheets, and OpenPages.
@ -502,6 +577,9 @@ S
 Figure 24 Architecture for AI-powered video analytics
 <!-- image -->
 Figure 24 Architecture for AI-powered video analytics
 Live camera feeds or recorded videos of an infant's movement are the inputs for a pose detection model. This video streaming data is stored in IBM Cloud® Object Storage for image processing. Video data must be transformed into frames so that the infant's body poses can be detected. The pose-estimation components of the pipeline predict the location of all 17 person key points with 3 degrees of freedom each (x, y location and visibility) plus two virtual alignment key points. This approach also embraces a compute-intensive heat map prediction of infant body posture.
 When changes in body posture or movement happen, analytics can be performed, and a threshold can be set for the angle of the body and posture movements. An analysis can be performed on movement that is based on that threshold to help to predict an infant's health index in the output video stream by leveraging the IBM z16 on-chip AI acceleration, which provides an execution speed in real time on an edge device, which cannot be achieved by other means.
@ -640,10 +718,14 @@ UNIX is a registered trademark of The Open Group in the United States and other
 Other company, product, or service names may be trademarks or service marks of others.
 <!-- image -->
 Back cover
 REDP-5695-00
 ISBN 0738461067
-Printed in U.S.A.
+Printed in U.S.A.
 <!-- image -->