fix(pypdfium2): Fix OCR bounding box misalignment caused by mismatched rotation metadata (#2039)

* Fix OCR bounding box misalignment caused by rotation metadata

Signed-off-by: AndrewTsai0406 <tsai247365@gmail.com>

* Add rotation-mismatch scanned pdf test case

Signed-off-by: AndrewTsai0406 <tsai247365@gmail.com>

* Add ground truth for ocr_test_rotation_mismatch.pdf

Signed-off-by: AndrewTsai0406 <tsai247365@gmail.com>

* Update test ground truth and merge from main

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix OCR test by excluding mismatched rotation example

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: AndrewTsai0406 <tsai247365@gmail.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
AndrewTsai0406
2025-09-01 23:22:43 +08:00
committed by GitHub
parent 9f4bc5b2f1
commit 4d94e38223
4 changed files with 45 additions and 3 deletions
