Mirror of https://github.com/DS4SD/docling.git, synced 2025-07-27 04:24:45 +00:00
* feat: add PATENT_USPTO as input format

  Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* feat: add USPTO backend parser

  Add a backend implementation to parse patent applications and grants from the United States Patent Office (USPTO).

  Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* refactor: change the name of the USPTO input format

  Change the name of the patent USPTO input format to show the typical format (XML).

  Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* refactor: address several input formats with same mime type

  Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* refactor: group XML backend parsers in a subfolder

  Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* chore: add safe initialization of PatentUsptoDocumentBackend

  Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

---------

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

| Name |
|---|
| docx |
| groundtruth |
| html |
| md |
| pptx |
| uspto |
| xlsx |
| 2203.01017v2.pdf |
| 2206.01062.pdf |
| 2305.03393v1-pg9-img.png |
| 2305.03393v1-pg9.pdf |
| 2305.03393v1.pdf |
| redp5110_sampled.pdf |
| test_01.asciidoc |
| test_02.asciidoc |