docling/docling
Cesar Berrospi Ramis 428b656793
feat(xml-jats): parse XML JATS documents (#967)
* chore(xml-jats): separate authors and affiliations

In XML PubMed (JATS) backend, convert authors and affiliations as they
are typically rendered on PDFs.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* fix(xml-jats): replace new line character by a space

Instead of removing new line character from text, replace it by a space character.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* feat(xml-jats): improve existing parser and extend features

Partially support lists, respect reading order, parse more sections, support equations, better text formatting.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* chore(xml-jats): rename PubMed objects to JATS

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

---------

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
2025-02-17 10:43:31 +01:00
..
backend feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
chunking feat: expose new hybrid chunker, update docs (#384) 2024-12-09 08:28:29 +01:00
cli feat: Introduce the enable_remote_services option to allow remote connections while processing (#941) 2025-02-12 15:18:01 +01:00
datamodel feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
models feat: Introduce the enable_remote_services option to allow remote connections while processing (#941) 2025-02-12 15:18:01 +01:00
pipeline feat: Introduce the enable_remote_services option to allow remote connections while processing (#941) 2025-02-12 15:18:01 +01:00
utils feat: Add content_layer property to items to address body, furniture and other roles (#735) 2025-02-10 12:07:49 +01:00
__init__.py Initial commit 2024-07-15 09:42:42 +02:00
document_converter.py feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
exceptions.py feat: Introduce the enable_remote_services option to allow remote connections while processing (#941) 2025-02-12 15:18:01 +01:00
py.typed fix: Add py.typed marker file (#531) 2024-12-06 13:42:14 +01:00