Mirror of https://github.com/DS4SD/docling.git (synced 2025-12-08 20:58:11 +00:00)
fix(docx): Adding new latex symbols, simplifying how equations are added to text (#1295)
* Adding new latex symbols, simplifying how equations are added to text
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Identify headers through inhenrited style
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Log warning message instead of print
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Adding new latex symbols, simplifying how equations are added to text
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Identify headers through inhenrited style
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Log warning message instead of print
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* fix: Tesseract OCR CLI can't process images composed with numbers only (#1201)
  fix wrong type text extracted by tesseract_ocr_cli_model
  Signed-off-by: gvl4 <Guilhem.VERMOREL@3ds.com>
  Co-authored-by: gvl4 <Guilhem.VERMOREL@3ds.com>
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* fix(docx): Improve text parsing (#1268)
* chore: bump version to 2.28.4 [skip ci]
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Improve text parsing
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* fix: Tesseract OCR CLI can't process images composed with numbers only (#1201)
  fix wrong type text extracted by tesseract_ocr_cli_model
  Signed-off-by: gvl4 <Guilhem.VERMOREL@3ds.com>
  Co-authored-by: gvl4 <Guilhem.VERMOREL@3ds.com>
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Flexibilize heading detection
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Fix trailing space
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Remove trailing space
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
---------
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
  Signed-off-by: gvl4 <Guilhem.VERMOREL@3ds.com>
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
  Co-authored-by: Guilhem VERMOREL <83694424+guilhemvermorel@users.noreply.github.com>
  Co-authored-by: gvl4 <Guilhem.VERMOREL@3ds.com>
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* docs: add visual grounding example (#1270)
  Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* feat(docx): add text formatting and hyperlink support (#630)
* feat: Enable markdown text formatting for docx
  Signed-off-by: SimJeg <sjegou@nvidia.com>
* Fix imports
  Signed-off-by: SimJeg <sjegou@nvidia.com>
* Use Formatting
  Signed-off-by: SimJeg <sjegou@nvidia.com>
* Handle hyperlink
  Signed-off-by: SimJeg <sjegou@nvidia.com>
* Handle formatting properly for DocItemLabel.PARAGRAPH
  Signed-off-by: SimJeg <sjegou@nvidia.com>
* Use inline group
  Signed-off-by: SimJeg <sjegou@nvidia.com>
* Handle bullet lists
  Signed-off-by: SimJeg <sjegou@nvidia.com>
* Strip elements
  Signed-off-by: SimJeg <sjegou@nvidia.com>
* Strip elements
  Signed-off-by: SimJeg <sjegou@nvidia.com>
* Run black and mypy
  Signed-off-by: SimJeg <sjegou@nvidia.com>
* Handle header and footer
  Signed-off-by: SimJeg <sjegou@nvidia.com>
* Use inline_fmt everywhere
  Signed-off-by: SimJeg <sjegou@nvidia.com>
* Run precommit
  Signed-off-by: SimJeg <sjegou@nvidia.com>
* Address feedback
  Signed-off-by: SimJeg <sjegou@nvidia.com>
* Fix add_list_item
  Signed-off-by: SimJeg <sjegou@nvidia.com>
* fix minor bugs, mark helper methods internal
  Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
---------
  Signed-off-by: SimJeg <sjegou@nvidia.com>
  Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
  Co-authored-by: Panos Vagenas <pva@zurich.ibm.com>
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* fix(pptx): check if picture shape has an image attached (#1316)
  Check if picture shape has an image attached in pptx backend
  Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
  Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* chore: update lock file (#1315)
  chore: update lock
  Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* docs: add plugins docs (#1319)
  add plugin docs
  Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* feat: handle <code> tags as code blocks (#1320)
  handle <code> tags as code blocks
  Signed-off-by: FernandoSSI <fernandosi2005@gmail.com>
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Adding new latex symbols, simplifying how equations are added to text
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Identify headers through inhenrited style
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Log warning message instead of print
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Adding new latex symbols, simplifying how equations are added to text
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
---------
  Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
  Signed-off-by: gvl4 <Guilhem.VERMOREL@3ds.com>
  Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
  Signed-off-by: SimJeg <sjegou@nvidia.com>
  Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
  Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
  Signed-off-by: FernandoSSI <fernandosi2005@gmail.com>
  Co-authored-by: Guilhem VERMOREL <83694424+guilhemvermorel@users.noreply.github.com>
  Co-authored-by: gvl4 <Guilhem.VERMOREL@3ds.com>
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
  Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
  Co-authored-by: Simon Jégou <SimJeg@users.noreply.github.com>
  Co-authored-by: Panos Vagenas <pva@zurich.ibm.com>
  Co-authored-by: Maxim Lysak <101627549+maxmnemonic@users.noreply.github.com>
  Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
  Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
  Co-authored-by: Fernando Santos <121275806+FernandoSSI@users.noreply.github.com>
Committed by GitHub
Parent: 0499cd1c1e
Commit: 14e9c0ce9a
@@ -196,8 +196,8 @@
 "content_layer": "body",
 "label": "formula",
 "prov": [],
-"orig": "A= \\pi r^{2} ",
-"text": "A= \\pi r^{2} "
+"orig": "A= \\pi r^{2}",
+"text": "A= \\pi r^{2}"
 },
 {
 "self_ref": "#/texts/2",
@@ -352,8 +352,8 @@
 "content_layer": "body",
 "label": "formula",
 "prov": [],
-"orig": "A= \\pi r^{2} ",
-"text": "A= \\pi r^{2} "
+"orig": "A= \\pi r^{2}",
+"text": "A= \\pi r^{2}"
 },
 {
 "self_ref": "#/texts/15",
@@ -532,8 +532,8 @@
 "content_layer": "body",
 "label": "formula",
 "prov": [],
-"orig": "A= \\pi r^{2} ",
-"text": "A= \\pi r^{2} "
+"orig": "A= \\pi r^{2}",
+"text": "A= \\pi r^{2}"
 },
 {
 "self_ref": "#/texts/30",
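The three hunks make the same change: the formula item's "orig" and "text" values lose their trailing space. A minimal sketch of a check for that property on an exported document follows. It assumes the items sit in a top-level "texts" list (inferred from the "#/texts/..." references in the fixture). The function name and the sample self_ref values are hypothetical and not part of docling.

```python
def formulas_with_stray_whitespace(doc: dict) -> list[str]:
    """Return the self_ref of each formula item whose "orig" or "text"
    value has leading or trailing whitespace.

    Assumes the exported layout {"texts": [{"label": ..., "orig": ..., "text": ...}]}.
    """
    offenders = []
    for item in doc.get("texts", []):
        if item.get("label") != "formula":
            continue  # only formula items are affected by this change
        for key in ("orig", "text"):
            value = item.get(key, "")
            if value != value.strip():
                offenders.append(item.get("self_ref", "<unknown>"))
                break  # report each item once
    return offenders


# Hypothetical items modelled on the fixture lines in the hunks above.
before = {"texts": [{"self_ref": "#/texts/example", "label": "formula",
                     "orig": "A= \\pi r^{2} ", "text": "A= \\pi r^{2} "}]}
after = {"texts": [{"self_ref": "#/texts/example", "label": "formula",
                    "orig": "A= \\pi r^{2}", "text": "A= \\pi r^{2}"}]}

print(formulas_with_stray_whitespace(before))
print(formulas_with_stray_whitespace(after))
```

Run against the old fixture values, the check flags the formula item; against the new values it returns an empty list.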