docling/tests/data/groundtruth/docling_v2/elife-56337.nxml.itxt
Cesar Berrospi Ramis e1e3053695
fix: fix HTML table parser and JATS backend bugs (#1948)
Fix a bug in parsing HTML tables in HTML backend.
Fix a bug in test file that prevented JATS backend tests.
Ensure that the JATS backend creates headings with the right level.
Remove unnecessary data files for testing JATS backend.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
2025-07-16 10:49:24 +02:00

item-0 at level 0: unspecified: group _root_
item-1 at level 1: title: KRAB-zinc finger protein gene ex ... retrotransposons in the murine lineage
item-2 at level 2: paragraph: Gernot Wolf, Alberto de Iaco, Mi ... Ralls, Didier Trono, Todd S Macfarlan
item-3 at level 2: paragraph: The Eunice Kennedy Shriver Natio ... Lausanne (EPFL), Lausanne, Switzerland
item-4 at level 2: section_header: Abstract
item-5 at level 3: text: The Krüppel-associated box zinc ... edundant role restricting TE activity.
item-6 at level 2: section_header: Introduction
item-7 at level 3: text: Nearly half of the human and mou ... s are active beyond early development.
item-8 at level 3: text: TEs, especially long terminal re ... f evolutionarily young KRAB-ZFP genes.
item-9 at level 2: section_header: Results
item-10 at level 3: section_header: Mouse KRAB-ZFPs target retrotransposons
item-11 at level 4: text: We analyzed the RNA expression p ... duplications (Kauzlaric et al., 2017).
item-12 at level 4: text: To determine the binding sites o ... ctive in the early embryo (Figure 1A).
item-13 at level 4: picture
item-13 at level 5: caption: Figure 1. Genome-wide binding patterns of mouse KRAB-ZFPs. (A) Probability heatmap of KRAB-ZFP binding to TEs. Blue color intensity (main field) corresponds to -log10 (adjusted p-value) enrichment of ChIP-seq peak overlap with TE groups (Fisher's exact test). The green/red color intensity (top panel) represents mean KAP1 (GEO accession: GSM1406445) and H3K9me3 (GEO accession: GSM1327148) enrichment (respectively) at peaks overlapping significantly targeted TEs (adjusted p-value<1e-5) in WT ES cells. (B) Summarized ChIP-seq signal for indicated KRAB-ZFPs and previously published KAP1 and H3K9me3 in WT ES cells across 127 intact ETn elements. (C) Heatmaps of KRAB-ZFP ChIP-seq signal at ChIP-seq peaks. For better comparison, peaks for all three KRAB-ZFPs were called with the same parameters (p<1e-10, peak enrichment >20). The top panel shows a schematic of the arrangement of the contact amino acid composition of each zinc finger. Zinc fingers are grouped and colored according to similarity, with amino acid differences relative to the five consensus fingers highlighted in white.
item-14 at level 4: table with [9x5]
item-14 at level 5: caption: Table 1. KRAB-ZFP genes clusters in the mouse genome that were investigated in this study. * Number of protein-coding KRAB-ZFP genes identified in a previously published screen (Imbeault et al., 2017) and the ChIP-seq data column indicates the number of KRAB-ZFPs for which ChIP-seq was performed in this study.
item-15 at level 4: text: We generally observed that KRAB- ... responsible for this silencing effect.
item-16 at level 4: text: To further test the hypothesis t ... t easily evade repression by mutation.
item-17 at level 4: text: Our KRAB-ZFP ChIP-seq dataset al ... ntirely shift the mode of DNA binding.
item-18 at level 3: section_header: Genetic deletion of KRAB-ZFP gen ... leads to retrotransposon reactivation
item-19 at level 4: text: The majority of KRAB-ZFP genes a ... ung et al., 2014; Deniz et al., 2018).
item-20 at level 4: picture
item-20 at level 5: caption: Figure 2. Retrotransposon reactivation in KRAB-ZFP cluster KO ES cells. (A) RNA-seq analysis of TE expression in five KRAB-ZFP cluster KO ES cells. Green and grey squares on top of the panel represent KRAB-ZFPs with or without ChIP-seq data, respectively, within each deleted gene cluster. Reactivated TEs that are bound by one or several KRAB-ZFPs are indicated by green squares in the panel. Significantly up- and downregulated elements (adjusted p-value<0.05) are highlighted in red and green, respectively. (B) Differential KAP1 binding and H3K9me3 enrichment at TE groups (summarized across all insertions) in Chr2-cl and Chr4-cl KO ES cells. TE groups targeted by one or several KRAB-ZFPs encoded within the deleted clusters are highlighted in blue (differential enrichment over the entire TE sequences) and red (differential enrichment at TE regions that overlap with KRAB-ZFP ChIP-seq peaks). (C) DNA methylation status of CpG sites at indicated TE groups in WT and Chr4-cl KO ES cells grown in serum containing media or in hypomethylation-inducing media (2i + Vitamin C). P-values were calculated using paired t-test.
item-21 at level 3: section_header: KRAB-ZFP cluster deletions license TE-borne enhancers
item-22 at level 4: text: We next used our RNA-seq dataset ... vating effects of TEs on nearby genes.
item-23 at level 4: picture
item-23 at level 5: caption: Figure 3. TE-dependent gene activation in KRAB-ZFP cluster KO ES cells. (A) Differential gene expression in Chr2-cl and Chr4-cl KO ES cells. Significantly up- and downregulated genes (adjusted p-value<0.05) are highlighted in red and green, respectively, KRAB-ZFP genes within the deleted clusters are shown in blue. (B) Correlation of TEs and gene deregulation. Plots show enrichment of TE groups within 100 kb of up- and downregulated genes relative to all genes. Significantly overrepresented LTR and LINE groups (adjusted p-value<0.1) are highlighted in blue and red, respectively. (C) Schematic view of the downstream region of Chst1 where a 5′ truncated ETn insertion is located. ChIP-seq (Input subtracted from ChIP) data for overexpressed epitope-tagged Gm13051 (a Chr4-cl KRAB-ZFP) in F9 EC cells, and re-mapped KAP1 (GEO accession: GSM1406445) and H3K9me3 (GEO accession: GSM1327148) in WT ES cells are shown together with RNA-seq data from Chr4-cl WT and KO ES cells (mapped using Bowtie (-a -m 1 --strata -v 2) to exclude reads that cannot be uniquely mapped). (D) RT-qPCR analysis of Chst1 mRNA expression in Chr4-cl WT and KO ES cells with or without the CRISPR/Cas9 deleted ETn insertion near Chst1. Values represent mean expression (normalized to Gapdh) from three biological replicates per sample (each performed in three technical replicates) in arbitrary units. Error bars represent standard deviation and asterisks indicate significance (p<0.01, Student's t-test). n.s.: not significant. (E) Mean coverage of ChIP-seq data (Input subtracted from ChIP) in Chr4-cl WT and KO ES cells over 127 full-length ETn insertions. The binding sites of the Chr4-cl KRAB-ZFPs Rex2 and Gm13051 are indicated by dashed lines.
item-24 at level 4: text: While we generally observed that ... he internal region and not on the LTR.
item-25 at level 3: section_header: ETn retrotransposition in Chr4-cl KO and WT mice
item-26 at level 4: text: IAP, ETn/ETnERV and MuLV/RLTR4 r ... s may contribute to reduced viability.
item-27 at level 4: text: We reasoned that retrotransposon ... Tn insertions at a high recovery rate.
item-28 at level 4: text: Using this dataset, we first con ... nsertions in our pedigree (Figure 4A).
item-29 at level 4: picture
item-29 at level 5: caption: Figure 4. ETn retrotransposition in Chr4-cl KO mice. (A) Pedigree of mice used for transposon insertion screening by capture-seq in mice of different strain backgrounds. The number of novel ETn insertions (only present in one animal) are indicated. For animals whose direct ancestors have not been screened, the ETn insertions are shown in parentheses since parental inheritance cannot be excluded in that case. Germ line insertions are indicated by asterisks. All DNA samples were prepared from tail tissues unless noted (-S: spleen, -E: ear, -B:Blood) (B) Statistical analysis of ETn insertion frequency in tail tissue from 30 Chr4-cl KO, KO/WT and WT mice that were derived from one Chr4-c KO x KO/WT and two Chr4-cl KO/WT x KO/WT matings. Only DNA samples that were collected from juvenile tails were considered for this analysis. P-values were calculated using one-sided Wilcoxon Rank Sum Test. In the last panel, KO, WT and KO/WT mice derived from all matings were combined for the statistical analysis.
item-30 at level 4: text: To validate some of the novel ET ... ess might have truncated this element.
item-31 at level 4: text: Besides novel ETn insertions tha ... tions (Figure 4—figure supplement 3D).
item-32 at level 4: text: Finally, we asked whether there ... s clearly also play an important role.
item-33 at level 2: section_header: Discussion
item-34 at level 3: text: C2H2 zinc finger proteins, about ... ) depending upon their insertion site.
item-35 at level 3: text: Despite a lack of widespread ETn ... ion of the majority of KRAB-ZFP genes.
item-36 at level 2: section_header: Materials and methods
item-37 at level 3: table with [31x5]
item-37 at level 4: caption: Key resources table
item-38 at level 3: section_header: Cell lines and transgenic mice
item-39 at level 4: text: Mouse ES cells and F9 EC cells w ... KO/KO and KO/WT (B6/129 F2) offspring.
item-40 at level 3: section_header: Generation of KRAB-ZFP expressing cell lines
item-41 at level 4: text: KRAB-ZFP ORFs were PCR-amplified ... led and further expanded for ChIP-seq.
item-42 at level 3: section_header: CRISPR/Cas9 mediated deletion of KRAB-ZFP clusters and an MMETn insertion
item-43 at level 4: text: All gRNAs were expressed from th ... PCR genotyping (Supplementary file 3).
item-44 at level 3: section_header: ChIP-seq analysis
item-45 at level 4: text: For ChIP-seq analysis of KRAB-ZF ... 010 or Khil et al., 2012 respectively.
item-46 at level 4: text: ChIP-seq libraries were construc ... were re-mapped using Bowtie (--best).
item-47 at level 3: section_header: Luciferase reporter assays
item-48 at level 4: text: For KRAB-ZFP repression assays, ... after transfection as described above.
item-49 at level 3: section_header: RNA-seq analysis
item-50 at level 4: text: Whole RNA was purified using RNe ... lemented in the R function p.adjust().
item-51 at level 3: section_header: Reduced representation bisulfite sequencing (RRBS-seq)
item-52 at level 4: text: For RRBS-seq analysis, Chr4-cl W ... h sample were considered for analysis.
item-53 at level 3: section_header: Retrotransposition assay
item-54 at level 4: text: The retrotransposition vectors p ... were stained with Amido Black (Sigma).
item-55 at level 3: section_header: Capture-seq screen
item-56 at level 4: text: To identify novel retrotransposo ... assembly using the Unicycler software.
item-57 at level 2: section_header: Funding Information
item-58 at level 3: text: This paper was supported by the following grants:
item-59 at level 3: list: group list
item-60 at level 4: list_item: http://dx.doi.org/10.13039/10000 ... ment 1ZIAHD008933 to Todd S Macfarlan.
item-61 at level 4: list_item: http://dx.doi.org/10.13039/50110 ... ndation 310030_152879 to Didier Trono.
item-62 at level 4: list_item: http://dx.doi.org/10.13039/50110 ... dation 310030B_173337 to Didier Trono.
item-63 at level 4: list_item: http://dx.doi.org/10.13039/50110 ... ch Council No. 268721 to Didier Trono.
item-64 at level 4: list_item: http://dx.doi.org/10.13039/50110 ... rch Council No 694658 to Didier Trono.
item-65 at level 2: section_header: Acknowledgements
item-66 at level 3: text: We thank Alex Grinberg, Jeanne Y ... 268721; Transpos-X, No. 694658) (DT).
item-67 at level 2: section_header: Additional information
item-68 at level 2: section_header: Additional files
item-69 at level 2: section_header: Data availability
item-70 at level 3: text: All NGS data has been deposited ... GenBank database (MH449667- MH449669).
item-71 at level 3: text: The following datasets were generated:
item-72 at level 3: text: Wolf G. Retrotransposon reactiva ... ession Omnibus (2019). NCBI: GSE115291
item-73 at level 3: text: Wolf G. Mus musculus musculus st ... e. NCBI GenBank (2019). NCBI: MH449667
item-74 at level 3: text: Wolf G. Mus musculus musculus st ... e. NCBI GenBank (2019). NCBI: MH449668
item-75 at level 3: text: Wolf G. Mus musculus musculus st ... e. NCBI GenBank (2019). NCBI: MH449669
item-76 at level 3: text: The following previously published datasets were used:
item-77 at level 3: text: Castro-Diaz N, Ecco G, Coluccio ... ssion Omnibus (2014). NCBI: GSM1406445
item-78 at level 3: text: Andrew ZX. H3K9me3_ChIPSeq (Ctrl ... ssion Omnibus (2014). NCBI: GSM1327148
item-79 at level 2: section_header: References
item-80 at level 3: list: group list
item-81 at level 4: list_item: Bailey TL, Boden M, Buske FA, Fr ... OI: 10.1093/nar/gkp335, PMID: 19458158
item-82 at level 4: list_item: Baust C, Gagnier L, Baillie GJ, ... 77.21.11448-11458.2003, PMID: 14557630
item-83 at level 4: list_item: Blaschke K, Ebata KT, Karimi MM, ... I: 10.1038/nature12362, PMID: 23812591
item-84 at level 4: list_item: Brodziak A, Ziółko E, Muc-Wierzg ... I: 10.12659/msm.882892, PMID: 22648263
item-85 at level 4: list_item: Castro-Diaz N, Ecco G, Coluccio ... 10.1101/gad.241661.114, PMID: 24939876
item-86 at level 4: list_item: Chuong EB, Elde NC, Feschotte C. ... 0.1126/science.aad5497, PMID: 26941318
item-87 at level 4: list_item: Dan J, Liu Y, Liu N, Chiourea M, ... 6/j.devcel.2014.03.004, PMID: 24735877
item-88 at level 4: list_item: De Iaco A, Planet E, Coluccio A, ... . DOI: 10.1038/ng.3858, PMID: 28459456
item-89 at level 4: list_item: Deniz Ö, de la Rica L, Cheng KCL ... 1186/s13059-017-1376-y, PMID: 29351814
item-90 at level 4: list_item: Dewannieux M, Heidmann T. Endoge ... 6/j.coviro.2013.08.005, PMID: 24004725
item-91 at level 4: list_item: Ecco G, Cassano M, Kauzlaric A, ... 6/j.devcel.2016.02.024, PMID: 27003935
item-92 at level 4: list_item: Ecco G, Imbeault M, Trono D. KRA ... OI: 10.1242/dev.132605, PMID: 28765213
item-93 at level 4: list_item: Frank JA, Feschotte C. Co-option ... 6/j.coviro.2017.07.021, PMID: 28818736
item-94 at level 4: list_item: Gagnier L, Belancio VP, Mager DL ... 1186/s13100-019-0157-4, PMID: 31011371
item-95 at level 4: list_item: Groner AC, Meylan S, Ciuffi A, Z ... 1/journal.pgen.1000869, PMID: 20221260
item-96 at level 4: list_item: Hancks DC, Kazazian HH. Roles fo ... 1186/s13100-016-0065-9, PMID: 27158268
item-97 at level 4: list_item: Imbeault M, Helleboid PY, Trono ... I: 10.1038/nature21683, PMID: 28273063
item-98 at level 4: list_item: Jacobs FM, Greenberg D, Nguyen N ... I: 10.1038/nature13760, PMID: 25274305
item-99 at level 4: list_item: Kano H, Kurahashi H, Toda T. Gen ... 0.1073/pnas.0705483104, PMID: 17984064
item-100 at level 4: list_item: Karimi MM, Goyal P, Maksakova IA ... 016/j.stem.2011.04.004, PMID: 21624812
item-101 at level 4: list_item: Kauzlaric A, Ecco G, Cassano M, ... 1/journal.pone.0173746, PMID: 28334004
item-102 at level 4: list_item: Khil PP, Smagulova F, Brick KM, ... 10.1101/gr.130583.111, PMID: 22367190
item-103 at level 4: list_item: Krueger F, Andrews SR. Bismark: ... /bioinformatics/btr167, PMID: 21493656
item-104 at level 4: list_item: Langmead B, Salzberg SL. Fast ga ... OI: 10.1038/nmeth.1923, PMID: 22388286
item-105 at level 4: list_item: Legiewicz M, Zolotukhin AS, Pilk ... 0.1074/jbc.M110.182840, PMID: 20978285
item-106 at level 4: list_item: Lehoczky JA, Thomas PE, Patrie K ... 1/journal.pgen.1003967, PMID: 24339789
item-107 at level 4: list_item: Leung D, Du T, Wagner U, Xie W, ... 0.1073/pnas.1322273111, PMID: 24757056
item-108 at level 4: list_item: Lilue J, Doran AG, Fiddes IT, Ab ... 1038/s41588-018-0223-8, PMID: 30275530
item-109 at level 4: list_item: Liu S, Brind'Amour J, Karimi MM, ... 10.1101/gad.244848.114, PMID: 25228647
item-110 at level 4: list_item: Love MI, Huber W, Anders S. Mode ... 1186/s13059-014-0550-8, PMID: 25516281
item-111 at level 4: list_item: Lugani F, Arora R, Papeta N, Pat ... 1/journal.pgen.1003206, PMID: 23437001
item-112 at level 4: list_item: Macfarlan TS, Gifford WD, Drisco ... I: 10.1038/nature11244, PMID: 22722858
item-113 at level 4: list_item: Maksakova IA, Romanish MT, Gagni ... 1/journal.pgen.0020002, PMID: 16440055
item-114 at level 4: list_item: Matsui T, Leung D, Miyashita H, ... I: 10.1038/nature08858, PMID: 20164836
item-115 at level 4: list_item: Najafabadi HS, Mnaimneh S, Schmi ... DOI: 10.1038/nbt.3128, PMID: 25690854
item-116 at level 4: list_item: Nellåker C, Keane TM, Yalcin B, ... .1186/gb-2012-13-6-r45, PMID: 22703977
item-117 at level 4: list_item: O'Geen H, Frietze S, Farnham PJ. ... 7/978-1-60761-753-2_27, PMID: 20680851
item-118 at level 4: list_item: Patel A, Yang P, Tinkham M, Prad ... 016/j.cell.2018.02.058, PMID: 29551271
item-119 at level 4: list_item: Ribet D, Dewannieux M, Heidmann ... OI: 10.1101/gr.2924904, PMID: 15479948
item-120 at level 4: list_item: Richardson SR, Gerdes P, Gerhard ... 10.1101/gr.219022.116, PMID: 28483779
item-121 at level 4: list_item: Rowe HM, Jakobsson J, Mesnard D, ... I: 10.1038/nature08674, PMID: 20075919
item-122 at level 4: list_item: Rowe HM, Kapopoulou A, Corsinott ... 10.1101/gr.147678.112, PMID: 23233547
item-123 at level 4: list_item: Schauer SN, Carreira PE, Shukla ... 10.1101/gr.226993.117, PMID: 29643204
item-124 at level 4: list_item: Schultz DC, Ayyanathan K, Negore ... OI: 10.1101/gad.973302, PMID: 11959841
item-125 at level 4: list_item: Semba K, Araki K, Matsumoto K, S ... 1/journal.pgen.1003204, PMID: 23436999
item-126 at level 4: list_item: Sripathy SP, Stevens J, Schultz ... : 10.1128/MCB.00487-06, PMID: 16954381
item-127 at level 4: list_item: Thomas JH, Schneider S. Coevolut ... 10.1101/gr.121749.111, PMID: 21784874
item-128 at level 4: list_item: Thompson PJ, Macfarlan TS, Lorin ... 6/j.molcel.2016.03.029, PMID: 27259207
item-129 at level 4: list_item: Treger RS, Pope SD, Kong Y, Toku ... 6/j.immuni.2018.12.022, PMID: 30709743
item-130 at level 4: list_item: Vlangos CN, Siuniak AN, Robinson ... 1/journal.pgen.1003205, PMID: 23437000
item-131 at level 4: list_item: Wang J, Xie G, Singh M, Ghanbari ... I: 10.1038/nature13804, PMID: 25317556
item-132 at level 4: list_item: Wolf D, Hug K, Goff SP. TRIM28 m ... 0.1073/pnas.0805540105, PMID: 18713861
item-133 at level 4: list_item: Wolf G, Greenberg D, Macfarlan T ... 1186/s13100-015-0050-8, PMID: 26435754
item-134 at level 4: list_item: Wolf G, Yang P, Füchtbauer AC, F ... 10.1101/gad.252767.114, PMID: 25737282
item-135 at level 4: list_item: Yamauchi M, Freitag B, Khan C, B ... JVI.69.2.1142-1149.1995, PMID: 7529329
item-136 at level 4: list_item: Zhang Y, Liu T, Meyer CA, Eeckho ... .1186/gb-2008-9-9-r137, PMID: 18798982
item-137 at level 1: caption: Figure 1. Genome-wide binding pa ... onsensus fingers highlighted in white.
item-138 at level 1: caption: Table 1. KRAB-ZFP genes clusters ... ChIP-seq was performed in this study.
item-139 at level 1: caption: Figure 2. Retrotransposon reacti ... s were calculated using paired t-test.
item-140 at level 1: caption: Figure 3. TE-dependent gene acti ... Gm13051 are indicated by dashed lines.
item-141 at level 1: caption: Figure 4. ETn retrotransposition ... combined for the statistical analysis.
item-142 at level 1: caption: Key resources table