feat(xml-jats): parse XML JATS documents (#967)

* chore(xml-jats): separate authors and affiliations

In the XML PubMed (JATS) backend, convert authors and affiliations into
separate paragraphs, as they are typically rendered in PDFs.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
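A minimal sketch of the idea behind this change, assuming a simplified JATS fragment: author names from `<contrib contrib-type="author">` become one line and `<aff>` elements become another. The helper `render_authors_and_affiliations`, the `; ` separator between affiliations and the placeholder affiliation text are hypothetical and are not taken from the backend's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified JATS front matter; the affiliations are placeholders.
JATS_SNIPPET = """<article-meta>
  <contrib-group>
    <contrib contrib-type="author">
      <name><surname>Wolf</surname><given-names>Gernot</given-names></name>
    </contrib>
    <contrib contrib-type="author">
      <name><surname>de Iaco</surname><given-names>Alberto</given-names></name>
    </contrib>
  </contrib-group>
  <aff id="aff1"><institution>Institution A</institution>, <country>Country A</country></aff>
  <aff id="aff2"><institution>Institution B</institution>, <country>Country B</country></aff>
</article-meta>"""


def render_authors_and_affiliations(meta: ET.Element) -> tuple[str, str]:
    """Return one line of author names and one line of affiliations (hypothetical helper)."""
    authors = []
    for contrib in meta.iter("contrib"):
        if contrib.get("contrib-type") != "author":
            continue
        # Render "Given Surname", the order a reader sees on a PDF title page.
        given = (contrib.findtext("name/given-names") or "").strip()
        surname = (contrib.findtext("name/surname") or "").strip()
        authors.append(" ".join(part for part in (given, surname) if part))

    affiliations = []
    for aff in meta.iter("aff"):
        # Flatten mixed content such as <institution>/<country> and collapse whitespace.
        affiliations.append(" ".join("".join(aff.itertext()).split()))

    # The "; " separator between affiliations is an assumption of this sketch.
    return ", ".join(authors), "; ".join(affiliations)


authors_line, affiliations_line = render_authors_and_affiliations(ET.fromstring(JATS_SNIPPET))
print(authors_line)
print(affiliations_line)
```

Keeping the two lines apart mirrors the new fixture output, where the author list and the affiliation list appear as two separate `paragraph` items.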

* fix(xml-jats): replace newline characters with a space

Instead of removing newline characters from the text, replace each one with a space character.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
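The difference between the two behaviours can be sketched in plain Python. Both function names are hypothetical, and collapsing the resulting whitespace runs (for example XML indentation) into a single space is an extra assumption, not something this commit states.

```python
def remove_newlines(raw: str) -> str:
    """Previous behaviour: delete line breaks, which glues adjacent words together."""
    return raw.replace("\n", "")


def replace_newlines(raw: str) -> str:
    """New behaviour: turn each line break into a space.

    Collapsing the resulting runs of whitespace into a single space is an
    extra assumption of this sketch.
    """
    return " ".join(raw.replace("\n", " ").split())


sample = "zinc finger\nproteins"
print(repr(remove_newlines(sample)))   # words are glued: 'zinc fingerproteins'
print(repr(replace_newlines(sample)))  # words stay apart: 'zinc finger proteins'
```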

* feat(xml-jats): improve existing parser and extend features

Partially support lists, respect the reading order, parse more sections, support equations, and improve text formatting.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
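One way to respect reading order is a recursive walk over the JATS `<body>` that emits items in document order, descending one level per nested `<sec>`. The sketch below handles `sec`, `p`, `list` and `disp-formula`, which are standard JATS element names; the `walk` helper, the label strings and the level numbering (chosen to resemble the fixture, where top-level section headers sit at level 2) are illustrative assumptions rather than the backend's implementation.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def flatten(node: ET.Element) -> str:
    """Concatenate all text inside an element and collapse whitespace."""
    return " ".join("".join(node.itertext()).split())


def walk(node: ET.Element, level: int, out: list | None = None) -> list:
    """Emit (level, label, text) tuples in document order (hypothetical labels)."""
    if out is None:
        out = []
    for child in node:
        if child.tag == "sec":
            title = child.find("title")
            if title is not None:
                out.append((level, "section_header", flatten(title)))
            # Content of the section, including nested sections, goes one level deeper.
            walk(child, level + 1, out)
        elif child.tag == "p":
            out.append((level, "text", flatten(child)))
        elif child.tag == "list":
            # Partial list support: one entry per <list-item>; nested structure is flattened.
            for item in child.findall("list-item"):
                out.append((level, "list_item", flatten(item)))
        elif child.tag == "disp-formula":
            # Prefer a TeX representation when present, otherwise fall back to plain text.
            tex = child.findtext("tex-math")
            out.append((level, "formula", tex.strip() if tex else flatten(child)))
    return out


BODY = """<body>
  <sec><title>Introduction</title><p>First paragraph.</p>
    <sec><title>Subsection</title><p>Nested paragraph.</p>
      <list><list-item><p>one</p></list-item><list-item><p>two</p></list-item></list>
    </sec>
  </sec>
  <sec><title>Methods</title><disp-formula><tex-math>E = mc^2</tex-math></disp-formula></sec>
</body>"""

result = walk(ET.fromstring(BODY), level=2)
for entry in result:
    print(entry)
```

Because items are appended while the tree is traversed, figures or tables placed between paragraphs would stay where they occur, instead of being gathered into trailing sections as in the old fixture output.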

* chore(xml-jats): rename PubMed objects to JATS

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

---------

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Cesar Berrospi Ramis
2025-02-17 10:43:31 +01:00
committed by GitHub
parent e1436a8b05
commit 428b656793
35 changed files with 13688 additions and 30671 deletions


@@ -1,165 +1,149 @@
item-0 at level 0: unspecified: group _root_
item-1 at level 1: title: KRAB-zinc finger protein gene ex ... retrotransposons in the murine lineage
item-2 at level 2: paragraph: Wolf Gernot; 1: The Eunice Kenne ... tes of Health: Bethesda: United States
item-3 at level 2: section_header: Abstract
item-4 at level 3: text: The Krüppel-associated box zinc ... edundant role restricting TE activity.
item-5 at level 2: section_header: Introduction
item-6 at level 3: text: Nearly half of the human and mou ... s are active beyond early development.
item-7 at level 3: text: TEs, especially long terminal re ... f evolutionarily young KRAB-ZFP genes.
item-8 at level 2: section_header: Results
item-9 at level 3: section_header: Mouse KRAB-ZFPs target retrotransposons
item-10 at level 4: text: We analyzed the RNA expression p ... duplications (Kauzlaric et al., 2017).
item-11 at level 4: text: To determine the binding sites o ... ctive in the early embryo (Figure 1A).
item-12 at level 4: text: We generally observed that KRAB- ... responsible for this silencing effect.
item-13 at level 4: text: To further test the hypothesis t ... t easily evade repression by mutation.
item-14 at level 4: text: Our KRAB-ZFP ChIP-seq dataset al ... ntirely shift the mode of DNA binding.
item-15 at level 3: section_header: Genetic deletion of KRAB-ZFP gen ... leads to retrotransposon reactivation
item-16 at level 4: text: The majority of KRAB-ZFP genes a ... ung et al., 2014; Deniz et al., 2018).
item-17 at level 3: section_header: KRAB-ZFP cluster deletions license TE-borne enhancers
item-18 at level 4: text: We next used our RNA-seq dataset ... vating effects of TEs on nearby genes.
item-19 at level 4: text: While we generally observed that ... he internal region and not on the LTR.
item-20 at level 3: section_header: ETn retrotransposition in Chr4-cl KO and WT mice
item-21 at level 4: text: IAP, ETn/ETnERV and MuLV/RLTR4 r ... s may contribute to reduced viability.
item-22 at level 4: text: We reasoned that retrotransposon ... Tn insertions at a high recovery rate.
item-23 at level 4: text: Using this dataset, we first con ... nsertions in our pedigree (Figure 4A).
item-24 at level 4: text: To validate some of the novel ET ... ess might have truncated this element.
item-25 at level 4: text: Besides novel ETn insertions tha ... tions (Figure 4—figure supplement 3D).
item-26 at level 4: text: Finally, we asked whether there ... s clearly also play an important role.
item-27 at level 2: section_header: Discussion
item-28 at level 3: text: C2H2 zinc finger proteins, about ... ) depending upon their insertion site.
item-29 at level 3: text: Despite a lack of widespread ETn ... ion of the majority of KRAB-ZFP genes.
item-30 at level 2: section_header: Materials and methods
item-31 at level 3: section_header: Cell lines and transgenic mice
item-32 at level 4: text: Mouse ES cells and F9 EC cells w ... KO/KO and KO/WT (B6/129 F2) offspring.
item-33 at level 3: section_header: Generation of KRAB-ZFP expressing cell lines
item-34 at level 4: text: KRAB-ZFP ORFs were PCR-amplified ... led and further expanded for ChIP-seq.
item-35 at level 3: section_header: CRISPR/Cas9 mediated deletion of KRAB-ZFP clusters and an MMETn insertion
item-36 at level 4: text: All gRNAs were expressed from th ... PCR genotyping (Supplementary file 3).
item-37 at level 3: section_header: ChIP-seq analysis
item-38 at level 4: text: For ChIP-seq analysis of KRAB-ZF ... 010 or Khil et al., 2012 respectively.
item-39 at level 4: text: ChIP-seq libraries were construc ... were re-mapped using Bowtie (--best).
item-40 at level 3: section_header: Luciferase reporter assays
item-41 at level 4: text: For KRAB-ZFP repression assays, ... after transfection as described above.
item-42 at level 3: section_header: RNA-seq analysis
item-43 at level 4: text: Whole RNA was purified using RNe ... lemented in the R function p.adjust().
item-44 at level 3: section_header: Reduced representation bisulfite sequencing (RRBS-seq)
item-45 at level 4: text: For RRBS-seq analysis, Chr4-cl W ... h sample were considered for analysis.
item-46 at level 3: section_header: Retrotransposition assay
item-47 at level 4: text: The retrotransposition vectors p ... were stained with Amido Black (Sigma).
item-48 at level 3: section_header: Capture-seq screen
item-49 at level 4: text: To identify novel retrotransposo ... assembly using the Unicycler software.
item-50 at level 2: section_header: Tables
item-51 at level 3: table with [9x5]
item-51 at level 4: caption: Table 1.: * Number of protein-coding KRAB-ZFP genes identified in a previously published screen (Imbeault et al., 2017) and the ChIP-seq data column indicates the number of KRAB-ZFPs for which ChIP-seq was performed in this study.
item-52 at level 3: table with [31x5]
item-52 at level 4: caption: Key resources table:
item-53 at level 2: section_header: Figures
item-54 at level 3: picture
item-54 at level 4: caption: Figure 1.: Genome-wide binding patterns of mouse KRAB-ZFPs.
(A) Probability heatmap of KRAB-ZFP binding to TEs. Blue color intensity (main field) corresponds to -log10 (adjusted p-value) enrichment of ChIP-seq peak overlap with TE groups (Fishers exact test). The green/red color intensity (top panel) represents mean KAP1 (GEO accession: GSM1406445) and H3K9me3 (GEO accession: GSM1327148) enrichment (respectively) at peaks overlapping significantly targeted TEs (adjusted p-value<1e-5) in WT ES cells. (B) Summarized ChIP-seq signal for indicated KRAB-ZFPs and previously published KAP1 and H3K9me3 in WT ES cells across 127 intact ETn elements. (C) Heatmaps of KRAB-ZFP ChIP-seq signal at ChIP-seq peaks. For better comparison, peaks for all three KRAB-ZFPs were called with the same parameters (p<1e-10, peak enrichment >20). The top panel shows a schematic of the arrangement of the contact amino acid composition of each zinc finger. Zinc fingers are grouped and colored according to similarity, with amino acid differences relative to the five consensus fingers highlighted in white.
Figure 1—source data 1.KRAB-ZFP expression in 40 mouse tissues and cell lines (ENCODE).Mean values of replicates are shown as log2 transcripts per million.
Figure 1—source data 2.Probability heatmap of KRAB-ZFP binding to TEs.Values corresponds to -log10 (adjusted p-value) enrichment of ChIP-seq peak overlap with TE groups (Fishers exact test).
item-55 at level 3: picture
item-55 at level 4: caption: Figure 1—figure supplement 1.: ES cell-specific expression of KRAB-ZFP gene clusters.
(A) Heatmap showing expression patterns of mouse KRAB-ZFPs in 40 mouse tissues and cell lines (ENCODE). Heatmap colors indicate gene expression levels in log2 transcripts per million (TPM). The asterisk indicates a group of 30 KRAB-ZFPs that are exclusively expressed in ES cells. (B) Physical location of the genes encoding for the 30 KRAB-ZFPs that are exclusively expressed in ES cells. (C) Phylogenetic (Maximum likelihood) tree of the KRAB domains of mouse KRAB-ZFPs. KRAB-ZFPs encoded on the gene clusters on chromosome 2 and 4 are highlighted. The scale bar at the bottom indicates amino acid substitutions per site.
item-56 at level 3: picture
item-56 at level 4: caption: Figure 1—figure supplement 2.: KRAB-ZFP binding motifs and their repression activity.
(A) Comparison of computationally predicted (bottom) and experimentally determined (top) KRAB-ZFP binding motifs. Only significant pairs are shown (FDR < 0.1). (B) Luciferase reporter assays to confirm KRAB-ZFP repression of the identified target sites. Bars show the luciferase activity (normalized to Renilla luciferase) of reporter plasmids containing the indicated target sites cloned upstream of the SV40 promoter. Reporter plasmids were co-transfected into 293 T cells with a Renilla luciferase plasmid for normalization and plasmids expressing the targeting KRAB-ZFP. Normalized mean luciferase activity (from three replicates) is shown relative to luciferase activity of the reporter plasmid co-transfected with an empty pcDNA3.1 vector.
item-57 at level 3: picture
item-57 at level 4: caption: Figure 1—figure supplement 3.: KRAB-ZFP binding to ETn retrotransposons.
(A) Comparison of the PBSLys1,2 sequence with Zfp961 binding motifs in nonrepetitive peaks (Nonrep) and peaks at ETn elements. (B) Retrotransposition assays of original (ETnI1-neoTNF and MusD2-neoTNF Ribet et al., 2004) and modified reporter vectors where the Rex2 or Gm13051 binding motifs where removed. Schematic of reporter vectors are displayed at the top. HeLa cells were transfected as described in the Materials and Methods section and neo-resistant colonies, indicating retrotransposition events, were selected and stained. (C) Stem-loop structure of the ETn RNA export signal, the Gm13051 motif on the corresponding DNA is marked with red circles, the part of the motif that was deleted is indicated with grey crosses (adapted from Legiewicz et al., 2010).
item-58 at level 3: picture
item-58 at level 4: caption: Figure 2.: Retrotransposon reactivation in KRAB-ZFP cluster KO ES cells.
(A) RNA-seq analysis of TE expression in five KRAB-ZFP cluster KO ES cells. Green and grey squares on top of the panel represent KRAB-ZFPs with or without ChIP-seq data, respectively, within each deleted gene cluster. Reactivated TEs that are bound by one or several KRAB-ZFPs are indicated by green squares in the panel. Significantly up- and downregulated elements (adjusted p-value<0.05) are highlighted in red and green, respectively. (B) Differential KAP1 binding and H3K9me3 enrichment at TE groups (summarized across all insertions) in Chr2-cl and Chr4-cl KO ES cells. TE groups targeted by one or several KRAB-ZFPs encoded within the deleted clusters are highlighted in blue (differential enrichment over the entire TE sequences) and red (differential enrichment at TE regions that overlap with KRAB-ZFP ChIP-seq peaks). (C) DNA methylation status of CpG sites at indicated TE groups in WT and Chr4-cl KO ES cells grown in serum containing media or in hypomethylation-inducing media (2i + Vitamin C). P-values were calculated using paired t-test.
Figure 2—source data 1.Differential H3K9me3 and KAP1 distribution in WT and KRAB-ZFP cluster KO ES cells at TE families and KRAB-ZFP bound TE insertions.Differential read counts and statistical testing were determined by DESeq2.
item-59 at level 3: picture
item-59 at level 4: caption: Figure 2—figure supplement 1.: Epigenetic changes at TEs and TE-borne enhancers in KRAB-ZFP cluster KO ES cells.
(A) Differential analysis of summative (all individual insertions combined) H3K9me3 enrichment at TE groups in Chr10-cl, Chr13.1-cl and Chr13.2-cl KO ES cells. TE groups targeted by one or several KRAB-ZFPs encoded within the deleted clusters are highlighted in orange (differential enrichment over the entire TE sequences) and red (differential enrichment at TE regions that overlap with KRAB-ZFP ChIP-seq peaks). (B) Top: Schematic view of the Cd59a/Cd59b locus with a 5 truncated ETn insertion. ChIP-seq (Input subtracted from ChIP) data for overexpressed epitope-tagged Gm13051 (a Chr4-cl KRAB-ZFP) in F9 EC cells, and re-mapped KAP1 (GEO accession: GSM1406445) and H3K9me3 (GEO accession: GSM1327148) in WT ES cells are shown together with RNA-seq data from Chr4-cl WT and KO ES cells (mapped using Bowtie (-a -m 1 --strata -v 2) to exclude reads that cannot be uniquely mapped). Bottom: Transcriptional activity of a 5 kb fragment with or without fragments of the ETn insertion was tested by luciferase reporter assay in Chr4-cl WT and KO ES cells.
item-60 at level 3: picture
item-60 at level 4: caption: Figure 3.: TE-dependent gene activation in KRAB-ZFP cluster KO ES cells.
(A) Differential gene expression in Chr2-cl and Chr4-cl KO ES cells. Significantly up- and downregulated genes (adjusted p-value<0.05) are highlighted in red and green, respectively, KRAB-ZFP genes within the deleted clusters are shown in blue. (B) Correlation of TEs and gene deregulation. Plots show enrichment of TE groups within 100 kb of up- and downregulated genes relative to all genes. Significantly overrepresented LTR and LINE groups (adjusted p-value<0.1) are highlighted in blue and red, respectively. (C) Schematic view of the downstream region of Chst1 where a 5 truncated ETn insertion is located. ChIP-seq (Input subtracted from ChIP) data for overexpressed epitope-tagged Gm13051 (a Chr4-cl KRAB-ZFP) in F9 EC cells, and re-mapped KAP1 (GEO accession: GSM1406445) and H3K9me3 (GEO accession: GSM1327148) in WT ES cells are shown together with RNA-seq data from Chr4-cl WT and KO ES cells (mapped using Bowtie (-a -m 1 --strata -v 2) to exclude reads that cannot be uniquely mapped). (D) RT-qPCR analysis of Chst1 mRNA expression in Chr4-cl WT and KO ES cells with or without the CRISPR/Cas9 deleted ETn insertion near Chst1. Values represent mean expression (normalized to Gapdh) from three biological replicates per sample (each performed in three technical replicates) in arbitrary units. Error bars represent standard deviation and asterisks indicate significance (p<0.01, Students t-test). n.s.: not significant. (E) Mean coverage of ChIP-seq data (Input subtracted from ChIP) in Chr4-cl WT and KO ES cells over 127 full-length ETn insertions. The binding sites of the Chr4-cl KRAB-ZFPs Rex2 and Gm13051 are indicated by dashed lines.
item-61 at level 3: picture
item-61 at level 4: caption: Figure 4.: ETn retrotransposition in Chr4-cl KO mice.
(A) Pedigree of mice used for transposon insertion screening by capture-seq in mice of different strain backgrounds. The number of novel ETn insertions (only present in one animal) are indicated. For animals whose direct ancestors have not been screened, the ETn insertions are shown in parentheses since parental inheritance cannot be excluded in that case. Germ line insertions are indicated by asterisks. All DNA samples were prepared from tail tissues unless noted (-S: spleen, -E: ear, -B:Blood) (B) Statistical analysis of ETn insertion frequency in tail tissue from 30 Chr4-cl KO, KO/WT and WT mice that were derived from one Chr4-c KO x KO/WT and two Chr4-cl KO/WT x KO/WT matings. Only DNA samples that were collected from juvenile tails were considered for this analysis. P-values were calculated using one-sided Wilcoxon Rank Sum Test. In the last panel, KO, WT and KO/WT mice derived from all matings were combined for the statistical analysis.
Figure 4—source data 1.Coordinates of identified novel ETn insertions and supporting capture-seq read counts.Genomic regions indicate cluster of supporting reads.
Figure 4—source data 2.Sequences of capture-seq probes used to enrich genomic DNA for ETn and MuLV (RLTR4) insertions.
item-62 at level 3: picture
item-62 at level 4: caption: Figure 4—figure supplement 1.: Birth statistics of KRAB-ZFP cluster KO mice and TE reactivation in adult tissues.
(A) Birth statistics of Chr4- and Chr2-cl mice derived from KO/WT x KO/WT matings in different strain backgrounds. (B) RNA-seq analysis of TE expression in Chr2- (left) and Chr4-cl (right) KO tissues. TE groups with the highest reactivation phenotype in ES cells are shown separately. Significantly up- and downregulated elements (adjusted p-value<0.05) are highlighted in red and green, respectively. Experiments were performed in at least two biological replicates.
item-63 at level 3: picture
item-63 at level 4: caption: Figure 4—figure supplement 2.: Identification of polymorphic ETn and MuLV retrotransposon insertions in Chr4-cl KO and WT mice.
Heatmaps show normalized capture-seq read counts in RPM (Read Per Million) for identified polymorphic ETn (A) and MuLV (B) loci in different mouse strains. Only loci with strong support for germ line ETn or MuLV insertions (at least 100 or 3000 ETn or MuLV RPM, respectively) in at least two animals are shown. Non-polymorphic insertion loci with high read counts in all screened mice were excluded for better visibility. The sample information (sample name and cell type/tissue) is annotated at the bottom, with the strain information indicated by color at the top. The color gradient indicates log10(RPM+1).
item-64 at level 3: picture
item-64 at level 4: caption: Figure 4—figure supplement 3.: Confirmation of novel ETn insertions identified by capture-seq.
(A) PCR validation of novel ETn insertions in genomic DNA of three littermates (IDs: T09673, T09674 and T00436) and their parents (T3913 and T3921). Primer sequences are shown in Supplementary file 3. (B) ETn capture-seq read counts (RPM) at putative novel somatic (loci identified exclusively in one single animal), novel germ line (loci identified in several littermates) insertions, and at B6 reference ETn elements. (C) Heatmap shows capture-seq read counts (RPM) of a Chr4-cl KO mouse (ID: C6733) as determined in different tissues. Each row represents a novel ETn locus that was identified in at least one tissue. The color gradient indicates log10(RPM+1). (D) Heatmap shows the capture-seq RPM in technical replicates using the same Chr4-cl KO DNA sample (rep1/rep2) or replicates with DNA samples prepared from different sections of the tail from the same mouse at different ages (tail1/tail2). Each row represents a novel ETn locus that was identified in at least one of the displayed samples. The color gradient indicates log10(RPM+1).
item-65 at level 2: section_header: References
item-66 at level 3: list: group list
item-67 at level 4: list_item: TL Bailey; M Boden; FA Buske; M ... arching. Nucleic Acids Research (2009)
item-68 at level 4: list_item: C Baust; L Gagnier; GJ Baillie; ... the mouse. Journal of Virology (2003)
item-69 at level 4: list_item: K Blaschke; KT Ebata; MM Karimi; ... -like state in ES cells. Nature (2013)
item-70 at level 4: list_item: A Brodziak; E Ziółko; M Muc-Wier ... erimental and Clinical Research (2012)
item-71 at level 4: list_item: N Castro-Diaz; G Ecco; A Colucci ... stem cells. Genes & Development (2014)
item-72 at level 4: list_item: EB Chuong; NC Elde; C Feschotte. ... ndogenous retroviruses. Science (2016)
item-73 at level 4: list_item: J Dan; Y Liu; N Liu; M Chiourea; ... n silencing. Developmental Cell (2014)
item-74 at level 4: list_item: A De Iaco; E Planet; A Coluccio; ... cental mammals. Nature Genetics (2017)
item-75 at level 4: list_item: Ö Deniz; L de la Rica; KCL Cheng ... onic stem cells. Genome Biology (2018)
item-76 at level 4: list_item: M Dewannieux; T Heidmann. Endoge ... rs. Current Opinion in Virology (2013)
item-77 at level 4: list_item: G Ecco; M Cassano; A Kauzlaric; ... ult tissues. Developmental Cell (2016)
item-78 at level 4: list_item: G Ecco; M Imbeault; D Trono. KRAB zinc finger proteins. Development (2017)
item-79 at level 4: list_item: JA Frank; C Feschotte. Co-option ... on. Current Opinion in Virology (2017)
item-80 at level 4: list_item: L Gagnier; VP Belancio; DL Mager ... ansposon insertions. Mobile DNA (2019)
item-81 at level 4: list_item: AC Groner; S Meylan; A Ciuffi; N ... omatin spreading. PLOS Genetics (2010)
item-82 at level 4: list_item: DC Hancks; HH Kazazian. Roles fo ... ns in human disease. Mobile DNA (2016)
item-83 at level 4: list_item: M Imbeault; PY Helleboid; D Tron ... ene regulatory networks. Nature (2017)
item-84 at level 4: list_item: FM Jacobs; D Greenberg; N Nguyen ... SVA/L1 retrotransposons. Nature (2014)
item-85 at level 4: list_item: H Kano; H Kurahashi; T Toda. Gen ... e dactylaplasia phenotype. PNAS (2007)
item-86 at level 4: list_item: MM Karimi; P Goyal; IA Maksakova ... cripts in mESCs. Cell Stem Cell (2011)
item-87 at level 4: list_item: A Kauzlaric; G Ecco; M Cassano; ... related genetic units. PLOS ONE (2017)
item-88 at level 4: list_item: PP Khil; F Smagulova; KM Brick; ... ction of ssDNA. Genome Research (2012)
item-89 at level 4: list_item: F Krueger; SR Andrews. Bismark: ... eq applications. Bioinformatics (2011)
item-90 at level 4: list_item: B Langmead; SL Salzberg. Fast ga ... t with bowtie 2. Nature Methods (2012)
item-91 at level 4: list_item: M Legiewicz; AS Zolotukhin; GR P ... Journal of Biological Chemistry (2010)
item-92 at level 4: list_item: JA Lehoczky; PE Thomas; KM Patri ... n Polypodia mice. PLOS Genetics (2013)
item-93 at level 4: list_item: D Leung; T Du; U Wagner; W Xie; ... methyltransferase Setdb1. PNAS (2014)
item-94 at level 4: list_item: J Lilue; AG Doran; IT Fiddes; M ... unctional loci. Nature Genetics (2018)
item-95 at level 4: list_item: S Liu; J Brind'Amour; MM Karimi; ... germ cells. Genes & Development (2014)
item-96 at level 4: list_item: MI Love; W Huber; S Anders. Mode ... ata with DESeq2. Genome Biology (2014)
item-97 at level 4: list_item: F Lugani; R Arora; N Papeta; A P ... short tail mouse. PLOS Genetics (2013)
item-98 at level 4: list_item: TS Macfarlan; WD Gifford; S Dris ... ous retrovirus activity. Nature (2012)
item-99 at level 4: list_item: IA Maksakova; MT Romanish; L Gag ... mouse germ line. PLOS Genetics (2006)
item-100 at level 4: list_item: T Matsui; D Leung; H Miyashita; ... methyltransferase ESET. Nature (2010)
item-101 at level 4: list_item: HS Najafabadi; S Mnaimneh; FW Sc ... y lexicon. Nature Biotechnology (2015)
item-102 at level 4: list_item: C Nellåker; TM Keane; B Yalcin; ... 8 mouse strains. Genome Biology (2012)
item-103 at level 4: list_item: H O'Geen; S Frietze; PJ Farnham. ... s. Methods in Molecular Biology (2010)
item-104 at level 4: list_item: A Patel; P Yang; M Tinkham; M Pr ... ndem zinc finger proteins. Cell (2018)
item-105 at level 4: list_item: D Ribet; M Dewannieux; T Heidman ... s-mobilization. Genome Research (2004)
item-106 at level 4: list_item: SR Richardson; P Gerdes; DJ Gerh ... d early embryo. Genome Research (2017)
item-107 at level 4: list_item: HM Rowe; J Jakobsson; D Mesnard; ... in embryonic stem cells. Nature (2010)
item-108 at level 4: list_item: HM Rowe; A Kapopoulou; A Corsino ... nic stem cells. Genome Research (2013)
item-109 at level 4: list_item: SN Schauer; PE Carreira; R Shukl ... carcinogenesis. Genome Research (2018)
item-110 at level 4: list_item: DC Schultz; K Ayyanathan; D Nego ... r proteins. Genes & Development (2002)
item-111 at level 4: list_item: K Semba; K Araki; K Matsumoto; H ... short tail mice. PLOS Genetics (2013)
item-112 at level 4: list_item: SP Sripathy; J Stevens; DC Schul ... Molecular and Cellular Biology (2006)
item-113 at level 4: list_item: JH Thomas; S Schneider. Coevolut ... c finger genes. Genome Research (2011)
item-114 at level 4: list_item: PJ Thompson; TS Macfarlan; MC Lo ... tory repertoire. Molecular Cell (2016)
item-115 at level 4: list_item: RS Treger; SD Pope; Y Kong; M To ... irus expression SNERV. Immunity (2019)
item-116 at level 4: list_item: CN Vlangos; AN Siuniak; D Robins ... Ptf1a expression. PLOS Genetics (2013)
item-117 at level 4: list_item: J Wang; G Xie; M Singh; AT Ghanb ... s naive-like stem cells. Nature (2014)
item-118 at level 4: list_item: D Wolf; K Hug; SP Goff. TRIM28 m ... iruses in embryonic cells. PNAS (2008)
item-119 at level 4: list_item: G Wolf; D Greenberg; TS Macfarla ... ger protein family. Mobile DNA (2015a)
item-120 at level 4: list_item: G Wolf; P Yang; AC Füchtbauer; E ... roviruses. Genes & Development (2015b)
item-121 at level 4: list_item: M Yamauchi; B Freitag; C Khan; B ... silencers. Journal of Virology (1995)
item-122 at level 4: list_item: Y Zhang; T Liu; CA Meyer; J Eeck ... ChIP-Seq (MACS). Genome Biology (2008)
item-123 at level 1: caption: Table 1.: * Number of protein-co ... ChIP-seq was performed in this study.
item-124 at level 1: caption: Key resources table:
item-125 at level 1: caption: Figure 1.: Genome-wide binding p ... with TE groups (Fishers exact test).
item-126 at level 1: caption: Figure 1—figure supplement 1.: E ... tes amino acid substitutions per site.
item-127 at level 1: caption: Figure 1—figure supplement 2.: K ... sfected with an empty pcDNA3.1 vector.
item-128 at level 1: caption: Figure 1—figure supplement 3.: K ... (adapted from Legiewicz et al., 2010).
item-129 at level 1: caption: Figure 2.: Retrotransposon react ... cal testing were determined by DESeq2.
item-130 at level 1: caption: Figure 2—figure supplement 1.: E ... r assay in Chr4-cl WT and KO ES cells.
item-131 at level 1: caption: Figure 3.: TE-dependent gene act ... Gm13051 are indicated by dashed lines.
item-132 at level 1: caption: Figure 4.: ETn retrotranspositio ... A for ETn and MuLV (RLTR4) insertions.
item-133 at level 1: caption: Figure 4—figure supplement 1.: B ... in at least two biological replicates.
item-134 at level 1: caption: Figure 4—figure supplement 2.: I ... color gradient indicates log10(RPM+1).
item-135 at level 1: caption: Figure 4—figure supplement 3.: C ... color gradient indicates log10(RPM+1).
item-2 at level 2: paragraph: Gernot Wolf, Alberto de Iaco, Mi ... Ralls, Didier Trono, Todd S Macfarlan
item-3 at level 2: paragraph: The Eunice Kennedy Shriver Natio ... Lausanne (EPFL), Lausanne, Switzerland
item-4 at level 2: section_header: Abstract
item-5 at level 3: text: The Krüppel-associated box zinc ... edundant role restricting TE activity.
item-6 at level 2: section_header: Introduction
item-7 at level 3: text: Nearly half of the human and mou ... s are active beyond early development.
item-8 at level 3: text: TEs, especially long terminal re ... f evolutionarily young KRAB-ZFP genes.
item-9 at level 2: section_header: Results
item-10 at level 3: section_header: Mouse KRAB-ZFPs target retrotransposons
item-11 at level 4: text: We analyzed the RNA expression p ... duplications (Kauzlaric et al., 2017).
item-12 at level 4: text: To determine the binding sites o ... ctive in the early embryo (Figure 1A).
item-13 at level 4: picture
item-13 at level 5: caption: Figure 1. Genome-wide binding patterns of mouse KRAB-ZFPs. (A) Probability heatmap of KRAB-ZFP binding to TEs. Blue color intensity (main field) corresponds to -log10 (adjusted p-value) enrichment of ChIP-seq peak overlap with TE groups (Fishers exact test). The green/red color intensity (top panel) represents mean KAP1 (GEO accession: GSM1406445) and H3K9me3 (GEO accession: GSM1327148) enrichment (respectively) at peaks overlapping significantly targeted TEs (adjusted p-value<1e-5) in WT ES cells. (B) Summarized ChIP-seq signal for indicated KRAB-ZFPs and previously published KAP1 and H3K9me3 in WT ES cells across 127 intact ETn elements. (C) Heatmaps of KRAB-ZFP ChIP-seq signal at ChIP-seq peaks. For better comparison, peaks for all three KRAB-ZFPs were called with the same parameters (p<1e-10, peak enrichment >20). The top panel shows a schematic of the arrangement of the contact amino acid composition of each zinc finger. Zinc fingers are grouped and colored according to similarity, with amino acid differences relative to the five consensus fingers highlighted in white.
item-14 at level 4: table with [9x5]
item-14 at level 5: caption: Table 1. KRAB-ZFP genes clusters in the mouse genome that were investigated in this study. * Number of protein-coding KRAB-ZFP genes identified in a previously published screen (Imbeault et al., 2017) and the ChIP-seq data column indicates the number of KRAB-ZFPs for which ChIP-seq was performed in this study.
item-15 at level 4: text: We generally observed that KRAB- ... responsible for this silencing effect.
item-16 at level 4: text: To further test the hypothesis t ... t easily evade repression by mutation.
item-17 at level 4: text: Our KRAB-ZFP ChIP-seq dataset al ... ntirely shift the mode of DNA binding.
item-18 at level 3: section_header: Genetic deletion of KRAB-ZFP gen ... leads to retrotransposon reactivation
item-19 at level 4: text: The majority of KRAB-ZFP genes a ... ung et al., 2014; Deniz et al., 2018).
item-20 at level 4: picture
item-20 at level 5: caption: Figure 2. Retrotransposon reactivation in KRAB-ZFP cluster KO ES cells. (A) RNA-seq analysis of TE expression in five KRAB-ZFP cluster KO ES cells. Green and grey squares on top of the panel represent KRAB-ZFPs with or without ChIP-seq data, respectively, within each deleted gene cluster. Reactivated TEs that are bound by one or several KRAB-ZFPs are indicated by green squares in the panel. Significantly up- and downregulated elements (adjusted p-value<0.05) are highlighted in red and green, respectively. (B) Differential KAP1 binding and H3K9me3 enrichment at TE groups (summarized across all insertions) in Chr2-cl and Chr4-cl KO ES cells. TE groups targeted by one or several KRAB-ZFPs encoded within the deleted clusters are highlighted in blue (differential enrichment over the entire TE sequences) and red (differential enrichment at TE regions that overlap with KRAB-ZFP ChIP-seq peaks). (C) DNA methylation status of CpG sites at indicated TE groups in WT and Chr4-cl KO ES cells grown in serum containing media or in hypomethylation-inducing media (2i + Vitamin C). P-values were calculated using paired t-test.
item-21 at level 3: section_header: KRAB-ZFP cluster deletions license TE-borne enhancers
item-22 at level 4: text: We next used our RNA-seq dataset ... vating effects of TEs on nearby genes.
item-23 at level 4: picture
item-23 at level 5: caption: Figure 3. TE-dependent gene activation in KRAB-ZFP cluster KO ES cells. (A) Differential gene expression in Chr2-cl and Chr4-cl KO ES cells. Significantly up- and downregulated genes (adjusted p-value<0.05) are highlighted in red and green, respectively, KRAB-ZFP genes within the deleted clusters are shown in blue. (B) Correlation of TEs and gene deregulation. Plots show enrichment of TE groups within 100 kb of up- and downregulated genes relative to all genes. Significantly overrepresented LTR and LINE groups (adjusted p-value<0.1) are highlighted in blue and red, respectively. (C) Schematic view of the downstream region of Chst1 where a 5 truncated ETn insertion is located. ChIP-seq (Input subtracted from ChIP) data for overexpressed epitope-tagged Gm13051 (a Chr4-cl KRAB-ZFP) in F9 EC cells, and re-mapped KAP1 (GEO accession: GSM1406445) and H3K9me3 (GEO accession: GSM1327148) in WT ES cells are shown together with RNA-seq data from Chr4-cl WT and KO ES cells (mapped using Bowtie (-a -m 1 --strata -v 2) to exclude reads that cannot be uniquely mapped). (D) RT-qPCR analysis of Chst1 mRNA expression in Chr4-cl WT and KO ES cells with or without the CRISPR/Cas9 deleted ETn insertion near Chst1. Values represent mean expression (normalized to Gapdh) from three biological replicates per sample (each performed in three technical replicates) in arbitrary units. Error bars represent standard deviation and asterisks indicate significance (p<0.01, Students t-test). n.s.: not significant. (E) Mean coverage of ChIP-seq data (Input subtracted from ChIP) in Chr4-cl WT and KO ES cells over 127 full-length ETn insertions. The binding sites of the Chr4-cl KRAB-ZFPs Rex2 and Gm13051 are indicated by dashed lines.
item-24 at level 4: text: While we generally observed that ... he internal region and not on the LTR.
item-25 at level 3: section_header: ETn retrotransposition in Chr4-cl KO and WT mice
item-26 at level 4: text: IAP, ETn/ETnERV and MuLV/RLTR4 r ... s may contribute to reduced viability.
item-27 at level 4: text: We reasoned that retrotransposon ... Tn insertions at a high recovery rate.
item-28 at level 4: text: Using this dataset, we first con ... nsertions in our pedigree (Figure 4A).
item-29 at level 4: picture
item-29 at level 5: caption: Figure 4. ETn retrotransposition in Chr4-cl KO mice. (A) Pedigree of mice used for transposon insertion screening by capture-seq in mice of different strain backgrounds. The numbers of novel ETn insertions (present in only one animal) are indicated. For animals whose direct ancestors have not been screened, the ETn insertions are shown in parentheses since parental inheritance cannot be excluded in that case. Germ line insertions are indicated by asterisks. All DNA samples were prepared from tail tissues unless noted (-S: spleen, -E: ear, -B: blood). (B) Statistical analysis of ETn insertion frequency in tail tissue from 30 Chr4-cl KO, KO/WT and WT mice that were derived from one Chr4-cl KO x KO/WT and two Chr4-cl KO/WT x KO/WT matings. Only DNA samples that were collected from juvenile tails were considered for this analysis. P-values were calculated using a one-sided Wilcoxon rank-sum test. In the last panel, KO, WT and KO/WT mice derived from all matings were combined for the statistical analysis.
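Panel (B) uses a one-sided Wilcoxon rank-sum test. For small groups the null distribution can be enumerated exactly. The sketch below, with hypothetical function names and made-up insertion counts, ranks the pooled values (averaging tied ranks), sums the ranks of the first group, and reports the fraction of all possible group assignments whose rank sum is at least as large. It is not necessarily the authors' implementation: R's `wilcox.test`, for instance, falls back to a normal approximation when ties are present, so p-values can differ.

```python
from itertools import combinations


def midranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test_greater(x, y):
    """Exact one-sided rank-sum test; H1: values in x tend to exceed those in y.

    Returns (W, p): W is the rank sum of x, p = P(W' >= W) under the null,
    found by enumerating every way to assign the pooled ranks to x.
    Enumeration grows combinatorially, so this suits small groups only.
    """
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    n_x = len(x)
    w_obs = sum(ranks[:n_x])
    total = extreme = 0
    for idx in combinations(range(len(pooled)), n_x):
        total += 1
        if sum(ranks[i] for i in idx) >= w_obs - 1e-9:  # tolerance for float sums
            extreme += 1
    return w_obs, extreme / total


# Hypothetical novel-insertion counts per mouse (not data from the study)
ko_counts = [3, 2, 4, 1, 2]
wt_counts = [0, 1, 0, 0, 1]
w, p = rank_sum_test_greater(ko_counts, wt_counts)
print(f"W = {w}, one-sided p = {p:.4f}")
```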
item-30 at level 4: text: To validate some of the novel ET ... ess might have truncated this element.
item-31 at level 4: text: Besides novel ETn insertions tha ... tions (Figure 4—figure supplement 3D).
item-32 at level 4: text: Finally, we asked whether there ... s clearly also play an important role.
item-33 at level 2: section_header: Discussion
item-34 at level 3: text: C2H2 zinc finger proteins, about ... ) depending upon their insertion site.
item-35 at level 3: text: Despite a lack of widespread ETn ... ion of the majority of KRAB-ZFP genes.
item-36 at level 2: section_header: Materials and methods
item-37 at level 3: table with [31x5]
item-37 at level 4: caption: Key resources table
item-38 at level 3: section_header: Cell lines and transgenic mice
item-39 at level 4: text: Mouse ES cells and F9 EC cells w ... KO/KO and KO/WT (B6/129 F2) offspring.
item-40 at level 3: section_header: Generation of KRAB-ZFP expressing cell lines
item-41 at level 4: text: KRAB-ZFP ORFs were PCR-amplified ... led and further expanded for ChIP-seq.
item-42 at level 3: section_header: CRISPR/Cas9 mediated deletion of KRAB-ZFP clusters and an MMETn insertion
item-43 at level 4: text: All gRNAs were expressed from th ... PCR genotyping (Supplementary file 3).
item-44 at level 3: section_header: ChIP-seq analysis
item-45 at level 4: text: For ChIP-seq analysis of KRAB-ZF ... 010 or Khil et al., 2012, respectively.
item-46 at level 4: text: ChIP-seq libraries were construc ... were re-mapped using Bowtie (--best).
item-47 at level 3: section_header: Luciferase reporter assays
item-48 at level 4: text: For KRAB-ZFP repression assays, ... after transfection as described above.
item-49 at level 3: section_header: RNA-seq analysis
item-50 at level 4: text: Whole RNA was purified using RNe ... lemented in the R function p.adjust().
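The RNA-seq methods (truncated in this excerpt) state that multiple-testing correction was implemented with R's `p.adjust()`. Assuming the Benjamini-Hochberg method (`p.adjust(method = "BH")`; the method actually used is not visible here), the adjustment can be sketched as follows. It mirrors R's step-up procedure: scale each p-value by n/rank, take a running minimum from the largest p-value downward, and cap at 1. The function name and example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the original input order.

    Hypothetical helper written to follow R's p.adjust(method = "BH").
    """
    n = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum enforces
    # monotonicity of the adjusted values
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # starting at 1 also caps every adjusted value at 1
    for pos, idx in enumerate(order):
        rank = n - pos  # rank in ascending order (1 = smallest p-value)
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four TE groups
print(bh_adjust([0.01, 0.04, 0.03, 0.20]))
```

Elements whose adjusted value falls below the chosen cutoff (0.05 or 0.1 in the figure captions) would then be called significant.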
item-51 at level 3: section_header: Reduced representation bisulfite sequencing (RRBS-seq)
item-52 at level 4: text: For RRBS-seq analysis, Chr4-cl W ... h sample were considered for analysis.
item-53 at level 3: section_header: Retrotransposition assay
item-54 at level 4: text: The retrotransposition vectors p ... were stained with Amido Black (Sigma).
item-55 at level 3: section_header: Capture-seq screen
item-56 at level 4: text: To identify novel retrotransposo ... assembly using the Unicycler software.
item-57 at level 2: section_header: Funding Information
item-58 at level 3: text: This paper was supported by the following grants:
item-59 at level 3: list: group list
item-60 at level 4: list_item: http://dx.doi.org/10.13039/10000 ... ment 1ZIAHD008933 to Todd S Macfarlan.
item-61 at level 4: list_item: http://dx.doi.org/10.13039/50110 ... ndation 310030_152879 to Didier Trono.
item-62 at level 4: list_item: http://dx.doi.org/10.13039/50110 ... dation 310030B_173337 to Didier Trono.
item-63 at level 4: list_item: http://dx.doi.org/10.13039/50110 ... ch Council No. 268721 to Didier Trono.
item-64 at level 4: list_item: http://dx.doi.org/10.13039/50110 ... rch Council No. 694658 to Didier Trono.
item-65 at level 2: section_header: Acknowledgements
item-66 at level 3: text: We thank Alex Grinberg, Jeanne Y ... 268721; Transpos-X, No. 694658) (DT).
item-67 at level 2: section_header: Additional information
item-68 at level 2: section_header: Additional files
item-69 at level 2: section_header: Data availability
item-70 at level 3: text: All NGS data has been deposited ... GenBank database (MH449667-MH449669).
item-71 at level 3: text: The following datasets were generated:
item-72 at level 3: text: Wolf G. Retrotransposon reactiva ... ession Omnibus (2019). NCBI: GSE115291
item-73 at level 3: text: Wolf G. Mus musculus musculus st ... e. NCBI GenBank (2019). NCBI: MH449667
item-74 at level 3: text: Wolf G. Mus musculus musculus st ... e. NCBI GenBank (2019). NCBI: MH449668
item-75 at level 3: text: Wolf G. Mus musculus musculus st ... e. NCBI GenBank (2019). NCBI: MH449669
item-76 at level 3: text: The following previously published datasets were used:
item-77 at level 3: text: Castro-Diaz N, Ecco G, Coluccio ... ssion Omnibus (2014). NCBI: GSM1406445
item-78 at level 3: text: Andrew ZX. H3K9me3_ChIPSeq (Ctrl ... ssion Omnibus (2014). NCBI: GSM1327148
item-79 at level 2: section_header: References
item-80 at level 3: list: group list
item-81 at level 4: list_item: Bailey TL, Boden M, Buske FA, Fr ... OI: 10.1093/nar/gkp335, PMID: 19458158
item-82 at level 4: list_item: Baust C, Gagnier L, Baillie GJ, ... 77.21.11448-11458.2003, PMID: 14557630
item-83 at level 4: list_item: Blaschke K, Ebata KT, Karimi MM, ... I: 10.1038/nature12362, PMID: 23812591
item-84 at level 4: list_item: Brodziak A, Ziółko E, Muc-Wierzg ... I: 10.12659/msm.882892, PMID: 22648263
item-85 at level 4: list_item: Castro-Diaz N, Ecco G, Coluccio ... 10.1101/gad.241661.114, PMID: 24939876
item-86 at level 4: list_item: Chuong EB, Elde NC, Feschotte C. ... 0.1126/science.aad5497, PMID: 26941318
item-87 at level 4: list_item: Dan J, Liu Y, Liu N, Chiourea M, ... 6/j.devcel.2014.03.004, PMID: 24735877
item-88 at level 4: list_item: De Iaco A, Planet E, Coluccio A, ... . DOI: 10.1038/ng.3858, PMID: 28459456
item-89 at level 4: list_item: Deniz Ö, de la Rica L, Cheng KCL ... 1186/s13059-017-1376-y, PMID: 29351814
item-90 at level 4: list_item: Dewannieux M, Heidmann T. Endoge ... 6/j.coviro.2013.08.005, PMID: 24004725
item-91 at level 4: list_item: Ecco G, Cassano M, Kauzlaric A, ... 6/j.devcel.2016.02.024, PMID: 27003935
item-92 at level 4: list_item: Ecco G, Imbeault M, Trono D. KRA ... OI: 10.1242/dev.132605, PMID: 28765213
item-93 at level 4: list_item: Frank JA, Feschotte C. Co-option ... 6/j.coviro.2017.07.021, PMID: 28818736
item-94 at level 4: list_item: Gagnier L, Belancio VP, Mager DL ... 1186/s13100-019-0157-4, PMID: 31011371
item-95 at level 4: list_item: Groner AC, Meylan S, Ciuffi A, Z ... 1/journal.pgen.1000869, PMID: 20221260
item-96 at level 4: list_item: Hancks DC, Kazazian HH. Roles fo ... 1186/s13100-016-0065-9, PMID: 27158268
item-97 at level 4: list_item: Imbeault M, Helleboid PY, Trono ... I: 10.1038/nature21683, PMID: 28273063
item-98 at level 4: list_item: Jacobs FM, Greenberg D, Nguyen N ... I: 10.1038/nature13760, PMID: 25274305
item-99 at level 4: list_item: Kano H, Kurahashi H, Toda T. Gen ... 0.1073/pnas.0705483104, PMID: 17984064
item-100 at level 4: list_item: Karimi MM, Goyal P, Maksakova IA ... 016/j.stem.2011.04.004, PMID: 21624812
item-101 at level 4: list_item: Kauzlaric A, Ecco G, Cassano M, ... 1/journal.pone.0173746, PMID: 28334004
item-102 at level 4: list_item: Khil PP, Smagulova F, Brick KM, ... 10.1101/gr.130583.111, PMID: 22367190
item-103 at level 4: list_item: Krueger F, Andrews SR. Bismark: ... /bioinformatics/btr167, PMID: 21493656
item-104 at level 4: list_item: Langmead B, Salzberg SL. Fast ga ... OI: 10.1038/nmeth.1923, PMID: 22388286
item-105 at level 4: list_item: Legiewicz M, Zolotukhin AS, Pilk ... 0.1074/jbc.M110.182840, PMID: 20978285
item-106 at level 4: list_item: Lehoczky JA, Thomas PE, Patrie K ... 1/journal.pgen.1003967, PMID: 24339789
item-107 at level 4: list_item: Leung D, Du T, Wagner U, Xie W, ... 0.1073/pnas.1322273111, PMID: 24757056
item-108 at level 4: list_item: Lilue J, Doran AG, Fiddes IT, Ab ... 1038/s41588-018-0223-8, PMID: 30275530
item-109 at level 4: list_item: Liu S, Brind'Amour J, Karimi MM, ... 10.1101/gad.244848.114, PMID: 25228647
item-110 at level 4: list_item: Love MI, Huber W, Anders S. Mode ... 1186/s13059-014-0550-8, PMID: 25516281
item-111 at level 4: list_item: Lugani F, Arora R, Papeta N, Pat ... 1/journal.pgen.1003206, PMID: 23437001
item-112 at level 4: list_item: Macfarlan TS, Gifford WD, Drisco ... I: 10.1038/nature11244, PMID: 22722858
item-113 at level 4: list_item: Maksakova IA, Romanish MT, Gagni ... 1/journal.pgen.0020002, PMID: 16440055
item-114 at level 4: list_item: Matsui T, Leung D, Miyashita H, ... I: 10.1038/nature08858, PMID: 20164836
item-115 at level 4: list_item: Najafabadi HS, Mnaimneh S, Schmi ... DOI: 10.1038/nbt.3128, PMID: 25690854
item-116 at level 4: list_item: Nellåker C, Keane TM, Yalcin B, ... .1186/gb-2012-13-6-r45, PMID: 22703977
item-117 at level 4: list_item: O'Geen H, Frietze S, Farnham PJ. ... 7/978-1-60761-753-2_27, PMID: 20680851
item-118 at level 4: list_item: Patel A, Yang P, Tinkham M, Prad ... 016/j.cell.2018.02.058, PMID: 29551271
item-119 at level 4: list_item: Ribet D, Dewannieux M, Heidmann ... OI: 10.1101/gr.2924904, PMID: 15479948
item-120 at level 4: list_item: Richardson SR, Gerdes P, Gerhard ... 10.1101/gr.219022.116, PMID: 28483779
item-121 at level 4: list_item: Rowe HM, Jakobsson J, Mesnard D, ... I: 10.1038/nature08674, PMID: 20075919
item-122 at level 4: list_item: Rowe HM, Kapopoulou A, Corsinott ... 10.1101/gr.147678.112, PMID: 23233547
item-123 at level 4: list_item: Schauer SN, Carreira PE, Shukla ... 10.1101/gr.226993.117, PMID: 29643204
item-124 at level 4: list_item: Schultz DC, Ayyanathan K, Negore ... OI: 10.1101/gad.973302, PMID: 11959841
item-125 at level 4: list_item: Semba K, Araki K, Matsumoto K, S ... 1/journal.pgen.1003204, PMID: 23436999
item-126 at level 4: list_item: Sripathy SP, Stevens J, Schultz ... : 10.1128/MCB.00487-06, PMID: 16954381
item-127 at level 4: list_item: Thomas JH, Schneider S. Coevolut ... 10.1101/gr.121749.111, PMID: 21784874
item-128 at level 4: list_item: Thompson PJ, Macfarlan TS, Lorin ... 6/j.molcel.2016.03.029, PMID: 27259207
item-129 at level 4: list_item: Treger RS, Pope SD, Kong Y, Toku ... 6/j.immuni.2018.12.022, PMID: 30709743
item-130 at level 4: list_item: Vlangos CN, Siuniak AN, Robinson ... 1/journal.pgen.1003205, PMID: 23437000
item-131 at level 4: list_item: Wang J, Xie G, Singh M, Ghanbari ... I: 10.1038/nature13804, PMID: 25317556
item-132 at level 4: list_item: Wolf D, Hug K, Goff SP. TRIM28 m ... 0.1073/pnas.0805540105, PMID: 18713861
item-133 at level 4: list_item: Wolf G, Greenberg D, Macfarlan T ... 1186/s13100-015-0050-8, PMID: 26435754
item-134 at level 4: list_item: Wolf G, Yang P, Füchtbauer AC, F ... 10.1101/gad.252767.114, PMID: 25737282
item-135 at level 4: list_item: Yamauchi M, Freitag B, Khan C, B ... JVI.69.2.1142-1149.1995, PMID: 7529329
item-136 at level 4: list_item: Zhang Y, Liu T, Meyer CA, Eeckho ... .1186/gb-2008-9-9-r137, PMID: 18798982
item-137 at level 1: caption: Figure 1. Genome-wide binding pa ... onsensus fingers highlighted in white.
item-138 at level 1: caption: Table 1. KRAB-ZFP genes clusters ... ChIP-seq was performed in this study.
item-139 at level 1: caption: Figure 2. Retrotransposon reacti ... s were calculated using paired t-test.
item-140 at level 1: caption: Figure 3. TE-dependent gene acti ... Gm13051 are indicated by dashed lines.
item-141 at level 1: caption: Figure 4. ETn retrotransposition ... combined for the statistical analysis.
item-142 at level 1: caption: Key resources table