feat(xml-jats): parse XML JATS documents (#967)

* chore(xml-jats): separate authors and affiliations

In XML PubMed (JATS) backend, convert authors and affiliations as they
are typically rendered on PDFs.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* fix(xml-jats): replace new line character by a space

Instead of removing new line character from text, replace it by a space character.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* feat(xml-jats): improve existing parser and extend features

Partially support lists, respect reading order, parse more sections, support equations, better text formatting.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* chore(xml-jats): rename PubMed objects to JATS

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

---------

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
This commit is contained in:
Cesar Berrospi Ramis
2025-02-17 10:43:31 +01:00
committed by GitHub
parent e1436a8b05
commit 428b656793
35 changed files with 13688 additions and 30671 deletions

View File

@@ -1,6 +1,8 @@
# KRAB-zinc finger protein gene expansion in response to active retrotransposons in the murine lineage
Wolf Gernot; 1: The Eunice Kennedy Shriver National Institute of Child Health and Human Development, The National Institutes of Health: Bethesda: United States; de Iaco Alberto; 2: School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL): Lausanne: Switzerland; Sun Ming-An; 1: The Eunice Kennedy Shriver National Institute of Child Health and Human Development, The National Institutes of Health: Bethesda: United States; Bruno Melania; 1: The Eunice Kennedy Shriver National Institute of Child Health and Human Development, The National Institutes of Health: Bethesda: United States; Tinkham Matthew; 1: The Eunice Kennedy Shriver National Institute of Child Health and Human Development, The National Institutes of Health: Bethesda: United States; Hoang Don; 1: The Eunice Kennedy Shriver National Institute of Child Health and Human Development, The National Institutes of Health: Bethesda: United States; Mitra Apratim; 1: The Eunice Kennedy Shriver National Institute of Child Health and Human Development, The National Institutes of Health: Bethesda: United States; Ralls Sherry; 1: The Eunice Kennedy Shriver National Institute of Child Health and Human Development, The National Institutes of Health: Bethesda: United States; Trono Didier; 2: School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL): Lausanne: Switzerland; Macfarlan Todd S; 1: The Eunice Kennedy Shriver National Institute of Child Health and Human Development, The National Institutes of Health: Bethesda: United States
Gernot Wolf, Alberto de Iaco, Ming-An Sun, Melania Bruno, Matthew Tinkham, Don Hoang, Apratim Mitra, Sherry Ralls, Didier Trono, Todd S Macfarlan
The Eunice Kennedy Shriver National Institute of Child Health and Human Development, The National Institutes of Health, Bethesda, United States; School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
## Abstract
@@ -20,6 +22,23 @@ We analyzed the RNA expression profiles of mouse KRAB-ZFPs across a wide range o
To determine the binding sites of the KRAB-ZFPs within these and other gene clusters, we expressed epitope-tagged KRAB-ZFPs using stably integrating vectors in mouse embryonic carcinoma (EC) or ES cells (Table 1, Supplementary file 1) and performed chromatin immunoprecipitation followed by deep sequencing (ChIP-seq). We then determined whether the identified binding sites are significantly enriched over annotated TEs and used the non-repetitive peak fraction to identify binding motifs. We discarded 7 of 68 ChIP-seq datasets because we could not obtain a binding motif or a target TE and manual inspection confirmed low signal to noise ratio. Of the remaining 61 KRAB-ZFPs, 51 significantly overlapped at least one TE subfamily (adjusted p-value&lt;1e-5). Altogether, 81 LTR retrotransposon, 18 LINE, 10 SINE and one DNA transposon subfamilies were targeted by at least one of the 51 KRAB-ZFPs (Figure 1A and Supplementary file 1). Chr2-cl KRAB-ZFPs preferably bound IAPEz retrotransposons and L1-type LINEs, while Chr4-cl KRAB-ZFPs targeted various retrotransposons, including the closely related MMETn (hereafter referred to as ETn) and ETnERV (also known as MusD) elements (Figure 1A). ETn elements are non-autonomous LTR retrotransposons that require trans-complementation by the fully coding ETnERV elements that contain Gag, Pro and Pol genes (Ribet et al., 2004). These elements have accumulated to ~240 and~100 copies in the reference C57BL/6 genome, respectively, with ~550 solitary LTRs (Baust et al., 2003). Both ETn and ETnERVs are still active, generating polymorphisms and mutations in several mouse strains (Gagnier et al., 2019). The validity of our ChIP-seq screen was confirmed by the identification of binding motifs - which often resembled the computationally predicted motifs (Figure 1—figure supplement 2A) - for the majority of screened KRAB-ZFPs (Supplementary file 1). Moreover, predicted and experimentally determined motifs were found in targeted TEs in most cases (Supplementary file 1), and reporter repression assays confirmed KRAB-ZFP induced silencing for all the tested sequences (Figure 1—figure supplement 2B). Finally, we observed KAP1 and H3K9me3 enrichment at most of the targeted TEs in wild type ES cells, indicating that most of these KRAB-ZFPs are functionally active in the early embryo (Figure 1A).
Figure 1. Genome-wide binding patterns of mouse KRAB-ZFPs. (A) Probability heatmap of KRAB-ZFP binding to TEs. Blue color intensity (main field) corresponds to -log10 (adjusted p-value) enrichment of ChIP-seq peak overlap with TE groups (Fishers exact test). The green/red color intensity (top panel) represents mean KAP1 (GEO accession: GSM1406445) and H3K9me3 (GEO accession: GSM1327148) enrichment (respectively) at peaks overlapping significantly targeted TEs (adjusted p-value&lt;1e-5) in WT ES cells. (B) Summarized ChIP-seq signal for indicated KRAB-ZFPs and previously published KAP1 and H3K9me3 in WT ES cells across 127 intact ETn elements. (C) Heatmaps of KRAB-ZFP ChIP-seq signal at ChIP-seq peaks. For better comparison, peaks for all three KRAB-ZFPs were called with the same parameters (p&lt;1e-10, peak enrichment &gt;20). The top panel shows a schematic of the arrangement of the contact amino acid composition of each zinc finger. Zinc fingers are grouped and colored according to similarity, with amino acid differences relative to the five consensus fingers highlighted in white.
<!-- image -->
Table 1. KRAB-ZFP genes clusters in the mouse genome that were investigated in this study. * Number of protein-coding KRAB-ZFP genes identified in a previously published screen (Imbeault et al., 2017) and the ChIP-seq data column indicates the number of KRAB-ZFPs for which ChIP-seq was performed in this study.
| Cluster | Location | Size (Mb) | # of KRAB-ZFPs* | ChIP-seq data |
|-----------|------------|-------------|-------------------|-----------------|
| Chr2 | Chr2 qH4 | 3.1 | 40 | 17 |
| Chr4 | Chr4 qE1 | 2.3 | 21 | 19 |
| Chr10 | Chr10 qC1 | 0.6 | 6 | 1 |
| Chr13.1 | Chr13 qB3 | 1.2 | 6 | 2 |
| Chr13.2 | Chr13 qB3 | 0.8 | 26 | 12 |
| Chr8 | Chr8 qB3.3 | 0.1 | 4 | 4 |
| Chr9 | Chr9 qA3 | 0.1 | 4 | 2 |
| Other | - | - | 248 | 4 |
We generally observed that KRAB-ZFPs present exclusively in mouse target TEs that are restricted to the mouse genome, indicating KRAB-ZFPs and their targets emerged together. For example, several mouse-specific KRAB-ZFPs in Chr2-cl and Chr4-cl target IAP and ETn elements which are only found in the mouse genome and are highly active. This is the strongest data to date supporting that recent KRAB-ZFP expansions in these young clusters is a response to recent TE activity. Likewise, ZFP599 and ZFP617, both conserved in Muroidea, bind to various ORR1-type LTRs which are present in the rat genome (Supplementary file 1). However, ZFP961, a KRAB-ZFP encoded on a small gene cluster on chromosome 8 that is conserved in Muroidea targets TEs that are only found in the mouse genome (e.g. ETn), a paradox we have previously observed with ZFP809, which also targets TEs that are evolutionarily younger than itself (Wolf et al., 2015b). The ZFP961 binding site is located at the 5 end of the internal region of ETn and ETnERV elements, a sequence that usually contains the primer binding site (PBS), which is required to prime retroviral reverse transcription. Indeed, the ZFP961 motif closely resembles the PBSLys1,2 (Figure 1—figure supplement 3A), which had been previously identified as a KAP1-dependent target of retroviral repression (Yamauchi et al., 1995; Wolf et al., 2008). Repression of the PBSLys1,2 by ZFP961 was also confirmed in reporter assays (Figure 1—figure supplement 2B), indicating that ZFP961 is likely responsible for this silencing effect.
To further test the hypothesis that KRAB-ZFPs target sites necessary for retrotransposition, we utilized previously generated ETn and ETnERV retrotransposition reporters in which we mutated KRAB-ZFP binding sites (Ribet et al., 2004). Whereas the ETnERV reporters are sufficient for retrotransposition, the ETn reporter requires ETnERV genes supplied in trans. We tested and confirmed that the REX2/ZFP600 and GM13051 binding sites within these TEs are required for efficient retrotransposition (Figure 1—figure supplement 3B). REX2 and ZFP600 both bind a target about 200 bp from the start of the internal region (Figure 1B), a region that often encodes the packaging signal. GM13051 binds a target coding for part of a highly structured mRNA export signal (Legiewicz et al., 2010) near the 3 end of the internal region of ETn (Figure 1—figure supplement 3C). Both signals are characterized by stem-loop intramolecular base-pairing in which a single mutation can disrupt loop formation. This indicates that at least some KRAB-ZFPs evolved to bind functionally essential target sequences which cannot easily evade repression by mutation.
@@ -30,10 +49,18 @@ Our KRAB-ZFP ChIP-seq dataset also provided unique insights into the emergence o
The majority of KRAB-ZFP genes are harbored in large, highly repetitive clusters that have formed by successive complex segmental duplications (Kauzlaric et al., 2017), rendering them inaccessible to conventional gene targeting. We therefore developed a strategy to delete entire KRAB-ZFP gene clusters in ES cells (including the Chr2-cl and Chr4-cl as well as two clusters on chromosome 13 and a cluster on chromosome 10) using two CRISPR/Cas9 gRNAs targeting unique regions flanking each cluster, and short single-stranded repair oligos with homologies to both sides of the projected cut sites. Using this approach, we generated five cluster KO ES cell lines in at least two biological replicates and performed RNA sequencing (RNA-seq) to determine TE expression levels. Strikingly, four of the five cluster KO ES cells exhibited distinct TE reactivation phenotypes (Figure 2A). Chr2-cl KO resulted in reactivation of several L1 subfamilies as well as RLTR10 (up to more than 100-fold as compared to WT) and IAPEz ERVs. In contrast, the most strongly upregulated TEs in Chr4-cl KO cells were ETn/ETnERV (up to 10-fold as compared to WT), with several other ERV groups modestly reactivated. ETn/ETnERV elements were also upregulated in Chr13.2-cl KO ES cells while the only upregulated ERVs in Chr13.1-cl KO ES cells were MMERVK10C elements (Figure 2A). Most reactivated retrotransposons were targeted by at least one KRAB-ZFP that was encoded in the deleted cluster (Figure 2A and Supplementary file 1), indicating a direct effect of these KRAB-ZFPs on TE expression levels. Furthermore, we observed a loss of KAP1 binding and H3K9me3 at several TE subfamilies that are targeted by at least one KRAB-ZFP within the deleted Chr2-cl and Chr4-cl (Figure 2B, Figure 2—figure supplement 1A), including L1, ETn and IAPEz elements. Using reduced representation bisulfite sequencing (RRBS-seq), we found that a subset of KRAB-ZFP bound TEs were partially hypomethylated in Chr4-cl KO ES cells, but only when grown in genome-wide hypomethylation-inducing conditions (Blaschke et al., 2013; Figure 2C and Supplementary file 2). These data are consistent with the hypothesis that KRAB-ZFPs/KAP1 are not required to establish DNA methylation, but under certain conditions they protect specific TEs and imprint control regions from genome-wide demethylation (Leung et al., 2014; Deniz et al., 2018).
Figure 2. Retrotransposon reactivation in KRAB-ZFP cluster KO ES cells. (A) RNA-seq analysis of TE expression in five KRAB-ZFP cluster KO ES cells. Green and grey squares on top of the panel represent KRAB-ZFPs with or without ChIP-seq data, respectively, within each deleted gene cluster. Reactivated TEs that are bound by one or several KRAB-ZFPs are indicated by green squares in the panel. Significantly up- and downregulated elements (adjusted p-value&lt;0.05) are highlighted in red and green, respectively. (B) Differential KAP1 binding and H3K9me3 enrichment at TE groups (summarized across all insertions) in Chr2-cl and Chr4-cl KO ES cells. TE groups targeted by one or several KRAB-ZFPs encoded within the deleted clusters are highlighted in blue (differential enrichment over the entire TE sequences) and red (differential enrichment at TE regions that overlap with KRAB-ZFP ChIP-seq peaks). (C) DNA methylation status of CpG sites at indicated TE groups in WT and Chr4-cl KO ES cells grown in serum containing media or in hypomethylation-inducing media (2i + Vitamin C). P-values were calculated using paired t-test.
<!-- image -->
### KRAB-ZFP cluster deletions license TE-borne enhancers
We next used our RNA-seq datasets to determine the effect of KRAB-ZFP cluster deletions on gene expression. We identified 195 significantly upregulated and 130 downregulated genes in Chr4-cl KO ES cells, and 108 upregulated and 59 downregulated genes in Chr2-cl KO ES cells (excluding genes on the deleted cluster) (Figure 3A). To address whether gene deregulation in Chr2-cl and Chr4-cl KO ES cells is caused by nearby TE reactivation, we determined whether genes near certain TE subfamilies are more frequently deregulated than random genes. We found a strong correlation of gene upregulation and TE proximity for several TE subfamilies, of which many became transcriptionally activated themselves (Figure 3B). For example, nearly 10% of genes that are located within 100 kb (up- or downstream of the TSS) of an ETn element are upregulated in Chr4-cl KO ES cells, as compared to 0.8% of all genes. In Chr2-cl KO ES cells, upregulated genes were significantly enriched near various LINE groups but also IAPEz-int and RLTR10-int elements, indicating that TE-binding KRAB-ZFPs in these clusters limit the potential activating effects of TEs on nearby genes.
Figure 3. TE-dependent gene activation in KRAB-ZFP cluster KO ES cells. (A) Differential gene expression in Chr2-cl and Chr4-cl KO ES cells. Significantly up- and downregulated genes (adjusted p-value&lt;0.05) are highlighted in red and green, respectively, KRAB-ZFP genes within the deleted clusters are shown in blue. (B) Correlation of TEs and gene deregulation. Plots show enrichment of TE groups within 100 kb of up- and downregulated genes relative to all genes. Significantly overrepresented LTR and LINE groups (adjusted p-value&lt;0.1) are highlighted in blue and red, respectively. (C) Schematic view of the downstream region of Chst1 where a 5 truncated ETn insertion is located. ChIP-seq (Input subtracted from ChIP) data for overexpressed epitope-tagged Gm13051 (a Chr4-cl KRAB-ZFP) in F9 EC cells, and re-mapped KAP1 (GEO accession: GSM1406445) and H3K9me3 (GEO accession: GSM1327148) in WT ES cells are shown together with RNA-seq data from Chr4-cl WT and KO ES cells (mapped using Bowtie (-a -m 1 --strata -v 2) to exclude reads that cannot be uniquely mapped). (D) RT-qPCR analysis of Chst1 mRNA expression in Chr4-cl WT and KO ES cells with or without the CRISPR/Cas9 deleted ETn insertion near Chst1. Values represent mean expression (normalized to Gapdh) from three biological replicates per sample (each performed in three technical replicates) in arbitrary units. Error bars represent standard deviation and asterisks indicate significance (p&lt;0.01, Students t-test). n.s.: not significant. (E) Mean coverage of ChIP-seq data (Input subtracted from ChIP) in Chr4-cl WT and KO ES cells over 127 full-length ETn insertions. The binding sites of the Chr4-cl KRAB-ZFPs Rex2 and Gm13051 are indicated by dashed lines.
<!-- image -->
While we generally observed that TE-associated gene reactivation is not caused by elongated or spliced transcription starting at the retrotransposons, we did observe that the strength of the effect of ETn elements on gene expression is stronger on genes in closer proximity. About 25% of genes located within 20 kb of an ETn element, but only 5% of genes located at a distance between 50 and 100 kb from the nearest ETn insertion, become upregulated in Chr4-cl KO ES cells. Importantly however, the correlation is still significant for genes that are located at distances between 50 and 100 kb from the nearest ETn insertion, indicating that ETn elements can act as long-range enhancers of gene expression in the absence of KRAB-ZFPs that target them. To confirm that Chr4-cl KRAB-ZFPs such as GM13051 block ETn-borne enhancers, we tested the ability of a putative ETn enhancer to activate transcription in a reporter assay. For this purpose, we cloned a 5 kb fragment spanning from the GM13051 binding site within the internal region of a truncated ETn insertion to the first exon of the Cd59a gene, which is strongly activated in Chr4-cl KO ES cells (Figure 2—figure supplement 1B). We observed strong transcriptional activity of this fragment which was significantly higher in Chr4-cl KO ES cells. Surprisingly, this activity was reduced to background when the internal segment of the ETn element was not included in the fragment, suggesting the internal segment of the ETn element, but not its LTR, contains a Chr4-cl KRAB-ZFP sensitive enhancer. To further corroborate these findings, we genetically deleted an ETn element that is located about 60 kb from the TSS of Chst1, one of the top-upregulated genes in Chr4-cl KO ES cells (Figure 3C). RT-qPCR analysis revealed that the Chst1 upregulation phenotype in Chr4-cl KO ES cells diminishes when the ETn insertion is absent, providing direct evidence that a KRAB-ZFP controlled ETn-borne enhancer regulates Chst1 expression (Figure 3D). Furthermore, ChIP-seq confirmed a general increase of H3K4me3, H3K4me1 and H3K27ac marks at ETn elements in Chr4-cl KO ES cells (Figure 3E). Notably, enhancer marks were most pronounced around the GM13051 binding site near the 3 end of the internal region, confirming that the enhancer activity of ETn is located on the internal region and not on the LTR.
### ETn retrotransposition in Chr4-cl KO and WT mice
@@ -44,6 +71,10 @@ We reasoned that retrotransposon activation could account for the reduced viabil
Using this dataset, we first confirmed the polymorphic nature of both ETn and MuLV retrotransposons in laboratory mouse strains (Figure 4—figure supplement 2A), highlighting the potential of these elements to retrotranspose. To identify novel insertions, we filtered out insertions that were supported by ETn/MuLV-paired reads in more than one animal. While none of the 54 ancestry-controlled mice showed a single novel MuLV insertion, we observed greatly varying numbers of up to 80 novel ETn insertions in our pedigree (Figure 4A).
Figure 4. ETn retrotransposition in Chr4-cl KO mice. (A) Pedigree of mice used for transposon insertion screening by capture-seq in mice of different strain backgrounds. The number of novel ETn insertions (only present in one animal) are indicated. For animals whose direct ancestors have not been screened, the ETn insertions are shown in parentheses since parental inheritance cannot be excluded in that case. Germ line insertions are indicated by asterisks. All DNA samples were prepared from tail tissues unless noted (-S: spleen, -E: ear, -B:Blood) (B) Statistical analysis of ETn insertion frequency in tail tissue from 30 Chr4-cl KO, KO/WT and WT mice that were derived from one Chr4-c KO x KO/WT and two Chr4-cl KO/WT x KO/WT matings. Only DNA samples that were collected from juvenile tails were considered for this analysis. P-values were calculated using one-sided Wilcoxon Rank Sum Test. In the last panel, KO, WT and KO/WT mice derived from all matings were combined for the statistical analysis.
<!-- image -->
To validate some of the novel ETn insertions, we designed specific PCR primers for five of the insertions and screened genomic DNA of the mice in which they were identified as well as their parents. For all tested insertions, we were able to amplify their flanking sequence and show that these insertions are absent in their parents (Figure 4—figure supplement 3A). To confirm their identity, we amplified and sequenced three of the novel full-length ETn insertions. Two of these elements (Genbank accession: MH449667-68) resembled typical ETnII elements with identical 5 and 3 LTRs and target site duplications (TSD) of 4 or 6 bp, respectively. The third sequenced element (MH449669) represented a hybrid element that contains both ETnI and MusD (ETnERV) sequences. Similar insertions can be found in the B6 reference genome; however, the identified novel insertion has a 2.5 kb deletion of the 5 end of the internal region. Additionally, the 5 and 3 LTR of this element differ in one nucleotide near the start site and contain an unusually large 248 bp TSD (containing a SINE repeat) indicating that an improper integration process might have truncated this element.
Besides novel ETn insertions that were only identified in one specific animal, we also observed three ETn insertions that could be detected in several siblings but not in their parents or any of the other screened mice. This strongly indicates that these retrotransposition events occurred in the germ line of the parents from which they were passed on to some of their offspring. One of these germ line insertions was evidently passed on from the offspring to the next generation (Figure 4A). As expected, the read numbers supporting these novel germ line insertions were comparable to the read numbers that were found in the flanking regions of annotated B6 ETn insertions (Figure 4—figure supplement 3B). In contrast, virtually all novel insertions that were only found in one animal were supported by significantly fewer reads (Figure 4—figure supplement 3B). This indicates that these elements resulted from retrotransposition events in the developing embryo and not in the zygote or parental germ cells. Indeed, we detected different sets of insertions in various tissues from the same animal (Figure 4—figure supplement 3C). Even between tail samples that were collected from the same animal at different ages, only a fraction of the new insertions were present in both samples, while technical replicates from the same genomic DNA samples showed a nearly complete overlap in insertions (Figure 4—figure supplement 3D).
@@ -58,6 +89,41 @@ Despite a lack of widespread ETn activation in Chr4-cl KO mice, it still remains
## Materials and methods
Key resources table
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
|------------------------------------------|----------------------------------------|-----------------------------------|-------------------------------------|------------------------------------------------------|
| Strain, strain background (Mus musculus) | 129 × 1/SvJ | The Jackson Laboratory | 000691 | Mice used to generate mixed strain Chr4-cl KO mice |
| Cell line (Homo-sapiens) | HeLa | ATCC | ATCC CCL-2 | |
| Cell line (Mus musculus) | JM8A3.N1 C57BL/6N-Atm1Brd | KOMP Repository | PL236745 | B6 ES cells used to generate KO cell lines and mice |
| Cell line (Mus musculus) | B6;129 Gt(ROSA)26Sortm1(cre/ERT)Nat/J | The Jackson Laboratory | 004847 | ES cells used to generate KO cell lines and mice |
| Cell line (Mus musculus) | R1 ES cells | Andras Nagy lab | R1 | 129 ES cells used to generate KO cell lines and mice |
| Cell line (Mus musculus) | F9 Embryonic carcinoma cells | ATCC | ATCC CRL-1720 | |
| Antibody | Mouse monoclonal ANTI-FLAG M2 antibody | Sigma-Aldrich | Cat# F1804, RRID:AB\_262044 | ChIP (1 µg/107 cells) |
| Antibody | Rabbit polyclonal anti-HA | Abcam | Cat# ab9110, RRID:AB\_307019 | ChIP (1 µg/107 cells) |
| Antibody | Mouse monoclonal anti-HA | Covance | Cat# MMS-101P-200, RRID:AB\_10064068 | |
| Antibody | Rabbit polyclonal anti-H3K9me3 | Active Motif | Cat# 39161, RRID:AB\_2532132 | ChIP (3 µl/107 cells) |
| Antibody | Rabbit polyclonal anti-GFP | Thermo Fisher Scientific | Cat# A-11122, RRID:AB\_221569 | ChIP (1 µg/107 cells) |
| Antibody | Rabbit polyclonal anti- H3K4me3 | Abcam | Cat# ab8580, RRID:AB\_306649 | ChIP (1 µg/107 cells) |
| Antibody | Rabbit polyclonal anti- H3K4me1 | Abcam | Cat# ab8895, RRID:AB\_306847 | ChIP (1 µg/107 cells) |
| Antibody | Rabbit polyclonal anti- H3K27ac | Abcam | Cat# ab4729, RRID:AB\_2118291 | ChIP (1 µg/107 cells) |
| Recombinant DNA reagent | pCW57.1 | Addgene | RRID:Addgene\_41393 | Inducible lentiviral expression vector |
| Recombinant DNA reagent | pX330-U6-Chimeric\_BB-CBh-hSpCas9 | Addgene | RRID:Addgene\_42230 | CRISPR/Cas9 expression construct |
| Sequence-based reagent | Chr2-cl KO gRNA.1 | This paper | Cas9 gRNA | GCCGTTGCTCAGTCCAAATG |
| Sequenced-based reagent | Chr2-cl KO gRNA.2 | This paper | Cas9 gRNA | GATACCAGAGGTGGCCGCAAG |
| Sequenced-based reagent | Chr4-cl KO gRNA.1 | This paper | Cas9 gRNA | GCAAAGGGGCTCCTCGATGGA |
| Sequence-based reagent | Chr4-cl KO gRNA.2 | This paper | Cas9 gRNA | GTTTATGGCCGTGCTAAGGTC |
| Sequenced-based reagent | Chr10-cl KO gRNA.1 | This paper | Cas9 gRNA | GTTGCCTTCATCCCACCGTG |
| Sequenced-based reagent | Chr10-cl KO gRNA.2 | This paper | Cas9 gRNA | GAAGTTCGACTTGGACGGGCT |
| Sequenced-based reagent | Chr13.1-cl KO gRNA.1 | This paper | Cas9 gRNA | GTAACCCATCATGGGCCCTAC |
| Sequenced-based reagent | Chr13.1-cl KO gRNA.2 | This paper | Cas9 gRNA | GGACAGGTTATAGGTTTGAT |
| Sequenced-based reagent | Chr13.2-cl KO gRNA.1 | This paper | Cas9 gRNA | GGGTTTCTGAGAAACGTGTA |
| Sequenced-based reagent | Chr13.2-cl KO gRNA.2 | This paper | Cas9 gRNA | GTGTAATGAGTTCTTATATC |
| Commercial assay or kit | SureSelectQXT Target Enrichment kit | Agilent | G9681-90000 | |
| Software, algorithm | Bowtie | http://bowtie-bio.sourceforge.net | RRID:SCR\_005476 | |
| Software, algorithm | MACS14 | https://bio.tools/macs | RRID:SCR\_013291 | |
| Software, algorithm | Tophat | https://ccb.jhu.edu | RRID:SCR\_013035 | |
### Cell lines and transgenic mice
Mouse ES cells and F9 EC cells were cultivated as described previously (Wolf et al., 2015b) unless stated otherwise. Chr4-cl KO ES cells originate from B6;129 Gt(ROSA)26Sortm1(cre/ERT)Nat/J mice (Jackson lab), all other KRAB-ZFP cluster KO ES cell lines originate from JM8A3.N1 C57BL/6N-Atm1Brd ES cells (KOMP Repository). Chr2-cl KO and WT ES cells were initially grown in serum-containing media (Wolf et al., 2015b) but changed to 2i media (De Iaco et al., 2017) for several weeks before analysis. To generate Chr4-cl and Chr2-cl KO mice, the cluster deletions were repeated in B6 ES (KOMP repository) or R1 (Nagy lab) ES cells, respectively, and heterozygous clones were injected into B6 albino blastocysts. Chr2-cl KO mice were therefore kept on a mixed B6/Svx129/Sv-CP strain background while Chr4-cl KO mice were initially derived on a pure C57BL/6 background. For capture-seq screens, Chr4-cl KO mice were crossed with 129 × 1/SvJ mice (Jackson lab) to produce the founder mice for Chr4-cl KO and WT (B6/129 F1) offspring. Chr4-cl KO/WT (B6/129 F1) were also crossed with 129 × 1/SvJ mice to get Chr4-cl KO/WT (B6/129 F1) mice, which were intercrossed to give rise to the parents of Chr4-cl KO/KO and KO/WT (B6/129 F2) offspring.
@@ -96,173 +162,99 @@ The retrotransposition vectors pCMV-MusD2, pCMV-MusD2-neoTNF and pCMV-ETnI1-neoT
To identify novel retrotransposon insertions, genomic DNA from various tissues (Supplementary file 4) was purified and used for library construction with target enrichment using the SureSelectQXT Target Enrichment kit (Agilent). Custom RNA capture probes were designed to hybridize with the 120 bp 5 ends of the 5 LTRs and the 120 bp 3 ends of the 3 LTR of about 600 intact (internal region flanked by two LTRs) MMETn/RLTRETN retrotransposons or of 140 RLTR4\_MM/RLTR4 retrotransposons that were upregulated in Chr4-cl KO ES cells (Figure 4—source data 2). Enriched libraries were sequenced on an Illumina HiSeq as paired-end 50 bp reads. R1 and R2 reads were mapped to the mm9 genome separately, using settings that only allow non-duplicated, uniquely mappable reads (Bowtie -m 1 --best --strata; samtools rmdup -s) and under settings that allow multimapping and duplicated reads (Bowtie --best). Of the latter, only reads that overlap (min. 50% of read) with RLTRETN, MMETn-int, ETnERV-int, ETnERV2-int or ETnERV3-int repeats (ETn) or RLTR4, RLTR4\_MM-int or MuLV-int repeats (RLTR4) were kept. Only uniquely mappable reads whose paired reads were overlapping with the repeats mentioned above were used for further analysis. All ETn- and RLTR4-paired reads were then clustered (as bed files) using BEDTools (bedtools merge -i -n -d 1000) to receive a list of all potential annotated and non-annotated new ETn or RLTR4 insertion sites and all overlapping ETn- or RLTR4-paired reads were counted for each sample at each locus. Finally, all regions that were located within 1 kb of an annotated RLTRETN, MMETn-int, ETnERV-int, ETnERV2-int or ETnERV3-int repeat as well as regions overlapping with previously identified polymorphic ETn elements (Nellåker et al., 2012) were removed. Genomic loci with at least 10 reads per million unique ETn- or RLTR4-paired reads were considered as insertion sites. To qualify for a de-novo insertion, we allowed no called insertions in any of the other screened mice at the locus and not a single read at the locus in the ancestors of the mouse. Insertions at the same locus in at least two siblings from the same offspring were considered as germ line insertions, if the insertion was absent in the parents and mice who were not direct descendants from these siblings. Full-length sequencing of new ETn insertions was done by Sanger sequencing of short PCR products in combination with Illumina sequencing of a large PCR product (Supplementary file 3), followed by de-novo assembly using the Unicycler software.
## Tables
## Funding Information
Table 1.: * Number of protein-coding KRAB-ZFP genes identified in a previously published screen (Imbeault et al., 2017) and the ChIP-seq data column indicates the number of KRAB-ZFPs for which ChIP-seq was performed in this study.
This paper was supported by the following grants:
| Cluster | Location | Size (Mb) | # of KRAB-ZFPs* | ChIP-seq data |
|-----------|------------|-------------|-------------------|-----------------|
| Chr2 | Chr2 qH4 | 3.1 | 40 | 17 |
| Chr4 | Chr4 qE1 | 2.3 | 21 | 19 |
| Chr10 | Chr10 qC1 | 0.6 | 6 | 1 |
| Chr13.1 | Chr13 qB3 | 1.2 | 6 | 2 |
| Chr13.2 | Chr13 qB3 | 0.8 | 26 | 12 |
| Chr8 | Chr8 qB3.3 | 0.1 | 4 | 4 |
| Chr9 | Chr9 qA3 | 0.1 | 4 | 2 |
| Other | - | - | 248 | 4 |
- http://dx.doi.org/10.13039/100009633Eunice Kennedy Shriver National Institute of Child Health and Human Development 1ZIAHD008933 to Todd S Macfarlan.
- http://dx.doi.org/10.13039/501100001711Swiss National Science Foundation 310030\_152879 to Didier Trono.
- http://dx.doi.org/10.13039/501100001711Swiss National Science Foundation 310030B\_173337 to Didier Trono.
- http://dx.doi.org/10.13039/501100000781European Research Council No. 268721 to Didier Trono.
- http://dx.doi.org/10.13039/501100000781European Research Council No 694658 to Didier Trono.
Key resources table:
## Acknowledgements
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
|------------------------------------------|----------------------------------------|-----------------------------------|-------------------------------------|------------------------------------------------------|
| Strain, strain background (Mus musculus) | 129 × 1/SvJ | The Jackson Laboratory | 000691 | Mice used to generate mixed strain Chr4-cl KO mice |
| Cell line (Homo-sapiens) | HeLa | ATCC | ATCC CCL-2 | |
| Cell line (Mus musculus) | JM8A3.N1 C57BL/6N-Atm1Brd | KOMP Repository | PL236745 | B6 ES cells used to generate KO cell lines and mice |
| Cell line (Mus musculus) | B6;129 Gt(ROSA)26Sortm1(cre/ERT)Nat/J | The Jackson Laboratory | 004847 | ES cells used to generate KO cell lines and mice |
| Cell line (Mus musculus) | R1 ES cells | Andras Nagy lab | R1 | 129 ES cells used to generate KO cell lines and mice |
| Cell line (Mus musculus) | F9 Embryonic carcinoma cells | ATCC | ATCC CRL-1720 | |
| Antibody | Mouse monoclonal ANTI-FLAG M2 antibody | Sigma-Aldrich | Cat# F1804, RRID:AB\_262044 | ChIP (1 µg/107 cells) |
| Antibody | Rabbit polyclonal anti-HA | Abcam | Cat# ab9110, RRID:AB\_307019 | ChIP (1 µg/107 cells) |
| Antibody | Mouse monoclonal anti-HA | Covance | Cat# MMS-101P-200, RRID:AB\_10064068 | |
| Antibody | Rabbit polyclonal anti-H3K9me3 | Active Motif | Cat# 39161, RRID:AB\_2532132 | ChIP (3 µl/107 cells) |
| Antibody | Rabbit polyclonal anti-GFP | Thermo Fisher Scientific | Cat# A-11122, RRID:AB\_221569 | ChIP (1 µg/107 cells) |
| Antibody | Rabbit polyclonal anti- H3K4me3 | Abcam | Cat# ab8580, RRID:AB\_306649 | ChIP (1 µg/107 cells) |
| Antibody | Rabbit polyclonal anti- H3K4me1 | Abcam | Cat# ab8895, RRID:AB\_306847 | ChIP (1 µg/107 cells) |
| Antibody | Rabbit polyclonal anti- H3K27ac | Abcam | Cat# ab4729, RRID:AB\_2118291 | ChIP (1 µg/107 cells) |
| Recombinant DNA reagent | pCW57.1 | Addgene | RRID:Addgene\_41393 | Inducible lentiviral expression vector |
| Recombinant DNA reagent | pX330-U6-Chimeric\_BB-CBh-hSpCas9 | Addgene | RRID:Addgene\_42230 | CRISPR/Cas9 expression construct |
| Sequence-based reagent | Chr2-cl KO gRNA.1 | This paper | Cas9 gRNA | GCCGTTGCTCAGTCCAAATG |
| Sequenced-based reagent | Chr2-cl KO gRNA.2 | This paper | Cas9 gRNA | GATACCAGAGGTGGCCGCAAG |
| Sequenced-based reagent | Chr4-cl KO gRNA.1 | This paper | Cas9 gRNA | GCAAAGGGGCTCCTCGATGGA |
| Sequence-based reagent | Chr4-cl KO gRNA.2 | This paper | Cas9 gRNA | GTTTATGGCCGTGCTAAGGTC |
| Sequenced-based reagent | Chr10-cl KO gRNA.1 | This paper | Cas9 gRNA | GTTGCCTTCATCCCACCGTG |
| Sequenced-based reagent | Chr10-cl KO gRNA.2 | This paper | Cas9 gRNA | GAAGTTCGACTTGGACGGGCT |
| Sequenced-based reagent | Chr13.1-cl KO gRNA.1 | This paper | Cas9 gRNA | GTAACCCATCATGGGCCCTAC |
| Sequenced-based reagent | Chr13.1-cl KO gRNA.2 | This paper | Cas9 gRNA | GGACAGGTTATAGGTTTGAT |
| Sequenced-based reagent | Chr13.2-cl KO gRNA.1 | This paper | Cas9 gRNA | GGGTTTCTGAGAAACGTGTA |
| Sequenced-based reagent | Chr13.2-cl KO gRNA.2 | This paper | Cas9 gRNA | GTGTAATGAGTTCTTATATC |
| Commercial assay or kit | SureSelectQXT Target Enrichment kit | Agilent | G9681-90000 | |
| Software, algorithm | Bowtie | http://bowtie-bio.sourceforge.net | RRID:SCR\_005476 | |
| Software, algorithm | MACS14 | https://bio.tools/macs | RRID:SCR\_013291 | |
| Software, algorithm | Tophat | https://ccb.jhu.edu | RRID:SCR\_013035 | |
We thank Alex Grinberg, Jeanne Yimdjo and Victoria Carter for generating and maintaining transgenic mice. We also thank members of the Macfarlan and Trono labs for useful discussion, Steven Coon, James Iben, Tianwei Li and Anna Malawska for NGS and computational support. This work was supported by NIH grant 1ZIAHD008933 and the NIH DDIR Innovation Award program (TSM), and by subsidies from the Swiss National Science Foundation (310030\_152879 and 310030B\_173337) and the European Research Council (KRABnKAP, No. 268721; Transpos-X, No. 694658) (DT).
## Figures
## Additional information
Figure 1.: Genome-wide binding patterns of mouse KRAB-ZFPs.
(A) Probability heatmap of KRAB-ZFP binding to TEs. Blue color intensity (main field) corresponds to -log10 (adjusted p-value) enrichment of ChIP-seq peak overlap with TE groups (Fishers exact test). The green/red color intensity (top panel) represents mean KAP1 (GEO accession: GSM1406445) and H3K9me3 (GEO accession: GSM1327148) enrichment (respectively) at peaks overlapping significantly targeted TEs (adjusted p-value&lt;1e-5) in WT ES cells. (B) Summarized ChIP-seq signal for indicated KRAB-ZFPs and previously published KAP1 and H3K9me3 in WT ES cells across 127 intact ETn elements. (C) Heatmaps of KRAB-ZFP ChIP-seq signal at ChIP-seq peaks. For better comparison, peaks for all three KRAB-ZFPs were called with the same parameters (p&lt;1e-10, peak enrichment &gt;20). The top panel shows a schematic of the arrangement of the contact amino acid composition of each zinc finger. Zinc fingers are grouped and colored according to similarity, with amino acid differences relative to the five consensus fingers highlighted in white.
Figure 1—source data 1.KRAB-ZFP expression in 40 mouse tissues and cell lines (ENCODE).Mean values of replicates are shown as log2 transcripts per million.
Figure 1—source data 2.Probability heatmap of KRAB-ZFP binding to TEs.Values corresponds to -log10 (adjusted p-value) enrichment of ChIP-seq peak overlap with TE groups (Fishers exact test).
## Additional files
<!-- image -->
## Data availability
Figure 1—figure supplement 1.: ES cell-specific expression of KRAB-ZFP gene clusters.
(A) Heatmap showing expression patterns of mouse KRAB-ZFPs in 40 mouse tissues and cell lines (ENCODE). Heatmap colors indicate gene expression levels in log2 transcripts per million (TPM). The asterisk indicates a group of 30 KRAB-ZFPs that are exclusively expressed in ES cells. (B) Physical location of the genes encoding for the 30 KRAB-ZFPs that are exclusively expressed in ES cells. (C) Phylogenetic (Maximum likelihood) tree of the KRAB domains of mouse KRAB-ZFPs. KRAB-ZFPs encoded on the gene clusters on chromosome 2 and 4 are highlighted. The scale bar at the bottom indicates amino acid substitutions per site.
All NGS data has been deposited in GEO (GSE115291). Sequences of full-length de novo ETn insertions have been deposited in the GenBank database (MH449667- MH449669).
<!-- image -->
The following datasets were generated:
Figure 1—figure supplement 2.: KRAB-ZFP binding motifs and their repression activity.
(A) Comparison of computationally predicted (bottom) and experimentally determined (top) KRAB-ZFP binding motifs. Only significant pairs are shown (FDR &lt; 0.1). (B) Luciferase reporter assays to confirm KRAB-ZFP repression of the identified target sites. Bars show the luciferase activity (normalized to Renilla luciferase) of reporter plasmids containing the indicated target sites cloned upstream of the SV40 promoter. Reporter plasmids were co-transfected into 293 T cells with a Renilla luciferase plasmid for normalization and plasmids expressing the targeting KRAB-ZFP. Normalized mean luciferase activity (from three replicates) is shown relative to luciferase activity of the reporter plasmid co-transfected with an empty pcDNA3.1 vector.
Wolf G. Retrotransposon reactivation and mobilization upon deletions of megabase scale KRAB zinc finger gene clusters in mice. NCBI Gene Expression Omnibus (2019). NCBI: GSE115291
<!-- image -->
Wolf G. Mus musculus musculus strain C57BL/6x129X1/SvJ retrotransposon MMETn-int, complete sequence. NCBI GenBank (2019). NCBI: MH449667
Figure 1—figure supplement 3.: KRAB-ZFP binding to ETn retrotransposons.
(A) Comparison of the PBSLys1,2 sequence with Zfp961 binding motifs in nonrepetitive peaks (Nonrep) and peaks at ETn elements. (B) Retrotransposition assays of original (ETnI1-neoTNF and MusD2-neoTNF Ribet et al., 2004) and modified reporter vectors where the Rex2 or Gm13051 binding motifs where removed. Schematic of reporter vectors are displayed at the top. HeLa cells were transfected as described in the Materials and Methods section and neo-resistant colonies, indicating retrotransposition events, were selected and stained. (C) Stem-loop structure of the ETn RNA export signal, the Gm13051 motif on the corresponding DNA is marked with red circles, the part of the motif that was deleted is indicated with grey crosses (adapted from Legiewicz et al., 2010).
Wolf G. Mus musculus musculus strain C57BL/6x129X1/SvJ retrotransposon MMETn-int, complete sequence. NCBI GenBank (2019). NCBI: MH449668
<!-- image -->
Wolf G. Mus musculus musculus strain C57BL/6x129X1/SvJ retrotransposon MMETn-int, complete sequence. NCBI GenBank (2019). NCBI: MH449669
Figure 2.: Retrotransposon reactivation in KRAB-ZFP cluster KO ES cells.
(A) RNA-seq analysis of TE expression in five KRAB-ZFP cluster KO ES cells. Green and grey squares on top of the panel represent KRAB-ZFPs with or without ChIP-seq data, respectively, within each deleted gene cluster. Reactivated TEs that are bound by one or several KRAB-ZFPs are indicated by green squares in the panel. Significantly up- and downregulated elements (adjusted p-value&lt;0.05) are highlighted in red and green, respectively. (B) Differential KAP1 binding and H3K9me3 enrichment at TE groups (summarized across all insertions) in Chr2-cl and Chr4-cl KO ES cells. TE groups targeted by one or several KRAB-ZFPs encoded within the deleted clusters are highlighted in blue (differential enrichment over the entire TE sequences) and red (differential enrichment at TE regions that overlap with KRAB-ZFP ChIP-seq peaks). (C) DNA methylation status of CpG sites at indicated TE groups in WT and Chr4-cl KO ES cells grown in serum containing media or in hypomethylation-inducing media (2i + Vitamin C). P-values were calculated using paired t-test.
Figure 2—source data 1.Differential H3K9me3 and KAP1 distribution in WT and KRAB-ZFP cluster KO ES cells at TE families and KRAB-ZFP bound TE insertions.Differential read counts and statistical testing were determined by DESeq2.
The following previously published datasets were used:
<!-- image -->
Castro-Diaz N, Ecco G, Coluccio A, Kapopoulou A, Duc J, Trono D. Evollutionally dynamic L1 regulation in embryonic stem cells. NCBI Gene Expression Omnibus (2014). NCBI: GSM1406445
Figure 2—figure supplement 1.: Epigenetic changes at TEs and TE-borne enhancers in KRAB-ZFP cluster KO ES cells.
(A) Differential analysis of summative (all individual insertions combined) H3K9me3 enrichment at TE groups in Chr10-cl, Chr13.1-cl and Chr13.2-cl KO ES cells. TE groups targeted by one or several KRAB-ZFPs encoded within the deleted clusters are highlighted in orange (differential enrichment over the entire TE sequences) and red (differential enrichment at TE regions that overlap with KRAB-ZFP ChIP-seq peaks). (B) Top: Schematic view of the Cd59a/Cd59b locus with a 5 truncated ETn insertion. ChIP-seq (Input subtracted from ChIP) data for overexpressed epitope-tagged Gm13051 (a Chr4-cl KRAB-ZFP) in F9 EC cells, and re-mapped KAP1 (GEO accession: GSM1406445) and H3K9me3 (GEO accession: GSM1327148) in WT ES cells are shown together with RNA-seq data from Chr4-cl WT and KO ES cells (mapped using Bowtie (-a -m 1 --strata -v 2) to exclude reads that cannot be uniquely mapped). Bottom: Transcriptional activity of a 5 kb fragment with or without fragments of the ETn insertion was tested by luciferase reporter assay in Chr4-cl WT and KO ES cells.
<!-- image -->
Figure 3.: TE-dependent gene activation in KRAB-ZFP cluster KO ES cells.
(A) Differential gene expression in Chr2-cl and Chr4-cl KO ES cells. Significantly up- and downregulated genes (adjusted p-value&lt;0.05) are highlighted in red and green, respectively, KRAB-ZFP genes within the deleted clusters are shown in blue. (B) Correlation of TEs and gene deregulation. Plots show enrichment of TE groups within 100 kb of up- and downregulated genes relative to all genes. Significantly overrepresented LTR and LINE groups (adjusted p-value&lt;0.1) are highlighted in blue and red, respectively. (C) Schematic view of the downstream region of Chst1 where a 5 truncated ETn insertion is located. ChIP-seq (Input subtracted from ChIP) data for overexpressed epitope-tagged Gm13051 (a Chr4-cl KRAB-ZFP) in F9 EC cells, and re-mapped KAP1 (GEO accession: GSM1406445) and H3K9me3 (GEO accession: GSM1327148) in WT ES cells are shown together with RNA-seq data from Chr4-cl WT and KO ES cells (mapped using Bowtie (-a -m 1 --strata -v 2) to exclude reads that cannot be uniquely mapped). (D) RT-qPCR analysis of Chst1 mRNA expression in Chr4-cl WT and KO ES cells with or without the CRISPR/Cas9 deleted ETn insertion near Chst1. Values represent mean expression (normalized to Gapdh) from three biological replicates per sample (each performed in three technical replicates) in arbitrary units. Error bars represent standard deviation and asterisks indicate significance (p&lt;0.01, Students t-test). n.s.: not significant. (E) Mean coverage of ChIP-seq data (Input subtracted from ChIP) in Chr4-cl WT and KO ES cells over 127 full-length ETn insertions. The binding sites of the Chr4-cl KRAB-ZFPs Rex2 and Gm13051 are indicated by dashed lines.
<!-- image -->
Figure 4.: ETn retrotransposition in Chr4-cl KO mice.
(A) Pedigree of mice used for transposon insertion screening by capture-seq in mice of different strain backgrounds. The number of novel ETn insertions (only present in one animal) are indicated. For animals whose direct ancestors have not been screened, the ETn insertions are shown in parentheses since parental inheritance cannot be excluded in that case. Germ line insertions are indicated by asterisks. All DNA samples were prepared from tail tissues unless noted (-S: spleen, -E: ear, -B:Blood) (B) Statistical analysis of ETn insertion frequency in tail tissue from 30 Chr4-cl KO, KO/WT and WT mice that were derived from one Chr4-c KO x KO/WT and two Chr4-cl KO/WT x KO/WT matings. Only DNA samples that were collected from juvenile tails were considered for this analysis. P-values were calculated using one-sided Wilcoxon Rank Sum Test. In the last panel, KO, WT and KO/WT mice derived from all matings were combined for the statistical analysis.
Figure 4—source data 1.Coordinates of identified novel ETn insertions and supporting capture-seq read counts.Genomic regions indicate cluster of supporting reads.
Figure 4—source data 2.Sequences of capture-seq probes used to enrich genomic DNA for ETn and MuLV (RLTR4) insertions.
<!-- image -->
Figure 4—figure supplement 1.: Birth statistics of KRAB-ZFP cluster KO mice and TE reactivation in adult tissues.
(A) Birth statistics of Chr4- and Chr2-cl mice derived from KO/WT x KO/WT matings in different strain backgrounds. (B) RNA-seq analysis of TE expression in Chr2- (left) and Chr4-cl (right) KO tissues. TE groups with the highest reactivation phenotype in ES cells are shown separately. Significantly up- and downregulated elements (adjusted p-value&lt;0.05) are highlighted in red and green, respectively. Experiments were performed in at least two biological replicates.
<!-- image -->
Figure 4—figure supplement 2.: Identification of polymorphic ETn and MuLV retrotransposon insertions in Chr4-cl KO and WT mice.
Heatmaps show normalized capture-seq read counts in RPM (Read Per Million) for identified polymorphic ETn (A) and MuLV (B) loci in different mouse strains. Only loci with strong support for germ line ETn or MuLV insertions (at least 100 or 3000 ETn or MuLV RPM, respectively) in at least two animals are shown. Non-polymorphic insertion loci with high read counts in all screened mice were excluded for better visibility. The sample information (sample name and cell type/tissue) is annotated at the bottom, with the strain information indicated by color at the top. The color gradient indicates log10(RPM+1).
<!-- image -->
Figure 4—figure supplement 3.: Confirmation of novel ETn insertions identified by capture-seq.
(A) PCR validation of novel ETn insertions in genomic DNA of three littermates (IDs: T09673, T09674 and T00436) and their parents (T3913 and T3921). Primer sequences are shown in Supplementary file 3. (B) ETn capture-seq read counts (RPM) at putative novel somatic (loci identified exclusively in one single animal), novel germ line (loci identified in several littermates) insertions, and at B6 reference ETn elements. (C) Heatmap shows capture-seq read counts (RPM) of a Chr4-cl KO mouse (ID: C6733) as determined in different tissues. Each row represents a novel ETn locus that was identified in at least one tissue. The color gradient indicates log10(RPM+1). (D) Heatmap shows the capture-seq RPM in technical replicates using the same Chr4-cl KO DNA sample (rep1/rep2) or replicates with DNA samples prepared from different sections of the tail from the same mouse at different ages (tail1/tail2). Each row represents a novel ETn locus that was identified in at least one of the displayed samples. The color gradient indicates log10(RPM+1).
<!-- image -->
Andrew ZX. H3K9me3\_ChIPSeq (Ctrl). NCBI Gene Expression Omnibus (2014). NCBI: GSM1327148
## References
- TL Bailey; M Boden; FA Buske; M Frith; CE Grant; L Clementi; J Ren; WW Li; WS Noble. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Research (2009)
- C Baust; L Gagnier; GJ Baillie; MJ Harris; DM Juriloff; DL Mager. Structure and expression of mobile ETnII retroelements and their coding-competent MusD relatives in the mouse. Journal of Virology (2003)
- K Blaschke; KT Ebata; MM Karimi; JA Zepeda-Martínez; P Goyal; S Mahapatra; A Tam; DJ Laird; M Hirst; A Rao; MC Lorincz; M Ramalho-Santos. Vitamin C induces Tet-dependent DNA demethylation and a blastocyst-like state in ES cells. Nature (2013)
- A Brodziak; E Ziółko; M Muc-Wierzgoń; E Nowakowska-Zajdel; T Kokot; K Klakla. The role of human endogenous retroviruses in the pathogenesis of autoimmune diseases. Medical Science Monitor : International Medical Journal of Experimental and Clinical Research (2012)
- N Castro-Diaz; G Ecco; A Coluccio; A Kapopoulou; B Yazdanpanah; M Friedli; J Duc; SM Jang; P Turelli; D Trono. Evolutionally dynamic L1 regulation in embryonic stem cells. Genes &amp; Development (2014)
- EB Chuong; NC Elde; C Feschotte. Regulatory evolution of innate immunity through co-option of endogenous retroviruses. Science (2016)
- J Dan; Y Liu; N Liu; M Chiourea; M Okuka; T Wu; X Ye; C Mou; L Wang; L Wang; Y Yin; J Yuan; B Zuo; F Wang; Z Li; X Pan; Z Yin; L Chen; DL Keefe; S Gagos; A Xiao; L Liu. Rif1 maintains telomere length homeostasis of ESCs by mediating heterochromatin silencing. Developmental Cell (2014)
- A De Iaco; E Planet; A Coluccio; S Verp; J Duc; D Trono. DUX-family transcription factors regulate zygotic genome activation in placental mammals. Nature Genetics (2017)
- Ö Deniz; L de la Rica; KCL Cheng; D Spensberger; MR Branco. SETDB1 prevents TET2-dependent activation of IAP retroelements in naïve embryonic stem cells. Genome Biology (2018)
- M Dewannieux; T Heidmann. Endogenous retroviruses: acquisition, amplification and taming of genome invaders. Current Opinion in Virology (2013)
- G Ecco; M Cassano; A Kauzlaric; J Duc; A Coluccio; S Offner; M Imbeault; HM Rowe; P Turelli; D Trono. Transposable elements and their KRAB-ZFP controllers regulate gene expression in adult tissues. Developmental Cell (2016)
- G Ecco; M Imbeault; D Trono. KRAB zinc finger proteins. Development (2017)
- JA Frank; C Feschotte. Co-option of endogenous viral sequences for host cell function. Current Opinion in Virology (2017)
- L Gagnier; VP Belancio; DL Mager. Mouse germ line mutations due to retrotransposon insertions. Mobile DNA (2019)
- AC Groner; S Meylan; A Ciuffi; N Zangger; G Ambrosini; N Dénervaud; P Bucher; D Trono. KRAB-zinc finger proteins and KAP1 can mediate long-range transcriptional repression through heterochromatin spreading. PLOS Genetics (2010)
- DC Hancks; HH Kazazian. Roles for retrotransposon insertions in human disease. Mobile DNA (2016)
- M Imbeault; PY Helleboid; D Trono. KRAB zinc-finger proteins contribute to the evolution of gene regulatory networks. Nature (2017)
- FM Jacobs; D Greenberg; N Nguyen; M Haeussler; AD Ewing; S Katzman; B Paten; SR Salama; D Haussler. An evolutionary arms race between KRAB zinc-finger genes ZNF91/93 and SVA/L1 retrotransposons. Nature (2014)
- H Kano; H Kurahashi; T Toda. Genetically regulated epigenetic transcriptional activation of retrotransposon insertion confers mouse dactylaplasia phenotype. PNAS (2007)
- MM Karimi; P Goyal; IA Maksakova; M Bilenky; D Leung; JX Tang; Y Shinkai; DL Mager; S Jones; M Hirst; MC Lorincz. DNA methylation and SETDB1/H3K9me3 regulate predominantly distinct sets of genes, retroelements, and chimeric transcripts in mESCs. Cell Stem Cell (2011)
- A Kauzlaric; G Ecco; M Cassano; J Duc; M Imbeault; D Trono. The mouse genome displays highly dynamic populations of KRAB-zinc finger protein genes and related genetic units. PLOS ONE (2017)
- PP Khil; F Smagulova; KM Brick; RD Camerini-Otero; GV Petukhova. Sensitive mapping of recombination hotspots using sequencing-based detection of ssDNA. Genome Research (2012)
- F Krueger; SR Andrews. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics (2011)
- B Langmead; SL Salzberg. Fast gapped-read alignment with bowtie 2. Nature Methods (2012)
- M Legiewicz; AS Zolotukhin; GR Pilkington; KJ Purzycka; M Mitchell; H Uranishi; J Bear; GN Pavlakis; SF Le Grice; BK Felber. The RNA transport element of the murine musD retrotransposon requires long-range intramolecular interactions for function. Journal of Biological Chemistry (2010)
- JA Lehoczky; PE Thomas; KM Patrie; KM Owens; LM Villarreal; K Galbraith; J Washburn; CN Johnson; B Gavino; AD Borowsky; KJ Millen; P Wakenight; W Law; ML Van Keuren; G Gavrilina; ED Hughes; TL Saunders; L Brihn; JH Nadeau; JW Innis. A novel intergenic ETnII-β insertion mutation causes multiple malformations in Polypodia mice. PLOS Genetics (2013)
- D Leung; T Du; U Wagner; W Xie; AY Lee; P Goyal; Y Li; KE Szulwach; P Jin; MC Lorincz; B Ren. Regulation of DNA methylation turnover at LTR retrotransposons and imprinted loci by the histone methyltransferase Setdb1. PNAS (2014)
- J Lilue; AG Doran; IT Fiddes; M Abrudan; J Armstrong; R Bennett; W Chow; J Collins; S Collins; A Czechanski; P Danecek; M Diekhans; DD Dolle; M Dunn; R Durbin; D Earl; A Ferguson-Smith; P Flicek; J Flint; A Frankish; B Fu; M Gerstein; J Gilbert; L Goodstadt; J Harrow; K Howe; X Ibarra-Soria; M Kolmogorov; CJ Lelliott; DW Logan; J Loveland; CE Mathews; R Mott; P Muir; S Nachtweide; FCP Navarro; DT Odom; N Park; S Pelan; SK Pham; M Quail; L Reinholdt; L Romoth; L Shirley; C Sisu; M Sjoberg-Herrera; M Stanke; C Steward; M Thomas; G Threadgold; D Thybert; J Torrance; K Wong; J Wood; B Yalcin; F Yang; DJ Adams; B Paten; TM Keane. Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci. Nature Genetics (2018)
- S Liu; J Brind'Amour; MM Karimi; K Shirane; A Bogutz; L Lefebvre; H Sasaki; Y Shinkai; MC Lorincz. Setdb1 is required for germline development and silencing of H3K9me3-marked endogenous retroviruses in primordial germ cells. Genes &amp; Development (2014)
- MI Love; W Huber; S Anders. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology (2014)
- F Lugani; R Arora; N Papeta; A Patel; Z Zheng; R Sterken; RA Singer; G Caridi; C Mendelsohn; L Sussel; VE Papaioannou; AG Gharavi. A retrotransposon insertion in the 5' regulatory domain of Ptf1a results in ectopic gene expression and multiple congenital defects in Danforth's short tail mouse. PLOS Genetics (2013)
- TS Macfarlan; WD Gifford; S Driscoll; K Lettieri; HM Rowe; D Bonanomi; A Firth; O Singer; D Trono; SL Pfaff. Embryonic stem cell potency fluctuates with endogenous retrovirus activity. Nature (2012)
- IA Maksakova; MT Romanish; L Gagnier; CA Dunn; LN van de Lagemaat; DL Mager. Retroviral elements and their hosts: insertional mutagenesis in the mouse germ line. PLOS Genetics (2006)
- T Matsui; D Leung; H Miyashita; IA Maksakova; H Miyachi; H Kimura; M Tachibana; MC Lorincz; Y Shinkai. Proviral silencing in embryonic stem cells requires the histone methyltransferase ESET. Nature (2010)
- HS Najafabadi; S Mnaimneh; FW Schmitges; M Garton; KN Lam; A Yang; M Albu; MT Weirauch; E Radovani; PM Kim; J Greenblatt; BJ Frey; TR Hughes. C2H2 zinc finger proteins greatly expand the human regulatory lexicon. Nature Biotechnology (2015)
- C Nellåker; TM Keane; B Yalcin; K Wong; A Agam; TG Belgard; J Flint; DJ Adams; WN Frankel; CP Ponting. The genomic landscape shaped by selection on transposable elements across 18 mouse strains. Genome Biology (2012)
- H O'Geen; S Frietze; PJ Farnham. Using ChIP-seq technology to identify targets of zinc finger transcription factors. Methods in Molecular Biology (2010)
- A Patel; P Yang; M Tinkham; M Pradhan; M-A Sun; Y Wang; D Hoang; G Wolf; JR Horton; X Zhang; T Macfarlan; X Cheng. DNA conformation induces adaptable binding by tandem zinc finger proteins. Cell (2018)
- D Ribet; M Dewannieux; T Heidmann. An active murine transposon family pair: retrotransposition of "master" MusD copies and ETn trans-mobilization. Genome Research (2004)
- SR Richardson; P Gerdes; DJ Gerhardt; FJ Sanchez-Luque; GO Bodea; M Muñoz-Lopez; JS Jesuadian; MHC Kempen; PE Carreira; JA Jeddeloh; JL Garcia-Perez; HH Kazazian; AD Ewing; GJ Faulkner. Heritable L1 retrotransposition in the mouse primordial germline and early embryo. Genome Research (2017)
- HM Rowe; J Jakobsson; D Mesnard; J Rougemont; S Reynard; T Aktas; PV Maillard; H Layard-Liesching; S Verp; J Marquis; F Spitz; DB Constam; D Trono. KAP1 controls endogenous retroviruses in embryonic stem cells. Nature (2010)
- HM Rowe; A Kapopoulou; A Corsinotti; L Fasching; TS Macfarlan; Y Tarabay; S Viville; J Jakobsson; SL Pfaff; D Trono. TRIM28 repression of retrotransposon-based enhancers is necessary to preserve transcriptional dynamics in embryonic stem cells. Genome Research (2013)
- SN Schauer; PE Carreira; R Shukla; DJ Gerhardt; P Gerdes; FJ Sanchez-Luque; P Nicoli; M Kindlova; S Ghisletti; AD Santos; D Rapoud; D Samuel; J Faivre; AD Ewing; SR Richardson; GJ Faulkner. L1 retrotransposition is a common feature of mammalian hepatocarcinogenesis. Genome Research (2018)
- DC Schultz; K Ayyanathan; D Negorev; GG Maul; FJ Rauscher. SETDB1: a novel KAP-1-associated histone H3, lysine 9-specific methyltransferase that contributes to HP1-mediated silencing of euchromatic genes by KRAB zinc-finger proteins. Genes &amp; Development (2002)
- K Semba; K Araki; K Matsumoto; H Suda; T Ando; A Sei; H Mizuta; K Takagi; M Nakahara; M Muta; G Yamada; N Nakagata; A Iida; S Ikegawa; Y Nakamura; M Araki; K Abe; K Yamamura. Ectopic expression of Ptf1a induces spinal defects, urogenital defects, and anorectal malformations in Danforth's short tail mice. PLOS Genetics (2013)
- SP Sripathy; J Stevens; DC Schultz. The KAP1 corepressor functions to coordinate the assembly of de novo HP1-demarcated microenvironments of heterochromatin required for KRAB zinc finger protein-mediated transcriptional repression. Molecular and Cellular Biology (2006)
- JH Thomas; S Schneider. Coevolution of retroelements and tandem zinc finger genes. Genome Research (2011)
- PJ Thompson; TS Macfarlan; MC Lorincz. Long terminal repeats: from parasitic elements to building blocks of the transcriptional regulatory repertoire. Molecular Cell (2016)
- RS Treger; SD Pope; Y Kong; M Tokuyama; M Taura; A Iwasaki. The lupus susceptibility locus Sgp3 encodes the suppressor of endogenous retrovirus expression SNERV. Immunity (2019)
- CN Vlangos; AN Siuniak; D Robinson; AM Chinnaiyan; RH Lyons; JD Cavalcoli; CE Keegan. Next-generation sequencing identifies the Danforth's short tail mouse mutation as a retrotransposon insertion affecting Ptf1a expression. PLOS Genetics (2013)
- J Wang; G Xie; M Singh; AT Ghanbarian; T Raskó; A Szvetnik; H Cai; D Besser; A Prigione; NV Fuchs; GG Schumann; W Chen; MC Lorincz; Z Ivics; LD Hurst; Z Izsvák. Primate-specific endogenous retrovirus-driven transcription defines naive-like stem cells. Nature (2014)
- D Wolf; K Hug; SP Goff. TRIM28 mediates primer binding site-targeted silencing of Lys1,2 tRNA-utilizing retroviruses in embryonic cells. PNAS (2008)
- G Wolf; D Greenberg; TS Macfarlan. Spotting the enemy within: targeted silencing of foreign DNA in mammalian genomes by the Krüppel-associated box zinc finger protein family. Mobile DNA (2015a)
- G Wolf; P Yang; AC Füchtbauer; EM Füchtbauer; AM Silva; C Park; W Wu; AL Nielsen; FS Pedersen; TS Macfarlan. The KRAB zinc finger protein ZFP809 is required to initiate epigenetic silencing of endogenous retroviruses. Genes &amp; Development (2015b)
- M Yamauchi; B Freitag; C Khan; B Berwin; E Barklis. Stem cell factor binding to retrovirus primer binding site silencers. Journal of Virology (1995)
- Y Zhang; T Liu; CA Meyer; J Eeckhoute; DS Johnson; BE Bernstein; C Nusbaum; RM Myers; M Brown; W Li; XS Liu. Model-based analysis of ChIP-Seq (MACS). Genome Biology (2008)
- Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Research 37:W202W208 (2009). DOI: 10.1093/nar/gkp335, PMID: 19458158
- Baust C, Gagnier L, Baillie GJ, Harris MJ, Juriloff DM, Mager DL. Structure and expression of mobile ETnII retroelements and their coding-competent MusD relatives in the mouse. Journal of Virology 77:1144811458 (2003). DOI: 10.1128/JVI.77.21.11448-11458.2003, PMID: 14557630
- Blaschke K, Ebata KT, Karimi MM, Zepeda-Martínez JA, Goyal P, Mahapatra S, Tam A, Laird DJ, Hirst M, Rao A, Lorincz MC, Ramalho-Santos M. Vitamin C induces Tet-dependent DNA demethylation and a blastocyst-like state in ES cells. Nature 500:222226 (2013). DOI: 10.1038/nature12362, PMID: 23812591
- Brodziak A, Ziółko E, Muc-Wierzgoń M, Nowakowska-Zajdel E, Kokot T, Klakla K. The role of human endogenous retroviruses in the pathogenesis of autoimmune diseases. Medical Science Monitor : International Medical Journal of Experimental and Clinical Research 18:RA80RA88 (2012). DOI: 10.12659/msm.882892, PMID: 22648263
- Castro-Diaz N, Ecco G, Coluccio A, Kapopoulou A, Yazdanpanah B, Friedli M, Duc J, Jang SM, Turelli P, Trono D. Evolutionally dynamic L1 regulation in embryonic stem cells. Genes &amp; Development 28:13971409 (2014). DOI: 10.1101/gad.241661.114, PMID: 24939876
- Chuong EB, Elde NC, Feschotte C. Regulatory evolution of innate immunity through co-option of endogenous retroviruses. Science 351:10831087 (2016). DOI: 10.1126/science.aad5497, PMID: 26941318
- Dan J, Liu Y, Liu N, Chiourea M, Okuka M, Wu T, Ye X, Mou C, Wang L, Wang L, Yin Y, Yuan J, Zuo B, Wang F, Li Z, Pan X, Yin Z, Chen L, Keefe DL, Gagos S, Xiao A, Liu L. Rif1 maintains telomere length homeostasis of ESCs by mediating heterochromatin silencing. Developmental Cell 29:719 (2014). DOI: 10.1016/j.devcel.2014.03.004, PMID: 24735877
- De Iaco A, Planet E, Coluccio A, Verp S, Duc J, Trono D. DUX-family transcription factors regulate zygotic genome activation in placental mammals. Nature Genetics 49:941945 (2017). DOI: 10.1038/ng.3858, PMID: 28459456
- Deniz Ö, de la Rica L, Cheng KCL, Spensberger D, Branco MR. SETDB1 prevents TET2-dependent activation of IAP retroelements in naïve embryonic stem cells. Genome Biology 19:6 (2018). DOI: 10.1186/s13059-017-1376-y, PMID: 29351814
- Dewannieux M, Heidmann T. Endogenous retroviruses: acquisition, amplification and taming of genome invaders. Current Opinion in Virology 3:646656 (2013). DOI: 10.1016/j.coviro.2013.08.005, PMID: 24004725
- Ecco G, Cassano M, Kauzlaric A, Duc J, Coluccio A, Offner S, Imbeault M, Rowe HM, Turelli P, Trono D. Transposable elements and their KRAB-ZFP controllers regulate gene expression in adult tissues. Developmental Cell 36:611623 (2016). DOI: 10.1016/j.devcel.2016.02.024, PMID: 27003935
- Ecco G, Imbeault M, Trono D. KRAB zinc finger proteins. Development 144:27192729 (2017). DOI: 10.1242/dev.132605, PMID: 28765213
- Frank JA, Feschotte C. Co-option of endogenous viral sequences for host cell function. Current Opinion in Virology 25:8189 (2017). DOI: 10.1016/j.coviro.2017.07.021, PMID: 28818736
- Gagnier L, Belancio VP, Mager DL. Mouse germ line mutations due to retrotransposon insertions. Mobile DNA 10:15 (2019). DOI: 10.1186/s13100-019-0157-4, PMID: 31011371
- Groner AC, Meylan S, Ciuffi A, Zangger N, Ambrosini G, Dénervaud N, Bucher P, Trono D. KRAB-zinc finger proteins and KAP1 can mediate long-range transcriptional repression through heterochromatin spreading. PLOS Genetics 6:e1000869 (2010). DOI: 10.1371/journal.pgen.1000869, PMID: 20221260
- Hancks DC, Kazazian HH. Roles for retrotransposon insertions in human disease. Mobile DNA 7:9 (2016). DOI: 10.1186/s13100-016-0065-9, PMID: 27158268
- Imbeault M, Helleboid PY, Trono D. KRAB zinc-finger proteins contribute to the evolution of gene regulatory networks. Nature 543:550554 (2017). DOI: 10.1038/nature21683, PMID: 28273063
- Jacobs FM, Greenberg D, Nguyen N, Haeussler M, Ewing AD, Katzman S, Paten B, Salama SR, Haussler D. An evolutionary arms race between KRAB zinc-finger genes ZNF91/93 and SVA/L1 retrotransposons. Nature 516:242245 (2014). DOI: 10.1038/nature13760, PMID: 25274305
- Kano H, Kurahashi H, Toda T. Genetically regulated epigenetic transcriptional activation of retrotransposon insertion confers mouse dactylaplasia phenotype. PNAS 104:1903419039 (2007). DOI: 10.1073/pnas.0705483104, PMID: 17984064
- Karimi MM, Goyal P, Maksakova IA, Bilenky M, Leung D, Tang JX, Shinkai Y, Mager DL, Jones S, Hirst M, Lorincz MC. DNA methylation and SETDB1/H3K9me3 regulate predominantly distinct sets of genes, retroelements, and chimeric transcripts in mESCs. Cell Stem Cell 8:676687 (2011). DOI: 10.1016/j.stem.2011.04.004, PMID: 21624812
- Kauzlaric A, Ecco G, Cassano M, Duc J, Imbeault M, Trono D. The mouse genome displays highly dynamic populations of KRAB-zinc finger protein genes and related genetic units. PLOS ONE 12:e0173746 (2017). DOI: 10.1371/journal.pone.0173746, PMID: 28334004
- Khil PP, Smagulova F, Brick KM, Camerini-Otero RD, Petukhova GV. Sensitive mapping of recombination hotspots using sequencing-based detection of ssDNA. Genome Research 22:957965 (2012). DOI: 10.1101/gr.130583.111, PMID: 22367190
- Krueger F, Andrews SR. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27:15711572 (2011). DOI: 10.1093/bioinformatics/btr167, PMID: 21493656
- Langmead B, Salzberg SL. Fast gapped-read alignment with bowtie 2. Nature Methods 9:357359 (2012). DOI: 10.1038/nmeth.1923, PMID: 22388286
- Legiewicz M, Zolotukhin AS, Pilkington GR, Purzycka KJ, Mitchell M, Uranishi H, Bear J, Pavlakis GN, Le Grice SF, Felber BK. The RNA transport element of the murine musD retrotransposon requires long-range intramolecular interactions for function. Journal of Biological Chemistry 285:4209742104 (2010). DOI: 10.1074/jbc.M110.182840, PMID: 20978285
- Lehoczky JA, Thomas PE, Patrie KM, Owens KM, Villarreal LM, Galbraith K, Washburn J, Johnson CN, Gavino B, Borowsky AD, Millen KJ, Wakenight P, Law W, Van Keuren ML, Gavrilina G, Hughes ED, Saunders TL, Brihn L, Nadeau JH, Innis JW. A novel intergenic ETnII-β insertion mutation causes multiple malformations in Polypodia mice. PLOS Genetics 9:e1003967 (2013). DOI: 10.1371/journal.pgen.1003967, PMID: 24339789
- Leung D, Du T, Wagner U, Xie W, Lee AY, Goyal P, Li Y, Szulwach KE, Jin P, Lorincz MC, Ren B. Regulation of DNA methylation turnover at LTR retrotransposons and imprinted loci by the histone methyltransferase Setdb1. PNAS 111:66906695 (2014). DOI: 10.1073/pnas.1322273111, PMID: 24757056
- Lilue J, Doran AG, Fiddes IT, Abrudan M, Armstrong J, Bennett R, Chow W, Collins J, Collins S, Czechanski A, Danecek P, Diekhans M, Dolle DD, Dunn M, Durbin R, Earl D, Ferguson-Smith A, Flicek P, Flint J, Frankish A, Fu B, Gerstein M, Gilbert J, Goodstadt L, Harrow J, Howe K, Ibarra-Soria X, Kolmogorov M, Lelliott CJ, Logan DW, Loveland J, Mathews CE, Mott R, Muir P, Nachtweide S, Navarro FCP, Odom DT, Park N, Pelan S, Pham SK, Quail M, Reinholdt L, Romoth L, Shirley L, Sisu C, Sjoberg-Herrera M, Stanke M, Steward C, Thomas M, Threadgold G, Thybert D, Torrance J, Wong K, Wood J, Yalcin B, Yang F, Adams DJ, Paten B, Keane TM. Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci. Nature Genetics 50:15741583 (2018). DOI: 10.1038/s41588-018-0223-8, PMID: 30275530
- Liu S, Brind'Amour J, Karimi MM, Shirane K, Bogutz A, Lefebvre L, Sasaki H, Shinkai Y, Lorincz MC. Setdb1 is required for germline development and silencing of H3K9me3-marked endogenous retroviruses in primordial germ cells. Genes &amp; Development 28:20412055 (2014). DOI: 10.1101/gad.244848.114, PMID: 25228647
- Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15:550 (2014). DOI: 10.1186/s13059-014-0550-8, PMID: 25516281
- Lugani F, Arora R, Papeta N, Patel A, Zheng Z, Sterken R, Singer RA, Caridi G, Mendelsohn C, Sussel L, Papaioannou VE, Gharavi AG. A retrotransposon insertion in the 5' regulatory domain of Ptf1a results in ectopic gene expression and multiple congenital defects in Danforth's short tail mouse. PLOS Genetics 9:e1003206 (2013). DOI: 10.1371/journal.pgen.1003206, PMID: 23437001
- Macfarlan TS, Gifford WD, Driscoll S, Lettieri K, Rowe HM, Bonanomi D, Firth A, Singer O, Trono D, Pfaff SL. Embryonic stem cell potency fluctuates with endogenous retrovirus activity. Nature 487:5763 (2012). DOI: 10.1038/nature11244, PMID: 22722858
- Maksakova IA, Romanish MT, Gagnier L, Dunn CA, van de Lagemaat LN, Mager DL. Retroviral elements and their hosts: insertional mutagenesis in the mouse germ line. PLOS Genetics 2:e2 (2006). DOI: 10.1371/journal.pgen.0020002, PMID: 16440055
- Matsui T, Leung D, Miyashita H, Maksakova IA, Miyachi H, Kimura H, Tachibana M, Lorincz MC, Shinkai Y. Proviral silencing in embryonic stem cells requires the histone methyltransferase ESET. Nature 464:927931 (2010). DOI: 10.1038/nature08858, PMID: 20164836
- Najafabadi HS, Mnaimneh S, Schmitges FW, Garton M, Lam KN, Yang A, Albu M, Weirauch MT, Radovani E, Kim PM, Greenblatt J, Frey BJ, Hughes TR. C2H2 zinc finger proteins greatly expand the human regulatory lexicon. Nature Biotechnology 33:555562 (2015). DOI: 10.1038/nbt.3128, PMID: 25690854
- Nellåker C, Keane TM, Yalcin B, Wong K, Agam A, Belgard TG, Flint J, Adams DJ, Frankel WN, Ponting CP. The genomic landscape shaped by selection on transposable elements across 18 mouse strains. Genome Biology 13:R45 (2012). DOI: 10.1186/gb-2012-13-6-r45, PMID: 22703977
- O'Geen H, Frietze S, Farnham PJ. Using ChIP-seq technology to identify targets of zinc finger transcription factors. Methods in Molecular Biology 649:437455 (2010). DOI: 10.1007/978-1-60761-753-2\_27, PMID: 20680851
- Patel A, Yang P, Tinkham M, Pradhan M, Sun M-A, Wang Y, Hoang D, Wolf G, Horton JR, Zhang X, Macfarlan T, Cheng X. DNA conformation induces adaptable binding by tandem zinc finger proteins. Cell 173:221233 (2018). DOI: 10.1016/j.cell.2018.02.058, PMID: 29551271
- Ribet D, Dewannieux M, Heidmann T. An active murine transposon family pair: retrotransposition of "master" MusD copies and ETn trans-mobilization. Genome Research 14:22612267 (2004). DOI: 10.1101/gr.2924904, PMID: 15479948
- Richardson SR, Gerdes P, Gerhardt DJ, Sanchez-Luque FJ, Bodea GO, Muñoz-Lopez M, Jesuadian JS, Kempen MHC, Carreira PE, Jeddeloh JA, Garcia-Perez JL, Kazazian HH, Ewing AD, Faulkner GJ. Heritable L1 retrotransposition in the mouse primordial germline and early embryo. Genome Research 27:13951405 (2017). DOI: 10.1101/gr.219022.116, PMID: 28483779
- Rowe HM, Jakobsson J, Mesnard D, Rougemont J, Reynard S, Aktas T, Maillard PV, Layard-Liesching H, Verp S, Marquis J, Spitz F, Constam DB, Trono D. KAP1 controls endogenous retroviruses in embryonic stem cells. Nature 463:237240 (2010). DOI: 10.1038/nature08674, PMID: 20075919
- Rowe HM, Kapopoulou A, Corsinotti A, Fasching L, Macfarlan TS, Tarabay Y, Viville S, Jakobsson J, Pfaff SL, Trono D. TRIM28 repression of retrotransposon-based enhancers is necessary to preserve transcriptional dynamics in embryonic stem cells. Genome Research 23:452461 (2013). DOI: 10.1101/gr.147678.112, PMID: 23233547
- Schauer SN, Carreira PE, Shukla R, Gerhardt DJ, Gerdes P, Sanchez-Luque FJ, Nicoli P, Kindlova M, Ghisletti S, Santos AD, Rapoud D, Samuel D, Faivre J, Ewing AD, Richardson SR, Faulkner GJ. L1 retrotransposition is a common feature of mammalian hepatocarcinogenesis. Genome Research 28:639653 (2018). DOI: 10.1101/gr.226993.117, PMID: 29643204
- Schultz DC, Ayyanathan K, Negorev D, Maul GG, Rauscher FJ. SETDB1: a novel KAP-1-associated histone H3, lysine 9-specific methyltransferase that contributes to HP1-mediated silencing of euchromatic genes by KRAB zinc-finger proteins. Genes &amp; Development 16:919932 (2002). DOI: 10.1101/gad.973302, PMID: 11959841
- Semba K, Araki K, Matsumoto K, Suda H, Ando T, Sei A, Mizuta H, Takagi K, Nakahara M, Muta M, Yamada G, Nakagata N, Iida A, Ikegawa S, Nakamura Y, Araki M, Abe K, Yamamura K. Ectopic expression of Ptf1a induces spinal defects, urogenital defects, and anorectal malformations in Danforth's short tail mice. PLOS Genetics 9:e1003204 (2013). DOI: 10.1371/journal.pgen.1003204, PMID: 23436999
- Sripathy SP, Stevens J, Schultz DC. The KAP1 corepressor functions to coordinate the assembly of de novo HP1-demarcated microenvironments of heterochromatin required for KRAB zinc finger protein-mediated transcriptional repression. Molecular and Cellular Biology 26:86238638 (2006). DOI: 10.1128/MCB.00487-06, PMID: 16954381
- Thomas JH, Schneider S. Coevolution of retroelements and tandem zinc finger genes. Genome Research 21:18001812 (2011). DOI: 10.1101/gr.121749.111, PMID: 21784874
- Thompson PJ, Macfarlan TS, Lorincz MC. Long terminal repeats: from parasitic elements to building blocks of the transcriptional regulatory repertoire. Molecular Cell 62:766776 (2016). DOI: 10.1016/j.molcel.2016.03.029, PMID: 27259207
- Treger RS, Pope SD, Kong Y, Tokuyama M, Taura M, Iwasaki A. The lupus susceptibility locus Sgp3 encodes the suppressor of endogenous retrovirus expression SNERV. Immunity 50:334347 (2019). DOI: 10.1016/j.immuni.2018.12.022, PMID: 30709743
- Vlangos CN, Siuniak AN, Robinson D, Chinnaiyan AM, Lyons RH, Cavalcoli JD, Keegan CE. Next-generation sequencing identifies the Danforth's short tail mouse mutation as a retrotransposon insertion affecting Ptf1a expression. PLOS Genetics 9:e1003205 (2013). DOI: 10.1371/journal.pgen.1003205, PMID: 23437000
- Wang J, Xie G, Singh M, Ghanbarian AT, Raskó T, Szvetnik A, Cai H, Besser D, Prigione A, Fuchs NV, Schumann GG, Chen W, Lorincz MC, Ivics Z, Hurst LD, Izsvák Z. Primate-specific endogenous retrovirus-driven transcription defines naive-like stem cells. Nature 516:405409 (2014). DOI: 10.1038/nature13804, PMID: 25317556
- Wolf D, Hug K, Goff SP. TRIM28 mediates primer binding site-targeted silencing of Lys1,2 tRNA-utilizing retroviruses in embryonic cells. PNAS 105:1252112526 (2008). DOI: 10.1073/pnas.0805540105, PMID: 18713861
- Wolf G, Greenberg D, Macfarlan TS. Spotting the enemy within: targeted silencing of foreign DNA in mammalian genomes by the Krüppel-associated box zinc finger protein family. Mobile DNA 6:17 (2015a). DOI: 10.1186/s13100-015-0050-8, PMID: 26435754
- Wolf G, Yang P, Füchtbauer AC, Füchtbauer EM, Silva AM, Park C, Wu W, Nielsen AL, Pedersen FS, Macfarlan TS. The KRAB zinc finger protein ZFP809 is required to initiate epigenetic silencing of endogenous retroviruses. Genes &amp; Development 29:538554 (2015b). DOI: 10.1101/gad.252767.114, PMID: 25737282
- Yamauchi M, Freitag B, Khan C, Berwin B, Barklis E. Stem cell factor binding to retrovirus primer binding site silencers. Journal of Virology 69:11421149 (1995). DOI: 10.1128/JVI.69.2.1142-1149.1995, PMID: 7529329
- Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, Liu XS. Model-based analysis of ChIP-Seq (MACS). Genome Biology 9:R137 (2008). DOI: 10.1186/gb-2008-9-9-r137, PMID: 18798982