Third-generation long-read sequencing is transforming plant genomics. Oxford Nanopore Technologies and Pacific Biosciences offer competing long-read sequencing technologies that enable plant scientists to investigate even large and complex plant genomes. Sequencing projects can be conducted by single research groups, and sequences of smaller plant genomes can be completed within days. This has also led to increased large-scale investigation of genomes from multiple species to address fundamental questions associated with the origin and evolution of land plants. Increased accessibility of sequencing devices and user-friendly software allows more researchers to get involved in genomics. Current challenges are accurately resolving diploid or polyploid genome sequences and better accounting for intra-specific diversity by switching from single reference genome sequences to a pangenome graph.
Summary: The order Caryophyllales exhibits complex pigment evolution, with mutual exclusion of anthocyanin and betalain pigments. Given recent evidence for multiple shifts to betalain pigmentation, we re-evaluated potential mechanisms underpinning the exclusion of anthocyanins from betalain-pigmented lineages. We examined the evolution of the flavonoid pathway using transcriptomic and genomic datasets covering 309 species in 31 families. Orthologs and paralogs of known flavonoid synthesis genes were identified by sequence similarity, with gene duplication and gene loss inferred by phylogenetic and syntenic analysis. Relative transcript abundances were assessed to reveal broad-scale gene expression changes between betalain- and anthocyanin-pigmented lineages. Most flavonoid genes are retained and transcribed in betalain-pigmented lineages, and many also show evidence of extensive gene duplication within betalain-pigmented lineages. However, expression of several flavonoid genes is red...
Background: Flavonoids and carotenoids are pigments involved in stress mitigation and numerous other processes. Both pigment classes can contribute to flower and fruit coloration. Carotenoids and flavonoid aglycones are produced by a pathway that is largely conserved across land plants. Glycosylations, acylations, and methylations of the flavonoid aglycones can be species-specific and lead to a plethora of biochemically diverse flavonoids. We previously developed KIPEs for the automatic annotation of biosynthesis pathways and presented an application to the flavonoid aglycone biosynthesis. Findings: KIPEs3 is an improved version with additional features and the potential to identify not just the core biosynthesis players, but also candidates involved in the decoration steps and in the transport of flavonoids. Functionality of KIPEs3 is demonstrated through the analysis of the flavonoid biosynthesis in Arabidopsis thaliana Nd-1, Capsella grandiflora, and Dioscorea dumetorum. We demons...
The development of antibody therapies against SARS-CoV-2 remains a challenging task during the ongoing COVID-19 pandemic. All approved therapeutic antibodies are directed against the receptor binding domain (RBD) of Spike and have lost neutralization efficacy against continuously emerging SARS-CoV-2 variants, which mutate especially in the RBD region. Previously, phage display has been used to identify epitopes of antibody responses against several diseases. Such epitopes have been applied to design vaccines or neutralizing antibodies. Here, we constructed an ORFeome phage display library for the SARS-CoV-2 genome. Open reading frames (ORFs) representing the SARS-CoV-2 genome were displayed on the surface of phage particles in order to identify enriched immunogenic epitopes from COVID-19 patients. Library quality was assessed by both NGS and epitope mapping of a monoclonal antibody with a known binding site. The most prominent epitope captured represented parts of Spike's fusion peptide (...
Flavonoids are a biochemically diverse group of specialized metabolites in plants that are derived from phenylalanine. While the biosynthesis of the flavonoid aglycone is highly conserved across species and well characterized, numerous species-specific decoration steps and their relevance have remained largely unexplored. Flavonoid biosynthesis takes place at the cytosolic side of the endoplasmic reticulum (ER), but accumulation of various flavonoids has been observed in the central vacuole. A universal explanation for the subcellular transport of flavonoids has eluded researchers for decades. Current knowledge suggests that a glutathione S-transferase-like protein (ligandin) protects anthocyanins and potentially proanthocyanidin precursors during the transport to the central vacuole. ABCC transporters and, to a lesser extent, MATE transporters sequester anthocyanins into the vacuole. Glycosides of specific proanthocyanidin precursors are sequestered through MATE transporters. A P-ATPase i...
Arabidopsis thaliana is currently the most important plant model organism and is therefore frequently used to investigate processes that are more complex in other plants. The A. thaliana Columbia-0 (Col-0) genome sequence was the first available for any plant [1] and comes with a high-quality annotation [2]. Despite the use of numerous A. thaliana accessions in research projects, no other genome sequence of this species was available for a long time. Pan-genomic investigations were restricted to re-sequencing studies, mainly limited by the available sequencing capacities. This hampered the discovery of large structural variants and investigations of genome evolution. Substantial technological progress in recent years has made sequencing and de novo assembly of plant genomes feasible, even for single research groups. Since genes determine the phenotype of a plant species, they are often the focus of genome sequencing projects. One major challenge during the prediction of protein-encoding genes is the accurate detection of splice sites. Although terminal dinucleotides in introns are well conserved at the genomic level, with GT at the 5'-end and AG at the 3'-end, there are a few reports about some rare variations [3,4]. Because the number of possible gene models becomes extremely high when splice site combinations other than this canonical GT-AG combination are considered, ab initio gene prediction cannot identify non-canonical splice site combinations. Objectives of this work were i) the generation of a high-quality A. thaliana Niederzenz-1 (Nd-1) genome sequence assembly with a corresponding annotation and comparison against the Col-0 reference genome sequence, ii) investigation of non-canonical splice sites in A. thaliana, and iii) transfer of methods and knowledge about splice sites to the investigation of non-canonical splice sites across annotated plant genome sequences.
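The distinction between canonical and non-canonical splice sites described above can be illustrated with a minimal Python sketch. It assumes the intron sequence is given on the coding strand, from its first to its last base; the function names `splice_site_combination` and `is_canonical` are hypothetical and are not taken from the work itself.

```python
def splice_site_combination(intron: str) -> str:
    """Return the terminal dinucleotides of an intron as 'XX-YY'.

    Assumes the intron is given 5'->3' on the coding strand.
    """
    intron = intron.upper()
    if len(intron) < 4:
        raise ValueError("intron too short to carry two terminal dinucleotides")
    return f"{intron[:2]}-{intron[-2:]}"


def is_canonical(intron: str) -> bool:
    """True if the intron starts with GT and ends with AG."""
    return splice_site_combination(intron) == "GT-AG"


if __name__ == "__main__":
    for seq in ("gtaagtttcag", "GCAAGTTTAG"):
        print(seq, splice_site_combination(seq), is_canonical(seq))
```

A check of this kind only labels introns in an existing annotation; it does not predict splice sites.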
In this work we present new concepts of VANESA, a tool for modeling and simulation in systems biology. We provide a convenient way to handle mathematical expressions and take physical units into account. Simulation and result management have been improved, and syntax and consistency checks, based on physical units, reduce modeling errors. As a proof of concept, essential components of the aerobic carbon metabolism of the green microalga Chlamydomonas reinhardtii are modeled and simulated. The modeling process is based on the xHPN Petri net formalism, and simulation is performed with OpenModelica, a powerful environment and compiler for Modelica. VANESA, as well as OpenModelica, is open source, free of charge for non-commercial use, and is available at: http://agbi.techfak.uni-bielefeld.de/vanesa.
Arabidopsis thaliana is one of the best studied plant model organisms. Besides cultivation in greenhouses, cells of this plant can also be propagated in suspension cell culture. At7 is one such cell line that was established about 25 years ago. Here, we report the sequencing and the analysis of the At7 genome. Large scale duplications and deletions compared to the Columbia-0 (Col-0) reference sequence were detected. The number of deletions exceeds the number of insertions, thus indicating that a haploid genome size reduction is ongoing. Patterns of small sequence variants differ from the ones observed between A. thaliana accessions, e.g., the number of single nucleotide variants matches the number of insertions/deletions. RNA-Seq analysis reveals that disrupted alleles are less frequent in the transcriptome than the native ones.
Once a suitable reference sequence is generated, genomic differences within a species are often assessed by re-sequencing. Variant calling processes can reveal all differences between two strains, accessions, genotypes, or individuals. These variants can be enriched with predictions about their functional implications based on available structural annotations. Although these predictions on a per-variant basis are often accurate, some challenging cases require the simultaneous incorporation of multiple adjacent variants into this prediction process. Examples are neighboring variants that modify each other's functional impact. Neighborhood-Aware Variant Impact Predictor (NAVIP) considers all variants within a given protein-coding sequence when predicting the functional consequences. NAVIP is freely available on GitHub: https://github.com/bpucker/NAVIP.
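Why neighboring variants need to be evaluated together can be shown with a small Python sketch. This is not NAVIP's implementation; it is an illustrative example built on a made-up coding sequence and two hypothetical single nucleotide variants in the same codon, using the standard genetic code. The helper names `translate` and `apply_snvs` are assumptions introduced for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon ('*' marks a stop)."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def apply_snvs(cds: str, snvs: list[tuple[int, str, str]]) -> str:
    """Apply SNVs given as (0-based position, reference base, alternative base)."""
    seq = list(cds)
    for pos, ref, alt in snvs:
        if seq[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        seq[pos] = alt
    return "".join(seq)


if __name__ == "__main__":
    cds = "ATGCAGTAA"                      # hypothetical CDS: M Q *
    snv_a, snv_b = (3, "C", "T"), (4, "A", "G")
    print(translate(apply_snvs(cds, [snv_a])))          # SNV a alone
    print(translate(apply_snvs(cds, [snv_b])))          # SNV b alone
    print(translate(apply_snvs(cds, [snv_a, snv_b])))   # both together
```

In this toy case the first variant alone turns the glutamine codon CAG into the stop codon TAG, whereas both variants together yield TGG (tryptophan), so a per-variant prediction would report a premature stop codon that is absent from the actual haplotype.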
Recent progress in sequencing technologies facilitates plant science experiments through the availability of genome and transcriptome sequences. Genome assemblies provide details about genes, transposable elements, and the general genome structure. The availability of a reference genome sequence for a species enables and supports numerous wet lab analyses and comprehensive bioinformatic investigations, e.g. genome-wide investigations of gene families. After generating a genome sequence, gene prediction and the generation of functional annotations are the major challenges. Although these methods have improved substantially over the last years, incorporation of external hints like RNA-Seq reads is beneficial. Once a high-quality sequence and annotation are available for a species, diversity between accessions can be assessed by re-sequencing. This helps in revealing single nucleotide variants, insertions and deletions, and larger structural variants like inversions and transpositions. Identification of these variants requires sophisticated bioinformatic tools, many of which were developed during past years. Sequence variants can be harnessed for the genetic mapping of traits. Several mapping-by-sequencing approaches were developed to find underlying genes for relevant traits in crops. These genomic approaches are complemented by various transcriptomic methods, dominated by the very popular RNA-Seq technology. Transcript abundance is measured via sequencing of the corresponding cDNA molecules. RNA-Seq reads can be subjected to transcriptome assembly or gene expression analysis, e.g. for the identification of differences in transcript abundance between tissues, conditions, or genotypes.
The flavonoid biosynthesis is a well characterised model system for specialised metabolism and transcriptional regulation in plants. Flavonoids have numerous biological functions like UV protection and pollinator attraction, but also biotechnological potential. Here, we present Knowledge-based Identification of Pathway Enzymes (KIPEs) as an automatic approach for the identification of players in the flavonoid biosynthesis. KIPEs combines comprehensive sequence similarity analyses with the inspection of functionally relevant amino acid residues and domains in subjected peptide sequences. Comprehensive sequence sets of flavonoid biosynthesis enzymes and knowledge about functionally relevant amino acids were collected. As a proof of concept, KIPEs was applied to investigate the flavonoid biosynthesis of the medicinal plant Croton tiglium based on a transcriptome assembly. Enzyme candidates for all steps in the biosynthesis network were identified and matched to previous reports of corres...
The 'big data revolution' has enabled novel types of analyses in the life sciences, facilitated by public sharing and reuse of datasets. Here, we review the prodigious potential of reusing publicly available datasets and the challenges, limitations and risks associated with it. Due to the prominence, abundance and wide distribution of sequencing results, we focus on the reuse of publicly available sequence datasets. Through selected examples of successful reuse of different data (genome, transcriptome, proteome, metabolome, phenotype and ecosystem), with their respective limitations and risks, we illustrate the enormous potential of the practice. A checklist to determine the reuse value and potential of a particular dataset is also provided.
Background: In addition to the BAC-based reference sequence of the accession Columbia-0 from the year 2000, several short read assemblies of the plant model organism Arabidopsis thaliana were published during the last years. Also, a SMRT-based assembly of Landsberg erecta has been generated that identified translocation and inversion polymorphisms between two genotypes of the species. Results: Here we provide a chromosome-arm level assembly of the A. thaliana accession Niederzenz-1 (AthNd-1_v2c) based on SMRT sequencing data. The best assembly comprises 69 nucleome sequences and displays a contig length of up to 16 Mbp. Compared to an earlier Illumina short read-based NGS assembly (AthNd-1_v1), a 75 fold increase in contiguity was observed for AthNd-1_v2c. To assign contig locations independent from the Col-0 gold standard reference sequence, we used genetic anchoring to generate a de novo assembly. In addition, we assembled the chondrome and plastome sequences. Conclusions: Detailed analys...
The Arabidopsis thaliana Niederzenz-1 genome sequence was recently published with an ab initio gene prediction. In-depth analysis of the predicted gene set revealed some errors involving genes with non-canonical splice sites in their introns. Since non-canonical splice sites are difficult to predict ab initio, we checked for options to improve the annotation by transferring annotation information from the recently released Columbia-0 reference genome sequence annotation Araport11. Incorporation of hints generated from Araport11 enabled the precise prediction of non-canonical splice sites. Manual inspection of RNA-Seq read mapping and RT-PCR were applied to validate the structural annotations of non-canonical splice sites. Predictions of untranslated regions were also updated by harnessing the potential of Araport11's information, which was generated by using high coverage RNA-Seq data. The improved gene set of the Nd-1 genome assembly (GeneSet_Nd-1_v1.1) was evaluated via compar...
Fibrosis is a hallmark of adipose tissue (AT) dysfunction and obesity-associated insulin resistance that results from an impaired collagen turnover. Peptidase D (PEPD) plays a vital role in collagen turnover by degrading proline-containing dipeptides. Nevertheless, its specific function and importance in AT is unknown. GWAS identified the rs731839 variant in the locus near PEPD that uncouples obesity from insulin resistance and dyslipidaemia, thus indicating that defective PEPD might impair AT remodelling and exacerbate metabolic complications. Here we show that in human and murine obesity, PEPD expression and activity decrease in AT, coupled to the release of PEPD systemically. Both events, in turn, are associated with the accumulation of fibrosis in AT and insulin resistance. Using pharmacologic and genetic animal models of PEPD down-regulation, we show that whereas dysfunctional PEPD activity provokes AT fibrosis, it is the PEPD secreted by AT that is the main contributor to inflammation, insulin resistance and metabolic dysfunction. Also, PEPD originating in inflammatory macrophages (MΦ) plays an essential role in promoting fibro-inflammatory responses via activation of EGFR in MΦ and preadipocytes. Genetic ablation of pepd in MΦ, which prevents obesity-induced PEPD release, also averts AT fibro-inflammation and obesity-associated metabolic dysfunctions. Taking advantage of factor analysis, we have identified the coupling of decreased prolidase activity and increased systemic levels of PEPD as the essential pathogenic triggers of AT fibrosis and insulin resistance. Thus, PEPD produced by MΦ qualifies as a biomarker of AT fibro-inflammation and a therapeutic target for AT fibrosis and obesity-associated insulin resistance and type 2 diabetes.
Belongs to the study 'A Partially Phase-separated Genome Sequence Assembly of the Vitis rootstock 'Börner' (_Vitis riparia_ x _Vitis cinerea_) and its Exploitation for Marker Development and Targeted Mapping' which contains additional information about this dataset.
Trifoliate yam (Dioscorea dumetorum) is one example of an orphan crop, not traded internationally. Post-harvest hardening of the tubers of this species starts within 24 hours after harvesting and renders the tubers inedible. Genomic resources are required for D. dumetorum to improve breeding for non-hardening varieties as well as for other traits. We sequenced the D. dumetorum genome and generated the corresponding annotation. The two haplophases of this highly heterozygous genome were separated to a large extent. The assembly represents 485 Mbp of the genome with an N50 of over 3.2 Mbp. A total of 35,269 protein-encoding gene models as well as 9,941 non-coding RNA genes were predicted and functional annotations were assigned.
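For readers unfamiliar with the N50 statistic quoted above, the following Python sketch shows one common way to compute it: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The function name `n50` and the example lengths are illustrative assumptions, not values from the study.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of an assembly given its contig lengths."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))  # toy contig lengths, arbitrary units
```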
Grapevine breeding has become highly relevant due to upcoming challenges like climate change, a decrease in the number of available fungicides, increasing public concern about plant protection, and the demand for a sustainable production. Downy mildew caused by Plasmopara viticola is one of the most devastating diseases worldwide of cultivated Vitis vinifera. In modern breeding programs, therefore, genetic marker technologies and genomic data are used to develop new cultivars with defined and stacked resistance loci. Potential sources of resistance are wild species of American or Asian origin. The interspecific hybrid of Vitis riparia Gm 183 x Vitis cinerea Arnold, available as the rootstock cultivar 'Börner,' carries several relevant resistance loci. We applied next-generation sequencing to enable the reliable identification of simple sequence repeats (SSR), and we also generated a draft genome sequence assembly of 'Börner' to access genome-wide sequence variations in a comprehensive and highly reliable way. These data were used to cover the 'Börner' genome with genetic marker positions. A subset of these marker positions was used for targeted mapping of the P. viticola resistance locus, Rpv14, to validate the marker position list. Based on the reference genome sequence PN40024, the position of this resistance locus can be narrowed down to less than 0.5 Mbp on chromosome 5.
Different Musa species, subspecies, and cultivars are currently investigated to reveal their genomic diversity. Here, we compare the Musa acuminata cultivar Dwarf Cavendish against the previously released Pahang assembly. Numerous small sequence variants were detected and the ploidy of the cultivar presented here was determined as triploid based on sequence variant frequencies. Illumina sequencing also revealed a duplication of a large segment of chromosome 2 in the genome of the cultivar studied. Comparison against previously sequenced cultivars provided evidence that this duplication is unique to Dwarf Cavendish. Although no functional relevance of this duplication was identified, this example shows the potential of plants to tolerate such aneuploidies.
Combined awareness about the power and limitations of bioinformatics and molecular biology enables advanced research based on high-throughput data. Despite an increasing demand for scientists with a combined background in both fields, the education in dry lab and wet lab is often separated. This work describes an example of integrated education with focus on genomics and transcriptomics. Participants learn computational and molecular biology methods in the same practical course. Peer-review is applied as a teaching method to foster cooperative learning of students with heterogeneous backgrounds. Evaluation results indicate acceptance and appreciation of this approach.
Papers by Boas Pucker