Overview of Projects for Summer 2021

Computational Approaches for Simulation and Analysis of Population Genomic Datasets - Dr. Dapper

Recombination, the physical exchange of DNA between homologous chromosomes during meiosis, is required for successful gametogenesis in most sexually reproducing species. Furthermore, recombination rate is a fundamental genomic parameter, shaping major features of the genomic landscape. Despite the central importance of understanding the causes and consequences of the ubiquitous variation in recombination rate, fundamental questions about its evolution remain unanswered. The Dapper lab aims to generate significant advances in our understanding of the evolution of recombination rate by bridging the current gap between theoretical and empirical approaches. One of the major areas of ongoing research in the lab focuses on developing computational approaches to model recombination rate as a quantitative trait with a complex genetic architecture. This approach generates predictions that are directly testable using the plethora of empirical data currently being generated in a number of species.

In some species, including humans, meiotic recombination is concentrated in small genomic regions. These “recombination hotspots” leave signatures in fine-scale patterns of linkage disequilibrium. Researchers have developed computational approaches that utilize this correlation to characterize fine-scale recombination rate across the genome. These approaches have led to the important inference that hotspots evolve quickly in some species, but are conserved in others. However, Dapper’s results suggest that by ignoring demographic history, these population genomic approaches likely overestimate the power to detect hotspots and therefore underestimate the degree of hotspot sharing between species [2]. This observation motivates the re-analysis of previous conclusions about the rate of evolution of recombination hotspots. One simple but illuminating approach is to simulate population genomic data under demographic histories that match populations of interest and measure the accuracy of hotspot detection using different programs. This project will introduce REU students to cutting-edge, but accessible, computational approaches for both simulating and analyzing population genomic datasets. Students will also have the opportunity to choose a focal population or species that is of interest to them, such as humans or chimpanzees.
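
As a rough illustration of what "measuring the accuracy of hotspot detection" could look like once a simulation and a detection program have been run, the Python sketch below compares the hotspot positions placed in a simulation with the hotspots a program reports. It is only a sketch under stated assumptions: the interval coordinates are invented toy values, the function names are hypothetical, and "any overlap counts as a detection" is one possible scoring rule rather than the lab's actual criterion.

```python
from typing import List, Tuple

# A hotspot is a half-open interval [start, end) in base pairs (assumed convention).
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def detection_accuracy(true_hotspots: List[Interval],
                       called_hotspots: List[Interval]) -> Tuple[float, float]:
    """Return (power, false discovery rate) for one simulated dataset.

    power: fraction of simulated hotspots overlapped by at least one call.
    false discovery rate: fraction of calls that overlap no simulated hotspot.
    """
    detected = sum(
        any(overlaps(t, c) for c in called_hotspots) for t in true_hotspots
    )
    false_calls = sum(
        not any(overlaps(c, t) for t in true_hotspots) for c in called_hotspots
    )
    power = detected / len(true_hotspots) if true_hotspots else float("nan")
    fdr = false_calls / len(called_hotspots) if called_hotspots else float("nan")
    return power, fdr


# Toy values, invented for illustration only.
true_hotspots = [(10_000, 12_000), (50_000, 52_000),
                 (90_000, 92_000), (130_000, 132_000)]
called_hotspots = [(10_500, 11_500), (49_000, 50_500), (70_000, 71_000)]

power, fdr = detection_accuracy(true_hotspots, called_hotspots)
print(f"power = {power:.2f}, false discovery rate = {fdr:.2f}")
```

Repeating this comparison across datasets simulated under different demographic histories would show how power changes with demography, which is the comparison the paragraph above describes.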

Bioinformatics Analysis of Transcription Fidelity - Dr. Gout

The overarching research interests in the Gout lab revolve around understanding how selective pressures shape fundamental cellular processes such as replication, transcription and splicing. A particular interest is investigating the selective pressures operating on the fidelity of transcription. To this end, Gout recently contributed to the adaptation of the circle-sequencing assay into the first method to efficiently and accurately measure transcription error rates on a genome-wide scale. Gout has developed the bioinformatics tools necessary to analyze this new type of sequencing data and applied these tools to investigate the transcription error rate in several organisms, including Saccharomyces cerevisiae, Caenorhabditis elegans and Drosophila melanogaster. While differences in the fidelity of replication (mutation rate) between these organisms are well characterized and explained by theoretical work grounded in the principles of population genetics (the drift-barrier hypothesis), almost nothing was known about variation in the fidelity of transcription before our pioneering work. Surprisingly, Gout found that the initial predictions of the drift-barrier hypothesis regarding the evolution of transcription fidelity are not supported by the data that we have generated from these three organisms, suggesting that the selective pressures operating on transcription error rates are more complex than those operating on DNA mutation rates. A current interest is investigating the variation in transcription fidelity across the entire tree of life and deciphering the molecular mechanisms responsible for this variation.

In collaboration with Dr. Vermulst’s laboratory at University of Southern California, large amounts of new genomics data (sequencing of libraries made with the circle-sequencing assay) are being generated in several organisms. Analyzing these data sets and exploring in greater detail the genomic features that impact the fidelity of transcription represent an enormous amount of computational work, which can easily be divided into sub-projects that students can choose from. For example, a student could choose to focus on investigating the possible impacts of gene expression level on the fidelity of transcription. To do so, they would cross the specific data generated in the Gout and Vermulst labs with generic expression level data available from public databases.
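
The Python sketch below illustrates one way the "crossing" of the two data sources could be done: per-gene transcription error rates are joined to per-gene expression levels by gene identifier, and a Spearman rank correlation is computed on the genes present in both. This is a minimal sketch, not the lab's pipeline. The gene identifiers and all numbers are invented, the function names are hypothetical, and Spearman correlation is simply an assumed choice of statistic.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(vx * vy)


def spearman_by_gene(error_rate: Dict[str, float],
                     expression: Dict[str, float]) -> Tuple[float, int]:
    """Join two per-gene tables on gene ID and return (Spearman rho, n genes)."""
    shared = sorted(set(error_rate) & set(expression))
    if len(shared) < 3:
        raise ValueError("need at least three genes present in both tables")
    x = [expression[g] for g in shared]
    y = [error_rate[g] for g in shared]
    return pearson(average_ranks(x), average_ranks(y)), len(shared)


# Invented toy tables; gene names and values are placeholders, not real data.
error_rate = {"geneA": 4e-6, "geneB": 3e-6, "geneC": 2e-6, "geneD": 1e-6, "geneE": 5e-6}
expression = {"geneA": 5.0, "geneB": 50.0, "geneC": 500.0, "geneD": 5000.0, "geneF": 12.0}

rho, n_genes = spearman_by_gene(error_rate, expression)
print(f"Spearman rho = {rho:.2f} across {n_genes} shared genes")
```

Genes that appear in only one table (here `geneE` and `geneF`) are dropped by the join, which is worth reporting in a real analysis because public expression data rarely covers exactly the same gene set.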

Bioinformatics Approaches in Evolutionary Genomics - Dr. Hoffmann

The Hoffmann lab is broadly interested in evolutionary genomics and molecular evolution. An overriding theme is to better understand the connection between the emergence of novel genes and the origins of biological innovations. Relating to this theme, the lab 1) explores the different mechanisms involved in the origin of new genes, 2) assesses the forces underlying the retention and functional variation of these genes, and 3) works to gain insight into the processes underlying intra- and inter-specific variation in the number and nature of genes in animal genomes. Current projects include studies of the evolution of animal gene families, the emergence of novel genes via gene and genome duplication, functional variation among paralogous members of a gene family, the evolution of small RNA repertoires, and the interplay between transposable elements and small RNAs. Pursuit of these questions uses an integrative approach that involves combining bioinformatics and evolutionary genomics with perspectives from other disciplines such as molecular population genetics, cellular and structural biology, protein biochemistry and animal physiology that are brought by collaborators.

Specifically, at present there are four major areas of interest in the lab: 1) exploring how the whole genome duplications early in the history of vertebrates have shaped the repertoire of genes involved in cellular processes, with an emphasis on genes involved in cancer, and in recombination and cell differentiation, 2) exploring variation in the number and kind of hemoglobin genes in the genomes of animals and how this might be related to changes in respiratory physiology, 3) examining variation in genes involved in processing non-coding RNAs in butterflies, and 4) exploring signatures of adaptive evolution in genes associated with longevity in the long-lived butterflies of the genus Heliconius relative to short-lived butterflies. The last two projects are conducted in collaboration with Brian Counterman from the Department of Biological Sciences at Mississippi State University.

Interactive Visualization of High Throughput Genomic and Transcriptomic Sequence Data - Dr. Jankun-Kelly

Dr. Jankun-Kelly works in the area of visualization, where bioinformatics visualization is one application. In general, he studies the visualization process and how to help users get the most out of visualization. He has developed novel methods for visualization interfaces [3], interfaces for linked image browsing [4], models for visual exploration [5], and visual analysis tools for bioinformatics [6]. REU projects for bioinformatics will challenge students to work together with biologists to solve complex problems via interactive computer graphics. While two examples of such projects that have been worked on by REU students are given below, actual projects will be determined in collaboration with the application scientist and the student.

The Jankun-Kelly lab has developed MSAVis, which visually fuses multiple sequence alignments with depictions of the conserved domains over the individual proteins. For each conserved domain, an alignment block indicates where the conserved domain exists for the given protein. As it stands, MSAVis does not allow editing of protein sequences to test different alignment hypotheses; this is a feature of interest to its users. A student would add this functionality, which would involve modifying MSAVis’ interaction mechanisms and integrating it with sequence alignment software. There are additional protein features that could be integrated, such as binding sites or information about secondary structure. Such a project would involve designing the visual metaphors for the added information and designing the interface to query the biological databases to extract them. Another tool, Gene Atlas, allows multiple gene expression samples (usually from species at different times in their life cycle) to be compared efficiently. Additional interaction methods and visual metaphors could be explored by REU students to make this a tool with genuine impact on biological studies. It can also be applied to different genomic domains.

Environmental Modulation of Mycolactone Gene Expression - Dr. Jordan

Dr. Jordan’s research areas include microbial ecology, transmission and pathogenesis of the environmental pathogen, Mycobacterium ulcerans. Open questions include how the organism is transmitted to humans and which environmental circumstances lead to the production of mycolactone, a lipid toxin and sole virulence determinant of M. ulcerans. These gaps in the knowledge base are important because M. ulcerans infection leads to a devastating skin disease known as Buruli ulcer that impacts at least 33 countries, with highest incidence in rural West Africa.

Potential projects for the REU fellows include: 1. Environmental screening for the presence and abundance of M. ulcerans among aquatic samples collected from French Guiana. The objective of this work is to determine the presence and abundance of M. ulcerans among environmental samples collected from aquatic habitats known to have high M. ulcerans diversity, in order to test the hypothesis that M. ulcerans resides and replicates within a specific niche within the aquatic habitat, and that its toxin, mycolactone, has evolved to allow persistence in such environments. To test this hypothesis, DNA will be isolated from preserved samples that have been collected from aquatic habitats in French Guiana, South America. The isolated DNA will be subjected to semi-quantitative and quantitative PCR targeting M. ulcerans. Positive samples will be strain typed using Variable Number Tandem Repeat profiling and verified by amplicon sequencing and comparison against the BLAST database. We expect specific aquatic matrices (such as water filtrand, soil, or invertebrates) to be positive for M. ulcerans. We also expect diversity to be higher among headwaters.
2. Impact of salinity on mycolactone gene expression. The objective of this work is to determine whether mycolactone gene expression and production are modulated when M. ulcerans is subjected to increasing concentrations of salt, as administered in media. This objective will test the hypothesis that expression of genes responsible for mycolactone production is upregulated as a stress response. To test this hypothesis, M. ulcerans replicates will be grown to exponential phase, then placed into petri plates and subjected to media amended with 0, 0.5, 1.5, 3.5 and 5% salt. The bacteria will be collected and serially diluted for plating to determine the impact of salt concentration on M. ulcerans growth. Additionally, RNA will be isolated from the bacteria and, following verification of RNA integrity, converted to cDNA for RT-PCR targeting genes responsible for mycolactone production. Modulation of gene expression will be analyzed using computational software in R. We expect mycolactone gene expression to be upregulated with increased salt exposure, and growth to be inversely related to salt concentration. Data from both projects will be valuable for assessing the environmental niche of M. ulcerans, determining the mode of transmission of the pathogen to people, and identifying conditions for mycolactone production. Additionally, the methods described will allow the student to obtain or develop skills in molecular biology, data management and data interpretation.
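
To make the gene expression comparison in the salinity project concrete, the sketch below applies one widely used calculation for RT-PCR data, the 2^(-ΔΔCt) relative quantification method. The text above does not say which method, reference gene, or thresholds will be used, so everything here is an assumption: the Ct values are invented, the use of a single reference gene with roughly 100% amplification efficiency is assumed, and Python is used for illustration even though the project analysis is described as being done in R.

```python
from typing import Dict, Tuple


def fold_change(ct_target_treated: float, ct_ref_treated: float,
                ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression of a target gene by the 2^(-ddCt) method.

    dCt  = Ct(target) - Ct(reference), computed within each condition.
    ddCt = dCt(treated) - dCt(control).
    Assumes close to 100% amplification efficiency for both primer pairs.
    """
    dct_treated = ct_target_treated - ct_ref_treated
    dct_control = ct_target_control - ct_ref_control
    ddct = dct_treated - dct_control
    return 2.0 ** (-ddct)


# Invented Ct values as (target gene Ct, reference gene Ct), keyed by percent
# salt. The concentrations follow the text; the Ct numbers are placeholders.
ct_by_salt: Dict[float, Tuple[float, float]] = {
    0.0: (25.0, 18.0),   # unamended medium, used as the control
    0.5: (24.5, 18.0),
    1.5: (24.0, 18.0),
    3.5: (23.5, 18.0),
    5.0: (23.0, 18.0),
}

control_target, control_ref = ct_by_salt[0.0]
for salt, (target, ref) in sorted(ct_by_salt.items()):
    fc = fold_change(target, ref, control_target, control_ref)
    print(f"{salt:>4}% salt: fold change vs. control = {fc:.2f}")
```

A fold change above 1 would indicate upregulation relative to the unamended control, which is the direction the stress-response hypothesis predicts.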

Using Bioinformatics to Understand Polyamine-Mediated Regulation of Capsule in Pneumococci - Dr. Nanduri

Research in the Nanduri lab focuses on understanding the intersection of polyamine metabolism and bacterial virulence. Polyamines are ubiquitous small cationic molecules that are important for virulence of human pathogens including pneumococcus, and their intracellular concentrations are tightly regulated by synthesis and transport. The Nanduri lab has shown that deletion of the polyamine transport operon (ΔpotABCD), the biosynthesis gene for spermidine (ΔspeE, spermidine synthase), or the biosynthesis gene for cadaverine (ΔcadA, lysine decarboxylase) results in an attenuated phenotype in murine models of colonization, pneumonia and sepsis [7]. Recombinant PotD protein has been shown to be protective in active and passive immunization studies. The lab has also reported that deletion of cadA results in the loss of capsule [8], an important virulence factor in pneumococcus that is required for evading host phagocytosis. Polyamine metabolism genes are conserved among pneumococci; consequently, polyamine metabolism is an attractive anti-virulence target for drug discovery. The Nanduri lab routinely uses functional genomics approaches such as proteomics, RNA-Seq and metabolomics to identify pneumococcal pathways responsive to altered polyamine metabolism.

Students will participate in research to identify polyamine-mediated regulation of capsule in pneumococci and help understand the impact of impaired polyamine metabolism (i.e., transport and synthesis) on pneumococcal adaptation to various stressors such as acid, temperature, oxidative stress or nutritional stress, using functional genomics. Analysis of each type of functional genomics data (proteomics, RNA-seq and untargeted metabolomics) for biological interpretation will be carried out using a number of open source bioinformatics databases such as KEGG, STRING, DAVID, and BioCyc. The Nanduri lab will provide exposure to basic microbiology, molecular biology and biochemistry methods. REU students will also become familiar with standard tools for interpreting the function of lists of proteins, genes, and metabolites. We expect the trainees to expand the breadth of bioinformatics databases being used to include additional resources such as TIGRFAMs, COGs, and transcription factor databases in conjunction with integrated multi-omics analysis for biological discovery.

Computational Approaches to Study Biological Networks - Dr. Popescu

The Popescu lab focuses on computational approaches to study biological networks and processes in order to understand evolution and dynamics at the molecular and cellular levels. Using system analysis methods, the lab seeks to analyze the properties of biochemical networks, infer plant protein interactions, and study the dynamics of signaling networks. Popescu designs methods for inference of gene regulatory networks by integrating expression variation, transcription factor binding and interactome data with predictions from comparative analysis of conserved sequences of several plant genomes. This includes developing new analytical tools to infer cis-regulatory networks from conserved sequences, to identify control structures (network motifs) from genomics and proteomics data and to study perturbation of cellular dynamics associated with copy number variations. Another current research direction focuses on gaining a predictive understanding of cellular decisions during a plant's response to stress. Using high-throughput assays, we have recently identified key proteins involved in plant responses to biotic and abiotic stress, including peptidases, trafficking proteins and transcription factors involved in immune responses.

A key research direction in the lab focuses on identification and analysis of plant functional, transcriptional and physical interaction networks. The major goals of the project are: 1. identify the components of signal-processing networks in newly sequenced crop plants; 2. develop new methods and algorithms to predict conserved pathways and to perform comparative analyses of signal-processing network structure; 3. develop methods and algorithms for modeling and simulation analysis of signaling dynamics. Among the research projects targeted at undergraduate research are: 1) Evolutionary analysis of plant gene regulatory networks; the goal of this project is to develop methods for the evolutionary analysis of gene regulatory networks. This work is expected to yield DNA sequence motifs that act as transcription factor binding sites, participate in regulatory networks, and are conserved across species. 2) Comparative analysis of plant genomes and gene networks; the goal of this project is to develop methods for comparative genomics analysis, functional analysis and network analysis for polyploid plant genomes. 3) From Phenotypes to Systems Modeling; the goal of this project is to develop a systems biology model for plant stress response cellular circuits.

High Throughput Omics Approaches to Elucidate Disease Resistance Mechanisms in Maize - Dr. Warburton

The Corn Host Plant Resistance Research Unit of the USDA Agricultural Research Service seeks to identify and use natural allelic diversity in maize to resist insect and fungal pathogens. The lab has extensive experience in the identification of maize genetic sequences associated with resistance to lepidopteran insects and aflatoxin accumulation, which now allow breeders to improve maize performance under biotic stress. Genomics, transcriptomics, proteomics, and metabolomics are used to identify genes, pathways, and mechanisms of resistance against fungal and insect attack. Lab members have extensive genetic mapping, pathway, and differential expression/abundance analysis experience. The information gained leads to molecular markers that have been used for marker-assisted selection and improvement of resistance in practical breeding programs; the next step will be the improvement of maize genotypes via gene editing.

In the course of our genetic analyses, the lab generates very large amounts of sequencing data. These data range from high coverage but low depth (Genotyping by Sequencing data, GBS) to low coverage and high depth (one gene sequenced multiple times in multiple individuals). In addition, expression sequencing data (RNAseq) and qRT-PCR data are being generated to analyze gene expression differences between treated and untreated genotypes. These data must all be stored, retrieved and analyzed as efficiently as possible, and re-analyzed as new information comes to light. The large-scale sequencing data generated, usually by core facilities, must be aligned and analyzed before conclusions can be drawn. Data storage, retrieval, and analysis are computationally intensive, and REU participants can design and implement efficient solutions. In addition, new reference sequences are being generated for maize and the pests being studied (A. flavus, corn earworm, fall armyworm, and others), as well as model plant and animal organisms. Re-analyzing data in light of new sequences and gene ontologies is laborious, and the development of automated pipelines for analysis and interpretation is an ideal task for undergraduate students.
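
The contrast drawn above between coverage and depth can be made concrete with a small calculation. The Python sketch below summarizes a per-position read depth profile as breadth of coverage (the fraction of reference positions with at least one read) and mean depth over the covered positions. It is a hedged illustration only: the depth profiles are invented toy values, the function name and the `min_depth` parameter are hypothetical, and in practice the per-position depths would come from an alignment tool rather than a hand-written list.

```python
from typing import Sequence, Tuple


def coverage_summary(depths: Sequence[int], min_depth: int = 1) -> Tuple[float, float]:
    """Return (breadth of coverage, mean depth over covered positions).

    depths: read depth at each position of the reference region.
    min_depth: minimum depth for a position to count as covered.
    """
    if not depths:
        raise ValueError("depth profile is empty")
    covered = [d for d in depths if d >= min_depth]
    breadth = len(covered) / len(depths)
    mean_depth = sum(covered) / len(covered) if covered else 0.0
    return breadth, mean_depth


# Invented toy profiles over ten reference positions.
broad_shallow = [2, 1, 3, 2, 0, 1, 2, 1, 2, 1]     # many positions, few reads each
narrow_deep = [0, 0, 0, 0, 200, 220, 0, 0, 0, 0]   # few positions, many reads each

for name, profile in [("broad, shallow", broad_shallow), ("narrow, deep", narrow_deep)]:
    breadth, mean_depth = coverage_summary(profile)
    print(f"{name}: breadth = {breadth:.0%}, mean depth = {mean_depth:.1f}x")
```

Summaries like these are a natural first step in an automated pipeline, since they indicate which downstream analyses a dataset can support before any variant calling or expression analysis is attempted.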

Leveraging RNAseq and RADseq Data to Elucidate Population Processes Across Landscapes - Dr. Welch

Dr. Welch is an evolutionary geneticist with two distinct research foci. Transcriptional variance within and among populations is being studied using the annual sunflower, Helianthus annuus, and RNAseq-based methods. He also investigates the genetic dynamics of small populations using Caribbean iguanas as a model system. Projects for undergraduates will be designed to both generate usable data and serve as complete introductions to hypothesis-driven research. Recent efforts involving sunflowers employed an REU student, Melissa Wood, to map RNAseq data from 95 individuals to the recently assembled sunflower genome. She then tested the hypothesis that populations at different latitudes are locally adapted by estimating gene expression divergence between the samples. Future projects for REU students in this vein will focus on identifying the specific genes and pathways that exhibit heritable variation in gene expression.

A major aim of the Welch lab is understanding the role that inbreeding depression plays in regulating the population dynamics of small populations. Caribbean iguana populations with exceptionally low migration rates are ideal for this work because these populations show evidence of long-term stability. Until recently, microsatellites have been the marker of choice for these studies. Moving forward, we will be developing genomic tools for this work, and assaying variation using techniques such as RADseq. We recently published a complete mtDNA genome for Iguana delicatissima. An undergraduate at UNC Asheville is even lead author on that study. This work will provide ideal opportunities for future REU students interested in computational biology.