Algorithms for High Throughput Sequencing
High throughput sequencing technologies are generating a wealth of sequence data. These technologies are often used to obtain sequences from DNA or RNA samples and perform computation and analysis of digital gene expression values. However, these short sequence “reads” must be aligned to an available reference genome before generating such values. An ongoing research project in the Perkins lab is to study sequence mapping and design improved algorithms for mapping transcribed sequences.
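To illustrate what aligning a short read to a reference involves, here is a minimal seed-and-extend sketch. It is not the lab's algorithm: the function names, the k-mer length, and the mismatch threshold are assumptions chosen for illustration, and it ignores gaps, base quality scores, and spliced alignments.

```python
from collections import defaultdict

def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def map_read(read, reference, index, k, max_mismatches=1):
    """Return sorted reference start positions where the read aligns
    without gaps and with at most max_mismatches mismatches."""
    hits = set()
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        # Each exact seed match proposes one candidate alignment start.
        for pos in index.get(seed, ()):
            start = pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(1 for a, b in zip(read, window) if a != b)
            if mismatches <= max_mismatches:
                hits.add(start)
    return sorted(hits)

# Toy data (hypothetical sequences, for illustration only).
reference = "ACGTTGCAAGCTTACGGA"
index = build_kmer_index(reference, 4)
exact_hits = map_read("TGCAAGCT", reference, index, 4)
one_mismatch_hits = map_read("TGCATGCT", reference, index, 4, max_mismatches=1)
```

Counting the reads that map inside each annotated gene would then give a simple digital expression value per gene.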
Data Integration Using Biological Ontologies
A number of ontologies have been developed to describe different aspects of biological problems. These include the Gene Ontology for describing the function, processes, and location of gene products, anatomy ontologies, and developmental stage ontologies. Most data mining approaches have focused on discovery of patterns in a single ontology. We are developing new algorithms for integrating data from multiple ontologies and finding patterns in the data at multiple levels of abstraction.
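One simple way to look for patterns at multiple levels of abstraction is to generalize each annotation to all of its ancestor terms and count how often terms from two ontologies occur together. The sketch below assumes each ontology is given as a child-to-parents mapping. The term names, the toy hierarchy, and the function names are hypothetical, and this is not necessarily the approach under development.

```python
from collections import Counter
from itertools import product

def ancestors(term, parents):
    """Return the term together with all of its ancestors in an ontology DAG."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen

def cross_ontology_counts(records, parents_a, parents_b):
    """Count co-occurring term pairs across two ontologies, where each
    record contributes its own terms and every generalization of them."""
    counts = Counter()
    for term_a, term_b in records:
        pairs = product(ancestors(term_a, parents_a), ancestors(term_b, parents_b))
        for pair in pairs:
            counts[pair] += 1
    return counts

# Toy hierarchies (hypothetical, for illustration only).
function_parents = {
    "kinase activity": ["catalytic activity"],
    "phosphatase activity": ["catalytic activity"],
}
anatomy_parents = {
    "left ventricle": ["heart"],
    "right ventricle": ["heart"],
}
# One (function term, anatomy term) pair per annotated gene product.
records = [
    ("kinase activity", "left ventricle"),
    ("phosphatase activity", "right ventricle"),
]
counts = cross_ontology_counts(records, function_parents, anatomy_parents)
```

In this toy data no specific pair occurs more than once, but the generalized pair ("catalytic activity", "heart") occurs twice, which is the kind of pattern that is only visible at a higher level of abstraction.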
Visualization of Multiple Sequence Alignments
Projects focus on improving MSAVis, a previously developed multiple sequence alignment visualization system. MSAVis has several feasible extensions that can be tackled in parallel by dedicated students. These include adding features to edit protein sequences and adding functionality to integrate protein features such as binding site or secondary structure information. Another project would involve implementing a Web interface for MSAVis.
Functional Genomics in Developmental Biology
Research projects in functional genomics aim to identify the molecular mechanisms that regulate early mammalian embryonic development. Projects will be available in reproductive biology as it relates to mammalian germ cells and developmental biology, as well as in the molecular genetics of mouse and cattle. Specifically, we will conduct hypothesis-driven research on fundamental questions about the regulation of preimplantation embryogenesis.
Repeat Analysis of S. pneumoniae
In prokaryotes, non-coding transcripts have been shown to participate in a broad range of cellular functions, such as gene regulation, stress response, and virulence. However, very little is known about non-coding transcripts in Streptococcus pneumoniae (pneumococcus), a Gram-positive human pathogen that is the most common cause of community-acquired pneumonia and a leading cause of meningitis, sinusitis, chronic bronchitis, and otitis media. Projects will involve using open source tools such as REpseek and TRedD with the S. pneumoniae TIGR4 genome and will reconcile the expression of the repeat regions identified by these tools with the expression observed by genomic tiling.
Analysis and Application of Transposable Elements
These projects focus on the analysis and application of transposable elements (TEs) in a variety of taxa. Transposable elements are DNA sequences that can move about within a genome and/or make copies of themselves. They are ubiquitous in eukaryotes, sometimes making up 50% or more of the genome. Currently, we are investigating TE dynamics in bats, crocodilians, flies, primates, lizards, and rodents. The specific objectives in each case vary. Research projects are ongoing in the areas of bats, crocodilians, and flies. Students would be involved in the analysis of large amounts of sequence data from selected genomes and would be asked to use a variety of software packages to identify known transposable element families or to query genomes for de novo TE identification.
Computational Identification of Reassortment and Recombination in H1N1 Influenza
As of September 27, 2009, a novel swine-origin H1N1 strain had spread globally, causing an ongoing pandemic with more than 340,000 laboratory-confirmed cases and 4,100 deaths (www.who.int). As a negative-strand RNA virus, influenza A virus is notorious for its rapid mutation and frequent reassortment. This pandemic raises two major concerns: (a) rapid mutation could render the current vaccine ineffective; (b) H1N1 could reassort with other influenza viruses. This project proposes a novel computational method for rapid identification of reassortment and recombination based on partially supervised fuzzy clustering. The algorithm centers on a cost function optimization scheme that uses statistical properties of the data population, which could serve as a statistical assessment of the genotype.
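The proposed method is not specified in detail here, so the following is only a generic sketch of one simple form of partially supervised fuzzy clustering: standard fuzzy c-means in which points with a known label keep a fixed, crisp membership while unlabelled points receive soft memberships. The feature vectors, the function names, and the clamping strategy are assumptions for illustration. The sketch also assumes every cluster has at least one labelled point.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def objective(points, centers, u, m=2.0):
    """Fuzzy c-means cost: sum over points and clusters of u^m * squared distance."""
    return sum(
        u[i][c] ** m * sq_dist(p, ctr)
        for i, p in enumerate(points)
        for c, ctr in enumerate(centers)
    )

def fuzzy_cmeans_partial(points, labels, n_clusters, m=2.0, n_iter=50):
    """labels[i] is a cluster index for a labelled point, or None otherwise.
    Returns (centers, memberships)."""
    n, dim = len(points), len(points[0])
    # Crisp memberships for labelled points, uniform for unlabelled ones.
    u = [
        [1.0 / n_clusters] * n_clusters if lab is None
        else [1.0 if c == lab else 0.0 for c in range(n_clusters)]
        for lab in labels
    ]
    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership^m.
        centers = []
        for c in range(n_clusters):
            w = [u[i][c] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(wi * p[d] for wi, p in zip(w, points)) / total for d in range(dim)]
            )
        # Membership update, applied to unlabelled points only.
        for i, lab in enumerate(labels):
            if lab is not None:
                continue
            d2 = [sq_dist(points[i], ctr) for ctr in centers]
            if min(d2) == 0.0:
                # The point coincides with a center: assign it fully.
                zero = d2.index(0.0)
                u[i] = [1.0 if c == zero else 0.0 for c in range(n_clusters)]
                continue
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            s = sum(inv)
            u[i] = [v / s for v in inv]
    return centers, u

# Toy one-dimensional features (hypothetical, for illustration only).
points = [(0.0,), (0.2,), (0.1,), (5.0,), (5.2,), (5.1,)]
labels = [0, None, None, 1, None, None]
centers, memberships = fuzzy_cmeans_partial(points, labels, n_clusters=2)
cost = objective(points, centers, memberships)
```

In a reassortment setting, a sequence whose genome segments receive high membership in different lineages' clusters would be a candidate reassortant, though how sequences are turned into feature vectors is left open here.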