REU Faculty Past Projects

2021

Gary Benson
Departments of Biology and Computer Science

Genetic variation and linkage to phenotype

Ethan Deyle
Department of Biology

Quantifying cross-scale interaction in complex natural systems

Josee Dupuis
Department of Biostatistics

Fine-mapping of genetic loci for quantitative traits

W. Evan Johnson
Departments of Biostatistics and Medicine

Profiling human microbial communities

Jennifer Bhatnagar
Department of Biology

Ecological forecasting: Predicting changes in soil microorganisms

Michael Dietze
Department of Earth and Environment

Near-term ecological forecasting

Joshua Campbell
Department of Computational Biomedicine

Single cell transcriptomics

Sarah W. Davies
Department of Biology

Genes and pathways regulating symbiosis in corals

Karen Allen
Department of Chemistry

Determining modes of protein-membrane interaction

Prasad Patil
Department of Biostatistics

Multi-Study Feature Selection

Chunyu Liu
Department of Biostatistics

Genetic and Life Style Factors for Complex Phenotypes

Cynthia Bradham
Department of Biology

Identifying Cell-Types Across Treatments in Single-cell RNA Sequencing Data

2019

Joshua Campbell
Department of Medicine
Division of Computational Biomedicine

Single Cell Sequencing Analysis

Sarah Davies
Department of Biology

Coral symbiosis and Climate Change

W. Evan Johnson
Departments of Medicine and Biostatistics

Tools and software for profiling microbial communities in multiple human diseases

Paola Sebastiani
Department of Biostatistics

Biomarkers of Healthy Aging

Jennifer Bhatnagar
Department of Biology

Predicting changes in the Earth microbiome; near-term ecological forecasting of critical soil microorganisms

Andrew Emili
Department of Biochemistry

Biomolecular Interactomes

2018

Gary Benson
Departments of Biology and Computer Science

Algorithms for detecting genetic variation

Michael Dietze
Department of Earth & Environment

Near-term forecasting of ecological processes: integrating multiple data streams and models

Kirill Korolev
Department of Physics

Temporal and spatial variation of microbiota in inflammatory bowel disease

Paola Sebastiani
Department of Biostatistics

Biomarkers of Healthy Aging

Jennifer Bhatnagar
Department of Biology

Predicting changes in the Earth microbiome; near-term ecological forecasting of critical soil microorganisms

Joe Zaia
Department of Biochemistry

Glycan Abundance Imputation through Clustering and Machine Learning

High-throughput differential gene expression database

2016

Daniel Segrè, PhD

Synthetic ecology of microbes

Synthetic ecology of microbes is a young, fast-developing research area concerned with the design, construction and understanding of engineered microbial consortia. The idea of designing microbial consortia is inspired by the ubiquitous presence of microbial communities on our planet, and the key role that these communities play in many aspects of human life, including biogeochemical cycles, animal and plant physiology, and metabolic engineering. Synthetic ecology may allow us to perform specific tasks by understanding and embracing – rather than avoiding – properties that seem often inherent in the natural microbial world, such as diversity, resilience, competition for resources, division of labor, and obligate interdependence. Moreover, an engineered community of organisms may perform tasks that no individual species could possibly perform on its own. In recent years, the Segrè lab has pioneered new computational methods for studying metabolic dynamics in natural and engineered microbial ecosystems, based on the knowledge of all metabolic reactions encoded in an organism’s genome.

One of these approaches, an open source platform for the Computation of Microbial Ecosystems in Time and Space (COMETS), has been successfully tested on small synthetic communities. The specific project suggested for this summer program will involve simulating computationally and testing experimentally the interaction between different bacterial species on a Petri dish. In particular, based on available literature data on co-occurring species in natural microbial communities, and on predictions previously generated in the Segrè lab, the student will select two or more species to use in the experiments. COMETS simulations will predict the growth of the species on their own and in co-culture, including possible changes in colony morphology. The predictions will be compared with experimental measurements of colony growth on a Petri dish, using an established protocol for automated acquisition and analysis of images taken with a regular slide scanner connected to a computer and an Arduino microcontroller. Thus the student will have a chance to learn about genome-scale models of artificial microbial communities, and to test predictions using a simple but quantitative and instructive experimental setup. The project will have the potential to probe mechanistically putative inter-species microbial interactions, and can be easily extended to multiple organisms and conditions.
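The comparison between growth alone and growth in co-culture can be sketched with a toy model. This is not COMETS and does not use genome-scale metabolic models: it is a minimal sketch that assumes two species competing for a single shared nutrient with Monod growth kinetics, integrated with a simple Euler step. The function name `simulate_growth` and all parameter values are hypothetical and chosen only for illustration.

```python
def simulate_growth(max_rates, half_saturations, initial_biomass,
                    initial_nutrient, biomass_yield=0.5, dt=0.01, steps=5000):
    """Euler integration of Monod growth for species sharing one nutrient.

    Returns the final biomass of each species and the remaining nutrient.
    """
    biomass = list(initial_biomass)
    nutrient = initial_nutrient
    for _ in range(steps):
        # Biomass gained by each species during this time step (Monod kinetics).
        growth = [mu * nutrient / (k + nutrient) * b * dt
                  for mu, k, b in zip(max_rates, half_saturations, biomass)]
        # Nutrient needed to support that growth.
        demand = sum(growth) / biomass_yield
        if demand > nutrient and demand > 0:
            # Not enough nutrient left: scale all growth down proportionally.
            scale = nutrient / demand
            growth = [g * scale for g in growth]
            demand = nutrient
        biomass = [b + g for b, g in zip(biomass, growth)]
        nutrient -= demand
    return biomass, nutrient


if __name__ == "__main__":
    # Hypothetical species: A grows faster than B on the same nutrient.
    alone_a, _ = simulate_growth([1.0], [0.5], [0.01], 10.0)
    alone_b, _ = simulate_growth([0.6], [0.5], [0.01], 10.0)
    together, _ = simulate_growth([1.0, 0.6], [0.5, 0.5], [0.01, 0.01], 10.0)
    print("A alone:", round(alone_a[0], 3), "B alone:", round(alone_b[0], 3))
    print("Co-culture A, B:", [round(b, 3) for b in together])
```

In this sketch each species reaches a lower final biomass in co-culture than alone because both draw on the same nutrient pool; predictions of this kind are what would be compared against measured colony growth.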

Gary Benson, PhD

Bit-Parallel Sequence Alignment Algorithms for Tandem Repeat Detection

Mark A. Kon, PhD

Work on copy number versus gene expression biomarkers in cancer

The use of copy number information in the analysis, separation, and prognostication for cancer subtypes is becoming a common approach in cancer biomarker analysis. However, the comparison of predictions arising from copy number biomarkers in tissue samples to those arising from gene expression information has had some difficulties. Among these is the fact that these two information subtypes are quite incompatible in their formats. Our group has developed a formatting procedure for copy number information that makes it similar in format to gene expression information. This will allow the importation of a large number of gene expression analysis tools to the study of copy number information, now in a parallel fashion. One of the goals of this project will be to analyze the application of both toolkits (gene expression and copy number) to subtyping and outcome prediction in cancer, in order to compare their effectiveness as well as find whether they synergize as predictive tools. A second goal will be to determine the methods’ usefulness in unsupervised learning. This involves the discovery of cancer subtypes from larger groups of cancer samples using clustering and related methods. The use of parallel data formats for both copy number and gene expression data may have some interesting implications for such subtyping.

This project will initially involve the student’s development of skills in implementing machine learning programs for prediction of subtypes and outcomes based on feature vectors involving genomic and/or copy number information. Once the skills involving tools such as support vector machines and random forests have been developed, the student will be expected to apply these tools to predict outcomes of ovarian and other cancer classes based on the two information types. There will be a need for computational skills and for mathematical understanding of basic concepts. In addition to the application of machine learning to the prediction of cancer outcomes and identification of subtypes, the participant will be expected to attend the laboratory meetings of students/postdocs affiliated with the DeLisi group at BU, which is involved largely with uses of machine learning in computational biology.
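The comparison of the two information types can be sketched as follows. This is a minimal illustration, not the group's method: it swaps in a nearest-centroid classifier with leave-one-out cross-validation in place of a support vector machine or random forest so that it runs with the Python standard library alone. The sample values, labels, and function names are invented for illustration, and the copy number values are assumed to be already formatted like expression values so that the two feature sets can be concatenated.

```python
import math


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, value in enumerate(features):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, math.inf
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = math.dist(centroid, x)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(xs, ys):
    """Leave-one-out cross-validated accuracy of the nearest-centroid rule."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Invented toy data: six samples, two hypothetical subtypes.
labels = ["A", "A", "A", "B", "B", "B"]
expression = [[1.0, 0.2], [1.2, 0.1], [0.9, 0.3],
              [0.2, 1.0], [0.1, 1.1], [0.3, 0.9]]
copy_number = [[2.0], [2.1], [2.9], [3.0], [3.1], [2.2]]
# Parallel formats allow the two feature sets to be concatenated per sample.
combined = [e + c for e, c in zip(expression, copy_number)]

if __name__ == "__main__":
    for name, data in [("expression", expression),
                       ("copy number", copy_number),
                       ("combined", combined)]:
        print(name, loo_accuracy(data, labels))
```

Running the same evaluation on each feature set and on their concatenation is one simple way to ask whether the two data types are comparably effective and whether they synergize.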

Karen Allen, PhD

HALOALKANOIC ACID DEHALOGENASE SUPERFAMILY (HADSF)

The HAD superfamily is a large enzyme family (~120,000 non-redundant sequences) of phosphotransferases (phosphomutases, ATPases and phosphatases) represented in all three kingdoms of life, and, within each cell, by numerous homologues (28 in E. coli; 35 in Salmonella typhimurium; 31 in Pseudomonas aeruginosa; 30 in Mycobacterium tuberculosis; 84 in Caenorhabditis elegans; 169 in Arabidopsis thaliana; and 183 in human). Approximately 80-90% of the HAD superfamily members are phosphatases and it is estimated that 40% of the bacterial metabolome is comprised of phosphorylated compounds. Although the HADSF fold is dominant among eukaryotic and prokaryotic phosphatases, it has yet to be truly exploited for inhibitor discovery. Such inhibitors would be invaluable to discovery and study of metabolic pathways involving phosphorylated metabolites. This is in contrast to the phosphotyrosine phosphatase family of phosphatases, for which great progress has been made in inhibitor design and focused library-screening-based discovery. To date there are just two reports of HADSF phosphatase inhibitor discovery. We aim to focus on targeting the region of the HAD proteins responsible for phosphoryl-group binding (contributed by the catalytic domain). This may have the added benefit of producing a more generalizable phospho-mimetic, one of the “holy grails” of phosphoryl transfer.

In order to identify such a mimetic, we will leverage a number of atomic resolution (~1 Å) structures of HAD members liganded to transition-state analogues. These enzymes invariably form a trigonal bipyramid with the phosphoryl group together with an apical ligand from the nucleophilic aspartate in the phosphatase. The REU student will utilize this data to make a template molecular model of an inhibitor scaffold defined by hydrogen bond donors and acceptors that ignore the phosphorus atom itself. This model scaffold will then be utilized to mine databases of known binding fragments and inhibitors. The student will also utilize chemi-informatics and protein mapping algorithms in order to analyze the chemical diversity of “hits” and the match to the biophysical properties of the corresponding binding sites. Ultimately, the compounds will be tested experimentally on a set of HAD phosphotransferases for inhibitory activity and successful compounds will be studied for binding mode by obtaining X-ray crystal structures of the complexes with the HAD enzymes. Through these studies, students will gain exposure to chemi-informatic library searching and analysis, in silico docking and structural analysis (with the possibility of experimental kinetics and structure analysis).

Douglas Densmore, PhD

Living Computing Project (LCP)

The Living Computing Project (www.programmingbiology.org) investigates computing paradigms in living organisms. Specifically, it explores if digital, analog, memory, and communication concepts can be implemented in cellular environments. Understanding if quantitative approaches and standardized metrics can broadly be applied to these systems will help us develop the formalized mechanisms we can use to specify, design, and verify these systems. Solutions in medicine, materials, sensing, and manufacturing will be able to be more easily created, efficiently implemented, and broadly distributed if computing paradigms are found to be applicable.

The 10 UROP students for the summer of 2016 will aid in this research. Specifically, each will be involved in one of four efforts:

2016 Boston University Wet Lab iGEM Team – Four Students – These students will take basic DNA building blocks and assemble them into genetic circuits. These circuits will act as either digital or memory based computing elements. The students will then characterize these circuits to extract quantitative metrics related to their performance. This data will be archived physically and electronically along with the biological DNA information to begin to curate a library of computational components for the LCP. These components will be housed in the LCP Inventory of Composable Elements (ICE), the Synthetic Biology Open Language (SBOL) Stack, and BTSync (for flow cytometer data). This collection of information will be used to augment existing design software to predict the performance of future circuits and search for optimized designs. This project will require molecular biology skills and bioinformatics analysis. These students will be supervised by BME graduate student Divya Israni.

2016 Boston University Hardware iGEM Team – Three Students – These students will be creating a microfluidic design environment to automate the testing of genetic logic circuits. The fabrication, control, and data extraction for this platform will be automated with the use of software tools. The team will be creating a genetic system that interfaces with off-the-shelf sensors and hardware so that it can be quickly reconfigured for numerous environments and designs. It will consist of a set of input locations, intermediate locations, switch fabric, and output locations. This will allow for a generic device that is differentiated experiment by experiment. This project will involve embedded systems design, CNC milling, 3D printing, and software programming. These students will be supervised by ECE graduate student Ryan Silva.

Phagebook and CIDAR Software – Two Students – Synthetic biology software includes tools for specification, design, assembly, verification, and data management activities. CIDAR lab (www.cidarlab.org) has numerous software packages that need to be made more robust, more user-friendly, or more widely tested. These include a design environment for functional specification and assembly of genetic circuits (Phoenix) as well as a social media platform for synthetic biology (Phagebook). This project will require web design, Java/JavaScript, cloud computing, and database skills. These students will be supervised by ECE graduate student Prashant Vaidyanathan.

Living Computing Project Research Intern – One Student – This student will work on fundamental research questions related to models of computation in synthetic biology (e.g. state machines, data flow networks) and how they can be formalized and assigned to biological elements. This project will require computational interests, programming skills, and some computer science exposure. This student will be supervised by ECE Research Assistant Professor Dr. Swapnil Bhatia.

Jennifer Talbot, PhD

Understanding variation in microbial community composition in both space and time

New DNA sequencing technology has fundamentally transformed our understanding of microbial communities. We can now rapidly census the species composition of microbial communities in complex systems like soil, and relate them to the spatial and environmental factors that structure these communities. This technology has enabled us to test classic ecological theories about how microbial communities change across space (Fierer & Jackson, 2006). However, little work has characterized how microbial communities change over time (Shade et al., 2013). Filling this knowledge gap will increase our chances of accurately forecasting how microbial systems will respond to disturbance in the future. This is a critical need in Earth system science, because we are beginning to find that specific microbial species have unique activities in the cycling of elements and energy within the biosphere. Nevertheless, to date there is no work testing the relative importance of space vs. time in shaping microbial community composition and activity.

Approach and learning outcomes
We propose a meta-analysis approach to determining the time vs. space variation in microbial community composition. To do this, we will collect DNA sequence data from already identified publications that have resolution in both space and time. This sequence data comes from high-throughput sequencing platforms (e.g. 454-pyrosequencing, Illumina MiSeq runs) that generate Gb of sequence data for each sample set for a publication.

Once collected, this data will be analyzed for community composition using the QIIME bioinformatic pipeline (Caporaso et al., 2010) through the BU SCC. The community data will then be used to develop a statistical model that partitions the variance in community composition data on orthogonal axes of space and time.
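The idea of partitioning variance between space and time can be illustrated with a minimal sketch. It assumes a balanced design in which a single community summary (for example, a diversity index) is available for every site on every sampling date, and it uses a plain two-way sum-of-squares decomposition rather than the statistical model the project would develop. The function name `partition_variance` and the data values are hypothetical.

```python
from statistics import mean


def partition_variance(table):
    """Split the total sum of squares of a sites x dates table into
    fractions attributable to space (rows), time (columns), and residual.

    Assumes one value per site and date (balanced, no replication).
    """
    n_sites = len(table)
    n_dates = len(table[0])
    grand = mean(v for row in table for v in row)
    site_means = [mean(row) for row in table]
    date_means = [mean(table[i][j] for i in range(n_sites))
                  for j in range(n_dates)]

    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    if ss_total == 0:
        raise ValueError("no variation to partition")
    ss_space = n_dates * sum((m - grand) ** 2 for m in site_means)
    ss_time = n_sites * sum((m - grand) ** 2 for m in date_means)
    ss_residual = ss_total - ss_space - ss_time

    return {"space": ss_space / ss_total,
            "time": ss_time / ss_total,
            "residual": ss_residual / ss_total}


if __name__ == "__main__":
    # Hypothetical diversity index: rows are sites, columns are sampling dates.
    diversity = [[2.0, 2.2, 2.1],
                 [3.0, 3.3, 3.1],
                 [4.1, 4.2, 4.0]]
    for component, fraction in partition_variance(diversity).items():
        print(component, round(fraction, 3))
```

In this invented example the sites differ far more than the dates, so most of the variation is assigned to space. Real community data are multivariate, so the project's model would need to work on full composition tables rather than a single index.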

We seek an REU student to collect DNA sequence data published online, work with the data through bioinformatics pipelines, and analyze the data using statistical software (R). The project will involve development of coding skills using Jupyter Notebooks and statistical training in analysis and visualization of microbial community composition data.

A successful project will result in training in:
1) Front-to-back analysis of large DNA sequence-based microbial community datasets;
2) Analysis and visualization of data using R software packages (e.g. ggplot2);
3) Communicating results in a presentation and a draft manuscript by the end of the internship.


2017

Data-Driven Methods for Automated Design of Lab on a Chip Devices

An integrated genomics information resources platform

Temporal and spatial variation of microbiota in inflammatory bowel disease

A structural map of the human genome

High-throughput differential gene expression database
