2022 BRITE REU Faculty Projects

Ecological forecasting
Michael Dietze
Department of Earth and Environment

Background: For the past couple of years, we have been running daily forecasts of carbon and water fluxes across several field sites maintained by the National Ecological Observatory Network (NEON). These forecasts have an hourly resolution and a 35-day duration, so for any point in time we can compare observations to predictions made 1 to 35+ days in advance. The aim is to better understand the predictability of the carbon and water cycles in general and our ability to anticipate ecosystem stress in particular, as well as to test model assumptions and hypotheses.
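As a rough illustration of comparing observations to predictions made at different lead times, the sketch below groups forecast errors by how many days in advance each prediction was issued and reports a root-mean-square error per lead time. It is only a minimal example under stated assumptions: the function name `rmse_by_lead_time`, the tuple layout, and the use of RMSE as the skill measure are all hypothetical and are not taken from the lab's forecast system.

```python
from collections import defaultdict
from math import sqrt


def rmse_by_lead_time(forecasts, observations):
    """Summarize forecast error as a function of lead time.

    forecasts: iterable of (issue_day, target_day, predicted_value) tuples
               (hypothetical layout chosen for this example).
    observations: dict mapping target_day -> observed value.
    Returns a dict mapping lead time in days -> RMSE, ordered by lead time.
    """
    squared_errors = defaultdict(list)
    for issue_day, target_day, predicted in forecasts:
        if target_day not in observations:
            continue  # no observation yet for this target day, so skip it
        lead = target_day - issue_day
        squared_errors[lead].append((predicted - observations[target_day]) ** 2)
    return {
        lead: sqrt(sum(errs) / len(errs))
        for lead, errs in sorted(squared_errors.items())
    }


# Toy data: two forecasts issued on day 0 and one issued on day 1.
example_forecasts = [(0, 1, 2.0), (0, 2, 4.0), (1, 2, 3.5)]
example_observations = {1: 1.0, 2: 3.0}
example_result = rmse_by_lead_time(example_forecasts, example_observations)
```

Plotting such a summary against lead time is one simple way to see how quickly predictability decays over the 35-day horizon.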

Objective and Approach: This summer our goal is to assimilate new data streams (both remote sensing and tower collected) into the forecast system. To constrain our forecasts, we currently assimilate NASA’s MODIS LAI and the carbon and water fluxes from the NEON towers. This summer we aim to add some combination of forest biomass (from NASA’s GEDI lidar and ground measurements), soil moisture (both field and NASA SMAP), soil respiration, soil organic carbon, sapflux, tree growth (dendrometer, tree ring, inventory), or solar-induced fluorescence (NASA OCO-2 and OCO-3). The exact data streams will be prioritized depending on student interest and project needs (the expectation is that a student would tackle 1-2 data streams). Analysis would start at a single site, which would serve as the testbed for scaling up the assimilation to the other sites we are forecasting. Adding data to the assimilation will allow us to assess the value of each data stream in constraining the different components of these cycles and how they affect the predictability of ecosystem processes. Other possible projects, beyond adding data constraints, include: (1) helping to develop an uncertainty partitioning analysis for the forecast system, (2) helping with site selection and workflow integration for scaling up the system from NEON to all of North America, and (3) resurrecting our R Shiny portal for forecast dissemination, visualization, and analysis.

Genetic and lifestyle factors for complex phenotypes
Chunyu Liu
Department of Biostatistics

The Liu lab develops statistical approaches and applies those methodologies to identify genetic and lifestyle factors that influence complex phenotypes. Two projects are available:

1) Mitochondrial DNA (mtDNA) sequencing project: Mitochondria are the powerhouses of human cells, and mtDNA is involved in the major pathway for energy production. Students will have the opportunity to use publicly available software to identify mutations in the mitochondrial genome (mtDNA) from human whole genome sequencing data. In addition, they will have the opportunity to perform association analysis of the mtDNA mutations with cardiovascular disease.

2) Gene expression and alcohol consumption project: Gene expression is the process by which information in a gene is used to generate messenger RNA (mRNA) for protein production. Students will have the opportunity to identify genes that are related to alcohol consumption. In addition, students will explore gene pathway analyses to identify gene networks that are related to alcohol consumption and cardiovascular disease.

Spatial subcellular single-cell analysis
Ruben Dries
Department of Medicine, Computational Biomedicine

The Dries lab develops computational methods to better understand the role of spatial architecture in biological processes such as tissue regeneration and cancer. More specifically, we investigate spatially coherent gene expression patterns, signaling pathways, and how the role of a single cell is defined by its neighboring cells. We have previously developed Giotto, a comprehensive toolbox to process, analyze, and visualize spatial datasets. New cutting-edge spatial technologies that operate at the subcellular level are being developed, and this project involves the re-analysis of such datasets. Spatial data analysis is a multi-faceted field, and students will learn the first steps in image processing, spatial statistics, visualization, and R package development.

Coral symbiosis and environmental conditions
Sara Davies
Department of Biology

The Davies lab is interested in understanding how coral symbiosis and environmental conditions interact to govern coral physiology. Corals are known to host variable algal and bacterial communities that often translate to different levels of thermal tolerance, and these partnerships can vary across different environments. However, the relative roles that spatial and temporal environmental variability play in shaping these complex interactions between partners remain unresolved. This project involves analyzing algal (ITS2) and bacterial (16S) symbiont communities from two species of Caribbean corals collected from Florida, Belize, and Panama. These data will be paired with satellite-acquired environmental data (sea surface temperature, light, nutrients, etc.) to assess how environmental conditions shape symbiont communities across multiple ecologically relevant scales. Students will develop knowledge and skills related to ecology, bioinformatics, spatiotemporal analyses, and statistical analyses.

Alzheimer’s-related quantitative trait loci (the xQTL Project)
Xiaoling Zhang
Departments of Medicine and Biostatistics

The Zhang lab develops and applies statistical and systems biology methods to identify putative causal genetic variants and genes by mapping the effect of genetic variation on molecular quantitative traits. This mapping provides insights into molecular mechanisms as well as novel biomarkers and therapeutic targets for complex human diseases. The genetic analysis of molecular quantitative traits includes gene expression (eQTL), transcript-isoform (splicing-QTL) for coding and non-coding RNAs, and cell-type-specific eQTLs using bulk and single-cell transcriptomics generated from human brains. Our project is part of the ongoing xQTL project, a collaborative effort across the Alzheimer’s Disease Sequencing Project Functional Genomics Consortium, aiming to generate a reference map of Alzheimer’s-related quantitative trait loci (QTLs) (https://www.nia.nih.gov/research/ad-genetics?msclkid=44f67304a7e411ec9085c9c71957cf21).

Combining protein interaction data with single-cell sequencing
Andrew Emili
Department of Biochemistry

The Emili Lab is at the forefront of the field of network biology, in particular when it comes to mapping the complete set of protein interactions that occur in the cell. One of the open questions in this field is how these patterns of interaction vary across different cell types and contexts. This project takes a computational approach to that challenge by combining novel protein interaction data with publicly available single-cell sequencing databases. Students will learn about analysis methods for co-fractionation mass spectrometry (CF/MS) data for inferring protein interactions, as well as methods for processing and analyzing single-cell RNA-Seq data in the R programming language.

Spatial proteomics of lung premalignant lesions
Jennifer Beane
Department of Medicine, Section of Computational Biomedicine

Exposure to cigarette smoke creates a field of injury throughout the entire respiratory tract by inducing a variety of genomic alterations that can lead to an at-risk airway where bronchial premalignant lesions (PMLs) develop. PMLs are precursors of lung squamous cell carcinoma but have variable outcomes, and we lack tools to identify and treat PMLs at risk for progression to cancer. We have previously described molecular alterations in PMLs associated with disease severity and progression, and now we are using imaging mass cytometry to understand disease-associated interactions between the lung epithelium and the microenvironment. As part of this project, students will learn how to analyze imaging mass cytometry data on PMLs. Students will test and evaluate analysis methods to address data quality, clustering, cell-to-cell interactions, and integration with other single-cell genomics data.

Metabolic modeling and the interactions of soil microorganisms
Jennifer Bhatnagar
Department of Biology

The Bhatnagar lab studies the soil microbiome under changing environmental conditions. Soil microorganisms perform a variety of essential roles, acting as plant symbionts, animal pathogens, and free-living decomposers that recycle nutrients and carbon through the biosphere. Yet we know very little about how these organisms interact with each other and their environment. This project explores these interactions by studying the biochemistry of soil microorganisms that is encoded in their genomes. Students will help refine the biochemical reactions predicted from genome-scale metabolic modeling of individual microbial taxa in complex soil communities. This process involves determining which reactions are infeasible, based on the reactions reported from similar organisms. Students will learn about metagenomics, metabolic modeling (Flux Balance Analysis), and data visualization.

Ideally, students would have taken biochemistry.

Quantifying cross-scale interaction in complex natural systems
Ethan Deyle
Department of Biology

Many of the tools scientists use to quantitatively study the world were developed for engineered systems and laboratory experiments, where a single cause produces a single effect independent of other variables (“linear separability”). Natural systems, whether individual cells or entire ocean ecosystems, frequently deviate from these expectations. Instead, interactions are often state-dependent, where the action of a cause depends on the context around it (i.e., it depends on the state of other variables). This nonlinear state-dependence can interfere with the comfortable, correlative approaches to studying systems, but also presents rich opportunities. This project will center on applying nonlinear causal inference to identify interactions between scales of complexity in natural systems (e.g., single fish populations and ecosystem functioning, or single-cell expression and organism physiology). Options are available to focus on applied data studies of the mass death of corals on Florida’s Coral Reef, recovery trajectories for tropical rainforest and coral reef communities, aquatic food webs, or marine fishery management. It is also possible to focus entirely on numerical simulation data. Students will gain hands-on experience in data processing, non-parametric statistics, and time-series analysis using R or Python (based on preference). Previous coding experience is not a strict requirement but will affect the scope of the project.

Genetic variation and linkage to phenotype
Gary Benson
Dept. of Biology, Dept. of Computer Science

The Benson lab develops algorithms and software for biological sequence comparison and repeat detection in genomic sequences. The focus is understanding the occurrence and functional effects of tandem repeats (TRs), especially those with variable copy number, also known as variable number of tandem repeats (VNTRs). The lab has developed an analysis tool, VNTRseek, to identify VNTRs using high-throughput sequencing data, but it is limited to those TRs that fit within a sequencing read. This project will develop new algorithmic and statistical methods to permit detection of longer VNTRs and the use of longer-read sequencing technologies. Additionally, an online database will be created to store and analyze the variant data. Students will gain knowledge in human genetic variability and DNA repeats, and skills in analyzing high-throughput sequencing data, algorithm design and testing, and database development.
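To make the idea of tandem repeat copy number concrete, the sketch below counts the longest run of back-to-back exact copies of a repeat unit in a sequence. It is a toy illustration only: the function name `longest_tandem_run` is hypothetical, it assumes exact matching of a known unit, and it is not how VNTRseek works.

```python
def longest_tandem_run(sequence, unit):
    """Find the longest run of back-to-back exact copies of `unit`.

    Returns (start_index, copy_count); (-1, 0) if the unit never occurs.
    Assumes exact matches only, with no mismatches or indels.
    """
    if not unit:
        raise ValueError("unit must be non-empty")
    best_start, best_copies = -1, 0
    unit_len = len(unit)
    # Try every start position; a longer run can begin inside a shorter one
    # when the unit overlaps itself, so positions are not skipped.
    for start in range(len(sequence) - unit_len + 1):
        copies = 0
        pos = start
        while sequence.startswith(unit, pos):
            copies += 1
            pos += unit_len
        if copies > best_copies:
            best_start, best_copies = start, copies
    return best_start, best_copies


# Toy sequence containing three consecutive copies of "CAG".
example_run = longest_tandem_run("GGCAGCAGCAGTT", "CAG")
```

A read that covers only part of a repeat array would report fewer copies than are actually present, which is one way to picture why repeats longer than a sequencing read are hard to genotype.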