BMSIP Projects 2025
Alternative splicing in Alzheimer’s Disease, PacBio
PI: Uwe Beffert
Intern: Rachel Bozadjian
Neurons rely on precise alternative splicing to generate diverse protein isoforms essential for function and survival. In Alzheimer’s disease (AD), dysregulation of splicing can lead to aberrant protein expression, affecting neurodegeneration pathways. Long-read sequencing technologies, such as PacBio Iso-Seq, allow for full-length transcript identification, making them well-suited for studying alternative isoform expression in AD. This project will analyze PacBio long-read RNA sequencing data from postmortem AD and control brains, providing insights into transcriptional and splicing differences that may contribute to disease pathology.
Despite extensive research on AD, the role of alternative splicing in neurodegeneration remains unclear. Most transcriptomics studies use short-read sequencing, which struggles to resolve full-length isoforms accurately. This project will leverage PacBio long-read sequencing to identify AD-specific splicing patterns, uncovering novel transcript variants that may contribute to disease mechanisms. By applying bioinformatics approaches to detect isoform-level expression differences, we aim to generate hypotheses about disease-relevant transcript isoforms that could serve as biomarkers or therapeutic targets.
Ligand-Receptor interactions in Alzheimer’s Disease, AlphaFold
PI: Uwe Beffert
Intern: Riya Jadhav
Alzheimer’s disease (AD) is influenced by key extracellular proteins that regulate synaptic plasticity and neuronal function. Two such proteins, Reelin and apolipoprotein E (apoE), interact with members of the LDL receptor family, including ApoER2, VLDLR, and LDLR. These interactions influence synaptic signaling and tau phosphorylation, both of which are implicated in AD pathogenesis. Understanding how Reelin and apoE interact with their receptors—and whether they synergize or compete for binding sites—could provide insights into disease mechanisms and therapeutic targets.
This project aims to use AlphaFold-Multimer (AF-M) to computationally predict ligand-receptor interactions between Reelin, apoE, and their receptors. Experimental data suggest that Reelin binds ApoER2 and VLDLR strongly, but poorly to LDLR. However, the accuracy of AF-M in predicting these affinities remains unclear. By comparing AF-M predictions to known experimental binding affinities, we will assess its utility for modeling biologically relevant interactions and explore whether apoE and Reelin compete or cooperate at receptor binding sites.
TCAB1 Mutations in Pediatric Osteosarcoma
PI: Rachel Flynn
Intern: Sydney Sorbello
Telomeres protect chromosome ends and shorten with each cell division, leading to cellular aging or death when critically short. Cancer cells bypass this limit using one of two telomere-elongation mechanisms: telomerase reactivation (common in ~90% of cancers) or the alternative lengthening of telomeres (ALT) pathway, which relies on homologous recombination and is prevalent in certain tumor types like osteosarcoma.
ALT tumors often harbor mutations in chromatin remodeling and DNA repair genes. However, these mutations alone do not fully explain ALT activation. Recent evidence suggests that simultaneous disruption of the telomerase RNA component (hTR) and other key regulators may be necessary to trigger ALT.
The telomerase holoenzyme includes hTR and a scaffold of additional proteins necessary for processing and trafficking. Loss of one such component, TCAB1, compromises telomerase activity and is implicated in telomere disorders. Notably, TCAB1 overlaps with TP53, a tumor suppressor gene frequently mutated in cancers, especially osteosarcoma. Preliminary analysis of ALT-positive osteosarcoma tumors revealed frequent deletions or reduced expression of both TCAB1 and TP53, suggesting that alterations in this genomic region may promote ALT activation.
Hypothesis:
Functional inactivation of TCAB1 is an early genetic event in ALT pathway activation.
Specific Aim:
Define TCAB1 complex loss of function in osteosarcoma.
Using whole-genome sequencing of five osteosarcoma tumors, the project aims to investigate whether genetic alterations at the TCAB1/TP53 locus lead to the loss of hTR function and contribute to ALT activation.
GPS2-mediated signaling and mitonuclear contact sites
PI: Sahana Mitra
Intern: Tyler Kwok
Recent studies suggest that mitochondria communicate with the nucleus through mitonuclear contact sites, which are essential for the mitochondrial retrograde response (MRR). Disruption of these sites may contribute to cancer and neurodegeneration. While TSPO has been identified as a component of these sites in cancer cells, other tethering molecules in humans remain largely unknown. Evidence from Toxoplasma gondii and our own unpublished data suggests that the nuclear pore complex (NPC) may mediate contact sites and GPS2-driven retrograde signaling independently of TSPO.
Project Objective:
Identify components of mitonuclear contact sites required for GPS2-mediated retrograde signaling.
We use a GFP-based reporter assay to screen for candidate tethering proteins (e.g., TOMMs, NUPs). While effective, the method is time-consuming. This project aims to develop an automated image analysis pipeline to quantify contact site formation across large-scale screens.
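As a rough illustration of what one step of an automated quantification pipeline could look like, the sketch below counts bright regions in a single-channel image by thresholding and connected-component labeling. It rests on assumptions not stated in the project description: that reporter signal appears as discrete bright puncta, that a fixed intensity threshold is adequate, and that the image is a 2D grid of intensities. The function name and parameters are hypothetical.

```python
from collections import deque
from typing import List


def count_puncta(image: List[List[float]], threshold: float, min_pixels: int = 1) -> int:
    """Count 4-connected groups of pixels whose intensity exceeds the threshold."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill over one bright region.
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Ignore regions smaller than min_pixels, which are likely noise.
            if size >= min_pixels:
                count += 1
    return count
```

A production pipeline would more likely rely on an established image analysis library and per-cell segmentation; this pure-Python version only shows the counting logic.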
A differential gene expression approach to investigating the maintenance of phenotypic variation in a color polymorphic beetle
PI: Lynette Strickland
Intern: Katherine Kitrick
This project focuses on Chelymorpha alternans, a Neotropical beetle with five genetically determined color morphs. Typically, two to three phenotypes coexist within a population, maintained by ecological and evolutionary pressures across life stages. Larval survival may depend on environmental or host plant factors, while adult predation, influenced by predator learning, may also shape phenotype frequencies.
Although aposematic species are expected to converge on a single warning signal, many — like C. alternans — retain phenotypic variation. One possible explanation is variation in chemical defense: some morphs may be better at acquiring or storing plant toxins.
Genomic resources available (RAD-seq, reference genome, RNA-seq) enable exploration of this question.
Project Objective:
Use differential gene expression analysis to identify genes involved in toxin sequestration and assess whether their expression varies across color phenotypes of C. alternans.
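The core quantity in a comparison between two color morphs is a per-gene effect size. The sketch below computes a log2 fold change from replicate expression values. It assumes the values are already normalized and uses a pseudocount to avoid taking the log of zero. The function name and the input layout (gene name mapped to a list of replicate values) are hypothetical, and a real analysis would use a dedicated statistical package that models count dispersion and corrects for multiple testing.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def log2_fold_changes(
    expr_a: Dict[str, Sequence[float]],
    expr_b: Dict[str, Sequence[float]],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Per-gene log2(mean_b / mean_a) for genes present in both morphs."""
    shared = expr_a.keys() & expr_b.keys()
    return {
        gene: math.log2(
            (mean(expr_b[gene]) + pseudocount) / (mean(expr_a[gene]) + pseudocount)
        )
        for gene in shared
    }
```

A positive value means higher mean expression in morph B than in morph A; genes measured in only one morph are dropped.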
Measuring kinetic rates of synthetic transcription factors via PRO-seq data
PI: Mo Khalil
Intern: Nicholas White
Previously, the Khalil lab developed a framework for constructing synthetic transcription factors (sTFs) in S. cerevisiae. In this follow-up work, second-generation sTFs will be examined for potential kinetic differences in mRNA transcription. To capture transcription with fine temporal resolution, precision run-on sequencing (PRO-seq) will be used, which offers single-nucleotide resolution of nascent RNA 3′ ends.
In “A Synthetic Biology Framework for Programming Eukaryotic Transcription Functions” by Khalil et al., a framework for engineering zinc finger proteins to be modular and tunable was demonstrated. These sTFs are decomposed into a DNA-binding domain, which provides promoter specificity; an activation domain, which recruits the transcription machinery; and an additional protein-protein interaction domain, which allows cooperative interaction with other transcription factors.
Dose-dependent curves were demonstrated previously, but to obtain more accurate kinetic data for these sTFs, PRO-seq will be used. This relatively new technique effectively stalls transcription mid-elongation by incorporating a biotin-labeled nucleotide. The resulting dataset will be analyzed to determine more accurate kinetics.
Understanding epigenomic landscape underlying microglia cellular states transitions
PI: Lei Hou
Intern: Wenshou He
Microglia (MG) are heavily involved in the pathology of Alzheimer’s disease (AD) and other brain disorders. Previous single-nucleus RNA-seq studies have identified multiple MG cellular states. Some of these states, including inflammatory MG and lipid-loaded MG, are known to be associated with AD. The regulatory mechanisms underlying the transitions among these cellular states, especially epigenomic regulation, are an important topic in the field.
Our hypothesis is that a plastic epigenomic regulatory network governs the stability of MG cellular states and their transitions. We will test this hypothesis by evaluating a series of quantitative models that predict differentially expressed genes between two cellular states and the key regulators that potentially drive the transitions. Investigating this epigenomic regulatory network will enable us to capture key enhancers that would be missed by a TF-gene regulatory network, and to link non-coding variants associated with AD to potential regulatory mechanisms.
Multi-omic Approaches to Building Genome Scale Models of Bacteria from Soil, Marine, and Human Microbiomes
PI: Melisa Osborne
Intern: Benjamin Pfeiffer
Broadly, this project will address the lack of diversity among model organisms for which genome-scale models are available for in silico investigation of microbial communities. More specifically, we will develop pipelines for building genome-scale models of specific organisms using various types of wet-lab data.
Whole-genome sequencing data, analyzed using software tools in KBase (a DOE-organized genomic database) and independent software packages such as eggNOG, CarveMe, RAST, and Prokka, will be used to generate draft genome-scale models/metabolic networks. The COMETS software package will be used to evaluate the performance of these models. Part of the project will involve assessing and applying the best software tools for incorporating additional data types (metabolomic, proteomic, and Tn-Seq data) into model building.
Multi-modal omics integration to study HSC engraftment potential
PI: Ruben Dries
Intern: Anuradha Basyal
Hematopoietic stem cells (HSCs) are a cornerstone of cellular therapy for diseases such as leukemia, inherited metabolic and auto-immune disorders, and hemoglobinopathies. However, ex vivo expansion and genetic editing of HSCs have been shown to lead to loss of engraftment capacity, thereby limiting treatment efficiency. Understanding how to better maintain or enhance engraftment potential is critical and will result in safer and more effective treatments. Previous analysis of human fetal liver (FL)-derived HSCs demonstrated that these cells possess superior engraftment capacity compared to postnatal sources.
Our hypothesis is that the human fetal liver niche harbors a unique set of regulatory instructions that control the repopulating potential of HSCs. We will use various omics datasets to decode the signals orchestrating this unique biological process during human development. We will consider spatial niche composition, signaling pathways, and processes related to 3’ alternative polyadenylation.
Cross-Species Single-Cell Database for Brain Aging and Neurodegenerative Diseases
PI: Chao Zhang
Intern: Sofiya Patra
Alzheimer’s disease (AD) is a progressive neurodegenerative disorder and a growing public health crisis. In the U.S., the number of AD dementia cases is expected to rise dramatically, from 4 million to 14 million by 2050. While aging is the most significant risk factor for AD, it is not synonymous with disease. Healthy aging does not inherently lead to neurodegeneration, yet distinguishing between normal aging and pathological processes remains a major challenge. Understanding this distinction is crucial for developing effective treatments. Animal models, such as monkeys and mice, are widely used to study AD and aging. However, these species do not naturally develop the full spectrum of AD pathology seen in humans. This limitation hinders our ability to accurately model disease progression and identify therapeutic targets. Advancements in single-cell technologies provide an opportunity to analyze aging and AD at a cellular level, offering insights into shared and species-specific mechanisms.
Despite extensive research, the biological mechanisms that distinguish healthy aging from AD remain poorly understood. A major challenge in AD research is the lack of an ideal animal model that fully recapitulates human disease progression. Current studies rely on mice and monkeys, but their genetic and pathological differences from humans limit translational insights. To overcome these limitations, integrating single-cell data across species could provide a more comprehensive understanding of aging and AD. However, existing data integration tools may not be optimized for cross-species comparisons. We propose to develop a novel algorithm to integrate single-cell datasets from humans, monkeys, and mice at different age stages. By comparing our method with existing integration tools, we aim to improve cross-species analysis, uncover conserved and divergent aging processes, and enhance our understanding of AD pathogenesis.
Building a multimodal model for treatment outcome prediction by integrating histology images and gene expression data
PI: Chao Zhang
Intern: Elaine Huang
Ovarian and breast cancers are among the leading causes of cancer-related deaths in women worldwide. The pathological analysis of Whole Slide Images (WSIs) of ovarian and breast tumors plays a crucial role in understanding the progression and classification of these diseases. Advanced medical imaging techniques, combined with computational analysis, have the potential to unlock insights into the morphological characteristics of these cancers, leading to better diagnostic accuracy and personalized treatment strategies. However, the analysis of WSIs is challenged by the sheer size of the images and the complexity of tumor biology, necessitating the development of sophisticated image processing and machine learning algorithms.
This internship aims to tackle the critical challenge of processing and analyzing pathological WSIs of ovarian and breast cancers. The project’s goal is to develop and implement algorithms that can efficiently and accurately identify cancerous tissues, quantify tumor heterogeneity, and predict clinical outcomes. This involves overcoming obstacles such as image segmentation, feature extraction, and the classification of cancer subtypes. By enhancing our capability to analyze WSIs, we aim to contribute to the early detection of ovarian and breast cancers, improve the accuracy of prognosis predictions, and assist in the formulation of personalized treatment plans.
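Because whole slide images are too large to process in one pass, a common first step is to split each slide into fixed-size tiles that downstream models can handle. The sketch below only computes tile coordinates. Tiling itself is an assumption about how the pipeline might be organized, and the function name, tile size, and stride are hypothetical.

```python
from typing import List, Optional, Tuple


def tile_coordinates(
    width: int, height: int, tile_size: int, stride: Optional[int] = None
) -> List[Tuple[int, int]]:
    """Return (x, y) top-left corners of all tiles that fit fully inside the image.

    A stride smaller than tile_size produces overlapping tiles.
    """
    if stride is None:
        stride = tile_size
    if tile_size <= 0 or stride <= 0:
        raise ValueError("tile_size and stride must be positive")
    return [
        (x, y)
        for y in range(0, height - tile_size + 1, stride)
        for x in range(0, width - tile_size + 1, stride)
    ]
```

Partial tiles at the right and bottom edges are skipped here; a real pipeline would also typically discard tiles that contain mostly background before feature extraction.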
Enhancing Interpretability and Generalizability in Alzheimer’s Disease Risk Modeling
PI: Jinying Chen
Intern: Dhruvi Joshi
The development of Alzheimer’s disease (AD) can span many years before symptoms manifest and a clinical diagnosis is made. Available treatments for AD remain controversial and only minimally slow progression. Identifying people at risk of cognitive decline and of progression along the AD continuum may enable earlier intervention and better prognosis via modifiable risk factors, and therefore reduce the disease burden.
Many factors (sociodemographic, genetic, environmental/lifestyle, clinical, and the presentation of AD hallmarks) contribute to the risk of developing AD or AD-related cognitive decline. Advanced machine learning (ML) models can incorporate many features and model complex, non-linear relationships between these features and the outcome. However, they are often criticized for lacking interpretability (i.e., lacking obvious connections between the outcome to predict and some of the features important for the model’s prediction accuracy), which limits their application, especially in clinical settings. In addition, externally evaluating an ML model trained and optimized on one dataset using another dataset can be challenging if the two datasets contain related but different variables. This internship seeks a summer intern to assist in developing an analysis pipeline that supports the development of interpretable ML models for AD risk prediction and the evaluation of these models across datasets.
Deep Learning Approaches for Detecting Epigenetic and Epitranscriptomic Modifications in Cancer
PI: Ignaty Leshchiner
Intern: Sandilya Bhamidipati
There is an ever-growing body of literature showing that modifications to DNA, both genetic and epigenetic, as well as modifications to proteins, are distinct in the setting of cancer and neurological disease. Changes to RNA, especially those that are epitranscriptomic in nature, have not yet been studied extensively. This is because, while the technologies to accurately assess base-pair changes in DNA are well established, the ability to detect native epigenetic and epitranscriptomic changes in DNA and RNA is not yet a robust technology. We are developing deep learning methods that call modifications from both RNA and DNA with higher accuracy from native nanopore-based sequencing and that identify tumor type-specific DNA/RNA modifications. The project will involve training and analyzing the models to improve call accuracy and detect biological changes within samples.
Phylogenetic reconstruction models in tumor biology
PI: Ignaty Leshchiner
Intern: Beatriz Bergamo
Our lab works in computational biology and cancer bioinformatics. Research in the lab is focused on applying new genomic technologies, computational analysis, and AI methods to data from patients’ tumors to understand the biology behind tumor development, treatment evasion, and progression to metastasis. We are developing and applying tools for simultaneous analysis of multiple samples from the same patient, clonal structure, integration of single-cell genomics and transcriptomics, reconstruction of cell subpopulations, their growth kinetics and expression, tumor microenvironment effects, and estimation of the order of events (“timing”) during tumor development and progression. We work with pre- and post-treatment samples, autopsies, and longitudinal blood biopsies in solid and blood malignancies.
Our methodologies allow us to study the sequence of events that occur in developing cells all the way from the normal to the malignant state, at both the single-patient and cohort level.
In the proposed project, we aim to address the premalignant progression knowledge gap by developing and applying computational methodologies to infer progression models and enhance our understanding of the genetic events occurring during the premalignant phase of various cancer subtypes and subclonal progression. By utilizing publicly available and lab generated whole exome and whole genome sequencing data from primary or advanced tumors, we are looking to establish a map of cancer progression for cancers that currently lack detailed information on the premalignant state.
GPS2-mediated signaling and mtUPR pathway
PI: Valentina Perissi
Intern: Xinyu Li
Mitochondria are unique among intracellular organelles in that they contain multiple copies of their own DNA; however, most of the mitochondrial proteome is encoded in the nuclear genome. Coordinated regulation of gene expression across the two interdependent genomes is therefore essential to maintain cellular homeostasis and is ensured by bidirectional communication pathways, referred to as anterograde (nucleus to mitochondria) and retrograde (mitochondria to nucleus) signaling. We identified G-protein Pathway Suppressor 2 (GPS2) as a key mediator of mitochondrial retrograde signaling in mammals and characterized GPS2 chromatin occupancy by ChIPseq in different cell lines, including murine 3T3-L1 preadipocytes treated with mitochondrial stressors. However, it remains unclear how GPS2 is recruited to specific target genes in response to retrograde translocation from stressed mitochondria and whether it acts in synergy with the classic mitochondrial unfolded protein response (mtUPR) pathway. A better understanding of the molecular mechanisms regulating the mitochondrial stress response is critical for understanding the basis of adaptive responses that maintain metabolic homeostasis in mammalian cells, with important implications for studies of diseases with a metabolic component, including not only mitochondrial diseases and metabolic syndromes but also aging and cancer.
The overall goal of this project is to explore the crosstalk between GPS2-mediated retrograde signaling and the mtUPR pathway. Our hypothesis is that different retrograde signaling pathways converge to regulate common target genes and that GPS2 is recruited to chromatin through its interaction with stress-response TFs such as ATF4/ATF5. This hypothesis is based on the following preliminary observations: i) numerous classic ATF4/5 target genes were found among the DEGs regulated by GPS2 upon mitochondria-to-nucleus translocation in 3T3-L1 cells (Cardamone et al., 2018); ii) a direct interaction between ATF4 and GPS2 is reported in BioGRID and was validated by co-immunoprecipitation in HeLa cells. Preliminary analysis of two published ATF4 ChIPseq datasets (performed by a BMSIP student as part of their Summer 2024 internship) supported this hypothesis by identifying several genes occupied by both GPS2 and ATF4. These analyses also revealed that clusters of GPS2/ATF4 target genes, grouped by promoter topology and the relative position of GPS2 and ATF4 binding sites, correlated with defined stress response and metabolic pathways, suggesting an unexpected specificity in the mechanisms of action. We would like to confirm these findings using other published ChIPseq datasets for ATF4 and extend our studies to other stress response factors (such as FOXO and NRF2) that may contribute to GPS2 recruitment to subsets of targets.
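Identifying genes occupied by both GPS2 and ATF4 comes down to finding overlapping peaks between two ChIPseq peak sets. The sketch below shows the interval logic only. It assumes peaks are given as (chromosome, start, end) tuples with half-open coordinates, as in BED files. The function name is hypothetical, and this is not the method used in the preliminary analysis.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), half-open interval


def overlapping_peaks(peaks_a: List[Peak], peaks_b: List[Peak]) -> List[Tuple[Peak, Peak]]:
    """Return every (a, b) pair on the same chromosome that shares at least one base."""
    by_chrom: Dict[str, List[Peak]] = {}
    for peak in peaks_b:
        by_chrom.setdefault(peak[0], []).append(peak)
    pairs = []
    for a in peaks_a:
        for b in by_chrom.get(a[0], []):
            # Half-open intervals overlap when each one starts before the other ends.
            if a[1] < b[2] and b[1] < a[2]:
                pairs.append((a, b))
    return pairs
```

This comparison is quadratic within each chromosome, which is fine for illustration; dedicated interval tools scale better on genome-wide peak sets.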
Preserving neuronal activity in human cortical organoids
PI: Ella Zeldich
Intern: Shivani Pimparkar
Cortical organoids (COs) can be generated from induced pluripotent stem cells (iPSCs). COs contain mature neurons and astrocytes, exhibit spontaneous neuronal activity, and provide an invaluable platform for in vitro study of cell-to-cell interactions that mimic the human brain. A recent study demonstrated that expanding the native oligodendrocyte progenitor cell (OPC) population present in COs results in the generation of oligodendrocyte-containing organoids (OCOs). Within OCOs, oligodendrocytes can undergo functional maturation and actively myelinate neuronal axons. While this approach offers a promising avenue for understanding oligodendrocyte biology in a 3D human cellular system, our preliminary observations suggest that expansion of the native OPC population in OCOs results in diminished neuronal maturation and activity. Thus, functional characterization of mature oligodendrocytes and of the impact of neuronal activity on their biology is limited in the current OCO model. In our ongoing study, we used BrainPhys medium, which is optimized for electrophysiological activity and neuronal maturation, to rescue neuronal maturation in OCOs. By matching the physiological conditions of the CNS, BrainPhys creates a cellular environment that increases synaptic activity during development and consequently enhances neuronal maturation. OCOs were divided into three groups: one receiving BrainPhys prior to oligodendrocyte expansion, a second receiving BrainPhys after expansion, and a third remaining in standard basal medium. Using these three groups, our goal was to determine the conditions that yield both an oligodendrocyte population and preserved neuronal activity, and to assess the impact of enhanced neuronal activity on the maturation of oligodendrocytes.
Using immunohistochemical staining and functional neuronal assessment with calcium imaging, we found that transitioning OCOs to BrainPhys prior to oligodendrocyte expansion generates OCOs with functional neurons, while preserving oligodendrocyte maturation.
We also submitted samples from the three experimental groups for scRNA-seq. The purpose of the analysis is to compare cell populations within OCOs exposed to different media and to assess markers of neuronal maturation and oligodendrocyte development at single-cell resolution. We expect that our scRNA-seq data analysis will uncover the cellular processes underlying the ability of BrainPhys medium to promote neuronal function as well as oligodendrocyte development and maturation, leading to a more advanced organoid model that can be widely used in the biomedical field.
Upon completion of our study, this model could be used to recapitulate known network abnormalities and cellular pathologies in neurodevelopmental disorders. Taken together, our study presents an increasingly comprehensive organoid system capable of modeling neuronal activity, the oligodendrocyte lineage, and neuron-oligodendrocyte crosstalk.
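Comparing cell populations across the three media conditions in the scRNA-seq data could start with a simple composition summary. The sketch below computes the fraction of each cell type within each experimental group. It assumes every cell has already been assigned a group and a cell type label by upstream clustering and annotation. The function name and labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def cell_type_fractions(cells: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Given one (group, cell_type) pair per cell, return per-group cell type fractions."""
    counts: Dict[str, Counter] = {}
    for group, cell_type in cells:
        counts.setdefault(group, Counter())[cell_type] += 1
    return {
        group: {ct: n / sum(c.values()) for ct, n in c.items()}
        for group, c in counts.items()
    }
```

Fractions sum to one within each group, so differences between conditions reflect shifts in composition rather than differences in the number of cells captured.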
Genomics and transcriptomics of the Betta splendens skin
PI: Nelson Lau
Intern: Manseeb Hossain
The Siamese fighting fish, Betta splendens, has a very compact ~450 Mb genome and remarkably diverse skin coloration patterns. We hypothesize that transposons and unique transcriptome signatures may underlie the koi/marble skin coloration pattern in Betta fish.
The project will entail analysis of the gene expression network and genome structural variants in the skin tissues of the model fish Betta splendens. This common pet displays striking variation in skin coloration, which we hypothesize is affected by transposons that either mobilize or express transcripts that modulate the skin transcriptome.
Developmental Pathways in Stem Cell Maintenance, Cancer Progression and Aging
PI: Deborah Lang
Intern: Jacques Dirabou
This project investigates how developmental pathways that support stem cell maintenance are repurposed in adult melanocyte stem cells and subverted during melanoma progression. Our work centers on melanocytes, melanocyte stem cells, and melanoma, using a multidisciplinary approach that integrates cellular, molecular, and genomic tools. We work with protein, RNA, and DNA analyses, along with mouse models and datasets derived from patient populations.
We are addressing two central questions. First, we aim to define the cistromic and transcriptomic targets of transcription factors that play key roles in development, and to understand how these factors are reused in adult stem cells and corrupted in cancer. Second, we seek to identify distinct populations of melanocyte stem cells and examine how these populations differentiate and change over the course of aging.
Our approach combines functional genomics, in vivo and in vitro models, and patient-derived data to dissect the mechanisms underlying stem cell behavior and tumor progression. By exploring how developmental programs are recycled or hijacked, we aim to uncover fundamental insights into both melanoma biology and stem cell aging. This work has potential applications in regenerative medicine and cancer therapy.
Cross-species data Integration for Single-Cell Neural Datasets
PI: Chao Zhang
Intern: Jinglin Han
Alzheimer’s disease (AD) is one of the most pressing public health challenges of our time, with the number of affected individuals in the U.S. expected to rise from 4 million to over 14 million by 2050. Although aging is the greatest risk factor, AD is not an inevitable outcome of aging, and distinguishing between normal aging and neurodegeneration remains a fundamental challenge. One key limitation in current research is the lack of accurate models that fully recapitulate human AD pathology, as commonly used model organisms like mice and monkeys do not naturally develop the full spectrum of AD observed in humans. To address this translational gap, this project proposes a computational strategy to align single-cell data across species and developmental stages.
We aim to develop a graph neural diffusion algorithm tailored to integrate brain-derived single-cell and single-nucleus datasets from humans, mice, and monkeys. These include scRNA-seq, snRNA-seq, and snATAC-seq data, spanning healthy, aged, and AD-affected conditions. The goal is to enable cross-species knowledge transfer at the cellular and molecular level, deepening our understanding of neurodegeneration and aging in humans through better-aligned animal models.
The project will focus on several key tasks. First, we will collect and preprocess publicly available single-cell and single-nucleus datasets, ensuring consistency across samples. Next, we will perform orthologous gene mapping using a combination of strategies, including gene symbol matching, sequence-based alignment, database lookups, and functional similarity approaches. These methods will be evaluated for accuracy and biological relevance.
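Of the mapping strategies listed, gene symbol matching is the simplest to illustrate. The sketch below pairs human and mouse gene symbols case-insensitively, relying on the convention that human symbols are written in upper case while mouse symbols capitalize only the first letter. The function name is hypothetical, and this is only a sketch of one strategy, not the project's pipeline.

```python
from typing import Dict, Iterable, List


def match_by_symbol(
    human_genes: Iterable[str], mouse_genes: Iterable[str]
) -> Dict[str, List[str]]:
    """Map each human symbol to the mouse symbols that match it case-insensitively."""
    mouse_index: Dict[str, List[str]] = {}
    for symbol in mouse_genes:
        mouse_index.setdefault(symbol.upper(), []).append(symbol)
    return {
        human: mouse_index[human.upper()]
        for human in human_genes
        if human.upper() in mouse_index
    }


# Usage: LRP8 has no counterpart in this toy mouse list, so it is left unmapped.
print(match_by_symbol(["APOE", "RELN", "LRP8"], ["Apoe", "Reln", "Vldlr"]))
```

Symbol matching misses orthologs that carry different names across species and cannot resolve one-to-many relationships, which is one reason to combine it with sequence-based and database-driven approaches.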
Following gene mapping, we will design and implement a graph neural diffusion algorithm that integrates data across species while preserving tissue- and modality-specific features. This model will be fine-tuned for neural tissue to optimize biological interpretability. To evaluate performance, we will benchmark our method against leading integration tools such as Seurat, Harmony, fastMNN, and SAMap, as well as more recent deep learning-based tools such as scGPT, CAMEX, and SATURN.
Finally, we will apply the model to real-world neural datasets to assess its effectiveness in transferring biological knowledge across species. This downstream analysis will focus on validating the model in practical scenarios related to aging and Alzheimer’s disease.
The focus of the project will be on data collection, preprocessing, orthologous gene mapping, and initial benchmarking of the integration algorithm. The long-term objective is to establish a robust, generalizable framework for cross-species single-cell integration in the context of brain aging and neurodegeneration.