Spotlight on… Fatema Shafie Khorassani, PhD and Huimin Cheng, PhD
As told to URBAN ARCH Admin Core staff, February 2025
Tell us about your academic and professional background. What do you like most about the field of biostatistics?
Fatema: I majored in math in undergrad at Wayne State University. After graduating, I taught math for a while, and worked as a medical assistant before going back to Wayne State for my MPH, with a concentration in biostatistics. After working as an applied biostatistician for a while, I decided to get my PhD in Biostatistics, which I completed at the University of Michigan in 2023, before joining the biostatistics faculty at the Boston University School of Public Health. What I like most about biostatistics is that it gives me an opportunity to collaborate with researchers doing impactful work in a lot of different areas of public health.
Huimin: I am an Assistant Professor in the Department of Biostatistics at Boston University and a Rafik B. Hariri Junior Faculty Fellow. I earned my Ph.D. in Statistics from the University of Georgia in 2023, and then I joined BU.
What I find most exciting about biostatistics is its direct impact on public health. The field sits at the intersection of theory and real-world impact, allowing me to harness cutting-edge statistical and AI methodologies to tackle pressing biomedical challenges. Whether it’s unraveling intricate disease networks, optimizing personalized treatments, or extracting hidden patterns from massive health datasets, biostatistics provides a powerful framework to generate knowledge that directly influences patient outcomes and policy decisions. The opportunity to collaborate across disciplines—working with clinicians, epidemiologists, and data scientists—keeps the work both intellectually stimulating and deeply meaningful.
Can you share a little about your research interests?
Fatema: My statistical methodology research focuses on data integration methods, survival analysis, causal inference for observational data, and statistical methods for the evaluation of surrogate outcomes collected in clinical trials. My work has most often been motivated by studying health disparities using complex observational data sources.
Huimin: My research is highly interdisciplinary, spanning statistical network analysis, machine learning, causal inference, and AI.
Recently, I have become interested in transfer learning and domain adaptation, particularly in cases where data is scarce or heterogeneous. I am developing methodologies that allow knowledge learned from large, well-structured datasets to be transferred effectively to smaller, more complex domains. This is crucial in applications such as personalized medicine, where patient populations vary significantly, and pre-trained models must be adapted to individual contexts.
Another interesting project I am working on is developing an AI reader for biomedical literature, MedReader. This tool leverages large language models (LLMs) to build knowledge graphs that help make predictions in drug repurposing and biomarker discovery.
You’ve both recently joined the Department of Biostatistics at Boston University School of Public Health, and the International URBAN ARCH team as biostatisticians for the TALC, TRAC, and GRAIL studies. What have you enjoyed about these roles so far, and what are you most looking forward to as you continue in these roles?
Fatema: As the biostatistician for the TALC and TRAC studies, I have the opportunity to participate in a wide variety of projects, and I am learning a lot about the impactful work that URBAN ARCH has done and is planning to do. I am most looking forward to identifying areas where biostatistics can make a significant impact on advancing the goals of the studies.
Huimin: I have really enjoyed working with the URBAN ARCH team. The interdisciplinary nature of these projects aligns well with my research style, as they bring together experts in epidemiology, clinical research, and biostatistics to tackle complex public health challenges.
One of the most intriguing aspects of this work has been the data collection and analysis process. Unlike my previous research, which primarily relied on publicly available datasets, these studies involve direct data collection from diverse populations in real-world settings. This has given me a deeper appreciation for the complexities of study design, participant recruitment, and data quality control, which are critical to ensuring robust statistical analysis. Understanding the nuances of how data is gathered—whether through clinical trials, patient surveys, or biomarker assessments—has provided a richer context for developing appropriate analytical strategies.
As I continue in this role, I look forward to integrating cutting-edge statistical and AI techniques into the analysis. For example, natural language processing (NLP) can be used to extract critical insights from electronic health records (EHRs), clinical notes, and patient-reported data—transforming unstructured text into structured data.
As a biostatistician, what methodologies or statistical techniques do you find most impactful or innovative? Could they potentially be applied to data from these studies?
Fatema: The most impactful methodology is the one that best answers a research question, regardless of how simple or complicated that method may be. The methodologies that I have personally found most impactful are in the areas of missing data, causal inference, and survival analysis. There is plenty of room for applying those methods to the problems that arise in these studies. As with any cohort study, there are a lot of interesting causal inference and missing data problems that may arise as follow-up continues.
Huimin: As a biostatistician, I find several methodologies and statistical techniques particularly impactful in clinical research, such as causal inference for treatment effect estimation, mixed-effects regression models, functional data analysis, and machine learning methods. They are used in different scenarios and address different problems. For example, causal inference helps estimate treatment effects while adjusting for confounders. This is particularly useful in the URBAN ARCH studies, where we analyze the effects of interventions on outcomes.
What are some of the challenges or opportunities you foresee as a part of your work on these studies?
Fatema: With all the data that URBAN ARCH has collected, there is a lot of opportunity for answering important new public health research questions, and for adapting statistical methods to answer more complicated questions as they arise. Each challenge that arises in the data analysis process is just an opportunity to adapt existing methods to fit the needs of our particular study. I am also very excited about the opportunity to work with, and learn from, talented researchers from around the world, although that unfortunately brings with it the (relatively minor) challenge of scheduling meetings across multiple time zones.
Huimin: One challenge is combining different types of data, like clinical records, behavioral assessments, biomarkers, and EHRs. These data come in different formats and quality levels, so we need advanced AI and statistical methods to make sense of them.
Another challenge is data equity—some populations might be underrepresented, which can lead to biased predictions and limited generalizability. To fix this, we need bias-correction methods, stratified modeling, and fairness-aware machine learning techniques to ensure that our findings apply to diverse patient groups, not just a subset of the population.
But these challenges also bring exciting opportunities. By using multi-modal learning techniques, we can combine these different data sources more effectively, leading to better predictions and more personalized treatments. And by addressing data equity, we can make sure that healthcare research benefits all patients, not just those who are well-represented in the data.
Tell us one thing about yourself that readers might find surprising.
Fatema: I have a serious book hoarding problem!
Huimin: I love karaoke and singing, especially singing in the shower! You might not expect it since I spend most of my time in data and algorithms, but I actually think I’m pretty good at singing. Whether my neighbors agree is another story! Singing is my way to relax and have fun with friends.
Any other comments?
Huimin: I’m really excited to be part of these studies and to collaborate with such a talented, interdisciplinary team. I look forward to learning, contributing, and hopefully making a meaningful impact—while maybe convincing a few colleagues to join me for a karaoke night!