JUNE 11: Lunch and Learn: Dr. Bin Yu, University of California at Berkeley

Lunch and Learn: “Why Veridical Data Science for Medical AI?”

June 11, 2024 | 12-1PM. Hybrid Event.
In-Person: Crosstown Center, 801 Mass Ave, 3rd Floor, Room 305.

Abstract: Data science is central to AI and has driven most of the recent advances in biomedicine and beyond. Human judgment calls are ubiquitous at every step of the data science life cycle (DSLC): problem formulation, data cleaning, EDA, modeling, and reporting. Such judgment calls are often responsible for the “dangers” of AI by creating a universe of hidden uncertainties well beyond sample-to-sample uncertainty. To mitigate these dangers, veridical (truthful) data science is introduced based on three principles: Predictability, Computability, and Stability (PCS). The PCS framework and documentation unify, streamline, and expand on the ideas and best practices of statistics and machine learning. In every step of the DSLC, PCS emphasizes reality checks through predictability, considers computability up front, and takes into account expanded uncertainty sources, including those from data curation/cleaning and algorithm choice, to build more trust in data results. PCS will be showcased through collaborative research on seeking genetic drivers of heart disease and on cancer detection. We will end with ongoing research on PCS uncertainty quantification (UQ). PCS-UQ addresses two other prominent sources of uncertainty in the DSLC, both arising from the reasonable choices practitioners make in the data cleaning and modeling stages, in addition to the uncertainty arising from data collection.

Bio: Dr. Bin Yu is Chancellor’s Distinguished Professor and Class of 1936 Second Chair in Statistics, EECS, and Computational Biology at UC Berkeley. Her research focuses on the practice and theory of statistical machine learning, veridical data science, and solving interdisciplinary data problems in neuroscience, genomics, and precision medicine. She and her team have developed algorithms such as iterative random forests (iRF), stability-driven NMF, and adaptive wavelet distillation (AWD) from deep learning models. She is a member of the National Academy of Sciences and of the American Academy of Arts and Sciences. She was a Guggenheim Fellow and holds an Honorary Doctorate from the University of Lausanne.
