DataHub

The DataHub is a centralized resource within the Center for Health Data Science (CHDS) that connects researchers to curated health data, along with the infrastructure, policies, processes, and expert support needed to access and use it. Whether you’re just getting started or exploring advanced analytic strategies, the DataHub supports you at every stage of your data journey.

The DataHub includes a comprehensive catalog of key data resources available through CHDS, spanning both internally hosted data and external sources. It also provides curated metadata, guidelines, tools, and procedures to support compliant data access and use.

An infographic displaying the core services of the DataHub: data inventory, acquisition support, and data management.
The DataHub is designed to streamline access to data and provide expert guidance throughout the research process. To learn more or discuss your project, contact bedacprp@bu.edu or explore the links below.

Request Services or Assistance   Collaborative Model & Rates



Our Inventory

Resources listed below include both publicly available datasets (Public) and restricted datasets (Restricted) that require credentialed access (e.g., through DUAs or required training).

Resource Access Category Description
Acumen CMS Data Linkage (NIA)

Restricted

Administrative/Claims Links Medicare claims to NIA-funded studies (e.g., HRS, NHATS). Enables longitudinal analysis of older adults’ health, spending, and outcomes. Managed by Acumen LLC with curated, linked datasets.
ADVANCE Clinical Research Network (led by OCHIN)

Restricted

EHR/Clinical PCORnet network aggregating EHR and claims data from safety-net providers, focusing on underserved populations. Uses PCORnet CDM. Strong for health equity and primary care research.
All of Us Research Program

Restricted

EHR/Clinical NIH initiative collecting EHR, biospecimen, survey, and wearable data from 1M+ participants. Focus on diversity, longitudinal tracking, and personalized medicine via a secure Researcher Workbench.
Alzheimer’s Disease Research Centers (ADRC)

Restricted

Cohort Originally established in 1984, the Alzheimer’s Disease Research Centers (ADRCs) are Congressionally designated Centers of Excellence. The ADRCs support the nation’s increased effort to address Alzheimer’s disease (AD) and Alzheimer’s disease-related dementia (ADRD) by providing information, local resources, support, and opportunities to participate in research.
American Community Survey (ACS)

Public

National survey Annual U.S. Census Bureau survey sampling about 3.5M households. Offers tract-level demographic, housing, and economic data. Key for policy and spatial analysis. Estimates have margins of error for small areas.
Boston Biorepository, Recruitment, and Integrative Network (BBRAIN)

Restricted

EHR/Cohort The Boston Biorepository, Recruitment, and Integrative Network (BBRAIN) is a national research resource created to advance the study of Gulf War Illness (GWI).
CDC Data Catalog

Public

Public health/community CDC offers a variety of public and administrative health datasets measuring mortality (WONDER), food environment (mRFEI), local disease burden (PLACES), life expectancy (USALEEP), and other concepts.
CMS Medicaid

Restricted

EHR/Administrative/Claims Claims and enrollment data for Medicaid beneficiaries, the joint federal-state program covering low-income individuals. Strong for health utilization and insurance analysis.
CMS Medicare

Restricted

Administrative/Claims Claims and enrollment data for ~65M older or disabled Americans. Ideal for longitudinal studies of healthcare use in older adults.
CMS Transformed Medicaid Statistical Information System (T-MSIS)

Restricted

Administrative/Claims Standardized Medicaid/CHIP data from all states. Includes claims, enrollment, and provider data. Designed for national comparability.
Data Axle

Restricted

Administrative/Claims Business and consumer dataset covering 15M+ U.S. businesses and 300M consumers. Useful for market research and outreach.
Epic Electronic Health Record System (EHR)

Restricted

EHR/Clinical Clinical EHR data from Epic Systems. Rich in labs, diagnoses, medications, and procedures. Granular and timely, but limited to care within Epic networks and varies by institution.
Framingham Heart Study (FHS)

Restricted

Cohort Established by the National Heart, Lung, and Blood Institute (NHLBI) in 1948 in Framingham, Massachusetts, the study follows multiple generations of participants to understand the determinants of cardiovascular disease and related chronic conditions.
Health Resources and Services Administration (HRSA)

Public

National survey Health access and provider shortage data, including HPSAs and FQHCs. Covers underserved regions and providers. Ideal for access-to-care studies; lacks clinical or patient-level detail.
Informatics for Integrating Biology & the Bedside (i2b2)

Public

EHR/Claims Open-source platform for EHR cohort discovery at academic centers. Provides aggregate patient counts for exploratory work. Customizable, but no direct access to patient-level data.
International URBAN Alcohol Research Collaboration on HIV/AIDS (ARCH)

Restricted

Cohort The URBAN ARCH consortium includes three cohorts of people living with HIV in Mbarara, Uganda; St. Petersburg, Russia; and Boston, Massachusetts.
MACS/WIHS Combined Cohort Study (MWCCS)

Restricted

EHR/Clinical Combines MACS and WIHS, the longest-running observational cohorts of men and women with HIV in the U.S. (both also include people without HIV), along with new recruits. https://statepi.jhsph.edu/mwccs/about-mwccs/
Massachusetts Center for Health Information and Analysis (CHIA) All Payers Claims Database (APCD)

Restricted

EHR/Administrative/Claims Massachusetts all-payer claims database covering ~80% of the population. Strong for health utilization and insurance analysis.
Merative MarketScan Research Database

Restricted

Administrative/Claims Proprietary claims database with 200M+ covered lives. Detailed data for commercially insured individuals. Excellent for utilization and cost studies, but not generalizable to public/uninsured populations.
National Clinical Cohort Collaborative (N3C)

Restricted

EHR/Claims NIH-sponsored data enclave with harmonized EHR data from 80+ institutions. Initially focused on COVID-19, now expanded. Supports ML and advanced analytics with OMOP-standardized data.
Observational Health Data Sciences and Informatics (OHDSI)

Public

EHR/Clinical Global open-science network using the OMOP data model to enable distributed observational research across EHR and claims data. Tools for comparative effectiveness and large-scale phenotyping.
OptumLabs Data Warehouse

Restricted

EHR/Clinical Large dataset combining claims and EHR data for 200M+ patients, used in real-world evidence research. Strong for linkage and cost/utilization analysis; access is proprietary and costly.
TriNetX

Restricted

EHR/Clinical Federated EHR network used for real-time cohort building, feasibility assessment, and clinical trial recruitment. De-identified data from 120+ global healthcare orgs.
U.S. Census

Public

National survey Decennial population count (~330M U.S. residents), conducted every 10 years. Provides full-coverage demographic and housing data. Used for policy, planning, and allocation.
U.S. Department of Agriculture (USDA)

Public

National survey USDA offers data on the food environment, SNAP participation, and rurality. Common in food insecurity and rural health work.
VA Informatics and Computing Infrastructure (VINCI)

Restricted

Administrative/Claims Secure VA-managed environment offering access to national EHR data for 9M+ veterans, including structured/unstructured data, pharmacy, imaging, and claims. Used in chronic disease and aging research.