Aim 2: Satellite-Based Data
Produce monthly to annual surface PM2.5 concentration estimates across the communities of interest using satellite-based data from the 1-km resolved MAIAC MODIS/AOD retrieval algorithm combined with local land use and meteorological variables and available surface measurements, including data from low-cost sensors where possible.
Land-use regression (LUR) models have been widely used to estimate air pollution exposures and assign long-term exposures in health studies. Satellite-retrieved aerosol optical depth (AOD) values are widely used in modeling PM2.5 because measurements are available at high spatial resolution for much of Earth’s surface and because there is a historical repository of AOD measurements. AOD provides generally reliable daily measurements and is particularly useful for areas lacking PM2.5 monitors. On their own, the AOD and LUR exposure modeling approaches each have strengths and weaknesses; combined in a hybrid model, they can complement each other. Studies have started to use hybrid modeling approaches to improve PM2.5 prediction performance and downscaling.
Data Sources:
- High Resolution Satellite AOD retrievals: MAIAC AOD at 1 km resolution has been used to derive high-quality PM2.5 concentration estimates in the US to support air pollution exposure assessments and health studies. MAIAC-driven PM2.5 estimates have been shown to be in good agreement with surface data, with R² values exceeding 0.75 and little systematic bias.
- Land Use Data:
- Normalized Difference Vegetation Index (NDVI): NDVI is an index that indicates photosynthetic activity in plants and has been used as a measure of greenness and urbanity in health studies. We are using the 16-day NDVI composites derived from the MODIS sensors onboard the Terra and Aqua satellites. The MODIS NDVI product (MOD13Q1 Version 5; NASA LPDAAC, 2015) has a 250 × 250 m resolution and has been corrected for atmospheric contamination from water, clouds and aerosols. In addition to NDVI and predictors of the built environment, we are utilizing other satellite measures such as MODIS land cover type.
- Road Data: Annual average daily traffic (AADT) is being used to characterize traffic volume, while road classification information is being used to improve traffic flow estimation in areas with limited traffic count data. Road length by road class has been used in previous studies to characterize traffic as part of hybrid PM2.5 models. We are employing a similar strategy: we intersect major and minor roads with the 1 × 1 km grid cells and sum the road length of each type within each cell to obtain road type density per grid cell. We are also integrating digital elevation models available for our study area from the NASA Shuttle Radar Topography Mission (SRTM).
- Meteorological data: We are using temperature, relative humidity, dew point, wind speed, wind direction, air pressure, precipitation, visibility and mixing height. Many of these variables are obtained from the NASA North American Land Data Assimilation System (NLDAS) meteorological dataset, which has a spatial resolution of 0.125° × 0.125° and a temporal resolution of 1 hour. Each modeling grid cell is being matched to the NLDAS grid cell that contains it to obtain values for these meteorological variables, which have been used in previous hybrid modeling studies. We are using these data as spatio-temporal inputs during model development.
- Existing surface PM2.5 measurements: To validate the spatial statistical models used to create the satellite-derived continuous PM2.5 surfaces for our respective study sites, we are using surface PM2.5 measurements from the U.S. Environmental Protection Agency (EPA) Air Quality System (AQS) Federal Reference Method (FRM) monitors, as well as existing locally run networks of PM2.5 monitors such as the NYCCAS network operated by NYC DOH.
- Statistical Analysis: We are using a consistent two-stage spatial statistical modeling framework to estimate daily PM2.5 mass concentrations at 1 km resolution in all study areas. The general structure is described in Hu et al., 2014. The first stage is a linear mixed effects model with day-specific random intercepts and random slopes for AOD and meteorological parameters such as temperature or wind speed, which account for the temporally varying relationship between PM2.5 and AOD. The second stage is designed to capture spatial variability in PM2.5 left unexplained by the first stage. For this stage we are using geographically weighted regression (GWR), which generates a continuous surface of estimates for each parameter rather than a single universal value for all observations. Each site has selected its best set of predictor variables while maintaining an identical overall model structure in order to facilitate inter-site comparison of model performance.
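The two-stage structure described above can be written schematically as follows. This is a sketch inferred from the description in this section, not a reproduction of the fitted models. The notation is introduced here for illustration: s indexes grid cells, t indexes days, Met_k are the meteorological covariates, and X_j are land-use predictors. Treating the land-use predictors as fixed effects only, and using AOD as the second-stage GWR predictor, are assumptions; the actual covariate set is selected per site.

```latex
\begin{align}
% Stage 1: linear mixed effects model with day-specific random effects
\mathrm{PM}_{2.5,st} &= (b_0 + b_{0,t}) + (b_1 + b_{1,t})\,\mathrm{AOD}_{st}
  + \sum_{k} (b_{2k} + b_{2k,t})\,\mathrm{Met}_{k,st}
  + \sum_{j} b_{3j}\,X_{j,s} + \varepsilon_{st} \\
% Stage 2: GWR on the stage-1 residuals, with location-specific coefficients
r_{st} &= \mathrm{PM}_{2.5,st} - \widehat{\mathrm{PM}}^{(1)}_{2.5,st}
  = \beta_{0,s} + \beta_{1,s}\,\mathrm{AOD}_{st} + \epsilon_{st} \\
% Final daily estimate: stage-1 prediction plus the predicted residual
\widehat{\mathrm{PM}}_{2.5,st} &= \widehat{\mathrm{PM}}^{(1)}_{2.5,st} + \hat{r}_{st}
\end{align}
```

Here b_0, b_1, b_{2k} and b_{3j} are fixed effects, while b_{0,t}, b_{1,t} and b_{2k,t} are the day-specific random intercept and slopes. The coefficients β_{0,s} and β_{1,s} vary by location, which is what yields a continuous surface of parameter estimates in the second stage.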
The daily simulated surfaces created by our spatial statistical models at our respective study sites are being used to compute monthly and annual mean composite surfaces of PM2.5 concentration estimates across the communities of interest.
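The compositing step can be illustrated with a minimal Python sketch. It assumes the daily estimates are available as (cell_id, date, pm25) records, with pm25 set to None where no estimate exists for a cell on a given day. The function name composite_means, the record layout, and the min_days completeness threshold are hypothetical and are not part of the project pipeline.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean


def composite_means(daily, min_days=1):
    """Average daily PM2.5 estimates into monthly and annual composites.

    daily    -- iterable of (cell_id, date, pm25) tuples; pm25 is None when
                no estimate exists for that cell and day
    min_days -- hypothetical completeness threshold: periods with fewer
                valid days are omitted from the output
    """
    by_month = defaultdict(list)
    by_year = defaultdict(list)
    for cell_id, day, pm25 in daily:
        if pm25 is None:
            continue  # skip cell-days without an estimate
        by_month[(cell_id, day.year, day.month)].append(pm25)
        by_year[(cell_id, day.year)].append(pm25)

    # Both composites are means over valid daily values (an assumption).
    monthly = {k: fmean(v) for k, v in by_month.items() if len(v) >= min_days}
    annual = {k: fmean(v) for k, v in by_year.items() if len(v) >= min_days}
    return monthly, annual


# Toy input: two grid cells, a few days, one missing estimate.
daily = [
    ("cell_001", date(2015, 1, 1), 10.0),
    ("cell_001", date(2015, 1, 2), 14.0),
    ("cell_001", date(2015, 1, 3), None),
    ("cell_001", date(2015, 2, 1), 6.0),
    ("cell_002", date(2015, 1, 1), 8.0),
]
monthly, annual = composite_means(daily)
```

In this sketch the annual mean is taken over all valid daily values, so months with more valid days carry more weight. Averaging the monthly means instead is an alternative when daily coverage is uneven across seasons.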