Problem Sets for Teaching Data Analysis

The following problems replicate published results and teach students quasi-experimental methods. The problems are appropriate for advanced undergraduates or graduate students learning statistical programming. Instructors can e-mail Tal Gross for solutions.

Regression-discontinuity designs

The following replicates the regression-discontinuity design explored by Anderson, Dobkin, and Gross (2014).

  1. Import into your preferred statistics package two files: an inpatient file and an emergency department file. Those files contain counts of visits to the hospital by age.
  2. Re-create the five graphs in Figure 1 of the paper. Each figure should show the data points as a scatterplot, along with the two regression lines. The regression model is a spline model with a discontinuous jump right after age 23. The key variable is “age in months.” You therefore want a linear-spline OLS regression with three right-hand-side variables: an indicator variable for whether the individual is age 23 or above, the age in months (to capture the linear trend), and an additional variable that captures the effect of age in months after age 23.
  3. The OLS regression results that are used to generate the lines in Figure 1 do not match the coefficients in Table 1. Describe intuitively why these estimates are not the same.
  4. Next, reproduce all of the results in Table 2. You should be able to reproduce these results exactly. The table in the paper only includes point estimates and standard errors. You should also include p-values underneath the standard errors for each “cell” in the table. (Note that all of the dependent variables in Table 2 have been transformed by taking natural logs; you need to do that in order to match the results exactly.)
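As a starting point for steps 2 through 4, here is a minimal sketch of the linear-spline regression in Python with NumPy. It is not the authors' code. The data are synthetic stand-ins, and the names fit_linear_spline_rd, age_months, and log_visits are hypothetical. The sketch also makes three assumptions that the problem set does not state: age is centered at the cutoff (which changes the intercept but not the estimated jump), the standard errors are classical homoskedastic ones, and the p-values use a normal approximation. Matching the published tables exactly may require different choices.

```python
import math
import numpy as np

CUTOFF_MONTHS = 23 * 12  # age 23 expressed in months


def fit_linear_spline_rd(age_months, outcome, cutoff=CUTOFF_MONTHS):
    """OLS linear spline with a jump at `cutoff`; returns (beta, se, p).

    Coefficient order: intercept, jump at the cutoff, slope below the
    cutoff, change in slope above the cutoff.
    """
    age = np.asarray(age_months, dtype=float) - cutoff  # center age at the cutoff
    y = np.asarray(outcome, dtype=float)
    over = (age >= 0).astype(float)  # indicator: at or above the cutoff
    X = np.column_stack([np.ones_like(age), over, age, age * over])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    # Classical (homoskedastic) standard errors: an assumption of this sketch.
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    # Two-sided p-values from a normal approximation to the t distribution.
    p = np.array([math.erfc(abs(b / s) / math.sqrt(2)) for b, s in zip(beta, se)])
    return beta, se, p


# Synthetic stand-in for one outcome (NOT the paper's data):
# log visit counts by age in months, with a built-in drop of 0.03 at age 23.
rng = np.random.default_rng(0)
age_months = np.arange(21 * 12, 25 * 12)  # ages 21 through 24, in months
over23 = age_months >= CUTOFF_MONTHS
log_visits = (8.0 + 0.002 * (age_months - CUTOFF_MONTHS) - 0.03 * over23
              + rng.normal(0.0, 0.005, size=age_months.size))

beta, se, p = fit_linear_spline_rd(age_months, log_visits)
print(f"jump at age 23: {beta[1]:.4f} (se {se[1]:.4f}, p {p[1]:.3f})")
```

The two regression lines for a figure can be drawn from the fitted values on each side of the cutoff: beta[0] + beta[2] * (age - cutoff) below it, and the same expression plus beta[1] + beta[3] * (age - cutoff) at or above it.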

Difference-in-differences regressions

The following replicates some of the difference-in-difference regressions explored by Gross and Tobacman (2014). To preserve confidentiality, the authors could only release aggregate counts, but the data are sufficient to approximately replicate the main results of the paper.

  1. Import into your preferred statistics package a file with emergency-department visits. The data are composed of group-week observations: the variable paper_date_2008 identifies each group by the date on which it received its paper check in 2008. The variables start_of_week and end_of_week indicate the dates when the visits underlying the observation occurred.
  2. Run difference-in-difference regressions for three outcomes: adult male ED visits, adult female ED visits, and alcohol- and drug-related visits. The regressions should have a fixed effect for each paper-check date, a fixed effect for each week, and the logarithm of the visit count as the outcome of interest.
  3. Create the associated event-study figures. These figures are tricky to create, but nearly all papers that rely on standard difference-in-difference designs include figures such as these.
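Steps 2 and 3 can be sketched in Python with NumPy as follows. This is an illustration on synthetic data, not the authors' specification. The helper names (dummies, twfe_ols, event_time_dummies) and the array names are hypothetical. The sketch assumes that treatment switches on in the week a group's check arrives, and it uses one common event-study setup: event time is binned at the ends of a window, and the week before the check serves as the omitted reference period. Standard errors are left out.

```python
import numpy as np


def dummies(codes):
    """One indicator column per distinct value, dropping the first level."""
    codes = np.asarray(codes)
    levels = np.unique(codes)[1:]
    return (codes[:, None] == levels[None, :]).astype(float)


def twfe_ols(y, group, week, regressors):
    """OLS of y on `regressors` plus group and week fixed effects.

    Returns only the coefficients on `regressors`.
    """
    R = np.asarray(regressors, dtype=float)
    if R.ndim == 1:
        R = R[:, None]
    X = np.column_stack([R, np.ones(len(R)), dummies(group), dummies(week)])
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta[: R.shape[1]]


def event_time_dummies(event_time, window, reference=-1):
    """Indicators for event time binned to [-window, window], omitting `reference`."""
    binned = np.clip(event_time, -window, window)
    ks = [k for k in range(-window, window + 1) if k != reference]
    return ks, np.column_stack([(binned == k).astype(float) for k in ks])


# Synthetic group-week panel (NOT the paper's data): four groups whose checks
# arrive in different weeks, with a built-in effect of 0.05 on log visits.
rng = np.random.default_rng(0)
n_groups, n_weeks = 4, 20
check_week = np.array([8, 10, 12, 14])  # hypothetical arrival week per group
group = np.repeat(np.arange(n_groups), n_weeks)
week = np.tile(np.arange(n_weeks), n_groups)
event_time = week - check_week[group]  # weeks since the check arrived
post = (event_time >= 0).astype(float)
log_visits = (6.0 + 0.1 * group + 0.01 * week + 0.05 * post
              + rng.normal(0.0, 0.005, size=group.size))

# Step 2: a single post-check indicator with group and week fixed effects.
did_estimate = twfe_ols(log_visits, group, week, post)[0]
print(f"difference-in-differences estimate: {did_estimate:.4f}")

# Step 3: one coefficient per binned event week, relative to week -1.
ks, D = event_time_dummies(event_time, window=4)
for k, coef in zip(ks, twfe_ols(log_visits, group, week, D)):
    print(f"event week {k:+d}: {coef:.4f}")
```

Plotting the event-week coefficients against k, with a zero at the reference week, gives the basic shape of an event-study figure. With the real data, group would come from paper_date_2008 and week from start_of_week.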