Friday, August 14, 2015

Workshop on Adjusting for Non-Ignorable Missing Data using Heckman-Type Selection Models


Harvard University, September 8th 2015 0900 – 1800

Missing data is a common problem in survey data, and standard approaches for dealing with this issue rely on the strong and generally untestable assumption that the missingness is ignorable (missing at random) once we condition on the observed characteristics of respondents. The assumption of missing at random is often implausible, including in contexts where the outcome itself may be a predictor of survey participation. For example, estimates of HIV prevalence which rely on blood tests taken from respondents in nationally representative household surveys may be affected by selection bias if those who are HIV positive are less likely to participate in testing. In that case, conventional adjustments for missing data, such as imputation or inverse-probability weighting, will result in biased estimates because they incorrectly assume that data are missing at random. Standard approaches are also likely to result in confidence intervals which are too narrow, because they ignore the uncertainty surrounding the unknown relationship between participation and the outcome, which needs to be estimated.
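
To make the HIV testing example concrete, here is a minimal simulation sketch in Python. All of the numbers (a true prevalence of 20%, and participation probabilities of 60% for HIV-positive and 90% for HIV-negative individuals) are hypothetical values chosen for illustration, not estimates from any survey, and the function name `simulate_prevalence` is our own. It only illustrates the bias in the naive estimate; it does not implement a selection model.

```python
import random


def simulate_prevalence(n=200_000, true_prev=0.20,
                        p_test_pos=0.60, p_test_neg=0.90, seed=1):
    """Simulate HIV testing where participation depends on the outcome itself.

    All parameter values are hypothetical and chosen purely for illustration.
    Returns (prevalence in the full sample, prevalence among those tested).
    """
    rng = random.Random(seed)
    total_pos = 0    # HIV-positive individuals in the full sample
    tested = 0       # individuals who agree to be tested
    tested_pos = 0   # HIV-positive individuals among those tested
    for _ in range(n):
        positive = rng.random() < true_prev
        # Non-ignorable missingness: HIV-positive people participate less often.
        p_participate = p_test_pos if positive else p_test_neg
        participates = rng.random() < p_participate
        total_pos += positive
        if participates:
            tested += 1
            tested_pos += positive
    return total_pos / n, tested_pos / tested


full_prev, observed_prev = simulate_prevalence()
print(f"Prevalence in full sample:  {full_prev:.3f}")
print(f"Prevalence among testers:   {observed_prev:.3f}")
# Analytically, the tested-only estimate converges to
# 0.20 * 0.60 / (0.20 * 0.60 + 0.80 * 0.90) = 0.12 / 0.84,
# which is below the true prevalence of 0.20.
```

Because participation in this sketch depends only on HIV status and not on any observed covariate, reweighting or imputing on observed characteristics cannot recover the true prevalence, which is the situation selection models are intended to address.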
This workshop will introduce the use of Heckman-type selection models for adjusting for non-ignorable missing data, with the goal of making this approach easily accessible to researchers working with survey data affected by non-participation. A non-technical introduction to different approaches for dealing with missing data will be provided, and we will discuss the implications of not correctly adjusting for data which are not missing at random. We will provide an overview of the statistical rationale for the use of selection models, and the R package SemiParBIVProbit will be presented. This software allows researchers to implement this approach in a straightforward and transparent manner in a variety of different contexts affected by missing data. A simulation study will also be used to demonstrate the properties of the model. The final session will be interactive: participants are invited to bring their own datasets, and the audience and presenters will work together on implementing this approach in their own research. Alternatively, the organizers will provide example data. Throughout, we will illustrate the key concepts using data from HIV research.

The workshop is free and open to all interested parties. However, space is limited, so if you would like to attend, please register with Mark McGovern. The workshop will take place at Harvard on September 8th; the exact location is to be confirmed. Unfortunately, we do not have the funds to cover participants' expenses.

Harvard University: Till Bärnighausen, Guy Harling, Mark McGovern
University College London: Giampiero Marra
Birkbeck, University of London: Rosalba Radice

Introductions and Background
Implications of Non-Ignorable Missing Data for Parameter Estimates
Introduction to Selection Models
Overview of Applications of Selection Models
Optional Session on Getting Started with R
Introduction to R Package SemiParBIVProbit
Simulation Studies
Interactive session with Data from Participants or Data Provided by Organizers

Key References
Bärnighausen, T., Bor, J., Wandira-Kazibwe, S., & Canning, D. (2011). Correcting HIV Prevalence Estimates for Survey Nonparticipation using Heckman-type Selection Models. Epidemiology, 22(1), 27-35.
Marra, G., Radice, R., Bärnighausen, T., Wood, S., & McGovern, M. (2015). A Unified Modeling Approach to Estimating HIV Prevalence in Sub-Saharan African Countries. Research Report 324, Department of Statistical Science, University College London.
McGovern, M., Bärnighausen, T., Marra, G., & Radice, R. (2015). On the Assumption of Bivariate Normality in Selection Models: A Copula Approach Applied to Estimating HIV Prevalence. Epidemiology, 26(2), 229-237.
Marra, G., & Radice, R. (2015). A Regression Modeling Framework for Analyzing Bivariate Binary Data: The R Package SemiParBIVProbit.
McGovern, M. E., Bärnighausen, T., Salomon, J. A., & Canning, D. (2015). Using Interviewer Random Effects to Remove Selection Bias from HIV Prevalence Estimates. BMC Medical Research Methodology, 15(1), 8.
