Wednesday, February 16, 2011

Verbal Paradata: How Voice Pitch Can Predict if People Will Answer Survey Questions

From Wikipedia: "The paradata of a survey are data about the process by which the survey data were collected. Example paradata... include the times of day interviews were conducted, how long the interviews took, how many times there were contacts with each interviewee or attempts to contact the interviewee, the reluctance of the interviewee, and the mode of communication (such as phone, Web, email, or in person). Thus there are paradata about each observation in the survey. These attributes affect the costs and management of a survey, the findings of a survey, evaluations of interviewers, and inferences one might make about non-respondents."

This paper by Adam Safir, Tamara Black and Rebecca Steinbach notes that paradata are not always made available to data analysts, so having them to hand is a considerable advantage. This study by Dirk Heerwegh shows that respondents with less stable attitudes need more time to respond to an attitudinal question. Active Management is an initiative at Statistics Canada to use paradata to improve the data collection process in surveys. Jim O’Reilly from Westat reviews recent uses of paradata here. He discusses the U.S. Census Bureau's Performance and Data Analysis (PANDA) system, implemented for the 2007 American Housing Survey, whose key goals are to provide early warning of interviewer difficulty with key survey concepts and of possible falsification.
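To make the idea concrete, here is a minimal sketch (not drawn from any of the papers above) of how one common paradatum, item-level response latency of the kind Heerwegh analyses, could be computed from the timestamps an interviewing system records; the question IDs, times, and threshold interpretation are illustrative only.

```python
# Minimal sketch: compute item-level response latency from hypothetical
# interviewer-system timestamps (question read vs. answer recorded).
from datetime import datetime

# Hypothetical log: (question id, time question was read, time answer was recorded)
timestamps = [
    ("Q1", "2011-02-16 10:00:05", "2011-02-16 10:00:12"),
    ("Q2", "2011-02-16 10:00:12", "2011-02-16 10:00:41"),
]

fmt = "%Y-%m-%d %H:%M:%S"
for qid, asked, answered in timestamps:
    latency = (datetime.strptime(answered, fmt)
               - datetime.strptime(asked, fmt)).total_seconds()
    # Longer latencies are the sort of paradatum Heerwegh links to less stable attitudes.
    print(f"{qid}: answered in {latency:.0f} seconds")
```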

This dissertation by Matthew Jans (University of Michigan) goes one step further. "Exchanges between interviewers and respondents were transcribed and coded for respondent speech and question-answering behavior. Voice pitch was extracted mechanically using the Praat software. Speech, voice, and question-answering behaviors are used as verbal paradata... Results show that verbal paradata can distinguish between income nonrespondents and respondents, even when only using verbal paradata that occur before the income question. Income nonrespondents have lower affective involvement and express more negativity before the income question... There are... potential extensions to interviewer training and design of interventions that could produce more complete income data." Matthew Jans's dissertation "Verbal Paradata and Survey Error: Respondent Speech, Voice, and Question-Answering Behavior Can Predict Income Item Nonresponse" is available here.
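For readers curious about the mechanics, here is a minimal sketch of extracting voice pitch in the spirit the dissertation describes, assuming the parselmouth package (a Python interface to Praat); Jans used Praat itself, and the audio file name and summary statistics below are illustrative, not his procedure.

```python
# Minimal sketch: extract voice pitch (F0) from one respondent turn using
# Praat's pitch analysis via the parselmouth package.
import numpy as np
import parselmouth

snd = parselmouth.Sound("respondent_turn.wav")  # hypothetical recording of one respondent turn
pitch = snd.to_pitch()                          # Praat's standard pitch (F0) analysis
f0 = pitch.selected_array["frequency"]          # one F0 value (Hz) per analysis frame
voiced = f0[f0 > 0]                             # Praat marks unvoiced frames as 0 Hz

# Simple verbal-paradata summaries of the kind that could feed a nonresponse model:
print("mean F0 (Hz):", voiced.mean())
print("F0 standard deviation (Hz):", voiced.std())
```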

1 comment:

Liam Delaney said...

Interesting stuff - we should definitely be trying to put more process data into surveys, including details about the interviewers, the interview process, survey length, time taken to answer the questions and so on. The LISS project in Holland is making a lot of this type of data available.