Thursday, June 11, 2009

Why Researchers Should Always Check for Outliers, and What To Do About Them

"Researchers rarely report checking for outliers of any sort. This inference is supported empirically by Osborne, Christiansen, and Gunter (2001), who found that authors reported testing assumptions of the statistical procedure(s) used in their studies--including checking for the presence of outliers--only 8% of the time. Given what we know of the importance of assumptions to accuracy of estimates and error rates, this in itself is alarming. There is no reason to believe that the situation is different in other social science disciplines."

This quote is taken from a peer-reviewed electronic journal article on outliers by Osborne and Overbay (2004), both based at North Carolina State University.

Why do we care? The presence of outliers can lead to inflated error rates and substantial distortions of parameter estimates (e.g., Zimmerman, 1994, 1995, 1998). If non-randomly distributed (which is very possible with survey data), they can decrease normality (and in multivariate analyses, violate assumptions of sphericity and multivariate normality), altering the odds of making both Type I and Type II errors. They can seriously bias or influence estimates that may be of substantive interest (for more information on these issues, see Rasmussen, 1988; Schwager & Margolin, 1982; Zimmerman, 1994).

What are outliers? An outlier is generally considered to be a data point that is far outside the "norm" for a variable or population (e.g., Jarrell, 1994; Rasmussen, 1988; Stevens, 1984). Hawkins described an outlier as an observation that “deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism” (Hawkins, 1980). Outliers have also been defined as values that are “dubious in the eyes of the researcher” (Dixon, 1950).

Where do outliers come from? All of the below are described in detail in the Osborne and Overbay paper:

(i) Outliers from data errors
(ii) Outliers from intentional or motivated mis-reporting
(iii) Outliers from sampling error
(iv) Outliers from standardization failure
(v) Outliers from faulty distributional assumptions
(vi) Outliers as legitimate cases sampled from the correct population
(vii) Outliers as potential focus of inquiry

How do we identify them? Simple rules of thumb (e.g., data points three or more standard deviations from the mean) are good starting points. Some researchers prefer visual inspection of the data.
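As a rough illustration of the three-standard-deviation rule of thumb, here is a minimal Python sketch. The function name flag_outliers, the default threshold, and the sample data are my own assumptions for illustration; none of them come from the Osborne and Overbay paper.

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=3.0):
    """Return the values lying `threshold` or more sample standard
    deviations from the mean (hypothetical helper, for illustration)."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        return []      # all values identical, so nothing can be flagged
    return [x for x in values if abs(x - m) / s >= threshold]

# Made-up scores: twenty values near 10 plus one extreme value.
scores = [9, 10, 11, 10, 9, 11, 10, 10, 9, 11,
          10, 10, 9, 11, 10, 10, 9, 11, 10, 10, 100]
print(flag_outliers(scores))
```

One caveat with this rule: the extreme value itself inflates both the mean and the standard deviation, and in a sample of ten or fewer observations no point can reach three sample standard deviations from the mean, however extreme it is. So the rule is best treated as a starting point, alongside a look at the data.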

How do we deal with them? What to do depends in large part on why an outlier is in the data in the first place. Where outliers are illegitimately included in the data, it is only common sense that those data points should be removed. One means of accommodating outliers is the use of transformations. By using transformations, extreme scores can be kept in the data set, and the relative ranking of scores remains, yet the skew and error variance present in the variable(s) can be reduced (Hamilton, 1992). One alternative to transformation is truncation, wherein extreme scores are recoded to the highest (or lowest) reasonable score.
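To make the difference between transformation and truncation concrete, here is a small Python sketch. The log transformation is only one of several possible transformations, and the function names and the cut-off limits are assumptions chosen for illustration; what counts as a "reasonable" highest or lowest score is a judgment for the researcher.

```python
import math

def log_transform(values):
    """Apply log(1 + x) to each score. Assumes non-negative scores.
    Ranking is preserved, but large values are pulled in, reducing skew."""
    return [math.log1p(x) for x in values]

def truncate(values, low, high):
    """Recode scores below `low` to `low` and above `high` to `high`.
    The limits are supplied by the researcher (hypothetical helper)."""
    return [min(max(x, low), high) for x in values]

# Made-up data with one extreme score.
incomes = [20, 25, 30, 35, 40, 900]

transformed = log_transform(incomes)           # every case kept, order unchanged
truncated = truncate(incomes, low=0, high=60)  # 900 recoded to 60
print(transformed)
print(truncated)
```

In both cases every case stays in the data set. The transformation changes the scale of all scores, while truncation changes only the extreme ones.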

Instead of transformations or truncation, researchers sometimes use various “robust” procedures to protect their data from being distorted by the presence of outliers. These techniques “accommodate the outliers at no serious inconvenience—or are robust against the presence of outliers” (Barnett & Lewis, 1994). A common robust estimation method for univariate distributions involves the use of a trimmed mean, which is calculated by temporarily eliminating extreme observations at both ends of the sample (Anscombe, 1960). Alternatively, researchers may choose to compute a Winsorized mean, for which the highest and lowest observations are temporarily censored, and replaced with adjacent values from the remaining data (Barnett & Lewis, 1994).
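The two robust estimates just described can be sketched in a few lines of Python. This is a minimal illustration, not the procedure from any of the cited works: the function names and the choice to drop or replace k observations at each end are my own assumptions.

```python
from statistics import mean

def trimmed_mean(values, k=1):
    """Mean after dropping the k lowest and k highest observations."""
    s = sorted(values)
    if 2 * k >= len(s):
        raise ValueError("k is too large for this sample")
    return mean(s[k:len(s) - k])

def winsorized_mean(values, k=1):
    """Mean after replacing the k lowest and k highest observations
    with the nearest remaining value at each end."""
    s = sorted(values)
    if 2 * k >= len(s):
        raise ValueError("k is too large for this sample")
    core = s[k:len(s) - k]
    return mean([core[0]] * k + core + [core[-1]] * k)

# Made-up sample with one extreme value.
sample = [1, 2, 3, 5, 9, 100]
print(mean(sample))             # 20, pulled up by the extreme value
print(trimmed_mean(sample))     # 4.75, from [2, 3, 5, 9]
print(winsorized_mean(sample))  # 5, from [2, 2, 3, 5, 9, 9]
```

The trimmed mean shrinks the sample, whereas the Winsorized mean keeps the sample size and only pulls the end values inward. Both are far less affected by the extreme score than the ordinary mean.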

All the references to the articles mentioned above are available in the Osborne and Overbay paper.

2 comments:

Kevin Denny said...

Related to the question of outliers is the question of "influence", where individual observations have a large or excessive effect on model parameters. An important contribution, which has been around since I was a student (yes, that old), is Belsley, Kuh & Welsch's book "Regression Diagnostics", which popularized or introduced techniques for diagnosing such observations, such as DFFITS, Cook's D, and DFBETAS.

Martin Ryan said...

Thanks for the reference, Kevin; I must ask you more about Belsley, Kuh & Welsch.