Friday, April 30, 2010

The paradoxical effect of omitted variables

Omitted variables are terrible. If you are beset by them then (& unless you are lucky enough that they are orthogonal to what's included) you are condemned to regression hell: your coefficients are biased and inconsistent, you cannot derive policy-relevant conclusions and your girlfriend won't love you anymore.
So the conclusion is to get better data. So say you do and you now have a previously omitted variable in your data. You should include it, right?
Wrong, actually, if the paper below is correct, which it looks like being. The problem is that the standard results in this area are based on there being only one omitted variable. If you have two omitted variables, the bias on what's included depends in a messy way on all the correlations between the X's.
Say the model is:
Y=b1*X1 + b2*X2 + b3*X3 [ignoring the constant & disturbance term]
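
For reference, the standard large-sample results for this model can be sketched as follows. The delta and gamma symbols are notation introduced here for illustration, not taken from the paper, and the X's are assumed to be uncorrelated with the disturbance.

```latex
% Regress Y on X1 only (X2 and X3 both omitted):
\operatorname{plim}\hat{b}_1 = b_1 + b_2\,\delta_2 + b_3\,\delta_3,
\qquad \delta_j = \frac{\operatorname{Cov}(X_1, X_j)}{\operatorname{Var}(X_1)}

% Regress Y on X1 and X2 (X3 still omitted):
\operatorname{plim}\hat{b}_1 = b_1 + b_3\,\gamma_1,
\qquad \gamma_1 = \text{coefficient on } X_1 \text{ in the regression of } X_3 \text{ on } X_1 \text{ and } X_2
```

Nothing in these two expressions guarantees that the second bias term is smaller in absolute value than the first.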

So you don't observe X2 and X3 initially, so your estimate of "b1" is biased. It may seem counter-intuitive, but adding X2 does not necessarily get you a better estimate of "b1". Actually, it's quite intuitive: say omitting X2 was biasing b1 upwards and omitting X3 was having the reverse effect. Then it's quite possible that you have a small [or even zero] bias, and adding in one of them makes things worse. I don't think you actually need these opposing biases for the result to hold, because there is also the X2,X3 correlation.
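
A small numerical sketch of this argument, using the textbook large-sample bias formula rather than anything specific to the paper. The coefficients and correlations below are made-up illustrative values, the function name `asymptotic_bias` is hypothetical, and the X's are assumed to have unit variance and to be uncorrelated with the disturbance.

```python
import numpy as np

def asymptotic_bias(beta, sigma, included):
    """Large-sample bias of OLS when only the `included` regressors are used.

    Uses the standard result  plim(b_hat_I) - b_I = inv(S_II) @ S_IO @ b_O,
    where S is the covariance matrix of the X's, I indexes the included
    regressors and O the omitted ones.
    """
    beta = np.asarray(beta, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    included = list(included)
    omitted = [j for j in range(len(beta)) if j not in included]
    s_ii = sigma[np.ix_(included, included)]   # covariances among included X's
    s_io = sigma[np.ix_(included, omitted)]    # included-vs-omitted covariances
    return np.linalg.solve(s_ii, s_io @ beta[omitted])

# Scenario A (illustrative values): opposing biases that cancel exactly.
# Index 0 is X1, index 1 is X2, index 2 is X3.
beta_a = [1.0, 1.0, -1.0]
sigma_a = [[1.0, 0.5, 0.5],
           [0.5, 1.0, 0.0],
           [0.5, 0.0, 1.0]]
short_a = asymptotic_bias(beta_a, sigma_a, [0])[0]     # X1 only
long_a = asymptotic_bias(beta_a, sigma_a, [0, 1])[0]   # X1 and X2

# Scenario B (illustrative values): both omitted variables push b1 upwards,
# but X2 and X3 are negatively correlated with each other.
beta_b = [1.0, 0.1, 1.0]
sigma_b = [[1.0, 0.5, 0.3],
           [0.5, 1.0, -0.5],
           [0.3, -0.5, 1.0]]
short_b = asymptotic_bias(beta_b, sigma_b, [0])[0]
long_b = asymptotic_bias(beta_b, sigma_b, [0, 1])[0]

print("A: bias on b1 with X1 only:", short_a, " with X1 and X2:", long_a)
print("B: bias on b1 with X1 only:", short_b, " with X1 and X2:", long_b)
```

In scenario A the bias on b1 is zero with X1 alone and about -0.67 once X2 is added. In scenario B, where the two omitted-variable biases have the same sign, it goes from 0.35 to roughly 0.73, so in this particular example at least the opposing biases are not needed.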
The result is rather analogous to the Second Best Theorem in Welfare Economics due to Lipsey & Lancaster.
The practical problem is that there may always be an "X3": typically you cannot be sure that you have all the relevant variables. It's all rather disturbing.

The Phantom Menace: Omitted Variable Bias in Econometric Research, Kevin Clarke


Martin Ryan said...

Shouldn't one specify a model to be as theoretically informed as possible? Then do various specification tests?

Say if one collects information on personality traits to get a richer specification in a model that estimates the returns to education. Is it better not to collect that information (on personality traits) at all?

Kevin Denny said...

Let's say you are interested in the returns to education. You think "Well, I should include personality because they might be correlated, so omitting it might bias my parameter of interest". The problem is that unless you are certain that there is no other omitted variable, you cannot be sure that adding in the personality vars will improve your estimate of education returns.
And how can you ever be sure?

Martin Ryan said...

I should clarify that I get the rationale outlined in your main post. I guess my real issue is how far this can be taken.

What if one was estimating returns to education with a particular survey that didn't measure labour market experience --- but then one found a comprehensive labour force survey that allowed one to include labour market experience.

Labour market experience was a previously omitted variable, but it belongs on the right-hand side. Surely future-orientation, willingness to take risks, conscientiousness, agreeableness and extraversion also belong on the RHS?

Would the purist have to insist: "I can't put labour market experience on the RHS because I have not observed personality traits"? The worry might be that more conscientious and extraverted individuals are more likely to attain labour market experience. Of course, it is also reasonable to suggest that they would have higher earnings.

Kevin Denny said...

The purist would be correct in saying that including experience now does not necessarily reduce the bias on returns to schooling given the omission of personality, assuming that personality has been incorrectly excluded.
The result is a complete pain in the arse. It might be possible to put some limits or provide guidance as to when this is an issue; I am not sure. If you are interested in the returns to experience, that might be a good reason to include it, but you can't be sure it's going to help you out on the other stuff.