Thursday, March 18, 2010

Using Google Trends to measure zombie attacks

Marginal Revolution points to one of the more unusual pieces to be written by Andrew Gelman. Cowen particularly likes the line: "We originally wrote this article in Word, but then we converted it to Latex to make it look more like science."

I gather from this that Professor Gelman is sceptical about the use of Google Trends and related net-harvested data. We have posted a lot of links to ongoing work in this area. Over time the material has looked increasingly promising, but we certainly have not yet seen anything on this blog that would knock your socks off in terms of the potential of these types of data. For example, the use of Google Trends to forecast unemployment is mildly interesting but doesn't look much better than using simple consumer sentiment data (though it may be cheaper, which is obviously a consideration). If anyone wants to defend the use of this type of data for social science from Gelman's ridicule, please feel free to suggest something to post. For now, I am going to have a nice comfortable lie-down on the fence: I have seen nothing that makes me want to change from using good survey data, but I would be surprised if people do not come up with some strong applications soon.

Kevin Denny said...

I have read Gelman's piece & I wouldn't draw any serious inference; it's just a very light-hearted skit. He is also a very good scholar.

Michael Daly said...

I definitely think there's something there, or at least that it's worth identifying the extent to which the data are useful vs spurious. My first test was to correlate the weekly time series of "suicide" from 2004 to now with "happy": r(323) = -.376!
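For readers unfamiliar with the notation: r(323) is a Pearson correlation reported with n - 2 degrees of freedom, which implies 325 weekly observations. A minimal sketch of that calculation follows. The function name `pearson_r` and the toy numbers are illustrative assumptions, not the commenter's data or code.

```python
import math


def pearson_r(x, y):
    """Return (r, df) for two equal-length numeric series, with df = n - 2."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy), n - 2


# Made-up weekly index values, purely for illustration.
term_a = [1, 2, 3, 4, 5]
term_b = [10, 8, 6, 4, 2]
r, df = pearson_r(term_a, term_b)
print(f"r({df}) = {r:.3f}")
```

With real data, the two lists would be the weekly index values for each search term over the same weeks.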

Michael Daly said...

The presence of 'zombies' and the prevalence of 'attacks': r(323) = .48. Beginning to think I should get back to my proper data!
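One common check for this kind of result, not something done in the comments above, is to correlate week-to-week changes rather than levels, since two series that merely share a trend can correlate strongly in levels. A rough sketch, assuming made-up series and hypothetical helper names (`corr`, `first_differences`):

```python
import math


def corr(x, y):
    """Pearson correlation of two equal-length numeric series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_differences(series):
    """Week-to-week changes: series[t + 1] - series[t]."""
    return [b - a for a, b in zip(series, series[1:])]


# Two invented series that share an upward trend but have unrelated wiggles.
weeks = range(60)
series_a = [t + (1 if t % 2 == 0 else -1) for t in weeks]
series_b = [t + (0, 1, -1)[t % 3] for t in weeks]

level_r = corr(series_a, series_b)
change_r = corr(first_differences(series_a), first_differences(series_b))
print(f"levels: {level_r:.2f}, changes: {change_r:.2f}")
```

If the correlation largely disappears once the series are differenced, the shared trend was probably doing most of the work.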

Liam Delaney said...

Kevin - humour and satire are different. We have had this one before. If he just wanted to crack a joke, there are plenty of three-line knock-knock jokes he could have put out that would have taken him a lot less time. When someone like Gelman (serious scholar, as you say) goes to the bother of a lengthy joke like this, they are usually trying to make a point of some sort. The point I think he is making here is that there is a lot of hype around Google-type data and a danger that a lot of research of dubious value will be funded in the rush to jump on the bandwagon. I agree that the paper does not allow one to draw "serious inference" but it does, to me, have a clearly satirical as opposed to just a humorous intent.

Liam Delaney said...

Actually, based on some links people have sent me, I am not even sure the satire here is about web data. There seem to be a number of zombie papers floating around. I am not going to spend time trying to figure out what they mean. If Kevin is right and they are just a pure joke without any satirical target then I will add an addendum to the post, but I will leave it as is for now.

Kevin Denny said...

Gelman is making a joke: he is being ironic. Liam is not getting it. Gelman is American. Liam is Irish. What's wrong with this picture?

Liam Delaney said...

Is there really no satirical intent with this zombie stuff? Surely the point of writing papers like that is to point out the logical absurdity of following a line of thought or research to an extreme, e.g. when Heckman wrote the spoof about figuring out God's preferences. You are just repeating that it is a joke, Kevin, and I did get that it wasn't a real paper. But are you sure he is not trying to make any point at all? You are a big fan of the Sokal paper. In some sense that was just a joke as well, but I think you agree that he was trying to make a point.

Martin Ryan said...

Hopefully we can all agree that there is no such thing as a zombie. Unemployed people, on the other hand, are a very real human problem, and growing in large numbers.

While the (web-data related) innovations in unemployment forecasting may not be earth-shattering, what else can we learn from trends in search queries? There needs to be a focus on what the analyst 'expects' when typing "zombie" or "unemployment" or "dole" into a trend-analyser.

For example, do unemployed individuals looking for information about welfare payments type in "unemployment" or "dole"? Or something else? One experiment is to type in "unemployment", "dole" and "jobs" into Google Trends, separated by commas.

A few observations can be made:

(i) The search volume for "jobs" is relatively stable over the last 6 years

(ii) News reference volume for "jobs" has exploded over the last 6 years, much more so than for "unemployment"

(iii) There is only enough search activity related to "unemployment" for it to register half-way during 2008

(iv) There is only enough search activity related to "dole" for it to register at the start of 2009

(v) There is a fall-off in search volume for "jobs" at the end of every calendar year

While much of this mirrors what we already know about recent economic activity, I had expected "jobs" to have a much higher search volume over the last year. We of course have to be very careful about drawing conclusions, but the stylised facts about search volume suggest that there were more people searching for jobs in 2004 and 2005 than there were in 2008 and 2009.
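Observations like the year-on-year comparison and the end-of-year fall-off can be checked on a weekly series rather than read off the chart. A minimal sketch, assuming the series is a list of (date, value) pairs; the function names and the toy data are hypothetical, and no real Google Trends figures are used:

```python
from collections import defaultdict
from datetime import date, timedelta


def yearly_means(series):
    """Average weekly value per calendar year, as {year: mean}."""
    totals = defaultdict(lambda: [0.0, 0])
    for day, value in series:
        totals[day.year][0] += value
        totals[day.year][1] += 1
    return {year: total / count for year, (total, count) in sorted(totals.items())}


def december_ratio(series):
    """Mean of the December weeks divided by the mean of all other weeks."""
    december = [v for d, v in series if d.month == 12]
    rest = [v for d, v in series if d.month != 12]
    if not december or not rest:
        raise ValueError("need both December and non-December observations")
    return (sum(december) / len(december)) / (sum(rest) / len(rest))


# Invented data: two years of weekly values that halve every December.
start = date(2004, 1, 4)
toy_series = []
for week in range(104):
    day = start + timedelta(weeks=week)
    toy_series.append((day, 0.5 if day.month == 12 else 1.0))

print(yearly_means(toy_series))
print(december_ratio(toy_series))
```

A December ratio well below 1 would correspond to observation (v), and comparing the yearly means is one way to test the impression that "jobs" searches were higher in 2004 and 2005 than in 2008 and 2009.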

We know that there were more people in need of a job in 2008 and 2009, so what is the explanation? Perhaps job-search is more intense during boom-times. In recessions, maybe people are less likely to search for a job (which they simply believe isn't there). This could of course be incorrect, but now there is an open question.

Martin Ryan said...

More challenging questions about search data are currently at play in the commercial arena; it may be no coincidence that Google Trends was opened up to the public (and academics) in 2006, just as these questions were coming more to the fore.

At present, Google, Yahoo and Bing are strongly focused on distinguishing between "interest" and "intent" in search data. There are obvious commercial implications, but solving this problem about interest versus intent would also help academic researchers. When somebody searches for "jobs" do they just want to see "what's out there" (maybe in boom times) or do they desperately intend to obtain employment (maybe in recessions)?

Maybe additional keywords are of use in solving this interesting puzzle. Additional keywords are important: if you search for "XBOX Price", Google can assume to some extent that you intend to buy an XBOX.

Here is an article from last year about Google executives stating that "understanding people, health, communication, education and knowledge" is the next frontier of search.

Here is a link to Yahoo!'s "Mindset" research project on Intent-driven Search:

Last month, there was an article in the Economist about Qi Lu, the man behind Bing. According to him, the focus is firmly on "understanding user intent".

Link here:

It's clear that understanding more about search is the big challenge: for the search-engine based advertising business and for social scientists. And here is the main reason why search data is (or should be) so interesting for academics: we don't ask people for the information they provide in search queries. It's a simple statement, but it has value. No matter how well-designed surveys are, there will always be things in the ether, trends in society, that will appear in search data first.

Martin Ryan said...

Erratum: in observation (ii) above, news reference volume for "jobs" has exploded over the last 2 years, not the last 6 years.

Michael Daly said...

