Thursday, January 21, 2010

Unemployment Data and Google: From Forecasts to History Class

A belated thanks to Michael Breen for pointing out a recent article on VoxEU.org about predicting unemployment using Google Trends. The article is by D'Amuri and Marcucci: "The predictive power of Google data: New evidence on US unemployment". I have followed the literature on Google-search data and unemployment (previously, here), but wasn't aware of the work by D'Amuri and Marcucci until now.

The research mentioned on the blog before (by Varian and Choi) used Google Trends data for keywords such as "unemployment" and "social insurance" to predict U.S. unemployment insurance claims. D'Amuri and Marcucci take a different approach: they use the "Google Index" – the incidence of job-search-related Google queries over total queries – which has proved to have predictive power in forecasting unemployment in Germany and Israel (see Askitas and Zimmermann 2009 and Suhoy 2009).
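To make the "Google Index" definition concrete, here is a minimal sketch of the ratio described above: job-search-related queries divided by total queries, period by period. The function name and the weekly counts are hypothetical, invented purely for illustration; they are not D'Amuri and Marcucci's data or code, and as far as I know Google publishes only a normalised series rather than raw query counts.

```python
def google_index(job_search_queries, total_queries):
    """Share of job-search-related queries in all queries for one period."""
    if total_queries <= 0:
        raise ValueError("total_queries must be positive")
    return job_search_queries / total_queries


# Hypothetical (job-search queries, total queries) pairs for three weeks.
# These numbers are made up for illustration only.
weekly_counts = [(120, 10_000), (150, 10_500), (210, 11_000)]

# One index value per week; a rising share would suggest more job search.
index = [google_index(jobs, total) for jobs, total in weekly_counts]
```

Because the index is a share rather than a count, it is not mechanically pushed up by overall growth in search traffic.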

Both approaches improve the predictive power of unemployment forecasting, but each has limitations. D'Amuri and Marcucci mention that the Google Index could be partly driven by on-the-job search, rather than by the job search of the unemployed. It can be argued that this is only a problem to the extent that on-the-job search happens during a recession. D'Amuri and Marcucci also note that not everyone has access to the internet, and therefore that people using the internet for job search are not randomly selected among job-seekers. By the same reasoning, people using the internet for unemployment benefit information are not randomly selected among the newly unemployed. These are illustrations of the sample selection problem in econometric analysis (as distinct from self-selection; Heckman provides an overview here).

So while Google Trends is a useful tool in unemployment (and other) predictions, there are reasons to be cautious, particularly in relation to sample selection. Another limitation that has been remarked upon is that Google Trends only provides data from 2004 onwards. This is why I was intrigued when I typed "unemployment" into Google earlier today, investigated the "options" at the top of the page, and then clicked "timeline" instead of "standard view". What I got was a picture along the lines of the one below, except that the chart was for unemployment from 1900-2010, instead of "Book of Revelation". I was unable to take a screen-grab of the unemployment chart, but you can follow a link to it here.


It's possible to click on any decade in the chart, any year and any month. Associated news stories appear in each of these categories. This is a powerful tool for finding out what the media was reporting at the time of any major news story. All of this is powered by Google News Timeline: a web application that organises information chronologically. "It allows users to view news and other data sources on a zoomable, graphical timeline. You can navigate through time by dragging the timeline, setting the "granularity" to weeks, months, years, or decades, or just including a time period in your query." However, using the News Timeline application directly is somewhat different to using the "timeline view" in web search. The latter provides charts such as the one shown above.

The unemployment chart shows two spikes: one at the start of the 1930s, and one in 2009. But what does this mean? According to the Google Blog, "the graph across the top of the page summarizes how dates in your results are spread through time, with higher bars representing a larger number of unique dates." Where does this historical data come from? From Google's "News Archive Search" service. News Archive Search produces the same results as the "timeline view" in web search. Search results include content from a number of sources, including both partner content digitized by Google through their News Archive Partner Program and online archival materials. More information about News Archive Search is available here.
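One way to read "higher bars representing a larger number of unique dates" is as a simple count of distinct dates per time bucket. The sketch below bins dates by decade; it is only my guess at the idea, not Google's actual method, and the function name and sample dates are hypothetical.

```python
from collections import Counter
from datetime import date


def unique_dates_per_decade(dates):
    """Count distinct dates in each decade, keyed by the decade's first year."""
    # set() drops repeated dates, so each unique date is counted once.
    counts = Counter((d.year // 10) * 10 for d in set(dates))
    return dict(sorted(counts.items()))


# Hypothetical dates extracted from search results (invented for illustration).
sample_dates = [
    date(1931, 3, 1),
    date(1931, 3, 1),   # duplicate, counted once
    date(1932, 6, 15),
    date(1975, 1, 1),
    date(2009, 2, 1),
    date(2009, 9, 1),
    date(2009, 10, 5),
]

bars = unique_dates_per_decade(sample_dates)
```

On this reading, a tall bar says that many different dates in that period are mentioned in the results; it says nothing directly about the level of unemployment itself.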

There are some parallels to be drawn between News Archive Search (particularly the associated graphical illustrations) and the "news reference volume" feature in Google Trends. However, it is important to note that Google's "timeline" news graphs are based on monthly data-points; not daily data-points such as those used in (Trends) news reference volume. According to Google, Archive Search works as follows: "Articles related to a single story within a given time period are grouped together to allow users to see a broad perspective on the topics they are searching."

1 comment:

Martin Ryan said...

Google's "timeline view" in web search is not to be confused with the "Google Timeline". This is a fascinating and extremely detailed overview of how Google has evolved since 1995:

http://www.google.com/corporate/timeline/


The last zeitgeist available is for the year-end of 2008:
http://bit.ly/NdOD

You probably could have guessed, but 2008 was all about this:

1. sarah palin
2. beijing 2008
3. facebook login
4. tuenti
5. heath ledger
6. obama
7. nasza klasa
8. wer kennt wen
9. euro 2008
10. jonas brothers