Sunday, December 11, 2011

Using the Internet to Predict the Future

I blogged recently about the predictive power (or not) of Twitter, from marketing to finance and from X-Factor to elections. There may be skepticism about the predictive power of social networks (perhaps more so for finance than for marketing, elections or popular culture), even when using an ostensibly professional network such as Twitter, but there is no doubt that the medium produces a lot of user-generated information. However, Twitter users are a select sample, a point emphasised in a recent study by Yahoo! Research. Nonetheless, the production, flow, and consumption of information will undoubtedly be an interesting area of economic research to follow in the future. Indeed, such information is not confined to networks such as Twitter; the whole internet is a veritable goldmine of potentially predictive data.

This possibility has been tapped into before by researchers using Google Trends. I posted on the old Geary blog about a comparison of Google Trends data with polls, bookmakers' odds and prediction markets (based on the British Election), which showed that it is important to be careful when interpreting search data. In particular, I noted before that it is important to distinguish between "interest" and "intent": a surge in searches for a candidate shows that people are curious about them, not that they plan to vote for them. It is clear that understanding more about search is a big challenge, both for the search-engine based advertising business and for social scientists. Search data is (or should be) interesting to academics, principally because we don't ask people for the information that they provide in search queries. No matter how well-designed surveys are, there will always be things in the ether, trends in society, that will potentially appear in search data first.

Of course, anyone interested in predicting the future should be poring over search data, social-network data, and whatever else they can find on the internet. That is exactly what a company called Recorded Future does. A recent article in the New York Times says:
"A company called Recorded Future looks at 100,000 Web pages an hour, scanning across 50,000 sources that include everything from Securities and Exchange Commission filings to Twitter comments. The idea is to look for statements about the future, like notice of an annual meeting or predictions about when a product might be released, look at past developments and then create a temporal index that suggests trends... its clients have included government agencies and banks. Its products include a $9,000-a-month service for hedge funds that plugs Recorded Future’s insights into their trading networks... (it) also started offering a Web-based version of its product on a subscription basis for $149 a month."
According to the NYT article: "two... key competitors in the Web-based predictions business (are) Palantir Technologies and Quid. Aside from those companies, the open-source statistical programming language known as R is being used as a cheap way to make statistical inference in our data-drenched world; a company called Revolution Analytics sells a commercial version to financial companies and manufacturers, among others."
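
To make the "temporal index" idea in the NYT description a little more concrete, here is a toy sketch in Python. It is not Recorded Future's method (the article gives no implementation details); the function name build_temporal_index, the sample sentences and the date format it recognises are all my own assumptions. It simply pulls out sentences that mention a date later than today and files them under that date.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical helper data: month names mapped to month numbers.
MONTHS = {name: i for i, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Only recognises dates written like "March 5, 2012" (an assumption made
# to keep the sketch short; real text needs far more robust parsing).
DATE_RE = re.compile(r"\b(" + "|".join(MONTHS) + r") (\d{1,2}), (\d{4})\b")


def build_temporal_index(documents, as_of):
    """Map each date after `as_of` to the sentences that mention it."""
    index = defaultdict(list)
    for doc in documents:
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            for month, day, year in DATE_RE.findall(sentence):
                # Note: an impossible date such as "February 30" would
                # raise ValueError here; the sketch does not handle that.
                mentioned = date(int(year), MONTHS[month], int(day))
                if mentioned > as_of:
                    index[mentioned].append(sentence.strip())
    return dict(sorted(index.items()))


# Made-up example statements, for illustration only.
docs = [
    "Acme Corp will hold its annual meeting on March 5, 2012. "
    "Shares rose on December 1, 2011.",
    "Analysts expect the Acme phone to ship by March 5, 2012.",
    "The regulator's review concludes on June 30, 2012.",
]

index = build_temporal_index(docs, as_of=date(2011, 12, 11))
for when, statements in index.items():
    print(when.isoformat(), len(statements))
```

Dates that attract many statements from many sources are the ones worth a closer look. Deciding which of those statements to believe is, of course, the hard part, and this sketch says nothing about it.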

Wired Magazine ran a piece on Recorded Future last month, saying:
"They aren't traders... but if you'd started using Recorded Future's predictions to buy US stocks on January 1, 2009, you would have made an annual return of 56.69 per cent. (The S&P 500 had an annualised return of 17.22 per cent over the same period.) Between May 13 and August 5 this year, as markets behaved with vertiginous abandon, their strategy returned 10.4 per cent; in contrast, the S&P 500 lost 9.9 per cent of its value. They're data experts: computer scientists, statisticians and experts in linguistics. And in the data, they think, lies the future."
As the promos say: "unleash all that mankind knows about the future". It's what they used to call "the wisdom of crowds".
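
For readers who want to put the return figures in the Wired quote on a common footing, the following is a small Python sketch of the arithmetic. The two function names are hypothetical, and the two-year horizon is purely an illustrative assumption, since the quote does not give the end date of the back-test. The percentages themselves are taken from the quote.

```python
def cumulative_return(annual_rate, years):
    """Total return from compounding `annual_rate` for `years` years."""
    return (1 + annual_rate) ** years - 1


def relative_outperformance(strategy_return, benchmark_return):
    """How much more a unit invested in the strategy is worth than a unit
    invested in the benchmark, as a fraction of the benchmark's value."""
    return (1 + strategy_return) / (1 + benchmark_return) - 1


# Annualised figures from the Wired quote, compounded over an ASSUMED
# two-year horizon (the quote does not state the actual period length).
strategy_total = cumulative_return(0.5669, 2)
benchmark_total = cumulative_return(0.1722, 2)
print(f"Strategy over 2 years: {strategy_total:.1%}")
print(f"S&P 500 over 2 years:  {benchmark_total:.1%}")

# May 13 to August 5: strategy +10.4 per cent, S&P 500 -9.9 per cent.
gap = relative_outperformance(0.104, -0.099)
print(f"Relative outperformance, May 13 to August 5: {gap:.1%}")
```

Note that the gap between a 10.4 per cent gain and a 9.9 per cent loss is slightly larger than the 20.3 points you get by simple subtraction, because the comparison is made against a benchmark that has shrunk.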

Postscript: The Salfordian reports that "the Living Earth Simulator Project (LES) aims to ‘simulate everything’ on the planet, using anything from tweets to government statistics to map out social trends and predict the next economic crisis... The European Commission has... put the Living Earth Simulator at the top of its shortlist for £900m in funding." Also, DCU PhD graduate Adam Bermingham won this year's Irish Software Association award for the student project with the greatest commercial potential. Adam researched, designed and implemented a real-time sentiment monitoring system, SentiSense, to determine how people value different types of opinion when they are monitoring real-time social media content.

