Gerard O'Neill has flagged an interesting data collection engine on the Amarach Research Blog; it's called "We Feel Fine". Full details are available here. The engine automatically scours the Internet every ten minutes, harvesting human feelings from a large number of blogs. Blog data comes from a variety of online sources, including LiveJournal, MSN Spaces, MySpace, Blogger, Flickr, Technorati, Feedster, Ice Rocket, and Google. Blog posts are scanned for occurrences of the phrases "I feel" and "I am feeling".
Once a sentence containing "I feel" or "I am feeling" is found, the system "looks backward to the beginning of the sentence, and forward to the end of the sentence, and then saves the full sentence in a database." Once saved, the sentence is scanned to see if it includes one of about 5,000 pre-identified feelings. Because a high percentage of all blogs are hosted by one of several large blogging companies (Blogger, MySpace, MSN Spaces, LiveJournal, etc.), the URL format of many blog posts can be used to extract the username of the post's author.
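To make the sentence-harvesting step concrete, here is a minimal sketch in Python. It is not the We Feel Fine code: the function names, the tiny `FEELINGS` set (standing in for the roughly 5,000 pre-identified feelings), and the rule that a sentence ends at ".", "!" or "?" are all my own assumptions for illustration.

```python
import re

# Hypothetical stand-in for the ~5,000 pre-identified feelings.
FEELINGS = {"happy", "sad", "lonely", "fine", "tired"}

# The two trigger phrases described in the post.
TRIGGER = re.compile(r"\bI (?:feel|am feeling)\b", re.IGNORECASE)

# Assumption: a sentence ends at one of these characters.
SENTENCE_END = ".!?"


def extract_feeling_sentences(text):
    """Return each full sentence that contains a trigger phrase."""
    sentences = []
    seen_spans = set()
    for match in TRIGGER.finditer(text):
        # Look backward to the beginning of the sentence.
        start = match.start()
        while start > 0 and text[start - 1] not in SENTENCE_END:
            start -= 1
        # Look forward to the end of the sentence.
        end = match.end()
        while end < len(text) and text[end] not in SENTENCE_END:
            end += 1
        end = min(end + 1, len(text))  # keep the closing punctuation
        # Skip a sentence already saved via an earlier trigger in it.
        if (start, end) not in seen_spans:
            seen_spans.add((start, end))
            sentences.append(text[start:end].strip())
    return sentences


def find_feelings(sentence):
    """Return the known feelings mentioned in a saved sentence, in order."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [word for word in words if word in FEELINGS]
```

Given the text "Today was long. I feel tired but happy! Nothing else.", `extract_feeling_sentences` keeps only the middle sentence, and `find_feelings` on that sentence picks out "tired" and "happy". A real system would need a far more careful sentence splitter (abbreviations, ellipses, missing punctuation), which this sketch ignores.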
Given the author's username, it is possible to find the user's profile page, and from the profile page it is possible to extract the age, gender, country, state, and city of the blog's owner. Given the country, state, and city, information on local weather conditions is also retrieved! This could be a powerful tool for understanding cross-cultural variation in self-reported well-being, while controlling for geography and weather conditions.
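The username step, on which the profile and weather lookups depend, can be sketched in the same spirit. The URL layouts below (username as a subdomain for some hosts, as the first path segment for others) are assumptions chosen for illustration; the post does not say which patterns the engine actually relies on, and `extract_username` is a hypothetical helper.

```python
from urllib.parse import urlparse

# Assumed layouts, for illustration only:
#   http://<username>.blogspot.com/...   (username as subdomain)
#   http://www.myspace.com/<username>    (username as first path segment)
SUBDOMAIN_HOSTS = ("blogspot.com", "livejournal.com")
PATH_HOSTS = ("myspace.com",)


def extract_username(url):
    """Guess the author's username from a blog post URL, or return None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    for suffix in SUBDOMAIN_HOSTS:
        if host.endswith("." + suffix):
            # Take the label directly in front of the hosting domain.
            label = host[: -len(suffix) - 1].split(".")[-1]
            return None if label in ("", "www") else label

    for suffix in PATH_HOSTS:
        if host in (suffix, "www." + suffix):
            # Take the first non-empty path segment.
            parts = [part for part in parsed.path.split("/") if part]
            return parts[0] if parts else None

    return None  # unknown host: no username can be inferred
```

With these assumed patterns, `extract_username("http://alice.blogspot.com/2006/05/post.html")` gives "alice", while a URL on an unrecognised host gives `None`. Fetching the profile page and the local weather is left out, since that would mean guessing at page formats and services the post does not describe.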