Fig 1. Countries used & number of people.
I used data from round 6 of the European Social Survey (2012), a randomized cross-sectional survey of people aged 15+ from many countries in Europe. The ESS, which is free to access, has been conducted every 2 years since 2002. The 2012 edition contains data from 29 countries but I examined only 15 of these for my own arcane reasons (I couldn't fit them all in one graph). There are over 29,000 people in my sample and the frequency distribution by country is shown in Figure 1.
The question I was interested in was "How satisfied are you with life as a whole?" rated on a scale of 0 (extremely dissatisfied) to 10 (extremely satisfied). Among the 15 countries I examined, Denmark ranks first on this measure with an average score of 8.57 and four of the top five scoring countries are from Scandinavia, with Switzerland taking the other spot. The top five here are also the five highest-ranked European countries in the Legatum 2013 Prosperity Index, which measures prosperity via a mix of income and wellbeing measures.
The top five are followed by the Netherlands, Germany, Britain and Poland, which all report average scores over 7. Spain, Ireland, Italy and France are next with scores in the 6.4-6.8 range. I knew France had low wellbeing scores, which apparently motivated former President Sarkozy to set up the Stiglitz-Sen-Fitoussi Commission a few years ago, but I'm a little surprised at how low Ireland is. A major recession will do that I suppose. Portugal and Russia are the two most dissatisfied countries.
Next I examined wellbeing at each decade in life from age 20 to 80. The figure below replicates the famous U-shape of wellbeing using predicted values of life-satisfaction from a regression on age and age squared. The Stata code I used to produce it is below. The outcome variable "ls" is life-satisfaction. The double ## symbol is a shortcut in Stata which tells it to include both the main effects and the interaction of the variables on either side of the hashes. Because age is interacted with itself here, it's the same as including separate variables for age and age squared. The "c." before the "age" variable tells Stata that it is a continuous variable. Looking at the regression again I'm now thinking I should have clustered the standard errors by country but since this is just for fun I'll let it slide.
Stata code
reg ls c.age##c.age
margins, at(age=(20 30 40 50 60 70 80))
marginsplot
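For anyone curious about the clustering point, here is a minimal sketch of what it could look like. It is not part of the original analysis. It assumes the string variable "country" used in the country-by-country code further down, and "country_id" is a made-up name for its numeric version. The nlcom line is an extra that estimates the age at which the fitted curve turns. With only 15 clusters the clustered standard errors should be read with caution.
Stata code
* Sketch only, not the original analysis
* country_id is a hypothetical numeric version of the string variable country
encode country, gen(country_id)
* Same quadratic model, with standard errors clustered by country
reg ls c.age##c.age, vce(cluster country_id)
* Turning point of the fitted quadratic: -b_age / (2 * b_age_squared)
nlcom -_b[age] / (2 * _b[c.age#c.age])
* Predicted life-satisfaction at each decade, as before
margins, at(age=(20(10)80))
marginsplot
Clustering leaves the point estimates unchanged, so the plotted curve stays the same and only the confidence intervals move.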
Lastly, I wanted to see whether the U-shape varied by country. I used a very nice command created by Ben Jann called coefplot rather than marginsplot because the former has some nicer graphing options. In order to get all the different estimates in one graph, I ran the regression country-by-country, stored the estimates, and then combined them using coefplot. In the "coefplot" line of code, the option "vertical" draws the estimates along the vertical axis and "nooffset" stops coefplot from shifting each country's series sideways, so that the estimates for a given age line up. The "recast" and "symbol" options just specify aesthetic changes and "noci" suppresses confidence intervals. Switzerland and Spain are omitted from the regressions below because I forgot to include them.
Stata code
reg ls c.age##c.age if country == "Britain"
margins, at(age=(20 30 40 50 60 70 80)) post
estimates store BritainWB
reg ls c.age##c.age if country == "Denmark"
margins, at(age=(20 30 40 50 60 70 80)) post
estimates store DenmarkWB
... (repeat the above for all countries)
coefplot BritainWB DenmarkWB FinlandWB FranceWB GermanyWB IrelandWB ItalyWB NetherlandsWB NorwayWB PolandWB PortugalWB RussiaWB SwedenWB, vertical nooffset recast(connected) noci symbol(point)
The U-shape seems most symmetrical in Germany, Britain, Ireland, Poland and to a lesser extent in Norway, Netherlands and France. There's more of a steady increase in Sweden which is relatively flat from age 20-40 and then rises steadily for each subsequent decade. Old people in Denmark seem to be the most satisfied people in Europe, possibly the world. The way the curve bends for old Danes makes me think there's probably some 100 year old in Copenhagen who reports a score of 11. Italy, Portugal and Russia are exceptions to the U-trend due to low levels of life-satisfaction in the 60-80 age range.
Update 6.11.14
Here is the Stata code I used to produce the above analysis (now with Spain & Switzerland included):
use "C:\File Location\ESS6e02.dta", clear
keep idno cntry stflife agea
rename idno id
rename cntry c
rename stflife ls
rename agea a
drop if a > 90 | ls > 10
drop if c == "AL" | c == "BE" | c == "BG" | c == "HR" | c == "CY" | c == "CZ" | c == "EE" | c == "HU" | c == "IL" | c == "IS" | c == "LT" | c == "SI" | c == "SK" | c == "UA" | c == "XK"
reg ls c.a##c.a if c == "CH"
margins, at(a=(20 30 40 50 60 70 80)) vsquish post
estimates store Swiss
reg ls c.a##c.a if c == "DK"
margins, at(a=(20 30 40 50 60 70 80)) vsquish post
estimates store Denmark
reg ls c.a##c.a if c == "DE"
margins, at(a=(20 30 40 50 60 70 80)) vsquish post
estimates store Germany
reg ls c.a##c.a if c == "ES"
margins, at(a=(20 30 40 50 60 70 80)) vsquish post
estimates store Spain
reg ls c.a##c.a if c == "FI"
margins, at(a=(20 30 40 50 60 70 80)) vsquish post
estimates store Finland
reg ls c.a##c.a if c == "FR"
margins, at(a=(20 30 40 50 60 70 80)) vsquish post
estimates store France
reg ls c.a##c.a if c == "GB"
margins, at(a=(20 30 40 50 60 70 80)) vsquish post
estimates store GB
reg ls c.a##c.a if c == "IE"
margins, at(a=(20 30 40 50 60 70 80)) vsquish post
estimates store Ireland
reg ls c.a##c.a if c == "IT"
margins, at(a=(20 30 40 50 60 70 80)) vsquish post
estimates store Italy
reg ls c.a##c.a if c == "NL"
margins, at(a=(20 30 40 50 60 70 80)) vsquish post
estimates store Netherlands
reg ls c.a##c.a if c == "NO"
margins, at(a=(20 30 40 50 60 70 80)) vsquish post
estimates store Norway
reg ls c.a##c.a if c == "PL"
margins, at(a=(20 30 40 50 60 70 80)) vsquish post
estimates store Poland
reg ls c.a##c.a if c == "PT"
margins, at(a=(20 30 40 50 60 70 80)) vsquish post
estimates store Portugal
reg ls c.a##c.a if c == "RU"
margins, at(a=(20 30 40 50 60 70 80)) vsquish post
estimates store Russia
reg ls c.a##c.a if c == "SE"
margins, at(a=(20 30 40 50 60 70 80)) vsquish post
estimates store Sweden
coefplot Denmark Germany Finland France GB Ireland Italy Netherlands Norway Poland Portugal Russia Spain Sweden Swiss, nooff vertical recast(connected) noci symbol(point)
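The repeated blocks above could also be written as a loop. The following is a minimal sketch rather than the code actually used for the post. It assumes the renamed variables "ls", "a" and "c" from the data preparation lines, and the stored names (m_CH, m_DE and so on) are made up for illustration, so the graph legend will show those names instead of the full country names.
Stata code
* Sketch only: loop over the 15 country codes instead of repeating the block
* The m_ prefix on the stored estimates is a hypothetical naming choice
local models
foreach k in CH DE DK ES FI FR GB IE IT NL NO PL PT RU SE {
    quietly reg ls c.a##c.a if c == "`k'"
    quietly margins, at(a=(20(10)80)) post
    estimates store m_`k'
    local models `models' m_`k'
}
coefplot `models', nooff vertical recast(connected) noci symbol(point)
Because the list of stored estimates is held in a local macro, the whole block has to be run in one go from a do-file.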
12 comments:
Not being familiar with the literature, one wonders to what extent the U-curve is flatter or steeper for persons with children than for those who never had any.
Separating out age-period-cohort is tricky, see for example,
Yang, Yang. "Social inequalities in happiness in the United States, 1972 to 2004: An age-period-cohort analysis." American Sociological Review 73.2 (2008): 204-226.
http://asr.sagepub.com/content/73/2/204.short
Dear Marc,
Interesting post that I tried to redo using Stata and the ESS6e02.dta file.
However, I have some trouble reproducing your result. Actually, the DV "ls" is not in the file, but "stflife" is. So, I assume you did some pre-processing to create your "ls". Can you also publish that part of your code?
Thanks!
Eric
Hi Eric,
'stflife' is indeed the correct variable, I just renamed it to 'ls' myself using the command "rename stflife ls".
Thanks for pointing that discrepancy out and please let me know if you discover any other errors during your replication.
Mark
Dear Marc,
Thanks. Note that age is actually agea in the file. Using your syntax, edited, I ran:
reg stflife c.agea##c.agea
margins, at(agea=(20 30 40 50 60 70 80))
marginsplot
Interestingly, I now do not get the U-shaped margins plot but instead a downward linear line. So, where do I go wrong?
Eric
Dear Mark,
Thanks for the explanation. Note that age is actually agea in the file. Modifying your code into:
reg stflife c.agea##c.agea
margins, at(agea=(20 30 40 50 60 70 80))
marginsplot
The plot that I get is not U-Shaped but linear (downward by age). So, did I miss something?
Eric
Eric,
That's interesting - did you isolate your regression to the 15 countries I used?
Mark
Eric,
I should also have mentioned in the blog-post that I first clean the life satisfaction and age variables. So for life-satisfaction, I deleted everyone with a value greater than 10 using "drop if stflife > 10" - this removes the 'refusal', 'don't know' and 'no answer' groups. I also deleted people older than 80 with "drop if agea > 80" - this is very important as there are 133 people in the data who are coded with an age of 999 meaning 'not available'. Including them in the regression will bias the estimates because Stata will interpret those people as 999 years old.
Mark
Thanks again Mark,
Following your instructions, I coded this syntax, which gives a very similar result (I cannot explain the minor differences in country frequencies):
* Code 15 countries included in the analysis
gen cntry_mark = 0
replace cntry_mark = 1 if cntry=="GB"
replace cntry_mark = 1 if cntry=="DK"
replace cntry_mark = 1 if cntry=="FI"
replace cntry_mark = 1 if cntry=="FR"
replace cntry_mark = 1 if cntry=="DE"
replace cntry_mark = 1 if cntry=="IE"
replace cntry_mark = 1 if cntry=="IT"
replace cntry_mark = 1 if cntry=="NL"
replace cntry_mark = 1 if cntry=="NO"
replace cntry_mark = 1 if cntry=="PL"
replace cntry_mark = 1 if cntry=="PT"
replace cntry_mark = 1 if cntry=="RU"
replace cntry_mark = 1 if cntry=="ES"
replace cntry_mark = 1 if cntry=="SE"
replace cntry_mark = 1 if cntry=="CH"
* Code Mark's cohort (29,621 Observations)
gen coh_mark = 0
replace coh_mark = 1 if agea!=999 & stflife<=10 & cntry_mark==1
* Check the frequencies
sum cntry_mark // 54,673 Observations
bysort cntry_mark : tab cntry if coh_mark==1 // 29,716 Observations
reg stflife c.agea##c.agea if coh_mark==1
margins, at(agea=(20 30 40 50 60 70 80))
marginsplot // near identical with the web publication
Mark,
Note that I also tried:
replace coh_mark = 1 if agea<=80 & stflife<=10 & cntry_mark==1 // 28,511 Observations
But, again, that does not give me your 29,621 Observations
Eric
Cheers Eric,
I think the discrepancy is because I suggested dropping people older than 80 in the comment, but I trimmed the age by slightly different criteria when I was working with the data. I've now forgotten what those exact criteria were. For future blogposts like this I will include my do files to better facilitate replications.
Mark
I looked at this as well. Basically the data is trimmed at age <91 for the initial sample size calculation.