Thursday, August 26, 2010

SPSS/STATA

I don't know any empirical economist who uses SPSS (though I am sure some exist), so I guess this is more for the wider readership. Most psychology and business postgraduate students that I know work with SPSS, as do a relatively big chunk of people in medicine and related fields. Ignoring for a moment the fact that many fields and subfields require more specialist software, what do people think on this one? If an MPH student tells me that they are going to do an empirical thesis using, for example, micro-health data and that they are happy enough with SPSS, should I roll along or start shouting? In general, does this matter for day-to-day empirical research? In my case, it's more or less a no-brainer: I do not have any empirical collaborator who uses anything other than STATA for their main work, so collaboration would be difficult if I started sending them SPSS syntax files. Let me phrase a relatively simple question: if I were teaching the methods course in public health or psychology, which package should I use?

25 comments:

Dennis Alexis Valin Dittrich said...

R (http://www.r-project.org).

Liam Delaney said...

Will post on R soon. You don't have to search very far on the net to find people who believe that R is the future and far superior to the alternatives, particularly given that it is free. I have certainly seen a number of examples in things like graphics where R routines do some amazing things. I guess it would be good to get a sense of the basic parameters of these programmes along the lines of (i) how much fixed cost there is in learning to use them, (ii) how much of this fixed cost is pointless stuff like having to learn a programme-specific language rather than actual statistical code, (iii) what costs these packages impose in terms of talking to the main software packages and collaborating with non-initiates, and (iv) what the comparative advantages are for different types of users.

Judy said...

My Econometrics TA for the first year of the PhD programme was totally against STATA; he said he would never ever use it, referring to it as a "canned package". All our problem sets were done in MATLAB, and even then our TA said that R was far superior. We have a Nobel laureate who apparently only uses FORTRAN.

I do not know a huge deal about programming languages, but from what I can see there is a clear divide between who uses what. Macroeconomists, probably owing to their fondness for numerical methods, seem to use more powerful tools such as R and FORTRAN, whereas Applied Micro people seem to be happy with good oul STATA. But as I said, this is just my observation.

Kevin Denny said...

I have looked at R a bit. It's pretty unfriendly for the same reason as LaTeX, and hence will attract people for the same reason. For some purposes it may be useful, but as a general-purpose statistical platform I think Stata dominates by far. But if you need to do fancy 3D graphics or serious non- and semi-parametric regression, then R is the way forward, unless you want to pay for S+. Stata has generally been edging into new areas, but I think there will always be a place for both. Vive la différence.

Liam Delaney said...

Yes, the "canned" package argument comes up a lot. I guess I have sympathy for that argument, particularly for colleagues who genuinely are constructing estimators that are not available in "canned" form.

Did you find the MATLAB aspect useful Judy?

Liam Delaney said...

One thing we should certainly do in universities like UCD is team up with computer science to allow people in economics to take courses for credit in different programming languages. Realistically, it would be difficult to include detailed instruction on things like FORTRAN within a general MA programme, but it would not be difficult to allow a student to take it for credit in computer science.

Mark McGovern said...

I don’t see why you would consider anything other than Stata for a master's programme, at least a general master's that isn't part of an integrated 4+ year PhD. In terms of ease of use (you could teach anyone how to run a regression) and features, I can’t really see the argument for anything else. At the end of the day it’s like LaTeX vs Word: the output is going to be exactly the same no matter which you choose. So I don’t understand the "canned package" argument for the 99.99% of students who have no interest whatsoever in playing around with matrices. Undergrads, who have little experience with econometrics, let alone statistical packages, will just be happy with something that is widely used and moderately accessible. They won’t be programming up their own estimators, so there’s no point in subjecting everyone to a course on FORTRAN, GAUSS or even R; it would just be cruel. That said, my (limited) experience with Mata suggests it can do a lot of the things that MATLAB can. It’s a different matter for PhD students who are specialising in particular fields, but let them take field courses in whatever they’re interested in. I can certainly see the R argument for them.

Mark McGovern said...

Btw, it’s the same with LaTeX: there’s no point in using it for a master’s thesis unless you’re considering doing a PhD. The fixed cost just isn’t worth it unless it’s spread out over 3 or 4 years. Having said that, once you are familiar with it there are some things you just couldn’t do any other way. Try writing 7 reports with 100 tables each on the same data. If you have your code set up properly, you shouldn’t have to edit anything by hand. With LaTeX and a good do-file it will take you 30 minutes; it would take 30 days in Word.

Alan Fernihough said...

Well, I would have to contradict the consensus on this thread. I recently changed from STATA to R and found it a very simple transition. To me there's no real difference between the two packages -- they're both big calculators that invert matrices. The language is very similar, there's loads of help online, and there's also a module that provides drop-down menus for the less intellectually inclined.

Also, I should put a good word in for gretl -- which I played around with and it seems pretty damn easy to use also.

Personally, I think it's a waste of money for universities to spend so much on all this software when there are alternatives. The same goes for Office (OpenOffice) and Windows (Linux). Has anybody here tried using Ubuntu or Red Hat? They're fine too. All this other crap is a complete waste of university resources. Hopefully, universities can start to move to freeware solutions in the future.

Alan Fernihough said...

That said I agree 100% with Liam and Mark -- if a student has access and has only been exposed to STATA they should just stick with it for the three months of a masters thesis.

Oh and on SPSS, I know some people are quite quick to dismiss it as inferior, but as far as I am aware there are some nice packages on it which are not contained in other software.

Peter Carney said...

Writing computer code is an intellectual activity? And more intellectual than using a drop-down menu?

So does it follow that using a hand-saw is more intellectual than using a power saw? These are just tools. I say, use the ones that are proven to work (in most cases) and don't spend all day in the hardware shop.

To use a different analogy, this debate (LaTeX, R, Word) isn't dissimilar to asking whether one should drive a Mercedes, an Audi or a VW, or cycle. They will all get you there; the alloys on the Audi might be the prettiest, and the VW has good fuel consumption. So what!? Should taxi drivers sit around and agree on a professional standard for the vehicles they use? No, in my opinion (although it's worth noting the NYC taxicab: they all use the same colour, but different manufacturers).

The choice of tools comes down to taste and time allocation. I think there are greater returns to spending time on other aspects of our MA/PhD than to individually learning lots of different (potentially good) tools and, as Dave, our in-house Latin scholar, might add: de gustibus non disputandum est.

On freeware systems and programs: yeah, they are fine until you try to interact with the outside world. Even printing a document over a network using an OS like Linux, for example, can become a very complex task, which quickly erodes the marginal benefits of these alternative systems, as far as I can see.

QED

Alan Fernihough said...

Re drop-down menus: I knew somebody would mistake my sarcasm.

Re Linux and other freeware: the maths department in UCD has (or had) this setup (Red Hat), and everything like internet and printer networks worked fine. Additionally, R, gretl, OpenOffice and so forth work fine on my machine.

QED.

Peter Carney said...

Sarcasm. Ok. That's clever.

So do you bring your 'machine' over to the maths department for printing?



We're economists, right!? Then we know that a certain element of this "progression" isn't new or surprising. If I can be dismal for a minute, professionals (not excluding academics) have a history of making costly efforts to raise barriers to entry and obturate communication so as to protect their rents. Accept it or not, the 'professional' norms being established (e.g., LaTeX) are not a substantive improvement in our science, nor do they add great value to our output. The same idea applies to the other tools discussed here that are becoming standard practice.

In a less dismal view, I fully acknowledge some people are naturally wired to enjoy new ICT on an intrinsic level. I can personally relate to this - it's a hobby.

And speaking of hobbies, the Latin flagrantly flying around here is a particular case in point re barriers to entry to academia (the gowns are highly relevant too, but that's for another day).

...maybe in time we will refer to LaTeX code in similar jest.

\end

Judy said...

@Liam: Ya, I really enjoyed MATLAB; it's quick and easy to learn and pretty effective. Though there were a few people in my class who used R because the graphs were nicer :)

I think a background in MATLAB is very useful to have, as it's a pretty straightforward language, but I want to use R in the future: I've heard so many great things about it and, most importantly, it's free.

I have heard a lot of good stuff about Python too. Apparently if you use Python, or maybe it was FORTRAN, together with R, they kind of integrate, so you can make changes in your R code and it will affect the FORTRAN/Python program. So you have FORTRAN/Python as your main program and use R for making changes, or something like that; I wasn't really paying attention to the TA, but it sounded pretty cool.

Liam Delaney said...

Alan - "waste of money" is too strong, though a cost-benefit analysis of some of this spending would be a useful thing to do. From my point of view, particularly when I developed courses for undergraduate and MA econometrics, it felt like an obligation to teach the "industry standard". Having said that, it would probably be easier for students to switch to STATA having mastered R than vice versa. Also, we have increasing numbers of students who will eventually be going back to work in very poor countries, which makes R more appealing.

Liam Delaney said...

Would it make sense to have a "software" module, or is this missing the point a bit? MATLAB, FORTRAN and R crash courses could be run at the start of a programme and made optional for people likely to need them, or made part of the summer programme here in the Institute.

Liam Delaney said...

Could call the summer school "Nerdathon" or "Nerdaggaden". Other suggestions incorporating the word "nerd" welcome.

Peter Carney said...

It might be worth introducing the Dvorak keyboard to the debate. It has known efficiency gains and causes fewer strain injuries. In keeping with the current rate of things here, it would make sense to include a tutorial on it at the summer school.

On naming... how about Nerd-Comp

Liam Delaney said...

Peter - in terms of barriers to entry, many people make the point that packages like LaTeX and R reduce barriers to entry for independent researchers and universities in poor countries because they are free and completely open-source. We have never had a problem funding STATA here, but it really is not impossible that in the next couple of years we would not be able to fund upgrades.

Enda Hargaden said...

"Nerdaggaden" sounds like something I should trademark.

I think the occasional, informal crash courses the Institute hosts are a great idea. It would be a good public service to advertise these on the blog and let anyone come along.

Peter Carney said...

Liam - a scenario whereby we rely exclusively on 'freeware' to run our regressions is hardly desirable, regardless of the international politics of inequality.

What was that one? No such thing as a 'free' lunch? I didn't suggest the cost of entry be zero; that's just nonsense.

Anyway, I'm wandering off the point a bit. To move the discussion on in a constructive way, it might be worth thinking about the benefits of doing a survey of authors who published in the AER in the past 12 months, asking them:

- what they used
- why they used it
- what they would like to use
- what they tell their grad. students to use

I imagine there would be quite a lot of interest in something like this. About 100 emails would do it, and it would give us some benchmark of professional standards.

Liam Delaney said...

Not sure about doing a survey, but I will post a more detailed overview of the thinking on this. Peter - for us in UCD it is still a choice variable whether we use freeware or licensed statistical software. For many places trying to build capacity, the licences are too expensive, so freeware is not only desirable but essential. Like you, I do all of my work in STATA and I'm not likely to change that. A lot of very smart people are saying that the future is R, though, so it's worth giving that a hearing. I think the crash-course approach would be a good one to take across the Irish system. I don't think any one place would have sufficient demand, but week-long programmes run yearly for the main software might be a shot in the arm. Or we could do them online...

Alan Fernihough said...

@Peter -- you have misunderstood my point. I agree with you (both on this thread and the other, Colm too) on LaTeX. It's cumbersome.

My point relates to excellent freeware substitutes. For example, OpenOffice is pretty much the same as MS Office, except it's free.

I have no particular bias. I don't think regressions on R are better than STATA or SPSS or Gauss. They're all just big calculators as far as I'm concerned. However, I think R is getting a lot of flak here for not being user-friendly, which I disagree with.

@Liam, I would be happy to talk to anybody who is interested in learning R. Hopefully, I could convince people that it is worthwhile learning. STATA may be the industry standard (I will also continue to use it) but R will definitely be the industry standard of the future.

Liam Delaney said...

Great, thanks Alan. I have a relatively open mind about these things. Even from reading the very select sample of people who comment on this blog, it is clear that we should be providing detailed training on both LaTeX and R as there seems to be a real demand for this.

Liam Delaney said...

Also, I think Alan's response is fair, Peter, and we seem to have all converged on a "cost-benefit" interpretation of who should use these packages, depending on the actual money cost (free in the case of R, potentially very expensive in the case of STATA and SPSS), start-up learning costs (high in the case of R, lower for STATA and lower still for SPSS), preprogrammed routines and community interaction (arguably highest for STATA from an economist's point of view), and specialist features (each package has some, and R perhaps has some that are particularly useful for frontier programmes).

I have never really got a sense from R users that they use it as a cliquish or "barrier-to-entry" type thing. Most of them are enthusiastic about the cost advantages and find that they can do things that they can't do with the standard packages.