Tuesday, December 16, 2008

Converting Between Different Types of Data

A good place to start reading on this topic is the SAS-SPSS-STATA FAQ on the UCLA STATA website. It mentions StatTransfer, which is the most comprehensive package for converting between different types of data. However, StatTransfer is a commercial package. A free trial version is available to download, but it converts only 15 out of every 16 observations, which is obviously undesirable for the perfectionist.

So I've been looking into data conversion possibilities besides StatTransfer, with an emphasis on the STATA user's point of view. The first thing I noted is that STATA reads many types of data using infile, infile2, insheet, infix, odbc and fdause. Infile reads unformatted ASCII (text) data. Infile2 (that is, infile used with a dictionary) reads ASCII (text) data in fixed format. Insheet reads ASCII (text) data created by a spreadsheet. Infix reads ASCII (text) data in fixed format without needing a dictionary. Odbc reads ODBC data sources such as Access, Excel and dBase files. Fdause reads datasets in FDA (SAS XPORT) format.
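As a rough sketch of how these commands look in a do-file, here is one line per command. The file names, variable names, column positions and the ODBC data source name are all hypothetical placeholders, and each command is an alternative rather than a step in a sequence, since each one replaces the data in memory.

```stata
* All file names, variable names and the DSN below are made-up placeholders.

* Free-format ASCII: values separated by blanks, variables named on the command line
infile id age income using "survey.raw", clear

* Fixed-format ASCII described by a dictionary file (assumed to name the data file)
infile using "survey.dct", clear

* Comma-separated text exported from a spreadsheet
insheet using "survey.csv", comma clear

* Fixed-format ASCII with column positions given directly on the command line
infix id 1-4 age 5-6 income 7-12 using "survey.txt", clear

* A table from an ODBC data source (Access, Excel, dBase and so on)
odbc load, table("survey") dsn("mydsn") clear

* A SAS XPORT (FDA) transport file
fdause "survey.xpt", clear
```

Whichever command is used, the result can then be written out as a STATA dataset with save, for example save "survey.dta", replace.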

Besides all these possibilities, researchers often find themselves with an SPSS file that they need to convert to STATA. It is possible to do this by opening the SPSS file (.sav) in SPSS and saving it in STATA .dta format (there are several STATA formats to choose from). One thing I noted when saving to STATA format within SPSS is that an SE format should be selected when saving large (i.e. >250 obs.) datasets. The highest STATA format available in SPSS 17 is STATA 8 SE, but files saved this way open fine in STATA 9 and 10. It is also possible to get SPSS 17 on a free 30-day trial from the SPSS website.

As an alternative, there is a relatively new STATA plug-in called "usespss". It is available as a STATA ado-file here and here. It reads SPSS (.sav) files into STATA, but if one needs to convert portable SPSS (.por) files (as I did), then a version of SPSS will have to be used for the conversion to .dta.
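Once usespss is installed, the conversion might look something like the sketch below. The file names are hypothetical, and the exact syntax shown is my assumption about how the command is called, so it is worth checking help usespss for the version you have installed.

```stata
* Hypothetical file names; confirm the syntax with "help usespss" before relying on it.

* Read the SPSS system file into memory
usespss using "survey.sav", clear

* Check that variables and labels came across as expected
describe

* Write the data out as a STATA dataset
save "survey.dta", replace
```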
