Importing SPSS syntax data files into R under Linux using PSPP

From Research Computing

To see how R could be used in practice, I downloaded a medium-sized data set (3500 or so cases) from the ICPSR (http://www.icpsr.org) and tried to import it into R. This data set, like others I've downloaded from the ICPSR, includes an SPSS syntax file that tells SPSS the length of each record in the data set (itself a separate text file), the names of the variables, the possible values of (some of) the variables, and the column positions of each variable. The syntax file also creates an output file in SPSS format; this is the file that R could use, if one could somehow create it in Linux. I will describe how to overcome this problem using additional open source software.

The basic problem facing an SPSS user is that their data, which comes as a raw data set plus an SPSS syntax file, cannot be read into R without first being converted to an SPSS data format file. If you search the Internet archives, you can find queries by professional statisticians who were apparently stumped by the problem of importing data into R using the SPSS syntax files they had been using all along with SPSS. It occurred to me that there might be a Linux clone of SPSS that could create the SPSS data format file. I was fortunate to find such a clone: PSPP (version 0.4.0), from the GNU PSPP project.

To get the SPSS clone PSPP to compile on my machine, I needed to install the GNU Scientific Library version 1.4 (version 1.7 had a syntax error, and PSPP needs at least libgsl version 1.4).

Some very minor fiddling with the SPSS syntax file was necessary. The first line

SET WIDTH=80 LENGTH=64

had to be changed to

SET /WIDTH=80 /LENGTH=64

and the line

FILE HANDLE  datafile  /NAME='nhsls.dat'  LRECL=3107

had to be changed to

FILE HANDLE  datafile  /NAME='nhsls.dat'  /LRECL=3107

(the remainder of this abysmally tedious file is omitted)
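
The two edits above can also be applied mechanically. What follows is only a minimal sketch using sed, assuming the setup file contains the SET and FILE HANDLE lines in the form shown above. The helper name fix_setup is hypothetical, and setup files for other ICPSR data sets may need different changes.

```shell
#!/bin/sh
# Sketch only: fix_setup is a hypothetical helper, not part of PSPP or SPSS.
# It reads the setup file named by its first argument and writes the
# corrected text to standard output, leaving the original file untouched.
fix_setup() {
    # On the SET line, put a slash in front of WIDTH= and LENGTH=.
    # On the FILE HANDLE line, put a slash in front of LRECL=.
    # A keyword that already has its slash is not preceded by a space,
    # so it is left alone and the function can be run twice safely.
    sed \
        -e '/^SET /s| WIDTH=| /WIDTH=|' \
        -e '/^SET /s| LENGTH=| /LENGTH=|' \
        -e '/^FILE HANDLE /s| LRECL=| /LRECL=|' \
        "$1"
}
```

One way to use it is to redirect the output to a new file and run pspp on that file instead, for example fix_setup SPSS_setup.sps > SPSS_setup_fixed.sps (the output file name here is made up for illustration). Writing to a new file keeps the original ICPSR setup file available for comparison.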

With these syntactical fixes, the data conversion worked.

First, the data conversion to SPSS format:

$ :~/statistics/ICPSR_06647/DS0001_Main_Data_File$ pspp SPSS_setup.sps

Then the data import in R:

$ :~/statistics/ICPSR_06647/DS0001_Main_Data_File$ R
R : Copyright 2005, The R Foundation for Statistical Computing
Version 2.2.1  (2005-12-20 r36812)
ISBN 3-900051-07-0
 
...
 
[Previously saved workspace restored]
 
> library(foreign)
> mydata <- read.spss(file='nhsls.spss')
>             

This operation took seconds on my personal computer (a 1GHz emachine).

Using PSPP (which may ultimately replace SPSS and could be sufficient for many users) is an additional step, but it makes it possible, entirely under Linux, to turn a data set described by an SPSS syntax file into an SPSS format data file that R can read.