Mathematical and Statistical Application – SAVAS

Overview

SAVAS is a C shell script that, based on file extensions, makes SAS (version 6.09 or later) copies of Stata (version 8 or later) data files, or Stata (version 8 or later) copies of SAS (version 6.09 or later) data files.

The SAVAS script is installed on the Research Computing server KillDevil. SAVAS requires the Stata program savasas, which can be used in Stata to save Stata datasets as SAS datasets, and the SAS macro SAVASTATA, which saves SAS datasets as Stata datasets and can be used in a SAS program. The Stata program usesas uses the SAVASTATA macro to load SAS datasets into Stata. The location of the SAVASTATA macro is: /nas02/apps/stata-12/ado/updates/savastata.mac

To use SAVAS, type:

  $  savas  [-options]  DataSetName.ext  ...

Here are some examples:

  $  savas  *.dta
  $  savas  *.sas7bdat
  $  savas  -n  10  mystata.dta

Description

SAVAS can copy one or more SAS/Stata datasets as Stata/SAS files. The output dataset will have the same name, but with the appropriate filename extension:

  • SAS Version 9/8: .sas7bdat
  • SAS Version 6 UNIX: .ssd01
  • SAS Version 6 Linux: .ssd02
  • SAS 6 Transport/Xport: .xpt, .xport, .exp, .export, .sasx, .stx, .v5x, .v6x, .trans, or .expt file extensions plus whatever file extension the file might have.
  • Stata: .dta.
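The extension rules above can be sketched in Python. This is only an illustration of the naming convention, not SAVAS's own code: the function name `output_name` and the choice of `.sas7bdat` as the default SAS extension are assumptions made for the example.

```python
from pathlib import PurePosixPath

# Extensions taken from the list above, grouped by the kind of file they denote.
SAS_EXTENSIONS = {".sas7bdat", ".ssd01", ".ssd02"}
XPORT_EXTENSIONS = {".xpt", ".xport", ".exp", ".export", ".sasx",
                    ".stx", ".v5x", ".v6x", ".trans", ".expt"}
STATA_EXTENSIONS = {".dta"}


def output_name(input_path, sas_extension=".sas7bdat"):
    """Return the output path implied by the input extension (hypothetical helper)."""
    path = PurePosixPath(input_path)
    ext = path.suffix
    if ext in STATA_EXTENSIONS:
        # Stata input: same name, SAS extension (assumed default: .sas7bdat).
        return str(path.with_suffix(sas_extension))
    if ext in SAS_EXTENSIONS or ext in XPORT_EXTENSIONS:
        # SAS or transport input: same name, Stata extension.
        return str(path.with_suffix(".dta"))
    raise ValueError(f"unrecognized extension: {ext!r}")


print(output_name("/data/survey.dta"))
print(output_name("survey.ssd01"))
```

The helper only changes the extension, which mirrors the statement that the output dataset keeps the same name.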

SAVAS can convert SPSS portable files to Stata thanks to SAS’s SPSS read-only engine.

By default, the Stata/SAS file is created in the same directory as the SAS/Stata data file, with the appropriate filename extension, and contains all observations and every variable in the SAS/Stata data file. SAVAS requires both Stata and SAS on the same machine.

SAVAS cannot process files whose filenames or directory names contain single or double quotes.

The procedure is as follows:

1. SAVAS creates a Stata/SAS program that loads the Stata/SAS dataset into Stata/SAS and calls the SAVAS Stata/SAS program.

2. SAVAS either uses Stata’s command fdasave to save the dataset in memory temporarily as a SAS xport data file, or has SAS write the data to ASCII.

3. SAVAS writes a Stata/SAS input program to load the dataset into Stata/SAS and to assign variable names, labels (and formats).

4. SAVAS runs the program in Stata/SAS in batch mode to load the data.

5. Stata/SAS saves the data as the specified Stata/SAS file type and version.

Note: If saving to an older version of SAS or Stata whose variable name limit is shorter than that of the dataset being processed, SAVAS checks for variable names that are too long for the output dataset and, if the “-rename” option is issued, renames them to the first 8 characters, or to up to 7 characters plus a number. It also displays the list of renamed variables.
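The renaming rule in the note can be illustrated with a short Python sketch. It is a guess at one reasonable way to apply "first 8 characters, or up to 7 plus a number"; the function name `shorten_names` and the order in which numbers are tried are assumptions, and SAVAS may resolve clashes differently.

```python
def shorten_names(names, max_len=8):
    """Map each over-long name to a unique name of at most max_len characters."""
    # Names that already fit are kept as they are and must not be reused.
    taken = {name for name in names if len(name) <= max_len}
    renamed = {}
    for name in names:
        if len(name) <= max_len:
            continue
        # First choice: the first max_len characters.
        candidate = name[:max_len]
        counter = 1
        while candidate in taken:
            # Clash: shorten the stem and append a number instead.
            suffix = str(counter)
            candidate = name[:max_len - len(suffix)] + suffix
            counter += 1
        taken.add(candidate)
        renamed[name] = candidate
    return renamed


for old, new in shorten_names(["household_income", "household_size", "age"]).items():
    print(f"{old} -> {new}")
```

Returning the old-to-new mapping corresponds to the list of renamed variables that SAVAS displays.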

If the SAS/Stata dataset is sorted by one or more variables, the Stata/SAS dataset will also be sorted by those same variables.

For Stata SE users: the maximum length for a string variable to be passed on to SAS is 200 characters. Longer strings are cut to their first 200 characters before being passed on to SAS (this is a limitation of the SAS xport dataset used to transfer data from Stata to SAS).

If saving a SAS dataset as a Stata dataset, long character variables will be truncated to the maximum length that Stata will allow. This maximum may be 80 or 244 depending on what version of Stata is being used; Stata’s help page on “limits” will let you know which applies. SAVAS will report which, if any, variables were truncated and to what length they were truncated. Stata variable labels can be up to 80 characters in length.
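To make the truncation behavior concrete, here is a minimal Python sketch that cuts string values to a maximum length and records which variables were affected. The data layout (a dict of column name to list of strings) and the name `truncate_strings` are assumptions for illustration only; the limit of 200 is the Stata-to-SAS figure quoted above.

```python
SAS_XPORT_MAX = 200  # limit quoted above for strings passed from Stata to SAS


def truncate_strings(columns, max_len):
    """Truncate every string value to max_len; report the columns that were cut."""
    truncated = {}
    report = {}
    for name, values in columns.items():
        longest = max((len(value) for value in values), default=0)
        truncated[name] = [value[:max_len] for value in values]
        if longest > max_len:
            # Record the length the column was truncated to.
            report[name] = max_len
    return truncated, report


data = {"note": ["x" * 250, "ok"], "id": ["a1"]}
new_data, report = truncate_strings(data, SAS_XPORT_MAX)
print(report)
```

The same function covers the SAS-to-Stata direction by passing 80 or 244 as `max_len`, whichever applies to the Stata version in use.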

Options

  • -c/-curdir

SAVAS saves the Stata/SAS dataset to the current working directory, even though the Stata/SAS dataset may be located elsewhere.

  • -r/-replace

By default, SAVAS warns the user if the output dataset already exists, and asks permission to overwrite it. Option -replace suppresses this interactive behavior and replaces any existing output dataset without warning.

  • -sas6

indicates to save the Stata file as a SAS version 6 file. SAS 9 will read/open SAS 6 files but will not save to a version 6 SAS dataset.

  • -sasx

indicates to save the Stata file as a SAS version 6 transport/xport file using the xport engine.

  • -o/-old

indicates to save the Stata file in the format of the Stata version preceding the current one, e.g., version 7.

  • -i/-intercooled

indicates to save the Stata file as Intercooled. This is only necessary if Stata SE is being used.

  • -char2lab

indicates to use the SAS macro char2fmt to convert long character variables to numeric with Stata value labels. This is like Stata’s -encode- command. This option is only helpful when saving to a Stata 9 or higher dataset since Stata 9 added the feature of allowing value labels to be up to 32,000 characters long.

  • -fmts/-formats

specifies to either save value labels that exist in the Stata dataset as SAS formats in a file that will have the same name as the data file but with the “.sas7bcat” file extension, or to use such a file if creating a Stata dataset. This formats catalog file will be created in, or needs to be in, the same directory as the SAS data file. By default value labels are not saved or created. NOTE: SAS format names have to be 8 characters or fewer and cannot end in a number. SAVAS makes some attempt to rename invalid SAS formats, but it would be best for you to rename or drop them in Stata before using SAVAS. Stata does not allow user-defined formats (value labels) for string variables or for numbers with decimal values.

  • -q/-quotes

indicates to replace double quotes ( " ) occurring in character variables with single quotes ( ' ) and replace compound quotes ( `" or "' ) occurring in variable labels or formats with single quotes ( ' ). SAVAS cannot process character variables with double quotes or variable labels or formats with compound quotes.

  • -x/-xport

SAVAS converts SAS transport files into Stata data files. Note: Multiple transport data files can be processed at a time but all data files need to be SAS transport files. There can be no intermixing of regular SAS/Stata data files and transport files when using this option.

  • -f/-float

prevents the use of Stata’s variable type `double'. All variables whose SAS precision would require Stata’s `double' type are created as `float'. This option may lead to a loss of precision, but saves space: a `float' is stored in 4 bytes, a `double' in 8 bytes.

  • -check

creates two check files for the user to compare the input dataset with the output dataset to make sure SAVAS created the files correctly. This is a comparison that should be done after any data file is converted to any other type of data file by any software. The files are created in the same directory as the output data file and are named starting with the name of the data file followed by either “_SAScheck.lst” (SAS) or “_STATAcheck.log” (Stata), e.g. “mydata_SAScheck.lst” and “mydata_STATAcheck.log”.

  • -rights

sets the file permissions of the new SAS file to the defaults for a new file in that directory. Without this option, the new file gets the same permissions as the Stata data file.

  • -rename

specifies that any required renaming of variable names is to be done. The -rename option is only necessary when saving to an older version of SAS or Stata or when variable names are not unique in SAS. When saving to an older version, -rename attempts to make long variable names (more than 8 characters) unique by shortening each one to its first 8 characters, or to up to 7 characters plus a number. SAVAS lists all variables that were renamed.

  • -b/-beep

beeps upon completion.

  • -s/-silent

be silent; in this case, SAVAS does not print any output to the screen, except for error messages. By default, SAVAS tells what stage of the conversion process is currently being executed, and it reports number of variables, number of observations, and more.

  • -sascode

specifies that only a data file and an input program are to be created. By default, SAVAS executes all of the steps outlined above. The -sascode option aborts this process after step (3). The user then needs to read in the data manually using Stata/SAS. SAVAS writes a SAS program (mydata_infile.sas) to read in the xport data file (mydata.xpt).

  • -m/-messy

specifies that the intermediary files created by SAVAS during its operation are not to be deleted. The -messy option prevents SAVAS from cleaning up after it has finished. This option is mostly useful for debugging, to find out where something went wrong. All intermediary files have a name starting with an underscore (_) followed by the process ID and are located in the temp directory.

  • -MydIreKtOrYNaMehAsspAcEs

“My Direktory Name Has Spaces” allows for the input directory name or Stata filename to have spaces in the name. For example:

  $  savas  -MydIreKtOrYNaMehAsspAcEs  /project im working on/data/MyStataFile.dta

This switch is not needed if the input Stata data file is in the current working directory and that directory has spaces in the name. Only one Stata data file can be processed at a time when this switch is used.

  • -obs=n

converts only the first n observations. By default, SAVAS converts all observations of the Stata/SAS dataset.

  • -varfile=filename

may be used to select only a subset of variables to be included in the Stata/SAS dataset. This speeds up the conversion and is useful when the number of variables is too large for a file that is not Stata SE (Special Edition), that is, more than 2,047 variables. The filename is the name of a file whose contents are variable names only. These variable names are case-insensitive when saving to Stata. If saving to SAS, multiple variables can be listed using any of Stata’s varlist rules; for example, var* is understood as var1, var2, …. If saving to Stata, multiple variables with the same stem may be specified as ranges according to general SAS rules; for example, var1-var20 is understood as var1, var2, …, var20.

  • -n/-nice

runs SAS/Stata at a reduced scheduling priority (niceness). The default is 20. This should be used if you have a very large data file and there are others using the UNIX/Linux box. For example:

  $  savas  -n  10  mystata.dta
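When SAVAS is driven from another program, the options above can be assembled into an argument list. The Python sketch below does only that; the helper name `build_savas_command` and its keyword arguments are hypothetical, and only option spellings documented above are used.

```python
def build_savas_command(files, replace=False, check=False, obs=None, varfile=None):
    """Assemble a savas argument list: options first, then the data files."""
    argv = ["savas"]
    if replace:
        argv.append("-replace")
    if check:
        argv.append("-check")
    if obs is not None:
        argv.append(f"-obs={obs}")
    if varfile is not None:
        argv.append(f"-varfile={varfile}")
    argv.extend(files)
    return argv


# Print the command instead of running it, since savas may not be installed here.
print(" ".join(build_savas_command(["mydata.dta"], replace=True, obs=100)))
```

On a machine where the script is on the search path, the list could be passed to `subprocess.run`; passing a list rather than a single string avoids shell quoting problems.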

Features

SAVAS attempts to transfer Stata value labels to SAS formats and vice versa. Unlike other data transfer software, SAVAS creates only one format per value label and vice versa, rather than creating a new format or value label for each variable that was assigned that format or value label. So, if you have a SAS data set with one yes_no. format assigned to twenty variables, the new Stata data set will have one yes_no value label assigned to those twenty variables. Fixed SAS formats (Fw.d) translate into Stata’s %w.df format. SAS date formats are translated as closely as possible. Unformatted variables get Stata’s default formats for the appropriate data type (%8.0g for bytes and ints, %9.0g for floats, and %10.0g for doubles), except for long variables, which SAVAS formats as %12.0g.

SAVAS can process multiple files at a time. Try: savas *.sas7bdat or savas *.dta
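The fixed-format rule (Fw.d becomes %w.df) is simple enough to show in a few lines of Python. This is a sketch of the stated mapping only; the function name `sas_fixed_to_stata` is hypothetical, and treating a missing d as 0 is an assumption.

```python
import re


def sas_fixed_to_stata(sas_format):
    """Translate a SAS fixed format such as F8.2 into Stata's %8.2f."""
    match = re.fullmatch(r"[Ff](\d+)\.(\d*)", sas_format)
    if match is None:
        raise ValueError(f"not a fixed SAS format: {sas_format!r}")
    width, decimals = match.groups()
    # Assumption: an omitted d (as in "F8.") means zero decimal places.
    return f"%{width}.{decimals or 0}f"


print(sas_fixed_to_stata("F8.2"))
```

Date formats and user-defined formats are not covered here, because the text gives no exact rule for them.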

SAVAS stamps the SAS creation date and time on the Stata data set name, so that the Stata user knows not only when the Stata data set was created, but also the original SAS creation date and time.

Not all SAS variable names are acceptable in Stata. SAVAS attempts to prevent conflicts by using uppercase names for reserved names. These names are: `_all', `_B', `_coef', `_cons', `if', `in', `byte', `int', `long', `float', `double', `_pi', `_pred', `_rc', `_se', `_skip', `using', and `with', as well as names starting with `str' and followed by an integer. (For example, the name `street' does not pose any problems, but the SAS name `str10' will be translated into the Stata name `STR10'.) The SAS name `_n' translates into `_______N' (and a warning is issued).

Not all Stata variable names are acceptable in SAS, because Stata variable names are case-sensitive: the variable gender can be in the same dataset as Gender or GENder. SAVAS tests for such case-only conflicts and, when the -rename option is issued, attempts to make the names unique by adding a number to the end of the variable name. If saving to an older version, -rename also shortens all variable names that are longer than 8 characters.
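The two naming rules can be sketched in Python as follows. The function names are hypothetical, and comparing names case-insensitively on the SAS side is an assumption (SAS itself does not distinguish case in variable names); the reserved list is the one given in the text.

```python
import re

# Reserved names from the text, stored in lowercase for case-insensitive lookup.
RESERVED = {"_all", "_b", "_coef", "_cons", "if", "in", "byte", "int", "long",
            "float", "double", "_pi", "_pred", "_rc", "_se", "_skip",
            "using", "with"}


def stata_safe_name(sas_name):
    """Return the Stata name for a SAS variable name, per the rules described."""
    lowered = sas_name.lower()
    if lowered == "_n":
        # Special case from the text: seven underscores followed by N.
        return "_" * 7 + "N"
    if lowered in RESERVED or re.fullmatch(r"str\d+", lowered):
        return sas_name.upper()
    return sas_name


def case_collisions(names):
    """Group Stata names that differ only by case and would clash in SAS."""
    groups = {}
    for name in names:
        groups.setdefault(name.lower(), []).append(name)
    return [group for group in groups.values() if len(group) > 1]


print(stata_safe_name("str10"), stata_safe_name("street"))
print(case_collisions(["gender", "Gender", "age"]))
```

A rename step could then append a number to all but one member of each colliding group, which is what the -rename option is described as doing.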

Useful links

  • UNC’s SAVAS page. This page allows you to download the script and the other SAS/Stata programs that SAVAS uses.
  • UNC’s A SAS User’s Guide to Stata, where you can read more about savasas, usesas, and the SAVASTATA macro, and download them as well as other helpful Stata and SAS programs.

Additional help

Research Computing home page