
How to structure SPSS files for Analysis

17th July 2019 in Tutorials

To get the most from Tabx and spend as little time as possible in project setup, correctly structured SPSS files can help you get to insight faster.

Ignore Text Fields

Tabx is the fastest engine in the industry for crunching numbers. We hope to support textual analysis soon, but at the time of writing Tabx ignores text fields in SPSS files. You can reduce the time taken to upload and import a file by removing any "String" or "Date" type fields from your SPSS file. To do this:

  • Browse to the "Variable View" tab (button at the bottom of the main SPSS window).
  • Highlight the fields you wish to remove by clicking the row number to the left of the field's Name.
  • Right-click the selection and click "Clear".
  • Save your SPSS file.


Remove a Text field in SPSS
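
Before clearing fields by hand, it can help to list which variables would be removed. The following is a minimal Python sketch, assuming you have already copied each variable's name and type out of Variable View into a dictionary. The variable names and the `split_variables` helper are hypothetical and are not part of Tabx or SPSS.

```python
# Hypothetical variable metadata: name -> type as shown in SPSS Variable View.
variable_types = {
    "RESP_ID": "Numeric",
    "Q1_SATISFACTION": "Numeric",
    "Q2_COMMENTS": "String",
    "INTERVIEW_DATE": "Date",
    "WEIGHT": "Numeric",
}

# Field types this article recommends removing before upload.
IGNORED_TYPES = {"String", "Date"}


def split_variables(types):
    """Return (keep, drop) lists of variable names, preserving file order."""
    keep = [name for name, kind in types.items() if kind not in IGNORED_TYPES]
    drop = [name for name, kind in types.items() if kind in IGNORED_TYPES]
    return keep, drop


keep, drop = split_variables(variable_types)
print("Keep:", keep)
print("Clear before upload:", drop)
```

The `drop` list is the set of rows to highlight and clear in Variable View.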

Remember, files less than 5MB in size are imported into Tabx immediately. Larger files are sent to the import queue, and you may have to wait while your project is created. Text fields account for the bulk of the file size in SPSS files, so removing them can really speed up your journey to analysis!
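
To check which side of the threshold a file falls on before uploading, a short Python sketch like the one below may be useful. It assumes "5MB" means 5 * 1024 * 1024 bytes, which this article does not confirm. The function names are illustrative only.

```python
import os

# Assumption: "5MB" is read as 5 * 1024 * 1024 bytes; Tabx may count it differently.
IMMEDIATE_IMPORT_LIMIT_BYTES = 5 * 1024 * 1024


def imports_immediately(size_bytes, limit=IMMEDIATE_IMPORT_LIMIT_BYTES):
    """True when a file of this size falls under the immediate-import threshold."""
    return size_bytes < limit


def check_file(path):
    """Describe whether the file at `path` would likely be imported at once or queued."""
    size = os.path.getsize(path)
    status = "immediate import" if imports_immediately(size) else "import queue"
    return f"{path}: {size / (1024 * 1024):.2f} MB -> {status}"
```

For example, `check_file("survey.sav")` (a hypothetical file name) returns a one-line summary that you can compare before and after removing text fields.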

Don't use Compressed .sav formats

Compressed .sav files can behave unpredictably depending on the system you use to open them. Compression is usually intended to make text fields much smaller for transmission to another user. In the case of Tabx, we don't want these text fields anyway, so compression won't save much space. To save your file without compression:

  • Browse to File > Save As.
  • In the "Save as type" dropdown, select the default "SPSS Statistics (.sav)" option.
  • Name your file.
  • Hit Save.



Save an SPSS .sav file as non-compressed

Ensure all variables have a defined Measure

On a Numeric field, the measure type can be Nominal, Ordinal or Scale. These are defined as:

  • Nominal fields have a defined value set (answer categories), but the codes for each answer do not have any particular order or relationship. For example, any unordered categorical question should be Nominal.
  • Ordinal fields have a defined value set (answer categories) whose codes are incremental in nature, or have a defined order. For example, Likert scales should be set to Ordinal.
  • Scale fields are numeric fields that contain arbitrary values in a range. These values can be accurate to a defined decimal place. For example, weight fields should be Scale.

To set a measure on a Numeric field:

  • Find the variable you want to define a Measure for.
  • Select either Nominal, Ordinal or Scale from the dropdown under the Measure column for that variable.
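
On files with many variables, a quick scripted check can point out which ones still need a measure. The sketch below is one possible approach in Python, assuming you have exported each variable's measure into a dictionary. The data and the `missing_measures` helper are hypothetical. SPSS labels the third level "Scale" in the Measure dropdown, so that is the spelling used here.

```python
# Hypothetical metadata: variable name -> measure as set in Variable View.
# None or "Unknown" stands for a variable whose measure was never defined.
measures = {
    "Q1_SATISFACTION": "Ordinal",
    "REGION": "Nominal",
    "WEIGHT": "Scale",
    "Q7_BRAND": None,
    "Q9_SPEND": "Unknown",
}

VALID_MEASURES = {"nominal", "ordinal", "scale"}


def missing_measures(measure_by_variable):
    """Return the names of variables without a recognised measure, in file order."""
    return [
        name
        for name, measure in measure_by_variable.items()
        if measure is None or measure.lower() not in VALID_MEASURES
    ]


print("Needs a measure:", missing_measures(measures))
```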

Define Display Ready labels

The text entered into the Label and Values columns is what Tabx will display on Charts, Tables and Cross Tabs. Ensuring every field has "Display Ready" text in the Label and Values columns saves you from having to change this text later in Tabx. Don't worry though, you can change all of this text inside the Edit Project tool in Tabx at any time.

Values

Sometimes data processing companies will enter Values for only some of the available codes within a field. For example, a "Satisfaction" field might contain data codes 1, 2, 3, 4 and 5, while the only Values that have been defined are "1 > Very Dissatisfied" and "5 > Very Satisfied". Be sure that every code in a Nominal or Ordinal field has a defined value and value label, even if it's just the code itself. In our example, that would mean entering "2 > 2", "3 > 3", "4 > 4" in that field.

Unassigned codes are not rejected in Tabx, so don't worry if you miss the odd one or two, as Tabx will add an "Unassigned" value label to that answer category.

All values in the data are represented by a value in the "Value Labels" list.
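
The rule of labelling every code, falling back to the code itself, can be sketched in Python as follows. This is only an illustration on plain lists and dictionaries. The `fill_missing_labels` helper is hypothetical, and you would still enter the resulting labels in the Values column in SPSS.

```python
def fill_missing_labels(codes_in_data, value_labels):
    """Return a copy of value_labels in which every observed code has a label.

    Codes without a label get the code itself as text, e.g. 2 -> "2".
    Missing answers (None) are skipped because they carry no code.
    """
    completed = dict(value_labels)
    observed = {code for code in codes_in_data if code is not None}
    for code in sorted(observed):
        completed.setdefault(code, str(code))
    return dict(sorted(completed.items()))


# The "Satisfaction" example: only the end points were labelled.
satisfaction_data = [1, 2, 5, 3, None, 4, 5]
satisfaction_labels = {1: "Very Dissatisfied", 5: "Very Satisfied"}

print(fill_missing_labels(satisfaction_data, satisfaction_labels))
```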

Multi Categorical Questions

Multi categorical questions (where the respondent can select multiple answers in the same question) are best stored as a list of component "single" type variables within an SPSS file. For example, consider the question:

What car brands have you EVER owned?:

  • Ford
  • BMW
  • Mercedes
  • Renault
  • VW
  • Skoda
  • Volvo

Respondents can choose more than one of the above. Represented in data, each answer category should have its own field, with a binary codeset (value labels). Using the above example, we might choose variable names:

  • EVER_OWNED_01
  • EVER_OWNED_02
  • EVER_OWNED_03
  • EVER_OWNED_04
  • EVER_OWNED_05
  • EVER_OWNED_06
  • EVER_OWNED_07

The naming doesn't strictly matter, but it is best practice to use some sort of naming convention to make navigating and using the collected data easier. We could just as well use variable names:

  • EVEROWNED_FORD
  • EVEROWNED_BMW
  • EVEROWNED_MERC
  • EVEROWNED_REN
  • EVEROWNED_VW
  • EVEROWNED_SKOD
  • EVEROWNED_VOLV

The most important feature of each of these fields is the value labels. Being binary means each single component variable should have ONLY codes "0" and "1" represented, with NULL being used for "Not Asked" (i.e. during the course of the survey, the respondent was filtered so as not to be asked this question at all).
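
A small check along these lines can confirm that every component variable really is binary. The Python sketch below works on hypothetical column data held as plain lists, with `None` standing in for NULL. The `non_binary_fields` helper is illustrative and is not a Tabx feature.

```python
# Codes allowed in a binary component variable; None represents NULL ("Not Asked").
ALLOWED_CODES = {0, 1, None}


def non_binary_fields(columns):
    """Map each offending variable name to the sorted list of unexpected codes."""
    problems = {}
    for name, values in columns.items():
        unexpected = {value for value in values if value not in ALLOWED_CODES}
        if unexpected:
            problems[name] = sorted(unexpected)
    return problems


# Hypothetical data for two of the component variables.
ever_owned = {
    "EVER_OWNED_01": [1, 0, None, 1],
    "EVER_OWNED_02": [0, 0, 99, 1, 98],
}

print(non_binary_fields(ever_owned))
```

An empty result means every component variable contains only 0, 1 and NULL.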

The labels to go along with the (0,1) codes again don't matter, and two conventions are usually used: "Yes/No" and "(Thing)/Not":

"Yes/No":

  • 0 - "No"
  • 1 - "Yes"

"(Thing)/Not":

  • 0 - "Not"
  • 1 - "(Thing)"

In the case of the "Ford" answer category above, that would make the "(Thing)/Not" label system:

  • 0 - "Not"
  • 1 - "Ford"

Non-binary Multi Fields

Sometimes data processing companies will supply data where each of the single-type variables that make up a multi-type question contains three or more value labels. Often this is because they include catch codes 98/99 for "Don't Know" and "Refused". This is fine for quality control and data cleaning, as it may allow the data processor to filter out respondents more easily. When it comes to analysis, however, these extra codes make the job harder. Where this is the case, those extra codes should be recoded to "NULL" or "0", depending on whether or not you want the respondent to be part of the base.
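
The recode itself is simple to express. The following Python sketch shows the two options on a plain list of codes, assuming 98 and 99 are the only catch codes and using `None` for NULL. The `recode_catch_codes` helper and its `keep_in_base` flag are hypothetical names chosen for this illustration.

```python
# Assumption: 98 ("Don't Know") and 99 ("Refused") are the only catch codes in the file.
CATCH_CODES = {98, 99}


def recode_catch_codes(values, keep_in_base):
    """Recode catch codes to 0 (respondent stays in the base) or None (NULL, excluded)."""
    replacement = 0 if keep_in_base else None
    return [replacement if value in CATCH_CODES else value for value in values]


raw = [1, 0, 98, None, 99]

print(recode_catch_codes(raw, keep_in_base=True))
print(recode_catch_codes(raw, keep_in_base=False))
```

Recoding to 0 treats the respondent as asked but not selecting the answer, while recoding to NULL removes them from the base for that variable.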

In Tabx, it's a quick and easy process to do this recode work - simply use the Derived Variables tool to generate a new field for each of the single-type component fields, then - as with any other multi - go ahead and use the Multi's tools to combine them into a new multi-type variable.



