Correctly structured SPSS files help you get the most from Tabx, spend less time on project setup, and get to insight faster.
- Ignore Text Fields
- Don't use Compressed .sav formats
- Ensure all variables have a defined Measure
- Define Display Ready labels
- Values in the Data
- Multi Categorical Questions
Ignore Text Fields
Tabx is the fastest engine in the industry for crunching numbers. We hope to support textual analysis soon, but as of this article Tabx ignores Text Fields inside SPSS files. You can reduce the time taken to upload and import a file by removing any "String" or "Date" type fields from your SPSS file. To do this:
- Browse to the "Variable View" tab (button at the bottom of the main SPSS window).
- Highlight the fields you wish to remove by clicking the row number to the left of the field's Name.
- With the fields selected, right-click and choose "Clear".
- Save your SPSS file.
Remember, files less than 5MB in size are imported into Tabx immediately. Larger files are sent to the import Queue, and you may have to wait while your project is created. Text fields account for the bulk of the file size in SPSS files, so removing them can really speed up your journey to analysis!
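If you prefer to check a dataset outside of SPSS first, the idea can be sketched in a few lines of Python. This is only an illustration: it assumes the data has already been loaded into a list of dictionaries (one per respondent), and the function name and column names are hypothetical, not part of Tabx or SPSS.

```python
from datetime import date, datetime

def text_like_columns(rows):
    """Return names of columns whose non-missing values are all strings or dates."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            if value is None:
                continue  # missing values say nothing about the column type
            is_text = isinstance(value, (str, date, datetime))
            columns[name] = columns.get(name, True) and is_text
    return sorted(name for name, all_text in columns.items() if all_text)

# Hypothetical example data: two numeric fields, one string field, one date field.
rows = [
    {"resp_id": 1, "age": 34, "comment": "Great service", "interview_date": date(2020, 1, 5)},
    {"resp_id": 2, "age": 51, "comment": None, "interview_date": date(2020, 1, 6)},
]

candidates = text_like_columns(rows)
print(candidates)  # the columns you might clear before uploading
```

The columns it lists are the ones you would clear in the "Variable View" tab before saving.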
Don't use Compressed .sav formats
Compression has some strange effects depending on the system you are trying to open your SPSS .sav file in. Usually, compression is designed to make Text fields much smaller for transmission to another user. In the case of Tabx, we don't want these text fields anyway, so compression won't save much space. To save your file without compression:
- Browse to File > Save As.
- In the "Save as type" dropdown, select the default "SPSS Statistics (.sav)" option.
- Name your file.
- Hit Save.
Ensure all variables have a defined Measure
On a Numeric field, the measure type can be Nominal, Ordinal or Scale. These are defined as:
- Nominal fields have a defined value set (answer categories), but the codes for each answer do not have any particular relationship or order. For example, any unordered categorical question should be Nominal.
- Ordinal fields have a defined value set (answer categories), and the codes are incremental in nature or have a defined order. For example, Likert Scales should be set to Ordinal.
- Scale fields are numeric fields that contain arbitrary values in a range. These values can be accurate to a defined decimal place. For example, weight fields should be Scale.
To set a measure on a Numeric field:
- Find the variable you want to define a Measure for in the "Variable View" tab.
- Select either Nominal, Ordinal or Scale from the dropdown under the Measure column for that variable.
Define Display Ready labels
The text entered into the Label and Values columns is what Tabx will display on Charts, Tables and Cross Tabs. Ensuring every field has "Display Ready" text in the Label and Values columns saves you from having to change this text in Tabx. Don't worry though, you can change all of this text inside the Edit Project tool in Tabx at any time.
Sometimes data processing companies will enter Values for only some of the available codes within a field. For example, a "Satisfaction" field might contain data codes 1, 2, 3, 4 and 5. However, the only Values that have been defined are "1 > Very Dissatisfied" and "5 > Very Satisfied". Be sure that every code in a Nominal or Ordinal field has a defined value and value label, even if it's just the Code itself. In our example, that would mean entering "2 > 2", "3 > 3", "4 > 4" in that field.
Unassigned codes are not rejected in Tabx, so don't worry if you miss the odd one or two, as Tabx will add an "Unassigned" value label to that answer category.
In short, make sure all values in the data are represented by a value in the "Value Labels" list.
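To make the check concrete, here is a minimal Python sketch of finding codes that appear in the data but have no value label, using the "Satisfaction" example above. It assumes the data and labels are already available as a plain list and dictionary; the function and variable names are hypothetical.

```python
def unlabelled_codes(data_values, value_labels):
    """Return the sorted codes that occur in the data but have no value label."""
    present = {v for v in data_values if v is not None}
    return sorted(present - set(value_labels))

# The "Satisfaction" example: codes 1 to 5 in the data, only 1 and 5 labelled.
satisfaction_data = [1, 2, 5, 3, None, 4, 5, 1]
satisfaction_labels = {1: "Very Dissatisfied", 5: "Very Satisfied"}

missing = unlabelled_codes(satisfaction_data, satisfaction_labels)
print(missing)

# Fall back to the code itself as the label, as suggested above ("2 > 2", ...).
completed = {**satisfaction_labels, **{code: str(code) for code in missing}}
print(completed)
```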
Values in the Data
Sometimes SPSS is tricky - what looks like an integer is actually stored as a floating point / decimal number underneath. When we're dealing with categorical, nominal and ordinal type variables, this obviously doesn't make sense and can lead to odd errors inside programs that read your SPSS file, like having what you think is an integer 2 in the data, actually be 1.9983274875.
There is an easy fix to this - select the variable you know contains problem data in the Variable view in SPSS. Convert this field from a Numeric type field to a String type field. Then convert it back to a Numeric type field. By doing this, the underlying floating point / decimal numbers are replaced by text, then converted back to being integers. To use our example, we convert 1.9983274875 into the text "2" and then back into an integer of 2.
This problem can confuse many a researcher, so hopefully this fix is helpful!
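If you are not sure which variables are affected, a quick check can help. The Python sketch below is an illustration only: it assumes the values of one categorical variable have been loaded into a list, and it uses simple rounding rather than the Numeric to String to Numeric round trip described above. The function and variable names are hypothetical.

```python
def non_integer_values(values):
    """Return the non-missing values that are not whole numbers."""
    return [v for v in values if v is not None and v != round(v)]

def to_integer_codes(values):
    """Round every non-missing value to the nearest whole-number code."""
    # Note: Python's round() sends exact halves (e.g. 2.5) to the nearest even number.
    return [None if v is None else round(v) for v in values]

# Hypothetical categorical variable with one problem value hiding in it.
category_codes = [1.0, 1.9983274875, 2.0, None, 1.0]

suspect = non_integer_values(category_codes)
cleaned = to_integer_codes(category_codes)
print(suspect)
print(cleaned)
```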
Multi Categorical Questions
Multi categorical questions (where the respondent can select multiple answers in the same question) are best stored as a list of component "single" type variables within an SPSS file. For example, consider the question:
What car brands have you EVER owned?:
Respondents can choose multiple of the above. Represented in data, each answer category should have its own field, with a binary codeset (value labels). Using the above example, we might choose variable names:
The naming doesn't strictly matter, but it is best practice to use some sort of naming convention to make navigating and using the collected data easier. We could just as well use variable names:
The most important feature of each of these fields is the value labels. Being binary means each single component variable should have ONLY codes "0" and "1" represented, with NULL being used for "Not Asked" (i.e. during the course of the survey, the respondent was filtered to not be asked this question at all).
The labels to go along with the (0,1) codes again don't matter, and two conventions are usually used: "Yes/No" and "(Thing)/Not".
"Yes/No":
- 0 - "No"
- 1 - "Yes"
"(Thing)/Not":
- 0 - "Not"
- 1 - "(Thing)"
In the case of the "Ford" answer category above, that would make the "(Thing)/Not" labels:
- 0 - "Not"
- 1 - "Ford"
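A quick way to confirm that every component field really is binary is to look for any code other than 0, 1 or NULL. The Python sketch below shows the idea. It assumes each component variable is available as a list of values, with None standing in for NULL; the variable names "owned_ford" and "owned_other_brand" are hypothetical and only there for illustration.

```python
def non_binary_codes(columns):
    """Map each column name to the codes it contains other than 0, 1 or None."""
    problems = {}
    for name, values in columns.items():
        extra = sorted({v for v in values if v is not None and v not in (0, 1)})
        if extra:
            problems[name] = extra
    return problems

# Hypothetical component variables of one multi categorical question.
multi_fields = {
    "owned_ford": [0, 1, None, 1],          # clean: only 0, 1 and NULL
    "owned_other_brand": [0, 99, 1, 98],    # contains extra codes
}

problems = non_binary_codes(multi_fields)
print(problems)
```

Any field that shows up in the result needs attention before it can be treated as a binary component of a multi.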
Non-binary Multi Fields
Sometimes data processing companies will supply data where each of the single-type variables that make up a multi-type question contains three or more value labels. Often this is because they include catch codes 98/99 for "Don't Know" and "Refused". This is fine for quality control and data cleaning, as it may help the data processing person filter out respondents, but when it comes to analysis these extra codes make the job harder. Where this is the case, the extra codes should be recoded to "0" if you want the respondent to remain part of the base, or to "NULL" if you do not.
In Tabx, it's a quick and easy process to do this recode work. Simply use the Derived Variables tool to generate a new field for each of the single-type component fields, then, as with any other multi, go ahead and use the Multi's tools to combine them into a new multi-type variable.
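For reference, the recode logic itself is simple. The Python sketch below illustrates it outside of Tabx and is not how the Derived Variables tool is implemented. It assumes catch codes 98 and 99, uses None for NULL, and the function name and the "keep_in_base" parameter are hypothetical.

```python
def recode_catch_codes(values, catch_codes=(98, 99), keep_in_base=True):
    """Recode catch codes to 0 (respondent stays in the base) or None (dropped from it)."""
    replacement = 0 if keep_in_base else None
    return [replacement if v in catch_codes else v for v in values]

# Hypothetical component field with "Don't Know" (98) and "Refused" (99) codes.
raw = [1, 0, 98, None, 99]

in_base = recode_catch_codes(raw, keep_in_base=True)
out_of_base = recode_catch_codes(raw, keep_in_base=False)
print(in_base)
print(out_of_base)
```

After the recode, each component field contains only 0, 1 and NULL, so it can be combined with the others like any other binary multi.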