
Survey Data Processing in Tabx
Learn how to quickly import your survey data into Tabx and perform common data processing tasks on it, including derivations, recodes, nets and Multi definitions.
Data and file preparation
This tutorial is the first in a series taking a user new to Tabx right through to delivery of Dashboards, Data Viz, Tables and Crosstabs.
For this tutorial, we'll be using the SPSS file found here: Brand Awareness and Considerations.
This data file has been structured following our guide How to structure SPSS files for Analysis. In particular, fields 2 - 6 ("imp_cost" to "imp_tech") are examples of fields that will eventually be turned into a Multi type variable for analysis, but are themselves Single choice question fields with several categories. More on this in the data processing section.
The file contains 1000 respondents, which is the limit per file when using a free Tabx license. Register to follow along! Here's a breakdown of the fields:
- brand_most_aware - A sample question that asked respondents to identify which brand out of a list they were most aware of.
- imp_cost to imp_tech - Questions that ask how important a particular concept is when choosing a brand.
- gender - A simple single choice question.
- age - A scalar variable of the literal age of the respondent (needs netting later)
- wght - A scalar variable containing some computed weight values.
- wave - A field to track respondents over a given month.
This file is ready to upload into Tabx - we'll be doing all the necessary data processing directly within Tabx itself.
Upload
To use the data within this file in Tabx, we need to upload it. When a data file is uploaded to Tabx it is not simply stored as a table of respondents and answers; instead, Tabx shapes and converts the data into a proprietary format to facilitate all the advanced features found in the Collation Engine, and to make the researcher's life as easy as possible!
Upload is a simple process: from your Account page, select Upload a Project. Select your file (SPSS or CSV/Excel file), name your project and import! Files larger than 5 MB are queued for import, and you'll receive an email once the data has finished importing. Smaller files are imported immediately.
We'll call our project Brand Awareness.
Once uploaded and imported, you'll be able to see your new project on your Account / Project (for editors) page. You'll also be able to see the current Total Case Count, which tells you how many respondents are currently in the project. This figure updates if more data is appended to the project. From here you can also open the Edit Project page by clicking the blue Edit project button.
Data Processing
We can do a few quick operations to help us produce more legible analysis and data visualisation. We'll:
- Set up our variable Groups.
- Net our Age variable into a categorical Age Group variable.
- Derive some Multi component fields from the "imp_X" variables.
- Define a Multi variable using those component fields.
- Set the project Weight field.
Once you become familiar with these processes, you'll be able to utilise the full power of Tabx, making your research practices much more efficient and saving the cost of commissioning third-party data processing. For all these data processes, we'll be working within the Edit Project page.
Groups
Grouping variables in Tabx is super simple. Groups are used to organise a project's variables into one of twenty available "containers" that make selecting variables for analysis much quicker and more intuitive. Firstly, head to the Groups tab of the Edit Project page, enter the names for our groups (in this tutorial, Demographics and Brand Awareness) and hit "Save Groups".
Next, we're going to add our variables to these two defined groups by using the "Bulk add to Group" tool. Navigate to the "Bulk add to Group" sub-tab. This sub-tab is an example of one of the data processes available within Tabx. You'll note that each tool is broken up into defined and numbered stages. In this case the stages are:
- Select the variables we want to add to a given group.
- Add the selected variables to the workspace.
- Choose the group to put these variables into.
- Perform the action of adding these variables to that group.
If we simply follow the numbered steps, we can add multiple variables to a given group at once.
Net Scalar Variables
Now we can look at our Age variable. As a scalar variable, it cannot be shown on a chart without applying an aggregate calculation to it (e.g. SUM or COUNT). Scalar variables need to be netted into a categorical variable if they are to be used as Analysis or Break variables within Tabx, on a chart or table, or as a filter or subset.
To do this, we again follow the numbered instructions within the tool:
- Navigate to the "Net Scalar Variable" sub-tab of the "Derived Variables" tab within the Edit Project page.
- Select the "Age" variable from the list of available root variables. In this case, only the two scalar variables are allowed for selection.
- Set a name (code) of "age_groups" and Label of "Age Groups".
- Select the Demographics project group for the resulting variable to be put into.
- Select "Do not mark unassigned respondents as 'Unassigned'". This option is important: it means that any respondents in the root variable with a value of NULL or System Missing are not captured in the resulting derived variable.
- Go through and define each answer category of the new variable, by using the "To" and "From" boxes. Note that Tabx lets you know what the minimum and maximum values in this field are.
- Save your new variable.
Depending on the size of the new variable, this may take a few moments to process. Once done, you'll be able to use this variable just like any other. Note also that Tabx recognises, and tells you, whether the resulting variable will need to be constructed as a Multi or a Single type variable.
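To make the netting logic concrete, here is a minimal Python sketch of what a net from a scalar to a categorical variable does. It is an illustration only, not Tabx's implementation: the band edges, the `net_scalar` function and the `mark_unassigned` flag are hypothetical names chosen for this example, and we assume that a value falling outside every band is treated the same way as a missing value.

```python
# Hypothetical age bands: (label, from_value, to_value), both ends inclusive.
# These mirror the "From" and "To" boxes, but the edges are made up for this example.
AGE_BANDS = [
    ("18-24", 18, 24),
    ("25-34", 25, 34),
    ("35-54", 35, 54),
    ("55+", 55, 120),
]


def net_scalar(value, bands, mark_unassigned=False):
    """Return the label of the band containing value.

    Missing values (None) and values outside every band return None,
    or the label "Unassigned" when mark_unassigned is True.
    """
    if value is not None:
        for label, low, high in bands:
            if low <= value <= high:
                return label
    return "Unassigned" if mark_unassigned else None


ages = [19, 34, 35, 61, None, 17]
age_groups = [net_scalar(age, AGE_BANDS) for age in ages]
print(age_groups)  # ['18-24', '25-34', '35-54', '55+', None, None]
```

With `mark_unassigned=False`, the missing respondent simply has no category in the new variable, which corresponds to the option chosen in the steps above.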
Recode Variables
In order to define a Multi, we must first recode our "imp_x" variables into "Multi friendly" fields (fields that contain only codes 0 and 1). To achieve this, we'll use the "Derive new Variable" tool on the "Derived Variables" tab within our Edit Project page. The "imp_x" variables contain codes 1-10, where 1-9 are a scale and 10 is "Don't Know / Refused". Our end goal is to view the percentage of importance across all the "imp_x" variables on one chart as a Multi type variable.
In the case of the "imp_x" variables, we're going to say anyone who answered codes 7, 8 or 9 on the scale will represent "Important", and the rest of the codes (including 10 => "Don't Know / Refused") will represent "Not Important". Your definition of the variable net may differ, but for example purposes we'll use this definition.
To do this in Tabx is very easy and once more we'll follow the numbered steps:
- Define the new Variable Name and Label.
- Select the Brand Awareness Group for our new variable to go into.
- Select "Do not mark unassigned respondents as 'Unassigned'". This option is important: it means that any respondents in the root variable with a value of NULL or System Missing are not captured in the resulting derived variable.
- Select the root variable (in this case any of the "imp_x" variables).
- Define codes 7, 8 and 9 as new code 1 and the rest of the codes as new code 0. When we do this, Tabx will also tell us that the variable about to be created is "Multi friendly".
- Save your new variable.
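The recode described above can be sketched in a few lines of Python. This is only an illustration of the mapping, not how Tabx performs it: `recode_important`, `IMPORTANT_CODES` and the sample answers are hypothetical, and we assume missing answers stay missing, in line with the "do not mark as Unassigned" option.

```python
# Codes treated as "Important" under the tutorial's definition.
IMPORTANT_CODES = {7, 8, 9}


def recode_important(code):
    """Map a 1-10 imp_x code to 1 (Important) or 0 (Not Important).

    Code 10 ("Don't Know / Refused") falls into 0, and a missing
    answer (None) stays missing rather than being assigned a code.
    """
    if code is None:
        return None
    return 1 if code in IMPORTANT_CODES else 0


# Made-up answers for one root variable, e.g. imp_cost.
imp_cost = [9, 3, 10, 7, None, 8]
imp_cost_important = [recode_important(code) for code in imp_cost]
print(imp_cost_important)  # [1, 0, 0, 1, None, 1]

# A field is "Multi friendly" when its non-missing values are only 0 and 1.
is_multi_friendly = all(v in (0, 1) for v in imp_cost_important if v is not None)
print(is_multi_friendly)  # True
```

The same function would be applied to each of the five "imp_x" variables in turn to produce the component fields.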
Multis
Once we've recoded each of the 5 "imp_x" variables, we can move on to defining a Multi type variable from the component fields we've made.
Again, the process is as simple as following the numbered steps:
- Enter a label for the new Multi type variable.
- Select the component fields that will make up this new Multi Type variable and add them to the workspace.
- Select the Brand Awareness group for this Multi type variable to be put into.
- Add the Multi type variable into the project.
Adding Multis is faster than deriving new data for variables, because a Multi exists only as a definition: it does not add more fields or rows to the database behind the project.
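One way to picture a definition-only Multi is as a label plus a list of component field names, with the figures computed from the existing 0/1 fields on demand. The sketch below is an assumption about the general idea, not Tabx's internal format: the field names, the `multi_definition` structure and `multi_percentages` are all hypothetical.

```python
# Made-up respondent rows holding two recoded 0/1 component fields.
respondents = [
    {"imp_cost_r": 1, "imp_tech_r": 1},
    {"imp_cost_r": 1, "imp_tech_r": 1},
    {"imp_cost_r": 0, "imp_tech_r": 0},
    {"imp_cost_r": 0, "imp_tech_r": 1},
]

# The Multi itself: no new data, just a label and references to existing fields.
multi_definition = {
    "label": "Importance",
    "components": ["imp_cost_r", "imp_tech_r"],
}


def multi_percentages(rows, definition):
    """Return the percentage of respondents coded 1 for each component field."""
    result = {}
    for field in definition["components"]:
        values = [row[field] for row in rows if row.get(field) is not None]
        result[field] = 100.0 * sum(values) / len(values) if values else 0.0
    return result


print(multi_percentages(respondents, multi_definition))
# {'imp_cost_r': 50.0, 'imp_tech_r': 75.0}
```

Because the definition only points at existing fields, creating it involves no copying or recomputing of respondent data.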
Tabx also has some quality-of-life features that make defining Multi type variables quicker. Answer sets allow a user to define an answer category list that can be quickly applied to a newly defined Multi.
Weighting Data
To be able to use the Weighted Data feature of Tabx, we need to select which field will act as the weight field. Simply head to the Option tab, and select the appropriate variable from the dropdown.
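For readers new to weighting, the sketch below shows the conventional weighted percentage: each respondent contributes their weight rather than a count of 1. This tutorial does not describe how Tabx applies the weight field internally, so treat this as the standard calculation under that assumption; `weighted_percentage` and the sample values are hypothetical.

```python
def weighted_percentage(flags, weights):
    """Weighted share of respondents coded 1, ignoring missing flags."""
    total = sum(w for f, w in zip(flags, weights) if f is not None)
    hits = sum(w for f, w in zip(flags, weights) if f == 1)
    return 100.0 * hits / total if total else 0.0


# Made-up 0/1 answers and matching values from a weight field such as wght.
flags = [1, 0, 1, 0]
weights = [0.5, 1.5, 1.0, 1.0]

print(weighted_percentage(flags, [1, 1, 1, 1]))  # 50.0 (unweighted)
print(weighted_percentage(flags, weights))       # 37.5 (weighted)
```

The first respondent answered 1 but carries a weight of only 0.5, so the weighted figure drops below the unweighted one.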
Next
Read the second tutorial in this series, How to Generate Charts and Tables in Tabx.