Over the years I have been presented with data that is painful to look at! CSV files with no accompanying metadata are the favourite: you open the file and find every data record filled with the answer category text, not even code values. Sadly, CSV files are still the primary MR data input source for Azure (the “Data Lake”) and its sidekick, Power BI, and this is problematic!
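To make the metadata point concrete, here is a minimal sketch of a CSV holding code values alongside a separate metadata description. The column names, the `metadata` layout and the `decode` helper are all hypothetical, made up for illustration; this is not the Triple-S schema or any vendor's format.

```python
import csv
import io

# Hypothetical CSV export holding code values rather than label text.
# q1 is single-choice; q2 is multi-categorical, one 0/1 column per option.
raw_csv = "id,q1,q2_1,q2_2,q2_3\n1,2,1,0,1\n2,1,0,1,0\n"

# Hypothetical metadata: without it, the codes above cannot be interpreted.
metadata = {
    "q1": {
        "type": "single",
        "label": "Gender",
        "values": {"1": "Male", "2": "Female"},
    },
    "q2": {
        "type": "multiple",
        "label": "Brands used",
        "columns": {"q2_1": "Brand A", "q2_2": "Brand B", "q2_3": "Brand C"},
    },
}

def decode(row, meta):
    """Turn one coded CSV row into labelled answers using the metadata."""
    out = {}
    for name, spec in meta.items():
        if spec["type"] == "single":
            out[spec["label"]] = spec["values"][row[name]]
        else:
            # Multi-categorical: collect the label of every ticked column.
            out[spec["label"]] = [
                label for col, label in spec["columns"].items() if row[col] == "1"
            ]
    return out

rows = list(csv.DictReader(io.StringIO(raw_csv)))
decoded = [decode(r, metadata) for r in rows]
print(decoded[0])
```

The CSV alone says nothing about which columns belong together as one question; that knowledge lives entirely in the metadata.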
Market Research (MR) data has two defining factors that differentiate its data architecture (its data format) from general data formats. Factor 1 is the multi-categorical question type. Factor 2 is that we share results as percent values, not N numbers!
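A small sketch of why these two factors collide in a flat file. It assumes a hypothetical "long" layout, with one row per respondent per answer ticked, and compares a naive row-based percentage with the respondent-based percentage MR reporting usually wants. The data and function names are invented for illustration.

```python
# Hypothetical "long" CSV layout: one row per respondent per answer ticked.
rows = [
    (1, "Brand A"),
    (1, "Brand B"),
    (2, "Brand A"),
    (3, "Brand C"),
]

def share_of_rows(answer, rows):
    """Percent of rows (mentions): what a naive row-count pivot reports."""
    return 100.0 * sum(1 for _, a in rows if a == answer) / len(rows)

def share_of_respondents(answer, rows):
    """Percent of distinct respondents who ticked the answer."""
    respondents = {rid for rid, _ in rows}
    chose = {rid for rid, a in rows if a == answer}
    return 100.0 * len(chose) / len(respondents)

print(share_of_rows("Brand A", rows))         # 2 of 4 rows
print(share_of_respondents("Brand A", rows))  # 2 of 3 respondents
```

Note that in this layout a respondent who ticked nothing has no row at all, so even the respondent-based figure needs the full respondent list from elsewhere to get the base right.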
Each of these factors is a key problem for the Microsoft-centric CSV file format. Remember, Microsoft has to cater to the widest audience; it’s a generic software (and data) application company. MS Excel and MS Word are fantastic generic software applications. If your need is more niche, desktop publishing (DTP) let’s say, then you’d probably use a dedicated DTP package, and if you want to run some intricate statistical analysis then you might turn to SPSS or Stata.
The point here is that Microsoft builds generic data applications and MR data is anything but generic, so getting MR data into an Azure “Data Lake” isn’t the easiest task. And because the MR world reports in percent values, Power BI (in essence an extremely enhanced pivot-table architecture) is going to struggle too.
Say you have a multi-categorical question with 15 pre-defined answer options, and the question itself is “filtered”, i.e. at the collection stage it is only asked of those who qualify through their answers to one or more previous questions. Extracting the correct percent values from CSV-fed Power BI pivot tables can then be a challenge. Do we need to apply a weight? Should the reported percent values be based on “All respondents” or on “All answering that question”? And so on.
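The choices in that paragraph can be sketched in a few lines. The records, field names (`asked`, `brands`, `weight`) and the `percent` helper below are hypothetical, and a real MR package would handle far more cases; the point is only that the same answer code yields different percentages depending on base and weighting.

```python
# Hypothetical respondent records. 'asked' marks who passed the filter,
# 'brands' is the set of answer codes ticked, 'weight' is the survey weight.
respondents = [
    {"id": 1, "weight": 1.0, "asked": True,  "brands": {1, 2}},
    {"id": 2, "weight": 2.0, "asked": True,  "brands": {2}},
    {"id": 3, "weight": 1.0, "asked": False, "brands": set()},  # filtered out
    {"id": 4, "weight": 1.0, "asked": True,  "brands": set()},  # ticked nothing
]

def percent(code, records, base="answering", weighted=True):
    """Percent ticking `code`, on the chosen base, optionally weighted."""
    def w(r):
        return r["weight"] if weighted else 1.0

    if base == "all":
        in_base = records                     # "All respondents"
    elif base == "answering":
        in_base = [r for r in records if r["asked"]]  # "All answering that question"
    else:
        raise ValueError(f"unknown base: {base}")

    base_total = sum(w(r) for r in in_base)
    hits = sum(w(r) for r in in_base if code in r["brands"])
    return 100.0 * hits / base_total if base_total else 0.0

for base in ("all", "answering"):
    for weighted in (True, False):
        print(base, weighted, round(percent(2, respondents, base, weighted), 1))
```

With this toy data, code 2 comes out at 60%, 50%, 75% or about 66.7% depending on the combination, which is why the base and the weight have to be stated alongside every reported figure.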
The reason we use market research specific applications (and note, Power BI isn’t one of these!) is that MR data, mainly because of its multi-categorical question types, needs a data architecture that caters seamlessly to it. The headline here is that Microsoft is not MR data architecture friendly. (SPSS just about is! Even Tableau isn’t.)
So, why does Microsoft still advocate CSV as its preferred data file format? The answer is simple: it only supports generic data constructs and is not interested in supporting (categorical, percent value focussed) MR data constructs. Our data, and our need to interpret it, is simply too niche!
Even SPSS, for all its dominance within the MR industry, doesn’t handle multi-categorical data well. The reason a lot of us who are experienced in handling MR data use software platforms like Merlin, Snap, Askia or Tabx is that the multi-categorical data format is handled seamlessly within these applications. Filters, skip-logic and weighting, to name but a few MR related aspects, are also admirably served by such MR applications. A good yardstick: if the software doesn’t support Triple-S XML, it can normally be ignored.
Coming back to the title to finish this article: remember, CSV files on their own are not MR friendly (from a data management perspective). MR data needs accompanying metadata instruction sets too, and Microsoft, with all its muscle and clout, is a painful application provider for handling MR data and associated information requirements. Generic applications (like Microsoft products) do not cater to such niche requirements well, and survey data is too niche for Microsoft!
Written by Andy Madeley, July 2021