Natural Language Processing for Surveys - it's got a long way to go (Qualtrics Text iQ)

Wm Peck 1958, Business Intelligence Specialist
Data Analyst (Oracle, Business Objects, Qualtrics, some Salesforce & SPSS). Software Development (all phases). Bridge from users to techies.
Edited by: Rob Jurd
This post discusses my experience with the natural language processing component of Qualtrics (Text iQ). This software is used to analyze open-text comments from surveys or other qualitative data.

We love everything about Qualtrics, except their natural language processor (Text iQ).

Part I – My take 

Overall, we love Qualtrics
I can’t say enough good things about Qualtrics, Text iQ aside. Their support is fantastic, and they have tons of training resources that are very well organized and thorough. Here’s one example on relating data for statistical purposes -

It's been excellent for building surveys and doing various statistical analyses. Just not for Text iQ.

The Bottom Line – I find the model cumbersome, even at scale
After wrestling with Text iQ for 2 years, I find the model cumbersome and the output less than satisfying, and the end result still requires careful inspection of all comments.

I feel that the actual model is too computer’y and not able to discern thoughts – it’s basically a dictionary exercise to look at synonyms, and it cannot accurately summarize thought patterns. "Pumping up the dictionary" is not the answer in my opinion.

In addition, any AI system has difficulty with misspellings such as comeradery, which I can instantly recognize as camaraderie. The software also often trips up on double negatives, e.g., “This was not the most awesome semester I’ve never had”, and it surely can’t understand sarcasm and satire (which humans also have a hard time with [especially my wife …]).

Although the appeal of Text iQ is that it can read massive amounts of comments better than a human, it has turned out to be a less than satisfactory model for us. Even with massive amounts of comments, a bad model is still a bad model.

With Text iQ, it would take approximately one day per open-text question for me to properly categorize everything, but I would still tell the sponsor they needed to review every comment for best assessment.

For our environment, I am now recommending full manual review and categorization of comments.

Text iQ probably works in other environments, just not for us.    

Our environment
I work at a 4-year college, so our population groups are college students, faculty, and staff. Student population is around 4,500. There are various surveys throughout the year, ranging from academic preferences to summer activities to pre-graduation assessments of the institution.

The number of comments we deal with is relatively small – less than 1,000 for most surveys (for 5 or so questions), and the upper limit is around 2,000 comments. Comments can be short and sweet or perhaps 5 sentences or more.

The experience that led me to conclude it’s better to review comments manually

I stumbled upon three factors that were light-bulb moments for me. Once I realized these factors, I plowed / zipped through a key survey with around 750 comments, and I felt the end result was extremely helpful and 10,000x better than Text iQ. Our manual solution was one that anyone could consume (and that many people SHOULD consume). Contrast that with the end result of Text iQ, which for us is always "here are some topics and their sentiment (and some bar charts), but you still have to read the comments carefully".

Sorting the comments alphabetically does wonders for the analysis – you can see thought patterns
The first factor that swayed me was plumb luck. Because I had loaded up the comments to Business Objects (and then downloaded to Excel), the comments were sorted alphabetically. And this was an eye-opener – I could see patterns in the comments, something I don’t think the computer can do, since it’s just looking at individual comments and comparing them to its giant dictionary.

I then realized that I could make this pattern search even better by removing leading words (such as “The ”, “My ”, etc.) from the beginning of each comment. Of course, the computer is also going to ignore these words, but after re-sorting, the patterns emerged even more clearly, e.g.,

- My friends …
- Friends …

- Leadership was disappointing
- The leadership team was outstanding
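
The re-sorting idea above can be sketched in a few lines of Python. This is only an illustration, not the Business Objects / Excel workflow described in this post: the word list (only “The” and “My” are named in the text), the function names, and the sample comments are assumptions made for the example.

```python
import re

# Hypothetical list of leading words to ignore when sorting. The article
# names only "The " and "My " and leaves the rest as "etc.".
LEADING_WORDS = ("the", "my")

# Matches one leading filler word followed by whitespace, case-insensitively.
_LEADING = re.compile(r"^\s*(?:" + "|".join(LEADING_WORDS) + r")\s+", re.IGNORECASE)


def sort_key(comment: str) -> str:
    """Return the comment lowercased, with one leading filler word removed."""
    return _LEADING.sub("", comment, count=1).strip().lower()


def sort_comments(comments):
    """Sort comments alphabetically, ignoring a leading filler word."""
    return sorted(comments, key=sort_key)


comments = [
    "My friends …",
    "Leadership was disappointing",
    "Friends …",
    "The leadership team was outstanding",
]

for comment in sort_comments(comments):
    print(comment)
```

The sketch sorts on a stripped key rather than editing the comments themselves, so the original wording stays intact for whoever reads the sorted list.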

The computer can’t think and so is rudimentary at categorizing comments
The second factor was that I categorized comments by what the respondent logically implied but did not explicitly state, for example, being able to categorize these comments as "Noble Endeavor":

- It's something worth doing. I have purpose [wearing a] uniform
- It gives me something higher to reach for

I doubt the computer would choose "Noble Endeavor" for these comments. Plus I’ve seen Qualtrics Text iQ in action enough to know it’s only going to pick up exact words from the comment as the “topic”.

The computer doesn’t know jargon or acronyms, and trips up on misspellings
The final factor was recognizing that the computer doesn’t know our jargon, our acronyms, or our users. Plus any AI system has difficulty with misspellings, as I mentioned above (e.g., comeradery). Since I understood the perspective of the respondents (their language and nuances), I could assess their comments rather quickly. I also realized full well that Qualtrics could not produce the same thing. I didn’t even take the time to test my theory – I have already spent 2 years proving it.

Here’s a third example – I took a simple comment of “experiences and people I met” and created two categories:
- Relationships / Community
- Unique Opportunities

And I placed many other comments into those two categories.

Here is the template I used. The comments were sorted alphabetically, and thought patterns emerged. And after adding the categories (topics), you can see patterns there as well.

Qualtrics won’t work even if I review every comment and create topics on the fly.
After writing most of this article, I thought I’d give Text iQ a try again. But it’s still a no-go for me.

In Text iQ, you can create multiple topics for one comment on the fly. This is exactly the model that I use in Excel, but it won’t work in Text iQ, for two reasons:  
  1. you can’t sort the comments alphabetically, and 
  2. you can only see a couple of comments at a time, whereas in Excel you can see the patterns 1,000x better.

Here’s the Qualtrics view. It’s too hard to wrestle with (and too much white space), while Excel is much easier (see screen shot just above).

Part II – What exactly is Text iQ?

Qualtrics’ Bread and Butter is retail
The core market for Qualtrics is retail (although there is much more). Even in their video about Text iQ, they're mostly talking about retail. In a retail environment, survey questions remain stable (for long-term trends), and customer feedback is typically short and sweet, and easy to "topic'ize" and identify the sentiment.

The natural language model is clunky in my opinion
The typical industry model is to categorize comments into topics and sub-topics, with associated sentiment (positive, negative, neutral, mixed). Specifically with Text iQ you get a score for the intensity of the sentiment (-10 to 10), and even the polarity of the comment (how mixed the sentiment is, from 0 to 10), for both the overall topic and the sub-topics.

It’s clunky because, to me, it’s basically a Word Cloud of everything (although the actual Word Cloud is a nice output). And I spend most of my time telling the software what words to clump together. One reason for this is that the software doesn’t know our jargon, our acronyms, or our users. Plus, as mentioned above, any AI system has difficulty with misspellings, such as comeradery.

With Text iQ, it would take me approximately one day per open-text question to get it right, but I would still tell the sponsor they needed to review every comment closely for best assessment. With my manual assessment, it’s about the same amount of time, but at the end I have a deliverable (two screen shots below).

 Here’s one of the key outputs of Text iQ – a really cool bubble graphic …

As you can see by the topics, it’s geared toward retail. This can be actionable for a store or a restaurant. But for us, even if the topics and sentiment were accurate (e.g., “Leadership”), it’s not going to be actionable unless the comments are read by a human and categorized to be more specific.

I feel our manual assessment is 10,000x better than Text iQ
My method is tedious but I feel it’s far more effective. I read each comment closely and identify 1-6 categories per comment, then summarize the categories (done in Excel). I don't want to bog down this post with specifics, so contact me if you would like some tips. The end result is categorized comments with the number of responses for each category (just below), not a bubble chart. It’s perfect for a bar chart, or even a pie chart … but when you get right down to it, this data table tells the story quite clearly.

The end result of my manual analysis is a deliverable (screenshot above) that should be shared far and wide. Then you can complement this output by picking off 5-10 important comments for the final report.
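
For readers who would rather script the summary step than build it in Excel, here is a minimal Python sketch of counting responses per category. It assumes the categories have already been assigned by a human reader; the data layout, the function name, and the pairing of sample comments with categories are illustrative assumptions, not the actual Excel template.

```python
from collections import Counter

# Hypothetical hand-coded data: each comment is paired with the categories
# a human reviewer assigned to it (1-6 per comment in the approach above).
coded_comments = [
    ("experiences and people I met",
     ["Relationships / Community", "Unique Opportunities"]),
    ("It gives me something higher to reach for",
     ["Noble Endeavor"]),
    ("It's something worth doing. I have purpose [wearing a] uniform",
     ["Noble Endeavor"]),
]


def tally_categories(coded):
    """Count how many comments were tagged with each category."""
    counts = Counter()
    for _comment, categories in coded:
        # set() guards against the same category being entered twice
        # for a single comment, so each comment counts at most once.
        counts.update(set(categories))
    return counts


for category, n in tally_categories(coded_comments).most_common():
    print(f"{category}: {n}")
```

The resulting counts are the same kind of table that feeds a bar chart or pie chart; the reading and categorizing remain manual.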

Ok, I realize the computer is more efficient than the human – except for College Admissions
As a software development person, I realize the computer has to be relied on at massive scale, and ultimately, it's better and certainly more efficient (if programmed right, obviously). But here’s an example that supports my approach: College Admissions staff don’t rely on the computer – they CAREFULLY read all recommendations by teachers, coaches, and guidance counselors, and manually grade them.

For statistical analysis, Qualtrics Stats iQ is a lot easier to use than IBM SPSS
I wrestled with SPSS for 2 years, but then found I can do things much quicker with Qualtrics for quantitative data using their statistical component, Stats iQ. SPSS is great but it’s overkill for our environment. Qualtrics has been a great addition, and now I use SPSS for only 1 job, twice a year. You have to be a statistician / researcher to effectively use SPSS, but Qualtrics brings statistical analysis to your average IT person, with little training required.

It's all about managing expectations – with Qualtrics Stats iQ, you just “press the button”
My expectation was to press the button in Text iQ and get the answer, much as you can do with Stats iQ. But my expectations quickly went unmet - I spent many hours wrangling with Text iQ just to get proper topics, and I still recommended to the sponsor that all comments be reviewed carefully.

Using Text iQ, my average was 1 question per day, regardless of the overall number of comments. If you have 5 comment questions, that’s 5 days, and still the hard work of reading the comments remained, since the Text iQ solution was simply topics and sentiment, and even that wasn’t so great.

As a comparison (and why we like Stats iQ), I can compare (relate) many, many variables to one key variable (e.g., Gender), and Qualtrics zips through the data while I take a sip of coffee. My only job is to press the button (after tagging all the variables to relate). By the time I take one sip, Qualtrics has completed the analysis.

Here’s a sample result from Stats iQ – there’s all manner of helpful tips and it should satisfy the itch of any statistician. It only took 5 minutes to compare 20 variables to one key variable.

But with text analysis (Text iQ), it’s not “press the button”; it’s more like a wrestling match with words and phrases, and as I keep reiterating, you still have to review the comments carefully.

While I’m here – some tips for open text questions in Surveys
  • Consider breaking up a question like “Comments on our program” into two questions – “what did you like best?” and “what did you like the least?” This top-side categorization does wonders for the analysis, and I personally think it stimulates the respondent to a richer response. 
  • Limit the number of open-text questions. 
  • The most important result of the survey is not the results per se, but the widest distribution of the results to as many people as reasonably possible, not just to the primary stakeholders.

Final thoughts

I’d love to hear what you think – I’m relatively new to natural language processing, but I did speak to TWO experts in this area, and both said “Well, that’s about as good as it gets.”
Now, if only Qualtrics or someone could automate my approach – that would be awesome.

My model is pretty simple: 
  1. A comment has many categories.
  2. Provide two consumables:
    1. the data chart here:
    2. The ability to pick a category and view all of the associated comments.
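
As a rough illustration of the second consumable, the sketch below builds a lookup from each category to its comments in Python. It is a sketch under stated assumptions, not an existing Qualtrics feature or the Excel template itself: the function name, the data layout, and the pairing of sample comments with categories are all hypothetical.

```python
from collections import defaultdict

# Hypothetical hand-coded data: (comment, [categories]) pairs, reflecting
# the "a comment has many categories" model.
coded_comments = [
    ("experiences and people I met",
     ["Relationships / Community", "Unique Opportunities"]),
    ("It gives me something higher to reach for",
     ["Noble Endeavor"]),
    ("It's something worth doing. I have purpose [wearing a] uniform",
     ["Noble Endeavor"]),
]


def index_by_category(coded):
    """Map each category to the list of comments tagged with it."""
    index = defaultdict(list)
    for comment, categories in coded:
        # dict.fromkeys() drops duplicate categories while keeping order.
        for category in dict.fromkeys(categories):
            index[category].append(comment)
    return dict(index)


index = index_by_category(coded_comments)

# Pick a category and view all of the associated comments.
for comment in index.get("Noble Endeavor", []):
    print(comment)
```

Because one comment can appear under several categories, the lookup stores the comment under each of them rather than forcing a single "best" topic.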
