Free EBook!
We've compiled much of the blog into a free, 73-page ebook, Survey Software Success. The book outlines seven best practices for conducting online surveys.
Survey Research & Enterprise Feedback Management
Posted by Jeffrey Henning on Mon, Nov 02, 2009
Back in 2002, Mirta Galešic of the University of Zagreb wrote an interesting paper examining respondents' perception of questionnaire length, "Effects of questionnaire length on response rates: Review of findings and guidelines for future research". Just as objects in a side-view mirror are closer than they appear, questionnaires appear to respondents to be longer than they actually are.
Galešic analyzed the relationship between objective and subjective questionnaire length. For objective length, she used the number of questions actually answered (to keep it simple, she counted every question equally, regardless of its length or type). For subjective length, respondents were asked to rate the questionnaire they had just completed as «too short», «optimal», «somewhat too long» or «absolutely too long» (the actual labels were in Croatian, as was the entire questionnaire). Not a single one of the 2,059 respondents answered «too short»!
Galešic writes:
Across all three questionnaire types there was an overall significant, but very small positive correlation between number of questions the respondents answered and their perception of questionnaire length (r=0.11, p<.01). Perceived length was more strongly correlated to the level of interest for the questionnaire topic (r=-.26, p<.01). The less interesting the questionnaire topic was, the longer the questionnaire was perceived to be. Level of interest for the questionnaire topic was not correlated to the number of questions answered (r=.03, p>.05). Interestingly, however, respondents who had less interest in the topic judged the questionnaire equally long no matter how many questions were answered (the average number of questions answered ranged from 15 to 21 for each of the three surveys).
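Galešic reports her results as Pearson correlation coefficients (r). For readers who want to see what such a coefficient measures, here is a minimal Python sketch. It assumes the four perceived-length labels are coded 1 to 4 in order (the paper's actual coding isn't described here), and the respondent data are invented purely for illustration.

```python
import math

# Assumed ordinal coding of the perceived-length labels (illustrative only).
PERCEIVED_LENGTH = {
    "too short": 1,
    "optimal": 2,
    "somewhat too long": 3,
    "absolutely too long": 4,
}

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical respondents: questions answered and perceived-length label.
answered = [15, 17, 18, 21, 16, 20]
labels = ["optimal", "optimal", "somewhat too long", "somewhat too long",
          "absolutely too long", "optimal"]
r = pearson_r(answered, [PERCEIVED_LENGTH[label] for label in labels])
print(f"r = {r:.2f}")
```

For scale: a correlation of r = 0.11, like the one Galešic found for number of questions, means that variable accounts for only about 1% of the variation in perceived length (r² ≈ 0.012).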
In the past I've provided six tips for shortening questionnaires. Thanks to this research, here's a seventh: Make the survey interesting to the respondent, and you will shorten the perceived length of the questionnaire.
Posted by Jeffrey Henning on Mon, Oct 12, 2009
I'm frequently asked whether it is better to show a single question per page or to show multiple questions per page. My advice has been to group questions together logically on a single page, using different pages to change topics.
Since that advice was intuitive rather than empirical, I decided to do an experiment. My control was a 5-page, 9-question questionnaire where some pages had one question and some pages had two or three. My experiment took the same version of the questionnaire but added a page break after the introduction and placed one question per page: this made it twice the length, at 10 pages. It was the exact same questionnaire, differing only in pagination. (For the record, both versions displayed a progress bar indicating how far the respondent was through the survey.)
After only 44 survey starts, the abandonment rate had jumped from 5% across 79 starts of my 5-page survey to 25% for its 10-page equivalent. I cut the experiment short due to the significant impact the additional pages had on abandonment. Broadening my analysis to include some surveys that differed from one another in other ways, I found an 18% abandonment rate for several 10-page surveys with one question per page vs. an abandonment rate of 3% for two surveys with logical groupings of questions.
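How surprising is a jump from 5% to 25% with so few starts? One way to check, sketched below in Python, is Fisher's exact test on a 2x2 table of abandoned vs. completed starts. The counts are an assumption reconstructed from the rounded percentages above (about 4 of 79 starts abandoned vs. 11 of 44); the raw counts aren't reported.

```python
from math import comb

def hypergeom_pmf(x, total, successes, draws):
    """P(X = x) when drawing `draws` items from `total`, `successes` of which are successes."""
    return comb(successes, x) * comb(total - successes, draws - x) / comb(total, draws)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are survey versions; columns are (abandoned, completed).
    """
    total = a + b + c + d
    successes = a + c          # all abandonments
    draws = a + b              # starts of the first version
    lo = max(0, draws - (total - successes))
    hi = min(draws, successes)
    p_observed = hypergeom_pmf(a, total, successes, draws)
    # Sum the probabilities of every table no more likely than the observed one.
    p_value = 0.0
    for x in range(lo, hi + 1):
        p = hypergeom_pmf(x, total, successes, draws)
        if p <= p_observed * (1 + 1e-9):
            p_value += p
    return min(p_value, 1.0)

# Assumed counts, reconstructed from the rounded abandonment rates.
ten_page = (11, 44 - 11)   # (abandoned, completed)
five_page = (4, 79 - 4)
p = fisher_exact_two_sided(*ten_page, *five_page)
print(f"two-sided p = {p:.4f}")
```

With counts this small, an exact test is a safer choice than a normal-approximation z-test. Because the counts are reconstructed, treat the result as a rough check rather than a formal finding.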

Clearly, the need to click the Next button and wait for a new page of the survey to load represents enough additional effort to respondents that it discourages them from completing a survey.
Despite the small sample sizes, this is not an experiment I will be repeating anytime soon! When writing your own questionnaires, to maximize completion rate, avoid showing one question per page.
Posted by Jeffrey Henning on Mon, Sep 28, 2009
Nothing is easier for the survey author than to dash off a questionnaire asking respondents to rate a bunch of items on an agree-disagree scale (also known as the Likert scale). For instance:
For each of the following statements, please indicate if you: Completely disagree, Disagree, Somewhat disagree, Neither agree nor disagree, Somewhat agree, Agree, Completely agree.
- My overall job satisfaction is very high.
- The issue of excessive executive compensation is very important to me personally.
- I rarely feel discouraged with my work.
- I am very likely to seek employment elsewhere in the next six months.
You can easily add many other statements to this list for respondents to rate. In fact, I have seen questionnaires with 80 to 100 items, all to be rated on this agreement scale.
Unfortunately, in such batteries of questions, respondents exaggerate their actual agreement. More than 100 studies have now demonstrated acquiescence response bias: some respondents will agree with almost any assertion. Saris, Krosnick and Shaeffer identify three reasons for this in their paper "Comparing Questions with Agree/Disagree Response Options to Questions with Construct-Specific Response Options":
- Some respondents are simply agreeable, and indicate agreement out of politeness.
- Other respondents expect that the researchers agree with the listed items and defer to their judgment.
- Most respondents engage in survey satisficing and find that agreeing takes less effort than carefully weighing each possible level of disagreement and agreement.
The standard solution to acquiescence response bias has been a balanced battery of items, in which each item has a negated counterpart somewhere else in the questionnaire. For instance, averaging the agreement levels for "I am generally a satisfied employee" and "I am not generally a satisfied employee" (with the negated item reverse-coded) was thought to produce a rating that factored out acquiescence response bias. Saris, Krosnick and Shaeffer put that to the test and found three ways it lowers data quality: respondents have to answer twice as many questions, which leads to satisficing; they have to process negations, which are cognitively more complex; and respondents who acquiesce end up in the middle of the scale.
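To see why a balanced battery pushes acquiescers toward the midpoint, here is a minimal Python sketch. It assumes the seven-point agreement scale is coded 1 (Completely disagree) to 7 (Completely agree) and that the negated item is reverse-coded as 8 minus its code; that coding is an illustrative assumption, not taken from the paper.

```python
# Seven-point bipolar agreement scale, lowest to highest.
AGREEMENT = ["Completely disagree", "Disagree", "Somewhat disagree",
             "Neither agree nor disagree", "Somewhat agree", "Agree",
             "Completely agree"]

def code(label):
    """Map an agreement label to 1..7 (assumed coding)."""
    return AGREEMENT.index(label) + 1

def balanced_score(positive_label, negated_label):
    """Average a statement's code with the reverse-coded code of its negation."""
    points = len(AGREEMENT)
    return (code(positive_label) + (points + 1 - code(negated_label))) / 2

# An acquiescer agrees with both "I am generally a satisfied employee"
# and "I am not generally a satisfied employee".
print(balanced_score("Agree", "Agree"))     # 4.0, the scale midpoint
# A consistently satisfied employee agrees with one and disagrees with the other.
print(balanced_score("Agree", "Disagree"))  # 6.0
```

The acquiescer's 4.0 is indistinguishable from the score of a genuinely neutral employee, which is the third problem Saris, Krosnick and Shaeffer describe.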
The solution with the highest data quality does mean more work for the survey author. Each question needs to be asked with what Saris et al. call "construct-specific response options": in other words, a rating scale that measures the specific item in question. Applying this recommendation to the four statements above yields:
- How would you rate your job satisfaction overall? Not at all satisfied, Slightly satisfied, Moderately satisfied, Very satisfied, Completely satisfied?
- How important is the issue of excessive executive compensation to you personally? Not at all important, Slightly important, Moderately important, Very important, Extremely important?
- How often do you feel discouraged with your work? Never, Rarely, Sometimes, Often, Always?
- How likely are you to seek employment elsewhere in the next six months? Not at all likely, Slightly likely, Moderately likely, Very likely, Completely likely?
At first glance, this looks like more work for the respondent, who must read the choice list for each question. While there is more to read, there is less to think about. For the statement, "I am generally a satisfied employee", respondents might have come up with four reasons to disagree:
- They are generally dissatisfied.
- They are neither dissatisfied nor satisfied.
- They are often satisfied, but not often enough to classify it as "generally".
- They are always satisfied, which is more often than "generally".
The agreement/disagreement scale gives the respondent too much to think about for each item and too many potential reasons to disagree. If this particular question were reworded to use a satisfaction scale, respondents would only be rating their satisfaction: less mental work, and a more accurate answer. The best practices for agree-disagree scales are therefore simple:
- Avoid them whenever you can: rephrase each question to use a common rating scale where one fits, and otherwise use a custom rating scale.
- When the executives or customers sponsoring the research dictate that you must use an agreement scale, use the seven-point bipolar scale: Completely disagree, Disagree, Somewhat disagree, Neither agree nor disagree, Somewhat agree, Agree, Completely agree.
The Likert scale has a long and illustrious history, having been invented in 1932. Sadly, it is now as obsolete as the cathode-ray tube (which RCA first demonstrated could be used to receive TV transmissions in 1932). Construct-specific response options are the HDTV of the survey world.
Posted by Jeffrey Henning on Thu, Sep 24, 2009
My standard advice about structuring questionnaires is to begin the questionnaire proper, right after the screener, with open-ended questions. Since respondents start fresh and then tire over the course of the survey (moving from giving the optimal answer to satisficing), you will get more feedback when you lead with these open-ended questions. Further, starting with open-ends removes the chance for earlier closed-ended questions to bias the answers to open-ends by bringing up topics other than those that were top of mind for respondents.
Many of my customers push back on this advice, however, because they are concerned that leading with verbatim questions will increase the survey abandonment rate. They prefer to keep their open-ended questions near the end of the survey.
I recently had the opportunity to put my advice to the test. I created two versions of a 5-page, 9-question questionnaire. Survey A had the open-ends as the seventh and eighth questions. Survey B moved the same questions to the first and second positions; everything else stayed the same. Respondents were not required to answer the open-ended questions; clicking the Next button would simply take them to the next question.
The test was fielded in August. Survey A had 70 responses from 290 invites (24% response rate), and Survey B had 79 responses from 379 invites (21% response rate) [had the test been the primary purpose of the survey, I would have kept these distributions equal].
As expected, the abandonment rate did increase when leading with open-ended questions, climbing from 1% for Survey A to 6% for Survey B. Yet that trade-off was worth it: 92% of respondents answered at least one of the open-ended questions for Survey B, compared to only 60% of the respondents of Survey A. So in exchange for a modest drop-off in overall response, we gathered fully 50% more open-ended comments!
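The figures above are easy to double-check. This short Python sketch recomputes the response rates from the reported invite and response counts, and the relative lift in open-end participation from the reported 60% and 92%.

```python
def rate(part, whole):
    """Simple proportion, e.g. responses divided by invites."""
    return part / whole

response_a = rate(70, 290)   # Survey A: open-ends near the end
response_b = rate(79, 379)   # Survey B: open-ends first
print(f"Response rates: A {response_a:.0%}, B {response_b:.0%}")  # A 24%, B 21%

# Share of respondents answering at least one open-end, as reported above.
answered_open_a = 0.60
answered_open_b = 0.92
lift = answered_open_b / answered_open_a - 1
print(f"Relative lift in open-end participation: {lift:.0%}")  # 53%
```

The relative lift works out to about 53%, the "fully 50%" above; note that it compares shares of respondents who answered at least one open-end, not counts of individual comments.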
To my surprise, though, the abandonment for Survey B did not occur at the beginning of the survey; I fully expected respondents to exit without answering the open-ends, but no respondent did so. Instead, the respondents who answered at least one of the open-ends were the ones more likely to grow tired of the survey and abandon it. In fact, every respondent who skipped the two open-ended questions answered all seven subsequent closed-ended questions.

I had also hypothesized that you would get longer answers when leading with open-ended questions. In this, I was disappointed: the average length of an answer was 13 words in both surveys.
As a result of this test, I will continue to advise that open-ended questions are more productive when placed at the beginning of a survey rather than near its end.
Posted by Jeffrey Henning on Fri, Sep 04, 2009
I’ve ignored many of the initiatives to improve the quality of third-party online panels. To me, these initiatives are laughable. Yes, you should…
- Seek to identify panelists participating in the same survey multiple times under different names
- Remove respondents who speed through their answers
- Have a broad-based demographic representation so that you do not need to weight individual respondents
But these simply put lipstick on the piggy bank. They make it easier for organizations to continue to put cost before quality and to justify doing research on the cheap with third-party panels. “See? The panel companies are working hard to ensure consistent high quality!”
Um, a consistent high quality convenience panel is certainly better than a low quality convenience panel. But it’s still a pig. Er, piggy bank: a cheap alternative to a random sample.
The laws of mathematics have not been repealed: a convenience sample cannot be used to extrapolate to any target audience. A convenience sample is representative of its respondents only. This point keeps getting lost, as I saw last year at the MRA Conference during the presentation "What's the Catch? Does Sample Sourcing Matter":
A pointed question from the audience said that probability sampling was the theoretical basis for the projectability of survey research and asked what the scientific underpinnings were for assuming that Internet research was similarly representative. Melanie [the presenter] answered that replicability is emerging as the standard instead of randomization and that the results from her research were replicable.
What "irrational exuberance" was to NASDAQ, the third-party online panel is to MR.
This week, Gary Langer, director of polling at ABC News, writes in his column:
A new study led by Stanford University researchers raises doubts about the accuracy of one of the most common forms of survey research, polls done among people who sign up to fill in questionnaires via the internet in exchange for cash and gifts. In the most extensive such analysis to date, David Yeager and Prof. Jon Krosnick compared seven non-random internet surveys with two surveys based instead on random or so-called probability samples. The non-probability internet surveys were less accurate, and customary adjustments did not uniformly improve them. While the random-sample surveys were “consistently highly accurate,” the internet surveys based on self-selected or “opt-in” panels “were always less accurate, on average, than probability sample surveys, and were less consistent in their level of accuracy,” the researchers said. Further, they said, adjusting these samples to known population values had no effect on accuracy (and in one case even worsened it) as often as that process, known as weighting, improved it.
Most Vovici customers are surveying house lists of customers, employees, resellers and other key constituencies. It’s very easy to survey a random sample of employees when you have the email address of every employee and have empaneled the list of employees by synchronizing with your HRIS. For surveys of prospects, many organizations use the web for all lead generation and can easily field random samples of prospects. Unless you’re an e-commerce or SaaS business, though, it is more difficult to build a representative house list of customers that you can then randomly sample: check out these tips for creating and managing representative email lists of your customer base.
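For a house list where every member has an email address, drawing a simple random sample takes only a few lines. The sketch below uses Python's standard `random` module; the employee addresses are hypothetical, and the seed is there only so the draw can be reproduced or audited.

```python
import random

def simple_random_sample(house_list, n, seed=None):
    """Draw n distinct contacts, each equally likely to be chosen."""
    if n > len(house_list):
        raise ValueError("sample size exceeds list size")
    rng = random.Random(seed)
    return rng.sample(house_list, n)

# Hypothetical employee list, e.g. exported from an HRIS.
employees = [f"employee{i}@example.com" for i in range(1, 501)]
invitees = simple_random_sample(employees, 50, seed=2009)
print(len(invitees), invitees[:3])
```

Every employee has the same chance of being invited, which is what supports projecting the results to the list as a whole, unlike a convenience panel.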
Putting in place regular processes to build a quality house list is like setting up automatic monthly transfers from checking to savings: a better way than the panel piggy bank to save research costs in the long run. Building such a house list is a sound investment in conducting quality, representative survey research.
Posted by Jeffrey Henning on Fri, Aug 28, 2009
Most survey researchers pay little mind to the research into the efficacy of different scales, frequently using obsolete rating scales in their surveys: changing scales can be a real bugbear.
That said, here are some well-researched best practices when it comes to scales:
- Use 5-point scales for unipolar scales and 7-point scales for bipolar scales:
"To explore the relation between scale length and reliability, we conducted a meta-analysis of the results of many past studies. Our data consist of results from 706 tests of reliability taken from thirty different between-subject studies. We combined various measures of reliability and various sample sizes, controlling for these and other factors in determining the relation of scale length to reliability. In general, we found that five- or seven-point scales produced the most reliable results. Bipolar scales performed best with seven points, whereas unipolar scales performed best with five." - Jon Krosnick, professor of communication at Stanford, "The Optimal Length of Rating Scales to Maximize Reliability and Validity"
- Use fully labeled scales without showing respondents numeric ratings. Such scales are preferred by respondents and have higher reliability and predictive validity than numeric scales.
- Exclude “Don’t know” and “No opinion” as choices when presenting your scale.
- The 0-to-10 rating scale for Net Promoter has the lowest reliability and predictive validity of four scales tested.
The above findings are backed up by scientific research. The following best practices, on the other hand, are my personal preferences, for which I was not able to find supporting data:
That said, factoring in the research and my recommendations, this is what I consider to be the best CSAT scale:
What is your overall satisfaction with our company?
- Not at all satisfied
- Slightly satisfied
- Moderately satisfied
- Very satisfied
- Completely satisfied
When a study mixes different lengths of scales, consider standardizing the scales in survey analysis, for instance by mapping scales to a 0 to 10 scale. This can make reports of the results easier to understand. While respondents dislike numeric scales, fully labeled scales are typically analyzed numerically, and the 0-to-10 mapping can aid analysis.
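A minimal sketch of that mapping in Python, assuming responses are coded 1 through k in label order (1 = lowest label) and mapped linearly so the lowest label becomes 0 and the highest becomes 10. The mapping is for analysis only; respondents still see only the labels.

```python
# Fully labeled CSAT scale recommended above, lowest to highest.
CSAT = ["Not at all satisfied", "Slightly satisfied", "Moderately satisfied",
        "Very satisfied", "Completely satisfied"]

def to_0_10(response, points):
    """Linearly map a response coded 1..points onto a 0-to-10 range."""
    if points < 2 or not 1 <= response <= points:
        raise ValueError("response must be coded 1..points, with points >= 2")
    return (response - 1) * 10 / (points - 1)

# "Very satisfied" is the 4th of 5 labels, so it maps to 7.5.
print(to_0_10(CSAT.index("Very satisfied") + 1, len(CSAT)))  # 7.5
# The midpoint of a 7-point bipolar scale maps to 5.0.
print(to_0_10(4, 7))  # 5.0
```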
Most organizations fail to standardize on rating scales, making it difficult to compare the results from study to study, from department to department. If you haven’t yet done so, please consider coming up with standard practices to guide your research. To contradict Emerson, when it comes to rating scales, a foolish inconsistency is the hobgoblin of little minds.
Posted by Jeffrey Henning on Fri, Aug 07, 2009
When writing closed-ended questions, should you include a choice for “Don’t know”, “Not applicable” or “No opinion”? The fear is that including such an option will give respondents an easy way out (i.e., survey satisficing) rather than making them think through their best answer to the question.
According to many seasoned survey researchers, offering a no-opinion option should reduce the pressure to give substantive responses felt by respondents who have no true opinions. By contrast, the survey satisficing perspective suggests that no-opinion options may discourage some respondents from doing the cognitive work necessary to report the true opinions they do have. We address these arguments using data from nine experiments carried out in three household surveys. Attraction to no-opinion options was found to be greatest:
- among respondents lowest in cognitive skills (as measured by educational attainment),
- among respondents answering secretly instead of orally,
- for questions asked later in a survey,
- and among respondents who devoted little effort to the reporting process.
The quality of attitude reports obtained (as measured by over-time consistency and responsiveness to a question manipulation) was not compromised by the omission of no-opinion options. These results suggest that inclusion of no-opinion options in attitude measures may not enhance data quality and instead may preclude measurement of some meaningful opinions.
Use of no-opinion responses is greater for respondents “answering secretly instead of orally” - i.e., for respondents doing self-administered surveys such as paper, web and kiosk surveys, rather than responding to telephone or face-to-face surveys. The reason for this difference is that every respondent sees the no-opinion choice on a written survey, but in an oral survey that response is typically not read aloud to a respondent but is kept in reserve, checked off only if the respondent brings it up.
One way to approximate this in an online survey is to omit such a response from a question's choice list when an answer is not required. The respondent therefore doesn’t see the “Don’t know” option but, after considering the available choices, can simply skip the question altogether. Accordingly, I prefer to include no-opinion responses in choice lists only if the question is required, and only if that required question may be hard for respondents to answer: for instance, when asking for specific details of a past transaction, or for details about the respondent’s organization that they simply might not know.
When analyzing no-opinion responses, it is often handy to omit them from pie charts and frequency percentages. Survey software applications may let you assign a code to a choice (e.g., in Vovici v4, you can assign a choice a precoded meaning of “Not applicable”, “Don’t know” or “Refused”), and such coded values are then omitted from pie charts and the percentage column of frequency tables. Rather than saying “70% said yes, 20% said no and 10% didn’t know”, with this option you can say “78% who knew said yes, 22% said no.”
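The arithmetic behind that restatement is simple: drop the no-opinion codes from the denominator. This Python sketch assumes a hypothetical set of no-opinion codes; real survey packages, Vovici included, store such codes in their own way.

```python
from collections import Counter

# Hypothetical no-opinion codes; survey tools define their own.
NO_OPINION = {"Don't know", "Not applicable", "Refused"}

def percentages(responses, exclude_no_opinion=True):
    """Rounded percentage per answer, optionally leaving no-opinion codes out of the base."""
    counts = Counter(responses)
    if exclude_no_opinion:
        counts = {k: v for k, v in counts.items() if k not in NO_OPINION}
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: round(100 * v / total) for k, v in counts.items()}

answers = ["Yes"] * 70 + ["No"] * 20 + ["Don't know"] * 10
print(percentages(answers, exclude_no_opinion=False))  # {'Yes': 70, 'No': 20, "Don't know": 10}
print(percentages(answers))                            # {'Yes': 78, 'No': 22}
```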
I hope this gives you enough information to now have an opinion on the use of no-opinion responses!
Posted by Jeffrey Henning on Fri, Jun 05, 2009
Your boss says the questionnaire you're writing should ask your customers "What is your overall satisfaction with our product?" and should use a five-point scale. Sounds simple, you think to yourself, and then you sit down to write the question. Which format do you use?

Like many other tactical issues regarding questionnaire design, scale formats have been carefully studied. According to the summary of available research by Jon Krosnick and Leandre Fabrigar in "Designing rating scales for effective measurement in surveys":
- Respondents prefer rating scales with more verbal labels
- Respondents believe such scales provide more valid measurement
- Choosing a labeled choice is a more natural mental activity (not to mention more conversational) than selecting a number within a range
- Longitudinal reliability is greater when using fully labeled scales instead of partially-labeled scales
- Validity, especially inter-rater validity, is greater using fully labeled scales
- Using fully labeled scales provides greater reliability and greater validity from respondents with low to moderate education
- Because numeric values can confuse respondents and affect the choices they make, it is better to omit numeric labels altogether.
Given the importance of fully labeling a rating scale, choose an existing common scale where possible, rather than writing your own scale. Reword the question if necessary to fit a common scale. And, of course, take care when deciding how many points to use within the scale.
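One way to make "choose an existing common scale" a habit is to keep a small library of fully labeled scales and build questions from it. The sketch below is a hypothetical Python structure, not a feature of any particular survey package; the labels are the construct-specific scales used elsewhere on this blog, and no numeric labels are shown to respondents.

```python
# Common fully labeled scales, lowest to highest, with no numeric labels.
COMMON_SCALES = {
    "satisfaction": ["Not at all satisfied", "Slightly satisfied",
                     "Moderately satisfied", "Very satisfied",
                     "Completely satisfied"],
    "importance": ["Not at all important", "Slightly important",
                   "Moderately important", "Very important",
                   "Extremely important"],
    "frequency": ["Never", "Rarely", "Sometimes", "Often", "Always"],
    "likelihood": ["Not at all likely", "Slightly likely", "Moderately likely",
                   "Very likely", "Completely likely"],
}

def build_question(text, scale_name):
    """Pair question wording with a common, fully labeled scale (hypothetical helper)."""
    if scale_name not in COMMON_SCALES:
        raise KeyError(f"no common scale named {scale_name!r}; consider rewording the question")
    return {"text": text, "choices": list(COMMON_SCALES[scale_name])}

q = build_question("What is your overall satisfaction with our product?", "satisfaction")
print(q["text"])
for choice in q["choices"]:
    print(f"( ) {choice}")
```

Raising an error for an unknown scale nudges the author toward rewording the question to fit a common scale before resorting to a custom one.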
Posted by Jeffrey Henning on Tue, Jun 02, 2009
Having looked at the order effects of choices in web surveys a few weeks ago, I thought it appropriate to look at the order effects of questions themselves. Does re-arranging the order of the questions affect responses?
The market researcher would prefer that the respondent consider each question in isolation, unrelated to any questions that have been asked before. Of course, respondents are not robots, and earlier questions will unfortunately bring topics to mind that can "contaminate" later answers.
One example of such contamination that I have seen in my own surveys involves the order of general vs. specific questions. If the general question is open-ended, many survey authors I've worked with prefer to put it after the closed-ended questions, since open-ends are harder to answer (requiring thinking and typing rather than thinking and clicking a button). But when you ask the verbatim question second, you will definitely get a greater percentage of respondents talking about the topics of the previous questions.
In the paper "Effects of Question Order on Survey Responses" by Sam McFarland, some respondents were asked general questions (describing their interest in politics and religion) and then specific questions (evaluating the state of the economy and the energy market), while others were asked the specific questions first. Asking the specific questions first increased the likelihood that respondents would report an interest in the general topics.
|  | Test A | Test B |
|---|---|---|
| Question Order | 1. General, 2. Specific | 1. Specific, 2. General |
| General Results | Control | Greater interest in specific items |
| Specific Results | No change | No change |
As a result, my preference continues to be to ask an open-ended question first about how an organization can improve a product or service, then follow up with a closed-ended question presenting a range of items to be rated.
Because early questions can contaminate later ones, using one question order for every respondent is sometimes the wrong approach. When asking a respondent to rate two or more contrasting items (typically products, services or organizations), it is customary to rotate the order of the items, so that always assessing one item before another doesn't bias the results. In survey software, this is typically accomplished by setting up page rotations that randomly rotate pages or other blocks of questions. This is analogous to randomizing choices in a choice list.
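Survey tools implement rotation in different ways. As one possible approach, the Python sketch below rotates a list of question blocks cyclically from a randomly chosen starting block for each respondent; the block names are hypothetical.

```python
import random

def rotated_blocks(blocks, rng):
    """Return the blocks in a cyclic rotation with a random starting block.

    Over many respondents each block lands in each position about equally
    often, while the blocks keep their relative cyclic order.
    """
    start = rng.randrange(len(blocks))
    return blocks[start:] + blocks[:start]

blocks = ["Rate Product A", "Rate Product B", "Rate Product C"]
rng = random.Random(42)
for respondent in range(3):
    print(respondent, rotated_blocks(blocks, rng))
```

If you prefer a fully random order instead, `rng.sample(blocks, len(blocks))` returns a random permutation; cyclic rotation keeps the blocks in the same relative sequence while still varying which one comes first.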
A free subscription to this blog to the first person with an RSS reader who can tie the photo to this topic! (Clearly I need to hire the MR blogger Zebra Bites as a photo consultant.)
Posted by Jeffrey Henning on Wed, May 20, 2009
We last looked at respondent behavior in the post "Long Surveys Turn Respondents into Liars." Similarly, long choice lists turn respondents into satisficers, who select a satisfactory answer rather than the optimal answer.
Jon Krosnick and Duane Alwin in the report "An evaluation of a cognitive theory of response order effects in survey measurement" provide an excellent summary of the past research that documented this behavior:
Studies of impression formation[1], the impact of persuasive communications[2], sequential processing of performance information[3], and the serial position effect[4] all suggest that when items are presented visually on "show cards," primacy effects are to be expected. This occurs for two main reasons.
- Items presented early may establish a cognitive framework or standard of comparison that guides interpretation of later items. Because of their role in establishing the framework, early items may be accorded special significance in subsequent judgments.
- Items presented early in a list are likely to be subjected to deeper cognitive processing; by the time a respondent considers the final alternative, his or her mind is likely to be cluttered with thoughts about previous alternatives that inhibit extensive consideration of it. Research on problem-solving suggests that the deeper processing accorded to early items is likely to be dominated by generation of cognitions that justify selection of these early items[5]. Later items are less likely to stimulate generation of such justifications (because they are less carefully considered) and may therefore be selected less frequently.
So, now that we know that our respondents do this, how do we address this issue when constructing choice lists?
- If a long choice list can be structured into an outline, present the choices as a hierarchical question instead.
- Consolidate the long choice list into a shorter list that makes fewer distinctions.
- For long lists that can't be modified, use randomization. While it would be too costly in a paper survey to have multiple versions of the questionnaire, each presenting choice lists in different orders, for a web survey the ability to randomize choice lists is a built-in capability of most survey software and has no added cost to use. Such randomization isn't needed for long lists that respondents don't have to read; for instance, alphabetized lists of states or countries, where the respondent knows the answer without reading the choice list and is simply finding the choice in the list. Nor is randomization appropriate for rating scales. Instead, randomize the choices for any long list that lacks an inherent order.
- Finally, Krosnick and Alwin advise attempting "to increase respondent motivation in order to increase concentration and decrease satisficing. Motivation may be increased by adding special instructions informing respondents that the question they are about to answer is relatively difficult and requires extra concentration."
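A sketch of the randomization rule described above, in Python: shuffle each respondent's copy of an unordered list, but leave rating scales and alphabetized lists alone. The function name and the `has_inherent_order` flag are hypothetical; seeding with the respondent ID is one assumed way to keep a respondent's order stable if the page reloads.

```python
import random

def present_choices(choices, respondent_id, has_inherent_order=False):
    """Return the choice list to show one respondent.

    Lists with an inherent order (rating scales, alphabetized states or
    countries) are returned unchanged; other lists are shuffled, with the
    respondent ID as seed so the same respondent always sees the same order.
    """
    if has_inherent_order:
        return list(choices)
    rng = random.Random(respondent_id)
    shuffled = list(choices)
    rng.shuffle(shuffled)
    return shuffled

features = ["Price", "Reliability", "Support", "Ease of use",
            "Integrations", "Security"]
scale = ["Not at all important", "Slightly important", "Moderately important",
         "Very important", "Extremely important"]
print(present_choices(features, respondent_id=1001))
print(present_choices(scale, respondent_id=1001, has_inherent_order=True))
```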
[1] Asch, 1946; Nisbett & Ross, 1980, pp. 172-175; Anderson & Hubert, 1963; Sherif, 1935, 1936; Lingle & Ostrom, 1981; Anderson & Barrios, 1961; Dreben, Fiske, & Hastie, 1979.
[2] Miller & Campbell, 1959; Ronis et al., 1977; Crano, 1977; Hovland et al., 1957; Insko, 1964.
[3] Jones et al., 1968.
[4] Bruce & Papay, 1970; Crowder, 1969; Rundus, 1971.
[5] Koriat, Lichtenstein, & Fischhoff, 1980; Hoch, 1984; Klayman & Ha, 1984; Tschirgi, 1980; Wason & Johnson-Laird, 1972.