Survey Coding: Categorizing Answers to Verbatim Questions
Posted by Jeffrey Henning on Fri, Apr 16, 2010

What do you do with all those answers to your open-ended questions? If you are in a hurry, you can read them all, then highlight three or four comments that you find representative of the types of answers you received and present those as typical. For all but the most tactical surveys, however, you are going to want to develop and tally categories of common answers.
A coding frame or code book is simply a list of the categories and subcategories (also called nets and sub-nets) that you are tracking. For product lists ("Which mascara have you most recently purchased?", "What was the last movie you watched?"), you might begin a coding frame by seeding it with bestselling products. For essay questions, you will develop a coding frame by reviewing the answers themselves. It's become hip to create a word cloud to visualize open ends. This technique, while colorful, is of limited analytical value, but it can help you see commonly used words as you begin drafting your code book.
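As a rough illustration of the idea, a coding frame can be held as a mapping from nets to sub-nets, and a simple word count can stand in for the word cloud when you start drafting. This is only a sketch: the category names, the stopword list, and the function name top_words are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical coding frame: each net (category) lists its sub-nets.
CODING_FRAME = {
    "Price": ["Too expensive", "Good value"],
    "Quality": ["Durable", "Breaks easily"],
}

# Illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "is", "it", "i", "to", "of", "but", "for", "too"}


def top_words(comments, n=5):
    """Return the n most frequent non-stopword words across all comments."""
    counts = Counter()
    for comment in comments:
        words = re.findall(r"[a-z']+", comment.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


comments = [
    "The price is too high",
    "Great quality but the price is steep",
    "Quality is great",
]
print(top_words(comments, 3))
```

Frequent words such as "price" and "quality" suggest candidate nets; reading the comments themselves is still what tells you which sub-nets are worth tracking.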
For some open ends ("What model cell phone do you have?"), you will record one code per answer. For others ("What, if anything, do you like about...?"), multiple codes per answer make sense.
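One way to tally both kinds of question with the same routine is to store every answer as a list of codes, so a single-code question just has one-item lists. The sketch below assumes that layout; the function names and the sample codes are invented for illustration.

```python
from collections import Counter


def tally_codes(coded_answers):
    """Count how many respondents received each code.

    Each answer is a list of codes. A code repeated within one
    answer is counted once for that respondent.
    """
    counts = Counter()
    for codes in coded_answers:
        counts.update(set(codes))
    return counts


def percentages(counts, n_respondents):
    """Express each code's count as a percentage of all respondents."""
    return {code: round(100 * c / n_respondents, 1) for code, c in counts.items()}


# Hypothetical coded answers to "What, if anything, do you like about...?"
likes = [
    ["Price", "Design"],
    ["Design"],
    ["Battery life", "Design"],
    [],  # respondent liked nothing, so no codes were assigned
]
counts = tally_codes(likes)
print(percentages(counts, len(likes)))
```

With multiple codes per answer, the percentages are based on respondents rather than on mentions, so they can add up to more than 100.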
Managing coding across multiple languages can be quite expensive and time-consuming. Often, to save time and money, the verbatim comments themselves are not translated; instead, a standard code book is developed for use across all languages, with native speakers coding the verbatim responses in each language.
For the highest-quality coding, follow a double-blind process: have two analysts independently develop a coding frame and code all the verbatim responses using it. The analysts should then meet to reconcile their results and develop a consensus coding frame with final coded responses. Large firms that do extensive coding continually measure and monitor agreement between coders (typically called intercoder reliability).
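To make intercoder reliability concrete, here is a minimal sketch for a single-code question. It computes simple percent agreement and Cohen's kappa, one commonly used statistic that corrects for agreement expected by chance. The choice of kappa is an assumption for the example, not something this post prescribes, and the sample codes are invented.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of responses to which both coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must code the same non-empty set of responses")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(codes_a)
    p_observed = percent_agreement(codes_a, codes_b)
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    # Chance agreement: probability both coders pick the same code at random,
    # given how often each coder used each code.
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1:
        return 1.0  # both coders used one identical code throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes from two analysts for the same six responses.
analyst_1 = ["Price", "Price", "Quality", "Quality", "Service", "Price"]
analyst_2 = ["Price", "Quality", "Quality", "Quality", "Service", "Price"]

print(round(percent_agreement(analyst_1, analyst_2), 3))  # 0.833
print(round(cohens_kappa(analyst_1, analyst_2), 3))       # 0.739
```

The responses where the two analysts disagree (the second one here) are the ones to bring to the reconciliation meeting.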
This manual coding process is fine for a few hundred responses but quickly becomes tedious beyond that. You can take a random sample of comments and manually code only those. Alternatively, you can automate the task. Systems such as Ascribe from Language Logic can learn from the manually prepared codes and automate subsequent coding. As input, this application takes the manually developed coding frame and coded comments and creates routines that can be used to categorize the uncoded comments. Text analytic systems, including those from Attensity and Clarabridge (both partners of Vovici), can be manually configured to provide automated coding.
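Both shortcuts can be sketched in a few lines. The example below draws a reproducible random sample for manual coding, then uses a deliberately naive keyword-overlap rule to suggest codes for the remaining comments. It is a toy illustration of learning from manually coded comments, not a description of how Ascribe or any commercial system works; every name and all sample data here are hypothetical.

```python
import random
import re
from collections import Counter, defaultdict

# Illustrative stopword list so that common words do not drive the match.
STOPWORDS = {"the", "a", "an", "and", "is", "it", "i", "to", "of", "for", "me"}


def sample_comments(comments, k, seed=0):
    """Draw a reproducible random sample of up to k comments to code by hand."""
    rng = random.Random(seed)
    return rng.sample(comments, min(k, len(comments)))


def tokenize(text):
    """Lowercase the text and keep only non-stopword words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def learn_keywords(coded_comments):
    """Build per-code word counts from (comment, code) pairs coded by hand."""
    keywords = defaultdict(Counter)
    for text, code in coded_comments:
        keywords[code].update(tokenize(text))
    return keywords


def suggest_code(text, keywords):
    """Suggest the code whose learned words overlap the comment most, or None."""
    words = tokenize(text)
    scores = {code: sum(counts[w] for w in words) for code, counts in keywords.items()}
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == 0:
        return None  # nothing matched; leave for a human coder
    return best


# Hypothetical manually coded sample.
manual = [
    ("Too expensive for me", "Price"),
    ("The price is high", "Price"),
    ("Battery dies quickly", "Battery"),
    ("Battery life is short", "Battery"),
]
keywords = learn_keywords(manual)
print(suggest_code("Battery drains fast", keywords))
print(suggest_code("Great camera", keywords))
```

Comments that come back as None match nothing in the learned vocabulary and still need a human coder, which is also a useful signal that the code book may be missing a category.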
Coding helps bring order and structure to your hundreds or thousands of open-ended comments.