The ABCs of Custom Dictionaries & the 1, 2, 3 of Sentiment Analysis
Posted by Jeffrey Henning on Wed, Jan 27, 2010
Shane Axtell, a computational linguist at Clarabridge, discussed custom dictionaries and advanced sentiment analysis at the 2010 Clarabridge Customer Connections conference.
The ABCs of Custom Dictionaries
One of the main purposes for using a custom dictionary in text analytics is to list terms that should be treated as keywords. For instance, a product named "Prime Choice" should be added to the custom dictionary so that it can be processed as a single entity rather than separately as the words "prime" and "choice". Build your custom dictionaries from domain-specific terms, brand names, acronyms and abbreviations. This will enable you to create better category rules and make sure that brand-name phrases don't carry the sentiment of their individual words.
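To make the "Prime Choice" example concrete, here is a minimal Python sketch of a tokenizer that keeps dictionary terms whole. This is not Clarabridge's implementation; the names `CUSTOM_TERMS` and `tokenize` are made up for illustration.

```python
import re

# Hypothetical custom dictionary: multi-word terms to keep as single tokens.
CUSTOM_TERMS = ["Prime Choice"]

def tokenize(text, terms=CUSTOM_TERMS):
    """Split text into lowercase tokens, keeping dictionary terms whole."""
    # Longest terms first, so a longer phrase wins over a shorter overlapping one.
    ordered = sorted(terms, key=len, reverse=True)
    # Allow any run of whitespace between the words of a term.
    alternatives = [
        r"\s+".join(re.escape(word) for word in term.split()) for term in ordered
    ]
    # Try dictionary terms first, then fall back to single words.
    pattern = re.compile(
        "|".join([rf"\b{alt}\b" for alt in alternatives] + [r"\w+"]),
        re.IGNORECASE,
    )
    # Collapse internal whitespace so "Prime   Choice" and "Prime Choice" match.
    return [" ".join(m.group(0).split()).lower() for m in pattern.finditer(text)]

print(tokenize("The Prime Choice steak was a prime example"))
```

Because "prime choice" comes out as one token, a sentiment lexicon that scores "prime" or "choice" never sees those words inside the brand name.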
Clarabridge custom dictionaries can also use regular expressions. For instance, Best Buy uses "cci" in its call-center system as an abbreviation for "customer called in"; this can be expanded by a regular expression into three separate words. For mining chat sessions, you can have custom dictionaries that translate chat speak and emoticons. You can even transform account numbers to obscured text (e.g., 123-45-6789 to xxx-xx-xxxx).
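The regular-expression uses above can be sketched in a few lines of Python. The rule list is hypothetical (only "cci" and the account-number mask come from the talk; the chat-speak and emoticon entries are invented examples), and Clarabridge's own rule syntax may look quite different.

```python
import re

# Hypothetical rewrite rules: (compiled pattern, replacement text).
REGEX_RULES = [
    # Call-center abbreviation from the Best Buy example.
    (re.compile(r"\bcci\b", re.IGNORECASE), "customer called in"),
    # Mask account numbers shaped like 123-45-6789.
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "xxx-xx-xxxx"),
    # Invented chat-speak and emoticon examples.
    (re.compile(r"\bthx\b", re.IGNORECASE), "thanks"),
    (re.compile(r":-?\)"), "[smile]"),
]

def normalize(text):
    """Apply each rewrite rule in order and return the cleaned text."""
    for pattern, replacement in REGEX_RULES:
        text = pattern.sub(replacement, text)
    return text

print(normalize("cci about acct 123-45-6789, thx :)"))
```

Running the rules before any other analysis means later steps only ever see the expanded, masked text.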
Grammatical rules can be added to custom dictionaries to detect terms. For instance, you can treat "Microsoft Windows 7", "Windows 7", "Win 7" and "Win7" as synonyms, and then treat the edition of Windows ("Home", "Premium", "Small Business", etc.) as a modifier.
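As a rough illustration of the synonym-plus-modifier idea, the following Python sketch maps the Windows 7 variants to one canonical name and captures the edition separately. The pattern and the `find_products` helper are assumptions for this example, not Clarabridge's grammar format.

```python
import re

# Hypothetical rule: product synonyms, followed by an optional edition modifier.
PRODUCT = re.compile(
    r"\b(?:Microsoft\s+)?Win(?:dows)?\s*7\b"
    r"(?:\s+(?P<edition>Home|Premium|Small\s+Business))?",
    re.IGNORECASE,
)

def find_products(text):
    """Return (canonical product, edition or None) for each mention in text."""
    results = []
    for match in PRODUCT.finditer(text):
        edition = match.group("edition")
        if edition:
            # Normalize spacing and capitalization of the modifier.
            edition = " ".join(edition.split()).title()
        results.append(("Windows 7", edition))
    return results

print(find_products("I upgraded from Win7 to Microsoft Windows 7 Premium"))
```

Every variant reports as "Windows 7", so counts and sentiment roll up to one product while the edition stays available for drill-down.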
Setting up custom dictionaries can be as easy as A, B, C.
Advanced Sentiment: Detect the Customer Feeling
The second half of Shane's presentation was entitled "Advanced Sentiment: Detect the Customer Feeling". Shane highlighted three key capabilities for improving sentiment processing in text analysis:
- Modify the sentiment for a word. For example, by default "power" is a term with positive sentiment, but for a consumer-electronics retailer "power" will be about electricity and should be classified as neutral sentiment. As you think about the many terms you will need to modify, spend the least time on reclassifying positive words; just change out any false positives for your domain. Spend a bit more time on negative words, since negative comments highlight opportunities for action. Spend the most time on words that Clarabridge classifies as "neutral subjective": words that are neutral by default but carry a subjective tone, so their polarity depends on your domain and you will need to reclassify them. Always focus on the most frequent words in your corpus first.
- Handle negation. For instance, process "not" so that "not happy" is marked with negative sentiment rather than positive sentiment. Clarabridge automatically inverts the sentiment for four negators: "not", "no", "never", "hardly".
- Use exception rules to specify context. For example, the pattern "would be <positive>" is negative, and "supposed to be a wonderful experience" should be scored as negative rather than letting "wonderful" count as positive. The word "friendly" is positive but the phrase "too friendly" is negative. Clarabridge 4.0 has 48 built-in exception rules. Many rules will need to be created specific to your data set.
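The three capabilities above can be sketched together in a toy Python scorer. Everything here is an assumption made for illustration: the tiny lexicon, the +1/-1 scores, the rule format and the choice to look only at the words immediately before a sentiment word. Only the four negators and the example phrases come from the talk.

```python
import re

# Hypothetical default lexicon: word -> sentiment score.
DEFAULT_LEXICON = {"happy": 1, "wonderful": 1, "friendly": 1, "power": 1, "broken": -1}
# 1. Domain overrides: "power" is neutral for an electronics retailer.
DOMAIN_OVERRIDES = {"power": 0}
# 2. The four negators that invert sentiment.
NEGATORS = {"not", "no", "never", "hardly"}
# 3. Exception rules: (preceding words, target word or None for any positive word).
EXCEPTION_RULES = [
    (("would", "be"), None),
    (("supposed", "to", "be"), None),
    (("too",), "friendly"),
]
ARTICLES = {"a", "an", "the"}

def sentiment_score(text):
    """Sum word scores after applying overrides, exceptions and negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    lexicon = {**DEFAULT_LEXICON, **DOMAIN_OVERRIDES}
    total = 0
    for i, token in enumerate(tokens):
        score = lexicon.get(token, 0)
        if score == 0:
            continue
        # Context is the preceding words, ignoring articles.
        context = tuple(t for t in tokens[:i] if t not in ARTICLES)
        is_exception = score > 0 and any(
            context[-len(prefix):] == prefix and word in (None, token)
            for prefix, word in EXCEPTION_RULES
        )
        if is_exception or (context and context[-1] in NEGATORS):
            score = -score
        total += score
    return total

for sample in ["I am not happy", "The power went out", "The staff were too friendly"]:
    print(sample, "->", sentiment_score(sample))
```

A real system would handle scope, intensity and longer-range context; the point of the sketch is only that overrides, negation and exception rules are three separate layers applied on top of a default lexicon.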
You won't get sentiment processing perfect in the first pass; instead, you will need to iterate, auditing the results of your automated classification against manual classification of sample text fragments. Each audit will identify areas for further refinement. Wash, rinse, repeat!
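One simple way to run such an audit is to compare the automated label with the manual label for each sampled fragment and look at where they disagree. The sketch below assumes a hypothetical list of (text, manual label, automated label) tuples; the sample data is invented.

```python
from collections import Counter

def audit(samples):
    """Compare manual and automated labels.

    samples: list of (text, manual_label, automated_label) tuples.
    Returns (agreement rate, Counter of (manual, automated) mismatches,
    list of disagreeing samples).
    """
    disagreements = [s for s in samples if s[1] != s[2]]
    agreement = 1 - len(disagreements) / len(samples) if samples else 0.0
    mismatch_types = Counter((manual, auto) for _, manual, auto in disagreements)
    return agreement, mismatch_types, disagreements

# Invented audit sample for illustration.
SAMPLES = [
    ("not happy with the wait", "negative", "negative"),
    ("the power went out", "neutral", "positive"),
    ("staff were friendly", "positive", "positive"),
    ("works fine", "positive", "positive"),
]

agreement, mismatch_types, disagreements = audit(SAMPLES)
print(f"agreement: {agreement:.0%}")
print(mismatch_types.most_common())
```

The most common mismatch types point to the next lexicon change or exception rule to write, which is the "repeat" part of the loop.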
And there you have advanced sentiment analysis, as easy as 1, 2, 3.