Correlation: A Building Block to Survey Analysis
Posted by Jeffrey Henning on Mon, Aug 02, 2010
Correlation measures the strength of the relationship between two variables, revealing whether they are independent of one another or strongly “co-related”. Correlation is an important analytical building block, supporting key driver analysis, factor analysis, estimation of predictive validity and other techniques. Correlation is a powerful way to prioritize important areas for improvement.
Example Correlation Coefficients
A correlation coefficient is a number ranging from -1 to 1 that measures the strength of the relationship between two variables. For instance, customer loyalty studies often seek out correlations to likelihood to repurchase. The following chart shows some scatterplots of two variables with some examples of possible correlations.

I reviewed a number of recent studies for examples of correlations near each of these coefficients:
- 1 – The two variables increase in lockstep. In one study, customer satisfaction had a 0.95 correlation to likelihood to repurchase. (This matches Bob E. Hayes’ finding that both are good components of an Advocacy Loyalty Index.)
- 0.8 – The two variables increase almost in lockstep and are practically equivalent measures of the same underlying construct, as with likelihood to recommend and repurchase likelihood.
- 0.4 – As the scatterplot shows, the two variables are less in sync, but in many studies the highest correlations are in this range. Since many attributes make up product or service loyalty, it is not surprising that no single attribute has a much higher correlation than 0.4. In one study, price satisfaction had a 0.4 correlation coefficient to repurchase – many respondents were less satisfied with price than they were overall, but this dissatisfaction was mitigated by the service’s unique attributes (in other words, it was moderately expensive but with no direct competition).
- 0 – Few attributes are completely independent of loyalty. However, a Vovici customer recently asked me to check whether more loyal customers completed more of a long survey (the overall satisfaction and repurchase questions began the survey; q.v., Ray Poynter’s post Should we ask Overall Satisfaction at the Start or End of a Survey?). There was no correlation between likelihood to repurchase and the percent of the survey that was completed.
- -0.4 – Negative correlation coefficients indicate that as one variable increases, the other variable decreases. In my review of research results, I found few examples of this, and they were often counterintuitive and related to unique situations for different businesses. One more general finding was that likelihood to switch to a different provider decreased as loyalty increased; the two variables didn’t have a stronger negative correlation because some disloyal customers weren’t going to purchase similar services from any vendor.
You can calculate the correlation coefficient using spreadsheets [e.g., CORREL() in Excel and Google Docs], free online tools and survey software applications.
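For readers who would rather script the calculation, here is a minimal Python sketch of the Pearson correlation coefficient, which is the statistic CORREL() returns. The function name `pearson_r` and the ratings are made up for illustration; they are not taken from any of the studies mentioned above.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical 1-5 ratings from eight respondents (illustrative data only)
satisfaction = [5, 4, 4, 3, 2, 5, 1, 3]
repurchase = [5, 5, 4, 3, 2, 4, 1, 2]

r = pearson_r(satisfaction, repurchase)
print(round(r, 2))
```

The function raises an error when either variable is constant, since the coefficient is undefined if every respondent gave the same answer.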
Significance of Correlations
At a sample size as small as 100, correlations above 0.199 indicate a significant relationship 19 times out of 20 (the 95% confidence level). Smaller samples need stronger correlations to reach significance, so a common analytical mistake is to ignore the effect of subsample sizes when calculating correlations for filtered results. The above table can be used as a quick reference for whether a correlation is statistically significant.
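To see how the significance threshold moves with subsample size, the sketch below estimates the smallest correlation that would be significant for a given number of respondents. It relies on the Fisher z-transformation approximation, which is an assumption of this sketch rather than the method behind the figures quoted here; exact t-based tables differ slightly. The function name `critical_r` and the sample sizes are hypothetical.

```python
import math
from statistics import NormalDist

def critical_r(n, confidence=0.95):
    """Approximate smallest |r| that is statistically significant for sample size n.

    Assumes the Fisher z-transformation: atanh(r) is roughly normal
    with standard error 1 / sqrt(n - 3). Two-tailed test.
    """
    if n <= 3:
        raise ValueError("sample size must be greater than 3")
    # Two-tailed critical z value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.tanh(z / math.sqrt(n - 3))

# Illustrative subsample sizes, e.g. a full sample and filtered segments
for n in (25, 50, 100, 400):
    print(n, round(critical_r(n), 3))
```

For 100 respondents the estimate lands just under 0.2, and it climbs quickly as a filter cuts the subsample down, which is why a correlation that is significant for the full sample may not be significant within a segment.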
Correlation Does Not Equal Causation
A correlation may be statistically significant but meaningless. For instance, in one analysis of a transactional survey against actual subsequent loyalty behavior (q.v., The Value of Old Surveys), the variable with the highest correlation was whether or not respondents entered their phone number! In other words, respondents who requested a call back for an open service request were more likely to be loyal than those who didn’t. Could the organization get customers to give them their phone number to increase loyalty? Of course not – what seemed to be happening was that customers who were more loyal anyway were more likely to seek out a personal response rather than an email response (see the HBR blog post, Why Your Customers Don't Want To Talk to You).
Correlation is a simple but powerful technique for teasing new insights out of your survey data.