Meet the Proxies: Stand-Ins for the Data You Really Want
Posted by Jeffrey Henning on Tue, Dec 21, 2010
“Thanks so much for shopping at Walmart today. Before I ring up your order, can you tell me what your household income was last year?”
Sometimes it’s rude to ask for the data you actually want from customers. And sometimes they might not know the real data point: “Welcome to Walmart. What was your discretionary income last year (gross income minus taxes minus household necessities)?”
In such cases, you need to ask for proxy variables: data points that are highly correlated with the variable you are really interested in. “Thanks so much for shopping at Walmart today. Before I ring up your order, may I have your ZIP code?” (ZIP codes are 5-digit postal codes in the United States. A single ZIP code can cover anything from one address to thousands, and average household income differs widely from one ZIP code to another.)
Recording ZIP codes with transactions lets the researcher estimate customers’ average household income by matching each ZIP code against area-level income data. At retail, a ZIP code also usually identifies the city a customer lives in, which helps show the geographic reach of a store’s customers. If the customer paid by credit card, the retailer can even subscribe to a service that returns the customer’s address from a lookup on name and ZIP code.
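To make the ZIP code approach concrete, here is a minimal Python sketch, assuming you have a lookup table of median household income by ZIP code. The table values and the function name `estimate_customer_income` are hypothetical illustrations, not real census figures or any retailer’s actual method.

```python
from statistics import mean

# Hypothetical ZIP-level median household income figures (illustrative only,
# NOT real census values). In practice these would come from a published
# area-level data source.
ZIP_MEDIAN_INCOME = {
    "10001": 85000,
    "60601": 72000,
    "94105": 140000,
}


def estimate_customer_income(transaction_zips, zip_income=ZIP_MEDIAN_INCOME):
    """Estimate average household income of customers from recorded ZIP codes.

    Each transaction is assigned the median income of its ZIP code; ZIP codes
    missing from the lookup table are skipped. Returns (estimate, matched_count),
    or (None, 0) if no ZIP code could be matched.
    """
    matched = [zip_income[z] for z in transaction_zips if z in zip_income]
    if not matched:
        return None, 0
    return mean(matched), len(matched)


# Example: four transactions, one with a ZIP code not in the table.
zips = ["10001", "94105", "10001", "99999"]
estimate, n = estimate_customer_income(zips)
print(f"Estimated average household income: {estimate:,.0f} (from {n} transactions)")
```

Note that this averages area-level figures over transactions, so a frequent shopper counts once per visit, and it estimates the income of the neighborhoods customers come from rather than of the customers themselves.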
It is more acceptable to ask about household income in a self-administered survey than at a checkout counter. One survey studied the effect of discretionary income on technology adoption. Income and expenses were items that many respondents were reluctant to share or did not know (quick, tell me your total 2009 tax bill: federal, state and local!). Respondents did share their household income, their monthly mortgage or rent, and the number of people in their household. That was enough to support a simple model: taxes were estimated from income and mortgage, the cost of necessities was estimated from household size, and the two estimates were subtracted from income to yield each respondent’s discretionary income. Other studies, notably some from Pew Internet, have simply used household income itself as a proxy for discretionary income.
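The discretionary income model described above might look something like the following Python sketch. The post does not give the model’s formulas, so the flat tax rate, the housing deduction and the necessities schedule here are all hypothetical placeholders chosen for illustration; only the overall shape (income minus estimated taxes minus estimated necessities) comes from the survey described above.

```python
from dataclasses import dataclass

# All rates and amounts below are HYPOTHETICAL placeholders, not real tax
# rules or cost-of-living figures.
EFFECTIVE_TAX_RATE = 0.22           # assumed flat effective tax rate
HOUSING_DEDUCTION_SHARE = 0.5       # assumed share of annual housing cost that lowers taxable income
NECESSITIES_BASE = 12000            # assumed annual necessities for a one-person household
NECESSITIES_PER_EXTRA_PERSON = 6000 # assumed added cost per additional household member


@dataclass
class Respondent:
    household_income: float  # annual gross household income
    monthly_housing: float   # monthly mortgage or rent payment
    household_size: int      # number of people in the household


def estimate_taxes(r: Respondent) -> float:
    """Estimate annual taxes from income and housing cost (hypothetical model)."""
    deduction = HOUSING_DEDUCTION_SHARE * 12 * r.monthly_housing
    taxable = max(0.0, r.household_income - deduction)
    return EFFECTIVE_TAX_RATE * taxable


def estimate_necessities(r: Respondent) -> float:
    """Estimate annual cost of necessities from household size (hypothetical)."""
    extra_people = max(0, r.household_size - 1)
    return NECESSITIES_BASE + NECESSITIES_PER_EXTRA_PERSON * extra_people


def estimate_discretionary_income(r: Respondent) -> float:
    """Gross income minus estimated taxes minus estimated necessities."""
    return r.household_income - estimate_taxes(r) - estimate_necessities(r)


r = Respondent(household_income=80000, monthly_housing=1500, household_size=3)
print(f"Estimated discretionary income: {estimate_discretionary_income(r):,.0f}")
```

A real version would swap these constants for tax tables and cost-of-living figures that fit the respondents’ locations and the survey year.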
Moving from public opinion research to customer loyalty research, likelihood to recommend is often used as a proxy for a brand’s future word-of-mouth (WOM) marketing potential. “How likely are you to recommend us?” is an easier question to answer than “To how many people will you recommend us in the next six months?”
Asking proxy questions is often good questionnaire design, as it recognizes that respondents do not know everything about themselves or their households. A good proxy question is typically easier to answer, and yields more accurate results, than a direct question about the target variable.
Care must always be taken in choosing a proxy variable to make sure it has a close relationship to the variable of interest. And the analyst should make certain not to overgeneralize from the findings. Still, a good proxy can provide answers that would otherwise be unobtainable.
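One way to check that a proxy is closely related to the variable of interest is to collect both on a small validation sample and compute their correlation. The Python sketch below computes a Pearson correlation coefficient, which measures linear association only. The sample values are hypothetical, pairing likelihood to recommend (0 to 10) with the number of people each respondent later reported recommending the brand to.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items each")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when a sequence has zero variance")
    return cov / sqrt(vx * vy)


# Hypothetical validation sample (illustrative values only): each position is
# one respondent's proxy answer and the target value measured later.
proxy = [9, 10, 3, 7, 5, 8]   # likelihood to recommend, 0-10
target = [4, 5, 0, 2, 1, 3]   # people the respondent later recommended us to

r = pearson_r(proxy, target)
print(f"r = {r:.2f}")
```

A high correlation in one sample supports using the proxy for similar respondents, but it does not guarantee the relationship holds in other populations, which is exactly where overgeneralization creeps in.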