Modeling the Underreporting Bias in Panel Survey Data

Authors: Sha Yang (Leonard N. Stern School of Business, New York & School of Economics and Management, China), Yi Zhao (J. Mack Robinson College of Business, Georgia), Ravi Dhar (Yale School of Management, Connecticut)

Publication: Marketing Science

Year: 2010

Focus Area: Impact, Research method development

Relevance: Surveys are used extensively to estimate the prevalence and impact of financial fraud. Because the topic is sensitive, fraud surveys are particularly prone to underreporting bias, as respondents are reluctant to admit their own victimization. Estimating the rate of underreporting is essential to establishing fraud's actual social and economic impact.

Summary: Underreporting has been documented (Turner 1961, Waksberg and Neter 1965, Lee, Hu and Toh 2000) and modeled (Bailar 1975, Bollinger and David 1997) in earlier studies; this paper proposes a more sophisticated, model-based approach to quantifying and correcting it.

By jointly modeling reported behavior and partially observed actual behavior, the authors develop a Bayesian framework that estimates who underreports, when, and by how much.
The model corrects reported data for underreporting bias, estimates true behavioral incidence, and identifies demographic and psychological characteristics associated with greater underreporting (such as sex, age, income, and motivation).
For example, in the paper's application to beverage-drinking behavior (water and soft drinks), underreporting varied with respondents' psychological and demographic characteristics: women reported less completely than men, and respondents inclined to indulge reported less completely than those motivated by “looking good” and “health.”
By identifying the factors associated with underreporting, the model points to targeted interventions, such as incentives, that encourage respondents to participate fully and report accurately.
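At its simplest, correcting for underreporting means estimating what share of actual incidences each type of respondent reports, using the respondents whose actual behavior is observed, and then scaling reported totals up accordingly. The sketch below is a hypothetical ratio-estimator illustration of that idea, not the authors' model; the function names, the tuple-based data layout, and the group labels are all assumptions made for the example.

```python
from collections import defaultdict


def reporting_rates(validation):
    """Share of actual incidences that were reported, per respondent group.

    `validation` is an iterable of (group, reported_count, actual_count) tuples
    for respondents whose actual behavior was observed (hypothetical layout).
    """
    reported = defaultdict(int)
    actual = defaultdict(int)
    for group, rep, act in validation:
        reported[group] += rep
        actual[group] += act
    # Groups with no observed actual incidences carry no information and are skipped.
    return {g: reported[g] / actual[g] for g in actual if actual[g] > 0}


def corrected_incidence(reported_counts, rates):
    """Scale each group's reported total up by its estimated reporting rate.

    `reported_counts` is an iterable of (group, reported_count) pairs. Every
    group must have a positive rate in `rates`; a missing group raises KeyError.
    """
    totals = defaultdict(int)
    for group, rep in reported_counts:
        totals[group] += rep
    return {g: totals[g] / rates[g] for g in totals}


if __name__ == "__main__":
    # Illustrative numbers only: group A reports 14 of 20 validated incidences,
    # group B reports 18 of 20.
    rates = reporting_rates([("A", 6, 10), ("A", 8, 10), ("B", 9, 10), ("B", 9, 10)])
    print(rates)
    print(corrected_incidence([("A", 7), ("A", 7), ("B", 9)], rates))
```

In this toy data, group A's estimated reporting rate is 0.7, so its 14 reported incidences in the main sample correspond to about 20 actual ones, while group B's rate of 0.9 turns 9 reported incidences into about 10. The gap between the two rates is the "who underreports" signal; the paper instead estimates the reporting propensity within a Bayesian model that lets it vary across respondents and over time.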

While this model is intended for longitudinal survey panel reporting, it has promise for similarly rigorous computational analysis of one-time surveys (commonly used to identify the rate of fraud).

Author Abstract: Panel survey data have been gaining importance in marketing. However, one challenge of estimating econometric models based on panel survey data is how to account for underreporting; that is, respondents do not report behavioral incidences that actually occur. Underreporting is especially likely to occur in a panel survey because the data-recording mechanism is often tedious, complex, and effortful. The probability of underreporting is likely to vary across respondents and also over the duration of the survey period. In this paper, we propose a model to simultaneously study reported behavioral incidences and partially observed actual behavioral incidences. We propose a Bayesian approach for estimating the proposed model. We treat those unobserved actual behavioral incidences as latent variables, and the Gibbs sampler makes it convenient to impute the nonreported consumption incidences along with making inferences on other model parameters. Our proposed method has two advantages. First, it offers a model-based approach to remove the underreporting bias in panel survey data and therefore allows marketing researchers to make accurate inferences about consumers’ actual behavior. Second, the method also offers a natural way to study factors that influence respondents’ propensity to underreport. Because we treat those underreported behavioral incidences as nonmissing at random, this underreporting propensity varies across respondents and over time. This understanding can help marketing researchers design the right strategy to intervene and incentivize respondents to authentically report and hence improve the quality of survey data. The proposed model and estimation approach are tested on both synthetic data and actual panel survey data on consumer-reported beverage-drinking behavior. Our analysis suggests that underreporting can significantly mask respondents’ true behavior.
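The estimation strategy described in the abstract (treat unreported incidences as latent variables and impute them inside a Gibbs sampler) can be illustrated on a deliberately simplified model. This sketch is not the authors' specification. It assumes hypothetical Poisson-distributed actual counts, a single reporting probability p shared by all respondents (the paper lets the propensity vary across respondents and over time), a validation subset in which actual counts are observed, and illustrative Gamma and Beta priors. The function name `gibbs_underreporting` is likewise hypothetical.

```python
import numpy as np


def gibbs_underreporting(reported, actual, n_iter=2000, burn_in=500,
                         a=1.0, b=0.1, c=1.0, d=1.0, seed=0):
    """Gibbs sampler for a deliberately simplified underreporting model.

    Hypothetical model (an illustration, not the paper's specification):
      actual incidences   y_i ~ Poisson(lam)
      reported incidences r_i | y_i ~ Binomial(y_i, p)
    `actual` holds y_i where actual behavior is observed and -1 where it is not.
    Priors: lam ~ Gamma(shape=a, rate=b), p ~ Beta(c, d).
    Returns an array of post-burn-in (lam, p) draws.
    """
    rng = np.random.default_rng(seed)
    reported = np.asarray(reported, dtype=np.int64)
    actual = np.asarray(actual, dtype=np.int64)
    missing = actual < 0                  # respondents whose actual count is latent
    n = reported.size

    lam, p = reported.mean() + 1.0, 0.5   # crude starting values
    y = np.where(missing, reported, actual)
    draws = []
    for it in range(n_iter):
        # 1. Impute the nonreported incidences as latent variables. By Poisson
        #    thinning, (y_i - r_i) | r_i ~ Poisson(lam * (1 - p)).
        y[missing] = reported[missing] + rng.poisson(lam * (1.0 - p),
                                                     size=int(missing.sum()))
        # 2. Conjugate Gamma update for the true incidence rate
        #    (numpy parameterizes the Gamma by scale = 1 / rate).
        lam = rng.gamma(a + y.sum(), 1.0 / (b + n))
        # 3. Conjugate Beta update for the probability that an incidence is reported.
        p = rng.beta(c + reported.sum(), d + (y - reported).sum())
        if it >= burn_in:
            draws.append((lam, p))
    return np.array(draws)


if __name__ == "__main__":
    # Synthetic data with assumed values: true rate 5, reporting probability 0.6,
    # and actual behavior observed for roughly 20% of respondents.
    rng = np.random.default_rng(42)
    true_y = rng.poisson(5.0, size=2000)
    rep = rng.binomial(true_y, 0.6)
    act = np.where(rng.random(2000) < 0.2, true_y, -1)
    lam_hat, p_hat = gibbs_underreporting(rep, act).mean(axis=0)
    print(f"posterior mean rate {lam_hat:.2f}, reporting probability {p_hat:.2f}")
```

The partially observed actual counts are what make p identifiable here: reported counts alone pin down only the product lam * p. Letting p depend on respondent covariates or survey week, as the paper's model does, would replace the conjugate Beta step with a non-conjugate update (for example, Metropolis-Hastings on logistic-regression coefficients); that extension is omitted from this sketch.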
