Six Pitfalls of Predictive Modeling
8.1.2011 | Jesse Roberts
First, Facebook fans manage to get Betty White on SNL. Now, the latest “got milk?” ads have spawned “Got Discussion?”, a social-network campaign from the ever-clever milk people who’ve touched off a little brouhaha by touting the benefits of milk for the many long-suffering victims of PMS… men. Got discussion as to what segment of the population is not amused? Got awareness of how much press California’s milk board has received?
So whether negative or positive, for sheer entertainment or big-budget advertising, the power of social media is indisputable. And in the advertising world, we’re all just a little, or a lot, enraptured. I’ll admit to being as excited about the power of social as the next marketer. But I’m a database marketer at heart, and still “enraptured” by proven DM basics: digging the lead-generation trenches, and crawling the miles to get to them; launching CRM plans to ferry those leads into the best possible customer experiences (always readying measures should anything fail along the way); developing Retention efforts to ensure ongoing customer loyalty and engagement, and Winback campaigns to ensure we recoup the right lost clients.
The sophisticated analytic toolbox we use to do these things includes predictive modeling, the strongest record selection tool: a statistical rank-ordering of a prospect or customer universe by likelihood to behave in a certain way. Who’s most likely to respond to a direct mail campaign (in which case we’d build a Responder model)? Who’s most likely to convert? To own a cat? Love automobiles, or Betty White? Got milk?
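For readers who like to see the mechanics, here is a minimal Python sketch of that rank-ordering step. It assumes a model has already produced a score for each record; the function name `assign_deciles`, the record IDs, and the scores are all hypothetical and for illustration only.

```python
def assign_deciles(scores):
    """Rank records by model score (highest first) and label them 1-10.

    `scores` maps a record ID to a hypothetical model score.
    Decile 1 holds the records most likely to behave as predicted.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    # The record at 0-based rank i falls in decile floor(10 * i / n) + 1.
    return {rec_id: (10 * i) // n + 1 for i, rec_id in enumerate(ranked)}


# Made-up scores for 20 records; not real model output.
example_scores = {f"rec{k:02d}": k / 20 for k in range(20)}
deciles = assign_deciles(example_scores)
```

In practice the scores would come from the model itself, and the decile labels become the basis for deciding how deep into the file to mail.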
When built and used correctly—with the appropriate guidance and thought leadership—predictive models can be the most impactful component of your marketing arsenal. Don’t underestimate the potential. And don’t underestimate the negative effects of these six common modeling pitfalls:
- Asking the wrong question. One client wanted to build a DM Responder model, with the intention of suppressing the “worst records.” We asked the client to reconsider and develop a Suppression model instead. After all, the variables that correlate with being a responder may be very different from the variables that correlate with non-response.
- Building a model on too small a population. As a rule of thumb, you need at least 2,500 to 5,000 records to ensure a quality model. Developing a model on a smaller base weakens it, as a smaller population may not adequately represent the overall universe. It’s always best to check for statistical significance first.
- Building a model on the wrong population. The data that goes into the modeling process must be reliable and must reflect the behavior you want to predict. If not, it can skew or bias the model’s true performance. For example, developing a model to identify at-risk churn candidates is quite a different thing than developing a model to identify loyal, high-ROI candidates, and each calls for its own modeling population.
- Shortcutting the process. A good model takes 4–6 weeks to build. If extremely intricate, it may take even longer, 6–8 weeks. Be sure to allow sufficient time for the model to be developed, tested and deployed effectively. A number of vendors promise turnkey software solutions. While these can be a good option for a client with too small a base population, they’re typically not a good choice otherwise.
- Failing to leverage the appropriate diagnostics to ensure a model is not just explaining the past, but is predictive of the future. All models are built upon past behavior, but remember that prior marketing and promotional efforts can skew your results. For example, if prior campaigns only targeted females ages 45–65, chances are that will pop as a predictive variable in the model algorithm. Modeling is often hampered by this kind of self-fulfilling prophecy bias. To avoid it, modelers may choose to incorporate bias reduction measures to normalize the modeling population. Alternatively, if time allows for it, mailing a random sample first can yield a better base population for model development.
- Diving too deep. Most models “break even” around the fifth or sixth decile. This is usually when a regression to the mean kicks in and undesirable records begin to enter the marketable universe. Each decile is weaker in its predictability than its predecessor, with broader ranges of error. Diving too deep can rapidly yield an unproductive campaign. The results from the model build should provide insight regarding depth selection, and several key performance indicators can be leveraged to identify appropriate cutoff points.
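To make the depth-selection point concrete, here is a rough Python sketch of one way to find a cutoff from decile results. It assumes a simple break-even rule (cost per piece divided by margin per response); the function name `breakeven_depth` and every figure in it are hypothetical, and a real campaign would weigh additional key performance indicators.

```python
def breakeven_depth(decile_stats, cost_per_piece, margin_per_response):
    """Return the deepest decile worth mailing, or 0 if none breaks even.

    `decile_stats` is a list of (mailed, responders) tuples, best decile first.
    """
    # Minimum response rate at which a mailed piece pays for itself.
    breakeven_rate = cost_per_piece / margin_per_response
    depth = 0
    for decile, (mailed, responders) in enumerate(decile_stats, start=1):
        if responders / mailed < breakeven_rate:
            break  # this decile costs more than it returns
        depth = decile
    return depth


# Illustrative numbers only: 1,000 pieces mailed per decile.
responders_by_decile = [60, 45, 35, 28, 22, 18, 14, 11, 9, 7]
stats = [(1000, r) for r in responders_by_decile]
depth = breakeven_depth(stats, cost_per_piece=0.50, margin_per_response=25.0)
```

With these made-up figures the break-even response rate is 2%, so the sketch stops at the fifth decile; mailing deeper would add records that cost more than they return.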
When used correctly, predictive analytics provide a great opportunity for a brand to gain a complete view of customer attitudes and preferences about its products.


