With all the advances in Big Data, shouldn’t it be easy to predict the future? The algorithms and computer processing power already exist to analyze petabytes of data every second, so it would seem logical that we could use that power to predict outcomes — election results, the stock market, economic trends, or simply who will win the Super Bowl. The problem is that our technical capacity to analyze data is growing faster than our social capability to understand it.
Armed with mountains of data, most experts put in charge of predicting outcomes still can’t come up with accurate answers. Why? Because most people have a human bias to manipulate data to deliver the outcomes they really want to see.
Philip Tetlock, a psychologist at the University of California at Berkeley, published a book in 2005 titled “Expert Political Judgment — How Good Is It?” For the book, a team of researchers interviewed 284 political experts and extracted 80,000 predictions from them over 15 years, rating the probabilities that each event would occur (they gave each prediction three possible outcomes). The result was surprising: the experts’ predictions were worse than those made simply by assigning equal probabilities to all three outcomes. Dart-throwing monkeys could have performed just as well. It’s easy to make fun of talking heads and political pundits, but what about experts in other fields?
Paul Meehl, another psychology professor and a past president of the American Psychological Association, conducted similar research in other fields. In one study, he looked at whether a simple algorithm could predict freshman grades at the end of a school year better than expert college counselors could. The algorithm was given only SAT scores and grades. The expert counselors, on the other hand, were given the same data plus a 45-minute interview and a personal essay. Again, the simple algorithm won. In yet another example of experts versus algorithms, Paul’s work pitted an algorithm predicting the future price of a bottle of wine against connoisseurs. In this case, the wine-predicting algorithm had access to only three weather variables, while the connoisseurs had all of the data and the ability to taste the wine. By now it should be obvious: the algorithm beat the connoisseurs at predicting the wine’s price. Paul’s analysis didn’t stop there. It extended to medical experts, flight trainers, parole officers, bankers, and many others. In all these cases, the algorithms won. So where did all these experts go wrong? There were several reasons — but we will focus on just one here.
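Meehl-style "simple algorithms" are often nothing more than a weighted sum of a few inputs. Here is a minimal sketch of what such a model looks like for the wine example; the weights and the baseline below are hypothetical, chosen purely for illustration and not taken from the actual study:

```python
# A Meehl-style "simple algorithm": a linear combination of three
# weather variables. All coefficients here are hypothetical and
# illustrative only -- not the ones from the original wine research.

def predict_wine_price(winter_rainfall_mm, growing_season_temp_c, harvest_rainfall_mm):
    """Predict a relative wine price index from three weather variables."""
    base = 50.0  # baseline price index (hypothetical)
    price = (base
             + 0.02 * winter_rainfall_mm                 # more winter rain -> higher price
             + 6.0 * (growing_season_temp_c - 16.0)      # warm growing season -> higher price
             - 0.04 * harvest_rainfall_mm)               # rain at harvest -> lower price
    return round(price, 2)

# A warm, dry vintage scores higher than a cool, wet one.
print(predict_wine_price(600, 17.5, 50))   # -> 69.0
print(predict_wine_price(400, 15.5, 200))  # -> 47.0
```

The point is not the particular coefficients but the form: a fixed, transparent rule applied consistently, with no room for the expert's story to override the inputs.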
The biggest bias experts have is they tend to value perceived causes or stories over the underlying data. Storytelling is a powerful way to inject order into chaos, but reality is often far more complex than a simplified story. Whenever data supports their stories, they promote the data, but when data contradicts their stories, they question or ignore the data. Experts become really good at “finding the data” to promote their case.
What does all this have to do with marketers? A marketer’s job is to tell a story. Any expert will have a tendency to manipulate data to make it conform to his or her story, and marketers are susceptible to the same sins — except they’re even more prone to storytelling. There is also a terrible feedback loop at work here. There may be good marketers who don’t want to torture data to fit a clean story, but those marketers can’t make the bold claims that end up as headlines. So in short, marketers are some of the worst “experts” at predicting specific outcomes, because their job biases them toward storytelling and rewards them for making predictions that end up in the news.
To better understand why marketers tend to ignore data, let’s look at one possible scenario.
Imagine a Father’s Day ad that’s unfortunately misplaced next to a news article with the headline “One in Four Women Victims of Domestic Violence.” As a marketer, you are presented with the following two formulations about the poorly placed ad and you want to find out which publisher you should blame:
Problem Description A
85 percent of the impressions in the Father’s Day ad campaign are served in publisher A, and 15 percent are served in publisher B. Ad verification software says the ad was found in a bad context on publisher B, and the verification software is right 80 percent of the time.
When asked, most marketers choose publisher B as the most likely cause of the problem. But let’s try an alternate formulation of the problem:
Problem Description B
Publisher A and publisher B served the same number of ad impressions.
We learn that 85 percent of ads served in publisher A are served into inappropriate content. Ad verification software says the ad was found in a bad context on publisher B. The verification software is right 80 percent of the time.
Now which publisher do you think caused the problem? Most marketers choose publisher A as the culprit.
It turns out both formulations of the problem yield the same probabilistic answer — a roughly 59 percent probability that publisher A caused the problem, according to Bayes’ rule in odds form. Most marketers get the first version wrong and the second version right, even though it’s the same math problem! Why? In the first story, it’s not obvious how to use the fact that 85 percent of impressions are served in publisher A. In the second story, you have a cause: publisher A is, for all practical purposes, almost always inappropriate. Marketers prefer stories that point to an understandable cause over statistics. Marketers will home in on a story and ignore the rest of the data, no matter how big the data set.
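Bayes’ rule in odds form makes the calculation mechanical: multiply the prior odds (the 85:15 split) by the likelihood ratio of the verification software’s report. A minimal Python sketch for Problem Description A:

```python
def posterior_prob_a(prior_a, prior_b, accuracy):
    """Probability that publisher A caused the problem, given that the
    verification software reported publisher B.

    prior_a, prior_b: prior probabilities the bad ad ran on A or B
                      (the impression share in Problem Description A).
    accuracy: probability the verification software is correct.
    """
    # Likelihood of the report "it was B" under each hypothesis:
    #   if A is truly at fault, the software must be wrong: (1 - accuracy)
    #   if B is truly at fault, the software is right: accuracy
    weight_a = prior_a * (1 - accuracy)  # posterior odds numerator for A
    weight_b = prior_b * accuracy        # posterior odds numerator for B
    return weight_a / (weight_a + weight_b)

# Problem A: 85/15 impression split, software 80% accurate, flags B.
p = posterior_prob_a(0.85, 0.15, 0.80)
print(f"{p:.0%}")  # -> 59%
```

Reading the second formulation’s 85 percent as the base rate that a bad placement comes from publisher A, the same function takes the same inputs — which is exactly why both versions of the problem have the same answer.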
Think of the many problems marketers have that involve some form of prediction. Who is the target audience for this product or service? Why do they need, like, or want the product or service? Where can we find this audience? What creative will succeed with this audience? We even have a whole industry built around the RFP (request for proposal), which itself is a form of a story. The RFP mechanism guides billions of dollars of ad spending, and ad allocation decisions are forms of predictions.
So can marketers learn to overcome their biases and use data to make more accurate predictions? It may well be possible.
In his book “The Signal and the Noise,” Nate Silver tells a great story about how PECOTA, the system he built to predict player performance in baseball, was beaten by expert scouts. In 2006, Nate’s program PECOTA produced a list of 100 prospects from the minor leagues. He then pitted that list against the Baseball America Top 100 prospects list, which was constructed by scouts. After six years of actual performance, the scouts’ list had beaten the PECOTA list by a good margin, despite the scouts’ fear that their jobs would disappear because of PECOTA. It turns out the scouts had learned to tone down their own biases and instead “listen” to the data. They learned they could combine their expertise with the reams of public data available.
Marketers can learn to separate the signal from the noise. Marketers are dedicated storytellers who believe their own stories, downplay luck, and impose cause on randomness. In the world of Big Data, marketers have to learn to let the data speak for itself. Otherwise, all the big data in the world won’t help marketers understand their customers better.
Omar is the co-founder and CEO of BlueKai, the data activation system that supplies both Fortune 100 companies and leading publishers with solutions for managing and activating first and third party data for creating highly effective customer and marketing campaigns. Omar’s previous roles include Chief Advertising Officer for mobile search and advertising solution Medio and Chief Marketing Officer for early behavioral data leader Revenue Science.