Why we do what we do: When the Sum is Less than the Parts – Simpson’s Paradox

Some of the greatest mistakes people make come from having complete faith in numbers, or in their own ability to use numbers to reach a desired result. Human cognition is already full of biases and logical fallacies, but sometimes the real world adds factors that make it even harder to act in a meaningful and positive way. One of the more interesting phenomena in the world of data is the statistical bias known as “Simpson’s Paradox”. Simpson’s Paradox is a great reminder that looking at data from only one angle can lead you to a very wrong conclusion. Even worse, it can let you claim success for actions that are actually harmful in the context of the real world.

Simpson’s Paradox is a pretty straightforward bias: a correlation appears in each of two groups individually, but when the groups are combined, the data show the exact opposite effect.

Here is a real-world example:

We ran an analysis showing that one variation on the site produces a distinct winner for both our organic and our paid traffic.

But when we combine the two, the exact inverse pattern plays out. Version A won by a large margin for both organic and paid traffic, yet combined it dramatically underperforms B.

This seems counterintuitive, but it plays out in many real-world situations. You may also find the inverse pattern: no difference within the individual groups, but a meaningful difference once they are combined.
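To make both patterns concrete, here is a minimal Python sketch using hypothetical visitor and conversion counts, invented for illustration and not the data from the analysis above. In the first scenario version A converts better within both organic and paid traffic yet loses overall; in the second the versions tie within each segment but differ once combined. In both, the combined result is driven by how each version’s traffic is split between a high-converting and a low-converting segment.

```python
# Hypothetical counts for illustration only: segment -> version -> (conversions, visitors).
FLIPPED = {
    "organic": {"A": (180, 1000), "B": (1360, 8000)},
    "paid": {"A": (360, 9000), "B": (60, 2000)},
}
# Inverse pattern: identical rates within each segment, different traffic mix.
HIDDEN = {
    "organic": {"A": (100, 1000), "B": (800, 8000)},
    "paid": {"A": (180, 9000), "B": (40, 2000)},
}


def segment_rates(data):
    """Conversion rate per version within each segment."""
    return {
        segment: {version: conv / visits for version, (conv, visits) in versions.items()}
        for segment, versions in data.items()
    }


def combined_rates(data):
    """Conversion rate per version after pooling all segments together."""
    totals = {}
    for versions in data.values():
        for version, (conv, visits) in versions.items():
            total_conv, total_visits = totals.get(version, (0, 0))
            totals[version] = (total_conv + conv, total_visits + visits)
    return {version: conv / visits for version, (conv, visits) in totals.items()}


def report(name, data):
    """Print per-segment and combined conversion rates for one scenario."""
    print(name)
    for segment, rates in segment_rates(data).items():
        print(f"  {segment:<9}", {v: f"{r:.1%}" for v, r in rates.items()})
    print(f"  {'combined':<9}", {v: f"{r:.1%}" for v, r in combined_rates(data).items()})


if __name__ == "__main__":
    report("A wins every segment, loses combined:", FLIPPED)
    report("No segment difference, large combined difference:", HIDDEN)
```

In the first scenario A beats B 18.0% to 17.0% on organic and 4.0% to 3.0% on paid, yet loses 5.4% to 14.2% combined. The reversal comes entirely from the traffic mix: 9,000 of A’s 10,000 visitors are paid (the low-converting segment), while 8,000 of B’s are organic. In a properly randomized test both versions would receive the same mix, which is one reason random serving to everyone matters.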

In both cases, the segment-level view tempts us toward a conclusion about A versus B, and it is not until we add the larger context that we understand the true value.

This may look like a trick of numbers, but it shows up far more often than you might expect, especially as teams dive into segmentation and personalization. The more eagerly people leap straight into personalization, the more they leave themselves open to biases like Simpson’s Paradox. They get so excited when they are able to create a targeted message, and so desperate to show its value and prove their “mettle”, that they don’t take the time to evaluate things on a holistic scale. They don’t compare it with other segments or account for the cost of maintaining the system. They are so excited by their ability to present “relevant” content to a group they think needs it that they fail to measure whether it adds value or whether it is the best option. Worst of all, they then go around telling the world about their great finding while causing massive harm to the site as a whole.

One of the key rules to understand is that the further you dive down to find something “useful”, whether from analytics or from causal feedback after the fact, the more likely this is to happen. With creative enough “discovery”, you can use numbers to reach any conclusion. If you keep diving and keep parsing, you rapidly increase the chance that you will arrive at a false or misleading conclusion. Deciding how you are going to use data after the fact will always lead to biased results. It is easy to prove a point when you forget the context of the information or lose the discipline of using it to find the best answer.
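One way to see why endless slicing is dangerous is the closely related multiple-comparisons problem, a different pitfall from Simpson’s Paradox used here as an illustration. The sketch below assumes each segment you inspect is an independent test run at a 5% significance level with no real effect present; under that assumption the chance of at least one spurious “winner” across k segments is 1 - (1 - alpha)^k. Real segments overlap, so treat the numbers as a rough guide rather than an exact rate.

```python
import random


def chance_of_false_winner(num_segments, alpha=0.05):
    """Probability that at least one of `num_segments` independent tests
    shows a 'significant' result when no real effect exists."""
    return 1 - (1 - alpha) ** num_segments


def simulate(num_segments, alpha=0.05, trials=20_000, seed=42):
    """Monte Carlo check: under the null hypothesis a p-value is uniform on
    [0, 1], so draw one p-value per segment and count how often any falls below alpha."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(num_segments))
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 10, 20, 50):
        print(f"{k:>2} segments: formula {chance_of_false_winner(k):.1%}, "
              f"simulated {simulate(k):.1%}")
```

Under these assumptions, checking 10 segments gives roughly a 40% chance, and 20 segments roughly 64%, of finding at least one “winner” that is pure noise.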

So how do you combat this? The fundamental way is to make sure that you evaluate everything at the highest level, against a single common denominator. Here is a simple process if you are not sure how to proceed:

1) Decide what data you are going to use, and how, BEFORE you act.

2) Test the content. Serve it randomly to all groups: even if you designed the content specifically for one group, test it on everyone. If you are right, the data will tell you.

3) Measure the impact of every type of interaction against the same common denominator. Convert everything to the same fiscal scale and use that to evaluate alternatives against each other. Converting to one scale ensures that you know the actual value of the change, not just its impact on specific segments.

4) Further adjust your scale to account for the maintenance cost of serving to that group. If targeting a group takes a whole new data system, two APIs, cookie interaction and IT support, then you need a massively higher return from it than from a group you can set up in a few seconds.
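The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not any specific tool’s API: the hypothetical `assign_variant` randomizes every visitor by hashing their ID, regardless of segment (step 2); each option’s result is expressed in revenue, our assumed common fiscal scale (step 3); and a hypothetical `maintenance_cost` for the test period is subtracted before comparing net revenue per visitor (step 4). Fixing that decision metric in code before any data arrives is one way to honor step 1. Variant names and figures are invented for illustration.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical variant names; the targeted content is shown to everyone at random.
VARIANTS = ("control", "targeted_content")


def assign_variant(visitor_id: str, experiment: str) -> str:
    """Step 2: randomly (but repeatably) bucket every visitor, whatever their segment."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]


@dataclass
class OptionResult:
    variant: str
    visitors: int
    revenue: float           # Step 3: every outcome converted to one fiscal scale.
    maintenance_cost: float  # Step 4: assumed cost to build and run this option for the period.


def net_value_per_visitor(result: OptionResult) -> float:
    """Decision metric fixed before the test (step 1): net revenue per visitor."""
    return (result.revenue - result.maintenance_cost) / result.visitors


def best_option(results):
    """Pick the option with the highest net value per visitor."""
    return max(results, key=net_value_per_visitor)


if __name__ == "__main__":
    results = [
        OptionResult("control", visitors=10_000, revenue=50_000.0, maintenance_cost=0.0),
        OptionResult("targeted_content", visitors=10_000, revenue=56_000.0,
                     maintenance_cost=8_000.0),
    ]
    for r in results:
        print(f"{r.variant}: {net_value_per_visitor(r):.2f} per visitor")
    print("best:", best_option(results).variant)
```

In this invented example the targeted content brings in more raw revenue (56,000 vs 50,000), but once its upkeep is charged against it the control wins, 5.00 vs 4.80 per visitor. Hashing the visitor ID, rather than drawing a fresh random number on each visit, keeps each visitor in the same bucket across sessions, a common design choice for A/B assignment.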

What you will discover as you go down this path is that you are often wrong, sometimes dramatically so, about the value of targeting a preconceived group. You will find not only that many of the groups you think are valuable are not, but also that many groups you would not normally consider turn out to be highly valuable (especially in terms of efficiency). If you do this with discipline over time, you will also learn completely new ways to optimize your site: the types of changes that matter, the groups that are actually exploitable, the cost of infrastructure, and the best ways to move forward with real, unbiased data.

As always, it is the moments when you prove yourself wrong that produce dramatic results. Trying only to prove yourself right does nothing but make you look good.

I always differentiate a dynamic user experience from a “targeted experience”. In the first, you follow a process, feed a system without dictating the outcome, measure the possible outcomes, and choose the most efficient option. In the second, you decide that something is good based on conjecture, biases, and internal politics, serve it to that group, and then justify that action. Simpson’s Paradox is just one of many ways you can go wrong, so I challenge you to evaluate what you are doing. Is it valuable, or are you just claiming it is? Are you looking at the whole picture, or only at the parts that support what you are doing? Are you really improving things, or just talking about how great you are at improving things?