Why we do what we do: When the Sum is Less than the Parts – Simpson’s Paradox
Some of the greatest mistakes people make come from having complete faith in numbers, or in their own ability to use numbers to get a desired result. While there are already a great many biases and logical fallacies baked into human cognition, sometimes factors in the real world conspire to make it even more difficult to act in a meaningful and positive way. One of the more interesting phenomena in the world of data is the statistical bias known as “Simpson’s Paradox”. Simpson’s Paradox is a great reminder that a single look at data can lead to a very wrong conclusion. Even worse, it can allow claims of success for actions that are actually negative in the context of the real world.
Simpson’s Paradox is a fairly straightforward bias: a correlation that is present in two different groups individually produces the exact opposite effect when the groups are combined.
Here is a real world example:
We have run an analysis and shown that a variation on the site produces a distinct winner for both our organic and our paid traffic.
But when we combine the two, the exact inverse pattern plays out. Version A won by a large margin for both organic and paid traffic, but combined it dramatically underperforms B.
This seems counterintuitive, but it plays out in many real-world situations. You may also find the inverse pattern, where you see no difference in the distinct groups, but a meaningful difference when they are combined.
In both cases, logically we would want to presume that A was better than B, but it is not until we add the larger context that we understand the true value.
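To make the reversal concrete, here is a minimal sketch with invented numbers (not the data from the example above): version A wins within each traffic segment, yet loses once the segments are pooled, because the two versions saw very different mixes of organic and paid traffic.

```python
# Hypothetical numbers illustrating Simpson's Paradox: A beats B in every
# segment, but the traffic mix differs so much between versions that B wins
# once everything is combined.

segments = {
    # segment: {version: (conversions, visitors)}
    "organic": {"A": (50, 1000), "B": (4, 100)},
    "paid":    {"A": (10, 100),  "B": (90, 1000)},
}

def rate(conversions, visitors):
    return conversions / visitors

for name, data in segments.items():
    a, b = rate(*data["A"]), rate(*data["B"])
    print(f"{name}: A={a:.1%} B={b:.1%} -> winner: {'A' if a > b else 'B'}")

# Pool the segments and the ranking flips.
total = {v: tuple(map(sum, zip(*(segments[s][v] for s in segments))))
         for v in ("A", "B")}
a, b = rate(*total["A"]), rate(*total["B"])
print(f"combined: A={a:.1%} B={b:.1%} -> winner: {'A' if a > b else 'B'}")
```

The mix of traffic, not the quality of the variation, drives the combined result, which is why the pooled view has to be part of any evaluation.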
While this is a trick of numbers, it presents itself far more often than you might expect, especially as groups dive into segmentation and personalization. The more people leap directly into personalization with vigor, the more they leave themselves open to biases like Simpson’s Paradox. They get so excited when they are able to create a targeted message, and are so desperate to show its value and prove their “mettle”, that they don’t take the time to evaluate things on a holistic scale. Even worse, they don’t compare it with other segments or account for the cost of maintaining the system. They are so excited by their ability to present “relevant” content to a group they think needs it that they fail to measure whether it adds value or is the best option. Worse still, they then go around telling the world about their great finding, all while causing massive harm to the site as a whole.
One of the key rules to understand is that the deeper you dive to find something “useful”, either from analytics or from causal feedback after the fact, the more likely this is to play out. You can use numbers to reach any conclusion with creative enough “discovery”. If you keep diving, if you keep parsing, you are exponentially increasing the chances that you will arrive at a false or misleading conclusion. Deciding how you are going to use data after the fact will always lead to biased results. It is easy to prove a point when you forget the context of the information or lose the discipline of using it to find the best answer.
So how do you combat this? The fundamental way is to make sure that you are taking everything to the highest common denominator. Here is a really easy process if you are not sure how to proceed:
1) Decide what and how you are going to use your data BEFORE you act.
2) Test out the content – Serve it randomly to all groups, even if you design the content specifically for one group, test to everyone. If you are right, the data will tell you.
3) Measure the impact of every type of interaction against the same common denominator. Convert everything to the same fiscal scale, and use that to evaluate alternatives against each other. Converting to the same scale allows you to ensure that you know the actual value of the change, not just the impact on specific segments.
4) Further modify your scale to account for the maintenance cost of serving to that group. If it takes a whole new data system, two APIs, cookie interaction, and IT support to target a group, then you have to get a massively higher return than from a group you can target in a few seconds.
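Steps 3 and 4 can be sketched as a simple comparison. Everything below, the segment names, lift numbers, and cost figures, is hypothetical; the point is only to show how netting maintenance cost out of a common fiscal scale can reorder which segments look worthwhile.

```python
# Hypothetical sketch: put every segment's result on the same fiscal scale,
# then subtract what it costs to keep the targeting running.

from dataclasses import dataclass

@dataclass
class SegmentResult:
    name: str
    annual_visitors: int
    lift_per_visitor: float    # incremental revenue per visitor from the change
    annual_maintenance: float  # data systems, APIs, IT support for targeting

    def net_annual_value(self) -> float:
        return self.annual_visitors * self.lift_per_visitor - self.annual_maintenance

results = [
    SegmentResult("returning visitors", 200_000, 0.15, 80_000),  # heavy infrastructure
    SegmentResult("search traffic",     500_000, 0.04, 1_000),   # trivial to target
]

# The "impressive" per-visitor lift loses once maintenance cost is netted out.
for r in sorted(results, key=SegmentResult.net_annual_value, reverse=True):
    print(f"{r.name}: ${r.net_annual_value():,.0f}/year")
```

Here the segment with the bigger per-visitor lift is actually net negative, while the cheap-to-serve segment quietly wins, which is exactly the comparison step 4 is asking for.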
What you will discover as you go down this path is that you are often wrong, in some cases dramatically so, about the value of targeting a preconceived group. You will find not only that many of the groups you think are valuable are not, but also that many groups you would never have considered turn out to be highly valuable (especially in terms of efficiency). If you do this with discipline over time, you will also learn completely new ways to optimize your site: the types of changes, the groups that are actually exploitable, the cost of infrastructure, and the best ways to move forward with real, unbiased data.
As always, it is the moments where you prove yourself wrong that you will get dramatic results. Just trying to prove yourself right does nothing but give you the right to make yourself look good.
I always differentiate a dynamic user experience from a “targeted experience”. In the first case, you are following a process, feeding a system, not dictating the outcome, and then measuring the possible outcomes and choosing the most efficient option. In the second, you are deciding that something is good based on conjecture, biases, and internal politics, serving to that group, and then justifying that action. Simpson’s Paradox is just one of many ways that you can go wrong, so I challenge you to evaluate what you are doing. Is it valuable, or are you just claiming it is? Are you looking at the whole picture, or only the parts that support what you are doing? Are you really improving things, or just talking about how great you are at improving things?
Minority Report – How to avoid failure for personalization
One of the largest pushes in our industry over the last year has been to create a massive personalization scheme on your website. Apparently people missed the point of Minority Report, because the movie mocked this behavior and its assumptions about what people are going to do. Time and time again, I hear about clients who are sold on “personalization” and have built out massive 35- and 48-point schemas for where and what they are going to target with personalized content, only to be shocked when I talk about how inefficient that is. Without fail, the clients who get the worst return on their optimization efforts are the ones that push full steam down this boondoggle without applying discipline or feedback to their efforts. Dynamic experiences built to meet user needs are an amazingly effective tool, but in order to get that value, you must first tackle your own assumptions.
You are making three massive assumptions when you just run down this road:
1) Assuming you know WHO matters most
2) Assuming you know WHAT type of changes matter to them
3) Assuming you know WHERE to make those changes
I would add a 4th one:
4) Assuming that the full cost of personalizing your website is much less than it really will be.
Who are you targeting?
Often groups want to push out targeting based on a site behavior or behaviors to try and create a more “dynamic personalized experience based on what the customer has already declared their intent is”. Think of things like changing the main banner based on the number of visits, or an engagement scoring system. Groups will sit around in giant meetings trying to come up with the perfect scheme, which is counterproductive to getting the very value those meetings are hoping to achieve. Simply deciding what you are going to do misses the point that the same person can be looked at 100 different ways if you so choose. That same person came from somewhere, looked for something, used some sort of browser, on some day of the week, at some time, with some sort of prior history… You can choose to target ANY of those things, which means that you need to figure out a way to measure the value of each one AGAINST each other.
This gets us to what I usually refer to as an “exploitable segment”: one that has shown, through CAUSAL data, that a change in the user experience creates a change in user outcomes. This is one of the many differences between the disciplines of analytics and optimization: in the world of correlative data, you are looking for groups that have different behavior. In optimization, we are looking for groups who CHANGE their behavior based on a test. This means that we don’t care that people who come from search spend half as much time on your site as people who come straight to the site; we only care whether those two groups have a different “winner” for various test results. If the same thing wins for both, they may each have a different propensity of action, but you gain nothing from changing the user experience of one relative to the other; the same thing helps both.
In the worst case, groups start out by talking it over or letting one person come up with a concept or schema that they are just sure will work. One step up, we often deal with groups that rely strictly on CORRELATIVE data to try to answer this problem. Unfortunately, correlative data can’t tell you the value of an action nor the efficiency of an action. The best it can do is give you some insight into the rate of action and the likelihood of an outcome, so long as you do not factor in cost or efficiency. Analytics data can tell you the rate of action for the search people, but it cannot tell you whether their behavior will change based on a dynamic experience. You need causal data, the value of the changes relative to each other, to really dive in and discover which groups will actually change their behavior and which you can leverage to improve site performance.
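The “exploitable segment” check above can be sketched with invented test results (the segment names and numbers are hypothetical): a segment only justifies targeting if the test winner differs by segment, not merely the baseline rate of action.

```python
# Hypothetical A/B test results, split by segment.
test_data = {
    # segment: {variant: (conversions, visitors)}
    "direct": {"A": (120, 2000), "B": (90, 2000)},
    "search": {"A": (30, 2000),  "B": (52, 2000)},
}

def winner(variants):
    # Pick the variant with the higher conversion rate for this segment.
    return max(variants, key=lambda v: variants[v][0] / variants[v][1])

winners = {seg: winner(v) for seg, v in test_data.items()}
print(winners)

# Different winners per segment -> the segment is exploitable: serving each
# group a different experience changes outcomes. If the same variant had won
# everywhere, differing baseline rates alone would not justify targeting.
exploitable = len(set(winners.values())) > 1
print("exploitable:", exploitable)
```

Note that direct traffic converting at a higher baseline rate than search traffic never enters the decision; only the disagreement between winners does.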
I cannot stress this enough… a user experience is more than banner content or copy. Usually the largest and longest-lasting changes to sites are based on real estate: changing the spatial dynamics of a page through layout, inclusion/exclusion of items, or the relative positioning of items. Your other options are changes to presentation (how something looks), function (how it interacts or how it is programmed), and then copy/content. All of these things can be changed, and figuring out the order of value is vital to efficiency. One of the very first things any program needs to do is figure out the value of different types of changes relative to each other. You have to think in terms of the entire user experience and then figure out which changes, relative to each other, are most valuable. Even if we are limited to one content area, we can look for efficiencies of scale by focusing on the components and rules of the content over the individual content item, in order to gain returns on otherwise perishable changes.
Where should I personalize?
Most teams walk into personalization wanting to tackle the entire site in one blast: the same type of personalization on the front door, landing pages, product pages, in cart, in the right gutter, everywhere they can fit it in. In most cases, even if one or more of those spots is positive, the overabundance is inefficient to the point of counteracting much of the good that may come from putting personalized experiences on the site. It also fails to ask whether those spots should exist at all. Or whether different functions on the site need different types of “personalization”. Or whether the most personalized thing you can do is move items on the page to increase the efficiency of the user flow. Like everything else, it requires study and discipline to figure these things out.
So often the first thing I do when working with groups is ask them to prove out the value of what they are targeting; without fail, what they are doing has no positive value, and can often be hurting performance. You have to be disciplined to get real, long-term value from your “personalization” program. Optimization gives you the ability to measure efficiency and the value of items relative to each other, which means the same tool you want to target with can give you so much more, with almost no effort, simply by your being willing to ask this fundamental question. We are trying to learn what matters most, not trying to be “right”. Nothing is more valuable than when you are “wrong” about your assumptions.
Like so many buzzwords before it, personalization is the latest item to catch the attention of the online world. That doesn’t mean the concept is not valuable or is not something you should strive for; some of the sites that get the most value get it from dynamic, targeted user experiences. What it does mean is that you cannot jump in without understanding the discipline it takes to achieve real, long-term success. Without a willingness to go down the path of learning, almost all efforts are doomed to failure.
I fully encourage you to find meaningful, exploitable, dynamic user experiences. You need to work to make your site a living breathing thing that shifts to meet new needs and is something that different people get different things from. What has to happen though, is you need to tackle each step with the attention that it requires and to apply discipline to reach meaningful results. You cannot just guess your way to victory, but you can get there easily if you are willing to answer key questions through action and allow the results to dictate the path you are on, not your own ego.