The study of history teaches you many key lessons, such as the unreliability of the first-person narrative, the inability to understand the scope of something while it is happening, and, most importantly, that history is written by the victors. Howard Zinn made a living out of showing people just how true this is and how little we understand our own world because of it. The same mistake is made not just in historical analysis but constantly in data analysis, whenever we look only at the attributes of the “winning” side and forget to analyze the rest of the data. While closely related to the Halo Effect, this effect shows itself in what people like deGrasse Tyson and Taleb refer to as the graveyard of knowledge.
Neil deGrasse Tyson best explains the graveyard of knowledge with one of his stories:
You read a study that says 80% of people who survived a plane crash studied the exit routes before the plane took off. Comfortable with this knowledge, the next time you board a plane you quickly study the exit routes. As you do this, you start to analyze that data and come to a sudden realization: what if 100% of the people who did not survive the crash also studied the exit routes?
We don’t know the other side of the story, because there is no one there to report it. We only know the people who came back and what they can tell us; we have no clue what was going on with those who did not come back. Knowledge is lost all the time when people look only at the winners. Winners are the only ones left to tell their stories, so we look only to them for details. The reality is that the important details are rarely only on the winning side, and the people who never returned give us just as vital knowledge. We lose all the really important information in that graveyard of the people who never returned. In fact, we can only start to understand anything when we have both pieces, because only then do we have context for our information. We focus so much on those who “survived” that we completely ignore the context of those who did not. We look only at the behaviors we want, extract qualities about that group of people, and never look at the population as a whole or, more importantly, at what would have happened if we did nothing.
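To make the graveyard concrete, here is a minimal Python sketch using made-up passenger counts (they are not from any study, and the function names are hypothetical). It reproduces the “80% of survivors studied the exits” figure from the story and then computes what the survivors alone cannot tell you: the survival rate of those who studied versus those who did not.

```python
# Hypothetical numbers for illustration only; they are not from any study.
# Each key is (studied_exits, survived); each value is a passenger count.

def survival_rates(counts):
    """Return P(survived | studied) and P(survived | did not study)."""
    rates = {}
    for studied in (True, False):
        survived = counts[(studied, True)]
        died = counts[(studied, False)]
        rates[studied] = survived / (survived + died)
    return rates


def share_of_survivors_who_studied(counts):
    """The only number the survivors can report: P(studied | survived)."""
    survived_studied = counts[(True, True)]
    survived_total = counts[(True, True)] + counts[(False, True)]
    return survived_studied / survived_total


# 100 passengers, 20 survivors, 16 of whom studied the exits (the "80%").
# Every one of the 80 who died also studied the exits.
crash = {
    (True, True): 16,   # studied, survived
    (False, True): 4,   # did not study, survived
    (True, False): 80,  # studied, died
    (False, False): 0,  # did not study, died
}

print(f"Survivors who studied: {share_of_survivors_who_studied(crash):.0%}")
rates = survival_rates(crash)
print(f"Survival rate if studied:      {rates[True]:.1%}")
print(f"Survival rate if not studied:  {rates[False]:.1%}")
```

With these invented numbers, studying the exits looks helpful if you only ask survivors, yet studiers survive far less often than non-studiers. The tiny non-studier group (4 people) is itself too small to trust, which is the point: you need both sides of the table before any rate means anything.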
We look for people based on the end result of their behavior, not on how they could be defined beforehand. We love to know the characteristics of people who made a purchase, or of people who come to our site more than 3 times. We look backwards from that winning behavior because that is all we think we have available to us. We love to describe past behavior through correlated behavior, and then attribute “value” to those actions. People who purchase use internal search 2 times on average, therefore internal search must be the cause of that action. People who come from social sources spend $4.56 on average, therefore social is worth $4.56. We don’t know what would have happened if the same person hadn’t used search or come from social: would they have spent more or less? All of these types of analysis attribute past behavior to end value, missing the point that we don’t know what those people would have done otherwise. Is looking at the exit routes helping or hurting your ability to survive? We don’t know whether more is better; we simply assume a linear relationship. If campaign X is generating value Y, then doubling spend on X will of course generate 2Y.
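The linear assumption in the last two sentences can be checked against a toy response curve. The sketch below assumes a hypothetical diminishing-returns curve, `scale * ln(1 + spend / saturation)`, picked only for illustration; real curves have to be measured, for example by testing different spend levels. It compares the “double the spend, double the value” forecast with what this particular curve returns.

```python
import math

# Hypothetical diminishing-returns curve, chosen purely for illustration:
# value(spend) = scale * ln(1 + spend / saturation). Real response curves
# must be measured (for example with spend-level tests), not assumed.
def campaign_value(spend, scale=10_000.0, saturation=5_000.0):
    return scale * math.log1p(spend / saturation)


def linear_forecast(current_spend, new_spend):
    """What the 'doubling spend doubles value' assumption predicts."""
    return campaign_value(current_spend) * (new_spend / current_spend)


current = 10_000.0
doubled = 2 * current
print(f"Value at current spend:   {campaign_value(current):,.0f}")
print(f"Linear forecast at 2x:    {linear_forecast(current, doubled):,.0f}")
print(f"Curve value at 2x:        {campaign_value(doubled):,.0f}")
```

Under this assumed curve, doubling spend yields roughly 16,100 rather than the linearly forecast 22,000. A different curve could give a different answer, which is exactly why the relationship should be measured rather than assumed.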
Looking only at the data from one group, or only at the data that define the “winners”, means that you have lost any value from that data. We cannot express how much better or worse an action made things, only that we had X amount of search spend and ended up with Y revenue. Even worse, pretending that you can derive cause and effect without the larger context means that you are not getting value from the data itself, but instead propagating your own world view and using the data only to support it. Like the Texas sharpshooter fallacy, you are creating a story to fill in what is most likely random noise in the data. Rates of action, such as “80% of people looked at the exit routes”, tell you nothing unless you know both that increasing that number increases your ability to survive and how much it would cost to get more people to take that action, if you can influence them at all. I can tell you that 100% of people who are determined to spend $1,000 on your site will spend at least $1,000, but that doesn’t tell you how to get those people in the first place, or whether it is worth spending resources on that small population as opposed to the multitude of other alternatives.
People make this mistake all the time in data analysis when they get caught up in a set path or in looking backwards from an event. They want to know what all the people who purchased did, or what all the people who come to your site 4 times have in common. There is even a whole world of statistical analysis, built around clustering and personas, that is making a large push in our industry and feeds this tendency. The mistake is forgetting that a small part of your population cannot tell you the context of that information. As with the plane, knowing the attributes of one group doesn’t tell you the attributes of the population as a whole. Even worse, it assumes that those attributes have anything to do with the behavior. We have no way of knowing whether the people who survived just happened to look at the exit routes, or whether people who look at exit routes are actually more likely to survive.
In the world of testing, this bias shows up in people who want to know what users did between steps. They want to know whether the people who purchased went to a product page or a search results page. They want to know which path people took or what they clicked on. Even if this knowledge did not ignore the graveyard of knowledge, what would it tell you? More people went to the search results page: is that a good thing or a bad thing? You are accomplishing nothing with this data except adding cost and slowing down your ability to make the correct decision. It is easy to get lost in the world of data if you are trying to tell a story or prove a preconceived point, but as soon as you are trying to use the data to find an answer, not just to support your point of view, the discipline of what you look at, and knowing what it can tell you, becomes paramount.
So the question is, 40 years from now, will the analysis you do be part of the “winning” group, or will it be lost in the graveyard? Stop pretending that data tells you more than it really does, stop looking only at the winning side, and you will be able to derive orders of magnitude more value from your data. The discipline of looking at the whole context and of discovering the value of actions is what will get you results, not just finding stories. Remember that patterns are only patterns; they are neither good nor bad, and it is incredibly easy to forget that even when they are perfect, they tell you nothing about your ability to change them, or the cost of doing so. Data can be the most powerful tool in your arsenal, but it can also be abused endlessly, providing negative value and a blanket justification for poor decisions.
When choosing the next fallacy to cover, I faced a tough choice, as there are so many different fallacies that describe the same human behavior: the belief that we know, or can answer, things we can’t, by assigning pattern or reason to things without actual cause. We are wired to want to explain why things happen, but to accomplish that task we ignore data or use only the data we want, and we impose our own points of view as the core reason things happen. We believe that the world is far more orderly and easy to understand than it really is. My favorite fallacy that covers this behavior is the Texas Sharpshooter Fallacy, which is when someone assigns pattern or reason to random chance.
The name Texas Sharpshooter comes from this “story”:
A cowboy takes aim at a barn and starts shooting randomly. When he is done, he walks up and notices that there are a large number of holes in one area and fewer in another. He then paints a bull’s-eye over the area with the most holes. To anyone walking up, it looks like he was a good shot who mostly hit where he was aiming.
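The cowboy’s trick is easy to reproduce. This minimal Python sketch (the grid size, shot count, and seed are arbitrary choices for illustration) fires shots uniformly at random into a 4x4 grid of cells and then “paints the bull’s-eye” on whichever cell collected the most holes.

```python
import random
from collections import Counter

# Simulate the cowboy: shots land uniformly at random on a barn wall split
# into a grid of equal cells. No cell is "aimed at".
def shoot_randomly(n_shots, grid=4, seed=None):
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(n_shots):
        cell = (rng.randrange(grid), rng.randrange(grid))
        hits[cell] += 1
    return hits


def paint_bullseye(hits):
    """Pick the cell with the most holes after the fact, as the cowboy does."""
    return max(hits.items(), key=lambda item: item[1])


n_shots, grid = 200, 4
hits = shoot_randomly(n_shots, grid, seed=42)
target, count = paint_bullseye(hits)
expected = n_shots / grid**2
print(f"Bull's-eye painted on cell {target}: {count} holes")
print(f"Holes expected per cell by pure chance: {expected:.1f}")
```

Because 200 shots cannot split evenly across 16 cells, the busiest cell always ends up above the 12.5-hole average, even though no shot was aimed. The cluster is real; the skill it seems to imply is not.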
Now, while I am sure we can all think of cases where others have done this with data, the first thing you need to understand is that we all do this… all the time. We see patterns and rationalize our own actions, whether it is why we do things in a certain order or why we believe certain “truths” about the world. We rationalize decisions after we make them, and while they are not all random, our understanding of why we do things is often flawed at best and completely delusional at worst. The human brain actually engages the rationalization part after the action part, meaning that we act first and then think of why we acted, not the other way around. We draw circles around the patterns of our own behavior and then accept those circles as the logic that led to the decision. This makes our understanding of why people do things extremely flawed, since so much of how we view others’ behavior is filtered through our own “understanding” of what drives our own actions. We so want to come up with a why, and we dive so deep, that we miss the point that we will never truly know. Nor does it matter, since we are describing a pattern, one that we can engage and interact with and build rules around without needing to know all the causes of that pattern.
One of my favorite real-world examples of this is a psychology professor in Baltimore who does the same demonstration each year. He starts his lecture by bringing a chicken in a cage up on stage. The cage has a feeder that is set to dispense food pellets at random time intervals. He then covers the cage and talks for an hour and a half. At the end of the presentation, he takes the cover off and, without fail, the chicken is found repeating some behavior over and over again; it has convinced itself that this behavior is why the food comes out. The food comes out no matter what it does, and it has no control, but it has convinced itself that it is in control of the situation. We are all like that: we need to explain things so badly that we will believe anything, or paint bull’s-eyes where they aren’t, to make ourselves feel like we have more control than we really do.
We like to believe we are smarter than that chicken, but we aren’t. In our world, data is our food, so we assign patterns to explain changes in what we observe. Data becomes a crutch to accomplish this task. We so want to have a story to tell others and ourselves that we find one in the data. We believe that because conversions went up, the message must have “resonated”, or that because one group has a different winner than another group, it must be because of their socioeconomic status or because they are more familiar with technology. We have no way of knowing this, but we convince ourselves and others that this is the reason why. The reality is that we need a “why” to help us feel like we understand, but acting on data requires not a why so much as a willingness to act.
Looked at from a data perspective, this means that when we see a noticeable, meaningful change, often from testing, we are left to think of why it happened. People are fascinated with the “why?”, often at the cost of what comes next. The reality is that we are always going to be looking only at a noticeable change and then applying rationalization after the fact. We get so caught up in the why that we miss the truth that we will never really know, nor does it matter. Having a clear plan of action for our data means that we never need to know the why to be successful; in fact, the more we dive in and try to answer it, the more resources we waste. Acting on data requires willingness and alignment, and it is decided before something happens. Rationalization is what happens afterwards. The why does not change your need to act on the data, nor does it give you some sudden insight into human behavior. At best you have a single data point; at worst you are painting bull’s-eyes around holes and calling them insight.
Marketers have been trying to figure out the “why” for a long time, and while there are a lot of people who claim to know, the reality is that at best we have a pattern, and at worst we have stories we present to make ourselves look good. You cannot derive a pattern from a single data point, yet we are obsessed with trying to do that very thing. If we are honest about how we go about collecting data, and we are open to consistent and meaningful action from testing, then the why will never matter. If we are disciplined and follow the data, then we know how we are going to act based on the results, not why the results happened. If you are disciplined in how you think about users, then you know that a story or a single data point will never tell you anything. If we really want to make things personal, then we won’t force “personas” on people, but will instead let the data tell us the causal value of changing the user experience and for whom it works best.
At its worst, the Texas Sharpshooter Fallacy represents our need to show that we are more in control or know more than we really do. We use the need to explain why to make stories and to help communicate our value to others. My background is in historical analysis, and one of the first things you learn is how little value comes from the first-person narrative. It reveals far more about the fallacies of the person speaking than it provides real information about what is actually happening. Data at its heart is meant to improve situations, not to let you come up with a story that satisfies your world view.
Why is not a question that you can ever truly answer, yet most people in marketing are obsessed with a Sisyphean quest to answer it. The reality is that it is a question that has nothing to do with how you act on data or the disciplines needed to be successful. We do not need to know why for everything, even if it seems to hold all the answers. We just need to know what to do with what is in front of us and to appreciate how little we really know about the world in which we live.
Thanks to Brent Dykes, there has been a lot of talk recently about analytics action heroes. Everyone wants to be a hero, and everyone thinks that they are one or are on the road to becoming one. My work, unfortunately, often has me facing the opposite: programs that are not succeeding, often due to villains. One of the great truths is that villains never know that they are the villains; they often think they are the real hero. They are constantly talking about action, they are involved, and more than anything they speak up for the use of data in the organization. To be a real villain, they have to be capable and smart, just like a hero; otherwise the damage they do would be mitigated. The problem is that they do it for all the wrong reasons and without the goal of actually improving performance. No one wants to be the villain, so why do villains so outnumber the heroes in our industry?
So how then do you know if you are the villain or the hero?
There is no magical litmus test to get your hero card, but there are many common traits that define the members of both groups. Here are a few barometers that might help you define where you are and what you need to work on to become whichever one you are aiming for.
There are heroes and villains at every level. It is not always a HiPPO versus the low man on the totem pole. Analysts and marketing managers are just as likely to run a program into the ground as VPs and CMOs. It isn’t about your title but about the actions you take toward the program. Are you talking about making a difference while choosing actions that make you look good? Or are you actually doing the small, unnoticed things that really make a difference?
Heroes view their role as finding the best answer and doing what is needed to make the site succeed. Heroes judge their position by what they do to make others better. Villains view their role as doing what their boss wants or what will make them look best. Villains use the position to focus on themselves. Heroes are willing to ignore their “title” to do what is needed. Villains use their title to take credit for things and to keep things within their empire. Heroes know that there are many hurdles, but they won’t accept excuses. Villains are the first to complain about others, yet they accept problems as excuses and spend a great deal of time reminding you why it is the other person’s fault. Heroes know that no one knows the answer to everything and that discovery is part of excellence. Villains tell everyone they have the answer and then find data to support their position and make them look better. Both sides talk about trying to do what is best, but their actions and excuses quickly determine which side of the battle someone falls on. Everyone claims to do what is best for the site, but actions speak louder than words, and if you are worried about keeping people happy or doing only what you are told, then you are not doing what is best for the site.
Heroes’ skills are in finding multiple answers to problems and figuring out the efficiency and value of each one. They use their skills to educate people about what defines a good answer. They are capable of giving a presentation, but they are at their best when changing people’s misconceptions and finding the best answer, not just the first one that comes up. They know that to be successful, they need to know a little bit about everything, and they never accept “I don’t know” as an answer. They go beyond what is asked and never settle for “best practices” or just returning a report. They know that even if their boss wants an answer to question A, the company might be better served by finding the answers to the questions that aren’t being asked, so they focus their skills on finding those questions and answering them, even if that does not support someone’s agenda.
A villain’s primary use of their skill is self-promotion. They take every opportunity to show how valuable their “contributions” are, rather than focusing on the real value of the actions taken. They view their job as improving their “personal brand” and are more than happy to find data to support others’ claims or agendas, as opposed to finding the best answer. They are the first to dive in and answer the questions their bosses are asking, even if those questions have no real value. They blame others when they don’t know something, and they are more than happy to tell others it’s their job to “figure it out”. They spend their time improving their presentation, networking, and self-promotion skills. All they want is to answer the requests in front of them and make the people above them happy. They see no reason to find more than one answer or to challenge ideas, because the act of finding that answer makes others happy and helps them show their “value”.
Heroes love to research and explore the thoughts of others. They do not, however, look at only one community, or think that just because someone gives a great presentation, that person is correct. They appreciate popularity, but know that the more people read a blog or buy a book, the more likely the material is to be what people want to hear rather than genuinely valuable content. They don’t just accept a statement from anyone, especially when it sounds like exactly what they want to hear. They view the world through a lens that looks for everything that can be fixed and everything that is wrong with the current process. They don’t make excuses about lacking time to dive into multiple disciplines or to follow the latest news. They know that the time spent finding a better way to do things will give them back multiples of that time later. They take the time to read and find the best and worst quality materials out there because they care about content and know that simple almost never equals right. They know that you need lots of different perspectives on a problem to understand it, and that there is no single answer to any problem. They understand that today’s answers will prove to be wrong tomorrow, so they are less concerned with proving themselves right than with finding the next “best” answer. They seek out new perspectives and new people to continue their search for improvement.
Villains are also heavily involved in communities; in fact, some of the most vocal and famous members of communities are villains. They use research and communities to promote their image and to tell the world how great they are. They find new ways to say things that have already been said and measure their self-worth and value by the act of making a presentation, not by the value of the content shared. They love to build their own groups within those communities so that more people propagate whatever myth they are selling at the moment. They are also always searching for the next big thing in order to get ahead of it, tell the world how they mastered it, and move on from what they were doing before the reality of their failure becomes evident. They don’t use research or community to find what is wrong with what they are doing, but instead to validate and promote their own agenda. They extract whatever they can from every piece of information to make themselves look better and to bring others under their political umbrella.
Heroes view technology as a means to an end, one that is often foolishly rushed into to meet someone’s agenda. There are great technologies out there, and no one would really be able to achieve anything without the great technologies in our industry, but heroes focus on getting things right: building out the right disciplines and infrastructure, and learning not just one way but the best way to leverage any tool for a given function. They aren’t impressed with having 50 tools running on your site, but with how many of them are running in a way that really improves things. They live by the creed “You can fail with any product”, so they focus on creating the infrastructure that makes the products they do have succeed. They know that just being able to collect data does not magically make it valuable. To do this, they are aware of all the various offerings on the market, but focus on the efficiency of each one. A hero is more interested in how often things go wrong, and how to make sure they don’t fall into that trap, than in the latest great “success story”. They aren’t afraid to challenge sales pitches and “experts” to find the best answer.
Villains make their careers on buying and getting the latest technologies. They love to be able to promise the next great thing internally and to “own” it to help themselves look good. They don’t care about the likelihood of success, but instead about what they can sell internally about the “value” they are being promised. They rush to evaluate and acquire as many new technologies as possible to stay “ahead” of the field. They aren’t interested in building an infrastructure for success, but instead focus on what promises they can get to promote themselves internally. They spend their time “evangelizing” and not getting better. When things don’t work, they move on to the next technology or the newest industry buzzword and find someone to take the blame. They don’t care about building a successful program as much as they care about “integrating” all these technologies and finding a story to show their boss.
There are hundreds of other comparisons you can make between heroes and villains. The truth is that we always have to balance one side against the other. It may seem like a fine line between hero and villain, but remember that it is always up to you what actions you choose. Heroes know that you are forced to choose between the actions that make you look good and the ones that make an organization successful. Heroes accept the sacrifices and don’t make excuses. Villains convince themselves that the two are the same thing and that what they are doing in all cases makes the organization better. No organizational structure or mental evolution of a program will make up for having villains in your program. We all talk about doing the right things, but at the end of the day, it’s not the stories you tell others or the justifications you make to yourself, but your actions that determine which path you take.
The real question for you is: which do you want to be, and what are you doing to get there? All heroes have to go through a quest to earn their abilities, often with many hurdles and defeats. They are often misunderstood and not immediately rewarded for their skills, but in the end, they emerge victorious. There are always hurdles before you, and you are always going to be searching for a way to get past them.
When your story is told by others, are you the hero or the villain?
Some of the greatest mistakes people make come from having complete faith in numbers, or in their own ability to use them to get a desired result. While there are a great many biases and logical fallacies that make up human cognition, sometimes there are factors in the real world that conspire to make it even more difficult to act in a meaningful and positive way. One of the more interesting phenomena in the world of data is the statistical bias known as “Simpson’s paradox”. Simpson’s paradox is a great reminder that a single view of the data can create a fallacy that often leads to a very wrong conclusion. Even worse, it can allow for claims of success for actions that are harmful in the context of the real world.
Simpson’s paradox is a fairly straightforward bias: it occurs when a correlation is present in each of two groups individually, but when the groups are combined the data show the exact opposite effect.
Here is a real-world example:
We ran an analysis showing that a variation on the site produces a distinct winner for both our organic and our paid traffic:
But when we combine the two, the exact inverse pattern plays out. Version A won by a large margin for both organic and paid traffic, but combined it dramatically underperforms B:
This seems counterintuitive, but it plays out in many real-world situations. You may also find the inverse pattern, where you see no difference within the distinct groups, but combined you see a meaningful difference.
In both cases, we would logically want to presume that A was better than B, but it is not until we add the larger context that we understand the true value.
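The following Python sketch uses invented conversion counts to show how such a reversal can arise. The segment names follow the example above, but every number is hypothetical. The paradox requires the two versions to receive different traffic mixes: here A’s traffic is mostly the low-converting paid segment, while B’s is mostly organic.

```python
from fractions import Fraction

# Hypothetical (conversions, visitors) per segment and version. These are
# illustrative numbers only. Note that the two versions received very
# different traffic mixes, which is what makes the reversal possible.
results = {
    "A": {"organic": (90, 1_000), "paid": (100, 5_000)},
    "B": {"organic": (400, 5_000), "paid": (15, 1_000)},
}


def rate(conversions, visitors):
    # Exact fractions avoid any floating-point doubt in the comparisons.
    return Fraction(conversions, visitors)


def combined_rate(segments):
    conversions = sum(c for c, _ in segments.values())
    visitors = sum(v for _, v in segments.values())
    return rate(conversions, visitors)


for segment in ("organic", "paid"):
    a = rate(*results["A"][segment])
    b = rate(*results["B"][segment])
    winner = "A" if a > b else "B"
    print(f"{segment:>8}: A={float(a):.2%}  B={float(b):.2%}  winner={winner}")

a_all = combined_rate(results["A"])
b_all = combined_rate(results["B"])
winner = "A" if a_all > b_all else "B"
print(f"combined: A={float(a_all):.2%}  B={float(b_all):.2%}  winner={winner}")
```

Version A converts better within organic traffic (9.00% vs 8.00%) and within paid traffic (2.00% vs 1.50%), yet B wins overall (6.92% vs 3.17%) because B’s visitors came mostly from the higher-converting organic segment. Serving both versions randomly to the same population keeps the segment mix equal in expectation, which removes this particular route to a reversal.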
While this is a trick of numbers, it presents itself far more than you might expect, especially as groups dive into segmentation and personalization. The more vigorously people leap directly into personalization, the more they leave themselves open to biases like Simpson’s paradox. They get so excited when they are able to create a targeted message, and so desperate to show its value and prove their “mettle”, that they don’t take the time to evaluate things on a holistic scale. They don’t even compare it with other segments or account for the cost of maintaining the system. They are so excited by their ability to present “relevant” content to a group they think needs it that they fail to measure whether it adds value or whether it is the best option. Even worse, they then go around telling the world about their great finding, while it may be causing massive harm to the site as a whole.
One of the key rules to understand is that the more you keep diving down to find something “useful”, either from analytics or from causal feedback after the fact, the more likely this is to play out. You can use numbers to reach any conclusion with creative enough “discovery”. If you keep diving, if you keep parsing, you are rapidly increasing the chances that you will arrive at a false or misleading conclusion. Deciding how you are going to use data after the fact is always going to lead to biased results. It is easy to prove a point whenever you forget the context of the information or lose the discipline of trying to use it to find the best answer.
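The effect of “keep diving, keep parsing” can be quantified under simple assumptions. This Python sketch assumes a conventional 0.05 significance threshold, independent segment tests, and no real effect anywhere; under those assumptions each segment’s p-value is uniform on [0, 1], so the chance that at least one of k segments looks “significant” is 1 - (1 - 0.05)^k. The simulation checks the formula.

```python
import random

ALPHA = 0.05  # conventional significance threshold, assumed for illustration


def chance_of_false_positive(num_segments, alpha=ALPHA):
    """P(at least one 'significant' segment) when no real effect exists
    and the segment tests are independent."""
    return 1 - (1 - alpha) ** num_segments


def simulate(num_segments, trials=10_000, alpha=ALPHA, seed=0):
    """Under the null hypothesis, p-values are uniform on [0, 1]; count how
    often at least one of the segment p-values falls below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < alpha for _ in range(num_segments)):
            hits += 1
    return hits / trials


for k in (1, 5, 10, 20, 50):
    print(f"{k:>2} segments: formula {chance_of_false_positive(k):.0%}, "
          f"simulated {simulate(k):.0%}")
```

With 20 segments, roughly 64% of analyses would turn up at least one “winning” segment from pure noise. Real segments overlap and are not independent, so the exact numbers will differ, but the direction holds: every extra cut is another shot at the barn.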
So how do you combat this? The fundamental way is to make sure that you are taking everything to the highest common denominator. Here is a really easy process if you are not sure how to proceed:
1) Decide what and how you are going to use your data BEFORE you act.
2) Test the content: serve it randomly to all groups. Even if you design the content specifically for one group, test it on everyone. If you are right, the data will tell you.
3) Measure the impact of every type of interaction against the same common denominator. Convert everything to the same fiscal scale, and use that to evaluate alternatives against each other. Converting to the same scale allows you to ensure that you know the actual value of the change, not just its impact on specific segments.
4) Further modify your scale to account for the maintenance cost of serving to that group. If it takes a whole new data system, two APIs, cookie interaction, and IT support to target that group, then you have to get a massively higher return than from a group you can target in a few seconds.
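Steps 3 and 4 amount to putting every option on one fiscal scale and netting out its running cost. Here is a minimal Python sketch of that comparison; the `Option` class, option names, visitor counts, per-visitor lifts, and monthly costs are all hypothetical, and the lift figures are assumed to come from randomized tests like the one in step 2.

```python
from dataclasses import dataclass

# Hypothetical inputs for illustration only. "lift_per_visitor" is the
# incremental revenue per visitor measured from a randomized test, and
# "monthly_cost" is the estimated cost to build and maintain the targeting.
@dataclass
class Option:
    name: str
    visitors_per_month: int
    lift_per_visitor: float   # incremental revenue, same currency for all
    monthly_cost: float       # data systems, APIs, IT support, etc.

    def net_monthly_value(self) -> float:
        # Same fiscal scale for every option: total lift minus running cost.
        return self.visitors_per_month * self.lift_per_visitor - self.monthly_cost


options = [
    Option("persona segment (new data system + 2 APIs)", 8_000, 0.90, 6_000.0),
    Option("new visitors (built-in rule)", 50_000, 0.12, 200.0),
    Option("serve to everyone", 120_000, 0.03, 0.0),
]

for opt in sorted(options, key=Option.net_monthly_value, reverse=True):
    print(f"{opt.name:<45} net {opt.net_monthly_value():>9,.2f} per month")
```

In this made-up case, the segment with the largest per-visitor lift ranks last once its audience size and maintenance cost are included, which is the kind of surprise the next paragraph describes.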
What you will discover as you go down this path is that you are often wrong, in some cases dramatically so, about the value of targeting a preconceived group. You will discover not only that many of the groups you think are valuable are not, but also that many groups you would not normally consider are highly valuable (especially in terms of efficiency). If you do this with discipline and over time, you will also learn completely new ways to optimize your site, be it the types of changes, the groups that are actually exploitable, the cost of infrastructure, or the best ways to move forward with real, unbiased data.
As always, it is in the moments when you prove yourself wrong that you will get dramatic results. Just trying to prove yourself right does nothing but give you the right to make yourself look good.
I always differentiate a dynamic user experience from a “targeted experience”. In the first, you are following a process, feeding a system rather than dictating the outcome, measuring the possible outcomes, and choosing the most efficient option. In the second, you are deciding that something is good based on conjecture, biases, and internal politics, serving it to that group, and then justifying that action. Simpson’s paradox is just one of many ways you can go wrong, so I challenge you to evaluate what you are doing. Is it valuable, or are you just claiming it is? Are you looking at the whole picture, or only the parts that support what you are doing? Are you really improving things, or just talking about how great you are at improving things?