Category: Psychology

Why we do what we do: Believing what you want to believe – Observer-Expectancy Effect

The human mind is a funny thing. We can be aware of all our own faults, and everyone else’s, and yet when it comes to stopping ourselves from falling down the many holes we dig for ourselves, we find it much easier to spot a mistake in others than in ourselves. The next bias I want to tackle is the Observer-Expectancy Effect, or “when a researcher expects a given result and therefore unconsciously manipulates an experiment or misinterprets data in order to find it.” Like its sibling, congruence bias, the Observer-Expectancy Effect shapes what we test, but it shapes even more fully what conclusions we arrive at. It is the entire phenomenon of hearing what you want to hear, or using data only to justify what you already wanted to do.

This sinister worm pops its head up in all sorts of places, whether in experiment design, in using data to justify decisions, in sales pitches, or even in our own view of our impact on the world. How much of ordinary marketing is telling you what you want to hear? Yet we lose sight of the fact that we are just as susceptible to those messages when we are trying to prove ourselves right.

What is important is not so much what the problem is, but how best to “fix” it. How do you ensure that you are looking at data and getting the result that is functionally best, and not just letting your own psyche lead you down its predetermined path? The trick is to think in terms of the null assumption. The challenge is to always assume that you are wrong, and to look at the inverse of all your “experience”: challenge yourself to think in terms of “what if that wasn’t even there?” or “what if we did the exact opposite?” Make a point of trying to prove the inverse, that you are wrong, and you will suddenly have a much deeper understanding of the real impact of the outcomes you are championing. When you try to prove you are right, you will find confirmation, just as when you try to prove you are wrong, you will also come to that conclusion. You have to be willing to be “wrong” in order to get a better outcome. Remember that when you are wrong, you get the benefit of the improved results, and you have learned something.

So what does this look like in the real world? Every time you decide to go down a path, you will intrinsically want to prove to yourself and others that what you are doing is valuable. The most common example is the quest for personalization, where we get so caught up in proving we can target groups that we forget to measure the real impact of that decision. We forget that the same person can be looked at a thousand different ways, so when we pick one, we fail to measure it against the alternatives. The number of groups that have championed targeting some minute segment, only to look deeper into the numbers and find that targeting by browser or time of day would have had magnitudes greater impact, is legion.

The simplest way to test this is to make sure that all of your evaluations, correlative, causal, or qualitative, include the null assumption. What happens if I serve the same changed content to everyone? Or what happens if I serve the targeted content to Firefox users instead? Despite the constant banter and my belief that a personalized experience is a good thing, what do I really see from my execution? What if we target the groups that don’t show different behavior in our analysis? Keep deconstructing ideas and keep trying to find ways to break all the rules, and you will find them. Even better, those are the moments where you truly learn and where you get value that you would not have gotten from jumping straight to the action.
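To make this concrete, here is a minimal sketch of what a null-assumption comparison might look like, written in Python. Every variant name and number below is hypothetical; the point is simply that the targeted experience is measured against “serve the same change to everyone” and an arbitrary alternate target, not just against the current control.

```python
# Hypothetical test results: the personalized experience is compared against
# the null assumption (same change for everyone) and an arbitrary alternate
# target, all on the same end-goal metric (revenue per visitor).
variants = {
    "control":              {"sessions": 50_000, "revenue": 105_000.0},
    "personalized_segment": {"sessions": 50_000, "revenue": 110_500.0},
    "same_change_everyone": {"sessions": 50_000, "revenue": 112_300.0},  # null assumption
    "target_firefox_only":  {"sessions": 50_000, "revenue": 108_900.0},  # arbitrary alternate
}

baseline = variants["control"]["revenue"] / variants["control"]["sessions"]

for name, v in variants.items():
    rpv = v["revenue"] / v["sessions"]        # revenue per visitor
    lift = (rpv - baseline) / baseline * 100  # % lift versus control
    print(f"{name:22s} RPV=${rpv:.2f}  lift={lift:+.1f}%")
```

If “serve the same change to everyone” beats your carefully targeted experience, as it does in these invented numbers, then the personalization story was your expectation talking, not the data.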

This is not just a problem with analytics; it plays out with any sort of analysis, especially A/B testing. So many groups make the mistake of just testing their hypothesis against another, and in doing so fail to see the bigger picture. Hypothesis testing is designed to be absolutely sure of the validity of a single idea, not to compare ideas against each other or to reach a conclusion at a meaningful speed. It is the end point of a long, disciplined process, not the starting point where so many want to leverage it.

The final common way this plays out is when we mistake the rate of an action for the value of the action. We get so caught up in wanting to believe in some linear relationship between items, that having a great promotion and getting more people to click on it equals more value, that we fail to measure the end goal. We mistake the concept we are trying to propagate for the end goal, assuming that if we are successful in pushing towards a desired action, we have accomplished our end goal. Having run on average 30 tests a week with different groups over the last 7 years, I can tell you from my own experience that the times this linear relationship has actually held in the real world I can count on one hand.
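Here is a small sketch of that trap, again with invented numbers: the promotion that “wins” on click-through rate loses on the end goal.

```python
# Hypothetical promotion results. The variant with the higher click-through
# rate (the rate of the action) is not the variant with the higher revenue
# per visitor (the value of the action).
promos = {
    "promo_a": {"visitors": 10_000, "clicks": 900, "revenue": 18_000.0},
    "promo_b": {"visitors": 10_000, "clicks": 600, "revenue": 24_000.0},
}

for name, p in promos.items():
    ctr = p["clicks"] / p["visitors"] * 100  # rate of the action
    rpv = p["revenue"] / p["visitors"]       # value: the end goal
    print(f"{name}: CTR={ctr:.1f}%  revenue per visitor=${rpv:.2f}")

# promo_a wins on clicks (9.0% vs. 6.0%) but loses on the end goal
# ($1.80 vs. $2.40 per visitor). Measure the goal, not the proxy for it.
```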

So much analysis loses all value because we are pre-wired to accept the first thing we find, or to find data to confirm what we want to believe, and then to send that data out to others to prove our point while ignoring the larger world. We are so wired to want to think we are making a difference that we constantly fail to discover whether it is true. Be better than what you are wired to believe and force yourself to think in terms of the null assumption. Think in terms of purposely looking at the opposite of what you are trying to prove or what you believe. The worst case is that you have spent a few more moments and truly confirmed what you believe. The best case is that you have changed your world view and gotten a better result, one that was not arrived at simply because you expected to arrive at that point.

Are you smarter than your trashcan?

This may seem like a really inane question, but think about it for a second. We are thinking, breathing beings; how can we possibly not be as smart as our trashcan?

First, let’s establish how we would measure this. We like to pretend that how smart someone is runs from nothing to absolute, but it doesn’t. Having no knowledge and doing nothing is far better than using bad knowledge, or thinking you have answered something correctly when you haven’t, or overreacting. We are fallible, and as such, we make bad decisions from time to time. Poor judgment, biases, and misinformation actively detract from an outcome, where zero knowledge has no impact at all. The scale for how smart you are is not one of 0 to 10, but one of -10 to 10, with 0 as the neutral midpoint.

Your trashcan does not offer any knowledge, it does not react, and it does not have biases. It does not push its agenda, nor is it influenced subconsciously by the agenda of others. It is not wired to want to rationalize its own actions or to want to prove the value of its actions to itself or others. It is not impacted by Maslow’s hierarchy any more than it is by fear, or greed, or lust, or any of the other ways that we are wired to be influenced. Anything that goes into it, it can dump out just as easily. It doesn’t reject knowledge, nor does it change to fit the mood of the room. It does not provide any value and it has no knowledge, so it will always be stuck at 0 on that scale.

Now, human beings are capable of amazing things: we have built great monuments, civilizations, history, art, cars; we have done it all. We have also produced war, greed, genocide, hate, and bigotry, and believed many crazy things. Those are the end points of what we are capable of, but I am not referring to the theoretical; I am referring to this moment. You can be anywhere from the greatest level (10) to the lowest delusion (-10). We all constantly move up and down that continuum with each action we take, but do you know where you are at any given time?

How do we process information and how does it impact the decisions we make? How does it impact our view of ourselves and the world?

People like Philip Zimbardo and Robert Levine have shown that we are wired to look at the future, or the present, or the past, but fail to hold multiple perspectives at once. We get too caught up in reacting to today, or in planning for the future. We are full of biases and self-delusions, and even worse, knowing this in no way stops them from changing how we view the world around us. We know that we have made mistakes in the past and that we will make them in the future, but how do you know that right now you are making one? We lose perspective, and because of this the meaning of the data we use to make decisions changes constantly. We fail to balance what we are doing with where we are going. To make up for this, we make assumptions, we rationalize, we ignore data, and we find things to confirm what we want. We are so wired to confirm what we do that we ignore the majority of the information from the world around us.

This impacts everyone.

So I ask you, right now, not tomorrow or 20 minutes from now, are you smarter than your trashcan?

You can’t answer with what you are capable of, nor can you answer with what you have done in the past. In the here and now, how do you know that the answer is positive? How do you know if you are currently adding value, or removing it? Are you really doing the right thing? Or are you just using misinformation, biases, and self-delusion to convince yourself that you are above 0 while all those other people are below it? Are you letting those biases rule what you see and lull you into thinking you are smarter than the trashcan? Or are you really making an impact? Is your impact real, or is it hubris?

The only way to really improve your chances of being above zero is to put in place a system that limits the impact of those biases and gives you insight into your own decisions. You have to be humble enough to measure your decisions in context, away from any in-the-moment manipulations, with a system that lets you know the efficiency of your choices. You’re never going to be perfect, but it’s up to you to make sure that you aren’t just calling trash gold.

Why we do what we do: Fighting fear – Loss Aversion

One of the challenges that just about any group new to testing faces is getting buy-in and support from other groups, usually against strong opposition from UX and branding teams, but also from just about any other group that interacts with the site. The largest reason for this is Loss Aversion, or “the disutility of giving up an object is greater than the utility associated with acquiring it.” To put it simply, we get so caught up in what we might lose that we miss what we could gain. People fear the loss of control that opening their ideas up to analysis brings, and with that fear come some of the biggest hurdles a program needs to overcome.

How many times have you tried to get sign-off from a new group only to have them push back or say it doesn’t “feel right”? How many times have you wanted to bring testing to new people only to have them shy away the first moment something they were sure would win loses? The irony is that some of the most ardent supporters of testing in mature programs are the very people who challenged those programs at the start. Anyone who has built a reputation on artistic talent, or on declaring “here is how we are going to do this,” has to overcome the fear of losing that control in order to gain the power and efficiency that testing can bring.

How many times have you gone into a conference room to brainstorm? Or gone to an offsite just to come back with a long list of who you are going to target, or some written rules on color guidelines or the like? A lot of hard work goes into those efforts, but the problem is that they are filled with assumptions and compromise. They are designed either to make everyone feel like they contributed, or to make the HiPPO happy. What testing does, in the best cases, is stop that cycle, so that you are no longer trying to figure out the one way to make things work, but instead having open discussions about what is feasible. It democratizes ideas and is agnostic as to their value. The entire point is to measure the value of each idea against the others and figure out the best one to go with. There is a great deal of fear: What if I am wrong? Does this make me look bad? My way has always been “right,” and so on. People have built empires on this fallacy, often with no one holding them accountable for the actual value of those ideas.

So how do you fight this? The first thing you must do is get everyone to agree on what you are trying to accomplish. This has to be a single thing that you can measure and that is universal across the site (this is not about group A versus group B; this is about everyone working together to improve the site). I have seen so many programs struggle, get no value, or end up in political quagmires simply because they refused this first step. It can be very difficult or very easy, but at the end of the day, the single greatest determinant of future success for optimization is agreement on what you are trying to accomplish.

Once you have that measure, it becomes about taking those ideas and evaluating them against that goal. Remember that you want to challenge the common theory, meaning you should include null assumptions and things that contradict what you think will win. It does you no good to align everyone on finding an answer to a question if that question is irrelevant or suboptimal. What is funny is that there is almost an inverse correlation between what people think will win and what does win. Get opinions from multiple sources, especially one or two people from outside the group that has owned that concept or portion of the site.

Step three is simply to test. At the end of the test, don’t worry if you were wrong, and don’t make it about you versus me. This is about everyone working together to find what works. If everyone is working towards the same goal, then it is easy for everyone to get the credit and for everyone to align. The entire point is that the better the ideas fed into the system, and the more diverse and risky those ideas are, the more you learn and the better your results will be. You have to stop worrying about who was right and instead encourage people to be wrong. Being wrong gives you so much more than being right, and it gives you new learning to share and bring value to other parts of the site.

If you do this enough, you will get to the point where you no longer need those large conferences or offsites; you just need to compile the feasible options and move forward, letting the test tell you where to go. It becomes less about trying to fit the square peg into the round hole (or, in some cases, into no hole) and more about aligning to move forward with what you learn. You will not end up at the feared “41 shades of blue” extreme; instead you will treat all feasible ideas as valuable. It frees up your UX and creative teams to try new things without worrying about upsetting their superiors. It allows them the flexibility and the ability to be “wrong.”

Everyone is fearful of the unknown and of the risk of giving something up. What is important is to share the challenge and the reward, to make testing about adding value to what people were already doing, and to not blame anyone when they are wrong. Testing an idea is not about losing control of it; it is about helping to make it as successful as possible. This can’t be about you versus them, or my idea versus yours; to succeed you need everyone working towards the same goal. Any system is only as good as its input, and any input without a proper system to facilitate it will always lack value in the end. Encourage new ideas and encourage trial and error; as long as you have a system in place to mitigate loss, you have so much more to gain from learning a new path or stopping a bad practice on your site.

Why we do what we do: Forced Reality – Conjunction Fallacy

One of the funnier tricks of the human mind is the urge to pigeonhole, to describe things in as much detail as possible. While stereotypes and other harmful versions of this exist, the inverse is usually far more likely to cause havoc with your optimization program, and as such it is the next bias you need to be aware of: the Conjunction Fallacy, or “the tendency to assume that specific conditions are more probable than general ones.”

The classic example of this fallacy is to ask someone, “Which is more likely to be true of a person on your site? Did they come from search, or did they come from your paid search campaign code, land on your #3 landing page, and then look at 3 pages before entering your funnel?” Statistically, there is no way for the second statement to be more likely than the first, since the first incorporates the second and a much larger audience, meaning the scale is magnitudes greater. Yet we oftentimes find ourselves trying to think or do analysis in the most detailed terms possible, hoping that some persona or other minute sub-segment is somehow more likely to be valuable than the much larger population.

This mental worm tends to make its appearance most often when groups set out to do segment analysis or to evaluate user groups. We dive into groups and try to figure out the rate of the actions we want to exploit. Whether it is an engagement score, category affinity, or even simple campaign analysis, we dive so deep into the weeds that we miss a very simple truth: if the group is not large enough, then no matter what work we do, it is never going to be worth the time and effort to exploit it for revenue. The other trap is the inability, or unwillingness, to group these same users into larger groups that may be far more valuable to interact with. Whether it is people who have looked at 5 category pages and signed up for newsletters, or other inefficiently detailed slices, you need to always keep an eye on your ability to do something with the data.

This also plays out in your biases about which types of tests you run. Even if Internet Explorer and Firefox users may be worth more, or be more exploitable, than campaign code 784567, which is only 2% of your users, this bias makes you want to target that specific campaign group so much more, both as a sign of your great abilities and because we want to be more specific in our interactions with people. Even if the small group is much more exploitable, the smaller scale of impact still makes it far less valuable to your site.

Here are some very simple rules for segmentation that will help you combat this fallacy:

1) Test all content against all feasible segments; never presuppose that you are targeting group X.

2) Measure all segmentation and targeting against the whole, so that you have the same scale on which to compare relative impact (see the sketch after this list).

3) All segments need to be actionable and comparable, meaning the smallest segments should generally be greater than 7-10% of your population, depending on your traffic volume.

4) Segments need to incorporate more than site behaviors and routes to the site; try to include segments of all descriptions in your analysis. Just because you want to target a specific behavior does not mean that behaviors have more value than non-controllable dimensions such as time of day.

5) Be very, very excited when you prove your assumptions wrong about which segment matters most or is the best descriptor of exploitable user behavior.
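Here is a small sketch of rules 2 and 3 in practice. The segment names, numbers, and the 7% floor below are all illustrative assumptions; the mechanics, weighting each segment’s lift by its share of the whole population so that everything sits on one comparable scale, are the point.

```python
# Measure every segment against the whole site (rule 2) and flag segments
# too small to act on (rule 3). All figures here are invented.
total_sessions = 200_000
total_revenue = 420_000.0
site_rpv = total_revenue / total_sessions  # revenue per visitor, whole site

segments = {
    "campaign_784567": {"sessions": 4_000,  "revenue": 12_000.0},
    "firefox_users":   {"sessions": 36_000, "revenue": 82_800.0},
    "evening_traffic": {"sessions": 60_000, "revenue": 138_000.0},
}

MIN_SHARE = 0.07  # smallest actionable segment, per rule 3 (illustrative)

for name, s in segments.items():
    share = s["sessions"] / total_sessions
    rpv = s["revenue"] / s["sessions"]
    # Relative impact: the segment's lift weighted by its share of the whole,
    # so every segment is compared on the same scale.
    weighted_lift = (rpv - site_rpv) / site_rpv * share * 100
    actionable = share >= MIN_SHARE
    print(f"{name:16s} share={share:5.1%}  weighted_lift={weighted_lift:+.2f}%  "
          f"actionable={actionable}")
```

In these invented numbers, the tiny campaign code has the best per-visitor value but the worst weighted impact and fails the actionability floor, which is exactly the conjunction trap the rules are there to catch.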

If you follow those rules, you will get more value from your segment interactions and you will stop yourself from falling into this pit trap. We oftentimes have to force a system on ourselves to ensure that we are being better than we really are, but when it is over, we can look back and see how far we have come and how much we grew because of that discipline. Revel in those moments, as they will be the things that bring the greatest value to you and your program.