How Analysis Goes Wrong: The Week in Awful Analysis – Week #5

How Analysis Goes Wrong is a new weekly series focused on evaluating common forms of business analysis. All evaluation of the analysis is done with one goal in mind: does the analysis present a solid case that spending resources in the manner recommended will generate more additional revenue than any other action the company could take with the same resources? The goal here is not to knock down analytics; it is to help highlight those that are unknowingly damaging the credibility of the rational use of data. What you don't do is often more important than what you do choose to do. All names and figures have been altered where appropriate to mask the "guilt".

Earlier today I once again heard the same common mistake in determining where to test, and it reminded me of the topic for this next Awful Analysis. It is incredibly easy to find data that you believe is "interesting" or "compelling" but that in no way actually makes the rational argument you are attributing to it. An example is as follows:

Analysis: We found that people who interact with internal search spend 1/3 as much as people who don't. This tells us that there is a massive opportunity to optimize internal search and increase total revenue.

This is probably the most common way people search for "test ideas", as it sounds like a rational argument and it uses real numbers. The problem is that the data presented is a non sequitur as far as what to test. These types of arguments make for interesting stories, and I am not suggesting you never leverage them. My problem is when people start to believe the story and do not realize just how irrelevant the information presented is.

As a reminder, knowing the efficiency or value of a test requires three pieces of information: population, influence, and cost. With that in mind, I want to dive into this type of analysis:

1) You have no way of knowing from the analysis (or any correlative information) whether people who spend less do so BECAUSE of their use of internal search, or whether people who are going to spend less are the ones who aren't quite sure what they are looking for and instead choose to buy cheaper things. It could also be that you see a lower RPV because people doing research ARE far more likely to use search to compare items. (A minimal simulation after this list shows how self-selection alone can produce exactly this pattern.)

2) Even if you are right that this group is more valuable, you have no clue whether the search results page is the place to influence them, or whether it is the entry channel. Or the landing page? Or maybe the product page?

3) You have presented no evidence of your ability to influence that group, even if you ignore #1 and #2. Even if you have the perfect group and the perfect place, you still have no insight into what to actually change.

4) There is nothing presented showing that this same group cannot be improved far more dramatically by looking at and interacting with them based on other population dimensions. Last I checked, new users, search users, purchasers, and IE users also use internal search.

5) There is no look at the cost to change this page (and population) versus known results, or even at the technical ramifications. Search results pages are often among the one or two hardest pages to test, simply from a technical-resources and page-interaction standpoint.
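
To make point #1 concrete, here is a minimal simulation, with every number invented for illustration, in which internal search has zero causal effect on spend, yet the segment comparison still "shows" search users spending roughly 1/3 as much. Self-selection alone creates the gap:

```python
import random

# A minimal sketch of point #1: self-selection alone can make search
# users "spend 1/3 as much" even when search itself changes nothing.
# All rates and dollar amounts are invented for illustration.
random.seed(1)
visitors = 100_000
search_spend = nonsearch_spend = 0.0
search_n = nonsearch_n = 0

for _ in range(visitors):
    # Intent drives BOTH behaviors: undecided browsers use search more
    # and spend less. Search has no causal effect on spend here.
    undecided = random.random() < 0.5
    uses_search = random.random() < (0.8 if undecided else 0.1)
    spend = random.gauss(20, 5) if undecided else random.gauss(90, 15)
    if uses_search:
        search_spend += spend
        search_n += 1
    else:
        nonsearch_spend += spend
        nonsearch_n += 1

print(f"Search users RPV:     ${search_spend / search_n:.2f}")
print(f"Non-search users RPV: ${nonsearch_spend / nonsearch_n:.2f}")
# Search users show far lower spend, yet "optimizing search" cannot
# close this gap, because search never caused the difference.
```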

More than anything, the danger of this type of analysis is that it sounds perfectly rational. Who wouldn't want to "fix" a group that is spending 1/3 as much as another group? Why aren't all users spending their entire paycheck on my site and my site only? As the analyst, you have to make sure that you are presenting rational, fact-based data if you expect anyone else to leverage data in a rational manner. You might be right or you might be wrong, but if you do not stop yourself from falling for these stories and do not hold yourself to a higher standard, then how can you expect anyone else to? If you are going to go find data to tell a story, then what is the point of the data other than to pit your opinions against someone else's?

RANT: Quora question – How has A/B testing evolved over the years?

If we look at just online A/B testing, I think there have been a number of major changes over the past few years. Looking back, testing started out as a "cool" feature that a few companies did, available only through internal IT setups or from a very select few vendors who offered very limited testing options. During this time you had major debates on things like partial versus full factorial testing, data waves, efficiency, and iterative testing. You did not have a ton of segmentation built into tools, and for the most part tools required a bit more knowledge of how to work with different groups. You also had the first wave of people claiming to be "experts" starting to saturate the market.

If you look now, we have many more tools and much richer information to segment and target against. You have an abundance of tools available, and the cost to get these tools live has gone down dramatically. Testing has gone from the few to the many, and as such more people are far more interested in the impact that testing can have…

The problem is that with the growth of tools and share of mind, there has not been an equal growth in understanding of data or testing discipline. We have created access to tools, made promises, and supported people who have no clue what they are doing. Instead of a few tools trying to create the best product available, the market is saturated, and tools instead focus on the lowest common denominator with things like GUIs, integrations, and very bad advice designed to make companies feel like what they want to do is actually going to drive revenue.

The power of these tools is light years ahead of where they were just 5-6 years ago, but the actual value derived by most organizations has dropped precipitously as focus shifts from discipline to things like targeting, or handing control to your analytics team, or content optimization. Even worse, the era of the "expert" has exploded, with everyone and their brother talking about "best practices" that are nothing more than empty excuses for you to run their "test ideas". They seek out validation for their ideas from empty knowledge bases like WhichTestWon. Personalization has become a common refrain, but there is so little understanding of what is needed to actually derive value from discipline that of the last 7 groups whose program outcomes I looked into, all 7 were losing money, not even coming out neutral. In reality, most testing is now nothing more than a function or commodity for organizations, which believe that the mere fact of having a tool or running a test somehow correlates to the value derived.

As the market saturates with tools, the knowledge gap has become the driving factor in determining success. With so many programs unknowingly failing at just about everything they do, the difference between the "haves" and the "have nots" has become critical. My favorite axiom, "you can fail with any tool; it is only when you are trying to succeed that the tools matter," has never been more true.

That is not to say that all is lost, because there have been two developments that look great for the future. The first is that some of the tools in the marketplace now allow for much more value than ever before. The ability to segment and look for causal inference, and the move away from confidence as a blind measure of outcome, have been great advancements that allow organizations to make much better decisions. While a majority of the market is lowering the common denominator in order to make groups feel better about their results, there are equally a few groups attempting to raise the bar and derive more value, not more myths. The second is the growth of N-armed-bandit-style yield optimization options hitting the market (a minimal sketch of the idea follows below). The farther we move away from opinion dictating outcomes, and the closer we get to rational uses of data, the more value can be achieved and the more people get used to the fact that their opinion or their test ideas are pretty much useless.
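
For those curious what that bandit style looks like mechanically, here is a minimal epsilon-greedy sketch. This is not any particular vendor's algorithm; the arm names, "true" conversion rates, and epsilon value are all invented assumptions:

```python
import random

# Minimal epsilon-greedy bandit sketch. Purely illustrative: the arms,
# their true conversion rates, and EPSILON are invented assumptions.
ARMS = ["experience_a", "experience_b", "experience_c"]
TRUE_RATE = {"experience_a": 0.050, "experience_b": 0.065, "experience_c": 0.040}
EPSILON = 0.10  # fraction of traffic reserved for exploration

counts = {arm: 0 for arm in ARMS}
rewards = {arm: 0.0 for arm in ARMS}

def observed_rate(arm):
    # Untried arms score infinity so every arm gets sampled at least once.
    return rewards[arm] / counts[arm] if counts[arm] else float("inf")

def choose_arm():
    # Explore at random with probability EPSILON, otherwise exploit the best.
    if random.random() < EPSILON:
        return random.choice(ARMS)
    return max(ARMS, key=observed_rate)

random.seed(42)
for _ in range(50_000):
    arm = choose_arm()
    counts[arm] += 1
    # Stand-in for a real visitor converting or not.
    rewards[arm] += 1.0 if random.random() < TRUE_RATE[arm] else 0.0

for arm in ARMS:
    print(f"{arm}: served {counts[arm]:>6}, observed rate {rewards[arm] / counts[arm]:.3%}")
```

The point is that traffic allocation itself becomes the decision: yield shifts toward what performs, rather than waiting on someone's opinion of a "winner".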

My sincere hope is that in another year or two we have moved past this personalization insanity and are instead talking about dynamic user experiences. That we have stopped talking about data integrations and are instead talking about data discipline. That tools stop trying to tell people how easy and fast it is to get a test live, and instead focus on the parts of testing where the value comes from. More than anything, I hope that testing matures to the point that people fully understand that it is a completely separate discipline, one that requires completely different ways of thinking about problems than analytics, or traditional marketing, or IT, or product management, or really anything else out there. Testing can provide far more value with far fewer resources than just about anything else an organization can do, but it is going to take the maturation of the entire testing world for people to stop being led by bad marketing messages instead of by results. Time will tell where things go from here.

How Analysis Goes Wrong: The Week in Awful Analysis – Week #4

How Analysis Goes Wrong is a new weekly series focused on evaluating common forms of business analysis. All evaluation of the analysis is done with one goal in mind: does the analysis present a solid case that spending resources in the manner recommended will generate more additional revenue than any other action the company could take with the same resources? The goal here is not to knock down analytics; it is to help highlight those that are unknowingly damaging the credibility of the rational use of data. What you don't do is often more important than what you do choose to do. All names and figures have been altered where appropriate to mask the "guilt".

For this week's How Analysis Goes Wrong, I will be directly tackling a suggestion made on one of the more "popular" testing websites in the space. I will not be touching on everything that is wrong with the actual evaluation, as my problems with WhichTestWon are long stated and it is just too easy to poke holes in. Needless to say, we have no idea if the test was called by the correct use of data or by just blindly following confidence (a quick simulation below shows why that distinction matters), nor do we know about other feasible alternatives, how the scale of impact relates to other tests for that organization, or even what the population and time frame of the test were. Without any of that information, the entire practice is purely ego fulfillment and provides no functional information of use to a company.
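
On the "blindly following confidence" point, here is a minimal A/A simulation, with invented traffic numbers, in which both variants are identical yet peeking every day and stopping at the first 95%-confident result declares a "winner" far more often than the nominal 5%:

```python
import math
import random

# A/A peeking sketch: both variants convert at the same rate, yet
# stopping at the first "significant" daily peek inflates false wins.
# All traffic numbers here are invented for illustration.
TRUE_RATE = 0.05
VISITORS_PER_DAY = 500   # per variant
DAYS = 20
Z_CUTOFF = 1.96          # two-sided 95% confidence

def peeking_declares_winner():
    conv_a = conv_b = n = 0
    for _ in range(DAYS):
        n += VISITORS_PER_DAY
        conv_a += sum(random.random() < TRUE_RATE for _ in range(VISITORS_PER_DAY))
        conv_b += sum(random.random() < TRUE_RATE for _ in range(VISITORS_PER_DAY))
        pooled = (conv_a + conv_b) / (2 * n)
        se = math.sqrt(2 * pooled * (1 - pooled) / n)
        if se > 0 and abs(conv_a - conv_b) / n / se > Z_CUTOFF:
            return True  # test "called" on this peek
    return False

random.seed(7)
trials = 500
false_calls = sum(peeking_declares_winner() for _ in range(trials))
print(f"Declared a winner in {false_calls / trials:.1%} of identical A/A tests")
# Expect well above the 5% a single fixed-horizon evaluation would give.
```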

In this case, you can find the example here. The specific comment in question is the 5th one listed, by an Igor. I understand the trollish nature of all online comments, but because this seems to be presented as straight-faced as possible, I have no choice but to evaluate it as if it were intended as legitimate, meaningful analysis. The comment in question is:

“I also picked green mostly because it created a sense of appetite. Blue was a bit too disconnected, didnt cause any emotional response even though I also use a blue button on my ecommerce site. I guess I’ll have to change it 🙂

Based solely on my experience, I’d say here it wasnt a question of CTA visibility (since they tested orange too) but the green color’s ability to initiate an emotional response of freshness, airiness, freedom. Orange would cause an emotion of warm, a bit heavy sensation and overly stimulating.

Considering that we’re bombarded with heavy colors in supermarkets, we may be looking for a way to feel less forced to make a decision online, and green seems to be the color of choice…especially this particular shade of green.”

Understand that I am in no way diving into my beliefs about color theory. I am honestly agnostic about its validity, as it is important that what wins is not biased by prior beliefs. We are only looking into the value of the "analysis" presented as it pertains to acting on the results of this specific test. Because of the sheer scope of the problems here, I am going to highlight only the top ones.

1) He assumes something that works on another site will work on his.

2) He assumes he knows why it changed from a single data point.

3) He starts a massive non sequitur time sink about the supermarket colors and the “forced” decision online.

4) He reads the result as being about green as a whole, not the specific shade or simply the execution of the colors. I am trying to ignore the entire "it is only two colors, so you cannot tell anything about whether this was the best use of resources" problem, but even setting that aside, it is a single data point.

5) He assumes the change had something to do with an emotional response and not the millions of other possible explanations.

6) The entire test measured conversion rate, not revenue, meaning all the conclusions drawn could point to something that loses revenue for the company. You can never assume that more conversions means more revenue, or that there is a linear relation between any two metrics (a worked example follows after this list).

7) He almost completely ignores interactions with other elements.
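
Point #6 deserves a quick worked example. This is a minimal sketch with invented numbers showing a variant that "wins" on conversion rate while losing revenue:

```python
# Minimal sketch of why more conversions does not equal more revenue.
# All numbers are invented for illustration.
visitors = 10_000

variants = {
    # name: (conversion rate, average order value)
    "control":    (0.040, 100.00),
    "challenger": (0.050, 70.00),   # higher conversion rate "wins" the test
}

for name, (cvr, aov) in variants.items():
    orders = visitors * cvr
    revenue = orders * aov
    rpv = revenue / visitors  # revenue per visitor, the metric that matters
    print(f"{name:>10}: {orders:>3.0f} orders, ${revenue:>6,.0f} revenue, RPV ${rpv:.2f}")

# control:    400 orders, $40,000 revenue, RPV $4.00
# challenger: 500 orders, $35,000 revenue, RPV $3.50
```

Calling the challenger a winner here costs the company $0.50 per visitor.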

With the large amount of storytelling and absolutely nothing presented that adds value to the conversation, the entire purpose of exchanges like this is to make it sound like you are an expert on something without presenting credible evidence for that claim. If you are right, then a test will show your choice is best amongst all feasible alternatives. If you are wrong, then who cares what you think? In all cases, storytelling belongs in kindergarten, not the business world.

Sadly, that is not the end of it. The follow-up comment shows why so much of the business world consists of people who exist solely for the propagation of people just like them:

Thanks for the insights Igor, I’m a huge fan of color theory and how it impacts us psychologically. Glad someone brought this up!

We have just seen the propagation of agendas in action. No one added anything to the conversation, no one presented anything resembling rational data, nor did they present anything that could possibly be used to rationally make a better decision in the future, yet both feel justified that this is the value they present to a conversation. The only nice part is that people like this have made my job, turning organizations around and showing them how to get magnitudes-higher outcomes simply by betting against people, so easy. Without them, I too wouldn't have a job. We are all connected on some level…

If there is an analysis that you would like to have reviewed, privately or publicly, you can send an email directly to antfoodz@gmail.com

Why We Do What We Do: What Do You Really Know? The Dunning-Kruger Effect

Most people are familiar with the famous Bertrand Russell quote, "The whole problem with the world is that fools and fanatics are always so certain of themselves, and wiser people so full of doubts." The challenge is: how would you know where you fall in that paradigm? Are you the fool or are you the wise? How do you even know your real level of competence at any one moment? What is the difference between an expert and the average person?

One of the most famous psychological findings of the last few years is the Dunning-Kruger effect. Its best description is:

The Dunning–Kruger effect is a cognitive bias in which unskilled people make poor decisions and reach erroneous conclusions, but their incompetence denies them the metacognitive ability to recognize their mistakes. The unskilled therefore suffer from illusory superiority, rating their ability as above average, much higher than it actually is, while the highly skilled underrate their own abilities, suffering from illusory inferiority. To put it simply, when you don't know what you don't know, you have no ability to differentiate what is right or wrong.

Confidence is a tricky thing: you need it to be able to stand up to challengers, but at the same time you need to be careful not to inflate it based on impression rather than reality. Without confidence, we would never be able to convince others of any point we are trying to make. We have all dealt with people who obviously talked up their impact far more than reality could possibly support, but how do we know that we are not repeating the same mistake? Even worse, how do we know when others are playing on this psychological trick to take advantage of us, even if they do not consciously know they are doing it at the time?

The truth is that you will find far fewer real experts than those who claim to be. Statistically, an expert would be in the upper 5 or 10% of a given field, yet we have no way of measuring this, and we are overrun with experts claiming to be the best at what they do. Everyone thinks that what they are doing is the best way; otherwise they wouldn't be doing it. Even worse, we get caught up in all sorts of promises, be it other customers' claims or our own supposed skill at evaluating that information. As the world becomes more complex, or as people from outside disciplines attempt to take their prior knowledge and apply it to a new field, they become even more susceptible to this problem. Even worse, like all biases, this impacts more intelligent people more than less intelligent ones. Dunning-Kruger is a double-edged sword, as those most likely to be susceptible to the claims of experts are those least skilled in their own right:

“The skills needed to produce logically sound arguments, for instance, are the same skills that are necessary to recognize when a logically sound argument has been made. Thus, if people lack the skills to produce correct answers, they are also cursed with an inability to know when their answers, or anyone else’s, are right or wrong. They cannot recognize their responses as mistaken, or other people’s responses as superior to their own.”

This entire phenomenon is what causes the vicious cycles and explains the oversaturation in the analytics communities of people propagating the same tired actions by giving them new names and by finding others to make them feel good about their failed actions. Stories become the ultimate shortcut to show how amazing something is, without ever actually providing logical evidence to arrive at that conclusion. The truth is that people are rewarded for their ability to give those who already hold or held power ways to continue to run their empire, often with little relevance to getting results or doing the right thing. This is certainly not unique to analytics, but it is important to this audience. We follow the greatest speakers, not the greatest thinkers. We worry about the best ways to make a presentation or to get reports out, not about trying to stop entire conversations that are both negative for the organization and inefficient. We focus on how to get others to see things our way, not on whether we are seeing things the right way. Far more time is spent on how to convince others than on trying to analyze our own actions. One of the best quotes I have heard is, "There is no correlation between being a great speaker and having great ideas."

So in the end, we are responsible to ourselves to look in the mirror and ask whether we are suffering from a belief in others, or whether we are discovering the best answers. Is convincing others and yourself that you are an expert the most important action you can take? Just because you can make an analysis and use it to convince others, does that actually make it correct or valuable? Or is the discovery of the next right answer more important than getting credit for owning an action? It is up to everyone to decide what they really want to accomplish in their time, but if you are more interested in doing the right thing, then you must always be aware of Dunning-Kruger.

In order to do this, we must first set rules that help us hold ourselves and others accountable for what we do, to remove as much bias from evaluating success as possible.

Here are some simple steps to help make sure you are reaching the levels of success that you might believe that you are achieving:

1) Always ask, "In what ways can we challenge what we are doing?" or "How can I break this process?" No gain comes from doing things the exact same way you have been doing them.

2) Read, grow, look beyond your group. Know that you have never found the right answer, and the search is more important than the actual answer.

3) Define success up front. This is not just the goals your boss sets for you, but more importantly what will define a successful program.

4) Make sure you are not measuring the outcome, but your influence on the outcome.

5) Seek out those who will challenge everything you believe. You do not need to agree, but only talking to like-minded people is the fastest way to end up on the wrong side of Dunning-Kruger.

6) Assume that if you have not found a way to break a process in the last year or two, you are not trying hard enough.

7) Challenge everyone to take an idea to the next level. The first thing we come up with is comfortable. The next is growth.

8) Know that you will get an outcome from any action, so measuring just the outcome does not tell you anything about the value you bring.

9) If the words "I don't know" are the end of the conversation for you, then you can be sure you are suffering from this bias.

10) Most importantly, change all the rules, and challenge all the rules, not to be difficult, but because you only get better by making others around you better.

These may seem like abstract general concepts not directly related to your business or your day-to-day job, but the reality is that these are the actions that should define success there far more than the outside world does. Growth is the goal, not the status quo, and as such we need to make change and going outside our comfort zones the priority, not re-wording past actions as new in order to convince others or yourself that you have changed. Take others past their comfort zone and they will take you past your own. Keep getting better, and always know that you are never done and that you do not have the "correct" answer. Keep searching, and always question those around you, and you will always be vigilant against falling onto the wrong end of Dunning-Kruger.

Testing can be the ultimate expression of this: you are free to test things far past your current comfort zone. You are free not to validate tired ideas but to explore and discover, in a rational and predetermined way, the actual value of things, not just the perceived value. To do this, though, you must fundamentally want and prepare to discover these things. The greatest problem with most test programs is that they never enable themselves to find out they are wrong, but instead focus on proving someone right.

The nice part is that just because you or someone you know suffers from Dunning-Kruger, it does not mean they always will. Every person you meet thinks they are doing the right thing, even when they are not. Change the conversation to the end goal, and talk about all the options in front of you, and you can get past the egos that keep conversations from truly moving forward. Take the time to talk to and challenge people, and do not trust anyone who does not challenge you. You have many impartial tools that allow you to measure things and to work with others, but these tools only work when we use them in an unbiased manner, not to tell us what we want to hear.