At any given time you will often hear me quoting famous quotes about anything and everything; that is how I relate to new experiences, by trying to tie them to some bit of knowledge that I had already picked up. Probably my most common refrain lately is the famous Mike Tyson quote, “Everyone has a plan until they get punched in the face.” It’s true that everyone talks about doing the right thing and everyone wants things to succeed, but as soon as there is some challenge to the prevailing world view or as soon as a small bump in the road exists, people often revert to what they know best and turn back in on themselves. Unfortunately this is especially problematic in the business world, as the only way to move forward is to change behaviors and tackle existing problems in new ways. Even more distressing is that as people fall back to what they are most comfortable with they turn towards their own disciplines and their own previous experience, limiting the ability for people of disparate talents and backgrounds to work together.
One of the things that defines people is the concept of viewing the world through their own experiences, and the most powerful experiences that we have in the modern world are our professions. Be it marketing, or engineering, management or data, we all view the world through the lens of the things we do and the challenges that we face day to day. We view the challenge of improving numbers by looking to “dialogue with our customers” or “increase efficiency through data analysis” or by “building better tools and a better user experience”. All of these in isolation seem like and often are very good ideas, except when they cloud our ability to prioritize and to focus on a single outcome. Each day in the business world is really a Sisyphean climb to the top and each time that boulder rolls back on us we run back to that which we are most comfortable with. This is especially dangerous when we do not even have true accountability for the tie between those concepts and the functional bottom line outcome that we need to generate.
Abraham Maslow is famous for many things, from his hierarchy of needs to his many contributions to modern psychology. What he is often not associated with is a quote that almost everyone is familiar with, “if all you have is a hammer, everything looks like a nail.” We are all carrying hammers in the form of our world views and our professional disciplines. The key is to accept that there are many things outside of what we accept as “true” about the way to do things and about how to tackle problems. Even more, when we do get evidence that does not directly correlate with our existing world view we cannot dismiss it or try to understand it through that same tired lens.
Optimization at its core is the act of adding accountability to these world views and of challenging assumptions. It is about taking the existing practices of the entire organization and standing them on end, shaking them, and finding all the holes and least effective parts. It does this not maliciously but as a mutual benefit to everyone, to add a different point of view on the functions and actions that they are taking. This is why the discipline of testing is about everything but test ideas. It is about building rational rules of action and building out alternative hypotheses. This is why you focus on efficiency and multiple options and not just on what won, and not on what won elsewhere or on some great idea someone had. It is why it is about patterns and not about some artificial reasoning why something won. You serve the organization a great disservice when all you do is regurgitate the nail back to someone so that they can then hit it with the same tired hammer. Optimization is the act of putting any idea and discipline through a system that allows for it to get better and for everyone to learn and to get better results.
At the same time, it is important to understand that everyone else is viewing the world through a very different lens. They are trying to tie their past experiences with new actions and new results. A marketer has always thought in terms of a dialogue with a certain user or a certain persona. That mental model has gotten them where they are today. When you come in and show that there might be more effective ways to look at those same users, or that the concept of personalization most likely will not work the way they envision, you are creating a very powerful form of cognitive dissonance and you are forcing people outside of that hammer that they so readily wield. Too much and you will cause major push back and possibly form an ongoing barrier to success. Too little push and you are just confirming their biases and not providing any assistance.
The key in this and in all actions is to be firm on discipline but flexible on tactics. Work with the concepts and push them past their existing barriers. This is why it is so vital to not focus on test ideas when building out a successful test. Talk about what people were already focusing on and how best you can test out that concept against many others. You want to do personalization, great, here is how we take what you were doing and serve that and other concepts to everyone. If you are right, we get to see that and if you are wrong then we found something that is better. In reality there is no downside to performance when we tackle a problem that way. It is about reaching the ends, not about the means that get you there.
Another key to this is to get people to vote on what they think will win for each test. If you do this enough and with enough varied options, you will be amazed at just how bad people are at guessing the right answer. In the last 9 tests that I have done we have averaged 8 options for each test, with some variants coming from the team, some from myself, but a great many simply expressions of the various directions that are feasible. I have asked a large team to pick their favorite and second favorite. In those 9 tests, we have had exactly 1 second place vote for all of the winners combined, and the only reason that the option got that vote was because my very talented designer picked up on the pattern and voted for her least favorite. The shock of where we are versus where people thought we would be and the impact to the bottom line (over 200% improvement) has helped open doors to new ways of tackling problems, and it has done so organically.
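To put that voting record in perspective, here is a rough sketch of what blind guessing would look like. It assumes exactly 8 options in every test (the tests above only averaged 8) and a voter who picks a favorite and second favorite purely at random; the variable names are made up for this illustration.

```python
OPTIONS_PER_TEST = 8   # assumed exact here; the tests described only averaged 8
TESTS = 9
PICKS_PER_VOTER = 2    # favorite and second favorite

# Chance that one of a voter's two random picks is the winner of a single test.
p_hit = PICKS_PER_VOTER / OPTIONS_PER_TEST

# Expected number of tests (out of 9) where a random voter names the winner.
expected_hits = TESTS * p_hit

# Chance that a random voter never names a winner in any of the 9 tests.
p_zero_hits = (1 - p_hit) ** TESTS

print(f"Chance of naming the winner in one test: {p_hit:.0%}")
print(f"Expected winners named over {TESTS} tests: {expected_hits:.2f}")
print(f"Chance of naming no winners at all: {p_zero_hits:.1%}")
```

Under those assumptions a single person guessing blindly would expect to name roughly two winners across the 9 tests, so one second place vote from an entire team is worse than chance.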
In both tactics you are giving people the chance to tie their world view in with the results and letting them have a stake in the outcome. You are welcoming that hammer they wield but helping them see that there are many different nails to hit.
Keep in mind however that you are just as guilty as they are. Spend too much time in the world of optimization and you will start to feel like no one has any idea what they are doing and that all ideas are going to fail. It is even more important for you to challenge yourself and for you to go beyond your comfort zone in where you let testing go. Make sure you include ideas from others as much as possible, even if you are sure they are not going to work. Make sure you tie optimization in on actions that you feel might not be comfortable or worth your time. Remember that the smarter someone is, the more likely they are to be impacted by biases and that you serve no good to the organization or yourself if you are not more vigilant against your own biases than you are against others.
You are driving down a road when your GPS tells you to turn left. You make a sudden motion, finding yourself down a small side road. It doesn’t look like where you are trying to go, but you have to follow your GPS; otherwise, you will get lost. You continue, then your GPS tells you to go right. There isn’t a road there, and because you are stuck doing only what the GPS tells you, you turn and suddenly find yourself running off a cliff, flying to your demise in a canyon below. Sound like a bad TV skit? The reality is that this is how most people leverage their “roadmaps” in terms of how they run their optimization programs.
While hypothesis is still the most misunderstood term in all of optimization, the most abused may be roadmap. So many different groups claim they have a roadmap or to be following a roadmap or that it is on their “roadmap” and yet so few understand how one is meant to be used. A roadmap (little r) is a list of tests, most of which serve as a great excuse to waste time and effort and to get locked into a system of projects. A Roadmap (capital R) is a constantly shifting list of priorities by which you will create actions and act to discover where to go next. This distinction is fundamental if you have any hope of really achieving great results with your program, and yet so many happily focus on the first for the sake of internal processes or the inability to change how their optimization program operates in producing revenue.
Let’s start with what the goal of optimization is. It is not to run a test.
Tests are a means to an end.
The goal of an optimization program is to produce the maximum amount of revenue for whatever resources you spend on it. The same is true of every effort you do, be it personalization, SEO, content creation or a promotion. You are not just doing it because it is fun, you are doing those things to increase the revenue to your organization. This means that those are just tactics and not an end unto themselves. This is fundamental to understanding the difference between a roadmap and a Roadmap.
Anytime we confuse the action for the end goal, we lose almost all possible value because we have lost the ability to go in any other direction. When we get stuck on a review process and a large series of tests we are making the decision to focus on the action and not the value it generates. You become a means to empty action, not a means to the end of generating revenue. You are saying, at that point, that you couldn’t care less if you make money, so long as these few specific tests get run.
If you instead focus on the end goal, then the first and most important piece is to discover how best to do that. You may have some test ideas and some things you are going to execute on, but they are fungible. You must and will constantly shift them as you learn more and as you go in new directions. You cannot be stuck on the path if the end goal is the most important, you must instead focus on the discipline and flexibility to go anywhere the data tells you.
This is why a Roadmap is just a series of places to focus. It might be on personalizing an experience, or improving a product page, or on improving your recommendation system, but that is what you are trying to do. You are hoping that doing that will result in more revenue, but you are not tied to specific tactics, just finding the best way to accomplish the end goal. Often times you will have no more than 1 or at most 2 tests for each area when you start, but you plan out the time to shift and the time to continue down any path that presents itself to you. From there you can work out actions which will produce answers, things like inclusion/exclusion testing, or MVT, or content serving so that you can measure the value of different alternatives. At that point, you then focus on whatever the answers you have are and continue to drive forward based on those results.
The amazing or frustrating part of this, depending on which approach you are used to, is that you never know where you will end up. You might end up with a dynamic layout for your product page, or targeting content based on time of day, or on removing your recommendations system from a page. The farther you end up from where you imagined the more revenue you make. Each step that takes you in a new direction can only do so by proving using rational measurements that it outperforms where you thought you were going to go. You can end up just about anywhere and that is what makes it so powerful.
The most common refrain you get when tackling problems this way is that it is hard to plan resources, but that argument just does not hold water. You know you are going to test and you know you are going to need resources. This just means you plan time. What you aren’t planning on is that time being spent on coding this one specific module 6 months from now. The action of that time is constantly shifting and updating, it isn’t set in stone. You can plan resources extremely easily. What you can’t do however is focus those resources only on one person’s opinion or on a singular person’s agenda. It is not that you spend more resources or can’t plan, you just spend them differently, away from empty talks about a test and toward building a successful and meaningful program.
The real challenge becomes not resource planning but accountability. So many programs hold onto their list of tests because it justifies their actions. It becomes about checking off that a test was done and not about the efficiency or the value of that test. At the end of the day the people in your program get to choose whether their accountability is for just running tests or for actually providing value. If you are focusing on an empty series of tests, then you will always just be doing action. If you can instead view your Roadmap as a constantly shifting series of actions that focus only on the value they derive, then you will never worry about any specific test or about trying to validate test ideas.
In reality the biggest challenge to tackling problems like this is the ego of the people in your program and the executives who might be involved. People protect themselves at all costs because accountability is the scariest thing in the world for most people. The old systems have everything going through them, and only with their blessing does anything get done. When you are going wherever the data takes you then you are faced with going in a direction that might not be what that executive thought of 3 weeks ago. When you just focus on your part of a larger process or when you accept their divined vision as the only means to an end then you have essentially said that you have no value at all to the organization and are just a fungible means to an empty end.
This is why an education program and a focus on discovery are so vital for the value derived from your testing program. Management might view this as a loss of power but the reality is that it is so much more. They aren’t constrained by some random thought they had, no matter how great it was, and can instead encourage others to expand on their creativity. It is no longer about having the right answer but about measuring and going with the best ideas your entire team can come up with. You can tell just how far you are from this point with the number of empty I believe/I think/I feel conversations you hear in meetings. The less you hear of those the closer you are to achieving real value. It isn’t about a review process but instead about the creation process and the management of the system to ensure rational decision making.
So many organizations are led to drive into that canyon or into a random lake. Even worse there are always people at those organizations who will describe that water they are drowning in as the expected destination. If you really want to go to new places and really want to end up where you should then you are going to need to give up your belief in that roadmap that you hold so dearly to. Find your own Roadmap, let it shift and go where it needs to, and you will be amazed at just how far you can go and how many new sights you will see.
My first trip through the common heuristics of conversion rate optimization looked at two of the more common testing ideas and how they usually reach false or limiting conclusions. In my second part I want to look at general testing theory best practices and how they can be major limiting factors in the success of your program.
It is important to remember that you are always going to get an outcome, so this is not about whether you can make money. How you and the people in your organization think about testing is the largest factor in the value that optimization produces. This is an evaluation of the efficiency of the method and how much it produces for the same or less resources. In concept you can spend an infinite amount of resources to achieve any end goal, but the reality is that we are always faced with a finite amount of time and population, which means we must always be looking for ways to improve inefficient systems. If we continue to be limited by these common heuristics then the industry as a whole will continue to produce minimal results compared to what it can and should be producing.
Always have a Hypothesis –
There is no more misunderstood term than hypothesis. In all likelihood it is because most are familiar only with their 6th grade (at least in my school) science instruction or they took formal classroom science in college. In those fields we operate like we have unlimited time and resources and we are trying to validate whether a drug will cause cancer, not whether a banner will get more clicks if it is blue or red. The stakes are higher and the models are much simpler in classroom controlled studies for cancer. There is a lot to scientific method, especially when approached from a resource efficiency perspective, that is not considered in such a simplistic view of idea validation.
We must apply scientific rigor, but we must also make sure that all actions make sense in real world situations, which means that efficiency and minimizing regret are more important than validation of an individual’s opinion. It is not that scientific method relies on the use of a hypothesis, it is simply that we mistake a hypothesis with a correct hypothesis; we seek validation for our opinions and not the discovery of the best way to proceed. Science is also about proving one idea versus all other alternative hypotheses, yet we ignore that part of the discipline because it is not the part that allows someone to see if they are right. In the grand scheme of things we are drastically overvaluing test ideas and that is distracting from the parts of the process that provide value.
Let’s start with the basics. You should never, and I mean never, run a test if you do not have a single success metric for your entire site. In most cases this is to make more money, but whatever it is, this goal exists outside of the concept of the test. You must also have rigid measurement and action rules that are reproducible, which means that you must understand real world situations like the limitations of confidence and variance.
You can then have an opinion about what you think will happen when you make a change. The problem is when we confuse that opinion with the measured goals of the test. Even worse we limit what we compare, resulting in massively inefficient use of your time and effort. Just because you believe that improving your navigation will get people to spend more time on your site, that is completely irrelevant to the end goal of making more money. Your belief that more engagement will result in more revenue is not enough to make it so. If you are right AND if that also produces more revenue, then you will know that from revenue. If you are wrong you will only know that from revenue. We must construct our actions to produce answers to our opinion and to what is best for our organization. Hypotheses and ideas are just a very small part of a much more complex and important picture, and over focus on them allows people to avoid the responsibility and the benefit of focusing on all those other parts, which are the ones that really make a difference over time for any and all testing programs.
The worst factor of this is that it allows people to fall for congruence bias and to fail to ask the right questions. We become so used to the conversation around a single idea that the concept of discovery and challenging assumptions is more word than action. Questions can be incredibly important to the success of a program, but only if they are tackled in the right order and used to focus attention, not as the final validation of spent attention. If your hypothesis is that a certain navigation change will result in more engagement, then the correct use of your resources is to ask either which of a number of different versions of the navigation will produce the most revenue or, if you can, which section on your site produces the most engagement when changed. In both cases you have adapted your “hypothesis” to present a more efficient and functional use of your time. The hypothesis exists, but it is not the constraint of the test. If you are right, you will see it. If you are wrong, you will make more money.
This means that having a hypothesis is important, but only if it is not the test charter. Have an idea of what you are trying to accomplish, but making sure that you see the value of certain actions compared to each other is more important. Sometimes the most effective hypotheses are “I believe that we do not know the value of different sections on our pages.” Don’t confuse your opinion on what will win with a successful test. Challenge assumptions and design efforts to maximize what you can do with what you have and you will never be without opinions. The best answers are always when you are proven wrong, but if you get too caught up on validating your hypothesis, then you will always be missing the largest lessons you could be learning.
We need to optimize X because it is losing Y
This is the classic problem of confusing rate and value, or more correctly correlative and causal inference. We confuse what we want to happen with what is really happening. Just because people were doing X and now they are doing Y, it doesn’t mean that this is directly causing any change, positive or negative to our end goals. Outside of the three rules of proving causation the real issue here is that we get tied to our beliefs about a pattern of events even when the data cannot possibly validate that conclusion. Understanding and acting on what you know as opposed to what you want to have happen is the difference between being data driven and simply being data justified.
Think about it this way, I have 23% clicks on one section of my page and 0% on another. If I were to improve one of those which one is going to produce the biggest returns? The answer here is that you do not know. A rate of interaction cannot possibly tell you the value of changing that item. Some of the most important parts of any user experience are things that can’t even be clicked.
This plays out outside of clicks too. We have a product funnel and we see more people leaving on page 3, therefore we need to test on page 3. The reality is that more or fewer people may or may not be tied to more or less revenue. Even if it is tied it may be a qualification issue higher up the funnel, or a user interaction issue, or simply too many people in a prior step. This is called a linear assumption fallacy, where we assume that when we have 5 people and 2 convert that if we have 10 people 4 will convert. Linear models are rare in nature but are easy to understand, so we fall back on comfort over realistic understanding.
The act of figuring out what to test can be difficult but it is never improved by pretending we have validation of our own ideas when we have nothing to justify them. We need to be open to discovering where we should go and not to focus on some set path. In almost all cases you will find that you are wrong, often dramatically so, about where problems really are and how to fix them. This is why it is so important to not try and focus solely on more or less correlative actions. We can and should be able to test fast enough and with few enough resources that we will never be limited to this realm unless we are stuck there mentally.
Like so much else what you spend your time and effort on is incredibly important. There are a thousand things you can improve and there are always new ideas. Justifying them falsely or focusing on them instead of the discipline of testing is nothing but a drag on your entire testing program. Test ideation is about 1% of the value derived from a test program yet it is 90%+ of where people like to spend their time. A 5% gain that took 2 months is worth a lot less than a 10% gain that took 2 weeks. The most important issues we must face are not about generating test ideas or validating our beliefs about how to improve our site, they are about discovering and applying resources to make sure that we are doing the 10% option and not the 5% option. If we overly focus on test ideas and not the discipline of applying them correctly we are never going to achieve what should be achieved. If we get lost trying to focus only on where we want to go, then we will always be limited in the possible outcomes we can generate.
Talk to 5 people in the optimization space and you will get 5 different stories about how best to solve your website. Talk with 50 however and those 5 will get repeated more often than not. Such is the world we operate in, where “best practices” become so commonplace and repeated that we often do not take the time to really think about or prove their effectiveness. Because of this phenomenon a lot of actions which are less than ideal or outright bad for companies become reinforced must-do items.
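One rough way to put the 5% versus 10% comparison into numbers is lift per week of effort. The sketch below assumes 2 months is about 8 weeks and treats the lift as if it were earned evenly across the time spent, which is a simplification for illustration rather than a model of any real program; the function name is made up here.

```python
def lift_per_week(lift_pct, weeks):
    """Percentage lift earned per week of time spent on the test."""
    return lift_pct / weeks


slow_test = lift_per_week(5, 8)    # 5% gain over roughly 2 months
fast_test = lift_per_week(10, 2)   # 10% gain over 2 weeks

print(f"5% in ~8 weeks: {slow_test:.3f}% per week")
print(f"10% in 2 weeks: {fast_test:.3f}% per week")
print(f"The faster test is {fast_test / slow_test:.0f}x as efficient")
```

Measured this way the second test is not twice as good as the first but about eight times as good, which is why where the time goes matters more than which idea started the test.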
The reality is that discipline is going to always win out over specific actions, and that often times the best answer is to measure everything against each other and take nothing for granted. While all of that is true it is still important you understand these common suggestions, where they work, how, why, and more importantly why people believe they are more valuable than they really may be.
Test Free Shipping or Price Changes
This is a real common one for retail sites as it is easy to understand, and a common tactic (thanks Amazon) and one that is easy to sell to the higher ups. The problem is not actually the concept, but how people measure the impact of it, and what that means to other similar tactics. What can easily seem like a huge win is often a massive loss, and even worse due to how most back-end systems are designed the actual amount of work needed to achieve these tests can be much higher than other more simple and extremely valuable uses of your finite resources.
Let’s look at the math of a basic free shipping test. In this simplified scenario, we sell 1 item for $90 on our site, with an actual cost of $70 to us ($20 net profit). Our shipping is $10, which means that when it is normally purchased someone pays us $100.
We want to test free shipping, where we pay for the shipping and sell the same widget for now $90. We run the test and we have a 50% increase in sales! We should be getting promotions and in most cases the person who ran this project is shouting their accomplishments to the entire world and everyone that will listen. Obviously this is the greatest thing ever and everyone should be doing it… except you just lost a lot of money.
The problem here is that we often confuse gross and net profit, especially because in a lot of different tests you are not directly changing the bottom line. In the case of free shipping or pricing tests though, we are directly changing what a single sale means to us.
Let’s dive into the numbers of the above. Let’s say that we sell 1000 orders in our control normal group.
$100 x 1000 = $100000
But the real number that impacts the business is:
$20 x 1000 = $20000
In the free shipping option, we have cut our profit in half by paying for the $10 shipping, which means that at $10 profit we actually have to have twice as many orders JUST TO BREAK EVEN.
$20000 / $10 = 2000
This means that if we fall back to the standard RPV reporting that you look at for other types of tests, then the math says that:
$100 x 1000 = $100000
$90 x 2000 = $180000
So any option where we do not increase RPV by at least 80% (to 180% of the control) means we are losing money. So many times you see reports of amazing results from these kinds of optimization efforts which are masking the realities behind the business. It can be hard, no matter how much this makes sense in conversation, to have the discipline to think about a 50% increase as a loss, but that is exactly what happened here. Sadly this hypothetical story plays out often in the real world, with the most likely result being the pushing of the results and not the rational evaluation of the impact to the business.
This same scenario plays out anytime we have varied margin and not as varied gross cost. The other common example is price changes, where the cost of the item remains fixed, but the test is only truly impacting how much margin we make off of the item. In both cases we are forced to set minimum marks prior to starting a test, and to treat those as the neutral point, not the normal relative percentage lift that we might be accustomed to.
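The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration using only the numbers from the simplified scenario ($90 price, $70 cost, $10 shipping, 1000 control orders); the function and variable names are made up for this example, and it assumes the same number of visitors in each group so that total revenue can stand in for RPV.

```python
def profit_per_order(price, item_cost, shipping_cost, customer_pays_shipping):
    """Net profit on one order; shipping is a pass-through when the customer pays it."""
    shipping_absorbed = 0 if customer_pays_shipping else shipping_cost
    return price - item_cost - shipping_absorbed


def break_even_orders(control_orders, control_profit, variant_profit):
    """Orders the variant needs in order to match the control's total profit."""
    return control_orders * control_profit / variant_profit


PRICE, ITEM_COST, SHIPPING = 90, 70, 10
CONTROL_ORDERS = 1000

control_profit = profit_per_order(PRICE, ITEM_COST, SHIPPING, customer_pays_shipping=True)
variant_profit = profit_per_order(PRICE, ITEM_COST, SHIPPING, customer_pays_shipping=False)

needed_orders = break_even_orders(CONTROL_ORDERS, control_profit, variant_profit)

# Gross revenue collected from customers in each group.
control_revenue = (PRICE + SHIPPING) * CONTROL_ORDERS
break_even_revenue = PRICE * needed_orders
required_revenue_lift = break_even_revenue / control_revenue - 1

# The "50% increase in sales" result from the test.
variant_orders = CONTROL_ORDERS * 1.5
control_total_profit = control_profit * CONTROL_ORDERS
variant_total_profit = variant_profit * variant_orders
profit_change = variant_total_profit / control_total_profit - 1

print(f"Break-even orders: {needed_orders:.0f}")
print(f"Revenue lift needed to break even: {required_revenue_lift:.0%}")
print(f"Profit change at +50% orders: {profit_change:.0%}")
```

With these inputs the free shipping variant needs 2000 orders to break even, and the celebrated 50% increase in orders works out to a 25% drop in profit. The same two functions can be used to set the minimum mark before a test starts, by plugging in the margins of whatever is being compared.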
Always repeat content on your site
This and a large number of other common personalization type suggestions (who to target to and how to target to them) actually have a large number of issues inherent to them. The first is that even if what is suggested is true, it does not mean that it is the most valuable way to tackle the problem. Just because repeating content does improve performance by 3%, it doesn’t mean that doing something else completely will not result in a 10% or 50% increase.
The sad truth is that repeating content, when it does work, is often a very small incremental gain and pales in comparison to many other concepts of content that you could be trying. The goal is not to just do something that produces an outcome, as every action produces an outcome; the goal is to find the action that produces the maximum outcome for the lowest amount of resources. In that light repeating content is often but not always a poor use of time and resources. The reason it is talked about is often not due to its performance but because it is easy to understand and easier to get buy-in from above.
The second major problem with these is that they skip the entire discipline that leads to the answer. There is no problem with repeating content as long as you also try 3-4 other completely different forms of content. Repeating content may be the right answer, it may be an ok answer, and it may be the worst answer, but you only know that if you are open to discovering the truth. There is no problem having a certain group or behavior you want to see if you can target to, the issue is when you target to them without looking at the other feasible alternatives. If you are not testing out multiple concepts to everyone and looking at them for the best combination, then no matter what you do you are losing revenue (and making you and your team do extra work).
The real irony of course is that if you test these out in a way to find out the impact compared to other alternatives, the absolutely worst case scenario is that you are correct and you target as you would have liked. Any other scenario presents you either with a piece of content or the group or both that results in better performance. Knowing this information allows you to save time and effort in the future as well as spend resources on actions that are more likely to produce a result.
It is not unusual to find that targeting just a specific group will result in that group showing a slight increase, and if that is all that you look at you would have evidence to present and share internally as success. Looking at the issues deeper you commonly find that the overall impact to the business is negligible (within the standard 2% natural variance) or even worse negative to the whole. It is also not uncommon to find a combination that you never thought of presenting a massive gain.
One of my favorite stories in this line was when I worked with an organization that had decided exactly how and what to target to a number of specific groups based on a very complex statistical analysis of site behaviors. They had built out large amounts of infrastructure to facilitate this exact action. We instead took 100% of the same content they already had and presented it to everyone, looking at the impact of serving it to the groups they envisioned as well as others. We simply took all their existing content and served it to everyone and also in a few different dynamic permutations. The result showed that if they had done only what they had envisioned they would have lost 18% total leads on the site (this is also a great example of why causal inference is so vital and to not rely on correlative inference). They also found that by serving 2 of their normal pieces of content based on behaviors they had not envisioned they would see a 20% gain. They were able to go from causing dramatic harm to their business to a large meaningful multimillion dollar gain simply by not relying solely on hearsay and instead testing their assumptions.
In both cases there are many different ways you can manipulate the data to look like there was a positive outcome while actually doing damage. In both cases massive amounts of time and effort were spent to try something only to find an outcome counter to people’s assumptions. In both cases testing out assumptions and exploring to discover the value of different actions beforehand would have better informed decisions and created more value.
In the end, any idea is only going to be as valuable as the system you put it through. There is nothing inherently wrong with either concept as long as they are measured for efficiency and acted on rationally. If you can take a common heuristic and evaluate it properly, there is value to be had. That does not mean that they will act as a magical panacea, nor should you plan your program around such flawed simple ideas. Focus on building the proper system and you will be able to provide value no matter what concepts get thrown your way.