One of the most difficult parts of starting your own program or of consulting with a new organization is the need to evaluate and change existing practices. In almost all cases groups have been optimizing for a while, oftentimes with one or more people who own the program and have built their reputations on prior practice. Any prior actions have been done with their names attached and they have enjoyed the perception of success. The problem, though, is that people rarely evaluate the reality of their statements and are often unaware, or too busy to really know, whether what they are saying is real or pure BS (this explains the entire agency system).
This can be extremely problematic, as it is vital to stop any bad practices before you can implement needed discipline and really make a positive impact for your company. It does you no good to look into things like fragility or efficiency, or into controlled experiments or segment discovery, if you are operating in a world where people expect to test out 1 or 2 ideas based on opinions and to do this in 2-3 days. If your organization actually thinks that things like running 8 tests in 48 hours, or clicks on a button, are a measure of success, then no amount of real optimization is going to matter until you make it clear just how off the entire process is. Of course, if you do this poorly then you are just making yourself public enemy number 1, and since you are the new guy in the room you are basically setting yourself up for failure.
The key is to understand the issues, tackle all of them without prejudice, and evaluate the program for all of them. That way people see that you are not attacking someone or something but simply evaluating the program for inefficiencies. If everything is up for grabs, and some things pass and some things go, then at the least you are removing the direct confrontational element from it. If you can further push the conversation into one of what defines success and simply focus on those components, then many of the would-be battles simply fall by the wayside.
Generally the things that need to be evaluated and often changed fall into a number of common categories. These include:
Acting on test:
• False belief in confidence
• Acting too quickly
• No consistent rules of action
Lack of Process:
• No consistent way of getting results live
• No single person owning test ideation, just random ideas thrown up
Lack of data control:
• No variance study
• Lack of proper segment analysis
The main problem with any or all of these is that there will be a library of tests that people have believed and most likely built entire strategies around. It doesn’t matter if it is which pages do or do not work, the impact of certain changes, or where and to whom to test; this misinformation is far more damaging than any positive result that you could generate.
All results are contextual, which means that you must set the proper context in order to really evaluate the impact of a test or process. If you have people believing in a 200% increase because they were looking at one group and at clicks on a button, then it can be nearly impossible to talk about a 5% increase in RPV (revenue per visitor) because it just sounds too small and not as important to them, despite the fact that the 200% click increase could have actually caused a 10% loss in revenue. If you or others do not understand the core principles and math involved, then you are all more likely to fall for any BS that you come across. You must focus on education and on the disciplines, not just stories, if you want to make a meaningful long term impact.
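To make the gap between a click metric and revenue concrete, here is a minimal sketch in Python. Every number in it is invented for illustration (the visitor counts, click counts, and revenue totals are hypothetical, not results from any real test), and the `Arm` class and `lift` helper are names made up for this example. It only shows that the same two experiences can report a large lift in button clicks and a drop in revenue per visitor at the same time.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """Totals for one test experience (all values hypothetical)."""
    visitors: int
    button_clicks: int
    revenue: float

    @property
    def click_rate(self) -> float:
        return self.button_clicks / self.visitors

    @property
    def rpv(self) -> float:
        # Revenue per visitor: total revenue over everyone who saw the experience.
        return self.revenue / self.visitors


def lift(variant: float, control: float) -> float:
    """Relative change of the variant against the control."""
    return (variant - control) / control


# Hypothetical totals chosen to mirror the 200% / -10% scenario in the text.
control = Arm(visitors=10_000, button_clicks=200, revenue=50_000.0)
variant = Arm(visitors=10_000, button_clicks=600, revenue=45_000.0)

click_lift = lift(variant.click_rate, control.click_rate)
rpv_lift = lift(variant.rpv, control.rpv)

print(f"Click lift: {click_lift:+.0%}")
print(f"RPV lift:   {rpv_lift:+.0%}")
```

With these made-up totals the click metric reports +200% while RPV reports -10%. Anyone judging the test on clicks alone would push the losing experience live.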
This is why stopping the bleeding is such an important and difficult task. People don’t realize how far off they really are and oftentimes have never been called out for their BS, resulting in entire careers built on bad outcomes and false conclusions. In my case I am looking at everything from acting too quickly (18 conversions versus 32 conversions is meaningless), to a lack of variance understanding, to a lack of discipline on test ideas. These things were not done because someone was malicious or self-serving. They were not done because of a lack of intelligence or a lack of desire to improve the business. They were simply done because the person did not know better and because there is just so much bad information out there.
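As a rough illustration of why a gap like 18 versus 32 conversions is so thin, the sketch below runs an exact two-sided binomial check. It rests on an assumption the text does not state: that both experiences received the same amount of traffic, so under "no real difference" each conversion is equally likely to land in either arm. The function name is hypothetical, and this only covers sampling noise; it says nothing about the natural variance of the site, which is a separate problem.

```python
from math import comb


def two_sided_binomial_p(conversions_a: int, conversions_b: int) -> float:
    """Exact two-sided p-value for how lopsided a conversion split is.

    Assumes equal traffic to both arms, so each conversion is a fair
    coin flip between them if there is no real difference.
    """
    total = conversions_a + conversions_b
    extreme = max(conversions_a, conversions_b)
    # Probability of a split at least this lopsided in one direction...
    tail = sum(comb(total, k) for k in range(extreme, total + 1)) / 2 ** total
    # ...doubled, because either arm could have been the lucky one.
    return min(1.0, 2 * tail)


p_value = two_sided_binomial_p(18, 32)
print(f"p-value for 18 vs 32 conversions: {p_value:.3f}")
```

Under that equal-traffic assumption the p-value lands a little above 0.05, so even the conventional 95% bar is not cleared, and a handful of conversions either way would swing the picture. That is before asking whether the sample is representative at all.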
The real challenge here is controlling expectations and helping people understand the error of their ways. I am extremely lucky to work with a number of very smart people who are willing to listen to and understand issues which they never knew they were dealing with, like the variance problems I previously discussed. The challenge is far more in getting people to understand that coming from a place that is used to testing in 1-2 days, or to tracking a certain thing, just means that they were really good at wasting their company’s time and resources. It is also important to set proper expectations on what the movement speed will be. If they are thinking you can get a result in 2-3 days and it is going to take 2-3 weeks, this can completely shift their view of optimization to the negative, despite the fact that you are really moving from something that was damaging the company to something that is going to cause consistent positive growth.
More than anything it is important to realize that you have to stop all bleeding and make that the primary focus before you can overly concern yourself with making big changes. This doesn’t mean that you don’t do any tests or the like; in fact it is important for people to see what they should be doing so that they can really appreciate how far off they were before. If someone doesn’t know what success looks like then any point on the map can be success for them. It simply means that controlling the message and focusing on education is vital at the start of any program.
The hardest challenge when working with different groups in the optimization space is often trying to get past their misconceptions and to help them view optimization in a different form. It doesn’t matter if they have been doing testing for 1 day or 10 years, there is still a massive difference in efficiency and in the value that can be generated. Results are not random, yet so many believe they are because they misunderstand optimization on the most fundamental levels. The reality of truly successful optimization is often far from the perceived reality of those just entering the space. The number of misconceptions is so large that it can often be nearly impossible to prioritize them or to tackle them all.
Because this problem is so common, I reached out to the smartest people I know in the industry and asked them to share the one thing they wished people understood about optimization.
Rhett Norton – Consultant
Without discipline companies go through the motions of testing without ever really achieving amazing long term results. The most successful companies I’ve worked with have been successful with creating discipline in parts of their testing program. I’ve never seen a company that is disciplined in every aspect of optimization, but hey, maybe your company could be the first.
Drew Phillips – Consultant
It is free form in that you need to have the flexibility to optimize elements that you find to be influential, not lock yourself into a specific roadmap. Optimization is a process that changes as you learn from each campaign. You will get the most out of your optimization efforts by iterating off of things you learn from previous tests.
Brandon Anderson – Consultant
Sometimes organizations that have been doing A/B testing for years feel like they need to work on complex activities in order to continue progressing. My experience is that even mature organizations need to look past the hype of new and shiny buzzwords and determine which activities will give them the highest efficiency. Get the 80% with 20% of the effort by focusing on the basics.
Ryan Roberts – Solution Architect
I also wish people were more careful about how they read test results. People that rely solely on confidence calculations are going to end up with a lot more wrong conclusions than they think. They need to understand what the rules of conclusive results should be for their site. And they have to apply them religiously to each test they run.
Doug Mumford – Consultant
While there are some tests that will require more time a lot of highly valuable tests can be done with three lines of CSS or jQuery, loaded up in four browsers to make sure everything looks good (and perhaps an iPhone and iPad), and launch. Have a bias for action.
If I had to characterize my own answer to the question it would be that there is a massive difference between action and value. Just running a test, be it one or 500, is not the mark that you are successfully optimizing. Optimization is about how you tackle large assumptions, and about how you act on data, and even how you think about what data can and can’t tell you. So much time is wasted in the pursuit of executing on assumptions and against the propagation of agendas which is the exact opposite of where the value of optimization comes from.
It is about discipline, and statistics, and variance, and technical solutions, and dealing with senior management, and dealing with biases and assumptions. It is all that and more. It is a means to an end, but that end is increased revenue for your organization, not just blindly reaching an audience or making an individual look good. The more you try to justify a specific action or the more complicated you make something, the less value you get and the more time you waste. Just understanding that action in and of itself is not the answer is the first step to being truly open to solving the largest challenges that optimization programs face. The challenge is never in running tests; the real challenge is finding solutions and ways to even have these conversations.
What do you find as the one thing you wished people understood about optimization? What are you doing to solve it?
My first trip through the common heuristics of conversion rate optimization looked at two of the more common testing ideas and how they usually reach false or limiting conclusions. In my second part I want to look at general testing theory best practices and how they can be major limiting factors in the success of your program.
It is important to remember that you are always going to get an outcome, so this is not about whether you can make money. How you and the people in your organization think about testing is the largest factor in the value that optimization produces. This is an evaluation of the efficiency of the method and of how much it produces for the same or fewer resources. In concept you can spend an infinite amount of resources to achieve any end goal, but the reality is that we are always faced with a finite amount of time and population, which means we must always be looking for ways to improve inefficient systems. If we continue to be limited by these common heuristics then the industry as a whole will continue to produce minimal results compared to what it can and should be producing.
Always have a Hypothesis –
There is no more misunderstood term than hypothesis. In all likelihood it is because most people are familiar only with their 6th grade (at least in my school) science instruction or with formal classroom science in college. In those fields we operate like we have unlimited time and resources and we are trying to validate whether a drug will cause cancer, not whether a banner will get more clicks if it is blue or red. The stakes are higher and the models are much simpler in classroom controlled studies for cancer. There is a lot to the scientific method, especially when approached from a resource efficiency perspective, that is not considered in such a simplistic view of idea validation.
We must apply scientific rigor, but we must also make sure that all actions make sense in real world situations, which means that efficiency and minimizing regret are more important than validation of an individual’s opinion. The problem is not that the scientific method relies on the use of a hypothesis; it is simply that we mistake a hypothesis for a correct hypothesis. We seek validation for our opinions and not the discovery of the best way to proceed. Science is also about proving one idea against all other alternative hypotheses, yet we ignore that part of the discipline because it is not the part that allows someone to see if they are right. In the grand scheme of things we are drastically overvaluing test ideas, and that is distracting from the parts of the process that provide value.
Let’s start with the basics. You should never, and I mean never, run a test if you do not have a single success metric for your entire site. In most cases this is to make more money, but whatever it is, this goal exists outside of the concept of the test. You must also have rigid measurement and action rules that are reproducible, which means that you must understand real world situations like the limitations of confidence and variance.
You can then have an opinion about what you think will happen when you make a change. The problem is when we confuse that opinion with the measured goals of the test. Even worse, we limit what we compare, resulting in a massively inefficient use of our time and effort. You may believe that improving your navigation will get people to spend more time on your site, but that belief is completely irrelevant to the end goal of making more money. Your belief that more engagement will result in more revenue is not enough to make it so. If you are right AND that also produces more revenue, then you will know that from revenue. If you are wrong, you will only know that from revenue. We must construct our actions to produce answers both to our opinion and to what is best for our organization. Hypotheses and ideas are just a very small part of a much more complex and important picture, and overfocusing on them allows people to avoid the responsibility and the benefit of focusing on all those other parts, which are the ones that really make a difference over time for any and all testing programs.
The worst factor of this is that it allows people to fall for congruence bias and to fail to ask the right questions. We become so used to the conversation around a single idea that the concept of discovery and challenging assumptions is more word than action. Questions can be incredibly important to the success of a program, but only if they are tackled in the right order and used to focus attention, not as the final validation of spent attention. If your hypothesis is that a certain navigation change will result in more engagement, then the correct use of your resources is to test either which of a number of different versions of the navigation will produce the most revenue or, if you can, which section on your site produces the most engagement when changed. In both cases you have adapted your “hypothesis” to present a more efficient and functional use of your time. The hypothesis exists, but it is not the constraint of the test. If you are right, you will see it. If you are wrong, you will make more money.
This means that having a hypothesis is important, but only if it is not the test charter. Having an idea of what you are trying to accomplish matters, but making sure that you measure the value of different actions compared to each other is more important. Sometimes the most effective hypothesis is “I believe that we do not know the value of different sections on our pages.” Don’t confuse your opinion on what will win with a successful test. Challenge assumptions and design efforts to maximize what you can do with what you have and you will never be without opinions. The best answers always come when you are proven wrong, but if you get too caught up in validating your hypothesis, then you will always be missing the largest lessons you could be learning.
We need to optimize X because it is losing Y
This is the classic problem of confusing rate and value, or more correctly correlative and causal inference. We confuse what we want to happen with what is really happening. Just because people were doing X and now they are doing Y, it doesn’t mean that this is directly causing any change, positive or negative to our end goals. Outside of the three rules of proving causation the real issue here is that we get tied to our beliefs about a pattern of events even when the data cannot possibly validate that conclusion. Understanding and acting on what you know as opposed to what you want to have happen is the difference between being data driven and simply being data justified.
Think about it this way, I have 23% clicks on one section of my page and 0% on another. If I were to improve one of those which one is going to produce the biggest returns? The answer here is that you do not know. A rate of interaction cannot possibly tell you the value of changing that item. Some of the most important parts of any user experience are things that can’t even be clicked.
This plays out outside of clicks too. We have a product funnel and we see more people leaving on page 3, therefore we need to test on page 3. The reality is that more or fewer people may or may not be tied to more or less revenue. Even if it is tied, it may be a qualification issue higher in the funnel, or a user interaction issue, or simply too many people in a prior step. This is called a linear assumption fallacy, where we assume that when we have 5 people and 2 convert, then if we have 10 people 4 will convert. Linear models are rare in nature but are easy to understand, so we fall back on comfort over realistic understanding.
The act of figuring out what to test can be difficult, but it is never improved by pretending we have validation of our own ideas when we have nothing to justify them. We need to be open to discovering where we should go and not to focus on some set path. In almost all cases you will find that you are wrong, often dramatically so, about where problems really are and how to fix them. This is why it is so important to not try to focus solely on more or less correlative actions. We can and should be able to test fast enough and with few enough resources that we will never be limited to this realm unless we are stuck there mentally.
Like so much else, what you spend your time and effort on is incredibly important. There are a thousand things you can improve and there are always new ideas. Justifying them falsely or focusing on them instead of the discipline of testing is nothing but a drag on your entire testing program. Test ideation is about 1% of the value derived from a test program, yet it is 90%+ of where people like to spend their time. A 5% gain that took 2 months is worth a lot less than a 10% gain that took 2 weeks. The most important issues we must face are not about generating test ideas or validating our beliefs about how to improve our site; they are about discovering and applying resources to make sure that we are doing the 10% option and not the 5% option. If we overly focus on test ideas and not the discipline of applying them correctly, we are never going to achieve what should be achieved. If we get lost trying to focus only on where we want to go, then we will always be limited in the possible outcomes we can generate.
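The comparison between a 5% gain in 2 months and a 10% gain in 2 weeks can be worked through with a small sketch. The assumptions here are mine, not the text’s: 2 months is treated as 8 weeks, the planning window is a hypothetical 26 weeks, a lift only starts paying once the test has finished and the winner is live, and the lift then holds steady. The function name is made up for the example.

```python
def extra_revenue(lift: float, weeks_to_result: float, horizon_weeks: float) -> float:
    """Extra revenue over the horizon, in units of one baseline week of revenue.

    Assumes the lift pays nothing until the test finishes, then holds
    steady for the rest of the horizon.
    """
    weeks_live = max(0.0, horizon_weeks - weeks_to_result)
    return lift * weeks_live


HORIZON_WEEKS = 26  # hypothetical half-year planning window

slow_small = extra_revenue(lift=0.05, weeks_to_result=8, horizon_weeks=HORIZON_WEEKS)
fast_large = extra_revenue(lift=0.10, weeks_to_result=2, horizon_weeks=HORIZON_WEEKS)

print(f"5% gain after 2 months: {slow_small:.2f} baseline weeks of revenue")
print(f"10% gain after 2 weeks: {fast_large:.2f} baseline weeks of revenue")
```

Under these assumptions the slow 5% result is worth 0.90 baseline weeks of revenue over the window and the fast 10% result is worth 2.40. The faster option also frees up six more weeks of testing capacity, which this sketch does not even count.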
Because of the nature of how most organizations work, it is common to find testing added onto the existing roles or organizational structure of the analytics group. It makes sense from a very high level view as both disciplines deal with using data to make decisions and both can be viewed as a shared resource throughout the entire organization and for each group. The failure comes however when people who are used to looking at and responding to problems like they are analytics issues try to force the same actions onto testing.
The real key to success in adding optimization to your organization lies more in how you tackle the fact that it is a new discipline than in whom or where it fits into that larger picture. Most groups worry about head count and resources and fail to focus on the skills and different actions that determine success with optimization as opposed to general analytics or business intelligence work. The lack of knowledge by others is often abused or leveraged to help people gain oversight of this project. The real problems arise, however, when people do not then adapt usage to meet the new needs and instead simply try to come up with stories as to the value of the program.
Once mature and with key people in place combining the programs can add a lot of value. Without deep understanding of those differences however, the inevitable conclusion is less value and wasted time as people make mistakes that they do not even know are mistakes. Resources are a precious commodity, and the ultimate expression of optimization is to leverage them in the most productive way possible. In order to ensure this outcome, it is important that you focus the time of your optimization team towards actions that will maximize outcomes and help grow understanding for your entire organization.
To maximize success and to ensure that the focus is on the right actions, here is a breakdown of the time spent and the value derived from the main actions of a successful optimization team.
1) Active Data Acquisition – 80-90% of the value of most optimization programs comes not from the commonly thought of validation role but from continual active data acquisition and comparison of feasible alternatives. It takes time for groups to achieve this role, but when they do the value given to the organization increases by orders of magnitude. Often this is based around the concepts of bandit based optimization and fragility and is used as an ongoing effort to challenge assumptions and to actively measure the value of different alternatives.
In this role the optimization team consistently leverages low resource efforts that consistently measure as many different feasible alternatives as possible and do this across the site. This is in the attempt to maximize the discovery of exploitable opportunities and the primary role is to challenge assumptions which otherwise will never be discovered. The team needs access to pages and clear rules on measuring success and leveraging of resources in order to produce constant and impactful lessons which can shape and direct product discussions and roadmaps.
2) Education – This is 5-15% of the value derived from the optimization group, but provides the ability to do the actions which produce the greatest returns. Because optimization requires very different ways of thinking about and executing on actions in order to provide the most value this means that one of the key roles for optimization is an ongoing and consistent conversation with groups about different ways to think about problems.
It is vital that this conversation always happens prior to any action and that optimization is not thought of as just a simple action in a release calendar. Groups that fail to think differently are guaranteed to get much lower value from their testing efforts, waste far more resources, and often have much slower and less productive product teams overall. They will get results; they will simply be far smaller results at a much higher cost. Failure to focus on education often leads groups to a purely responsive role and leads to programs that are happy with the number of tests they run or with simply producing a single positive result.
There is no such thing as an organization that starts out looking at the correct things perfectly. Without fail, someone’s personal agenda leads them to subconsciously search out confirming actions and data in order to make themselves look good. Building, maintaining, and educating people on proper data discipline is the single most consistent and important topic of education.
Here are a few of the key topics that groups can and should focus on:
i. Rules of Action – Knowing how to act on data and to be disciplined in not acting too fast or too slow and looking at only metrics that matter to a decision are vital for any data organization.
ii. Statistical disciplines – There are many different ways to think about data and testing, and it is vital that people be exposed and open to different ways than they were previously aware of in order to maximize future growth.
iv. Knowledge Share – When you are running a successful optimization program you will be constantly learning things that go against all previously held beliefs and opinions. These lessons learned are the single most valuable part of a successful program and become a core component of a program once it has matured.
3) Ad hoc analysis and validation testing – At most this represents 5% of the possible value provided by testing. It is always fun to want to focus on who has the best idea to improve things, but ultimately this is the least important part of a successful program. The better the input, the better the output, but only if it is going through a great system. A poor system means that it really doesn’t matter how good any idea is.
This is the part that most groups are familiar with, where they respond directly to test ideas or to requests for more data / details on specific tests.
Generally this time is best spent redirecting people towards higher value uses of their time and of the data.
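The bandit based optimization mentioned under Active Data Acquisition can be sketched in a few lines. What follows is one common bandit technique, Beta-Bernoulli Thompson sampling, chosen by me for illustration; the text does not say which bandit method it has in mind. The conversion rates are hypothetical and exist only to simulate visitors, and the sketch tracks a yes/no conversion for simplicity even though the single success metric argued for throughout is revenue.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Allocate visitors across alternatives with Beta-Bernoulli Thompson sampling.

    true_rates are hypothetical conversion rates used only to simulate
    visitors; a real program would observe outcomes instead.
    Returns the number of visitors sent to each alternative.
    """
    rng = random.Random(seed)
    wins = [0] * len(true_rates)    # conversions seen per alternative
    misses = [0] * len(true_rates)  # non-conversions seen per alternative

    for _ in range(rounds):
        # Draw a plausible rate for each alternative from its Beta posterior
        # (uniform Beta(1, 1) prior) and show the visitor the highest draw.
        draws = [rng.betavariate(w + 1, m + 1) for w, m in zip(wins, misses)]
        arm = draws.index(max(draws))
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            misses[arm] += 1

    return [w + m for w, m in zip(wins, misses)]


pulls = thompson_sampling(true_rates=[0.02, 0.05, 0.10], rounds=5000)
print("Visitors sent to each alternative:", pulls)
```

The point of the design is that every feasible alternative keeps getting measured, while traffic drifts toward whatever is actually performing, rather than toward whichever idea someone argued for most loudly.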
Successful programs have their time breakdown in the range of:
• Active Data Acquisition / Ongoing optimization – 60-70% of time
• Education – 15-20% of time
• Post hoc analysis and validation optimization – 15-20% of time
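Putting the value figures and the time figures side by side gives a rough sense of return per unit of time. The sketch below uses the midpoint of each quoted range, which is my simplification, and treats the "at most 5%" figure for ad hoc work as a flat 5%, which is also an assumption. The dictionary and function names are made up for the example.

```python
# (low, high) percentage ranges quoted in the text. The "at most 5%" figure
# for ad hoc work is treated as a flat 5% here, which is an assumption.
value_share = {
    "Active data acquisition": (80, 90),
    "Education": (5, 15),
    "Ad hoc analysis and validation": (5, 5),
}
time_share = {
    "Active data acquisition": (60, 70),
    "Education": (15, 20),
    "Ad hoc analysis and validation": (15, 20),
}


def midpoint(bounds):
    """Middle of a (low, high) range."""
    low, high = bounds
    return (low + high) / 2


# Share of value produced per share of time spent, using range midpoints.
value_per_time = {
    activity: midpoint(value_share[activity]) / midpoint(time_share[activity])
    for activity in value_share
}

for activity, ratio in value_per_time.items():
    print(f"{activity}: {ratio:.2f} share of value per share of time")
```

Read the education ratio with care: its direct share of value is small, but as noted above it is what makes the high-return work possible in the first place, so a ratio below 1 is not an argument for cutting it.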
It is generally more about the thought and usage of the program than just who owns it. The real keys to a successful program are to differentiate the roles and skills. Generally, if a program is just starting out, it may have 1-2 people who work on testing in some form, with the primary focus on working with different groups and their ideas to provide value. Mature programs might move upwards to 5-10 people and higher as they continue to grow.
It does no good to add more people to a problem if you aren’t fixing the real problem, which is how the time is spent. More time does not equal more value, better usage of time means more value. It is never easy to go past your comfort zone, but that is where you will find all the value. Think about how and where you are spending your time.