January 30, 2012

Testing 202 – 5 disciplines to get greater value from your testing program

Testing seems like such a natural extension of most existing operations that very few groups take the time to evaluate it as a separate discipline. They are then shocked when they hit the inevitable points of resistance that every program runs into. They are not prepared for the political, technical, or organizational barriers that riddle their journey. Even worse, those barriers interfere with the need for simple, speedy execution, which keeps them from reaching the level of monetary impact that the program is truly capable of. What differentiates programs over the long haul is their ability to push past those barriers and change the way they think about testing, evolving beyond a focus on simple tests. They start to focus on the power of the system itself to shape their forward trajectory. Reaching the next level of a program takes people who are real thought leaders, not just for themselves, but also for their organizations.

The theme of the first group of disciplines for programs was organizational consistency: getting everyone to work efficiently and quickly towards one goal. Once you have achieved that ability to move forward and act as needed, the main ingredient for success is your ability to think about testing differently, to challenge yourself and others to focus on learning and not to make assumptions. Far too many groups end up using their testing program only to look at the impact of what people want to do, instead of using it as a tool to learn and go in new directions. Often the least efficient part of any system is the people running it, so we have to come up with ways to break down that barrier and allow the test data to dictate where we go, what we talk about, and how we allocate resources. How you think about testing, how you keep yourself from falling into the many biases of human nature, and how you get people to challenge their own actions determine the scale of impact of each test and of the program as a whole.

So many groups run into the problem of not realizing that they are suboptimizing. They run a test, they get a winner, and they assume that everything is golden. They report the winner and move on. We get so caught up in the immediate return on our actions that we never take the time to understand what it means in context. The problem is that we are not treating each action with the respect it deserves; we are busy looking at today and not tomorrow. It is never about getting a result; it is about the ability to differentiate results from each other, to make sure that we are going down the most efficient path for our business. If you get a 3% lift, that is more than you had previously, but what if the 3% is just one recipe in a test with a 5%, a 7%, and a 10% winner? Would you then think pushing the 3% winner was the best course of action? The only way to escape that trap is to change how you think about testing. With that change, and the practices you put in place around it, you gain the ability to measure the efficiency of your actions, to view the causal relationships of alternatives, and to measure the scale of impact and the cost of achieving it.
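The arithmetic behind that question can be sketched in a few lines of Python. This is a minimal illustration, not a method from this post: the recipe names are hypothetical, the lifts are the 3%, 5%, 7%, and 10% figures from the example, and the sketch deliberately ignores statistical confidence, which a real program would also have to weigh.

```python
from typing import Dict, Tuple

# Hypothetical test results: relative lift of each recipe over control.
# Every recipe "beat" control, which is exactly the trap described above.
LIFTS: Dict[str, float] = {
    "recipe_a": 0.03,
    "recipe_b": 0.05,
    "recipe_c": 0.07,
    "recipe_d": 0.10,
}


def best_recipe(lifts: Dict[str, float]) -> Tuple[str, float]:
    """Return the recipe with the highest lift over control."""
    name = max(lifts, key=lambda recipe: lifts[recipe])
    return name, lifts[name]


def opportunity_cost(lifts: Dict[str, float], chosen: str) -> float:
    """Lift left on the table by pushing `chosen` instead of the best recipe."""
    _, best_lift = best_recipe(lifts)
    return best_lift - lifts[chosen]


name, lift = best_recipe(LIFTS)
print(f"best: {name} at {lift:.0%}")
print(f"cost of pushing recipe_a: {opportunity_cost(LIFTS, 'recipe_a'):.0%}")
```

Pushing the 3% recipe is still a "win" over control, but it leaves seven points of lift unused; framing each result as a gap to the best option, rather than a win over control, is the shift from "better" to "best".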

In order to make sure you are getting the most from the actions you have enabled, here are 5 disciplines which will help you achieve the results you want from the program.

Best instead of Better Testing –

I have already explored this concept here, but it is important to understand the distinction. “Better” testing is the act of trying to figure out if one idea is better than another. It limits the playing field and is used to make an immediate decision about who is “right”. Best testing is trying to figure out the value of each feasible option, and where the best places are to put resources or to move the site. Testing is a system, one that only produces value based on the quality of the input. If you limit your input to popular opinion or a few ideas, then you are dramatically hindering the output of that system. It is about using testing not just to push preconceived notions but to democratize ideas, so that the system is more important than any one idea. If you are able to let testing tell you where to go, you will not only get better results from this test, but you will also better inform future tests and stop yourself from spending resources inefficiently.

Best testing is the first step to allowing your program to produce exponential growth in a way that a single test never could. You are able to use the program to build, not just to run tests. The entire goal is to increase efficiency and to facilitate learning, and the easiest and greatest step towards that goal is forcing your team to think in terms of which idea is best, not just which idea is better.

Focus on Learning –

Every test you run is a chance to learn about your site and to get outside of your comfort zone. So many groups fail because they only test what they think will win, or because they focus on reaching a consensus about recipes. If you take the time and energy you waste debating test ideas and put it into creating all the options, you will spend less time and energy overall and get better results. Stop arguing and move those resources towards creating. If you really open up your testing to feasible alternatives, you will constantly find winners that fly in the face of popular opinion. If you are focused on that outcome, you will suddenly find all sorts of new lessons waiting for you, with the added benefit of gaining many times more value on top of what you learn.

There are so many assumptions, misconceptions, and faulty “best practices” that dictate the online world. Even worse, we assume that something that works elsewhere will work for your site and your users. We gain nothing if we are just proving ourselves right; when we challenge those ideas, we start to learn what makes your site unique and what works best for what you do. We start learning about the most efficient places to change your site, and even who the most exploitable user segments are. Even better, those lessons seep into your other conversations, informing product plans, senior management, and all other groups about what you know, not just what you think or pretend to know.

The most obvious example of this is multivariate testing. So many groups, especially agencies, push MVT as a tool to find a single answer by throwing in a number of variants for multiple items on a page. It becomes a big mixing machine to reach a new version of the page faster. If you instead use MVT as a learning tool, focusing on which section of the page, or which factor of a section, is most influential, then you can use your resources in a way that maximizes both ROI and learning, accomplishing both tasks far better than just throwing things up to see what sticks. Any time you are running a massive MVT, full factorial or not, you are suboptimizing, not only from a resource and time perspective, but also because you have failed to leverage the opportunity to learn.
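As a rough illustration of using an MVT to learn which element matters most, the sketch below estimates each element's main effect from a small full-factorial test: the average conversion rate of all cells where the element shows its variant, minus the average where it shows control. It assumes two-level factors and a complete design; the element names and conversion rates are hypothetical, and it ignores interactions and significance testing.

```python
from itertools import product
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# Hypothetical page elements, each tested at two levels (0 = control, 1 = variant).
FACTORS = ["headline", "button"]

# Hypothetical conversion rate for every cell of a 2x2 full-factorial MVT,
# keyed by (headline_level, button_level).
RESULTS: Dict[Tuple[int, ...], float] = {
    (0, 0): 0.050,
    (0, 1): 0.060,
    (1, 0): 0.052,
    (1, 1): 0.062,
}


def main_effects(
    results: Dict[Tuple[int, ...], float], factors: Sequence[str]
) -> Dict[str, float]:
    """Main effect per factor: mean rate at the variant level minus mean rate at control."""
    expected_cells = set(product((0, 1), repeat=len(factors)))
    if set(results) != expected_cells:
        raise ValueError("sketch assumes a complete two-level full-factorial design")
    effects = {}
    for i, factor in enumerate(factors):
        variant = [rate for cell, rate in results.items() if cell[i] == 1]
        control = [rate for cell, rate in results.items() if cell[i] == 0]
        effects[factor] = mean(variant) - mean(control)
    return effects


def rank_by_influence(effects: Dict[str, float]) -> List[str]:
    """Order factors from most to least influential by absolute effect size."""
    return sorted(effects, key=lambda f: abs(effects[f]), reverse=True)


effects = main_effects(RESULTS, FACTORS)
for factor in rank_by_influence(effects):
    print(f"{factor}: {effects[factor]:+.3f}")
```

With these made-up numbers the button's effect is several times the headline's, which is the kind of finding that tells you where to put the next, smaller test rather than handing you a single "winning page".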

Living Knowledge Base –

As programs grow, and as you use testing as a vehicle to learn, the most important thing you will accumulate is not lift, but functional knowledge about your site and users. Storing and sharing this knowledge, based on causal data, informs future decisions in a way that analytics or “best practices” never will. You have to make this storing and sharing a function of the team, and make it accessible and meaningful to all groups, even those that were not part of the test that produced the knowledge. For most groups, this accumulation and sharing of knowledge has an impact far greater than the individual test results.

Building a repository of that knowledge, and making it an active, breathing thing that interacts with people and exists outside of individual tests, is vital to achieving the results that programs want. It is not just a list of tests run and their results. It is meant to share lessons, successes, and failures, with the focus on what you have learned from your program, not the minutiae of the actions that got you there. Every version of this is different, as is each and every organization, yet they share the same characteristics: they focus on what has been learned across tests, they are easily accessible, they are used to start conversations and as a barometer for where to go, and they show what does, and more importantly what does not, work. A repository also allows you to weigh the various actions against each other, since lift alone does not tell you the scale of impact. If you keep trying the same things that fail, you will never be able to leverage the exponential efficiency that learning should give you.

Iterative Testing –

Iterative testing may seem like a no-brainer, but so many groups fail to understand or leverage it as a discipline, talking about it but failing to act on it consistently. It should be an organizational rule that no test is ever “over”, only at a new stage, ready for the next test. If you are using tests to learn, then you will know how to prioritize a page; its sections then need to be explored, not only for what the feasible options are, but also for which parts are most influential, and then for the best way to tackle that winning factor. The goal is to make sure that each test you run maximizes efficiency while mitigating opportunity cost, and the only way to do that is to constantly build off prior knowledge to ensure that you are placing your resources as well as possible. In many cases this is the most difficult barrier for groups to actually overcome: while there is a lot of positive talk around the subject, there are so many pushes and pulls on your time and resources that it is easy to lose track, or to just stop at the end of any given test. You have to force yourself and your group to maintain that momentum and, more importantly, to leverage all of that learning to drive where you are and where you are going.

Here is an example:

You take a product page and challenge yourself to learn which sections are most influential, so you test what belongs and what doesn’t by removing each possible section on the page. You learn that 2 sections don’t matter: the top navigation and the brand information. You then test removing them together and find that doing so improves the page against your site-wide single success metric. At this point you have a new page after pushing the winner, but you need to dive deeper. You then use a small 3X2 MVT to learn that the button is the most influential element on the new page. You follow that up by looking at the factors of that button to figure out what drives its influence. You learn that color is far more influential than copy or size, so you then test 5-6 different colors, and learn that purple, the color that everyone thought never had a chance of winning, actually does better than all the other colors.
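The first step of that path, the removal test, can be sketched as follows. This is a hypothetical illustration: the `button` and `product_images` sections, all conversion rates, and the rule that a non-negative lift means a section "doesn't matter" are assumptions made for the example; a real program would want statistical confidence before acting on a no-harm result.

```python
from typing import Dict, List

# Hypothetical conversion rate of the full (control) page.
CONTROL_RATE = 0.050

# Hypothetical conversion rate of each recipe that removes one section.
REMOVED_RATES: Dict[str, float] = {
    "top_navigation": 0.051,
    "brand_information": 0.0505,
    "button": 0.038,
    "product_images": 0.041,
}


def removal_lifts(control_rate: float, removed_rates: Dict[str, float]) -> Dict[str, float]:
    """Relative lift of each 'section removed' recipe over the full page."""
    return {section: rate / control_rate - 1 for section, rate in removed_rates.items()}


def removable_sections(lifts: Dict[str, float]) -> List[str]:
    """Sections whose removal did not hurt the success metric (assumed rule: lift >= 0)."""
    return sorted(section for section, lift in lifts.items() if lift >= 0)


def influence_order(lifts: Dict[str, float]) -> List[str]:
    """Sections ordered by how much removing them moved the metric, in either direction."""
    return sorted(lifts, key=lambda section: abs(lifts[section]), reverse=True)


lifts = removal_lifts(CONTROL_RATE, REMOVED_RATES)
print("safe to remove:", removable_sections(lifts))
print("most influential first:", influence_order(lifts))
```

The same result answers two questions at once: which sections can go, and which one deserves the next round of testing.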

What you have is an entire path that you never could have preconceived. You have made sure that popular opinion did not drive what you tested. The resulting page is not one that anyone could have predicted, but it is by far the best performing. You have learned how important each element is and the best way to change it, and you have used very little in the way of resources. You are free to continue, optimizing the second most important part of the page or the second factor of the button. The process continues, either in that same part of the user experience or wherever you see other opportunities to apply the same process based on what you have learned.

Iterative testing is constantly talked about and agreed on, so why is it so rarely done consistently? Why do so many groups think that talking about iterative action is enough? Groups get so caught up in the single winner that they miss that it is just one part of a very fluid user experience. You have to force yourself to follow this path, which is why it is a discipline; when it becomes difficult, push through and do it anyway, every time, in order to reap the rewards you are seeking. Even better, if you use segmentation at each point, you will end up with a page that can change dynamically based on the winning alternatives for exploitable segments. Each “test” is really just the next evolution of the same process. You have to stop yourself and your organization from viewing the test as an end in itself. Doing this consistently, not once but always, is what really differentiates the value you will receive from testing.
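The segmentation idea can be sketched as: find the winning recipe within each segment and flag segments whose winner differs from the pooled winner, since those are candidates for serving a different experience. This is a minimal sketch; the segments, recipes, and counts are hypothetical, and treating "the winner differs" as the signal for a segment worth targeting is a simplifying assumption, not a full definition of an exploitable segment.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# Hypothetical results keyed by (segment, recipe) -> (conversions, visitors).
Results = Dict[Tuple[str, str], Tuple[int, int]]

RESULTS: Results = {
    ("new", "a"): (50, 1000),
    ("new", "b"): (70, 1000),
    ("returning", "a"): (90, 1000),
    ("returning", "b"): (60, 1000),
}


def winner(rates: Dict[str, float]) -> str:
    """Recipe with the highest conversion rate."""
    return max(rates, key=lambda recipe: rates[recipe])


def overall_rates(results: Results) -> Dict[str, float]:
    """Conversion rate per recipe, pooling all segments."""
    conversions: DefaultDict[str, int] = defaultdict(int)
    visitors: DefaultDict[str, int] = defaultdict(int)
    for (_, recipe), (conv, vis) in results.items():
        conversions[recipe] += conv
        visitors[recipe] += vis
    return {recipe: conversions[recipe] / visitors[recipe] for recipe in conversions}


def segment_winners(results: Results) -> Dict[str, str]:
    """Winning recipe within each segment."""
    by_segment: DefaultDict[str, Dict[str, float]] = defaultdict(dict)
    for (segment, recipe), (conv, vis) in results.items():
        by_segment[segment][recipe] = conv / vis
    return {segment: winner(rates) for segment, rates in by_segment.items()}


def divergent_segments(results: Results) -> List[str]:
    """Segments whose winner differs from the pooled winner."""
    overall = winner(overall_rates(results))
    return sorted(seg for seg, best in segment_winners(results).items() if best != overall)


print("overall winner:", winner(overall_rates(RESULTS)))
print("per segment:", segment_winners(RESULTS))
print("serve differently:", divergent_segments(RESULTS))
```

With these made-up counts the pooled data picks recipe `a`, while new visitors respond better to `b`; pushing only the pooled winner would leave that difference unused.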

Deconstruction –

The ability to take a test idea someone presents and break it apart to find the assumptions that led to it, so that you learn as you grow, is a vital skill for optimization. One of the most common mistakes testing groups make is taking each idea at face value. Every idea comes from a single point of view and is riddled with biases. You have to force yourself and others past those points to discover the value hidden behind what sounds like a conventionally accepted concept. Treat the system by which you discover new things and challenge biases as the most important part of your program, and you will always be able to get greater results. We need to take any idea and challenge every core part of it, so that we leave nothing to chance and can really evaluate what works, not just what sounds like it works.

You will rarely be given the chance to optimize a page from start to finish as in the example above, but that does not relieve you of the responsibility to apply the same discipline from any starting point. People will always come to you with test ideas, whether through internal debates, feedback, or just as they are going across the site. There is most likely value in each idea, but the real skill is to break the idea down, to make sure you are not presuming the path, and to learn. Is that item even needed? Is what you have been doing positive or negative? Is that the best place to put resources? What else can be done with that space? Taking an idea and challenging its components is how you will arrive at the most important lessons you will learn.

The idea of targeting content in your homepage carousel based on what people have already purchased sounds like a great one, but let’s evaluate all of its assumptions:

Is purchaser even exploitable (can you change their behavior by changing the user experience)?
Is it the most exploitable way to look at the same user?
Is the homepage the right place?
Is the carousel the most influential part of that page?
What type of change to the content is most influential: the wording, the presentation, or the location?
Does content have the largest impact?
How does that entire path compare to other alternatives?

That is just the very first pass. It is actually far more likely that simply changing the layout of your homepage based on browser will be both much higher in yield and more efficient, at a much larger scale.

It takes practice and discipline to force yourself to challenge all of these ideas. It can often be uncomfortable at first, but pushing past that and helping others see their own biases and assumptions helps everyone grow. Testing is not about who had the best idea; it is about becoming humble, realizing everyone is “wrong”, and creating a system to push past that, learn, and challenge common convention for the betterment of everyone.

Conclusion –

I am often asked how I define a “successful” program. I measure a program by how often it has learned something new and unexpected. Has it done something that makes everyone say “That can’t be right” or “That goes against everything I have ever heard”? The only way to have those moments, and to get the value, both short and long term, that those conclusions present, is to break apart ideas and to challenge yourself and others on the questions you are trying to tackle. It isn’t about being sadistic or altruistic, but about treating the discipline as a means to an end that helps everyone achieve their goals.

In the end, it doesn’t matter if you can run 100 tests a month if you are running suboptimal tests. The point is never what you got, but what you got in relation to what you could have gotten with the same or fewer resources. So many groups get lost because they focus on the actions, and not on the disciplines that define them and the larger picture of the program they exist in. You did an action that you might have done anyway, but it is over, and you are left with nothing but the next idea.

You have to change how you think and how you act to get the results you want and to make testing a fundamental part of who you are as an organization. I talk in terms of disciplines because that is what they are: core beliefs that only take hold when we challenge ourselves to live by them every day, not just when it is convenient or easy. Holding yourself to these disciplines and asking the tough questions of others will move you down a path away from the obvious, to a world where you learn, grow, and get results that really move people.

To navigate the entire testing series:
Testing 101 / Testing 202 / Testing 303 – Part 1 / Testing 303 – Part 2