Are you smarter than your trashcan?

This may seem like a really inane question, but think about it for a second. We are thinking, breathing beings; how can we possibly not be as smart as our trashcan?

First, let’s establish how we would measure this. We like to pretend that how smart someone is runs on a scale from nothing to absolute, but it doesn’t. Having no knowledge and doing nothing is far better than using bad knowledge, thinking you have answered something correctly when you haven’t, or overreacting. We are fallible, and as such, we make bad decisions from time to time. Poor judgment, biases, and misinformation actively detract from an outcome, where zero knowledge has no impact at all. The scale for how smart you are is not 0 to 10, but -10 to 10, with 0 as the neutral point in the middle.

Your trashcan does not offer any knowledge, it does not react, and it does not have biases. It does not push its agenda, nor is it influenced subconsciously by the agenda of others. It is not wired to want to rationalize its own actions or to want to prove the value of its actions to itself or others. It is not impacted by Maslow’s hierarchy any more than it is by fear, or greed, or lust, or any of the other ways that we are wired to be influenced. Anything that goes into it, it can dump out just as easily. It doesn’t reject knowledge, nor does it change to fit the mood of the room. It does not provide any value and it has no knowledge, so it will always be stuck at 0 on that scale.

Now, human beings are capable of amazing things: we have built great monuments, civilizations, history, art, cars; we have done it all. We have also produced war, greed, genocide, hate, and bigotry, and believed many crazy things. Those are the end points of all we are capable of, but I am not referring to the theoretical; I mean this moment. You can be anywhere from the greatest level (10) to the lowest delusion (-10). We all constantly move up and down that continuum with each action we take, but do you know where you are at any given time?

How do we process information and how does it impact the decisions we make? How does it impact our view of ourselves and the world?

People like Philip Zimbardo and Robert Levine have shown that we are wired to look at the future, or the present, or the past, but we fail to hold multiple perspectives at once. We get too caught up in reacting today, or in planning for the future. We are full of biases and self-delusions, and even worse, knowing this in no way stops them from changing how we view the world around us. We know that in the past we have made mistakes, and that in the future we will make more, but how do you know whether right now you are making one? We lose perspective, and because of this the meaning of the data we use to make decisions changes constantly. We fail to balance what we are doing with where we are going. To make up for this, we make assumptions, we rationalize, we ignore data, and we find things to confirm what we want. We are so wired to confirm what we already do that we ignore a majority of the information from the world around us.

This impacts everyone.

So I ask you, right now, not tomorrow or 20 minutes from now, are you smarter than your trashcan?

You can’t answer with what you are capable of, nor with what you have done in the past. In the here and now, how do you know that the answer is positive? How do you know if you are currently adding value, or removing it? Are you really doing the right thing? Or are you just using misinformation, biases, and self-delusion to convince yourself that you are above 0, while all those other people are below it? Are you letting those biases rule what you see, and letting them convince you that you are smarter than the trashcan? Or are you really making an impact? Is your impact real, or is it hubris?

The only way to really improve your chances of staying above zero is to put in place a system that limits the impact of those biases and gives you insight into your own decisions. You have to be humble enough to measure your decisions in context, away from any in-the-moment manipulations, in a way that lets you know the efficiency of your choices. You’re never going to be perfect, but it’s up to you to make sure that you aren’t just calling trash gold.

Understand the math behind it all: The N-Armed Bandit Problem

One of the great struggles marketers have when they enter new realms, especially those of analytics and testing, is trying to apply the disciplines of math to what they are doing. They are amazed by the promise of models and of applying a much more stringent discipline than the normal qualitative discussions they are used to. The problem is that most marketers are not PhDs in statistics, nor have they really worked with the math applied to their real-world issues. We have all this data and this promise of power before us, but most lack the discipline to interact with the data and really derive value from it. In this series, I want to explain some of the math concepts that impact daily analysis, especially those that a majority of people do not realize they are struggling with, and show you how and where to use them, as well as their pragmatic limitations.

In the first of these, I want to introduce the N-Armed bandit problem as it is really at the heart of all testing programs and is a fundamental evaluation of the proper use of resources.

The N-Armed Bandit problem, also called the multi-armed bandit problem (each slot machine being a “one-armed bandit”), is the fundamental concept of balancing the acquisition of new knowledge with the exploitation of that knowledge for gain. The concept goes like this:

You walk into a casino with N number of slot machines. Each machine has a different payoff. If the goal is to walk away with the most money, then you need to go through a process of figuring out the slot machine with the highest payout, yet keep as much money back as possible in order to exploit that machine. How do you balance the need to test out the payouts from the different machines while reserving as much money as possible to put into the machine with the greatest payout?
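The setup is easy to simulate. Below is a minimal sketch in Python of the naive approach: spend a fixed slice of your budget sampling every machine evenly, then put everything left into whichever machine looked best. The payout rates and budget are invented numbers for illustration, not anyone's real data.

```python
import random

random.seed(42)

# Hypothetical casino: N slot machines, each with a payout rate
# hidden from the player. These numbers are invented for illustration.
payout_rates = [0.10, 0.25, 0.15, 0.40, 0.05]
budget = 1000  # total pulls we can afford

def pull(machine):
    """One pull of a machine: 1 if it pays out, 0 if not."""
    return 1 if random.random() < payout_rates[machine] else 0

# Naive strategy: spend an even slice of the budget exploring each
# machine, then dump everything left into the machine that looked best.
explore_per_machine = 40
observed = []
spent = 0
for m in range(len(payout_rates)):
    wins = sum(pull(m) for _ in range(explore_per_machine))
    observed.append(wins / explore_per_machine)
    spent += explore_per_machine

best = observed.index(max(observed))
winnings = sum(pull(best) for _ in range(budget - spent))
print("estimated rates:", observed)
print("exploiting machine", best, "with the remaining", budget - spent, "pulls")
print("total payouts collected:", winnings)
```

Note the built-in tension: the more pulls you spend estimating the rates, the better your guess at the best machine, but the less money you have left to exploit it.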

Which one do you choose?

Exploring the casino

As we dive into the real world application of this concept, it is important that you walk away with some key understandings of why it matters to you. An evaluation of the N-Armed bandit problem and how we interact with it in the real world leads to two main goals:

1) Discovery of relative value of actions

2) The most efficient use of resources for this discovery and for exploitation

The N-Armed bandit problem is at the core of machine learning and of testing programs, and it does not have a one-size-fits-all answer. There is no perfect way to learn and to exploit, but there are a number of well-known strategies. In the real world, where the system is constantly shifting and the values are constantly moving, it gets even more difficult, but that does not make it any less valuable. All organizations face the fundamental struggle of how best to apply resources, especially between doing what they are already doing and exploring new avenues or functional alternatives. Do you put resources where you feel safe, where you think you know the values? Or do you use them to explore and find out the value of other alternatives? The tactics used to solve the N-Armed bandit problem come down to how greedy you try to be, and they give you ways to think about applying those resources. Where most groups falter is when they fail to balance those two goals, becoming lost in their own fear, egos, or biases; either diving too deep into “trusted” outlets, or going too far down the path of discovery. The challenge is trying to keep to the rules of value and of bounded loss.
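One of the best-known of those strategies is the epsilon-greedy rule, which makes “how greedy you try to be” an explicit dial: with probability epsilon you explore a random machine, otherwise you exploit your current best estimate. A sketch, again with made-up payout rates:

```python
import random

random.seed(7)

# Hidden payout rates (made up); machine 2 is the best at 0.40.
true_rates = [0.10, 0.25, 0.40, 0.15]

def run(epsilon, pulls=5000):
    """Play `pulls` rounds with an epsilon-greedy policy; return total winnings."""
    counts = [0] * len(true_rates)    # pulls per machine
    values = [0.0] * len(true_rates)  # running estimate of each payout rate
    total = 0
    for _ in range(pulls):
        if random.random() < epsilon:
            m = random.randrange(len(true_rates))  # explore a random machine
        else:
            m = values.index(max(values))          # exploit the current best
        reward = 1 if random.random() < true_rates[m] else 0
        counts[m] += 1
        # Incremental mean: new_avg = old_avg + (reward - old_avg) / n
        values[m] += (reward - values[m]) / counts[m]
        total += reward
    return total

print("pure greedy (epsilon=0.0):", run(0.0))
print("10% explore (epsilon=0.1):", run(0.1))
```

Setting epsilon to 0 is the pure greedy action; it can lock onto whichever machine pays off first. A small, non-zero epsilon keeps some resources flowing to discovery.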

The reason this problem comes into play for all testing programs is that the entire point of testing is the discovery of the various values of each variant, or each concept, against one another. If you do not allow this question to enter your testing, then you are only ever throwing resources towards what you assume is the value of a change. Knowing just one outcome can never help you be efficient. How do you know what value you could have gotten by just throwing all your money into one slot machine? While it is easy to convince yourself that because you did get a payout, you did the right thing, the evaluation of the different payouts is the heart of improving your performance. You have to focus on applying resources, which for every group are finite, to achieve the highest possible return.

In an ideal world, you would already know all possible values, be able to intrinsically judge the value of each action, and then apply all your resources towards the one action that brings you the greatest return (a greedy action). Unfortunately, that is not the world we live in, and the problem arises when we allow ourselves that delusion. The reality is that we do not know the value of each outcome, and as such we need to maximize our ability to discover it.

If the goal is to discover the value of each action, and then exploit it, then fundamentally the challenge is how best to apply the least amount of resources, in this case time and work, to the discovery of the greatest number of relative values. The challenge becomes one purely of efficiency. We have to create a meaningful testing system and efficiencies in our organization, whether political, infrastructural, or technical, in order to minimize the resources we spend and to maximize the number of variations we can evaluate. Every time we get sidetracked, or we run a test that does not have this goal of exploring at its heart, or we pretend we have a better understanding of the value of things via the abuse of data, we are being inefficient and are falling short of the highest possible value. The goal is to create a system that allows you to facilitate this need, to measure each value against the others, to discover and to exploit, in the shortest time and with the least amount of resources.

An example of a suboptimal design for testing based on this is any single-recipe “challenger” test. Ultimately, any “better” test is going to limit your ability to see the relative values. You want to test a banner on your front door, but how do you know that it is more important than your other promos? Or your navigation, or your call to action? Just because you have found an anomaly or pattern in data, what does that mean relative to other alternatives? If you only test or evaluate one thing by itself, or don’t test feasible options against each other, then you will never know the relative value of those actions. You are just putting all your money into one slot machine, not knowing if it has a higher payout than the others near it.
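To see why the single-recipe challenger falls short, consider this toy simulation. The conversion rates are hypothetical, chosen so that the element you happened to pick for your challenger test is not the most valuable one:

```python
import random

random.seed(3)

# Hypothetical conversion rates for a control page and three feasible
# alternatives; in reality these are exactly the unknowns you test for.
rates = {"control": 0.050, "banner": 0.055, "navigation": 0.070, "cta": 0.062}

def trial(name, visitors=20000):
    """Simulated test cell: conversions out of `visitors`."""
    return sum(1 for _ in range(visitors) if random.random() < rates[name])

# Single-recipe challenger test: control vs the one idea you picked.
single = {name: trial(name) for name in ("control", "banner")}

# Testing all feasible alternatives against each other instead.
full = {name: trial(name) for name in rates}

print("single-challenger view:", single)  # you only learn banner vs control
print("full comparison view  :", full)    # relative value of every option
```

The single-challenger view can genuinely show a lift over control, and still tell you nothing about the payout of the machines you never put money into.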

This means that any action that is taken by a system that limits the ability to measure values against each other, or that does not allow you to measure values in context, or that does not acknowledge the cost of that evaluation, is inefficient and is limiting the value of the data. Anything that is not directly allowing you the fastest way to figure out the payouts of the different slot machines is losing you value. It also means that any action that requires additional resources for that discovery is suboptimal.

If we have accepted that we have to be efficient in our testing program, we still have to deal with the greatest limiter of impact: the people in the system. Every time we are limited to “best practices” or by a HiPPO, we have lowered the possible value we can receive. Some of the great work by students of probability, especially Nassim Nicholas Taleb, has shown that for systems, over time, the more human-level interaction there is, and the less organic the system is allowed to be, the lower the value and the greater the pain we create.

Comparing organic versus inorganic systems:

[Figure: Taleb - Value of a System]

We can see that for any inorganic system, one that has all of those rules forced onto it, over time there is a lot less unpredictability than people think, and there is almost a guarantee of a loss of value for each rule and each assumption that is entered into that system. One of the fastest ways to improve your ability to discover the various payouts is to understand just how many slot machines are before you. Every time that you think you are smarter than the system, or you get caught up in “best practices” or popular opinion, you have forced a non-organic limit into the system. You have artificially said that there are fewer machines available to you. This means that for the discovery part of the system, and for the good of our program and the value we gain, we must limit human subjectivity and rules in order to ensure the highest amount of value.

An example of these constraints is any hypothesis-based test. If you are limiting your possible outcomes to only what you “think” will win, you will never be able to test everything that is feasible. Just because you hear a “best practice” or someone has a golden idea, you still have to make sure that you are testing it relative to other possibilities, nor can you let it impact your evaluation of that data. It is ok to have an idea of what you think will win going in, but you cannot limit yourself to that in your testing. That is the same as walking up to the slot machine with the flashiest lights, just because the guy next to you said to, and only putting your money in that machine.

Everyone always says the house wins, and in Vegas that is how it works. In the real world, the deck may be stacked against you, but that does not mean that you are going to lose. Once you understand the rules of the game and can think in terms of efficiency and exploiting, then you have the advantage. If you can agree that at the end of the day your goal is to walk out of that casino with the largest stack of bills possible, then you have to focus on learning and exploiting. The odds really aren’t stacked against you here, but the only way to really win this game is to be willing to play it the right way. Do you choose the first and most flashy machine? Or do you want to make money?

Testing 303 – Advanced Optimization Paradigms – Part 2

In the first part of our look at advanced paradigms, I focused on the complex interplay of testing and other parts of your organization. As testing grows, it starts to interact on a nearly daily basis with every part of your organization. If you look at the evolution that we have taken, going from the very fundamental building blocks of a testing program, to the ways we look at tests and testing, and finally to the complex interactions of testing into everything, we have shifted the importance and the value that testing brings. The final stage of evolution is to start evaluating your own core beliefs of even what is a testing program, data, and even how we view the world. It is easy to challenge others to grow, but the most difficult and most rewarding changes always start from within. If the evolution starts with getting people to align, it ends with changing our fundamental beliefs about data. We have to ask extremely difficult questions and challenge our own interactions, breaking down our beliefs and rebuilding them to strengthen and evolve.

To that end, here is the final look at advanced optimization paradigms:

No More Focusing on Test Ideas –

If we view optimization as a discipline, one that never starts and never ends, one that is about the constant changing and learning of a user experience, then there is no longer any need for individual test ideas. An idea naturally has a start and an end, as any hypothesis comes from a belief in a specific solution to an existing problem. People get so caught up on their idea, be it from their own experience, some piece of data that they just know means they have the solution to all your problems, or just “best practices,” that their brains shut down and they stop trying to find the best answer. The problem is that the entire process leads to a myopia that gives us the right to stop, the ability to prove ourselves “right,” and a natural affinity towards a set path.

If the focus is no longer on test ideas, however, then the system is what you focus on. If we are instead treating things as a cycle: explore, learn, ideate on all feasible alternatives, execute, learn, repeat, then there really is no individual campaign or test. The path is never about what you think will win, or what you want to tackle, but only about where the causal data leads you and the evaluation of all feasible alternatives. Test ideas become the least important thing you can discuss, and should be viewed with high levels of skepticism. There is no such thing as a good test idea, only a concept that can be broken apart, challenged, and improved. Fear any expert trying to sell you a single great test idea, or any guaranteed set of steps to improve your site, as they are only playing to your own insecurities; the reality is that no single idea can hold up to scrutiny. This impacts your own beliefs as much as anyone else’s: you have to hold yourself to the same level of scrutiny, not allowing what you want to happen to become the path you go down or the answers you seek. You have to be willing to step outside your own opinion and focus on all feasible alternatives and only on the efficiency of changes; any bias that you allow to limit what you test, be it from experience or popular opinion, devalues the outcome you can generate.

The biggest challenge most people have is the feeling of losing control. We often like to blame other groups for this behavior, but by far the most guilty group is analysts, who are so busy trying to prove a point with their data that they fail to see the larger picture. They so want to prove a path using their analytics that they fail to factor in the need to change to an active form of data acquisition in order to move forward. You have to worry about your own biases before you can stop others’. It is easy for all groups to get focused on what their experience or gut tells them is right, often to poor and inefficient outcomes. Make it clear that no idea stands alone. Put in place measures to ensure that you are not limited to popular opinion or to only what you think or want to win. This often means that you have to prioritize resources in ways you are not doing today, but ultimately this is the only way to ensure you are getting the greatest value and to continue your own education about the value of actions.

Free yourself from the cycle of defending and pushing every idea, and instead create momentum and a consistent pattern of action. Everyone is afraid of moving towards the infamous “41 shades of blue” extreme of this path, but the reality is that it frees you. You no longer need consensus and you can push the boundaries of what you try. Once you have established that blue is the most important element, you may, depending on your feeling about the N-Armed bandit problem, want to test 41 different variations, but that is not an affront to you. People built the system, fed the system, and control the system. Once you get to the point that you know what you need to know, why not let the system provide the answer for you? The system is only as valuable as the people who feed it, yet we fear the system and we fear becoming lost to it.

Moving down this path, of avoiding individual ideas and of not trying to find the perfect solution, allows you to re-imagine and recreate who and what you are on the fly, without massive redesign efforts. It allows you to avoid holding anything sacred on the site, and to stop worrying about the entire concept of “right.” The user experience becomes a fluid thing, where the true value of data, your creativity, and your ability to move past your own biases determine the magnitude of growth that you will experience. The true value of the individual is in how well they feed the system. The system is only as valuable as what goes into it, and by democratizing all ideas, by forcing the conversation away from ideas and towards feasible alternatives, you are giving more value to the creative freedom of the members of your organization.

Optimization powered Analytics –

Let me pose a theorem to you: Analytics, by itself, is completely worthless.

Let me challenge you by looking at the entire current practice of analytics as nothing more than hubris. The current use of analytics, especially by those who purport to be experts, is nothing but a newer, accepted justification for what you were already going to do or were already thinking. Every new misunderstanding of Moneyball, or of advanced statistical models, is a sales pitch designed to make you feel like you are making a much larger impact than you really are. This is not to say that analytics cannot be powerful, only that the way data is abused by the practitioners of the industry to propagate myths and bad practices is worse than worthless; it’s inefficient.

People have gotten so lost in their ability to collect information, the speed at which we can get feedback, and the need to justify their existence that they never take a moment to question what you can really get from pure analytics. Numbers have become the new shield by which we persuade others of our “greatness,” not to actually provide value, but to tell stories driven by ego and a desire to be the one making the decision. We so want to target a group that we find one that stands out, or we so want to show our value that we tell someone they are doing something wrong, only to replace their “bad” decision with an equally biased one, taking credit for any result that comes from this use of resources. In the rare best case with analytics, you are left with probabilities and no clear direction; in the worst and most common cases, we are left with biased “insights” powered by everything but data.

In reality, we are no longer trapped by this use of data, like so many industries before us, because of our ability to interact directly, efficiently, and quickly. The data loses all value when we force a path onto it or forget what it can really tell us. Because we are a new field, mostly manned by people without real practical data discipline, our lack of understanding of the nature of data allows our biases to paint a picture that does not exist. There are hundreds of agencies, groups, and people who claim to have the newest way to repeat the same types of “analysis” without any new insight into the value of that data. There are always new ways to corrupt statistics, or different analysis techniques used to push an agenda, not to actually provide real value. We are so busy running full speed down a path that we miss some really important and fundamental facts. Using only correlative data, we have no way to know the cost to change, the real value that anything by itself provides, or the actual scale of impact of any future change.

If efficiency is a measure of outcome over cost, then we have no insight into any piece of that equation. All the analysis in the world cannot overcome the limitation of a one-directional, limited data set from a constantly changing and imperfect ecosystem. We find something that sticks out in the data, and then pretend it is more valuable than all the other pieces of data, simply because we can “identify” it, despite having no idea of the value of that change nor what some other undiscovered optimization would bring. How does knowing that people from search spend half as much time on your site as other visitors in any way tell you the cost to change their behavior? That you can even change that behavior? Or the relative scale of impact compared to other feasible alternatives? How is the anomaly any more efficient than the thing that looks like everything else? What do you really know from just identifying something in correlative data? Do not confuse your ability to “discover” something in analytics with your ability to derive value and efficiency.

Why do we accept that limitation, and why do we not try to give the context necessary to better answer those questions? Why do we perpetuate the myth of purely passive data acquisition as a means to answer so many of the questions that we pretend to be able to answer today? Why do we pretend we can start with this magical data set and somehow arrive at the best answer? We are forced to use conjecture to make assumptions, and then pat ourselves on the back when we get a result. We decide what we are going to do analysis on, find a single answer, and then defend it because it is backed by data. Is that result a good result? If I have 100 possible positive outcomes, and I get the 2nd worst one, who would tell me that is a good thing? Yet when we do not account for that context, we are constantly shouting our accomplishments from the hilltop. Do we congratulate the outcome we got, or the 98 better ones that we missed? If you scored 2% on any test, you would think you failed miserably, yet we hide this truth from ourselves to make sure that we all feel like we got an A. The truth is that we will never know any of the important contextual information we seek from correlative data alone.

Let me propose that testing, as a creator of causal data in a controlled setting, is the only way to actually achieve all those value propositions that you have been promised. That causal data, the seeking and creation of it and the use of it as a transformative agent to power that earlier data collection, to move past so many of the limitations of online data collection, is the only true way to answer these important questions. That by “powering” your analytics, being willing to look past the myths and bad practices, and breaking down what you really know from your data, you reach the point where myth becomes real and where you can truly and dramatically impact your business’s bottom line. This is why machine learning is such a big deal, why we move towards optimization algorithms, and why it is so vital that you understand the value propositions of your various types of data. All of those methods leverage causal information as a building block to grow and learn. There is a better way, but it requires you to be humble and disciplined to reach that “nirvana”.

The core problem with analytics is that you are limited to linear correlative data. No matter how pretty a model and how much statistics you apply, you will never know the value of an action, nor will you know the efficiency of changing it. We are trapped because the passive nature of the data only looks one way (towards the past) and has no way of accounting for feasible alternatives, or even the null assumption. You are stuck in the land of rates of action; you have a 2.8% CTR on the “other products” module on your product page, but is that good or bad? Even if that is much higher or lower than average, how do you know that acting on it is any better than acting on the thing that looks just like all the others? If you removed it, where do those clicks go, and is that more valuable than what is there now? Does increasing it help or hurt, or more importantly, what happens if it is not there, and what is the cost of changing it as opposed to the cost of changing another module? Are people who purchase more likely to sign up for newsletters, or is it the other way around? All of those questions can be answered directly and efficiently through testing, and once we have created a number of interactions, we can start to see patterns from those causal relationships. We have the power, with very little effort, to start to really see the impact of changes, not just extrapolate them blindly.
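The difference between a rate and a value can be made concrete with a holdout test: rather than staring at the module’s CTR, show the module to one group, remove it for another, and compare the outcome you actually care about. A toy sketch; the effect size and rates here are invented for illustration:

```python
import random

random.seed(11)

def purchase_rate(sees_module, visitors=50000):
    """Simulated purchase rate for one group. Assumed (invented) effect:
    showing the module nudges purchases from 3.0% to 3.3%."""
    p = 0.033 if sees_module else 0.030
    buys = sum(1 for _ in range(visitors) if random.random() < p)
    return buys / visitors

with_module = purchase_rate(True)      # group that sees the module
without_module = purchase_rate(False)  # holdout group, module removed

# The module's own CTR tells you none of this; the holdout does.
print(f"purchase rate with module   : {with_module:.4f}")
print(f"purchase rate without module: {without_module:.4f}")
print(f"causal lift from the module : {with_module - without_module:+.4f}")
```

The CTR never changes in this picture; what you learn is what the module causes, which is the number you can actually act on.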

What if you instead ignore all of that data in its passive form, and look for the active interaction of data to inform those decisions? What if, instead of starting with correlative data, we ignore it until we have the context to make it valuable? What if we use the causal relations with an eye towards efficiency? What if you viewed data as an active measure, one that gains more value the more you eliminate unnecessary waste in the system, and one that only takes hold once you are disciplined in how you think about it, what you measure, and how you actively change it? What if we stop allowing our biases and misconceptions of data to dictate the start of our analysis, and instead allow the data to truly tell us what matters? What if you start measuring the value of your correlative data by its interaction with the causal data, allowing for a much deeper connection to efficiency? What if you start looking for the value of an action, not the rate of an action? Testing is your active arm, able to change all of that correlative data into causal data, if you are willing to go down that path.

This is the opposite of the myth of using analytics to power testing; it instead forces you to accept that correlative data, with all the limitations inherent in online analytics, is not enough to make meaningful decisions. This is not about using testing as a means to prove one point right, but as a means to understand and value alternatives against each other. Changing correlative data into causal data presents you with information that is truly actionable and that truly gives you insight into the outcomes, value, and costs that we pretend we already have the answers for. This is the last step of the evolution of looking for the best answer and of stopping biases from leading you astray.

The challenge is that you cannot just take one test, or any single data point, and pretend you have meaningful inference. Just as you cannot pretend to know the direction of a correlation or the value of something from its rate of action, you cannot pretend to answer everything from a single test result. Diving through all that analytics data from a single test result is a dead end that leads to the same problem that plagues most uses of analytics. You have to be disciplined, and you can only reach this point after you have run a full series of tests. Think in terms of using this data to increase the efficiency of the system. You get real value only when you apply testing to power your analytics. We can measure the value of the items on the page, their very existence, and the costs to change them. We can quickly get tests live on multiple page types and measure the relative value. We can run a series of tests on a page, and induce changes that allow us to see what segments are exploitable, or even what the influence of various parts of a user experience is on those segments. If we are disciplined, we learn, and we never stop, then we can induce answers to achieve a positive result, while also answering those great unknowns that are ignored by analytics alone.

To make this even better, the act of acquiring the data also comes with the benefit of meaningful lift and improvement to your business. There is no zero-sum game of only acquiring data or only getting lift; using testing to power your analytics allows you to meet the needs of change and growth while giving you all the promised panacea that so many claim analytics provides by itself. It allows you to truly think in terms of efficiency and to know the value of the different feasible options before you. It requires you to change completely how you think about analytics, to look at it as part of a larger ecosystem in which you are informing the data, and then using that data to inform future action. It is not just pretending that the data is informed and then blindly using it to prescribe action. If you instead act to create causal information, use that to filter your correlative data, and do this with discipline, you can actually get those answers that we pretend we have today.

The sad truth is that most people who are in testing come from an analytics background. Just as many old-school marketers struggle to stay current in the face of change, so too do many data “experts” who give new names to the same misguided techniques. They view everything through the analytics lens, and as such they want to justify their analytics via testing, and to apply the same problematic disciplines to testing in order to bring it in line with current efforts. They so want to justify what they have done that they ignore its fundamental weakness and try to force new disciplines to conform to what they are doing. This leads to an entire marketplace full of people stuck trying to justify their existence, but very few willing to challenge its entire value proposition. I challenge you to avoid that black hole, to be willing to challenge your own worldview and your own core beliefs about data, and instead to look at how you can best acquire meaningful data and how best to leverage it outside of what you are comfortable with. Very few people look at testing as its own discipline, or, even better, see how that discipline can impact and change how you view other actions. There is a giant fishbowl of people in a race to the bottom, justifying and preaching analytics as a feeding system for testing. I challenge you to be better than the current environment.

Let me instead suggest that you will only achieve real value if you flip that system, challenge yourself to think outside of that box, and power your analytics via your testing. Testing is just one skill of many, but it deserves its own place at the table, not a role as a filter by which you justify other actions.

Conclusion –

The goal of these posts is to introduce new ways of thinking and to challenge your current mindset. I have shown the evolution from the most fundamental skills to paradigms that challenge your entire data worldview. It is only by changing what we do that we grow, and it is only by challenging our own core assumptions about what works that we can really make the dramatic impact to the bottom line that we all claim to want. You cannot just assume that everything you hold true today will be the same in the future, nor can you expect to improve if you refuse to change your own behaviors.

The reality is that there is no such thing as “right”; throughout human history we have continued to find better answers to all our questions. What I am proposing is allowing these new ways of thinking to interact with what you are doing, to see if you can find a newer, “righter” answer that brings your program to a whole new level. It is only through changing the fundamental building blocks of what we do that we achieve the scale and impact we want. Change who you are and what you think, let in other ways of thinking, and try to be better than the water you are swimming in. Be willing to leave your current lake and find the diverse ocean of disciplines and ideas that are out there, and you will always be growing and getting better at what you do.

To navigate the entire testing series:
Testing 101 / Testing 202 / Testing 303 – Part 1 / Testing 303 – Part 2

Testing 303 – Advanced Optimization Paradigms – Part 1

One of the great truths about any organization is that no matter what it is you are doing, each program eventually plateaus and finds a normalization point where it no longer grows at the same rate or with the same push it did before. Whether it is mental fatigue, new objectives, changes in leadership, or more commonly reaching the end point of the current path of thinking, each program can only go so far forward without a re-invigoration of new ways of thinking and by challenging itself to get better. It is only by bringing in new ways of thinking and challenging core beliefs that you are free to grow past those self-imposed limits.

In our introduction, we talked about disciplines that enable you to move faster and align on a common goal. In the second series, we went over disciplines to help you think about tests and testing differently to get more value from your actions. The third and final evolution takes us to new ways of viewing the world and our organization, and challenges us to go in new directions, to understand new paradigms that should challenge some of the most common and fundamental beliefs about data and optimization.

One of the great quotes that I keep close at heart comes from John Maxwell: “If we are growing, we are always going to be out of our comfort zone.” With that in mind, I want to introduce these paradigms for your program and challenge you to evaluate them outside of the fishbowl, as ideas on their own that can help your program get past its current plateau and help you grow in your thinking about optimization.

Analytics and Optimization as Very Different Disciplines –

There are many different ways that you look at and act on data in testing that are the exact opposite of analytics. Where so many programs fail is when they force one way of thinking onto their data, reverting to what they are most familiar with. In the world of analytics, you have to look for patterns and anomalies, look across large data sets, and try to find something that doesn’t belong and doesn’t fit with the rest of the data. You are constantly looking for outliers that show a difference and then extrapolating value from those measurable differences. In the world of optimization, you have to limit yourself to looking only at what you are trying to achieve, and to act on data that answers fundamental questions. It becomes extremely easy to fall back into more comfortable ways of thinking, because the data sounds and looks similar, but ultimately success is dictated by your ability to look at the data only through a different lens. You have to stop yourself from trying to dive down every possible data set and instead focus on the action from the causal relationship around the single end goal.

It is what you don’t do that defines you as much as what you do. It is about “did removing this module improve revenue performance”, not “did this change increase CTR to the main image and where did those paths lead”. It is also about not allowing linear thinking to interrupt what you are doing. You have to focus on the value of actions, not the rate of them. You are looking at the value added by a user (RPV), not the number of short-term actions (CTR). Never look at how many people moved from point A to point B; instead, look only at the measurable impact toward your site goals. Just because you increased clicks, or got more people into a funnel, or even got more transactions, it does not mean that you increased revenue. Assuming a linear relationship between action and value can be extremely dangerous and myopic; many programs have been run into the ground because they do not understand the difference between the count of an action and the value of that action. Analytics forces you to think in terms of rates of action, but optimization forces you to think about the value of actions and the cost to change a person’s propensity of action.
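To make the rate-versus-value distinction concrete, here is a toy calculation (all numbers are hypothetical, not from any real test) showing how a variant can win on CTR while losing on RPV:

```python
# Hypothetical numbers comparing a control and a variant experience.
# CTR measures the rate of an action; RPV measures the value a visitor adds.

def ctr(clicks, visitors):
    """Click-through rate: share of visitors who clicked."""
    return clicks / visitors

def rpv(revenue, visitors):
    """Revenue per visitor: value added per person, regardless of clicks."""
    return revenue / visitors

control = {"visitors": 10_000, "clicks": 450, "revenue": 22_500.0}
variant = {"visitors": 10_000, "clicks": 600, "revenue": 21_000.0}

# The variant drives more clicks...
assert ctr(variant["clicks"], variant["visitors"]) > ctr(control["clicks"], control["visitors"])
# ...but each visitor is worth less, so it loses on the metric that matters.
assert rpv(variant["revenue"], variant["visitors"]) < rpv(control["revenue"], control["visitors"])
```

Judged by CTR alone the variant looks like a winner; judged by RPV it is a loser, which is exactly the trap linear thinking sets.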

Think of your site as a giant system. You have an input of people, with each input type interacting differently with the system. The things you sell, the layout, the experience, all of it makes up a giant equation. When those people enter your site, they go through it, and they come out the other end at some rate or some value. The numbers or rates associated with that one path are analytics. That inherent behavior based on the current user experience is their propensity of action. In testing, you have to focus solely on your ability to increase or decrease that propensity of action, not on the absolute value of that action. We care that we increased that behavior by some delta, some lift percentage: not that it was 45% and moved to 49%, but that we increased it by 8.9%.
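The arithmetic behind that last sentence is worth making explicit; a minimal sketch, using the 45% and 49% figures from the paragraph above:

```python
def relative_lift(baseline_rate, new_rate):
    """Lift as a percentage change in propensity, not an absolute delta."""
    return (new_rate - baseline_rate) / baseline_rate * 100

# Moving a rate of action from 45% to 49% is a 4-point absolute change,
# but an ~8.9% relative lift in the propensity of action.
lift = relative_lift(0.45, 0.49)
print(round(lift, 1))  # → 8.9
```

Reporting the relative delta keeps the focus on how much you changed the system, rather than on the absolute rate, which belongs to analytics.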

In testing, the answers you might receive will be of the nature of “we got more of the high-value and less of the low-value populations” or “the system improved as a whole by 4%.” Ultimately, “answers” matter far less than the changes observed and your ability to act quickly and decisively on them. What you won’t receive is why, or what each individual part of the system did to get you there. Those answers are stuck in the realm of correlation, and as such they have to be ignored, because you have at best only a single data point. We are trying to move forward as quickly as possible in the realm of optimization, so getting lost in loops of trying to answer questions that are not answerable only hinders your efforts. No matter how much you analyze an individual test result, you will never have more than a correlation. This means you have to think differently in order to use that data. It doesn’t matter why, or which piece, or even which individual population (though dynamic experiences based on outcomes are important), so you have to force yourself not to go down those roads.

It is also about the ability to hold yourself accountable for change. So many analysts fail because they view their job as ending the moment they make a recommendation. There is a revolution taking place in our industry, led by people like Brent Dykes, that is changing the entire view of optimization away from the recommendation and the data and toward the final output. In optimization, you are only successful if the results you find are acted on and made live. It requires you to view the cycle as one of action and not one of inaction. It is not that both don’t have their place, but to be really successful you have to be able to step away from your analytics self, think differently, and force yourself to act differently in order to get the results you need.

Testing Applied to Multiple Teams –

Testing is something that has many core disciplines, but it takes on a very different look and value for different groups. Your IT team may get a completely different “value” from testing than your merchandising team, as your design teams might from your analytics team. Many groups believe that because they have expanded their testing from their landing pages to their product pages, they have expanded the value of testing throughout the organization. Instead, they need to rethink how optimization disciplines can interact with the different groups’ efforts on a fundamental basis. Testing is not just changing a marketing message; it is the evaluation of possible feasible alternatives, something that all groups need to do to improve their effectiveness. Testing is just as applicable to your SEM team, your merchandising, your product management, your IT team, your personalization team, and many others. Each group has different needs and different disciplines, and as such you have to apply the disciplines of testing to them in different ways. An IT team can use testing to decide which project to apply long-term resources to. Your UX team can tie testing to their qualitative research to understand the interconnection of positive feedback to overall site performance. Your SEM team can use testing to measure the downstream impact of their various branding campaigns.

The reality is that applying all the unique benefits of testing to different groups, and not just increasing the surface area you apply the same things to, can fundamentally improve your entire organization. While this might sound simple, the reality is that most groups do the same type of testing, or try to apply the same techniques across multiple parts of the site, not across different teams. Each group may be aligning on the same goal, but they do things in very different ways. Applying optimization to those groups looks and acts very differently in each case, and as such it is difficult for most groups to really apply these disciplines in a way that truly impacts the fundamental practices of more than one group.

Instilling this use of testing as a fundamental building block also allows you to get ahead of a large number of major problems. It forces organizations to test out concepts well before they decide on them as long-term initiatives. One of the most common examples of this is in the realm of personalization, where so many groups are sold on the concept, but are not willing to go through all the hard work of figuring out exploitable segments or the value and efficiency of various ways of interacting with the same user. Getting ahead of the curve and testing out the efficiency of the effort will dramatically improve its performance. If you test a complex idea in one spot against other feasible, simpler ideas, and find the simpler idea performs better, as it almost always does, you save massive IT resources while getting better results. It is far more likely that simple dynamic layout changes for Firefox users will be magnitudes more valuable than a complex data-feed system from your CRM solution, and testing is the bridge that lets you know that before you fall down the rabbit hole.

Each group tends to end up at the Nth degree of the same thing they bought the tool for. So often, the fear of the unknown or of challenging someone’s domain stops new groups from allowing testing in, but when you can overcome those barriers, you can have an exponential impact on the organization. When you start applying optimization to multiple types of internal practices, and you are able to bring the results together in real synergy, that is when you can really see optimization spread and see the barriers drop throughout an entire organization. It is also the point where the lessons you learn become three-dimensional and universal across the entire organization.

Testing has No Start and No End –

Optimization is not a project. It is not just one person’s job, and it is most definitely not something you can just choose to end some random Tuesday. So why then do people view it as a series of projects, with a start and a stop? Why do they view it as only part of one person’s role or responsibility, or something that is done when they have the chance? There are functional reasons to have set people assigned to testing and, as programs grow, to have a separate specialized team, but that is not the end of the battle. Why do we try to force artificial time constraints on it, with starts and stops, and talk about it as something we did or will do? It is either an action that you live, or it is not. If everything your organization is doing, be it some small tweak, a redesign, or the release of a new feature, is not viewed as part of an ongoing process, with lessons to learn and to be evaluated democratically through the system of optimization, then optimization has been allowed an artificial start and stop just to appease various members of your organization.

Optimization has to be something you live. You have to be thinking in terms of it every day, you have to view each task as something that can get better, and you have to view each idea as just one of many, one that is not up to the HiPPO or anyone else to decide on. It is a responsibility not to let projects, or holidays, or new CMOs, or anything else stop you from this constant quest to improve the site, the processes, and the people. Do not confuse the act of running a test with the entirety of optimization. It is vital that you view the act of creating something new as just the very first step, and not the end point. There should be no point where anything is thought of as “perfect” or “done”, or where you can just throw something live and walk away. Optimization is part of every process, it is part of every job, and it is something that everyone works together on to make sure it is part of every action the organization takes.

When you have finally incorporated testing into your organization, all projects will view it as another natural part of their evolution. Project plans will not only incorporate optimization on an ongoing basis, so that it is part of your expected timeline, but will also stop trying to get everything “perfect”. If you view your projects as never finished, then there is no need for everything to get perfect signoff, nor do you need perfect agreement on each and every piece. What is important is that you spend as much time and resources testing out all those ideas that you have discussed, instead of just sitting around a room and compromising on a final version. You will no longer be so caught up in your pet project, as the entire concept is that it will and must change.

So much of what happens in organizations is about the politics of owning and taking credit for different initiatives. People’s reputations and egos are on the line when they propose and lead dramatic changes, especially redesigns, for the site. If you can truly incorporate testing and optimization as a vital part of all processes, one that is not just a “project” but part of the very existence of the site and the group, then you free people up from being so tied to their “baby”. Treat all ideas as malleable and transient, to the point that everyone is really working together to constantly move the idea forward. It can be a dramatic shift for organizations once they reach this point, but ultimately it is when groups really start to see dramatic improvements on a continuous basis.

So often the failure is in not following through with each of the concepts I have brought forth; each action is tantalizingly easy, but the real discipline, the ability to keep pushing six months from now, is what really differentiates programs and people. Being willing to move past the barriers, putting the pieces in place that make a difference, and being willing to change how you and others think are the real keys to a successful program. If you are always trying to do what is easy, or just listening to the pushers of magic beans and myths, then you can never really grow your program to the levels that are possible. Do the hard work, get out of your comfort zone, and you can continue to get better and to see more and more value from your testing program.
