Category: Methodology
MVT – Why Full Factorial vs. Partial Factorial Misses the Entire Point
One of my first introductions to the larger world of testing was getting a chance to serve on a panel about multivariate testing. I remember how divergent the opinions were and how deep the misconceptions about the entire process ran. Just about everyone I talked to had the same common preconceived notions of how to use multivariate testing, and even worse, almost all of those notions were based on their need to propagate their sales pitches. Now, as I work with more and more organizations, I see the same bad ideas replicating, and groups continue to miss the true value of multivariate testing. MVT is something that holds all these promises, but when done for the wrong reasons it multiplies the worst of testing instead of facilitating the best of testing. Even worse, groups then confuse the issue, focusing on the method of the test and not the fundamental mindset that created it. Many groups get into debates about the “value” of the different multivariate methods out there, which is nothing more than a fool’s errand, since any method is going to fail when used for the wrong reasons.
Too many times people get caught up on the “advantages” or “disadvantages” of the various forms of multivariate analysis. There are many advantages to full factorial testing: fewer rules, better insight into interactions across tested elements, and the ability to test non-uniform concept arrays. There are many advantages to partial factorial testing: speed, forced conformity to better testing rules, and more efficient use of resources. What does not matter is which one allows you to throw things at a wall and get an answer. When you are busy trying to answer the wrong question, you can fail with any tool. It is only when you are trying to succeed that the differences between tools matter.
The fundamental use of multivariate testing for most groups is to combine multiple badly conceived A/B tests, throwing them all together in the hope of quickly finding a combination that increases results. So many groups want to try out this combination of ideas that they think an MVT campaign is the solution. You can use the test that way; it is both statistically valid and guaranteed to produce a result. But at what cost? The challenge is that you will waste resources and time, and you are guaranteed a suboptimal outcome from this flawed way of thinking. Any form of multivariate testing that is used as a massive collection of individual tests is always going to be inefficient, since you are replicating and combining the imperfections of those individual tests in a way that magnifies them. If your goal is simply that individual outcome, and it is for far too many programs and especially agencies, then you will never get any true value from multivariate testing until you change your mindset.
Fundamentally, the concept of just trying to find a combination misses a basic truth: you are spending a massive amount of resources creating all these permutations and offers without an understanding of the efficiency of each resource.
1) All the ideas come from preconceptions and hypotheses about what works
2) Every added variant adds cost, both in its creation and in the data acquisition needed for the results to be meaningful (see the sketch after this list)
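To put that second point in concrete terms, here is a minimal sketch of how the data-acquisition bill grows with each added variant. Every number in it, the factor levels, the traffic, the per-cell sample size, is a hypothetical illustration, not a benchmark:

```python
# A sketch of the data-acquisition cost of a full-factorial MVT.
# All factor levels and traffic numbers are hypothetical illustrations.

from itertools import product

factors = {
    "size": ["small", "medium", "large"],
    "color": ["orange", "red"],
    "copy": ["Purchase", "Buy Now"],
}

cells = list(product(*factors.values()))
print(f"{len(cells)} combinations to design, build, and QA")  # 3 * 2 * 2 = 12

daily_visitors = 10_000   # hypothetical traffic to the tested page
per_cell_sample = 5_000   # hypothetical visitors needed per cell for power
days = len(cells) * per_cell_sample / daily_visitors
print(f"~{days:.0f} days of traffic before the results mean anything")
```

Adding one more level to any factor multiplies the cell count, which multiplies both the build cost and the traffic you need before any answer is trustworthy.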
If we instead focus on multivariate testing as a means to filter our resources rather than simply combine them, then we are able to achieve efficiency. If we try to limit our resources and only apply them where we will get the most return, then we must always view multivariate testing as a tool to learn and be efficient, not one for throwing things out to see what works.
The classic example of a multivariate test is testing a button. Let us say I have a medium orange purchase button on my site today. I might think that red would be better than orange, and my UX person thinks that “buy now” will perform better because he saw it on a few competitor sites. You throw in a slightly larger button for good measure, run the test, and get a predicted best combination of large orange “buy now.” You slap yourself on the back and move forward. The reality is that each of those factors (size, color, copy) has a massive number of feasible alternatives, and all we did was look at a very limited, biased set of them.
Let me propose a better way. Look at that same test, but instead of preconceiving the outcome, look for the value of each factor. Suppose we ran the same test and found out that size matters more than color, despite what we thought going in. If we spend as few resources as possible to achieve that understanding, then we have left the maximum amount of resources available to apply to the winning factor or element. If we have learned that size matters, we can shift our resources away from less influential elements and apply them towards as many different feasible alternatives of the winning factor as possible. Instead of being limited to testing 3-4 sizes, we can know the value of size and then create as many different alternatives as possible. Not only have we used fewer resources, but they have been applied towards the most influential part of our experience.
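Here is a minimal sketch of what reading the test for factor value might look like, ranking factors by their main-effect spread rather than hunting for a single winning cell. The cells and conversion rates below are hypothetical:

```python
# A sketch of reading an MVT for factor influence rather than a single
# "winning" combination. The observed conversion rates are hypothetical.

from statistics import mean

# Hypothetical conversion rate per (size, color, copy) cell.
results = {
    ("medium", "orange", "Purchase"): 0.030,
    ("medium", "orange", "Buy Now"):  0.031,
    ("medium", "red",    "Purchase"): 0.029,
    ("medium", "red",    "Buy Now"):  0.030,
    ("large",  "orange", "Purchase"): 0.041,
    ("large",  "orange", "Buy Now"):  0.043,
    ("large",  "red",    "Purchase"): 0.040,
    ("large",  "red",    "Buy Now"):  0.042,
}

factors = {"size": 0, "color": 1, "copy": 2}
for name, idx in factors.items():
    levels = sorted({cell[idx] for cell in results})
    means = {lvl: mean(v for c, v in results.items() if c[idx] == lvl)
             for lvl in levels}
    spread = max(means.values()) - min(means.values())
    print(f"{name}: main-effect spread = {spread:.4f}  {means}")

# Here size swings conversion roughly an order of magnitude more than
# color or copy, so future resources go to exploring many more sizes,
# not to debating orange versus red.
```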
Even better, I have now learned that size matters most, and I have an outcome that is different from and greater than what I would have had before. In fact, I have shifted the system so that the absolute worst thing that can happen is that I end up with the same alternative I would have had before, but for less time and fewer resources. I have also added a much higher upside, since an alternative I would not previously have included can come out the winner. I have tested more alternatives of the important factor, so I am not limiting my output to the single input of popular opinion. I have leveraged multivariate testing as a way to learn what matters and to focus my future efforts on that. I no longer have to create alternatives for factors that have no influence, and can instead focus resources on testing as many different feasible alternatives as I can for the things that do influence behavior.
The less you spend to reach a conclusion, the greater the ROI. The faster you move, the faster you can get to the next piece of value, also increasing the outcome of your program. What is most important is to focus on the use of multivariate testing as a learning tool ONLY, one used to tell us where to apply resources. One that frees us up to test as many feasible alternatives as possible on the most valuable or influential factor, while eliminating the equivalent waste on factors that do not have the same impact. The goal is to get the outcome; getting overly caught up in doing it in one massive step, as opposed to smaller, easier steps, is fool’s gold.
You CAN leverage multivariate tests in a large number of ways, and there are enough 15×8 tests out there to show that it is a statistically valid approach. The question is never what you can do, but what you SHOULD do. Just because I can test a massive number of permutations does not mean that I am being efficient or getting the return on my efforts that I should. We can’t just ignore the context of the output to make you feel better about your results. You will get a result no matter what you do; the trick is constantly getting better results for fewer resources.
If you are stuck in the realm of trying to show results from a single test, or are not thinking of your testing program as a learning optimization machine, then you aren’t going to get the results you need no matter what you do. Multivariate tests are useful only in the context of your program; if you are stuck thinking in terms of just the outcome of that specific test, you will never achieve the results that you want.
If you shift to thinking about it in the context of a larger program, then multivariate tests are just one of many tools you have at your disposal to achieve those goals. Don’t let the promises and sales pitches of a few divert your attention away from what matters. And if you are focusing on what matters, then the question of which type of multivariate test you use becomes almost completely moot.
Bridging the Gap: Dealing with Variance between Data Systems
One of the problems that never seems to be eliminated from the world of data is the lack of education and understanding about the nature of comparing data between systems. When faced with the issue, too many companies take the variance between their different data solutions as a major sign of a problem with their reporting, but in reality variance between systems is expected. One of the hardest lessons for groups to learn is to focus on the value and usage of information over the exact measure of the data. This plays itself out now more than ever, as more and more groups find themselves with a multitude of tools, all offering reporting and other features about their sites and their users. As more users deal with the reality of multiple reporting solutions, they discover that all the tools report different numbers, be it visits, visitors, conversion rates, or just about anything else. There can be a startling realization that there is no single measure of what you are or what you are doing, and for some groups this can strip them of their faith in their data. This variance problem is nothing new, but if not understood correctly, it can lead to massive internal confusion and distrust of the data.
I had to learn this lesson the hard way. I worked for a large group of websites that used 6 different systems for basic analytics reporting alone. I led a team to dive into the different systems, understand why they reported different things, and figure out which one was “right.” After losing months of time and almost losing complete faith in our data, we came away with some important, hard-won lessons. We learned that the use of the data is paramount, that there is no one view or right answer, that variance is almost completely predictable once you learn the systems, and that we would have been far better served spending that time on how to use the data instead of on why the systems differed.
I want to help your organization avoid the mistakes that we made. The truth is that no matter how deep you go, you will never find all the reasons for the differences. The largest lesson learned was that an organization can be so caught up in the quest for perfect data that it forgets about the actual value of that data. To make sure you don’t get caught in this trap, I want to help establish when and if you have a problem, the most common reasons for variance between systems, and some suggestions for how to think about and use the data challenge that multiple reporting systems present.
Do you have a problem?
First, we must set some guidelines around when you have a variance problem and when you do not. When you have systems designed for different purposes, they will leverage their data in very different ways. No two systems will match, and in a lot of cases, being too close represents artificial constraints on the data that actually hinder its usability. At the same time, if you are too far apart, that is a sign that there might be a reporting issue with one or both of the solutions.
Here are two simple questions to evaluate if you do have a variance “problem”:
1) What is the variance percentage?
Normal variance between similar data systems is almost always between 15-20%.
For non-similar data systems the range is much larger, and is usually between 35-50%.
If the gap is too small or too large, then you may have a problem. A 2% variance is actually a worse sign than a 28% variance on similar data systems.
Many groups run into the issue of trying too hard to constrain variance. The result is that they put artificial constraints on their data, causing the representative nature of the data to be severely hampered. Just because you believe that variance should be lower does not mean that it really should be or that lower is always a good thing.
This analysis should be done on non-targeted groups of the same population (e.g., all users to a unique page). The variance for dependent tracking (segments) is always going to be higher.
2) Is the variance consistent in a small range?
You may see variance run 13%, 17%, 20%, 14%, 16%, 21%, 12% over a few days, but you should not see 5%, 40%, 22%, 3%, 78%, 12% (a sketch of both checks follows below).
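For those who want this as a routine check, here is a minimal sketch of both questions applied to two similar systems. The daily counts and the exact cutoffs are hypothetical; tune them to what you establish as normal for your own systems:

```python
# A sketch of the two variance checks: magnitude and consistency.
# Daily counts and thresholds below are hypothetical illustrations.

def daily_variance_pct(a, b):
    """Percentage gap between two systems' counts for one day."""
    return abs(a - b) / max(a, b) * 100

system_a = [10200, 11050, 9800, 10500, 10900]  # hypothetical daily visits
system_b = [8700, 9200, 8100, 8800, 9400]      # same days, second system

gaps = [daily_variance_pct(a, b) for a, b in zip(system_a, system_b)]
print([f"{g:.0f}%" for g in gaps])

# Check 1: magnitude. Similar systems normally sit around 15-20%.
too_close = any(g < 5 for g in gaps)   # suspiciously low: likely constrained
too_far = any(g > 35 for g in gaps)    # suspiciously high: possible bug

# Check 2: consistency. A wildly swinging gap is the real warning sign.
inconsistent = (max(gaps) - min(gaps)) > 15

if too_close or too_far or inconsistent:
    print("Investigate before trusting either system's trend lines.")
else:
    print("Normal variance: spend the time on using the data instead.")
```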
If your variance is within the normal range and consistent, then congratulations, you are dealing with perfectly normal behavior, and I could not more strongly suggest that you spend your time and energy on how best to use the different data.
Data is only as valuable as how you use it, and while we love the idea of one perfect measure of the online world, we have to remember that each system is designed for a purpose, and that making one universal system comes with the cost of losing specialized function and value.
Always keep in mind these two questions when it comes to your data:
1) Do I feel confident that my data accurately reflects my users’ digital behavior?
2) Do I feel that things are tracked in a consistent and actionable fashion?
If you can’t answer those questions with a yes, then variance is not your issue. Variance is a measure of the differences between systems; if you are not confident in a single system, then there is no point in comparing it to anything else. Equally, if you are comfortable with both systems, then the differences between them should mean very little.
The most important thing I can suggest is that you pick a single data system as the system of record for each action you take. Every system is designed for a different purpose, and with that purpose in mind, each one has advantages and disadvantages. You can definitely look at each system for similar items, but when it comes time to act or report, you need to be consistent and have all concerned parties aligned on which system everyone looks at. Choosing how and why you are going to act before you get to that part of the process is the easiest and fastest way to ensure the reduction of organizational barriers. Getting this agreement is far more important going forward than a dive into the causes behind normal variance.
Why do systems always have variance?
For those of you who are still not completely sold or who need to at least have some quick answers for senior management, I want to make sure you are prepared.
Here are the most common reasons for variance between systems:
1) The rules of the system – Visit based systems track things very differently than visitor based systems. They are meant for very different purposes. In most cases, a visit based system is used for incremental daily counting, while a visitor based system is designed to measure action over time.
2) Cookies – Each system has different rules about tracking and storing cookie information over time. These rules will dramatically impact what is or is not tracked. This is even more true for 1st-party versus 3rd-party cookie solutions.
3) Rules of inclusion vs. Rules of exclusion – For the most part, all analytics solutions are rules of exclusion, meaning that you really have to do something (IP filter, data scrubbing, etc.) to not be tracked. A lot of other systems, especially testing, are rules of inclusion, meaning you have to meet very specific criteria to be tracked. This will dramatically impact the populations, and also any tracked metrics from those populations.
4) Definitions – What something means can be very specific to a system, be it a conversion, a segment, a referrer, or even a site action. The very definition can differ. An example would be a paid keyword segment. If I land on the site and then view a second page, what is the referrer for that page? Is it the referrer of the visit or the previous page? Is it something I did on an earlier visit?
5) Mechanical Variance – There are mechanical differences in how systems track things. Are you tracking the click of a button with an onclick event? Or the landing on the following page? Or the server request? Do you use a log file system or a beacon system? Is that a unique request or is it added to the next page tag? Do you rely on cookies, or are all actions independent? What are the different timing mechanisms for each system? Do they collide with each other or with other site functions?
Every system does things differently, and these smaller differences can build up over time, especially when combined with some of the other reasons listed above. There are hundreds of reasons beyond those listed, and the reality is that each situation is unique, the culmination of the impact of a hundred different causes. You will never get to the point where you can describe with 100% certainty why you get the variance you do.
Variance is not a new issue, but it is one that can be the death of programs if not dealt with in a proactive manner. Armed with this information, I would strongly suggest that you hold conversations with your data stakeholders before you run into the questions that inevitably come. Establishing what is normal, how you act, and a few reasons why you are dealing with the issue should help cut all of these problems off at the pass.
Testing 303 – Advanced Optimization Paradigms – Part 2
In the first part of our look at advanced paradigms, I focused on the complex interplay of testing and other parts of your organization. As testing grows, it starts to interact on a nearly daily basis with every part of your organization. If you look at the evolution we have taken, going from the very fundamental building blocks of a testing program, to the ways we look at tests and testing, and finally to the complex interactions of testing with everything else, we have shifted the importance and the value that testing brings. The final stage of evolution is to start evaluating your own core beliefs about what a testing program even is, what data is, and how we view the world. It is easy to challenge others to grow, but the most difficult and most rewarding changes always start from within. If the evolution starts with getting people to align, it ends with changing our fundamental beliefs about data. We have to ask extremely difficult questions and challenge our own interactions, breaking down our beliefs and rebuilding them in order to strengthen and evolve.
To that end, here is the final look at advanced optimization paradigms:
No More Focusing on Test Ideas –
If we view optimization as a discipline, one that never starts and never ends, one that is about the constant changing and learning of a user experience, then there is no longer any need for individual test ideas. An idea naturally has a start and an end, as any hypothesis comes from a belief in a specific solution to an existing problem. People get so caught up on their idea, be it from their own experience, some piece of data that they just know means they have the solution to all your problems, or just “best practices,” that their brains shut down and they stop trying to find the best answer. The problem is that the entire process leads to a myopia that gives us the right to stop, the ability to prove ourselves “right,” and a natural affinity towards a set path.
If the focus is no longer on test ideas, however, then the system is what you focus on. If we instead treat things as a cycle: explore, learn, ideate on all feasible alternatives, execute, learn, repeat, then there really is no individual campaign or test. The path is never about what you think will win, or what you want to tackle, but only about where the causal data leads you and the evaluation of all feasible alternatives. Test ideas become the least important thing you can discuss, and should be viewed with high levels of skepticism. There is no such thing as a good test idea, only a concept that can be broken apart, challenged, and improved. Fear any expert trying to sell you a single great test idea, or any guaranteed set of steps to improve your site, as they are only playing to your own insecurities; the reality is that no single idea can hold up to scrutiny. This impacts your own beliefs as much as anyone else’s: you have to hold yourself to the same level of scrutiny, not allowing what you want to happen to be the path you go down or the answers you seek. You have to be willing to step outside your own opinion and focus on all feasible alternatives and only on the efficiency of changes; any bias that you allow to limit what you test, be it from experience or popular opinion, devalues the outcome you can generate.
The biggest challenge most people have is the feeling of losing control. We often like to blame other groups for this behavior, but by far the most guilty group is analysts who are so busy trying to prove a point with their data that they fail to see the larger picture. They so want to prove a path using their analytics that they fail to factor in the need to shift to an active form of data acquisition in order to move forward. You have to worry about your own biases before you can police others’. It is easy for all groups to get focused on what their experience or gut tells them is right, often to poor and inefficient outcomes. Make it clear that no idea stands alone. Put in place measures to ensure that you are not limited to popular opinion or only what you think or want to win. This often means prioritizing resources in ways that you are not doing today, but ultimately this is the only way to ensure you are getting the greatest value and ensuring your own continued education about the value of actions.
Free yourself from the cycle of defending and pushing every idea, instead creating momentum and a consistent pattern of action. Everyone is afraid of moving towards the infamous 48-shades-of-blue extent of this path, but the reality is that it frees you. You no longer need consensus, and you can push the boundaries of what you try. Once you have discovered that blue is the most important element, you may, depending on your feeling about the N-armed bandit problem, want to test 48 different variations, but that is not an affront to you. People built the system, fed the system, and control the system. Once you get to the point that you know what you need to know, why not let the system provide the answer for you? The system is only as valuable as the people who feed it, yet we fear the system and we fear becoming lost to the system.
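For the curious, here is a minimal sketch of that hand-off to the system, an epsilon-greedy take on the N-armed bandit problem. The 48 shades, their simulated conversion rates, and the exploration rate are all hypothetical; in production the rewards would come from real clicks, not a random draw:

```python
# A minimal epsilon-greedy sketch of "letting the system provide the
# answer" once you know which element (here: shade of blue) matters.
# The reward simulation is hypothetical; a real system records clicks.

import random

shades = [f"blue_{i}" for i in range(48)]                     # 48 candidates
true_rates = {s: random.uniform(0.02, 0.05) for s in shades}  # unknown in practice
counts = {s: 0 for s in shades}
wins = {s: 0 for s in shades}
epsilon = 0.1  # fraction of traffic reserved for exploration

def observed_rate(s):
    return wins[s] / counts[s] if counts[s] else 0.0

def serve():
    """Mostly exploit the best-known shade, sometimes explore the rest."""
    if random.random() < epsilon or not any(counts.values()):
        return random.choice(shades)
    return max(shades, key=observed_rate)

for _ in range(100_000):  # simulated visitors
    shade = serve()
    counts[shade] += 1
    wins[shade] += random.random() < true_rates[shade]  # simulated click

best = max(shades, key=observed_rate)
print(best, f"{observed_rate(best):.3%} observed conversion")
```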
Moving down this path of avoiding individual ideas, of not trying to find the perfect solution, allows you to re-imagine and recreate who and what you are on the fly, without massive redesign efforts. It allows you to avoid holding anything sacred on the site and to stop worrying about the entire concept of “right.” The user experience becomes a fluid thing, where the true value of data, your creativity, and your ability to move past your own biases determine the magnitude of growth that you will experience. The true value of the individual is in how well they feed the system. The system is only as valuable as what goes into it, and by democratizing all ideas, by forcing the conversation away from ideas and towards feasible alternatives, you are giving more value to the creative freedom of the members of your organization.
Optimization-powered Analytics –
Let me pose a theorem to you: Analytics, by itself, is completely worthless.
Let me challenge you by looking at the entire current practice of analytics as nothing more than hubris. The current use of analytics, especially by those who purport to be experts, is nothing but a newer, more accepted justification for what you were already going to do or were already thinking. Every new misunderstanding of Moneyball, or of advanced statistical models, is a sales pitch designed to make you feel like you are making a much larger impact than you really are. This is not to say that analytics cannot be powerful, only that the way data is abused by the practitioners of the industry to propagate myths and bad practices is worse than worthless; it’s inefficient.
People have gotten so lost in their ability to collect information, the speed with which we can get feedback, and the need to justify their existence that they never take a moment to question what you can really get from pure analytics. Numbers have become the new shield by which we persuade others of our “greatness,” not to actually provide value, but to tell stories driven by ego and a desire to be the one making the decision. We so want to target a group that we find one that stands out, or we so want to show our value that we tell someone they are doing something wrong, only to replace their “bad” decision with an equally biased one, taking credit for any result that comes from this use of resources. In the rare best case with analytics, you are left with probabilities and no clear direction; in the worst and most common cases, we are left with biased “insights” powered by everything but data.
In reality, we no longer need to be trapped by this use of data, as so many other industries before us were, because of our ability to interact directly with users in an efficient and speedy manner. The data loses all value when we force a path onto it or forget what it can and is really telling us. Because we are a new field, mostly manned by people without real practical data discipline, we let our lack of understanding of the nature of data give our biases room to paint a picture that does not exist. There are hundreds of agencies, groups, and people who claim to have the newest way to repeat the same types of “analysis” without any new insight into the value of that data. There are always new ways to corrupt statistics or new analysis techniques used to push an agenda, not to provide real value. We are so busy running full speed down a path that we miss some really important and fundamental facts. Using only correlative data, we have no way to know the cost to change, the real value that anything by itself provides, or the actual scale of impact of any future change.
If efficiency is a measure of outcome over cost, then we have no insight into any piece of that equation. All the analysis in the world cannot overcome the limitations of a one-directional, limited data set from a constantly changing and imperfect ecosystem. We find something that sticks out in the data, and then pretend it is more valuable than all the other pieces of data simply because we can “identify” it, despite having no idea of the value of that change nor what some other undiscovered optimization would bring. How does knowing that people from search spend half as much time on your site as other visitors in any way tell you the cost to change their behavior? That you can even change that behavior? Or the relative scale of impact compared to other feasible alternatives? How is the anomaly any more efficient to act on than the thing that looks like everything else? What do you really know from just identifying something in correlative data? Do not confuse your ability to “discover” something in analytics with your ability to derive value and efficiency.
Why do we accept that limitation, and why do we not try to add the context necessary to better answer those questions? Why do we perpetuate the myth that purely passive data acquisition can answer so many of the questions we pretend to answer today? Why do we pretend we can start with this magical data set and somehow arrive at the best answer? We are forced to use conjecture to make assumptions and then pat ourselves on the back when we get a result. We decide what to do analysis on, find a single answer, and then defend it because it is backed by data. Is that result a good result? If I have 100 possible positive outcomes, and I get the 2nd worst one, who would tell me that is a good thing? Yet when we do not account for that context, we are constantly shouting our accomplishments from the hilltop. Do we congratulate the outcome we got, or mourn the 98 that we missed? If you scored 2% on any test, you would think you failed miserably, yet we hide this truth from ourselves to make sure that we all feel like we got an A. The truth is that we will never know any of the important contextual information we seek from correlative data alone.
Let me propose that testing, as a creator of causal data in a controlled setting, is the only way to actually achieve all those value propositions that you have been promised. That causal data, the seeking and creation of it and its use as a transformative agent to power that earlier data collection, to move past so many of the limitations of online data collection, is the only true way to answer these important questions. “Powering” your analytics, being willing to look past the myths and bad practices, and breaking down what you really know from your data is the point where myths become real and where you can truly and dramatically impact your business’s bottom line. This is why machine learning is such a big deal, why we move towards optimization algorithms, and why it is so vital that you understand the value propositions of your various types of data. All of those methods leverage causal information as a building block to grow and learn. There is a better way, but it requires you to be humble and disciplined to reach that “nirvana.”
The core problem with analytics is that you are limited to linear correlative data. No matter how pretty the model and how much statistics you apply, you will never know the value of an action, nor the efficiency of changing it. We are trapped because the passive nature of the data only looks one way (towards the past) and has no way of accounting for feasible alternatives, or even the null assumption. You are stuck in the land of rates of action; you have a 2.8% CTR on the “other products” module on your product page, but is that good or bad? Even if that is much higher or lower than other modules, how do you know that acting on it is any better than acting on the thing that looks just like all the others? If you removed it, where do those clicks go, and is that more valuable than what is there now? Does increasing it help or hurt? More importantly, what happens if it is not there, and what is the cost of changing it as opposed to the cost of changing another module? Are people who purchase more likely to sign up for newsletters, or is it the other way around? All of those questions can be answered directly and efficiently through testing, and once we have created a number of interactions, we can start to see patterns in those causal relationships. With very little effort, we have the power to really see the impact of changes, not just extrapolate them blindly.
What if you instead ignored all of that data in its passive form, and looked for the active interaction of data to inform those decisions? What if, instead of starting with correlative data, we ignored it until we had the context to make it valuable? What if we used the causal relations with an eye towards efficiency? What if you viewed data as an active measure, one that gains more value the more you eliminate unnecessary waste in the system, and one that only takes hold once you are disciplined in how you think about it, what you measure, and how you actively change it? What if we stopped allowing our biases and misconceptions of data to dictate the start of our analysis, and instead allowed the data to truly tell us what matters? What if you started measuring the value of your correlative data by its interaction with the causal data, allowing a much deeper connection to efficiency? What if you started looking for the value of an action, not the rate of an action? Testing is your active arm for changing all of that correlative data into causal data, if you are willing to go down that path.
This is the opposite of the myth of using analytics to power testing; it instead forces you to accept that correlative data, with all the limitations inherent in online analytics, is not enough to make meaningful decisions. This is not about using testing as a means to prove one point right, but as a means to understand and value alternatives against each other. Changing correlative data into causal data presents you with information that is truly actionable and that truly gives you insight into the outcomes, value, and costs that we pretend we already have the answers for. This is the last step in the evolution of looking for the best answer and of stopping biases from leading you astray.
The challenge is that you cannot just take one test, or any single data point, and pretend you have meaningful inference. Just as you cannot pretend to know the direction of a correlation or the value of something from its rate of action, you cannot pretend to answer everything from a single test result. Diving through all that analytics data from a single test result is a dead end that leads to the same problem that plagues most uses of analytics. You have to be disciplined, and you can only reach this point after you have run a full series of tests. Think in terms of using this data to increase the efficiency of the system. You get real value only when you apply testing to power your analytics. We can measure the value of the items on the page, their very existence, and the costs to change them. We can quickly get tests live on multiple page types and measure the relative value. We can run a series of tests on a page and induce changes that let us see which segments are exploitable, or even what the influence of various parts of a user experience is on those segments. If we are disciplined, we learn, and we never stop, then we can induce answers that achieve a positive result while also answering those great unknowns that are ignored by analytics alone.
To make this even better, the act of acquiring the data also comes with the benefit of meaningful lift and improvement to your business. There is no zero-sum game of acquiring data versus getting lift; using testing to power your analytics allows you to meet the needs of change and growth while giving you the promised panacea that so many claim analytics provides by itself. It allows you to truly think in terms of efficiency and to know the value of the different feasible options before you. It requires you to completely change how you think about analytics, to look at it as part of a larger ecosystem in which you are informing the data and then using that data to inform future action. It is not just pretending that the data is informed and then blindly using it to prescribe action. If you instead act to create causal information, use that to filter your correlative data, and do this with discipline, you can actually get those answers that we pretend we have today.
The sad truth is that most people who are in testing come from an analytics background. Just as many old-school marketers struggle to stay current in the face of change, so too do many data “experts” who give new names to the same misguided techniques. They view everything through the analytics lens, which makes them want to justify their analytics via testing and to apply the same problematic disciplines to testing in order to bring it in line with current efforts. They so want to justify what they have done that they ignore its fundamental weakness and try to force new disciplines to conform to what they are doing. This leads to an entire marketplace full of people stuck trying to justify their existence, and very few willing to challenge its entire value proposition. I challenge you to avoid that black hole, to be willing to challenge your own worldview and your own core beliefs about data, and to instead look at how you can best acquire meaningful data and how best to leverage it outside of what you are comfortable with. Very few people look at testing as its own discipline, or, even better, see how that discipline can impact and change how you view other actions. There is a giant fishbowl of people in a race to the bottom, justifying and preaching analytics as a feeding system for testing. I challenge you to be better than the current environment.
Let me instead suggest that you will only achieve real value if you flip that system, challenge yourself to think outside of that box, and to power your analytics via your testing. Testing is just one skill of many, but it deserves its own place at the table, not one that is a filter by which you justify other actions.
Conclusion –
The goal of these posts is to introduce new ways of thinking and to challenge your current mindset. I have shown the evolution from the most fundamental skills to paradigms that challenge your entire data worldview. It is only by changing what we do that we grow, and it is only by challenging our own core assumptions about what works that we are able to make the dramatic impact on the bottom line that we all claim to want to achieve. You cannot just assume that everything you hold true today will be the same in the future, nor can you expect to improve if you refuse to change your own behaviors.
The reality is that there is no such thing as “right”; in the entirety of human history we continue to find better answers to all our questions. What I am proposing is allowing these new ways of thinking to interact with what you are doing and to see if you can then find a newer “righter” answer that brings your program to a whole new level. It is only through changing our fundamental building blocks of what we do that we achieve the scale and impact that we want to achieve. Change who you are, what you think, and let in other ways of thinking and try to be better than the water you are swimming in. Be willing to leave your current lake and find the diverse ocean of disciplines and ideas that are out there, and you will always be growing and getting better at what you do.
To navigate the entire testing series:
Testing 101 / Testing 202 / Testing 303 – Part 1 / Testing 303 – Part 2
Testing 303 – Advanced Optimization Paradigms – Part 1
One of the great truths about any organization is that no matter what you are doing, each program eventually plateaus and finds a normalization point where it no longer grows at the same rate or with the same push it did before. Whether it is mental fatigue, new objectives, changes in leadership, or, more commonly, reaching the end point of the current path of thinking, each program can only go so far without an infusion of new ways of thinking and a willingness to challenge itself to get better. It is only by bringing in new ways of thinking and challenging core beliefs that you are free to grow past those self-imposed limits.
In our introduction, we talked about disciplines that enable you to move faster and align on a common goal. In the second part, we went over disciplines to help you think about tests and testing differently and to get more value from your actions. The third and final evolution takes us to new ways of viewing the world and our organization, and challenges us to go in new directions, to understand paradigms that should fundamentally challenge some of the most common beliefs about data and optimization.
One of the great quotes that I keep close at heart comes from John Maxwell: “If we are growing, we are always going to be out of our comfort zone.” With that in mind, I want to introduce these paradigms for your program and challenge you to evaluate them outside of the fishbowl, as ideas on their own that can help your program get past its current plateau and help you grow in your thinking about optimization.
Analytics and Optimization as Very Different Disciplines –
There are many ways that you look at and act on data in testing that are the exact opposite of analytics. Where so many programs fail is in forcing one way of thinking onto their data, resorting back to what they are most familiar with. In the world of analytics, you have to look for patterns and anomalies, look across large data sets, and try to find something that doesn’t belong, that doesn’t fit with the rest of the data. You are constantly looking for outliers that show a difference and then extrapolating value from those measurable differences. In the world of optimization, you have to limit yourself from looking at anything but what you are trying to achieve, and act on data that answers fundamental questions. It becomes extremely easy to fall back into more comfortable ways of thinking, because the data sounds and looks similar, but ultimately success is dictated by your ability to look at the data only through this different lens. You have to stop yourself from diving down every possible data set and instead focus on the action from the causal relationship around the single end goal.
It is what you don’t do that defines you as much as what you do. It is about “did removing this module improve revenue performance,” not “did the CTR drop of this change increase CTR to the main image, and where did those paths lead.” It is also about not allowing linear thinking to interrupt what you are doing. You have to focus on the value of actions, not the rate of them. You are looking at the value added by a user (RPV), not the amount of short-term actions (CTR). Never look at how many people moved from point A to point B; instead, look only at the measurable impact towards your site goals. Just because you increased clicks, or got more people into a funnel, or even got more transactions, it does not mean that you increased revenue. Assuming there is a linear relationship between action and value can be extremely dangerous and myopic; many programs have been run into the ground because they do not understand the difference between the count of an action and the value of an action. Analytics forces you to think in terms of rates of action, but optimization forces you to think about the value of actions and the cost to change a person’s propensity of action.
Think of your site as a giant system. You have an input of people, with each input type interacting differently with the system. The things you sell, the layout, the experience, all of it makes up a giant equation. When people enter your site, they go through it, and they come out the other end at some rate or some value. The numbers or rates associated with that one path are analytics. The inherent behavior based on the current user experience is their propensity of action. In testing, you have to focus solely on your ability to increase or decrease that propensity of action, not on the absolute value of that action. We care that we increased that behavior by some delta, some lift percentage: not that it was 45% and moved to 49%, but that we increased it by 8.9%.
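As a quick illustration of both points, the delta over the absolute number and the value of an action over its rate, here is a minimal sketch with hypothetical numbers:

```python
# A sketch of relative lift in propensity, and of value (RPV) versus
# rate (CTR). All visitor, click, and revenue figures are hypothetical.

def relative_lift(control, variant):
    """Lift as a percentage change in propensity, not absolute points."""
    return (variant - control) / control * 100

print(f"{relative_lift(0.45, 0.49):.1f}%")  # 45% -> 49% is an 8.9% lift

# Rate vs value: variant B wins on clicks but loses on revenue per visitor.
visitors = 10_000
a = {"clicks": 280, "revenue": 31_500}  # hypothetical control
b = {"clicks": 350, "revenue": 29_000}  # hypothetical challenger

for name, d in (("A", a), ("B", b)):
    print(f"{name}: CTR {d['clicks'] / visitors:.2%}, "
          f"RPV ${d['revenue'] / visitors:.2f}")
# B's CTR is 25% higher, yet its RPV is ~8% lower: optimizing the rate
# of action here would have cost real revenue.
```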
In testing, the answers you receive will be of the nature of “we got more of the high-value and less of the low-value populations” or “the system improved as a whole by 4%.” Ultimately, “answers” matter far less than the changes observed and your ability to act quickly and decisively on them. What you won’t receive is why, or what each individual part of the system did to get you there. Those answers are stuck in the realm of correlation and as such have to be ignored, because you have at best only a single data point. We are trying to move forward as quickly as possible in the realm of optimization, so getting lost in loops of trying to answer unanswerable questions only hinders your efforts. No matter how much you analyze an individual test result, you will never have more than a correlation. This means you have to think differently in order to use that data. It doesn’t matter why, or which piece, or even which individual population (though serving dynamic experiences based on outcomes is important), so you have to force yourself not to go down those roads.
It is also about the ability to hold yourself accountable for change. So many analysts fail because they view their job as ending the moment they make a recommendation. There is a revolution taking place in our industry, led by people like Brent Dykes, that is shifting the entire view of optimization away from the recommendation and the data and toward the final output. In optimization, you are only successful if the results you find are acted on and made live. It requires you to view the cycle as one of action and not one of inaction. It is not that both don’t have their place, but to be really successful you have to be able to step away from your analytics self, think differently, and force yourself to act differently in order to get the results you need.
Testing Applied to Multiple Teams –
Testing has many core disciplines, but it takes on a very different look and value for different groups. Your IT team may get a completely different “value” from testing than your merchandising team, as might your design teams from your analytics team. Many groups believe that because they have extended their testing from their landing pages to their product pages, they have expanded the value of testing throughout the organization. Instead, they need to rethink how optimization disciplines can interact with each group’s efforts on a fundamental basis. Testing is not just changing a marketing message; it is the evaluation of possible feasible alternatives, something that all groups need to do to improve their effectiveness. Testing is just as applicable to your SEM team, your merchandising, your product management, your IT team, your personalization team, and many others. Each group has different needs and different disciplines, and as such you have to apply the disciplines of testing to them in different ways. An IT team can use testing to decide which project to apply long-term resources to. Your UX team can tie testing to their qualitative research to understand the interconnection of positive feedback and overall site performance. Your SEM team can use testing to measure the downstream impact of their various branding campaigns.
The reality is that applying all the unique benefits of testing to different groups, and not just increasing the space you do the same things to, can fundamentally improve your entire organization. While this might sound like a simple shift, the reality is that most groups do the same type of testing or apply the same techniques across multiple parts of the site, not for different teams. Each group may be aligning on the same goal, but they do things in very different ways. Applying optimization to those groups looks and acts in very different ways, and as such it is difficult for most groups to really apply these disciplines in a way that truly impacts the fundamental practices of more than one group.
Instilling this use of testing as a fundamental building block also allows you to get ahead of a large number of major problems. It forces organizations to test out concepts well before they commit to them as long-term initiatives. One of the most common examples is in the realm of personalization, where so many groups are sold on the concept but not willing to go through all the hard work of figuring out exploitable segments or the value and efficiency of various ways of interacting with the same user. Getting ahead of the curve and testing the efficiency of the effort will dramatically improve its performance. If you test a complex idea in one spot against other feasible, simpler ideas, and find the simpler idea performs better, as it almost always does, you save massive IT resources while getting better results. It is far more likely that simple dynamic layout changes for Firefox users will be magnitudes more valuable than a complex data feed system from your CRM solutions, and testing is the bridge that lets you know that before you fall down the rabbit hole.
Each group tends to end up at the Nth degree of the same thing they bought the tool for. So often, the fear of the unknown or of challenging someone’s domain stops new groups from allowing testing in, but when you can overcome those barriers, you can have an exponential impact on the organization. When you start applying optimization to multiple types of internal practices, and you are able to bring the results together in real synergy, that is when you can really see optimization spread and the barriers drop throughout an entire organization. It is also the point where the lessons you learn become three-dimensional and universal across the entire organization.
Testing has No Start and No End –
Optimization is not a project. It is not just one person’s job, and it is most definitely not something you can just choose to end some random Tuesday. So why do people view it as a series of projects, with a start and a stop? Why do they view it as only part of one person’s role or responsibility, or something that is done when they have the chance? There are functional reasons to have set people assigned to testing, and, as programs grow, to have a separate specialized team, but that is not the end of the battle. Why do we try to force artificial time constraints on it, with starts and stops, and talk about it as something we did or will do? It is either an action that you live, or it is not. If everything your organization does, be it some small tweak, a redesign, or the release of a new feature, is not viewed as part of an ongoing process, with lessons to learn and to be evaluated democratically through the system of optimization, then optimization has been allowed an artificial start or stop just to appease various members of your organization.
Optimization has to be something you live. You have to think in terms of it every day, view each task as something that can get better, and view each idea as just one of many, one that is not up to the HiPPO or anyone else to decide on. It is a responsibility not to let projects, or holidays, or new CMOs, or anything else stop you from the constant quest to improve the site, the processes, and the people. Do not confuse the act of running a test with the entirety of optimization. It is vital that you view the act of creating something new as just the very first step, and not the end point. There should be no point where anything is thought of as “perfect” or “done,” or where you can just throw something live and walk away. Optimization is part of every process, it is part of every job, and it is something that everyone works together on to make sure it is part of every action the organization takes.
When you have finally incorporated testing into your organization, all projects will view it as another natural part of their evolution. Project plans will not only incorporate optimization on an ongoing basis, so that it is part of your expected timeline, but they will also stop trying to get everything “perfect.” If you view your projects as never finished, then there is no need to get perfect signoff, nor do you need perfect agreement on each and every piece. What is important is that you spend as much time and resources on testing out all those ideas that you have discussed, instead of just sitting around a room and compromising on a final version. You will no longer be so caught up in your pet project, as the entire concept is that it will and must change.
So much of what happens in organizations is about the politics of owning and taking credit for different initiatives. People’s reputations and egos are on the line when they propose and lead dramatic changes, especially redesigns, for the site. If you can truly incorporate testing and optimization as a vital part of all processes, not just as a “project” but as part of the very existence of the site and the group, then you free people up from being so tied to their “baby.” Treat all ideas as malleable and transient, to the point that everyone is really working together to constantly move the idea forward. It can be a dramatic shift for organizations once they reach this point, but ultimately it is when groups really start to see dramatic improvements on a continuous basis.
So often the failure is in not following through with the concepts I have brought forth. The reality is that each action is tantalizingly easy to start; the real discipline, the ability to keep pushing 6 months from now, is what really differentiates programs and people. Being willing to move past the barriers, put the pieces in place that make a difference, and change how you and others think are the real keys to a successful program. If you are always trying to do what is easy, or just listening to the pushers of magic beans and myths, then you can never really grow your program to the levels that are possible. Do the hard work, get out of your comfort zone, and you can continue to get better and see more and more value from your testing program.
To navigate the entire testing series:
Testing 101 / Testing 202 / Testing 303 – Part 1 / Testing 303 – Part 2