Google Experiments, Variance, and Why Confidence can really suck

There are many unique parts to optimizing a lower-traffic site, but by far the most annoying is an expected high level of variance. As part of my new foray into the world of lead generation, I am conducting a variance study on one of our most popular landing pages.

For those who are not clear on what a variance study is: you run multiple identical variations of the same control and measure all of the interactions against each other. In this case I have 5 versions of control, which gives a total of 20 data points (each of the 5 compared against the other 4). The point of these studies is to evaluate the normal expected variance range, as well as the minimum and maximum outcomes within that range. It is also designed to measure variance over time, so that you can see when and where it normalizes, since each site and page has its own normalization curve and its own normal level of variance. For a large retail site with thousands of conversions a day you can expect around 2% variance after 7-10 days. For a lead generation site with a limited product catalog and much lower numbers, you can expect higher. You will always have more variance in a visit-based metric system than in a visitor-based metric system, as you are adding the complexity of multiple interactions being treated distinctly instead of in aggregate.
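The mechanics are easy to sketch. Below is a minimal simulation of such a study (Python standard library only; the 5% conversion rate, 2,000 visitors per experience, and random seed are illustrative numbers, not the ones from my study). Five identical "experiences" still produce different observed rates, and comparing each against the other four yields the 20 data points described above:

```python
import random

random.seed(42)  # illustrative; any seed shows the same effect

TRUE_RATE = 0.05   # every experience has the same real conversion rate
VISITORS = 2000    # visitors per experience
N_EXPERIENCES = 5

# Simulate 5 identical versions of control (an A/A/A/A/A test).
rates = []
for _ in range(N_EXPERIENCES):
    conversions = sum(random.random() < TRUE_RATE for _ in range(VISITORS))
    rates.append(conversions / VISITORS)

# Compare every experience against every other: 5 x 4 = 20 data points.
lifts = [(b - a) / a for i, a in enumerate(rates)
                     for j, b in enumerate(rates) if i != j]

print("observed rates:", [round(r, 4) for r in rates])
print(f"largest apparent 'lift' between identical pages: {max(lifts):.1%}")
```

Every non-zero "lift" this prints is pure noise, which is exactly the baseline a variance study is trying to measure.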

These studies have many important outcomes. They help you design your rules of action, including the differentiation and the amount of data you need before acting. They help you understand what the best measure of confidence is for your site and how actionable it is. They also help you understand normalization curves, especially in visitor-based metric systems, as you can start to see whether your performance is going to normalize in 3 days or 7. Assume you will need a minimum of 6-7 days past that period for the average test to end.
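As a rough sketch of the "needed amounts of data" side, the standard two-proportion sample-size formula shows how many visitors each variation needs before a given lift is even detectable. This is a generic textbook calculation, not the exact rule set I use; the 95% confidence / 80% power constants and the example inputs are assumptions for illustration:

```python
import math

def min_sample_size(base_rate, min_relative_lift, z_alpha=1.96, z_beta=0.84):
    """Per-variation visitors needed to detect `min_relative_lift` over
    `base_rate` at ~95% confidence and ~80% power.

    The constants (1.96, 0.84) are the standard normal quantiles for a
    two-tailed 5% alpha and for 80% power.
    """
    p1 = base_rate
    p2 = base_rate * (1 + min_relative_lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# A 5% baseline with a 10% relative lift (both hypothetical numbers) needs
# on the order of 31,000 visitors per variation -- which is why low-traffic
# sites need much bolder differentiation between variations.
print(min_sample_size(0.05, 0.10))
print(min_sample_size(0.05, 0.30))  # a bolder change needs far less data
```

The second call is the whole argument for "needed differentiation" in one line: the bigger the change you are willing to test, the less data you need to act.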

The most annoying thing is understanding all the complexities of confidence and how variance can really mess it up. There are many different ways to measure confidence, from frequentist to Bayesian, and from p-scores to chi-square. The most common are Z-test or T-test calculations. While there are many different calculations, they are all generally supposed to tell you very similar things, the most important of which is the likelihood that the change you made is causing the lift you see. Higher confidence means you are more likely to be looking at a real effect. This means that in a perfect world a variance study should have 0% confidence, and you are hoping for very low marks. The real world is rarely so kind, though, and knowing just how far off from that ideal you are is extremely important to knowing how and when to act on data.
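For reference, the Z-test p-score calculation itself is simple enough to do without a stats package. Here is a sketch of a standard two-proportion z-test in pure Python (the conversion numbers are made up for illustration):

```python
import math

def z_test_confidence(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: returns the two-tailed confidence
    level (1 - p-value) that B's rate differs from A's."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0
    z = abs(p_b - p_a) / se
    # 1 - two-tailed p-value, via the error function instead of SciPy.
    return math.erf(z / math.sqrt(2))

# Hypothetical example: 100 vs 120 conversions on 2,000 visitors each.
print(f"confidence: {z_test_confidence(100, 2000, 120, 2000):.1%}")
```

Note that this example returns roughly 83% confidence on a 20% observed lift, which is exactly the kind of number that looks actionable but may not be once you know your site's variance.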

This is what I get from my 5-experience variance study:


To clarify, this is using a normal Z-test p-score approach, and each experience has more than the bare-minimum number of conversions that most people recommend (100 per experience). This is being done through Google Experiments. The highest variance I have ever dealt with on a consistent basis is 5%, and anything over 3% is pretty rare. Getting an average variance of 11.83% after 5 days is just insane:


This is just not acceptable. I should not be able to get 97% confidence from forced noise. It makes any normal form of confidence almost completely meaningless. To make it worse, if I had not done this type of study, or if I did not understand variance and confidence, I could easily make a false positive claim from a change. These types of errors (both type 1 and type 2) are especially dangerous because they allow people to claim an impact when there is none and to justify their opinions with purely random noise.
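You can see how easily noise crosses a confidence threshold by simulating A/A tests. The sketch below (illustrative rates, sample sizes, and seed, not my actual data) runs 1,000 A/A tests through a two-tailed z-test; even with zero real difference, roughly 1 in 20 will cross 95% confidence, and that is before daily peeking at results inflates the false positive rate further:

```python
import math
import random

random.seed(7)  # illustrative seed

def confidence(conv_a, conv_b, n):
    """Two-tailed z-test confidence (1 - p-value) for two equal samples."""
    p_a, p_b = conv_a / n, conv_b / n
    p_pool = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(p_pool * (1 - p_pool) * (2 / n))
    return 0.0 if se == 0 else math.erf(abs(p_b - p_a) / se / math.sqrt(2))

RATE, N, TRIALS = 0.05, 2000, 1000  # hypothetical rate and sample size
false_alarms = 0
for _ in range(TRIALS):
    a = sum(random.random() < RATE for _ in range(N))  # control
    b = sum(random.random() < RATE for _ in range(N))  # identical "variant"
    if confidence(a, b, N) >= 0.95:
        false_alarms += 1

print(f"A/A tests reaching 95% confidence: {false_alarms / TRIALS:.1%}")
```

And that is the best case: a perfectly calibrated test with well-behaved traffic. Layer in the kind of variance I am seeing and the confidence number degrades even further.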

If you do not know your variance or have never done a variance study, I strongly recommend that you do one. They are vital to making truly functional changes to your site and will save you from wasting so many resources and so much time on false leads.


The New Long Road Ahead

So much has changed in my world recently and I wanted to give everyone a heads-up. After 5+ years trying to fix some of the largest and most complicated organizational optimization issues I have stepped away from Adobe and have decided to go in a somewhat new direction. I have taken a position as Director of Optimization for a small company in the Carlsbad, CA area called Questpoint where I will be overseeing optimization of a number of lead gen situations.

What this means is that I now deal with much smaller but much more meaningful measures of success. It also means that I can now talk much more directly about the challenges I face and the solutions as they present themselves to me. I will continue to investigate the theoretical challenges of optimization but will also be more directly talking about the realities of testing on a budget. I will be using a number of tools including Google Analytics and Google Experiments and will be breaking down the advantages and disadvantages of them in comparison to the enterprise level tools that I was familiar with.

Here is to the new path before me and here is to the many barriers and hills one must climb to bring that boulder to the top of the mountain.

Taking a Wrong Turn – How Your Roadmap May Be Leading You off a Cliff

You are driving down a road when your GPS tells you to turn left. You make a sudden motion, finding yourself down a small side road. It doesn’t look like where you are trying to go, but you have to follow your GPS; otherwise, you will get lost. You continue, then your GPS tells you to go right. There isn’t a road there, and because you are stuck doing only what the GPS tells you, you turn and suddenly find yourself running off a cliff, flying to your demise in a canyon below. Sound like a bad TV skit? The reality is that this is how most people leverage their “roadmaps” in terms of how they run their optimization programs.

While hypothesis is still the most misunderstood term in all of optimization, the most abused may be roadmap. So many different groups claim they have a roadmap, or to be following a roadmap, or that something is on their “roadmap”, and yet so few understand how one is meant to be used. A roadmap (little r) is a list of tests, most of which serve as a great excuse to waste time and effort and to get locked into a system of projects. A Roadmap (capital R) is a constantly shifting list of priorities from which you create actions and act to discover where to go next. This distinction is fundamental if you have any hope of really achieving great results with your program, and yet so many happily focus on the first for the sake of internal processes, or because of an inability to change how their optimization program operates in producing revenue.

Let’s start with what the goal of optimization is. It is not to run a test.

Tests are a means to an end.

The goal of an optimization program is to produce the maximum amount of revenue for whatever resources you spend on it. The same is true of every effort you make, be it personalization, SEO, content creation or a promotion. You are not doing those things because they are fun; you are doing them to increase revenue for your organization. This means they are just tactics and not ends unto themselves. This is fundamental to understanding the difference between a roadmap and a Roadmap.

Anytime we confuse the action for the end goal, we lose almost all possible value because we have lost the ability to go in any other direction. When you get stuck on a review process and a large series of tests, you are making the decision to focus on the action and not the value it generates. You become a means to empty action, not a means to the end of generating revenue. You are saying, at that point, that you couldn’t care less if you make money, so long as these few specific tests get run.

If you instead focus on the end goal, then the first and most important piece is to discover how best to do that. You may have some test ideas and some things you are going to execute on, but they are fungible. You must and will constantly shift them as you learn more and as you go in new directions. You cannot be stuck on the path if the end goal is the most important, you must instead focus on the discipline and flexibility to go anywhere the data tells you.

This is why a Roadmap is just a series of places to focus. It might be personalizing an experience, or improving a product page, or improving your recommendation system, but that is what you are trying to do. You are hoping that doing so will result in more revenue, but you are not tied to specific tactics, just to finding the best way to accomplish the end goal. Often you will have no more than one or at most two tests for each area when you start, but you plan out the time to shift and the time to continue down any path that presents itself to you. From there you can work out actions that will produce answers, things like inclusion/exclusion testing, or MVT, or content serving, so that you can measure the value of different alternatives. At that point, you focus on whatever answers you have and continue to drive forward based on those results.

The amazing or frustrating part of this, depending on which approach you are used to, is that you never know where you will end up. You might end up with a dynamic layout for your product page, or targeting content based on time of day, or removing your recommendations system from a page. The farther you end up from where you imagined, the more revenue you make. Each step that takes you in a new direction can only do so by proving, with rational measurement, that it outperforms where you thought you were going to go. You can end up just about anywhere, and that is what makes it so powerful.

The most common refrain you hear when tackling problems this way is that it is hard to plan resources, but that argument just does not hold water. You know you are going to test and you know you are going to need resources. This just means you plan time. What you aren’t planning on is that time being spent coding one specific module 6 months from now. How that time is used is constantly shifting and updating; it isn’t set in stone. You can plan resources extremely easily. What you can’t do, however, is focus those resources only on one person’s opinion or on a singular person’s agenda. It is not that you spend more resources or can’t plan; you just spend them differently, away from empty talk about a test and toward building a successful and meaningful program.

The real challenge becomes not resource planning but accountability. So many programs hold onto their list of tests because it justifies their actions. It becomes about checking off that a test was done, not about the efficiency or the value of that test. At the end of the day, the people in your program get to choose where their accountability lies: with just running tests or with actually providing value. If you are focusing on an empty series of tests, then you will always just be doing action. If you can instead view your Roadmap as a constantly shifting series of actions that focus only on the value they derive, then you will never worry about any specific test or about trying to validate test ideas.

In reality, the biggest challenge to tackling problems like this is the ego of the people in your program and of the executives who might be involved. People protect themselves at all costs because accountability is the scariest thing in the world for most people. The old systems route everything through them, and nothing is done without their blessing. When you go wherever the data takes you, you are faced with going in a direction that might not be what that executive envisioned 3 weeks ago. When you just focus on your part of a larger process, or when you accept their divined vision as the only means to an end, then you have essentially said that you have no value at all to the organization and are just a fungible means to an empty end.

This is why an education program and a focus on discovery are so vital to the value derived from your testing program. Management might view this as a loss of power, but the reality is that it is so much more. They are no longer constrained by some random thought they had, no matter how great it was, and can instead encourage others to expand on their creativity. It is no longer about having the right answer but about measuring and going with the best ideas your entire team can come up with. You can tell just how far you are from this point by the number of empty I believe/I think/I feel conversations you hear in meetings. The less you hear of those, the closer you are to achieving real value. It isn’t about a review process but about the creation process and the management of the system to ensure rational decision making.

So many organizations are led to drive into that canyon or into a random lake. Even worse, there are always people at those organizations who will describe the water they are drowning in as the expected destination. If you really want to go to new places, and really want to end up where you should, then you are going to need to give up your belief in that roadmap you hold so dearly. Find your own Roadmap, let it shift and go where it needs to, and you will be amazed at just how far you can go and how many new sights you will see.

One Problem but Many Voices – The One Thing People Need to Understand about Optimization

The hardest challenge when working with different groups in the optimization space is often trying to get past their misconceptions and to help them view optimization in a different form. It doesn’t matter if they have been doing testing for 1 day or 10 years; there is still a massive difference in efficiency and in the value that can be generated. Results are not random, yet so many believe they are because they misunderstand optimization on the most fundamental levels. The reality of truly successful optimization is often far from the perceived reality of those just entering the space. The number of misconceptions is so large that it can often be nearly impossible to prioritize them or to tackle them all.

Because this problem is so common, I reached out to the smartest people I know in the industry and asked them to share the one thing they wished people understood about optimization.

Rhett Norton – Consultant

One thing that I wish people understood about successful optimization is that testing is about discipline. To truly be successful you need discipline in how to think about testing, how to take action, how to organize internally, how to learn iteratively, how to communicate results, how to learn what influences segments, how to build a program, and how to create a culture. It isn’t about launching tests or how many tests you run. It isn’t about creating really big tests. It isn’t about personalization. It isn’t about moving your political agenda forward.

Without discipline companies go through the motions of testing without ever really achieving amazing long term results. The most successful companies I’ve worked with have been successful with creating discipline in parts of their testing program. I’ve never seen a company that is disciplined in every aspect of optimization, but hey, maybe your company could be the first.

Drew Phillips – Consultant

I wish that more people understood that optimization is a disciplined, yet free form process. It is disciplined in that you can’t be successful by simply throwing the spaghetti at the wall. Testing random ideas will get you nowhere fast.

It is free form in that you need to have the flexibility to optimize elements that you find to be influential, not lock yourself into a specific roadmap. Optimization is a process that changes as you learn from each campaign. You will get the most out of your optimization efforts by iterating off of things you learn from previous tests.

Brandon Anderson – Consultant

The one thing I wish optimization practitioners understood is the 80/20 rule and the need for focusing on the “basics”. 80% of optimization ROI comes from doing 20% of optimization activities. The optimization umbrella is getting bigger and bigger – web, mobile web, mobile app, email, display ad – and the number of activities in these areas is almost infinite – banners, images, copy, buttons, layout, color, page flow, etc. It’s very easy to get excited about new initiatives like personalization and omnichannel. These things may have value. But is their value greater than the “basic” activity of optimizing page layout in the checkout funnel?

Sometimes organizations that have been doing A/B testing for years feel like they need to work on complex activities in order to continue progressing. My experience is that even mature organizations need to look past the hype of new and shiny buzzwords and determine which activities will give them the highest efficiency. Get the 80% with 20% of the effort by focusing on the basics.

Ryan Roberts – Solution Architect

I wish more people realized that successful optimization has to be a process that will require time, effort and thoughtful strategy. Just throwing together some random tests misses the point and the benefit of a well-run optimization program.

I also wish people were more careful about how they read test results. People that rely solely on confidence calculations are going to end up with a lot more wrong conclusions than they think. They need to understand what the rules of conclusive results should be for their site. And they have to apply them religiously to each test they run.

Doug Mumford – Consultant

Many great tests don’t (or shouldn’t) take much development time to set up. Orgs should actively work to reduce lead time from idea to launch. Launching a test in under an hour is very possible. Orgs tend to anchor their perception of development time on what they’ve done in the past – 4-8 hours for dev slated out two weeks in advance, 3 hours for QA. Why?

While some tests will require more time, a lot of highly valuable tests can be done with three lines of CSS or jQuery, loaded up in four browsers to make sure everything looks good (and perhaps an iPhone and iPad), and launched. Have a bias for action.

If I had to characterize my own answer to the question, it would be that there is a massive difference between action and value. Just running a test, be it one or 500, is not the mark of successful optimization. Optimization is about how you tackle large assumptions, how you act on data, and even how you think about what data can and can’t tell you. So much time is wasted in the pursuit of executing on assumptions and propagating agendas, which is the exact opposite of where the value of optimization comes from.

It is about discipline, and statistics, and variance, and technical solutions, and dealing with senior management, and dealing with biases and assumptions. It is all that and more. It is a means to an end, but that end is increased revenue for your organization, not just blindly reaching an audience or making an individual look good. The more you try to justify a specific action, or the more complicated you make something, the less value you get and the more time you waste. Understanding that action in and of itself is not the answer is the first step to being truly open to solving the largest challenges that optimization programs face. The challenge is never in running tests; the real challenge is finding solutions and ways to even have these conversations.

What do you find as the one thing you wished people understood about optimization? What are you doing to solve it?