Cauterizing Open Wounds

One of the most difficult parts of starting your own program or consulting with a new organization is the need to evaluate and change existing practices. In almost all cases the group has been optimizing for a while, often with one or more people owning the program and building their reputations on prior practice. Those prior actions all have their name attached, and they have enjoyed the perception of success. The problem is that people rarely evaluate the reality of their claims, and are often unaware, or too busy, to know whether what they are saying is real or pure BS (this explains the entire agency system).

This can be extremely problematic, as it is vital to stop any bad practices before you can implement the needed discipline and really make a positive impact for your company. It does you no good to look into things like fragility or efficiency, or into controlled experiments or segment discovery, if you are operating in a world where people expect to test 1 or 2 opinion-driven ideas in 2-3 days. If your organization actually thinks that running 8 tests in 48 hours and measuring clicks on a button is success, then no amount of real optimization is going to matter until you make it clear just how off the entire process is. Of course, if you do this poorly then you are just making yourself public enemy number 1, and since you are the new guy in the room you are basically setting yourself up for failure.

The key is to understand the issues and to evaluate the entire program for all of them, without prejudice. That way people see that you are not attacking someone or something but simply evaluating the program for inefficiencies. If everything is up for grabs, and some things pass while others go, then at the least you have removed the directly confrontational element. If you can further push the conversation into one of what defines success, and simply focus on those components, then many of the would-be battles simply fall by the wayside.

Generally the things that need to be evaluated, and often changed, fall into a number of common categories. These include:

Acting on tests:

    False belief in confidence
    Acting too quickly
    No consistent rules of action

Lack of Process:

    No consistent way of getting results live
    No single person owning test ideation, just random ideas thrown up

Lack of data control:

    Wrong metrics
    No variance study
    Lack of proper segment analysis

The main problem with any or all of these is that there will be a library of tests that people have believed and most likely built entire strategies around. It doesn't matter whether it is which pages do or do not work, the impact of certain changes, or where and to whom to test; this misinformation is far more damaging than any positive result you could generate.

All results are contextual, and as such you must set the proper context in order to really evaluate the impact of a test or process. If you have people believing in a 200% increase because they were looking at one segment and at clicks on a button, then it can be nearly impossible to talk about a 5% RPV increase, because it just sounds too small and not as important to them, despite the fact that the 200% click increase could have actually caused a 10% loss in revenue. If you or others do not understand the core principles and math involved, then you are all more likely to fall for any BS you come across. You must focus on education and on the disciplines, not just stories, if you want to make a meaningful long-term impact.

This is why stopping the bleeding is such an important and difficult task. People don't realize how far off they really are, and oftentimes have never been called out for their BS, resulting in entire careers built on bad outcomes and false conclusions. In my case I am looking at everything from acting too quickly (18 conversions versus 32 conversions is meaningless), to a lack of variance understanding, to a lack of discipline on test ideas. These things were not done because someone was malicious or self-serving. They were not done because of a lack of intelligence or a lack of desire to improve the business; they were simply done because the person did not know better and because there is just so much bad information out there.
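To see why 18 versus 32 conversions is meaningless, here is a minimal sketch of the math. It assumes a roughly even traffic split between the two experiences (only the conversion counts come from above; everything else is my assumption), and it shows the gap never clears the conventional 95% bar:

    # A minimal sketch, assuming a roughly even traffic split between the
    # two experiences (only the conversion counts come from the text).
    from math import sqrt
    from statistics import NormalDist

    control, variant = 18, 32
    total = control + variant

    # Under the null of no real difference, conversions split 50/50; test
    # the observed split with a continuity-corrected normal approximation
    # to the binomial.
    z = (abs(control - total / 2) - 0.5) / sqrt(total * 0.25)
    p_value = 2 * (1 - NormalDist().cdf(z))
    print(f"Two-sided p-value: {p_value:.3f}")  # ~0.07, short of 95% confidence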

The real challenge here is controlling expectations and helping people understand the error of their ways. I am extremely lucky to work with a number of very smart people who are willing to listen to and understand issues they never knew they were dealing with, like the variance problems I previously discussed. The challenge is far more in getting people to understand that just because they come from a place that is used to testing in 1-2 days, or to tracking a certain thing, it just means they were really good at wasting their company's time and resources. It is also important to set proper expectations on what the movement speed will be. If they are thinking you can get a result in 2-3 days and it is going to take 2-3 weeks, this can completely shift their view of optimization to the negative, despite the fact that you are really moving from something that was damaging the company to something that is going to cause consistent positive growth.

More than anything, it is important to realize that you have to stop all the bleeding, and make that the primary focus, before you can overly concern yourself with making big changes. This doesn't mean that you don't run any tests; in fact it is important for people to see what they should be doing so that they can really appreciate how far off they were before. If someone doesn't know what success looks like, then any point on the map can be success for them. It simply means that controlling the message and focusing on education is vital at the start of any program.

Google Experiments, Variance, and Why Confidence Can Really Suck

There are many unique parts to optimizing on a lower traffic site, but by far the most annoying is an expected high level of variance. As part of my new foray into the world of lead generation I am conducting a variance study on one of our most popular landing pages.

For those that are not clear on what a variance study is: it is when you run multiple identical versions of the control and measure all of the interactions against each other. In this case I have 5 versions of the control, which gives a total of 20 data points (all 5 compared to the other 4). The point of these studies is to evaluate the normal expected variance range, as well as the minimum and maximum outcomes of that range. It is also designed to measure this over time, so that you can see when and where it normalizes, as each site and page will have its own normalization curve and normal level of variance. For a large retail site with thousands of conversions a day you can expect around 2% variance after 7-10 days. For a lead generation site with a limited product catalog and much lower numbers, you can expect higher. You will always have more variance in a visit-based metric system than in a visitor-based metric system, as you are adding the complexity of multiple interactions being treated distinctly instead of in aggregate.
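As a rough illustration of the mechanics, here is a minimal sketch of the bookkeeping for such a study; the experience names and counts below are entirely hypothetical, and in practice you would pull visitors and conversions from your testing tool:

    # A minimal sketch of a variance (A/A) study: five identical "control"
    # experiences compared pairwise. All counts are hypothetical.
    from itertools import permutations

    # conversions, visitors per identical experience
    experiences = {
        "A1": (120, 2400),
        "A2": (131, 2380),
        "A3": (118, 2410),
        "A4": (127, 2395),
        "A5": (124, 2405),
    }

    rates = {name: conv / vis for name, (conv, vis) in experiences.items()}

    # 5 experiences each compared to the other 4 gives 20 ordered data points.
    lifts = {
        (a, b): (rates[a] - rates[b]) / rates[b] * 100
        for a, b in permutations(rates, 2)
    }

    average_variance = sum(abs(lift) for lift in lifts.values()) / len(lifts)
    print(f"{len(lifts)} comparisons, average observed 'lift': {average_variance:.2f}%")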

There are many important outcomes to these studies. They help you design your rules of action, including the needed differentiation and the needed amounts of data. They help you understand what the best measure of confidence is for your site and how actionable it is. They also help you understand normalization curves, especially in visitor-based metric systems, as you can start to understand whether your performance is going to normalize in 3 days or 7. Assume you will need a minimum of 6-7 days past that period for the average test to end.

The most annoying thing is understanding all the complexities of confidence and how variance can really mess it up. There are many different ways to measure confidence, from frequentist to Bayesian and from p-score to chi-square; the most common are Z-test or T-test calculations. While there are many different calculations, they are all generally supposed to tell you very similar things, the most important of which is the likelihood that the change you are making is causing the lift you see. Higher confidence means that you are more likely to get the desired result. This means that in a perfect world a variance study should have 0% confidence, and you are hoping for very low marks. The real world is rarely so kind, though, and knowing just how far off from that ideal you are is extremely important to knowing how and when to act on data.
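For reference, here is a minimal sketch of the standard two-proportion Z-test calculation behind that kind of confidence number (the counts are hypothetical):

    # A minimal sketch of a two-proportion Z-test "confidence" calculation.
    from math import sqrt
    from statistics import NormalDist

    def z_test_confidence(conv_a, vis_a, conv_b, vis_b):
        """Two-sided confidence (1 - p-value) that the two rates differ."""
        rate_a, rate_b = conv_a / vis_a, conv_b / vis_b
        pooled = (conv_a + conv_b) / (vis_a + vis_b)
        se = sqrt(pooled * (1 - pooled) * (1 / vis_a + 1 / vis_b))
        z = (rate_b - rate_a) / se
        return 1 - 2 * (1 - NormalDist().cdf(abs(z)))

    # In an A/A comparison both arms are identical, so ideally this number
    # stays low; high variance can still push it toward "significance".
    print(f"{z_test_confidence(120, 2400, 131, 2380):.1%}")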

This is what I get from my 5-experience variance study:

[Image: variance study results, day 6]

To clarify, this is using a normal Z-test p-score approach, and every experience is over the bare minimum number of conversions that most people recommend (100 per experience). This is being done through Google Experiments. The highest variance I have ever dealt with on a consistent basis is 5%, and anything over 3% is pretty rare. Getting an average variance of 11.83% after 5 days is just insane:

[Image: variance graph]

This is just not acceptable. I should not be able to get 97% confidence from forced noise. It makes any normal form of confidence almost completely meaningless. To make it worse, if I had not done this type of study, or if I did not understand variance and confidence, then I could easily make a false positive claim from a change. These types of errors (both type 1 and type 2) are especially dangerous because they allow people to claim an impact where there is none and to justify their opinions through purely random noise.
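To put a number on how easily noise masquerades as a result, here is a minimal simulation (rates and sample sizes are made up): even with no real difference at all, comparisons between identical experiences clear 95% confidence at roughly the textbook 5% rate, and noisy, high-variance data read too early only makes things worse.

    # A minimal simulation: comparing two identical experiences repeatedly,
    # random noise alone clears 95% "confidence" about 5% of the time.
    import random
    from math import sqrt
    from statistics import NormalDist

    random.seed(42)
    TRUE_RATE, VISITORS, RUNS = 0.05, 2000, 2000

    false_positives = 0
    for _ in range(RUNS):
        conv_a = sum(random.random() < TRUE_RATE for _ in range(VISITORS))
        conv_b = sum(random.random() < TRUE_RATE for _ in range(VISITORS))
        pooled = (conv_a + conv_b) / (2 * VISITORS)
        se = sqrt(pooled * (1 - pooled) * (2 / VISITORS))
        z = (conv_b - conv_a) / (VISITORS * se) if se else 0.0
        if 2 * (1 - NormalDist().cdf(abs(z))) < 0.05:
            false_positives += 1

    print(f"False positive rate: {false_positives / RUNS:.1%}")  # ~5% by design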

If you do not know your variance, or have never done a variance study, I strongly recommend that you do one. They are vital to making really functional changes to your site and will allow you to avoid wasting so many resources and so much time on false leads.

The New Long Road Ahead

So much has changed in my world recently and I wanted to give everyone a heads-up. After 5+ years of trying to fix some of the largest and most complicated organizational optimization issues, I have stepped away from Adobe and decided to go in a somewhat new direction. I have taken a position as Director of Optimization for a small company in the Carlsbad, CA area called Questpoint, where I will be overseeing optimization for a number of lead gen situations.

What this means is that I now deal with much smaller but much more meaningful measures of success. It also means that I can talk much more directly about the challenges I face and the solutions as they present themselves. I will continue to investigate the theoretical challenges of optimization but will also talk more directly about the realities of testing on a budget. I will be using a number of tools, including Google Analytics and Google Experiments, and will be breaking down their advantages and disadvantages in comparison to the enterprise-level tools that I was familiar with.

Here is to the new path before me and here is to the many barriers and hills one must climb to bring that boulder to the top of the mountain.

Taking a Wrong Turn – How Your Roadmap May Be Leading You off a Cliff

You are driving down a road when your GPS tells you to turn left. You make a sudden motion and find yourself down a small side road. It doesn't look like where you are trying to go, but you have to follow your GPS; otherwise, you will get lost. You continue, then your GPS tells you to go right. There isn't a road there, but because you are stuck doing only what the GPS tells you, you turn and suddenly find yourself running off a cliff, flying to your demise in the canyon below. Sound like a bad TV skit? The reality is that this is how most people leverage their “roadmaps” when running their optimization programs.

While hypothesis is still the most misunderstood term in all of optimization, the most abused may be roadmap. So many different groups claim they have a roadmap, or to be following a roadmap, or that something is on their “roadmap”, and yet so few understand how one is meant to be used. A roadmap (little r) is a list of tests, most of which serve as a great excuse to waste time and effort and to get locked into a system of projects. A Roadmap (capital R) is a constantly shifting list of priorities from which you create actions and act on what you discover to decide where to go next. This distinction is fundamental if you have any hope of achieving great results with your program, and yet so many happily focus on the first, whether for the sake of internal processes or from an inability to change how their optimization program operates in producing revenue.

Let’s start with what the goal of optimization is. It is not to run a test.

Tests are a means to an end.

The goal of an optimization program is to produce the maximum amount of revenue for whatever resources you spend on it. The same is true of every effort you make, be it personalization, SEO, content creation or a promotion. You are not doing those things because they are fun; you are doing them to increase the revenue to your organization. This means that they are just tactics and not ends unto themselves. This is fundamental to understanding the difference between a roadmap and a Roadmap.

Anytime we confuse the action for the end goal, we lose almost all possible value because we have lost the ability to go in any other direction. When you get stuck on a review process and a large series of tests, you are making the decision to focus on the action and not the value it generates. You become a means to empty action, not a means to the end of generating revenue. You are saying, at that point, that you couldn't care less if you make money, so long as these few specific tests get run.

If you instead focus on the end goal, then the first and most important piece is to discover how best to achieve it. You may have some test ideas and some things you are going to execute on, but they are fungible. You must, and will, constantly shift them as you learn more and as you go in new directions. You cannot be stuck on the path if the end goal is what matters most; you must instead have the discipline and flexibility to go anywhere the data tells you.

This is why a Roadmap is just a series of places to focus. It might be personalizing an experience, or improving a product page, or improving your recommendation system. You are hoping that doing so will result in more revenue, but you are not tied to specific tactics, just to finding the best way to accomplish the end goal. Oftentimes you will have no more than 1 or at most 2 tests for each area when you start, but you plan out the time to shift and the time to continue down any path that presents itself to you. From there you can work out actions that will produce answers, things like inclusion/exclusion testing, or MVT, or content serving, so that you can measure the value of different alternatives. At that point, you focus on whatever answers you have and continue to drive forward based on those results.

The amazing or frustrating part of this, depending on which approach you are used to, is that you never know where you will end up. You might end up with a dynamic layout for your product page, or targeting content based on time of day, or removing your recommendations system from a page. The farther you end up from where you imagined, the more revenue you make, because each step that takes you in a new direction can only do so by proving, through rational measurement, that it outperforms where you thought you were going to go. You can end up just about anywhere, and that is what makes it so powerful.

The most common refrain you get when tackling problems this way is that it is hard to plan resources, but that argument just does not hold water. You know you are going to test and you know you are going to need resources. This just means you plan time. What you aren't planning on is that time being spent on coding one specific module 6 months from now. The use of that time is constantly shifting and updating; it isn't set in stone. You can plan resources extremely easily. What you can't do, however, is focus those resources only on one person's opinion or on a singular person's agenda. It is not that you spend more resources or can't plan; you just spend them differently, away from empty talk about a test and toward building a successful and meaningful program.

The real challenge becomes not resource planning but accountability. So many programs hold onto their list of tests because it justifies their actions. It becomes about checking off that a test was done and not about the efficiency or the value of that test. At the end of the day, the people in your program get to choose where their accountability lies: with just running tests or with actually providing value. If you are focusing on an empty series of tests, then you will always just be performing action. If you can instead view your Roadmap as a constantly shifting series of actions that focus only on the value they derive, then you will never worry about any specific test or about trying to validate test ideas.

In reality, the biggest challenge to tackling problems like this is the ego of the people in your program and of the executives who might be involved. People protect themselves at all costs, because accountability is the scariest thing in the world for most people. The old systems have everything going through them, and nothing is done without their blessing. When you go wherever the data takes you, you are faced with going in a direction that might not be what that executive thought of 3 weeks ago. When you just focus on your part of a larger process, or when you accept their divined vision as the only means to an end, then you have essentially said that you have no value at all to the organization and are just a fungible means to an empty end.

This is why an education program and a focus on discovery are so vital to the value derived from your testing program. Management might view this as a loss of power, but the reality is that it is so much more. They are no longer constrained by some random thought they had, no matter how great it was, and can instead encourage others to expand on their creativity. It is no longer about having the right answer but about measuring and going with the best ideas your entire team can come up with. You can tell just how far you are from this point by the number of empty I believe/I think/I feel conversations you hear in meetings. The fewer of those you hear, the closer you are to achieving real value. It isn't about a review process but instead about the creation process and the management of the system to ensure rational decision making.

So many organizations are led to drive into that canyon or into a random lake. Even worse, there are always people at those organizations who will describe the water they are drowning in as the expected destination. If you really want to go to new places, and really want to end up where you should, then you are going to need to give up your belief in that roadmap you hold so dearly. Find your own Roadmap, let it shift and go where it needs to, and you will be amazed at just how far you can go and how many new sights you will see.