Why we do what we do: Not everything is a nail – Maslow’s Hammer

At any given time you will hear me quoting famous lines about anything and everything; that is how I relate to new experiences, by tying them to some bit of knowledge that I have already picked up. Probably my most common refrain lately is the famous Mike Tyson quote, “Everyone has a plan until they get punched in the face.” It’s true that everyone talks about doing the right thing and everyone wants things to succeed, but as soon as there is some challenge to the prevailing world view, or as soon as a small bump in the road appears, people often revert to what they know best and turn back in on themselves. Unfortunately this is especially problematic in the business world, as the only way to move forward is to change behaviors and tackle existing problems in new ways. Even more distressing is that as people fall back on what they are most comfortable with, they turn towards their own disciplines and their own previous experience, limiting the ability of people with disparate talents and backgrounds to work together.

One of the things that defines people is that they view the world through their own experiences, and the most powerful experiences that we have in the modern world are our professions. Be it marketing or engineering, management or data, we all view the world through the lens of the things we do and the challenges that we face day to day. We view the challenge of improving numbers by looking to “dialogue with our customers” or “increase efficiency through data analysis” or by “building better tools and a better user experience”. All of these in isolation seem like, and often are, very good ideas, except when they cloud our ability to prioritize and to focus on a single outcome. Each day in the business world is really a Sisyphean climb to the top, and each time that boulder rolls back on us we run back to that which we are most comfortable with. This is especially dangerous when we do not even have true accountability for the tie between those concepts and the functional bottom line outcome that we need to generate.

Abraham Maslow is famous for many things, from his hierarchy of needs to his many contributions to modern psychology. What he is often not associated with is a quote that almost everyone is familiar with, “if all you have is a hammer, everything looks like a nail.” We are all carrying hammers in the form of our world views and our professional disciplines. The key is to accept that there are many things outside of what we accept as “true” about the way to do things and about how to tackle problems. Even more, when we do get evidence that does not fit our existing world view, we cannot dismiss it or try to understand it through that same tired lens.

Optimization at its core is the act of adding accountability to these world views and of challenging assumptions. It is about taking the existing practices of the entire organization and standing them on end, shaking them, and finding all the holes and least effective parts. It does this not maliciously but for the mutual benefit of everyone, adding a different point of view on the functions and actions that they are taking. This is why the discipline of testing is about everything but test ideas. It is about building rational rules of action and building out alternative hypotheses. This is why you focus on efficiency and multiple options, and not just on what won, on what won elsewhere, or on some great idea someone had. It is why it is about patterns and not about some artificial reasoning for why something won. You do the organization a great disservice when all you do is regurgitate the nail back to someone so that they can then hit it with the same tired hammer. Optimization is the act of putting any idea and discipline through a system that allows it to get better and allows everyone to learn and to get better results.

At the same time, it is important to understand that everyone else is viewing the world through a very different lens. They are trying to tie their past experiences to new actions and new results. A marketer has always thought in terms of a dialogue with a certain user or a certain persona. That mental model has gotten them where they are today. When you come in and show that there might be more effective ways to look at those same users, or that the concept of personalization most likely will not work the way they envision, you are creating a very powerful form of cognitive dissonance and you are forcing people to step away from that hammer that they so readily wield. Push too much and you will cause major push back and possibly form an ongoing barrier to success. Push too little and you are just confirming their biases and not providing any assistance.

The key in this and in all actions is to be firm on discipline but flexible on tactics. Work with the concepts and push them past their existing barriers. This is why it is so vital to not focus on test ideas when building out a successful test. Talk about what people were already focusing on and how best you can test out that concept against many others. You want to do personalization, great, here is how we take what you were doing and serve that and other concepts to everyone. If you are right, we get to see that and if you are wrong then we found something that is better. In reality there is no downside to performance when we tackle a problem that way. It is about reaching the ends, not about the means that get you there.

Another key to this is to get people to vote on what they think will win for each test. If you do this enough, and with enough varied options, you will be amazed at just how bad people are at guessing the right answer. In the last 9 tests that I have done we have averaged 8 options for each test, with some variants coming from the team and some from myself, but a great many being simply expressions of the various directions that are feasible. I have asked a large team to pick their favorite and second favorite. In those 9 tests, we have had exactly 1 second place vote for all of the winners combined, and the only reason that the option got that vote was because my very talented designer picked up on the pattern and voted for her least favorite. The shock of where we are versus where people thought we would be, and the impact to the bottom line (over 200% improvement), has helped open doors to new ways of tackling problems, and it has done so organically.

In both tactics you are giving people the chance to tie their world view in with the results and letting them have a stake in the outcome. You are welcoming that hammer they wield but helping them see that there are many different nails to hit.

Keep in mind, however, that you are just as guilty as they are. Spend too much time in the world of optimization and you will start to feel like no one has any idea what they are doing and that all ideas are going to fail. It is even more important for you to challenge yourself and to go beyond your comfort zone in where you let testing go. Make sure you include ideas from others as much as possible, even if you are sure they are not going to work. Make sure you tie optimization in on actions that you feel might not be comfortable or worth your time. Remember that the smarter someone is, the more likely they are to be impacted by biases, and that you serve no good to the organization or yourself if you are not more vigilant against your own biases than you are against those of others.

Cauterizing Open Wounds

One of the most difficult parts of starting your own program or of consulting with a new organization is the need to evaluate and change existing practices. In almost all cases groups have been optimizing for a while, often with one or more people who own the program and have built their reputations on prior practice. Any prior actions have been done with their name attached and they have enjoyed the perceptions of success. The problem, though, is that people rarely evaluate the reality of their statements and are often unaware, or too busy to really know, whether what they are saying is real or pure BS (this explains the entire agency system).

This can be extremely problematic, as it is vital to stop any bad practices before you can implement needed discipline and really make a positive impact for your company. It does you no good to look into things like fragility or efficiency, or into controlled experiments or segment discovery, if you are operating in a world where people expect to test out 1 or 2 ideas based on opinions and to do this in 2-3 days. If your organization actually thinks that things like 48 hours to run 8 tests and clicks on a button are a measure of success, then no amount of real optimization is going to matter until you make it clear just how off the entire process is. Of course, if you do this poorly then you are just making yourself public enemy number 1, and since you are the new guy in the room you are basically setting yourself up for failure.

The key is to understand the issues, tackle all of them without prejudice, and evaluate the program for all of them. That way people see that you are not attacking someone or something but simply evaluating the program for inefficiencies. If everything is up for grabs and some things pass and some things go, then at the least you are removing the direct confrontational element from it. If you can further push the conversation into one of what defines success and simply focus on those components, then many of the would-be battles simply fall by the wayside.

Generally the things that need to be evaluated and often changed fall into a number of common categories. These include:

Acting on test:

    False belief in confidence
    Acting too quickly
    No consistent rules of action

Lack of Process:

    No consistent way of getting results live
    No single person owning test ideation, just random ideas thrown up

Lack of data control:

    Wrong metrics
    No variance study
    Lack of proper segment analysis

The main problem with any or all of these is that there will be a library of tests that people have believed and most likely built entire strategies around. It doesn’t matter if it is what pages do or do not work, the impact of certain changes, or where and to whom to test; this misinformation is far more damaging than any positive result that you could generate.

All results are contextual, which means that you must set the proper context in order to really evaluate the impact of a test or process. If you have people believing in a 200% increase because they were looking at one group and at clicks on a button, then it can be nearly impossible to talk about a 5% RPV increase because it just sounds too small and not as important to them, despite the fact that the 200% click increase could have actually caused a 10% loss in revenue. If you or others do not understand the core principles and math involved, then you and they are more likely to fall for any BS that comes along. You must focus on education and on the disciplines, not just stories, if you want to make a meaningful long term impact.
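To make the gap between a click metric and a revenue metric concrete, here is a minimal Python sketch. Every number in it is hypothetical, invented only to mirror the 200% click increase and 10% revenue loss described above, and `relative_lift`, `click_rate`, and `rpv` are illustrative helper names rather than functions from any testing tool.

```python
def relative_lift(variant_value, control_value):
    """Relative change of the variant over control (0.05 means +5%)."""
    return (variant_value - control_value) / control_value


def click_rate(arm):
    """Button clicks per visitor."""
    return arm["button_clicks"] / arm["visitors"]


def rpv(arm):
    """Revenue per visitor."""
    return arm["revenue"] / arm["visitors"]


# Hypothetical results, invented for illustration only.
control = {"visitors": 10_000, "button_clicks": 100, "revenue": 20_000.0}
variant = {"visitors": 10_000, "button_clicks": 300, "revenue": 18_000.0}

click_lift = relative_lift(click_rate(variant), click_rate(control))
rpv_lift = relative_lift(rpv(variant), rpv(control))

print(f"Click rate lift: {click_lift:+.0%}")
print(f"RPV lift: {rpv_lift:+.0%}")
```

With these made-up inputs the same test reports a click lift of +200% next to an RPV lift of -10%, which is why the choice of metric has to be settled before anyone celebrates a result.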

This is why stopping the bleeding is such an important and difficult task. People don’t realize how far off they really are and often have never been called out for their BS, resulting in entire careers built on bad outcomes and false conclusions. In my case I am looking at everything from acting too quickly (18 conversions versus 32 conversions is meaningless) to a lack of variance understanding and a lack of discipline on test ideas. These things were not done because someone was malicious or self serving. They were not done because of a lack of intelligence or a lack of desire to improve the business. They were simply done because the person did not know better and because there is just so much bad information out there.

The real challenge here is controlling expectations and helping people understand the error in their ways. I am extremely lucky to work with a number of very smart people who are willing to listen to and understand issues which they never knew they were dealing with, like the variance problems I previously discussed. The challenge is far more in getting people to understand that just because they come from a place that is used to testing in 1-2 days or to tracking a certain thing, it just means that they were really good at wasting their company’s time and resources. It is also important to set proper expectations on what the movement speed will be. If they are thinking you can get a result in 2-3 days and it is going to take 2-3 weeks, this can completely shift their view of optimization to the negative, despite the fact that you are really moving from something that was damaging the company to something that is going to cause consistent positive growth.

More than anything it is important to realize that you have to stop all bleeding and make that the primary focus before you can overly concern yourself with making big changes. This doesn’t mean that you don’t do any tests or the like; in fact it is important for people to see what they should be doing so that they can really appreciate how far off they were previously. If someone doesn’t know what success looks like then any point on the map can be success for them. It simply means that controlling the message and focusing on education is vital at the start of any program.

Google Experiments, Variance, and Why Confidence can really suck

There are many unique parts to optimizing on a lower traffic site, but by far the most annoying is an expected high level of variance. As part of my new foray into the world of lead generation I am conducting a variance study on one of our most popular landing pages.

For those who are not clear on what a variance study is, it is when you run multiple copies of the same control and measure all of them against each other. In this case I have 5 versions of control, which gives you a total of 20 data points (all 5 compared to the other 4). The point of these studies is to evaluate what the normal expected variance range is, as well as the minimum and maximum outcomes from the range. It is also designed to measure this over time so that you can see when and where it normalizes, as each site and page will have a normalization curve and a normal level of variance. For a large retail site with thousands of conversions a day you can expect around 2% variance after 7-10 days. For a lead generation site with a limited product catalog and much lower numbers, you can expect higher. You will always have more variance in a visit based metric system than in a visitor based metric system, as you are adding the complexity of multiple interactions being treated distinctly instead of in aggregate.
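The comparison described above can be sketched in a few lines of Python. The conversion counts below are hypothetical, and summarizing "variance" as the absolute relative difference in conversion rate between two copies of control is an assumption made for illustration; a given tool may compute it differently.

```python
from itertools import permutations

# Hypothetical (conversions, visitors) for five identical copies of control.
arms = {
    "control_1": (104, 2000),
    "control_2": (97, 2000),
    "control_3": (112, 2000),
    "control_4": (101, 2000),
    "control_5": (93, 2000),
}


def relative_difference(arm_a, arm_b):
    """Relative difference in conversion rate of arm_a measured against arm_b."""
    rate_a = arm_a[0] / arm_a[1]
    rate_b = arm_b[0] / arm_b[1]
    return (rate_a - rate_b) / rate_b


# Every arm compared to each of the other four: 5 * 4 = 20 data points.
comparisons = {
    (a, b): relative_difference(arms[a], arms[b])
    for a, b in permutations(arms, 2)
}

# Assumption: summarize variance by the size of each difference, ignoring sign.
magnitudes = [abs(diff) for diff in comparisons.values()]
average_variance = sum(magnitudes) / len(magnitudes)

print(f"Data points: {len(comparisons)}")
print(f"Average variance: {average_variance:.2%}")
print(f"Range: {min(magnitudes):.2%} to {max(magnitudes):.2%}")
```

Running the same summary on each day's cumulative totals would give the normalization curve: the day on which the average and the range stop shrinking is roughly where the page has settled.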

There are many important outcomes to these studies. They help you design your rules of action, including needed differentiation and needed amounts of data. They help you understand what the best measure of confidence is for your site and how actionable it is. They also help you understand normalization curves, especially in visitor based metric systems, as you can start to understand if your performance is going to normalize in 3 days or 7. Assume you will need a minimum of 6-7 days past that period for the average test to end.

The most annoying thing is understanding all the complexities of confidence and how variance can really mess it up. There are many different ways to measure confidence, from frequentist to Bayesian and P-Score to Chi Square. The most common ways are Z-Test or T-Test calculations. While there are many different calculations, they are all generally supposed to tell you very similar things. The most important of these is the likelihood that the change you are making is causing the lift you see. Higher confidence means that you are more likely to get the desired result. This means that in a perfect world a variance study should have 0% confidence, and you are hoping for very low marks. The real world is rarely so kind though, and knowing just how far off from that ideal you are is extremely important to knowing how and when to act on data.
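For readers who want to see what sits behind a Z-Test confidence number, here is a minimal sketch using only the Python standard library. It assumes one common convention, a pooled two-proportion Z-test with confidence reported as one minus the two-sided p-value; whether any particular tool uses exactly this calculation is an assumption, and `z_test_confidence` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def z_test_confidence(conv_a, n_a, conv_b, n_b):
    """Confidence (0 to 1) that two conversion rates differ.

    Pooled two-proportion Z-test; confidence is 1 minus the two-sided p-value.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        return 0.0  # No conversions (or all conversions): nothing to compare.
    z = (rate_b - rate_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return 1 - p_value


# Hypothetical counts: two copies of control that happened to land apart.
print(f"{z_test_confidence(100, 2000, 118, 2000):.1%}")

# Identical results give the ideal variance-study outcome of 0% confidence.
print(f"{z_test_confidence(100, 2000, 100, 2000):.1%}")
```

The calculation only sees counts. It has no way of knowing that both experiences were the same page, which is why a variance study is needed to learn how much confidence pure noise can produce on a given site.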

This is what I get from my 5 experience variance study:

[Figure: day6variance]

To clarify, this is using a normal Z-Test P-Score approach, and each experience is over the bare minimum number of conversions that most people recommend (100 per experience). This is being done through Google Experiments. The highest variance I have ever dealt with on a consistent basis is 5%, and anything over 3% is pretty rare. Getting an average variance of 11.83% after 5 days is just insane:

[Figure: variancegraph]

This is just not acceptable. I should not be able to get 97% confidence from forced noise. It makes any normal form of confidence almost completely meaningless. To make it worse, if I did not do this type of study, or if I did not understand variance and confidence, then I could easily make a false positive claim from a change. These types of errors (both type 1 and type 2) are especially dangerous because they allow people to claim an impact when there is not one and to justify their opinions through purely random noise.
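One way to get a feel for how much confidence pure noise can generate is to simulate variance studies in which every experience truly has the same conversion rate. The sketch below is a simulation under stated assumptions, not a model of Google Experiments: the traffic volume, the 10% conversion rate, the seed, and the pooled Z-test convention are all hypothetical choices.

```python
import random
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def confidence(conv_a, n_a, conv_b, n_b):
    """1 minus the two-sided p-value of a pooled two-proportion Z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        return 0.0
    z = (conv_b / n_b - conv_a / n_a) / standard_error
    return 2 * NormalDist().cdf(abs(z)) - 1


def highest_confidence_in_study(rng, arms=5, visitors=1000, true_rate=0.10):
    """Simulate one variance study and return the highest pairwise confidence.

    Every arm shares the same true conversion rate, so any confidence
    reported here comes from noise alone.
    """
    conversions = [
        sum(rng.random() < true_rate for _ in range(visitors))
        for _ in range(arms)
    ]
    # Confidence is symmetric, so the 10 unordered pairs cover all 20 comparisons.
    return max(
        confidence(a, visitors, b, visitors)
        for a, b in combinations(conversions, 2)
    )


rng = random.Random(42)  # Fixed seed so the run is repeatable.
studies = 300
best = [highest_confidence_in_study(rng) for _ in range(studies)]
false_alarm_rate = sum(c >= 0.95 for c in best) / studies

print(f"Studies with at least one pair at 95%+ confidence: {false_alarm_rate:.0%}")
```

Any single pair of identical experiences should cross 95% confidence only about one time in twenty, but with ten pairs in every study the chance that at least one of them does is considerably higher. Seeing a high confidence number somewhere in a variance study is therefore not surprising by itself; what matters is how often it happens and how large the accompanying differences are.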

If you do not know your variance or have never done a variance study, I strongly recommend that you do one. They are vital to really making functional changes to your site and will allow you to avoid wasting so many resources and so much time on false leads.

The New Long Road Ahead

So much has changed in my world recently and I wanted to give everyone a heads-up. After 5+ years trying to fix some of the largest and most complicated organizational optimization issues I have stepped away from Adobe and have decided to go in a somewhat new direction. I have taken a position as Director of Optimization for a small company in the Carlsbad, CA area called Questpoint where I will be overseeing optimization of a number of lead gen situations.

What this means is that I now deal with much smaller but much more meaningful measures of success. It also means that I can now talk much more directly about the challenges I face and the solutions as they present themselves to me. I will continue to investigate the theoretical challenges of optimization but will also be more directly talking about the realities of testing on a budget. I will be using a number of tools including Google Analytics and Google Experiments and will be breaking down the advantages and disadvantages of them in comparison to the enterprise level tools that I was familiar with.

Here is to the new path before me and here is to the many barriers and hills one must climb to bring that boulder to the top of the mountain.