There are many challenges for anyone entering a new field of study or a new discipline. We all come to any new concept with our previously held knowledge and beliefs filtering and changing how we view the new thing before us. Some choose to make it fit their world view, others dismiss it out of fear, and others look for how it can change their current world view. Usually in these situations I quote Sherlock Holmes: “It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.” Nothing represents this challenge more in online marketing than the differences between analytics and optimization, and nothing represents that struggle more than the debate about visit based measurement versus visitor based measurement.
The debate about whether someone should use a visit, impression, or visitor basis for analysis is a perfect example of this problem, as it is not as simple as always using one or the other. When you are doing analytics, visits are usually the best way to look at data. When you are doing optimization, there is never a time when visits would give you more relevant information than a visitor based view of the data.
Analytics = Visit
Optimization = Visitor
The only possible exceptions are when you are using adaptive learning tools. While the rules can be simple, a deep understanding of the reasons behind them presents many other opportunities to improve your overall data usage and the value derived from every action.
Since most people reading this come from an analytics background, let’s look at what works best in that environment. Analytics is a single data set, correlative data metric system, which is a long way of saying that it counts things on a consistent basis from only one set of data, even if that data has many different dimensions. You are only recording what was, not what could or should be. In that environment, you have to look at data in some very particular ways. The first among those is a very tight control on accuracy, since in many cases the data is used to represent what the business did and, hopefully, to make predictions about the future.
It is also important that you are consistent in how you measure and that you look at things on a common basis. Because most people are comfortable looking at data on a daily or shorter basis, the easiest method is going to be the visit. It works great because you are trying to look at interactions and to measure a raw count of things that did happen, e.g. how many conversions, or how many people came from SEO. In those cases, a raw count in a correlative area is best represented using a visit basis, since it mitigates lost data (though it is not a massive amount) and it best reflects the common basis on which people look at data.
In the world of optimization, however, you have a completely different usage and type of data. In optimization we are looking at a single comparative data point and trying to represent an entirely different measure, which is influence on behavior over time. It doesn’t matter if your site changes once a year or once an hour, or if your buying cycle is 1 visit or 180 days; all of those things are irrelevant to the fact that you are influencing a population over time. Because behavior is defined as influence on a population, and because we are looking comparatively over time, the measurement techniques used in analytics need to be rethought. Any concern about accuracy, past a simple point, becomes far less important than precision (consistency of data collection), since any error is going to be equally distributed across the experiences being compared. It doesn’t matter if the common basis is $4.50 or $487.62; what matters is the relative change based on the controlled factor. It is also important that we focus far more on the influence than on the raw count, which means we are really talking about the behavior of the population.
In analytics you are thinking in terms of what the count of the outcome was (rate), whereas in optimization the focus is on what the influence was (value). To really understand optimization, you have to understand that all groups start with a standard propensity of action, which is represented by your control group. If you do nothing, the people coming to your site, people in all stages and all types of interaction, add up to one standard measure across your site (though all measurement systems do have a small degree of internal variance). Since we are measuring not what the propensity of action is but what our ability to positively or negatively influence it is, we need to think in terms of reporting based on visitors and on the change (lift), not the raw count.
You also have the case of time, where we need to measure total impact over time. While it is correct that every time a visitor hits your site you have a chance to influence them, it is important to remember that the existing propensity of action measurement already accounts for this. What we are looking for is a simple measure of what we accomplished in terms of getting them to spend more. This means that we have to think in terms of both long and short term behavior. Some people will purchase today, some 3 visits later, but all of that is part of standard business as usual. It is incredibly easy to have scenarios where you get more immediate actions but fewer long term actions. This means that on a daily basis you might see a short term spike, but the business overall is actually going to make less revenue. This possibility creates two possible measurement scenarios:
1) There is no difference between short term and long term behavior, meaning the short term spike continues through and is positive also in the long term. In this scenario the only way to know that is to look at the long term.
2) There is a difference between short term and long term behavior, and we get a different outcome by looking at the visitor metric over time. In this scenario the only view that shows what is actually positive for the business is the visitor based metric view.
In both cases the visitor based metric view gives us the full picture of what is good for the business, while the visit based metric system either adds no value or has a negative value by reaching a false conclusion. Visitor is not only the most complete view, no matter the situation, but also the only one that can give you a rational view of the impact of a change. To top it off, the choice to only look at the shorter window creates a distribution bias by valuing short term behavior over long term behavior, which may raise questions about the relevance of the data used to reach any conclusion.
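To make the two views concrete, here is a minimal sketch in Python. The visit log, visitor IDs, and revenue figures are invented for illustration and do not come from any real test. It only shows how the same data can yield a positive lift per visit and a negative lift per visitor when an experience pulls purchases forward but loses later ones.

```python
# Hypothetical visit-level log: (visitor_id, experience, revenue_in_that_visit).
# Control visitors come back three times; half of them buy on the third visit.
# Variant visitors visit once; half of them buy right away for a smaller amount.
visits = []
for visitor in ("c1", "c2", "c3", "c4"):
    buys = visitor in ("c1", "c2")
    visits += [
        (visitor, "control", 0.0),
        (visitor, "control", 0.0),
        (visitor, "control", 100.0 if buys else 0.0),
    ]
for visitor in ("v1", "v2", "v3", "v4"):
    buys = visitor in ("v1", "v2")
    visits.append((visitor, "variant", 80.0 if buys else 0.0))


def revenue_per_visit_and_visitor(rows, experience):
    """Return (revenue per visit, revenue per visitor) for one experience."""
    rows = [r for r in rows if r[1] == experience]
    revenue = sum(r[2] for r in rows)
    visit_count = len(rows)
    visitor_count = len({r[0] for r in rows})  # unique visitors only
    return revenue / visit_count, revenue / visitor_count


def lift(variant_value, control_value):
    """Relative change of the variant against the control."""
    return variant_value / control_value - 1.0


control_per_visit, control_per_visitor = revenue_per_visit_and_visitor(visits, "control")
variant_per_visit, variant_per_visitor = revenue_per_visit_and_visitor(visits, "variant")

visit_lift = lift(variant_per_visit, control_per_visit)
visitor_lift = lift(variant_per_visitor, control_per_visitor)

print(f"visit based lift:   {visit_lift:+.0%}")
print(f"visitor based lift: {visitor_lift:+.0%}")
```

In this made-up data the variant earns less in total from the same number of visitors, yet the visit based view reports a large gain simply because the variant produced fewer visits.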
The visitor vs. visit based view of the world is just one of many massive differences that reduce the value derived from optimization if it is not understood or not evaluated as a separate discipline. Because it is so easy to rationalize sticking with what is comfortable, it is common to find this massive weakness being propagated throughout organizations with no measure of what the cost really is. While not as damaging as others, like not having a single success metric or not understanding variance, it is vital that you think about visit and visitor based data as attached to the end goal and not as a single answer to everything.
In the end, the debate about which version to use is not really one about visits or visitors; there are clear reasons to choose visits for analytics and visitors for optimization. The real challenge is whether you and your organization understand the different data disciplines that are being leveraged. If you constantly look for different ways to think about each action, you will find new and better ways to improve value; if you fail to do so, you will cause damage throughout your organization and will not even know you are doing it.
The life of working with organizations to bring about change is a difficult and frustrating one, but at the same time there are some truly amazing moments. The times you see an organization change and truly look at optimization differently, when they get past misconceptions and “best practices”, are truly amazing. While much of my writing focuses on the dark side of this world, I did want to take some time and share a few small stories from the front line that are, to me, what consulting is all about.
Story #1: I was working with a hospitality and travel company where I was hoping to come in and change a number of practices that may not have been beneficial to the company. To start an example test series working through discover and exploit, they decided to run an inclusion/exclusion test on a page template for their destination landing pages. We discovered that their main content block section was having a negative impact on their page and that removing it would result in a 12% lift to RPV. We also discovered that a small section on their right rail that they never even thought about was actually the most influential part of the page.
We were able to do this with very little effort (basic CSS), but because we challenged a number of internal assumptions, we learned about things they never considered, reduced maintenance cost, and achieved a major lift to their bottom line. The real kicker of this testing was that it was both the lowest resource test they had ever run and their most valuable.
Story #2: Working with another travel/hospitality organization, I came in because they had not been able to get any momentum with their testing due to a previous belief that it required too many resources. We were able to leverage their existing infrastructure and ran a simple inclusion/exclusion test on one of their landing pages. We did this all in one conversation without any pretext. In this case we discovered that almost every element individually was negative to the page, meaning that as a whole the total was far less than the sum of the parts. We ran a simple follow-up and discovered that they could generate 3-8 million dollars by simply removing one of their main offer parts, and that they could generate 4.5-12 million if they were willing to create a dynamic experience and show one of two sections based on new and returning users.
I love these types of moments because you have so many good things come out of very small actions. We now have a group that is amazed by the power of optimization. We now have a group that sees that a number of their assumptions have been proven incorrect about what works, or even for whom. We were able to generate massive fiscal impact to the business, with extremely low effort, and were able to give them insight into other opportunities to do the same thing. They were able to learn about the disciplines of testing, find out that conversion rate and revenue are not correlated, and were able to think about their entire user experience differently.
Story #3: I was working with a major online retailer that was just re-introducing optimization to their organization. They had already decided on what they were going to test, how, and when. They then jumped into their test, but I gave them guidance not to overreact and to understand how contextual changes are going to play out (a very large initial climb that then quickly narrows down to almost nothing). They were so excited when they had a 40+% lift in a couple of days with 99% confidence, but I asked them to wait and talked to them about what confidence really means. They held off, and within the next 7 days the results of the test were less than 2%, to the point that it had no impact.
Now normally you don’t like outcomes that do not generate lift. What made my day is that they stopped themselves from rushing around claiming a result that had no basis in reality. If they had run to tell the rest of the organization about that 40+% lift, they would have done irrecoverable damage to the company, as others would have attempted to do the same thing and reached the same false conclusions. Because they waited, they got to understand the nature of statistics and contextual versus spatial changes, and started thinking more about the true disciplines of optimization. Since that time they have expanded their testing to multiple countries and are testing at an accelerated pace and driving real value throughout the organization.
There are so many moments in working with different groups where you can get frustrated, or give up, or even worse give in and tell someone what they want to hear. Equally there are many moments where you can create a false impression of impact and fake numbers for your own gain. The moments that are truly amazing are those that prove that you don’t have to do any of those things, that you can achieve amazing results, quickly, and for far less time and effort than someone might otherwise think. All it takes is to think about and tackle problems differently, and the results that any group can achieve are truly amazing.
After your optimization program has been running for a while, there will naturally come times when you get to share the amazing success that has been generated by running your program correctly. Often times groups are so happy to get to brag, they don’t realize that they are causing the program long term problems.
There are many keys to sharing results correctly, but the most important thing to remember is why you are sharing them. Yes, you are talking to others about how great you are, but ultimately results are not about the past, but about the future. Should others invest in the program? Should they expand where and what you optimize? How do others think about and interact with optimization and, most importantly, why should they listen to you and your team when it comes to how and why you should be running certain tests? The critical moments come not from the sharing of results, but from the framing of why and how you got those results, and what it can do for other parts of the organization. Focus on the wrong parts, and you are asking for far more trouble than most can imagine.
Some of the worst moments for programs come after big results presentations, where they have gotten buy-in from others and a massive expansion of the program, but failed to share the right message and to help others understand that testing is often times not what they think it is. These moments inevitably lead to large resource drains, negative impressions, and massive time sinks for the original testing group, leading them to frustration and less total revenue generation than before the influx of resources. Looking back 6 and 12 months later, you can easily see the moment that things went south.
With all of those concerns in mind, here are the keys to successfully sharing results within your organization.
DO – Focus on what you got by proving assumptions wrong
It is so fun to talk about getting an 8% lift, or getting a 20% lift on multiple tests, but often times the lift is secondary to how you got it. If you are testing to find the optimal use of resources, then it is inevitable that you will find many times when popular assumptions have been proven wrong. As you share your results, it is important that this is the primary part of the message. It is not about “we got a 15% lift”, it is about “everyone wanted to do X, which would have generated a 3% lift, but we found that of these 5 feasible alternatives, doing Y was actually dramatically better and generated a 15% lift, or 12 points better than we would have gotten if all we did was test what people thought was going to win.”
DON’T – Report test results as a single revenue number
It is so fun and easy to report results as, “the test generated 6.2 million in additional revenue.” The problem is that there is absolutely no way to know specifics with accuracy, and you will garner a lack of trust if later the P&L does not show that exact figure of gain. I understand how impressive it is to point to a single number and how much credit it can get you, but ultimately it is far more damaging than the temporary good it might generate.
Even if we ignore all the real world problems with confidence, it is important to understand that confidence and most similar measures only measure the likelihood of a pattern, not the actual outcome. If I have a 10% lift and 96% confidence, I am not 96% confident that I will get a 10% lift, only 96% confident that the measured experience will beat control. Confidence intervals can also be tricky because of the many assumptions behind the Gaussian bell curves that they are based on.
Instead, focus on reporting tests as a range, based on a preset pair of multipliers. What that range is is somewhat arbitrary, as long as it is large enough to convey the massive range of possibility. If I have not done deep analysis of past results, I will often times report test results in a 50% – 200% range, so that the 6.2 million becomes an expected outcome of 3.1 to 12.4 million dollars. There are also ways to look back at results over time and see an expected range. Express everything in a large but relevant range and you will avoid all the massive credibility problems that false reporting creates.
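The range arithmetic above is simple enough to sketch in a few lines of Python. The function name and its default multipliers are my own illustrative choices; the 50% and 200% factors are the ones mentioned above and should be replaced with whatever your own look-back at past results supports.

```python
def revenue_range(point_estimate, low_factor=0.5, high_factor=2.0):
    """Turn a single revenue estimate into a (low, high) reporting range.

    The default factors mirror the 50% to 200% range described in the text;
    they are a preset convention, not a statistical confidence interval.
    """
    if not (0 < low_factor <= 1 <= high_factor):
        raise ValueError("expected 0 < low_factor <= 1 <= high_factor")
    return point_estimate * low_factor, point_estimate * high_factor


low, high = revenue_range(6_200_000)
print(f"Expected outcome: {low / 1e6:.1f} to {high / 1e6:.1f} million")
```

Keeping the multipliers as parameters makes it easy to tighten or widen the range later without changing how results are reported.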
DO – Report all tests based solely on revenue impact
While you can’t report an absolute number, that does not mean that you should not be reporting the fiscal impact of a test. Translating all tests to a revenue figure gives you the ability to express your efforts as they impact the bottom line, while also giving you the ability to rationally compare the results among tests. Even if you are not a retail site, you can translate leads or page views to revenue using an average value or CPM. Revenue also serves the purpose of making you evaluate your single success metric to ensure that it is tied to the purpose of your site and that you are not being caught up on side goals that do not impact the bottom line.
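As a rough sketch of that translation for non-retail sites, assuming you already have an average value per lead or a CPM figure from your own business (the numbers below are placeholders, not benchmarks):

```python
def leads_to_revenue(leads, average_value_per_lead):
    """Estimate revenue from a lead count and an assumed average lead value."""
    return leads * average_value_per_lead


def page_views_to_revenue(page_views, cpm):
    """Estimate ad revenue from page views; CPM is revenue per 1,000 views."""
    return page_views / 1000 * cpm


# Placeholder inputs for illustration only.
lead_revenue = leads_to_revenue(leads=40, average_value_per_lead=150.0)
ad_revenue = page_views_to_revenue(page_views=250_000, cpm=8.0)

print(f"Lead based estimate: {lead_revenue:,.0f}")
print(f"Page view based estimate: {ad_revenue:,.0f}")
```

Once every test is expressed this way, a lead generation test and an ad supported content test can be compared on the same basis.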
DON’T – Forget that you are measuring gross revenue, not net revenue
Except in rare circumstances, most groups end up measuring gross revenue when it comes to the impact to the business. Because this makes numbers seem much larger than they really are, it often leads groups to overestimate their impact on the business as a whole. If you cannot express impact based on pure revenue generation, at least make it clear what numbers and assumptions you are using and what you expect the entire program to deliver to the bottom line. Nothing kills credibility faster than numbers that no rational executive can believe.
DO – Report on the scale of impact of various tests
So much is missed if we do not look at patterns across tests. One of the critical things for groups to understand is that lift by itself does not tell you revenue. A smaller population measured with a very large lift is often worth far less than a larger population with a much smaller lift. If you are translating all tests to revenue, then you can easily figure out where you have been able to generate the most revenue, not necessarily the most lift. This active data acquisition is what allows you to plan out and increase resource efficiency in the future, and it becomes vital for the long term growth of a program. Often times the lessons learned here really shape how people look at the impact of various channels. This type of analysis also helps people start to understand the differences between revenue allocation and revenue generation.
DON’T – Forget that the most valuable result from tests is not the lift
It is vital that over time you start to get a deep causal understanding of your ability to influence various parts of the user experience, as well as user groups, and of what the cost to do so is. While it is fun to talk about the revenue impact, knowing that in 4 out of 5 places changing content did not have much of an impact compared to spatial changes completely changes how other parts of the business even operate. These lessons about where you have been able to make an impact, how, and what it took to do it can help shape entire product roadmaps and help drive exponential revenue generation in the future.
There is no better time to express that testing is not just a list of actions, but an active acquisition of knowledge, than when you talk about results. Failing to look at these patterns across tests and failing to really use them as a way to filter your other data can lead to massively inefficient uses of resources. Your program is worth far more than the individual actions you take, so why would you allow others to overly focus on tests when it is the act of optimization that drives the largest value opportunities? Make this the focus, what you learned, how, why, and what the impact was, and you will be able to make others see what testing can really do for them.
There is no better way to really see where a program is at than to see how they communicate results. You can tell how efficient they have been, how they work with other groups, and most of all how much the personal ego of the people involved on both ends of the presentation gets in the way of real meaningful results. If you think about and focus on the right parts of expressing results, you will be able to move forward and really change your organization. Nothing drives others to want to invest in and expand a program’s impact more than showing that it improves every other part of the business. Focus on just the lift and just numbers, and you are setting yourself and others up for failure.
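A small sketch of the lift versus revenue point, using two invented tests. The test names, baseline revenue figures, and lifts are hypothetical, and the gain is estimated in the simplest possible way, as baseline revenue times lift.

```python
# (test name, baseline revenue of the tested population, measured lift)
# All values are made up for illustration.
tests = [
    ("niche landing page", 50_000, 0.30),
    ("main product page", 2_000_000, 0.02),
]


def incremental_revenue(baseline_revenue, lift):
    """Simplest estimate of gain: baseline revenue multiplied by the lift."""
    return baseline_revenue * lift


# Rank tests by estimated revenue gain rather than by lift.
ranked = sorted(tests, key=lambda t: incremental_revenue(t[1], t[2]), reverse=True)

for name, baseline, lift in ranked:
    gain = incremental_revenue(baseline, lift)
    print(f"{name}: lift {lift:.0%}, estimated gain {gain:,.0f}")
```

Sorting by estimated gain puts the 2% lift on the large population ahead of the 30% lift on the small one, which is the pattern described above.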
As personalization and testing continue to become more and more mainstream, you are starting to see a whole slew of groups that are being introduced to testing, or who may believe they have more functional optimization knowledge than they really do. So many groups would be better off if they just avoided a number of common pitfalls that befall new programs. While earlier I put together my list of the top 7 deadly sins of testing (no single success metric, weak leadership, failure to focus on efficiency, testing only what you want, confusing rate and value, and falling for the graveyard of knowledge), I want to instead give you a very quick checklist to make sure that at least your first few efforts in testing are less painful and far more fruitful than what normally happens.
It does not matter if you have tested before, or are new to testing. What matters instead is how you tackle today’s issues and how you set yourself up to succeed. Breaking down the components of a successful first test allows you to see the action items, and allows you to move towards the moments that really make or break a program. Nothing makes everyone’s life harder than starting out testing on the wrong foot, since everyone will think that is how things are going to happen from then on. With that in mind, if you do nothing but follow this simple checklist, you are going to be far better off.
1) Decide on what you are trying to accomplish – Pick one metric for all your tests, across your entire site or org that defines success. This might be easy or hard, but it is vital. This is especially difficult for people coming from an analytics background who are used to reporting on multiple things and finding stories.
2) Pick one or two pages to start – Do not try to eat the entire elephant in one sitting.
3) Do not focus on your test ideas – You are going to have test ideas in mind, and you are going to want to only talk about and focus resources on that one test. Without fail this is where most groups want to focus, but I cannot stress enough how unimportant your concept for improvement will be to a successful program.
4) Make sure your first test is very simple – Don’t start trying to do a full redesign of your entire site or completely change a user flow. Pick one page and one thing to start. If you are not sure, then pick a page and design your first test to measure the relative value of all the items on the page.
5) Decide on your initial segments – Make sure all tests have a segment list. Not being focused on a specific segment will make it far easier to find exploitable segments and will start the process of learning even if you do not intend to use them right away.
Here are some basic rules for segments to make your life easier:
• Must be at least 5% of your total population (10% of a smaller site). This is total, not identified traffic
• Must have a comparable measure (you can’t measure new users versus Google users, since there are new Google users).
• Have to be defined PRIOR to the start of any campaign
• Need to cover the 4 big user information types (the listed items are just examples):
Day of week
Operating system (basically all the stuff you get from the user agent string)
How did the user get to the site
Key word types
Used internal search before
6) Start building a proper infrastructure – Build out a technical framework for your entire site for testing. This may not be tied to the first test, though that will be part of it. Get access so that you can run tests on 80% of your pages, using a combination of local and global set-up. A little pain up front will save you from lots of pain later on. It is always best to avoid death by a thousand cuts whenever possible, even if you don’t see that issue immediately.
7) Decide on rules of action – Make sure everyone is very clear on how you are going to call a test and act on winners before you launch your first test.
8) Make sure you are not going to QA tests like you do other parts of your site – So many testing programs are destroyed by letting IT decide on the work flow and on how you are going to QA your tests. You may have to work with your IT group, but it is vital that testing is not owned by your IT group but instead by your marketing and product teams.
9) Point your resources towards feasible alternatives, not adding more metrics – Any additional time and additional resources need to be used on the following two things:
A) Adding as many alternatives to the test as possible, traffic permitting
B) Educating and working with groups to make sure they understand why you are testing and how you are going to act.
10) Remember that the most important time for your program is not the first test – Your first test is fun to focus on, but the period after your first test is where you can start to really focus on testing discipline. This is the time that defines whether you get good at just running tests, or if you build a rock star optimization program. So many groups fail because they miss how vital this time is.
There you go, 10 simple steps to make sure that your first moments in testing do not take your program astray. This is hardly the end of the road, but if you simply avoid setting yourself up for failure, then you can really start to look ahead to all the opportunity that is out there.
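The size rule for segments in step 5 can be checked mechanically before a test launches. The sketch below is one possible way to do it in Python. The segment names and visitor counts are hypothetical, and the 5% default (10% for a smaller site) is the threshold from the checklist, measured against total traffic rather than identified traffic.

```python
def check_segments(segment_counts, total_visitors, min_share=0.05):
    """Split predefined segments into usable and too-small groups.

    segment_counts maps a segment name to its visitor count.
    total_visitors is all traffic, not only the identified part.
    """
    usable, too_small = {}, {}
    for name, count in segment_counts.items():
        share = count / total_visitors
        if share >= min_share:
            usable[name] = share
        else:
            too_small[name] = share
    return usable, too_small


# Hypothetical segment list, defined before the test starts.
# The OS counts do not add up to the total because some traffic is unidentified.
segments = {
    "day: weekday": 72_000,
    "day: weekend": 28_000,
    "os: windows": 55_000,
    "os: mac": 30_000,
    "os: linux": 3_000,
    "source: search": 40_000,
    "source: direct": 35_000,
}

usable, too_small = check_segments(segments, total_visitors=100_000)
print("usable:", sorted(usable))
print("too small:", sorted(too_small))
```

For a smaller site, pass `min_share=0.10`. This only covers the size rule; whether two segments are comparable still needs a human decision.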