Because of the nature of how most organizations work, it is common to find testing added onto the existing roles or organizational structure of the analytics group. It makes sense from a very high level view as both disciplines deal with using data to make decisions and both can be viewed as a shared resource throughout the entire organization and for each group. The failure comes however when people who are used to looking at and responding to problems like they are analytics issues try to force the same actions onto testing.
The real key to success in adding optimization to your organization lies more in how you tackle the fact that it is a new discipline than in who owns it or where it fits into the larger picture. Most groups worry about head count and resources and fail to focus on the skills and different actions that determine success with optimization as opposed to general analytics or business intelligence work. The lack of knowledge by others is often abused or leveraged to help people gain oversight of the program. The real problems arise, however, when people do not then adapt their usage to meet the new needs and instead simply try to come up with stories about the value of the program.
Once mature and with key people in place, combining the programs can add a lot of value. Without a deep understanding of those differences, however, the inevitable conclusion is less value and wasted time as people make mistakes that they do not even know are mistakes. Resources are a precious commodity, and the ultimate expression of optimization is to leverage them in the most productive way possible. In order to ensure this outcome, it is important that you focus the time of your optimization team towards actions that will maximize outcomes and help grow understanding for your entire organization.
To maximize success and to ensure that focus is done on the right actions, here is a breakdown of the time spent and the value derived from the main actions of a successful optimization team.
1) Active Data Acquisition – 80% – 90% of the value of most optimization programs comes not from the commonly thought of validation role but from continual active data acquisition and comparison of feasible alternatives. It takes time for groups to achieve this role, but when they do, the value given to the organization increases by orders of magnitude. Often this is based around the concepts of bandit based optimization and fragility, and it is used as an ongoing effort to challenge assumptions and to actively measure the value of different alternatives.
In this role the optimization team consistently leverages low resource efforts that consistently measure as many different feasible alternatives as possible and do this across the site. This is in the attempt to maximize the discovery of exploitable opportunities and the primary role is to challenge assumptions which otherwise will never be discovered. The team needs access to pages and clear rules on measuring success and leveraging of resources in order to produce constant and impactful lessons which can shape and direct product discussions and roadmaps.
2) Education – This is 5-15% of the value derived from the optimization group, but provides the ability to do the actions which produce the greatest returns. Because optimization requires very different ways of thinking about and executing on actions in order to provide the most value, one of the key roles for optimization is an ongoing and consistent conversation with groups about different ways to think about problems.
It is vital that this conversation always happens prior to any action and that optimization is not thought of as just a simple action in a release calendar. Groups that fail to think differently are guaranteed to get much lower value from their testing efforts, waste far more resources, and often have much slower and less productive product teams overall. They will get results; they will simply be far smaller results at a much higher cost. Failure to focus on education often leads groups to a purely responsive role and leads to programs that are happy with the number of tests they run or with simply producing a single positive result.
There is no such thing as an organization that starts out looking at the correct things perfectly, and without fail someone’s personal agenda leads them to subconsciously search out confirming actions and data in order to make themselves look good. Building, maintaining, and educating people on proper data discipline is the single most consistent and important topic of education.
Here are a few of the key topics that groups can and should focus on:
i. Rules of Action – Knowing how to act on data, being disciplined about not acting too fast or too slow, and looking only at metrics that matter to a decision are vital for any data organization.
ii. Statistical disciplines – There are many different ways to think about data and testing, and it is vital that people be exposed and open to ways other than those they already know in order to maximize future growth.
iv. Knowledge Share – When you are running a successful optimization program you will be constantly learning things that go against all previously held beliefs and opinions. These lessons learned are the single most valuable part of a successful program and become a core component of a program once it has matured.
3) Ad hoc analysis and validation testing – At most this represents 5% of the possible value provided by testing. It is always fun to want to focus on who has the best idea to improve things, but ultimately this is the least important part of a successful program. The better the input, the better the output, but only if it is going through a great system. A poor system means that it really doesn’t matter how good any idea is.
This is the part that most groups are familiar with, where they respond to test ideas directly or ask for more data or details on specific tests.
Generally this time is best spent redirecting people towards higher value uses of time and data.
Successful programs have their time breakdown in the range of:
• Active Data Acquisition / Ongoing optimization – 60-70% of time
• Education – 15-20% of time
• Post hoc analysis and validation optimization – 15-20% of time
It is generally more about the thought and usage of the program than just who owns it. The real keys to a successful program are to differentiate the roles and skills. Generally, if a program is just starting out, you may have 1-2 people who work on testing in some form, with the primary focus on working with different groups and their ideas to provide value. Mature programs might move upwards to 5-10 people and higher as they continue to grow.
It does no good to add more people to a problem if you aren’t fixing the real problem, which is how the time is spent. More time does not equal more value, better usage of time means more value. It is never easy to go past your comfort zone, but that is where you will find all the value. Think about how and where you are spending your time.
There are many challenges for anyone entering a new field of study or a new discipline. We all come into any new concept with our previously held knowledge and previously held beliefs filtering and changing how we view the new thing before us. Some choose to make it fit their world view, others dismiss it from fear, and others look for how it can change their current world view. Usually in these situations I quote Sherlock Holmes, “It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.” Nothing represents this challenge more in online marketing than the differences between analytics and optimization, and nothing represents that struggle more than the debate about visit based measurement versus visitor based measurement.
The debate about whether someone should use a visit, impression, or visitor basis for analysis is a perfect example of this problem, as it is not as simple as always using one or the other. When you are doing analytics, visits are usually the best way to look at data. When you are doing optimization, there is never a time where visits would present more relevant information than a visitor based view of the data.
Analytics = Visit
Optimization = Visitor
The only possible exceptions are when you are using adaptive learning tools. While the rules can be simple, a deep understanding of the why presents many other opportunities to improve your overall data usage and the value derived from every action.
Since most people reading this start from an analytics background, let’s look at what works best in that environment. Analytics is a single data set, correlative data metric system, which is a long way of saying that it counts things on a consistent basis within only one set of data, even if that data has many different dimensions. You are only recording what was, not what could or should be. In that environment, you have to look at data in some very particular ways. The first amongst those is a very tight control on accuracy, since in many cases the use of that data is to represent what the business did, and hopefully to make predictions about the future.
It is also important that you are consistent with how you measure and that you look at things on a common basis. Because most people are comfortable looking at a day or shorter term basis, this means the easiest method is going to be a visit. It works great because you are trying to look at interactions and to measure a raw count of things that did happen, e.g. how many conversions, or how many people came from SEO. In those cases, a raw count in a correlative area is going to be best represented using a visit basis, since it mitigates lost data (though it is not a massive amount) and it best reflects the common basis on which people look at data.
In the world of optimization, however, you have a completely different usage and type of data. In optimization we are looking at a single comparative data point and trying to represent an entirely different measure, which is influence on behavior over time. It doesn’t matter if your site changes once a year or once an hour, or if your buying cycle is 1 visit or 180 days; all of those things are irrelevant to the fact that you are influencing a population over time. Because behavior is defined as influence on a population, and because we are looking comparatively over time, the measurement techniques used in analytics need to be rethought. Any concern about accuracy, past a simple point, becomes far less important than a measure of precision (consistency of data collection), since all error derived is going to be equally distributed. It doesn’t matter if the common basis is $4.50 or $487.62; what matters is the relative change based on the controlled factor. It is also important that we are focusing far more on the influence than on the raw count, which means we are really talking about the behavior of the population.
In analytics you are thinking in terms of what the count of the outcome was (rate), whereas in optimization the focus is on what the influence was (value). To really understand optimization, you have to understand that all groups start with a standard propensity of action, which is represented by your control group. If you do nothing, the people coming to your site, people in all stages and all types of interaction, measure up to one standard measure across your site (though all measurement systems do have internal variance to a small degree). Since we are measuring not what the propensity of action is but what our ability to positively or negatively influence it is, we need to think in terms of reporting based on visitors and based on the change (lift) and not the raw count.
You also have the case of time, where we need to measure total impact over time. While it is correct that every time a visitor hits your site you have a chance to influence them, it is important to remember that the existing propensity of action measurement already accounts for this. What we are looking for is a simple measure of what we accomplished in terms of getting them to spend more. This means that we have to think in terms of both long and short term behavior. Some people will purchase today, some 3 visits later, but all of that is part of standard business as usual. It is incredibly easy to have scenarios where you get more immediate actions but fewer long term actions. This means that on a daily basis you might see a short term spike, but for the business overall you are actually going to be making less revenue. This possibility creates two possible measurement scenarios:
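To make the idea of lift against a control baseline concrete, here is a minimal sketch. The function name and the revenue-per-visitor figures are my own illustrative assumptions, not numbers from a real test. It only shows that the same relative change comes out whether the baseline is a few dollars or a few hundred.

```python
def lift(control_value: float, variant_value: float) -> float:
    """Relative change of the variant measured against the control baseline."""
    if control_value == 0:
        raise ValueError("control baseline must be non-zero")
    return (variant_value - control_value) / control_value


# Hypothetical revenue-per-visitor figures for two very different sites.
small_site_lift = lift(control_value=4.50, variant_value=4.95)
large_site_lift = lift(control_value=487.62, variant_value=536.382)

print(f"small site lift: {small_site_lift:.1%}")
print(f"large site lift: {large_site_lift:.1%}")
```

Both calls come out at roughly a 10% lift, which is the point: the raw baseline is irrelevant once you are comparing a variant against its own control.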
1) There is no difference between short term and long term behavior, meaning the short term spike continues through and is positive also in the long term. In this scenario the only way to know that is to look at the long term.
2) There is a difference and short and long term behaviors differ and we are getting a different outcome by looking at the visitor metric over time. In this scenario the only positive outcome for the business is the visitor based metric view.
In both cases the visitor based metric view gives us the full picture of what is good for the business, while the visit based metric system either has no additional value or a negative value by reaching a false conclusion. In either case the only measure that adds value and gives us a full picture is the visitor based view of the world. We have a case where visitor is not only the most complete view, no matter the situation, but also the only one that can give you a rational view of the impact of a change. To top it off, the choice to only look at the shorter window creates a distribution bias by valuing short term behavior over long term behavior, which may raise questions about the relevance of the data used to make any conclusion.
The visitor vs. visit based view of the world is just one of many massive differences that reduce the value derived from optimization if not understood or not evaluated as a separate discipline. Because it is so easy to rationalize sticking with what is comfortable, it is common to find this massive weakness being propagated throughout organizations with no measure of what the cost really is. While not as damaging as others, like not having a single success metric or not understanding variance, it is vital that you are thinking about visit and visitor based data as attached to the end goal and not as a single answer to everything.
In the end, the debate about which version to use is not really one about visits or visitors; there are clear reasons to choose visits for analytics and visitors for optimization. The real challenge is whether you and your organization understand the different data disciplines that are being leveraged. If you constantly look for different ways to think about each action, you will find new and better ways to improve value; if you fail to do so, you will cause damage throughout your organization and will not even know you are doing it.
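A small sketch of how the two views can disagree on the same data. The visit log below is entirely made up, and `revenue_per_unit` is a hypothetical helper, not something from any testing tool. The variant converts faster but for less money overall, while control visitors take three visits to buy.

```python
from collections import defaultdict


def revenue_per_unit(rows):
    """Aggregate (group, visitor_id, revenue) visit rows on both bases."""
    revenue = defaultdict(float)
    visit_count = defaultdict(int)
    visitors = defaultdict(set)
    for group, visitor_id, amount in rows:
        revenue[group] += amount
        visit_count[group] += 1          # every row is one visit
        visitors[group].add(visitor_id)  # a visitor is counted once
    return {
        group: {
            "per_visit": revenue[group] / visit_count[group],
            "per_visitor": revenue[group] / len(visitors[group]),
        }
        for group in revenue
    }


# Hypothetical visit log: one row per visit, revenue recorded on that visit.
rows = [
    ("control", "c1", 0), ("control", "c1", 0), ("control", "c1", 100),
    ("control", "c2", 0), ("control", "c2", 0), ("control", "c2", 100),
    ("variant", "v1", 80),
    ("variant", "v2", 0), ("variant", "v2", 40),
]

result = revenue_per_unit(rows)
for group, metrics in result.items():
    print(group, metrics)
```

On a visit basis the variant looks like the winner, because it spreads its revenue over fewer visits. On a visitor basis the control is clearly worth more per person, which is the number the business actually banks.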
The life of working with organizations to bring about change is a difficult and frustrating one, but at the same time there are some truly amazing moments. The times you see an organization change and truly look at optimization differently, when they get past misconceptions and “best practices” are truly amazing. While much of my writing focuses on the dark side of this world, I did want to take some time and share a few small stories from the front-line that are what to me consulting is all about.
Story #1: I was working with a hospitality and travel company where I was hoping to come in and change a number of practices that may not have been beneficial to the company. Starting out an example test series to work through discover and exploit, they decided to run an inclusion/exclusion test on a page template for their destination landing pages. We discovered that their main content block section was having a negative impact to their page and that removing it would result in 12% lift to RPV. We also discovered that a small section on their right rail that they never even thought about was actually the most influential part of the page.
We were able to do this with very little effort (basic CSS), but because we challenged a number of internal assumptions, we learned about things they never considered, reduced maintenance cost, and achieved a major lift to their bottom line. The real kicker of this testing was that it was both the lowest resource test they had ever run and their most valuable.
Story #2: Working with another travel/hospitality organization, I came in because they had not been able to get any momentum with their testing due to a previous belief that it required too many resources. We were able to leverage their existing infrastructure and ran a simple inclusion/exclusion test on one of their landing pages. We did this all in one conversation without any pretext. In this case we discovered that almost every element individually was negative to the page, meaning that as a whole the total was far less than the sum of the parts. We ran a simple follow-up and discovered that they could generate 3-8 million dollars by simply removing one of their main offer parts, and that they could generate 4.5-12 million if they were willing to create a dynamic experience and show one of two sections based on new and returning users.
I love these types of moments because you have so many good things come out of very small actions. We now have a group that is amazed by the power of optimization. We now have a group that sees that a number of their assumptions have been proven incorrect about what works, or even for whom. We were able to generate massive fiscal impact to the business, with extremely low effort, and were able to give them insight into other opportunities to do the same thing. They were able to learn about the disciplines of testing, find out that conversion rate and revenue are not correlated, and were able to think about their entire user experience differently.
Story #3: Working with a major online retailer, they were just re-introducing optimization to their organization. They had already decided on what they were going to test, how, and when. They then jumped into their test, but I gave them guidance to not overreact and to understand how contextual changes are going to play out (very large initial climb and then narrow down to almost nothing quickly). They were so excited when they had a 40+% lift in a couple of days with 99% confidence, but I asked them to wait and talked to them about what confidence really means. They held off, and within the next 7 days the results of the test had dropped to less than 2%, to the point that it had no impact.
Now normally you don’t like outcomes that do not generate lift, but what made my day is that they stopped themselves from rushing around claiming a result that had no basis in reality. If they had run to tell the rest of the organization about that 40+% lift, they would have done irrecoverable damage to the company as others would have attempted to do the same thing and reach the same false conclusions. Because they waited, they got to understand the nature of statistics and contextual versus spatial changes, and started thinking more about the true disciplines of optimization. Since that time they have expanded their testing to multiple countries and are testing at an accelerated pace and driving real value throughout the organization.
There are so many moments in working with different groups where you can get frustrated, or give up, or even worse give in and tell someone what they want to hear. Equally there are many moments where you can create a false impression of impact and fake numbers for your own gain. The moments that are truly amazing are those that prove that you don’t have to do any of those things, that you can achieve amazing results, quickly, and for far less time and effort than someone might otherwise think. All it takes is to think about and tackle problems differently, and the results that any group can achieve are truly amazing.
After your optimization program has been running for a while, there will naturally come times when you get to share the amazing success that has been generated by running your program correctly. Often times groups are so happy to get to brag, they don’t realize that they are causing the program long term problems.
There are many keys to sharing results correctly, but the most important thing to remember is why you are sharing them. Yes, you are talking to others about how great you are, but ultimately results are not about the past, but about the future. Should others invest in the program? Should they expand where and what you optimize? How do others think about and interact with optimization and, most importantly, why should they listen to you and your team when it comes to how and why you should be running certain tests? The critical moments come not from the sharing of results, but from the framing of why and how you get those results, and what it can do for other parts of the organization. Focus on the wrong parts, and you are asking for far more trouble than most can imagine.
Some of the worst moments for programs come after big results presentations, where they have gotten buy in from others and a massive expansion of the program, but fail to share the right message and to help others understand that testing is often times not what they think it is. These moments inevitably lead to large resource drains, negative impressions, and massive time sinks for the original testing group, leading them to frustration and less total revenue generation than before the influx of resources. Looking back 6 and 12 months later, you can easily see the moment that things went south.
With all of those concerns in mind, here are the keys to successfully sharing results within your organization.
DO – Focus on what you got by proving assumptions wrong
It is so fun to talk about getting an 8% lift, or getting a 20% lift on multiple tests, but often times the lift is secondary to how you got it. If you are testing to find the optimal use of resources, then it is inevitable that you will find many times when popular assumptions have been proven wrong. As you share your results, it is important that this is the primary part of the message. It is not about “we got a 15% lift”, it is about “everyone wanted to do X, which would have generated a 3% lift, but we found that of these 5 feasible alternatives, doing Y was actually dramatically better and generated a 15% lift, or 12 points better than we would have gotten if all we did was test what people thought was going to win.”
DON’T – Report test reports as a single revenue number
It is so fun and easy to report results as, “the test generated 6.2 million in additional revenue.” The problem is that there is absolutely no way to know specifics with accuracy, and you will garner a lack of trust if later the P&L does not show that exact figure of gain. I understand how impressive it is to point to a single number and how much it can get you credit, but ultimately it is far more damaging than the temporary good it might generate.
Even if we ignore all the real world problems with confidence, it is important to understand that confidence and most similar measures only describe the likelihood of a pattern, not the actual outcome. If I have a 10% lift and 96% confidence, I am not 96% confident that you will get a 10% lift, only 96% confident that the measured experience will beat control. Confidence intervals can also be tricky because of the many assumptions of the Gaussian bell curves that they are based on.
Instead, focus on reporting tests as a range, based on a preset range. What that range is is somewhat arbitrary, as long as it is large enough to convey the massive range of possibility. If I have not done deep analysis of past results, I will often times report test results in a 50% – 200% range, so that the 6.2 million becomes an expected outcome of 3.1 to 12.4 million dollars. Ultimately the range is arbitrary, though there are ways to look back at results over time and see an expected range. Express everything in a large but relevant range and you will avoid all the massive credibility problems that false reporting creates.
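As a rough illustration of why confidence says nothing about the size of the lift, here is a sketch that assumes confidence is computed as a one-sided, unpooled normal approximation on conversion rates. That is my assumption for the example, since each tool does this differently, and the function name and counts are hypothetical.

```python
import math


def confidence_variant_beats_control(conv_c, n_c, conv_v, n_v):
    """One-sided confidence that the variant rate exceeds the control rate.

    Assumes a normal approximation with unpooled standard error; real
    testing tools may compute their confidence figure differently.
    """
    p_c = conv_c / n_c
    p_v = conv_v / n_v
    std_err = math.sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
    z = (p_v - p_c) / std_err
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Same observed 10% lift (10% vs 11% conversion), two hypothetical sample sizes.
small_sample = confidence_variant_beats_control(100, 1_000, 110, 1_000)
large_sample = confidence_variant_beats_control(1_000, 10_000, 1_100, 10_000)

print(f"1,000 visitors per side:  {small_sample:.1%} confidence")
print(f"10,000 visitors per side: {large_sample:.1%} confidence")
```

The observed lift is identical in both calls, yet the confidence figure is very different. The number describes how likely it is that the pattern is real, not how big the eventual gain will be.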
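The range arithmetic above is simple enough to sketch. The helper name and default factors below are just my shorthand for the 50% to 200% convention described here, not a statistical interval.

```python
def report_range(point_estimate, low_factor=0.5, high_factor=2.0):
    """Turn a single revenue estimate into a deliberately wide reporting range.

    The default factors mirror the 50% to 200% convention; they are a
    reporting choice, not a confidence interval.
    """
    return point_estimate * low_factor, point_estimate * high_factor


low, high = report_range(6_200_000)
summary = f"${low / 1e6:.1f}M to ${high / 1e6:.1f}M"
print(summary)
```

If you have looked back at past results and found a tighter or wider expected range, pass your own factors in place of the defaults.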
DO – Report all tests based solely on revenue impact
While you can’t report an absolute number, that does not mean that you should not be reporting the fiscal impact of a test. Translating all tests to a revenue figure gives you the ability to express your efforts as they impact the bottom line, while also giving you the ability to rationally compare the results amongst tests. Even if you are not a retail site, you can translate leads or page-views to average value or CPM. Revenue also serves the purpose of making you evaluate your single success metric to ensure that it is tied to the purpose of your site and that you are not getting caught up on side goals that do not impact the bottom line.
DON’T – Forget that you are measuring gross revenue, not net revenue
Except in rare circumstances, most groups end up measuring gross revenue when it comes to the impact to the business. This makes numbers seem much larger than they really are, and it often times leads to groups overestimating their impact on the business as a whole. If you cannot express impact based on pure revenue generation, at least make it clear what numbers and assumptions you are using and what you expect the entire program to deliver to the bottom line. Nothing kills credibility faster than numbers that any rational executive cannot believe.
DO – Report on the scale of impact of various tests
So much is missed if we do not look at patterns across tests. One of the critical things for groups to understand is that lift by itself does not tell you revenue. A smaller population measured with a very large lift is often worth far less than a larger population with a much smaller lift. If you are translating all tests to revenue, then you can easily figure out where you have been able to generate the most revenue, not necessarily the most lift. This active data acquisition is what allows you to plan out and increase resource efficiency in the future, and becomes vital for the long term growth of a program. Often times the lessons learned here really shape how people look at the impact of various channels. This type of analysis also helps people start to understand the differences between revenue allocation and revenue generation.
DON’T – Forget that the most valuable result from tests is not the lift
It is vital that over time you start to get a deep causal understanding of what your ability to influence various parts of the user experience, as well as user groups, really is, and what the cost to do so is. While it is fun to talk about the revenue impact, knowing that in 4 out of 5 places changing content did not have much of an impact compared to spatial changes completely changes how other parts of the business even operate. These lessons, about where you have been able to make an impact, how, and what it took to do it, can help shape entire product roadmaps and help drive exponential revenue generation in the future.
There is no better time to express that testing is not just a list of actions but an active acquisition of knowledge than in when and how you talk about results. Failing to look at these patterns across tests and failing to really use this as a way to filter your other data can lead to massively inefficient uses of resources. Your program is worth far more than the individual actions you take, so why would you allow others to overly focus on tests when it is the act of optimization that drives the largest value opportunities? Make this the focus, what you learned, how, why, and what the impact was, and you will be able to make others see what testing can really do for them.
There is no greater time to really see where a program is at than to see how they communicate results. You can tell how efficient they have been, how they work with other groups, and most of all how much the personal ego of the people involved on both ends of the presentation gets in the way of real meaningful results. If you think about and focus on the right parts of expressing results, you will be able to move forward and really change your organization. Nothing drives others to want to invest in and expand a program’s impact more than showing that it improves every other part of the business. Focus on just the lift and just numbers, and you are setting yourself and others up for failure.
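Here is a quick sketch of why lift alone does not tell you revenue. The visitor counts, baseline revenue per visitor, and lifts are invented for illustration, and `incremental_revenue` is a hypothetical helper that uses a simple linear model of impact.

```python
def incremental_revenue(visitors, baseline_rpv, lift):
    """Extra revenue implied by a lift, assuming it applies evenly to the population."""
    return visitors * baseline_rpv * lift


# Hypothetical tests: a big lift on a niche page vs a small lift on a broad one.
tests = {
    "niche landing page": incremental_revenue(visitors=10_000, baseline_rpv=5.0, lift=0.30),
    "site-wide header": incremental_revenue(visitors=500_000, baseline_rpv=5.0, lift=0.02),
}

# Rank tests by revenue impact rather than by lift.
ranked = sorted(tests.items(), key=lambda item: item[1], reverse=True)
for name, revenue in ranked:
    print(f"{name}: about ${revenue:,.0f}")
```

The 2% lift on the large population outranks the 30% lift on the small one. As with any single revenue figure, these values are better reported as a range than quoted as exact numbers.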